Published

OGC Discussion Paper

The HDF5 profile for labeled point cloud data
Editors: Taehoon Kim, Wijae Cho, Kyoung-Sook Kim

Document number: 21-077
Document type: OGC Discussion Paper
Document subtype:
Document stage: Published
Document language: English

License Agreement

Permission is hereby granted by the Open Geospatial Consortium, (“Licensor”), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.



I.  Abstract

Point cloud data are sets of unstructured three-dimensional sample points that express the basic shape of objects and spaces. However, it is challenging to automatically generate continuous surfaces and infer semantic structures, such as cars, trees, buildings, and roads, from point cloud data generated by a sensor. Understanding these semantic structures is essential for recording geospatial information. Although deep learning-based approaches perform well in understanding point clouds, the range of targets they cover is still limited by the lack of training datasets that include semantic labels. This discussion paper addresses data formats for sharing a Labeled Point Cloud (LPC), in which each point is annotated with semantic information.

Creating LPCs manually or semi-manually is time-consuming. Therefore, sharing LPCs in an open standard format is becoming increasingly important for the development of more advanced deep learning algorithms for object detection, semantic segmentation, and instance segmentation. Although several data formats are used to distribute LPCs, the way semantic information is represented varies by distributor and domain. This discussion paper analyzes how three popular formats (ASCII text, PLY, and LAS) support LPC and proposes a practice for effectively applying HDF5 to facilitate the sharing and importing of LPC datasets.

II.  Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, OGC document, OGC HDF5, labeled point cloud, deep learning, point cloud, LPC, machine learning, lidar


III.  Preface

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

IV.  Security considerations

No security considerations have been made for this document.

V.  Submitting Organizations

The following organizations submitted this Document to the Open Geospatial Consortium (OGC):

VI.  Submitters

All questions regarding this submission should be directed to the editors or the submitters:

Name | Affiliation
Kyoung-Sook Kim | National Institute of Advanced Industrial Science and Technology
Taehoon Kim | National Institute of Advanced Industrial Science and Technology
Wijae Cho | National Institute of Advanced Industrial Science and Technology


1.  Scope

This OGC Discussion Paper (DP) investigates and summarizes point cloud data formats (such as PLY and LAS) and how they support labeled point clouds. Based on this survey, the DP demonstrates the ease of use and flexibility of the HDF5 format for labeled point clouds.

The DP covers the following topics:

2.  Conformance

This Discussion Paper defines an HDF5 profile for labeled point cloud data.

The document identifies a Core requirements class and a series of requirements belonging to that class.

3.  Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO: ISO 19101-1:2014, Geographic information — Reference model — Part 1: Fundamentals. International Organization for Standardization, Geneva (2014). https://www.iso.org/standard/59164.html

Aleksandar Jelenak, Ted Habermann, Gerd Heber: OGC 18-043r3, OGC Hierarchical Data Format Version 5 (HDF5®) Core Standard. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/18-043r3/18-043r3.html

4.  Terms and definitions

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this document, the following additional terms and definitions apply.

4.1. feature

abstraction of real-world phenomena

[SOURCE: ISO 19101-1:2014]

4.2. labeled point cloud

set of points in which each point carries a semantic label (or label index) together with its coordinates

4.3. point cloud annotation

process of attaching a set of semantic information to point cloud data without any change to that data
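To make definitions 4.2 and 4.3 concrete, the following minimal Python sketch models a labeled point cloud as the original coordinates plus one label index per point, and models annotation as attaching labels without altering the point data. The class name `LabeledPointCloud`, the function `annotate`, and the label values are hypothetical and illustrative only; they are not part of the HDF5LPC profile.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) coordinates


@dataclass(frozen=True)
class LabeledPointCloud:
    """Hypothetical container: points plus one semantic label index per point."""
    points: Tuple[Point, ...]  # original point data, kept as-is
    labels: Tuple[int, ...]    # labels[i] is the semantic label of points[i]

    def __post_init__(self) -> None:
        # A labeled point cloud needs exactly one label per point.
        if len(self.points) != len(self.labels):
            raise ValueError("each point needs exactly one label")


def annotate(points: Tuple[Point, ...], labels: List[int]) -> LabeledPointCloud:
    """Attach labels to points without modifying the point data itself."""
    return LabeledPointCloud(points=points, labels=tuple(labels))


raw = ((0.0, 0.0, 0.0), (1.0, 0.0, 2.5), (1.0, 1.0, 0.0))
lpc = annotate(raw, [2, 0, 2])  # label indices are arbitrary examples
print(lpc.labels)  # (2, 0, 2)
```

Keeping the labels in a separate, parallel sequence mirrors definition 4.3: the same raw points can be re-annotated for a different application without touching the coordinates.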

5.  Conventions

This section provides details and examples for any conventions used in the document. Examples of conventions are symbols, abbreviations, use of XML schema, or special notes regarding how to read the document.

5.1.  Abbreviated terms

The following abbreviated terms are used in this discussion paper:

AIST: National Institute of Advanced Industrial Science and Technology
ASPRS: American Society for Photogrammetry and Remote Sensing
DP: Discussion Paper
HDF5: Hierarchical Data Format Version 5
HDF5LPC: HDF5 for the Labeled Point Cloud
LIDAR: Light Detection And Ranging (or Laser Imaging, Detection, And Ranging)
LPC: Labeled Point Cloud
OGC: Open Geospatial Consortium
PCAS: Point Cloud Annotation System
PLY: Polygon File Format
VLR: Variable Length Record
3D: Three-dimensional

5.2.  Identifiers

The normative provisions in this document are denoted by the URI

http://www.opengis.net/spec/HDF5LPC/0.1

NOTE  The ‘0.1’ version segment in the URI indicates that this URI is for a prototype. A future OGC Standard on HDF5 LPC, if approved, would use a ‘1.0’ version segment.

All requirements and conformance tests that appear in this document are denoted by partial URIs which are relative to this base.
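As an illustration of how a partial URI relates to the base URI, the Python sketch below expands a relative identifier against `http://www.opengis.net/spec/HDF5LPC/0.1`. The partial URI `req/core` is a hypothetical example of an identifier for the Core requirements class; the actual partial URIs are those given with each requirement and conformance test.

```python
BASE_URI = "http://www.opengis.net/spec/HDF5LPC/0.1"


def resolve(partial: str, base: str = BASE_URI) -> str:
    """Expand a partial requirement or conformance-test URI relative to the base.

    Plain concatenation with a single '/' is used deliberately: because the base
    URI has no trailing slash, urllib.parse.urljoin would drop its last path
    segment ('0.1') and produce the wrong identifier.
    """
    return base.rstrip("/") + "/" + partial.lstrip("/")


# 'req/core' is a hypothetical partial URI used only for illustration.
print(resolve("req/core"))  # http://www.opengis.net/spec/HDF5LPC/0.1/req/core
```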

6.  Background

6.1.  Labeled point cloud

Point clouds are unstructured data that express the shape of objects and spaces, much like three-dimensional (3D) photographs. In the geospatial domain, point cloud datasets are collected by 3D scanners such as LiDAR systems and used to generate structured 3D information representing the real world. Each point has multiple attributes, including its x, y, and z coordinates; for example, timestamp, intensity, and color information are stored as basic attributes alongside the 3D coordinates. When point cloud data are acquired by a sensor, the points are not classified as to what they represent, such as part of a car, a tree, a building, or a road. In other words, the data contain only shape information and no semantic information. Classifying point clouds is challenging because it is difficult to infer the underlying continuous surfaces from discrete, unstructured samples. Traditionally, each point is manually or semi-manually assigned a feature label, such as a wall, a ceiling, a floor, a door, a desk, or a chair in the case of indoor space. The labeled point clouds are then transformed into continuous surfaces representing objects or spaces using commercial software. This manual or semi-manual semantic classification is time-consuming and requires specialized knowledge of the software.
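The paragraph above describes sensor points that carry coordinates and basic attributes but no semantics until a feature label is assigned. The Python sketch below illustrates that state change for an indoor scene. The record layout `ScannedPoint`, the attribute values, and the mapping of the indoor class names to integer indices in `INDOOR_LABELS` are all hypothetical assumptions chosen for illustration, not a layout defined by this paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical label table: the class names come from the indoor example in the
# text; the integer index assigned to each name is an illustrative assumption.
INDOOR_LABELS: Dict[int, str] = {
    0: "wall", 1: "ceiling", 2: "floor", 3: "door", 4: "desk", 5: "chair",
}


@dataclass
class ScannedPoint:
    x: float
    y: float
    z: float
    timestamp: float
    intensity: int
    rgb: Tuple[int, int, int]
    label: Optional[int] = None  # None means unclassified, as acquired by the sensor


def label_histogram(points: List[ScannedPoint]) -> Dict[str, int]:
    """Count points per class name; points without a label count as 'unclassified'."""
    counts = Counter(
        INDOOR_LABELS[p.label] if p.label is not None else "unclassified"
        for p in points
    )
    return dict(counts)


scan = [
    ScannedPoint(0.0, 0.0, 0.0, 0.00, 120, (200, 200, 200)),
    ScannedPoint(0.0, 0.0, 2.7, 0.01, 95, (250, 250, 250)),
    ScannedPoint(1.2, 0.8, 0.7, 0.02, 60, (120, 80, 40)),
]
print(label_histogram(scan))  # {'unclassified': 3}

# Manual or semi-manual classification assigns a feature label to each point.
for point, label in zip(scan, [2, 1, 4]):
    point.label = label
print(label_histogram(scan))  # {'floor': 1, 'ceiling': 1, 'desk': 1}
```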

In recent years, Artificial Intelligence (AI), especially Deep Learning (DL) with neural networks, has been considered for accelerating geographic feature extraction from satellite imagery or point cloud data and for building geospatial data infrastructures. AI is becoming crucial for automating geospatial information systems (GIS) with human-level cognition. In deep learning, performance is determined by the quantity and quality of the training set, which for point cloud tasks is a labeled point cloud. However, creating point cloud training datasets is costly because it requires both data collection and annotation of each point. Figure 1 shows an example of labeled point clouds representing a room in an indoor space. A more practical approach would be to reuse existing training data and modify the semantic labels according to the application scenario. If various types of labeled point cloud datasets supported interoperability between tasks such as object detection, semantic segmentation, and instance segmentation, the cost of implementing AI technology in GIS applications could be reduced.

This discussion paper examines a few popular labeled point cloud (LPC) datasets and investigates the data formats used to distribute them as training datasets. Finally, this paper describes a best practice for using HDF5 to effectively store, share, and reuse LPC, called HDF5LPC.