I. Abstract
Point cloud data are unstructured three-dimensional sample points that express the basic shape of objects and spaces. However, it is challenging to automatically generate continuous surfaces and infer semantic structures, such as cars, trees, buildings, and roads, from a point cloud dataset generated by a sensor. Understanding these semantic structures is essential for recording geospatial information. Despite the good performance of deep learning-based approaches in understanding point clouds, their target coverage is still limited by the lack of training datasets that include semantic labels. This discussion paper addresses data formats for sharing a Labeled Point Cloud (LPC), in which semantic information is annotated to each point.
Creating LPCs manually or semi-manually is a time-consuming task. Therefore, sharing LPCs in an open standard format is becoming increasingly important for the development of more advanced deep learning algorithms for object detection, semantic segmentation, and instance segmentation. Even though several data formats are used to distribute LPCs, the representation of semantic information varies by distributor and domain. This discussion paper analyzes how three popular formats (ASCII text, PLY, and LAS) support LPCs and proposes a practice for applying HDF5 to facilitate the sharing and importing of LPC datasets.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, OGC HDF5, labeled point cloud, deep learning, point cloud, LPC, machine learning, lidar
III. Preface
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
IV. Security considerations
No security considerations have been made for this document.
V. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- National Institute of Advanced Industrial Science and Technology
VI. Submitters
All questions regarding this submission should be directed to the editors or the submitters:
Name | Affiliation |
---|---|
Kyoung-Sook Kim | National Institute of Advanced Industrial Science and Technology |
Taehoon Kim | National Institute of Advanced Industrial Science and Technology |
Wijae Cho | National Institute of Advanced Industrial Science and Technology |
The HDF5 profile for labeled point cloud data
1. Scope
This OGC Discussion Paper (DP) investigates and summarizes point cloud data formats (such as PLY and LAS) and how they can support labeled point clouds. Based on this survey, the DP demonstrates the ease of use and flexibility of the HDF5 format for labeled point clouds.
This DP covers the following:
- a survey of how widely used point cloud data formats (those used in open datasets) support labeled point cloud data; and
- a practice for using the HDF5 format for labeled point clouds.
2. Conformance
This Discussion Paper defines an HDF5 profile for labeled point cloud data.
The document identifies a Core requirements class and a series of requirements belonging to that class.
3. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO: ISO 19101-1:2014, Geographic information — Reference model — Part 1: Fundamentals. International Organization for Standardization, Geneva (2014). https://www.iso.org/standard/59164.html
Aleksandar Jelenak, Ted Habermann, Gerd Heber: OGC 18-043r3, OGC Hierarchical Data Format Version 5 (HDF5®) Core Standard. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/18-043r3/18-043r3.html
4. Terms and definitions
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document. OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
4.1. feature
abstraction of real-world phenomena
[SOURCE: ISO 19101-1:2014]
4.2. labeled point cloud
set of points, each of which has a semantic label (or index) in addition to its coordinates
4.3. point cloud annotation
process of attaching a set of semantic information to point cloud data without any change to that data
5. Conventions
This section provides details and examples for any conventions used in the document. Examples of conventions are symbols, abbreviations, use of XML schema, or special notes regarding how to read the document.
5.1. Abbreviated terms
The following abbreviated terms are used in this discussion paper:
AIST | National Institute of Advanced Industrial Science and Technology |
ASPRS | American Society for Photogrammetry and Remote Sensing |
DP | Discussion Paper |
HDF5 | Hierarchical Data Format Version 5 |
HDF5LPC | HDF5 for the Labeled Point Cloud |
LIDAR | Light Detection And Ranging (or Laser Imaging, Detection, And Ranging) |
LPC | Labeled Point Cloud |
OGC | Open Geospatial Consortium |
PCAS | Point Cloud Annotation System |
PLY | Polygon File Format |
VLR | Variable Length Record |
3D | Three-dimensional |
5.2. Identifiers
The normative provisions in this document are denoted by the URI
http://www.opengis.net/spec/HDF5LPC/0.1
NOTE The ‘0.1’ version segment in the URI indicates that this URI is for a prototype. A future OGC Standard on HDF5 LPC, if approved, would use a ‘1.0’ version segment.
All requirements and conformance tests that appear in this document are denoted by partial URIs which are relative to this base.
6. Background
6.1. Labeled point cloud
Point clouds are unstructured data that express the shape of objects and spaces, like three-dimensional (3D) photos. In the geospatial domain, point cloud datasets are collected from 3D scanners such as LiDAR systems and used to generate structured 3D information representing the real world. Each point has multiple attributes in addition to its x, y, and z coordinates. For example, timestamp, intensity, and color are stored as basic attributes alongside the 3D coordinates. When point cloud data are acquired through a sensor, no point is classified as to what it represents, such as part of a car, a tree, a building, or a road. In other words, the data contain only shape information and no semantic information. Classifying point clouds is challenging because it is difficult to infer the underlying continuous surface from discrete unstructured samples. Traditionally, each point is manually or semi-manually assigned a feature label, such as a wall, a ceiling, a floor, a door, a desk, or a chair in the case of indoor space. The labeled point clouds are then transformed into continuous surfaces representing objects or spaces using commercial software. This manual or semi-manual semantic classification is time-consuming and requires specialized knowledge of the software.
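As a minimal sketch of the distinction drawn above, a raw sensor point carries coordinates and basic attributes, while a labeled point additionally carries a semantic label. The class name `Point`, its field names, and the helper `is_labeled` are illustrative assumptions and are not part of any format discussed in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Point:
    """One point cloud sample; all field names are hypothetical."""
    x: float
    y: float
    z: float
    timestamp: float = 0.0
    intensity: float = 0.0
    color: Tuple[int, int, int] = (0, 0, 0)
    label: Optional[str] = None  # None means the point is unclassified


def is_labeled(points):
    """Return True if every point carries a semantic label."""
    return all(p.label is not None for p in points)


# Points as acquired from a sensor: shape information only.
raw = [Point(0.0, 0.0, 0.0), Point(1.0, 0.0, 2.5)]

# The same points after annotation: coordinates unchanged, labels attached.
labeled = [
    Point(0.0, 0.0, 0.0, label="floor"),
    Point(1.0, 0.0, 2.5, label="ceiling"),
]
```

In this sketch, annotation only fills in the `label` field and leaves the coordinates and other attributes untouched.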
In recent years, Artificial Intelligence (AI), especially Deep Learning (DL) neural networks, has been considered as a way to accelerate geographic feature extraction from satellite imagery or point cloud data and to build geospatial data infrastructures. The role of AI in geospatial applications is becoming crucial for automating geospatial information systems (GIS) with human-level cognition. In deep learning, the quantity and quality of the training set, which for point cloud tasks is a labeled point cloud, determine model performance. However, creating point cloud training datasets incurs high costs for data collection and for annotating each point. Figure 1 shows an example of labeled point clouds representing a room in an indoor space. A more practical approach would be to reuse existing training data and modify the semantic labels according to the application scenario. If various types of labeled point cloud datasets support interoperability between tasks such as object detection, semantic segmentation, and instance segmentation, the cost of implementing AI technology in GIS applications can be reduced.
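The reuse scenario described above, in which existing semantic labels are adapted to a new application, can be sketched as a simple label remapping. The function name `remap_labels`, the label names, and the mapping are hypothetical examples and are not taken from any dataset discussed in this paper.

```python
def remap_labels(labels, mapping, default="unclassified"):
    """Translate each source label into the target labeling scheme.

    Labels missing from `mapping` fall back to `default`.
    """
    return [mapping.get(label, default) for label in labels]


# Hypothetical mapping that collapses fine-grained indoor classes
# into two coarse classes for a different application scenario.
indoor_to_coarse = {
    "desk": "furniture",
    "chair": "furniture",
    "wall": "structure",
    "floor": "structure",
    "ceiling": "structure",
}

source_labels = ["wall", "desk", "chair", "door"]
coarse_labels = remap_labels(source_labels, indoor_to_coarse)
```

Only the labels change in this sketch; the point coordinates are left as they were, so the same underlying point cloud can serve more than one task.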
This discussion paper reviews a few popular labeled point cloud (LPC) datasets and investigates the data formats used to distribute them as training datasets. Finally, it describes a best practice for using HDF5 to effectively store, share, and reuse LPCs, namely HDF5LPC.