I. Abstract
Point cloud data are unstructured sets of three-dimensional sample points that express the basic shape of objects and spaces. However, it is challenging to automatically generate continuous surfaces and infer semantic structures, such as cars, trees, buildings, and roads, from a point cloud dataset generated by a sensor. Understanding these semantic structures is essential for recording geospatial information. Despite the good performance of deep learning-based approaches in understanding point clouds, their target coverage is still limited by the lack of training datasets that include semantic labels. This discussion paper addresses data formats for sharing a Labeled Point Cloud (LPC), in which semantic information is annotated to each point.
Creating LPCs manually or semi-manually is a time-consuming task. Therefore, sharing LPCs in an open standard format is becoming increasingly important for the development of more advanced deep learning algorithms for object detection, semantic segmentation, and instance segmentation. Although several data formats are used to distribute LPCs, the representation of semantic information varies across distributors and domains. This discussion paper analyzes how three popular formats (ASCII text, PLY, and LAS) support LPCs and finally proposes a practice for applying HDF5 to facilitate the sharing and importing of LPC datasets.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, OGC HDF5, labeled point cloud, deep learning, point cloud, LPC, machine learning, lidar
III. Preface
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
IV. Security considerations
No security considerations have been made for this document.
V. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- National Institute of Advanced Industrial Science and Technology
VI. Submitters
All questions regarding this submission should be directed to the editors or the submitters:
Name | Affiliation |
---|---|
Kyoung-Sook Kim | National Institute of Advanced Industrial Science and Technology |
Taehoon Kim | National Institute of Advanced Industrial Science and Technology |
Wijae Cho | National Institute of Advanced Industrial Science and Technology |
The HDF5 profile for labeled point cloud data
1. Scope
This OGC Discussion Paper (DP) aims to investigate and summarize point cloud data formats (such as PLY, LAS, etc.) and how they can support labeled point clouds. Based on this survey, the DP demonstrates the ease of use and flexibility of the HDF5 format for labeled point clouds.
The DP covers the following scope:
A survey of how widely used point cloud data formats (as used in open datasets) support labeled point cloud data;
A practice of using the HDF5 format for labeled point clouds.
2. Conformance
This Discussion Paper defines an HDF5 profile for labeled point cloud data.
The document identifies a Core requirements class and a series of requirements belonging to that class.
3. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO: ISO 19101-1:2014, Geographic information — Reference model — Part 1: Fundamentals. International Organization for Standardization, Geneva (2014). https://www.iso.org/standard/59164.html
Aleksandar Jelenak, Ted Habermann, Gerd Heber: OGC 18-043r3, OGC Hierarchical Data Format Version 5 (HDF5®) Core Standard. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/18-043r3/18-043r3.html
4. Terms and definitions
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
4.1. feature
abstraction of real-world phenomena
[SOURCE: ISO 19101-1:2014]
4.2. labeled point cloud
set of points, each of which has a semantic label (or index) together with its coordinates
4.3. point cloud annotation
process of attaching a set of semantic information to point cloud data without any change to that data
5. Conventions
This section provides details and examples for any conventions used in the document. Examples of conventions are symbols, abbreviations, use of XML schema, or special notes regarding how to read the document.
5.1. Abbreviated terms
The following abbreviated terms are used in this discussion paper:
AIST | National Institute of Advanced Industrial Science and Technology |
ASPRS | American Society for Photogrammetry and Remote Sensing |
DP | Discussion Paper |
HDF5 | Hierarchical Data Format Version 5 |
HDF5LPC | HDF5 for the Labeled Point Cloud |
LIDAR | Light Detection And Ranging (or Laser Imaging, Detection, And Ranging) |
LPC | Labeled Point Cloud |
OGC | Open Geospatial Consortium |
PCAS | Point Cloud Annotation System |
PLY | Polygon File Format |
VLR | Variable Length Record |
3D | Three-dimensional |
5.2. Identifiers
The normative provisions in this document are denoted by the URI
http://www.opengis.net/spec/HDF5LPC/0.1
NOTE The ‘0.1’ version segment in the URI indicates that this URI is for a prototype. A future OGC Standard on HDF5 LPC, if approved, would use a ‘1.0’ version segment.
All requirements and conformance tests that appear in this document are denoted by partial URIs which are relative to this base.
6. Background
6.1. Labeled point cloud
Point clouds are unstructured data that express the shape of objects and spaces, like three-dimensional (3D) photos. In the geospatial domain, point cloud datasets are collected from 3D scanners such as LiDAR systems and used to generate structured 3D information representing the real world. Each point has multiple attributes, including x, y, and z coordinates; for example, timestamp, intensity, and color information are stored as basic attributes alongside the 3D coordinates. When point cloud data are acquired through a sensor, each point is unclassified as to what it represents, such as part of a car, a tree, a building, or a road; in other words, the data contain only shape information and no semantic information. The classification of point clouds is challenging due to the difficulty of inferring the underlying continuous surface from discrete, unstructured samples. Traditionally, each point is manually or semi-manually assigned a feature label, such as a wall, a ceiling, a floor, a door, a desk, or a chair in the case of indoor spaces. The labeled point clouds are then transformed into continuous surfaces representing objects or spaces using commercial software. This manual or semi-manual semantic classification is time-consuming and requires specialized knowledge of the software.
In recent years, Artificial Intelligence (AI), especially Deep Learning (DL) with neural networks, has been considered as a way to accelerate geographic feature extraction from satellite imagery or point cloud data and to build geospatial data infrastructures. The role of AI in geospatial applications is becoming crucial for automating geospatial information systems (GIS) with human-level cognition. In deep learning, the quantity and quality of the training set, in this case the labeled point cloud, determine the model's performance. However, creating point cloud training datasets incurs high costs for data collection and for annotating each point. Figure 1 shows an example of labeled point clouds representing a room in an indoor space. A more practical approach would be to reuse existing training data and modify the semantic labels according to the application scenario. If various types of labeled point cloud datasets supported interoperability between tasks such as object detection, semantic segmentation, and instance segmentation, the cost of implementing AI technology in GIS applications could be reduced.
This discussion paper reviews a few popular labeled point cloud (LPC) datasets and investigates the data formats used to distribute them as training datasets. Finally, this paper describes a best practice of using HDF5 to effectively store, share, and reuse LPCs, namely HDF5LPC.
Figure 1 — The example of the labeled point cloud (data from 2D-3D-S); the semantic information is represented as color
6.2. Open datasets of labeled point cloud
The creation of a labeled point cloud requires a great deal of time and knowledge. Despite the obvious importance of data sharing in the geospatial community, few datasets are available on the Internet. Three popular datasets released by academic organizations are described below:
Stanford 2D-3D-Semantics Dataset (2D-3D-S): The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D, and 3D domains, with instance-level semantic and geometric annotations. It includes 695,878,620 colored 3D points that were previously presented in the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS). In more detail, the dataset was collected in 6 large-scale indoor areas (consisting of 270 indoor scenes) that originate from 3 different buildings of mainly educational and office use. The annotations are instance-level, consistent across all modalities, and correspond to 13 object classes (structural elements: ceiling, floor, wall, beam, column, window, and door; movable elements: table, chair, sofa, bookcase, and board; and clutter for all other elements) and 11 scene categories (office, conference room, hallway, auditorium, open space, lobby, lounge, pantry, copy room, storage, and WC). The file format of these point cloud datasets is ASCII text (Clause 6.3.1), not a standard point cloud file format such as PLY (Clause 6.3.2) or LAS (Clause 6.3.3). The data consist of the coordinates (x, y, and z) and RGB color information (red, green, and blue) of each point in an ASCII plain text file. The file names are wall_1.txt, wall_2.txt, chair_1.txt, and so on, indicating one of the object classes in the scene and its instance label.
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes (ScanNet v2): ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. The annotations are instance-level, consistent across all modalities, and cover 20 object classes (floor, wall, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, desk, curtain, refrigerator, bathtub, shower curtain, toilet, sink, and otherfurniture). The class information used is a subset of the NYUv2 40-label set. The point cloud file format for this dataset is PLY (Clause 6.3.2). Each file contains the coordinates (x, y, and z) of each point, its RGB color information (red, green, and blue), and an index of the object’s class (label).
KITTI-360: A large-scale dataset with 3D&2D annotations (KITTI-360): KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations, corresponding to over 320k images and 100k laser scans over a driving distance of 73.7 km. The annotations are semantic and instance-level, consistent across all modalities, and correspond to 45 object classes in eight categories (void, flat, construction, object, nature, sky, human, and vehicle). The file format of this dataset’s point cloud data is PLY (Clause 6.3.2). Each file consists of the world coordinates of each point (longitude, latitude, and altitude), RGB color information (red, green, and blue), object class indices (semanticID and instanceID), and additional information (isVisible and confidence). Note that the confidence value indicates the confidence of the semantic label of a 3D point.
Each dataset provides similar information, but the file format and the method of constructing semantic information differ across datasets. This paper divides these patterns into two types, point cloud data with a single-semantic label and point cloud data with a multi-semantic label, as shown in Figure 2.
Figure 2 — Two types of the labeled point cloud in the open dataset
6.2.1. Point cloud data with single-semantic label
Case 1 in Figure 2 is a method in which all points of the point cloud data in one file share a single meaning. The semantic information is usually specified by the file name, which means there is no label information inside the point cloud data itself. All points in the file have the same semantic label, and the points are not separated at the instance level. However, if each file is organized per object instance, this method can provide semantic information at the instance level.
Since the semantic label of the point cloud data is determined by the file name, the desired data can be found without searching within the file, just as one would search a file system. In addition, when multiple instances are merged with their coordinate systems taken into account, the data can be converted to the form described in Clause 6.2.2, as the sketch below illustrates. However, because semantic information is not included inside the file, this method is less efficient than that of Clause 6.2.2: the number of disk accesses increases as the number of objects and semantic labels grows.
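As an illustration of Case 1, the following minimal Python sketch merges per-instance ASCII files into a single point array with per-point labels derived from the file names. The directory layout and the "x y z r g b" line format follow the 2D-3D-S style described above, but the path pattern and class names are assumptions for illustration only.

```python
import glob
import os

import numpy as np

# Minimal sketch (Case 1): merge per-instance ASCII files into one point array
# with per-point semantic labels derived from the file names.
# Assumes 2D-3D-S-style files such as wall_1.txt or chair_2.txt containing
# "x y z r g b" per line; the path pattern below is illustrative only.
points, labels = [], []
label_names = {}  # class name -> semantic label index

for path in sorted(glob.glob("Area_1/office_1/Annotations/*.txt")):
    class_name = os.path.basename(path).rsplit("_", 1)[0]   # e.g. "wall"
    index = label_names.setdefault(class_name, len(label_names))
    xyzrgb = np.loadtxt(path)                                # shape (n_i, 6)
    points.append(xyzrgb)
    labels.append(np.full(len(xyzrgb), index, dtype=np.int32))

points = np.concatenate(points)   # all points of the scene, shape (N, 6)
labels = np.concatenate(labels)   # per-point semantic label index, shape (N,)
```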
6.2.2. Point cloud data with multi-semantic label
In Case 2 of Figure 2, the point cloud data in one file carry different semantic information at the point level. This form is used to express a scene in which several objects (and features) exist, similar to the real world, rather than a single 3D object. The semantic information is usually managed in a separate file, and each semantic label is organized as an integer-type index. Each point in the file then carries an index of its semantic label together with its 3D coordinates (and additional sensing information, such as color). When a user looks for data with a desired semantic label, the entire point cloud inside the file must be searched, as the sketch below shows. Also, since the semantic information is managed in a separate file, exchanging or sharing the data is less convenient than working with a single file.
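The following minimal sketch illustrates Case 2: one point array plus a per-point integer label index, with the index-to-name mapping held separately. The arrays and the label table are placeholder data, not taken from any of the datasets above.

```python
import numpy as np

# Minimal sketch (Case 2): one point array plus a per-point label index;
# the index-to-name mapping is kept in a separate table (often a separate file).
label_table = {0: "ceiling", 1: "floor", 2: "wall", 3: "chair"}  # placeholder
points = np.random.rand(1000, 6)                                 # x y z r g b
labels = np.random.randint(0, len(label_table), size=1000)       # label indices

# Selecting every "chair" point requires scanning the whole label array.
chair_index = next(i for i, name in label_table.items() if name == "chair")
chair_points = points[labels == chair_index]
print(len(chair_points), "points labeled as", label_table[chair_index])
```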
6.3. Popular point cloud formats
This section introduces how semantic labels are represented in typical point cloud formats: ASCII text (Clause 6.3.1), PLY (Clause 6.3.2), and LAS (Clause 6.3.3). These formats have long been used for open point cloud datasets in GIS applications.
6.3.1. ASCII text
ASCII text is a data format that does not define a file structure but simply expresses a set of points as their 3D Cartesian coordinates (and color information). Since there is no file structure, there is no header describing it; meta-information about the data is provided in a separate file or on the website that distributes the labeled point cloud dataset. Because the data consist only of the minimum necessary information, the user must manage the required static information (e.g. the number of points) themselves. The 2D-3D-S dataset uses this method for storing labeled point data; the approach is usually used for simple data sharing.
6.3.2. PLY (Polygon File Format or the Stanford Triangle Format)
The PLY file format, developed by Greg Turk in 1994, describes a collection of polygons for storing graphical objects. The PLY format has a relatively simple structure and is easy to implement, yet it is general enough to be useful for a wide range of models; it is therefore widely used to store 3D data from 3D scanners. Various properties can be stored, including color and transparency, normal vectors, texture coordinates, and semantic labels. To store a semantic label, a new user-defined property must be declared with its name and type (a sketch follows this paragraph). This is the format used to store labeled point clouds in the ScanNet v2 and KITTI-360 datasets. A description of the semantic labels can be managed in another file, or it can be added to the PLY file as a comment in the header part. However, this only shows that such a description is possible; there are no rules or structures for it. To share labeled point clouds with others, a structure with a precise meaning is needed to express semantic label (and instance label) information.
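As a hedged illustration (not prescribed by the PLY specification or by the datasets above), the following sketch uses the third-party plyfile package to declare a user-defined "label" property and to record its meaning as a header comment; the property name, types, and label encoding are assumptions.

```python
import numpy as np
from plyfile import PlyData, PlyElement  # third-party 'plyfile' package

# Sketch: per-vertex user-defined "label" property plus a header comment
# describing the label codes. Property names, types, and codes are illustrative.
vertex = np.array(
    [(1.0, 2.0, 3.0, 200, 200, 200, 0),
     (1.1, 2.1, 3.1, 120,  80,  40, 5)],
    dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"),
           ("red", "u1"), ("green", "u1"), ("blue", "u1"),
           ("label", "u2")])

element = PlyElement.describe(vertex, "vertex")
PlyData([element], text=True,
        comments=["label 0 = ceiling, 5 = chair (example encoding)"],
        ).write("labeled_example.ply")
```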
6.3.3. LAS 1.4
The LAS 1.4 format is an open, binary file format designed for exchanging and archiving LIDAR (or other) point cloud data, specified by the American Society for Photogrammetry and Remote Sensing (ASPRS). The LAS specification has also been adopted as an OGC Community Standard. The specification includes various types of point data record formats with essential properties. For semantic information, users can use classification values (1 byte, unsigned char) from 64 to 255, because 0 to 63 are reserved by the specification (LAS 1.4, Point Data Record Formats 6-10). User-defined classification information can be added using Variable Length Records (VLRs), defined as “CLASSIFICATION LOOKUP” in the LAS 1.4 specification. The problem is that the maximum number of user-defined classes is limited (to 192), which may be enough for semantic labeling but may not be enough for instance-level labeling. A sketch of assigning user-defined classification codes follows.
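As a hedged illustration, the sketch below uses the third-party laspy package to write a LAS 1.4 file with Point Data Record Format 6 and to assign classification codes from the user-definable range (64-255). The coordinates and code meanings are assumptions, and the CLASSIFICATION LOOKUP VLR is only indicated in a comment.

```python
import numpy as np
import laspy  # third-party 'laspy' package

# Sketch: LAS 1.4, Point Data Record Format 6, with user-defined classes.
las = laspy.create(point_format=6, file_version="1.4")
las.x = np.array([1.0, 2.0, 3.0])
las.y = np.array([4.0, 5.0, 6.0])
las.z = np.array([7.0, 8.0, 9.0])

# Codes 0-63 are reserved by the LAS 1.4 specification; 64-255 are available
# for user-defined classes (here, 64 and 65 stand for illustrative labels).
las.classification = np.array([64, 64, 65], dtype=np.uint8)

# A "CLASSIFICATION LOOKUP" VLR could additionally map 64/65 to class names,
# as defined in the LAS 1.4 specification (not shown in this sketch).
las.write("labeled_example.las")
```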
6.4. Limitations
With a brief view of the survey, some limitations are found for sharing and reusing labeled point cloud data as follows:
The point cloud data formats currently in wide use cannot express semantic information by themselves, or they limit how, and how many, semantic labels can be expressed.
The open labeled point cloud datasets differ in the shared data format and in how semantic information is represented and managed. Therefore, users must know how to match the point elements in the shared data file with the semantic information in a separate file in order to use the labeled point cloud data.
In academic work on deep learning or machine learning methods that use labeled point cloud data, the input and output data formats differ because there is no standard format.
Since labeled point clouds lack a standard, defined format, an open format and data model that allows for easy sharing of labeled point cloud training datasets would seem to be necessary to address the above limitations. An open, standard data format would allow deep learning developers to use labeled point clouds efficiently without additional understanding of the data or tools. Therefore, this paper recommends a single format for point clouds that includes semantic information, to reduce information loss during distribution, as illustrated in Figure 3.
Figure 3 — Problem of the current (labeled) point cloud format
7. HDF5 for labeled point cloud (HDF5LPC)
7.1. Hierarchical Data Format Version 5 (HDF5)
HDF5 is a data and storage model designed to store and organize large amounts of data with fast I/O processing. In addition, the HDF5 core standard document has been approved as an OGC international standard. The HDF5 data model is designed to support a variety of data types (including data type customization) and to provide flexible and efficient processing for large volumes of complex data, especially multi-dimensional numerical arrays. It is suitable for scientific and engineering geospatial applications that describe phenomena varying in time and space. Various advantages of using HDF5 are described in the Example below. HDF5 is therefore well suited to labeled point cloud data, a kind of big data, because it can combine geometric and semantic information in one data file and provides various tools and libraries.
Example
What is HDF5®? from the HDF5 official site:
Heterogeneous data: HDF® supports n-dimensional datasets and each element in the dataset may itself be a complex object.
Easy sharing: HDF® is portable, with no vendor lock-in, and is a self-describing file format, meaning everything, all data and metadata, can be passed along in one file.
Cross Platform: HDF® is a software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces. HDF has a large ecosystem with 700 GitHub projects.
Fast I/O: HDF® is high-performance I/O with a rich set of integrated performance features that allow for access time and storage space optimizations.
Big data: There is no limit on the number or size of data objects in the collection, giving great flexibility for big data.
Keep metadata with data: HDF5® allows you to keep the metadata with the data, streamlining data lifecycles and pipelines.
The HDF5 data model consists of six entities: Group, Dataset, Link, Datatype, Dataspace, and Attribute, as shown in Figure 4. HDF5 simplifies the file structure around two main entities:
Dataset, which comprises a multidimensional array, an HDF5 Datatype describing the array’s data elements, and an HDF5 Dataspace specifying the array’s rank and extent.
Group, which is a container of zero or more HDF5 Links and has a functionality akin to a file system directory. An HDF5 Object (Group, Dataset, or committed Datatype) linked into a group is said to be a member of that group.
Figure 4 — Entities of the HDF5 data model (OGC 18-043r3, HDF5)
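The following minimal h5py sketch illustrates these entities in practice; the group, dataset, and attribute names are illustrative only and are not part of the HDF5 standard.

```python
import numpy as np
import h5py  # third-party 'h5py' package

# Minimal sketch of the main HDF5 entities used in this paper.
with h5py.File("example.h5", "w") as f:
    group = f.create_group("scans")                   # Group: container of links
    dataset = group.create_dataset(                   # Dataset: array + Datatype
        "scan_001", data=np.random.rand(100, 3))      #          + Dataspace
    dataset.attrs["columns"] = "x, y, z"              # Attribute: named metadata
```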
7.2. Purpose
In the case of a labeled point cloud with multiple semantic labels, only the label index is appended to the point cloud data, thereby reducing the size of duplicated semantic information, while the semantic label information itself (e.g. name and class) is managed in a separate file. This is because the existing point cloud formats do not clearly support such information, as described in Clause 6.4. The following design principles allow HDF5 to be adapted to manage all information of a labeled point cloud within a single file:
Based on HDF5 introduced in the previous section, design a structure that defines only the minimum amount of information required.
Design the structure of the point cloud information in an easily expandable form to support existing point cloud formats.
Design a structure that minimizes duplicate information.
7.3. Structure
HDF5LPC utilizes three HDF5 Groups for the storage and management of labeled point cloud data as follows:
point_data stores the point cloud data, such as the 3D coordinates (x, y, and z), color (red, green, and blue), normal vector (nx, ny, and nz), and so on;
label_index stores the semantic (and instance) label index of each point in the corresponding dataset of the point_data Group;
label_info stores the semantic (and instance) label information, such as the name or class of each label.
Requirements class 1 | |
---|---|
Target type | HDF5 container |
Dependency | http://www.opengis.net/spec/HDF5/data-model/1.0/req/core |
Requirement 1 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/point-data | |
The contents of an HDF5 Dataset in the point_data Group SHALL contain at least n-dimensional (n > 1) coordinates. |
Requirement 2 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/label-index | |
The contents of an HDF5 Dataset in the label_index Group SHALL consist of two label indices from the label_info Group (“semantic_label” and “instance_label” Dataset, respectively). |
Requirement 3 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/label-index-name | |
The name of an HDF5 Dataset in the label_index Group SHALL match (i.e. be the same as) the name of an HDF5 Dataset in the point_data Group. |
Requirement 4 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/label-index-size | |
The size of an HDF5 Dataset in the label_index Group SHALL be the same as that of the matched Dataset in the point_data Group. |
Requirement 5 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/label-info | |
The label_info Group SHALL have two Dataset entities named “semantic_label” and “instance_label” of “String” type. |
Requirement 6 | |
---|---|
http://www.opengis.net/spec/HDF5LPC/0.1/req/core/type | |
The data element in point_data and label_index HDF5 groups SHALL consist of a “Compound” type. |
A sample HDF5LPC file is shown in Figure 5. It has three HDF5 Groups: point_data, label_index, and label_info.
The point_data group has an HDF5 dataset that represents a point cloud dataset; in this example, the x, y, and z coordinates, the red, green, and blue colors, and the normal vector components (nx, ny, and nz). The point_data group can contain multiple HDF5 datasets; in that case, each dataset has a unique name, and its semantic label index dataset exists in the label_index group under the same name.
The label_index group has an HDF5 dataset that represents the label indices indicating the semantic and instance information described in the label_info group. Each label index is matched to a point in the dataset of the same name in the point_data group; the matching follows the order of the elements in the dataset, shown as the ‘Point Index’ column in the figure.
The label_info group has two HDF5 datasets. The semantic_label dataset represents the semantic label information, such as the semantic label’s name (or class). The instance_label dataset is similar, but each entry also carries a semantic label index from the semantic_label dataset, representing the semantic class of the instance. In Figure 5, the semantic class of the “CeilingObj1” instance can be verified to be “Ceiling”.
The data elements in the point_data and label_index HDF5 Groups SHALL consist of a Compound type, with field names (headings) and values as tuples. The field names are well defined to represent the semantics of the data. In addition, a description of each field can be added as an HDF5 Attribute whose name is the field name.
Figure 5 — Example of the structure of the HDF5LPC
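The following minimal h5py sketch shows how a file following this structure could be created. The dataset name "Room1", the fields beyond the coordinates, and the label values are illustrative assumptions rather than normative content; the Compound types and the matching names and sizes follow the requirements above.

```python
import numpy as np
import h5py  # third-party 'h5py' package

# Sketch of an HDF5LPC file; "Room1" and the label values are illustrative.
point_type = np.dtype([("x", "f4"), ("y", "f4"), ("z", "f4"),
                       ("red", "u1"), ("green", "u1"), ("blue", "u1")])
index_type = np.dtype([("semantic_label", "i4"), ("instance_label", "i4")])

points = np.array([(0.0, 0.0, 3.0, 200, 200, 200),
                   (1.0, 1.0, 0.0, 120,  80,  40)], dtype=point_type)
indices = np.array([(0, 0), (1, 1)], dtype=index_type)  # one entry per point

with h5py.File("hdf5lpc_example.h5", "w") as f:
    # point_data: Compound-type Dataset containing at least the coordinates.
    f.create_group("point_data").create_dataset("Room1", data=points)

    # label_index: same Dataset name and size as the matching point_data Dataset.
    f.create_group("label_index").create_dataset("Room1", data=indices)

    # label_info: String-type Datasets describing the label indices.
    info = f.create_group("label_info")
    str_type = h5py.string_dtype(encoding="utf-8")
    info.create_dataset("semantic_label",
                        data=["Ceiling", "Floor"], dtype=str_type)
    info.create_dataset("instance_label",
                        data=["CeilingObj1", "FloorObj1"], dtype=str_type)
```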
7.4. Summary
Designing HDF5LPC as a shared data format for labeled point clouds has the following advantages and disadvantages.
Advantages of HDF5LPC
Combining point cloud data and semantic information into a single HDF5 file prevents information from being lost and facilitates data sharing and exchange.
By defining a structure to store semantic information, the contents of the data can be easily understood without additional description.
Since the HDF5 structure is used, the data can be loaded and processed at high speed using open libraries in various programming languages.
Supports semantic labels and instance labels to compose richer semantic information.
Disadvantages of HDF5LPC
To check the contents of an HDF5 file, the file needs to be visualized with a dedicated viewer (e.g. HDFView) or inspected programmatically using a library.
At this time, there are no open tools that support HDF5LPC.
8. Use case of HDF5LPC
This chapter introduces a new Point Cloud Annotation System (PCAS), developed at AIST, which produces HDF5LPC data.
8.1. Point Cloud Annotation System (PCAS)
PCAS is a web-based application for adding semantic information to point cloud data. Once point cloud data are registered with a local server, semantic information is automatically generated using a deep learning model for the semantic segmentation task. Here, PointNet, a deep learning model trained to classify 13 features on the 2D-3D-S dataset, is applied to pre-classify the point clouds. After this pre-classification, a user can edit the label of each point or merge points with the same label. Finally, the system produces an HDF5LPC file for download, which can be read back as in the sketch below.
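As a hedged illustration of consuming such a file, the following sketch reads an HDF5LPC file with h5py and resolves the per-point indices to label names, assuming the structure of Clause 7.3; the file name and the dataset name "Room1" are assumptions.

```python
import h5py  # third-party 'h5py' package

# Sketch: read an HDF5LPC file and resolve label indices to names.
# The file name and the dataset name "Room1" are illustrative only.
with h5py.File("pcas_export.h5", "r") as f:
    points = f["point_data"]["Room1"][:]        # compound array of points
    indices = f["label_index"]["Room1"][:]      # per-point label indices
    semantic_names = [s.decode() for s in f["label_info"]["semantic_label"][:]]
    instance_names = [s.decode() for s in f["label_info"]["instance_label"][:]]

# Example: the semantic class name of the first point.
first_class = semantic_names[indices["semantic_label"][0]]
print(first_class)
```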
Figure 6 shows the main viewer when PCAS is started.
Figure 6 — The main UI of PCAS
Figure 7 shows an example of the classification of point clouds with 13 classes after data registration. The result of each category is represented by a different color.