Published

OGC Engineering Report

OGC Testbed 17: Geo Data Cube API Engineering Report
Jérôme Jacovella-St-Louis Editor

Document number: 21-027
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Permission is hereby granted by the Open Geospatial Consortium, (“Licensor”), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.



I.  Abstract

This OGC Testbed 17 Engineering Report (ER) documents the results and recommendations of the Geo Data Cube API task. The ER defines a draft specification for an interoperable Geo Data Cube (GDC) API leveraging OGC API building blocks, details implementation of the draft API, and explores various aspects including data retrieval and discovery, cloud computing and Machine Learning. Implementations of the draft GDC API are demonstrated with use cases including the integration of terrestrial and marine elevation data and forestry information for Canadian wetlands.

II.  Executive Summary

II.A.  Key findings

An architecture for a Geo Data Cube API framework is proposed. The framework is built using approved OGC standards and draft OGC API specifications. OGC API — Common provides a consistent way of presenting an API landing page, conformance declaration and API description in Part 1: Core (OGC 19-072), while Part 2: Geospatial data (OGC 20-024) defines collections of spatiotemporal data, which correspond to the GDC API data cube resources. Multi-resolution data can be stored, indexed, and represented as such Geo Data Cube resources. These resources can be transformed through operations such as resampling, subsetting, aggregation, filtering, band arithmetic calculations, or processing algorithms, including complex workflows. They can be queried through the multiple data access mechanisms defined in OGC API building blocks, with outputs returned in suitable negotiated formats.

The following OGC standards and specifications were considered and/or used in defining the GDC API.

  • OGC API — Coverages is considered a key GDC capability with its subsetting, range (fields) subsetting, scaling and tiles conformance classes, as well as proposed extensions for supporting filtering expressions, band arithmetic calculations and varying resolution.

  • Cloud Optimized GeoTIFF (COG) was considered as both a backend data store and an efficient distribution mechanism.

  • OGC API — Tiles and OGC API — Features are also considered as data access mechanisms.

  • OGC API — Environmental Data Retrieval (EDR) offers queries for typical meteorological use cases such as data along a trajectory or within a corridor.

  • OGC API — Discrete Global Grid Systems is suggested as an important component to integrate within a GDC API framework.

  • Complex analytics can be achieved using OGC API — Processes — Part 1: Core, while simpler analytics capabilities should be conveniently integrated directly within OGC API — Coverages and EDR data requests.

  • The OGC API — Processes — Part 2: Deploy, Replace, Update draft specification is highlighted as a way to deploy new complex algorithms close to data.

  • The OGC API — Processes — Part 3: Workflows & Chaining draft specification is highlighted as a way to present the output of a process or workflow as a data cube, while supporting integration of distributed data cubes and analytics capabilities.

  • The OGC API — Maps and OGC API — Tiles specifications were identified as ways to directly integrate server-side visualization capabilities within a GDC framework.

  • The role of OGC API — Records, STAC and OGC API — Common for data discovery was explored.
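As an informal illustration of the Coverages capabilities listed above, the following Python sketch assembles a request URL combining domain subsetting, range (fields) subsetting and scaling. The server address, collection identifier and field names are hypothetical. The resource path and the parameter names (`subset`, `properties`, `scale-factor`) reflect one reading of the draft Coverages specification and may differ between draft versions.

```python
from urllib.parse import urlencode


def coverage_request_url(base_url, collection_id, subset=None,
                         fields=None, scale_factor=None):
    """Build a coverage request URL (parameter names are assumptions)."""
    params = {}
    if subset:
        parts = []
        for axis, bounds in subset.items():
            if isinstance(bounds, tuple):
                # A (low, high) pair requests a trim along the axis.
                low, high = bounds
                parts.append(f"{axis}({low}:{high})")
            else:
                # A single value requests a slice along the axis.
                parts.append(f"{axis}({bounds})")
        params["subset"] = ",".join(parts)
    if fields:
        params["properties"] = ",".join(fields)
    if scale_factor is not None:
        params["scale-factor"] = str(scale_factor)
    url = f"{base_url.rstrip('/')}/collections/{collection_id}/coverage"
    # Keep the subset syntax readable instead of percent-encoding it.
    query = urlencode(params, safe="():,")
    return f"{url}?{query}" if query else url


# Hypothetical server, collection and band names.
url = coverage_request_url(
    "https://example.com/ogcapi",
    "sentinel2",
    subset={"Lat": (40, 50), "Lon": (-10, 10)},
    fields=["B04", "B08"],
    scale_factor=2,
)
```

Keeping all three operations as query parameters of a single resource means a client can combine them in one request rather than chaining separate calls.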

Some overlap between the OGC API — EDR Standard and the draft OGC API — Coverages specification was identified, particularly in how a data cube is described and in the EDR cube queries. Some current incompatibilities between the APIs specified in OGC API — EDR and OGC API — Common were also identified.

A Scenes API is proposed as a way to provide a unified data cube while still providing direct access to individual scenes making it up as well as to their metadata. The Scenes API is also proposed as a mechanism to manage multiple scenes making up a data cube. This approach is based on the work from the Testbed 15 — Images API.

The new analytics capabilities defined by Testbed 16 — Data Access & Processing API (DAPA) are proposed as extensions for the Coverages and EDR APIs rather than as a new separate API. The definition of well-known processes supporting convenient processing languages is suggested. The need for identifying data cubes for use as input to particular processes was identified.

II.B.  Results

The initiative participants developed four servers (provided by Wuhan University, 52°North, MEEO, and Ecere) and three clients (provided by Solenix, Ethar and Ecere) implementing selected Geo Data Cube API capabilities based on OGC API standards and specifications:

  • OGC API — Common — Part 1: Core;

  • OGC API — Common — Part 2: Geospatial data;

  • OGC API — Coverages — Part 1: Core, supporting subsetting, range subsetting (i.e. field selection) and scaling;

  • OGC API — Processes — Part 1: Core, supporting synchronous and asynchronous execution;

  • OGC API — Processes — Part 3: Workflows and Chaining, supporting collection input and collection output.
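The difference between synchronous and asynchronous execution mentioned in the list above can be sketched as follows. The sketch only constructs the HTTP request and does not send it. It assumes the execution endpoint and the `Prefer: respond-async` header of the Processes Part 1 standard; the server address, process identifier and input names are hypothetical.

```python
import json
from urllib.request import Request


def build_execute_request(base_url, process_id, inputs, asynchronous=False):
    """Build (but do not send) a process execution request."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # Ask the server to create a job and return immediately.
        headers["Prefer"] = "respond-async"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical process and inputs.
sync_request = build_execute_request(
    "https://example.com/ogcapi", "ndvi", {"collection": "sentinel2"})
async_request = build_execute_request(
    "https://example.com/ogcapi", "ndvi", {"collection": "sentinel2"},
    asynchronous=True)
```

In the asynchronous case, a client would then typically poll the job status resource indicated by the server until the results are available.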

52°North demonstrated the use of the GDC API in the context of machine learning for land cover prediction from Earth Observation imagery. Ethar additionally demonstrated the use of the GDC API in the context of Augmented Reality.

II.C.  Business value

This Engineering Report (ER) describes the results of discussions and experiments evaluating OGC API standards and draft specifications. Further, other data cube implementations developed outside of the OGC were evaluated in the context of a Geo Data Cube API for data access, analytics and discovery. The ER makes recommendations for the OGC Standards Program to improve interoperability of data cubes. The ER also highlights the interoperability drawbacks of defining different specifications for the same functionality within the same family of OGC API standards. Capabilities for accessing and performing analytics on data cubes that are missing from OGC standards are also identified; these should be standardized in a uniform manner by extending the current approved standards and draft specifications. This should in turn facilitate the rapid implementation of interoperable spatiotemporal data cube capabilities within various technologies and spur further innovation.

II.D.  Requirements addressed

In the Testbed-17 GDC task, the participants addressed requirements for defining an OGC API for Geo Data Cubes, leveraging existing building blocks, which would support:

  • access and processing in the cloud,

  • data discovery and querying information of diverse collections of data,

  • interoperability with STAC, registries & catalogs,

  • interoperability of data formats and access methods,

  • interoperability across different cloud providers,

  • interoperable workflows,

  • machine learning for detection from Earth Observation imagery and deriving insights from spatiotemporal data, and

  • interoperability between different Geo Data Cubes and APIs.

II.E.  Motivation for defining a Geo Data Cube API

The motivation for defining a GDC API was to provide efficient access to data cubes, to perform analytics close to the data (ranging from simple aggregation and band arithmetic to more complex algorithms), to discover data and analytics capabilities, and potentially to integrate visualization and analytics management capabilities. Such an API will enable the use of these capabilities in client applications, allowing users to derive useful insights from very large collections of data, in particular multi-spectral Earth Observation imagery, which is an important source of information in the context of solving global challenges such as climate change.

II.F.  Recommendations for future work

The GDC task demonstrated the value of the OGC API family of standards, including those already approved (Features Part 1: Core & Part 2: CRS by reference, EDR, and Processes), and those still in draft stage (e.g. Common Part 1: Core & Part 2: Geospatial Data, Features — Part 3: Filtering, CQL2, Tiles, Coverages, Maps, DGGS, Records/STAC, Processes — Part 2: Deploy, Replace, Update & Part 3 — Workflows & Chaining), and recommends prioritizing their completion.

The importance of completing OGC API — Common — Part 1 & Part 2 as a framework for integrating capabilities is highlighted in particular. For example, resolving the incompatibilities already identified between the EDR and OGC API — Common — Part 2 specifications would make it possible to offer the same data cube using the EDR API plus additional access mechanisms.

The role of the draft OGC API — Coverages specification as a baseline for describing data cubes and providing a simple and convenient data access mechanism should be clarified. This includes support for subsetting domain and range (fields / bands), and resampling. Further, this also includes support for accessing coverage data as tiles, following a fixed pyramidal multi-dimensional tiling scheme. Additional capabilities for filtering based on CQL expressions should be considered as an extension for coverages.

If possible, an attempt should be made to re-align and harmonize the EDR specification’s data description mechanism and its cube query with OGC API — Coverages. The analytics capabilities defined in the Testbed 16 — DAPA specification should be integrated as extensions directly within OGC API — Coverages, and possibly OGC API — EDR as well, rather than defined in a new specification. OGC should ensure that separate OGC API standards do not re-define the same capabilities with only superficial variations, which could reduce interoperability. Such duplication also introduces a significant burden on implementers of clients & services in terms of additional standards to implement.

A Scenes API should be defined that supports a unified data cube while also providing direct access to the data and metadata of individual scenes, thereby enabling integrated discovery as well as scene management capabilities.

Defining well-known processes expecting specific inputs — including a particular convenient processing language — to facilitate flexible coverage processing should be considered.

The need for Executable Test Suites for the different OGC API standards was highlighted.

The value of defining a set of standardized OGC API building blocks as a GDC meta-standard should be considered.

Further defining and leveraging the draft OGC API — Processes — Part 3: Workflows and Chaining specification would support presenting the results of analytics capabilities as a virtual data cube, and would facilitate the integration of analytics capabilities in visualization clients as well as the integration of remote data cubes with processing algorithms. This should be an important priority.

III.  Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, OGC document, API, OpenAPI, OGC API, Coverage, Data Cube


IV.  Preface

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

V.  Security considerations

No security considerations have been made for this document.

VI.  Submitting Organizations

The following organizations submitted this Document to the Open Geospatial Consortium (OGC):

VII.  Submitters

All questions regarding this document should be directed to the editor or the contributors:

Name                            Organization                Role
Jérôme Jacovella-St-Louis       Ecere                       Editor
Tony Hodgson                    Ethar, Inc.                 Contributor
Karri Ojala                     Solenix GmbH                Contributor
Alexander Lais                  Solenix GmbH                Contributor
Peng Yue                        Wuhan University            Contributor
Martin Pontius                  52°North GmbH               Contributor
Eike Hinderk Jürrens            52°North GmbH               Contributor
Sufian Zaabalawi                52°North GmbH               Contributor
Joshua Lieberman                OGC                         Contributor
Patrick Dion                    Ecere                       Contributor
Diego Caraffini                 Ecere                       Contributor
Fabio Govoni                    MEEO                        Contributor
Fan Gao                         Wuhan University            Contributor
Shuaifeng Zhao                  Wuhan University            Contributor
Colin Steinmann                 Ethar, Inc., Open AR Cloud  Contributor
Nazih Fino                      Ethar, Inc., Open AR Cloud  Contributor
Panagiotis (Peter) A. Vretanos  CubeWerx Inc.               Contributor

1.  Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

Clemens Portele, Panagiotis (Peter) A. Vretanos, Charles Heazel: OGC 17-069r3, OGC API — Features — Part 1: Core. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/17-069r3/17-069r3.html

Clemens Portele, Panagiotis (Peter) A. Vretanos: OGC 18-058, OGC API — Features — Part 2: Coordinate Reference Systems by Reference. Open Geospatial Consortium (2020). https://docs.ogc.org/is/18-058/18-058.html

Peter Baumann: OGC 17-089r1, OGC Web Coverage Service (WCS) 2.1 Interface Standard — Core. Open Geospatial Consortium (2018). http://docs.opengeospatial.org/is/17-089r1/17-089r1.html

Matthias Mueller: OGC 14-065r2, OGC® WPS 2.0.2 Interface Standard: Corrigendum 2. Open Geospatial Consortium (2018). http://docs.opengeospatial.org/is/14-065/14-065r2.html

OGC API — Environmental Data Retrieval Standard, https://www.opengis.net/doc/IS/ogcapi-edr-1/1.0

OGC API — Processes — Part 1, https://docs.ogc.org/is/18-062r2/18-062r2.html

OGC: OGC 07-011, Topic 6 — Schema for coverage geometry and functions. Open Geospatial Consortium (2007). https://portal.ogc.org/files/?artifact_id=19820

Peter Baumann, Eric Hirschorn, Joan Masó: OGC 09-146r6, OGC Coverage Implementation Schema. Open Geospatial Consortium (2017). http://docs.opengeospatial.org/is/09-146r6/09-146r6.html

Joan Masó: OGC 17-083r2, OGC Two Dimensional Tile Matrix Set. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/17-083r2/17-083r2.html

Matthew Purss: OGC 15-104r5, Topic 21 — Discrete Global Grid Systems Abstract Specification. Open Geospatial Consortium (2017). http://docs.opengeospatial.org/as/15-104r5/15-104r5.html

2.  Terms, definitions and abbreviated terms

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this document, the following additional terms and definitions apply.

2.1.  Terms and definitions

2.1.1. cell

unit of a coverage’s domain set (potentially spanning multiple direct positions), of a fixed resolution in the case of gridded coverages, for which a specific set of range values (e.g. a pixel in an image, or a set of measurements) is returned

2.1.2. collection

(in the context of OGC API specifications) resource consisting of geospatial data that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.

(SOURCE: https://github.com/opengeospatial/ogcapi-common/issues/140#issuecomment-642239475)

(in a general computer science context) grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion

(SOURCE: https://en.wikipedia.org/wiki/Collection_(abstract_data_type) )

2.1.3. coordinate reference system

coordinate system that is related to the real world by a datum

(SOURCE: ISO 19111:2019 Geographic information — Referencing by coordinates)

2.1.5. coverage

feature that acts as a function to return values from its range for any direct position within its spatio-temporal domain
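As an informal illustration of this definition, a gridded coverage can be sketched as a function that maps a direct position to the range values of the cell containing it. The class and method names (`GridCoverage`, `evaluate`) are hypothetical and are not drawn from any OGC specification. The sketch assumes a regular 2D grid whose origin is its upper-left corner.

```python
import math


class GridCoverage:
    """Minimal 2D gridded coverage: a function from position to range values."""

    def __init__(self, origin_x, origin_y, cell_size, rows):
        # The origin is the upper-left corner; rows[0] is the top row.
        self.origin_x = origin_x
        self.origin_y = origin_y
        self.cell_size = cell_size
        self.rows = rows

    def evaluate(self, x, y):
        """Return the range values of the cell containing (x, y).

        Returns None when the position lies outside the domain.
        """
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        if 0 <= row < len(self.rows) and 0 <= col < len(self.rows[row]):
            return self.rows[row][col]
        return None


# Each cell holds one record of range values (a single "elevation" field).
coverage = GridCoverage(
    origin_x=0.0, origin_y=2.0, cell_size=1.0,
    rows=[[{"elevation": 10.0}, {"elevation": 12.0}],
          [{"elevation": 11.0}, {"elevation": 15.0}]],
)
inside = coverage.evaluate(1.5, 0.5)
outside = coverage.evaluate(5.0, 5.0)
```

Every direct position inside the same cell returns the same range values, which matches the definition of a cell given above.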

2.1.6. data cube

multi-dimensional data store

Multi-dimensional (n-D) array of values

(SOURCE: OGC 18-095r7)

Note 1 to entry: The term is also sometimes used to refer to a service or platform providing access to such data cube, or to a federation of such services or platforms.

Note 2 to entry: Even though it is called a ‘cube,’ it can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The dimensions may be coordinates or enumerations, e.g., categories.
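The notes above can be made concrete with a minimal sketch of an n-D array whose dimensions are either coordinates or enumerations. The `DataCube` class and its methods are hypothetical, and the dictionary-based storage is chosen for brevity rather than efficiency.

```python
from itertools import product


class DataCube:
    """n-D array of values addressed by labelled dimensions."""

    def __init__(self, dimensions, fill=None):
        # dimensions: ordered mapping of dimension name -> list of labels.
        self.dimensions = dict(dimensions)
        # One value per combination of labels, so the cube is dense.
        self.values = {
            key: fill for key in product(*self.dimensions.values())
        }

    @property
    def shape(self):
        return tuple(len(labels) for labels in self.dimensions.values())

    def put(self, value, **position):
        self.values[self._key(position)] = value

    def get(self, **position):
        return self.values[self._key(position)]

    def _key(self, position):
        # Order the labels to match the declared dimension order.
        return tuple(position[name] for name in self.dimensions)


cube = DataCube({
    "time": ["2021-06", "2021-07"],   # coordinate dimension
    "lat": [45.0, 46.0],              # coordinate dimension
    "lon": [-75.0, -74.0],            # coordinate dimension
    "band": ["red", "nir"],           # enumerated dimension
}, fill=0.0)
cube.put(0.42, time="2021-07", lat=45.0, lon=-74.0, band="nir")
```

Declaring a single dimension gives a 1-dimensional cube with the same interface.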

2.1.7. dataset

A dataset is a collection of data, published or curated by a single agent. Data comes in many forms including numbers, words, pixels, imagery, sound and other multi-media, and potentially other types, any of which might be collected into a dataset.

(SOURCE: W3C Data Catalog Vocabulary (DCAT) — Version 2, 2020)

Note 1 to entry: There is an important distinction between a dataset as an abstract idea and a distribution as a manifestation of the dataset.

2.1.8. data store

A data store is a repository for persistently storing and managing collections of data which include not just repositories like databases, but also simpler store types such as simple files, metadata, models, etc.

(SOURCE: https://www.information-management.com/glossary/d.html:2020)

2.1.9. direct position

position described by a single set of coordinates within a coordinate reference system

(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)

2.1.10. domain

well-defined set [ISO/TS 19103]

(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)

Note 1 to entry: Domains are used to define the domain and range of operators and functions.

2.1.11. elevation

synonym for “height”

(SOURCE: Clause 4.16 of ISO/TS 19159-2:2016, https://www.iso.org/obp/ui/#iso:std:iso:ts:19159:-2:ed-1:v1:en)

2.1.12. geo data cube

a data cube for which some dimensions are geospatial (e.g. latitude and longitude, or projected easting and northing; elevation above the WGS84 ellipsoid)

A (geo) data cube is a discretized model of the earth that offers estimated values of certain variables for each partition of the Earth’s surface called a cell. A data cube instance may provide data for the whole Earth or a subset thereof. Ideally, a data cube is dense (i.e., does not include empty cells) with regular cell distance for its spatial and temporal dimensions. A data cube describes its basic structure, i.e., its spatial and temporal characteristics and its supported variables (also known as ‘properties’), as metadata. It is further defined by a set of functions. These functions describe the available discovery, access, view, analytical, and processing methods that are supported to interact with the data cube.

(SOURCE: OGC 21-067)

Note 1 to entry: From a functionality perspective, it can be considered a multi-dimensional field including spatial dimensions, and often temporal dimensions as well (much like a coverage).

Note 2 to entry: As documented in OGC 21-067, this definition was proposed as an outcome of a Workshop and is thus still the subject of discussion.

2.1.13. height

Distance of a point from a chosen reference surface measured upward along a line perpendicular to that surface.

(SOURCE: ISO 19111:2019 Geographic information — Referencing by Coordinates)

Note 1 to entry: A height below the reference surface will have a negative value, which would embrace both gravity-related heights and ellipsoidal heights.

2.1.14. job

instance of a process execution

2.1.15. metadata

information about a resource.

(SOURCE: ISO 19115-1:2014)

Note 1 to entry: The US National System for Geospatial Intelligence (NSG) Metadata Foundation (NMF) Version 3.0 defines metadata as information that captures the characteristics of a resource to represent the ‘who’, ‘what’, ‘when’, ‘where’, ‘why’, and ‘how’ of that resource.

2.1.16. platform

computer hardware, software and/or network services providing a set of defined capabilities

2.1.17. process

series of computing operations to be executed, which may produce one or more outputs (and/or result in some other side effects), and may take one or more inputs.

2.1.18. range

(coverage) set of feature attribute values associated by a function with the elements of the domain of a coverage

(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)

2.1.19. resource

identifiable asset or means that fulfills a requirement

(SOURCE: ISO 19115-1:2014 Geographic information — Metadata — Part 1: Fundamentals)

Note 1 to entry: A web resource, or simply resource, is any identifiable thing, whether digital, physical, or abstract.

2.1.20. slice

subset of a coverage for a single coordinate along a dimension axis, for which the resulting coverage is reduced by one dimension

2.1.21. subsetting

operation whose result is a subset of the original set (e.g. trim or slice operations on a coverage)

2.1.22. tile

geometric shape with known properties that may or may not be the result of a tiling (tessellation) process. A tile consists of a single connected “piece” without “holes” or “lines” (topological disc).

(SOURCE: OGC 19-014r1: Core Tiling Conceptual and Logical Models for 2D Euclidean Space)

Note 1 to entry: “tile” is NOT a packaged blob of data to download in a chunky streaming optimization scheme!

2.1.23. tiling

in mathematics, a tiling (tessellation) is a collection of subsets of the space being tiled, i.e. tiles that cover the space without gaps or overlaps.

(SOURCE: OGC 19-014r1: Core Tiling Conceptual and Logical Models for 2D Euclidean Space)

2.1.24. trim

subset of a coverage between lower and upper bound coordinates along a dimension axis which does not reduce the dimensionality of the resulting coverage
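The difference between the trim and slice operations can be sketched on a small 3D array stored as nested lists. The helper names are hypothetical, and bounds are expressed as array indices rather than coordinates to keep the example short.

```python
def trim(array, axis, low, high):
    """Keep indices low..high (inclusive) along axis; same dimensionality."""
    if axis == 0:
        return array[low:high + 1]
    return [trim(sub, axis - 1, low, high) for sub in array]


def slice_at(array, axis, index):
    """Select one index along axis; the result has one dimension fewer."""
    if axis == 0:
        return array[index]
    return [slice_at(sub, axis - 1, index) for sub in array]


def ndim(array):
    """Number of nesting levels of a regular, non-empty nested list."""
    depth = 0
    while isinstance(array, list):
        depth += 1
        array = array[0]
    return depth


# A 2 x 3 x 4 array indexed as [time][row][column].
cube = [[[t * 100 + r * 10 + c for c in range(4)]
         for r in range(3)]
        for t in range(2)]

trimmed = trim(cube, axis=2, low=1, high=2)   # still 3D: 2 x 3 x 2
sliced = slice_at(cube, axis=0, index=1)      # now 2D: 3 x 4
```

Both operations are forms of subsetting: each result contains only values that were present in the original array.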

2.1.25. workflow

sequence of processes (whether local or remote) to be executed, possibly with pre-defined and/or external input values, in which the output(s) of a process may serve as input(s) to subsequent processes of the same workflow, and each subsequent process depends on the completion of the operations generating its inputs.

Note 1 to entry: The workflow as a whole may itself take inputs and generate outputs, and may also be encapsulated as a single process.

Note 2 to entry: A workflow (or part of it) may be executed in a distributed manner (e.g. for specific area and/or resolution of interest) if some or all processes involved can be computed in a localized manner.
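The dependency aspect of this definition can be sketched as executing processes in topological order, so that each process runs only after the processes generating its inputs have completed. The function, process names and input names below are hypothetical. The sketch runs everything locally and sequentially, and does not reflect the request syntax of any OGC API specification.

```python
from graphlib import TopologicalSorter


def run_workflow(processes, dependencies, external_inputs):
    """Execute processes in dependency order and return all outputs.

    processes:       name -> callable taking a dict of named inputs
    dependencies:    name -> {input name: upstream process name}
    external_inputs: name -> dict of pre-defined input values
    """
    graph = {
        name: set(dependencies.get(name, {}).values())
        for name in processes
    }
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = dict(external_inputs.get(name, {}))
        # Outputs of upstream processes become inputs of this process.
        for input_name, upstream in dependencies.get(name, {}).items():
            inputs[input_name] = outputs[upstream]
        outputs[name] = processes[name](inputs)
    return outputs


processes = {
    "red": lambda i: [10, 20],
    "nir": lambda i: [50, 30],
    "difference": lambda i: [n - r for n, r in zip(i["nir"], i["red"])],
    "threshold": lambda i: [v > i["limit"] for v in i["values"]],
}
dependencies = {
    "difference": {"nir": "nir", "red": "red"},
    "threshold": {"values": "difference"},
}
outputs = run_workflow(processes, dependencies, {"threshold": {"limit": 25}})
```

Because `run_workflow` itself takes inputs and returns outputs, the whole workflow could in turn be wrapped as a single process, as described in Note 1.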

2.2.  Abbreviated terms

ADES

Application Deployment Execution System

API

Application Programming Interface

COG

Cloud Optimized GeoTIFF

CRS

Coordinate Reference System

CWL

Common Workflow Language

DEM

Digital Elevation Model

DGGS

Discrete Global Grid System

EO

Earth Observation

ESA

European Space Agency

EVI

Enhanced Vegetation Index

FOSS

Free and Open Source Software

GDAL

Geospatial Data Abstraction Library

GDC

Geo Data Cube

GPKG

GeoPackage

GPU

Graphics Processing Unit

JSON

JavaScript Object Notation

LoD

Level of Detail

ML

Machine Learning

MOAW

Modular OGC API Workflows

NDVI

Normalized Difference Vegetation Index

NRCan

Natural Resources Canada

OGC

Open Geospatial Consortium

STAC

SpatioTemporal Asset Catalog

TIE

Technology Integration Experiment

TIFF

Tagged Image File Format

TMS

Tile Matrix Set

UML

Unified Modeling Language

3.  Introduction

Section 4 introduces the concept of a Geo Data Cube. It describes the situation prior to the Testbed-17 work and discusses the requirements set by the sponsoring organizations.

Section 5 discusses the approach to standardizing a GDC API. This includes exploring different OGC API specifications selected for experimentation during the Testbed. These APIs included OGC API — Common, OGC API — Coverages and OGC API — Processes. These current and draft API standards form the basis for the GDC API. Additional specifications providing a basis for the GDC API include the OGC API — Environmental Data Retrieval standard, as well as the draft Data Access and Processing API (DAPA) specification and the draft OGC API — Records specification. These additional specifications could also be integrated within this framework.

Section 6 describes the experimentation and results pertaining to the integration and use of a Machine Learning model within a GDC API.

Section 7 provides an overview of the GDC API services developed and improved for the Testbed-17 GDC task.

Section 8 provides an overview of the GDC API clients developed and improved for the Testbed-17 GDC task, also relating experiences with the use of Augmented Reality and GeoPose together with a GDC API.

Section 9 lays out a path forward for standardization of a GDC API.

Annex A identifies the selected GDC API capabilities, consisting of current and draft OGC API standards and conformance classes, that were implemented by the Testbed participants.

Annex B summarizes the Technology Integration Experiments conducted between the different server and client components.

4.  Geo Data Cube concepts

This chapter introduces the concept of a Geo Data Cube and the requirements provided by sponsoring organizations guiding this initiative. Literature consulted to inform these concepts includes

4.1.  What is a Geo Data Cube?

Before considering what functionality a Geo Data Cube (GDC) API should provide, clarifying what is meant by a Geo Data Cube is important:

  • A data cube is a multi-dimensional (“n-D”) array of values (OGC 18-095r7).

  • A data cube persistently stores and provides efficient access to multi-dimensional information (although this is not meant to exclude one-dimensional information).

  • A Geo Data Cube is a data cube for which some dimensions are geospatial in nature (such as latitude and longitude, projected easting and northing, or elevation above the WGS84 ellipsoid).

  • In terms of functionality, a geo data cube can be considered a multi-dimensional field including spatial dimensions, and often temporal dimensions as well.

Conceptually, this is essentially the same as a coverage as defined in ISO 19123 / OGC Abstract Topic 6:

  • A coverage is a feature that acts as a function to return values from its range for any direct position within its spatiotemporal domain (OGC 07-011).

Where a Geo Data Cube is established on the basis of a coverage, it may be referred to as a Geospatial Coverage Data Cube. Section 4.2 of the Community Practice (OGC 18-095r7) provides a definition of the term Geospatial Coverage Data Cube. For the purpose of this ER, the term Geospatial Coverage Data Cube is considered a specialization of the term Geo Data Cube.

An API may offer access to information from a particular dataset organized as separate data cubes, and provide integrated access to these multiple data cubes. Each cube could, for example, represent a different type of information, or a different imagery product or collection. Each of these GDCs would be equivalent to an individual coverage in the draft OGC API — Coverages specification and to a collection in the draft OGC API — Common — Part 2: Geospatial Data specification, as well as in the other OGC API standards and draft specifications for data access (Features, Tiles, Maps, EDR, etc.).

The data cube and Geo Data Cube terms are also sometimes used to refer to a service or platform providing access to such data cubes, or to a federation of such services or platforms. For the purpose of this ER, unless explicitly stated otherwise a GDC refers to a single collection of multi-dimensional data.

This figure from openEO illustrates a multidimensional data cube:

Figure 1 — Illustration of a data cube with multiple imagery bands and time axis

NOTE  The focus of openEO is developing an open API to connect R, Python, JavaScript and other clients to big Earth observation cloud back-ends in a simple and unified way.

4.2.  Goals of a Geo Data Cube API

In addition to providing efficient access to the data, a GDC API may also enable performing analytics close to the data. Analytics could range from simple aggregation and arithmetic to more complex algorithms such as machine learning predictions. A GDC API may also allow discovering data or processing capabilities available either from within the same API or elsewhere.
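To make "simple aggregation and arithmetic" concrete, the sketch below computes a normalized difference of two bands for each observation date and then averages the result over time. The choice of the normalized difference vegetation index (NDVI) as the arithmetic step, the function names and the sample reflectance values are assumptions for illustration only. In a GDC API deployment such a computation would run on the server, close to the data, rather than in client code as shown here.

```python
def ndvi(red, nir):
    """Band arithmetic: normalized difference of near-infrared and red values."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def temporal_mean(series):
    """Aggregation: mean of a series of values along the time axis."""
    return sum(series) / len(series)


# Hypothetical (red, nir) reflectance for one cell at three dates.
observations = [(0.10, 0.50), (0.20, 0.60), (0.10, 0.30)]

ndvi_series = [ndvi(red, nir) for red, nir in observations]
mean_ndvi = temporal_mean(ndvi_series)
print(round(mean_ndvi, 3))
```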

There may also be interest in integrating visualization and/or data or analytics management capabilities in some deployments.

The purpose of an OGC GDC API is to enable the use of these capabilities in client applications to derive useful insights from very large collections of data, in particular multi-spectral imagery routinely collected by Earth Observation satellites such as the US Landsat, EU Sentinel-2 and Canadian RADARSAT. Such insights are of particular importance in the context of solving global challenges like climate change.

4.2.1.  Needs of end-users and application developers

Two main categories of users must be considered in the design of an OGC GDC API. The first category is that of end-users, such as climate researchers. These end-users are less concerned with the technical aspects of the API as they will likely be using the API indirectly through client applications. Their primary concern is that a standardized GDC API enables interoperability between multiple client applications and services providing datasets and analytics for a common baseline of functionalities meeting their needs. However, server-side OGC API implementations are also intended to be directly accessible by end-users, for example through an HTML representation of resources that readily offers a minimum amount of functionality typical of a client. This should be considered in the design to ensure that it is possible to present the API in a user-friendly manner.

The second category is that of developers who will be using the GDC API to build client applications; they are the primary users of the API. These users expect a uniform API that can be used with different services and datasets. They are concerned primarily with the API providing the functionality needed for their application. Also important are the ease with which they can learn how to access that functionality, and how interoperable and efficient this functionality is across different implementations of the API.

Finally, back-end developers, although technically not users of the API, must implement the API functionality. They are often concerned with the amount of effort required to understand and develop a service conforming to the API specification, with how easily each operation maps to their existing capabilities for providing access to data and analytics, and with whether that mapping can be implemented efficiently.

Clearly, all of these targeted users and developers desire a convenient, simple and uniform API. Several of the OGC APIs being considered for use in the GDC API are still at a draft stage. If these draft APIs do not currently satisfy requirements for convenience, simplicity and uniformity, attempts should be made to improve them to address those needs. This approach is better than defining yet another completely distinct API that would further fragment the OGC API standards base and reduce interoperability.

4.2.2.  Requirements from sponsors

From the Testbed 17 Call for Participation, the following sponsor requirements for the Geo Data Cube API task were identified:

  • Define an OGC API leveraging existing building blocks for Geo Data Cubes.

  • Support access and processing in the cloud.

  • Support data discovery and querying information of diverse collections of data, including spatial and temporal resolution, interoperability with STAC, registries and catalogs.

  • Support interoperability of data formats and access methods: Cloud Optimized GeoTIFF which supports direct HTTP range requests, OGC WxS, OGC APIs.

  • Support interoperability across different cloud providers.

  • Support interoperable workflows for terrestrial & marine elevation, forestry information that can:

    • Process / extract information from forestry imagery;

    • Handle formats that enable interoperability such as for images/point clouds;

    • Derive insights & change prediction from spatiotemporal data.

  • Support interoperability between different Geo Data Cubes / APIs, as well as between the GDC API and offline use.

  • Support integration of terrestrial & marine elevation data from separate Geo Data Cubes.

  • Support for integration with advanced technology such as Machine Learning.

4.2.3.  Data access

A GDC API should support accessing different types of data. Examples are data cubes for regular and irregular gridded raster data and data defined by vector feature geometries of different dimensionality (including point clouds with a large number of points). The domain of the data cube should be capable of supporting one or more spatial and/or temporal dimensions, and possibly additional types of dimensions.

The values associated with a direct position in the data cube (range values in coverages terminology) should support both discrete (e.g. land cover category) and continuous (e.g. radiance) observed properties (e.g. the bands / sensor type in EO imagery).

During the GDC work in Testbed-17, some confusion was noted as to what should be presented as a field / property / value of the range vs. what should be presented as an axis of the domain. Topic 6 of the OGC Abstract Specification makes a clear distinction between the two. A dimension is part of the direct position for which values are available, and must be defined in the Coordinate Reference System (CRS) for the overall domain. Most often dimensions are limited to the spatiotemporal domain. Another use case for an additional dimension would be a parameter for which properties were observed at several different values or at a continuous range of values, throughout the other aspects (e.g. spatiotemporal) of the domain.

A data cube may itself be made up of smaller data cube pieces (e.g. imagery scenes or granules). Having the GDC API provide direct access to these scenes would be useful. This could enable an application to more accurately reflect the original data characteristics of those scenes making up the data cube. An example is supporting their native CRS (e.g. Universal Transverse Mercator (UTM) coordinate system zones in Landsat-8) while providing an easier-to-access unifying data cube for the different scenes through a single CRS and a fixed resolution.

Describing these aspects of the data and providing convenient and efficient access to it in its raw form are two key capabilities for a GDC API.

Figure 2 — Figure from openEO showing layers of a data cube across imagery bands and time axis.

4.2.3.1.  Data description

A GDC API needs a mechanism to describe the domain (e.g. the spatiotemporal extent) and the range (the type of values, or observed properties, or fields) defined for each direct position within the data cube. For describing the domain, one or more CRS must be clearly identified and associated with the axes to fully cover the spatiotemporal continuum of the data. For regular axes of a grid, a resolution must be specified, while for irregular axes of a grid, direct positions must be enumerated along the axis.
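The difference between regular and irregular axes can be sketched with a small data structure. The class names (`RegularAxis`, `IrregularAxis`, `Domain`), the axis names and the sample coordinates below are hypothetical and chosen for this illustration; they do not reproduce the schema of any OGC API. The sketch also assumes that a regular axis is sampled at its lower bound and at every multiple of the resolution up to its upper bound.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class RegularAxis:
    """Axis described by its bounds and a fixed resolution."""
    name: str
    lower: float
    upper: float
    resolution: float

    def positions(self) -> List[float]:
        # Grid points at lower, lower + resolution, ..., up to upper.
        count = int(round((self.upper - self.lower) / self.resolution)) + 1
        return [self.lower + i * self.resolution for i in range(count)]


@dataclass
class IrregularAxis:
    """Axis whose direct positions must be enumerated explicitly."""
    name: str
    coordinates: List[str]

    def positions(self) -> List[str]:
        return list(self.coordinates)


@dataclass
class Domain:
    crs: str  # identifier of the CRS in which the axes are defined
    axes: List[Union[RegularAxis, IrregularAxis]]


domain = Domain(
    crs="EPSG:4326",  # spatial CRS only; a temporal reference is omitted for brevity
    axes=[
        RegularAxis("Lat", 45.0, 46.0, 0.5),
        RegularAxis("Lon", -76.0, -75.0, 0.5),
        IrregularAxis("time", ["2021-06-01", "2021-06-09", "2021-07-03"]),
    ],
)

for axis in domain.axes:
    print(axis.name, axis.positions())
```

The regular axes need only three numbers each, whereas the irregular time axis, here standing in for unevenly spaced acquisition dates, has to list every position.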

For describing the range, a list of fields must be enumerated, each ideally annotated with a semantic association, a unit of measure and additional metadata if appropriate. Statistics for the values found in the data cube for each field would also be very useful information, as well as clarifications as to how the data is encoded (e.g. for encodings where it is not possible to provide these additional clarifications internally).
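A range description along these lines could be represented as follows. This is a minimal sketch: the `RangeField` class, its attribute names and the sample values are assumptions for illustration, and the statistics are limited to minimum, maximum and mean.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RangeField:
    """One field of the range, with optional annotations."""
    name: str
    definition: Optional[str] = None  # semantic association, e.g. a URI
    unit: Optional[str] = None        # unit of measure
    statistics: Dict[str, float] = field(default_factory=dict)


def summarize(values: List[float]) -> Dict[str, float]:
    """Basic statistics for the values of one field."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


# Hypothetical field with illustrative values in kelvin.
temperature = RangeField(name="temperature", unit="K")
temperature.statistics = summarize([280.0, 290.0, 300.0])
print(temperature)
```

Publishing such statistics alongside the field list lets a client choose sensible defaults, such as a color scale, without first downloading the data.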

4.2.3.2.  Data retrieval

A GDC API needs a simple mechanism to retrieve data in convenient encodings, without imposing a particular logical data model on those physical encodings.

Although retrieving an entire data cube as a single operation is possible, a common use case is to retrieve only a certain portion of interest. This is often a particular spatial area which also corresponds to a useful resolution (native resolution for small areas, but down-sampled for larger areas), resulting in a constant maximum response size. Support for retrieving only part of the data is critical for large collections of data for which retrieving everything is unnecessary and a waste of processing and bandwidth resources at both ends of the API, and often impractical or impossible. Such subsetting and down-sampling capability can be implemented efficiently with backing data stores supporting overviews / tile pyramids, as in Cloud Optimized GeoTIFF (COG) and Tile Matrix Sets (OGC 17-083r4). Directly exposing the multi-resolution tiles through the API to clients may improve performance by aligning requests with the data store’s internal organization, and thus enable efficient caching of responses on both the server and client side. Requesting a subset of a temporal dimension may also be a desirable capability.
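The idea of a constant maximum response size can be sketched as a choice of overview level. The function below assumes that each overview level halves the resolution of the previous one, as is common for tile pyramids, and that the response is limited to a hypothetical maximum number of pixels per side. The function name, its parameters and the numbers in the example are illustrative only.

```python
import math


def overview_level(bbox_width, bbox_height, native_resolution, max_pixels_per_side):
    """Lowest overview level at which the requested area fits the pixel limit.

    Level 0 is the native resolution; each further level is assumed to
    double the cell size (power-of-two overviews).
    """
    native_pixels = max(bbox_width, bbox_height) / native_resolution
    if native_pixels <= max_pixels_per_side:
        return 0
    return math.ceil(math.log2(native_pixels / max_pixels_per_side))


# Hypothetical data cube with 10 m cells and a 1024-pixel limit per side.
for extent_m in (5_000, 40_960, 100_000):
    level = overview_level(extent_m, extent_m, 10.0, 1024)
    cell_size = 10.0 * 2 ** level
    print(extent_m, level, cell_size, extent_m / cell_size)
```

Small areas are served at native resolution, while larger areas move to coarser levels so that the number of returned pixels stays within the same bound.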

In coverages terminology, a subsetting operation reducing dimensions (e.g. from 3D space + time to 2D space only) is called slicing. A subsetting operation preserving the same number of dimensions is called trimming (i.e. requesting a range of values for each axis in the subsetting operation). A GDC API may also support supersampling, but this is of less value for accessing raw data as it wastes bandwidth and processing resources, and could always be done on the client side if necessary. However, supersampling may be necessary to present a data cube of a uniform resolution where the resolution of the data source is in fact variable.
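The distinction between trimming and slicing can be shown on a small nested array with axes (time, y, x). In this sketch the subsets are expressed as half-open index ranges rather than coordinate values, which is a simplification made for illustration; the `trim`, `slice_time` and `depth` helpers are hypothetical names.

```python
def trim(cube, t_range, y_range, x_range):
    """Trimming: keep a range along every axis; dimensionality is unchanged."""
    (t0, t1), (y0, y1), (x0, x1) = t_range, y_range, x_range
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[t0:t1]]


def slice_time(cube, t):
    """Slicing: fix the time axis at one index; the result has one fewer dimension."""
    return cube[t]


def depth(array):
    """Number of nested list levels, i.e. the number of dimensions."""
    levels = 0
    while isinstance(array, list):
        levels += 1
        array = array[0]
    return levels


# Hypothetical cube: 2 time steps, 2 rows, 3 columns.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

trimmed = trim(cube, (0, 2), (0, 1), (1, 3))
sliced = slice_time(cube, 1)
print(depth(trimmed), trimmed)
print(depth(sliced), sliced)
```

The trimmed result is still three-dimensional even though one axis has a single element left, whereas the sliced result has lost the time axis altogether.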

Additionally, a client may be interested in requesting values for only some of the range (observed properties / fields, e.g. specific imagery bands of interest).
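Range subsetting can be sketched as selecting named fields from a data cube whose values are stored per field. The field names and values below are hypothetical, as is the `subset_range` helper. A request for an unknown field is rejected in this sketch instead of being silently ignored, which is a design choice made here and not a requirement taken from any specification.

```python
def subset_range(cube_fields, wanted):
    """Return only the requested fields, preserving the requested order."""
    missing = [name for name in wanted if name not in cube_fields]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    return {name: cube_fields[name] for name in wanted}


# Hypothetical cube with three imagery bands over a 2 x 2 grid.
cube_fields = {
    "B04": [[0.10, 0.12], [0.11, 0.13]],
    "B08": [[0.50, 0.55], [0.52, 0.58]],
    "B11": [[0.30, 0.31], [0.29, 0.33]],
}

subset = subset_range(cube_fields, ["B08", "B04"])
print(list(subset))
```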

Figure 3 — Figure from openEO illustrating data trimming by time, range subsetting (selecting a single band) and intersection with a spatial area.

Figure 4 — Figure from openEO illustrating data slicing, reducing dimensions of the data.

Figure 5 — Figure from openEO illustrating temporal resampling.