I. Abstract
This OGC Testbed 17 Engineering Report (ER) documents the results and recommendations of the Geo Data Cube API task. The ER defines a draft specification for an interoperable Geo Data Cube (GDC) API leveraging OGC API building blocks, details implementation of the draft API, and explores various aspects including data retrieval and discovery, cloud computing and Machine Learning. Implementations of the draft GDC API are demonstrated with use cases including the integration of terrestrial and marine elevation data and forestry information for Canadian wetlands.
II. Executive Summary
II.A. Key findings
An architecture for a Geo Data Cube API framework is proposed. The framework is built using approved OGC standards and draft OGC API specifications. OGC API — Common provides a consistent way to present an API landing page, conformance declaration, and API description in Part 1: Core (OGC 19-072), while defining collections of spatiotemporal data corresponding to the GDC API data cube resources in Part 2: Geospatial data (OGC 20-024). Multi-resolution data can be stored, indexed, and represented as such Geo Data Cube resources. These resources can be transformed by operations such as resampling, subsetting, aggregation, filtering, band arithmetic calculations, processing algorithms, or complex workflows; queried through the multiple data access mechanisms defined in OGC API building blocks; and returned as outputs in suitable negotiated formats.
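For illustration only, the following minimal Python sketch shows how a client could discover such a GDC API deployment structured according to OGC API — Common Part 1 (landing page, conformance declaration) and Part 2 (collections); the endpoint is a placeholder and not one of the Testbed deployments.

```python
# Hypothetical sketch (placeholder endpoint, not a Testbed deployment):
# discovering a GDC API structured per OGC API - Common Part 1 and Part 2.
import requests

ROOT = "https://example.org/gdc"  # assumed base URL of a GDC API deployment

# Landing page: links to the API description, conformance declaration and collections
landing = requests.get(ROOT, headers={"Accept": "application/json"}).json()
print(landing.get("title", ""), [link["rel"] for link in landing.get("links", [])])

# Conformance declaration: the OGC API conformance classes implemented by the server
conforms_to = requests.get(f"{ROOT}/conformance").json()["conformsTo"]

# Each collection corresponds to one Geo Data Cube resource
for collection in requests.get(f"{ROOT}/collections").json()["collections"]:
    print(collection["id"], collection.get("title", ""))
```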
The following OGC standards and specifications were considered and/or used in defining the GDC API.
OGC API — Coverages is considered a key GDC capability with its subsetting, range (fields) subsetting, scaling and tiles conformance classes, as well as proposed extensions for supporting filtering expressions, band arithmetic calculations and varying resolution.
Cloud Optimized GeoTIFF (COG) was considered as both a backend data store and an efficient distribution mechanism.
OGC API — Tiles and OGC API — Features are also considered as data access mechanisms.
OGC API — Environmental Data Retrieval (EDR) offers queries for typical meteorological use cases, such as data along a trajectory or within a corridor.
OGC API — Discrete Global Grid Systems is suggested as an important component to integrate within a GDC API framework.
Complex analytics can be achieved using OGC API — Processes — Part 1: Core, while simpler analytics capabilities should be conveniently integrated directly within OGC API — Coverages and EDR data requests.
The OGC API — Processes — Part 2: Deploy, Replace, Update draft specification is highlighted as a way to deploy new complex algorithms close to data.
The OGC API — Processes — Part 3: Workflows & Chaining draft specification is highlighted as a way to present the output of a process or workflow as a data cube, while supporting integration of distributed data cubes and analytics capabilities.
The OGC API — Maps and OGC API — Tiles specifications were identified as ways to directly integrate server-side visualization capabilities within a GDC framework.
The role of OGC API — Records, STAC and OGC API — Common for data discovery was explored.
Some overlap between the OGC API — EDR Standard and the draft OGC API — Coverages specification was identified, particularly in terms of describing a data cube and the EDR cube queries. Some current incompatibilities between the APIs specified in OGC API — EDR and OGC API — Common were also identified.
A Scenes API is proposed as a way to provide a unified data cube while still offering direct access to the individual scenes making it up, as well as to their metadata. The Scenes API is also proposed as a mechanism to manage the multiple scenes making up a data cube. This approach is based on the work of the Testbed 15 — Images API.
The new analytics capabilities defined by Testbed 16 — Data Access & Processing API (DAPA) are proposed as extensions for the Coverages and EDR APIs rather than as a new separate API. The definition of well-known processes supporting convenient processing languages is suggested. The need for identifying data cubes for use as input to particular processes was identified.
II.B. Results
The initiative participants developed four servers (provided by Wuhan University, 52°North, MEEO, and Ecere) and three clients (provided by Solenix, Ethar and Ecere) implementing selected Geo Data Cube API capabilities based on OGC API standards and specifications:
OGC API — Common — Part 1: Core;
OGC API — Common — Part 2: Geospatial data;
OGC API — Coverages — Part 1: Core, supporting subsetting, range subsetting (i.e., field selection), and scaling;
OGC API — Processes — Part 1: Core, supporting synchronous and asynchronous execution;
OGC API — Processes — Part 3: Workflows and Chaining, supporting collection input and collection output (see the sketch following this list).
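As a rough illustration of collection input and collection output per the draft Part 3: Workflows and Chaining specification, the following sketch is provided; the endpoint, process identifier, and exact request structure are assumptions and may differ from the evolving draft.

```python
# Hypothetical sketch of an OGC API - Processes execution using the draft
# Part 3 "collection input" and "collection output" mechanisms.
# Endpoint, process id and request structure are assumptions.
import requests

ROOT = "https://example.org/gdc"   # placeholder processing endpoint
PROCESS = "ndvi"                   # assumed process producing an NDVI coverage

execute_request = {
    # Collection input: reference a whole data cube rather than a static file
    "inputs": {
        "data": {"collection": f"{ROOT}/collections/sentinel2-l2a"}
    }
}

# Collection output: present the result as a virtual data cube (an OGC API
# collection) from which clients request areas/resolutions of interest
# using regular data access mechanisms (Coverages, Tiles, ...).
resp = requests.post(f"{ROOT}/processes/{PROCESS}/execution",
                     json=execute_request,
                     params={"response": "collection"})
result_collection = resp.json()
print([link.get("rel") for link in result_collection.get("links", [])])
```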
52°North demonstrated the use of the GDC API in the context of machine learning for land cover prediction from Earth Observation imagery. Ethar additionally demonstrated the use of the GDC API in the context of Augmented Reality.
II.C. Business value
This Engineering Report (ER) describes the results of discussions and experiments evaluating OGC API standards and draft specifications. Further, other data cube implementations developed outside of the OGC were evaluated in the context of a Geo Data Cube API for data access, analytics and discovery. The ER makes recommendations for the OGC Standards Program to improve interoperability of data cubes. The ER also highlights the interoperability drawbacks of defining different specifications for the same functionality within the same family of OGC API standards. Capabilities missing from OGC standards for accessing and performing analytics on data cubes are also identified; these should be standardized in a uniform manner by extending the currently approved standards and draft specifications. This should in turn facilitate the rapid implementation of interoperable spatiotemporal data cube capabilities within various technologies and spur further innovation.
II.D. Requirements addressed
In the Testbed-17 GDC task, the participants addressed requirements for defining an OGC API for Geo Data Cubes, leveraging existing building blocks, which would support:
access and processing in the cloud,
data discovery and querying information of diverse collections of data,
interoperability with STAC, registries & catalogs,
interoperability of data formats and access methods,
interoperability across different cloud providers,
interoperable workflows,
machine learning for detection from Earth Observation imagery and deriving insights from spatiotemporal data, and
interoperability between different Geo Data Cubes and APIs.
II.E. Motivation for defining a Geo Data Cube API
The motivation for defining a GDC API was to provide efficient access to data cubes; to perform analytics close to the data, ranging from simple aggregation and band arithmetic to more complex algorithms; to discover data and analytics capabilities; and potentially to integrate visualization and analytics management capabilities. Such an API will enable the use of these capabilities in client applications, allowing them to derive useful insights from very large collections of data, in particular multi-spectral Earth Observation imagery, which is an important source of information in the context of solving global challenges such as climate change.
II.F. Recommendations for future work
The GDC task demonstrated the value of the OGC API family of standards, including those already approved (Features Part 1: Core & Part 2: CRS by Reference, EDR, and Processes) and those still at the draft stage (e.g. Common Part 1: Core & Part 2: Geospatial Data, Features — Part 3: Filtering, CQL2, Tiles, Coverages, Maps, DGGS, Records/STAC, Processes — Part 2: Deploy, Replace, Update and Part 3: Workflows & Chaining). This ER recommends prioritizing the completion of the draft specifications.
The importance of completing OGC API — Common — Part 1 & Part 2 as a framework for integrating capabilities is particularly highlighted. For example, resolving the incompatibilities already identified between the EDR and OGC API — Common — Part 2 specifications would make it possible to offer the same data cube through the EDR API in addition to other access mechanisms.
The role of the draft OGC API — Coverages specification as a baseline for describing data cubes and providing a simple and convenient data access mechanism should be clarified. This includes support for subsetting domain and range (fields / bands), and resampling. Further, this also includes support for accessing coverage data as tiles, following a fixed pyramidal multi-dimensional tiling scheme. Additional capabilities for filtering based on CQL expressions should be considered as an extension for coverages.
If possible, an attempt should be made to re-align and harmonize the EDR specification’s data description mechanism and its cube query with OGC API — Coverages. The analytics capabilities defined in the Testbed 16 — DAPA specification should be integrated directly within OGC API — Coverages, and possibly OGC API — EDR as well, as extensions rather than as a new specification. OGC should ensure that separate OGC API standards do not re-define the same capabilities with only superficial variations, which reduces interoperability and places a significant burden on implementers of clients and services in terms of additional standards to implement.
A Scenes API should be defined, making it possible to offer a unified data cube while providing direct access to the data and metadata of individual scenes, thereby enabling integrated discovery as well as scene management capabilities.
Defining well-known processes expecting specific inputs — including a particular convenient processing language — to facilitate flexible coverage processing should be considered.
The need for Executable Test Suites for the different OGC API standards was highlighted.
The value of defining a set of standardized OGC API building blocks as a GDC meta-standard should be considered.
Further defining and leveraging the draft OGC API — Processes — Part 3: Workflows and Chaining specification would support presenting the results of analytics capabilities as a virtual data cube, facilitate the integration of analytics capabilities in visualization clients, and facilitate the integration of remote data cubes with processing algorithms. This should be an important priority.
III. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, API, OpenAPI, OGC API, Coverage, Data Cube
IV. Preface
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
V. Security considerations
No security considerations have been made for this document.
VI. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- Ecere Corporation
VII. Submitters
All questions regarding this document should be directed to the editor or the contributors:
Name | Organization | Role |
---|---|---|
Jérôme Jacovella-St-Louis | Ecere | Editor |
Tony Hodgson | Ethar, Inc. | Contributor |
Karri Ojala | Solenix GmbH | Contributor |
Alexander Lais | Solenix GmbH | Contributor |
Peng Yue | Wuhan University | Contributor |
Martin Pontius | 52°North GmbH | Contributor |
Eike Hinderk Jürrens | 52°North GmbH | Contributor |
Sufian Zaabalawi | 52°North GmbH | Contributor |
Joshua Lieberman | OGC | Contributor |
Patrick Dion | Ecere | Contributor |
Diego Caraffini | Ecere | Contributor |
Fabio Govoni | MEEO | Contributor |
Fan Gao | Wuhan University | Contributor |
Shuaifeng Zhao | Wuhan University | Contributor |
Colin Steinmann | Ethar, Inc., Open AR Cloud | Contributor |
Nazih Fino | Ethar, Inc., Open AR Cloud | Contributor |
Panagiotis (Peter) A. Vretanos | CubeWerx Inc. | Contributor |
OGC Testbed 17: Geo Data Cube API Engineering Report
1. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
Clemens Portele, Panagiotis (Peter) A. Vretanos, Charles Heazel: OGC 17-069r3, OGC API — Features — Part 1: Core. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/17-069r3/17-069r3.html
Clemens Portele, Panagiotis (Peter) A. Vretanos: OGC 18-058, OGC API — Features — Part 2: Coordinate Reference Systems by Reference. Open Geospatial Consortium (2020). https://docs.ogc.org/is/18-058/18-058.html
Peter Baumann: OGC 17-089r1, OGC Web Coverage Service (WCS) 2.1 Interface Standard — Core. Open Geospatial Consortium (2018). http://docs.opengeospatial.org/is/17-089r1/17-089r1.html
Matthias Mueller: OGC 14-065r2, OGC® WPS 2.0.2 Interface Standard: Corrigendum 2. Open Geospatial Consortium (2018). http://docs.opengeospatial.org/is/14-065/14-065r2.html
OGC: OGC API — Environmental Data Retrieval Standard 1.0. Open Geospatial Consortium (2021). https://www.opengis.net/doc/IS/ogcapi-edr-1/1.0
OGC 18-062r2, OGC API — Processes — Part 1: Core. Open Geospatial Consortium (2021). https://docs.ogc.org/is/18-062r2/18-062r2.html
OGC: OGC 07-011, Topic 6 — Schema for coverage geometry and functions. Open Geospatial Consortium (2007). https://portal.ogc.org/files/?artifact_id=19820
Peter Baumann, Eric Hirschorn, Joan Masó: OGC 09-146r6, OGC Coverage Implementation Schema. Open Geospatial Consortium (2017). http://docs.opengeospatial.org/is/09-146r6/09-146r6.html
Joan Masó: OGC 17-083r2, OGC Two Dimensional Tile Matrix Set. Open Geospatial Consortium (2019). http://docs.opengeospatial.org/is/17-083r2/17-083r2.html
Matthew Purss: OGC 15-104r5, Topic 21 — Discrete Global Grid Systems Abstract Specification. Open Geospatial Consortium (2017). http://docs.opengeospatial.org/as/15-104r5/15-104r5.html
2. Terms, definitions and abbreviated terms
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
2.1. Terms and definitions
2.1.1. cell
unit of a coverage’s domain set (potentially spanning multiple direct positions), of a fixed resolution in the case of gridded coverages, for which a specific set of range values (e.g. a pixel in an image, or a set of measurements) is returned
2.1.2. collection
(in the context of OGC API specifications) resource consisting of geospatial data that may be available as one or more sub-resource distributions that conform to one or more OGC API standards.
(SOURCE: https://github.com/opengeospatial/ogcapi-common/issues/140#issuecomment-642239475)
(in a general computer science context) grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion
(SOURCE: https://en.wikipedia.org/wiki/Collection_(abstract_data_type) )
2.1.3. coordinate reference system
coordinate system that is related to the real world by a datum
(SOURCE: ISO 19111:2019 Geographic information — Referencing by coordinates)
2.1.4. coordinate reference system
coordinate system that is related to the real world by a datum
(SOURCE: ISO 19111)
2.1.5. coverage
feature that acts as a function to return values from its range for any direct position within its spatio-temporal domain
2.1.6. data cube
multi-dimensional data store
Multi-dimensional (n-D) array of values
(SOURCE: OGC 18-095r7)
Note 1 to entry: The term is also sometimes used to refer to a service or platform providing access to such data cube, or to a federation of such services or platforms.
Note 2 to entry: Even though it is called a ‘cube,’ it can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The dimensions may be coordinates or enumerations, e.g., categories.
2.1.7. dataset
A dataset is a collection of data, published or curated by a single agent. Data comes in many forms including numbers, words, pixels, imagery, sound and other multi-media, and potentially other types, any of which might be collected into a dataset.
(SOURCE: W3C Data Catalog Vocabulary (DCAT) — Version 2, 2020)
Note 1 to entry: There is an important distinction between a dataset as an abstract idea and a distribution as a manifestation of the dataset
2.1.8. data store
A data store is a repository for persistently storing and managing collections of data which include not just repositories like databases, but also simpler store types such as simple files, metadata, models, etc.
(SOURCE: https://www.information-management.com/glossary/d.html:2020)
2.1.9. direct position
position described by a single set of coordinates within a coordinate reference system
(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)
2.1.10. domain
well-defined set [ISO/TS 19103]
(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)
Note 1 to entry: Domains are used to define the domain and range of operators and functions.
2.1.11. elevation
synonym for “height”
(SOURCE: Clause 4.16 of ISO/TS 19159:2016, https://www.iso.org/obp/ui/#iso:std:iso:ts:19159:-2:ed-1:v1:en)
2.1.12. geo data cube
a data cube for which some dimensions are geospatial (e.g. latitude and longitude, or projected easting and northing; elevation above the WGS84 ellipsoid)
A (geo) data cube is a discretized model of the earth that offers estimated values of certain variables for each partition of the Earth’s surface called a cell. A data cube instance may provide data for the whole Earth or a subset thereof. Ideally, a data cube is dense (i.e., does not include empty cells) with regular cell distance for its spatial and temporal dimensions. A data cube describes its basic structure, i.e., its spatial and temporal characteristics and its supported variables (also known as ‘properties’), as metadata. It is further defined by a set of functions. These functions describe the available discovery, access, view, analytical, and processing methods that are supported to interact with the data cube.
(Source: OGC 21-067)
Note 1 to entry: From a functionality perspective, it can be considered a multi-dimensional field including spatial dimensions, and often temporal dimensions as well (much like a coverage).
Note 2 to entry: As documented in OGC 21-067, this definition was proposed as an outcome of a Workshop and is thus still the subject of discussion.
2.1.13. height
Distance of a point from a chosen reference surface measured upward along a line perpendicular to that surface.
(SOURCE: ISO 19111:2019 Geographic information — Referencing by Coordinates)
Note 1 to entry: A height below the reference surface will have a negative value, which would embrace both gravity-related heights and ellipsoidal heights.
2.1.14. job
instance of a process execution
2.1.15. metadata
information about a resource.
(SOURCE: ISO 19115-1:2014)
Note: The US National System for Geospatial Intelligence (NSG) Metadata Foundation (NMF) Version 3.0 defines metadata as information that captures the characteristics of a resource to represent the ‘who’, ‘what’, ‘when’, ‘where’, ‘why’, and ‘how’ of that resource.
2.1.16. platform
computer hardware, software and/or network services providing a set of defined capabilities
2.1.17. process
series of computing operations to be executed, which may produce one or more outputs (and/or result in some other side effects), and may take one or more inputs.
2.1.18. range
(coverage) set of feature attribute values associated by a function with the elements of the domain of a coverage
(SOURCE: OGC Abstract Topic 6 — Schema for coverage geometry and functions)
2.1.19. resource
identifiable asset or means that fulfills a requirement
(SOURCE: ISO:19115-1:2014 Geographic information — Metadata — Part 1: Fundamentals)
Note 1 to entry: A web resource, or simply resource, is any identifiable thing, whether digital, physical, or abstract.
2.1.20. slice
subset of a coverage for a single coordinate along a dimension axis, for which the resulting coverage is reduced by one dimension
2.1.21. subsetting
operation whose result is a subset of the original set (e.g. trim or slice operations on a coverage)
2.1.22. tile
geometric shape with known properties that may or may not be the result of a tiling (tessellation) process. A tile consists of a single connected “piece” without “holes” or “lines” (topological disc).
(SOURCE: OGC 19-014r1: Core Tiling Conceptual and Logical Models for 2D Euclidean Space)
Note 1 to entry: “tile” is NOT a packaged blob of data to download in a chunky streaming optimization scheme!
2.1.23. tiling
in mathematics, a tiling (tessellation) is a collection of subsets of the space being tiled, i.e. tiles that cover the space without gaps or overlaps.
(SOURCE: OGC 19-014r1: Core Tiling Conceptual and Logical Models for 2D Euclidean Space)
2.1.24. trim
subset of a coverage between lower and upper bound coordinates along a dimension axis which does not reduce the dimensionality of the resulting coverage
2.1.25. workflow
sequence of processes (whether local or remote) to be executed, possibly with pre-defined and/or external input values, whose output(s) may serve as input(s) to subsequent processes part of the same workflow, whereas those subsequent processes have a dependency on the completion of the operations generating their inputs.
Note 1 to entry: The workflow as a whole may itself take inputs and generate outputs, and may also be encapsulated as a single process.
Note 2 to entry: A workflow (or part of it) may be executed in a distributed manner (e.g. for specific area and/or resolution of interest) if some or all processes involved can be computed in a localized manner.
2.2. Abbreviated terms
ADES
Application Deployment Execution System
API
Application Programming Interface
COG
Cloud Optimized GeoTIFF
CRS
Coordinate Reference System
CWL
Common Workflow Language
DEM
Digital Elevation Model
DGGS
Discrete Global Grid System
EO
Earth Observation
ESA
European Space Agency
EVI
Enhanced Vegetation Index
FOSS
Free and Open Source Software
GDAL
Geospatial Data Abstraction Library
GDC
Geo Data Cube
GPKG
GeoPackage
GPU
Graphics Processing Unit
JSON
JavaScript Object Notation
LoD
Level of Detail
ML
Machine Learning
MOAW
Modular OGC API Workflows
NDVI
Normalized Difference Vegetation Index
NRCan
Natural Resources Canada
OGC
Open Geospatial Consortium
STAC
SpatioTemporal Asset Catalog
TIE
Technology Integration Experiment
TIFF
Tagged Image File Format
TMS
Tile Matrix Set
UML
Unified Modeling Language
3. Introduction
Section 4 introduces the concept of a Geo Data Cube. It describes the situation prior to the Testbed-17 work and discusses the requirements set by the sponsoring organizations.
Section 5 discusses the approach to standardizing a GDC API. This includes exploring different OGC API specifications selected for experimentation during the Testbed. These APIs included OGC API — Common, OGC API — Coverages and OGC API — Processes. These current and draft API standards form the basis for the GDC API. Additional specifications providing a basis for the GDC API include the OGC API — Environmental Data Retrieval standard, as well as the draft Data Access and Processing API (DAPA) specification and the draft OGC API — Records specification. These additional specifications could also be integrated within this framework.
Section 6 describes the experimentation and results pertaining to the integration and use of a Machine Learning model within a GDC API.
Section 7 provides an overview of the GDC API services developed and improved for the Testbed-17 GDC task.
Section 8 provides an overview of the GDC API clients developed and improved for the Testbed-17 GDC task, also relating experiences with the use of Augmented Reality and GeoPose together with a GDC API.
Section 9 lays out a path forward for standardization of a GDC API.
Annex A lists the selected GDC API capabilities, consisting of current and draft OGC API standards and conformance classes implemented by the Testbed participants.
Annex B summarizes the Technology Integration Experiments conducted between the different server and client components.
4. Geo Data Cube concepts
This chapter introduces the concept of a Geo Data Cube and the requirements provided by sponsoring organizations guiding this initiative. Literature consulted to inform these concepts includes
reports from past OGC initiatives (OGC 21-013, OGC 21-008, OGC 20-016, OGC 20-025r1, OGC 20-035, OGC 20-018, OGC 20-039r2, OGC 20-041, OGC 20-091, OGC 20-073, OGC 19-070, OGC 19-027r2, OGC 19-026, OGC 18-038r2, OGC 18-049r1, OGC 18-050r1, OGC 18-046),
an OGC community best practice (OGC 18-095r7),
an OGC discussion paper (OGC 21-033),
articles (datacubeManifesto, viewBasedModelDataCube, doiPavingIncreased, copernicusEarthSystem),
documentation for data cubes and APIs (openEOAPI, sentinelhubAPI, up42Doc, climateDataStoreAPI, roocsTools, earthSystemDataCube), as well as
Wikipedia entries (wikiDataCube, wikiOLAPcube).
4.1. What is a Geo Data Cube?
Before considering what functionality a Geo Data Cube (GDC) API should provide, clarifying what is meant by a Geo Data Cube is important:
A data cube is a multi-dimensional (“n-D”) array of values (OGC 18-095r7).
A data cube persistently stores and provides efficient access to multi-dimensional information (although this is not meant to exclude one-dimensional information).
A Geo Data Cube is a data cube for which some dimensions are geospatial in nature (such as latitude and longitude, projected easting and northing, or elevation above the WGS84 ellipsoid).
In terms of functionality, a geo data cube can be considered a multi-dimensional field including spatial dimensions, and often temporal dimensions as well.
Conceptually, this is essentially the same as a coverage as defined in ISO 19123 / OGC Abstract Topic 6:
A coverage is a feature that acts as a function to return values from its range for any direct position within its spatiotemporal domain (OGC 07-011).
Where a Geo Data Cube is established on the basis of a coverage, it may be referred to as a Geospatial Coverage Data Cube. Section 4.2 of the Community Practice (OGC 18-095r7) provides a definition of the term Geospatial Coverage Data Cube. For the purpose of this ER, the term Geospatial Coverage Data Cube is considered a specialization of the term Geo Data Cube.
An API may offer access to information from a particular dataset organized as separate data cubes, with each cube representing, for example, a different type of information or a different imagery product or collection, while the API provides integrated access to these multiple data cubes. Each of these GDCs would be equivalent to an individual coverage in the draft OGC API — Coverages specification and to a collection in the draft OGC API — Common — Part 2: Geospatial Data specification, as well as in the other OGC API standards and draft specifications for data access (Features, Tiles, Maps, EDR…).
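As an illustration of this mapping, the following minimal sketch shows how a single data cube collection could advertise the access mechanisms available for it through its links; the endpoint and the collection identifier are assumptions for illustration only.

```python
# Hypothetical sketch: a single Geo Data Cube (an OGC API collection) advertising
# the access mechanisms available for it (e.g. coverage, tiles, items) via links.
# Both the endpoint and the collection id "sentinel2-l2a" are assumptions.
import requests

ROOT = "https://example.org/gdc"
collection = requests.get(f"{ROOT}/collections/sentinel2-l2a").json()

for link in collection.get("links", []):
    # The link relation type indicates which OGC API building block
    # (Coverages, Tiles, Features, ...) can be used to access this data cube
    print(link.get("rel"), "->", link.get("href"))
```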
The data cube and Geo Data Cube terms are also sometimes used to refer to a service or platform providing access to such data cubes, or to a federation of such services or platforms. For the purpose of this ER, unless explicitly stated otherwise, a GDC refers to a single collection of multi-dimensional data.
This figure from openEO illustrates a multidimensional data cube:
Figure 1 — Illustration of a data cube with multiple imagery bands and time axis
NOTE The focus of openEO is developing an open API to connect R, Python, JavaScript and other clients to big Earth observation cloud back-ends in a simple and unified way.
4.2. Goals of a Geo Data Cube API
In addition to providing efficient access to the data, a GDC API may also enable performing analytics close to the data. Analytics could range from simple aggregation and arithmetic to more complex algorithms such as machine learning predictions. A GDC API may also allow discovering data or processing capabilities available either from within the same API or elsewhere.
There may also be interest in integrating visualization and/or data or analytics management capabilities in some deployments.
The purpose of an OGC GDC API is to enable the use of these capabilities in client applications to derive useful insights from very large collections of data, in particular multi-spectral imagery routinely collected by Earth Observation satellites such as the US Landsat, EU Sentinel-2 and Canadian RADARSAT. Such insights are of particular importance in the context of solving global challenges like climate change.
4.2.1. Needs of end-users and application developers
Two main categories of users must be considered in the design of an OGC GDC API. The first category is that of end-users, such as climate researchers. These end-users are less concerned with the technical aspects of the API as they will likely be using the API indirectly through client applications. Their primary concern is that a standardized GDC API enables interoperability between multiple client applications and services providing datasets and analytics for a common baseline of functionalities meeting their needs. However, server-side OGC API implementations are also intended to be directly accessible by end-users, for example through an HTML representation of resources that readily offers a minimum amount of the functionality typical of a client. This should be considered in the design to ensure that the API can be presented in a user-friendly manner.
The second category of users is the developers. Developers, who will be using the GDC API to build client applications, are the primary users of the API. These users expect a uniform API that can be used with different services and datasets. They are concerned primarily with the API providing the functionality needed for their application. The ease with which they can learn how to access that functionality and how interoperable and efficient this functionality is in different implementations of the API is also important.
Finally, back-end developers, although technically not users of the API, must implement the API functionality. They are often concerned with the amount of effort required to understand and develop a service conforming to the API specification, with how easily each operation maps to their capabilities for providing access to data and analytics, and with how efficiently this mapping can be implemented.
Clearly, all of these targeted users and developers desire a convenient, simple and uniform API. Several of the OGC APIs being considered for use in the GDC API are still at a draft stage. If these draft APIs do not currently satisfy requirements for convenience, simplicity and uniformity, attempts should be made to improve them to address those needs. This approach is better than defining yet another completely distinct API that would further fragment the OGC API standards base and reduce interoperability.
4.2.2. Requirements from sponsors
From the Testbed 17 Call for Participation, the following sponsor requirements for the Geo Data Cube API task were identified:
Define an OGC API leveraging existing building blocks for Geo Data Cubes.
Support access and processing in the cloud.
Support data discovery and querying information of diverse collections of data, including spatial and temporal resolution, interoperability with STAC, registries and catalogs.
Support interoperability of data formats and access methods: Cloud Optimized GeoTIFF which supports direct HTTP range requests, OGC WxS, OGC APIs.
Support interoperability across different cloud providers.
Support interoperable workflows for terrestrial & marine elevation, forestry information that can:
Process / extract information from forestry imagery;
Handle formats that enable interoperability such as for images/point clouds;
Derive insights & change prediction from spatiotemporal data.
Support interoperability between different Geo Data Cubes / APIs, as well as between GDC API and offline.
Support integration of terrestrial & marine elevation data from separate Geo Data Cubes.
Support for integration with advanced technology such as Machine Learning.
4.2.3. Data access
A GDC API should support accessing different types of data. Examples are data cubes for regular and irregular gridded raster data, and data defined by vector feature geometries of different dimensionality (including point clouds with a large number of points). The domain of the data cube should be capable of supporting one or more spatial and/or temporal dimensions, and possibly additional types of dimensions.
The values associated with a direct position in the data cube (range values in coverages terminology) should support both discrete (e.g. land cover category) and continuous (e.g. radiance) observed properties (e.g. the bands / sensor type in EO imagery).
During the GDC work in Testbed-17, some confusion was noted as to what should be presented as a field / property / value of the range vs. what should be presented as an axis of the domain. Topic 6 of the OGC Abstract Specification makes a clear distinction between the two. A dimension is part of the direct position for which values are available, and must be defined in the Coordinate Reference System (CRS) for the overall domain. Most often dimensions are limited to the spatiotemporal domain. Another use case for an additional dimension would be a parameter for which properties were observed at several different values or at a continuous range of values, throughout the other aspects (e.g. spatiotemporal) of the domain.
A data cube may itself be made up of smaller data cube pieces (e.g. imagery scenes or granules). Having the GDC API provide direct access to these scenes would be useful. This could enable an application to more accurately reflect the original data characteristics of those scenes making up the data cube. An example is supporting their native CRS (e.g. Universal Transverse Mercator (UTM) coordinate system zones in Landsat-8) while providing an easier-to-access unifying data cube for the different scenes through a single CRS and a fixed resolution.
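As a sketch of what such scene-level access could look like, the following example queries a /scenes sub-resource; the path, the collection identifier, and the response property names are assumptions based on the Scenes API proposal discussed in this ER, not an approved standard.

```python
# Hypothetical sketch of the proposed Scenes API: a unified data cube collection
# also exposing the individual scenes (e.g. granules) it is built from.
# The /scenes path, collection id and response property names are assumptions.
import requests

ROOT = "https://example.org/gdc"
CUBE = "landsat8-l2"   # assumed identifier of a data cube built from many scenes

scenes = requests.get(f"{ROOT}/collections/{CUBE}/scenes").json()
for scene in scenes.get("scenes", []):
    # Each scene retains its original characteristics (e.g. its native UTM zone CRS),
    # while the parent collection offers a unified CRS and fixed resolution.
    print(scene.get("id"), scene.get("crs"), scene.get("datetime"))
```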
Describing these aspects of the data and providing convenient and efficient access to it in its raw form are two key capabilities for a GDC API.
Figure 2 — Figure from openEO showing layers of a data cube across imagery bands and time axis.
4.2.3.1. Data description
A GDC API needs a mechanism to describe the domain (e.g. the spatiotemporal extent) and the range (the type of values, or observed properties, or fields) defined for each direct position within the data cube. For describing the domain, one or more CRS must be clearly identified and associated with the axes to fully cover the spatiotemporal continuum of the data. For regular axes of a grid, a resolution must be specified, while for irregular axes of a grid, direct positions must be enumerated along the axis.
For describing the range, a list of fields must be enumerated, each ideally annotated with a semantic association, a unit of measure and additional metadata if appropriate. Statistics for the values found in the data cube for each field would also be very useful information, as well as clarifications as to how the data is encoded (e.g. for encodings where it is not possible to provide these additional clarifications internally).
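For illustration, the following sketch retrieves such a description from a hypothetical deployment. The extent structure follows OGC API — Common — Part 2; the range type resource and its field structure reflect the draft OGC API — Coverages specification and CIS JSON, and may differ between implementations.

```python
# Hypothetical sketch: describing a data cube's domain (spatiotemporal extent)
# and range (fields / observed properties). Endpoint and collection id are assumed;
# the rangetype path and field structure follow the draft OGC API - Coverages
# specification and CIS JSON, and may vary between implementations.
import requests

ROOT = "https://example.org/gdc"
CUBE = "sentinel2-l2a"

desc = requests.get(f"{ROOT}/collections/{CUBE}").json()
bbox = desc["extent"]["spatial"]["bbox"][0]            # domain: spatial extent
interval = desc["extent"]["temporal"]["interval"][0]   # domain: temporal extent
print("bbox:", bbox, "time:", interval)

# Range: the fields (e.g. imagery bands) defined for each direct position
rangetype = requests.get(f"{ROOT}/collections/{CUBE}/coverage/rangetype").json()
for field in rangetype.get("field", []):
    print(field.get("name"), field.get("uom", {}).get("code", ""))
```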
4.2.3.2. Data retrieval
A GDC API needs a simple mechanism to retrieve data in convenient encodings, without imposing a particular logical data model on those physical encodings.
Although retrieving an entire data cube as a single operation is possible, a common use case is to retrieve only a certain portion of interest. This is often a particular spatial area which also corresponds to a useful resolution (native resolution for small areas, but down-sampled for larger areas), resulting in a constant maximum response size. Support for retrieving only part of the data is critical for large collections of data for which retrieving everything is unnecessary and a waste of processing and bandwidth resources at both ends of the API, and often impractical or impossible. Such subsetting and down-sampling capability can be implemented efficiently with backing data stores supporting overviews / tile pyramids, as in Cloud Optimized GeoTIFF (COG) and Tile Matrix Sets (OGC 17-083r4). Directly exposing the multi-resolution tiles through the API to clients may improve performance by aligning requests with the data store’s internal organization, and thus enable efficient caching of responses on both the server and client side. Requesting a subset of a temporal dimension may also be a desirable capability.
In coverages terminology, a subsetting operation reducing dimensions (e.g. from 3D space + time to 2D space only) is called slicing. A subsetting operation preserving the same number of dimensions is called trimming (i.e. requesting a range of values for each axis in the subsetting operation). A GDC API may also support supersampling, but this is of less value for accessing raw data as it wastes bandwidth and processing resources, and could always be done on the client end if necessary. However, supersampling may be necessary to present a data cube of uniform resolution where the resolution of the data source is in fact variable.
Additionally, a client may be interested in requesting values for only some of the range (observed properties / fields, e.g. specific imagery bands of interest).
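The following sketch combines these capabilities in a single coverage request against a hypothetical deployment; the parameter names (bbox, datetime, properties, scale-size) and the axis labels are taken from the draft OGC API — Coverages specification as understood at the time and may differ in a given implementation.

```python
# Hypothetical sketch of a coverage retrieval request combining trimming
# (bbox / datetime), range subsetting (properties) and down-sampling (scale-size).
# Endpoint, collection id, parameter names and axis labels are assumptions
# based on the draft OGC API - Coverages specification.
import requests

ROOT = "https://example.org/gdc"
CUBE = "sentinel2-l2a"

params = {
    "bbox": "7.2,46.5,8.1,47.1",             # spatial trim (WGS84 lon/lat)
    "datetime": "2021-06-01/2021-06-30",     # temporal trim
    "properties": "B04,B08",                 # range subsetting: red & NIR bands
    "scale-size": "Lon(1024),Lat(1024)",     # down-sampled output grid
}
resp = requests.get(f"{ROOT}/collections/{CUBE}/coverage",
                    params=params,
                    headers={"Accept": "image/tiff; application=geotiff"})
with open("subset.tif", "wb") as f:
    f.write(resp.content)   # write the GeoTIFF subset to a local file
```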
Figure 3 — Figure from openEO illustrating data trimming by time, range subsetting (selecting a single band) and intersection with a spatial area.
Figure 4 — Figure from openEO illustrating data slicing, reducing dimensions of the data.
Figure 5 — Figure from openEO illustrating temporal resampling.