Publication Date: 2021-01-13
Approval Date: 2020-12-14
Submission Date: 2020-11-20
Reference number of this document: OGC 20-016
Reference URL for this document: http://www.opengis.net/doc/PER/t16-D005
Category: OGC Public Engineering Report
Editor: Panagiotis (Peter) A. Vretanos
Title: OGC Testbed-16: Data Access and Processing Engineering Report
COPYRIGHT
Copyright © 2021 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
WARNING
This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
LICENSE AGREEMENT
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Subject
- 2. Executive Summary
- 3. Standard and/or Domain Working Group review
- 4. Document contributor contact points
- 5. References
- 6. Terms and definitions
- 7. Overview
- 8. Previous work
- 9. Use Cases
- 9.1. Overview
- 9.2. General use cases
- 9.3. Use-Case 3: API Evaluation
- 9.4. Participant use cases
- 9.4.1. Introduction
- 9.4.2. DLR Use cases
- 9.4.3. Terradue Use Cases
- 9.4.4. GMU use cases
- 9.4.5. CRIM Use cases: Gridded climate data
- 10. Jupyter Notebooks
- 11. DAPA Endpoints
- 11.1. Overview
- 11.2. Resource model
- 11.3. API Endpoint 1 (EOX - D165)
- 11.4. API Endpoint 2 (interactive instruments - D166)
- 11.4.1. The data
- 11.4.2. The API
- 11.4.3. The API definition
- 11.4.4. General OGC API resources in the API
- 11.4.5. Access to the raw data
- 11.4.6. DAPA overview
- 11.4.7. The DAPA landing page: Access information about the available data retrieval patterns
- 11.4.8. Access information about the available variables
- 11.4.9. Retrieve a time series for selected variables at a position
- 11.4.9.1. Sample requests
- 11.4.9.1.1. Using coordinates for the location, requesting data for an instant
- 11.4.9.1.2. Using coordinates for the location, requesting data for an interval (time series)
- 11.4.9.1.3. Using a reference for the location, requesting data for an instant
- 11.4.9.1.4. Using a reference for the location, requesting data for an interval (time series)
- 11.4.9.2. Sample responses
- 11.4.9.3. Example use in python
- 11.4.10. Retrieve a time series for selected variables for each station in an area
- 11.4.11. Retrieve a time series for selected variables for each station in an area and resample the observations to a time series for each cell in a 2D grid
- 11.4.12. Retrieve a time series for selected variables at a position and apply functions on the values for each variable
- 11.4.13. Retrieve a time series for selected variables for each station in an area and apply functions on the values of each time series
- 11.4.14. Retrieve a time series for selected variables for each station in an area and apply functions on the values of each time step
- 11.4.15. Retrieve a time series for selected variables for each station in an area and apply functions on all values
- 11.4.16. Retrieve a time series for selected variables for each station in an area, resample the observations to a time series in a 2D grid, and apply functions on the values of each time series
- 11.5. API Endpoint 3 (terradue - D167)
- 11.5.1. /collections
- 11.5.2. /collections/{collection_id}/variables
- 11.5.3. /collections/{collection_id}/processes/position:retrieve
- 11.5.4. /collections/{collection_id}/processes/position:aggregate-time
- 11.5.5. /collections/{collection_id}/processes/area:retrieve
- 11.5.6. /collections/{collection_id}/processes/area:aggregate-space
- 11.5.7. /collections/{collection_id}/processes/area:aggregate-time-space
- 11.5.8. /collections/{collection_id}/processes/area:aggregate-time
- 11.5.9. /geothesaurus - location
- 12. Data Formats
- 13. API Evaluation
- Appendix A: Comparing DAPA and ADES
- A.1. Overview
- A.2. Discovery
- A.3. Process description
- A.4. Encoding
- A.5. Job Control
- A.6. Invocation
- A.7. Responses
- A.8. Examples
- A.9. Process position
- A.10. Process area
- A.11. A KVP-encoding for executing ADES processes
- A.12. Execution endpoint
- A.13. Basic form of the URL
- A.14. Process Inputs (<input value>)
- A.15. Process outputs (<output reference>)
- A.16. Examples
- Appendix B: Revision History
1. Subject
This OGC Testbed-16 Engineering Report (ER) describes the work performed in the Data Access and Processing API (DAPA) thread.
The primary goal of the DAPA thread is to develop methods and apparatus that simplify access to, processing of, and exchange of environmental and Earth Observation (EO) data from an end-user perspective. This ER presents:
- The use cases participants proposed to guide the development of the client and server components deployed during the testbed.
- An abstract description of a resource model that binds a specific function to specific data and also provides a means of expressing valid combinations of data and processes.
- A description of each DAPA endpoint developed and deployed during the testbed.
- A description of the client components that interact with the deployed DAPA endpoints.
- End-user (i.e. data scientist) feedback concerning the ease-of-use of the DAPA.
NOTE: This ER does not cover the specific details of the Data Access and Processing API. A complete description of the API can be found in the Data Access and Processing API Engineering Report.
2. Executive Summary
2.2. Goals
In the past, data retrieval mechanisms for geospatial content have been defined from a provider-centric point of view. This view has resulted in a collection of "stove-pipe" API specifications organized around resource types that, from an end-user perspective, are not well integrated or easy to invoke. For example, OGC defines a Features API in the OGC API - Features standard for accessing vector feature data and a Processes API in the OGC API - Processes candidate standard for invoking geo-processes over the Web. Even when both are deployed by a single provider, however, it is not obvious which processes can operate on which collections of data. Furthermore, invoking processes on the data is not straightforward or user-friendly: Invoking a process through the Processes API requires the creation of a JSON-encoded execution request which is then sent to an execution endpoint using the HTTP POST method.
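For illustration, such an execution request might look as follows. This is a minimal sketch only: the process identifier, endpoint path, and input names are hypothetical and do not correspond to any specific deployment described in this ER.

POST /processes/aggregate-time/execution HTTP/1.1
Content-Type: application/json

{
  "inputs": {
    "collection": "sentinel-5p-l3-o3",
    "bbox": [8.5, 49.9, 8.8, 50.2],
    "datetime": "2020-08-01/2020-08-31",
    "function": "mean"
  }
}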
By contrast, the DAPA developed in this thread tightly couples a process to the data collection(s) upon which it can operate. The DAPA also uses a simple invocation pattern based on the HTTP GET method with query parameters, which is familiar to anyone who has followed a link in a web page. As a result, a DAPA request can be easily embedded in many environments, including a Jupyter notebook or HTML pages that can be dynamically generated from DAPA endpoints as just another output format (i.e. text/html). HTML pages provide a convenient way for end-users to navigate DAPA resources and invoke processes using just a web browser. In the DAPA thread, both environments, Jupyter notebooks and HTML pages, were used to test and evaluate the DAPA endpoints provided by participants.
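As a minimal sketch of this embedding, the following Python cell issues a DAPA request from a Jupyter notebook. The endpoint URL, collection identifier, and parameter names are illustrative only, loosely following the endpoint patterns described in the DAPA Endpoints section.

import requests

# A DAPA invocation is an ordinary HTTP GET request with query parameters,
# so it embeds directly into a notebook cell.
# The endpoint and parameter names below are hypothetical.
url = "https://example.org/dapa/collections/sentinel-5p-l3-o3/processes/area:aggregate-time"
params = {
    "bbox": "8.5,49.9,8.8,50.2",          # area of interest
    "datetime": "2020-08-01/2020-08-31",  # time interval
    "functions": "mean",                  # aggregation function
}
response = requests.get(url, params=params)
print(response.json())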
2.3. Summary of DAPA evaluation
In order to gauge the ease-of-use of the API, a multi-day workshop was organized involving Earth observation and data scientists familiar with the use of Jupyter notebooks but not with the DAPA. The purpose of the workshop was to have the scientists use the DAPA and then provide evaluation reports. The primary evaluation criteria were:
- Learning curve.
- Richness of functionality.
- Ease of use (i.e. how much code is required to make a DAPA request).
The evaluation reports are included in the API evaluation clause of this ER and the main points are summarized here:
- The DAPA succeeded in satisfying all the primary evaluation criteria:
  - Simple to learn.
  - Provided a reasonable set of functionality.
  - Simple to use and embed into a Jupyter notebook.
- The evaluations also described a number of recommendations to enrich the functionality of the API, including:
  - Provide an easy-to-use interface to learn how to use the API.
  - Richer metadata via links to describe aspects of the API such as query parameters, metadata about the associated collection, example invocations, etc.
  - Add up/downsampling capability.
  - Polygon filtering rather than just BBOX.
  - Some means to combine different data sets in one request.
  - GDAL-compatible data formats and direct access via GDAL Virtual File Systems.
  - More user-friendly output formats (CSV, PNG, etc.).
  - Data streaming, especially binary for raster outputs.
  - HTTP POST requests for parameterizable queries.
  - A complete, harmonized API description in OpenAPI to allow interactive clients and Jupyter widgets to be created.
2.4. Data formats
The work by the participants in this thread focused primarily on the development of the DAPA with respect to the use cases articulated by participants and described in the Participant use cases section of this ER. Due to time and resource constraints, less attention was paid to analyzing output data formats. To give a sense of the data formats used in the DAPA thread, the following table lists, for each type of data, the output formats that the participants' DAPA endpoints generated.
Type of data desired | Output Formats |
---|---|
Coverage | |
Imagery | |
Vector Features | |
Tabular Data | |
Scalar Values | Plain text or JSON |
2.5. Future work
Below are listed possible future work items that might be considered in relation to the DAPA:
- Consider adding the ability to negotiate output formats.
- Consider defining a set of possible output data type structures (e.g. data cube, coverage, time series, multi-dimensional data set, feature, or simple value).
- Define standard encodings for each output data type structure identified (e.g. GeoTIFF, GeoJSON).
- If they don’t already exist, create MIME types or MIME type templates that capture the valid combinations of data structure and encoding format(s).
- Consider adding query parameters to the API that support:
  - The spatial and temporal (aggregation) resolution at which the processing should be executed,
  - Pre-filtering of collection items that go into the processing function,
  - On-the-fly transformation of attributes (e.g. &fields=NDVI=(B04-B08)/(B04+B08),NDBI=(B01-B02)/(B01+B02)),
  - Cross-referencing of values from other collections (e.g. &fields=(NDVI=(B04-B08)/(B04+B08))*(external_collection:CLOUD_MASK)).
- Explore adding support for HTTP POST for invoking DAPA processes.
- Consider adding an up/downsampling capability to control the volume of data being processed (see the similar feature in Google Earth Engine).
- Investigate DAPA extensibility.
- More broadly, consider convergence of DAPA with OGC API - Processes / ADES (see Comparing ADES and DAPA).
- Enhanced and interactive documentation about all aspects of a DAPA deployment (e.g. paths, parameters, test queries, etc.).
- The ability to combine different datasets in one request (e.g. fusion of Sentinel-1 and Sentinel-2 for a specific point in time).
- Consider borrowing some capabilities from Google Earth Engine.
2.6. DAPA and Processing in the OGC
Processing within the OGC can be viewed as a spectrum: At one end there is a set of specialized processes deemed to be generally important for the geospatial community, and at the other end there is the general set of geo-processing modules that need to be deployed and executed on the Web.
Mapping and routing are examples of generally important processes that are distinguished within the OGC by having dedicated APIs defined for them.
At the other end of the spectrum, the OGC has defined a Processes API, OGC API - Processes, that enables any process to be deployed on the Web and invoked in a standardized way, with standardized job control for long-running processes and standardized interfaces for retrieving processing results.
Between these two end points exists a set of processing requirements that satisfies the needs of specific communities of interest such as meteorology, pollution monitoring, and so forth. Here there are important considerations such as providing rich, integrated functionality while also being easy to use. DAPA sits in this middle ground, encapsulating the set of functions required by a community in an easy-to-learn and easy-to-use API. In other words, an API designed from the perspective of the end-user of the community of interest and not the data provider.
3. Standard and/or Domain Working Group review
3.1. Overview
The task participants believed that the work of the Data Access and Processing API task of OGC Testbed-16 is relevant to work being done in the OGC SWGs/DWGs listed below. A request for review of this ER by the SWG/DWG members was forwarded by the editor.
3.1.1. EO Exploitation Platform Domain Working Group
The purpose of the EO Exploitation Platform Domain Working Group is to foster interoperability (i.e. the pre-requisites for successful interaction between platform components and across platforms) among Exploitation Platforms and their components. To this end, the working group acts as an open forum for discussion and documentation of interoperability requirements for the domain in order to drive OGC standards evolution towards better support for the use cases of the EO Exploitation Platforms.
The huge and increasing amount of new EO satellite data and in-situ data arriving every day has incentivized the creation of several web-accessible platforms that enable scientists and commercial operators to use such data without the need to download content and maintain an in-house information technology infrastructure to manage the large volume of data.
EO Exploitation Platforms have been independently developed by public organizations or commercial companies, but all share a common set of functionalities:
- Cataloguing and searching;
- Storage and access;
- Visualization;
- Data processing and analysis; and
- User Authentication, Authorization, and Accounting.
These platforms, however, have many different implementations, with interfaces and data formats that often hamper interoperability among platforms. In fact, the next step in this data exploitation revolution is to link the platforms together, creating an ecosystem, potentially spanning different administrative domains, in which each platform can contribute its offered services to the implementation of more complex use cases.
The work in this Testbed-16 thread focused on the data processing and analysis aspects of EO exploitation platforms. The participants believe that the outcomes of this testbed would be of interest to users of EO exploitation platforms. Specifically, the work of the testbed was to develop data access and processing interfaces that are simpler to use, and thus simpler to learn and more easily integrated into the kinds of tools that EO exploitation platform users might use (e.g. Jupyter notebooks used to perform some analysis).
3.1.2. Environmental Data Retrieval API SWG
The purpose of the Environmental Data Retrieval (EDR) API SWG is to standardize several APIs, defined using OpenAPI (version 3), to retrieve various common data patterns from a relatively persistent data store. The data patterns could include, but are not restricted to, data at a point in space and time, time series at a point, data along a trajectory (which may be 2-, 3-, or 4-dimensional), and data covering a specified polygon or rectangular tile. The APIs will enable service users to retrieve resources over a discrete sampling geometry, created by the service in response to a standardized query pattern.
The work of the Testbed-16 Data Access and Processing task was closely aligned with the stated goals of the SWG. The participants believe that the outcomes of this testbed task could help inform the design of the API components that the SWG is endeavouring to specify.
3.1.3. OGC API - Processes SWG
The purpose of the OGC API - Processes SWG is to design a Web API that enables the execution of computing processes and the retrieval of metadata describing their purpose and functionality. Typically, these processes combine raster, vector, coverage and/or point cloud data with well-defined algorithms to produce new raster, vector, coverage and/or point cloud information.
In the current design of the Processes API, executing a process involves creating a JSON document that contains the process inputs and POSTing that document to the execution endpoint of the process. From an end-user perspective, this workflow is more complicated than may be necessary. Since one of the goals of the testbed task was to design a simpler API for invoking processes, the participants believe that this work could inform the design of an alternate invocation of WPS processes using KVP/GET method rather than the current JSON/POST method.
Furthermore, the work being done in Testbed-16 for binding processes to specific datasets and to advertise which combinations of data and processes are valid would be of interest to the SWG.
3.1.4. Citizen Scientist DWG
There are a large and increasing number of citizen science projects active around the world involving the public in environmental monitoring and other scientific research activities. The OGC Citizen Science DWG is motivated to support citizen science by providing a forum for increasing understanding and demonstration of the benefits brought by the use of open standards and best practices. This DWG will support the development of improved interoperability arrangements for the citizen science community.
The work of the testbed to simplify the interfaces for accessing and processing data from the end-user perspective is directly related to the goal of the DWG to improve interoperability arrangements for the citizen science community.
4. Document contributor contact points
All questions regarding this document should be directed to the editor or the contributors:
Contacts
Name | Organization | Role |
---|---|---|
Fabrice Brito | Terradue | Contributor |
Jonas Eberle | DLR | Contributor |
Pedro Gonçalves | Terradue | Contributor |
Torsten Heinen | DLR | Contributor |
David Landry | CRIM | Contributor |
Clemens Portele | interactive instruments | Contributor |
Panagiotis (Peter) A. Vretanos | CubeWerx Inc. | Editor |
4.1. Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
5. References
The following normative documents are referenced in this document.
- OGC: OGC 17-069r3, OGC API - Features - Part 1: Core, version 1.0, 2019
- OGC: OGC 10-090r3, OGC Network Common Data Form (NetCDF) Core Encoding Standard, version 1.0, 2011
- Ecma International: ECMA-404, The JSON Data Interchange Syntax
- OGC: OGC 20-044, OGC API - Processes - Part 2: Transactions (draft)
- OGC: OGC 19-086r2, OGC API - Environmental Data Retrieval Standard (draft)
6. Terms and definitions
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 and the clause titled "Terms and Definitions" of OGC 19-072 shall apply. In addition, the following terms and definitions apply.
- area: region specified with a geographic envelope that may have a vertical dimension
- collection: a body of resources that belong or are used together; an aggregate, set, or group of related resources
- dataset: a collection of data, published or curated by a single agent, and available for access or download in one or more serializations or formats
- endpoint: the specific digital location where requests for information are sent to retrieve the digital resource that exists there
- input | argument | query parameter: data provided to a process; a process input is an identifiable item
- LiDAR: Light Detection and Ranging; a common method for acquiring point clouds through aerial, terrestrial, and mobile acquisition methods
- location: an identifiable geographic place
- notebook: an electronic file that runs in a web browser and contains both programming code and text descriptions
- output: data returned as a result of applying a process to a collection
- position: a place specified with a geographic point
- process: a function that for each input returns a corresponding output; within this engineering report, inputs are also referred to as process arguments or query parameters
- variable | field | observed property: the named representation of a property that has been observed, for example precipitation or temperature
7. Overview
The Previous work section discusses several developments and existing best practices for both APIs and data encodings that informed the work done in this thread.
The Use cases section discusses the use cases used to guide the design and implementation of the components developed for the DAPA thread.
The Jupyter notebooks section provides a brief introduction to Jupyter notebook technology and a detailed description of the notebooks developed in this thread.
The API endpoints section describes the DAPA endpoints developed by each of the participants. This section does not describe the API itself which is described in the OGC Testbed-16: Data Access and Processing API ER.
The Data formats section provides a survey of which data formats work best in which data retrieval situations, informed by the different scenarios and user groups addressed in this ER.
The API evaluation section presents the end-user evaluations of the API gathered during the evaluation workshop.
8. Previous work
8.1. Introduction
The Data Access and Processing API development takes into account several developments and existing best practices for both APIs and data encodings.
8.2. APIs
8.2.1. openEO
The volume of Earth Observation data has grown so large that moving such data to a local machine for processing is no longer feasible. Instead of moving the data to local processing, the trend is now to store large repositories of Earth Observation data in the cloud or compute back-ends, and move the processing software to the data. The results of this processing on the cloud can then be browsed remotely or downloaded as desired.
In order to enable this "move the process to the data" concept, the openEO organization developed an API that connects clients such as R, Python and JavaScript to big Earth observation cloud back-ends in a simple and unified way. The main objectives of the project are the following concepts:
- Simplicity for clients (rather than data providers):
  - Many end-users use Python or R to analyze data and JavaScript to develop web applications. Analyzing large amounts of EO imagery should be equally simple and should seamlessly integrate with existing workflows.
- Unification:
  - A common API makes it easier to validate and reproduce processing results across cloud back-ends.
  - A common API makes it easier to provision processing across different back-ends in a coordinated way.
  - A common API makes it easier to compare cloud back-ends in terms of capability and costs.
8.2.2. GeoTrellis
GeoTrellis is a geographic data processing engine for high performance applications. It is implemented as a Scala library and framework that uses Apache Spark to work with raster data and supports many Map Algebra operations as well as vector to raster or raster to vector operations.
8.2.3. GeoAPI
The OGC GeoAPI Implementation Standard defines the GeoAPI library. GeoAPI provides a set of programming interfaces for geospatial applications. In a series of packages or modules, GeoAPI 3.0 defines interfaces for metadata handling and for geodetic referencing (map projections). The GeoAPI interfaces closely follow the abstract models published collaboratively by ISO in its 19100 series of documents and by the OGC in its abstract and implementation specifications. GeoAPI provides an interpretation and adaptation of these standards to match the expectations of Java or Python programmers.
8.3. OGC APIs
These APIs are complemented by a set of emerging OGC API standards to handle geospatial data and processes. The OGC API family of (mostly emerging) standards is organized by resource type. So far, OGC API - Features has been released as a standard that specifies the fundamental API building blocks for interacting with features. The spatial data community uses the term 'feature' for things in the real world that are of interest. OGC API standards define modular API building blocks to spatially enable Web APIs in a consistent way. The OpenAPI specification is used to define and document the API building blocks.
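As an illustration of such a building block, the following fragment sketches how a DAPA-style data retrieval path might be documented in OpenAPI. The path and parameter names are hypothetical, loosely modeled on the endpoint patterns described later in this ER.

paths:
  /collections/{collectionId}/processes/position:retrieve:
    get:
      summary: Retrieve values of selected variables at a position
      parameters:
        - name: collectionId
          in: path
          required: true
          schema:
            type: string
        - name: point
          in: query
          description: Coordinates of the sampling position (lon,lat)
          schema:
            type: string
        - name: datetime
          in: query
          description: Time instant or interval (ISO 8601)
          schema:
            type: string
      responses:
        '200':
          description: The requested data values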
8.4. Data encoding
On the data encoding side, there are several existing standards that are frequently used for Earth observation, environmental, ecological, or climate data. These include NetCDF, GeoTIFF, HDF, GML/Observations and Measurements or variations thereof, or, increasingly, JSON-encoded data. Testbed-16 explored existing solutions as well as emerging specifications and provides recommendations with a focus on the end-user, i.e. data or Earth scientists.
The OGC-ESIP Coverage and Processing API Sprint at the ESIP Winter meeting in January 2020 performed an analysis on coverages beyond the current OGC WCS standard’s capabilities. This effort took into account various elements that needed to be developed for an API approach based on the abstract specifications for Coverages and Processing as well as OPeNDAP, GeoXarray/Zarr, R-spatial and other modern software development environments. The Geospatial Coverages Data Cube Community Practice document describes community practices for Geospatial Coverage Data Cubes as implemented by multiple communities and running as operational systems.
9. Use Cases
9.1. Overview
This clause describes the use cases that guided the design and implementation of the components developed for the Testbed 16 DAPA task.
This chapter is organized into two sections. The first section describes the general use cases from the end-user perspective. Based on the general use cases, the second section describes the Testbed participant use cases that guided the implementations of the DAPA endpoints.
9.2. General use cases
9.2.1. Use case 1 - Data Retrieval
For this use case the typical user is a developer.
The user wants to access geospatial data for a specific area in a simple function call. The function call identifies the data and allows the user to define the discrete sampling geometry. Valid geometries include:
- Point locations (x, y, and optional z),
- Bounding box, and
- Polygon.
All geometries are provided either in-line or by reference, as illustrated by the following examples:
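For example (the host, collection, and parameter values below are hypothetical and shown only to illustrate the two options):

# Point location provided in-line as coordinates:
https://example.org/dapa/collections/temperature/position:retrieve?point=8.6821,50.1109

# Sampling geometry provided by reference, here an OGC API - Features item for Frankfurt:
https://example.org/dapa/collections/temperature/area:retrieve?location=https://example.org/features/collections/cities/items/frankfurt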
Specifying sampling geometries by reference also supports the use of an OGC API - Features endpoint, as shown in the example above where the value of the location parameter is an OGC API - Features invocation that fetches the feature, and thus the geometry, of the city of Frankfurt.
Users need the ability to access the original data. The use of the term "original" is a little ambiguous in this context as data often undergoes some process on its way from original access to the final product. As an example, imagine a digital temperature sensor. The actual reading performed in the sensor is some form of electricity, but the value provided at the sensor interface is 21°Celsius (approximately 69.8°Fahrenheit). Thus, some form of calibration curve has been applied to the original reading, which might not be available at all. In this case, the value 21°Celsius can be considered as “original”. The same principles apply to satellite data. The original raw data readings are often not accessible. Instead, the data undergoes some correction process before being made available. Higher product levels may include orthorectification or re-gridding processes. In any case, data providers need to provide a description of the performed processing together with the actual data. In addition, data should be available as "raw" as possible.
End-users want to retrieve all data that exists within the provided target geometry. In the case of polygon geometries, the end-user would receive all data that intersects that polygon. In the case of point geometries, the end-user would retrieve the value exactly at that point.
In addition, end-users have the option to define the (interpolation) method for value generation. If no option is selected, the Web API indicates how a given value was produced. Testbed-16 participants developed a set of frequently used production options, including for example:
- “original value”
- “interpolation method”
- “re-gridding”
or any combination thereof. This use case differentiates the following data requests:
- Synopsis/Time-Averaged Map: The end-user wants to retrieve data for a single point in time or as an average value over a time period. The figure below is an example of visualized time-averaged data for a number of sampling locations.
- Area-Averaged Time Series: The end-user wants to retrieve a single value that averages all data in the target geometry for each time step. The figure below is an example of visualized area-averaged data for a number of time steps.
- Time Series: The end-user wants to retrieve the full time series for each data point. The figure below is an example of a visualized full time series data set that includes a number of time steps.
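Using the endpoint patterns developed in this thread (see the DAPA Endpoints section), these three request types might be expressed as follows; the host, collection, and parameter values are illustrative only:

# Synopsis/Time-Averaged Map: aggregate over the time interval, one value per location
GET /dapa/collections/temperature/processes/area:aggregate-time?bbox=8.5,49.9,8.8,50.2&datetime=2020-08-01/2020-08-31&functions=mean

# Area-Averaged Time Series: aggregate over the area, one value per time step
GET /dapa/collections/temperature/processes/area:aggregate-space?bbox=8.5,49.9,8.8,50.2&datetime=2020-08-01/2020-08-31&functions=mean

# Time Series: the full series for every sample in the area
GET /dapa/collections/temperature/processes/area:retrieve?bbox=8.5,49.9,8.8,50.2&datetime=2020-08-01/2020-08-31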
Testbed-16 participants explored these use cases in combination with additional processing steps. For example, the end-user requests synoptic, map, or time series data that is interpolated to a grid.
9.2.2. Use case 2 - Data Processing
Testbed-16 participants explored simple data processing functions. These include calculations for:
- The minimum value in a result set.
- The maximum value in a result set.
- The average value in a result set.
These values can be computed for any data retrieval subset accessible through the Data Retrieval use case.
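As a sketch of how such a processing function might be invoked under the DAPA pattern, the following Python fragment requests the minimum, maximum, and mean over an area and time range; the endpoint, collection, and parameter names are hypothetical, loosely following the endpoint patterns described in the DAPA Endpoints section.

import requests

# Aggregating over both space and time yields simple scalar values
# (cf. the Data formats table: plain text or JSON).
# Endpoint and parameter names below are hypothetical.
url = "https://example.org/dapa/collections/temperature/processes/area:aggregate-time-space"
params = {
    "bbox": "8.5,49.9,8.8,50.2",
    "datetime": "2020-08-01/2020-08-31",
    "variables": "temperature",
    "functions": "min,max,mean",
}
r = requests.get(url, params=params)
print(r.text)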
9.3. Use-Case 3: API Evaluation
The third use-case is orthogonal to the first two. This use case does not add any additional requirements on the API itself, but evaluates the API from an end-user point of view.
This third use case was implemented during a full-day workshop with several data and Earth scientists who were invited to evaluate the DAPA regarding:
- Learning curve to use the API.
- Richness of accessible functionality.
- Amount of code needed to execute some common analyses.
The goal of the workshop was to allow the API developers and endpoint providers to further refine the API and increase ease-of-use based on the feedback provided by the scientists.
9.4. Participant use cases
9.4.1. Introduction
This section describes the detailed use cases articulated by the Testbed-16 participants, based on the general use cases, that guided the development of the thread's clients and endpoints.
9.4.2. DLR Use cases
9.4.2.1. Use case 1: Data extraction services for volcano / ozone monitoring
9.4.2.1.1. Overview
As a service provider, the user wants to retrieve the { average | max } value of { Sulphur dioxide | ozone } in a specific region { volcano | pole } over a specific time period { yesterday | last week } in order to determine if the value is higher than a threshold. If the value is higher, a notification is sent and the creation of a { animation | chart } of the { particle dissemination | ozone concentration } is triggered for each successive day until the value falls below the threshold.
- Use case inputs:
  - Sentinel-5p products (e.g., L3 Sulphur dioxide, L3 Ozone).
- Expected outputs:
  - Datacube: Ozone concentration for each time instant.
  - Animation: Ozone concentration for each time instant, color-coded and encoded as video.
  - Coverage: Sulphur dioxide / ozone concentration for time period(s).
  - Chart: Timeseries of ozone concentration over bbox|point|polygon.
  - Feature: Timeseries of ozone concentration over bbox|point|polygon.
- Internal data processing:
  - None, only temporal and spatial aggregation of data.
- Requirements for the API:
  - Select collection (e.g., Sentinel-5p L3 Ozone).
  - Filter on area of interest (e.g., Frankfurt area).
  - Filter on time range to be processed (e.g., “last week”).
  - Define interval for temporal aggregation (e.g., none, all data, weekly).
  - Define method for temporal aggregation (e.g., mean, min, max, sd).
  - For Chart: Define method for spatial aggregation (e.g., mean).
9.4.2.1.2. Use case 1.1 pseudo-api
Retrieve the Sulphur dioxide concentration over area g1 in the last month as a datacube (x,y,z,t) encoded as CoverageJSON. CoverageJSON is a data format for describing "coverage" data in JSON. The primary intended purpose of the format is to enable data transfer between servers and web browsers, to support the development of interactive, data-driven web applications.
web.api collection=S5P
        properties=SO2
        subset=(<GEO>, BBOX(g1))
        subset=(<TIME>, TODAY-P1M/TODAY)
        process=NONE
        output_type=DATACUBE
        output_format=CoverageJSON
==>
{
  Type: Coverage
  Domain:
    type: Grid
    axes: x,y,z,t
  Parameters: sulphur_dioxide
  Ranges:
    axisNames: [x,y,z,t]
    sulphur_dioxide: [values]
}
9.4.2.1.3. Use case 1.2 pseudo-api
Retrieve the Sulphur dioxide concentration over area g1 on a specific day as a coverage encoded as GeoTIFF.
web.api collection=S5P
        properties=SO2
        subset=(<GEO>, BBOX(g1))
        subset=(<TIME>, TODAY)
        process=(<SO2>, max(<TIME>))
        output_type=COVERAGE
        output_format=geotiff
==>
PROJ: geo
Meta: Time=TODAY
SO2: values
9.4.2.1.4. Use case 1.3 pseudo-api
Retrieve the maximum Sulphur dioxide concentration over area g1 in the last month as a timeseries table encoded in CSV.
web.api collection=S5P
        properties=SO2
        subset=(<GEO>, BBOX(g1)) | (<GEO>, POLYGON(local:POI:anak-krakatau))
        subset=(<TIME>, TODAY-P1M/TODAY)
        process=(<SO2>, max(<GEO>))
        output_type=TIMESERIES
        output_format=csv
==>
TIME,SO2
2000-01-01,2
2000-01-02,3
2000-01-04,10
...
9.4.2.1.5. Use case 1.4 pseudo-api
Retrieve the maximum Sulphur dioxide concentration over area g1/ref:volcano at t1=2000-01-01 as a simple value encoded in a simple GeoJSON feature.
web.api collection=S5P
        properties=SO2
        subset=(<GEO>, BBOX(g1)) | (<GEO>, POLYGON(http://external/anak_krakau))
        subset=(<TIME>, 2000-01-01)
        process=(<SO2>, max(<GEO>))
        process=(<SO2>, max(<TIME>))
        output_type=FEATURE
        output_format=GeoJSON
==>
{
  feature.1: { poly: [anak_krakau], time: 2000-01-01, SO2: value }
}