Publication Date: 2020-10-22
Approval Date: 2020-09-23
Submission Date: 2020-09-03
Reference number of this document: OGC 20-042
Reference URL for this document: http://www.opengis.net/doc/PER/EOAppsPilot-Terradue
Category: OGC Public Engineering Report
Editor: Pedro Gonçalves
Title: OGC Earth Observations Applications Pilot: Terradue Engineering Report
COPYRIGHT
Copyright © 2020 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
WARNING
This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
LICENSE AGREEMENT
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Subject
- 2. Executive Summary
- 3. References
- 4. Terms and definitions
- 5. Overview
- 6. Earth Observation Platform
- 7. Earth Observation Applications
- 8. Pilot Applications
- 9. Conclusions
- Appendix A: Revision History
- Appendix B: Bibliography
1. Subject
This OGC Engineering Report (ER) documents the findings and experiences resulting from Terradue Activities on the OGC Earth Observation Applications Pilot. More specifically, this ER provides a way forward for the implementation of the "applications to the data" paradigm in the context of Earth Observation (EO) satellite data processing and Cloud-based platforms to facilitate and standardize the access to Earth observation data and information.
2. Executive Summary
The availability of a growing volume of environmental data from space represents a unique opportunity for science, general R&D, and applications (apps), but it also poses a major challenge to achieving its full potential in terms of data exploitation. Firstly, the emergence of large volumes of data (the Petabyte era) raises new issues in terms of discovery, access, exploitation, and visualization of “Big Data”, with profound implications for how users do “data-intensive” Earth Science. Secondly, the inherent and growing diversity and complexity of data and users means that different communities, with different needs, skills, methods, languages and protocols, need to cooperate to make sense of a wealth of data of different nature (e.g. EO, in-situ, model), structure and format.
Responding to these technological and community challenges requires the development of new ways of working, capitalizing on Information and Communication Technology (ICT) developments to facilitate the exploitation, analysis, sharing, mining and visualization of massive EO data sets and high-level products within Europe and beyond. Evolution in information technology and the consequent shifts in user behavior and expectations create new opportunities to offer more significant support to EO data exploitation.
Earth Observation Platforms provide customers with an environment allowing them to focus on their core business and outsource other aspects to a platform that supplies services to a large number of customers with similar needs. The success of platforms in the business world is based on their ability to minimize cost and time to market for their customers, thereby also reducing the risk of exploring unproven business cases. In recent years, Platforms for the Exploitation of Earth Observation data have been developed by public and private companies in order to foster the usage of EO data and expand the market of Earth Observation-derived information. The domain is composed of platform providers, service providers who use the platform to deliver a service to their users, and data providers. The availability of free and open data (e.g. Copernicus Sentinel), together with the availability of affordable computing resources, creates an opportunity for the wide adoption and use of Earth Observation data in a growing number of fields in our society.
OGC activities in Testbed-13, Testbed-14, and Testbed-15 initiated the development of an architecture to allow the ad-hoc deployment and execution of applications close to the physical location of the source data, with the goal of minimizing data transfer between data repositories and application processes.
The activity described in this Engineering Report responds to the invitation for Earth observation platform operators to implement the OGC Earth Observation Applications Pilot architecture as it has been defined in those previous OGC Innovation Program (IP) initiatives. The goal of the pilot is to evaluate the maturity of those specifications in a real-world environment with several Earth Observation applications brought by several application developers that work with Earth observation satellites. These developers brought different views and requirements in terms of data discovery, data loading, data processing, and result delivery, which challenged the readiness of the platform and led to proposed evolutions of the architecture.
This Engineering Report begins by introducing the Earth Observation (EO) Platform architecture and documents the encoding and interfaces needed for the definition, deployment and execution of EO Applications brought by the different application developers. The ER concludes with a summary of the main challenges found during the pilot activities and provides further recommendations to advance the architecture, integration and implementation strategies, taking into consideration the viewpoints of both EO platforms and EO application developers.
2.1. Document contributor contact points
All questions regarding this document should be directed to the editor or the contributors:
Contacts
Name | Organization | Role |
---|---|---|
Pedro Gonçalves | Terradue | Editor |
Fabrice Brito | Terradue | Contributor |
Emmanuel Mathot | Terradue | Contributor |
2.2. Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
3. References
The following normative documents are referenced in this document.
-
OGC: OGC 06-121r9, OGC Web Services Common Standard, 2010
-
OGC: OGC 14-065r2, OGC Web Processing Service 2.0.2
-
OGC: OGC 13-026r8, OGC OpenSearch Extension for Earth Observation 1.0, 2016
-
OGC: OGC 13-032r8, OGC OpenSearch Geo and Time Extensions 1.0.0, 2014
-
OGC: OGC 12-084r2, OWS Context Atom Encoding Standard, 1.0.0, 2014
-
OGC: OGC 14-055r2, OGC OWS Context GeoJSON Encoding Standard, 1.0, 2017
-
OGC: OGC 17-069r3, OGC API - Features - Part 1: Core, 1.0, 2019
4. Terms and definitions
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.
- ● Application
-
A self-contained set of operations to be performed, typically to achieve a desired data manipulation. The Application must be implemented (codified) for deployment and execution on the platform.
- ● Application Deployment and Execution Service (ADES)
-
WPS-T (REST/JSON) service that incorporates the Docker execution engine, and is responsible for the execution of the processing service (as a WPS request) within the ‘target’ Exploitation Platform.
- ● Application Descriptor
-
A file that provides the metadata part of the Application Package. Provides all the metadata required to accommodate the processor within the WPS service and make it available for execution.
- ● Application Package
-
A platform independent and self-contained representation of a software item, providing executable, metadata and dependencies such that it can be deployed to and executed within an Exploitation Platform. Comprises the Application Descriptor and the Application Artefact.
- ● Compute Platform
-
The Platform on which execution occurs (this may differ from the Host or Home platform where federated processing is happening).
- ● Docker
-
Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.
- ● Execution Management Service (EMS)
-
The EMS is responsible for the orchestration of workflows, including the possibility of steps running on other (remote) platforms, and the on-demand deployment of processors to local/remote ADES as required.
- ● Exploitation Platform
-
An on-line collection of products, services and tools for exploitation of EO data.
- ● Spatiotemporal Asset
-
Any file that represents information about the earth captured in a certain space and time.
- ● GeoTIFF
-
A public domain metadata standard which allows georeferencing information to be embedded within a TIFF file. The potential additional information includes map projection, coordinate systems, ellipsoids, datums, and everything else necessary to establish the exact spatial reference for the file.
- ● JPEG2000
-
An image compression standard and coding system
- ● Processing
-
A set of predefined applications that interact to achieve a result. For the exploitation platform, comprises on-line processing to derive data products from input data, conducted by a hosted processing service execution.
- ● Processing Result
-
The Products produced as output of a Processing Service execution.
- ● Processing Service
-
A non-interactive data processing function that has a well-defined set of input data types and input parameterization, and that produces Processing Results with a well-defined output data type.
- ● Products
-
EO data (commercial and non-commercial) and value-added products made available through the EP. It is assumed that the Hosting Environment for the EP makes available an existing supply of EO Data.
- ● Transactional Web Processing Service (WPS-T)
-
Transactional extension to WPS that allows ad hoc deployment / undeployment of user-provided application.
- ● Web Processing Services (WPS)
-
OGC standard that defines how a client can request the execution of a process, and how the output from the process is handled.
4.1. Abbreviated terms
-
ADES Application Deployment and Execution Service
-
API Application Programming Interface
-
CWL Common Workflow Language
-
DIAS Copernicus Data and Information Access Services
-
DLR Deutsches Zentrum für Luft- und Raumfahrt (German Aerospace Center)
-
EMS Execution Management Service
-
EO Earth Observation
-
EP Exploitation Platform
-
ER Engineering Report
-
ESA European Space Agency
-
GDAL Geospatial Data Abstraction Library
-
IW Interferometric Wide
-
JSON JavaScript Object Notation
-
OGC Open Geospatial Consortium
-
OS Operating System
-
OWS OGC Web Services
-
REST Representational State Transfer
-
SAFE Standard Archive Format for Europe
-
SatCen European Union Satellite Centre
-
SLC Single Look Complex
-
SNAP Sentinel Application Platform toolbox
-
STAC SpatioTemporal Asset Catalog
-
TBD To Be Determined
-
TIE Technology Integration/Interoperability Experiments
-
TIFF Tagged Image File Format
-
URL Uniform Resource Locator
-
WPS Web Processing Service
-
WPS-T Transactional Web Processing Service
-
XML Extensible Markup Language
-
YAML YAML Ain’t Markup Language
5. Overview
Section 6 introduces the Earth Observation (EO) Platform architecture targeting the deployment and execution of EO Applications in distributed Cloud Platforms. The section provides an overview of EO applications design patterns, package and data interfaces. The section presents the approach followed by previous OGC Testbeds (13, 14 and 15) and the evolutions implemented to respond to specific challenges addressed during the pilot activities.
Section 7 presents the full cycle of definition, deployment and execution of Earth Observation Applications on an Exploitation Platform. This section presents the application packaging using the Common Workflow Language (CWL) [1] and data stage-in/out strategies for the deployment and publication of the application through a Web Service endpoint, OGC Web Processing Service (WPS).
Section 8 presents the EO Applications brought by the different application developers and documents all the steps for their deployment and execution.
Section 9 summarizes the main findings and provides further recommendations to advance the architecture, integration and implementation strategies, taking into consideration the viewpoints of both EO platforms and EO application developers.
Please note that this ER refers to WPS and OGC API – Processes interchangeably. This is because the OGC API – Processes draft specification emerged from the draft specification of the REST binding of the WPS standard.
6. Earth Observation Platform
This section explains the concepts from the Platform viewpoint.
6.1. Overview
Terradue’s Ellip Platform provides an application integration and processing environment with access to EO data to support Earth Sciences. The Ellip User Algorithm Hosting Service gives access to a dedicated application integration environment in the Cloud, with easily exploitable software tools and libraries and access to distributed EO Data repositories to support the adaptation and customization of services. The service provides developers with well-defined operational processes and procedures in order to allow for robust packaging of applications, test and validation activities, and application deployment in production. The adaptation and customization of the services is focused on supporting the developer in defining the parallelization strategy, the data requirements, the tools and libraries necessary to execute the applications, and the overall production strategy.
Ultimately, the User Algorithms are exposed through a Web Service endpoint, Web Processing Service (OGC WPS), that allows end-user portals and B2B client applications to pass processing parameters, trigger on-demand or systematic data processing requests and establish the data pipeline to retrieve the information produced.
This section shows how Ellip was adapted and the evolutions implemented to respond to specific challenges addressed during the pilot activities.
6.2. Architecture
Previous OGC Testbeds (13, 14 and 15) initiated the design of an architecture to allow the packaging, deployment and execution of Earth Observation Applications in distributed Cloud Platforms.
These testbed activities built on OGC standards (e.g. WPS, OWS-Context) to describe data processing applications or workflows as Application Packages that can be deployed on and executed within diverse Cloud Platforms.
The WPS service allows end-user portals and B2B client applications to pass processing parameters, trigger on-demand or systematic data processing requests and establish the data pipeline to retrieve the information produced.
The Application Package includes information about the execution unit to be executed or specific workflow script that can be invoked on the processor directly together with the mappings for parameterization or tailoring.
The testbeds defined an architecture where the Application Package is used with an Execution Management Service (EMS) and Application Deployment and Execution Service (ADES) as seen in the figure below. The EMS provides the workflow orchestration service (WPS-T) to deploy and invoke processing services workflows on multiple deployment and execution services. The ADES provides the execution engine service of the application that was previously deployed as a WPS service by the EMS.
The Execution Management Service (EMS) provides the orchestration service to deploy and invoke services of an application package on an ADES of the selected platform. It is responsible for managing the on-demand deployment and execution of the workflow building blocks on the “local” or “remote” platform.
The main responsibilities of the ADES are:
-
Validate and accept an application deployment request from the EMS
-
Validate and accept an execution request from the EMS
-
Submit the process execution to the processing cluster
-
Monitor the process execution
-
Retrieve the processing results
To accomplish the execution and monitoring steps above, the ADES is also responsible for the following internal operations:
-
Data Stage-in for the process inputs
-
Data Stage-out for the process outputs
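The responsibilities listed above can be read as a simple request lifecycle. The following is a minimal in-memory sketch of that lifecycle, not the ADES implementation used in the pilot: the class name `AdesSketch`, its methods, the `/workdir/inputs/` path and the `s3://results-bucket/` location are all hypothetical, and the call to the deployed runner merely stands in for submission to a processing cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AdesSketch:
    """Hypothetical in-memory stand-in for an ADES (illustration only)."""

    processes: Dict[str, Callable[[List[str]], List[str]]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def deploy(self, process_id: str, runner: Callable[[List[str]], List[str]]) -> None:
        # Validate and accept an application deployment request.
        if not process_id:
            raise ValueError("process identifier must not be empty")
        self.processes[process_id] = runner
        self.log.append(f"deployed:{process_id}")

    def execute(self, process_id: str, input_refs: List[str]) -> List[str]:
        # Validate and accept an execution request.
        if process_id not in self.processes:
            raise KeyError(f"unknown process: {process_id}")
        # Data stage-in for the process inputs.
        staged = [self._stage_in(ref) for ref in input_refs]
        # Submit and monitor; a real ADES would poll a processing cluster here.
        self.log.append("submitted")
        outputs = self.processes[process_id](staged)
        self.log.append("succeeded")
        # Data stage-out for the process outputs, returning result references.
        return [self._stage_out(path) for path in outputs]

    def _stage_in(self, ref: str) -> str:
        # Assumed convention: the last URL segment becomes the local file name.
        self.log.append(f"stage-in:{ref}")
        return "/workdir/inputs/" + ref.rsplit("/", 1)[-1]

    def _stage_out(self, path: str) -> str:
        # Assumed convention: results are copied to a hypothetical object store.
        self.log.append(f"stage-out:{path}")
        return "s3://results-bucket/" + path.rsplit("/", 1)[-1]
```

In this sketch the EMS side would call `deploy` once and `execute` for each request; the `log` attribute records the order in which the steps ran.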
6.3. Application Integration Scenarios
An Earth Observation Platform provides processing functions, tools and applications invoked individually or utilized in workflows. Users are able to integrate their own processing services into the platform and make them available for exploitation by other users also individually or in their own workflows. To support their integration activities, users are provided with an environment where they can integrate, build, test & debug and deploy.
The platform gives access to a dedicated application integration environment with exploitable software tools and libraries and access to distributed EO Data repositories to support the services adaptation and customization. It allows the design and integration of services as scalable data processing chains, leveraging selected data programming models.
The environment considers two scenarios of EO application integration:
-
Importing: The application is directly packaged as a black box. It relies on the Processing Framework to stage data in and out in a way that matches the application's existing data access expectations.
-
Adapting: The application is adapted to use the data access interfaces offered for data input and output.
The main objective of the current pilot project is to cover the importing scenario.
6.4. Application Design Pattern
6.4.1. Data-driven application with a fan-in application pattern
The data-driven fan-in application pattern refers to the execution of a data processing function that aggregates several input products. The platform application accesses a list of input products, then retrieves and stages in the products, making them available to the application execution block.
6.4.2. Data-driven application with a fan-out application pattern
The data-driven fan-out application pattern refers to the execution of a data processing function that processes several products concurrently, generating an independent output for each input. The platform application loops over a list of input products, then retrieves and stages in the individual products, making them available to the application execution block. The platform can apply different strategies to parallelize the execution of each individual product.
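The difference between the two data-driven patterns can be sketched in a few lines. This is an illustrative sketch only: `stage_in`, `run_fan_in` and `run_fan_out` are hypothetical helper names, the `/workdir/` path is an assumption, and a thread pool is just one of the parallelization strategies a platform might choose.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def stage_in(reference: str) -> str:
    """Hypothetical stage-in: map a catalogue reference to a local path."""
    return "/workdir/" + reference.rsplit("/", 1)[-1]


def run_fan_in(process: Callable[[List[str]], str], references: List[str]) -> str:
    # Fan-in: every input is staged first, then a single execution aggregates them.
    return process([stage_in(ref) for ref in references])


def run_fan_out(
    process: Callable[[str], str], references: List[str], workers: int = 4
) -> List[str]:
    # Fan-out: one independent execution per input, run concurrently.
    # pool.map returns results in the same order as the input references.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda ref: process(stage_in(ref)), references))
```

With fan-in the application receives the whole list of staged paths and returns one result; with fan-out it receives one staged path per call and the platform collects one result per input.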
6.4.3. Parameter-driven application pattern
Parameter-driven data flows permit cyclic, systematic retrieval of selected groups of input products within the selected parameter intervals (e.g. start and end dates). In this scenario, the parameter interval acts as a step function, determining how the next batch of products is to be selected.
When considering temporal parameters, this pattern creates a "moving window" of processing mostly suitable for daily or weekly composites, or any task where input products must be grouped for processing by time interval.
The platform can apply different strategies to parallelize the execution of each individual product.
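For temporal parameters, the "moving window" described above can be sketched as a generator of consecutive date intervals, each selecting one batch of products. The function names and the half-open interval convention are assumptions made for this illustration; they are not taken from the pilot implementation.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def parameter_windows(start: date, end: date, step_days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive half-open [window_start, window_end) intervals covering [start, end)."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    cursor = start
    while cursor < end:
        # The last window is clipped so that it never extends past `end`.
        window_end = min(cursor + timedelta(days=step_days), end)
        yield cursor, window_end
        cursor = window_end


def select_products(
    products: List[Tuple[str, date]], window: Tuple[date, date]
) -> List[str]:
    """Return the identifiers of products acquired inside the given window."""
    window_start, window_end = window
    return [pid for pid, acquired in products if window_start <= acquired < window_end]
```

A weekly composite would use `step_days=7`; each yielded window then drives one execution over the products that `select_products` returns for it.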
6.5. Application Package
The Application Package targets algorithms (developed by a third-party developer) that are to be deployed in a Cloud Platform. The Application Package will contain all the information necessary to allow applications to be specified and deployed in federated resource providers. The Application Package takes into consideration past efforts to integrate OGC services with well-established formats and protocols that are natively ready for interoperability within Cloud solutions.
The main objective of the Application Package description file is to define a data processing application or workflow, providing the information about the parameters, software item, executable, metadata and dependencies such that it can be deployed to and executed within an Exploitation Platform. This file must use a simple encoding so that application developers are able to create it, must ensure that the application is fully portable among all supporting platforms, and must support automatic deployment in a Machine-To-Machine (M2M) scenario. The Application Package information model must also allow the deployment of the application as a WPS service.
The execution block (i.e. Application Artefact) describes the ‘software’ component that represents the execution unit in a specific container (e.g. Docker) to be executed or specific workflow script that can be invoked on the processor directly. Based on the context information provided with the processor, the execution block maps how the container can be parameterized or tailored.
Docker is currently the selected container format. It enables applications to be quickly assembled from components and eliminates the friction between development and production environments. As a result, applications can ship faster and run unchanged in large multi-tenant data centers in the Cloud. A Docker container runs as an isolated process in the host operating system and shares the kernel with other containers. It therefore retains the benefits of virtualization while being more portable and more efficient, using less disk space. The application packaging allows developers to shift their focus away from infrastructure details.
Previous work in OGC Testbed-13 (OGC 17-023) defined two alternative encodings for the Application Package: one as an OWS Context Document (OGC 12-084r2) with an XML encoding, and an alternative where the information was included in the WPS Process Description in the ows:metadata element.
Subsequently, OGC Testbed-14 (OGC 18-049r1) updated this model with an OWS Context JSON encoding (OGC 14-055r2) and added an embedded Common Workflow Language (CWL) file (in JSON or YAML) that could be used, in the simplest case, to describe a single application and how to invoke the command line within the container.
The CWL is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high-performance computing (HPC) environments. While CWL is not heavily used in the EO domain, it is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, Physics, and Chemistry.
CWL comprises two main specifications: the Command Line Tool Description Specification, which specifies the document schema and execution semantics for wrapping and executing command line tools, and the Workflow Description Specification, which specifies the document schema and execution semantics for composing workflows from components such as command line tools and other workflows.
To summarize, OGC Testbed-14 initially proposed the use of CWL for the command-line invocation, and the activities during this pilot showed a simple way of pointing to a Docker container from inside the CWL file included in the Application Package. We concluded that the CWL file has the potential to reference the application containers (e.g. Docker images) and also allows the definition of the application parameters, the input/output interface and the overall process offering parameters.
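As an illustration of how a CWL document can point to a container, the sketch below builds a minimal CWL `CommandLineTool` description in its JSON form and attaches a `DockerRequirement` with a `dockerPull` image reference. The image name, the `run-ndvi` command and the input and output identifiers are hypothetical placeholders; the actual Application Packages used in the pilot may be structured differently.

```python
import json


def build_cwl_tool(docker_image: str, base_command: str) -> dict:
    """Return a minimal CWL CommandLineTool description as a plain dictionary."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "id": "example-tool",  # hypothetical identifier
        "baseCommand": base_command,
        # The container reference: the runner pulls this image before execution.
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        # Application parameters exposed to the caller (hypothetical example).
        "inputs": {
            "input_reference": {
                "type": "string",
                "inputBinding": {"position": 1},
            }
        },
        # Everything written to the working directory is collected as the result.
        "outputs": {
            "results": {
                "type": "Directory",
                "outputBinding": {"glob": "."},
            }
        },
    }


def to_json(tool: dict) -> str:
    """Serialize the description; CWL documents may be written in JSON or YAML."""
    return json.dumps(tool, indent=2)
```

Keeping the container reference, the parameters and the outputs in a single document is what would let a platform derive a WPS process description from the same file.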
6.6. Data Flow Management
Previous OGC Testbeds (13, 14 and 15) followed an approach where the ADES is responsible for the stage-in of the input product references specified in the WPS request and for the stage-out of the application results.
Data stage-in is the process of retrieving the processing inputs locally. Processing inputs are provided as EO Catalogue references and the ADES is responsible for translating those references into inputs available for the local processing.
Data stage-out is the process of uploading the outputs of the processing to external system(s) and making them available for later usage. The ADES retrieves the processing outputs and automatically stores them on an external persistent storage. Additionally, the ADES should publish the metadata of the outputs to a Catalogue and provide their references as an output.
In Testbed-13 the ADES makes the files available in a specific location defined by an environment variable pointing to the working directory. In Testbed-14, the input files location was provided directly in the CWL file, mentioned in the previous section, by defining one or more input files or folder directories.
However, EO product files come in different formats (e.g. GeoTIFF, HDF5, SAFE) and might have sub-items (e.g. metadata, bands, masks) that can be encoded in the same file or follow a given folder structure.
For example, SENTINEL-2 products are made available to users in SENTINEL-SAFE format, including image data in JPEG2000 format, quality indicators (e.g. defective pixels mask), auxiliary data and metadata. The SAFE format wraps a folder containing image data in a binary data format and product metadata in XML. A SENTINEL-2 product refers to a directory that contains a collection of information that can include several files, as shown in Figure 5.
Different platforms will catalog and stage the products differently. Some will reproduce the exact folder structure and return the folder root or the main XML manifest file. Other platforms will compress the folder structure and return the compressed file. Other platforms will even convert the file and return a GeoTIFF aggregating all the information.
In general, the approach followed in the previous testbeds assumed that the applications were responsible for navigating the input folder directory and programmatically reacting to how the file was staged-in by the platform. This implies that the application developer needs to consider all possible cases when developing their read routines.
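The burden this places on the developer can be illustrated with the kind of detection routine an application would need before it can even start reading its input. The sketch below is hypothetical: the function name and the returned labels are invented for this example, and it only covers the three staging variants described above (reproduced folder or manifest, compressed file, converted GeoTIFF). It relies on SAFE products carrying a `manifest.safe` file at their root.

```python
from pathlib import Path


def describe_staged_input(path: Path) -> str:
    """Classify how a platform staged a product (labels are illustrative only)."""
    if path.is_dir():
        # The platform reproduced the folder structure and returned its root.
        if (path / "manifest.safe").is_file():
            return "safe-folder"
        return "folder"
    if path.name == "manifest.safe":
        # The platform returned the main XML manifest instead of the folder root.
        return "safe-manifest"
    suffix = path.suffix.lower()
    if suffix == ".zip":
        # The platform compressed the folder structure.
        return "compressed-archive"
    if suffix in {".tif", ".tiff"}:
        # The platform converted the product to a single GeoTIFF.
        return "geotiff"
    return "unknown"
```

Every new staging convention adds another branch to such a routine in every application, which is the cost that a shared data manifest is meant to remove.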
Conversely, the outputs of the application are fully managed by the developer, who must place the resulting files in the output directory. The only information the ADES has about the output files is the file media type (formerly known as “MIME-type”).
To summarize, in previous testbeds the ADES had limited knowledge about the input and output files and their data contents. Thus, the ADES was often missing critical information such as spatial footprint, sub-items (e.g. masks, bands) and additional metadata (e.g. ground sample distance, orbit direction). To respond to this issue, the activities during this Pilot showed the potential of an approach that uses a local catalogue, encoded using the SpatioTemporal Asset Catalog (STAC) specification, as a data manifest for application input and output [2].
The STAC specification standardizes the way geospatial assets are exposed online and queried. A 'spatiotemporal asset' is any file that represents information about the earth captured in a certain space and time (e.g. by satellites, planes, drones or balloons). Most importantly, the STAC specification can be implemented in a completely 'static' manner as flat files, enabling data publishers to expose their data by simply publishing one or several JSON files that drill down from collection (e.g. Sentinel-2) to item (e.g. Sentinel-2 product) and assets (e.g. JPEG2000 band file, auxiliary data, browse) using relative paths (something that was not possible using OpenSearch as defined by OGC 13-026r8 and OGC 13-032r8).
The STAC specification defines several objects:
-
STAC Catalog: a collection of STAC Items or other STAC Catalogs (sub-catalogs). The division into sub-catalogs is transparently managed by links to ease online browsing.
-
STAC Collection: extends the STAC Catalog with additional fields to describe a whole set of STAC Items that share properties and metadata. STAC Collections are meant to be compatible with OGC API - Features Collections (OGC 17-069r3).
-
STAC Item: a GeoJSON Feature with additional fields (e.g. time, geo), links to related entities and STAC Assets.
-
STAC Asset: an object that contains a link to data associated with the STAC Item that can be downloaded or streamed (e.g. data, metadata, thumbnails) and can contain additional metadata. Similar to atom:link, it has properties like href, title, description, type and roles, but it also allows relative paths.
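The objects above are plain JSON documents, so a static catalog can be produced without any dedicated library. The following sketch builds a minimal STAC Item with one asset referenced by a relative path, and a Catalog that links to it. The function names, identifiers, file names and the `stac_version` value are assumptions chosen for illustration; the STAC release used during the pilot may differ in its required fields.

```python
from typing import Dict, List, Tuple


def make_item(
    item_id: str, bbox: Tuple[float, float, float, float], datetime_utc: str, band_href: str
) -> Dict:
    """Build a minimal STAC Item: a GeoJSON Feature with links and assets."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumption: adjust to the targeted STAC release
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": datetime_utc},
        "links": [{"rel": "parent", "href": "../catalog.json"}],
        "assets": {
            # The href is relative to the item file, so the folder can be moved freely.
            "B04": {
                "href": band_href,
                "title": "Red band",
                "type": "image/jp2",
                "roles": ["data"],
            }
        },
    }


def make_catalog(catalog_id: str, description: str, item_hrefs: List[str]) -> Dict:
    """Build a minimal STAC Catalog that links to its items by relative path."""
    links = [{"rel": "root", "href": "./catalog.json"}]
    links += [{"rel": "item", "href": href} for href in item_hrefs]
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",  # assumption, as above
        "id": catalog_id,
        "description": description,
        "links": links,
    }
```

Serializing both dictionaries with `json.dump` into `catalog.json` and an item file next to the band file would yield a self-describing folder that an application can read without knowing how the platform staged the data.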
The STAC specification supports the ADES stage-in operation for all the identified application patterns using EO data catalog references as inputs: a single local STAC catalog for the fan-in (Figure 6) and pairs (Figure 8) patterns, or several local STAC catalogs, each with a single item, for the fan-out pattern (Figure 7).
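The mapping between application pattern and local catalogs can be sketched as follows. This is an illustration under stated assumptions rather than the ADES logic used in the pilot: the function names, the catalog identifiers, the item file layout and the `stac_version` value are hypothetical.

```python
from typing import Dict, List


def _catalog(catalog_id: str, item_ids: List[str]) -> Dict:
    """Build a minimal static STAC Catalog linking to one JSON file per item."""
    links = [{"rel": "root", "href": "./catalog.json"}]
    links += [{"rel": "item", "href": f"./{item_id}/{item_id}.json"} for item_id in item_ids]
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",  # assumption: adjust to the targeted STAC release
        "id": catalog_id,
        "description": f"Staged inputs ({len(item_ids)} item(s))",
        "links": links,
    }


def catalogs_for_pattern(pattern: str, item_ids: List[str]) -> List[Dict]:
    """Return the local catalogs the stage-in step would create for a pattern."""
    if pattern in ("fan-in", "pairs"):
        # One catalog holding every staged item; the application runs once.
        return [_catalog("stage-in", item_ids)]
    if pattern == "fan-out":
        # One single-item catalog per input; the application runs once per catalog.
        return [_catalog(f"stage-in-{n}", [item_id]) for n, item_id in enumerate(item_ids)]
    raise ValueError(f"unsupported pattern: {pattern}")
```

In both cases the application receives the same kind of input, a path to a catalog, and only the number of catalogs and items changes with the pattern.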