Published

OGC Engineering Report

Earth Observation Cloud Platform Concept Development Study Report
Johannes Echterhoff, Julia Wagemann, Josh Lieberman (Editors)

Document number: 21-023
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Permission is hereby granted by the Open Geospatial Consortium, (“Licensor”), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.



I.  Abstract

The Earth Observation Cloud Platform Concept Development Study (CDS) evaluates the readiness of satellite data providers and cloud service providers, as well as the maturity of their current systems, with regard to real-world deployment of the new “Applications-to-the-Data” paradigm, using cloud environments for EO data storage, processing, and retrieval.

II.  Executive Summary

Earth Observation (EO) data include not only satellite data, but also in-situ and model-based data. The volume of EO data has drastically increased in recent years. A growing number of satellites and the improved capabilities (better spatial and temporal resolution) of new imaging sensors and of weather and climate models have led to exponential growth in the daily EO data stream. EO data providers as well as EO data users face challenges in managing, processing, and handling this continued increase in EO data. The growing data volume, combined with the variety of EO data and the velocity at which the data is made available, calls for fundamental changes in how EO data is traditionally disseminated and how users process and analyze the data. Such volumes of EO data can no longer be downloaded and processed on local machines. New EO data analysis workflows focus on ‘bringing applications to the data’, where EO data is accessed and processed in a cloud-based environment. Cloud-based services provide a highly scalable and flexible computing environment, where storage and computing resources can be acquired as needed, and where the overall IT costs of working with EO data can be significantly reduced.

Cloud-based systems to store, process, analyze, and make EO data accessible are a paradigm change and disrupt the traditional EO data dissemination and analysis workflow.

The Earth Observation Cloud Platform Concept Development Study (CDS) evaluates the readiness of satellite data providers and cloud service providers, as well as the maturity of their current systems, with regard to real-world deployment of the new “Applications-to-the-Data” paradigm, using cloud environments for EO data storage, processing, and retrieval. The study highlights the results of three activities: (i) a dedicated “EO Technologies Show and Tell” workshop in December 2020, (ii) online meetings with a number of stakeholders in January and February 2021, and (iii) a literature review.

The study report documents the evolution of EO system architectures and covers common aspects of these architectures and platforms — data coverage and transmission, storage, discovery, access, as well as security — and the current status of systems from satellite data as well as cloud service providers. A number of topics were identified that satellite data and cloud service providers intend to address in the near future, such as improving interoperability, continuing the migration into the cloud, and expanding the range of available toolsets for analyzing EO data in the cloud. A number of challenges and recommendations have been identified, as well as lessons learned. They include, but are not limited to: the need for more interoperability in cloud-based applications, the lack of policies for data sharing in case of disasters, and the need for training and capacity building to develop the skills necessary for working with EO data in a cloud-based environment.

The study reveals that satellite data providers are moving towards cloud computing, and implementing the applications-to-the-data paradigm. Right now, the major focus for many providers is to make EO data accessible in the cloud. Others already process, analyze and disseminate EO data in the cloud.

Processing in the cloud removes the need to download large volumes of EO data, reducing the total time it takes to analyze the data. Furthermore, the cloud provides the computing capabilities needed for processing such huge EO datasets, which local computing environments rarely support. The costs of using the cloud can be hard to specify up-front. However, gradually migrating a system into the cloud and gaining hands-on experience through well-defined projects can help build the necessary expertise.

Cloud-based EO systems are a way forward to better manage, provision, and process vast amounts of EO data. However, some problems that are typical for any IT system, such as technical and information interoperability, require additional work. The study shows that bringing applications to the data in the cloud will be essential for processing large amounts of EO data.

Cloud-based EO system architectures will enable monitoring Planet Earth and its ecosystems and studying the impact of climate initiatives and programs. They will make global-scale simulations and forecasts possible, as well as the timely provisioning of EO data in case of disasters such as forest fires and flooding. Cloud-based EO systems will play a vital role in supporting the U.S. engagement to address climate change (as outlined in the Executive Order on tackling the climate crisis, e.g. Sec. 211 d) and Sec. 222 b) (ii)) and the goals of the European strategy for data, including the Destination Earth initiative.

III.  Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, OGC document, EO Cloud Platform, CDS, ER


IV.  Preface

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

V.  Security considerations

No security considerations have been made for this document.

VI.  Submitting Organizations

The following organizations submitted this Document to the Open Geospatial Consortium (OGC):

VII.  Submitters

All questions regarding this document should be directed to the editor or the contributors:

Name Organization Role
Johannes Echterhoff Interactive Instruments Editor/Contributor
Julia Wagemann Consultant Editor/Contributor
Josh Lieberman OGC Editor

VIII.  Acknowledgements

We would like to thank the following companies and organizations who provided inputs for this study.

Company / Organization — Contact

Amazon Web Services (AWS) — Mark Korver
European Space Agency (ESA) — Albrecht Schmidt, Cristiano Lopes
European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) — Joana Miguens, Peter Albert
CERN — Joao Fernandes
Fisheries and Oceans Canada (DFO) — Tobias Spears
Maxar Technologies — Kumar Navulur
Mercator Ocean International — Alain Arnaud
National Aeronautics and Space Administration (NASA) — Chris Lynnes
Natural Resources Canada (NRCan) — Brian Low, Ryan Ahola, Will Mackkinnon
Planet Labs — Chris Holmes, Quinn Scripter

1.  Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO: ISO 19115-1:2014, Geographic information — Metadata — Part 1: Fundamentals. International Organization for Standardization, Geneva (2014). https://www.iso.org/standard/53798.html

ISO: ISO 19115-2:2019, Geographic information — Metadata — Part 2: Extensions for acquisition and processing. International Organization for Standardization, Geneva (2019). https://www.iso.org/standard/67039.html

ISO: ISO 19157:2013, Geographic information  — Data quality. International Organization for Standardization, Geneva (2013). https://www.iso.org/standard/32575.html

ISO: ISO 19165-1:2018, Geographic information — Preservation of digital data and metadata — Part 1: Fundamentals. International Organization for Standardization, Geneva (2018). https://www.iso.org/standard/67325.html

ISO: ISO 19165-2:2020, Geographic information — Preservation of digital data and metadata — Part 2: Content specifications for Earth observation data and derived digital products. International Organization for Standardization, Geneva (2020). https://www.iso.org/standard/73810.html

ISO/IEC: ISO/IEC 19941:2017, Information technology — Cloud computing — Interoperability and portability. International Organization for Standardization and International Electrotechnical Commission, Geneva (2017). https://www.iso.org/standard/66639.html

Graham Vowles: OGC 06-004r4, Topic 18 — Geospatial Digital Rights Management Reference Model (GeoDRM RM). Open Geospatial Consortium (2007). https://portal.ogc.org/files/?artifact_id=17802

2.  Terms, definitions and abbreviated terms

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this document, the following additional terms and definitions apply.

2.1.  Terms and definitions

2.1.1. Cloud Computing

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service), three service models (Software as a Service, Platform as a Service, Infrastructure as a Service), and four deployment models (private, community, public, and hybrid cloud). Source: The NIST Definition of Cloud Computing.

2.1.2. Earth Observation

Data and information collected about our planet, whether atmospheric, oceanic or terrestrial. This includes space-based or remotely-sensed data, as well as ground-based or in situ data, and computer model based data such as weather forecasts. Based on https://earthobservations.org/geo_wwd.php (accessed on 2021-01-14), extended to include model-based data.

2.2.  Abbreviated terms

ADES — Application Development and Execution Service
AIS — Automatic Identification System
API — Application Programming Interface
ARD — Analysis Ready Data
AWS — Amazon Web Services
CDS — Concept Development Study
CEOS — Committee on Earth Observation Satellites
CFRA — Cloud Federation Reference Architecture
CMR — Common Metadata Repository
COG — Cloud-optimized GeoTIFF
CRR — Cross Region Replication
CWL — Common Workflow Language
DaaS — Data-as-a-Service
DCS — Data Centric Security
DIAS — Data and Information Access Services
DRM — Digital Rights Management
DWG — Domain Working Group
EC — European Commission
ECMWF — European Centre for Medium-Range Weather Forecasts
EMS — Execution Management Service
EO — Earth Observation
EODC — Earth Observation Data Centre
EODMS — EO Data Management System
EOSC — European Open Science Cloud
EOSDIS — Earth Observing System Data and Information System
ESA — European Space Agency
EU — European Union
EUMETSAT — European Organisation for the Exploitation of Meteorological Satellites
EWC — European Weather Cloud
GCMD — Global Change Master Directory
HDA — Harmonized Data Access
HPC — High Performance Computing
IaaS — Infrastructure-as-a-Service
IDN — International Directory Network
JP2 — JPEG 2000
MAAP — Multi-Mission Algorithm and Analysis Platform
NASA — National Aeronautics and Space Administration
NISAR — NASA-Indian Space Research Organisation Synthetic Aperture Radar
NIST — National Institute of Standards and Technology
NITF — National Imagery Transmission Format
ODIS — Ocean Data and Information Section
PaaS — Platform-as-a-Service
RCM — RADARSAT Constellation Mission
SaaS — Software-as-a-Service
SNAP — Sentinel Application Platform
STAC — SpatioTemporal Asset Catalog
STEP — Science Toolbox Exploitation Platform
SWOT — Surface Water and Ocean Topography
TB — Terabyte
TEP — Thematic Exploitation Platform
UMM — Unified Metadata Model
USGS — United States Geological Survey
WPS — Web Processing Service

3.  Overview

The report is structured as follows:

4.  Introduction

Satellites collect vast amounts of Earth Observation (EO) data every day. Increasing numbers of satellites and the improved capabilities of new imaging sensors have led to exponential growth in the daily EO data stream. In 2019, for example, over 18 TB of EO data from Copernicus, the European Union’s EO program, were published daily under a full, free, and open data license (European Space Agency 2020). Taking into account that EO data includes not only space-based data, but also remotely sensed, in-situ, as well as computer model based data, EO data providers and EO data users alike face the challenge of managing, processing, and making use of this data flood. The growing data volume, combined with the variety of EO data and the velocity at which the data is made available, calls for fundamental changes in how EO data is traditionally disseminated and how users process and analyze the data. Such data volumes can no longer be downloaded and processed on local machines. New EO data analysis workflows focus on ‘bringing applications to the data’, where EO data is accessible and processable in a cloud environment.

Cloud-based services as a means to store, process, analyze, and make EO data accessible are a paradigm change and disrupt the traditional EO data dissemination and analysis workflow.

Cloud services vary widely in their capabilities, protocols, business models, and legal policies. Some services are offered as Infrastructure- (IaaS) and/or Platform-as-a-Service (PaaS) by commercial cloud vendors, such as Amazon Web Services or Google Cloud, and by publicly-funded bodies, such as the Copernicus Data and Information Access Services (DIAS) or the European Open Science Cloud (EOSC). Other cloud services offer more Data- (DaaS) or Software-as-a-Service (SaaS) capabilities, such as the Google Earth Engine or the Copernicus Climate/Atmosphere Data Stores. The level of specialization of IaaS/PaaS services is low, which provides great flexibility for different applications but requires system architecture knowledge. SaaS/DaaS services are more specialized toward specific application areas or data and require more subject-matter expertise.

The Earth Observation Cloud Platform Concept Development Study (CDS) evaluates the readiness of satellite data providers and cloud service providers, as well as the maturity of their current systems, with regard to real-world deployment of the new “Applications-to-the-Data” paradigm, using cloud environments for EO data storage, processing, and retrieval. The study was conducted by having a dedicated “EO Technologies Show and Tell” workshop in December 2020, as well as conducting online meetings with a number of stakeholders in January and February 2021, and by performing a literature study.

NOTE  The CDS intentionally does not cover mission planning and tasking (e.g. of satellites); the focus is on data storage, access, and processing.

5.  EO System Architecture — Stakeholders and Evolution

The growing volumes of EO data require new approaches to disseminate, access, and process the data. The traditional workflow of how EO data is disseminated, processed, and analyzed is currently undergoing significant change. In the following, we first present the stakeholder groups of the EO system architecture, and then the different stages of the EO system architecture evolution.

5.1.  Stakeholders

The Committee on Earth Observation Satellites (CEOS) identifies three key stakeholder groups of the EO data value chain in its ARD Strategy paper: (i) EO data providers (public and private), (ii) big data hosts and aggregators, and (iii) data users. These stakeholder groups are mainly defined based on the value chain for EO data from satellites. For the purposes of this study, EO data encompasses in-situ as well as model-based EO data, in addition to satellite data. Furthermore, with the advent of cloud-based services and platforms for EO data management and analysis, the EO system architecture landscape has diversified, and the three stakeholder groups defined by CEOS only partially reflect the EO data value chain. For this reason, rather than combining big data hosts and aggregators, we believe that the ‘aggregators’, as CEOS defines them, should be considered data users. We therefore propose to differentiate between two groups of data users: intermediate users and end users. This differentiation is also reflected in the Copernicus Market Report 2019.

Subsequently, we describe the following four stakeholder groups: (i) EO data providers, (ii) cloud service / platform providers, (iii) intermediate data users, and (iv) end users.

5.1.1.  EO Data Providers

EO data providers are public and private sector organizations who operate satellites or run models (e.g. for weather or climate prediction) and are in charge of disseminating the data. Public sector organizations can operate on a national level, e.g. the National Aeronautics and Space Administration (NASA) is an independent agency of the U.S. federal government, but can also be intergovernmental, like the European Space Agency (ESA), which coordinates the space programs of 22 member states, or the European Centre for Medium-Range Weather Forecasts (ECMWF). There are also private sector companies that operate their own fleets of satellites, such as Maxar (previously DigitalGlobe), Planet Labs, and EU Space Imaging.

5.1.2.  Cloud Service or Platform Providers

Cloud Service or Platform Providers can be publicly-funded organizations or commercial companies offering different types of cloud services or platforms.

Amazon Web Services, Microsoft Azure, and Google Cloud Platform are examples of popular commercial cloud vendors. The European Open Science Cloud and the Copernicus Data and Information Access Services (DIAS) are examples of publicly-funded cloud services.

Platforms for big EO Data Management and Analysis are defined by Gomes et al. (2020) as ‘computational solution[s] that provide functionalities for big EO data management, storage and access; that allow the processing on server side without having to download big amounts of EO data sets; and that provide a certain level of data and processing abstractions for EO community users and researchers’. Examples of commercially developed EO platforms that provide access to (value-added) EO data and processing are the Euro Data Cube and Ellip.

The differentiation between EO data providers and cloud service or platform providers is not always distinct. An EO data provider can also be a cloud service provider, as is the case for the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) and the European Centre for Medium-Range Weather Forecasts (ECMWF). Both organizations are providers of large volumes of satellite- and model-based data and are at the same time involved in operating the Copernicus DIAS service WEkEO. Additionally, both organizations are developing the European Weather Cloud (EWC), a dedicated cloud for national weather organizations and researchers of ECMWF and EUMETSAT’s member states.

EO data providers have also developed EO platforms, which makes them platform providers at the same time, e.g. the Thematic Exploitation Platforms (TEP) developed by ESA, the Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP) (Albinet et al. [1]), and the Climate Data Store Toolbox developed by ECMWF.

On the other hand, cloud service or platform providers can also become EO data providers. Both Amazon and Google, for example, run Public Dataset Programs, where they provide access to open datasets, e.g. Copernicus satellite data. The data originator remains the original EO data provider, but the organization that provides ‘access’ to the data in this case is the cloud service provider. The same applies to third-party platform providers, such as Google Earth Engine or the providers of the Open Data Cube. By developing and managing the platform, they become an EO data provider for the platform users.

5.1.3.  Intermediate Users

Intermediate users are technical experts working in private companies or in publicly funded organizations, for example universities or research organizations, building the bridge between end users on one side, and EO data providers, cloud service and platform providers on the other side. Intermediate users are subject matter experts and have the required technical skills to access, handle, process and analyze EO data in order to retrieve the required information for end users.

5.1.4.  End Users

End users are policy and decision makers who need information gained from the analysis of EO data in the form of a map, graph, or number, summarized in a report. End users do not have the required technical skills to be able to process and analyze EO data, but have expert knowledge in a specific application domain.

5.2.  Evolution of EO System Architectures

The continued exponential growth of big EO data, combined with technological advancement, leads to fundamental changes in how and where EO data is stored and how users access, process, and use the data. The traditional ‘data-centric’ approach limits the full uptake and use of open EO data, as large volumes of EO data are copied to and processed on local machines (see Figure 1). In order to address the evolving need to minimize data duplication and offer scalable processing capabilities, the ‘moving code’ paradigm has evolved, which promotes storing large volumes of EO data as cloud objects and accessing, processing, and analyzing EO data on (cloud-based) servers. In general, cloud systems offer effective and scalable processing capabilities, but require advanced technical understanding from users. For this reason, more ‘user-centric’ approaches have been developed that aim to provide advanced access to and processing of EO data while hiding technical complexities. Gomes et al. (2020) call these solutions ‘Platforms for big EO Data Management and Analysis’.

The current landscape of EO system architectures offers services from different evolution steps: from traditional EO system architectures, to ‘user-centric’ platforms, to highly advanced cloud services that allow executing entire applications on data stored in the cloud, and federations of multiple cloud-based systems. The federation of cloud-based systems, however, is currently a visionary concept toward which EO system architectures will evolve in the future. The role of the end user remains constant in all approaches and, for this reason, we do not explicitly elaborate on end users.
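
The ‘moving code’ paradigm can be sketched in a few lines: instead of downloading data, a client submits a processing request that runs next to the data. The payload below is loosely modeled on an OGC API — Processes style execute request; the endpoint, process identifier, and input names are hypothetical placeholders, since each platform advertises its own process descriptions.

```python
import json

def build_execute_request(collection, bbox, start, end, algorithm):
    """Build an OGC API - Processes style execute payload (sketch only).

    The input names ('data', 'area', 'interval', 'algorithm') are
    hypothetical; a real process defines its inputs in its description.
    """
    return {
        "inputs": {
            "data": collection,            # EO collection processed in place
            "area": {"bbox": bbox},        # area of interest, no bulk download
            "interval": [start, end],      # temporal extent
            "algorithm": algorithm,        # e.g. an NDVI or flood-mapping routine
        },
        # ask the server to run the job and return a result document
        "response": "document",
    }

request_body = build_execute_request(
    "sentinel-2-l2a", [5.9, 47.2, 10.5, 55.1], "2020-06-01", "2020-06-30", "ndvi"
)
print(json.dumps(request_body, indent=2))
```

Only the comparatively small result (a map, statistic, or report) travels back to the user, while the source imagery stays in the cloud.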

5.2.1.  Traditional ‘Data-Centric’ approach

The majority of data users still tend to follow the traditional approach, where large volumes of data are downloaded and processed on local machines. In this scenario, different EO data providers (e.g. of satellite-, model-based, and in-situ data) are responsible for managing the data and disseminating it via a download service. Intermediate users, e.g. researchers or commercial companies, make a copy of the data and pre-process and analyze it on their local machines. With the growing volumes of EO data, this approach has become increasingly cumbersome and limits the full uptake and use of EO data.

Figure 1 — Traditional data-centric approach, where large volumes of EO data are copied to and processed on local machines

5.2.2.  Traditional Approach With Cloud Storage

The growing volumes of EO data increase the need on the data provider side to manage EO data more effectively and to offer data access in a programmatic way, e.g. through an Application Programming Interface (API). Many EO data providers store their data archives on a cloud service (either their own cloud implementation or services from commercial cloud vendors), which is accessible to data users via an API. The modality of how intermediate users access data does not change substantially in this scenario. Intermediate users still follow the traditional approach of ‘bulk-downloading’ large volumes of EO data to their local machines. The only change is the location where the data is stored, of which the intermediate user might or might not be aware. Examples of this approach are the Copernicus Climate Data Store implemented by the European Centre for Medium-Range Weather Forecasts, as well as the public dataset programs Google Cloud Public Datasets and Earth on AWS.
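
As a rough sketch of such programmatic access, the snippet below assembles a bulk-download request for a hypothetical data-store API. The base URL, endpoint path, and parameter names are invented for illustration; real services such as the Climate Data Store define their own request schema and client libraries.

```python
from urllib.parse import urlencode

def build_download_url(base_url, dataset, variables, bbox, year):
    """Assemble a GET request URL for a hypothetical EO download API."""
    params = {
        "dataset": dataset,
        "variables": ",".join(variables),
        "bbox": ",".join(str(v) for v in bbox),  # west,south,east,north
        "year": year,
        "format": "netcdf",
    }
    return f"{base_url}/retrieve?{urlencode(params)}"

url = build_download_url(
    "https://data.example.org/api/v1",           # placeholder endpoint
    "reanalysis-era5-single-levels",
    ["2m_temperature", "total_precipitation"],
    [-10.0, 35.0, 30.0, 60.0],
    2020,
)
print(url)
```

Note that the user still transfers the full, potentially very large, subset to a local machine; only the storage location has moved into the cloud.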

Figure 2 — Traditional approach with cloud storage, where EO data is stored in the cloud but still downloaded to and processed on local machines

5.2.3.  Innovative ‘User Centric’ Approach — EO Data Management and Analysis Platforms

Platforms for EO data management and analysis have been developed based on the need to offer advanced access to and processing of EO data, while hiding technical complexities under a layer of abstraction. Different platforms and approaches have been developed. Examples of EO platforms are the Thematic Exploitation Platforms (TEP) developed by ESA, the Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP) (Albinet et al. [1]), the Open Data Cube (Killough, 2018), as well as commercial platforms such as Google Earth Engine (Gorelick et al. 2018). An example from the climate community is the Climate Data Store Toolbox.

Figure 3 — User-centric approach with dedicated platforms for EO data management and analysis

These platforms represent a new ‘user-centric’ approach, where users and applications are brought to the data in order to process and analyze it. Platforms differ in the underlying technology they use, their level of openness, their level of abstraction, and the data and functionalities they offer. According to the charter of the OGC EO Exploitation Platform DWG, the platforms share a common set of functionalities:

  • Cataloguing and searching;

  • Storage and access;

  • Visualization;

  • Data processing and analysis; and

  • User authentication, authorization, and accounting.

However, since current platforms for EO data management and analysis have been independently developed by public organizations and commercial companies, these platforms are not interoperable. In other words, they do not use a common set of interfaces and data formats for implementing the aforementioned functionalities. Additionally, the layer of abstraction reduces flexibility: users may be constrained by the data or functionalities these platforms offer.

Regarding the EO data value chain, these platforms introduce platform providers as an additional stakeholder. Platform providers may be EO data providers themselves, but are more likely a different team within the organization or agency, or a third-party organization or company. The responsibility of managing, maintaining, and provisioning the platform thus shifts from the EO data provider to the platform provider. In the backend, such platforms may be operated on a cloud service, but may also be managed on a company server.

5.2.4.  OGC Activities Driving the Development of Interoperable EO Exploitation Platforms

The OGC has been driving the development of an interoperable Applications-to-the-Data EO platform architecture through multiple OGC Innovation Program initiatives (see Figure 4). Further initiatives — such as OGC Testbed-17 and the OGC Disasters Pilot 2021 — as well as the OGC EO Exploitation Platform Domain Working Group (DWG) will continue to improve and define the architecture in more detail.

Figure 4 — OGC Innovation Program initiatives that drove the development towards an interoperable Applications-to-the-Data EO platform architecture

At present, it is unclear which set of standards and specifications the OGC defines or recommends for realizing an interoperable EO platform. Defining this set of standards and specifications is among the key activities of the OGC EO Exploitation Platform DWG. The working group has started writing an OGC Best Practice document for Earth Observation Application Package implementation (OGC 20-089). The document will focus on the application package (concept), but will also address the deployment viewpoint, i.e. how to deploy a package within a platform.

Relevant standards, specifications, and technologies appear to be, but are not limited to:

  • Common Workflow Language (CWL) — To describe an application package (input and output parameters, invocation) as well as to define complex application workflows.

  • Spatio Temporal Asset Catalog (STAC) — Used as a data manifest for application input and output metadata.

  • Docker — A Docker container encapsulates the implementation of an application and can be executed in different cloud environments. A container registry (such as Docker Hub) provides a common means to store and download container images. A container can potentially also be built ad hoc, in order to mitigate security concerns when downloading and executing pre-built third-party applications.

  • OGC API — Processes and OGC Web Processing Service (WPS) — Provide a standard interface for deploying and executing an application. Deployment requires the transactional extension of the API / service. The interface is used by both the Execution Management Service (EMS) and the Application Deployment and Execution Service (ADES), which may require further profiling of the generic processing interface to fully specify the specific interactions and functions of the two services.

  • OGC OpenSearch, with the Geo & Time and EO extensions — For discovery and cataloguing.
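
To illustrate how these building blocks fit together, the following sketch constructs an execute request body for a deployed application, following the JSON structure of OGC API — Processes — Part 1: Core. The process identifier, input names, values, and endpoint are hypothetical placeholders, not part of any real deployment.

```python
import json

# Hypothetical identifier of a deployed application package (an ADES process).
process_id = "ndvi-computation"

# Execute request body per OGC API - Processes - Part 1: Core.
# Input names and values are illustrative only.
execute_request = {
    "inputs": {
        # Hypothetical STAC item URL acting as the data manifest for the input.
        "stac_items": "https://example.org/stac/items/S2A_20210101",
        "aoi": {"bbox": [5.0, 45.0, 6.0, 46.0]},
    },
    "outputs": {
        "result": {"format": {"mediaType": "application/geo+json"}}
    },
}

body = json.dumps(execute_request)
# The body would be POSTed to the process execution endpoint, e.g.:
endpoint = f"https://example.org/processes/{process_id}/execution"
print(endpoint)
```

In a full Applications-to-the-Data workflow, the EMS would forward such a request to an ADES running close to the data, which in turn launches the containerized application described by the application package.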

5.2.5.  Cloud-Based Data Access and Processing

Compared to EO data management and analysis platforms, cloud-based systems represent a more flexible realization of the ‘moving-code paradigm’. They provide scalable and effective access to and processing of EO data in the cloud and eliminate the need to download massive amounts of data. While intermediate users are offered a higher degree of functionality compared to EO data platforms, pre-defined data selections often remain. On the other hand, cloud-based systems demand deeper technical knowledge of system configuration, which poses a challenge to many EO data users.

Many large data organizations are currently either moving their data holdings to commercial cloud providers (e.g. NASA), or are in the process of implementing their own cloud service (e.g. ECMWF). In the latter scenario, data providers become cloud service providers at the same time. Nevertheless, cloud-based services also differ in their functionalities and specialization. There are general cloud services from commercial providers, e.g. Amazon Web Services or Google Cloud Platform, offering high flexibility but demanding substantial technical knowledge from users. With the paradigm change in EO, more specialized cloud services tailored for EO access and management are being implemented, such as the Copernicus Data and Information Access Services (DIAS) or the European Weather Cloud implemented by ECMWF / EUMETSAT. They offer the same flexibility as more general cloud services in terms of virtual machine specification, but provide simplified access to EO data holdings as well as community-specific tools that facilitate EO data processing.

Instead of downloading EO data and processing it on local machines, intermediate users specify a virtual machine in the cloud, access and process the data with this virtual machine, and only transfer intermediate or end results to their local machine. This represents a true paradigm change in how users access and process EO data. However, it also brings challenges on different levels, including insufficient expertise in cloud-based systems, general skepticism about several aspects of cloud security, and limited transparency of the costs that processing may incur. For these reasons, strong efforts have to be undertaken to strengthen capacities in cloud-based services in general, while building up trust in the security of cloud services and a good understanding of emerging processing costs.

Figure 5 — Cloud-based data access and processing

5.2.6.  Future: Federations and Interoperability of Cloud-Based Systems

A natural future evolution of cloud-based EO systems is increased interaction between these systems, sharing data and processing in order to support the information needs of users. Thus, cloud federations will emerge, providing the ability to flexibly share data, applications, and processing resources between multiple cloud-based systems.

Figure 6 — Federations of cloud-based systems

A number of challenges will have to be met in order to achieve federations of cloud-based systems, for example reaching a suitable level of interoperability and portability of cloud services and components, as well as being able to control and optimize costs when working within a federated system.

The European Open Science Cloud (EOSC) is an initiative that aims to build a federation of cloud services from both the public and the private sector, in order to enable the re-use of (research) data across borders and scientific disciplines. This is only possible if specific standards and guidelines are put in place for users and cloud-system providers. Users have to trust the data and services that are made available for re-use. For this reason, a central activity for EOSC is to define ‘rules of participation’ that enable users to assess the quality of the data and services offered in federated cloud systems (Bicarregui [2]). Another example of a federated system is WEkEO, which connects Copernicus data servers as well as processing services from different entities, for example the Earth Observation Data Centre (EODC) for Water Resources Monitoring, which provides compute and storage resources as well as access to high performance computing facilities.

NOTE  The NIST Cloud Federation Reference Architecture (CFRA) identifies necessary and possible functions and capabilities to support general federations (of clouds), and organizes them into a reference architecture, without dictating the use and implementation of these functions and capabilities. Furthermore, the CFRA identifies a range of deployment and governance models. The CFRA can be helpful in getting a better understanding of the design of cloud federations in general, and understanding the nature of particular federated systems such as the EOSC.

6.  Current Status

This chapter describes the current status regarding common aspects of EO system architectures. It synthesizes results of the web meetings with satellite data providers as well as cloud service providers.

6.1.  Data Coverage and Transmission to the Ground

When talking about big EO data, particularly remote sensing data produced by satellites, it is useful to know the relative magnitude of the data, which regions of the globe are typically covered, and how that data is actually delivered from satellites down to Earth. This section covers these aspects.

Satellite data providers receive tens of Terabytes of new EO data every day (e.g. NASA EOSDIS: ~50TB, Maxar: ~80TB), and their EO data archives are in the Petabyte range (e.g. NASA EOSDIS: ~56PB, Maxar: ~120PB). The satellites can typically cover the whole globe, although commercial providers tend to focus more on inhabited regions of the world, and less on the poles.

The data is transmitted to ground stations, which are mostly self-owned, but in some cases also rented. It was reported that making new EO data available to users — including transmitting the data to the ground and processing it — usually takes a few days (e.g. for Maxar: less than 2 days). For certain high-priority user groups, such as the military or in case of a disaster, that time may be drastically reduced (e.g. for Maxar: less than an hour). Emerging technologies such as in-orbit communications can help reduce the time it takes to deliver collected data to the ground — and thus the total delivery time — even further.

NOTE  The market of rentable ground stations has grown over the last decade, with companies such as Amazon, KSAT, Microsoft, and startups offering ground stations around the globe for establishing communication links with satellites. In addition, a market for in-orbit communication is developing (during the interviews, Analytical Space was mentioned as one example of a company in this field) — establishing communication networks in space to further increase the time that satellites are connected to the ground, and the average downlink data rate.

One cloud service provider reported that the antennas of rentable ground stations are directly located on top of their data centers. That means that the ground stations are situated close to the cloud environment. Bandwidth limitations for transmitting data — e.g. from the ground station owned by a satellite provider — into the cloud can thus be circumvented. Ground-based transmission would no longer be needed. Instead, the downlink would be directly into the cloud. In other cases, i.e. where there is sufficient bandwidth for transmitting data from the ground station into the cloud, HTTP based streaming directly into cloud object storage was mentioned as an adequate approach.

6.2.  Data Storage

As mentioned in the Introduction, vast amounts of EO data are being produced every day. Managing the growing flood of EO data within their own IT infrastructure is increasingly challenging for EO data providers. Cloud service providers offer a way out of this dilemma, since their cloud environments provide access to a whole network of data centers, with unprecedented levels of storage and processing capabilities. This section documents the results from meetings with both satellite data and cloud service providers on the topic of EO data storage.

As described in the previous section, satellite data providers nowadays manage up to hundreds of petabytes of EO data. It is increasingly challenging to support self-hosted IT infrastructure for such an amount of data. Satellite data providers therefore tend to move their data into the cloud. Planet Labs even started using the cloud for data storage right when the company was founded.

Maxar, for example, switched to the cloud in 2017, because their data center simply could not be expanded anymore, and an internal evaluation of cloud environments revealed that moving to the cloud would provide clear cost savings. Furthermore, the move enabled Maxar to put their focus back on satellite management, leaving the data center management to the cloud provider.

NASA’s Earth Observing System Data and Information System (EOSDIS) is migrating to the cloud as well. EOSDIS Distributed Active Archive Centers (DAACs), located throughout the United States, process, archive, document, and distribute data from NASA’s past and current EO satellites and field measurement programs. One by one, these centers are moving their operations into the cloud. A major reason for migrating to the cloud is the expected growth of data, due to new high data volume missions such as Surface Water Ocean Topography (SWOT) and NASA-Indian Space Research Organisation Synthetic Aperture Radar (NISAR). The annual growth of data is expected to be 50PB in 2022, with the archive size growing to more than 246PB by 2025 — see Figure 7.

Figure 7 — Historic and projected EOSDIS archive size and annual growth (source: https://nasadaacs.eos.nasa.gov/eosdis/cloud-evolution)

Moving the EOSDIS data archive to the cloud is expected to improve management and accessibility of the data, and enables users to bring their applications (close) to the data by executing these applications in the cloud. The aim is that users do not need to download data (e.g. satellite images) at all; instead, the analysis is done directly in the cloud, and only final results would be downloaded.

NRCan is also faced with a significant increase in data volume: while historic EO data currently amounts to ~3PB of data, the recently launched RADARSAT Constellation Mission (RCM) leads to an increased influx of data. The data archive size is expected to grow to ~10PB — exceeding the current storage capacity of the existing system (which is ~5PB). At NRCan, plans were therefore set in motion to migrate their EO Data Management System (EODMS) — an archiving and discovery system for the Government of Canada’s EO data — to cloud-based storage (and processing). Within the OGC Earth Observation Applications Pilot project (conducted in 2020), an extension of EODMS was developed that runs in the cloud. Within the next 3 to 6 years, the entire EODMS architecture is planned to be migrated to the cloud. A major goal thereby is to reduce the need for users to download EO data and to encourage them to process the data directly in the cloud. This includes the production of value-added data, which is expected to be available faster with cloud-based processing. Reducing costs — for current data centric operations of EODMS — is another major reason for moving EODMS operations to the cloud. Last but not least, service uptime is expected to be better using the cloud, compared to current data center operations.

The migration of the historic data into the cloud can, for example, be done using mobile units with very large storage capacities, for example the AWS Snowmobile (up to 100PB per unit). These units collect the data from the provider’s data centers and deliver it to one of the cloud provider’s data nodes, circumventing the alternative of uploading the data via an internet connection (which, for transferring 100PB of data, would take roughly 25 years with an upload speed of 1000Mbit/s, and roughly 250 years with an upload speed of 100Mbit/s).
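
Transfer time over a network link follows from simple arithmetic; the snippet below estimates it, assuming decimal units (1PB = 10^15 bytes) and a fully utilized link.

```python
def transfer_years(petabytes: float, mbit_per_s: float) -> float:
    """Time to upload the given data volume over the given link, in years."""
    bits = petabytes * 1e15 * 8          # decimal petabytes -> bits
    seconds = bits / (mbit_per_s * 1e6)  # link rate given in Mbit/s
    return seconds / (365 * 24 * 3600)   # seconds -> years

print(round(transfer_years(100, 1000)))  # 100PB at 1 Gbit/s: ~25 years
print(round(transfer_years(100, 100)))   # 100PB at 100 Mbit/s: ~250 years
```

These figures ignore protocol overhead and link contention, so real transfers would take even longer — which is why physical data transport devices are attractive at the petabyte scale.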

NOTE 1  The AWS Snowmobile is a member of the AWS Snow Family of devices which can be used to collect and process data at the edge, and migrate data into and out of AWS. Different devices with varying storage volumes are available.

Cloud environments that provide services around the globe are typically structured into multiple regions. Data is usually transmitted into the cloud within a specific region. Processing performed on the data within that region is said to be done close to the data. However, if processing is done in other regions, or data is requested from other regions, delays can occur due to the time it takes to transmit data between the two regions. Some cloud service providers therefore support replication of data across cloud regions. Cross Region Replication (CRR) allows data providers to replicate their data so that it is available as backup or for hot applications in multiple regions.
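
As an illustration, Cross Region Replication on Amazon S3 is driven by a replication configuration document attached to the source bucket. The sketch below builds such a configuration as a plain dictionary; the bucket names, role ARN, and prefix are hypothetical placeholders, and with boto3 the document would be applied via put_bucket_replication.

```python
# Hypothetical S3 Cross Region Replication configuration (sketch only).
# The IAM role ARN, rule ID, prefix, and destination bucket are placeholders.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/replication-role",
    "Rules": [
        {
            "ID": "replicate-eo-archive",
            "Status": "Enabled",
            "Priority": 1,
            # Replicate only a subset, e.g. one imagery collection.
            "Filter": {"Prefix": "sentinel-2/"},
            "Destination": {
                # Destination bucket resides in another cloud region.
                "Bucket": "arn:aws:s3:::eo-archive-replica-eu-west-1",
                "StorageClass": "STANDARD",
            },
            "DeleteMarkerReplication": {"Status": "Disabled"},
        }
    ],
}
```

Scoping replication with a prefix filter, as sketched here, is one way to keep only disaster-relevant collections hot in multiple regions while avoiding unnecessary duplication of the full archive.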

NOTE 2  This kind of replication may be useful for disaster situations. If, during the disaster preparation phase, certain data is identified as relevant for a disaster response, it could be replicated/loaded onto edge storage devices. When a disaster occurs, these devices could then immediately be (physically) moved to the disaster area — to be used during the disaster response and recovery phases.

NOTE 3  When considering the management of EO data in the cloud, some stakeholders cautioned that the CO2 footprint of doing so should be taken into account. One motivation for the WEkEO platform, for example, was to optimize the CO2 footprint by re-using existing data and infrastructure instead of duplicating it. Duplication may be necessary for a data provider to avoid data loss in case of system failures or disaster, as well as for operational needs of applications. However, unnecessary duplication of data should be avoided.

With respect to actual (meta-)data formats as well as open or proprietary software and standards for storing EO data, interviewees responded as follows:

  • Many satellite data providers use Cloud Optimized GeoTIFF (COG) for storing satellite imagery data in the cloud. In addition, plain GeoTIFF, JPEG 2000 (JP2), the National Imagery Transmission Format (NITF), and the Network Common Data Form, version 4 (netCDF-4) were mentioned.

  • Planet Labs primarily uses JPEG 2000 encoding when transmitting raw satellite imagery to the ground, and also for long-term storage of that data. Cloud Optimized GeoTIFF is produced on the fly, upon user request. Caching mechanisms (including user-controlled caching, via a new system/service) are in place to reduce the amount of processing required for the conversion from JP2 to COG. The preference for JP2 is due to the significantly smaller file size compared to COG. It was reported that a 2-3 GB JP2-encoded satellite image would result in a ~70 GB COG.

  • Maxar reported that they are developing standards to support their AI / ML applications. Metadata such as spatial resolution and other quality parameters are needed in order to advertise and determine the kind of imagery that these applications can use. When deemed useful for the wider community, this work is being brought to official standards organizations such as the OGC.

6.3.  Data Discovery

Since an enormous amount of historic EO data already exists, and more EO data is produced each day, being able to search for and discover relevant data is a critical function of any EO system.

Satellite data providers typically have some kind of portal that includes discovery functions. Maxar, for example, offers two options: A data catalogue website, where users can search (using a spatio-temporal bounding box), display, and browse images (with resolution reduced to 15m), as well as an online platform for registered users (SecureWatch), which provides direct, on-demand access to all EO data at the highest resolution. The Earth Observation Data Management System (EODMS) is NRCan’s portal to discover and download satellite data. ESA offers EO-CAT for browsing the metadata and images of EO data acquired by various satellites, as well as Earth Online — an EO information discovery platform. NASA has several discovery tools and services, such as Worldview, OPeNDAP servers, Earthdata Search, the Global Change Master Directory (GCMD), and the International Directory Network (IDN) — the latter three of which use the Common Metadata Repository (CMR) as backend.

NASA’s Common Metadata Repository (CMR) is a high-performance, high-quality, continuously evolving metadata system that catalogs all data and service metadata records for NASA’s Earth Observing System Data and Information System (EOSDIS) and will be the authoritative management system for all EOSDIS metadata. These metadata records are registered, modified, discovered, and accessed through programmatic interfaces leveraging standard protocols and APIs.

— source: https://nasadaacs.eos.nasa.gov/cmr

The CMR supports a range of metadata formats as input (e.g. ISO 19115), which are mapped to (and from) core metadata elements defined by the Unified Metadata Model (UMM), and a number of APIs and formats for access (e.g. OpenSearch, STAC, and CMR Rest APIs).

The Planet Explorer is Planet Labs’ GUI-based online tool for searching and analyzing geospatial imagery. In addition, Planet Labs supports a custom RESTful API for searching their complete imagery catalog. The API was built to handle over 1 billion metadata records. It was emphasized that such a number of records poses unique engineering challenges for building a performant, reliable, and scalable discovery system.

Metadata plays an important role for discovering relevant datasets. Interviewees emphasized that it is still difficult to discover new data sources for a given area. Catalogue formats and services could be of use, especially federations of catalogue services. The Spatio Temporal Asset Catalog (STAC) specification — a “common language to describe a range of geospatial information, so it can more easily be indexed and discovered” (source: https://stacspec.org/) — was often mentioned for discovery of satellite imagery within cloud object storage. Centralized access portals, such as the GEOSS Portal — with its discovery component, the GEO Discovery and Access Broker — are also helpful, as they aggregate metadata from multiple EO systems. Open source tools, such as GeoNetwork opensource and Magda, support setting up catalog applications. It would even be possible to set up a database in the cloud to index available data, and make the database publicly available. These approaches are all service driven, i.e. a user has a single entry point that is used to discover available data. A slightly different approach would be to have standardized web APIs that provide access to metadata about available resources (e.g. EO data), and let web crawlers harvest the information. OGC API — Records, which is currently in development at OGC, could be a candidate for such an API. Users could then query for relevant information directly in web search engines such as Google Search.
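
As a simple illustration of spatio-temporal discovery, the snippet below constructs (but does not send) a search request against a hypothetical STAC API endpoint; the bbox, datetime, and collections query parameters follow the STAC API specification, while the endpoint and collection identifier are placeholders.

```python
from urllib.parse import urlencode

# Hypothetical STAC API endpoint; parameter names follow the STAC API spec.
endpoint = "https://example.org/stac/search"
params = {
    "bbox": "5.0,45.0,6.0,46.0",  # lon/lat bounding box of the area of interest
    "datetime": "2021-06-01T00:00:00Z/2021-06-30T23:59:59Z",  # time interval
    "collections": "sentinel-2-l2a",  # hypothetical collection identifier
    "limit": 10,
}
search_url = f"{endpoint}?{urlencode(params)}"
print(search_url)
```

A client would issue this GET request (or POST the parameters as JSON) and receive a GeoJSON FeatureCollection of STAC Items, each pointing to the underlying assets in cloud object storage.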

However, even if data sources are discovered based on matching spatio-temporal criteria, their metadata may not provide the information needed for actually making use of the data. In order for applications to process some input data, that data needs to fulfill the requirements of the application. A satellite data provider emphasized that their AI / ML algorithms require detailed information before being able to use an unknown data source, such as information on the actual content (e.g. observed variables), the kind of processing that has been applied, as well as data quality (spatial and spectral errors). While a multitude of metadata formats already exists today (e.g. ISO 19115, ISO 19157, ISO 19165, the Spatio Temporal Asset Catalog (STAC)), potentially with a number of profiles and extensions, there does not appear to be a definite standard framework of definitions for the type of information the aforementioned algorithms require. This challenge is described in more detail in the section titled Lack of a Standard Framework for Specific Quality Factors for EO Data.

NOTE  One satellite data provider mentioned that users currently do not ask for specific metadata when data is requested to support a disaster situation. Getting all potentially relevant data seems to be sufficient. Especially for mapping purposes and visual analytics, that may well be the case. The more automated disaster workflows become in the future, the greater the role standardization and metadata will play.

6.4.  Data Processing

Within the EO Cloud Platform architecture, EO data (pre-) processing is a key functionality required to support user applications. The raw data is typically processed multiple times, thus deriving higher-level information that enables decision making. This section documents the type of processing currently performed by satellite data providers, the software and tools used, and the processing challenges they are facing.

Satellite data providers typically pre-process the raw satellite imagery, for example doing orthorectification, band selection, and radiometric as well as atmospheric corrections. In rare cases, users do their own preprocessing to get better results (e.g. using better elevation models or ground control points). Multispectral data is typically requested by data scientists as well as value-added resellers and partners, while many users simply request RGB imagery. In specific cases, in-house analytics can also be done by experts from the satellite provider, on behalf of specific customers. For example, in a disaster situation, Maxar does forest fire detection using AI / ML techniques.

Some satellite data providers actively engage in the implementation of edge computing techniques. Edge computing allows the execution of pre-processing or even feature detection directly on the satellite, before the data is transmitted to Earth. This can help reduce the latency with which useful information is produced. It can also support the provisioning of actionable intelligence in disaster situations with low connectivity on the ground (for further details, see the section titled Delivering EO Data to Poorly Connected Regions).

During the interviews, satellite data providers were also asked if they allow users to upload (containerized) applications into their systems, to process their EO data. Some satellite data providers have tested this approach, and some users (e.g. the military) already make use of it. For example, the Copernicus Data and Information Access Services (DIAS) offer processing using containerized applications close to the data provided by ESA’s Sentinel satellite missions. Docker was mentioned as an application container, for example encapsulating AI / ML applications.

However, some providers refrain from executing processing on behalf of users. They argue that users can build their own processing environment in the cloud, close to the data offered by the provider. Typically, that means using the same cloud (and region) in which the data is stored. However, one satellite provider mentioned that they can also bring the data to the user’s preferred cloud or processing environment. It is unclear which costs that approach would incur (for data transfer and storage in another environment), but apparently some users prefer to have this option.

One interviewee said that the likely reason why most users have not yet bought into the new applications-to-the-data paradigm is that, so far, intermediate users were responsible for providing value-added services. Others also noted that users may have high performance computing (HPC) facilities (potentially shared with other users), which they intend to continue using for EO data processing. In fact, one interviewee raised the concern that costs might significantly increase when shifting (their model processes) to cloud-based processing, though a detailed analysis of actual costs would be needed to verify this concern. Nevertheless, most satellite data providers regard the new paradigm to be the future.

NASA EOSDIS is already actively engaged in moving EO data as well as processing to the cloud. For example, the Harmony project implemented data transformation services in the cloud (e.g. for subsetting, regridding, and reprojecting data) — allowing the production of analysis-ready data (ARD) on the fly for the user (Blumenfeld [3]). With data being available in ARD form, data can directly be processed in the cloud — rather than having to download the data first. NASA EOSDIS is using and supporting an ecosystem of tools that allow for high performance analysis in the cloud, in particular the software packages promoted by Pangeo — a community platform for Big Data geoscience — such as Xarray, Dask, and Jupyter. NASA EOSDIS has a strong commitment to capacity building and education, with the goal to enable data scientists to perform EO data analysis in the cloud.

NOTE  The NASA Earth Data portal has a special section with learning material, including webinars, tutorials, and articles.

Currently, the cloud environment of NASA EOSDIS does not allow users to upload their analytic applications directly into this environment. Instead, users need to build up their processing system in their own cloud accounts.

The following challenges related to processing of EO data were reported during the interviews:

  • When satellite imagery is used to perform visual analysis, color changes from one image to another, due to atmospheric changes between the collection times, pose a challenge. Atmospheric compensation helps to a certain extent (especially for images taken by the same sensor), but interviewees requested that the industry develop a standard way of visually representing satellite images taken by different sensors.

  • Another challenging use case is automated analysis based on inference or information extraction. Well-defined quality metadata is needed in order to provide useful results. Take information about spatial errors as an example: every satellite has its own spatial error, and small satellites can potentially be off by 100 meters in absolute position. How can data from multiple satellites be co-registered (combined) if the spatial errors are not well defined? The spatial errors of the resulting product would be undefined. However, the precision of absolute positions is relevant for disaster response, for example to route resources to a certain location. With well-defined information on spatial errors in input data, algorithms that derive new information from that data should be able to compute an estimated error for a derived location. The resulting error value would be useful for decision makers. For example, an error of 100 meters may not be critical in flat, open land, but could make a huge difference in a metropolitan area, especially in a disaster scenario. In that case, first responders need to have the means to understand how accurate, and thus trustworthy, some data is. Decision making based on EO data often leverages higher-level information derived from other EO data by some kind of process (possibly a chain of processes). In order to support auditing requirements, the lineage / provenance of derived data must be available (and kept for a certain period of time, e.g. 10 years). If sufficient metadata is kept, derived data can even be reproduced. Keeping sufficient metadata records to support auditing is an important operational, and challenging, aspect for any processing system that produces higher-level information for decision makers.
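
As one illustration of how well-defined spatial errors could be propagated, the sketch below combines the absolute positional errors of co-registered inputs in quadrature. This root-sum-of-squares rule is a common first-order assumption for independent errors, not a method prescribed by any of the interviewed providers, and the error values used are hypothetical.

```python
import math

def combined_spatial_error(errors_m):
    """Root-sum-of-squares combination of independent absolute positional
    errors, all given in meters."""
    return math.sqrt(sum(e * e for e in errors_m))

# Hypothetical inputs: a small satellite with ~100 m absolute error,
# co-registered with a well-calibrated sensor with ~5 m error.
error = combined_spatial_error([100.0, 5.0])
print(round(error, 1))  # result is dominated by the less accurate input
```

Even such a simple propagation rule already conveys the key operational point above: a product derived from a poorly georeferenced input inherits an error budget that decision makers can see and weigh.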

6.5.  Data Access

With EO data having been produced, processed, and stored by the provider, and discovered by the user, we will now summarize how users are currently able to access EO data.

6.5.1.  Cloud Optimized, Scalable Access

Today, satellite imagery stored in the cloud is often formatted using cloud-optimized formats such as Cloud Optimized GeoTIFF (COG) and Zarr. These formats allow parallel reads and writes, as well as selective access — i.e. reading only those parts of the whole dataset that are relevant for processing (analysis, visualization).

Some satellite providers have converted all their data to these formats. One provider only does so on-demand (for user access), preferring to store data in formats that require less space (e.g. JPEG 2000). In cases where data is already stored in formats such as netCDF4 or HDF5, or in case there is a mandate for the production of such files, new approaches for efficiently reading such data in the cloud need to be developed. Signell [12] describes a potential solution (with potential improvements described in the section titled Cloud Optimized Data Formats).

6.5.3.  IT Interfaces, Software, and Standards

Satellite providers usually support multiple IT interfaces and standards — both open and proprietary — to access their data. Maxar, for example, provides multi-spectral imagery data to users who want to perform their own analytic processing via Amazon Simple Storage Service (Amazon S3) buckets. Maxar also supports common OGC interface standards, such as WMS, WMTS, WCS, and WFS. Furthermore, they offer an API that supports integration into standard GIS software (e.g. ArcGIS), for example by directly streaming imagery data into this software. At NASA EOSDIS, international standards are being used as well (e.g. ISO metadata standards, and a variant of the OGC API — EDR), in addition to self-developed APIs and off-the-shelf GIS software. The situation is similar for the NRCan EODMS, which supports a number of OGC service interfaces (CSW, WCS, WMS) as well as OpenSearch and a custom-built API. Planet Labs also supports open standards, and uses open-source software when applicable (e.g. PostgreSQL). The WEkEO platform has developed a custom API, the Harmonized Data Access (HDA) API, which allows uniform, harmonized access to the whole catalogue of WEkEO data sources.

6.5.4.  Accessing Data Within Resource Constrained Environments

Due to the large and increasing volume of EO data, accessing this data within a resource constrained environment can be challenging. “Resource” here primarily refers to communication links, although processing capacity also plays a role. In regions with low or no internet connectivity — which may be caused by a disaster — it is difficult or even impossible to download large quantities of satellite imagery over the internet. Some satellite providers said that they solve this issue by delivering satellite images on hard drives. This is further explained in the section titled Delivering EO Data to Poorly Connected Regions. Others noted that using the cloud for both storage and processing is a good solution when dealing with bandwidth limitations, since the data can be condensed inside the cloud to a minimal set with highly relevant information, resulting in a small amount of data that can be downloaded over available communication links (perhaps even via the emerging satellite internet).

NOTE  Cloud optimized data formats allow accessing only relevant portions of the data, for example the part of an image that intersects with a disaster area. This can help reduce the amount of data that needs to be accessed and processed. Nevertheless, the data that remains for download may still be huge.

6.6.  Security

Within IT systems in general, and EO systems in particular, security is an important — and cross-cutting — concern. When data is transmitted, in use, or at rest, a real-world system must ensure that the data is secure. The goal is to ensure data confidentiality as well as integrity and availability of the data and the system itself. Cloud service providers specialize in providing customers with all means necessary to achieve this goal. The interviews revealed that satellite data providers which host their data in the cloud have great confidence in cloud environments being secure.

Here are some noteworthy results from the interviews on the topic of security:

  • Once the data leaves the system, security can no longer be guaranteed, as one interviewee reported with specific regard to data integrity. Data integrity concepts typically stop at the edge of the system boundary.

  • A satellite data provider remarked that they were required by law to keep all satellite data that is collected. The cloud environment provides multiple backups of data, in separate physical locations. The satellite provider further noted that whenever new data is derived from other datasets (e.g. the original satellite data), extensive metadata and provenance information is kept — which allows the derived product to be re-created if necessary. Taken together, this leads the satellite data provider to consider the risk of losing data hosted in the cloud to be very low.

Many times, OGC Innovation Program initiatives have analyzed, developed, and tested security mechanisms and technologies. OGC Testbed-16 investigated federated security as well as data centric security. The OGC Testbed-16 Federated Security Engineering Report (OGC 20-027) [9] analyzes aspects of security and trust in a federated computing environment, as defined in the NIST Cloud Federation Reference Architecture. Possible approaches covered by the report range from traditional methods for securing basic communications between federated entities, to the use of emerging security technologies such as trust frameworks, blockchain, and zero trust architectures. The OGC Testbed-16 Data Centric Security Engineering Report (OGC 20-021r1) [8], on the other hand, is specifically concerned with securing data itself, ensuring data confidentiality and integrity even in case that data leaves the system of the data provider. Both reports are relevant for securing cloud-based systems.

Data Centric Security (DCS) is about encrypting the data itself whenever it may be stored outside of a fully trusted environment, and controlling access to certain parts of the data, using role-based authentication and access. Examples of storage without full trust include cloud environments provided by an external service provider, as well as mobile devices. One scenario investigated in Testbed-16 assumes that a user operates in an offline mode, disconnected from the network — which is an important assumption for disaster response operations. Protected data required for a “mission” is stored (in encrypted form) on a mobile device for later field use. DCS mechanisms ensure that access to the data will expire at some point in the future (suitable for mission requirements), and that access to the data is limited based on user roles (e.g. the mission commander may have access to more sensitive information).
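The core DCS ideas in this scenario (payload encryption, per-role keys, and expiry) can be sketched in a few lines. The cipher below is a toy keystream built from SHA-256 and is for illustration only; a real DCS implementation would use vetted authenticated encryption (e.g. AES-GCM) and a proper key-management infrastructure, and the package layout shown is hypothetical:

```python
import hashlib, time

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.
    Symmetric, so the same call encrypts and decrypts. NOT for real use."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i+32], block))
    return bytes(out)

# Hypothetical mission package: per-role keys plus an expiry timestamp.
ROLE_KEYS = {"field": b"field-key", "commander": b"commander-key"}
package = {
    "expires": time.time() + 3600,  # access valid for one hour
    "payload": {role: keystream_xor(key, b"sensitive feature data")
                for role, key in ROLE_KEYS.items()},
}

def open_package(package, role, key):
    """Decrypt the role's payload, refusing access after expiry."""
    if time.time() > package["expires"]:
        raise PermissionError("access expired")
    return keystream_xor(key, package["payload"][role])

print(open_package(package, "field", ROLE_KEYS["field"]))  # b'sensitive feature data'
```

Because the protection travels with the data itself, the package remains controlled even when copied onto a disconnected mobile device in the field.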

NOTE  The DCS work in OGC Testbed-16 only investigated GML and GeoJSON formatted feature data. Geospatial payloads such as maps, tiles, and coverages (e.g. in GeoTIFF, and GMLJP2 format) are marked as future work. OGC Testbed-17 will address DCS for these additional payloads (even GeoPackage is mentioned). It would be worthwhile addressing this topic in the OGC Disaster Pilot 2021 as well, leveraging (early) results from Testbed-17 if possible.

National and international laws can have an impact on data confidentiality. National laws may force a company, such as a cloud service provider, with business dependency in that nation to disclose customer data (e.g. because the customer broke the law, or for reasons of national security) — potentially even if the data is stored on servers outside of that nation. Such laws may be in conflict with laws of other countries, creating an unclear situation for everyone involved (especially cloud service providers and data owners). Celeste and Fabbrini [6] describe this issue in more detail. It is unclear if data centric security can help solve this issue, without compromising the key concepts behind the Applications-to-the-Data paradigm, and its realization within the EO Cloud Platform architecture.

The applications-to-the-data paradigm also raises security concerns regarding user applications that are executed within the confines of a data provider. This issue has been investigated to a certain extent within the OGC Earth Observation Applications Pilot (for further details, see section 5.8 of OGC 20-073 [7]).

7.  Future Evolution

7.1.  Towards the Production of Higher Value, “Living” Data

Satellite data providers intend to increase the number of satellites. Areas of the earth will be covered with a higher frequency, with different sensors that produce different types of information, which can help to fill important gaps within currently available data.

The amount of generated EO data will increase exponentially. Cloud environments are expected to support handling the increased influx of data. In addition to increasing the overall coverage, satellite data providers also plan to decrease the latency with which data becomes available for decision making (e.g. at Maxar from ~60 minutes down to tens of minutes). AI / ML techniques as well as Edge Computing and the New Space paradigm were mentioned as enabling technologies.

NOTE 1  OGC has held a workshop on the topic of “New Space” in 2020. The workshop program, as well as links to recordings and a report of the event is available at https://www.ogc.org/ogcevents/new-space.

Satellite data providers will thus be able to shift from primarily supporting mapping projects to supporting the mapping of the earth, the detection of relevant changes on the earth, and the provision of inputs for computational models — and that shift has already started. Users will be able to work with living datasets. For example, rather than working with building data, i.e. the state of a building as currently known, a user will be able to work with changing building data — subscribing to data updates that will frequently be provided (with the help of satellite imagery and change detection algorithms). Similarly, data scientists will be able to make use of high-volume, n-dimensional data, for example from NASA satellites.

NOTE 2  When the focus of processing EO data shifts from making base products available, to actually analyzing the data in various ways to derive new information (e.g. land cover changes), workflows will become an important topic — together with workflow management, verification of workflow implementations, reproducibility, and auditing.

7.2.  Open vs. Proprietary

Unsurprisingly, satellite data providers use both open and proprietary standards and tools in their system — and intend to continue doing so in the future. Since customers are requesting adherence to open architectures, and since open frameworks as well as open source tools, systems, and platforms — e.g. deep learning frameworks for AI/ML applications (such as TensorFlow), and Kubernetes for orchestration of containerized applications (supporting automation of application deployment, scaling, and management) — have become available and are actively being developed and improved, “open” is being embraced more and more.

7.3.  Improving Interoperability

Many satellite data providers are members of global disaster programs, such as the International Charter — Space and Major Disasters. Surprisingly, these programs do not appear to define specific technical requirements regarding formats and interfaces for accessing EO data during a disaster response, but rather provide a network to bring disaster response and data providers together.

In case of a disaster, data requests are sent out to program members. The members then typically make data of the disaster area available, beginning with the latest data before the disaster event, followed by new data generated during the program activation. Maxar, for example, publishes the data on their open data portal, which supports subscriptions to new data, allowing relevant disaster response stakeholders to be notified as soon as new data has been published on the portal.

While EO data is being made available for use within a disaster response, the data provider chooses the data formats as well as the interfaces via which the data is published. This may complicate the integration of data from multiple sources. A common framework of standards and specifications for accessing EO data would help to improve the situation. The OGC can play an important role in developing such a framework.

More and more, satellite data providers strive to integrate data from multiple sources themselves, in order to derive new and valuable information of interest to users. Maxar, for example, is pushing the implementation of AI/ML-based multi-sensor, multi-intelligence analytics. They use RADARSAT-2 (from another data provider) if the cloud cover above an area is too high to use their own satellites for the detection of fires or burnt areas. Another use case involves tapping into Automatic Identification System (AIS) signals emitted by ships, and combining this information with the results of ship detection algorithms that are run on satellite imagery, in order to identify “dark ships”, i.e. ships that have not activated AIS. It was noted that such integration scenarios also require a common framework of standards and specifications. Specific emphasis was put on the current lack of a framework for spatial and spectral error propagation (further details are given in [ref OGC 20-088]), as well as certain metrics that would support AI applications (use case: given some image metadata, it should be possible to tell which objects can be extracted from the image).
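The “dark ship” use case described above amounts to a spatial join: every ship detected in imagery that has no AIS report within some distance tolerance is flagged. A minimal sketch follows (the positions, the 500 m tolerance, and the matching logic are illustrative; a real system would also match in time and use proper geodetic distances):

```python
import math

def dist_m(a, b):
    """Approximate distance in meters between two (lat, lon) points,
    using an equirectangular approximation (adequate for short ranges)."""
    lat = math.radians((a[0] + b[0]) / 2)
    dy = (a[0] - b[0]) * 111_320          # meters per degree of latitude
    dx = (a[1] - b[1]) * 111_320 * math.cos(lat)
    return math.hypot(dx, dy)

def dark_ships(detections, ais_reports, tolerance_m=500):
    """Return detected positions with no AIS report within the tolerance."""
    return [d for d in detections
            if all(dist_m(d, a) > tolerance_m for a in ais_reports)]

detections = [(54.100, 7.900), (54.200, 8.100)]  # from imagery-based detection
ais = [(54.1001, 7.9002)]                        # AIS broadcast positions
print(dark_ships(detections, ais))  # [(54.2, 8.1)]
```

The first detection matches an AIS report a few meters away and is discarded; the second has no nearby report and is flagged as a potential dark ship.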

To summarize: There is a need for defining a framework of standards for encoding relevant EO data and metadata, as well as interfaces and formats for efficient and interoperable access to the data. Modern analytics on EO data — such as feature detection — require more detailed information about given data sources in order to work well and to create information that is useful for decision makers.

7.4.  Continued Migration to the Cloud

Some satellite data providers have only started to move their EO data and relevant operations into the cloud. These providers typically want to gather experience with the new (cloud) environment before fully committing to it, and need to avoid disruptions of their everyday operations. Consequently, plans are being made for the continued migration of their data into the cloud.

NASA EOSDIS, for example, started by moving the operations of NASA’s Global Hydrology Resource Center Distributed Active Archive Center (GHRC DAAC) into the cloud. Other DAACs will follow in the coming years (source: https://nasadaacs.eos.nasa.gov/eosdis/cloud-evolution).

NRCan also started moving to the cloud, prototyping an extension of their EO Data Management System (EODMS) that runs in a cloud environment. The migration of EODMS to the cloud is planned to be completed within the next 3 to 6 years.

EODMS users should have a similar experience when accessing data provided by EODMS as they have nowadays. However, the processing capabilities would improve drastically. While currently there are bottlenecks regarding the amount of data that can be stored and processed, processing in the cloud, close to the data, is expected to provide all the processing power that is needed. Whether EODMS users fully shift to processing EO data in the cloud (e.g. for executing common pre-processing tasks), or whether they keep using their already existing high performance processing infrastructure — only expanding into the cloud when the situation demands it — remains to be seen. Costs need to be considered — of local and cloud-based storage and processing, as well as a hybrid of the two. The topic of costs is further discussed in the section titled Controlling Cloud Costs.

A similar situation presents itself at the Department of Fisheries and Oceans Canada. A number of EO user groups still download EO data, for example from multiple space agency sites (such as ESA, NASA, NOAA, CNES, JAXA, USGS Earthexplorer, and EUMETSAT) and process it locally. Other groups have started to explore cloud computing. The Ocean Data and Information Section (ODIS) already migrated their processing to the cloud. While some user groups are reluctant to change their current workflows (involving local storage and processing), they acknowledge that large data transfers are an issue — which is one of the main arguments for adopting the applications to the data paradigm. The intention of ODIS is to support other groups fully migrating to the cloud as well. That is in line with Canada’s cloud-first strategy (for further details, see the Government of Canada White Paper: Data Sovereignty and Public Cloud).

7.5.  Expanding the Toolsets

A number of software tools have already been developed that allow for high performance analysis of EO data in the cloud. The software stack promoted by Pangeo is one example, the Sentinel Application Platform (SNAP) — part of the toolboxes on ESA’s Science Toolbox Exploitation Platform (STEP) — is another. Satellite data providers such as ESA, and platform providers such as WEkEO, are going to continue working with the EO user community to develop new and improve / evolve existing tools, toolboxes, and platforms, making it easier for users to work with the ever-growing amount of EO data.

8.  Recommendations

8.1.  Lack of a Standard Framework for Quality Factors for EO Data

Applications that perform analytics on satellite imagery (e.g. feature extraction) need a specific level of image quality. Quality factors include, but are not limited to: information on spatial and spectral errors (e.g. atmospheric correction and geolocation accuracy). A standard framework needs to be developed that provides clear definitions for these quality measures, to be included in image metadata ([ref OGC 20-088] provides further details). With such definitions contained in image metadata, applications can evaluate if the image data fits the needs of these applications.
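Once quality measures are standardized in image metadata, the fitness check described above becomes a simple comparison of metadata values against application requirements. A sketch with hypothetical metadata field names (no standard framework of quality measures exists yet, which is exactly the gap this section describes):

```python
# Hypothetical quality metadata as it might appear in a standardized record.
image_metadata = {
    "geolocation_accuracy_m": 12.0,  # absolute position error
    "cloud_cover_pct": 8.0,
    "atmospheric_correction": True,
}

# Minimum requirements of a hypothetical feature-extraction application.
app_requirements = {
    "geolocation_accuracy_m": ("<=", 15.0),
    "cloud_cover_pct": ("<=", 10.0),
    "atmospheric_correction": ("==", True),
}

OPS = {"<=": lambda a, b: a <= b, "==": lambda a, b: a == b}

def fits(metadata, requirements):
    """True if the image metadata satisfies every application requirement."""
    return all(key in metadata and OPS[op](metadata[key], limit)
               for key, (op, limit) in requirements.items())

print(fits(image_metadata, app_requirements))  # True
```

With agreed definitions for the metadata keys, such a check could run automatically across a catalog, immediately telling a disaster response manager which available datasets a given application can actually consume.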

NOTE 1  Certain EO communities may have developed — or are in the process of developing — specifications to define the quality measures that are of interest to them. For example, NASA ACCESS 2019 projects are advancing — amongst other topics — Machine Learning for Earth Science Data Systems, which will likely be concerned with quality factors for satellite data. Identifying and analyzing such specifications, as well as developing a standard framework of quality measures based on these specifications, is future work for OGC (as proposed by [ref OGC 20-088]).

The ability of an application to determine whether a given data source can be used or whether the data does not fulfill the minimal requirements is critical in disaster response scenarios. A disaster response manager may find multiple applications that would be useful to their work in a given crisis, but currently it is difficult to evaluate — fast and ideally in an automated fashion — which applications can actually be used with available datasets.

A particular EO Cloud platform may provide certain guarantees regarding the readiness of satellite data for analytics (supporting certain levels of preprocessing, quality information, and metadata to be available) which an application developed for that platform may rely upon. However, developing an application making use of data from multiple platforms is impeded due to the current lack of interoperability between EO Cloud platforms and a framework of standards regarding applications, services, and (meta-) data. OGC working groups such as the OGC EO Exploitation Platform DWG as well as the OGC Data Quality DWG would be good places to pursue the necessary standardization work. In fact, the charter of the OGC EO Exploitation Platform DWG mentions the definition of a standard set of metadata.

NOTE 2  The OGC Testbed-17 task “Federated Cloud Analytics” is going to devise a notional architecture for federated “processing to the data” systems, including a definition of the technologies and standards required to realize that architecture, also taking into account the concept of Analysis Ready Data (ARD). The outcome of that task may provide valuable input for overcoming the challenge described in this section.

8.2.  Cloud Interoperability

Ideally, EO Cloud Platform implementations support multiple cloud service environments. Using components that support open standards — both formats and APIs — so that the underlying cloud specifics can be hidden, would facilitate this goal, and lead to an interoperable system architecture.

Satellite data providers that moved their data into the cloud typically prefer working with a specific cloud provider. However, some of them also support other cloud environments (e.g. due to customer demand), and thus support a multi-cloud approach. For these providers, cloud vendor lock-in was not considered to be an issue. Others noted that cloud interoperability and portability are important topics. However, introducing levels of abstraction into the architecture to hide cloud specifics comes at the cost of not being able to profit from the benefits that cloud native services, for example, offer. At least for satellite imagery it was noted that cloud object storage, with HTTP based access to files using standard formats (such as [cloud-optimized] GeoTIFF), could quickly be ported to a different cloud service provider. In general, the use of open standards was considered to be helpful regarding system portability. Furthermore, by building a cloud-based system with small components (keyword: microservices), the expectation is that the portability of the system will increase.

NOTE 1  When moving data and processing to the cloud, the possibility of vendor lock-in should be considered — especially considering the ever increasing amount of EO data (a solution that works for current demands on data storage and processing may cease to be a solution when that demand has increased by a factor of ten or even a hundred). Interoperability and portability in cloud computing are important topics that have been investigated by many groups, e.g. ISO (ref ISO/IEC 19941:2017), IEEE and NIST. The web article https://insightaas.com/cloud-interoperability-and-portability-necessary-or-nice-to-have/ provides a good overview and introduction.

The needs of EO users should also be taken into account. Interoperability is seen as a key requirement. A federated system with different access APIs and data formats is difficult to handle for the common user. Therefore, the provision of higher-level information products, created by processing EO data (potentially using complex workflows), should be achieved using open standards — both formats and interfaces / APIs. A uniform access layer still seems to be a highly desirable achievement, also for EO systems based on cloud technology. Cloud-native services as well as custom APIs and tools could still be used for analyzing the data and for creating higher-level information products. A set of best practices should be developed to define the standards and specifications with which interoperability between cloud-based systems can be achieved.

NOTE 2  The challenge with interoperability of data systems is not only technical; it also represents a paradigm change for many large data organizations. Disseminating large volumes of data involves complex data pipelines linked with tailored data processing and quality control tasks, and large data organizations are often reluctant to introduce changes to these highly tailored workflows.

8.3.  Lack of Policies for Data Sharing in Case of Disasters

Bureaucracy was mentioned as a challenge for sharing data during a disaster. Apparently, there is a lack of clear policies defining which information can be shared with which stakeholders, and how quickly the information can be shared. For example, Maxar noted that in one wildfire disaster scenario, information about burnt houses was only allowed to be shared 7 to 10 days after the houses burnt down. In another disaster scenario, data provided by Maxar was not used because data rights as well as sharing policies were not clear to disaster managers. NRCan reported that managing a plethora of data licenses is challenging when trying to determine which users can access what type of data.

Policy makers — on national and international level — need to make sure that data sharing in case of disaster is facilitated by putting relevant policies in place. These policies should address questions of which data may be shared with whom and when in case of a disaster. They should also address questions of liability. Data licensing will play an important role as well. Many EO data sets are already made available under an open license, which is beneficial for sharing data during a disaster response. The goal should be to have well-defined, ideally standardized rules and definitions for data sharing, which can be applied in case of a disaster. This could go as far as enabling automatic evaluation of these policies by EO systems, allowing stakeholders to quickly identify which data they may share and use.
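The automatic policy evaluation envisioned above presupposes machine-readable sharing rules. A minimal sketch of what such rules and their evaluation could look like (the product names, roles, and embargo periods are hypothetical, loosely inspired by the wildfire example in this section):

```python
from datetime import datetime, timedelta

# Hypothetical machine-readable sharing rules: which product class may be
# shared with which roles, and how long after acquisition sharing is allowed.
POLICY = [
    {"product": "damage_assessment", "roles": {"first_responder", "agency"},
     "embargo": timedelta(days=0)},
    {"product": "building_damage_detail", "roles": {"agency"},
     "embargo": timedelta(days=7)},
]

def may_share(product, role, acquired, now):
    """True if any policy rule permits sharing this product with this role."""
    return any(product == rule["product"]
               and role in rule["roles"]
               and now - acquired >= rule["embargo"]
               for rule in POLICY)

acquired = datetime(2021, 6, 1)
# Detailed building damage may not go to first responders under these rules:
print(may_share("building_damage_detail", "first_responder",
                acquired, acquired + timedelta(days=8)))  # False
```

If such rules were standardized and attached to datasets, an EO system could answer the "may I share this, with whom, and when" question in milliseconds instead of days of legal clarification.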

Digital Rights Management (DRM) and Data Centric Security (DCS) may play a role in solving this challenge; emerging projects such as iSHARE may provide valuable concepts and mechanisms as well. Necessary definitions — for example for data products such as detected objects — need to be developed. Semantic technologies may play a role as well, for evaluating the equivalence of definitions developed by different entities. Having an international standard of relevant definitions would be ideal, and — once developed — could be mandated by policy makers. The members of the International Charter — Space and Major Disasters could also play a vital role in the development of policies, best practices, and standards for data sharing.

NOTE  OGC has worked on DRM and DCS. For further details, see OGC 06-004r4 and OGC 20-021r1.

8.4.  Delivering EO Data to Poorly Connected Regions

A major challenge in case of a disaster in poorly connected regions of the world is getting larger amounts of data to emergency response headquarters and personnel in the field. One satellite provider explained that in such a situation, their satellite imagery would currently be provided by sending it on hard drives.

NOTE  It should be noted that any place on earth may become poorly connected or even disconnected in case of a disaster. Satellite based communication links — a growing market — may be the last fallback in such a situation. However, these links are not as powerful as terrestrial links, so poor bandwidth would still be an issue (but at least some communication would be possible).

In the near future, edge computing could help bring information from EO data to poorly connected regions: Analytics on EO data can be applied on the edge (e.g. in space), extracting relevant information (e.g. detected features) from big EO data and only transmitting the distilled results, to be used for decision making. Furthermore, some cloud service providers have a service for moving data — and processing power — from the cloud to a user using edge computing devices (see for example the AWS Snow Family). However, that process may take a considerable amount of time; it was reported that it could take several days, depending on the amount of data and the delivery destination.

One of the published OGC standards is of particular interest for addressing the challenge of getting data to poorly connected regions: GeoPackage.

QUOTE: “A GeoPackage is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information. The GeoPackage standard describes a set of conventions for storing the following within a SQLite database: vector features, tile matrix sets of imagery and raster maps at various scales, extensions.” (source: https://www.ogc.org/standards/geopackage)

A GeoPackage represents a file-based container for geospatial data. GeoPackages can be loaded onto mobile devices for offline use, with subsequent synchronization by GIS software when the user is back from the field. OGC Innovation Program initiatives drive the development of the GeoPackage standard, the latest installment being OGC Testbed-16 (see OGC 20-019 [10] for further information). The OGC Disasters Pilot 2021 would be a good opportunity to test the delivery of satellite imagery and derived products such as detected features within a GeoPackage to poorly connected regions.
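Because a GeoPackage is an SQLite database following known conventions, it can be inspected and (partially) created with nothing but standard tooling. The sketch below creates a features table and a simplified subset of the `gpkg_contents` columns; a fully conformant GeoPackage requires additional tables and metadata defined in the standard (e.g. `gpkg_spatial_ref_sys`, `gpkg_geometry_columns`, the SQLite application_id), so treat this purely as an illustration of the container concept:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # a real GeoPackage would be a .gpkg file
con.executescript("""
    -- Simplified subset of the gpkg_contents columns
    CREATE TABLE gpkg_contents (
        table_name TEXT NOT NULL PRIMARY KEY,
        data_type TEXT NOT NULL,
        identifier TEXT UNIQUE,
        min_x DOUBLE, min_y DOUBLE, max_x DOUBLE, max_y DOUBLE,
        srs_id INTEGER
    );
    -- A user-defined features table registered in the contents
    CREATE TABLE detected_features (
        fid INTEGER PRIMARY KEY,
        geom BLOB,           -- GeoPackage binary geometry
        feature_type TEXT
    );
""")
con.execute("INSERT INTO gpkg_contents VALUES (?,?,?,?,?,?,?,?)",
            ("detected_features", "features", "Detected features",
             -61.6, 15.2, -61.2, 15.6, 4326))
con.execute("INSERT INTO detected_features VALUES (1, NULL, 'damaged_building')")

row = con.execute("SELECT data_type FROM gpkg_contents").fetchone()
print(row[0])  # features
```

Since the whole container is a single file, it can be copied to a hard drive or mobile device, carried into a disconnected area, and queried offline with any SQLite client.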

8.5.  Cloud Optimized Data Formats

Satellite image data has always been big data. New satellite missions will result in a significant increase of data volumes. For example, the NASA-ISRO SAR Mission (NISAR) — to be launched in 2022 — is going to produce roughly 85TB of data per day (source: Blumenfeld [3]). Storing, processing, and disseminating such large amounts of data is a challenge with current technologies. Cloud service providers therefore actively engage with research and user communities to develop mechanisms and data formats that allow efficient storage as well as streaming of data. Cloud-optimized GeoTIFF (COG) is an example of a data format that was specifically designed for cloud environments. However, this format requires a lot of storage space, compared to other formats such as JPEG 2000. Developing optimizations that support storage, processing, and dissemination of the next generation of big data will be a major challenge.

Signell [12] describes an approach that allows efficient reading of files in the cloud, particularly for formats that are not cloud-optimized. It was noted that the approach could be improved by:

  • adding common compression/filtering schemes used by NetCDF4, HDF5 and GRIB2 to numcodecs, and

  • developing a binary representation of the fileReferenceSystem metadata (replacing the use of JSON, which is inefficient for very large datasets such as millions of chunks).
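The approach described by Signell boils down to an external index that maps each logical chunk to a (file, offset, length) triple, so that a cloud reader can fetch exactly those bytes with an HTTP range request. A self-contained sketch of that idea, using a local file in place of object storage (the index layout here is illustrative, not the exact fileReferenceSystem schema):

```python
import json, os, struct, tempfile

# Write a "remote" binary file holding three 4-byte big-endian chunks.
path = os.path.join(tempfile.mkdtemp(), "archive.bin")
with open(path, "wb") as f:
    for value in (10, 20, 30):
        f.write(struct.pack(">i", value))

# External reference index: chunk key -> [location, offset, length].
refs = json.dumps({f"data/{i}": [path, i * 4, 4] for i in range(3)})

def read_chunk(refs_json, key):
    """Fetch only the bytes for one chunk (analogous to a cloud range request)."""
    location, offset, length = json.loads(refs_json)[key]
    with open(location, "rb") as f:
        f.seek(offset)
        return struct.unpack(">i", f.read(length))[0]

print(read_chunk(refs, "data/1"))  # 20
```

The second improvement listed above targets exactly this index: for datasets with millions of chunks, a JSON mapping like `refs` becomes unwieldy, hence the suggestion of a binary representation.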

NOTE  Chris Holmes from Planet Labs noted that while a lot of work has already gone into optimizing image and raster formats for cloud use, cloud native vector formats also need to be addressed.

8.6.  Controlling Cloud Costs

Cloud use incurs some level of cost. A satellite data provider can typically define the cost related to data storage, and control the cost for processing performed by the provider itself. However, transfer of data out of the cloud or to external services within the cloud (in the same or a different region) is hard to control if the number of users is unknown. Cloud service providers may have different cost models for the aforementioned transfer cases. Transfer within the same cloud (and region) usually costs less than transferring data out of the cloud — often nothing at all. That is a reason why processing should be performed within the cloud, close to the data. Still, what to do if data is publicly accessible and theoretically anyone in the world can download the data? AWS has the concept of Requester Pays for Amazon S3 data stores. That would move the transfer cost to the user. However, that may not be an option for an organization such as NASA, where policies (e.g. the NASA open data, services and software policies) are in place which require that users can download data for free. The transfer cost would then be the burden of the data provider. Organizations need to ensure that the total cost of their cloud based system is within their budget. They may even be legally liable if the costs get higher than the budget allows (see for example the U.S. Antideficiency Act). Cloud use can therefore be a financial challenge.

NASA EOSDIS modelled data egress, i.e. transfer of data out of the cloud, based upon decades-long experience with data egress from their Distributed Active Archive Centers (DAACs), experience gained so far with cloud data storage and use, as well as studies on expected data egress for future missions, for example the NASA-Indian Space Research Organisation Synthetic Aperture Radar (NISAR) (Cassidy [5]). The model allows transfer costs to be calculated, and ultimately to calculate the total cost for the cloud use. Still, actual data egress could exceed expectations, which is why EOSDIS developed and deployed a software-implemented data flow controller, which ensures that data download traffic is within allowed bounds at any given time.
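A back-of-the-envelope egress model makes the budgeting concern concrete. All unit prices below are hypothetical placeholders, not actual provider rates, and a real model (like the EOSDIS one) would account for tiered pricing, missions, and growth over time:

```python
# Hypothetical unit prices (USD/GB); real rates vary by provider, region, tier.
PRICE_PER_GB = {
    "same_region": 0.00,   # transfer within the same cloud region
    "cross_region": 0.02,  # to another region of the same cloud
    "internet": 0.09,      # egress to the public internet
}

def monthly_egress_cost(gb_by_route):
    """Estimate monthly transfer cost from GB volumes per route."""
    return sum(PRICE_PER_GB[route] * gb for route, gb in gb_by_route.items())

# 500 TB/month of public downloads dominates the bill:
cost = monthly_egress_cost({"same_region": 200_000,
                            "cross_region": 10_000,
                            "internet": 500_000})
print(f"${cost:,.0f}/month")  # $45,200/month
```

Even this toy model shows why in-cloud processing close to the data is attractive: the same-region traffic is free while public egress dominates the bill, and a flow controller that caps the `internet` volume directly caps the cost.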

Avoiding data transfer as much as possible to reduce costs results in increased processing in the cloud. While that is typically seen as beneficial, it also comes at a cost. These costs need to be understood and carefully weighed against each other. In addition, processing in one cloud may be cheaper than doing so in another. Transferring data between two clouds once, and then processing the data multiple times, may ultimately cost less than doing everything within a single cloud.
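This trade-off can be made concrete with simple arithmetic. The following sketch, using entirely hypothetical prices, computes after how many processing runs a one-time inter-cloud transfer pays for itself:

```python
import math

def breakeven_runs(transfer_cost: float, cost_per_run_home: float,
                   cost_per_run_remote: float):
    """Number of processing runs after which paying transfer_cost once
    to move the data to a cheaper cloud becomes the better deal.
    Returns None if the remote cloud is not cheaper per run."""
    saving_per_run = cost_per_run_home - cost_per_run_remote
    if saving_per_run <= 0:
        return None
    return math.ceil(transfer_cost / saving_per_run)

# Hypothetical example: moving 500 TB at $0.02/GB costs $10,000 once;
# each analysis run costs $500 in the home cloud vs. $300 remotely.
# The transfer pays off after breakeven_runs(10_000, 500, 300) runs.
```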

NOTE  Of course, there is also a question of how long it would take to actually transfer very large datasets (petabytes and more) to another cloud. The Data storage section has an example of how that could be done. In any case, the transfer would take a considerable amount of time — which may not be available in certain circumstances, such as a disaster.

Cloud cost optimization is an important topic, and several companies already offer related services. When performing an analysis of EO data in the cloud, being able to estimate in advance the costs that the analysis will incur is critical to avoid overspending. Identifying the cost of executing a given process in the cloud ahead of time remains an open challenge.

8.7.  Capacity Building

Bringing processes to the data within the cloud is a paradigm change for analyzing big EO data. However, in order to create and execute processes in the cloud, users — but also providers — need to develop new skills. That can be a challenge.

NASA EOSDIS, for example, addresses this challenge in multiple ways. On the one hand, they develop, provide and extend learning material (webinars, tutorials, workshops, and articles). On the other hand, they establish focus groups to train different user groups and gather feedback from them. Finally, other community efforts are being leveraged and actively supported — especially Pangeo.

Likewise, NRCan plans to encourage their users to process data directly in the cloud, once the full migration actually starts. The idea is to follow an open approach, having a consultation process for users, as well as engagement with industry. Universities may also help in capacity building, by teaching relevant skills.

At Fisheries and Oceans Canada, Ocean Data and Information Section (ODIS), the aim is to gently encourage and actively help users to develop the necessary skills for working in the cloud, through piloting and testing activities that users are expected to engage in. Demonstrating the value of moving to the cloud (such as eliminating the time and cost associated with downloading large EO datasets) is seen as a good incentive.

WEkEO service providers are actively engaged in supporting users, both for urgent questions and for developing required skills, through: a helpdesk, publicly available material (e.g. user guides, webinars, and tutorials), as well as outreach and training (e.g. demonstrations, workshops, and hackathons).

8.8.  Working in Multiple Clouds

When EO data is stored and processed within the same cloud (and region), implementing the applications-to-the-data paradigm is relatively straightforward. However, when storage and processing occur in different clouds and/or cloud regions, data essentially has to be transferred from place to place, which causes delays and costs. Identifying the most efficient way to perform an analysis in such an environment, considering both cost and processing time, is a challenge.
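Choosing where to run an analysis then becomes a small optimization problem: among the placements that meet the deadline, pick the cheapest. A minimal sketch, where the candidate placements and their prices are hypothetical:

```python
def best_plan(options, max_hours):
    """options: iterable of (name, cost_usd, duration_hours) tuples
    describing candidate placements, e.g. processing in the cloud where
    the data already lives vs. transferring to a cheaper cloud first.
    Returns the cheapest option that finishes within max_hours, or
    None if no option meets the deadline."""
    feasible = [o for o in options if o[2] <= max_hours]
    return min(feasible, key=lambda o: o[1], default=None)

# Hypothetical placements for one analysis:
plans = [
    ("process in cloud A (data local)", 1200.0, 10.0),
    ("transfer to cloud B, then process", 900.0, 30.0),
]
# With a tight 12-hour deadline only cloud A is feasible; with a
# relaxed 48-hour deadline the cheaper cloud B plan wins.
```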

NOTE  The Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP) (Albinet et al. [5]) is an example of a system where processing occurs in different clouds and data is transferred between clouds.

8.9.  Developing Standards That Support Scalability of Applications for Big EO Data

Satellites produce tens to hundreds of terabytes of EO data each day, and the amount of data is expected to increase significantly over the next five years, with the trend continuing in the following decades. This tremendous amount of data poses great engineering challenges.

One satellite data provider emphasized that in order to support the development of scalable systems, standards organizations should focus on standardizing small, reusable system components (data and metadata formats, APIs, etc.). These standards should be flexible and extensible to cater for unexpected needs, which may subsequently lead to standardized extensions. Rather than standardizing whole architectures, best practices should be developed.

9.  Conclusions

The study results reveal that satellite data providers are moving towards cloud computing and implementing the applications-to-the-data paradigm. The extent to which current EO systems have evolved into fully cloud-based systems varies: some providers have just started to migrate to the cloud, while others already fully embrace cloud computing and run cloud-based EO systems in an operational context. EO data systems of commercial satellite data providers tend to be more cloud-mature than those offered by public organizations.

Right now, the major focus still appears to be on making EO data accessible in the cloud. That is no surprise, given the pressing issue of dealing with an ever-increasing volume of EO data, and the burden this places on EO data providers in terms of resources (financial costs, manpower, infrastructure, and expertise). New data formats and APIs, optimized for cloud-based access to EO data, have been and are still being developed. At the same time, tools and techniques for cloud-based processing of big EO datasets have evolved in recent years (one example is the open-source software stack of the Pangeo community). So far, only a small percentage of data users rely on cloud-based processing as their modality to access and process data. However, the EO data landscape is moving towards cloud-based systems, with large data organizations introducing cloud computing into their long-term strategy roadmaps. In ten years, the common modality to access and process data will likely be through cloud-based systems.

Cloud-based EO systems offer a number of benefits: Vast amounts of EO data can be stored and made accessible, and the cloud environment provides the computing power to process such datasets. Cloud storage and computing resources can be acquired as needed, providing EO systems with high levels of scalability and flexibility. Furthermore, since big EO datasets no longer need to be downloaded and processed locally, processing results become available in a more timely fashion. Finally, the user can concentrate on developing applications and analyzing results, rather than spending time on the setup and monitoring of EO data downloads.

However, the study also identified a number of concerns and challenges regarding cloud-based EO systems: On the one hand, the costs of operating an EO system in the cloud are not easy to understand and control, particularly for more complex scenarios and use cases. While the total cost of running a system in the cloud will likely be significantly lower than the cost of running such a system locally, within self-owned and self-hosted data centers, there remains a level of uncertainty regarding the costs of actually running the system in the cloud. On the other hand, the security and confidentiality of EO data hosted in the cloud is an area of concern. In order to address these two issues, EO system providers as well as users need to build trust in, and gain experience with, cloud-based systems through implementing pilot projects and focusing on capacity building.

Finally, consider a scenario, e.g., a disaster event or an environmental monitoring case study, in which an application requires the combination and analysis of multiple large EO data sources. Such a scenario highlights a number of aspects that need to be considered in order to operationalize the application within a cloud-based EO system. As outlined before, cloud-based systems make it easier to provide large volumes of EO data to users. However, users still face a number of challenges that make it difficult to readily work with these datasets.

Cloud-based EO systems solve certain problems, especially the management and provision of vast amounts of EO data in an efficient manner. Cloud technology facilitates the development of scalable systems. However, challenges related to interoperability of (cloud-based) data systems remain.


Annex A
(informative)
Revision History

Date             Release  Author                         Primary clauses modified  Description
March 24, 2021   0.9      J. Echterhoff and J. Wagemann  all                       complete version
August 29, 2021  1.0      J. Lieberman                   all                       asciidoc conversion and cleanup

Bibliography

[1]  Ingo Simonis: OGC 20-073, OGC Earth Observation Applications Pilot: Summary Engineering Report. Open Geospatial Consortium (2020). https://docs.ogc.org/per/20-073.html

[2]  Aleksandar Balaban: OGC 20-021r2, OGC Testbed-16: Data Centric Security Engineering Report. Open Geospatial Consortium (2021). https://docs.ogc.org/per/20-021r2.html

[3]  Craig A. Lee: OGC 20-027, OGC Testbed-16: Federated Security. Open Geospatial Consortium (2021). https://docs.ogc.org/per/20-027.html

[4]  Jeff Yutzler: OGC 20-019r1, OGC Testbed-16: GeoPackage Engineering Report. Open Geospatial Consortium (2021). https://docs.ogc.org/per/20-019r1.html

[5]  Albinet, C., Whitehurst, A.S., Jewell, L.A. et al.: A Joint ESA-NASA Multi-mission Algorithm and Analysis Platform (MAAP) for Biomass, NISAR, and GEDI. Surv Geophys 40, 1017–1027 (2019). https://doi.org/10.1007/s10712-019-09541-z (accessed 2021-01-12)

[6]  Bicarregui, J.C.: Quality and Trust in the European Open Science Cloud; available online at https://doi.org/10.2218/ijdc.v15i1.720 (accessed 2021-02-05)

[7]  Blumenfeld, J.: Getting Ready for NISAR — and for Managing Big Data using the Commercial Cloud; available online at https://nasadaacs.eos.nasa.gov/learn/articles/tools-and-technology-articles/getting-ready-for-nisar (2017)

[8]  Blumenfeld, J.: Data Chat: Dr. Christopher Lynnes; available online at https://earthdata.nasa.gov/learn/data-chat/data-chat-dr-christopher-lynnes (2020)

[9]  Cassidy, E.: Cloud Data Egress: How EOSDIS Supports User Needs; available online at https://earthdata.nasa.gov/learn/articles/cloud-data-egress

[10]  Celeste, E., Fabbrini, F.: Competing Jurisdictions: Data Privacy Across the Borders. In: Lynn T., Mooney J.G., van der Werff L., Fox G. (eds) Data Privacy and Trust in Cloud Computing. Palgrave Studies in Digital Business & Enabling Technologies. Palgrave Macmillan, Cham., available online at https://doi.org/10.1007/978-3-030-54660-1_3 (accessed 2021-01-12)

[11]  OMG Interoperability and Portability for Cloud Computing: A Guide, Version 2.0, available online at https://www.omg.org/cloud/deliverables/CSCC-Interoperability-and-Portability-for-Cloud-Computing-A-Guide.pdf (accessed 2021-01-15)

[12]  Signell, R.: Cloud-Performant NetCDF4/HDF5 with Zarr, Fsspec, and Intake, available online at https://medium.com/pangeo/cloud-performant-netcdf4-hdf5-with-zarr-fsspec-and-intake-3d3a3e7cb935 (2020)

[13]  K. Navulur, M.C. Abrams: OGC 20-088, Standardizing a Framework for Spatial and Spectral Error Propagation. Open Geospatial Consortium (2021). https://docs.ogc.org/dp/20-088.html