Publication Date: 2020-10-26
Approval Date: 2020-09-23
Submission Date: 2020-08-27
Reference number of this document: OGC 20-045
Reference URL for this document: http://www.opengis.net/doc/PER/EOAppsPilot-CRIM
Category: OGC Public Engineering Report
Editor: Tom Landry
Title: OGC Earth Observation Applications Pilot: CRIM Engineering Report
COPYRIGHT
Copyright © 2020 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
WARNING
This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
LICENSE AGREEMENT
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Subject
- 2. Executive Summary
- 3. References
- 4. Terms and Definitions
- 5. Overview
- 6. Earth Observation Platform
- 7. Remote Sensing Application
- 8. Machine Learning Application
- 9. Climate Services
- 10. Earth Observation Applications
- 11. Conclusion
- Appendix A: Climate Indices Available Through Finch
- Appendix B: Process Description of Finch
- Appendix C: Remote Sensing Processing Graph
- Appendix D: JSON File for the Climate Process Execute Request Body
- Appendix E: CWL File for WPS 1.0 Provider
- Appendix F: JSON Request Body to Execute WorkflowSubsetPicker
- Appendix G: Technical Interoperability Experiments
- Appendix H: Example of Configurations for S3 Buckets Support
- Appendix I: Revision History
- Appendix J: Bibliography
1. Subject
This engineering report documents experiments conducted by CRIM in OGC’s Earth Observation Applications Pilot project, sponsored by the European Space Agency (ESA) and Natural Resources Canada (NRCan), with support from Telespazio VEGA UK. Remote sensing, machine learning and climate informatics applications were reused, adapted and matured in a common architecture. These applications were deployed in a number of interoperable data and processing platforms hosted in three Canadian provinces, in Europe and in the United States.
2. Executive Summary
CRIM’s key findings are as follows:
- From the application developer’s perspective, it was easier to first develop and test locally, and then use an OGC Application Programming Interface (API) to deploy and execute remotely. This is partly due to a smoother learning curve from application to packaging than from package to platform.
- Conformance classes for an Application Deployment and Execution Service (ADES) API should be standardized. Differences in the support of API elements persist across participants' implementations.
- Application packaging, deployment and execution are appropriately defined by the combination of the Common Workflow Language (CWL) and Docker images.
- More tests are required to better support application workflows running on multiple platforms. In order to build a federated cloud, these tests have to run regularly, in a structured and systematic fashion.
- Participants' findings should feed back into EOEPCA, for example for platforms intending to support machine learning (ML) services. From the application developer’s perspective, ML apps are well defined and should be deployable like any other app. It is not clear from the current EOEPCA use cases that the platforms will easily integrate ML services (annotations, trained models, access to GPU clusters, etc.).
The business value of this initiative is articulated in CRIM’s promotional videos, expressed from the perspectives of both platform managers and application developers. The figure below presents the key value propositions of multidisciplinary workflows and federated infrastructures.
Again, the reader is invited to consult the associated promotional videos for CRIM’s research and technological transfer motivations. Additionally, the following elements are noted:
- See Section 6 for motivations related to the Earth Observation Data Management System (EODMS) and the Pacific Boreal Cloud (PBC).
- Machine learning applications, tools and services are of particular interest with respect to modern architectures such as EOEPCA.
- There is a need for a common, stable base application package shared with the community, for example the Sentinel Application Platform (SNAP).
As a summary of recommendations, and to introduce potential future work, CRIM proposes the following:
- Establish a clear intention and roadmap with respect to CWL.
  - Seek stronger links with the CWL community, e.g. IPython2CWL.
  - Consider de facto standardization of CWL for ProcessDescription and AP.
- Increase test coverage, depth and capabilities related to the common architecture.
  - Consider expanding and enforcing an OGC test suite.
  - Provide additional examples, demonstrations, documentation and tutorials to application developers.
  - Describe and normalize hardware requirements of applications in the AP and, inversely, expose platform capabilities to application users and developers.
  - Establish base tests for applications, but also for the system integrity of a federated cloud.
  - Test the SNAP Application Package (AP) with various data and I/O with GDAL, or other mission-specific modules or operators.
  - Provide additional QA of Radarsat-1 and Radarsat Constellation Mission (RCM) read-write operators, and integrate them into SNAP. Mark RCM or RS-1 files with defective metadata in the catalogues.
- Seek new projects and initiatives to continue maturation of the architecture and the infrastructure.
  - Consider another pilot to develop and test federated workflows, for example a cumulative effects project in which remote sensing applications (feature production, band management, ARD, datacubes) feed into machine learning applications (detectors, classifiers).
  - Consider global, national and regional analysis applications combining observational outputs with long-term climate projections and indices.
  - Consider mapping the cumulative effects workflows and applications onto SDG targets.
2.1. Document Contributor Contact Points
All questions regarding this document should be directed to the editor or the contributors:
Contacts
Name | Organization | Role
---|---|---
Tom Landry | CRIM | Editor
Francis Charette-Migneault | CRIM | Contributor
Mario Beaulieu | CRIM | Contributor
Mathieu Provencher | CRIM | Contributor
Louis-David Perron | CRIM | Contributor
David Byrns | CRIM | Contributor
Samuel Foucher | CRIM | Contributor
William Mackinnon | NRCan | Contributor
Ryan Ahola | NRCan | Contributor
2.2. Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
3. References
The following normative documents are referenced in this document.
- OGC: OGC 13-026r8, OGC OpenSearch Extension for Earth Observation 1.0, 2016
- OGC: OGC 14-065r2, OGC Web Processing Service 2.0.2 Interface Standard Corrigendum, 2018
- OGC: OGC 13-032r8, OGC OpenSearch Geo and Time Extensions 1.0.0, 2014
- CWL group: Common Workflow Language Specifications, v1.1, 2020
Additionally, the following unpublished document is referenced in this document.
4. Terms and Definitions
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.
- Climate Index: A term used to refer to properties of the climate that are not measured in the field or calculated by climate models, but rather are calculated or derived from climate variables such as temperature and precipitation. Examples include the number of growing degree-days, freeze-thaw cycles, and the drought code index (see variable).
- Climate Information: Refers to climatic data that describe either past conditions, obtained from meteorological observations (stations, satellites, radars), or the future, obtained from the outputs of climate models.
- Climate Model: A numerical representation of the climate system based on the physical, chemical, and biological properties of its components, their interactions and feedback processes, and accounting for most of its known properties.
- Climate Variable: A variable that can be measured directly in the field (at meteorological stations, for example) or that is calculated by climate models. (See index)
- CMIP5: Coupled Model Intercomparison Project, Phase 5. CMIP5 is a coordinated climate modeling exercise involving 20 climate-modeling groups from around the world. It has provided a standard experimental protocol for producing and studying the output of many different global climate models. The output from CMIP5 ensemble experiments is used to inform international climate assessment reports, such as those from the IPCC.
- Downscaling: A method that can provide climate model outputs at a finer resolution than their original resolution. Two different approaches are commonly used: statistical downscaling and dynamical downscaling.
- Statistical Downscaling: This type of downscaling relies on the use of statistical relationships that relate large-scale climate features, named predictors, to predictands such as local climate variables. (See downscaling)
4.1. Abbreviated Terms
- ACL - Access Control List
- ADES - Application Deployment and Execution System
- AOI - Area Of Interest
- AP - Application Package
- API - Application Programming Interface
- ARD - Analysis Ready Data
- CCCS - Canadian Center for Climate Services
- CF - Climate Forecast
- CGDI - Canadian Geospatial Data Infrastructure
- CNN - Convolutional Neural Network
- CRIM - Computer Research Institute of Montreal
- CSW - Catalogue Service for the Web
- CWL - Common Workflow Language
- DKRZ - Deutsches Klimarechenzentrum
- DL - Deep Learning
- DNN - Deep Neural Network
- ECCC - Environment and Climate Change Canada
- EMS - Execution Management System
- EO - Earth Observation
- EOC - Earth Observation Clouds
- EODMS - Earth Observation Data Management System
- EOEPCA - Earth Observation Exploitation Platform Common Architecture
- EP - Exploitation Platform
- ER - Engineering Report
- ESA - European Space Agency
- ESGF - Earth System Grid Federation
- GDAL - Geospatial Data Abstraction Library
- GFCS - Global Framework for Climate Services
- GPT - Graph Processing Tool
- IdP - Identity Provider
- IPCC - Intergovernmental Panel on Climate Change
- JSON - JavaScript Object Notation
- LLNL - Lawrence Livermore National Laboratory
- ML - Machine Learning
- NRCan - Natural Resources Canada
- OGC - Open Geospatial Consortium
- OWS - OGC Web Services
- QA - Quality Assurance
- REST - Representational State Transfer
- SDG - (United Nations) Sustainable Development Goals
- SNAP - Sentinel Application Platform toolbox
- TB - Testbed
- TEP - Thematic Exploitation Platform
- TIE - Technology Integration Experiments
- TOI - Time Of Interest
- URI - Uniform Resource Identifier
- URL - Uniform Resource Locator
- VM - Virtual Machine
- WCS - Web Coverage Service
- WFS - Web Feature Service
- WMS - Web Map Service
- WPS - Web Processing Service
- WPS-T - Transactional Web Processing Service
5. Overview
Section 6 introduces the Earth Observation (EO) platform architecture, components, implementations and endpoints. It also lists the main experiments conducted with all participants' platforms and applications.
Section 7 presents the remote sensing application proposed by CRIM in this pilot.
Section 8 describes the machine learning application and models developed by CRIM.
Section 9 explains the climate processes interoperability experiments aiming to support climate services in Canada.
Section 10 discusses more generally Earth Observation application packages and their relationship with the platforms.
Section 11 concludes the report by offering a summary of the findings and recommendations.
Annex A provides a list of climate indices available through Finch WPS.
Annex B provides a sample process description for computation of climate indices in Finch WPS.
Annex C provides a sample remote sensing processing graph to run in the Graph Processing Tool (GPT) of SNAP.
Annex D provides a JSON file for the climate process Execute request body.
Annex E provides a CWL file to register an external WPS 1.0 provider.
Annex F provides a JSON request body to execute WorkflowSubsetPicker.
Annex G provides a list of all Technology Integration Experiments (TIEs) conducted by CRIM.
Annex H provides an example of configurations for S3 buckets support.
6. Earth Observation Platform
This section describes the various EO platforms deployed and integrated in this pilot by CRIM. It presents a high-level view from the perspective of platform developers.
6.1. Overview
For several years now, CRIM has been developing research software for its own researchers and for the larger Canadian scientific community. A key milestone in this series of initiatives is project PAVICS (Plateforme pour l’Analyse et la Visualisation de l’Information Climatique et Scientifique), funded by CANARIE and aiming to facilitate the Big Data workflows of climate scientists. This cloud research platform offers tailored climate processes as OGC WPS, as well as data services such as WCS, WFS and WMS. In order to modernize the open standards on which it relies, and to enable integration of other applications and expertise at CRIM, the PAVICS platform was reused in part in OGC Testbeds 13 through 16. The EO Applications Pilot is therefore a culminating effort to mature previous prototypes towards a more operational status. Two other important initiatives spurred from PAVICS. Firstly, ClimateData.ca, a climate information portal that enables Canadians to access, visualize, and analyze climate data, currently uses PAVICS as an analytical backend. Secondly, project DACCS (Data Analytics for Canadian Climate Services), funded by the Canada Foundation for Innovation (CFI), greatly extends the scope of PAVICS. This project aims to bridge the gap between climate and EO platforms by integrating remote sensing and machine learning use cases. In the upcoming years, project DACCS is expected to continue development of PAVICS and to support ClimateData.ca climate services.
Another participant, sponsor and beneficiary of the pilot project is Natural Resources Canada (NRCan). The Canadian Forest Service (CFS) provided cloud resources through the Pacific Boreal Cloud, as well as EO data and forestry use cases to drive innovation, in this pilot and since Testbed-13. In parallel, NRCan is also investigating possible use of the architecture for its Earth Observation Data Management System (EODMS). While the architecture has shown promise within experimental testing environments, further investigation using real-world scenarios is required. Through this pilot project, NRCan evaluates the architecture on a version of EODMS, using data and applications that represent common use cases for EODMS clients. Results will highlight a path forward for improvements to the architecture to fully meet real-world operational needs, as well as define a roadmap for EODMS’s transition to an exploitation platform. It is also hoped that this architecture will improve the ability of Canada’s remote sensing scientists to leverage Canadian and international EO information in a seamless manner.
6.2. EO Platforms Deployment
One of the major particularities of CRIM’s contribution to the pilot is the installation of its ADES/EMS solution, Weaver, on three separate clouds. Additionally, a local EMS sent requests to Spacebel’s ADES. The intent is to deploy and run applications on the most appropriate platform, and to allow the elaboration of more advanced operational scenarios. The following figure presents the deployment diagram of the platform elements.
The table below lists each of the deployments of ADES and EMS servers. The specificities of the configurations can be found in the appropriate subsection below.
Type | Usage | Endpoint URL | Configuration | Cloud vendor
---|---|---|---|---
EMS | CRIM Dev | | Generic | OpenStack
ADES | CRIM Dev | | Generic | OpenStack
EMS | CRIM Prod | | Finch | OpenStack
EMS | ClimateData.ca Prod | | Finch | cloud.ca, VMWare
EMS | CRIM Demo | | Generic | OpenStack
ADES | NRCAN Pacific Boreal Cloud | | Generic | OpenStack
EMS | NRCAN Pacific Boreal Cloud | | Generic | OpenStack
ADES | NRCAN EODMS | | Generic | Amazon AWS
EMS | NRCAN EODMS | | Generic | Amazon AWS
6.2.1. API routes
The OGC API - Processes path is obtained by appending /ems/api and /ades/api to the EMS and ADES endpoint URLs, respectively, while processes can be found at /ems/processes and /ades/processes.
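As a sketch of this route layout, the following shell fragment builds the API and process-list URLs from a base endpoint; the host names are placeholders, not actual pilot endpoints:

```shell
# Hypothetical EMS endpoint; the actual pilot URLs are listed in the table above.
EMS_BASE="https://ogc-ems.example.org"

# OGC API - Processes landing page and process list for the EMS:
EMS_API="${EMS_BASE}/ems/api"
EMS_PROCESSES="${EMS_BASE}/ems/processes"

# The ADES follows the same pattern with the /ades prefix:
ADES_BASE="https://ogc-ades.example.org"
ADES_API="${ADES_BASE}/ades/api"
ADES_PROCESSES="${ADES_BASE}/ades/processes"

# Example request listing deployed processes (requires network access):
# curl -s -H "Accept: application/json" "$EMS_PROCESSES"
echo "$EMS_PROCESSES"
```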
To facilitate research platform software re-use, CANARIE requires that all platforms developed under its Research Software program, like PAVICS, support the CANARIE Platform and Service Registry and Monitoring System. For each endpoint, and as depicted in the next figure, the /canarie route provides the registry and monitoring API that fully describes the details of the platform, how to use it, and how to obtain assistance. CANARIE’s platform monitoring service also measures the availability and usage of research platforms.
6.2.2. CRIM Hybrid Cloud
For the pilot, CRIM provided resources from its private cloud. These cloud resources have been provisioned and used continuously in Testbeds 13, 14, 15 and 16. The basic provisioned hardware is composed of twelve OpenStack m4.large VMs (2 VCPU, 8GB RAM, 200GB disk) and three volumes of 2TB on spinning disk. These resources were extended in the pilot to accommodate applications with large memory footprints. Virtual machines are deployed in a public OpenStack tenant. CRIM’s IT department supports daily operations, while R&D personnel manage a large part of the application space. CRIM, being part of the Réseau d’informations scientifiques du Québec (RISQ) and the CANARIE network, has access to a pan-Canadian high-speed network dedicated to research.
6.2.3. Pacific Boreal Cloud (PBC)
The Canadian Forest Service (CFS) presented or supported several experiments in OGC’s innovation program, in part by contributing resources on the Pacific Boreal Cloud (PBC), a high-performance cloud infrastructure at the Pacific Forestry Centre in Victoria. The center provided virtual machines, storage, networking, as well as EO data, to enable numerous experiments such as biomass estimation using point clouds, cloudless mosaicking, tree species recognition or lake-river discrimination.
In OGC Testbed-13, CFS expressed interest in extracting polarimetric parameters from Radarsat-2 SQW data using a combination of OGC Web services and cloud environments. The requirements for such EO processing are well in line with the remote sensing application presented in this pilot in Section 7. In the architecture envisioned at that time, resources are only used when necessary, thus reducing the overhead costs of maintaining expensive servers or computing power. Assuming successful research and development, CFS could consider processing larger regions in Canada, and possibly the National Forest Inventory plots across the country. Later, in Testbed-14, CRIM assessed the feasibility of installing its EMS on the PBC, but did not proceed at the time. CRIM also accessed the PBC to ensure that application packages could be deployed by an ADES hosted on the cloud. Finally, the Distributed Access Control System (DACS) security solution was tested and briefly evaluated. For this pilot, one key expectation of the Pacific Forestry Centre and sponsors is the deployment of an ADES/EMS pair and the successful execution of a remote sensing application.
Summary of Experiments
All previous ADES/EMS deployments were designed to have their own host name (distinct for each one), with the ADES/EMS exposed at the root URL. To expose the service externally at PBC, it was mandatory to use the proxy provided by PBC, with a common host name (borealweb.nfis.org) and an added prefix to the URL path to differentiate between the ADES/EMS residing under the shared host name (/ogc-pilot-ades/ or /ogc-pilot-ems/). CRIM performed code and configuration changes to adapt some of the components to support this environment. Moreover, communications between PBC’s proxy and ADES/EMS deployments were in HTTP instead of HTTPS. Changes were needed to support this difference to properly redirect incoming requests to the appropriate services (Magpie, Twitcher, Weaver).
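A reverse-proxy rule of the following shape illustrates the path-prefix arrangement described above; this is only a sketch (the actual PBC proxy configuration was not published), with internal addresses and ports as placeholders:

```nginx
# Illustrative nginx-style rules for the shared PBC host name.
# Upstream traffic is plain HTTP behind the proxy, as noted above.
location /ogc-pilot-ems/ {
    proxy_pass http://10.0.0.10:8080/;   # placeholder internal EMS address
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
}
location /ogc-pilot-ades/ {
    proxy_pass http://10.0.0.11:8080/;   # placeholder internal ADES address
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
}
```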
For applications, the stacker app (TIE#1001) was run but failed due to an error in SNAP, an underlying application in the deployed process package. Another application was successfully executed (TIE#1302) to validate the server configuration and to confirm that the stacker failure was caused only by the failing package dependency. In that case, the executed application converts a JSON file with literal NetCDF links into direct HTTP references to NetCDF files which can be exposed externally.
CRIM deployed the servers twice on the PBC: once for the initial deployment/configuration, and once to update Weaver to a more recent version following the EODMS developments. The latest deployed version of the ADES, EMS and apps on the PBC is now in sync with EODMS. Experimental feedback from NRCan CFS is pending regarding the execution of more advanced applications for specific use cases, as per their desired functionalities.
6.2.4. EO Data Management System (EODMS)
NRCan is responsible for the operation of the EODMS. EODMS provides an archiving and discovery system for the Government of Canada’s EO data (e.g. satellite imagery). As EODMS facilitates federal and public access to crucial geospatial information, it represents a core component of the CGDI.
New developments in the EO domain present challenges to the ongoing viability of the current EODMS architecture. Traditionally, EO users have identified appropriate imagery in repositories, downloaded the required information and finally applied necessary processing on local workstations. With massively increasing availability of EO data, the EO community is moving to an “exploitation platform” approach. Here, users and their applications are brought close to the physical location of EO data, helping to minimize data transfer between repositories and applications. To fully enable EO exploitation platforms, an interoperable, open standards-based architecture transformation is required.
Summary of Experiments
Initially, CRIM experienced difficulty connecting to the Amazon Web Services (AWS) instance through AWS SSM (Systems Manager Agent) instead of plain SSH, as per NRCan’s security requirements. One first connects via the AWS CLI to the AWS instance running the EODMS-flavored ADES/EMS; the server configuration can then be accessed. In contrast, NRCan’s PBC requires a two-step SSH: one to access the Boreal Cloud proxy, and another to connect to the actual server instance from there. Other instances of EODMS are accessed directly via a single SSH to the server.
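The SSM-based access can be sketched as follows; the instance ID is a placeholder, and running the command requires the AWS CLI with the Session Manager plugin and valid credentials:

```shell
# Illustrative only: open a shell on the instance through AWS Systems Manager
# instead of plain SSH. The instance ID below is a placeholder.
INSTANCE_ID="i-0123456789abcdef0"
CMD="aws ssm start-session --target $INSTANCE_ID"
# Here we only print the command rather than execute it:
echo "$CMD"
```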
Per NRCan’s requirements, an AWS Elastic Load Balancer (ELB) was used instead of a direct HTTPS connection to the services, even when running single instances of each service. This caused difficulty with the health check configuration: the health check is mandatory with ELB, and if it fails, calls to the EMS/ADES servers fail 50% of the time. Some components need to communicate with each other through the actual public URL, which normally goes through the ELB. Due to NRCan’s security policy, it was impossible to connect to the ELB from the instances. A workaround had to be applied by using Docker’s extra_hosts option and disabling SSL certificate validation for those internal calls.
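The workaround can be sketched as a docker-compose fragment of the following shape; the service name, host name, IP address and environment variable are placeholders for illustration, not the actual EODMS configuration:

```yaml
# Illustrative docker-compose fragment: map the public host name normally
# resolved through the ELB to a directly reachable address, so internal
# calls bypass the load balancer.
services:
  weaver:
    extra_hosts:
      - "ogc-pilot.example.org:10.0.0.10"   # placeholder public name and IP
    environment:
      # Internal calls then skip SSL certificate validation, since the
      # certificate is issued for the ELB host, not the internal address.
      - REQUESTS_VERIFY_SSL=false           # illustrative variable name
```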
Currently, only one server exists behind the ELB of each ADES/EMS. Had several parallel deployments been required or desired, several issues would need to be considered. First, major refactoring of the server configuration would be required to separate the database from the other services, so it could be shared across multiple load-balanced instances. Second, multiple tests would be required to ensure no incoherent operations occur, such as race conditions on reads/writes to the shared database. Other considerations, such as common file storage and a shared Docker image cache for coherent behavior across replicated instances, would also need to be addressed.
The generic docker-compose configuration for the EMS/ADES is in a private Git repository but could be shared; it contains a minimal amount of sensitive information (if any), as most of it is dummy information to be replaced at deployment. The generic configuration is implemented using template files that generate, after substitution of server-specific parameters, the actual configurations loaded to run the server. Specific server configurations are unique to each deployment (every combination of ADES/EMS and location: EODMS, PBC, CRIM, etc.). The specific configuration sits side-by-side with the generic one and overrides the needed parameters to generate the specific configuration files from the templates.
Docker images of the services (the specific Weaver, Magpie and Twitcher images, as well as the generic database, proxy, etc.) are all public, and are used for installation and maintenance by pulling them from their respective locations via the docker-compose definition. Issues can be filed on the Weaver, Twitcher and Magpie GitHub repositories. Once a fix is merged and tagged, the tagged version automatically launches a Docker build. The generated image is pushed to the Docker repository for use by stakeholders. Specific server configurations need to be updated with the desired version of each service, according to the desired functionalities.
Both TIE#1002 and TIE#1003 are Stacker app execution tests; both fail due to SNAP, as in NRCan PBC’s case. Additionally, TIE#1102 (the ML segmentation app) and TIE#1303 (the basic JSON-to-NetCDF converter) were planned, but data was not provisioned in time to execute them. CRIM is standing by for more use cases from NRCan in order to deploy applications and test their execution.
AWS S3 Buckets
In order to support the EODMS Amazon infrastructure, Weaver was modified to take an S3 bucket location as the input/output location of processes. S3 buckets were not trivial to connect to EODMS' Amazon Virtual Machines (VMs), in part due to the very high level of security. Once S3 buckets were accessible to the servers, connection configurations and credentials had to be mounted into Weaver in order to actually allow fetching/storing input/output files from/to S3 buckets. For process execution inputs, a simple S3 endpoint is required, as normally done for HTTP(S) file references. For outputs, Weaver needs to be configured with weaver.wps_output_bucket = <s3-bucket>. Aside from providing the bucket reference, other capabilities were developed, for instance parsing the reference, retrieving the AWS configuration and staging files locally for use by the executed process. Sample configurations employed for the CRIM and EODMS instances can be found in Annex H.
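A minimal sketch of the output-bucket setting, assuming an INI-style Weaver configuration (the section header, bucket name and region are placeholders; the actual samples are in Annex H):

```ini
# Illustrative Weaver settings for S3 output storage.
[app:main]                                     # assumed section name
weaver.wps_output_bucket = my-eodms-output-bucket   # placeholder bucket
# AWS credentials and region configuration are mounted separately
# into the container, as described above.
```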
6.2.5. NRCan Intentions
With two Weaver platforms now deployed inside the NRCan’s EO data value stream, one upstream with EODMS and LEVEL 0, 1 data and one downstream in the PBC with LEVEL 2 value-add data, NRCan will be looking to invest in two main avenues:
-
On-boarding of EODMS client workflows as ADES containers, to operate in close proximity to the EO data they would normally have to download. Critical clients include NRCan’s Emergency Geomatics Service, which provides critical, near real-time information to Public Safety Canada and emergency responders during ice break-up and flood events; ECCC’s Canadian Ice Service, which provides timely and accurate information about ice in Canada’s navigable waters; and the Near Real Time Ship Detection services of the Department of National Defence (DND).
-
Interoperability between the EODMS-WEAVER and the PBC-WEAVER, to reciprocate value-added services using the standards with which Weaver has enriched each platform.
6.3. Climate Platforms
General information about ClimateData.ca can be found in Section 8. More information about experiments conducted with ESGF can be found in ESGF Compute Challenge engineering report [1].
6.4. Interoperability Experiments
All interoperability experiments either planned or conducted by CRIM in this pilot project can be found in Annex G. Note that the TIE numbers in this table are used throughout this report to refer to specific issues or findings. Integration tests exercise EMS/ADES server configurations using various AP deployment and execution combinations, in order to evaluate the functional operation of intended server-specific behavior. For example, TIE-1004 is presented below.
-
Each TIE number in the TIE Table is prefixed on the corresponding test functions defined in the code to ease their identification.
-
Tests are separated into different files in order to segregate configurations of the respective server parameters.
-
JSON payloads for Application Packages and/or requests are provided in tests/resources.
import pytest
from pytest_steps import test_steps

# Helper functions (login, deploy_process, execute_process, monitor_job,
# fetch_result) and the server constants are provided by the shared test utilities.

@pytest.mark.ADES
@pytest.mark.CRIM # app & server
@pytest.mark.application
@pytest.mark.Deimos
@pytest.mark.ProbaV
@pytest.mark.RS2
@test_steps("deploy", "submit", "execute", "stage-out")
def TIE_1004_CRIM_stacker_on_CRIM_ADES(): # noqa: C0103,N802
"""Similar to TIE-1000, but with different data sources."""
cookies = login(CRIM_ADES_MAGPIE, "CRIM_ADES_USERNAME", "CRIM_ADES_PASSWORD")
proc_id = deploy_process(CRIM_ADES_WEAVER, "crim-stacker-deploy.yml", cookies=cookies, visibility=True)
yield "deploy"
job_id = execute_process(CRIM_ADES_WEAVER, "crim-stacker-execute-no-sentinel.yml", proc_id, cookies=cookies)
yield "submit"
monitor_job(CRIM_ADES_WEAVER, proc_id, job_id, cookies=cookies)
yield "execute"
fetch_result(CRIM_ADES_WEAVER, proc_id, job_id, cookies=cookies)
yield "stage-out"
6.5. Applications and Processes
6.5.1. Data Interfaces
The following data interfaces were used in the system, in the pilot or in previous initiatives.
-
HTTPS
-
OGC API - Features
-
WMS, WFS, WCS
-
S3 bucket
-
local filesystem
-
OPeNDAP
-
OpenSearch
-
EOImage
6.5.2. Applications
Most of CRIM’s applications adopt a fan-in design, where an array of files is reduced to a single output. CRIM’s public application packages repository can be found on GitHub. Each AP contains the CWL descriptors that refer to the appropriate execution unit, in this case Docker images. The images are built on demand, and stored in a private Docker registry. The table below shows the source code locations and the resulting Docker images. Images published under /ogc-public can be accessed, pulled and executed by anonymous users. Access to all other images requires appropriate credentials from end users or the ADES.
Source repo | Docker registry | Access | Description
---|---|---|---
 | docker-registry.crim.ca/ogc-public | Public | SNAP/ML applications developed since the OGC-TB14 project
 | docker-registry.crim.ca/ogc | Private | SNAP preprocessing apps + EMS/ADES-related items
6.5.3. XML and JSON Bindings
CRIM employs JSON bindings for the body of both the deploy and execute requests, as described in WPS-T 2.0 with REST/JSON bindings. Translation from XML to JSON, and vice-versa, is a matter of changing a library. With respect to Python programming, the use of JSON files is considered a best practice: JSON is supported natively by the Python standard library, and its structure closely mirrors native Python dictionaries. Participants would benefit from selecting and maintaining common JSON parsing libraries and services.
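For instance (a generic sketch with made-up payloads, not the actual WPS-T schemas), switching between bindings largely amounts to swapping the parsing library:

```python
import json
import xml.etree.ElementTree as ET

# JSON binding: maps directly onto native Python dictionaries
payload = json.loads('{"processDescription": {"id": "stacker"}}')
proc_id_json = payload["processDescription"]["id"]

# XML binding: same information, different standard-library parser
root = ET.fromstring(
    "<ProcessDescription><Identifier>stacker</Identifier></ProcessDescription>")
proc_id_xml = root.findtext("Identifier")

assert proc_id_json == proc_id_xml == "stacker"
```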
6.5.4. Quoting and Billing
In this pilot, CRIM did not seek to improve its implementation of billing and quoting presented in Testbed-14. As stated in the ADES & EMS Results and Best Practices ER, the complexity of quoting a workflow execution is very high. As the number of steps increases, the deterministic behavior of a workflow rapidly decreases. For example, assuming adequate description by application developers, it might be possible to infer estimates of the time required for a single application based on input data volume alone, for a specific type of machine. If the data output of that first application is meant to be an input of a second application, the subsequent estimates might vary wildly due to mounting uncertainty on data volume. Also, as the workflow gets larger, the parameter space grows, adding even more uncertainty.
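The compounding effect can be illustrated with a toy calculation (the numbers are purely hypothetical, not measured values from the pilot):

```python
# Hypothetical per-step relative uncertainty on execution-time estimates.
STEP_UNCERTAINTY = 1.5  # each step's estimate may be off by up to 50%

def worst_case_factor(n_steps, per_step=STEP_UNCERTAINTY):
    """Worst-case multiplicative uncertainty after chaining n step estimates."""
    return per_step ** n_steps

# A single application quote is off by at most 1.5x, but a 4-step
# workflow could already be off by more than 5x.
print(worst_case_factor(1))  # 1.5
print(worst_case_factor(4))  # 5.0625
```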
6.6. Platform Architecture
6.6.1. EMS and ADES Responsibilities
Experiments with multiple sites indicate that an EMS is a good practice, as it acts as a proxy to the ADES. As such, it provides more flexibility in security schemes. Technically, nothing would preclude a service or user from calling the ADES API directly. By allowing registration of pre-deployed WPS and API routes, the EMS can also act as a federated process catalog. An EMS can take into account services, applications and processes provided by several ADES, facilitating its role as orchestrator of distributed workflows.
6.6.2. Security
-
There was no use for WSO2 in this pilot, but the challenge of authentication in federated environments remains.
-
The Policy Enforcement Point (PEP) of CRIM’s solution is called Twitcher, as it acts as a security proxy.
-
Similarly, the EMS also acts as a proxy to the ADES, but not specifically or exclusively as security proxy.
-
The roles of the PEP, the EMS process registry and the security proxy can overlap, but were not addressed specifically in this pilot.
6.6.3. Resource Access
Docker credentials
For Docker credentials and configs, see TIE-3000, where Pixalytics added CRIM as a user in its Docker repository. A file located at ~/.docker/config.json is automatically picked up by Docker pull commands (the standard file employed by the Docker CLI). Weaver is therefore launched after running a docker login command toward the private repository containing the images targeted by the Application Packages, with the resulting file mounted at the appropriate location. Given proper credentials, this allows Weaver to run docker pull from the private repository, since the user is authenticated within ~/.docker/config.json.
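In docker-compose terms, the mount can be sketched as follows (the service name and container path are illustrative, not the actual deployment files):

```yaml
services:
  weaver:
    volumes:
      # config.json produced beforehand by `docker login docker-registry.crim.ca`;
      # mounted read-only so Weaver's docker pull is authenticated
      - ~/.docker/config.json:/root/.docker/config.json:ro
```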
Request validation
A YAML file is loaded in Weaver to validate specific or invalid requests at run-time. A custom configuration specifies, for URLs matched by regex, additional parameters to be provided or attributed to HTTP requests, such as custom headers, credentials, timeout duration, etc. An example of the file in question is available on GitHub.
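A sketch of such a configuration (the exact schema is defined by Weaver; the field names and values below are indicative only):

```yaml
requests:
  - url: "https://self-hosted.example.com/.*"   # matched as a regex
    verify: false        # trust the site despite missing certificates
    timeout: 30          # seconds
    headers:
      Authorization: "Basic <credentials>"
```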
Other
We also note the following:
-
In order to connect to S3 buckets, CRIM adapted its ADES to conform to the IAM roles of NRCan’s infrastructure.
-
We note TIE-4000 with Rhea Group, where data inputs were self-hosted, but without security certificates. A configuration entry was added to trust the site, bypassing certificate checks.
-
It was challenging to manage, store, and inject credentials in applications, and even more so in workflows.
-
There was no discussion of security in DAPA, or even in the ADES-EMS flow. Security should be baked into an API.
6.6.4. Access Control Lists
-
ACLs are managed by the Magpie component, by users, groups, services and resources: the same approach as in Testbed-14, but with additional fixes.
-
Execution of a visible, deployed process could be restricted, for example by looking at quotas.
-
By default, CRIM deploys an application as private (not visible). Visibility to a user is not the same as the right to execute.
-
A learning curve is required to make authorized API requests.
-
Federated ACLs are still useful in more advanced use cases. The use of external Identity Providers, such as Keycloak, should be reviewed; they could be connected through OpenID.
-
Establish clear trust relationships between the ADES and EMS.
6.6.5. Implementation
Configurations
A generic source configuration repository is used for server instances (OGC, NRCan). OGC-based servers employ the generic configuration repository and are specialized with server-specific settings and/or overrides. Finch-based servers employ a similar docker-compose structure to the other servers, but the generic configuration repository is different. Each server has the above 'generic' configuration under the path ~/compose, while the specific configurations are under ~/config.
-
Pull changes from ems-ades-compose under the ~/compose directory.
-
Pull changes from branch master of the corresponding specific server configuration repository under the ~/config directory.
-
Run server-compose up -d. This step does not need a specific directory, as it finds its way from anywhere to the docker-compose in ~/compose. Sometimes a --force-recreate parameter is needed to force docker-compose to recreate some containers. Symbolic link references from ~/compose to relevant files in ~/config are already defined to find the required environment variables and overrides.
See here for more information on PAVICS configuration.
7. Remote Sensing Application
This section presents experiments conducted with a general-purpose remote sensing application, specialized as an image stacker.
7.1. Overview
This application runs an acyclic graph of remote sensing operations. In this pilot, the graph describes a stacker that takes several images as input and produces a single co-registered multi-band output. Internally, the Graph Processing Tool (GPT) of the Sentinel Application Platform (SNAP) toolbox takes as input an XML document describing the remote processing graph. At runtime, the application automatically parses the CWL to map inputs and outputs onto an internal XML representation.
7.1.1. Purpose
Data from different missions, captured at different azimuths or times, need to be co-registered to a common reference before analysis. This registration takes into account the local topography of the terrain to compensate for visual aberrations and missing information from the sources. The purpose of a stacker is therefore to generate a set of analysis-ready data from heterogeneous observational sources. The resulting file, a multi-band image, exhibits uniform spatial sampling projected on the same location. Akin to datacubes, the data can then be considered analysis-ready.
7.1.2. Data Types
The application’s reader and writer operations are compatible with the file types presented in the table below.
Format | Description
---|---
BEAM-DIMAP | The standard BEAM I/O format. It comprises an XML header based on the SpotImage/CNES DIMAP schema and ENVI images for the raster data.
GeoTIFF | A widely used EO data format, e.g. for Quickbird, LANDSAT, SPOT.
NetCDF | A widely used EO data format. BEAM supports NetCDF files conforming to the NetCDF CF Metadata Convention.
HDF-EOS | BEAM supports the HDF-EOS profile (HDF4) used by NASA Ocean Color data products of SeaWiFS, MODIS, OCTS, CZCS, and the gridded MODIS L3 products.
7.1.3. Open Source Software
The main software packaged in the application is SNAP. Below are the main components used.
-
Sentinel Toolbox - SNAP, ESA’s SentiNel Application Platform, version 7.x
-
Sen2Cor - Processor for Sentinel-2 Level 2A product generation and formatting
-
Snappy - Python interface for SNAP
For this application, CRIM also created a new toolbox, CRIMTBX, that can easily be added to the current SNAP modules. At the time of writing, the source code of this toolbox is not yet released as open source software, but it is available on demand. Below are the operations provided by the CRIMTBX toolbox and used by the application.
Operator Name | Short Description
---|---
StackCreationOp.java | Utility functions to create a stacked product from a series of input images
Collocate.java | Collocates two products based on their geocoding
MTAnalysisOp.java | Multi-Temporal Analysis operations
SfsOp.java | Structural Feature Set (SFS), including Standard Deviation, Mean, Maximum and Minimum
ThreshOp.java | Separates an image in two or more classes using Otsu’s method
7.2. Inputs
The application uses a series of files pointing to imagery data products to be co-registered and stacked. All inputs defined in the CWL are first downloaded (or simply mounted) in the working directory by the CWL runner. As shown in the code excerpt below, each image input is then mapped in a temporary internal XML file descriptor used by SNAP’s Read operation.
{
<graph id="Graph">
<version>1.0</version>
<node id="Read">
<operator>Read</operator>
<sources/>
<parameters class="com.bc.ceres.binding.dom.XppDomElement">
<file>./RS2_OK18072_PK188251_DK178156_F22_20090501_110525_HH_HV_SLC.zip</file>
</parameters>
</node>
<node id="Read(2)">[...]</node>
<node id="Read(3)">[...]</node>
<node id="Read(4)">[...]</node>
<node id="Read(5)">[...]</node>
[...]
</graph>
}
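The mapping performed by the parser can be sketched as follows (a simplified stand-in for the actual parse.py, with a hypothetical function name):

```python
import xml.etree.ElementTree as ET

def build_read_nodes(image_files):
    """Create one SNAP Read node per staged input image, mirroring the
    graph.xml structure shown above."""
    graph = ET.Element("graph", id="Graph")
    ET.SubElement(graph, "version").text = "1.0"
    for i, image in enumerate(image_files):
        # SNAP names subsequent nodes "Read(2)", "Read(3)", ...
        node_id = "Read" if i == 0 else "Read(%d)" % (i + 1)
        node = ET.SubElement(graph, "node", id=node_id)
        ET.SubElement(node, "operator").text = "Read"
        ET.SubElement(node, "sources")
        params = ET.SubElement(node, "parameters",
                               {"class": "com.bc.ceres.binding.dom.XppDomElement"})
        ET.SubElement(params, "file").text = "./" + image
    return graph

graph = build_read_nodes(["first.zip", "second.tif"])
print(ET.tostring(graph, encoding="unicode"))
```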
The stacker application was tested using Deimos, PROBA-V, Radarsat-2, Sentinel-1 and Sentinel-2 data products found in a common region of interest (ROI) over Montreal. As the RS2_OK18072_PK188251_DK178156_F22_20090501_110525_HH_HV_SLC.zip product is defined first, it is used as the common reference. Below are the samples used:
Name | Description
---|---
RS2_OK18072_PK188251_DK178156_F22_20090501_110525_HH_HV_SLC.zip | RADARSAT-2 product, HH HV SLC
Deimos.tif | Deimos image, unidentified data product
PROBAV_S1_TOA_X10Y02_20150617_1KM_V101.tif | PROBA-V data product
S1A_IW_GRDH_1SDV_20170519T224357_20170519T224422_016657_01BA57_B4F7.SAFE.zip | Sentinel-1 data product, GRDH
S2A_MSIL1C_20191212T155641_N0208_R054_T18TXR_20191212T191005.zip | Sentinel-2 data product
7.3. Processing
All dependencies required to install SNAP are packaged in an image named esa-snap-install-7. In turn, this image serves as a base for esa-snap-proc-7, which offers helper code to interface with GPT. Finally, from this last image is derived snap7-stack-creation. This image contains the CWL descriptor file as well as the parser that produces the internal XML representation.
Below is the entry point of the snap7-stack-creation Docker image. In this script, parse.py first generates the temporary graph.xml before calling GPT, effectively starting execution of the application.
#!/usr/bin/env bash
set -ex
GRAPH="/tmp/graph.xml"
CUR_DIR=$(dirname $(realpath $0))
echo "Generating process graph"
${CONDA_ENV}/bin/python ${CUR_DIR}/parse.py --graph "${GRAPH}" "$@"
echo "Processing graph"
gpt ${GRAPH}
echo "Process complete"
In the graph, the stacking operation by itself is contained in the operator StackCreation of the CRIMTBX toolbox. The first data product is chosen as reference for projection and resampling operations on all other bands from all other products. Bands are projected and resampled on the same geographic coordinate system and spatial resolution of the reference band. The intersection between the different bands is determined, and the bands are then extracted according to this intersection. The code excerpt below shows the sourceProduct mapping of read data products as inputs for the StackCreation operation.
{
<graph id="Graph">
<version>1.0</version>
<node id="Read">[...]</node>
<!-- for all images found in CWL inputs, add a Read node -->
[...]
<node id="StackCreation">
<operator>StackCreation</operator>
<sources>
<sourceProduct refid="Read"/>
<!-- for all images read, map the sources -->
[...]
</sources>
<parameters class="com.bc.ceres.binding.dom.XppDomElement"/>
</node>
<node id="Write">[...]</node>
</graph>
}
7.3.1. Parameters
The application’s interface has been kept intentionally simple, with a minimal number of parameters. While this led to simpler packaging, it causes limitations when using multiresolution products as reference images. In this implementation, only the metadata of the first band is used as a reference. In the case of products with multiple resolutions, for example Sentinel-2 at 10-20-60 meters, it is not possible to use the 10 m resolution, because those bands are not the first band.
7.4. Outputs
{
<graph id="Graph">
<version>1.0</version>
<node id="Read">[...]</node>
<!-- for all images found in CWL inputs, add a Read node -->
[...]
<node id="StackCreation">[...]</node>
<node id="Write">
<operator>Write</operator>
<sources>
<sourceProduct refid="StackCreation"/>
</sources>
<parameters class="com.bc.ceres.binding.dom.XppDomElement">
<file>./stacker_output.tif</file>
<formatName>GeoTIFF</formatName>
</parameters>
</node>
</graph>
}
The output is a single file, by default named stacker_output.tif, containing separate bands for each input data source. The following figure shows an RGB visualization of the output bands. From left to right, Radarsat-2, Sentinel-1, Sentinel-2, ProbaV, Deimos.
8. Machine Learning Application
This section presents a summary of experiments conducted with a trained model operating on satellite imagery.
8.1. Overview
This application uses a helper module called thelper to run model inference on an input raster image using a sliding window. A short annotation campaign was conducted before training a model using PyTorch. The learned model classifies Sentinel-2 imagery into 8 separate land use classes.
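The sliding-window pattern can be sketched as follows (a simplified illustration with a dummy mean-intensity classifier standing in for the actual PyTorch model and thelper machinery):

```python
def sliding_window_classify(image, window, stride, classify):
    """Classify each window of a 2D image; returns one label per window."""
    height, width = len(image), len(image[0])
    labels = []
    for top in range(0, height - window + 1, stride):
        row = []
        for left in range(0, width - window + 1, stride):
            patch = [r[left:left + window] for r in image[top:top + window]]
            row.append(classify(patch))  # the trained model would run here
        labels.append(row)
    return labels

# Dummy classifier: label a patch by thresholding its mean intensity.
classify = lambda patch: int(sum(map(sum, patch)) / (len(patch) ** 2) > 0.5)
# Toy 8x8 "image": left half dark (0.0), right half bright (1.0).
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
print(sliding_window_classify(image, 4, 4, classify))  # [[0, 1], [0, 1]]
```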
8.1.1. Purpose
The purpose of this application is to determine the most likely class for each pixel of a Sentinel-2 image. The figure below shows annotations that were used to train the model.
Once trained, the model outputs a class or label for each pixel of an input image.