I. Executive Summary
The Testbed 18 task Identifiers for Reproducible Science explored and developed Earth Observation (EO) workflows demonstrating best practices from FAIR data and reproducible science, while also exploring the usability of Whole Tale as a component for reproducible workflows. EO workflows involve multiple processing steps such as imagery pre-processing, training of AI/ML models, fusion/mosaicking of imagery, and other analytical processes. For EO workflows to be reproducible, each step of the process must be documented in sufficient detail and that documentation must be made available to scientists and end users.
The following five participating organizations contributed workflows for the Testbed 18 Reproducible Science task.
52 Degrees North developed a Whole Tale workflow for land cover classification.
Arizona State University developed a reproducible workflow for a deep learning application for target detection from Earth Observation imagery.
Ecere worked on the implementation of reproducible workflows following the approach described in OGC API — Processes — Part 3: Workflows and Chaining for modular OGC API workflows.
GeoLabs developed a reproducible workflow that runs an OGC API — Processes and OGC API — Features server instance within a Whole Tale environment.
Terradue developed a water body detection Application Package covering identifier assignment and reproducibility from code through several execution scenarios (local, Exploitation Platform, Whole Tale), and is the editor of the Reproducible Best Practices ER, another component of the Reproducible Science stream.
Over the course of the Reproducible Science task, multiple considerations and limitations for reproducible workflows were identified, including the following.
The expansion of FAIR to include replicability, repeatability, reproducibility, and reusability (reproducible-FAIR).
Replicability: A process with the same input yields the same output.
Repeatability: A process with a similar input yields the same output.
Reproducibility: Different inputs, platforms, and outputs result in the same conclusion.
Reusability: The ability to use a specific workflow for different areas with the same degree of accuracy and reliability on the output.
Addressing randomness in deep learning applications.
Addressing Whole Tale’s inability to assign a DOI to the binary Docker image used to build a Whole Tale experiment.
Recommended future work includes exploring the impact of FAIR workflows on healthcare use cases, which would make data more available and reliable for researchers, healthcare practitioners, emergency response personnel, and decision makers.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
testbed, docker, web service, reproducibility, earth observation, workflows, whole tale, deep learning, fair
III. Security considerations
No security considerations have been made for this document.
IV. Submitters
All questions regarding this document should be directed to the editor or the contributors:
Table — Submitters
Name | Organization | Role |
---|---|---|
Paul Churchyard | HSR.health | Editor |
Ajay K. Gupta | HSR.health | Editor |
Martin Pontius | 52 North | Contributor |
Chia-Yu Hsu | Arizona State University | Contributor |
Jerome Jacovella-St-Louis | Ecere | Contributor |
Patrick Dion | Ecere | Contributor |
Gérald Fenoy | GeoLabs | Contributor |
Fabrice Brito | Terradue | Contributor |
Pedro Goncalves | Terradue | Contributor |
Josh Lieberman | OGC | Contributor |
V. Abstract
The OGC’s Testbed 18 initiative explored the following six tasks.
1.) Advanced Interoperability for Building Energy
2.) Secure Asynchronous Catalogs
3.) Identifiers for Reproducible Science
4.) Moving Features and Sensor Integration
5.) 3D+ Data Standards and Streaming
6.) Machine Learning Training Data
Testbed 18 Task 3, Identifiers for Reproducible Science, explored and developed workflows demonstrating best practices at the intersection of Findable, Accessible, Interoperable, and Reusable (or FAIR) data and reproducible science.
The workflows developed in this Testbed included:
the development of a Whole Tale workflow for land cover classification (52 Degrees North);
the development of a reproducible workflow for a deep learning application for target detection (Arizona State University);
the implementation of reproducible workflows following the approach described in OGC API — Processes — Part 3: Workflows and Chaining for modular OGC API workflows (Ecere);
the development of a reproducible workflow that runs an OGC API — Processes and OGC API — Features server instance within a Whole Tale environment (GeoLabs); and
the development of a water body detection Application Package to cover the identifier assignment and reproducibility from code to several execution scenarios (local, Exploitation Platform, Whole Tale) (Terradue).
Testbed 18 participants identified considerations and limitations for reproducible workflows and recommendations for future work to identify the benefits of reproducible science for healthcare use cases.
Testbed-18: Identifiers for Reproducible Science Summary Engineering Report
1. Scope
This report is a summary of activities undertaken in the execution of the Testbed 18 Identifiers for Reproducible Science Stream. This included the development of best practices to describe all steps of an Earth Observation scientific workflow, including:
input data from various sources such as files, APIs, and data cubes;
the workflow itself with the involved application(s) and corresponding parameterizations; and
output data.
The participants were also tasked with producing reproducible workflows and examining the feasibility of Whole Tale as a tool for reproducible workflows.
2. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
Open API Initiative: OpenAPI Specification 3.0.2, 2018 https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.2.md
van den Brink, L., Portele, C., Vretanos, P.: OGC 10-100r3, Geography Markup Language (GML) Simple Features Profile, 2012 http://portal.opengeospatial.org/files/?artifact_id=42729
W3C: HTML5, W3C Recommendation, 2019 http://www.w3.org/TR/html5/
Schema.org: http://schema.org/docs/schemas.html
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee: IETF RFC 2616, Hypertext Transfer Protocol — HTTP/1.1. RFC Publisher (1999). https://www.rfc-editor.org/info/rfc2616.
E. Rescorla: IETF RFC 2818, HTTP Over TLS. RFC Publisher (2000). https://www.rfc-editor.org/info/rfc2818.
G. Klyne, C. Newman: IETF RFC 3339, Date and Time on the Internet: Timestamps. RFC Publisher (2002). https://www.rfc-editor.org/info/rfc3339.
M. Nottingham: IETF RFC 8288, Web Linking. RFC Publisher (2017). https://www.rfc-editor.org/info/rfc8288.
H. Butler, M. Daly, A. Doyle, S. Gillies, S. Hagen, T. Schaub: IETF RFC 7946, The GeoJSON Format. RFC Publisher (2016). https://www.rfc-editor.org/info/rfc7946.
3. Terms, definitions and abbreviated terms
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
3.1. API Coverages
“A Web API for accessing coverages that are modeled according to the Coverage Implementation Schema (CIS) 1.1. Coverages are represented by some binary or ASCII serialization, specified by some data (encoding) format.” Open Geospatial Consortium
3.2. API Features
“A multi-part standard that offers the capability to create, modify, and query spatial data on the Web and specifies requirements and recommendations for APIs that want to follow a standard way of sharing feature data.” Open Geospatial Consortium
3.3. GeoTIFF
“A GeoTIFF file extension contains geographic metadata that describes the actual location in space that each pixel in an image represents.” Heavy.ai
3.4. Copernicus CORINE Land Cover dataset
A collection of land cover images covering 44 land cover classes, with data available since 1985. Copernicus
3.5. Data Cube
“A multi-dimensional (“n-D”) array of values, with emphasis on the fact that “cube” is just a metaphor to help illustrate a data structure that can in fact be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional.” Open Geospatial Consortium
3.6. DVC
“Open-Source Version Control System for Machine Learning Projects” DVC
3.7. GDAL
“A translator library for raster and vector geospatial data formats that is released under an MIT style Open Source License by the Open Source Geospatial Foundation.” GDAL
3.8. Non-Deterministic Models
An algorithm that can exhibit different behaviors on different runs, which is useful for finding approximate solutions when an exact solution would be far too difficult or expensive to derive using a deterministic algorithm. Engati
3.9. Parameterization Tuning / Tuning Parameters / Hyperparameters
Parameters that are a component of machine learning models that cannot be directly estimated from the data, but often control the complexity and variances in the model. Kuhn M. and Johnson K.
3.10. PyTorch
“An open source machine learning framework that accelerates the path from research prototyping to production deployment.” PyTorch
3.11. Whole Tale
“A scalable, open source, web-based, multi-user platform for reproducible research enabling the creation, publication, and execution of tales — executable research objects that capture data, code, and the complete software environment used to produce research findings.” Whole Tale
3.12. Landsat
The NASA/USGS Landsat Program provides the longest continuous space-based record of Earth’s land in existence. Landsat data give us information essential for making informed decisions about Earth’s resources and environment. National Aeronautics and Space Administration
3.13. Sentinel-2
A European wide-swath, high-resolution, multi-spectral imaging mission. European Space Agency
3.14. Software Heritage
Software Heritage is an organization that allows for the preservation, archiving, and sharing of source code for software. Software Heritage
3.15. ISEA3H DGGS
Icosahedral Snyder Equal Area Aperture 3 Hexagon Discrete Global Grid System — A specification for an equal area DGGS based on the Icosahedral Snyder Equal Area (ISEA) projection. Southern Oregon University
3.16. RDF Encoding
A metadata format that “provides interoperability between applications that exchange machine-understandable information on the Web.” W3
3.17. Zenodo
A platform and repository developed and operated by CERN to enable the sharing of research data and outputs. Zenodo
3.18. CodeMeta
“CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations.” CodeMeta
3.19. Abbreviated terms
ADES
Application Deployment and Execution Service
API
Application Programming Interface
AWS
Amazon Web Services
COG
Cloud Optimized GeoTIFF
CWL
Common Workflow Language
DGGS
Discrete Global Grid System
EMS
Execution Management Service
ESA
European Space Agency
FAIR
Findable, Accessible, Interoperable, Reusable
IANA
Internet Assigned Numbers Authority
MIME
Multipurpose Internet Mail Extensions
MODIS
Moderate Resolution Imaging Spectroradiometer
NASA
National Aeronautics and Space Administration
NSF
National Science Foundation
ODC
Open Data Cube
SCM
Software Configuration Management
SPDX
Software Package Data Exchange
STAC
SpatioTemporal Asset Catalog
SWG
Standards Working Group
USGS
United States Geological Survey
WG
Working Group
4. Introduction
The OGC’s Testbed 18 initiative explored six tasks, including: Advanced Interoperability for Building Energy; Secure Asynchronous Catalogs; Identifiers for Reproducible Science; Moving Features and Sensor Integration; 3D+ Data Standards and Streaming; and Machine Learning Training Data.
This component of OGC’s Testbed 18 focuses on Identifiers for Reproducible Science, part of Testbed Thread 3: Future of Open Science and Building Energy Interoperability (FOB).
Issues around sound science are the topic of numerous academic and technical journal articles. A common theme among these scholarly articles is that the reproducibility of studies is a key aspect of science.
OGC’s mission is to make location information Findable, Accessible, Interoperable, and Reusable (FAIR). This Testbed task explored and developed workflows demonstrating best practices at the intersection of FAIR data and reproducible science.
This task developed best practices to describe all steps of a scientific workflow, including:
data curation from various authoritative sources such as files, APIs, and data cubes;
the workflows themselves supporting multiple applications and corresponding parameterization tuning for machine learning processes; and
workflow outputs or results that support decision-making.
The workflows included in this component represent key areas of scientific discovery leveraging location information, Earth Observation data, and geospatial processes.
The descriptions of the models discuss how each step of the workflow can abide by FAIR principles.
Testbed 18 builds on OGC’s past work as well as broader industry efforts. One of the tasks of the Testbed was to explore the use of Whole Tale as a tool for reproducible workflows and, ideally, to work collaboratively with the Whole Tale team to identify and address limitations in the use of Whole Tale for reproducible workflows. Some of the work in the individual workflows builds on the participants’ efforts in previous Testbeds.
5. Common Considerations and Limitations for Reproducible Workflows
Over the course of this Testbed a number of considerations and limitations were identified pertaining to the participants’ workflows. The common considerations and limitations identified across multiple workflows are discussed in this section. A few considerations pertained only to specific workflows and are discussed in the individual component sections that follow.
5.1. The Expansion of FAIR
During the Testbed, FAIR was expanded to include four related properties of reproducible workflows (reproducible-FAIR): replicability, repeatability, reproducibility, and reusability, each defined and illustrated below.
Replicability: A process with the same input yields the same output.
An analysis of soil characteristics of a soil sample produces the same result each time the process is run on the same sample.
Repeatability: A process with a similar input yields the same output.
An analysis of soil characteristics performed on a different but otherwise identical soil sample produces the same result.
Reproducibility: Different inputs, platforms, and outputs result in the same conclusion.
The classification workflow for a particular city based on an analysis of Landsat data should produce a similar result when performed using aerial or other satellite imagery captured at the same time for the same location. Additionally, the workflow and results should be the same whether run on different Cloud-based computing environments (e.g., Google Cloud Platform (GCP), Microsoft Azure, or Amazon Web Services).
Reusability: The ability to use a specific workflow for different areas with the same degree of accuracy and reliability on the output.
An image classification workflow created for a specific city could also be used to classify a different city within a level of accuracy or confidence interval.
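These four properties can also be checked programmatically. The following minimal Python sketch, which assumes a hypothetical run_workflow function that writes its result to a file and returns the file’s path, illustrates an automated replicability check based on comparing output checksums across runs; an analogous repeatability check could pass a different but equivalent input to each run.

```python
# Minimal sketch of an automated replicability check.
# `run_workflow` is a hypothetical stand-in for any of the Testbed workflows;
# it is assumed to accept an input path and return the path of its output file.
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, used as a fingerprint of an output."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_replicability(run_workflow, input_path: Path, runs: int = 2) -> bool:
    """Replicability: the same input must yield byte-identical output on every run."""
    digests = {sha256_of(run_workflow(input_path)) for _ in range(runs)}
    return len(digests) == 1
```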
5.2. Randomness in Deep Learning Applications
Randomness is important for deep learning models and applications for optimization and generalization. However, the randomness inherent in the models makes true reproducibility a challenge.
5.2.1. Where Does Randomness Appear in Deep Learning Models?
Hardware
Environment
Software/framework: different version releases, individual commits, operating systems, etc.
Function: how the models are written as code and the package dependencies based on different programming languages
Algorithm: random initialization, data augmentation, data shuffling, and stochastic layers (dropout, noisy activations)
5.2.2. Why is Randomness Critical and Important to Deep Learning Applications?
Different outputs from the same input: normally the expected result is the same output given the same input (e.g., classification or detection), but sometimes some “creativity” is necessary in the model. Examples include the first move in a game of Go or drawing a picture on a blank canvas.
Learning/optimization: a neural network loss function has many local minima, and it is easy to get stuck in a local minimum during the optimization process. Randomness in many algorithms allows the optimization to escape local minima, such as the random sampling in stochastic gradient descent (SGD).
Generalization: randomness brings better generalization to the network by injecting noise into the learning process, such as dropout (see the sketch below).
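As a small illustration of the first point above, the following PyTorch sketch (assuming only that the torch package is installed) shows a dropout layer producing different outputs for the same input on successive calls; the values in the comments are examples only.

```python
import torch

# A dropout layer in training mode zeroes a random subset of its inputs
# (and rescales the rest), so the same input produces different outputs
# on successive calls.
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(6)

print(dropout(x))  # e.g., tensor([2., 0., 2., 2., 0., 2.])
print(dropout(x))  # a different random pattern on the second call
```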
5.2.3. How to Limit the Source of Randomness and Non-Deterministic Behaviors?
Using PyTorch as an example, the sources of randomness can be limited through the following (a minimal seeding sketch appears after this list).
Control sources of randomness
torch.manual_seed(0): control the random seed of PyTorch operations; and
random.seed(0): control the random seed of customized Python operations.
Avoiding the use of non-deterministic algorithms
torch.backends.cudnn.benchmark = False: when this feature is enabled, the cuDNN library benchmarks several candidate algorithms for each new configuration and selects the fastest one, which can vary between runs. Disabling the feature improves determinism but could cause a reduction in performance.
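The calls above can be collected into a single helper, as in the following minimal sketch. The NumPy seeding and the torch.use_deterministic_algorithms switch are additions beyond the calls listed in this section and should be treated as optional; full determinism on GPUs may still require further environment settings.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the main sources of randomness discussed in this section."""
    random.seed(seed)        # Python's own RNG (customized Python operations)
    np.random.seed(seed)     # NumPy-based data loading / augmentation
    torch.manual_seed(seed)  # PyTorch operations on CPU and GPU

    # Do not let cuDNN auto-select the fastest (possibly non-deterministic) kernels.
    torch.backends.cudnn.benchmark = False
    # Ask PyTorch to raise an error whenever a non-deterministic operation is used.
    torch.use_deterministic_algorithms(True)


seed_everything(0)
```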
5.2.4. What is the Trade-Off Between Randomness and Reproducibility?
Randomness plays an important role in deep learning, so identical results are not guaranteed and should not be expected. The same or similar results demanded by reproducibility may therefore be an important, but not a critical, issue in deep learning models. Instead, transparency is important in the reproducibility of deep learning models so that every step of the process can be examined.
5.3. Cloud Utilization Cost Estimation
Cloud utilization costs are not always clear. Any Cloud-based development effort should make a concerted effort to track fees associated with the Cloud infrastructure. Most, if not all, Cloud hosting firms make tools available to help anticipate costs. For instance, AWS provides a cost calculator that can help provide insight into future Cloud hosting and utilization fees.
6. Individual Component Descriptions
6.1. 52°North
6.1.1. Goals of Participation
The goals of participation included the selection and development of a viable workflow on Whole Tale that provides the following features.
End-users will be able to experience how Spatial Data Analysis can be published in a reproducible FAIR manner.
The implementation of this task will help OGC Working Groups derive requirements and limitations of existing standards for enabling reproducible FAIR workflows.
Developers will be able to follow a proof of concept for setting up a reproducible FAIR workflow based on OGC standards and an Open Data Cube instance.
6.1.2. Contributed Workflows and Architecture
52°North brought in a selection of use-cases from their own and partner research activities. From these potential workflows the use-case “Exploring Wilderness Using Explainable Machine Learning in Satellite Imagery” was chosen. This scientific study was conducted as part of the “KI:STE — Artificial Intelligence (AI) strategy for Earth system data” project and was already published on arXiv (https://arxiv.org/abs/2203.00379). The goal of the study was the detection of wilderness areas using remote sensing data from Sentinel-2. Moreover, the developed machine learning models allow the interpretation of the results by applying explainable machine learning techniques. The study area is Fennoscandia (https://en.wikipedia.org/wiki/Fennoscandia). For this region the AnthroProtect dataset was prepared and openly released (http://rs.ipb.uni-bonn.de/data/anthroprotect/). This dataset consists of preprocessed Sentinel-2 data. The regions of interest were determined using data from the Copernicus CORINE Land Cover dataset and from the World Database on Protected Areas (WDPA). Additionally, land cover data from five different sources are part of the AnthroProtect dataset: Copernicus CORINE Land Cover dataset, MODIS Land Cover Type 1, Copernicus Global Land Service, ESA GlobCover, and Sentinel-2 scene classification map.
In order to make the data available inside Whole Tale and to investigate reproducibility aspects of OGC APIs in conjunction with Open Data Cube, the AnthroProtect dataset was imported and indexed in an Open Data Cube instance and published via API Coverages and STAC. It would have been beneficial to offer some sub-processes of the workflow via API Processes, but this was not possible within these Testbed activities.
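As an illustration of how the published data could be discovered by a client, the following sketch uses the pystac-client library to search a STAC API. The endpoint URL, collection identifier, and bounding box are placeholders rather than details of the actual Testbed deployment.

```python
# Minimal sketch of discovering published data through a STAC API.
# The endpoint URL and collection id below are hypothetical placeholders.
from pystac_client import Client

catalog = Client.open("https://example.org/stac")   # hypothetical STAC endpoint
search = catalog.search(
    collections=["anthroprotect"],                  # hypothetical collection id
    bbox=[4.0, 55.0, 32.0, 72.0],                   # roughly Fennoscandia
    max_items=10,
)
for item in search.items():
    print(item.id, [asset.href for asset in item.assets.values()])
```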
The source code of the original study is available at https://gitlab.jsc.fz-juelich.de/kiste/wilderness. A slightly modified version is available at https://gitlab.jsc.fz-juelich.de/kiste/asos and was used as a starting point for Testbed 18. Based on these developments, a separate GitHub repository was created (https://github.com/52North/testbed18-wilderness-workflow) which includes parts of the original workflow. From this repository, a tale on Whole Tale (https://dashboard.wholetale.org/run/633d5fb4eb89f198ef8ce83f) was created that interested users can execute to reproduce parts of the study.
6.1.3. Workflow Description
Figure 1 shows an overview of the workflow steps with their inputs and outputs performed in the original study.
The first step of the workflow, the preparation of the AnthroProtect dataset, was performed with Google Earth Engine (GEE). The download and preprocessing are described in detail in the research article and can be executed using Jupyter notebooks in a sub-project of the source repository (https://gitlab.jsc.fz-juelich.de/kiste/asos/-/tree/main/projects/anthroprotect). Due to time constraints this step was excluded from the Testbed activities. The prepared dataset can be downloaded as a zip file and is regarded as the “source of truth” from which reproducibility is enhanced. Some thoughts on reproducibility regarding closed-source APIs/services like GEE are described in 52°North’s future work section.
Steps 2 and 3 include the training of the ML model and a sensitivity analysis (Activation Space Occlusion Sensitivity (ASOS)) which helps to interpret model results. For details, refer to the research article mentioned in the previous paragraph. As these steps are computationally intensive, with processing times that are not well suited to demonstration purposes, they are not part of the Whole Tale tale. However, it would be interesting to set up OGC API Processes for these steps. Further reproducibility aspects of machine learning could then be studied, e.g., how the choice of training data or hyperparameter tuning influences model weights, or how ML models can be versioned, and it could be demonstrated how the OGC API — Processes Standard can contribute to reproducibility.
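As one simple illustration of how trained models could be versioned, the following sketch saves a model’s weights and records a checksum together with the hyperparameters that produced them. The file names and manifest structure are illustrative only and do not describe the approach used in the original study.

```python
# Minimal sketch of versioning trained model weights: record a checksum of the
# saved state dict together with the hyperparameters that produced it.
import hashlib
import json
from pathlib import Path

import torch


def record_model_version(model: torch.nn.Module, hyperparams: dict, out_dir: Path) -> Path:
    """Save weights plus a small JSON manifest linking them to their hyperparameters."""
    out_dir.mkdir(parents=True, exist_ok=True)
    weights_path = out_dir / "model_weights.pt"
    torch.save(model.state_dict(), weights_path)

    manifest = {
        "weights_sha256": hashlib.sha256(weights_path.read_bytes()).hexdigest(),
        "hyperparameters": hyperparams,
    }
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```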
The trained model and the calculated sensitivities can be used to analyze Sentinel-2 scenes and predict activation and sensitivity maps. The sensitivity maps show the classification of a scene as wild or anthropogenic. Workflow step 5.1 allows such an analysis to be performed with the available Sentinel-2 samples and is the core of the Whole Tale tale developed in these Testbed activities. Workflow step 4 allows the activation space to be inspected in detail and supports investigating how areas in the activation space relate to land cover classes.