OGC Engineering Report

Testbed-18: Identifiers for Reproducible Science Summary Engineering Report
Paul Churchyard Editor Ajay Gupta Editor
Additional Formats: PDF
OGC Engineering Report


Document number: 22-020r1
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Use of this document is subject to the license agreement at

I.  Executive Summary

The Testbed 18 task Identifiers for Reproducible Science explored and developed EO workflows demonstrating best practices from FAIR data and reproducible science while exploring the usability of Whole Tale as a component for reproducible workflows. EO workflows involve multiple processing steps, such as imagery pre-processing, the training of AI/ML models, the fusion and mosaicking of imagery, and other analytical processes. For EO workflows to be reproducible, each step of the process must be documented in sufficient detail, and that documentation must be made available to scientists and end users.

The following five organizations contributed workflows for the Testbed 18 Reproducible Science task: 52 North, Arizona State University, Ecere, GeoLabs, and Terradue.

Over the course of the Reproducible Science task, multiple considerations and limitations for reproducible workflows were discovered, including the expansion of FAIR terminology, randomness in deep learning applications, and Cloud utilization cost estimation.

Recommended future work includes examining the impact of FAIR workflows on healthcare use cases, where such workflows can make data more available and reliable for researchers, healthcare practitioners, emergency response personnel, and decision makers.

II.  Keywords

The following are keywords to be used by search engines and document catalogues.

testbed, docker, web service, reproducibility, earth observation, workflows, whole tale, deep learning, fair

III.  Security considerations

No security considerations have been made for this document.

IV.  Submitters

All questions regarding this document should be directed to the editor or the contributors:

Table — Submitters

Name | Organization | Role
Paul Churchyard | HSR.health | Editor
Ajay K. Gupta | HSR.health | Editor
Martin Pontius | 52 North | Contributor
Chia-Yu Hsu | Arizona State University | Contributor
Jerome Jacovella-St-Louis | Ecere | Contributor
Patrick Dion | Ecere | Contributor
Gérald Fenoy | GeoLabs | Contributor
Fabrice Brito | Terradue | Contributor
Pedro Goncalves | Terradue | Contributor
Josh Lieberman | OGC | Contributor

V.  Abstract

The OGC’s Testbed 18 initiative explored the following six tasks: Advanced Interoperability for Building Energy; Secure Asynchronous Catalogs; Identifiers for Reproducible Science; Moving Features and Sensor Integration; 3D+ Data Standards and Streaming; and Machine Learning Training Data.

Testbed 18 Task 3, Identifiers for Reproducible Science, explored and developed workflows demonstrating best practices at the intersection of Findable, Accessible, Interoperable, and Reusable (or FAIR) data and reproducible science.

The workflows developed in this Testbed included:

Testbed 18 participants identified considerations and limitations for reproducible workflows and recommendations for future work to identify the benefits of reproducible science for healthcare use cases.

Testbed-18: Identifiers for Reproducible Science Summary Engineering Report

1.  Scope

This report is a summary of activities undertaken in the execution of the Testbed 18 Identifiers for Reproducible Science Stream. This included the development of best practices to describe all steps of an Earth Observation scientific workflow, including:

The participants were also tasked with producing reproducible workflows and examining the feasibility of Whole Tale as a tool for reproducible workflows.

2.  Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

OpenAPI Initiative: OpenAPI Specification 3.0.2, 2018

van den Brink, L., Portele, C., Vretanos, P.: OGC 10-100r3, Geography Markup Language (GML) Simple Features Profile, 2012

W3C: HTML5, W3C Recommendation, 2019

R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee: IETF RFC 2616, Hypertext Transfer Protocol — HTTP/1.1. RFC Publisher (1999).

E. Rescorla: IETF RFC 2818, HTTP Over TLS. RFC Publisher (2000).

G. Klyne, C. Newman: IETF RFC 3339, Date and Time on the Internet: Timestamps. RFC Publisher (2002).

M. Nottingham: IETF RFC 8288, Web Linking. RFC Publisher (2017).

H. Butler, M. Daly, A. Doyle, S. Gillies, S. Hagen, T. Schaub: IETF RFC 7946, The GeoJSON Format. RFC Publisher (2016).

3.  Terms, definitions and abbreviated terms

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this document, the following additional terms and definitions apply.

3.1. API Coverages

“A Web API for accessing coverages that are modeled according to the Coverage Implementation Schema (CIS) 1.1. Coverages are represented by some binary or ASCII serialization, specified by some data (encoding) format.” Open Geospatial Consortium

3.2. API Features

“A multi-part standard that offers the capability to create, modify, and query spatial data on the Web and specifies requirements and recommendations for APIs that want to follow a standard way of sharing feature data.” Open Geospatial Consortium

3.3. GeoTIFF

A GeoTIFF file contains geographic metadata that describes the actual location in space that each pixel in an image represents.

3.4. Copernicus CORINE Land Cover dataset

A collection of land cover images covering 44 land cover classes, produced since 1985. Copernicus

3.5. Data Cube

“A multi-dimensional (“n-D”) array of values, with emphasis on the fact that “cube” is just a metaphor to help illustrate a data structure that can in fact be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional.” Open Geospatial Consortium

3.6. DVC

“Open-Source Version Control System for Machine Learning Projects” DVC

3.7. GDAL

“A translator library for raster and vector geospatial data formats that is released under an MIT style Open Source License by the Open Source Geospatial Foundation.” GDAL

3.8. Non-Deterministic Models

An algorithm that can exhibit different behaviors on different runs. Non-deterministic algorithms are useful for finding approximate solutions when an exact solution would be too difficult or expensive to derive with a deterministic algorithm. Engati

3.9. Parameterization Tuning / Tuning Parameters / Hyperparameters

Parameters of a machine learning model that cannot be estimated directly from the data but often control the model’s complexity and variance. Kuhn M. and Johnson K.

3.10. PyTorch

“An open source machine learning framework that accelerates the path from research prototyping to production deployment.” PyTorch

3.11. Whole Tale

“A scalable, open source, web-based, multi-user platform for reproducible research enabling the creation, publication, and execution of tales — executable research objects that capture data, code, and the complete software environment used to produce research findings.” Whole Tale

3.12. LandSat

The NASA/USGS Landsat Program provides the longest continuous space-based record of Earth’s land in existence. Landsat data provide information essential for making informed decisions about Earth’s resources and environment. National Aeronautics and Space Administration

3.13. Sentinel-2

A European wide-swath, high-resolution, multi-spectral imaging mission. European Space Agency

3.14. Software Heritage

Software Heritage is an organization that allows for the preservation, archiving, and sharing of source code for software. Software Heritage


3.15. ISEA3H Discrete Global Grid System

A specification for an equal-area DGGS based on the Icosahedral Snyder Equal Area (ISEA) projection with aperture 3 hexagonal cells. Southern Oregon University

3.16. RDF Encoding

A metadata format that “provides interoperability between applications that exchange machine-understandable information on the Web.” W3

3.17. Zenodo

A platform and repository developed and operated by CERN to enable the sharing of research data and outputs. Zenodo

3.18. CodeMeta

“CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations.” CodeMeta

3.19.  Abbreviated terms


ADES: Application Deployment and Execution Service

API: Application Programming Interface

AWS: Amazon Web Services

COG: Cloud Optimized GeoTIFF

CWL: Common Workflow Language

DGGS: Discrete Global Grid System

EMS: Exploitation Platform Management Service

ESA: European Space Agency

FAIR: Findable, Accessible, Interoperable, Reusable

IANA: Internet Assigned Numbers Authority

MIME: Multipurpose Internet Mail Extensions

MODIS: Moderate Resolution Imaging Spectroradiometer

NASA: National Aeronautics and Space Administration

NSF: National Science Foundation

ODC: Open Data Cube

SCM: Software Configuration Management

SPDX: Software Package Data Exchange

STAC: Spatio-Temporal Asset Catalog

SWG: Standards Working Group

USGS: United States Geological Survey

WG: Working Group

4.  Introduction

The OGC’s Testbed 18 initiative explored six tasks, including: Advanced Interoperability for Building Energy; Secure Asynchronous Catalogs; Identifiers for Reproducible Science; Moving Features and Sensor Integration; 3D+ Data Standards and Streaming; and Machine Learning Training Data.

This component of OGC’s Testbed 18 focuses on Identifiers for Reproducible Science, part of Testbed Thread 3: Future of Open Science and Building Energy Interoperability (FOB).

Issues around the practice of sound science are the topic of numerous academic and technical journals. A common theme among these scholarly articles is that the reproducibility of studies is a key aspect of science.

OGC’s mission is to make location information Findable, Accessible, Interoperable, and Reusable (FAIR). This Testbed task will explore and develop workflows demonstrating best practices at the intersection of FAIR data and reproducible science.

This task shall develop best practices to describe all steps of a scientific workflow, including:

The workflows included in this component represent key areas of scientific discovery leveraging location information, Earth Observation data, and geospatial processes.

The description of the models will discuss how each step of the workflow can abide by FAIR principles.

Testbed 18 builds on the OGC’s past work as well as broader industry efforts. One of the tasks of the Testbed is to explore the use of Whole Tale as a tool for reproducible workflows and, ideally, to work collaboratively with the Whole Tale team to identify and address limitations in the use of Whole Tale for reproducible workflows. Some of the work in the individual workflows builds on the participants’ efforts in previous Testbeds.

5.  Common Considerations and Limitations for Reproducible Workflows

Over the course of this testbed a number of considerations and limitations were discovered pertaining to the workflows of the participants. The common considerations and limitations identified across multiple workflows are discussed in this section. There are a few considerations that pertained to certain workflows which are discussed in the individual component sections that follow.

5.1.  The Expansion of FAIR

In the context of this task, four related concepts extend the FAIR principles.

  1. Replicability: A process with the same input yields the same output.

    • An analysis of soil characteristics of a soil sample produces the same result each time the process is run on the same sample.

  2. Repeatability: A process with a similar input yields the same output.

    • An analysis of soil characteristics performed on an identical but different soil sample produces the same result.

  3. Reproducibility: Different inputs, platforms, and outputs result in the same conclusion.

    • The classification workflow for a particular city based on an analysis of Landsat data should yield a similar result when performed with aerial or other satellite imagery captured at the same time for the same location. Additionally, the workflow and results should be the same whether run on different Cloud-based computing environments (e.g., Google Cloud Platform (GCP), Microsoft Azure, or Amazon Web Services).

  4. Reusability: The ability to use a specific workflow for different areas with the same degree of accuracy and reliability on the output.

    • An image classification workflow created for a specific city could also be used to classify a different city within a level of accuracy or confidence interval.

5.2.  Randomness in Deep Learning Applications

Randomness is important for deep learning models and applications for optimization and generalization. However, the randomness inherent in the models makes true reproducibility a challenge.

5.2.1.  Where Does Randomness Appear in Deep Learning Models?

  • Hardware

  • Environment

  • Software/framework: different version releases, individual commits, operating systems, etc.

  • Function: how the models are written as code and the package dependencies based on different programming languages

  • Algorithm: random initialization, data augmentation, data shuffling, and stochastic layers (dropout, noisy activations)

5.2.2.  Why is Randomness Critical and Important to Deep Learning Applications?

  • Different outputs from the same input: normally the same output is expected given the same input, e.g., in classification or detection, but sometimes some “creativity” is necessary in the model. Examples are the first move in a game of Go or drawing a picture starting from a blank canvas.

  • Learning/optimization: a neural network loss function has many local minima, and the optimization process can easily get stuck in one of them. Randomness in many algorithms, such as the random sampling in stochastic gradient descent (SGD), allows the optimization to bounce out of local minima.

  • Generalization: randomness brings better generalization by injecting noise into the learning process, as with dropout.
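The interplay between noise and optimization can be illustrated with a self-contained toy (not drawn from any Testbed workflow): plain gradient descent settles into whichever minimum is nearest, injected Gaussian noise lets the iterate explore, and yet a fixed seed keeps even the noisy run bit-for-bit repeatable.

```python
import random

# Toy loss f(x) = x**4 - 3*x**2 + x with a shallow local minimum near
# x = 1.13 and a deeper global minimum near x = -1.30.
def grad(x: float) -> float:
    return 4 * x**3 - 6 * x + 1

def descend(x: float, steps: int = 2000, lr: float = 0.01,
            noise: float = 0.0, seed: int = 0) -> float:
    """Gradient descent with optional injected Gaussian noise (SGD-like)."""
    rng = random.Random(seed)  # a local RNG: the trajectory is fixed by `seed`
    for _ in range(steps):
        x -= lr * (grad(x) + noise * rng.gauss(0.0, 1.0))
    return x

plain = descend(1.2)                         # settles in the nearby local minimum
noisy_a = descend(1.2, noise=5.0, seed=42)   # noise lets the iterate wander
noisy_b = descend(1.2, noise=5.0, seed=42)   # same seed: identical trajectory
```

The point for reproducibility is the last pair of calls: the stochastic run is still exactly repeatable once its seed is pinned, which is the pattern the next subsection applies to PyTorch.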

5.2.3.  How to Limit the Source of Randomness and Non-Deterministic Behaviors?

Using PyTorch as an example, the sources of randomness can be limited through:

  • Control sources of randomness

    • torch.manual_seed(0): control the random seed of PyTorch operations; and

    • random.seed(0): control the random seed of customized Python operations.

  • Avoiding using non-deterministic algorithms

    • torch.backends.cudnn.benchmark = False: when enabled, cuDNN benchmarks several algorithms for each new input configuration and selects the fastest, which can vary between runs. Disabling the feature improves determinism but may reduce performance.
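The controls above can be gathered into a single helper, sketched below. This is illustrative rather than part of any participant’s deliverable; the NumPy and PyTorch calls are guarded so the snippet also runs where those packages are absent.

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Pin every source of randomness this process controls."""
    random.seed(seed)                         # Python's own RNG
    os.environ["PYTHONHASHSEED"] = str(seed)  # propagated to subprocesses
    try:
        import numpy as np
        np.random.seed(seed)                  # NumPy RNG, if installed
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)               # PyTorch CPU and GPU RNGs
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass

seed_everything(0)
a = [random.random() for _ in range(3)]
seed_everything(0)
b = [random.random() for _ in range(3)]
assert a == b  # re-seeding reproduces the draws exactly
```

Note that torch.use_deterministic_algorithms(True) raises an error at runtime if a model invokes an operation that has no deterministic implementation, which makes hidden non-determinism visible rather than silently tolerated.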

5.2.4.  What is the Trade-Off Between Randomness and Reproducibility?

Randomness plays an important role in deep learning, so identical results are not guaranteed and should not be expected. Exact reproduction of results may therefore be an important, but not a critical, requirement for deep learning models. Instead, transparency is essential to the reproducibility of deep learning models so that every step of the process can be examined.

5.3.  Cloud Utilization Cost Estimation

Cloud utilization costs are not always transparent. Any Cloud-based development effort should make a concerted effort to track the fees associated with its Cloud infrastructure. Most, if not all, Cloud hosting firms provide tools to help anticipate costs. For instance, the AWS Pricing Calculator can provide insight into future Cloud hosting and utilization fees.
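As a rough illustration of the kind of tracking involved, the sketch below folds compute, storage, and data egress into a single monthly estimate. All rates are hypothetical placeholders, not actual AWS prices; real figures should come from the provider’s pricing tool.

```python
def estimate_monthly_cost(instance_hourly_rate: float,
                          hours_per_month: float,
                          storage_gb: float,
                          storage_rate_per_gb: float,
                          egress_gb: float = 0.0,
                          egress_rate_per_gb: float = 0.0) -> float:
    """Sum the three fee categories that dominate most EO workloads."""
    compute = instance_hourly_rate * hours_per_month   # VM / GPU node time
    storage = storage_gb * storage_rate_per_gb         # imagery at rest
    egress = egress_gb * egress_rate_per_gb            # results downloaded
    return round(compute + storage + egress, 2)

# e.g. a GPU training node running 40 h/month, 500 GB of imagery,
# and 100 GB of downloads (all rates hypothetical)
cost = estimate_monthly_cost(2.50, 40, 500, 0.023, 100, 0.09)  # -> 120.5
```

Even a crude estimate like this, kept next to the workflow description, makes the cost of reproducing a run visible to the next researcher.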

6.  Individual Component Descriptions

6.1.  52°North

6.1.1.  Goals of Participation

The goals of participation included the selection and development of a viable workflow on Whole Tale that provides the following features.

  • End-users will be able to experience how Spatial Data Analysis can be published in a reproducible FAIR manner.

  • The implementation of this task will help OGC Working Groups derive requirements and limitations of existing standards for the enablement of reproducible FAIR workflows.

  • Developers will be able to follow a proof of concept for setting up a reproducible FAIR workflow based on OGC standards and an Open Data Cube instance.

6.1.2.  Contributed Workflows and Architecture

52°North brought in a selection of use cases from their own and partner research activities. From these potential workflows the use case “Exploring Wilderness Using Explainable Machine Learning in Satellite Imagery” was chosen. This scientific study was conducted as part of the “KI:STE — Artificial Intelligence (AI) strategy for Earth system data” project and has already been published on arXiv. The goal of the study was the detection of wilderness areas using remote sensing data from Sentinel-2. Moreover, the developed machine learning models allow the interpretation of the results by applying explainable machine learning techniques. The study area is Fennoscandia. For this region the AnthroProtect dataset was prepared and openly released. This dataset consists of preprocessed Sentinel-2 data. The regions of interest were determined using data from the Copernicus CORINE Land Cover dataset and from the World Database on Protected Areas (WDPA). Additionally, land cover data from five different sources are part of the AnthroProtect dataset: the Copernicus CORINE Land Cover dataset, MODIS Land Cover Type 1, Copernicus Global Land Service, ESA GlobCover, and the Sentinel-2 scene classification map.

In order to make the data available inside Whole Tale and to investigate reproducibility aspects of OGC APIs in conjunction with Open Data Cube, the AnthroProtect dataset was imported and indexed in an Open Data Cube instance and published via API Coverages and STAC. It would have been beneficial to offer some sub-processes of the workflow via API Processes, but this was not possible within these Testbed activities.
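To give a concrete sense of what publication via STAC entails, the sketch below assembles and checks a minimal STAC Item. The identifier, collection name, geometry, and asset URL are hypothetical placeholders, not the actual published AnthroProtect records.

```python
# Core fields every STAC 1.0.0 Item must carry.
REQUIRED = {"type", "stac_version", "id", "geometry", "properties",
            "links", "assets"}

def validate_item(item: dict) -> list:
    """Return the STAC Item core fields that are missing, sorted."""
    return sorted(REQUIRED - item.keys())

item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "anthroprotect-s2-sample-0001",   # hypothetical identifier
    "collection": "anthroprotect",           # hypothetical collection name
    "geometry": {"type": "Polygon",
                 "coordinates": [[[20.0, 68.0], [21.0, 68.0], [21.0, 69.0],
                                  [20.0, 69.0], [20.0, 68.0]]]},
    "bbox": [20.0, 68.0, 21.0, 69.0],
    "properties": {"datetime": "2020-07-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {  # a preprocessed Sentinel-2 tile; URL is a placeholder
            "href": "https://example.org/anthroprotect/sample-0001.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

missing = validate_item(item)  # empty list when all core fields are present
```

Because each Item pins an identifier, a footprint, a timestamp, and the asset location, a catalog of such records is itself a reproducibility artifact: it states exactly which granules fed the workflow.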

The source code of the original study is publicly available, and a slightly modified version was used as a starting point for Testbed 18. Based on these developments a separate GitHub repository was created which includes parts of the original workflow. From this repository a tale was created on Whole Tale which interested users can execute to reproduce parts of the study.

6.1.3.  Workflow Description

Figure 1 shows an overview of the workflow steps with their inputs and outputs performed in the original study.

The first step of the workflow, the preparation of the AnthroProtect dataset, was performed with Google Earth Engine (GEE). The download and preprocessing are described in detail in the research article and can be executed using Jupyter notebooks in a sub-project of the source repository. Due to time constraints this step was excluded from the Testbed activities. The prepared dataset can be downloaded as a zip file and is regarded as the “source of truth” from which reproducibility is enhanced. Some thoughts on reproducibility regarding closed-source APIs/services like GEE are given in 52°North’s future work section.

Steps 2 and 3 include the training of the ML model and a sensitivity analysis (Activation Space Occlusion Sensitivity (ASOS)) which helps to interpret model results. For details, refer to the research article mentioned in the previous paragraph. As these steps are processing intensive, with processing times that are not well suited for demonstration purposes, they are not part of the Whole Tale tale. However, it would be interesting to set up OGC API Processes for them. Further reproducibility aspects of machine learning could then be studied, e.g., how the choice of training data or hyperparameter tuning influences model weights, or how ML models can be versioned, and it could be demonstrated how OGC API Processes can contribute to reproducibility.
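One lightweight way such training steps could be made auditable, sketched here as an illustration rather than as part of the 52°North deliverable, is to record the hyperparameters, data checksum, and seed of each run in a manifest stored next to the model weights. All field names below are illustrative.

```python
import hashlib
import json

def training_manifest(hyperparams: dict, data_sha256: str, seed: int) -> dict:
    """Describe one training run in a form another researcher can re-run."""
    manifest = {
        "hyperparameters": hyperparams,
        "training_data_sha256": data_sha256,  # pins the exact input dataset
        "random_seed": seed,                  # pins the stochastic trajectory
    }
    # Fingerprint the manifest itself so a weights file can reference it.
    blob = json.dumps(manifest, sort_keys=True).encode()
    manifest["manifest_sha256"] = hashlib.sha256(blob).hexdigest()
    return manifest

m = training_manifest({"lr": 1e-4, "epochs": 50, "batch_size": 32},
                      data_sha256="<checksum of the AnthroProtect zip>",
                      seed=0)
```

Because the fingerprint is derived deterministically from the inputs, two researchers who train with the same data, hyperparameters, and seed produce manifests with the same hash, which is exactly the identifier-for-reproducibility idea this task explores.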

The trained model and the calculated sensitivities can be used to analyze Sentinel-2 scenes to predict activation and sensitivity maps. The sensitivity maps show the classification of a scene as wild or anthropogenic. Workflow step 5.1 performs such an analysis with available Sentinel-2 samples and is the core of the Whole Tale tale developed in these Testbed activities. Workflow step 4 allows the activation space to be inspected in detail and investigates how areas in the activation space relate to land cover classes.