OGC Engineering Report

Testbed-19: Machine Learning Models Engineering Report
Samantha Lavender, Editor
Trent Tinker, Editor


Document number: 23-033
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Use of this document is subject to the license agreement at

I.  Executive Summary

This OGC Testbed 19 Engineering Report (ER) details work to develop a foundation for future standardization of Machine Learning (ML) models for transfer learning. The work is based on previous OGC ML activities and has focused on evaluating the status quo of transfer learning, metadata implications for geo-ML applications of transfer learning, and general questions of sharing and re-use.

The scope is geospatial, especially Earth Observation (EO) applications, with Testbed participants having considered transferring models between software applications, between application domains, and between geographical locations, as well as from synthetic datasets to real EO data. GeoLabs developed an end-to-end, web-services-based framework for training, fine-tuning from a pre-trained model, visualizing model graphs, and inferencing. George Mason University proposed spatiotemporally transferable learning algorithms and a temporal learning strategy to maximally transfer label data and models from the US case to foreign countries. Pixalytics tested transfer learning by freezing all layers except the bottom layer of a Neural Network model and then training it to detect a specific category of waste plastic; they also took the Meta AI Segment Anything Model, which was developed for machine vision, and applied it where hyperspectral data were used. Finally, experiments were undertaken to understand which synthetic dataset approach yielded the best results before using these datasets for model backbone training, fine-tuning a COCO model backbone with no real data used in training.

In addition to these experiments, the participants reviewed and provided feedback on research questions outlined within the call for participation. The answers have been formulated around the FAIR principles, considering the description of an ML model to support findability, provide access, and support interoperability. The participants also questioned how an ML Model should be described to enable efficient re-use through transfer learning applications. This last section considered taxonomy, quality measures, the relationship to the training data, and the model’s performance envelope and metrics.

In the Summary & Recommendations section, the ER reviews the findings and makes recommendations about next steps, both for the experiments conducted and for broader implications for OGC. Coordination is needed to ensure that the work of the OGC Standards Working Groups brings together the different elements needed to store and share ML models. A focus on metadata is critical to allow users to understand what is available and applicable to their needs. In addition, standardization of naming will support interoperability.

II.  Keywords

The following are keywords to be used by search engines and document catalogues.

Machine Learning, Transfer Learning, Earth Observation

III.  Contributors

All questions regarding this document should be directed to the editors or the contributors:

Sam Lavender, Pixalytics Ltd, Editor
Trent Tinker, OGC, Editor
Goncalo Maia, EUSatCen, Contributor
Rajat Shinde, GeoLabs/NASA-IMPACT/UAH, Contributor
Gérald Fenoy, GeoLabs, Contributor
Adrian Akbari, GeoLabs, Contributor
Chen Zhang, GMU, Contributor
Chris Andrews, Rendered.ai, Contributor
Jim Antonisse, WiSC/NGA-TAES, Contributor

IV.  Abstract

The OGC Testbed 19 initiative explored six tasks, including this task, which focused on “Machine Learning: Transfer Learning for Geospatial Applications.”

This OGC Testbed 19 Engineering Report (ER) documents work to develop the foundation for future standardization of Machine Learning models for transfer learning within geospatial, especially Earth Observation, applications. The ER reviews the findings of transfer learning experiments and makes recommendations about the next steps in terms of both the experiments conducted and broader implications for OGC.

1.  Introduction

New and revolutionary Artificial Intelligence (AI) and Machine Learning (ML) algorithms developed over the past ten years have great potential to advance the processing and analysis of Earth Observation (EO) data, but comprehensive standards for this technology have yet to emerge. The Open Geospatial Consortium (OGC) has investigated opportunities for ML standards in EO, such as the ML threads in Testbeds 14, 15, and 16. Further, the OGC TrainingDML-AI (TDML) Standards Working Group (SWG) developed the Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Part 1: Conceptual Model Standard. The SWG also provided analyses of the Standard and recommendations for next steps in the Testbed-18 ML thread. Testbed 19 builds on these previous efforts.

1.1.  Introduction to Transfer Learning

Transfer learning is a technique in ML where the knowledge learned from a task is re-used to boost performance and reduce costs on a related task.

Among the most productive methods in the application of ML to new domains has been the re-use of existing ML solutions for new problems. In this approach, a subset of the Domain Model produced by applying ML in a related domain is taken as the starting point for addressing the new problem. The advantage is that the investment in the previous ML task, which can be enormous both in terms of the Training Dataset (TDS) generation and the computing power required to refine the model, can be made to pay off repeatedly. Re-use has therefore become very popular in deep learning because a reused deep neural network can be trained with comparatively little data.

The ground-laying work of Pan and Yang (2010) characterizes transfer learning across Source and Target Domains of application. A Domain is defined as a pair consisting of a Feature Space X and its marginal probability distribution P(X), where X = {x1, …, xn}. A Task T defined over the Domain is a pair consisting of a set of Labels Y and an objective function f(·). The objective function f(·) is learned from a TDS consisting of pairings of Features and Labels {xi, yi}, where xi is a member of X and yi is a member of Y. The problem of learning f(xi) can equivalently be considered as the problem of discovering the conditional probability P(yi|xi) for a given input xi.

If we focus on Deep Learning, then f(·) is the inferencing capability that results from the discovery of discriminating Features within the Feature Space expressible in the layered neural network. The discovery is achieved through the learning process, the back-propagation of the (costs of) successful and unsuccessful assignments of Labels to instances of the Training Data Set.

We consider two domains, a Source Domain and a Target Domain, for which we are provided Training Data Sets DS = {(xS1, yS1), …, (xSn, ySn)} and DT = {(xT1, yT1), …, (xTm, yTm)}, respectively. Then, following Pan and Yang (2010), transfer learning can be defined as the case where DS ≠ DT or TS ≠ TT, but the previous learning of fS(·) from DS nonetheless helps improve the learning of fT(·) from DT.

Note that the restricting condition implies, from DS ≠ DT, that either XS ≠ XT or PS(X) ≠ PT(X), while the condition TS ≠ TT implies that either YS ≠ YT or, under the equivalency noted above, that PS(Y|X) ≠ PT(Y|X). That is, there is some difference between the Features of the two Tasks, or if the Features are the same, then between the distribution of the Feature values with respect to the Labels. For instance, a Source Domain might include a TDS of the labels “Corn,” “Soy,” and “Other” associated with a patch of three-band (2, 3, and 4) Landsat imagery over the US denoted by some geometry within the patches (the Source Data). In contrast, the Target may include a TDS of the labels “Wheat,” “Alfalfa,” and “Other” for the same three-band Landsat imagery acquired over Poland.
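The definitions above can be collected in one place. The following is a sketch restating the Pan and Yang (2010) formulation with the symbols already introduced:

```latex
% Domain: a feature space together with its marginal distribution
\mathcal{D} = \{\mathcal{X},\, P(X)\}, \qquad X = \{x_1, \ldots, x_n\}

% Task: a label set together with an objective function learned from the TDS
\mathcal{T} = \{\mathcal{Y},\, f(\cdot)\}, \qquad f(x_i) \approx P(y_i \mid x_i)

% Transfer learning: when the source and target pairs differ,
% the previously learned f_S still helps in learning f_T
\mathcal{D}_S \neq \mathcal{D}_T \;\lor\; \mathcal{T}_S \neq \mathcal{T}_T,
\quad \text{yet } f_S(\cdot) \text{ improves the learning of } f_T(\cdot)
```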

As illustrated in Figure 1, transfer learning algorithms pass learned knowledge from one model to fine-tune another model on a different dataset.

Figure 1 — Transfer Learning: the passing of knowledge from one Domain to another.

It is useful to distinguish transfer learning from related ML problems; following Zhuang et al. (2020), the distinctions are as follows.

  • Semi-Supervised Learning: Transfer Learning is related to Semi-Supervised Learning in its marshalling of examples to learn the inference function f(·), except that in Semi-Supervised Learning the distribution of Features relative to Labels is the same for every Task instance (in this case, each episode of “masking” that occurs in the Unsupervised Learning exercise).

  • MultiTask Learning: Similarly, transfer learning closely resembles MultiTask Learning. The difference is that in MultiTask Learning the Source and Target Labels are invoked within the same learning episode, rather than strictly sequentially as in transfer learning. The Source and Target Data are identical, and it is the Source and Target Models that are intended to evolve; in principle they are independent bodies of learned knowledge (i.e., independently applicable inference capabilities) even though they share the same Feature Space and, potentially, many of the same Features.

  • MultiView Learning: The Label set may be identical, but the Source and Target Data may be different, as in learning to distinguish an object from many views or from multimodal data. In this case, the Features might be quite distinct, though they indicate the same Target object.

The transfer learning literature identifies several possible variations on the transfer of knowledge within the general framework described above. Given a source domain, the target domain may have different labels, different distributions, or different target data, as well as exhibiting distinctions arising from the specifics of the application, e.g., from sensor modalities such as video sequences versus worn sensors of physiological state, which would set transfer learning in a MultiView learning context. However, the literature seems to have converged on a classification of transfer learning techniques into the following four categories.

  1. Instance Transfer, in which the differential weighting of training instances drives the learning process.

  2. Feature Representation Transfer, in which learning includes Feature Discovery, but in which that discovery is given a head start by starting from the feature set previously discovered for a related Task.

  3. Parameter Transfer, in which the learning algorithms exploit the hyperparameters of a related learning problem to guide the setting of its own hyperparameters.

  4. Relational Knowledge Transfer, in which relations, e.g., rules of operation, are learned in one context and applied in a related one.

The work reflected in this Engineering Report (ER) is focused on point 2, Feature Representation Transfer.
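As a concrete, minimal sketch of Feature Representation Transfer, the toy example below freezes a “backbone” feature map (standing in for features previously learned on a Source Task) and retrains only a small logistic-regression head on Target-Task labels. All functions, data, and names here are illustrative assumptions, not code from the Testbed experiments.

```python
import math

def backbone(x):
    """Frozen feature extractor, assumed learned on the Source Task.
    Maps a raw input to a two-feature representation; it is never updated."""
    return [math.tanh(x), math.tanh(2.0 * x)]

def train_head(data, lr=0.5, epochs=200):
    """Train only the head (logistic regression on the frozen features)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = backbone(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b

def predict(x, w, b):
    f = backbone(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Small Target-Task training set: label 1 for positive inputs.
target_data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
w, b = train_head(target_data)
accuracy = sum(predict(x, w, b) == y for x, y in target_data) / len(target_data)
print(accuracy)
```

Because the backbone is frozen, only three scalar parameters are fitted, which is why transfer learning can succeed with comparatively little Target data.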

The literature reflects three main research issues: what to transfer, how to transfer, and when to transfer.

  1. What to Transfer will depend on the method. The literature suggests four categories ….

  2. How to Transfer will depend on the method. The focus here is on Deep Learning.

  3. When to Transfer matters because the refinement process may lead to worsening behavior in the original application. However, there are reports in the literature suggesting that transfer learning always leads to improvement over starting from scratch (Wang et al., 2014).

This ER presumes points 3 and 2: that transfer learning leads to improvements and that these are exhibited by the Deep Learning Neural Network methods that are the focus here. Point 1 is covered as transfer learning is applied to spatio-temporal features and/or spatio-temporal metadata from one application to another, e.g., to support cases where there is reason to believe the features discovered for one Task may be of use in the Target Task, and where metadata describing one Task suggests the Model is suitable for the Target Task.

Among the most productive methods in applying ML to new domains has been the reuse of existing ML solutions for new problems, where a subset of the Domain Model produced by ML in a related domain is taken as the starting point for the new problem; see Figure 2.

Figure 2 — Transfer Learning: reuse of machine learning models.


The TDML Standard considers the tasks to which ML might be applied as follows.

  • Scene Classification — Classifying a scene image to one of a set of predefined scene categories.

  • Object Detection — A computer vision application that detects instances of semantic objects of a certain class.

  • Semantic Segmentation — A common EO application that involves assigning a class label to every pixel in the image.

  • Change Detection — A computer vision/EO task that involves detecting changes in an image or video sequence over time.

  • 3D Model Reconstruction — In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects.
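To make the Semantic Segmentation task above concrete, the fragment below assigns a class label to every pixel by taking the argmax over per-pixel class scores. The scores are invented for illustration; a real pipeline would obtain them from a trained network.

```python
# Per-pixel class scores for a 2x2 image with two classes (illustrative values).
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.7, 0.3]],
]

# Semantic segmentation output: the highest-scoring class index for each pixel.
labels = [[max(range(len(px)), key=lambda c: px[c]) for px in row]
          for row in scores]
print(labels)  # [[0, 1], [1, 0]]
```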

In this ER, the focus is primarily on the use of ML for Object Detection and Semantic Segmentation using transfer learning. However, the Testbed participants have also considered other forms of transfer learning as follows.

  • Transferring models between software applications — The strong need for alignment has meant that, in practice, transfer learning has historically almost always been applied only within a single ML architecture, such as between earlier and later instances of TensorFlow. However, having cross-architecture transfer learning available, for instance between instances of TensorFlow and PyTorch, would be very beneficial. This topic explores that possibility for geospatial applications, while also considering wider AI standards such as the Open Neural Network Exchange (ONNX), an open standard for ML interoperability.

  • Transferring models between application domains — This scenario builds on the Deep Learning applications of transfer learning, but considers transferring a model to a different input dataset without retraining all or part of it.

  • Transferring models between geographical locations — This scenario explored the feasibility of field-level, in-season crop mapping of foreign countries using spatiotemporal transfer learning algorithms.

  • Transferring models from synthetic datasets to real EO data — In this case, transfer learning follows the standard process of reusing, also called freezing, part of a trained model backbone to then combine it with additional model layers that are trained on a new TDS. However, the difference is that the origin TDS will be synthetic while the transfer learning TDS is real EO data.
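The freeze-then-fine-tune workflow described in the last bullet can be sketched framework-agnostically. The layer names and the trainable flag below are hypothetical stand-ins for, e.g., `layer.trainable` in TensorFlow/Keras or `requires_grad` in PyTorch; they are not from the Testbed code.

```python
# Hypothetical model description: a pre-trained backbone plus a new head.
model = [
    {"name": "backbone_conv1", "trainable": True},
    {"name": "backbone_conv2", "trainable": True},
    {"name": "head_dense", "trainable": True},
]

# "Freezing": mark every backbone layer as non-trainable, so that
# fine-tuning on the new TDS updates only the added head layers.
for layer in model:
    if layer["name"].startswith("backbone"):
        layer["trainable"] = False

trainable = [layer["name"] for layer in model if layer["trainable"]]
print(trainable)  # ['head_dense']
```

In the synthetic-to-real case, the frozen backbone would have been trained on synthetic imagery, while only the head is trained on real EO data.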

The experiments’ scope is geospatial use cases, particularly EO applications. In terms of future OGC standards development, further work will need to be undertaken to examine broader geospatial applicability.

1.2.  Testbed-19 Machine Learning Task

A major goal of this effort is to ascertain the degree to which transfer learning may be brought into an OGC standards regime. Re-use depends on two factors:

  • how the model is stored; and

  • whether the user can understand, from the required ancillary data and the released information, how the model was constructed and trained.

Both factors are required to determine the best reuse approach.

When transferring a model between applications, re-use is dependent on the new ML application incorporating the results of previous ML applications. Therefore, the ML architecture of the earlier model has to be aligned with that of the later ML application. Part of the work in Testbed-19 was to determine the data and information elements needed for transfer learning to succeed in the EO domain. As such, questions include the following.

  • How much information about the provenance of the ML model’s TDS needs to be available?

  • Is it important to have a representation of what is in-distribution versus what is out-of-distribution for the ML model?

  • Do quality measures need to be conveyed for transfer learning to be effectively encouraged in the community?

  • Are other elements required to support a standard regime for building out and entering new transfer learning-based capabilities into the marketplace?

In addition, a goal of the Testbed-18 ML thread was to develop the foundation for future standardization of TDS for EO applications. Therefore, a goal of the Testbed-19 ML task was to develop the foundation for future standardization of ML models for transfer learning within geospatial, and especially EO, applications. The task evaluated the status quo of transfer learning, metadata implications for geo-ML applications of transfer learning, and general questions of sharing and re-use. Several initiatives, such as ONNX, have developed implementations that could be used for future standardization work.

As an OGC effort, this Testbed activity is distinct from general applications of AI/ML in that the focus is primarily geospatial ML applications. However, findings and feedback from this Testbed activity may support the wider community.

2.  Overview of the Machine Learning models and datasets being tested

2.1.  GeoLabs

2.1.1.  Dataset Introduction

The FLAIR (French Land Use/Land Cover Artificial Intelligence Recognition) dataset is a comprehensive and high-quality collection of labeled satellite imagery aimed at advancing land cover classification and geospatial analysis tasks. FLAIR was developed and is maintained by the French National Institute of Geographic and Forest Information (IGN) and serves as a valuable resource for researchers, data scientists, and practitioners in the field of remote sensing and geospatial analysis: Garioud et al. (2022).

2.1.2.  Dataset Overview

The FLAIR dataset provides a diverse range of satellite imagery covering various regions of France.

Figures 3 and 4 show a sample image and its labels from the FLAIR dataset. It encompasses both rural and urban areas, capturing the intricate details of land use and land cover across the country. The dataset offers multi-temporal imagery with different spectral bands, resolutions, and acquisition dates, enabling the exploration of temporal dynamics and changes in land cover.

Figure 3 — FLAIR dataset image.