I. Executive Summary
This OGC Testbed 19 Engineering Report (ER) details work to develop a foundation for future standardization of Machine Learning (ML) models for transfer learning. The work is based on previous OGC ML activities and has focused on evaluating the status quo of transfer learning, metadata implications for geo-ML applications of transfer learning, and general questions of sharing and re-use.
The scope is geospatial, especially Earth Observation (EO) applications, with Testbed participants having considered transferring models between software applications, application domains, and geographical locations, and from synthetic datasets to real EO data. GeoLabs developed an end-to-end framework based on a web-services architecture for training, fine-tuning from a pre-trained model, visualizing model graphs, and inferencing. George Mason University proposed spatiotemporally transferable learning algorithms and a temporal learning strategy that would maximally transfer label data and models from the US case to foreign countries. Pixalytics tested transfer learning by freezing all layers except the bottom layer of a Neural Network model and then training it to detect a specific category of waste plastic. They also took the Meta AI Segment Anything Model, which was developed for machine vision, and applied it in approaches that used hyperspectral data. Rendered.ai undertook experiments to understand which synthetic dataset approach yielded the best results, and then used those datasets to fine-tune a COCO model backbone with no real data used in training.
In addition to these experiments, the participants reviewed and provided feedback on research questions outlined within the call for participation. The answers have been formulated around the FAIR principles, considering the description of an ML model to support findability, provide access, and support interoperability. The participants also questioned how an ML Model should be described to enable efficient re-use through transfer learning applications. This last section considered taxonomy, quality measures, the relationship to the training data, and the model’s performance envelope and metrics.
In the Summary & Recommendations section, the ER reviews the findings and makes recommendations about the next steps in terms of both the experiments conducted and broader implications for OGC. Coordination is needed to ensure that the work of the OGC standard working groups brings together the different elements needed to store and share ML models. A focus on metadata is critical to allow users to understand what is available and applicable to the users. In addition, standardization of naming will support interoperability.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
Machine Learning, Transfer Learning, Earth Observation
III. Contributors
All questions regarding this document should be directed to the editors or the contributors:
| Name | Organization | Role |
|---|---|---|
| Sam Lavender | Pixalytics Ltd | Editor |
| Trent Tinker | OGC | Editor |
| Goncalo Maia | EUSatCen | Contributor |
| Rajat Shinde | GeoLabs/NASA-IMPACT/UAH | Contributor |
| Gérald Fenoy | GeoLabs | Contributor |
| Adrian Akbari | GeoLabs | Contributor |
| Chen Zhang | GMU | Contributor |
| Chris Andrews | Rendered.ai | Contributor |
| Jim Antonisse | WiSC/NGA-TAES | Contributor |
IV. Abstract
The OGC Testbed 19 initiative explored six tasks, including this one, which focused on “Machine Learning: Transfer Learning for Geospatial Applications.”
This OGC Testbed 19 Engineering Report (ER) documents work to develop the foundation for future standardization of Machine Learning models for transfer learning within geospatial, especially Earth Observation, applications. The ER reviews the findings of transfer learning experiments and makes recommendations about the next steps in terms of both the experiments conducted and broader implications for OGC.
1. Introduction
New and revolutionary Artificial Intelligence (AI) and Machine Learning (ML) algorithms developed over the past ten years have great potential to advance the processing and analysis of Earth Observation (EO) data, but comprehensive standards for this technology have yet to emerge. The Open Geospatial Consortium (OGC) has investigated opportunities in ML standards for EO, for example through the ML threads in Testbeds 14, 15, and 16. Further, the OGC TrainingDML-AI (TDML) Standards Working Group (SWG) developed the Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Part 1: Conceptual Model. The SWG also provided analyses of the Standard and recommendations for next steps in the Testbed-18 ML thread. Testbed 19 builds on these previous efforts.
1.1. Introduction to Transfer Learning
Transfer learning is a technique in ML where the knowledge learned from a task is re-used to boost performance and reduce costs on a related task.
Among the most productive methods in the application of ML to new domains has been the re-use of existing ML solutions for new problems. This is where a subset of the Domain Model produced by application of ML in a related domain is taken as the starting point for addressing the new problem. The advantage of this approach is that the investment in the previous ML task, which can be enormous both in terms of the Training Dataset (TDS) generation and computing power required to refine the model, can be repeatedly made to pay off. Therefore, reuse has become very popular in deep learning because a reused deep neural network can then be trained with comparatively little data.
The ground-laying work of Pan and Yang (2010) characterizes transfer learning across Source and Target Domains of application. A Domain D is defined as a pair consisting of a Feature Space X and a marginal probability distribution P(X) over instances {x1, …, xn} drawn from X. A Task T defined over the Domain is a pair consisting of a set of Labels Y and an objective function f(·). The objective function f(·) is learned from a TDS consisting of pairings of Features and Labels {(xi, yi)}, where xi is a member of X and yi is a member of Y. The problem of learning f(xi) can equivalently be considered as the problem of discovering the conditional probability P(yi|xi) when given an input xi.
If we focus on Deep Learning, then f(·) is the inferencing capability that results from the discovery of discriminating Features within the Feature Space expressible in the layered neural network. The discovery is achieved through the learning process, the back-propagation of the (costs of) successful and unsuccessful assignments of Labels to instances of the Training Data Set.
We consider two domains, a Source Domain and a Target Domain, for which we are provided Training Data Sets DS = {(xS1, yS1), …, (xSn, ySn)} and DT = {(xT1, yT1), …, (xTm, yTm)} respectively. Then, following Pan and Yang (2010), transfer learning can be defined as the case where DS ≠ DT or TS ≠ TT but the previous learning of fS(·) from DS nonetheless helps improve the learning of fT(·) from DT.
Note that the restricting condition implies, from DS ≠ DT, that either XS ≠ XT or PS(X) ≠ PT(X), while the condition TS ≠ TT implies that either YS ≠ YT or, under the equivalency noted above, that PS(Y|X) ≠ PT(Y|X). That is, there is some difference between the Features of the two Tasks, or if the Features are the same, then between the distribution of the Feature values with respect to the Labels. For instance, a Source Domain might include a TDS of the labels “Corn,” “Soy,” and “Other” associated with a patch of three-band (2, 3, and 4) Landsat imagery over the US denoted by some geometry within the patches (the Source Data). In contrast, the Target may include a TDS of the labels “Wheat,” “Alfalfa,” and “Other” for the same three-band Landsat imagery acquired over Poland.
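The conditions above can be collected into a compact form. The following is a notational sketch only: the calligraphic symbols are an assumption introduced here to separate a Domain from its Training Data Set, which the running text writes with the same letters.

```latex
\begin{align*}
\mathcal{D} &= \{\mathcal{X},\, P(X)\} \\
\mathcal{T} &= \{\mathcal{Y},\, f(\cdot)\} \\
\mathcal{D}_S \neq \mathcal{D}_T &\;\Rightarrow\;
  \mathcal{X}_S \neq \mathcal{X}_T \ \text{or}\ P_S(X) \neq P_T(X) \\
\mathcal{T}_S \neq \mathcal{T}_T &\;\Rightarrow\;
  \mathcal{Y}_S \neq \mathcal{Y}_T \ \text{or}\ P_S(Y \mid X) \neq P_T(Y \mid X)
\end{align*}
```

In the crop example, the Feature Space is shared (the same three Landsat bands) while the Label sets differ, so the Tasks differ through the Labels.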
As illustrated in Figure 1, transfer learning algorithms pass learned knowledge from one model to fine-tune another model on a different dataset.
Figure 1 — Transfer Learning: the passing of knowledge from one Domain to another.
It is useful to distinguish transfer learning from related ML problems, following Zhuang et al. (2020), as follows.
Semi-Supervised Learning: Transfer Learning can be seen as related to Semi-Supervised Learning in its marshalling of examples in order to learn the inference function f(·), except that the distribution of Features relative to Labels is the same for every Task instance (in this case, each episode of “masking” that occurs in the Unsupervised Learning exercise).
MultiTask Learning: Similarly, transfer learning closely resembles MultiTask learning. The difference in this case is that in MultiTask learning, the Source and Target Labels are invoked within the same learning episode, versus happening strictly sequentially as in transfer learning. The Source and Target Data are identical, and it is the Source and Target Models that are intended to evolve – they, in principle, are independent bodies of learned knowledge (i.e., as independently-applicable inference capabilities) even though they share the same Feature Space and, potentially, many of the same Features.
MultiView Learning: The Label set may be identical, but the Source and Target Data may be different, as in learning to distinguish an object from many views or from multimodal data. In this case, the Features might be quite distinct, though they indicate the same Target object.
The transfer learning literature identifies several possible variations on the transfer of knowledge within the general framework described above. Given a source domain, the target domain may have different labels, or different distributions, or different target data, as well as exhibiting distinctions arising from the specifics of their application, e.g., from sensor modalities such as video sequences versus worn sensors of physiological state, which would set transfer learning in a MultiView learning context. However, the literature seems to have converged on a classification of transfer learning techniques into the following four categories.
Instance Transfer, in which the differential weighting of training instances drives the learning process.
Feature Representation Transfer, in which learning includes Feature Discovery, but in which that discovery is given a head start by starting from the feature set previously discovered for a related Task.
Parameter Transfer, in which the learning algorithms exploit the hyperparameters of a related learning problem to guide the setting of its own hyperparameters.
Relational Knowledge Transfer, in which relations, e.g., rules of operation, are learned in one context and applied in a related one.
The work reflected in this Engineering Report (ER) is focused on point 2, Feature Representation Transfer.
The literature reflects three main research issues: what to transfer, how to transfer, and when to transfer.
What to Transfer will depend on the method. The literature suggests four categories ….
How to Transfer will depend on the method. The focus here is on Deep Learning.
When to Transfer matters because the refinement process may lead to worsening behavior in the original application. However, there are reports in the literature that suggest transfer learning always leads to improvement over start-from-scratch (Wang et al., 2014).
This ER presumes points c and b: that transfer learning leads to improvements, and that these improvements are exhibited by the Deep Learning neural network methods that are the focus. Point c is also covered in that transfer learning is applied to spatio-temporal features and/or spatio-temporal metadata from one application to another, e.g., to support cases where there is a reason to believe the features discovered for one Task may be of use in the Target Task, and where metadata describing one Task suggests the Model is suitable for the Target Task.
As described at the start of this section, one of the most productive ways of applying ML to new domains has been to take a subset of the Domain Model produced by ML in a related domain as the starting point for the new problem; see Figure 2.
Figure 2 — Transfer Learning: reuse of machine learning models.
The TDML Standard considers the tasks to which ML might be applied as follows.
Scene Classification — Classifying a scene image to one of a set of predefined scene categories.
Object Detection — A computer vision application that detects instances of semantic objects of a certain class.
Semantic Segmentation — A common EO application that involves assigning a class label to every pixel in the image.
Change Detection — A computer vision/EO task that involves detecting changes in an image or video sequence over time.
3D Model Reconstruction — in computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects.
In this ER, the focus is primarily on the use of ML for Object Detection and Semantic Segmentation using transfer learning. However, the Testbed participants have also considered other forms of transfer learning as follows.
Transferring models between software applications — The strong need for alignment has meant that, in practice, transfer learning has historically almost always been applied only within a single ML architecture, such as between earlier and later instances of TensorFlow. However, having cross-architecture transfer learning available, for instance between instances of TensorFlow and PyTorch, would be very beneficial. This topic explores that possibility for geospatial applications while considering wider AI standards such as the Open Neural Network Exchange (ONNX), an open standard for ML interoperability.
Transferring models between application domains — This scenario builds on the Deep Learning applications of transfer learning, but considers transferring a model to a different input dataset without retraining all or part of it.
Transferring models between geographical locations — This scenario has explored the feasibility of the field-level in-season crop mapping of foreign countries by using spatiotemporal transfer learning algorithms.
Transferring models from synthetic datasets to real EO data — In this case, transfer learning follows the standard process of reusing, also called freezing, part of a trained model backbone to then combine it with additional model layers that are trained on a new TDS. However, the difference is that the origin TDS will be synthetic while the transfer learning TDS is real EO data.
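The freeze-and-retrain pattern described in the last item can be sketched in a few lines. The following is a minimal, dependency-free illustration under stated assumptions, not any participant's implementation: a fixed ("frozen") feature extractor stands in for a pre-trained backbone, and only a small logistic-regression head is trained on the new TDS. The weights, data, and function names are all hypothetical.

```python
import math


def backbone(x, frozen_weights):
    """Frozen feature extractor: one dense layer with tanh activation.

    The weights are only read here and never updated, which is what
    "freezing" means in this sketch.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]


def train_head(features, labels, epochs=200, lr=0.5):
    """Train a logistic-regression head on the backbone's output features."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(w * fi for w, fi in zip(weights, f)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * grad * fi for w, fi in zip(weights, f)]
            bias -= lr * grad
    return weights, bias


def predict(x, frozen_weights, head):
    """Run the frozen backbone, then the trained head, and threshold at 0."""
    weights, bias = head
    f = backbone(x, frozen_weights)
    z = sum(w * fi for w, fi in zip(weights, f)) + bias
    return 1 if z > 0 else 0


# Hypothetical "pre-trained" backbone weights (identity, for illustration).
FROZEN = [[1.0, 0.0], [0.0, 1.0]]

# Small hypothetical target TDS: the label is 1 when the first value is larger.
target_x = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.0, 2.0]]
target_y = [1, 0, 1, 0]

head = train_head([backbone(x, FROZEN) for x in target_x], target_y)
predictions = [predict(x, FROZEN, head) for x in target_x]
```

Only `head` changes during training. In a deep learning framework the same effect is usually achieved by excluding the backbone parameters from the optimizer.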
The experiments’ scope is geospatial use cases, particularly EO applications. In terms of future OGC standards development, further work will need to be undertaken to examine broader geospatial applicability.
1.2. Testbed-19 Machine Learning Task
A major goal of this effort is to ascertain the degree to which transfer learning may be brought into an OGC standards regime. Re-use depends on two cases:
how the model is stored; and
how well the user can understand, from the required ancillary data and the released information, how the model was constructed/trained.
Both cases are required to determine the best reuse approach.
When transferring a model between applications, re-use is dependent on the new ML application incorporating the results of previous ML applications. Therefore, the ML architecture of the earlier model has to be aligned with that of the later ML application. Part of the work in this Testbed-19 was to determine the data and information elements needed for transfer learning to succeed in the EO domain. As such, questions include the following.
How much information about the provenance of the ML model’s TDS needs to be available?
Is it important to have a representation of what is in-distribution versus what is out-of-distribution for the ML model?
Do quality measures need to be conveyed for transfer learning to be effectively encouraged in the community?
Are other elements required to support a standard regime for building out and entering new transfer learning-based capabilities into the marketplace?
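To make these questions concrete, the sketch below shows one possible shape for a machine-readable model description covering TDS provenance, an in-distribution envelope, and quality measures. It is an illustration only: the class names, field names, and example values are hypothetical and do not come from TrainingDML-AI, ONNX, or any other standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TrainingDataProvenance:
    """Where the model's TDS came from (hypothetical fields)."""
    dataset_name: str
    sensor: str
    region: str
    acquisition_years: list


@dataclass
class ModelDescription:
    """Minimal metadata a re-user might need before attempting transfer."""
    name: str
    task: str
    framework: str
    exchange_format: str
    provenance: TrainingDataProvenance
    # Ranges inside which the model was trained; inputs outside them
    # would be treated as out-of-distribution.
    in_distribution: dict = field(default_factory=dict)
    # Quality measures reported by the publisher of the model.
    quality_measures: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only, not measured results.
description = ModelDescription(
    name="example-crop-classifier",
    task="Semantic Segmentation",
    framework="PyTorch",
    exchange_format="ONNX",
    provenance=TrainingDataProvenance(
        dataset_name="example-tds",
        sensor="Landsat-8 OLI",
        region="US",
        acquisition_years=[2020, 2021],
    ),
    in_distribution={"bands": [2, 3, 4], "months": [5, 6, 7, 8]},
    quality_measures={"overall_accuracy": None},
)

record = json.loads(description.to_json())
```

Serializing to JSON keeps the record independent of the ML framework, so a catalogue could index fields such as `task` or `provenance.region` without loading the model itself.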
In addition, a goal of the Testbed-18 ML thread was to develop the foundation for future standardization of TDS for EO applications. Therefore, a goal of the Testbed-19 ML task was to develop the foundation for future standardization of ML models for transfer learning within geospatial, and especially EO, applications. The task evaluated the status quo of transfer learning, metadata implications for geo-ML applications of transfer learning, and general questions of sharing and re-use. Several initiatives, such as ONNX, have developed implementations that could be used for future standardization work.
As an OGC effort, this Testbed activity is distinct from general applications of AI/ML in that the focus is primarily geospatial ML applications. However, findings and feedback from this Testbed activity may support the wider community.
2. Overview of the Machine Learning models and datasets being tested
2.1. GeoLabs
2.1.1. Dataset
2.1.1.1. Introduction
The FLAIR (French Land cover from Aerospace ImageRy) dataset is a comprehensive and high-quality collection of labeled imagery aimed at advancing land cover classification and geospatial analysis tasks. FLAIR was developed and is maintained by the French National Institute of Geographic and Forest Information (IGN) and serves as a valuable resource for researchers, data scientists, and practitioners in the field of remote sensing and geospatial analysis (Garioud et al., 2022).
2.1.1.2. Dataset Overview
The FLAIR dataset provides a diverse range of satellite imagery covering various regions of France.
The dataset encompasses both rural and urban areas, capturing the intricate details of land use and land cover across the country. It offers multi-temporal imagery with different spectral bands, resolutions, and acquisition dates, enabling the exploration of temporal dynamics and changes in land cover. Figures 3 and 4 show a sample image from the FLAIR dataset and its labels.
Figure 3 — FLAIR dataset image.
Figure 4 — FLAIR dataset labels.
2.1.1.3. Key Features and Statistics
Spatial Coverage: The FLAIR dataset covers the entire country of France, including overseas territories. It provides a representative sample of land cover classes found in different geographic regions.
Spectral Bands: The dataset includes satellite imagery captured across multiple spectral bands, such as visible, near-infrared, and short-wave infrared. This spectral diversity supports the extraction of rich and meaningful information related to land cover and land use patterns.
Temporal Resolution: FLAIR offers multi-temporal imagery, supporting the analysis of land cover changes over time. The dataset comprises images acquired at different intervals, facilitating the examination of seasonal variations and long-term trends.
Annotation and Labels: The FLAIR dataset provides pixel-level annotation for land cover classes, enabling supervised Machine Learning (ML) approaches for land cover classification. The dataset includes a predefined set of land cover categories, which allows for consistency and comparability in analyses.
Dataset Size: FLAIR consists of a substantial amount of imagery data, providing a wide range of training and testing samples for land cover classification models. The dataset size allows for robust model training and evaluation.
2.1.1.4. Applications
The FLAIR dataset is designed to facilitate a variety of applications related to land cover classification, geospatial analysis, and environmental monitoring. Some potential applications include the following.
Land Cover Classification: The dataset serves as a valuable resource for developing and evaluating land cover classification models. Researchers and practitioners can leverage FLAIR to train and test ML algorithms for accurately mapping and monitoring land cover across France.
Land Use Planning: FLAIR can support land use planning efforts by providing detailed and up-to-date information on land cover patterns. It can assist in identifying suitable areas for specific land uses, optimizing resource allocation, and informing policy decisions related to land management.
2.1.1.5. Conclusion
The FLAIR dataset, developed by IGN, offers a rich collection of labeled satellite imagery covering France. With its comprehensive spatial coverage, multi-temporal data, and pixel-level annotation, FLAIR provides a valuable resource for land cover classification, change detection, and geospatial analysis tasks. The dataset’s application potential extends to various domains, including environmental monitoring, land use planning, and impact assessment. The FLAIR dataset contributes to advancing research and applications in remote sensing and geospatial analysis, fostering a deeper understanding of land cover dynamics and supporting evidence-based decision-making. For the Testbed-19 task on transfer learning between software implementations, the FLAIR dataset was used to train TensorFlow-based and PyTorch-based models, which were then exported as ONNX models for inferencing.
2.1.2. Model description
For this experiment and demo, the following models have been chosen to be fine-tuned on the selected dataset.
The FLAIR 1 AI Challenge Baseline Model: This model was developed for the FLAIR 1 AI Challenge. The challenge focuses on building an AI system to classify French land cover using high-resolution satellite imagery. The baseline model provides a starting point and is based on a U-Net architecture with a pre-trained ResNet34 encoder. It has about 24.4M parameters and is implemented using the segmentation-models-pytorch library.
Another model is based on the Orfeo ToolBox (OTB) and TensorFlow, which combines the capabilities of OTB’s geospatial image processing and analysis with the power of deep learning using TensorFlow.
Segment Anything Model (SAM) — A Foundation Model (FM) for predicting high-quality object masks based on input prompts such as points, bounding boxes, etc. SAM is capable of predicting masks for objects in an image as well as for the entire image.
2.1.2.1. Learning-as-a-Service architecture using ZOO-Project
The Learning-as-a-Service (LAAS) architecture builds on top of the ZOO-Project to perform ML and deep learning tasks as a service. These tasks include training, visualizing the model catalog and model architectures, and deploying a machine or deep learning model as a web service for inferencing. The LAAS approach follows a structured framework with defined components and interactions, which are explained in detail below:
ZOO-Project Core Components:
ZOO Kernel: The ZOO-Project’s core component provides the runtime environment and orchestrates the deployment of web services based on the OGC API — Processes (Part 1 Standard and the Part 2 and Part 3 draft standards). The kernel manages client-server communication, handles requests, and coordinates the execution of processes.
ZOO Services: These are individual units encapsulating deep learning models as web services. Each ZOO service represents a specific deep learning model and its associated functionalities.
Model Integration and Configuration:
Deep Learning Model Integration: The deep learning model is integrated into the ZOO-Project by developing a ZOO service incorporating the model’s implementation and functionalities. The service is written using an appropriate programming language compatible with the deep learning framework, such as Python with TensorFlow or PyTorch.
Configuration Definition: The ZOO-Project offers a configuration mechanism to define the input parameters, outputs, and other metadata associated with the deep learning model service. This configuration specifies the expected input format, such as image dimensions or data types, as well as any additional parameters required for model inference.
Data Preprocessing:
Preprocessing Steps: Within the ZOO service, data preprocessing steps are implemented to prepare the input data for deep learning model inference. These steps may include resizing, normalization, or other transformations required to appropriately preprocess the input data.
Model Inference:
Model Loading and Execution: The ZOO service incorporates the code for loading the trained deep learning model. It performs model inference by first converting the model to the interoperable ONNX format, and then passing the preprocessed input data through the ONNX model to obtain the output predictions or results.
Web Service Deployment:
ZOO Kernel Operation: The ZOO-Project’s ZOO Kernel acts as the core component for deploying the deep learning model service. It handles the reception of client requests, invokes the model inference process within the respective ZOO service, and returns the results to the clients in a standardized format.
Scalability and Performance: The ZOO-Project architecture leverages underlying web server platforms, such as Apache HTTP Server or Nginx, to ensure scalability and performance. The web server can be configured to handle multiple concurrent requests, enabling the deep learning model service to effectively serve many users.
Interoperability and Extensibility:
Integration of External Resources: The ZOO-Project architecture supports interoperability by facilitating the integration of external resources and capabilities into the deep learning model service. This approach allows the service to utilize additional geospatial or non-geospatial data sources, libraries, or tools to enhance its functionality.
Extensibility: The ZOO-Project framework can be extended to incorporate new functionalities or integrate with existing geospatial or deep learning libraries, enabling the deep learning model service to be enhanced or customized as per specific requirements.
In summary, the LAAS approach provides a formal framework for deploying deep learning models and their associated operations as web services. It includes core components for runtime management, integration of deep learning models, data preprocessing, model inference, and web service deployment. The architecture ensures interoperability, scalability, and performance while allowing for extensibility and integration with external resources.
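From a client's perspective, a model deployed this way is invoked like any other OGC API Processes process. The sketch below builds, but does not send, an execute request using only the Python standard library. The `/processes/{processId}/execution` path and the `inputs` member follow OGC API Processes Part 1, while the server URL, process identifier, and input names are hypothetical and would differ for a real deployment.

```python
import json
from urllib.request import Request


def build_execute_request(base_url, process_id, inputs):
    """Build a POST request that executes a process with the given inputs."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_execute_request(
    "https://example.org/ogc-api",   # hypothetical server
    "segmentation-inference",        # hypothetical process identifier
    {
        # Input passed by reference; the input names are assumptions.
        "image": {"href": "https://example.org/data/tile.tif",
                  "type": "image/tiff"},
        "model": "example-model",
    },
)
```

Passing `request` to `urllib.request.urlopen` would submit the job. The call is left out here because the endpoint is fictitious.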
2.1.3. Components of the Learning-as-a-Service Engine
The ML learning-as-a-service primarily comprises a ZOO-service and the NVIDIA Triton Inference Engine for inferencing. The Triton inference engine is composed of an inference server and a client. The components of the Triton inference engine can be described as follows.
Triton Inference Service Engine: The main component responsible for managing and serving ML models for inference.
Model Repository: Stores ML models in a central repository for easy access.
Model Loader: Loads ML models from the Model Repository into memory for inference.
Inference Server: Handles incoming inference requests, manages model versions, and communicates with the Inference Scheduler.
Inference Scheduler: Schedules and manages the execution of inference requests across multiple Inference Backends.
Inference Backend: Represents the actual hardware or software accelerator (e.g., GPU, CPU) used for inference. Multiple backends can be configured for different hardware options.
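As an illustration of the Model Repository component, the sketch below creates the directory layout that Triton expects for one ONNX model: a folder per model holding a `config.pbtxt` file and a numbered version sub-folder for the model file. The model name, tensor names, and dimensions are hypothetical, and the configuration shown is a minimal assumption rather than the configuration used in the Testbed.

```python
import tempfile
from pathlib import Path

# Minimal model configuration; tensor names "input" and "output" are assumed.
CONFIG_TEMPLATE = """name: "{name}"
platform: "onnxruntime_onnx"
max_batch_size: {max_batch_size}
input [
  {{ name: "input", data_type: TYPE_FP32, dims: [ {input_dims} ] }}
]
output [
  {{ name: "output", data_type: TYPE_FP32, dims: [ {output_dims} ] }}
]
"""


def create_model_entry(repository, name, input_dims, output_dims,
                       version=1, max_batch_size=1):
    """Create <repository>/<name>/config.pbtxt and the version folder.

    Returns the config path and the path where model.onnx should be placed.
    """
    model_dir = Path(repository) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(CONFIG_TEMPLATE.format(
        name=name,
        max_batch_size=max_batch_size,
        input_dims=", ".join(str(d) for d in input_dims),
        output_dims=", ".join(str(d) for d in output_dims),
    ))
    return config_path, version_dir / "model.onnx"


repository = tempfile.mkdtemp()  # stand-in for the real repository path
config_path, model_path = create_model_entry(
    repository, "example_segmenter",
    input_dims=[3, 512, 512], output_dims=[2, 512, 512],
)
```

The exported ONNX file would then be copied to `model_path` before the server is pointed at the repository.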
The following steps explain the workflow for inferencing using the Triton Inference server within the ZOO Project.
Model Preparation: Prepare the ML model for the inferencing to be used within the ZOO Project. This preparation typically involves training or obtaining a pre-trained model for the specific task.
Model Integration: Integrate the ML model into the ZOO Project’s framework. The ZOO Project allows both the definition and configuration of custom processing services. In this case, the ZOO Project is configured to work with the proposed deep learning model.
Service Configuration: Define a custom processing service within the ZOO Project configuration. This service should specify how to invoke an ML model for object detection. Configuration files and metadata should be set up to describe the input and output parameters of the service.
Triton Inference Server Integration: The Triton inference server can be integrated into the custom ZOO Project service. This integration sends inference requests to Triton for model execution.
Client Request: Clients send requests to the ZOO Project’s WPS services. These requests include the necessary input data for deep learning-based model execution, such as an image or video frame.
ZOO Project Service Execution: The ZOO Project processes the client’s request, which may involve invoking the Triton Inference Server if integrated. It passes the input data to the configured object detection service.
Model Inference: If the Triton Inference Server is used, it performs object detection based on the input data and the configured model. If not, the ZOO Project service directly processes the request using the specified integrated model.
Response to Client: The ZOO Project or Triton generates the results and sends them as part of the response to the client. For example, for object detection, bounding boxes, class labels, and confidence scores are passed as output.
In this context, the ZOO Project serves as the middleware for exposing object detection models as web processing services, making them accessible to clients over the web while Triton Inference Server can be used for efficient model inference if desired. The overall workflow can be illustrated as shown in Figure 5.
Figure 5 — Overall workflow.
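The branching described in the ZOO Project Service Execution, Model Inference, and Response to Client steps can be sketched as a single dispatch function. This is a schematic illustration with hypothetical names: the two callables stand in for a Triton client and an integrated model, and neither reflects the actual ZOO-Project or Triton APIs.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One object detection result returned to the client."""
    bbox: tuple
    label: str
    confidence: float


def run_detection_service(image, local_model, triton_client=None):
    """Use Triton when a client is configured, otherwise the integrated model."""
    backend = triton_client if triton_client is not None else local_model
    raw_results = backend(image)
    return [Detection(tuple(box), label, float(score))
            for box, label, score in raw_results]


# Stand-in backends that return fixed, made-up results for illustration.
def fake_local_model(image):
    return [((0, 0, 10, 10), "local-object", 0.5)]


def fake_triton_client(image):
    return [((5, 5, 20, 20), "triton-object", 0.75)]


image = [[0, 0], [0, 0]]  # placeholder for real image data
without_triton = run_detection_service(image, fake_local_model)
with_triton = run_detection_service(image, fake_local_model,
                                    fake_triton_client)
```

Keeping the backend behind one callable means the service's inputs and outputs stay the same whether or not Triton is deployed.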
2.2. George Mason University
2.2.1. Background
The major goal of this aspect of the OGC Testbed-19 – Transfer Learning for Geospatial Application experiment was to explore the feasibility of field-level in-season crop mapping of countries outside of the United States using spatiotemporal transfer learning algorithms. To that end, the following objectives and activities were specified.
Development of the transfer learning algorithm and strategy: The project will develop a transfer learning algorithm and strategy by training models with U.S. data and applying the trained models to agricultural regions in Brazil and Canada. By doing so, the project will demonstrate the potential of the transfer learning approach in different geographic contexts.
Exploration of image segmentation methods: The project will explore image segmentation methods to automatically extract cropland fields from remote sensing images. This objective aims to improve the accuracy and efficiency of in-season crop mapping by accurately delineating the boundaries of cropland fields.
Enhancement of in-season mapping results: The project will integrate the Segment Anything Model with the transfer learning model to enhance the in-season mapping results. This integration is expected to effectively remove noise from the mapping results and lead to a significant improvement in accuracy.
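One simple way segment boundaries can be used to remove noise from a per-pixel crop map is a majority vote inside each segment. The sketch below illustrates that idea only. It is an assumption about how such an integration could work, not the method used in this experiment, and the segment identifiers are taken as given rather than produced by the Segment Anything Model.

```python
from collections import Counter, defaultdict


def smooth_by_segment(class_map, segment_map):
    """Replace each pixel's class with the majority class of its segment.

    class_map and segment_map are equally sized 2D lists: the first holds
    per-pixel predicted classes, the second holds segment (field) ids.
    """
    votes = defaultdict(Counter)
    for class_row, segment_row in zip(class_map, segment_map):
        for predicted, segment in zip(class_row, segment_row):
            votes[segment][predicted] += 1
    majority = {segment: counts.most_common(1)[0][0]
                for segment, counts in votes.items()}
    return [[majority[segment] for segment in segment_row]
            for segment_row in segment_map]


# Hypothetical 3x3 prediction with one noisy "wheat" pixel inside field 1.
class_map = [
    ["corn", "corn", "soy"],
    ["corn", "wheat", "soy"],
    ["corn", "corn", "soy"],
]
segment_map = [
    [1, 1, 2],
    [1, 1, 2],
    [1, 1, 2],
]

smoothed = smooth_by_segment(class_map, segment_map)
```

The isolated "wheat" pixel is outvoted by the five "corn" pixels in the same field, so the whole field is labeled "corn".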
The success of the experiment will have several significant impacts as follows.
The in-season crop map for countries outside of the United States can be produced automatically from satellite images (e.g., Landsat data or Sentinel-2 data) during the early growing season, which is valuable for agricultural and food security decision makers.
The in-season crop maps can be used for the early estimation of crop yield in other grain-exporting countries. The early estimation data, especially for those countries with different growing seasons, can provide timely decision support and guidance for farming.
Although this project specifically deals with in-season crop mapping, the transferable ML model developed in this project will be potentially applicable to spatiotemporal transfer learning issues in other domains.
2.2.2. Data
2.2.2.1. Cropland Data Layer
The Cropland Data Layer (CDL) data product is an annual crop-specific agricultural land use map produced by the US Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). The map covers the entire Continental US (CONUS) at 30-meter spatial resolution from 2008 to the present, and some states from 1997 to 2007. Table 1 summarizes the CDL data and its derived data products. The cropland layer provides over 140 land cover classes with around 95% accuracy for major crop types. The crop frequency layer identifies the planting frequency of four major crop types (corn, cotton, soybeans, and wheat) across the CONUS, based on CDL from 2008 to the present. The confidence layer gives the percentage confidence (0-100) for each cropland pixel (Liu et al., 2004). The cultivated layer is a crop mask in which a pixel is marked as cultivated if it was identified as cultivated in at least two of the most recent five years of CDL data.
Table 1 — Summary of CDL and its derived data products.
| Layer | Availability | Coverage | Spatial Resolution |
|---|---|---|---|
| Cropland Layer | 1997 to present | CONUS (2008 to present); some states (1997-2007) | 30-meter |
| Crop Frequency Layer | 2008 to present | CONUS | 30-meter |
| Confidence Layer | 2008 to present | CONUS | 30-meter |
| Cultivated Layer | 2013 to present | CONUS | 30-meter |
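The cultivated-layer rule described above (a pixel counts as cultivated if it was cultivated in at least two of the most recent five years) can be illustrated with a minimal sketch. The function name and the toy per-pixel flags are hypothetical, and the code does not reproduce the USDA production workflow.

```python
def cultivated_mask(yearly_flags, min_years=2):
    """Return True for each pixel flagged as cultivated in at least
    `min_years` of the supplied years.

    yearly_flags: list of per-year lists, one boolean per pixel.
    """
    n_pixels = len(yearly_flags[0])
    counts = [0] * n_pixels
    for year in yearly_flags:
        for i, flag in enumerate(year):
            counts[i] += 1 if flag else 0
    return [count >= min_years for count in counts]


# Toy example: five years of flags for four pixels (illustrative values only).
flags = [
    [True, False, False, True],
    [True, False, False, False],
    [True, False, True, False],
    [True, False, False, False],
    [True, False, False, False],
]
mask = cultivated_mask(flags)
```

Only the first pixel is flagged in two or more of the five years, so it is the only one kept in the mask.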
2.2.2.2. Satellite Image Data
The satellite images explored in this experiment are derived from the two most widely accessible moderate-to-high spatial resolution data sets: Landsat-8 and Sentinel-2. Landsat is a joint program of the USGS and NASA, which has been observing the Earth at a 30-m resolution in a 16-day repeat cycle continuously from 1972 to the present. As the eighth satellite in the Landsat program, Landsat-8 was launched in February 2013. It carries the Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) instruments providing moderate-resolution imagery from 15-100 m. Table 2 lists the spectral band specification of the Landsat-8 sensors.
Table 2 — Landsat-8 spectral band specifications.
| Band | Description | Wavelength | Resolution | Sensor |
|---|---|---|---|---|
| 1 | Coastal aerosol | 0.43-0.45 µm | 30 meters | OLI |
| 2 | Blue | 0.45-0.51 µm | 30 meters | OLI |
| 3 | Green | 0.53-0.59 µm | 30 meters | OLI |
| 4 | Red | 0.64-0.67 µm | 30 meters | OLI |
| 5 | Near Infrared (NIR) | 0.85-0.88 µm | 30 meters | OLI |
| 6 | Shortwave Infrared (SWIR) 1 | 1.57-1.65 µm | 30 meters | OLI |
| 7 | Shortwave Infrared (SWIR) 2 | 2.11-2.29 µm | 30 meters | OLI |
| 8 | Panchromatic | 0.50-0.68 µm | 15 meters | OLI |
| 9 | Cirrus | 1.36-1.38 µm | 30 meters | OLI |
| 10 | Thermal Infrared (TIRS) 1 | 10.60-11.19 µm | 100 meters | TIRS |
| 11 | Thermal Infrared (TIRS) 2 | 11.50-12.51 µm | 100 meters | TIRS |
The Copernicus Sentinel-2 mission is operated by the European Space Agency (ESA). Sentinel-2 consists of twin polar-orbiting satellites (Sentinel-2A and Sentinel-2B). Sentinel-2A was launched in June 2015 and Sentinel-2B in March 2017. Together they provide a higher temporal resolution, revisiting every five days under the same viewing angles, and a higher spatial resolution of 10-60 m. The main instrument of the Sentinel-2 mission, the MultiSpectral Instrument (MSI), covers 13 spectral bands ranging from visible and near-infrared to shortwave infrared wavelengths. Table 3 summarizes the spectral band specification of the Sentinel-2 sensor.
Table 3 — Sentinel-2 spectral band specifications.
| Band | Description | Wavelength | Resolution | Sensor |
|---|---|---|---|---|
| 1 | Coastal aerosol | 443.9nm (S2A) / 442.3nm (S2B) | 60 meters | MSI |
| 2 | Blue | 496.6nm (S2A) / 492.1nm (S2B) | 10 meters | MSI |
| 3 | Green | 560nm (S2A) / 559nm (S2B) | 10 meters | MSI |
| 4 | Red | 664.5nm (S2A) / 665nm (S2B) | 10 meters | MSI |
| 5 | Vegetation Red Edge 1 | 703.9nm (S2A) / 703.8nm (S2B) | 20 meters | MSI |
| 6 | Vegetation Red Edge 2 | 740.2nm (S2A) / 739.1nm (S2B) | 20 meters | MSI |
| 7 | Vegetation Red Edge 3 | 782.5nm (S2A) / 779.7nm (S2B) | 20 meters | MSI |
| 8 | Near infrared (NIR) | 835.1nm (S2A) / 833nm (S2B) | 10 meters | MSI |
| 8A | Vegetation Red Edge 4 | 864.8nm (S2A) / 864nm (S2B) | 20 meters | MSI |
| 9 | Water vapour | 945nm (S2A) / 943.2nm (S2B) | 60 meters | MSI |
| 10 | Shortwave Infrared / Cirrus | 1373.5nm (S2A) / 1376.9nm (S2B) | 60 meters | MSI |
| 11 | Shortwave Infrared (SWIR) 1 | 1613.7nm (S2A) / 1610.4nm (S2B) | 20 meters | MSI |
| 12 | Shortwave Infrared (SWIR) 2 | 2202.4nm (S2A) / 2185.7nm (S2B) | 20 meters | MSI |
There are many ways to access Landsat and Sentinel-2 data. The USGS Earth Explorer is the official source for downloading Landsat data. The ESA Copernicus Open Access Hub provides complete and open access to Sentinel-2 data. The Google Earth Engine (GEE) data catalog archives diverse standardized geospatial data sets, including the CDL, Landsat-8, and Sentinel-2 data.
2.2.3. Model
2.2.3.1. Crop Type Prediction
Trusted pixels are pixels whose current-year crop type is predicted from the historical CDL data with high confidence. As a practical approach for discovering intricate patterns and structures in high-dimensional data, ML has been widely used in Land Use Land Cover (LULC) studies. The production of trusted pixels is based on the crop sequence pattern that is automatically recognized from the CDL time series. To train the crop sequence model, an Artificial Neural Network (ANN) model was integrated with the in-season mapping workflow, which has proven effective in predicting the spatial distribution of major crop types (Zhang et al., 2019a).
Figure 6 illustrates the process of trusted pixel prediction. First, the historical CDL time series is converted into an image stack with crop sequence features for all pixels. Each crop sequence feature is a one-dimensional array containing the pixel-level time series of historical CDL. Second, each crop sequence feature is fed into the prediction model to predict the following year’s crop type for the corresponding pixel. The ANN model for trusted pixel prediction has a fully-connected multilayer perceptron (MLP) structure consisting of one input layer, five hidden layers, and one output layer. Each input neuron represents one crop type value of the crop sequence feature. The output layer uses the SoftMax function to calculate the probability of each of three classes (corn, soybeans, or others). The pixel is assigned to the class with the highest probability. The final output of the prediction model is a prediction map of crop cover and its probability map. By masking the high-confidence pixels (>90%) on the prediction map, a map of trusted pixels is generated. If the sequence follows a regular pattern, there is a high chance that the pixel will be classified as a trusted pixel (e.g., corn 90%, soybeans 8%, others 2%). If a sequence cannot be recognized by the well-trained model, the class probabilities tend to be more even (e.g., corn 45%, soybeans 30%, others 25%) and the pixel is classified as a non-trusted pixel.
Figure 6 — Predicting trusted pixels from historical CDL time series using ANN.
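The final step of the workflow described above, turning SoftMax class probabilities into trusted or non-trusted pixels, can be sketched as follows. This is a minimal illustration under stated assumptions: the raw scores are invented, the helper names are hypothetical, and the network that would produce the scores is not reproduced here.

```python
import math

CLASSES = ("corn", "soybeans", "others")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def trusted_pixel(probabilities, threshold=0.9):
    """Return (class_name, is_trusted) for one pixel's class probabilities.

    A pixel is trusted only when its top probability exceeds the threshold.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best], probabilities[best] > threshold


# Illustrative raw scores for one pixel (not taken from the experiment).
confident = trusted_pixel(softmax([4.0, 1.0, 0.0]))
# An evenly spread distribution, as in the non-trusted example in the text.
uncertain = trusted_pixel([0.45, 0.30, 0.25])
```

With these inputs the first pixel is labeled corn and marked trusted, while the second is labeled corn but left out of the trusted-pixel map.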
The training data set was constructed from three recursive subsets, each with an 8-year moving window. To produce trusted pixels for 2019, the ANN model is trained using sub-training sets of 2010–2017 CDL labeled with 2018 CDL, 2009–2016 CDL labeled with 2017 CDL, and 2008–2015 CDL labeled with 2016 CDL. This design efficiently extends the training data set and allows the neural network to recognize crop sequence labels for the last three consecutive years. To convert the features into a form the neural network can read, the training data set was flattened to a structured 2-D table. Each row represents one sample: a sequence of pixel-level crop type features labeled with the corresponding pixel in the label set. For example, a training sample of a pixel that follows the corn-soybean rotation pattern is represented as “1, 5, 1, 5, 1, 5, 1, 5” labeled with “1” or “5, 1, 5, 1, 5, 1, 5, 1” labeled with “5,” where “1” refers to corn and “5” refers to soybeans (the full class table of CDL data is available in the Appendix). Although the CDL data has been available since 1997, the training set was not built from the full time series because the quality of the early-year CDL varies across regions and CDL coverage is incomplete before 2008, both of which may significantly affect the accuracy of the derived ML model.
To train a robust prediction model, the training set should provide abundant samples with diverse crop sequence features. Based on the similarity of agricultural characteristics and environment, USDA NASS divided each U.S. state into several Agricultural Statistics Districts (ASDs). Because crop sequence patterns differ between districts, a separate ML model is trained for each ASD and trusted pixel mapping is then applied ASD by ASD. In this way, each well-trained neural network recognizes the crop sequence information specific to its ASD.
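The moving-window construction described above can be sketched in a few lines. The function name, the dictionary layout, and the two-pixel toy series are assumptions made for illustration; the codes 1 (corn) and 5 (soybeans) follow the CDL convention quoted in the text.

```python
def build_training_samples(cdl_by_year, label_years, window=8):
    """Build (feature, label) rows from per-pixel CDL time series.

    cdl_by_year: dict mapping year -> list of crop codes, one per pixel.
    label_years: years whose CDL provides the labels (e.g. 2016, 2017, 2018).
    """
    rows = []
    for label_year in label_years:
        # The `window` years immediately before the label year form the feature.
        feature_years = range(label_year - window, label_year)
        n_pixels = len(cdl_by_year[label_year])
        for p in range(n_pixels):
            feature = [cdl_by_year[y][p] for y in feature_years]
            rows.append((feature, cdl_by_year[label_year][p]))
    return rows


# Toy series for two pixels, 2008-2018, in opposite corn-soybean rotations.
cdl = {}
for i, year in enumerate(range(2008, 2019)):
    cdl[year] = [1 if i % 2 == 0 else 5, 5 if i % 2 == 0 else 1]

samples = build_training_samples(cdl, label_years=[2016, 2017, 2018])
```

Each entry of `samples` corresponds to one row of the flattened 2-D training table: eight feature values followed by the label taken from the next year's CDL.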
2.2.3.2. Crop Type Classification
Figure 7 shows the procedure of in-season crop type classification using satellite images and trusted pixels. The input to the classification model is an image stack with both spectral and temporal information. The number of satellite images used to assemble the image stack depends on the availability of cloud-free satellite images within the growing season. Based on the spatial distribution of trusted pixels, the training samples are automatically labeled on the image stack. The trusted pixel-based training samples can be applied to diverse pixel-based classifiers. This experiment used an MLP-based ANN as the classifier, with a structure similar to that of the trusted pixel prediction model. Each input neuron represents one value in the one-dimensional band feature of the corresponding pixel. Finally, an in-season crop cover map is generated by applying the trained classification model to the full image. The geography, start of season, and acquisition dates of satellite images may vary significantly among the different scenes over a large area.
Figure 7 — In-season crop type classification using multi-temporal satellite image stack and trusted pixels.
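The automatic labeling step described above, in which trusted pixels select and label training samples on the image stack, can be sketched as follows. The nested-list layout, the function name, and the reflectance values are assumptions made for illustration; a production workflow would more likely operate on raster arrays.

```python
def extract_training_samples(image_stack, trusted_labels):
    """Pair each trusted pixel's stacked band values with its crop label.

    image_stack: list of layers (one per band per acquisition date); each
        layer is a 2-D list of values with identical shape.
    trusted_labels: 2-D list holding a crop code for trusted pixels and
        None elsewhere.
    """
    features, labels = [], []
    for r, row in enumerate(trusted_labels):
        for c, label in enumerate(row):
            if label is None:
                continue  # not a trusted pixel, so it is not used for training
            # One-dimensional band feature: this pixel's value in every layer.
            features.append([layer[r][c] for layer in image_stack])
            labels.append(label)
    return features, labels


# Toy 2x2 scene with two layers (e.g. one band on two dates); values are made up.
stack = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.50, 0.60], [0.70, 0.80]],
]
trusted = [[1, None], [None, 5]]
X, y = extract_training_samples(stack, trusted)
```

The resulting `X` and `y` could then be passed to any pixel-based classifier, and the trained classifier applied to the band features of every pixel in the scene.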
2.3. Pixalytics
2.3.1. Plastics ML Model
The Plastics ML model is not open-source, but the underlying research is documented in a peer-reviewed paper. It was designed to detect and map plastic waste in the environment, supporting clean-up. This has included mapping marine plastics in Indonesia and detecting tires in several countries to support recycling efforts.
An ML-based classifier was developed to run on Copernicus Sentinel-1 and -2 data. To support the training and validation, a dataset was created with terrestrial and aquatic cases by manually digitizing varying landcover classes alongside plastic classes under the sub-categories of greenhouses, plastic, tires, and waste sites.
Pixalytics implemented an initial approach to use transfer learning to take the Artificial Neural Network and train it for specific plastic waste occurrence scenarios: agricultural plastic waste between greenhouses was tested. The aim was to achieve higher accuracy when the model is trained and run on a specific plastic type by using training data focused on that location.
2.3.2. Meta AI Segment Anything Model (SAM)
The Meta Artificial Intelligence (AI) Segment Anything Model (SAM) is documented in a paper of the same name and is available as code in a GitHub repository. SAM has three components: an image encoder (run once per image), a flexible prompt encoder (which can accept text prompts or masks), and a fast mask decoder (which generates the output mask). SAM was trained on a dataset of 11 million images and 1.1 billion masks. As acknowledged in the paper, SAM performs well in general but can miss fine structures, at times hallucinates small disconnected components (hallucination is often defined as “generated content that is nonsensical or unfaithful to the provided source content”), and does not produce boundaries as crisply as more computationally intensive methods that “zoom-in”. The authors of the paper also expect dedicated interactive segmentation methods to outperform SAM when many points are provided.
Pixalytics tested this model’s applicability to hyperspectral Earth Observation (EO) data. As a first step, the model was applied to a three-band RGB quicklook from CHRIS/Proba-1 (see below); the next step is to transfer the model so that it can run on the hyperspectral inputs.
The Project for OnBoard Autonomy-1 (Proba-1) mission was launched in 2001, celebrated its twentieth anniversary in 2021, and is still continuing, although new CHRIS image acquisitions stopped at the end of 2022. It carries a hyperspectral instrument, called the Compact High Resolution Imaging Spectrometer (CHRIS), alongside a high-resolution camera and instrument payloads focused on debris and space radiation.
2.4. Rendered.ai
Transfer learning empowers commercial and GEOINT computer vision practitioners by offering a practical, efficient, and effective approach to speeding up deployment of AI solutions for critical tasks. Transfer learning combines the collective knowledge encoded in pre-trained models with fine-tuning using data from a target domain, allowing for training with fewer positive examples of the target object than would otherwise be needed.
Typically, transfer learning techniques start with models trained on generic data, such as the commonly used Common Objects in Context (COCO) dataset, as labeled data for focused domains can be difficult or impossible to acquire. Recent advancements in image simulation techniques, however, enable the possibility of base models that are trained on large, diverse datasets that approximate the target without the need for large amounts of real examples of an exact object of interest. The hypothesis of this project is that synthetic data can be used to build a base model that demonstrates improved model performance when transfer learning techniques are applied when compared with a model pre-trained on generic data. The experiment conducted to test this hypothesis was performed for a common use case relevant to commercial and GEOINT computer vision practitioners — detection of cargo planes.
The goals of this experiment were:
to demonstrate that synthetic data designed to emulate real sensor data can be used to build a model backbone that improves transfer learning outcomes over a backbone trained on generic data;
to determine best practices for synthetic data generation and preparation in transfer learning applications; and
to understand factors that influence synthetic data’s effectiveness in transfer learning, as well as the general limitations of this approach.
To implement these experiments, Rendered.ai focused on the creation of the simulated data and partnered with the geospatial computer vision experts at Orbital Insight to ensure that the model training efforts were conducted using the state of the art in computer vision techniques.
2.4.1. Definition of Real Sensor Dataset and Object Class
The foundation of this investigation was identifying an existing open-source dataset containing labeled objects with enough instances to enable effective model training with real data alone. This was critical to establish a performance baseline against which to measure progress, and to ensure that a diverse test set could be derived from the real data. For this research, the focus was directed towards the xView dataset, an open dataset of satellite imagery at approximately 30 cm resolution that includes 1 million bounding box labels for 60 common man-made object classes covering over 1,400 km² of the Earth’s surface.
The selection of a target class within the 60 labeled objects in the xView dataset was determined based on the assessed detectability of the object in real data, which is influenced by both the typical size of the object in pixels and the number of instances present in the dataset. With these criteria in mind, the Cargo Plane object class was selected as the object of study. Within xView, there are 718 instances of cargo planes across 143 images, providing enough unique instances for model training within a diverse set of background contexts. Furthermore, the median bounding box area for this object is 11091 pixels in this dataset, providing sufficient detectability using standard deep learning techniques.
Table 4 — Cargo plane objects in xView.
| Object Name | Number of Images | Number of Instances | Median Size (Pixels) |
|---|---|---|---|
| Cargo_Plane | 143 | 718 | 11091 |
2.4.2. Creation of the Synthetic Data Channel
The creation of a synthetic data application capable of emulating the attributes of the selected real dataset was a critical step. For this project, the work was based on pre-existing technology available within Rendered.ai that combines the Blender simulation engine, 3D models of target assets, real imagery backgrounds, and Rendered.ai’s configurable dataset generation capability. Customizing this application to generate data relevant to this use case required acquiring and deploying a variety of 3D models representative of the target assets and configuring background imagery that matches the domain of the real data.
For the target assets, eleven different 3D models of commercial aircraft of various sizes and configurations were utilized. These represented generic aircraft models acquired from 3D model marketplaces such as Turbosquid.com. These were then configured for use in the pre-existing application and deployed to the Rendered.ai platform.
For the backgrounds, fifteen different airport images, each approximately 1 km² in area, were selected from the xView dataset to ensure image resolution matched that of the target dataset. Due to the size of these images, and the large areas of potential placement within each image, this number was deemed sufficient for experimentation. In cases where real aircraft were present in the image, image manipulation techniques were used to remove these objects from the background to avoid confusing the model. Once this was complete, “agent factories” were placed along all runways and aircraft traffic areas to denote where aircraft models could potentially be simulated within the scene. These images were then deployed to the Rendered.ai platform along with corresponding metadata that would influence simulation, including ground sample distance (GSD), sun angle, blur, and noise properties. These environmental and scene settings allow for the seamless integration of 3D objects and 2D imagery into a simulated capture scene.
The content described above was then deployed as part of a pre-existing RGB satellite simulation channel on the Rendered.ai platform, which supports intelligent placement and modification of 3D assets within backgrounds, sensor and image specification and variation, and a comprehensive labeling system to support the output of diverse labeled image datasets ready to be used in model training.
2.4.3. Dataset Generation and Domain Adaptation
The resulting synthetic data channel was used in generating datasets specifically designed for training and experimentation. For the purposes of this experiment, two different configurations of the simulation framework were used to test the relative performances of different approaches. For one dataset, the 3D assets were simulated unmodified within the background scene. In the second, modifiers were added to randomly change the color and slightly vary the scale of the input plane assets and to vary sun angle in the scene to project shadows of varying lengths and directions against the background. The purpose of this experiment was to test which approach generated data that provided the best performing model against the xView test set.
Figure 8 — Left: Example synthetic image with unmodified assets (planes). Right: Example image with color and scale of assets and sun angle of the scene varied.
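The second dataset configuration, which randomly varies asset color, asset scale, and sun angle, can be pictured with a small sampling sketch. Everything here is hypothetical: the parameter names, the value ranges, and the flat-ground shadow formula are assumptions for illustration and do not represent the Rendered.ai platform's configuration or API.

```python
import math
import random


def sample_scene_modifiers(rng, scale_jitter=0.1, sun_elevation_deg=(20.0, 70.0)):
    """Draw one random set of asset and scene modifiers (ranges are assumed)."""
    return {
        "color_rgb": tuple(rng.random() for _ in range(3)),
        "scale": rng.uniform(1.0 - scale_jitter, 1.0 + scale_jitter),
        "sun_elevation_deg": rng.uniform(*sun_elevation_deg),
        "sun_azimuth_deg": rng.uniform(0.0, 360.0),
    }


def shadow_length(object_height_m, sun_elevation_deg):
    """Length of the shadow cast on flat ground by an object of given height."""
    return object_height_m / math.tan(math.radians(sun_elevation_deg))


rng = random.Random(42)  # fixed seed so a dataset configuration is repeatable
modifiers = sample_scene_modifiers(rng)
length = shadow_length(10.0, modifiers["sun_elevation_deg"])
```

A lower sun elevation gives a longer shadow, which is why varying the sun angle changes both the length and the direction of shadows projected onto the background.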
Testing revealed that the dataset with the unmodified assets and scene outperformed that of the parameter-varied set. This result suggests that the accuracy of domain match (or put differently, the “realism”) of the synthetic data output is more important in this case than additional diversity at the potential expense of domain match. This may have been especially true due to the relatively small size of the training datasets.
The next experiment undertaken was to apply a trained Generative Adversarial Network (GAN) domain adaptation model to the synthetic dataset. This GAN model was trained using a source dataset of synthetic satellite image data, and a target set of unmodified xView imagery. Thus, this process uses generative AI techniques to adapt input synthetic images to match the statistical characteristics of the real image data. As seen in the provided example images, this process can change the characteristics of the image, introducing changes in hue, artifacts, and aberrations that otherwise may not be introduced by a pure simulation approach. The relative positions of objects in the image, however, remain consistent, allowing for previously generated labels to maintain their integrity.
Figure 9 — Left: Original simulated synthetic image. Right: Resulting image after modifying the original image with GAN-based domain adaptation, a post-processing technique used to enhance domain match.
The results of this testing showed that the dataset with GAN-based domain adaptation applied demonstrated significantly improved training results over the non-adapted dataset. This confirms prior findings from experiments by Rendered.ai and Orbital Insight that tested model performance on domain-adapted synthetic image data. With these findings established, the team could begin forming hypotheses about which synthetic dataset would provide the most effective model backbone for the subsequent transfer learning experiments.
2.4.4. Model Training and Transfer Learning
To test the effectiveness of a model backbone trained on synthetic data against a generic backbone, the first step was to obtain a pre-trained model backbone developed using the COCO dataset. This dataset is commonly used for generic model training because of its large and diverse set of object classes and its generally accepted level of label quality. For the detection model, the Faster R-CNN object detection model was used within the Detectron2 framework. This model was chosen because xView annotations contain only bounding box locations and not the full instance-level segmentation masks required for a segmentation model such as Mask R-CNN.
Using the trained COCO model backbone, baseline detection performance metrics were established on the cargo plane subset of xView. Of the 143 images containing planes, 63 images were selected for the training set, containing a total of 327 object instances. The validation and test sets were allotted 21 and 59 images, containing 92 and 299 object instances, respectively.
Transfer learning models were then trained atop the COCO model backbone using the full training set, as well as six artificially constrained subsets of the training set, containing 50, 40, 30, 20, 10, and 5 positive training image examples. This was done to measure the effects of introducing scarcity into the training set. To achieve a rapid, relative assessment of performance of each of these training sets, model training and fine-tuning hyperparameters were fixed for all training sets and not optimized for each training set independently. Additionally, due to limitations in the Detectron2 libraries, model parameters were not scale-aware, and were susceptible to changes in image scaling due to inconsistent input image size.
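One way to build the artificially constrained training subsets is sketched below. The report does not say how the subsets were drawn, so the nesting of smaller subsets inside larger ones, the fixed seed, and the placeholder image identifiers are all assumptions made for illustration.

```python
import random


def constrained_subsets(image_ids, sizes, seed=0):
    """Return {size: subset}, with every subset taken from the front of one
    shuffled ordering so that smaller subsets are nested in larger ones."""
    shuffled = list(image_ids)
    random.Random(seed).shuffle(shuffled)
    return {size: shuffled[:size] for size in sizes}


# Placeholder identifiers standing in for the 63 training images.
train_images = [f"img_{i:03d}" for i in range(63)]
subsets = constrained_subsets(train_images, sizes=[50, 40, 30, 20, 10, 5])
```

Nesting the subsets keeps comparisons between training-set sizes from being confounded by entirely different image selections, at the cost of making the results depend on a single shuffle.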
Once baseline metrics were determined using the generic COCO model backbone, separate model backbones were trained using two synthetic sources: the GAN-adapted synthetic dataset, which showed the best detection performance in the initial testing, and a combination of all three synthetic datasets (the non-adapted base dataset; the color, scale, and shadow modified dataset; and the GAN-adapted base dataset). These new backbones were then used to train transfer learning models for each of the full and artificially constrained xView training datasets to compare effectiveness against the generic model backbone results.
3. Transferring models between software applications
With the emergence of various software frameworks for implementing deep learning architectures and rapid development in research using these frameworks, it is imperative to understand the transfer of models between these software frameworks. Some of the notable frameworks are shown in Table 5, which is not an exhaustive list.
Table 5 — Existing software frameworks for implementing deep learning based architectures
| Software framework | Year of release | Platform | Type | Repository | License |
|---|---|---|---|---|---|
| TensorFlow | 2015 | Cross-Platform | ML library | https://github.com/tensorflow/tensorflow | Apache License 2.0 |
| PyTorch | 2016 | Cross-Platform | ML library | https://github.com/pytorch/pytorch | Berkeley Software Distribution (BSD) |
| MXNet | 2017 | Linux, macOS, Windows | ML library | https://github.com/apache/mxnet | Apache License 2.0 |
| Caffe | 2014 | Linux, macOS, Windows | DL library | https://github.com/BVLC/caffe/tree/master | The 2-Clause Berkeley Software Distribution (BSD) |
| Keras | 2015 | Cross-Platform | DL library | https://github.com/keras-team/keras | Apache License 2.0 |
| CNTK | 2016 | Cross-Platform | ML and DL library | https://github.com/Microsoft/CNTK | MIT License |
| Deeplearning4j | 2014 | Cross-Platform | Natural Language Processing (NLP), Deep Learning, Machine Vision, Artificial Intelligence (AI) | https://github.com/deeplearning4j/deeplearning4j | Apache License 2.0 |
| Theano (Deprecated) | 2007 | Linux, macOS, Windows | ML library | https://github.com/Theano/Theano | The 3-Clause Berkeley Software Distribution (BSD) |
| Chainer | 2015 | Cross-Platform | DL library | https://github.com/chainer/chainer | MIT License |
Cross-Platform = Linux, macOS, Windows, Android, JavaScript; DL = Deep Learning; ML = Machine Learning
The above-mentioned frameworks have attracted considerable attention, as reflected in the number of stars and forks of their respective repositories. However, many of them have since been discontinued or deprecated.
To analyze transfer learning across software frameworks, an end-to-end framework based on a web-services architecture was developed for training, fine-tuning from a pre-trained model, visualizing model graphs, and inferencing. Client-side authentication based on OpenID Connect (OIDC) was incorporated for a particular task and user. These functionalities are implemented as web services based on the OGC Web Processing Service (WPS) Standard and OGC API - Processes - Part 1: Core. Moreover, the training data is encoded as a JSON file based on the OGC Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Standard.
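To make the web-service pattern more concrete, the sketch below builds an execute request in the style of OGC API - Processes - Part 1: Core, which executes a process through a POST to `/processes/{processId}/execution` with a JSON body containing an `inputs` object. The server URL, the process identifier `fine-tune`, and every input name are hypothetical placeholders and are not taken from the framework described here. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_execute_request(base_url, process_id, inputs):
    """Build (but do not send) an OGC API - Processes execute request."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_execute_request(
    "https://example.org/ogc-api",  # placeholder server
    "fine-tune",                    # hypothetical process identifier
    {
        # Hypothetical input names; a real deployment defines its own.
        "pretrained_model": "https://example.org/models/pretrained",
        "training_data": "https://example.org/tdml/dataset.json",
        "epochs": 10,
    },
)
```

In a deployment protected by OIDC, a client would typically also attach the access token it obtained, for example in an `Authorization` header, before sending the request.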
Figure 10 illustrates the overall Learning-as-a-service (LAAS) workflow of the proposed standardized framework based on the web-services.
Figure 10 — Overall Learning-as-a-service (LAAS) workflow
The LAAS framework comprises the following end-points for implementing various operations.
Authentication: OIDC-based authentication secures each project/task and ensures accountability, which is significant when multiple stakeholders with different tasks and datasets are involved (see Figure 11). It is also beneficial when only selected users should be granted access.
Figure 11 — Authentication
/tdml — TDML-as-a-service: Endpoint implementing generation of training data encodings in a JSON file format based on the OGC Training-DML for AI Standard.
Figure 12 — TDML-as-a-service
/processes — Processes-as-a-service: Endpoint for executing processes as a web-service based on the OGC API — Processes — Part 1.