Publication Date: 2021-02-15

Approval Date: 2021-02-10

Submission Date: 2020-11-20

Reference number of this document: OGC 20-015r2

Reference URL for this document: http://www.opengis.net/doc/PER/t16-D015

Category: OGC Public Engineering Report

Editor: Panagiotis (Peter) A. Vretanos

Title: OGC Testbed-16: Machine Learning Engineering Report


OGC Public Engineering Report

COPYRIGHT

Copyright © 2021 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.ogc.org/

WARNING

This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.


1. Subject

This engineering report describes the work performed in the Machine Learning Thread of OGC’s Testbed-16 initiative.

Previous OGC testbed tasks concerned with Machine Learning (ML) concentrated on the methods and apparatus of training models to produce high quality results. The work reported in this ER, however, focuses less on the accuracy of ML models and more on how the entire machine learning processing chain, from discovering training data to visualizing the results of an ML model run, can be integrated into a standards-based data infrastructure, specifically one based on OGC interface standards.

The work performed in this thread consisted of:

  1. Training ML models;

  2. Deploying trained ML models;

  3. Making deployed ML models discoverable;

  4. Executing an ML model;

  5. Publishing the results from executing a ML model;

  6. Visualizing the results from running a ML model.

At each step, the following OGC and related standards were integrated into the workflow to provide an infrastructure upon which the above activities were performed:

  1. OGC API - Features: Approved OGC Standard that provides API building blocks to create, retrieve, modify and query features on the Web.

  2. OGC API - Coverages: Draft OGC Standard that provides API building blocks to create, retrieve, modify and query coverages on the Web.

  3. OGC API - Records: Draft OGC Standard that provides API building blocks to create, modify and query catalogues on the Web.

  4. Application Deployment and Execution Service: Draft OGC Standard that provides API building blocks to deploy, execute and retrieve results of processes on the Web.

  5. MapML: A specification published by the Maps For HTML Community Group. It extends the base HTML map element to handle the display and editing of interactive geographic maps and map data without the need for special plugins or JavaScript libraries. The design of MapML resolves a Web Platform gap by combining map and map data semantics into a hypermedia format that is syntactically and architecturally compatible with, and derived from, HTML. It provides a standardized way for declarative HTML content to communicate with custom spatial server software (which currently uses HTTP APIs based on multiple queries and responses). It allows map and map data semantics either to be included in HTML directly, or to be referred to at arbitrary URLs that describe stand-alone layers of map content, including hyper-linked annotations to further content.

Particular emphasis was placed on using services based on the emerging OGC API Framework suite of API building blocks.
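To illustrate how these API building blocks are used from client code, the following Python sketch constructs an OGC API - Features request for the features of a collection within a bounding box. The service root and collection name are hypothetical placeholders, not endpoints used in the testbed.

```python
from urllib.parse import urlencode

def features_items_url(api_root, collection, bbox, limit=100):
    """Build an OGC API - Features request URL for the items of a
    collection, restricted to a bounding box (minx, miny, maxx, maxy)."""
    query = urlencode({
        "bbox": ",".join(str(v) for v in bbox),
        "limit": limit,
        "f": "json",  # ask for a GeoJSON response
    })
    return f"{api_root}/collections/{collection}/items?{query}"

# The service root and collection name below are hypothetical placeholders.
url = features_items_url(
    "https://example.org/ogcapi", "buildings",
    bbox=(-120.0, 49.0, -119.0, 50.0))
```

The same `/collections/{collectionId}/...` pattern underlies the Coverages, Records and Tiles drafts, which is what makes the building blocks easy to compose into one processing chain.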

Note
This ER does not cover the specific details concerning the discovery and reusability of training data sets. A complete description of this topic can be found in the D016 Machine Learning Training Data Engineering Report.

2. Executive Summary

2.1. Business statement

The integration of Machine Learning (ML) tools into a framework composed of catalogues, data access services and data processing services that comply with OGC standards can result in a ML processing chain that can extract knowledge and insight from the vast amount of geospatial data being collected and deployed in cloud platforms.

2.2. Goals

The OGC Testbed-16 goals for the Machine Learning Thread are:

  1. Discovery and reusability of data used to train predictive ML models.

  2. The integration of predictive ML models into a standards-based data infrastructure.

  3. Cost-effective visualization and data exploration technologies based on the use of the Map Markup Language (MapML).

2.3. Scenario / Use-cases

These goals are explored and addressed using the backdrop of a wildland fires scenario.

Wildland fires are those that occur in forests, shrublands and grasslands. While representing a natural component of forest ecosystems, wildland fires can present risks to human lives and infrastructure. Being able to properly plan for and respond to wildland fire events is thus a critical component of forestry management and emergency response.

Appropriate responses to wildland fire events benefit from planning activities undertaken before events occur. ML presents a new opportunity to advance wildland fire planning using diverse sets of geospatial information such as satellite imagery, Light Detection and Ranging (LiDAR) data, land cover information and building footprints. As much of the required geospatial information is available using OGC standard interfaces and encodings, a requirement exists to understand how well these standards can support ML in the context of wildland fire planning. Testbed-16 explored how to leverage ML, cloud deployment and execution, and geospatial information, provided through OGC standards, to improve planning approaches for wildland fire events. Findings inform future improvement and/or development activities for OGC Standards, leading to improved potential for the use of OGC standards within an infrastructure that includes ML applications.

Figure 1. ML and EO integration challenges

Advanced planning for wildland fire events can greatly improve the ability of first responders to address a situation. However, accounting for the many variables (e.g. wind, dryness, fuel loads) and their combinations that will be present at the exact time of an event is very difficult. As such, there is an opportunity to evaluate how ML approaches, combined with geospatial information delivered using OGC Standards, can improve response planning throughout the duration and aftermath of wildland fire occurrences.

Thus, in addition to planning related work, Testbed-16 explored how to leverage ML technologies for dynamic wildland fire response. The planned work provided insight into how OGC Standards can support wildland fire response activities in a dynamic context. Any identified limitations of existing OGC Standards were documented and will be used to plan improvements to these frameworks. The Testbed was also an opportunity to explore how OGC Standards may be able to support the upcoming Canadian WildFireSat mission.

Important
Though this task uses a wildland fire scenario, the emphasis is not on the quality of the modelled results, but on the integration of externally provided source and training data, the deployment of the ML model on remote clouds through a standardized interface, and the visualization of model output.

2.4. Research questions

This ER addresses the following research questions:

  • Does ML require "data interoperability"? Or can ML enable "data interoperability"?

  • How do existing and emerging OGC Standards contribute to a data architecture flow towards "data interoperability"?

  • Is it necessary to have analysis ready data (ARD) for ML? Can ML help ARD development?

  • What is the value of datacubes for ML?

  • How do we address interoperability of distributed datacubes maintained by different organizations?

  • What is the potential of MapML in the context of ML? Where does it need to be enhanced?

  • How can an existing ML model be discovered and run?

2.5. Primary findings

The answers to the research questions can be found in the Research questions section. The following additional findings of the ML thread in OGC Testbed-16 are noted:

  • The use of catalogues, data access services and data processing services that comply with OGC standards facilitates the modular deployment and expandability of ML processing chains.

  • While the older OGC W*S services (e.g. WMS, WMTS, WFS, CSW, etc.) can be suitably integrated into the processing chain, the newer OGC API interfaces are easier to use and easier to integrate into ML processing chains.

  • The OGC API - Tiles interface is a good candidate for model training as both value datasets and label datasets can be retrieved.

  • The older WMS interface can also be used for model training but relies on the existence of optional capabilities (i.e. LegendURL and GetStyles) in the service instances being used.

  • The OGC API - Records interface offered a high level of flexibility, allowing both the discovery of training datasets and the provision of the binding information necessary for data extraction by an ML algorithm. The catalogue is capable of harvesting repositories of different typologies and of listing the relevant information for ML applications. This is an emerging OGC standard that can potentially contribute to a data architecture flow towards "data interoperability".

  • Using Docker to encapsulate both trained ML models as well as the entire processing chain facilitated testing the processing chain, as well as scaling it in production.

  • The use of Application Deployment and Execution Service (ADES) and Execution Management Service (EMS) allows the dynamic deployment of trained models encapsulated in Docker containers and provides a consistent interface for executing ML models and retrieving the results of processing. Those results can be persistently stored for use by downstream actors.

  • There are plenty of development libraries available that implement support for OGC standards. Conveniently for the ML domain, numerous Python libraries with OGC standards support exist.
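The tile-based training finding above can be sketched as follows: matching value (imagery) and label tiles are fetched by filling the same OGC API - Tiles URL template for two collections. The template, collection names and tile indices are assumptions for illustration; the actual services used in the testbed may differ.

```python
# Hypothetical OGC API - Tiles URL template; the tile matrix set, zoom
# level, row and column are the template variables defined by the draft.
TEMPLATE = ("https://example.org/ogcapi/collections/{collection}"
            "/map/tiles/WebMercatorQuad/{z}/{row}/{col}.png")

def training_pair_urls(z, row, col):
    """Return (value, label) tile URLs for one training sample: the
    imagery tile and the co-registered label (e.g. burn mask) tile."""
    value = TEMPLATE.format(collection="sentinel2-imagery", z=z, row=row, col=col)
    label = TEMPLATE.format(collection="fire-labels", z=z, row=row, col=col)
    return value, label

value_url, label_url = training_pair_urls(12, 1410, 653)
```

Because both collections share one tiling scheme, each fetched pair is already co-registered, which is what makes the Tiles interface a good candidate for assembling training samples.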

2.6. Future work

The following future-work items were identified:

  • Data Authenticity: This aspect needs to be investigated in order to ensure that the model is trained, and inference performed, with authentic data (the issue of data tampering in satellite imagery was also noted).

  • Analysis Ready Data (ARD): Another important aspect to consider is how the framework deals with different data sources, such as datacubes, where some data may already be in ARD form and some may not.

  • ONNX checkpoints: For the time being the current model stores its checkpoints in a native format, but it could be useful to consider the Open Neural Network eXchange (ONNX) format.

  • Training dismissal: Another important aspect to be covered is the expected behavior when a training run must be interrupted. For instance, should all intermediate training state be retained or discarded?

3. Standard and/or Domain Working Group review

3.1. Overview

The Machine Learning (ML) thread participants and sponsors believed that the work of the ML task is relevant to work being done in the OGC Standards and Domain Working Groups (SWGs, DWGs) listed below. A request for review of this ER by the SWG/DWG members was forwarded to the working groups by the editor.

3.2. Artificial Intelligence in Geoinformatics (GeoAI) DWG

The Artificial Intelligence in Geoinformatics (GeoAI) DWG is chartered to identify use cases and applications related to Artificial Intelligence (AI) in geospatial domains and focused on the Internet-of-Things (e.g., healthcare, smart energy), robots (e.g., manufacturing, self-driving vehicles), or ‘digital twins’ (e.g., smart buildings and cities). This DWG provides an open forum for broad discussion and presentation of use cases with the purpose of bringing geoscientists, computer scientists, engineers, entrepreneurs, and decision makers from academia, industry, and government together to develop, share, and research the latest trends, successes, challenges, and opportunities in the field of AI with geospatial data. The working group aims to investigate the feasibility and interoperability of OGC standards in incorporating geospatial information with AI and describe gaps and issues which can lead to new geospatial standardization to advance trustworthiness and accountability for this domain community. Furthermore, existing OGC Web Services need to be carefully examined for changes that may need to be made in the context of AI-empowered applications. As some AI methods are already included in OGC standards, it is expected that AI methods will also impact many OGC standards in the future. For example, routing services have not yet been built according to human-centered AI, despite some suggestions to extend the Open Location Services (OpenLS) standard.

The goal of the ML task in Testbed-16 is to explore how to leverage ML through OGC standards to improve planning approaches for wildland fire events. This aligns with the goals of the GeoAI DWG, especially as they relate to incorporating and integrating geospatial information with AI.

3.3. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Contacts

Name                             Organization     Role

Panagiotis (Peter) A. Vretanos   CubeWerx Inc.    Editor
Samuel Foucher                   CRIM             Contributor
Francis Charette-Migneault       CRIM             Contributor
Matthes Rieke                    52°North         Contributor
Andrea Cavallini                 RHEA Group       Contributor
Nicola Lorusso                   RHEA Group       Contributor
Valerio Fontana                  RHEA Group       Contributor

3.4. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

4. References

5. Terms and definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.

● container

a software package that contains everything needed to run a program or application; this includes the executable program as well as system tools, libraries, and settings

● convolutional neural network (CNN)

a class of deep neural networks commonly applied to analyzing visual imagery

● deployment

the act of making a trained model available for execution behind a standardized API; in OGC this standardized API is defined by the OGC API - Processes specification with extensions

● Docker

a software platform for building applications based on containers

● model inference

refers to the process of taking a machine learning model that has already been trained and using that trained model to make useful predictions based on new data

● trained model

the model artifact created by the training process; a model is trained by providing a machine learning algorithm with known data from which it can learn
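For context on the deployment and execution terms above, a deployed model is typically run by POSTing an execution request to the process's execution endpoint, as defined by OGC API - Processes. The sketch below builds such a request body; the process inputs (imagery, threshold) are hypothetical names, not the actual inputs of any model deployed in the testbed.

```python
import json

def execute_request(inputs):
    """Build an OGC API - Processes execution request body; inputs are
    given as a simple name -> value mapping."""
    return {
        "inputs": inputs,
        # Return the result by reference so downstream actors can fetch it.
        "outputs": {"result": {"transmissionMode": "reference"}},
        "response": "document",
    }

# Hypothetical inputs for a deployed wildfire-detection model.
body = execute_request({
    "imagery": {"href": "https://example.org/ogcapi/collections/sentinel2/coverage"},
    "threshold": 0.5,
})
payload = json.dumps(body)
```

Requesting the output by reference, rather than by value, is what allows results to be persisted and reused by downstream actors in the processing chain.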

6. Abbreviated terms

  • ADES - Application Deployment and Execution Service

  • AI - Artificial Intelligence

  • API - Application Programming Interface

  • ARD - Analysis Ready Data

  • CNN - Convolutional Neural Networks

  • CRIM - Computer Research Institute of Montréal

  • CRS - Coordinate Reference System

  • CSW - Catalogue Service for the Web

  • CWL - Common Workflow Language

  • DL - Deep Learning

  • ER - Engineering Report

  • EMS - Execution Management Service

  • HTTP - Hypertext Transfer Protocol

  • JSON - JavaScript Object Notation

  • LiDAR - Light Detection and Ranging

  • MapML - Map Markup Language

  • ML - Machine Learning

  • OGC - Open Geospatial Consortium

  • ONNX - Open Neural Network Exchange Format

  • OWS - OGC Web Services

  • Pub/Sub - Publication/Subscription

  • REST - Representational State Transfer

  • RNN - Recurrent Neural Network

  • SAR - Synthetic Aperture Radar

  • TIE - Technology Integration Experiments

  • URL - Uniform Resource Locator

  • VCS - Version Control Systems

  • WCS - Web Coverage Service

  • WES - Web Enterprise Suite

  • WFS - Web Feature Service

  • WMS - Web Map Service

  • WPS - Web Processing Service

  • WPS-T - Transactional Web Processing Service

7. Overview

The "Scenario" section provides a detailed description of the scenario and use cases used to drive the participants' implementations.

The "Infrastructure overview" section provides background information about ML and Deep Learning (DL) as well as some of the underlying OGC technologies such as the Application Deployment and Execution Service (ADES) which is an emerging technology developed in OGC Testbeds 13, 14 and 15.

The "Training, deployment and execution of machine learning models" section describes how machine learning models were deployed via existing and emerging OGC APIs in the ML task. Specifically the clause describes how ML models are packaged into containers and deployed via an ADES.

The "Visualization of ML Results" section describes how MapML was used to visualize and interact with geospatial information within a web browser.

The "Research questions" section attempts to answer the research questions originally expressed in the OGC Testbed-16: Call for Participation (CFP) based on the experiences of the thread participants.

The "Issues" section provides a summary of the issues discussed during the Testbed.

8. Scenario

8.1. Overview

The Machine Learning task scenario addresses two phases of wildland fire management:

  • Wildland Fire Planning

  • Wildland Fire Response.

For both scenarios, various steps of training and analysis data integration, processing and visualization were performed as outlined below. The scenarios serve the purpose of guiding the activity through the various steps in the two phases of wildland fire planning and response. The scenarios help to ground all work in a real-world situation.

The Wildland fire planning scenario includes the following major steps:

  1. Investigate the application of different ML frameworks (e.g. Mapbox RoboSat, Azavea’s Raster Vision, GeoDeepLearning) to multiple types of remotely sensed information such as synthetic aperture radar, optical satellite imagery, and LiDAR. Access to these data sources was provided through OGC standards. The focus was to identify fuel availability within targeted forest regions.

  2. Explore interoperability challenges related to ML training data. Develop solutions that allow the wildland fire training data, test data, and validation data to be structured, described, generated, discovered, accessed, and curated within data infrastructures.

  3. Explore the interoperability and reusability of trained ML models to determine their potential for applications using different types of geospatial information. Interoperability, reusability and discoverability are essential elements for cost-efficient ML. The structure and content of a trained ML model have to provide information about its purpose. Questions such as: “What is the ML model trained to do?” or “What data was the model trained on?” or “Where is the model applicable?” need to be answered sufficiently in order to provide guidance on the appropriate use of a model. For example, models trained with data from a specific area that contains a specific feature profile (e.g. forested land) may not be appropriate for use in another area with a different feature profile (e.g. grassland). Interoperability of training data should be addressed equivalently.

  4. Deep Learning (DL) architectures can use LiDAR to classify field objects (e.g. buildings, low vegetation, etc.). These architectures mainly use the TIFF and ASCII image formats. Other DL architectures use 3D data stored in raster or voxel form. However, 3D voxel or raster forms may involve many approximations that make classification and segmentation vulnerable to errors. Therefore, Testbed-16 participants should apply advanced DL architectures directly to the raw point cloud to classify points and segments of individual items (e.g. trees, etc.). The PointNet architecture should be investigated for this purpose, or different approaches proposed as alternatives to PointNet.

  5. Leverage outcomes from the previous steps to predict wildland fire behavior within a given area through ML. Incorporate training of ML using historical fire information and the Canadian Forest Fire Danger Rating system (fire weather index, fire behavior prediction) leveraging weather, elevation models, fuels.

  6. Use ML to discover and map suitably sized and shaped water bodies for water bombers and helicopters.

  7. Investigate the use of ML to develop smoke forecasts based on weather conditions, elevation models, vegetation/fuel and active fires (size) based on distributed data sources and datacubes using OGC standards.

The Wildland fire response scenario includes the following major steps:

  1. Explore ML methods for identifying active wildland fire locations through analysis of fire information data feeds (e.g. the Canadian Wildland Fire Information System, the United States Geological Survey LANDFIRE system) and aggregation methods. Explore the potential of MapML as an input to the ML process and the usefulness of a structured Web of geospatial data in this context.

  2. Implement ML to identify potential risks to buildings and other infrastructure given identified fire locations. Consider the potential for estimating damage costs.

  3. Investigate how existing standards related to water resources (e.g. WaterML, Common Hydrology Features (CHyF)), in conjunction with ML, can be used to locate potential water sources for wildland fire event response.

  4. Develop evacuation and first responder routes based on ML predictions of active fire behavior and real-time conditions (e.g. weather, environmental conditions).

  5. Based on smoke forecasts and suitable water bodies, determine if suitable water bodies are accessible to water bombers and helicopters.

  6. Explore the communication of evacuation and first responder routes, as well as other wildland fire information, through Publication/Subscription (Pub/Sub) messaging.

  7. Examine how ML can be used to identify watersheds/water sources that will be more susceptible to degradation (e.g. flooding, erosion, poor water quality) after a fire has occurred.

  8. Identify how OGC standards and ML may be able to support the goals of the upcoming Canadian WildFireSat mission.

8.2. Components

The following diagram provides an overview of the main work items for this task. The diagram is structured to show the training data at the bottom, existing platforms and corresponding APIs to the left, and Machine Learning models and visualization efforts to the right.

Figure 2. Major components and research aspects

The following overarching research questions further helped to guide the work in this task:

  • Does ML require "data interoperability"?

    • Or can ML enable "data interoperability"?

    • How do existing and emerging OGC standards contribute to a data architecture flow towards "data interoperability"?

  • Where do trained datasets go and how can they be re-used?

  • How can we ensure the authenticity of trained datasets?

  • Is it necessary to have analysis ready data (ARD) for ML? Can ML help ARD development?

    • For the purposes of serving the data from OGC API sources, such as a coverage server, the data needs to be orthorectified

    • This is probably true for ML models as well

  • What is the value of datacubes for ML?

  • How do we address interoperability of distributed datacubes maintained by different organizations?

  • What is the potential of MapML in the context of ML?

    • Where does it need to be enhanced?

  • How can an existing ML model be discovered and run?

9. Infrastructure overview

9.1. Introduction

A primary task of this thread was to explore the use of existing and emerging OGC APIs to enable a processing chain that starts with discovering training data for a specific purpose and ends with a deployed ML model that can be executed and its results visualized using a browser.
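As a minimal sketch of the discovery step that begins this chain, the following builds an OGC API - Records search for trained models matching a keyword. The 'q' free-text and 'type' filter parameters follow the draft Records specification, while the catalogue endpoint and the 'models' collection are hypothetical placeholders.

```python
from urllib.parse import urlencode

def records_search_url(catalogue_root, keyword, record_type=None, limit=10):
    """Build an OGC API - Records search against a records collection,
    using the draft 'q' (free text) and 'type' filter parameters."""
    params = {"q": keyword, "limit": limit}
    if record_type:
        params["type"] = record_type
    # 'models' is a hypothetical records collection of deployed ML models.
    return f"{catalogue_root}/collections/models/items?{urlencode(params)}"

url = records_search_url("https://example.org/catalogue", "wildfire",
                         record_type="ml-model")
```

A record returned by such a search would carry the binding information (links to data and to the process execution endpoint) needed by the subsequent steps of the chain.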

The following components diagram shows the interactions of the various ML and OGC components used in the thread: