Published

OGC Engineering Report

OGC Testbed-17: Moving Features ER
Guy Schumann (Editor)

Document number: 21-036
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Permission is hereby granted by the Open Geospatial Consortium, (“Licensor”), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.



I.  Abstract

The OGC Testbed-17 Moving Features (MF) task addressed the exchange of moving object detections, shared processing of detections for correlation and analysis, and visualization of moving objects within common operational pictures. This Engineering Report (ER) explores and describes an architecture for collaborative distributed object detection and analysis of multi-source motion imagery, supported by OGC MF standards. The ER presents the proposed architecture, identifies the necessary standards, describes all developed components, reports on the results of all Technology Integration Experiment (TIE) activities, and provides a description of recommended future work items.

II.  Executive Summary

Moving Features play an essential role in many application scenarios. The growing availability of digital motion imagery and advancements in machine learning technology will further accelerate widespread use and deployment of moving feature detection and analysis systems. The OGC Testbed-17 Moving Features task considers these developments by addressing exchange of moving object detections, shared processing of detections for correlation and analysis, and visualization of moving objects within common operational pictures. This OGC Moving Features (MF) Engineering Report (ER) explores and develops an architecture for collaborative distributed object detection and analysis of multi-source motion imagery. The goal is to define a powerful Application Programming Interface (API) for discovery, access, and exchange of moving features and their corresponding tracks and to exercise this API in a near real-time scenario.

An additional goal is to investigate how moving object information can be made accessible through HTML in a web browser using Web Video Map Tracks (WebVMT) as part of the ongoing Web Platform Incubator Community Group (WICG) DataCue activity at W3C. This aims to facilitate access to geotagged media online and leverage web technologies with seamless integration of timed metadata, including spatial data.

In the Testbed-17 Moving Features thread, raw data were provided by drones and stationary cameras that pushed raw video frames to a deep learning computer. The deep learning computer detected moving features (school buses in the scenario employed) in each frame using a pre-trained model and then built tracklets one by one using a prediction and estimation algorithm over consecutive frames. The tracklets were then sent to the Ingestion Service. The Storage Service received and returned moving features as JSON objects. The Tracking Service employed object detection and tracking methods to extract the locations of moving objects from video frames. The Machine Analytics Client component generated information derived from the tracklets provided by the Tracking Service by developing a set of analytics. This included enriching the existing tracks by creating a more precise segmentation of the moving features detected by the Ingestion Service.

Some of the important recommendations for future work are summarized in the concluding chapter of this ER.

II.A.  General Purpose of the MF thread and this Engineering Report

Testbed 16 demonstrated that Motion Imagery derived Video Moving Target Indicators (VMTI) can be extracted from an MPEG-2 Transport Stream file and represented as OGC Moving Features or WebVMT. The work in the Testbed-17 activity formalized an architecture for integrating moving object detections, proposed standards for the required APIs and content encodings, expanded the sources of moving object detection that can be supported, and explored exploitation and enhancement capabilities which would leverage the resulting store of moving features.

The Testbed-17 Call for Participation stated that the architecture shall include the following components:

  1. Detection ingest: This component will ingest data from a moving object detection system, extract detections and partial tracks (tracklets), and export the detections and tracklets as OGC Moving Features.

  2. Tracker: This component ingests detections and tracklets as OGC Moving Features, then correlates them into longer tracks. Those tracks are then exported as OGC Moving Features.

  3. Data Store: Provides persistent storage of the Moving Feature tracks.

  4. Machine Analytics: Software which enriches the existing tracks and/or generates derived information from the tracks.

  5. Human Analytics: Software and tools to help users exploit the Motion Imagery tracks and corresponding detections or correlated tracks. For example, a common operational picture showing both static and dynamic features.

This list of components and their definitions served as a starting point. Participants in this task were free to modify them as conditions required. This work was demonstrated using a real-time situational awareness scenario. A key objective was to experiment with both subscription models and data streams to trigger prompt updates in the analytics components based on Moving Feature behavior.

II.B.  Deliverables and requirements of the MF set components in particular

The following figure illustrates the work items and deliverables of this Testbed-17 MF task.

Figure 1 — Moving Features task work items and deliverables. (Source: OGC Testbed-17 CFP)

Note that Figure 1 shows D138 and D142 as provided in the CFP document; however, these modules were removed before the start of this Testbed.

The MF Engineering Report (ER) captures the proposed architecture, identifies the necessary standards, describes all developed components, reports on the results of all TIE activities, provides an executive summary and finally a description of recommended future work items.

In summary, Testbed-17 MF addressed the following components and requirements:

  • D135 Ingestion Service — Software component that ingests data from a moving object detection system, extracts detections and partial tracks (tracklets), and exports the detections and tracklets as OGC Moving Features to the Storage Service via an interface conforming to OGC API — Moving Features. The component provider shall make the data set used for object detection available to the other participants in this task. If no source data is found for the final use cases, OGC and the sponsors will help find appropriate video material. The component can be implemented as a microservice or client.

  • D136 Ingestion Service — Component similar to D135.

  • D137 Tracking Service — Service component that correlates detections and tracklets into longer tracks. Those tracks are then exported as OGC Moving Features to the Storage Service via an interface conforming to the draft OGC API — Moving Features specification. In addition, the service shall expose the interface conforming to OGC API — Moving Features to allow other software components to discover and access tracks directly. The Tracking Service can work on its own detection system, but shall access detections and tracklets from the Storage Service. Ideally, the service supports subscriptions.

  • D139 Machine Analytics Client — Client component that provides OGC Moving Feature analytics and annotation. The client shall enrich existing tracks and/or generate derived information from the tracks. The software shall demonstrate the added value of multi-source track data. Enriched OGC Moving Features shall be stored in the Storage Service. In contrast to the Client D140, this client focuses on the analytics. It accesses external data sources, or uses internally available additional data sources (e.g., road and hiking path network data), to annotate detected moving objects in the scenarios.

  • D140 Human Analytics Client — Client software and tools to help users exploit the multi-source track data. For example, a common operational picture showing both static and dynamic features. In contrast to the Machine Analytics Client, the focus here is on the graphical representation of OGC Moving Features, detected and annotated from multiple source systems, in a common operational picture.

  • D141 Storage Service — Service component that stores OGC Moving Features. The service exposes the interface conforming to OGC API — Moving Features to discover, access, and upload OGC Moving Feature resources. The storage service shall have the potential to serve tracks in near real time.

The figure below shows, in diagram form, the architecture linking the different components of the Moving Features (MF) task.

Figure 2 — Testbed-17 Moving Features preliminary architecture workflow. (Source: OGC Testbed-17 MF participants)

III.  Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, OGC document, Moving Features


IV.  Preface

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

V.  Security considerations

No security considerations have been made for this document.

VI.  Submitting Organizations

The following organizations submitted this Document to the Open Geospatial Consortium (OGC):

VII.  Submitters

All questions regarding this document should be directed to the editor or the contributors:

Name Organization Role
Guy Schumann RSS Hydro Editor
Alex Robin Botts Innovative Research Contributor
Martin Desruisseaux GEOMATYS Contributor
Andrea Cavallini RHEA Group Contributor
Rob Smith Away Team Contributor
Dean Younge Compusult Contributor
Sepehr Honarparvar University of Calgary Contributor
Steve Liang University of Calgary Contributor
Sizhe Wang ASU Contributor
Chuck Heazel Heazel Technologies Contributor
Brad Miller Compusult Contributor

OGC Testbed-17: Moving Features ER

1.  Scope

This ER represents deliverable D020 of the OGC Testbed-17 Moving Features task. A ‘feature’ is defined as an abstraction of real world phenomena [ISO 19109:2015] whereas a “moving feature” is defined as a representation, using a local origin and local ordinate vectors, of a geometric object at a given reference time [adapted from ISO 19141:2008]. In the context of this ER, the geometric object represents a feature.

This ER aims to demonstrate the business value of moving features, which play an essential role in many application scenarios.

The value of this Engineering Report is to improve interoperability, advance location-based technologies and help realize innovations in the context of moving features.

Note that this ER (OGC 21-036) is a stand-alone document and there is therefore considerable overlap with the Testbed-17 D021 OGC API — Moving Features ER (OGC 21-028).

1.1.  Terms and definitions

Moving feature

A representation, using a local origin and local ordinate vectors, of a geometric object at a given reference time (ISO 19141:2008). In the context of this ER, the geometric object is a feature, which is an abstraction of real world phenomena (ISO 19109:2015).

Tracking

Monitoring and reporting the location of a moving object (adapted from ISO 19133:2005).

Tracklet

A fragment of the track followed by a moving object.

Trajectory

Path of a moving point described by a one parameter set of points (ISO 19141:2008).

Trajectory mining

The study of the trajectories of moving objects in order to find interesting characteristics, detect anomalies and discover spatial and spatiotemporal patterns among them.

1.2.  Abbreviated terms

API

Application Programming Interface

MF

Moving Feature(s)

MISB

Motion Imagery Standards Board

ML

Machine Learning

MPEG

Moving Picture Experts Group

MSE

Mean Squared Error

VMTI

Video Moving Target Indicator

WebVMT

Web Video Map Tracks

WICG

Web Platform Incubator Community Group

WMS

Web Map Service

W3C

World Wide Web Consortium

2.  Overview

This Engineering Report represents deliverable D020 of OGC Testbed-17, performed under the OGC Innovation Program.

Chapter 1 introduces the scope of the subject matter of this Testbed-17 OGC Engineering Report.

The Executive Summary in the front matter summarizes the Testbed-17 MF activity.

Chapter 2 (this chapter) provides a short overview description of each chapter.

Chapter 3 provides a short introduction to the Testbed-17 MF activity.

Chapter 4 provides an overview of the requirements and scenario.

Chapter 5 illustrates the flow of work items.

Chapter 6 and the following chapters contain the main technical details and work activity description of this ER. These chapters provide a high-level outline of the use cases, followed by an in-depth description of the work performed and the challenges encountered, raising issues and discussing possible solutions.

A dedicated chapter summarizes the MF TIE tracking, and the report then summarizes recommendations and suggests top-priority items for future work.

Annex A includes an informative revision history table of changes made to this document.

The Bibliography lists the references cited in this ER.

3.  Introduction

The following topics, identified during recent OGC Testbeds, were evaluated and described in the Testbed-17 initiative.

There are a number of ways that systems detect and report on moving objects. These systems exist in “stovepipes of excellence”. As a result, users of these systems do not have access to information generated through other means. The ability to combine multiple sources of moving object data would greatly improve the quality of the data and the analytics which could be applied.

The overall aim is to identify an architecture framework and corresponding standards which will allow multiple sources of moving object detections to be integrated into a common analytic environment.

In this context, Testbed 16 explored technologies to transform detections of moving objects reported using motion imagery standards (e.g. MISB Std. 0903) into the model and encoding defined in the OGC Moving Features Standard (OGC 18-075). That work suggests a notional workflow:

  1. Extract moving object detections from the motion imagery stream.

  2. Encode the detections as moving features.

  3. Correlate the detections of moving features into tracks of moving features.

  4. Perform analytics to enrich and exploit the tracks of moving features.

This work is documented in the Testbed-16 Full Motion Video to Moving Features Engineering Report (OGC 20-036).

The OGC Moving Features Standards Working Group (SWG) added a new work activity for defining an OGC API — Moving Features Standard. The participants watched this process closely to ensure that both activities were aligned properly. In any case, Testbed-17 participants worked closely with the SWG and coordinated all efforts.

4.  Requirements, Scenarios and Architecture

This chapter identifies the requirements and lays out the architecture framework as well as the scenario.

4.1.  Requirements

Testbed 16 demonstrated that Motion Imagery derived Video Moving Target Indicators (VMTI) can be extracted from an MPEG-2 motion imagery stream and represented as OGC Moving Features or Web Video Map Tracks (WebVMT).

The work performed in the TB-17 MF task formalized an architecture for integrating moving object detections, proposed standards for the required APIs and content encodings, expanded the sources of moving object detection that can be supported, and explored exploitation and enhancement capabilities which would leverage the resulting store of moving features.

The architecture includes the following components:

  1. Detection ingest: This component will ingest data from a moving object detection system, extract detections and partial tracks (tracklets), and export the detections and tracklets as OGC Moving Features.

  2. Tracker: This component ingests detections and tracklets as OGC Moving Features, then correlates them into longer tracks. Those tracks are then exported as OGC Moving Features.

  3. Data Store: Provides persistent storage of the Moving Feature tracks.

  4. Machine Analytics: Software which enriches the existing tracks and/or generates derived information from the tracks.

  5. Human Analytics: Software and tools to help users exploit the Motion Imagery tracks and corresponding detections or correlated tracks. For example, a common operational picture showing both static and dynamic features.

This work was demonstrated using a real-time situational awareness scenario.

4.2.  Scenario use case

This ER describes the detection and tracking of moving buses in front of a school. The video was acquired by a lightweight Unmanned Aerial Vehicle (UAV), or drone, courtesy of the University of Calgary.

A separate autonomous vehicle use case was analyzed with WebVMT to detect and track people and vehicles moving nearby. Video and lidar data were captured from a moving StreetDrone vehicle and provided courtesy of Ordnance Survey UK.

5.  Flow of work items

The diagram below describes the workflow from start to finish and includes all modules covered in Testbed-17.

Figure 3 — Flow of modules. (Source: Testbed-17 MF participants)

6.  Ingestion Service (University of Calgary)

6.1.  Introduction

The Ingestion Service receives raw data from sensors or edge computers and converts it into Observations (see the OGC SensorThings API specification for details). The service then posts these Observations to the Storage Service, where they are interpreted and stored as Features (see the Feature specification for details). There are three components to an Ingestion Service: the receiver, the converter, and the sender. The receiver reads in the raw data. The converter parses the data and turns it into SensorThings Observations. Finally, the sender component publishes the Observations to the Storage Service via the MQTT protocol.
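The three-component design can be illustrated with a minimal Python sketch. The function names and the simplified Observation below are illustrative assumptions only, not the actual Testbed implementation; the publisher is passed in as a callable so that any MQTT client (see the endpoint details in the next clause) can be used.

import json

def receive(raw_payload):
    # Receiver: read a raw tracklet posted by a sensor or edge computer.
    return raw_payload

def convert(tracklet):
    # Converter: parse the tracklet and build a (heavily simplified) SensorThings Observation.
    return {"phenomenonTime": tracklet["time"], "result": tracklet["track_id"]}

def send(observation, publish, topic="/Observations"):
    # Sender: publish the JSON-encoded Observation to the Storage Service,
    # e.g. via an MQTT client's publish(topic, payload) method.
    publish(topic, json.dumps(observation))

def ingest(raw_payload, publish):
    # Wire the three components together for a single tracklet.
    send(convert(receive(raw_payload)), publish)

# Stand-in publisher for illustration; a real deployment would publish over MQTT.
ingest({"time": "2021-05-13T00:09:18Z", "track_id": 23}, publish=print)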

6.2.  Ingestion process architecture

As the developed ingestion service ingests tracklets from detected objects in video frames, the following architecture was designed to handle the ingestion tasks.

Figure 4 — Ingestion process architecture

In Testbed-17, the raw data providers were drones and stationary cameras that pushed raw video frames to the deep learning computer. The deep learning computer detects moving features (buses) in each frame using a pre-trained model and then builds tracklets one by one using a prediction and estimation algorithm over consecutive frames. The result of this procedure is the object bounding box (bbox), class, track_id, color, and the detection time. These tracklets are then sent to the ingestion service using HTTP POST. The ingestion service was implemented on an AWS EC2 instance running Ubuntu Server x64. The service was built with the Flask web framework; Nginx was used as the web server and Gunicorn as the web server gateway handling the ingestion service. The service takes data in the format described in the Input section. The ingestion service includes a camera registration module that lets users register cameras and their metadata in the ingestion service. To register a camera, a route in the service was created with the following format:

http://52.26.17.1:5000/register_cam

To register a camera, the following payload should be posted:

{
    "id":"name of the camera",
    "cam_location":[longitude,latitude],
    "image_coords":[[x0, y0], [x1, y1], [x2, y2],...,[xn, yn]],
    "ground_coords":[[longitude0, latitude0],
                        [longitude1, latitude1],
                        [longitude2, latitude2],
                        [longitude3, latitude3],...,[longitude_n, latitude_n]]
}
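As a usage illustration, the registration payload could be posted with a few lines of Python. This is a minimal sketch using the requests library and the sample camera metadata shown later in the Functions section; it is not the actual Testbed client code.

import requests  # simple HTTP client, used here for illustration only

registration = {
    "id": "GoProTestbed17",
    "cam_location": [-114.16064143180847, 51.085716521902036],
    "image_coords": [[629, 881], [1201, 695], [855, 604], [1808, 572]],
    "ground_coords": [[-114.160505, 51.085698],
                      [-114.160515, 51.085604],
                      [-114.160166, 51.085476],
                      [-114.160769, 51.085106]],
}

# POST the camera metadata to the registration route of the ingestion service.
response = requests.post("http://52.26.17.1:5000/register_cam", json=registration)
print(response.status_code)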

The camera metadata is stored in a MongoDB (NoSQL) database and can be used for the transformation of image coordinates to geographic coordinates. The ingestion service also uses this data to let the storage service know what the source of the observations is. Based on the Testbed-17 Moving Features architecture, cameras are registered as Things in the SensorThings API (STA) model. The names of the Thing instances are used to let other services, such as machine analytics, access the raw or processed video streams.

After a camera has been registered, tracklets are posted to the relevant camera route in the ingestion service. To do so, the following route should be used:

http://52.26.17.1:5000/ingestion/Camera_name

The payload input format of this POST request is mentioned in the Input section.

NOTE  As the other ingestion service does not have direct access to the tracklet results, the developed ingestion service provides the transformed tracklets to the other ingestion service.

After the tracklets are received from the deep learning computer, the image coordinates are transformed into longitude and latitude based on the camera that recorded the video. For the transformation, image points and the corresponding ground control points were employed. The details are explained in the Transformation section.

Finally, the tracklets are enriched with the geographic coordinates and are published as MQTT payloads in the STA Observation format (which is discussed in the Output section). The storage service endpoint details for receiving the tracklets from the ingestion services are:

{
    "broker": "tb17.geomatys.com",
    "port": 30170,
    "topic": "/Observations",
    "Datastream": 1
}
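For illustration, an enriched observation could be published to this endpoint as sketched below. The snippet assumes the paho-mqtt Python package, and the truncated observation dictionary is only a placeholder for a full STA Observation such as the one shown in the Functions section.

import json
import paho.mqtt.publish as publish  # from the paho-mqtt package (assumed installed)

BROKER = "tb17.geomatys.com"   # Storage Service MQTT endpoint, as listed above
PORT = 30170
TOPIC = "/Observations"

# Placeholder for a full STA Observation payload (see the Functions section for a complete example).
observation = {"phenomenonTime": "2021-07-27T04:03:03Z", "result": 48}

publish.single(TOPIC, json.dumps(observation), hostname=BROKER, port=PORT)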

6.3.  Transformation

The final task was to transform the objects' locations from the video space into geographic space. To do so, a homography model, which is a 2D projective transformation, was used. A homography has 8 main variables (degrees of freedom) in a 3×3 transformation matrix, so at least 4 Ground Control Points (GCPs) are required to resolve the transformation. This approach is widely used for transforming objects from one planar space to another planar space. [3] In the following figure, one of the planes can be considered as the image frame and the other one is the ground plane in geographic space.

Figure 5 — Homography transformation (source: http://man.hubwiz.com)
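For example, the homography can be estimated and applied with OpenCV as sketched below. The snippet reuses the image/ground control point pairs from the camera registration example in the Functions section; the function name and the choice of the bounding box bottom-center as the ground contact point are illustrative assumptions, not the actual Testbed code.

import numpy as np
import cv2  # OpenCV provides homography estimation and point projection

# At least 4 image/ground control point pairs are required.
image_coords = np.array([[629, 881], [1201, 695], [855, 604], [1808, 572]], dtype=np.float32)
ground_coords = np.array([[-114.160505, 51.085698],
                          [-114.160515, 51.085604],
                          [-114.160166, 51.085476],
                          [-114.160769, 51.085106]], dtype=np.float32)

# Estimate the 3x3 homography matrix H mapping pixel coordinates to (longitude, latitude).
H, _ = cv2.findHomography(image_coords, ground_coords)

def pixel_to_lonlat(x, y):
    # Project a pixel location to geographic coordinates using the homography.
    pt = np.array([[[x, y]]], dtype=np.float32)      # shape (1, 1, 2), as expected by OpenCV
    lon, lat = cv2.perspectiveTransform(pt, H)[0, 0]
    return float(lon), float(lat)

# Example: for bbox = [944, 765, 123, 59], use the bottom-center of the box as one possible ground point.
print(pixel_to_lonlat(944 + 123 / 2, 765 + 59))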

6.4.  Input

Sensors should push tracklet data into the ingestion service. The tracklet payload must have bbox, time, and track_id as attributes, while the ingestion service can also process the class and color of objects as optional attributes. The input format of the ingestion service is:

{
    "class":"bus",
    "track_id":23,
    "bbox":[23,12,34,51],
    "time":"2021-05-13T00:09:18Z",
    "color":[255,255,0]
}

definitions:

class: The moving object type which can be car, bus, bicycle, person, …​

track_id: The unique id of the tracked moving object

bbox: Bounding box of objects in the camera coordinate system. [upper-left x, upper-left y, width, height]

time: The phenomenon time, i.e. the time at which the object was detected in the video frames

color: The mean RGB color code of the detected object

6.5.  Functions

The ingestion service uses a NoSQL database to register cameras. Cameras are stored in the camera table as JSON objects. These objects include the coordinate transformation parameters as well as the camera metadata. This information is used in the conversion part of the ingestion service when image coordinates are received. The output of this conversion is the geographic coordinates of the moving object. The following shows the payload for registering cameras in the ingestion service.

{
    "id":"GoProTestbed17",
    "cam_location":[ -114.16064143180847, 51.085716521902036],
    "image_coords":[[629, 881], [1201, 695], [855, 604],[1808, 572]],
    "ground_coords":[[-114.160505, 51.085698],
                    [-114.160515, 51.085604],
                    [-114.160166, 51.085476],
                    [-114.160769, 51.085106]]
}

After the point transformation, the STA function converts the received data to the STA format and sends it to the storage service endpoint as an MQTT payload. The following shows an example of the output of the ingestion service to the storage service.

{
    "phenomenonTime": "2021-07-27T04:03:03Z",
    "resultTime": "2021-07-27T04:03:03Z",
    "result": 48,
    "FeaturesOfInterest": {
        "name": "48,-114.16081989038098,51.085107555986895",
        "description": "BusMovingObject",
        "encodingType": "application/vnd.geo+json",
        "feature": {
            "type": "Feature",
            "properties": {
                "image_bbox": [
                    944,
                    765,
                    123,
                    59
                ],
                "image_color": [
                    42,
                    38,
                    41
                ],
                "class": "bus"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -114.16081989038098,
                    51.085107555986895
                ]
            }
        }
    },
    "Datastream": {
        "@iot.id": 60987
    }
}
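To make the mapping between the Input and Output formats concrete, the following hypothetical Python function sketches how a received tracklet and its transformed coordinates could be assembled into an STA Observation like the one above. The function name and the handling of the Datastream id are assumptions for illustration, not the actual implementation.

def tracklet_to_observation(tracklet, lon, lat, datastream_id):
    # Build an STA Observation from an input tracklet (see the Input section) and the
    # geographic coordinates produced by the homography transformation.
    return {
        "phenomenonTime": tracklet["time"],
        "resultTime": tracklet["time"],
        "result": tracklet["track_id"],
        "FeaturesOfInterest": {
            "name": "{},{},{}".format(tracklet["track_id"], lon, lat),
            "description": "BusMovingObject",  # as in the example above; could be derived from the class
            "encodingType": "application/vnd.geo+json",
            "feature": {
                "type": "Feature",
                "properties": {
                    "image_bbox": tracklet.get("bbox"),
                    "image_color": tracklet.get("color"),
                    "class": tracklet["class"],
                },
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            },
        },
        "Datastream": {"@iot.id": datastream_id},
    }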

The other module is the Compusult component, which feeds the D135 Ingestion Service.

6.6.  Data

To show the capabilities of the ingestion service, two videos were recorded from two different points of views. One of them was recorded from a drone’s point of view and the other one was recorded from a fixed GoPro camera. The first camera points to the bus station and the second camera points to the street which ends at the bus station. Using the methods which has been described in the next chapter, objects are detected and tracked. Then the detected objects are sent to the ingestion service, as soon as they are observed in a frame, using the HTTP POST method. Figure 1 illustrates a frame of detected objects by GoPro camera and Figure 2 shows a snapshot of a bus station recorded by the drone. These two videos are almost synced.