I. Abstract
The OGC Testbed-17 Moving Features (MF) task addressed the exchange of moving object detections, shared processing of detections for correlation and analysis, and visualization of moving objects within common operational pictures. This Engineering Report (ER) explores and describes an architecture for collaborative distributed object detection and analysis of multi-source motion imagery, supported by OGC MF standards. The ER presents the proposed architecture, identifies the necessary standards, describes all developed components, reports on the results of all Technology Integration Experiment (TIE) activities, and provides a description of recommended future work items.
II. Executive Summary
Moving Features play an essential role in many application scenarios. The growing availability of digital motion imagery and advancements in machine learning technology will further accelerate widespread use and deployment of moving feature detection and analysis systems. The OGC Testbed-17 Moving Features task considers these developments by addressing exchange of moving object detections, shared processing of detections for correlation and analysis, and visualization of moving objects within common operational pictures. This OGC Moving Features (MF) Engineering Report (ER) explores and develops an architecture for collaborative distributed object detection and analysis of multi-source motion imagery. The goal is to define a powerful Application Programming Interface (API) for discovery, access, and exchange of moving features and their corresponding tracks and to exercise this API in a near real-time scenario.
An additional goal is to investigate how moving object information can be made accessible through HTML in a web browser using Web Video Map Tracks (WebVMT) as part of the ongoing Web Platform Incubator Community Group (WICG) DataCue activity at W3C. This aims to facilitate access to geotagged media online and leverage web technologies with seamless integration of timed metadata, including spatial data.
In the Testbed-17 Moving Features thread, raw data were provided by drones and stationary cameras that pushed raw video frames to a deep learning computer. The deep learning computer detected moving features (school buses in the scenario employed) in each frame using a pre-trained model and then built tracklets one by one using a prediction and estimation algorithm over consecutive frames. The tracklets were then sent to the Ingestion Service. The Storage Service received and returned moving features as JSON objects. The Tracking Service employed object detection and tracking methods to extract the locations of moving objects from video frames. The scope of the Machine Analytics Client component was to develop a set of analytics that generate information derived from the tracklets provided by the Tracking Service. This included enriching the existing tracks by creating a more precise segmentation of the moving features detected by the Ingestion Service.
Some of the important recommendations for future work include:
- Ingestion Service: update the Ingestion Service to temporarily store observations locally. This will prevent data loss if the Storage Service goes offline.
- Machine Analytics Client: a seasonality analysis should take as input data conditioned on the season in which the data were retrieved, in order to compare patterns across seasons and identify the effect that the acquisition time has on moving-object behavior and distribution.
- Autonomous vehicle use case: combine multi-sensor data to improve detection accuracy and cognitive guidance.
II.A. General Purpose of the MF thread and this Engineering Report
Testbed 16 demonstrated that Motion Imagery derived Video Moving Target Indicators (VMTI) can be extracted from an MPEG-2 Transport Stream file and represented as OGC Moving Features or WebVMT. The work in the Testbed-17 activity formalized an architecture for integrating moving object detections, proposed standards for the required APIs and content encodings, expanded the sources of moving object detection that can be supported, and explored exploitation and enhancement capabilities which would leverage the resulting store of moving features.
The Testbed-17 Call for Participation stated that the architecture shall include the following components:
- Detection ingest: This component will ingest data from a moving object detection system, extract detections and partial tracks (tracklets), and export the detections and tracklets as OGC Moving Features.
- Tracker: This component ingests detections and tracklets as OGC Moving Features, then correlates them into longer tracks. Those tracks are then exported as OGC Moving Features.
- Data Store: Provides persistent storage of the Moving Feature tracks.
- Machine Analytics: Software which enriches the existing tracks and/or generates derived information from the tracks.
- Human Analytics: Software and tools to help users exploit the Motion Imagery tracks and corresponding detections or correlated tracks. For example, a common operational picture showing both static and dynamic features.
This list of components and their definitions served as a starting point; participants in this task were free to modify them as conditions required. This work was demonstrated using a real-time situational awareness scenario. A key objective was to experiment with both subscription models and data streams to trigger prompt updates in the analytics components based on Moving Feature behavior.
II.B. Deliverables and requirements of the MF set components in particular
The following figure illustrates the work items and deliverables of this Testbed-17 MF task.
Figure 1 — Moving Features task work items and deliverables. (Source: OGC Testbed-17 CFP)
It is important to note that Figure 1 shows D138 and D142 as provided in the CFP document; however, these modules were removed before the start of this Testbed.
The MF Engineering Report (ER) captures the proposed architecture, identifies the necessary standards, describes all developed components, reports on the results of all TIE activities, provides an executive summary, and finally gives a description of recommended future work items.
In summary, Testbed-17 MF addressed the following components and requirements:
- D135 Ingestion Service — Software component that ingests data from a moving object detection system, extracts detections and partial tracks (tracklets), and exports the detections and tracklets as OGC Moving Features to the Storage Service via an interface conforming to OGC API — Moving Features. The component provider shall make the data set that has been used for object detection available to other participants in this task. If no source data is found for the final use cases, OGC and sponsors will help find appropriate video material. The component can be implemented as a microservice or client.
- D136 Ingestion Service — Component similar to D135.
- D137 Tracking Service — Service component that correlates detections and tracklets into longer tracks. Those tracks are then exported as OGC Moving Features to the Storage Service via an interface conforming to the draft OGC API — Moving Features specification. In addition, the service shall expose the interface conforming to OGC API — Moving Features to allow other software components to discover and access tracks directly. The Tracking Service can work on its own detection system, but shall access detections and tracklets from the Storage Service. Ideally, the service supports subscriptions.
- D139 Machine Analytics Client — Client component that provides OGC Moving Feature analytics and annotation. The client shall enrich existing tracks and/or generate derived information from the tracks. The software shall demonstrate the added value of multi-source track data. Enriched OGC Moving Features shall be stored in the Storage Service. In contrast to the Client D140, this client focuses on the analytics. It accesses external data sources, or uses internally available additional data sources (e.g., road and hiking path network data), to annotate detected moving objects in the scenarios.
- D140 Human Analytics Client — Client software and tools to help users exploit the multi-source track data. For example, a common operational picture showing both static and dynamic features. In contrast to the Machine Analytics Client, the focus here is on the graphical representation of OGC Moving Features, detected and annotated from multiple source systems, in a common operational picture.
- D141 Storage Service — Service component that stores OGC Moving Features. The service exposes the interface conforming to OGC API — Moving Features to discover, access, and upload OGC Moving Feature resources. The Storage Service shall have the potential to serve tracks in near real time.
The figure below shows in diagram form the architecture linking the different components of the Moving Features (MF) task.
Figure 2 — Testbed-17 Moving Features preliminary architecture workflow. (Source: OGC Testbed-17 MF participants)
III. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, Moving Features
IV. Preface
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
V. Security considerations
No security considerations have been made for this document.
VI. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- RSS-Hydro Sarl
VII. Submitters
All questions regarding this document should be directed to the editor or the contributors:
Name | Organization | Role |
---|---|---|
Guy Schumann | RSS Hydro | Editor |
Alex Robin | Botts Innovative Research | Contributor |
Martin Desruisseaux | GEOMATYS | Contributor |
Andrea Cavallini | RHEA Group | Contributor |
Rob Smith | Away Team | Contributor |
Dean Younge | Compusult | Contributor |
Sepehr Honarparvar | University of Calgary | Contributor |
Steve Liang | University of Calgary | Contributor |
Sizhe Wang | ASU | Contributor |
Chuck Heazel | Heazel Technologies | Contributor |
Brad Miller | Compusult | Contributor |
OGC Testbed-17: Moving Features ER
1. Scope
This ER represents deliverable D020 of the OGC Testbed-17 Moving Features task. A ‘feature’ is defined as an abstraction of real world phenomena [ISO 19109:2015] whereas a “moving feature” is defined as a representation, using a local origin and local ordinate vectors, of a geometric object at a given reference time [adapted from ISO 19141:2008]. In the context of this ER, the geometric object represents a feature.
This ER aims to demonstrate the business value of moving features that play an essential role in many application scenarios.
The value of this Engineering Report is to improve interoperability, advance location-based technologies and help realize innovations in the context of moving features.
Note that this ER (OGC 21-036) is a stand-alone document and there is thus some considerable overlap with the T17 D021 OGC API Moving Features ER (OGC 21-028).
1.1. Terms and definitions
Moving feature
- A representation, using a local origin and local ordinate vectors, of a geometric object at a given reference time (ISO 19141:2008). In the context of this ER, the geometric object is a feature, which is an abstraction of real world phenomena (ISO 19109:2015).
Tracking
- Monitoring and reporting the location of a moving object (adapted from ISO 19133:2005).
Tracklet
- A fragment of the track followed by a moving object.
Trajectory
- Path of a moving point described by a one parameter set of points (ISO 19141:2008).
Trajectory mining
- The study of the trajectories of moving objects in order to find interesting characteristics, detect anomalies and discover spatial and spatiotemporal patterns among them.
1.2. Abbreviated terms
API
- Application Programming Interface
MF
- Moving Feature(s)
MISB
- Motion Imagery Standards Board
ML
- Machine Learning
MPEG
- Moving Picture Experts Group
MSE
- Mean Squared Error
VMTI
- Video Moving Target Indicator
WebVMT
- Web Video Map Tracks
WICG
- Web Platform Incubator Community Group
WMS
- Web Map Service
W3C
- World Wide Web Consortium
2. Overview
This engineering report represents deliverable D020 of OGC Testbed-17, performed under the OGC Innovation Program.
Chapter 1 introduces the scope of the subject matter of this Testbed 17 OGC Engineering Report.
Chapter 2 provides an executive summary of the Testbed-17 MF activity.
Chapter 3 provides a short overview description of each chapter (this chapter).
Chapter 4 provides a short introduction to the Testbed-17 MF activity.
Chapter 5 provides an overview of the requirements and scenario.
Chapter 6 illustrates the flow of work items.
Chapters 7 to 13 contain the main technical details and work activity descriptions of this ER. These chapters provide a high-level outline of the use cases, followed by an in-depth description of the work performed and the challenges encountered, raising issues and discussing possible solutions.
Chapter 14 summarizes the MF TIE tracking.
Chapter 15 summarizes recommendations and suggests top-priority items for future work.
Annex A includes an informative revision history table of changes made to this document.
Bibliography
3. Introduction
The following topics, identified during recent OGC Testbeds, were evaluated and described in the Testbed-17 initiative:
There are a number of ways that systems detect and report on moving objects. These systems exist in “stovepipes of excellence”. As a result, users of these systems do not have access to information generated through other means. The ability to combine multiple sources of moving object data would greatly improve the quality of the data and the analytics which could be applied.
The overall aim is to identify an architecture framework and corresponding standards which will allow multiple sources of moving object detections to be integrated into a common analytic environment.
In this context, Testbed 16 explored technologies to transform detections of moving objects reported using motion imagery standards (e.g. MISB Std. 0903) into the model and encoding defined in the OGC Moving Features Standard (OGC 18-075). That work suggests a notional workflow:
- Extract moving object detections from the motion imagery stream
- Encode the detections as moving features
- Correlate the detection of moving features into track moving features
- Perform analytics to enrich and exploit the tracks of moving features
This work is documented in the Testbed-16 Full Motion Video to Moving Features Engineering Report (OGC 20-036).
The OGC Moving Features Standards Working Group (SWG) has added a new work activity for defining an OGC API — Moving Features. The participants watched this process closely to ensure both activities were aligned properly. In any case, Testbed-17 participants worked closely with the SWG and coordinated all efforts.
4. Requirements, Scenarios and Architecture
This chapter identifies the requirements and lays out the architecture framework as well as the scenario.
4.1. Requirements
Testbed 16 demonstrated that Motion Imagery derived Video Moving Target Indicators (VMTI) can be extracted from an MPEG-2 motion imagery stream and represented as OGC Moving Features or Web Video Map Tracks (WebVMT).
The work performed in the TB-17 MF task formalized an architecture for integrating moving object detections, proposed standards for the required APIs and content encodings, expanded the sources of moving object detection that can be supported, and explored exploitation and enhancement capabilities which would leverage the resulting store of moving features.
The architecture includes the following components:
- Detection ingest: This component will ingest data from a moving object detection system, extract detections and partial tracks (tracklets), and export the detections and tracklets as OGC Moving Features.
- Tracker: This component ingests detections and tracklets as OGC Moving Features, then correlates them into longer tracks. Those tracks are then exported as OGC Moving Features.
- Data Store: Provides persistent storage of the Moving Feature tracks.
- Machine Analytics: Software which enriches the existing tracks and/or generates derived information from the tracks.
- Human Analytics: Software and tools to help users exploit the Motion Imagery tracks and corresponding detections or correlated tracks. For example, a common operational picture showing both static and dynamic features.
This work was demonstrated using a real-time situational awareness scenario.
4.2. Scenario use case
This ER describes the detection and tracking of moving buses in front of a school. The video was acquired by a lightweight Unmanned Aerial Vehicle (UAV), or drone, courtesy of the University of Calgary.
A separate autonomous vehicle use case was analyzed to detect and track people and vehicles moving nearby with WebVMT. Video and lidar data were captured from a moving StreetDrone vehicle and provided courtesy of Ordnance Survey UK.
5. Flow of work items
The diagram below describes the flow from start to finish and includes all modules covered in Testbed-17.
Figure 3 — Flow of modules. (Source: Testbed-17 MF participants)
6. Ingestion Service (University of Calgary)
6.1. Introduction
The Ingestion service receives raw data from sensors or edge computers and converts it into Observations (see the OGC SensorThings API specification for details). The service then posts these Observations to the Storage service, where they are interpreted and stored as Features (see the Features specification for details). There are three components to an Ingestion service: the receiver, the convertor, and the sender. The receiver reads in the raw data. The convertor parses the data and turns it into SensorThings Observations. Finally, the sender component publishes the Observations to the Storage service via the MQTT protocol. A minimal skeleton illustrating these three components is sketched below.
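The following minimal sketch in Python illustrates the receiver/convertor/sender structure described above. It assumes a Flask-based receiver (Flask is used by the implemented service, as described in the next section); the route, function names, and field mappings are illustrative assumptions and not the actual Testbed-17 code.

from flask import Flask, request, jsonify

app = Flask(__name__)

def convert_to_observation(tracklet):
    """Convertor: parse a raw tracklet and map it to a SensorThings Observation (stub)."""
    # Coordinate transformation and the full STA mapping would happen here
    # (see the Transformation and Functions sections).
    return {"phenomenonTime": tracklet["time"], "result": tracklet["track_id"]}

def publish_observation(observation):
    """Sender: publish the Observation to the Storage service over MQTT (stub)."""
    # See the MQTT publishing sketch in the next section.
    pass

@app.route("/ingestion/<camera_name>", methods=["POST"])
def receive(camera_name):
    """Receiver: accept raw tracklet data pushed by a detection system."""
    tracklet = request.get_json()
    observation = convert_to_observation(tracklet)
    publish_observation(observation)
    return jsonify({"status": "accepted"}), 202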
6.2. Ingestion process architecture
As the developed ingestion service ingests tracklets from detected objects in video frames, the following architecture was designed to handle the ingestion tasks.
Figure 4 — Ingestion process architecture
In Testbed-17, raw data providers were drones and stationary cameras that pushed raw video frames to the deep learning computer. The deep learning computer detects moving features (buses) in each frame using a pre-trained model and then builds tracklets one by one using a prediction and estimation algorithm for consecutive frames. The result of this procedure is the object bounding box (bbox), class, track_id, color, and the detection time. These tracklets are then sent to the ingestion service using HTTP POST. The ingestion service was implemented on an AWS EC2 instance running Ubuntu Server x64. The service was built with the Flask web framework, with Nginx used as the web server and Gunicorn as the web server gateway interface to handle the ingestion service. The EC2 instance takes the data in the format described in the Input section.

The ingestion service includes a camera registration module which lets users register cameras with their metadata in the ingestion service. To register a camera, a route in the service based on the following format was created:
http://52.26.17.1:5000/register_cam
To register a camera, the following payload should be posted:
{
"id":"name of the camera",
"cam_location":[longitude,latitude],
"image_coords":[[x0, y0], [x1, y1], [x2, y2],...,[xn, yn]],
"ground_coords":[[longitude0, latitude0],
[longitude1, latitude1],
[longitude2, latitude2],
[longitude3, latitude3],...,[longitude_n, latitude_n]]
}
The camera metadata is stored in a MongoDB (NoSQL) database and can be used for the transformation of image coordinates to geographic coordinates. The ingestion service also uses this data to let the storage service know what the source of the observations is. Based on the Testbed-17 Moving Features architecture, cameras are registered as Things in the SensorThings API (STA) model. The names of Thing instances are used to let other services, such as machine analytics, access the raw or processed video streams.
After a camera is registered in the ingestion service, tracklets are posted to the relevant camera route of the ingestion service. To do so, the following route should be used:
http://52.26.17.1:5000/ingestion/Camera_name
The payload input format of this POST request is mentioned in the Input section.
NOTE As the other ingestion service does not have direct access to the tracklet results, the developed ingestion service provides the transformed tracklets to the other ingestion service.
After receiving the tracklets from the deep learning computer, image coordinates are transformed into longitude and latitude based on the camera that recorded the video. For the transformation, image points and the corresponding ground control points were employed. The details are explained in the Transformation section.
Finally, tracklets are enriched with the geographic coordinates and published in the STA Observation format (which is discussed in the Output section) as MQTT payloads. The storage service endpoint details for receiving the tracklets from the ingestion services are as follows (a minimal publishing sketch is given after the listing):
{
"broker": "tb17.geomatys.com",
"port": 30170,
"topic": "/Observations",
"Datastream": 1
}
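As a hedged illustration, and assuming the paho-mqtt Python client library, the sender step could publish an STA Observation to the endpoint above roughly as follows. This is a sketch, not the actual Testbed-17 implementation.

import json
import paho.mqtt.client as mqtt

BROKER = "tb17.geomatys.com"
PORT = 30170
TOPIC = "/Observations"

def publish_observation(observation):
    """Publish one STA Observation (a dict) to the Storage service MQTT endpoint."""
    client = mqtt.Client()
    client.connect(BROKER, PORT)
    # Serialize the Observation as JSON and publish it to the agreed topic.
    client.publish(TOPIC, json.dumps(observation), qos=1)
    client.disconnect()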
6.3. Transformation
The final task was to transform objects' locations from the video space into geographic space. To do so, a homography model, which is a 2D projective transformation, was used. A homography has eight main variables (degrees of freedom) in a 3×3 transformation matrix, so at least four Ground Control Points (GCPs) are required to resolve the transformation. This approach is widely used for transforming objects from one planar space to another planar space [3]. In the following figure, one of the planes can be considered as the image frame and the other one is the ground plane in geographic space. A code sketch of this transformation is given after the figure.
Figure 5 — Homography transformation (source: http://man.hubwiz.com)
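A minimal sketch of this homography-based transformation, assuming the OpenCV and NumPy Python libraries and reusing the example control points from the camera registration payload in the Functions section, could look as follows. The choice of the bbox bottom-centre as the reference point is an assumption made for illustration only.

import numpy as np
import cv2

# Image-space control points (pixels) and the corresponding ground points (lon, lat).
image_coords = np.array([[629, 881], [1201, 695], [855, 604], [1808, 572]], dtype=np.float32)
ground_coords = np.array([[-114.160505, 51.085698],
                          [-114.160515, 51.085604],
                          [-114.160166, 51.085476],
                          [-114.160769, 51.085106]], dtype=np.float32)

# Estimate the 3x3 homography matrix (8 degrees of freedom) from at least 4 GCP pairs.
H, _ = cv2.findHomography(image_coords, ground_coords)

def bbox_to_lonlat(bbox):
    """Map the bottom-centre of an image-space bbox [x, y, width, height] to lon/lat."""
    x, y, w, h = bbox
    point = np.array([[[x + w / 2.0, y + h]]], dtype=np.float32)
    lon, lat = cv2.perspectiveTransform(point, H)[0, 0]
    return float(lon), float(lat)

# Example: the bbox from the STA Observation shown in the Functions section.
print(bbox_to_lonlat([944, 765, 123, 59]))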
6.4. Input
Sensors should push tracklet data to the ingestion service. The tracklet payload must have bbox, time, and track_id as attributes; the ingestion service can also process additional object attributes, such as class and color, as optional attributes. The input format of the ingestion service is shown below, followed by the attribute definitions and an example POST request:
{
"class":"bus",
"track_id":23,
"bbox":[23,12,34,51],
"time":"2021-05-13T00:09:18Z",
"color":[255,255,0]
}
Definitions:
class: The moving object type, which can be car, bus, bicycle, person, etc.
track_id: The unique id of the tracked moving object.
bbox: Bounding box of the object in the camera coordinate system: [upper-left x, upper-left y, width, height].
time: The phenomenon time, i.e., the time at which the object was detected in the video frames.
color: The mean RGB color code of the detected object.
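For illustration, and assuming the Python requests library, a detection system could push one tracklet to the camera route shown earlier as follows. The camera name reuses the example registered in the Functions section; this is a sketch rather than the actual Testbed-17 client code.

import requests

tracklet = {
    "class": "bus",
    "track_id": 23,
    "bbox": [23, 12, 34, 51],
    "time": "2021-05-13T00:09:18Z",
    "color": [255, 255, 0],
}

# POST the tracklet to the route of the previously registered camera.
response = requests.post("http://52.26.17.1:5000/ingestion/GoProTestbed17", json=tracklet)
response.raise_for_status()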
6.5. Functions
The ingestion service uses a NoSQL database to register cameras. These are stored in the camera table as JSON objects, which include coordinate transformation parameters as well as camera metadata. This information is used in the conversion part of the ingestion service when image coordinates are received. The output of this conversion is the geographic coordinates of the moving object. The following shows the payload for registering cameras in the ingestion service.
{
"id":"GoProTestbed17",
"cam_location":[ -114.16064143180847, 51.085716521902036],
"image_coords":[[629, 881], [1201, 695], [855, 604],[1808, 572]],
"ground_coords":[[-114.160505, 51.085698],
[-114.160515, 51.085604],
[-114.160166, 51.085476],
[-114.160769, 51.085106]]
}
After point transformation, the STA function converts the received data into the STA format and sends it to the storage service endpoint as an MQTT payload. The following shows an example of the output of the ingestion service to the storage service; a sketch of this conversion step follows the example.
{
"phenomenonTime": "2021-07-27T04:03:03Z",
"resultTime": "2021-07-27T04:03:03Z",
"result": 48,
"FeaturesOfInterest": {
"name": "48,-114.16081989038098,51.085107555986895",
"description": "BusMovingObject",
"encodingType": "application/vnd.geo+json",
"feature": {
"type": "Feature",
"properties": {
"image_bbox": [
944,
765,
123,
59
],
"image_color": [
42,
38,
41
],
"class": "bus"
},
"geometry": {
"type": "Point",
"coordinates": [
-114.16081989038098,
51.085107555986895
]
}
}
},
"Datastream": {
"@iot.id": 60987
}
}
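As a hedged sketch of the conversion step, the STA Observation above could be assembled from a received tracklet and its transformed coordinates roughly as follows; the function name and the datastream id parameter are illustrative assumptions, not the actual Testbed-17 code.

def tracklet_to_sta_observation(tracklet, lon, lat, datastream_id):
    """Build an STA Observation payload from a tracklet and its geographic coordinates."""
    return {
        "phenomenonTime": tracklet["time"],
        "resultTime": tracklet["time"],
        "result": tracklet["track_id"],
        "FeaturesOfInterest": {
            "name": "{},{},{}".format(tracklet["track_id"], lon, lat),
            "description": "BusMovingObject",
            "encodingType": "application/vnd.geo+json",
            "feature": {
                "type": "Feature",
                "properties": {
                    "image_bbox": tracklet["bbox"],
                    "image_color": tracklet.get("color"),
                    "class": tracklet["class"],
                },
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            },
        },
        "Datastream": {"@iot.id": datastream_id},
    }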
The other module is the Compusult component, which feeds the D135 Ingestion Service.
6.6. Data
To show the capabilities of the ingestion service, two videos were recorded from two different points of views. One of them was recorded from a drone’s point of view and the other one was recorded from a fixed GoPro camera. The first camera points to the bus station and the second camera points to the street which ends at the bus station. Using the methods which has been described in the next chapter, objects are detected and tracked. Then the detected objects are sent to the ingestion service, as soon as they are observed in a frame, using the HTTP POST method. Figure 1 illustrates a frame of detected objects by GoPro camera and Figure 2 shows a snapshot of a bus station recorded by the drone. These two videos are almost synced.