Open Geospatial Consortium

Submission Date: 2023-09-28

Approval Date: 2023-09-28

Publication Date: 2024-01-29

External identifier of this OGC® document: http://www.opengis.net/doc/dp/mobility-data-science

Internal reference number of this OGC® document: 23-056

Category: OGC® Discussion Paper

Editors: Song WU and Mahmoud SAKR

Mobility Data Science Discussion Paper

Copyright notice

Copyright © 2024 Open Geospatial Consortium

To obtain additional rights of use, visit http://www.opengeospatial.org/legal/

License Agreement

Permission for use/distribution of this document and any associated materials is subject to the terms of this License Agreement: https://www.ogc.org/license

Warning

This document is not an OGC Standard. This document is an OGC Discussion Paper and is therefore not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, an OGC Discussion Paper should not be referenced as required or mandatory technology in procurements.

Document type: OGC® Discussion Paper

Document subtype:

Document stage: Approved

Document language: English

1. Mobility Data Science Summit

1.1. Summit Organizers

Table 1. List of summit organizers
Name                 Affiliation
Mahmoud Sakr         Université Libre de Bruxelles
Nobuhiro Ishimaru    Hitachi
Kyoung-Sook Kim      National Institute of Advanced Industrial Science and Technology (AIST)
Scott Simmons        OGC

1.2. Contributors

Table 2. List of document contributors
Name                  Affiliation
Martin Desruisseaux   Geomatys
Cheng Fu              University of Zurich
Anita Graser          Austrian Institute of Technology
Charles Heazel        WiSC Enterprises
Pin Kung              Sky eyes GPS technologies
Johannes Lauer        HERE
Steve Liang           University of Calgary
Chris Little          UK Met Office
Mohamed Mokbel        University of Minnesota
George Percivall      GeoRoundtable
Alex Ramage           Scottish Government / Transport Scotland
Rob Smith             Away Team Software
Stan Tillman          Hexagon AB
Esteban Zimanyi       Université Libre de Bruxelles

2. Overview

Almost every activity in our modern life leaves a digital trace, typically including location and time. Whether captured by a sensor, entered manually, or extracted from a social media post, spatiotemporal data is growing in volume, variety, and velocity at an unprecedented rate. The ability to manage and analyze this data is important for many application domains, including smart cities, health, transportation, agriculture, sports, and biodiversity. It is critical not only to manage and analyze the data effectively but also to uphold privacy and ethical considerations. Since GPS was opened to civilian use in the 1980s, followed by technological advances in other location tracking systems (Wi-Fi, RFID, Bluetooth, etc.), it has become increasingly easy to track moving objects. The Mobility Data Science Summit was an opportunity to discuss the challenges of managing this data and making sense of it, with a focus on the tooling and standardization requirements.

Figure 1. Mobility Data Science Overview

Data science is commonly understood as the pipeline of methods and tools that runs from data acquisition to the delivery of useful insights, passing through data cleaning, integration, management, and analysis. Many tools exist to help data scientists at every step of this pipeline. Yet mobility data has its own characteristics that common data science tools cannot handle. Mobility data is typically available as sequences of time-stamped location points generated by location tracking devices. The data is therefore both multidimensional and a time series, a structure that requires special data science tools and methods.

OGC has proactively envisioned the need for specialized data models and exchange formats, and formed working groups including the Moving Features SWG and the Temporal DWG. Naturally, temporal concepts have also found their way into the work of other working groups, such as GeoPose. This summit aimed to synchronize across working groups and to align the concepts.

3. What is special about ‘Mobility’ when it comes to Data Science?

It is not surprising that mobility data finds important applications in a broad range of domains such as maritime, public transport, and logistics. In some sense, it can have a fundamental impact on many aspects of real life.

Unlike other types of data, such as spatial data and time series data, mobility data has several challenging characteristics.

  1. Dynamic: mobility data records the evolving properties of moving objects over time. Once such data has been collected and stored, it is very difficult or impossible to update or correct, so a good practice is to only append new information to the original. However, most existing data infrastructures and formats were designed for static attributes, which makes them ill-suited to dynamic data. For example, representing dynamic metadata in a map may require clever use of attributes in OpenStreetMap.

  2. Diverse: because mobility data can be collected using various devices (e.g., GPS, Bluetooth, RFID) and sampling strategies, the availability of mobility data and the scale and frequency of collection vary considerably across datasets. Therefore, analysis methods and tools may not be transferable across multiple types of data. Precision can also vary, so users need to consider scale and precision with respect to the science being explored, e.g., movement of people, wildlife tracking, or agriculture.

  3. Heterogeneous: besides the common form of mobility data as a sequence of time-stamped points, other forms exist, including but not limited to discrete check-in data (e.g., geo-tagged tweets, ticketing/accounting data, and taxi pick-up/drop-off data), origin-destination (OD) flow data, and schedules and real-time operations of public transport in the form of GTFS and GTFS-realtime. For example, some public transport companies in Brussels are interested in developing a common ticketing scheme that can support multimodal transportation. However, such data usually do not have accurate coordinates or locations attached, which makes their analysis and modeling more complex. Interpolation on such data may not make sense at all, and the lack of continuous tracking of moving objects makes it hard or even impossible to do some aggregation analysis, such as computing the centroid of movement or the average of scalar properties like speed. A larger discussion to define which data sources and which methods should be included under the umbrella of mobility data science would be necessary, but delving into this is beyond the scope of the current paper.

  4. Viewpoint: One unique aspect of mobility data is whether the data has an “Eulerian” or a “Lagrangian” viewpoint. This distinguishes, for instance, between the moving train carriage observed from the station platform and the station platform observed from the carriage. Autonomous vehicle sensor systems may take either approach.

Figure 2. Mobility Data Computation Stack

As shown in Figure 2, mobility data science brings requirements to several layers, from the low level of computation infrastructure to the high level of data modeling and various tasks.

Computation Infrastructure determines where and how mobility data can be generated, collated, derived, swapped, and archived.

  1. Cloud Services: Cloud services have been widely deployed in recent years because of advantages such as easy scalability and high availability. Many cloud services are on the market, not only those of large providers like Azure and AWS, and many cloud platforms offer unique value for specialized use. However, most existing cloud services target general data. This raises the question of whether a mobility data cloud can be built today from the basic modules these services provide. To support mobility data science, the power actually lies in coupling and connecting these cloud services (linked and available data), and cloud connectivity is required because probably no single cloud fits everything. However, connecting different cloud services so that they work together remains a challenge: mobility data from multiple clouds may have different semantics and was not necessarily planned to work together, so a big data lake alone is not enough. Another issue is that cloud providers may offer everything that is needed, but not necessarily organized or assembled in a fashion that is immediately useful. Also, collecting data is easier than using or analyzing it, so a lot of information can now be found on the cloud, but the question remains how much of it is made accessible and usable.

  2. Edge Computing/IoT: Mobility data may pass through several places between where it is generated and where it is used, e.g., from vehicle to vehicle edge to road edge to road network to cloud. Those places can differ greatly in computational power, so it is worthwhile to investigate which kinds of tasks are best assigned to which kinds of places, and which encoding formats are more suitable for resource-constrained devices.

  3. 5G: 5G offers unprecedented ultra-low latency. This latency enables the capture of features that require a very high sampling frequency, such as orientation, which is relatively new compared to typical features like longitude and latitude. Another implication of 5G is that tasks no longer have to be fixed at either the edge side or the cloud side; data and computations can move seamlessly between edge and cloud. Coupled with the characteristics of mobility data, these computation infrastructures call for innovative data modeling techniques and encoding formats so that mobility data can flow smoothly through various computational environments.

Next, tasks on mobility data face more challenges than tasks on other types of data.

  1. Fusion of Sensors (data aggregation/integration): Nowadays, moving objects are equipped with an array of sensors, and trajectories often need to be built from multiple sensor inputs. For example, cars are now full of sensors, creating a “data ocean”. Such fusion of multiple inputs allows analysis of heterogeneous data of independent moving features, thus enabling interesting applications such as autonomous robot navigation. However, the following issues still need to be solved:

    1. varying levels of access to data cause problems with aggregation;

    2. different sources were not necessarily planned to work together;

    3. friction on aggregation is very high due to different data models and semantics;

    4. although there has been considerable work on trajectories, aggregation of trajectory data may lead to better-fitting use cases;

    5. for time-critical situations where every second matters, such as fire rescue and disaster risk management, how to efficiently aggregate multiple sensor inputs and stitch all kinds of dynamic data in real-time to support timely decision making is still a challenging problem; and

    6. how to align sources with different spatial and temporal resolution.

  2. Data Sharing: Some communities have strong drivers for data sharing. In ocean science and Arctic studies, for example, the data is so difficult to collect that researchers have to share information. Other communities, such as cyclists, share voluntarily and very well. Although most people agree that mobility data sharing can lead to improvements in society, there is still reluctance to share. Barriers include privacy, loss of competitive advantage, lack of cloud solutions, regulatory compliance, fear of losing control, interoperability issues, etc. Moreover, business models can be one-way: sharing into a system without corresponding sharing outward.

  3. Visualization: Visualization is a good way for people to explore mobility data. However, with massive amounts of mobility data, simply plotting everything produces an unreadable picture. A common practice is therefore to aggregate the data and use GIS software to display visual summaries that make sense for moving data. For example, mobility data can be represented as density maps and as grid-based descriptions of values and trends through “prototypes,” e.g., showing the density of objects moving north in each grid cell. In the geography community, Discrete Global Grid Systems (DGGS) have been shown to be valuable, and grids, particularly equal-area grids, are very useful for grid-based analysis on scalable computing clusters.

  4. Mobility data science as a service: Mobility data science can be provided as a service that offers rich functionality to help users better understand and utilize mobility data. Typical statistics are not enough for this purpose; users will be more interested in asking mobility questions, such as “Are there any times when two cars come within 100 meters of each other?” How to express such requests in an API needs to be investigated, and the emerging OGC APIs and SQL may serve as a basis for such service interfaces.
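
As a rough illustration of the kind of mobility question quoted above, the following Python sketch reports the sampled instants at which two vehicles are within 100 meters of each other. It is only a sketch under stated assumptions: each trajectory is a list of (time in seconds, x, y) tuples with strictly increasing times in a planar, meter-based coordinate system, positions are linearly interpolated, and the names position_at and times_within are hypothetical rather than part of any OGC API or existing library.

```python
from bisect import bisect_right
from math import hypot


def position_at(traj, t):
    """Linearly interpolate the (x, y) position at time t.

    Returns None when t lies outside the time span of the trajectory.
    """
    if not traj or t < traj[0][0] or t > traj[-1][0]:
        return None
    times = [sample[0] for sample in traj]
    i = bisect_right(times, t)
    if i == len(traj):  # t equals the last timestamp
        return traj[-1][1], traj[-1][2]
    t0, x0, y0 = traj[i - 1]
    t1, x1, y1 = traj[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


def times_within(traj_a, traj_b, max_dist_m=100.0, step_s=1.0):
    """Sampled instants at which the two objects are within max_dist_m.

    Only the common time span is checked, at a fixed step (an assumption);
    an exact answer would solve for the crossing times on each segment.
    """
    start = max(traj_a[0][0], traj_b[0][0])
    end = min(traj_a[-1][0], traj_b[-1][0])
    if end < start:
        return []
    hits = []
    for k in range(int((end - start) // step_s) + 1):
        t = start + k * step_s
        ax, ay = position_at(traj_a, t)
        bx, by = position_at(traj_b, t)
        if hypot(ax - bx, ay - by) <= max_dist_m:
            hits.append(t)
    return hits


# Car A drives 1 km east at 10 m/s; car B is parked 50 m off the road.
car_a = [(0, 0.0, 0.0), (100, 1000.0, 0.0)]
car_b = [(0, 500.0, 50.0), (100, 500.0, 50.0)]
hits = times_within(car_a, car_b)
print(hits[0], hits[-1])  # prints "42.0 58.0"
```

A service interface would typically hide the sampling step and the coordinate handling behind a single request; the sketch only shows what such a request has to compute.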

In terms of general mobility data challenges, one special concern is privacy issues. Privacy can block analysis or can enable better analysis by using more restrictive data. This issue concerns not only humans, but also some commercial and endangered animals, which may also have security concerns. For example, cows can be tracked to provide business-sensitive analyses. For humans, due to the highly predictable nature of human behaviors, even small pieces of mobility data can lead to the leakage of identity information. So it is important to find the balance between utility of data and privacy preservation. Unfortunately, most privacy preservation methods restrict analysis ability and de-anonymization is always a concern. Notably, in some cases, the privacy issue may not exist when the question of interest can be answered at a mass level and data analysis does not need to focus on individuals.

Last but not least, the evaluation of mobility data needs more effort in the future. Because of the particularities of mobility data, better characterization of data quality and more means of assessing it are needed, so that people can tell whether the datasets at hand are suitable for the target analysis. Interoperability is also an important aspect: for example, integrating mobility data from different systems requires that those systems can talk to each other and understand each other’s data semantics.

The reader is referred to additional community publications that elucidate the differentiation between mobility data science and general data science, and could thus complement the discussion in this section [1-3].

4. What is the state of technology and tools?

Currently, there are not enough common tools for mobility data science, because both mobility datasets and use cases are so diverse. Existing analysis methods and tools are often not transferable across multiple types of data. This lack of widely used tools is slowing down the community effort to collaboratively build a mobility data science ecosystem and toolbox. As for handling massive datasets, existing big data tools are designed for general purposes and are limited in their ability to handle mobility data specifically. As a result, mobility data is not a first-class citizen in these tools.

A natural question, then, is which kinds of tools are expected for mobility data science. A first requirement is the capability to process large mobility datasets rapidly, and the critical point is to enable proper analysis through data reduction. For example, plotting everything produces an unreadable picture, so it is important to design visual summaries that remain meaningful. A recent work reflecting this point is presented in [4], which models movement locations, directions, and speeds using “prototypes” and supports exploration and anomaly detection.
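
The data reduction idea above can be sketched in a few lines of Python. The example is not the prototype-based method of [4]; it is a much simpler grid summary that reduces individual samples to one record per cell (count, mean speed, number of northbound samples). It assumes planar coordinates in meters and compass headings in degrees, and the names heading_sector and summarize as well as the 500 m cell size are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import floor


def heading_sector(heading_deg):
    """Map a compass heading in degrees to one of four sectors: N, E, S, W."""
    sectors = ["N", "E", "S", "W"]
    return sectors[int(((heading_deg % 360) + 45) // 90) % 4]


def summarize(points, cell_size_m=500.0):
    """Aggregate (x_m, y_m, heading_deg, speed_mps) samples per grid cell."""
    cells = defaultdict(
        lambda: {"count": 0, "speed_sum": 0.0, "by_sector": defaultdict(int)}
    )
    for x, y, heading, speed in points:
        key = (floor(x / cell_size_m), floor(y / cell_size_m))
        cell = cells[key]
        cell["count"] += 1
        cell["speed_sum"] += speed
        cell["by_sector"][heading_sector(heading)] += 1
    # One compact record per cell replaces all the samples that fell into it.
    return {
        key: {
            "count": c["count"],
            "mean_speed": c["speed_sum"] / c["count"],
            "northbound": c["by_sector"]["N"],
        }
        for key, c in cells.items()
    }


samples = [
    (100.0, 100.0, 0.0, 10.0),
    (200.0, 300.0, 10.0, 20.0),
    (400.0, 50.0, 180.0, 5.0),
    (700.0, 100.0, 350.0, 12.0),
]
summary = summarize(samples)
print(summary[(0, 0)]["count"], summary[(0, 0)]["northbound"])  # prints "3 2"
```

A map client would then draw one symbol per cell instead of one per sample, which keeps the display readable as the dataset grows.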

Another example is Mapbox vector tiles, which can carry time information so that the returned tiles are time-aware, unlike the spatial-only tiles served by traditional tile servers. To better meet the requirements of mobility data, the tile generation strategy can also take into account factors such as zoom level, viewport, and the amount of data being processed. Secondly, consider the question “what can be done with the existing OGC Standards to enable richer queries and analysis?” For example, it is not enough to answer questions about a single trajectory; use cases that involve a group of trajectories also need to be considered. Later, once some widely used tools have emerged, an attempt can be made to structure and classify mobility datasets to derive metadata that helps define use cases and gives guidelines for certain types of analysis (Figure 3).

Figure 3. Evaluation of dataset suitability

During the summit, invited speakers presented examples of their work on creating tools for mobility data science, including QARTA [6], MobilityDB [5], and SensorUp [10].

QARTA is an open source map service featuring high accuracy and scalability. The main motivation behind QARTA is that researchers and industry practitioners have already put so much effort into the efficiency of map services that efficiency is no longer a bottleneck. Instead, accuracy is becoming the bigger concern. For example, even with the most efficient shortest path algorithm at hand, the query results are only as accurate as the input map. Based on the idea that mobility data can be leveraged to boost the accuracy of map services, QARTA includes a Match or Make module (see Figure 4). Given a road network G and trajectory points P, this module performs map matching when G is more accurate than P. Conversely, when P is more accurate than G, it performs map making to update G based on P. In summary, QARTA’s success is due to two features: (1) QARTA uses machine learning to build its own highly accurate map, in terms of map topology and, more importantly, in terms of dynamic metadata such as the edge weights of the road network; and (2) QARTA employs machine learning to calibrate its query answers based on various contextual information. Currently, QARTA is deployed in all taxis and in the third largest food delivery company in the State of Qatar, where it has performed as well as, or even better than, commercial map services.

Figure 4. Match or Make (taken from [6])
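
The Match or Make decision can be sketched as follows. This is a deliberately naive illustration, not QARTA's implementation, which relies on machine learning [6]. It assumes that a single error estimate in meters is available for both the road network G and the points P, that coordinates are planar, and that all function names are hypothetical. Matching snaps each point to the nearest edge of G; making merely proposes new edges between consecutive points.

```python
from math import hypot


def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; return (snapped_point, distance)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        f = 0.0  # degenerate edge: both endpoints coincide
    else:
        # Clamp the projection parameter so the result stays on the segment.
        f = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    sx, sy = ax + f * dx, ay + f * dy
    return (sx, sy), hypot(px - sx, py - sy)


def match_or_make(edges, points, g_error_m, p_error_m):
    """Hypothetical dispatch: trust the input with the smaller error estimate."""
    if g_error_m <= p_error_m:
        # Map matching: move every point onto its nearest edge of G.
        matched = [
            min((snap_to_segment(p, a, b) for a, b in edges), key=lambda r: r[1])[0]
            for p in points
        ]
        return "match", matched
    # Map making (placeholder): propose edges linking consecutive points of P.
    return "make", list(zip(points, points[1:]))


road = [((0.0, 0.0), (10.0, 0.0))]
gps = [(2.0, 1.0), (5.0, -2.0)]
print(match_or_make(road, gps, g_error_m=1.0, p_error_m=5.0))
# prints "('match', [(2.0, 0.0), (5.0, 0.0)])"
```

In practice the accuracy of G and P varies by location and time, which is one reason a learned model is a better fit for this decision than a single global threshold.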

MobilityDB is an open source geospatial trajectory management and analysis platform built on top of PostgreSQL and PostGIS. Aiming to be a mainstream system for industry use, MobilityDB provides many benefits, including:

  1. compact geospatial data storage;

  2. rich mobility analytics;

  3. easy-to-use full SQL interface; and

  4. compliance with the OGC Moving Features Standards.

To support efficient management of mobility data, MobilityDB implements multiple temporal types, such as tgeogpoint for a temporal geography point and tfloat for dynamic attributes such as speed and heading. MobilityDB is in active development, and more functionality will be provided or enhanced.
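
To make the notion of a temporal type more concrete, the following Python sketch models a temporal float as a time-ordered sequence of (time, value) instants with linear interpolation between them. It does not use MobilityDB or reproduce its API; the class TemporalFloat and its methods are hypothetical and only illustrate the idea behind a type such as tfloat. At least two instants with strictly increasing times are assumed.

```python
from dataclasses import dataclass


@dataclass
class TemporalFloat:
    """Hypothetical stand-in for a temporal float.

    instants is a list of (t_seconds, value) pairs in increasing time order.
    """

    instants: list

    def value_at(self, t):
        """Interpolated value at time t, or None outside the defined period."""
        for (t0, v0), (t1, v1) in zip(self.instants, self.instants[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return None

    def time_weighted_average(self):
        """Average of the interpolated value over the period (trapezoid rule)."""
        area = sum(
            (v0 + v1) / 2 * (t1 - t0)
            for (t0, v0), (t1, v1) in zip(self.instants, self.instants[1:])
        )
        return area / (self.instants[-1][0] - self.instants[0][0])


# Speed in m/s: accelerate for 10 s, then cruise for 20 s.
speed = TemporalFloat([(0, 0.0), (10, 20.0), (30, 20.0)])
print(speed.value_at(5))  # prints "10.0"
print(round(speed.time_weighted_average(), 2))  # prints "16.67"
```

Storing only the instants where the value changes, and interpolating in between, is what lets a temporal type stay compact while still answering questions about any moment in the period.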