Abstract
This white paper is a survey of Big Geospatial Data with these main themes:
- Geospatial data is increasing in volume and variety;
- New Big Data computing techniques are being applied to geospatial data;
- Geospatial Big Data techniques benefit many applications; and
- Open standards are needed for interoperability, efficiency, innovation and cost effectiveness.
The main purpose of this white paper is to identify activities to be undertaken in OGC Programs that advance Big Data capabilities as applied to geospatial information.
This white paper was developed based on two Location Powers events:
- Location Powers: Big Data, Orlando, September 20th, 2016; and
- Location Powers: Big Linked Data, Delft, March 22nd, 2017.
For information on Location Powers: http://www.locationpowers.net/pastevents/
Keywords
ogcdoc, OGC documents, Big Data, geospatial, location, open, standards, interoperability, cloud computing
Submitters of this document
All questions regarding this white paper should be directed to the editor or the submitters:
Name | Affiliation |
---|---|
George Percivall, editor | OGC |
Carl Reed | Carl Reed and Associates |
Ingo Simonis | OGC |
Josh Lieberman | Tumbling Walls |
Steven Ramage | Group on Earth Observations |
1. The Big Data Trend and Geospatial Information
Every second day the human race generates as much data as was generated from the dawn of humanity through the year 2003[1]. Big Data is both a challenge and an opportunity. Big Data is “extensive datasets — primarily in the characteristics of volume, variety, velocity, and/or variability — that require a scalable technology for efficient storage, manipulation, management, and analysis.”[2]
Geospatial data has been Big Data for decades. New tools and technologies are now available to deal with Big Geo Data analytics and visualization. Geospatial information is advancing in all the dimensions of Big Data.
- Volume: The European Space Agency’s Copernicus Missions archive holds ~8 PB and is growing[3]. DigitalGlobe currently archives 70 PB of satellite imagery[4]. ECMWF currently has 180 PB of weather data with plans to archive 1 PB/day.
- Variety: NASA distributed more than 3,500 distinct data products in 2015.[5] Geospatial attributes are being connected to data with an increasing diversity of structures and vocabularies.
- Velocity: For urban monitoring in Tokyo, the locations of one million people collected every minute add up to 1.4 billion records per day[6].
- Veracity: Advances in Big Data processing based on machine learning and deep learning provide great predictive power. Understanding the algorithms and quantifying result uncertainties remains the subject of intense research.
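The Velocity figure in the list above can be checked with simple arithmetic. The sketch below is only an illustration: the helper name `records_per_day` and the fixed one-sample-per-minute interval are assumptions made here to match the description, not part of the cited study.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def records_per_day(tracked_objects: int, sample_interval_s: int) -> int:
    """Number of location records produced in one day when every tracked
    object reports once per sampling interval (hypothetical helper)."""
    samples_per_object = SECONDS_PER_DAY // sample_interval_s
    return tracked_objects * samples_per_object

# One million people, one location fix per minute (figures from the text).
daily = records_per_day(tracked_objects=1_000_000, sample_interval_s=60)
per_second = daily / SECONDS_PER_DAY

print(f"{daily:,} records per day")
print(f"{per_second:,.0f} records per second on average")
```

The exact product is 1.44 billion records per day, which the text rounds to 1.4 billion. It corresponds to a sustained average of roughly 16,700 records per second.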
This white paper addresses Big Geo Data in the following sections.
Section 2. Value of Big Geo Data
Geospatial applications that use Big Data techniques are described to show the value of these new capabilities.
Section 3. Use Cases for Big Geo Data
Use cases are presented to demonstrate commonality across application domains. This commonality allows best practices to be defined through common standards and workflows. This helps manage the complexity in applying big data technology based on investments in
Section 4. OGC Big Geo Data Opportunities
Several high priority focus areas for advancing big geo data implementations based on open standards are presented as opportunities for OGC activities.
Section 5. OGC Activities on Big Geo Data
Existing and potential new activities are listed for consideration to be undertaken in OGC Programs and in coordination with external alliances.
2. The Value of Big Geo Data Applications
2.1 Earth Observations
Observations of the Earth support global efforts to understand our shared physical environment. The environmental monitoring and modeling community generates Big Data to better understand Earth systems. High volumes of data (petabytes), arriving at increasing velocity (distributed worldwide using high-performance computing facilities) and in growing variety (of data formats and resolutions), need to be handled and smoothly integrated to meet modern challenges such as global food security, effects and mitigation of climate change, or global logistics and infrastructures.
In a keynote presentation to the Location Powers: Big Geo Data workshop, Jibo Sanyal (ORNL) illustrated the value of Earth Observations as Big Earth data for estimating population (Figure 1). High-resolution population distribution data are critical for successfully addressing important issues ranging from socio-environmental research to public health to homeland security. Sanyal’s keynote addressed how such data are of paramount importance for responding to policy topics, such as the UN 2030 Agenda and the sustainable development goals.
(Figure Source: J. Sanyal, ORNL)
Satellite-based Earth Observations were an early driver of the Big Data explosion. The current emphasis on Big Data can be seen in the recent Big Data from Space 2016 conference in Europe and in the Big Earth Data Initiative (BEDI) in the United States. Ground-based Earth Observations, such as in-stream flow monitoring and air particulates monitoring, have traditionally produced lower volumes than space-based sensors, but this situation is changing. NOAA’s Big Data Project is engaged with several cloud providers; one of the most innovative efforts is the hosting of data from ground-based NEXRAD high-resolution Doppler radar. Non-traditional ground-based sensors coming from IoT and Smart City applications will also drive new applications of ground-based Earth Observations.
2.2 Resource Management: Precision Agriculture
Due to a growing world population and changing climatic norms, sustaining crop quality as well as quantity from existing agricultural land is an important challenge today in reducing global food insecurity. The goal of precision agriculture research is to define a decision support system (DSS) for whole farm management that optimizes returns on inputs while preserving resources[7]. Precision agriculture can only improve decision-making and farm management if farmers have access to the necessary small-scale, detailed information to make informed choices. The creation of field-level or even plant-level information can help farmers improve their crop production and attract long-term investment. The more comprehensive and up-to-date the picture that farmers have of their crops (e.g., through remote sensing and GPS technologies), the better the decisions they can make as to where and when to apply seed, how much to fertilize, when to irrigate and so forth. Longer-term records of agricultural processes from precision farming data allow farmers to use their cropland more efficiently, increase crop size and quality, and respond more effectively to climatic challenges such as drought. Precision farming has the potential to make a worthwhile difference in farmers’ income, crop yields, and resilience while mitigating the negative environmental impacts of farming.[8]
Activities such as GEO GLAM provide regular Earth observations to feed into crop monitoring for early warning and production systems (www.cropmonitor.org). The Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM) flagship follows the GEO data sharing and data management principles. Realizing this potential depends greatly on the cost and difficulty for the farmer of collecting and working with big geospatial data.
The LEO Horizon2020 (H2020) Project developed software tools that support the whole lifecycle of reuse of EO data and related linked geospatial data. To demonstrate the benefits of linked open EO data and its combination with linked geospatial data to the European economy, a precision farming application was developed (Figure 2).
(Figure Source: Manolis Koubarakis)
Another example of Big Data for agriculture is satellite image processing to calculate the available area of arable land. Fritz et al. have shown that land suitable for cultivating biofuel crops has been vastly overestimated. They reduced the estimate by almost 80 percent and expressed a growing concern about how production of biofuels will impact food security. Based on Big Data analytics, the study by Fritz et al. showed that previous studies had overestimated the amount of arable land and had underestimated the amount of land already being cultivated[9].
Other initiatives are currently exploring the reusability of Big Data concepts, technologies, and architectures across domains to leverage synergy effects. The OGC participates in the research and development project DATABIO, co-funded by the European Commission. DATABIO focuses on the data-intensive target sector Data-Driven Bioeconomy. More specifically, DATABIO explores the potential of Big Data integration and analytics in the domains of agriculture, forestry, and fishery/aquaculture, while taking into account interoperability and sustainability aspects in the heterogeneous European bioeconomy landscape.
DATABIO proposes to deploy a state-of-the-art big data platform, the Big DATABIO Platform, on top of existing partners’ infrastructure and solutions. DATABIO features continuous cooperation of experts from end user and technology provider companies, from bioeconomy and technology research institutes, from standardization organizations such as OGC, and from other partners, mainly in the public administration sector. A series of pilots allows associated partners and other stakeholders to get actively involved in the project.
2.3 Mobile Location Services
Location-enabled mobile devices are a major source of Big Data. Location data coming from mobile devices and their associated networks enables many Big Data applications. The article “The Ways Big Geospatial Data Is Driving Analytics In the Real World” begins with this observation:
“Amid the flood of data we collect and contend with on a daily basis, geospatial data occupies a unique place. Thanks to the networks of GPS satellites and cell towers and the emerging Internet of Things, we’re able to track and correlate the location of people and objects in very precise ways that were not possible until recently”.
Recent studies of mobile devices have identified the predictability of human mobility. A study reported in Science, based on ~10 million anonymous mobile phone users, found that “by measuring entropy of individual’s trajectory, we find 93% potential predictability in user mobility”. Cardiff University researchers have shown the effectiveness of detecting real-world events using Twitter based on location detection and disambiguation[10]. The power of location data was highlighted by Sir Martin Sorrell, CEO of WPP, during his speech at Mobile World Congress in his comment that “Location targeting is holy grail for marketers.”
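To make the link between trajectory entropy and predictability more concrete, the following is a minimal sketch under stated assumptions, not the method of the study quoted above. It estimates entropy from visit frequencies only (ignoring the order of visits, a simplification chosen here) and then applies Fano's inequality, a standard information-theoretic bound, to obtain an upper limit on how often any algorithm could correctly predict the next location. The function names and the toy trajectory are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(visits):
    """Entropy in bits of the visit-frequency distribution.
    Assumption: visit order is ignored, which simplifies the estimate."""
    counts = Counter(visits)
    total = len(visits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_predictability(entropy, n_locations, tol=1e-9):
    """Solve Fano's bound  S = H(p) + (1 - p) * log2(N - 1)  for p.
    The right-hand side decreases from log2(N) at p = 1/N to 0 at p = 1,
    so bisection on [1/N, 1] finds the root."""
    if n_locations < 2:
        return 1.0  # a single location is always predictable
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        bound = binary_entropy(mid) + (1 - mid) * math.log2(n_locations - 1)
        if bound > entropy:
            lo = mid   # bound still above S: the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical hourly trajectory of one user over a day.
trajectory = ["home"] * 10 + ["work"] * 8 + ["gym"] * 2 + ["shop"] * 4
s = shannon_entropy(trajectory)
p_max = max_predictability(s, n_locations=len(set(trajectory)))
print(f"entropy = {s:.3f} bits, predictability upper bound = {p_max:.3f}")
```

Because this sketch ignores the temporal order of visits, it will generally report a lower predictability bound than an order-aware entropy estimate would for the same user.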
Location-based contextual awareness is relevant to location-based marketing, first responders, urban planners and many other applications. Creating useful local context requires Big Data analytics platforms. Big Data processing and high-velocity streaming of location-based data create the richest contextual awareness. Data from many sources including IoT devices, sensor webs, social media and crowd-sourcing are combined with semantically rich urban and indoor spatial data. The resulting context information is delivered to and shared by mobile devices in connected and disconnected operations. Open standards play a key role in establishing context platforms and marketplaces. Successful approaches will consolidate data from ubiquitous sensing technologies to enable context-aware analysis of environmental and social dynamics. For example, Pitney Bowes is applying Big Data developments to automate marketing based on location (Figure 3).
(Figure source: Jon Spinney, Location Intelligence, Pitney Bowes)
2.4 Transportation and Moving Objects
Management and optimization of transportation systems benefit from Big Data platforms that monitor, visualize and perform predictive analytics on objects moving in space and time. Traffic congestion is reduced as trip demand data collected using transportation surveys is integrated with real-time or projected traffic data. From the combined data, optimal schedules and routes can be calculated. Thanks to the availability of real-time reports from location-enabled devices, these schedules and routes can even be optimized at runtime. Automobiles will continue to grow as generators of location-based Big Data. Intel has predicted that autonomous vehicles will generate 4 TB of observation and measurement data per day (Figure 4).
The more real-time information is made available, the better the optimization algorithms work. This defines requirements on Big Data handling and processing, and standards can leverage these capabilities. The OGC Moving Features standard[12] allows seamless integration of mobile objects and predictions based on mobile objects across systems.