Open Geospatial Consortium

Submission Date: 2020-05-22

Approval Date:   2020-06-05

Publication Date:   2020-07-22

External identifier of this OGC® document: http://www.opengis.net/doc/WP/GeoDataSci

Internal reference number of this OGC® document:  20-001r2 

Category: OGC® White Paper

Editor:  George Percivall

Geospatial Data Science

Copyright notice

Copyright © 2020 Open Geospatial Consortium

To obtain additional rights of use, visit http://www.opengeospatial.org/legal/

Warning

This document is not an OGC Standard. This document is an OGC White Paper and is therefore not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, an OGC White Paper should not be referenced as required or mandatory technology in procurements.

Document type:    OGC® White Paper

Document subtype:

Document stage:    Approved

Document language:  English

License Agreement

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD.

THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

i. Abstract

This OGC White Paper describes Geospatial Data Science based on the Location Powers: Data Science Summit of November 2019. The white paper provides a description of the presentations and discussions of the summit along with recommendations for OGC activities to advance the field of Geospatial Data Science.

ii. Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, OGC document, Data Science, Analytics, Statistics, Artificial Intelligence, Machine Learning, Edge Computing, Knowledge-based Models, Data Management, IT Ethics, Heterogeneous Computing

iii. Preface

Geospatial Data Science is defined in this white paper as “The art and craft of people leveraging technology to create value out of data using location and time.” The components of geospatial data science are data, tools, applications, ethics, and emerging trends. The data component is composed of discussions about big geospatial data; data scientists, teams and process; and data management. The tools component is composed of discussions about geospatial representations and analytics, the application of machine learning to geospatial, and knowledge-based models to support decision making.

An objective of this white paper is to serve as a basis for promoting geospatial data science both within and outside OGC. OGC has a role in conducting activities that advance innovation and standardization in geospatial data science. The overall objective is to enable beneficial use of geospatial information in humanity’s critical decisions.

iv. Submitting organizations

This document is prepared from material provided by organizations that planned and/or presented in the summit: AIST, Topio Networks, AWS, Orion Systems, City of Los Angeles, CrowdAI, Defense Digital Service, ESIP Federation, Esri, European Space Agency, Google, Health Solutions Research, JCC Consulting, MAXAR, NASA, NatureServe, NGA, NVIDIA, OmniSci, Oracle, Ordnance Survey UK, Pitney Bowes, Radiant Earth, SOFWERX, The Climate Corporation, University of Virginia, University of Maryland - College Park, University of Illinois - Urbana Champaign, University of Iowa, US Bureau of Labor Statistics, University of Southern California, and US Department of Transportation.

A full listing of organizations that participated in the Location Powers: Data Science Summit is in Annex A.

v. Submitters

All questions regarding this document should be directed to the editor: George Percivall, Open Geospatial Consortium

1. Overview of White Paper

Geospatial Data Science has been identified as an important technology development trend by the Open Geospatial Consortium (OGC). The OGC Technology Forecasting activity began focusing on data science as an outcome of the development of the Big Geospatial Data topic area. Both Big Data and Data Science have been topics in recent Location Powers Summits.

The Location Powers: Data Science Summit (LP_DS), organized by OGC, was held on November 13 and 14, 2019, and hosted by Google in Mountain View, CA. This Geospatial Data Science White Paper captures the content of the Summit and provides a basis for further action in OGC and beyond.

Location Powers Summits bring together industry, research, and government experts from across the globe into an interactive discussion that assesses the current situation and produces recommendations for future technology innovations and standards development. The Location Powers Summits are key to the technology innovation promoted by the OGC.

The Location Powers: Data Science Summit convened experts on data science, machine learning, artificial intelligence, cloud computing, remote sensing and GIS to assess the current situation of geospatial data science. Participation by leaders in social sciences, business development, government policy, and information technology led to recommendations with meaningful outcomes for geospatial data science development.

The LP_DS Summit considered how the explosive availability of data about nearly every aspect of human activity, along with revolutionary advances in computing technologies, is transforming geospatial data science. The shift from a data-scarce to a data-rich environment comes from mobile devices, remote sensing, and the Internet of Things. Nearly all of this data has components of location and time. Innovations in cloud computing and big data provide methods to perform data analytics at exceedingly large scale and speed. The focus of LP_DS was the development of intelligent systems using knowledge models and their impact on our insights and understanding.

A summary of the topics discussed in the LP_DS is shown in the figure below.

Figure 1. Geospatial Data Science

This White Paper is organized as follows:

  • Data Topics

    • Big Geospatial Data (Clause 3)

    • Data Scientists, Teams, Process (Clause 4)

    • Data Management (Clause 5)

  • Tools

    • Geospatial Representations and Analytics (Clause 6)

    • AI and Machine Learning (Clause 7)

    • Models and Decisions (Clause 8)

  • Data Science Applications and Ethics (Clause 9)

  • Emerging Trends (Clause 10)

The Emerging Trends are Edge Computing and Heterogeneous Computing.

An Annex provides information about the summit, including the agenda and the organizations that participated in the Summit.

2. Overview of Geospatial Data Science

This definition was developed and repeated in several presentations and discussion sessions of the Location Powers: Data Science Summit (LP_DS):

Geospatial Data Science is “The art and craft of people leveraging technology
to create value out of data using location and time.”

To set the context for LP_DS, the summit considered a definition of Data Science for Big Data systems from NIST. The NIST Big Data Interoperability Framework defines Data Science as the extraction of useful knowledge directly from data through a process of discovery, or of hypothesis formulation and hypothesis testing. The NIST document goes on to identify Data Science sub-disciplines as 1) mathematical and computer science foundations in statistics and machine learning; 2) software and systems engineering methods to handle large data volumes and innovative query and analytics techniques; and, in some extended definitions, 3) domain data and processes.

Figure 2. Data Science from NIST Big Data Interoperability Framework

Applying Data Science to geospatial information is producing tremendous results. Geospatial information is experiencing the data explosion from mobile devices, remote sensing, and the Internet of Things, perhaps more than other fields, because all of these data types include location, spatial, and temporal information.

The Location Powers: Data Science Summit expanded beyond the topics listed above, leading to this outline of key topics in Geospatial Data Science: Data, Tools, Applications, and Trends.

  • Data: It is obvious, but important, to state that data is a core topic of data science. The increasing availability of data has enabled new analyses. Geospatial data, which has always been big data, provides opportunities for analytics in data science. Therefore, the opening discussion of data is about Big Geospatial Data (Clause 3). For data science to be effective, data scientists need to work in multi-disciplinary teams with an agile process. These topics are addressed in Data Scientists, Teams, Process (Clause 4). Managing big data requires addressing data policy along with the ecosystems and platforms used to manage the data. Cloud-native data management is providing nimble and novel methods to work with big data. These topics are addressed in Data Management (Clause 5).

  • Tools: Working with Big Data requires appropriate tools. Because geospatial data has always been big data, many geospatial analysis methods were data science before the term was introduced. Methods long familiar to the geospatial community, along with extensions to those methods, are addressed in Geospatial Representations and Analytics (Clause 6). The third wave of Artificial Intelligence has been led by machine learning methods such as convolutional neural networks. The application of machine learning to big geospatial data, in particular imagery, is addressed in AI and Machine Learning (Clause 7). Knowledge-based data science depends upon models that are predictive of some portion of the geospatial world, and such knowledge-based models support spatial decision making. These topics are addressed in the last tools clause, Models and Decisions (Clause 8).

  • Applications and Ethics: Applying data science to geospatial data is producing results, and the Summit discussed nearly a dozen application areas. The applications discussion surfaced the need to consider the ethics of data and algorithms (Clause 9).

  • Trends that are expected to further advance geospatial data science include Computing at the Edge and Heterogeneous Computing. Both are addressed in Emerging Trends (Clause 10).

3. Data: Big Geospatial Data

The emergence of Data Science concepts and motivation can be traced to Jim Gray’s concepts described in "The Fourth Paradigm: Data-Intensive Scientific Discovery," edited by Tony Hey, Stewart Tansley, and Kristin Tolle. This book surveys opportunities and challenges for data-intensive science to prepare for the data deluge of a “sensors everywhere” data infrastructure supporting a fourth paradigm of scientific research based on “Data Exploration.” A recurring theme in the Location Powers: Data Science Summit was that of "telling stories with data." Using stories to explore and understand the data from a domain results in insights not previously available. Data Science can be described as the exploration of big data about a domain.

This Clause addresses topics related to big data for data science.

  • Big Data with Location

  • Big Data Software Stack

  • Big Geo Data Use Cases

  • Recommendations

3.1. Big Data with Location

“Geospatial data has always been big data” was a theme of two Location Powers: Big Data summits and of the resulting white paper, Big Geospatial Data – an OGC White Paper. The Big Geo Data white paper had these main themes:

  • Geospatial data is increasing in volume and variety;

  • New Big Data computing techniques are being applied to geospatial data;

  • Geospatial Big Data techniques benefit many applications; and

  • Open standards are needed for interoperability, efficiency, innovation and cost effectiveness.

The growth of geospatial data highlighted in the Big Geo Data White Paper continues to increase. Patrick Griffiths, ESA, highlighted this trend during LP_DS: the ESA archives alone are expected to exceed 100 petabytes by 2026.

Figure 3. The EO Big Data Revolution

At LP_DS, Marc Armstrong, University of Iowa, described future satellite constellations being planned by different companies, including Amazon and SpaceX. SpaceX is planning to deploy 12,000 satellites for communications, military, and scientific purposes. The revisit rate for viewing locations will increase dramatically; BlackSky is proposing 40 to 70 revisits each day. In addition to static imagery, a large volume of streaming video will also be provided.
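To give a rough sense of what 40 to 70 revisits per day means, the following minimal Python sketch converts a daily revisit count into an average interval between views of the same location. It assumes the revisits are spread evenly over 24 hours, which real orbits and tasking schedules will not achieve exactly; the function name is illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def mean_revisit_minutes(revisits_per_day: int) -> float:
    """Average minutes between views of one location.

    Assumes revisits are evenly spaced across a 24-hour day
    (a simplifying assumption, not a property of any real constellation).
    """
    if revisits_per_day <= 0:
        raise ValueError("revisits_per_day must be positive")
    return MINUTES_PER_DAY / revisits_per_day


# The BlackSky range quoted above: 40 to 70 revisits per day.
for rate in (40, 70):
    print(f"{rate} revisits/day -> one view every {mean_revisit_minutes(rate):.1f} minutes")
```

Under that even-spacing assumption, the quoted range corresponds to roughly one view of a location every 36 minutes (40 revisits) down to about every 21 minutes (70 revisits).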

The Big Geo Data revolution is not only driven by remote sensing from satellites. Philippe Cases, Topio Networks, provided estimates to LP_DS on the magnitude of the data deluge coming from edge devices. All of this Edge Data has components of location and time that can be exploited in data science.
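One common way to exploit the location and time components of edge data is to group observations into space-time bins before analysis. The Python sketch below illustrates that idea under stated assumptions: the `EdgeObservation` record, the 0.1-degree longitude/latitude grid, the hourly time bins, and the sample readings are all hypothetical and do not come from any Summit presentation.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EdgeObservation:
    """Hypothetical edge-device reading: where, when, and a measured value."""
    lon: float
    lat: float
    time: datetime
    value: float


def space_time_key(obs: EdgeObservation, cell_deg: float = 0.1):
    """Map an observation to a (grid column, grid row, hour) bin.

    Uses a simple lon/lat grid of cell_deg degrees and truncates the
    timestamp to the start of its hour; both are illustrative choices.
    """
    col = math.floor(obs.lon / cell_deg)
    row = math.floor(obs.lat / cell_deg)
    hour = obs.time.replace(minute=0, second=0, microsecond=0)
    return (col, row, hour)


def count_by_bin(observations, cell_deg: float = 0.1) -> Counter:
    """Count how many observations fall into each space-time bin."""
    return Counter(space_time_key(o, cell_deg) for o in observations)


# Hypothetical readings: two in the same cell and hour, one an hour later.
sample = [
    EdgeObservation(-122.081, 37.421, datetime(2019, 11, 13, 9, 5, tzinfo=timezone.utc), 1.0),
    EdgeObservation(-122.079, 37.425, datetime(2019, 11, 13, 9, 40, tzinfo=timezone.utc), 2.0),
    EdgeObservation(-122.081, 37.421, datetime(2019, 11, 13, 10, 15, tzinfo=timezone.utc), 3.0),
]

for key, n in count_by_bin(sample).items():
    print(key, n)
```

A fixed lon/lat grid keeps the sketch dependency-free; production systems would more likely use an equal-area or hierarchical grid, and would aggregate values (not only counts) per bin.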