Publication Date: 2020-02-13
Approval Date: 2019-12-25
Submission Date: 2019-11-18
Reference number of this document: OGC 19-083
Reference URI for this document: http://www.opengis.net/doc/PER/CitSciIE-1
Category: Public Engineering Report
Editor: Joan Masó
Title: OGC Citizen Science Interoperability Experiment Engineering Report
COPYRIGHT
Copyright © 2020 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
WARNING
This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
LICENSE AGREEMENT
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Summary
- 2. References
- 3. Terms and definitions
- 4. Overview
- 5. O&M for Cit Sci
- 6. SOS architectures
- 7. SOS servers
- 8. SOS clients
- 9. Data quality estimations on the client side
- 9.1. Quality estimation on vector data
- 9.1.1. How data quality is presented
- 9.1.2. How to start computing data quality
- 9.1.3. Case 1: Positional accuracy of the layer from observation uncertainties
- 9.1.4. Case 2: Logical consistency of the thematic attributes
- 9.1.5. Case 3: Temporal validity of the observation date
- 9.1.6. Case 4: Validity of the positions of observations (by bounding box)
- 9.2. Quality estimation on raster data
- 9.3. Future work
- 10. Definitions Server
- 11. User and Application Federation
- 12. Connecting Citizen Science data sets to GEOSS
- Appendix A: Integration between SCENT & LandSense
- Appendix B: Revision History
1. Summary
This Engineering report describes the first phase of the Citizen Science (CS) Interoperability Experiment (IE) organized by the EU H2020 WeObserve project under the OGC Innovation Program and supported by the four H2020 Citizen Observatories projects (SCENT, GROW, LandSense, and GroundTruth 2.0) as well as the EU H2020 NEXTGEOSS project. The activity covered aspects of data sharing architectures for Citizen Science data, data quality, data definitions and user authentication.
The final aim was to propose solutions on how Citizen Science data could be integrated in the Global Earth Observation System of Systems (GEOSS). The solution is necessarily a combination of technical and networking components; this work focuses on the technical ones. Applying international geospatial standards in current Citizen Science and citizen observatory projects to improve interoperability and foster innovation is one of the main tasks of the IE.
The main result of the activity was to demonstrate that Sensor Observation Services can be used for Citizen Science data (as proposed in the Open Geospatial Consortium (OGC) Sensor Web Enablement for Citizen Science (SWE4CS) Discussion Paper) by implementing SWE4CS in several clients and servers that have been combined to show Citizen Science observations. In addition, an authentication server was used to create a federation between three projects. This federated approach is part of the proposed solution for GEOSS that can be found in the last chapter. Many open issues have been identified and are expected to be addressed in the second phase of the experiment, including the use of a definitions server.
1.1. Requirements & Research Motivation
This experiment was designed to demonstrate how current ICT-based tools can be applied together to allow better citizen participation in CS projects and enable better reuse of the data gathered. Citizen Science is highly transdisciplinary and heterogeneous by nature and current standardization efforts already occur in the OGC (e.g., addressing data model and sharing issues) as well as outside the OGC (primarily addressing project descriptions and dataset metadata). Citizen Science projects might benefit from concrete examples and best practices required to achieve the full benefits of interoperability. OGC is in the ideal position to develop and provide such best practice guidance to the international community. Developed solutions in this IE should be applicable to most Citizen Science projects. Findings from this IE will be generalized as practice examples and might set the basis for additional experimentation in the future.
The FP7 Citizen Observatory Web (COBWEB) project was the first to propose the use of SWE in CS. This work resulted in an OGC public Discussion Paper available on the OGC website (OGC 16-129). The Discussion Paper describes a data model for the standardized exchange of Citizen Science sampling data based on SWE standards. This Discussion Paper was the initial motivation for this IE.
Beyond the work described above, the Citizen Science Association’s International Working Group on Citizen Science Data and Metadata has developed the PPSR-CORE metadata standard and the European Citizen Science Association (ECSA) has a working group that recognizes the value of standardization in the CS activities (supported by a COST Action). However, these activities could benefit from some experimentation that would be able to suggest common best practices while recognizing the particularities and current approaches in different thematic domains, such as biodiversity monitoring. Citizen Science can complement authoritative in-situ observations and fill the information gaps in numerous scientific disciplines that could be essential for informed decision making. In that sense, the way Citizen Science can be integrated into The GEOSS (including GEOSS-Data Core as the pool to promote and share open and free data) is still under investigation.
The Ecosystem of Citizen Observatories (CO) for Environmental Monitoring WeObserve project is a Horizon 2020 funded project focused on improving the coordination between existing COs and related regional, European, and international activities. WeObserve tackles three key challenges that face COs: awareness, acceptability, and sustainability. The WeObserve Community of Practice 3 (CoP3) is about Interoperability of Citizen Science projects. The WeObserve project – via its CoP activities – has represented an opportunity to promote interoperability experimentation in collaboration with the OGC. Such collaboration addresses questions raised in the SWE4CS discussion. In addition, the work offers the possibility to directly feed the results into the relevant OGC standards and promotes their usage within GEOSS (as an important user community of OGC standards).
In anticipation of the 50th Anniversary of Earth Day in 2020, Earth Day Network, the Woodrow Wilson International Center for Scholars, and the U.S. Department of State, through the Eco-Capitals Forum, announced Earth Challenge 2020, a Citizen Science Initiative. This initiative is in collaboration with Connect4Climate – World Bank Group, Conservation X Labs, Hult Prize, National Council for Science and the Environment (NCSE), OGC, Reset, SciStarter, UN Environment, and others to be announced. Earth Challenge 2020 will help engage millions of global citizens in collecting one billion data points in areas including air quality, water quality, biodiversity, pollution, and human health. Earth Challenge 2020 data will be shared through the GEOSS Portal.
1.2. Prior-After Comparison
This is the first Citizen Science IE conducted by the OGC. Prior to this activity, there was a Discussion Paper on how to apply the SWE standards in Citizen Science (SWE4CS). This experiment positively tested the proposed route using Sensor Observation Services but has also opened the door to future exploration of the SensorThings API.
Also prior to this activity, one H2020 project operated an Authentication Service; after the activity, three H2020 efforts formed a bigger federation, demonstrating a route to federating and aggregating Citizen Science projects to contribute to GEO objectives.
This work is relevant to the OGC Citizen Science Domain Working Group.
1.3. Recommendations for Future Work
This OGC IE ended in June 2019, but a second IE is foreseen for the following year.
Possible new topics for the next IE, to be discussed among the members, include the following.
-
There is a need for clarifying how to coordinate infrastructures for Citizen Science in Europe and adopt standard procedures for data sharing and single sign on. Solving this issue will help in connecting CS to GEO. Steffen Fritz (IIASA) has proposed a side event in the next EuroGEOSS workshop to discuss this coordination with the relevant players. This is emerging as a new activity in the WeObserve Interoperability CoP that is related, but not directly connected, to the IEs.
-
The WMO (World Meteorological Organization) is concerned about the number of different CS activities being organized by meteorological organizations. WMO is looking for ways to take advantage of this new data stream, but problems arise in standardizing what is measured and how data is shared. WMO has identified the potential of these data streams and would like to harmonize the situation to make the data more useful for weather predictions in the future.
-
OGC is promoting a new generation of web services based on OpenAPI. It is unclear how these new web services could impact the use of OGC standards by CS projects but it is seen as an opportunity to make OGC standards more usable and compatible with IT mainstream. A hackathon to develop OGC API specifications occurred on 20-21 June 2019 in London and subsequent hackathons and sprints continue to advance the OGC API standards.
The definition of the follow-up IE started upon completion of this IE.
1.4. Document contributor contact points
All questions regarding this document should be directed to the editor or the contributors:
Contacts
Name | Organization |
---|---|
Joan Maso | UAB-CREAF |
Andy Cobley | University of Dundee |
Valantis Tsiakos | Institute of Communication & Computer Systems (ICCS) |
Nikolaos Tousert | Institute of Communication & Computer Systems (ICCS) |
Theodoros Theodoropoulos | Institute of Communication & Computer Systems (ICCS) |
Simon Jirka | 52 North |
Sven Schade | European Commission, Joint Research Center (JRC) |
Andreas Matheus | Secure Dimensions |
Stefano Tamascelli | XTeam Software Solutions |
Friederike Klan | Citizen Science Group, Institute of Data Science, DLR |
Trupti Padiya | Citizen Science Group, Institute of Data Science, DLR, Friedrich-Schiller-University Jena |
Initiators
Organization |
---|
Universitat Autònoma de Barcelona - CREAF (UAB-CREAF) |
International Institute for Applied Systems Analysis (IIASA) |
Joint Research Center (JRC) |
European Space Agency (ESA) |
Woodrow Wilson International Center for Scholars (Wilson Center) |
The WeObserve project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 776740.
This presentation reflects only the editor’s views and the EU Agency is not responsible for any use that may be made of the information it contains.
1.5. Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
1.6. Acknowledgements
This report was coordinated and developed with funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements: no 776740 (WeObserve), no 688930 (SCENT), no 689744 (Ground Truth 2.0), no 690199 (GROW Observatory), no 689812 (LandSense) as well as no 730329 (NEXTGEOSS).
2. References
The following normative documents are referenced in this document.
Although the following is an OGC Discussion Paper that is not an OGC standard and cannot be considered strictly a normative reference, it is actually the basis for several sections of this document and should be considered as important background:
3. Terms and definitions
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.
-
Citizen Observatory (CO)
Community-based environmental monitoring and information systems that invite individuals to share observations, typically via mobile phone or the web (from: https://www.weobserve.eu/about/citizen-observatories).
-
Citizen Science (CS)
The collection and analysis of data relating to the natural world by members of the general public, typically as part of a collaborative project with professional scientists (from: https://www.uen.org/crowdandcloud/citizen.shtml).
-
Citizen Science Association
A network that seeks to promote and advance citizen science in a region or around the world. Examples are the American Citizen Science Association (CSA), The European Citizen Science Association (ECSA), or the Citizen Science Global Partnership (CSGP).
-
Citizen Science Federation
A network of Citizen Science that aims to aggregate innovative Earth Observation technologies, mobile devices, community-based environmental monitoring, data collection, interpretation, and information delivery systems to empower communities to monitor and report on their environment. An example of this is the LandSense Federation.
-
Community of Practice (CoP)
Community which works to consolidate practice-based knowledge of COs sharing information and resources as well as developing guidelines and toolkits for COs (from: https://www.weobserve.eu/cops/).
3.1. Abbreviated terms
-
CitSciIE Citizen Science Interoperability Experiment
-
CO Citizen Observatory
-
CoP Community of Practice
-
COST European Cooperation in Science and Technology
-
CS Citizen Science
-
CS DWG Citizen Science Domain Working Group
-
CSGP Citizen Science Global Partnership
-
ECSA European Citizen Science Association
-
EO Earth Observation
-
ICT Information and Communication Technologies
-
IE Interoperability Experiment
-
O&M Observations and Measurements
-
PPSR Public Participation in Scientific Research
-
SOS Sensor Observation Service
-
SSO Single Sign On
-
SWE Sensor Web Enablement
-
SWE4CS Sensor Web Enablement for Citizen Science
-
TC Technical Committee
-
TIE Technology Integration Experiments
-
WPS Web Processing Service
4. Overview
This Engineering Report focuses on the findings of the first phase of the Citizen Science Interoperability Experiment (CitSciIE).
The primary focus of the OGC CitSciIE was to demonstrate the interoperability of Citizen Observatories and Citizen Science projects and the way OGC standards can be applied to Citizen Science, including possible relationships to other relevant standards from the community. In particular, a subset of the originally proposed topics was addressed, based on the participating organizations:
-
The use of OGC standards or draft specifications (e.g., SWE or SWE4CS) to support data integration among CS projects, and with other sources, especially authoritative data;
-
The use of ISO standards, OGC publications, and community resources to document data quality aspects (e.g., UncertML, QualityML);
-
The integration of CS projects/campaigns in a Single Sign-On system (SSO) federation; and
-
The relationships between OGC standards and data and metadata standards currently used by Citizen Science projects.
The desired outcome of this experiment was the following.
-
Successfully demonstrate how OGC standards (e.g., SWE) are applicable to Citizen Science, document available supporting tools, identify the challenges of using OGC SWE standards (or Internet of Things equivalent solutions) within current Citizen Science projects, and propose a way forward. Make recommendations to the Earth Challenge 2020 initiative on which OGC standards should be utilized to underpin interoperable data collection and sharing.
-
Successfully demonstrate how to estimate Citizen Science data quality and make the quality indicators and conformity available in the document and in supporting tools and link them to the OGC SWE standards (or Internet of Things equivalent solutions) within current CS projects, as well as propose a way forward.
-
Determine the security considerations and the available tools to support an SSO federation that helps users in participating in several projects by using a single user account.
-
Assess the possible relationships of OGC standards (e.g., SensorML) with other existing standards in the field (e.g., PPSR - CORE, the ontology developed by the COST Action on Citizen Science, and the Citizen Science Definition Service (CS-DS) developed in the NextGEOSS project).
-
Satisfy and document the necessary requirements to integrate Citizen Science into GEOSS by using OGC standards.
This IE has been promoted by the OGC Citizen Science Domain Working Group, the WeObserve and NextGEOSS H2020 projects, and The Earth Challenge 2020 project as supported by National Geographic Society. This IE contributes not only to the interoperability and possibly standardization program of the OGC, but also to the GEOSS. This work is also relevant to the foundational objectives of the Citizen Science Global Partnership (CSGP). Regional and national Citizen Science Associations will equally benefit from the results of this OGC IE.
4.1. Structure of the activities
The official kick-off meeting for the OGC CitSciIE experiment was held on Friday 14th September 2018 at the OGC TC meeting in Stuttgart, Germany. Activities continued until March 2019.
During the kick-off meeting of the IE, the following subgroups emerged.
-
V: Vocabularies for organizing Citizen Science projects. There was a discussion on essential variables but also on other kinds of practices that can be associated to vocabularies, i.e., on how to publish vocabularies (PublishingDefs) or on defining a list of vocabularies that could be useful to experiment with (observations, project descriptions, general glossaries of terms).
-
Working item V.1: A list of the current projects that the Wilson Center knows can align with the Earth Challenge topics (air and water quality, pollution, human health, and eventually biodiversity) and extraction of a common set of variables the projects cover.
-
Working item V.2: Analysis of data models that contributors in the experiment can bring in: Air quality (HackAir), Biodiversity (Atlas of Living Australia & Natusfera), Mosquito (CREAF), Land Use (IIASA), Phenology (CREAF), Invasive Alien Species (JRC).
-
Working item V.3: Consider the COST action metadata model for inclusions as another vocabulary: this might include a set of definitions of phenomena that are being addressed by CS initiatives (based on the inventory of citizen science activities for environment policies).
-
-
D: Data sharing using OGC standards such as Observations and Measurements (O&M) and Sensor Observation Service (SOS). A pool of services was identified for participating in an IE, including SOS services and clients and citizen science project databases and APIs.
-
Working item D.1: A set of instructions on how a CS project can easily set up an SOS service. The service could include the 52North implementation and might include MiraMon SOS (with some work in the implementation). The service should address the case of a small project contributing to the Earth Challenge 2020.
-
Working item D.2: Create an SOS endpoint for HackAir data with minimum resources.
-
Working item D.3: Define the requirements for a data provider that could assist the Wilson Center in setting up the challenge database. The requirements should consider upload of data into the system. The IE preference was to go for a harvest system instead of a federated system. The working item could describe a possible architecture to allow the dialog between the central database and the small contributing projects and should impose data sharing requirements (services or APIs) on the central database.
-
-
S: Connection between LandSense federation and JRC user system.
-
Working item S.1: Interoperability test on the integration of LS-SSO and JRC-SSO.
-
-
Q: Data quality.
-
Working item Q.1: Write a document on perspectives of the different quality aspects: Quality assessment (ISO 19157-QualityML), Quality improvement, Quality plan, Data Management principles (ISO 8001), Quality documentation, Quality communication.
-
Working item Q.2: Perfect the quality measurement system based on Web Processing Service (WPS) and SOS harvest by demonstrating the concept in practice. Also include in the SOS harvesting the possibility to have a query for assessing the quality of "views"/"selections"/"fragments" of a dataset.
-
Connection with: D.2.
-
-
Working item Q.3: Refine the QualityML vocabulary with new entries considering the work done in Australia.
-
Connection with: D.3.
-
-
Working item Q.4: Add new entry points to QualityML for other common vocabulary formats such as TTL.
-
Connection with: V.
-
-
For each of the subgroups a chair and the main participants and contributors were identified. Responsible persons were also assigned to each of the working items.
4.2. Results detailed in the subsequent sections
These are the main activities and outcomes of the interoperability experiment detailed by activity.
Data sharing using OGC standards such as O&M and SOS
This activity has been the most active one. During the IE, the following servers have been deployed: MiraMon SOS server, Grow SOS, DLR istSOS SOS, and 52north SOS. Three clients have also been produced: MiraMon SOS browser, Grow SOS data viewer, and 52north Helgoland. In the last meeting at the EGU, the group was able to demonstrate interoperability by connecting the SOS clients to the SOS services and showing the data on clients, sometimes mixing data from different services and datasets in a single view. This is the most significant result of the experiment and is extensively documented in this Engineering Report in sections 5, 6 and 7.
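As a rough illustration of what connecting a client to an SOS service involves, the sketch below builds a GetObservation request using the key-value-pair (KVP) binding of SOS 2.0. The endpoint, offering, and observed property identifiers are hypothetical placeholders, not values used by the servers listed above, and a real client would first read the service capabilities to discover the valid ones.

```python
from urllib.parse import urlencode


def sos_get_observation_url(endpoint, offering, observed_property):
    """Build an SOS 2.0 KVP GetObservation request URL."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
    }
    # urlencode percent-encodes the values, including URIs used as identifiers.
    return endpoint + "?" + urlencode(params)


# All three arguments are hypothetical and only serve the illustration.
url = sos_get_observation_url(
    "https://example.org/sos",
    "water_quality",
    "http://example.org/def/nitrate",
)
print(url)
```

Because every service answers the same request syntax, a client can, in principle, repeat this call against several endpoints and merge the responses in a single view.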
Data quality
Two quality vocabularies have been identified: Australian work done by Peter Brenton’s team (https://github.com/tdwg/bdq) and the QualityML vocabulary developed by CREAF in the GeoViQua project. The intent of this IE was to compare both approaches, but the participants were not able to do so in the timeframe of the first IE. Section 9 describes the current status of the activity. It is foreseen that the second IE will continue what was started here.
Definition server for organizing Citizen Science projects
The objective of this activity was to support the Earth Challenge 2020 research questions. The questions were defined during the first month of the experiment and now it is time to analyze the questions in terms of data needs and thematic vocabularies to be used. Because the analysis has not yet been performed, this activity has not resulted in tangible outputs and will be reintroduced in the second IE. Details of this development are described in section 10.
Connection between LandSense federation and other user systems
Secure Dimensions (Andreas Matheus) was very active in providing demonstrations and information on how the LandSense federation works and how other projects can be included in the federation and use the SSO facility. Unfortunately, no other member of the CoP had the resources to apply the SSO on their services or clients and take advantage of the LandSense offering. The activity resulted in a video demonstration that is publicly available here: https://portal.opengeospatial.org/files/?artifact_id=81550.
Section 11 details the current status of the activity.
Other
Section 12 summarizes the lessons learned that can be applied to GEOSS.
In addition to these activities, another activity about quality annotating scientific documentation in a standard way was proposed by Lucy Bastin. A video was recorded that summarizes the idea: https://portal.opengeospatial.org/files/?artifact_id=82544.
5. O&M for Cit Sci
In a feature model all characteristics of a feature are considered properties of the feature and are not semantically separated at the abstract level.
The O&M standard ([OGC 10-004r3], Abstract Specification Topic 20: Observations and Measurements) defines a data model for observations where the main concepts are separated, as represented in Figure 1.
For each observation, O&M allows us to document the following characteristics.
-
Where the observation is located: even if the observation was made remotely with a camera or a drone, it is commonly more relevant to know the position of the observed phenomena (the sensor position can also be recorded).
-
When the observation took place and what time period it represents: even if samples were collected and analyzed later, it is commonly more relevant to know the instant or period of the observed phenomenon.
-
How the observation was done: this will describe the procedure and instrument used to capture the phenomenon.
-
Who did the observation: the procedure and instrument used to capture the phenomenon was installed or used on site by someone. In Citizen Science, where many observers contribute small pieces of information that together will form a dataset, it is particularly important to record at least an observer identifier.
-
What was measured: this will define the property names and units of measure of the variables observed.
-
What data was collected: this will record the actual values of the properties measured.
-
What is the expected quality of the observation: if an estimation of the quality of the observation was done, it is important to document the quality.
In the O&M data model, the above aspects are clearly separated semantically as shown in Figure 2. This is the main value of the O&M model and its usage in SOS (or in the SensorThings API, which uses a very similar approach to model the data), but it is also the main handicap in applying the standard.
Concept | O&M | type |
---|---|---|
Where | featureOfInterest | GFI_Feature |
When | phenomenonTime, resultTime | |
How | procedure | OM_Process |
Who | procedure | OM_Process |
What | observedProperty | GF_PropertyType |
Data | result | Any |
Quality | resultQuality | |
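The mapping in the table can be sketched as a plain data structure. This is only an illustration of how the concepts are kept apart, not the normative O&M encoding: the Python class and attribute names are hypothetical, and every value in the usage example is invented.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Observation:
    """One observation, with each O&M concept held in its own attribute."""

    feature_of_interest: Any              # Where (GFI_Feature)
    phenomenon_time: str                  # When the phenomenon occurred
    result_time: str                      # When the result became available
    procedure: str                        # How and Who (OM_Process)
    observed_property: str                # What (GF_PropertyType)
    result: Any                           # Data, of any structure
    result_quality: Optional[Any] = None  # Quality, if it was estimated


# Invented values, for illustration only.
obs = Observation(
    feature_of_interest={"type": "Point", "coordinates": [2.1, 41.5]},
    phenomenon_time="2019-03-12T10:15:00Z",
    result_time="2019-03-12T18:00:00Z",
    procedure="test-strip/observer42",
    observed_property="http://example.org/def/nitrate",
    result={"value": 12.5, "uom": "mg/L"},
)
print(obs.observed_property, obs.result)
```

Note that `feature_of_interest` and `result` are typed as `Any`, which mirrors the freedom the model leaves to data providers.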
Even if the aspects above are separated, the O&M model gives a lot of flexibility in defining the properties and this flexibility can condition interoperability when trying to combine data from different sources. The standard give us freedom to select among the different geometries provided by GML to define the featureOfInterest. The standard gives us even more freedom on the data collected that can have any imaginable structure.
That is the reason why the data model used to represent the data gathered by a Citizen Observatory needs to be carefully considered before even starting the first data collection campaign. Data models can be designed in UML for clarity, but they are later encoded in XML. XML is the only official encoding that O&M references on the OGC website ([OGC 10-025r1] Observations and Measurements - XML Implementation v2.0). Nevertheless, there is a JSON alternative discussed in an OGC Discussion Paper ([OGC 15-100r1] OGC Observations and Measurements – JSON Implementation) that does not represent an official position of the OGC but can be implemented anyway. As discussed later, the interpretation of long XML files might be too slow in web browsers; in this case, a JSON encoding is regarded as a good alternative for either O&M or the SensorThings API.
5.1. The GT20 examples
In the Ground Truth 2.0 project, we have been using the MiraMon implementation of O&M. This implementation assumes a simplified situation in which each observation can be represented by a single row in a CSV file or by a single record of a database table. Coordinates are represented as a single point. In this situation, we select which column names represent the phenomenonTime, the procedure (which actually includes the user name), and the featureOfInterest (the coordinates). The rest of the columns are considered part of the data record that needs to be provided as the result.
Section 8.2.1 of the [OGC 08-094r1] OGC® SWE Common Data Model Encoding Standard v2.0 describes a way to encode a DataRecord as an array of fields that can be numbers, strings, dates, etc. Under our simplifying assumption, this array is ideal for wrapping the properties of the observations that cannot be mapped to any other O&M aspect. This practice is consistent with section 7.2.8 of the SWE4CS discussion paper.
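The column selection described above can be sketched in Python as follows. This is only an illustration of the idea, not the MiraMon code: the function name, the sample CSV, its column names, and its coordinates are all hypothetical.

```python
import csv
import io


def row_to_observation(row, time_col, procedure_col, x_col, y_col):
    """Map one CSV row to an O&M-like dictionary.

    The named columns feed the O&M aspects; every remaining column becomes
    a field of the DataRecord that is carried as the result.
    """
    mapped = {time_col, procedure_col, x_col, y_col}
    fields = [{"name": name, "value": value}
              for name, value in row.items() if name not in mapped]
    return {
        "phenomenonTime": row[time_col],
        "procedure": row[procedure_col],
        "featureOfInterest": {
            "type": "Point",
            "coordinates": [float(row[x_col]), float(row[y_col])],
        },
        "result": {"type": "DataRecord", "field": fields},
    }


# Hypothetical CSV with one observation; names and values are made up.
SAMPLE = (
    "USER_ID,SAMPLEDATE,X,Y,NITRATE,LAND_USE\n"
    "22655,2018-12-07T15:00:00Z,16.5,59.6,1.50,Agriculture\n"
)
row = next(csv.DictReader(io.StringIO(SAMPLE)))
observation = row_to_observation(row, "SAMPLEDATE", "USER_ID", "X", "Y")
```

Only the columns that were not selected for another aspect (here NITRATE and LAND_USE) end up in the result.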
The following is an example of how a water quality observation is represented following the O&M model and encoded in XML.
<om:OM_Observation gml:id="vatten-fokus_2_1">
<om:type xlink:href="http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_ComplexObservation"/>
<om:procedure xlink:href="http://www.opengis.uab.cat/vatten-fokus/procedure/22655"/>
<om:observedProperty xlink:href="http://www.opengis.uab.cat/vatten-fokus/observedProperty"/>
<om:featureOfInterest xlink:href="http://www.opengis.uab.cat/vatten-fokus/featureOfInterest/2"/>
<om:result xsi:type="swe:DataRecordPropertyType">
<swe:DataRecord>
<swe:field name="CREA_DATE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Creation_Date">
<swe:value>07/12/2018 17:23</swe:value>
</swe:Text>
</swe:field>
<swe:field name="SITE_NAME">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Site_name">
<swe:value>Dunkershall. V¤gtrumma uppst¤ms.</swe:value>
</swe:Text>
</swe:field>
<swe:field name="LAND_USE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Land_use_in_the_immediate_surroundings">
<swe:value>Agriculture</swe:value>
</swe:Text>
</swe:field>
<swe:field name="BANK_VEGE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Bank_vegetation">
<swe:value>Grass</swe:value>
</swe:Text>
</swe:field>
<swe:field name="NITRATE">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/NITRATE">
<swe:uom/>
<swe:value>1.50</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="PHOSPHATE">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/PHOSPHATE">
<swe:uom/>
<swe:value>0.075</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="WATER_COLOR">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Estimated_water_colour">
<swe:value>Colourless</swe:value>
</swe:Text>
</swe:field>
</swe:DataRecord>
</om:result>
</om:OM_Observation>
The following is an example of how two air quality observations are represented following the O&M model and encoded in JSON.
{
"id":"meet-mee-mechelen_1_0",
"type" : "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_ComplexObservation",
"phenomenonTime" : "2017-11-19 17:20:00+01",
"resultTime" : "2017-11-19 17:20:00+01",
"procedure" : "http://www.opengis.uab.cat/meet-mee-mechelen/procedure/5",
"observedProperty" : "http://www.opengis.uab.cat/meet-mee-mechelen/observedProperty",
"featureOfInterest" : "http://www.opengis.uab.cat/meet-mee-mechelen/featureOfInterest/1",
"result": {
"type":"DataRecord",
"field":[
{
"name" : "CAMPAIGN",
"type" : "Text",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/field/CAMPAIGN",
"value" : "Oct-Nov2017"
},
{
"name" : "bc_aggr",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr",
"value" : "3155"
},
{
"name" : "bc_aggr_mi",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_mi",
"value" : "80"
},
{
"name" : "bc_aggr_ma",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_ma",
"value" : "16413"
},
{
"name" : "bc_aggr_st",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_st",
"value" : "3398"
},
{
"name" : "uncertaint",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/uncertaint",
"value" : "0.50"
}
]
}
},
{
"id":"meet-mee-mechelen_2_1",
"type" : "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_ComplexObservation",
"phenomenonTime" : "2017-11-19 17:20:06+01",
"resultTime" : "2017-11-19 17:20:06+01",
"procedure" : "http://www.opengis.uab.cat/meet-mee-mechelen/procedure/5",
"observedProperty" : "http://www.opengis.uab.cat/meet-mee-mechelen/observedProperty",
"featureOfInterest" : "http://www.opengis.uab.cat/meet-mee-mechelen/featureOfInterest/2",
"result": {
"type":"DataRecord",
"field":[
{
"name" : "CAMPAIGN",
"type" : "Text",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/field/CAMPAIGN",
"value" : "Oct-Nov2017"
},
{
"name" : "time_first",
"type" : "Text",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/field/time_first",
"value" : "2017-11-06 08:00:18+01"
},
{
"name" : "bc_aggr",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr",
"value" : "3382"
},
{
"name" : "bc_aggr_mi",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_mi",
"value" : "80"
},
{
"name" : "bc_aggr_ma",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_ma",
"value" : "17256"
},
{
"name" : "bc_aggr_st",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/bc_aggr_st",
"value" : "3663"
},
{
"name" : "number_of_",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/number_of_",
"value" : "25"
},
{
"name" : "number_o_1",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/number_o_1",
"value" : "13"
},
{
"name" : "mean_numbe",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/mean_numbe",
"value" : "7"
},
{
"name" : "uncertaint",
"type" : "Quantity",
"definition" :"http://www.opengis.uab.cat/meet-mee-mechelen/variable/uncertaint",
"value" : "0.50"
}
]
}
}
These examples were produced by SOS requests to this URL: http://www.ogc3.uab.cat/cgi-bin/CitSci/MiraMon.cgi?. A client connecting to this service can be found here: http://www.ogc3.uab.cat/gt20/.
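On the client side, a DataRecord of this kind can be flattened into a simple name/value dictionary. The sketch below assumes the JSON layout shown above, in which Quantity values arrive as strings; the function name is ours, and the embedded document is a shortened copy of the first observation.

```python
import json


def record_to_dict(observation):
    """Return {field name: value}, converting Quantity values to float."""
    values = {}
    for item in observation["result"]["field"]:
        value = item["value"]
        if item["type"] == "Quantity":
            value = float(value)
        values[item["name"]] = value
    return values


# Shortened copy of the first observation above (two of its six fields).
TEXT = """
{
  "id": "meet-mee-mechelen_1_0",
  "phenomenonTime": "2017-11-19 17:20:00+01",
  "result": {
    "type": "DataRecord",
    "field": [
      {"name": "CAMPAIGN", "type": "Text", "value": "Oct-Nov2017"},
      {"name": "bc_aggr", "type": "Quantity", "value": "3155"}
    ]
  }
}
"""
flat = record_to_dict(json.loads(TEXT))
```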
5.2. HackAir examples
To illustrate the flexibility of O&M, we have included this air quality report that shows how HackAir data is presented by a 52North SOS implementation. In this case, the result presents a single numerical value, while the other information is provided as parameters. This approach is consistent with section 7.2.2.5 of the O&M standard.
<om:OM_Observation gml:id="o_499">
<om:type xlink:href="http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement"/>
<om:phenomenonTime>
<gml:TimeInstant gml:id="phenomenonTime_499">
<gml:timePosition>2019-01-01T00:00:12.000Z</gml:timePosition>
</gml:TimeInstant>
</om:phenomenonTime>
<om:resultTime xlink:href="#phenomenonTime_499"/>
<om:procedure xlink:href="sensors_arduino_1000"/>
<om:parameter>
<om:NamedValue>
<om:name xlink:href="PM2.5_AirPollutantIndex"/>
<om:value xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">bad</om:value>
</om:NamedValue>
</om:parameter>
<om:parameter>
<om:NamedValue>
<om:name xlink:href="http://www.opengis.net/def/param-name/OGC-OM/2.0/samplingGeometry"/>
<om:value xmlns:ns="http://www.opengis.net/gml/3.2" xsi:type="ns:GeometryPropertyType">
<ns:Point ns:id="Point_sp_45C0E376C40E98E8EC0D48C05F7558C2FFD15245">
<ns:pos srsName="http://www.opengis.net/def/crs/EPSG/0/4326">52.063269625917 4.5077472925186</ns:pos>
</ns:Point>
</om:value>
</om:NamedValue>
</om:parameter>
<om:parameter>
<om:NamedValue>
<om:name xlink:href="source"/>
<om:value xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">sensors_arduino</om:value>
</om:NamedValue>
</om:parameter>
<om:parameter>
<om:NamedValue>
<om:name xlink:href="user"/>
<om:value xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">sID :1000</om:value>
</om:NamedValue>
</om:parameter>
<om:observedProperty xlink:href="PM2.5_AirPollutantValue" xlink:title="PM2.5_AirPollutantValue"/>
<om:featureOfInterest xlink:href="sensors_arduino_1000"/>
<om:result xmlns:ns="http://www.opengis.net/gml/3.2" uom="μg/m3" xsi:type="ns:MeasureType">130.67</om:result>
</om:OM_Observation>
A service producing this type of results can be seen here: https://nexos.demo.52north.org/52n-sos-hackair-webapp/service.
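A client could read an observation of this style with the Python standard library, as in the sketch below. The embedded document is a shortened copy of the example above with the namespace declarations added at the root (the fragment above omits them), and the function name is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {"om": "http://www.opengis.net/om/2.0"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """
<om:OM_Observation xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="o_499">
  <om:parameter>
    <om:NamedValue>
      <om:name xlink:href="PM2.5_AirPollutantIndex"/>
      <om:value xsi:type="xs:string">bad</om:value>
    </om:NamedValue>
  </om:parameter>
  <om:parameter>
    <om:NamedValue>
      <om:name xlink:href="source"/>
      <om:value xsi:type="xs:string">sensors_arduino</om:value>
    </om:NamedValue>
  </om:parameter>
  <om:observedProperty xlink:href="PM2.5_AirPollutantValue"/>
  <om:result uom="μg/m3" xsi:type="gml:MeasureType">130.67</om:result>
</om:OM_Observation>
"""


def summarize(xml_text):
    """Extract the measured value, its unit, and the plain-text parameters."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("om:result", NS)
    params = {}
    for named in root.findall("om:parameter/om:NamedValue", NS):
        name = named.find("om:name", NS).get(XLINK_HREF)
        value = named.find("om:value", NS)
        # Keep only plain-text values; geometry values have child elements.
        if len(value) == 0:
            params[name] = (value.text or "").strip()
    return {
        "observedProperty": root.find("om:observedProperty", NS).get(XLINK_HREF),
        "value": float(result.text),
        "uom": result.get("uom"),
        "parameters": params,
    }


summary = summarize(DOC)
```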
5.3. GROW example
In the GROW project the SME Hydrologic has developed a SOS service that uses an O&M observation. In this case, a single number is provided as the result of the observation and additional parameters are transported.
<OM_Observation xmlns="http://www.opengis.net/om/2.0">
<type gml:remoteSchema="http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement" />
<phenomenonTime>
<gml:TimePeriod>
<gml:beginPosition>2018-09-03T09:01:38.000Z</gml:beginPosition>
<gml:endPosition>2018-09-03T09:01:38.000Z</gml:endPosition>
</gml:TimePeriod>
</phenomenonTime>
<resultTime>
<gml:TimeInstant>
<gml:timePosition>2018-09-03T09:01:38.000Z</gml:timePosition>
</gml:TimeInstant>
</resultTime>
<procedure>Grow.Thingful.Sensors_je47sfac</procedure>
<observedProperty nilReason="Thingful.Connectors.GROWSensors.AirTemperature" />
<featureOfInterest nilReason="je47sfac" />
<result>20.64</result>
</OM_Observation>
5.4. Future work
So far we have seen 3 servers using 2 different approaches to represent the result. That is not a problem for a web service (that only outputs data), but it is not the best situation to ensure interoperability at the client side where an integrated client will need to react to any possible encoding variation and deliver the best result.
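To make the client-side burden concrete, the following Python sketch normalizes both result styles into one flat dictionary. It works on already-parsed observations held as dictionaries; the "parameters" key and the function name are assumptions of this sketch and do not correspond to any encoding defined by the standards.

```python
def normalize(observation):
    """Return a flat {name: value} view for both result styles."""
    result = observation["result"]
    if isinstance(result, dict) and result.get("type") == "DataRecord":
        # Style 1: every property travels inside the result.
        return {f["name"]: f["value"] for f in result["field"]}
    # Style 2: the result is a single value named by the observed property,
    # and the remaining information travels as parameters.
    flat = dict(observation.get("parameters", {}))
    flat[observation["observedProperty"]] = result
    return flat


record_style = {
    "observedProperty": "http://www.opengis.uab.cat/vatten-fokus/observedProperty",
    "result": {"type": "DataRecord",
               "field": [{"name": "NITRATE", "value": "1.50"}]},
}
measure_style = {
    "observedProperty": "PM2.5_AirPollutantValue",
    "parameters": {"source": "sensors_arduino"},
    "result": 130.67,
}
```

Every additional encoding variant a server introduces would need another branch of this kind in each integrated client.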
5.4.1. How to encode the procedure.
The SWE4CS Discussion Paper suggests encoding the procedure following a recommendation in section 6.18.1 of the Timeseries Profile of Observations and Measurements standard [OGC 15-042r5]. That recommendation describes an encoding, based on ISO metadata, for both the observation process and the operator of the sensor (in Citizen Science, the citizen making the observation). This approach would ensure a uniform way to report on these two important aspects of the observation.
Note: This approach has not been implemented during the IE but it is considered something we can experiment with in the future. An example of this procedure is provided in the SWE4CS document and reproduced here for convenience.
<om:procedure>
<tsml:ObservationProcess gml:id="op1">
<!-- processType defines observation performed by human with sensor -->
<tsml:processType
xlink:href="http://www.opengis.net/def/waterml/2.0/processType/Sensor"/>
<!-- processReference defines sampling protocol -->
<tsml:processReference
xlink:href="https://dyfi.cobwebproject.eu/skos/JapaneseKnotweedSamplingProtocol"/>
<!-- if a sensor is used, provide the link to the sensor definition here. Use
SensorML if possible -->
<tsml:parameter>
<om:NamedValue>
<om:name xlink:href="http://www.opengis.net/def/property/OGC/0/SensorType"/>
<om:value>http://www.motorola.com/XT1068</om:value>
</om:NamedValue>
</tsml:parameter>
<!-- operator defines the citizen scientist producing this observation -->
<tsml:operator>
<gmd:CI_ResponsibleParty>
<gmd:individualName>
<gco:CharacterString>Ingo Simonis</gco:CharacterString>
</gmd:individualName>
<gmd:organisationName>
<gco:CharacterString>OGC</gco:CharacterString>
</gmd:organisationName>
<gmd:role>
<gmd:CI_RoleCode
codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml"
codeListValue="resourceProvider"/>
</gmd:role>
</gmd:CI_ResponsibleParty>
</tsml:operator>
</tsml:ObservationProcess>
</om:procedure>
The result is quite verbose, which might affect performance when large amounts of data are transmitted.
5.4.2. Avoiding verbosity by defining a data stream
An approach based on providing a comma-separated recordset that is described only once at the beginning should be more compact and efficient to parse.
Section 8.4.3 of the [OGC 08-094r1] OGC® SWE Common Data Model Encoding Standard v2.0 describes a way to describe a DataStream only once and then send the data directly in CSV format using HTTP or another protocol. A similar solution would be worth testing in the future to increase performance.
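The idea can be sketched in Python as follows: the field descriptions are transmitted once, and the values follow as bare comma-separated rows. The field list, the sample rows, and the function name are hypothetical; the sketch does not reproduce the actual SWE Common DataStream encoding.

```python
import csv
import io

# Field descriptions sent once (hypothetical, loosely modelled on a DataRecord).
FIELDS = [
    {"name": "time", "type": "Time"},
    {"name": "NITRATE", "type": "Quantity"},
    {"name": "LAND_USE", "type": "Text"},
]

# Values sent afterwards as plain rows, without repeating the field names.
BLOCK = (
    "2018-12-07T15:00:00Z,1.50,Agriculture\n"
    "2018-12-08T09:30:00Z,0.75,Forest\n"
)


def decode(fields, block):
    """Pair each value in a row with the field description at that position."""
    records = []
    for row in csv.reader(io.StringIO(block)):
        record = {}
        for spec, raw in zip(fields, row):
            record[spec["name"]] = float(raw) if spec["type"] == "Quantity" else raw
        records.append(record)
    return records


records = decode(FIELDS, BLOCK)
```

Compared with the XML DataRecord examples, each additional observation costs one short line instead of a full set of field elements.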
6. SOS architectures
In this chapter, we describe three architectures tested in the IE that demonstrate end-to-end architectures as well as interoperability among servers and clients.
6.1. Architecture 1: SOS services integrated in a SOS client
In this architecture ([MiraMonSOSArchit]), the client directly accesses two different SOS services. It issues a GetFeatureOfInterest request to determine the positions of the individual observations, and a GetObservation request each time it needs to show a complete description of a single point (the user triggers this event by clicking on an icon) or to represent different icons as a function of the value of the observation. In this case, interoperability happens directly in the client. Since the SOS requests are sent from the client over the Internet, the client exposes the requests and responses, allowing people to explore the SOS protocol with both the map browser console and the browser developer tools.
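The two requests can be illustrated with the key-value pair (KVP) binding of SOS 2.0. The Python sketch below only builds the request URLs; the helper name is ours, and whether a particular server accepts exactly this version and parameter combination is an assumption that should be checked against its GetCapabilities response.

```python
from urllib.parse import urlencode


def sos_kvp_url(endpoint, request, **params):
    """Build an SOS KVP request URL for the given operation."""
    query = {"service": "SOS", "version": "2.0.0", "request": request}
    query.update(params)
    if endpoint.endswith(("?", "&")):
        separator = ""
    elif "?" in endpoint:
        separator = "&"
    else:
        separator = "?"
    return endpoint + separator + urlencode(query)


ENDPOINT = "http://www.ogc3.uab.cat/cgi-bin/CitSci/MiraMon.cgi?"

# First request: where are the observations?
features_url = sos_kvp_url(ENDPOINT, "GetFeatureOfInterest")

# Second request: full description of the observations at one feature.
observation_url = sos_kvp_url(
    ENDPOINT,
    "GetObservation",
    featureOfInterest="http://www.opengis.uab.cat/vatten-fokus/featureOfInterest/2",
)
```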
It is worth mentioning that this architecture is only possible if both services declare, in the headers of their responses, that they allow their data to be combined. By default, a web browser does not allow a script to read XML or JSON data programmatically from an Internet domain different from that of the client itself, unless the server states in the response headers that this is allowed. This mechanism is known as Cross-Origin Resource Sharing (CORS). The following headers allow CORS with anybody.
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS, DELETE
If a SOS server does not allow CORS, our client can still force a solution by redirecting the request to our own server with an extra parameter, ServerToRequest. Our server then cascades the request to the specified server and returns the response to the client as if only one server in a single domain were involved.
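A minimal Python sketch of this fallback is given below. Only the parameter name ServerToRequest comes from the text; the way the target address is passed as its value, the helper names, and the example.org hosts are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit


def cascaded_url(proxy_endpoint, target_endpoint, query):
    """Build a request to our own server asking it to cascade to the target."""
    params = dict(query)
    params["ServerToRequest"] = target_endpoint
    return proxy_endpoint + "?" + urlencode(params)


def same_origin(url_a, url_b):
    """Two URLs share an origin when scheme, host, and port all match."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


PAGE = "http://client.example.org/index.htm"
TARGET = "https://sos.example.org/service"
url = cascaded_url(
    "http://client.example.org/cgi-bin/proxy.cgi",
    TARGET,
    {"service": "SOS", "request": "GetObservation"},
)
```

From the browser's point of view, the cascaded request stays within the origin of the page, so no CORS headers are needed from the remote SOS server.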
6.2. Architecture 2: SOS services integrated in a combined agile service
In this architecture ([GrowSOSArchit]), a common server pulls two or more SOS requests into a central tabular datastore. This datastore records only the information from the returned SOS data that is required for the final visualization and removes information that is redundant, creating a data warehouse representing only one version of the data. In this approach, the interoperability happens internally in the datastore and the SOS requests and responses are not exposed to the final client.
In the diagram below ([GRowDataFlow]), data is stored in the data warehouse and Microsoft’s PowerBI does the heavy lifting for the visualization of the combined data sources.
The common server would typically be a cloud server, but for some clients this is not necessary. In the case of PowerBI, a data bridge is created between the data source and the visualization tool before the result is published to a web client.
Other visualization tools (such as Tableau) will have their own methods of connecting to the data warehouse and publishing the results to a web-based client.
In this architecture, data is only as up-to-date as the latest data pull from the SOS servers; in the case of GROW, this is done nightly, but this could be made more frequent or moved towards real time using a data log pipeline in a kappa architecture.
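The "one version of the data" behaviour of such a datastore can be sketched with SQLite from the Python standard library. This is not the GROW implementation: the table layout, the function names, and the choice of SQLite are assumptions, and the rows would in practice come from parsed SOS responses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the tabular store and create the observation table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS observation (
               source TEXT NOT NULL,
               observation_id TEXT NOT NULL,
               phenomenon_time TEXT NOT NULL,
               observed_property TEXT NOT NULL,
               value REAL,
               PRIMARY KEY (source, observation_id)
           )"""
    )
    return db


def load(db, source, rows):
    """Insert or overwrite rows so that each observation is stored only once."""
    db.executemany(
        "INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?, ?)",
        [(source, r["id"], r["time"], r["property"], r["value"]) for r in rows],
    )
    db.commit()


db = open_store()
pulled = [{"id": "o_499", "time": "2019-01-01T00:00:12.000Z",
           "property": "PM2.5_AirPollutantValue", "value": 130.67}]
load(db, "hackair", pulled)
load(db, "hackair", pulled)  # a repeated pull does not duplicate the row
count = db.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```

The composite primary key is what lets a nightly pull be repeated safely; a visualization tool then reads from this single table instead of from the SOS servers.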
6.3. Architecture 3: SOS service for interoperability and JSON API for fast client
This architecture was especially optimized to support the development of lightweight Sensor Web applications. This is achieved by avoiding direct XML encoding/decoding on the client device. Instead, the interactions between the client (in this case the 52°North Helgoland Sensor Web Viewer) and the server components take place via a REST and JSON interface (the 52°North Sensor Web API).
This API can be directly exposed by Sensor Web servers such as the 52°North implementation. Alternatively, an available proxy component is also able to encapsulate existing OGC SOS servers behind the lightweight interface of the 52°North Sensor Web API.
The advantage of this approach is a more lightweight communication pattern to be implemented on the client side. In addition, the 52°North Sensor Web API offers further convenience methods as well as functionalities for reducing the transferred data volume (by generalizing observation data) and improving the data visualization (e.g., providing rendering hints). A drawback of this approach is a less direct interaction with SOS servers, so that a proxy component has to be configured or adjusted in order to integrate new SOS servers.
7. SOS servers
In this chapter we describe four SOS servers tested in the IE.
7.1. 52 North solution
The 52°North Sensor Web Server comprises several server-side modules which closely interact to provide different kinds of data access functionality. In detail, the server comprises the following elements.
-
Data storage: The database for storing the observation data is integrated through an object-relational mapping layer based on the Hibernate framework. This allows the flexible integration of different types of database management systems (e.g., PostgreSQL, Oracle, MS SQL Server, MySQL) and data models. For this IE, PostgreSQL was used.
-
For the access to observation data, the server offers three dedicated modules, which use the same common Sensor Web database.
-
SOS: The SOS module offers a comprehensive implementation of the OGC Sensor Observation Service 2.0 standard (including, beyond the core, several extended functionalities as well as transactional and result handling operations). It also offers several interoperability enhancements, such as support for the INSPIRE Technical Guidance on the SOS as a Download Service.
-
SensorThings API: In addition to the SOS support, a dedicated module is available for supporting the OGC SensorThings API Part 1: Sensing (not yet evaluated as part of this IE).
-
52°North Sensor Web API: Complementary to the previous modules, the 52°North Sensor Web API is also offered. This API offers an additional, but optional, convenience layer for building client applications. While both the SOS and the SensorThings API standards are well suited for enabling interoperable access to observation data, the Sensor Web REST-API provides additional functionality that significantly facilitates the development of client applications. Typical examples of this additional functionality are: generalization of observation data (important for developing mobile applications), provision of rendering hints (e.g., styling information for time series), and conversion of data to mainstream formats such as CSV.
-
The URL of the instance used in this IE was: https://nexos.demo.52north.org/52n-sos-hackair-webapp/service.
More information about the initiative can be found here: https://52north.org/software/software-projects/sos/
7.2. istSOS
istSOS (Istituto Scienze della Terra Sensor Observation Service) is an OGC SOS server implementation written in Python. istSOS allows managing and dispatching observations from monitoring sensors according to the Sensor Observation Service standard.
istSOS evolved over time from being a SOS service provider to a complete data management system. The standard, however, does not account for a number of functionalities that were later included in the software. Some of these extended capabilities are:
-
Handling of irregular time series;
-
On-the-fly aggregation of observed measures with no-data management;
-
Capability to filter observations based on partial observed property names (LIKE filtering support); and
-
Native support for data validation and data quality index associated with each observation.
The project also provides a graphical user interface that eases daily operations and a RESTful Web API for automating administration procedures.
The URL of the instance used in this IE was: http://artemis.geogr.uni-jena.de/istsos_ie/soil?service=SOS&request=GetCapabilities&version=1.0.0
More information about the initiative can be found here: http://istsos.org/
7.3. Comparison of 52 North SOS and istSOS implementations
52 North and istSOS are both RESTful implementations of the OGC SOS standard. We provide a short comparison of these tools, which might help end-users choose a tool based on their requirements.
52 North and istSOS both support all core operations defined in the OGC SOS specification. 52 North SOS is a Java-based implementation, whereas istSOS is Python-based. Because they are written in different programming languages, the application servers that can host them differ: 52 North SOS can be deployed on Tomcat, Jetty, or Glassfish, while istSOS requires Apache with mod_wsgi. Both implementations have a RESTful web interface and support JSON, XML, and plain text for data encoding. 52 North offers the KVP, SOAP, POX, and EXI bindings; istSOS offers KVP and SOAP. Both run on the major operating systems (Windows, Mac, and Linux) and support Postgres/PostGIS as the underlying database management system. The 52 North implementation also supports other databases such as Oracle, Microsoft SQL Server, and MySQL, and provides multilingual support for querying data. istSOS offers automatic notifications via email, Twitter, or other social media. Both SOS implementations provide a user-friendly graphical interface that includes a built-in client, data viewer, and data manager. Both come with detailed documentation and examples, and both have a friendly support community that is easily reachable via email and a mailing list where users can post questions.
7.4. MiraMon SOS Server
MiraMon Server is a standalone CGI application that runs on Windows operating systems and can be used in combination with a web server such as Internet Information Server or Apache for Windows. It is the ideal solution for people who already use MiraMon professional on the desktop because it uses the same MiraMon formats in the back-end. MiraMon Server is based on the same libraries that are used by MiraMon professional and has the same capabilities in terms of CRS support, interpolation algorithms, MMZX compression, etc. One particularity of the software is the internal tiled schema required to serve maps and tiles in a fast and scalable way. MiraMon Server uses OGC web services as a baseline for the interaction with the client. Currently, MiraMon Server provides support for the following standards:
-
Web Map Service (all versions)
-
Web Map Tile Service (all versions)
-
Web Coverage Service (version 1.0)
-
Web Feature Service (version 2.0)
-
Web Processing Service (version 1.0)
-
Sensor Observation Service (version 2.0)
The SOS capability is used in tandem with the Web Feature Service and uses the same MiraMon topologically-structured formats in the back-end. It was developed in the Ground Truth 2.0 project to serve interoperable data from the Citizen Observatories created during that project. The current implementation is incomplete and only supports the GetFeatureOfInterest and GetObservation operations, with limited capabilities. The objective of these minimum capabilities was to meet the requirements of a viewer client that needs to represent a map of the features of interest provided by the service and to allow a query at a point to get more information about the observations at that point. Each dataset in MiraMon becomes an observedProperty in the SOS service. Each observation is a position in a PNT file with an associated DBF record that is automatically transformed into an O&M DataRecord. Internally, it is possible to mark field names in the DBF that are associated with concepts in O&M, such as the phenomenon time and the user name.
Below is the structure of the DBF table, followed by the small REL5 document that supplies the extra information the server requires.
file name: C:\inetpub\SIWeb\gt20\VattenFokus\VattenFokusT.dbf
Last update on: 24-01-2019
Number of records: 254
Number of fields per record: 32
Character set: Windows ANSI (88, 0x58)
Field characteristics:
-------------------------------------------------------------------------------------------
NUM | NAME | DESCRIPTOR | T | SIZ | REL
-------------------------------------------------------------------------------------------
1 | ID_GRAFIC | Identificador Gràfic ID | N | 3 |
2 | USER_ID | User ID | N | 5 |
3 | SAMPLE_ID | Sample ID | N | 5 |
4 | CREA_DATE | Creation Date | C | 16 |
5 | CHAN_DATE | Modification date | C | 16 |
6 | SAMPLEDATE | Sample date | C | 16 |
7 | GROUP_ID | Group ID | C | 34 |
8 | SITE_NAME | Site name | C | 111 |
9 | Sample_date_time | Sample date/time | C | 16 |
10 | N_PARTICIPANT | Total number of participants | N | 3 |
11 | NOTES | Notes | C | 276 |
12 | WATER_TYPE | Freshwater body type | C | 7 |
13 | OTHER_WATER_TYPE | Other freshwater body type | C | 50 |
14 | LAND_USE | Land use in surroundings | C | 17 |
15 | OTHER_LAND_USE | Other land use in surrounding | C | 84 |
16 | BANK_VEGE | Bank vegetation | C | 36 |
17 | OTHER_BANK_VEGE | Other bank vegetation | C | 74 |
18 | ON_WATER | On the water surface | C | 30 |
19 | POPUT_SOURCES | Pollution in surroundings | C | 46 |
20 | WaterUses | Evidence of water uses | C | 33 |
21 | OTHER_WATER_USE | Other evidence of water uses | C | 10 |
22 | AQUATIC_LIVE | Evidence of aquatic life | C | 69 |
23 | OTHER_AQUATIC_LIVE | Other evidence of aq. life | C | 38 |
24 | ALGUE | Algae presence | C | 16 |
25 | WATER_FLOW | Estimated the water flow | C | 7 |
26 | WATER_LEVEL | Estimated water level | C | 7 |
27 | NITRATE | Nitrate | N | 4 |
28 | PHOSPHATE | Phosphate | N | 5 |
29 | TURBIDITY | Water Quality Turbidity | C | 7 |
30 | RESULT | Result | N | 3 |
31 | WATER_COLOR | Estimated water colour | C | 10 |
32 | OTHER_WATER_COLOR | Other estimated water colour | C | 43 |
[VERSIO]
Vers=5
SubVers=0
[GetObservation]
GetObsservation_Vers=5
GetOBservation_SubVers=0
Fitxer=MeetMeeMechelenT.rel
CampDataHoraFenomen=time_last
CampNomSensor=street_nam
Final representation as XML O&M of the same structure:
<?xml version="1.0" encoding="ISO-8859-1"?>
<sos:GetObservationResponse xmlns:sos="http://www.opengis.net/sos/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:om="http://www.opengis.net/om/2.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:swe="http://www.opengis.net/swe/2.0">
<sos:observationData>
<om:OM_Observation gml:id="vatten-fokus_2_1">
<om:type xlink:href="http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_ComplexObservation"/>
<om:procedure xlink:href="http://www.opengis.uab.cat/vatten-fokus/procedure/22655"/>
<om:observedProperty xlink:href="http://www.opengis.uab.cat/vatten-fokus/observedProperty"/>
<om:featureOfInterest xlink:href="http://www.opengis.uab.cat/vatten-fokus/featureOfInterest/2"/>
<om:result xsi:type="swe:DataRecordPropertyType">
<swe:DataRecord>
<swe:field name="SAMPLE_ID">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/SAMPLE_ID">
<swe:uom/>
<swe:value>45821</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="CREA_DATE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Creation_Date">
<swe:value>07/12/2018 17:23</swe:value>
</swe:Text>
</swe:field>
<swe:field name="CHAN_DATE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Modification_date">
<swe:value>07/12/2018 17:23</swe:value>
</swe:Text>
</swe:field>
<swe:field name="SAMPLEDATE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Sample_date">
<swe:value>07/12/2018 15:00</swe:value>
</swe:Text>
</swe:field>
<swe:field name="GROUP_ID">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Group_ID">
<swe:value>Dunkern, Group ID: 38438</swe:value>
</swe:Text>
</swe:field>
<swe:field name="SITE_NAME">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Site_name">
<swe:value>Dunkershall. V¤gtrumma uppst¤ms.</swe:value>
</swe:Text>
</swe:field>
<swe:field name="Sample_date_time">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Sample_date/time">
<swe:value>07/12/2018 15:00</swe:value>
</swe:Text>
</swe:field>
<swe:field name="N_PARTICIPANT">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/N_PARTICIPANT">
<swe:uom/>
<swe:value>1</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="NOTES">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Notes">
<swe:value>+2 grader C.</swe:value>
</swe:Text>
</swe:field>
<swe:field name="WATER_TYPE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Freshwater_body_type">
<swe:value>Other</swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_WATER_TYPE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_freshwater_body_type">
<swe:value>Dike</swe:value>
</swe:Text>
</swe:field>
<swe:field name="LAND_USE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Land_use_in_the_immediate_surroundings">
<swe:value>Agriculture</swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_LAND_USE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_the_land_use_in_the_immediate_surroundings">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="BANK_VEGE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Bank_vegetation">
<swe:value>Grass</swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_BANK_VEGE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_bank_vegetation">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="ON_WATER">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/On_the_water_surface">
<swe:value>None</swe:value>
</swe:Text>
</swe:field>
<swe:field name="POPUT_SOURCES">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Pollution_sources_in_the_immediate_surroundings">
<swe:value>Other</swe:value>
</swe:Text>
</swe:field>
<swe:field name="WaterUses">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Evidence_of_water_uses">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_WATER_USE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_evidence_of_water_uses">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="AQUATIC_LIVE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Evidence_of_aquatic_life">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_AQUATIC_LIVE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_evidence_of_aquatic_life">
<swe:value></swe:value>
</swe:Text>
</swe:field>
<swe:field name="ALGUE">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Algae_presence">
<swe:value>No algae</swe:value>
</swe:Text>
</swe:field>
<swe:field name="WATER_FLOW">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Estimated_the_water_flow">
<swe:value>Surging</swe:value>
</swe:Text>
</swe:field>
<swe:field name="WATER_LEVEL">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Estimated_water_level">
<swe:value>Average</swe:value>
</swe:Text>
</swe:field>
<swe:field name="NITRATE">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/NITRATE">
<swe:uom/>
<swe:value>1.50</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="PHOSPHATE">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/PHOSPHATE">
<swe:uom/>
<swe:value>0.075</swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="TURBIDITY">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Water_Quality_Secchi_Tube_(Turbidity).">
<swe:value>&lt;14</swe:value>
</swe:Text>
</swe:field>
<swe:field name="RESULT">
<swe:Quantity definition="http://www.opengis.uab.cat/vatten-fokus/variable/RESULT">
<swe:uom/>
<swe:value></swe:value>
</swe:Quantity>
</swe:field>
<swe:field name="WATER_COLOR">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Estimated_water_colour">
<swe:value>Colourless</swe:value>
</swe:Text>
</swe:field>
<swe:field name="OTHER_WATER_COLOR">
<swe:Text definition="http://www.opengis.uab.cat/vatten-fokus/field/Other_estimated_water_colour">
<swe:value></swe:value>
</swe:Text>
</swe:field>
</swe:DataRecord>
</om:result>
</om:OM_Observation>
</sos:observationData>
...
</sos:GetObservationResponse>
7.5. GROW SOS server implementation
The GROW SOS service is tightly integrated into the GROW platform, which is built around HydroLogic’s existing HydroNET 4 platform. The SOS 2.0 service runs concurrently with the GROW standards API in the HydroNET GROW Server.
The service implements two .NET packages that disseminate GROW data according to the SOS 2.0 standard within the GROW instance of the HydroNET 4 Server. An SOS package covers the mapping of SOS 2.0 requests to GROW requests and the mapping of GROW data structures to SOS 2.0 structures. A second package, Ogc.Wrapper.Entities, contains the SOS 2.0 entity definitions.
The Base URL of the GROW SOS 2.0 service: http://grow-beta-api.hydronet.com/api/service/sos
SOS 2.0 defines, among others, the following four operations: GetCapabilities, DescribeSensor, GetObservation, and GetFeatureOfInterest.
7.5.1. GetCapabilities
This operation lists all available metadata in the service and provides a detailed list of all other operations that are provided in the service itself. It provides the information you need to execute other operations within the SOS 2.0 effectively.
7.5.2. GetObservation
This operation gives the client access to observation data from sensors. The data returned depends on the parameters the client supplies.
The GetObservation operation requires the following parameters.
-
Procedure: The identifier of the sensor. The procedure can be found in the GetCapabilities response.
-
ObservedProperty: The parameter that you want to query data for. A sensor can provide data for multiple parameters (e.g., soil moisture, temperature). The ObservedProperties available for a sensor can be found in the GetCapabilities response or in the DescribeSensor response of the relevant sensor.
-
TemporalFilter: The timespan for which you want to query data. Datetimes follow the ISO 8601 standard. The full timespan of data that the sensor provides can be found in the GetCapabilities or DescribeSensor (for the relevant sensor) response.
7.5.3. GetFeatureOfInterest
This operation provides information about features of interest (name, description, coordinates, etc.) of a sensor or an observation.
The GetFeatureOfInterest operation requires the following parameter:
-
procedure: The identifier of the sensor. The procedure can be found in the GetCapabilities response.
The SOS 2.0 service in GROW provides two response types: XML and JSON. The client can provide ResponseFormat in the parameters to define which response is desired. If no ResponseFormat is given, the service returns XML by default.
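The parameters described above can be sketched as a request URL. This is a minimal illustration, not the documented GROW interface: it assumes key-value-pair encoding with the parameter spellings commonly used in SOS 2.0 KVP requests, an `om:phenomenonTime` value reference in the temporal filter, and `application/json` as the ResponseFormat value. The procedure and observed property identifiers are hypothetical placeholders; real values come from the GetCapabilities response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL as given in the text above.
GROW_SOS_URL = "http://grow-beta-api.hydronet.com/api/service/sos"


def build_get_observation_url(procedure, observed_property, start, end,
                              response_format=None, base_url=GROW_SOS_URL):
    """Build a GetObservation URL (KVP style; parameter spellings are assumed)."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "procedure": procedure,
        "observedProperty": observed_property,
        # ISO 8601 time period written as start/end.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    if response_format is not None:
        # Omitting this parameter leaves the service on its XML default.
        params["responseFormat"] = response_format
    return f"{base_url}?{urlencode(params)}"


url = build_get_observation_url(
    "sensor-123",      # hypothetical procedure identifier
    "SoilMoisture",    # hypothetical observed property
    "2019-06-01T00:00:00Z",
    "2019-06-02T00:00:00Z",
    response_format="application/json",  # assumed value for JSON output
)
print(url)
# Decoding the query string again shows the values the server would receive.
print(parse_qs(urlsplit(url).query)["temporalFilter"])
```

Building the query with `urlencode` keeps reserved characters such as ":" and "/" in the time period correctly percent-encoded.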
8. SOS clients
In this chapter we describe three SOS integrated clients tested in the IE.
8.1. Grow Client
The GROW client is a demonstrator application showing how data sources can be combined in a tabular data warehouse and how off-the-shelf visualization tools can be used to create rich visualizations. In this demonstrator, Microsoft’s Power BI was used, although other commercial tools such as Tableau or Spotfire could be used instead. Free tools such as Grafana, RAWGraphs, or Apache Superset are also an option.
The diagram below shows a typical output using Power BI, showing GROW sensor locations combined with hackAIR sensor locations.
In this application, data is pulled from two SOS sources (GROW and hackAIR) via a Python script, usually overnight. This data is stored in a tabular data warehouse. In this prototype the warehouse is just a set of flat files; however, a full-scale data warehouse such as Microsoft’s Analysis Services or a tabular database such as Cassandra could be used.
Data from the warehouse is pulled nightly into Microsoft’s cloud and then published to a web page that makes heavy use of JavaScript to provide a rich, explorable interface.
This approach can be extended to mix different types of data into one visualization. In the figure below, gridded data (locations of sensors) is combined with time series data so that the user can explore the data more fully. In this application, when a user clicks on a sensor, the time series data from that sensor is displayed, together with its average, minimum, and maximum.
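The per-sensor summary described above (average, minimum, and maximum of a time series) can be sketched as follows. The demonstrator computes these values inside the visualization tool; this stand-alone function, its name, and the sample readings are assumptions made for illustration only.

```python
from statistics import mean


def summarize_series(values):
    """Return average, minimum, and maximum of a series, ignoring missing samples."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nothing to summarize
    return {
        "average": mean(present),
        "minimum": min(present),
        "maximum": max(present),
    }


# Hypothetical readings from one sensor; None marks a missing sample.
readings = [21.0, 23.5, None, 19.5, 24.0]
summary = summarize_series(readings)
print(summary)
```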
8.2. MiraMon Client
The MiraMon Map Browser is a long-term development effort to create a visualization, analysis, and download tool that runs in modern web browsers. Based on HTML5 and JavaScript, it uses OGC web service protocols to connect to web services and show the information to the user. The objective of this development is to assign as much work as possible to the web browser and its JavaScript engine, keeping interactions with the server to a minimum and transferring information in a format that is as raw as possible. This approach may seem surprising, since many applications today prefer to perform processing in the cloud rather than on the client machine. Most of the time, the MiraMon Map Browser is directly responsible for creating the visualization on-the-fly from the raw data, allowing the user to change visualization properties, perform analysis, compute statistics, or build time series directly on the client side. In the Data quality estimations on the client side section, we show how the same principle can be used to compute overall dataset quality on-the-fly as well.
Below are the main functionalities and the standards used to achieve them.
-
Raster visualization and query by location is possible using OGC Web Map Service and Web Map Tile Service.
-
Raster analytics is possible by using the OGC Web Map Service in a special way that transmits binary arrays of values (raw or RLE encoded) instead of pictorial representations. Pixel-based analysis is performed directly in JavaScript on the client side and visualized on-the-fly.
-
Vector visualization and query by location is possible using OGC Web Feature Service and Sensor Observation Service. The client accepts both XML and JSON formats.
-
Data download is limited to the use of Web Coverage Service v.1.0.
Several layers of data coming from different servers and using different protocols can be overlaid simultaneously. Some layers represent data from a single dataset while others can be virtual datasets computed on-the-fly for each zoom and pan and created by combining data from more than one dataset and server.
Both WFS and SOS visualization are currently limited to points that are represented in an HTML5 canvas as icons, circles, and texts; this could easily be extended to lines and polygons in the near future. These functionalities were particularly useful to show occasional observations made by citizens in different places. We were able to show Ground Truth 2.0 observations together with hackAIR observations using the SOS protocol directly in XML and in JSON. Some of the visualization functionalities were improved during the IE, such as the capability to condition the color of a circle on an attribute (or an observed result) of the feature. This way, it was possible to represent the concentration level of a pollutant as a colored circle using a color code that was shown in the legend.
Representing the positions of the observations requires only the GetFeatureOfInterest operation unless the visualization of the feature depends on the result of the observation. In the latter case, a GetObservation is performed and all the information on the features is loaded in the client, making query by location independent of the server.
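Conditioning the color of a circle on an observed result, as described above, amounts to a lookup of the value in a set of class breaks. The sketch below illustrates the idea in Python rather than in the browser's JavaScript; the break values, colors, and function name are hypothetical and do not come from the MiraMon Map Browser.

```python
from bisect import bisect_right

# Hypothetical class breaks for a pollutant concentration (in µg/m³)
# and one legend color per class (one more color than breaks).
BREAKS = [10.0, 25.0, 50.0]
COLORS = ["#2ca25f", "#fee08b", "#f46d43", "#a50026"]


def circle_color(concentration):
    """Pick the legend color of the class that contains the concentration.

    A value equal to a break falls into the higher class.
    """
    return COLORS[bisect_right(BREAKS, concentration)]


for value in (5.0, 10.0, 42.0, 75.0):
    print(value, circle_color(value))
```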
8.3. 52°North Helgoland Sensor Web Viewer
The Helgoland Sensor Web Viewer developed by 52°North is an open source visualization tool for different kinds of Sensor Web data. It allows exploration of available observation data sets and visualization of the actual data (e.g., time series) as diagrams.
Figure 1 shows the map view of the Helgoland Sensor Web Viewer. In this case, a sample data set of the hackAIR project is explored. The map view shows the locations at which air quality sensors are located.
After selecting a specific measurement location, the data can be visualized as a diagram (see Figure 2). It is possible to combine data from multiple sensors, multiple observed properties, and even from different providers into a single diagram.
8.4. SOS Technology Integration Experiments
During the IE, a set of Technology Integration Experiments (TIEs) was conducted. In a client-server architecture, a TIE is a test that combines a server with a client and demonstrates that communication between them is possible and that the user (operating the client) is able to see or get some data. The following table summarizes the tests conducted and the degree of success achieved.
Servers | Data | MiraMon Client | Helgoland Client | Grow Client |
---|---|---|---|---|
MiraMon | Ground Truth 2.0 | Yes | | |
Helgoland | HackAir | Yes | | |
istSOS | HackAir | Yes | Yes | |
Grow | Grow | Yes | | |
9. Data quality estimations on the client side
One of the main concerns in using and adopting citizen science-based data is the quality of observations. Citizen Observatories (and, by extension, Citizen Science) are particularly sensitive to data quality because the number of contributors is larger and more heterogeneous than in a traditional data survey campaign. An additional difficulty is that active Citizen Observatories receive continuous inputs and updates from citizens. Ground Truth 2.0 has developed a tool, integrated in the MiraMon Map Browser, that documents the quality of datasets in order to increase trust in the information collected by citizens.
The tool requires that data is exposed on the Web as a service using SOS. It offers a set of tests, such as positional accuracy, attribute consistency, or confusion matrix, that can be applied to a complete dataset or to the area the user is visualizing. Results include an overall quality indicator for the dataset.
The Ground Truth 2.0 Data Quality tool uses an interoperable approach based on QualityML that allows parametrization of the different statistics that are used to assess the quality of the data, and it focuses on data quality indicators for Citizen Science datasets from the QualityML list. The quality module is encoded in JavaScript and has been made available as part of the web based MiraMon Map Browser (https://github.com/joanma747/MiraMonMapBrowser).
9.1. Quality estimation on vector data
The SOS protocol and its GetObservation operation enable a client to retrieve all the information about the results of the observations. With this data, the client can perform all sorts of analyses on the observations, including the application of some quality checks. This section discusses a pilot carried out in the Ground Truth 2.0 project that demonstrates this capability in some practical cases.
The selected cases and their implementation are based on the QualityML vocabulary. Rapidly growing geodata catalogues require tools that help users choose among products. QualityML is a dictionary that contains hierarchically structured concepts to precisely define and relate quality levels, from quality classes to quality measurements. These levels are used to encode quality semantics for geospatial data by mapping them to the corresponding metadata schemas. For data producers, the benefits of encoded quality semantics are improved product discovery and better communication of product characteristics. Data users can better compare quality and uncertainty measures in order to select the most suitable data and to perform dataset intercomparison. Encoded quality semantics also allow other components (such as visualization, discovery, or comparison tools) to be quality-aware and interoperable. On the one hand, QualityML is a profile of the ISO geospatial metadata standards (e.g., ISO 19157), providing a set of rules for precisely documenting quality measure parameters that are structured in 5 levels. On the other hand, QualityML includes semantics and vocabularies for the quality concepts. Whenever possible, QualityML uses statistical expressions from the UncertML dictionary (http://www.uncertml.org). However, QualityML also extends UncertML to provide a list of alternative metrics that are commonly used to quantify quality beyond the uncertainty concept.
9.1.1. How data quality is presented
Datasets can have precomputed data quality indicators associated with them. These indicators are part of the metadata of the datasets, but in the map browser they have a prominent place in the quality option in the context menu of the layer name in the legend.
Data quality indicators are presented following the QualityML model. For a quality class (in Figure 16, "Thematic classification correctness"), there can be one or more quality measures (in Figure 16, "Misclassification") that are obtained by applying some metrics (in Figure 16, "MeanAbsolute and StandardDeviation") over a domain (in the first entry of Figure 16, "Omission Error" over the categories "water" and "no water").
Every concept used here is linked to the QualityML vocabulary, where more details about the concept can be found.
9.1.2. How to start computing data quality
The MiraMon Map Browser described in the section SOS clients allows for computing some data quality indicators. To start the process, select the appropriate option in the context menu that opens by clicking on the layer name in the legend.
This option opens a dialog box that offers a short list of four quality indicators that will grow with new tests.
9.1.3. Case 1: Positional accuracy of the layer from observation uncertainties
Many Citizen Science projects use a mobile phone to collect observations. In this process, they use the location capabilities of the phone, including GPS, 3G triangulation, Wi-Fi antenna location, or IP address registration. Each of these methods has a different known positional accuracy, and the phone is able to estimate the accuracy at the same time as it estimates the position. In this case, we assume that the individual observations have a position and some estimation of the positional uncertainty, and that these values are recorded by the service and offered as properties of the observation.
To compute this indicator, we should select the property associated with the observation that contains the positional uncertainty.
The calculated data quality parameter is not shown immediately but is added to the previously recorded data quality indicators.
The result is a quality report that can be found in the quality option in the context menu that opens by clicking on the layer name in the legend.
Several points are worth noting here. First, the scope is not the full dataset but the view used for calculating the quality indicator: "Dataset fragment of this area: x=[-1.22,4.76], y=[40.36,42.97]." Second, the statement reports that not all observations have positional uncertainties: "There are 140 of 254 that does not have uncertainty information." Third, the accuracy is reported as a half-length confidence interval with a confidence level of 0.683 (one standard deviation for a normal distribution). An uncertainty of 180.39 m is not particularly good; it reflects the heterogeneity of the methods used to calculate the positions of the observations, some of which have large uncertainties.
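A report of this kind can be sketched as follows. The text does not state how the tool aggregates the individual uncertainties, so the root mean square used here is an assumption, as are the function name and the sample values. The sketch also counts the observations that carry no uncertainty, mirroring the statement quoted in the report.

```python
from math import sqrt


def positional_accuracy(uncertainties_m):
    """Summarize per-observation positional uncertainties given in metres.

    None marks an observation without uncertainty information.
    The aggregate is a root mean square (an assumed choice).
    """
    known = [u for u in uncertainties_m if u is not None]
    rms = sqrt(sum(u * u for u in known) / len(known)) if known else None
    return {
        "total": len(uncertainties_m),
        "missing": len(uncertainties_m) - len(known),
        "rms_m": rms,
    }


# Hypothetical uncertainties reported by phones for five observations.
report = positional_accuracy([3.0, 4.0, None, 12.0, None])
print(report)
```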
9.1.4. Case 2: Logical consistency of the thematic attributes
Many Citizen Science projects provide citizens with comprehensive instructions on how to conduct observational tasks. In some cases, observations are limited to a set of possibilities from a list. In more complex cases, the selection of an option in a first list limits the values available in a second list. Sometimes apps control user inputs, preventing citizens from entering a value that is not listed in the instructions, but in some cases (such as bulk input from a CSV file), there might be no controls, and unwanted values or incompatible value combinations could end up in the database.
In the case of the RitmeNatura citizen observatory, we rely on the Natusfera software, which is designed for biodiversity in general and allows any possible scientific name, while RitmeNatura asks for a limited set of species. If nobody filters the results, there is a chance that observations report species not included in the RitmeNatura subset.
The logical consistency test counts how many observations are not consistent with a controlled list of possibilities. To compute this indicator, we select the property (or properties) associated with the observation that are constrained by a controlled list and list the possible combinations of attributes. In this simple case, we test whether the scientific name is compatible with the list of possibilities described in the legend.
A new quality indicator will be added to the list of quality indicators related to this layer.
Only 129 of the 249 species scientific names are consistent with the legend. In addition, 5 observations have no scientific name (probably because the observer did not know the name).
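The counting behind this indicator can be sketched as a membership test against the controlled list. The function name, the species list, and the sample observations below are hypothetical; the actual test runs in JavaScript inside the map browser.

```python
def logical_consistency(values, allowed):
    """Count values that are in the controlled list, outside it, or missing."""
    allowed_set = set(allowed)
    counts = {"consistent": 0, "inconsistent": 0, "missing": 0}
    for value in values:
        if value is None or not value.strip():
            counts["missing"] += 1
        elif value.strip() in allowed_set:
            counts["consistent"] += 1
        else:
            counts["inconsistent"] += 1
    return counts


# Hypothetical controlled list (the legend) and reported scientific names.
legend = ["Prunus dulcis", "Papilio machaon", "Hirundo rustica"]
reported = ["Prunus dulcis", "Quercus ilex", None, "Hirundo rustica ", ""]
result = logical_consistency(reported, legend)
print(result)
```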
9.1.5. Case 3: Temporal validity of the observation date
One very simple quality control that can be performed is to check if the observations have an associated date, if the date is in the right format, and if the date is in a range of plausible values.
In this example, we test whether the observations were made after the year 2000, because we know there should not be observations before this date.
A new quality indicator will be added to the list of quality indicators related to this layer.
In this case we see that all of the observations have passed the test.
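The three checks named above (a date is present, it is well formed, and it falls in a plausible range) can be sketched as follows. The threshold of 1 January 2000 is one reading of "after the year 2000", and the function name and sample values are hypothetical.

```python
from datetime import date, datetime


def temporal_validity(timestamps, earliest=date(2000, 1, 1)):
    """Classify observation dates as valid, out of range, malformed, or missing."""
    counts = {"valid": 0, "out_of_range": 0, "malformed": 0, "missing": 0}
    for text in timestamps:
        if not text:
            counts["missing"] += 1
            continue
        try:
            # Accepts ISO 8601 strings such as 2019-05-14 or 2019-05-14T10:30:00.
            moment = datetime.fromisoformat(text)
        except ValueError:
            counts["malformed"] += 1
            continue
        if moment.date() >= earliest:
            counts["valid"] += 1
        else:
            counts["out_of_range"] += 1
    return counts


# Hypothetical observation dates as they might arrive from a service.
sample = ["2019-05-14T10:30:00", "1999-12-31", "14/05/2019", None, ""]
checked = temporal_validity(sample)
print(checked)
```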
9.1.6. Case 4: Validity of the positions of observations (by bounding box)
One very common mistake in data gathering projects is the presence of observations in places that do not make much sense. Typical mistakes are swapping the latitude and longitude values or simply having observations in the middle of the Atlantic Ocean at the 0,0 position.
In this case, we run a test to find how many observations are inside the Catalonian bounding box.
A new quality indicator will be added to the list of quality indicators related to this layer.
The result identifies 35 observations in this view that are clearly outside the boundaries of Catalonia.
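A bounding box test of this kind, extended to flag the two typical mistakes mentioned above, can be sketched as follows. The bounding box coordinates are rough approximations chosen for illustration, not the values used by the tool, and the function names are hypothetical.

```python
# Approximate bounding box of Catalonia as (min_lon, min_lat, max_lon, max_lat).
# These numbers are an assumption made for this sketch.
CATALONIA_BBOX = (0.15, 40.5, 3.35, 42.9)


def inside_bbox(lon, lat, bbox):
    """Return True when the position lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def classify_position(lon, lat, bbox=CATALONIA_BBOX):
    """Label a position as inside, at 0,0, probably swapped, or outside."""
    if inside_bbox(lon, lat, bbox):
        return "inside"
    if lon == 0 and lat == 0:
        return "zero position"
    if inside_bbox(lat, lon, bbox):
        return "swapped"  # valid only after exchanging longitude and latitude
    return "outside"


for lon, lat in [(2.17, 41.39), (41.39, 2.17), (0.0, 0.0), (-3.70, 40.42)]:
    print(lon, lat, classify_position(lon, lat))
```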
9.2. Quality estimation on raster data
As explained before, the WMS protocol can be used to transport binary arrays instead of pictures. During this IE, we implemented a functionality that can be used to compare two categorical maps with the same legend. This comparison results in a new map with all combinations of the two maps' categories, allowing us to discover changes between these maps.
This can be used to compare maps but also to quality control maps if we assume that one map represents the truth.
9.2.1. Confusion matrix
In this exercise we combine one land cover map created from OpenStreetMap (OSM) with another one created by remote sensing (RS).
The process of creating a confusion matrix starts by requesting the combination of both maps in a single layer where the pixels contain classes that are all possible combinations of the legend categories. In Figure 32, the Coimbra version is the map generated from OSM while the CREAF-RS version is the map created by remote sensing. The result of the combination is shown in Figure 32. In principle, the number of combinations is 25; however, only 5 colors are present, corresponding to the classes that are the same in both maps.
Now we can request the confusion matrix as a statistical summary of the combination by selecting the option in the context menu.
The diagonal values of the matrix (represented in green) correspond to the pixels that have the same value in both maps. The off-diagonal values are the pixels whose classes differ between the two maps. We can also see some information about the most similar classes (artificial surfaces and forest and semi natural areas) as well as the Kappa coefficient, which is 0.81 (the closer to 1 the better).
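The confusion matrix and the Kappa coefficient can be sketched from two aligned arrays of pixel classes. This sketch assumes that the reported coefficient is Cohen's kappa; the class names are shortened, and the nine pixels are made-up toy data, not the maps discussed above.

```python
from collections import Counter


def confusion_matrix(map_a, map_b, classes):
    """Count class pairs pixel by pixel; rows follow map_a, columns map_b."""
    pairs = Counter(zip(map_a, map_b))
    return [[pairs[(a, b)] for b in classes] for a in classes]


def cohen_kappa(matrix):
    """Agreement between the two maps, corrected for chance agreement."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


classes = ["artificial", "forest", "water"]
# Toy pixel values, invented for illustration.
osm = ["artificial", "artificial", "forest", "forest", "water",
       "artificial", "forest", "water", "water"]
rs = ["artificial", "forest", "forest", "forest", "water",
      "artificial", "forest", "water", "forest"]

matrix = confusion_matrix(osm, rs, classes)
print(matrix)
print(cohen_kappa(matrix))
```

The diagonal of `matrix` holds the pixels on which both maps agree; every other cell is a disagreement between a row class and a column class.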
A manual exploration of the dataset allows discovery of a big purple area in artificial surfaces from the OSM and forest and semi natural areas from the RS map.
The discrepancy makes sense. A big park in the city is identified as artificial in the OSM version, which is more focused on land use, while it is seen as a forest area from remote sensing due to its green land cover.
9.3. Future work
There are some points that the authors of this chapter believe are worth developing or exploring.
-
In the implementation of the confusion matrix, there is no connection to QualityML. This connection should be made.
-
Highlighting the observations that were detected as less accurate could be an interesting feature to have.
-
We would like to be able to share the quality assessments with other users. One possibility is using the OGC Geospatial User Feedback standard to report data quality assessments and share those assessments with other users. Saving the quality report in the NiMMbus database (www.opengis.uab.cat/nimmbus) implemented in the NextGEOSS project will allow this sharing.
-
The computations done in the MiraMon Map Browser cover just a small subset of the QualityML vocabulary. We would like to extend the implementation to cover a wider range of possibilities.
-
QualityML is a vocabulary for data quality. The OGC definitions server presented in Definitions Server is a generic tool to share vocabularies. Translating QualityML into a format that can be ingested by the Definitions Server should be a priority of the next IE.
10. Definitions Server
The Definitions Server is a service that allows for storing, querying, and linking definitions. The Definitions Server provides a common way to resolve terms published by the OGC and to get details of definitions (instead of downloading large complex documents in varying formats).
Currently, the Definitions Server has a complete set of terms that have been defined by the OGC since the inception of the OGC Naming Authority - which aims to keep all such URL references consistent. In Citizen Science, we can find hundreds of projects dealing with similar topics, but it is difficult to know if they are collecting variables that can be directly compared. By trying to link to a previous definition on the server, terms or variables become connected to other projects. By exposing their definitions in the Definitions Server, other citizen science projects can reuse the same definitions and methodologies.
The Definitions Server has the ability to get machine readable versions of definition details (e.g., JSON to allow simple integration of details into Web and mobile applications). The Definitions Server has a flexible capability to cross-link between terms and the ability to use any information model to extend available details. Further, the Definitions Server allows for per-term or as-package download.
In this IE, the Definitions Server was improved and presented by the OGC to the other participants. As of November 2019, the Definitions Server API is undergoing an upgrade to comply with the emerging W3C Recommendation for "Content Negotiation by Profile" [https://www.w3.org/TR/dx-prof-conneg/]. In the next phase of the IE, we will test the applicability of the Definitions Server for Citizen Science purposes.
10.1. What the Definitions Server does
The OGC Definitions Server is a Web-accessible source of information about things ("concepts") the OGC defines or that communities ask the OGC to host on their behalf. It applies FAIR principles (Findable, Accessible, Interoperable, and Reusable) to the key concepts that underpin interoperability in systems using OGC specifications and standards. These concepts can be anything that is important in the course of interoperability around spatial information where the OGC plays a role in facilitating common understanding - either through publishing standards or assisting communities to share related concepts. OGC uses stable web addresses (URIs) to unambiguously identify concepts in its standards. The Definitions Server makes those URIs "work" - i.e., makes the URIs dereference to a definition that can be used.
The OGC Naming Authority manages the Definitions Server to ensure all URIs are stable with transparent governance. These identifiers can thus be safely used in external contexts. All content is freely available for re-use. Re-use is envisaged largely through the machine-readable versions.
Examples of content in the Definitions Server include the OGC glossary, technical terms from application schemas (for example, the HY schema from the hydrology domain: https://www.opengis.net/def/appschema/hy_features/hyf/HY_HydroFeature), and many others.
Even though only limited search capability is currently provided, the Definitions Server is implemented using Linked Data principles - so the combination of stable URIs allowing references to be made from outside and "follow your nose" navigation via links from one concept to related concepts provides enhanced findability.
The Definitions Server does not make any assumptions about the client software that may be used now or in the future other than the use of HTTP protocols. This enhances accessibility for different environments.
The "Web-friendly" way of using an identifier (i.e., a URL) to get more information is augmented by "content negotiation" - the Definitions Server can deliver both user friendly Web pages and other forms of resource representations, e.g., JSON-LD or Turtle (TTL).
Figure 38 shows different views of the resource HY_Feature. The left panel shows an HTML representation, the middle panel shows the same information in TTL, and the right panel shows it in JSON. All three representations have the same content but differ in their serialization format. This allows human users to explore the OGC Definitions Server and machines to process its content.
10.2. Interoperability in the Definitions Server
The interoperability of these resources is a key goal. There are several aspects of this handled using different mechanisms:
-
Content model: can the client understand how the data is structured;
-
Encoding: can the client parse the response; and
-
Interaction: how can a client ask for the form it needs?
10.2.1. Content interoperability
The identifiers mentioned above, i.e., the URLs that can deliver content to the user, are termed Concepts and are organized into ConceptSchemes and Collections. Concept, ConceptScheme, and Collections are defined by SKOS. SKOS, the Simple Knowledge Organization System, is a common data model for sharing and linking knowledge organization systems via the Web. SKOS is a W3C Recommendation.
So why SKOS? Many knowledge organization systems, such as thesauri, taxonomies, classification schemes, and subject heading systems, share a similar structure and are used in similar applications. Even though these systems might share exact semantics, you need to learn the semantic relationships by explicitly discovering, accessing, and evaluating the content. Without a standardized interface, this endeavor is labor-intensive and can hardly be executed by machines.
SKOS captures much of this similarity and makes it explicit. SKOS enables data and technology sharing across diverse applications by providing a lightweight, intuitive language for developing and sharing knowledge. In most cases, existing knowledge can be transformed into SKOS, because the SKOS data model provides a standard, low-cost migration path for porting existing knowledge organization systems.
10.2.2. Encoding Interoperability
The Definitions Server currently offers a range of encodings for all terms: 1. HTML, 2. JSON (using JSON-LD augmentations to specify URLs), 3. RDF (as XML, TTL, or JSON-LD), and 4. plain text.
Where applicable, certain types of resources are also available in the original or additional formats. For example, Application Schemas are available in XML schema (XSD) and UML (XMI) forms.
10.3. Using the Definitions Server
10.3.1. URI access
Access of definitions by following any URI is supported.
The server will respond with an HTTP 303 redirect to the current service interface appropriate to the requested profile (view) and format.
http://www.opengis.net/def/docs/03-003r10 ⇒ HTTP 303 Location: http://defs.opengis.net/elda-common/ogc-def/resource?uri=http://www.opengis.net/def/docs/03-003r10
(the actual final resource URL may change as we improve the interface - but the original URI will always work)
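Dereferencing a definition URI with a preferred format can be sketched with the Python standard library. The sketch only builds the request; `text/turtle` is the registered media type for Turtle, but whether the server honors this exact Accept value is an assumption, and the function name is hypothetical.

```python
from urllib.request import Request


def build_definition_request(uri, media_type="text/turtle"):
    """Prepare a GET request that asks for a given representation of a definition."""
    return Request(uri, headers={"Accept": media_type})


request = build_definition_request("http://www.opengis.net/def/docs/03-003r10")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))

# Sending the request is left out of this sketch. Passing it to
# urllib.request.urlopen would follow the HTTP 303 redirect automatically,
# and the final address could then be read from response.geturl().
```

Because the redirect target may change over time, a client should store the original URI rather than the address it was redirected to.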
10.3.2. ConceptSchemes, Collections, and Semantics
Every term belongs to a "ConceptScheme" which will usually be part of the path.
Each part of the path ending with "/" will represent a Collection that contains a list of members, for example: http://www.opengis.net/def/docs policy:collectionView http://www.opengis.net/def/docs/.
Terms may also have a non-overlapping set of broader/narrower relationships, with the top of each hierarchy linked via skos:hasTopConcept from the ConceptScheme.
The ConceptSchemes support the following linkages:
-
ConceptSchemes are the "unit of governance" where metadata and download links for sets of definitions can be accessed;
-
Collections are a flexible nested way of listing related subsets of terms - where lists may overlap - but do not state semantic relationships between terms;
-
Terms are the basic resources with definitions; and
-
Terms may be semantically related using broader/narrower and other match relationships (e.g., skos:exactMatch).
10.3.3. Search
A basic search capability is provided via the underlying interface: e.g., http://defs.opengis.net/elda-common/ogc-def/concept?labelcontains=Catchment.
This search capability provides machine-readable outputs, if requested, via the _format parameter or the HTTP Accept: header, e.g., https://defs.opengis.net/elda-common/ogc-def/concept?labelcontains=Catchment&_format=ttl.
Searches may be constrained to a specific concept scheme:
(note URL encoding is required for parameters with URI values - browsers tend to do this automatically)
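The search request and the note on URL encoding can be sketched as follows. The `labelcontains`, `_format`, and `uri` parameters and both endpoint addresses are taken from the examples in this chapter; the function names are hypothetical, and the parameter that constrains a search to a concept scheme is deliberately not shown.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://defs.opengis.net/elda-common/ogc-def/concept"
RESOURCE_URL = "http://defs.opengis.net/elda-common/ogc-def/resource"


def build_search_url(label_contains, output_format=None):
    """Build a label search, optionally asking for a machine-readable format."""
    params = {"labelcontains": label_contains}
    if output_format is not None:
        params["_format"] = output_format
    return f"{SEARCH_URL}?{urlencode(params)}"


def build_resource_url(uri):
    """Pass a URI as a parameter value; urlencode percent-encodes ':' and '/'."""
    return f"{RESOURCE_URL}?{urlencode({'uri': uri})}"


print(build_search_url("Catchment", "ttl"))
print(build_resource_url("http://www.opengis.net/def/docs/03-003r10"))
```

A browser usually applies this encoding on its own, whereas a script has to do it explicitly, which is why the parameters are built with `urlencode` instead of string concatenation.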
10.3.4. Downloading Data
Every term includes a link to an "alternates" view.
(This link can be accessed by qualifying any Definitions Server hosted URIs with _view=alternates or _profile=alternates)
A W3C compliant view for the specific concept (not the dataset as a whole) can be accessed with _profile=all.
This view lists available formats for both the individual term and the collection or package that defines it:
ConceptSchemes offer download options for original sources of definitions - for example an Application Schema will have a download link for the canonical UML model file.
Collections allow lists of concepts to be downloaded.
Concepts allow simple packages of information about the concept itself to be accessed.
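Qualifying a definition URI with `_view=alternates` or `_profile=alternates` can be done mechanically. The helper below is a sketch under the assumption that the qualifier is an ordinary query parameter; `with_param` is a hypothetical name, and any existing value of the same parameter is replaced rather than duplicated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(uri: str, name: str, value: str) -> str:
    """Return uri with query parameter `name` set to `value`, keeping other parameters."""
    parts = urlsplit(uri)
    # Keep every existing parameter except the one being set
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_param("http://www.opengis.net/def/docs/03-003r10", "_view", "alternates"))
```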
11. User and Application Federation
In the IE, we experimented with the federated identity provider developed in the project H2020 LandSense as an Authorization Server that enables a federation of applications with user Single-Sign-On.
The LandSense project contributed the Engagement Platform (https://lep.landsense.eu/Project/LEP), the H2020 Scent project contributed the Scent Harmonisation Platform Visualisation Site (https://scent-harm.iccs.gr/) as well as the Scent Explore and Measure mobile applications (https://scent-project.eu/scent-toolbox), and the H2020 NextGEOSS project contributed the NiMMbus Geospatial user feedback system (https://www.opengis.uab.cat/nimmbus/). The aforementioned platforms and applications were able to work together and use the LandSense Authorization Server to authenticate users and create a Single-Sign-On experience. From the user perspective, once logged in to one of the platforms, the user could use the other two platforms in a transparent way without having to authenticate again.
The federation is designed in a way that it is compliant with the GDPR EU regulation and the user is in full control of which information is released, to whom, and for a particular purpose. When registering an application with the Authorization Server, the operator of the application must declare which personal information of the user is required. At one end of the spectrum, an application can be registered to not require any personal information. On the other end of the spectrum, an application can request personal information and the user must approve the flow of the personal information to the application the first time the application is used. Once approved, the user can revoke the approval, which will stop the application from obtaining personal information.
11.1. The LandSense Authorization Server
The Authorization Server (AS) lets users log in through a variety of login providers including social media, organizations, and academic institutions participating in eduGAIN. Based on the trust in the login providers, registered applications, services, tools and Application Programming Interfaces (APIs) can be used by operating an RFC 6750 compliant Open Authorization 2 (OAuth2) Resource Server that accepts either JSON Web Token (JWT) or Bearer Access Tokens from any LandSense compliant OAuth2/OpenID Authorization Server.
The Authorization Server is extensible to any other login provider as long as it is compliant with the federation requirement regarding the participation as a login provider: deployment of a Security Assertion Markup Language v2 (SAML2) compliant Identity Provider. The LandSense Coordination Centre digitally signs and hosts the SAML2 metadata for the LandSense federation by which trust is established between the SAML2 Identity Providers and the Authorization Server.
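On the Resource Server side, the first step under RFC 6750 is to read the access token from the `Authorization: Bearer <token>` request header. The sketch below shows only that step, with hypothetical function names. Actual validation (checking a JWT signature and expiry, or verifying an opaque token with the Authorization Server) is deliberately left out, and the JWT check is a shape heuristic rather than a verification.

```python
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value, else None."""
    if not authorization_header:
        return None
    parts = authorization_header.strip().split()
    # The header value is the "Bearer" scheme followed by exactly one token
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]


def looks_like_jwt(token: str) -> bool:
    """Heuristic only: a signed JWT in compact form has three non-empty dot-separated parts."""
    segments = token.split(".")
    return len(segments) == 3 and all(segments)


token = extract_bearer_token("Bearer abc.def.ghi")
print(token, looks_like_jwt(token))
```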
The LandSense Authorization Server acts as a GDPR compliant broker between the personal information received after a user’s login and registered applications based on user approval. In order to honor GDPR data minimization, the AS requests from the Identity Provider (IdP) at login only the personal information that is required by a registered application. The amount of information (and which attributes, in detail) is controlled by the registration / login level. The AS provides the following levels, of which the first two do not enable an application to obtain personal information: AUTH, CRYPTONAME, SAML, PROFILE, EMAIL, and PROFILE+EMAIL. It can be extended to other levels like ADDRESS and PHONE.
11.1.1. Levels of personal information
AUTH Any application that is registered with this level need not be GDPR compliant, as there is no information about the user other than "yes we know that you have successfully logged in with one of the trusted IdPs." After login with Level AUTH, the user will not see any personal information.
An example application could be a simple geo-fencing service where there are only inside/outside of polygons.
CRYPTONAME Any application that is registered with this level will receive a cryptoname for the user. This cryptoname is unique across all trusted IdPs and generated after a successful login. Based on the concept of creating a cryptoname for a user, it can be guaranteed that the identifier is still correct after more IdPs join. At the current state, LandSense federation trusts approximately 2850 Identity Providers worldwide.
The cryptoname is not stored at the Authorization Server, which ensures that no personal information can be obtained from possession of the cryptoname alone. This allows applications to cluster (group) user contributions without knowing the real identity of the user. Because only this clustering is possible, a registered application processing just the cryptoname need not be GDPR compliant. After login with Level CRYPTONAME, you will see your cryptoname as the value of the personal claim sub.
An example API registered with CRYPTONAME level could generate quality indices on citizen science contributions based on user contributions stored at participating Resource Servers without knowing the actual identity of the original user(s). Even though a Resource Server stores more personal information, only the cryptoname would be released (with the actual data), as the level(s) can be verified with the Authorization Server.
SAML This level allows the registered application to obtain metadata about the Identity Provider that the user used to log in: country, federation identifier, and name of the federation the IdP is registered with. This level can be used as an add-on for the levels PROFILE and EMAIL.
For an application that operates with CRYPTONAMEs, this level is a valuable add-on as it allows the user to be associated with one IdP.
PROFILE Any application that is registered with this level will be able to receive personal information as defined in the OpenID Connect specification for the scope profile (https://openid.net/specs/openid-connect-core-1_0.html) after the user has given their approval. Any application operating on this level must be fully GDPR compliant, which means that the registration process requires a URL containing the privacy statement of the application. This privacy statement defines which personal information is requested, for which purpose, and which operators will be able to also process the personal information. After login with Level PROFILE, you will see the cryptoname plus all available personal information that fall into the scope profile.
EMAIL Any application that is registered with this level will be able to receive personal information as defined in the OpenID Connect specification for the scope email (https://openid.net/specs/openid-connect-core-1_0.html) after the user has given their approval. Any application operating on this level must be fully GDPR compliant, which means that the registration process requires a URL containing the privacy statement of the application. This privacy statement defines which personal information is requested, for which purpose, and which operators will be able to also process the personal information. After login with Level EMAIL, you will see the cryptoname plus all available personal information that falls into the scope email.
PROFILE+EMAIL This is a combination of scopes PROFILE and EMAIL. After login, you see your cryptoname, email address, whether it is validated, and all the personal information received for scope profile.
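The data minimization idea behind these levels can be sketched as a filter over the claims available for a user. This is an illustration, not the Authorization Server's implementation. The profile and email claim names are the standard claims of OpenID Connect Core, and the sketch assumes that level EMAIL corresponds to the OpenID Connect email scope. The three `idp_*` claim names for level SAML are hypothetical placeholders, and treating "+" as a way to combine levels is an assumption modelled on PROFILE+EMAIL.

```python
from typing import Any, Dict, Set

# Standard claims of the OpenID Connect "profile" and "email" scopes
PROFILE_CLAIMS = {
    "name", "family_name", "given_name", "middle_name", "nickname",
    "preferred_username", "profile", "picture", "website", "gender",
    "birthdate", "zoneinfo", "locale", "updated_at",
}
EMAIL_CLAIMS = {"email", "email_verified"}
# Hypothetical claim names for the IdP metadata released at level SAML
SAML_CLAIMS = {"idp_country", "idp_federation_id", "idp_federation_name"}

LEVEL_CLAIMS: Dict[str, Set[str]] = {
    "AUTH": set(),                        # only "the user is logged in"
    "CRYPTONAME": {"sub"},                # cryptoname carried in the sub claim
    "SAML": SAML_CLAIMS,                  # add-on: metadata about the IdP
    "PROFILE": {"sub"} | PROFILE_CLAIMS,
    "EMAIL": {"sub"} | EMAIL_CLAIMS,
}


def released_claims(level: str, user_info: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the claims an application registered at `level` may receive."""
    allowed: Set[str] = set()
    for part in level.split("+"):         # e.g. "PROFILE+EMAIL"
        allowed |= LEVEL_CLAIMS[part]
    return {k: v for k, v in user_info.items() if k in allowed}


user = {"sub": "c0ffee", "name": "Ada", "email": "ada@example.org", "email_verified": True}
print(released_claims("CRYPTONAME", user))
```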
11.1.2. How to register an application
Any application (mobile, web browser based, native, or API) that supports OpenID Connect can be registered with the LandSense Authorization Server (OpenID Connect Provider). Assuming successful registration, the application can then use other provisioning and offerings from other registered APIs leveraging access tokens.
11.2. LandSense Engagement Platform
In a nutshell, the LandSense Engagement Platform (https://lep.landsense.eu) is to become the marketplace where citizens can participate in the various Land Use and Land Cover (LULC) related campaigns and interested parties can reuse existing services and register new applications.
The first version of the LandSense Engagement Platform was realized based on the existing tools, services, and platforms from LandSense partners as well as new applications built for the Demo Cases.
11.3. Scent Harmonization Platform
Scent Harmonisation Platform Visualisation Site (https://scent-harm.iccs.gr/) is a client application tailored for the purposes of inspecting and visualizing traditional in-situ and citizen-generated observations.
The Visualisation site constitutes a custom innovative application that exposes the resources made available from the Scent Harmonisation Platform. The application conforms to OGC SensorThings API standard and it consists of the following main characteristics.
-
User-friendly interfaces enabling both time-series analysis and spatial representation of SensorThings API resources including graphical visualizations with filtering capabilities per observed phenomenon and sensor as well as low-level interaction with the Harmonisation Platform SensorThings-driven schema.
-
An interactive campaign dashboard that enables the spatial visualization and graphic representation of the images of Land Cover/ Land Use elements that have been collected from the volunteers in the context of the project’s citizen science campaigns.
Scent Harmonisation Platform manages a variety of citizen-generated data as well as environmental data that has been collected through in-situ monitoring stations in the Kifisos river basin, Attica, Greece. All the data are being maintained and have been structured according to widely accepted standards, such as those from the OGC, in order to be compliant with open and unified frameworks (such as SensorThings API).
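To illustrate the kind of SensorThings API request a client such as the Visualisation Site could issue when filtering by observed phenomenon, the sketch below builds a request URL with `$`-prefixed query options. The base URL and the property name "Water level" are placeholders rather than actual Scent values, and `sensorthings_query` is a hypothetical helper.

```python
from urllib.parse import quote


def sensorthings_query(base_url: str, entity: str, **options: str) -> str:
    """Build a SensorThings API URL; keyword names become $-prefixed query options."""
    query = "&".join(
        # Encode spaces and other unsafe characters, keep OData punctuation readable
        "$" + name + "=" + quote(value, safe="/'(),;=$")
        for name, value in options.items()
    )
    url = base_url.rstrip("/") + "/" + entity
    return url + ("?" + query if query else "")


# Placeholder service root and observed property name (assumptions)
print(sensorthings_query(
    "https://example.org/v1.0",
    "Datastreams",
    filter="ObservedProperty/name eq 'Water level'",
    top="5",
))
```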
Details regarding the integration of Scent Harmonisation Platform Visualisation Site with LandSense authorization server are provided in Integration between SCENT & LandSense.
11.4. Scent Explore
Scent Explore is a mobile application that enables citizens to capture environmental related information. It provides a user-friendly interface through which citizens are guided to areas where essential environmental information is needed. There, they may collect images of LC/LU elements along with textual descriptions, measure water level and flow velocity, and report flood related events like the existence of obstacles in the river, flooded locations, etc. Citizens use the app in a playful way, by discovering and collecting little characters hiding in places around them and thus collecting points.
Details regarding the integration of Scent Explore with LandSense authorization server are provided in Integration between SCENT & LandSense.
11.5. Scent Measure
Scent Measure is a mobile application that works in tandem with a portable smart sensor (Xiaomi International Version Flower Care Smart Monitor) connected to the user’s mobile device aiming to measure soil conditions. Users can simply insert the sensor into the ground and select whether to measure and report soil moisture levels and/or air temperature and receive the measurements directly to the app.
The app constitutes an Android application that has been developed with Java as it enables easy system modeling and has support for many cross-platform software libraries. The Scent Measure application can be easily modified/adapted to support any kind of smart measuring sensors providing they have a Bluetooth connection interface with the portable devices and Bluetooth support for the message exchange from the sensor to the portable device.
Details regarding the integration of Scent Measure with LandSense authorization server are provided in Integration between SCENT & LandSense.
11.6. NiMMBus Geospatial User Feedback
The NiMMbus web portal records geospatial user feedback about existing geospatial resources. The user is able to provide comments, ratings, quality reports, and publications related to a geospatial resource. The portal can be used to comment on datasets but also on individual observations. The system allows creation of a citation of an external resource (in an external catalogue or repository) and association of feedback items with it. The system builds upon a service developed in the H2020-funded NextGEOSS project. Registered as a web browser-based application with the LandSense Authorization Server, the application can be used to collect user feedback on resources provided by other Resource Servers (APIs) also registered with the LandSense Authorization Server.
The system is based on NiMMbus, a solution for storing geospatial resources on the MiraMon cloud. The system implements the Geospatial User Feedback (GUF) standard developed in the OGC GUF (and started in the FP7-funded GeoViQua project).
The solution is composed of three elements: the open source code for a JavaScript client, a server that stores the feedback information, and a well-documented API that allows for interacting with the client.
12. Connecting Citizen Science data sets to GEOSS
In addition to the developments outlined above, this IE also examined possibilities for the Citizen Science community to make their projects discoverable and accessible via the GEOSS.
Note
The worldwide effort to build GEOSS is led by the Group on Earth Observations (GEO). GEO is an intergovernmental organization working to improve the availability, access, and use of Earth observations for the benefit of society. GEO works to actively improve and coordinate global Earth Observation systems and promote broad, open data sharing.
We see a major benefit in establishing this connection because GEOSS already provides an established process to clarify data policies together with established data management principles and to provide the minimum metadata required to access the data sets created by Citizen Science projects. This activity thereby helps to surface and mobilize already existing data sets - with clear acknowledgment of the Citizen Science contributions. We expect that this work provides concrete contributions to the increasing discussions on how Citizen Science could connect to GEOSS and, more generally, how more in-situ data and derived knowledge could become available at the global level.
Within this IE, we particularly explored the possible technical connection with the GEOSS Platform facilitated by OGC standards. We also identified organizational structures that would be required in order to provide a more flexible and scalable solution to Citizen Science projects, both big successful ones and small ones. We consider this as a major need for the future evolution of the Citizen Science contribution because of the high number of already existing projects that could potentially be connected to GEOSS. OGC standards can play an essential role to facilitate this connection. This elaboration would be also generalized to ongoing debates on increasing the availability of in-situ data in GEOSS.
Notably, we see the connection with GEOSS as one highly promising way to make Citizen Science data better accessible and more widely used. For example, complementary efforts might be undertaken, to increase the findability via mainstream search engines, or – more generally – to provide the machine readable information about projects resources that is required for automated harvesting by web-crawlers. We come back to this issue in the final part of this section, when outlining possible follow up activities.
With our exercise so far, we consider the following overarching principles and carry them to the Citizen Science community.
-
GEO Data Management Principles: addressing issues such as discovery, access, traceability, quality documentation, and preservation.
-
Not favoring any silos, i.e., need for open data and leveraging open solutions, which in this context means the provision of data as part of the GEOSS Data Core, but also the application of the GEO Architecture Principles, which in addition advocate flexibility, scalability, etc.
-
Distributed, standards-based, and flexible to support interoperability while meeting a range of use cases and needs, which translates to the use of the GEOSS Discovery and Access Broker (GEO DAB) that includes – among other things - the support to a large range of OGC standards.
12.1. Connecting a single Citizen Science project to GEOSS
The most straightforward way to connect Citizen Science data sets to GEOSS is the inclusion of the project that collects these data sets at the GEOSS Platform. This process essentially requires the registration of the project in the GEOSS ‘Yellow Pages’ and it can be used to register multiple data sets from one single project (see also Figure 49 below).
The entry of the Yellow Pages requires information (metadata) such as:
-
Lead organization (name, description, URL, geographical coverage, GEO affiliation, contact points, etc.);
-
Type of provided online resource (data, vocabulary, model, algorithm, etc.) – in the context of this Interoperability Experiment we focus on data;
-
Data policy, including the option to declare as free and open by choosing “GEOSS Data Core” (see Figure 50 below);
-
Indication of GEOSS data management principles that are implemented (see Figure 50 below);
-
Relevance for the Sustainable Development Goals; and
-
A service endpoint.
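A project could check its own entry for completeness before submitting it. The structure below is only a sketch of the metadata items listed above. The class and field names are hypothetical and do not reflect the actual schema of the GEOSS Yellow Pages.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class YellowPagesEntry:
    """Hypothetical container mirroring the metadata items listed above."""
    lead_organization: str = ""
    resource_type: str = ""          # e.g. "data"
    data_policy: str = ""            # e.g. "GEOSS Data Core"
    data_management_principles: List[str] = field(default_factory=list)
    sdg_relevance: List[str] = field(default_factory=list)
    service_endpoint: str = ""

    def missing_fields(self) -> List[str]:
        """Names of items that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


entry = YellowPagesEntry(
    lead_organization="Example Citizen Observatory",
    resource_type="data",
    data_policy="GEOSS Data Core",
)
print(entry.missing_fields())
```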
Once completed, the information provided by the Citizen Science project is passed on to the GEO DAB team. Members of this team check the entries and run some tests (for example, whether the provided endpoint actually serves the intended data, whether the endpoint implements the required standards (from the OGC or comparable alternatives), and whether the endpoint indeed follows the indicated principles and applies the indicated data policy). This testing might entail a dialogue with the registering Citizen Science project to make sure that the project’s offering fits the promises of the entry in the GEOSS Yellow Pages.
SCENT Citizen Observatory successfully undertook the process to offer its citizen-generated data to GEOSS. Following the administrative registration that was described in detail above, the Interoperability Registration and Brokering workflow took place. This registration consisted of the following steps.
-
Technological Information was provided to the DAB team. More specifically the SCENT web server URL and the endpoints of the SCENT OGC web services (WMS, WFS) were sent to the DAB team along with the accompanying descriptions. In addition, information was provided about WFS and WMS services versions, the supported relevant operations through the APIs (e.g., GetCapabilities, DescribeFeatureType, GetFeature, etc.), the various feature types, the SCENT data to be integrated to GEOSS (i.e., event, images, video metadata), and the associated geographic regions that the services cover (i.e., Kifisos river basin, Attica Greece and Danube Delta, Romania).
-
DAB team conducted a set of interoperability tests with SCENT WFS and WMS services, including discoverability, accessibility, and visualization use cases. A test report (GEO DAB/SCENT web server Brokering Test Report) was generated as an output of this process.
-
Notification was received from DAB team regarding the successful conduct of the tests and that no interoperability issues were found. Taking into consideration suggestions in the test report, metadata information for the datasets provided was further enriched.
-
As a final step of the process, the DAB team proceeded with the successful integration of the SCENT web server into the GEOSS Portal (offering its services to the production environment). Thus, users can access SCENT resources via the GEOSS Portal catalogue (https://www.geoportal.org/?f:sources=wfsscentID%2CwmsSCENTID) as presented in Figure 51.
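The operations named in the first step (GetCapabilities, DescribeFeatureType, GetFeature) are typically invoked as key-value-pair requests. The sketch below builds such request URLs. The endpoint and the feature type name are placeholders, WFS version 2.0.0 with its `typeNames` and `count` parameters is an assumption since the service versions are not stated here, and `ows_request` is a hypothetical helper.

```python
from urllib.parse import urlencode


def ows_request(endpoint: str, service: str, request: str, version: str = "", **extra: str) -> str:
    """Build a key-value-pair request URL for an OGC web service (WMS or WFS)."""
    params = {"service": service, "request": request}
    if version:
        params["version"] = version
    params.update(extra)
    return endpoint + "?" + urlencode(params)


ENDPOINT = "https://example.org/ows"  # placeholder, not the SCENT endpoint

print(ows_request(ENDPOINT, "WFS", "GetCapabilities"))
print(ows_request(ENDPOINT, "WFS", "GetFeature", "2.0.0",
                  typeNames="scent:images", count="10"))  # hypothetical feature type
```

A discoverability test such as the one run by the DAB team would start from the GetCapabilities response, which lists the feature types and operations a service offers.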
Overall, we proved that Citizen Science projects indeed can implement the free and open data policy of the GEOSS Data Core, and that all of the GEOSS Data Management Principles are followed by Scent. Furthermore, through the OGC Citizen Science Domain Working Group (DWG), we offer examples and guidance on how such implementations can be realized with OGC standards. However and realistically speaking, and in order to see concrete progress, it appears more feasible that projects first register what they already support, and at least fulfill the needs to make their data sets discoverable from within the GEOSS Platform using well-documented metadata and include information about data quality. Following the brokering approach of GEOSS, this registration entails that the projects provide a minimum of required information and ideally follow one of the multiple standards that are already supported by the GEOSS platform. If it should not be immediately possible to also provide one of the multiple options to make data accessible or to provide that data in an already recognized encoding, organizations might still consider registering a project and then update the record in the Yellow Pages once the additional functionalities for harmonized data access are put in place by the project.
12.2. Why a case-by-case registration is not the best way forward
Although the possibility to register Citizen Science projects of any size in GEOSS exists and has been illustrated in the previous section, we do not recommend that each and every project go ahead and register by itself right away. Potentially, the number of relevant contributions is (at least) in the hundreds: see for example, this project inventory and the related prototype of a project catalogue (Figure 52).
It becomes obvious that a case-by-case registration per project (which each might want to register one data set or more) would create a bottleneck at the GEO DAB and the team that is responsible for evaluation and test of entries in the GEOSS Yellow Pages. As a result of our investigations within this IE, we therefore suggest to elaborate on and develop an intermediate layer that provides the required organizational and technical support to the Citizen Science community so that their data sets become better discoverable, accessible, and potentially more widely used - thereby also amplifying visibility and impact of the individual projects.
We consider such structures particularly important in view of larger mobilization campaigns of Citizen Science projects, as, for example, planned within the context of the Earth Challenge 2020 (EC2020). Again, also here the two/multiple-step approach - where project resources become discoverable first and commonly accessible in a second stage - might be most realistic in order to progress more quickly and to have intermediate results.
12.3. How to improve the connection of Citizen Science into GEOSS
In order to move ahead, we identified requirements that we are grouping in different approaches that complement each other.
12.3.1. Provide technical support to connect to the GEOSS platform
There appears to be a need to slot technical support for Citizen Science projects in the GEO DAB. This additional support should remove the potential bottleneck and help to scale up the number of Citizen Science projects and related data sets in the GEOSS Platform (and ideally in the GEOSS Data Core). Requirements for this support entail:
-
Support Citizen Science projects in registering with the GEOSS Yellow Pages;
-
Providing examples and guidance on the use of OGC standards for implementing GEOSS requirements for data discovery, quality descriptions, data access, data encodings, etc.;
-
Pre-testing of Yellow Page entries before registration in GEOSS;
-
If necessary, interaction with individual projects to correct their entries for the Yellow Pages;
-
Liaise with the GEO DAB team in order to actually register the new entries; and
-
Establish a capacity building mechanism, capable of supporting and equipping existing initiatives with the necessary skills to apply data management principles related to the accessibility, discoverability, re-usability, and curation of their resources.
12.3.2. Federate multiple Citizen Science projects and their endpoints into a single access point
To reduce the number of endpoints connected to the GEOSS Platform, federations of Citizen Science projects could act as hubs that would cluster multiple Citizen Science projects and their endpoints into a single access point, which is then registered within the GEOSS Platform (see Figure 53).
These federations could be thematic or regional and take advantage of the current structure of activities in the GEO work program.
Considering the Earth Challenge 2020, we could imagine the following architecture: EC2020 will collect new data and offer it via a dedicated API. At the same time, several already existing Citizen Science projects partner with EC2020 and also provide access to their data (in different forms). For the connection to GEOSS, EC2020 could provide a gateway that federates the newly collected data and the offerings of the different partners to a single discovery service and a single data access service. These two endpoints would be registered via the Yellow Pages with the GEOSS Platform only once and thereby make the EC2020 resources more widely visible, together with a clearly defined and well-known data policy and following most recent data management principles. The figure below depicts this setting.
Note
It is important to realize that we are not proposing a single federation maintained by GEO but a collection of self-organized federations that act as aggregation points and provide services to the citizen science projects. These federations can have different scales and can be thematic or regional.
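The gateway idea (several project endpoints behind one discovery service that is registered once) can be sketched as follows. This is an architectural illustration with hypothetical names. Each project is represented by a callable returning metadata records, standing in for whatever API or catalogue a real project exposes.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Source = Callable[[], Iterable[Record]]


class FederationGateway:
    """Single discovery access point in front of several project sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, project: str, source: Source) -> None:
        """Add a project; `source` returns that project's metadata records."""
        self._sources[project] = source

    def discover(self, keyword: str = "") -> List[Record]:
        """Return records from all projects whose title contains `keyword`."""
        results: List[Record] = []
        for project, source in self._sources.items():
            for record in source():
                if keyword.lower() in record.get("title", "").lower():
                    # Tag each record with its origin to keep contributions attributable
                    results.append({**record, "project": project})
        return results


gateway = FederationGateway()
gateway.register("Project A", lambda: [{"title": "River water level"}])
gateway.register("Project B", lambda: [{"title": "Land cover photos"}, {"title": "Water quality"}])
print(gateway.discover("water"))
```

Only the gateway's endpoint would then need to be registered in the Yellow Pages, while each record keeps a reference to the contributing project.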
Extra considerations for Citizen science federations
A federation of services can provide extra services to the citizen science projects. Below is a short list of examples that can grow with time.
Standard translation
Currently the GEO DAB does not fully support SWE standards such as OGC Sensor Observation Service and OGC SensorThings API. A service in the federation can provide the translation of services to other supported services such as WFS or WMS, allowing for harmonized access via the GEOSS Platform.
Data aggregation
Some compatible projects can be aggregated into larger virtual datasets that can be served on-demand.
Federated authentication
As discussed in User and Application Federation, a federation could provide a mechanism to authenticate users that can then provide observations to several projects in the federation from a single app or from multiple apps used in parallel. This can be useful in the data input case (data capture), but could also be used for a pool of experts validating data coming from different projects. A federated authentication can also protect the privacy of the citizens as discussed in Citizen privacy and protection.
Data preservation
A federation can provide a service that allows for archiving data from project campaigns from ephemeral citizen science projects or projects that can no longer be maintained.
Common definitions
Sharing common definitions (with tools such as the definition server proposed in Definitions Server) will be essential to ensure data integration and should be a part of the federation.
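As an illustration of the standard translation service mentioned above, the sketch below converts a SensorThings API Location and Observation into a GeoJSON Feature, the kind of record a WFS-style interface could then serve. It assumes the Location's `location` property already holds a GeoJSON geometry. The function name and the sample values are hypothetical.

```python
from typing import Any, Dict


def observation_to_feature(
    location: Dict[str, Any],
    observation: Dict[str, Any],
    observed_property: str,
) -> Dict[str, Any]:
    """Combine a SensorThings Location and Observation into one GeoJSON Feature."""
    return {
        "type": "Feature",
        # Assumes the Location entity carries a GeoJSON geometry in "location"
        "geometry": location["location"],
        "properties": {
            "observedProperty": observed_property,
            "phenomenonTime": observation["phenomenonTime"],
            "result": observation["result"],
        },
    }


# Sample values are invented for illustration only
sample_location = {"name": "Station 1", "location": {"type": "Point", "coordinates": [23.7, 38.0]}}
sample_observation = {"phenomenonTime": "2019-06-01T10:00:00Z", "result": 1.25}
print(observation_to_feature(sample_location, sample_observation, "Water level"))
```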
12.3.3. Improve networking and capacity building
On the other hand and because the offerings made above alone would not be enough to actually advance from the current situation, the networking of the Citizen Science community deserves dedicated attention. The current Earth Observations Citizen Science community activity in the GEO work program could provide additional help. The following requirements have been identified.
-
Mobilizing existing data sets, i.e., reaching out to the Citizen Science community and informing that community of the work in this IE and the linked offering of the increased visibility and possible impact of CS data, providing guidelines and practical examples on what would need to be done from their side, and offering support in establishing the connections;
-
Help in preparing new data sets, i.e., be available to consult Citizen Science projects during their set-up phase, and let the community know about this offer;
-
Promote FAIR data management and GEOSS as a practical way to get there;
-
Provide access and training for (OGC) standards-based tools that the community can use to make the connection and implement the desired data policy and data management principles.
For further discussions and possible realizations, it should be considered if the support outlined above could be provided in a coordinated but decentralized way. We could imagine that the above-mentioned support could be tailored for different geographic regions, thematic areas, or other sensible divisions (e.g., specific for EC2020, which would still need to be discussed). Such settings could also help to disseminate good practices, for example, on the use of OGC standards in this context.
12.4. Future work regarding the GEOSS integration
This IE helps us to identify current possibilities and to shape parts of the way forward. However, the work of the IE has also left a few questions unanswered and raised some new issues. We should develop different scenarios to meet the identified organizational requirements exposed before. From our experiences, we see particular needs to further investigate the following aspects.
-
Acknowledging that Citizen Science data is already included in GEOSS today, i.e., systematically flagging where Citizen Science has already contributed to a knowledge resource on the GEOSS Platform (GEOSS Data Core, ideally).
-
Develop detailed examples and guidance on how CS projects can implement the different GEO Data Management Principles by using the many already supported OGC standards.
-
Consider promoting SWE standards such as OGC Sensor Observation Service and OGC SensorThings API to be considered by the GEO DAB, because both standards appear to be taken up by several Citizen Science projects, but at the moment they are not supported by the GEO DAB, so other standards (such as WFS or WMS) need to be implemented, in addition, to allow harmonized access via the GEOSS Platform.
-
Consider Citizen Science not only as a data source, but also explore the possibilities and use of OGC standards when it comes to the engagement of Citizen Scientists as part of data validation.
-
Also consider Citizen Science as part of the processing capacity, collective intelligence, data cubes, relationship to Web Processing Service (WPS), work on Artificial Intelligence, etc.
While focusing on the connection to GEOSS here, we should also investigate how this work relates to the provision of metadata for ‘flat’ online searches (e.g., Google search) and accessibility of the data to automatic web crawlers. We might want to address both topics in a single go. If we work towards intermediate organizational structures that help the Citizen Science community in using OGC standards and the GEOSS Platform for improved data policies and management, can these intermediaries – and the tools and services they provide – also automatically cover these complementary needs?
12.4.1. Citizen privacy and protection
The aspect of citizen privacy and personal data protection is a serious one that should not be undermined. There have been recent examples of commercial companies using social media companies' personal data and citizen profiles for unethical purposes or for their own profit. In extreme cases, companies' business models were based on collecting and integrating personal data of their users to then sell to third parties the personalized databases and services for commercial or political targeting. Accidentally allowing CS data to be gathered by these platforms may open the door to the use of the personal data of those who have collected the CS data and those who use the CS data without their consent. Such a scenario is clearly against the data protection regulations in Europe and other areas, but still is technically possible.
This important issue needs to be addressed by the individual citizen science projects, the emerging federations, and the GEOSS platform at large. It is a real problem that should be covered by a GEO architecture that balances the necessary anonymity of the citizens' personal data against the acknowledgement of their individual contributions when participating in Citizen Science activities. The proposed federation discussed in the User and Application Federation has an embedded component that takes care of this privacy aspect (see Levels of personal information): two levels of privacy control ensure absolute privacy, while other levels allow for some degree of acknowledgement and recognition. These privacy considerations need to be complemented by the way hosts manage and own data.
It is our responsibility to raise this issue within the GEO community and find the right solution that will most likely require a combination of technical, management, and legal aspects.
Appendix A: Integration between SCENT & LandSense
During the past few years, a variety of citizen-science tools have been implemented aiming to enable citizens and relevant associations and groups to be involved and engaged with environmental monitoring. The scope and functionalities of such tools may vary, ranging from mobile applications enabling collection and semantic annotation of multimedia and communication with portable sensors to visualization engines and content creation and campaign configuration applications.
One of the aims of this activity is to demonstrate interoperability and integration between implemented authentication and authorization processes. In particular, three applications developed within the H2020 SCENT Citizen Observatory project (the visualization site of the SCENT Harmonisation platform, SCENT Explore, and SCENT Measure) are integrated with an authorization server implemented by the H2020 LandSense Citizen Observatory. This appendix therefore both showcases the results of this process and serves as a guide, primarily to help integrate existing citizen-science applications with the LandSense authorization server or to replicate this process within their own infrastructure.
Authorization depends on, but is distinct from, authentication. Authorization is the process of deciding whether an application should be allowed to conduct an operation following the receipt of a request by the user, whilst authentication is the process of determining the identity of the user or program sending that request. Single Sign On (SSO) is a feature of an authentication mechanism in which the user’s identity is used to provide access across multiple systems (i.e., services, applications). SSO allows a single authentication process (managed by a single Identity Provider, or other authentication mechanism) to be used across multiple systems within a single organization or across multiple organizations (i.e., common login credentials across systems). Last but not least, establishing a federation involves the management and mapping of user identities between Identity Providers across organizations (and security domains) via trust relationships. In the context of this experiment, the following paradigms (use cases) were assessed:
-
having different systems across multiple organizations (projects) trusting/connecting to a single third-party Identity Provider; and
-
having different Identity Providers across different projects trusted by a single system.
The SCENT Harmonisation platform visualization site is a single-page web application written in JavaScript. To achieve integration with the LandSense authorization server, the ‘implicit grant’ modality of the OAuth2 protocol is implemented. The Implicit Grant Type is used because the Harmonisation Platform (like any front-end JavaScript application) cannot guarantee the confidentiality of the client secret (which is essential for the other OAuth2 flows). Following user login, the access_token is issued immediately, allowing users to access and use protected resources and operations. In addition to the access token, an ID token is also issued by the Authorization Server. The ID token takes the form of a JWT (JSON Web Token), which is a JSON payload that is signed with the private key of the issuer (LandSense Identity Manager) and can be parsed by the Harmonisation Platform. Inside the JWT (ID token), there are a handful of defined property names that provide information to the application (e.g., the Harmonisation Platform). This information includes a unique identifier for the user, the identifier for the server that issued the token, the identifier for the client that requested the token, etc. The information also includes some user attributes (email, name, etc.) so that the client (e.g., Harmonisation Platform) is able to display the user information in the navigation bar.
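As an illustration of how a client can read the properties carried in an ID token, the following minimal Python sketch decodes the payload segment of a JWT. It is not the Harmonisation Platform's code: the sample token is built locally with a dummy signature, the claim values are hypothetical, and the sketch deliberately does not verify the signature, which a real client must do against the issuer's public key before trusting any claim. The claim names `iss`, `sub` and `aud` are the standard OpenID Connect names for the issuer, user identifier and requesting client.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used inside JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_id_token_claims(id_token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = id_token.split(".")
    if len(parts) != 3:
        raise ValueError("a signed JWT has three dot-separated segments")
    return json.loads(b64url_decode(parts[1]))


# Hypothetical token assembled locally for illustration (dummy signature).
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({
    "iss": "https://as.landsense.eu",   # server that issued the token
    "sub": "user-123",                  # unique identifier of the user
    "aud": "my-client-id",              # client that requested the token
    "name": "Jane Doe",
    "email": "jane@example.org",
}).encode())
sample_token = f"{header}.{payload}.dummy-signature"

claims = read_id_token_claims(sample_token)
print(claims["name"], claims["email"])
```

The `name` and `email` claims are the kind of user attributes a client could show in a navigation bar.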
One scenario involves the following main steps as displayed in the image above.
-
The first step involves an interaction of the user with the web browser (i.e., choosing to log in), upon which the client generates and sends a login request to the authorization server (i.e., the LandSense authorization server, in our case). The request is sent in the form of an HTTP request and the information is sent as URL query parameters. More specifically, the following parameters are specified in the request:
-
client_id: A publicly exposed string that is used by the service API to identify the application;
-
redirect_uri: The location where the service will redirect the user after they authorise (or deny) the application (i.e., Harmonisation Platform) and therefore the part of the application that will handle access and ID tokens;
-
scope: specifying the level of access that the application is requesting; and
-
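response_type: In this flow, the value is “id_token”, and a successful response is expected to include both an access token and an ID token. (In OpenID Connect Core, the value that requests both tokens is “id_token token”; “id_token” alone requests only the ID token.)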
-
The configuration file that was created is provided below.
export const oidcSettings = {
  authority: 'https://as.landsense.eu',
  clientId: '<my_client_id_>',
  redirectUri: 'https://scent-harm.iccs.gr/oidc-callback',
  responseType: 'id_token',
  scope: 'openid profile email landsense birdlife',
  end_session_endpoint: 'https://as.landsense.eu/oauth/revoke'
}
The configuration file above is used by the vuex-oidc library (URL: https://github.com/perarnborg/vuex-oidc, License: MIT), which has been integrated with the Harmonisation Platform in order to implement the OAuth2 implicit workflow.
-
Next, the authorisation server checks the request and, if it is valid, presents the login form to the user.
-
The user selects one of the systems registered with the LandSense authentication server and enters their credentials. Once this process concludes, the access and ID tokens are sent to the SCENT Harmonisation visualization site and the user is automatically returned to the application.
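The login request from the first step and the handling of the callback from the last step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the vuex-oidc implementation: the `state` and `nonce` parameters are additions that are not listed above (OpenID Connect requires a nonce in the implicit flow), the function names are hypothetical, and the sketch relies on the fact that the implicit flow returns its tokens in the URL fragment of the redirect.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Authorization endpoint of the LandSense server, as used elsewhere in this appendix.
AUTHORIZE_ENDPOINT = "https://as.landsense.eu/oauth/authorize"


def build_login_url(client_id, redirect_uri, scope, response_type, state, nonce):
    """Build the login request, sending every value as a URL query parameter."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "response_type": response_type,
        "state": state,   # assumption: echoed back so the client can match the reply
        "nonce": nonce,   # assumption: binds the ID token to this request
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}"


def parse_callback(callback_url, expected_state):
    """Read the tokens from the fragment of the redirect and check the state."""
    fragment = urlsplit(callback_url).fragment
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    if params.get("state") != expected_state:
        raise ValueError("state mismatch: the response does not match the request")
    return params


state = secrets.token_urlsafe(16)
nonce = secrets.token_urlsafe(16)
login_url = build_login_url(
    "my-client-id",                               # hypothetical client ID
    "https://scent-harm.iccs.gr/oidc-callback",
    "openid profile email",
    "id_token",
    state,
    nonce,
)
print(login_url)
```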
Scent Measure is a mobile application for measuring soil conditions that works in tandem with a portable smart sensor (Xiaomi International Version Flower Care Smart Monitor) connected to the user’s mobile device. The application is available for Android versions newer than 4.3 Jelly Bean, requires devices with a Bluetooth version greater than 4.1, and updates measurements at most once every 15 seconds.
The section below describes the integration of SCENT Measure with the LandSense authorization server, and also serves as a guide to integrating any Android application with an Identity Management System that adopts the OAuth2 protocol.
To enable login to your application with LandSense, you will have to implement the OpenID Connect authorization code flow. To speed up development, you can use AppAuth, which is publicly available and documented at the following link.
https://appauth.io/
You can import AppAuth by using the following dependency in your build.gradle file.
implementation 'net.openid:appauth:0.7.1'
Now you are ready to use AppAuth’s classes to implement your authentication flow. First, create an authorization service configuration using the details provided by LandSense’s configuration document, available at the following URL.
https://as.landsense.eu/.well-known/openid-configuration
The configuration requires Landsense’s authorize and token endpoints.
AuthorizationServiceConfiguration serviceConfig =
    new AuthorizationServiceConfiguration(
        Uri.parse("https://as.landsense.eu/oauth/authorize"), // authorization endpoint
        Uri.parse("https://as.landsense.eu/oauth/token")); // token endpoint
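Rather than hard-coding the two endpoints, a client can read them from the configuration document. The Python sketch below shows the idea on an abbreviated, locally written sample; it was not fetched from the LandSense server, and the `issuer` value is an assumption. The field names `authorization_endpoint` and `token_endpoint` are those defined by OpenID Connect Discovery, and the function name is hypothetical.

```python
import json


def endpoints_from_discovery(document_text: str) -> tuple:
    """Extract the authorize and token endpoints from a discovery document."""
    metadata = json.loads(document_text)
    return metadata["authorization_endpoint"], metadata["token_endpoint"]


# Abbreviated sample document for illustration only (not a server response).
sample_document = json.dumps({
    "issuer": "https://as.landsense.eu",
    "authorization_endpoint": "https://as.landsense.eu/oauth/authorize",
    "token_endpoint": "https://as.landsense.eu/oauth/token",
})

authorize_endpoint, token_endpoint = endpoints_from_discovery(sample_document)
print(authorize_endpoint)
print(token_endpoint)
```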
Next the client application’s specific details must be provided in preparation of the authorization request. The following client application specific details are required:
-
Client application id
-
Redirection URL
You can find these details within the dedicated Landsense configuration page available for your application.
AuthorizationRequest.Builder authRequestBuilder =
    new AuthorizationRequest.Builder(
        serviceConfig, // the authorization service configuration
        "XXXXXXXXXXXXXXX@as.landsense.eu", // the client ID, typically pre-registered and static
        ResponseTypeValues.CODE, // the response_type value: we want a code
        Uri.parse("com.example.application:/callback")); // the redirect URI to which the auth response is sent
Finally, you can build your request and then directly indicate the activities required upon successful and non-successful authentication.
AuthorizationRequest authRequest = authRequestBuilder.build();
AuthorizationService authService = new AuthorizationService(this);
authService.performAuthorizationRequest(
    authRequest,
    PendingIntent.getActivity(this, 0, new Intent(this, FullscreenActivity.class), 0), // Auth successful activity
    PendingIntent.getActivity(this, 0, new Intent(this, LoginActivity.class), 0)); // Auth failure activity
You will be able to handle the auth response within the invoked activities as follows:
AuthorizationResponse resp = AuthorizationResponse.fromIntent(getIntent());
AuthorizationException ex = AuthorizationException.fromIntent(getIntent());
Another important aspect required by AppAuth is capturing the authorization redirect. You can configure all redirects through a manifest placeholder in your application’s build.gradle file as follows:
manifestPlaceholders = [
    'appAuthRedirectScheme': 'com.example.application'
]
and by adding an intent-filter for AppAuth’s RedirectUriReceiverActivity to your AndroidManifest.xml:
<activity
    android:name="net.openid.appauth.RedirectUriReceiverActivity"
    tools:node="replace">
    <intent-filter>
        <action android:name="android.intent.action.VIEW"/>
        <category android:name="android.intent.category.DEFAULT"/>
        <category android:name="android.intent.category.BROWSABLE"/>
        <data android:scheme="com.example.application"/>
    </intent-filter>
</activity>
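To make clear what the captured redirect carries, the following Python sketch parses a redirect URI of the form registered above and extracts the authorization code. It only illustrates the data involved, since AppAuth performs this step itself on Android. The function name and sample values are hypothetical, and the `error` parameter is the one defined by OAuth2 for failed requests.

```python
from urllib.parse import parse_qs, urlsplit

REDIRECT_SCHEME = "com.example.application"
REDIRECT_PATH = "/callback"


def parse_redirect(uri: str):
    """Return (code, error) from an authorization redirect; one of them is None."""
    parts = urlsplit(uri)
    if parts.scheme != REDIRECT_SCHEME or parts.path != REDIRECT_PATH:
        raise ValueError("not the registered redirect URI")
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    if "error" in params:
        return None, params["error"]
    return params.get("code"), None


code, error = parse_redirect("com.example.application:/callback?code=abc123&state=xyz")
print(code, error)
```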
Following this, you can now login to your application through LandSense. The following diagram summarizes the process as perceived by Scent Measure’s users:
Scent Explore is a mobile application for crowdsourcing that allows users to take pictures while walking around specific geographic areas, known as points of interest (PoIs). The application exploits gamification mechanics such as points and badges to engage users. Explore is an Alternate Reality Gaming (ARG) app related to an Authoring tool which generates and visualizes the PoIs on the map. When the user approaches a point of interest, the application activates the camera and shows an augmented reality entity that is captured simply by tapping on the screen while taking a picture of the area.
The user is then asked to annotate (tag) the picture. To accurately define the position of the PoI, apart from capturing the location through GPS, the application also uses the gyroscope (if available) for the direction, and integrates these values with the compass information. The app enables the collection of both pictures and videos for monitoring land cover / land use and river parameters (water level & velocity), respectively.
In Scent Explore the users' registrations are managed by a dedicated server, which also manages all users' scores for gamification. This system is not suitable for using external authorization systems. To overcome this problem, the authorization with LandSense is managed by the application server. The process/steps implemented are described as follows:
-
The user runs SCENT Explore;
-
The user selects login via LandSense;
-
Scent Explore opens the login page inside the app (WebView on Android and WKWebView on iOS; this is mandatory on iOS);
-
The user selects the auth provider from the list;
-
LandSense redirects to the selected auth provider;
-
The user provides credentials;
-
URL redirection to Explore server;
-
The Explore Server checks if a SCENT Explore profile exists;
-
If the profile exists, upload the profile information to SCENT Explore;
-
If the profile does not exist, create a new user profile; and
-
Login.
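The profile check at the end of the steps above amounts to a find-or-create operation, which can be sketched in Python as follows. This is a minimal illustration rather than the Explore server's code: the in-memory dictionary stands in for the server's user database, the profile fields are hypothetical, and keying profiles on the `sub` claim returned by the identity provider is an assumption.

```python
def login_via_landsense(userinfo: dict, profiles: dict):
    """Return (profile, created) for the user described by `userinfo`.

    `profiles` is a stand-in for the user database, keyed by the
    subject identifier supplied by the identity provider.
    """
    key = userinfo["sub"]
    created = key not in profiles
    if created:
        # No profile yet: create one with an empty gamification score.
        profiles[key] = {"name": userinfo.get("name", ""), "score": 0}
    # Existing or new, the profile is what gets loaded into the app.
    return profiles[key], created


profiles = {}
profile, created = login_via_landsense({"sub": "u1", "name": "Ann"}, profiles)
print(profile, created)
```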
This example uses C# code for Unity3D. You need a webview, ideally a cross-platform one. Before each call, destroy the old webview to clear the cache:
if ( UniWebView != null )
Destroy(UniWebView);
To add the webview in the scene, use the gameObject method:
UniWebView = gameObject.AddComponent<UniWebView>();
and add the callback to the webview:
UniWebView.OnPageFinished += OnPageFinished;
Remember also to remove the callback before destroying the webview:
UniWebView.OnPageFinished -= OnPageFinished;
In order to improve the UI, it is suggested to not immediately show the webview. In many cases the mobile connection is slow; thus it is preferable to wait for the web page to be fully loaded before showing the webview.
As a next step, open the login page inside the app:
UniWebView.Load("https://example.com/landsense.php?code=yourprivatecode");
In the PHP file it is advisable to require a secret key to protect against possible intrusions. In the landsense.php file, insert the code that opens the LandSense login page:
https://as.landsense.eu/oauth/authorize/openid?client_id=".$CLIENT_ID."&response_type=code&state=yourstate&grant_type=authorization_code&scope=openid profile email
-
$CLIENT_ID = the client ID of your registered app in LandSense
-
STATE = a code of your own, used to verify the authenticity of the redirection
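The role of the state value described above can be sketched in Python: generate an unpredictable value, place it in the login URL, and compare it with the value that comes back on redirection. This is an illustrative sketch and not the landsense.php code. The function names and the client ID are hypothetical, the scope is percent-encoded by `urlencode`, and the `grant_type` parameter is left out because RFC 6749 defines it for the token request rather than for the authorization request.

```python
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Login endpoint as it appears in the URL above.
AUTHORIZE_URL = "https://as.landsense.eu/oauth/authorize/openid"


def new_state() -> str:
    """Create an unpredictable state value to store in the user's session."""
    return secrets.token_urlsafe(16)


def build_authorize_url(client_id: str, state: str) -> str:
    """Build the login URL with properly encoded query parameters."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "state": state,
        "scope": "openid profile email",
    })
    return f"{AUTHORIZE_URL}?{query}"


def state_matches(expected: str, received: str) -> bool:
    """Compare the stored and returned state values in constant time."""
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


state = new_state()
print(build_authorize_url("my-client-id", state))
```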
If your LandSense login is successful, you will be redirected to the indicated URL. The URL receives the code used with the Authorization Bearer method and the state supplied in the call as an additional verification of authenticity. The code below obtains the access token, which makes it possible to use the APIs to retrieve the data of the logged-in user.
// Authorization code and state returned by LandSense in the redirect
$code = $_GET['code'];
$state = $_GET['state'];
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://as.landsense.eu/oauth/token',
    CURLOPT_HEADER => 0,
    CURLOPT_POST => 1,
    CURLOPT_HTTPHEADER => array('Authorization: Bearer '.$code),
    CURLOPT_POSTFIELDS => array(
        'grant_type' => 'authorization_code',
        'client_id' => $CLIENT_ID,
        'client_secret' => $CLIENT_SECRET,
        'scope' => 'openid profile email',
        'code' => $code )
));
$result = curl_exec($ch);
curl_close($ch);
// Decode the token response; $usertoken["access_token"] is used in the next request
$usertoken = json_decode($result, true);
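For comparison, the token request can be sketched in Python following the parameter layout of RFC 6749 (section 4.1.3), in which the authorization code travels in the form-encoded body. This is a sketch under assumptions, not a description of the LandSense server: the `redirect_uri` value and the function names are hypothetical, the request is only built and never sent, and whether the server additionally expects the Bearer header used above is not confirmed here.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://as.landsense.eu/oauth/token"


def build_token_request(code, client_id, client_secret, redirect_uri) -> Request:
    """Build (but do not send) the POST that exchanges a code for tokens."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,   # must match the one used at login
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(TOKEN_ENDPOINT, data=body, headers=headers, method="POST")


def extract_access_token(response_text: str) -> str:
    """Read the access token from the JSON body of a token response."""
    return json.loads(response_text)["access_token"]


request = build_token_request(
    "sample-code", "my-client-id", "my-secret", "https://example.com/landsense.php"
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handing the decoded body to `extract_access_token`.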
The code below is used to receive the user’s info and check if the user already has an account on the Explore management server or if a new account needs to be created.
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'https://as.landsense.eu/oauth/userinfo?client_id='.$CLIENT_ID.'&client_secret='.$CLIENT_SECRET,
    CURLOPT_HEADER => 0,
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => 'client_id='.$CLIENT_ID.'&client_secret='.$CLIENT_SECRET,
    CURLOPT_HTTPHEADER => array('Authorization: Bearer '.$usertoken["access_token"],
        'Content-Type:application/x-www-form-urlencoded' ),
));
$result = curl_exec($ch);
echo "landsenseloginok:".$result;
“landsenseloginok:” is a keyword, which is used by the application to understand that Landsense has given permission and that user data is transmitted.
void OnPageFinished(UniWebView webView, int statusCode, string url)
{
    webView.GetHTMLContent((content) => {
        if ( content.Contains("landsenseloginok:") )
        {
            // process the JSON value
        }
    });
}
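The processing step left as a comment in the callback above can be sketched in Python: find the keyword, then decode the JSON object that follows it while ignoring any trailing markup. This is an illustration of the parsing logic only, not the Unity code. The function name and sample page are hypothetical, and the sketch assumes the page content reaches the application without HTML-escaped quotes.

```python
import json

KEYWORD = "landsenseloginok:"


def extract_user_info(content: str):
    """Return the JSON object that follows the keyword, or None if absent or invalid."""
    index = content.find(KEYWORD)
    if index == -1:
        return None
    remainder = content[index + len(KEYWORD):].lstrip()
    try:
        # raw_decode stops at the end of the first JSON value,
        # so trailing HTML such as "</body>" is ignored.
        value, _end = json.JSONDecoder().raw_decode(remainder)
    except json.JSONDecodeError:
        return None
    return value


page = '<html><body>landsenseloginok:{"sub": "u1", "email": "a@b.org"}</body></html>'
print(extract_user_info(page))
```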
If the JSON is correct, a specific PHP page is called on the user management server to check if the user has an account or if a new account needs to be created.
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => "https://www.yourserver.com/api/user/generate_auth_cookie/?username=".$username."&password=".$password,
    CURLOPT_HEADER => 0,
    CURLOPT_POST => 0,
));
$result = curl_exec($ch);
// json decode
$login = json_decode( $result, true );
curl_close( $ch );
// status check
if ( strcmp($login['status'], "ok") == 0 )
{
    echo "scentexploreresult:".$result;
    exit();
}
If the status is “ok”, the user has an account; otherwise, you have to create an account via the API.
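The account check can be sketched in Python as two small helpers: one that builds the request URL with percent-encoded credentials, so that characters such as `&` or spaces cannot break the query string, and one that interprets the reply. This is a sketch under assumptions rather than the server's code: the function names and sample credentials are hypothetical, and the endpoint is the placeholder address used above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint taken from the listing above.
AUTH_COOKIE_URL = "https://www.yourserver.com/api/user/generate_auth_cookie/"


def build_auth_cookie_url(username: str, password: str) -> str:
    """Build the account-check URL with safely encoded credentials."""
    return AUTH_COOKIE_URL + "?" + urlencode({"username": username, "password": password})


def has_account(response_text: str) -> bool:
    """Return True only when the reply is JSON with status "ok"."""
    try:
        reply = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return isinstance(reply, dict) and reply.get("status") == "ok"


url = build_auth_cookie_url("ann&bob", "p@ss word")
print(url)
print(has_account('{"status": "ok"}'))
```

When `has_account` returns False, the caller would go on to create the account via the API.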