Publication Date: 2017-06-30

Approval Date: 2017-01-26

Posted Date: 2016-11-11

Reference number of this document: OGC 16-041r1

Reference URL for this document: http://www.opengis.net/doc/PER/t12-A080

Category: Public Engineering Report

Editors: Liping Di, Eugene G. Yu, Md Shahinoor Rahman, Ranjay Shrestha

Title: Testbed-12 WPS ISO Data Quality Service Profile Engineering Report


Testbed-12 WPS ISO Data Quality Service Profile Engineering Report (16-041r1)

COPYRIGHT

Copyright © 2017 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/

WARNING

This document is an OGC Public Engineering Report created as a deliverable of an initiative from the OGC Innovation Program (formerly OGC Interoperability Program). It is not an OGC standard and not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.

Abstract

This Data Quality Engineering Report describes data quality handling requirements, challenges, and solutions. One focus is data quality in general that needs to be communicated from one service to another. In addition, it discusses WPS data quality solutions. The ultimate goal is for the report to be nominated as a WPS ISO Data Quality Service Profile. ISO 19139 is used as the base for encoding data quality. WPS and workflows are used to streamline and standardize the process of data quality assurance and quality control. The main topics include: (1) a generalized summary and description of the design and best practices for analyzing the data quality of all feature data sources used in the Citizen Observatory Web (COBWEB) project, (2) solutions and recommendations for making the provenance of data quality transparent to end users when data is processed through a WPS, (3) best practices and recommendations for designing and prototyping a WPS profile to support a data quality service conformant to the NSG Metadata Framework, and (4) a general data quality solution suitable for both raster-based imagery and vector-based features.

Business Value

This Engineering Report (ER) captures the essence of and best practices for data quality handling that were successfully established and applied in the Citizen Observatory Web (COBWEB) project. It goes one step further to formalize and standardize the processes as OGC WPS processes that address data quality issues by using networks of "people as sensors" and by analyzing observations and measurements in real time in combination with authoritative models and datasets. The ER content can be summarized as follows:

  • Innovative use of crowdsourcing and citizen sensors to address data quality control and assurance with seven prescribed standard WPS processes,

  • Formalization of the processes for solving data quality issues with citizen sensors, harmonizing data and service interoperation across processes as Web services, and

  • Achievement of compatible data quality assurance levels.

Technology Value

The relevance and importance of this ER to the WPS 2.0 SWG are obvious in two aspects. On the one hand, the best practices and solutions described in the ER utilize WPS 2.0 as a general framework and service implementation specification to achieve data quality control and assurance when dealing with networks of citizen sensors and their information offerings. Each data quality operation is implemented as a WPS process. The adoption of WPS not only benefits high-level interoperation among services, but also promotes the application of WPS in citizen sensor networks. On the other hand, the formalization and standardization of the seven processes identified in the COBWEB project lead to the development of a WPS profile based on ISO data quality standards that is applicable to citizen sensor data quality control and assurance. Seven processes are to be specified as WPS processes: (1) LBS-Positioning, (2) Cleaning, (3) Automatic Validation, (4) Authoritative Data Comparison, (5) Model-Based Validation, (6) Linked Data Analysis, and (7) Semantic Harmonization.

How does this ER relate to the work of the Working Group

This ER demonstrates a use case for web-based processing using the WPS 2.0 interface standard. Also, a basis for a data quality WPS profile is described. The goal of the hierarchical profiling approach specified in the WPS 2.0 standard is to foster interoperability among different WPS clients and servers. A data quality profile could serve as proof of concept of the WPS 2.0 profiling approach and could be used to incorporate data quality checks in (automated) geoprocessing workflows.

Keywords

ogcdocs, testbed-12, WPS, Web services, ISO 19139, ISO 19115, Workflow

Proposed OGC Working Group for Review and Approval

The ER will be submitted to WPS 2.0 SWG for review. The ultimate goal is to develop and promote it as a WPS profile with the approval of WPS 2.0 SWG.

1. Introduction

1.1. Scope

This report captures the best practice of using WPS processes as the interoperation framework to support data quality assurance and control using networks of "people as sensors". Seven processes for data quality control shall be formalized and specified as WPS processes. The interoperation among processes, as well as with citizen sensors, shall be enabled at the levels of data and services.

1.2. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Table 1.1. Contacts
Name Organization

Eugene G. Yu

George Mason University/CSISS

Liping Di

George Mason University/CSISS

Md Shahinoor Rahman

George Mason University/CSISS

Ranjay Shrestha

George Mason University/CSISS

Lingjun Kang

George Mason University/CSISS

Sam Meek

Helyx Secure Information Systems Ltd

1.3. Future Work

Several future recommendations have been identified. Details will be discussed in the section on Future Recommendations. The recommendations are: (1) alignment with the evolution of geospatial standards, (2) data quality workflow enablement, and (3) data quality service test suites.

1.4. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

2. References

The following documents are referenced in this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the latest edition of the normative document referred to applies.

  • OGC 06-121r9, OGC® Web Services Common Standard

Note
This OWS Common Standard contains a list of normative references that are also applicable to this Implementation Standard.
  • OGC 14-065, OGC® WPS 2.0 Interface Standard

  • ISO 19157:2013, Geographic information — Data quality

  • ISO/DTS 19157-2, Geographic information — Data quality — Part 2: XML Schema Implementation

  • ISO 19115:2003, Geographic information — Metadata

3. Terms and definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard [OGC 06-121r9] and in OGC® Abstract Specification Topic 11: Metadata [OGC 01-111] shall apply. In addition, the following terms and definitions apply.

3.1. data quality

Data quality is a concept used in this context to represent geospatial data quality through multiple components, including data validity, precision, and accuracy. Data validity may be described as “fitness for use,” i.e. the degree to which data are fit for an application. Geospatial precision is related to resolution and variation. Geospatial accuracy refers only to how close a measurement is to the true value.

3.2. positional accuracy

the quantifiable value that represents the positional difference between two geospatial layers or between a geospatial layer and reality

3.3. lineage

description of the history of the spatial data, including descriptions of the source material from which the data were derived, and the methods of derivation

3.4. attribute accuracy

the accuracy of the quantitative and qualitative information attached to each feature

3.5. consistency

description of the dependability of relationships encoded in the data structure of the digital spatial data

3.6. completeness

the degree to which geographic features, their attributes and their relationships are included or omitted in a dataset

4. Conventions

4.1. Abbreviated terms

  • API Application Program Interface

  • BPMN Business Process Model and Notation

  • COBWEB Citizen OBservatory WEB

  • COM Component Object Model

  • CORBA Common Object Request Broker Architecture

  • COTS Commercial Off The Shelf

  • DCE Distributed Computing Environment

  • DCOM Distributed Component Object Model

  • DQ Data Quality

  • DTS Draft Technical Specification

  • GeoJSON Geographic JavaScript Object Notation

  • GML Geography Markup Language

  • IDL Interface Definition Language

  • ISO International Organization for Standardization

  • JSON JavaScript Object Notation

  • NGA National Geospatial-Intelligence Agency

  • NSG National System for Geospatial Intelligence

  • UC Use Case

  • WFS Web Feature Service

  • WPS Web Processing Service

  • XML eXtensible Markup Language

4.2. UML notation

Most diagrams that appear in this report are presented using the Unified Modeling Language (UML) static structure diagram, as described in Subclause 5.2 of [OGC 06-121r9].

4.3. Used parts of other documents

This document uses significant parts of document [OGC 06-121r9]. To reduce the need to refer to that document, this document copies some of those parts with small modifications. To indicate those parts to readers of this document, the largely copied parts are shown with a light grey background (15%).

5. Overview

Data quality services are the focus of this report. Specifications and standards define how data quality is described and presented. Many processes for deriving data quality share common solutions across different cases. This Engineering Report (ER) aims to enable automation of commonly required data quality measurements and assessments. The Web Processing Service (WPS) is used as the vehicle to achieve such automation.

6. Status Quo & New Requirements Statement

6.1. Status Quo

6.1.1. Data quality assurance and data quality control

The Citizen Observatory Web (COBWEB) is a citizen science project that explores the potential of combining citizen resources and open geospatial standards to support biosphere data collection, validation, and analysis[1]. The infrastructure assembles a suite of technologies to form a citizens’ observatory framework that effectively exploits technological developments in ubiquitous mobile devices, crowdsourcing of geographic information, and the operational application of standards-based spatial data infrastructure (SDI). The framework enables citizens to collect environmental information on a range of parameters including species distribution, flooding, and land cover/use[1, 2]. Workflows were used to compose complex processes from component services[3]. Dealing with diverse data sources, the project had to tackle data quality issues. One important and efficient approach was its adoption of WPS processes to enable data quality assurance and validation. Data quality was addressed by using networks of “people as sensors” and by analyzing observations and measurements in real time in combination with authoritative models and datasets. The COBWEB project represents the status quo, or starting point, for the work done in Testbed 12 to develop and formalize the WPS processes that facilitate data quality assurance.

6.1.2. Data quality assessment challenges

In the COBWEB project, the central quality assurance challenge was how to design and implement a system flexible enough to qualify data with different fitness-for-purpose requirements and different data schemas, recorded by different devices. More specifically, the challenges are as follows.

  1. Fitness of data quality model: What to model? What is the proper model process? What are the variations in capture devices/persons?

  2. Provenance: The history of data collection is important. It is related to the curation process involved.

  3. Metaquality: The questions for different data qualities need to be answered. What are the qualities of DQ metadata? How to define accuracy? How to define completeness? What criteria and strategies should be used to keep consistency?

  4. Levels of DQ assessment: DQ assessment can be done at different levels. What is the proper level? Does the assessment need to be as detailed as the dataset level, or is it sufficient to evaluate at the collection level?

  5. Propagation of data uncertainty: Data error and uncertainty may propagate through the chain of processes when multiple processes are involved. How should the propagation through workflows be represented and recorded? How should the propagation through data fusion be tracked?

6.2. Requirements Statement

The requirements for the sponsor, NGA, differ from the COBWEB project in the following ways:

  • The data is authoritative.

  • The data is likely to have a static structure.

  • Metadata is likely to exist for the products, which can be utilized in the qualification process.

  • In COBWEB the focus was on observations recorded as points, whereas this project requires qualification to be performed on different types of data, including points, lines, polygons, and images.

By analyzing the requirements and the demand on data quality services, the following common requirements can be identified:

  1. Quality assurance of data quality: This defines what is to be assessed, how it is assessed, and the standard approach required.

  2. Fit data quality assessment approaches: An atomic process may be represented as a WPS process. A complex assessment process may combine or chain several atomic processes into a workflow. Enabling workflows and composition of atomic processes allows extended adaptivity and flexibility to meet requirements of differing complexity. Efficiency can be achieved through enhanced reusability of atomic processes.

  3. Provenance: This keeps track of data quality and data history.

  4. Unified aggregating data quality to high levels: Approaches and methods to aggregate data quality need to be unified.

  5. Standard mechanisms to encode, store, and retrieve data quality metadata at multiple levels: Different levels of detail may require different encoding, storage, and access mechanisms. Geospatial data is generally dealt with at two levels: dataset and data collection.

  6. Data quality consumption: For data quality processes and outputs, it should be clearly understood who the intended consumer of the recorded DQ information is. Typically, two distinguishable types of consumption should be considered: machine-readable and human-readable.

There are also extra requirements from the sponsor, including adherence to specific standards for data and metadata. The main required standards are the NSG Metadata Framework and ISO 19115 metadata documents. Recently, the quality elements of ISO 19115 have been split into a separate document, ISO 19157, which will be adopted for recording the quality elements as the specifications from OGC and NGA evolve. In this testbed, all of these are taken into account in designing the WPS Data Quality processes.

7. Solutions

7.1. Targeted Solutions

7.1.1. Overall Design Strategy and Architecture

Data quality (DQ) involves different aspects: completeness, positional accuracy, topological accuracy, domain consistency, conceptual consistency, format consistency, and correctness. Such DQ functionality is recommended to be implemented as a series of atomic WPS processes. WPS, as an OGC processing specification, is identified as a suitable technology for implementing DQ processes in the Web environment[1, 2]. To deal with complexity and multiple levels of granularity, each WPS process should be designed to be as atomic as possible, allowing its reuse in composition through workflows.

As defined in ISO 19157, there are many DQ criteria tests. The DQ WPS should consist of a set of atomic DQ WPS test processes that meet the functional requirements defined in ISO 19157. Each DQ process should be configurable and atomic, and should be passed metrics that correspond to the Universe of Discourse, i.e. the thresholds for what is considered quality in ISO 19157 terms. The WPS processes all follow a similar design so that they are interoperable, suited for chaining, and conformant to a uniform pattern. This is depicted in the following figure.

Figure 1. Atomic DQ WPS process

Each atomic DQ WPS process may take two types of input: data and reference data. Both can be served through standard WCS or WFS services and can be encoded in GML, GeoJSON, XML, or JSON. An atomic DQ WPS process may output metadata in XML and, optionally, non-conforming data in GML. The output should contain a statement clarifying its conformance.
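As an illustration of this atomic pattern, the following Python sketch models one configurable DQ process that takes data and reference data, applies a single quality measure, and carries a conformance statement in its result. All names (`DQResult`, `run_dq_process`, `omission_rate`) and the record layout are assumptions for illustration, not part of WPS 2.0 or ISO 19157.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DQResult:
    measure: str      # e.g. "completeness omission rate"
    value: float      # the computed quality measure
    threshold: float  # pass/fail criterion from the Universe of Discourse
    non_conforming: list = field(default_factory=list)  # optional data output

    @property
    def conformant(self) -> bool:
        # the conformance statement required of every process output
        return self.value <= self.threshold

def run_dq_process(data, reference, measure: Callable, name: str,
                   threshold: float) -> DQResult:
    """Apply one atomic quality measure to (data, reference)."""
    return DQResult(name, measure(data, reference), threshold)

# Example: a trivial omission-rate measure over feature identifiers.
omission_rate = lambda d, r: len(set(r) - set(d)) / len(r)
result = run_dq_process(["a", "b"], ["a", "b", "c"], omission_rate,
                        "completeness omission rate", threshold=0.5)
```

Because the measure is passed in as a parameter, the same wrapper can host any of the atomic tests described below, which is what makes the processes uniform and chainable.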

There are three main aspects of data quality issues to be tested with DQ WPS processes. They are as follows.

  1. Data Quality Assurance/Quality Control WPS,

  2. Encoding/curating data quality: correctness, completeness, consistency, and provenance, and

  3. Standard data quality metadata consumption: making mapping to NSG metadata framework mandatory, and providing both machine-readable and human-readable formats.

7.1.2. Completeness Omission/Completeness Commission

Completeness has two connotations. One is omission, i.e. how much is missing from the geospatial database. The other is commission, i.e. how much is falsely included in the geospatial database. The measurements can be expressed as quantities, rates, or duplicates. This can be implemented as one WPS process that performs the computation by comparing the geospatial database with a reference geospatial database.
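A minimal sketch of this comparison, assuming features can be matched by identifier; the function name and dictionary layout are illustrative only:

```python
def completeness(target: set, reference: set) -> dict:
    """Compare feature identifiers of a target dataset to a reference."""
    omission = reference - target     # in reference, missing from target
    commission = target - reference   # in target, not in reference
    return {
        "omission_count": len(omission),
        "commission_count": len(commission),
        "omission_rate": len(omission) / len(reference),
        "commission_rate": len(commission) / len(reference),
    }

target = {"road_1", "road_2", "road_9"}
reference = {"road_1", "road_2", "road_3", "road_4"}
report = completeness(target, reference)
# road_3 and road_4 are omissions; road_9 is a commission
```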

7.1.3. Positional Accuracy

Positional accuracy relates to geometrical measurements. There are two cases with quite distinct characteristics due to their different formats: vector and raster. Two separate processes are proposed to deal with these different geospatial databases.

Positional Accuracy (vector feature)

Vector-based geospatial features are often managed in a database or database-like system. Each feature has a set of attributes, and one or more fields form the primary key. By associating the database with the reference database, one can verify whether the features have the required positional accuracy. This will be designed as a dedicated WPS process.
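The primary-key association described above might be sketched as follows, joining feature positions by identifier and summarizing positional error as RMSE; the record layout is an assumption for illustration:

```python
import math

def positional_accuracy(target: dict, reference: dict) -> dict:
    """target/reference map a primary-key value to an (x, y) position."""
    # join the two datasets on their shared primary keys
    errors = [math.dist(target[k], reference[k])
              for k in target.keys() & reference.keys()]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"n": len(errors), "rmse": rmse, "max_error": max(errors)}

t = {"f1": (10.0, 10.0), "f2": (20.0, 21.0)}
r = {"f1": (10.0, 10.0), "f2": (20.0, 20.0)}
stats = positional_accuracy(t, r)   # f2 is displaced by 1.0 unit
```

The resulting statistics could then be compared against the threshold supplied to the WPS process to produce the conformance statement.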

Positional Accuracy (gridded)

For raster-based geospatial data, positional accuracy concerns spatial resolution and location displacement. Comparison and validation against a reference raster-based dataset need to consider both. This will be developed as a dedicated WPS process that checks positional accuracy against a reference dataset.
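A simplified sketch of the two checks, assuming each grid is described by an origin and a pixel size; this tuple layout is an illustrative assumption, not an OGC encoding:

```python
def raster_alignment(target, reference, tol=1e-6):
    """Each grid is (origin_x, origin_y, pixel_size); an assumed layout."""
    tx, ty, tres = target
    rx, ry, rres = reference
    same_res = abs(tres - rres) < tol
    # displacement of the target origin, expressed in reference pixels
    shift = ((tx - rx) / rres, (ty - ry) / rres)
    return {"same_resolution": same_res, "origin_shift_pixels": shift}

check = raster_alignment((100.5, 200.0, 30.0), (100.0, 200.0, 30.0))
# a sub-pixel shift of 0.5 / 30 pixels in x, none in y
```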

7.1.4. Topological Consistency

Geometrical contradictions should not exist in the resulting geospatial database. This check verifies that geospatial rules are met, such as one location per point, polygons bounded by lines, etc. A WPS process will be designed and developed to perform the consistency check within a single geospatial database.
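Two such rules (ring closure and no duplicate consecutive vertices) can be sketched on plain coordinate lists; this representation is an assumption for illustration, not a GML encoding:

```python
def ring_closed(ring):
    """A polygon ring must start and end at the same point."""
    return ring[0] == ring[-1]

def duplicate_vertices(ring):
    """Indices where two consecutive vertices coincide (a violation)."""
    return [i for i in range(len(ring) - 1) if ring[i] == ring[i + 1]]

ok_ring = [(0, 0), (1, 0), (1, 1), (0, 0)]    # closed, no duplicates
bad_ring = [(0, 0), (1, 0), (1, 0), (1, 1)]   # open, duplicate at index 1
```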

7.2. Recommendations

ISO standards will be adopted to encode data quality. Specifically, ISO 19157 is one of the primary standards supporting different aspects of data quality. The mapping of elements can be seen in the following figure.

Figure 7.2. ISO 19157 Element Map

In the overall design, the following are recommended for dealing with data quality issues.

  1. WPS workflow enablement with BPMN for flexibility

  2. Seven important aspects for data quality control: location-based-service position correction, cleaning, model-based validation, authoritative data comparison, automatic validation, linked data analysis, and semantic harmonization (Meek, Jackson, and Leibovici, 2014)

  3. Recommended levels of data quality metadata: multiple levels of conformance to meet different requirements and standard information to make users aware of levels of data quality assurance and data quality control.

7.2.1. Completeness Omission/Completeness Commission WPS processes

The following table defines the generic WPS process for evaluating Completeness Omission/Completeness Commission. There are two types: omission and commission. The processes can be further broken down into separate processes for vector-based and raster-based features.

Table 7.1. Completeness WPS Process

Name:

iso19157.DQ_Completeness.DQ_Completeness

Description:

1. Calculate omission and commission of a dataset based on a reference dataset.

2. Calculate rate of omission and commission of a dataset based on a reference dataset.

3. Calculate duplicate features within a dataset (vector-based and raster-based features).

Input:

Target dataset, field of interest; reference dataset, field of interest declaration.

Algorithm:

1. Summarizes the data in each dataset, calculates entry type and frequency for both datasets, and compares the results.

2. Uses the summary table calculated in 1) and calculates a percentage of omission/commission.

3. Performs a multi-step check on the dataset: compares the geometry of each feature to all other features; if the geometries match, compares each of the fields within the dataset; if all values match, the entry is a duplicate.

Output:

One of the following:

1. A table listing all data types and frequency for both target and reference datasets.

2. A list of data types and rate of omission/commission

3. The number of duplicate features.
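Algorithm step 3 above (the multi-step duplicate check) can be sketched as a pairwise comparison of geometry first and attribute fields second; the feature representation is an assumption for illustration:

```python
def count_duplicates(features):
    """features: list of (geometry, attributes) pairs; assumed layout."""
    duplicates = 0
    for i, (geom_i, attrs_i) in enumerate(features):
        for geom_j, attrs_j in features[i + 1:]:
            # geometry first; only matching geometries get a field comparison
            if geom_i == geom_j and attrs_i == attrs_j:
                duplicates += 1
    return duplicates

features = [((0, 0), {"name": "well"}),
            ((0, 0), {"name": "well"}),   # exact duplicate of the first
            ((0, 0), {"name": "pump"})]   # same geometry, different fields
```

The pairwise scan is quadratic in the number of features; a production process would likely hash geometries first, but the sketch mirrors the steps as the table states them.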

Completeness Omission WPS processes

This section describes completeness omission WPS processes.

Completeness Omission WPS process for vector-based dataset

The following table defines the WPS process to evaluate the Completeness Omission of a vector-based dataset.

Table 7.2. Completeness Omission WPS Process for vector-based dataset

Name:

iso19157.DQ_Completeness.DQ_CompletenessOmission

Description:

1. Calculate omission of a vector dataset based on a reference vector dataset.

2. Calculate rate of omission of a vector dataset based on a reference vector dataset.

3. Calculate duplicated features within a vector dataset.

Input:

1. Target vector dataset to be qualified

2. Reference vector dataset to qualify the target vector dataset against

3. Lookup field for the target vector dataset

4. Lookup field for the reference vector dataset

5. Link to metadata document (optional)

6. Threshold for omission rate (percentage)

Algorithm:

1. Summarizes the data in each dataset, calculates entry type and frequency for both datasets, and compares the results.

2. Uses the summary table calculated in 1) and calculates a percentage of omission.

3. Performs a multi-step check on the dataset: compares the geometry of each feature to all other features; if the geometries match, compares each of the fields within the dataset; if all values match, the entry is a duplicate.

Output:

One of the following:

1. A table listing all data types and frequency for both target and reference datasets.

2. A list of data types and rate of omission

3. The number of duplicate features.

UML:

See Figure 7.3.

Example:

Endpoint: http://54.201.124.35/wps/WebProcessingService

Request: See example shown in Table A.1. in Appendix A.

Response: See example response shown in Table A.1. in Appendix A.

Figure 7.3. UML model for the Completeness Omission WPS process (vector-based dataset)
Completeness Omission WPS process for raster-based dataset

The following table defines the WPS process to evaluate the Completeness Omission of a raster-based dataset.

Table 7.3. Completeness Omission WPS Process for raster-based dataset

Name:

iso19157.DQ_Completeness.DQ_CompletenessOmissionR

Description:

1. Calculate omission of a raster dataset based on a reference raster dataset.

2. Calculate rate of omission of a raster dataset based on a reference raster dataset.

3. Calculate duplicated features within a raster dataset.

Input:

1. Target raster dataset to be qualified

2. Link to metadata document (optional)

3. Threshold for omission rate (percentage)

Algorithm:

1. Summarizes the data and calculates entry type and frequency for the input dataset.

2. Uses the summary table calculated in 1) and calculates a percentage of omission.

3. Performs a multi-step check on the dataset: compares the pixels of a feature to all other features; if the geometries match, compares each of the fields within the dataset; if all values match, the entry is a duplicate.

Output:

One of the following as the result of the test:

1. A table listing all data types and frequency.

2. A list of data types and rate of omission

3. The number of duplicate features.

UML:

See Figure 7.4.

Example:

Request: See example shown in Table A.2. in Appendix A.

Response: See example response shown in Table A.2. in Appendix A.

Figure 7.4. UML model for the Completeness Omission WPS process (raster-based dataset)
Completeness Commission WPS processes

This section describes completeness commission WPS processes.

Completeness Commission WPS process for vector-based dataset

The following table defines the WPS process to evaluate the Completeness Commission of a vector-based dataset.

Table 7.4. Completeness Commission WPS Process for vector-based dataset

Name:

iso19157.DQ_Completeness.DQ_CompletenessCommission

Description:

1. Calculate commission of a vector dataset based on a reference vector dataset.

2. Calculate rate of commission of a vector dataset based on a reference vector dataset.

3. Calculate duplicated features within a vector dataset.

Input:

1. Target vector dataset to be qualified

2. Reference vector dataset to qualify the target vector dataset against

3. Lookup field for the target vector dataset

4. Lookup field for the reference vector dataset

5. Link to metadata document (optional)

6. Threshold for commission rate (percentage)

Algorithm:

1. Summarizes the data in each dataset, calculates entry type and frequency for both datasets, and compares the results.

2. Uses the summary table calculated in 1) to calculate a percentage of commission.

3. Performs a multi-step check on the dataset. Compares the geometries of a feature to those of all other features; if the geometries match, then compares each of the fields within the dataset; if the values all match, then the entry is a duplicate.

Output:

One of the following:

1. A table listing all data types and frequency for both target and reference datasets.

2. A list of data types and rate of commission

3. The number of duplicate features.

UML:

See Figure 7.5.

Example:

Request: See example shown in Table A.3. in Appendix A.

Response: See example response shown in Table A.3. in Appendix A.
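Step 3), the multi-step duplicate check, can be sketched as follows. This is an illustration only; representing a feature as a (geometry, attributes) pair, with the geometry as a coordinate tuple and the attributes as a dictionary, is an assumption:

```python
def count_duplicates(features):
    """Multi-step duplicate check: two features are duplicates when
    their geometries match AND every attribute field matches."""
    duplicates = 0
    for i, (geom_a, attrs_a) in enumerate(features):
        for geom_b, attrs_b in features[i + 1:]:
            if geom_a == geom_b and attrs_a == attrs_b:
                duplicates += 1
    return duplicates

features = [
    ((10.0, 20.0), {"name": "A"}),
    ((10.0, 20.0), {"name": "A"}),  # duplicate of the first feature
    ((10.0, 20.0), {"name": "B"}),  # same geometry, different attribute
]
count = count_duplicates(features)
```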

Figure 7.5. UML model for the Completeness Commission WPS process (vector-based dataset)
Completeness Commission WPS process for raster-based dataset

The following table defines the WPS process to evaluate the Completeness Commission of a raster-based dataset.

Table 7.5. Completeness Commission WPS Process for raster-based dataset

Name:

iso19157.DQ_Completeness.DQ_CompletenessCommissionR

Description:

1. Calculate commission of a raster dataset based on a reference raster dataset.

2. Calculate rate of commission of a raster dataset based on a reference raster dataset.

3. Calculate duplicated features within a raster dataset.

Input:

1. Target raster dataset to be qualified

2. Link to metadata document (optional)

3. Threshold for commission rate (percentage)

Algorithm:

1. Summarizes the data in the input dataset and calculates entry type and frequency.

2. Uses the summary table calculated in 1) to calculate a percentage of commission.

3. Performs a multi-step check on the dataset. Compares the pixels of a feature to those of all other features; if the geometries match, then compares each of the fields within the dataset; if the values all match, then the entry is a duplicate.

Output:

One of the following as the result of the test:

1. A table listing all data types and frequency.

2. A list of data types and rate of commission

3. The number of duplicate features.

UML:

See Figure 7.6.

Example:

Request: See example shown in Table A.4. in Appendix A.

Response: See example response shown in Table A.4. in Appendix A.

Figure 7.6. UML model for the Completeness Commission WPS process (raster-based dataset)

7.2.2. Positional Accuracy WPS processes

This section describes positional accuracy WPS processes.

Positional Accuracy (vector feature) WPS process

The following table defines the Positional Accuracy (vector feature) WPS processes.

Table 7.6. Positional Accuracy (vector feature) WPS processes

Name:

iso19157.DQ_PositionalAccuracy​.DQ_AbsoluteExternalPositionalAccuracy

Description:

Calculates the positional accuracy of a target dataset given a reference dataset and lookup field

Input:

Target dataset, target dataset field ID, reference dataset, reference dataset field ID

Algorithm:

It matches entries in the target dataset with those in the reference dataset by comparing their identifiers (IDs), which must be integers - i.e. the target dataset field ID and reference dataset field ID defined in the inputs.

Output:

The mean uncertainties as defined by ISO 19157

UML:

See Figure 7.7.

Example:

Request: See example shown in Table B.1. in Appendix B.

Response: See example response shown in Table B.1. in Appendix B.
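The matching-by-ID algorithm can be illustrated with a short sketch. This is not the deployed process; the (id, (x, y)) tuple representation of features and the use of mean Euclidean distance as the uncertainty measure are assumptions:

```python
import math

def mean_displacement(target, reference):
    """Match target and reference points on their integer ID fields
    and average the Euclidean displacement of each matched pair."""
    ref_by_id = {fid: xy for fid, xy in reference}
    distances = [
        math.hypot(x - ref_by_id[fid][0], y - ref_by_id[fid][1])
        for fid, (x, y) in target
        if fid in ref_by_id
    ]
    return sum(distances) / len(distances)

# Point 1 is displaced by 1 unit, point 2 not at all.
target = [(1, (0.0, 0.0)), (2, (3.0, 4.0))]
reference = [(1, (0.0, 1.0)), (2, (3.0, 4.0))]
```

In the deployed process this mean would be compared against the supplied threshold to produce the pass/fail statement.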

Figure 7.7. UML model for the Positional Accuracy WPS process (vector-based dataset)
Positional Accuracy (gridded) WPS process

The following table defines the Positional Accuracy (gridded) WPS processes.

Table 7.7. Positional Accuracy (gridded) WPS processes

Name:

iso19157.DQ_PositionalAccuracy​.DQ_GriddedDataPositionalAccuracy

Description:

Calculates the positional accuracy of a raster dataset based upon edges of buildings matched to a vector reference dataset.

Input:

A georeferenced raster dataset as a GeoTIFF, set of reference polygons, threshold for edge detect (0-255), area for noise removal (very small area polygons usually removed as they constitute noise).

Algorithm:

The process does the following:

• Histogram stretch

• Laplace filter

• Black and White conversion

• Black and white binary image creation

• Black and white binary to polygons

• Polygon distance to nearest reference polygon.

Output:

The mean distance uncertainty as defined by ISO 19157, the Laplace image, the generated polygons.

UML:

See Figure 7.8.

Example:

Request: See example shown in Table B.2. in Appendix B.

Response: See example response shown in Table B.2. in Appendix B.
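The first steps of the pipeline (Laplace filtering and black-and-white binary image creation) can be sketched in pure Python. The 3x3 kernel, the threshold handling, and the nested-list image representation are illustrative assumptions; the polygonization and reference-distance steps are omitted:

```python
def laplace_filter(img):
    """Apply a 3x3 Laplacian convolution (edge response) to a 2-D
    list of pixel values; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                         - img[y][x - 1] - img[y][x + 1])
    return out

def binarize(img, threshold):
    """Black-and-white conversion: edge response above the threshold
    (0-255 in the process inputs) becomes 1, everything else 0."""
    return [[1 if abs(v) > threshold else 0 for v in row] for row in img]

# A flat image with one bright pixel: the Laplacian responds at the
# pixel and its four neighbours, and thresholding yields an edge mask.
img = [[0] * 5 for _ in range(5)]
img[2][2] = 255
edges = binarize(laplace_filter(img), 100)
```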

Figure 7.8. UML model for the Positional Accuracy WPS process (raster-based dataset)

7.2.3. Logical Consistency WPS processes

This section describes logical consistency WPS processes.

Topological Consistency WPS process

The following table defines the Topological Consistency WPS processes.

Table 7.8. Topological Consistency WPS processes (vector features)

Name:

iso19157.DQ_LogicalConsistency.DQ_TopologicalConsistency

Description:

Calculates and reports on potential topological issues inside a single dataset.

Input:

Target dataset.

Algorithm:

For line data, check the number of dangles. For polygon data, check the number of overlaps. When an optional buffer distance or tolerance parameter is provided, only polygon overlaps or line dangles exceeding the given distance are counted, and the threshold is evaluated as the percentage of failures.

Output:

The number of topological errors found (e.g. the number of overlapping polygons).

UML:

See Figure 7.9.

Example:

Request: See example shown in Table C.1. in Appendix C.

Response: See example response shown in Table C.1. in Appendix C.
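The dangle count for line data can be sketched as follows. This is an illustration only; representing a line segment by its two endpoint tuples, and treating any endpoint shared with no other segment as a dangle, are assumptions:

```python
from collections import Counter

def count_dangles(segments):
    """Count dangles: segment endpoints that touch no other segment."""
    endpoints = Counter()
    for start, end in segments:
        endpoints[start] += 1
        endpoints[end] += 1
    return sum(1 for count in endpoints.values() if count == 1)

segments = [
    ((0, 0), (1, 0)),  # shares (1, 0) with the next segment
    ((1, 0), (1, 1)),  # (1, 1) touches nothing else -> dangle
]
# (0, 0) and (1, 1) each appear once, so two dangles are reported.
dangles = count_dangles(segments)
```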

Figure 7.9. UML model for the Topological Consistency WPS process (vector-based dataset)
Conceptual Consistency WPS process

The following table defines the generic Conceptual Consistency WPS processes.

Table 7.9. Conceptual Consistency WPS processes

Name:

iso19157.DQ_LogicalConsistency.DQ_ConceptualConsistency

Description:

Compares the input dataset with the conceptual schema.

Input:

input image, metadata link, conceptual schema.

Algorithm:

Depending on how the conceptual schema is expressed, the process compares the target dataset against the schema.

Output:

Updated metadata, conformance statement
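A comparison of this kind can be sketched for the simple case where the conceptual schema is expressed as a mapping of field names to types. That representation, and the shape of the conformance result, are assumptions for illustration:

```python
def conforms_to_schema(dataset_fields, schema_fields):
    """Compare the dataset's field names and types with the conceptual
    schema; report any missing fields or type mismatches."""
    problems = []
    for name, expected_type in schema_fields.items():
        if name not in dataset_fields:
            problems.append(f"missing field: {name}")
        elif dataset_fields[name] != expected_type:
            problems.append(f"type mismatch on {name}")
    return (len(problems) == 0, problems)

schema = {"id": "integer", "name": "string"}
ok, problems = conforms_to_schema({"id": "integer", "name": "string"}, schema)
```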

Conceptual Consistency WPS process (vector features)

The following table defines the Conceptual Consistency WPS processes for vector dataset.

Table 7.10. Conceptual Consistency WPS processes (vector features)

Name:

iso19157.DQ_LogicalConsistency.DQ_ConceptualConsistency

Description:

Compares the input dataset with the conceptual schema.

Input:

input vector dataset, metadata link, conceptual schema.

Algorithm:

Depending on how the conceptual schema is expressed, the process compares the target dataset against the schema.

Output:

Updated metadata, conformance statement

UML:

See Figure 7.10.

Example:

Request: See example shown in Table C.2. in Appendix C.

Response: See example response shown in Table C.2. in Appendix C.

Figure 7.10. UML model for the Conceptual Consistency WPS process (vector-based dataset)
Conceptual Consistency WPS process (raster dataset)

The following table defines the Conceptual Consistency WPS processes for raster dataset.

Table 7.11. Conceptual Consistency WPS processes (raster dataset)

Name:

iso19157.DQ_LogicalConsistency.DQ_ConceptualConsistencyR

Description:

Compares the input dataset with the conceptual schema.

Input:

input image, metadata link, conceptual schema.

Algorithm:

Depending on how the conceptual schema is expressed, the process compares the target dataset against the schema.

Output:

Updated metadata, conformance statement

UML:

See Figure 7.11.

Example:

Request: See example shown in Table C.3. in Appendix C.

Response: See example response shown in Table C.3. in Appendix C.

Figure 7.11. UML model for the Conceptual Consistency WPS process (raster-based dataset)
Domain Consistency WPS process

The following table defines the Domain Consistency WPS processes.

Table 7.12. Domain Consistency WPS processes

Name:

iso19157.DQ_LogicalConsistency.DQ_DomainConsistency

Description:

Calculates and reports on a quantitative data field based on bounds.

Input:

Target dataset, field name, minimum bound, maximum bound, metadata document link.

Algorithm:

For numerical data only, check each record in a field for conformance to the bounds.

Output:

The nonconforming features, a statement of the domain consistency, the metadata document with the updated Domain Consistency field.

UML:

See Figure 7.12.

Example:

Request: See example shown in Table C.4. in Appendix C.

Response: See example response shown in Table C.4. in Appendix C.
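The bounds check described above can be sketched as follows. The record and field representations are illustrative assumptions; the deployed process would also write the result into the metadata document:

```python
def check_domain(records, field, minimum, maximum):
    """Return the records whose numeric value falls outside the
    inclusive bounds, plus a pass/fail conformance statement."""
    nonconforming = [r for r in records
                     if not (minimum <= r[field] <= maximum)]
    statement = "conformant" if not nonconforming else "non-conformant"
    return nonconforming, statement

records = [{"depth": 3.2}, {"depth": 11.0}, {"depth": 7.5}]
bad, statement = check_domain(records, "depth", 0.0, 10.0)
```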

Figure 7.12. UML model for the Domain Consistency WPS process (vector-based dataset)

7.2.4. Thematic Consistency WPS process

This section describes thematic consistency WPS processes.

Classification Correctness WPS process

The following table defines the Classification Correctness WPS processes.

Table 7.13. Classification Correctness WPS processes

Name:

iso19157.DQ_ThematicAccuracy​.DQ_ThematicClassificationCorrectness

Description:

This process is for domain classified raster datasets that have been generated from imagery. For example, soil, land use, agricultural datasets.

Input:

input GeoTiff, input reference data (polygon), metadata link.

Algorithm:

Check classifications against the universe of discourse provided by an input. Checks each pixel against the corresponding polygon for correctness.

Output:

Updated metadata, conformance statement.
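The per-pixel correctness check can be sketched for the simplified case where the reference polygons have already been rasterized to a class grid of the same shape as the classified image. That simplification, and reporting the result as a percent-correct figure, are assumptions for illustration:

```python
def classification_correctness(classified, reference):
    """Compare each classified pixel with the reference class at the
    same position and return the percentage of correct pixels."""
    total = correct = 0
    for row_c, row_r in zip(classified, reference):
        for c, r in zip(row_c, row_r):
            total += 1
            correct += (c == r)
    return 100.0 * correct / total

classified = [[1, 1], [2, 3]]
reference = [[1, 1], [2, 2]]
# Three of the four pixels agree with the reference classes.
pct = classification_correctness(classified, reference)
```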

Classification Correctness WPS process (vector features)

The following table defines the Classification Correctness WPS processes for vector features.

Table 7.14. Classification Correctness WPS processes (vector features)

Name:

iso19157.DQ_ThematicAccuracy​.DQ_ThematicClassificationCorrectness

Description:

This process is for domain classified raster datasets that have been generated from imagery. For example, soil, land use, agricultural datasets.

Input:

input target data (raster) in GeoTiff, input reference data (polygon), metadata link.

Algorithm:

Check classifications against the universe of discourse provided by an input. Checks each pixel against the corresponding polygon for correctness.

Output:

Updated metadata, conformance statement.

UML:

See Figure 7.13.

Example:

Request: See example shown in Table D.1. in Appendix D.

Response: See example response shown in Table D.1. in Appendix D.

Figure 7.13. UML model for the Classification Correctness WPS process (vector-based dataset)
Classification Correctness WPS process (raster dataset)

The following table defines the Classification Correctness WPS processes for raster dataset.

Table 7.15. Classification Correctness WPS processes (raster dataset)

Name:

iso19157.DQ_ThematicAccuracy​.DQ_ThematicClassificationCorrectnessR

Description:

This process is for classified raster datasets that have been generated from imagery. For example, soil, land use, agricultural datasets.

Input:

input GeoTiff, input reference data (polygon), metadata link.

Algorithm:

Check classifications against the universe of discourse provided by an input. Checks each pixel against the corresponding polygon for correctness.

Output:

Updated metadata, conformance statement.

UML:

See Figure 7.14.

Example:

Request: See example shown in Table D.2. in Appendix D.

Response: See example response shown in Table D.2. in Appendix D.

Figure 7.14. UML model for the Classification Correctness WPS process (raster-based dataset)
Non-Quantitative Attribute Accuracy WPS process

The following table defines the Non-Quantitative Attribute Accuracy WPS processes.

Table 7.16. Non-Quantitative Attribute Accuracy WPS processes

Name:

iso19157.DQ_ThematicAccuracy​.DQ_NonQuantitativeAttributeAccuracy

Description:

Check non-quantitative attribute consistency.

Input:

target dataset, the dataset to be used as an authoritative source, the field that holds the non-quantitative values in the target dataset, the field that holds the values in the reference dataset, a link to the metadata document (optional), and the failure threshold as a percentage.

Algorithm:

Check the consistency of the dataset against the reference dataset.

Output:

List of the non-conforming points, the statement of conformance (findings of the test), the full updated metadata document (available if provided as an input), and the results of the test expressed as a metadata chunk.

UML:

See Figure 7.15.

Example:

Request: See example shown in Table D.3. in Appendix D.

Response: See example response shown in Table D.3. in Appendix D.
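The consistency check against the authoritative source can be sketched as follows. Matching records by a shared ID and expressing the result as a failure rate tested against the percentage threshold are assumptions for illustration:

```python
def nonquantitative_accuracy(target, reference, field, threshold):
    """Compare a categorical field against the authoritative source;
    return the non-conforming record IDs and whether the failure rate
    stays within the percentage threshold."""
    nonconforming = [tid for tid, rec in target.items()
                     if rec[field] != reference[tid][field]]
    failure_rate = 100.0 * len(nonconforming) / len(target)
    return nonconforming, failure_rate <= threshold

target = {1: {"landuse": "forest"}, 2: {"landuse": "urban"}}
reference = {1: {"landuse": "forest"}, 2: {"landuse": "water"}}
# One of two records disagrees: a 50% failure rate against a 10% threshold.
bad, passed = nonquantitative_accuracy(target, reference, "landuse", 10.0)
```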

Figure 7.15. UML model for the Non-Quantitative Attribute Accuracy WPS process (vector-based dataset)
Quantitative Attribute Accuracy WPS process

The following table defines the Quantitative Attribute Accuracy WPS processes.

Table 7.17. Quantitative Attribute Accuracy WPS processes

Name:

iso19157.DQ_ThematicAccuracy​.DQ_QuantitativeAttributeAccuracy

Description:

Compares a quantitative field from two datasets.

Input:

The dataset to be qualified, the dataset to be used as an authoritative source, the field that holds the quantitative values in the target dataset, the field that holds the values in the reference dataset, a link to the metadata document (optional), and the failure threshold as a percentage.

Algorithm:

Compares two datasets quantitatively on selected quantitative fields.

Output:

The statement of conformance (findings of the test), the full updated metadata document (available if provided as an input), and the results of the test expressed as a metadata chunk.

UML:

See Figure 7.16.

Example:

Request: See example shown in Table D.4. in Appendix D.

Response: See example response shown in Table D.4. in Appendix D.
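One plausible quantitative comparison is a root-mean-square difference on the selected field. The table above does not prescribe a specific measure, so the choice of RMSE and the record representation below are assumptions:

```python
import math

def quantitative_accuracy(target, reference, field):
    """Root-mean-square difference of a quantitative field between the
    target and the authoritative dataset, matched by record ID."""
    diffs = [target[fid][field] - reference[fid][field] for fid in target]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Differences of 0 and -2 on the "elev" field give an RMSE of sqrt(2).
target = {1: {"elev": 10.0}, 2: {"elev": 20.0}}
reference = {1: {"elev": 10.0}, 2: {"elev": 22.0}}
rmse = quantitative_accuracy(target, reference, "elev")
```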

Figure 7.16. UML model for the Quantitative Attribute Accuracy WPS process (vector-based dataset)

8. Use Cases

8.1. Use Case 1 - Completeness Omission/Completeness Commission

This use case tests the capabilities in evaluating data quality for Completeness Omission/Completeness Commission.

8.1.1. Use Case 1.1 - Evaluate Data Quality on Completeness Omission

Use Case 1.1.1 Evaluate Data Quality on Completeness Omission (vector feature)
Table 8.1. Use Case for the WPS Process of Data Quality Completeness Omission (vector feature)

Use Case Number

UC1.1.1

Description

This use case demonstrates using the DQ WPS process to check missing data in a dataset against a reference dataset in vector format. Processing two identical vector datasets should return "passed" (or boolean value 1) as expected.

Area map or study area description

In this demonstration, both the target dataset and the reference dataset used the same OpenStreetMap place-name dataset of Canada. The following figure shows the map area.

Target dataset (vector features, points): See Figure 8.1.

Reference dataset (vector features, points): See Figure 8.2.

Test Page

http://54.201.124.35/wps/test_client

Request File

Completeness.CompletenessOmission2.0.xml

Example Execution

See example request and response in Table A.1. in Appendix A.

Figure 8.1. Target dataset for Use Case 1.1.1
Figure 8.2. Reference dataset for Use Case 1.1.1
Use Case 1.1.2 Evaluate Data Quality on Completeness Omission (raster dataset)
Table 8.2. Use Case for the WPS Process of Data Quality Completeness Omission (raster dataset)

Use Case Number

UC1.1.2

Description

This use case demonstrates using the DQ WPS process to check missing data by comparing the dataset resolution to a required resolution in raster format. The value of "passed" (or boolean value 1) should be returned if the resolution of the dataset is less than the given threshold value.

Area map or study area description

A GeoTIFF dataset with a resolution of 0.004000087833889381 degrees in a geographic coordinate system was used. The test checks whether the resolution is below the threshold of 1. Therefore, the test passes as expected. The following figure shows the dataset used.

Test dataset (raster): See Figure 8.3.

Test Page

http://54.201.124.35/wps/test_client

Request File

Completeness.CompletenessOmissionR2.0.xml

Example Execution

See example request and response in Table A.2. in Appendix A.
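The resolution test in this use case reduces to a single comparison. The function below is an illustrative sketch under that assumption, not the deployed process, which reads the pixel size from the GeoTIFF itself:

```python
def resolution_check(pixel_size, threshold):
    """Raster completeness test: pass when the dataset's resolution
    (pixel size in CRS units) is below the required threshold."""
    return pixel_size < threshold

# The use case dataset: pixel size ~0.004 degrees against threshold 1.
passed = resolution_check(0.004000087833889381, 1.0)
```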

Figure 8.3. Test dataset for Use Case 1.1.2
Use Case 1.2.1 Evaluate Data Quality on Completeness Commission (vector features)
Table 8.3. Use Case for the WPS Process of Data Quality Completeness Commission (vector features)

Use Case Number

UC1.2.1

Description

This use case demonstrates using the DQ WPS process to check data commission by verifying if there is too much data within a dataset in vector format. Processing two identical vector datasets should return "passed" (or boolean value 1) as expected.

Area map or study area description

In this demonstration, both the target dataset and the reference dataset used the same OpenStreetMap place-name dataset of Canada. The following figure shows the map area.

Target dataset (vector, points): See Figure 8.4.

Reference dataset (vector, points): See Figure 8.5.

Test Page

http://54.201.124.35/wps/test_client

Request File

Completeness.CompletenessCommission2.0.xml

Example Execution

See example request and response in Table A.3. in Appendix A.

Figure 8.4. Target dataset for Use Case 1.2.1
Figure 8.5. Reference dataset for Use Case 1.2.1
Use Case 1.2.2 Evaluate Data Quality on Completeness Commission (raster dataset)
Table 8.4. Use Case for the WPS Process of Data Quality Completeness Commission (raster dataset)

Use Case Number

UC1.2.2

Description

This use case demonstrates using the DQ WPS process to check data commission by comparing the dataset resolution to a required resolution in raster format. The value of "passed" (or boolean value 1) should be returned if the resolution of the dataset is greater than the given threshold value.

Area map or study area description

A GeoTIFF dataset with a resolution of 0.004000087833889381 degrees in a geographic coordinate system was used. The test checks whether the resolution is above the threshold of 1. Therefore, the test fails (or returns boolean value 0) as expected. The following figure shows the dataset used.

Test dataset: See Figure 8.6.

Test Page

http://54.201.124.35/wps/test_client

Request File

Completeness.CompletenessCommissionR2.0.xml

Example Execution

See example request and response in Table A.4. in Appendix A.

Figure 8.6. Test dataset for Use Case 1.2.2

8.2. Use Case 2 - Positional Accuracy

This section covers the use cases for utilizing Positional Accuracy WPS DQ processes.

8.2.1. Use Case 2.1 - Positional Accuracy (vector feature)

Table 8.5. Use Case for the WPS Process of Data Quality Positional Accuracy (vector features)

Use Case Number

UC2.1

Description

This use case demonstrates using the DQ WPS process to check positional displacement by using a reference dataset to match pairs with a target dataset and establish any displacement in vector format. The average displacement is compared against the given threshold to determine whether the given dataset passes the verification overall.

Area map or study area description

In this demonstration, two datasets were used to map the movement of bugs (e.g. beetles). The following figures show a small section of the bug maps in Lawrence County, South Dakota, USA. The result showed that the test failed to pass the given threshold of 10, since the mean displacement from the authoritative data is 94.84364940038897 for the two given datasets.

Bug map (before their move): See Figure 8.7.

Bug map (after their move): See Figure 8.8.

Displacement map (yellow dots - before; red dots - after): See Figure 8.9.

Test Page

http://54.201.124.35/wps/test_client

Request File

PositionalAccuracy.AbsoluteExternalPositionalAccuracy2.0.xml

Example Execution

See example request and response in Table B.1. in Appendix B.

Figure 8.7. Target dataset for Use Case 2.1
Figure 8.8. Reference dataset for Use Case 2.1
Figure 8.9. Displacement of points in target dataset and reference dataset for Use Case 2.1

8.2.2. Use Case 2.2 - Positional Accuracy (gridded)

Table 8.6. Use Case for the WPS Process of Data Quality Positional Accuracy (raster dataset)

Use Case Number

UC2.2

Description

This use case demonstrates using the DQ WPS process to check positional accuracy by verifying the bounding box of a raster dataset against the bounding box of an authoritative dataset in gridded format. The value of "passed" (or boolean value 1) should be returned if the computed accuracy is less than the given threshold value.

Area map or study area description

A small portion of data was processed and generated using raster data that covers a portion of Pennsylvania, USA. The calculated accuracy for the dataset is 0.000000005172098205179317, which is below the given threshold of 10. Therefore, the test passes (or returns boolean value 1) as expected. The following figure shows the dataset used.

Target dataset (raster dataset): See Figure 8.10.

Reference dataset (vector dataset): See Figure 8.11.

Test Page

http://54.201.124.35/wps/test_client

Request File

PositionalAccuracy.GriddedDataPositionalAccuracy2.0.xml

Example Execution

See example request and response in Table B.2. in Appendix B.