Publication Date: 2017-04-25

Approval Date: 2016-03-09

Posted Date: 2016-12-02

Reference number of this document: OGC 16-059

Reference URL for this document: http://www.opengis.net/doc/PER/t12-A066

Category: Public Engineering Report

Editor: Stephane Fellah

Title: Testbed-12 Semantic Portrayal, Registry and Mediation Engineering Report


OGC Engineering Report

COPYRIGHT

Copyright © 2017 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/

WARNING

This document is an OGC Public Engineering Report created as a deliverable of an initiative from the OGC Innovation Program (formerly OGC Interoperability Program). It is not an OGC standard and not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.

Abstract

This engineering report documents the findings of the activities related to the Semantic Portrayal, Registry and Mediation components implemented during the OGC Testbed 12. This effort is a continuation of efforts initiated in the OGC Testbed 11. This report provides an analysis of the different standards considered during this effort, documents the rendering endpoints extension added to the Semantic Portrayal Service and the migration of the Portrayal metadata to the Semantic Registry, which is aligned with the DCAT REST Service API. We also discuss the integration of the CSW ebRIM for Application Schema with the Semantic Mediation Service, and document the improvements of the SPARQL Extensions, Portrayal and Semantic Mediation ontologies defined in the previous testbed.

Business Value

Catalog services usually provide discovery of data and services; however, the ability to discover other related resources that help applications better understand and render the data, such as the styles available for a layer or a feature type, is not commonly found. Testbed 12 advanced the use of W3C semantic technologies to better integrate datasets, services, schemas, schema mappings, portrayal information, and layers.

What does this ER mean for the Working Group and OGC in general

This engineering report is important to the OGC Geosemantics Domain Working Group because it advances the semantic enablement of geospatial information found in catalogs, such as dataset, service and portrayal metadata, potentially providing a bridge between the geospatial and semantic web communities. The testbed also produced a number of ontologies for portrayal, schema management and registry.

Keywords

ogcdocs, testbed-12, CSW, eb-RIM, catalogue, metadata, SPARQL, RDF, OWL, ontology, semantic web, linked data, DCAT, Portrayal, Schema, Schema Mapping, Hypermedia, REST.

Proposed OGC Working Group for Review and Approval

This engineering report will be submitted to the Geosemantics Domain Working Group for review.

1. Introduction

Catalog services usually provide discovery of data and services; however, the ability to discover other related resources that help applications better understand and render the data, such as the styles available for a layer or a feature type, is not commonly found. This report captures the work performed in Testbed 12 to set up three types of semantic services, based on W3C semantic technologies, that better integrate datasets, services, schemas, schema mappings, portrayal information, and layers.

  1. The Semantic Registry Service supports the search and discovery of geospatial assets (e.g. datasets, services, schemas, portrayal information, and layers). It builds on the W3C Data Catalog Vocabulary (DCAT), the W3C PAV (Provenance, Authoring and Versioning) ontology, and the Dublin Core Metadata Terms. The service is based on a newly developed ontology called the Semantic Registry Information Model (SRIM), which generalizes the DCAT model to accommodate other types of geospatial assets. The service API is hypermedia-driven and uses JSON-LD, Linked Data and the Hypertext Application Language (HAL).

  2. The Semantic Mediation Service provides the ability to transform data from one schema to another, including chaining of transformations. It is based on a SRIM application profile for describing schemas and schema mappings. The service is a hypermedia-driven REST API.

  3. The Semantic Portrayal Service provides the ability to render datasets to multiple output formats (e.g. SVG and PNG) and to generate different styling encodings (e.g. SLD, MapCSS, and CartoCSS). It is based on a set of portrayal ontologies for styles, symbols and graphics. It uses a SRIM profile to represent portrayal metadata.

1.1. Scope

This OGC document specifies semantic information models and REST APIs for the Semantic Registry, Semantic Mediation and Semantic Portrayal Services. It introduces the Semantic Registry Information Model (SRIM), a superset of the W3C DCAT ontology. SRIM can accommodate registry items other than dcat:Dataset, such as Service descriptions, Schema and Schema Mapping descriptions, Portrayal Information (Styles, Portrayal Rules, Graphics and Portrayal Catalog), Layers, and Map Contexts. The Semantic Registry Service is used as an integration solution for federating and unifying information produced by different OGC Catalog Services, providing simplified access through a hypermedia-driven API that uses JSON-LD, Linked Data and HAL-JSON. During the testbed, the Semantic Registry was used to store information about geospatial datasets and services, schemas and portrayal information. The Semantic Mediation Service was used to validate and perform transformations between schemas, including transformation chaining. The Semantic Portrayal Service was used as a convenience API to access Portrayal Registry information and to render geospatial data to different graphic representations (SVG, raster and other pluggable formats).

1.2. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Table 1. Contacts
Name               Organization
Stephane Fellah    Image Matters LLC
Gobe Hobona        Envitia
Richard Martell    Galdos Inc
Luis Bermudez      OGC

1.3. Future Work

1.3.1. SRIM and ISO 19115 mapping

The SRIM ontology in Testbed 12 used Dublin Core Metadata Terms. Future work could enhance SRIM by mapping it to ISO 19115. Change requests against the current ISO 19115 standard might also be needed to align it better with Linked Data: ISO 19115 should make better use of controlled vocabularies, use Linked Data friendly identifiers, and provide a better service description that enables automated access to services.

1.3.2. SRIM Layer and Map Profile

There is no standard way to describe metadata for layers and maps. While layers and maps are derived from a Dataset, they have their own specific metadata. Future work could investigate a profile description for layers and maps that extends the Registry Item and relates them to Datasets, Services and Portrayal Information. The description would be used by the Semantic Registry and the Semantic Portrayal Service.

1.3.3. Pubsub and federation of Registry

In Testbed 12, the Semantic Registry harvested information from a federation of CSW services to exercise the Semantic Registry Information Model (SRIM) and the REST API. Future work could improve the efficiency of the harvesting process by investigating publish/subscribe protocols, and could address version management of the registry items in the Semantic Registry as they change over time.

1.3.4. Web of Vocabulary Ontology and Service

With the deployment of the Semantic Web and Linked Open Data, data sources have multiplied, as have the machine-processable controlled vocabularies that structure and constrain the interpretation of these data. These controlled vocabularies can be ontologies (RDF Schema, OWL), codelists, taxonomies or thesauri (SKOS), sometimes augmented with additional rules (SPARQL/SPIN rules, SWRL, RIF) and constraints (SHACL).

Vocabulary directories exist (e.g. LOV), but there is an ever-increasing demand for environments that simplify searching, editing and collaborative contributions to the vocabularies by non-experts of the Semantic Web. This creates a tension between very rich formalisms and a need to democratize participation in the life cycle of controlled vocabularies.

Vocabularies are most likely to be adopted and shared if they are made available easily. Nevertheless, despite successes in the use of SKOS for encoding vocabularies, current standards provide only low-level interfaces to vocabulary data. For example, many vocabularies are published as an RDF document for download. However, if the vocabulary is large, then the download will be commensurately large; if the user only wants to retrieve a single vocabulary term or select a few terms, this option requires processing on the client side. Alternatively, access to vocabularies is often provided at a SPARQL endpoint. SPARQL is the generic RDF query language. While it is powerful, it is also considered a low-level language similar to the relational database query language SQL and normally is only used by database administrators.

Some SKOS vocabularies are published via other HTTP interfaces. However, each implementation uses different protocols and supports a varied set of features (e.g. content-negotiation provided by the GEMET REST interface and NERC Data Grid’s Vocabulary Server SOAP interface). In some cases, one or both of human-readable formats and machine-readable formats are not available. Thus, discovery and access across vocabulary endpoints becomes challenging and ad-hoc.

There is a clear opportunity to design an API to match the SKOS and OWL vocabularies, taking advantage of the fact that most modern vocabulary content is structured using SKOS and OWL classes and predicates. This API can then be used as the basis for various higher-level vocabulary applications (NLP applications, Concept Recommender, Semantic Enricher, etc.) that can be used to enrich, for example, ISO 19115 metadata and other OGC services that use controlled vocabularies in their metadata.

Testbeds 11 and 12 explored the high-level description of ontologies and schemas to support semantic mediation. Future work can include investigating the kind of metadata needed to enable search on controlled vocabularies, by defining an ontology that addresses the following aspects:

  • Classification of Vocabulary types

  • Relationships to other vocabularies (extensions, imports, specialization, metadata vocabularies, etc.)

  • Statistical information about vocabularies (number of concepts, concept schemes, classes, properties, instances, datatypes)

  • Schema encoding (OWL, RDF Schema, SKOS)

  • Expressiveness

  • Preferred prefix

  • Preferred namespace URI

  • Governance metadata

  • Versioning information

Based on this ontology, we propose to define a standard REST API to search and access vocabulary metadata and vocabulary terms, following REST API best practices (for example, a hypermedia-driven design).
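As a rough illustration of the kind of record such an ontology might describe, the following Python sketch models a few of the aspects listed above as a plain data class and filters a small in-memory list. The class `VocabularyRecord`, its field names, the `search` helper and the classification values of the sample entries are all hypothetical; the report does not define this ontology or API. Only the prefixes and namespace URIs of the sample entries are real.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VocabularyRecord:
    """Hypothetical metadata record for one controlled vocabulary."""
    title: str
    vocabulary_type: str             # e.g. "ontology", "thesaurus", "codelist"
    encoding: str                    # e.g. "OWL", "RDF Schema", "SKOS"
    preferred_prefix: str
    preferred_namespace_uri: str
    version: Optional[str] = None
    # relation name -> prefixes of related vocabularies (assumed structure)
    related: Dict[str, List[str]] = field(default_factory=dict)
    # e.g. {"classes": 12}; left empty here rather than invented
    statistics: Dict[str, int] = field(default_factory=dict)


def search(records, *, vocabulary_type=None, encoding=None, reuses=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if vocabulary_type and rec.vocabulary_type != vocabulary_type:
            continue
        if encoding and rec.encoding != encoding:
            continue
        if reuses and reuses not in rec.related.get("reuses", []):
            continue
        hits.append(rec)
    return hits


# Illustrative entries; type/encoding/relation values are placeholders.
REGISTRY = [
    VocabularyRecord("SKOS", "ontology", "OWL", "skos",
                     "http://www.w3.org/2004/02/skos/core#"),
    VocabularyRecord("DCAT", "ontology", "OWL", "dcat",
                     "http://www.w3.org/ns/dcat#",
                     related={"reuses": ["dct"]}),
]

if __name__ == "__main__":
    for rec in search(REGISTRY, reuses="dct"):
        print(rec.preferred_prefix, rec.preferred_namespace_uri)
```

A REST API along the lines proposed above would presumably expose this kind of filtering through query parameters and links rather than function arguments.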

1.3.5. Application of Shapes Constraint Language (SHACL) for Linked Data

The Semantic Registry Information Model (SRIM), developed during Testbed 12, is defined as a superset of the W3C DCAT standard and encoded as an OWL ontology. However, OWL does not capture some of the semantic integrity constraints that are necessary to validate instance information encoded using the SRIM ontology profiles. This is not an isolated problem. The DCAT ontology, for example, defines a set of classes and properties and reuses a large number of external vocabularies such as Dublin Core, but does not provide any restrictions in the ontology. Users have to read profile documents such as DCAT-AP or GeoDCAT-AP to learn which properties apply to a given class and how (mandatory, recommended or optional). For example, a Dataset may have only one title per language, or contact information must have either a person name, organization name or position name, and either an email address or a telephone number. These kinds of restrictions cannot be captured with OWL, and until now human interpretation has been required to implement the constraints in code.
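To make the problem concrete, the following Python sketch hand-codes the two example constraints just mentioned. It is only an illustration of the "human interpretation in code" situation described above: the dictionary layout and the function name `validate_dataset` are assumptions made for this example and are not part of SRIM, DCAT or any profile.

```python
from collections import Counter


def validate_dataset(dataset):
    """Return a list of human-readable violations for one dataset description.

    Assumed (hypothetical) layout:
      dataset["titles"]  -> list of (language_tag, text) pairs
      dataset["contact"] -> dict with optional person/organization/position
                            and email/telephone entries
    """
    problems = []

    # Constraint 1: at most one title per language tag.
    per_language = Counter(lang for lang, _ in dataset.get("titles", []))
    for lang, count in sorted(per_language.items()):
        if count > 1:
            problems.append(f"{count} titles for language '{lang}'")

    # Constraint 2: a contact needs some kind of name and some way to reach it.
    contact = dataset.get("contact", {})
    if not any(contact.get(k) for k in ("person", "organization", "position")):
        problems.append("contact lacks person, organization or position name")
    if not any(contact.get(k) for k in ("email", "telephone")):
        problems.append("contact lacks email or telephone")

    return problems


if __name__ == "__main__":
    sample = {
        "titles": [("en", "Roads"), ("en", "Road network")],
        "contact": {"person": "A. Person"},
    }
    for problem in validate_dataset(sample):
        print(problem)
```

Every profile would need its own variant of such a function, and the rules stay invisible to tools that only read the ontology.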

To fill these gaps, the emerging W3C standard called Shapes Constraint Language (SHACL) provides a powerful framework for defining the "shape" of graph data and for expressing complex integrity constraints, using well-defined constraint constructs expressed in RDF as well as SPARQL- or JavaScript-based constraints. SHACL is not a replacement for RDFS/OWL, but a complementary technology that is both very expressive and highly extensible. RDFS and OWL are used to define vocabulary terms (classes and properties), their hierarchies (subclasses, subproperties), and the nature of the classes and properties (union, intersection and complement of classes; transitive, inverse and symmetric properties; etc.). SHACL is more appropriate for capturing property constraints (cardinality, valid values or value shapes, and interdependencies between them), and it can accommodate multiple profiles by providing different shapes for the same ontology. The SHACL vocabulary is itself defined in RDF, and the same macro mechanisms can be used by anyone to define new high-level language elements and publish them on the web. This means that SHACL will lead not only to the reuse of data schemas but also to domain-specific constraint languages. Furthermore, SHACL can be used in conjunction with languages other than SPARQL, including JavaScript: complex validation constraints can be expressed in JavaScript so that they can be evaluated client-side. In addition, SHACL can be used to generate validation reports for quality control, potentially with suggestions for fixing validation errors. Overall, SHACL is a future-proof schema language designed for the Web of Data. While SHACL is not yet a standard, there are already implementations that use it (e.g. TopBraid).

Future work could include investigation of the use of SHACL shapes to define application profiles, generation and data entry, data validation, and quality control of linked data information.

1.3.6. Composite Symbology and Alternate Renderers for the Semantic Portrayal Service

Testbed 11 introduced a portrayal ontology that focused on point-based symbology (icons for Emergency Management). Testbed 12 extended this work by providing a richer symbolizer and graphics ontology that can accommodate line and area-based symbols, along with the graphic attributes applicable to these symbols. Future work could extend the ontology to accommodate more complex symbology, including composite symbols and symbol templates. The extended ontology would help describe more advanced symbology standards such as the MIL-STD-2525 family of symbols.

This testbed managed to render symbol legends based on their definitions; however, more work is needed to render an SVG map based on the portrayal ontology. Future work can lay the foundation for expressing styles that have at least the same expressiveness as SLD. The proposed work can extend the portrayal ontology to represent composite symbols and symbol templates. Related future work can include investigating other renderer outputs, such as a JSON encoding of the portrayal information that can be handled on the client side with HTML5 Canvas or rendering libraries such as D3.js. Other renderers may also investigate producing SLD from the RDF descriptors, and how features of the portrayal ontology that are not natively supported can be handled in graphic languages that are less expressive than SVG, such as KML.

1.4. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

2. References

The following documents are referenced in this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the latest edition of the normative document referred to applies.

  • OGC 16-062 - OGC® Testbed-12 Catalogue and SPARQL Engineering Report

  • OGC 15-058 - OGC® Testbed-11 Symbology Mediation Engineering Report

  • OGC 15-054 - OGC® Testbed-11 Implementing Linked Data and Semantically Enabling OGC Services Engineering Report

  • OGC 13-084r2, OGC I15 (ISO19115 Metadata) Extension Package of CS-W ebRIM Profile .0, 2014

  • OGC 12-168r6, OGC® Catalogue Services 3.0 - General Model, 2016

  • OGC 11-052r4, OGC GeoSPARQL- A Geographic Query Language for RDF Data, 2011

  • OGC 08-125r1, KML Standard Development Practices, Version 0.6, 2009.

  • OGC 07-147r2, KML Version 2.2.0, 2008

  • OGC 07-110r4, CSW-ebRIM Registry Service ebRIM profile of CSW (.0.1), 2009

  • OGC 07-045, OGC Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile (.0.0), 2007

  • OGC 07-006r1, OpenGIS Catalogue Service Implementation Specification 2.0.2, 2007

  • OGC 06-129r1, FGDC CSDGM Application Profile for CSW 2.0 (0.0.12), 2006

  • OGC 06-121r9, OGC® Web Services Common Standard

  • OGC 06-121r3, OpenGIS® Web Services Common Specification, version 1.1.0 with Corrigendum 1 2006

  • OGC 05-078r4, OpenGIS Styled Layer Descriptor Profile of the Web Map Service Implementation Specification, Version 1.1.0, 2006

  • OGC 05-077r4, OpenGIS® Symbology Encoding Implementation Specification, Version 1.1.0, 2006.

  • ISO/TS 19139:2007, Geographic information — Metadata — XML schema implementation

  • ISO 19119:2005, Geographic information — Services

  • ISO 19117:2012, Geographic information — Portrayal

  • ISO 19115:2003, Geographic information — Metadata

  • ISO 19115:2003/Cor 1:2006, Geographic information — Metadata

  • ISO 19115-1:2014, Geographic information — Metadata — Part 1: Fundamentals

  • Dublin Core Metadata Initiative, last visited 12-09-2016, available from http://dublincore.org/

  • NSG Metadata Foundation (NMF) – Part 1: Core, version 2.2, 23 September 2014 https://nsgreg.nga.mil/doc/view?i=4123

  • DGIWG 114, DGIWG Metadata Foundation (DMF), last visited 12-09-2016, available from https://portal.dgiwg.org/files/?artifact_id=9189&format=pdf

  • DoD Discovery Metadata Specification (DDMS), last visited 12-09-2016, available from https://metadata.ces.mil/dse-help/DDMS/index.htm

  • SPARQL Protocol and RDF Query Language (SPARQL), last visited 12-09-2016, available from https://www.w3.org/TR/rdf-sparql-query

  • DCAT, last visited 12-09-2016, available from https://www.w3.org/TR/vocab-dcat/

  • National System for Geospatial Intelligence Metadata Implementation Specification (NMIS) – Part 2: XML Exchange Schema

  • Project Open Data Metadata Schema v1.1 https://project-open-data.cio.gov/v1.1/schema/

  • Asset Description Metadata Schema (ADMS) https://www.w3.org/TR/vocab-adms/

  • JSON-LD 1.0 https://www.w3.org/TR/json-ld/

3. Terms and definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard [OGC 06-121r9] and in OGC® Abstract Specification Topic TBD: TBD shall apply. In addition, the following terms and definitions apply.

3.1. feature

representation of some real world object or phenomenon

3.2. interoperability

capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units [ISO 19119]

3.3. map

pictorial representation of geographic data

3.4. model

abstraction of some aspects of a universe of discourse [ISO 19109]

3.5. ontology

a formal specification of concrete or abstract things, and the relationships among them, in a prescribed domain of knowledge [ISO/IEC 19763]

3.6. portrayal

presentation of information to humans [ISO 19117]

3.7. semantic interoperability

the aspect of interoperability that assures that the content is understood in the same way in both systems, including by those humans interacting with the systems in a given context

3.8. semantic mediation

transformation from one or more datasets into a dataset based on a different conceptual model

3.9. symbol

a bitmap or vector image that is used to indicate an object or a particular property on a map.

3.10. symbology encoding

style description to apply to the digital features being rendered

3.11. syntactic interoperability

the aspect of interoperability that assures that there is a technical connection, i.e. that the data can be transferred between systems

4. Conventions

4.1. Abbreviated terms

  • API Application Programming Interface

  • CRS Coordinate Reference System

  • CSW Catalog Services for the Web

  • DCAT Data Catalog Vocabulary

  • DCAT-AP DCAT Application Profile for Data Portals in Europe

  • DCMI Dublin Core Metadata Initiative

  • EARL Evaluation and Report Language

  • EU European Union

  • EuroVoc Multilingual Thesaurus of the European Union

  • GEMET GEneral Multilingual Environmental Thesaurus

  • GML Geography Markup Language

  • GeoDCAT-AP Geographical extension of DCAT-AP

  • IANA Internet Assigned Numbers Authority

  • INSPIRE Infrastructure for Spatial Information in the European Community

  • ISO International Organization for Standardization

  • JRC European Commission - Joint Research Centre

  • MDR Metadata Registry

  • N3 Notation 3 format

  • NAL Named Authority Lists

  • OGC Open Geospatial Consortium

  • OWL Web Ontology Language

  • RDF Resource Description Framework

  • RFC Request for Comments

  • SE Symbology Encoding

  • SLD Styled Layer Descriptor

  • SKOS Simple Knowledge Organization System

  • SPARQL SPARQL Protocol and RDF Query Language

  • SVG Scalable Vector Graphics

  • TTL Turtle Format

  • URI Uniform Resource Identifier

  • URL Uniform Resource Locator

  • URN Uniform Resource Name

  • W3C World Wide Web Consortium

  • WG Working Group

  • WKT Well Known Text

  • XML eXtensible Markup Language

  • XSLT eXtensible Stylesheet Language Transformations

5. Overview

This engineering report includes the following major sections:

  • Status Quo & New Requirements Statement: This section describes current standards and requirements for developing Semantic Registry, Semantic Mediation and Semantic Portrayal Services.

  • Solutions: This section describes the solution architectures considered by the testbed, as well as the solution architecture implemented by the testbed. We organized the reporting of the activities of this thread around the services implemented.

  • Appendix sections: These sections include semantic models and REST API of each service.

6. Status Quo & New Requirements Statement

6.1. Status Quo

6.1.1. Semantic Registry Service

Current catalog solutions are highly dependent upon the metadata model employed for the service and data descriptions. Many of today’s service instances and data holdings are based on an ISO 19115 metadata model using XML as the exchange format. These solutions are very document-centric because of the XML Document Object Model (DOM) used to populate these catalogs. This is a challenge when information needs to be integrated and linked together, as it often requires complex protocols (XLink, CSW GetRecords operations) to work around the lack of a globally unique identifier system in XML documents. The result is often significant overhead when querying across multiple documents, and difficulty in reusing and linking information.

Linked Data provides a solution because information is modeled as a directed labeled graph. The nodes of the graph (called resources) and the edges (called properties) are identified with globally unique Uniform Resource Identifiers (URIs), each of which is assigned a unique meaning (defined in ontologies). While best practices require resource URIs in Linked Data to be resolvable, in practice this is often not the case. Linked Data from different namespaces is usually aggregated in catalogs to facilitate search and discovery and to give easy access to these resource descriptions.
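The benefit of globally unique identifiers can be sketched in a few lines of Python. In this minimal example, a graph is simply a set of (subject, property, object) triples. The property URIs are taken from the Dublin Core Terms and DCAT vocabularies; the `example.org` resource URIs, the two "catalogs" and the helper `follow` are hypothetical and exist only for illustration.

```python
# Property URIs from Dublin Core Terms and DCAT.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
DCAT_ACCESS_URL = "http://www.w3.org/ns/dcat#accessURL"

# Hypothetical resource URIs.
DATASET = "http://example.org/dataset/roads"
DISTRIBUTION = "http://example.org/dataset/roads/gml"

# Two independently published descriptions (sets of triples).
catalog_a = {
    (DATASET, DCT_TITLE, "Roads"),
    (DATASET, DCAT_DISTRIBUTION, DISTRIBUTION),
}
catalog_b = {
    (DISTRIBUTION, DCAT_ACCESS_URL, "http://example.org/files/roads.gml"),
}


def follow(graph, subject, *path):
    """Follow a chain of properties from `subject`; return the nodes reached."""
    frontier = {subject}
    for prop in path:
        frontier = {o for s, p, o in graph if s in frontier and p == prop}
    return frontier


# Because both sources use the same URI for the distribution, merging the
# graphs is a plain set union; no document-level join logic is needed.
merged = catalog_a | catalog_b

if __name__ == "__main__":
    print(follow(merged, DATASET, DCAT_DISTRIBUTION, DCAT_ACCESS_URL))
```

Neither source alone can answer the question "where can the roads dataset be downloaded?", but the merged graph can, which is the property the document-centric XML approach lacks.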

The W3C released the DCAT Recommendation in early 2014. “DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (W3C). Thus, DCAT defines a standard way to publish machine-readable metadata about a dataset. It does not make any assumptions about the format of the datasets described in a catalog.

Since then, a number of DCAT profiles have been created and adopted, such as DCAT-AP, GeoDCAT-AP, ADMS and Project Open Data (POD), mostly focused on dataset description. There is no well-established profile that either generalizes or specializes DCAT to describe services, schemas, schema mappings, vocabularies and portrayal information. The Linked Data query language SPARQL and its geospatial extension GeoSPARQL are now well-established standards for querying Linked Data representations with geospatial information. The SPARQL Protocol provides a partial answer for accessing DCAT-related data, but it is not sufficient, and is sometimes too complex, for managing and accessing catalog information through a REST API. CKAN, an open-source software package powering many Open Data portals, recently gained a plugin to export data in DCAT format. As more and more data is published in DCAT format in the community, there is a need for a simple and standardized REST API to manage and access Linked Data describing the assets managed by catalogs and registries.

Furthermore, many clients access web services from a web browser, using JSON-based protocols and code written in JavaScript. Experience has shown that RDF formats such as RDF/XML, N3 and Turtle are not easily exploitable by these clients. The W3C standard JSON-LD is designed to address this gap: by means of the JSON-LD context, it allows Linked Data to be expressed as JSON, and JSON to be interpreted as Linked Data.
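The role of the context can be illustrated with a deliberately simplified Python sketch. It maps the short keys of a JSON document to the property URIs named in a context, which is the core idea behind JSON-LD. It is not an implementation of the JSON-LD expansion algorithm: prefixes, nested contexts, type coercion and value objects are all ignored. The function name `expand_keys`, the sample document and its `example.org` identifier are assumptions for illustration; the two property URIs come from Dublin Core Terms and DCAT.

```python
import json

# A minimal context: short JSON key -> property URI.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
}


def expand_keys(doc, context):
    """Replace short JSON keys by the URIs the context maps them to.

    Simplification: only flat documents and direct key -> URI mappings
    are handled. Keys without a mapping are ignored in this sketch.
    """
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue                      # the context itself is not data
        if key.startswith("@"):
            expanded[key] = value         # keep keywords such as @id
        elif key in context:
            expanded[context[key]] = value
    return expanded


# What a JavaScript client sees: ordinary JSON with friendly keys.
document = {
    "@context": CONTEXT,
    "@id": "http://example.org/dataset/roads",   # hypothetical identifier
    "title": "Roads",
    "keyword": ["transport"],
    "internalFlag": True,                         # no mapping in the context
}

if __name__ == "__main__":
    print(json.dumps(expand_keys(document, CONTEXT), indent=2))
```

The same bytes therefore serve two audiences: a browser client reads `title` and `keyword` as plain JSON, while a Linked Data consumer interprets them as globally identified properties.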

The use of REST APIs in the web community has been widely successful because REST aligns well with web principles and enables rapid integration when building web-based applications. A number of efforts related to Linked Data and REST APIs have been conducted, such as Linked Data Platform 1.0 (LDP). Recent industry trends include the emergence of hypermedia formats (HAL, Hydra, Collection+JSON, Siren) that add hypermedia controls to data-oriented REST APIs. This introduces a level of decoupling between the server and the client ecosystem that could allow APIs to evolve without breaking clients. The challenge of this testbed is to find out how REST, hypermedia APIs, Linked Data representations and APIs, and JSON-LD can be combined to meet the requirements of the service.
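The decoupling that hypermedia controls provide can be sketched as follows. The Python example walks a paginated collection by following HAL-style `_links` instead of constructing URLs itself. The documents, the URL layout and the `items` link relation are hypothetical and are served from an in-memory dictionary rather than over HTTP; they do not describe the testbed's actual API.

```python
# Hypothetical HAL-style documents, keyed by the URL a server would use.
FAKE_SERVER = {
    "/registry": {
        "_links": {
            "self": {"href": "/registry"},
            "items": {"href": "/registry/items?page=1"},
        },
    },
    "/registry/items?page=1": {
        "_links": {
            "self": {"href": "/registry/items?page=1"},
            "next": {"href": "/registry/items?page=2"},
        },
        "_embedded": {"items": [{"title": "Roads"}, {"title": "Rivers"}]},
    },
    "/registry/items?page=2": {
        "_links": {"self": {"href": "/registry/items?page=2"}},
        "_embedded": {"items": [{"title": "Buildings"}]},
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns a parsed JSON document."""
    return FAKE_SERVER[url]


def iter_items(entry_url, fetch=fetch):
    """Yield every embedded item, discovering URLs only via link relations."""
    root = fetch(entry_url)
    url = root["_links"]["items"]["href"]
    while url is not None:
        page = fetch(url)
        yield from page.get("_embedded", {}).get("items", [])
        # A missing "next" link means the last page has been reached.
        url = page["_links"].get("next", {}).get("href")


if __name__ == "__main__":
    for item in iter_items("/registry"):
        print(item["title"])
```

The client hard-codes only the entry point and the names of the link relations. If the server later changes its paging scheme or URL structure, this client keeps working as long as the relations stay the same.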

6.1.2. Semantic Mediation Service

During OGC Testbed 11, Image Matters LLC developed the first iteration of the Semantic Mediation Service, demonstrating the transformation of the Homeland Security Working Group (HSWG) Incident Ontology to the Canadian Emergency Management Symbology (EMS) Ontology using a rule engine. The engine was based on the Semantic Mediation Ontology and the SPARQL Extension ontology, which express the rules and functions for transformations between two semantic models (expressed as ontologies). The transformation from a source ontology to a target ontology is called an Alignment in the Semantic Mediation Ontology.

While semantic transformation in Testbed 11 was demonstrated on Linked Data representations, a large number of messages and documents in the current OGC standards are based on XML representations. Their structures and syntaxes are defined with XML Schemas (GML, ISO 19139, CSGDM, NMIS). These messages often need to be converted to another XML representation based on another XML schema, typically using XSL Transformations but also scripts. There is a need to search and discover schema definitions and existing schema mappings, to find optimal transformation paths between two schemas, and to perform these transformations on demand. The current OGC standards do not provide a standard profile to represent schemas and schema mappings in existing OGC Catalog Services. Similarly, there is no ontology or DCAT profile capable of semantically describing schemas and schema mappings.

There is no existing REST service that enables the search and discovery of schemas and schema mappings, performs validation, calculates optimal transformation paths and performs the transformation of a message from one schema to another.

6.1.3. Semantic Portrayal Service

The current OGC standards related to portrayal (Styled Layer Descriptor and Symbology Encoding) are based on XML Schema technologies. Style definitions, rules, symbolizers and graphics are defined in XML documents and do not have globally reusable identifiers that would allow portrayal information to be reused and linked into a "web of portrayal information". The ISO 19117 standard, which describes portrayal information, defines an abstract model that is mostly designed to be implemented in code and is less suited to use as a descriptive model.

During Testbed 11, an initial set of portrayal ontologies was defined to semantically describe styles, portrayal rules, point-based symbols and graphics. The information was made accessible through a specialized REST API so that a WPS could use it to produce SLD documents. The ontologies were limited to point-based graphic symbols (based on PNG and TrueType fonts). They could not model line, text and area-based symbols or more advanced graphic objects and properties.

A number of attempts were made in previous testbeds (OWS-8) to represent portrayal information in an OGC catalog, but no standard profile has been defined so far and none of them used a semantic-based approach.

There is a need to manage portrayal information in a catalog where it can be related to other information such as feature types, layers and taxonomies. This information can be searched and used to perform map portrayal using symbologies from different communities. The initial implementation of the Semantic Portrayal Service in OGC Testbed 11 did not provide a rendering endpoint for the symbology.

6.2. Requirements Statement

6.2.1. Semantic Registry Service

The following were the four initial general RFP requirements for the implementation of DCAT as a REST service:

  • It shall be evaluated how DCAT can describe the same service and data sets in RDF as the other catalog services do using XML Schema instance documents compliant to ISO 19115.

  • Demonstrate what role DCAT can play as a heterogeneous catalog integration mechanism and as a possible simplification of the setup and use of catalogs.

  • The DCAT REST implementation shall serve as a Semantic Portrayal Catalog. The Semantic Portrayal Catalog uses an ontology model for managing styles and provides interfaces to access, create, read, update, and delete styles.

  • The DCAT as a REST service shall interface with the Schema Registry. The Schema Registry enables the discovery of XML Schemas, transformation logic, and ontologies. These items shall be served by the DCAT as a service implementation.

From these initial high-level requirements, we derived and added the following ones:

  • Definition of a Common Core Model to represent datasets, services, portrayal information, schemas and schema mappings.

  • Demonstrate the usage of Linked Data to link heterogeneous Registry Objects.

  • Definition of a REST API.

  • Accommodate web clients producing and consuming JSON.

  • Ability to evolve the API by favoring decoupling between server and client.

  • Manage multiple registers (Dataset, Schema, and Portrayal Registers).

  • Harvesting API for multiple catalogs (CSW ebRIM, CSW ISO and more).

6.2.2. Semantic Mediation Service

The following items describe the requirements for the Semantic Mediation Service:

  • SRIM Profile for schema and schema mapping

  • Semantic Registry as a service shall interface with the Schema Registry which enables the discovery of XML Schemas, transformation logic, and ontologies.

  • Support of XML Schema and XSL Transformation

  • Harvesting of Schema and Schema mapping from CSW ebRIM

  • Representation of schema and schema mapping using Linked Data representation

  • Definition of REST API

  • Validation of Document against Schema

  • Transformation of a document from Schema A to Schema B.

  • Transformation chaining

6.2.3. Semantic Portrayal Service

The following items describe the requirements for the Semantic Portrayal Service:

  • Refinement of the Portrayal ontologies and introduction of a Graphic ontology to support line, area, text and composite symbols.

  • Ability to convert SLD to a semantic representation.

  • The Semantic Registry implementation shall serve as a Semantic Portrayal Catalog. The Semantic Portrayal Catalog uses an ontology model for managing styles and provides interfaces to access, create, read, update, and delete styles.

  • Reusability and linking to existing definition (rules, styles, graphic, symbolizers)

  • REST API for Rendering of Symbols

  • REST API for Rendering of Data using a given style

  • Ability to integrate REST API with Web-based client using JSON.

  • Ability to use portrayal on multiple data representations (Linked Data, GML, JSON).

7. Solutions

This section summarizes the solutions that were envisioned at the beginning of the testbed and experimented with during it, and that were either discarded, implemented, or deferred to future activities.

7.1. Targeted Solutions

This first section addresses all solutions that have been discussed during the testbed. They include those possible solutions that have been discarded during the testbed.

7.1.1. Semantic Registry Service Targeted Solutions

The initial requirement of the testbed was to implement a DCAT REST Service that enables access to DCAT Dataset information. However, it quickly emerged that the integration of ISO 19139, NMIS, Services, Schema, Schema Mapping and Portrayal information in the service was not possible using DCAT alone, as DCAT is mostly focused on dataset description. More generalization was needed to accommodate the different types of information objects in the future (for example Map, Layer, Vocabulary, Sensor).

The closest standard that provides an extensible framework to represent different business objects is the Electronic Business eXtensible Markup Language (ebXML) Registry Information Model (ebRIM). The ebXML registry describes objects that reside in a repository for storage and safekeeping. The information model does not deal with the actual content of the repository. All elements of the information model represent metadata (data types and data relationships) about the content stored in the repository. Such information is used to facilitate ebXML-based Business-to-Business partnerships and transactions. The registry information model provides a high-level schema for the ebXML registry. The ebRIM model has many constructs that overlap with those defined in RDF and OWL. However, we found that the definitions of the classes and properties in ebRIM were often not well aligned with well-established ontologies (with the exception of Dublin Core Terms) or with best practices in the Linked Data community. ebRIM is verbose when describing simple graph structures, which makes it very hard to build queries that naturally match the graph structure. The Linked Data model and the standard SPARQL query language provide a more modern, decentralized approach that lowers the barrier to integration and the learning curve for manipulating graph-oriented data structures, and seem a better match for reusing and linking information.

After analyzing the ebRIM model and the ISO 19135 standard, we decided to create a superset of DCAT that supports different types of registry items. We borrowed the notion of registers and items from both standards. The proposed solution is a new service called the Semantic Registry Service, which manages items described semantically and can perform semantic enrichment to better enable search and discovery of information.

Semantic Registry Information Model

We conducted a comparative analysis between different standards related to ISO 19115 profiles and DCAT-related standards (DCAT, DCAT-AP, ADMS, GeoDCAT-AP, Project Open Data). The goal of this analysis was to evaluate how well DCAT can describe the same registry objects (services, datasets, schema, schema mappings and portrayal information) in RDF as the other catalogue services do using XML Schema instance documents. The crosswalk was informed by the work done by the European Commission on developing a geospatial profile for DCAT (alias GeoDCAT). The metadata model provided by DCAT includes classes and attributes for identifying and describing catalogues, datasets, catalogue records, publishing agents and distribution. The metadata model does not include any classes or attributes for identifying or describing services. The absence of classes and attributes for service metadata, portrayal information, schemas and schema mappings in DCAT meant that several ISO service metadata elements did not have equivalent fields in DCAT to map to. To fill the gaps, we determined that a superset of DCAT was needed.

The testbed designed a Semantic Registry Information Model (SRIM) by generalizing the DCAT model to include concepts from ISO 19135 (the international standard for item registration procedures in geographic information) together with the core metadata needed to express a large variety of information objects and enable better search and discovery.

Semantic Registry REST API

To facilitate integration with web-based clients, we decided to implement a REST API that primarily supports the JSON-LD output format. However, we also support Linked Data formats for item and register descriptions to enable machine-to-machine integration. JSON-LD was chosen because it provides a bridge between Linked Data and JSON.
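To illustrate how a JSON-LD context bridges plain JSON and Linked Data, the following minimal sketch expands short JSON keys into full IRIs. It is a deliberately simplified stand-in for the JSON-LD expansion algorithm (it only handles flat term definitions and prefix:suffix compact IRIs), and the context and sample record are hypothetical, not the ones used in the testbed.

```python
# Simplified illustration of JSON-LD term expansion.
# NOT the full W3C expansion algorithm; context and record are hypothetical.
CONTEXT = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "title": "dct:title",
    "keyword": "dcat:keyword",
}


def expand_iri(term, context):
    """Resolve a term or compact IRI (prefix:suffix) to a full IRI."""
    value = context.get(term, term)
    if "://" in value:
        return value  # already an absolute IRI
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value  # unknown term: left unchanged


def expand(doc, context):
    """Replace every non-keyword key of a flat JSON object by its IRI."""
    return {
        expand_iri(key, context): value
        for key, value in doc.items()
        if not key.startswith("@")
    }


record = {
    "@id": "http://example.org/item/1",
    "title": "Roads",
    "keyword": ["transport"],
}
expanded = expand(record, CONTEXT)
```

A web client can keep working with the short keys (`title`, `keyword`), while a Linked Data consumer applying the same context sees properties identified by IRIs.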

It was also decided that the Semantic Registry REST API would implement both Level 2 and Level 3 (hypermedia REST API) of the Richardson Maturity Model, so that it can accommodate existing frameworks (such as AngularJS) whose clients construct URLs from well-defined URL patterns. The hypermedia REST API uses the Hypertext Application Language (HAL), as it has widespread adoption in the community and demonstrates well both the use of hypermedia controls within JSON and how a REST API can evolve independently from the client ecosystem without breaking compatibility.
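The difference between the two maturity levels can be sketched as follows. At Level 2 the client builds a URL from a pattern it knows in advance; at Level 3 it follows a link relation advertised in the HAL document. The `_links`, `href` and `templated` properties are standard HAL; the URL layout, the relation names and the sample document are assumptions made for illustration only.

```python
# Hypothetical HAL+JSON representation of a register (URLs and relation
# names are illustrative, not the testbed's actual API).
register_doc = {
    "_links": {
        "self": {"href": "/registers/datasets"},
        "items": {"href": "/registers/datasets/items"},
        "search": {"href": "/registers/datasets/search{?q}", "templated": True},
    },
    "title": "Dataset Register",
}


def level2_items_url(register_id):
    """Level 2: the client hard-codes the URL pattern."""
    return f"/registers/{register_id}/items"


def level3_link(doc, rel):
    """Level 3: the client looks up the link by its relation name."""
    try:
        return doc["_links"][rel]["href"]
    except KeyError:
        raise LookupError(f"server did not advertise link relation {rel!r}")


items_url = level3_link(register_doc, "items")
```

With the Level 3 style, the server could later move the items collection to a different path without breaking clients, as long as the `items` relation is still advertised.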

Semantic Registry Integration with Multi-Catalogs

A number of approaches were considered for integrating multiple catalogue services that use different protocols and models (CSW 2.0, 3.0 and ebRIM) with the Semantic Registry. The approaches considered during this testbed are described in detail in the Testbed-12 Catalogue and SPARQL Engineering Report [OGC 16-062].

For practical reasons and due to the limited timeframe, we chose to implement a harvester service that converts records from different catalogs to the SRIM model and stores them in the Semantic Registry repository. We also decided to map only the elements of information that were relevant for search and discovery, and to postpone the issue of synchronization with sources to future testbeds. Publish/Subscribe protocols will need to be investigated in the context of the Semantic Registry in future testbeds.
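The harvesting step can be pictured with the following minimal sketch, which reads a CSW Dublin Core record and keeps only a few discovery-oriented fields. The sample record and the choice of target property names (`dct:identifier`, `dct:title`, `dcat:keyword`) are assumptions for illustration; the actual mapping used by the testbed harvester is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespaces of CSW 2.0.2 records and Dublin Core elements.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical source record.
SAMPLE = """<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>urn:example:roads</dc:identifier>
  <dc:title>Road network</dc:title>
  <dc:subject>transport</dc:subject>
  <dc:subject>roads</dc:subject>
</csw:Record>"""


def harvest_record(xml_text):
    """Map a csw:Record to a small dictionary of discovery metadata.

    The target keys are illustrative, not the testbed's actual mapping.
    """
    root = ET.fromstring(xml_text)
    return {
        "dct:identifier": root.findtext("dc:identifier", namespaces=NS),
        "dct:title": root.findtext("dc:title", namespaces=NS),
        "dcat:keyword": [e.text for e in root.findall("dc:subject", NS)],
    }


item = harvest_record(SAMPLE)
```

Elements that are not relevant for search and discovery are simply ignored, which mirrors the decision to map only a subset of the source metadata.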

7.1.2. Semantic Mediation Service Targeted Solutions

The Semantic Mediation Service in this testbed focused on transforming XML documents, whose structure is expressed in XML Schema, using XSLT-based transformations. One of the requirements was to define an ebRIM profile to represent schemas and schema mappings served by an ebRIM CSW implementation and to integrate it with the Semantic Registry. A review of the existing standards for representing schemas and schema mappings in a registry was conducted. Only DCAT and ADMS were found relevant, but no profile had been defined to accommodate the specificities of schemas and schema mappings, such as the source and target schema of a mapping. A review of the ebRIM model in CSW was also performed and concluded that an extension was needed to represent schema and schema mapping information in the ebRIM model.

Instead of defining a separate service that manages schemas and schema mappings, we decided to extend the core Semantic Registry Information Model by creating a profile to represent them semantically. The benefit of this approach is that it tests how well the core model can be extended and how well the Semantic Registry REST API can be reused to accommodate different domain models and convenience service APIs. We decided to implement the Semantic Mediation Service as a convenience service on top of the Semantic Registry that performs validation and transformation between schemas (including finding a chain of transformations between two schemas). The Semantic Mediation Service would delegate the CRUD operations for schemas and schema mappings entirely to the Semantic Registry. We also decided to use a REST API that provides hypermedia controls with a JSON-LD representation to lower the barrier to integration with web clients, and that also returns Linked Data representations (RDF/XML, Turtle) of schemas and schema mappings for machine-to-machine support.
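Finding a chain of transformations between two schemas can be treated as a path search over a graph whose nodes are schemas and whose edges are schema mappings. The sketch below uses a breadth-first search and takes "optimal" to mean "fewest mappings", which is an assumption: the report does not state the criterion used by the service. The schema and mapping identifiers are hypothetical.

```python
from collections import deque


def find_transformation_path(mappings, source, target):
    """Return the shortest list of mapping ids from source to target.

    `mappings` is an iterable of (mapping_id, source_schema, target_schema).
    Returns None when no chain of mappings connects the two schemas.
    """
    graph = {}
    for mapping_id, src, dst in mappings:
        graph.setdefault(src, []).append((mapping_id, dst))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        schema, path = queue.popleft()
        if schema == target:
            return path
        for mapping_id, next_schema in graph.get(schema, []):
            if next_schema not in visited:
                visited.add(next_schema)
                queue.append((next_schema, path + [mapping_id]))
    return None


# Hypothetical mappings registered in the registry.
MAPPINGS = [
    ("m1", "schemaA", "schemaB"),
    ("m2", "schemaB", "schemaC"),
    ("m3", "schemaA", "schemaD"),
    ("m4", "schemaD", "schemaB"),
]
path = find_transformation_path(MAPPINGS, "schemaA", "schemaC")
```

Once a path is found, the transformations attached to the mappings (XSLT stylesheets in this testbed) would be applied in order, each one consuming the output of the previous one.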

7.1.3. Semantic Portrayal Service Targeted Solutions

The focus of the Semantic Portrayal Service for this testbed was on storing and accessing portrayal information managed by the Semantic Registry to support symbology mediation and rendering. Instead of defining a separate service that manages the portrayal information, we decided to extend the core Semantic Registry Information Model by creating a profile to represent this information semantically. The benefit of this approach is that it tests how well the core model can be extended and how well the Semantic Registry REST API can be reused to accommodate different domain models. We decided to implement the Semantic Portrayal Service as a convenience service on top of the Semantic Registry that performs portrayal information search and rendering. The Semantic Portrayal Service would delegate the Create/Update/Delete operations for portrayal information entirely to the Semantic Registry. We also decided to use a REST API that provides hypermedia controls with a JSON-LD representation to lower the barrier to integration with web clients, and that also returns Linked Data representations (RDF/XML, Turtle) of portrayal information for machine-to-machine support.

7.2. Recommendations

This second section summarizes the recommended solution(s) that will be further described in following clauses. It briefly explains the solution(s) and ideally links to relevant sections.

7.2.1. Semantic Registry Service Recommendations

Semantic Registry Information Models

To provide an extensible framework for representing information in a registry, we defined a superset of DCAT called the Semantic Registry Information Model (SRIM), which defines the set of core classes and properties that can be used to represent any resource in a domain of interest. The core ontology is extended by defining application profiles. The following profiles were developed during the testbed:

  • Dataset/Service Profile: Used to describe Datasets and Services (such as those defined in NMIS, ISO 19139). This profile is heavily based on DCAT and GeoDCAT-AP.

  • Schema Application Profile: Used to describe Schemas and Schema Mappings (which extend DCAT Dataset). This profile was used by the Semantic Mediation Service.

  • Portrayal Application Profile: Used to describe Portrayal information such as Styles, Symbols, Portrayal Rules. This profile was used by the Semantic Portrayal Service.

Semantic Registry REST API

To facilitate the integration of clients with the Semantic Registry Service, we recommended the use of a REST API supporting the encoding of the SRIM profiles in Linked Data formats using RDF/XML, Turtle, N-Triples and JSON-LD. We also recommended accommodating Level 2 (Resources with HTTP Verbs) and Level 3 (Hypermedia-driven) of the Richardson Maturity Model. For Level 3, we chose a hypermedia-driven API using the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community.

Integration of Multi-Catalog REST API

For this testbed, the Semantic Registry Service harvested metadata from different OGC Web Catalogs and converted the information to the SRIM profile encodings, but we also allowed for cascading requests to other GeoSPARQL services that implement the profiles.

7.2.2. Semantic Mediation Service Recommendations

To facilitate the integration of clients with the Semantic Mediation Service, we recommended the use of a REST API supporting the encoding of the SRIM Schema Application Profile in Linked Data formats using RDF/XML, Turtle, N-Triples and JSON-LD. We also recommended accommodating Level 2 (Resources with HTTP Verbs) and Level 3 (Hypermedia-driven) of the Richardson Maturity Model. For Level 3, we chose a hypermedia-driven API using the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community. To favor reuse of functionality, all the CRUD operations on schemas and schema mappings were implemented by the Semantic Registry. The Semantic Mediation Service was built as a convenience service on top of the Semantic Registry to provide search capabilities, validation, transformation path calculation and the actual transformation of documents based on the path calculated from the schema mappings managed by the registry.

7.2.3. Semantic Portrayal Service Recommendations

To align better with current rendering engine implementations and the current descriptive standards for portrayal (SE, SLD), we decided to align the portrayal ontology more closely with OGC Symbology Encoding (SE) and SVG. We developed Graphics and Symbolizer ontologies that are closely aligned with these standards but provide mechanisms to support future extensions for more complex styling scenarios.

The Semantic Portrayal REST Service delegates the CRUD operations on portrayal information to the Semantic Registry, which implements the SRIM Portrayal Profile. The Semantic Portrayal Service implements a REST API at Level 2 and Level 3 of the Richardson Maturity Model. We implemented the Level 3 hypermedia-driven API using the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community. The Semantic Portrayal Service should be a convenience service built on top of the Semantic Registry containing portrayal information, providing search capabilities and rendering endpoints for rendering symbol glyphs for legends and for map rendering of geospatial data.

8. Semantic Registry Service

8.1. Overview

Semantic metadata plays a central role in facilitating the discovery and the assessment of geospatial assets (such as datasets, services, portrayal information, schemas, maps, layers), and the integration of these assets in a specific mission. There are a number of standards, formats and APIs that provide the metadata for these assets, but in order to perform efficient search, we need to convert this information into a unified machine readable semantic representation. It is this conversion that enables the discovery of relevant resources that satisfy the mission of the end user. As we increase our understanding of the kind of metadata information needed to perform better and smarter search, we need a model that accommodates extensions over time without breaking the proposed architecture.

During this effort, a number of metadata standards were reviewed (including W3C standard DCAT, DCAT-AP, GeoDCAT-AP, ADMS, Project Open Data 1.1, Dublin Core, ISO 19115, ISO 19119) to identify the common and relevant metadata information needed for search and discovery and to identify any additional metadata information needed to describe dataset, service, portrayal, schema and schema mapping information. It quickly emerged that the DCAT standard and its different application profiles were dataset-centric and insufficient to describe the metadata for portrayal information, schemas, and services. The goal of this effort was not to define a new standard, but to leverage the existing standards to define an application profile of DCAT, with additional properties and fields, that could accommodate the schema, schema mapping, service and portrayal information needed for enhanced search and discovery while still preserving backward compatibility with existing standards.

The effort resulted in a new ontology called Semantic Registry Information Model (SRIM). SRIM is defined as a superset of DCAT and its existing application profiles (DCAT-AP, GeoDCAT-AP, ADMS). It introduces a superclass of dcat:Dataset called srim:Item and the notion of a Register (as defined in ISO 19135). The ontology draws from multiple well-established standards such as W3C DCAT, Project Open Data 1.1, DCAT-AP, GeoDCAT-AP, VCard, Dublin Core, PAV, and ISO 19115, but also addresses some gaps in the standards, such as the description of web services (for example OGC WMS, WFS), richer descriptions of geospatial data, and additional metadata to model schema, schema mapping, and portrayal information, to enable better semantic search of resources that fit with a user’s mission. SRIM enables the integration of different metadata providers (CSW, CKAN, POD WAF, WMS, and WCS) by providing a common core vocabulary to describe resources (data, services, vocabularies, maps, layers, schemas, etc.) and by accommodating the specificities of each resource by leveraging the built-in extensibility mechanism of OWL. The integration is done through the use of a semantic bridge that maps the syntactic metadata (JSON, XML based) to the semantic representation based on the SRIM model. The SRIM Core model has been extended by introducing SRIM application profiles to represent other kinds of geospatial assets such as schemas and portrayal information (see sections on Semantic Mediation and Semantic Portrayal Service).
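The effect of introducing srim:Item as a superclass of dcat:Dataset can be sketched with a small subclass lookup: a search for items returns datasets as well as any other kind of registered resource. Only the srim:Item / dcat:Dataset relationship comes from the text above; the `ex:` class names and the sample resources are hypothetical placeholders, and a real implementation would rely on an RDF store with RDFS/OWL reasoning rather than this hand-written lookup.

```python
# Subclass relationships (child -> parent).
SUBCLASS_OF = {
    "dcat:Dataset": "srim:Item",  # stated in the report
    "ex:Schema": "dcat:Dataset",  # hypothetical name; schemas extend DCAT Dataset
    "ex:Style": "srim:Item",      # hypothetical name for a portrayal item
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def items_of_type(resources, ancestor):
    """Return ids of resources whose class is `ancestor` or a subclass."""
    return [rid for rid, cls in resources.items() if is_a(cls, ancestor)]


# Hypothetical register content: resource id -> class.
resources = {
    "roads": "dcat:Dataset",
    "gml-3.2": "ex:Schema",
    "road-style": "ex:Style",
}
all_items = items_of_type(resources, "srim:Item")
datasets = items_of_type(resources, "dcat:Dataset")
```

Here a query for srim:Item matches all three resources, while a query for dcat:Dataset matches only the dataset and the schema, which is the kind of generalization that DCAT alone does not offer.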

The purpose of the Semantic Registry Service (initially referred to as DCAT REST API) is to define a common interchangeable metadata format for geospatial portals and a REST protocol to access this information. In order to achieve this, SRIM defines a set of classes and properties, which are grouped into mandatory, recommended and optional. Such classes and properties aid interoperability by corresponding to information about register items and registers that is shared by many data portals. Although the Semantic Registry is designed to be independent from its actual implementation, RDF [RDF] and Linked Data [LDBOOK] are the reference technologies that perform the modeling to preserve the semantic fidelity of the conceptual model. However, we wish to facilitate a wide adoption, so we are providing an encoding based on JSON, which could be converted transparently back to a semantic model using a JSON-LD context. The JSON is closely aligned with the Project Open Data metadata schema 1.1 standard, but some extensions and modifications were made when needed to accommodate the Semantic Registry’s requirements. Preferring a decoupling of the server and client ecosystem, the Semantic Registry implementation uses a hypermedia-driven REST API using the Hypertext Application Language (HAL) with JSON-LD as the payload. Every endpoint of the REST API also provides a Linked Data representation of the resources based on the SRIM ontology.

The following sections describe the different standards that were reviewed, the SRIM model and the implementation details on both the server and the client sides. We also explain the rationale behind some of the design decisions when applicable.

8.2. Review of existing standards

8.2.1. DCAT

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable their applications to easily consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Figure 1. DCAT Model
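As a minimal sketch of the DCAT structure, the following snippet represents a catalog, one dataset and its distributions as a nested, JSON-LD-like Python dictionary and collects the access URLs a federated search client might follow. The class and property names (dcat:Catalog, dcat:dataset, dcat:Dataset, dcat:distribution, dcat:Distribution, dcat:accessURL, dct:title, dct:format) are DCAT and Dublin Core terms; the sample values and URLs are hypothetical.

```python
# Hypothetical catalog content using DCAT terms as compact keys.
catalog = {
    "@type": "dcat:Catalog",
    "dct:title": "Example catalog",
    "dcat:dataset": [
        {
            "@type": "dcat:Dataset",
            "dct:title": "Road network",
            "dcat:distribution": [
                {
                    "@type": "dcat:Distribution",
                    "dcat:accessURL": "http://example.org/roads.gml",
                    "dct:format": "GML",
                },
                {
                    "@type": "dcat:Distribution",
                    "dcat:accessURL": "http://example.org/roads.json",
                    "dct:format": "GeoJSON",
                },
            ],
        }
    ],
}


def access_urls(cat):
    """Collect the access URL of every distribution of every dataset."""
    return [
        dist["dcat:accessURL"]
        for dataset in cat.get("dcat:dataset", [])
        for dist in dataset.get("dcat:distribution", [])
    ]


urls = access_urls(catalog)
```

The same dataset is described once and exposed through several distributions, which is the catalog/dataset/distribution pattern that the profiles reviewed below build upon.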

8.2.2. DCAT-AP

The DCAT Application profile for data portals in Europe (DCAT-AP) is a specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Its basic use case is to enable cross-data portal search for data sets and to allow public sector data to be easily searchable across borders and sectors. This can be achieved by the exchange of descriptions of datasets among data portals.

In February 2015, the ISA² programme of the European Commission started an activity to revise DCAT-AP, based on experience gained since its development in 2013. The outcome of this effort was the publication of DCAT-AP 1.1.

The European Data Portal is implementing the DCAT-AP as the common vocabulary for harmonizing descriptions of over 258,000 datasets harvested from 67 data portals from 34 countries. The DCAT-AP is used in the Open Data Support service initiated by the European Commission with the purpose of realizing the vision of European data portals.

8.2.3. GeoDCAT-AP

GeoDCAT-AP is defined as an extension of DCAT-AP for describing geospatial datasets, dataset series, and services. It provides an RDF syntax binding for the union of metadata elements defined in the core profile of ISO 19115:2003 and those defined in the framework of the INSPIRE Directive. Its basic use case is to make spatial datasets, dataset series, and services searchable on general data portals, thereby making geospatial information easier to find across borders and sectors. This can be achieved by the exchange of descriptions of datasets among data portals.

8.2.4. Asset Description Metadata Schema (ADMS)

ADMS is a profile of DCAT that is used to describe semantic assets (or just 'Assets'). These assets are defined as highly reusable metadata (e.g. xml schemata, generic data models) and reference data (e.g. code lists, taxonomies, dictionaries, vocabularies) that are used for eGovernment system development.

The ADMS model is intended to facilitate federation and co-operation. Like DCAT, ADMS has the concepts of a repository (catalog), assets within the repository that are often conceptual in nature, and accessible realizations of those assets, known as distributions. An asset may have zero or multiple distributions. As an example, a W3C namespace document can be considered to be a Semantic Asset that is typically available in multiple distributions, one or more machine processable versions and one in HTML for human consumption. An asset without any distributions is effectively a concept with no tangible realization, such as a planned output of a working group that has not yet been drafted.

ADMS is an RDF vocabulary with an RDF schema available at its namespace http://www.w3.org/ns/adms . The original ADMS specification published by the European Commission [ADMS1] includes an XML schema that also defines all of the controlled vocabularies and cardinality constraints associated with the original document.

Figure 2. ADMS Model

8.2.5. Project Open Data (POD)

Project Open Data provides the implementation guide and associated resources for the Federal Executive Order on open data and data management, M-13-13 “Managing Information as an Asset,” which includes the standardized metadata schema that all CFO Act agencies are required to use to publish their enterprise data inventories.

The Project Open Data Metadata Schema is a JSON-based implementation of the W3C DCAT vocabulary. This standard is currently implemented by multiple data catalog platforms as well as state and local governments.

POD documents are typically published in a Web Accessible Folder (WAF) and harvested by catalogs such as data.gov. The intent of POD is to reduce the complexity needed to represent data information by providing guidelines and recommended metadata. This enables better search and discovery of datasets within the US government.

8.2.6. ISO 19115-1

The standard ISO 19115 defines the schema required for describing geographic information and services that is encoded in XML format. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services. The standard ISO 19115 is applicable to:

  • the cataloguing of all types of resources, clearinghouse activities, and the full description of datasets and services;

  • geographic services, geographic datasets, dataset series, and individual geographic features and feature properties.

ISO 19115-1 defines:

  • mandatory and conditional metadata sections, metadata entities, and metadata elements;

  • the minimum set of metadata required to serve most metadata applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data and services);

  • optional metadata elements – to allow for a more extensive standard description of resources, if required;

  • a method for extending metadata to fit specialized needs.

Though ISO 19115-1 is applicable to digital data and services, its principles can be extended to many other types of resources such as maps, charts, and textual documents as well as non-geographic data. Certain conditional metadata elements might not apply to these other forms of data.

ISO 19139 defines the XML-based implementation for ISO 19115. ISO 19115-1:2014 [ISO19115-1] has superseded ISO 19115:2003. At the date of publication of this document, the XML-based implementation of ISO 19115-1:2014 (namely, ISO 19115-3), was finalised but not yet officially released.

8.2.7. ISO 19135

This International Standard specifies the procedures to establish, maintain, and publish registers of unique, unambiguous and permanent identifiers and meanings that are assigned to items of geographic information. In order to accomplish this purpose, the standard specifies elements of information that are necessary to provide identification and meaning to the registered items and to manage the registration of these items.

Figure 3. ISO 19135 RegistryItem

8.2.8. Shapes Constraint Language (SHACL)

SHACL is an RDF vocabulary for describing RDF graph structures. These graph structures are captured as "shapes", which correspond to nodes in RDF graphs. These shapes identify predicates and their associated cardinalities, and datatypes. Additional constraints can be associated with shapes using SPARQL or other languages which complement SHACL. SHACL shapes can be used to communicate data structures associated with a process or interface, to generate or validate data, or to drive user interfaces.

Most applications that share data do so using prescribed data structures. While RDFS and OWL enable one to make logical assertions about the objects in some domain, SHACL (Shapes Constraint Language) describes data structures. Features of SHACL include:

  • An RDF vocabulary to define structural declarations of the property constraints associated with those shapes.

  • Complex constraints that can be expressed in extension languages like SPARQL.

  • The possibility to mix SHACL shapes with other semantic web data, as SHACL is based on RDF and is compatible with Linked Data principles.

  • SHACL definitions represented in RDF which can be serialized in multiple RDF formats.
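
To make the idea of a shape more concrete, the following minimal Python sketch checks the properties of a single node against cardinality and datatype constraints. It is not SHACL syntax and uses no RDF or SHACL library. The shape dictionary, the choice of properties and the sample records are assumptions made purely for illustration.

```python
# Toy illustration of shape-style validation (not SHACL, no RDF library).
# A "shape" lists, per predicate, a minimum count, a maximum count
# (None means unbounded) and the expected Python type of each value.
DATASET_SHAPE = {
    "dct:title": {"min": 1, "max": None, "type": str},
    "dct:issued": {"min": 0, "max": 1, "type": str},
}


def validate(node, shape):
    """Return a list of human-readable violations for one node."""
    violations = []
    for predicate, rule in shape.items():
        values = node.get(predicate, [])
        if len(values) < rule["min"]:
            violations.append(f"{predicate}: expected at least {rule['min']} value(s)")
        if rule["max"] is not None and len(values) > rule["max"]:
            violations.append(f"{predicate}: expected at most {rule['max']} value(s)")
        for value in values:
            if not isinstance(value, rule["type"]):
                violations.append(f"{predicate}: {value!r} is not a {rule['type'].__name__}")
    return violations


# Hypothetical sample nodes: property name -> list of values.
good = {"dct:title": ["Road network"], "dct:issued": ["2016-10-03"]}
bad = {"dct:issued": ["2016-10-03", "2016-10-04"]}

print(validate(good, DATASET_SHAPE))
print(validate(bad, DATASET_SHAPE))
```

A real SHACL processor works on RDF graphs and supports many more constraint types; the sketch only conveys the separation between data and the shape used to check it.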

8.3. Semantic Registry Information Model (SRIM)

After analysis of the different standards, we decided to create a superset of the DCAT ontology that defines the set of classes and properties commonly used to represent any item in a register. The SRIM ontology borrows extensively from existing standards such as DCAT, GeoDCAT-AP, Dublin Core Terms, ADMS, PROV-O, PAV ontologies, and ISO 19135. To encourage reuse of the ontology, we decided not to enforce any restrictions in the ontology itself, but only to define the list of related properties and classes through documentation (see Appendix A). We classified the set of properties for each class as mandatory, recommended, or optional.

To address different domains containing different types of items, the Core SRIM ontology is extended through application profiles. An Application Profile is defined as a set of classes and properties that extends the classes and properties defined in the Core Ontology. During this testbed, we defined three different profiles:

  • Dataset/Service Profile: Used to describe Datasets and Services (such as the one defined in NMIS, ISO 19139). This profile is heavily based on DCAT and GeoDCAT-AP.

  • Schema Application Profile: Used to describe Schemas and Schema Mappings (which extend DCAT Dataset). This profile was used by the Semantic Mediation Service.

  • Portrayal Application Profile: Used to describe Portrayal information such as Styles, Symbols, and Portrayal Rules. This profile was used by the Semantic Portrayal Service.

We anticipate that in the future, more profiles will be defined for Maps, Layers, Coverage, Imagery, Feature Catalog, and Vocabularies.

8.4. Implementations

8.4.1. Semantic Mapping to SRIM

One of the primary functions of the Semantic Registry is to support search and discovery on a large variety of items using a unified API. The Semantic Registry was tested to handle different item types including Datasets, Services, Schemas, Schema Mappings, as well as Portrayal information such as Symbols, and Portrayal Rules, and Feature Type Styles. To integrate the different encoding standards of this information, including ISO 19139, NMIS, ebRIM Schema Profile, and DCAT, a number of semantic mappers were implemented. These semantic mappers link each standard to the adequate SRIM profiles, and are used by harvesters to extract information from various sources of information.

We found that a Linked Data encoding (DCAT) of information is easier to integrate than an XML encoding because the latter requires code to explicitly define the mapping between the syntactic and semantic encodings. An XML encoding based on XML Schema also tends to be less forgiving when validating data. Another advantage of a Linked Data approach is that it favors the reuse of information that can be created and managed in a decentralized way using a common encoding framework.

One of the biggest challenges when importing data into the system is the validation of the data. The RDF model provides a powerful framework to express any property of a resource by using vocabularies from different ontologies, and it easily accommodates partial or incomplete information. However, this flexibility causes difficulties when attempting to validate the data. Due to the limited time for this testbed, we decided to postpone the exploration of SHACL as a way to address this issue until the next testbed. SHACL can provide a powerful way to validate data and to define the shape of the graph to be processed by the service.

8.4.2. ISO 19139 Mapping Issues

This section summarizes the list of issues found when mapping ISO 19139 to a semantic representation with data coming from a variety of CSW sources (including data.gov and Geoplatform.gov). Some of these issues come from malformed metadata and ambiguities in the ISO 19115 standards, while others come from a lack of policies from agencies that publish metadata. These issues impede interoperability and integration of information in addition to search and discovery. The usage of Linked Data instead of XML encoding will address many of these problems, but not the ones related to policies.

Identification of Resources
Issue Identification of Resources

Description

There is no consistent way of defining the identifiers for different resources (e.g. organizations, datasets, services, controlled vocabularies, etc.)

Why it is a problem?

Inability to link information and allow reusability. Resource information (concepts) is duplicated several times in different documents, with variations of the same information. Updating this information across all repositories is difficult. Authoritative, unambiguous references are needed.

Recommendations

  • Each resource should use a unique URI that is resolvable.

  • A policy needs to be put in place to manage the URI schemes of different types of resources.

  • The maintenance of the information for each resolvable URI should be decentralized to the authoritative party for the resource.

Benefits

A new policy to define URI Sets for US Government assets would provide a consistent means to make these trusted assets available for efficient, widespread discovery and re-use. This will encourage reuse and limit duplication.

Resolvable URI
Issue Resolvable URI

Description

Identifiers used in the 19139 document are often internal (e.g., a primary key in a store implementation) and not accessible as unambiguous web resources.

Why it is a problem?

The lack of consistent machine-resolvable URIs impedes interoperability and limits automation (concepts must be grounded with unambiguous meaning for services to interpret and respond). Grounded URIs will also help humans better understand important concepts.

Recommendations

  • Make links resolvable and semantically-grounded URIs with the right information to support human and machine exploitation (for controlled vocabularies, licenses, organizations, etc.)

  • Make the information accessible for both human consumption (HTML) and machine-understanding (Linked Data).

Benefits

Enables the exploration of a “unified knowledge graph” that links and describes resources. Allows users to search, discover and navigate through “Concept Space”, whereupon each concept is resolvable to a grounded (unambiguous) resource for consistent human and machine understanding.

Multilingual Support
Issue Multilingual Support

Description

The current standard does not enable the support of translations of human readable text in multiple languages. Language is handled at document level, not field level.

Why it is a problem?

Users who do not understand the language of the information producer will not be able to discover relevant data for their tasks.

Recommendations

Opt for an implementation that natively provides multilingual support (such as Linked Data) or provide guidelines for how to handle multiple languages (e.g., through JSON protocols).

External Resource Descriptions
Issue External Resource Descriptions

Description

  • A number of properties refer to external resources (homepage, landing page, online resource for contact, page about document, reference to metadata document). Standards such as POD model these resources using a simple URL assigned to a property. This prevents adding further properties, such as the title, description, format or role of the document, that would help the user understand the meaning of the URL.

Why it is a problem?

Modeling an external resource as a plain URL value prevents the capture of additional information that would clarify the role and meaning of the external (auxiliary) resource in the context of a given resource.

Recommendations

  • Model external resources as objects when their role is ambiguous.

  • If the property referring to a resource URL is unambiguous (homepage), use the URL directly.
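
The two recommendations above can be sketched as follows. This is a minimal illustration, not part of the SRIM ontology: the `ExternalResource` class, its field names and the example URLs are hypothetical and chosen only to mirror the properties mentioned in the description (title, description, format, role).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ExternalResource:
    """An external resource modeled as an object instead of a bare URL."""
    url: str
    title: Optional[str] = None
    description: Optional[str] = None
    format: Optional[str] = None
    role: Optional[str] = None


def label_for(ref: Union[str, ExternalResource]) -> str:
    """Build a display label; bare URLs carry no extra context."""
    if isinstance(ref, str):
        return ref
    label = ref.title or ref.url
    return f"{label} ({ref.role})" if ref.role else label


# Unambiguous property (homepage): a plain URL is sufficient.
homepage = "http://example.org/"

# Ambiguous property: the object form records what the link is for.
doc = ExternalResource(
    url="http://example.org/metadata.xml",
    title="ISO 19139 record",
    format="application/xml",
    role="metadata",
)

print(label_for(homepage))
print(label_for(doc))
```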

Invalid XLinks
Issue Invalid XLinks

Description

In some ISO 19139 documents, xlink:href values are not valid URLs (for example, #FS Lower 48).

Why it is a problem?

ISO 19139 documents with invalid xlink references do not validate with an XML schema validator.

Recommendations

Comply with the XML Schema for xlink:href by using valid URLs.

Benefits

Correct validation of ISO 19139
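
A harvester could flag such references before mapping a record. The sketch below is one possible check, written under the assumption that only absolute http(s) URLs without whitespace are acceptable; this is deliberately stricter than the XML Schema type used for xlink:href. The sample XML is invented for illustration and is not an actual ISO 19139 record.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# ElementTree exposes namespaced attributes in {namespace}local form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented sample; the second href reuses the invalid value quoted above.
SAMPLE = """<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <a xlink:href="http://www.opengis.net/spec/wms/1.3"/>
  <b xlink:href="#FS Lower 48"/>
</root>"""


def is_plausible_url(value):
    """Accept absolute http(s) URLs without whitespace; reject everything else."""
    if any(ch.isspace() for ch in value):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def invalid_hrefs(xml_text):
    """Return every xlink:href value in the document that fails the check."""
    root = ET.fromstring(xml_text)
    hrefs = (el.get(XLINK_HREF) for el in root.iter())
    return [h for h in hrefs if h is not None and not is_plausible_url(h)]


print(invalid_hrefs(SAMPLE))
```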

Controlled Vocabulary Management
Issue Controlled Vocabulary Management

Description

  • Controlled vocabularies are not made publicly available or are not resolvable (where is the National Map Theme Thesaurus?)

  • Lack of a unique identifier for controlled vocabularies (e.g., GCMD, Global Change Master Directory)

  • Lack of a unique identifier for keyword concepts (e.g., Paris, France)

  • Duplication of concepts (keywords) from different taxonomies, e.g., the National Map Theme Thesaurus contains “Elevation” and the NGDA Portfolio Theme refers to it as “Elevation Theme”. Are they the same concept, with the same meaning?

  • Tendency to use alternative spellings for the same concept (e.g., US and United States)

Why it is a problem?

  • Can’t perform semantic search

  • Lack of consistent use of concepts (keywords) across ISO 19139 documents

  • Ambiguity in the meaning of concepts (lack of grounded concepts)

Recommendations

  • Define concepts in SKOS encoding with unique identifiers that are resolvable

  • Group alternate labels or translations under the same concept

  • Provide SKOS mappings to other vocabularies to enable semantic search across taxonomies

  • Make controlled vocabularies publicly available and uniquely identified with a resolvable URL.

Benefits

  • Allows reusability of controlled vocabularies

  • Less verbose document

  • Unambiguous interpretation of key concepts

  • Inference enabled by using standard SKOS semantics (semantic search)

  • Enable Multilingual search by concept
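
The recommendation to group alternate labels and translations under one concept can be illustrated with a small lookup. The sketch below only imitates the SKOS notions of preferred and alternate labels using plain Python dictionaries; the concept URI and the label set are hypothetical.

```python
# Hypothetical concept records modeled loosely on skos:prefLabel / skos:altLabel.
CONCEPTS = [
    {
        "uri": "http://example.org/place/united-states",  # hypothetical URI
        "prefLabel": {"en": "United States", "fr": "États-Unis"},
        "altLabel": {"en": ["US", "USA"]},
    },
]


def build_label_index(concepts):
    """Map every label, in every language, to the URI of its concept."""
    index = {}
    for concept in concepts:
        for label in concept["prefLabel"].values():
            index[label.casefold()] = concept["uri"]
        for labels in concept.get("altLabel", {}).values():
            for label in labels:
                index[label.casefold()] = concept["uri"]
    return index


INDEX = build_label_index(CONCEPTS)


def resolve(keyword):
    """Return the concept URI for a free-text keyword, or None if unknown."""
    return INDEX.get(keyword.strip().casefold())


print(resolve("US"))
print(resolve("united states"))
print(resolve("États-Unis"))
```

Because all three spellings resolve to the same URI, a search by concept returns the same records regardless of the label or language used by the metadata author.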

Keyword Types
Issue Keyword Types

Description

The list of keyword types in ISO 19115 is limited to a few categories (discipline, strata, topic, place, temporal).

Why it is a problem?

Inability to accommodate new types of concepts such as audience, function, subject, topic, etc.

Recommendations

  • Provide a mechanism to extend the list of keyword types in ISO 19115 using SKOS controlled vocabularies

  • Define the keyword types in a controlled vocabulary to make them uniquely identifiable and resolvable

  • Refer to the keyword type by resolvable URL

Benefits

  • Provide an extensibility mechanism to accommodate other types of concepts (Audience, Function, Purpose, etc.).

  • Allows reusability of keyword types

Keyword Labeling Inconsistencies
Issue Keyword Labeling Inconsistencies

Description

In some instances, multiple labels are encoded as one keyword (e.g., 'list of all US states' is one keyword).

Why it is a problem?

While this is fine for doing lexical-based text search, it is not sufficient when supporting semantic search, where each concept must be grounded to a unique meaning.

Recommendations

  • Each keyword should refer to one concept only

  • In addition to a label, use a URI to refer to a concept

Benefits

  • Less verbose document

  • Enables inference by using standard SKOS semantics

Authority for Controlled Vocabularies
Issue Authority for Controlled Vocabularies

Description

The ISO 19139 uses the list of topic categories in the standard ISO 19115. There is a SKOS encoding available in the European Registry located at: http://inspire.ec.europa.eu/metadata-codelist/TopicCategory.

The mapping to Semantic Registry uses this URI to reference dcat:theme.

Why it is a problem?

If no authority is responsible for the management of controlled vocabularies, the vocabularies will not be reused and are at risk of being duplicated.

Recommendations

  • There is a need for a registry of controlled vocabularies that are reusable across agencies.

  • OGC could host controlled vocabularies encoded in SKOS (currently only a GML document is hosted by the team from Inspire).

Benefits

The taxonomy is maintained by the authority that defines the standard and thus will favor reusability of the vocabularies among information producers.

Place Name Consistency
Issue Place Name Consistency

Description

ISO 19139 uses keywords to define place names that reference a thesaurus that is not accessible online. There is no consistent way to define place names and resolve ambiguities.

Why it is a problem?

The place name can be ambiguous as there are many locations with the same name (e.g. Leesburg, FL versus Leesburg, VA)

Recommendations

  • Use unique resolvable identifier (URI) to define place name along with a human readable name.

  • Provide a human-readable page and a Linked Data representation for each place name URI, with partonomy relationships (i.e., a semantic gazetteer).

  • Reference gazetteers with a resolvable URI.

  • Use well known gazetteers (Geonames, GNIS)

Contact Point
Issue Contact Point

Description

Contact Point in ISO 19139 is not systematically encoded in the document. The individual’s name is required in POD but is not always present in the ISO document. A generic email reference for the contact role is sometimes used.

Why it is a problem?

When a problem is present in the metadata, a contact point with an email should be available for expedient resolution of issues.

Recommendations

  • Enforce Contact Point for every Resource with email, role name, and individual name.

  • Email associated with contact point should be assigned to a role, not a specific individual.

Benefits

The use of a generic role-based email for the contact will smoothly handle staff changes.

Responsible Party without Role
Issue Responsible Party without Role

Description

Some responsible parties are published without a role, while the ISO standard indicates that the role is mandatory

Why it is a problem?

Without a role, we are unable to understand how each party relates to a data source.

Recommendations

Enforce role in ISO 19139 for each responsible party.

Benefits

We are able to discern how each party relates to a metadata item unambiguously.

Responsible Party Role Encoding
Issue Responsible Party Role Encoding

Description

ISO 19139 outlines a well-defined taxonomy for Responsible Party roles (e.g., Publisher, etc). ISO 19139 refers to a GML document, through a URL and an Xpointer, which contains roles and many other concepts (instead of a unique concept)

Why it is a problem?

  • Information conveyed in a GML document cannot be interpreted automatically. The XML schema needs custom code to be interpreted, and the Xpointer URL cannot be used in the context of Linked Data

  • In order to understand the meaning of a role, an unambiguous machine-readable description and human-readable page needs to be provided for each role.

Recommendations

Encode the role taxonomy in SKOS (machine-readable) and use resolvable URIs for roles.

Benefits

Both machine and human can understand the unambiguous meaning of the concept.

Organization Hierarchy
Issue Organization Hierarchy

Description

ISO 19139 does not provide support for the subOrganizationOf property (recommended by Project Open Data).

Why it is a problem?

  • Difficult to understand the hierarchy between organizations

  • Search within a hierarchy of organizations is broken.

Recommendations

  • Add a subOrganizationOf property to the ISO 19115 standard

  • Make the organization resolvable to a URL that provides a machine-processable definition of the organization

Benefits

When a resource search is performed for a given organization, the hierarchy can also be leveraged to search within suborganizations (using transitive inferencing).
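
The transitive search described above can be sketched as follows. The organization identifiers, the hierarchy and the dataset assignments are hypothetical, and a dictionary stands in for subOrganizationOf statements that a real implementation would hold in an RDF store and traverse with a reasoner or a SPARQL property path.

```python
# Hypothetical subOrganizationOf links: child -> parent.
SUB_ORGANIZATION_OF = {
    "org:bureau": "org:department",
    "org:office": "org:bureau",
    "org:other-agency": "org:other-department",
}

# Hypothetical dataset -> publishing organization assignments.
DATASETS = {
    "ds1": "org:office",
    "ds2": "org:other-agency",
    "ds3": "org:department",
}


def with_suborganizations(org, links):
    """Return org plus every organization that is transitively below it."""
    result = {org}
    changed = True
    while changed:
        changed = False
        for child, parent in links.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search_by_organization(org):
    """Find datasets published by org or by any of its suborganizations."""
    scope = with_suborganizations(org, SUB_ORGANIZATION_OF)
    return sorted(ds for ds, publisher in DATASETS.items() if publisher in scope)


print(search_by_organization("org:department"))
print(search_by_organization("org:bureau"))
```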

Inconsistent Usage of OnlineResource in ContactInfo
Issue Inconsistent Usage of OnlineResource in ContactInfo

Description

In some documents, the link to services and distributions (zip files) is put in a responsible party’s contact information (onlineResource) instead of the ServiceIdentification property or the TransferOptions in a Distribution

Why it is a problem?

The ContactInfo’s onlineResource property is being misused semantically.

Recommendations

  • Enforce a consistent way to encode distribution and service descriptions

  • Clarify the role of onlineResource in ContactInfo

Benefits

Consistent description of services and distributions in ISO 19139 will help to make a clear distinction between a service and distributed content that can be downloaded.

Service API Standards
Issue Service API Standards

Description

There isn’t a consistent manner of referring to the applicable services API standard, e.g., WMS, WFS, ArcREST

Why it is a problem?

There is no systematic and unambiguous way to identify web service standards. The version of a standard is often not clear (OGC:WMS). Smart software, assisted by people, is needed to resolve the confusion between specifications.

Recommendations

  • Service API should reference an authoritative spec URI to remove any ambiguity.

  • Make the URI of the referred standard resolvable (example: http://www.opengis.net/spec/wms/1.3)

Benefits

Proper classification of service standards, disambiguation, and support of autonomous operations

Service API Specification
Issue Service API Specification

Description

Absence of industry best practices or standards to refer to machine-processable API specifications (RAML, ALPS, Swagger, WSDL, etc.).

Why it is a problem?

  • The ISO standard is not up to date with the techniques currently used in the industry, i.e., REST based API with machine-processable API specifications.

  • Specifications are defined as free text, which is not suitable for machine to machine communication.

Recommendations

Semantic Registry should produce a machine-processable API Document.

Benefits

Integration with the service API can be automated.

Service Online Resource URL
Issue Service Online Resource URL

Description

The access URL for a service is not consistently encoded. For example in a WMS, some URIs point to a GetCapabilities endpoint, while others point to the base URL of the service

Why it is a problem?

There is no systematic way to access the service endpoint for a given service. Software agents have to analyze the URL to get a normalized form

Recommendations

  • Use the base URI for a service

  • Provide reference to a machine processable API document.

Benefits

Systematic access to a service endpoint.
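
The normalization that software agents currently have to perform can be sketched as follows. The function removes the query parameters that describe an OGC operation rather than the endpoint itself. The set of parameters to strip and the example URLs are assumptions, and real-world URLs may need additional handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters assumed to describe an operation, not the endpoint.
OPERATION_PARAMS = {"service", "request", "version"}


def base_service_url(url):
    """Strip operation parameters (case-insensitively) and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in OPERATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(base_service_url(
    "http://example.org/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0"))
print(base_service_url("http://example.org/wms?map=roads&request=GetCapabilities"))
```

Parameters that are not recognized as operation parameters (such as `map` in the second call) are kept, since some servers use them to select the service instance.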

Insufficient Service Metadata
Issue Insufficient Service Metadata

Description

The service description associated with a Dataset has minimal metadata, usually limited to an accessURL and format.

Why it is a problem?

  • There is not enough metadata to enable the discovery of services and the coupling of other resources to the service (layers from WMS for example)

  • The service identification information is sometimes too abstract to be leveraged by modern tools

Recommendations

  • Use the base URI for a service

  • Define a rich metadata model for services and coupled resources

  • Provide reference to a machine processable API document or standard

Benefits

Enable the discovery of services and invocation of services in an automated way.

Format and OnlineResource Parity
Issue Format and OnlineResource Parity

Description

The ISO standard decouples Format and OnlineResource. One format can have more than one online resource URL.

Why it is a problem?

Having multiple URLs for a format is ambiguous and not friendly to machines or users.

Recommendations

Enforce parity of OnlineResource with format.

Benefits

Proper pairing of format with online resource removes ambiguity to both machines and users.

Download Format Versus Service
Issue Download Format Versus Service

Description

The ISO standard does not clearly distinguish between a download file format and a service API in a Dataset distribution.

Why it is a problem?

Classification of services versus downloads is difficult and not friendly to machines or users.

Recommendations

  • Improve the ISO standard to make a clear distinction between a service and a download format.

  • Provide a rich description of services.

Benefits

  • Enhanced classification of various distributions of datasets.

  • Support for autonomous operations

Format Description
Issue Format Description

Description

There is no consistent way to define the format of services (OGC:WMS). Usage of MIME types is not consistent in the standard, and most format descriptions are not machine readable.

Why it is a problem?

Inconsistent format descriptions make it difficult for software agents to access data automatically.

Recommendations

  • Use a standard URI when referring to standard service APIs

  • Use a MIME type from IANA to refer to representation formats.

Benefits

Enables automation, content negotiation and service selections based on controlled vocabularies.

Insufficient Map Layer Description
Issue Insufficient Map Layer Description

Description

The ISO standard does not provide enough information to map a dataset to a layer in a map service (WMS, ArcREST). Often multiple layers are provided by the map service and there is no deterministic way to find out which one corresponds to the dataset.

Why it is a problem?

Traceability from dataset to map layer is unavailable. The missing layer metadata is needed to support GeoPlatform search, discovery and proper use.

Recommendations

  • Define a richer description of services/layers and provide them through the Semantic Registry

  • Define a new standard to describe layer metadata, with commensurate industry supported techniques and policies.

Benefits

Support a vastly improved layer search and map building experience.

Data-centric Approach
Issue Data-centric Approach

Description

Data Schema Standardization of domain models uses a syntactic approach. Imposing this strict adherence to a standard tends to minimize heterogeneity.

Why it is a problem?

  • Data Schemas have limited expressiveness.

  • Data Schemas only capture the syntactic and structural constraints of a data model. This does not provide a machine-processable conceptual model or business rules. Implementations need to hardcode the rules, which risks enforcing different interpretations.

  • Evolution of domain model and associated software is difficult when using a data-centric approach because the business rules and data model semantics need to be hardcoded in the application. Any changes in the standard require expensive software updates. Frequent modifications of the data model require building consensus and standardization, which can be a lengthy process.

  • Integration and interoperability with other domains is difficult due to discrepancies between data schemas and business models, as well as the lack of common protocols, and machine-processable conceptual models and business rules.

Recommendations

  • Use a semantic-based approach to embrace the heterogeneity of domain models by providing a common, formal, and sharable framework mechanism for easily extending metamodels to accommodate specific needs. The extensions can be done in a decentralized way without breaking the existing infrastructure.

  • Use Linked Data standards (such as OWL, SHACL and SPARQL Rules) to provide a standards-based mechanism to capture formal conceptual models, along with their business rules, in a machine-processable way. Information captured in this manner could be imported by a system implementation without writing additional code.

  • Use ontologies to provide a framework for extending metamodels in a decentralized way to accommodate the specificity of each domain player. The extensions can be integrated and handled by any generic-purpose semantic-based reasoner and validator without rewriting code.

Benefits

  • Decentralized extension of the model.

  • Accommodation of model specificities

  • Shareable model and business rules that are machine processable.

  • Reduction of software development cost

  • Exchangeable machine processable rules and conceptual models, which allow automation and reduction of code.

  • Unambiguous interpretation of domain models

  • Cost reduction in software updates

  • Software that adapts and evolves to match changes in domain models without rewriting code.

  • A decentralized and organic evolution of the domain model

  • Software that can adapt quickly to changes in the model or business rules.

8.4.3. Semantic Registry Service

The Semantic Registry Service was designed to manage multiple registers that are capable of containing item classes from different application profiles. To support Testbed-12, we implemented three different registers:

  • Datasets and Services Register: Manages datasets and services collected from Compusult, Envitia and ESRI CSW instances

  • Schema and Schema Mapping Register: Manages schemas and schema mappings harvested from Galdos CSW Schema Registry

  • Portrayal Service Register: Manages portrayal information (styles, symbols, symbolSets, and portrayal rules)

RegistersOverview
Note
The partitioning of the registers was done to provide some clarity in the organization of the information. However it is possible to create a register that contains multiple application profiles. The partitioning decision is based on the business requirement of the user.

These registers were populated by a harvester service, which is integrated with the Semantic Registry Service and accessible through a hypermedia-driven REST API. The harvester service was designed to be extensible and to support multiple types of data sources, including documents retrieved from a resolvable URL (Project Open Data, DCAT, ISO 19139, FGDC CSDGM documents) and advanced web services such as CSW, CKAN, Web Accessible Folder, and ESRI Web services. These plugins, called harvester types, describe the list of parameter descriptors needed by the harvester. An instance of a harvester type is called a harvester source and binds the parameters to values. A harvester source can be triggered manually or on a given schedule, and the harvest results are returned with statistics (number of objects successfully imported, number of failures) as well as the list of item identifiers. Due to limited time for implementation, only synchronous calls to harvesters are supported. Future development will handle asynchronous harvesting with on-demand status reports.

The items managed by the service are stored in a NoSQL store, and are indexed and managed in an RDF store to support graph analytics and SPARQL queries.
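
The relationship between harvester types, parameter descriptors and harvester sources can be sketched with a few data classes. This is an illustrative model only, not the service implementation: the class and field names are hypothetical, and treating `resourceType` as a required parameter of a `csw` harvester type is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterDescriptor:
    """Describes one parameter that a harvester type understands."""
    name: str
    required: bool = False


@dataclass
class HarvesterType:
    """A plugin: declares which parameters a kind of harvester needs."""
    id: str
    parameters: list


@dataclass
class HarvesterSource:
    """An instance of a harvester type with parameters bound to values."""
    id: str
    type: HarvesterType
    source: str
    config: dict = field(default_factory=dict)

    def missing_parameters(self):
        """Names of required parameters that have no bound value."""
        return [p.name for p in self.type.parameters
                if p.required and p.name not in self.config]


# Hypothetical type definition; "resourceType" being required is assumed.
csw_type = HarvesterType("csw", [ParameterDescriptor("resourceType", required=True)])

complete = HarvesterSource(
    id="exampleCSW",
    type=csw_type,
    source="http://example.org/csw",
    config={"resourceType": "http://www.isotc211.org/2005/gmd"},
)
incomplete = HarvesterSource(id="brokenCSW", type=csw_type, source="http://example.org/csw")

print(complete.missing_parameters())
print(incomplete.missing_parameters())
```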

8.4.4. Semantic Registry Service REST API

The initial objective of the testbed was to provide a DCAT REST API, which focused on the search and discovery of dcat:Datasets. However, promoting the DCAT model to the superset SRIM model also necessitated a promotion of the REST API to manage registers and harvester types and sources, and to handle more general items, including Portrayal items, Schema and Schema Mapping items.

A review of existing implementations that use DCAT datasets showed that the only consensus on how to access the information through a REST API was the use of the SPARQL query protocol. An OGC filter was not considered adequate for complex queries of RDF data, as SPARQL provides a more compact and standardized way to query linked data. One of the main considerations when designing the REST API for the Semantic Registry was to make it accessible to web clients, which primarily operate in JSON. To bridge the gap between linked data and JSON, the Semantic Registry uses W3C JSON-LD. The use of a JSON-LD context allows the conversion of RDF models to JSON representations and vice versa. Another objective of the API was to provide a degree of separation between the server and client implementations, so that the API can evolve in the future without breaking client ecosystems. To achieve this, the Semantic Registry uses hypermedia links, which provide a powerful mechanism to decouple clients and servers. This corresponds to Level 3 of the Richardson Maturity Model.

RichardsonMaturityModel
Figure 4. Richardson Maturity Model
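
The role of a JSON-LD context in bridging JSON and RDF, mentioned above, can be illustrated with a deliberately simplified sketch. It handles only a flat mapping from JSON keys to property IRIs and does not implement the JSON-LD processing algorithms; the item identifier and the choice of terms are assumptions made for illustration.

```python
# Simplified context: JSON key -> property IRI (flat terms only).
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}


def to_triples(doc, context):
    """Turn a flat JSON document into (subject, predicate, object) triples."""
    subject = doc["@id"]
    return [(subject, context[key], value)
            for key, value in doc.items() if key in context]


def from_triples(triples, context):
    """Rebuild the compact JSON form for triples that share one subject."""
    terms = {iri: term for term, iri in context.items()}
    doc = {}
    for subject, predicate, value in triples:
        doc["@id"] = subject
        doc[terms[predicate]] = value
    return doc


item = {
    "@id": "http://example.org/items/1",  # hypothetical identifier
    "title": "Road network",
    "description": "Example dataset",
}

triples = to_triples(item, CONTEXT)
print(triples)
print(from_triples(triples, CONTEXT) == item)
```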

To implement a Level 3 REST API, we adopted the IETF standard candidate Hypermedia Application Language (HAL), a popular standard candidate which is widely used by JSON hypermedia REST APIs.

We also acknowledge that many web frameworks (such as AngularJS) are designed for Level 2 APIs and construct URLs on the client side to access the different states of a web application. To accommodate these frameworks, we decided to also implement a Level 2 REST API by providing well-defined URL patterns to access the artifacts of the service (registers, items, harvester types, harvester sources) and a unique identifier for each artifact. The responses of Level 2 are identical to those of Level 3, except for the exclusion of the hypermedia links to other states. The REST API endpoint URL patterns documented in Appendix D are informative only, not normative.

In addition to the Level 2 and Level 3 REST APIs that will mostly be used by web clients, we added support for a Linked Data API that will mainly be used by machines. Each REST endpoint of the Semantic Registry Service also supports Linked Data output in RDF/XML, Turtle and N-Triples formats.
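
The difference between the two levels can be sketched with a HAL-style resource. In HAL, hypermedia controls are carried in the reserved `_links` property; the register content and the URL patterns shown here are hypothetical and are not taken from Appendix D.

```python
# A HAL-style Level 3 representation; hrefs are hypothetical URL patterns.
level3 = {
    "id": "datasets",
    "title": "Datasets and Services Register",
    "_links": {
        "self": {"href": "/registers/datasets"},
        "items": {"href": "/registers/datasets/items"},
    },
}


def to_level2(resource):
    """Drop the hypermedia links recursively, keeping the resource state."""
    if isinstance(resource, dict):
        return {key: to_level2(value)
                for key, value in resource.items() if key != "_links"}
    if isinstance(resource, list):
        return [to_level2(value) for value in resource]
    return resource


def follow(resource, rel):
    """Level 3 client behavior: look up the next URL by link relation."""
    return resource["_links"][rel]["href"]


print(to_level2(level3))
print(follow(level3, "items"))
```

A Level 3 client only needs to know the link relation name (`items`), while a Level 2 client must build the URL itself from a documented pattern.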

Furthermore, each Register endpoint also provides a GeoSPARQL endpoint that permits advanced SPARQL queries on the Linked Data representation of the items managed by each register.
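
As an illustration of the kind of query such an endpoint could accept, the sketch below builds a SPARQL protocol GET request for datasets whose extent intersects a polygon. The endpoint URL, the assumption that dataset extents are reachable through dct:spatial and geo:asWKT, and the polygon coordinates are all hypothetical; the actual graph structure depends on the SRIM application profile.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical query; the graph pattern is an assumption about the data.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?item ?title WHERE {
  ?item a dcat:Dataset ;
        dct:title ?title ;
        dct:spatial ?extent .
  ?extent geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt,
    "POLYGON((-78 38, -77 38, -77 39, -78 39, -78 38))"^^geo:wktLiteral))
}
"""


def sparql_get_url(endpoint, query):
    """Encode a query as the 'query' parameter of a SPARQL protocol GET."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint location for a register.
url = sparql_get_url("http://example.org/registers/datasets/sparql", QUERY)
print(url[:60] + "...")
```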

8.4.5. Integration with OGC Catalog Services

To evaluate interoperability aspects in multi-catalog type environments, the testbed considered a number of solutions. Each solution involved various types of catalogue services, for example, CSW featuring ISO based metadata and OpenSearch, other CSW offering a SOAP binding, and support for DCAT using RDF.

Several architectural solutions could be used to establish a multi-catalogue environment, and four key architectural solutions were identified by the Testbed. The identified solutions differ in a variety of ways, including the entry point for client applications and the computational balance between the client application and the services.

The first solution for a multi-catalogue environment includes a client application that can query the various catalogue services directly. This requires the client application to prepare appropriate queries for each catalogue service and to collate the search results when they are returned by the services.

Option 1
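
The collation work that the first solution places on the client can be sketched as follows. Each catalogue is represented by a stub function standing in for a real protocol client (CSW, OpenSearch, DCAT); the stubs, record structure and identifiers are hypothetical.

```python
def csw_stub(text):
    """Stand-in for a CSW client; returns records whose title matches."""
    records = [{"id": "a", "title": "Roads"}, {"id": "b", "title": "Rivers"}]
    return [r for r in records if text.lower() in r["title"].lower()]


def dcat_stub(text):
    """Stand-in for a DCAT catalogue client."""
    records = [{"id": "b", "title": "Rivers"}, {"id": "c", "title": "Lakes"}]
    return [r for r in records if text.lower() in r["title"].lower()]


def federated_search(catalogs, text):
    """Query every catalogue and merge the records by identifier."""
    merged = {}
    for name, search in catalogs.items():
        for record in search(text):
            entry = merged.setdefault(record["id"], dict(record, sources=[]))
            entry["sources"].append(name)
    return list(merged.values())


results = federated_search({"csw": csw_stub, "dcat": dcat_stub}, "")
for record in results:
    print(record["id"], record["title"], record["sources"])
```

In practice the client would also have to translate the query into each catalogue's own request format and reconcile differing metadata models, which is the main cost of this option.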

The second solution involves the selection of one of the catalogue services to initiate a distributed search. In this case, the client application only needs to prepare queries to send to the cascading catalogue service. Upon receiving a request from the client, the cascading catalogue service then adapts the request to forward to other catalogue services and returns responses from the other services, as well as results from its own catalogue.

Option 2

The third solution involves the harvesting of metadata from one or more source catalogue services into a single target catalogue service. Harvesting is ideally conducted at a scheduled time and not when a query is received from the client. The client application can then query the target catalogue service to discover resources published by both the source and target catalogue services.

Option 3

The fourth solution involves the replication of metadata between a federation of catalogue services. Replication would ideally be conducted at a scheduled time and not when a query is received from the client. The client application can then query any catalogue service to discover resources published by any catalogue service.

Option 4
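A minimal sketch of replication across a federation, assuming each member is an in-memory dictionary keyed by record id. The conflict rule used here (the copy with the latest `modified` timestamp wins) is an assumption made for the example; the report does not prescribe one.

```python
from typing import Dict

Record = Dict[str, str]
Store = Dict[str, Record]


def replicate(federation: Dict[str, Store]) -> None:
    """Give every member the union of all members' records (latest 'modified' wins)."""
    union: Store = {}
    for store in federation.values():
        for record_id, record in store.items():
            current = union.get(record_id)
            # Timestamps share one ISO 8601 format, so string comparison orders them.
            if current is None or record["modified"] > current["modified"]:
                union[record_id] = record
    for store in federation.values():
        # Copy each record so members do not share mutable objects.
        store.update({rid: dict(rec) for rid, rec in union.items()})


# Hypothetical federation contents, for illustration only.
federation = {
    "A": {"r1": {"title": "Flood zones", "modified": "2016-10-03T22:49:27Z"}},
    "B": {"r1": {"title": "Flood zones (rev.)", "modified": "2016-11-15T18:11:24Z"},
          "r2": {"title": "Road network", "modified": "2016-11-15T18:11:24Z"}},
}
replicate(federation)
```

After a replication run, a client can query any member and see the same records, which is the property that distinguishes this option from harvesting into a single target.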

For this testbed, the integration with the OGC Catalog Services was accomplished by using a Harvester Service.

Compusult CSW Integration

We integrated the Compusult CSW, which serves ISO 19139 documents, using the CSW 2.0 protocol. The integration could have been done with CSW 3.0, but no open-source clients that supported the CSW 3.0 protocol were available at the time of Testbed-12. However, the Harvester configuration for CSW 3.0 would be very similar to that for the CSW 2.0 GetRecords operation. To map ISO 19139 to SRIM, we used a semantic mapping based on the DCAT profiles. We found no issues validating the ISO 19139 documents against their XML schema; however, we found some issues in the ISO 19139 mapping (explained in Section 1.4 Implementations).

A Harvester Source for the CSW Catalog was defined and harvested on demand. The following figure shows a client displaying the harvester source for Compusult CSW:

CompusultCSWHarvester

The following snippet shows the JSON encoding of the harvester source configuration:

{
    "id": "compusultCSW",
    "type": "csw",
    "title": "Testbed12 Compusult CSW",
    "description": "Compusult CSW used for OGC Testbed 12 to harvest ISO19139 documents",
    "created": "2016-10-03T22:49:27.311Z",
    "modified": "2016-10-03T22:49:27.311Z",
    "source": "http://ogc-testbed12.compusult.net/wes/serviceManagerCSW/csw",
    "config": {
        "resourceType": "http://www.isotc211.org/2005/gmd"
    },
    "harvestInterval": "MANUAL",
    "registerId": "datasets"
}
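A harvester source document of this shape could be loaded and checked before use, as in the following sketch. The `HarvesterSource` class, the `parse_source` function, and the choice of required fields are assumptions made for illustration; they are not part of the Semantic Registry API described in this report.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

# Assumed set of mandatory fields, inferred from the listing above.
REQUIRED = ("id", "type", "source", "config", "modified",
            "harvestInterval", "registerId")


@dataclass
class HarvesterSource:
    id: str
    type: str
    source: str
    config: Dict[str, Any]
    modified: datetime
    harvest_interval: str
    register_id: str


def parse_source(text: str) -> HarvesterSource:
    """Parse a harvester source document and fail early on missing fields."""
    doc = json.loads(text)
    missing = [key for key in REQUIRED if key not in doc]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # The timestamps end in 'Z'; swap in an explicit UTC offset before parsing.
    modified = datetime.fromisoformat(doc["modified"].replace("Z", "+00:00"))
    return HarvesterSource(doc["id"], doc["type"], doc["source"], doc["config"],
                           modified, doc["harvestInterval"], doc["registerId"])


text = """{
    "id": "compusultCSW",
    "type": "csw",
    "modified": "2016-10-03T22:49:27.311Z",
    "source": "http://ogc-testbed12.compusult.net/wes/serviceManagerCSW/csw",
    "config": {"resourceType": "http://www.isotc211.org/2005/gmd"},
    "harvestInterval": "MANUAL",
    "registerId": "datasets"
}"""
src = parse_source(text)
```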

ESRI CSW Integration

We integrated the ESRI OGC CSW, which serves ISO 19139 documents, by defining a Harvester Source that uses CSW 2.0. We found that some of the ISO 19139 documents registered in the CSW were not compliant with the standard (for example, a missing ScopeCode in HierarchyLevel). The following snippet shows the configuration of the harvester:

{
    "id": "esriCSW",
    "type": "csw",
    "title": "Testbed12 ESRI CSW",
    "description": "ESRI CSW used for OGC Testbed 12",
    "created": "2016-11-15T18:11:24.203Z",
    "modified": "2016-11-15T18:11:24.203Z",
    "source": "http://gptogc.esri.com/geoportal/csw",
    "config": {
        "resourceType": "http://www.isotc211.org/2005/gmd"
    },
    "harvestInterval": "MANUAL",
    "registerId": "datasets"
}

Envitia CSW Integration

Envitia provided a CSW instance with an ebRIM profile. We configured the harvester to collect dataset metadata stored in objects of type urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata. The following snippet shows the configuration of the CSW Harvester:

{
    "id": "envitiaCSW",
    "type": "csw",
    "title": "Testbed12 Envitia ebRIM CSW",
    "description": "Envitia CSW used harvest ebRIM datasets records",
    "created": "2016-11-15T18:11:24.236Z",
    "modified": "2016-11-15T18:11:24.236Z",
    "source": "http://86.188.147.99:9080/RegistryService/registry",
    "config": {
        "requestXML": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                       <csw:GetRecords xmlns:env-ebrim=\"http://www.envitia.com/schemas/georegistry/ebrim-ext\"
                                       xmlns:xmime=\"http://www.w3.org/2005/05/xmlmime\" xmlns:dct=\"http://purl.org/dc/terms/\"
                                       xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\" xmlns:gml=\"http://www.opengis.net/gml\"
                                       xmlns:wrs=\"http://www.opengis.net/cat/wrs/1.0\" xmlns:ows=\"http://www.opengis.net/ows\"
                                       xmlns:ogc=\"http://www.opengis.net/ogc\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
                                       xmlns:xlink=\"http://www.w3.org/1999/xlink\" service=\"CSW\" version=\"2.0.2\"
                                       resultType=\"results\" outputSchema=\"urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0\"
                                       startPosition=\"1\" maxRecords=\"50000\">
                           <csw:Query typeNames=\"wrs:ExtrinsicObject_coi\">
                               <csw:ElementSetName typeNames=\"coi\">full</csw:ElementSetName>
                               <csw:Constraint version=\"1.1.0\">
                                   <ogc:Filter>
                                       <ogc:PropertyIsEqualTo>
                                           <ogc:PropertyName>$coi/@objectType</ogc:PropertyName>
                                           <ogc:Literal>urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata</ogc:Literal>
                                       </ogc:PropertyIsEqualTo>
                                   </ogc:Filter>
                               </csw:Constraint>
                           </csw:Query>
                       </csw:GetRecords>",
        "resourceType": "urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata"
    },
    "harvestInterval": "MANUAL",
    "registerId": "datasets"
}
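Because the `requestXML` value is an XML document embedded in a JSON string, its quotes (and any line breaks) must be escaped in the encoded file. The following sketch shows one way to generate such a configuration programmatically rather than by hand. It is an illustration under assumptions: the `build_get_records` function is hypothetical, it declares only the namespaces the request actually uses, and the resulting dictionary mirrors the fields shown above without claiming to be the harvester's full schema.

```python
import json
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
WRS = "http://www.opengis.net/cat/wrs/1.0"
ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)


def build_get_records(object_type: str, max_records: int = 50) -> str:
    """Return a CSW 2.0.2 GetRecords request that filters on the ebRIM objectType."""
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        # The wrs prefix occurs only inside an attribute value, so declare it by hand.
        "xmlns:wrs": WRS,
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "outputSchema": "urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0",
        "startPosition": "1",
        "maxRecords": str(max_records),
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query",
                          {"typeNames": "wrs:ExtrinsicObject_coi"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName", {"typeNames": "coi"}).text = "full"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    ogc_filter = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    equals = ET.SubElement(ogc_filter, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(equals, f"{{{OGC}}}PropertyName").text = "$coi/@objectType"
    ET.SubElement(equals, f"{{{OGC}}}Literal").text = object_type
    return ET.tostring(root, encoding="unicode")


object_type = "urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata"
request_xml = build_get_records(object_type, max_records=50000)
source = {
    "id": "envitiaCSW",
    "type": "csw",
    "source": "http://86.188.147.99:9080/RegistryService/registry",
    "config": {"requestXML": request_xml, "resourceType": object_type},
    "harvestInterval": "MANUAL",
    "registerId": "datasets",
}
# json.dumps escapes the embedded quotes, so the result is a valid JSON document.
encoded = json.dumps(source, indent=4)
```

Decoding `encoded` with `json.loads` returns the original XML string unchanged, so the round trip is lossless.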

CSW ebXML Schema Registry

During the testbed, Galdos provided a CSW 2.0 instance that implemented the ebRIM profile. We extended the profile to accommodate representations of Schemas and Schema Mappings. In addition, we implemented a Semantic Mapper that converts the Schema and Schema Profile to the SRIM Schema Application Profile, and integrated it with a Semantic Registry harvester.

The following shows the Harvester Source Configuration needed to access the Schemas and Schema Mappings from the CSW:

{
    "id": "galdosCSW1",
    "type": "csw",
    "title": "Schema Harvester from Galdos ebRIM CSW ",
    "description": "This source harvests schemas stored in ebRIM Model",
    "created": "2016-10-03T22:49:27.542Z",
    "modified": "2016-10-03T22:49:27.542Z",
    "source": "http://ows.galdosinc.com/indicio/query",
    "config": {
        "requestXML": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                       <csw:GetRecords xmlns:env-ebrim=\"http://www.envitia.com/schemas/georegistry/ebrim-ext\"
                                       xmlns:xmime=\"http://www.w3.org/2005/05/xmlmime\"
                                       xmlns:dct=\"http://purl.org/dc/terms/\"
                                       xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\"
                                       xmlns:gml=\"http://www.opengis.net/gml\"
                                       xmlns:wrs=\"http://www.opengis.net/cat/wrs/1.0\"
                                       xmlns:ows=\"http://www.opengis.net/ows\"
                                       xmlns:ogc=\"http://www.opengis.net/ogc\"
                                       xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
                                       xmlns:xlink=\"http://www.w3.org/1999/xlink\"
                                       service=\"CSW\" version=\"2.0.2\"
                                       resultType=\"results\" outputSchema=\"urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0\"
                                       startPosition=\"1\" maxRecords=\"50\">
                           <csw:Query typeNames=\"wrs:ExtrinsicObject\">
                               <csw:ElementSetName>full</csw:ElementSetName>
                               <csw:Constraint version=\"1.1.0\">
                                   <ogc:Filter>
                                       <ogc:PropertyIsEqualTo>
                                           <ogc:PropertyName>@objectType</ogc:PropertyName>
                                           <ogc:Literal>urn:ogc:def:ebRIM-ObjectType:OGC:Schema</ogc:Literal>
                                       </ogc:PropertyIsEqualTo>
                                   </ogc:Filter>
                               </csw:Constraint>
                           </csw:Query>
                       </csw:GetRecords>",
        "resourceType": "urn:ogc:def:ebRIM-ObjectType:OGC:Schema"
    },
    "harvestInterval": "MANUAL",
    "registerId": "schemas"
}

8.4.6. Integration with Clients

A number of clients were successfully integrated with the Semantic Registry, as illustrated by the following figure:

RegistryClients
Figure 5. Overview of the Semantic Registry Clients

ESRI Semantic Registry Client

The ESRI Client provides a plugin framework to access a variety of catalog services. For this testbed, ESRI developed a plugin to access the Semantic Registry. The following figure shows the results of a search in the ESRI Semantic Registry Client: