Publication Date: 2017-04-25
Approval Date: 2016-03-09
Posted Date: 2016-12-02
Reference number of this document: OGC 16-059
Reference URL for this document: http://www.opengis.net/doc/PER/t12-A066
Category: Public Engineering Report
Editor: Stephane Fellah
Title: Testbed-12 Semantic Portrayal, Registry and Mediation Engineering Report
COPYRIGHT
Copyright © 2017 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/
WARNING
This document is an OGC Public Engineering Report created as a deliverable of an initiative from the OGC Innovation Program (formerly OGC Interoperability Program). It is not an OGC standard and not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.
LICENSE AGREEMENT
Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.
If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.
THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.
This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.
Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.
This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.
None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.
- 1. Introduction
- 1.1. Scope
- 1.2. Document contributor contact points
- 1.3. Future Work
- 1.3.1. SRIM and ISO 19115 mapping
- 1.3.2. SRIM Layer and Map Profile
- 1.3.3. Pubsub and federation of Registry
- 1.3.4. Web of Vocabulary Ontology and Service
- 1.3.5. Application of Shapes Constraint Language (SHACL) for Linked Data
- 1.3.6. Composite Symbology and alternate renderers for Semantic Portrayal Service
- 1.4. Foreword
- 2. References
- 3. Terms and definitions
- 4. Conventions
- 5. Overview
- 6. Status Quo & New Requirements Statement
- 7. Solutions
- 8. Semantic Registry Service
- 9. Semantic Mediation Service
- 10. Semantic Portrayal Service
- Appendix A: Semantic Registry Information Model (SRIM)
- Namespaces
- Registry Core Ontology
- Overview
- srim:Register
- srim:RegisterEntry
- srim:ItemClass
- srim:Item
- prov:Activity
- vcard:Address
- foaf:Agent
- prov:Attribution
- skos:Concept
- skos:ConceptScheme
- foaf:Document
- vcard:Email
- extent:GeographicExtent
- id:Identifier
- dct:LicenseDocument
- link:Link
- dct:Location
- gr:OpeningHoursSpecification
- org:Organization
- dct:PeriodOfTime
- vcard:Phone
- foaf:Project
- dct:ProvenanceStatement
- srim:Release
- dct:RightsStatement
- dct:Standard
- vcard:VCard
- Dataset Application Profile
- Service Application Profile
- Appendix B: SRIM Schema Application Profile
- Appendix C: Semantic Portrayal Ontologies
- Appendix D: Semantic Registry Service REST API v0.1
- Appendix E: Semantic Mediation Service REST API
- Appendix F: Semantic Portrayal Service REST API
- Appendix G: Revision History
- Appendix H: Bibliography
This engineering report documents the findings of the activities related to the Semantic Portrayal, Registry and Mediation components implemented during the OGC Testbed 12. This effort is a continuation of efforts initiated in the OGC Testbed 11. This report provides an analysis of the different standards considered during this effort, documents the rendering endpoints extension added to the Semantic Portrayal Service and the migration of the Portrayal metadata to the Semantic Registry, which is aligned with the DCAT REST Service API. We also discuss the integration of the CSW ebRIM for Application Schema with the Semantic Mediation Service, and document the improvements of the SPARQL Extensions, Portrayal and Semantic Mediation ontologies defined in the previous testbed.
Catalog services usually provide discovery of data and services; however, the ability to discover other related resources that help applications better understand and render the data is not commonly available. An example is retrieving the available styles for a layer or a feature type. Testbed 12 advanced the use of W3C semantic technologies to better integrate datasets, services, schemas, schema mappings, portrayal information, and layers.
This engineering report is important to the OGC Geosemantics Domain Working Group because it advances the semantic enablement of geospatial information found in catalogs, such as dataset, service and portrayal metadata, potentially providing a bridge between the geospatial and semantic web communities. The testbed also produced a number of ontologies for portrayal, schema management and registry.
ogcdocs, testbed-12, CSW, eb-RIM, catalogue, metadata, SPARQL, RDF, OWL, ontology, semantic web, linked data, DCAT, Portrayal, Schema, Schema Mapping, Hypermedia, REST.
This engineering report will be submitted to the Geosemantics Domain Working Group for review.
1. Introduction
Catalog services usually provide discovery of data and services; however, the ability to discover other related resources that help applications better understand and render the data is not commonly available. An example is retrieving the available styles for a layer or a feature type. This report captures the work performed in Testbed 12 in setting up three types of semantic services based on W3C semantic technologies to better integrate datasets, services, schemas, schema mappings, portrayal information, and layers.
-
The Semantic Registry Service supports discovery and search of geospatial assets (e.g. datasets, services, schemas, portrayal information, and layers). It is based on the W3C Data Catalog Vocabulary (DCAT), the W3C PAV (Provenance, Authoring and Versioning) ontology, and the Dublin Core Metadata Terms. The Service is based on a newly developed ontology called the Semantic Registry Information Model (SRIM), which generalizes the DCAT model to accommodate other types of geospatial assets. The service API is hypermedia-driven and uses JSON-LD, Linked Data and the Hypertext Application Language (HAL).
-
The Semantic Mediation Service provides the ability to transform data from one schema to another, including chaining of transformations. It is based on a SRIM application profile for describing schemas and schema mappings. The service is a hypermedia-driven REST API.
-
The Semantic Portrayal Service provides the ability to render datasets to multiple output formats (e.g. SVG and PNG) and to generate different styling encodings (e.g. SLD, MapCSS, and CartoCSS). It is based on a set of portrayal ontologies for styles, symbols and graphics. It uses a SRIM profile to represent portrayal metadata.
1.1. Scope
This OGC document specifies semantic information models and REST APIs for the Semantic Registry, Semantic Mediation and Semantic Portrayal Services. It introduces the Semantic Registry Information Model (SRIM), a superset of the W3C DCAT ontology. SRIM can accommodate registry items other than dcat:Dataset, such as Service descriptions, Schema and Schema Mapping descriptions, Portrayal Information (Styles, Portrayal Rules, Graphics and Portrayal Catalogs), Layers, and Map Contexts. The Semantic Registry Service is used as an integration solution for federating and unifying information produced by different OGC Catalog Services, providing simplified access through a hypermedia-driven API that uses JSON-LD, Linked Data and HAL-JSON. During the testbed, the Semantic Registry was used to store information about geospatial datasets and services, schemas and portrayal information. The Semantic Mediation Service was used to validate and perform transformations between schemas, including transformation chaining. The Semantic Portrayal Service was used as a convenience API to access Portrayal Registry information and to render geospatial data to different graphic representations (SVG, raster and other pluggable formats).
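As a rough illustration of how a registry item might combine JSON-LD and HAL in a single representation, the following minimal sketch builds such a document in Python. The base URL, the paths, and the choice of properties are assumptions made for this example; they are not taken from the testbed implementation or from the API documented in the appendices.

```python
import json

# Hypothetical base URL; not the address of any testbed deployment.
BASE = "https://registry.example.org"


def registry_item(item_id, item_type, title, register_id):
    """Build a JSON-LD document carrying HAL-style links for one registry item."""
    return {
        # JSON-LD keywords: map short keys to vocabulary URIs and type the resource.
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "title": "dct:title",
        },
        "@id": f"{BASE}/items/{item_id}",
        "@type": item_type,
        "title": title,
        # HAL reserved property: link relations a client can follow.
        "_links": {
            "self": {"href": f"{BASE}/items/{item_id}"},
            "collection": {"href": f"{BASE}/registers/{register_id}/items"},
        },
    }


doc = registry_item("42", "dcat:Dataset", "Road network", "datasets")
print(json.dumps(doc, indent=2))
```

The same function could describe a style or a schema by passing a different type, which is the kind of generalization beyond dcat:Dataset that SRIM is meant to allow.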
1.2. Document contributor contact points
All questions regarding this document should be directed to the editor or the contributors:
Name | Organization |
---|---|
Stephane Fellah | Image Matters LLC |
Gobe Hobona | Envitia |
Richard Martell | Galdos Inc |
Luis Bermudez | OGC |
1.3. Future Work
1.3.1. SRIM and ISO 19115 mapping
The SRIM ontology in Testbed 12 used Dublin Core Metadata Terms. Future work to enhance SRIM could map it to ISO 19115. Change requests against the current ISO 19115 standard might also be required to align it better with Linked Data. ISO 19115 should provide better use of controlled vocabularies, Linked Data friendly identifiers, and a better service description that enables automated access to services.
1.3.2. SRIM Layer and Map Profile
There is no standard way to describe metadata for layers and maps. While layers and maps are derived from a Dataset, they have their own specific metadata. Future work could investigate a profile description for layers and maps that extends the Registry Item and relates them to Datasets, Services and Portrayal Information. The description will be used by the Semantic Registry and Semantic Portrayal Service.
1.3.3. Pubsub and federation of Registry
In Testbed 12, the Semantic Registry harvested information from a federation of CSW services to exercise the Semantic Registry Information Model (SRIM) and the REST API. Future work could improve the efficiency of the harvesting process by investigating a publish/subscribe protocol, and could address version management of register items in the Semantic Registry as they change over time.
1.3.4. Web of Vocabulary Ontology and Service
With the deployment of the Semantic Web and Linked Open Data, data sources have multiplied, as have the machine-processable controlled vocabularies that structure and constrain the interpretation of these data. These controlled vocabularies can be ontologies (RDF Schema, OWL), codelists, taxonomies or thesauri (SKOS), sometimes augmented with additional rules (SPARQL/SPIN rules, SWRL, RIF) and constraints (SHACL).
Vocabulary directories exist (e.g. LOV), but there is an ever-increasing demand for environments that simplify searching, editing and collaborative contributions to the vocabularies by non-experts of the Semantic Web. This creates a tension between very rich formalisms and a need to democratize participation in the life cycle of controlled vocabularies.
Vocabularies are most likely to be adopted and shared if they are made available easily. Nevertheless, despite successes in the use of SKOS for encoding vocabularies, current standards provide only low-level interfaces to vocabulary data. For example, many vocabularies are published as an RDF document for download. However, if the vocabulary is large, then the download will be commensurately large; if the user only wants to retrieve a single vocabulary term or select a few terms, this option requires processing on the client side. Alternatively, access to vocabularies is often provided at a SPARQL endpoint. SPARQL is the generic RDF query language. While it is powerful, it is also considered a low-level language similar to the relational database query language SQL and normally is only used by database administrators.
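To make the contrast with a full download concrete, the sketch below builds a SPARQL query that retrieves the preferred label of a single SKOS concept and encodes it as an HTTP GET request using the `query` parameter defined by the SPARQL Protocol. The endpoint and concept URIs are placeholders chosen for this example, and the function names are illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the SPARQL Protocol passes the query text in `query`.
ENDPOINT = "https://vocab.example.org/sparql"


def concept_label_query(concept_uri, lang="en"):
    """Return a SPARQL query selecting the preferred label of one SKOS concept."""
    # Note: the URI is interpolated without validation, which is acceptable
    # for a sketch but not for untrusted input.
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{concept_uri}> skos:prefLabel ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def query_url(concept_uri, lang="en"):
    """Encode the query as a GET URL for the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": concept_label_query(concept_uri, lang)})


url = query_url("https://vocab.example.org/concept/100")
print(url)
```

Even this single-term lookup requires the client to know SPARQL syntax, the SKOS namespace and the language filtering idiom, which illustrates why the interface is considered low-level.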
Some SKOS vocabularies are published via other HTTP interfaces. However, each implementation uses different protocols and supports a varied set of features (e.g. content-negotiation provided by the GEMET REST interface and NERC Data Grid’s Vocabulary Server SOAP interface). In some cases, one or both of human-readable formats and machine-readable formats are not available. Thus, discovery and access across vocabulary endpoints becomes challenging and ad-hoc.
There is a clear opportunity to design an API to match the SKOS and OWL vocabularies, taking advantage of the fact that most modern vocabulary content is structured using SKOS and OWL classes and predicates. This API can then be used as the basis for various higher-level vocabulary applications (NLP applications, Concept Recommender, Semantic Enricher, etc.) that can be used to enrich, for example, ISO 19115 metadata and other OGC services that use controlled vocabularies in their metadata.
Testbeds 11 and 12 explored the high-level description of ontologies and schemas to support semantic mediation. Future work can investigate the kind of metadata needed to enable search on controlled vocabularies by defining an ontology that addresses the following aspects:
-
Classification of Vocabulary types
-
Relationships to other vocabularies (extensions, imports, specialization, metadata vocabularies, etc.)
-
Statistical information about vocabularies (number of concepts, concept schemes, classes, properties, instances, datatypes)
-
Schema encoding (OWL, RDF Schema, SKOS)
-
Expressiveness
-
Preferred prefix
-
Preferred namespace URI
-
Governance metadata
-
Versioning information
Based on this ontology, we propose to define a standard REST API to search and access vocabulary metadata and their terms, following REST API best practices (for example, a hypermedia-driven design).
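The aspects listed above can be pictured as a simple record type with a search function over a collection of such records. The following is a minimal sketch only: the class name, field names and the second catalog entry are hypothetical, and the proposed ontology and API remain to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyMetadata:
    """Hypothetical record covering the aspects listed above."""
    uri: str
    vocabulary_type: str           # e.g. "ontology", "thesaurus", "codelist"
    encoding: str                  # e.g. "OWL", "RDF Schema", "SKOS"
    preferred_prefix: str
    preferred_namespace_uri: str
    version: str = ""
    related: dict = field(default_factory=dict)     # relation name -> vocabulary URIs
    statistics: dict = field(default_factory=dict)  # e.g. {"concepts": 120}


def search(catalog, **criteria):
    """Return entries whose attributes equal every given criterion."""
    return [v for v in catalog
            if all(getattr(v, k) == val for k, val in criteria.items())]


CATALOG = [
    VocabularyMetadata(
        uri="http://www.w3.org/2004/02/skos/core",
        vocabulary_type="ontology", encoding="OWL",
        preferred_prefix="skos",
        preferred_namespace_uri="http://www.w3.org/2004/02/skos/core#"),
    # Made-up entry used only to exercise the search.
    VocabularyMetadata(
        uri="https://vocab.example.org/hazards",
        vocabulary_type="thesaurus", encoding="SKOS",
        preferred_prefix="haz",
        preferred_namespace_uri="https://vocab.example.org/hazards#",
        version="0.1", statistics={"concepts": 120}),
]

hits = search(CATALOG, encoding="SKOS")
print([v.preferred_prefix for v in hits])  # ['haz']
```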
1.3.5. Application of Shapes Constraint Language (SHACL) for Linked Data
The Semantic Registry Information Model (SRIM), developed during Testbed 12, is defined as a superset of the W3C DCAT standard and encoded as an OWL ontology. However, OWL does not capture some of the semantic integrity constraints that are necessary to validate instance information encoded using the SRIM ontology profiles. This is not an isolated problem. The DCAT ontology, for example, defines a set of classes and properties and reuses a large number of external vocabularies such as Dublin Core, but does not provide any restrictions in the ontology. Users have to read profile documents such as DCAT-AP or GeoDCAT-AP to learn which properties apply to a given class and how (mandatory, recommended or optional). For example, a Dataset may have only one title per language, or contact information must include a person name, organization name or position name, together with either an email address or a telephone number. These kinds of restrictions cannot be captured with OWL, and until now human interpretation has been required to implement the constraints in code.
To fill these gaps, the emerging W3C standard called Shapes Constraint Language (SHACL) provides a powerful framework to define the "shape" of graph data and to express complex integrity constraints, using well-defined constraint constructs defined in RDF as well as SPARQL and JavaScript constraints. SHACL is not a replacement for RDFS/OWL, but a complementary technology that is both very expressive and highly extensible. While RDFS and OWL are used to define vocabulary terms (classes and properties) and their hierarchies (subclasses, subproperties), as well as the nature of the classes and properties (union, intersection, complement of classes; transitive, inverse, symmetric properties, etc.), SHACL is more appropriate for capturing property constraints (cardinality, valid values or value shapes, and interdependencies between them) and can accommodate multiple profiles by providing different shapes for the same ontology. Not only is the SHACL vocabulary itself defined in RDF, but the same macro mechanisms can be used by anyone to define new high-level language elements and publish them on the web. This means that SHACL will lead not only to the reuse of data schemas but also to domain-specific constraint languages. Furthermore, SHACL can be used in conjunction with a variety of languages besides SPARQL, including JavaScript. Complex validation constraints can be expressed in JavaScript so that they can be evaluated client-side. In addition, SHACL can be used to generate validation reports for quality control, potentially with suggestions for fixing validation errors. Overall, SHACL is a future-proof schema language designed for the Web of Data. While SHACL is not yet a standard, there are already implementations that use it (e.g. TopBraid).
Future work could include investigation of the use of SHACL shapes to define application profiles, generation and data entry, data validation, and quality control of linked data information.
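The two example constraints mentioned in this section (one title per language, and contact information with a name plus a means of contact) are the kind of rule that is hand-coded today. The sketch below shows such hand-written checks in plain Python, purely to make the rules concrete; the function and key names are assumptions for this example. In SHACL the same rules would be declared as data rather than code; the first corresponds to what the published SHACL vocabulary calls `sh:uniqueLang`.

```python
from collections import Counter


def duplicated_languages(titles):
    """Return language tags used by more than one (text, lang) title."""
    counts = Counter(lang for _, lang in titles)
    return sorted(lang for lang, n in counts.items() if n > 1)


def contact_is_valid(contact):
    """Require at least one name field and at least one contact channel."""
    has_name = any(contact.get(k) for k in
                   ("person_name", "organization_name", "position_name"))
    has_channel = any(contact.get(k) for k in ("email", "telephone"))
    return has_name and has_channel


titles = [("Roads", "en"), ("Routes", "fr"), ("Road network", "en")]
print(duplicated_languages(titles))  # ['en']

print(contact_is_valid({"organization_name": "Example Org",
                        "email": "info@example.org"}))  # True
print(contact_is_valid({"email": "info@example.org"}))  # False
```

Each new profile would need another set of such functions, whereas a shape-based approach would let different profiles be expressed as different shapes over the same ontology.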
1.3.6. Composite Symbology and alternate renderers for Semantic Portrayal Service
Testbed 11 introduced a portrayal ontology that focused on point-based symbology (icons for Emergency Management). Testbed 12 extended this work by providing a richer symbolizer and graphics ontology that can accommodate line and area-based symbols, along with the graphic attributes applicable to these symbols. Future work could extend the ontology to accommodate more complex symbology, including composite symbols and symbol templates. The extended ontology will help describe more advanced symbology standards such as the family of MIL2525 symbols.
This Testbed was able to render symbol legends from their definitions; however, more work is needed to render an SVG map based on the portrayal ontology. Future work can lay the foundation for expressing styles with at least the same expressiveness as SLD. The proposed work can extend the portrayal ontology to represent composite symbols and symbol templates. Related future work can investigate other renderer outputs, such as a JSON encoding of the portrayal information that can be handled on the client side with HTML5 Canvas or rendering libraries such as D3.js. Other renderers may also investigate SLD production from the RDF descriptors, and how features of the portrayal ontology that are unsupported in graphic languages less expressive than SVG, such as KML, can be handled.
1.4. Foreword
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
2. References
The following documents are referenced in this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the latest edition of the normative document referred to applies.
-
OGC 16-062 - OGC® Testbed-12 Catalogue and SPARQL Engineering Report
-
OGC 15-058 - OGC® Testbed-11 Symbology Mediation Engineering Report
-
OGC 15-054 - OGC® Testbed-11 Implementing Linked Data and Semantically Enabling OGC Services Engineering Report
-
OGC 13-084r2, OGC I15 (ISO19115 Metadata) Extension Package of CS-W ebRIM Profile .0, 2014
-
OGC 12-168r6, OGC® Catalogue Services 3.0 - General Model, 2016
-
OGC 11-052r4, OGC GeoSPARQL- A Geographic Query Language for RDF Data, 2011
-
OGC 08-125r1, KML Standard Development Practices, Version 0.6, 2009.
-
OGC 07-147r2, KML Version 2.2.0, 2008
-
OGC 07-110r4, CSW-ebRIM Registry Service ebRIM profile of CSW (.0.1), 2009
-
OGC 07-045, OGC Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile (.0.0), 2007
-
OGC 07-006r1, OpenGIS Catalogue Service Implementation Specification 2.0.2, 2007
-
OGC 06-129r1, FGDC CSDGM Application Profile for CSW 2.0 (0.0.12), 2006
-
OGC 06-121r9, OGC® Web Services Common Standard
-
OGC 06-121r3, OpenGIS® Web Services Common Specification, version 1.1.0 with Corrigendum 1 2006
-
OGC 05-078r4, OpenGIS Styled Layer Descriptor Profile of the Web Map Service Implementation Specification, Version 1.1.0, 2006
-
OGC 05-077r4, OpenGIS® Symbology Encoding Implementation Specification, Version 1.1.0, 2006.
-
ISO/TS 19139:2007, Geographic information — Metadata — XML schema implementation
-
ISO 19119:2005, Geographic information — Services
-
ISO 19117:2012, Geographic information — Portrayal
-
ISO 19115:2003, Geographic information — Metadata
-
ISO 19115:2003/Cor 1:2006, Geographic information — Metadata
-
ISO 19115-1:2014, Geographic information — Metadata — Part 1: Fundamentals
-
Dublin Core Metadata Initiative, last visited 12-09-2016, available from http://dublincore.org/
-
NSG Metadata Foundation (NMF) – Part 1: Core, version 2.2, 23 September 2014 https://nsgreg.nga.mil/doc/view?i=4123
-
DGIWG 114, DGIWG Metadata Foundation (DMF), last visited 12-09-2016, available from https://portal.dgiwg.org/files/?artifact_id=9189&format=pdf
-
DoD Discovery Metadata Specification (DDMS), last visited 12-09-2016, available from https://metadata.ces.mil/dse-help/DDMS/index.htm
-
SPARQL Protocol and RDF Query Language (SPARQL), last visited 12-09-2016, available from https://www.w3.org/TR/rdf-sparql-query
-
DCAT, last visited 12-09-2016, available from https://www.w3.org/TR/vocab-dcat/
-
National System for Geospatial Intelligence Metadata Implementation Specification (NMIS) – Part 2: XML Exchange Schema
-
Project Open Data Metadata Schema v1.1 https://project-open-data.cio.gov/v1.1/schema/
-
Asset Description Metadata Schema (ADMS) https://www.w3.org/TR/vocab-adms/
-
JSON-LD 1.0 https://www.w3.org/TR/json-ld/
3. Terms and definitions
For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard [OGC 06-121r9] and in OGC® Abstract Specification Topic TBD: TBD shall apply. In addition, the following terms and definitions apply.
3.1. feature
representation of some real world object or phenomenon
3.2. interoperability
capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units [ISO 19119]
3.3. map
pictorial representation of geographic data
3.4. model
abstraction of some aspects of a universe of discourse [ISO 19109]
3.5. ontology
a formal specification of concrete or abstract things, and the relationships among them, in a prescribed domain of knowledge [ISO/IEC 19763]
3.6. portrayal
presentation of information to humans [ISO 19117]
3.7. semantic interoperability
the aspect of interoperability that assures that the content is understood in the same way in both systems, including by those humans interacting with the systems in a given context
3.8. semantic mediation
transformation from one or more datasets into a dataset based on a different conceptual model.
3.9. symbol
a bitmap or vector image that is used to indicate an object or a particular property on a map.
3.10. symbology encoding
style description to apply to the digital features being rendered
3.11. syntactic interoperability
the aspect of interoperability that assures that there is a technical connection, i.e. that the data can be transferred between systems
4. Conventions
4.1. Abbreviated terms
-
API Application Programming Interface
-
CRS Coordinate Reference System
-
CSW Catalog Services for the Web
-
DCAT Data Catalog Vocabulary
-
DCAT-AP DCAT Application Profile for Data Portals in Europe
-
DCMI Dublin Core Metadata Initiative
-
EARL Evaluation and Report Language
-
EU European Union
-
EuroVoc Multilingual Thesaurus of the European Union
-
GEMET GEneral Multilingual Environmental Thesaurus
-
GML Geography Markup Language
-
GeoDCAT-AP Geographical extension of DCAT-AP
-
IANA Internet Assigned Numbers Authority
-
INSPIRE Infrastructure for Spatial Information in the European Community
-
ISO International Organization for Standardization
-
JRC European Commission - Joint Research Centre
-
MDR Metadata Registry
-
N3 Notation 3 format
-
NAL Named Authority Lists
-
OGC Open Geospatial Consortium
-
OWL Web Ontology Language
-
RDF Resource Description Framework
-
RFC Request for Comments
-
SE Symbology Encoding
-
SLD Styled Layer Descriptor
-
SKOS Simple Knowledge Organization System
-
SPARQL SPARQL Protocol and RDF Query Language
-
SVG Scalable Vector Graphics
-
TTL Turtle Format
-
URI Uniform Resource Identifier
-
URL Uniform Resource Locator
-
URN Uniform Resource Name
-
W3C World Wide Web Consortium
-
WG Working Group
-
WKT Well Known Text
-
XML eXtensible Markup Language
-
XSLT eXtensible Stylesheet Language Transformations
5. Overview
This engineering report includes the following major sections:
-
Status Quo & New Requirements Statement: This section describes current standards and requirements for developing Semantic Registry, Semantic Mediation and Semantic Portrayal Services.
-
Solutions: This section describes the solution architectures considered by the testbed, as well as the solution architecture implemented by the testbed. We organized the reporting of the activities of this thread around the services implemented.
-
Appendix sections: These sections include semantic models and REST API of each service.
6. Status Quo & New Requirements Statement
6.1. Status Quo
6.1.1. Semantic Registry Service
Current Catalog solutions are highly dependent upon the metadata model employed for the service and data descriptions. Many of today’s service instances and data holdings are based on an ISO 19115 metadata model using XML as the exchange format. These solutions are very document-centric because of the XML Document Object Model (DOM) used to populate these catalogs. This is a challenge when information needs to be integrated and linked together, as doing so often requires complex protocols (XLink, CSW GetRecords operations) to work around the lack of a globally unique identifier system in XML documents. The result is often significant overhead when querying across multiple documents, and difficulty in reusing and linking information.
Linked Data provides a solution because information is modeled as a directed labeled graph. The nodes of the graph (called resources) and the edges (called properties) are identified with globally unique Uniform Resource Identifiers (URIs), each assigned a unique meaning (defined in ontologies). While best practices call for resource URIs in Linked Data to be resolvable, in practice this is often not the case. Linked Data from different namespaces is usually aggregated in catalogs to facilitate search and discovery and to give easy access to these resource descriptions.
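The benefit of globally unique identifiers can be shown with a small sketch in which each statement is a (subject, property, object) triple. The dataset URIs and the two sources are made up for this example; the property URIs are those of Dublin Core Terms and DCAT. Because both sources use the same URI for the dataset, combining them is a plain set union with no document-level linking protocol.

```python
# Property URIs from Dublin Core Terms and DCAT.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"

# Hypothetical dataset URI described by two independent sources.
ROADS = "https://data.example.org/dataset/roads"

source_a = {(ROADS, DCT_TITLE, "Road network")}
source_b = {(ROADS, DCAT_DISTRIBUTION, "https://data.example.org/dataset/roads.gml")}

# Merging graphs is a set union: shared URIs make the statements line up.
merged = source_a | source_b


def describe(graph, subject):
    """Return the sorted (property, object) pairs stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for prop, value in describe(merged, ROADS):
    print(prop, "->", value)
```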
The W3C released the DCAT recommendation in early 2014. “DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (W3C). Thus, DCAT defines a standard way to publish machine-readable metadata about a dataset. It does not make any assumptions about the format of the datasets described in a catalog.
Since then, a number of DCAT profiles have been created and adopted, such as DCAT-AP, GeoDCAT-AP, ADMS and Project Open Data (POD), mostly focused on dataset description. There is no well-established profile that either generalizes or specializes DCAT to describe services, schemas, schema mappings, vocabularies and portrayal information. The Linked Data query language SPARQL and its geospatial extension GeoSPARQL are now well-established standards for querying Linked Data representations that include geospatial information. The SPARQL Protocol provides a partial answer for accessing DCAT-related data, but it is not sufficient, and it is sometimes too complex, for managing and accessing catalog information through a REST API. CKAN, open-source software powering many Open Data portals, has recently been given a plugin to export data in DCAT format. As more and more data is published in DCAT format in the community, there is a need for a simple and standardized REST API to manage and access Linked Data describing the assets managed by catalogs and registries.
Furthermore, many clients access web services from a web browser using JSON-based protocols implemented in JavaScript. Experience has shown that RDF formats such as RDF/XML, N3 and Turtle are not easily exploitable by these clients. The W3C standard JSON-LD is designed to address this gap. It makes it possible to describe Linked Data in JSON, and to interpret JSON as Linked Data, by leveraging the JSON-LD context.
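The role of the context can be sketched as follows. This is a deliberately simplified stand-in for JSON-LD term expansion, not the full expansion algorithm that a conforming JSON-LD processor implements: it only replaces top-level keys by the IRIs a flat context assigns to them. The context, the sample document and the function name are assumptions made for illustration.

```python
import json

# Hypothetical flat context: short JSON keys mapped to vocabulary IRIs.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
}


def expand_terms(document, context):
    """Replace keys defined in the context by their IRIs.

    Simplification: keys that the context does not define (including
    JSON-LD keywords such as "@id") are copied unchanged, and values are
    not rewritten. A real JSON-LD processor does considerably more.
    """
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue  # the context itself is not part of the data
        expanded[context.get(key, key)] = value
    return expanded


plain_json = json.loads("""
{
  "@id": "http://example.org/dataset/roads",
  "title": "Road network",
  "keyword": ["roads", "transport"]
}
""")

expanded = expand_terms(plain_json, CONTEXT)
print(json.dumps(expanded, indent=2))
```

A browser-based client can keep working with the short keys, while a Linked Data consumer applies the context to recover the globally identified properties.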
The use of REST APIs in the web community has been widely successful because it aligns well with web principles and enables rapid integration when building web-based applications. A number of efforts related to Linked Data and REST APIs have been conducted, such as Linked Data Platform 1.0 (LDP). Recent industry trends show the emergence of hypermedia formats (HAL, Hydra, Collection+JSON, Siren) that add hypermedia controls to REST APIs. This introduces a level of decoupling between the server and the client ecosystem that could enable APIs to evolve in the future without breaking that ecosystem. The challenge of this testbed is to find out how REST, hypermedia APIs, Linked Data representations and APIs, and JSON-LD can be combined to meet the requirements of the service.
6.1.2. Semantic Mediation Service
During OGC Testbed 11, Image Matters LLC developed the first iteration of the Semantic Mediation Service, demonstrating the transformation of the Homeland Security Working Group (HSWG) Incident Ontology to the Canadian Emergency Management Symbology (EMS) Ontology using a rule engine. The engine was based on the Semantic Mediation Ontology and the SPARQL Extension ontology, which express the rules and functions of transformations between two semantic models (expressed as ontologies). The transformation from a source ontology to a target ontology is called an Alignment in the Semantic Mediation Ontology.
While semantic transformation in Testbed 11 was demonstrated on Linked Data representations, a large number of messages and documents in the current OGC standards are based on XML representations. Their structures and syntaxes are defined with XML Schemas (GML, ISO 19139, CSGDM, NMIS). These messages often need to be converted to another XML representation based on a different XML schema, typically using XSL Transformations but also scripts. There is a need to search and discover schema definitions and existing schema mappings, to find optimal transformation paths between two schemas, and to perform these transformations on demand. The current OGC standards do not provide a standard profile to represent schemas and schema mappings in existing OGC Catalog Services. Similarly, there is no ontology or DCAT profile capable of semantically describing schemas and schema mappings.
There is no existing REST service that enables the search and discovery of schemas and schema mappings, performs validation, calculates optimal transformation paths, and transforms messages from one schema to another.
6.1.3. Semantic Portrayal Service
The current OGC standards related to portrayal (Styled Layer Descriptor and Symbology Encoding) are based on XML Schema technologies. Style definitions, rules, symbolizers and graphics are defined in XML documents and do not provide globally reusable identifiers that would allow portrayal information to be reused and linked so that a "web of portrayal information" can emerge. ISO 19117, the standard that describes portrayal information, defines an abstract model that is mostly designed to be implemented in code and is less suitable as a descriptive model.
During Testbed 11, an initial set of portrayal ontologies was defined to semantically describe styles, portrayal rules, point-based symbols and graphics. The information was made accessible through a specialized REST API so that a WPS could use it to produce SLD documents. The ontologies were limited to point-based graphic symbols (based on PNG and TrueType fonts). They could not model line, text and area-based symbols or more advanced graphic objects and properties.
A number of attempts were made in previous testbeds (OWS-8) to represent portrayal information in an OGC catalog, but no standard profile has been defined so far and none of them used a semantic-based approach.
There is a need to manage portrayal information in a catalog so that it can be related to other information such as feature types, layers and taxonomies. This information can be searched and used to perform map portrayal using symbologies from different communities. The initial implementation of the Semantic Portrayal Service in OGC Testbed 11 did not provide a rendering endpoint for the symbology.
6.2. Requirements Statement
6.2.1. Semantic Registry Service
The following were the four initial general RFP requirements for the implementation of DCAT as a REST service.
-
It shall be evaluated how DCAT can describe the same service and data sets in RDF as the other catalog services do using XML Schema instance documents compliant to ISO 19115.
-
Demonstrate what role DCAT can play as a heterogeneous catalog integration mechanism and as a possible simplification of the setup and use of catalogs.
-
The DCAT REST implementation shall serve as a Semantic Portrayal Catalog. The Semantic Portrayal Catalog uses an ontology model for managing styles and provides interfaces to access, create, read, update, and delete styles.
-
The DCAT as a REST service shall interface with the Schema Registry. The Schema Registry enables the discovery of XML Schemas, transformation logic, and ontologies. These items shall be served by the DCAT as a service implementation.
From these initial high-level requirements, we derived and added the following ones:
-
Definition of a Common Core Model to represent datasets, services, portrayal information, schemas and schema mappings.
-
Demonstrate usage of Linked Data to link heterogeneous Registry Objects
-
Definition of REST API
-
Accommodate web clients producing and consuming JSON.
-
Ability to evolve API by favoring decoupling between server and client
-
Manage multiple registers (Dataset, Schema, Portrayal Registers)
-
Harvesting API from multiple catalogs (CSW ebRIM, CSW ISO and more)
6.2.2. Semantic Mediation Service
The following items describe the requirements for the Semantic Mediation Service:
-
SRIM Profile for schema and schema mapping
-
Semantic Registry as a service shall interface with the Schema Registry, which enables the discovery of XML Schemas, transformation logic, and ontologies.
-
Support of XML Schema and XSL Transformation
-
Harvesting of Schema and Schema mapping from CSW ebRIM
-
Representation of schema and schema mapping using Linked Data representation
-
Definition of REST API
-
Validation of Document against Schema
-
Transformation of a document from Schema A to Schema B.
-
Transformation chaining
6.2.3. Semantic Portrayal Service
The following items describe the requirements for the Semantic Portrayal Service:
-
Refinement of the Portrayal ontologies and introduction of a Graphic ontology to support line, area, text and composite symbols.
-
Ability to convert SLD to a semantic representation
-
The Semantic Registry implementation shall serve as a Semantic Portrayal Catalog. The Semantic Portrayal Catalog uses an ontology model for managing styles and provides interfaces to access, create, read, update, and delete styles.
-
Reusability and linking to existing definition (rules, styles, graphic, symbolizers)
-
REST API for Rendering of Symbols
-
REST API for Rendering of Data using a given style
-
Ability to integrate REST API with Web-based client using JSON.
-
Ability to use portrayal on multiple data representation (Linked Data, GML, JSON).
7. Solutions
This section summarizes the solutions that have been envisioned at the beginning of the testbed, experimented with during the testbed, and that have either been discarded, or implemented, or the decision has been deferred to future activities.
7.1. Targeted Solutions
This first section addresses all solutions that have been discussed during the testbed. They include those possible solutions that have been discarded during the testbed.
7.1.1. Semantic Registry Service Targeted Solutions
The initial requirement of the testbed was to implement a DCAT REST service that enables access to DCAT dataset information. However, it quickly emerged that integrating ISO 19139, NMIS, service, schema, schema mapping and portrayal information in the service was not possible using DCAT alone, as DCAT is mostly focused on dataset description. More generalization was needed to accommodate different types of information objects in the future (for example Map, Layer, Vocabulary, Sensor).
The closest standard that provides an extensible framework to represent different business objects is the Electronic Business eXtensible Markup Language (ebXML) Registry Information Model (ebRIM). The ebXML registry describes objects that reside in a repository for storage and safekeeping. The information model does not deal with the actual content of the repository. All elements of the information model represent metadata (data types and data relationships) about the content stored in the repository. Such information is used to facilitate ebXML-based business-to-business partnerships and transactions. The registry information model provides a high-level schema for the ebXML registry. The ebRIM model has many constructs that overlap with those defined in RDF and OWL. However, we found that the definitions of the classes and properties in ebRIM were often not well aligned with well-established ontologies (with the exception of Dublin Core Terms) or with the best practices of the Linked Data community. ebRIM is verbose when describing simple graph structures, which makes it very hard to build queries that naturally match the graph structure. The Linked Data model and the standard SPARQL query language provide a more modern, decentralized approach that lowers the barrier to integration, flattens the learning curve for manipulating graph-oriented data structures, and seems a better match for favoring the reuse and linking of information.
After analyzing the ebRIM model and the ISO 19135 standard, we decided to create a superset of DCAT that supports different types of registry items. We borrowed the notion of registers and items from both standards. The proposed solution is a new service called the Semantic Registry Service, which manages items that are described semantically and can perform semantic enrichment to better enable search and discovery of information.
Semantic Registry Information Model
We conducted a comparative analysis between different standards related to ISO 19115 profiles and DCAT-related standards (DCAT, DCAT-AP, ADMS, GeoDCAT-AP, Project Open Data). The goal of this analysis was to evaluate how well DCAT can describe the same registry objects (services, datasets, schema, schema mappings and portrayal information) in RDF as the other catalogue services do using XML Schema instance documents. The crosswalk was informed by the work done by the European Commission on developing a geospatial profile for DCAT (alias GeoDCAT). The metadata model provided by DCAT includes classes and attributes for identifying and describing catalogues, datasets, catalogue records, publishing agents and distribution. The metadata model does not include any classes or attributes for identifying or describing services. The absence of classes and attributes for service metadata, portrayal information, schemas and schema mappings in DCAT meant that several ISO service metadata elements did not have equivalent fields in DCAT to map to. To fill the gaps, we determined that a superset of DCAT was needed.
The testbed designed a Semantic Registry Information Model (SRIM) by generalizing the DCAT model to include concepts from ISO 19135, the international standard for procedures for item registration in geographic information systems and core metadata needed to express a large variety of information objects and enable better search and discovery.
Semantic Registry REST API
To facilitate integration with web-based clients, we decided to implement a REST API that primarily supports the JSON-LD output format. However, we also support Linked Data formats for item and register descriptions to enable machine-to-machine integration. JSON-LD was chosen because it provides a bridge between Linked Data and JSON.
It was also decided that the Semantic Registry REST API would implement both Level 2 and Level 3 (hypermedia REST API) of the Richardson Maturity Model, so that it can accommodate existing frameworks (such as AngularJS) that consume REST APIs by constructing URLs on the client side from well-defined URL patterns. The hypermedia REST API uses the Hypertext Application Language (HAL) because it has widespread adoption in the community and demonstrates well both the use of hypermedia controls within JSON and how a REST API can evolve independently from the client ecosystem without breaking compatibility.
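The difference between the two levels can be sketched from the client side. A Level 2 client builds URLs from patterns it knows in advance, whereas a Level 3 client looks up link relations advertised in the HAL `_links` object of each response. The response below, including its paths and relation names, is hypothetical and is not taken from the testbed implementation.

```python
import json

# Hypothetical HAL+JSON response for a register resource.
REGISTER_DOC = json.loads("""
{
  "_links": {
    "self": {"href": "/registers/datasets"},
    "items": {"href": "/registers/datasets/items"},
    "search": {"href": "/registers/datasets/search{?q}", "templated": true}
  },
  "title": "Dataset Register"
}
""")


def get_link(hal_document, rel):
    """Return the first link object advertised for a relation, or None."""
    link = hal_document.get("_links", {}).get(rel)
    if isinstance(link, list):  # HAL allows an array of links per relation
        link = link[0] if link else None
    return link


def link_href(hal_document, rel):
    """Return the href for a relation without constructing any URL."""
    link = get_link(hal_document, rel)
    return None if link is None else link["href"]


def is_templated(hal_document, rel):
    """Tell whether the href is a URI template that must be expanded first."""
    link = get_link(hal_document, rel)
    return bool(link and link.get("templated", False))


print(link_href(REGISTER_DOC, "items"))
print(is_templated(REGISTER_DOC, "search"))
```

Because the client depends only on the relation names, the server remains free to change the paths behind them, which is the decoupling argued for above.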
Semantic Registry Integration with Multi-Catalogs
A number of approaches were considered to integrate multiple catalogue services using different protocols and models (CSW 2.0, 3.0 and ebRIM) with the Semantic Registry. The different approaches considered during this testbed are described in detail in the Testbed-12 Catalogue and SPARQL Engineering Report [OGC 16-062].
For practical reasons and because of the limited timeframe, we chose to implement a harvester service that converts records from different catalogs to the SRIM model and stores them in the Semantic Registry repository. We also decided to map only the elements of information that are relevant for search and discovery, and we postponed the issue of source synchronization to future testbeds. Publish/Subscribe protocols will need to be investigated in the context of the Semantic Registry in future testbeds.
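The harvesting step can be pictured with the following sketch, which reads a CSW Dublin Core record and keeps only a few discovery-oriented elements. The sample record, the choice of target property names (written here as prefixed names) and the mapping of dc:subject to dcat:keyword are assumptions for illustration; they do not reproduce the actual SRIM mapping used in the testbed.

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record as a CSW 2.0.2 catalogue might return it.
SAMPLE_RECORD = """
<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>urn:example:roads</dc:identifier>
  <dc:title>Road network</dc:title>
  <dc:subject>roads</dc:subject>
  <dc:subject>transport</dc:subject>
  <dc:format>application/gml+xml</dc:format>
</csw:Record>
"""


def harvest_record(xml_text):
    """Map the discovery-relevant elements of one record to a flat item."""
    record = ET.fromstring(xml_text.strip())

    def texts(tag):
        return [el.text.strip() for el in record.findall(tag, NS) if el.text]

    identifiers = texts("dc:identifier")
    titles = texts("dc:title")
    # Elements not listed here (dc:format in the sample) are deliberately
    # dropped, mirroring the decision to map only search-relevant fields.
    return {
        "dct:identifier": identifiers[0] if identifiers else None,
        "dct:title": titles[0] if titles else None,
        "dcat:keyword": texts("dc:subject"),
    }


item = harvest_record(SAMPLE_RECORD)
print(item)
```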
7.1.2. Semantic Mediation Service Targeted Solutions
The focus of the Semantic Mediation Service for this testbed was the transformation of XML documents defined by XML Schemas using XSLT-based transformations. One of the requirements was to define an ebRIM profile to represent schemas and schema mappings served by an ebRIM CSW implementation and to integrate it with the Semantic Registry. A review of the existing standards for representing schemas and schema mappings in a registry was conducted. Only DCAT and ADMS were found relevant, but no profile had been defined to accommodate the specificities of schemas and schema mappings, such as the source and target schemas of a mapping. A review of the ebRIM model in CSW was also performed and concluded that an extension was needed to represent schema and schema mapping information in the ebRIM model.
Instead of defining a separate service that manages schemas and schema mappings separately, we decided to extend the core Semantic Registry Information Model by creating a profile to represent them semantically. The benefit of this approach is to test how well the core model can be extended and how well the Semantic Registry REST API can be reused to accommodate different domain models and convenience service APIs. We decided to implement the Semantic Mediation Service as a convenience service on top of the Semantic Registry that performs validation and transformation between schemas (including finding a chain of transformations between two schemas). The Semantic Mediation Service would delegate the CRUD operations for schemas and schema mappings entirely to the Semantic Registry. We also decided to use a REST API that provides hypermedia controls with a JSON-LD representation to lower the barrier to integration with web clients, while also offering Linked Data representations (RDF/XML, Turtle) of schemas and schema mappings for machine-to-machine support.
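Finding a chain of transformations can be treated as a path search over a graph whose nodes are schemas and whose edges are schema mappings. The sketch below uses breadth-first search and therefore assumes that the best chain is the one with the fewest mappings; the report does not state which optimality criterion the implementation applied. The schema names and XSLT file names are invented.

```python
from collections import deque

# Hypothetical schema mappings: (source schema, target schema, XSLT resource).
MAPPINGS = [
    ("schemaA", "schemaB", "a-to-b.xsl"),
    ("schemaB", "schemaC", "b-to-c.xsl"),
    ("schemaA", "schemaD", "a-to-d.xsl"),
    ("schemaD", "schemaE", "d-to-e.xsl"),
    ("schemaE", "schemaC", "e-to-c.xsl"),
]


def find_transformation_chain(mappings, source, target):
    """Return the shortest list of mappings from source to target.

    Returns an empty list when source equals target and None when no
    chain exists. Mappings are treated as one-directional.
    """
    if source == target:
        return []
    outgoing = {}
    for mapping in mappings:
        outgoing.setdefault(mapping[0], []).append(mapping)

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        schema, chain = queue.popleft()
        for mapping in outgoing.get(schema, []):
            next_schema = mapping[1]
            if next_schema in visited:
                continue
            new_chain = chain + [mapping]
            if next_schema == target:
                return new_chain
            visited.add(next_schema)
            queue.append((next_schema, new_chain))
    return None


chain = find_transformation_chain(MAPPINGS, "schemaA", "schemaC")
print([xslt for _, _, xslt in chain])
```

A document would then be transformed by applying the stylesheets of the returned chain in order. If mappings carried a cost (for example, information loss), a weighted search such as Dijkstra's algorithm could replace the breadth-first search.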
7.1.3. Semantic Portrayal Service Targeted Solutions
The focus of the Semantic Portrayal Service for this testbed was on storing and accessing portrayal information managed by the Semantic Registry to support symbology mediation and rendering. Instead of defining a separate service that manages the portrayal information separately, we decided to extend the core Semantic Registry Information Model by creating a profile to represent this information semantically. The benefit of this approach is to test how well the core model can be extended and how well the Semantic Registry REST API can be reused to accommodate different domain models. We decided to implement the Semantic Portrayal Service as a convenience service on top of the Semantic Registry that performs portrayal information search and rendering. The Semantic Portrayal Service would delegate the create, update and delete operations for portrayal information entirely to the Semantic Registry. We also decided to use a REST API that provides hypermedia controls with a JSON-LD representation to lower the barrier to integration with web clients, while also offering Linked Data representations (RDF/XML, Turtle) of portrayal information for machine-to-machine support.
7.2. Recommendations
This second section summarizes the recommended solution(s) that will be further described in following clauses. It briefly explains the solution(s) and ideally links to relevant sections.
7.2.1. Semantic Registry Service Recommendations
Semantic Registry Information Models
To provide an extensible framework for representing information in a registry, we defined a superset of DCAT called the Semantic Registry Information Model (SRIM), which defines the set of core classes and properties that can be used to represent any resource of a domain of interest. The core ontology is extended by defining application profiles. The following profiles were developed during the testbed:
-
Dataset/Service Profile: Used to describe datasets and services (such as the ones defined in NMIS and ISO 19139). This profile is heavily based on DCAT and GeoDCAT-AP.
-
Schema Application Profile: Used to describe Schema and Schema Mapping (which extends DCAT Dataset) and used by the Semantic Mediation Service
-
Portrayal Application Profile: Used to describe Portrayal information such as Styles, Symbols, Portrayal Rules. This profile was used by the Semantic Portrayal Service.
Semantic Registry REST API
To facilitate the integration of clients with the Semantic Registry Service, we recommended the use of a REST API supporting the encoding of the SRIM profiles in Linked Data formats using RDF/XML, Turtle, N-Triples and JSON-LD. We also recommended accommodating Level 2 (resources with HTTP verbs) and Level 3 (hypermedia-driven) of the Richardson Maturity Model. For the Level 3 hypermedia-driven API, we chose the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community.
Integration of Multi-Catalog REST API
For this testbed, the Semantic Registry service harvested metadata from different OGC Web Catalogs and converted the information to SRIM profiles encoding, but we also allowed for cascading requests to other GeoSPARQL Services that implement the profiles.
7.2.2. Semantic Mediation Service Recommendations
To facilitate the integration of clients with the Semantic Mediation Service, we recommended the use of a REST API supporting the encoding of the SRIM Schema Application Profile in Linked Data formats using RDF/XML, Turtle, N-Triples and JSON-LD. We also recommended accommodating Level 2 (resources with HTTP verbs) and Level 3 (hypermedia-driven) of the Richardson Maturity Model. For the Level 3 hypermedia-driven API, we chose the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community. To favor the reuse of functionality, all the CRUD operations on schemas and schema mappings were implemented by the Semantic Registry. The Semantic Mediation Service was built as a convenience service on top of the Semantic Registry to provide search capabilities, validation, transformation path calculation, and the actual transformation of documents based on the path calculated from the schema mappings managed by the registry.
7.2.3. Semantic Portrayal Service Recommendations
To align better with current rendering engine implementations and the current descriptive standards for portrayal (SE, SLD), we decided to align the portrayal ontology more closely with OGC Symbology Encoding (SE) and SVG. We developed the Graphics and Symbolizer ontologies, which are closely aligned with these standards but provide mechanisms to support future extensions for more complex styling scenarios.
The Semantic Portrayal REST Service delegates the CRUD operations on portrayal information to the Semantic Registry, which implements the SRIM Portrayal Profile. The Semantic Portrayal Service implements a REST API at Level 2 and Level 3 of the Richardson Maturity Model. We implemented the Level 3 hypermedia-driven API using the Hypertext Application Language (HAL+JSON), which is gaining in popularity in the REST community. The Semantic Portrayal Service should be a convenience service built on top of the Semantic Registry containing portrayal information, providing search capabilities and rendering endpoints for rendering symbol glyphs for legends and for map rendering of geospatial data.
8. Semantic Registry Service
8.1. Overview
Semantic metadata plays a central role in facilitating the discovery and the assessment of geospatial assets (such as datasets, services, portrayal information, schemas, maps, layers), and the integration of these assets in a specific mission. There are a number of standards, formats and APIs that provide the metadata for these assets, but in order to perform efficient search, we need to convert this information into a unified machine readable semantic representation. It is this conversion that enables the discovery of relevant resources that satisfy the mission of the end user. As we increase our understanding of the kind of metadata information needed to perform better and smarter search, we need a model that accommodates extensions over time without breaking the proposed architecture.
During this effort, a number of metadata standards were reviewed (including W3C standard DCAT, DCAT-AP, GeoDCAT-AP, ADMS, Project Open Data 1.1, Dublin Core, ISO 19115, ISO 19119) to identify the common and relevant metadata information needed for search and discovery and to identify any additional metadata information needed to describe dataset, service, portrayal, schema and schema mapping information. It quickly emerged that the DCAT standard and its different application profiles were dataset-centric and insufficient to describe the metadata for portrayal information, schemas, and services. The goal of this effort was not to define a new standard, but to leverage the existing standards to define an application profile of DCAT, with additional properties and fields, that could accommodate the schema, schema mapping, service and portrayal information needed for enhanced search and discovery while still preserving backward compatibility with existing standards.
The effort resulted in a new ontology called Semantic Registry Information Model (SRIM). SRIM is defined as a superset of DCAT and its existing application profiles (DCAT-AP, GeoDCAT-AP, ADMS). It introduces a superclass of dcat:Dataset called srim:Item and the notion of a Register (as defined in ISO 19135). The ontology draws from multiple well-established standards such as W3C DCAT, Project Open Data 1.1, DCAT-AP, GeoDCAT-AP, VCard, Dublin Core, PAV, and ISO 19115, but also addresses some gaps in the standards, such as the description of web services (for example OGC WMS, WFS), richer descriptions of geospatial data, and additional metadata to model schema, schema mapping, and portrayal information, to enable better semantic search of resources that fit with a user’s mission. SRIM enables the integration of different metadata providers (CSW, CKAN, POD WAF, WMS, and WCS) by providing a common core vocabulary to describe resources (data, services, vocabularies, map, layers, schemas, etc.) and by accommodating the specificities of each resource by leveraging the built-in extensibility mechanism of OWL. The integration is done through the use of a semantic bridge that maps the syntactic metadata (JSON, XML based) to the semantic representation based on the SRIM model. The SRIM Core model has been extended by introducing SRIM application profiles to represent other kinds of geospatial assets such as schemas and portrayal information (see sections on Semantic Mediation and Semantic Portrayal Service).
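The practical effect of making srim:Item a superclass of dcat:Dataset can be sketched with a small subclass lookup: a search for items also returns datasets and anything that specializes them. Only the dcat:Dataset to srim:Item relationship is taken from the text above; the ex: class names, the resource URIs and the restriction to a single superclass per class are simplifying assumptions.

```python
# Subclass axioms as subclass -> superclass. Only the first entry is stated
# in the report; the ex: classes are hypothetical stand-ins for profile
# classes such as a schema or a style.
SUBCLASS_OF = {
    "dcat:Dataset": "srim:Item",
    "ex:Schema": "dcat:Dataset",
    "ex:Style": "srim:Item",
}

# Hypothetical registry content: resource URI -> asserted class.
RESOURCES = {
    "http://example.org/item/roads": "dcat:Dataset",
    "http://example.org/item/gml-schema": "ex:Schema",
    "http://example.org/item/road-style": "ex:Style",
}


def is_a(cls, ancestor):
    """Walk up the (acyclic) subclass chain looking for the ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def find_instances(resources, cls):
    """Return the URIs whose asserted class is cls or one of its subclasses."""
    return sorted(uri for uri, asserted in resources.items() if is_a(asserted, cls))


print(find_instances(RESOURCES, "srim:Item"))
print(find_instances(RESOURCES, "dcat:Dataset"))
```

In an RDF store the same result would normally come from RDFS or OWL subclass reasoning rather than from hand-written traversal.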
The purpose of the Semantic Registry Service (initially referred to as DCAT REST API) is to define a common interchangeable metadata format for geospatial portals and a REST protocol to access this information. In order to achieve this, SRIM defines a set of classes and properties, which are grouped into mandatory, recommended and optional. Such classes and properties aid interoperability by corresponding to information about register items and registers that is shared by many data portals. Although the Semantic Registry is designed to be independent from its actual implementation, RDF [RDF] and Linked Data [LDBOOK] are the reference technologies used for the modeling, in order to preserve the semantic fidelity of the conceptual model. However, we wish to facilitate wide adoption, so we are providing an encoding based on JSON, which can be converted transparently back to a semantic model using a JSON-LD context. The JSON is closely aligned with the Project Open Data metadata schema 1.1 standard, but some extensions and modifications were made when needed to accommodate the Semantic Registry’s requirements. To favor decoupling of the server and the client ecosystem, the Semantic Registry implementation uses a hypermedia-driven REST API based on the Hypertext Application Language (HAL) with JSON-LD as the payload. Every endpoint of the REST API also provides a Linked Data representation of the resources based on the SRIM ontology.
The following sections describe the different standards that were reviewed, the SRIM model and the implementation details on both the server and the client sides. We also explain the rationale behind some of the design decisions when applicable.
8.2. Review of existing standards
8.2.1. DCAT
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable their applications to easily consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
8.2.2. DCAT-AP
The DCAT Application profile for data portals in Europe (DCAT-AP) is a specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Its basic use case is to enable cross-data portal search for data sets and to allow public sector data to be easily searchable across borders and sectors. This can be achieved by the exchange of descriptions of datasets among data portals.
In February 2015, the ISA² programme of the European Commission started an activity to revise the DCAT-AP, based on experience gained since its development in 2013. The outcome of this effort was the publication of DCAT-AP 1.1.
The European Data Portal is implementing the DCAT-AP as the common vocabulary for harmonizing descriptions of over 258,000 datasets harvested from 67 data portals from 34 countries. The DCAT-AP is used in the Open Data Support service initiated by the European Commission with the purpose of realizing the vision of European data portals.
8.2.3. GeoDCAT-AP
GeoDCAT-AP is defined as an extension of DCAT-AP for describing geospatial datasets, dataset series, and services. It provides an RDF syntax binding for the union of metadata elements defined in the core profile of ISO 19115:2003 and those defined in the framework of the INSPIRE Directive. Its basic use case is to make spatial datasets, data series, and services searchable on general data portals, thereby making geospatial information better searchable across borders and sectors. This can be achieved by the exchange of descriptions of datasets among data portals.
8.2.4. Asset Description Metadata Schema (ADMS)
ADMS is a profile of DCAT that is used to describe semantic assets (or just 'Assets'). These assets are defined as highly reusable metadata (e.g. xml schemata, generic data models) and reference data (e.g. code lists, taxonomies, dictionaries, vocabularies) that are used for eGovernment system development.
The ADMS model is intended to facilitate federation and co-operation. Like DCAT, ADMS has the concepts of a repository (catalog), assets within the repository that are often conceptual in nature, and accessible realizations of those assets, known as distributions. An asset may have zero or multiple distributions. As an example, a W3C namespace document can be considered to be a Semantic Asset that is typically available in multiple distributions, one or more machine processable versions and one in HTML for human consumption. An asset without any distributions is effectively a concept with no tangible realization, such as a planned output of a working group that has not yet been drafted.
ADMS is an RDF vocabulary with an RDF schema available at its namespace http://www.w3.org/ns/adms . The original ADMS specification published by the European Commission [ADMS1] includes an XML schema that also defines all of the controlled vocabularies and cardinality constraints associated with the original document.
8.2.5. Project Open Data (POD)
Project Open Data provides the implementation guide and associated resources for the Federal Executive Order on open data and data management, M-13-13 “Managing Information as an Asset,” which includes the standardized metadata schema that all CFO Act agencies are required to use to publish their enterprise data inventories.
The Project Open Data Metadata Schema is a JSON-based implementation of the W3C DCAT vocabulary. This standard is currently implemented by multiple data catalog platforms as well as state and local governments.
-
Implemented by city, state, and county governments on Data.gov/local
POD documents are typically published in a Web Accessible Folder (WAF) and harvested by catalogs such as data.gov. The intent of POD is to lower the level of complexity needed to represent data information by providing guidelines and recommended metadata. This enables better search and discovery of datasets within the US government.
8.2.6. ISO 19115-1
The standard ISO 19115 defines the schema required for describing geographic information and services that is encoded in XML format. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services. The standard ISO 19115 is applicable to:
-
the cataloguing of all types of resources, clearinghouse activities, and the full description of datasets and services;
-
geographic services, geographic datasets, dataset series, and individual geographic features and feature properties.
ISO 19115-1 defines:
-
mandatory and conditional metadata sections, metadata entities, and metadata elements;
-
the minimum set of metadata required to serve most metadata applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data and services);
-
optional metadata elements – to allow for a more extensive standard description of resources, if required;
-
a method for extending metadata to fit specialized needs.
Though ISO 19115-1 is applicable to digital data and services, its principles can be extended to many other types of resources such as maps, charts, and textual documents as well as non-geographic data. Certain conditional metadata elements might not apply to these other forms of data.
ISO 19139 defines the XML-based implementation for ISO 19115. ISO 19115-1:2014 [ISO19115-1] has superseded ISO 19115:2003. At the date of publication of this document, the XML-based implementation of ISO 19115-1:2014 (namely, ISO 19115-3), was finalised but not yet officially released.
8.2.7. ISO 19135
This International Standard specifies the procedures to establish, maintain, and publish registers of unique, unambiguous, and permanent identifiers and meanings that are assigned to items of geographic information. To accomplish this purpose, the standard specifies the elements of information that are necessary to provide identification and meaning to the registered items and to manage the registration of these items.
8.2.8. Shapes Constraint Language (SHACL)
SHACL is an RDF vocabulary for describing RDF graph structures. These graph structures are captured as "shapes", which correspond to nodes in RDF graphs. Shapes identify predicates and their associated cardinalities and datatypes. Additional constraints can be associated with shapes using SPARQL or other languages that complement SHACL. SHACL shapes can be used to communicate the data structures associated with a process or interface, to generate or validate data, or to drive user interfaces.
Most applications that share data do so using prescribed data structures. While RDFS and OWL enable one to make logical assertions about the objects in some domain, SHACL (Shapes Constraint Language) describes data structures. Features of SHACL include:
-
An RDF vocabulary to define structural declarations of the property constraints associated with those shapes.
-
Complex constraints that can be expressed in extension languages like SPARQL.
-
The possibility to mix SHACL shapes with other semantic web data, as SHACL is based on RDF and is compatible with Linked Data principles.
-
SHACL definitions represented in RDF which can be serialized in multiple RDF formats.
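To make the idea of a shape concrete, the sketch below mimics three core SHACL constraints (`sh:minCount`, `sh:maxCount`, and `sh:datatype`) in plain Python over a list of triples. It is not a SHACL processor: the `PropertyConstraint` class, the `validate` function, and the sample data are hypothetical and only illustrate how cardinality and datatype checks apply to the properties of a focus node.

```python
from dataclasses import dataclass
from typing import Optional

DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_MODIFIED = "http://purl.org/dc/terms/modified"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass
class PropertyConstraint:
    path: str                        # predicate IRI (role of sh:path)
    min_count: int = 0               # role of sh:minCount
    max_count: Optional[int] = None  # role of sh:maxCount
    datatype: Optional[str] = None   # role of sh:datatype


def validate(node, triples, constraints):
    """Return a list of violation messages for one focus node.

    triples: iterable of (subject, predicate, (lexical_value, datatype_iri)).
    """
    violations = []
    for c in constraints:
        values = [o for s, p, o in triples if s == node and p == c.path]
        if len(values) < c.min_count:
            violations.append(f"{c.path}: fewer than {c.min_count} value(s)")
        if c.max_count is not None and len(values) > c.max_count:
            violations.append(f"{c.path}: more than {c.max_count} value(s)")
        if c.datatype is not None:
            for _, datatype in values:
                if datatype != c.datatype:
                    violations.append(f"{c.path}: expected datatype {c.datatype}")
    return violations


# A hypothetical "dataset shape": exactly-typed title required, at most one date.
dataset_shape = [
    PropertyConstraint(DCT_TITLE, min_count=1, datatype=XSD_STRING),
    PropertyConstraint(DCT_MODIFIED, max_count=1, datatype=XSD_DATE),
]

triples = [
    ("ex:d1", DCT_TITLE, ("Roads", XSD_STRING)),
    ("ex:d2", DCT_MODIFIED, ("yesterday", XSD_STRING)),
]

print(validate("ex:d1", triples, dataset_shape))
print(validate("ex:d2", triples, dataset_shape))
```

In this sample, the first node satisfies the shape, while the second lacks a title and carries a modification value with the wrong datatype.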
8.3. Semantic Registry Information Model (SRIM)
After analysis of the different standards, we decided to create a superset of the DCAT ontology that defines the set of classes and properties commonly used to represent any item in a register. The SRIM ontology borrows extensively from existing standards such as DCAT, GeoDCAT-AP, Dublin Core Terms, ADMS, PROV-O, PAV ontologies, and ISO 19135. To encourage reuse of the ontology, we decided not to enforce any restrictions in the ontology itself, but only to define the list of properties and classes and relate them through documentation (see Appendix A). We classified the set of properties for each class as mandatory, recommended, or optional.
To address different domains containing different types of items, the Core SRIM ontology is extended through application profiles. An Application Profile is defined as a set of classes and properties that extends the classes and properties defined in the Core Ontology. During this testbed, we defined three different profiles:
-
Dataset/Service Profile: Used to describe Datasets and Services (such as those defined in NMIS and ISO 19139). This profile is heavily based on DCAT and GeoDCAT-AP.
-
Schema Application Profile: Used to describe Schemas and Schema Mappings (which extend DCAT Dataset). This profile was used by the Semantic Mediation Service.
-
Portrayal Application Profile: Used to describe Portrayal information such as Styles, Symbols, and Portrayal Rules. This profile was used by the Semantic Portrayal Service.
We anticipate that in the future, more profiles will be defined for Maps, Layers, Coverage, Imagery, Feature Catalog, and Vocabularies.
8.4. Implementations
8.4.1. Semantic Mapping to SRIM
One of the primary functions of the Semantic Registry is to support search and discovery on a large variety of items using a unified API. The Semantic Registry was tested with different item types, including Datasets, Services, Schemas, and Schema Mappings, as well as Portrayal information such as Symbols, Portrayal Rules, and Feature Type Styles. To integrate the different encoding standards for this information, including ISO 19139, NMIS, the ebRIM Schema Profile, and DCAT, a number of semantic mappers were implemented. These semantic mappers link each standard to the appropriate SRIM profiles and are used by harvesters to extract information from various sources.
We found that a Linked Data encoding (DCAT) of information is easier to integrate than an XML encoding, because the latter requires code that explicitly defines the mapping between the syntactic and semantic encodings. An XML encoding based on XML Schema also tends to be more unforgiving when validating data. Another advantage of the Linked Data approach is that it favors reuse of information, which can be created and managed in a decentralized way using a common encoding framework.
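The kind of explicit mapping code that an XML encoding calls for can be sketched as follows. The Python fragment pulls the file identifier and title out of a small ISO 19139 fragment and emits DCAT-style triples. The sample record, the `http://example.org/id/` URI base, and the choice of target properties are assumptions for illustration; the semantic mappers built for the testbed cover far more elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily trimmed ISO 19139 record.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>abc-123</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Sample Roads</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

TITLE_PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)


def map_record(xml_text, base="http://example.org/id/"):
    """Map one ISO 19139 record to a list of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    ident = root.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS)
    if ident is None:
        raise ValueError("record has no fileIdentifier")
    subject = base + ident  # assumed URI minting policy
    triples = [
        (subject, "rdf:type", "dcat:Dataset"),
        (subject, "dct:identifier", ident),
    ]
    title = root.findtext(TITLE_PATH, namespaces=NS)
    if title is not None:
        triples.append((subject, "dct:title", title))
    return triples


for triple in map_record(SAMPLE):
    print(triple)
```

Every element of interest needs its own path and target property, which is why the XML route involves noticeably more mapping code than ingesting data that is already expressed in RDF.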
One of the biggest challenges when importing data into the system is the validation of the data. The RDF model provides a powerful framework to express any property of a resource by using vocabularies from different ontologies, and it easily accommodates partial or incomplete information. However, this flexibility causes difficulties when attempting to validate the data. Due to the limited time for this testbed, we decided to postpone the exploration of SHACL as a way to address this issue until the next testbed. SHACL can provide a powerful way to validate data and to define the shape of the graphs to be processed by the service.
8.4.2. ISO 19139 Mapping Issues
This section summarizes the list of issues found when mapping ISO 19139 to a semantic representation with data coming from a variety of CSW sources (including data.gov and Geoplatform.gov). Some of these issues come from malformed metadata and ambiguities in the ISO 19115 standards, while others come from a lack of policies from agencies that publish metadata. These issues impede interoperability and integration of information in addition to search and discovery. The usage of Linked Data instead of XML encoding will address many of these problems, but not the ones related to policies.
Identification of Resources
Issue | Identification of Resources |
---|---|
|
There is no consistent way of defining the identifiers for different resources (e.g. organizations, datasets, services, controlled vocabularies, etc.) |
|
Inability to link information and allow reusability. Resource information (concepts) are duplicated several times in different documents with variations of the same information. Updating this information is difficult to perform across all repositories. Need authoritative unambiguous references. |
|
|
|
A new policy to define URI Sets for US Government assets would provide a consistent means to make these trusted assets available for efficient, widespread discovery and re-use. This will encourage reuse and limit duplication. |
Resolvable URI
Issue | Resolvable URI |
---|---|
|
Identifiers used in the 19139 document are often internal (e.g., a primary key in a store implementation) and not accessible as unambiguous web resources. |
|
The lack of consistent machine-resolvable URIs impedes interoperability and limits automation (concepts must be grounded with unambiguous meaning for services to interpret and respond). Grounded URIs will also help humans better understand important concepts. |
|
|
|
Enables the exploration of a “unified knowledge graph” that links and describes resources. Allows users to search, discover and navigate through “Concept Space”, whereupon each concept is resolvable to a grounded (unambiguous) resource for consistent human and machine understanding. |
Multilingual Support
Issue | Multilingual Support |
---|---|
|
The current standard does not enable the support of translations of human readable text in multiple languages. Language is handled at document level, not field level. |
|
Users who do not understand the language of the information producer will not be able to discover relevant data for their tasks |
|
Opt for an implementation that natively provides multilingual support (such as Linked data) or provide guidelines for how to handle multiple languages (e.g., through JSON protocols). |
External Resource Descriptions
Issue | External Resource Descriptions |
---|---|
|
|
|
Modeling an external resource as a bare URL value inhibits the capture of additional information that would help clarify the role and meaning of the external (auxiliary) resource in the context of a given resource. |
|
|
Invalid XLinks
Issue | Invalid XLinks |
---|---|
|
In some ISO 19139 documents, xlink:href values are not valid URLs (for example, #FS Lower 48). |
|
ISO 19139 documents with invalid xlink references do not validate with an XML schema validator. |
|
Comply with the standard XML Schema for xlink:href by using valid URLs. |
|
Correct validation of ISO 19139 |
Controlled Vocabulary Management
Issue | Controlled Vocabulary Management |
---|---|
|
|
|
|
|
|
|
|
Keyword Types
Issue | Keyword Types |
---|---|
|
The list of keyword types in ISO 19115 is limited to a few categories (discipline, strata, topic, place, temporal). |
|
Inability to accommodate new types of concepts such as audience, function, subject, topic, etc. |
|
|
|
|
Keyword Labeling Inconsistencies
Issue | Keyword Labeling Inconsistencies |
---|---|
|
In some instances, multiple labels are encoded as one keyword (e.g., 'list of all US states' is one keyword). |
|
While this is fine for doing lexical-based text search, it is not sufficient when supporting semantic search, where each concept must be grounded to a unique meaning. |
|
|
|
|
Authority for Controlled Vocabularies
Issue | Authority for Controlled Vocabularies |
---|---|
|
The ISO 19139 uses the list of topic categories in the standard ISO 19115. There is a SKOS encoding available in the European Registry located at: http://inspire.ec.europa.eu/metadata-codelist/TopicCategory. The mapping to Semantic Registry uses this URI to reference dcat:theme. |
|
If no authority is responsible for the management of controlled vocabularies, the vocabularies will not be reused and risk being duplicated. |
|
|
|
The taxonomy is maintained by the authority that defines the standard and thus will favor reusability of the vocabularies among information producers. |
Place Name Consistency
Issue | Place Name Consistency |
---|---|
|
ISO 19139 uses keywords to define place names that reference a thesaurus that is not accessible online. There is no consistent way to define place names and resolve ambiguities. |
|
The place name can be ambiguous as there are many locations with the same name (e.g. Leesburg, FL versus Leesburg, VA) |
|
|
Contact Point
Issue | Contact Point |
---|---|
|
Contact Point in ISO 19139 is not systematically encoded in the document. The individual’s name is required in POD but is not always present in the ISO document. A generic email reference for the contact role is sometimes used. |
|
When a problem is present in the metadata, a contact point with an email should be available for expedient resolution of issues. |
|
|
|
The use of a generic role-based email for the contact will smoothly handle staff changes. |
Responsible Party without Role
Issue | Responsible Party without Role |
---|---|
|
Some responsible parties are published without a role, while the ISO standard indicates that the role is mandatory |
|
Without a role, we are unable to understand how each party relates to a data source. |
|
Enforce the role in ISO 19139 for each responsible party. |
|
We are able to discern how each party relates to a metadata item unambiguously. |
Responsible Party Role Encoding
Issue | Responsible Party Role Encoding |
---|---|
|
ISO 19139 outlines a well-defined taxonomy for Responsible Party roles (e.g., Publisher, etc). ISO 19139 refers to a GML document, through a URL and an Xpointer, which contains roles and many other concepts (instead of a unique concept) |
|
|
|
Encode the role taxonomy in SKOS (machine-readable) and use resolvable URIs for roles. |
|
Both machine and human can understand the unambiguous meaning of the concept. |
Organization Hierarchy
Issue | Organization Hierarchy |
---|---|
|
ISO 19139 does not provide support for the subOrganizationOf property (recommended by Project Open Data). |
|
|
|
|
|
When a resource search is performed for a given organization, the hierarchy can also be leveraged to search within suborganizations (using transitive inferencing). |
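The transitive inferencing mentioned above can be sketched in a few lines of Python. Given direct sub-organization links, the function collects every organization that falls under a given parent, so that a search scoped to the parent can include its descendants. The organization identifiers are hypothetical, and a production registry would more likely delegate this step to the RDF store.

```python
from collections import defaultdict, deque

# (child, parent) pairs, read as "child subOrganizationOf parent". Hypothetical data.
SUB_ORGANIZATION_OF = [
    ("org:survey-office", "org:science-bureau"),
    ("org:science-bureau", "org:department"),
    ("org:parks-service", "org:department"),
]


def descendants(parent, links):
    """Return all direct and indirect sub-organizations of `parent`."""
    children = defaultdict(list)
    for child, par in links:
        children[par].append(child)

    found = set()
    queue = deque([parent])
    while queue:  # breadth-first walk down the hierarchy
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


print(sorted(descendants("org:department", SUB_ORGANIZATION_OF)))
```

A search for resources published by `org:department` could then be expanded to the returned set of organizations.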
Inconsistent Usage of OnlineResource in ContactInfo
Issue | Inconsistent Usage of OnlineResource in ContactInfo |
---|---|
|
In some documents, the link to services and distributions (zip files) is put in a responsible party’s contact information (onlineResource) instead of the ServiceIdentification property or the TransferOptions in a Distribution |
|
The ContactInfo’s onlineResource property is being misused semantically. |
|
|
|
Consistent description of services and distributions in ISO 19139 will help to make a clear distinction between services and distributed content that can be downloaded. |
Service API Standards
Issue | Service API Standards |
---|---|
|
There isn’t a consistent manner of referring to the applicable services API standard, e.g., WMS, WFS, ArcREST |
|
There is no systematic and unambiguous way to identify web service standards. The version of a standard is often not clear (OGC:WMS). Smart software, assisted by people, needs to resolve the specification confusion. |
|
|
|
Proper classification of service standards, disambiguation, and support of autonomous operations |
Service API Specification
Issue | Service API Specification |
---|---|
|
Absence of industry best practices or standards to refer to machine-processable API specifications (RAML, ALPS, Swagger, WSDL, etc.). |
|
|
|
Semantic Registry should produce a machine-processable API Document. |
|
Integration with the service API can be automated. |
Service Online Resource URL
Issue | Service Online Resource URL |
---|---|
|
The access URL for a service is not consistently encoded. For example in a WMS, some URIs point to a GetCapabilities endpoint, while others point to the base URL of the service |
|
There is no systematic way to access the service endpoint for a given service. Software agents have to analyze the URL to get a normalized form |
|
|
|
Systematic access to a service endpoint. |
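One way a software agent might normalize such URLs is to strip the OGC operation parameters and keep only the base endpoint. The Python sketch below does this with the standard library. The set of parameters to drop and the example URLs are assumptions, and real services may need additional rules (for instance, vendor parameters that must be preserved).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that describe an operation rather than the endpoint itself.
OPERATION_PARAMS = {"service", "request", "version", "acceptversions"}


def normalize_endpoint(url):
    """Drop operation parameters and any fragment, keeping other query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in OPERATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


capabilities_url = "http://example.org/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0"
base_url = "http://example.org/wms"

# Both forms reduce to the same endpoint.
print(normalize_endpoint(capabilities_url))
print(normalize_endpoint(base_url))
```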
Insufficient Service Metadata
Issue | Insufficient Service Metadata |
---|---|
|
The service description associated with a Dataset has minimal metadata, usually limited to an accessURL and format. |
|
|
|
|
|
Enable the discovery of services and invocation of services in an automated way. |
Format and OnlineResource Parity
Issue | Format and OnlineResource Parity |
---|---|
|
The ISO standard decouples Format and OnlineResource. One format can have more than one online resource URL. |
|
Having multiple URLs for a format is ambiguous and not friendly to machines or users. |
|
Enforce parity of OnlineResource with format. |
|
Proper pairing of format with online resource removes ambiguity to both machines and users. |
Download Format Versus Service
Issue | Download Format Versus Service |
---|---|
|
The ISO standard does not clearly distinguish between a download file format and a service API in a Dataset distribution. |
|
Classification of services versus downloads is difficult and not friendly to machines or users. |
|
|
|
|
Format Description
Issue | Format Description |
---|---|
|
There is no consistent way to define the format of services (OGC:WMS). Usage of MIME types is not consistent in the standard, and most format descriptions are not machine readable. |
|
Inconsistent format descriptions make it difficult for software agents to access data automatically. |
|
|
|
Enables automation, content negotiation and service selections based on controlled vocabularies. |
Insufficient Map Layer Description
Issue | Insufficient Map Layer Description |
---|---|
|
The ISO standard does not provide enough information to map a dataset to a layer in a map service (WMS, ArcREST). Often multiple layers are provided by the map service and there is no deterministic way to find out which one corresponds to the dataset. |
|
Traceability from dataset to map layer is unavailable. The missing layer metadata is needed to support GeoPlatform search, discovery and proper use. |
|
|
|
Support a vastly improved layer search and map building experience. |
Data-centric Approach
Issue | Data-centric Approach |
---|---|
|
Data Schema Standardization of domain models uses a syntactic approach. Imposing this strict adherence to a standard tends to minimize heterogeneity. |
|
|
|
|
|
|
8.4.3. Semantic Registry Service
The Semantic Registry Service was designed to manage multiple registers that are capable of containing item classes from different application profiles. To support Testbed 12, we implemented three different registers:
-
Datasets and Services Register: Manages datasets and services collected from Compusult, Envitia and ESRI CSW instances
-
Schema and Schema Mapping Register: Manages schemas and schema mappings harvested from Galdos CSW Schema Registry
-
Portrayal Service Register: Manages portrayal information (styles, symbols, symbolSets, and portrayal rules)
Note
|
The partitioning of the registers was done to provide some clarity in the organization of the information. However it is possible to create a register that contains multiple application profiles. The partitioning decision is based on the business requirement of the user. |
These registers were populated by a harvester service, which is integrated with the Semantic Registry Service and accessible through a hypermedia-driven REST API. The harvester service was designed to be extensible and to support multiple types of data sources, including documents retrieved from a resolvable URL (Project Open Data, DCAT, ISO 19139, and FGDC CSDGM documents) and advanced web services such as CSW, CKAN, Web Accessible Folder, and ESRI web services. Each of these plugins, called a harvester type, describes the list of parameter descriptors needed by the harvester. An instance of a harvester type is called a harvester source and binds the parameters to values. A harvester source can be triggered manually or on a given schedule, and the harvester results are returned with statistics (number of harvested objects successfully imported, number of failures) as well as the list of item identifiers. Due to limited time for implementation, only synchronous calls to harvesters are supported. Future development will handle asynchronous harvesting with on-demand status reports.
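The relationship between harvester types, harvester sources, and harvest results can be sketched as a small data model. The class and field names below are hypothetical, as is the assumption that `resourceType` is a required parameter of the CSW harvester type; they only mirror the concepts described above and are not the service's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParameterDescriptor:
    name: str
    required: bool = False


@dataclass
class HarvesterType:
    """A plugin: declares which parameters a harvester of this kind accepts."""
    id: str
    parameters: List[ParameterDescriptor]


@dataclass
class HarvesterSource:
    """An instance of a harvester type with parameters bound to values."""
    id: str
    type: HarvesterType
    source: str
    config: Dict[str, str] = field(default_factory=dict)
    harvest_interval: str = "MANUAL"

    def missing_parameters(self):
        return [p.name for p in self.type.parameters
                if p.required and p.name not in self.config]


@dataclass
class HarvestResult:
    """Statistics returned after a (synchronous) harvest run."""
    imported: int = 0
    failed: int = 0
    item_ids: List[str] = field(default_factory=list)


csw_type = HarvesterType("csw", [
    ParameterDescriptor("resourceType", required=True),
    ParameterDescriptor("requestXML"),
])

compusult = HarvesterSource(
    id="compusultCSW",
    type=csw_type,
    source="http://ogc-testbed12.compusult.net/wes/serviceManagerCSW/csw",
    config={"resourceType": "http://www.isotc211.org/2005/gmd"},
)

print(compusult.missing_parameters())
```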
The items managed by the service are stored in a NoSQL store, and are indexed and managed in an RDF store to support graph analytics and SPARQL queries.
8.4.4. Semantic Registry Service REST API
The initial objective of the testbed was to provide a DCAT REST API, which focused on the search and discovery of dcat:Datasets. However, promoting the DCAT model to the superset SRIM model also necessitated a promotion of the REST API to manage registers and harvester types and sources, and to handle more general items, including Portrayal items, Schema and Schema Mapping items.
A review of existing implementations that use DCAT datasets showed that the only consensus on how to access the information through a REST API was the use of the SPARQL query protocol. An OGC filter was not considered adequate for complex queries of RDF data, as SPARQL provides a more compact and standardized way to query Linked Data. One of the main considerations when designing the REST API for the Semantic Registry was to make it accessible to web clients, which primarily operate on JSON. To bridge the gap between Linked Data and JSON, the Semantic Registry uses W3C JSON-LD. The use of a JSON-LD context allows the conversion of RDF models to JSON representations and vice versa. Another objective of the API was to provide a degree of separation between the server and client implementations, to allow the API to evolve in the future without breaking client ecosystems. To achieve this, the Semantic Registry uses hypermedia links, which provide a powerful mechanism to decouple clients and servers. This corresponds to Level 3 of the Richardson Maturity Model.
To implement a Level 3 REST API, we adopted the Hypertext Application Language (HAL), an IETF standard candidate that is widely used by JSON hypermedia REST APIs.
We also acknowledge that many web frameworks (such as AngularJS) are designed for Level 2 APIs and construct URLs on the client side to access the different states of a web application. To accommodate these frameworks, we decided to also implement a Level 2 REST API by providing well-defined URL patterns to access the artifacts of the service (registers, items, harvester types, and harvester sources) and a unique identifier for each artifact. The Level 2 responses are identical to the Level 3 responses, except that they exclude the hypermedia links to other states. The REST API endpoint URL patterns documented in Appendix D are informative only, not normative.
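The difference between the two levels can be illustrated with a HAL-style representation of a register. In the Python sketch below, the Level 3 response carries a `_links` object (the reserved HAL property for hypermedia links), and the Level 2 response is the same document with that property removed. The URL paths, the link relation names other than `self`, and the register fields are hypothetical.

```python
import json


def register_level3(register_id, title, base="http://example.org/registry"):
    """Build a HAL-style (Level 3) representation of a register."""
    href = f"{base}/registers/{register_id}"
    return {
        "id": register_id,
        "title": title,
        "_links": {
            "self": {"href": href},
            "items": {"href": f"{href}/items"},    # hypothetical relation
            "sparql": {"href": f"{href}/sparql"},  # hypothetical relation
        },
    }


def to_level2(resource):
    """Same payload without hypermedia links; the client builds URLs itself."""
    return {key: value for key, value in resource.items() if key != "_links"}


level3 = register_level3("datasets", "Datasets and Services Register")
level2 = to_level2(level3)

print(json.dumps(level3, indent=2))
print(json.dumps(level2, indent=2))
```

A Level 3 client follows the `href` values it receives, so the server can change its URL layout without breaking that client. A Level 2 client depends on the published URL patterns.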
In addition to the Level 2 and Level 3 REST APIs that will mostly be used by web clients, we added support for Linked Data API that will mainly be used by machines. Each REST endpoint of the Semantic Registry Service also supports a Linked Data output in RDF/XML, Turtle and N-Triples formats.
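Serving several representations from one endpoint is typically done through HTTP content negotiation. The sketch below maps an `Accept` header to an output format using the registered media types for the formats named above. It ignores quality values for brevity, and both the format labels and the default of HAL JSON are assumptions rather than documented behavior of the service.

```python
# Registered media types mapped to hypothetical internal format labels.
MEDIA_TYPES = {
    "application/hal+json": "hal",
    "application/ld+json": "json-ld",
    "application/rdf+xml": "rdf-xml",
    "text/turtle": "turtle",
    "application/n-triples": "n-triples",
}


def choose_format(accept_header, default="hal"):
    """Return the first supported format listed in the Accept header."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in MEDIA_TYPES:
            return MEDIA_TYPES[media_type]
    return default


print(choose_format("text/turtle, application/rdf+xml;q=0.8"))
print(choose_format("*/*"))
```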
Furthermore, each Register endpoint also provides a GeoSPARQL endpoint that permits advanced SPARQL queries on the Linked Data representation of the items managed by each register.
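As an illustration, a client could list dataset titles from a register with a query such as the one built below. The endpoint URL is hypothetical, and the sketch only constructs a SPARQL Protocol GET request URL without sending it.

```python
from urllib.parse import urlencode

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Encode a query as the `query` parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Hypothetical endpoint of the datasets register.
url = sparql_get_url("http://example.org/registry/registers/datasets/sparql", QUERY)
print(url)
```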
8.4.5. Integration with OGC Catalog Services
To evaluate interoperability aspects in multi-catalog type environments, the testbed considered a number of solutions. Each solution involved various types of catalogue services, for example, CSW featuring ISO based metadata and OpenSearch, other CSW offering a SOAP binding, and support for DCAT using RDF.
Several architectural solutions could be used to establish a multi-catalogue environment, and four key architectural solutions were identified by the Testbed. The identified solutions differ in a variety of ways, including the entry point for client applications and the computational balance between the client application and the services.
The first solution for a multi-catalogue environment includes a client application that can query the various catalogue services directly. This requires the client application to prepare appropriate queries for each catalogue service and to collate the search results when they are returned by the services.
The second solution involves the selection of one of the catalogue services to initiate a distributed search. In this case, the client application only needs to prepare queries to send to the cascading catalogue service. Upon receiving a request from the client, the cascading catalogue service then adapts the request to forward to other catalogue services and returns responses from the other services, as well as results from its own catalogue.
The third solution involves the harvesting of metadata from one or more source catalogue services into a single target catalogue service. Harvesting is ideally conducted at a scheduled time and not when a query is received from the client. The client application can then query the target catalogue service to discover resources published by both the source and target catalogue services.
The fourth solution involves the replication of metadata between a federation of catalogue services. Replication would ideally be conducted at a scheduled time and not when a query is received from the client. The client application can then query any catalogue service to discover resources published by any catalogue service.
For this testbed, the integration with the OGC Catalog Services was accomplished by using a Harvester Service.
Compusult CSW Integration
We integrated the Compusult CSW, which serves ISO 19139 documents, using the CSW 2.0 protocol. The integration could have been done with CSW 3.0, but no open-source clients supporting the CSW 3.0 protocol were available at the time of Testbed 12. However, the Harvester configuration for CSW 3.0 would be very similar to that for the CSW 2.0 GetRecords operation. To map ISO 19139 to SRIM, we used a semantic mapping based on the DCAT profiles. We did not find any issues when validating the ISO 19139 documents against their XML schema; however, we found some issues in the ISO 19139 mapping (explained in Section 8.4.2, ISO 19139 Mapping Issues).
A Harvester Source for the CSW Catalog was defined and harvested on demand. The following figure shows a client displaying the harvester source for Compusult CSW:
The following snippet shows the JSON encoding of the harvester source configuration:
{
  "id": "compusultCSW",
  "type": "csw",
  "title": "Testbed12 Compusult CSW",
  "description": "Compusult CSW used for OGC Testbed 12 to harvest ISO19139 documents",
  "created": "2016-10-03T22:49:27.311Z",
  "modified": "2016-10-03T22:49:27.311Z",
  "source": "http://ogc-testbed12.compusult.net/wes/serviceManagerCSW/csw",
  "config": {
    "resourceType": "http://www.isotc211.org/2005/gmd"
  },
  "harvestInterval": "MANUAL",
  "registerId": "datasets"
}
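To suggest how a harvester might turn such a source configuration into a catalogue request, the following Python sketch builds a CSW 2.0.2 GetRecords URL using the key-value-pair binding. Treating `resourceType` as the `outputSchema` value, the choice of `gmd:MD_Metadata` as the type name, and the paging defaults are assumptions for illustration; the actual harvester may instead send an XML POST request.

```python
from urllib.parse import urlencode

# Subset of the harvester source configuration shown above.
harvester_source = {
    "id": "compusultCSW",
    "type": "csw",
    "source": "http://ogc-testbed12.compusult.net/wes/serviceManagerCSW/csw",
    "config": {"resourceType": "http://www.isotc211.org/2005/gmd"},
}


def get_records_url(source, start=1, page_size=50):
    """Build a GetRecords request URL for one page of results."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",                    # assumed type name
        "outputSchema": source["config"]["resourceType"],  # assumed binding
        "elementSetName": "full",
        "resultType": "results",
        "startPosition": start,
        "maxRecords": page_size,
    }
    return source["source"] + "?" + urlencode(params)


# First and second page of a paged harvest.
print(get_records_url(harvester_source))
print(get_records_url(harvester_source, start=51))
```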
ESRI CSW Integration
We integrated the ESRI OGC CSW, which serves ISO 19139 documents, by defining a Harvester Source with CSW 2.0. We found that some of the ISO 19139 documents registered in the CSW were not compliant with the standard (for example, a missing ScopeCode in HierarchyLevel). The following snippet shows the configuration of the harvester:
{
  "id": "esriCSW",
  "type": "csw",
  "title": "Testbed12 ESRI CSW",
  "description": "ESRI CSW used for OGC Testbed 12",
  "created": "2016-11-15T18:11:24.203Z",
  "modified": "2016-11-15T18:11:24.203Z",
  "source": "http://gptogc.esri.com/geoportal/csw",
  "config": {
    "resourceType": "http://www.isotc211.org/2005/gmd"
  },
  "harvestInterval": "MANUAL",
  "registerId": "datasets"
}
Envitia CSW Integration
Envitia provided a CSW instance with an ebRIM profile. We configured the harvester to collect dataset metadata stored in objects of type urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata. The following snippet shows the configuration of the CSW Harvester:
{
"id": "envitiaCSW",
"type": "csw",
"title": "Testbed12 Envitia ebRIM CSW",
"description": "Envitia CSW used harvest ebRIM datasets records",
"created": "2016-11-15T18:11:24.236Z",
"modified": "2016-11-15T18:11:24.236Z",
"source": "http://86.188.147.99:9080/RegistryService/registry",
"config": {
"requestXML": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<csw:GetRecords xmlns:env-ebrim=\"http://www.envitia.com/schemas/georegistry/ebrim-ext\"
xmlns:xmime=\"http://www.w3.org/2005/05/xmlmime\" xmlns:dct=\"http://purl.org/dc/terms/\"
xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\" xmlns:gml=\"http://www.opengis.net/gml\"
xmlns:wrs=\"http://www.opengis.net/cat/wrs/1.0\" xmlns:ows=\"http://www.opengis.net/ows\"
xmlns:ogc=\"http://www.opengis.net/ogc\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
xmlns:xlink=\"http://www.w3.org/1999/xlink\" service=\"CSW\" version=\"2.0.2\"
resultType=\"results\" outputSchema=\"urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0\"
startPosition=\"1\" maxRecords=\"50000\">
<csw:Query typeNames=\"wrs:ExtrinsicObject_coi\">
<csw:ElementSetName typeNames=\"coi\">full</csw:ElementSetName>
<csw:Constraint version=\"1.1.0\">
<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>$coi/@objectType</ogc:PropertyName>
<ogc:Literal>urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>",
"resourceType": "urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata"
},
"harvestInterval": "MANUAL",
"registerId": "datasets"
}
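Because a JSON string cannot contain raw line breaks, a multi-line `requestXML` value has to be escaped when the configuration is serialized. A minimal Python sketch of producing such a configuration is shown below. The shortened GetRecords request is illustrative only and omits the filter and most of the namespaces used in the configuration above.

```python
import json

# Shortened, illustrative request; not the full request used in the testbed.
REQUEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:wrs="http://www.opengis.net/cat/wrs/1.0"
    service="CSW" version="2.0.2" resultType="results"
    outputSchema="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0">
  <csw:Query typeNames="wrs:ExtrinsicObject"/>
</csw:GetRecords>"""

source = {
    "id": "envitiaCSW",
    "type": "csw",
    "config": {
        "requestXML": REQUEST_XML,
        "resourceType": "urn:ogc:def:ebRIM-ObjectType:OGC-I15::DataMetadata",
    },
    "harvestInterval": "MANUAL",
    "registerId": "datasets",
}

# json.dumps escapes the quotes and line breaks inside the XML string,
# so the result is valid JSON that round-trips to the original text.
encoded = json.dumps(source, indent=2)
decoded = json.loads(encoded)
print(decoded["config"]["requestXML"] == REQUEST_XML)
```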
CSW ebXML Schema Registry
During the testbed, Galdos provided a CSW 2.0 instance which implemented the ebRIM profile. We extended the profile to accommodate representations of Schemas and Schema Mappings. In addition, we implemented a Semantic Mapper that converts the Schema and Schema Profile to the SRIM Schema Application Profile, and integrated it with a Semantic Registry harvester.
The following shows the Harvester Source Configuration needed to access the Schemas and Schema Mappings from the CSW:
{
"id": "galdosCSW1",
"type": "csw",
"title": "Schema Harvester from Galdos ebRIM CSW ",
"description": "This source harvests schemas stored in ebRIM Model",
"created": "2016-10-03T22:49:27.542Z",
"modified": "2016-10-03T22:49:27.542Z",
"source": "http://ows.galdosinc.com/indicio/query",
"config": {
"requestXML": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<csw:GetRecords xmlns:env-ebrim=\"http://www.envitia.com/schemas/georegistry/ebrim-ext\"
xmlns:xmime=\"http://www.w3.org/2005/05/xmlmime\"
xmlns:dct=\"http://purl.org/dc/terms/\"
xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\"
xmlns:gml=\"http://www.opengis.net/gml\"
xmlns:wrs=\"http://www.opengis.net/cat/wrs/1.0\"
xmlns:ows=\"http://www.opengis.net/ows\"
xmlns:ogc=\"http://www.opengis.net/ogc\"
xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
xmlns:xlink=\"http://www.w3.org/1999/xlink\"
service=\"CSW\" version=\"2.0.2\"\r\n\tresultType=\"results\" outputSchema=\"urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0\"
startPosition=\"1\" maxRecords=\"50\">
<csw:Query typeNames=\"wrs:ExtrinsicObject\">
<csw:ElementSetName>full</csw:ElementSetName>
<csw:Constraint version=\"1.1.0\">
<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>@objectType</ogc:PropertyName>
<ogc:Literal>urn:ogc:def:ebRIM-ObjectType:OGC:Schema</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>",
"resourceType": "urn:ogc:def:ebRIM-ObjectType:OGC:Schema"
},
"harvestInterval": "MANUAL",
"registerId": "schemas"
}
8.4.6. Integration with Clients
A number of clients were successfully integrated with the Semantic Registry, as illustrated by the following figure:
ESRI Semantic Registry Client
The ESRI Client provides a plugin framework to access a variety of catalog services. For this testbed, ESRI developed a plugin to access the Semantic Registry. The following figure shows the results of a search in the ESRI Semantic Registry Client: