OGC Engineering Report

OGC Testbed-19 Agile Reference Architecture Engineering Report

Lucio Colaiacomo, Editor

Document number: 23-050
Document type: OGC Engineering Report
Document subtype:
Document stage: Published
Document language: English

License Agreement

Use of this document is subject to the license agreement at https://www.ogc.org/license



I.  Executive Summary

The concepts of agile architecture and reference architecture may not be new ideas in information or geospatial technologies, but what is meant by the term Agile Reference Architecture?

Agile Reference Architecture is the long-term vision of the complex and changing nature of how problems will be solved in the future within the location-referenced and geospatial realms. This includes consideration of operation with or without network availability, as containers integrate with Linked Data and Application Programming Interfaces (APIs) serve data as secure, trusted, and self-describing resources.

While the Open Geospatial Consortium (OGC) focuses on geospatial information and technologies, that community is also dependent on the overall state of information and communications technology (ICT), including developing cyber, cryptographic, and internet technologies.

In today’s infrastructures, the collection, exchange, and continuous processing of geospatial resources typically happen at pre-defined network endpoints of a spatial data infrastructure. Each participating operator hosts some capability at a network endpoint: some operator endpoints provide data access, others provide processing functionality, and still others support the uploading of capabilities. In other words, such an infrastructure is not agile in the sense that it cannot adapt by itself to meet the needs of the moment. One of the biggest challenges resulting from these static characteristics is ensuring effective and efficient operation of the overall system while at the same time maintaining trust and provenance.

This OGC Testbed 19 Engineering Report (ER) outlines novel concepts for establishing a federated agile infrastructure of collaborative trusted systems (FACTS) that is capable of acting autonomously to ensure fit-for-purpose cooperation across the entire system. One of the key objectives is not to create a new data product; instead, a collaborative object is offered, leveraging FACTS, that allows the data product to be obtained via well-defined interfaces and functions provided by the collaborative object.

Trust and assurance are two key aspects when operating a network of collaborative objects leveraging STANAG 4774/4778. STANAG 4774 outlines the metadata syntax required for a confidentiality label to better facilitate and protect sensitive information sharing. In addition, STANAG 4778 defines how a confidentiality label is bound to the data throughout its lifecycle and between the sharing parties. The agile aspect is achieved by the object’s ability to activate, deactivate, and order well-defined capabilities from other objects. These capabilities are encapsulated in building blocks. Each building block is well defined in terms of accessibility, functionality, and ordering options. This allows building blocks to roam around collaborative objects as needed to ensure a well-balanced network load and suitable processing power of individual nodes in the network.

Equally trusted partners in the infrastructure participate in FACTS. They can collect data from other partners and create derived products via collaborative objects. The sharing of data products is not possible directly with a data consumer: it is only possible via the objects. This guarantees that fundamental trust operations are applied to the data and provenance records are produced before the data product is made available to others. The use of Blockchain technology and Smart Contracts is one example of how this fundamental behavior can be planted into collaborative objects. As in trusted networks that use Evaluation Assurance Level (EAL) approved hardware and software components, the objects will have to undergo a similar assurance process.

For ensuring the acceptance and interoperability of an agile reference architecture, built on top of FACTS with collaborative objects and building blocks, standardization is a key aspect. In particular, the core (fundamental) requirements for FACTS as well as the interfaces and capabilities of the collaborative objects and pluggable building blocks should be standardized. The OGC provides a consensus-based collaborative standardization environment that fits these requirements very well.

II.  Keywords

The following are keywords to be used by search engines and document catalogues.

testbed, architecture, Agile Reference Architecture

III.  Contributors

All questions regarding this document should be directed either to the editor or to the contributors.

Name                Organization                     Role
Lucio Colaiacomo    EU SatCen                        Editor
Greg Buehler        OGC                              Contributor
David Habgood       KurrawongAI                      Contributor
Christophe Noël     Spacebel s.a.                    Contributor
Clemens Portele     interactive instruments GmbH     Contributor
Yves Coene          Spacebel s.a.                    Contributor

1.  Introduction

The term Agile Reference Architecture (ARA) refers to the long-term vision of the complex and changing nature of how problems will be solved in the future within the location-referenced and geospatial realms, with or without network availability, as containers mix with Linked Data, and as APIs and data become more secure, trusted, and self-describing. In addition to relying on OGC Standards for enabling geospatial interoperability, the geospatial community also depends on the overall state of information and communications technology (ICT), including developing cyber, crypto, and internet technologies. In the OGC Testbed 19 ARA task as documented in this Engineering Report (ER), the reader is encouraged to begin a journey to define where the industry is with the current reference architecture. This discussion includes where the industry is headed in the near term as technology and ideas are developed (next generation), and ultimately, to determine a suitable direction for the generation-after-next of the geospatial community. This ER will not answer all these questions but is intended to provide a baseline, upon which future initiatives will build. This is required in order to evolve into the next, more flexible reference architecture, and ultimately into the agile reference architecture of the generation-after-next.

1.1.  Problem statement

In recent years the trend in Information Technology Security — the methods, tools, and personnel used to defend an organization’s digital assets — has produced normative references that require revisiting how digital information (including geospatial data) is produced, managed, and served. Trust in data and services is no longer only an implementation issue but increasingly an issue for the implementation and adoption of geospatial standards. Moreover, the OGC Standards development process has typically assumed a static networking model in the sense that each operator publishes interface instances or APIs with a given set of functionality. The following issues therefore need to be considered.

Proxy caching is a feature of proxy servers that stores content on the proxy server itself, allowing web services to share those resources with more users. The proxy server coordinates with the source server to cache documents such as files, images, and web pages, thereby creating a data space outside the control of the server (or API).

Another issue is how to discover new or updated capabilities provided by the APIs. OGC API Standards support synchronous or asynchronous communication, but still require using HTTP/S and/or MQTT protocols.

A further issue is how to establish autonomous interactions between systems while assuring trust. OGC API Standards are concerned with managing data, providing access to data, or processing data. For example, consider the current OGC API Standards baseline. A user can access a sensor through implementations of current OGC API Standards. However, how does the user determine the chain of commands for a given activity? How is trust managed in OGC Standards? How can the provenance of the information be accessed via implementations of OGC Data Encoding Standards? Consider a GML instance document or a map stored in GeoTIFF or GMLJP2. How can the user of that information validate the integrity of the information and be assured about the authenticity of the original author? How is provenance documented? Consider for example a GML document or a GeoTIFF file that was modified by a user in good faith (e.g., updating feature properties to reflect updates). Once the information is saved to storage there is no record that the modification happened!

More than loosely coupled APIs are required to support the requirements identified above. An ecosystem of trusted collaborating systems, of which implementations of the OGC API and Web Service Standards can be a part, needs to be defined. The D123-128 parts of this document provide examples of what can be managed with current standards and definitions, but in an environment that is neither trusted nor secure.
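
To illustrate the integrity and authenticity questions above, the following minimal Python sketch shows how a consumer could check that a GeoTIFF or GML file is unmodified and was signed by the claimed author. The detached-signature convention, the file paths, and the use of the cryptography package are illustrative assumptions, not part of any OGC Standard.

# Minimal integrity/authenticity check for a distributed data product.
# Assumptions (illustrative): the producer published a detached RSA
# signature over the file, and the consumer holds the producer's public key.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def verify_product(data_path, signature_path, public_key_path):
    data = open(data_path, "rb").read()
    signature = open(signature_path, "rb").read()
    public_key = serialization.load_pem_public_key(
        open(public_key_path, "rb").read())
    # Integrity: recompute the digest of the file as stored today.
    print("SHA-256:", hashlib.sha256(data).hexdigest())
    # Authenticity: check the signature against the original author's key.
    try:
        public_key.verify(signature, data,
                          padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        # A well-intentioned in-place edit also lands here, because nothing
        # in the file itself records that a modification happened.
        return False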

1.2.  Possible path towards Next Generation Architecture

The objective is that any interaction on data inevitably produces a verifiable trace (provenance + identity) and that the data itself is secured (principles of data-centric security applied). The idea is to introduce Collaborative Objects (COs) that are capable of negotiating relevant business (in particular, not data) between each other based on Smart Contracts. For example, Smart Contracts integrated into a Blockchain can assert fundamental communications to produce metadata and provenance. This enables working together in an agile, self-sovereign, adaptable way. All interactions are controlled via Smart Contracts.

F.A.C.T.S. (Federated Adaptive (Infrastructure of) Collaborating Trusted Systems) as the ecosystem establishes trust and provenance based on Collaborative Objects (COs). These COs could be implemented, as an example, via Docker Images using Content Trust. This ensures that there is basic trust in COs, which is required to realize the adaptive Trusted System. Many current implementations of OGC API Standards are not adaptive: they cannot self-adapt because there is no “built-in” logic other than providing access to data, metadata, and processes. They are simply not designed to do so. However, OGC API Standards are important for realizing the vision of data access in F.A.C.T.S.

In the context of defining the next generation architecture, it is clear that there is a need to define a whole ecosystem for increased flexibility. Such a new Agile Reference Architecture can also adapt to an increased number of data/processes, including those derived by algorithms creating new COs in near-real-time services. The main steps for the next activities are Data Centric Security (IPT based), OGC building block definition with IT Security constraints considered (IPT enabled), and OGC Standards harmonization to consider IPT.
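
The intended interaction pattern can be sketched in plain Python (rather than in any concrete smart-contract language): two COs negotiate a capability order, and the contract logic records metadata and provenance before any business is transacted. All class, method, and identifier names below are hypothetical.

# Illustrative only: the negotiation pattern between two Collaborative
# Objects (COs), with Smart Contract logic guarding every interaction.
class SmartContract:
    def __init__(self, ledger):
        self.ledger = ledger  # stand-in for a Blockchain/ledger client

    def execute(self, consumer, provider, capability, payload):
        # Fundamental communications are asserted first: metadata and
        # provenance are recorded BEFORE any business is transacted.
        self.ledger.append({
            "consumer": consumer.identity,
            "provider": provider.identity,
            "capability": capability,
        })
        return provider.invoke(capability, payload)

class CollaborativeObject:
    def __init__(self, identity, capabilities):
        self.identity = identity
        self.capabilities = capabilities  # name -> callable building block

    def invoke(self, capability, payload):
        return self.capabilities[capability](payload)

# Usage: CO "a" orders the hypothetical "orthorectify" capability from CO "b".
ledger = []
b = CollaborativeObject("did:example:co-b",
                        {"orthorectify": lambda scene: f"ortho({scene})"})
a = CollaborativeObject("did:example:co-a", {})
result = SmartContract(ledger).execute(a, b, "orthorectify", "scene-42")
assert ledger, "provenance was recorded before the result was returned"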

2.  Terms, definitions and abbreviated terms

This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.

This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.

For the purposes of this document, the following additional terms and definitions apply.

2.1.  Terms and definitions

2.1.1. Building Block

A building block is a package of functionality defined to meet specific business needs. The way in which functionality, products, and custom developments are assembled into building blocks will vary widely between individual architectures.

2.1.2. Collaborative Object

A Collaborative Object is self-contained, contains data products and services/processes, and collaborates in the exchange of events as well as the invocation of operations.

2.1.3. Data-centric Security

Data-centric security is an approach to security that emphasizes the dependability of the data itself rather than the security of networks, servers, or applications [Wikipedia].

2.1.4. F.A.C.T.S.

FACTS (Federated infrastructure of Agile Collaborative Trusted Systems) establishes trust based on Collaborative Objects (COs) such as a collection of Docker Images using Content Trust.

2.1.5. Smart Certificate

A Smart Certificate ensures that the F.A.C.T.S. Collaborative Object is doing what it is supposed to do, supporting verified attestation and processes. A Smart Certificate assures, for example, that the APIs on the Collaborative Object interact with F.A.C.T.S. Similar to the Smart Contract, where the recording of the processing is published on the Blockchain BEFORE the data product is published, a Smart Certificate ensures that the Verifiable Attestation is issued before the data product is published.

2.1.6. Smart Contract

A Smart Contract is a computer program or a transaction protocol that is intended to automatically execute, control, or document events and actions according to the terms of a contract or an agreement. [Wikipedia]. A F.A.C.T.S. Smart Contract for example ensures that the recording of the processing metadata (provenance information) is published on the Blockchain BEFORE the service (Building Block) is published.

2.1.7. Verifiable Attestation

A type of verifiable credential containing claims about certain attributes of an entity for uses other than identification or authentication (EBSI definition). https://code.europa.eu/ebsi/json-schema/-/tree/main/schemas/ebsi-attestation

2.2.  Abbreviated terms

ABB

Architecture Building Block

API

Application Programming Interface

ARA

Agile Reference Architecture

BB

Building Block

CO

Collaborative Object

CQL

Common Query Language

DCS

Data Centric Security

DDIL

Denied, Disrupted, Intermittent, Limited

DGIWG

Defense Geospatial Information Working Group

DID

Decentralized Identifier

DIF

Decentralized Identity Foundation

DMF

DGIWG Metadata Foundation

FACTS

Federated Agile Collaborating Trusted Systems

GDAL

Geospatial Data Abstraction Library

HTML

Hypertext Markup Language

HTTP

Hypertext Transfer Protocol

IPT

Identity Provenance Trust

ISO

International Organization for Standardization

JSON

JavaScript Object Notation

MGCP

Multinational Geospatial Co-production Program

NGA

US National Geospatial-Intelligence Agency

NSG

US National System for Geospatial Intelligence

OGC

Open Geospatial Consortium

OS

Ordnance Survey (Great Britain)

RDF

Resource Description Framework

REST

Representational State Transfer

RM-ODP

Reference Model of Open Distributed Processing

SatCen

European Union Satellite Centre

SBB

Solution Building Block

SC

Smart Certificate

SHACL

Shapes Constraint Language

SSI

Self-Sovereign Identity

SWG

Standards Working Group

TIE

Technology Integration Experiment

TOGAF

The Open Group Architecture Framework

TP

Trust and Provenance

UML

Unified Modeling Language

URI

Uniform Resource Identifier

URL

Uniform Resource Locator

VA

Verifiable Attestation

XML

eXtensible Markup Language

3.  Reference Architecture

3.1.  Current status

Currently the OGC Reference Model (ORM) (OGC 08-062r7) is used as the basis for OGC Standards work. The ORM is defined using the Reference Model of Open Distributed Processing (RM-ODP), an international standard for architecting open, distributed processing systems. Recent advances in various technologies are prompting reflection on the way geospatial systems architecture can be adapted for the next generation of geospatial systems and for the generation-after-next. Below, in Clause 3.2, is a short but not exhaustive list of the reference architectures that are in use and can be considered within the scope of this Engineering Report.

3.2.  Examples of architectures in use

The ORM has the following purposes (OGC 03-040):

  • provides a foundation for coordination and understanding (both internal and external to OGC) of ongoing OGC activities and the OGC Technical Baseline;

  • updates/replaces parts of the 1998 OpenGIS Guide (https://www.ogc.org/standards/orm/);

  • describes the OGC requirements baseline for geospatial interoperability;

  • describes the OGC architecture framework through a series of non-overlapping viewpoints: including existing and future elements; and

  • regularizes the development of domain-specific interoperability architectures by providing examples.

DGIWG Geospatial Reference Architecture (DGRA)

The DGRA defines a set of standards, implementation guides, and industry practices which together form an ideal framework for achieving geospatial interoperability in a Defense context (DGIWG 933). The DGRA is particularly relevant to this engineering report because of its application in the Defense community and its use of ISO/IEC 10746 1-3 “Information Technology — Open Distributed Processing — Reference Model” (RM-ODP).

Open Distributed Processing — Reference Model (RM-ODP)

The RM-ODP defines a model that portrays a reference architecture from the following viewpoints.

  • Enterprise: Defines the purpose, scope and policies of the system.

  • Information: Describes the semantics of information used within the system, e.g., Vector, Imagery, Metadata, Portrayal, and their relevant standards.

  • Computational: Describes the system’s individual interfaces, e.g., the standards and the operations they use for each function.

  • Engineering: Describes the system components, their relationships, functions, and standards.

  • Technology: Describes the technology choices available to realize systems in terms of their compliance to specifications described in other viewpoints.

Geospatial Interoperability Reference Architecture

Figure 1 — actual working brainstorming architecture (logical workflow)

OGC is in the process of addressing the need for a modernized reference architecture through its work on OGC API Standards. However, the concept of an OGC API ‘building block’ is still not standardized. To be more precise, it is defined in Clause 7.2.1 of the OGC Technical Committee Policies and Procedures, where it is stated that “There is no firm definition for the content or scope of a building block, but the building block must fulfill a function that can operate in the larger context of an implementation,…”. As noted in that document, the implementation aspects connected with the security of data and services by normative reference (not least the EU Digital Act) impose a redefinition of the architecture. The main disadvantage is that — as with the existing OGC Web Services Standards — security is not directly integrated into the API. As such, security needs to be added as part of the implementation or managed externally. Current implementations of OGC API Standards are based on OGC building blocks (some of which are not yet standards) and there is no security capability yet for this concept. Where a component is specific to a given OGC API Standard, this is not an issue. However, this can result in definitions that are not unique. Because the building blocks currently do not use agreed cross-API semantics, definitions, and concepts, the interfaces defined and in use are likely to be developed on an ad hoc basis, resulting in stove-piped solutions that may not be fully interoperable.

Figure 2 — architecture deployment example

Current architectures — in particular for geospatial systems — exhibit a strong dependency on data types (meaning metadata describing different encodings) that should be reduced as much as possible to enable truly interoperable processes and services. Managing new data formats and their standardization requires effort that some organizations cannot afford. Today, from the moment a new data format becomes available (produced by the market or developed) to the moment it is standardized (OGC, ISO, …), the delay is considerable and does not allow an agile approach. Even though this development is asynchronous to software development and an OGC API implementation instance may use the encoding once standardized, there is the risk of not being interoperable.

The drawback of this approach is that the creation of connected services (consider the draft OGC API — Connected Systems (https://ogcapi.ogc.org/connectedsystems/)) or a persistent monitoring capability is not straightforward. The building blocks approach enables modularity but requires basic elements such as Data Centric Security (DCS) in their development and life cycle. Therefore, DCS should be considered as an element of current architectures wherever applicable, and especially for the next and generation-after-next. Many OGC encoding and API Standards do not specify security constraints, perhaps because there are too many use cases to be considered. This implies difficulty in establishing trust and provenance for the data/services/processes of a single operator, and implies ‘mission impossible’ when trying to build workflows with secured services as in a federated setup.

Service Orchestration and Automation Platforms (SOAP)

Considering Service Orchestration and Automation Platforms (SOAP), Gartner says: “…SOAPs enable I&O leaders to design and implement business services. These platforms combine workflow orchestration, workload automation and resource provisioning across an organization’s hybrid digital infrastructure. Increasingly, they are central to an organization’s ability to deploy workloads and to optimize deployments as a part of cost and availability initiatives. SOAPs provide a unified administration console and an orchestration engine to manage workloads and data pipelines and to enable event-driven application workflows. Most tools expose APIs enabling scheduling batch processes, monitoring task statuses and alerting users when new events are triggered that can be integrated into DevOps pipelines to increase delivery velocity. SOAPs expand the role of traditional workload automation by adapting to use cases that deliver and extend into data pipelines, cloud-native infrastructure and application architectures. These tools complement and integrate with DevOps toolchains to provide customer-focused agility and cost savings, operational efficiency and process standardization.” (source: https://www.gartner.com/reviews/vendor/storidge).

The Open Group Architecture framework (TOGAF)

A very detailed architecture document is “The Open Group Architecture Framework” (TOGAF). The proposal of the Testbed-19 participants is to adopt the following definition of a building block. There are two main aspects to consider, the Architecture Building Block and the Solution Building Block (both defined below), which could better fit the actual OGC API concept and leave room for IPT adaptation.

3.3.  Building blocks

3.3.1.  Official OGC definition of a Building Block

Clause 7.2.1 of the OGC Technical Committee’s Policies and Procedures (OGC 05-020r29) defines a Standard Building Block as follows:

“Many OGC Standards are structured with modular sets of requirements (or requirement classes) that collectively function as a reusable building block. There is no firm definition for the content or scope of a building block, but the building block must fulfill a function that can operate in the larger context of an implementation, including combination with other OGC building blocks to create novel implementations.

Building blocks developed for one Standard can be reused in another Standard. To facilitate such reuse, a Standard constructed of building blocks shall identify each building block and publish a definition of the building block to OGC’s Registries and web resources. The definition will be in the form most suitable for the type of building block (e.g., Open API for a Standardized API), reference the owning Standard, and be adequately documented to be used in reference.

OGC Standards that reuse building blocks from other Standards must include in the Normative References a reference to the owning Standard of the building block(s) and a direct reference to the registered building block(s) content. In this fashion, implementers of the Standard reusing these building blocks need to only access specific parts (the building blocks) of the referenced Standard, not the entire document.”

3.3.2.  Generic characteristics of a building block

This engineering report therefore identifies the generic characteristics of a building block as follows:

  • a package of functionality defined to meet the business needs across an organization;

  • has published interfaces to access the functionality;

  • may interoperate with other, inter-dependent building blocks;

  • considers implementation and usage, and evolves to exploit technology and standards;

  • may be assembled from other building blocks;

  • may be a subassembly of other building blocks;

  • ideally is re-usable, replaceable, and well-specified; and

  • may have multiple implementations but with different inter-dependent building blocks.

A building block is therefore simply a package of functionality defined to meet specific business needs. The way in which functionality, products, and custom developments are assembled into building blocks will vary widely between individual architectures. Every organization must decide for itself what arrangement of building blocks works best for their use cases. A good choice of building blocks can lead to improvements in legacy system integration, interoperability, and flexibility in the creation of new systems and applications. Systems are built from collections of building blocks, so most building blocks must interoperate with other building blocks. Wherever that is true, it is important that the interfaces to a building block are published and reasonably stable and persistent. Building blocks can be defined at various levels of detail, depending on what stage of architecture development has been reached. For instance, at an early stage, a building block can simply consist of a grouping of functionality such as a customer database and some retrieval tools. Building blocks at this functional level of definition are described in TOGAF as Architecture Building Blocks (ABBs). Later, real products or specific custom developments replace these simple definitions of functionality, and the building blocks are then described as Solution Building Blocks (SBBs).

3.4.  Architecture and Solution Building Blocks (TOGAF definition)

The following content has been copied from the TOGAF specification [20] and represents a possible definition of what a building block could be at the architecture level and at solution level. This definition could help in maintaining the actual implementations and opening the door to the introduction of the IPT concept.

3.4.1.  Architecture Building Blocks

Architecture Building Blocks (ABBs) relate to the Architecture Continuum (https://pubs.opengroup.org/architecture/togaf8-doc/arch/chap18.html#tag_19_01), and are defined or selected as a result of the application of the Architecture Development Method (ADM). The ADM is a generic method for architecture development, which has been designed to deal with most system and organizational requirements.

Characteristics

The following are characteristics of Architecture Building Blocks:

  • define what functionality will be implemented;

  • capture business and technical requirements;

  • are technology aware; and

  • direct and guide the development of SBBs.

Specification Content

ABB specifications include the following as a minimum:

  • fundamental functionality and attributes: semantic, unambiguous, including security capability and manageability;

  • interfaces: selected, supplied (APIs, data formats, protocols, hardware interfaces, standards);

  • dependent building blocks with required functionality and named user interfaces; and

  • mapped to business/organizational entities and policies.

3.4.2.  Solution Building Blocks

Solution Building Blocks (SBBs) relate to the Solutions Continuum (https://pubs.opengroup.org/architecture/togaf8-doc/arch/chap18.html#tag_19_02), and may be either procured or developed.

Figure 3 — architecture vision (The Open Group Architecture Framework)

Characteristics

The following are characteristics of Solution Building Blocks:

  • define what products and components will implement the functionality;

  • define the implementation of the building block;

  • fulfill business requirements; and

  • are product or vendor-aware.

Specification Content (example; not all items are applicable in the OGC context)

SBB specifications include the following, as a minimum:

  • specific functionality and attributes;

  • interfaces: the implemented set;

  • required SBBs used with required functionality and names of the interfaces used;

  • mapping from the SBBs to the IT topology and operational policies;

  • specifications of attributes shared across the environment (not to be confused with functionality) such as security, manageability, localizability, and scalability;

  • performance and configurability;

  • design drivers and constraints, including the physical architecture; and

  • relationships between SBBs and ABBs.

3.5.  Comparison based on the above elements

Having analyzed the status of the various architectures available, the following can be assumed.

  1. The current situation, based on a mix of commercial and open-source solutions (partially implementing OGC and other standards), does not allow easy chaining of services and processes. With complex systems it is even more difficult to enable the flexibility needed to adapt systems dynamically.

  2. Security aspects such as identity+integrity, provenance, and trust (Data Centric Security) should be considered one of the pillars on which to build future architectures. Currently — as demonstrated in previous OGC Testbeds — there is compatibility but no real implementation.

  3. Derived from the previous points regarding trust in the data, it can be said that trust = identity + integrity + provenance. As such, an IPT-enabled system can be based on identity and provenance.

  4. The use of data to train models, as in some Artificial Intelligence workflows, is not currently defined in OGC Standards and should follow IPT criteria. One recent exception is the OGC TrainingDML-AI Standard, which specifies the requirements for detailed metadata formalizing the information model of training data. This includes, but is not limited to, how the training data is prepared, such as its provenance or quality.

So there is a need to start defining standards for building blocks and possible implementations of OGC API Standards that support an IPT-based Data Centric Security approach.

4.  Next Generation Architecture

The Reference Architecture chapter outlined the limitations of the current architecture. Basically, the need to chain data and services is limited by the data encoding dependency, while service interfaces are strongly related to the data type (features, maps, etc.): the architecture is not feature-type agnostic. Moreover, data and services, if chained, must be trusted and identifiable; otherwise any results have limited trustworthiness.

The significance of not having identifiable provenance for data and services is that it does not allow for agile connection between services and processes. This is because identification, trustworthiness, and provenance are fundamental elements for enabling trustworthy chaining of services. Identity and trustworthiness of data and services are also a key factor for developing and deploying Machine Learning models and algorithms that are not only accurate, but also explainable, FAIR, privacy-preserving, causal, robust, and trustworthy. To establish a viable definition of the next generation architecture, the following points should at least be considered.

The following is derived from prototyping-focused projects conducted by EU SatCen and is provided here only as a reference.

4.1.  Federated Agile Collaborating Trusted Systems (FACTS)

In today’s infrastructures, the collection, exchange, and continuous processing of geospatial data take place at pre-defined network endpoints of a spatial data infrastructure. Each participating operator hosts predefined static functionality at a network endpoint: some operator endpoints may provide data access, others provide processing functionality, and still others support the uploading of capabilities. In other words, such an infrastructure is not agile in the sense that it cannot adapt by itself to meet more real-time needs. One of the biggest challenges resulting from these static characteristics is ensuring effective and efficient operation of the overall system while at the same time maintaining trust and provenance.

This chapter outlines novel concepts for establishing a federated agile infrastructure of collaborative trusted systems (FACTS) that is capable of acting autonomously to ensure fit-for-purpose cooperation across the entire system. One of the key objectives is, for example, that a data product is not made available directly; instead, a collaborative object is offered, leveraging FACTS, that supports retrieval of the data product via well-defined interfaces and functions provided by the collaborative object.

Trust and assurance are two key aspects when operating a network of collaborating objects leveraging STANAG 4774/4778.

The agile aspect is achieved by the object’s ability to activate, deactivate, and order well-defined capabilities from other objects. These capabilities are encapsulated in building blocks. Each building block is well-defined in terms of accessibility, functionality, and ordering options. This allows building blocks to roam around collaborative objects as needed to ensure a well-balanced network load and processing power of individual nodes from the network.

Equally trusted partners in the infrastructure participate in FACTS. They are capable of collecting data from other partners and creating derived products via collaborating objects. The sharing of data products is not directly possible: it is only possible via the objects. This guarantees that fundamental trust operations are applied to the data and provenance records are produced before the data product is made available to others. The use of Blockchain technology and Smart Contracts is one example of how this fundamental behavior can be planted into collaborative objects. As in trusted networks that use EAL-approved hardware and software components, the objects will have to undergo a similar assurance process.

Building blocks define capabilities that can be activated, deactivated, or ordered from other objects in the FACTS network. Even though the actual capabilities of a building block are subject to configuration (based on need), there are fundamental APIs that a building block must support. Considering that collaborative objects get distributed and executed via Kubernetes (https://kubernetes.io/) or Helm (https://helm.sh/), for example, there is a key requirement of trust. One approach to managing the fundamental trust in FACTS is The Update Framework (TUF) (https://theupdateframework.io/), which is called content trust in the Docker environment.
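
A minimal sketch of the trust gate that content trust / TUF implies is shown below: a collaborative object activates a building block only if its content digest matches one published through signed metadata. The building block name, digest value, and verification flow are illustrative assumptions, not the actual TUF or Docker API.

# Sketch of the content-trust gate: activate a roaming building block only
# if its digest matches one signed off in trusted, out-of-band metadata.
# Names and digest values are placeholders.
import hashlib

TRUSTED_DIGESTS = {
    # building block name -> sha256 digest published via signed metadata
    "bb-coverage-access": "sha256:9f2a...",  # placeholder value
}

def may_activate(name: str, image_bytes: bytes) -> bool:
    actual = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    expected = TRUSTED_DIGESTS.get(name)
    # Refuse building blocks whose content cannot be verified.
    return expected is not None and actual == expected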

When enhancing FACTS as an existing network of collaborative objects with additional capabilities, such as knowledge generation from Artificial Intelligence and ontologies, the provenance of and trust in training data must be considered. Without applying trust and provenance to AI training data, the best algorithm is useless, or even dangerous if trained with fraudulent data.

With FACTS, each participating entity can make available data or processing capabilities once they meet its quality and security requirements. The ability to use data and processing capabilities throughout FACTS strengthens the common capabilities because trustworthy capabilities can be chained on the fly. Additional capabilities, available via a Licensing Building Block, may support expressing re-use conditions.

Data products may become available to many participating entities. What is shared, and under which re-use conditions, is at the discretion of the creating entity. Anything that is shared must have re-use conditions to ensure proper and legitimate uptake. For example, leveraging the Creative Commons licensing framework allows an entity to waive all rights but also allows expressing concrete re-use conditions, ranging from simple attribution to non-commercial use or preventing derivative works.

The use of FACTS can be compared with middleware that ensures integrity, provenance, and trust of geospatial data products as created and distributed by any organization. For example, data products encoded as GMLJP2, GeoPDF, geodb, and GeoTIFF would be made available via collaborative objects and thereby benefit from collaboration in the entire network. By using FACTS, other EU member states could create derived work from the products generated. The fundamental trust capabilities of the collaborative objects ensure that any modification to a generated product can be detected, and the provenance capability enables tracing the lineage of derived products back to the original source. For ensuring the acceptance and interoperability of an agile reference architecture, built on top of FACTS with collaborative objects and building blocks, standardization is a key aspect. In particular, the core (fundamental) requirements for FACTS as well as the interfaces and capabilities of the collaborative objects and pluggable building blocks should be standardized.

Projects running in Europe, like the European Blockchain Services Infrastructure (EBSI) (https://ec.europa.eu/digital-building-blocks/wikis/display/EBSI/Home) and the Data Act (https://www.europarl.europa.eu/doceo/document/TA-9-2023-0069_EN.pdf), create the baseline to start revising the architecture of geospatial (and other) systems based at least on security and, if possible, to make the architecture more flexible. OGC provides a consensus-based collaborative standardization environment that fits this vision very well and could propose such concepts to find a way forward towards a proper discussion across the geospatial community.

4.2.  Requirements for next generation architecture

This section presents a possible system architecture that addresses the above use case and reflects the requirements for a next generation architecture. The architecture should support interaction between APIs (services) and data independently of pre-existing systems.

Figure 4 — possible reference architecture

Requirement 1

Statement

The new architecture should be, as much as possible, independent of data encodings. The encoding could just offer a file that uses Key Value Pairs (KVP) to present metadata that describes all the relevant information in a way that is similar to the DGIWG Metadata Foundation (DMF).
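
A KVP metadata file of the kind envisaged by this requirement could be as simple as the following sketch. The key names are invented for illustration and are not taken from the DMF.

# Hypothetical key=value metadata sidecar describing a data product in an
# encoding-independent way (keys are illustrative, not DMF-defined).
SAMPLE = """\
product.id=urn:example:product:42
product.crs=EPSG:4326
product.encoding=GeoTIFF
security.classification=UNCLASSIFIED
provenance.source=did:example:co-b
"""

def parse_kvp(text: str) -> dict:
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs

metadata = parse_kvp(SAMPLE)
print(metadata["product.encoding"])  # -> GeoTIFF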

Requirement 2

Statement

Building blocks should be as service-agnostic as possible (without specialization according to data types such as maps, features, and coverages).

Requirement 3

Statement

All sources of information should at least be defined in terms of trust and provenance to support the proper assessment of validity through a digital signature, as shown below:

  1. signature

  2. signature value, e.g.,

<gmljp2:extension>
    <!-- enveloped XML Digital Signature: binds integrity and authorship to the GMLJP2 metadata -->
    <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
      <ds:SignedInfo>
        <ds:CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
        <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
        <ds:Reference URI="">
          <ds:Transforms>
            <ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
            <ds:Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
              <ds:XPath xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:err="http://www.w3.org/2005/xqt-errors" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0" xmlns:gmljp2="http://www.opengis.net/gmljp2/2.0" xmlns:math="http://www.w3.org/2005/xpath-functions/math" xmlns:swe="http://www.opengis.net/swe/2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">/gmljp2:GMLJP2CoverageCollection/gmlcov:metadata[1]/gmlcov:Extension[1]/gmljp2:eopMetadata[1]</ds:XPath>
            </ds:Transform>
          </ds:Transforms>
          <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
          <ds:DigestValue>PozvauWPsaua10zZ0cfnw4cTJu4=</ds:DigestValue>
        </ds:Reference>
      </ds:SignedInfo>
      <ds:SignatureValue>
        SQhuJ0FQzHPh4I0VTgUdtvdNc9TREL7q2WyZb5FLby0XNPFZ6h9r/ZiukgUyrryGpLBqyOvGprE pv4+cvrurbZcUik7Z4BoN2hxNs9T35P92sMjf9BGiCy5dgxSho9sIL29Hf0u9b6rfoQAj03NC7FJ/rR1EGAN6T5AMK4bBT/iG/fNWfZKC9DimNwCLvezj3sryodrrl+D0RfOrU7mL7d7IMsV75g5uklz/kilosBaQbkek6R+UINP8bY+yv1SD+Imyii+xO17TU9FPRh9puEwLraauDm7RePPwZ4n5kdu2l5yg+/b1kRZAMbZIHWBbYslbMoEz21keRVjXeHjA==
      </ds:SignatureValue>
    </ds:Signature>
  </gmljp2:extension>

Requirement 4

Statement

All building blocks should support at least the same simple approach to the management of trust and provenance. Building blocks should implement a meta description enabling automatic activation on specific data sources (see OGC 20-089r1, OGC Best Practice for Earth Observation Application Package, for an example).

Requirement 5

Statement

Data should be discoverable and queryable depending on the IPT and releasability/accessibility.

Requirement 6

Statement

A schemaless data model would be needed. An example would be the OpenStreetMap (OSM) dataset mapped to a new data dictionary with an Excel spreadsheet and the schema then recovered automatically, as in the sketch below. Note that Safe Software FME (http://www.safe.com) and Hootenanny are examples of data conflation tools that can facilitate automated and semi-automated conflation of critical Foundation GEOINT features in the topographic domain. In short, such a tool merges multiple maps into a single seamless map.
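
The following minimal sketch shows the idea of automatic schema recovery: given OSM-style tagged features, the schema is derived by scanning the data rather than declared up front. The sample records are invented.

# Recovering a schema from schemaless, OSM-style tagged features.
features = [
    {"highway": "residential", "name": "Main St", "lanes": "2"},
    {"highway": "primary", "maxspeed": "50"},
    {"natural": "beach", "surface": "sand"},
]

def recover_schema(records):
    schema = {}
    for record in records:
        for key, value in record.items():
            # Collect every key seen and the value types it carries.
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

for key, types in sorted(recover_schema(features).items()):
    print(key, "->", ",".join(sorted(types)))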

Requirement 7

Statement

Contextual AI tools and other processes should follow the extraction guidelines used in the implementation of the processes themselves. These guidelines provide valuable information. As an example, consider the definition of a beach: on a shore, the area on which the waves break and over which shore debris (for example: sand, shingle, and/or pebbles) accumulates. Such definitions could enable the use of RDF/Turtle, as sketched below.
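
As a sketch of how such a definition could be expressed in RDF/Turtle, the following uses the rdflib Python library; the namespace, concept URI, and choice of SKOS vocabulary are illustrative assumptions.

# Encoding an extraction-guideline definition as RDF/Turtle with rdflib.
# The namespace and concept URI are invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/features/")
g = Graph()
g.bind("skos", SKOS)

beach = EX["Beach"]
g.add((beach, SKOS.prefLabel, Literal("beach", lang="en")))
g.add((beach, SKOS.definition, Literal(
    "On a shore, the area on which the waves break and over which "
    "shore debris (e.g., sand, shingle, and/or pebbles) accumulates.",
    lang="en")))

print(g.serialize(format="turtle"))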

Requirement 8

Statement

Trust and provenance validation could be implemented following a decentralized approach, with criteria ranging from a majority vote of the various nodes up to a 100% agreement requirement (see the sketch below). This can also work in networks configured for Denied, Degraded, Intermittent, or Limited Bandwidth (DDIL) environments, ensuring local IPT with further checking once reconnected to a node of the system.
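
A minimal sketch of the decentralized validation criterion, assuming one boolean vote per reachable node:

# Decentralized trust validation: accept a record when the confirming
# nodes meet a configurable quorum, from simple majority up to unanimity.
# In a DDIL environment the vote covers the locally reachable nodes only,
# with a re-check once connectivity to the wider system returns.
def validate(votes, quorum=0.5):
    if not votes:
        return False  # no reachable nodes: defer the decision
    if quorum >= 1.0:
        return all(votes)  # 100% agreement required
    return sum(votes) / len(votes) > quorum

# Majority of five nodes vs. a 100% requirement:
print(validate([True, True, True, False, False]))       # True (3/5 > 0.5)
print(validate([True, True, True, False], quorum=1.0))  # False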

Requirement 9

Statement

Building blocks should enable streaming of information. Signaling of information available in a stream to other building blocks should be active. This could happen in an asynchronous or synchronous way and/or in accumulation mode. In this context a building block could be considered as “Data as a Service”, for example.
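
The signaling behavior described above can be sketched with a simple asynchronous producer/consumer pattern; the asyncio queue stands in for whatever Pub/Sub or streaming technology an implementation would actually use.

# "Data as a Service" sketch: a building block streams items and actively
# signals subscribers; asyncio stands in for a real Pub/Sub broker.
import asyncio

async def producer(queue: asyncio.Queue):
    for item in ("obs-1", "obs-2", "obs-3"):
        await queue.put(item)    # signal: new information in the stream
    await queue.put(None)        # end-of-stream marker

async def consumer(queue: asyncio.Queue):
    while (item := await queue.get()) is not None:
        print("received", item)  # could also accumulate before processing

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())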

Requirement 10

Statement

Data should be discoverable and queryable depending on the IPT and releasability/accessibility.

Considering the above requirements, it can be assumed that an Identity (+ integrity), Provenance, and Trust (IPT) layering is a required step towards the next ARA, because the architecture should at least manage the IPT tuple. IPT management does not affect the open nature (if any) of the data and services but establishes a minimal liability for usage and further processing, e.g., the usage of a machine learning algorithm with information that has no provenance and comes from an untrusted source.

The above requirements lead to the following issues.

  1. Managing of different data types and services in distributed environments

  2. File system limitations with new virtualized environments (e.g., Docker containers)

  3. Data files and streaming

  4. Cloud/edge/local configuration.

Figure 6 — next generation architecture

4.3.  Building block definition — further considerations

In the TOGAF specification, “a Building Block is a package of functionality defined to meet business needs across an organization.” Each building block has a type that corresponds to the TOGAF metamodel. In this context, a building block (BB) is a “thing” (e.g., company, server, etc.) with well-defined interfaces, boundaries, and specifications to enable reusability. Moreover, BBs can be classified into Architecture and Solution BBs (technology/vendor aware); the first drives the development of the second. One or more building blocks can be integrated into existing or novel web applications. Each building block represents a testable interface component.

The BB definition should support interaction between data and services and provide at least the following parameters (dimensions, locations, domain, range values, and types (null and interpolation, etc.)) after checking the availability of information related to the BB via functions. The data container should be provided with a “description” enabling BBs to interact via a set of functions and determine possible workflows, as sketched below. This could also be achieved with algorithms integrated into the orchestrator (i.e., orchestration is the coordination and management of multiple computer systems, applications, and/or services, stringing together multiple tasks in order to execute a larger workflow or process (http://databricks.com)).
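
The interaction could look like the following sketch: each BB publishes a small self-description, and an orchestrator inspects these descriptions to determine possible workflows before invoking anything. All field names are hypothetical.

# Hypothetical self-descriptions that let an orchestrator determine
# possible workflows before invoking any building block.
BB_REGISTRY = {
    "bb-dem-access":  {"produces": "coverage", "domain": "elevation"},
    "bb-hillshade":   {"consumes": "coverage", "produces": "map"},
    "bb-map-publish": {"consumes": "map"},
}

def can_chain(upstream: str, downstream: str) -> bool:
    # Two BBs can be chained when the output type matches the input type.
    out = BB_REGISTRY[upstream].get("produces")
    needed = BB_REGISTRY[downstream].get("consumes")
    return out is not None and out == needed

# The orchestrator derives a workflow by walking compatible pairs.
print(can_chain("bb-dem-access", "bb-hillshade"))    # True
print(can_chain("bb-dem-access", "bb-map-publish"))  # False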

Considering experience acquired in the implementation of the OGC Web Processing Service (WPS) and Web Coverage Processing Service (WCPS) Standards, the following can be stated.

  1. WPS supports any kind of geoprocessing, whereas WCPS focuses on coverage processing.

  2. WPS consists only of a low-level framework for procedural invocation, whereas WCPS gives a high-level, concrete, and concise service specification.

  3. WPS specifies static services, whereas WCPS provides the flexibility of dynamic ad-hoc query formulation. In other words, a WPS extension requires client and server-side programming, whereas with WCPS this means composing a new string on the client side, without any changes to the server.

  4. WCPS supports the phrasing of analytically expressible algorithms, whereas WPS, by definition, is Turing complete. As experience shows, WCPS offers a high potential for automatic chaining and optimization, while WPS typically requires manual server-side intervention, such as code tuning in supercomputing centers (see the sketch after this list).
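
To make the contrast concrete, the sketch below composes a WCPS-style processing request as a simple client-side string; the endpoint, coverage name, and query syntax are placeholders and would depend on the actual server.

# Illustrative only: with a WCPS-style service, defining new processing
# means composing a new query string on the client side. The endpoint and
# coverage name are placeholders, not a real service.
import urllib.parse

endpoint = "https://example.org/wcps"
query = 'for $c in (ExampleCoverage) return encode($c + 10, "image/tiff")'
url = endpoint + "?" + urllib.parse.urlencode({"query": query})
print(url)  # a WPS equivalent would instead require server-side code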

Therefore, mechanisms for automation already exist and it can be assumed that they could be integrated with AI/ML. If there is an orchestrator (perhaps referring to a register of the available building blocks), then there is already a possible scenario for the next architecture. The above can only be implemented by retaining the flexibility of automatic updates, as stated before; otherwise the result is a static approach. This orchestrator can be considered as another API that is able to integrate different BBs.

To properly reflect the dynamic approach, an event-driven mechanism (see event-driven architecture, Pub/Sub) should be considered. This is reinforced by the fact that ever more streaming of information and algorithms can create active information signaling in a streaming environment.

This new approach redefines the way information is shared between data and services. It can be assumed that, first, an interaction space is required where data and building blocks can interact.

The building block, which could be defined using YAML, should include the following.

  1. Metadata (it could be DGIWG Metadata File)

  2. Specification (ogc-api reference)

  3. Configuration (possible data values or streaming to be handled, etc.).

The above can be managed by an admission webhook (e.g., a listener waiting for a new BB to be published, or a registry service) that validates and registers the building block.
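
A minimal sketch of such an admission step follows: a listener receives a candidate building block description consisting of the three parts listed above and registers it only if the description validates. The required keys and the registry are illustrative assumptions.

# Admission-webhook sketch: validate and register a building block
# description consisting of metadata, specification, and configuration.
REQUIRED_PARTS = ("metadata", "specification", "configuration")
REGISTRY: dict[str, dict] = {}

def admit(name: str, description: dict) -> bool:
    # Reject descriptions missing any of the three mandatory parts.
    if any(part not in description for part in REQUIRED_PARTS):
        return False
    REGISTRY[name] = description
    return True

ok = admit("bb-coverage-access", {
    "metadata": {"title": "Coverage access"},       # e.g., a DMF record
    "specification": "https://example.org/ogcapi",  # OGC API reference
    "configuration": {"streaming": True},
})
print(ok, list(REGISTRY))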

4.4.  Interaction space

The interaction space could be a software-defined, distributed object storage system and should operate with disconnected or limited connection capability.

The above elements should provide identity management, encryption, and possibly distribution.

The basic idea is to use W3C Decentralized Identifiers v1.0 (https://www.w3.org/TR/did-core/). As an example, Hyperledger Indy (https://www.hyperledger.org/projects/indy) provides tools, libraries, and reusable components for providing digital identities rooted on blockchains or other distributed ledgers so that they are interoperable across administrative domains, applications, and any other silo. Indy is interoperable with other blockchains or can be used standalone, powering the decentralization of identity.
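
For reference, a minimal W3C DID document has the shape sketched below; the did:example identifier follows the patterns of the DID Core specification and the key material is a placeholder.

# Shape of a minimal W3C DID document (placeholder identifiers and keys).
import json

did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123456789abcdefghi",
    "verificationMethod": [{
        "id": "did:example:123456789abcdefghi#key-1",
        "type": "Ed25519VerificationKey2020",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyMultibase": "zPLACEHOLDER",
    }],
    "authentication": ["did:example:123456789abcdefghi#key-1"],
}
print(json.dumps(did_document, indent=2))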

As an example, Hyperledger Fabric (https://www.hyperledger.org/projects/fabric) (for managing provenance) is intended as a foundation for developing applications or solutions with a modular architecture. Hyperledger Fabric allows components, such as consensus and membership services, to be plug-and-play. Fabric’s modular and versatile design satisfies a broad range of industry use cases. It offers a unique approach to consensus that enables performance at scale while preserving privacy. Using Hyperledger, the yaml file previously described could be substituted with a Smart Certificate.

While information security nowadays represents a core concern for any organization, Trust Management is usually less elaborated and is only important when two or more organizations cooperate towards a common objective.

For example, the overall Once-Only Principle Project (TOOP) (https://toop.eu/architecture) relies on the concept of trusted sources of information and on the existence of a secure exchange channel between the Data Providers and the Data Consumers in this interaction framework. Trust and information security are two cross-cutting concerns of paramount importance. These two concerns are overlapping, but not identical and they span all of the interoperability layers, from the legal down to the technical, passing through organizational and semantic layers.

While information security aims at the preservation of integrity, confidentiality, and availability of information, the establishment of trust guarantees that the origin and the destination of the data and documents are trustworthy (trustworthiness) and authentic (authenticity), and that data and documents are secured against any modification by untrusted parties (integrity).

Keep in mind that the above are just examples; it would be interesting to see different implementations based on other concepts and tools.

4.5.  Way ahead

The next-generation architecture should be based on Data Centric Security (DCS). The DCS concept is implemented through the adoption of the Identity, Provenance, and Trust (IPT) concept. The concept can be extended with Integrity, yielding I2PT. The proposed ecosystem, which will be based on W3C Decentralized Identifiers (DIDs) and, e.g., Hyperledger (https://www.hyperledger.org/projects/indy), will enable both data and building blocks. The entire architecture could be based on the Kubernetes (K8s) volume abstraction (https://kubernetes.io/docs/concepts/storage/volumes/), but it can be any other space where data and deployed APIs interact together. For data, W3C DIDs will be adopted to create identity and other elements for provenance and trust in order to compose a Smart Certificate. This is an active certificate and can be chained with any other data or BB through an orchestrator or registry. Building blocks (such as those of OGC API Standards) have to be compatible with the IPT ecosystem. This is easier because, by modifying the OGC Compliance & Interoperability Testing Environment (CITE), it will be possible to certify BBs as compatible with such an ecosystem, so that through a Smart Contract they can be registered in the main registry/orchestrator. In this way both data and BBs can be chained to perform specific operations to be proposed to users or to other services.

The ecosystem that provides agile processing is based on Smart Contracts that run on a Distributed Ledger, such as Hyperledger Fabric. The conditions in a Smart Contract enforce that the provenance of data processing gets recorded on the distributed ledger (Fabric nodes). As such, Hyperledger Fabric is concerned with making processing results transparent by capturing provenance. The use of DIDs (a W3C Recommendation) and Verifiable Attestations (VAs) is essential for establishing the integrity of assets, e.g., data products, metadata records, etc., basically, anything that can be hashed. The issuing of DIDs for users and the recording of immutable “Smart Certificates” (verifiable attestations plus some business logic) can be implemented using, e.g., Hyperledger Indy. The combination of Hyperledger Indy and Fabric builds the complete ecosystem to support agile IPT.
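
The ordering constraint described here (provenance committed to the ledger before the product becomes visible) can be sketched in plain Python. The ledger object below stands in for a Fabric network; it is not the Fabric chaincode API.

# Sketch of the Smart Contract ordering constraint: the provenance record
# is committed to the (stand-in) ledger BEFORE the product is published.
import hashlib

ledger = []      # stand-in for the distributed ledger nodes
published = {}   # stand-in for the product distribution channel

def publish_with_provenance(product_id: str, payload: bytes, activity: str):
    record = {
        "product": product_id,
        "activity": activity,
        "sha256": hashlib.sha256(payload).hexdigest(),  # hashable asset
    }
    ledger.append(record)            # step 1: immutable provenance first
    published[product_id] = payload  # step 2: only then publish

publish_with_provenance("urn:example:product:42", b"payload",
                        "orthorectification")
assert ledger and "urn:example:product:42" in published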

To clarify the basic architecture elements required to implement the next generation architecture, implementation examples are given below. The Kubernetes persistent volume example is used simply to show current capabilities; in the future, any space of interaction could be possible.

Figure 7 — w3c did sample architecture