Publication Date: 2021-01-13

Approval Date: 2020-12-14

Submission Date: 2020-11-16

Reference number of this document: OGC 20-019r1

Reference URL for this document: http://www.opengis.net/doc/PER/t16-D010

Category: OGC Public Engineering Report

Editor: Jeff Yutzler

Title: OGC Testbed-16: GeoPackage Engineering Report


OGC Public Engineering Report

COPYRIGHT

Copyright © 2021 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/

WARNING

This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.


1. Introduction

1.1. Executive Summary

The OGC GeoPackage Standard has grown substantially in popularity in the Geospatial community. The GeoPackage Encoding Standard was originally developed to provide an open, standards-based platform for transferring and using geospatial information which is platform-independent, portable, self-describing, and compact. In the Testbed-16 GeoPackage activity, the participants developed ways to:

  • Improve the interoperability of GeoPackages through better metadata, and

  • Improve the performance of extremely large GeoPackages so that the format itself is no longer the limit on the size of datasets that can be distributed.

Previous work suggested three specific GeoPackage limitations:

  1. Limited ability to discover what geospatial content is in a GeoPackage, which prevents a client from assessing the type of data it contains and determining how that data may be processed effectively,

  2. Lack of a standard ability to share portrayal information (styles and symbols) via GeoPackage, and

  3. Poor performance when loading and processing very large GeoPackage vector datasets in client software.

In Testbed-16, participants researched ways to mitigate these limitations, particularly in the context of the Ordnance Survey (OS) MasterMap Topography datasets. The Testbed activity also made use of OS Open Zoomstack, a smaller, freely available, multi-scale dataset. To address the first two limitations, Testbed participants developed GeoPackage metadata profiles designed to advance the discoverability of the contents of a GeoPackage and to exchange the OS portrayal styles and symbols. The metadata proved to be interoperable between the server and client implementations.

To address the third limitation, Testbed participants developed a profile of GeoPackage suitable for OS MasterMap data and performed controlled experiments on GeoPackages that used different techniques designed to improve performance. The profile was designed to improve performance by reducing overall GeoPackage size, segmenting data into smaller GeoPackages, and optimizing the ordering of the feature identifiers. OS MasterMap data proved to be an ideal test case due to its size and complexity. The participants demonstrated that large amounts of feature data could be distributed via the GeoPackage Encoding Standard in a manner that maximizes the efficiency of data access.

Based on the results of these tests, the participants recommend that these techniques be used by other GeoPackage-supporting software systems so that these techniques emerge as interoperable solutions. Standardization of these approaches may make it easier for other systems to be able to implement these new capabilities. As a result, the participants proposed a number of change requests and community extensions that are candidates for standardization as part of the GeoPackage ecosystem.

1.2. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Table 1. Contacts

Name            Organization    Role
Jeff Yutzler    Image Matters   Editor
Andrea Aime     GeoSolutions    Contributor
Adam Parsons    Compusult       Contributor

1.3. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

2. References

The following normative documents are referenced in this document.

Note

Only normative standards are referenced here, e.g. OGC, ISO or other SDO standards. All other references are listed in the bibliography.

3. Terms and definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.

● controlled vocabulary

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval and use. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorized terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction. The use of controlled vocabularies in standards such as CDB can significantly increase interoperability and consistent understanding of the semantics. Controlled vocabularies typically are managed through formal processes and official governance.

● coordinate reference system

coordinate system that is related to the real world by a datum (source: ISO 19111)

● enumeration

In computer programming, an enumerated type (also called enumeration) is a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. Similarly, in a database enumerated (enum) types are data types that comprise a static, ordered set of values. They are equivalent to the enum types supported in a number of programming languages. An example of an enum type might be the days of the week, or a set of status values for a piece of data.

● GeoJSON

a geospatial data interchange format based on JSON (source: RFC 7946)

● glob

a pattern set for wildcard characters

● multiplicity

an indication of how many objects may participate in the given relationship, or the allowable number of instances of the element

● stylable layer set

a collection of styles designed to be used within the same domain

3.1. Abbreviated terms

  • COP Common Operational Picture

  • DDIL Denied Disconnected Intermittent Limited

  • GEMINI GEo-spatial Metadata INteroperability Initiative

  • GML Geography Markup Language

  • GPKG GeoPackage

  • JSON JavaScript Object Notation

  • OS Ordnance Survey

  • OWS OGC Web Services

  • PNG Portable Network Graphics

  • SLD Styled Layer Descriptor

  • SVG Scalable Vector Graphics

  • UK United Kingdom

  • UML Unified Modeling Language

  • VTP2 OGC Vector Tiles Pilot, Phase 2

  • WFS Web Feature Service

  • WPS Web Processing Service

  • XML eXtensible Markup Language

3.2. Conventions

This Engineering Report uses a modified form of Unified Modeling Language (UML) to describe GeoPackage tables. Tables are represented as UML classes with the T label and columns are represented as UML attributes. The conventions used are as follows:

Figure 1. A green circle indicates a mandatory column.

Figure 2. A solid arrow indicates a foreign key relationship, with the arrow label indicating the name of the referencing column in the dependent table.

Figure 3. A dashed arrow indicates a table_name relationship, with the arrow label indicating the name of the referencing column in the referencing table.

Figure 4. Non-explicit references between tables are indicated with a dashed arrow with a box at its base.

4. Overview

Section 5 describes the operational scenario used to test and demonstrate the work of this testbed activity.

Section 6 describes the efforts to encode metadata in a GeoPackage using draft GeoPackage metadata profiles.

Section 7 describes the efforts to improve GeoPackage performance when loading, accessing, and rendering very large vector datasets.

Section 8 describes the implementations produced by each of the participants.

Section 9 describes feedback, recommendations, and future work.

Annex A provides performance measurements and other statistics.

Annex B provides draft GeoPackage Extensions, including the metadata profiles.

Annex C provides example metadata documents.

Annex D provides schemas including database schemas and JSON schemas for metadata documents.

Annex E contains the revision history.

Annex F contains the bibliography.

5. Scenario

In Testbed-16, the data (see Datasets) was loaded into a GeoServer instance provided by GeoSolutions. GeoSolutions also provided a WPS server that exported valid GeoPackages according to the various extensions evaluated in the Testbed-16 GeoPackage activity. Arrays present in the source data were converted to strings encoded as JSON arrays. The resulting GeoPackages were opened by the Compusult GeoPackage client.

GeoPackages were created with several types of metadata:

This metadata increased the utility of the resulting GeoPackages, as demonstrated through the Compusult GeoPackage client.

In addition to metadata, the GeoPackages were created using a number of techniques to improve GeoPackage performance.

  1. Reduce overall dataset size by replacing text strings with more compact enumerations.

  2. Segment the data into smaller GeoPackages so that the file for each segment was not too big.

  3. Improve the ordering of the feature data so that it could be accessed more quickly.
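The first technique can be pictured with a small sketch. The Python below is only an illustration of the general idea of replacing repeated text strings with compact integer codes plus a lookup; it is not the encoding used by the Testbed-16 participants, and the attribute values and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_enumeration(values: List[str]) -> Dict[str, int]:
    """Assign a small integer code to each distinct string, in first-seen order."""
    codes: Dict[str, int] = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return codes


def encode(values: List[str]) -> Tuple[List[int], Dict[int, str]]:
    """Return the encoded column plus the lookup needed to decode it."""
    codes = build_enumeration(values)
    lookup = {code: text for text, code in codes.items()}
    return [codes[value] for value in values], lookup


# Hypothetical attribute values; real MasterMap attributes are not reproduced here.
descriptive_group = ["Building", "Road Or Track", "Building", "Natural Environment", "Building"]
encoded, lookup = encode(descriptive_group)
decoded = [lookup[code] for code in encoded]
```

Each repeated string is stored once in the lookup, while the feature rows carry only the integer code, which is where the size reduction would come from.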

Note

An approach for generalizing data was also researched. While generalization was not relevant for MasterMap data, the process led to significant performance gains for Zoomstack, in some cases as much as a 300-fold improvement in read throughput.

GeoSolutions captured metrics on how GeoPackage size was affected by the GeoPackage extensions that were used in creation. GeoSolutions also conducted performance measurement of client read speed when the ordering strategy was used. Compusult conducted performance measurement of client read speed when the segmentation, ordering, and generalization strategies were used. While benchmarking is difficult due to the inconsistency of disk speeds and access time, both GeoSolutions and Compusult were able to measure dramatic improvements in client read performance in their tests.

5.1. Datasets

The Testbed 16 activities used two separate data sets provided by Ordnance Survey.

5.1.1. OS MasterMap Topography

OS MasterMap topography is "the most detailed and accurate view of Great Britain’s landscape – from roads to fields, to buildings and trees, fences, paths and more." This dataset is delivered as a large set of GML files and can be used with a set of four freely available style sets that are available as SLD. The MasterMap topography is to be displayed only at high scales (1:4000 and above). The dataset is very large: 50GB as compressed GML files, 300GB once imported into PostgreSQL, and 200GB as an encoded GeoPackage (after optimizations). See MasterMap Data for more details. Its size made this dataset ideal for tuning write and read performance, as well as for experimenting with different content layouts.

5.1.2. OS Zoomstack

OS Zoomstack is a single GeoPackage providing "a single, customisable map of Great Britain to be used at national and local levels." Quoting from the documentation, "OS Open Zoomstack is a comprehensive vector basemap covering Great Britain at a national level, right down to street-level detail." This GeoPackage can be used with a set of six freely available style sets encoded, among other formats, as SLD.

The Zoomstack data set, around 11GB in size (see Space used by table, sorted from larger to smaller tables for details), provided a multi-scale map. Due to the small size of this dataset, the data was suitable for quick tests. As evident in Column level size details, geometries are the major component of space utilization in this dataset, while potentially enumerated fields (type when present) use an amount of space that is one or two orders of magnitude smaller.

In Testbed-16, participants used Zoomstack in scenarios where the huge size of the MasterMap data was cumbersome. Unlike the MasterMap dataset, Zoomstack is a true multi-zoom level dataset, visible from the country level down to the road level, although it is not as detailed as MasterMap. The data is also a natural fit for the generalized tables extension.

6. Metadata

Being able to discover what geospatial content is contained in a GeoPackage enables a developer to quickly assess the type of data contained in a GeoPackage and to determine how to best process the content. There is currently no agreement on the meaning and significance of metadata in GeoPackage or how that metadata should be used to serve any particular purpose. Short of inspecting the row entries in the GeoPackage tables, a developer opening a GeoPackage has no way of recognizing whether the file has any particular type of associated metadata.

6.1. The Metadata Extension

As described in the GeoPackage Getting Started Guide and illustrated in Figure 5, the Metadata Extension is enabled by adding two rows into gpkg_extensions:

Figure 5. The GeoPackage Metadata Extension
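As a minimal sketch of what adding the two rows might look like, the Python snippet below uses the standard sqlite3 module against an in-memory database. The extension_name, definition, and scope values reflect our reading of the GeoPackage Metadata Extension and should be checked against the version of the standard in use; the simplified table definition and the helper name enable_metadata_extension are assumptions made for illustration.

```python
import sqlite3

# Simplified gpkg_extensions table (assumption: constraints trimmed for brevity).
GPKG_EXTENSIONS_DDL = """
CREATE TABLE IF NOT EXISTS gpkg_extensions (
  table_name TEXT,
  column_name TEXT,
  extension_name TEXT NOT NULL,
  definition TEXT NOT NULL,
  scope TEXT NOT NULL
);
"""

# Assumed values, taken from our reading of the GeoPackage Metadata Extension.
EXTENSION_NAME = "gpkg_metadata"
DEFINITION = "http://www.geopackage.org/spec121/#extension_metadata"
METADATA_TABLES = ("gpkg_metadata", "gpkg_metadata_reference")


def enable_metadata_extension(conn: sqlite3.Connection) -> None:
    """Add one gpkg_extensions row for each of the two metadata tables."""
    conn.executescript(GPKG_EXTENSIONS_DDL)
    conn.executemany(
        "INSERT INTO gpkg_extensions "
        "(table_name, column_name, extension_name, definition, scope) "
        "VALUES (?, NULL, ?, ?, 'read-write')",
        [(table, EXTENSION_NAME, DEFINITION) for table in METADATA_TABLES],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
enable_metadata_extension(conn)
registered = conn.execute(
    "SELECT table_name, scope FROM gpkg_extensions ORDER BY table_name"
).fetchall()
```

A real GeoPackage would also need the gpkg_metadata and gpkg_metadata_reference tables themselves; only the registration step is shown here.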

6.2. Semantic Annotations

Semantic annotations provide a way to represent the meaning of any business object (layer, feature, tile, style, etc.) through a resolvable URI. [1] In Testbed-16, semantic annotations were implemented in GeoPackage through the draft Semantic Annotations Extension. Semantic annotations may be placed on virtually any row in the GeoPackage.

6.3. Metadata Profiles

"Proposed GeoPackage Enhancements" cite:[Yutzler2019] proposed the concept of Metadata Profiles to meet this documented GeoPackage requirement. Profile creation consists of two parts:

  • Creating a new GeoPackage extension that defines a new “scope” (i.e., the gpkg_extensions.scope column) of "metadata".

  • Creating an extension for each metadata profile that describes the meaning and significance of a particular type of metadata.

Once the Metadata Extension is enabled, individual metadata profiles can be activated with additional rows in gpkg_extensions with a scope of "metadata". The Testbed-16 participants investigated implementing metadata profiles for OS MasterMap feature data. However, metadata profiles are also suitable for many other datasets, including raster and imagery datasets. As described in the following subsections, participants tested a number of these profiles during Testbed-16.
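The sketch below illustrates how a producer might activate a profile and how a client might then discover the active profiles by reading gpkg_extensions alone, without touching the metadata rows. The profile names, the definition URLs, the choice of gpkg_metadata as the table_name, and both function names are hypothetical; only the "metadata" scope value comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified gpkg_extensions table (assumption: constraints trimmed for brevity).
conn.execute(
    "CREATE TABLE gpkg_extensions ("
    "table_name TEXT, column_name TEXT, extension_name TEXT NOT NULL, "
    "definition TEXT NOT NULL, scope TEXT NOT NULL)"
)


def activate_metadata_profile(conn, extension_name, definition):
    """Register a metadata profile as a gpkg_extensions row with scope 'metadata'."""
    conn.execute(
        "INSERT INTO gpkg_extensions "
        "(table_name, column_name, extension_name, definition, scope) "
        "VALUES ('gpkg_metadata', NULL, ?, ?, 'metadata')",
        (extension_name, definition),
    )


def list_metadata_profiles(conn):
    """Return the names of all active metadata profiles, sorted alphabetically."""
    rows = conn.execute(
        "SELECT extension_name FROM gpkg_extensions "
        "WHERE scope = 'metadata' ORDER BY extension_name"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical profile names and definition URLs, for illustration only.
activate_metadata_profile(conn, "example_profile_provenance", "https://example.com/profiles/provenance")
activate_metadata_profile(conn, "example_profile_cop", "https://example.com/profiles/cop")
profiles = list_metadata_profiles(conn)
```

Because the profile is declared in gpkg_extensions, a client can decide whether it understands a GeoPackage's metadata before parsing any metadata document.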

6.4. Common Operational Pictures

One of the use cases identified in the OGC OWS Context GeoJSON Encoding Standard is the exchange of a set of resources as a Common Operational Picture (COP). In Testbed-16, COPs were encoded as metadata in the GeoPackage as per GeoPackage Common Operational Picture Metadata Profile. The design for this capability is illustrated in Figure 6.

Figure 6. The Common Operational Picture Metadata Profile

Since the OWS Context standard predates GeoPackage, OWS Context does not support referencing of GeoPackage layers. In Testbed-16, the participants extended OWS Context using the ideas first presented in "GeoPackage / OWS Context Harmonization Discussion Paper" cite:[Yutzler2018]. OWS Context GeoPackage Extension describes the extension used in Testbed-16 to extend OWS Context to represent GeoPackage contents.

In addition, semantic annotations were used to separate OWS Contexts representing COPs from OWS Contexts used for other purposes.

  • In gpkgext_semantic_annotations, a type of "im_metadata_cop_owc_geojson" was used and the title, description, and uri values were populated by the GeoPackage Producer.

  • In gpkgext_sa_reference, a row was added referencing the appropriate row in gpkg_metadata.
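The two steps above can be sketched as follows. The table definitions are a simplified assumption about the draft Semantic Annotations Extension (the column names type, title, description, uri, key_column_name, and key_column_value appear in this report; table_name, sa_id, and the column types are assumed), and the function names, the md_standard_uri value, and the example URI are hypothetical.

```python
import sqlite3

# Simplified schemas (assumption: column types and constraints are illustrative).
SCHEMA = """
CREATE TABLE gpkg_metadata (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE gpkgext_semantic_annotations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  uri TEXT NOT NULL
);
CREATE TABLE gpkgext_sa_reference (
  table_name TEXT NOT NULL,
  key_column_name TEXT,
  key_column_value INTEGER,
  sa_id INTEGER NOT NULL REFERENCES gpkgext_semantic_annotations(id)
);
"""

COP_TYPE = "im_metadata_cop_owc_geojson"  # annotation type named in the text
OWC_URI = "http://www.opengis.net/spec/owc-geojson/1.0/req/core"  # assumed value


def add_owc_document(conn, document_json):
    """Store an OWS Context GeoJSON document in gpkg_metadata and return its id."""
    return conn.execute(
        "INSERT INTO gpkg_metadata (md_standard_uri, mime_type, metadata) "
        "VALUES (?, 'application/geo+json', ?)",
        (OWC_URI, document_json),
    ).lastrowid


def tag_as_cop(conn, md_id, title, description, uri):
    """Annotate one gpkg_metadata row as a Common Operational Picture."""
    sa_id = conn.execute(
        "INSERT INTO gpkgext_semantic_annotations (type, title, description, uri) "
        "VALUES (?, ?, ?, ?)",
        (COP_TYPE, title, description, uri),
    ).lastrowid
    conn.execute(
        "INSERT INTO gpkgext_sa_reference "
        "(table_name, key_column_name, key_column_value, sa_id) "
        "VALUES ('gpkg_metadata', 'id', ?, ?)",
        (md_id, sa_id),
    )
    return sa_id


def find_cop_metadata_ids(conn):
    """Return the ids of the gpkg_metadata rows that are annotated as COPs."""
    rows = conn.execute(
        "SELECT r.key_column_value FROM gpkgext_sa_reference r "
        "JOIN gpkgext_semantic_annotations a ON a.id = r.sa_id "
        "WHERE a.type = ? AND r.table_name = 'gpkg_metadata' "
        "ORDER BY r.key_column_value",
        (COP_TYPE,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cop_id = add_owc_document(conn, "{}")    # placeholder document body
other_id = add_owc_document(conn, "{}")  # an OWS Context used for another purpose
tag_as_cop(conn, cop_id, "Example COP", None, "https://example.com/cop/1")
cop_ids = find_cop_metadata_ids(conn)
```

A client that only cares about COPs can then filter on the annotation type rather than parsing every OWS Context document.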

6.5. Dataset Details

UK GEMINI (GEo-spatial Metadata INteroperability Initiative) is a specification for a set of metadata elements that describe geospatial data resources. ISO 19115:2003 compliant UK GEMINI discovery level metadata is provided for the OS MasterMap data and can be found on the GIgateway®.

The following is a detailed description of the metadata elements that are provided on the GIgateway:

  • Title: The title of the product.

  • Abstract: The abstract gives a brief description of the product.

  • Currency: The currency takes the form of date of last update for the feature.

  • Lineage: The lineage metadata takes the form of product specification name and date of product specification.

  • Spatial extent: The spatial extent is supplied in the form of geographic identifiers (for example, England, Scotland, and Wales) and in the form of geographic coordinates.

  • Spatial reference system: The spatial reference system for all products takes the form of a British National Grid system, namely OSGB36®.

  • Data format: Data format takes the form of the name of the format or formats the product is supplied in.

  • Frequency of updates: Frequency of update takes the form of a stated period of time.

  • Distributor contact details: Distributor contact details include postal address, phone number, fax number, email address and website.

  • Data originator: Given as the company having primary responsibility for the intellectual content of the data source; in all cases this will be Ordnance Survey.

  • Other metadata available includes keywords, start date of data capture, access constraints, use constraints, level of spatial data, supply media and presentation details.

Since GEMINI metadata is available for OS MasterMap, adding that metadata to the GeoPackage was a natural choice. GeoPackage Gemini Metadata Profile describes how GEMINI metadata documents may be added to a GeoPackage as metadata. As illustrated in Figure 7, one row is added to gpkg_metadata for each GEMINI document and one row is added to gpkg_metadata_reference for each GEMINI document – feature table combination.

Figure 7. The GeoPackage Metadata Extension for GEMINI Metadata
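The row counts described above (one gpkg_metadata row per GEMINI document, one gpkg_metadata_reference row per document and feature table combination) can be sketched as follows. The table definitions follow our reading of the GeoPackage Metadata Extension; the md_standard_uri value, the feature table names, the placeholder document body, and the function name are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gpkg_metadata (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
  md_file_id INTEGER NOT NULL REFERENCES gpkg_metadata(id),
  md_parent_id INTEGER REFERENCES gpkg_metadata(id)
);
"""

# Assumption: GEMINI documents are ISO 19139 XML, so the gmd namespace is used here.
GEMINI_STANDARD_URI = "http://www.isotc211.org/2005/gmd"


def add_gemini_document(conn, gemini_xml, feature_tables):
    """Insert one metadata row, then one reference row per feature table."""
    md_id = conn.execute(
        "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
        "VALUES ('dataset', ?, 'text/xml', ?)",
        (GEMINI_STANDARD_URI, gemini_xml),
    ).lastrowid
    conn.executemany(
        "INSERT INTO gpkg_metadata_reference (reference_scope, table_name, md_file_id) "
        "VALUES ('table', ?, ?)",
        [(table, md_id) for table in feature_tables],
    )
    return md_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical feature table names and a placeholder document body.
tables = ["topographicarea", "topographicline", "topographicpoint"]
gemini_id = add_gemini_document(conn, "<gmd:MD_Metadata/>", tables)
reference_count = conn.execute(
    "SELECT COUNT(*) FROM gpkg_metadata_reference WHERE md_file_id = ?", (gemini_id,)
).fetchone()[0]
```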

6.6. Dataset Provenance

In Testbed-16, GeoPackages were produced via a Web Processing Service (WPS) instance.

A WPS Execute request describes all of the relevant parameters, including which datasets are to be loaded into the GeoPackage. In core GeoPackage, there is no direct link between contents and their source. However, the relationship may be described through metadata. This information may be useful when evaluating GeoPackage contents for fitness for purpose. This information could potentially enable in-field updates, such as when the source data is updated on a regular schedule.

There are a number of candidate encodings for the metadata including OWS Context. The OGC has adopted two encoding standards for OWS Context: Atom and GeoJSON. The Testbed-16 participants selected the GeoJSON encoding for this project because GeoJSON is easier for modern client applications to parse.

Note

"GeoPackage / OWS Context Harmonization Discussion Paper" cite:[Yutzler2018] proposed a relational encoding for OWS Context suitable for GeoPackage, but this approach stalled during Testbed-15.

The OWS Context GeoJSON Encoding specifies a number of offering types including WPS, WFS, and GML. In this metadata profile, an OWS Context document was populated with multiple resources:

  • A WPS resource indicating the WPS request that led to the creation of this GeoPackage

  • A WFS resource indicating the WFS instance that served the data after it was imported into GeoServer

6.6.1. Hierarchical Metadata

The GeoPackage Metadata Extension supports hierarchical metadata. Through hierarchical metadata, a GeoPackage client can identify metadata that pertains to the whole GeoPackage file and metadata that is relevant to a particular layer (e.g., feature table). In the Testbed-16 experiment, the metadata elements for the whole GeoPackage file included the WPS request that led to the creation of the GeoPackage. Further, the metadata elements for individual layers included descriptions of the WFS resources that served the data. Due to the hierarchical nature of the metadata, a GeoPackage client can navigate from the layer metadata to the metadata for the whole GeoPackage when needed.

Note

Since a WFS (or other data resource) serving GeoPackage data might not be available online, the layer level dataset provenance metadata should be considered optional.

In the OWS Context GeoJSON encoding, the context is represented by a GeoJSON FeatureCollection and individual resources (layers) are represented by GeoJSON Feature instances. In this metadata profile, the FeatureCollection is inserted into gpkg_metadata as a metadata document, but the GeoJSON Feature instances referring to the feature data (not the WPS instance) are removed. The Feature instances are subsequently inserted into gpkg_metadata as their own entries.

As illustrated in Figure 8, gpkg_metadata_reference is populated as follows:

  • One row with an md_file_id of the parent metadata document and a reference_scope of "geopackage".

  • One row for each GeoPackage contents table and resource Feature instance, with an md_file_id of that Feature instance, an md_parent_id of the parent metadata document, and a reference_scope of "table".

Figure 8. The Dataset Provenance Metadata Profile
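The splitting and referencing steps described in this subsection can be sketched as follows. This is one possible reading rather than the participants' implementation: the offering code used to recognize the WPS resource, the md_standard_uri value, the caller-supplied mapping from Feature id to table name, and all function names are assumptions.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE gpkg_metadata (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
  md_file_id INTEGER NOT NULL REFERENCES gpkg_metadata(id),
  md_parent_id INTEGER REFERENCES gpkg_metadata(id)
);
"""

# Assumed identifiers from the OWS Context GeoJSON Encoding.
OWC_URI = "http://www.opengis.net/spec/owc-geojson/1.0/req/core"
WPS_CODE = "http://www.opengis.net/spec/owc-geojson/1.0/req/wps"


def is_wps_resource(feature):
    """True when any offering of the Feature carries the (assumed) WPS code."""
    offerings = feature.get("properties", {}).get("offerings", [])
    return any(offering.get("code") == WPS_CODE for offering in offerings)


def insert_document(conn, document):
    """Store a JSON document in gpkg_metadata and return its row id."""
    return conn.execute(
        "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
        "VALUES ('dataset', ?, 'application/geo+json', ?)",
        (OWC_URI, json.dumps(document)),
    ).lastrowid


def store_provenance(conn, context, table_for_feature):
    """Split the context into parent and layer documents and reference them."""
    layer_features = [f for f in context["features"] if not is_wps_resource(f)]
    # The parent keeps only the WPS resource; layer Features are stored separately.
    parent = dict(context, features=[f for f in context["features"] if is_wps_resource(f)])
    parent_id = insert_document(conn, parent)
    conn.execute(
        "INSERT INTO gpkg_metadata_reference (reference_scope, md_file_id) "
        "VALUES ('geopackage', ?)",
        (parent_id,),
    )
    for feature in layer_features:
        child_id = insert_document(conn, feature)
        conn.execute(
            "INSERT INTO gpkg_metadata_reference "
            "(reference_scope, table_name, md_file_id, md_parent_id) "
            "VALUES ('table', ?, ?, ?)",
            (table_for_feature[feature["id"]], child_id, parent_id),
        )
    return parent_id


# Hypothetical context with one WPS resource and two WFS-backed layers.
context = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "wps-request", "properties": {"offerings": [{"code": WPS_CODE}]}},
        {"type": "Feature", "id": "layer-a", "properties": {"offerings": []}},
        {"type": "Feature", "id": "layer-b", "properties": {"offerings": []}},
    ],
}
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
parent_id = store_provenance(conn, context, {"layer-a": "table_a", "layer-b": "table_b"})
```

With this layout, a client can start from a table's reference row and follow md_parent_id back to the document that describes the whole GeoPackage.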

6.6.2. Schema and Examples

6.6.3. Semantic Annotations

In addition, semantic annotations were used to separate OWS Contexts representing dataset provenance from OWS Contexts used for other purposes.

  • In gpkgext_semantic_annotations, a type of "im_metadata_dp_owc_geojson" was used and the title, description, and uri values were populated by the GeoPackage Producer.

  • In gpkgext_sa_reference, a row was added referencing the appropriate row in gpkg_metadata.

6.7. Portrayal

OS provides portrayal information for MasterMap in a GitHub repository. This repository contains styling rules using the OGC Styled Layer Descriptor (SLD) format and symbol information in both the Scalable Vector Graphics (SVG) and Portable Network Graphics (PNG) formats.

GeoPackage does not have core support for portrayal information, but recent efforts including the Vector Tiles Pilot, Phase 2 have led to a Portrayal Community Extension. Figure 9 illustrates how portrayal information can be added to GeoPackage through this extension.

Figure 9. The GeoPackage Portrayal Extension

6.7.1. Styles

The Portrayal Extension defines two tables for style information, gpkgext_styles and gpkgext_stylesheets. While this table structure allows a style to have multiple encodings, this feature was not used in Testbed-16.

These tables were populated as follows:

gpkgext_styles

  • id primary key

  • style a text name for the file

  • description null

  • uri the URL of the file landing page on GitHub

gpkgext_stylesheets

  • id primary key

  • style_id gpkgext_styles.id

  • format application/vnd.ogc.sld+xml;version=1.0

  • stylesheet the actual file
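As a minimal sketch of the population rules listed above, the snippet below stores one SLD stylesheet. The column names come from the lists above; the column types, the function name, the style name, the URL, and the placeholder SLD content are assumptions for illustration.

```python
import sqlite3

# Column names follow the lists above; types and constraints are assumed.
SCHEMA = """
CREATE TABLE gpkgext_styles (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  style TEXT NOT NULL,
  description TEXT,
  uri TEXT
);
CREATE TABLE gpkgext_stylesheets (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  style_id INTEGER NOT NULL REFERENCES gpkgext_styles(id),
  format TEXT NOT NULL,
  stylesheet BLOB NOT NULL
);
"""

SLD_FORMAT = "application/vnd.ogc.sld+xml;version=1.0"


def add_sld_style(conn, name, landing_page_url, sld_bytes):
    """Insert a style row (null description) and its single SLD encoding."""
    style_id = conn.execute(
        "INSERT INTO gpkgext_styles (style, description, uri) VALUES (?, NULL, ?)",
        (name, landing_page_url),
    ).lastrowid
    conn.execute(
        "INSERT INTO gpkgext_stylesheets (style_id, format, stylesheet) VALUES (?, ?, ?)",
        (style_id, SLD_FORMAT, sld_bytes),
    )
    return style_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical name and URL; the bytes stand in for a real SLD file.
style_id = add_sld_style(
    conn,
    "topographicarea-backdrop",
    "https://example.com/styles/topographicarea-backdrop.sld",
    b"<StyledLayerDescriptor/>",
)
stored = conn.execute(
    "SELECT s.style, ss.format, ss.stylesheet FROM gpkgext_styles s "
    "JOIN gpkgext_stylesheets ss ON ss.style_id = s.id WHERE s.id = ?",
    (style_id,),
).fetchone()
```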

6.7.2. Symbols

The Portrayal Extension defines three tables for symbol information: gpkgext_symbols, gpkgext_symbol_images, and gpkgext_symbol_content. While this table structure supports multiple encodings for each symbol and encoding of multiple symbols into a single file as sprites, neither of these approaches was used in Testbed-16.

These tables were populated as follows:

gpkgext_symbols

  • id primary key

  • symbol a text name for the file

  • description null

  • uri the URL of the file landing page on GitHub

gpkgext_symbol_images

  • id primary key

  • symbol_id gpkgext_symbols.id

  • content_id gpkgext_symbol_content.id

  • others null

gpkgext_symbol_content

  • id primary key

  • format "image/svg+xml"

  • content the actual file

  • uri the URL of the file on GitHub
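The three symbol tables can be populated along the same lines. In this sketch the column names come from the lists above, while the column types, the function names, and the example values are assumptions; the gpkgext_symbol_images columns that were left null in Testbed-16 are omitted.

```python
import sqlite3

# Column names follow the lists above; types are assumed and unused columns omitted.
SCHEMA = """
CREATE TABLE gpkgext_symbols (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  symbol TEXT NOT NULL,
  description TEXT,
  uri TEXT
);
CREATE TABLE gpkgext_symbol_content (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  format TEXT NOT NULL,
  content BLOB NOT NULL,
  uri TEXT
);
CREATE TABLE gpkgext_symbol_images (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  symbol_id INTEGER NOT NULL REFERENCES gpkgext_symbols(id),
  content_id INTEGER NOT NULL REFERENCES gpkgext_symbol_content(id)
);
"""


def add_svg_symbol(conn, name, landing_page_url, file_url, svg_bytes):
    """Insert a symbol, its SVG content, and the image row linking the two."""
    symbol_id = conn.execute(
        "INSERT INTO gpkgext_symbols (symbol, description, uri) VALUES (?, NULL, ?)",
        (name, landing_page_url),
    ).lastrowid
    content_id = conn.execute(
        "INSERT INTO gpkgext_symbol_content (format, content, uri) "
        "VALUES ('image/svg+xml', ?, ?)",
        (svg_bytes, file_url),
    ).lastrowid
    conn.execute(
        "INSERT INTO gpkgext_symbol_images (symbol_id, content_id) VALUES (?, ?)",
        (symbol_id, content_id),
    )
    return symbol_id


def load_symbol(conn, name):
    """Return (format, content) for a named symbol by following the image row."""
    return conn.execute(
        "SELECT c.format, c.content FROM gpkgext_symbols s "
        "JOIN gpkgext_symbol_images i ON i.symbol_id = s.id "
        "JOIN gpkgext_symbol_content c ON c.id = i.content_id "
        "WHERE s.symbol = ?",
        (name,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical symbol name and URLs; the bytes stand in for a real SVG file.
add_svg_symbol(
    conn,
    "example-symbol",
    "https://example.com/symbols/example-symbol",
    "https://example.com/symbols/example-symbol.svg",
    b"<svg xmlns='http://www.w3.org/2000/svg'/>",
)
loaded = load_symbol(conn, "example-symbol")
```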

The Portrayal Extension does not define an explicit coupling between layers and styles. (Coupling, in this context, refers to association of styles with layers.) Without any additional design elements, coupling is completely the responsibility of the user and/or client application as illustrated in Figure 10. In many scenarios, having the GeoPackage explicitly declare the coupling between layers and styles simplifies operations for the GeoPackage client operator. In Testbed-16, this coupling was done using semantic annotations.

Figure 10. GeoPackage Portrayal without Coupling to Layers

As illustrated in Figure 11, a semantic annotation can be applied to both a features table and a style (a row in gpkgext_styles). This links the table and the style together.

Figure 11. Semantic Annotations for OS MasterMap portrayal

In Testbed-16, semantic annotations were established for styles in a GeoPackage through the following:

  1. Ensure required tables are present as per the Semantic Annotations Extension:

    • gpkgext_sa_reference

    • gpkgext_semantic_annotations

  2. Populate gpkg_extensions with references to all tables mentioned above.

  3. Add a row to gpkgext_semantic_annotations for every style’s semantic annotation

    • type: Style

    • title: gpkgext_styles.style

    • description: gpkgext_styles.description

    • uri: gpkgext_styles.uri

  4. Add a row to gpkgext_sa_reference for every row that must be annotated.

    • The style in gpkgext_styles

    • Every table in gpkg_contents that is allowed to be used with that style (key_column_name and key_column_value are null).
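Steps 3 and 4 above, together with the lookup a client might perform, can be sketched as follows. The table definitions are simplified assumptions about the draft extensions (column types in particular), and the function names, style names, table names, and URLs are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gpkgext_styles (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  style TEXT NOT NULL,
  description TEXT,
  uri TEXT
);
CREATE TABLE gpkgext_semantic_annotations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  type TEXT NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  uri TEXT NOT NULL
);
CREATE TABLE gpkgext_sa_reference (
  table_name TEXT NOT NULL,
  key_column_name TEXT,
  key_column_value INTEGER,
  sa_id INTEGER NOT NULL REFERENCES gpkgext_semantic_annotations(id)
);
"""


def couple_style(conn, style_id, table_names):
    """Create a 'Style' annotation and reference the style row and each table."""
    style, description, uri = conn.execute(
        "SELECT style, description, uri FROM gpkgext_styles WHERE id = ?", (style_id,)
    ).fetchone()
    sa_id = conn.execute(
        "INSERT INTO gpkgext_semantic_annotations (type, title, description, uri) "
        "VALUES ('Style', ?, ?, ?)",
        (style, description, uri),
    ).lastrowid
    references = [("gpkgext_styles", "id", style_id, sa_id)]
    # Table-level references leave key_column_name and key_column_value null.
    references += [(name, None, None, sa_id) for name in table_names]
    conn.executemany(
        "INSERT INTO gpkgext_sa_reference "
        "(table_name, key_column_name, key_column_value, sa_id) VALUES (?, ?, ?, ?)",
        references,
    )
    return sa_id


def styles_for_table(conn, table_name):
    """Return the names of the styles coupled to a contents table."""
    rows = conn.execute(
        "SELECT s.style "
        "FROM gpkgext_sa_reference layer_ref "
        "JOIN gpkgext_semantic_annotations a ON a.id = layer_ref.sa_id "
        "JOIN gpkgext_sa_reference style_ref ON style_ref.sa_id = a.id "
        "JOIN gpkgext_styles s ON s.id = style_ref.key_column_value "
        "WHERE a.type = 'Style' AND layer_ref.table_name = ? "
        "AND layer_ref.key_column_name IS NULL "
        "AND style_ref.table_name = 'gpkgext_styles' "
        "ORDER BY s.style",
        (table_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical styles and tables.
backdrop = conn.execute(
    "INSERT INTO gpkgext_styles (style, uri) VALUES ('area-backdrop', 'https://example.com/styles/area-backdrop.sld')"
).lastrowid
standard = conn.execute(
    "INSERT INTO gpkgext_styles (style, uri) VALUES ('area-standard', 'https://example.com/styles/area-standard.sld')"
).lastrowid
couple_style(conn, backdrop, ["topographicarea"])
couple_style(conn, standard, ["topographicarea"])
area_styles = styles_for_table(conn, "topographicarea")
```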

6.7.4. Stylable Layer Sets

There are a number of scenarios where multiple styles or stylesheets are designed to be used together. In the Testbed-16 scenario, there are four stylesheets (standard, backdrop, light, outdoor) for each of the six feature types. Since a client user may wish to change the style for every layer at once, there needs to be a mechanism to aggregate these styles together. In Testbed-16, this was done through stylable layer sets. As developed in the OGC Vector Tiles Pilot, Phase 2 (VTP2), a stylable layer set is created through a semantic annotation and both the styles and layers are tagged with that annotation.

In Testbed-16, semantic annotations were established for stylable layer sets in a GeoPackage through the following:

  1. Ensure required tables are present as per the Semantic Annotations Extension:

    • gpkgext_sa_reference

    • gpkgext_semantic_annotations

  2. Populate gpkg_extensions with references to all tables mentioned above

  3. Add a row to gpkgext_semantic_annotations for every stylable layer set

    • type: StylableLayerSet

    • title: "backdrop", "light", "outdoor", or "standard"

    • description: null

    • uri: like a gpkgext_styles.uri, but with the filename replaced by /stylableLayerSet/[title]

  4. Add a row to gpkgext_sa_reference for every row that must be annotated

    • Every style in gpkgext_styles that is associated with that stylable layer set

    • Every table in gpkg_contents that is associated with that stylable layer set (key_column_name and key_column_value are null)
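
As a rough sketch of how a client might resolve a stylable layer set, the following Python example tags styles and layers with a StylableLayerSet annotation and then retrieves the styles belonging to one set. The table definitions are simplified stand-ins, and the base URI, the style naming pattern and the helper `styles_in_set` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins; consult the extensions for the normative definitions.
CREATE TABLE gpkgext_styles (
  id INTEGER PRIMARY KEY, style TEXT NOT NULL, description TEXT, uri TEXT NOT NULL);
CREATE TABLE gpkgext_semantic_annotations (
  id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT NOT NULL, title TEXT NOT NULL,
  description TEXT, uri TEXT NOT NULL);
CREATE TABLE gpkgext_sa_reference (
  table_name TEXT NOT NULL, key_column_name TEXT, key_column_value INTEGER,
  sa_id INTEGER NOT NULL REFERENCES gpkgext_semantic_annotations(id));
""")

BASE_URI = "http://example.com/styles"         # hypothetical base URI
LAYERS = ["boundaryline", "topographicline"]   # subset of the six feature types
SET_TITLES = ["backdrop", "light", "outdoor", "standard"]

for title in SET_TITLES:
    # One annotation per stylable layer set.
    sa_id = conn.execute(
        "INSERT INTO gpkgext_semantic_annotations (type, title, description, uri) "
        "VALUES ('StylableLayerSet', ?, NULL, ?)",
        (title, f"{BASE_URI}/stylableLayerSet/{title}"),
    ).lastrowid
    for layer in LAYERS:
        # Hypothetical style row for this layer and set.
        style_id = conn.execute(
            "INSERT INTO gpkgext_styles (style, description, uri) VALUES (?, NULL, ?)",
            (f"{layer}_{title}", f"{BASE_URI}/{layer}_{title}.sld"),
        ).lastrowid
        # Tag the style and the layer table with the set's annotation.
        conn.execute(
            "INSERT INTO gpkgext_sa_reference VALUES ('gpkgext_styles', 'id', ?, ?)",
            (style_id, sa_id))
        conn.execute(
            "INSERT INTO gpkgext_sa_reference VALUES (?, NULL, NULL, ?)",
            (layer, sa_id))


def styles_in_set(conn, title):
    """Return the names of all styles tagged with the given stylable layer set."""
    rows = conn.execute("""
        SELECT s.style
        FROM gpkgext_semantic_annotations a
        JOIN gpkgext_sa_reference r ON r.sa_id = a.id
        JOIN gpkgext_styles s
          ON r.table_name = 'gpkgext_styles' AND s.id = r.key_column_value
        WHERE a.type = 'StylableLayerSet' AND a.title = ?
        ORDER BY s.style""", (title,))
    return [row[0] for row in rows]


light_styles = styles_in_set(conn, "light")
```

Switching every layer to another look then amounts to calling `styles_in_set` with a different title.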

7. Large Vector Datasets

7.1. Baseline

During Testbed-16, the participants had access to the OS MasterMap Topography dataset. The dataset is licensed and may not be redistributed without permission. The download of the OS MasterMap Topography dataset comprises 50GB of gzipped GML files, allocated to 10 folders. The XML schemas for the GML files can be found here.

The data is meant to be rendered using the SLD version provided at the Topography Layer stylesheets. Each feature type is associated with 4 different variations of the styles, as illustrated by the following figures.

Table 2. Examples of Topography Layer Stylesheets. Contains OS data © Crown Copyright and database right 2020.

Backdrop 1 Backdrop

Light 1 Light

Outdoor 1 Outdoor

Standard 1 Standard

The styles are structurally similar. However, they each use different symbology for lines, points and polygons. In particular, the styles all have scale dependencies activating symbolization between presentation scales of 1:1 and 1:4000. No data are rendered at lower scales. Because the range of visible scales is so small, the portrayed map is quite similar to a printed product.

7.2. Importing the Data

The GML content can be imported into GeoPackage using any of a variety of tools. In particular, there are tools specifically dedicated to the transformation of the GML datasets into a PostgreSQL database, with PostGIS extensions:

  • Astun’s Loader

  • Lutra Consulting’s OS Translator II

Both tools import directly into PostGIS and generate the same database structure, but the Lutra translator proved to be significantly faster. The database schemas are described in Database Schemas. Attributes with multiplicity higher than one have been translated into PostgreSQL array types.

7.3. Exporting the Data

For the most part, exporting the data was straightforward. However, a significant portion of the source data is in the form of arrays of either enumerations or date strings. In particular, the MasterMap arrays contain two types of data:

  • Dates, expressed as "YYYY-MM-DD".

  • Enumerated values, which can be expressed as compact enumerations (integers).

The GML loaders turn these two types into PostgreSQL arrays. Since SQLite databases do not support array column types, some translations were required to flatten arrays into JSON-encoded strings.

The Testbed participants chose to accomplish this translation by leveraging SQLite’s limited support for JSON through the JSON1 extension. These JSON arrays were used to hold either enumerations or date strings. The enumerations were as described above. The date strings were described by an appropriate glob [1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]. The SQLite GLOB operator matches text values against a pattern using wildcards. The following two examples show JSON representations of arrays:

  • "changedate" ["2000-11-10", "2002-08-09", "2003-05-20", "2003-05-20", "2017-03-01", "2018-09-14"]

  • "reasonsforchange" (enumerated) [2, 0, 3, 4, 2, 2]

Storing the arrays as JSON arrays comes with a few benefits:

  • The JSON format is well-known and widely supported.

  • If the extension is present in the SQLite build, the contents of the arrays can be queried via SQL.

  • The overhead of storing JSON as a string is limited to the additional element identifiers {}[] and delimiter ,.
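
To illustrate the second benefit, the following sketch queries JSON-encoded arrays with the JSON1 `json_each` table-valued function and validates date elements with the glob quoted above. It assumes a SQLite build that includes JSON1; the table layout and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical feature table with two JSON-encoded array columns.
conn.execute(
    "CREATE TABLE topographicarea ("
    "fid INTEGER PRIMARY KEY, changedate TEXT, reasonsforchange TEXT)")
conn.executemany(
    "INSERT INTO topographicarea VALUES (?, ?, ?)",
    [
        (1, '["2000-11-10", "2002-08-09", "2018-09-14"]', "[2, 0, 3]"),
        (2, '["2003-05-20"]', "[2]"),
    ],
)

DATE_GLOB = "[1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]"

# Features with at least one change date on or after 2018-01-01.
# ISO-8601 dates compare correctly as plain strings.
recent = [row[0] for row in conn.execute("""
    SELECT DISTINCT t.fid
    FROM topographicarea AS t, json_each(t.changedate) AS d
    WHERE d.value >= '2018-01-01'
    ORDER BY t.fid""")]

# Number of array elements that do not look like a date.
bad_dates = conn.execute("""
    SELECT COUNT(*)
    FROM topographicarea AS t, json_each(t.changedate) AS d
    WHERE NOT (d.value GLOB ?)""", (DATE_GLOB,)).fetchone()[0]
```

Because `json_each` expands each array into rows, ordinary SQL predicates can be applied to individual elements without a custom array type.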

Using the "application/json" content type plus an additional constraint representing either the enumeration (for the array contents) or a glob identifying the column as a date container, the JSON columns are exposed as such through a modified form of the schema extension. By relaxing requirements 103 and 106 of the existing Schema Extension, participants were able to express the fact that arrays were encoded in this manner. Requirement 103 describes the columns of the gpkg_data_columns table. The revised requirement allows the media type "application/json" to be used for JSON arrays. Requirement 106 describes how constraints are defined for columns. The revised requirement allows constraints to apply to each element of a JSON array when a JSON array is declared as described above. The modified Schema Extension is presented in an annex.

7.4. Reducing GeoPackage Size

GeoPackage read and write speed appears to decay with GeoPackage size. Since a number of factors affect SQLite performance, including disk speed and record proximity, defining an exact formula for speed is difficult. Anecdotal information suggests something like O(log n) for reads and O(n log n) for building the spatial index. While SQLite is reasonably fast for datasets of moderate size, performance does begin to become a problem when datasets are in the tens of gigabytes or more. This issue is evident when working with MasterMap, which is over 200GB. The full MasterMap GeoPackage file, as generated by Lutra Translator II, from GML to PostGIS, and then down to GeoPackage via ogr2ogr, is 235GB in size. One way to improve GeoPackage performance is to reduce the size of the file.

Note

The GeoPackage, as generated by OGR, is already significantly smaller than the PostgreSQL tables. This is due to the SQLite database design. In PostgreSQL, record contents are aligned for fast access to specific columns. For example, a 32-bit integer always uses 4 bytes, no matter what the actual number is. On the other hand, SQLite is designed to use the minimum possible space, so numbers are stored in the exact space they need. For example, 1 is stored as a single byte, while 1024 requires two bytes of storage.

7.4.1. Enumerations

Many attributes from the source OS data originally stored as strings can be expressed as enumerations (code lists). The GeoPackage schema extension allows encoding these enumerations as integers or single characters. Their descriptions are found in the gpkg_data_column_constraints table. Following are some examples of how GeoPackage data is encoded without and with enumerations.

Table 3. Feature properties without and with enumerations
Without Enumerations | With Enumerations
["Administrative Boundaries"] | [1]
2.5m | 1
["Modified", "Attributes", "New", "Position", "Modified", "Modified"] | [2, 0, 3, 4, 2, 2]
["Political Or Administrative"] | [17]
["Parish"] | [164]
Boundary | 0
Parish Boundary | 0

The complete set of enumerations used can be found in Enumerations. Using this approach reduced the MasterMap GeoPackage size from 245GB to 206GB, an improvement of roughly 20%.
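
The encoding shown in Table 3 can be sketched as follows. The integer codes for "Attributes", "Modified", "New" and "Position" are taken from the table; the constraint name `reasonforchange_enum`, the reduced column set of gpkg_data_column_constraints and the helper functions are assumptions made for this illustration, which also relies on the JSON1 extension being available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for gpkg_data_column_constraints (the real table has more columns).
conn.execute(
    "CREATE TABLE gpkg_data_column_constraints ("
    "constraint_name TEXT NOT NULL, constraint_type TEXT NOT NULL, "
    "value TEXT, description TEXT)")

# Codes as they appear in Table 3; the constraint name is hypothetical.
CODES = {"Attributes": 0, "Modified": 2, "New": 3, "Position": 4}
conn.executemany(
    "INSERT INTO gpkg_data_column_constraints "
    "VALUES ('reasonforchange_enum', 'enum', ?, ?)",
    [(str(code), label) for label, code in CODES.items()],
)


def encode(labels):
    """Replace each string label by its integer code and serialize as JSON."""
    return json.dumps([CODES[label] for label in labels])


def decode(conn, encoded):
    """Expand a JSON array of codes back into labels via the constraints table."""
    rows = conn.execute("""
        SELECT c.description
        FROM json_each(?) AS e
        JOIN gpkg_data_column_constraints AS c
          ON c.constraint_name = 'reasonforchange_enum'
         AND c.value = CAST(e.value AS TEXT)
        ORDER BY e.key""", (encoded,))
    return [row[0] for row in rows]


original = ["Modified", "Attributes", "New", "Position", "Modified", "Modified"]
encoded = encode(original)
decoded = decode(conn, encoded)
```

The descriptions stay in the GeoPackage, so a client can always recover the original strings while the feature rows store only the compact codes.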

7.4.2. Segmentation

MasterMap is meant to be displayed at scales down to 1:4000, but not lower. As a result, when displaying maps, most of the time is spent locating the tiny portion of data required. Splitting the dataset along a grid can be leveraged as a way to reduce data access. An index GeoPackage spatially indexes the individual GeoPackages in the grid, while also providing all the styling and metadata information required to present the full dataset. Under these conditions, rendering a map should require opening and processing the spatial indexes over a small number of files. In particular, the common case will be to read data from a single GeoPackage, while the worst case will be to open at most four, for any conceivable map display at 1:4000.
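
The "at most four" bound can be checked with a small calculation. The sketch below assumes a regular 100 km grid in projected coordinates (metres) and a map extent narrower than one grid square; the function name and the sample extents are hypothetical.

```python
import math

GRID_SIZE = 100_000  # grid square edge length in metres (assumed)


def grid_cells(min_x, min_y, max_x, max_y, size=GRID_SIZE):
    """Return the (column, row) indexes of all grid squares a bounding box touches."""
    cols = range(math.floor(min_x / size), math.floor(max_x / size) + 1)
    rows = range(math.floor(min_y / size), math.floor(max_y / size) + 1)
    return {(col, row) for col in cols for row in rows}


# A 1 km wide extent well inside one square touches a single GeoPackage.
inside = grid_cells(530_000, 180_000, 531_000, 181_000)

# The same size of extent centred on a grid corner touches four squares.
corner = grid_cells(499_500, 199_500, 500_500, 200_500)
```

Any extent smaller than a grid square can cross at most one vertical and one horizontal grid line, which is why four files is the worst case.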

In response, the large MasterMap Topography dataset was split into smaller GeoPackages as described in Segmentation. These GeoPackages are available in the GeoServer osmm workspace, which is password protected (please ask for credentials). Each GeoPackage generates 6 layers in the GeoServer services. For example, in TQ.gpkg we have:

osmm:tq_boundaryline
osmm:tq_cartographicsymbol
osmm:tq_cartographictext
osmm:tq_topographicarea
osmm:tq_topographicline
osmm:tq_topographicpoint

This amounts to a grand total of 336 layers. There are also layer groups for each zone, one for each possible styling of the six layers, for a grand total of 230 layer groups. For example, the TQ zone has:

tq_backdrop
tq_light
tq_outdoor
tq_standard

To take advantage of this segmentation, Testbed-16 participants produced the GeoPackage Index extension which supports an Index GeoPackage containing an index table. A client of the GeoPackage Index extension should be able to find all the metadata information about a feature entry in the main GeoPackage, including attribute information, geometry, schema, proper metadata, styles, and the overall bounding box of the layer. With that information, clients are able to locate which files contain records for a given feature entry and bounding box. If a particular geometry spans multiple GeoPackages, it is encoded in all of them. A key column named fid was used to identify the duplicates.

gpkg index shared feature
Figure 12. Encoding a feature across multiple segments

Each feature entry should have separate information for the following reasons:

  • Not all feature entries span the full set of sub-packages

  • Feature entries do not necessarily share the same coordinate reference system.

The idea is borrowed from [MapServer’s ogrtindex tool](https://mapserver.org/optimization/tileindex.html).
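
A client-side lookup against such an index could look like the following sketch. The index table name follows the gpkgext_<main_table_name>_index convention used in this report, but its columns (`geopackage`, `min_x`, `min_y`, `max_x`, `max_y`) and the sample rows are assumptions for illustration; the extension itself defines the actual structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout of a per-table index: one row per sub-GeoPackage.
conn.execute(
    "CREATE TABLE gpkgext_topographicline_index ("
    "geopackage TEXT, min_x REAL, min_y REAL, max_x REAL, max_y REAL)")
conn.executemany(
    "INSERT INTO gpkgext_topographicline_index VALUES (?, ?, ?, ?, ?)",
    [
        ("TQ.gpkg", 500_000, 100_000, 600_000, 200_000),
        ("TL.gpkg", 500_000, 200_000, 600_000, 300_000),
        ("SU.gpkg", 400_000, 100_000, 500_000, 200_000),
    ],
)


def files_for_bbox(conn, min_x, min_y, max_x, max_y):
    """Return the sub-GeoPackages whose extent intersects the requested box."""
    rows = conn.execute("""
        SELECT geopackage
        FROM gpkgext_topographicline_index
        WHERE max_x >= ? AND min_x <= ? AND max_y >= ? AND min_y <= ?
        ORDER BY geopackage""", (min_x, max_x, min_y, max_y))
    return [row[0] for row in rows]


# A small extent straddling the boundary between the TQ and TL squares.
hits = files_for_bbox(conn, 530_000, 199_500, 531_000, 200_500)
```

After reading the matching files, a client would still need to drop duplicate features that span several segments, for example by keeping one record per fid.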

7.4.3. Dates

Dates in GeoPackage are expressed as ISO-8601 strings. The OS MasterMap database contains a number of dates, making them significant from the point of view of space usage. Participants considered two candidate approaches for reducing the storage space required for these data.

  1. Express dates as integer Unix epochs. A database view could then be set up to expand the integer to the expected ISO-8601 string through the STRFTIME function. However, the current 10-byte date representation is not significantly larger than 8-byte long integers. While a more compact representation optimized for dates is possible, this approach would require an extension to transform the compact representation back to the ISO-8601 string expected by clients. This would potentially limit, or make more complex, the ability to perform queries on those fields. The space considerations would have been different had the dates been timestamps.

  2. Expressing dates as enumerations. Since each date value is potentially used by many entries in the database, each date could be turned into an enumeration using the ISO-8601 form as the description of the value. However, this would have to be tested to determine whether the cost of the join negates the benefits of the enumeration.

Participants ultimately chose not to pursue either approach and to accept the status quo.
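
For reference, the first approach (which was not adopted) could be sketched as below: the date is stored as an integer Unix epoch and a view expands it back to an ISO-8601 string with SQLite's strftime function. The table, view and column names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical storage table holding the date as seconds since 1970-01-01 UTC.
conn.execute(
    "CREATE TABLE feature_raw (fid INTEGER PRIMARY KEY, changedate_epoch INTEGER)")

epoch = int(datetime(2018, 9, 14, tzinfo=timezone.utc).timestamp())
conn.execute("INSERT INTO feature_raw VALUES (1, ?)", (epoch,))

# The view exposes the ISO-8601 string that clients expect.
conn.execute("""
    CREATE VIEW feature AS
    SELECT fid,
           strftime('%Y-%m-%d', changedate_epoch, 'unixepoch') AS changedate
    FROM feature_raw""")

iso = conn.execute("SELECT changedate FROM feature WHERE fid = 1").fetchone()[0]
```

Queries against the view see strings, but filters on the date would be evaluated on the computed value, which is one way the approach could complicate querying.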

7.4.4. Geometry Encoding

As illustrated by the Zoomstack column sizes, geometries generated the largest columns. The following alternatives were considered and discarded:

  • FlatGeobuf: This approach is not better in terms of size (and is probably slightly worse due to the alignment constraint).

  • TinyWKB: This approach reduces the accuracy of the data.

  • TopoJSON: This approach requires reorganizing not just the geometry, but entire feature records.

  • MVT: This approach reduces the accuracy of the data.

7.5. Controlling records layout via sorting

When accessing a spatial subset of the full GeoPackage dataset, SQLite performs two operations as part of an SQL query:

  • Traverse the R-Tree to locate all geometries intersecting the target bounding box.

  • Fetch from disk the relevant records.

These operations are faster if the number of disk read operations is reduced. The participants were able to accomplish this by sorting the data using a GeoHash and inserting the features in the sorted order. The following images borrowed from the postgis.net workshop illustrate the concept:

Table 4. Spatially uncorrelated vs correlated information.

clustering3

Un-balanced R-tree

clustering4

Balanced R-Tree

Table 5. Retrieving features from disk

clustering1

Correlated features scattered on the disk

clustering2

Correlated features stored close on disk

The images above might give the impression that proper correlation is important primarily for spinning disks. While it is certainly more important on spinning disks, proper co-location also has benefits for Solid State Disks (SSD). This is due to the minimum unit of disk access (the page), typically 4 to 8 kilobytes of data. A single page can host a number of features, especially if the features are small. If spatially correlated features are also close on the disk, there is a higher chance that they fit into the same page, or a small number of pages, reducing the amount of physical I/O required. This also plays a role in the Operating System (OS) file system caches, which can then host a larger number of relevant features given the same amount of physical RAM.
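
The sorting idea can be sketched with a plain GeoHash encoder. This is an illustration only: it assumes each feature is represented by a single longitude/latitude point, whereas the Testbed data is in EPSG:27700 and the details of the actual sort key computation are not described here. The sample features are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lon, lat, precision=8):
    """Encode a longitude/latitude pair as a GeoHash string."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, code, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, halving the range each time.
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        code <<= 1
        if value >= mid:
            code |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits form one base-32 character
            chars.append(BASE32[code])
            code, bits = 0, 0
    return "".join(chars)


# Hypothetical features: (fid, lon, lat) of a representative point.
features = [
    (1, -0.1276, 51.5072),   # London
    (2, -3.1883, 55.9533),   # Edinburgh
    (3, -0.1200, 51.5100),   # London, close to feature 1
]

# Insert in GeoHash order so that nearby features end up in nearby rows.
ordered = sorted(features, key=lambda f: geohash(f[1], f[2]))
ordered_fids = [f[0] for f in ordered]
```

Because a GeoHash interleaves coordinate bits, features that are close on the ground usually share a prefix and therefore sort next to each other.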

7.6. Generalization

GeoPackage performance degrades as the size of the relevant tables increases. To mitigate this decrease in performance, Testbed-16 participants developed a mechanism for generalizing the largest tables for use at smaller scales. (This approach would work for ZoomStack data but not for MasterMap because MasterMap data is only designed for use at 1:4000 scale.) The generalized tables contain fewer features and simpler geometries than the base tables. While adding generalized tables does increase the overall GeoPackage size, the incremental size increase is minor, approximately a few percent. Through this technique, read performance was improved by a factor of 2-8X for data at a scale of 1:40,000 and as much as 300X for data at a scale of 1:200,000. See Generalization for how this was implemented.

8. Implementations

8.1. GeoSolutions

GeoSolutions used the GeoTools library and GeoServer software application. The GeoTools library gt-gpkg module provided the basic library to read and write GeoPackages as well as support for the official GeoPackage spatial indexing extension. The GeoServer GeoPackage WPS community module (gs-gpkg) provided the ability to generate GeoPackages from layers available in GeoServer, with detailed control on what to include in the resulting GeoPackage.

gs layer list
Figure 13. GeoServer layers. Each of them can be included in a GeoPackage via the WPS GeoPackage process.

Both the gt-gpkg and gs-gpkg modules were significantly improved during Testbed-16. GeoSolutions donated these improvements to the respective communities. The improvements started appearing in GeoTools 24 and GeoServer 2.18, released September 2020, and are fully included in GeoTools 25 and GeoServer 2.19, to be released March 2021.

8.1.1. Initial status

The GeoTools gt-gpkg module was able to read and write both vector and raster GeoPackages and had already received performance tuning for reading vector data. The only supported extension was the R-Tree spatial index.

The GeoServer gs-gpkg community module provided GeoPackage output formats for the WMS, WFS and WPS endpoints, in particular:

  • The WMS output format produced GeoPackages with tiled maps.

  • The WFS output format produced GeoPackages with the requested feature entries (one or more depending on the GetFeature request).

  • The WPS request allowed the generation of complete GeoPackages with both raster and vector contents. In particular, it supported the following:

    • Enumerating the GeoServer layers included in the output GeoPackage.

    • For each layer, control of:

      • The inclusion of raster map tiles versus raw vector data.

      • The bounding box spatially filtering the data.

      • Input query (filter and property names).

      • Reprojection of the output.

      • Optional generation of an R-Tree spatial index.

8.1.2. Importing MasterMap topography

For the Testbed 16 activity, OS MasterMap topography was delivered as a set of 1694 gzip-compressed GML files with a volume of over 50GB. The first challenge was to translate these files into a GeoPackage. There are two open source tools dedicated to the import of MasterMap GML files:

  • Astun’s Loader: A set of Python scripts orchestrating the GDAL/OGR libraries and converting them to a format of choice.

  • Lutra Consulting’s OS Translator II: A QGIS plugin optimized for speed.

The import was performed using OS Translator II to load the files into PostGIS. Then a dedicated tool was written to generate GeoPackages from the PostGIS content. Rationale:

  • OS Translator II focuses on speed thus allowing a full data transfer into PostGIS in a matter of hours. The Astun Loader tool would have taken days.

  • Various MasterMap Topography attributes are multi-valued. PostgreSQL supports a native array type to host the attributes. GeoPackage does not have equivalent support (an approach to represent arrays in GeoPackage was developed during the Testbed).

  • The translation between PostGIS and GeoPackage provided a good test harness for experimenting with data transfer speed improvements at different scales, starting from a small portion of the data and up to the entire data sets.

8.1.3. Write performance enhancement

A major portion of the GeoSolutions activity involved writing large GeoPackages, ranging from a few GB for a ZoomStack subset to the 200GB required to store the entire MasterMap topography dataset in a single GeoPackage. A number of improvements to the GeoPackage writing code allowed for speeding up the WPS export of GeoPackages, making the export process up to 4 times faster compared to the original code. The improvements were:

  • Upgrade to the latest version of the SQLite JDBC driver.

  • Use of prepared statements and statement batching during record insertion. In particular, given the large size of the transfer, the batch size was set to ten thousand statements.[2]

  • Specifically set up the SQLite database for one-off generation, including enforcing exclusive access and turning off both the transaction journal and synchronous writes. Further information can be found by exploring the SQLite pragmas.

  • Some export operations involved reading from a GeoPackage data source as well. In these cases the data source was set to read-only and memory mapping enabled, to speed up the reading side.
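
The Testbed implementation uses Java and the SQLite JDBC driver; the following Python sketch only illustrates the same SQLite settings with the standard sqlite3 module. The pragmas are standard SQLite pragmas, while the table layout, the helper names and the memory-map size are assumptions. Turning off the journal and synchronous writes trades crash safety for speed, which is acceptable only for one-off generation.

```python
import itertools
import os
import sqlite3
import tempfile

BATCH_SIZE = 10_000  # statements per batch, as in the text above


def bulk_load(path, rows):
    """Write rows into a fresh database tuned for one-off generation."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA locking_mode = EXCLUSIVE")  # single writer, no lock churn
    conn.execute("PRAGMA journal_mode = OFF")        # no rollback journal
    conn.execute("PRAGMA synchronous = OFF")         # do not wait for disk syncs
    conn.execute("CREATE TABLE feature (fid INTEGER PRIMARY KEY, name TEXT)")
    iterator = iter(rows)
    while batch := list(itertools.islice(iterator, BATCH_SIZE)):
        with conn:  # one transaction per batch
            # executemany reuses a single prepared statement for the whole batch.
            conn.executemany("INSERT INTO feature (fid, name) VALUES (?, ?)", batch)
    conn.close()


def open_read_only(path, mmap_bytes=256 * 1024 * 1024):
    """Open an existing database read-only with memory mapping enabled."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    conn.execute(f"PRAGMA mmap_size = {mmap_bytes}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.gpkg")
    bulk_load(path, ((i, f"feature {i}") for i in range(25_000)))
    reader = open_read_only(path)
    row_count = reader.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
    reader.close()
```

The generator passed to `bulk_load` is consumed in three batches here (two full batches and one partial), so memory use stays bounded regardless of the input size.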

8.1.4. GeoPackage extension framework and implementation

As part of the GeoTools core functionality, the GeoPackage module initially supported only the R-Tree spatial indexing extension. The Testbed required the implementation of a variety of extensions, some well known, as part of the core GeoPackage standard. In addition, new extensions were developed during the Testbed to experiment with new functionality. This work required a new extension point in the GeoTools library that would allow for a variety of extensions to be implemented while allowing them to be plugged in. This kept the core module small.

Thus, a new base class GeoPkgExtension was added to the GeoPackage module, providing the basic functionality shared by all extensions registered in the GeoPackage. These include:

  • Providing the extension identifier, definition, scope.

  • Checking if the extension is registered in the current GeoPackage.

  • Listing which tables/columns are associated with the extension.

The GeoPackage class can look up extensions using the extension identifier. The extensions can be registered as plugins using the Java Service Provider Interface. This allowed placing the extensions in different modules based on their state of evolution:

  • The schema extension and the metadata extension, part of the core GeoPackage standard, were placed directly in the supported gt-gpkg module.

  • The portrayal, semantic annotation and generalized tables extensions were placed in the unsupported gs-gpkg module, placing no backward compatibility requirement on them and leaving the door open for future breaking changes.

Ultimately the gs-gpkg module is the sole user of all the above extensions, accessing and using them to generate the GeoPackages as required.
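
The kind of functionality shared by all extensions can be illustrated with a short Python sketch against the gpkg_extensions table. This is not the GeoTools GeoPkgExtension API, which is a Java class; the `Extension` class, its method names and the sample rows are hypothetical, and the definition values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpkg_extensions (
  table_name TEXT, column_name TEXT, extension_name TEXT NOT NULL,
  definition TEXT NOT NULL, scope TEXT NOT NULL);
-- Sample registrations; the definition values are placeholders.
INSERT INTO gpkg_extensions VALUES
  ('topographicline', 'geom', 'gpkg_rtree_index', '(definition URL)', 'write-only'),
  ('gpkg_data_columns', NULL, 'gpkg_schema', '(definition URL)', 'read-write');
""")


class Extension:
    """Hypothetical base class keyed by the extension identifier."""

    def __init__(self, name):
        self.name = name

    def is_registered(self, conn):
        """Check whether the extension is registered in this GeoPackage."""
        row = conn.execute(
            "SELECT 1 FROM gpkg_extensions WHERE extension_name = ? LIMIT 1",
            (self.name,)).fetchone()
        return row is not None

    def associations(self, conn):
        """List the (table, column) pairs associated with the extension."""
        return conn.execute(
            "SELECT table_name, column_name FROM gpkg_extensions "
            "WHERE extension_name = ? ORDER BY table_name", (self.name,)).fetchall()


rtree = Extension("gpkg_rtree_index")
rtree_registered = rtree.is_registered(conn)
rtree_tables = rtree.associations(conn)
metadata_registered = Extension("gpkg_metadata").is_registered(conn)
```

Concrete extensions would subclass such a base and add their own table handling, which is what keeps the core module small.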

8.1.5. Metadata generation

The metadata and semantic annotation extension support was used to include the following information in the GeoPackages:

  • Any metadata linked from GeoServer layers.

  • A list of layers, with styles, as an OGC Web Services Context (OWC) document, allowing the client to re-construct a usable map.

  • The request that generated the GeoPackage, thus providing provenance and ancestry information.

GeoServer layers cannot contain metadata by themselves. However, they can link to existing online metadata.

gs layer metadata
Figure 14. GeoServer layer metadata links

When a layer links to metadata, the GeoPackage generator downloads the link contents, and places them in the metadata table, making it available for offline consumption. The entry is then associated with the pertinent layer using the metadata reference.

In a similar way, the WPS request gets recorded, stored in the metadata table, and associated to the entire GeoPackage as "Dataset provenance" via semantic annotation. In order to allow update of the GeoPackage contents, one metadata entry is also associated to each layer, providing back-links to a WFS service offering the layer.

The portrayal extension is used to store relevant styles and their resources (e.g. icons) in the GeoPackage. Each style associated to the layer in the GeoServer configuration is included in the GeoPackage:

gs layer styles
Figure 15. GeoServer layer style associations

Finally, a layer might be included in a "Layer group", which is the definition of a set of layers, their stacking order and the style to be used. If the layers included in the GeoPackage are part of one or more layer groups, the definition of the group is translated into an OWC operational picture document. This provides the list of layers, their stacking order, and the style to be used. In the case of MasterMap Topography, this results in 4 entries in the GeoPackage, one for each of the four style sets shared by Ordnance Survey.

gs layer groups
Figure 16. GeoServer layer group definitions

8.1.6. Generalized Tables Extension

When datasets are large, it is common to serve them in a multi-scale fashion, where detailed information is only available at high scales and lower scales depict only the most important features. Introduced as part of this Testbed, the generalized tables extension sets up parallel tables with generalized geometries and reduced record contents. These are meant to be displayed at lower scales, providing the client with a faster access path to the information needed for map rendering. When properly set up, this leads to a significant difference in read speed.

Following is an excerpt of the WPS request creating two generalized tables for the woodland layer:

Creating two generalized tables for the woodland layer
<features name="woodland" identifier="woodland">
  <description>woodland</description>
  <srs>EPSG:27700</srs>
  <featuretype>oszoom:woodland</featuretype>
  <indexed>true</indexed>
  <styles>true</styles>
  <overviews>
    <overview>
      <name>woodland_g1</name>
      <scaleDenominator>80000</scaleDenominator>
      <filter xmlns:fes="http://www.opengis.net/fes/2.0">
        <fes:Or>
          <fes:PropertyIsEqualTo>
            <fes:ValueReference>type</fes:ValueReference>
            <fes:Literal>National</fes:Literal>
          </fes:PropertyIsEqualTo>
          <fes:PropertyIsEqualTo>
            <fes:ValueReference>type</fes:ValueReference>
            <fes:Literal>Regional</fes:Literal>
          </fes:PropertyIsEqualTo>
        </fes:Or>
      </filter>
    </overview>
    <overview>
      <name>woodland_g2</name>
      <scaleDenominator>320000</scaleDenominator>
      <filter xmlns:fes="http://www.opengis.net/fes/2.0">
        <fes:PropertyIsEqualTo>
          <fes:ValueReference>type</fes:ValueReference>
          <fes:Literal>National</fes:Literal>
        </fes:PropertyIsEqualTo>
      </filter>
    </overview>
  </overviews>
</features>

The woodland_g1 table is meant to be used in place of the base woodland table starting at a scale of 1:80,000. This table only contains National and Regional woodlands. The woodland_g2 table is meant to be used starting at a scale of 1:320,000 and only contains National woodlands.

For performance reasons, each generalized table is generated from the previous one. This leverages the progressive filtering work already done including the geometry generalization. For more information refer to the generalized tables extension. Full examples of generalized tables WPS requests can be found in the appendix.
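
On the reading side, a client has to pick the right table for the current map scale. A minimal sketch of that choice follows, using the table names and scale denominators from the request excerpt above; the function name and the way the overviews are held in memory are assumptions.

```python
# (table name, scale denominator from which the table applies)
OVERVIEWS = [("woodland_g1", 80_000), ("woodland_g2", 320_000)]


def table_for_scale(base_table, overviews, scale_denominator):
    """Pick the most generalized table whose threshold has been reached."""
    chosen = base_table
    for name, min_denominator in sorted(overviews, key=lambda o: o[1]):
        if scale_denominator >= min_denominator:
            chosen = name
    return chosen


detail = table_for_scale("woodland", OVERVIEWS, 10_000)
regional = table_for_scale("woodland", OVERVIEWS, 200_000)
national = table_for_scale("woodland", OVERVIEWS, 500_000)
```

At 1:10,000 the base table is read, while at 1:200,000 and 1:500,000 the client switches to the first and second generalized table respectively.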

8.1.7. Controlling records layout via sorting

To implement the strategy described in Controlling records layout via sorting, the GeoPackage process was extended to sort data before insertion into SQLite, according to one or more sort keys. This is important because SQLite generates the primary key of the table based on insertion order and preserves such order in the final file layout. In particular, GeoPackages with 3 layouts were generated:

  • Using the natural order of the data as they are loaded. These data already show some spatial correlation, but it is incomplete.

  • Sorting by the geometry field, which causes the GeoPackage process to compute a GeoHash of the geometry and sort based on that sort key.

  • Sorting over a non spatially correlated field, like a road name, or internal identifier (MasterMap TID).

The following example shows a WPS request excerpt, sorting the "topographicline" layer by geometry, in other words, by GeoHash:

Sorting topographicline by GeoHash
   <features name="topographicline" identifier="topographicline">
     <description>boundaryline</description>
     <srs>EPSG:27700</srs>
     <featuretype>osmm:topographicline</featuretype>
     <sort xmlns:fes="http://www.opengis.net/fes/2.0">
       <fes:SortProperty>
         <fes:ValueReference>wkb_geometry</fes:ValueReference>
       </fes:SortProperty>
     </sort>
     <indexed>true</indexed>
     <styles>true</styles>
   </features>

The following instead sorts the topographicline layer by fid (also known as TID in MasterMap Topography):

Sorting topographicline by an alphanumeric, non spatially correlated identifier.
   <features name="topographicline" identifier="topographicline">
     <description>boundaryline</description>
     <srs>EPSG:27700</srs>
     <featuretype>osmm:topographicline</featuretype>
     <sort xmlns:fes="http://www.opengis.net/fes/2.0">
       <fes:SortProperty>
         <fes:ValueReference>fid</fes:ValueReference>
       </fes:SortProperty>
     </sort>
     <indexed>true</indexed>
     <styles>true</styles>
   </features>

8.1.7.1. Benchmarking the GeoHash sorting

In order to provide a server side comparison between the original data order, as imported from GML, and GeoHash sorted GeoPackages, a GeoServer WMS benchmark has been run, drawing the maps from MasterMap.

The benchmarking is defined as follows:

  • Over 4000 unique random WMS requests have been created, all over the UK land, to be issued during the benchmark.

  • JMeter is used to generate an increasingly high number of parallel WMS requests picked from the above list, starting with a single request, and progressing through 2, 4, 8, 16, 32 and 64 concurrent requests.

  • Each set of concurrent requests generates statistics, such as response time statistics and throughput, measured as requests per second.

Table 6. Request area samples, starting from distribution at the country level and going to higher detail
benchmark locations low
benchmark locations mid
benchmark locations high

The benchmarks were executed under two sets of conditions:

  • "Hot benchmark", with the whole suite repeated, in sequence, several times, until the performance values settle to stable results. The system has enough memory to allow the Operating System (OS) file system cache to fully contain all the data needed. Thus, no input/output (I/O) is observed during the load test.

  • "Cold benchmark", in which all the OS caches have been forcefully dropped, making GeoServer read each data bit directly from the disk. Continuous data read from disk has been verified using system utilities (iotop).

The hardware used for the tests was a desktop machine (developer workstation) with the following specifications:

  • AMD Ryzen 1700x, 8 physical cores, with hyperthreading (the OS sees 16 virtual cores). The CPU frequency governor has been forced to "performance" mode, meaning the CPU cores run at their highest frequency all the time (instead of scaling down to lower frequencies when there is less load).

  • 32GB memory.

  • Internal 512GB SSD, NVME connection, Samsung 960 PRO. This disk was not used in benchmarks due to lack of free space.

  • Internal 2TB spinning disk, Seagate Barracuda, with 64MB of internal cache (used in the second benchmark).

  • External SanDisk Extreme 500GB disk, connected via a USB3 port (used in the first benchmark).

8.1.7.1.1. SSD Drive

The following table and chart report the throughput (requests per second) in the various cases, when using the external SanDisk Extreme SSD drive:

Table 7. GeoServer WMS benchmark results, SSD drive
Concurrent requests | Original order, "cold" | Original order, "hot" | GeoHash sorted, "cold" | GeoHash sorted, "hot"
1 | 7.36 | 13.29 | 14.38 | 16.91
2 | 10.38 | 19.94 | 23.67 | 25.99
4 | 23.34 | 35.09 | 44.79 | 46.55
8 | 37.55 | 51.46 | 56.62 | 58.24
16 | 43.36 | 61.14 | 61.11 | 61.59
32 | 41.75 | 45.99 | 58.82 | 60.24
64 | 41.67 | 49.75 | 59.35 | 61.33

gs benchmark ssd
Figure 17. GeoServer WMS benchmark results chart, SSD drive

Observations

  • The data with original sorting is consistently slower than the data with GeoHash sorting.

  • The package in original order sees a significant performance difference between cold and hot, while the GeoHash sorted package shows virtually no difference between them.

  • Even in hot mode, the package with the original order does not manage to reach the performance of the GeoHash sorted one. We speculate this could be an indication that even when fully cached, the distribution of data in the cache is less efficient, likely causing more OS calls than the GeoHash one.

8.1.7.1.2. Hard Disk

To evaluate the importance of efficient disk storage, a second set of benchmarks was run against the internal Seagate spinning disk. This disk has good sequential read performance but, like most spinning disks, suffers on random access.

Running the same benchmark required hours, making it impossible to run a "hot" version of it. As on most modern machines, a number of other background activities happen over the span of several hours, which invalidated the cache:

Table 8. GeoServer WMS benchmark results, requests per second, spinning disk drive
Concurrent requests | Original order, "cold" | GeoHash sorted, "cold"
1 | 0.48 | 2.44
2 | 0.26 | 1.75
4 | 0.48 | 2.46
8 | 0.59 | 3.30
16 | 0.75 | 4.11
32 | 0.76 | 4.34
64 | 0.85 | 2.20

gs benchmark spinning
Figure 18. GeoServer WMS benchmark results chart, requests per second, on spinning disk drive.

Observations

  • The performance hit compared to the SSD is severe: even the GeoHash sorted package is an order of magnitude slower (4.3 r/s tops vs 61.5 r/s tops).

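  • The performance ratio between the GeoHash and the original package has been magnified. Based on the values in Table 8, it ranges from roughly 2.6 times (at 64 concurrent requests) to 6.7 times (at 2 concurrent requests).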

8.1.7.1.3. Maximum Response Times

It is also interesting to see a comparison of the maximum response time recorded during this spinning disk benchmark, at various load levels:

Table 9. GeoServer WMS benchmark results, maximum response time in milliseconds, spinning disk drive
Concurrent requests   Original order, "cold"   GeoHash sorted, "cold"
1                     27102                    1104
2                     84874                    5086
4                     141793                   10381
8                     347901                   12273
16                    394209                   27039
32                    561954                   50626
64                    744749                   102795

Observations

As expected, the performance of the original GeoPackage proved insufficient to make map browsing practical. However, the introduction of GeoHash ordering improved performance to the point that it is bearable even for extremely large GeoPackages.

8.1.8. GeoPackage Index extension

GeoSolutions prepared an example of a GeoPackage index and the associated GeoPackage parts. The main GeoPackage was split along the 100 km UK national grid, depicted in the following image.

Figure 19. UK 100 km national grid

While the squares have the same area, their contents vary significantly in size, from the smallest, SC.gpkg, weighing only 272 KB, to the largest, TQ.gpkg, using 12 GB of disk space.

An index GeoPackage was then created, with the following characteristics:

  • All feature tables are present, and fully described, with metadata and styles, but have no contents.

  • The gpkgext_index table contains one entry per feature table, referencing an index table that follows the gpkgext_<main_table_name>_index naming convention.

  • The per-table index contains one entry for each sub-GeoPackage that contains any record for that table. As a result, some tables reference all 54 GeoPackages, while, for example, gpkgext_boundaryline_index contains only 46 entries, as some squares do not intersect any boundary line.

Table 10. The gpkgext_index table contents
table_name           index_table_name                   key_column
boundaryline         gpkgext_boundaryline_index         fid
cartographicsymbol   gpkgext_cartographicsymbol_index   fid
cartographictext     gpkgext_cartographictext_index     fid
topographicarea      gpkgext_topographicarea_index      fid
topographicline      gpkgext_topographicline_index      fid
topographicpoint     gpkgext_topographicpoint_index     fid

Table 11. An excerpt from the gpkgext_boundaryline_index table
file      min_x        min_y       max_x        max_y
ND.gpkg   292879.5     915090.38   347962.13    995248.76
NW.gpkg   199749.0     553919.7    199816.1     553994.6
TQ.gpkg   499050.869   99278.28    602344.9     203093.29
TF.gpkg   498060.214   297787.03   602679.63    400834.96
NC.gpkg   207468.92    888949.2    330676.781   974571.8
NK.gpkg   398797.73    823737.98   413618.1     866884.4
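As an illustration of how a client might use these two tables, the sketch below resolves the index table for a feature table through gpkgext_index and then selects the parts whose bounding box intersects a requested extent. It is a minimal sketch under stated assumptions: the column types, the in-memory database and the parts_for_bbox helper are hypothetical, and only the table names, column names and sample rows come from Tables 10 and 11.

```python
import sqlite3

# In-memory stand-in for the index GeoPackage (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpkgext_index (
    table_name TEXT, index_table_name TEXT, key_column TEXT);
CREATE TABLE gpkgext_boundaryline_index (
    file TEXT, min_x REAL, min_y REAL, max_x REAL, max_y REAL);
""")
conn.execute(
    "INSERT INTO gpkgext_index VALUES (?, ?, ?)",
    ("boundaryline", "gpkgext_boundaryline_index", "fid"),
)
# Sample rows copied from the Table 11 excerpt.
conn.executemany(
    "INSERT INTO gpkgext_boundaryline_index VALUES (?, ?, ?, ?, ?)",
    [
        ("ND.gpkg", 292879.5, 915090.38, 347962.13, 995248.76),
        ("NW.gpkg", 199749.0, 553919.7, 199816.1, 553994.6),
        ("TQ.gpkg", 499050.869, 99278.28, 602344.9, 203093.29),
        ("TF.gpkg", 498060.214, 297787.03, 602679.63, 400834.96),
        ("NC.gpkg", 207468.92, 888949.2, 330676.781, 974571.8),
        ("NK.gpkg", 398797.73, 823737.98, 413618.1, 866884.4),
    ],
)


def parts_for_bbox(conn, table_name, min_x, min_y, max_x, max_y):
    """Return the part files whose extent intersects the requested box."""
    row = conn.execute(
        "SELECT index_table_name FROM gpkgext_index WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        return []
    index_table = row[0]
    # Table names cannot be bound as parameters, so validate before quoting.
    if not index_table.replace("_", "").isalnum():
        raise ValueError(f"unexpected index table name: {index_table!r}")
    sql = (
        f'SELECT file FROM "{index_table}" '
        "WHERE min_x <= ? AND max_x >= ? AND min_y <= ? AND max_y >= ? "
        "ORDER BY file"
    )
    return [r[0] for r in conn.execute(sql, (max_x, min_x, max_y, min_y))]


print(parts_for_bbox(conn, "boundaryline", 520000, 170000, 540000, 190000))
```

With the excerpt above, the request shown intersects only TQ.gpkg, so a client would need to open a single part rather than scanning every sub-GeoPackage.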

8.2. Compusult

Compusult provided a client implementation for testing and demonstrating the concepts developed in this testbed activity, as well as for collecting performance metrics. The desktop and mobile GeoPackage client was updated to support the new and updated GeoPackage extensions for semantic annotations, portrayal, metadata profiles, generalizations and storage optimizations for large vector datasets. A profiling module was also implemented to generate performance metrics, providing feedback on the performance improvements obtained when using table generalizations and GeoPackages sorted using the GeoHash algorithm.

8.2.1. Portrayal

Building on the portrayal work done during the Vector Tiles Pilot, the client was updated to support the updated (draft) Portrayal Extension for GeoPackage. Using the proposed Semantic Annotation Extension, style references are looked up for each layer using the gpkgext_sa_reference and gpkgext_semantic_annotations tables. The client retrieves the style metadata and content from the gpkgext_styles and gpkgext_stylesheets tables, allowing the user to view and select a style of their choosing. The selection of OS MasterMap and OS Open Zoomstack styles is illustrated in Table 12.

Table 12. Dataset Layer Styles.

mastermap layer styles OS MasterMap

zoomstack layer styles OS Open Zoomstack

The client allows a user to specify the style to use for each layer and dynamically updates the map on selection. SLD style documents were provided for both OS MasterMap and OS Open Zoomstack layers. SLD documents can provide symbol content as an online resource for online scenarios, or use symbol references, which the client resolves to symbol information using the gpkgext_symbols, gpkgext_symbol_images and gpkgext_symbol_content tables. Since most supplied symbols were in the image/svg+xml format and were to be drawn based on the scale of the map, the client caches scaled versions of each image at distinct scale steps. This avoids creating a new image for every scale change without significantly degrading the size approximation. A demonstration of changing the style of topographicarea for the OS MasterMap dataset is illustrated in Table 13.
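The caching of scaled symbols at distinct scale steps can be sketched as follows. This is not the Compusult implementation: the step granularity (STEPS_PER_DOUBLING), the function names and the placeholder rendering are all assumptions made for illustration. The idea is to quantize the map scale onto a logarithmic grid and cache one rendered image per symbol and step.

```python
import math
from functools import lru_cache

# Hypothetical granularity: four cached sizes for every doubling of the scale.
STEPS_PER_DOUBLING = 4

render_count = 0  # counts how many times a symbol is actually rendered


def scale_step(scale_denominator: float) -> int:
    """Quantize a map scale denominator onto a discrete logarithmic step."""
    return round(math.log2(scale_denominator) * STEPS_PER_DOUBLING)


def step_scale(step: int) -> float:
    """Return the representative scale denominator of a step."""
    return 2 ** (step / STEPS_PER_DOUBLING)


@lru_cache(maxsize=256)
def _scaled_symbol(symbol_id: str, step: int):
    """Render a symbol once per step; later requests hit the cache."""
    global render_count
    render_count += 1
    # Placeholder for rasterizing the SVG at the step's representative scale.
    return (symbol_id, step_scale(step))


def symbol_for_scale(symbol_id: str, scale_denominator: float):
    """Return the cached rendering closest to the requested scale."""
    return _scaled_symbol(symbol_id, scale_step(scale_denominator))


first = symbol_for_scale("tree", 10000)
second = symbol_for_scale("tree", 10100)  # same step, served from the cache
third = symbol_for_scale("tree", 20000)   # different step, rendered again
print(render_count)
```

With four steps per doubling, the scale used for rendering differs from the requested scale by at most a factor of 2^(1/8), or about 9%, which is the trade-off between cache size and size approximation in this sketch.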

Table 13. OS MasterMap Topographic Area Style Modification. Contains OS data © Crown Copyright and database right 2020.

mastermap topographicarea standard Standard