I. Abstract
Cloud Optimized GeoTIFF (COG) is a new approach that uses existing standards to accelerate the distribution and analysis of 2D regular grid coverage data on the web. COG combines the TIFF format with data structured internally in tiles and low-resolution subfiles (also called overviews). The main subfile is georeferenced using GeoTIFF tags and the lower resolution subfiles inherit the same georeferencing. This organization allows for retrieving only the part of the data needed for presentation or analysis. This capability is available not only in the file system but also over the web if the servers support the HTTP range header.
This OGC Testbed 17 Engineering Report (ER) discusses the COG approach, describes how GeoTIFF is used for the lower resolution subfiles, and proposes a different path forward that integrates COG with the OGC Tile Matrix Set Standard (http://docs.opengeospatial.org/is/17-083r2/17-083r2.html). The ER includes a chapter that formalizes the draft COG specification with clear requirements.
One of the common use cases for COG is the provision of multispectral remote sensing data. The increase in spatial and spectral resolution combined with more accurate sensors that require more than 8 bits per pixel results in big files that can exceed the 4 Gbyte limit of the original TIFF format. Having an OGC standard formally specifying this approach would be useful. Therefore, this ER includes a chapter that formalizes a draft BigTIFF specification, defining clear requirements.
The objective is to be able to reference BigTIFF from the GeoTIFF and the COG standards.
II. Executive Summary
There is a need for new approaches to drastically accelerate visualization and analysis on the World Wide Web. Some emerging formats are now reused in a way that improves the internal organization of the file. This allows for retrieving only the part of the data needed for presentation or analysis. If this organization is combined with the HTTP range header in the GET operation, clients can request parts of the data over the Web without any server-side APIs or any additional web services. In the case of geospatial data in a 2D regular grid coverage model, this strategy can be implemented through COG. COG restricts the GeoTIFF format to an internal data structure based on tiles and low-resolution overviews.
This OGC Testbed 17 Engineering Report (ER) describes how common libraries and implementations work internally, and exposes issues in the current approach. The report detects two main problems in COG:
- The COG approach ignores the OGC Tile Matrix Set Standard (http://docs.opengeospatial.org/is/17-083r2/17-083r2.html) in the tile structure, forcing the same point of origin and extent for all overviews. In addition, it defines an overview schema that easily results in non-square pixels.
- Formally, COG is based on GeoTIFF, which is explicitly dependent on TIFF version 6 and is limited to 4-Gbyte files. Developers mitigate this limitation by adding support for BigTIFF, an emerging de-facto standard defined by the LibTIFF community.
The ER includes a chapter that formalizes an initial draft COG specification with clear requirements and modular structure. A draft version of the BigTIFF specification adapted to the OGC Standard for Modular Specifications is also proposed. GeoTIFF should extend its support to BigTIFF. Both draft proposals have been submitted to the GeoTIFF Standards Working Group for consideration by the OGC Membership.
A similar approach has been proposed for other formats, such as Zarr and Cloud Optimized Point Cloud (COPC) (https://copc.io/). Given the success of the approach, more work needs to be done to consider extending this practice to other formats and media types such as GeoJSON or NetCDF.
III. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, COG, Cloud Optimized GeoTIFF
IV. Preface
This document provides the initial basis for a potential OGC standard for COG. Currently, COG is specified in https://github.com/cogeotiff/cog-spec/blob/master/spec.md, which briefly describes the format as well as the GDAL implementation. This ER contains text that is independent of implementations and more comprehensive but compatible with the GDAL implementation. This draft will be transferred to the OGC GeoTIFF Standards Working Group (SWG) as a starting point for a potential OGC COG standard. The continuation of this work can be followed in the public GitHub repository https://github.com/opengeospatial/CloudOptimizedGeoTIFF.
This document also provides the initial basis for a potential OGC standard for BigTIFF. A small group from the LibTIFF community (https://lists.osgeo.org/mailman/listinfo/tiff), including Andrey Kiselev, Bob Friesenhahn, Chris Cox, Dan Smith, Frank Warmerdam, Gerben Vos, John Aldridge, Joris Van Damme, Leonard Rosenthol, Lynn Quam, Marco Schmidt, Phillip Crews, Rob van den Tillaart and Thomas J. Kacvinsky, among others, has defined a modification of the original TIFF format called BigTIFF that modifies some headers to allow for 64-bit internal offsets. The approach is also described at http://bigtiff.org/ and https://www.awaresystems.be/imaging/tiff/bigtiff.html. This ER contains consolidated text that is compatible with the current implementations. This draft will be transferred to the OGC GeoTIFF Standards Working Group (SWG) as a starting point for a potential OGC BigTIFF standard. The continuation of this work can be followed in the public GitHub repository https://github.com/opengeospatial/BigTIFF.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
V. Security considerations
No security considerations have been made for this document.
VI. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- Universitat Autonoma de Barcelona (CREAF)
VII. Submitters
All questions regarding this document should be directed to the editor or the contributors:
Name | Organization | Role |
---|---|---|
Joan Masó | UAB-CREAF | Editor |
OGC Testbed-17: Cloud Optimized GeoTIFF specification Engineering Report
1. Scope
This Engineering Report represents deliverable D046 of the OGC Testbed 17 initiative performed under the OGC Innovation Program. It describes the current usage of COG and an alternative path to georeference the COG internal multiresolution structure in alignment with Tile Matrix Sets.
The Engineering Report uses the TIFF standard v6 with no modification. However, the Engineering Report also proposes the use of BigTIFF to support file sizes of more than 4 Gbytes.
This document aims to demonstrate the business value of COG for distributing remote sensing data over the web without forcing applications to completely download big files for visualization and analysis. The result is fast performance in visualization tools that can read remote file repositories immediately, saving time and storage space. The combined use of COG and easy-to-use remote sensing data catalogues such as SpatioTemporal Asset Catalogs (STAC, https://stacspec.org/) simplifies finding and accessing large volumes and long series of remote sensing products.
2. Terms, definitions and abbreviated terms
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
2.1. Terms and definitions
2.1.1. BigTIFF
a file that uses modified TIFF headers to allow for internal 64-bit offsets (adding support for TIFF files larger than 4 Gbytes)
2.1.2. Cloud
an on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, that are exposed in the web by cloud providers that do not require direct active management by the user
2.1.3. Coverage
feature that acts as a function to return values from its range for any direct position within its spatiotemporal domain [OGC Abstract Topic 6]
2.1.4. Cloud Optimized GeoTIFF
a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud [https://www.cogeo.org/]
2.1.5. Geokey
an equivalent in function to a TIFF tag, but with a different storage mechanism defined by the GeoTIFF [GeoTIFF Format Specification 1.0]
2.1.6. GeoTIFF
Standard for storing georeference and geocoding information in a TIFF 6.0 compliant raster file. [GeoTIFF Format Specification 1.0]
2.1.7. Imagery
representation of phenomena as images produced by electronic and/or optical techniques. [ISO 19101-2:2018, 3.14]
Note 1 to entry: In this document, it is assumed that the phenomena have been sensed or detected by one or more devices such as radar, cameras, photometers, and infra-red and multispectral scanners.
2.1.8. Overview
Image File Directory that contains a reduced resolution image
2.1.9. Range
an HTTP GET request type that lets clients ask for only the portions of a web resource that they need
2.1.10. Raster
usually rectangular pattern of parallel scanning lines forming or corresponding to the display on a cathode ray tube [ISO 19123:2005, 4.1.30]
a continuous planar space in which pixel values are visually realized [GeoTIFF v1.0]
Note 1 to entry: A raster is a type of regular grid.
2.1.11. Regular grid
grid whose grid lines have a constant distance along each grid axis [OGC 09-146r8, Coverage Implementation Schema with Corrigendum]
2.1.12. Subfile
Image File Directory (a part of a TIFF file) that contains one raster image
2.1.13. Tag
a packet of numerical or ASCII values, which have a numerical “Tag” ID indicating their information content in a TIFF file [GeoTIFF Format Specification 1.0]
2.1.14. Tile
geometric shape with known properties that may or may not be the result of a tiling (tessellation) process. A tile consists of a single connected “piece” (topological disc) without “holes” or “lines” [OGC 19-014r3]
small rectangular representation of geographic data, often part of a set of such elements, covering a tiling scheme and sharing similar information content and graphical styling. A tile can be uniquely defined in a tile matrix by one integer index in each dimension. [OGC 17-083r3, fragment]
2.2. Abbreviated terms
Abbreviation | Meaning |
---|---|
COG | Cloud Optimized GeoTIFF |
HTTP | Hypertext Transfer Protocol |
IFD | Image File Directory |
TIFF | Tagged Image File Format |
3. Keywords
The following are keywords to be used by search engines and document catalogues. ogcdoc, OGC document, COG, Cloud Optimized GeoTIFF, tiles, overviews, bigTIFF, TIFF, coverage
4. Submitting organization
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- Universitat Autonoma de Barcelona (CREAF)
5. Introduction
A COG is a GeoTIFF file that uses Tiles and Reduced-Resolution Subfiles to organize the data for optimal retrieval of fragments at the required resolution. The goal is to have a format that can be hosted on a common HTTP web server, with an internal organization that enables more efficient workflows on the web. The use of COG better supports remote sensing imagery being stored in cloud data centers and offered as cloud services without any special configuration. Additionally, clients can issue HTTP GET range requests (IETF RFC 7233) to ask for just the parts of a file they need.
There are two characteristics of the TIFF format that are the key elements of the internal organization of COGs: Tiles and Reduced-Resolution Subfiles. The TIFF file is also georeferenced using GeoTIFF tags. COG can be considered a subset of what GeoTIFF and TIFF offer (see Figure 1).
Figure 1 — Diagram showing a Cloud Optimized GeoTIFF as a subset of a GeoTIFF, which in turn is a subset of a TIFF (modified from the original https://www.eclipse.org/community/eclipse_newsletter/2018/december/geotrellis.php)
5.1. Tiles
As shown in Figure 2, typical raster images store data row by row. To avoid having a stream of bytes that is too long, a TIFF image is divided into strips. With this organization, the client must read most of the file to get the piece of the image it is interested in.
Figure 2 — Traditional raster image storage is row by row as indicated by the green path (from https://www.element84.com/blog/cloud-optimized-geotiff-vs-the-meta-raster-format)
Instead, COG stores image data in tiles (a schema introduced in TIFF v6). With a tile matrix (commonly known as tiling in the COG community), the client only needs to read the tiles covering the area of interest, making data extraction and visualization faster (see Figure 3).
Figure 3 — COGs store images tile by tile instead of row by row (from https://www.element84.com/blog/cloud-optimized-geotiff-vs-the-meta-raster-format)
5.2. Reduced-Resolution Subfiles
Clients do not always need to show a full resolution image. Reduced-Resolution Subfiles (commonly known as overviews in the COG community) are down-sampled versions of the original image. They represent “zoomed out” versions of the image (see Figure 4).
Figure 4 — Image reduction of resolution to generate overviews (from https://www.element84.com/blog/cloud-optimized-geotiff-vs-the-meta-raster-format)
Multiple overviews can be stored in a COG file to match multiple zoom levels. Overviews are stored as tiles just like the original image and they share the same georeference. So an application that supports zooming only needs to retrieve the tiles for the overview associated with the given zoom level.
The use of tiles is not new and is used in several circumstances, as shown in Table 1, which identifies use of tiles in servers implementing Web Feature Service (WFS), Web Map Service (WMS), Web Coverage Service (WCS), and Web Map Tile Service (WMTS) Standards. Servers can structure their internal data in tiles (that are invisible to the user) to produce faster responses or have rendered tiles directly that can be presented to the user. When data is transmitted to the client and rendered on the client side, COG is an ideal solution for gridded data.
Table 1 — Use of tiles in different services and approaches
service type | feature based data | gridded data | summary |
---|---|---|---|
server-side rendering | WMS | WMS | easy to consume, does not require client processing |
server-side rendering and client presentation | WMTS | WMTS | easy to consume, and allow for caching |
data download | WFS | WCS | hardly suited for visualization (complex API, too much data) |
client-side rendering | tiled feature data | COG | heavy client processing, much more rendering possibilities |
(inspired by https://github.com/openlayers/openlayers/issues/10733)
To make the COG format an efficient data container available over the web, there is a need to be able to request a fragment of a file. Fortunately, HTTP 1.1 introduced support for range requests. Range requests enable a client to request only a portion of an HTTP message from a server. A server indicates support for range requests by returning the header Accept-Ranges: bytes. In the case of COGs, this enables the client to request the TIFF file header, the GeoTIFF tags, and the relevant individual tiles or tile ranges without downloading the entire file, as illustrated by Figure 5.
Figure 5 — Image transmission of only one tile
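The range mechanism described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of any COG implementation: the function names are hypothetical, and `fetch_bytes` assumes a server that answers a valid range request with status 206 (Partial Content).

```python
import re
import urllib.request


def range_header(offset, length):
    """Build the HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def parse_content_range(value):
    """Parse a 'bytes first-last/total' Content-Range value.

    Returns (first, last, total); total is None when the server sends '*'.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), (None if total == "*" else int(total))


def fetch_bytes(url, offset, length):
    """Fetch part of a remote file; raises if the server ignores the range."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honoured; a plain 200
        # means the server is sending the whole file instead.
        if response.status != 206:
            raise RuntimeError("server did not honour the range request")
        return response.read()
```

A client would typically call `fetch_bytes(url, 0, 16384)` first to obtain the TIFF header and the first tags, and then issue further requests for the individual tiles it needs. The 16384-byte size of that first request is an arbitrary choice for illustration.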
After this introduction, the ER is structured into the following sections:
Section 5 discusses the level of adoption of the COG format by clients and data providers
Section 6 discusses how COG is implemented in practice by providing examples
Section 7 proposes a set of requirements classes that could constitute the starting point for a future COG standard in the OGC Standards program
Section 8 discusses some issues found in the current approach for COG and proposes solutions
Section 9 proposes a set of requirements classes that could constitute the starting point for a future BigTIFF standard in the OGC Standards program
6. Key findings
6.1. TIFF v6 and BigTIFF
GeoTIFF (and, by extension, COG) is dependent on TIFF v6, which is limited to file sizes of less than 4 Gbytes. However, implementations of GeoTIFF (and COG) overcome this limitation by adopting BigTIFF as an alternative to the TIFF v6 headers. There is currently a misalignment between the GeoTIFF Standard (which ignores BigTIFF) and its implementations (which implement BigTIFF). This could be solved by formalizing BigTIFF as an official standard and referencing BigTIFF in the GeoTIFF Standard.
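The difference between the two header variants can be illustrated with a short Python sketch. It assumes the commonly documented layouts: a classic TIFF header carries version number 42 followed by a 32-bit offset to the first IFD, while a BigTIFF header carries version number 43, an offset size of 8, a zero constant, and a 64-bit offset. The function name is hypothetical.

```python
import struct


def read_tiff_header(header):
    """Classify a TIFF header and return the offset of the first IFD.

    `header` holds at least the first 16 bytes of the file.
    Returns (variant, byte_order, first_ifd_offset).
    """
    order_mark = header[:2]
    if order_mark == b"II":
        endian = "<"  # little-endian
    elif order_mark == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == 42:
        # Classic TIFF: 32-bit offsets, so a file cannot exceed 4 Gbytes.
        (first_ifd,) = struct.unpack(endian + "I", header[4:8])
        return "TIFF", order_mark.decode(), first_ifd
    if version == 43:
        # BigTIFF: offset size (8), a zero constant, then a 64-bit offset.
        offset_size, zero, first_ifd = struct.unpack(endian + "HHQ", header[4:16])
        if offset_size != 8 or zero != 0:
            raise ValueError("malformed BigTIFF header")
        return "BigTIFF", order_mark.decode(), first_ifd
    raise ValueError(f"unknown TIFF version {version}")
```

A reader that accepts both variants only needs this branch at the start; the remaining difference is the width of the offsets and counts it reads from each IFD.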
6.2. Georeference in COG
Of all the tile matrices provided at different resolutions in a COG file, only the highest resolution is georeferenced by GeoTIFF tags. The georeference of the lower resolutions depends on this georeference. Clients have to deduce the georeference of the reduced resolution subfiles by computing a ratio between the number of columns of the highest resolution and the reduced resolution (and another ratio for the number of rows). Instead, COG could take advantage of the tile matrix set data structure defined in the OGC 17-083r2 OGC Two Dimensional Tile Matrix Set Standard (2d-TMS). COG could also include a Tile Matrix Set definition and tile indices as new GeoTIFF tags. More work and discussions are needed to assess the convenience of this alignment between COG and 2d-TMS.
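The deduction that clients currently have to make can be sketched as follows. This is an illustrative Python example with a hypothetical function name and hypothetical coordinates; it assumes a north-up image whose overview shares the origin and extent of the full resolution image, and an overview writer that rounds odd dimensions up when halving them.

```python
def overview_georeference(origin_x, origin_y, pixel_w, pixel_h,
                          full_cols, full_rows, ov_cols, ov_rows):
    """Deduce the georeference of a reduced-resolution subfile.

    The overview shares the origin (upper-left corner) and the extent of
    the full-resolution image, so only the pixel size changes.
    """
    ratio_x = full_cols / ov_cols  # ratio between the numbers of columns
    ratio_y = full_rows / ov_rows  # ratio between the numbers of rows
    return origin_x, origin_y, pixel_w * ratio_x, pixel_h * ratio_y


# Hypothetical image of 1001 x 1000 pixels of 10 m, overview of 501 x 500.
ox, oy, pw, ph = overview_georeference(400000.0, 4600000.0, 10.0, 10.0,
                                       1001, 1000, 501, 500)
```

In this example the overview pixel height is exactly 20 m, but the pixel width is 10 × 1001 / 501, slightly less than 20 m. The overview pixels are therefore not square, which is the kind of side effect that a fixed tile matrix set would avoid.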
7. Future Work
The work done to produce the list of requirements for COG as well as the list of requirements for BigTIFF will be transferred to the OGC GeoTIFF SWG for consideration as a starting point for candidate Standards.
The strategy of dividing a dataset into tiles, or of using other smart organizations such as R-Trees, could be applied to other file formats whose fragments could later be retrieved using the HTTP 1.1 range function. As an example, the header of a ZIP file could be retrieved and the information shown to the user in a web browser without the need to unzip the whole ZIP file. Future work could determine whether other file formats could get the same performance increase if they are organized conveniently and HTTP range is applied. For example, could we use a ZIP file to store vector tiles that can be unzipped at will when needed? The Testbed participants have already detected some implementations of HTTP range to read fragments of NetCDF v3, FlatGeoBuf (feature based format, https://github.com/flatgeobuf/flatgeobuf), Shapefiles, and others.
A good evaluation of netCDF-4 / HDF5 in a cloud environment could be useful. The Testbed participants have seen some criticism of netCDF-4 in the Open Data Cube (ODC) community, and it would be good to find out whether there are intrinsic limitations in the netCDF-4 format or whether the problem is due to a limitation in the HDF5 library that can be overcome with more efficient code.
8. Overview
Traditionally, high-resolution imagery requires big files that have to be entirely downloaded to the client to be analyzed or visualized. This can require considerable download time, thus preventing the creation of real-time applications. One of the most popular formats in the world is the TIFF format. The format was created by the Aldus Corporation for use in desktop publishing. Aldus released the last version of the TIFF specification in 1992 (v. 6.0), subsequently updated with an Adobe Systems copyright after Adobe acquired Aldus in 1994. Several Aldus or Adobe technical notes have been published with minor extensions to the format, and several specifications have been based on TIFF 6.0, including TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), TIFF-F (RFC 2306) and TIFF-FX (RFC 3949), GeoTIFF (OGC 19-008r4, v1.1), and BigTIFF.
TIFF is a flexible, adaptable file format for handling images and data within a single file by including the header tags (size, definition, image-data arrangement, applied image compression) that define the images. The ability to store image data in a lossless format makes a TIFF file a useful image archive. TIFF can be used to store grey scale, color, or RGB images as well as integer or floating point data making it ideal as a support for storing the rangeset of grid coverage data.
To improve TIFF performance over the web, COG relies on two characteristics of the TIFF v6 format, the georeference GeoTIFF keys and a relatively unused property of HTTP (GET Range). This way, COG allows for efficient streaming of imagery and grid coverage data on the web, enables fast data visualization and facilitates faster geospatial processing workflows. This particular type of TIFF has recently been used to set up large series of remote sensing images in repositories of cloud providers (e.g., Amazon Web Services), enabling cloud processing with less traffic. In fact, COG-aware software can request just the portions of data that it needs, reducing access time and bandwidth use. This is why it is called “Cloud Optimized GeoTIFF.”
COG is based on the GeoTIFF standard and does not introduce new capabilities that are not already in TIFF v6. As such, legacy software should be able to read COG files with no additional modifications. Legacy software will not be able to take advantage of the streaming capabilities, but it can still download the whole file and read it.
The amount of data available for geospatial analytics has increased considerably in recent years. Therefore, downloading the data into a single computer is often not feasible. Providing data in the COG format can help decrease how much data is downloaded and copied. This is because online software systems can stream the data, so applications do not need to keep their own copy of the data for efficient access. New online software can access the content efficiently, while older software can download the complete file. This avoids the need to have multiple copies of the files: one for fast access and another for download purposes.
COG relies on two complementary approaches already available in the existing standards to achieve its goal:
- The first is the ability of GeoTIFF to store the raw pixels of the image organized in an efficient way; and
- The second is HTTP GET Range requests, which let web clients request just the portions of a file that they need.
Using the first approach, COG organizes the GeoTIFF so that range requests can easily select and get the parts of the file that are useful for processing.
8.1. Efficient organization of data in a TIFF file
The Tiling and Reduced-Resolution Subfiles (sometimes called Overviews) in the GeoTIFF format support the necessary structure for COG files so that the HTTP GET Range queries can request just the part of the file that is relevant.
Reduced-Resolution Subfiles come into play when the client wants to render a quick image of the whole or a big part of the area represented in the file. Instead of downloading every pixel, the software can just request a smaller, already created, lower-resolution version. The structure of the COG file on an HTTP Range supporting web server enables client software to easily find just the part of the whole file that is needed.
Tiles come into play when some small area of the overall extent of the COG file needs to be processed or visualized. This could be part of a reduced-resolution subfile, or it could be at full resolution. Tile organization places all the relevant bytes of an area (a tile) in the same part of the file, so the software can use an HTTP GET Range request to get only the tiles it needs.
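As an illustration of how a client might turn an area of interest into byte ranges, the following Python sketch computes which tiles intersect a pixel window and maps them to positions in the file. The function names are hypothetical. The sketch assumes a single image plane and the TIFF 6.0 ordering of tiles (left to right, then top to bottom), which is the order of the TileOffsets and TileByteCounts arrays.

```python
import math


def tiles_for_window(image_width, image_height, tile_width, tile_height,
                     col_off, row_off, win_width, win_height):
    """Return the indices of the tiles that intersect a pixel window."""
    tiles_across = math.ceil(image_width / tile_width)
    tiles_down = math.ceil(image_height / tile_height)
    first_col = col_off // tile_width
    last_col = min((col_off + win_width - 1) // tile_width, tiles_across - 1)
    first_row = row_off // tile_height
    last_row = min((row_off + win_height - 1) // tile_height, tiles_down - 1)
    # Tile index = row * tiles_across + column (row-major order).
    return [row * tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def byte_ranges(indices, tile_offsets, tile_byte_counts):
    """Map tile indices to inclusive (first_byte, last_byte) pairs."""
    return [(tile_offsets[i], tile_offsets[i] + tile_byte_counts[i] - 1)
            for i in indices]
```

For a 1024 x 1024 image with 256 x 256 tiles, a window of 100 x 100 pixels starting at column 200 and row 300 touches only tiles 4 and 5. Each resulting pair can be sent as one HTTP range request, and adjacent tiles can be merged into a single request when their byte ranges are contiguous.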
8.2. Relation to OGC Tile Set Standards
The combined use of tiles and resolution levels is not new in OGC Standards. In fact, the OGC Two-dimensional Tile Matrix Set Standard (and the older OGC WMTS 1.0) uses exactly the same approach. However, the draft OGC API — Tiles specification and the older WMTS 1.0 Standard require either a service to be installed on the web server or thousands of pre-generated independent tiles to be created. Neither is necessary in the COG approach, as most modern web servers natively support HTTP range.
Improving the relationship between COG and 2DTMS can be beneficial, so this document includes an extra requirement class for COG that could support using a list of COGs to store a Tile Set based on Common Tile Matrix Sets.
9. Implementations of COG
This section offers some examples of the current implementations of the COG format. This is not an exhaustive list. Some more relevant examples of implementations may be missing.
9.1. How to support COG in browsers
COG is not supported natively in web browsers. However, EOX (https://eox.at) has developed a set of JavaScript files that are used by the COG explorer library https://github.com/geotiffjs/cog-explorer. The library adds COG support to any modern browser. This demo portal: https://geotiffjs.github.io/cog-explorer provides a live example of the use of this library.
To demonstrate the use of the COG explorer with any COG file, the participants tried to connect the COG explorer to the Spanish CNIG download center http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=LANDS#. On this site, the historical Landsat national mosaics are available as COG files. Unfortunately, the CNIG has decided to adopt a multipart file format as a response to an HTTP GET request. This approach has the advantage of suggesting a file name to save the file when it is downloaded. However, there are two disadvantages: it forces encoding the COG file in base64 (inflating the size of the transmission) and it does not support COG HTTP GET range requests, thereby making random access over HTTP impossible. This also makes the CNIG website incompatible with the COG explorer. Instead, the participants decided to download one of the available files and copy it to one of the CREAF servers. Once the file was copied to the web shared folder, a normal HTTP GET request with range was possible. The file was exposed at the following URL: https://joanma.uab.cat/temp/cog/landsat5_1991.tif.
9.1.1. COG and CORS in browsers
The COG-explorer is only available as an HTTPS endpoint. The rules recently adopted by web browsers prevent a page served from an HTTPS domain from reading data from an HTTP service. To be compatible with COG-Explorer, servers have to expose COG files at a URL starting with https://. To meet this requirement, Internet Information Services (IIS) on Windows 2012r2 was selected as the web server. IIS on Windows 2012r2 supports “HTTP range” by default. However, that is not enough due to other Cross-Origin Resource Sharing (CORS) restrictions imposed by browsers. To “authorize” the COG-explorer domain (geotiffjs.github.io) to read the COG file in another domain (joanma.uab.cat), adding these two headers in the IIS configuration was necessary:
Access-Control-Allow-Headers: range
Access-Control-Allow-Origin: *
This way, the client knows that http://joanma.uab.cat authorizes any other domain to read data and to use range headers. Now, server data can be read by the COG-explorer application directly (see Figure 6).
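A quick way to verify such a configuration is to inspect the response headers of the server. The following Python sketch is a hypothetical helper, not part of the COG explorer. It assumes, as in the static IIS configuration described above, that the server sends the same headers on every response, and it takes the headers as a plain dictionary so that it can be used with any HTTP client.

```python
def check_cog_hosting(response_headers):
    """Return a list of problems found in a server's response headers.

    `response_headers` maps header names to values, for example the
    headers of a HEAD response for the COG file.
    """
    # HTTP header names are case-insensitive.
    headers = {name.lower(): value.strip()
               for name, value in response_headers.items()}
    problems = []
    if headers.get("accept-ranges", "").lower() != "bytes":
        problems.append("server does not advertise byte range support")
    if "access-control-allow-origin" not in headers:
        problems.append("no Access-Control-Allow-Origin header")
    allowed = [h.strip().lower()
               for h in headers.get("access-control-allow-headers", "").split(",")]
    if "range" not in allowed and "*" not in allowed:
        problems.append("Range is not listed in Access-Control-Allow-Headers")
    return problems
```

An empty list means that the three conditions discussed in this section are met: the server advertises byte ranges, allows cross-origin reads, and allows the range header in cross-origin requests.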