Publication Date: 2018-01-08

Approval Date: 2017-12-07

Posted Date: 2017-11-15

Reference number of this document: OGC 17-035

Reference URL for this document: http://www.opengis.net/doc/PER/t13-NR001

Category: Public Engineering Report

Editor: Charles Chen

Title: OGC Testbed-13: Cloud ER


OGC Engineering Report

COPYRIGHT

Copyright © 2018 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/

WARNING

This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.


1. Summary

This OGC Engineering Report (ER) describes the use of the OGC Web Processing Service (WPS) for cloud architectures in the OGC Testbed 13 Earth Observation Cloud (EOC) Thread. The report addresses the lack of interoperability and portability of cloud computing architectures, which makes it difficult to manage virtual infrastructure efficiently, for example during cloud migration, storage transfer, resource metering, and unified billing and invoicing. It summarizes the current state of cloud computing architectures and describes the participant architectures, which are based on use case scenarios from sponsor organizations.

Cloud computing is paving the way for future scalable computing infrastructures and is increasingly used for processing digital earth observation data. In this EOC thread effort, data is stored in various storage resources in the cloud and accessed through an OGC Web Processing Service. The methods by which these processes are deployed and managed must be made interoperable to reduce or avoid the administrative burden on the scientific community. In other words, the intent of this effort is to give scientists a way to acquire, process, and consume earth observation data without needing to administer cloud computing resources.

1.1. Requirements

The following requirements are to be addressed in this ER:

  1. Define the WPS interface and communication protocol between clients and WPS instances that work as interfaces to the cloud computing environment.

  2. Document the hosted cloud environment, processing tools, and deployment and management steps.

  3. Assess the status quo and the benefits of interoperability through the use of OGC WPS and the web services layer.

  4. Record the test experiments for reproducibility and document the use of Amazon Web Services (AWS) CloudFormation, OpenStack Heat, etc.

1.2. Key Findings and Prior-After Comparison

Current cloud computing architectures have advanced from virtual hypervisors and shared compute resources to include containerization using Docker. As cloud computing continues to evolve, the scientific community seeks a more efficient process for deploying software, processing data, and retrieving results. Acquiring shared computing resources is easier today than ever. However, significant effort is still required to stage computing resources, determine compute requirements, deploy software libraries, and perform general administrative tasks better suited to Information Technology (IT) administrators before any scientific data can be processed. Furthermore, scalable solutions are needed so that compute resources are accounted for, and released or reallocated, when they are not being utilized.

This engineering report documents the Testbed 13 cloud implementations in which OGC web service specifications are used in conjunction with software containers (i.e., Docker) to establish a process flow and retrieve the results of data processing functions. The goal is to reduce the administrative work required to stage compute resources and process data, and to simplify data retrieval, specifically for earth observation data, which can vary considerably in size and volume.

1.3. What does this ER mean for the Working Group and OGC in general

The Working Group identified for the review of this engineering report is the Big Data Domain Working Group (DWG). The Big Data DWG’s current purpose statement defines its scope of work as Big Data interoperability, with a particular focus on analytics. However, key members of the DWG are interested in expanding the scope to include cloud computing in their discussions. In particular, at least one member of the DWG is also working on multi-cloud discovery of and access to Earth Observation data within cloud compute architectures.

The work of the EOC thread aligns with the Big Data DWG’s interests because it stores earth observation data and processes it using the OGC WPS standard in a dynamic cloud deployment, providing interoperable access for scientific study. The goal of this effort is to improve the way EO data is disseminated, processed, stored, and searched without scientists needing to understand how to administer cloud computing resources. This engineering report describes future architectures and deployment methods for clouds utilizing OGC WPS. These architectures may assist in future developments in data processing, including use cases such as big data processing. Other OGC W*S services may be considered in future follow-on efforts.

1.4. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Table 1. Contacts

Name                    Organization
Charles Chen (Editor)   Skymantics
Tom Landry              CRIM
Ziheng Sun              George Mason University
Chen Zhang              George Mason University

1.5. Future Work

The work contained herein may lead to future interoperability tests across private and public cloud implementations. Ongoing developments in cloud computing and container technology may also prompt further interoperability research. The recommended future work on this topic includes the following:

  • Implement application packages and package managers

  • Test interoperability using various WPS clients to access other clouds

  • Strengthen the focus on security, such as identity management (e.g., Shibboleth)

  • Improve data discoverability

  • Improve processes for handling failures and reporting errors

  • Improve health monitoring of Docker containers and applications

  • Develop self-healing processes

1.6. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

2. References

3. Terms and Definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.

3.1. Apache web server

A web server that implements the HTTP protocol to serve web pages and other web content.

3.2. CA Siteminder

CA Siteminder, now called “CA Single Sign-On”, is a commercial product providing Single Sign-On functionality.

3.3. CGI

The Common Gateway Interface (CGI) allows web requests to interact with server executables.

3.4. Container

A method of operating system virtualization that allows packaging and running an application and its dependencies in resource-isolated processes.

3.5. Cookie

Information stored with a user’s web browser to help a website identify the user for subsequent communication.

3.6. DACS

The Distributed Access Control System is software that can limit access to any content served by an Apache web server. In other modes of operation, DACS can be used by other web servers and virtually any application, script, server software, or CGI program to provide access control or authentication functionality.

3.7. DACS Federation

A DACS federation consists of one or more jurisdictions, each of which authenticates its users, provides web services, or both.

3.8. DACS Jurisdiction

Jurisdictions coordinate information sharing through light-weight business practices implemented as a requirement of membership in a DACS federation.

3.9. DACS Cookie

A cookie set in a user’s browser by DACS when the user is authenticated. Used to identify and verify the user.

3.10. Elasticity

The ability to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner. Elasticity in cloud infrastructure involves enabling the hypervisor to create virtual machines or containers with the resources to meet the real-time demand.

3.11. Hybrid Cloud

A composition of two or more clouds (private or public) that remain distinct entities but are bound together, offering the benefits of multiple deployment models with the ability to connect collocation, managed, and/or dedicated services with cloud resources.

3.12. Hypervisor|Virtual Machine Monitor (VMM)

Computer software, firmware, or hardware that creates and runs virtual machines.

3.13. Image

A template for creating new instances.

3.14. Instance|Virtual Machine

A virtualized computing resource which provides functionality needed to execute an operating system. A hypervisor is used for native execution to share and manage hardware, allowing for multiple environments which are isolated from one another, yet exist on the same physical machine.

3.15. JavaScript

A programming language traditionally used by websites and interpreted on web browsers.

3.16. JQuery

A JavaScript framework.

3.17. Same Origin Policy

A security policy that prevents scripts from accessing content other than from where they originated.

3.18. Scalability

The ability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.

3.19. Single Sign-On

A system which enables a user to authenticate once in order to access multiple applications and/or services.

3.20. Spring Security

The security component of the Spring Framework.

3.21. Web Service

Provides information over web protocols (such as HTTP) with the primary intent of being machine readable (i.e., in XML or JSON format).

3.22. Abbreviated terms

  • API Application Program Interface

  • AWS Amazon Web Services

  • CGI Common Gateway Interface

  • CLI Command Line Interface

  • CRIM Computer Research Institute of Montréal

  • DWG Domain Working Group

  • GPT Graph Processing Tool

  • SAR Synthetic Aperture Radar

  • SLC Single Look Complex

  • RSTB Radarsat-2 Software Tool Box

  • SNAP Sentinel Application Platform

  • SQW Standard Quad Polarization

  • SSH Secure Shell

  • TIE Technical Interoperability Experiment

  • VM Virtual Machine

  • WCS Web Coverage Service

  • WMS Web Map Service

  • WPS Web Processing Service

4. Overview

This engineering report describes the development effort in using a cloud computing environment for Earth Observation (EO) data integrated with OGC web services for improved interoperability. The cloud environment hosts data processing tools for deployment, management, and processing of EO data using an OGC Web Processing Service (WPS). Figure 1 describes the high level architecture for the cloud computing environment.

Figure 1. Cloud Environment Overview

A user accesses the cloud environment via a WPS client (i.e., a user dashboard). The user submits WPS requests whose parameters instruct the WPS server to allocate cloud resources, taking advantage of the flexible, dynamic, and scalable nature of a cloud computing infrastructure. In this way, users can make use of cloud computing with minimal interaction with the IT administration of the cloud itself. The use case explored in this activity involves processing EO data with the RADARSAT-2 Tool Box (RSTB) deployed into a cloud environment, with processed images returned through a WMS/WCS server. The envisioned execution process flow is as follows (a minimal client-side request sketch is given after the list):

  1. Software toolbox deployment, configuration and maintenance

  2. Receiving a job request through OGC WPS.[1] (the WPS optionally being part of the cloud)

  3. Allocating resources dynamically based on demand and performing job splitting/scheduling/processing/tracking

  4. Allocating required scratch storage for intermediate and final products

  5. Supporting batch processing of multiple RADARSAT-2 or other SAR/optical images (generic big/high-volume data processing)

  6. Integrating or exchanging data from different sources hosted in a cloud environment (and/or a traditional computing network)

  7. Gathering output elements into final products

  8. Disseminating final products through OGC Web services such as WCS and WMS

  9. Providing cloud usage statistics and user notification
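
To make this flow concrete, the following is a minimal client-side sketch using the OWSLib Python library against a WPS 1.0 endpoint. The service URL, process identifier, and input names are placeholders rather than part of either participant implementation.

  from owslib.wps import WebProcessingService, monitorExecution

  # Placeholder endpoint; substitute the actual WPS instance.
  wps = WebProcessingService('http://example.org/wps', version='1.0.0')
  wps.getcapabilities()
  for process in wps.processes:
      print(process.identifier, process.title)

  # Hypothetical inputs: a reference to an EO scene and a processing operation.
  inputs = [('inputImage', 'http://example.org/data/RS2_sample_scene.zip'),
            ('operation', 'terrain-correction')]
  execution = wps.execute('rstb-processing', inputs)  # hypothetical process id

  # Poll the status document until the asynchronous job completes,
  # then print the output references (e.g., a WCS/WMS reference URL).
  monitorExecution(execution, sleepSecs=10)
  for output in execution.processOutputs:
      print(output.identifier, output.reference)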

4.1. Implementation Goals

Two separate implementations have been developed by the Computer Research Institute of Montréal (CRIM) and George Mason University (GMU), respectively, to support the testbed experiments. Each implementation was developed independently using various software tools and specifications to achieve the same implementation goals and capabilities, as follows:

  • The ability to leverage large pools of computing resources from the hybrid cloud and traditional dedicated servers

  • The ability to easily create or expand the number of instances (VMs) when needed without reconfiguring how WPS services are advertised

  • The ability to control access and authentication of users of the web services and instances (VMs)

  • The ability to log usage and jobs being performed

  • Support for the integration of WPS 2.0, including constructs for service discovery, service capabilities, job control, execution, and transmission of inputs and outputs in a chain

  • A web-enabled dashboard showing the current usage and capacity of the computing resources of the cloud infrastructure; ideally, this dashboard can be integrated into the WPS dashboard

  • Monitoring of executions, requests, responses, etc.

  • The ability to publish and consume OGC services such as WMS, WCS, and WFS

  • An operational cloud model that is easily reproducible and documented

  • An operational cloud model general enough to support any type of Earth Observation data processing supported by the RADARSAT-2 toolbox

  • Delivery of scripts that allow for the reinstallation of the cloud environment

4.2. Expanded Architecture

Based on the architectures described earlier, the overview architecture can be expanded to describe additional components within each category, as shown in Figure 2 below. The WPS Client may also include a graphical user interface and security functions. The WPS Server may include synchronous or asynchronous monitoring, such as a polling function to get the status of a running process. The VMs may contain Docker containers (see Section Docker Containers) and the RSTB processes that process EO data and produce a result. Additionally, a Docker Registry may be used for distributing the Docker images. Data may be accessed in external clouds, and the resulting data may be shared via shared cloud storage.

Figure 2. High Level Expanded Cloud Architecture

In general, the location of the WPS server is independent of the cloud environment; it can be deployed as a separate server or within a cloud server. While the WPS is not shown inside a cloud environment in Figure 1, the actual deployment may be within the same or a separate cloud environment. It is important to note that the cloud environments are able to retrieve data (i.e., RADARSAT-2 samples) via an external cloud. In this combined expanded architecture, the WPS client contains multiple functions, such as application execution and monitoring. The virtual machines deployed in the cloud architectures contain application containers using Docker. Both architecture implementations follow this expanded architecture, with differences in how each configures its Docker deployments and VM provisioning.

4.3. Development Approach

The development process for the cloud environments was broken out into a Work Breakdown Structure (WBS), as shown in Figure 3. This structure was developed and agreed upon by all participants to ensure that all implementation aspects of the cloud environment were considered.

Figure 3. Testbed 13 EOC Thread - Cloud Development Work Breakdown Structure (WBS)

4.4. Outline

The engineering report describes the architecture, configuration, execution, testing, and demonstration of the component implementations as follows:

  • Section 5 Architectures describes the high level architectures as implemented by the participants, GMU and CRIM.

  • Section 6 Configuration describes the configuration for WPS, Cloud, Containers, Data Storage and Retrieval, and Metrics collection.

  • Section 7 Execution describes the execution processes of the WPS components including operations, parameterization, and job management.

  • Section 8 [Security] describes various security approaches.

  • Section 9 CRIM Test Experiments describes the Technical Interoperability Experiments (TIEs).

  • Section 10 Testbed 13 Demonstration describes the OGC Testbed 13 demonstration scenarios.

  • Section 11 Summary discusses the final conclusions and future work.

5. Architectures

In the Testbed 13 Earth Observation Cloud (EOC) thread, two implementations of a cloud environment have been developed in order to compare and contrast different approaches while striving for interoperability. At a high level, both implementations utilize virtualized hosts in a cloud environment and Docker software containers. As seen in Figure 1 in the Overview, an Actor must be able to execute a series of operations via a WPS interface in order to interact with the cloud environment.

5.1. GMU Cloud Architecture Framework

GMU has established an architecture framework as seen in Figure 4 below. The framework is divided into four layers: Client, OGC Web Services, Cloud environment, and Internet (i.e., external network). The client consists of a browser-based application dashboard containing stored WPS requests, virtual host locations, and additional features such as priority and resource allocation functions.

Figure 4. GMU Cloud Architecture Framework

All the capabilities of the WPS server and cloud environment are exercised via the WPS interface. In other words, WPS is the only interface exposed to users for accessing the cloud and Docker container functionality. The user (i.e., Actor) can issue WPS requests through the client to an OGC WPS 2.0 implementation that also resides in the cloud. Job requests and responses with the cloud are managed over Secure Shell (SSH) using the Docker command-line interface (CLI). The WPS server executes RSTB processing requests via the Docker CLI by deploying Docker containers in a Virtual Machine (VM) (i.e., a cloud instance). The processes contained within each job request publish the final products via an OGC Web Coverage Service (WCS), and the WPS client receives a reference URL to the WCS containing the results, as sketched below.
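
As an illustration only (not GMU’s actual code), the Docker-over-SSH dispatch described above might resemble the following Python sketch, in which the WPS server shells out to the Docker CLI on a remote VM. The host name, image name, and container arguments are hypothetical.

  import subprocess

  vm_host = "ubuntu@vm-01.example.org"                  # placeholder cloud instance
  image = "registry.example.org/rstb-processor:latest"  # placeholder Docker image
  scene_url = "http://example.org/data/RS2_sample_scene.zip"

  # Run the RSTB container on the remote VM, mounting a scratch directory for outputs.
  command = [
      "ssh", vm_host,
      "docker", "run", "--rm",
      "-v", "/data/scratch:/output",
      image,
      "--input", scene_url,   # hypothetical container arguments
      "--output", "/output",
  ]
  completed = subprocess.run(command, capture_output=True, text=True, check=True)
  print(completed.stdout)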

Additionally, the WPS and the cloud environment can interact over the Amazon Elastic Compute Cloud (EC2) API, which can manage the deployment and scaling of VM instances using the Amazon Cloud Management Server. The cloud is capable of communicating with other cloud blocks to exchange data or call geospatial functions from different sources hosted externally to the local cloud environment. It should be noted that, while there is a good compatibility guide for the OpenStack implementation of the EC2 API, many commands are not supported [EC2].
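
For illustration, VM lifecycle management through the EC2-compatible API could resemble the boto3 sketch below. The endpoint, image identifier, and instance type are placeholders, and, as noted above, an OpenStack EC2 endpoint may not support every call.

  import boto3

  ec2 = boto3.client(
      "ec2",
      endpoint_url="https://cloud.example.org/services/Cloud",  # placeholder endpoint
      region_name="RegionOne",
  )

  # Launch a worker VM from a pre-built image (placeholder image ID and instance type).
  response = ec2.run_instances(ImageId="ami-0000abcd", InstanceType="m1.large",
                               MinCount=1, MaxCount=1)
  instance_id = response["Instances"][0]["InstanceId"]

  # List all instances and their states.
  for reservation in ec2.describe_instances()["Reservations"]:
      for instance in reservation["Instances"]:
          print(instance["InstanceId"], instance["State"]["Name"])

  # Shut the worker down once its jobs have completed.
  ec2.terminate_instances(InstanceIds=[instance_id])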

5.2. CRIM High Level Architecture

The CRIM High Level Architecture contains a WPS built on the open-source PyWPS 4.0.0, an implementation of the OGC WPS 1.0 specification. Since PyWPS uses Python as its code base, several software applications and libraries adopted for this implementation are also Python based. The current solution targets Python 2.7, in part because PyWPS 4.0 still exhibits input/output issues with Python 3.6. Python 2.7 is currently being phased out of CRIM’s research infrastructure in favor of Python 3.6. The cloud environment is developed using OpenStack Juno, a free and open source cloud operating system that controls compute, storage, and networking resources in a data center. Docker 1.10 is used both to package the user’s process as an application package that is pulled on demand from the Docker registry, and to bundle and deploy all other required components of the system. More details regarding each software package utilized can be found in Appendix C - Software Packages. The CRIM High Level Architecture is represented in Figure 5 below.

Figure 5. CRIM High Level Architecture

In this architecture view, the WPS 1.0 Client/Application Manager interacts with a WPS 1.0 Server. The server contains a job manager that splits jobs into tasks. A request containing parameters for processes, cloud, and application registry can be sent to the WPS server to manage the distribution of tasks across one or many task queues within multiple cloud environments. An elasticity manager monitors the task queues and automatically triggers the deployment of new VMs when predefined system load thresholds are met; a conceptual sketch of such an elasticity loop is given below. The elasticity manager was developed by CRIM and has been in operation for the last three years in a software research platform called VESTA (http://vesta.crim.ca/index_en.html). Additionally, the input and output parameters of the Docker application packages, as well as execution status, are passed via the task queue. The data results are stored on a shared fileserver and can be accessed either in their raw form or as WCS 1.0 or Web Map Service (WMS) 1.1.1 layers implemented using GeoServer. Once the results are computed, the WPS receives the output in the form of a reference URL to the WMS server from which the user can retrieve the image.
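
The following is a conceptual sketch of such an elasticity loop, not CRIM’s actual elasticity manager: the queue inspection, worker count, and provisioning calls are placeholders, and the threshold values are illustrative only.

  import time

  SCALE_UP_THRESHOLD = 10   # illustrative backlog per worker before adding a VM
  CHECK_INTERVAL = 30       # seconds between queue inspections

  def pending_tasks():
      """Placeholder: return the number of tasks waiting in the task queue
      (for example, by querying the broker's management API)."""
      return 0

  def worker_count():
      """Placeholder: return the number of active worker VMs."""
      return 1

  def launch_worker_vm():
      """Placeholder: provision a new VM that boots, pulls the Docker
      application image, and starts consuming from the task queue."""
      print("scaling up: launching a new worker VM")

  # Control loop: scale up when the backlog per worker exceeds the threshold.
  while True:
      if pending_tasks() > SCALE_UP_THRESHOLD * worker_count():
          launch_worker_vm()
      time.sleep(CHECK_INTERVAL)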

6. Configuration

This section describes the key components of the architectures described in the [Architecture] section. In general, the configuration will describe each WPS implementation, cloud environment, and additional interfaces in further detail.

6.1. WPS

6.1.1. GMU WPS

GMU developed its WPS against the OGC WPS 2.0 specification, using Java in the Eclipse development environment. The GMU WPS was developed by the Center for Spatial Information Science and Systems (CSISS), including a web-based GMU WPS Dashboard tool to support the OGC Testbed 13 demonstrations. The demonstration version of the GMU WPS Dashboard can be accessed at: http://cloud.csiss.gmu.edu/GMUWPS/ (Note: this is a demonstration environment and is available on a best-effort basis.) Figure 6 below contains a screenshot of the GMU WPS Dashboard.

Figure 6. GMU WPS Dashboard

The GMU WPS Dashboard provides an interface to send requests and receive responses via the WPS operations GetCapabilities, DescribeProcess, Execute, GetStatus, and GetResult, and supports both synchronous and asynchronous processes. The GMU WPS can interface with the cloud management service to create new VMs, list all running VMs, and shut down VMs using the Amazon Elastic Compute Cloud (EC2) API. RSTB is deployed in software containers (i.e., Docker) on virtual machines that are accessible via the GMU WPS Dashboard. Optimized for the Testbed 13 EOC thread, the client displays current computing statuses such as Job Splitting, Job Priority, Cloud VM Usage, and Docker Usage.
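
As an informal illustration of these operations (not the dashboard’s internal code), the key-value-pair (KVP) requests could be issued as follows; the service URL, process identifier, and job identifier are placeholders, and the GetStatus call assumes the server offers a KVP binding for it.

  import requests

  WPS_URL = "http://example.org/wps"  # placeholder WPS 2.0 endpoint

  # Synchronous KVP requests for the capabilities document and a process description.
  capabilities = requests.get(WPS_URL, params={
      "service": "WPS", "version": "2.0.0", "request": "GetCapabilities"})
  description = requests.get(WPS_URL, params={
      "service": "WPS", "version": "2.0.0", "request": "DescribeProcess",
      "identifier": "RSTB.TerrainCorrection"})  # hypothetical process identifier

  # An asynchronous Execute request (submitted separately as an XML POST)
  # returns a job identifier that can then be polled for status.
  job_id = "job-0001"  # placeholder job identifier from the Execute response
  status = requests.get(WPS_URL, params={
      "service": "WPS", "version": "2.0.0", "request": "GetStatus",
      "jobid": job_id})
  print(status.status_code, status.text[:200])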

6.1.2. CRIM WPS

CRIM developed a novel job scheduler for PyWPS 4.0.0 using Celery, a distributed task queue written in Python. PyWPS 4.0 currently implements the OGC WPS 1.0 specification, while OGC WPS 2.0 support is still in development. In the WPS 2.0 standard, the core concepts for job control, process execution, and job monitoring are particularly well addressed by task queues. However, in order to ensure easier implementation and better interoperability with existing servers, platforms, and libraries, it was decided that the WPS 1.0 standard was sufficient. For Testbed 13, CRIM’s existing WPS 1.0 Application Manager Client, working alongside Flower, a real-time monitor and web administration tool for Celery, was considered an acceptable alternative to a full-fledged WPS 2.0 client. Figure 7 depicts a screenshot of PAVICS, Open Source component RS-40 in the CANARIE software registry (https://science.canarie.ca/researchsoftware/researchresource/main.html?resourceID=103). CRIM develops and uses PAVICS as an App Manager dashboard that interfaces with WPS 1.0 servers and services. A minimal sketch of how a Celery task can back such a scheduler is given below.
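
The sketch below defines a Celery task that runs a Docker-packaged process. It is illustrative only, not CRIM’s implementation; the broker URL, image name, and container arguments are placeholders.

  import subprocess
  from celery import Celery

  app = Celery("eoc_tasks",
               broker="amqp://guest@localhost//",  # placeholder broker URL
               backend="rpc://")

  @app.task(bind=True)
  def run_docker_process(self, image, scene_url, output_dir):
      """Pull the application image and run it against the referenced scene;
      the Celery task id doubles as a job handle for the WPS layer."""
      subprocess.run(["docker", "pull", image], check=True)
      subprocess.run(
          ["docker", "run", "--rm", "-v", output_dir + ":/output",
           image, "--input", scene_url, "--output", "/output"],  # hypothetical arguments
          check=True)
      return {"job": self.request.id, "output_dir": output_dir}

  # A WPS process handler could then dispatch the work asynchronously, e.g.:
  # result = run_docker_process.delay("registry.example.org/rstb:latest",
  #                                   "http://example.org/RS2_sample_scene.zip",
  #                                   "/data/scratch/job-0001")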