Publication Date: 2021-01-13

Approval Date: 2020-12-15

Submission Date: 2020-11-19

Reference number of this document: OGC 20-035

Reference URL for this document: http://www.opengis.net/doc/PER/t16-D027

Category: OGC Public Engineering Report

Editor: Christophe Noël

Title: OGC Testbed-16: Earth Observation Application Packages with Jupyter Notebooks


OGC Public Engineering Report

COPYRIGHT

Copyright © 2021 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/

WARNING

This document is not an OGC Standard. This document is an OGC Public Engineering Report created as a deliverable in an OGC Interoperability Initiative and is not an official position of the OGC membership. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an OGC Standard. Further, any OGC Public Engineering Report should not be referenced as required or mandatory technology in procurements. However, the discussions in this document could very well lead to the definition of an OGC Standard.

LICENSE AGREEMENT

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD. THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications.

This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.

None of the Intellectual Property or underlying information or technology may be downloaded or otherwise exported or reexported in violation of U.S. export laws and regulations. In addition, you are responsible for complying with any local laws in your jurisdiction which may impact your right to import, export or use the Intellectual Property, and you represent that you have complied with any regulations or registration procedures required by applicable law to make this license enforceable.


1. Subject

This OGC Testbed-16 Engineering Report (ER) describes all results and experiences from the “Earth Observation Application Packages with Jupyter Notebook” thread of OGC Testbed-16. The aim of this thread was to extend the Earth Observation Applications architecture developed in OGC Testbeds 13, 14, and 15 with support for shared and remotely executed Jupyter Notebooks. The Notebooks make use of the Data Access and Processing API (DAPA) developed in the Testbed-16 Data Access and Processing API (DAPA) for Geospatial Data task and tested in joint Technology Integration Experiments.

2. Executive Summary

2.1. Problem Statement

Previous OGC Testbeds developed an architecture for deploying and executing data processing applications close to the data products hosted by Earth Observation (EO) platforms. Testbed-16 participants and sponsors wished to complement this approach with applications based on Project Jupyter, which enables the development of Notebooks.

2.2. Use Cases

For the Testbed-16 (TB-16) demonstrations, the Testbed Call for Participation (CFP) considered three complementary scenarios that were refined as follows:

  1. The first scenario explored the possible interactions of Jupyter Notebook with data and processing services through the Data Access and Processing API (DAPA) and various operations (OpenSearch query, GetMap, others).

  2. The second scenario explored the interactive and batch mode execution on a hosted Jupyter.

  3. The third scenario explored the conversion of Jupyter-based applications into a deployable ADES Application Package. Moreover, the scenario explored the chaining of ADES processes from a Jupyter Notebook.

2.3. Achievements

The TB-16 participants implemented two Jupyter Notebooks and also implemented two ADES endpoints, focusing on the following aspects:

  • 52°North targeted the implementation of three Jupyter Notebooks (Testbed 16 task D168) related to water masks in flooding situations, which together form a single use case.

  • Terradue developed a Jupyter Notebook on a volcanology theme (Testbed 16 task D169) and explored the packaging of Notebooks based on a straightforward workflow, from the development of Jupyter Notebooks to their deployment as a service in an Exploitation Platform supporting the ADES/EMS approach (Testbed 16 task D171).

  • Geomatys provided an Application Deployment and Execution Service (ADES) endpoint (Testbed 16 task D170) supporting Jupyter Notebooks and demonstrated related workflows.

2.4. Findings and Recommendations

Section 7 of this ER discusses the challenges and main findings raised during TB-16. First, the concepts and technologies that help in understanding the Jupyter paradigm, and the factors affecting the proposed solutions, are introduced.

Then, ADES support of non-interactive Notebooks is investigated as the major topic demonstrated during the TB-16 Jupyter task. Thereafter, the key ideas for provisioning interactive Notebooks are tackled. The focus is on highlighting the assets of a Jupyter-based development environment in EO platforms, especially when relying on developer-friendly libraries.

Finally, more challenges and lessons learned are elaborated, emphasizing in particular the great simplicity of code-based workflows implemented using Jupyter Notebooks.

2.5. Document contributor contact points

All questions regarding this document should be directed to the editor or the contributors:

Contacts

Name              Organization     Role
Christophe Noël   Spacebel s.a.    Editor
Matthes Rieke     52°North GmbH    Contributor
Pedro Goncalves   Terradue Srl     Contributor
Fabrice Brito     Terradue Srl     Contributor
Guilhem Legal     Geomatys         Contributor

2.6. Foreword

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

3. References

4. Terms and definitions

For the purposes of this report, the definitions specified in Clause 4 of the OWS Common Implementation Standard OGC 06-121r9 shall apply. In addition, the following terms and definitions apply.

● Container

A lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings (source: Docker).

NOTE: This is not a general definition of `container` as might be used in the OGC but instead is a definition within the context of this OGC ER.

● OpenAPI Document

A document (or set of documents) that defines or describes an API. An OpenAPI definition uses and conforms to the OpenAPI Specification [OpenAPI].

● OpenSearch

Draft specification for web search syndication, originating from Amazon’s A9 project and given a corresponding interface binding by the OASIS Search Web Services working group.

● Service interface

Shared boundary between an automated system or human being and another automated system or human being

● Workflow

Automation of a process, in whole or part, during which electronic documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules (source ISO 12651-2:2014)

4.1. Abbreviated terms

  • ADES Application Deployment and Execution Service

  • AP Application Package

  • CFP Call For Participation

  • CWL Common Workflow Language

  • DWG Domain Working Group

  • EMS Execution Management Service

  • EO Earth Observation

  • ER Engineering Report

  • ESA European Space Agency

  • GUI Graphical User Interface

  • JSON JavaScript Object Notation

  • OAS3 OpenAPI 3 Specification

  • OSDD OpenSearch Description Document

  • OWS OGC Web Services

  • REST REpresentational State Transfer

  • SNAP SeNtinel Application Platform

  • SWG Standards Working Group

  • TIE Technology Integration Experiment

  • UI User Interface

  • URI Uniform Resource Identifier

  • URL Uniform Resource Locator

  • WFS Web Feature Service

  • WPS Web Processing Service

  • XP Exploitation

5. Overview

Section 6 introduces the project context, background and the initial requirements for the OGC Testbed-16 Earth Observation Application Packages with Jupyter Notebooks activity.

Section 7 discusses the challenges and main findings raised during this TB-16 activity.

Section 8 presents the solutions and the results of the Technology Integration Experiments (TIEs).

Section 9 provides the conclusions and recommendations for future work.

6. Context and Requirements

Previous OGC Testbeds defined and developed an Earth Observation Exploitation Platform architecture that enables deploying and executing processing applications close to the input data hosted on a Cloud infrastructure.

The architecture currently allows deploying and executing applications packaged as Docker containers. Testbed-16 intended to complement this architecture with applications based on Project Jupyter as detailed below.

6.1. Jupyter Notebooks

The goal of Project Jupyter "is to build open-source tools and create a community that facilitates scientific research, reproducible and open workflows, education, computational narratives, and data analytics". To that end, Jupyter enables developing Notebooks that consist of a web interface combining visual interactive components resulting from the execution of editable programming cells.

Jupyter can execute code changes on the fly from the browser, display the result of a computation using a visual representation, and capture user input from graphical widgets. The figure below illustrates how code cells are used to enter and run code.

Figure 1. Code cells used to enter and run code

The web-based interactive programming interface can be used to access Earth Observation data or to display results on a map as shown below.

Figure 2. WMS layers displayed from a Jupyter Notebook

The TB-16 Call For Participation (CFP) requested that participants explore two modes for executing Notebooks:

  • Batch mode - Allowing the execution without user interface interaction.

  • Interactive applications (Web interface) - Can be provided as either a classical Notebook that enables code editing or a rich interactive application that hides the source code.

From the CFP: "The Jupyter Notebooks (D168, D169) to be developed will interact with data and processing capacities through the Data Access and Processing API [OGC 20-016]. The Notebooks will perform discovery, exploration, data requests and processing for both raster and vector data. Besides the standard interactive usage, the Jupyter Notebook should be used also as a Web application, hiding code and Notebook cell structure, as well as in batch mode".

6.2. Data Access and Processing API (DAPA)

For this TB-16 task, a new Data Access and Processing API (DAPA) was defined with the goal of developing an end-user optimized data access and processing API. The API was designed to provide simple access to processed samples of large datasets for subsequent analysis within an interactive Jupyter Notebook.

Existing data access web services often require complicated queries and are based on the characteristics of the data sources rather than the user perspective. A key design objective for the DAPA work is to provide a user-centric API in which data is obtained through function calls instead of by accessing multiple generic Web services or local files.

The DAPA aims to provide the following capabilities:

  • Specific data is bound to specific processing functions.

  • Data access is based on the selection of fields requested for a given spatial coverage (point, area, or n-dimension cube) and for a given temporal sampling (instance or interval).

  • Data access can be requested with a post-processing function, typically to aggregate the observations (by space and/or time).

  • Multiple data encoding formats can be requested.
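The capabilities listed above can be pictured as a single parameterized request. The following Python sketch is illustrative only: the resource path (`/dapa/area`), the parameter names (`fields`, `bbox`, `datetime`, `aggregate`, `f`), the collection identifier and the host are assumptions made for this example and are not taken from the DAPA specification.

```python
from urllib.parse import urlencode


def build_dapa_request(base_url, collection, fields, bbox, start, end,
                       aggregate=None, fmt="application/json"):
    """Assemble a DAPA-style request URL.

    The path layout and parameter names are hypothetical placeholders.
    """
    params = {
        # Fields (variables) requested from the collection
        "fields": ",".join(fields),
        # Spatial coverage: here an area given as a bounding box
        "bbox": ",".join(str(v) for v in bbox),
        # Temporal sampling: here an interval
        "datetime": f"{start}/{end}",
        # Requested encoding format
        "f": fmt,
    }
    if aggregate:
        # Optional post-processing function(s), e.g. a mean over time
        params["aggregate"] = ",".join(aggregate)
    return f"{base_url}/collections/{collection}/dapa/area?{urlencode(params)}"


url = build_dapa_request(
    "https://example.org/api",        # hypothetical endpoint
    "S2L2A",                          # hypothetical collection identifier
    ["NDVI"],
    (11.0, 46.0, 12.0, 47.0),
    "2020-06-01",
    "2020-06-30",
    aggregate=["mean"],
)
print(url)
```

In a Notebook, such a helper would typically be wrapped by a small client library so that the user only deals with function calls and never with raw query strings.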

6.3. Earth Observation Exploitation Platform Architecture

The traditional approach for EO processing consisted of downloading available content to local infrastructure, enabling customers to perform local execution on the fetched data.

Figure 3. Traditional Approach for Data Processing

OGC Testbed-13, Testbed-14, Testbed-15 EO activities and the OGC Earth Observation Applications Pilot developed an architecture that enables the deployment and execution of applications close to the physical location of the source data. The goal of such an architecture is to minimize data transfer between data repositories and application processes.

From a business perspective, the purpose of the proposed platform architecture is to reach additional customers beyond the usual platform Data Consumers. Application Providers are offered new facilities to submit their applications to the Cloud infrastructure (with a potential reward). The platform also provides user-friendly discovery and execution APIs for Application Consumers interested in the products generated by the hosted applications.

Figure 4. Exploitation Platform Architecture Customers

Two major Engineering Reports describe the architecture:

  • OGC 18-049r1 - Testbed-14: Application Package Engineering Report

  • OGC 18-050r1 - Testbed-14: ADES & EMS Results and Best Practices Engineering Report

Applications are packaged as Docker Images that are deployed as self-contained applications through the Application Deployment and Execution Service (ADES). OGC Web Processing Service (WPS) provides the standard interface to expose all ADES operations (including application deployment, discovery and execution).

Workflows are typically deployed as a CWL (Common Workflow Language) document through the Execution Management Service (EMS). Note that the BPMN alternative has also been explored in OGC 18-085.

The applications are described by an Application Descriptor (in practice, a WPS Process Description) that provides all the information required to supply the relevant inputs and start an execution process. The descriptor can either be generated automatically when registering the application or be provided explicitly.

6.4. Project Initial Requirements

The Testbed-16 CFP considered three complementary scenarios that might be modified during the Testbed, as long as the basic characteristics (i.e. data access, processing, and chaining) are preserved:

  1. The first scenario explores the possible interactions of Jupyter Notebooks with data and processing services through the Data Access and Processing API (DAPA) and potentially other OGC API endpoints.

  2. The second scenario demonstrates the deployment and execution of Jupyter Notebooks in the EO Exploitation Platform environment. Use cases were prototyped for both an interactive mode and a batch mode execution.

  3. The third scenario explores the execution of a workflow that includes both containerized application steps and Jupyter Notebook executions. The CFP encouraged exploring approaches other than CWL and BPMN.

6.5. Project Refined Requirements

During the TB-16 Kick Off and the following weeks, Participants and the Initiative Manager refined the initiative architecture and settled upon specific use cases and interface models to be used as a baseline for prototype component interoperability. The following aspects were agreed:

  • The scenario exploring interactions with OGC services must be explored without considering the ADES component.

  • The scenario studying the ADES execution must be focused on the batch mode.

  • The focus of the workflow aspects should be about the chaining of ADES applications from a (classical) Jupyter Notebook.

  • ADES API: The Testbed-14 interface must be aligned to the latest draft version of OGC API - Processes.

The initial requirements were translated into a new set of three EO Application Packages to be demonstrated:

6.5.1. EOAP-1 Interactive Local Jupyter Notebook

The first scenario explored the handling of Jupyter Notebooks and the interaction with data through the Data Access and Processing API (DAPA). The scenario was refined as follows:

  1. OpenSearch query (bbox, toi, keywords)

  2. GetMap retrieval of map images

  3. OpenSearch query (data domain for selected collections)

  4. DAPA data request for a time-averaged map and an area-averaged time series from a variety of source data types.

  5. DAPA data request with a bounding polygon obtained from a different query.

6.5.2. EOAP-2 Hosted Jupyter Notebook (Alice)

The second scenario explored interactive and batch mode execution on hosted Jupyter. This scenario is described as follows:

  1. Publish a Jupyter Notebook on an exploitation (XP) platform for Interactive use.

  2. Publish a Jupyter Notebook on an XP platform for a batch mode execution.

6.5.3. EOAP-3 Packaged Jupyter Notebook (Bob)

The third scenario explored:

  1. Convert the Jupyter Notebook (batch mode) into a deployable ADES Application Package.

  2. Process chaining of ADES processes from a Jupyter Notebook (and possibly other Docker container applications)

7. Findings and Lessons Learned

This section first introduces the concepts and technologies that help in understanding the Jupyter paradigm and the factors affecting proposed solutions used in this Testbed. Next, the use of a Notebook with ADES is discussed. This is the major topic demonstrated during this TB-16 activity. Then the key ideas for provisioning interactive Notebooks are presented with a focus on highlighting these assets for EO platforms. Finally, more challenges and lessons learned are elaborated.

7.1. Jupyter Concepts

Jupyter reshapes interactive computing by providing a web-based application able to capture a full computational workflow: developing, documenting, executing code and communicating the results.

Jupyter combines three components: a server, a Notebook document and a kernel. The Jupyter architecture and some specific aspects are detailed below.

7.1.1. Jupyter Notebook Architecture

The architecture of a Jupyter Notebook consists of a server rendering a Notebook document and an interacting programming language kernel:

  • The Notebook server is responsible for loading the Notebook, rendering the Notebook web user interface and sending (through ZeroMQ messages) the cells of code to the specific execution environment (kernel) when the user runs the cells. The user interface components are updated according to the cell execution result.

  • The kernels implement a Read-Eval-Print-Loop (REPL) model for a dedicated programming language, which consists of the following loop: prompt the user for some code, evaluate the code, then print the result. IPython is the reference Jupyter kernel, providing a powerful environment for interactive computing in Python.

  • The Notebook document essentially holds the metadata, the markdown cells, and the code cells that contain the source code in the language of the associated kernel. The official Jupyter Notebook format is defined with a JSON schema (https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json).

Figure 5. Jupyter Notebook Architecture
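To make the structure of the Notebook document concrete, the following Python sketch builds a minimal version 4 Notebook as a plain dictionary and serializes it to JSON using only the standard library. The cell contents and the kernel name are illustrative assumptions; a real Notebook would normally be created by the Jupyter tooling rather than by hand.

```python
import json

# A minimal Notebook document: metadata, one markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # Identifies the kernel associated with the code cells (illustrative values)
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example Notebook"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,   # not executed yet
            "outputs": [],             # filled in by the kernel on execution
            "source": ["print('hello')"],
        },
    ],
}

# The .ipynb file on disk is simply this structure encoded as JSON.
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Because the document is plain JSON, it can be stored, compared and exchanged like any other text file, which is what makes the repository-based approaches discussed later in this section possible.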

7.1.2. Jupyter Kernels and Environment

Jupyter kernels are the programming-language specific processes that run the Notebook cells code. IPython is the reference Jupyter kernel, providing a powerful environment for interactive computing in Python. A large set of kernels for other languages are available.

With administrator rights, kernels can be added to the Jupyter host using Python or conda. Conda is an open-source, cross-platform package and environment management system for any language (e.g. Python, R, Java). The conda command easily creates, saves, loads and switches between environments.

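The following commands create a conda environment, install a Python 2 kernel in the environment, then register the conda environment as a kernel in Jupyter:

# Create a conda environment named py2 containing Python 2 and ipykernel
conda create -n py2 python=2 ipykernel
# Install the kernel from inside the py2 environment
conda run -n py2 -- ipython kernel install
# Register the py2 environment as a named Jupyter kernel for the current user
conda run -n py2 -- python -m ipykernel install --user --name py2 --display-name "Python (py2)"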
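Registering a kernel essentially results in a small `kernel.json` file (a kernel specification) that tells Jupyter how to launch the kernel process. The Python sketch below writes such a file into a temporary directory so that it does not touch a real Jupyter installation. The helper name `write_kernelspec` and the target directory are assumptions made for this example; in practice the `ipykernel install` command creates the file in the appropriate Jupyter data directory.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_kernelspec(kernels_dir, name, display_name, python_executable):
    """Write a kernel.json file describing how to launch a Python kernel."""
    spec = {
        # Command line used by Jupyter to start the kernel process;
        # {connection_file} is substituted by Jupyter at launch time.
        "argv": [python_executable, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=1))
    return spec_path


# A temporary directory stands in for the Jupyter kernels directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_kernelspec(tmp, "py2", "Python (py2)", sys.executable)
    loaded = json.loads(path.read_text())
    print(loaded["display_name"])
```

Pointing `argv` at the interpreter of a given conda environment is what makes that environment selectable as a kernel from the Notebook interface.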

A Notebook might depend on specific software (e.g. GDAL, Orfeo ToolBox, geopandas) used from the Jupyter kernel. Depending on the configured permissions, those dependencies can be installed either from the Jupyter terminal, or using system commands directly from the Notebook (enabling other users to repeat the installation commands). The environment is persisted in the user environment of the Jupyter host.

Figure 6. Kernel Access to Host Environment

The usual means for installing dependencies from a Notebook are pip commands (Python libraries) and conda (virtual environments).

Conda environments can be registered as a Jupyter kernel. The environment is specified in a configuration file (see example below) listing a set of channels (repositories hosting the packages, potentially including custom software) and dependencies.

Conda Environment Example
name: my_environment_example
channels:
  - conda-ogc
dependencies:
  - otb
  - gdal
  - ipykernel
  - geopandas

To create an environment from an environment.yml file, the following command should be used:

conda env create -f environment.yml

The kernel maintains the state of a Notebook's computations. When the kernel is restarted, the Notebook resets its state and therefore loses all results of the Notebook cells. However, the dependencies installed using system commands generally remain in the user environment, depending on the specific implementation.
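The relationship between kernel state and restarts can be illustrated with a toy model. The `ToyKernel` class below is a hypothetical simplification written for this example, not the IPython implementation: it evaluates cells against a shared namespace and discards that namespace on restart.

```python
class ToyKernel:
    """A toy stand-in for a kernel: cells share one namespace until restart."""

    def __init__(self):
        self.namespace = {}

    def execute(self, source):
        # Each cell is evaluated against the same namespace,
        # so later cells can use the results of earlier ones.
        exec(source, self.namespace)

    def restart(self):
        # Restarting discards every variable computed so far.
        self.namespace = {}


kernel = ToyKernel()
kernel.execute("x = 21")          # first cell
kernel.execute("y = x * 2")       # second cell reuses x
value_before = kernel.namespace["y"]

kernel.restart()
state_lost = "y" not in kernel.namespace
print(value_before, state_lost)
```

Files written to disk or packages installed on the host are outside this namespace, which is why they are not affected by a restart in the same way.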

7.1.3. Notebooks Version Control

From any Jupyter session, it is possible to start a terminal and execute commands on the Jupyter host. In particular, git commands can be used to pull a repository and push changes.

Figure 7. JupyterLab Terminal

As illustrated below, from the terminal, the Notebook documents and the environment can be saved and retrieved from Git repositories (1). Then the environment can be provisioned using pip or conda (2).

Figure 8. JupyterLab Configuration Management

To push changes made to the Notebook, the user can execute a git command from the JupyterLab terminal, potentially pushing the change to a private user repository. Therefore, as illustrated below, multiple users can collaborate on a shared Notebook repository.

Figure 9. Notebook Collaboration using Git

7.1.4. Security Considerations

Security is a major challenge when sharing Notebooks. The potential for threat actors to attempt to exploit Notebooks for nefarious purposes is increasing with the growing popularity of Jupyter Notebook.

From the user's perspective, the Notebook server shall prevent untrusted code from being executed on the user's behalf when the Notebook is opened. In particular, the security model of Jupyter is based on a trust attribute (a signature computed from a digest of the Notebook's contents plus a secret key) that prevents the execution of any untrusted code and sanitizes the HTML source code. The Notebook's trust is updated when the user explicitly decides to execute a specific cell.
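The idea of a signature computed from the Notebook contents and a secret key can be sketched with a keyed hash (HMAC). The following Python example is a simplified illustration under stated assumptions: the function names, the choice of SHA-256, the JSON canonicalization and the in-memory set of known signatures are choices made for this sketch and do not claim to reproduce the actual Jupyter implementation.

```python
import hashlib
import hmac
import json


def sign_notebook(notebook, secret_key):
    """Compute a keyed digest of the Notebook contents (simplified)."""
    # Serialize deterministically so that equal contents give equal signatures.
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, secret_key, known_signatures):
    """A Notebook is trusted only if its signature was recorded earlier."""
    signature = sign_notebook(notebook, secret_key)
    return any(hmac.compare_digest(signature, known)
               for known in known_signatures)


secret = b"per-user-secret"   # hypothetical key, private to the user
original = {"cells": [{"cell_type": "code", "source": "print('ok')"}]}

# The user ran the Notebook, so its signature is recorded as trusted.
trusted_signatures = {sign_notebook(original, secret)}

# Someone else later modifies the shared document.
modified = {"cells": [{"cell_type": "code", "source": "print('changed')"}]}

print(is_trusted(original, secret, trusted_signatures))
print(is_trusted(modified, secret, trusted_signatures))
```

Because the key is private to the user, a Notebook received from another party cannot carry a valid signature, so its content is treated as untrusted until the user chooses to execute it.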

From the platform point of view, the security challenge implies offering the interactivity of a Notebook without allowing arbitrary code execution on the system by the end-user. In particular, restricting the access to the Notebook server is important. As the Notebook user might execute local system commands, the vulnerability of the system should be carefully reviewed - in particular regarding all the write permissions granted on the various offered resources.

7.2. Jupyter Notebook Technologies Review

A large set of technologies related to Jupyter Notebook have been explored and are detailed below. In particular, multiple solutions were considered for provisioning Jupyter Notebook environments.

7.2.1. Single-User Jupyter Notebook (JupyterLab)

In its simplest form, a Jupyter Notebook is executed by a kernel managed by a Jupyter server such as JupyterLab. JupyterLab provides the familiar building blocks of the classic Jupyter Notebook (Notebook, terminal, text editor, file browser, rich outputs, etc.) in a more flexible user interface that supports plugin extensions.

Figure 10. JupyterLab

The JupyterLab interface consists of a main work area containing tabs of documents and activities, a collapsible left sidebar, and a menu bar. The left sidebar contains a file browser, the list of running kernels and terminals, the command palette, the Notebook cell tools inspector, and the tabs list.

Tip
JupyterLab is relevant for local development of a Notebook but requires manual steps to share Notebooks on repositories and to provision the customized Notebook kernels.

7.2.2. Multi-User Jupyter Notebook (JupyterHub)

Sharing a Notebook server such as JupyterLab would make the concurrent users' commands collide and overwrite each other as the Notebook has exactly one interactive session connected to a kernel. In order to provision instances of a Jupyter Notebook and share the Notebook, a hub serving multiple users is required to spawn, manage, and proxy multiple instances of the single-user Jupyter Notebook server.

JupyterHub allows creating Jupyter single-user servers that guarantee the isolation of the user Notebook servers. The Notebook is copied to ensure the original deployed Notebook is preserved. JupyterHub includes JupyterLab and thus also supports the provisioning of the environment using Conda.

Figure 11. JupyterHub Architecture

Although nothing prevents a side component from managing the sharing of Notebook documents, this support goes beyond the capabilities provided by JupyterHub itself. Only extensions of JupyterHub provide Notebook sharing support.

Tip
JupyterHub brings multi-user support and Jupyter isolation but it does not provide support for sharing Notebooks.

7.2.3. Containerized Jupyter Notebook (repo2Docker)

Full reproducibility requires the possibility of recreating the system that was originally used to generate the results. This can, to a large extent, be accomplished by using Conda to make a project environment with specific versions of the packages that are needed in the project. The limitations of Conda are reached when complex software installation is required (e.g. a specific configuration of a tool).

For the support of sophisticated systems, repo2Docker can be used to build a reproducible container that can be executed anywhere by a Docker engine. repo2Docker can build a computational environment for any repository that follows the Reproducible Execution Environment Specification (https://repo2Docker.readthedocs.io/en/latest/specification.html#specification).

As illustrated in the figure below, for building a Jupyter Notebook container, repo2Docker must be used from a repository that consists of the Notebook resources and a Dockerfile that builds the environment and installs the Jupyter kernel.

Figure 12. Repo2Docker Process

Assuming that the Jupyter kernel is installed in the container, the JupyterLab interface can be launched from within a user session without additional configuration by appending /lab to the end of the URL:

http(s)://<server:port>/lab

Note that the DockerSpawner (https://jupyterhub-Dockerspawner.readthedocs.io/en/latest/) from JupyterHub allows spawning Notebooks provisioned from a Docker image.

Tip
Supporting Docker provides total flexibility for defining the Jupyter environment (e.g. if a custom GDAL configuration is needed to execute the Notebook). repo2Docker allows a user to reproduce the whole Jupyter environment in a container that can be deployed on any Docker engine.

7.2.4. Repository Based Jupyter Notebook (BinderHub)

BinderHub is an open-source tool that enables deploying a custom computing environment described in a Git repository and making it available (from an associated URL) to many remote users. A common use of Binder is the sharing of Jupyter Notebooks.

As illustrated in the figure below, BinderHub combines the tools described above to generate the Notebook: Repo2Docker generates a Docker image from the Git repository (potentially setting a conda environment), and the JupyterLab container is provisioned online based on JupyterHub running on Kubernetes.