Jupyter

Jupyter is a software service for interactive computing and rich notebook editing. See more on the official website.

In Up2U, we deploy it in Kubernetes using the Zero to JupyterHub project.

Software Architecture

The architecture of the Jupyter service consists of:

  • JupyterHub application, run in a container (pod),
  • JupyterHub's configurable HTTP proxy, run in a separate container (pod),
  • a database for user metadata and sessions; by default, it is SQLite in the JupyterHub container (pod),
  • single-user Jupyter notebook servers with JupyterLab user interface, each one in a separate container (pod),
  • prePuller - a DaemonSet downloading necessary Docker images to all Kubernetes nodes, to speed up starting single-user pods.

From the Kubernetes perspective, we have two pods for JupyterHub and the proxy (assuming the SQLite database), plus one single-user pod for each currently active user.

Deployment

Configuration

First, inspect the chart's default configuration file (values.yaml) and the configuration reference to decide what you want to customize. For testing, you can leave nearly everything at its default; with chart version 0.9.0, the only value that has to be set is proxy.secretToken.
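To illustrate how your own values file relates to the chart defaults, here is a minimal Python sketch of the merge behaviour: nested mappings are merged key by key, while scalars and lists replace the default. This is only an illustration of the semantics, not Helm's implementation, and it does not model Helm's handling of null values. The function name merge_values and the excerpt of defaults are assumptions made for this example.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied.

    Nested mappings are merged key by key; any other value
    (scalar or list) replaces the default wholesale.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical excerpt of chart defaults, for illustration only.
chart_defaults = {
    "singleuser": {"memory": {"limit": None, "guarantee": "1G"}},
    "cull": {"enabled": True, "timeout": 3600, "every": 600},
}

# Only the keys we want to change go into our own values file.
our_values = {
    "singleuser": {"memory": {"limit": "2G"}},
    "cull": {"timeout": 1800},
}

effective = merge_values(chart_defaults, our_values)
print(effective["singleuser"]["memory"])
print(effective["cull"])
```

Keys that are absent from our_values keep their default, which is why a nearly empty values file is enough for a test deployment.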

Some commonly-used config options are the following:

# values.yaml

proxy:
  # generate one with `openssl rand -hex 32`
  secretToken: 785e66bd19b956296fb751c96306fa23f4deaf21b999baa3353fc53d2c1695a3
  https:
    # assuming HTTPS is terminated by infrastructure provider
    enabled: false
    type: offload

hub:
  image:
    name: OUR-PROJECT/k8s-hub
    tag: v1.0
  templatePaths:
    # the custom templates added in the JupyterHub image
    - /opt/templates
  # max concurrent users
  activeServerLimit: 100
  # max users concurrently starting their containers 
  concurrentSpawnLimit: 10
  extraConfig: |-
    config = '/etc/jupyter/jupyter_notebook_config.py'
    c.Spawner.cmd = ['jupyter-labhub']

auth:
  type: custom
  custom:
    className: oauthenticator.generic.GenericOAuthenticator
    # provide configuration details depending on your SSO
    config:
      login_service: "SSO"
      client_id: "*****"
      client_secret: "*****"
      token_url: https://proxy.eduteams.org/OIDC/token
      userdata_url: https://proxy.eduteams.org/OIDC/userinfo
      userdata_method: GET
      userdata_params: {'state': 'state'}
      username_key: sub
      scope:
        - openid

prePuller:
  hook:
    enabled: true

singleuser:
  defaultUrl: "/lab"
  image:
    name: OUR-PROJECT/k8s-single-user
    tag: v1.0
  cpu:
    limit: 2
    guarantee: .5
  memory:
    limit: 2G
    guarantee: 1G
  storage:
    capacity: 10G

# disable if not auto-scaling the K8s cluster
scheduling:
  userScheduler:
    enabled: false

# destroy singleuser pods that are inactive for 30 minutes
cull:
  enabled: true
  timeout: 1800
  every: 180
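The numbers in this configuration can be sanity-checked with a little arithmetic. The sketch below, in Python, estimates how much CPU and memory the cluster has to be able to reserve when hub.activeServerLimit users are active at once, and how long an idle pod can survive at most given cull.timeout and cull.every. The function names are made up for this example, and the estimate ignores the hub, the proxy, and system pods.

```python
def guaranteed_resources(active_server_limit, cpu_guarantee, memory_guarantee_g):
    """Total CPU cores and memory (in the chart's G units) reserved
    when every allowed single-user server is running."""
    total_cpu = active_server_limit * cpu_guarantee
    total_memory_g = active_server_limit * memory_guarantee_g
    return total_cpu, total_memory_g


def max_idle_seconds(timeout, every):
    """Upper bound on how long an idle pod can live: it may become
    eligible just after one culler run and is removed at the next."""
    return timeout + every


# Values taken from the example configuration above.
cpus, memory_g = guaranteed_resources(
    active_server_limit=100, cpu_guarantee=0.5, memory_guarantee_g=1
)
print(cpus, memory_g)               # 50.0 100
print(max_idle_seconds(1800, 180))  # 1980
```

With these settings, the worker nodes need about 50 CPU cores and 100G of memory available for single-user pods, and an inactive pod is removed after at most 33 minutes.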

Theme and single-user customization

To customize the theme, one needs to extend the default Docker images of JupyterHub and the single-user server, and reference the new images in values.yaml.

Example JupyterHub image:

FROM jupyterhub/k8s-hub:0.9.0

# add our templates and CSS (referenced from templates)
COPY templates /opt/templates
COPY styles /usr/local/share/jupyterhub/static/css/custom

Read more about JupyterHub templates in the JupyterHub documentation.

Example single-user image:

FROM jupyterhub/k8s-singleuser-sample:0.9.0
ENV JUPYTER_ENABLE_LAB=true

# install necessary system utils
USER root
RUN apt-get update && \
    apt-get install -y \
        zip \
        unzip \
        && rm -rf /var/lib/apt/lists/*

# install theme
COPY theme /opt/theme
RUN chown -R $NB_USER: /opt/theme
USER $NB_USER
RUN jupyter labextension install /opt/theme 

The actual theme for this image should be prepared as a JupyterLab extension.

Actual deployment

Using helm v3, deploy the Jupyter service:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --cleanup-on-fail \
  --install jhub jupyterhub/jupyterhub \
  --version=0.9.0 \
  --values values.yaml

Find out more in the official documentation.

Get admin rights

To grant admin rights for the first time, update the JupyterHub database directly. Replace the pod name and <username> with your own values:

kubectl exec -it hub-77d75ff89-8sj59 -- sqlite3 jupyterhub.sqlite "update users set admin=1 where name='<username>'; select name from users where admin<>0;"

# if JupyterHub does not see the change, we need to restart it:
kubectl delete pod hub-77d75ff89-8sj59

Next, the newly granted admin can grant admin rights to others via the UI.
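The SQL statement used above can be tried out safely before touching the real database. The following Python sketch runs the same update and select against an in-memory SQLite database. The users table here is a simplified stand-in with only the two columns the statement touches (an assumption for this example; the real JupyterHub table has more columns), and grant_admin is a hypothetical helper name.

```python
import sqlite3


def grant_admin(conn, username):
    """Mark one user as admin and return the names of all admins."""
    with conn:  # commits on success, rolls back on error
        # The ? placeholder avoids quoting problems in user names.
        conn.execute("UPDATE users SET admin = 1 WHERE name = ?", (username,))
    rows = conn.execute("SELECT name FROM users WHERE admin <> 0 ORDER BY name")
    return [name for (name,) in rows]


# Simplified stand-in for the JupyterHub users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT UNIQUE, admin INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(grant_admin(conn, "alice"))  # ['alice']
```

Using a bound parameter instead of pasting the name into the SQL string is a design choice that matters when user names come from an SSO and may contain quotes.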

Scaling up

At the time of writing:

"JupyterHub isn't designed to support being run in parallel. More work needs to be done in JupyterHub itself for a fully highly available (HA) deployment of JupyterHub on k8s is to be possible." [source]

Thus, there is no easy way to horizontally scale either the JupyterHub application or the proxy. The only way to handle a larger load is to give these pods more resources.

Please note that users send HTTP requests to the JupyterHub application only when authenticating and when starting up (spawning) their single-user pods. Afterwards, users send requests only to their single-user pods, so the load on the JupyterHub pod stays low.

Note that all users' HTTP requests always go through the HTTP proxy.
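The request flow described above can be sketched as a longest-prefix lookup in the proxy's routing table. The Python example below is a simplified model, not the proxy's actual code: the route table, the target names, and the function resolve_route are assumptions for illustration. It shows that only requests without a matching user route end up at the hub.

```python
def resolve_route(routes, path):
    """Return the target of the longest route prefix that matches path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)]


# Hypothetical routing table: the default route points at the hub,
# and each running single-user server adds its own /user/<name>/ route.
routes = {
    "/": "hub",
    "/user/alice/": "jupyter-alice",
    "/user/bob/": "jupyter-bob",
}

print(resolve_route(routes, "/hub/login"))        # hub
print(resolve_route(routes, "/user/alice/lab"))   # jupyter-alice
print(resolve_route(routes, "/user/carol/lab"))   # hub
```

In this model, the last request reaches the hub because no server is running for that user yet. Once a server is spawned and its route is added, the same path goes straight to the user's pod.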

Persistent data

The most critical persistent storage is the Kubernetes volumes mounted into the single-user pods. They hold the users' files.

The other critical storage is the database behind JupyterHub. By default, it is a SQLite database file in a volume mounted into the JupyterHub pod.