Jupyter Notebook, JupyterLab, and JupyterHub are developed by Project Jupyter, a non-profit, open source project.
Jupyter Notebook (previously named IPython Notebook) and JupyterLab are user interfaces for computational science and data science, commonly used with Spark, TensorFlow, and other big data processing frameworks. Data scientists and ML engineers across a variety of organizations use them for interactive tasks. They support multiple languages through runners called "language kernels", and allow users to run code, save code and results, and easily share "notebooks" containing code, documentation, visualizations, and media.
JupyterHub lets users manage authenticated access to multiple single-user Jupyter notebooks. JupyterHub delegates the launching of single-user notebooks to pluggable components called "spawners". JupyterHub has a sub-project named kubespawner, maintained by the community, that enables users to provision single-user Jupyter notebooks backed by Kubernetes pods; the notebooks themselves run as Kubernetes pods. kubeform_spawner extends kubespawner to give users a form for specifying CPU, memory, GPU, and the desired image.
Refer to the user_guide for instructions on deploying JupyterHub via ksonnet.
Once that's completed, you will have a StatefulSet for JupyterHub, a configmap for configuration, and a LoadBalancer type of service, in addition to the requisite RBAC roles. If you are on Google Kubernetes Engine, the LoadBalancer type of service automatically creates an external IP address that can be used to access the Jupyter notebook. Note that this is for illustration purposes only. In a production environment, JupyterHub should be coupled with SSL and configured to use an authentication plugin.
If you're testing and want to avoid exposing JupyterHub on an external IP address, you can use kubectl instead to gain access to the hub on your local machine.
kubectl port-forward <jupyterhub-pod-name> 8000:8000
The above will expose JupyterHub on http://localhost:8000. The pod name can be obtained by running kubectl get pods, and will be tf-hub-0 by default.
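To confirm that the port-forward is working, the hub can be probed from Python. The following is a minimal sketch, assuming the hub answers unauthenticated GET requests at /hub/api with a JSON body containing a version field (the documented behavior of the JupyterHub REST API root). The function name hub_version is illustrative and not part of Kubeflow or JupyterHub.

```python
import json
import urllib.error
import urllib.request


def hub_version(base_url="http://localhost:8000", timeout=5.0):
    """Return the version string reported by the hub, or None if unreachable.

    Assumption: GET <base_url>/hub/api returns JSON such as {"version": "..."}.
    """
    url = base_url.rstrip("/") + "/hub/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None


if __name__ == "__main__":
    version = hub_version()
    if version is None:
        print("JupyterHub is not reachable on http://localhost:8000")
    else:
        print("JupyterHub version:", version)
```

If the reported version differs from the JupyterHub version installed in your notebook images, that may explain spawn failures (see the image requirements later in this document).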
Configuration for JupyterHub is shipped separately and contained within the configmap defined by the core component. It is a Python file that JupyterHub reads on startup. The supplied configuration has reasonable defaults for the requisite fields and no authenticator configured by default. Furthermore, we provide a number of parameters that can be used to configure the core component. To see a list of ksonnet parameters, run:
ks prototype describe kubeflow-core
If the provided parameters don't provide the flexibility you need, you can take advantage of ksonnet to customize the core component and use a config file fully specified by you.
Configuration includes sections for KubeSpawner and Authenticators. Spawner parameters include the form used when provisioning new Jupyter notebooks, and configuration defining how JupyterHub creates and interacts with Kubernetes pods for individual notebooks. Authenticator parameters correspond to the authentication mechanism used by JupyterHub.
Additional information about configuration can be found in the Zero to JupyterHub with Kubernetes guide and the JupyterHub documentation.
If you're using the quick-start, the external IP address of the JupyterHub instance can be obtained from kubectl get svc.
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tf-hub-0 ClusterIP None <none> <none> 1h
tf-hub-lb LoadBalancer 10.43.246.148 xx.yy.zz.ww 80:32689/TCP 36m
Now, you can access the external IP, http://xx.yy.zz.ww, with your browser.
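If you want to pick the external IP out of that output programmatically, the table can be parsed by column name. This is a minimal sketch that assumes the whitespace-separated layout shown above and the default service name tf-hub-lb. The helper external_ip is hypothetical, and in practice kubectl's structured output formats may be a better fit.

```python
def external_ip(svc_table, service_name="tf-hub-lb"):
    """Return the EXTERNAL-IP of the named service, or None if it has none."""
    lines = [ln for ln in svc_table.splitlines() if ln.strip()]
    if not lines:
        return None
    header = lines[0].split()
    try:
        name_col = header.index("NAME")
        ip_col = header.index("EXTERNAL-IP")
    except ValueError:
        return None  # Not a `kubectl get svc` table.
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > max(name_col, ip_col) and fields[name_col] == service_name:
            ip = fields[ip_col]
            # kubectl prints <none> or <pending> when no address is assigned.
            return None if ip in ("<none>", "<pending>") else ip
    return None


# Sample text copied from the output shown above.
SAMPLE = """\
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tf-hub-0 ClusterIP None <none> <none> 1h
tf-hub-lb LoadBalancer 10.43.246.148 xx.yy.zz.ww 80:32689/TCP 36m
"""

print(external_ip(SAMPLE))  # prints xx.yy.zz.ww
```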
When you spawn a new notebook, a configuration page should appear, allowing you to configure the Jupyter notebook image, CPU, memory, and additional resources. With the default DummyAuthenticator, the hub accepts any username/password combination for accessing the hub and creating new notebooks. You can use an alternate authenticator plugin if you want to secure your notebook server and use its administration functionality.
An image with JupyterHub 0.8.1, kubespawner 0.7.1 and two simple authenticator plugins can be built from within the docker/ directory using the Makefile provided. For example, if you're using Google Cloud Platform and have a project with ID foo configured to use gcr.io, you can do the following:
make build PROJECT_ID=foo
make push PROJECT_ID=foo
Images published in the Jupyter docker-stacks repo should work directly with the Hub. The only requirements for Jupyter notebook images used with this instance of the Hub are that they have the same version of JupyterHub installed (0.8.1 by default) and that a standard start-singleuser.sh is accessible via the default PATH.
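Both requirements can be checked from inside a candidate notebook image. The sketch below is one way to do so using only the Python standard library (Python 3.8 or later is assumed for importlib.metadata). The function names are illustrative, and the script is meant to be run inside the image being validated.

```python
import shutil
from importlib import metadata


def find_start_script(name="start-singleuser.sh"):
    """Return the full path of the start script if it is on PATH, else None."""
    return shutil.which(name)


def installed_jupyterhub_version():
    """Return the installed jupyterhub package version, or None if absent."""
    try:
        return metadata.version("jupyterhub")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("start-singleuser.sh:", find_start_script() or "not found on PATH")
    print("jupyterhub version:", installed_jupyterhub_version() or "not installed")
```

Compare the reported version against the one running in the hub (0.8.1 by default) before using the image.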
After creating the initial Hub and exposing it on a public IP address, you can add GitHub-based authentication. First, you'll need to create a GitHub OAuth application. The callback URL is of the form http://xx.yy.zz.ww/hub/oauth_callback.
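As a small illustration of that URL form, the callback URL can be derived from the hub's external address. The helper oauth_callback_url below is hypothetical; it only concatenates the address with the /hub/oauth_callback path described above.

```python
def oauth_callback_url(host, scheme="http"):
    """Build the callback URL to register with the GitHub OAuth application."""
    return f"{scheme}://{host}/hub/oauth_callback"


print(oauth_callback_url("xx.yy.zz.ww"))  # prints http://xx.yy.zz.ww/hub/oauth_callback
```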
Once the GitHub application is created in the GitHub UI, update manifest/config.yaml with the callback_url, client_id, and client_secret provided by the GitHub UI. Comment out the DummyAuthenticator and set the JupyterHub authenticator_class to GitHubOAuthenticator. You will also set the oauth_callback_url, client_id, and client_secret for the authenticator. An example configuration section might look like:
from oauthenticator.github import GitHubOAuthenticator
c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = 'http://xx.yy.zz.ww/hub/oauth_callback'
c.GitHubOAuthenticator.client_id = 'client_id_here'
c.GitHubOAuthenticator.client_secret = 'client_secret_here'
Finally, apply the updated configuration and restart the hub pod by doing the following:
ks apply ${ENVIRONMENT} -c ${COMPONENT_NAME}
kubectl delete pod tf-hub-0
Deleting the old pod causes a new pod to come up with the new configuration, using the GitHub authenticator you specified in the previous step. You can additionally modify the JupyterHub configuration to add whitelists and admin users. For example, to limit the hub to only the GitHub users user1 and user2, one might use the following configuration:
c.Authenticator.whitelist = {'user1', 'user2'}
After changing the configuration and running kubectl apply -f config.yaml, note that the JupyterHub pod must be restarted before the new configuration takes effect.