This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub. It is based on AWS Data on EKS JupyterHub.
- Prerequisites
- AWS Cloud Configuration for Terraform
- AWS CLI Configuration for Multiple Accounts and Environments
- Environment Variables
- Variables File
- Github OAuth
- Deployment
- Cleanup
- Update
- Route the Domain in Route 53
- Manual Takedown of Just the Hub
- Adding Admins to EKS
- Adjusting Available Server Options
- Adjusting Available Nodes
- Adjusting Core Node
- Upgrading Kubernetes
- Kubernetes Layer Tour
This guide assumes that you have:
- A registered domain
- An AWS Certificate for the domain and subdomains
- An AWS IAM account (Trust Policy to assume JupyterhubProvisioningRole, or Admin if the Role has not been created).
- Terraform >= 1.8.3 (installation guide)
- kubectl >= 1.26.15 (installation guide)
- yamllint >= 1.35.1 (installation guide)
The project directory is structured to separate environment-specific configurations from the main
Terraform configuration. This allows for easier management and scalability when dealing with
multiple environments. Each deployment is given its own directory in `envs/`.
This document explains how to set up the necessary AWS resources and configurations for using Terraform to provision JupyterHub.
- Create an S3 Bucket (a CLI alternative to these console steps is sketched after this list):
  - Go to the S3 console in AWS.
  - Click "Create bucket".
  - Name the bucket `jupyterhub-terraform-state-bucket` (ensure the name is unique per AWS account).
  - Choose the region `us-east-2`.
  - Enable default encryption.
  - Create the bucket.
- Configure Terraform to Use the S3 Bucket:
  - In the `envs/<deployment>` directory, create a file named `backend.tf` with the following content:

    ```hcl
    terraform {
      backend "s3" {
        bucket         = "jupyterhub-terraform-state-bucket"
        key            = "terraform.tfstate"
        region         = "us-east-2"
        encrypt        = true
        dynamodb_table = "jupyterhub-terraform-lock-table"
      }
    }
    ```
- Create a DynamoDB Table:
  - Go to the DynamoDB console in AWS.
  - Click "Create table".
  - Name the table `jupyterhub-terraform-lock-table`.
  - Set the primary key to `LockID` (String).
  - Create the table.
- Create an IAM Role:
  - Go to the IAM console in AWS.
  - Click "Roles" and then "Create role".
  - Choose `AWS service` and select `Custom trust policy`.
- Set Up the Trust Policy:
  - Edit the trust relationship for the `JupyterhubProvisioningRole` role to allow the necessary entities to assume the role. Copy and paste the policy below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:PrincipalType": "User"
}
}
},
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
},
{
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
- Create and attach inline policies:
  - From the `JupyterhubProvisioningRole`, under the `Permissions` tab, select `Create inline policy`.
  - From the `JSON` tab, create `terraform-jupyterhub-backend-policies` using the JSON in `.aws`.
  - From the `JSON` tab, create `terraform-jupyterhub-provisioning-policies` using the JSON in `.aws`.
- Set Maximum Session Duration:
  - 1 hour is usually sufficient, but will occasionally fail.
  - 4 hours is recommended.
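If you prefer the AWS CLI over the console, the state bucket, lock table, and role session duration steps above can also be scripted. This is a minimal sketch, assuming the resource names used in this guide and a profile with sufficient permissions; adjust names and region to your deployment.

```bash
# CLI equivalents of the console steps above (bucket/table/role names from this guide).
# Create the Terraform state bucket in us-east-2 and enable default encryption.
aws s3api create-bucket \
  --bucket jupyterhub-terraform-state-bucket \
  --region us-east-2 \
  --create-bucket-configuration LocationConstraint=us-east-2
aws s3api put-bucket-encryption \
  --bucket jupyterhub-terraform-state-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Create the DynamoDB lock table keyed on LockID.
aws dynamodb create-table \
  --table-name jupyterhub-terraform-lock-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-2

# Raise the maximum session duration of the provisioning role to 4 hours.
aws iam update-role \
  --role-name JupyterhubProvisioningRole \
  --max-session-duration 14400
```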
To manage multiple AWS accounts and environments, you need to configure your AWS CLI with the appropriate profiles. Follow the steps below to set up your `~/.aws/config` and `~/.aws/credentials` files.
- Obtain Your AWS Access Keys:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Users and click on your user name.
  - Go to the Security credentials tab.
  - Click Create access key and note down the Access key ID and Secret access key.
- Edit Your `.aws/credentials` File:
  - Open the `.aws/credentials` file in your home directory. If it doesn't exist, create it.
  - Add your access keys for each profile:

    ```ini
    [mcgovern]
    aws_access_key_id = YOUR_MCGOVERN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_MCGOVERN_SECRET_ACCESS_KEY

    [bican]
    aws_access_key_id = YOUR_BICAN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_BICAN_SECRET_ACCESS_KEY
    ```
- Obtain Your Role ARN:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Roles and find the role you will assume (e.g., `JupyterhubProvisioningRole`).
  - Note down the Role ARN.
- Edit Your `.aws/config` File:
  - Open the `.aws/config` file in your home directory. If it doesn't exist, create it.
  - Add the region, role ARN, and source profile for each environment. Here's an example:

    ```ini
    [profile mcgovern]
    region = us-east-2
    role_arn = arn:aws:iam::MCGOVERN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = mcgovern

    [profile bican]
    region = us-east-2
    role_arn = arn:aws:iam::BICAN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = bican
    ```
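As a quick sanity check (a sketch, assuming the profiles above), confirm that a profile resolves to the provisioning role:

```bash
# Should return the assumed-role identity for JupyterhubProvisioningRole.
aws sts get-caller-identity --profile mcgovern
```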
Environment variables store secrets and the hub deployment name:
- `AWS_PROFILE`: The profile for the AWS account to deploy to, see AWS config above.
- `TF_VAR_github_client_id`: See Github OAuth Step.
- `TF_VAR_github_client_secret`: See Github OAuth Step.
- `TF_VAR_aws_certificate_arn`: See Create Cert Step.
- `TF_VAR_danditoken`: API token for the DANDI instance used for user auth.
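A minimal sketch of setting these before running the install script; every value below is a placeholder, not a real credential:

```bash
# Placeholder values -- substitute your own secrets and never commit them.
export AWS_PROFILE=mcgovern
export TF_VAR_github_client_id=<github-oauth-client-id>
export TF_VAR_github_client_secret=<github-oauth-client-secret>
export TF_VAR_aws_certificate_arn=arn:aws:acm:us-east-2:<account>:certificate/<id>
export TF_VAR_danditoken=<dandi-api-token>
```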
The variables are set in a `terraform.tfvars` for each env, e.g. `envs/dandi/terraform.tfvars`:
- `name`: (optional, defaults to `jupyerhub-on-eks`)
- `singleuser_image_repo`: Docker Hub repository containing the custom JupyterHub image
- `singleuser_image_tag`: Image tag
- `jupyterhub_domain`: The domain to host the JupyterHub landing page (e.g. "hub.dandiarchive.org")
- `dandi_api_domain`: The domain that hosts the DANDI API with the list of registered users
- `region`: Cloud vendor region (e.g. us-west-1)
WARNING: If changing `region`, it must be changed both in the tfvars and in `backend.tf`.
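For illustration only, a sketch of bootstrapping a new env's `terraform.tfvars` with the variables listed above; the env name, image repository, and domains are all hypothetical placeholders:

```bash
# Hypothetical example values -- replace with your deployment's actual settings.
cat > envs/myenv/terraform.tfvars <<'EOF'
singleuser_image_repo = "<dockerhub-user>/<jupyterhub-image>"
singleuser_image_tag  = "latest"
jupyterhub_domain     = "hub.example.org"
dandi_api_domain      = "api.example.org"
region                = "us-east-2"
EOF
```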
JupyterHub is configured by merging two YAML files:
- `envs/shared/jupyterhub.yaml`
- `envs/$ENV/jupyterhub-overrides.yaml`

Env Minimum Requirements:
- `hub.config.Authenticator.admin_users`
This template is configuration for the JupyterHub Helm chart; see the administrator guide for JupyterHub.
The `jupyterhub.yaml` and `jupyterhub-overrides.yaml` can use `${terraform.templating.syntax}` with values that are explicitly passed to the `jupyterhub_helm_config` template object in `addons.tf`.
The original AWS Jupyterhub Example Blueprint docs may be helpful.
Merge Strategy:
- Additive: New fields are added.
- Clobbering: Existing values, including lists, are overwritten.
Example:
Base Configuration (envs/shared/jupyterhub.yaml)
singleuser:
some_key: some_val
profileList:
- item1
- item2
Override Configuration (envs/$ENV/jupyterhub-overrides.yaml)
singleuser:
new_key: new_val
profileList:
- item3
Resulting Configuration
singleuser:
some_key: some_val
new_key: new_val
profileList:
- item3
- Open the GitHub OAuth App Wizard: GitHub settings -> Developer settings -> OAuth Apps. For dandihub, this is owned by a bot GitHub user account (e.g. dandibot).
- Create App:
  - Set the `Homepage URL` to the site root (e.g., `https://hub.dandiarchive.org`). Must be the same as `jupyterhub_domain`.
  - The `Authorization callback URL` must be `<jupyterhub_domain>/hub/oauth_callback`.
Execute install script
./install.sh <env>
Timeouts and race conditions
`Context Deadline Exceeded`: This just happens sometimes and is usually resolved by rerunning the install script.
Key Management Service Duplicate Resource
This is usually caused by a problem with tfstate. It can't be immediately fixed because Amazon Key Management Service objects have a 7-day waiting period before deletion.
The workaround is to change/add a `name` var in the tfvars (e.g. `jupyerhub-on-eks-2`).
Mark the existing KMS key for deletion. You will need to assume the AWS IAM Role used to create it (e.g. `JupyterhubProvisioningRole`).
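A hedged sketch of scheduling the leftover key for deletion with the AWS CLI, assuming you know its key ID and have a profile that assumes the provisioning role:

```bash
# <key-id> is the ID of the KMS key orphaned by the tfstate problem.
# KMS enforces a waiting period (minimum 7 days) before the key is actually deleted.
aws kms schedule-key-deletion \
  --key-id <key-id> \
  --pending-window-in-days 7 \
  --profile <profile-that-assumes-JupyterhubProvisioningRole>
```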
Show config of current jupyterhub deployment
Warning: This is the fully templated jupyterhub. Be careful not to expose secrets.
helm get values jupyterhub -n jupyterhub
Route the Domain in Route 53
In Route 53 -> Hosted Zones -> <jupyterhub_domain>, create an `A` type Record that routes to an `Alias to Network Load Balancer`. Set the region and the EXTERNAL_IP of the `service/proxy-public` Kubernetes object in the `jupyterhub` namespace.
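One way to look up the load balancer address for the record (a sketch, assuming your kubeconfig points at the cluster):

```bash
# Prints the NLB address of the proxy-public service; use it as the Alias target in Route 53.
kubectl get service proxy-public -n jupyterhub \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```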
This will need to be redone each time the `proxy-public` service is recreated (occurs during `./cleanup.sh`).
Changes to variables or the template configuration are usually applied idempotently by running `./install.sh <env>`, without the need to run cleanup first.
Prior to cleanup, ensure that kubectl and helm are using the appropriate kubeconfig (`<name>` is the value of `name` in `terraform.tfvars`):
aws eks --region us-east-2 update-kubeconfig --name <name>
Cleanup requires the same variables and is run with `./cleanup.sh <env>`.
NOTE: Occasionally the Kubernetes namespace fails to delete.
WARNING: Sometimes AWS VPCs are left up due to an upstream Terraform race condition and must be deleted by hand (including hand-deleting each nested object).
Running `terraform destroy -target=module.eks_data_addons.helm_release.jupyterhub -auto-approve` will destroy all the JupyterHub assets, but will leave the EKS and VPC infrastructure intact.
Add the user/IAM to mapUsers
.
kubectl edit configMap -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: <snip>
mapRoles: <snip>
mapUsers: |
- groups:
- system:masters
userarn: arn:aws:iam::<acct_id>:user/<iam_username>
username: <iam_username>
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
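To confirm the entry was applied, a quick check (a sketch; the cluster name placeholder matches the kubeconfig step above):

```bash
# Review the rendered aws-auth ConfigMap; the new user should appear under mapUsers.
kubectl get configmap aws-auth -n kube-system -o yaml

# The new admin can then refresh their kubeconfig and confirm access.
aws eks --region us-east-2 update-kubeconfig --name <name>
kubectl get pods -n jupyterhub
```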
These are the options for user-facing machines that run as a pod on the node, and they are configured in `profileList` in `dandihub.yaml`.
Each profile can have multiple user-facing `profile_options`, including `images`.
These are the EKS machines that may run underneath one or more user-hub pods, and they are configured via Karpenter.
The node pools are configured in `addons.tf` with `karpenter-resources-*` objects.
The configuration for the machines that run the autoscaling and monitoring layer is `eks_managed_node_groups` in `main.tf`.
Kubernetes version is controlled via the Terraform variable `eks_cluster_version`; the default is in `versions.tf`, but each deployment can specify its own value in its `tfvars`.
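For example (the version number here is hypothetical; pick a version supported by EKS and the blueprint), a deployment can pin its own version in its tfvars and re-run the install:

```bash
# Hypothetical version value -- set whatever your deployment should run.
echo 'eks_cluster_version = "1.30"' >> envs/<env>/terraform.tfvars
./install.sh <env>
```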
These objects are created by Zero to JupyterHub (z2jh): https://z2jh.jupyter.org/en/stable/
kubectl get all -n jupyterhub
Notable objects:
- `pod/hub-23490-393`: JupyterHub server and culler pod
- `pod/jupyter-<github_username>`: User pod
- `pod/user-scheduler-5d8b9567-26x6j`: Creates user pods. There are two; one has been elected leader, with one backup.
- `service/proxy-public`: LoadBalancer, External IP must be connected to DNS (Route 53)
`pod/karpenter-75fc7784bf-cjddv` responds similarly to the cluster-autoscaler.
When Jupyterhub user pods are scheduled and sufficient Nodes are not available, Karpenter creates a NodeClaim and then interacts with AWS to spin up machines.
- `nodeclaims`: Create a node from one of the Karpenter NodePools. (This is where spot/on-demand is configured for user pods.)
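To watch this happening, a sketch (the resource names assume a Karpenter version that exposes the NodePool/NodeClaim APIs, and the controller namespace may differ in your deployment):

```bash
# NodeClaims show the machines Karpenter is requesting; NodePools show the pools they come from.
kubectl get nodeclaims
kubectl get nodepools

# The Karpenter controller logs show its provisioning decisions.
kubectl logs -n karpenter deployment/karpenter
```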