We showcase how a data scientist can develop a machine learning dashboard on Amazon SageMaker and then share that dashboard with business users in a secure and robust way using Amazon ECS and Amazon Cognito. Our example dashboard uses Streamlit, but because the dashboard is containerized, other dashboard tools can easily be swapped in.
You will need an AWS account to use this solution. If you don't have one, you can sign up for an account on the AWS website.
You will also need to have permission to use AWS CloudFormation and to create all the resources detailed in the architecture section. All AWS permissions can be managed through AWS IAM. Admin users will have the required permissions, but please contact your account's AWS administrator if your user account doesn't have the required permissions.
- cloudformation/
  - template.yaml: Creates AWS CloudFormation Stack for solution.
  - development/
    - development.yaml: Creates AWS CloudFormation Stack for development.
  - deployment/
    - deployment.yaml: Creates AWS CloudFormation Stack for deployment.
    - self-signed-certificate/: Custom resource to create self-signed SSL certificate.
    - string-functions/: Custom resource to perform string functions in stack.
  - solution-assistant/: Custom resource to clean up stack resources.
- docs/: Additional documentation and diagrams.
- examples/
  - nyc-uber-pickups/
    - dashboard/
      - src/
        - app.py: Streamlit application code.
      - Dockerfile: Container requirements for dashboard.
      - requirements.in: Unpinned Python requirements for dashboard.
      - requirements.txt: Pinned Python requirements for dashboard.
    - nyu-uber-pickups.ipynb: Orchestrates the solution for this example.
    - utils.py: Useful functions for dashboard development.
  - text-generation/
    - dashboard/
      - src/
        - app.py: Streamlit application code.
      - Dockerfile: Container requirements for dashboard.
      - requirements.in: Unpinned Python requirements for dashboard.
      - requirements.txt: Pinned Python requirements for dashboard.
    - model/
      - entry_point.py: Code used for model inference.
      - requirements.in: Unpinned Python requirements for model.
      - requirements.txt: Pinned Python requirements for model.
    - text-generation.ipynb: Orchestrates the solution for this example.
    - utils.py: Useful functions for dashboard development.
As part of the solution, the following services are used:
- Amazon SageMaker Notebook: Used to develop the dashboard.
- Amazon SageMaker Endpoint: Used to deploy an example trained model.
- Amazon ECR: Used to store the custom dashboard Docker image.
- Amazon ECS: Used to run custom dashboard Docker containers inside a managed service.
- Amazon Cognito: Used to manage authentication to the dashboard.
- Elastic Load Balancing: Used to authenticate users through Amazon Cognito and route their requests to the Amazon ECS service.
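In this architecture the dashboard running on Amazon ECS is the user-facing layer, while the trained model sits behind an Amazon SageMaker Endpoint. A minimal sketch of how dashboard code could query that endpoint is shown below, assuming the model's inference code accepts and returns JSON. The helper name query_model and the payload schema are hypothetical; invoke_endpoint is the real boto3 "sagemaker-runtime" client method, and the client is passed in so the function can be tested without AWS access.

```python
import json
from typing import Any, Protocol


class SageMakerRuntimeClient(Protocol):
    """The one method of the boto3 "sagemaker-runtime" client this sketch uses."""

    def invoke_endpoint(self, **kwargs: Any) -> Any: ...


def query_model(client: SageMakerRuntimeClient, endpoint_name: str, payload: dict) -> Any:
    """Send a JSON payload to a SageMaker endpoint and decode its JSON reply.

    ASSUMPTION: the model's inference code accepts and returns
    application/json; the payload schema is hypothetical and depends on
    your own entry point.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # "Body" is a streaming object; read() returns the raw response bytes.
    return json.loads(response["Body"].read())
```

In a Streamlit app you would typically create the client once with boto3.client("sagemaker-runtime") and call query_model when the user submits input; the endpoint name and payload keys are placeholders. The ECS task's IAM role would also need permission for sagemaker:InvokeEndpoint on that endpoint.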
You are responsible for the cost of the AWS services used while running this solution.
As of 25th May 2020 in the US West (Oregon) region, the cost to:
- run an ml.t3.medium Amazon SageMaker Notebook Instance for development is $0.0582 per hour.
- store model artifacts in Amazon S3 is $0.023 per GB-month.
- host a DistilGPT-2 model on an ml.c5.xlarge Amazon SageMaker Hosting Instance is $0.238 per hour.
- store dashboard Docker containers in Amazon ECR is $0.10 per GB-month.
- run a dashboard Application Load Balancer is $0.0225 per hour and $0.008 per LCU-hour.
- run a dashboard Amazon ECS task (1 vCPU and 2GB of memory) is $0.04937 per hour.
- protect your dashboard with Amazon Cognito is $0.0055 per monthly active user.
All prices are subject to change. See the pricing webpage for each AWS service you will be using in this solution.
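To turn the rates above into a rough monthly figure, here is a small estimator sketch. It hard-codes the 25 May 2020 US West (Oregon) rates listed above and assumes that the endpoint, load balancer and ECS tasks all run for the same number of hours, with a constant average LCU consumption. It ignores free tiers and other charges such as data transfer. The MonthlyUsage fields and the example inputs are hypothetical; replace them with your own workload.

```python
from dataclasses import dataclass
from typing import Dict

# Rates quoted above: US West (Oregon), 25 May 2020, in USD.
NOTEBOOK_PER_HOUR = 0.0582    # ml.t3.medium notebook instance
ENDPOINT_PER_HOUR = 0.238     # ml.c5.xlarge hosting instance
ALB_PER_HOUR = 0.0225         # Application Load Balancer
ALB_PER_LCU_HOUR = 0.008      # per Load Balancer Capacity Unit
ECS_TASK_PER_HOUR = 0.04937   # one 1 vCPU / 2 GB ECS task
S3_PER_GB_MONTH = 0.023
ECR_PER_GB_MONTH = 0.10
COGNITO_PER_MAU = 0.0055


@dataclass
class MonthlyUsage:
    """Hypothetical usage figures for one month; replace with your own."""

    notebook_hours: float
    serving_hours: float  # ASSUMPTION: endpoint, ALB and ECS tasks run equally long
    ecs_tasks: int
    avg_lcus: float       # ASSUMPTION: constant average LCU consumption
    s3_gb: float
    ecr_gb: float
    monthly_active_users: int


def estimate_monthly_cost(u: MonthlyUsage) -> Dict[str, float]:
    """Return a per-service cost breakdown (USD) plus a 'total' entry."""
    costs = {
        "notebook": u.notebook_hours * NOTEBOOK_PER_HOUR,
        "endpoint": u.serving_hours * ENDPOINT_PER_HOUR,
        "alb": u.serving_hours * (ALB_PER_HOUR + u.avg_lcus * ALB_PER_LCU_HOUR),
        "ecs": u.serving_hours * u.ecs_tasks * ECS_TASK_PER_HOUR,
        "s3": u.s3_gb * S3_PER_GB_MONTH,
        "ecr": u.ecr_gb * ECR_PER_GB_MONTH,
        "cognito": u.monthly_active_users * COGNITO_PER_MAU,
    }
    costs["total"] = sum(costs.values())
    return costs


# Example: 40 hours of notebook development and an always-on dashboard
# (about 730 hours in a month) with one ECS task and 10 users.
usage = MonthlyUsage(
    notebook_hours=40, serving_hours=730, ecs_tasks=1, avg_lcus=1.0,
    s3_gb=1.0, ecr_gb=1.0, monthly_active_users=10,
)
for service, cost in estimate_monthly_cost(usage).items():
    print(f"{service:>9}: ${cost:,.2f}")
```

With these illustrative inputs the total comes to roughly $235, of which the always-on ml.c5.xlarge endpoint accounts for about $174, so the hosting instance is the first place to look when reducing cost.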
When you've finished with this solution, make sure that you delete all unwanted AWS resources. AWS CloudFormation can automatically delete all standard resources created by the solution and notebook: go to the AWS CloudFormation Console and delete the parent stack. Deleting the parent stack automatically deletes the nested stacks.
Alternatively, use the AWS Command Line Interface (AWS CLI):

```shell
aws cloudformation delete-stack \
    --stack-name sagemaker-ml-dashboards
```
Caution: You need to manually delete any extra resources that you may have created in this notebook. Examples include extra Amazon S3 buckets (in addition to the solution's default bucket), extra Amazon SageMaker endpoints (created with a custom name), and extra Amazon ECR repositories.
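The same clean-up can be scripted from the notebook with boto3. The sketch below, assuming the default stack name from the CLI example above, deletes the parent stack and blocks on CloudFormation's built-in stack_delete_complete waiter; a second helper lists SageMaker endpoints so leftover ones can be spotted. The function names are hypothetical, the clients are passed in (for example boto3.client("cloudformation") and boto3.client("sagemaker")), and nothing here deletes endpoints automatically, because the list covers every endpoint in the region, not only those created by this solution.

```python
from typing import Any, List

STACK_NAME = "sagemaker-ml-dashboards"  # default stack name used in the CLI example above


def delete_solution_stack(cfn_client: Any, stack_name: str = STACK_NAME) -> None:
    """Delete the parent stack (its nested stacks go with it) and wait for completion.

    cfn_client is expected to be a boto3 CloudFormation client; it is passed
    in so the function is easy to test.
    """
    cfn_client.delete_stack(StackName=stack_name)
    # The built-in waiter polls the stack status until deletion finishes
    # and raises botocore.exceptions.WaiterError if it fails.
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)


def list_endpoint_names(sm_client: Any) -> List[str]:
    """List every SageMaker endpoint in the region so leftovers can be reviewed.

    sm_client is expected to be a boto3 SageMaker client.
    """
    names: List[str] = []
    for page in sm_client.get_paginator("list_endpoints").paginate():
        names.extend(endpoint["EndpointName"] for endpoint in page["Endpoints"])
    return names
```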
See the Customization Guide for more details.
See the Troubleshooting Guide for more details.
Sticky sessions are a mechanism to route a user's requests to the same dashboard server (sometimes called a 'target') for the duration of a session. When using an ECS service, multiple dashboard servers can run at the same time. Sticky sessions are useful when each dashboard server maintains state information in order to provide a continuous experience to users, but not all dashboard servers require them. Some libraries (such as Streamlit) use WebSockets, which are inherently sticky: a WebSocket connection stays with one target for its lifetime, so sticky sessions are not required in this case. See the Elastic Load Balancing documentation on target groups for more details. Other dashboard libraries are stateless, so again sticky sessions are not required.
Our solution does not use sticky sessions by default, but they can be enabled by setting ApplicationLoadBalancerStickySessions to true.
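To make the routing idea concrete, here is a toy Python simulation of cookie-based stickiness. It is not how the Application Load Balancer is implemented (when stickiness is enabled, the load balancer manages its own cookie); the ToyLoadBalancer class and the task names are hypothetical. Without stickiness, requests are spread round-robin across targets. With stickiness, the first request receives a cookie that pins later requests to the same target.

```python
import itertools
import uuid
from typing import Dict, List, Optional, Tuple


class ToyLoadBalancer:
    """Toy model of a target group with optional cookie-based stickiness.

    This is NOT how ALB is implemented; it only illustrates the routing idea.
    """

    def __init__(self, targets: List[str], sticky: bool) -> None:
        self._next_target = itertools.cycle(targets)  # plain round-robin
        self.sticky = sticky
        self._affinity: Dict[str, str] = {}  # session cookie -> pinned target

    def route(self, cookie: Optional[str]) -> Tuple[str, Optional[str]]:
        """Pick a target for one request; return (target, cookie for the client)."""
        if self.sticky and cookie in self._affinity:
            # Returning client: send it back to the server holding its state.
            return self._affinity[cookie], cookie
        target = next(self._next_target)
        if not self.sticky:
            # No affinity: each request may land on a different server.
            return target, None
        new_cookie = uuid.uuid4().hex
        self._affinity[new_cookie] = target
        return target, new_cookie


sticky_lb = ToyLoadBalancer(["task-a", "task-b"], sticky=True)
target, cookie = sticky_lb.route(None)  # first request sets a cookie
assert all(sticky_lb.route(cookie)[0] == target for _ in range(5))

plain_lb = ToyLoadBalancer(["task-a", "task-b"], sticky=False)
print([plain_lb.route(None)[0] for _ in range(4)])  # alternates between tasks
```

The non-sticky case is what a stateless dashboard needs; the sticky case only matters when per-server state must survive across separate HTTP requests.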
- Amazon SageMaker Developer Guide
- Amazon SageMaker Python SDK Documentation
- AWS CloudFormation User Guide
- NYC Uber Pickups
- Text Generation
This project is licensed under the Apache-2.0 License.