Commit

Add better integration with SageMaker capabilities and improve regional support.
thvasilo committed Dec 4, 2020
1 parent 669e130 commit 8b87a3a
Showing 19 changed files with 1,657 additions and 182 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.DS_Store

.ipynb_checkpoints/
18 changes: 11 additions & 7 deletions README.md
@@ -10,9 +10,9 @@ To get started quickly, use the following quick-launch link to launch a CloudFor

| Region | Stack |
| ---- | ---- |
|US East (N. Virginia) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://sagemaker-solutions-us-east-1.s3.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
|US East (Ohio) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-east-2.console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/create/review?templateURL=https://sagemaker-solutions-us-east-2.s3.us-east-2.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
|US West (Oregon) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?templateURL=https://sagemaker-solutions-us-west-2.s3-us-west-2.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
|US East (N. Virginia) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://sagemaker-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
|US East (Ohio) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-east-2.console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/create/review?templateURL=https://sagemaker-solutions-prod-us-east-2.s3.us-east-2.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
|US West (Oregon) | [<img src="https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png">](https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?templateURL=https://sagemaker-solutions-prod-us-west-2.s3.us-west-2.amazonaws.com/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) |
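Every row of the table follows the same URL pattern and differs only in the region. As a rough sketch, the current `sagemaker-solutions-prod-<region>` links can be rebuilt as below; `quick_launch_url` is a hypothetical helper, and a link generated for a region outside the table only works if a release bucket actually exists there, which this commit does not state.

```python
# Hypothetical helper that reproduces the quick-launch links in the table above.
SOLUTION_NAME = "Fraud-detection-using-machine-learning"
TEMPLATE_KEY = f"{SOLUTION_NAME}/deployment/fraud-detection-using-machine-learning.yaml"
STACK_NAME = "SageMaker-Fraud-Machine-Learning"


def quick_launch_url(region: str) -> str:
    """Return the CloudFormation quick-launch URL for `region`.

    Assumes the release bucket is named sagemaker-solutions-prod-<region>,
    as in the table; other regions may have no such bucket.
    """
    template_url = (
        f"https://sagemaker-solutions-prod-{region}.s3.{region}.amazonaws.com/"
        f"{TEMPLATE_KEY}"
    )
    return (
        f"https://{region}.console.aws.amazon.com/cloudformation/home"
        f"?region={region}#/stacks/create/review"
        f"?templateURL={template_url}&stackName={STACK_NAME}"
    )


if __name__ == "__main__":
    for region in ("us-east-1", "us-east-2", "us-west-2"):
        print(region, quick_launch_url(region))
```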


### Additional Instructions
@@ -38,7 +38,7 @@ Both of the trained models are deployed to Amazon SageMaker managed real-time en

The model training and endpoint deployment are orchestrated by running a [Jupyter notebook](source/notebooks/sagemaker_fraud_detection.ipynb) on a SageMaker notebook instance. The notebook runs a demonstration of the project using the aforementioned anonymized credit card dataset, which is automatically downloaded to the Amazon S3 bucket created when you launch the solution. The notebook can also be modified to run the project on a custom dataset in S3. The notebook instance additionally contains example code that shows how to invoke the REST API for inference.

In order to encapsulate the project as a stand-alone microservice, Amazon API Gateway is used to provide a REST API that is backed by an AWS Lambda function. The Lambda function runs the [code](https://github.com/awslabs/fraud-detection-using-machine-learning/blob/master/source/fraud_detection/index.py) to preprocess incoming transactions, invoke SageMaker endpoints, merge results from both endpoints if necessary, store the model inputs and model predictions in S3 via Kinesis Firehose, and provide a response to the client.
In order to encapsulate the project as a stand-alone microservice, Amazon API Gateway is used to provide a REST API that is backed by an AWS Lambda function. The Lambda function runs the code necessary to preprocess incoming transactions, invoke SageMaker endpoints, merge results from both endpoints if necessary, store the model inputs and model predictions in S3 via Kinesis Firehose, and provide a response to the client.
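The Lambda function's preprocess, invoke and merge steps can be sketched as follows. This is not the repository's `index.py`: the function names, the 0.5 threshold and the merged response shape are hypothetical, and it assumes the classifier endpoint returns a single probability as CSV text (as the SageMaker XGBoost container does for a `binary:logistic` model) while the second endpoint returns JSON in the `{"scores": [{"score": ...}]}` format used by SageMaker's Random Cut Forest. The Kinesis Firehose logging step is left out. Endpoint calls are injected so the logic runs without AWS; `make_boto3_invoker` shows one way to wire them to the `sagemaker-runtime` client's `invoke_endpoint`.

```python
import json
from typing import Callable, Dict, List, Optional

# (endpoint_name, csv_body) -> raw response bytes; injected so the logic needs no AWS access.
Invoker = Callable[[str, str], bytes]


def to_csv_row(features: List[float]) -> str:
    """Preprocess one transaction into a single CSV row (text/csv payload)."""
    return ",".join(str(value) for value in features)


def score_transaction(
    features: List[float],
    invoke: Invoker,
    classifier_endpoint: str,
    anomaly_endpoint: Optional[str] = None,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Dict[str, object]:
    body = to_csv_row(features)
    # Assumed classifier response: one probability as CSV text, e.g. b"0.93".
    probability = float(invoke(classifier_endpoint, body).decode("utf-8").strip())
    result: Dict[str, object] = {"fraud_probability": probability}
    if anomaly_endpoint is not None:
        # Assumed anomaly response: {"scores": [{"score": <float>}]}.
        anomaly = json.loads(invoke(anomaly_endpoint, body))
        result["anomaly_score"] = float(anomaly["scores"][0]["score"])
    result["is_fraud"] = probability >= threshold
    return result


def make_boto3_invoker() -> Invoker:
    """Wire the sketch to real endpoints through the sagemaker-runtime client."""
    import boto3  # imported lazily so the pure logic above runs without boto3

    runtime = boto3.client("sagemaker-runtime")

    def invoke(endpoint_name: str, body: str) -> bytes:
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name, ContentType="text/csv", Body=body
        )
        return response["Body"].read()

    return invoke
```

Inside a Lambda handler, the invoker would typically be created once at module load rather than per request, so the boto3 client is reused across warm invocations.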

## Data

@@ -78,12 +78,16 @@ We cite the following works:
* `notebooks/`
* `src`
* `package`
* `config.py`: Read in the environment variables set by cloudformation stack creation
* `config.py`: Reads the environment variables set during AWS CloudFormation stack creation
* `generate_endpoint_traffic.py`: Custom script to show how to send transaction traffic to REST API for inference
* `util.py`: Helper functions and utilities
* `sagemaker_fraud_detection.ipynb`: Orchestrates the solution. Trains the models and deploys the trained models
* `setup/`
* `on-start.sh`: Bash script to set up the SageMaker notebook environment with the necessary dependencies
* `endpoint_demo.ipynb`: A small notebook that demonstrates how to use the solution's endpoint to make predictions.
* `scripts/`
* `set_kernelspec.py`: Used to update the kernelspec name at deployment.
* `test/`
* Files that are used to automatically test the solution
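`scripts/set_kernelspec.py` is only described by its purpose here; the OnStart lifecycle script calls it with `--notebook`, `--kernel` and `--display-name`. A minimal sketch of such a script, assuming it rewrites the notebook's `metadata.kernelspec` JSON directly with the standard library (the real script may use `nbformat` or behave differently):

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional


def set_kernelspec(notebook_path: str, kernel_name: str, display_name: str) -> None:
    """Point a notebook at `kernel_name`, creating metadata.kernelspec if needed."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    kernelspec = notebook.setdefault("metadata", {}).setdefault("kernelspec", {})
    kernelspec["name"] = kernel_name
    kernelspec["display_name"] = display_name
    # indent=1 keeps a layout close to the one Jupyter writes; other keys are preserved.
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Update a notebook's kernelspec.")
    parser.add_argument("--notebook", required=True, help="Path to the .ipynb file")
    parser.add_argument("--kernel", required=True, help="Kernel name, e.g. conda_python3")
    parser.add_argument("--display-name", required=True, help="Name shown in the Jupyter UI")
    args = parser.parse_args(argv)
    set_kernelspec(args.notebook, args.kernel, args.display_name)
```

Saved as a script, it would end with the usual `if __name__ == "__main__": main()` guard and could then be called as `python ./scripts/set_kernelspec.py --notebook notebooks/endpoint_demo.ipynb --kernel conda_python3 --display-name conda_python3`.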


## License

99 changes: 99 additions & 0 deletions deployment/fraud-detection-sagemaker-demo-stack.yaml
@@ -0,0 +1,99 @@
AWSTemplateFormatVersion: "2010-09-09"
Description: "((SO0056)) - fraud-detection-using-machine-learning demo stack"
Parameters:
SolutionPrefix:
Description: The name of the prefix for the solution used for naming resources.
Type: String
SolutionsBucket:
Description: The bucket that contains the solution files.
Type: String
SolutionName:
Type: String
ExecutionRoleArn:
Description: The role used when invoking the endpoint.
Type: String

Mappings:
RegionMap:
"us-west-1":
"XGBoost": "746614075791.dkr.ecr.us-west-1.amazonaws.com"
"us-west-2":
"XGBoost": "246618743249.dkr.ecr.us-west-2.amazonaws.com"
"us-east-1":
"XGBoost": "683313688378.dkr.ecr.us-east-1.amazonaws.com"
"us-east-2":
"XGBoost": "257758044811.dkr.ecr.us-east-2.amazonaws.com"
"ap-northeast-1":
"XGBoost": "354813040037.dkr.ecr.ap-northeast-1.amazonaws.com"
"ap-northeast-2":
"XGBoost": "366743142698.dkr.ecr.ap-northeast-2.amazonaws.com"
"ap-southeast-1":
"XGBoost": "121021644041.dkr.ecr.ap-southeast-1.amazonaws.com"
"ap-southeast-2":
"XGBoost": "783357654285.dkr.ecr.ap-southeast-2.amazonaws.com"
"ap-south-1":
"XGBoost": "720646828776.dkr.ecr.ap-south-1.amazonaws.com"
"ap-east-1":
"XGBoost": "651117190479.dkr.ecr.ap-east-1.amazonaws.com"
"ca-central-1":
"XGBoost": "341280168497.dkr.ecr.ca-central-1.amazonaws.com"
"cn-north-1":
"XGBoost": "450853457545.dkr.ecr.cn-north-1.amazonaws.com.cn"
"cn-northwest-1":
"XGBoost": "451049120500.dkr.ecr.cn-northwest-1.amazonaws.com.cn"
"eu-central-1":
"XGBoost": "492215442770.dkr.ecr.eu-central-1.amazonaws.com"
"eu-north-1":
"XGBoost": "662702820516.dkr.ecr.eu-north-1.amazonaws.com"
"eu-south-1":
"XGBoost": "048378556238.dkr.ecr.eu-south-1.amazonaws.com"
"eu-west-1":
"XGBoost": "141502667606.dkr.ecr.eu-west-1.amazonaws.com"
"eu-west-2":
"XGBoost": "764974769150.dkr.ecr.eu-west-2.amazonaws.com"
"eu-west-3":
"XGBoost": "659782779980.dkr.ecr.eu-west-3.amazonaws.com"
"me-south-1":
"XGBoost": "801668240914.dkr.ecr.me-south-1.amazonaws.com"
"sa-east-1":
"XGBoost": "737474898029.dkr.ecr.sa-east-1.amazonaws.com"
"us-gov-west-1":
"XGBoost": "414596584902.dkr.ecr.us-gov-west-1.amazonaws.com"

Resources:
FraudClassificationModel:
Type: "AWS::SageMaker::Model"
Properties:
ExecutionRoleArn: !Ref ExecutionRoleArn
PrimaryContainer:
Image: !Sub
- "${ContainerLocation}/sagemaker-xgboost:0.90-2-cpu-py3"
- ContainerLocation:
Fn::FindInMap: [RegionMap, !Ref "AWS::Region", "XGBoost"]
ModelDataUrl: !Sub "s3://${SolutionsBucket}/${SolutionName}/artifacts/xgboost-model.tar.gz"
ModelName: !Sub "${SolutionPrefix}-demo"
FraudClassificationEndpointConfig:
Type: "AWS::SageMaker::EndpointConfig"
Properties:
ProductionVariants:
- InitialInstanceCount: 1
InitialVariantWeight: 1.0
InstanceType: ml.m5.xlarge
ModelName: !GetAtt FraudClassificationModel.ModelName
VariantName: !GetAtt FraudClassificationModel.ModelName
EndpointConfigName: !Sub "${SolutionPrefix}-demo"
Metadata:
cfn_nag:
rules_to_suppress:
- id: W1200
reason: Demo endpoint not given a KmsID
FraudClassificationEndpoint:
Type: "AWS::SageMaker::Endpoint"
Properties:
EndpointName: !Sub "${SolutionPrefix}-demo"
EndpointConfigName: !GetAtt FraudClassificationEndpointConfig.EndpointConfigName

Outputs:
EndpointName:
Description: Name of the demo XGBoost fraud classification endpoint
Value: !GetAtt FraudClassificationEndpoint.EndpointName
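The model's `Image` is built by `Fn::FindInMap` (region to registry host) feeding a two-argument `!Sub`. The sketch below mimics that resolution in Python for an excerpt of `RegionMap`; the function names are hypothetical. It also includes a small lint check that flags registry hosts with stray whitespace or a hostname naming a different region than the key, since either would yield an unusable image URI. A region missing from `RegionMap` (for example `af-south-1`, which the notebook-instance template lists but this mapping does not) makes the lookup fail, which the sketch models as a `KeyError`.

```python
import string

# Excerpt of the RegionMap mapping from the template above (not the full table).
REGION_MAP = {
    "us-east-1": {"XGBoost": "683313688378.dkr.ecr.us-east-1.amazonaws.com"},
    "us-west-2": {"XGBoost": "246618743249.dkr.ecr.us-west-2.amazonaws.com"},
    "cn-north-1": {"XGBoost": "450853457545.dkr.ecr.cn-north-1.amazonaws.com.cn"},
}

# First element of the !Sub list used for PrimaryContainer.Image.
IMAGE_TEMPLATE = "${ContainerLocation}/sagemaker-xgboost:0.90-2-cpu-py3"


def find_in_map(mapping: dict, top_level_key: str, second_level_key: str) -> str:
    """Rough stand-in for Fn::FindInMap: a missing key raises KeyError,
    much as CloudFormation rejects the stack when a lookup key is absent."""
    return mapping[top_level_key][second_level_key]


def xgboost_image_uri(region: str, mapping: dict = REGION_MAP) -> str:
    """Resolve the image URI the way the !Sub + FindInMap pair does."""
    location = find_in_map(mapping, region, "XGBoost")
    # string.Template accepts the ${Name} placeholder form used in this template.
    return string.Template(IMAGE_TEMPLATE).substitute(ContainerLocation=location)


def suspicious_entries(mapping: dict) -> list:
    """Flag registry hosts with stray whitespace or naming a different region."""
    flagged = []
    for region, entry in mapping.items():
        host = entry["XGBoost"]
        if host != host.strip() or f".{region}." not in host:
            flagged.append(region)
    return flagged
```

Running a check like `suspicious_entries` over the full mapping before publishing the template is a cheap way to catch copy-paste slips in long region tables.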
67 changes: 60 additions & 7 deletions deployment/fraud-detection-sagemaker-notebook-instance.yaml
@@ -4,7 +4,6 @@ Description: >-
Parameters:
SolutionPrefix:
Type: String
Default: "sm-soln-fraud-detection"
ParentStackName:
Type: String
SolutionName:
@@ -17,19 +16,62 @@ Parameters:
Type: String
RESTAPIGateway:
Type: String
TestOutputsS3Bucket:
Type: String

Mappings:
SolutionsS3BucketName:
development:
Prefix: sagemaker-solutions-build
Prefix: sagemaker-solutions-devo
release:
Prefix: sagemaker-solutions
Prefix: sagemaker-solutions-prod
NotebookInstanceType:
"af-south-1":
Type: ml.t3.medium
"ap-east-1":
Type: ml.t3.medium
"ap-northeast-1":
Type: ml.t3.medium
"ap-northeast-2":
Type: ml.t2.medium
"ap-south-1":
Type: ml.t2.medium
"ap-southeast-1":
Type: ml.t3.medium
"ap-southeast-2":
Type: ml.t3.medium
"ca-central-1":
Type: ml.t3.medium
"eu-central-1":
Type: ml.t3.medium
"eu-north-1":
Type: ml.t3.medium
"eu-south-1":
Type: ml.t3.medium
"eu-west-1":
Type: ml.t3.medium
"eu-west-2":
Type: ml.t3.medium
"eu-west-3":
Type: ml.t3.medium
"me-south-1":
Type: ml.t3.medium
"sa-east-1":
Type: ml.t3.medium
"us-east-1":
Type: ml.t3.medium
"us-east-2":
Type: ml.t3.medium
"us-west-1":
Type: ml.t3.medium
"us-west-2":
Type: ml.t3.medium

Resources:
BasicNotebookInstance:
Type: 'AWS::SageMaker::NotebookInstance'
Properties:
InstanceType: ml.t3.medium
InstanceType: !FindInMap [NotebookInstanceType, !Ref "AWS::Region", Type]
NotebookInstanceName: !Sub "${SolutionPrefix}-notebook-instance"
RoleArn: !Ref NotebookInstanceExecutionRoleArn
LifecycleConfigName: !GetAtt
@@ -55,8 +97,8 @@ Resources:
cd /home/ec2-user/SageMaker
# copy source files
aws s3 sync s3://${SolutionsS3BucketNamePrefix}-${AWS::Region}/${SolutionName}/source .
unzip ./creditcardfraud.zip -d ./notebooks/
rm ./creditcardfraud.zip
# copy test files
aws s3 sync s3://${SolutionsS3BucketNamePrefix}-${AWS::Region}/${SolutionName}/test ./test
# create stack_outputs.json with stack resources that are required in notebook(s)
touch stack_outputs.json
echo '{' >> stack_outputs.json
@@ -66,22 +108,33 @@ Resources:
echo ' "AwsRegion": "${AWS::Region}",' >> stack_outputs.json
echo ' "IamRole": "${NotebookInstanceExecutionRoleArn}",' >> stack_outputs.json
echo ' "ModelDataBucket": "${ModelDataBucket}",' >> stack_outputs.json
echo ' "SolutionsS3Bucket": "${SolutionsS3BucketNamePrefix}-${AWS::Region}",' >> stack_outputs.json
echo ' "SolutionsS3Bucket": "${SolutionsS3BucketNamePrefix}",' >> stack_outputs.json
echo ' "RESTAPIGateway": "${RESTAPIGateway}",' >> stack_outputs.json
echo ' "TestOutputsS3Bucket": "${TestOutputsS3Bucket}",' >> stack_outputs.json
echo ' "SolutionName": "${SolutionName}",' >> stack_outputs.json
echo ' "SagemakerMode": "NotebookInstance"' >> stack_outputs.json
echo '}' >> stack_outputs.json
echo "stack_outputs.json created:"
cat stack_outputs.json
# Replace placeholders
cd /home/ec2-user/SageMaker/notebooks
sed -s -i 's/HUB_1P_IMAGE/conda_python3/g' *.ipynb
EOF
- SolutionsS3BucketNamePrefix:
Fn::FindInMap: [SolutionsS3BucketName, Ref: StackVersion, Prefix]
OnStart:
- Content:
Fn::Base64: |
#!/bin/bash
set -e
# perform following actions as ec2-user
sudo -u ec2-user -i <<EOF
/home/ec2-user/anaconda3/envs/python3/bin/python /home/ec2-user/SageMaker/env_setup.py --force --log-level DEBUG
cd /home/ec2-user/SageMaker
for nb in notebooks/*.ipynb; do python ./scripts/set_kernelspec.py --notebook "\$nb" --kernel "conda_python3" --display-name "conda_python3"; done
# Optionally run the solution's notebook if this was an integration test launch
nohup /home/ec2-user/anaconda3/envs/python3/bin/python ./test/run_notebook.py > ./test/run_notebook.log 2>&1 &
echo "OnStart script completed!"
EOF
Outputs:
SageMakerNotebook:
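The OnCreate script writes `stack_outputs.json` into `/home/ec2-user/SageMaker` so that notebooks can pick up the stack's resources. A minimal sketch of a reader, covering only the keys visible in this diff (the collapsed part of the hunk may write more, which the sketch ignores); the dataclass and function names are hypothetical. Because the updated script stores only the bucket prefix under `SolutionsS3Bucket`, `solutions_bucket_name` rebuilds the regional name as `<prefix>-<region>`, mirroring the source path of the `aws s3 sync` call.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Union

# JSON key written by the lifecycle script -> hypothetical Python field name.
KEY_MAP = {
    "AwsRegion": "aws_region",
    "IamRole": "iam_role",
    "ModelDataBucket": "model_data_bucket",
    "SolutionsS3Bucket": "solutions_s3_bucket",
    "RESTAPIGateway": "rest_api_gateway",
    "TestOutputsS3Bucket": "test_outputs_s3_bucket",
    "SolutionName": "solution_name",
    "SagemakerMode": "sagemaker_mode",
}


@dataclass(frozen=True)
class StackOutputs:
    aws_region: str
    iam_role: str
    model_data_bucket: str
    solutions_s3_bucket: str
    rest_api_gateway: str
    test_outputs_s3_bucket: str
    solution_name: str
    sagemaker_mode: str


def load_stack_outputs(
    path: Union[str, Path] = "/home/ec2-user/SageMaker/stack_outputs.json",
) -> StackOutputs:
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in KEY_MAP if key not in raw]
    if missing:
        raise KeyError(f"stack_outputs.json is missing keys: {missing}")
    # Keys not listed in KEY_MAP are ignored.
    return StackOutputs(**{field: raw[key] for key, field in KEY_MAP.items()})


def solutions_bucket_name(outputs: StackOutputs) -> str:
    """Rebuild '<prefix>-<region>', the bucket the OnCreate script syncs from."""
    return f"{outputs.solutions_s3_bucket}-{outputs.aws_region}"
```

Failing fast on missing keys surfaces a broken lifecycle script at the first notebook cell instead of deep inside training code.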