diff --git a/README.md b/README.md index 8364b49..b62c636 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,59 @@ -## Fraud Detection Using Machine Learning +# Fraud Detection using Machine Learning -Setup end to end demo architecture for predicting fraud events with Machine Learning using Amazon SageMaker +With businesses moving online, fraud and abuse in online systems is constantly increasing as well. Traditionally, rule-based fraud detection systems are used to combat online fraud, but these rely on a static set of rules created by human experts. This project uses machine learning to create models for fraud detection that are dynamic, self-improving and maintainable. Importantly, they can scale with the online business. + +Specifically, we show how to use Amazon SageMaker to train supervised and unsupervised machine learning models on historical transactions, so that they can predict the likelihood of incoming transactions being fraudulent or not. We also show how to deploy the models, once trained, to a REST API that can be integrated into an existing business software infrastructure. This project includes a demonstration of this process using a public, anonymized credit card transactions [dataset provided by ULB](https://www.kaggle.com/mlg-ulb/creditcardfraud), but can be easily modified to work with custom labelled or unlabelled data provided as a relational table in csv format. + +## Getting Started + +To get started quickly, use the following quick-launch link to launch a CloudFormation Stack create form and follow the instructions below to deploy the resources in this project. 
+ +| Region | Stack | +| ---- | ---- | +|US West (Oregon) | [](https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?templateURL=https://s3.amazonaws.com/sagemaker-solutions-us-west-2/Fraud-detection-using-machine-learning/deployment/fraud-detection-using-machine-learning.yaml&stackName=SageMaker-Fraud-Machine-Learning) | + + +### Additional Instructions + +* On the stack creation page, enter a name in the **Model and Data Bucket Name** field under S3 configurations and in the **Results Bucket Name**, check the box to acknowledge creation of IAM resources, and click **Create Stack**. This should trigger the creation of the CloudFormation stack. + +* Once the stack is created, go to the Outputs tab and click on the *SageMakerNotebook* link. This will open up the Jupyter notebook in a SageMaker Notebook instance where you can run the code in the notebook. + +## Architecture + +The project architecture deployed by the CloudFormation template is shown here. + +![](deployment/architecture.png) + +## Project Description +The project uses Amazon SageMaker to train both a supervised and an unsupervised machine learning model, which are then deployed using Amazon SageMaker-managed endpoints. + +If you have labels for your data, for example if some of the transactions have been annotated as fraudulent and some as legitimate, then you can train a supervised learning model to learn to discern the two classes. In this project, we provide a recipe to train a gradient boosted decision tree model using [XGBoost on Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). The supervised model training process also handles the common issue of working with highly imbalanced data in fraud detection problems. 
The project addresses this issue in two ways: 1) implementing data upsampling using the "imbalanced-learn" package, and 2) using scale position weight to control the balance of positive and negative weights. + +If you don't have labelled data or if you want to augment your supervised model predictions with an anomaly score from an unsupervised model, then the project also trains a [RandomCutForest](https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html) model using Amazon SageMaker. The RandomCutForest algorithm is trained on the entire dataset, without labels, and takes advantage of the highly imbalanced nature of fraud datasets, to predict higher anomaly scores for the fraudulent transactions in the dataset. + +Both of the trained models are deployed to Amazon SageMaker managed real-time endpoints that host the models and can be invoked to provide model predictions for new transactions. + +The model training and endpoint deployment are orchestrated by running a [Jupyter notebook](source/notebooks/sagemaker_fraud_detection.ipynb) on a SageMaker Notebook instance. The Jupyter notebook runs a demonstration of the project using the aforementioned anonymized credit card dataset that is automatically downloaded to the Amazon S3 Bucket created when you launch the solution. However, the notebook can be modified to run the project on a custom dataset in S3. The notebook instance also contains some example code that shows how to invoke the REST API for inference. + +In order to encapsulate the project as a stand-alone microservice, Amazon API Gateway is used to provide a REST API that is backed by an AWS Lambda function. 
The Lambda function runs the [code](https://github.com/awslabs/fraud-detection-using-machine-learning/blob/master/source/fraud_detection/index.py) to preprocess incoming transactions, invoke sagemaker endpoints, merge results from both endpoints if necessary, store the model inputs and model predictions in S3 via Kinesis Firehose, and provide a response to the client. + + +## Contents + +* `deployment/` + * `fraud-detection-using-machine-learning.yaml`: Creates AWS CloudFormation Stack for solution +* `source/` + * `fraud-detection/` + * `index.py`: Lambda function script for invoking SageMaker endpoints for inference + * `notebooks/` + * `generate_endpoint_traffic.py`: Custom script to show how to send transaction traffic to REST API for inference + * `sagemaker_fraud_detection.ipynb`: Orchestrates the solution. Trains the models and deploys the trained model + * `setup/` + * `on-start.sh`: Bash script to setup sagemaker notebook environment with necessary dependencies ## License -This library is licensed under the Apache 2.0 License. +This project is licensed under the Apache-2.0 License. + + diff --git a/deployment/architecture.png b/deployment/architecture.png new file mode 100644 index 0000000..04d694c Binary files /dev/null and b/deployment/architecture.png differ diff --git a/deployment/fraud-detection-using-machine-learning.template b/deployment/fraud-detection-using-machine-learning.template deleted file mode 100644 index 13e8aea..0000000 --- a/deployment/fraud-detection-using-machine-learning.template +++ /dev/null @@ -1,804 +0,0 @@ -{ - "AWSTemplateFormatVersion": "2010-09-09", - "Description": "(SO0056) - fraud-detection-using-machine-learning: Solution for predicting fraud events with ML using Amazon SageMaker. Version 2", - "Parameters": { - "S3BucketName1": { - "Type": "String", - "Description": "New bucket for storing the Amazon SageMaker model and training data." 
- }, - "S3BucketName2": { - "Type": "String", - "Description": "New bucket for storing processed events for visualization features." - }, - "KinesisFirehosePrefix": { - "Type": "String", - "Default": "fraud-detection/firehose/", - "Description": "Kinesis Firehose prefix for delivery of processed events." - } - }, - "Metadata": { - "AWS::CloudFormation::Interface": { - "ParameterGroups": [{ - "Label": { - "default": "Amazon S3 Bucket Configuration" - }, - "Parameters": ["S3BucketName1", "S3BucketName2"] - }, - { - "Label": { - "default": "Amazon Kinesis Firehose Configuration" - }, - "Parameters": ["KinesisFirehosePrefix"] - } - ], - "ParameterLabels": { - "S3BucketName1": { - "default": "Model and Data Bucket Name" - }, - "S3BucketName2": { - "default": "Results Bucket Name" - }, - "KinesisFirehosePrefix": { - "default": "Kinesis Firehose S3 Prefix" - } - } - } - }, - "Mappings": { - "Function": { - "FraudDetection": { - "S3Bucket": "%%BUCKET_NAME%%", - "S3Key": "fraud-detection-using-machine-learning/%%VERSION%%/fraud_detection.zip" - } - }, - "Notebook": { - "FraudDetection": { - "S3Bucket": "%%BUCKET_NAME%%", - "S3Key": "/fraud-detection-using-machine-learning/%%VERSION%%/notebooks/sagemaker_fraud_detection.ipynb" - } - }, - "Script": { - "Install": { - "S3Bucket": "%%BUCKET_NAME%%", - "S3Key": "/fraud-detection-using-machine-learning/%%VERSION%%/notebooks/on-start.sh" - }, - "GenerateTraffic": { - "S3Bucket": "%%BUCKET_NAME%%", - "S3Key": "/fraud-detection-using-machine-learning/%%VERSION%%/notebooks/generate_endpoint_traffic.py" - } - } - }, - "Resources": { - "S3Bucket1": { - "Type": "AWS::S3::Bucket", - "Properties": { - "BucketName": { - "Ref": "S3BucketName1" - }, - "PublicAccessBlockConfiguration": { - "BlockPublicAcls": true, - "BlockPublicPolicy": true, - "IgnorePublicAcls": true, - "RestrictPublicBuckets": true - }, - "BucketEncryption": { - "ServerSideEncryptionConfiguration": [{ - "ServerSideEncryptionByDefault": { - "SSEAlgorithm": "AES256" - } - }] 
- } - }, - "Metadata": { - "cfn_nag": { - "rules_to_suppress": [{ - "id": "W35", - "reason": "Configuring logging requires supplying an existing customer S3 bucket to store logs" - }, - { - "id": "W51", - "reason": "Default policy works fine" - }] - } - } - }, - "S3Bucket2": { - "Type": "AWS::S3::Bucket", - "Properties": { - "BucketName": { - "Ref": "S3BucketName2" - }, - "PublicAccessBlockConfiguration": { - "BlockPublicAcls": true, - "BlockPublicPolicy": true, - "IgnorePublicAcls": true, - "RestrictPublicBuckets": true - }, - "BucketEncryption": { - "ServerSideEncryptionConfiguration": [{ - "ServerSideEncryptionByDefault": { - "SSEAlgorithm": "AES256" - } - }] - } - }, - "Metadata": { - "cfn_nag": { - "rules_to_suppress": [{ - "id": "W35", - "reason": "Configuring logging requires supplying an existing customer S3 bucket to store logs" - }, - { - "id": "W51", - "reason": "Default policy works fine" - }] - } - } - }, - "BasicNotebookInstance": { - "Type": "AWS::SageMaker::NotebookInstance", - "Properties": { - "InstanceType": "ml.t2.medium", - "NotebookInstanceName": "FraudDetectionNotebookInstance", - "RoleArn": { - "Fn::GetAtt": [ - "NotebookInstanceExecutionRole", - "Arn" - ] - }, - "LifecycleConfigName": { - "Fn::GetAtt": [ - "BasicNotebookInstanceLifecycleConfig", - "NotebookInstanceLifecycleConfigName" - ] - } - } - }, - "BasicNotebookInstanceLifecycleConfig": { - "Type": "AWS::SageMaker::NotebookInstanceLifecycleConfig", - "DependsOn": [ - "S3Bucket1" - ], - "Properties": { - "OnStart": [{ - "Content": { - "Fn::Base64": { - "Fn::Join": [";", ["cd /home/ec2-user/SageMaker", - { - "Fn::Join": ["", ["aws s3 cp s3://", { - "Fn::Join": ["-", [{ - "Fn::FindInMap": ["Notebook", "FraudDetection", "S3Bucket"] - }, { - "Ref": "AWS::Region" - }]] - }, { - "Fn::FindInMap": ["Notebook", "FraudDetection", "S3Key"] - }, " ."]] - }, - { - "Fn::Join": ["", ["sed -i 's/fraud-detection-end-to-end-demo/", { - "Ref": "S3BucketName1" - }, "/g' sagemaker_fraud_detection.ipynb"]] 
- }, - { - "Fn::Join": ["", ["aws s3 cp s3://", { - "Fn::Join": ["-", [{ - "Fn::FindInMap": ["Script", "GenerateTraffic", "S3Bucket"] - }, { - "Ref": "AWS::Region" - }]] - }, { - "Fn::FindInMap": ["Script", "GenerateTraffic", "S3Key"] - }, " ."]] - }, - { - "Fn::Join": ["", ["aws s3 cp s3://", { - "Fn::Join": ["-", [{ - "Fn::FindInMap": ["Script", "Install", "S3Bucket"] - }, { - "Ref": "AWS::Region" - }]] - }, { - "Fn::FindInMap": ["Script", "Install", "S3Key"] - }, " ."]] - }, - { - "Fn::Join": ["", ["sed -i 's/fraud-detection-api-placeholder/", { - "Ref": "RESTAPIGateway" - }, "/g' generate_endpoint_traffic.py"]] - }, - "bash ./on-start.sh" - ]] - } - } - }] - } - }, - "NotebookInstanceExecutionRole": { - "Type": "AWS::IAM::Role", - "Properties": { - "AssumeRolePolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Effect": "Allow", - "Principal": { - "Service": [ - "sagemaker.amazonaws.com" - ] - }, - "Action": [ - "sts:AssumeRole" - ] - }] - } - } - }, - "NotebookInstanceIAMPolicy": { - "Type": "AWS::IAM::Policy", - "Properties": { - "PolicyName": "sm-notebook-instance-policy", - "PolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Effect": "Allow", - "Action": [ - "s3:GetBucketLocation", - "s3:ListBucket", - "s3:GetObject", - "s3:PutObject", - "s3:DeleteObject" - ], - "Resource": [{ - "Fn::Join": ["", ["arn:aws:s3:::", { - "Ref": "S3BucketName1" - }]] - }, - { - "Fn::Join": ["", ["arn:aws:s3:::", { - "Ref": "S3BucketName1" - }, "/*"]] - }, - { - "Fn::Join": ["", ["arn:aws:s3:::", { - "Fn::Join": ["-", [{ - "Fn::FindInMap": ["Notebook", "FraudDetection", "S3Bucket"] - }, { - "Ref": "AWS::Region" - }]] - }, "/*"]] - } - ] - }, - { - "Effect": "Allow", - "Action": [ - "sagemaker:CreateTrainingJob", - "sagemaker:DescribeTrainingJob", - "sagemaker:CreateModel", - "sagemaker:DescribeModel", - "sagemaker:DeleteModel", - "sagemaker:CreateEndpoint", - "sagemaker:CreateEndpointConfig", - "sagemaker:DescribeEndpoint", - 
"sagemaker:DescribeEndpointConfig", - "sagemaker:DeleteEndpoint", - "sagemaker:DeleteEndpointConfig", - "sagemaker:InvokeEndpoint" - ], - "Resource": [{ - "Fn::Join": ["", ["arn:aws:sagemaker:", { - "Ref": "AWS::Region" - }, ":", { - "Ref": "AWS::AccountId" - }, ":*"]] - }] - }, - { - "Effect": "Allow", - "Action": [ - "ecr:GetDownloadUrlForLayer", - "ecr:BatchGetImage", - "ecr:BatchCheckLayerAvailability" - ], - "Resource": [{ - "Fn::Join": ["", ["arn:aws:ecr:", { - "Ref": "AWS::Region" - }, ":", { - "Ref": "AWS::AccountId" - }, ":repository/*"]] - }] - }, - { - "Effect": "Allow", - "Action": [ - "ec2:CreateVpcEndpoint", - "ec2:DescribeRouteTables" - ], - "Resource": "*" - }, - { - "Effect": "Allow", - "Action": [ - "cloudwatch:PutMetricData", - "cloudwatch:GetMetricData", - "cloudwatch:GetMetricStatistics", - "cloudwatch:ListMetrics" - ], - "Resource": [{ - "Fn::Join": ["", ["arn:aws:cloudwatch:", { - "Ref": "AWS::Region" - }, ":", { - "Ref": "AWS::AccountId" - }, ":*"]] - }] - }, - { - "Effect": "Allow", - "Action": [ - "logs:CreateLogGroup", - "logs:CreateLogStream", - "logs:DescribeLogStreams", - "logs:GetLogEvents", - "logs:PutLogEvents" - ], - "Resource": { - "Fn::Join": ["", ["arn:aws:logs:", { - "Ref": "AWS::Region" - }, ":", { - "Ref": "AWS::AccountId" - }, ":log-group:/aws/sagemaker/*"]] - } - }, - { - "Effect": "Allow", - "Action": [ - "iam:PassRole" - ], - "Resource": [{ - "Fn::GetAtt": ["NotebookInstanceExecutionRole", "Arn"] - } - - ], - "Condition": { - "StringEquals": { - "iam:PassedToService": "sagemaker.amazonaws.com" - } - } - }, - { - "Effect": "Allow", - "Action": [ - "iam:GetRole" - ], - "Resource": [{ - "Fn::GetAtt": ["NotebookInstanceExecutionRole", "Arn"] - }] - }, - { - "Effect": "Allow", - "Action": [ - "lambda:InvokeFunction" - ], - "Resource": [{ - "Fn::GetAtt": ["LambdaFunction", "Arn"] - }] - }, - { - "Effect": "Allow", - "Action": [ - "execute-api:Invoke" - ], - "Resource": [{ - "Fn::Join": [ - "", - [ - "arn:aws:execute-api:", - { 
"Ref": "AWS::Region" }, - ":", - { "Ref": "AWS::AccountId" }, - ":", - { "Ref": "RESTAPIGateway" }, - "/*/POST/*" - ] - ] - }] - } - ] - }, - "Roles": [{ - "Ref": "NotebookInstanceExecutionRole" - }] - }, - "Metadata": { - "cfn_nag": { - "rules_to_suppress": [{ - "id": "W12", - "reason": "This policy needs to have * resource because some of the resources are created dynamically and some of its actions are * resource actions" - }] - } - } - }, - "LambdaFunction": { - "Type": "AWS::Lambda::Function", - "Properties": { - "Handler": "index.lambda_handler", - "FunctionName": "fraud-detection-event-processor", - "Role": { - "Fn::GetAtt": [ - "LambdaExecutionRole", - "Arn" - ] - }, - "Code": { - "S3Bucket": { - "Fn::Join": ["-", [{ - "Fn::FindInMap": ["Function", "FraudDetection", "S3Bucket"] - }, { - "Ref": "AWS::Region" - }]] - }, - "S3Key": { - "Fn::FindInMap": ["Function", "FraudDetection", "S3Key"] - } - }, - "Runtime": "python3.6" - } - }, - "LambdaExecutionRole": { - "Type": "AWS::IAM::Role", - "Properties": { - "AssumeRolePolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Effect": "Allow", - "Principal": { - "Service": [ - "lambda.amazonaws.com" - ] - }, - "Action": [ - "sts:AssumeRole" - ] - }] - }, - "Path": "/", - "Policies": [{ - "PolicyName": "root", - "PolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Effect": "Allow", - "Action": [ - "logs:CreateLogGroup", - "logs:CreateLogStream", - "logs:PutLogEvents" - ], - "Resource": { - "Fn::Join": ["", ["arn:aws:logs:", { - "Ref": "AWS::Region" - }, ":", { - "Ref": "AWS::AccountId" - }, ":log-group:/aws/lambda/*"]] - } - }, - { - "Effect": "Allow", - "Action": [ - "sagemaker:InvokeEndpoint" - ], - "Resource": [ - "arn:aws:sagemaker:*:*:endpoint/*" - ] - }, - { - "Effect": "Allow", - "Action": [ - "firehose:PutRecord", - "firehose:PutRecordBatch" - ], - "Resource": { - "Fn::GetAtt": [ - "KinesisFirehoseDeliveryStream", - "Arn" - ] - } - } - - ] - } - }] - }, - "Metadata": { - 
"cfn_nag": { - "rules_to_suppress": [{ - "id": "W11", - "reason": "This role needs to have * resource in it's policy because resource names are created dynamically and some of its actions are * resource actions" - }] - } - } - }, - "KinesisFirehoseDeliveryStream": { - "Type": "AWS::KinesisFirehose::DeliveryStream", - "Properties": { - "DeliveryStreamName": "fraud-detection-firehose-stream", - "DeliveryStreamType": "DirectPut", - "S3DestinationConfiguration": { - "BucketARN": { - "Fn::GetAtt": [ - "S3Bucket2", - "Arn" - ] - }, - "Prefix": { - "Ref": "KinesisFirehosePrefix" - }, - "BufferingHints": { - "IntervalInSeconds": 60, - "SizeInMBs": 100 - }, - "CompressionFormat": "GZIP", - "EncryptionConfiguration": { - "NoEncryptionConfig": "NoEncryption" - }, - "RoleARN": { - "Fn::GetAtt": [ - "FirehoseDeliveryIAMRole", - "Arn" - ] - } - } - }, - "DependsOn": [ - "FirehoseDeliveryIAMPolicy" - ] - }, - "FirehoseDeliveryIAMRole": { - "Type": "AWS::IAM::Role", - "Properties": { - "AssumeRolePolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Sid": "", - "Effect": "Allow", - "Principal": { - "Service": "firehose.amazonaws.com" - }, - "Action": [ - "sts:AssumeRole" - ] - }] - } - } - }, - "FirehoseDeliveryIAMPolicy": { - "Type": "AWS::IAM::Policy", - "Properties": { - "PolicyName": "fraud-detection-firehose-policy", - "PolicyDocument": { - "Version": "2012-10-17", - "Statement": [{ - "Effect": "Allow", - "Action": [ - "s3:AbortMultipartUpload", - "s3:GetBucketLocation", - "s3:GetObject", - "s3:ListBucket", - "s3:ListBucketMultipartUploads", - "s3:PutObject" - ], - "Resource": [{ - "Fn::Join": ["", ["arn:aws:s3:::", { - "Ref": "S3BucketName2" - }]] - }, - { - "Fn::Join": ["", ["arn:aws:s3:::", { - "Ref": "S3BucketName2" - }, "/", { - "Ref": "KinesisFirehosePrefix" - }, "*"]] - } - ] - }] - }, - "Roles": [{ - "Ref": "FirehoseDeliveryIAMRole" - }] - }, - "Metadata": { - "cfn_nag": { - "rules_to_suppress": [{ - "id": "W12", - "reason": "This policy needs to have * 
resource because some of its actions are * resource actions" - }] - } - } - }, - "RESTAPIGateway" : { - "Type" : "AWS::ApiGateway::RestApi", - "Properties" : { - "Description" : "A REST API that can be used to invoke the Lambda function that triggers predictions.", - "Name" : "model-invocation-api", - "EndpointConfiguration": { - "Types": ["REGIONAL"] - } - } - }, - "APIGatewayCloudWatchLogGroup": { - "Type" : "AWS::Logs::LogGroup", - "Properties" : { - "LogGroupName" : {"Fn::Join": ["/", ["/aws/apigateway/AccessLogs", {"Ref": "RESTAPIGateway"}, "prod"]]}, - "RetentionInDays" : 3653 - } - }, - "APIGatewayCloudWatchRole": { - "Type": "AWS::IAM::Role", - "Properties": { - "AssumeRolePolicyDocument": { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": [ - "apigateway.amazonaws.com" - ] - }, - "Action": "sts:AssumeRole" - } - ] - }, - "Path": "/", - "ManagedPolicyArns": [ - "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs" - ] - } - }, - "APIGatewayAccount": { - "Type": "AWS::ApiGateway::Account", - "Properties": { - "CloudWatchRoleArn": { - "Fn::GetAtt": [ - "APIGatewayCloudWatchRole", - "Arn" - ] - } - }, - "DependsOn": ["RESTAPIGateway"] - }, - "LambdaAPIPermission" : { - "Type": "AWS::Lambda::Permission", - "Properties": { - "Action": "lambda:InvokeFunction", - "FunctionName": {"Fn::GetAtt": ["LambdaFunction", "Arn"]}, - "Principal": "apigateway.amazonaws.com", - "SourceArn": { - "Fn::Join": [ - "", - [ - "arn:aws:execute-api:", - { "Ref": "AWS::Region" }, - ":", - { "Ref": "AWS::AccountId" }, - ":", - { "Ref": "RESTAPIGateway" }, - "/*/POST/*" - ] - ] - } - } - }, - "RESTInvocationResource" :{ - "Type" : "AWS::ApiGateway::Resource", - "Properties" : { - "ParentId" : { "Fn::GetAtt": ["RESTAPIGateway", "RootResourceId"] }, - "PathPart" : "invocations", - "RestApiId" : {"Ref" : "RESTAPIGateway"} - } - }, - "POSTMethod": { - "Type": "AWS::ApiGateway::Method", - "Properties": { - "RestApiId": { 
- "Ref": "RESTAPIGateway" - }, - "ResourceId": { - "Ref": "RESTInvocationResource" - }, - "HttpMethod": "POST", - "AuthorizationType": "AWS_IAM", - "Integration": { - "Type": "AWS", - "IntegrationHttpMethod": "POST", - "Uri": {"Fn::Join" : ["", ["arn:aws:apigateway:", {"Ref": "AWS::Region"}, ":lambda:path/2015-03-31/functions/", {"Fn::GetAtt": ["LambdaFunction", "Arn"]}, "/invocations"]]}, - "IntegrationResponses": [{ - "ResponseTemplates": { - "application/json": "" - }, - "StatusCode": 200 - }, { - "SelectionPattern": "^not found.*", - "ResponseTemplates": { - "application/json": "{}" - }, - "StatusCode": 404 - }], - "PassthroughBehavior": "WHEN_NO_TEMPLATES", - "RequestTemplates": { - "application/json": - "{\"data\": $input.json('$.data'),\"metadata\": $input.json('$.metadata'),\"model\": \"$input.params('model')\"}" - } - }, - "MethodResponses": [{ - "ResponseModels": { - "application/json": "Empty" - }, - "StatusCode": 200 - }, { - "ResponseModels": { - "application/json": "Empty" - }, - "StatusCode": 404 - }], - "RequestParameters": { - "method.request.querystring.model": false - } - } - }, - "RestApiDeployment": { - "Type": "AWS::ApiGateway::Deployment", - "Properties": { - "StageDescription": { - "AccessLogSetting": { - "DestinationArn" : {"Fn::GetAtt": ["APIGatewayCloudWatchLogGroup", "Arn"]}, - "Format" : { - "Fn::Join": [",", ["{\"requestId\":\"$context.requestId\"", - "\"ip\": \"$context.identity.sourceIp\"", - "\"caller\":\"$context.identity.caller\"", - "\"user\":\"$context.identity.user\"", - "\"requestTime\":\"$context.requestTime\"", - "\"httpMethod\":\"$context.httpMethod\"", - "\"resourcePath\":\"$context.resourcePath\"", - "\"status\":\"$context.status\"", - "\"protocol\":\"$context.protocol\"", - "\"responseLength\":\"$context.responseLength\"}" - ] - ] - } - } - }, - "RestApiId": {"Ref": "RESTAPIGateway"}, - "StageName": "prod" - }, - "DependsOn": ["POSTMethod"] - } - }, - "Outputs": { - "BasicNotebookInstanceId": { - "Value": { - "Ref": 
"BasicNotebookInstance" - } - }, - "firehoseDeliveryStreamArn": { - "Description": "Firehose Delivery Stream ARN", - "Value": { - "Fn::GetAtt": [ - "KinesisFirehoseDeliveryStream", - "Arn" - ] - } - }, - "firehoseDeliveryRoleArn": { - "Description": "Firehose Delivery Role ARN", - "Value": { - "Fn::GetAtt": [ - "FirehoseDeliveryIAMRole", - "Arn" - ] - } - }, - "RestApiId" : { - "Value": {"Ref": "RESTAPIGateway"} - } - } -} \ No newline at end of file diff --git a/deployment/fraud-detection-using-machine-learning.yaml b/deployment/fraud-detection-using-machine-learning.yaml new file mode 100644 index 0000000..324a68b --- /dev/null +++ b/deployment/fraud-detection-using-machine-learning.yaml @@ -0,0 +1,661 @@ +AWSTemplateFormatVersion: 2010-09-09 +Description: >- + (SO0056) - fraud-detection-using-machine-learning: Solution for predicting + fraud events with ML using Amazon SageMaker. Version 3 +Parameters: + ModelDataBucketName: + Type: String + Description: New bucket for storing the Amazon SageMaker model and training data. + OutputBucketName: + Type: String + Description: Bucket to store the predictions to be visualized using Quicksight. + KinesisFirehosePrefix: + Type: String + Default: fraud-detection/firehose/ + Description: Kinesis Firehose prefix for delivery of processed events. 
+ SolutionsS3BucketName: + Description: Enter the name of the S3 bucket for the solution + Type: String + Default: "sagemaker-solutions" +Metadata: + 'AWS::CloudFormation::Interface': + ParameterGroups: + - Label: + default: Amazon S3 Bucket Configuration + Parameters: + - ModelDataBucketName + - OutputBucketName + - Label: + default: Amazon Kinesis Firehose Configuration + Parameters: + - KinesisFirehosePrefix + - Label: + default: Solution Configuration Parameters + Parameters: + - SolutionsS3BucketName + ParameterLabels: + SolutionsS3BucketName: + default: SageMaker Solution Bucket Base Name + ModelDataBucketName: + default: Model and Data Bucket Name + OutputBucketName: + default: Prediction Output Bucket Name + KinesisFirehosePrefix: + default: Kinesis Firehose S3 Prefix +Mappings: + Function: + FraudDetection: + S3Key: fraud-detection-using-machine-learning/build/model_invocation.zip + Notebook: + FraudDetection: + S3Key: >- + /fraud-detection-using-machine-learning/notebooks/sagemaker_fraud_detection.ipynb + Script: + GenerateTraffic: + S3Key: >- + /fraud-detection-using-machine-learning/notebooks/generate_endpoint_traffic.py + KibanaDashboard: + S3Key: /fraud-detection-using-machine-learning/notebooks/dashboard.json +Resources: + ModelDataBucket: + Type: 'AWS::S3::Bucket' + Properties: + BucketName: !Ref ModelDataBucketName + PublicAccessBlockConfiguration: + BlockPublicAcls: true + BlockPublicPolicy: true + IgnorePublicAcls: true + RestrictPublicBuckets: true + BucketEncryption: + ServerSideEncryptionConfiguration: + - ServerSideEncryptionByDefault: + SSEAlgorithm: AES256 + LoggingConfiguration: + DestinationBucketName: !Ref LogBucket + LogFilePrefix: fraud-model-data-bucket/ + Metadata: + cfn_nag: + rules_to_suppress: + - id: W35 + reason: >- + Configuring logging requires supplying an existing customer S3 + bucket to store logs + - id: W51 + reason: Default policy works fine + S3Bucket2: + Type: 'AWS::S3::Bucket' + Properties: + BucketName: !Ref 
OutputBucketName + PublicAccessBlockConfiguration: + BlockPublicAcls: true + BlockPublicPolicy: true + IgnorePublicAcls: true + RestrictPublicBuckets: true + BucketEncryption: + ServerSideEncryptionConfiguration: + - ServerSideEncryptionByDefault: + SSEAlgorithm: AES256 + LoggingConfiguration: + DestinationBucketName: !Ref LogBucket + LogFilePrefix: fraud-output-bucket/ + Metadata: + cfn_nag: + rules_to_suppress: + - id: W35 + reason: >- + Configuring logging requires supplying an existing customer S3 + bucket to store logs + - id: W51 + reason: Default policy works fine + LogBucket: + Type: 'AWS::S3::Bucket' + DeletionPolicy: Retain + Properties: + AccessControl: LogDeliveryWrite + PublicAccessBlockConfiguration: + BlockPublicAcls: true + BlockPublicPolicy: true + IgnorePublicAcls: true + RestrictPublicBuckets: true + BucketEncryption: + ServerSideEncryptionConfiguration: + - ServerSideEncryptionByDefault: + SSEAlgorithm: AES256 + Metadata: + cfn_nag: + rules_to_suppress: + - id: W35 + reason: this is the log bucket + LogBucketPolicy: + Type: 'AWS::S3::BucketPolicy' + Properties: + Bucket: !Ref LogBucket + PolicyDocument: + Version: 2012-10-17 + Statement: + - Sid: AWSCloudTrailAclCheck + Effect: Allow + Principal: + Service: cloudtrail.amazonaws.com + Action: 's3:GetBucketAcl' + Resource: !GetAtt + - LogBucket + - Arn + - Sid: AWSCloudTrailWrite + Effect: Allow + Principal: + Service: cloudtrail.amazonaws.com + Action: 's3:PutObject' + Resource: !Join + - '' + - - 'arn:aws:s3:::' + - !Ref LogBucket + - /AWSLogs/ + - !Ref 'AWS::AccountId' + - /* + Condition: + StringEquals: + 's3:x-amz-acl': bucket-owner-full-control + BasicNotebookInstance: + Type: 'AWS::SageMaker::NotebookInstance' + Properties: + InstanceType: ml.t3.medium + NotebookInstanceName: FraudDetectionNotebookInstance + RoleArn: !GetAtt + - NotebookInstanceExecutionRole + - Arn + LifecycleConfigName: !GetAtt + - BasicNotebookInstanceLifecycleConfig + - NotebookInstanceLifecycleConfigName + DependsOn: + 
- NotebookInstanceIAMPolicy + Metadata: + cfn_nag: + rules_to_suppress: + - id: W1201 + reason: Solution does not have KMS encryption enabled by default + BasicNotebookInstanceLifecycleConfig: + Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig' + DependsOn: + - ModelDataBucket + Properties: + OnCreate: + - Content: + Fn::Base64: !Sub | + set -e + # perform following actions as ec2-user + sudo -u ec2-user -i <> .env + echo "AWS_REGION=${AWS::Region}" >> .env + echo "SAGEMAKER_IAM_ROLE=${NotebookInstanceExecutionRole.Arn}" >> .env + echo "SOLUTIONS_S3_BUCKET=${SolutionsS3BucketName}-${AWS::Region}" >> .env + echo "MODEL_DATA_S3_BUCKET=${ModelDataBucketName}" >> .env + echo "REST_API_GATEWAY=${RESTAPIGateway}" >> .env + EOF + OnStart: + - Content: + Fn::Base64: !Sub | + set -e + # perform following actions as ec2-user + sudo -u ec2-user -i <- + This policy needs to have * resource because some of the resources + are created dynamically and some of its actions are * resource + actions + LambdaFunction: + Type: 'AWS::Lambda::Function' + Properties: + Handler: index.lambda_handler + FunctionName: fraud-detection-event-processor + Role: !GetAtt + - LambdaExecutionRole + - Arn + Code: + S3Bucket: !Sub "${SolutionsS3BucketName}-${AWS::Region}" + S3Key: !FindInMap + - Function + - FraudDetection + - S3Key + Runtime: python3.6 + LambdaExecutionRole: + Type: 'AWS::IAM::Role' + Properties: + AssumeRolePolicyDocument: + Version: 2012-10-17 + Statement: + - Effect: Allow + Principal: + Service: + - lambda.amazonaws.com + Action: + - 'sts:AssumeRole' + Path: / + Policies: + - PolicyName: root + PolicyDocument: + Version: 2012-10-17 + Statement: + - Effect: Allow + Action: + - 'logs:CreateLogGroup' + - 'logs:CreateLogStream' + - 'logs:PutLogEvents' + Resource: !Join + - '' + - - 'arn:aws:logs:' + - !Ref 'AWS::Region' + - ':' + - !Ref 'AWS::AccountId' + - ':log-group:/aws/lambda/*' + - Effect: Allow + Action: + - 'sagemaker:InvokeEndpoint' + Resource: + - 
'arn:aws:sagemaker:*:*:endpoint/*' + - Effect: Allow + Action: + - 'firehose:PutRecord' + - 'firehose:PutRecordBatch' + Resource: !GetAtt + - KinesisFirehoseDeliveryStream + - Arn + Metadata: + cfn_nag: + rules_to_suppress: + - id: W11 + reason: >- + This role needs to have * resource in it's policy because resource + names are created dynamically and some of its actions are * + resource actions + KinesisFirehoseDeliveryStream: + Type: 'AWS::KinesisFirehose::DeliveryStream' + Properties: + DeliveryStreamName: fraud-detection-firehose-stream + DeliveryStreamType: DirectPut + S3DestinationConfiguration: + BucketARN: !GetAtt + - S3Bucket2 + - Arn + Prefix: !Ref KinesisFirehosePrefix + BufferingHints: + IntervalInSeconds: 60 + SizeInMBs: 100 + CompressionFormat: GZIP + EncryptionConfiguration: + NoEncryptionConfig: NoEncryption + RoleARN: !GetAtt + - FirehoseDeliveryIAMRole + - Arn + DependsOn: + - FirehoseDeliveryIAMPolicy + FirehoseDeliveryIAMRole: + Type: 'AWS::IAM::Role' + Properties: + AssumeRolePolicyDocument: + Version: 2012-10-17 + Statement: + - Sid: '' + Effect: Allow + Principal: + Service: firehose.amazonaws.com + Action: + - 'sts:AssumeRole' + FirehoseDeliveryIAMPolicy: + Type: 'AWS::IAM::Policy' + Properties: + PolicyName: fraud-detection-firehose-policy + PolicyDocument: + Version: 2012-10-17 + Statement: + - Effect: Allow + Action: + - 's3:AbortMultipartUpload' + - 's3:GetBucketLocation' + - 's3:GetObject' + - 's3:ListBucket' + - 's3:ListBucketMultipartUploads' + - 's3:PutObject' + Resource: + - !Join + - '' + - - 'arn:aws:s3:::' + - !Ref OutputBucketName + - !Join + - '' + - - 'arn:aws:s3:::' + - !Ref OutputBucketName + - / + - !Ref KinesisFirehosePrefix + - '*' + Roles: + - !Ref FirehoseDeliveryIAMRole + Metadata: + cfn_nag: + rules_to_suppress: + - id: W12 + reason: >- + This policy needs to have * resource because some of its actions + are * resource actions + RESTAPIGateway: + Type: 'AWS::ApiGateway::RestApi' + Properties: + Description: >- + A 
REST API that can be used to invoke the Lambda function that triggers + predictions. + Name: model-invocation-api + EndpointConfiguration: + Types: + - REGIONAL + APIGatewayCloudWatchLogGroup: + Type: 'AWS::Logs::LogGroup' + Properties: + LogGroupName: !Join + - / + - - /aws/apigateway/AccessLogs + - !Ref RESTAPIGateway + - prod + RetentionInDays: 365 + APIGatewayCloudWatchRole: + Type: 'AWS::IAM::Role' + Properties: + AssumeRolePolicyDocument: + Version: 2012-10-17 + Statement: + - Effect: Allow + Principal: + Service: + - apigateway.amazonaws.com + Action: 'sts:AssumeRole' + Path: / + ManagedPolicyArns: + - >- + arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs + APIGatewayAccount: + Type: 'AWS::ApiGateway::Account' + Properties: + CloudWatchRoleArn: !GetAtt + - APIGatewayCloudWatchRole + - Arn + DependsOn: + - RESTAPIGateway + LambdaAPIPermission: + Type: 'AWS::Lambda::Permission' + Properties: + Action: 'lambda:InvokeFunction' + FunctionName: !GetAtt + - LambdaFunction + - Arn + Principal: apigateway.amazonaws.com + SourceArn: !Join + - '' + - - 'arn:aws:execute-api:' + - !Ref 'AWS::Region' + - ':' + - !Ref 'AWS::AccountId' + - ':' + - !Ref RESTAPIGateway + - /*/POST/* + RESTInvocationResource: + Type: 'AWS::ApiGateway::Resource' + Properties: + ParentId: !GetAtt + - RESTAPIGateway + - RootResourceId + PathPart: invocations + RestApiId: !Ref RESTAPIGateway + POSTMethod: + Type: 'AWS::ApiGateway::Method' + Properties: + RestApiId: !Ref RESTAPIGateway + ResourceId: !Ref RESTInvocationResource + HttpMethod: POST + AuthorizationType: AWS_IAM + Integration: + Type: AWS + IntegrationHttpMethod: POST + Uri: !Join + - '' + - - 'arn:aws:apigateway:' + - !Ref 'AWS::Region' + - ':lambda:path/2015-03-31/functions/' + - !GetAtt + - LambdaFunction + - Arn + - /invocations + IntegrationResponses: + - ResponseTemplates: + application/json: '' + StatusCode: 200 + - SelectionPattern: ^not found.* + ResponseTemplates: + application/json: '{}' + StatusCode: 
404 + PassthroughBehavior: WHEN_NO_TEMPLATES + RequestTemplates: + application/json: >- + {"data": $input.json('$.data'),"metadata": + $input.json('$.metadata'),"model": "$input.params('model')"} + MethodResponses: + - ResponseModels: + application/json: Empty + StatusCode: 200 + - ResponseModels: + application/json: Empty + StatusCode: 404 + RequestParameters: + method.request.querystring.model: false + RestApiDeployment: + Type: 'AWS::ApiGateway::Deployment' + Properties: + StageDescription: + AccessLogSetting: + DestinationArn: !GetAtt + - APIGatewayCloudWatchLogGroup + - Arn + Format: !Join + - ',' + - - '{"requestId":"$context.requestId"' + - '"ip": "$context.identity.sourceIp"' + - '"caller":"$context.identity.caller"' + - '"user":"$context.identity.user"' + - '"requestTime":"$context.requestTime"' + - '"httpMethod":"$context.httpMethod"' + - '"resourcePath":"$context.resourcePath"' + - '"status":"$context.status"' + - '"protocol":"$context.protocol"' + - '"responseLength":"$context.responseLength"}' + RestApiId: !Ref RESTAPIGateway + StageName: prod + DependsOn: + - POSTMethod + Metadata: + cfn_nag: + rules_to_suppress: + - id: W68 + reason: Resource not associated with an AWS::ApiGateway::UsagePlan for now +Outputs: + JupyterInterface: + Description: "Open Jupyter IDE. This authenticates you against Jupyter." 
+ Value: !Sub "https://console.aws.amazon.com/sagemaker/home?region=${AWS::Region}#/notebook-instances/openNotebook/FraudDetectionNotebookInstance?view=classic" + SageMakerNotebook: + Description: "Open Jupyter notebook to kick off model training" + Value: !Sub "https://frauddetectionnotebookinstance.notebook.${AWS::Region}.sagemaker.aws/notebooks/source/notebooks/sagemaker_fraud_detection.ipynb" + FirehoseDeliveryStreamArn: + Description: Firehose Delivery Stream ARN + Value: !GetAtt + - KinesisFirehoseDeliveryStream + - Arn + FirehoseDeliveryRoleArn: + Description: Firehose Delivery Role ARN + Value: !GetAtt + - FirehoseDeliveryIAMRole + - Arn + RestApiId: + Value: !Ref RESTAPIGateway diff --git a/source/fraud_detection/index.py b/source/lambda/model-invocation/index.py similarity index 90% rename from source/fraud_detection/index.py rename to source/lambda/model-invocation/index.py index 4ffa49a..4faa4d7 100644 --- a/source/fraud_detection/index.py +++ b/source/lambda/model-invocation/index.py @@ -14,9 +14,6 @@ ############################################################################## import json import os -import random -import datetime -import re import logging import boto3 @@ -32,10 +29,10 @@ def lambda_handler(event, context): data_payload = event.get('data', None) assert data_payload, "Payload did not include a data field!" 
model_choice = event.get('model', None) - valid_models = set(['anomaly_detector', 'fraud_classifier']) + valid_models = {'anomaly_detector', 'fraud_classifier'} if model_choice: assert model_choice in valid_models, "The requested model, {}, was not a valid model name {}".format(model_choice, valid_models) - models = set([model_choice]) if model_choice else valid_models + models = {model_choice} if model_choice else valid_models output = {} if 'anomaly_detector' in models: @@ -44,7 +41,7 @@ def lambda_handler(event, context): if 'fraud_classifier' in models: output["fraud_classifier"] = get_fraud_prediction(data_payload) - success = store_data_prediction(output, metadata) + store_data_prediction(output, metadata) return output @@ -67,7 +64,7 @@ def get_fraud_prediction(data, threshold=0.5): Body=data) pred_proba = json.loads(response['Body'].read().decode()) prediction = 0 if pred_proba < threshold else 1 - # Note: XGBoost returns a float as a prediction, a linear learner would require different handling. 
+ logger.info("classification pred_proba: {}, prediction: {}".format(pred_proba, prediction)) return {"pred_proba": pred_proba, "prediction": prediction} @@ -84,5 +81,7 @@ def store_data_prediction(output_dict, metadata): record = ','.join(metadata + [str(fraud_pred), str(anomaly_score)]) + '\n' success = firehose.put_record(DeliveryStreamName=firehose_delivery_stream, Record={'Data': record}) - logger.info("Record logged: {}".format(record)) - return success + if success: + logger.info("Record logged: {}".format(record)) + else: + logger.warning("Record delivery failed for record: {}".format(record)) diff --git a/source/notebooks/requirements.in b/source/notebooks/requirements.in new file mode 100644 index 0000000..475c8b3 --- /dev/null +++ b/source/notebooks/requirements.in @@ -0,0 +1,12 @@ +imbalanced-learn +aws_requests_auth +matplotlib +scikit-learn +pandas +sagemaker +boto3 +seaborn +docutils==0.14 +awscli +botocore==1.17.4 +python-dotenv diff --git a/source/notebooks/requirements.txt b/source/notebooks/requirements.txt new file mode 100644 index 0000000..0b849c5 --- /dev/null +++ b/source/notebooks/requirements.txt @@ -0,0 +1,48 @@ +# +# This file is autogenerated by pip-compile +# To update, run: +# +# pip-compile requirements.in +# +aws-requests-auth==0.4.3 # via -r requirements.in +awscli==1.18.81 # via -r requirements.in +boto3==1.14.4 # via -r requirements.in, sagemaker +botocore==1.17.4 # via -r requirements.in, awscli, boto3, s3transfer +certifi==2020.4.5.2 # via requests +chardet==3.0.4 # via requests +colorama==0.4.3 # via awscli +cycler==0.10.0 # via matplotlib +docutils==0.14 # via -r requirements.in, awscli, botocore +idna==2.9 # via requests +imbalanced-learn==0.7.0 # via -r requirements.in +importlib-metadata==1.6.1 # via sagemaker +jmespath==0.10.0 # via boto3, botocore +joblib==0.15.1 # via imbalanced-learn, scikit-learn +kiwisolver==1.2.0 # via matplotlib +matplotlib==3.2.1 # via -r requirements.in, seaborn +numpy==1.18.5 # via 
imbalanced-learn, matplotlib, pandas, sagemaker, scikit-learn, scipy, seaborn +packaging==20.4 # via sagemaker +pandas==1.0.4 # via -r requirements.in, seaborn +protobuf3-to-dict==0.1.5 # via sagemaker +protobuf==3.12.2 # via protobuf3-to-dict, sagemaker +pyasn1==0.4.8 # via rsa +pyparsing==2.4.7 # via matplotlib, packaging +python-dateutil==2.8.1 # via botocore, matplotlib, pandas +python-dotenv==0.13.0 # via -r requirements.in +pytz==2020.1 # via pandas +pyyaml==5.3.1 # via awscli +requests==2.23.0 # via aws-requests-auth +rsa==3.4.2 # via awscli +s3transfer==0.3.3 # via awscli, boto3 +sagemaker==1.64.1 # via -r requirements.in +scikit-learn==0.23.1 # via -r requirements.in, imbalanced-learn +scipy==1.4.1 # via imbalanced-learn, sagemaker, scikit-learn, seaborn +seaborn==0.10.1 # via -r requirements.in +six==1.15.0 # via cycler, packaging, protobuf, protobuf3-to-dict, python-dateutil +smdebug-rulesconfig==0.1.4 # via sagemaker +threadpoolctl==2.1.0 # via scikit-learn +urllib3==1.25.9 # via botocore, requests +zipp==3.1.0 # via importlib-metadata + +# The following packages are considered to be unsafe in a requirements file: +# setuptools diff --git a/source/notebooks/sagemaker_fraud_detection.ipynb b/source/notebooks/sagemaker_fraud_detection.ipynb index 56a6658..99792a9 100644 --- a/source/notebooks/sagemaker_fraud_detection.ipynb +++ b/source/notebooks/sagemaker_fraud_detection.ipynb @@ -601,7 +601,8 @@ "source": [ "We can now show how we could use both of these models in a production system, using HTTP requests to an AWS Lambda function that invokes both the unsupervised and the supervised SageMaker endpoints.\n", "\n", - "We create a background thread that will constantly create HTTP requests to invoke the Lambda, using our test data as input. See the included `generate_endpoint_traffic.py` file to see how that is done. The output will be logged to Kinesis, and you can also observe it in the Lambda function's CloudWatch logs." 
+ "We create a background thread that will constantly create HTTP requests to invoke the Lambda, using our test data as input. See the included `generate_endpoint_traffic.py` file to see how that is done.\n", + "The output will be logged to an S3 bucket through Kinesis, and you can also observe it in the Lambda function's CloudWatch logs." ] }, { @@ -611,10 +612,10 @@ "outputs": [], "source": [ "from threading import Thread\n", - "from generate_endpoint_traffic import generate_traffic\n", + "from package.generate_endpoint_traffic import generate_traffic\n", "\n", "thread = Thread(target = generate_traffic, args=[np.copy(X_test)])\n", - "thread.start()" + "thread.start()\n" ] }, { @@ -884,7 +885,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.5" + "version": "3.6.10" } }, "nbformat": 4, diff --git a/source/notebooks/setup.py b/source/notebooks/setup.py new file mode 100644 index 0000000..683efe4 --- /dev/null +++ b/source/notebooks/setup.py @@ -0,0 +1,10 @@ +from distutils.core import setup + + +setup( + name='package', + version='1.0', + description="A package to organize the solution's code.", + package_dir={'': 'src'}, + packages=['package'], +) diff --git a/source/notebooks/src/package/__init__.py b/source/notebooks/src/package/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/source/notebooks/src/package/config.py b/source/notebooks/src/package/config.py new file mode 100644 index 0000000..ac189b1 --- /dev/null +++ b/source/notebooks/src/package/config.py @@ -0,0 +1,21 @@ +from dotenv import load_dotenv +import os +from pathlib import Path + +from package import utils + +current_folder = utils.get_current_folder(globals()) +env_location = '../../../../.env' +dotenv_filepath = Path(current_folder, env_location).resolve() +assert dotenv_filepath.exists(), "Could not find .env file at {}".format(str(dotenv_filepath)) + +load_dotenv() + +AWS_ACCOUNT_ID = os.environ['AWS_ACCOUNT_ID'] 
+AWS_REGION = os.environ['AWS_REGION'] +SAGEMAKER_IAM_ROLE = os.environ['SAGEMAKER_IAM_ROLE'] +STACK_NAME = os.environ['STACK_NAME'] +SOLUTIONS_S3_BUCKET = os.environ['SOLUTIONS_S3_BUCKET'] + +MODEL_DATA_S3_BUCKET = os.environ['MODEL_DATA_S3_BUCKET'] +REST_API_GATEWAY = os.environ['REST_API_GATEWAY'] diff --git a/source/notebooks/src/package/generate_endpoint_traffic.py b/source/notebooks/src/package/generate_endpoint_traffic.py new file mode 100644 index 0000000..54ee75d --- /dev/null +++ b/source/notebooks/src/package/generate_endpoint_traffic.py @@ -0,0 +1,63 @@ +""" +Handles generating traffic and creating the ElasticSearch index and dashboard. +""" +import time +import re +import datetime +import random + +import requests +from aws_requests_auth.boto_utils import BotoAWSRequestsAuth +import numpy as np +from scipy.stats import poisson + +from package import config + +def generate_metadata(): + """ + Generates metadata for the HTTP request: a randomized source and a timestamp. + """ + millisecond_regex = r'\.\d+' + timestamp = re.sub(millisecond_regex, '', str(datetime.datetime.now())) + source = random.choice(['Mobile', 'Web', 'Store']) + result = [timestamp, 'random_id', source] + + return result + + +def get_data_payload(test_array): + return {'data':','.join(map(str, test_array)), + 'metadata': generate_metadata()} + + +def generate_traffic(X_test): + """ + Using a feature array as input, sends each example to the endpoint in an + endless loop, reshuffling the array between passes. + """ + while True: + # NB: The shuffle will mutate the X_test array in-place, so ensure + # you're working with a copy if you intend to use the calling argument + # array elsewhere. 
+ np.random.shuffle(X_test) + for example in X_test: + data_payload = get_data_payload(example) + invoke_endpoint(data_payload) + # We invoke the function according to a shifted Poisson distribution + # to simulate data arriving at random intervals + time.sleep(poisson.rvs(1, size=1)[0] + np.random.rand() / 100) + + +def invoke_endpoint(payload): + """ + We get credentials from the IAM role of the notebook instance, + then use them to create a signed request to the API Gateway + """ + auth = BotoAWSRequestsAuth(aws_host="{}.execute-api.{}.amazonaws.com".format( + config.REST_API_GATEWAY, config.AWS_REGION), + aws_region=config.AWS_REGION, + aws_service='execute-api') + + invoke_url = "https://{}.execute-api.{}.amazonaws.com/prod/invocations".format( + config.REST_API_GATEWAY, config.AWS_REGION) + + requests.post(invoke_url, json=payload, auth=auth) diff --git a/source/notebooks/src/package/utils.py b/source/notebooks/src/package/utils.py new file mode 100644 index 0000000..0c263fc --- /dev/null +++ b/source/notebooks/src/package/utils.py @@ -0,0 +1,13 @@ +from pathlib import Path +import os + + +def get_current_folder(global_variables): + # if calling from a file + if "__file__" in global_variables: + current_file = Path(global_variables["__file__"]) + current_folder = current_file.parent.resolve() + # if calling from a notebook + else: + current_folder = Path(os.getcwd()) + return current_folder
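
The `get_current_folder` helper in `utils.py` picks its base directory differently depending on whether `__file__` is present in the caller's globals. The following is a minimal sketch of that behaviour, with the function body condensed from the diff above. The module path `/opt/project/src/package/config.py` is a hypothetical value chosen only for illustration, not a path from the repository.

```python
import os
from pathlib import Path


def get_current_folder(global_variables):
    # Condensed from utils.py in the diff above.
    # A module defines __file__, so its own folder is used.
    if "__file__" in global_variables:
        return Path(global_variables["__file__"]).parent.resolve()
    # A notebook cell has no __file__, so fall back to the working directory.
    return Path(os.getcwd())


# Hypothetical module path, used only for illustration.
from_file = get_current_folder({"__file__": "/opt/project/src/package/config.py"})
# An empty dict stands in for a notebook's globals().
from_notebook = get_current_folder({})

print(from_file)
print(from_notebook)
```

In `config.py` the result is joined with the relative `env_location` to locate the `.env` file, so the notebook branch only finds that file when the working directory sits at the expected depth in the repository.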