llm-on-govcloud-sagemaker

LLM on Amazon SageMaker, compatible with AWS GovCloud

An AWS implementation for hosting a Machine Learning (ML) Large Language Model (LLM), compatible with AWS GovCloud.

Built With

AWS CDK

Getting Started

Prerequisites

  • An AWS account with:
    • An increased quota to deploy one ml.g4dn.12xlarge instance for endpoint usage
    • An IAM user or role with the AdministratorAccess policy attached (we recommend restricting access as needed)
  • The AWS CLI installed
    • The AWS_REGION environment variable exported
  • TypeScript, npm, and the AWS CDK CLI installed (required versions are listed in the package.json file)
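
Because the stack targets AWS GovCloud, it can help to confirm the region setting before deploying. The following is a minimal sketch, not part of this repository: `requireRegion` and `partitionForRegion` are hypothetical helper names, and the partition mapping relies only on the standard AWS region prefixes (`us-gov-` for GovCloud, `cn-` for China).

```typescript
// ARN partitions used by AWS; GovCloud resources live in "aws-us-gov".
type Partition = "aws" | "aws-us-gov" | "aws-cn";

// Hypothetical helper: derive the ARN partition from a region name.
function partitionForRegion(region: string): Partition {
  if (region.startsWith("us-gov-")) return "aws-us-gov";
  if (region.startsWith("cn-")) return "aws-cn";
  return "aws";
}

// Hypothetical pre-deployment check: fail fast when AWS_REGION is missing.
// Takes the environment as a parameter so it can be tested without Node globals.
function requireRegion(env: Record<string, string | undefined>): string {
  const value = env["AWS_REGION"];
  if (value === undefined || value.trim() === "") {
    throw new Error("AWS_REGION is not set; export it before running cdk deploy");
  }
  return value.trim();
}

// Example with an explicit environment object.
const checkedRegion = requireRegion({ AWS_REGION: "us-gov-west-1" });
console.log(checkedRegion, partitionForRegion(checkedRegion));
```

In a Node.js script, `process.env` can be passed as the argument to `requireRegion`.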

Deployment

Note: The Amazon SageMaker endpoint can incur significant cost if left running. Be sure to monitor your billing, and destroy the stack as described in the Clean Up section when you are done experimenting.

Configure your AWS credentials, either by exporting your role credentials to the current terminal or by configuring your AWS CLI profile.

Then run the following commands:

  1. npm install
  2. npm run build
  3. cdk bootstrap (only needed the first time you use the CDK in the account and region)
  4. cdk deploy

Once the deployment completes, navigate to SageMaker Notebook Instances and open the notebook named Falcon40BNotebook-XXXXXXXXXX, where the X's are randomly generated characters. From there you can run the notebook cells.

Clean Up

If you have created additional Jupyter notebooks in SageMaker, download them from the SageMaker notebook instance's IDE before destroying the stack.

When you are done, run the following command to delete all resources you created:

  1. cdk destroy

Architecture

Architecture Overview

To deploy this application, we use Hugging Face's prebuilt Text Generation Inference (TGI) Docker image to serve Falcon-40b, together with the HuggingFaceSageMakerEndpoint construct (which deploys a Hugging Face model to Amazon SageMaker) from @cdklabs/generative-ai-cdk-constructs.
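
The TGI container on SageMaker is configured through environment variables such as `HF_MODEL_ID`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH`, and `MAX_TOTAL_TOKENS`. The sketch below shows one way such a map could be assembled and sanity-checked before it is handed to an endpoint construct. It is not taken from this repository: `buildTgiEnvironment` is a hypothetical helper, the model ID `tiiuae/falcon-40b` and the token limits are assumed values, and the GPU count of 4 corresponds to the ml.g4dn.12xlarge instance named in the prerequisites.

```typescript
// Environment variables read by the Hugging Face TGI container on SageMaker.
// All values are strings because container environments are string maps.
interface TgiEnvironment {
  HF_MODEL_ID: string;
  SM_NUM_GPUS: string;
  MAX_INPUT_LENGTH: string;
  MAX_TOTAL_TOKENS: string;
}

// Hypothetical helper: validate the settings and convert them to strings.
function buildTgiEnvironment(
  modelId: string,
  numGpus: number,
  maxInputLength: number,
  maxTotalTokens: number
): TgiEnvironment {
  if (!Number.isInteger(numGpus) || numGpus < 1) {
    throw new Error("numGpus must be a positive integer");
  }
  // TGI requires the input budget to be strictly below the total token budget.
  if (maxInputLength >= maxTotalTokens) {
    throw new Error("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS");
  }
  return {
    HF_MODEL_ID: modelId,
    SM_NUM_GPUS: String(numGpus),
    MAX_INPUT_LENGTH: String(maxInputLength),
    MAX_TOTAL_TOKENS: String(maxTotalTokens),
  };
}

// Assumed values for illustration only.
console.log(buildTgiEnvironment("tiiuae/falcon-40b", 4, 1024, 2048));
```

Validating the token limits in code surfaces a misconfiguration at synth time rather than after the endpoint has spent several minutes starting up.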

AWS Deep Learning Containers (DLCs) provide a set of Docker images that can be deployed on Amazon SageMaker. Deploying the TGI image this way creates a scalable, secure, hosted endpoint for real-time inference.

We deploy a SageMaker notebook instance in a private subnet, allowing outbound internet connectivity while controlling inbound connectivity. To enable communication between the notebook and AWS service endpoints, we use VPC endpoints powered by AWS PrivateLink. With AWS PrivateLink, SageMaker notebook instances can access the SageMaker real-time inference endpoint over the private network IP space.
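
Interface VPC endpoints are identified by region-scoped service names of the form `com.amazonaws.<region>.<service>`. As a rough sketch, the names for the SageMaker API and runtime services can be derived as follows. Which endpoints this stack actually creates is an assumption here, and `sageMakerEndpointServiceNames` is a hypothetical helper.

```typescript
// Hypothetical helper: build the PrivateLink service names for the
// SageMaker control-plane API and the runtime (InvokeEndpoint) service.
function sageMakerEndpointServiceNames(region: string): string[] {
  const services = ["sagemaker.api", "sagemaker.runtime"];
  return services.map((service) => `com.amazonaws.${region}.${service}`);
}

// Example for a GovCloud region.
console.log(sageMakerEndpointServiceNames("us-gov-west-1"));
```

The runtime service is the one a notebook uses to call the inference endpoint, so it is the endpoint that keeps inference traffic on the private network.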

Architecture Details & Technologies

HuggingFace TGI and Falcon-40b LLM

Hugging Face's TGI provides a straightforward way to deploy LLMs for real-time text generation. It ships as prebuilt Docker containers that handle the hosting infrastructure, so users can focus on their applications and use cases.
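
A TGI endpoint accepts a JSON body with an `inputs` string and an optional `parameters` object. The sketch below builds such a body. `buildTgiRequest` is a hypothetical helper and the default parameter values are assumptions chosen for illustration, not settings used by this project.

```typescript
// A subset of the generation parameters accepted by TGI.
interface TgiParameters {
  max_new_tokens?: number;
  temperature?: number;
  top_p?: number;
  do_sample?: boolean;
  stop?: string[];
}

interface TgiRequest {
  inputs: string;
  parameters: TgiParameters;
}

// Hypothetical helper: merge caller overrides onto assumed defaults and
// serialize the request to the JSON string sent as the request body.
function buildTgiRequest(prompt: string, overrides: TgiParameters = {}): string {
  if (prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  const defaults: TgiParameters = {
    max_new_tokens: 256,
    temperature: 0.7,
    do_sample: true,
  };
  const request: TgiRequest = {
    inputs: prompt,
    parameters: { ...defaults, ...overrides },
  };
  return JSON.stringify(request);
}

console.log(buildTgiRequest("Summarize AWS PrivateLink in one sentence.", { max_new_tokens: 64 }));
```

The resulting string would be sent with a content type of `application/json`.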

Falcon-40b features advanced text generation and comprehension capabilities. With 40 billion parameters, Falcon-40b was among the largest openly available models at the time of its release. Trained on 1 trillion text tokens across English, German, Spanish, French, and other languages, Falcon-40b can generate, summarize, and translate text.

SageMaker Endpoints

SageMaker real-time inference endpoints enable low-latency, high-throughput hosting of machine learning models. By using Amazon SageMaker, we gain the operational efficiencies of AWS infrastructure and eliminate undifferentiated heavy lifting. Amazon SageMaker handles server provisioning, scaling, monitoring, and availability, freeing data scientists to work with LLMs.
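
When a TGI-backed endpoint is invoked, the response body is typically a JSON array whose first element carries a `generated_text` field. The following sketch extracts that text from an already decoded body string. `parseGeneratedText` is a hypothetical helper, and the response shape is an assumption based on TGI's default output format rather than something this repository documents.

```typescript
// Hypothetical helper: pull the generated text out of a decoded response body.
// Expects JSON of the form [{"generated_text": "..."}].
function parseGeneratedText(bodyText: string): string {
  const parsed: unknown = JSON.parse(bodyText);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("expected a non-empty JSON array in the response body");
  }
  const first = parsed[0] as { generated_text?: unknown } | null;
  const text = first?.generated_text;
  if (typeof text !== "string") {
    throw new Error("response is missing a generated_text string");
  }
  return text;
}

console.log(parseGeneratedText('[{"generated_text": "Hello from Falcon"}]'));
```

Checking the shape explicitly turns a malformed or error response into a clear exception instead of an `undefined` value further down the line.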

SageMaker Notebook

Amazon SageMaker notebook instances provide a managed and familiar environment, purpose-built for developing and evaluating ML models. They offer a cost-effective sandbox for prototyping capabilities.

Multiple instance types give data scientists flexibility to test small demos or fine-tune LLMs on significant datasets.

AWS PrivateLink

AWS networking features such as AWS PrivateLink let administrators establish private connectivity between VPCs and AWS services without traversing the public internet. This helps enable secure LLM experimentation with datasets.