Athena Create API is an advanced language model API designed to seamlessly integrate and manage models such as `text-davinci-003` as well as many open-source language models. The API can run models locally, use Hugging Face Inference Endpoints, or operate in a hybrid mode that combines both local and Hugging Face models.
- Ubuntu 16.04 LTS
- VRAM >= 24GB
- RAM > 12GB (minimal), 16GB (standard), 80GB (full)
- Disk > 284GB
The server-side configuration file is `server/configs/config.default.yaml`, and some parameters are presented as follows:
- `model`: LLM, currently supports `text-davinci-003`. We are working on integrating more open-source LLMs.
- `inference_mode`: mode of inference endpoints
  - `local`: only use the local inference endpoints
  - `huggingface`: only use the Hugging Face Inference Endpoints (no local inference endpoints required)
  - `hybrid`: use both `local` and `huggingface`
- `local_deployment`: scale of locally deployed models, works under the `local` or `hybrid` inference mode:
  - `minimal` (RAM>12GB, ControlNet only)
  - `standard` (RAM>16GB, ControlNet + Standard Pipelines)
  - `full` (RAM>42GB, All registered models)
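For reference, here is a minimal sketch of how these keys might appear in `config.default.yaml`. Only `model`, `inference_mode`, and `local_deployment` are documented above; the values shown are illustrative, not the project's actual defaults.

```yaml
# Sketch only -- values are illustrative, not the shipped defaults.
model: text-davinci-003      # currently the only supported LLM
inference_mode: hybrid       # one of: local, huggingface, hybrid
local_deployment: standard   # one of: minimal, standard, full
```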
- Clone this repository
git clone https://github.com/Agora-X/Athena-Create-API.git
cd Athena-Create-API
- Build the Docker image
docker build -t athena-create-api:latest .
- Run the Docker image
docker run -p 8004:8004 -p 8005:8005 athena-create-api:latest
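If you run the `local` or `hybrid` inference mode inside the container, the host GPU must be exposed to it. With the NVIDIA Container Toolkit installed, that looks roughly like the sketch below; the `--gpus` flag is a standard Docker option, not something specific to this image.

```bash
# Requires the NVIDIA Container Toolkit on the host; --gpus all passes every GPU through.
docker run --gpus all -p 8004:8004 -p 8005:8005 athena-create-api:latest
```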
- A Kubernetes cluster up and running.
- `kubectl` configured to interact with your Kubernetes cluster.
- Apply the Kubernetes deployment
kubectl apply -f kubernetes-deployment.yaml
- Apply the Kubernetes service
kubectl apply -f kubernetes-service.yaml
- To view the status of your deployment, use the following command:
kubectl get pods
- Once the status of all pods is `Running`, you can access your application via the load balancer's IP address, on port 8004 for the model server and port 8005 for the web server.
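For reference, a `kubernetes-service.yaml` that exposes both ports might look roughly like the sketch below; the service name, selector label, and `LoadBalancer` type here are assumptions, not the project's actual manifest.

```yaml
# Illustrative kubernetes-service.yaml -- names, labels, and type are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: athena-create-api
spec:
  type: LoadBalancer            # requests an external load balancer from the cloud provider
  selector:
    app: athena-create-api      # must match the pod labels in kubernetes-deployment.yaml
  ports:
    - name: model-server
      port: 8004
      targetPort: 8004
    - name: web-server
      port: 8005
      targetPort: 8005
```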
Documentation on how to use the various endpoints of the API can be found here.
Contributions are welcome! Please read our contribution guide for details.
Help us spread the word about Athena Create API by sharing this project with your friends and colleagues. Here is a quick share link:
This project is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
If you have any questions or suggestions, please feel free to open an issue here.
This guide provides detailed instructions on how to deploy the Athena Create API on AWS using Kubernetes, configured for self-healing, auto-scaling, and other important production infrastructure features.
Before starting, ensure the following requirements are met:
- You have an AWS account.
- You have installed and configured the AWS CLI.
- You have installed and configured `kubectl` and `eksctl` on your local machine.
- You have a Docker Hub account (for storing Docker images).
- You have Docker installed on your local machine.
- Build Docker Image
- Push Docker Image to Docker Hub
- Setup AWS EKS Cluster
- Deploy Application
- Setup Load Balancer
- Setup Auto-scaling
- Testing
- Monitoring and Logging
- Cleanup
From the root directory of the project, build the Docker image:
docker build -t athena-create-api .
Make sure to replace `athena-create-api` with the desired name for the Docker image.
Tag the image with your Docker Hub username and the image name, then push it:
docker tag athena-create-api:latest yourusername/athena-create-api:latest
docker push yourusername/athena-create-api:latest
Create an EKS cluster using `eksctl`:
eksctl create cluster --name athena-create-api-cluster --region us-west-2 --nodes 3
Replace `us-west-2` with the desired AWS region and `3` with the desired number of nodes.
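`eksctl` adds the new cluster's credentials to your kubeconfig by default, so you can confirm the worker nodes have registered before deploying:

```bash
# All requested nodes should eventually report STATUS "Ready".
kubectl get nodes
```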
First, apply the Kubernetes secrets and configmaps:
kubectl apply -f kubernetes/athena-create-api-secret.yaml
kubectl apply -f kubernetes/athena-create-api-configmap.yaml
Next, update the `athena-create-api-deployment.yaml` file with the Docker image you pushed to Docker Hub:
spec:
  containers:
    - name: athena-create-api
      image: yourusername/athena-create-api:latest
      ...
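The snippet above is abbreviated. One common way to wire in the secret and configmap applied earlier is `envFrom`, sketched below; the ConfigMap and Secret resource names are assumptions based on the manifest file names, so adjust them to whatever `metadata.name` those files actually use.

```yaml
# Illustrative expansion of the container spec -- the ConfigMap and Secret
# names are assumed to match the manifest file names applied above.
spec:
  template:
    spec:
      containers:
        - name: athena-create-api
          image: yourusername/athena-create-api:latest
          ports:
            - containerPort: 8004   # model server
            - containerPort: 8005   # web server
          envFrom:
            - configMapRef:
                name: athena-create-api-configmap
            - secretRef:
                name: athena-create-api-secret
```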
Then, apply the Kubernetes deployment and service:
kubectl apply -f kubernetes/athena-create-api-deployment.yaml
kubectl apply -f kubernetes/athena-create-api-service.yaml
In the `athena-create-api-service.yaml` file, make sure the service type is set to `LoadBalancer`:
spec:
  type: LoadBalancer
  ...
Once the service is applied, you can get the public URL of your application with:
kubectl get service athena-create-api-service
Apply the Kubernetes Horizontal Pod Autoscaler:
kubectl apply -f kubernetes/athena-create-api-hpa.yaml
Make sure to configure the `minReplicas`, `maxReplicas`, and `targetCPUUtilizationPercentage` fields in the `athena-create-api-hpa.yaml` file based on your needs.
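A minimal sketch of such an HPA is shown below, assuming the `autoscaling/v1` API; the replica counts, CPU target, and resource names are illustrative. Note that CPU-based autoscaling also requires the Kubernetes Metrics Server to be installed in the cluster.

```yaml
# Sketch of kubernetes/athena-create-api-hpa.yaml -- replica counts and the CPU
# target are illustrative; tune them for your workload and node capacity.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: athena-create-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: athena-create-api      # must match the Deployment's metadata.name
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```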
Test the deployment by navigating to the public URL of your application from the Load Balancer.
You can use Amazon CloudWatch for monitoring and logging. To access logs:
- Open the CloudWatch console.
- In the navigation pane, choose `Logs`.
- In the log groups pane, choose the log group for your application.
You can also use `kubectl` commands to monitor the pods and nodes of your Kubernetes cluster.
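For example, the commands below cover the basics. They assume the Deployment is named `athena-create-api` (adjust if yours differs), and `kubectl top` requires the Metrics Server add-on.

```bash
kubectl get pods -o wide                     # pod status and the nodes they are scheduled on
kubectl logs deployment/athena-create-api    # recent logs from the deployment's pods
kubectl top pods                             # CPU/memory usage per pod (needs metrics-server)
kubectl top nodes                            # CPU/memory usage per node
```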
To avoid incurring charges, you can delete the EKS cluster when you're done:
eksctl delete cluster --name athena-create-api-cluster
Note: Be sure to replace any placeholder values with your actual values throughout this guide. It's also important to understand that managing a Kubernetes cluster on AWS may incur charges, so always monitor your usage and costs.