Automated deployment script to run your private, self-hosted AI workspace on Akamai Cloud GPU instances. This stack combines vLLM for high-performance LLM inference with AnythingLLM - a full-stack application for building private AI assistants with RAG (Retrieval Augmented Generation), document chat, and agent capabilities.
Just run this single command:
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh | bashThat's it! The script will download required files and guide you through the interactive deployment process.
- Fully Automated Deployment: Handles instance creation with real-time progress tracking
- Ready to use AI Stack: vLLM for GPU-accelerated inference + AnythingLLM for enterprise AI workspace
- RAG & Document Chat: Upload documents and chat with your data using AnythingLLM's built-in vector database
- AI Agents: Build custom AI agents with tools and workflows
- Cross-Platform Support: Works on macOS, Linux, and Windows (Git Bash/WSL)
- Ubuntu 24.04 LTS with NVIDIA drivers
- Docker & NVIDIA Container Toolkit
- Systemd service for automatic startup on reboot
gpt-oss-20b is OpenAI's first fully open-source LLM, released under Apache 2.0 license. Key characteristics:
- 20B parameters: Fits on a single RTX 4000 Ada GPU (20GB VRAM)
- High benchmark scores: Competitive with larger models on reasoning and instruction-following tasks
- High throughput: Optimized for fast token generation with vLLM inference engine
BGE-M3 is a state-of-the-art multilingual embedding model. Key strengths:
- Multilingual: Supports 100+ languages with strong cross-lingual retrieval
- Multi-functionality: Supports dense, sparse (lexical), and multi-vector retrieval in one model
- Top performance: Ranked #1 on MTEB multilingual leaderboard at release
AnythingLLM is an open-source, full-stack application that turns any document, resource, or content into context for any LLM. Key features include:
- Document Intelligence: Upload PDFs, Word docs, websites, and more - chat with your data instantly
- Vector Database: Uses pgvector (PostgreSQL) for scalable, production-grade vector storage
- Multi-user Workspaces: Create isolated workspaces for different projects or teams
- Privacy-First: All data stays on your infrastructure - nothing leaves your server
This deployment includes a complete RAG (Retrieval Augmented Generation) pipeline:
- Text Embeddings Inference: Hugging Face's TEI service running the BAAI/bge-m3 multilingual embedding model
- pgvector: PostgreSQL extension for efficient vector similarity search, enabling fast document retrieval at scale
- Active Linode account with GPU access enabled
- Required: bash, curl, ssh, jq
- Note: jq will be auto-installed if missing
No installation required - just run:
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh | bashDownload the script and run locally:
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/deploy.sh
bash deploy.shIf you prefer to inspect or customize the scripts:
git clone https://github.com/linode/ai-quickstart-anythingllm
cd ai-quickstart-anythingllm
./deploy.shNote
if you like to add more services check out docker compose template file
vi /template/docker-compose.yml
The script will ask you to:
- Choose a region (e.g., us-east, eu-west)
- Select GPU instance type
- Provide instance label
- Select or generate SSH keys
- Confirm deployment
The script automatically:
- Creates GPU instance in your Linode account
- Monitors cloud-init installation progress
- Waits for AnythingLLM health check
- Waits for vLLM model loading
Once complete, you'll see:
π Setup Complete!
β
Your AI LLM instance is now running!
π Access URLs:
AnythingLLM: https://<ip-label>.ip.linodeusercontent.com
π Access Credentials:
SSH: ssh -i /path/to/your/key root@<instance-ip>
# Install script called by cloud-init service
/opt/ai-quickstart-anythingllm/install.sh
# docker compose file called by systemctl at startup
/opt/ai-quickstart-anythingllm/docker-compose.yml
# Caddy reverse proxy configuration
/opt/ai-quickstart-anythingllm/Caddyfile
# service definition
/etc/systemd/system/ai-quickstart-anythingllm.service
To delete a deployed instance:
# Remote execution
curl -fsSL https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/delete.sh | bash -s -- <instance_id>
# Or download script and run
curl -fsSLO https://raw.githubusercontent.com/linode/ai-quickstart-anythingllm/main/delete.sh
bash delete.sh <instance_id>The script will show instance details and ask for confirmation before deletion.
ai-quickstart-anythingllm/
βββ deploy.sh # Main deployment script
βββ delete.sh # Instance deletion script
βββ template/
βββ cloud-init.yaml # Cloud-init configuration
βββ docker-compose.yml # Docker Compose configuration
βββ Caddyfile # Caddy reverse proxy configuration
βββ install.sh # Post-boot installation script
-
Configure Cloud Firewall (Recommended)
- Create Linode Cloud Firewall
- Restrict access to ports 80/443 by source IP
- Allow SSH (port 22) from trusted IPs only
-
SSH Security
- SSH key authentication required
- Root password provided for emergency console access only
# SSH into your instance
ssh -i /path/to/your/key root@<instance-ip>
# Check container status
docker ps -a
# Check Docker containers log
cd /opt/ai-quickstart-anythingllm && docker compose logs -f
# Check systemd service status
systemctl status ai-quickstart-anythingllm.service
# View systemd service logs
journalctl -u ai-quickstart-anythingllm.service -n 100
# Check cloud-init logs
tail -f /var/log/cloud-init-output.log -n 100
# Restart all services
systemctl restart ai-quickstart-anythingllm.service
# Check NVIDIA GPU status
nvidia-smi
# Check vLLM loaded models
curl http://localhost:8000/v1/models
# Check AnythingLLM health
curl http://localhost:3001/api/ping
# Check container logs
docker logs vllm
docker logs anythingllm
docker logs embedding
docker logs pgvector
# Check embedding service health
curl http://localhost:8001/health
# Check pgvector status
docker exec pgvector pg_isready -U anythingllmIssues and pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the Apache License 2.0.