Streamlit frontend • Python RAG backend • Docker Compose • NGINX reverse proxy • AWS EC2 (t3.micro, Free Tier) • cron scheduling
Live App: Click here to experience the app
Demo Preview
This single README documents the entire RAG Chatbot web application: local development, containerized deployment, NGINX reverse proxy, and a low-cost AWS EC2 deployment with scheduled runtime using cron to reduce resource usage. Follow this guide to set up, run, deploy, and maintain the app.
Name: RAG Chatbot App
Short description:
A web application that combines retrieval (vector search over ingested documents) with a Large Language Model to produce context-aware chat responses (Retrieval-Augmented Generation β RAG). The frontend is a Streamlit chat UI; the backend handles ingestion, embedding generation, vector search, reranking (MMR), and LLM calls. Deployable with Docker Compose behind an NGINX reverse proxy; optimized to run on a Free Tier AWS EC2 t3.micro by applying container-level resource limits and scheduled runtime.
[User Browser]
      |
   (HTTP 80)
      |
[NGINX Reverse Proxy] ----> [Streamlit Frontend Container (8501)]
                                   |
                                   `----> [Backend Container (8000)]   (internal API calls from frontend)
- NGINX listens on port 80 and forwards traffic to the Streamlit frontend container.
- Frontend (Streamlit) provides the UI, maintains per-session state (`st.session_state`), and calls backend endpoints such as `/chat`, `/ingest`, `/summarize` (a minimal call sketch follows this list).
- Backend (FastAPI or similar) performs document ingestion, text chunking, embedding generation (ONNX MiniLM or a provider), stores vectors (FAISS or a managed vector DB), executes retrieval (including MMR), and routes prompts to an LLM provider (OpenAI/OpenRouter/etc.).
- All services are Dockerized and orchestrated by Docker Compose.
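To make the frontend-backend split concrete, here is a minimal sketch of how the Streamlit UI can wire `st.chat_input` / `st.chat_message` to the `/chat` endpoint. The `BACKEND_HOST` default and the rendering details are illustrative assumptions, not the project's actual code; the payload follows the `/chat` schema documented further below.

```python
# Minimal frontend sketch: per-session history in st.session_state, one POST to /chat per turn.
import os

import requests
import streamlit as st

BACKEND_HOST = os.getenv("BACKEND_HOST", "http://backend:8000")  # service name inside Compose

if "messages" not in st.session_state:
    st.session_state.messages = []  # chat history, isolated per browser session

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask a question about your documents"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    payload = {
        "query": prompt,
        "chat_history": st.session_state.messages[:-1],
        "temperature": 0.7,
    }
    answer = requests.post(f"{BACKEND_HOST}/chat", json=payload, timeout=120).json()
    st.session_state.messages.append({"role": "assistant", "content": answer["response"]})
    with st.chat_message("assistant"):
        st.write(answer["response"])
        if answer.get("retrieved_sources"):
            st.caption(f"Sources: {answer['retrieved_sources']}")
```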
- Frontend: Streamlit (Python)
- Backend: Python (FastAPI), LangChain-like pattern, local ONNX embeddings and FAISS (or external vector store)
- Reverse proxy: NGINX
- Container orchestration: Docker Compose (v3.8)
- Hosting: AWS EC2 (Ubuntu 24.04 LTS, `t3.micro`)
- Scheduling: `cron` jobs for start/stop to conserve resources
- Env management: `.env` file and (recommended) AWS Secrets Manager for production secrets
- File & URL ingestion (PDFs, web pages)
- Text chunking + embedding generation (ONNX MiniLM or provider-based)
- Vector store integration (FAISS local, or managed vector DB like Qdrant/Pinecone)
- Retriever with MMR re-ranking and configurable fetch/k (a sketch of the MMR idea follows this feature list)
- LLM integration via provider (OpenAI / OpenRouter / other)
- API endpoints: `/chat`, `/ingest`, `/summarize`, `/remove_vectors`, `/remove_all_vectors`, `/settings`, `/switch_model`, `/health`
- Streamlit chat UI with `st.chat_input` / `st.chat_message`
- Per-user session isolation via `st.session_state` and `session_id`
- Upload PDFs or provide URLs to ingest
- Control temperature and embedding model mode
- Display AI responses and retrieved source attributions
- Export chat (JSON / Markdown / PDF)
- Docker Compose deployment with container-level memory & CPU limits
- NGINX reverse proxy exposes only port 80 (optionally 443)
- Cron jobs to start/stop containers on a schedule to keep costs low
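For the MMR re-ranking mentioned above, the core idea is to trade off relevance to the query against redundancy with chunks already selected. Below is a small, self-contained sketch of that scoring over pre-computed embeddings; it is illustrative only, and the backend's real retriever may differ.

```python
# Illustrative MMR (Maximal Marginal Relevance) re-ranking over pre-computed embeddings.
# fetch_k candidates come from raw similarity search; k of them are then selected so that
# each pick is relevant to the query but not redundant with already-selected chunks.
import numpy as np

def mmr_rerank(query_vec, candidate_vecs, k=4, lambda_mult=0.7):
    """query_vec: (d,), candidate_vecs: (fetch_k, d); all vectors L2-normalised."""
    sim_to_query = candidate_vecs @ query_vec          # relevance of each candidate
    selected, remaining = [], list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        if not selected:
            best = remaining[int(np.argmax(sim_to_query[remaining]))]
        else:
            chosen = candidate_vecs[selected]
            scores = []
            for i in remaining:
                redundancy = float(np.max(chosen @ candidate_vecs[i]))
                scores.append(lambda_mult * sim_to_query[i] - (1 - lambda_mult) * redundancy)
            best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # indices into candidate_vecs, in selection order
```

With the `.env` defaults shown further below (`RETRIEVER_FETCH_K=10`, `RETRIEVER_K=4`, `MMR_LAMBDA=0.7`), the retriever would fetch 10 candidate chunks from the vector store and keep the 4 most relevant-but-diverse ones.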
- Languages: Python 3.10+ (recommended)
- Frontend: Streamlit
- Backend: FastAPI (or similar), LangChain components (optional)
- Embeddings model: all-MiniLM-L6-v2
- Embed / Vector: ONNX + onnxruntime (ONNX MiniLM), FAISS (local)
- Quantization: the converted ONNX model is quantized to increase inference speed and reduce model size (a quantization/embedding sketch follows this list)
- LLM provider: deepseek-chat-v3-0324 via OpenRouter
- Containers: Docker, Docker Compose (v3.8)
- Proxy: NGINX (alpine image)
- Hosting: AWS EC2 (Ubuntu 24.04 LTS)
- Scheduling: crontab (system cron)
- Utilities: python-dotenv, requests/httpx, PyMuPDF/pypdf for PDF handling
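To illustrate the quantized ONNX embedding path, here is a rough sketch using `onnxruntime` dynamic quantization plus mean pooling. The model paths and the use of the `transformers` tokenizer are assumptions for illustration; the repository ships its own converted model under `backend/onnx_model`.

```python
# Sketch: dynamically quantize the exported MiniLM ONNX model, then use it for embeddings.
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic
from transformers import AutoTokenizer

# One-off step: int8 weight quantization shrinks the model file and speeds up CPU inference.
quantize_dynamic("onnx_model/model.onnx", "onnx_model/model_quant.onnx",
                 weight_type=QuantType.QInt8)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
session = ort.InferenceSession("onnx_model/model_quant.onnx",
                               providers=["CPUExecutionProvider"])
input_names = {i.name for i in session.get_inputs()}

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})
    token_emb = outputs[0]                                      # (batch, seq_len, hidden)
    mask = enc["attention_mask"][..., None].astype(np.float32)
    pooled = (token_emb * mask).sum(axis=1) / mask.sum(axis=1)  # mean pooling over tokens
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # unit-length vectors
```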
For AWS deployment (recommended minimal config):
- Instance: `t3.micro` (Free Tier), 2 vCPUs (burstable), 1 GiB RAM
- Storage: EBS gp3/gp2, 30 GB to start
- OS: Ubuntu Server 24.04 LTS (x86_64)
- Docker & Docker Compose installed
Local dev requirements:
- Python 3.10+
- Streamlit & backend dependencies
- Docker (optional for containerized local testing)
Note: The `t3.micro` is memory-limited. We use swap plus container memory limits; for multi-user or heavier embeddings, upgrade the instance.
git clone https://github.com/your-repo/rag-chatbot.git
cd rag-chatbot

rag-chatbot/
├── backend/             # Python backend (FastAPI)
│   ├── main.py
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/            # Streamlit app
│   ├── app.py
│   ├── css.py
│   └── requirements.txt
├── nginx/
│   ├── Dockerfile
│   └── nginx.conf
├── docker-compose.yml
└── .env
Copy the example file and edit it with your secrets:
cp .env.example .env
nano .env
# e.g. OPENROUTER_API_KEY=...

Below is a recommended docker-compose.yml tuned for a t3.micro:
version: '3.8'

services:
  backend:
    image: suraj5424/rag-backend:latest
    container_name: rag-backend
    env_file:
      - .env
    expose:
      - "8000"
    volumes:
      - ./backend/onnx_model:/app/onnx_model:ro
      - ./backend/user_data:/app/user_data
    restart: always
    mem_limit: 600m
    cpus: 0.5
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"

  frontend:
    image: suraj5424/rag-frontend:latest
    container_name: rag-frontend
    env_file:
      - .env
    expose:
      - "8501"
    depends_on:
      - backend
    restart: always
    mem_limit: 350m
    cpus: 0.4
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"

  nginx:
    build: ./nginx
    container_name: nginx-reverse-proxy
    ports:
      - "80:80"
    depends_on:
      - frontend
      - backend
    restart: always
    mem_limit: 200m
    cpus: 0.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"

networks:
  default:
    driver: bridge

Notes
- Replace `suraj5424/...` with your actual image names, or use local Dockerfiles via `build:`.
- Volume mounts for `onnx_model` and `user_data` ensure persistence across container restarts.
nginx/Dockerfile

FROM nginx:stable-alpine
COPY nginx.conf /etc/nginx/conf.d/default.conf

nginx/nginx.conf
server {
    listen 80;
    server_name _;  # replace with your.domain.tld if you have one

    # Proxy base traffic to Streamlit frontend
    location / {
        proxy_pass http://frontend:8501;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_cache_bypass $http_upgrade;
    }

    # Health /status proxied to backend health endpoint
    location /health {
        proxy_pass http://backend:8000/health;
        proxy_set_header Host $host;
    }

    # Optional: static files or admin endpoints can be routed similarly
}

Routing logic summary
- NGINX listens on host port 80 and proxies to the Docker Compose service `frontend:8501`.
- `/health` is routed to the backend health endpoint so external checks can verify backend status.
- If using a domain, update `server_name` and add TLS blocks for 443 (see Security).
To reduce runtime costs and resource usage, the plan is to run the backend only Mon-Sat 07:00-22:00 (Berlin time) and allow the frontend to stay online during the day.
Edit crontab:
crontab -e

Add:
# Use Berlin time for schedule
TZ=Europe/Berlin
# Start backend at 07:00 Mon-Sat
0 7 * * 1-6 /usr/bin/docker start rag-backend
# Stop backend at 22:00 Mon-Sat
0 22 * * 1-6 /usr/bin/docker stop rag-backend
# Ensure backend is stopped on Sunday at midnight
0 0 * * 0 /usr/bin/docker stop rag-backend
# Start frontend at 07:00 Mon-Sat (frontend may remain running 24/7 if desired)
0 7 * * 1-6 /usr/bin/docker start rag-frontend

Notes & tips
- Use full paths (`/usr/bin/docker`; check yours with `which docker`).
- If you prefer `docker-compose` start/stop behavior, point cron at `docker compose -f /path/to/docker-compose.yml up -d` or `down`.
- Make sure the `ubuntu` user has permission to run Docker commands from crontab, or use `sudo` in the cron entries.
.env.example
# LLM provider & keys
MODEL_NAME=your-model-name
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=
# Host references (use service names inside Docker Compose)
BACKEND_HOST=http://backend:8000
FRONTEND_HOST=http://frontend:8501
# App configuration
MAX_UPLOAD_MB=10
RETRIEVER_K=4
RETRIEVER_FETCH_K=10
MMR_LAMBDA=0.7
# Admin / light auth (optional)
ADMIN_API_KEY=changeme

Security note
- DO NOT commit `.env` to source control. For production, use AWS Secrets Manager, Parameter Store, or Docker secrets.
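As a minimal sketch of how the backend might read these values with `python-dotenv` (variable names follow `.env.example` above; the defaults are illustrative, not the project's actual settings module):

```python
# Sketch: load .env and expose typed settings for the retriever and upload limits.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory, if present

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
BACKEND_HOST = os.getenv("BACKEND_HOST", "http://backend:8000")
MAX_UPLOAD_MB = int(os.getenv("MAX_UPLOAD_MB", "10"))
RETRIEVER_K = int(os.getenv("RETRIEVER_K", "4"))
RETRIEVER_FETCH_K = int(os.getenv("RETRIEVER_FETCH_K", "10"))
MMR_LAMBDA = float(os.getenv("MMR_LAMBDA", "0.7"))
```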
Compose-level resource limits
- backend: `mem_limit: 600m`, `cpus: 0.5`. Rationale: embedding/ML operations are backend-heavy; limiting to 600 MB avoids OOM on a `t3.micro` while leaving some headroom.
- frontend: `mem_limit: 350m`, `cpus: 0.4`. Rationale: Streamlit is lightweight but needs memory for session data and rendering.
- nginx: `mem_limit: 200m`, `cpus: 0.2`. Rationale: NGINX is extremely lightweight.
How this fits Free Tier
A `t3.micro` has 1 GiB RAM. With swap (1 GiB recommended) and the above limits, typical demo workloads run with occasional bursts. For heavier loads, upgrade the instance.
Logging & disk
- Docker logging is capped (`max-size: "5m"`, `max-file: "3"`) to avoid filling the EBS disk.
- Backend: `cd backend && pip install -r requirements.txt && uvicorn main:app --reload`
- Frontend: `cd frontend && pip install -r requirements.txt && streamlit run app.py`
- Point the frontend at `http://localhost:8000` in the env file.
Build & start:
docker-compose up -d --build

Check containers:
docker ps
docker logs -f rag-frontend
docker logs -f rag-backend
docker logs -f nginx-reverse-proxy

Access
- Public: `http://<EC2_PUBLIC_IP>/` (NGINX forwards to Streamlit)
- Health: `http://<EC2_PUBLIC_IP>/health` (if proxied)
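Because the backend only runs on a schedule, a small poll against the proxied health endpoint is a convenient smoke test after a cron start. This sketch keeps the documented `<EC2_PUBLIC_IP>` placeholder; the retry count and interval are arbitrary choices.

```python
# Sketch: poll the proxied /health endpoint until the backend reports healthy.
import time

import requests

URL = "http://<EC2_PUBLIC_IP>/health"  # replace with your instance's public IP or domain

for attempt in range(10):
    try:
        r = requests.get(URL, timeout=5)
        if r.ok:
            print("backend healthy:", r.text)
            break
    except requests.RequestException as exc:
        print(f"attempt {attempt + 1}: not ready yet ({exc})")
    time.sleep(6)
else:
    raise SystemExit("backend did not become healthy in time")
```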
Restart or stop
docker-compose restart backend
docker-compose stop backend
docker-compose up -d

The frontend expects the backend endpoints below. Match payloads exactly for compatibility.

/chat
Request JSON
{
"query": "What is RAG?",
"chat_history": [{"role":"user","content":"Hi"}],
"temperature": 0.7
}

Response JSON
{
"response": "Answer text...",
"source_type": "RAG" | "Tool" | "LLM_Internal_Knowledge",
"retrieved_sources": [{"source":"file.pdf","page":2}],
"temperature": 0.7
}

/ingest
- Form-data: `file` (PDF) OR `url` (string)
- Response: ingestion status, content hash
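For reference, ingestion can be exercised directly with `requests`; a quick sketch follows, where the base URL and file path are placeholders for local testing, and the form field names come from the description above.

```python
# Sketch: ingest a local PDF or a web page URL via the /ingest endpoint.
import requests

BACKEND = "http://localhost:8000"  # or http://backend:8000 from inside the Compose network

# PDF upload as multipart form-data
with open("sample.pdf", "rb") as f:
    r = requests.post(f"{BACKEND}/ingest",
                      files={"file": ("sample.pdf", f, "application/pdf")})
print(r.json())  # ingestion status + content hash

# Or ingest a web page by URL instead
r = requests.post(f"{BACKEND}/ingest", data={"url": "https://example.com/article"})
print(r.json())
```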
/summarize
- Form-data: `file` or `url`
- Response: `{ "summary": "..." }`
/remove_vectors
- Removes vectors tied to a source
/remove_all_vectors
- Clears all ingested vectors for a session
/settings
- GET returns session settings; POST updates them (e.g., `{"temperature": 0.5}`)
/switch_model
- Form param: `quantized=true|false`
- Switches the embeddings model variant
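And a short sketch of driving the settings and model-switch endpoints from a script. Whether `/settings` expects a JSON body and `/switch_model` expects form data exactly as shown is an assumption based on the descriptions above.

```python
# Sketch: read/update session settings and switch the embeddings model variant.
import requests

BACKEND = "http://localhost:8000"

print(requests.get(f"{BACKEND}/settings").json())                     # current settings
requests.post(f"{BACKEND}/settings", json={"temperature": 0.5})       # update temperature
requests.post(f"{BACKEND}/switch_model", data={"quantized": "true"})  # form param per README
```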
License
- Add a `LICENSE` file (MIT or Apache-2.0 recommended for permissive use).
Credits
- Project assembled from a Streamlit frontend and a Python RAG backend design, with NGINX containerization and basic Docker Compose orchestration.
- Launch an EC2 `t3.micro` (Ubuntu 24.04) and configure the security group (SSH + HTTP).
- SSH into the instance and install Docker & Docker Compose.
- Clone the repository into `/home/ubuntu/rag-deploy`.
- Create `.env` from `.env.example` (do not commit it).
- (Recommended) Create a 1 GB swapfile to avoid memory OOMs.
- Place `nginx/nginx.conf` and confirm the `docker-compose.yml` values.
- `docker-compose up -d --build`
- Add crontab entries for scheduled start/stop.
- Verify the site at `http://<EC2_PUBLIC_IP>/`.
- Add TLS & secrets in production; consider upgrading the instance for heavy use.
- Add authentication (JWT / API keys) and an admin UI.
- Replace local FAISS with Qdrant (hosted) or Pinecone (managed) for multi-instance scaling.
- Add centralized logs and metrics (Prometheus + Grafana, CloudWatch).
- Integrate CI/CD (GitHub Actions) for image builds and automatic deploys to EC2.
Author: Suraj Varma
Email: [email protected]
GitHub: @suraj5424
LinkedIn: Suraj Varma
Website/Portfolio: Suraj Varma
This project is licensed under the MIT License β see the LICENSE file for details.
If you find this project useful, please give it a star on GitHub!
