I built this flexible microservices setup to run multiple Ollama-compatible models simultaneously on your local machine. This advanced stack combines Ollama with Open WebUI to create a complete AI workbench where you can leverage different specialized models for various tasks!
This repo gives you a powerful multi-model template with isolated microservices:
- Multiple Ollama containers, each running a different AI model in its own environment:
  - DeepSeek r1: Great for complex reasoning and analysis
  - Llama3: Excellent for general conversation and creative tasks
  - Phi3: Optimized for code and technical documentation
  - Gemma3: Google's efficient and capable model
- Single Open WebUI container: Provides a unified interface to access all models
Each model runs independently in its own container with dedicated resources, following a microservices pattern.
Get Docker running
You'll need Docker installed → Get Docker here
Clone this repo
```sh
git clone https://github.com/oburay/multi-llm-workbench.git
cd multi-llm-workbench
```
Customize your model selection (optional)
The default setup includes DeepSeek r1, Llama3, Phi3, and Gemma3, but you can:
- Edit any of the Dockerfiles in the ollama directory to use different models
- Add new models by creating additional Dockerfiles following the pattern
- Remove models by commenting out their sections in docker-compose.yml (see the sketch just below)
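For example, disabling Gemma3 could look roughly like this. The service, Dockerfile, and volume names follow the pattern used later in this README, so check them against your actual docker-compose.yml:

```yaml
services:
  # ollama-gemma3:                  # disabled: whole service commented out
  #   build:
  #     context: ./ollama
  #     dockerfile: Dockerfile.gemma3
  #   container_name: ollama-gemma3
  #   volumes:
  #     - ollama_gemma3_data:/root/.ollama
  #   ports:
  #     - "11437:11434"
  #   restart: unless-stopped
  #   networks:
  #     - ollama_network

volumes:
  ollama_deepseek_data:             # the other volumes stay as they are
  # ollama_gemma3_data:             # the disabled model's volume can go too
```

If you disable a model this way, also drop its URL from the webui service's OLLAMA_BASE_URLS list so the UI doesn't try to reach it.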
Launch your AI workbench!
```sh
docker compose up -d
```
First run will take a while as it downloads multiple models (a good time to make coffee!)
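If you'd rather watch the downloads than wait, the usual compose log commands work here; the service name below assumes the ollama-<model> naming used throughout this README:

```sh
# Follow one model container while it pulls its weights
docker compose logs -f ollama-llama3

# Or watch all containers at once
docker compose logs -f
```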
Start interacting with your models
Just open http://localhost:8080 in your browser
Switch between models
Use the model selector dropdown in the chat interface to choose which model handles each conversation
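The UI is the main way in, but every Ollama container also exposes the standard Ollama HTTP API on its own host port (see the port list further down), so you can hit a model directly. A minimal sketch, assuming Llama3 is mapped to host port 11435 and was pulled under the tag llama3:

```sh
# Ask the Llama3 container for a completion over its HTTP API
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain Docker volumes in one sentence.",
  "stream": false
}'
```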
What's actually happening:
- Each Ollama container boots up and downloads its specific model if needed
- All models run in parallel, each in its own isolated environment
- The containers share Docker networks but have separate data volumes
- Open WebUI provides a unified interface to all models
- Everything runs locally - your data stays on your machine
```
Your Machine

┌─ Docker Environment ─────────────────────────────────────────┐
│                                                                              │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │  DeepSeek r1  │  │    Llama3     │  │     Phi3      │  │    Gemma3     │  │
│  │   Container   │  │   Container   │  │   Container   │  │   Container   │  │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  │
│          │                  │                  │                  │          │
│  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐  │
│  │   DeepSeek    │  │    Llama3     │  │     Phi3      │  │    Gemma3     │  │
│  │  Data Volume  │  │  Data Volume  │  │  Data Volume  │  │  Data Volume  │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  └───────────────┘  │
│                                                                              │
│  ┌───────────────┐                                                           │
│  │  Open WebUI   │◄──── talks to every Ollama container                      │
│  │   Container   │      over the internal Docker network                     │
│  └───────┬───────┘                                                           │
└──────────┼───────────────────────────────────────────────────────────────────┘
           │  Port 8080 (user access)
    ┌──────▼──────┐
    │ Web Browser │  http://localhost:8080
    └─────────────┘
```
Ports 11434-11437: model APIs (each model on its own host port)
Port 8080: web UI (user access)
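The shared network and the per-model data volumes in the diagram are ordinary Docker objects, so you can poke at them directly. The network name below is a guess (compose prefixes it with the project/folder name), so list first, then inspect:

```sh
# See what compose created
docker network ls
docker volume ls

# Then inspect the shared model network (adjust the name to what `docker network ls` shows)
docker network inspect multi-llm-workbench_ollama_network
```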
These come in handy when you need to check on things:
See what's running
```sh
docker compose ps
```
Check the logs
```sh
# For a specific model
docker logs ollama-llama3

# For the UI
docker logs ollama-webui

# For all containers
docker compose logs -f
```
Shut everything down
```sh
docker compose down
```
Remove all model data
```sh
docker compose down -v
```
Check which models are available in a container
```sh
docker exec -it ollama-llama3 ollama list
```
Resource management is important:
- Running multiple models simultaneously requires more RAM and CPU
- Consider stopping models you're not actively using: `docker stop ollama-phi3`
- Start them again when needed: `docker start ollama-phi3` (or cap their resources instead — see the sketch below)
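If you'd rather cap a model than stop it outright, Compose can also pin per-service CPU and memory limits. A rough sketch (the numbers are placeholders; recent Docker Compose versions apply deploy.resources.limits even outside Swarm):

```yaml
  ollama-phi3:
    # ... existing config
    deploy:
      resources:
        limits:
          cpus: "4"       # at most 4 CPU cores
          memory: 8G      # at most 8 GB of RAM
```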
Choose the right model for each task:
- Use larger models (DeepSeek r1) for complex reasoning
- Use code-specialized models (Phi3) for programming
- Use smaller models for quick interactions
Expect slower initial responses as each model warms up
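That warm-up is the model being loaded into memory on first use. If you know you're about to lean on one model, you can pre-load it and keep it resident via the Ollama API's keep_alive field (a request with no prompt just loads the model); the port and tag below assume the Llama3 mapping from this README:

```sh
# Pre-load Llama3 and keep it in memory for 30 minutes
curl http://localhost:11435/api/generate -d '{
  "model": "llama3",
  "keep_alive": "30m"
}'
```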
Network usage: each model's API is exposed on its own host port (inside Docker they all listen on 11434; see the quick check after this list):
- DeepSeek r1: 11434
- Llama3: 11435
- Phi3: 11436
- Gemma3: 11437
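A quick way to confirm a container is up and actually holds its model is its /api/tags endpoint, e.g. for Phi3:

```sh
# List the models stored in the Phi3 container (host port 11436)
curl http://localhost:11436/api/tags
```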
To add another model to your workbench, follow these steps (a build-and-start command comes after the snippets):
- Create a new Dockerfile in the ollama directory:
```dockerfile
# filepath: ./ollama/Dockerfile.yourmodel
FROM ollama/ollama:latest

EXPOSE 11434

# Generate a startup script: pull the model on first run, then start the server
RUN echo '#!/bin/sh\n\
MODEL_PATH="/root/.ollama/models/manifests/registry.ollama.ai/library/yourmodel/latest"\n\
if [ ! -f "$MODEL_PATH" ]; then\n\
  echo "Model not found, downloading..."\n\
  ollama serve & SERVER_PID=$!\n\
  sleep 5\n\
  ollama pull yourmodel\n\
  kill $SERVER_PID\n\
  wait $SERVER_PID\n\
  echo "Model downloaded successfully"\n\
fi\n\
echo "Starting Ollama server..."\n\
exec ollama serve\n' > /startup.sh && chmod +x /startup.sh

ENTRYPOINT ["/startup.sh"]
```
- Add the new service to docker-compose.yml:
```yaml
  ollama-yourmodel:
    build:
      context: ./ollama
      dockerfile: Dockerfile.yourmodel
    container_name: ollama-yourmodel
    volumes:
      - ollama_yourmodel_data:/root/.ollama
    ports:
      - "11438:11434"  # Use next available port
    restart: unless-stopped
    networks:
      - ollama_network
```
- Add the volume entry:
```yaml
volumes:
  # ... existing volumes
  ollama_yourmodel_data:
```
- Update the webui environment to include the new model:
```yaml
  webui:
    # ... existing config
    environment:
      - OLLAMA_BASE_URL=http://ollama-deepseek:11434
      - OLLAMA_BASE_URLS=http://ollama-deepseek:11434;http://ollama-llama3:11434;http://ollama-phi3:11434;http://ollama-gemma3:11434;http://ollama-yourmodel:11434
```
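With those three snippets in place, you can build and start just the new service; docker compose lets you target individual services, and the webui container is recreated so it picks up the longer OLLAMA_BASE_URLS:

```sh
# Build the new image and start only the new model container
docker compose up -d --build ollama-yourmodel

# Recreate the UI so it sees the updated OLLAMA_BASE_URLS
docker compose up -d webui
```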
This project builds upon these amazing open source projects:
- Ollama - For running LLMs locally
- Open WebUI - For the intuitive chat interface
This project extends the single-model approach from local-llm-stack to support multiple specialized LLMs running simultaneously.