ALIEN1 (Windows 10)                    VantaBlack (Ubuntu)
─────────────────────                  ────────────────────
┌─────────────────┐                    ┌─────────────────┐
│ Web UI          │                    │ Ollama LLM      │
│ localhost:8000  │────SSH────────────>│ (dual 3090s)    │
│                 │                    │                 │
│ AI Agent        │                    │ ChromaDB        │
│ Ecosystem       │                    │ Vector Store    │
│                 │                    │                 │
│ FastAPI Backend │<───────────────────│ Model Inference │
└─────────────────┘                    └─────────────────┘
ALIEN1 (web UI and orchestration):
- Web UI with research categories
- Document batch processing
- Agent orchestration
- Task management
- WebSocket monitoring
VantaBlack (GPU server):
- 48GB VRAM (dual RTX 3090)
- ChromaDB semantic search
- Legal document processing
- 30B+ parameter model support
- GPU-accelerated embeddings
Document RAG features:
- Query legal/technical documents
- Semantic search with citations
- Model selection (qwen2.5:32b, wizard-vicuna:30b, etc.)
- Real-time ingestion
- Database statistics
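To make "semantic search with citations" concrete, here is a minimal sketch of how retrieved passages could be numbered inside a prompt so the model can cite them. The `Chunk` type, its field names, and the prompt wording are illustrative assumptions, not taken from rag_agent.py or rag_query.py.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrieved passage. Hypothetical type; field names are illustrative."""
    source: str  # e.g. the PDF file the passage came from
    text: str


def build_cited_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text.strip()}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    demo = [
        Chunk("contract_a.pdf", "Section 3 limits liability to direct damages."),
        Chunk("contract_b.pdf", "The supplier indemnifies the buyer against IP claims."),
    ]
    print(build_cited_prompt("What are the liability clauses in section 3?", demo))
```

The resulting string is what would be sent to the model on VantaBlack; the numbering lets the UI map each [n] in the answer back to a source file.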
SSH into VantaBlack:
ssh -i C:\Users\matte\.ssh\id_ed25519 lightspeed@vantablack
Ensure Ollama is running:
ollama list # List installed models
# If empty, load a model (see VANTABLACK_QUICKSTART.md)
Start the web UI on ALIEN1:
cd C:\Users\matte\Documents\GitHub\AI_Agent_Ecosystem
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
python api/main.py
Open browser: http://localhost:8000
Navigate to Research → Document RAG
Edit config/vantablack_config.py:
VANTABLACK_HOST = "vantablack" # or 192.168.0.15
OLLAMA_BASE_URL = "http://vantablack:11434"
SSH_HOST = "lightspeed@vantablack"
Already configured to use VantaBlack by default:
- llm_backends/ollama_backend.py points to http://vantablack:11434
- All model inference happens on VantaBlack GPUs
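A quick way to confirm the configured URL works from ALIEN1 is to ask Ollama for its installed models over HTTP. The sketch below uses Ollama's `GET /api/tags` endpoint and only the standard library; the function names are hypothetical. It assumes Ollama on VantaBlack listens on a network interface (by default it binds to 127.0.0.1, which can be changed through the `OLLAMA_HOST` environment variable).

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://vantablack:11434"  # same value as in the config above


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in payload.get("models", [])]


def list_remote_models(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> list[str]:
    """Return installed model names, or raise ConnectionError with a readable message."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            return model_names(json.load(resp))
    except OSError as exc:  # covers URLError, refused connections, and timeouts
        raise ConnectionError(f"Cannot reach Ollama at {base_url}: {exc}") from exc


# Usage (needs network access to VantaBlack):
#   print(list_remote_models())
```

An empty list means Ollama is reachable but has no models installed; a `ConnectionError` points to a network, firewall, or bind-address problem rather than a model problem.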
Query Documents:
- Category: Document RAG
- Input: "What are the liability clauses in section 3?"
- Model: qwen2.5:32b (or wizard-vicuna:30b)
- Click "Generate"
Ingest PDFs:
- Upload PDFs to VantaBlack: /mnt/llm/contracts/
- Use ingestion endpoint
- Documents auto-chunked and indexed
Check Status:
- View document count
- See available models
- Monitor ChromaDB health
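For readers wondering what "auto-chunked" means in practice, here is a minimal sketch of one common approach: fixed-size character windows with overlap. This is an assumption for illustration only; the strategy and sizes used by rag_ingest.py are not documented here, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps a clause that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.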
Query RAG:
curl -X POST http://localhost:8000/rag/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the indemnification clause?", "model": "qwen2.5:32b"}'
Ingest Documents:
curl -X POST http://localhost:8000/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{"directory": "/mnt/llm/contracts"}'
Status:
curl http://localhost:8000/rag/status
Models available on VantaBlack under /mnt/llm/LLM-Models:
- Wizard-Vicuna 30B (18GB) - Best for complex legal reasoning
- Llama 3.1 8B Abliterated - Fast, uncensored
- Dolphin 2.9.4 - Instruction-tuned
- qwen2.5:32b - Currently downloading
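The `/rag/query` curl call and the model names above can be combined in a small Python client. This is a sketch using only the standard library; `build_query_request` and `query_rag` are hypothetical helper names, and the shape of the JSON response is not documented here, so the code returns it as-is.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # web UI backend on ALIEN1


def build_query_request(question: str, model: str = "qwen2.5:32b",
                        api_base: str = API_BASE) -> urllib.request.Request:
    """Build the same POST /rag/query request as the curl example."""
    body = json.dumps({"question": question, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/rag/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_rag(question: str, model: str = "qwen2.5:32b", timeout: float = 120.0) -> dict:
    """Send the query and return the decoded JSON body unchanged."""
    request = build_query_request(question, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs the web UI running on ALIEN1):
#   print(query_rag("What is the indemnification clause?", model="wizard-vicuna:30b"))
```

The generous default timeout is a guess: a 30B-class model can take a while to produce its first answer, especially if it has to be loaded into VRAM first.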
Load models with:
/mnt/llm/load_models.sh
Frontend (ALIEN1):
- FastAPI backend
- HTML/JS/CSS web UI
- Agent orchestration
Backend (VantaBlack):
- Ollama (LLM inference)
- ChromaDB (vector database)
- sentence-transformers (embeddings)
- Python RAG scripts
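To show how the embedding and vector-store pieces fit together, here is a toy sketch of similarity ranking. In the real stack, sentence-transformers produces the vectors and ChromaDB performs the search; the hand-written vectors and the `top_k` helper below are stand-ins for illustration. Cosine similarity is one common choice, and the metric actually used depends on how the ChromaDB collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones have hundreds of dimensions.
    docs = {"liability": [1.0, 0.0], "payment": [0.0, 1.0], "mixed": [1.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.1], docs):
        print(doc_id, round(score, 3))
```

A vector database does the same ranking, but with an index that avoids comparing the query against every stored chunk.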
C:\Users\matte\Documents\GitHub\AI_Agent_Ecosystem\
├── agents\rag_agent.py # New RAG agent
├── config\vantablack_config.py # VantaBlack settings
├── llm_backends\ollama_backend.py # Updated for remote
└── web\ # UI files
/mnt/llm/
├── rag_ingest.py
├── rag_query.py
├── chromadb/
└── LLM-Models/
"Connection refused" to VantaBlack:
# Test connectivity
ssh lightspeed@vantablack "ollama list"
Web UI won't start:
cd C:\Users\matte\Documents\GitHub\AI_Agent_Ecosystem
python api/main.py # Check error output
RAG queries fail:
# On VantaBlack, test directly:
python3 /mnt/llm/rag_query.py "test question"
Ollama models not loading:
# Check Ollama service
systemctl status ollama
# Restart if needed
sudo systemctl restart ollama
- ✅ VantaBlack RAG pipeline operational
- ✅ MCP server for Cursor ready
- 🔄 Add RAG to web UI categories
- 🔄 Test end-to-end workflow
- 🔜 Deploy production config
- 🔜 Add conversation memory
- 🔜 Implement reranking
System Status: ✅ Ready for testing
Run python api/main.py on ALIEN1, open http://localhost:8000, and start querying your documents!