A comprehensive local AI chat application built from scratch to explore modern AI integration patterns, RAG (Retrieval-Augmented Generation), and production-grade backend architecture. The system runs entirely locally using Ollama for LLM services, with no cloud dependencies.
- Multi-Profile System: Switch between different contexts (coding, paperwork, general) with separate configurations and memory
- Semantic Memory: Long-term memory using vector embeddings for context-aware conversations
- Document RAG: Automatic document indexing with intelligent Markdown-aware chunking and semantic search
- Web Search Integration: Optional Brave Search API integration for current information
- Hot Reload: Switch profiles without server restart
- Auto-Indexing: File watching with hash-based change detection for efficient document updates
- Clean Architecture: Interface-manager pattern with clear separation of concerns
- Package Structure:
  - `config`: Profile and configuration management
  - `handler`: HTTP request handlers (chat, profile switching)
  - `memory`: Vector-based semantic memory with filtering
  - `indexer`: Document scanning and indexing with watcher
  - `embedding`: Ollama embedding integration
  - `ollama`: LLM generation interface
  - `search`: Web search providers (Brave, DuckDuckGo)
  - `prompt`: Context-aware prompt building
  - `document`: Smart text chunking respecting document structure
  - `middleware`: CORS and logging
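The interface-manager pattern means each package exposes a small interface that consumers depend on, so concrete backends can be swapped or mocked without touching the handler layer. A minimal sketch of the idea (the `Embedder` and `Generator` names are illustrative, not the project's actual types):

```go
package chat

import "context"

// Embedder abstracts the embedding backend (e.g. a wrapper around
// Ollama's embedding endpoint). Hypothetical interface for illustration.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// Generator abstracts LLM text generation.
type Generator interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// Service depends only on the interfaces, so tests can inject fakes
// and backends can change without rippling through the handler layer.
type Service struct {
	Embedder  Embedder
	Generator Generator
}
```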
- Simple web interface with markdown rendering
- Real-time chat with streaming support
- Profile selector and model switcher
- Toggle controls for web search and RAG
Prerequisites:
- Go 1.25.4+
- Ollama running locally
- Embedding Model: `all-minilm:33m` (or configure your own)
- LLM Models: any Ollama-compatible models
- Brave Search API Key (optional, for web search)
- Clone the repository
```bash
git clone <repository-url>
cd chak-server
```
- Install dependencies
```bash
cd server
go mod download
```
- Set up Ollama
```bash
# Install Ollama (see ollama.ai)
# Pull required models
ollama pull all-minilm:33m
ollama pull llama2  # or your preferred model
```
- Configure environment
```env
# Create a .env file in the server directory
OLLAMA_HOST=localhost
BRAVE_API_KEY=your_api_key_here  # Optional
```
- Configure profiles
Edit `server/config.json` to customize profiles:
```json
{
"active_profile": "coding",
"profiles": {
"coding": {
"name": "Coding Assistant",
"description": "Programming help and code examples",
"directories": ["./documents/coding"],
"memory_file": "memory_coding.json",
"index_file": "index_coding.json",
"extensions": [".txt", ".md"],
"max_file_size": 5242880
}
}
}
```
- Start the server
```bash
cd server
go run main.go
```
- Open the web interface (Python's built-in HTTP server works well for this)
```bash
cd web
python -m http.server 8080
```
- Chat with the AI
  - Select a model from the dropdown
  - Toggle web search or RAG as needed
  - Switch profiles on the fly
  - Start chatting!
Project structure:
```
.
├── server/
│ ├── main.go # Application entry point
│ ├── config.json # Profile configurations
│ ├── internal/
│ │ ├── config/ # Configuration management
│ │ ├── handler/ # HTTP handlers
│ │ ├── memory/ # Semantic memory system
│ │ ├── indexer/ # Document indexing
│ │ ├── embedding/ # Vector embeddings
│ │ ├── ollama/ # LLM integration
│ │ ├── search/ # Web search providers
│ │ ├── prompt/ # Prompt building
│ │ ├── document/ # Text chunking
│ │ └── middleware/ # HTTP middleware
│ └── documents/ # Document directories per profile
└── web/
├── index.html # Main interface
├── css/style.css # Styling
    └── js/scripts.js # Frontend logic
```
Memory system:
- Short-term: conversation history with a sliding window
- Long-term: vector embeddings with semantic search
- Metadata filtering: prevents cross-contamination between document and conversation memories (sketched below)
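A rough sketch of how filtered retrieval over such entries could look in Go (the `Entry` type and `source` metadata key are illustrative, not the project's actual names):

```go
package memory

// Entry is a hypothetical memory record: the stored text, its vector
// embedding, and metadata marking where it came from.
type Entry struct {
	Text      string
	Embedding []float32
	Metadata  map[string]string // e.g. {"source": "conversation"} or {"source": "document"}
}

// FilterBySource keeps only entries whose "source" metadata matches.
// Filtering before similarity search is what keeps document chunks and
// conversation turns from cross-contaminating each other.
func FilterBySource(entries []Entry, source string) []Entry {
	var out []Entry
	for _, e := range entries {
		if e.Metadata["source"] == source {
			out = append(out, e)
		}
	}
	return out
}
```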
Document chunking:
- Respects Markdown structure (headers, paragraphs)
- Smart sentence-based splitting
- Code block preservation
- Configurable chunk sizes (see the sketch below)
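A simplified sketch of the idea (not the project's actual algorithm): split at paragraph boundaries once a configurable size is reached, but treat fenced code blocks as atomic so examples survive chunking intact.

````go
package document

import "strings"

// ChunkMarkdown is a simplified sketch of structure-aware chunking: it
// flushes a chunk at blank lines once maxLen bytes have accumulated,
// but never splits inside a fenced code block.
func ChunkMarkdown(text string, maxLen int) []string {
	var chunks []string
	var current strings.Builder
	inCode := false
	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inCode = !inCode // entering or leaving a fenced code block
		}
		if !inCode && strings.TrimSpace(line) == "" && current.Len() >= maxLen {
			chunks = append(chunks, strings.TrimSpace(current.String()))
			current.Reset()
			continue
		}
		current.WriteString(line)
		current.WriteString("\n")
	}
	if strings.TrimSpace(current.String()) != "" {
		chunks = append(chunks, strings.TrimSpace(current.String()))
	}
	return chunks
}
````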
Hot reload:
- Profile switching without restart
- Proper watcher lifecycle management
- Thread-safe operations with mutexes
- Goroutine management with stop channels (sketched below)
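The stop-channel pattern typically looks like this (a generic sketch with a hypothetical `Watcher`, not the project's actual type):

```go
package indexer

import "time"

// Watcher is a hypothetical file watcher whose background goroutine
// can be shut down cleanly when the active profile changes.
type Watcher struct {
	stop chan struct{}
}

func NewWatcher() *Watcher {
	w := &Watcher{stop: make(chan struct{})}
	go w.run()
	return w
}

func (w *Watcher) run() {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// rescan watched directories here
		case <-w.stop:
			return // profile switch: exit cleanly, no leaked goroutine
		}
	}
}

// Stop signals the background goroutine to exit. Call once per Watcher.
func (w *Watcher) Stop() {
	close(w.stop)
}
```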
API endpoints:
- `GET /` - Health check
- `POST /chat` - Send a chat message
- `GET /profiles` - List available profiles
- `GET /profile/active` - Get the current active profile
- `POST /profile/switch` - Switch to a different profile
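For example, a chat request could look like the following (the JSON field name and port are assumptions for illustration; check the handler package for the actual request schema):

```bash
# <port> is whatever the Go server listens on.
curl -X POST http://localhost:<port>/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'
```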
Profile configuration fields:
- `directories`: Paths to watch for documents
- `memory_file`: JSON file for storing memories
- `index_file`: JSON file for index state
- `extensions`: Allowed file extensions
- `max_file_size`: Maximum file size in bytes
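These fields map naturally onto Go structs with JSON tags; a sketch of what the config types could look like (names are illustrative; see the `config` package for the real definitions):

```go
package config

// Profile mirrors one profile entry in config.json.
type Profile struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Directories []string `json:"directories"`
	MemoryFile  string   `json:"memory_file"`
	IndexFile   string   `json:"index_file"`
	Extensions  []string `json:"extensions"`
	MaxFileSize int64    `json:"max_file_size"`
}

// Config is the top-level structure of config.json.
type Config struct {
	ActiveProfile string             `json:"active_profile"`
	Profiles      map[string]Profile `json:"profiles"`
}
```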
Thread safety:
- Memory operations use `sync.RWMutex`
- Config access is protected
- Proper goroutine cleanup on profile switch
Performance:
- Hash-based change detection avoids redundant indexing
- Vector similarity using cosine distance
- Efficient metadata filtering
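Cosine distance is just `1 - cosine similarity`; the similarity itself is the dot product of the two embeddings divided by the product of their magnitudes. A self-contained sketch:

```go
package memory

import "math"

// CosineSimilarity returns dot(a, b) / (|a| * |b|). Values near 1 mean
// the two embeddings point in nearly the same direction, i.e. the
// underlying texts are semantically close.
func CosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```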
Known limitations:
- Single-user (no authentication)
- In-memory vector store (consider persistent storage)
- Basic chunking algorithm (could use more sophisticated methods)
- No conversation branching or editing
For sharing memory files across Windows/Linux machines:
- Use SMB/CIFS network shares
- Mount shared directory containing memory files
- Update `config.json` paths to point to the network location
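For example, on a Linux machine with the share mounted at `/mnt/chak-share` (the mount point is purely illustrative):

```json
{
  "profiles": {
    "coding": {
      "memory_file": "/mnt/chak-share/memory_coding.json",
      "index_file": "/mnt/chak-share/index_coding.json"
    }
  }
}
```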
Haven't thought of it yet.
This is a learning project built to understand AI integration patterns. Feedback and suggestions welcome!
- Built with Ollama for local LLM inference
- Uses Brave Search API for web search
- Inspired by modern RAG architectures and semantic memory systems