A comprehensive local AI chat application built from scratch to explore modern AI integration patterns, RAG (Retrieval-Augmented Generation), and production-grade backend architecture. The system runs entirely locally using Ollama for LLM services, with no cloud dependencies.
- Multi-Profile System: Switch between different contexts (coding, paperwork, general) with separate configurations and memory
- Semantic Memory: Long-term memory using vector embeddings for context-aware conversations
- Document RAG: Automatic document indexing with intelligent Markdown-aware chunking and semantic search
- Web Search Integration: Optional Brave Search API integration for current information
- Hot Reload: Switch profiles without server restart
- Auto-Indexing: File watching with hash-based change detection for efficient document updates
- Clean Architecture: Interface-manager pattern with clear separation of concerns
- Package Structure:
  - `config`: Profile and configuration management
  - `handler`: HTTP request handlers (chat, profile switching)
  - `memory`: Vector-based semantic memory with filtering
  - `indexer`: Document scanning and indexing with watcher
  - `embedding`: Ollama embedding integration
  - `ollama`: LLM generation interface
  - `search`: Web search providers (Brave, DuckDuckGo)
  - `prompt`: Context-aware prompt building
  - `document`: Smart text chunking respecting document structure
  - `middleware`: CORS and logging
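The interface-manager pattern means each package exposes a small interface that consumers depend on, so concrete backends can be swapped or mocked without touching the handler layer. A minimal sketch of the idea (the `Embedder` and `Generator` names are illustrative, not the project's actual types):

```go
package chat

import "context"

// Embedder abstracts the embedding backend (e.g. a wrapper around
// Ollama's embedding endpoint). Hypothetical interface for illustration.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// Generator abstracts LLM text generation.
type Generator interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// Service depends only on the interfaces, so tests can inject fakes
// and backends can change without rippling through the handler layer.
type Service struct {
	Embedder  Embedder
	Generator Generator
}
```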
- Simple web interface with markdown rendering
- Real-time chat with streaming support
- Profile selector and model switcher
- Toggle controls for web search and RAG
Prerequisites:
- Go 1.25.4+
- Ollama running locally
- Embedding Model: `all-minilm:33m` (or configure your own)
- LLM Models: any Ollama-compatible models
- Brave Search API Key (optional, for web search)
- Clone the repository
```bash
git clone <repository-url>
cd chak-server
```
- Install dependencies
```bash
cd server
go mod download
```
- Set up Ollama
```bash
# Install Ollama (see ollama.ai)
# Pull required models
ollama pull all-minilm:33m
ollama pull llama2  # or your preferred model
```
- Configure environment
```env
# Create a .env file in the server directory
OLLAMA_HOST=localhost
BRAVE_API_KEY=your_api_key_here  # Optional
```
- Configure profiles
Edit `server/config.json` to customize profiles:
```json
{
"active_profile": "coding",
"profiles": {
"coding": {
"name": "Coding Assistant",
"description": "Programming help and code examples",
"directories": ["./documents/coding"],
"memory_file": "memory_coding.json",
"index_file": "index_coding.json",
"extensions": [".txt", ".md"],
"max_file_size": 5242880
}
}
}
```
- Start the server
```bash
cd server
go run main.go
```
- Open the web interface (Python's built-in HTTP server works well for this)
```bash
cd web
python -m http.server 8080
```
- Chat with the AI
  - Select a model from the dropdown
  - Toggle web search or RAG as needed
  - Switch profiles on the fly
  - Start chatting!
Project structure:
```
.
├── server/
│ ├── main.go # Application entry point
│ ├── config.json # Profile configurations
│ ├── internal/
│ │ ├── config/ # Configuration management
│ │ ├── handler/ # HTTP handlers
│ │ ├── memory/ # Semantic memory system
│ │ ├── indexer/ # Document indexing
│ │ ├── embedding/ # Vector embeddings
│ │ ├── ollama/ # LLM integration
│ │ ├── search/ # Web search providers
│ │ ├── prompt/ # Prompt building
│ │ ├── document/ # Text chunking
│ │ └── middleware/ # HTTP middleware
│ └── documents/ # Document directories per profile
└── web/
├── index.html # Main interface
├── css/style.css # Styling
    └── js/scripts.js # Frontend logic
```
Memory system:
- Short-term: conversation history with a sliding window
- Long-term: vector embeddings with semantic search
- Metadata filtering: prevents cross-contamination between document and conversation memories (sketched below)
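A rough sketch of how filtered retrieval over such entries could look in Go (the `Entry` type and `source` metadata key are illustrative, not the project's actual names):

```go
package memory

// Entry is a hypothetical memory record: the stored text, its vector
// embedding, and metadata marking where it came from.
type Entry struct {
	Text      string
	Embedding []float32
	Metadata  map[string]string // e.g. {"source": "conversation"} or {"source": "document"}
}

// FilterBySource keeps only entries whose "source" metadata matches.
// Filtering before similarity search is what keeps document chunks and
// conversation turns from cross-contaminating each other.
func FilterBySource(entries []Entry, source string) []Entry {
	var out []Entry
	for _, e := range entries {
		if e.Metadata["source"] == source {
			out = append(out, e)
		}
	}
	return out
}
```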
Document chunking:
- Respects Markdown structure (headers, paragraphs)
- Smart sentence-based splitting
- Code block preservation
- Configurable chunk sizes (see the sketch below)
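A simplified sketch of the idea (not the project's actual algorithm): split at paragraph boundaries once a configurable size is reached, but treat fenced code blocks as atomic so examples survive chunking intact.

````go
package document

import "strings"

// ChunkMarkdown is a simplified sketch of structure-aware chunking: it
// flushes a chunk at blank lines once maxLen bytes have accumulated,
// but never splits inside a fenced code block.
func ChunkMarkdown(text string, maxLen int) []string {
	var chunks []string
	var current strings.Builder
	inCode := false
	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inCode = !inCode // entering or leaving a fenced code block
		}
		if !inCode && strings.TrimSpace(line) == "" && current.Len() >= maxLen {
			chunks = append(chunks, strings.TrimSpace(current.String()))
			current.Reset()
			continue
		}
		current.WriteString(line)
		current.WriteString("\n")
	}
	if strings.TrimSpace(current.String()) != "" {
		chunks = append(chunks, strings.TrimSpace(current.String()))
	}
	return chunks
}
````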
Hot reload:
- Profile switching without restart
- Proper watcher lifecycle management
- Thread-safe operations with mutexes
- Goroutine management with stop channels (sketched below)
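The stop-channel pattern typically looks like this (a generic sketch with a hypothetical `Watcher`, not the project's actual type):

```go
package indexer

import "time"

// Watcher is a hypothetical file watcher whose background goroutine
// can be shut down cleanly when the active profile changes.
type Watcher struct {
	stop chan struct{}
}

func NewWatcher() *Watcher {
	w := &Watcher{stop: make(chan struct{})}
	go w.run()
	return w
}

func (w *Watcher) run() {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// rescan watched directories here
		case <-w.stop:
			return // profile switch: exit cleanly, no leaked goroutine
		}
	}
}

// Stop signals the background goroutine to exit. Call once per Watcher.
func (w *Watcher) Stop() {
	close(w.stop)
}
```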
API endpoints:
- `GET /` - Health check
- `POST /chat` - Send a chat message
- `GET /profiles` - List available profiles
- `GET /profile/active` - Get the current active profile
- `POST /profile/switch` - Switch to a different profile
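For example, a chat request could look like the following (the JSON field name and port are assumptions for illustration; check the handler package for the actual request schema):

```bash
# <port> is whatever the Go server listens on.
curl -X POST http://localhost:<port>/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'
```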
Profile configuration fields:
- `directories`: Paths to watch for documents
- `memory_file`: JSON file for storing memories
- `index_file`: JSON file for index state
- `extensions`: Allowed file extensions
- `max_file_size`: Maximum file size in bytes
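These fields map naturally onto Go structs with JSON tags; a sketch of what the config types could look like (names are illustrative; see the `config` package for the real definitions):

```go
package config

// Profile mirrors one profile entry in config.json.
type Profile struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Directories []string `json:"directories"`
	MemoryFile  string   `json:"memory_file"`
	IndexFile   string   `json:"index_file"`
	Extensions  []string `json:"extensions"`
	MaxFileSize int64    `json:"max_file_size"`
}

// Config is the top-level structure of config.json.
type Config struct {
	ActiveProfile string             `json:"active_profile"`
	Profiles      map[string]Profile `json:"profiles"`
}
```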
Thread safety:
- Memory operations use `sync.RWMutex`
- Config access is protected
- Proper goroutine cleanup on profile switch
Performance:
- Hash-based change detection avoids redundant indexing
- Vector similarity using cosine distance
- Efficient metadata filtering
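Cosine distance is just `1 - cosine similarity`; the similarity itself is the dot product of the two embeddings divided by the product of their magnitudes. A self-contained sketch:

```go
package memory

import "math"

// CosineSimilarity returns dot(a, b) / (|a| * |b|). Values near 1 mean
// the two embeddings point in nearly the same direction, i.e. the
// underlying texts are semantically close.
func CosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```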
Known limitations:
- Single-user (no authentication)
- In-memory vector store (consider persistent storage)
- Basic chunking algorithm (could use more sophisticated methods)
- No conversation branching or editing
For sharing memory files across Windows/Linux machines:
- Use SMB/CIFS network shares
- Mount shared directory containing memory files
- Update `config.json` paths to point to the network location
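For example, on a Linux machine with the share mounted at `/mnt/chak-share` (the mount point is purely illustrative):

```json
{
  "profiles": {
    "coding": {
      "memory_file": "/mnt/chak-share/memory_coding.json",
      "index_file": "/mnt/chak-share/index_coding.json"
    }
  }
}
```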
Haven't thought of it yet.
This is a learning project built to understand AI integration patterns. Feedback and suggestions welcome!
- Built with Ollama for local LLM inference
- Uses Brave Search API for web search
- Inspired by modern RAG architectures and semantic memory systems