A local development tool that lets you chat with your project documentation using AI and semantic search. Ask questions in natural language and get real-time streaming responses with source citations.
- 🔍 Semantic Search - Find relevant documentation based on meaning, not just keywords
- 💬 Real-time Streaming - See responses as they're generated via WebSocket
- 📚 Source Citations - Know exactly which files informed each answer
- 🧠 Conversation Memory - Follow-up questions maintain context
- ⚡ Smart Incremental Indexing - MD5 hash tracking means only new or changed files are re-indexed
- 🚀 GPU-Accelerated Indexing - Automatic CUDA detection for 6-10x faster embedding generation
- 🔄 WebSocket Indexing - Real-time progress updates, no timeout issues on cloud platforms
- 🎨 Clean UI - Modern React interface with Tailwind CSS 4
┌─────────────────┐
│ React Frontend │
│ (TypeScript) │
└────────┬────────┘
│ WebSocket
↓
┌─────────────────┐ ┌──────────────┐
│ FastAPI Server │─────→│ FAISS │
│ (Python) │ │ (Vector DB) │
└────────┬────────┘ └──────────────┘
│
↓
┌─────────────────┐
│ Claude API │
│ (Anthropic) │
└─────────────────┘
Indexing Phase (via WebSocket)
- User provides documentation directory path
- Backend crawls directory for supported file types
- Calculates an MD5 hash for each file to detect changes (see the sketch after this list)
- Smart incremental indexing:
  - New files: fully indexed
  - Modified files: old chunks marked deleted, re-indexed
  - Unchanged files: skipped entirely
  - Deleted files: chunks marked as deleted
- Splits documents into ~1000 token chunks with overlap
- Generates vector embeddings for semantic search
- Stores in FAISS with metadata (file path, hash, deleted flag, etc.)
- Real-time progress updates sent via WebSocket
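The hash check behind this incremental behavior can be sketched in a few lines. This is an illustration only, not the exact code in indexer.py; the hash-store location and the helper names (md5_of, classify_files) are hypothetical, and deleted-file handling is omitted:

import hashlib
import pickle
from pathlib import Path

HASH_FILE = Path("data/faiss_db/file_hashes.pkl")  # assumed location of the hash store

def md5_of(path: Path) -> str:
    """Hash the file's bytes so any content change is detected."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def classify_files(doc_dir: Path) -> dict:
    """Return {path: 'new' | 'modified' | 'unchanged'} and persist the updated hashes."""
    old_hashes = pickle.loads(HASH_FILE.read_bytes()) if HASH_FILE.exists() else {}
    statuses = {}
    for path in doc_dir.rglob("*.md"):  # the real indexer checks every configured extension
        digest = md5_of(path)
        key = str(path)
        if key not in old_hashes:
            statuses[key] = "new"
        elif old_hashes[key] != digest:
            statuses[key] = "modified"
        else:
            statuses[key] = "unchanged"
        old_hashes[key] = digest
    HASH_FILE.parent.mkdir(parents=True, exist_ok=True)
    HASH_FILE.write_bytes(pickle.dumps(old_hashes))
    return statuses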
Query Phase (via WebSocket)
- User asks a question via the web UI
- Query is converted to a vector embedding
- FAISS finds the 5 most semantically similar chunks (excluding deleted)
- Chunks are sent as context to Claude API
- Response streams back in real-time via WebSocket (see the client sketch below)
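To make the flow concrete, here is a minimal Python chat client (a sketch using the third-party websockets package; it assumes the default local URL and the /ws/chat message types documented in the API section below):

import asyncio
import json
import websockets  # pip install websockets

async def ask(question: str) -> None:
    # Default local backend; adjust the URL if you changed host or port
    async with websockets.connect("ws://localhost:8000/ws/chat") as ws:
        await ws.send(json.dumps({"query": question}))
        async for raw in ws:
            msg = json.loads(raw)
            if msg["type"] == "sources":
                print("Sources:", [s["file"] for s in msg["data"]])
            elif msg["type"] == "content":
                print(msg["data"], end="", flush=True)  # streamed answer text
            elif msg["type"] == "done":
                print()
                break

asyncio.run(ask("How does authentication work?"))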
Smart Updates
- File hashes are stored in file_hashes.pkl
- MD5 hash comparison detects file changes
- Only new and modified files are re-indexed
- Unchanged files skip processing entirely
- Deleted chunks are soft-deleted (marked, not removed)
- Python 3.10+ (3.11 or 3.12 recommended)
- Node.js 18+
- Anthropic API key (create one in the Anthropic Console)
- Windows ARM64 users: Use WSL2 for best compatibility (see below)
If you're on Windows ARM64 (Snapdragon/Copilot+ PC), we strongly recommend using WSL2:
# Install WSL2 (PowerShell as Administrator)
wsl --install
# After restart, open Ubuntu and navigate to project
cd /mnt/c/Users/YourUsername/path/to/doc-chat/backend
# Follow the backend setup instructions below

Why WSL2? Python data science packages (numpy, faiss-cpu, sentence-transformers) have better pre-built wheel support on Linux ARM64 than Windows ARM64. This avoids compilation errors and missing compiler issues.
# Clone the repository
git clone https://github.com/yourusername/doc-chat.git
cd doc-chat/backend
# Create virtual environment
python3 -m venv venv
# Activate (Linux/Mac/WSL2)
source venv/bin/activate
# Or activate (Windows CMD - not recommended for ARM64)
venv\Scripts\activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

requirements.txt:
fastapi==0.104.1
uvicorn[standard]==0.24.0
websockets==12.0
anthropic==0.7.8
python-dotenv==1.0.0
tiktoken>=0.7.0
langchain==0.1.0
langchain-community==0.0.10
faiss-cpu==1.13.2
sentence-transformers==2.2.2
numpy>=1.25.0,<2.0

cd ../frontend
# Install dependencies
npm install
# Start development server
npm run dev

cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
python app.py

The API server will start on http://localhost:8000
cd frontend
npm run dev

The web UI will be available at http://localhost:5173
On first launch, you'll see the indexing screen. Enter the path to your project documentation:
/path/to/your/project/docs
Or use relative paths:
../my-project
./docs
The indexer will process:
- Markdown files (.md)
- Text files (.txt)
- Code files (.py, .js, .ts, .tsx, .cs)
- JSON files (.json)
Example queries to try:
- "How does authentication work?"
- "Explain the payment processing flow"
- "What API endpoints are available for user management?"
- "Show me examples of error handling"
- "What's the database schema for orders?"
Create a .env file in the backend/ directory:
# Required
ANTHROPIC_API_KEY=sk-ant-...
# Optional - Vector Database
FAISS_PERSIST_DIR=./data/faiss_db # Directory to persist FAISS index
# Optional - Text Chunking
CHUNK_SIZE=1000 # Size of text chunks for indexing
CHUNK_OVERLAP=200 # Overlap between consecutive chunks
# Optional - Embedding Model
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2 # Sentence transformer model name
# Optional - File Types
INDEX_FILE_TYPES=.md,.txt,.py,.cs,.js,.ts,.tsx,.json,.yaml,.yml # Comma-separated file extensions to index
# Optional - AI Response Length
MAX_TOKENS=16384 # Maximum tokens in AI responses (default: 16384, max: 200000)
# Optional - Embedding Performance (see Performance Optimization section)
EMBEDDING_BATCH_SIZE=64 # Batch size for GPU encoding
EMBEDDING_CPU_BATCH_SIZE=32 # Batch size for CPU encoding
EMBEDDING_MAX_WORKERS=4 # Number of CPU workers for multiprocessing
FILE_IO_WORKERS=8 # Workers for parallel file reading
MIN_CHUNKS_FOR_MULTIPROCESS=999999 # Chunk threshold to enable CPU multiprocessing (default: disabled)

You can customize which file types to index by setting the INDEX_FILE_TYPES environment variable in your .env file:
# Add more file extensions (comma-separated)
INDEX_FILE_TYPES=.md,.txt,.py,.cs,.js,.ts,.tsx,.json,.yaml,.yml,.java,.cpp,.go

Alternatively, you can edit backend/indexer.py directly:
def _should_index_file(self, filepath: Path) -> bool:
    """Check if file should be indexed"""
    file_types = os.getenv("INDEX_FILE_TYPES", ".md,.txt,.py,.cs,.js,.ts,.tsx,.json,.yaml,.yml")
    extensions = {ext.strip() for ext in file_types.split(',')}
    return filepath.suffix.lower() in extensions

Smaller chunks = more precise but less context. Larger chunks = more context but less precise.
You can adjust chunk size and overlap via environment variables in your .env file:
# Smaller chunks for more precise results
CHUNK_SIZE=500
CHUNK_OVERLAP=100
# Larger chunks for more context
CHUNK_SIZE=2000
CHUNK_OVERLAP=400

Alternatively, you can edit backend/indexer.py directly:
# In indexer.py
self.text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=int(os.getenv("CHUNK_SIZE", 1000)),
    chunk_overlap=int(os.getenv("CHUNK_OVERLAP", 200)),
    separators=["\n\n", "\n", ". ", " ", ""]
)

If AI responses are getting cut off, you can increase the maximum token limit via the MAX_TOKENS environment variable in your .env file:
# Default (approximately 12,000-16,000 words)
MAX_TOKENS=16384
# For longer responses (approximately 24,000-32,000 words)
MAX_TOKENS=32768
# Maximum supported by Claude Sonnet 4
MAX_TOKENS=200000

The Claude Sonnet 4 model supports up to 200,000 output tokens, so you can set this as high as needed. Each token is roughly 3/4 of a word on average.
Note: Higher values may increase API costs and response times, but ensure complete responses for complex questions.
This project uses Tailwind CSS 4 with the Vite plugin. No tailwind.config.js is needed for basic usage.
src/index.css:
@import "tailwindcss";vite.config.ts:
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'  // or '@vitejs/plugin-react-swc', depending on your setup
import tailwindcss from '@tailwindcss/vite'

export default defineConfig({
  plugins: [react(), tailwindcss()],
  // ...
})

For custom configuration (optional), create tailwind.config.ts:
import type { Config } from 'tailwindcss'
export default {
content: ['./index.html', './src/**/*.{js,ts,jsx,tsx}'],
theme: {
extend: {
colors: {
primary: '#3b82f6',
},
},
},
} satisfies Config

Index documents from a directory (legacy endpoint).
Note: For long-running indexing operations, use the WebSocket endpoint /ws/index instead. This endpoint may time out on cloud platforms with connection time limits (e.g., Azure's 3.5-minute limit).
Request:
{
"directory": "/path/to/docs"
}

Response:
{
"success": true,
"stats": {
"files": 42,
"chunks": 387,
"new": 10,
"modified": 5,
"unchanged": 27,
"deleted": 2
}
}

GET /api/stats

Get indexing statistics.
Response:
{
"total_chunks": 387,
"dimension": 384
}

GET /api/indexed-files

Get detailed information about all indexed files.
Response:
{
"total_files": 42,
"files": [
{
"file_path": "/path/to/docs/auth.md",
"file_name": "auth.md",
"extension": ".md",
"chunk_count": 12,
"hash": "5d41402abc4b2a76b9719d911017c592"
},
{
"file_path": "/path/to/docs/api.md",
"file_name": "api.md",
"extension": ".md",
"chunk_count": 8,
"hash": "7d793037a0760186574b0282f2f435e7"
}
]
}

Use this endpoint to:
- Debug indexing issues (verify expected files are indexed)
- View which files are in the vector database
- Check chunk counts per file
- Verify file hashes for change detection
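For example, a quick check from a script (a sketch using the requests library; it assumes the backend is running at the default http://localhost:8000):

import requests

BASE = "http://localhost:8000"  # default local backend

# Overall index statistics
stats = requests.get(f"{BASE}/api/stats", timeout=10).json()
print(f"{stats['total_chunks']} chunks, dimension {stats['dimension']}")

# Per-file details, useful for verifying what actually got indexed
info = requests.get(f"{BASE}/api/indexed-files", timeout=10).json()
for f in info["files"]:
    print(f"{f['file_name']}: {f['chunk_count']} chunks ({f['hash'][:8]}...)")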
WebSocket /ws/index

Real-time document indexing with progress updates. Recommended for production use to avoid timeout issues on cloud platforms.
Send:
{
"directory": "/path/to/docs"
}

Receive (multiple progress messages):
Scan start:
{
"type": "scan_start",
"data": {
"directory": "/path/to/docs"
}
}

File processing:
{
"type": "file_processing",
"data": {
"file": "auth.md",
"status": "new"
}
}

File processed:
{
"type": "file_processed",
"data": {
"file": "auth.md",
"chunks": 12
}
}

Embedding generation:
{
"type": "embedding_start",
"data": {
"total_chunks": 387,
"device": "cuda:0",
"batch_size": 64
}
}

Embedding progress (batched):
{
"type": "embedding_progress",
"data": {
"processed": 128,
"total": 387,
"percent": 33
}
}

Final statistics:
{
"type": "stats",
"data": {
"files": 42,
"chunks": 387,
"new": 10,
"modified": 5,
"unchanged": 27,
"deleted": 2
}
}

Completion:
{
"type": "done",
"data": {}
}

Error (non-fatal):
{
"type": "error",
"data": {
"message": "Failed to process file.txt"
}
}

Fatal error:
{
"type": "fatal_error",
"data": {
"message": "Invalid directory path"
}
}

Other message types: file_skipped, file_deleted, embedding_progress, embedding_info, embedding_complete, saving, save_complete
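A minimal indexing client might look like this (a sketch using the websockets package; it assumes the default local URL and handles only a few of the message types above):

import asyncio
import json
import websockets  # pip install websockets

async def index_docs(directory: str) -> None:
    async with websockets.connect("ws://localhost:8000/ws/index") as ws:
        await ws.send(json.dumps({"directory": directory}))
        async for raw in ws:
            msg = json.loads(raw)
            if msg["type"] == "embedding_progress":
                d = msg["data"]
                print(f"Embedding {d['processed']}/{d['total']} ({d['percent']}%)")
            elif msg["type"] == "stats":
                print("Stats:", msg["data"])
            elif msg["type"] in ("done", "fatal_error"):
                print(msg)
                break

asyncio.run(index_docs("/path/to/docs"))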
WebSocket /ws/chat

Real-time chat with streaming responses.
Send:
{
"query": "How does authentication work?"
}

Receive (multiple messages):
Sources message:
{
"type": "sources",
"data": [
{
"file": "auth.md",
"path": "/docs/auth.md",
"chunk": 0
}
]
}

Content chunks (streamed):
{
"type": "content",
"data": "Authentication in this system uses..."
}

Completion signal:
{
"type": "done"
}

doc-chat/
├── backend/
│ ├── app.py # FastAPI application & WebSocket
│ ├── indexer.py # Document ingestion & chunking
│ ├── retriever.py # Vector search & Claude integration
│ ├── requirements.txt
│ ├── .env.example
│ └── data/
│ └── faiss_db/ # Vector database (auto-created)
├── frontend/
│ ├── src/
│ │ ├── App.tsx # Main application component
│ │ ├── components/
│ │ │ ├── Chat.tsx # Chat interface
│ │ │ ├── MessageList.tsx
│ │ │ ├── SourcePanel.tsx
│ │ │ └── IndexStatus.tsx
│ │ ├── hooks/
│ │ │ └── useWebSocket.ts # WebSocket state management
│ │ └── types/
│ │ └── index.ts # TypeScript type definitions
│ ├── package.json
│ ├── tailwind.config.js
│ └── vite.config.ts
├── docs/ # Your documentation (example)
└── README.md
- FastAPI - Modern Python web framework
- FAISS - Vector database for semantic search
- LangChain - Text splitting and document processing
- Anthropic SDK - Claude API integration
- Sentence Transformers - Embedding generation
- WebSockets - Real-time streaming communication
- React 19 - UI framework
- TypeScript - Type-safe JavaScript
- Vite - Fast build tool and dev server
- Tailwind CSS 4 - Modern utility-first styling with Vite plugin
- Native WebSocket API - Real-time updates
# Backend tests
cd backend
pytest
# Frontend tests
cd frontend
npm test

# Frontend production build
cd frontend
npm run build
# Serve with backend
cd ../backend
uvicorn app:app --host 0.0.0.0 --port 8000

# Python linting
cd backend
flake8 .
black .
# TypeScript linting
cd frontend
npm run lint
npm run format

# Make sure you're in the virtual environment
source venv/bin/activate # Linux/Mac/WSL2
venv\Scripts\activate # Windows
# Reinstall dependencies
pip install -r requirements.txt

- Ensure backend is running on port 8000
- Check CORS settings in app.py
- Verify frontend is using the correct WebSocket URL (ws://localhost:8000/ws/chat)
- Check that documents were indexed: GET /api/stats
- View which files are indexed: GET /api/indexed-files
- Verify file extensions are supported (check INDEX_FILE_TYPES in .env)
- Try re-indexing: delete data/faiss_db/ and re-index
- Increase MAX_TOKENS in your .env file (default: 16384, max: 200000)
- See the "Adjusting AI Response Length" section for details
- Restart your backend server after changing .env
- Reduce embedding batch size: EMBEDDING_BATCH_SIZE=16 (for GPU) or EMBEDDING_CPU_BATCH_SIZE=16 (for CPU)
- Reduce chunk size in indexer.py
- Process directories in smaller batches
- Increase system RAM or use swap space
- Verify CUDA is available: python -c "import torch; print(torch.cuda.is_available())"
- Install PyTorch with CUDA support: pip install torch --index-url https://download.pytorch.org/whl/cu118
- Check GPU memory: large batch sizes may exceed VRAM; try EMBEDDING_BATCH_SIZE=16
- Verify ANTHROPIC_API_KEY is set correctly in .env
- Check API usage limits in the Anthropic Console
- Ensure you're using a supported model name (claude-sonnet-4-20250514)
If you get errors about missing compilers or "can't find Rust compiler":
- Use WSL2 (strongly recommended - see setup section above)
- Or install Visual Studio Build Tools with C++ support (not recommended for this project)
- The WSL2 approach avoids all compilation issues
If you see "can't find Rust compiler" for tiktoken:
- Update to tiktoken>=0.7.0, which has pre-built wheels
- Or use WSL2, where all packages have proper wheels
The indexer automatically detects and uses the best available hardware:
- GPU (CUDA): If a CUDA-compatible GPU is available, embeddings are generated on the GPU with ~6-10x speedup
- CPU Multiprocessing: On CPU-only systems, embeddings are generated in parallel across multiple cores with ~3-4x speedup
You'll see the device being used in the console output:
Loading embedding model...
Using GPU: NVIDIA GeForce RTX 3080
or
Loading embedding model...
No GPU available, using CPU with multiprocessing
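The device selection boils down to something like the following sketch (an illustration, not the exact indexer.py code):

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU available, using CPU")

# Embeddings are generated on whichever device was selected
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)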
Adjust embedding performance via environment variables:
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_BATCH_SIZE | 64 | Batch size for GPU encoding. Reduce to 16-32 for GPUs with <4GB VRAM |
| EMBEDDING_CPU_BATCH_SIZE | 32 | Batch size for CPU encoding |
| EMBEDDING_MAX_WORKERS | 4 | Number of CPU processes for multiprocessing |
| FILE_IO_WORKERS | 8 | Workers for parallel file reading |
| MIN_CHUNKS_FOR_MULTIPROCESS | 999999 | Chunk threshold to enable CPU multiprocessing. Set to 500-1000 on native Linux for large datasets. Disabled by default due to noisy output on WSL2/Windows |
Performance by dataset size:
| Chunks | GPU Time | CPU Time (4 cores) |
|---|---|---|
| 100 | ~1s | ~3s |
| 1,000 | ~5s | ~20s |
| 10,000 | ~50s | ~200s |
The indexer handles large datasets efficiently with:
- Batched Embedding Generation - Processes chunks in configurable batches to control memory usage (see the sketch after this list)
- Progress Callbacks - Real-time updates during embedding generation via WebSocket
- Smart Change Detection - Only re-indexes new/modified files using MD5 hashes
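The batching idea can be sketched as follows. This is an illustration rather than the project's implementation; the embed_in_batches function and progress_cb parameter are hypothetical:

from sentence_transformers import SentenceTransformer

def embed_in_batches(texts, batch_size=64, progress_cb=None):
    """Encode texts batch by batch so memory stays bounded and progress can be reported."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(model.encode(batch, batch_size=batch_size))
        if progress_cb:
            done = min(start + batch_size, len(texts))
            progress_cb(done, len(texts))  # e.g. forwarded to the client over the WebSocket
    return vectors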
To restrict search to specific file types, you can extend the search call (for example, in retriever.py):

# Search only specific file types
results = self.store.search(
    query,
    # Add filtering logic for specific extensions here
)

You can use different sentence transformer models by setting the SENTENCE_TRANSFORMER_MODEL environment variable in your .env file:
# Larger, more accurate model
SENTENCE_TRANSFORMER_MODEL=all-mpnet-base-v2
# Smaller, faster model
SENTENCE_TRANSFORMER_MODEL=paraphrase-MiniLM-L3-v2
# Default model (if not set)
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2

Note: When changing the embedding model, you'll need to re-index your documents as different models produce embeddings of different dimensions and characteristics.
Alternatively, you can edit backend/indexer.py and backend/retriever.py directly:
# In indexer.py and retriever.py
self.model = SentenceTransformer(os.getenv("SENTENCE_TRANSFORMER_MODEL", "all-MiniLM-L6-v2"))

Auto-reindex on file changes:
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class DocChangeHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory:
            return
        # Re-index the changed file (assumes the running indexer exposes such a helper)
        indexer.index_file(event.src_path)

handler = DocChangeHandler()
observer = Observer()
observer.schedule(handler, path='./docs', recursive=True)
observer.start()

- API Keys: Never commit .env files. Use environment variables in production.
- Input Validation: The backend validates file paths to prevent directory traversal.
- Rate Limiting: Consider adding rate limits to API endpoints for production use.
- CORS: Update allow_origins in production to specific domains only (see the sketch below).
- Sandboxing: Consider running indexing in a sandboxed environment for untrusted documents.
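For reference, restricting origins with FastAPI's CORSMiddleware looks like this (a sketch; the allowed origin is a placeholder to replace with your own domain):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://docs.example.com"],  # replace with your frontend's domain
    allow_methods=["*"],
    allow_headers=["*"],
)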
- Multi-project support (switch between indexed projects)
- Export conversation history
- Support for images/diagrams in documentation
- Advanced filtering (date ranges, file types, directories)
- API authentication (JWT tokens)
- Docker deployment configuration
- Slack/Discord bot integration
- VS Code extension
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - see LICENSE file for details
- Anthropic for Claude API
- FAISS for vector similarity search
- LangChain for document processing utilities
- FastAPI for the excellent web framework
- Sentence Transformers for embedding generation
- 📧 Email: hello@iceninemedia.com
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Performance Optimization
- Automatic GPU detection and CUDA acceleration for embedding generation (6-10x speedup)
- CPU multiprocessing for embedding generation on systems without GPU (3-4x speedup)
- Batched embedding with configurable batch sizes and real-time progress updates
- New environment variables for tuning embedding performance (EMBEDDING_BATCH_SIZE, EMBEDDING_MAX_WORKERS, etc.)
- Added directory/file exclusions for indexing (node_modules, bin, obj, etc.)
Configuration & Debugging Improvements
- Added configurable MAX_TOKENS environment variable for AI response length (fixes response cutoff issues)
- Increased default max tokens from 8,192 to 16,384 (can be set up to 200,000)
- Added /api/indexed-files REST endpoint for debugging indexing issues
- Made indexer fully configurable via environment variables:
  - CHUNK_SIZE - Size of text chunks for indexing (default: 1000)
  - CHUNK_OVERLAP - Overlap between chunks (default: 200)
  - SENTENCE_TRANSFORMER_MODEL - Embedding model selection (default: all-MiniLM-L6-v2)
  - INDEX_FILE_TYPES - File extensions to index (default: .md,.txt,.py,.cs,.js,.ts,.tsx,.json,.yaml,.yml)
  - FAISS_PERSIST_DIR - Vector database location (default: ./data/faiss_db)
Code Quality
- Refactored indexer.py to reduce cyclomatic complexity from 28 to under 10
- Added 6 new helper methods with single-responsibility design:
  - _get_file_status() - Determine if a file is new, modified, or unchanged
  - _process_single_file() - Process an individual file and create chunks
  - _process_deleted_files() - Handle deleted file tracking
  - _add_documents_to_index() - Generate embeddings and index documents
  - _scan_and_process_files() - Scan directory for eligible files
  - _finalize_indexing() - Complete indexing and print summary
- Fixed all flake8 errors (E722, E501, C901)
- Added flake8 configuration file with project-specific rules
- Fixed bare except clause in app.py (now catches Exception)
Testing
- Added comprehensive test coverage for refactored helper methods (11 new tests)
- Added tests for the /api/indexed-files endpoint
- Removed coverage files from git tracking (.coverage, htmlcov/, coverage.xml)
- Updated CI/CD pipeline to trigger on feature branches
- Achieved 97% test coverage on indexer.py
Documentation
- Enhanced README with:
  - Configuration examples for all environment variables
  - Troubleshooting section for AI response cutoffs
  - Documentation of the new /api/indexed-files endpoint
  - "Adjusting AI Response Length" section with usage examples
- Updated .env.example with all configurable options and helpful comments
- Initial release
- Basic indexing and chat functionality
- WebSocket streaming support
- Source citation panel
- FAISS vector database integration
- React 19 + TypeScript frontend
- Tailwind CSS 4 styling
Made with ❤️ by John McKillip | Ice Nine Media