feat: add remote LLM backend with averaged embedding optimization#325

Open
olyashok wants to merge 2 commits into tobi:main from cellect-ai:feat/remote-llm-backend

Conversation

olyashok commented Mar 8, 2026

Summary

Adds an optional HTTP-based remote LLM backend (targeting llama.cpp servers) as an alternative to local node-llama-cpp, plus an embedding optimization that reduces query time significantly on large indexes.

New features:

  • src/llm-remote.ts: RemoteLLM class — embed/rerank/generate via HTTP (llama.cpp compatible)
  • hybridQuery, vectorSearchQuery, structuredSearch accept an optional llm override so callers can use a remote backend without changing global config
  • Average all expanded query embeddings into a single vector → one sqlite-vec scan instead of N, cutting query time from ~47s to ~12s on a 25 GB index
  • New CLI commands for remote mode; --local flag to force local node-llama-cpp
  • .qmd directory resolution for DB and collection config paths
  • server/docker-compose.yml for spinning up llama.cpp embed/rerank/generate services
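The averaged-embedding optimization in the list above can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the PR's actual store.ts code: rather than running one sqlite-vec scan per expanded query, all expansion embeddings are averaged into a single vector and scanned once.

```typescript
// Average N query-expansion embeddings into a single vector so the vector
// store is scanned once instead of N times. Hypothetical helpers; the PR's
// real implementation lives in store.ts.
function averageEmbeddings(embeddings: number[][]): number[] {
  if (embeddings.length === 0) throw new Error("no embeddings to average");
  const dim = embeddings[0].length;
  const avg = new Array<number>(dim).fill(0);
  for (const vec of embeddings) {
    for (let i = 0; i < dim; i++) avg[i] += vec[i];
  }
  for (let i = 0; i < dim; i++) avg[i] /= embeddings.length;
  return avg;
}

// Cosine-similarity search generally expects unit-length vectors, so
// re-normalizing after averaging keeps distances comparable.
function normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}
```

The trade-off, reflected in the test plan below, is that a single averaged scan must retrieve results of comparable quality to N separate scans merged together.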

Why: On large corpora (200k+ files) local embedding becomes a bottleneck. This lets QMD scale to server-side models while keeping local mode fully intact.

Test plan

  • Local mode (--local) still works unchanged
  • Remote mode connects to llama.cpp server and returns correct results
  • Averaged embeddings produce equivalent quality results to N-scan approach

🤖 AI-assisted (Claude) | Tested on local instance with ~25 GB index

Claude and others added 2 commits March 11, 2026 12:36
Wire remote HTTP-based LLM (embed/rerank/generate via llama.cpp servers)
as an alternative to local node-llama-cpp. Add `llm?: LLM` option to
hybridQuery, vectorSearchQuery, and structuredSearch so callers can
override the LLM backend. Average all expanded query embeddings into a
single vector for one sqlite-vec scan instead of N separate scans,
reducing query time from ~47s to ~12s on a 25 GB index.

Key changes:
- llm-remote.ts: RemoteLLM class with HTTP embed/rerank/generate
- store.ts: LLM passthrough for expandQuery, embedBatch, rerank;
  averaged embedding scan in hybridQuery and vectorSearchQuery;
  .qmd directory resolution for DB path
- qmd.ts: withLLMSessionAuto, remote CLI commands, --local flag
- collections.ts: config path resolution via .qmd directory
- llm.ts: embedBatch added to LLM interface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated the embed method in RemoteLLM to retry transient errors from the remote server, allowing up to three attempts per request. Improved error handling and logging for both individual and batch embedding, and clarified comments on the batch-embedding fallback behavior.

Key changes:
- Added retry logic in embed method for transient errors
- Enhanced error handling and logging
- Updated comments for clarity on batch embedding behavior
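The retry behavior described above can be sketched as a small generic helper. This is an assumption-laden illustration, not the PR's code (the PR implements retries inside RemoteLLM.embed): transient failures are retried up to three attempts before the last error is rethrown.

```typescript
// Retry an async operation up to maxAttempts times, rethrowing the final
// error if all attempts fail. Hypothetical helper; the PR inlines this
// logic in RemoteLLM.embed.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // A fuller implementation would classify the error (timeout, 5xx)
      // as transient vs. permanent and back off before the next attempt.
    }
  }
  throw lastError;
}
```

A production version would typically add exponential backoff and only retry errors known to be transient, so that 4xx responses fail fast.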
olyashok force-pushed the feat/remote-llm-backend branch from 7f4e1e4 to 575b9ea on March 11, 2026 12:37
