Summary
qmd query crashes during reranking when the combined input (query + document chunk + Qwen3 template overhead) exceeds RERANK_CONTEXT_SIZE = 2048. The error is deterministic and reproducible.
Environment
- QMD version: 1.1.0 (also reproduced on 1.0.7)
- OS: Rocky Linux 9 (x86_64)
- Node.js: v22.22.0
- GPU: NVIDIA RTX 3090 (24GB VRAM)
- Content: ~345 markdown files, primarily CJK (Chinese) text
- Index: 1386 chunks from 338 documents
Error
```
$ qmd query "test" --json
Searching 6 queries...
Reranking 40 chunks...
Error: The input lengths of some of the given documents exceed the context size.
Try to increase the context size to at least 2099
    at LlamaRankingContext.rankAll (...LlamaRankingContext.js:50:19)
    at LlamaCpp.rerank (...llm.js:751:82)
```
Root Cause
In src/llm.ts:

```ts
static RERANK_CONTEXT_SIZE = 2048;
```

The reranker input is query tokens + chunk tokens + Qwen3 template overhead (~200 tokens). The comment says chunks are capped at ~800 tokens, so ~1100 total should fit, but:
- CJK text tokenizes differently across models: a chunk measured at ~900 tokens by the embedding tokenizer may be longer under the Qwen3 reranker tokenizer.
- Query expansion generates HyDE documents of 100+ tokens, pushing the total past 2048.
- The error requests a context of at least 2099 tokens, only 51 over the limit.
Workaround
Changing RERANK_CONTEXT_SIZE to 4096 in dist/llm.js resolves the issue.
Suggested Fix
- Increase default to 4096 (safest, modest VRAM cost)
- Dynamic sizing: compute required context from actual longest (query + chunk) pair
- Graceful fallback: skip oversized chunks during reranking instead of crashing (use retrieval score)
Option 3 is most robust.
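A minimal sketch of option 3, assuming hypothetical helpers (countTokens for the reranker tokenizer, rerankBatch for the model call) since qmd's actual internals are not shown here:

```typescript
interface Chunk { text: string; retrievalScore: number }

// Sketch of option 3: chunks that would overflow the reranker context keep
// their retrieval score instead of crashing the whole batch.
// countTokens and rerankBatch are hypothetical stand-ins, not qmd APIs.
function safeRerank(
  query: string,
  chunks: Chunk[],
  countTokens: (s: string) => number,
  rerankBatch: (query: string, texts: string[]) => number[],
  contextSize = 2048,
  templateOverhead = 200,
): number[] {
  const budget = contextSize - countTokens(query) - templateOverhead;
  const scores = chunks.map((c) => c.retrievalScore); // fallback for all chunks
  const fitIdx = chunks
    .map((_, i) => i)
    .filter((i) => countTokens(chunks[i].text) <= budget);
  if (fitIdx.length > 0) {
    const reranked = rerankBatch(query, fitIdx.map((i) => chunks[i].text));
    fitIdx.forEach((i, k) => { scores[i] = reranked[k]; });
  }
  return scores;
}
```

With a toy tokenizer (one character per token), an oversized chunk keeps its retrieval score while the remaining chunks are still reranked, so a single long CJK chunk can no longer abort the query.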
Related
- Changelog v1.0.0: right-sized reranker context (40960 to 2048, 17x less memory)
- The reduction was too aggressive for CJK content with long query expansions
Thank you for building QMD!