Bug Report: Embedding session timeout not configurable (defaults to 10min despite 30min comment)
Description
The qmd embed command has a hardcoded 10-minute session timeout that causes large embedding jobs to fail with SessionReleasedError: LLM session has been released or aborted, even though the code comments claim it uses a 30-minute timeout.
Environment
- QMD version: @tobilu/qmd (installed via npm/bun global)
- Platform: Raspberry Pi 5 (ARM64, 4GB RAM)
- CPU mode: QMD_FORCE_CPU=1 (no GPU)
- Embedding model: embeddinggemma-300M-Q8_0
- Collection size: 77 documents, 269 chunks, ~576KB
Steps to Reproduce
- Index a workspace with 77+ markdown files:
  qmd collection add /path/to/workspace
  qmd update
- Run embedding on CPU (which takes >10 minutes):
  QMD_FORCE_CPU=1 qmd embed
- Observe that after ~10 minutes, chunks start failing with:
  ⚠ Error embedding "file.md" chunk 0: SessionReleasedError: LLM session has been released or aborted
Expected Behavior
The embedding process should complete all chunks without timing out, or at minimum respect the 30-minute timeout mentioned in the code comments.
Actual Behavior
The session times out after exactly 10 minutes, causing ~28% of chunks to fail (77 out of 269 in our case). The job ran for 31 minutes 36 seconds total but only successfully embedded 192/269 chunks.
Root Cause
File: src/qmd.ts in the vectorIndex() function
The bug: The code has a comment claiming to use a 30-minute timeout:
// Wrap all LLM embedding operations in a session for lifecycle management
// Use 30 minute timeout for large collections ← THIS COMMENT
await withLLMSession(async (session) => {
  // ...embedding work happens here
});
But the withLLMSession call never passes the maxDuration option, so it falls back to the default 10-minute timeout defined in src/llm.ts:
// In LLMSession constructor:
const maxDuration = options.maxDuration ?? 10 * 60 * 1000; // Default 10 minutes
Proposed Fix
Add the missing maxDuration option to match the comment:
await withLLMSession(async (session) => {
  // ...embedding work
}, { maxDuration: 30 * 60 * 1000, name: 'embed' });
Or better: make it configurable, for example via an environment variable (a CLI flag or config option would work the same way):
const embedTimeout = process.env.QMD_EMBED_TIMEOUT_MS
  ? parseInt(process.env.QMD_EMBED_TIMEOUT_MS, 10) // explicit radix avoids surprises
  : 30 * 60 * 1000;
await withLLMSession(async (session) => {
  // ...
}, { maxDuration: embedTimeout, name: 'embed' });
Workaround
Currently the only workarounds are to:
- Run qmd embed multiple times until all chunks succeed
- Rely on weekly health checks to catch stragglers
- Use a machine with GPU acceleration to complete within 10 minutes
Additional Context
This particularly affects CPU-only embedding, which is significantly slower than GPU embedding. On our Pi 5 system, embedding 269 chunks took 31+ minutes, but only the first 192 chunks (first ~10 minutes) succeeded.
The error appears exactly at the 10-minute mark and affects all subsequent chunks, confirming it's a timeout issue rather than a model/memory problem.
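To illustrate the failure mode and the proposed fix together, here is a minimal, self-contained sketch. It is not the QMD implementation: effectiveMaxDuration, chunkSucceeds and resolveEmbedTimeout are hypothetical names, and QMD_EMBED_TIMEOUT_MS is only the variable proposed in this issue. The sketch assumes the session simply rejects any work once its deadline has passed, which matches the behavior observed above.

```typescript
// Hypothetical sketch: none of these names are taken from the QMD source.

const DEFAULT_MAX_DURATION_MS = 10 * 60 * 1000; // mirrors the default quoted from src/llm.ts
const EMBED_MAX_DURATION_MS = 30 * 60 * 1000;   // the 30 minutes the code comment promises

interface SessionOptions {
  maxDuration?: number; // milliseconds
}

// A `??` default means an omitted option silently becomes the default value.
function effectiveMaxDuration(options: SessionOptions = {}): number {
  return options.maxDuration ?? DEFAULT_MAX_DURATION_MS;
}

// Assumption: a chunk can only be embedded while the session deadline has not passed.
function chunkSucceeds(elapsedMs: number, options: SessionOptions = {}): boolean {
  return elapsedMs < effectiveMaxDuration(options);
}

// Parse a timeout from an environment-style string. Falls back when the value
// is missing, not a number, not an integer, or not positive.
function resolveEmbedTimeout(
  raw: string | undefined,
  fallbackMs: number = EMBED_MAX_DURATION_MS,
): number {
  if (raw === undefined) return fallbackMs;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallbackMs;
}

const elevenMinutes = 11 * 60 * 1000;

// Without maxDuration, work at minute 11 is past the 10-minute default.
const failsByDefault = !chunkSucceeds(elevenMinutes);

// With the proposed option, the same work is still inside the deadline.
const succeedsWithFix = chunkSucceeds(elevenMinutes, {
  maxDuration: resolveEmbedTimeout(undefined),
});
```

In a real patch the raw value would come from process.env.QMD_EMBED_TIMEOUT_MS. Validating the parsed number matters because a malformed value such as "abc" would otherwise produce NaN, and what a NaN maxDuration does depends on how the session implements its timer.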