Embedding session timeout not configurable (defaults to 10min despite 30min comment) #378

@rook-assistant

Bug Report: Embedding session timeout not configurable (defaults to 10min despite 30min comment)

Description

The qmd embed command always runs with a 10-minute session timeout, which causes large embedding jobs to fail with SessionReleasedError: LLM session has been released or aborted. The code comments state that a 30-minute timeout is used, but that value is never passed to the session.

Environment

  • QMD version: @tobilu/qmd (installed via npm/bun global)
  • Platform: Raspberry Pi 5 (ARM64, 4GB RAM)
  • CPU mode: QMD_FORCE_CPU=1 (no GPU)
  • Embedding model: embeddinggemma-300M-Q8_0
  • Collection size: 77 documents, 269 chunks, ~576KB

Steps to Reproduce

  1. Index a workspace with 77+ markdown files:

    qmd collection add /path/to/workspace
    qmd update
  2. Run embedding on CPU (which takes >10 minutes):

    QMD_FORCE_CPU=1 qmd embed
  3. Observe that after ~10 minutes, chunks start failing with:

    ⚠ Error embedding "file.md" chunk 0: SessionReleasedError: LLM session has been released or aborted
    

Expected Behavior

The embedding process should complete all chunks without timing out, or at minimum respect the 30-minute timeout mentioned in the code comments.

Actual Behavior

The session times out after exactly 10 minutes, causing about 29% of chunks to fail (77 out of 269 in our case). The job ran for 31 minutes 36 seconds in total but successfully embedded only 192 of 269 chunks.

Root Cause

File: src/qmd.ts in the vectorIndex() function

The bug: The code has a comment claiming to use a 30-minute timeout:

// Wrap all LLM embedding operations in a session for lifecycle management
// Use 30 minute timeout for large collections    ← THIS COMMENT
await withLLMSession(async (session) => {
  // ...embedding work happens here
});

But the withLLMSession call never passes the maxDuration option, so it falls back to the default 10-minute timeout defined in src/llm.ts:

// In LLMSession constructor:
const maxDuration = options.maxDuration ?? 10 * 60 * 1000; // Default 10 minutes
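
To make the fallback concrete, here is a minimal sketch of how the quoted `??` expression behaves when the caller omits maxDuration. The SessionOptions interface and the resolveMaxDuration helper are hypothetical stand-ins written for illustration; only the maxDuration and name options and the 10-minute default come from the snippets above.

```typescript
// Hypothetical stand-in for the session options type. Only maxDuration and
// name appear in this issue; the real type in src/llm.ts may differ.
interface SessionOptions {
  maxDuration?: number; // milliseconds
  name?: string;
}

// 10 minutes, as quoted from the LLMSession constructor.
const DEFAULT_MAX_DURATION_MS = 10 * 60 * 1000;

// Mirrors the `options.maxDuration ?? default` expression quoted above.
function resolveMaxDuration(options: SessionOptions = {}): number {
  return options.maxDuration ?? DEFAULT_MAX_DURATION_MS;
}

// No options passed, as in vectorIndex(): the default wins.
console.log(resolveMaxDuration()); // 600000

// Explicit 30 minutes, as in the proposed fix.
console.log(resolveMaxDuration({ maxDuration: 30 * 60 * 1000 })); // 1800000
```

Because the call in vectorIndex() passes no options at all, the comment about 30 minutes has no effect on the value that is actually used.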

Proposed Fix

Add the missing maxDuration option to match the comment:

await withLLMSession(async (session) => {
  // ...embedding work
}, { maxDuration: 30 * 60 * 1000, name: 'embed' });

Or better: make it configurable, for example via an environment variable (a CLI flag or config option would work as well):

const embedTimeout = process.env.QMD_EMBED_TIMEOUT_MS
  ? parseInt(process.env.QMD_EMBED_TIMEOUT_MS, 10)
  : 30 * 60 * 1000;

await withLLMSession(async (session) => {
  // ...
}, { maxDuration: embedTimeout, name: 'embed' });
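
One caveat with the environment-variable variant: parseInt returns NaN for a non-numeric value, and NaN is not caught by `??`, so a typo in the variable would reach the session as-is. A possible hardening is sketched below. resolveEmbedTimeoutMs is a hypothetical helper name, and QMD_EMBED_TIMEOUT_MS is the variable proposed above, not an existing qmd option.

```typescript
// 30 minutes, matching the comment in vectorIndex().
const DEFAULT_EMBED_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical helper: turn a raw environment value into a usable timeout,
// falling back to the default when the value is missing or invalid.
function resolveEmbedTimeoutMs(
  raw: string | undefined,
  fallbackMs: number = DEFAULT_EMBED_TIMEOUT_MS,
): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const parsed = Number(raw);
  // Reject NaN, Infinity, fractions, zero, and negative values.
  if (!Number.isInteger(parsed) || parsed <= 0) return fallbackMs;
  return parsed;
}

// Intended call site (assumption, not existing code):
//   const embedTimeout = resolveEmbedTimeoutMs(process.env.QMD_EMBED_TIMEOUT_MS);
console.log(resolveEmbedTimeoutMs("3600000")); // 3600000
console.log(resolveEmbedTimeoutMs("abc"));     // 1800000
console.log(resolveEmbedTimeoutMs(undefined)); // 1800000
```

Falling back silently is only one choice; logging a warning or exiting with an error on an invalid value may be preferable for a CLI.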

Workaround

Until this is fixed, the available workarounds are:

  1. Run qmd embed multiple times until all chunks succeed
  2. Rely on weekly health checks to catch stragglers
  3. Use a machine with GPU acceleration to complete within 10 minutes
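
The first workaround can be automated with a small retry loop. The sketch below is only an illustration: embedUntilClean is a hypothetical helper, and it assumes that each run's output contains the SessionReleasedError text whenever chunks failed and that a rerun picks up the chunks that are still missing. Wiring runOnce to the actual CLI is left out because it is not known here which output stream carries the warning.

```typescript
// Marker taken from the warning shown in the steps to reproduce.
const SESSION_ERROR = "SessionReleasedError";

// Hypothetical helper: call runOnce (one full embed run returning its output)
// until the output no longer mentions the session error.
// Returns the number of attempts that were needed.
function embedUntilClean(runOnce: () => string, maxAttempts: number = 5): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = runOnce();
    if (!output.includes(SESSION_ERROR)) return attempt;
  }
  throw new Error(
    `embedding still reported ${SESSION_ERROR} after ${maxAttempts} attempts`,
  );
}

// Demo with a fake runner: the first run times out, the second is clean.
let calls = 0;
const fakeRun = (): string =>
  calls++ === 0
    ? '⚠ Error embedding "file.md" chunk 0: SessionReleasedError: LLM session has been released or aborted'
    : "done";

console.log(embedUntilClean(fakeRun)); // 2
```

The attempt cap keeps the loop from running forever if the failures turn out to have a different cause than the timeout.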

Additional Context

This particularly affects CPU-only embedding, which is significantly slower than GPU embedding. On our Pi 5 system, embedding 269 chunks took 31+ minutes, but only the first 192 chunks (roughly the first 10 minutes) succeeded.

The error appears exactly at the 10-minute mark and affects all subsequent chunks, confirming it's a timeout issue rather than a model/memory problem.
