Description
Hi, thank you for maintaining qmd.
A separate issue appears to affect qmd query and qmd vsearch on macOS Apple Silicon. This does not seem to be the same as the oversized-input embedding crash addressed in #393 / #303.
Environment
- qmd: 2.0.1
- install method: npm install -g @tobilu/qmd
- Node.js: v24.13.1
- also reproduced via Bun shim
- hardware: Apple Silicon (Apple M1)
- backend shown by qmd status: metal
Observed behavior
Both commands fail even for very short queries:
qmd query "k"
qmd query "factors for ai adoption"
qmd vsearch "ai adoption"
qmd search "ai adoption" works correctly.
Error
With Node:
Expanding query...
[Error: Failed to accept token in sampler: Unexpected empty grammar stack after accepting piece: 0 (15)]
Node.js v24.13.1
...
ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
With Bun, the same grammar-stack error appears first, followed by a Bun crash after the process aborts.
Why this seems distinct from the embedding overflow issue
The failure reproduces with 1-2 word queries such as:
qmd query "k"
qmd vsearch "ai adoption"
So this does not appear to depend on large document chunks or oversized embedding input.
Also:
- qmd search works
- qmd query fails during Expanding query...
- qmd vsearch also fails with the same grammar-stack error
This suggests the problem is likely in the local LLM/sampler/grammar path used before retrieval, rather than in long-text embedding of indexed documents.
Additional context
qmd status reports:
- Embedding: ggml-org/embeddinggemma-300M-GGUF
- Reranking: ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
- Generation: tobil/qmd-query-expansion-1.7B-gguf
Given the timing (Expanding query...) and the error text (Unexpected empty grammar stack), this may be related to grammar-constrained generation in the expansion step.
Minimal reproduction
npm install -g @tobilu/qmd@2.0.1
qmd status
qmd search "ai adoption" # works
qmd query "k" # fails
qmd vsearch "ai adoption" # fails
If helpful, a full stack trace can be provided.
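For anyone trying to confirm this on their own machine, the commands above could be wrapped in a small script that records each command's output and exit status. This is only a sketch: `run_case` and the `qmd-repro.log` file name are hypothetical names made up for illustration, not part of qmd. It also assumes the failing commands exit with a non-zero status, which the abort suggests but which has not been checked separately.

```shell
#!/bin/sh
# Hypothetical helper, not part of qmd: runs one command, appends its
# combined stdout/stderr to a log file, and prints a one-line summary.
# The log file name is an assumption chosen for this sketch.
LOG="${LOG:-qmd-repro.log}"

run_case() {
  # Record the command line, then the command's full output, in the log.
  printf '$ %s\n' "$*" >> "$LOG"
  "$@" >> "$LOG" 2>&1
  status=$?
  if [ "$status" -eq 0 ]; then
    printf 'PASS  %s\n' "$*"
  else
    printf 'FAIL (exit %s)  %s\n' "$status" "$*"
  fi
  # Always return 0 so the remaining cases still run after a failure.
  return 0
}

run_case qmd status
run_case qmd search "ai adoption"
run_case qmd query "k"
run_case qmd vsearch "ai adoption"
```

The summary lines show at a glance which commands failed, and the log file keeps the complete error text for attaching to a report.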