Project
vgrep
Description
The normalize() function in src/core/embeddings.rs doesn't handle NaN or Infinity values in embeddings. If the embedding model produces NaN (which can happen with numerical instability, certain inputs, or model bugs), these values propagate through the system causing:
- Silent incorrect similarity scores
- Undefined sort ordering (ordered comparisons involving NaN, such as < and >, are always false)
- Results appearing in an arbitrary, unpredictable order
Error Message
No error - NaN values propagate silently.
Debug Logs
System Information
- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+
Screenshots
No response
Steps to Reproduce
- Feed the embedding model text that causes numerical instability
- Store the resulting embedding (containing NaN)
- Run a search query
- Observe that results with NaN embeddings have undefined ordering
Expected Behavior
- Detect NaN/Infinity in embeddings during generation
- Either return an error or replace with zeros
- Log a warning about potential model issues
- Never store or return NaN embeddings
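The expected behavior above could be sketched roughly as follows. This is a minimal illustration, not vgrep's actual code: the name normalize_checked, the NonFiniteEmbedding error type, and the in-place &mut [f32] signature are all assumptions, since the real signature of normalize() is not shown in this report. It takes the "return an error" option; a caller could log a warning and skip the chunk when it sees the error.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not part of vgrep's API.
#[derive(Debug, PartialEq)]
pub struct NonFiniteEmbedding {
    /// Index of the first NaN or infinite component.
    pub index: usize,
}

impl fmt::Display for NonFiniteEmbedding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "embedding has a non-finite value at index {}", self.index)
    }
}

impl std::error::Error for NonFiniteEmbedding {}

/// L2-normalizes `v` in place, rejecting NaN/Infinity instead of
/// letting them propagate into storage and search.
pub fn normalize_checked(v: &mut [f32]) -> Result<(), NonFiniteEmbedding> {
    // `is_finite` is false for NaN, +Infinity, and -Infinity.
    if let Some(index) = v.iter().position(|x| !x.is_finite()) {
        return Err(NonFiniteEmbedding { index });
    }

    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();

    // Skip the division for a zero vector (0/0 would create NaN)
    // or when the squared sum overflowed to infinity.
    if norm > 0.0 && norm.is_finite() {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    Ok(())
}
```

Returning the index of the first bad component keeps the error cheap to produce while still giving a log message something concrete to report.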
Actual Behavior
- NaN values pass through all stages
- Stored in database as binary blob
- Cosine similarity produces NaN
- Sort comparison treats NaN as "equal" to everything, giving random order
- No indication to user that something is wrong
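To illustrate how a single bad component poisons a score, here is a plain cosine similarity with no validation. It is a generic textbook formulation written for this report, not a copy of vgrep's implementation, and the function name is only illustrative.

```rust
/// Unvalidated cosine similarity (illustrative sketch only).
/// Any NaN component in `a` or `b` makes the result NaN, because
/// NaN propagates through both multiplication and addition.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    // A zero-length vector also yields NaN here (0.0 / 0.0).
    dot / (norm_a * norm_b)
}
```

Even a NaN that is multiplied by zero stays NaN, so one corrupted dimension is enough to turn the whole score into NaN without any error being raised.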
Additional Context
NaN in floating-point comparisons is particularly insidious because:
- NaN == NaN is false
- NaN < x is false for all x
- NaN > x is false for all x
- partial_cmp returns None for NaN comparisons
This means sort order becomes undefined when NaN values are present, and results may appear in any order with no indication of the problem.
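One possible defensive measure on the search side is a comparator that defines where NaN goes instead of leaving it to chance. The sketch below is an assumption about how ranking might be done (a mutable slice of f32 scores, highest first); the name rank_descending is hypothetical and does not come from the vgrep codebase.

```rust
use std::cmp::Ordering;

/// Sorts similarity scores from best to worst, always placing NaN last.
/// Hypothetical helper for illustration; not vgrep's actual ranking code.
pub fn rank_descending(scores: &mut [f32]) {
    scores.sort_by(|a, b| match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        // A NaN score sorts after any real score.
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither value is NaN here, so partial_cmp always returns Some;
        // the operands are swapped to get descending order.
        (false, false) => b.partial_cmp(a).unwrap_or(Ordering::Equal),
    });
}
```

Because every pair of values now has a defined ordering, the comparator is a consistent total order and the sort result is deterministic. This only contains the damage at query time; it does not replace rejecting non-finite embeddings before they are stored.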
Models can produce NaN values when:
- Input text causes numerical overflow
- Model weights are corrupted
- Quantization introduces numerical issues
- Context is exhausted (edge cases in llama.cpp)