[BUG] NaN/Infinity in Embeddings Propagates Silently #150

@olddev94

Project

vgrep

Description

The normalize() function in src/core/embeddings.rs doesn't handle NaN or Infinity values in embeddings. If the embedding model produces NaN (which can happen with numerical instability, certain inputs, or model bugs), these values propagate through the system causing:

  1. Silently incorrect similarity scores
  2. Unspecified sort ordering (every ordered comparison involving NaN is false)
  3. Results appearing in arbitrary order
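
To illustrate how a normalization step can let these values through, here is a minimal sketch. `normalize_naive` is a hypothetical stand-in written for this report, not the actual code in src/core/embeddings.rs; it assumes a plain L2 normalization with no input checks.

```rust
/// Hypothetical L2 normalization with no validation.
/// This is an illustrative stand-in, NOT the vgrep implementation.
fn normalize_naive(v: &[f32]) -> Vec<f32> {
    // Sum of squares; a single NaN or Infinity poisons the whole sum.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Division by a NaN, infinite, or zero norm yields NaN or 0.0
    // without raising any error.
    v.iter().map(|x| x / norm).collect()
}
```

Under these assumptions, one NaN component turns every output component into NaN, an infinite component yields NaN in its own slot (inf / inf) and 0.0 elsewhere, and an all-zero vector yields NaN everywhere (0 / 0).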

Error Message

No error - NaN values propagate silently.

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Steps to Reproduce

  1. Feed the embedding model text that causes numerical instability
  2. Store the resulting embedding (containing NaN)
  3. Run a search query
  4. Observe that results with NaN embeddings have undefined ordering

Expected Behavior

  1. Detect NaN/Infinity in embeddings during generation
  2. Either return an error or replace with zeros
  3. Log a warning about potential model issues
  4. Never store or return NaN embeddings
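
One possible shape for this check, as a sketch only. `EmbeddingError`, `validate_embedding`, and `sanitize_embedding` are hypothetical names that do not exist in vgrep today, and the warning goes to stderr here only because the project's logging setup is not known.

```rust
/// Hypothetical error type for rejected embeddings.
#[derive(Debug, PartialEq)]
enum EmbeddingError {
    /// Index of the first NaN or infinite component.
    NonFinite { index: usize },
}

/// Option A: reject the embedding if any component is NaN or +/-Infinity.
fn validate_embedding(v: &[f32]) -> Result<(), EmbeddingError> {
    match v.iter().position(|x| !x.is_finite()) {
        Some(index) => Err(EmbeddingError::NonFinite { index }),
        None => Ok(()),
    }
}

/// Option B: replace non-finite components with 0.0 and warn.
/// Returns how many components were replaced.
fn sanitize_embedding(v: &mut [f32]) -> usize {
    let mut replaced = 0;
    for x in v.iter_mut() {
        if !x.is_finite() {
            *x = 0.0;
            replaced += 1;
        }
    }
    if replaced > 0 {
        // Stand-in for a real logger call.
        eprintln!("warning: replaced {replaced} non-finite embedding value(s) with 0.0");
    }
    replaced
}
```

Returning an error is probably the safer default, since zero-filling hides the model problem and can leave an all-zero vector that is itself unsafe to normalize.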

Actual Behavior

  1. NaN values pass through all stages unchecked
  2. They are stored in the database as a binary blob
  3. Cosine similarity against them produces NaN
  4. The sort comparison treats NaN as "equal" to everything, so the resulting order is arbitrary
  5. The user gets no indication that anything is wrong
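
The chain above can be reproduced in isolation. This sketch assumes a textbook cosine similarity and a comparator that falls back to `Ordering::Equal` when `partial_cmp` returns `None`; both functions are hypothetical and may differ from what vgrep actually does.

```rust
use std::cmp::Ordering;

/// Textbook cosine similarity (assumed, not copied from vgrep).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Descending comparator that treats NaN as "equal" to every score.
/// Hypothetical, but a common pattern for sorting floats.
fn lenient_cmp(a: &f32, b: &f32) -> Ordering {
    b.partial_cmp(a).unwrap_or(Ordering::Equal)
}
```

With this comparator a NaN score is "equal" to both 0.1 and 0.9 even though 0.1 and 0.9 are not equal to each other, so the ordering is not transitive. The standard library documents that `sort_by` leaves the order unspecified, and may panic, when the comparator is not a total order.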

Additional Context

NaN in floating-point comparisons is particularly insidious because:

  • NaN == NaN is false
  • NaN < x is false for all x
  • NaN > x is false for all x
  • partial_cmp returns None for NaN comparisons

This means the sort order is unspecified when NaN values are present: results may appear in any order, with no indication of the problem.
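
The comparison rules listed above, and one way to get a well-defined ranking despite them, can be sketched as follows. Both function names are made up for illustration; `f32::total_cmp` is a standard library method that imposes a total order on all `f32` values.

```rust
use std::cmp::Ordering;

/// Returns (NaN == NaN, NaN < x, NaN > x, NaN.partial_cmp(x)).
fn nan_comparison_facts(x: f32) -> (bool, bool, bool, Option<Ordering>) {
    let nan = f32::NAN;
    (nan == nan, nan < x, nan > x, nan.partial_cmp(&x))
}

/// Drops NaN scores, then sorts the rest in descending order
/// using the total order provided by `f32::total_cmp`.
fn rank_finite_desc(scores: &[f32]) -> Vec<f32> {
    let mut kept: Vec<f32> = scores.iter().copied().filter(|s| !s.is_nan()).collect();
    kept.sort_by(|a, b| b.total_cmp(a));
    kept
}
```

For any `x`, `nan_comparison_facts(x)` yields `(false, false, false, None)`. Filtering before sorting is a design choice: `total_cmp` alone would also give a deterministic order, but where NaN lands then depends on its sign bit, which is easy to get wrong in a ranking.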

Models can produce NaN values when:

  • Input text causes numerical overflow
  • Model weights are corrupted
  • Quantization introduces numerical issues
  • Context is exhausted (edge cases in llama.cpp)

Labels: bug, ide, invalid, vgrep
