[BUG] Indexer Panics if Embedding Count Mismatches Chunk Count #147

@olddev94

Project

vgrep

Description

The Indexer::index_directory() function assumes that embed_batch() returns exactly as many embeddings as there are input chunks, then indexes into the embeddings directly without first checking the length. If the embedding engine returns fewer embeddings than expected (due to internal errors, OOM, or API quirks), the indexer panics with an "index out of bounds" error.

Error Message

thread 'main' panicked at 'index out of bounds: the len is N but the index is N',
src/core/indexer.rs:167:33

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Create a scenario where embed_batch() returns fewer embeddings than inputs:
    • Mock the embedding engine to return partial results
    • Or trigger an OOM condition during batch embedding
    • Or use a model that silently skips some inputs
  2. Run vgrep index
  3. Observe panic
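The first option (mocking partial results) could look roughly like the sketch below. Both `partial_embed_batch` and `index_unchecked` are hypothetical stand-ins written for this report, not vgrep's actual code; they only mirror the access pattern described above.

```rust
/// Hypothetical stand-in for the embedding engine: silently drops the last input.
fn partial_embed_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    texts
        .iter()
        .take(texts.len().saturating_sub(1))
        .map(|t| vec![t.len() as f32])
        .collect()
}

/// Mirrors the unchecked access pattern described in this issue (assumed shape).
fn index_unchecked(chunks: &[&str]) -> Vec<(String, Vec<f32>)> {
    let embeddings = partial_embed_batch(chunks);
    let mut out = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        // Panics on the last chunk: `embeddings` has one fewer element than `chunks`.
        out.push((chunk.to_string(), embeddings[i].clone()));
    }
    out
}
```

Calling `index_unchecked` with any non-empty slice triggers the same kind of "index out of bounds" panic as in the error message above.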

Expected Behavior

The indexer should:

  1. Verify that all_embeddings.len() == total_chunks after embedding
  2. Return a clear error if there's a mismatch
  3. Not panic on partial results
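A minimal sketch of such a check, assuming chunks are plain strings and embeddings are `Vec<f32>`. The error type `EmbeddingCountMismatch` and the helper `pair_chunks_with_embeddings` are hypothetical names, since the project's real error type is not shown here.

```rust
use std::fmt;

/// Hypothetical error type; the project's real error type may differ.
#[derive(Debug, PartialEq)]
pub struct EmbeddingCountMismatch {
    pub expected: usize,
    pub actual: usize,
}

impl fmt::Display for EmbeddingCountMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "embedding count mismatch: expected {} embeddings, got {}",
            self.expected, self.actual
        )
    }
}

impl std::error::Error for EmbeddingCountMismatch {}

/// Pairs each chunk with its embedding, returning an error instead of
/// panicking when the two counts differ.
pub fn pair_chunks_with_embeddings<'a>(
    chunks: &'a [String],
    embeddings: Vec<Vec<f32>>,
) -> Result<Vec<(&'a str, Vec<f32>)>, EmbeddingCountMismatch> {
    if embeddings.len() != chunks.len() {
        return Err(EmbeddingCountMismatch {
            expected: chunks.len(),
            actual: embeddings.len(),
        });
    }
    // Lengths match, so zip pairs every chunk with exactly one embedding.
    Ok(chunks.iter().map(String::as_str).zip(embeddings).collect())
}
```

Using `zip` after the length check also removes the need for manual indexing in the loop that follows.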

Actual Behavior

  1. No verification is performed
  2. Direct indexing causes panic if counts don't match
  3. No indication of which file/chunk caused the issue
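To address the third point, one option is to replace direct indexing with a checked lookup that records where the lookup failed. This is only a sketch: `MissingEmbedding` and `embedding_for` are hypothetical names, and it assumes the indexer knows the file path and chunk index at the point of access.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical error carrying the location of the missing embedding.
#[derive(Debug, PartialEq)]
pub struct MissingEmbedding {
    pub file: PathBuf,
    pub chunk_index: usize,
    pub available: usize,
}

/// Looks up the embedding for one chunk without panicking.
pub fn embedding_for<'a>(
    embeddings: &'a [Vec<f32>],
    file: &Path,
    chunk_index: usize,
) -> Result<&'a [f32], MissingEmbedding> {
    embeddings
        .get(chunk_index) // `get` returns None instead of panicking
        .map(Vec::as_slice)
        .ok_or_else(|| MissingEmbedding {
            file: file.to_path_buf(),
            chunk_index,
            available: embeddings.len(),
        })
}
```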

Additional Context

While embed_batch() in embeddings.rs processes texts one by one and should return matching counts, there are scenarios where this could fail:

  • Partial memory allocation failures
  • Context size limits causing silent drops
  • Future API changes that might batch differently

The defensive check is inexpensive and prevents obscure crashes.

Labels

bug, invalid
