Project
vgrep
Description
The Indexer::index_directory() function assumes that embed_batch() returns exactly the same number of embeddings as input chunks, then accesses embeddings by index without bounds checking. If the embedding engine returns fewer embeddings than expected (due to internal errors, OOM, or API quirks), the indexer will panic with an "index out of bounds" error.
Error Message
thread 'main' panicked at 'index out of bounds: the len is N but the index is N', src/core/indexer.rs:167:33
Debug Logs
System Information
- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+
Screenshots
No response
Steps to Reproduce
- Create a scenario where embed_batch() returns fewer embeddings than inputs:
  - Mock the embedding engine to return partial results
  - Or trigger an OOM condition during batch embedding
  - Or use a model that silently skips some inputs
- Run vgrep index
- Observe the panic
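The first reproduction route (a mocked engine returning partial results) can be sketched in isolation. This is a minimal illustration, not vgrep's code: `partial_embed_batch` and `collect_unchecked` are hypothetical names, and the embedding values are placeholders. It only assumes that the indexer pairs chunk `i` with `embeddings[i]`, as described above.

```rust
/// Hypothetical stand-in for the embedding engine: it silently drops the
/// last input, so it returns one embedding fewer than it was given.
pub fn partial_embed_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    texts
        .iter()
        .take(texts.len().saturating_sub(1))
        .map(|t| vec![t.len() as f32]) // placeholder "embedding"
        .collect()
}

/// Mirrors the pattern from the report: index embeddings by chunk position
/// with no length check. Panics when `embeddings` is shorter than `texts`.
pub fn collect_unchecked(texts: &[&str], embeddings: &[Vec<f32>]) -> Vec<(String, Vec<f32>)> {
    texts
        .iter()
        .enumerate()
        .map(|(i, t)| (t.to_string(), embeddings[i].clone()))
        .collect()
}
```

Calling `collect_unchecked` with three texts and the two embeddings produced by `partial_embed_batch` triggers the same class of "index out of bounds" panic shown in the error message.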
Expected Behavior
The indexer should:
- Verify that all_embeddings.len() == total_chunks after embedding
- Return a clear error if there's a mismatch
- Not panic on partial results
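A minimal sketch of the length check described in the list above. The error type `EmbeddingCountMismatch` and the helper `check_embedding_count` are hypothetical names chosen for illustration; the project's real error type and the element type of `all_embeddings` (assumed here to be `Vec<f32>`) may differ.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; vgrep's actual error type may differ.
#[derive(Debug, PartialEq)]
pub struct EmbeddingCountMismatch {
    pub expected: usize,
    pub actual: usize,
}

impl fmt::Display for EmbeddingCountMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "embedding count mismatch: expected {} embeddings, got {}",
            self.expected, self.actual
        )
    }
}

impl std::error::Error for EmbeddingCountMismatch {}

/// Returns an error instead of letting later indexing panic when the
/// number of embeddings does not match the number of chunks.
pub fn check_embedding_count(
    all_embeddings: &[Vec<f32>],
    total_chunks: usize,
) -> Result<(), EmbeddingCountMismatch> {
    if all_embeddings.len() != total_chunks {
        return Err(EmbeddingCountMismatch {
            expected: total_chunks,
            actual: all_embeddings.len(),
        });
    }
    Ok(())
}
```

Placed right after the embedding step, a call such as `check_embedding_count(&all_embeddings, total_chunks)?` would turn the panic into an ordinary error that the caller can report.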
Actual Behavior
- No verification is performed
- Direct indexing causes panic if counts don't match
- No indication of which file/chunk caused the issue
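To address the last point, one option is to pair chunks with embeddings through a checked lookup and name the first chunk that has no embedding. This is a sketch under assumptions: the `Chunk` struct, its fields, and `pair_chunks` are hypothetical and do not reflect vgrep's actual types.

```rust
/// Hypothetical chunk record; the real indexer's chunk type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub file: String,
    pub index_in_file: usize,
    pub text: String,
}

/// Pairs each chunk with its embedding using `get`, so a short embedding
/// list produces an error naming the affected file and chunk instead of
/// an out-of-bounds panic.
pub fn pair_chunks<'a>(
    chunks: &'a [Chunk],
    embeddings: &'a [Vec<f32>],
) -> Result<Vec<(&'a Chunk, &'a [f32])>, String> {
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| {
            embeddings
                .get(i)
                .map(|e| (chunk, e.as_slice()))
                .ok_or_else(|| {
                    format!(
                        "missing embedding for chunk {} of {} (global chunk {}; got {} embeddings for {} chunks)",
                        chunk.index_in_file,
                        chunk.file,
                        i,
                        embeddings.len(),
                        chunks.len()
                    )
                })
        })
        .collect()
}
```

Note that this only identifies the first chunk past the end of the embedding list. If the engine drops an input from the middle of a batch, the pairing before that point is already misaligned, which is why an up-front length check is still needed.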
Additional Context
While embed_batch() in embeddings.rs processes texts one by one and should return matching counts, there are scenarios where this could fail:
- Partial memory allocation failures
- Context size limits causing silent drops
- Future API changes that might batch differently
The defensive check is inexpensive and prevents obscure crashes.