Project
vgrep
Description
The indexer accesses embeddings by index without validating that the embedding count matches the chunk count. If the embedding API returns fewer embeddings than expected (due to errors, timeouts, or bugs), the application panics with "index out of bounds".
Affected Files
src/core/indexer.rs (lines 139, 167) - Indexer
src/core/indexer.rs (lines 515, 544) - ServerIndexer
Evidence
The Bug Pattern:
let all_chunks: Vec<&str> = pending_files
    .iter()
    .flat_map(|f| f.chunks.iter().map(|c| c.content.as_str()))
    .collect();
// ...
let all_embeddings = self.engine.embed_batch(&all_chunks)?;
// ❌ NO VALIDATION: all_embeddings.len() == all_chunks.len()
for pending in &pending_files {
    let file_id = self.db.insert_file(&pending.path, &pending.hash)?;
    for (chunk_idx, chunk) in pending.chunks.iter().enumerate() {
        let embedding = &all_embeddings[embedding_idx]; // 💥 PANIC if index >= len!
        self.db.insert_chunk(...)?;
        embedding_idx += 1;
    }
}
Panic Scenario:
all_chunks.len() = 100
all_embeddings.len() = 95 (server returned partial results)
Loop iteration 96:
    embedding_idx = 95
    all_embeddings[95] // 💥 PANIC: index out of bounds: the len is 95 but the index is 95
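The scenario above can be reproduced in isolation. The following is a minimal standalone sketch, not vgrep's actual code: `count_consumed` is a hypothetical helper that only mimics the running `embedding_idx` access pattern, and the embedding vectors are dummy data.

```rust
use std::panic;

/// Mimics the indexer's access pattern: one running index shared by all chunks.
/// Hypothetical helper for illustration only; not part of vgrep.
fn count_consumed(chunk_count: usize, embeddings: &[Vec<f32>]) -> usize {
    let mut embedding_idx = 0;
    for _ in 0..chunk_count {
        // Direct indexing panics as soon as embedding_idx == embeddings.len().
        let _embedding = &embeddings[embedding_idx];
        embedding_idx += 1;
    }
    embedding_idx
}

fn main() {
    // 100 chunks were sent, but only 95 embeddings came back.
    let embeddings = vec![vec![0.0_f32; 4]; 95];

    // catch_unwind is used here only to observe the panic without killing the demo.
    let result = panic::catch_unwind(|| count_consumed(100, &embeddings));
    assert!(result.is_err(), "expected an index-out-of-bounds panic");
    println!("indexing 100 chunks against 95 embeddings panicked");
}
```

When the counts match, the same loop runs to completion, which is why the problem only surfaces on partial responses.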
Error Message
Debug Logs
System Information
Bounty Version: 0.1.0
OS: Ubuntu 24.04 LTS
CPU: AMD EPYC-Genoa Processor (8 cores)
RAM: 15 GB
Screenshots
No response
Steps to Reproduce
Method 1: Simulate embedding API failure
// In embed_batch, simulate a partial failure
pub fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
    // Simulated bug: return 5 fewer embeddings than requested.
    // saturating_sub avoids an underflow panic when fewer than 5 texts are passed.
    let results: Vec<Vec<f32>> = texts.iter()
        .take(texts.len().saturating_sub(5)) // Missing 5 embeddings!
        .map(|t| self.embed(t).unwrap())
        .collect();
    Ok(results)
}
Method 2: Server timeout during batch
# Start server with artificial delay
VGREP_EMBED_DELAY=100ms vgrep serve
# Index large codebase (will timeout partway)
timeout 10s vgrep index /large/codebase
# Server returns partial embeddings before timeout
# Client panics when iterating
Method 3: Memory pressure
# Limit memory to force OOM during embedding
systemd-run --scope -p MemoryMax=500M vgrep index /large/codebase
# embed_batch runs out of memory partway through
# Returns partial results, indexer panics
Expected Behavior
- Validate that all_embeddings.len() == all_chunks.len() before processing
- Return a descriptive error if counts don't match
- Never panic due to index out of bounds
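One way to satisfy all three points is to check the counts once and then iterate over chunks and embeddings together, so that no manual index is needed. The sketch below is an assumption about how such a check could look, not the project's fix: `EmbeddingCountMismatch` and `pair_chunks_with_embeddings` are hypothetical names, and the real indexer would need to map the error into its own `Result` type.

```rust
use std::fmt;

/// Hypothetical error for a backend that returned the wrong number of vectors.
#[derive(Debug, PartialEq)]
struct EmbeddingCountMismatch {
    expected: usize,
    actual: usize,
}

impl fmt::Display for EmbeddingCountMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "embedding count mismatch: expected {} embeddings, got {}",
            self.expected, self.actual
        )
    }
}

impl std::error::Error for EmbeddingCountMismatch {}

/// Pairs each chunk with its embedding, refusing to proceed if the counts differ.
fn pair_chunks_with_embeddings<'a>(
    chunks: &[&'a str],
    embeddings: &'a [Vec<f32>],
) -> Result<Vec<(&'a str, &'a [f32])>, EmbeddingCountMismatch> {
    if chunks.len() != embeddings.len() {
        return Err(EmbeddingCountMismatch {
            expected: chunks.len(),
            actual: embeddings.len(),
        });
    }
    // zip cannot run past either input, so no index arithmetic is involved.
    Ok(chunks
        .iter()
        .copied()
        .zip(embeddings.iter().map(|e| e.as_slice()))
        .collect())
}

fn main() {
    let chunks = vec!["fn a() {}", "fn b() {}", "fn c() {}"];
    let partial = vec![vec![0.1_f32, 0.2], vec![0.3, 0.4]]; // one embedding missing
    match pair_chunks_with_embeddings(&chunks, &partial) {
        Ok(pairs) => println!("paired {} chunks", pairs.len()),
        Err(e) => eprintln!("indexing aborted: {e}"),
    }
}
```

Running the check before any `insert_file` or `insert_chunk` call would also keep a failed batch from writing partial rows, although that ordering is a design suggestion rather than something confirmed against the codebase.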
Actual Behavior
- No validation of embedding count
- Blind index access into all_embeddings
- PANIC: index out of bounds: the len is X but the index is Y
- Entire indexing operation crashes
- Database may be left in inconsistent state
Additional Context
No response