fix: prevent OOM in search by using BinaryHeap by echobt · Pull Request #48 · CortexLM/vgrep

echobt · 2026-01-20T23:01:43Z

Description

This PR fixes a potential Out-Of-Memory (OOM) issue when searching in large repositories. Previously, search_similar would load all matching embeddings into memory before filtering and sorting. This could consume gigabytes of RAM for large datasets (e.g., >500k chunks).

Changes

Replaced the Vec that held every result with a BinaryHeap, used as a min-heap, so that only the top K results stay in memory.
The heap size is bounded by limit * 3, so memory usage stays predictable regardless of the total number of matching chunks.
Implemented a HeapItem wrapper that orders heap entries by similarity.
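
For readers unfamiliar with the pattern, here is a minimal sketch of a bounded top-K selection, not the code from this PR. Rust's BinaryHeap is a max-heap, so the wrapper reverses the comparison to make the lowest-similarity entry sit at the top, where it can be evicted cheaply. The Hit struct, its fields, and the top_k function are hypothetical names chosen for illustration; only BinaryHeap and the HeapItem name come from the description above.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical search hit; the real struct in vgrep may look different.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    chunk_id: u64,
    similarity: f32,
}

/// Wrapper with a reversed ordering, so that `BinaryHeap` (a max-heap)
/// keeps the lowest-similarity hit at the top.
struct HeapItem(Hit);

impl PartialEq for HeapItem {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for HeapItem {}

impl PartialOrd for HeapItem {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for HeapItem {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed on purpose: a lower similarity compares as "greater".
        other.0.similarity.total_cmp(&self.0.similarity)
    }
}

/// Keeps at most `capacity` hits while streaming over the input,
/// then returns them with the highest similarity first.
fn top_k(hits: impl Iterator<Item = Hit>, capacity: usize) -> Vec<Hit> {
    if capacity == 0 {
        return Vec::new();
    }
    let mut heap: BinaryHeap<HeapItem> =
        BinaryHeap::with_capacity(capacity.saturating_add(1));
    for hit in hits {
        heap.push(HeapItem(hit));
        if heap.len() > capacity {
            // Evicts the hit with the lowest similarity seen so far.
            heap.pop();
        }
    }
    // Ascending under the reversed ordering means highest similarity first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|item| item.0)
        .collect()
}
```

Calling top_k(rows, limit * 3) would mirror the bound described above. The heap never holds more than capacity + 1 entries at a time, no matter how many rows the iterator yields.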

Memory Impact

For a large index with 500,000 chunks, memory usage for search results is now bounded by limit (at most limit * 3 entries) rather than growing linearly with the number of chunks.
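
As a rough back-of-the-envelope illustration, the sketch below compares the two cases. The embedding dimension (768), the limit (10), and the assumption that each retained entry carries a full f32 embedding are all hypothetical; the description does not state them.

```rust
/// Bytes needed for `count` embeddings of `dim` f32 components each.
/// This ignores per-entry metadata such as paths or chunk text.
fn embedding_bytes(count: u64, dim: u64) -> u64 {
    count * dim * std::mem::size_of::<f32>() as u64
}

/// Hypothetical values, not taken from the PR.
const DIM: u64 = 768;
const LIMIT: u64 = 10;

/// All 500,000 chunks held at once: 1,536,000,000 bytes (about 1.5 GB).
fn bytes_before() -> u64 {
    embedding_bytes(500_000, DIM)
}

/// At most limit * 3 entries held at once: 92,160 bytes (about 90 KiB).
fn bytes_after() -> u64 {
    embedding_bytes(LIMIT * 3, DIM)
}
```

Under these assumptions the first figure grows with the index, while the second depends only on limit.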

Verification

Verified that search results are still correctly sorted by similarity.
Verified that memory usage is constrained.

fix: use BinaryHeap for search results to avoid OOM on large datasets

0e33021

echobt mentioned this pull request Jan 20, 2026

[BUG] Search Loads ALL Embeddings Into Memory - OOM Risk PlatformNetwork/bounty-challenge#140

Closed

echobt wants to merge 1 commit into main from fix/issue-140
