fix: prevent OOM in search by using BinaryHeap#48

Open
echobt wants to merge 1 commit into main from fix/issue-140

echobt commented Jan 20, 2026

Description

This PR fixes a potential out-of-memory (OOM) failure when searching large repositories. Previously, search_similar loaded all matching embeddings into memory before filtering and sorting, which could consume gigabytes of RAM on large datasets (e.g., more than 500k chunks).

Changes

  • Replaced the Vec that collected every result with a BinaryHeap used as a min-heap, so only the top K results are held in memory and the lowest-scoring entry is the one evicted when the heap is full.
  • The heap size is bounded by limit * 3, ensuring predictable memory usage regardless of the total number of matching chunks.
  • Implemented a HeapItem wrapper that orders heap entries by similarity.
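
The approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the field names chunk_id and similarity, the top_k function, and the use of std::cmp::Reverse are all assumptions. Rust's std BinaryHeap is a max-heap, so the sketch wraps items in Reverse to get min-heap behaviour; the actual HeapItem may instead reverse the comparison inside its Ord implementation.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical stand-in for the PR's `HeapItem`: a chunk id plus its score.
#[derive(Debug, Clone)]
pub struct HeapItem {
    pub chunk_id: u64,
    pub similarity: f32,
}

// f32 is not `Ord`, so define a total order with `total_cmp`,
// breaking ties by chunk id to keep the result deterministic.
impl Ord for HeapItem {
    fn cmp(&self, other: &Self) -> Ordering {
        self.similarity
            .total_cmp(&other.similarity)
            .then_with(|| self.chunk_id.cmp(&other.chunk_id))
    }
}

impl PartialOrd for HeapItem {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for HeapItem {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for HeapItem {}

/// Keeps at most `capacity` items while streaming over `scored`,
/// and returns them ordered from highest to lowest similarity.
pub fn top_k<I>(scored: I, capacity: usize) -> Vec<HeapItem>
where
    I: IntoIterator<Item = HeapItem>,
{
    if capacity == 0 {
        return Vec::new();
    }
    // `Reverse` turns the max-heap into a min-heap: the root is the
    // lowest-scoring item currently kept.
    let mut heap: BinaryHeap<Reverse<HeapItem>> = BinaryHeap::new();
    for item in scored {
        if heap.len() < capacity {
            heap.push(Reverse(item));
        } else if let Some(mut worst) = heap.peek_mut() {
            // Replace the current worst only if the new item beats it.
            if item > worst.0 {
                *worst = Reverse(item);
            }
        }
    }
    // Ascending order of `Reverse<HeapItem>` is descending similarity.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(item)| item)
        .collect()
}
```

With the bound described in this PR, a caller would pass something like limit.saturating_mul(3) as capacity. Each incoming item costs O(log capacity) at most, and the heap never holds more than capacity entries.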

Memory Impact

For a large index with 500,000 chunks, memory usage for search results is now bounded by limit (at most limit * 3 heap entries) rather than growing linearly with the number of chunks.

Verification

  • Verified that search results are still correctly sorted by similarity.
  • Verified that memory usage is constrained.
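
One way such checks could be automated is sketched below. It is an assumption-laden illustration rather than the verification actually performed for this PR: Scored, bounded_top_k, full_sort_top_k, and pseudo_scores are hypothetical names, and the scores come from a simple deterministic generator instead of real embeddings. The idea is to compare the bounded heap against a full sort and to record the largest heap length observed.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical scored entry; ties on score are broken by id.
#[derive(Clone, Copy, Debug)]
struct Scored {
    score: f32,
    id: usize,
}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .total_cmp(&other.score)
            .then_with(|| self.id.cmp(&other.id))
    }
}

impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Scored {}

/// Bounded selection: returns the best-first ids and the peak heap length.
fn bounded_top_k(scores: &[f32], k: usize) -> (Vec<usize>, usize) {
    let mut heap = BinaryHeap::new();
    let mut peak = 0;
    for (id, &score) in scores.iter().enumerate() {
        heap.push(Reverse(Scored { score, id }));
        // Measured right after the push, so the peak can reach k + 1.
        peak = peak.max(heap.len());
        if heap.len() > k {
            heap.pop(); // drops the lowest-scoring entry
        }
    }
    let ids = heap
        .into_sorted_vec()
        .into_iter()
        .map(|Reverse(s)| s.id)
        .collect();
    (ids, peak)
}

/// Reference implementation: sort everything, then take the first k.
fn full_sort_top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut all: Vec<Scored> = scores
        .iter()
        .enumerate()
        .map(|(id, &score)| Scored { score, id })
        .collect();
    all.sort_by(|a, b| b.cmp(a));
    all.into_iter().take(k).map(|s| s.id).collect()
}

/// Deterministic pseudo-random scores in [0, 1) from a linear congruential generator.
fn pseudo_scores(n: usize, mut seed: u64) -> Vec<f32> {
    (0..n)
        .map(|_| {
            seed = seed
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((seed >> 40) as f32) / ((1u64 << 24) as f32)
        })
        .collect()
}
```

A test could then assert that bounded_top_k and full_sort_top_k return the same ids for the same input, and that the recorded peak never exceeds k + 1.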
