fix: reuse LlamaContext across batches to improve performance #40
Open
Conversation
Previously, LlamaContext was recreated for every batch in embed_batch, causing significant overhead. This change refactors EmbeddingEngine to persist the context using a self-referential struct pattern (Box + transmute), ensuring context reuse while satisfying Rust's lifetime rules. The mutable-access requirement this introduces is also propagated through dependent modules.
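For context, a minimal sketch of that self-referential pattern, assuming `LlamaContext<'a>` borrows from `LlamaModel` and that `llama_cpp_2` exposes a `new_context(&backend, params)` constructor roughly as shown; the field layout, parameter defaults, and error handling are illustrative rather than the exact code in this PR:

```rust
use llama_cpp_2::{
    context::{params::LlamaContextParams, LlamaContext},
    llama_backend::LlamaBackend,
    model::LlamaModel,
};

struct EmbeddingEngine {
    // Declared first so it drops before `model` (struct fields drop in
    // declaration order), which keeps the faked 'static borrow sound.
    context: LlamaContext<'static>,
    // Boxed so the model has a stable heap address; the context's internal
    // reference stays valid even when EmbeddingEngine itself is moved.
    model: Box<LlamaModel>,
    backend: LlamaBackend,
}

impl EmbeddingEngine {
    fn new(backend: LlamaBackend, model: LlamaModel) -> Self {
        let model = Box::new(model);
        let context = model
            .new_context(&backend, LlamaContextParams::default())
            .expect("failed to create LlamaContext");
        // SAFETY: the context borrows `*model`, which is heap-allocated and
        // outlives it inside this struct; the transmute only erases that
        // lifetime so both can be stored side by side.
        let context: LlamaContext<'static> = unsafe { std::mem::transmute(context) };
        Self {
            context,
            model,
            backend,
        }
    }
}
```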
Description

Refactors `EmbeddingEngine` to reuse `LlamaContext` across `embed_batch` calls instead of recreating it every time. This significantly reduces overhead for batch embedding operations.

Changes

- `EmbeddingEngine` now stores `LlamaContext` alongside `LlamaModel` and `LlamaBackend`.
- Uses `Box` and an `unsafe` transmute to handle the lifetime dependency between `LlamaContext` and `LlamaModel`.
- `EmbeddingEngine::embed_batch` now uses the persistent context.
- Mutable access is propagated through `ServerState`, `SearchEngine`, and `Indexer`, as `llama_cpp_2::context::LlamaContext` requires mutable access for inference (see the sketch after this list).
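As a rough sketch of that mutability change, assuming the engine from the sketch above is owned directly by its caller (the real `ServerState`, `SearchEngine`, and `Indexer` may share it differently); the `embed_batch` body and `SearchEngine`'s fields and methods here are hypothetical simplifications:

```rust
impl EmbeddingEngine {
    // Takes &mut self: inference through LlamaContext (e.g. decode calls)
    // mutates the context, so shared &self access is no longer sufficient.
    pub fn embed_batch(&mut self, texts: &[String]) -> Vec<Vec<f32>> {
        texts
            .iter()
            .map(|_text| {
                // Tokenize `_text` and run it through the persistent
                // `self.context` here; the real embedding logic is elided.
                Vec::new()
            })
            .collect()
    }
}

// The &mut requirement bubbles up: callers such as SearchEngine (and,
// analogously, ServerState and Indexer) must hold the engine mutably.
struct SearchEngine {
    embedding: EmbeddingEngine,
}

impl SearchEngine {
    fn embed_query(&mut self, query: &str) -> Vec<f32> {
        self.embedding
            .embed_batch(&[query.to_string()])
            .remove(0)
    }
}
```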
Verification

- Ran `cargo test` to ensure all tests pass.

Fixes #159