
fix: reuse LlamaContext across batches to improve performance#40

Open
echobt wants to merge 1 commit into main from fix/issue-159

Conversation

Contributor

@echobt echobt commented Jan 20, 2026

Description

Refactors EmbeddingEngine to reuse a single LlamaContext across embed_batch calls instead of recreating it on every call. This significantly reduces per-batch overhead for batch embedding operations.

Changes

  • Modified EmbeddingEngine to store LlamaContext alongside LlamaModel and LlamaBackend.
  • Implemented a self-referential struct pattern using Box and unsafe transmute to handle the lifetime dependency between LlamaContext and LlamaModel.
  • Updated EmbeddingEngine::embed_batch to use the persistent context.
  • Propagated mutability requirements to ServerState, SearchEngine, and Indexer as llama_cpp_2::context::LlamaContext requires mutable access for inference.
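The self-referential pattern from the changes above can be sketched as follows. This is a minimal illustration with hypothetical stand-in types (`Model`, `Context`, `Engine`), not the real llama_cpp_2 API: the model is boxed so its heap address stays stable when the owning struct moves, and the context's borrowed lifetime is erased to `'static` with `transmute` so both can live in one struct.

```rust
struct Model {
    name: String,
}

struct Context<'a> {
    model: &'a Model,
    calls: usize,
}

impl Model {
    fn new_context(&self) -> Context<'_> {
        Context { model: self, calls: 0 }
    }
}

/// Owns both the model and a context that borrows from it.
/// The `'static` lifetime on `context` is a fiction that `transmute`
/// papers over; it is only sound because `model` is boxed (stable heap
/// address) and `context` is declared first, so it drops before `model`.
struct Engine {
    context: Context<'static>,
    model: Box<Model>,
}

impl Engine {
    fn new(name: &str) -> Self {
        let model = Box::new(Model { name: name.to_string() });
        let context = model.new_context();
        // SAFETY: the Model lives on the heap behind the Box, so moving
        // `Engine` does not move it, and field order guarantees the
        // context is dropped before the model it borrows from.
        let context: Context<'static> = unsafe { std::mem::transmute(context) };
        Engine { context, model }
    }

    /// Reuses the persistent context instead of recreating it per batch.
    /// Requires `&mut self`, which is what forces the mutability
    /// propagation mentioned above.
    fn embed_batch(&mut self, batch: &[&str]) -> usize {
        self.context.calls += 1;
        // The borrowed model reference inside the context is still valid.
        debug_assert!(!self.context.model.name.is_empty());
        batch.len()
    }
}
```

Note that self-referential `transmute` patterns like this are easy to get subtly wrong; the field declaration order and the `Box` indirection are both load-bearing here.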

Verification

  • Ran cargo test to ensure all tests pass.
  • Verified compilation and type safety of the new structure.

Fixes #159

Previously, LlamaContext was recreated for every batch in embed_batch, incurring significant setup overhead each time. This change refactors EmbeddingEngine to persist the context using a self-referential struct pattern (Box + transmute), so the context is reused while still satisfying Rust's lifetime rules. Because inference through the persistent context requires mutable access, mutability was also propagated through the dependent modules.
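One consequence of the mutability propagation is that callers which previously shared the engine behind `&self` now need exclusive access. A hypothetical sketch (the `ServerState`/`EmbeddingEngine` shapes here are illustrative, not the PR's actual code) of handling this in a shared server state with a `Mutex`:

```rust
use std::sync::Mutex;

struct EmbeddingEngine {
    // persistent LlamaContext elided in this sketch
    batches_run: usize,
}

impl EmbeddingEngine {
    // Now takes `&mut self` because inference mutates the context.
    fn embed_batch(&mut self, texts: &[&str]) -> Vec<usize> {
        self.batches_run += 1;
        texts.iter().map(|t| t.len()).collect()
    }
}

struct ServerState {
    // Mutex turns shared `&self` access into exclusive `&mut` access.
    engine: Mutex<EmbeddingEngine>,
}

impl ServerState {
    fn search(&self, query: &str) -> usize {
        let mut engine = self.engine.lock().unwrap();
        engine.embed_batch(&[query]).len()
    }
}
```

A `Mutex` is one option; plain `&mut` plumbing (as this PR does) avoids locking when the call graph permits exclusive borrows end to end.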
