
[BUG] Llama Context Recreated for Every Batch #159

@olddev94

Project

vgrep

Description

The embed_batch() function creates a new LlamaContext for every batch of texts it embeds. This is expensive, since context creation involves memory allocation and initialization. The context should be created once and reused across batches.
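A minimal Rust sketch of the pattern described above. The names `embed_batch` and `LlamaContext` come from the issue; the struct internals and the placeholder `embed` method are stand-ins, not the real llama.cpp bindings:

```rust
// Stand-in for the real LlamaContext; construction is assumed expensive.
struct LlamaContext {
    n_threads: usize,
}

impl LlamaContext {
    fn new(n_threads: usize) -> Self {
        // In the real bindings this allocates and initializes model state.
        LlamaContext { n_threads }
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        // Placeholder embedding; the real code runs the model.
        vec![text.len() as f32; 4]
    }
}

// Current (problematic) shape: a fresh context per batch,
// and the thread count re-queried from the OS on every call.
fn embed_batch(texts: &[&str]) -> Vec<Vec<f32>> {
    let n_threads = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let ctx = LlamaContext::new(n_threads); // recreated every call
    texts.iter().map(|t| ctx.embed(t)).collect()
}
```

The per-call `LlamaContext::new` and `available_parallelism` query are the overhead this issue is about.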

Error Message

None; this is a performance issue, not a crash.

Debug Logs

No response

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Index a large project with many files
  2. Observe that embedding is slower than expected
  3. Each batch call recreates the context

Expected Behavior

  1. Create context once during EmbeddingEngine::new()
  2. Reuse context for all embedding operations
  3. Use configured n_threads instead of recalculating
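The expected behavior above can be sketched as follows. `EmbeddingEngine`, `LlamaContext`, and `n_threads` are names from the issue; the struct internals are stand-ins for illustration, assuming the context can be shared immutably (if the real context needs mutable access, it would be wrapped in a `Mutex` or similar):

```rust
// Stand-in for the real LlamaContext; construction is assumed expensive.
struct LlamaContext {
    n_threads: usize,
}

impl LlamaContext {
    fn new(n_threads: usize) -> Self {
        LlamaContext { n_threads }
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        // Placeholder embedding; the real code runs the model.
        vec![text.len() as f32; 4]
    }
}

// Proposed shape: the engine owns one context, created once in new()
// with the configured thread count, and reuses it for every batch.
struct EmbeddingEngine {
    ctx: LlamaContext,
}

impl EmbeddingEngine {
    fn new(n_threads: usize) -> Self {
        EmbeddingEngine {
            ctx: LlamaContext::new(n_threads), // created exactly once
        }
    }

    fn embed_batch(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        // No per-call allocation or OS query; reuse the cached context.
        texts.iter().map(|t| self.ctx.embed(t)).collect()
    }
}
```

With this shape, many small batches pay the context-creation cost only once, at engine construction.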

Actual Behavior

  1. Every embed_batch() call creates a new context
  2. Thread count is queried from OS every time
  3. Significant overhead for many small batches

Additional Context

No response

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)
    ide (Issues related to IDE)
    invalid (This doesn't seem right)
    vgrep

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests