
Conversation

@ThomasDelsart
Contributor

Related to this issue: #152

ThomasDelsart requested review from Copilot and lsorber on June 4, 2025 at 10:20.

Copilot AI left a comment

Pull Request Overview

This PR parallelizes heavy per-evaluation loops in both the query adapter updater and evaluation runner by adding a max_workers parameter and using ThreadPoolExecutor to process each eval concurrently.

  • Adds max_workers arguments to update_query_adapter and answer_evals to control parallelism.
  • Introduces process_eval helper functions with thread-local sessions to handle per-eval work.
  • Replaces sequential loops over evals with futures, progress bars, and result aggregation.
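The resulting pattern looks roughly like the sketch below. Only max_workers, process_eval, and the use of ThreadPoolExecutor come from the PR description; the progress-bar library (tqdm), the placeholder eval objects, and the aggregation details are assumptions for illustration.

from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm  # Progress-bar library assumed; the PR only mentions "progress bars".

def process_eval(eval_):
    # Stand-in for the real per-eval work, which opens a thread-local database session.
    return {"eval": eval_}

evals = ["eval-1", "eval-2", "eval-3"]  # Stand-in for the real eval objects.
max_workers = 4

results = [None] * len(evals)
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_idx = {executor.submit(process_eval, eval_): idx for idx, eval_ in enumerate(evals)}
    for future in tqdm(as_completed(future_to_idx), total=len(evals)):
        results[future_to_idx[future]] = future.result()  # Aggregate results in the original eval order.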

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Changed files:

  • src/raglite/_query_adapter.py: Parallelize update_query_adapter by spawning threads per eval
  • src/raglite/_eval.py: Parallelize answer_evals by spawning threads per eval

Comments suppressed due to low confidence (3)

src/raglite/_query_adapter.py:47

  • The new max_workers parameter should be documented in the function's docstring (describe how it controls concurrency).
max_workers: int | None = None,
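A minimal sketch of the requested docstring addition, assuming a NumPy-style parameter section; the surrounding signature is illustrative, not copied from the repository:

def update_query_adapter(
    *,
    max_workers: int | None = None,
    config: "RAGLiteConfig | None" = None,  # Hypothetical config parameter, shown only for context.
) -> None:
    """Update the query adapter.

    Parameters
    ----------
    max_workers
        Maximum number of worker threads used to process evals concurrently.
        None (the default) lets ThreadPoolExecutor choose the number of workers.
    """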

src/raglite/_query_adapter.py:195

  • Variable N is not defined in process_eval before being passed to _optimize_query_target, leading to a NameError. You need to build N (negative embeddings) as you did for P.
t = _optimize_query_target(q, P, N, α=optimize_gap)
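A hedged sketch of a possible fix, assuming the negative (irrelevant) chunk embeddings are collected per eval just like the positive ones; apart from _optimize_query_target, q, P, and optimize_gap, the names are hypothetical:

import numpy as np

# Hypothetical: stack the negative chunk embeddings into N, mirroring how P is built.
N = np.vstack(negative_chunk_embeddings)
t = _optimize_query_target(q, P, N, α=optimize_gap)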

src/raglite/_eval.py:198

  • The new parallel branch controlled by max_workers is not covered by existing tests; add tests for answer_evals with max_workers > 1 to ensure correct ordering and error handling.
max_workers: int | None = None,
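A hedged sketch of such a test; the eval fixture, config keyword, and DataFrame-style return value are assumptions, and only the max_workers keyword comes from this PR:

from raglite import answer_evals

def test_answer_evals_parallel_preserves_order(golden_evals, raglite_test_config):
    # Hypothetical test: the parallel path should match the sequential path, in the original order.
    sequential = answer_evals(golden_evals, config=raglite_test_config, max_workers=1)
    parallel = answer_evals(golden_evals, config=raglite_test_config, max_workers=4)
    assert len(parallel) == len(sequential)
    assert list(parallel["question"]) == list(sequential["question"])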

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    # Submit all tasks with their original index
    future_to_eval_idx = {
        executor.submit(process_eval, eval_): (eval_, idx)

Copilot AI Jun 4, 2025

[nitpick] Calling create_database_engine inside each thread for every eval can be expensive; consider creating the engine once outside and passing a shared session factory to avoid repeated engine initialization.

Suggested change:
-    executor.submit(process_eval, eval_): (eval_, idx)
+    executor.submit(process_eval, eval_, engine): (eval_, idx)
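A hedged sketch of that suggestion, assuming the code uses SQLModel sessions on top of a shared SQLAlchemy engine; apart from create_database_engine, process_eval, and max_workers, the names are illustrative:

from concurrent.futures import ThreadPoolExecutor
from sqlmodel import Session

engine = create_database_engine(config)  # Create the (expensive) engine once, outside the worker threads.

def process_eval(eval_, engine):
    # Each worker thread opens its own lightweight session on the shared engine.
    with Session(engine) as session:
        ...  # Per-eval retrieval and answering.

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_eval_idx = {
        executor.submit(process_eval, eval_, engine): (eval_, idx)
        for idx, eval_ in enumerate(evals)
    }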

@lsorber
Member

lsorber commented Jun 6, 2025

Superseded by #157.

lsorber closed this on Jun 6, 2025.