
Conversation

asamal4 (Collaborator) commented Feb 9, 2026

Description

  • Introduced new configs for the LLM pool & judge panel (see the config sketch after this list).
  • The LLM pool is defined outside the judge panel so it can also serve other purposes in the future.
  • A nested structure was considered, since many fields are common, but a flat structure was chosen for simplicity.
  • Supported LLM configuration options:
    • the same model with different parameters, e.g. temperature
    • multiple models from the same provider
    • for vLLM-based providers, the entire setup can be unique, since in practice these can be different platforms such as RHOAI or RHAIIS
  • Introduced new properties api_base & api_key_path. So far we have supplied these through environment variables, which was convenient: a new provider could be used without a code change. But with multiple providers, the same LiteLLM environment variables won't work; for example, there can be multiple vLLM-based providers, each with a different key.
  • Model parameters can be supplied dynamically.
  • This PR only adds the new config. Actual usage, documentation, and backward compatibility will be handled separately.
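
A minimal sketch of what the flat config could look like, written as the raw dict a SystemConfig would be validated from. All model ids, model names, URLs, and paths here are hypothetical, and any field not named in this description is an assumption:

# Hypothetical layout; only llm_pool, judge_panel, api_base, and
# api_key_path are taken from this PR description.
raw_config = {
    "llm_pool": {
        "models": {
            # Same model and provider, different parameters.
            "judge_precise": {"provider": "openai", "model": "gpt-4o", "temperature": 0.0},
            "judge_creative": {"provider": "openai", "model": "gpt-4o", "temperature": 0.9},
            # vLLM-based provider with its own endpoint and key file,
            # rather than one shared LiteLLM environment variable.
            "judge_rhoai": {
                "provider": "hosted_vllm",
                "model": "llama-3-70b",
                "api_base": "https://rhoai.example.com/v1",
                "api_key_path": "/run/secrets/rhoai_api_key",
            },
        },
    },
    "judge_panel": {"judges": ["judge_precise", "judge_rhoai"]},
}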

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Exposes shared LLM pool and per-model provider configs for centralized model management.
    • Adds a judge panel for multi-LLM evaluation with validated metric formatting and aggregation strategies.
    • System configuration can now resolve individual and judge LLM configurations (including cache suffix handling).
  • Tests

    • Added comprehensive unit tests for pool, provider, parameters, judge panel, and system integration.
    • Removed an obsolete test module.

coderabbitai bot commented Feb 9, 2026

Walkthrough

Adds an LLM configuration subsystem (LLMParametersConfig, LLMDefaultsConfig, LLMProviderConfig, LLMPoolConfig) and a judge-panel config (JudgePanelConfig), exposes LLMPoolConfig and JudgePanelConfig in the package API, extends SystemConfig with llm_pool/judge_panel and resolution methods, and adds consolidated unit tests.

Changes

• System models: src/lightspeed_evaluation/core/models/system.py
  Adds LLMParametersConfig, LLMDefaultsConfig, LLMProviderConfig, LLMPoolConfig, and JudgePanelConfig, with validators for metrics and aggregation; adds LLMPool.get_model_ids() and resolve_llm_config(); SystemConfig gains llm_pool, judge_panel, get_judge_configs(), and get_llm_config(), with configuration error handling and cache_dir suffix logic.
• Public exports: src/lightspeed_evaluation/core/models/__init__.py
  Imports JudgePanelConfig and LLMPoolConfig and adds them to __all__, making them public exports of the package.
• Unit tests (added): tests/unit/core/models/test_system.py
  New comprehensive tests for LLM configs, LLMParametersConfig serialization, LLMProviderConfig, LLMPoolConfig resolution and cache_dir behavior, JudgePanelConfig validation, SystemConfig integration methods, and error cases.
• Unit tests (removed): tests/unit/core/models/test_system_additional.py
  Removes the legacy test module (tests appear consolidated into the new test_system.py).
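
Based only on the summary above, the resolution flow plausibly chains as sketched below; stub types stand in for the real ones (notably ConfigurationError from core.system.exceptions, and plain dicts instead of the resolved LLM config type), so treat this as a guess at the shape, not the actual code:

from pydantic import BaseModel


class ConfigurationError(Exception):
    """Stand-in for core.system.exceptions.ConfigurationError."""


class LLMPoolConfig(BaseModel):
    # Real values are LLMProviderConfig; plain dicts keep the sketch short.
    models: dict[str, dict] = {}

    def get_model_ids(self) -> list[str]:
        return list(self.models)

    def resolve_llm_config(self, model_id: str) -> dict:
        if model_id not in self.models:
            raise ConfigurationError(
                f"Model '{model_id}' not found in llm_pool.models. "
                f"Available: {list(self.models)}"
            )
        return self.models[model_id]


class JudgePanelConfig(BaseModel):
    judges: list[str]


class SystemConfig(BaseModel):
    llm_pool: LLMPoolConfig | None = None
    judge_panel: JudgePanelConfig | None = None

    def get_llm_config(self, model_id: str) -> dict:
        if self.llm_pool is None:
            raise ConfigurationError("llm_pool is not configured")
        return self.llm_pool.resolve_llm_config(model_id)

    def get_judge_configs(self) -> list[dict]:
        if self.judge_panel is None:
            raise ConfigurationError("judge_panel is not configured")
        return [self.get_llm_config(j) for j in self.judge_panel.judges]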

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tisnik
  • VladimirKadlec

🚥 Pre-merge checks: ✅ 3 passed

• Title check ✅ Passed: The title 'add llm pool & judge panel config' accurately and concisely summarizes the main change, adding new configuration entities for LLM pool and judge panel management to the system.
• Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
• Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/models/system.py`:
- Around line 399-454: LLMProviderConfig.api_key_path is parsed but never propagated into the resolved LLMConfig. Add an api_key_path field (and optionally api_base, if planned) to the LLMConfig model, and have resolve_llm_config assign provider.api_key_path (and provider.api_base, if added) when it builds the LLMConfig from LLMProviderConfig, so the API key path is carried forward rather than dropped.
- Around line 570-580: validate_enabled_metrics (a field_validator on "enabled_metrics") only checks that ":" is present, so an empty framework or metric part slips through. Split each metric with metric.split(":", 1), verify there are exactly two non-empty trimmed parts (framework and metric_name), and raise ValueError if either side is missing or blank, keeping the existing error message format for the required "framework:metric_name" structure (a sketch follows this list).
- Around line 645-669: Replace the ValueError used for configuration validation in get_judge_configs with the project-specific ConfigurationError from core.system.exceptions: import it at the top of the file and raise it in get_judge_configs; update the analogous check in get_llm_config as well, so both configuration validation failures use the custom exception type.
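
A minimal sketch of the suggested validator fix, assuming pydantic v2's field_validator; the judges field and the surrounding class shape are assumptions, and only the enabled_metrics check mirrors the comment above:

from pydantic import BaseModel, field_validator


class JudgePanelConfig(BaseModel):
    """Sketch only; the real class has more fields than shown here."""

    judges: list[str]
    enabled_metrics: list[str] = []

    @field_validator("enabled_metrics")
    @classmethod
    def validate_enabled_metrics(cls, metrics: list[str]) -> list[str]:
        # Split on the first ":" only, so metric names may themselves
        # contain colons; both sides must be non-empty after trimming.
        for metric in metrics:
            parts = metric.split(":", 1)
            if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
                raise ValueError(
                    f"Invalid metric '{metric}': expected 'framework:metric_name'"
                )
        return metrics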

In `@tests/unit/core/models/test_system.py`:
- Around line 338-383: Add unit tests covering the SystemConfig error branches: get_judge_configs() when judge_panel is set but llm_pool is None (assert it raises an error mentioning "llm_pool"); get_llm_config("...") on a SystemConfig with no llm_pool (same assertion); and a case where llm_pool exists but a judge id in JudgePanelConfig is missing from llm_pool.models (assert get_judge_configs() raises for the missing judge id). Add these alongside the existing TestSystemConfigWithLLMPoolAndJudgePanel methods so they exercise the get_judge_configs and get_llm_config error paths (a sketch of one such test follows this list).
- Around line 325-327: Replace the instance construction that needs a type ignore with a direct Pydantic validation call: inside the pytest.raises(ValidationError) block, call JudgePanelConfig.model_validate({"judges": [{"provider": "openai"}]}) instead of JudgePanelConfig(judges=[{"provider": "openai"}]), and remove the "# type: ignore[list-item]" comment so the validation path is exercised without silencing type checks.
- Around line 156-172: In test_extra_params_and_to_dict, remove the "# type: ignore[attr-defined]" suppression and assert the extra field via Pydantic's API instead of attribute access: fetch the dynamic field from the validated model (e.g., params.model_dump().get("top_p") or params.model_extra) and assert its value.
- Line 174: TestLLMProviderConfig silences "too-few-public-methods". Remove the "# pylint: disable=too-few-public-methods" and fix the underlying issue: either add another test method (e.g., test_empty_provider_string or test_invalid_provider_values) so the class has more than one public method, or merge test_llm_provider_config into a neighboring test class (e.g., TestSystemConfig) and delete the then-empty class, updating references so tests still run and lint passes.
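
A minimal sketch of one such error-path test; the import path is assumed from the files touched in this PR, and the test assumes SystemConfig() is constructible with llm_pool defaulting to None:

import pytest

# Import path assumed; adjust to wherever SystemConfig is exported.
from lightspeed_evaluation.core.models import SystemConfig


def test_get_llm_config_without_llm_pool():
    """get_llm_config should fail clearly when llm_pool is not configured."""
    config = SystemConfig()  # assumes defaults leave llm_pool as None
    # ValueError today, ConfigurationError once the suggestion above is
    # applied; matching on the message keeps the test stable across both.
    with pytest.raises(Exception, match="llm_pool"):
        config.get_llm_config("some-model")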
🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/models/system.py (2)

511-517: Consider os.path.join for cache directory construction.

String concatenation with "/" isn't portable across platforms, and rstrip("/") won't handle "\\" on Windows. Using os.path.join would be more robust.

Suggested fix
-        base_cache = self.defaults.cache_dir.rstrip("/")
-        cache_dir = (
-            f"{base_cache}/{cache_suffix}"
-            if cache_suffix
-            else f"{base_cache}/{model_id}"
-        )
+        suffix = cache_suffix if cache_suffix else model_id
+        cache_dir = os.path.join(self.defaults.cache_dir, suffix)

594-622: Consider eager cross-validation of judge_panel.judges against llm_pool.models.

Currently, if a judge ID doesn't exist in llm_pool.models, the error surfaces only when get_judge_configs() is called at runtime. A model_validator(mode="after") on SystemConfig could catch misconfiguration at parse time, providing a better developer experience.

Example
@model_validator(mode="after")
def _validate_judge_panel_references(self) -> "SystemConfig":
    if self.judge_panel and self.llm_pool:
        available = set(self.llm_pool.models.keys())
        missing = [j for j in self.judge_panel.judges if j not in available]
        if missing:
            raise ValueError(
                f"Judge IDs {missing} not found in llm_pool.models. "
                f"Available: {sorted(available)}"
            )
    return self

asamal4 force-pushed the judge-panel-config branch from 9696f3d to 92bb19c on Feb 9, 2026 at 09:26

asamal4 commented Feb 9, 2026

@coderabbitai help

lightspeed-core deleted a comment from coderabbitai bot Feb 9, 2026

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/models/system.py`:
- Around line 694-695: The docstring for the method that validates llm_pool says it raises ValueError, but the implementation now raises ConfigurationError. Update the Raises section to list ConfigurationError (including its module path, since it is a custom exception) with a short description such as "If llm_pool is not configured or the model is not found", so the docstring matches the exception actually raised (a docstring sketch follows).
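
A sketch of the corrected Raises section; the method name and signature are assumed, and only the docstring change is the point:

def get_llm_config(self, model_id: str):  # name and signature assumed
    """Resolve a single LLM configuration from the pool.

    Raises:
        ConfigurationError: If llm_pool is not configured or the
            requested model is not found in llm_pool.models.
    """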
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/system.py (1)

506-512: resolve_llm_config raises ValueError instead of ConfigurationError.

This is inconsistent with get_judge_configs and get_llm_config, which were updated to raise ConfigurationError. Since this ValueError propagates uncaught through get_judge_configs, callers see mixed exception types for configuration issues.

Suggested fix
         if model_id not in self.models:
-            raise ValueError(
+            raise ConfigurationError(
                 f"Model '{model_id}' not found in llm_pool.models. "
                 f"Available: {list(self.models.keys())}"
             )

As per coding guidelines: "Use custom exceptions from core.system.exceptions module for error handling."

asamal4 force-pushed the judge-panel-config branch from 92bb19c to 8aacdca on Feb 9, 2026 at 10:34

VladimirKadlec left a comment

Nice, thanks.
LGTM
