
Conversation

asamal4 (Collaborator) commented Feb 9, 2026

Description

  • Introduced new configs for the LLM pool & judge panel (see the config sketch after this list).
  • The LLM pool is defined outside the judge panel so it can also serve other purposes in the future.
  • A nested structure was considered, since many fields are common, but a flat structure was chosen for simplicity.
  • Supported LLM configuration options:
    • the same model with different parameters, e.g. temperature
    • multiple models from the same provider
    • for vLLM-based providers, the entire setup can be unique, since in practice these can be different platforms such as RHOAI or RHAIIS
  • Introduced new properties api_base & api_key_path. So far we have supplied these through environment variables, which was convenient: a new provider could be used without a code change. But with multiple providers, the same LiteLLM environment variables won't work; for example, there can be multiple vLLM-based providers, each with a different key.
  • Model parameters can be supplied dynamically.
  • This PR only adds the new config. Actual usage, documentation, and backward compatibility will be handled separately.
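
A minimal sketch of what the flat config could look like, written as the raw dict a SystemConfig would be validated from. All model ids, model names, URLs, and paths here are hypothetical, and any field not named in this description is an assumption:

# Hypothetical layout; only llm_pool, judge_panel, api_base, and
# api_key_path are taken from this PR description.
raw_config = {
    "llm_pool": {
        "models": {
            # Same model and provider, different parameters.
            "judge_precise": {"provider": "openai", "model": "gpt-4o", "temperature": 0.0},
            "judge_creative": {"provider": "openai", "model": "gpt-4o", "temperature": 0.9},
            # vLLM-based provider with its own endpoint and key file,
            # rather than one shared LiteLLM environment variable.
            "judge_rhoai": {
                "provider": "hosted_vllm",
                "model": "llama-3-70b",
                "api_base": "https://rhoai.example.com/v1",
                "api_key_path": "/run/secrets/rhoai_api_key",
            },
        },
    },
    "judge_panel": {"judges": ["judge_precise", "judge_rhoai"]},
}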

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Exposes shared LLM pool and per-model provider configs for centralized model management.
    • Adds a judge panel for multi-LLM evaluation with validated metric formatting and aggregation strategies.
    • System configuration can now resolve individual and judge LLM configurations (including cache suffix handling).
  • Tests

    • Added comprehensive unit tests for pool, provider, parameters, judge panel, and system integration.
    • Removed an obsolete test module.

coderabbitai bot commented Feb 9, 2026

Walkthrough

Adds an LLM configuration subsystem (LLMParametersConfig, LLMDefaultsConfig, LLMProviderConfig, LLMPoolConfig) and a judge-panel config (JudgePanelConfig), exposes LLMPoolConfig and JudgePanelConfig in the package API, extends SystemConfig with llm_pool/judge_panel and resolution methods, and adds consolidated unit tests.

Changes

• System models: src/lightspeed_evaluation/core/models/system.py
  Adds LLMParametersConfig, LLMDefaultsConfig, LLMProviderConfig, LLMPoolConfig, and JudgePanelConfig, with validators for metrics and aggregation; adds LLMPool.get_model_ids() and resolve_llm_config(); SystemConfig gains llm_pool, judge_panel, get_judge_configs(), and get_llm_config(), with configuration error handling and cache_dir suffix logic.
• Public exports: src/lightspeed_evaluation/core/models/__init__.py
  Imports JudgePanelConfig and LLMPoolConfig and adds them to __all__, making them public exports of the package.
• Unit tests (added): tests/unit/core/models/test_system.py
  New comprehensive tests for LLM configs, LLMParametersConfig serialization, LLMProviderConfig, LLMPoolConfig resolution and cache_dir behavior, JudgePanelConfig validation, SystemConfig integration methods, and error cases.
• Unit tests (removed): tests/unit/core/models/test_system_additional.py
  Removes the legacy test module (tests appear consolidated into the new test_system.py).
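
Based only on the summary above, the resolution flow plausibly chains as sketched below; stub types stand in for the real ones (notably ConfigurationError from core.system.exceptions, and plain dicts instead of the resolved LLM config type), so treat this as a guess at the shape, not the actual code:

from pydantic import BaseModel


class ConfigurationError(Exception):
    """Stand-in for core.system.exceptions.ConfigurationError."""


class LLMPoolConfig(BaseModel):
    # Real values are LLMProviderConfig; plain dicts keep the sketch short.
    models: dict[str, dict] = {}

    def get_model_ids(self) -> list[str]:
        return list(self.models)

    def resolve_llm_config(self, model_id: str) -> dict:
        if model_id not in self.models:
            raise ConfigurationError(
                f"Model '{model_id}' not found in llm_pool.models. "
                f"Available: {list(self.models)}"
            )
        return self.models[model_id]


class JudgePanelConfig(BaseModel):
    judges: list[str]


class SystemConfig(BaseModel):
    llm_pool: LLMPoolConfig | None = None
    judge_panel: JudgePanelConfig | None = None

    def get_llm_config(self, model_id: str) -> dict:
        if self.llm_pool is None:
            raise ConfigurationError("llm_pool is not configured")
        return self.llm_pool.resolve_llm_config(model_id)

    def get_judge_configs(self) -> list[dict]:
        if self.judge_panel is None:
            raise ConfigurationError("judge_panel is not configured")
        return [self.get_llm_config(j) for j in self.judge_panel.judges]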

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tisnik
  • VladimirKadlec

🚥 Pre-merge checks: ✅ 3 passed

• Title check ✅ Passed: The title 'add llm pool & judge panel config' accurately and concisely summarizes the main change, adding new configuration entities for LLM pool and judge panel management to the system.
• Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
• Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/models/system.py`:
- Around line 399-454: LLMProviderConfig.api_key_path is parsed but never propagated into the resolved LLMConfig. Add an api_key_path field (and optionally api_base, if planned) to the LLMConfig model, and have resolve_llm_config assign provider.api_key_path (and provider.api_base, if added) when it builds the LLMConfig from LLMProviderConfig, so the API key path is carried forward rather than dropped.
- Around line 570-580: validate_enabled_metrics (a field_validator on "enabled_metrics") only checks that ":" is present, so an empty framework or metric part slips through. Split each metric with metric.split(":", 1), verify there are exactly two non-empty trimmed parts (framework and metric_name), and raise ValueError if either side is missing or blank, keeping the existing error message format for the required "framework:metric_name" structure (a sketch follows this list).
- Around line 645-669: Replace the ValueError used for configuration validation in get_judge_configs with the project-specific ConfigurationError from core.system.exceptions: import it at the top of the file and raise it in get_judge_configs; update the analogous check in get_llm_config as well, so both configuration validation failures use the custom exception type.
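
A minimal sketch of the suggested validator fix, assuming pydantic v2's field_validator; the judges field and the surrounding class shape are assumptions, and only the enabled_metrics check mirrors the comment above:

from pydantic import BaseModel, field_validator


class JudgePanelConfig(BaseModel):
    """Sketch only; the real class has more fields than shown here."""

    judges: list[str]
    enabled_metrics: list[str] = []

    @field_validator("enabled_metrics")
    @classmethod
    def validate_enabled_metrics(cls, metrics: list[str]) -> list[str]:
        # Split on the first ":" only, so metric names may themselves
        # contain colons; both sides must be non-empty after trimming.
        for metric in metrics:
            parts = metric.split(":", 1)
            if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
                raise ValueError(
                    f"Invalid metric '{metric}': expected 'framework:metric_name'"
                )
        return metrics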

In `@tests/unit/core/models/test_system.py`:
- Around line 338-383: Add unit tests covering the SystemConfig error branches: get_judge_configs() when judge_panel is set but llm_pool is None (assert it raises an error mentioning "llm_pool"); get_llm_config("...") on a SystemConfig with no llm_pool (same assertion); and a case where llm_pool exists but a judge id in JudgePanelConfig is missing from llm_pool.models (assert get_judge_configs() raises for the missing judge id). Add these alongside the existing TestSystemConfigWithLLMPoolAndJudgePanel methods so they exercise the get_judge_configs and get_llm_config error paths (a sketch of one such test follows this list).
- Around line 325-327: Replace the instance construction that needs a type ignore with a direct Pydantic validation call: inside the pytest.raises(ValidationError) block, call JudgePanelConfig.model_validate({"judges": [{"provider": "openai"}]}) instead of JudgePanelConfig(judges=[{"provider": "openai"}]), and remove the "# type: ignore[list-item]" comment so the validation path is exercised without silencing type checks.
- Around line 156-172: In test_extra_params_and_to_dict, remove the "# type: ignore[attr-defined]" suppression and assert the extra field via Pydantic's API instead of attribute access: fetch the dynamic field from the validated model (e.g., params.model_dump().get("top_p") or params.model_extra) and assert its value.
- Line 174: TestLLMProviderConfig silences "too-few-public-methods". Remove the "# pylint: disable=too-few-public-methods" and fix the underlying issue: either add another test method (e.g., test_empty_provider_string or test_invalid_provider_values) so the class has more than one public method, or merge test_llm_provider_config into a neighboring test class (e.g., TestSystemConfig) and delete the then-empty class, updating references so tests still run and lint passes.
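
A minimal sketch of one such error-path test; the import path is assumed from the files touched in this PR, and the test assumes SystemConfig() is constructible with llm_pool defaulting to None:

import pytest

# Import path assumed; adjust to wherever SystemConfig is exported.
from lightspeed_evaluation.core.models import SystemConfig


def test_get_llm_config_without_llm_pool():
    """get_llm_config should fail clearly when llm_pool is not configured."""
    config = SystemConfig()  # assumes defaults leave llm_pool as None
    # ValueError today, ConfigurationError once the suggestion above is
    # applied; matching on the message keeps the test stable across both.
    with pytest.raises(Exception, match="llm_pool"):
        config.get_llm_config("some-model")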
🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/models/system.py (2)

511-517: Consider os.path.join for cache directory construction.

String concatenation with "/" isn't portable across platforms, and rstrip("/") won't handle "\\" on Windows. Using os.path.join would be more robust.

Suggested fix
-        base_cache = self.defaults.cache_dir.rstrip("/")
-        cache_dir = (
-            f"{base_cache}/{cache_suffix}"
-            if cache_suffix
-            else f"{base_cache}/{model_id}"
-        )
+        suffix = cache_suffix if cache_suffix else model_id
+        cache_dir = os.path.join(self.defaults.cache_dir, suffix)

594-622: Consider eager cross-validation of judge_panel.judges against llm_pool.models.

Currently, if a judge ID doesn't exist in llm_pool.models, the error surfaces only when get_judge_configs() is called at runtime. A model_validator(mode="after") on SystemConfig could catch misconfiguration at parse time, providing a better developer experience.

Example
@model_validator(mode="after")
def _validate_judge_panel_references(self) -> "SystemConfig":
    if self.judge_panel and self.llm_pool:
        available = set(self.llm_pool.models.keys())
        missing = [j for j in self.judge_panel.judges if j not in available]
        if missing:
            raise ValueError(
                f"Judge IDs {missing} not found in llm_pool.models. "
                f"Available: {sorted(available)}"
            )
    return self

asamal4 force-pushed the judge-panel-config branch from 9696f3d to 92bb19c on Feb 9, 2026 at 09:26

asamal4 commented Feb 9, 2026

@coderabbitai help

lightspeed-core deleted a comment from coderabbitai bot Feb 9, 2026

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/models/system.py`:
- Around line 694-695: The docstring for the method that validates llm_pool says it raises ValueError, but the implementation now raises ConfigurationError. Update the Raises section to list ConfigurationError (including its module path, since it is a custom exception) with a short description such as "If llm_pool is not configured or the model is not found", so the docstring matches the exception actually raised (a docstring sketch follows).
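
A sketch of the corrected Raises section; the method name and signature are assumed, and only the docstring change is the point:

def get_llm_config(self, model_id: str):  # name and signature assumed
    """Resolve a single LLM configuration from the pool.

    Raises:
        ConfigurationError: If llm_pool is not configured or the
            requested model is not found in llm_pool.models.
    """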
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/system.py (1)

506-512: resolve_llm_config raises ValueError instead of ConfigurationError.

This is inconsistent with get_judge_configs and get_llm_config, which were updated to raise ConfigurationError. Since this ValueError propagates uncaught through get_judge_configs, callers see mixed exception types for configuration issues.

Suggested fix
         if model_id not in self.models:
-            raise ValueError(
+            raise ConfigurationError(
                 f"Model '{model_id}' not found in llm_pool.models. "
                 f"Available: {list(self.models.keys())}"
             )

As per coding guidelines: "Use custom exceptions from core.system.exceptions module for error handling."

asamal4 force-pushed the judge-panel-config branch from 92bb19c to 8aacdca on Feb 9, 2026 at 10:34

VladimirKadlec left a comment

Nice, thanks.
LGTM
