Conversation

MatsErdkamp
@MatsErdkamp MatsErdkamp commented Oct 2, 2025

Fixes #8689

This draft PR aims to solve that issue. It allows subscores to be defined in metrics, and declared subscores can be seen by optimizers. An upstream issue that requires this change can be found in the GEPA repo.

Example code

    def compute_overall_score(gold, pred, trace, pred_name=None, pred_trace=None):
        metrics = compute_metrics(gold, pred, trace, pred_name, pred_trace)
        # Declare named subscores so optimizers such as GEPA can see them;
        # each call also returns the raw value for use in the overall score.
        quality = subscore("quality", metrics.quality)
        leakage = subscore("leakage", 1.0 - metrics.leakage)

        return (quality + leakage) / 2.0
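The `subscore` helper itself is not shown in this thread; as a rough sketch (the registry mechanics and function names below are my assumption, not the PR's actual implementation), it could record each named component in a per-call registry while passing the value through unchanged, so the metric can still combine the components arithmetically:

```python
# Hypothetical sketch of a subscore registry; not the PR's actual code.
_subscores: dict[str, float] = {}

def subscore(name: str, value: float) -> float:
    """Record a named subscore and return the raw float unchanged."""
    _subscores[name] = float(value)
    return float(value)

def collect_subscores() -> dict[str, float]:
    """Return and clear the subscores recorded during one metric call.

    An optimizer could call this after invoking the metric to read the
    declared components alongside the scalar overall score.
    """
    recorded = dict(_subscores)
    _subscores.clear()
    return recorded
```

With this sketch, `compute_overall_score` above would leave `{"quality": ..., "leakage": ...}` in the registry for the optimizer to collect after each metric call.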

In the future it would also be nice to adapt mlflow.dspy to autolog the subscores as evaluations.

I would love to discuss the implementation on this draft PR. There are some fairly big changes that warrant discussion of syntax, naming, and implementation.

Commits

- Add multi-objective metric support with subscores
- Refactor metric handling: Replace Scores with Score class for improved clarity and functionality

This update modifies metric handling throughout the codebase, transitioning from the Scores class to the new Score class. The Score class encapsulates a scalar value together with subscores, enhancing the metric evaluation process. Adjustments were made in various modules, including evaluation, metrics, and teleprompt utilities, to ensure compatibility with the new structure. Documentation and tests were updated to reflect these changes.
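The Score class itself is not reproduced in this conversation; one plausible shape, sketched under the assumption that it behaves like an ordinary float while exposing its components (this is my illustration, not the PR's actual code), is a float subclass carrying a `subscores` dict:

```python
class Score(float):
    """Hypothetical sketch: a float subclass carrying named subscores.

    Code that treats a metric's return value as a plain scalar keeps
    working, while optimizers can additionally inspect `.subscores`.
    """

    def __new__(cls, value, subscores=None):
        obj = super().__new__(cls, value)
        obj.subscores = dict(subscores or {})
        return obj
```

A value like `Score(0.625, {"quality": 0.75, "leakage": 0.5})` then compares, adds, and averages exactly like the float `0.625`, which keeps existing evaluation code backward compatible.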
@MatsErdkamp MatsErdkamp marked this pull request as ready for review October 10, 2025 08:58



Successfully merging this pull request may close these issues.

[Feature] Allow metrics to return multiple scores, and allow optimizers like GEPA to support multi-score optimization

1 participant