Conversation

MatsErdkamp
@MatsErdkamp MatsErdkamp commented Oct 2, 2025

Fixes #8689

This draft PR aims to solve that issue. It allows subscores to be defined in metrics, and declared subscores can be seen by optimizers. An upstream issue that requires this change can be found in the GEPA repo.

Example code

    def compute_overall_score(gold, pred, trace, pred_name=None, pred_trace=None):
        metrics = compute_metrics(gold, pred, trace, pred_name, pred_trace)
        # Declare named subscores so optimizers such as GEPA can see them;
        # each call also returns the raw value for use in the overall score.
        quality = subscore("quality", metrics.quality)
        leakage = subscore("leakage", 1.0 - metrics.leakage)

        return (quality + leakage) / 2.0
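The `subscore` helper itself is not shown in this thread; as a rough sketch (the registry mechanics and function names below are my assumption, not the PR's actual implementation), it could record each named component in a per-call registry while passing the value through unchanged, so the metric can still combine the components arithmetically:

```python
# Hypothetical sketch of a subscore registry; not the PR's actual code.
_subscores: dict[str, float] = {}

def subscore(name: str, value: float) -> float:
    """Record a named subscore and return the raw float unchanged."""
    _subscores[name] = float(value)
    return float(value)

def collect_subscores() -> dict[str, float]:
    """Return and clear the subscores recorded during one metric call.

    An optimizer could call this after invoking the metric to read the
    declared components alongside the scalar overall score.
    """
    recorded = dict(_subscores)
    _subscores.clear()
    return recorded
```

With this sketch, `compute_overall_score` above would leave `{"quality": ..., "leakage": ...}` in the registry for the optimizer to collect after each metric call.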

In the future it would also be nice to adapt mlflow.dspy to autolog the subscores as evaluations.

I would love to discuss the implementation on this draft PR. There are some fairly big changes that warrant discussion of syntax, naming, and implementation.

Commits

- Add multi-objective metric support with subscores
- Refactor metric handling: Replace Scores with Score class for improved clarity and functionality

This update modifies metric handling throughout the codebase, transitioning from the Scores class to the new Score class. The Score class encapsulates a scalar value together with subscores, enhancing the metric evaluation process. Adjustments were made in various modules, including evaluation, metrics, and teleprompt utilities, to ensure compatibility with the new structure. Documentation and tests were updated to reflect these changes.
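The Score class itself is not reproduced in this conversation; one plausible shape, sketched under the assumption that it behaves like an ordinary float while exposing its components (this is my illustration, not the PR's actual code), is a float subclass carrying a `subscores` dict:

```python
class Score(float):
    """Hypothetical sketch: a float subclass carrying named subscores.

    Code that treats a metric's return value as a plain scalar keeps
    working, while optimizers can additionally inspect `.subscores`.
    """

    def __new__(cls, value, subscores=None):
        obj = super().__new__(cls, value)
        obj.subscores = dict(subscores or {})
        return obj
```

A value like `Score(0.625, {"quality": 0.75, "leakage": 0.5})` then compares, adds, and averages exactly like the float `0.625`, which keeps existing evaluation code backward compatible.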
@MatsErdkamp MatsErdkamp marked this pull request as ready for review October 10, 2025 08:58



Successfully merging this pull request may close these issues.

[Feature] Allow metrics to return multiple scores, and allow optimizers like GEPA to support multi-score optimization

1 participant