Is your feature request related to a problem? Please describe.
Evals CLI commands should have an option to dump output to STDOUT and show general aggregate metrics.
I think a --verbose flag would be great. I also think summary statistics (e.g. mean or p50 scores for each evaluator type) should be printed for every run, since that adds minimal log spam and gives a nice confirmation that the evaluator truly ran and produced meaningful results. Also, print a URL in the console that opens the results for that run in the Genkit console, if it is running.
Having spent some time working with the Gen UI codebase, which has 50-100 eval data points (it doesn't use Genkit), a common workflow looks like:
1. Make a change to the code
2. Run the eval pipeline
3. Check whether the summary metrics are better or worse
4. (optional) Investigate traces to understand what changed
Very often, you are just trying all kinds of tweaks and doing only steps 1-3, so the summary metrics are enough.
We should add the URL to view the results. This is now possible since the runtime is already up and running.
Summary metrics are useful, but this needs more thought. Not all metrics can be summarized by calculating a mean; e.g., enum scores, boolean scores, and string scores are all valid in Genkit. We can start with support for numeric scores and improve as we go.
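To make the point concrete, here is a rough sketch of what type-aware aggregation could look like. This is not Genkit's actual API; the `EvalScore` union and `Summary` shape are assumptions for illustration. Numeric scores get mean/p50, boolean scores get a pass rate, and enum/string scores are left unsummarized for now.

```typescript
// Sketch only: score shapes are assumptions, not Genkit's real types.
type EvalScore = number | boolean | string;

interface Summary {
  count: number;
  mean?: number;     // numeric scores only
  p50?: number;      // numeric scores only
  passRate?: number; // boolean scores only
}

function summarize(scores: EvalScore[]): Summary {
  const summary: Summary = { count: scores.length };
  const numeric = scores.filter((s): s is number => typeof s === "number");
  const bools = scores.filter((s): s is boolean => typeof s === "boolean");

  if (numeric.length > 0) {
    summary.mean = numeric.reduce((a, b) => a + b, 0) / numeric.length;
    const sorted = [...numeric].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    summary.p50 =
      sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
  }
  if (bools.length > 0) {
    summary.passRate = bools.filter(Boolean).length / bools.length;
  }
  // Enum/string scores are skipped here; a frequency count per label
  // could be added later, as discussed above.
  return summary;
}
```

A frequency-count fallback for enum/string scores would be a natural next step once numeric support lands.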
A --verbose flag that outputs the results to STDOUT is a nice-to-have.
Additional context
Filed on behalf of @jacobsimionato