The evaluation UI currently supports only ROUGE, but the backend (adk-python) already allows multiple metrics through RunEvalRequest and exposes them via /metrics-info. This issue proposes adding UI support for dynamically selecting metrics (e.g., ROUGE, BERTScore, LLM-as-judge, path accuracy) and displaying multiple evaluation results.
No backend changes are required; only frontend updates are needed (see the sketch after this list) to:
- Fetch available metrics from /metrics-info
- Add a metric selection dropdown in the evaluation panel
- Include selected metrics in the /eval-sets/{eval_set_id}/run request payload
- Render multiple metric results dynamically in the results view
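A minimal TypeScript sketch of that flow, assuming hypothetical response shapes (`MetricInfo`, `EvalMetricResult`) and payload field names; the real schemas come from the backend's RunEvalRequest and /metrics-info definitions and should be mirrored exactly:

```typescript
// Assumed shapes for illustration only; align with the backend schemas.
interface MetricInfo {
  metricName: string;      // e.g. "rouge_1" (field name is an assumption)
  description?: string;
}

interface EvalMetricResult {
  metricName: string;
  score: number;
  threshold?: number;
}

// 1. Fetch available metrics to populate the selection dropdown.
async function fetchAvailableMetrics(baseUrl: string): Promise<MetricInfo[]> {
  const res = await fetch(`${baseUrl}/metrics-info`);
  if (!res.ok) throw new Error(`Failed to load metrics: ${res.status}`);
  return res.json();
}

// 2. Include the selected metrics in the run request payload.
async function runEvalSet(
  baseUrl: string,
  evalSetId: string,
  selectedMetrics: string[],
): Promise<EvalMetricResult[]> {
  const res = await fetch(`${baseUrl}/eval-sets/${evalSetId}/run`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Assumed payload key; use whatever RunEvalRequest actually expects.
    body: JSON.stringify({ evalMetrics: selectedMetrics }),
  });
  if (!res.ok) throw new Error(`Eval run failed: ${res.status}`);
  return res.json();
}

// 3. Render one row per metric result instead of a hard-coded ROUGE value.
function renderResults(results: EvalMetricResult[]): string {
  return results
    .map(r => `${r.metricName}: ${r.score.toFixed(3)}`)
    .join('\n');
}
```

In the actual app these calls would live in an Angular service and the results view would iterate over the returned list, so adding a new backend metric requires no further UI changes.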