Is there a way to get back the eval_metric_results_with_invocations, or in particular the raw agent responses, when running the evaluation using pytest?

Replies: 3 comments
Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: When you run evaluations, the detailed results are saved to local result files that you can inspect after the run.

Hello! Yes, there is a way to access the detailed evaluation results. When you run an ADK evaluation, the per-invocation results are persisted by a results-manager component once the run completes, so you can load and inspect them from disk afterwards.
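As a follow-up, here is a minimal sketch of how one could load and inspect such saved result files after a run. The output directory (`.adk/eval_history`) and the assumption that results are stored as JSON are not confirmed by the reply above; adjust the path and keys to whatever your evaluation run actually writes:

```python
import json
from pathlib import Path

# Assumed location of the saved evaluation results; adjust to the directory
# your evaluation run actually writes to.
RESULTS_DIR = Path("home_automation_agent/.adk/eval_history")


def load_eval_results(results_dir: Path) -> list[dict]:
    """Load every saved eval result file (assumed to be JSON) in the directory."""
    results = []
    for result_file in sorted(results_dir.glob("*.json")):
        with result_file.open() as f:
            results.append(json.load(f))
    return results


if __name__ == "__main__":
    for result in load_eval_results(RESULTS_DIR):
        # The exact keys depend on the saved schema; print a truncated dump
        # of each result so you can see what fields are available.
        print(json.dumps(result, indent=2)[:500])
```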
Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: Yes, you can access the detailed evaluation results, including raw agent responses, by capturing the return value of the `AgentEvaluator.evaluate` call.

Hello! That's a great question. When you run an ADK evaluation using `AgentEvaluator.evaluate` in a pytest test, the call returns a list of `EvalSetResult` objects, and each of these contains the per-case and per-invocation results, including the raw agent responses. Here is an example of how you can modify your test to capture and inspect them:

```python
import pytest

from google.adk.evaluation.agent_evaluator import AgentEvaluator


@pytest.mark.asyncio
async def test_with_single_test_file_and_get_results():
    """Test the agent and access the detailed evaluation results."""
    eval_results = await AgentEvaluator.evaluate(
        agent_module="home_automation_agent",
        eval_dataset_file_path_or_dir="tests/integration/fixture/home_automation_agent/simple_test.test.json",
    )

    # eval_results is a list of EvalSetResult objects.
    # You can now iterate through them to access detailed results.
    for eval_set_result in eval_results:
        print(f"Results for Eval Set: {eval_set_result.eval_set_id}")
        for case_result in eval_set_result.eval_case_results:
            print(f"  Results for Eval Case: {case_result.eval_id}")
            # This contains the detailed invocation and metric results.
            for result_per_invocation in case_result.eval_metric_result_per_invocation:
                # 'result_per_invocation' is an EvalMetricResultPerInvocation object
                # which contains the invocation details.
                print(f"    Invocation: {result_per_invocation.invocation}")
                print(
                    f"    Metric '{result_per_invocation.metric_name}' "
                    f"Score: {result_per_invocation.score}"
                )
```
This approach allows you to programmatically access and assert on the raw outputs of your agent during your automated tests.

[1] https://google.github.io/adk-docs/evaluate/
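To make that concrete, here is a sketch that turns the prints above into assertions. The attribute names (`eval_case_results`, `eval_metric_result_per_invocation`, `score`, `invocation`) simply mirror the example above, and the `MIN_SCORE` threshold is an assumption to adapt to whatever metric and threshold your eval set actually uses:

```python
import pytest

from google.adk.evaluation.agent_evaluator import AgentEvaluator

# Assumed minimum acceptable score; adjust to the threshold configured
# for your metric.
MIN_SCORE = 0.8


@pytest.mark.asyncio
async def test_all_invocations_meet_threshold():
    """Fail the test if any per-invocation metric score falls below MIN_SCORE."""
    eval_results = await AgentEvaluator.evaluate(
        agent_module="home_automation_agent",
        eval_dataset_file_path_or_dir="tests/integration/fixture/home_automation_agent/simple_test.test.json",
    )

    for eval_set_result in eval_results:
        for case_result in eval_set_result.eval_case_results:
            for per_invocation in case_result.eval_metric_result_per_invocation:
                # Assert on the raw per-invocation score instead of only printing it.
                assert per_invocation.score >= MIN_SCORE, (
                    f"Case {case_result.eval_id}: score {per_invocation.score} "
                    f"below {MIN_SCORE} for invocation {per_invocation.invocation}"
                )
```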