This guide walks you through setting up Evidently's GitHub Action for data validation and automated regression testing of your Language Model agent's outputs, both locally and using Evidently Cloud.
The Evidently Report GitHub Action is a CI-friendly wrapper around the Evidently CLI for running data checks, LLM evaluations, and monitoring reports.
It lets you:
- Run descriptors, metrics, and tests as configured in a JSON or Python config file
- Use local datasets or datasets stored in Evidently Cloud
- Save results either locally or to Evidently Cloud
- Fail the workflow if any test fails
- Auto-post report links into GitHub commits or upload result artifacts
Key options:
- config_path – path to your config file (Python or JSON)
- input_path – path to your input dataset (a local file or a cloud:// URI)
- output – where to save results (a local directory or a cloud:// URI)
- api_key – your Evidently Cloud API key, if needed
In this tutorial, we'll apply this action to test an LLM agent's prompt or code changes via CI:
- A dataset of test prompts/questions will be prepared (either locally or in Evidently Cloud)
- Our LLM agent code will generate answers for each prompt (in CI)
- The Evidently Report Action will:
- Run descriptors (LLM judges or custom descriptors)
- Run metrics and tests
- Fail the run if any test fails
- Save results to local storage or Evidently Cloud
This lets you automatically verify, in CI, that your latest prompt or agent code changes still behave correctly on a controlled test set.
Your repo will look like this:
my-llm-agent/
βββ .github/
β βββ workflows/
β βββ evidently.yml
βββ src/
β βββ my_agent.py
β βββ evidently_config.py
βββ requirements.txt
βββ README.md
Create a new repository on GitHub (via UI). Clone it locally:
git clone https://github.com/your-username/my-llm-agent.git
cd my-llm-agent
- Create an Evidently Cloud account
- Create a new project
- Upload your test dataset there, for example a CSV with a question column:
question
"What is your name?"
"How are you?"
"Tell me a joke."
- Copy the dataset_id of your uploaded dataset
- Create an API key from your account settings
- In your GitHub repo, go to Settings → Secrets and variables → Actions → New repository secret and name it EVIDENTLY_API_KEY
- Add any other secrets your agent might need, e.g. OPENAI_API_KEY
For this tutorial, we will create the most basic agent possible.
Create src/my_agent.py:
# src/my_agent.py
from agents import Agent, WebSearchTool, Runner

# A minimal assistant agent with web search enabled
my_agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    model="gpt-4.1",
    tools=[
        WebSearchTool(),
    ],
)

def answer(question: str) -> str:
    # Run the agent synchronously and return its final text output
    response = Runner.run_sync(my_agent, question)
    return response.final_output
(Replace the function logic with your actual agent code later.)
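Before wiring the agent into CI, you can smoke-test it locally with a throwaway script. The file name below is just an example, not part of the repo layout above; it assumes the openai-agents package is installed, OPENAI_API_KEY is set in your environment, and you run it from the repo root:
# smoke_test.py (example)
import os

from src.my_agent import answer

if __name__ == "__main__":
    # The agent needs an OpenAI key; fail fast if it is missing
    assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY first"
    print(answer("What is your name?"))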
Create src/evidently_config.py.
This config defines:
- A descriptor to run your agent
- A descriptor to check your agent's answers
- Optionally a test summary metric
# src/evidently_config.py
from evidently import ColumnType
from evidently.cli.report import ReportConfig
from evidently.core.datasets import DatasetColumn
from evidently.metrics import MinValue
from evidently.tests import gte, eq
from evidently.descriptors import NegativityLLMEval, WordCount, CustomColumnDescriptor
from src.my_agent import answer
def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
    # Run the agent on every question in the column and return the answers as a new text column
    return DatasetColumn(ColumnType.Text, col.data.apply(answer))
descr_conf = ReportConfig(
    descriptors=[
        # Generate an "answer" column by running the agent on each question
        CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
        # LLM judge: expect every answer to be labeled POSITIVE
        NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity",
                          tests=[eq("POSITIVE", column="answer_negativity")]),
        WordCount("answer", alias="answer_word_count"),
    ],
    # Require every answer to contain at least two words
    metrics=[MinValue(column="answer_word_count", tests=[gte(2)])],
)
(Adjust descriptor parameters to your needs later.)
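If you want to check the descriptor wiring before pushing, you can apply answer_descriptor to a small in-memory column. This is a rough local sketch (the script name is just an example); it calls your real agent, so OPENAI_API_KEY must be set, and it only relies on the DatasetColumn and ColumnType classes already used in the config:
# quick_check.py (example, run from the repo root)
import pandas as pd

from evidently import ColumnType
from evidently.core.datasets import DatasetColumn
from src.evidently_config import answer_descriptor

questions = DatasetColumn(ColumnType.Text, pd.Series(["What is your name?", "Tell me a joke."]))
answers = answer_descriptor(questions)
print(answers.data.tolist())  # one agent answer per question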
Create requirements.txt:
evidently[llm]
openai-agents
Create .github/workflows/evidently.yml:
name: Cloud Evidently Check
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types:
- opened
- reopened
- synchronize
- ready_for_review
workflow_dispatch:
permissions:
contents: read
statuses: write
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
evidently-check:
name: Evidently Report
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
cache-dependency-path: requirements.txt
- name: "Install requirements"
run: pip install -r requirements.txt
- uses: evidentlyai/evidently-report-action@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
with:
config_path: src/evidently_config.py
api_key: ${{ secrets.EVIDENTLY_API_KEY }}
input_path: cloud://<dataset_id>
output: cloud://<project_id>
Notes:
- Replace <dataset_id> with the ID of the dataset you uploaded to Evidently Cloud
- Replace <project_id> with the ID of your Evidently Cloud project
Every time you push a commit or open a PR, CI will:
- Download the test dataset from Evidently Cloud
- Run your agent against it
- Evaluate the responses via Evidently
- Upload the results
- Fail the workflow if any test fails
- Attach a link to the report in GitHub Check Annotations
In this tutorial, you'll set up a GitHub Actions workflow to evaluate your LLM agent locally using Evidently's CLI inside CI, without using Evidently Cloud.
Key differences from Cloud run:
- The test dataset is stored directly in your repository.
- The evaluation results are saved as local files inside the CI runner.
- No API keys or Evidently Cloud setup is needed.
- Optionally, result files can be uploaded as GitHub Actions artifacts.
Your repo structure is almost the same, with one extra data/ folder:
my-llm-agent/
├── .github/
│   └── workflows/
│       └── evidently.yml
├── data/
│   └── test_questions.csv
├── src/
│   ├── my_agent.py
│   └── evidently_config.py
├── requirements.txt
└── README.md
Create data/test_questions.csv:
question
"What is your name?"
"How are you?"
"Tell me a joke."
(Replace or extend with your real test questions later.)
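If you want to be sure the CSV is well-formed and uses the column name the config expects, a quick pandas check (run from the repo root) could look like this; the script name is just an example:
# verify_dataset.py (example)
import pandas as pd

df = pd.read_csv("data/test_questions.csv")
assert "question" in df.columns, "evidently_config.py expects a 'question' column"
print(f"{len(df)} test questions loaded")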
Re-use the same files as in the Cloud setup:
- src/my_agent.py
- src/evidently_config.py
- requirements.txt
No changes are needed.
Create .github/workflows/evidently.yml:
name: Local Evidently Check
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types:
- opened
- reopened
- synchronize
- ready_for_review
workflow_dispatch:
permissions:
contents: read
statuses: write
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
evidently-check:
name: Evidently Report
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
cache-dependency-path: requirements.txt
- name: "Install requirements"
run: pip install -r requirements.txt
- uses: evidentlyai/evidently-report-action@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
with:
config_path: src/evidently_config.py
input_path: data/test_questions.csv
output: report.json
upload_artifacts: 'true'
Notes:
- No API key or cloud dataset ID is needed
- The dataset is loaded from data/test_questions.csv in your repo
- Results are saved locally to the path given in output (here, report.json)
- Artifacts are uploaded if upload_artifacts is set to 'true'
- To load results back into Python, use Dataset.load(path) or Run.load(path) (see the sketch below)
- CI fails if any test fails
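For example, to inspect results from a downloaded artifact in a local Python session, a sketch like the following may help. It assumes the output file sits in your working directory, that Dataset is importable from the top-level evidently package, and that the loaded object can be turned into a DataFrame; adjust to the actual objects your Evidently version returns:
# inspect_results.py (example)
from evidently import Dataset

result = Dataset.load("report.json")  # path matches the workflow's `output` setting
df = result.as_dataframe()            # assumption: the loaded Dataset exposes as_dataframe()
# Column names follow the aliases set in evidently_config.py
print(df[["question", "answer", "answer_negativity", "answer_word_count"]].head())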
On every commit:
- CI uses your local test dataset
- Runs your agent on it
- Runs Evidently descriptors and metrics
- Saves results locally
- Uploads report artifacts
- Fails the workflow if any test fails
This setup makes it possible to automatically check that changes to your LLM agent's prompt or logic still produce acceptable, positive results on a controlled test set in CI, using either local files or integrated Evidently Cloud datasets and storage.