This guide walks you through setting up Evidently's GitHub Action for data validation and automated regression testing of your Language Model agent's outputs, both locally and using Evidently Cloud.
The Evidently Report GitHub Action is a CI-friendly wrapper around the Evidently CLI for running data checks, LLM evaluations, and monitoring reports.
It lets you:
- Run descriptors, metrics, and tests as configured in a JSON or Python config file
- Use local datasets or datasets stored in Evidently Cloud
- Save results either locally or to Evidently Cloud
- Fail the workflow if any test fails
- Auto-post report links into GitHub commits or upload result artifacts
Key options:
- config_path – path to your config file (Python or JSON)
- input_path – path to your input dataset (a local file or a cloud:// URI)
- output – where to save results (a local directory or a cloud:// URI)
- api_key – your Evidently Cloud API key, if needed
In this tutorial, we'll apply this action to test an LLM agent's prompt or code changes via CI:
- A dataset of test prompts/questions will be prepared (either locally or in Evidently Cloud)
- Our LLM agent code will generate answers for each prompt (in CI)
- The Evidently Report Action will:
- Run descriptors (LLM judges or custom descriptors)
- Run metrics and tests
- Fail the run if any test fails
- Save results to local storage or Evidently Cloud
This lets you automatically verify, in CI, that your latest prompt or agent code changes still behave correctly on a controlled test set.
Your repo will look like this:
my-llm-agent/
βββ .github/
β βββ workflows/
β βββ evidently.yml
βββ src/
β βββ my_agent.py
β βββ evidently_config.py
βββ requirements.txt
βββ README.md
Create a new repository on GitHub (via UI). Clone it locally:
git clone https://github.com/your-username/my-llm-agent.git
cd my-llm-agent
- Create an Evidently Cloud account
- Create a new project
- Upload your test dataset there, for example a CSV with a question column:
question
"What is your name?"
"How are you?"
"Tell me a joke."
- Copy the dataset_id of your uploaded dataset
- Create an API key from your account settings
- In your GitHub repo, go to Settings → Secrets and variables → Actions → New repository secret and name it EVIDENTLY_API_KEY
- Add any other secrets your agent might need, e.g. OPENAI_API_KEY
For this tutorial, we will create the most basic agent possible.
Create src/my_agent.py:
# src/my_agent.py
from agents import Agent, WebSearchTool, Runner

# A minimal assistant agent with web search enabled
my_agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    model="gpt-4.1",
    tools=[
        WebSearchTool(),
    ],
)

def answer(question: str) -> str:
    # Run the agent synchronously and return its final text output
    response = Runner.run_sync(my_agent, question)
    return response.final_output
(Replace the function logic with your actual agent code later.)
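Before wiring the agent into CI, you can smoke-test it locally with a throwaway script. The file name below is just an example, not part of the repo layout above; it assumes the openai-agents package is installed, OPENAI_API_KEY is set in your environment, and you run it from the repo root:
# smoke_test.py (example)
import os

from src.my_agent import answer

if __name__ == "__main__":
    # The agent needs an OpenAI key; fail fast if it is missing
    assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY first"
    print(answer("What is your name?"))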
Create src/evidently_config.py.
This config defines:
- A descriptor to run your agent
- A descriptor to check your agent's answers
- Optionally a test summary metric
# src/evidently_config.py
from evidently import ColumnType
from evidently.cli.report import ReportConfig
from evidently.core.datasets import DatasetColumn
from evidently.metrics import MinValue
from evidently.tests import gte, eq
from evidently.descriptors import NegativityLLMEval, WordCount, CustomColumnDescriptor
from src.my_agent import answer
def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
    # Run the agent on every question in the column and return the answers as a new text column
    return DatasetColumn(ColumnType.Text, col.data.apply(answer))
descr_conf = ReportConfig(
    descriptors=[
        # Generate an "answer" column by running the agent on each question
        CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
        # LLM judge: expect every answer to be labeled POSITIVE
        NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity",
                          tests=[eq("POSITIVE", column="answer_negativity")]),
        WordCount("answer", alias="answer_word_count"),
    ],
    # Require every answer to contain at least two words
    metrics=[MinValue(column="answer_word_count", tests=[gte(2)])],
)
(Adjust descriptor parameters to your needs later.)
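If you want to check the descriptor wiring before pushing, you can apply answer_descriptor to a small in-memory column. This is a rough local sketch (the script name is just an example); it calls your real agent, so OPENAI_API_KEY must be set, and it only relies on the DatasetColumn and ColumnType classes already used in the config:
# quick_check.py (example, run from the repo root)
import pandas as pd

from evidently import ColumnType
from evidently.core.datasets import DatasetColumn
from src.evidently_config import answer_descriptor

questions = DatasetColumn(ColumnType.Text, pd.Series(["What is your name?", "Tell me a joke."]))
answers = answer_descriptor(questions)
print(answers.data.tolist())  # one agent answer per question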
Create requirements.txt:
evidently[llm]
openai-agents
Create .github/workflows/evidently.yml:
name: Cloud Evidently Check
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types:
- opened
- reopened
- synchronize
- ready_for_review
workflow_dispatch:
permissions:
contents: read
statuses: write
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
evidently-check:
name: Evidently Report
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
cache-dependency-path: requirements.txt
- name: "Install requirements"
run: pip install -r requirements.txt
- uses: evidentlyai/evidently-report-action@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
with:
config_path: src/evidently_config.py
api_key: ${{ secrets.EVIDENTLY_API_KEY }}
input_path: cloud://<dataset_id>
output: cloud://<project_id>
Notes:
- Replace <dataset_id> with the ID of the dataset you uploaded to Evidently Cloud
- Replace <project_id> with the ID of your Evidently Cloud project
Every time you push a commit or open a PR, CI will:
- Download the test dataset from Evidently Cloud
- Run your agent against it
- Evaluate the responses via Evidently
- Upload the results
- Fail the workflow if any test fails
- Attach a link to the report in GitHub Check Annotations
In this tutorial, you'll set up a GitHub Actions workflow to evaluate your LLM agent locally using Evidently's CLI inside CI, without using Evidently Cloud.
Key differences from Cloud run:
- The test dataset is stored directly in your repository.
- The evaluation results are saved as local files inside the CI runner.
- No API keys or Evidently Cloud setup is needed.
- Optionally, result files can be uploaded as GitHub Actions artifacts.
Your repo structure is almost the same, with one extra data/ folder:
my-llm-agent/
├── .github/
│   └── workflows/
│       └── evidently.yml
├── data/
│   └── test_questions.csv
├── src/
│   ├── my_agent.py
│   └── evidently_config.py
├── requirements.txt
└── README.md
Create data/test_questions.csv:
question
"What is your name?"
"How are you?"
"Tell me a joke."
(Replace or extend with your real test questions later.)
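If you want to be sure the CSV is well-formed and uses the column name the config expects, a quick pandas check (run from the repo root) could look like this; the script name is just an example:
# verify_dataset.py (example)
import pandas as pd

df = pd.read_csv("data/test_questions.csv")
assert "question" in df.columns, "evidently_config.py expects a 'question' column"
print(f"{len(df)} test questions loaded")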
Re-use the same files as in the Cloud setup:
- src/my_agent.py
- src/evidently_config.py
- requirements.txt
No changes are needed.
Create .github/workflows/evidently.yml:
name: Local Evidently Check
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types:
- opened
- reopened
- synchronize
- ready_for_review
workflow_dispatch:
permissions:
contents: read
statuses: write
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
evidently-check:
name: Evidently Report
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
cache-dependency-path: requirements.txt
- name: "Install requirements"
run: pip install -r requirements.txt
- uses: evidentlyai/evidently-report-action@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
with:
config_path: src/evidently_config.py
input_path: data/test_questions.csv
output: report.json
upload_artifacts: 'true'
Notes:
- No API key or cloud dataset ID is needed
- The dataset is loaded from data/test_questions.csv in your repo
- Results are saved locally to the path given in output (here, report.json)
- Artifacts are uploaded if upload_artifacts is set to 'true'
- To load results back into Python, use Dataset.load(path) or Run.load(path) (see the sketch below)
- CI fails if any test fails
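For example, to inspect results from a downloaded artifact in a local Python session, a sketch like the following may help. It assumes the output file sits in your working directory, that Dataset is importable from the top-level evidently package, and that the loaded object can be turned into a DataFrame; adjust to the actual objects your Evidently version returns:
# inspect_results.py (example)
from evidently import Dataset

result = Dataset.load("report.json")  # path matches the workflow's `output` setting
df = result.as_dataframe()            # assumption: the loaded Dataset exposes as_dataframe()
# Column names follow the aliases set in evidently_config.py
print(df[["question", "answer", "answer_negativity", "answer_word_count"]].head())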
On every commit:
- CI uses your local test dataset
- Runs your agent on it
- Runs Evidently descriptors and metrics
- Saves results locally
- Uploads report artifacts
- Fails the workflow if any test fails
This setup makes it possible to automatically check that changes to your LLM agent's prompt or logic still produce acceptable, positive results on a controlled test set in CI, using either local files or integrated Evidently Cloud datasets and storage.