📖 Tutorial: Automated LLM Output Testing with the Evidently GitHub Action

This guide walks you through setting up Evidently's GitHub Action for data validation and automated regression testing of your LLM agent's outputs, both locally and with Evidently Cloud.


📖 How the Evidently Report Action Works

The Evidently Report GitHub Action is a CI-friendly wrapper around the Evidently CLI for running data checks, LLM evaluations, and monitoring reports.

It lets you:

  • Run descriptors, metrics, and tests as configured in a JSON or Python config file
  • Use local datasets or datasets stored in Evidently Cloud
  • Save results either locally or to Evidently Cloud
  • Fail the workflow if any test fails
  • Auto-post report links into GitHub commits or upload result artifacts

Key options:

  • config_path: path to your config file (Python or JSON)
  • input_path: path to your input dataset (a local file or a cloud:// URI)
  • output: where to save results (a local directory or a cloud:// URI)
  • api_key: Evidently Cloud API key, if needed

📌 How We'll Use It in This Example

In this tutorial, we'll use the action to test an LLM agent's prompt or code changes in CI:

  • A dataset of test prompts/questions will be prepared (either locally or in Evidently Cloud)

  • Our LLM agent code will generate answers for each prompt (in CI)

  • The Evidently Report Action will:

    • Run descriptors (LLM judges or custom descriptors)
    • Run metrics and tests
    • Fail the run if any test fails
    • Save results to local storage or Evidently Cloud

This lets you automatically verify, in CI, that your latest prompt or agent code changes still behave correctly on a controlled test set.
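Conceptually, each CI run boils down to the flow below. This is an illustration only, not the action's actual implementation: answer_question and judge_tone are hypothetical stand-ins for the real agent and the LLM judge configured in later steps.

# Illustration of the CI flow; answer_question and judge_tone are hypothetical stand-ins.

def answer_question(question: str) -> str:
    return "placeholder answer"      # in CI, the real agent generates this

def judge_tone(answer: str) -> str:
    return "POSITIVE"                # in CI, an LLM judge assigns this label

test_questions = ["What is your name?", "How are you?", "Tell me a joke."]

answers = [answer_question(q) for q in test_questions]    # generate answers
labels = [judge_tone(a) for a in answers]                  # run descriptors
assert all(label == "POSITIVE" for label in labels)        # run tests: fail the CI run otherwise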


📦 Tutorial: Continuous LLM Agent Evaluation with Evidently Cloud and GitHub Actions

πŸ“ Project Structure

Your repo will look like this:

my-llm-agent/
├── .github/
│   └── workflows/
│       └── evidently.yml
├── src/
│   ├── my_agent.py
│   └── evidently_config.py
├── requirements.txt
└── README.md

📦 Step 1: Create the Repository

Create a new repository on GitHub (via the UI), then clone it locally:

git clone https://github.com/your-username/my-llm-agent.git
cd my-llm-agent

☁️ Step 2: Set Up Evidently Cloud

  1. Create an Evidently Cloud account
  2. Create a new project
  3. Upload your test dataset there, for example a CSV with a question column (see the upload sketch after this list):

question
"What is your name?"
"How are you?"
"Tell me a joke."

  4. Copy the dataset_id of your uploaded dataset
  5. Create an API key from your account settings
  6. In your GitHub repo: Settings → Secrets and variables → Actions → New repository secret. Name it EVIDENTLY_API_KEY
  7. Add any other secrets your agent needs, e.g. OPENAI_API_KEY
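If you prefer to upload the dataset from Python instead of the UI, a rough sketch follows. It assumes a recent Evidently version that exposes CloudWorkspace, Dataset.from_pandas, and add_dataset; the argument names are assumptions, so check the Evidently Cloud docs for your exact version.

# Hedged sketch: upload the test questions to Evidently Cloud from Python.
# CloudWorkspace / add_dataset usage is an assumption about recent Evidently versions.
import os

import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.ui.workspace import CloudWorkspace

ws = CloudWorkspace(
    token=os.environ["EVIDENTLY_API_KEY"],  # the same key you add to GitHub secrets
    url="https://app.evidently.cloud",
)

df = pd.DataFrame({"question": ["What is your name?", "How are you?", "Tell me a joke."]})
dataset = Dataset.from_pandas(df, data_definition=DataDefinition(text_columns=["question"]))

# "<project_id>" is a placeholder for the project you created in step 2.
ws.add_dataset(dataset, name="test_questions", project_id="<project_id>", description="CI test set")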

📑 Step 3: Write Your Agent Code

For this tutorial, we will create the most basic agent possible.

Create src/my_agent.py:

# src/my_agent.py

from agents import Agent, WebSearchTool, Runner

# A minimal agent built with the OpenAI Agents SDK (openai-agents package).
my_agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    model="gpt-4.1",
    tools=[
        WebSearchTool(),
    ],
)


def answer(question: str) -> str:
    # Run the agent synchronously and return its final text output.
    response = Runner.run_sync(my_agent, question)
    return response.final_output

(👉 Replace the function logic with your actual agent code later.)
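Before wiring the agent into CI, you can sanity-check it locally; this assumes OPENAI_API_KEY is set in your environment and that you run it from the repository root:

# Quick local smoke test for the agent defined above (run from the repo root).
from src.my_agent import answer

print(answer("Tell me a joke."))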


📑 Step 4: Define Evidently Config

Create src/evidently_config.py. This config defines:

  • A descriptor to run your agent
  • A descriptor to check your agent's answers
  • Optionally, a test summary metric

# src/evidently_config.py

from evidently import ColumnType
from evidently.cli.report import ReportConfig
from evidently.core.datasets import DatasetColumn
from evidently.metrics import MinValue
from evidently.tests import gte, eq
from evidently.descriptors import NegativityLLMEval, WordCount, CustomColumnDescriptor
from src.my_agent import answer


def answer_descriptor(col: DatasetColumn) -> DatasetColumn:
    # Generate an answer for every question in the input column.
    return DatasetColumn(ColumnType.Text, col.data.apply(answer))


descr_conf = ReportConfig(
    descriptors=[
        # Run the agent on the "question" column and store its output as "answer".
        CustomColumnDescriptor("question", answer_descriptor, alias="answer"),
        # LLM judge: each answer must be labeled POSITIVE (i.e. not negative in tone).
        NegativityLLMEval("answer", provider="openai", model="gpt-4o-mini", alias="answer_negativity",
                          tests=[eq("POSITIVE", column="answer_negativity")]),
        WordCount("answer", alias="answer_word_count"),
    ],
    # The shortest answer in the dataset must contain at least 2 words.
    metrics=[MinValue(column="answer_word_count", tests=[gte(2)])],
)

(👉 Adjust the descriptor parameters to your needs later.)
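You can also smoke-test the custom descriptor on its own before pushing, reusing only pieces already defined above (ColumnType, DatasetColumn, answer_descriptor). This is a local sanity check only, not part of the CI setup:

# Local sanity check for the custom descriptor from src/evidently_config.py.
import pandas as pd

from evidently import ColumnType
from evidently.core.datasets import DatasetColumn
from src.evidently_config import answer_descriptor

questions = DatasetColumn(ColumnType.Text, pd.Series(["Tell me a joke."]))
answers = answer_descriptor(questions)
print(answers.data)  # the agent's answers as a pandas Series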


📦 Step 5: Define Dependencies

Create requirements.txt:

evidently[llm]
openai-agents

βš™οΈ Step 6: Create GitHub Actions Workflow

Create .github/workflows/evidently.yml:

name: Cloud Evidently Check
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    types:
      - opened
      - reopened
      - synchronize
      - ready_for_review

  workflow_dispatch:

permissions:
  contents: read
  statuses: write

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  evidently-check:
    name: Evidently Report
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
          cache-dependency-path: requirements.txt
      - name: "Install requirements"
        run: pip install -r requirements.txt
      - uses: evidentlyai/evidently-report-action@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        with:
          config_path: src/evidently_config.py
          api_key: ${{ secrets.EVIDENTLY_API_KEY }}
          input_path: cloud://<dataset_id>
          output: cloud://<project_id>

Note: Replace <dataset_id> and <project_id> with the dataset ID and project ID from your Evidently Cloud project.


✅ Done

Every time you push a commit or open a PR:

  • CI will download the test dataset from Evidently Cloud
  • Run your agent against it
  • Evaluate responses via Evidently
  • Upload the results to Evidently Cloud
  • Fail the workflow if any test fails
  • Attach a link to the report in the GitHub check annotations

📦 Continuous LLM Agent Evaluation with a Local Evidently Run

In this tutorial, you'll set up a GitHub Actions workflow that evaluates your LLM agent locally with Evidently's CLI inside CI, without using Evidently Cloud.

Key differences from the Cloud run:

  • The test dataset is stored directly in your repository.
  • The evaluation results are saved as local files inside the CI runner.
  • No API keys or Evidently Cloud setup is needed.
  • Optionally, result files can be uploaded as GitHub Actions artifacts.

πŸ“ Project Structure

Your repo structure is almost the same, with one extra data/ folder:

my-llm-agent/
├── .github/
│   └── workflows/
│       └── evidently.yml
├── data/
│   └── test_questions.csv
├── src/
│   ├── my_agent.py
│   └── evidently_config.py
├── requirements.txt
└── README.md

📦 Step 1: Add Test Dataset

Create data/test_questions.csv:

question
"What is your name?"
"How are you?"
"Tell me a joke."

(👉 Replace or extend with your real test questions later.)
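If your real test set is larger, you may prefer to generate this file from code; a small pandas snippet is enough:

# Generate data/test_questions.csv from a list of questions.
import pandas as pd

questions = ["What is your name?", "How are you?", "Tell me a joke."]
pd.DataFrame({"question": questions}).to_csv("data/test_questions.csv", index=False)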


📑 Step 2: Use the Same Agent Code and Config

Re-use the same:

  • src/my_agent.py
  • src/evidently_config.py
  • requirements.txt

No changes needed.


βš™οΈ Step 3: Create Local-Mode GitHub Actions Workflow

Create .github/workflows/evidently.yml:

name: Local Evidently Check
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    types:
      - opened
      - reopened
      - synchronize
      - ready_for_review

  workflow_dispatch:

permissions:
  contents: read
  statuses: write

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  evidently-check:
    name: Evidently Report
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
          cache-dependency-path: requirements.txt
      - name: "Install requirements"
        run: pip install -r requirements.txt
      - uses: evidentlyai/evidently-report-action@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        with:
          config_path: src/evidently_config.py
          input_path: data/test_questions.csv
          output: report.json
          upload_artifacts: 'true'

Notes:

  • No API key or cloud dataset ID is needed
  • The dataset is loaded from data/test_questions.csv in your repo
  • Results are saved locally to the path given in output (report.json in this workflow)
  • Artifacts are uploaded because upload_artifacts is set to 'true'
  • To load results back into Python, use Dataset.load(path) or Run.load(path) (see the sketch after this list)
  • CI fails if any test fails
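For example, after downloading the report artifact (or on the runner itself), you could inspect the results roughly like this; the import path and the as_dataframe() call are assumptions that may differ between Evidently versions:

# Sketch: load saved results back into Python for inspection.
# The import path and as_dataframe() are assumptions about your Evidently version.
from evidently import Dataset

result = Dataset.load("report.json")  # the path passed as `output` in the workflow
print(result.as_dataframe().head())   # evaluated rows with descriptor columns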

✅ Done

On every commit:

  • CI uses your local test dataset
  • Runs your agent on it
  • Runs Evidently descriptors and metrics
  • Saves results locally
  • Uploads report artifacts
  • Fails the workflow if any test fails

📎 Related Links


✅ Summary

This setup lets you automatically check, in CI, that changes to your LLM agent's prompt or logic still produce acceptable, positive results on a controlled test set, using either local files or Evidently Cloud datasets and storage.
