# Langfuse Experiment
Run a Langfuse experiment in your CI pipeline. The action loads your experiment script, runs it against a Langfuse dataset, comments the result on the PR, and optionally fails the job when a regression is detected. Learn more in the Langfuse docs on testing experiments in CI environments.
- Quickstart
- Usage
- FAQ
  - Can I run Python and TypeScript experiments in the same step?
  - How do I manage the Langfuse SDK installation myself?
  - How do I pass extra secrets (OpenAI keys, etc.) to my experiment?
  - Can I pin a specific Langfuse SDK version?
  - Why does the action need a `github_token`?
  - Does PR commenting work on forked-PR runs?
  - Why can't I see my experiment in Langfuse?
- Contributing
- License
## Quickstart

```yaml
name: Langfuse experiment

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write # required for posting the experiment comment
  actions: read # lets "View run" link to the specific job (falls back to the workflow-run URL otherwise)

jobs:
  experiment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      # For experiments written in Python (`.py` scripts).
      - uses: actions/setup-python@v6
        with:
          python-version: "3.14"

      # For experiments written in TypeScript / JavaScript
      # (`.ts` / `.js` / `.mjs` / `.cjs` scripts). Safe to include both
      # setups if your `experiment_path` is a directory with a mix of
      # runtimes.
      - uses: actions/setup-node@v6
        with:
          node-version: "24"

      # Pin to a release SHA (Zizmor-friendly, protects you from
      # tag-moved attacks). The `# v<version>` comment is what humans
      # read; the SHA is what GitHub resolves. This line is auto-bumped
      # by `.github/workflows/release-bump-readme.yml` on every release.
      - uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
        with:
          langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
          langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}
          experiment_path: experiments/qa_experiment.py
          dataset_name: qa-set
          github_token: ${{ github.token }}
```

Only include the setup steps you actually need — Python-only projects can drop `actions/setup-node`, TS-only projects can drop `actions/setup-python`.
## Usage

### Inputs

| Input | Required | Default | Description |
|---|---|---|---|
| `langfuse_public_key` | yes | | Langfuse public API key. |
| `langfuse_secret_key` | yes | | Langfuse secret API key. |
| `langfuse_base_url` | no | `https://cloud.langfuse.com` | Langfuse base URL. |
| `experiment_path` | yes | | File, directory, or glob pattern pointing at experiment scripts. |
| `dataset_name` | no | | Dataset to run against. If omitted, the user script is expected to select its own dataset. |
| `dataset_version` | no | | Pin the experiment to a specific dataset version. |
| `experiment_metadata` | no | | Additional metadata as a multiline `key=value` string. Shown under the Metadata column in the Langfuse UI. |
| `should_fail_on_regression` | no | `true` | Fail CI when an experiment raises `RegressionError`. |
| `should_fail_on_script_error` | no | `true` | Fail CI when an experiment script crashes or raises a non-regression error. |
| `should_comment_on_pr` | no | `true` | Post the result as a PR comment. |
| `python_sdk_version` | no | `4.6.0` | Python SDK version to install via pip (for `.py` experiments). |
| `js_sdk_version` | no | `5.3.0` | JS SDK version (`@langfuse/client`) to install via npm (for `.ts`/`.js`/`.mjs`/`.cjs` experiments). |
| `should_skip_sdk_installation` | no | `false` | Skip automatic SDK installation and use the ambient Python/Node environment. |
| `github_token` | no | | Token used to post the PR comment. Pass `${{ github.token }}` and grant `pull-requests: write` (and optionally `actions: read` for job-level "View run" links). |
### Outputs

| Output | Description |
|---|---|
| `result_json` | JSON output with action metadata and experiment results. See `schemas/result-json.v1.schema.json`. |
| `failed` | `"true"` if any experiment errored or raised a regression, else `"false"`. |

For the full `result_json` structure, see `schemas/result-json.v1.schema.json`.
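If you prefer to keep the job green and handle regressions yourself, the `failed` output can gate a follow-up step. A minimal sketch (step id and message are illustrative):

```yaml
- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  id: experiment
  with:
    should_fail_on_regression: "false" # keep the job green; gate manually below
    # ... usual inputs

- name: Warn on regression
  if: steps.experiment.outputs.failed == 'true'
  run: echo "::warning::Langfuse experiment reported a regression"
```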
Your experiment script must define a function named `experiment`. The action creates a Langfuse SDK `RunnerContext` and passes it as the first argument so scripts can use action-injected defaults for dataset, dataset version, and metadata. Use the access patterns from the Langfuse experiments docs — iterate `run_evaluations` / `runEvaluations` to find the score you want to gate on.
```python
from langfuse import RegressionError, RunnerContext


def experiment(context: RunnerContext):
    result = context.run_experiment(
        name="My experiment",
        task=my_task,
        evaluators=[my_evaluator],
        run_evaluators=[avg_accuracy],
    )

    # Use a name other than `avg_accuracy` for the extracted score; reusing
    # the evaluator's name would shadow it within the function scope and make
    # the `run_evaluators` reference above raise UnboundLocalError.
    accuracy = next(
        evaluation.value
        for evaluation in result.run_evaluations
        if evaluation.name == "avg_accuracy"
    )
    if accuracy < 0.9:
        raise RegressionError(result=result)

    return result
```

```ts
import { RegressionError, type RunnerContext } from "@langfuse/client";

export async function experiment(context: RunnerContext) {
  const result = await context.runExperiment({
    name: "My experiment",
    task: myTask,
    evaluators: [myEvaluator],
    runEvaluators: [avgAccuracy],
  });

  // Use a distinct name for the extracted score; redeclaring `avgAccuracy`
  // here would put the `runEvaluators` reference above in its temporal dead zone.
  const accuracy =
    (result.runEvaluations.find((e) => e.name === "avg_accuracy")?.value as
      | number
      | undefined) ?? 0;
  if (accuracy < 0.9) {
    throw new RegressionError({ result });
  }

  return result;
}
```

The action serializes the returned value to JSON and exposes it through the `result_json` output. If the function raises, the error is captured and the CI job fails depending on `should_fail_on_regression` / `should_fail_on_script_error`.
```yaml
- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  id: experiment
  with: # ...

- name: Upload result as artifact
  run: echo '${{ steps.experiment.outputs.result_json }}' > experiment.json

- uses: actions/upload-artifact@v7
  with:
    name: experiment-result
    path: experiment.json
```

Every experiment run carries the following metadata, in addition to anything you pass via `experiment_metadata`. Action-generated keys are namespaced under `langfuse.*` so they're easy to distinguish from your own.
| Key | Source | Notes |
|---|---|---|
| `langfuse.git_sha` | `$GITHUB_SHA` | The commit being tested. |
| `langfuse.branch` | `$GITHUB_REF_NAME` | |
| `langfuse.event` | `$GITHUB_EVENT_NAME` | E.g. `pull_request`, `push`. |
| `langfuse.actor` | `$GITHUB_TRIGGERING_ACTOR` → `$GITHUB_ACTOR` | The user who kicked off the current attempt. |
| `langfuse.pr_url` | derived from `$GITHUB_REF` | Only on `pull_request` events. |
| `langfuse.github_workflow_name` | `$GITHUB_WORKFLOW` | E.g. `CI`. |
| `langfuse.github_job_name` | `$GITHUB_JOB` | The workflow job running the experiment. |
| `langfuse.github_job_attempt` | `$GITHUB_RUN_ATTEMPT` | `"1"` on the initial run, `"2"`+ on re-runs. |
| `langfuse.github_job_url` | resolved via the GitHub API from `$GITHUB_RUN_ID` | Direct link to this job's logs. Requires `github_token` with `actions: read`; falls back to the workflow-run URL otherwise. |
| custom | `experiment_metadata` | Forwarded verbatim — pick whatever namespace your org prefers. |
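Custom keys are supplied as a multiline `key=value` string; a sketch (the key names here are illustrative):

```yaml
- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  with:
    # ... usual inputs
    experiment_metadata: |
      team=platform
      model=gpt-4o-mini
```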
## FAQ

### Can I run Python and TypeScript experiments in the same step?

Yes. Point `experiment_path` at a directory (or a glob) that contains a mix of `.py` and `.ts` / `.js` / `.mjs` / `.cjs` files. The action detects the runtime per-script, installs each SDK once per step, and runs everything sequentially. Make sure your workflow has `actions/setup-python` and `actions/setup-node` before the action step.

```yaml
experiment_path: experiments/ # contains both python and ts scripts
```

Files starting with `.` or `_` are skipped so helper modules (e.g. `__init__.py`, `_utils.ts`) don't get executed as experiments.
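The skip rule amounts to a simple filename filter. A rough sketch of the behavior described above (not the action's actual implementation):

```python
from pathlib import Path

# Runtimes the action recognizes, per the table of inputs above.
EXPERIMENT_SUFFIXES = {".py", ".ts", ".js", ".mjs", ".cjs"}


def is_experiment_script(path: Path) -> bool:
    """Return True if a file would be picked up as an experiment script."""
    # Helper modules starting with "." or "_" are skipped.
    if path.name.startswith((".", "_")):
        return False
    return path.suffix in EXPERIMENT_SUFFIXES
```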
### How do I manage the Langfuse SDK installation myself?

Set `should_skip_sdk_installation: "true"` and install the SDK yourself before the action runs. Useful when your project has pinned lockfiles you'd rather honour than reinstall against the action's default SDK versions.

```yaml
- uses: actions/setup-python@v6
  with:
    python-version: "3.14"
    cache: pip

- run: pip install -r requirements.txt # must include `langfuse`

- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  with:
    experiment_path: experiments/
    should_skip_sdk_installation: "true"
    langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
    langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}
```

Works with any Python installer — swap `pip install -r requirements.txt` for `poetry install`, `uv sync`, `pdm install`, etc.
On the Node side, the action needs `@langfuse/client`, `@langfuse/tracing`, `@langfuse/otel`, `@opentelemetry/sdk-node`, and `tsx` reachable from the working directory's `node_modules/`.

```yaml
- uses: actions/setup-node@v6
  with:
    node-version: "24"
    cache: npm # or pnpm / yarn

- run: npm ci # or `pnpm install --frozen-lockfile`, `yarn install --frozen-lockfile`

- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  with:
    experiment_path: experiments/
    should_skip_sdk_installation: "true"
    langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
    langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}
```

Make sure `@langfuse/client`, `@langfuse/tracing`, `@langfuse/otel`, `@opentelemetry/sdk-node`, and `tsx` are listed in `package.json` (either `dependencies` or `devDependencies` — `npm ci` installs both).
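A matching `package.json` fragment might look like this — the version ranges are illustrative, not pinned requirements (only `@langfuse/client` `5.3.0` is stated elsewhere in this README, as the action's default):

```json
{
  "devDependencies": {
    "@langfuse/client": "^5.3.0",
    "@langfuse/tracing": "^5.3.0",
    "@langfuse/otel": "^5.3.0",
    "@opentelemetry/sdk-node": "^0.53.0",
    "tsx": "^4.19.0"
  }
}
```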
### How do I pass extra secrets (OpenAI keys, etc.) to my experiment?

Set them as env vars on the action step — the experiment subprocess inherits the parent process's environment.

```yaml
- uses: langfuse/experiment-action@d1f45d1979375456a80e1efc6c035e3719d32cb9 # v1.0.2
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  with:
    experiment_path: experiments/
    # ... usual inputs
```

Your `experiment()` function can read them with `os.environ[...]` (Python) or `process.env[...]` (TS / JS).
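Inside a Python experiment, a small helper (illustrative, not part of the SDK) makes a missing secret fail fast with a clear message instead of a confusing downstream error:

```python
import os


def require_env(name: str) -> str:
    """Read a required environment variable, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set -- did you forward it via `env:` on the action step?"
        )
    return value
```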
### Can I pin a specific Langfuse SDK version?

Yes, for each runtime independently:

```yaml
python_sdk_version: "4.1.2"
js_sdk_version: "5.0.0-rc.3"
```

If a version is already installed and matches, the action skips the install.
### Why does the action need a `github_token`?

For two things, both optional:

- Posting the experiment results comment on the PR (requires `pull-requests: write` permission on the workflow).
- Resolving the specific job URL — the action does one API call to `GET /repos/.../actions/runs/<id>/jobs` so the PR comment and the `langfuse.github_job_url` metadata link to this job's logs instead of the broader workflow run.

If `github_token` is blank, both features are silently skipped; the experiment still runs and succeeds/fails as usual.
### Does PR commenting work on forked-PR runs?

No — GitHub restricts the default `GITHUB_TOKEN` to read-only on workflows triggered from forks, which blocks comment creation. This is a platform-level constraint, not something the action can work around directly. Two common mitigations:

- Use `pull_request_target` (carefully — read GitHub's security guidance first; this runs workflows in the context of the base repo with write permissions).
- Use a separate `workflow_run`-triggered job that grabs artefacts and posts the comment with elevated permissions.
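A minimal sketch of the second mitigation — the workflow name, artifact name, and action versions are assumptions to adapt to your setup:

```yaml
name: Comment experiment result

on:
  workflow_run:
    workflows: ["Langfuse experiment"] # name of the workflow that ran the action
    types: [completed]

permissions:
  pull-requests: write

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      # Grab the artifact uploaded by the experiment workflow, then post its
      # contents as a PR comment with the tool of your choice.
      - uses: actions/download-artifact@v7
        with:
          name: experiment-result
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ github.token }}
```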
### Why can't I see my experiment in Langfuse?

The action only renders a "View in Langfuse" link for dataset-backed experiments. To get one, run against a real Langfuse dataset by passing `dataset_name` to the action.
## Contributing

See CONTRIBUTING.md.
## License

MIT
