@leondz leondz commented Dec 10, 2025

Conduct a variety of checks and tests to assess the integrity of a garak report.jsonl file

This helps us identify where a report may be broken, deficient, or incorrectly assembled

Inventory of tests:

  • ✔️ garak version match between that used to create report and current garak used for checking
  • ✔️ whether the report was generated with a dev version of garak
  • ✔️ whether the check itself is running on a dev version of garak
  • ✔️ inventory described by config's probe_spec matches probes present in attempts
  • ✔️ each attempt status 1 has matching status 2
  • ✔️ all attempts have enough unique generations
  • ✔️ run ID is in setup run IDs
  • ✔️ detection output has correct cardinality in attempt status 2s
  • ✔️ a summary digest object is present
  • ✔️ at least one z-score is listed in the digest
  • ✔️ probes present in summary match probes requested in config
  • ✔️ the run was completed
  • ✔️ the run is <6 months old (calibration freshness)
  • ✔️ there is at least one eval statement for any probe attempted
  • ✔️ evals are performed over all status 2 attempts
  • ✔️ number of responses graded passed + nones is not more than total responses graded in eval entries
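
To illustrate the shape of two of the checks above (status 1/status 2 matching and the passed + nones bound), here is a minimal sketch. It is not the code in this PR. The field names `entry_type`, `status`, `uuid`, `passed`, `nones`, and `total` are assumptions about the report.jsonl schema, and the helper functions are hypothetical.

```python
import json
from collections import defaultdict


def load_entries(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def unmatched_attempts(entries):
    """Return IDs of status-1 attempts that have no status-2 counterpart.

    Assumes each attempt entry carries `entry_type`, `status` and `uuid`.
    """
    seen = defaultdict(set)
    for entry in entries:
        if entry.get("entry_type") != "attempt":
            continue
        attempt_id = entry.get("uuid")
        if attempt_id is not None:
            seen[entry.get("status")].add(attempt_id)
    return sorted(seen[1] - seen[2])


def overcounted_evals(entries):
    """Return eval entries where passed + nones exceeds total.

    Assumes eval entries carry integer `passed`, `nones` and `total` fields.
    """
    bad = []
    for entry in entries:
        if entry.get("entry_type") != "eval":
            continue
        graded = entry.get("passed", 0) + entry.get("nones", 0)
        if graded > entry.get("total", 0):
            bad.append(entry)
    return bad


# Made-up sample lines, for illustration only.
sample = [
    '{"entry_type": "attempt", "uuid": "a1", "status": 1}',
    '{"entry_type": "attempt", "uuid": "a1", "status": 2}',
    '{"entry_type": "attempt", "uuid": "a2", "status": 1}',
    '{"entry_type": "eval", "probe": "p", "passed": 3, "nones": 1, "total": 3}',
]
entries = load_entries(sample)
print(unmatched_attempts(entries))      # ['a2']
print(len(overcounted_evals(entries)))  # 1
```

Each function returns the offending items rather than a boolean, so a caller can report exactly which attempts or eval entries look broken.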

@leondz leondz requested a review from jmartin-tech December 10, 2025 11:51
@leondz leondz added the reporting Reporting, analysis, and other per-run result functions label Dec 10, 2025