I'm not aware of an issue for this, so wanted to open one to capture thoughts.

The code for this is still not in DynamicPPL; it's hosted at https://github.com/penelopeysm/ModelTests.jl as it's easier for me to iterate on it there.

Desiderata:
- For each AD backend, each model tested should run in its own process. This is pretty awkward; I think it basically means we need to have a shell script calling Julia once per model (sketched below).
- The results should be aggregated: if any model fails, the overall job should show a red cross.
- The output should record the benchmark time (if the run succeeds) and the error (if not). When the jobs finish, this info must be collated into a single CSV and/or HTML page on gh-pages, i.e. it must be easily accessible to the end user.
Note: some of these are difficult to do right. It may well be that we should sacrifice some of these points, or push them to later, just for the sake of getting something out.
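
To make the per-process and collation points more concrete, here is a rough sketch of how the orchestration could look. This is not the actual ModelTests.jl code: `run_one_model.jl`, `MODELS` and `BACKENDS` are placeholder names, and the worker script is assumed to print the benchmark time on stdout and exit non-zero on failure.

```julia
# Rough sketch only: spawn a fresh Julia process per (backend, model) pair so
# that a crash in one combination cannot take down the whole job, then collate
# everything into a single CSV. `run_one_model.jl`, MODELS and BACKENDS are
# placeholders, not actual ModelTests.jl names.

const MODELS   = ["demo_assume_observe", "demo_dot_assume_observe"]
const BACKENDS = ["ForwardDiff", "ReverseDiff", "Mooncake"]

results = NamedTuple[]
for backend in BACKENDS, model in MODELS
    # The worker script is assumed to print the benchmark time (in seconds) on
    # stdout and to exit non-zero if gradient evaluation fails.
    cmd = `$(Base.julia_cmd()) --project=. run_one_model.jl $backend $model`
    ok, time_s, err = try
        (true, readchomp(pipeline(cmd; stderr = devnull)), "")
    catch e
        (false, "", sprint(showerror, e))
    end
    push!(results, (; backend, model, ok, time_s, err))
end

# Collate into a single CSV that the workflow can publish to gh-pages.
open("ad_results.csv", "w") do io
    println(io, "backend,model,success,time_s,error")
    for r in results
        println(io, join((r.backend, r.model, r.ok, r.time_s,
                          replace(r.err, ',' => ';')), ","))
    end
end

# Aggregation: the overall job goes red if any single (backend, model) failed.
any(r -> !r.ok, results) && exit(1)
```

The final `exit(1)` is what gives the aggregated red cross, and the CSV is what a gh-pages step would publish.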
Bonus stretch goals:
- Avoid recalculating the 'ground truth' with ForwardDiff for the same model multiple times (see the caching sketch below).
- Add links to existing GitHub issues when the reasons for failing models are known.
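
For the first stretch goal, the ground truth only needs to be computed once per model and can then be shared across backends. A minimal sketch, assuming a hypothetical `logdensity_of` helper that builds the log-density closure for a named model (not an existing DynamicPPL/ModelTests.jl function):

```julia
# Hedged sketch of the ground-truth caching idea. `logdensity_of` is a
# placeholder; it also assumes the evaluation point is fixed per model.

using ForwardDiff

const GROUND_TRUTH = Dict{String,Vector{Float64}}()

"""
Return the reference gradient for `model_name` at `params`, computing it with
ForwardDiff only the first time the model is seen.
"""
function ground_truth_gradient(model_name::String, logdensity_of, params::Vector{Float64})
    get!(GROUND_TRUTH, model_name) do
        ForwardDiff.gradient(logdensity_of(model_name), params)
    end
end

# Every backend's gradient for the same model is then compared against one
# cached reference, e.g.
#     ref = ground_truth_gradient("demo_assume_observe", logdensity_of, params)
#     isapprox(backend_grad, ref; rtol = 1e-6)
```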
Additional details in #799 (comment)