I'm not aware of an issue for this, so wanted to open one to capture thoughts.

The code for this is still not in DynamicPPL; it's hosted at https://github.com/penelopeysm/ModelTests.jl as it's easier for me to iterate on it there.

Desiderata:
- For each AD backend, each model tested should run in its own process. This is pretty awkward; I think it basically means we need to have a shell script calling Julia once per model (sketched below).
- The results should be aggregated: if any model fails, the overall job should show a red cross.
- The output should record the benchmark time (if the run succeeds) and the error (if not). When the jobs finish, this info must be collated into a single CSV and/or HTML page on gh-pages, i.e. it must be easily accessible to the end user.
Note: some of these are difficult to do right. It may well be that we should sacrifice some of these points, or push them to later, just for the sake of getting something out.
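
To make the per-process and collation points more concrete, here is a rough sketch of how the orchestration could look. This is not the actual ModelTests.jl code: `run_one_model.jl`, `MODELS` and `BACKENDS` are placeholder names, and the worker script is assumed to print the benchmark time on stdout and exit non-zero on failure.

```julia
# Rough sketch only: spawn a fresh Julia process per (backend, model) pair so
# that a crash in one combination cannot take down the whole job, then collate
# everything into a single CSV. `run_one_model.jl`, MODELS and BACKENDS are
# placeholders, not actual ModelTests.jl names.

const MODELS   = ["demo_assume_observe", "demo_dot_assume_observe"]
const BACKENDS = ["ForwardDiff", "ReverseDiff", "Mooncake"]

results = NamedTuple[]
for backend in BACKENDS, model in MODELS
    # The worker script is assumed to print the benchmark time (in seconds) on
    # stdout and to exit non-zero if gradient evaluation fails.
    cmd = `$(Base.julia_cmd()) --project=. run_one_model.jl $backend $model`
    ok, time_s, err = try
        (true, readchomp(pipeline(cmd; stderr = devnull)), "")
    catch e
        (false, "", sprint(showerror, e))
    end
    push!(results, (; backend, model, ok, time_s, err))
end

# Collate into a single CSV that the workflow can publish to gh-pages.
open("ad_results.csv", "w") do io
    println(io, "backend,model,success,time_s,error")
    for r in results
        println(io, join((r.backend, r.model, r.ok, r.time_s,
                          replace(r.err, ',' => ';')), ","))
    end
end

# Aggregation: the overall job goes red if any single (backend, model) failed.
any(r -> !r.ok, results) && exit(1)
```

The final `exit(1)` is what gives the aggregated red cross, and the CSV is what a gh-pages step would publish.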
Bonus stretch goals:
- Avoid recalculating the 'ground truth' with ForwardDiff for the same model multiple times (see the caching sketch below).
- Add links to existing GitHub issues when the reasons for failing models are known.
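
For the first stretch goal, the ground truth only needs to be computed once per model and can then be shared across backends. A minimal sketch, assuming a hypothetical `logdensity_of` helper that builds the log-density closure for a named model (not an existing DynamicPPL/ModelTests.jl function):

```julia
# Hedged sketch of the ground-truth caching idea. `logdensity_of` is a
# placeholder; it also assumes the evaluation point is fixed per model.

using ForwardDiff

const GROUND_TRUTH = Dict{String,Vector{Float64}}()

"""
Return the reference gradient for `model_name` at `params`, computing it with
ForwardDiff only the first time the model is seen.
"""
function ground_truth_gradient(model_name::String, logdensity_of, params::Vector{Float64})
    get!(GROUND_TRUTH, model_name) do
        ForwardDiff.gradient(logdensity_of(model_name), params)
    end
end

# Every backend's gradient for the same model is then compared against one
# cached reference, e.g.
#     ref = ground_truth_gradient("demo_assume_observe", logdensity_of, params)
#     isapprox(backend_grad, ref; rtol = 1e-6)
```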
Additional details in #799 (comment)