Currently, the contents of benchmarks/ are outdated. Both the directory and the accompanying workflow file (.github/workflows/MicroBenchmarks.yml) need updating.
A set of lightweight benchmarks is needed to measure performance (speed) of standard inference algorithms and models between (1) consecutive releases and (2) a PR and the latest release.
These benchmarks should run automatically (via GitHub Actions) whenever a PR is made and when a new release is created.
Benchmarks (timings) for each release should be stored (e.g. as a release asset) for regression testing.
Ideally, a warning would be raised if regressions are detected. Differences in timings between releases should also be stored/recorded.
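For concreteness, below is a minimal sketch (not the final suite) of how BenchmarkTools.jl could drive this: build a suite, run it, save the timings to a JSON file (e.g. to upload as a release asset), and later `judge` a fresh run against the stored baseline to flag regressions. The file name, the placeholder benchmark, and the use of `median` estimates are assumptions, not decisions.

```julia
using BenchmarkTools

# Build a suite; real entries would wrap calls to Turing's `sample`.
suite = BenchmarkGroup()
suite["dummy_sum"] = @benchmarkable sum(rand(1000))  # placeholder benchmark

tune!(suite)
results = run(suite; verbose = true)

# Persist the timings, e.g. to upload as a release asset.
BenchmarkTools.save("benchmark_results.json", results)

# On a PR (or the next release), load the stored baseline and compare.
baseline = BenchmarkTools.load("benchmark_results.json")[1]
comparison = judge(median(results), median(baseline))
for (name, j) in leaves(comparison)
    # `isregression` is true when the time ratio exceeds the judgement tolerance.
    isregression(j) && @warn "Possible performance regression" benchmark = name judgement = j
end
```

In CI the baseline would be downloaded from the latest release rather than read from the local disk.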
Things to consider:
Results from the benchmarks could be stored as GitHub release assets. Open to suggestions for other locations.
Visualizing the benchmarks
Some users are interested in how performance scales for a particular model, for example with data size or number of features. Useful visuals will help in digesting the benchmarks.
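As one possible example (a hedged sketch, not an agreed-upon plot), the snippet below times a toy coin-flip model at a few data sizes with a fixed-length MH run and plots the timings on log-log axes with Plots.jl. The model, sampler settings, and data sizes are all placeholders.

```julia
using Turing, BenchmarkTools, Plots, Random

# Toy model: a single success probability with Bernoulli observations.
@model function coinflip(y)
    p ~ Beta(1, 1)
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)
    end
end

Random.seed!(0)
ns = [100, 1_000, 10_000]
times = map(ns) do n
    data = rand(Bool, n)
    model = coinflip(data)
    # Minimum time (in seconds) of a short, fixed-length MH run.
    @belapsed sample($model, MH(), 500; progress = false)
end

plot(ns, times;
     xscale = :log10, yscale = :log10,
     xlabel = "number of observations",
     ylabel = "time (s)",
     marker = :circle, legend = false)
```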
Models to benchmark
A wide variety of models, potentially at different data sizes, should be considered. However, the whole suite should run quickly (well under an hour); inference algorithms won't be run to convergence, just long enough to get decent timings.
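As illustrative candidates only (the actual model list is open), here are two models at opposite ends of the scale: the small two-parameter `gdemo` model from the Turing docs, and a linear regression whose cost grows with the number of observations and features.

```julia
using Turing, LinearAlgebra

# Small model with two parameters (the classic Turing demo model).
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

# Linear regression whose cost scales with the number of observations and
# features, so timings can be reported against data size.
@model function linreg(X, y)
    D = size(X, 2)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    β ~ MvNormal(zeros(D), I)
    y ~ MvNormal(X * β, σ^2 * I)
end
```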
Inference algorithms to benchmark
We want to avoid algorithms whose adaptation influences timings. For example, NUTS adapts the number of leapfrog steps, which would result in unpredictable timings. HMC, ADVI, GibbsConditional, MH, and PG, for example, are fair game.
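A rough sketch of what fixed-configuration runs could look like, reusing the `gdemo` model from the sketch above; the step size, leapfrog count, particle count, and iteration counts are placeholders chosen only to keep runtimes short.

```julia
using Turing

model = gdemo(1.5, 2.0)   # `gdemo` as defined in the sketch above
n_samples = 1_000

chains = Dict(
    "HMC" => sample(model, HMC(0.05, 10), n_samples; progress = false),  # fixed step size and leapfrog count
    "MH"  => sample(model, MH(),          n_samples; progress = false),
    "PG"  => sample(model, PG(20),        n_samples; progress = false),  # fixed number of particles
)

# ADVI uses a fixed number of optimization steps rather than MCMC samples.
q = vi(model, ADVI(10, 1_000))
```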