
Lightweight benchmarks for Turing #1534

Closed
luiarthur opened this issue Feb 1, 2021 · 4 comments

luiarthur commented Feb 1, 2021

Currently, the contents of benchmarks/ are outdated. Both the directory and the accompanying workflow file (.github/workflows/MicroBenchmarks.yml) need updating.

A set of lightweight benchmarks is needed to measure performance (speed) of standard inference algorithms and models between (1) consecutive releases and (2) a PR and the latest release.
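
As a rough illustration of the intended scope, a lightweight suite might look like the following sketch, which uses BenchmarkTools.jl with a toy Gaussian model and deliberately short chains; the model, samplers, and iteration counts are placeholders rather than the proposed benchmark set.

```julia
using BenchmarkTools, Turing, Random

# Toy model; the actual set of benchmark models is still to be decided.
@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x .~ Normal(m, sqrt(s²))
end

Random.seed!(0)
data = randn(100)

const SUITE = BenchmarkGroup()
SUITE["gdemo"] = BenchmarkGroup()

# Short chains only: we need stable timings, not converged posteriors.
SUITE["gdemo"]["HMC"] = @benchmarkable sample($(gdemo(data)), HMC(0.1, 10), 500)
SUITE["gdemo"]["MH"]  = @benchmarkable sample($(gdemo(data)), MH(), 500)
```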

These benchmarks should run automatically (via GitHub Actions) whenever a PR is made and when a new release is created.

Benchmarks (timings) for each release should be stored (e.g. as a release asset) for regression testing.

Ideally, a warning would be raised if regressions are detected. Differences in timings between releases should also be stored/recorded.
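
For the storage and regression-checking part, one possibility (a sketch only; the file names, the source of the baseline, and the failure policy are all open) is to serialize results with BenchmarkTools and compare them against the previous release's stored timings:

```julia
using BenchmarkTools

results = run(SUITE; verbose = true)
BenchmarkTools.save("benchmarks-current.json", results)  # upload as a release asset

# "benchmarks-baseline.json" would be downloaded from the latest release.
baseline = BenchmarkTools.load("benchmarks-baseline.json")[1]
comparison = judge(minimum(results), minimum(baseline))

# Raise a warning (or fail the CI job) if any benchmark regressed.
regs = regressions(comparison)
isempty(leaves(regs)) || @warn "Benchmark regressions detected" regs
```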

Things to consider:

  • Results from the benchmarks could be stored as GitHub release assets. Open to suggestions for other locations.
  • Visualizing the benchmarks
    • Some users are interested in how performance scales for a particular model, for example, by data size, number of features, etc. Good visualizations would help in digesting the benchmarks.
  • Models to benchmark
    • A wide variety of models, potentially at different data sizes, should be considered, but the full set of tests should run rather quickly (well under an hour). Inference algorithms won't be run to convergence, just long enough to get reliable timings.
  • Inference algorithms to benchmark
    • We want to avoid algorithms that adapt in a way that influences timings. For example, NUTS adapts the number of leapfrog steps, which would result in unpredictable timings. HMC, ADVI, GibbsConditional, MH, and PG, for example, are fair game (see the sketch after this list).
  • AD backends to benchmark
  • Other PPLs to compare against Turing
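
Building on the sketch above, the suite could be parameterized over data sizes, samplers, and AD backends along the following lines; the sizes, step sizes, and chain lengths are illustrative only, and NUTS is deliberately left out for the reason given above.

```julia
using BenchmarkTools, Turing, ReverseDiff

for n in (10, 100, 1_000)
    x = randn(n)
    group = SUITE["gdemo-n=$n"] = BenchmarkGroup()

    group["MH"] = @benchmarkable sample($(gdemo(x)), MH(), 500)

    # AD backend comparison: the backend is switched inside the benchmark body
    # so that it is active when the benchmark runs; the call itself is
    # negligible next to sampling.
    group["HMC-forwarddiff"] = @benchmarkable begin
        Turing.setadbackend(:forwarddiff)
        sample($(gdemo(x)), HMC(0.1, 10), 500)
    end
    group["HMC-reversediff"] = @benchmarkable begin
        Turing.setadbackend(:reversediff)  # requires ReverseDiff to be loaded
        sample($(gdemo(x)), HMC(0.1, 10), 500)
    end
end
```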

Resources for Benchmarking in Julia

  1. BenchmarkTools.jl manual
  2. BenchmarkTools.jl API
  3. GitHub Actions API
luiarthur self-assigned this Feb 1, 2021
luiarthur commented:

I'm working on this right now; let me know if you have any suggestions/comments.

devmotion commented:

SciML uses a benchmarking bot; it was described in a blog post on julialang.org a while ago.

luiarthur commented:

Thanks! I'll have a look.

yebai commented Dec 16, 2021

Closed in favour of TuringLang/DynamicPPL.jl#346

yebai closed this as completed Dec 16, 2021