Easy way to get gradient evaluation timing #1721
More generally, I think it'd be nice if we had a more fine-grained ability to hook into gradient evaluations. For example, I had a use case where I wanted to examine the gradients as they rolled along, but I ended up writing a custom fork of Turing to do so.
I guess the main issue is that you have many more choices in Turing/Julia for computing gradients than in Stan (and of course some samplers don't use gradients at all). My hope is that AD backends start to adopt AbstractDifferentiation, which would allow us to use one common API for all differentiation backends and to support every backend that implements it automatically (well, in theory at least, since the backends still have to be able to differentiate the models and e.g. support Distributions, but this would have to be fixed in other packages, not in Turing).

More practically, I think it would be better to document how to benchmark gradient computations with different backends and possibly provide some convenience functions (again, hopefully this will become easier with AbstractDifferentiation). I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well. I guess for gradient tracking we would have to report it in the transition, and then users could use a callback (possibly a helper would be useful).
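A minimal sketch of what such documentation could show, assuming a recent DynamicPPL that exposes the log joint via `LogDensityFunction` and LogDensityProblemsAD for attaching an AD backend (the exact wrapper API has moved between releases, and the `demo` model is just a placeholder):

```julia
using Turing, DynamicPPL
using LogDensityProblems, LogDensityProblemsAD
using BenchmarkTools, ForwardDiff

# Placeholder model purely for illustration.
@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    for i in eachindex(x)
        x[i] ~ Normal(μ, σ)
    end
end

model = demo(randn(100))

# Wrap the log joint as a LogDensityProblems object and attach an AD backend;
# swap :ForwardDiff for :ReverseDiff, :Zygote, etc. to compare backends.
ℓ = DynamicPPL.LogDensityFunction(model)
∇ℓ = LogDensityProblemsAD.ADgradient(:ForwardDiff, ℓ)

θ = randn(LogDensityProblems.dimension(ℓ))

# BenchmarkTools takes many samples, so JIT compilation does not
# end up in the reported estimates.
display(@benchmark LogDensityProblems.logdensity($ℓ, $θ))                # model evaluation only
display(@benchmark LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ))  # evaluation + gradient
```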
While I mentioned gradients, I think some feedback about how long model evaluations are taking would be helpful for any method, especially in Julia, where one wants to check that some optimisation technique has taken effect. This is less of an issue with Stan, where performance is more predictable for non-experts. For context, this Discourse thread is where this issue/request came from.
Doesn't one need to compile anyway? A few extra evaluations aren't much to provide the information to a user; it's technically overhead, but in the same way a progress bar is.
Not in the sense that Stan does it. Turing is all JIT compiled, so the initial timings would be very poor anyway.
A different way to think about this is to calculate a running estimate of gradient/joint timings, and then report them when they stop moving around -- i.e. after maybe 100 evaluations it could report an accurate time without actually incurring extra evaluations.
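A minimal sketch of that idea (all names here are hypothetical, not an existing Turing API): keep an online mean of per-evaluation wall time inside whatever wraps the gradient call, and emit it once the estimate has stabilised.

```julia
# Hypothetical running-estimate helper: accumulate per-evaluation wall time
# and report the mean once it has stopped moving.
mutable struct RunningTimer
    total::Float64   # accumulated seconds
    n::Int           # number of timed evaluations
    reported::Bool
end
RunningTimer() = RunningTimer(0.0, 0, false)

function record!(timer::RunningTimer, seconds; min_evals=100, rtol=0.05)
    old_mean = timer.n > 0 ? timer.total / timer.n : Inf
    timer.total += seconds
    timer.n += 1
    new_mean = timer.total / timer.n
    # Report once the mean has stabilised (relative change below rtol)
    # and we have seen enough evaluations to exclude compilation noise.
    if !timer.reported && timer.n >= min_evals &&
       abs(new_mean - old_mean) <= rtol * new_mean
        @info "Estimated time per gradient evaluation: $(new_mean) s"
        timer.reported = true
    end
    return new_mean
end

# Usage inside a (hypothetical) gradient wrapper:
#   t = @elapsed logdensity_and_gradient(∇ℓ, θ)
#   record!(timer, t)
```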
For just benchmarking the AD performance, this used to work: https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7 Probably still works. IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration, but this is still useful information).
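In the meantime, a rough seconds-per-iteration number can be obtained by hand with the public `sample` API (this reuses the hypothetical `demo` model from the sketch above; the warm-up run is there so JIT compilation doesn't end up in the estimate, and for adaptive samplers like NUTS the number is only a rough average):

```julia
using Turing

model = demo(randn(100))  # model from the earlier sketch

sample(model, NUTS(), 10; progress=false)                 # warm-up run absorbs compilation
t = @elapsed sample(model, NUTS(), 100; progress=false)   # timed run
println("≈ ", round(t / 100; sigdigits=3), " seconds per iteration")
```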
Agreed. I suppose we'd throw this into the default logger, no?
The performance tips page doesn't mention how to get the gradient evaluation timing of a model. CmdStan prints this before running the sampler, and it is a helpful number for distinguishing code performance issues from parametrisation issues. Could a tip be added for this?