
Easy way to get gradient evaluation timing #1721

Closed
maedoc opened this issue Oct 25, 2021 · 7 comments

Comments

@maedoc

maedoc commented Oct 25, 2021

The performance tips page doesn't mention how to get timing for the gradient evaluation of a model. CmdStan prints this before running the sampler, and it is a helpful number for distinguishing code performance issues from parametrisation issues. Could a tip be added for this?
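For reference, here is roughly how one can time a single gradient evaluation of a Turing model by hand. This is only a sketch: it assumes a DynamicPPL version that exposes the model through `LogDensityFunction` and the `LogDensityProblems` interface, plus `LogDensityProblemsAD` and `BenchmarkTools`; the exact entry point may differ between versions.

```julia
using Turing, DynamicPPL, LogDensityProblems, LogDensityProblemsAD, BenchmarkTools

@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    x ~ Normal(μ, σ)
end

model = demo(1.5)

# Wrap the model as a log-density problem and attach an AD backend.
ℓ = DynamicPPL.LogDensityFunction(model)
∇ℓ = ADgradient(:ForwardDiff, ℓ)

# Benchmark one log-density-and-gradient evaluation at a fixed point
# (values for μ and σ in the model's parameter order).
θ = [0.0, 1.0]
@btime LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ)
```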

@cpfiffer
Member

More generally, I think it'd be nice if we had more fine-grained access to gradient evaluations. For example, I had a use case where I wanted to examine the gradients as they rolled along, but I ended up writing a custom fork of Turing to do so.

@devmotion
Member

I guess the main issue is that you have many more choices in Turing/Julia for computing gradients than in Stan (and of course some samplers don't use gradients at all). My hope is that AD backends start to adopt AbstractDifferentiation, which would allow us to use one common API for all differentiation backends and to support every backend that implements it automatically (well, in theory at least, since the backends still have to be able to differentiate the models and e.g. support Distributions - but this would have to be fixed in other packages, not in Turing).

More practically, I think it would be better to document how to benchmark gradient computations with different backends and possibly provide some convenience functions (again, hopefully this will become easier with AbstractDifferentiation). I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well.
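To compare backends, the same wrapper can be reused; this is again only a sketch (it assumes `LogDensityProblemsAD` accepts these backend symbols, and it reuses `ℓ` and `θ` from the sketch above):

```julia
import ForwardDiff, ReverseDiff, Zygote  # each backend has to be loaded explicitly

for backend in (:ForwardDiff, :ReverseDiff, :Zygote)
    ∇ℓ = ADgradient(backend, ℓ)
    println(backend)
    @btime LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ)
end
```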

I guess for gradient tracking we would have to report the gradient in the transition, and then users could use a callback (possibly a helper would be useful).

@maedoc
Author

maedoc commented Oct 29, 2021

While I mentioned gradients, I think some feedback about how long model evaluations are taking would be helpful for any method, especially in Julia, where one wants to check that some optimisation technique has taken effect. This is less of an issue with Stan, where performance is more predictable for non-experts. For context, this discourse thread is where this issue/request came from.

> I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well.

Doesn't one need to compile anyway? A few extra evaluations aren't much to provide the information to a user; it's technically overhead, but in the same way a progress bar is.

@cpfiffer
Member

> Doesn't one need to compile anyway?

Not in the sense that Stan does it. Turing is all JIT compiled, so the initial timings would be very poor anyways.
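As an illustration, in a fresh session (reusing `∇ℓ` and `θ` from the sketch in the first comment):

```julia
# The first call pays the JIT compilation cost; only later calls reflect the
# steady-state evaluation time that would be worth reporting to the user.
@time LogDensityProblems.logdensity_and_gradient(∇ℓ, θ)  # dominated by compilation
@time LogDensityProblems.logdensity_and_gradient(∇ℓ, θ)  # actual evaluation time
```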

> A few extra evaluations aren't much to provide the information to a user; it's technically overhead, but in the same way a progress bar is.

A different way to think about this is to calculate a running estimate of gradient/joint timings, and then report them when they stop moving around -- i.e. after maybe 100 evaluations it could report an accurate time without actually incurring extra evaluations.
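A rough sketch of that running-estimate idea, written as a sampling callback. It assumes AbstractMCMC's `callback` keyword with the signature `(rng, model, sampler, transition, state, iteration; kwargs...)`, reuses the `demo` model from the first comment, and the `IterationTimer` name is made up:

```julia
mutable struct IterationTimer
    last::Float64   # time of the previous iteration
    mean::Float64   # running mean of seconds per iteration
    n::Int          # number of iterations seen so far
end
IterationTimer() = IterationTimer(time(), 0.0, 0)

# Callable so it can be passed as the callback; reports the estimate once it
# has had some iterations to settle down.
function (t::IterationTimer)(rng, model, sampler, transition, state, iteration; kwargs...)
    now = time()
    t.n += 1
    t.mean += ((now - t.last) - t.mean) / t.n
    t.last = now
    iteration == 100 && @info "estimated seconds per iteration: $(round(t.mean; sigdigits=3))"
    return nothing
end

chain = sample(demo(1.5), NUTS(), 1_000; callback=IterationTimer())
```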

@torfjelde
Member

For just benchmarking the AD-performance, this used to work: https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7

Probably still works.

IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration but this is still useful information).
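Until something like that exists, a crude one-off estimate is just wall-clock time divided by the number of iterations (a sketch; it includes compilation and adaptation, so it overstates the steady-state cost, and it reuses `demo` from the first comment):

```julia
n = 1_000
t = @elapsed sample(demo(1.5), NUTS(), n; progress=false)
println("≈ ", round(t / n; sigdigits=3), " seconds per iteration (including compilation and warm-up)")
```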

@cpfiffer
Member

> IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration but this is still useful information).

Agreed. I suppose we'd throw this into the default logger, no?

@yebai
Member

yebai commented Nov 12, 2022

@yebai closed this as completed Nov 12, 2022