Easy way to get gradient evaluation timing #1721
More generally, I think it'd be nice if we had a more fine-grained ability to hook into gradient evaluations. For example, I had a use case where I wanted to examine the gradients as they rolled along, but I ended up writing a custom fork of Turing to do so.
I guess the main issue is that you have many more choices in Turing/Julia for computing gradients than in Stan (and of course some samplers don't use gradients at all). My hope is that AD backends start to adopt AbstractDifferentiation, which would allow us to use one common API for all differentiation backends and to support every backend that implements it automatically (well, in theory at least, since the backends still have to be able to differentiate the models and e.g. support Distributions, but this would have to be fixed in other packages, not in Turing).

More practically, I think it would be better to document how to benchmark gradient computations with different backends and possibly provide some convenience functions (again, hopefully this will become easier with AbstractDifferentiation). I think we shouldn't perform any benchmarks at the beginning of sampling, since valid timings would require at least two (and ideally more) executions to ensure that we don't report compilation time as well. I guess for gradient tracking we would have to report it in the transition, and then users could use a callback (possibly a helper would be useful).
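A minimal sketch of what such documentation could show, assuming a recent DynamicPPL that exposes the log joint via `LogDensityFunction` and LogDensityProblemsAD for attaching an AD backend (the exact wrapper API has moved between releases, and the `demo` model is just a placeholder):

```julia
using Turing, DynamicPPL
using LogDensityProblems, LogDensityProblemsAD
using BenchmarkTools, ForwardDiff

# Placeholder model purely for illustration.
@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    for i in eachindex(x)
        x[i] ~ Normal(μ, σ)
    end
end

model = demo(randn(100))

# Wrap the log joint as a LogDensityProblems object and attach an AD backend;
# swap :ForwardDiff for :ReverseDiff, :Zygote, etc. to compare backends.
ℓ = DynamicPPL.LogDensityFunction(model)
∇ℓ = LogDensityProblemsAD.ADgradient(:ForwardDiff, ℓ)

θ = randn(LogDensityProblems.dimension(ℓ))

# BenchmarkTools takes many samples, so JIT compilation does not
# end up in the reported estimates.
display(@benchmark LogDensityProblems.logdensity($ℓ, $θ))                # model evaluation only
display(@benchmark LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ))  # evaluation + gradient
```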
While I mentioned gradients, I think some feedback about how long model evaluations are taking would be helpful for any method, especially in Julia, where one wants to check that some optimisation technique has taken effect. This is less of an issue with Stan, where performance is more predictable for non-experts. For context, this Discourse thread is where this issue/request came from.
Doesn't one need to compile anyway? A few extra evaluations aren't much to provide the information to a user; it's technically overhead, but in the same way a progress bar is.
Not in the sense that Stan does it. Turing is all JIT compiled, so the initial timings would be very poor anyway.
A different way to think about this is to calculate a running estimate of gradient/joint timings, and then report them when they stop moving around -- i.e. after maybe 100 evaluations it could report an accurate time without actually incurring extra evaluations.
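A minimal sketch of that idea (all names here are hypothetical, not an existing Turing API): keep an online mean of per-evaluation wall time inside whatever wraps the gradient call, and emit it once the estimate has stabilised.

```julia
# Hypothetical running-estimate helper: accumulate per-evaluation wall time
# and report the mean once it has stopped moving.
mutable struct RunningTimer
    total::Float64   # accumulated seconds
    n::Int           # number of timed evaluations
    reported::Bool
end
RunningTimer() = RunningTimer(0.0, 0, false)

function record!(timer::RunningTimer, seconds; min_evals=100, rtol=0.05)
    old_mean = timer.n > 0 ? timer.total / timer.n : Inf
    timer.total += seconds
    timer.n += 1
    new_mean = timer.total / timer.n
    # Report once the mean has stabilised (relative change below rtol)
    # and we have seen enough evaluations to exclude compilation noise.
    if !timer.reported && timer.n >= min_evals &&
       abs(new_mean - old_mean) <= rtol * new_mean
        @info "Estimated time per gradient evaluation: $(new_mean) s"
        timer.reported = true
    end
    return new_mean
end

# Usage inside a (hypothetical) gradient wrapper:
#   t = @elapsed logdensity_and_gradient(∇ℓ, θ)
#   record!(timer, t)
```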
For just benchmarking the AD performance, this used to work: https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7 Probably still works. IMO we should at least show an estimate of "seconds per iteration" (yes, some samplers have varying seconds/iteration, but this is still useful information).
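In the meantime, a rough seconds-per-iteration number can be obtained by hand with the public `sample` API (this reuses the hypothetical `demo` model from the sketch above; the warm-up run is there so JIT compilation doesn't end up in the estimate, and for adaptive samplers like NUTS the number is only a rough average):

```julia
using Turing

model = demo(randn(100))  # model from the earlier sketch

sample(model, NUTS(), 10; progress=false)                 # warm-up run absorbs compilation
t = @elapsed sample(model, NUTS(), 100; progress=false)   # timed run
println("≈ ", round(t / 100; sigdigits=3), " seconds per iteration")
```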
Agreed. I suppose we'd throw this into the default logger, no?
The performance tips page doesn't mention how to get the gradient evaluation timing of a model. CmdStan prints this before running the sampler, and it is a helpful number for distinguishing code performance issues from parametrisation issues. Could a tip be added for this?