
Fix + test for compiled ReverseDiff without linking #2097

Merged 1 commit on Oct 5, 2023

Conversation

torfjelde
Member

Currently, we have this (from https://turinglang.org/TuringBenchmarking.jl/dev/):

julia> using TuringBenchmarking, Turing


julia> @model function demo(x)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, sqrt(s))
           for i in 1:length(x)
               x[i] ~ Normal(m, sqrt(s))
           end
       end
demo (generic function with 2 methods)


julia> model = demo([1.5, 2.0]);


julia> benchmark_model(
           model;
           # Check correctness of computations
           check=true,
           # Automatic differentiation backends to check and benchmark
           adbackends=[:forwarddiff, :reversediff, :reversediff_compiled, :zygote]
       )
┌ Warning: There is disagreement in the log-density values!
└ @ TuringBenchmarking ~/work/TuringBenchmarking.jl/TuringBenchmarking.jl/src/TuringBenchmarking.jl:248
┌──────────────────────────────────────┬─────────────┐
│                             Standard │ Log-density │
│                              backend │    distance │
├──────────────────────────────────────┼─────────────┤
│                ForwardDiff vs Zygote │        0.00 │
│ ForwardDiff vs ReverseDiff[compiled] │        0.59 │
│           ForwardDiff vs ReverseDiff │        0.00 │
│      Zygote vs ReverseDiff[compiled] │        0.59 │
│                Zygote vs ReverseDiff │        0.00 │
│ ReverseDiff[compiled] vs ReverseDiff │        0.59 │
└──────────────────────────────────────┴─────────────┘
┌ Warning: There is disagreement in the gradients!
└ @ TuringBenchmarking ~/work/TuringBenchmarking.jl/TuringBenchmarking.jl/src/TuringBenchmarking.jl:255
┌──────────────────────────────────────┬──────────┐
│                             Standard │ Gradient │
│                              backend │ distance │
├──────────────────────────────────────┼──────────┤
│                ForwardDiff vs Zygote │     0.00 │
│ ForwardDiff vs ReverseDiff[compiled] │     1.20 │
│           ForwardDiff vs ReverseDiff │     0.00 │
│      Zygote vs ReverseDiff[compiled] │     1.20 │
│                Zygote vs ReverseDiff │     0.00 │
│ ReverseDiff[compiled] vs ReverseDiff │     1.20 │
└──────────────────────────────────────┴──────────┘
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "evaluation" => 2-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "linked" => Trial(500.000 ns)
	  "standard" => Trial(400.000 ns)
  "gradient" => 4-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "Turing.Essential.ReverseDiffAD{false}()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff"]
		  "linked" => Trial(11.800 μs)
		  "standard" => Trial(11.200 μs)
	  "Turing.Essential.ReverseDiffAD{true}()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ReverseDiff[compiled]"]
		  "linked" => Trial(1.900 μs)
		  "standard" => Trial(1.900 μs)
	  "Turing.Essential.ForwardDiffAD{0, true}()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["ForwardDiff"]
		  "linked" => Trial(800.000 ns)
		  "standard" => Trial(700.000 ns)
	  "Turing.Essential.ZygoteAD()" => 2-element BenchmarkTools.BenchmarkGroup:
		  tags: ["Zygote"]
		  "linked" => Trial(778.310 μs)
		  "standard" => Trial(768.010 μs)

That is, compiled ReverseDiff is incorrect when not linking! Super-strange, right?

Weeeell, not so much; LogDensityProblemsAD.jl uses zeros as the default input for compiling the tape, which, in the case where we have not performed any linking, causes issues with models involving, say, positively constrained distributions a la InverseGamma: https://github.com/tpapp/LogDensityProblemsAD.jl/blob/e13061ff72ddedb1fccf4deeb69f713972300239/ext/LogDensityProblemsADReverseDiffExt.jl#L54-L58

Note that this is not LogDensityProblemsAD.jl's fault, as it assumes we're working in unconstrained space.
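For intuition (this sketch is not part of the PR), here is a minimal illustration of why compiling a tape at zeros can go wrong: ReverseDiff bakes the control-flow path taken at the compile-time input into the tape, so a branch such as an `insupport` check gets frozen. The function `f` below is a made-up stand-in for a constrained log-density:

```julia
using ReverseDiff

# Value-dependent control flow, similar to an `insupport` check in a logpdf.
f(x) = x[1] > 0 ? log(x[1]) : zero(x[1])

# Compile the tape at `zeros`, mimicking LogDensityProblemsAD's default.
tape_at_zero = ReverseDiff.compile(ReverseDiff.GradientTape(f, zeros(1)))

# The compiled tape replays the branch taken at compile time (`x[1] <= 0`),
# so at a positive input it silently returns the wrong gradient:
ReverseDiff.gradient!(zeros(1), tape_at_zero, [2.0])  # [0.0], not the true 1/2

# Compiling at a point that takes the "right" branch gives the correct result:
tape_ok = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0]))
ReverseDiff.gradient!(zeros(1), tape_ok, [2.0])       # [0.5]
```

This is exactly the pitfall ReverseDiff's own docs warn about for compiled tapes and value-dependent control flow; unlinked constrained parameters just make it easy to hit, because `zeros` lies on the boundary of (or outside) the support.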

This PR addresses this issue. It's not a very common use-case, but it's useful for identifying performance issues with transformations + it's also relevant if we want to work with Float32 instead of Float64, as the current implementation would then compile the tape with Float64 every time.
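The underlying idea is to compile the tape at the model's actual parameter values rather than at zeros. As a sketch of the mechanism (not necessarily the code in this PR; `demo_pos` is a hypothetical model, and the `x` keyword of `ADgradient` is how recent `LogDensityProblemsAD` versions let callers supply the compile-time input):

```julia
using Turing, DynamicPPL, LogDensityProblems, LogDensityProblemsAD, ReverseDiff

@model function demo_pos()
    s ~ InverseGamma(2, 3)   # positively constrained
end

model = demo_pos()
vi = DynamicPPL.VarInfo(model)            # unlinked: `s` stays in (0, ∞)
ℓ = DynamicPPL.LogDensityFunction(model, vi)

x = vi[:]                                 # valid constrained values, so s > 0
∇ℓ = LogDensityProblemsAD.ADgradient(
    Val(:ReverseDiff), ℓ;
    compile = Val(true),
    x = x,                                # compile the tape here, not at zeros
)
LogDensityProblems.logdensity_and_gradient(∇ℓ, x)
```

Compiling at a point drawn from the model itself guarantees the tape records the in-support branches, and it also picks up the element type of `x`, which is what makes the Float32 use-case above work.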

@yebai yebai requested a review from sunxd3 October 4, 2023 16:13
@codecov

codecov bot commented Oct 4, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (ed410b1) 0.00% compared to head (31e8f70) 0.00%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #2097   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files          21      21           
  Lines        1451    1451           
======================================
  Misses       1451    1451           
Files Coverage Δ
src/essential/ad.jl 0.00% <0.00%> (ø)


@sunxd3
Member

sunxd3 commented Oct 5, 2023

Interesting. Assuming people use autodiff for gradient-based sampling algorithms, are there gradient-based algorithms that do not require unconstrained space?

@torfjelde
Member Author

then are there gradient algorithms that do not require unconstrained space?

Yup, e.g. reflective HMC, though we don't currently have it implemented (there is interest, though: TuringLang/AdvancedHMC.jl#310).

@sunxd3
Member

sunxd3 commented Oct 5, 2023

@torfjelde actually, why does compiling with zero inputs cause wrong results? Is it because ReverseDiff uses the inputs for specialization?

@yebai yebai merged commit b5a07b7 into master Oct 5, 2023
13 checks passed
@yebai yebai deleted the torfjelde/fix-for-reversediff-without-linking branch October 5, 2023 16:23
torfjelde added a commit that referenced this pull request Oct 6, 2023
yebai pushed a commit to TuringLang/JuliaBUGS.jl that referenced this pull request Oct 12, 2023
In light of TuringLang/Turing.jl#2097, we know that computations with compiled `ReverseDiff` can sometimes be wrong because `LogDensityProblemsAD` uses a zeros array as the input for the tape-compilation process.

This PR adds a function `getparams` similar to
[`DynamicPPL.jl`'s](https://github.com/TuringLang/DynamicPPL.jl/blob/d204fcb658a889421525365808b9830be37d3fdb/src/logdensityfunction.jl#L89).

It also updates the function `get_params_varinfo` so that we can
return a DynamicPPL-compatible `SimpleVarInfo` with values in unconstrained
space.