ADTypes + ADgradient Performance #727

willtebbutt · 2024-11-28T11:41:18Z

The way to use Mooncake with DPPL is to make use of the generic DifferentiationInterface.jl interface that was added to LogDensityProblemsAD.jl. i.e. write something like

ADgradient(ADTypes.AutoMooncake(; config=nothing), log_density_function)

where log_density_function is a DPPL.LogDensityFunction.

By default, this will hit this method in LogDensityProblemsAD.

This leads to DifferentiationInterface not having sufficient information to construct its prep object, in which various things are pre-allocated and, in the case of Mooncake, the rule is constructed. This means that this method of logdensity_and_gradient gets hit, in which the prep object is reconstructed each and every time the rule is hit. This is moderately bad for Mooncake's performance, because this includes fetching the rule each and every time this function is called.

This PR adds a method to ADGradient which is specialised to LogDensityFunction and AbstractADType which ensures that the optional x kwarg is always passed in. This is enough to ensure good performance with Mooncake.

Questions:

is this the optimal way to implement this? Another option might be to modify setmodel to always do this every time that ADgradient is called.
Where / how should I test this? Should I just add Mooncake to the test suite and verify that ADgradient runs correctly?

Misc:

I've removed the [extras] block in the primary Project.toml because we use the test/Project.toml for our test deps.
I've bumped the patch so that we can tag a release asap after this is merged.

coveralls · 2024-11-28T12:05:11Z

Pull Request Test Coverage Report for Build 12068201796

Details

0 of 3 (0.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.06%) to 84.294%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/logdensityfunction.jl	0	3	0.0%

Totals
Change from base Build 12056044639:	-0.06%
Covered Lines:	3553
Relevant Lines:	4215

💛 - Coveralls

codecov · 2024-11-28T12:06:00Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.34%. Comparing base (5a58571) to head (5b7ab97).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #727      +/-   ##
==========================================
+ Coverage   86.32%   86.34%   +0.01%     
==========================================
  Files          35       34       -1     
  Lines        4249     4254       +5     
==========================================
+ Hits         3668     3673       +5     
  Misses        581      581

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coveralls · 2024-11-28T12:07:06Z

Pull Request Test Coverage Report for Build 12206812251

Details

7 of 7 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.02%) to 86.342%

Totals
Change from base Build 12200518816:	0.02%
Covered Lines:	3673
Relevant Lines:	4254

💛 - Coveralls

sunxd3 · 2024-11-28T12:36:55Z

maybe you have seen this, in ReverseDiffExt there is similar code for different reason

DynamicPPL.jl/ext/DynamicPPLReverseDiffExt.jl

Lines 11 to 24 in 48921d3

    
           function LogDensityProblemsAD.ADgradient( 
        
               ad::ADTypes.AutoReverseDiff{Tcompile}, ℓ::DynamicPPL.LogDensityFunction 
        
           ) where {Tcompile} 
        
               return LogDensityProblemsAD.ADgradient( 
        
                   Val(:ReverseDiff), 
        
                   ℓ; 
        
                   compile=Val(Tcompile), 
        
                   # `getparams` can return `Vector{Real}`, in which case, `ReverseDiff` will initialize the gradients to Integer 0 
        
                   # because at https://github.com/JuliaDiff/ReverseDiff.jl/blob/c982cde5494fc166965a9d04691f390d9e3073fd/src/tracked.jl#L473 
        
                   # `zero(D)` will return 0 when D is Real. 
        
                   # here we use `identity` to possibly concretize the type to `Vector{Float64}` in the case of `Vector{Real}`. 
        
                   x=map(identity, DynamicPPL.getparams(ℓ)), 
        
               ) 
        
           end

can the above code be removed then?

willtebbutt · 2024-11-28T13:17:44Z

Ooooo I think we might be able to. This is because there is this code in the DI.jl extension of LogDensityProblemsAD.

Probably good to do a quick performance check though.

I wonder whether the ForwardDiff code found here could also be removed...

torfjelde · 2024-11-28T15:20:51Z

Where / how should I test this? Should I just add Mooncake to the test suite and verify that ADgradient runs correctly?

Yes please:) We have one for ForwardDiff.jl (see test/ext/... and the AD tests in runtests.jl). Maybe just add a similar one?

willtebbutt · 2024-11-28T17:57:25Z

Actually, does anyone know whether here is the only place that we interact with ADgradient? It's the only place I can find it in DPPL. I wonder whether we should just change the call here to pass in x, rather than adding random extra methods of ADgradient?

penelopeysm · 2024-11-28T18:26:18Z

AFAIK that's the only place where LogDensityProblems is used, so yes, it seems we could just pass in an x there.

torfjelde · 2024-11-28T19:52:34Z

Not sure I fully follow

willtebbutt · 2024-11-29T09:12:41Z

I could have been clearer in my explanation. Here's a better one.

The Problem

My issue with the current implementation is method ambiguities. I've defined a method with signature

Tuple{typeof(ADgradient), AbstractADType, LogDensityFunction}

but there exist other methods in LogDensityProblemsAD.jl, located around here, with signatures such as

Tuple{typeof(ADgradient), AutoEnzyme, Any}
Tuple{typeof(ADgradient), AutoForwardDiff, Any}
Tuple{typeof(ADgradient), AutoReverseDiff, Any}

etc. Now, we currently have methods in DynamicPPL.jl (defined in extensions) which have signatures

Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}
Tuple{typeof(ADgradient), AutoReverseDiff, LogDensityFunction}

which resolve the ambiguity discussed above for AutoForwardDiff and AutoReverseDiff, but I imagine we'll encounter problems for AutoEnzyme and AutoZygote. Also, we would quite like to remove these methods, so they don't constitute a solution to the problem.

Potential Solutions

My initial proposal above was to avoid this method ambiguity entirely by just not defining any new methods of ADgradient, and simply ensuring that we always make sure to pass in the x kwarg when calling ADgradient with an AbstractADType.

This seems like a fine solution if we only ever call it in a single place (i.e. in setmodel), but if we call ADgradient in many places, it's a pain to ensure that we do the (somewhat arcane) thing required to get x in all of the places.

Another option would be to introduce another function to the DPPL interface, which has two methods, with signatures

Tuple{typeof(make_ad_gradient), ADType, LogDensityFunction} # ADType interface
Tuple{typeof(make_ad_gradient), ::Val, LogDensityFunction} # old LogDensityProblemsAD with `Val` interface

Both of which would construct an ADgradient in whatever the correct manner is.

This function would need to be documented as part of the public DynamicPPL interface, and linked to from the docstring for LogDensityFunction.

Thoughts @penelopeysm @torfjelde @sunxd3 ?

sunxd3 · 2024-11-29T09:45:35Z

I think all of you guys have better opinions on interface than I do. So this is more like a discussion point rather than strong suggestion.

I think

Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}

can causes confusion for (potential) maintainers (us), but straightforward for users that are familiar with LogDensityProblemsAD.

I like the idea of make_ad_gradient to avoid ambiguity. But it might be somewhat unavoidable that someone would try to call ADgradient with LogDensityFunction just because they think: "okay, LogDensityFunction conforms to LogDenistyProblems interface, so it should just work withLogDensityProblemsAD." Then we would need to make ADgradient work regardless.

torfjelde · 2024-11-29T10:14:44Z

My issue with the current implementation is method ambiguities. I've defined a method with signature

Ah, damn 😕 Yeah this ain't great.

But @willtebbutt why do we need to define this extraction for the AbstractADBackend? Why dont' we just do ths on a case-by-case basis? Sure, that is a bit annoying, but there aren't that many AD backends we need to do it for.

Another option would be to introduce another function to the DPPL interface, which has two methods, with signatures

We had this before, but a big part of the motivation for moving to LogDensityProblemsAD.jl was to not diverge from the ecosystem by defining our own make_ad functions, so this goes quite counter to that. IF we make a new method, then the selling point that "you can also just treat a model as a LogDensityProblems.jl problem!" sort of isnt' true anymore, no?

willtebbutt · 2024-11-29T10:21:13Z

Hmmm yes, I agree that it would be a great shame to do something that users aren't expecting here.

Okay, I propose the following:

we define an internal function called _make_ad_gradient,
for each ADType we care about we add a method of ADgradient to an extension, which just defers the call to _make_ad_gradient. i.e. it should just be a 1-liner.

I'm going to implement this now to see what it looks like.

torfjelde · 2024-11-29T10:34:35Z

Happy with the internal _make_ad_gradient:)

willtebbutt · 2024-11-29T16:36:14Z

Is it often the case that CI times out, or should I look into why this might be happening?

Project.toml

willtebbutt · 2024-12-04T08:14:03Z

Update: we're not going to be able to get this merged until JuliaRegistries/General#120562 is resolved.

willtebbutt · 2024-12-05T14:15:28Z

@mhauru @torfjelde any idea what this OOM error is about? Have we seen it anywhere else? It looks x86 specific, and like it's happening in a part of the pipeline which isn't AD related, but it does scare me a little bit.

mhauru · 2024-12-05T14:33:58Z

Seems like a case of this #725

penelopeysm · 2024-12-05T14:36:20Z

#725

I thought I'd look into it, but those tests will be removed in #733 anyway so couldn't be bothered to track it down. Personally I'd be ok with ignoring the error.

willtebbutt · 2024-12-05T14:44:00Z

Cool. As far as I'm concerned, this is reading to go then (I've bumped the patch version). @penelopeysm could you approve if you're happy, and we'll get it merged.

test/ad.jl

torfjelde

Dopey stuff @willtebbutt :)

willtebbutt added 8 commits November 28, 2024 09:48

ADTypes interop

cc98429

Improve comment

749eca9

Bump patch version

4e6b97c

Formatting

f8847db

Formatting

1b33c75

Improve documentation

cc17471

Testing infrastructure

ea39bc7

Remove extras from main Project toml

c8c95c5

willtebbutt requested review from penelopeysm and torfjelde November 28, 2024 11:42

Merge branch 'master' into wct/mooncake-perf

88d132b

willtebbutt added 2 commits November 28, 2024 17:31

Apply some basic tests

3877cf0

Locate tests better

46bbf06

willtebbutt added 2 commits November 29, 2024 11:11

Internal _make_ad_gradient

4a234d6

Mark failing tests as broken

6fb7f9b

Formatting

99532e0

willtebbutt closed this Nov 30, 2024

willtebbutt reopened this Nov 30, 2024

Merge in main

d73668d

willtebbutt closed this Dec 2, 2024

willtebbutt reopened this Dec 2, 2024

willtebbutt closed this Dec 2, 2024

willtebbutt reopened this Dec 2, 2024

willtebbutt commented Dec 2, 2024

View reviewed changes

Project.toml Outdated Show resolved Hide resolved

Update Project.toml

21c2a0a

willtebbutt mentioned this pull request Dec 2, 2024

AD Meta Issue for 1.0 TuringLang/Turing.jl#2411

Open

9 tasks

willtebbutt and others added 2 commits December 4, 2024 18:42

Updates

73fbf34

Merge branch 'master' into wct/mooncake-perf

b21af0b

willtebbutt closed this Dec 5, 2024

willtebbutt reopened this Dec 5, 2024

Bump patch version

a785c5c

penelopeysm approved these changes Dec 5, 2024

View reviewed changes

wsmoses reviewed Dec 6, 2024

View reviewed changes

test/ad.jl Show resolved Hide resolved

torfjelde approved these changes Dec 6, 2024

View reviewed changes

penelopeysm mentioned this pull request Dec 6, 2024

Add Enzyme to AD tests #739

Open

penelopeysm added 2 commits December 6, 2024 22:03

Merge branch 'master' into wct/mooncake-perf

c2ff015

Bump patch again

5b7ab97

penelopeysm merged commit f0c31f0 into master Dec 7, 2024
11 of 13 checks passed

penelopeysm deleted the wct/mooncake-perf branch December 7, 2024 01:52

penelopeysm mentioned this pull request Dec 7, 2024

New minor versions of DPPL can't be tested #740

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADTypes + ADgradient Performance #727

ADTypes + ADgradient Performance #727

willtebbutt commented Nov 28, 2024 •

edited

Loading

coveralls commented Nov 28, 2024 •

edited

Loading

codecov bot commented Nov 28, 2024 •

edited

Loading

coveralls commented Nov 28, 2024 •

edited

Loading

sunxd3 commented Nov 28, 2024

willtebbutt commented Nov 28, 2024

torfjelde commented Nov 28, 2024

willtebbutt commented Nov 28, 2024

penelopeysm commented Nov 28, 2024

torfjelde commented Nov 28, 2024

willtebbutt commented Nov 29, 2024 •

edited

Loading

sunxd3 commented Nov 29, 2024

torfjelde commented Nov 29, 2024

willtebbutt commented Nov 29, 2024 •

edited

Loading

torfjelde commented Nov 29, 2024

willtebbutt commented Nov 29, 2024

willtebbutt commented Dec 4, 2024

willtebbutt commented Dec 5, 2024 •

edited

Loading

mhauru commented Dec 5, 2024

penelopeysm commented Dec 5, 2024 •

edited

Loading

willtebbutt commented Dec 5, 2024 •

edited

Loading

torfjelde left a comment

ADTypes + ADgradient Performance #727

ADTypes + ADgradient Performance #727

Conversation

willtebbutt commented Nov 28, 2024 • edited Loading

Questions:

Misc:

coveralls commented Nov 28, 2024 • edited Loading

Pull Request Test Coverage Report for Build 12068201796

Details

💛 - Coveralls

codecov bot commented Nov 28, 2024 • edited Loading

Codecov Report

coveralls commented Nov 28, 2024 • edited Loading

Pull Request Test Coverage Report for Build 12206812251

Details

💛 - Coveralls

sunxd3 commented Nov 28, 2024

willtebbutt commented Nov 28, 2024

torfjelde commented Nov 28, 2024

willtebbutt commented Nov 28, 2024

penelopeysm commented Nov 28, 2024

torfjelde commented Nov 28, 2024

willtebbutt commented Nov 29, 2024 • edited Loading

The Problem

Potential Solutions

sunxd3 commented Nov 29, 2024

torfjelde commented Nov 29, 2024

willtebbutt commented Nov 29, 2024 • edited Loading

torfjelde commented Nov 29, 2024

willtebbutt commented Nov 29, 2024

willtebbutt commented Dec 4, 2024

willtebbutt commented Dec 5, 2024 • edited Loading

mhauru commented Dec 5, 2024

penelopeysm commented Dec 5, 2024 • edited Loading

willtebbutt commented Dec 5, 2024 • edited Loading

torfjelde left a comment

Choose a reason for hiding this comment

willtebbutt commented Nov 28, 2024 •

edited

Loading

coveralls commented Nov 28, 2024 •

edited

Loading

codecov bot commented Nov 28, 2024 •

edited

Loading

coveralls commented Nov 28, 2024 •

edited

Loading

willtebbutt commented Nov 29, 2024 •

edited

Loading

willtebbutt commented Nov 29, 2024 •

edited

Loading

willtebbutt commented Dec 5, 2024 •

edited

Loading

penelopeysm commented Dec 5, 2024 •

edited

Loading

willtebbutt commented Dec 5, 2024 •

edited

Loading