ADTypes + ADgradient Performance #727
Conversation
Pull Request Test Coverage Report for Build 12068201796 (Coveralls)
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff            @@
##           master     #727    +/-  ##
========================================
+ Coverage   86.32%   86.34%   +0.01%
========================================
  Files          35       34       -1
  Lines        4249     4254       +5
========================================
+ Hits         3668     3673       +5
  Misses        581      581
Pull Request Test Coverage Report for Build 12206812251 (Coveralls)
Maybe you have seen this, in DynamicPPL.jl/ext/DynamicPPLReverseDiffExt.jl (lines 11 to 24 at 48921d3):
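(The referenced snippet isn't reproduced here; roughly, it is an extension method of the following shape — a hypothetical sketch, not the literal contents of those lines:)

```julia
using ADTypes: AutoReverseDiff
using DynamicPPL: DynamicPPL
using LogDensityProblemsAD: LogDensityProblemsAD

# Hypothetical sketch of the kind of method defined in the package extension:
# translate the ADTypes-style backend selector into the older Val-based
# LogDensityProblemsAD constructor.
function LogDensityProblemsAD.ADgradient(::AutoReverseDiff, ℓ::DynamicPPL.LogDensityFunction)
    return LogDensityProblemsAD.ADgradient(Val(:ReverseDiff), ℓ)
end
```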
Can the above code be removed then?
Yes please :) We have one for ForwardDiff.jl (see …)
Actually, does anyone know whether here is the only place that we interact with …?
AFAIK that's the only place where LogDensityProblems is used, so yes, it seems we could just pass in an …
Not sure I fully follow …
I could have been clearer in my explanation. Here's a better one.

**The Problem**

My issue with the current implementation is method ambiguities. I've defined a method with signature

`Tuple{typeof(ADgradient), AbstractADType, LogDensityFunction}`

but there exist other methods in LogDensityProblemsAD.jl, located around here, with signatures such as

`Tuple{typeof(ADgradient), AutoEnzyme, Any}`
`Tuple{typeof(ADgradient), AutoForwardDiff, Any}`
`Tuple{typeof(ADgradient), AutoReverseDiff, Any}`

etc. Now, we currently have methods in DynamicPPL.jl (defined in extensions) with signatures

`Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}`
`Tuple{typeof(ADgradient), AutoReverseDiff, LogDensityFunction}`

which resolve the ambiguity discussed above for those backends (a self-contained toy reproducing the ambiguity is given at the end of this comment).

**Potential Solutions**

My initial proposal above was to avoid this method ambiguity entirely by just not defining any new methods of `ADgradient`. This seems like a fine solution if we only ever call it in a single place (i.e. in …).

Another option would be to introduce another function to the DPPL interface, which has two methods, with signatures

`Tuple{typeof(make_ad_gradient), ADType, LogDensityFunction}` # ADType interface
`Tuple{typeof(make_ad_gradient), ::Val, LogDensityFunction}` # old LogDensityProblemsAD `Val` interface

both of which would construct an `ADgradient`. This function would need to be documented as part of the public DynamicPPL interface, and linked to from the docstring for ….

Thoughts @penelopeysm @torfjelde @sunxd3 ?
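To make the ambiguity concrete, here is the toy referred to above; the types and function below are stand-ins, not the real LogDensityProblemsAD / DynamicPPL definitions:

```julia
# Toy types standing in for ADTypes.AbstractADType / ADTypes.AutoForwardDiff
# and DynamicPPL.LogDensityFunction.
abstract type AbstractADType end
struct AutoForwardDiff <: AbstractADType end
struct LogDensityFunction end

# "LogDensityProblemsAD-style" method: specific backend, generic target.
adgradient(::AutoForwardDiff, ℓ) = :backend_specific

# "DynamicPPL-style" method: generic backend, specific target.
adgradient(::AbstractADType, ℓ::LogDensityFunction) = :target_specific

# Neither method is more specific than the other for this call, so Julia
# throws a MethodError complaining that the call is ambiguous.
adgradient(AutoForwardDiff(), LogDensityFunction())
```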
I think all of you guys have better opinions on interface than I do, so this is more like a discussion point than a strong suggestion. I think `Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}` can cause confusion for (potential) maintainers (us), but is straightforward for users that are familiar with …. I like the idea of ….
Ah, damn 😕 Yeah this ain't great. But @willtebbutt why do we need to define this extraction for the …?
We had this before, but a big part of the motivation for moving to LogDensityProblemsAD.jl was to not diverge from the ecosystem by defining our own make_ad functions, so this goes quite counter to that. If we make a new method, then the selling point that "you can also just treat a model as a LogDensityProblems.jl problem!" sort of isn't true anymore, no?
Hmmm yes, I agree that it would be a great shame to do something that users aren't expecting here. Okay, I propose the following: …
I'm going to implement this now to see what it looks like. |
Happy with the internal …
Is it often the case that CI times out, or should I look into why this might be happening? |
Update: we're not going to be able to get this merged until JuliaRegistries/General#120562 is resolved. |
@mhauru @torfjelde any idea what this OOM error is about? Have we seen it anywhere else? It looks x86 specific, and like it's happening in a part of the pipeline which isn't AD related, but it does scare me a little bit. |
Seems like a case of this: #725
Cool. As far as I'm concerned, this is ready to go then (I've bumped the patch version). @penelopeysm could you approve if you're happy, and we'll get it merged.
Dopey stuff @willtebbutt :)
The way to use Mooncake with DPPL is to make use of the generic DifferentiationInterface.jl interface that was added to LogDensityProblemsAD.jl, i.e. write something like the call sketched below, where `log_density_function` is a `DPPL.LogDensityFunction`.
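A minimal sketch of that usage (assuming `AutoMooncake` from ADTypes.jl is how Mooncake is selected, and that `θ` is some parameter vector defined elsewhere):

```julia
using ADTypes: AutoMooncake
using LogDensityProblemsAD: ADgradient
import LogDensityProblems

# Wrap the DynamicPPL log density so that gradients are computed by Mooncake
# via the DifferentiationInterface-backed LogDensityProblemsAD machinery.
∇ℓ = ADgradient(AutoMooncake(; config=nothing), log_density_function)

# Evaluate the log density and its gradient at the parameter vector θ.
LogDensityProblems.logdensity_and_gradient(∇ℓ, θ)
```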
By default, this will hit this method in LogDensityProblemsAD. This leads to DifferentiationInterface not having sufficient information to construct its `prep` object, in which various things are pre-allocated and, in the case of Mooncake, the rule is constructed. This means that this method of `logdensity_and_gradient` gets hit, in which the `prep` object is reconstructed each and every time the rule is hit. This is moderately bad for Mooncake's performance, because this includes fetching the rule each and every time this function is called.

This PR adds a method to `ADgradient` which is specialised to `LogDensityFunction` and `AbstractADType`, and which ensures that the optional `x` kwarg is always passed in. This is enough to ensure good performance with Mooncake (see the sketch at the end of this description).

Questions:
- … `setmodel` to always do this every time that `ADgradient` is called.
- … `ADgradient` runs correctly?

Misc:
- … `[extras]` block in the primary `Project.toml`, because we use the `test/Project.toml` for our test deps.
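Concretely, the shape of the change is roughly the following (a sketch, not the PR's exact diff; `prepared_adgradient` and `initial_params` are illustrative names rather than anything in the codebase):

```julia
using ADTypes: AbstractADType
using DynamicPPL: DynamicPPL
using LogDensityProblemsAD: LogDensityProblemsAD

function prepared_adgradient(
    ad::AbstractADType,
    ℓ::DynamicPPL.LogDensityFunction,
    initial_params::AbstractVector{<:Real},
)
    # Supplying `x` up front gives DifferentiationInterface enough information
    # to build its `prep` object (and, for Mooncake, the rule) once, instead of
    # rebuilding it on every logdensity_and_gradient call.
    return LogDensityProblemsAD.ADgradient(ad, ℓ; x=initial_params)
end
```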