ADTypes + ADgradient Performance #727
Conversation
Coveralls: Pull Request Test Coverage Report for Build 12068201796.
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (master vs. #727):

| | master | #727 | +/- |
|---|---|---|---|
| Coverage | 84.35% | 86.39% | +2.04% |
| Files | 35 | 36 | +1 |
| Lines | 4212 | 4183 | -29 |
| Hits | 3553 | 3614 | +61 |
| Misses | 659 | 569 | -90 |
Coveralls: Pull Request Test Coverage Report for Build 12083712337. Warning: this coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch, which means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Maybe you have seen this, in DynamicPPL.jl/ext/DynamicPPLReverseDiffExt.jl, lines 11 to 24 at commit 48921d3.
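Roughly, and as a hedged sketch rather than the exact lines referenced, the extension defines an `ADgradient` method specialised to `LogDensityFunction` for ReverseDiff, along these lines (the keyword handling and the `compile` field access are assumptions):

```julia
using ADTypes: AutoReverseDiff
using DynamicPPL: LogDensityFunction
using LogDensityProblemsAD: LogDensityProblemsAD

# Hedged sketch, not the verbatim extension code: map the ADTypes backend onto
# LogDensityProblemsAD's older Val-based interface for ReverseDiff.
function LogDensityProblemsAD.ADgradient(ad::AutoReverseDiff, ℓ::LogDensityFunction)
    return LogDensityProblemsAD.ADgradient(
        Val(:ReverseDiff), ℓ; compile=Val(ad.compile)  # `compile` field assumed
    )
end
```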
Can the above code be removed then?
Yes please :) We have one for ForwardDiff.jl as well (see the corresponding ForwardDiff extension).
Actually, does anyone know whether this is the only place that we interact with LogDensityProblemsAD?
AFAIK that's the only place where LogDensityProblems is used, so yes, it seems we could just pass in an
Not sure I fully follow.
I could have been clearer in my explanation. Here's a better one.

**The Problem**

My issue with the current implementation is method ambiguities. I've defined a method with signature `Tuple{typeof(ADgradient), AbstractADType, LogDensityFunction}`, but there exist other methods in LogDensityProblemsAD.jl, located around here, with signatures such as

- `Tuple{typeof(ADgradient), AutoEnzyme, Any}`
- `Tuple{typeof(ADgradient), AutoForwardDiff, Any}`
- `Tuple{typeof(ADgradient), AutoReverseDiff, Any}`

etc. Now, we currently have methods in DynamicPPL.jl (defined in extensions) which have signatures

- `Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}`
- `Tuple{typeof(ADgradient), AutoReverseDiff, LogDensityFunction}`

which resolve the ambiguity discussed above for those backends.

**Potential Solutions**

My initial proposal above was to avoid this method ambiguity entirely by just not defining any new methods of `ADgradient`. This seems like a fine solution if we only ever call it in a single place (i.e. in the one place where it's currently used).

Another option would be to introduce another function to the DPPL interface, which has two methods, with signatures

- `Tuple{typeof(make_ad_gradient), ADType, LogDensityFunction}` (ADTypes interface)
- `Tuple{typeof(make_ad_gradient), ::Val, LogDensityFunction}` (old LogDensityProblemsAD `Val` interface)

both of which would construct an `ADgradient`. This function would need to be documented as part of the public DynamicPPL interface, and linked to from the docstring for `ADgradient`.

Thoughts @penelopeysm @torfjelde @sunxd3 ?
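For concreteness, here is a hedged sketch of what that second option might look like. `make_ad_gradient` is just the name floated above, and pulling the initial parameter vector out of the `LogDensityFunction` via its `varinfo` field is an assumption, not settled API:

```julia
using ADTypes: AbstractADType
using DynamicPPL: LogDensityFunction
using LogDensityProblemsAD: LogDensityProblemsAD

# ADTypes route: forward to LogDensityProblemsAD, always supplying an initial `x`
# so that the backend can do its preparation once at construction time.
function make_ad_gradient(ad::AbstractADType, f::LogDensityFunction)
    x = f.varinfo[:]  # assumed way to obtain a parameter vector from `f`
    return LogDensityProblemsAD.ADgradient(ad, f; x=x)
end

# Old-style Val route, kept so the existing LogDensityProblemsAD interface still works.
function make_ad_gradient(ad::Val, f::LogDensityFunction; kwargs...)
    return LogDensityProblemsAD.ADgradient(ad, f; kwargs...)
end
```

Since DPPL would then own `make_ad_gradient` outright, there would be no new methods of `ADgradient` and hence no ambiguity with the methods in LogDensityProblemsAD.jl.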
I think all of you guys have better opinions on interface design than I do, so this is more of a discussion point than a strong suggestion. I think `Tuple{typeof(ADgradient), AutoForwardDiff, LogDensityFunction}` can cause confusion for (potential) maintainers (us), but is straightforward for users who are familiar with LogDensityProblemsAD.jl. I like the idea of `make_ad_gradient`.
Ah, damn 😕 Yeah this ain't great. But @willtebbutt why do we need to define this extraction for the
We had this before, but a big part of the motivation for moving to LogDensityProblemsAD.jl was to not diverge from the ecosystem by defining our own make_ad functions, so this goes quite counter to that. If we make a new method, then the selling point that "you can also just treat a model as a LogDensityProblems.jl problem!" sort of isn't true anymore, no?
Hmmm yes, I agree that it would be a great shame to do something that users aren't expecting here. Okay, I propose the following:
I'm going to implement this now to see what it looks like.
Happy with the internal
Is it often the case that CI times out, or should I look into why this might be happening?
The CI matrix is set to fail-fast, so in this case the Coveralls app failed and that shut everything else down 😅 Rerunning it will probably fix it. Usually, DPPL CI should complete within half an hour on Ubuntu.
The way to use Mooncake with DPPL is to make use of the generic DifferentiationInterface.jl interface that was added to LogDensityProblemsAD.jl. i.e. write something like
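For example (a hedged sketch; `AutoMooncake` from ADTypes is assumed to be the backend type meant here):

```julia
using ADTypes: AutoMooncake
using LogDensityProblemsAD: ADgradient

# Construct a gradient wrapper via the generic DifferentiationInterface-backed
# ADgradient method in LogDensityProblemsAD.
∇ℓ = ADgradient(AutoMooncake(; config=nothing), log_density_function)
```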
where `log_density_function` is a `DPPL.LogDensityFunction`. By default, this will hit this method in LogDensityProblemsAD. This leads to DifferentiationInterface not having sufficient information to construct its `prep` object, in which various things are pre-allocated and, in the case of Mooncake, the rule is constructed. This means that this method of `logdensity_and_gradient` gets hit, in which the `prep` object is reconstructed each and every time the rule is hit. This is moderately bad for Mooncake's performance, because this includes fetching the rule each and every time this function is called.

This PR adds a method to `ADgradient` which is specialised to `LogDensityFunction` and `AbstractADType`, and which ensures that the optional `x` kwarg is always passed in. This is enough to ensure good performance with Mooncake.
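To make the `x` point concrete, here is a hedged sketch of the manual workaround that this method automates, namely passing the optional `x` kwarg yourself so DifferentiationInterface can do its preparation once (how the initial parameters are pulled out of the `LogDensityFunction` is an assumption):

```julia
using ADTypes: AutoMooncake
using LogDensityProblems: logdensity_and_gradient
using LogDensityProblemsAD: ADgradient

# Supplying `x` up front lets DifferentiationInterface build its `prep` object
# (and, for Mooncake, construct the rule) once at construction time.
x0 = log_density_function.varinfo[:]  # assumed way to get initial parameters
∇ℓ = ADgradient(AutoMooncake(; config=nothing), log_density_function; x=x0)

# Subsequent calls then reuse the prepared rule instead of rebuilding it every time.
lp, grad = logdensity_and_gradient(∇ℓ, x0)
```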
Questions:

- [...] `setmodel` to always do this every time that `ADgradient` is called.
- [...] `ADgradient` runs correctly?

Misc:

- [...] the `[extras]` block in the primary `Project.toml`, because we use the `test/Project.toml` for our test deps.