Implement AD testing and benchmarking (with DITest) #883

penelopeysm · 2025-04-04T02:09:37Z

This is the second of two options; the other one is #882.

Closes #869

Why am I not in favour of this one?

I think some exposition is required here, and I didn't have time to explain this super clearly during the meeting.

The API of DITest is like this:

1. You construct a scenario, which includes the function f, the value x at which to evaluate f and its gradient, and a bunch of other things. Crucially, the scenario does not include the adtype.
2. You then run the scenario with an adtype (or an array thereof).
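To make the two-step shape concrete, here is a toy sketch in plain Julia. It does not use DITest or its types: `ToyScenario`, `ToyBackend`, `toy_gradient` and `run_scenario` are all hypothetical names, and finite differences stand in for real AD backends. The only point is that the scenario holds f and x, while the backend is supplied later.

```julia
# Toy stand-ins only; these are NOT the DifferentiationInterfaceTest types.

# Step 1: a scenario stores the function and the input, but no backend.
struct ToyScenario{F,X}
    f::F   # function to differentiate
    x::X   # point at which f and its gradient are evaluated
end

# Hypothetical "backends": two finite-difference rules with step size h.
abstract type ToyBackend end
struct ToyForwardDifference <: ToyBackend
    h::Float64
end
struct ToyCentralDifference <: ToyBackend
    h::Float64
end

# Vector that is h in coordinate i and zero elsewhere.
step_along(x, i, h) = [j == i ? h : 0.0 for j in eachindex(x)]

toy_gradient(b::ToyForwardDifference, f, x) =
    [(f(x .+ step_along(x, i, b.h)) - f(x)) / b.h for i in eachindex(x)]

toy_gradient(b::ToyCentralDifference, f, x) =
    [(f(x .+ step_along(x, i, b.h)) - f(x .- step_along(x, i, b.h))) / (2 * b.h)
     for i in eachindex(x)]

# Step 2: run the scenario with one backend, or with a vector of them.
run_scenario(s::ToyScenario, b::ToyBackend) = toy_gradient(b, s.f, s.x)
run_scenario(s::ToyScenario, bs::AbstractVector{<:ToyBackend}) =
    [run_scenario(s, b) for b in bs]

scenario = ToyScenario(x -> sum(abs2, x), [1.0, 2.0])
results = run_scenario(scenario, [ToyForwardDifference(1e-6), ToyCentralDifference(1e-6)])
```

Nothing in `ToyScenario` records which backend it was meant for, so the same scenario can be reused across backends. That is convenient for generic functions, and it is exactly the property that becomes awkward once the function being differentiated depends on the backend.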

From the perspective of generic functions f, this is quite a nice interface. The tricky bit with DynamicPPL, as I briefly mentioned, is that when you pass LogDensityFunction a model, varinfo, etc., it does a bunch of things that not only change the function f being differentiated, but also potentially modify the adtype that is actually used. See, especially, this constructor:

DynamicPPL.jl/src/logdensityfunction.jl, lines 110 to 115 in 019e41b:

    function LogDensityFunction(
        model::Model,
        varinfo::AbstractVarInfo=VarInfo(model),
        context::AbstractContext=leafcontext(model.context);
        adtype::Union{ADTypes.AbstractADType,Nothing}=nothing,
    )

(Note that LogDensityProblemsAD.jl used to do this stuff for us; #806 effectively removed it and inlined its optimisations into that inner constructor.)

What this means is that, to be completely consistent with the way DynamicPPL behaves, one has to:

1. Reproduce the code inside src/logdensityfunction.jl that generates the function f, so that the scenario can use the correct f.
2. Because the above depends on the adtype, we have to make sure that scenarios generated with one adtype are later run with the same adtype.
   - In fact, the preparation in the LogDensityFunction doesn't only depend on the adtype; it potentially also modifies the adtype.
   - That's why this PR doesn't just include make_scenario; it also includes a run_ad function below, which ensures that the scenario is run with the appropriately modified adtype.
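The coupling described in these two points can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the code in this PR or in DynamicPPL: `ToyADType`, `toy_prepare`, `ToyPreparedScenario`, `toy_make_scenario` and `toy_run` are hypothetical names, and the "preparation" is a made-up rule that wraps the function and flips a flag on the backend.

```julia
# Hypothetical stand-ins; none of this is DynamicPPL, ADTypes or DITest code.
struct ToyADType
    name::Symbol
    compiled::Bool
end

# Mimics a constructor whose preparation depends on the backend and may
# also hand back an adjusted backend (an assumption made for illustration).
function toy_prepare(logp, adtype::ToyADType)
    if adtype.name === :reverse
        wrapped = x -> logp(x)                       # backend-specific wrapper
        return wrapped, ToyADType(adtype.name, true) # adjusted backend
    end
    return logp, adtype
end

# Unlike a backend-free scenario, this one carries the adjusted backend,
# so it cannot later be run with a mismatched one.
struct ToyPreparedScenario{F,X}
    f::F
    x::X
    adtype::ToyADType
end

function toy_make_scenario(logp, x, adtype::ToyADType)
    f, adjusted = toy_prepare(logp, adtype)
    return ToyPreparedScenario(f, x, adjusted)
end

# Running always uses the backend stored at construction time.
toy_run(s::ToyPreparedScenario) = (adtype=s.adtype, value=s.f(s.x))

s = toy_make_scenario(x -> -sum(abs2, x) / 2, [1.0, 2.0], ToyADType(:reverse, false))
r = toy_run(s)
```

The design cost is visible even in the toy: `toy_prepare` has to mirror whatever the real constructor does, which is the duplication discussed in this comment.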

If we adopt this PR, then we have to choose between:

1. Duplicating the code inside src/logdensityfunction.jl, as I've done in this PR;
2. Cutting this duplicated code out, which means that the results obtained when using this test/benchmark function will differ from the results when actually sampling a Turing model; or
3. Removing the extra prep work inside src/logdensityfunction.jl.

(3) is a no-go as it would have noticeable impacts on performance. Even though I think it'd be very nice if we could just export a list of scenarios, I'm not really comfortable with either (1) or (2), and I don't think that convenience is a good enough reason to do either.

The alternative to this, #882, already makes the API very straightforward (it's just one function with a very thorough docstring) and so I don't think it's unfair to define that as our interface - especially considering that it's most likely that we will actually be the ones writing the integration tests for other people.

github-actions · 2025-04-04T02:18:59Z

Benchmark Report for Commit `e1a34e1`

Computer Information

Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  9.9 |                 1.5 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                617.9 |                42.6 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                419.8 |                48.4 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1243.3 |                27.5 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               3937.0 |                20.4 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1459.6 |                29.8 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |                944.3 |                 5.4 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5567.2 |                 4.1 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1123.4 |                 8.2 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              61969.3 |                 3.7 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8946.0 |                 9.6 |
|               Dynamic |        10 |    mooncake |             typed |   true |                136.5 |                11.9 |
|              Submodel |         1 |    mooncake |             typed |   true |                 25.7 |                 7.7 |
|                   LDA |        12 | reversediff |             typed |   true |                479.8 |                 5.2 |

codecov · 2025-04-04T02:28:21Z

Codecov Report

Attention: Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.89%. Comparing base (eed80e5) to head (e1a34e1).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ext/DynamicPPLDifferentiationInterfaceTestExt.jl | 88.88% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #883      +/-   ##
==========================================
+ Coverage   84.87%   84.89%   +0.01%     
==========================================
  Files          34       35       +1     
  Lines        3815     3833      +18     
==========================================
+ Hits         3238     3254      +16     
- Misses        577      579       +2


coveralls · 2025-04-04T02:29:30Z

Pull Request Test Coverage Report for Build 14256574630

Details

0 of 14 (0.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-3.5%) to 81.418%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|---|
| ext/DynamicPPLDifferentiationInterfaceTestExt.jl | 0 | 14 | 0.0% |

Totals
Change from base Build 14127923718:	-3.5%
Covered Lines:	3111
Relevant Lines:	3821

💛 - Coveralls

coveralls · 2025-04-04T02:42:20Z

Pull Request Test Coverage Report for Build 14263072728

Details

0 of 18 (0.0%) changed or added relevant lines in 1 file are covered.
20 unchanged lines in 3 files lost coverage.
Overall coverage increased (+0.02%) to 84.983%

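| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|--------------------------|---------------|---------------------|---|
| ext/DynamicPPLDifferentiationInterfaceTestExt.jl | 0 | 18 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|---|
| src/model.jl | 1 | 85.83% |
| src/varinfo.jl | 3 | 84.51% |
| src/threadsafe.jl | 16 | 55.05% |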

Totals
Change from base Build 14127923718:	0.02%
Covered Lines:	3254
Relevant Lines:	3829

💛 - Coveralls

sunxd3 · 2025-04-08T12:27:05Z

The reasons for your preference are super valid. I also think that since the hand-rolled version is not too complicated, it's worth maintaining it ourselves. Otherwise, new contributors would need to know what a test scenario is for DIT before they could contribute to this.

penelopeysm added 2 commits April 4, 2025 03:09

Implement AD testing (with DITest)

0aedf08

Fix 1.10 extensions

4539a80

penelopeysm changed the title ~~Implement AD testing (with DITest)~~ Implement AD testing and benchmarking (with DITest) Apr 4, 2025

Make interface more consistent

70e1aa9

penelopeysm self-assigned this Apr 7, 2025

penelopeysm mentioned this pull request Apr 7, 2025

Implement AD testing and benchmarking (hand rolled) #882

Open

Add varinfo to LogDensityFunction

e1a34e1

penelopeysm mentioned this pull request Apr 8, 2025

Release 0.36 #829

Draft

4 tasks
