
Switch to PRIMA, MatrixTable, and Chairmarks in aGHQ.qmd #50

Draft · wants to merge 7 commits into main

Conversation

dmbates (Contributor) commented Mar 6, 2024

@palday I have found a substantial difference in the number of iterations required to converge on a simple GLMM in Appendix C when using the PRIMA.jl versions of BOBYQA and NEWUOA. While experimenting with PRIMA I also decided to abandon the approach of using row-tables and fall back on a MatrixTable to work with tables of vectors of the same type. (As a person who taught factorial experimental design, I should know better than to change multiple factors in the experiment, but it seemed to work out.)
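For readers following along, here is a minimal sketch of the kind of call involved; the toy objective, the starting point, and the `fx`/`nf` fields read from the returned info object are my assumptions about the PRIMA.jl interface, not the appendix's actual code.

```julia
using PRIMA

# toy objective; BOBYQA is derivative-free, so only function values are supplied
f(x) = sum(abs2, x .- [1.0, 2.0])

x, info = PRIMA.bobyqa(f, zeros(2))  # assumed to return the solution and an info struct
info.fx                              # objective value at the solution
info.nf                              # number of function evaluations before convergence
```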

Along the way I also switched to using Lilith Hafner's Chairmarks.jl for benchmarking. It is cleaner and easier to use than BenchmarkTools. I'm not sure about the pun in the name but I guess I can live with it.
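A hedged example of the Chairmarks.jl interface, with placeholder expressions rather than the appendix's actual benchmarks:

```julia
using Chairmarks

@b sum(rand(1000))   # one-line benchmark of an expression, in the spirit of @btime
@be rand(1000) sum   # separate setup stage and benchmarked function
```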

I realize this is a lot of changes to absorb. I guess I would check out this branch and render or preview the appendix, then check out the main branch, render it, and compare. However, you may have better workflows.

@dmbates dmbates marked this pull request as draft March 6, 2024 20:24
kliegl (Collaborator) commented Mar 6, 2024

This? "Chairmarks... has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support."

dmbates (Contributor, Author) commented Mar 6, 2024

This? "Chairmarks... has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support."

Exactly.

Comment on lines +210 to +215
@inbounds for i in eachindex(y, η, μ, dev)
    ηi = η[i]
    expmη = exp(-ηi)                                      # exp(-η[i])
    μ[i] = inv(one(T) + expmη)                            # logistic mean: μ = 1 / (1 + exp(-η))
    dev[i] = 2 * ((one(T) - y[i]) * ηi + log1p(expmη))    # Bernoulli unit deviance
end
Member commented:

Because of the use of concrete eltypes and equal-length vectors, each row is a constant distance away in memory, which means that CPU prefetching may still work quite well -- you grab adjacent chunks of memory, load them into the L1/L2 cache, and then pull elements out of that cache.
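(As a rough illustration of that layout, assuming Tables.jl's `Tables.table` constructor for a MatrixTable; the column names here are made up:)

```julia
using Tables

# one column-major matrix backs every column, so consecutive elements of a given
# column are adjacent in memory and the access pattern in the loop above is regular
m   = rand(1000, 4)
tbl = Tables.table(m; header = [:y, :η, :μ, :dev])
y   = Tables.getcolumn(tbl, :y)   # column 1 of m
```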

dmbates (Contributor, Author) commented:

I have a vague recollection that the @inbounds here is redundant. That is, using eachindex essentially shuts down bounds checking on its arguments. Am I misremembering?

Previously I had tried to work with a row-table (vector of NamedTuples of floats), but it just got too complicated and it wasn't as effective as it should have been. I keep thinking that I should transpose the underlying matrix so that the memory accesses in this loop are adjacent, but I'm not sure whether that optimization would be visible to the compiler. In this orientation the evaluation of the deviance is extremely fast, so maybe it is best to just leave it alone.
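For what it's worth, a minimal sketch (with made-up arrays) of the property the `@inbounds` question relies on: `eachindex` over several arguments throws if their axes differ, so every index it yields is valid for all of them; whether the compiler then drops the bounds checks on its own is what a quick benchmark of the two variants would settle.

```julia
a, b = rand(10), rand(10)

# eachindex(a, b) errors for mismatched axes, so `i` is known to be valid for both
for i in eachindex(a, b)
    a[i] += b[i]
end
```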

aGHQ.qmd (outdated)
@@ -622,11 +642,11 @@ end
fit!(com05fe, β₀);
```

-The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after around 500 function evaluations.
+The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after more than 150 function evaluations.
Member commented:

I feel like we should be grabbing this value from the code and rounding to something stable instead of manually updating.

[noblock]

dmbates (Contributor, Author) commented:

The difference here is that PRIMA declares convergence in fewer iterations than does NLopt. Nevertheless, I take your point and will see about squirreling away the number of iterations to convergence.
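Something along these lines, assuming the evaluation count can be read off the PRIMA info object (the field name and the rounding rule are placeholders):

```julia
# stash the optimizer's evaluation count in a chunk so the prose can interpolate it
# instead of hard-coding "more than 150"
nfeval = info.nf                      # assumed field; substitute whatever the fit object stores
nfeval_floor = 10 * fld(nfeval, 10)   # round down to a stable multiple of 10 for the text
```

The sentence in aGHQ.qmd could then reference `nfeval_floor` with inline code rather than a literal number.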
