-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to PRIMA, MatrixTable, and Chairmarks in aGHQ.qmd #50
base: main
Are you sure you want to change the base?
Conversation
This? "Chairmarks... has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support." |
Exactly. |
@inbounds for i in eachindex(y, η, μ, dev) | ||
ηi = η[i] | ||
expmη = exp(-ηi) | ||
μ[i] = inv(one(T) + expmη) | ||
dev[i] = 2 * ((one(T) - y[i]) * ηi + log1p(expmη)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of the use of concrete eltypes and equal length vectors, each row is a constant distance away in memory, which means that the CPU prefetch may still work quite well -- you grab adjacent chunks of memory and load them into the L1/L2 cache and then you're grabbing elements out of that cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague recollection that the @inbounds
here is redundant. That is, using eachindex
essentially shuts down bounds checking on its arguments. Am I misremembering?
Previously I had tried to work with a row-table (vector of NamedTuples of floats) but it just got too complicated and it wasn't as effective as it should be. I keep thinking that I should transpose the underlying matrix so that the memory accesses in this loop are adjacent but I'm not sure if that optimization would be visible to the compiler. In this orientation the evaluation of the deviance is extremely fast so maybe it is best to just leave it alone.
aGHQ.qmd
Outdated
@@ -622,11 +642,11 @@ end | |||
fit!(com05fe, β₀); | |||
``` | |||
|
|||
The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after around 500 function evaluations. | |||
The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after more than 150 function evaluations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should be grabbing this value from the code and rounding to something stable instead of manually updating.
[noblock]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference here is that PRIMA declares convergence in fewer iterations than does NLopt. Nevertheless, I take your point and will see about squirreling away the number of iterations to convergence.
Co-authored-by: Phillip Alday <[email protected]>
@palday I have found that there is a substantial difference in the number of iterations required to converge on a simple GLMM in Appendix C, when I used the PRIMA.jl versions of BOBYQA and NEWUOA. While experimenting with PRIMA I also decided to abandon the approach of using rowtables and fall back on MatrixTable to work with tables of vectors of the same type. (As a person who taught factorial experimental design I should know better than to change multiple factors in the experiment but it seemed to work out.)
Along the way I also switched to using Lilith Hafner's Chairmarks.jl for benchmarking. It is cleaner and easier to use than BenchmarkTools. I'm not sure about the pun in the name but I guess I can live with it.
I realize this is a lot of changes to absorb. I guess I would check out this branch and render or preview the appendix then check out the main branch, render and compare. However, you may have better workflows.