More TidierPlots and cleanup
dmbates committed May 6, 2024
1 parent 3654bc1 commit 5968b3c
Showing 5 changed files with 68 additions and 48 deletions.
10 changes: 6 additions & 4 deletions aGHQ.qmd
@@ -158,6 +158,8 @@ and define some constants
#| label: constantsA03
@isdefined(contrasts) || const contrasts = Dict{Symbol,Any}()
@isdefined(progress) || const progress = false
+TidierPlots_set("plot_show", false)
+TidierPlots_set("plot_log", false)
```
## Generalized linear models for binary data {#sec-BernoulliGLM}
@@ -404,7 +406,7 @@ Each evaluation of the deviance is fast, requiring only a fraction of a millisec
```{julia}
βopt = copy(com05fe.β)
-@b deviance(setβ!($com05fe, $βopt))
+@be deviance(setβ!($com05fe, $βopt))
```
but the already large number of evaluations for these six coefficients would not scale well as this dimension increases.
@@ -636,7 +638,7 @@ The IRLS algorithm has converged in 4 iterations to essentially the same devianc
Each iteration of the IRLS algorithm takes more time than a deviance evaluation, but still only a fraction of a millisecond on a laptop computer.
```{julia}
-@b deviance(updateβ!($com05fe))
+@be deviance(updateβ!($com05fe))
```
## GLMMs and the PIRLS algorithm {#sec-PIRLS}
@@ -867,7 +869,7 @@ pirls!(m; verbose=true);
As with IRLS, PIRLS is a fast and stable algorithm for determining the mode of the conditional distribution $(\mcU|\mcY=\bby)$ with $\bbtheta$ and $\bbbeta$ held fixed.
```{julia}
-@b pirls!($m)
+@be pirls!($m)
```
The time taken for the four iterations to determine the conditional mode of $\bbu$ is comparable to the time taken for a single call to `updateβ!`.
@@ -1143,7 +1145,7 @@ Notice that the magnitudes of the weights drop quite dramatically as the positio
```{julia}
#| code-fold: true
-#| fig-cap: Weights (logarithm base 2) and positions for the 9th order normalized Gauss-Hermite quadrature rule
+#| fig-cap: Weights (logarithm base 10) and positions for the 9th order normalized Gauss-Hermite quadrature rule
#| label: fig-ghninelog
ggplot(df9, aes(; x=:abscissae, y=:weights)) +
geom_point() +
29 changes: 16 additions & 13 deletions intro.qmd
@@ -120,6 +120,8 @@ using StatsBase # basic statistical summaries
using TidierPlots # ggplot2-like graphics in Julia
using EmbraceUncertainty: dataset # `dataset` means this one
+TidierPlots_set("plot_show", false)
+TidierPlots_set("plot_log", false)
```

A package must be attached before any of the data sets or functions in the package can be used.
@@ -621,6 +623,7 @@ Plots of the bootstrap estimates for individual parameters are obtained by extra
For example,
```{julia}
+tbl = DataFrame(dsm01samp.tbl)
βdf = @subset(dsm01pars, :type == "β")
```
@@ -633,10 +636,7 @@ You can check the details by clicking on the "Code" button in the HTML version o
#| fig-cap: Kernel density plot of bootstrap fixed-effects parameter estimates from dsm01
#| label: fig-dsm01_bs_beta_density
#| warning: false
-ggplot(
-  filter(==("β") ∘ getproperty(:type), dsm01pars),
-  aes(; x=:value),
-) +
+ggplot(tbl, aes(; x=:β1)) +
geom_density() +
labs(; x="Bootstrap samples of β₁")
```
@@ -655,15 +655,18 @@ The situation is different for the estimates of the standard deviation parameter
#| fig-cap: Kernel density plot of bootstrap variance-component parameter estimates from model dsm01
#| label: fig-dsm01_bs_sigma_density
#| warning: false
-draw(
-  data(@subset(dsm01pars, :type == "σ")) *
-  mapping(
-    :value => "Bootstrap samples of σ";
-    color=(:group => "Group"),
-  ) *
-  AlgebraOfGraphics.density();
-  figure=(; size=(600, 340)),
-)
+ggplot(stack(DataFrame(tbl), 3:4), aes(; x=:value, color=:variable)) +
+  geom_density(alpha=0.6) +
+  labs(; x="Bootstrap samples of σ")
+# draw(
+#   data(@subset(dsm01pars, :type == "σ")) *
+#   mapping(
+#     :value => "Bootstrap samples of σ";
+#     color=(:group => "Group"),
+#   ) *
+#   AlgebraOfGraphics.density();
+#   figure=(; size=(600, 340)),
+# )
```
The estimator for the residual standard deviation, $\sigma$, is approximately normally distributed but the estimator for $\sigma_1$, the standard deviation of the `batch` random effects is bimodal (i.e. has two "modes" or local maxima).
2 changes: 2 additions & 0 deletions largescaledesigned.qmd
@@ -39,6 +39,8 @@ and define some constants, if not already defined.
#| label: constants04
@isdefined(contrasts) || const contrasts = Dict{Symbol,Any}()
@isdefined(progress) || const progress = false
+TidierPlots_set("plot_show", false)
+TidierPlots_set("plot_log", false)
```

As with many techniques in data science, the place where "the rubber meets the road", as they say in the automotive industry, for mixed-effects models is when working on large-scale studies.
25 changes: 15 additions & 10 deletions longitudinal.qmd
@@ -57,6 +57,8 @@ and declare some constants, if not already defined.
#| label: constants03
@isdefined(contrasts) || const contrasts = Dict{Symbol,Any}()
@isdefined(progress) || const progress = false
+TidierPlots_set("plot_show", false)
+TidierPlots_set("plot_log", false)
```

Longitudinal data consist of repeated measurements on the same subject, or some other observational unit, taken over time.
@@ -589,6 +591,7 @@ bxm03samp = parametricbootstrap(
progress=false,
)
bxm03pars = DataFrame(bxm03samp.allpars)
+tbl = DataFrame(bxm03samp.tbl)
DataFrame(shortestcovint(bxm03samp))
```

@@ -600,16 +603,18 @@ A kernel density plot, @fig-bxm03rhodens, of the parametric bootstrap estimates
#| code-fold: true
#| fig-cap: Kernel density plots of parametric bootstrap estimates of correlation estimates from model bxm03
#| label: fig-bxm03rhodens
-#| warning: false
-draw(
-  data(@subset(bxm03pars, :type == "ρ")) *
-  mapping(
-    :value => "Bootstrap replicates of correlation estimates";
-    color=(:names => "Variables"),
-  ) *
-  AlgebraOfGraphics.density();
-  figure=(; size=(600, 400)),
-)
+ggplot(stack(tbl, [:ρ1, :ρ2, :ρ3]), aes(; x=:value, color=:variable)) +
+  geom_density() +
+  labs(; x="Bootstrap replicates of correlation estimates")
+# draw(
+#   data(@subset(bxm03pars, :type == "ρ")) *
+#   mapping(
+#     :value => "Bootstrap replicates of correlation estimates";
+#     color=(:names => "Variables"),
+#   ) *
+#   AlgebraOfGraphics.density();
+#   figure=(; size=(600, 400)),
+# )
```

Even on the scale of [Fisher's z transformation](https://en.wikipedia.org/wiki/Fisher_transformation), @fig-bxm03rhodensatanh, these estimates are highly skewed.
50 changes: 29 additions & 21 deletions multiple.qmd
@@ -58,6 +58,8 @@ and define some constants, if not already defined,
#| label: constants02
@isdefined(contrasts) || const contrasts = Dict{Symbol,Any}()
@isdefined(progress) || const progress = false
+TidierPlots_set("plot_show", false)
+TidierPlots_set("plot_log", false)
```

The mixed models considered in the previous chapter had only one random-effects term, which was a simple, scalar random-effects term, and a single fixed-effects coefficient.
@@ -239,6 +241,7 @@ A parametric bootstrap sample of the parameter estimates
#| code-fold: true
bsrng = Random.seed!(9876789)
pnm01samp = parametricbootstrap(bsrng, 10_000, pnm01; progress)
+tbl = DataFrame(pnm01samp.tbl)
pnm01pars = DataFrame(pnm01samp.allpars);
```

@@ -255,7 +258,7 @@ As for model dsm01 the bootstrap parameter estimates of the fixed-effects para
#| fig-cap: "Parametric bootstrap estimates of fixed-effects parameters in model pnm01"
#| label: fig-pnm01bsbeta
#| warning: false
-ggplot(DataFrame(pnm01samp.tbl), aes(x=:β1)) +
+ggplot(tbl, aes(x=:β1)) +
geom_density() +
labs(; x="Bootstrap samples of β₁")
```
@@ -273,16 +276,18 @@ The densities of the variance-components, on the scale of the standard deviation
#| fig-cap: "Parametric bootstrap estimates of variance components in model pnm01"
#| label: fig-pnm01bssigma
#| code-fold: true
-#| warning: false
-draw(
-  data(@subset(pnm01pars, :type == "σ")) *
-  mapping(
-    :value => "Bootstrap samples of σ";
-    color=(:group => "Group"),
-  ) *
-  AlgebraOfGraphics.density();
-  figure=(; size=(600, 340)),
-)
+ggplot(stack(tbl, [:σ, :σ1, :σ2]), aes(; x=:value, color=:variable)) +
+  geom_density() +
+  labs(; x="Bootstrap samples of σ")
+# draw(
+#   data(@subset(pnm01pars, :type == "σ")) *
+#   mapping(
+#     :value => "Bootstrap samples of σ";
+#     color=(:group => "Group"),
+#   ) *
+#   AlgebraOfGraphics.density();
+#   figure=(; size=(600, 340)),
+# )
```

The lack of precision in the estimate of $\sigma_2$, the standard deviation of the random effects for `sample`, is a consequence of only having 6 distinct levels of the `sample` factor.
@@ -437,23 +442,26 @@ Furthermore, kernel density estimates from a parametric bootstrap sample of the
```{julia}
Random.seed!(4567654)
psm01samp = parametricbootstrap(10_000, psm01; progress)
+tbl = DataFrame(psm01samp.tbl)
psm01pars = DataFrame(psm01samp.allpars);
```

```{julia}
#| fig-cap: "Kernel density plots of bootstrap estimates of σ for model psm01"
#| label: fig-psm01bssampdens
#| code-fold: true
-#| warning: false
-draw(
-  data(@subset(psm01pars, :type == "σ")) *
-  mapping(
-    :value => "Bootstrap samples of σ";
-    color=(:group => "Group"),
-  ) *
-  AlgebraOfGraphics.density();
-  figure=(; size=(600, 340)),
-)
+ggplot(stack(tbl, [:σ, :σ1, :σ2]), aes(; x=:value, color=:variable)) +
+  geom_density() +
+  labs(; x="Bootstrap samples of σ")
+# draw(
+#   data(@subset(psm01pars, :type == "σ")) *
+#   mapping(
+#     :value => "Bootstrap samples of σ";
+#     color=(:group => "Group"),
+#   ) *
+#   AlgebraOfGraphics.density();
+#   figure=(; size=(600, 340)),
+# )
```

Because there are several indications that $\sigma_2$ could reasonably be zero, resulting in a simpler model incorporating random effects for only `sample`, we perform a statistical test of this hypothesis.

2 comments on commit 5968b3c

@kliegl kliegl commented on 5968b3c May 21, 2024


@dmbates

#| fig-cap: "Conditional means of scalar random effects for item in model elm01, fit to the pruned data, versus those for model elm02, fit to the pruned data with inaccurate subjects removed."
#| label: fig-itemreelm01vselm02
# FIXME: figure out how to change the alpha on the points
ggplot( # this throws an error on the use of color=:isword - can't work out why

ggplot() appears to want string variables with an explicitly specified length (e.g., InlineStrings.String15). I converted isword from a Boolean to a String15 variable named target, with levels "nonword" and "word".
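A minimal sketch of that conversion (the toy DataFrame here is a hypothetical stand-in for condmeans; only the Boolean isword column matters):

```julia
using DataFrames, InlineStrings

# hypothetical stand-in for `condmeans`
condmeans = DataFrame(isword=[true, false, true])

# derive a fixed-width String15 column `target` with levels "nonword"/"word"
condmeans.target = String15.(ifelse.(condmeans.isword, "word", "nonword"))
```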

This chunk works for me:

typeof(condmeans.target)
# Vector{String15} (alias for Array{String15, 1})

fig1 =
  ggplot(condmeans, aes(x = :elm01, y = :elm02, color = :target)) + 
    geom_point(alpha=0.10) +
    scale_color_discrete(palette = "julia") +
    labs(x="Conditional means of item random effects for model elm01",
         y="Conditional means of item random effects for model elm02"
         )

@kliegl kliegl commented on 5968b3c May 21, 2024


@dmbates
In the ELP chapter 4, I propose to generate target very early in preprocessing and use it in place of isword. Ok?
