diff --git a/previews/PR56/api/index.html b/previews/PR56/api/index.html index c415918..a4b1428 100644 --- a/previews/PR56/api/index.html +++ b/previews/PR56/api/index.html @@ -1,16 +1,16 @@ -API · Effects.jl

API

Effects.AutoInvLinkType
AutoInvLink

Singleton type indicating that the inverse link should be automatically determined from the model type.

This will only work for model types with an appropriate extension on Julia 1.9+. If an appropriate extension is not defined, then an error will occur.

Currently, this is only implemented for GLM.jl and MixedModels.jl

source
Effects.effects!Method
effects!(reference_grid::DataFrame, model::RegressionModel;
+API · Effects.jl

API

Effects.AutoInvLinkType
AutoInvLink

Singleton type indicating that the inverse link should be automatically determined from the model type.

This will only work for model types with an appropriate extension on Julia 1.9+. If an appropriate extension is not defined, then an error will occur.

Currently, this is only implemented for GLM.jl and MixedModels.jl

source
Effects.effects!Method
effects!(reference_grid::DataFrame, model::RegressionModel;
          eff_col=nothing, err_col=:err, typical=mean, invlink=identity,
          vcov=StatsBase.vcov)

Compute the effects as specified in formula.

Effects are the model predictions made using values given via the reference grid. For terms present in the model, but not in the reference grid, then the typical value of those predictors is used. (In other words, effects are conditional on the typical value.) The function for computing typical values is specified via typical. Note that this is also applied to categorical contrasts, thus yielding an average of the contrast, weighted by the balance of levels in the data set used to fit the model.

typical can either be a single, scalar-valued function (e.g. mean) or a dictionary matching term symbols to scalar-valued functions. The use of a dictionary allows specifying different typical functions for different input variables. In this case, typical functions must be provided for all term variables except the intercept. If there is a single term that should be "typified" differently than others, then the use of DataStructures.DefaultDict may be useful to create a default typical value with only the exception explicitly specified. For example:

using DataStructures
 typical = DefaultDict(() -> mean)  # default to x -> mean(x)
 typical[:sex] = v -> 0.0           # typical value for :sex

By default, the column corresponding to the response variable in the formula is overwritten with the effects, but an alternative column for the effects can be specified by eff_col. Note that eff_col is determined first by trying StatsBase.responsename and then falling back to the string representation of the model's formula's left-hand side. For models with a transformed response, whether in the original formula specification or via hints/contrasts, the name will be the display name of the resulting term and not the original variable. This convention also has the advantage of highlighting another aspect of the underlying method: effects are computed on the scale of the transformed response. If this column does not exist, it is created. Pointwise standard errors are written into the column specified by err_col.

Note

By default (invlink=identity), effects are computed on the scale of the transformed response. For models with an explicit transformation, that transformation is the scale of the effects. For models with a link function, the scale of the effects is the link scale, i.e. after application of the link function. For example, effects for logitistic regression models are on the logit and not the probability scale.

Warning

If the inverse link is specified via invlink, then effects and errors are computed on the original, untransformed scale via the delta method using automatic differentiation. This means that the invlink function must be differentiable and should not involve inplace operations.

The special singleton value AutoInvLink() can be used to specify that the appropriate inverse link should be determined and, where possible, a direct or analytic computation of the derivative is used.

Effects are computed using the model's variance-covariance matrix, which is computed by default using StatsBas.vcov. Alternative methods such as the sandwich estimator or robust estimators can be used by specifying vcov, which should be a function of a single argument (the model) returning the estimated variance-covariance matrix. Vcov.jl and CovarianceMatrices.jl provide several possibilities as functions of multiple arguments and so it is necessary to curry when using these functions. For example

using Vcov
-myvcov(x) = Base.Fix2(vcov, Vcov.robust())

The reference grid must contain columns for all predictors in the formula. (Interactions are computed automatically.) Contrasts must match the contrasts used to fit the model; using other contrasts will lead to undefined behavior.

Interaction terms are computed in the same way for any regression model: as the product of the lower-order terms. Typical values of lower terms thus propagate up into the interaction term in the same way that any value would.

The use of typical values for excluded effects differs from other approaches such as "partial effects" used in R packages like remef. The R package effects is similar in approach, but due to differing languages and licenses, no source code was inspected and there is no attempt at API compatibility or even similarity.

The approach for computing effect is based on the effects plots described here:

Fox, John (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software. Vol. 8, No. 15

See also AutoInvLink.

source
Effects.effectsMethod
effects(design::AbstractDict, model::RegressionModel;
+myvcov(x) = Base.Fix2(vcov, Vcov.robust())

The reference grid must contain columns for all predictors in the formula. (Interactions are computed automatically.) Contrasts must match the contrasts used to fit the model; using other contrasts will lead to undefined behavior.

Interaction terms are computed in the same way for any regression model: as the product of the lower-order terms. Typical values of lower terms thus propagate up into the interaction term in the same way that any value would.

The use of typical values for excluded effects differs from other approaches such as "partial effects" used in R packages like remef. The R package effects is similar in approach, but due to differing languages and licenses, no source code was inspected and there is no attempt at API compatibility or even similarity.

The approach for computing effect is based on the effects plots described here:

Fox, John (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software. Vol. 8, No. 15

See also AutoInvLink.

source
Effects.effectsMethod
effects(design::AbstractDict, model::RegressionModel;
         eff_col=nothing, err_col=:err, typical=mean,
         lower_col=:lower, upper_col=:upper, invlink=identity,
-        vcov=StatsBase.vcov, level=nothing)

Compute the effects as specified by the design.

This is a convenience wrapper for effects!. Instead of specifying a reference grid, a dictionary containing the levels/values of each predictor is specified. This is then expanded into a reference grid representing a fully-crossed design. Additionally, two extra columns are created representing the lower and upper edges of the confidence interval. The default level=nothing corresponds to [resp-err, resp+err], while level=0.95 corresponds to the 95% confidence interval.

See also AutoInvLink.

source
Effects.emmeansMethod
emmeans(model::RegressionModel; eff_col=nothing, err_col=:err,
-        invlink=identity, levels=Dict(), dof=nothing)

Compute estimated marginal means, a.k.a. least-square (LS) means for a model.

By default, emmeans are computed for each level of each categorical variable along with the means of continuous variables. For centered terms, the center is used instead of the mean. Alternative levels can be specified with levels.

dof is used to compute the associated degrees of freedom for a particular margin. For regression models, the appropriate degrees of freedom are the same degrees of freedom as used for the Wald tests on the coefficients. These are typically the residual degrees of freedom (e.g., via dof_residual). If dof is a function, then it is called on model and filled elementwise into the dof column. Alternatively, dof can be specified as a single number or vector of appropriate length. For example, in a mixed effect model with a large number of observations, dof=Inf may be appropriate.

invlink, eff_col and err_col work exactly as in effects!.

Estimated marginal means are closely related to effects and are also known as least-square means. The functionality here is a convenience wrapper for effects and maps onto the concept of least-square means as presented in e.g. the SAS documentation. There are several extensions available to estimated marginal means, related to when the marginalization occurs and how cells are weighted, but these are not currently supported. The documentation for the R package emmeans explains the background in more depth.

source
Effects.empairsMethod
empairs(model::RegressionModel; eff_col=nothing, err_col=:err,
+        vcov=StatsBase.vcov, level=nothing)

Compute the effects as specified by the design.

This is a convenience wrapper for effects!. Instead of specifying a reference grid, a dictionary containing the levels/values of each predictor is specified. This is then expanded into a reference grid representing a fully-crossed design. Additionally, two extra columns are created representing the lower and upper edges of the confidence interval. The default level=nothing corresponds to [resp-err, resp+err], while level=0.95 corresponds to the 95% confidence interval.

See also AutoInvLink.

source
Effects.emmeansMethod
emmeans(model::RegressionModel; eff_col=nothing, err_col=:err,
+        invlink=identity, levels=Dict(), dof=nothing)

Compute estimated marginal means, a.k.a. least-square (LS) means for a model.

By default, emmeans are computed for each level of each categorical variable along with the means of continuous variables. For centered terms, the center is used instead of the mean. Alternative levels can be specified with levels.

dof is used to compute the associated degrees of freedom for a particular margin. For regression models, the appropriate degrees of freedom are the same degrees of freedom as used for the Wald tests on the coefficients. These are typically the residual degrees of freedom (e.g., via dof_residual). If dof is a function, then it is called on model and filled elementwise into the dof column. Alternatively, dof can be specified as a single number or vector of appropriate length. For example, in a mixed effect model with a large number of observations, dof=Inf may be appropriate.

invlink, eff_col and err_col work exactly as in effects!.

Estimated marginal means are closely related to effects and are also known as least-square means. The functionality here is a convenience wrapper for effects and maps onto the concept of least-square means as presented in e.g. the SAS documentation. There are several extensions available to estimated marginal means, related to when the marginalization occurs and how cells are weighted, but these are not currently supported. The documentation for the R package emmeans explains the background in more depth.

source
Effects.empairsMethod
empairs(model::RegressionModel; eff_col=nothing, err_col=:err,
         invlink=identity, levels=Dict(), dof=nothing, padjust=identity)
-empairs(df::AbstractDataFrame; eff_col, err_col=:err, padjust=identity)

Compute pairwise differences of estimated marginal means.

The method for AbstractDataFrame acts on the results of emmeans, while the method for RegressionModel is a convenience wrapper that calls emmeans internally.

The keyword arguments are generally the same as emmeans.

By default, pairs are computed for all combinations of categorical variables and the means/centers of continuous variables. The contrast for a pair "a" vs "b" is represented in the column for the contrasted variable with a > b. For variables that are the same within a pair (e.g., a continuous variable), the sole value is displayed as is.

If dof is not nothing, then p-values are also computed using a t-distribution with the resulting degrees of freedom. The results for dof=Inf correspond to using a z-distribution, but the column names in the returned dataframe remain t and Pr(>|t|).

If padjust is provided, then it is used to compute adjust the p-values for multiple comparisons. MultipleTesting.jl provides a number of useful possibilities for this.

Note

padjust is silently ignored if dof is not provided.

Warning

This feature is experimental and the precise column names and presentation of contrasts/differences may change without being considered breaking.

Warning

The use of invlink is subject to a number of interpretation subtleties. The EM means are computed on the scale of the linear predictor, then transformed to the scale of invlink. The associated errors on the transformed scale are computed via the difference method. These estimates and errors are then used to compute the pairs. Test statistics are computed on the scale of these pairs. In general, these will not be the same as test statistics computed on the original scale. These subleties are discussed in the documentation for the R package emmeans.

source
Effects.expand_gridMethod
expand_grid(design)

Compute a fully crossed reference grid.

design can be either a NamedTuple or a Dict, where the keys represent the variables, i.e., column names in the resulting grid and the values are vectors of possible values that each variable can take on.

julia> expand_grid(Dict(:x => ["a", "b"], :y => 1:3))
+empairs(df::AbstractDataFrame; eff_col, err_col=:err, padjust=identity)

Compute pairwise differences of estimated marginal means.

The method for AbstractDataFrame acts on the results of emmeans, while the method for RegressionModel is a convenience wrapper that calls emmeans internally.

The keyword arguments are generally the same as emmeans.

By default, pairs are computed for all combinations of categorical variables and the means/centers of continuous variables. The contrast for a pair "a" vs "b" is represented in the column for the contrasted variable with a > b. For variables that are the same within a pair (e.g., a continuous variable), the sole value is displayed as is.

If dof is not nothing, then p-values are also computed using a t-distribution with the resulting degrees of freedom. The results for dof=Inf correspond to using a z-distribution, but the column names in the returned dataframe remain t and Pr(>|t|).

If padjust is provided, then it is used to compute adjust the p-values for multiple comparisons. MultipleTesting.jl provides a number of useful possibilities for this.

Note

padjust is silently ignored if dof is not provided.

Warning

This feature is experimental and the precise column names and presentation of contrasts/differences may change without being considered breaking.

Warning

The use of invlink is subject to a number of interpretation subtleties. The EM means are computed on the scale of the linear predictor, then transformed to the scale of invlink. The associated errors on the transformed scale are computed via the difference method. These estimates and errors are then used to compute the pairs. Test statistics are computed on the scale of these pairs. In general, these will not be the same as test statistics computed on the original scale. These subleties are discussed in the documentation for the R package emmeans.

source
Effects.expand_gridMethod
expand_grid(design)

Compute a fully crossed reference grid.

design can be either a NamedTuple or a Dict, where the keys represent the variables, i.e., column names in the resulting grid and the values are vectors of possible values that each variable can take on.

julia> expand_grid(Dict(:x => ["a", "b"], :y => 1:3))
 6×2 DataFrame
  Row │ y      x
      │ Int64  String
@@ -20,4 +20,4 @@
    3 │     3  a
    4 │     1  b
    5 │     2  b
-   6 │     3  b
source
Effects.pooled_semMethod
pooled_sem(sems...)

Compute the pooled standard error of the mean.

This corresponds to the square root of the sum of the the squared SEMs, which in turn corresponds to the weighted root-mean-square of the underlying standard deviations.

source
+ 6 │ 3 b
source
Effects.pooled_semMethod
pooled_sem(sems...)

Compute the pooled standard error of the mean.

This corresponds to the square root of the sum of the the squared SEMs, which in turn corresponds to the weighted root-mean-square of the underlying standard deviations.

source
diff --git a/previews/PR56/emmeans/index.html b/previews/PR56/emmeans/index.html index 7cf42f0..6ad65b4 100644 --- a/previews/PR56/emmeans/index.html +++ b/previews/PR56/emmeans/index.html @@ -19,4 +19,4 @@ mixed_model2 = fit(MixedModel, mixed_form2, kb07; progress=false)
Est.SEzpσ_subjσ_item
(Intercept)2441.688184.381528.94<1e-99318.4042363.6708
prec: maintain-676.077948.5432-13.93<1e-43
load: yes148.142348.48653.060.0022
prec: maintain & load: yes17.303468.59040.250.8008
Residual725.2555
empairs(mixed_model2; dof=dof_residual)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Int64Float64Float64
1no > yesbreak-148.142119.3221782-1.241530.214572
2nobreak > maintain676.078119.34517825.664911.71183e-8
3no > yesbreak > maintain510.632119.32217824.279451.97306e-5
4yes > nobreak > maintain824.22119.33317826.906876.86976e-12
5yesbreak > maintain658.775119.3117825.521523.85549e-8
6no > yesmaintain-165.446119.3331782-1.386410.165794
empairs(mixed_model2; dof=nobs(mixed_model) - sum(leverage(mixed_model)))
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.3221704.63-1.241530.214579
2nobreak > maintain676.078119.3451704.635.664911.7235e-8
3no > yesbreak > maintain510.632119.3221704.634.279451.97767e-5
4yes > nobreak > maintain824.22119.3331704.636.906876.97054e-12
5yesbreak > maintain658.775119.311704.635.521523.87932e-8
6no > yesmaintain-165.446119.3331704.63-1.386410.165802
empairs(mixed_model2; dof=nobs(mixed_model) - sum(size(mixed_model)[2:3]))
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Int64Float64Float64
1no > yesbreak-148.142119.3221693-1.241530.21458
2nobreak > maintain676.078119.34516935.664911.72535e-8
3no > yesbreak > maintain510.632119.32216934.279451.9784e-5
4yes > nobreak > maintain824.22119.33316936.906876.9866e-12
5yesbreak > maintain658.775119.3116935.521523.8831e-8
6no > yesmaintain-165.446119.3331693-1.386410.165803
empairs(mixed_model2; dof=Inf)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.322Inf-1.241530.214408
2nobreak > maintain676.078119.345Inf5.664911.47106e-8
3no > yesbreak > maintain510.632119.322Inf4.279451.87356e-5
4yes > nobreak > maintain824.22119.333Inf6.906874.9548e-12
5yesbreak > maintain658.775119.31Inf5.521523.36087e-8
6no > yesmaintain-165.446119.333Inf-1.386410.16562

These values are all very similar to each other, because the $t$ distribution rapidly converges to the $z$ distribution for $t > 30$ and so amount of probability mass in the tails does not change much for a model with more than a thousand observations and 30+ levels of each grouping variable.

Multiple Comparisons Correction

A perhaps more important concern is inflation of Type-1 error rate through multiple testing. We can also provide a correction method to be applied to the p-values.

using MultipleTesting
 bonferroni(pvals) = adjust(PValues(pvals), Bonferroni())
-empairs(mixed_model2; dof=Inf, padjust=bonferroni)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.322Inf-1.241531.0
2nobreak > maintain676.078119.345Inf5.664918.82634e-8
3no > yesbreak > maintain510.632119.322Inf4.279450.000112414
4yes > nobreak > maintain824.22119.333Inf6.906872.97288e-11
5yesbreak > maintain658.775119.31Inf5.521522.01652e-7
6no > yesmaintain-165.446119.333Inf-1.386410.993722

A Final Note

There are many subleties in more advanced applications of EM means. These are discussed at length in the documentation for the R package emmeans. For example:

All of these problems are fundamental to EM means as a technique and not particular to the software implementation. Effects.jl does not yet support everything that R's emmeans does, but all of the fundamental statistical concerns discussed in the latter's documentation still apply.

+empairs(mixed_model2; dof=Inf, padjust=bonferroni)
6×7 DataFrame
Rowloadprecrt_truncerrdoftPr(>|t|)
StringStringFloat64Float64Float64Float64Float64
1no > yesbreak-148.142119.322Inf-1.241531.0
2nobreak > maintain676.078119.345Inf5.664918.82634e-8
3no > yesbreak > maintain510.632119.322Inf4.279450.000112414
4yes > nobreak > maintain824.22119.333Inf6.906872.97288e-11
5yesbreak > maintain658.775119.31Inf5.521522.01652e-7
6no > yesmaintain-165.446119.333Inf-1.386410.993722

A Final Note

There are many subleties in more advanced applications of EM means. These are discussed at length in the documentation for the R package emmeans. For example:

All of these problems are fundamental to EM means as a technique and not particular to the software implementation. Effects.jl does not yet support everything that R's emmeans does, but all of the fundamental statistical concerns discussed in the latter's documentation still apply.

diff --git a/previews/PR56/index.html b/previews/PR56/index.html index 2896408..1cf6162 100644 --- a/previews/PR56/index.html +++ b/previews/PR56/index.html @@ -85,4 +85,4 @@ effects(Dict(:age => [15]), mod_imbalance)
1×5 DataFrame
Rowageweighterrlowerupper
Int64Float64Float64Float64Float64
115110.7280.712056110.016111.44

Finally, we note that the user can specify alternative functions for computing the typical values. For example, specifying the mode for the imbalanced dataset results in effects estimates for the most frequent level of sex, i.e. female:

using StatsBase
 effects(Dict(:age => [15]), mod_imbalance; typical=mode)
1×5 DataFrame
Rowageweighterrlowerupper
Int64Float64Float64Float64Float64
115107.580.48853107.091108.068

Note that these should be scalar valued functions, so we can use minimum or maximum but not extrema:

effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=maximum)
2×5 DataFrame
Rowsexweighterrlowerupper
StringFloat64Float64Float64Float64
1female124.6270.746242123.881125.373
2male154.080.96724153.113155.047
using Test
 @test_throws ArgumentError effects(Dict(:sex => ["female", "male"]), mod_imbalance; typical=extrema)
Test Passed
-      Thrown: ArgumentError
+ Thrown: ArgumentError diff --git a/previews/PR56/search/index.html b/previews/PR56/search/index.html index 1cc10a9..e3982e2 100644 --- a/previews/PR56/search/index.html +++ b/previews/PR56/search/index.html @@ -1,2 +1,2 @@ -Search · Effects.jl

Loading search...

    +Search · Effects.jl

    Loading search...