Skip to content

Latest commit

 

History

History
177 lines (135 loc) · 4.7 KB

README.md

File metadata and controls

177 lines (135 loc) · 4.7 KB
output
github_document

dbmm: Dynamic Bayesian Measurement Models

Installation

You can install the development version of dbmm from GitHub with:

# install.packages("devtools")
devtools::install_github("devincaughey/dbmm")

You will also need to install cmdstanr (for instructions, see here).

Overview

The R package dbmm fits dynamic Bayesian measurement models using the programming language Stan via the R package cmdstanr. Currently, the only supported model is a dynamic factor (DF) model for indicators of mixed type (binary, ordinal, or metric). In the future, however, the package will incorporate other models, including the dynamic group-level item response theory (DGIRT) model currently implemented by the R package dgo.

Workflow

Using dbmm involves the following steps:

  1. Shape the data into the list format required by cmdstanr.
  2. Fit the required model in Stan using.
  3. Extract parameter draws from the fitted model.
  4. If needed, identify the model the model by rotating and/or sign-flipping the draws.
  5. Check the convergence diagnostics of the fitted model.
  6. Label the draws with informative parameter names.
  7. Summarize and plot the posterior distributions.

Step 1: Shape data

## Load data on societal attributes of U.S. states in 2020 and 2021
data("social_outcomes_2020_2021")

## Drop observations with missing values
social_outcomes_2020_2021 <- na.omit(social_outcomes_2020_2021)

## Shape the data into list form
shaped_data <- shape_data(
    long_data = social_outcomes_2020_2021,
    unit_var = "st",                      
    time_var = "year",                    
    item_var = "outcome",                 
    value_var = "value",                  
    periods_to_estimate = 2020:2021,      
    ordinal_items = NA,
    binary_items = NA,
    max_cats = 10,
    standardize = TRUE,
    make_indicator_for_zeros = TRUE
)

Step 2: Fit the model

You can specify additional arguments for cmdstanr::sample(). For details, see here.

options(mc.cores = parallel::detectCores()) # for parallizing across chains
fitted <- fit(
    data = shaped_data,
    n_dim = 2,
    chains = 2,
    parallelize_within_chains = FALSE,
    constant_alpha = FALSE,
    separate_eta = TRUE,
    init_kappa = FALSE,
    force_recompile = FALSE,
    iter_warmup = 500, 
    iter_sampling = 500,
    adapt_delta = .9,
    refresh = 10,
    seed = 123
)

Step 3: Extract draws

fitted_draws <- extract_draws(fitted)
head(fitted_draws)

Step 4: Identify the model

identified_draws <- identify_draws(fitted_draws, rotate = TRUE)
## (To apply varimax rotation, set `rotate = TRUE`.)

Step 5: Check convergence of the identified model

## Basic check
check_convergence(identified_draws$id_draws)
## Traceplot of selected parameters
bayesplot::mcmc_trace(identified_draws$id_draws, pars = "lambda_metric[23,2]")
## More details 
summarized_draws <- summary(identified_draws$id_draws)
summary(summarized_draws)

Step 6: Label parameters

labeled_draws <- label_draws(identified_draws)
head(labeled_draws$eta)
head(labeled_draws$lambda_metric)

Step 7: Summarizing and plotting the posterior draws

library(tidyverse)

## Posterior mean and standard deviation of the factor scores and item loadings
eta_summ <- labeled_draws$eta %>%
    summarise(
        est = mean(value),
        err = sd(value),
        .by = c(TIME, UNIT, dim)
    )
head(eta_summ)
lambda_metric_summ <- labeled_draws$lambda_metric %>%
    summarise(
        est = mean(value),
        err = sd(value),
        .by = c(ITEM, dim)
    )
head(lambda_metric_summ)

## Plot item loadings
lambda_metric_summ %>%
    pivot_wider(
        id_cols = "ITEM",
        names_from = "dim",
        values_from = c("est", "err")
    ) %>%
    ggplot() +
    aes(x = est_1, y = est_2, label = ITEM) +
    geom_vline(xintercept = 0, linetype = "dotted") +
    geom_hline(yintercept = 0, linetype = "dotted") +
    geom_point() +
    geom_linerange(
        aes(xmin = est_1 - 1.96*err_1, xmax = est_1 + 1.96*err_1),
        alpha = 1/4, linewidth = 2
    ) +
    geom_linerange(
        aes(ymin = est_2 - 1.96*err_2, ymax = est_2 + 1.96*err_2),
        alpha = 1/4, linewidth = 2
    ) +
    ggrepel::geom_text_repel() +
    labs(title = "Item Loadings") +
    coord_fixed()