Description
I think the current implementation of cross-classified disaggregation is missing a desideratum.
First, let's note the desiderata that we do have:
- The "between" variables are simply the separate group means.
# Make some crossed data
mu <- 100
ul <- setNames(c(-1, -3, 0, 4), nm = letters[1:4])
uL <- setNames(c(10, 30, 0, -40), nm = LETTERS[1:4])
um <- setNames(c(100, 150, -250), nm = month.abb[1:3])
dat <- expand.grid(l = letters[1:4], L = LETTERS[1:4], m = month.abb[1:3])
set.seed(111)
e <- rnorm(nrow(dat)-1) |> round(2)
e <- append(e, -sum(e))
dat$y <- mu + ul[dat$l] + uL[dat$L] + um[dat$m] + e
dat$z <- mu + ul[dat$l] + uL[dat$L] + um[dat$m] + 10*e
dat_dem <- datawizard::demean(dat, by = c("l", "L", "m"), select = c("y","z"))
all.equal(c(dat_dem$y_l_between), ave(dat$y, dat$l))
#> TRUE
all.equal(c(dat_dem$y_L_between), ave(dat$y, dat$L))
#> TRUE
all.equal(c(dat_dem$y_m_between), ave(dat$y, dat$m))
#> TRUE
- The sum of an observation's "between" and "within" variables is equal to the original observation.
all.equal(rowSums(dat_dem[grepl("^y_", colnames(dat_dem))]), dat$y)
#> TRUE
What we don't have, unlike with a single grouping variable or with nested designs, is a mean-centered "within" variable:
mean(dat_dem$y_within)
#> -200
This is equal to
-mean(dat$y) * (3-1)
#> -200
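A rough sketch of where this value comes from, using base R only and a small made-up data set (the names `d`, `by`, `between`, and `within` are illustrative, not part of datawizard). It assumes that "within" is whatever remains after subtracting all "between" components, which is what the two desiderata above jointly imply. Each "between" column averages to the grand mean, so with k grouping variables the "within" part averages to -(k - 1) times the grand mean:

```r
# Small crossed design (hypothetical data, base R only)
d <- expand.grid(a = letters[1:3], b = LETTERS[1:2], c = month.abb[1:2])
set.seed(1)
d$y <- rnorm(nrow(d), mean = 50)

by <- c("a", "b", "c")
k <- length(by)

# "Between" parts: plain group means, one column per grouping variable
between <- sapply(by, function(g) ave(d$y, d[[g]]))

# "Within" part, defined so that between + within reproduces y
within <- d$y - rowSums(between)

# Every "between" column has the grand mean as its own mean ...
all.equal(unname(colMeans(between)), rep(mean(d$y), k))
# ... so the "within" mean is the grand mean minus k grand means
all.equal(mean(within), -(k - 1) * mean(d$y))
```

With k = 3 and a grand mean of 100, as in the example above, this gives -200.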
I think this is something we want for consistency (the "within" variable is typically considered to be automatically double-centered). However, with a crossed design this cannot be achieved without compromising on desideratum 1 or 2.
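To make the trade-off concrete, here is a minimal base R sketch on made-up data. The adjustment it uses (adding the surplus grand means back to "within") is only one hypothetical option for illustration, not something datawizard does and not necessarily the right fix. It centers "within", but the components then no longer sum to the original values:

```r
# Small crossed design (hypothetical data, base R only)
d <- expand.grid(a = letters[1:3], b = LETTERS[1:2], c = month.abb[1:2])
set.seed(2)
d$y <- rnorm(nrow(d), mean = 50)

by <- c("a", "b", "c")
k <- length(by)

# Desideratum 1 kept: "between" parts are the plain group means
between <- sapply(by, function(g) ave(d$y, d[[g]]))

# Hypothetical adjustment: add back the (k - 1) surplus grand means
within_centered <- d$y - rowSums(between) + (k - 1) * mean(d$y)

# "within" is now mean centered (up to floating point error) ...
abs(mean(within_centered)) < 1e-8

# ... but between + within is off from y by (k - 1) * mean(y),
# so desideratum 2 no longer holds
isTRUE(all.equal(unname(rowSums(between)) + within_centered, d$y))
```

The alternative is to keep the sum intact and grand-mean center the "between" variables instead, which gives up desideratum 1.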