Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested random intercepts? #406

Open
slamander opened this issue Feb 18, 2025 · 4 comments
Open

Nested random intercepts? #406

slamander opened this issue Feb 18, 2025 · 4 comments

Comments

@slamander
Copy link

Hi sdmTMB team,

I'm curious if it is possible to use nested random intercepts. Below I create a nested grouping variable, made up of group and site, based on rounded coordinates. However, I can only implement them as crossed random intercepts, instead of nested ones:

library(tidyverse)
library(sdmTMB)
#> Warning: package 'sdmTMB' was built under R version 4.3.3

# create lower level grouping variable, "site":
sites <- pcod %>%
  mutate(
         x_round1 = round(X, -1) %>% as.factor(),
         y_round1 = round(Y, -1) %>% as.factor()) %>%
  group_by(x_round1, y_round1) %>%
  mutate(site = row_number()) 

# create higher level grouping variable, "group":
groups <- pcod %>%
  mutate(x_round2 = round(X, -2) %>% as.factor(),
         y_round2 = round(Y, -2) %>% as.factor()) %>%
  group_by(x_round2, y_round2) %>%
  mutate(group = row_number()) 

# incorporate site and group into the pcod data:
pcod_nesting <- pcod %>%
  mutate(x_round1 = round(X, -1) %>% as.factor(),
         y_round1 = round(Y, -1) %>% as.factor(),
         x_round2 = round(X, -2) %>% as.factor(),
         y_round2 = round(Y, -2) %>% as.factor()) %>%
  left_join(sites) %>%
  left_join(groups) %>%
  select(-c(x_round1, x_round2, y_round1, y_round2)) %>%
  mutate(site = as.factor(site),
         group = as.factor(site))

# create mesh
mesh <- make_mesh(pcod_nesting, c("X", "Y"), cutoff = 10)

# run sdmTMB with crossed random intercepts for "group" and "site":
cross <- sdmTMB(
  formula = present ~ depth_scaled + depth_scaled2 + (1 | group) + (1 | site),
  mesh = mesh, # can be omitted for a non-spatial model
  family = binomial(link = "logit"),
  spatial = "off",
  data = pcod_nesting)

# run sdmTMB with nested random intercepts for "group" and "site":
nest <- sdmTMB(
  formula = present ~ depth_scaled + depth_scaled2 + (1 | group / site),
  mesh = mesh, # can be omitted for a non-spatial model
  family = binomial(link = "logit"),
  spatial = "off",
  data = pcod_nesting)
#> Error: Random effect group column `site:group` is not a factor.

An error is thrown indicating that it recognizes the syntax of the nested term, but it is looking for a single column for the data?

Any help would be appreciated.

-Alex.

@seananderson
Copy link
Member

The nesting syntax is just a convenience as far as I know. It's useful if subgroup IDs are not unique across groups. But you can make those subgroup IDs unique. I believe you can always fit the same model by recoding the IDs. E.g.

library(dplyr)
set.seed(99)
d <- tibble::tibble(
  frog_dens = rlnorm(100), 
  vegetation = rnorm(100), 
  pond = gl(25, 4), # generates factor levels
  transect = rep(gl(2, 2), 25))
d <- d |> mutate(transect2 = gl(50, 2))
d
#> # A tibble: 100 × 5
#>    frog_dens vegetation pond  transect transect2
#>        <dbl>      <dbl> <fct> <fct>    <fct>    
#>  1     1.24      -1.53  1     1        1        
#>  2     1.62      -0.501 1     1        1        
#>  3     1.09      -1.21  1     2        2        
#>  4     1.56      -0.630 1     2        2        
#>  5     0.696     -1.45  2     1        3        
#>  6     1.13      -0.167 2     1        3        
#>  7     0.422      1.59  2     2        4        
#>  8     1.63      -0.230 2     2        4        
#>  9     0.695     -0.573 3     1        5        
#> 10     0.274      0.563 3     1        5        
#> # ℹ 90 more rows

(m1 <- lmer(log(frog_dens) ~ vegetation + (1 | pond/transect), data = d))
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log(frog_dens) ~ vegetation + (1 | pond/transect)
#>    Data: d
#> REML criterion at convergence: 265.1362
#> Random effects:
#>  Groups        Name        Std.Dev.
#>  transect:pond (Intercept) 0.2107  
#>  pond          (Intercept) 0.3100  
#>  Residual                  0.8243  
#> Number of obs: 100, groups:  transect:pond, 50; pond, 25
#> Fixed Effects:
#> (Intercept)   vegetation  
#>    -0.09739      0.07563

(m2 <- lmer(log(frog_dens) ~ vegetation + (1 | pond) + (1 | pond:transect), data = d))
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log(frog_dens) ~ vegetation + (1 | pond) + (1 | pond:transect)
#>    Data: d
#> REML criterion at convergence: 265.1362
#> Random effects:
#>  Groups        Name        Std.Dev.
#>  pond:transect (Intercept) 0.2107  
#>  pond          (Intercept) 0.3100  
#>  Residual                  0.8243  
#> Number of obs: 100, groups:  pond:transect, 50; pond, 25
#> Fixed Effects:
#> (Intercept)   vegetation  
#>    -0.09739      0.07563

(m3 <- lmer(log(frog_dens) ~ vegetation +  (1 | pond) + (1 | transect2), data = d))
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: log(frog_dens) ~ vegetation + (1 | pond) + (1 | transect2)
#>    Data: d
#> REML criterion at convergence: 265.1362
#> Random effects:
#>  Groups    Name        Std.Dev.
#>  transect2 (Intercept) 0.2107  
#>  pond      (Intercept) 0.3100  
#>  Residual              0.8243  
#> Number of obs: 100, groups:  transect2, 50; pond, 25
#> Fixed Effects:
#> (Intercept)   vegetation  
#>    -0.09739      0.07563

Created on 2025-02-18 with reprex v2.1.1.9000

On a related topic, we have random slopes working in a branch and should merge into main soon. I'm not sure if we get the nested syntax for free in that branch.

@slamander
Copy link
Author

slamander commented Feb 18, 2025

Thanks for getting back to me so quickly! I'm not sure if you were identifying my factor formatting or random intercept syntax as the focal issue... I should have also mentioned that the alternative syntax for a nested intercept fails.

But, I took your example and made fake coordinates so I could put it into a sdmTMB model. I get the same original error for the first two nested intercept syntax calls. Except, on the final call (m3), which uses a crossed random effect and your modified transect factor (transect2) it actually does run successfully! So, I'm curious, is this a work around--in which you can use the syntax of a crossed random intercept but get a nested random intercept in function?

library(tidyverse)
library(sdmTMB)

d <- tibble::tibble(
  x = runif(100, 70, 82),
  y = runif(100, 120, 150),
  frog_dens = rlnorm(100), 
  vegetation = rnorm(100), 
  pond = gl(25, 4), # generates factor levels
  transect = rep(gl(2, 2), 25))

d <- d |> mutate(transect2 = gl(50, 2))

mesh <- make_mesh(d, c("x", "y"), n_knots = 85); plot(mesh)

m1 <- sdmTMB(log(frog_dens) ~ vegetation + (1 | pond/transect), mesh = mesh, data = d)
#> Error: Random effect group column `transect:pond` is not a factor.

m2 <- sdmTMB(log(frog_dens) ~ vegetation + (1 | pond) + (1 | pond:transect), mesh = mesh, data = d)
#> Error: Random effect group column `transect:pond` is not a factor.

m3 <- sdmTMB(log(frog_dens) ~ vegetation +  (1 | pond) + (1 | transect2), mesh = mesh, data = d)

Any clarification about what's happening on my end would be great!

Best,

-Alex.

@seananderson
Copy link
Member

I'm not sure if you were identifying my factor formatting or random intercept syntax as the focal issue

It's sort of both. A given factor coding requires a given random effect grouping syntax.

is this a work around--in which you can use the syntax of a crossed random intercept but get a nested random intercept in function?

You could think of it as a workaround to the identical model. The nested random effect syntax is just a convenience you can use if your random effects are coded as:

> dplyr::select(d, pond, transect)
   pond  transect
 1 1     1       
 2 1     1       
 3 1     2       
 4 1     2       
 5 2     1       
 6 2     1       
 7 2     2       
 8 2     2       

instead of:

> dplyr::select(d, pond, transect2)
   pond  transect2
 1 1     1        
 2 1     1        
 3 1     2        
 4 1     2        
 5 2     3        
 6 2     3        
 7 2     4        
 8 2     4  

I.e., the subgroups don't have unique IDs across groups.

A quick way to fix this would be to paste the group and subgroup IDs together into a new unique subgroup ID. That's what the pond:transect syntax above basically does by using the interaction syntax. And, yeah, that's probably not set up to work with our more simplistic parsing in sdmTMB.

So, these are all identical models:

set.seed(99)
d <- tibble::tibble(
  x = runif(100, 70, 82),
  y = runif(100, 120, 150),
  frog_dens = rlnorm(100), 
  vegetation = rnorm(100), 
  pond = gl(25, 4), # generates factor levels
  transect = rep(gl(2, 2), 25))
d <- d |> dplyr::mutate(transect2 = gl(50, 2))

# all identical models:
m1 <- glmmTMB::glmmTMB(log(frog_dens) ~ vegetation +  (1 | pond/transect), data = d)
m2 <- glmmTMB::glmmTMB(log(frog_dens) ~ vegetation +  (1 | pond) + (1 | transect2), data = d)
m_sdmTMB <- sdmTMB::sdmTMB(log(frog_dens) ~ vegetation +  (1 | pond) + (1 | transect2), spatial = "off", data = d)

logLik(m1)
#> 'log Lik.' -146.6231 (df=5)
logLik(m2)
#> 'log Lik.' -146.6231 (df=5)
logLik(m_sdmTMB)
#> 'log Lik.' -146.6231 (df=5)

Created on 2025-02-18 with reprex v2.1.1.9000

So if you want a 'nested' random intercept in sdmTMB, just make sure to code your subgroups as unique.

@slamander
Copy link
Author

Thank you for spelling this out for me! So would this be an example of using "natural" versus "implicit nesting" notation--in which the subgroupings is the only thing affecting the interaction variance? E.g. Schielzeth & Nakagawa, 2012.

Also, apologies for my misleading example--my real data do indeed have unique subgroupings. I hadn't considered making them unique in the example, as I assumed the issue arose elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants