Skip to content

Unexpected coding of factor variables in aft() #32

@alexploner

Description

@alexploner

Running the default example from the aft help page yields an estimate for the numerical binary exposure hormon as expected, with the intercept seemingly integrated into the baseline spline term nsx():

> summary(aft(Surv(rectime,censrec==1)~hormon,data=brcancer,df=4))
Maximum likelihood estimation

Call:
bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
    vecpar = TRUE, gr = gradient, control = control)

Coefficients:
                                      Estimate Std. Error  z value     Pr(z)    
hormon                                0.285474   0.094189   3.0309  0.002438 ** 
nsx(logtstar, df, intercept = TRUE)1 -0.272989   0.235853  -1.1575  0.247086    
nsx(logtstar, df, intercept = TRUE)2  2.045560   0.214611   9.5315 < 2.2e-16 ***
nsx(logtstar, df, intercept = TRUE)3 -7.387203   0.586909 -12.5866 < 2.2e-16 ***
nsx(logtstar, df, intercept = TRUE)4  4.361118   0.346428  12.5888 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 5215.435 

However, if I include the hormon-variable as a factor, I get contrast coding with two df / parameters for hormonal exposure:

> brcancer$hormon_factor <- factor(brcancer$hormon)
> summary(aft(Surv(rectime,censrec==1)~hormon_factor,data=brcancer,df=4))
Maximum likelihood estimation

Call:
bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
    vecpar = TRUE, gr = gradient, control = control)

Coefficients:
                                      Estimate Std. Error z value     Pr(z)    
hormon_factor0                        0.215612   0.146446  1.4723  0.140939    
hormon_factor1                        0.471117   0.148454  3.1735  0.001506 ** 
nsx(logtstar, df, intercept = TRUE)1 -0.083068   0.212022 -0.3918  0.695212    
nsx(logtstar, df, intercept = TRUE)2  1.806920   0.228551  7.9060 2.658e-15 ***
nsx(logtstar, df, intercept = TRUE)3 -6.229016   0.903396 -6.8951 5.382e-12 ***
nsx(logtstar, df, intercept = TRUE)4  4.041162   0.339188 11.9142 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 5214.279 

I don't know whether this is intended, but it is not what I would expect when using a standard formula interface in an R modelling function.

Comments:

  1. I assume that doing my own dummy coding for any factor variable included as predictor would do the trick.
  2. Not sure whether this is related, but setting up starting values for the parameters (including removing the intercept), as seen in the aft-code below, seems to suffer from copy & paste:
   lm0.obj <- if (cure) lm(logHhat ~ nsx(logtstar, df, intercept = TRUE, cure = cure) - 1, dataEvents)
              else      lm(logHhat ~ nsx(logtstar, df, intercept = TRUE) - 1, dataEvents)

System info:

> packageVersion("rstpm2")
[1] ‘1.6.2’
> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          3.1                         
year           2023                        
month          06                          
day            16                          
svn rev        84548                       
language       R                           
version.string R version 4.3.1 (2023-06-16)
nickname       Beagle Scouts    

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions