Unexpected coding of factor variables in aft()

Running the default example from the `aft` help page yields an estimate for the numerical binary exposure `hormon` as expected, with the intercept seemingly integrated into the baseline spline term `nsx()`:

```
> summary(aft(Surv(rectime,censrec==1)~hormon,data=brcancer,df=4))
Maximum likelihood estimation

Call:
bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
    vecpar = TRUE, gr = gradient, control = control)

Coefficients:
                                      Estimate Std. Error  z value     Pr(z)    
hormon                                0.285474   0.094189   3.0309  0.002438 ** 
nsx(logtstar, df, intercept = TRUE)1 -0.272989   0.235853  -1.1575  0.247086    
nsx(logtstar, df, intercept = TRUE)2  2.045560   0.214611   9.5315 < 2.2e-16 ***
nsx(logtstar, df, intercept = TRUE)3 -7.387203   0.586909 -12.5866 < 2.2e-16 ***
nsx(logtstar, df, intercept = TRUE)4  4.361118   0.346428  12.5888 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 5215.435 
```
However, if I include the `hormon`-variable as a factor, I get contrast coding with two df / parameters for hormonal exposure:
```
> brcancer$hormon_factor <- factor(brcancer$hormon)
> summary(aft(Surv(rectime,censrec==1)~hormon_factor,data=brcancer,df=4))
Maximum likelihood estimation

Call:
bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
    vecpar = TRUE, gr = gradient, control = control)

Coefficients:
                                      Estimate Std. Error z value     Pr(z)    
hormon_factor0                        0.215612   0.146446  1.4723  0.140939    
hormon_factor1                        0.471117   0.148454  3.1735  0.001506 ** 
nsx(logtstar, df, intercept = TRUE)1 -0.083068   0.212022 -0.3918  0.695212    
nsx(logtstar, df, intercept = TRUE)2  1.806920   0.228551  7.9060 2.658e-15 ***
nsx(logtstar, df, intercept = TRUE)3 -6.229016   0.903396 -6.8951 5.382e-12 ***
nsx(logtstar, df, intercept = TRUE)4  4.041162   0.339188 11.9142 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 5214.279 
```
I don't know whether this is intended, but it is not what I would **expect** when using a standard formula interface in an R modelling function. 

Comments:

1. I assume that doing my own dummy coding for any factor variable included as predictor would do the trick.
2. Not sure whether this is related, but setting up starting values for the parameters (including removing the intercept), as seen in the `aft`-code below, seems to suffer from copy & paste: 
```
   lm0.obj <- if (cure) lm(logHhat ~ nsx(logtstar, df, intercept = TRUE, cure = cure) - 1, dataEvents)
              else      lm(logHhat ~ nsx(logtstar, df, intercept = TRUE) - 1, dataEvents)
```

System info:
```
> packageVersion("rstpm2")
[1] ‘1.6.2’
> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          3.1                         
year           2023                        
month          06                          
day            16                          
svn rev        84548                       
language       R                           
version.string R version 4.3.1 (2023-06-16)
nickname       Beagle Scouts    
```




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unexpected coding of factor variables in aft() #32

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Unexpected coding of factor variables in aft() #32

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions