[BUG] AalenAdditive regressor predicts improper survival function #1606

Open
fkiraly opened this issue Apr 17, 2024 · 5 comments

Comments

fkiraly commented Apr 17, 2024

The AalenAdditive regressor predicts improper survival functions, i.e., functions that are not monotonically decreasing or that do not stay within the expected range [0, 1]. Observed with lifelines 0.28.0.

To reproduce:

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True, as_frame=True)
df = pd.concat([X, y], axis=1)

from lifelines.fitters.aalen_additive_fitter import AalenAdditiveFitter

aaf = AalenAdditiveFitter()
aaf.fit(df, duration_col="target")

y_pred_surv = aaf.predict_survival_function(df)

# not monotonically decreasing:
# counts increasing steps between consecutive time points; all counts should be 0
np.sum(y_pred_surv.diff() > 0)

# outside expected range [0, 1]:
# counts entries strictly above 1; all counts should be 0
np.sum(y_pred_surv > 1)
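
The two checks above can be bundled into one helper that also counts values below 0. This is a minimal sketch, not part of lifelines: `count_violations` and its `tol` parameter are hypothetical names, and it assumes the layout of `y_pred_surv`, with one row per time point (ascending) and one column per subject.

```python
import numpy as np


def count_violations(surv, tol=1e-12):
    """Count improper entries in an array of survival curves.

    Assumes rows are time points (ascending) and columns are subjects.
    `tol` is an illustrative tolerance for floating-point noise.
    """
    surv = np.asarray(surv, dtype=float)
    steps = np.diff(surv, axis=0)  # change between consecutive time points
    return {
        "increasing": int(np.sum(steps > tol)),
        "above_one": int(np.sum(surv > 1 + tol)),
        "below_zero": int(np.sum(surv < -tol)),
    }


# Toy curves: the first column is proper, the second is not.
toy = np.array([[1.0, 1.0], [0.8, 1.1], [0.5, 0.9], [0.2, -0.1]])
result = count_violations(toy)
```

Applied to the reproduction, `count_violations(y_pred_surv.to_numpy())` would give the totals across all subjects.
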

bachnguyen-tomo commented Apr 19, 2024

This is mentioned here as an artifact of the model. My intuition is that it comes from the additive form of the hazard function, rather than a multiplicative, exponentiated form as in Cox.

fkiraly commented Apr 19, 2024

Hm, I wouldn't agree that this is a valid explanation, @bachnguyen-tomo.
I see the box in the documentation that makes a claim in line with your statement, but I'm not sure I agree with it. Here is why:

Any non-negative, locally integrable function with infinite integral over $[0, \infty)$ is a valid hazard function. This can be seen from writing the survival function as

$$S(t) = \exp \left( - \int_0^t h(x) dx \right)$$

(this is a well-known proposition that relates the survival function and the hazard function/distribution)

So, no matter what the above equates to, as long as $h$ is a non-negative function, the survival function should be proper.
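
The step behind this claim can be spelled out. A short derivation, assuming only that $h$ is non-negative and locally integrable, with $H(t) = \int_0^t h(x) dx$ denoting the cumulative hazard (notation introduced here for convenience):

$$S(0) = \exp(0) = 1, \qquad H(t) \ge 0 \;\Rightarrow\; 0 < S(t) = \exp(-H(t)) \le 1$$

$$s \le t \;\Rightarrow\; H(s) \le H(t) \;\Rightarrow\; S(s) \ge S(t)$$

Both conclusions rely on $h \ge 0$: if $h$ can take negative values, $H$ may decrease or become negative, and neither the range nor the monotonicity is guaranteed.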

Given this result, might there be a bug?

fkiraly commented Apr 19, 2024

But, I suppose this answers the more pragmatic question sufficiently, on whether this is something that people would expect to happen.

Given the note in the documentation, it seems that this is expected (in the social sense) behaviour of the algorithm, and in that sense, we could close this issue.

fkiraly commented Apr 19, 2024

PS @bachnguyen-tomo, in case you have some input on what models that produce full distributions should do in this case, contribution here would be appreciated: sktime/skpro#249

@bachnguyen-tomo

@fkiraly The equation above assumes that the hazard function is non-negative, though. That is the main drawback of the regressor: it doesn't guarantee a non-negative hazard. First page.
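
A small numeric sketch of this point, independent of lifelines: if the hazard increments can be negative, the cumulative hazard $H(t)$ is no longer non-decreasing, and $\exp(-H(t))$ can rise and even exceed 1. The increments below are made-up numbers for illustration, not output of `AalenAdditiveFitter`, and the helper names are hypothetical.

```python
import math
from itertools import accumulate


def survival_from_increments(increments):
    """Map hazard increments to S(t) = exp(-H(t)), with H the running sum."""
    return [math.exp(-h) for h in accumulate(increments)]


def is_proper(surv):
    """True if the curve stays in [0, 1] and never increases."""
    in_range = all(0.0 <= s <= 1.0 for s in surv)
    non_increasing = all(b <= a for a, b in zip(surv, surv[1:]))
    return in_range and non_increasing


# Made-up increments; the negative ones mimic an additive model whose
# estimated hazard dips below zero for some subject.
nonneg = [0.1, 0.2, 0.0, 0.3]
mixed = [-0.2, 0.1, 0.4, -0.3]

s_nonneg = survival_from_increments(nonneg)
s_mixed = survival_from_increments(mixed)

print(is_proper(s_nonneg), is_proper(s_mixed))
```

With non-negative increments the curve is proper; with the mixed increments the first value is already above 1, because the cumulative hazard starts out negative.
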
