
Frailty survival model #580

Merged 24 commits into pymc-devs:main on Nov 28, 2023

Conversation

@NathanielF (Contributor) commented Oct 2, 2023

Frailty Models - Hierarchical Survival Models

Related to this issue: #579

Leaving as Draft for now.

📚 Documentation preview 📚: https://pymc-examples--580.org.readthedocs.build/en/580/


@review-notebook-app bot commented Oct 8, 2023 (conversation on ReviewNB)

ricardoV94 commented on 2023-10-08T23:16:38Z:

Any reason not to use pm.Censored for the likelihood?

NathanielF commented on 2023-10-09T07:54:47Z:

In the case of the CoxPH regression the "Poisson trick" is a classic of the literature and it works to give me the results I was expecting. It's also consistent with the approach already documented in Austin's notebook.

More generally, in the case of the AFT models below, I tried using pm.Censored with the Weibull regression, but it gave me garbage results and was a lot slower than using the Potential. And again, in the case of the Weibull AFT, my current parameterisation gives the expected results.

But maybe I'm missing something, was there a pattern of usage you had in mind w.r.t. using censored likelihoods for survival?


@ricardoV94 (Member) commented Oct 9, 2023

> But maybe I'm missing something, was there a pattern of usage you had in mind w.r.t. using censored likelihoods for survival?

From a first glance it sounded like you just wrote your own Censored likelihood by hand. Since we implemented it in PyMC we've been moving examples towards using the Censored factory instead because that fits much more with the PyMC vibe. Similarly, nobody is writing Potentials for Mixture likelihoods either, because we have pm.Mixture.

> In the case of the CoxPH regression the "Poisson trick" is a classic of the literature

I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.

@ricardoV94 (Member) commented Oct 9, 2023

> More generally, in the case of the AFT models below, I tried using pm.Censored with the Weibull regression, but it gave me garbage results and was a lot slower than using the Potential.

I was talking about this one yes, not the Poisson trick. Sounds like a bug or a difference in the parametrization of the PyMC-defined Weibull and what you're comparing against. Using pm.Censored could be slower but shouldn't give different results. Logp wise it should be equivalent to what you did with the Potential.
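For context, the two encodings being compared here are logp-equivalent for right-censored data: a hand-written pm.Potential that adds the Weibull log-survival term for the censored rows, or the pm.Censored factory. A minimal sketch with made-up data (not the notebook's model):

```python
import numpy as np
import pymc as pm

# Hypothetical data: observed times y and a right-censoring indicator cens
y = np.array([1.0, 4.0, 3.0, 7.0, 7.0])
cens = np.array([False, False, True, False, True])


def weibull_logsf(t, alpha, beta):
    # Weibull log-survival function: log S(t) = -(t / beta) ** alpha
    return -((t / beta) ** alpha)


# Hand-written encoding: uncensored rows get the Weibull logp, censored rows
# contribute their log-survival term through a Potential
with pm.Model() as potential_model:
    alpha = pm.HalfNormal("alpha", 2.0)
    beta = pm.HalfNormal("beta", 10.0)
    pm.Weibull("uncensored", alpha=alpha, beta=beta, observed=y[~cens])
    pm.Potential("censored", weibull_logsf(y[cens], alpha, beta))

# pm.Censored encoding of the same right-censored likelihood: censored rows have
# upper equal to the observed value, uncensored rows are effectively unbounded
with pm.Model() as censored_model:
    alpha = pm.HalfNormal("alpha", 2.0)
    beta = pm.HalfNormal("beta", 10.0)
    dist = pm.Weibull.dist(alpha=alpha, beta=beta)
    pm.Censored("obs", dist, lower=None, upper=np.where(cens, y, np.inf), observed=y)
```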

@NathanielF (Contributor, Author)

Can you show me an example of how to use the censored distribution for AFT models? This is what I tried:

[screenshot of the attempted model]

@NathanielF (Contributor, Author)

> I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.

See maybe here: https://cran.r-project.org/web/packages/survival/vignettes/approximate.pdf
Also, in a textbook I have at home...
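For readers unfamiliar with the term, the "Poisson trick" referred to here is the standard piecewise-exponential equivalence: after splitting each subject's follow-up into intervals, the right-censored survival likelihood is proportional to a Poisson likelihood on the per-interval event indicators, with the time at risk entering as an exposure. A minimal sketch of that idea in PyMC, using made-up data (not the notebook's model):

```python
import numpy as np
import pandas as pd
import pymc as pm

# Hypothetical long-format data: one row per subject per interval they were at risk in
df = pd.DataFrame({
    "interval": [0, 1, 0, 0, 1, 2],                 # index of the time interval
    "event":    [0, 1, 1, 0, 0, 1],                 # 1 if the subject failed in that interval
    "exposure": [2.0, 1.3, 0.7, 2.0, 2.0, 0.4],     # time at risk within the interval
    "x":        [0.1, 0.1, -0.3, 0.8, 0.8, 0.8],    # a covariate
})
n_intervals = df["interval"].max() + 1

with pm.Model() as poisson_trick:
    # Piecewise-constant baseline hazard, one parameter per interval
    lambda0 = pm.Gamma("lambda0", alpha=0.01, beta=0.01, shape=n_intervals)
    beta = pm.Normal("beta", 0.0, 1.0)

    # Expected event count = hazard * time at risk; the Poisson likelihood on the
    # event indicators then reproduces the piecewise-exponential survival likelihood
    mu = lambda0[df["interval"].values] * pm.math.exp(beta * df["x"].values) * df["exposure"].values
    pm.Poisson("obs", mu=mu, observed=df["event"].values)
```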

@ricardoV94 (Member) commented Oct 9, 2023

In that example you didn't pass observed.

When you have a mix of censored and uncensored observations, you have to pass an array to upper with +inf (or anything above y) for the uncensored observations and y for the censored ones. Something like:

pm.Censored("obs", dist, lower=None, upper=np.where(cens, y, np.inf), observed=y)

@NathanielF (Contributor, Author)

Ah... I did miss observed... but now I just get a crash:

[screenshots of the traceback]

Similar crashes for different upper bounds.

@ricardoV94 (Member)

Maybe try upper=np.where(cens, y, y+1). The last value has to be larger than the observations when they are not censored. np.inf could be leading to numerical issues in the logcdf.

@NathanielF (Contributor, Author) commented Oct 9, 2023

So that does fit:

[screenshot of the fitted model]

But it gives predictions out of line with lifelines and my prior Weibull fit. The top table here shows the new fit's predictions; the bottom shows the lifelines predictions, and my Potential-based fit agreed with lifelines here.

[screenshot of the prediction tables]

@ricardoV94 (Member) commented Oct 9, 2023

Your upper is still weird. Do you have a constant censoring at y=20? If that's the case you should just be able to set upper=20.

upper is the point of censoring. You are not allowed to observe any value beyond upper. If it matches exactly with upper it means that observation was censored. If it is below, it was not censored. upper=np.where(cens, y+1, 20) is odd. It says that for the cases where cens is True, the censoring point is y+1 (which means it won't be treated as censored in the logp, because it's always higher than the observed value), otherwise it's 20.
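To summarise that rule, a short sketch with hypothetical y and cens arrays (not the notebook's data) showing the two valid ways to build upper:

```python
import numpy as np

# Hypothetical data: observed or censored times and a right-censoring flag
y = np.array([3.0, 12.0, 7.0, 12.0])
cens = np.array([False, True, False, True])

# Constant censoring at a known time (e.g. everyone still under observation at t=20 is censored)
upper_constant = 20

# Observation-specific censoring: censored rows get upper equal to their observed time,
# uncensored rows get any value strictly above what was observed
upper_mixed = np.where(cens, y, np.inf)  # or y.max() + 1 if np.inf misbehaves in the logcdf
```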

@NathanielF (Contributor, Author) commented Oct 9, 2023

Ugh, sorry. Don't know where my head is this morning!

The max y in the data set is 12, but it crashes under this parameterisation:

[screenshot of the traceback]

@ricardoV94 (Member) commented Oct 9, 2023

Just as a sanity check, do you get a -inf logp at the starting point if you just use a vanilla Weibull (without censoring)? I wonder if the PyMC parametrization is just different from what you were working with.
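One way to run that sanity check, as a sketch (here `model` is assumed to be the Weibull model from the screenshots):

```python
# Per-variable log-probability evaluated at the model's initial point;
# a -inf entry identifies the term that breaks sampling
print(model.point_logps())
```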

@NathanielF (Contributor, Author)

You mean, just run the observed like this?

[screenshot of the model]

Seems to run fine.

@ricardoV94 (Member) commented Oct 9, 2023

No, run it with all observations, censored and non-censored, as if they were all uncensored.

@NathanielF (Contributor, Author)

Seems to run just fine with all observations:

[screenshot of the model output]

@ricardoV94 (Member) commented Oct 9, 2023

Interesting, that suggests a precision issue with the CDF (or a bug in the CDF). Can you open a GitHub issue with a minimal reproducible example?

@NathanielF (Contributor, Author)

@ricardoV94 added the issue here: pymc-devs/pytensor#471

@NathanielF (Contributor, Author) replied on the remaining ReviewNB threads: "Fixed", "Fixed", "Adjusted", "Adjusted", "Added".

@NathanielF (Contributor, Author)

Thanks so much for the detailed feedback!

I think I've addressed all the above comments and it's much stronger now.

@NathanielF (Contributor, Author), on two further ReviewNB threads: "Indeed!" and "Replaced with print statement".

@NathanielF (Contributor, Author)

I'm happy with this now! Caught a few extra typos and mistakes in the writing, but it's basically the same, with adjustments for your feedback @drbenvincent. Thanks again for the detailed review!

@NathanielF (Contributor, Author)

Just giving this another nudge @drbenvincent. If you have time it'd be great to get it over the line this week.

@drbenvincent (Contributor)

Thanks for the nudge. Fingers crossed I'll have time in the next few days.

@drbenvincent (Contributor), on ReviewNB:

"In the context of a failure modelling" -> "In the context of failure modelling"

A further ReviewNB thread reply: "Thanks"

@drbenvincent (Contributor)

Found a typo: "strucuture"

@drbenvincent (Contributor)

It could be worth clarifying this is Wilkinson notation?

[screenshot]

@drbenvincent (Contributor)

A pedantic/stylistic point, but we could label the line (presumably the posterior mean) and make the 50% and 99% credible intervals the same colour but different opacity?

[plot screenshot]

@drbenvincent (Contributor)

I'm not sure how I feel about this, but I'm wondering if, in the case of these utility functions, we might want to add type hints just to help the reader out a bit:

def cum_hazard(hazard):
    return hazard.cumsum(dim="intervals")


def survival(hazard):
    return np.exp(-cum_hazard(hazard))


def get_mean(trace):
    return trace.mean(("draw", "chain"))
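A sketch of what those helpers could look like with type hints, assuming the hazard posterior and the trace are xarray.DataArray objects with an "intervals" dimension, as in the notebook:

```python
import numpy as np
import xarray as xr


def cum_hazard(hazard: xr.DataArray) -> xr.DataArray:
    """Cumulative hazard along the piecewise-constant intervals."""
    return hazard.cumsum(dim="intervals")


def survival(hazard: xr.DataArray) -> xr.DataArray:
    """Survival function S(t) = exp(-cumulative hazard)."""
    return np.exp(-cum_hazard(hazard))


def get_mean(trace: xr.DataArray) -> xr.DataArray:
    """Posterior mean over chains and draws."""
    return trace.mean(("draw", "chain"))
```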

@drbenvincent (Contributor)

Where we have the az.plot_compare cell, it could be good to add a reference to the model comparison tag (https://pymcio--580.org.readthedocs.build/projects/examples/en/580/blog/tag/model-comparison.html)

@drbenvincent (Contributor) commented Nov 28, 2023

Missing capitalisation; it doesn't follow on from a previous unfinished sentence.

"which allows us to pull out the gender specific..."

Same here:

"which suggests that the model over.."

and

"where we see a stark difference..."

@drbenvincent (Contributor)

Cool! Have added a set of comments above. Mostly small grammatical or stylistic things. Great addition to the set of examples.

@NathanielF (Contributor, Author)

Thanks @drbenvincent, addressed those notes! Pleased with this one.

@drbenvincent (Contributor) left a review: Nicely done

@drbenvincent merged commit cd77a9b into pymc-devs:main on Nov 28, 2023
2 checks passed
@NathanielF (Contributor, Author)

Thank you Ben! Appreciate it!
