Node times remain biased after multiple EP rounds (and bias depends on number of nodes) #444

Open
hyanwong opened this issue Dec 11, 2024 · 4 comments


@hyanwong
Member

This is redating the Quebec genealogy, so there shouldn't be too many polytomies to mess things up. Here we are simply plotting the difference between true and (unconstrained) tsdate times for all nodes.

I was expecting these to asymptote to 0, at least on the linear scale (left), even if not on a log scale (right). It's weird to me that they don't.

[Screenshot, 2024-12-11: difference between true and tsdate node times, linear scale (left) and log scale (right)]
@nspope
Contributor

nspope commented Dec 11, 2024

Hmm ... why would you expect them to not have error? They should converge to the (true) posterior mean with more EP iterations, which is not going to equal the true times. In fact, the posterior mean may be systematically larger than the true times if the posterior is right-skewed. Estimates should converge to the true times as you add more mutations (that is, the posterior is consistent).
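A toy sketch of this point, using an assumed conjugate model rather than tsdate's actual likelihood: a single node has a fixed true age `t0`, its mutation count is `y ~ Poisson(mu * t0)`, and the age has a weak Gamma(shape=`a`, rate=`b`) prior. The posterior is then Gamma(`a + y`, `b + mu`), and the bias of its mean is `(a - b*t0) / (b + mu)`: nonzero in general, but shrinking as the expected number of mutations grows. All names and parameter values are made up for illustration.

```python
import numpy as np

def posterior_mean_bias(t0, mu, a=1.0, b=1e-3, n_rep=200_000, seed=1):
    """Monte Carlo bias of the posterior mean for a fixed true age t0.

    Toy model (an assumption, not tsdate's likelihood):
      y ~ Poisson(mu * t0), age ~ Gamma(shape=a, rate=b) a priori,
      so the posterior is Gamma(shape=a + y, rate=b + mu).
    Here mu stands for the expected number of mutations per unit time.
    """
    rng = np.random.default_rng(seed)
    y = rng.poisson(mu * t0, size=n_rep)
    post_mean = (a + y) / (b + mu)  # mean of a Gamma(shape, rate) is shape / rate
    return post_mean.mean() - t0

def analytic_bias(t0, mu, a=1.0, b=1e-3):
    """Closed-form bias of the posterior mean under the same toy model."""
    return (a - b * t0) / (b + mu)

t0 = 100.0
for mu in (0.01, 0.1, 1.0):
    # Simulated and analytic bias should agree, and both shrink as mu grows
    print(mu, posterior_mean_bias(t0, mu), analytic_bias(t0, mu))
```

In this sketch the bias is positive whenever `a > b * t0`, and extra EP iterations would not remove it, because it is a property of the exact posterior mean; only more data (larger `mu`) reduces it.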

Rather than looking at posterior mean vs true time, the quantity to look at to assess calibration is expected vs observed coverage.
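A minimal sketch of such a coverage check, assuming posterior draws are available for each node. Here they are simulated from the same kind of toy gamma-Poisson model, not taken from tsdate output; `coverage_curve` and all parameter values are hypothetical. For each nominal level, it counts how often the true age falls inside the central credible interval.

```python
import numpy as np

def coverage_curve(true_t, post_samples, levels):
    """Observed coverage of central credible intervals.

    true_t:       (n_nodes,) true ages
    post_samples: (n_nodes, n_draws) posterior draws per node
    levels:       nominal interval probabilities, e.g. [0.5, 0.9]
    """
    observed = []
    for p in levels:
        lo = np.quantile(post_samples, (1 - p) / 2, axis=1)
        hi = np.quantile(post_samples, (1 + p) / 2, axis=1)
        observed.append(np.mean((true_t >= lo) & (true_t <= hi)))
    return np.array(observed)

# Synthetic data (assumed toy model): ages drawn from the prior,
# mutation counts Poisson, exact conjugate gamma posterior.
rng = np.random.default_rng(0)
n_nodes, n_draws = 2000, 1000
a, b, mu = 2.0, 0.02, 0.1
true_t = rng.gamma(a, 1 / b, size=n_nodes)           # NumPy uses shape, scale
y = rng.poisson(mu * true_t)
post_samples = rng.gamma((a + y)[:, None], 1 / (b + mu), size=(n_nodes, n_draws))

levels = np.array([0.5, 0.8, 0.95])
observed = coverage_curve(true_t, post_samples, levels)
print(np.column_stack([levels, observed]))           # expected vs observed coverage
```

Because the posterior here is exact, observed coverage should track the nominal levels even though the posterior mean of any individual node differs from its true age. A well-calibrated approximation should behave similarly.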

@hyanwong
Member Author

Hmm, I see. That sounds reasonable. I'll try a coverage plot too.

@hyanwong
Member Author

(I would expect error, but not such clear bias)

@nspope
Contributor

nspope commented Dec 11, 2024

If the posterior is right-skewed, it's reasonable to see positive bias (and even without skew there's no guarantee that the posterior mean is an unbiased estimator; rather the opposite: with priors one trades variance for bias, though we're not using a very strong prior here at all). It's probably worth plotting error vs. age to see which nodes this is coming from?
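One way to look at error vs. age is to bin nodes by true age and take the mean signed error per bin. The helper below is a hypothetical sketch on plain NumPy arrays; it assumes the true and estimated ages have already been extracted, and that nodes with age 0 (samples) have been filtered out so log-spaced bins are defined.

```python
import numpy as np

def bias_by_age(true_t, est_t, n_bins=5):
    """Mean signed error (estimate - truth) within log-spaced true-age bins.

    Assumes all true ages are strictly positive.
    Returns the bin edges and one mean error per bin (NaN for empty bins).
    """
    true_t = np.asarray(true_t, dtype=float)
    est_t = np.asarray(est_t, dtype=float)
    edges = np.geomspace(true_t.min(), true_t.max(), n_bins + 1)
    # Use interior edges only, so every node lands in a bin 0..n_bins-1
    idx = np.digitize(true_t, edges[1:-1])
    err = est_t - true_t
    mean_err = np.array([
        err[idx == k].mean() if np.any(idx == k) else np.nan
        for k in range(n_bins)
    ])
    return edges, mean_err

# Made-up example: every estimate is one time unit too old
demo_true = np.array([1.0, 2.0, 10.0, 20.0, 100.0, 200.0])
demo_est = demo_true + 1.0
demo_edges, demo_bias = bias_by_age(demo_true, demo_est, n_bins=3)
print(demo_edges, demo_bias)
```

If the per-bin bias is concentrated in the oldest or youngest bins, that would point to which nodes are driving the overall offset.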
