Standard node classification in the ARG shares limitations of the CwR #44
Replies: 1 comment
Technically, I would say that the recombination event happens during the production of the gamete, rather than the gamete itself being the recombination event. The "events" are the processes that (directly) generate the genomes that we represent in a tree sequence, rather than the genomes themselves.
Surely that depends on the organism? If you are (for example) a moss biologist, you will be sampling the haploid stage by default. Likewise with male honeybees. Personally, I think that the issue comes down to the problem that we are trying to model the processes at the wrong level. We combine a whole set of coalescences and recombinations into a single "individual" because we think of an individual as indivisible. But that's not true. It's important because, for instance, we can have multiple differently related lineages within an individual. That's what you need when e.g. studying cancer, but it's also why you can (commonly, I would guess) have a de novo mutation shared between two offspring of a single parent, but not shared by a third offspring. It's not that there's been recurrent mutation in this case, but that there are multiple coalescences within an individual.
Key point
While generating an ARG from a Wright-Fisher process in #43, a somewhat obvious point struck me: the way that we classify nodes in the standard ARG is based on the same approximations as the coalescent with recombination (for obvious reasons). In particular, we assume that events are rare enough that no two happen at the same time: you can't have a node that is both a recombination and a common ancestor. When you imagine nodes as events it makes sense to classify them like this, but if you think of nodes as corresponding to actual genomes it doesn't. It becomes much simpler to think of nodes as the monoploid genomes carried by individuals, and of edges as representing the transfer of ancestral material between these genomes. The model- and process-based terminology becomes unhelpful when you think about patterns of inheritance that don't fit the assumptions.
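To make the "nodes as genomes" view concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the edge tuples, the node names and the `node_roles` helper are hypothetical and are not tskit's API or anything from #43. The idea is that once nodes are genomes, "recombinant" and "common ancestor" are just properties read off the edges, and nothing stops one node from having both.

```python
from collections import defaultdict

# Hypothetical toy graph: each edge is (left, right, parent, child), meaning
# the child genome inherits the half-open interval [left, right) from the parent.
EDGES = [
    (0, 10, "g1", "a"),   # g1 passes its whole genome to a
    (0, 10, "g1", "b"),   # ... and to b, so g1 is a common ancestor
    (0, 5, "p1", "g1"),   # g1 inherits the left half from p1
    (5, 10, "p2", "g1"),  # ... and the right half from p2, so g1 is a recombinant
]


def node_roles(edges):
    """Return {node: set of roles} derived purely from the edges."""
    parents = defaultdict(set)    # child -> set of parent genomes
    children = defaultdict(list)  # parent -> list of (left, right, child)
    for left, right, parent, child in edges:
        parents[child].add(parent)
        children[parent].append((left, right, child))

    roles = {}
    for node in set(parents) | set(children):
        role = set()
        # Inheriting from more than one parent genome: a recombinant.
        if len(parents[node]) > 1:
            role.add("recombinant")
        # Two different children inheriting overlapping material: a common ancestor.
        spans = children[node]
        if any(
            c1 != c2 and max(l1, l2) < min(r1, r2)
            for i, (l1, r1, c1) in enumerate(spans)
            for (l2, r2, c2) in spans[i + 1:]
        ):
            role.add("common_ancestor")
        roles[node] = role
    return roles


roles = node_roles(EDGES)
print(sorted(roles["g1"]))
```

In this toy graph `g1` carries both roles at once, which is exactly the case that the event-based classification rules out.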
Objection: the coalescent is a good model, why not use the framework?
Response: The coalescent is a good model, but it's definitely an approximation. Human data in particular is firmly in the regime where the coalescent assumptions are broken from many different directions. It seems artificial to force the observed data that we want to encode to fit the requirements of a model that we know are widely broken. Another scenario in which these assumptions are going to be badly broken is a deeply sampled pedigree from a small population, like we have in cattle.
Objection: when you go down to the cellular level these really are indivisible events
Response: There does exist a unique gamete that corresponds to a recombination, and this gamete can be said to correspond to the recombination "event", and coalescences happen at the . We have to ask what this really solves, though. We don't sample gametes when we're trying to study the genetics of a sample, we sample their genomes. There's no point in sampling the gametes, unless what we're specifically interested in is the gametes themselves.
So, certainly we can model the passage of ancestral material through gametes if we wish, but we'll still need to have the sampled genome. We'll surely want some representation of the parents' genomes as well, so why bother with the gametes, when, by definition, we know what the gamete is if we have a representation of the individual's two monoploid genomes?
Objection: it's just a data encoding, it doesn't really matter once we all agree on it
Response: We don't all agree on it though. Sharing of ARG data has a very poor record: I expect it's pretty hard to find anyone analysing someone else's inferred ARGs without using their software.