
How many populations are needed for this tool? #2

Open
zhiqinlong opened this issue Apr 20, 2024 · 5 comments

Comments

@zhiqinlong

Hi Silas,
Thanks for your wonderful work for developing this tool.
I have a question regarding the use of populations in the analysis. Is it possible to use a total of four populations in the following scenarios: two species under selection and the other two not, three populations under selection, or all four populations under selection?
You mentioned in your paper the necessity of using at least three populations under selection and three that do not show a selection signal. Could you please clarify whether using four populations, as described above, is feasible with the rdmc tool you developed?
By the way, is it reasonable to use species as the unit rather than populations?
I am eager to hear your thoughts on this matter. If there are any points of confusion or if further clarification is needed, please do not hesitate to reach out to me.
Thank you for your time and assistance.

@silastittes
Owner

Hi @zhiqinlong! Two populations under selection and two that are not should work. This was the setup used for the Mimulus example in the Lee and Coop (2017) paper. Apologies for the confusion I created by saying three of each are needed in the rdmc paper. Using species as the unit instead of populations seems a bit shaky to me. I suppose it depends on how divergent they are. The theory was definitely developed for populations, and the model assumes the adapted allele went to fixation recently -- unlikely to be the case for distinct species. You could certainly give it a try, but I'm not sure if and how things would break down. Hope this helps some!

@zhiqinlong
Author

zhiqinlong commented Apr 22, 2024

Hi Silas,
Thanks for your help.
My species diverged around 2 million years ago. I don't think this software can accurately handle this. Could you explain more about why the level of divergence affects accuracy? Is it because highly divergent species have evolved independently for a long time? Furthermore, why do we need two populations that are not under selection? If you could provide a more technical explanation, I would appreciate it.

@silastittes
Owner

Let me start by saying I didn't develop any of the theory behind these methods. I didn't even write a lot of the original code. I really just wanted to make it easier to use these methods for my data and did so in a way that others could use too. Kristin Lee and Graham Coop are really the authorities on this matter.

My main concern is that when species have been diverging for a long time (longer than 2Ne generations, where Ne is the long-term effective population size), the ability of things like migration and standing variation to be shared AND detectable across them is reduced. So any detectable signature of a shared selective sweep between them is more likely to be due to independent mutations at the same locus. The whole goal of this approach is to distinguish between the modes of convergence, but the strong a priori expectation for convergence between very divergent groups is independent mutation. In evolutionary terms, selective sweeps happen relatively fast (about 4 ln(2Ne)/s generations, where s is the strength of selection, usually around 10^-3). So sweeps that were shared via migration or standing variation either happened too long ago to have left a signature, or happened after divergence and so are unlikely to have spread across the species. BUT, this depends on how reproductively isolated your species are and how large the effective population size is. Most species have an Ne in the 10^4 to 10^5 range. If that's the case for your species of interest, events more than 2 million years ago probably won't be detectable.
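The two timescales in this comment can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical (not rdmc functions), selection is taken to be additive with genotype fitnesses 1, 1 + s/2 and 1 + s (the setting where 4 ln(2Ne)/s approximates the time for a new allele to go from frequency 1/(2Ne) to 1 - 1/(2Ne)), and the recursion ignores drift, so it only cross-checks the closed form.

```python
import math


def detectability_horizon(ne: float) -> float:
    """Rule of thumb from the comment: after roughly 2*Ne generations of
    divergence, shared migration/standing-variation signals are hard to detect."""
    return 2 * ne


def sweep_duration(ne: float, s: float) -> float:
    """Closed-form approximation (additive selection assumed): generations for a
    beneficial allele to go from 1/(2Ne) to 1 - 1/(2Ne), i.e. 4 * ln(2Ne) / s."""
    return 4 * math.log(2 * ne) / s


def deterministic_sweep(ne: float, s: float) -> int:
    """Count generations of the deterministic diploid selection recursion with
    fitnesses 1 (aa), 1 + s/2 (Aa), 1 + s (AA) over the same frequency range.
    No drift is modelled; this is only a check on the closed form."""
    p = 1 / (2 * ne)
    generations = 0
    while p < 1 - 1 / (2 * ne):
        q = 1 - p
        mean_fitness = p * p * (1 + s) + 2 * p * q * (1 + s / 2) + q * q
        # Frequency of A after selection: marginal fitness of A over mean fitness.
        p = p * (p * (1 + s) + q * (1 + s / 2)) / mean_fitness
        generations += 1
    return generations


if __name__ == "__main__":
    s = 1e-3  # selection strength quoted in the comment
    for ne in (1e4, 1e5):
        print(
            f"Ne={ne:.0e}: horizon {detectability_horizon(ne):,.0f} generations, "
            f"sweep ~{sweep_duration(ne, s):,.0f} generations "
            f"(recursion: {deterministic_sweep(ne, s):,})"
        )
```

For s = 10^-3 the closed form gives roughly 39,614 generations at Ne = 10^4 and 48,824 at Ne = 10^5, against horizons of 20,000 and 200,000 generations. All values are in generations; relating them to a divergence time in years needs a generation time, which is an extra assumption (for example, about one generation per year for an annual plant).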

Perhaps there is a lot of ongoing gene flow between the species? I don't know of anyone who has explicitly studied these methods and their accuracy for highly divergent samples, so the above are just some guidelines that I think play a role. Using simulations to study the scenarios of interest and/or developing the theory further are the only ways to be confident about what will happen.

If the ideas around population size I'm describing are unfamiliar, I highly recommend Graham's free textbook.
https://github.com/cooplab/popgen-notes

@zhiqinlong
Author

Hi Silas,
Your explanations helped me a lot. My plants are four species of wild sunflower that diverged ~2 Mya. The Ne is about 5e5, and these four species have constant gene flow. My preliminary results showed that the sweeps shared by two species mainly derive from migration (migration > standing > independent). So I think this tool will be OK for my data.

However, I have another question. In your paper "Not so local: the population genetics of convergent adaptation in maize and teosinte", you detected the origin of sweeps shared by 2 to 9 of the 11 populations. You mentioned that two populations not under selection have to be provided, so you didn't analyze the sweeps shared by 10 or 11 populations. Why do we have to provide two non-selected populations? I ran the tool with four selected populations and found that it still worked technically. If we provide only selected populations, what is the concern about the results? Do you have any advice on how to detect the origin of sweep regions shared by all populations? I have some sweeps shared by all four species.
Sorry for so many questions. I emailed the original author (Lee) but didn't get a reply.
Thank you again for your help. I really appreciate you taking the time to help me.

@silastittes
Owner

You're welcome -- I'm happy to help! Always good to have an excuse to brush up on important concepts!
I haven't done a deep dive on the Lee and Coop (2017) paper in a while, but my understanding is that the populations that did not undergo the sweep are required as a contrast: they show the pattern of allele frequency covariance present at the locus of interest when selection is not acting -- sort of a baseline expectation for how neutral variation is structured in the region, to compare against quantitatively. Does that make sense? The code does run if you say all the populations are under selection, but I'm not sure the results are reliable. Might be worth asking Graham Coop about that!
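To make the "baseline" idea in this reply a little more concrete, here is a toy NumPy sketch. It is not rdmc's code and not the exact model of Lee and Coop (2017); the function name `standardized_covariance`, the scaling of each site by pbar(1 - pbar), and all of the simulated numbers are assumptions chosen purely for illustration. It computes a population-by-population covariance of allele frequencies centered on the per-site mean, for genome-wide neutral sites and for sites near a hypothetical shared sweep.

```python
import numpy as np


def standardized_covariance(freqs: np.ndarray) -> np.ndarray:
    """Population-by-population covariance of allele frequencies.

    freqs: array of shape (n_sites, n_populations) with values in [0, 1].
    Each site is centered on its mean across populations and scaled by
    sqrt(pbar * (1 - pbar)); sites with pbar of exactly 0 or 1 are dropped.
    """
    pbar = freqs.mean(axis=1, keepdims=True)
    variable = (pbar[:, 0] > 0) & (pbar[:, 0] < 1)
    freqs, pbar = freqs[variable], pbar[variable]
    centered = (freqs - pbar) / np.sqrt(pbar * (1 - pbar))
    return centered.T @ centered / centered.shape[0]


# Toy data only (not real results): 4 populations; in the "two selected"
# case, populations 0 and 1 share the sweep and 2 and 3 do not.
rng = np.random.default_rng(seed=1)
n_sites, n_pops = 2000, 4
ancestral = rng.uniform(0.3, 0.7, size=(n_sites, 1))

# Neutral sites: independent drift around the ancestral frequency.
neutral = np.clip(ancestral + rng.normal(0, 0.03, size=(n_sites, n_pops)), 0, 1)

# Sites linked to a shared sweep: the same hitchhiking shift is added to
# every selected population, so their frequencies co-vary.
shared_shift = rng.normal(0, 0.05, size=(n_sites, 1))
two_selected = neutral.copy()
two_selected[:, :2] = np.clip(neutral[:, :2] + shared_shift, 0, 1)
all_selected = np.clip(neutral + shared_shift, 0, 1)

cov_neutral = standardized_covariance(neutral)
cov_two_selected = standardized_covariance(two_selected)
cov_all_selected = standardized_covariance(all_selected)
for label, cov in [("neutral baseline", cov_neutral),
                   ("pops 0-1 selected", cov_two_selected),
                   ("all pops selected", cov_all_selected)]:
    print(f"{label}: cov(pop0, pop1) = {cov[0, 1]:.4f}")
```

In this toy, the selected pair's covariance rises well above the neutral baseline when only populations 0 and 1 carry the shift. When every population carries the same shift, it is absorbed into the per-site mean and the covariance falls back to roughly the baseline, which is one simple way to see why a contrast with non-selected populations can matter. Whether rdmc's actual model behaves the same way is not something this sketch establishes.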


2 participants