culling by linkage #6

Open
brentp opened this issue Oct 5, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

Contributor

brentp commented Oct 5, 2019

Select only one site from each linkage region for the PRS summation.

Owner

mpinese commented Oct 9, 2019

Any ideas on implementation?

I haven't thought about it much, as linkage pruning was a lower-priority feature for me. But locus substitution (i.e. finding good substitutes for PRS loci that are not well genotyped, rather than just giving up and imputing) would be a killer feature, and I think it could use the same backend.

Contributor Author

brentp commented Oct 9, 2019

I was thinking that we could calculate r2 from the observed genotypes, given a large enough cohort.

Then choose, for example, the single variant from each block with the highest allele frequency.
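
A rough sketch of that idea, not an implementation from this project: it assumes genotypes are available as a samples-by-variants NumPy array of 0/1/2 alternate-allele counts, that monomorphic sites have already been filtered out, and that a "block" can be approximated greedily as everything with r2 above a cutoff relative to the retained variant. The function names and the 0.5 cutoff are made up for illustration.

```python
import numpy as np

def pairwise_r2(genotypes):
    """Squared Pearson correlation between variant columns.

    genotypes: (n_samples, n_variants) array of 0/1/2 alt-allele counts.
    """
    r = np.corrcoef(genotypes, rowvar=False)
    return r ** 2

def prune_by_ld(genotypes, r2_threshold=0.5):
    """Greedily keep the highest-frequency variant and drop its LD partners.

    Returns the sorted column indices of the retained variants.
    The threshold is an illustrative assumption, not a recommended value.
    """
    g = np.asarray(genotypes, dtype=float)
    allele_freq = g.mean(axis=0) / 2.0          # alt-allele frequency per variant
    r2 = pairwise_r2(g)
    order = np.argsort(-allele_freq, kind="stable")  # highest frequency first
    removed = np.zeros(g.shape[1], dtype=bool)
    kept = []
    for i in order:
        if removed[i]:
            continue
        kept.append(int(i))
        removed |= r2[i] >= r2_threshold        # drop everything linked to variant i
    return sorted(kept)

# Toy cohort: variants 0 and 1 are strongly correlated, variant 2 is not.
genotypes = np.array([
    [0, 0, 2],
    [1, 1, 0],
    [2, 2, 1],
    [1, 1, 2],
    [0, 1, 1],
    [2, 2, 0],
])
kept = prune_by_ld(genotypes)
```

In this toy cohort variant 1 has the highest allele frequency, so variant 0 is dropped in its favour and variant 2 survives because its r2 with variant 1 is below the cutoff. With a real cohort the full pairwise matrix would be too large genome-wide, so the comparison would need to be restricted to nearby variants.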

Owner

mpinese commented Oct 10, 2019

Ah yep, that would work for LD pruning, though not for locus substitution. I've been thinking about adding support for something like a precomputed r2 file (https://www.cog-genomics.org/plink/1.9/ld#r), which is a good fit for the substitution problem. Would that also address the linkage pruning issue for you?
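
To make the substitution idea concrete, here is a minimal sketch that assumes the precomputed file is a whitespace-delimited pairwise table with `SNP_A`, `SNP_B` and `R2` columns, as in PLINK 1.9's `.ld` report. The function names, the 0.8 cutoff and the rs IDs are hypothetical, and since r2 is unsigned, a real implementation would also need phase information to know which allele of the substitute tracks the effect allele.

```python
import io

def load_r2_table(handle):
    """Parse a pairwise r2 table into {snp: [(partner, r2), ...]}.

    Expects a header row naming at least SNP_A, SNP_B and R2.
    """
    header = handle.readline().split()
    col = {name: idx for idx, name in enumerate(header)}
    partners = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        a = fields[col["SNP_A"]]
        b = fields[col["SNP_B"]]
        r2 = float(fields[col["R2"]])
        # Store both directions so either SNP can be looked up.
        partners.setdefault(a, []).append((b, r2))
        partners.setdefault(b, []).append((a, r2))
    return partners

def best_substitute(target, partners, genotyped, min_r2=0.8):
    """Return (snp, r2) for the best-linked genotyped partner, or None."""
    candidates = [
        (r2, snp)
        for snp, r2 in partners.get(target, [])
        if snp in genotyped and r2 >= min_r2
    ]
    if not candidates:
        return None
    r2, snp = max(candidates)
    return snp, r2

# Hypothetical table contents; the IDs and values are invented.
example = io.StringIO(
    " CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2\n"
    " 1 1000 rs1 1 1500 rs2 0.95\n"
    " 1 1000 rs1 1 2200 rs3 0.85\n"
    " 1 1500 rs2 1 2200 rs3 0.40\n"
)
partners = load_r2_table(example)
substitute = best_substitute("rs1", partners, genotyped={"rs3"})
```

Here `rs1` is the poorly genotyped PRS locus and only `rs3` is well genotyped, so `rs3` is chosen even though `rs2` is the stronger proxy. The same lookup table could back LD pruning by asking which retained loci are partners of each other.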

I think I'm not really understanding the use case for linkage pruning, so let me know if I'm barking up the wrong tree here.

Contributor Author

brentp commented Oct 10, 2019

I don't have a lot of experience in this area, but I guess I'm thinking of something like this: https://www.prsice.info/step_by_step/#clumping
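
For reference, clumping in that sense is usually a greedy pass over SNPs ordered by association p-value. The sketch below is a generic illustration rather than PRSice's actual code: the record layout, the `r2_lookup` dictionary keyed by unordered SNP pairs, and the threshold and window values are all assumptions chosen for the example.

```python
def clump(snps, r2_lookup, r2_threshold=0.1, window_bp=250_000):
    """Greedy p-value-informed clumping.

    snps: list of dicts with keys "id", "chrom", "pos", "p".
    r2_lookup: {frozenset({id_a, id_b}): r2}; missing pairs count as r2 = 0.
    Returns the IDs of the index SNPs, most significant first.
    """
    ordered = sorted(snps, key=lambda s: s["p"])
    assigned = set()      # SNPs already used as an index or clumped away
    index_snps = []
    for snp in ordered:
        if snp["id"] in assigned:
            continue
        index_snps.append(snp["id"])
        assigned.add(snp["id"])
        for other in ordered:
            if other["id"] in assigned:
                continue
            same_region = (
                other["chrom"] == snp["chrom"]
                and abs(other["pos"] - snp["pos"]) <= window_bp
            )
            r2 = r2_lookup.get(frozenset((snp["id"], other["id"])), 0.0)
            if same_region and r2 >= r2_threshold:
                assigned.add(other["id"])  # clumped under the current index SNP
    return index_snps

# Invented summary statistics and LD values for illustration only.
snps = [
    {"id": "rsA", "chrom": "1", "pos": 1_000, "p": 1e-8},
    {"id": "rsB", "chrom": "1", "pos": 1_200, "p": 1e-5},
    {"id": "rsC", "chrom": "1", "pos": 900_000, "p": 1e-6},
    {"id": "rsD", "chrom": "1", "pos": 1_100, "p": 1e-3},
]
r2_lookup = {
    frozenset(("rsA", "rsB")): 0.9,
    frozenset(("rsA", "rsD")): 0.05,
    frozenset(("rsB", "rsD")): 0.6,
    frozenset(("rsA", "rsC")): 0.5,
}
index_snps = clump(snps, r2_lookup)
```

In this example `rsB` is clumped under `rsA`, `rsC` survives because it lies outside the window, and `rsD` survives because its only strong partner (`rsB`) was already removed. Unlike picking the highest-frequency variant per block, this needs per-SNP p-values from the discovery study.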

Owner

mpinese commented Oct 10, 2019

Ah I see. Yes, for a GPRS (i.e. all loci without LD pruning in the discovery phase, coefficients from simple per-SNP tests) that would be useful to ensure that densely sampled LD blocks don't dominate the score. Something to add for genome-wide scores for sure, thanks.

mpinese added the enhancement label Oct 10, 2019
2 participants