LDpred2: strategy for per-chromosome resolved PLINK .bed files? #210

Open
espenhgn opened this issue Nov 15, 2023 · 3 comments
Labels
enhancement, no-issue-activity

Comments

@espenhgn
Contributor

What should our strategy be for dealing with genotypes split across multiple files?
So far we've assumed a single prefix.{bed|fam|bim} file set for LDpred2, but the genotypes can also come as one file set per chromosome.
Should we:

  1. Merge the files using PLINK prior to predictions?
  2. Extend the createBackingFile.R script to accept a list of files and produce a single .bk/.rds file set?
  3. Treat the files separately, compute scores per chromosome, and sum the predictions? The LDpred2 author implies this would be OK (see the comment in privefl/paper-ldpred2#4, "Combining chromosomes from .bed files").

The final option would allow for trivial parallelization.
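To spell out why summing per-chromosome predictions is at least arithmetically sound, here is a short sketch. The symbols are assumptions introduced for illustration: $G_{ij}$ is the genotype of individual $i$ at variant $j$, $\hat\beta_j$ is the estimated effect of variant $j$, and $\mathcal{M}_c$ is the set of variants on chromosome $c$.

```latex
S_i \;=\; \sum_{j=1}^{M} G_{ij}\,\hat\beta_j
    \;=\; \sum_{c=1}^{22} \sum_{j \in \mathcal{M}_c} G_{ij}\,\hat\beta_j
    \;=\; \sum_{c=1}^{22} S_i^{(c)}
```

This only shows that the score is additive over chromosomes once the weights $\hat\beta_j$ are fixed. Whether weights fitted chromosome by chromosome match those from a genome-wide fit is a separate question that this identity does not settle.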

@espenhgn added the enhancement label Nov 15, 2023
@deepchocolate
Contributor

I think 3) is a bit messy as there would have to be 2 files for each chromosome and we would have to rewrite the PGS script. Feels like it will increase complexity.

I think I'd vote for 2. Maybe we could allow an @ placeholder (presumably standing in for the chromosome number) in the bed-file argument of the createBackingFile.R script. Another flag like --merge could then tell the script to put all genotype data into a single .bk/.rds file set.
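As a rough illustration of the @ idea, the sketch below expands a prefix template into one prefix per chromosome. It is written in Python purely for illustration; createBackingFile.R has no such option today, and the names `expand_prefix` and `bed_fileset`, as well as the default of autosomes 1 to 22, are assumptions.

```python
def expand_prefix(template, chromosomes=range(1, 23)):
    """Replace '@' in a file prefix with each chromosome label.

    A template without '@' is treated as a single, already merged
    file set and is returned unchanged.
    """
    if "@" not in template:
        return [template]
    return [template.replace("@", str(chrom)) for chrom in chromosomes]


def bed_fileset(prefix):
    """Return the three PLINK file names that belong to one prefix."""
    return [f"{prefix}.{ext}" for ext in ("bed", "bim", "fam")]


if __name__ == "__main__":
    # Hypothetical prefix; prints one line per chromosome file set.
    for prefix in expand_prefix("geno_chr@", chromosomes=[1, 2, 3]):
        print(bed_fileset(prefix))
```

A --merge style flag would then decide whether the expanded list is read into one backing file or into one backing file per chromosome.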

The only drawback with 1 that I can think of is that it would add a PLINK step whose only purpose is to make the createBackingFile.R script work as intended.

@espenhgn
Contributor Author

Actually, for option 3 we won't need to modify anything in the R scripts. We would only need to add a Slurm job-array script template that distributes the tasks per chromosome (running createBackingFile.R and ldpred2.R for each chromosome independently), plus another simple script that reads in the per-chromosome predictions and sums the contributions.
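The summing script could look roughly like the sketch below. It assumes that each per-chromosome run writes a tab-separated file with a header and the columns FID, IID and score; that layout, the column names and the function names are assumptions for illustration, not the actual output format of ldpred2.R.

```python
import csv


def read_scores(path, score_col="score"):
    """Read one per-chromosome score file into {(FID, IID): score}."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {(row["FID"], row["IID"]): float(row[score_col]) for row in reader}


def sum_scores(paths, score_col="score"):
    """Sum scores across files, requiring identical individuals in each."""
    total = None
    for path in paths:
        scores = read_scores(path, score_col)
        if total is None:
            total = dict(scores)
            continue
        # Refuse to sum silently if a chromosome is missing individuals.
        if scores.keys() != total.keys():
            raise ValueError(f"{path}: individuals differ from the first file")
        for key, value in scores.items():
            total[key] += value
    return total or {}


def write_scores(total, path, score_col="score"):
    """Write the summed scores back out in the same assumed layout."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["FID", "IID", score_col])
        for (fid, iid), value in total.items():
            writer.writerow([fid, iid, value])
```

Checking that every file lists the same individuals guards against a failed or incomplete array task silently lowering some totals.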

This issue appears to be stale due to non-activity
