Handling - Large scale Individual WGS VCF's #107

Open
snehaleela opened this issue Jul 9, 2024 · 1 comment

@snehaleela

Hi all, could you please help me understand the following? I have a large set of INDIVIDUAL WGS VCF files and want to run the NF-GWAS pipeline on the full dataset. Nextflow and the infrastructure are ready to handle this scale: ~50 nodes, 64 CPUs, 256 GB RAM.

  1. Does the pipeline assume that the individual VCFs have already been merged per chromosome?
  2. Which preprocessing steps are recommended before passing the input to the pipeline?
  3. At this scale, do we need to use .bgen files only? Has this scale of data been tested with VCF input, so that regenie performs at its best?
  4. If a merged VCF has to be created, can you confirm whether this is the best method:
    each VCF > normalize (bcftools) > convert each VCF to pvar/pgen/psam > merge into one pvar/pgen/psam (PLINK) > convert to bgen
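
To make the steps in item 4 concrete, here is a minimal sketch of how I imagine the commands, written as a small Python helper that only builds the command lines (it does not run anything). The helper name `build_commands`, the `work` directory, the `merged` prefix and the file names are hypothetical. The flags shown (`bcftools norm -m -any -f`, `plink2 --make-pgen`, `--pmerge-list`, `--export bgen-1.2 bits=8`) are my assumptions about how each step would be invoked, not something confirmed by the nf-gwas maintainers, and should be checked against the installed bcftools and PLINK 2 versions.

```python
def build_commands(vcfs, ref_fasta, workdir="work", merged_prefix="merged"):
    """Build (not run) the commands for: normalize -> pgen per VCF -> merge -> bgen.

    All paths and prefixes are illustrative assumptions.
    Returns (commands, merge_list_lines).
    """
    commands = []
    merge_list_lines = []
    for vcf in vcfs:
        # Derive a per-sample prefix from the file name (hypothetical naming scheme).
        sample = vcf.rsplit("/", 1)[-1].removesuffix(".vcf.gz")
        norm_vcf = f"{workdir}/{sample}.norm.vcf.gz"
        prefix = f"{workdir}/{sample}"
        # Step 1: split multiallelic sites and left-align indels against the reference.
        commands.append(["bcftools", "norm", "-m", "-any", "-f", ref_fasta,
                         "-Oz", "-o", norm_vcf, vcf])
        # Step 2: convert the normalized VCF to pgen/pvar/psam.
        commands.append(["plink2", "--vcf", norm_vcf, "--make-pgen", "--out", prefix])
        merge_list_lines.append(prefix)
    merged = f"{workdir}/{merged_prefix}"
    # Step 3: merge all per-sample filesets listed in merge_list.txt (one prefix per line).
    commands.append(["plink2", "--pmerge-list", f"{workdir}/merge_list.txt",
                     "--make-pgen", "--out", merged])
    # Step 4: export the merged fileset to BGEN v1.2 with 8-bit probabilities.
    commands.append(["plink2", "--pfile", merged, "--export", "bgen-1.2", "bits=8",
                     "--out", merged])
    return commands, merge_list_lines


if __name__ == "__main__":
    cmds, merge_list = build_commands(["in/s1.vcf.gz", "in/s2.vcf.gz"], "ref.fa")
    for cmd in cmds:
        print(" ".join(cmd))
```

The lines in `merge_list` would be written to `work/merge_list.txt` before the merge step. Whether `--pmerge-list` handles this many single-sample filesets in a given PLINK 2 build is something I have not verified.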
@seppinho
Member

This issue is also discussed here: #104
