WGS data #104

jjfarrell · 2024-04-07T15:38:03Z

The pipeline looks like it is optimized for processing imputed VCF data from the UMICH or TOPMed imputation servers, which generate a DS field. Is it possible to run the pipeline on GATK WGS sequencing data without the DS field? Or does DS need to be calculated from the PL field and written out in PLINK format before running the pipeline?

aaleksandrov95 · 2024-04-29T08:13:57Z

Are there any updates on this issue? I believe I am having a similar problem in our implementation of the pipeline for GATK WGS.

seppinho · 2024-05-16T18:41:50Z

Hi,
I just double-checked the regenie repo, and regenie uses either DS or GT. So if you want to use our pipeline, you have to convert the data first (e.g. with plink2). If you have a working command, I'm happy to integrate it as a step in the pipeline. I think that's useful for many!

See here: rgcgithub/regenie#114 (comment)
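Since regenie can read GT directly, deriving DS from PL is optional. For reference, the calculation asked about in the first comment can be sketched as follows. This is only an illustration under stated assumptions: the site is biallelic, the PL triple is ordered ref/ref, ref/alt, alt/alt, and a flat genotype prior is used. `pl_to_ds` is a hypothetical helper name, not something provided by the pipeline, regenie, or plink2.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline, regenie, or plink2):
# turn a biallelic PL triple "PL(ref/ref),PL(ref/alt),PL(alt/alt)" into an
# expected ALT-allele dosage, assuming a flat genotype prior.
pl_to_ds() {
  printf '%s\n' "$1" | awk -F',' '{
    # PL is phred-scaled, so each likelihood is proportional to 10^(-PL/10).
    l0 = 10 ^ (-$1 / 10)
    l1 = 10 ^ (-$2 / 10)
    l2 = 10 ^ (-$3 / 10)
    # Normalise to genotype probabilities; dosage = P(het) + 2 * P(hom-alt).
    printf "%.3f\n", (l1 + 2 * l2) / (l0 + l1 + l2)
  }'
}

pl_to_ds "0,30,300"   # confident ref/ref call, dosage close to 0
pl_to_ds "40,0,40"    # confident het call, dosage close to 1
pl_to_ds "300,30,0"   # confident alt/alt call, dosage close to 2
```

For confidently called genotypes the result is nearly identical to the hard call, which is one reason using GT as-is tends to be sufficient for high-coverage WGS data.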

aaleksandrov95 · 2024-06-06T11:17:41Z

I got it to work by converting the VCF to BED using plink2. I also saw in several other issues, such as rgcgithub/regenie#209, that an Oxford .sample file may help with the missing values error, which kept occurring for me, so I generated one as well, again using plink2.

The only tricky part was to keep the IID and FID consistent with the internal workings of the pipeline, but now it seems to run fine.

EDIT: Here are the PLINK2 commands for reference.

VCF-to-BED:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --double-id \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --make-bed \
        --out ${output_path}

Making the Oxford .sample file:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --recode oxford \
        --out ${output_path}

As mentioned, I added the Oxford .sample file because of several missing-value and invalid-sample-name errors, as described in the issue linked above.
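The FID/IID consistency mentioned above can be checked before launching the pipeline. The following is a minimal sketch, assuming a whitespace-delimited phenotype file with one header line and FID and IID in the first two columns, plus a standard PLINK .fam file without a header. `missing_in_fam` and the demo file names are hypothetical and not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): print every FID/IID pair
# that occurs in the phenotype file but not in the PLINK .fam file.
# Assumes: .fam has no header; phenotype file has one header line and
# FID, IID in columns 1 and 2.
missing_in_fam() {
  awk -v fam="$1" '
    BEGIN {
      # Load all FID/IID pairs from the .fam file.
      while ((getline line < fam) > 0) {
        split(line, f)
        seen[f[1], f[2]] = 1
      }
    }
    # Skip the phenotype header, then report pairs absent from the .fam.
    NR > 1 && !(($1, $2) in seen) { print $1, $2 }
  ' "$2"
}

# Small demo with made-up samples. With --double-id, FID equals IID,
# so a phenotype row using FID 0 does not match.
demo_dir=$(mktemp -d)
printf 'S1 S1 0 0 1 -9\nS2 S2 0 0 2 -9\n' > "$demo_dir/demo.fam"
printf 'FID IID Y1\nS1 S1 0.5\nS2 S2 1.2\n0 S3 0.7\n' > "$demo_dir/demo_pheno.txt"
missing_in_fam "$demo_dir/demo.fam" "$demo_dir/demo_pheno.txt"
```

An empty result means every phenotype sample has a matching FID/IID pair in the genotype data. Any printed pair is a candidate cause of sample-name errors.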

seppinho · 2024-06-06T13:45:31Z

Great to hear. Can you also share the commands, in case someone else is running into the same issue?
Best.
Sebastian

seppinho mentioned this issue on Dec 20, 2024 in: Handling - Large scale Individual WGS VCF's #107 (open)
