Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntegrateGQ.sh error trapping doesn't work, causing pipe failures #759

Open
1 task done
MattWellie opened this issue Dec 9, 2024 · 0 comments · May be fixed by #760
Open
1 task done

IntegrateGQ.sh error trapping doesn't work, causing pipe failures #759

MattWellie opened this issue Dec 9, 2024 · 0 comments · May be fixed by #760

Comments

@MattWellie
Copy link
Contributor

Bug Report

Affected module(s) or script(s)

IntegrateGQ.sh

Affected version(s)

  • Latest master branch as of Today (still present here)

Description

We're having issues running the GenotypeBatch WDL. We've gone with the default sharding value for n_per_split (from here). This has served us just fine so far, but we've now got a callset with 355016 variants. This leaves only 16 variants in the final shard, and by chance they're all DELetions. I think that's exposed a bug in the IntegrateGQ.sh script.

With -o pipefail set at the top of the script we've repeatedly run into silent failures during this process, and I've tracked the issue down to the failure of || true; to trap failures in this line.

# as the line exists in the script:
$ zcat /data/pe.geno.withquality.txt.gz | { fgrep -wf <(awk '{if ($5!="DEL") print $4}' int.bed) || true; }
$ echo $?
141

# and with the curly braces removed:
$ zcat /data/pe.geno.withquality.txt.gz |  fgrep -wf <(awk '{if ($5!="DEL") print $4}' int.bed) || true  
$ echo $?
0

I can't really explain why this is causing failures, but in the sv-pipeline:2023-09-13-v0.28.3-beta-af8362e3 image (admittedly slightly behind your main branch now) I have demonstrated failures trapping the 141 grep error (caused by Grep failing to identify any matches). If I drop the curly braces the || true trap works as expected, and the whole script completes perfectly.

There are 22 instances of the || true trap appearing in curly braces, and in the sv-pipeline image as we're running it, that trap fails to catch the error. Removal of the braces resolves the issue on all counts. In my operation, only removing the instances on 36, 43, 53, and 60 were enough to run the script through to the end, as they are conditional pairs - if the data only contains one or the other variant type by chance, one of these will fail.

We're going to attempt fixing this on our side by modulating the n_per_split value so that all shards have a mix of variant types, but our experience with this script in the sv-pipeline container it sits in is that its ability to complete is down to chance, as the trapping is not effective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant