Latest master branch as of Today (still present here)
Description
We're having issues running the GenotypeBatch WDL. We've gone with the default sharding value for n_per_split (from here). This has served us just fine so far, but we've now got a callset with 355016 variants. This leaves only 16 variants in the final shard, and by chance they're all DELetions. I think that's exposed a bug in the IntegrateGQ.sh script.
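The shard arithmetic can be checked with a quick sketch. The value 5000 for n_per_split is an assumption here (the issue only links to the default); it is the value that would leave 16 variants in the final shard for this callset.

```shell
#!/usr/bin/env bash
# Sketch: how many variants land in the final shard.
total=355016          # variants in the callset (from the issue)
n_per_split=5000      # ASSUMED default; not stated in the issue text

full=$(( total / n_per_split ))   # number of completely filled shards
rem=$(( total % n_per_split ))    # variants left over for the final shard

echo "full shards: ${full}, variants in final shard: ${rem}"
```

A remainder this small makes it quite possible for the final shard to contain a single variant type.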
With -o pipefail set at the top of the script, we've repeatedly run into silent failures during this process, and I've tracked the issue down to || true; failing to trap failures in this line:
# as the line exists in the script:
$ zcat /data/pe.geno.withquality.txt.gz | { fgrep -wf <(awk '{if ($5!="DEL") print $4}' int.bed) || true; }
$ echo $?
141
# and with the curly braces removed:
$ zcat /data/pe.geno.withquality.txt.gz | fgrep -wf <(awk '{if ($5!="DEL") print $4}' int.bed) || true
$ echo $?
0
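A possible explanation, sketched with stand-in commands rather than the script's own: inside curly braces, || true only covers the command within the group, so under pipefail a failure of the upstream command (for example, being killed by SIGPIPE when the downstream command exits early) still sets the pipeline status. Without braces, || applies to the whole pipeline. Here yes and head are hypothetical stand-ins for zcat and fgrep; whether fgrep really exits early in IntegrateGQ.sh is an assumption.

```shell
#!/usr/bin/env bash
set -o pipefail

# Stand-ins (assumption): `yes` plays the upstream producer, `head` plays a
# consumer that exits before reading all of its input.

# With braces: `|| true` guards only `head`. When `head` exits, `yes` fails
# on its next write, and pipefail reports that failure for the pipeline.
yes | { head -n 1 >/dev/null || true; }
with_braces=$?

# Without braces: `||` applies to the whole pipeline, so `true` runs
# whenever any stage failed and the overall status becomes 0.
yes | head -n 1 >/dev/null || true
without_braces=$?

echo "with braces:    ${with_braces}"
echo "without braces: ${without_braces}"
```

Where SIGPIPE has its default disposition, the first status is normally 141 (128 + 13) and the second is 0. If this is what happens in the script, wrapping the whole pipeline, as in `{ zcat ... | fgrep ...; } || true`, would keep the braces while still trapping the failure.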
I can't really explain why this is causing failures, but in the sv-pipeline:2023-09-13-v0.28.3-beta-af8362e3 image (admittedly slightly behind your main branch now) I have demonstrated that the trap fails to catch the 141 exit status (seen when grep fails to identify any matches; 141 is 128 + 13, the status a shell reports for a process killed by SIGPIPE). If I drop the curly braces, the || true trap works as expected and the whole script completes perfectly.
There are 22 instances of the || true trap appearing in curly braces, and in the sv-pipeline image as we're running it, that trap fails to catch the error. Removing the braces resolves the issue in every case. In my run, removing only the instances on lines 36, 43, 53, and 60 was enough to run the script through to the end, as they are conditional pairs: if the data happens to contain only one or the other variant type, one of these will fail.
We're going to attempt to fix this on our side by adjusting the n_per_split value so that all shards have a mix of variant types, but our experience with this script in the sv-pipeline container is that whether it completes is down to chance, as the trapping is not effective.
Bug Report
Affected module(s) or script(s)
IntegrateGQ.sh