@kjaisingh kjaisingh commented Sep 2, 2025

Description

This PR lets genotype filtering read its cutoffs from a generalizable TSV file in which cutoffs are stratified by SV type, minimum size, and maximum size.

In doing so, it also splits FilterGenotypes into two distinct workflows: a TrainGenotypeFilteringModel workflow, which trains the machine learning model, and a FilterGenotypes workflow, which applies the filtering using the trained model and the input cutoffs.
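
As a rough illustration of the table-driven lookup described above, here is a minimal Python sketch. The column names (`sv_type`, `min_size`, `max_size`, `sl_cutoff`), the example values, the half-open size interval, and the helper names `load_cutoffs` and `lookup_cutoff` are all assumptions made for illustration. They are not taken from the WDLs or the template file in this PR.

```python
import csv
import io

# Hypothetical table layout and values, for illustration only;
# the actual template cutoff file may use different columns.
EXAMPLE_TSV = (
    "sv_type\tmin_size\tmax_size\tsl_cutoff\n"
    "DEL\t0\t500\t10\n"
    "DEL\t500\tinf\t-5\n"
    "DUP\t0\tinf\t20\n"
)


def load_cutoffs(text):
    """Parse TSV text into (sv_type, min_size, max_size, cutoff) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (
            row["sv_type"],
            float(row["min_size"]),
            float(row["max_size"]),  # "inf" parses to an unbounded maximum
            float(row["sl_cutoff"]),
        )
        for row in reader
    ]


def lookup_cutoff(cutoffs, sv_type, size):
    """Return the cutoff of the first matching row, or None if no row matches.

    Assumes the interval is min_size <= size < max_size.
    """
    for row_type, min_size, max_size, cutoff in cutoffs:
        if row_type == sv_type and min_size <= size < max_size:
            return cutoff
    return None


cutoffs = load_cutoffs(EXAMPLE_TSV)
```

With a table like this, adding a new stratum means adding a row rather than a new named workflow argument, for example `lookup_cutoff(cutoffs, "DEL", 100)` selects the small-deletion row.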

Testing

  • The following job represents a run of the original WDL.
  • The following workspace contains a run of the modified pair of WDLs, with the original cutoffs from the job above passed as a table rather than as named arguments. The output filtered_vcf of the modified workflows is identical to that of the original run.
  • Validated all WDLs with womtool.

Pre-Merge Changes Required

  • Add template SL cutoff file reference to GATK-SV resources bucket.
  • Remove automated sync of WDL to Dockstore.

@kjaisingh kjaisingh self-assigned this Sep 2, 2025
@kjaisingh kjaisingh added the enhancement New feature or request label Sep 2, 2025
@kjaisingh kjaisingh requested a review from mwalker174 September 3, 2025 13:19
@kjaisingh kjaisingh marked this pull request as ready for review September 3, 2025 13:19
@kjaisingh kjaisingh changed the title from "Modify genotype filtering to read cutoffs from generalizable table" to "Modify genotype filtering to ingest cutoffs from a table" Sep 3, 2025