
Add code infrastructure for pipeline configuration regression tests #12

Merged
merged 7 commits into from
Feb 27, 2024

Conversation

nwiltsie
Member

Okay, this is a lot. None of this is in Action format yet, but I wanted to push up this minimal working version and start getting some feedback on it.

The problem this is solving

We have so much metaprogramming craziness inside the pipeline configurations. The configuration changes based on available CPUs and memory, there are 7 ways to modify parameters, and we have fancy process parameter merging and retry specification.

Given that, it's very difficult to review most pull requests changing configuration logic, as I have exceptionally little understanding of what's going on under the hood.

I want an Action that can run regression tests on our Nextflow pipeline configurations. That is, given a specific commit of a pipeline and a defined configuration (split across config files, parameter files, environment variables, and command line arguments), I want to be able to compute the final configuration in the cloud, without requiring any files outside of the repository. Then that Action can automatically run on pull requests, documenting how the code changes impacted the final rendered configuration.

This PR is the bones of that Action (and a record of my slow descent into madness).

Interface

Test File Format

Tests are defined as specially-formatted JSON files tracked within each pipeline repository, one test per file. For demonstration I've included one test for the recalibrate BAM pipeline with this PR. The configuration appears at the start:

{
    "config": [
        "test/nftest.config"
    ],
    "params_file": "test/single.yaml",
    "cpus": 16,
    "memory_gb": 31,
    "empty_files": [
        "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta",
        "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
        "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz",
        "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz",
        "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz",
        "/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam"
    ],
    "mapped_files": [],
    "nf_params": {
        "output_dir": "/tmp/outputs"
    },
    "envvars": {
        "SLURM_JOB_ID": "851543"
    },
    "mocks": {
        "parse_bam_header": {
            "read_group": [
                {
                    "SM": "4915723"
                }
            ]
        }
    },
    "dated_fields": [
        "params.log_output_dir",
        "report.file",
        "timeline.file",
        "trace.file"
    ],
The following keys are required (unless stated otherwise, all paths are relative to the pipeline root):

| Key | Description |
|-----|-------------|
| `config` | A list of configuration files to be included (`nextflow -c <file1> -c <file2>`) |
| `params_file` | A single parameter file or an empty string (`nextflow -params-file <file>`) |
| `cpus` | The integer CPU count to be returned by `SysHelper::getAvailCpus()` |
| `memory_gb` | The memory value to be returned by `SysHelper::getAvailMemory()` (float, GB) |
| `empty_files` | A list of absolute paths to touch within the docker image |
| `mapped_files` | A map of relative repository files to absolute container paths that should be mapped into the docker image |
| `nf_params` | A map of command-line parameters to pass to Nextflow (`nextflow --<key>=<value>`) |
| `envvars` | A map of environment variables to set in the docker image (`KEY=VALUE nextflow ...`) |
| `mocks` | Method names to be mocked, mapped to the objects they should return |
| `dated_fields` | A list of JSONPath-like keys indicating values in the rendered configuration that contain datestamps |
| `expected_results` | The expected output of the test (lightly filtered; see later sections for details) |
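Given that key list, a test file can be sanity-checked before any docker work happens. This is a purely illustrative helper (not part of this PR; the name and behavior are assumptions):

```python
import json

# Required top-level keys from the table above
REQUIRED_KEYS = {
    "config", "params_file", "cpus", "memory_gb", "empty_files",
    "mapped_files", "nf_params", "envvars", "mocks", "dated_fields",
    "expected_results",
}

def validate_test_file(text: str) -> list:
    """Return a sorted list of required keys missing from a JSON test definition."""
    data = json.loads(text)
    return sorted(REQUIRED_KEYS - data.keys())
```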

That expected_results field looks something like this:

"expected_results": {
"docker": {
"all_group_ids": "$(for i in `id --real --groups`; do echo -n \"--group-add=$i \"; done)",
"enabled": true,
"runOptions": "-u $(id -u):$(id -g) $(for i in `id --real --groups`; do echo -n \"--group-add=$i \"; done)",
"uid_and_gid": "-u $(id -u):$(id -g)"
},
"manifest": {
"author": "Yash Patel",
"description": "Nextflow pipeline to perform Indel Realignment and Base Quality Score Recalibration",
"name": "recalibrate-BAM",
"version": "1.0.0-rc.4"
},
"params": {
"aligner": "BWA-MEM2-2.2.1",
"blcds_registered_dataset": false,
"bundle_contest_hapmap_3p3_vcf_gz": "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz",
"bundle_known_indels_vcf_gz": "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz",
"bundle_mills_and_1000g_gold_standard_indels_vcf_gz": "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"bundle_v0_dbsnp138_vcf_gz": "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz",
"cache_intermediate_pipeline_steps": false,
"dataset_id": "A-mini",
"docker_container_registry": "ghcr.io/uclahs-cds",
"docker_image_gatk": "broadinstitute/gatk:4.2.4.1",
"docker_image_gatk3": "ghcr.io/uclahs-cds/call-gsnp:GATK-3.7.0",
"docker_image_picard": "ghcr.io/uclahs-cds/picard:2.26.10",
"docker_image_pipeval": "ghcr.io/uclahs-cds/pipeval:4.0.0-rc.2",
"docker_image_samtools": "ghcr.io/uclahs-cds/samtools:1.17",
"gatk3_version": "GATK-3.7.0",
"gatk_command_mem_diff": "0",
"gatk_ir_compression": "1",
"gatk_version": "4.2.4.1",
"input": {
"BAM": {
"tumor": [
"/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam"
]
},
"recalibration_table": [
"/scratch/851543/NO_FILE.grp"
]
},
"intervals": "",
"is_DOC_run": false,
"is_emit_original_quals": true,
"is_targeted": false,
"log_output_dir": "/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240214T213139Z",
"max_cpus": "16",
"max_memory": "31 GB",
"metapipeline_delete_input_bams": false,
"metapipeline_states_to_delete": [
"normal",
"tumor"
],
"min_cpus": "1",
"min_memory": "1 MB",
"output_dir": "/tmp/outputs",
"output_dir_base": "/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/GATK-4.2.4.1",
"parallelize_by_chromosome": true,
"patient_id": "TWGSAMIN000001",
"picard_version": "2.26.10",
"pipeval_version": "4.0.0-rc.2",
"proc_resource_params": {
"deduplicate_records_SAMtools": {
"cpus": "2",
"memory": "27.9 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"extract_GenomeIntervals": {
"cpus": "1",
"memory": "1 GB"
},
"remove_intermediate_files": {
"cpus": "1",
"memory": "1 GB"
},
"remove_merged_BAM": {
"cpus": "1",
"memory": "1 GB"
},
"remove_unmerged_BAMs": {
"cpus": "1",
"memory": "1 GB"
},
"run_ApplyBQSR_GATK": {
"cpus": "1",
"memory": "2 GB",
"retry_strategy": {
"memory": {
"operand": "4",
"strategy": "exponential"
}
}
},
"run_BaseRecalibrator_GATK": {
"cpus": "1",
"memory": "27.9 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_CalculateContamination_GATK": {
"cpus": "1",
"memory": "14 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_DepthOfCoverage_GATK": {
"cpus": "1",
"memory": "14 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_GetPileupSummaries_GATK": {
"cpus": "1",
"memory": "14 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_IndelRealigner_GATK": {
"cpus": "2",
"memory": "4 GB",
"retry_strategy": {
"memory": {
"operand": "4",
"strategy": "exponential"
}
}
},
"run_MergeSamFiles_Picard": {
"cpus": "2",
"memory": "27.9 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_RealignerTargetCreator_GATK": {
"cpus": "2",
"memory": "4 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_SplitIntervals_GATK": {
"cpus": "1",
"memory": "1 GB"
},
"run_index_SAMtools": {
"cpus": "1",
"memory": "2 GB",
"retry_strategy": {
"memory": {
"operand": "2",
"strategy": "exponential"
}
}
},
"run_validate_PipeVal": {
"cpus": "1",
"memory": "1 GB"
}
},
"reference_fasta": "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta",
"samples_to_process": [
{
"id": "4915723",
"path": "/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam",
"sample_type": "tumor"
}
],
"samtools_version": "1.17",
"save_intermediate_files": false,
"scatter_count": "50",
"split_intervals_extra_args": "",
"ucla_cds": true,
"use_recal_tables": false,
"work_dir": "/scratch/851543"
},
"params_schema": {
"aligner": {
"help": "Aligner used to align input BAMs. Provided as <Aligner>-<Aligner-version>",
"required": true,
"type": "AlignerTool"
},
"base_resource_update": {
"elements": {
"cpus": {
"help": "List of CPU updates",
"required": false,
"type": "ResourceUpdateList"
},
"memory": {
"help": "List of memory updates",
"required": false,
"type": "ResourceUpdateList"
}
},
"help": "User-defined modifications for adjusting base resource allocations for processes",
"required": false,
"type": "ResourceUpdateNamespace"
},
"bundle_contest_hapmap_3p3_vcf_gz": {
"help": "Absolute path to ConEst HapMap 3p3 VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_known_indels_vcf_gz": {
"help": "Absolute path to known INDELs VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_mills_and_1000g_gold_standard_indels_vcf_gz": {
"help": "Absolute path to Mills and 1000g gold standard INDELs VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_v0_dbsnp138_vcf_gz": {
"help": "Absolute path to v0 dbSNP 138 VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"dataset_id": {
"help": "Dataset ID",
"required": true,
"type": "String"
},
"gatk_ir_compression": {
"choices": [
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9"
],
"default": "1",
"help": "",
"required": false,
"type": "Integer"
},
"input": {
"elements": {
"BAM": {
"elements": {
"normal": {
"help": "Input normal BAMs",
"required": false,
"type": "BAMEntryList"
},
"tumor": {
"help": "Input tumor BAMs",
"required": false,
"type": "BAMEntryList"
}
},
"help": "Input BAMs for calling",
"required": true,
"type": "InputBAMNamespace"
},
"recalibration_table": {
"allow_empty": false,
"help": "List of any available recalibration tables",
"required": false,
"type": "RecalibrationTableList"
}
},
"help": "Input samples",
"required": true,
"type": "InputNamespace"
},
"intervals": {
"allow_empty": true,
"help": "Target intervals to process for DNA panel/targeted sequencing samples; leave empty for WGS",
"required": true,
"type": "String"
},
"is_DOC_run": {
"default": false,
"help": "Whether to run the DepthOfCoverage process, which is very time-consuming for large BAMs",
"required": true,
"type": "Bool"
},
"is_emit_original_quals": {
"default": true,
"help": "Whether to emit original quality scores after recalibration",
"required": true,
"type": "Bool"
},
"metapipeline_delete_input_bams": {
"default": false,
"help": "Whether to delete the input BAMs",
"required": true,
"type": "Bool"
},
"metapipeline_final_output_dir": {
"help": "Directory containing final outputs to check before input deletion",
"required": false,
"type": "String"
},
"metapipeline_states_to_delete": {
"choice": [
"normal",
"tumor"
],
"default": [
"normal",
"tumor"
],
"help": "List of states for which to delete input BAMs",
"required": true,
"type": "List"
},
"output_dir": {
"help": "Absolute path to output directory",
"mode": "w",
"required": true,
"type": "Path"
},
"patient_id": {
"help": "Patient ID",
"required": true,
"type": "String"
},
"reference_fasta": {
"help": "Absolute path to reference genome fasta",
"mode": "r",
"required": true,
"type": "Path"
},
"save_intermediate_files": {
"default": false,
"help": "Whether to save intermediate files",
"required": true,
"type": "Bool"
},
"scatter_count": {
"default": "50",
"help": "How many intervals to divide the genome into for parallelization",
"required": true,
"type": "Integer"
},
"split_intervals_extra_args": {
"allow_empty": true,
"help": "Extra arguments for interval splitting",
"required": false,
"type": "String"
}
},
"proc_name_keys": [
"withName:run_validate_PipeVal",
"withName:extract_GenomeIntervals",
"withName:run_SplitIntervals_GATK",
"withName:run_RealignerTargetCreator_GATK",
"withName:run_IndelRealigner_GATK",
"withName:run_BaseRecalibrator_GATK",
"withName:run_ApplyBQSR_GATK",
"withName:run_MergeSamFiles_Picard",
"withName:deduplicate_records_SAMtools",
"withName:run_index_SAMtools",
"withName:run_GetPileupSummaries_GATK",
"withName:run_CalculateContamination_GATK",
"withName:run_DepthOfCoverage_GATK",
"withName:remove_intermediate_files",
"withName:remove_unmerged_BAMs",
"withName:remove_merged_BAM"
],
"proc_names": "[Ljava.lang.String;@7cf166db",
"process": {
"cache": false,
"containerOptions": {
"1": "--cpu-shares 1024 --cpus $task.cpus",
"2": "--cpu-shares 1024 --cpus $task.cpus",
"3": "--cpu-shares 1024 --cpus $task.cpus",
"closure": "--cpu-shares 1024 --cpus $task.cpus"
},
"cpus": {
"1": "1",
"2": "2",
"3": "3",
"closure": "closure()"
},
"errorStrategy": {
"1": "terminate",
"2": "terminate",
"3": "terminate",
"closure": "terminate"
},
"executor": "local",
"maxRetries": "1",
"memory": "31 GB",
"withLabel:process_high": {
"cpus": {
"1": "12",
"2": "12",
"3": "12",
"closure": "retry_updater(12, add, 0, $task.attempt, cpus)"
},
"memory": {
"1": "31 GB",
"2": "31 GB",
"3": "31 GB",
"closure": "retry_updater(84 GB, exponential, 2, $task.attempt, memory)"
}
},
"withLabel:process_low": {
"cpus": {
"1": "2",
"2": "2",
"3": "2",
"closure": "retry_updater(2, add, 0, $task.attempt, cpus)"
},
"memory": {
"1": "3 GB",
"2": "6 GB",
"3": "12 GB",
"closure": "retry_updater(3 GB, exponential, 2, $task.attempt, memory)"
}
},
"withLabel:process_medium": {
"cpus": {
"1": "6",
"2": "6",
"3": "6",
"closure": "retry_updater(6, add, 0, $task.attempt, cpus)"
},
"memory": {
"1": "31 GB",
"2": "31 GB",
"3": "31 GB",
"closure": "retry_updater(42 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:deduplicate_records_SAMtools": {
"cpus": "2",
"memory": {
"1": "27.9 GB",
"2": "31 GB",
"3": "31 GB",
"closure": "retry_updater(27.9 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:extract_GenomeIntervals": {
"cpus": "1",
"memory": "1 GB"
},
"withName:remove_intermediate_files": {
"cpus": "1",
"memory": "1 GB"
},
"withName:remove_merged_BAM": {
"cpus": "1",
"memory": "1 GB"
},
"withName:remove_unmerged_BAMs": {
"cpus": "1",
"memory": "1 GB"
},
"withName:run_ApplyBQSR_GATK": {
"cpus": "1",
"memory": {
"1": "2 GB",
"2": "8 GB",
"3": "31 GB",
"closure": "retry_updater(2 GB, exponential, 4, $task.attempt, memory)"
}
},
"withName:run_BaseRecalibrator_GATK": {
"cpus": "1",
"memory": {
"1": "27.9 GB",
"2": "31 GB",
"3": "31 GB",
"closure": "retry_updater(27.9 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_CalculateContamination_GATK": {
"cpus": "1",
"memory": {
"1": "14 GB",
"2": "27.9 GB",
"3": "31 GB",
"closure": "retry_updater(14 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_DepthOfCoverage_GATK": {
"cpus": "1",
"memory": {
"1": "14 GB",
"2": "27.9 GB",
"3": "31 GB",
"closure": "retry_updater(14 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_GetPileupSummaries_GATK": {
"cpus": "1",
"memory": {
"1": "14 GB",
"2": "27.9 GB",
"3": "31 GB",
"closure": "retry_updater(14 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_IndelRealigner_GATK": {
"cpus": "2",
"memory": {
"1": "4 GB",
"2": "16 GB",
"3": "31 GB",
"closure": "retry_updater(4 GB, exponential, 4, $task.attempt, memory)"
}
},
"withName:run_MergeSamFiles_Picard": {
"cpus": "2",
"memory": {
"1": "27.9 GB",
"2": "31 GB",
"3": "31 GB",
"closure": "retry_updater(27.9 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_RealignerTargetCreator_GATK": {
"cpus": "2",
"memory": {
"1": "4 GB",
"2": "8 GB",
"3": "16 GB",
"closure": "retry_updater(4 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_SplitIntervals_GATK": {
"cpus": "1",
"memory": "1 GB"
},
"withName:run_index_SAMtools": {
"cpus": "1",
"memory": {
"1": "2 GB",
"2": "4 GB",
"3": "8 GB",
"closure": "retry_updater(2 GB, exponential, 2, $task.attempt, memory)"
}
},
"withName:run_validate_PipeVal": {
"cpus": "1",
"memory": "1 GB"
}
},
"report": {
"enabled": true,
"file": "/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240214T213139Z/nextflow-log/report.html"
},
"timeline": {
"enabled": true,
"file": "/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240214T213139Z/nextflow-log/timeline.html"
},
"trace": {
"enabled": true,
"file": "/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240214T213139Z/nextflow-log/trace.txt"
},
"workDir": "/scratch/851543",
"yaml": {
"aligner": {
"help": "Aligner used to align input BAMs. Provided as <Aligner>-<Aligner-version>",
"required": true,
"type": "AlignerTool"
},
"base_resource_update": {
"elements": {
"cpus": {
"help": "List of CPU updates",
"required": false,
"type": "ResourceUpdateList"
},
"memory": {
"help": "List of memory updates",
"required": false,
"type": "ResourceUpdateList"
}
},
"help": "User-defined modifications for adjusting base resource allocations for processes",
"required": false,
"type": "ResourceUpdateNamespace"
},
"bundle_contest_hapmap_3p3_vcf_gz": {
"help": "Absolute path to ConEst HapMap 3p3 VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_known_indels_vcf_gz": {
"help": "Absolute path to known INDELs VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_mills_and_1000g_gold_standard_indels_vcf_gz": {
"help": "Absolute path to Mills and 1000g gold standard INDELs VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"bundle_v0_dbsnp138_vcf_gz": {
"help": "Absolute path to v0 dbSNP 138 VCF",
"mode": "r",
"required": true,
"type": "Path"
},
"dataset_id": {
"help": "Dataset ID",
"required": true,
"type": "String"
},
"gatk_ir_compression": {
"choices": [
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9"
],
"default": "1",
"help": "",
"required": false,
"type": "Integer"
},
"input": {
"elements": {
"BAM": {
"elements": {
"normal": {
"help": "Input normal BAMs",
"required": false,
"type": "BAMEntryList"
},
"tumor": {
"help": "Input tumor BAMs",
"required": false,
"type": "BAMEntryList"
}
},
"help": "Input BAMs for calling",
"required": true,
"type": "InputBAMNamespace"
},
"recalibration_table": {
"allow_empty": false,
"help": "List of any available recalibration tables",
"required": false,
"type": "RecalibrationTableList"
}
},
"help": "Input samples",
"required": true,
"type": "InputNamespace"
},
"intervals": {
"allow_empty": true,
"help": "Target intervals to process for DNA panel/targeted sequencing samples; leave empty for WGS",
"required": true,
"type": "String"
},
"is_DOC_run": {
"default": false,
"help": "Whether to run the DepthOfCoverage process, which is very time-consuming for large BAMs",
"required": true,
"type": "Bool"
},
"is_emit_original_quals": {
"default": true,
"help": "Whether to emit original quality scores after recalibration",
"required": true,
"type": "Bool"
},
"metapipeline_delete_input_bams": {
"default": false,
"help": "Whether to delete the input BAMs",
"required": true,
"type": "Bool"
},
"metapipeline_final_output_dir": {
"help": "Directory containing final outputs to check before input deletion",
"required": false,
"type": "String"
},
"metapipeline_states_to_delete": {
"choice": [
"normal",
"tumor"
],
"default": [
"normal",
"tumor"
],
"help": "List of states for which to delete input BAMs",
"required": true,
"type": "List"
},
"output_dir": {
"help": "Absolute path to output directory",
"mode": "w",
"required": true,
"type": "Path"
},
"patient_id": {
"help": "Patient ID",
"required": true,
"type": "String"
},
"reference_fasta": {
"help": "Absolute path to reference genome fasta",
"mode": "r",
"required": true,
"type": "Path"
},
"save_intermediate_files": {
"default": false,
"help": "Whether to save intermediate files",
"required": true,
"type": "Bool"
},
"scatter_count": {
"default": "50",
"help": "How many intervals to divide the genome into for parallelization",
"required": true,
"type": "Integer"
},
"split_intervals_extra_args": {
"allow_empty": true,
"help": "Extra arguments for interval splitting",
"required": false,
"type": "String"
}
}
}

Due to our metaprogramming madness, many of the memory and cpus values under process are closures that return different results depending upon task.attempt. In those cases the reported value is a map containing the closure code itself (with heavy caveats) and the computed values for attempts 1, 2, and 3:

"withName:run_index_SAMtools": {
    "cpus": "1",
    "memory": {
        "1": "2 GB",
        "2": "4 GB",
        "3": "8 GB",
        "closure": "retry_updater(2 GB, exponential, 2, $task.attempt, memory)"
    }
}
Current Usage

The entry.py script compares the bundled tests against a local checkout of the recalibrate BAM pipeline. With a clean checkout that goes well (all examples are running on my Mac laptop, detached from the cluster):

$ ./entry.py ~/src/pipeline-recalibrate-BAM/
No changes!

After I make a small change to the pipeline...

--- a/config/F16.config
+++ b/config/F16.config
@@ -17,7 +17,7 @@ process {
         retry_strategy {
             memory {
                 strategy = 'exponential'
-                operand = 2
+                operand = 4
             }
         }
     }
$ ./entry.py ~/src/pipeline-recalibrate-BAM/
.params.proc_resource_params.run_RealignerTargetCreator_GATK.retry_strategy.memory.operand
2
4
------
.process.withName:run_RealignerTargetCreator_GATK.memory.2
8 GB
16 GB
------
.process.withName:run_RealignerTargetCreator_GATK.memory.3
16 GB
31 GB
------
.process.withName:run_RealignerTargetCreator_GATK.memory.closure
retry_updater(4 GB, exponential, 2, $task.attempt, memory)
retry_updater(4 GB, exponential, 4, $task.attempt, memory)
------
Saving updated file to /Users/nwiltsie/src/tool-Nextflow-action/run-nextflow-tests/recalibrate-bam-out.json

When differences are found, the script saves a complete test file as "[testfile]-out.json", which makes it easy to copy over the original if the changes were intended.

Internals

The key steps are:

  1. Build the docker image (currently done on-the-fly)
  2. Generate a temporary config file with the appropriate mocks and includeConfigs
  3. Within the image, run a groovy script that's kinda-sorta like nextflow config -properties
  4. Outside the image, parse the properties and diff them against the expected values

Docker Image

This tool runs in a modified version of the nextflow/nextflow:23.10.0 image; the base image can be updated if desired.

The key changes are to download and include Mockito on the classpath, and to swap the JAR entrypoint for the nextflow script from nextflow.cli.Launcher to groovy.ui.GroovyMain (what you get if you call groovy from the command line).

This is definitely hackish, but the goal is to execute an arbitrary groovy script with exactly the same JVM arguments and settings as nextflow. The nextflow script does some tricky business, caching all of those arguments in /.nextflow/tmp/launcher/nextflow-one_${NEXTFLOW_VERSION}/buildkitsandbox/classpath-${NEXTFLOW_MD5}, where ${NEXTFLOW_MD5} is the hash of a bunch of environment variables. I'd prefer to use $NXF_CLASSPATH rather than sed to splice in the extra JARs, but doing so would result in an unpredictable launcher and make it harder to swap out the entrypoint.

Runtime Container

At runtime, this tool creates a container from the image and bind-mounts:

  • The pipeline to the same path
  • An empty NamedTemporaryFile to each "empty file"
  • Individual pipeline files to the absolute paths defined by "mapped_files"
  • A temporary directory to /mnt/bl_tests

Within that temporary directory, it creates:

  • docker_test.config - the temporary config file described below
  • cli_params.json - A JSON file of the Nextflow command-line parameters
  • test_mocks.json - A JSON file of the test-specific method mocks

The container also gets the following environment variables:

  • Everything specified by the test's "envvars"
  • BL_PIPELINE_DIR - Path to the pipeline
  • BL_CONFIG_FILE - The container path to docker_test.config
  • BL_MOCKS_FILE - The container path to test_mocks.json
  • BL_CLI_PARAMS_FILE - The container path to cli_params.json
  • BL_PARAMS_FILE - The container path to the "params file" (if set)
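The mounts and environment above amount to a docker run invocation. This is an illustrative reconstruction, not the actual entry.py code; the function name and argument layout are assumptions:

```python
def docker_args(pipeline_dir: str, testdir: str, envvars: dict,
                params_file: str = None) -> list:
    """Assemble docker run arguments for the mounts and environment
    described above (names are illustrative, not the actual entry.py API)."""
    args = [
        "docker", "run", "--rm",
        # The pipeline is mounted at the same path inside the container
        "-v", f"{pipeline_dir}:{pipeline_dir}",
        # The temporary directory holding the generated test files
        "-v", f"{testdir}:/mnt/bl_tests",
        "-e", f"BL_PIPELINE_DIR={pipeline_dir}",
        "-e", "BL_CONFIG_FILE=/mnt/bl_tests/docker_test.config",
        "-e", "BL_MOCKS_FILE=/mnt/bl_tests/test_mocks.json",
        "-e", "BL_CLI_PARAMS_FILE=/mnt/bl_tests/cli_params.json",
    ]
    # Test-specific environment variables (e.g. SLURM_JOB_ID)
    for key, value in envvars.items():
        args += ["-e", f"{key}={value}"]
    # Only set BL_PARAMS_FILE if the test defines a params file
    if params_file is not None:
        args += ["-e", f"BL_PARAMS_FILE={params_file}"]
    return args
```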

Temporary Config File

The generated config file looks something like this:

import nextflow.util.SysHelper
import nextflow.util.MemoryUnit
import static org.mockito.Mockito.*
import org.mockito.MockedStatic

try (MockedStatic dummyhelper = mockStatic(
        SysHelper.class,
        CALLS_REAL_METHODS)) {
    dummyhelper
        .when(SysHelper::getAvailCpus)
        .thenReturn(16);
    dummyhelper
        .when(SysHelper::getAvailMemory)
        .thenReturn(MemoryUnit.of("31GB"));
    includeConfig "/Users/nwiltsie/src/pipeline-recalibrate-BAM/test/nftest.config"
}

System.out.println("=========SENTINEL_OUTPUT==========")

The two key parts here:

  1. The use of Mockito to mock out static Java methods (it's mind-blowing that it works) with the CPU and memory values from the test configuration.
  2. Explicitly printing a sentinel value after the true config files are loaded. Our configs are littered with random print statements, so the sentinel indicates the boundary between all that junk and the structured output to follow.
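On the parsing side, recovering the structured output is then a simple partition on that sentinel. A minimal sketch (the function name is hypothetical):

```python
SENTINEL = "=========SENTINEL_OUTPUT=========="

def strip_preamble(raw_output: str) -> str:
    """Discard everything printed before the sentinel (stray print
    statements from the configs) and keep the structured output."""
    _, _, properties = raw_output.partition(SENTINEL)
    return properties.strip()
```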

Groovy Script

Here's the main madness. This script is a chimera of nextflow config and a portion of nextflow run with some major changes:

  1. I don't know how to do parameter parsing in groovy, so the parameters and config files are read from the BL_* environment variables and injected into the nextflow.cli.CliOptions and nextflow.cli.CmdRun objects in the same way that they would have been using CLI parameters.
  2. The actual building of the ConfigObject (where all of our config code is executed) is wrapped in an Interceptor that gets to spy on every method call. If any of those method names are in the test's "mocks", the actual method is ignored and the pre-computed result is returned.

At this point the ConfigObject is as complete as Nextflow would ever make it. However, if it were to be serialized at this point every closure would be represented like Script1E713595D312C8FDC3D2D9EDAF11D59D$_run_closure1$_closure11@65293ca5, which is not ideal.

To solve that I take one more step and walk over the process block with another Interceptor. During that process I take the following steps:

  1. For top-level keys (direct children of the process block), cache their name (e.g. withLabel:process_medium) in the interceptor while traversing their children.
  2. For every closure value, attempt to replace it with the following, moving on to the next if there is an issue:
    1. The result of evaluating the closure, raising an exception if it tries to get a property named task.
    2. A map with keys closure, 1, 2, and 3 mapping to results of evaluating the closure, with some serious skullduggery to intercept ConfigObject.get("task") to inject the value [process: <saved name>, attempt: '???', cpus: '$task.cpus'] (making the closure not-so-closed). The value for cpus is the literal string '$task.cpus', but the value for attempt varies:
      • For closure, attempt is set to the literal string '$task.attempt'. While evaluating the closure any calls to methods named check_limits or retry_updater are mocked out for strings like "$name(${args.join(', ')})" (e.g. "retry_updater(12, add, 0, $task.attempt, cpus)").
      • For 1, 2, and 3, attempt is set to that integer and the closure is evaluated without any other shenanigans.
  3. If we still fail, it's because of a closure like cpus = { methods.check_limits( 1 * task.attempt, 'cpus' ) }. 1 * task.attempt is evaluated with the static Java method java.lang.Integer.multiply and can't be intercepted without mockito, so at this point I just bail and return "closure()".

Finally, the fully-rendered configuration is printed to stdout in Java Properties File format.

Parsing and Diffing

Outside of the docker container, the groovy output is split using the sentinel value and parsed into a dictionary. A lot of junk shows up here: seemingly any variable defined without a def (example, another example) bubbles up to the global namespace. For sanity I'm not parsing that json_object variable, but we should just add the def onto it.

Once parsed, I pop off any configuration-only namespaces we've defined, as they're just noise:

# These are namespaces defined in the common submodules
boring_keys = {
    'csv_parser',
    'custom_schema_types',
    'methods',
    'retry',
    'schema',
    'bam_parser',
    'json_extractor',
}

for key in boring_keys:
    result.pop(key, None)

Finally, I generate a list of differences, using re.sub to remove any dates from the "dated_fields".
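The datestamps in question look like 20240214T213139Z, so a compact regex is enough to neutralize them. A sketch of that scrubbing step (the exact pattern in entry.py may differ):

```python
import re

# Matches compact UTC datestamps like 20240214T213139Z
DATESTAMP = re.compile(r"\d{8}T\d{6}Z")

def scrub_dates(value: str) -> str:
    """Replace datestamps so dated fields compare equal across runs."""
    return DATESTAMP.sub("<DATE>", value)
```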

Where to go from here

As-is, this is not an Action. However, it is shockingly complicated, and before I go any further I need feedback from the lot of you. Feel free to dive into any aspect of this, but here are some specific questions:

  • Is this at core a terrible idea? I don't really want to throw this away, but it will only work if the rest of you are invested in the utility of it.
  • Does the top-level interface seem reasonable? My vague plan is to have the Action look for all nf-configtest-*.json files (or something) in the repository. That would allow each test file to be independent.

@sorelfitzgibbon left a comment

Superficially, which is as far as I'm capable of going with this, looks great to me. How does this work interplay, if at all, with the plan to capture all input configuration for every run? Can we use some of this to capture both input values and post configuration values to store? Or can we already extract those values just from the normal run?

@nwiltsie
Member Author

nwiltsie commented Feb 15, 2024

How does this work interplay, if at all, with the plan to capture all input configuration for every run?

This is definitely related work. My mental framework is as follows:

  1. We write configuration and parameter files for the pipelines (our intended inputs)
  2. Each pipeline's configuration process renders our inputs into a variety of processes to be run, with inputs for each (the rendered configuration)
  3. Nextflow executes each process (nextflow reports, pipeline outputs)

There are three problems there:

  • On the input side, we control the intended inputs, but the pipelines actually work with the rendered configuration. Something might go wrong in that translation or change inadvertently.
  • On the output side, we can only infer pieces of the rendered configuration from the Nextflow reports.
  • On the output side, even assuming we have the full rendered configuration, we can't back-calculate the original human inputs.

This work partially addresses the first problem by helping us be consistent in the rendering/translation process (although it doesn't say anything about it being right). Sufficient test coverage with this tool might convince us of the "rightness".

@j2salmingo's work with uclahs-cds/pipeline-Nextflow-config#43 addresses the second problem by capturing the full rendered configuration with the pipeline results.

To my knowledge, the third problem is currently unaddressed.

@sorelfitzgibbon

Thanks, I agree it would be helpful to solve all three problems.

@kiarod

kiarod commented Feb 15, 2024

Is this at core a terrible idea? I don't really want to throw this away, but it will only work if the rest of you are invested in the utility of it.

On the first point, this definitely is not a terrible idea, and I don't think we could throw it away if we wanted to. This would be an awesome utility to take the guesswork out of the downstream effects of configuration changes, and it would dramatically improve the quality of code reviews in a number of ways (speed, certainty, trackability of changes).

Does the top-level interface seem reasonable? My vague plan is to have the Action look for all nf-configtest-*.json files (or something) in the repository. That would allow each test file to be independent.

In terms of the top-level interface, did you compose the test JSON by hand, or did you have an automated way of constructing it (or at least a skeleton of it)? Some of the components seem to be JSONified portions of the schema.yaml.

Overall this is amazing work, Nick! I think if this were actionified it would seriously raise the professional standard of our pipeline work. I do feel it would be worthwhile to present a demo of this work at the Nextflow WG meeting so we can stop and ask you implementation-level questions.

@nwiltsie
Member Author

nwiltsie commented Feb 15, 2024

In terms of the top-level interface, did you compose the test JSON by hand, or did you have an automated way of constructing it (or at least a skeleton of it)? Some of the components seem to be JSONified portions of the schema.yaml.

I hand-composed a file like this:

{
  "config": [
    "test/nftest.config"
  ],
  "params_file": "test/single.yaml",
  "cpus": 16,
  "memory_gb": 31,
  "empty_files": [
    "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta",
    "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz",
    "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz",
    "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz",
    "/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam"
  ],
  "mapped_files": [],
  "nf_params": {
    "output_dir": "/tmp/outputs"
  },
  "envvars": {
    "SLURM_JOB_ID": "851543"
  },
  "mocks": {
    "parse_bam_header": {
      "read_group": [
        {
          "SM": "4915723"
        }
      ]
    }
  },
  "dated_fields": [
    "params.log_output_dir",
    "report.file",
    "timeline.file",
    "trace.file"
  ],
  "expected_result": {}
}

So that's all of the inputs and an expected result of nothing. When I ran the tool it complained about all of the unexpected results and wrote them to a new file for me, which I turned right around and committed (I trimmed most of the JSON out of the console output below). That definitely makes these regression tests: I don't swear that it's the correct output, but it is consistent output.

$ ./entry.py ~/src/pipeline-recalibrate-BAM/
docker
None
{'all_group_ids': '$(for i in `id --real --groups`; do echo -n "--group-add=$i "; done)', 'enabled': True, 'runOptions': '-u $(id -u):$(id -g) $(for i in `id --real --groups`; do echo -n "--group-add=$i "; done)', 'uid_and_gid': '-u $(id -u):$(id -g)'}
------
manifest
None
{'author': 'Yash Patel', 'description': 'Nextflow pipeline to perform Indel Realignment and Base Quality Score Recalibration', 'name': 'recalibrate-BAM', 'version': '1.0.0-rc.4'}
------
params
None
[full content trimmed]
------
params_schema
None
[full content trimmed]
------
process
None
[full content trimmed]
------
report
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/report.html'}
------
timeline
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/timeline.html'}
------
trace
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/trace.txt'}
------
workDir
None
/scratch/851543
------
yaml
None
[full content trimmed]
------
Saving updated file to /Users/nwiltsie/src/tool-Nextflow-action/run-nextflow-tests/recalibrate-bam-out.json

So yes, that yaml block is the JSONified version of the schema.yaml, because this line of code defines it without a def.
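(As an aside, that run-then-commit loop is essentially snapshot testing. Here's a minimal Python sketch of the update step; `update_snapshot` is a hypothetical helper that mirrors the console output above, not the actual code in `entry.py`.)

```python
import json
from pathlib import Path


def update_snapshot(test_file, actual_config):
    """Diff a test file's expected_result against the freshly rendered
    configuration; write an updated copy if anything changed."""
    test_file = Path(test_file)
    test = json.loads(test_file.read_text())
    expected = test.get("expected_result", {})

    # Report every top-level key whose rendered value differs
    differences = {
        key: value for key, value in actual_config.items()
        if expected.get(key) != value
    }
    for key, value in differences.items():
        print(key, expected.get(key), value, "------", sep="\n")

    if differences:
        # Write the rendered configuration back out so it can be
        # reviewed and committed as the new snapshot
        test["expected_result"] = actual_config
        out_file = test_file.with_name(test_file.stem + "-out" + test_file.suffix)
        out_file.write_text(json.dumps(test, indent=2))
        print(f"Saving updated file to {out_file}")
    return differences
```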

@nwiltsie
Member Author

(Expanding on that - that line of code should have a def on it, and we shouldn't include that yaml key in the test files. This is going to quickly turn up a lot of global variables that we didn't intend to be global.)

@zhuchcn
Member

zhuchcn commented Feb 18, 2024

This is absolutely amazing work! Have you considered incorporating this into our nf-test? It is already published software, so we could even consider another publication for an nf-test v2 in the future if more functionality is added. Do you also expect users to write a series of test cases for different input parameters and environments? We could benefit from it when developing locally if this can be accessed directly from the command line.

A couple of silly questions:

  1. Does every closure need to be provided in 'mocks'? What will happen if a closure is not mocked?
  2. I don't understand what entry.py is comparing against. What is a local checkout?

@nwiltsie
Member Author

Have you considered incorporating this into our nf-test? It is already published software, so we could even consider another publication for an nf-test v2 in the future if more functionality is added.

I've vaguely considered it - part of my goal in pushing this up now was to solicit feedback on how best to use it. I'm open to whatever makes the most sense!

Do you also expect users to write a series of test cases for different input parameters and environments? We could benefit from it when developing locally if this can be accessed directly from the command line.

Yes, the expectation is that we'd have a representative set of tests stored in each pipeline - different CPU counts, different inputs, etc. I agree that it should be available locally as well as in the cloud.
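(For the local case, discovering those per-pipeline test files could be as simple as a recursive glob over each checkout. A sketch, assuming the tentative `nf-configtest-*.json` naming convention floated above; `find_config_tests` is hypothetical:)

```python
from pathlib import Path


def find_config_tests(pipeline_dir):
    """Collect every pipeline-configuration regression test in a checkout."""
    # rglob searches the whole repository, so test files can live anywhere
    return sorted(Path(pipeline_dir).rglob("nf-configtest-*.json"))
```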

  1. Does every closure need to be provided in 'mocks'? What will happen if a closure is not mocked?

No, not every closure needs to be mocked (in fact, as few as possible should be mocked out). Anything not mocked will be evaluated in the same way it would be in a real pipeline run.
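(In concept it works like this; this is a sketch of the behavior rather than the actual implementation, and the `evaluate` helper is hypothetical. The mocks mapping is consulted first, and everything else falls through to real evaluation:)

```python
def evaluate(name, closure, mocks, *args):
    """Return the canned value for a mocked closure, or call the real one."""
    if name in mocks:
        return mocks[name]
    return closure(*args)


# The "mocks" block from the recalibrate-BAM test file above
mocks = {"parse_bam_header": {"read_group": [{"SM": "4915723"}]}}

# Mocked: the real BAM parser never runs
evaluate("parse_bam_header", lambda path: None, mocks, "/tmp/a.bam")

# Not mocked: evaluated just as it would be in a real pipeline run
evaluate("to_upper", str.upper, mocks, "sample")  # "SAMPLE"
```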

  1. I don't understand what entry.py is comparing against. What is a local checkout?

entry.py is just a small demo for this pull request. In practice this should work like NFTest, with each pipeline storing its own test files. Specifically for this pull request, to demonstrate the functionality, I bundled a test case for the recalibrate BAM pipeline and wrote entry.py as a simple test runner. That means you can follow these steps to run the test:

#!/bin/bash

# Clone this pull request
git clone \
    --branch nwiltsie-nextflow-regression-logic \
    --recurse-submodules \
    [email protected]:uclahs-cds/tool-Nextflow-action.git

# Clone the recalibrate-BAM pipeline (with submodules)
git clone \
    --recurse-submodules \
    [email protected]:uclahs-cds/pipeline-recalibrate-BAM.git

# Run the test bundled with the pull request
./tool-Nextflow-action/run-nextflow-tests/entry.py \
    "$(readlink -f ./pipeline-recalibrate-BAM/)"

@nwiltsie
Copy link
Member Author

uclahs-cds/pipeline-call-gSV#118 is a perfect example of where this can be useful. @Faizal-Eeman and I created the test files from cd08fa7 and a187b4e to match pipeline-call-gSV's current main branch. When run against the branch from that pull request, the output is...

$ ./entry.py ../../../pipelines/pipeline-call-gSV ./call-gsv-F32.json
.paramsproc_resource_params
None
{'call_gSV_Delly': {'cpus': '1', 'memory': '30 GB', 'retry_strategy': {'memory': {'operand': '2', 'strategy': 'exponential'}}}, 'call_gSV_Manta': {'cpus': '1', 'memory': '30 GB', 'retry_strategy': {'memory': {'operand': '2', 'strategy': 'exponential'}}}, 'run_validate_PipeVal': {'cpus': '1', 'memory': '1 GB'}}
------
.process.withName:call_gSV_Delly.memory
30 GB
{'1': '30 GB', '2': '60 GB', '3': '64 GB', 'closure': 'retry_updater(30 GB, exponential, 2, $task.attempt, memory)'}
------
.process.withName:call_gSV_Manta.memory
30 GB
{'1': '30 GB', '2': '60 GB', '3': '64 GB', 'closure': 'retry_updater(30 GB, exponential, 2, $task.attempt, memory)'}
------
proc_names
None
[Ljava.lang.String;@47fc9ce
------
Saving updated file to call-gsv-F32-out.json

So the differences are:

  1. The branch newly defines the params.proc_resource_params variable (and my script clearly has a bug in assembling that variable name).
  2. The call_gSV_Delly and call_gSV_Manta processes previously had a flat 30 GB memory allocation, and now have a progressive allocation of 30 GB, 60 GB, and 64 GB on the first three attempts (this is on an F32 node).
  3. The proc_names variable is newly set to an opaque pointer value, indicating that it's a variable missing a def (link).
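(The numbers in point 2 are consistent with an exponential retry strategy capped at the node's available memory. A quick arithmetic check in Python, assuming a 64 GB ceiling for an F32 node; `retry_memory_gb` is just an illustration, not the pipeline's actual `retry_updater`:)

```python
def retry_memory_gb(base_gb, operand, attempt, ceiling_gb):
    """Exponential retry strategy: base * operand^(attempt - 1), capped."""
    return min(base_gb * operand ** (attempt - 1), ceiling_gb)


# First three attempts for call_gSV_Delly / call_gSV_Manta on an F32 node
[retry_memory_gb(30, 2, attempt, 64) for attempt in (1, 2, 3)]
# [30, 60, 64]
```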

Contributor

@yashpatel6 yashpatel6 left a comment


Looks great! I think we can proceed with this version and make updates as necessary once we begin implementing across pipelines.

@nwiltsie nwiltsie merged commit d84ad41 into main Feb 27, 2024
1 check passed
@nwiltsie nwiltsie deleted the nwiltsie-nextflow-regression-logic branch February 27, 2024 23:27