Add ability to mask absolute version numbers in Nextflow tests #40

nwiltsie · 2024-05-24T22:17:48Z

Description

The Nextflow configuration tests repeat the pipeline's version many times in their expected values, so any version bump requires updating every test, as in uclahs-cds/pipeline-call-gSV#151 (comment).

To eliminate those kinds of frustrating PRs, this PR adds an optional new version_fields parameter for tests. That parameter is (as might be expected) a list of fields that contain the version. In the expected results, every field listed should have its embedded version number(s) replaced with the string VER.SI.ON, like so:

--- a/test/configtest-F16.json
+++ b/test/configtest-F16.json
@@ -30,6 +30,14 @@
     "trace.file",
     "params.date"
   ],
+  "version_fields": [
+    "manifest.version",
+    "params.log_output_dir",
+    "params.output_dir_base",
+    "report.file",
+    "trace.file",
+    "timeline.file"
+  ],
   "expected_result": {
     "docker": {
       "all_group_ids": "$(for i in `id --real --groups`; do echo -n \"--group-add=$i \"; done)",
@@ -41,7 +49,7 @@
       "author": "Yu Pan, Tim Sanders, Yael Berkovich, Mohammed Faizal Eeman Mootor",
       "description": "A pipeline to call germline structural variants utilizing Delly and Manta",
       "name": "call-gSV",
-      "version": "5.0.0"
+      "version": "VER.SI.ON"
     },
     "params": {
       "GCNV": "gCNV",
@@ -70,7 +78,7 @@
           ]
         }
       },
-      "log_output_dir": "/tmp/test-only-outputs/call-gSV-5.0.0/8675309/log-call-gSV-5.0.0-19970704T165655Z",
+      "log_output_dir": "/tmp/test-only-outputs/call-gSV-VER.SI.ON/8675309/log-call-gSV-VER.SI.ON-19970704T165655Z",
       "manta_version": "1.6.0",
       "map_qual": "20",
       "mappability_map": "/hot/ref/tool-specific-input/Delly/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.r101.s501.blacklist.gz",
@@ -79,7 +87,7 @@
       "min_cpus": "1",
       "min_memory": "1 MB",
       "output_dir": "/tmp/test-only-outputs",
-      "output_dir_base": "/tmp/test-only-outputs/call-gSV-5.0.0/8675309",
+      "output_dir_base": "/tmp/test-only-outputs/call-gSV-VER.SI.ON/8675309",
       "pipeval_version": "4.0.0-rc.2",
       "proc_resource_params": {
         "call_gCNV_Delly": {
@@ -368,15 +376,15 @@
     },
     "report": {
       "enabled": true,
-      "file": "/tmp/test-only-outputs/call-gSV-5.0.0/8675309/log-call-gSV-5.0.0-19970704T165655Z/nextflow-log/report.html"
+      "file": "/tmp/test-only-outputs/call-gSV-VER.SI.ON/8675309/log-call-gSV-VER.SI.ON-19970704T165655Z/nextflow-log/report.html"
     },
     "timeline": {
       "enabled": true,
-      "file": "/tmp/test-only-outputs/call-gSV-5.0.0/8675309/log-call-gSV-5.0.0-19970704T165655Z/nextflow-log/timeline.html"
+      "file": "/tmp/test-only-outputs/call-gSV-VER.SI.ON/8675309/log-call-gSV-VER.SI.ON-19970704T165655Z/nextflow-log/timeline.html"
     },
     "trace": {
       "enabled": true,
-      "file": "/tmp/test-only-outputs/call-gSV-5.0.0/8675309/log-call-gSV-5.0.0-19970704T165655Z/nextflow-log/trace.txt"
+      "file": "/tmp/test-only-outputs/call-gSV-VER.SI.ON/8675309/log-call-gSV-VER.SI.ON-19970704T165655Z/nextflow-log/trace.txt"
     },
     "tz": "sun.util.calendar.ZoneInfo[id=\"UTC\",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]",
     "workDir": "/scratch/300935"

When a test runs, the current version number is read from the manifest.version field of the raw test output. Each field listed in version_fields then has that exact version number replaced with VER.SI.ON before the output is compared with the expected results.
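A minimal sketch of that masking step, for illustration only. It assumes the raw test output is a nested dictionary addressed by dotted keys; the names `mask_versions`, `get_dotted`, and `set_dotted` are hypothetical and are not taken from run-nextflow-tests/utils.py.

```python
import copy

PLACEHOLDER = "VER.SI.ON"


def get_dotted(config, dotted_key):
    """Look up a value such as 'params.output_dir_base' in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def set_dotted(config, dotted_key, value):
    """Assign a value at a dotted key in a nested dict."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node[part]
    node[leaf] = value


def mask_versions(raw_output, version_fields):
    """Return a copy of raw_output with the current version masked.

    The version is read once from manifest.version, before any masking,
    so listing 'manifest.version' itself in version_fields is safe.
    """
    version = get_dotted(raw_output, "manifest.version")
    masked = copy.deepcopy(raw_output)
    for field in version_fields:
        value = get_dotted(masked, field)
        if isinstance(value, str):
            # Only the exact current version string is replaced.
            set_dotted(masked, field, value.replace(version, PLACEHOLDER))
    return masked
```

Fields that are not listed in version_fields are left alone, so an unrelated value such as "manta_version": "1.6.0" is still compared literally.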

As a result, the string VER.SI.ON always stands for one specific version throughout the entire file, even though that version changes from release to release. The test in the pipeline-call-gSV example above would therefore not have needed to be modified.

It also means that the tests will catch a version number that is incorrectly hard-coded somewhere, e.g. manifest.version = "1.0.0"; params.randomvar = "pipeline_${manifest.version}/output_1.0.0/". If the test JSON were written to assume that both numbers would update, i.e. "randomvar": "pipeline_VER.SI.ON/output_VER.SI.ON/", then the test would fail as soon as manifest.version was changed to anything other than 1.0.0.
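The hard-coded case can be traced with a small self-contained sketch. The helpers `mask`, `resolve_randomvar`, and `comparison_passes` are hypothetical stand-ins for the config resolution and the test comparison, not the action's actual code.

```python
PLACEHOLDER = "VER.SI.ON"

# Expected value written on the assumption that both numbers track the version.
EXPECTED = "pipeline_VER.SI.ON/output_VER.SI.ON/"


def mask(value, version):
    """Replace the exact current version with the placeholder."""
    return value.replace(version, PLACEHOLDER)


def resolve_randomvar(manifest_version):
    """Mimic the example config: the first number follows manifest.version,
    the second is hard-coded to 1.0.0."""
    return f"pipeline_{manifest_version}/output_1.0.0/"


def comparison_passes(manifest_version):
    """Mask the resolved value, then compare it with the expected string."""
    return mask(resolve_randomvar(manifest_version), manifest_version) == EXPECTED


# While the version is still 1.0.0, the hard-coded number happens to match.
print(comparison_passes("1.0.0"))  # True

# After a bump, only the first number is masked, so the comparison fails.
print(comparison_passes("1.1.0"))  # False
```

The failure after the bump is the desired outcome: the masked output becomes "pipeline_VER.SI.ON/output_1.0.0/", which exposes the stale hard-coded number.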

I've tested this locally on pipeline-recalibrate-BAM and pipeline-call-gSV.

Checklist

This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem¹ - Even a small leak can be costly².
This PR does NOT contain germline genetic data³, RNA-Seq, DNA methylation, microbiome or other molecular data⁴.

This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.

To automatically exclude such files using a .gitignore file, see here for example.

I have read the code review guidelines and the code review best practice on GitHub check-list.
I have set up or verified the main branch protection rule following the github standards before opening this pull request.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

¹ UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records
² The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.
³ Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs
⁴ RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.

yashpatel6

Looks good to me!

run-nextflow-tests/utils.py

nwiltsie added 5 commits May 24, 2024 13:06

Mask out versions

4bcc709

Update the README

3b4f627

Strip optional fields from output for brevity

b135e0d

Don't automatically add 'manifest.version' to the masked fields

7e355da

Add example version_fields to the README

1bd89d4

nwiltsie requested a review from a team May 24, 2024 22:17

yashpatel6 approved these changes May 28, 2024

nwiltsie merged commit 9d7245d into main May 28, 2024
5 checks passed

nwiltsie deleted the nwiltsie-mask-versions branch May 28, 2024 20:05

nwiltsie mentioned this pull request May 29, 2024

Sfitz input vcfs uclahs-cds/pipeline-call-sSNV#274

Merged

8 tasks
