Snapshot Transformer #223
That looks promising, I'll have a look
This looks very good! I definitely see some useful things that could simplify how we create snapshots here 🥳
Hi @lukfor! This looks great! From what we have seen, the biggest problem typically arises from a few "variable" files that don't allow us to snapshot, meaning we have to revert to only checking for Using your example below,
What would the relevant code snippet be to:
@GallVp Usman, could you give this feature a look and provide your feedback? In general, this might be a good feature, but I doubt it will be useful right away in nf-core, as it expects the outputs to be separated into their respective channels (which is not the case for many nf-core modules). Anyway, please give it a look and let's discuss when we meet next. Thanks
Hi @sateeshperi This can be very useful in some cases. I was in a somewhat similar situation and resorted to the following logic for the orthofinder module:

```groovy
import groovy.io.FileType
// ...
assert process.success
def all_files = []
file(process.out.orthofinder[0][1]).eachFileRecurse (FileType.FILES) { file ->
    all_files << file
}
def all_file_names = all_files.collect { it.name }.sort(false)
def stable_file_names = [
    'Statistics_PerSpecies.tsv',
    'SpeciesTree_Gene_Duplications_0.5_Support.txt',
    'SpeciesTree_rooted.txt'
]
def stable_files = all_files.findAll { it.name in stable_file_names }
assert snapshot(
    all_file_names,
    stable_files,
    process.out.versions[0]
).match()
```

With the proposed transformers, we may be able to make the snapshotting more groovy!
Thanks for your examples! They really help me get a better understanding of what is needed.
Thank you @lukfor This pattern is very common (example from nf-core/modules):

```groovy
{ assert snapshot(
    process.out.versions,
    process.out.bam.collect { bam(it[1]).getReadsMD5() },
    process.out.fastq,
    process.out.log
    ).match()
}
```

where all the outputs can be md5'ed except the log file or a bam file. We currently have to list all the outputs and apply a function to a specific output. Would it be possible to select a specific output by name and apply a function to it? So that:

```groovy
{ assert snapshot(
    process.out.mutate('bam') { it -> bam(it[1]).getReadsMD5() }
    ).match()
}
```
Thank you @sateeshperi The example code I pasted above is already using
This PR uses JSONPath selectors to remove, replace or map elements in snapshots.
Happy to hear your feedback! (especially @sateeshperi @nvnieuwk @maxulysse @adamrtalbot)
Related issues: #211 and #116
Snapshot Transformer
Taking snapshots of objects is an easy and effective way to create regression tests. By capturing the state of an object at a particular point in time, you can compare it against future states to detect any unintended changes. However, not every object is deterministic. Certain elements such as dates, log files, and headers can introduce variability, making direct comparisons unreliable.
Consider the following snapshot object:
This snapshot object has several non-deterministic values:
- `start_time` and `end_time` fields contain timestamps that will vary with each execution.
- `chunks` array can have elements in random order.
- `files` array includes a log file that contains timestamps.

To address this, `nf-test` provides methods to transform and reduce snapshots to make them deterministic. These methods include replacing random values, formatting numbers, and reducing large contents by generating MD5 hashes.
To make this snapshot deterministic, we need to transform these elements.
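The snapshot object itself is not reproduced in this text. As a rough illustration only, the plain Groovy sketch below builds a made-up object with the fields described above and shows why two otherwise identical runs fail a direct comparison. The helper names `makeSnapshot` and `normalise` and all values are hypothetical; nothing here uses the nf-test transformer API.

```groovy
// Hypothetical snapshot data; every value here is invented for illustration.
def makeSnapshot = { String start, String end, List chunks ->
    [
        start_time: start,
        end_time  : end,
        chunks    : chunks,
        contents  : [
            [value: 27, files: ['output.log', 'result.tsv']]
        ]
    ]
}

// Two runs with the same results but different timestamps and chunk order.
def runA = makeSnapshot('2024-01-01T10:00:00', '2024-01-01T10:05:00', [3, 1, 2])
def runB = makeSnapshot('2024-01-02T09:30:00', '2024-01-02T09:36:00', [2, 3, 1])

// A direct comparison fails even though nothing meaningful changed.
assert runA != runB

// Neutralising the variable parts by hand makes the two runs comparable.
def normalise = { Map snap ->
    snap + [
        start_time: '<START_TIME>',
        end_time  : '<END_TIME>',
        chunks    : snap.chunks.sort(false)   // sort(false) returns a sorted copy
    ]
}
assert normalise(runA) == normalise(runB)
```

The transformer methods described next aim to express this kind of normalisation declaratively instead of by hand.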
Methods and Functions
replace
The `replace` function allows you to replace a value at a specific JSON path with a fixed value or pattern.
Parameters:
Example:
This replaces the `end_time` field with a fixed string `"<END_TIME>"`.
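The exact call signature of `replace` is not shown in this text, so the following is only a conceptual stand-in written in plain Groovy. It assumes a simplified dot-separated path instead of a real JSONPath expression, and `replaceAt` is a hypothetical helper name.

```groovy
// Conceptual stand-in for a "replace" step; not the nf-test implementation.
// `path` is a simplified dot-separated path such as 'end_time' or 'meta.end_time'.
def replaceAt(Map data, String path, Object fixedValue) {
    def keys = path.tokenize('.')
    // Walk down to the map that holds the final key.
    def parent = keys.dropRight(1).inject(data) { node, key -> node[key] }
    parent[keys.last()] = fixedValue
    return data
}

def data = [start_time: '2024-01-01T10:00:00', end_time: '2024-01-01T10:05:00']
replaceAt(data, 'end_time', '<END_TIME>')

assert data.end_time == '<END_TIME>'
assert data.start_time == '2024-01-01T10:00:00'   // other fields are untouched
```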
map
The `map` function transforms elements of an array or an object using a provided function. You can apply this to each element individually or to the array as a whole.
Parameters:
Example:
This sorts the elements of the `chunks` array to ensure consistent ordering.
Example:
This replaces each element in the `chunks` array with the value `27`.
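The difference between transforming the array as a whole and transforming each element can be sketched in plain Groovy. This is an illustration of the two behaviours described above, using invented data, and does not call the `map` function itself.

```groovy
def data = [chunks: [3, 1, 2]]

// Whole-array transformation: sort so the order no longer depends on the run.
def sorted = data + [chunks: data.chunks.sort(false)]
assert sorted.chunks == [1, 2, 3]

// Per-element transformation: replace every element with the same value.
def masked = data + [chunks: data.chunks.collect { 27 }]
assert masked.chunks == [27, 27, 27]

// Both variants build new maps; the original data is unchanged.
assert data.chunks == [3, 1, 2]
```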
remove
The `remove` function removes elements from the snapshot based on a JSON path.
Parameters:
Example:
This removes the `start_time` field from the snapshot.
traverse
The `traverse` function iterates over all key-value pairs in the snapshot and applies a transformation function.
Parameters:
`key` and `value`.
Example:
This replaces values that end with ".log" with the string `"<LOG>"`.
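A recursive walk of this kind can be sketched in plain Groovy as follows. The function below is a hypothetical re-implementation for illustration, assuming the callback receives `key` and `value` and returns the new value; list elements are passed the key of the list that contains them, which is a choice made for this sketch only.

```groovy
// Recursively visit every leaf value; `fn` receives (key, value) and returns the new value.
def traverse(Object node, Closure fn, Object key = null) {
    if (node instanceof Map) {
        return node.collectEntries { k, v -> [(k): traverse(v, fn, k)] }
    }
    if (node instanceof List) {
        return node.collect { traverse(it, fn, key) }
    }
    return fn(key, node)
}

// Invented data for illustration.
def data = [
    start_time: '2024-01-01T10:00:00',
    contents  : [[value: 27, files: ['output.log', 'result.tsv']]]
]

def result = traverse(data) { key, value ->
    (value instanceof String && value.endsWith('.log')) ? '<LOG>' : value
}

assert result.contents[0].files == ['<LOG>', 'result.tsv']
assert result.contents[0].value == 27
```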
view
The `view` function is used to print the final snapshot. It provides a way to see what the snapshot looks like after all transformations have been applied.
You can also view intermediate results:
Explanation of JSONPath Selectors
- `.start_time`: Selects the `start_time` field at the root level of the JSON object.
- `.end_time`: Selects the `end_time` field at the root level of the JSON object.
- `.chunks`: Selects the `chunks` array at the root level of the JSON object.
- `.chunks[*]`: Selects each element within the `chunks` array.
- `contents[0].value`: Selects the `value` field of the first element in the `contents` array.
- `contents[0].files[?(@ == "output.log")]`: Selects the `output.log` file in the `files` array if it matches exactly.
- `contents[0].files[?(@ =~ /.*\\.log$/)]`: Selects all log files in the `files` array using a regular expression.
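For readers more familiar with Groovy than with JSONPath, the selectors above correspond roughly to the following plain Groovy expressions on a map. The data is invented, and the correspondence is an approximation for orientation only: JSONPath engines differ in details such as regex syntax and return types.

```groovy
// Invented data with the fields referenced by the selectors above.
def data = [
    start_time: '2024-01-01T10:00:00',
    end_time  : '2024-01-01T10:05:00',
    chunks    : [3, 1, 2],
    contents  : [[value: 27, files: ['output.log', 'result.tsv', 'debug.log']]]
]

// .start_time and .chunks select root-level fields.
assert data.start_time == '2024-01-01T10:00:00'
assert data.chunks == [3, 1, 2]

// .chunks[*] addresses each element of the array individually.
def visited = []
data.chunks.each { visited << it }
assert visited == [3, 1, 2]

// contents[0].value selects a field of the first array element.
assert data.contents[0].value == 27

// A filter with == keeps only exact matches.
assert data.contents[0].files.findAll { it == 'output.log' } == ['output.log']

// A filter with a regular expression keeps every file ending in ".log".
assert data.contents[0].files.findAll { it ==~ /.*\.log$/ } == ['output.log', 'debug.log']
```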