determine the exact output files from the start for complete compatibility #12

sreichl · 2024-03-30T09:03:46Z

DEA modules should declare explicit outputs so they can be used as a module feeding subsequent modules (avoiding the missing input exception), e.g., as input for enrichment analysis.
This requires loading all metadata files and explicitly defining the final output files using the limma/lmFit variable naming scheme.

Pro:

enables smooth usage as a module with explicit outputs to be used for subsequent inputs

Con:

requires data-dependent configuration/annotation -> considered bad practice/to be avoided
- read up on why and how this translates to this use case

if done, do the same in dea_seurat.

sreichl · 2024-05-21T08:52:27Z

Idea 1: pre-generate all feature list names

Make the feature-list-generating rule a checkpoint, followed by an aggregation rule that creates a CSV similar to the input annotation of enrichment analysis (name, path, background,…) for each analysis. -> This CSV is then required in the target rule instead of the feature_list folder.
Thereby the missing input problem is solved without using the internal data, and annotating the enrichment analysis module becomes less cumbersome.
-> enables a run from A to Z
Need to explicitly determine the exact filenames before execution and then instruct the rules -> Is this actually possible?! In genome_track I did not manage to make outputs conditional, only inputs using input functions.
This requires the function dmatrix from the library patsy, which in turn requires the Global Workflow Dependency functionality of Snakemake 8.
Need to create empty files for groups without DEGs.
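A minimal sketch of what the aggregation step could write, assuming the feature lists already exist on disk. The function name and the column names (name, features_path, background_path) are hypothetical and only mirror the "name, path, background" idea above; the real enrichment analysis annotation may use different headers.

```python
import csv
from pathlib import Path


def write_enrichment_annotation(feature_list_dir, background_path, out_csv):
    """Write one CSV row per feature list found in feature_list_dir.

    Hypothetical helper: column names are assumptions, not the
    enrichment analysis module's documented schema.
    """
    feature_list_dir = Path(feature_list_dir)
    suffix = "_features.txt"
    rows = []
    # Sorted so the CSV content is stable between runs.
    # The pattern skips "*_features_annot.txt" because of its different ending.
    for path in sorted(feature_list_dir.glob("*" + suffix)):
        rows.append(
            {
                "name": path.name[: -len(suffix)],
                "features_path": str(path),
                "background_path": str(background_path),
            }
        )
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["name", "features_path", "background_path"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The target rule would then request only this one CSV per analysis, which has a fixed name regardless of how many groups the analysis produces.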

Idea 2: use checkpoints

should work directly without a rule in the middle: https://edwards.flinders.edu.au/how-to-use-snakemake-checkpoints/
Result: checkpoints alone did not work.

Idea 3: use for loops around the rule

Check whether for loops around rules are supported. If so, define one rule per analysis with the respective expand for the result files.

Idea 4: input = output?

Can I have a rule that has its input as output?!

Idea 5: adapt enrichment_analysis input

Change the enrichment analysis input to a pattern of the output directory of the differential analysis. Think it through before testing and implementing.

Idea 6: Split up the feature list generation per group

Con: waste of resources as the result is loaded over and over
Pro: specific outputs supported by Snakemake
Request all predetermined feature lists in the final target rule and use wildcards for each group within each analysis.
Solves the problem without checkpoints or other workarounds (but requires Snakemake 8).
To save resources, the explicit rule could take its input from the checkpoint, select only the lists per analysis, and then copy or touch them?

sreichl · 2024-05-26T11:53:13Z

Goal: Run analyses from rAw/reAds to pathwayZ/enrichmentZ, i.e., close the gap between dea_limma/_seurat and the enrichment_analysis module

If file names are explicitly pre-generated, then Snakemake 8 is required.

install Snakemake 8
setup & document SLURM executor for CeMM HPC
change module to work with Snakemake 8 and SLURM executor (e.g., move partition from param to resource)
- change & test all other modules, then switch min_version to 8.X.X
add global workflow dependency, i.e., envs/global.yaml with library patsy for function dmatrix
develop function that generates file names using patsy
add it to target rule all as final outcome
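A rough sketch of the file-name function from the list above. It does not call patsy; it hand-rolls the simplest case (one categorical covariate, treatment coding against a reference level) and assumes group names follow the R convention of concatenating column name and level. Both function names are hypothetical, interactions and numeric covariates are not covered, and any name sanitization done by limma is not modeled.

```python
import os


def predict_groups(metadata_rows, column, reference):
    """Predict coefficient names for one categorical column.

    Assumption: names are "<column><level>" for every non-reference level,
    as in R's treatment coding. A real implementation would derive them
    from the design matrix instead.
    """
    levels = sorted({row[column] for row in metadata_rows})
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in {column!r}")
    return [f"{column}{level}" for level in levels if level != reference]


def expected_feature_lists(result_path, analysis, groups):
    """Expand predicted groups into the up/down feature list paths."""
    paths = []
    for group in groups:
        for direction in ("up", "down"):
            paths.append(
                os.path.join(
                    result_path,
                    analysis,
                    "feature_lists",
                    f"{group}_{direction}_features.txt",
                )
            )
    return paths
```

The returned list could be passed to the target rule as input, so every expected file is known before execution starts.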

add rule that touches (or copies?) respective files per group from checkpoint or call a new rule/script for feature list generation per group

input:
    get_feature_lists,
output:
    up = os.path.join(result_path,'{analysis}','feature_lists','{group}_up_features.txt'),
    up_annot = os.path.join(result_path,'{analysis}','feature_lists','{group}_up_features_annot.txt') if config["feature_annotation"]["path"]!="" else [],
    # same for down and featureScores.csv
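The "touch or copy" step for such a per-group rule could look roughly like the following, assuming the checkpoint has already written whatever lists it found. The helper name and the returned status strings are hypothetical.

```python
import shutil
from pathlib import Path


def copy_or_touch(source, target):
    """Copy source to target if it exists, otherwise create an empty target.

    Either way the target exists afterwards, so the rule's declared
    output is always produced.
    """
    source, target = Path(source), Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    if source.is_file():
        shutil.copyfile(source, target)
        return "copied"
    # Write an empty file rather than touch(), so stale content is cleared.
    target.write_text("")
    return "touched"
```

Groups without DEGs then end up with an empty file instead of a missing output.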

sreichl · 2024-06-21T09:09:55Z

Potential problem with predetermining result names
Requires looking into annotation/metadata that is generated upstream, e.g., by spilterlize or scRNAseq processing… hence it can’t be used for a real A to Z run… But isn't that then a general problem? Think it through thoroughly before testing, then test easily without heavy development.
Which brings me back to checkpoints between modules being the solution?!?!

sreichl · 2024-06-30T09:14:59Z

Annotations coming from previous outputs should be copied to the respective config folder. Best practice should be to work through a project module by module, thereby creating the respective annotation files. Only at the end should a run from A to Z be possible, for rerunning, not for investigation/exploration.

sreichl · 2024-08-27T16:15:24Z

from dlaehnemann (Snakemake Dev) on Discord

Actually, having empty files but then a deterministic list of files sounds like the best and cleanest solution to me. That way, files are properly tracked, and you just have to decide how to handle empty files downstream. Whatever you can encode in deterministic wiring of rules, I would try to do that way. checkpoints and directory() outputs really are a last resort, as they will make it more difficult for snakemake to resolve stuff.
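One way the "decide how to handle empty files downstream" part could look, as a sketch only. Both helper names are hypothetical, and the assumed file format is one feature per line.

```python
from pathlib import Path


def read_feature_list(path):
    """Return the features in a list file; an empty file gives an empty list."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def lists_to_analyze(paths):
    """Keep only the feature lists that actually contain features."""
    selected = {}
    for path in paths:
        features = read_feature_list(path)
        if features:
            selected[str(path)] = features
    return selected
```

With this, the downstream module can accept the full deterministic list of files and simply skip the empty ones.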

sreichl self-assigned this Mar 30, 2024

sreichl added the enhancement New feature or request label Mar 30, 2024

sreichl added a commit that referenced this issue May 25, 2024

implement checkpoints for generated feature lists #12

93c0d48

sreichl added a commit that referenced this issue May 26, 2024

update software versions and prepare for explicit result creation #12

fa938d9

sreichl changed the title ~~consider determining the exact output files from the start for complete compatibility~~ determine the exact output files from the start for complete compatibility Sep 13, 2024
