[FEATURE] Add stub support to every module #4570

Open
Tracked by #5828
ewels opened this issue Dec 11, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@ewels
Member

ewels commented Dec 11, 2023

Using command stub blocks in Nextflow is useful when developing syntax and quickly testing pipelines. However, their use within nf-core is limited because the vast majority of nf-core modules do not have stub blocks.

My suggestion is to require that every module should have a stub block. All 1095 of them :trollface:

We can check for its presence during linting and potentially also run the module CI with -stub-run.
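
A minimal sketch of what such a presence check could look like in plain shell. The function names `has_stub` and `lint_stubs` are hypothetical and not part of nf-core tools; the sketch assumes that a stub block is introduced by a line containing only `stub:`.

```shell
# Hypothetical helper: succeed if a Nextflow module file declares a stub block.
# Assumes the block starts with a line containing only "stub:" (optionally indented).
has_stub() {
    grep -Eq '^[[:space:]]*stub:[[:space:]]*$' "$1"
}

# Hypothetical lint step: report every main.nf under a directory that lacks a
# stub block and return non-zero if any were found, so a CI job can fail on it.
lint_stubs() {
    # Collect the offending files first so the result survives the pipeline subshell.
    missing=$(find "$1" -name main.nf | sort | while IFS= read -r file; do
        has_stub "$file" || printf 'missing stub: %s\n' "$file"
    done)
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}
```

For example, `lint_stubs ./modules/nf-core` would list the modules that still need a stub and exit with a non-zero status while any remain.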

Although adding a stub block for one or two modules is not difficult, adding it for all of them will be a significant effort. This could be a nice project focus for one or more future hackathons.

@ewels ewels added the enhancement New feature or request label Dec 11, 2023
@GallVp
Member

GallVp commented Dec 19, 2023

Hi @ewels

This is such a good idea and would be immensely helpful when implementing dataflows.

Here is something that I have been wondering about, and I thought I'd share it here to get your feedback. With the nf-test snapshot feature, it may be possible to partially automate stub generation and testing.

If nf-test had a function (say, snapshot.matchStub("test-name")) that compared the structure of process outputs against a snapshot while ignoring md5sums, it might be able to suggest stub code such as:

touch ${prefix}.gff3
touch ${prefix}.report.html

...versions.yml...

It won't be able to tackle complex situations but can provide starter code. Moreover, the stub testing will become more robust by expecting a complete match of output structure across script and stub for the same inputs and configuration.
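
To make the idea concrete, here is a rough shell sketch of the scaffolding step, independent of nf-test. It assumes the output file names of a real run are available one per line and that they start with the sample prefix used in that run; `scaffold_stub` is a hypothetical name, and `versions.yml` is skipped because it needs its own handling.

```shell
# Hypothetical scaffold: read output file names on stdin and print starter
# "touch" lines for a stub block. $1 is the prefix used in the real run.
scaffold_stub() {
    sample="$1"
    while IFS= read -r name; do
        case "$name" in
            # versions.yml is written separately in the stub, so skip it here.
            versions.yml) ;;
            # Swap the concrete sample prefix for the ${prefix} variable.
            "$sample"*) printf 'touch ${prefix}%s\n' "${name#"$sample"}" ;;
            # Names that do not depend on the prefix are kept as they are.
            *) printf 'touch %s\n' "$name" ;;
        esac
    done
}
```

Running `printf 'test.gff3\ntest.report.html\nversions.yml\n' | scaffold_stub test` prints the two `touch ${prefix}...` lines shown above. Compressed or otherwise structured outputs would still need manual attention.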

Here is a pull request where I have tried to implement something similar but manually: #4627

@ewels
Member Author

ewels commented Dec 19, 2023

That would be great 👌🏻 @mashehu / @mirpedrol - what do you think: could we make a #tools command to scaffold a stub from nf-test outputs / snapshots?

@GallVp
Member

GallVp commented Dec 19, 2023

I think if snapshot.matchStub("test-name") is implemented by nf-test, nf-core tools might not have to change at all. I am hesitant to commit time at this point but keen to contribute after March 2024. Someone will probably pick it up before then; otherwise, I am keen to give it a go.

@GallVp
Member

GallVp commented Dec 20, 2023

I have tried to implement something similar in logic to snapshot.matchStub for the fastp module here: #4637

Each test creates a sorted list of outputs which should be matched by the same test with the -stub option.

{
    assert snapshot(
        (
            [process.out.reads[0][0].toString()] + // meta
            process.out.reads.collect { file(it[1]).getName() } +
            process.out.json.collect { file(it[1]).getName() } +
            process.out.html.collect { file(it[1]).getName() } +
            process.out.log.collect { file(it[1]).getName() } +
            process.out.reads_fail.collect { file(it[1]).getName() } +
            process.out.reads_merged.collect { file(it[1]).getName() }
        ).sort()
    ).match("test_fastp_single_end-for_stub_match")
}
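
The same comparison can be sketched outside nf-test in plain shell. This assumes the outputs of a normal run and of a -stub run of the same test were copied into two directories; the function names `list_names` and `same_output_names` are hypothetical. As in the snapshot above, only file names are compared, not contents or checksums.

```shell
# Print the sorted base names of all regular files under a directory.
list_names() {
    find "$1" -type f | sed 's|.*/||' | sort
}

# Succeed only if both directories contain exactly the same set of file names.
# $1: outputs of the normal run, $2: outputs of the stub run (assumed layout).
same_output_names() {
    [ "$(list_names "$1")" = "$(list_names "$2")" ]
}
```

A call such as `same_output_names results_script results_stub || echo "stub outputs differ"` would flag a stub that creates too few or too many files.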

@GallVp
Member

GallVp commented Dec 21, 2023

At @sateeshperi's suggestion, I also raised this with nf-test: askimed/nf-test#168

@famosab
Contributor

famosab commented Jun 24, 2024

Can we create a comprehensive list of the modules where the stub is still missing (as was done for other batch changes)?

@GallVp
Member

GallVp commented Jun 25, 2024

Good idea @famosab

As of https://github.com/nf-core/modules/tree/d5b47a24314cab9f64593f29cf97a64b0acc7dce, there are 512 modules without a stub:

mods=($(find ./modules/nf-core -name main.nf))
# Print the process name of every module whose main.nf has no stub block
for file in "${mods[@]}"; do grep -q 'stub:' "$file" || sed -n 's/process \(.*\) {/- \1/p' "$file"; done | sort -V
  • ABACAS
  • ABRICATE_SUMMARY
  • ADAPTERREMOVAL
  • ADAPTERREMOVALFIXPREFIX
  • AFFY_JUSTRMA
  • AMPS
  • ANGSD_CONTAMINATION
  • ANGSD_DOCOUNTS
  • ANOTA2SEQ_ANOTA2SEQRUN
  • ARCASHLA_EXTRACT
  • ARIBA_GETREF
  • ARIBA_RUN
  • ARTIC_GUPPYPLEX
  • ARTIC_MINION
  • ASHLAR
  • ASSEMBLYSCAN
  • ATAQV_ATAQV
  • ATAQV_MKARV
  • ATLAS_CALL
  • ATLAS_PMD
  • ATLAS_RECAL
  • ATLAS_SPLITMERGE
  • AUTHENTICT_DEAM2CONT
  • BAM2FASTX_BAM2FASTQ
  • BAMTOOLS_CONVERT
  • BAMTOOLS_SPLIT
  • BAMTOOLS_STATS
  • BAMUTIL_TRIMBAM
  • BANDAGE_IMAGE
  • BBMAP_ALIGN
  • BBMAP_BBNORM
  • BBMAP_CLUMPIFY
  • BBMAP_INDEX
  • BBMAP_PILEUP
  • BBMAP_SENDSKETCH
  • BCFTOOLS_CONSENSUS
  • BCFTOOLS_ISEC
  • BCL2FASTQ
  • BCLCONVERT
  • BEAGLE5_BEAGLE
  • BEDTOOLS_CLOSEST
  • BEDTOOLS_COMPLEMENT
  • BEDTOOLS_COVERAGE
  • BEDTOOLS_MASKFASTA
  • BIOAWK
  • BIOBAMBAM_BAMMARKDUPLICATES2
  • BIOBAMBAM_BAMMERGE
  • BIOBAMBAM_BAMSORMADUP
  • BIOHANSEL
  • BISCUIT_ALIGN
  • BISCUIT_EPIREAD
  • BISCUIT_INDEX
  • BISCUIT_PILEUP
  • BISCUIT_QC
  • BISCUIT_VCF2BED
  • BISMARK_ALIGN
  • BISMARK_COVERAGE2CYTOSINE
  • BISMARK_DEDUPLICATE
  • BISMARK_GENOMEPREPARATION
  • BISMARK_METHYLATIONEXTRACTOR
  • BISMARK_REPORT
  • BISMARK_SUMMARY
  • BOWTIE_ALIGN
  • BOWTIE_BUILD
  • BWAMETH_ALIGN
  • BWAMETH_INDEX
  • BWA_ALN
  • BWA_SAMPE
  • BWA_SAMSE
  • CALDER2
  • CANU
  • CDHIT_CDHIT
  • CELLRANGERARC_MKGTF
  • CELLRANGERARC_MKREF
  • CELLRANGERATAC_MKREF
  • CHECKM_LINEAGEWF
  • CHECKM_QA
  • CHOPPER
  • CLIPPY
  • CLONALFRAMEML
  • CMSEQ_POLYMUT
  • CNVKIT_ACCESS
  • CNVKIT_ANTITARGET
  • CNVKIT_BATCH
  • CNVKIT_CALL
  • CNVKIT_EXPORT
  • CNVKIT_GENEMETRICS
  • CNVKIT_REFERENCE
  • CNVKIT_TARGET
  • CONCOCT_CONCOCT
  • CONCOCT_EXTRACTFASTABINS
  • CONCOCT_MERGECUTUPCLUSTERING
  • COOLER_BALANCE
  • COOLER_DIGEST
  • COOLER_DUMP
  • COOLER_MAKEBINS
  • CRISPRCLEANR_NORMALIZE
  • CRUMBLE
  • CUSTOM_DUMPSOFTWAREVERSIONS
  • CUSTOM_MATRIXFILTER
  • CUSTOM_SRATOOLSNCBISETTINGS
  • CUSTOM_TABULARTOGSEACLS
  • CUSTOM_TABULARTOGSEAGCT
  • CUTESV
  • DAMAGEPROFILER
  • DASTOOL_DASTOOL
  • DASTOOL_FASTATOCONTIG2BIN
  • DASTOOL_SCAFFOLDS2BIN
  • DECOUPLER
  • DEDUP
  • DEEPBGC_DOWNLOAD
  • DEEPCELL_MESMER
  • DEEPTOOLS_BAMCOVERAGE
  • DEEPTOOLS_COMPUTEMATRIX
  • DEEPTOOLS_MULTIBAMSUMMARY
  • DEEPTOOLS_PLOTCORRELATION
  • DEEPTOOLS_PLOTFINGERPRINT
  • DEEPTOOLS_PLOTHEATMAP
  • DEEPTOOLS_PLOTPCA
  • DEEPTOOLS_PLOTPROFILE
  • DRAGONFLYE
  • DSHBIO_EXPORTSEGMENTS
  • DSHBIO_FILTERBED
  • DSHBIO_FILTERGFF3
  • DSHBIO_SPLITBED
  • DSHBIO_SPLITGFF3
  • DUPHOLD
  • ECTYPER
  • EIDO_CONVERT
  • EIDO_VALIDATE
  • EIGENSTRATDATABASETOOLS_EIGENSTRATSNPCOVERAGE
  • ELPREP_FILTER
  • ELPREP_MERGE
  • ELPREP_SPLIT
  • EMBOSS_SEQRET
  • EMMTYPER
  • ENDORSPY
  • ENTREZDIRECT_ESEARCH
  • ENTREZDIRECT_ESUMMARY
  • ENTREZDIRECT_XTRACT
  • EPANG_PLACE
  • EPANG_SPLIT
  • EXPANSIONHUNTERDENOVO_MERGE
  • EXPANSIONHUNTERDENOVO_PROFILE
  • FARGENE
  • FASTANI
  • FASTAWINDOWS
  • FASTK_HISTEX
  • FASTK_MERGE
  • FASTQSCAN
  • FASTTREE
  • FFQ
  • FGBIO_CALLDUPLEXCONSENSUSREADS
  • FGBIO_CALLMOLECULARCONSENSUSREADS
  • FGBIO_FILTERCONSENSUSREADS
  • FGBIO_GROUPREADSBYUMI
  • FGBIO_SORTBAM
  • FGBIO_ZIPPERBAMS
  • FILTLONG
  • FQTK
  • FQ_GENERATE
  • FQ_LINT
  • FREEBAYES
  • GAMMA_GAMMA
  • GANGSTR
  • GAPPA_EXAMINEASSIGN
  • GAPPA_EXAMINEGRAFT
  • GAPPA_EXAMINEHEATTREE
  • GATK4SPARK_BASERECALIBRATOR
  • GATK4_APPLYBQSR
  • GATK4_CNNSCOREVARIANTS
  • GATK4_COLLECTSVEVIDENCE
  • GATK4_COMBINEGVCFS
  • GATK4_CONDENSEDEPTHEVIDENCE
  • GATK4_CREATESOMATICPANELOFNORMALS
  • GATK4_ESTIMATELIBRARYCOMPLEXITY
  • GATK4_FASTQTOSAM
  • GATK4_FILTERVARIANTTRANCHES
  • GATK4_GATHERBQSRREPORTS
  • GATK4_GATHERPILEUPSUMMARIES
  • GATK4_INDEXFEATUREFILE
  • GATK4_INTERVALLISTTOBED
  • GATK4_LEARNREADORIENTATIONMODEL
  • GATK4_PRINTSVEVIDENCE
  • GATK4_SITEDEPTHTOBAF
  • GATK4_SPLITCRAM
  • GATK4_SPLITNCIGARREADS
  • GATK4_SVANNOTATE
  • GATK4_SVCLUSTER
  • GATK_INDELREALIGNER
  • GATK_REALIGNERTARGETCREATOR
  • GATK_UNIFIEDGENOTYPER
  • GENESCOPEFK
  • GENOTYPHI_PARSE
  • GENRICH
  • GEOQUERY_GETGEO
  • GFAFFIX
  • GGET_GGET
  • GOAT_TAXONSEARCH
  • GPROFILER2_GOST
  • GRAPHMAP2_ALIGN
  • GRAPHMAP2_INDEX
  • GRAPHTYPER_GENOTYPE
  • GRAPHTYPER_VCFCONCATENATE
  • GSEA_GSEA
  • GSTAMA_COLLAPSE
  • GSTAMA_MERGE
  • GSTAMA_POLYACLEANUP
  • GT_GFF3
  • GUBBINS
  • GUNC_DOWNLOADDB
  • GUNC_MERGECHECKM
  • GUNC_RUN
  • GVCFTOOLS_EXTRACTVARIANTS
  • HAPIBD
  • HICAP
  • HICEXPLORER_HICPCA
  • HLALA_PREPAREGRAPH
  • HMMCOPY_GCCOUNTER
  • HMMCOPY_GENERATEMAP
  • HMMCOPY_MAPCOUNTER
  • HMMCOPY_READCOUNTER
  • HMMER_ESLALIMASK
  • HMMER_ESLREFORMAT
  • HMMER_HMMALIGN
  • HMMER_HMMBUILD
  • HOMER_ANNOTATEPEAKS
  • HOMER_MAKETAGDIRECTORY
  • HOMER_MAKEUCSCFILE
  • HOMER_POS2BED
  • HPSUISSERO
  • ICHORCNA_CREATEPON
  • ICHORCNA_RUN
  • ICOUNTMINI_SEGMENT
  • IDR
  • IGV_JS
  • ISLANDPATH
  • ISMAPPER
  • IVAR_CONSENSUS
  • IVAR_TRIM
  • IVAR_VARIANTS
  • JUPYTERNOTEBOOK
  • KAT_HIST
  • KHMER_NORMALIZEBYMEDIAN
  • KHMER_UNIQUEKMERS
  • KLEBORATE
  • KOFAMSCAN
  • KRAKENTOOLS_COMBINEKREPORTS
  • KRAKENTOOLS_EXTRACTKRAKENREADS
  • KRAKENTOOLS_KREPORT2KRONA
  • KRAKENUNIQ_BUILD
  • KRAKENUNIQ_DOWNLOAD
  • KRONA_KRONADB
  • KRONA_KTIMPORTTAXONOMY
  • KRONA_KTIMPORTTEXT
  • KRONA_KTUPDATETAXONOMY
  • LEEHOM
  • LEGSTA
  • LIMMA_DIFFERENTIAL
  • LISSERO
  • LOFREQ_CALL
  • LOFREQ_CALLPARALLEL
  • LOFREQ_FILTER
  • LOFREQ_INDELQUAL
  • MACREL_CONTIGS
  • MACS2_CALLPEAK
  • MALTEXTRACT
  • MAPAD_INDEX
  • MAPAD_MAP
  • MAPDAMAGE2
  • MASHTREE
  • MASH_DIST
  • MAXBIN2
  • MAXQUANT_LFQ
  • MCRONI
  • MEDAKA
  • MEGAHIT
  • MEGAN_DAA2INFO
  • MEGAN_RMA2INFO
  • MENINGOTYPE
  • MERQURYFK_KATCOMP
  • MERQURYFK_KATGC
  • MERQURYFK_MERQURYFK
  • MERQURYFK_PLOIDYPLOT
  • METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS
  • METABAT2_METABAT2
  • METAPHLAN3_MERGEMETAPHLANTABLES
  • METAPHLAN3_METAPHLAN3
  • METAPHLAN_MERGEMETAPHLANTABLES
  • METAPHLAN_METAPHLAN
  • METHYLDACKEL_EXTRACT
  • METHYLDACKEL_MBIAS
  • MINDAGAP_DUPLICATEFINDER
  • MINIA
  • MLST
  • MOBSUITE_RECON
  • MOTUS_DOWNLOADDB
  • MOTUS_MERGE
  • MOTUS_PROFILE
  • MSISENSOR2_MSI
  • MSISENSOR2_SCAN
  • MSISENSORPRO_MSISOMATIC
  • MSISENSORPRO_SCAN
  • MSISENSOR_MSI
  • MSISENSOR_SCAN
  • MTNUCRATIO
  • MUSCLE
  • MYGENE
  • MYKROBE_PREDICT
  • NANOLYSE
  • NANOMONSV_PARSE
  • NCBIGENOMEDOWNLOAD
  • NEXTCLADE_DATASETGET
  • NEXTCLADE_RUN
  • NEXTGENMAP
  • NGMASTER
  • NUCMER
  • OATK
  • ODGI_BUILD
  • ODGI_DRAW
  • ODGI_LAYOUT
  • ODGI_SORT
  • ODGI_SQUEEZE
  • ODGI_STATS
  • ODGI_UNCHOP
  • ODGI_VIEW
  • ODGI_VIZ
  • ONCOCNV
  • OPTITYPE
  • PAIRIX
  • PAIRTOOLS_DEDUP
  • PAIRTOOLS_FLIP
  • PAIRTOOLS_MERGE
  • PAIRTOOLS_PARSE
  • PAIRTOOLS_RESTRICT
  • PAIRTOOLS_SELECT
  • PAIRTOOLS_SORT
  • PAIRTOOLS_STATS
  • PANAROO_RUN
  • PANGOLIN
  • PARACLU
  • PASTY
  • PBBAM_PBMERGE
  • PBPTYPER
  • PEAR
  • PHANTOMPEAKQUALTOOLS
  • PICARD_CREATESEQUENCEDICTIONARY
  • PICARD_FASTQTOSAM
  • PICARD_MERGESAMFILES
  • PICARD_SORTSAM
  • PILON
  • PINDEL_PINDEL
  • PINTS_CALLER
  • PIRATE
  • PLASMIDFINDER
  • PLASMIDID
  • PLATYPUS
  • PLINK2_EXTRACT
  • PLINK2_SCORE
  • PLINK2_VCF
  • PLINK_BCF
  • PLINK_EXCLUDE
  • PLINK_EXTRACT
  • PLINK_INDEP
  • PLINK_INDEPPAIRWISE
  • PLINK_RECODE
  • PLINK_VCF
  • PMDTOOLS_FILTER
  • PORECHOP_ABI
  • PRESEQ_CCURVE
  • PRETEXTSNAPSHOT
  • PRINSEQPLUSPLUS
  • PROKKA
  • PROPR_GREA
  • PROPR_LOGRATIO
  • PROPR_PROPD
  • PROPR_PROPR
  • PROTEUS_READPROTEINGROUPS
  • PURGEDUPS_CALCUTS
  • PURGEDUPS_GETSEQS
  • PURGEDUPS_PBCSTAT
  • PURGEDUPS_PURGEDUPS
  • PURGEDUPS_SPLITFA
  • PYCOQC
  • PYDAMAGE_ANALYZE
  • PYDAMAGE_FILTER
  • QCAT
  • QUALIMAP_BAMQCCRAM
  • QUILT_QUILT
  • RACON
  • RAPIDNJ
  • RASUSA
  • RAVEN
  • RAXMLNG
  • RMARKDOWNNOTEBOOK
  • ROARY
  • RSEM_CALCULATEEXPRESSION
  • RSEM_PREPAREREFERENCE
  • RSEQC_BAMSTAT
  • RSEQC_INFEREXPERIMENT
  • RSEQC_INNERDISTANCE
  • RSEQC_JUNCTIONANNOTATION
  • RSEQC_JUNCTIONSATURATION
  • RSEQC_READDISTRIBUTION
  • RSEQC_READDUPLICATION
  • RSEQC_TIN
  • SALSA2
  • SAM2LCA_ANALYZE
  • SAMBAMBA_FLAGSTAT
  • SAMBAMBA_MARKDUP
  • SAMTOOLS_AMPLICONCLIP
  • SAMTOOLS_BAM2FQ
  • SAMTOOLS_COLLATE
  • SAMTOOLS_COLLATEFASTQ
  • SAMTOOLS_DEPTH
  • SAMTOOLS_FASTA
  • SAMTOOLS_FASTQ
  • SAMTOOLS_FIXMATE
  • SAMTOOLS_MPILEUP
  • SCIMAP_MCMICRO
  • SCOARY
  • SCRAMBLE_CLUSTERANALYSIS
  • SCRAMBLE_CLUSTERIDENTIFIER
  • SEACR_CALLPEAK
  • SEMIBIN_SINGLEEASYBIN
  • SEQKIT_FX2TAB
  • SEQKIT_SPLIT2
  • SEQKIT_STATS
  • SEQKIT_TAB2FX
  • SEQSERO2
  • SEQTK_TRIM
  • SEQUENCETOOLS_PILEUPCALLER
  • SEQUENZAUTILS_BAM2SEQZ
  • SEQUENZAUTILS_GCWIGGLE
  • SEQWISH_INDUCE
  • SEROBA_RUN
  • SEXDETERRMINE
  • SGDEMUX
  • SHASTA
  • SHASUM
  • SHIGATYPER
  • SHIGEIFINDER
  • SHINYNGS_STATICDIFFERENTIAL
  • SHINYNGS_VALIDATEFOMCOMPONENTS
  • SHOVILL
  • SICKLE
  • SISTR
  • SLIMFASTQ
  • SMOOTHXG
  • SNAPALIGNER_ALIGN
  • SNIFFLES
  • SNIPPY_CORE
  • SNIPPY_RUN
  • SNPDISTS
  • SNPSIFT_SPLIT
  • SNPSITES
  • SOMALIER_ANCESTRY
  • SOURMASH_COMPARE
  • SOURMASH_GATHER
  • SOURMASH_SKETCH
  • SOURMASH_TAXANNOTATE
  • SPACERANGER_MKGTF
  • SPACERANGER_MKREF
  • SPATYPER
  • SPRING_COMPRESS
  • SPRING_DECOMPRESS
  • SRST2_SRST2
  • SSUISSERO
  • STADENIOLIB_SCRAMBLE
  • STAPHOPIASCCMEC
  • STECFINDER
  • SUBREAD_FEATURECOUNTS
  • SVTK_BAFTEST
  • SVTK_COUNTSVTYPES
  • SVTK_RDTEST2VCF
  • SVTK_STANDARDIZE
  • SVTK_VCF2BED
  • TAILFINDR
  • TBPROFILER_PROFILE
  • TOPAS_GENCONS
  • TRANSDECODER_LONGORF
  • TRANSDECODER_PREDICT
  • TRIMGALORE
  • TRIMMOMATIC
  • UCSC_BEDCLIP
  • UCSC_LIFTOVER
  • ULTRAPLEX
  • ULTRA_ALIGN
  • ULTRA_INDEX
  • ULTRA_PIPELINE
  • UMITOOLS_EXTRACT
  • UNZIP
  • UNZIPFILES
  • VARIANTBAM
  • VARLOCIRAPTOR_CALLVARIANTS
  • VARLOCIRAPTOR_ESTIMATEALIGNMENTPROPERTIES
  • VARLOCIRAPTOR_PREPROCESS
  • VCF2MAF
  • VCFLIB_VCFFILTER
  • VCFLIB_VCFUNIQ
  • VERIFYBAMID_VERIFYBAMID
  • VERIFYBAMID_VERIFYBAMID2
  • VG_DECONSTRUCT
  • VSEARCH_CLUSTER
  • VSEARCH_SINTAX
  • VSEARCH_USEARCHGLOBAL
  • WFMASH
  • WGSIM
  • WHAMG
  • YARA_INDEX
  • YARA_MAPPER
  • ZIP

@maxulysse
Member

I've been updating stubs for the modules used in rnaseq (see nf-core/rnaseq#1335),
and I noticed several issues in the modules I worked on:

  • stubs do not exist (which is what this issue is about)
  • stubs do not work (and fail when used)
  • stubs are badly implemented (they create arbitrary files that do not reflect the files the module should create)
  • stubs do not reflect the module's actual functionality and create too few or too many files (for example, optional files)
  • stubs are not tested
  • stubs are not tested properly (I think every stub test should produce the same number of files, with the same filenames, as the test it replicates)

If this is not done properly, then subworkflow stubs do not work properly either, and pipeline stubs are just a pipe dream.

This led to the creation of this nf-test issue: askimed/nf-test#227
It also led me to reflect on what we actually want to test with stubs.

Do we just want to test the names of the files created, and thus the channel logic when chaining modules?
If so, do we really need to print out the version, and create a conda env or pull a Docker/Singularity container, just to touch files?
(And do we really need to test stubs in conda, Docker, and Singularity?)

Or do we also use stubs to check that the virtual envs/containers are working?
(But for me that should be a separate test.)


4 participants