The "-h" and "--help" options produce a strange error. If they aren't implemented, shouldn't the tool complain that no such option exists, rather than print the following?
The index directory -h does not seem to exist
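One way to avoid this class of error is to look for help flags before validating any other argument. The following is a minimal, parser-agnostic sketch in Python, not pufferfish's actual code: the function name `handle_align_args` and its return strings are hypothetical, and only the error wording is taken from the message above.

```python
import os

HELP_FLAGS = {"-h", "--help"}


def handle_align_args(argv):
    """Classify a hypothetical `align` invocation as help, ok, or an error."""
    # Check for help flags first, so "-h" is never validated as an index path.
    if any(arg in HELP_FLAGS for arg in argv):
        return "help"
    # Only now interpret the remaining arguments.
    for flag in ("-i", "--index"):
        if flag in argv:
            pos = argv.index(flag)
            if pos + 1 >= len(argv):
                return "error: missing value for " + flag
            index_dir = argv[pos + 1]
            if not os.path.isdir(index_dir):
                return ("error: The index directory " + index_dir
                        + " does not seem to exist.")
            return "ok"
    return "error: missing required option -i"


if __name__ == "__main__":
    print(handle_align_args(["-h"]))
    print(handle_align_args(["-i", "."]))
```

With this ordering, a help request never reaches the index-directory check, whatever position it occupies on the command line.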
The help message has duplicate entries, for example:
-i, --index <index>
directory where the pufferfish index is stored
The help message has two options for the same flag:
-t, --threads <num threads>
Specify the number of threads (default=8)
...
-t, --type <statType>
statType (options:ctab)
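The two problems above can be told apart mechanically: a flag that appears several times with the same long name is a duplicate, while a flag paired with different long names is a conflict. Here is a small Python sketch of that check. The `OPTIONS` table is a hand-copied subset (only the -i and -t entries) of the help text quoted in this issue, and the `classify` helper is hypothetical.

```python
from collections import defaultdict

# (subcommand, short flag, long flag), copied by hand from the help text
# quoted in this issue; only the -i and -t entries are listed.
OPTIONS = [
    ("validate", "-i", "--index"),
    ("lookup", "-i", "--index"),
    ("align", "-i", "--index"),
    ("align", "-t", "--threads"),
    ("examine", "-i", "--index"),
    ("stat", "-t", "--type"),
    ("stat", "-i", "--index"),
]


def classify(options):
    """Map each short flag to 'conflict', 'duplicate', or 'unique'."""
    long_names = defaultdict(set)  # short flag -> long names it is paired with
    uses = defaultdict(int)        # short flag -> number of entries using it
    for _subcommand, short, long_name in options:
        long_names[short].add(long_name)
        uses[short] += 1
    result = {}
    for short in long_names:
        if len(long_names[short]) > 1:
            result[short] = "conflict"   # same flag, different meanings
        elif uses[short] > 1:
            result[short] = "duplicate"  # same option repeated per subcommand
        else:
            result[short] = "unique"
    return result


if __name__ == "__main__":
    for flag, kind in sorted(classify(OPTIONS).items()):
        print(flag, kind)
```

Grouping the help output by subcommand would make both kinds harmless, since each flag would then appear once under the command it belongs to.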
Can you add the default/current value to each option? For example, I assume that allowSoftclip is off by default (end-to-end alignment), but it would be clearer if the option's help text said so:
--allowSoftclip
Allow soft-clipping at start and end of alignments (default=off)
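As an illustration of the requested behavior, some argument parsers can append defaults to every option automatically. The sketch below uses Python's standard `argparse` with `ArgumentDefaultsHelpFormatter`; it is not how pufferfish is implemented. The threads default of 8 comes from the help text in this issue, the off default for allowSoftclip is the assumption stated above, and the `NUM` metavar is made up for the example.

```python
import argparse


def build_align_parser():
    """Build a tiny stand-in for the align options, with defaults in the help."""
    parser = argparse.ArgumentParser(
        prog="pufferfish align",
        # Appends "(default: ...)" to the help text of every option.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "-t", "--threads", type=int, default=8, metavar="NUM",
        help="Specify the number of threads",
    )
    parser.add_argument(
        "--allowSoftclip", action="store_true",  # assumed off by default
        help="Allow soft-clipping at start and end of alignments",
    )
    return parser


if __name__ == "__main__":
    print(build_align_parser().format_help())
```

Because the default is read from the option definition itself, the help text cannot drift out of sync with the value the program actually uses.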
Full output of the help message:
pufferfish/build/src/pufferfish align -h
Parsing command line failed with exception: The index directory -h does not seem to exist.
SYNOPSIS
pufferfish index -r <ref_file>... -o <output_dir> [--headerSep <sep_strs>] [--keepFixedFasta] [--keepDuplicates] [-d <decoy_list>] [-f <filt_size>] [--tmpdir <twopaco_tmp_dir>] [-k <kmer_length>] [-p <threads>] [-l] [-q] [-s] [-e <extension_size>] [-v]
pufferfish index -r <ref_file>... -o <output_dir> [--headerSep <sep_strs>] [--keepFixedFasta] [--keepDuplicates] [-d <decoy_list>] [-f <filt_size>] [--tmpdir <twopaco_tmp_dir>] [-k <kmer_length>] [-p <threads>] [-l] [-q] [-x <lossy_rate>] [-v]
pufferfish validate -i <index> [-v]
pufferfish lookup -i <index> -r <ref> [-v]
pufferfish align -i <index> --mate1 <mate 1> --mate2 <mate 2> [-b] [--coverageScoreRatio <score ratio>] [-t <num threads>] [-m] (--noOutput | (-o <output file>)) [--allowSoftclip] [--allowOverhangSoftclip] [--maxSpliceGap <max splice gap>] [--maxFragmentLength <max frag length>] [--noOrphans] [--orphanRecovery] [--noDiscordant] [--noDovetail] [-z] [-k|-p] [--verbose] [--fullAlignment] [--heuristicChaining] [--bestStrata] [--genomicReads] [--primaryAlignment] [--filterGenomics <genes names file>] [--filterBestScoreMicrobiome <genes ID file>] [--filterMicrobiome <genes ID file>] [--bt2DefaultThreshold] [--minScoreFraction <minScoreFraction>] [--consensusFraction <consensus fraction>] [--noAlignmentCache] [-v]
pufferfish align -i <index> --read <reads> [-b] [--coverageScoreRatio <score ratio>] [-t <num threads>] [-m] (--noOutput | (-o <output file>)) [--allowSoftclip] [--allowOverhangSoftclip] [--maxSpliceGap <max splice gap>] [--maxFragmentLength <max frag length>] [--noOrphans] [--orphanRecovery] [--noDiscordant] [--noDovetail] [-z] [-k|-p] [--verbose] [--fullAlignment] [--heuristicChaining] [--bestStrata] [--genomicReads] [--primaryAlignment] [--filterGenomics <genes names file>] [--filterBestScoreMicrobiome <genes ID file>] [--filterMicrobiome <genes ID file>] [--bt2DefaultThreshold] [--minScoreFraction <minScoreFraction>] [--consensusFraction <consensus fraction>] [--noAlignmentCache] [-v]
pufferfish examine -i <index> [--dump-fasta <fasta_out>] [--dump-kmer-freq <kmer_freq_out>] [-v]
pufferfish stat [-t <statType>] -i <index> [-v]
pufferfish help [-v]
OPTIONS
-r, --ref <ref_file>
path to the reference fasta file
-o, --output <output_dir>
directory where index is written
--headerSep <sep_strs>
Instead of a space or tab, break the header at the first occurrence of this string, and name the transcript as the token before the first separator (default = space & tab)
--keepFixedFasta
Retain the fixed fasta file (without short transcripts and duplicates, clipped, etc.) generated during indexing
--keepDuplicates
Retain duplicate references in the input
-d, --decoys <decoy_list>
Treat these sequences as decoys that may be sequence-similar to some known indexed reference
-f, --filt-size <filt_size>
filter size to pass to TwoPaCo when building the reference dBG
--tmpdir <twopaco_tmp_dir>
temporary work directory to pass to TwoPaCo when building the reference dBG
-k, --klen <kmer_length>
length of the k-mer with which the dBG was built (default = 31)
-p, --threads <threads>
total number of threads to use for building MPHF (default = 16)
-l, --build-edges
build and record explicit edge table for the contaigs of the ccdBG (default = false)
-q, --build-eqclses
build and record equivalence classes (default = false)
-s, --sparse
use the sparse pufferfish index (less space, but slower lookup)
-e, --extension <extension_size>
length of the extension to store in the sparse index (default = 4)
<lossy_rate>
use the lossy sampling index with a sampling rate of x (less space and fast, but lower sensitivity)
-i, --index <index>
directory where the pufferfish index is stored
-i, --index <index>
directory where the pufferfish index is stored
-r, --ref <ref>
fasta file with reference sequences
-i, --index <index>
Directory where the Pufferfish index is stored
--mate1, -1 <mate 1>
Path to the left end of the read files
--mate2, -2 <mate 2>
Path to the right end of the read files
--read <reads>
Path to single-end read files
-b, --batchOfReads
Is each input a file containing the list of reads? (default=false)
--coverageScoreRatio <score ratio>
Discard mappings with a coverage score < scoreRatio * OPT (default=0.6)
-t, --threads <num threads>
Specify the number of threads (default=8)
-m, --just-mapping
don't attempt alignment validation; just do mapping
--noOutput Run without writing SAM file
-o, --outdir <output file>
Output file where the alignment results will be stored
--allowSoftclip
Allow soft-clipping at start and end of alignments
--allowOverhangSoftclip
Allow soft-clipping part of a read that overhangs the reference (the regular --allowSoftclip flag overrides this one)
--maxSpliceGap <max splice gap>
Specify maximum splice gap that two uni-MEMs should have
--maxFragmentLength <max frag length>
Specify the maximum distance between the last uni-MEM of the left and first uni-MEM of the right end of the read pairs (default:1000)
--noOrphans Write Orphans flag
--orphanRecovery
Recover mappings for the other end of orphans using alignment
--noDiscordant
Write Orphans flag
--noDovetail
Disallow dovetail alignment for paired end reads
-z, --compressedOutput
Compress (gzip) the output file
-k, --krakOut
Write output in the format required for krakMap
-p, --pam Write output in the format required for salmon
--verbose Print out auxilary information to trace program's flow
--fullAlignment
Perform full alignment instead of gapped alignment
--heuristicChaining
Whether or not perform only 2 rounds of chaining
--bestStrata
Keep only the alignments with the best score for each read
--genomicReads
Align genomic dna-seq reads instead of RNA-seq reads
--primaryAlignment
Report at most one alignment per read
--filterGenomics <genes names file>
Path to the file containing gene IDs. Filters alignments to the IDs listed in the file. Used to filter genomic reads while aligning to both genome and transcriptome.A read will be reported with only the valid gene ID alignments and will be discarded if the best alignment is to an invalid IDThe IDs are the same as the IDs in the fasta file provided for the index construction phase
--filterBestScoreMicrobiome <genes ID file>
Path to the file containing gene IDs. Same as option "filterGenomics" except that a read will be discarded if aligned equally best to a valid and invalid gene ID.
--filterMicrobiome <genes ID file>
Path to the file containing gene IDs. Same as option "filterGenomics" except that a read will be discarded if an invalid gene ID is in the list of alignments.
--bt2DefaultThreshold
mimic the default threshold function of Bowtie2 which is t = -0.6 -0.6 * read_len
--minScoreFraction <minScoreFraction>
Discard alignments with alignment score < minScoreFraction * max_alignment_score for that read (default=0.65)
--consensusFraction <consensus fraction>
The fraction of mems, relative to the reference with the maximum number of mems, that a reference must contain in order to move forward with computing an optimal chain score (default=0.65)
--noAlignmentCache
Do not use the alignment cache during the alignment.
-i, --index <index>
pufferfish index directory
--dump-fasta <fasta_out>
dump the reference sequences in the index in the provided fasta file
--dump-kmer-freq <kmer_freq_out>
dump the frequency histogram of k-mers
-t, --type <statType>
statType (options:ctab)
-i, --index <index>
directory where the pufferfish index is stored
-v, --version
The items you point out in this issue definitely require our immediate attention, so that the tool's help is easy for users to interpret. The biggest problem is that the help is generated automatically by the argument parsing library we use, clipp, and its default help messages exhibit the issues you raise regarding duplicate options. We're looking into whether there is a way to fix this within clipp, and may otherwise consider changing the argument parser we use.
That being said, we hope to have this resolved one way or the other in a few days. We are currently merging the cigar-strings branch, where Puffaligner lives, with the develop branch (which we use as an external library for selective alignment in salmon). We anticipate it will take a few days to test and confirm that performance and accuracy are not changed by either specific optimizations in the cigar-strings branch or merge conflicts. After that, we will continue resolving these issues on the (merged) develop branch. We will ping back here when the develop branch is updated with the improved help messages and options.
I would also suggest that a command such as pufferfish index --help show only the index help and its relevant options, as other command-line programs already do for subcommand help. This would probably also reduce the number of duplicate options seen in the help.
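To make the suggestion concrete, here is a minimal sketch of per-subcommand help using Python's standard `argparse` subparsers. It only illustrates the desired behavior and says nothing about how clipp or pufferfish would implement it. The option names and help strings are a small subset copied from the help output in this issue, and the `build_parser` helper is hypothetical.

```python
import argparse


def build_parser():
    """Return the top-level parser and a dict of per-subcommand parsers."""
    parser = argparse.ArgumentParser(prog="pufferfish")
    sub = parser.add_subparsers(dest="command")
    commands = {}

    index = sub.add_parser("index")
    index.add_argument("-r", "--ref", help="path to the reference fasta file")
    index.add_argument("-k", "--klen", type=int, default=31,
                       help="length of the k-mer with which the dBG was built")
    commands["index"] = index

    align = sub.add_parser("align")
    align.add_argument("-i", "--index",
                       help="directory where the pufferfish index is stored")
    align.add_argument("--mate1", help="Path to the left end of the read files")
    align.add_argument("-t", "--threads", type=int, default=8,
                       help="Specify the number of threads")
    commands["align"] = align

    stat = sub.add_parser("stat")
    # -t can mean something different here without clashing with align's -t.
    stat.add_argument("-t", "--type", help="statType (options:ctab)")
    stat.add_argument("-i", "--index",
                      help="directory where the pufferfish index is stored")
    commands["stat"] = stat

    return parser, commands


if __name__ == "__main__":
    _, commands = build_parser()
    # Equivalent to what `pufferfish index --help` would print in this sketch.
    print(commands["index"].format_help())
```

Each subcommand owns its options, so the index help lists only index options, and -t can map to --threads under align and to --type under stat without appearing twice in one listing.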
The four suggestions in this issue are based on 8c24fb1.