Skip to content

Commit

Permalink
minor help improvements
Browse files Browse the repository at this point in the history
  • Loading branch information
hasindu2008 committed Aug 17, 2024
1 parent d339dcf commit 6237ce7
Show file tree
Hide file tree
Showing 6 changed files with 18 additions and 269 deletions.
2 changes: 1 addition & 1 deletion docs/dna.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# squigulator evaluation - genomic DNA

**This document is based on very early experimentsre conducted using commit [2b9f3e886eea16](https://github.com/hasindu2008/squigulator/commit/2b9f3e886eea1601d9f0b2021bcd303ad48005a8) when the tool was under activate development.r more upto date information, please refer to the [squigulator publication](https://genome.cshlp.org/content/34/5/778.full?sid=cd2c8aec-be46-4c9e-885c-8452ac069f64).**
**This document is based on very early experiments conducted using commit [2b9f3e886eea16](https://github.com/hasindu2008/squigulator/commit/2b9f3e886eea1601d9f0b2021bcd303ad48005a8) when the tool was under activate development. For more up to date information, please refer to the [squigulator publication](https://genome.cshlp.org/content/34/5/778.full?sid=cd2c8aec-be46-4c9e-885c-8452ac069f64).**

For comparing *squigulator* against real data, reads mapping to the chr22 from a NA12878 dataset sequenced on a PromethION sequencer was used. The extracted dataset had ~135,000 reads whose mean read length was ~10,800 bases (will be referred to as *real*). To simulate reads using *squigulator*, first, a heterozygous NA12878 heterozygous chr22 reference was created by applying NA12878 GIAB variants to chr22 in hg38 reference genome. Then, *squigulator* was executed to generate 135,000 reads at mean read length of 10,800 (will be referred to as *simulated*). See [Methods](#Methods) for more information.

Expand Down
6 changes: 4 additions & 2 deletions docs/man.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,14 @@ Advanced options are as below:
- `--dwell-mean FLOAT`: Mean of number of signal samples per k-mer/base. This is usually the sampling rate (4000Hz for DNA and 3000Hz for RNA) divided by translocation speed in bases per second (450 for R9.4.1 pore for DNA and 70 for RNA). [default: 9.0]
- `--dwell-std FLOAT`: Standard deviation of number of signal samples per k-mer/base. Increasing this will increase time-domain noise. Setting this to 0 is same as `--ideal-time`. See example [here](img/dwell.svg). [default: 4.0]
- `--bps INT`: translocation speed in bases per second (incompatible with --dwell-mean).
- `--kmer-model FILE`: custom nucleotide k-mer model file (format similar to [here](https://github.com/hasindu2008/f5c/blob/master/test/r9-models/r9.4_450bps.nucleotide.6mer.template.model))
- `--prefix=yes/no`: generate prefixes such as adaptor (and polya for RNA). [default: no]
- `--seed INT`: seed for random generators (if 0, will be autogenerated). Giving the same seed will produce same results. [default: 0]
- `--paf-ref`: in paf output, use the reference as the target instead of read (needs -c)
- `--cdna`: generate cDNA reads (only valid with dna profiles and the reference must a transcriptome, experimental)
- `--trans-count FILE`: simulate relative abundance using specified 2-column tsv with first column containing transcript name and the second containing the count (only for direct-rna and cDNA, experimental). You may generate this a test fatq dataset using minimap2, for example, `minimap2 -cx map-ont transcripts.fa reads.fastq --secondary=no -t20 -uf | cut -f 6 | sort | uniq -c | awk '{print$2"\t"$1}'`.
- `--trans-count FILE`: simulate relative abundance using specified 2-column tsv with first column containing transcript name and the second containing the count. See the example at [test/sequin_count.tsv]() (only for direct-rna and cDNA, experimental). You may generate this using a dataset using minimap2, for example, `minimap2 -cx map-ont transcripts.fa reads.fastq --secondary=no -t20 -uf | cut -f 6 | sort | uniq -c | awk '{print$2"\t"$1}'`.
- `--trans-trunc=yes/no`: simulate transcript truncation (only for direct-rna, experimental) [default: no]
- `--ont-friendly=yes/no`: generate fake uuid for readids and add a dummy end_reason [default: no]
-- `--meth-freq FILE`: simulate CpG methylation using a frequency tsv file. The tsv file should have three columns, chr, 0-based pos, and methylation frequency. See the example at [test/mfreq.tsv](). (for DNA, experimental)

Developer options (which are not much tested and error handling) are as below:

Expand All @@ -43,3 +43,5 @@ Developer options (which are not much tested and error handling) are as below:
- `--offset-std FLOAT`: ADC offset standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--median-before-mean`: Median before mean (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--median-before-std`: Median before standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--kmer-model FILE`: custom nucleotide k-mer model file (format similar to [f5c models](https://github.com/hasindu2008/f5c/blob/master/test/r9-models/r9.4_450bps.nucleotide.6mer.template.model))
- `--meth-model FILE`: custom methylation k-mer model file (format similar to [f5c models](https://github.com/hasindu2008/f5c/blob/master/test/r9-models/r9.4_450bps.cpg.6mer.template.model))
5 changes: 4 additions & 1 deletion scripts/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -143,8 +143,11 @@ diff -q test/r9_mfreq.exp a.slow5 || die "diff failed"
# f5c index a.fastq --slow5 a.slow5 && f5c call-methylation -r a.fastq -g test/nCoV-2019.reference.fasta -b a.bam --slow5 a.slow5 -o meth.tsv && f5c meth-freq -i meth.tsv -s -o meth-freq.tsv

#meth r10
# ex ./squigulator -x dna-r9-prom -o a.slow5 --seed 1 -t1 -n 2 -r 29000 test/nCoV-2019.reference.fasta --meth-freq test/methfreq.tsv
ex ./squigulator -x dna-r10-prom -o a.slow5 --seed 1 -t1 -n 2 -r 29000 test/nCoV-2019.reference.fasta --meth-freq test/methfreq.tsv
# diff -q test/r10_methfreq.exp a.slow5 || die "diff failed"
# /install/buttery-eel-0.3.1+6.5.7/scripts/eel -i a.slow5 -o a.fastq --config dna_r10.4.1_e8.2_400bps_5khz_sup.cfg -x cuda:all
# minimap2 -ax map-ont /genome/nCoV-2019.reference.fasta a.fastq --secondary=no | samtools sort - -o a.bam && samtools index a.bam
# f5c index a.fastq --slow5 a.slow5 && f5c call-methylation -r a.fastq -g test/nCoV-2019.reference.fasta -b a.bam --slow5 a.slow5 -o meth.tsv && f5c meth-freq -i meth.tsv -s -o meth-freq.tsv

# threads and batch size
redundancy_check () {
Expand Down
258 changes: 0 additions & 258 deletions src/genref.c

This file was deleted.

14 changes: 8 additions & 6 deletions src/sim.c
Original file line number Diff line number Diff line change
Expand Up @@ -743,25 +743,27 @@ static void print_help(FILE *fp_help, opt_t opt, profile_t p, int64_t nreads) {
fprintf(fp_help," --dwell-mean FLOAT mean of number of signal samples per base [%.1f]\n",p.dwell_mean);
fprintf(fp_help," --dwell-std FLOAT standard deviation of number of signal samples per base [%.1f]\n",p.dwell_std);
fprintf(fp_help," --bps INT translocation speed in bases per second (incompatible with --dwell-mean) [%ld]\n",(long)(p.sample_rate/p.dwell_mean));
fprintf(fp_help," --kmer-model FILE custom nucleotide k-mer model file (format similar to https://github.com/hasindu2008/f5c/blob/master/test/r9-models/r9.4_450bps.nucleotide.6mer.template.model)\n");
fprintf(fp_help," --meth-model FILE custom methylation k-mer model file (format similar to https://github.com/hasindu2008/f5c/blob/master/test/r9-models/r9.4_450bps.cpg.6mer.template.model)\n");
fprintf(fp_help," --prefix=yes|no generate prefixes such as adaptor (and polya for RNA) [no]\n");
fprintf(fp_help," --seed INT seed or random generators (if 0, will be autogenerated) [%ld]\n",opt.seed);
fprintf(fp_help," --paf-ref in paf output, use the reference as the target instead of read (needs -c)\n");
fprintf(fp_help," --cdna generate cDNA reads (only valid with dna profiles and the reference must a transcriptome, experimental)\n");
fprintf(fp_help," --trans-count FILE simulate relative abundance using specified tsv with transcript name & count (only for direct-rna and cDNA, experimental)\n");
fprintf(fp_help," --cdna generate cDNA reads (only for dna profiles & the reference must a transcriptome, experimental)\n");
fprintf(fp_help," --trans-count FILE simulate relative abundance using tsv file [transcript name, count] (for direct-rna & cDNA, experimental)\n");
fprintf(fp_help," --trans-trunc=yes/no simulate transcript truncattion (only for direct-rna, experimental) [no]\n");
fprintf(fp_help," --ont-friendly=yes/no generate fake uuid for readids and add a dummy end_reason [no]\n");
fprintf(fp_help," --meth-freq FILE simulate CpG methylation using specified frequency file, tsv file with chr, 0-based pos and freq as the columns (only for DNA, experimental)\n");
fprintf(fp_help," --meth-freq FILE simulate CpG methylation using frequency tsv file [chr, 0-based pos, freq] (for DNA, experimental)\n");

fprintf(fp_help,"\ndeveloper options (not much tested yet):\n");
fprintf(fp_help,"\ndeveloper options (less tested):\n");
fprintf(fp_help," --digitisation FLOAT ADC digitisation [%.1f]\n",p.digitisation);
fprintf(fp_help," --sample-rate FLOAT ADC sampling rate [%.1f]\n",p.sample_rate);
fprintf(fp_help," --range FLOAT ADC range [%.1f]\n",p.range);
fprintf(fp_help," --offset-mean FLOAT ADC offset mean [%.1f]\n",p.offset_mean);
fprintf(fp_help," --offset-std FLOAT ADC offset standard deviation [%.1f]\n",p.offset_std);
fprintf(fp_help," --median-before-mean FLOAT Median before mean [%.1f]\n",p.median_before_mean);
fprintf(fp_help," --median-before-std FLOAT Median before standard deviation [%.1f]\n",p.median_before_std);
fprintf(fp_help," --kmer-model FILE custom nucleotide k-mer model file (format similar to f5c models)\n");
fprintf(fp_help," --meth-model FILE custom methylation k-mer model file (format similar to f5c models)\n");
fprintf(fp_help,"\n");
fprintf(fp_help,"See the manual page on GitHub for more details, options and the format of input/output files.\n");

if(fp_help == stdout){
exit(EXIT_SUCCESS);
Expand Down
2 changes: 1 addition & 1 deletion test/methfreq.tsv
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
MN908947.3 99 1.0
MN908947.3 1184 0.0
MN908947.3 1456 0.5
MN908947.3 20031 0.7
MN908947.3 20031 0.7

0 comments on commit 6237ce7

Please sign in to comment.