Rna004 (#10)
*rna004
hasindu2008 committed Feb 4, 2024
1 parent 198ad51 commit a806f20
Showing 7 changed files with 262,323 additions and 16 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -78,6 +78,8 @@ By default, DNA PromethION reads (R9.4.1) will be simulated. Specify the `-x STR
- `rna-r9-prom`: direct RNA on PromethION R9.4.1 flowcells
- `dna-r10-min`: genomic DNA on MinION R10.4.1 flowcells
- `dna-r10-prom`: genomic DNA on PromethION R10.4.1 flowcells
- `rna004-min`: direct RNA on MinION RNA004 flowcells
- `rna004-prom`: direct RNA on PromethION RNA004 flowcells

If a genomic DNA profile is selected, the input reference must be the **reference genome in *FASTA* format**. *squigulator* will randomly sample the genome from a uniform distribution and generate reads whose lengths are from a gamma distribution (based on `-r`). If a direct RNA profile is selected, the input reference must be the **transcriptome in *FASTA* format**. For RNA, *squigulator* will randomly pick transcripts from a uniform distribution and the whole transcript length is simulated.

5 changes: 3 additions & 2 deletions docs/man.md
@@ -3,7 +3,7 @@
Basic options in *squigulator* are as below:

- `-o FILE`: SLOW5/BLOW5 file to write.
- `-x STR`: Parameter profile (always applied before other options). Available profiles are: *dna-r9-min*, *dna-r9-prom, rna-r9-min*, *rna-r9-prom*, *dna-r10-min*, *dna-r10-prom*. [default: dna-r9-prom]
- `-x STR`: Parameter profile (always applied before other options). Available profiles are: *dna-r9-min*, *dna-r9-prom*, *rna-r9-min*, *rna-r9-prom*, *dna-r10-min*, *dna-r10-prom*, *rna004-min*, *rna004-prom*. [default: dna-r9-prom]
- `-n INT`: Number of reads to simulate. [default: 4000]
- `-r INT`: Mean read length (estimated mean only, unused for RNA) [default: 10000]
- `-f INT`: Fold coverage to simulate (incompatible with `-n`)
@@ -37,4 +37,5 @@ Developer options (which are not much tested and error handling) are as below:
- `--range FLOAT`: ADC range (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--offset-mean FLOAT`: ADC offset mean (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--offset-std FLOAT`: ADC offset standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))

- `--median-before-mean`: Median before mean (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--median-before-std`: Median before standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))
98 changes: 98 additions & 0 deletions docs/profile.md
@@ -0,0 +1,98 @@
# Parameter profiles

Profiles are selected with the `-x` option; each profile sets the parameters listed below.

## R9 profiles

- dna-r9-min: DNA R9.4.1 MinION (or GridION)
- dna-r9-prom: DNA R9.4.1 PromethION (or P2 Solo)
- rna-r9-min: RNA R9.4.1 MinION (or GridION)
- rna-r9-prom: RNA R9.4.1 PromethION (or P2 Solo)

| Parameter            | dna-r9-min   | dna-r9-prom   | rna-r9-min   | rna-r9-prom   |
|----------------------|--------------|---------------|--------------|---------------|
| digitisation         | 8192         | 2048          | 8192         | 2048          |
| sample-rate          | 4000         | 4000          | 3012         | 3000          |
| bps                  | 450          | 450           | 70           | 70            |
| range                | 1443.030273  | 748.5801      | 1126.47      | 548.788269    |
| offset-mean          | 13.7222605   | -237.4102     | 4.65491888   | -231.9440589  |
| offset-std           | 10.25279688  | 14.1575       | 4.115262472  | 12.87185278   |
| median-before-mean   | 200.815801   | 214.2890337   | 242.6584118  | 238.5286796   |
| median-before-std    | 20.48933762  | 18.0127916    | 10.60230888  | 21.1871794    |
| dwell-mean           | 9.0          | 9.0           | 43.0         | 43.0          |
| dwell-std            | 4.0          | 4.0           | 35.0         | 35.0          |


## R10 and RNA004 profiles

- dna-r10-min: DNA R10.4.1 MinION (or GridION)
- dna-r10-prom: DNA R10.4.1 PromethION (or P2 Solo)
- rna004-min: RNA004 MinION (or GridION)
- rna004-prom: RNA004 PromethION (or P2 Solo)

| Parameter           | dna-r10-min   | dna-r10-prom    | rna004-min   | rna004-prom   |
|---------------------|---------------|-----------------|--------------|---------------|
| digitisation | 8192 | 2048 | 2048 | 2048 |
| sample-rate | 4000 | 4000 | 4000 | 4000 |
| bps | 400 | 400 | 130 | 130 |
| range | 1536.598389 | 281.345551 | TBD | 299.432068 |
| offset-mean         | 13.380569389  | -127.5655735    | TBD          | -259.421128   |
| offset-std | 16.311471649 | 19.377283387665 | TBD | 16.010841823643 |
| median-before-mean | 202.154074388 | 189.87607393756 | TBD | 189.87607393756 |
| median-before-std | 13.406139242 | 15.788097978713 | TBD | 15.788097978713 |
| dwell-mean | 10.0 | 10.0 | TBD | 31.0 |
| dwell-std | 4.0 | 4.0 | TBD | 4.0 |

## Determining parameters for a profile

The methods below assume the data are in S/BLOW5 format (convert them using X and Y if needed) and that slow5tools and datamash are installed. They also assume you have a pore model for the chemistry.


- digitisation. This is the [digitisation field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset (in fact, so far MinION/GridION devices have used 8192 and PromethION/P2 devices 2048).

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 3 | tail -n+2 | sort -u
2048
```

- sample-rate. This is the [sampling_rate field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset.

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 6 | tail -n+2 | sort -u
4000
```

- bps. This is the translocation speed in bases per second, which can be found in the name of the relevant Guppy/Dorado model. For example, the Dorado RNA004 model names begin with `rna004_130bps`, so bps is 130.

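When the speed is encoded in a model name, as in the `rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data` identifier in `src/model.c`, it can be extracted programmatically. The helper `bps_from_model_name` is hypothetical and assumes the name contains a number immediately followed by `bps`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the number directly preceding the first "bps"
 * token that has digits in front of it, e.g. 130 for "rna004_130bps_...".
 * Returns -1 if the name contains no "<number>bps" token. */
static long bps_from_model_name(const char *name) {
    assert(name != NULL);
    const char *p = strstr(name, "bps");
    while (p != NULL) {
        const char *start = p;
        /* walk back over the digits immediately before "bps" */
        while (start > name && start[-1] >= '0' && start[-1] <= '9') start--;
        if (start < p) return strtol(start, NULL, 10);
        p = strstr(p + 3, "bps"); /* no digits here; try the next occurrence */
    }
    return -1;
}
```

For example, `bps_from_model_name("rna004_130bps_u_to_t_rna_9mer_template_model")` returns 130, and a name without a `<number>bps` token returns -1.
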
- range. This is the [range field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset.

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 5 | tail -n+2 | sort -u
299.432068
```

- offset-mean and offset-std. These are the mean and standard deviation of the [offset field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary).

Example command to get the two parameters:
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 4 | tail -n+2 | datamash mean 1 sstdev 1
-259.421128 16.010841823643
```

- median-before-mean and median-before-std. These are the mean and standard deviation of the [median_before field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Records where median_before is missing (`.`) are skipped.

Example command to get the two parameters:
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | awk -v c="median_before" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break};} {if ($p!=".") print $p}' | tail -n+2 | datamash mean 1 sstdev 1
205.63935594369 8.3994882799157
```

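The awk one-liner locates the `median_before` column by name in the header and skips records whose value is `.`, which marks a missing field in S/BLOW5. A C sketch of the same two steps follows; `column_index` and `column_value` are hypothetical helpers that operate on one tab-separated line at a time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 0-based index of column `name` in a tab-separated header line, or -1 */
static int column_index(const char *header, const char *name) {
    assert(header != NULL && name != NULL);
    const size_t len = strlen(name);
    const char *p = header;
    for (int idx = 0;; idx++) {
        const char *tab = strchr(p, '\t');
        /* the last field ends at the line terminator rather than a tab */
        size_t field_len = tab ? (size_t)(tab - p) : strcspn(p, "\r\n");
        if (field_len == len && strncmp(p, name, len) == 0) return idx;
        if (tab == NULL) return -1;
        p = tab + 1;
    }
}

/* Parse field `idx` of a tab-separated record as a double. Returns 1 on
 * success, 0 if the field is ".", empty, or the record is too short. */
static int column_value(const char *record, int idx, double *out) {
    assert(record != NULL && idx >= 0 && out != NULL);
    const char *p = record;
    for (int i = 0; i < idx; i++) {
        p = strchr(p, '\t');
        if (p == NULL) return 0;
        p++;
    }
    if (*p == '\t' || *p == '\0') return 0; /* empty field */
    if (p[0] == '.' && (p[1] == '\t' || p[1] == '\0' || p[1] == '\r' || p[1] == '\n'))
        return 0; /* missing value */
    char *end = NULL;
    *out = strtod(p, &end);
    return end != p; /* 0 if the field did not start with a number */
}
```

Feeding every value for which `column_value` returns 1 into a mean/standard-deviation routine reproduces the `datamash` step of the pipeline.
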
- dwell-mean. This should be approximately sample-rate/bps (e.g. 4000/130 ≈ 30.8, given as 31.0 for rna004-prom); currently it acts only as a sanity check.

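The sanity check amounts to comparing dwell-mean with sample-rate/bps. In the sketch below the function names and the tolerance of 0.5 samples used in the usage note are assumptions; with that tolerance every complete profile in the tables above passes (for instance 4000/450 ≈ 8.89 against 9.0, and 4000/130 ≈ 30.77 against 31.0).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Expected mean dwell in samples per base: sample-rate / bps */
static double expected_dwell_mean(double sample_rate, double bps) {
    assert(bps > 0.0);
    return sample_rate / bps;
}

/* true if a profile's dwell-mean is within tol samples of sample-rate / bps */
static bool dwell_mean_consistent(double dwell_mean, double sample_rate,
                                  double bps, double tol) {
    return fabs(dwell_mean - expected_dwell_mean(sample_rate, bps)) <= tol;
}
```

For example, `dwell_mean_consistent(31.0, 4000.0, 130.0, 0.5)` is true, while pairing a dwell-mean of 10.0 with 130 bps fails the check.
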
- dwell-std.
8 changes: 6 additions & 2 deletions src/model.c
@@ -169,8 +169,12 @@ uint32_t set_model(model_t* model, uint32_t model_id) {
 num_kmer=262144;
 inbuilt_model=r10_4_nucleotide_9mer_template_model_builtin_data;
 assert(num_kmer == (uint32_t)(1 << 2*kmer_size)); //num_kmer should be 4^kmer_size
-}
-else{
+} else if(model_id==MODEL_ID_RNA_RNA004_NUCLEOTIDE){
+kmer_size=9;
+num_kmer=262144;
+inbuilt_model=rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data;
+assert(num_kmer == (uint32_t)(1 << 2*kmer_size)); //num_kmer should be 4^kmer_size
+} else{
 assert(0);
 }

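The new branch keeps the existing check that `num_kmer` equals 4 to the power `kmer_size` (262144 for 9-mers), which holds when each base is packed into 2 bits. A C sketch of such a k-mer-to-row mapping follows; the function name and the A=0, C=1, G=2, T=3 ordering are assumptions rather than something this commit shows, and since the built-in RNA004 model name contains `u_to_t`, which suggests U is written as T, the sketch accepts only ACGT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: map a k-mer to its row in a 4^k-entry pore model table,
 * assuming lexicographic order with A=0, C=1, G=2, T=3 (2 bits per base).
 * Returns UINT32_MAX for characters outside ACGT. kmer_size <= 15 keeps every
 * valid rank below 2^30, so it can never collide with the error value. */
static uint32_t kmer_rank(const char *kmer, uint32_t kmer_size) {
    assert(kmer != NULL && kmer_size <= 15 && strlen(kmer) >= kmer_size);
    uint32_t rank = 0;
    for (uint32_t i = 0; i < kmer_size; i++) {
        uint32_t code;
        switch (kmer[i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return UINT32_MAX; /* e.g. 'U' or 'N' */
        }
        rank = (rank << 2) | code; /* append 2 bits per base */
    }
    return rank;
}
```

Under this ordering `kmer_rank("TTTTTTTTT", 9)` returns 262143, the last row of a 9-mer table, matching `num_kmer - 1`.
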