Rna004 #10

Merged (7 commits) on Feb 4, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -78,6 +78,8 @@ By default, DNA PromethION reads (R9.4.1) will be simulated. Specify the `-x STR
- `rna-r9-prom`: direct RNA on PromethION R9.4.1 flowcells
- `dna-r10-min`: genomic DNA on MinION R10.4.1 flowcells
- `dna-r10-prom`: genomic DNA on PromethION R10.4.1 flowcells
- `rna004-min`: direct RNA on MinION RNA004 flowcells
- `rna004-prom`: direct RNA on PromethION RNA004 flowcells

If a genomic DNA profile is selected, the input reference must be the **reference genome in *FASTA* format**. *squigulator* will randomly sample positions in the genome from a uniform distribution and generate reads whose lengths follow a gamma distribution (with the mean set by `-r`). If a direct RNA profile is selected, the input reference must be the **transcriptome in *FASTA* format**. For RNA, *squigulator* will randomly pick transcripts from a uniform distribution and simulate each picked transcript over its whole length.

5 changes: 3 additions & 2 deletions docs/man.md
@@ -3,7 +3,7 @@
Basic options in *squigulator* are as below:

- `-o FILE`: SLOW5/BLOW5 file to write.
- `-x STR`: Parameter profile (always applied before other options). Available profiles are: *dna-r9-min*, *dna-r9-prom, rna-r9-min*, *rna-r9-prom*, *dna-r10-min*, *dna-r10-prom*. [default: dna-r9-prom]
- `-x STR`: Parameter profile (always applied before other options). Available profiles are: *dna-r9-min*, *dna-r9-prom*, *rna-r9-min*, *rna-r9-prom*, *dna-r10-min*, *dna-r10-prom*, *rna004-min*, *rna004-prom*. [default: dna-r9-prom]
- `-n INT`: Number of reads to simulate. [default: 4000]
- `-r INT`: Mean read length (estimated mean only, unused for RNA) [default: 10000]
- `-f INT`: Fold coverage to simulate (incompatible with `-n`)
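Because the profile selected with `-x` is always applied before other options, an explicitly given option overrides the profile value no matter where it appears on the command line. A minimal sketch of that ordering, assuming a hypothetical two-pass parser that handles only `-x`, `--range` and `--offset-mean` (each taking one value). `params_t`, `PROFILES`, `load_profile` and `parse_args` are illustrative names, not *squigulator*'s actual code; the profile values are copied from docs/profile.md.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the simulation parameters. */
typedef struct {
    double range;
    double offset_mean;
} params_t;

/* Hypothetical profile table; values taken from docs/profile.md. */
static const struct {
    const char *name;
    params_t p;
} PROFILES[] = {
    {"dna-r9-prom", {748.5801, -237.4102}},
    {"dna-r10-prom", {281.345551, -127.5655735}},
    {"rna004-prom", {299.432068, -259.421128}},
};

static int load_profile(const char *name, params_t *out) {
    for (size_t i = 0; i < sizeof PROFILES / sizeof PROFILES[0]; i++) {
        if (strcmp(PROFILES[i].name, name) == 0) {
            *out = PROFILES[i].p;
            return 0;
        }
    }
    return -1; /* unknown profile */
}

/* Pass 1 applies the profile named by -x (default dna-r9-prom); pass 2 applies
   every other option on top of it, so option order does not matter. */
static int parse_args(int argc, char **argv, params_t *out) {
    assert(argv != NULL && out != NULL);
    const char *profile = "dna-r9-prom";
    for (int i = 1; i + 1 < argc; i += 2)
        if (strcmp(argv[i], "-x") == 0) profile = argv[i + 1];
    if (load_profile(profile, out) != 0) return -1;

    for (int i = 1; i + 1 < argc; i += 2) {
        const char *opt = argv[i], *val = argv[i + 1];
        if (strcmp(opt, "-x") == 0) continue; /* already applied in pass 1 */
        if (strcmp(opt, "--range") == 0) out->range = strtod(val, NULL);
        else if (strcmp(opt, "--offset-mean") == 0) out->offset_mean = strtod(val, NULL);
        else return -1; /* option not handled by this sketch */
    }
    return 0;
}
```

With this ordering, `--range 500 -x rna004-prom` and `-x rna004-prom --range 500` both yield a range of 500 and the rna004-prom offset mean.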
@@ -37,4 +37,5 @@ Developer options (which are not well tested and have limited error handling) are as below:
- `--range FLOAT`: ADC range (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--offset-mean FLOAT`: ADC offset mean (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--offset-std FLOAT`: ADC offset standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))

- `--median-before-mean`: Median before mean (see [here](https://hasindu2008.github.io/slow5specs/summary))
- `--median-before-std`: Median before standard deviation (see [here](https://hasindu2008.github.io/slow5specs/summary))
98 changes: 98 additions & 0 deletions docs/profile.md
@@ -0,0 +1,98 @@
# Parameter profiles

The `-x` option selects one of the profiles below; each profile sets the parameters listed in its table.

## R9 profiles

- dna-r9-min: DNA R9.4.1 MinION (or GridION)
- dna-r9-prom: DNA R9.4.1 PromethION (or P2 Solo)
- rna-r9-min: RNA R9.4.1 MinION (or GridION)
- rna-r9-prom: RNA R9.4.1 PromethION (or P2 Solo)

| Parameter          | dna-r9-min  | dna-r9-prom | rna-r9-min  | rna-r9-prom  |
|--------------------|-------------|-------------|-------------|--------------|
| digitisation       | 8192        | 2048        | 8192        | 2048         |
| sample-rate        | 4000        | 4000        | 3012        | 3000         |
| bps                | 450         | 450         | 70          | 70           |
| range              | 1443.030273 | 748.5801    | 1126.47     | 548.788269   |
| offset-mean        | 13.7222605  | -237.4102   | 4.65491888  | -231.9440589 |
| offset-std         | 10.25279688 | 14.1575     | 4.115262472 | 12.87185278  |
| median-before-mean | 200.815801  | 214.2890337 | 242.6584118 | 238.5286796  |
| median-before-std  | 20.48933762 | 18.0127916  | 10.60230888 | 21.1871794   |
| dwell-mean         | 9.0         | 9.0         | 43.0        | 43.0         |
| dwell-std          | 4.0         | 4.0         | 35.0        | 35.0         |


## R10 and RNA004 profiles

- dna-r10-min: DNA R10.4.1 MinION (or GridION)
- dna-r10-prom: DNA R10.4.1 PromethION (or P2 Solo)
- rna004-min: RNA004 MinION (or GridION)
- rna004-prom: RNA004 PromethION (or P2 Solo)

| Parameter          | dna-r10-min   | dna-r10-prom    | rna004-min | rna004-prom     |
|--------------------|---------------|-----------------|------------|-----------------|
| digitisation       | 8192          | 2048            | 2048       | 2048            |
| sample-rate        | 4000          | 4000            | 4000       | 4000            |
| bps                | 400           | 400             | 130        | 130             |
| range              | 1536.598389   | 281.345551      | TBD        | 299.432068      |
| offset-mean        | 13.380569389  | -127.5655735    | TBD        | -259.421128     |
| offset-std         | 16.311471649  | 19.377283387665 | TBD        | 16.010841823643 |
| median-before-mean | 202.154074388 | 189.87607393756 | TBD        | 189.87607393756 |
| median-before-std  | 13.406139242  | 15.788097978713 | TBD        | 15.788097978713 |
| dwell-mean         | 10.0          | 10.0            | TBD        | 31.0            |
| dwell-std          | 4.0           | 4.0             | TBD        | 4.0             |

## Determining parameters for a profile

The methods below assume that the data are in S/BLOW5 format (convert using X and Y if they are not), that slow5tools and datamash are installed, and that you have a pore model.


- digitisation. This is the [digitisation field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset (so far, MinION/GridION data have had a digitisation of 8192 and PromethION/P2 data 2048).

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 3 | tail -n+2 | sort -u
2048
```
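The `cut -f N | tail -n+2 | sort -u` pipeline above checks whether one column holds a single value across all reads. A minimal C sketch of the same check, assuming tab-separated lines with the header line first (as in the `slow5tools skim` output piped above). `get_field` and `column_is_constant` are hypothetical helpers, and values longer than 63 characters are treated as a failure for simplicity. The same check applies to column 6 (sample-rate) and column 5 (range) in the commands that follow.

```c
#include <assert.h>
#include <string.h>

/* Copy the col-th (1-based, as in `cut -f`) tab-separated field of line into
   buf. Returns 0 on success, -1 if the field is missing or does not fit. */
static int get_field(const char *line, int col, char *buf, size_t bufsz) {
    assert(col >= 1 && bufsz > 0);
    const char *p = line;
    for (int c = 1; c < col; c++) {
        p = strchr(p, '\t');
        if (p == NULL) return -1;
        p++;
    }
    size_t n = strcspn(p, "\t\n");
    if (n >= bufsz) return -1;
    memcpy(buf, p, n);
    buf[n] = '\0';
    return 0;
}

/* Equivalent of `cut -f col | tail -n+2 | sort -u` printing exactly one value:
   returns 1 if every data row (lines[1..nlines-1]) has the same value in col. */
static int column_is_constant(const char *const *lines, int nlines, int col) {
    char first[64], cur[64];
    if (nlines < 2 || get_field(lines[1], col, first, sizeof first) != 0) return 0;
    for (int i = 2; i < nlines; i++) {
        if (get_field(lines[i], col, cur, sizeof cur) != 0 || strcmp(cur, first) != 0)
            return 0;
    }
    return 1;
}
```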

- sample-rate. This is the [sampling_rate field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset.

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 6 | tail -n+2 | sort -u
4000
```

- bps. This is the translocation speed (bases per second), which can be found in the name of the relevant Guppy/Dorado model. For example, the Dorado models for RNA004 have names beginning with `rna004_130bps`, so the bps is 130.

- range. This is the [range field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Check that it is the same across the whole dataset.

Example command (you should only see one value):
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 5 | tail -n+2 | sort -u
299.432068
```
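Since the bps value is encoded in model names as `<digits>bps` (as in `rna004_130bps_u_to_t_rna_9mer_template_model` in src/model.c), it can be read off mechanically. A minimal sketch, assuming the speed always appears as a run of digits immediately followed by `bps`; `bps_from_model_name` is a hypothetical helper, not part of *squigulator*, Guppy or Dorado.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return the number immediately preceding "bps" in a model name
   (e.g. "rna004_130bps_..." -> 130), or -1 if no such number exists. */
static int bps_from_model_name(const char *name) {
    assert(name != NULL);
    const char *hit = strstr(name, "bps");
    while (hit != NULL) {
        const char *p = hit;
        /* walk back over the digits directly before "bps" */
        while (p > name && isdigit((unsigned char)p[-1])) p--;
        if (p < hit) {
            int bps = 0;
            for (; p < hit; p++) bps = bps * 10 + (*p - '0');
            return bps;
        }
        hit = strstr(hit + 3, "bps"); /* "bps" without digits: keep looking */
    }
    return -1;
}
```

Note that this only parses the name; the dwell-mean check below is what ties bps back to the sample rate.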

- offset-mean and offset-std. These are the mean and sample standard deviation of the [offset field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary).

Example command to get the two parameters:
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | cut -f 4 | tail -n+2 | datamash mean 1 sstdev 1
-259.421128 16.010841823643
```

- median-before-mean and median-before-std. These are the mean and sample standard deviation of the [median_before auxiliary field in the BLOW5 file](https://hasindu2008.github.io/slow5specs/summary). Reads where the field is missing (`.`) are skipped.

Example command to get the two parameters:
```
slow5tools skim -t40 PNXRXX240011_reads_500k.blow5 | awk -v c="median_before" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break};} {if ($p!=".") print $p}' | tail -n+2 | datamash mean 1 sstdev 1
205.63935594369 8.3994882799157
```
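The awk one-liner above locates the `median_before` column by its header name and skips rows whose value is `.`. A minimal C sketch of those two steps, assuming tab-separated `skim`-style lines; `find_column` and `field_as_double` are hypothetical helpers. The resulting values can then be fed to any mean/sample-standard-deviation routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 1-based index of the tab-separated column called name in the header line,
   or 0 if absent (the role of `p` in the awk command). */
static int find_column(const char *header, const char *name) {
    assert(header != NULL && name != NULL);
    size_t len = strlen(name);
    int col = 1;
    for (const char *p = header;; col++) {
        size_t n = strcspn(p, "\t\n");
        if (n == len && strncmp(p, name, len) == 0) return col;
        if (p[n] != '\t') return 0; /* reached the last field */
        p += n + 1;
    }
}

/* Parse column col of a data row as a double. Returns 0 on success, or -1 if
   the column is missing, empty, unparsable, or the "." missing-value marker. */
static int field_as_double(const char *line, int col, double *out) {
    if (col < 1) return -1;
    const char *p = line;
    for (int c = 1; c < col; c++) {
        p = strchr(p, '\t');
        if (p == NULL) return -1;
        p++;
    }
    size_t n = strcspn(p, "\t\n");
    if (n == 0 || (n == 1 && p[0] == '.')) return -1;
    char *end;
    *out = strtod(p, &end);
    return end == p ? -1 : 0;
}
```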

- dwell-mean. This should equal sample-rate / bps (rounded to the nearest integer) and currently serves only as a sanity check.

- dwell-std.
8 changes: 6 additions & 2 deletions src/model.c
@@ -169,8 +169,12 @@ uint32_t set_model(model_t* model, uint32_t model_id) {
num_kmer=262144;
inbuilt_model=r10_4_nucleotide_9mer_template_model_builtin_data;
assert(num_kmer == (uint32_t)(1 << 2*kmer_size)); //num_kmer should be 4^kmer_size
}
else{
} else if(model_id==MODEL_ID_RNA_RNA004_NUCLEOTIDE){
kmer_size=9;
num_kmer=262144;
inbuilt_model=rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data;
assert(num_kmer == (uint32_t)(1 << 2*kmer_size)); //num_kmer should be 4^kmer_size
} else{
assert(0);
}
