Merge branch 'dev' into dwell_table
hasindu2008 committed Apr 12, 2024
2 parents 0206923 + 7318deb commit 12d25c1
Showing 52 changed files with 527,900 additions and 263,498 deletions.
27 changes: 24 additions & 3 deletions Makefile
@@ -15,6 +15,10 @@ OBJ = $(BUILD_DIR)/main.o \
$(BUILD_DIR)/misc.o \
$(BUILD_DIR)/sim.o \
$(BUILD_DIR)/thread.o \
$(BUILD_DIR)/format.o \
$(BUILD_DIR)/gensig.o \
$(BUILD_DIR)/genread.o \
$(BUILD_DIR)/ref.o \

PREFIX = /usr/local
VERSION = `git describe --tags`
@@ -29,7 +33,11 @@ endif
$(BINARY): $(OBJ) slow5lib/lib/libslow5.a
$(CC) $(CFLAGS) $(OBJ) slow5lib/lib/libslow5.a $(LDFLAGS) -o $@

$(BUILD_DIR)/main.o: src/main.c src/misc.h src/error.h src/sq.h
HEADERS = src/error.h src/format.h src/misc.h src/model.h \
src/rand.h src/ref.h src/seq.h src/sq.h src/str.h src/version.h \
src/kseq.h src/khash.h src/ksort.h

$(BUILD_DIR)/main.o: src/main.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/model.o: src/model.c src/model.h src/misc.h
@@ -38,12 +46,25 @@ $(BUILD_DIR)/model.o: src/model.c src/model.h src/misc.h
$(BUILD_DIR)/thread.o: src/thread.c
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/misc.o: src/misc.c
$(BUILD_DIR)/misc.o: src/misc.c src/error.h
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/sim.o: src/sim.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/format.o: src/format.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/sim.o: src/sim.c src/ref.h src/misc.h src/str.h
$(BUILD_DIR)/gensig.o: src/gensig.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/genread.o: src/genread.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@

$(BUILD_DIR)/ref.o: src/ref.c $(HEADERS)
$(CC) $(CFLAGS) $(CPPFLAGS) $< -c -o $@


slow5lib/lib/libslow5.a:
$(MAKE) -C slow5lib zstd=$(zstd) no_simd=$(no_simd) zstd_local=$(zstd_local) lib/libslow5.a

25 changes: 23 additions & 2 deletions README.md
@@ -6,12 +6,31 @@

Reads directly extracted from the reference genome are simulated without any mutations/variants. If you want to have variants in your simulated data, you can first apply a set of variants to the reference using [bcftools](http://www.htslib.org/download/) and use that as the input to *squigulator*.

Preprint: [https://www.biorxiv.org/content/10.1101/2023.05.09.539953v1](https://www.biorxiv.org/content/10.1101/2023.05.09.539953v1)<br/>
SLOW5 ecosystem: [https://hasindu2008.github.io/slow5](https://hasindu2008.github.io/slow5)<br/>

![squigulator](docs/img/example.svg)

[![GitHub Downloads](https://img.shields.io/github/downloads/hasindu2008/squigulator/total?logo=GitHub)](https://github.com/hasindu2008/squigulator/releases)
[![BioConda Install](https://img.shields.io/conda/dn/bioconda/squigulator?label=BioConda)](https://anaconda.org/bioconda/squigulator)
[![x86_64](https://github.com/hasindu2008/squigulator/actions/workflows/c-cpp.yml/badge.svg)](https://github.com/hasindu2008/squigulator/actions/workflows/c-cpp.yml)


Please cite the following in your publications when using *squigulator*:

> Gamaarachchi, H., Ferguson, J. M., Samarakoon, H., Liyanage, K., & Deveson, I. W. (2023). Squigulator: simulation of nanopore sequencing signal data with tunable noise parameters. bioRxiv, 2023-05.
```
@article{gamaarachchi2023squigulator,
title={Squigulator: simulation of nanopore sequencing signal data with tunable noise parameters},
author={Gamaarachchi, Hasindu and Ferguson, James M and Samarakoon, Hiruna and Liyanage, Kisaru and Deveson, Ira W},
journal={bioRxiv},
pages={2023--05},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}
```

## Background story

*squigulator* started as *ssssim* (Stupidly Simple Signal Simulator). For an experiment, [kisarur](https://github.com/kisarur) wanted some simulated data. After [hiruna72](https://github.com/hiruna72) spent ~3 days trying to get an existing simulator installed (dependency and compatibility issues), I thought that writing a simple tool from scratch would be easier. Indeed it is, when writing BLOW5 files. Writing overcomplicated formats like FAST5 or POD5 would consume months, and I would not have considered writing a simulator in the first place.
@@ -23,7 +42,7 @@ After getting the basic *ssssim* implemented in ~8 hours and successfully baseca
For x86-64 Linux, you can use the precompiled binaries under [releases](https://github.com/hasindu2008/squigulator/releases):

```
VERSION=0.2.0-dirty
VERSION=0.3.0-dirty
wget https://github.com/hasindu2008/squigulator/releases/download/v${VERSION}/squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz
tar xf squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz && cd squigulator-v${VERSION}
./squigulator --help
@@ -52,13 +71,15 @@ The simplest command to generate reads:
squigulator [OPTIONS] ref_genome.fa -o out_signal.blow5 -n NUM_READS
```

By default, DNA PromethION reads (R9.4.1) will be simulated. Specify the `-x STR` option to set a different profile from the following available pre-sets (inspired by pre-sets in [Minimap2](https://github.com/lh3/minimap2)).
By default, DNA PromethION reads (R9.4.1) will be simulated. Specify the `-x STR` option to set a different profile from the following available pre-sets (see [here](docs/profile.md) for more info).
- `dna-r9-min`: genomic DNA on MinION R9.4.1 flowcells
- `dna-r9-prom`: genomic DNA on PromethION R9.4.1 flowcells
- `rna-r9-min`: direct RNA on MinION R9.4.1 flowcells
- `rna-r9-prom`: direct RNA on PromethION R9.4.1 flowcells
- `dna-r10-min`: genomic DNA on MinION R10.4.1 flowcells
- `dna-r10-prom`: genomic DNA on PromethION R10.4.1 flowcells
- `rna004-min`: direct RNA on MinION RNA004 flowcells
- `rna004-prom`: direct RNA on PromethION RNA004 flowcells

If a genomic DNA profile is selected, the input reference must be the **reference genome in *FASTA* format**. *squigulator* will randomly sample the genome from a uniform distribution and generate reads whose lengths are from a gamma distribution (based on `-r`). If a direct RNA profile is selected, the input reference must be the **transcriptome in *FASTA* format**. For RNA, *squigulator* will randomly pick transcripts from a uniform distribution and the whole transcript length is simulated.
