Raw sequences produced by next-generation sequencing (NGS) machines contain adapter, linker, barcode and fingerprint (UMI) sequences. TagDust2 is designed to make it as easy as possible to de-multiplex reads from any library preparation method. In addition, TagDust2 can detect which library preparation method was used from the raw reads themselves, making it possible to automate processing pipelines.
TagDust allows users to specify the expected architecture of a read and converts it into a hidden Markov model. This model can assign sequences to a particular barcode (or index) even in the presence of sequencing errors. Sequences not matching the architecture (primer dimers, contaminants, etc.) are automatically discarded.
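As a rough illustration of what specifying a read architecture might look like, the sketch below describes reads that start with one of two barcodes followed by the read itself. The segment syntax (`-1 B:...`, `-2 R:N`), the `-o` output prefix, the barcodes, and the file names are all assumptions made for illustration; check them against the user manual in the doc directory before relying on them.

```shell
# Placeholder inputs: reads.fq, the output prefix, and the barcodes are hypothetical.
reads="reads.fq"
out_prefix="demultiplexed"

# Assumed segment syntax (verify against the manual in doc/):
#   -1 B:GTA,AAC   first segment: a barcode, one of GTA or AAC
#   -2 R:N         second segment: the read to extract
cmd="tagdust -1 B:GTA,AAC -2 R:N $reads -o $out_prefix"

# Only run the command if tagdust is installed and the input file exists;
# otherwise just print what would have been run.
if command -v tagdust >/dev/null 2>&1 && [ -f "$reads" ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```

Under these assumptions, reads matching neither barcode would not fit the architecture and would be discarded, as described above.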
Unpack the tarball and build TagDust:
tar -zxvf tagdust-XXX.tar.gz
cd tagdust
./autogen.sh
./configure
make
make check
At this point the TagDust executable appears in the src directory. You can copy it to any directory in your path. To install it system-wide, type:
make install
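If a system-wide install is not an option, copying the executable into a per-user directory can be sketched as follows. The helper name `install_local` and the default destination `$HOME/bin` are assumptions for illustration, and the destination must already be on your path for the copied executable to be found.

```shell
# Hypothetical helper: copy a built executable into a per-user bin directory.
install_local() {
    src_bin="$1"                  # path to the built executable, e.g. src/tagdust
    dest_dir="${2:-$HOME/bin}"    # destination directory (default is an assumption)

    # Refuse to continue if the build did not produce an executable.
    if [ ! -x "$src_bin" ]; then
        echo "not found or not executable: $src_bin" >&2
        return 1
    fi

    # Create the destination if needed, then copy the executable into it.
    mkdir -p "$dest_dir" && cp "$src_bin" "$dest_dir/"
}
```

From the tagdust directory, this would be called as `install_local src/tagdust` after `make` has finished.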
Have a look at the user manual in the doc directory!
Lassmann, Timo. "TagDust2: a generic method to extract reads from sequencing data." BMC Bioinformatics 16.1 (2015): 24. doi.org/10.1186/s12859-015-0454-y