DUX4 fusions finder
Duxhund is a dedicated tool for calling DUX4 fusions. It calls them in the following three steps:
- Align the input reads with BWA
- Cut off softclips from the alignments and realign them with BWA, using a reference sequence that masks the pseudo-gene regions of DUX4
- Extract appropriate triplets of alignments from the realigned result and call them with fusionfusion
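To make the second step more concrete, the sketch below pulls the soft-clipped bases out of SAM records by reading the CIGAR string. It is only an illustration and is not Duxhund's own implementation: the function name softclips and the left/right labels are hypothetical, and it only looks at soft clips at the two ends of the CIGAR.

```shell
# Hypothetical helper (not part of Duxhund): read SAM text on stdin and print
# "<read name> <side> <soft-clipped bases>" for each clipped end.
softclips() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    {
      cigar = $6; seq = $10
      if (match(cigar, /^[0-9]+S/)) {      # soft clip at the left end of SEQ
        n = substr(cigar, 1, RLENGTH - 1) + 0
        print $1 "\tleft\t" substr(seq, 1, n)
      }
      if (match(cigar, /[0-9]+S$/)) {      # soft clip at the right end of SEQ
        n = substr(cigar, RSTART, RLENGTH - 1) + 0
        print $1 "\tright\t" substr(seq, length(seq) - n + 1)
      }
    }'
}

# Usage (assuming an alignment file named aligned.bam):
#   samtools view -h aligned.bam | softclips
```

The clipped sequences are the parts of a read that BWA could not place at the primary location, which is why realigning them can reveal the fusion partner.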
Duxhund requires the following software:
- BWA
- samtools
- fusionfusion
- Java JDK (>= 8)
- Leiningen (to build Clojure code)
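Before building or running Duxhund, it can help to check which of these tools are already on your PATH. The following is a minimal sketch; the function name missing_tools is hypothetical, and the executable names in the usage line (bwa, samtools, fusionfusion, java, lein) are assumed to be the ones your installation provides.

```shell
# Hypothetical helper: print the name of every given command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (executable names are assumptions about your installation):
#   missing_tools bwa samtools fusionfusion java lein
```

An empty output means every listed command was found.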
Duxhund provides the following scripts:
- duxhund.sh: to run from the command line
- duxhund_batch.sh: to run as an AWS Batch job
The duxhund.sh script can be run in two ways:
- In a Docker container
- Directly from the command line
To run duxhund.sh in a Docker container, you'll need to build the Docker image first.
To build the Docker image, run:
./script/build.sh
Or, if you would like to name the resulting image, run:
IMAGE_NAME=<image name> ./script/build.sh
By default, the image name will be chrovis/duxhund:latest.
Once the image is built, you can run duxhund.sh as follows:
docker run <image name> duxhund.sh <arg> ...
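A container cannot see files on the host unless they are mounted, so in practice the input and output paths have to live in a mounted directory. The wrapper below is a sketch of one way to do that. The function name run_duxhund_docker and the /data mount point are assumptions made for this example, not something the image defines.

```shell
# Hypothetical wrapper: mount a host directory at /data and run duxhund.sh
# inside the container. IMAGE_NAME falls back to the default image name.
run_duxhund_docker() {
  host_dir=$1
  shift
  docker run --rm \
    -v "$host_dir:/data" \
    "${IMAGE_NAME:-chrovis/duxhund:latest}" \
    duxhund.sh "$@"
}

# Usage (file names are placeholders; paths are as seen inside the container):
#   run_duxhund_docker "$PWD/data" \
#     --reference /data/ref.fa --masked-reference /data/ref.masked.fa \
#     --target /data/target.bed \
#     --r1 /data/sample_R1.fastq --r2 /data/sample_R2.fastq \
#     --output /data/out
```

Writing the output under the mounted directory keeps the results available on the host after the container exits.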
To run duxhund.sh directly from the command line, you'll need to build the Clojure code first.
To build it, run:
lein uberjar
After the Clojure code is successfully built, the duxhund.jar file will be generated in the target directory.
Then run duxhund.sh as follows:
DUXHUND_JAR=<duxhund.jar path> duxhund.sh <arg> ...
If the dependent tools (i.e. BWA, samtools and fusionfusion) are not installed on your PATH, you'll also need to specify their installation paths:
BWA=<bwa path> SAMTOOLS=<samtools path> \
FUSIONFUSION=<fusionfusion path> DUXHUND_JAR=<duxhund.jar path> duxhund.sh <arg> ...
The duxhund.sh script takes the following command line options, all of which are mandatory:
- --reference: The path to the reference FASTA file
- --masked-reference: The path to the masked reference FASTA file (see below for details)
- --target: The path to the target BED file
- --r1, --r2: The paths to the input FASTQ files
- --output: The path to the output directory
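Since every option is mandatory, a small wrapper that checks the input files before starting can save a failed run. The following is a sketch only: the function name run_duxhund and its positional argument order are hypothetical, and it assumes duxhund.sh is on your PATH with DUXHUND_JAR (and any tool paths) already exported.

```shell
# Hypothetical wrapper: check that every input file exists, then call
# duxhund.sh with all six mandatory options.
# Arguments: <reference> <masked reference> <target BED> <R1> <R2> <output dir>
run_duxhund() {
  if [ "$#" -ne 6 ]; then
    echo "usage: run_duxhund REF MASKED_REF TARGET_BED R1 R2 OUTPUT_DIR" >&2
    return 2
  fi
  for f in "$1" "$2" "$3" "$4" "$5"; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done
  duxhund.sh \
    --reference "$1" --masked-reference "$2" \
    --target "$3" \
    --r1 "$4" --r2 "$5" \
    --output "$6"
}
```

The wrapper returns before calling duxhund.sh if an argument is missing or an input file cannot be found.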
A masked reference is a reference in which the regions of the DUX4 pseudo-genes are masked. To make a masked reference, run the following commands:
$ bedtools maskfasta \
-fi <input FASTA file> \
-bed <DUX4 pseudo-gene BED file> \
-fo <output FASTA file>
$ bwa index <output FASTA file>
Duxhund bundles the DUX4 pseudo-gene BED file in the resources/ directory.
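One way to sanity-check the masked reference is to compare its N count with that of the original. By default, bedtools maskfasta replaces the masked bases with N, so, assuming the BED intervals do not overlap and the regions contained no N beforehand, the count should grow by the total length of the BED intervals. The helper names count_n and bed_total_length below are hypothetical.

```shell
# Hypothetical helper: count N/n bases in a FASTA file, ignoring header lines.
count_n() {
  grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Hypothetical helper: sum of (end - start) over all intervals in a BED file.
bed_total_length() {
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 { s += $3 - $2 } END { print s + 0 }' "$1"
}

# Usage (file names are placeholders):
#   before=$(count_n ref.fa)
#   after=$(count_n ref.masked.fa)
#   echo "masked bases: $((after - before)), expected: $(bed_total_length pseudo_genes.bed)"
```

If the two numbers differ, check whether the BED file and the FASTA file use the same chromosome names.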
Copyright © 2021 Xcoo, Inc.
This program is distributed under the GNU General Public License v3. See LICENSE for details.