Improved GPU utilization #2

Open · wants to merge 39 commits into base: master

Commits (39)
dfb20f7
reordered flow to only process full batches
wings-public Nov 12, 2022
d72d9ee
sporadic missing predictions handling
wings-public Nov 12, 2022
6f1d40e
print overall performance when batching
wings-public Nov 13, 2022
7662a76
fix tensorflow batch size to actually use the provided size
wings-public Nov 14, 2022
ecd71f6
revise logic to read parallel to prediction
wings-public Nov 14, 2022
5e64b96
disabled performance output on predict in illumina implementation
wings-public Nov 15, 2022
e20c586
further optimization of gpu usage through offloading np-tensor conver…
wings-public Nov 15, 2022
ee16e00
add gc collect to original code to prevent oom kills
wings-public Nov 16, 2022
3160793
add long variant for all arguments
wings-public Nov 16, 2022
36a6659
add dockerfile
wings-public Nov 16, 2022
e6e06cc
moved docker directory
wings-public Nov 16, 2022
d223e4c
corrected long argument access
wings-public Nov 16, 2022
580f5d5
updated benchmarks
wings-public Nov 16, 2022
90a3f01
added docker link
wings-public Nov 16, 2022
f7ec23c
small fix to dockerfile
wings-public Nov 16, 2022
7255331
small fix to dockerfile
wings-public Nov 16, 2022
b1f28d2
fixed typo
wings-public Nov 16, 2022
2c5d052
fixed code layout
wings-public Nov 16, 2022
1b28d31
fixed code layout
wings-public Nov 16, 2022
2cf5ac5
fixed table layout
wings-public Nov 16, 2022
af2e7f8
Update README.md
geertvandeweyer Nov 16, 2022
c7fd579
revised code to scale to multiple gpus
geertvandeweyer Nov 28, 2022
cbc9679
relocate annotation loader in worker
geertvandeweyer Nov 29, 2022
0754a37
fix arguments
geertvandeweyer Nov 29, 2022
e2ec04b
fix arguments
geertvandeweyer Nov 29, 2022
aad6bf9
add small sleep investigating issues above 3 gpus
geertvandeweyer Nov 29, 2022
fdbcf1d
looking into startup issues
geertvandeweyer Nov 29, 2022
31a2ad0
There is an issue when going above 2 GPUs
geertvandeweyer Nov 29, 2022
0e89d63
hide physical devices in batch workers to evaluate memory issues
geertvandeweyer Nov 29, 2022
18c6dac
pass nonmasked device to worker for correct shelf names
geertvandeweyer Nov 29, 2022
a1ce6c5
corrected arguments
geertvandeweyer Nov 29, 2022
d3cb462
final code cleanup
geertvandeweyer Nov 29, 2022
bf00fb1
Added custom port option
barneyhill Mar 16, 2023
6e61778
Merge pull request #1 from barneyhill/master
geertvandeweyer Mar 18, 2023
c985bc1
Set default port
matthiasblum Mar 22, 2023
36d22d6
Add -P option to README
matthiasblum Mar 22, 2023
12c3079
Merge pull request #2 from matthiasblum/master
geertvandeweyer Mar 23, 2023
e09bca6
working on error handling to shutdown on issues
geertvandeweyer Sep 9, 2024
a354e38
fix joining of workers
geertvandeweyer Sep 13, 2024
README.md (124 changes: 93 additions, 31 deletions)
@@ -9,16 +9,35 @@ This package annotates genetic variants with their predicted effect on splicing,
SpliceAI source code is provided under the [GPLv3 license](LICENSE). SpliceAI includes several third party packages provided under other open source licenses, please see [NOTICE](NOTICE) for additional details. The trained models used by SpliceAI (located in this package at spliceai/models) are provided under the [CC BY NC 4.0](LICENSE) license for academic and non-commercial use; other use requires a commercial license from Illumina, Inc.

### Installation
The simplest way to install SpliceAI is through pip or conda:

This release is most easily used as a Docker container:

```sh
docker pull cmgantwerpen/spliceai_v1.3:latest

docker run --gpus all cmgantwerpen/spliceai_v1.3:latest spliceai -h
```

A container including reference and annotation data is available as well:


```sh
docker pull cmgantwerpen/spliceai_v1.3:full
```
Note that this version has a larger footprint (~12 GB). Reference and annotation data for genome builds hg19 and hg38 are available under `/data/`.



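For example, a run using the bundled hg38 data from the `full` image might look like this (the exact FASTA file name under `/data/` is an assumption; check the container contents):

```sh
# hypothetical invocation: mount the working directory and use the
# reference data shipped inside the ':full' image
docker run --gpus all -v "$(pwd)":/work cmgantwerpen/spliceai_v1.3:full \
  spliceai -I /work/input.vcf -O /work/output.vcf -R /data/hg38.fa -A grch38
```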
The simplest way to install (the original version of) SpliceAI is through pip or conda:
```sh
pip install spliceai
# or
conda install -c bioconda spliceai
```

Alternatively, SpliceAI can be installed from the [github repository](https://github.com/Illumina/SpliceAI.git):
Alternatively, SpliceAI can be installed from the [github repository](https://github.com/invitae/SpliceAI.git):
```sh
git clone https://github.com/Illumina/SpliceAI.git
git clone https://github.com/invitae/SpliceAI.git
cd SpliceAI
python setup.py install
```
@@ -42,41 +61,74 @@ Required parameters:
- ```-I```: Input VCF with variants of interest.
- ```-O```: Output VCF with SpliceAI predictions `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL` included in the INFO column (see table below for details). Only SNVs and simple INDELs (REF or ALT is a single base) within genes are annotated. Variants in multiple genes have separate predictions for each gene.
- ```-R```: Reference genome fasta file. Can be downloaded from [GRCh37/hg19](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz) or [GRCh38/hg38](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz).
- ```-A```: Gene annotation file. Can instead provide `grch37` or `grch38` to use GENCODE V24 canonical annotation files included with the package. To create custom gene annotation files, use `spliceai/annotations/grch37.txt` in repository as template.
- ```-A```: Gene annotation file. Can instead provide `grch37` or `grch38` to use GENCODE V24 canonical annotation files included with the package. To create custom gene annotation files, use `spliceai/annotations/grch37.txt` in the repository as a template, and provide the full path to the file.

Optional parameters:
- ```-D```: Maximum distance between the variant and gained/lost splice site (default: 50).
- ```-M```: Mask scores representing annotated acceptor/donor gain and unannotated acceptor/donor loss (default: 0).
- ```-B```: Number of predictions to collect before running the models on them in batch (default: 1, i.e. no batching).
- ```-T```: Internal TensorFlow `predict()` batch size, if you want something different from the `-B` value (default: the `-B` value).
- ```-V```: Enable verbose logging during the run.

**Batching Considerations:** When setting the batching parameters, be mindful of the system and gpu memory of the machine you
are running the script on. Feel free to experiment, but some reasonable `-B` numbers would be 64/128.

Batching Performance Benchmarks:

| Type | Speed |
| -------- | ----------- |
| n1-standard-2 CPU (GCP) | ~800 per hour |
| CPU (2019 MacBook Pro) | ~3,000 per hour |
| K80 GPU (GCP) | ~25,000 per hour |
| V100 GPU (GCP) | ~150,000 per hour |

Details of SpliceAI INFO field:

| ID | Description |
| -------- | ----------- |
| ALLELE | Alternate allele |
| SYMBOL | Gene symbol |
| DS_AG | Delta score (acceptor gain) |
| DS_AL | Delta score (acceptor loss) |
| DS_DG | Delta score (donor gain) |
| DS_DL | Delta score (donor loss) |
| DP_AG | Delta position (acceptor gain) |
| DP_AL | Delta position (acceptor loss) |
| DP_DG | Delta position (donor gain) |
| DP_DL | Delta position (donor loss) |
- ```-t```: Specify a directory in which to create the temporary files.
- ```-G```: Specify the GPU(s) to run on: either a comma-separated list of indices (e.g. `0,2`) or `all` (default: `all`).
- ```-S```: Simulate *n* virtual GPUs on a single physical device. For development only; values above 2 currently crash due to memory issues (default: 0).
- ```-P```: Port to use when connecting to the socket (default: 54677; only used in batch mode).

**Batching Considerations:**

When setting the batching parameters, be mindful of the system and GPU memory of the machine you
are running the script on. Feel free to experiment, but reasonable `-T` values are 64 or 128. CPU memory is typically larger than GPU memory, so increasing `-B` further can improve performance.
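
As an illustration, a batched multi-GPU invocation (file names are placeholders) might look like:

```sh
# hypothetical example: large CPU-side batches (-B), a smaller TensorFlow
# predict() batch (-T), restricted to GPUs 0 and 2, with a custom tmp dir
spliceai -I input.vcf -O output.vcf -R hg38.fa -A grch38 \
  -B 4096 -T 256 -G 0,2 -t /scratch/spliceai_tmp
```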

*Batching Performance Benchmarks:*
- Input data: GATK-generated WES sample with ~90K variants, genome build GRCh37.
- Total predictions made: 174,237
- invitae v2 mainly adds logic to prioritize full batches during prediction
- Settings:
  - invitae & invitae v2: B = T = 64
  - invitae v2 optimal: B = 4096, T = 256 on V100; B = 4096, T = 64 on K80/GeForce

*Benchmark results*

| Type                                 | Implementation       | Total Time | Speed (predictions/hour) |
|--------------------------------------|----------------------|------------|--------------------------|
| CPU (Intel i5-8365U)<sup>a</sup>     | illumina             | ~100 h     | ~1,000                   |
|                                      | invitae              | ~39 h      | ~4,500                   |
|                                      | invitae v2           | ~35 h      | ~5,000                   |
|                                      | invitae v2 optimal   | ~35 h      | ~5,000                   |
| K80 GPU (AWS p2.xlarge)              | illumina<sup>b</sup> | ~25 h      | ~7,000                   |
|                                      | invitae              | 242 m      | ~43,000                  |
|                                      | invitae v2           | 213 m      | ~50,000                  |
|                                      | invitae v2 optimal   | 188 m      | ~56,000                  |
| GeForce RTX 2070 SUPER GPU (desktop) | illumina<sup>b</sup> | ~10 h      | ~17,000                  |
|                                      | invitae              | 76 m       | ~137,000                 |
|                                      | invitae v2           | 63 m       | ~166,000                 |
|                                      | invitae v2 optimal   | 52 m       | ~200,000                 |
| V100 GPU (AWS p3.2xlarge)            | illumina<sup>b</sup> | ~10 h      | ~18,000                  |
|                                      | invitae              | 78 m       | ~135,000                 |
|                                      | invitae v2           | 54 m       | ~190,000                 |
|                                      | invitae v2 optimal   | 31 m       | ~335,000                 |


<sup>(a)</sup>: Extrapolated from the first 500 variants.

<sup>(b)</sup>: The Illumina implementation showed a memory leak with the installed versions of tf/keras/…; values are extrapolated from incomplete runs at the point of OOM.

*Note:* On a p3.8xlarge machine hosting 4 V100 GPUs, we were able to reach 1,379,505 predictions/hour! This is a nearly linear scale-up.

### Details of SpliceAI INFO field:

| ID | Description |
|--------|--------------------------------|
| ALLELE | Alternate allele |
| SYMBOL | Gene symbol |
| DS_AG | Delta score (acceptor gain) |
| DS_AL | Delta score (acceptor loss) |
| DS_DG | Delta score (donor gain) |
| DS_DL | Delta score (donor loss) |
| DP_AG | Delta position (acceptor gain) |
| DP_AL | Delta position (acceptor loss) |
| DP_DG | Delta position (donor gain) |
| DP_DL | Delta position (donor loss) |

Delta score of a variant, defined as the maximum of (DS_AG, DS_AL, DS_DG, DS_DL), ranges from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the paper, a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Delta position conveys information about the location where splicing changes relative to the variant position (positive values are downstream of the variant, negative values are upstream).
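
As an example of consuming these annotations downstream, a small helper (hypothetical, not part of the package) can reduce one SpliceAI INFO value to its delta score:

```python
def spliceai_delta_score(annotation: str) -> float:
    """Return max(DS_AG, DS_AL, DS_DG, DS_DL) for one SpliceAI annotation.

    Expected format:
    ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
    """
    fields = annotation.split("|")
    return max(float(score) for score in fields[2:6])

# A delta score >= 0.5 is the recommended cutoff for "splice-altering".
print(spliceai_delta_score("T|RYR1|0.22|0.00|0.91|0.70|-107|-46|-2|90"))  # 0.91
```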

@@ -133,5 +185,15 @@ donor_prob = y[0, :, 2]
* Adds test cases that run a small file against a generated FASTA reference, verifying that the results are identical with no batching and with different batch sizes
* Slightly modifies the entrypoint to allow easier unit testing, by making it possible to pass in what would normally come from the argument parser

**Multi-GPU support** - Geert Vandeweyer (_November 2022_)

* Offload more work to the CPU (e.g. NumPy-to-tensor conversion) so that *only* predictions run on the GPU
* Implement a queuing system so that full batches are always ready for prediction
* Implement a new parameter, `--tmpdir`, to support a custom temporary folder for storing prepped batches
* Implement a socket-based client/server approach to scale across multiple GPUs (see the sketch below)
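
As a minimal sketch of the queuing plus client/server idea (hypothetical names, simplified to a single worker; this is not the actual implementation), a reader process could serve full batches over a local socket to per-GPU workers:

```python
# Sketch only: a producer serves full batches over a local socket and a
# per-GPU worker consumes them. All names here are illustrative.
from multiprocessing import Process
from multiprocessing.connection import Client, Listener

PORT = 54677     # matches the documented -P default
BATCH_SIZE = 8   # stands in for the -B value

def gpu_worker(address, gpu_id):
    # In the real tool, CUDA_VISIBLE_DEVICES would be set here so this
    # process only sees its assigned GPU before TensorFlow is imported.
    with Client(address) as conn:
        while (batch := conn.recv()) is not None:  # None is the shutdown sentinel
            print(f"GPU {gpu_id}: predicting on {len(batch)} encoded variants")

if __name__ == "__main__":
    address = ("localhost", PORT)
    with Listener(address) as listener:            # bind before the worker starts
        worker = Process(target=gpu_worker, args=(address, 0))
        worker.start()
        conn = listener.accept()
        for i in range(3):                         # three dummy "full batches"
            conn.send([f"variant_{i}_{j}" for j in range(BATCH_SIZE)])
        conn.send(None)
        worker.join()
```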


### Contact
Kishore Jaganathan: [email protected]

Geert Vandeweyer (this implementation): [email protected]
docker/Dockerfile (67 changes: 67 additions, 0 deletions)
@@ -0,0 +1,67 @@
######################################
## CONTAINER FOR GPU based SpliceAI ##
######################################

# start from the cuda docker base
FROM nvidia/cuda:11.4.0-base-ubuntu20.04

LABEL version="1.3"
LABEL description="This container was tested with \
- V100 on AWS p3.2xlarge with nvidia drivers 510.47.03 and cuda v11.6 \
- K80 on AWS p2.xlarge with nvidia drivers 470.141.03 and cuda v11.4 \
- Geforce RTX 2070 SUPER (local) with nvidia drivers 470.141.03 and cuda v11.4"

LABEL author="Geert Vandeweyer"
LABEL author.email="[email protected]"

## needed apt packages
ARG BUILD_PACKAGES="wget git bzip2"
## needed conda packages
ARG CONDA_PACKAGES="python=3.9.13 tensorflow-gpu=2.10.0 cuda-nvcc=11.8.89"

## ENV SETTINGS during runtime
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH=/opt/conda/bin:$PATH
ENV DEBIAN_FRONTEND=noninteractive

# For micromamba:
SHELL ["/bin/bash", "-l", "-c"]
ENV MAMBA_ROOT_PREFIX=/opt/conda/
ENV PATH=/opt/micromamba/bin:/opt/conda/bin:$PATH
ARG CONDA_CHANNEL="-c bioconda -c conda-forge -c nvidia"

## INSTALL
RUN apt-get -y update && \
apt-get -y install $BUILD_PACKAGES && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*


# conda packages
RUN mkdir /opt/conda && \
mkdir /opt/micromamba && \
wget -qO - https://micromamba.snakepit.net/api/micromamba/linux-64/0.23.0 | tar -xvj -C /opt/micromamba bin/micromamba && \
# initialize bash
micromamba shell init --shell=bash --prefix=/opt/conda && \
# remove a statement from bashrc that prevents initialization
grep -v '[ -z "\$PS1" ] && return' /root/.bashrc > /opt/micromamba/bashrc && \
mv /opt/micromamba/bashrc /root/.bashrc && \
source ~/.bashrc && \
# activate & install base conda packages
micromamba activate && \
micromamba install -y $CONDA_CHANNEL $CONDA_PACKAGES && \
micromamba clean --all --yes

# Break the docker build cache here to force a fresh git clone. Note that the
# $(date) default is not shell-expanded by docker; pass
# --build-arg DATE_CACHE_BREAK="$(date)" at build time to actually break the cache.
ARG DATE_CACHE_BREAK=$(date)

# my fork of spliceai : has gpu optimizations
RUN cd /opt/ && \
git clone https://github.com/geertvandeweyer/SpliceAI.git && \
cd SpliceAI && \
python setup.py install

# no command given, print help.
CMD spliceai -h
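
A build-and-check of this image could look as follows (the local tag name is illustrative):

```sh
# build from the repository root; override DATE_CACHE_BREAK to force a
# fresh clone of the SpliceAI fork, then sanity-check the entrypoint
docker build --build-arg DATE_CACHE_BREAK="$(date)" -t spliceai-gpu:local docker/
docker run --gpus all spliceai-gpu:local spliceai -h
```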
