Commit

Merge pull request #273 from rmcolq/dev
Merge dev into master
leoisl authored Apr 13, 2021
2 parents d9c4201 + 99d2a7b commit 9735340
Showing 49 changed files with 1,620 additions and 447 deletions.
40 changes: 9 additions & 31 deletions .gitignore
@@ -65,20 +65,6 @@ EcoliK12_pandora*
# local directory - michael
vm-build/

# local directory - leandro
Singularity_debug.pandora
debug_with_singularity.sh
pandora_debug.img
build_debug*
build/
data_issue*
build_release*
build_profiling*
docker/

mounted_folder/

docker_dev/
CMakeCache.txt
CMakeFiles
CTestTestfile.cmake
@@ -100,24 +86,16 @@ cmake_install.cmake
compile_commands.json
pandora

example/output_toy_example_no_denovo/
example/prgs/kmer_prgs/
example/prgs/toy_prg.fa.k15.w14.idx
example/pandora-latest.simg
example/pandora_workflow
!example/msas/custom/GC00006032.fa
!example/msas/custom/GC00010897.fa
!example/pandora_workflow_data/assemblies/samples/toy_sample_1/toy_sample_1.ref.fa
!example/pandora_workflow_data/assemblies/samples/toy_sample_2/toy_sample_2.ref.fa
!example/prgs/toy_prg.fa
!example/scripts/data/ref_to_get_reads_from.toy_example_1.fa
!example/scripts/data/ref_to_get_reads_from.toy_example_2.fa

!example/msas
!example/reads
/example/pandora_discover_out/
/example/prgs/
/example/output_toy_example*
/example/updated_prgs/
/example/pandora-linux-precompiled*
/example/make_prg_*

#portable binary build dir
build_portable_executable
pandora-linux-precompiled

/cmake-build-release/
/example/pandora-linux-precompiled-v0.8.0
/example/pandora-linux-precompiled-v0.8.0.gz
/example/prgs/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -4,3 +4,6 @@
[submodule "thirdparty/cgranges"]
path = thirdparty/cgranges
url = https://github.com/lh3/cgranges
[submodule "thirdparty/seqan"]
path = thirdparty/seqan
url = https://github.com/seqan/seqan.git
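
Because SeqAn is now declared as a git submodule and the build requires it via `find_package (SeqAn REQUIRED)`, a checkout whose submodules were never fetched will likely fail at configure time. The following is a minimal sketch of a pre-build check, not part of this commit: the helper names `submodule_paths` and `missing_submodules` are hypothetical, and the demo runs on a throwaway directory that only mimics the two entries shown in this hunk.

```shell
#!/bin/bash
set -eu

# List the submodule paths declared in a .gitmodules file
# (lines of the form "path = thirdparty/seqan").
submodule_paths() {
    awk '$1 == "path" && $2 == "=" { print $3 }' "$1"
}

# Print every declared submodule path that is missing or empty under repo root $1.
missing_submodules() {
    local root="$1" path
    submodule_paths "$root/.gitmodules" | while read -r path; do
        if [ ! -d "$root/$path" ] || [ -z "$(ls -A "$root/$path")" ]; then
            echo "$path"
        fi
    done
}

# Demo: a temporary directory that mimics a checkout where SeqAn was not fetched.
demo_root="$(mktemp -d)"
cat > "$demo_root/.gitmodules" <<'EOF'
[submodule "thirdparty/cgranges"]
    path = thirdparty/cgranges
    url = https://github.com/lh3/cgranges
[submodule "thirdparty/seqan"]
    path = thirdparty/seqan
    url = https://github.com/seqan/seqan.git
EOF
mkdir -p "$demo_root/thirdparty/cgranges/cpp" "$demo_root/thirdparty/seqan"

missing="$(missing_submodules "$demo_root")"
if [ -n "$missing" ]; then
    echo "uninitialised submodules: $missing"
    echo "in a real checkout, run: git submodule update --init --recursive"
fi
```

In a real clone the usual remedy is `git submodule update --init --recursive`, or cloning with `git clone --recursive` in the first place.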
25 changes: 24 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,27 @@ project adheres to

## [Unreleased]

## [0.9.0-rc2]

### Changed
- `pandora discover` now processes one sample at a time, but runs with several threads on the heavy tasks, i.e. when
mapping reads, finding candidate regions, and finding denovo variants. The result is that it now takes a lot less RAM to
run on multiple samples.

## [0.9.0-rc1]

### Changed
- `pandora discover` now receives read index files describing samples and reads, and discovers denovo sequences in these samples.
To improve performance on discovering denovo sequences on several samples, `pandora discover` is now multithreaded, but
per-sample performance is unchanged from the previous version, i.e. each sample is still processed in a single-threaded way;
- `pandora discover` output changed to a proprietary format. See [example](example) for the new output;
- `pandora` can now communicate with a [`make_prg` prototype](https://github.com/leoisl/make_prg) that is able to update PRGs
without needing to realign and remake the PRG. This provides major performance upgrades to running the full `pandora` pipeline
with denovo discovery enabled, and there is no longer any need to use a `snakemake` pipeline
(see [this example](example/run_pandora.sh) for how to run the full pipeline);
- We now use [musl libc](https://musl.libc.org/) instead of [Holy Build Box](https://github.com/phusion/holy-build-box)
to build a precompiled portable binary, removing the dependency on `OpenMP 4.0+` or `GCC 4.9+`, and `GLIBC`;
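
The read index mentioned above is described in the `doc/Usage.md` hunk of this commit as a tab-delimited file where each line is a sample identifier followed by the path to that sample's reads. Below is a minimal sketch of writing and sanity-checking such a file. The helper names `make_read_index` and `check_read_index`, the sample names and the read paths are all hypothetical placeholders, not part of `pandora`.

```shell
#!/bin/bash
set -eu

# Write a read index: one line per sample, "<sample_id><TAB><path to reads>".
# Each argument after the output path has the form sample_id=path/to/reads.fq
make_read_index() {
    local out="$1" entry
    shift
    : > "$out"
    for entry in "$@"; do
        printf '%s\t%s\n' "${entry%%=*}" "${entry#*=}" >> "$out"
    done
}

# Exit non-zero if any line lacks exactly two non-empty tab-separated fields.
check_read_index() {
    awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad }' "$1"
}

workdir="$(mktemp -d)"
read_index="$workdir/reads.tsv"

# Hypothetical samples and read paths, for illustration only.
make_read_index "$read_index" \
    "toy_sample_1=reads/toy_sample_1.fq" \
    "toy_sample_2=reads/toy_sample_2.fq"
check_read_index "$read_index"
cat "$read_index"

# With an indexed PRG at hand, the call would then follow the documented usage:
#   pandora discover [OPTIONS] <TARGET> "$read_index"
```

The check only validates the two-column shape; it does not verify that the read files exist, which a real wrapper would probably also want to do.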

## [0.8.0]

### Added
@@ -71,7 +92,9 @@ from this point will have their changes meticulously documented here.

- k-mer coverage underflow bug in `LocalPRG` [[#183][183]]

[Unreleased]: https://github.com/olivierlacan/keep-a-changelog/compare/0.8.0...HEAD
[Unreleased]: https://github.com/rmcolq/pandora/compare/0.9.0-rc2...HEAD
[0.9.0-rc2]: https://github.com/rmcolq/pandora/releases/tag/0.9.0-rc2
[0.9.0-rc1]: https://github.com/rmcolq/pandora/releases/tag/0.9.0-rc1
[0.8.0]: https://github.com/rmcolq/pandora/releases/tag/0.8.0
[v0.7.0]: https://github.com/rmcolq/pandora/releases/tag/v0.7.0

21 changes: 17 additions & 4 deletions CMakeLists.txt
@@ -11,7 +11,8 @@ HunterGate(

# project configuration
set(PROJECT_NAME_STR pandora)
project(${PROJECT_NAME_STR} VERSION "0.8.0" LANGUAGES C CXX)
project(${PROJECT_NAME_STR} VERSION "0.9.0" LANGUAGES C CXX)
set(ADDITIONAL_VERSION_LABELS "-rc2")
configure_file( include/version.h.in ${CMAKE_BINARY_DIR}/include/version.h )

# add or not feature to print the stack trace
@@ -91,9 +92,9 @@ set(Gtest_LIBRARIES GTest::gtest GTest::gmock_main)

########################################################################################################################
# INSTALL BOOST
hunter_add_package(Boost COMPONENTS filesystem iostreams log system thread)
find_package(Boost CONFIG REQUIRED filesystem iostreams log system thread)
set(BOOST_LIBRARIES Boost::filesystem Boost::iostreams Boost::log Boost::system Boost::thread)
hunter_add_package(Boost COMPONENTS filesystem iostreams log serialization system thread)
find_package(Boost CONFIG REQUIRED filesystem iostreams log serialization system thread)
set(BOOST_LIBRARIES Boost::filesystem Boost::iostreams Boost::log Boost::serialization Boost::system Boost::thread)
set(Boost_USE_STATIC_LIBS ON)
########################################################################################################################
########################################################################################################################
@@ -104,10 +105,18 @@ set(Boost_USE_STATIC_LIBS ON)
########################################################################################################################
# PANDORA INSTALLATION
########################################################################################################################
# allows Seqan to be found
list(APPEND CMAKE_PREFIX_PATH "${PROJECT_SOURCE_DIR}/thirdparty/seqan/util/cmake")
set(SEQAN_INCLUDE_PATH "${PROJECT_SOURCE_DIR}/thirdparty/seqan/include")

# Load the SeqAn module and fail if not found
find_package (SeqAn REQUIRED)

#include directories as SYSTEM includes, thus warnings will be ignored for these
include_directories(SYSTEM
${CMAKE_BINARY_DIR}/include
${PROJECT_SOURCE_DIR}/thirdparty/cgranges/cpp
${SEQAN_INCLUDE_DIRS}
)

# normal includes: warnings will be reported for these
@@ -118,6 +127,9 @@ include_directories(
${PROJECT_SOURCE_DIR}/thirdparty/src
)

# Add definitions set by find_package (SeqAn).
add_definitions (${SEQAN_DEFINITIONS})

file(GLOB_RECURSE SRC_FILES
${PROJECT_SOURCE_DIR}/src/*.cpp
${PROJECT_SOURCE_DIR}/src/*/*.cpp
@@ -141,6 +153,7 @@ target_link_libraries(${PROJECT_NAME}
${CMAKE_DL_LIBS}
${STATIC_C_CXX}
${BACKWARD_LIBRARIES}
${SEQAN_LIBRARIES}
)

enable_testing()
2 changes: 1 addition & 1 deletion Dockerfile
@@ -24,7 +24,7 @@ ENV PANDORA_DIR "/pandora/"

COPY . $PANDORA_DIR
WORKDIR ${PANDORA_DIR}/build
RUN cmake -DCMAKE_BUILD_TYPE="$PANDORA_BUILD_TYPE" -j4 .. \
RUN cmake -DCMAKE_BUILD_TYPE="$PANDORA_BUILD_TYPE" -DHUNTER_JOBS_NUMBER=4 .. \
&& make -j4 \
&& ctest -V \
&& apt-get remove -y cmake git \
32 changes: 8 additions & 24 deletions README.md
@@ -16,6 +16,7 @@
- [Quick Start](#quick-start)
- [Hands-on toy example](#hands-on-toy-example)
- [Installation](#installation)
- [Precompiled portable binary](#no-installation-needed---precompiled-portable-binary)
- [Containers](#containers)
- [Installation from source](#installation-from-source)
- [Usage](#usage)
@@ -27,7 +28,7 @@ Pandora is a tool for bacterial genome analysis using a pangenome reference grap
The PanRG is a collection of 'floating'
local graphs (PRGs), each representing some orthologous region of interest
(e.g. genes, mobile elements or intergenic regions). See
https://github.com/rmcolq/make_prg for a pipeline which can construct
https://github.com/leoisl/make_prg for a tool which can construct
these PanRGs from a set of aligned sequence files.

Pandora can do the following for a single sample (read dataset):
@@ -66,45 +67,28 @@ pandora map <panrg.fa> <reads.fq>
## Hands-on toy example

You can test `pandora` on a toy example following [this link](example).
There is no need to have `pandora` installed, as it is run inside containers.
**There is no need to have `pandora` installed.**

## Installation

### No installation needed - precompiled portable binary

You can use `pandora` with no installation at all by simply downloading the precompiled binary, and running it.
In this binary, all libraries are linked statically, except for OpenMP.

* **Requirements**
* The only dependency required to run the precompiled binary is OpenMP 4.0+;
* The easiest way to install OpenMP 4.0+ is to have GCC 4.9 (from April 22, 2014) or more recent installed, which supports OpenMP 4.0;
* Technical details on why OpenMP can't be linked statically
can be found [here](https://gcc.gnu.org/onlinedocs/gfortran/OpenMP.html).
In this binary, all libraries are linked statically.

* **Download**:
```
wget https://github.com/rmcolq/pandora/releases/download/0.8.0/pandora-linux-precompiled-v0.8.0.gz
gunzip pandora-linux-precompiled-v0.8.0.gz
wget https://github.com/rmcolq/pandora/releases/download/0.9.0-rc2/pandora-linux-precompiled-v0.9.0-rc2
```

* **Running**:
```
chmod +x pandora-linux-precompiled-v0.8.0
./pandora-linux-precompiled-v0.8.0 -h
chmod +x pandora-linux-precompiled-v0.9.0-rc2
./pandora-linux-precompiled-v0.9.0-rc2 -h
```

* **Compatibility**: This precompiled binary works on pretty much any glibc-2.12-or-later-based x86 and x86-64 Linux distribution
released since approx 2011. A non-exhaustive list: Debian >= 7, Ubuntu >= 10.10, Red Hat Enterprise Linux >= 6,
CentOS >= 6;

* **Credits**:
* Precompilation is done using [Holy Build Box](http://phusion.github.io/holy-build-box/);
* We acknowledge Páll Melsted since we followed his [blog post](https://pmelsted.wordpress.com/2015/10/14/building-binaries-for-bioinformatics/) to build this portable binary.

* **Notes**:
* We provide precompiled binaries for Linux OS only;
* The performance of precompiled binaries is several times slower than a binary compiled from source.
The main reason is that the precompiled binary can't contain specific instructions that might speed up
the execution on specific processors, as it has to be runnable on a wide range of systems;

### Containers

2 changes: 1 addition & 1 deletion ci/script.sh
@@ -1,7 +1,7 @@
#!/bin/bash
set -evu

export CMAKE_OPTIONS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE "
export CMAKE_OPTIONS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE -DHUNTER_JOBS_NUMBER=4"

echo "$CMAKE_OPTIONS"
mkdir build
9 changes: 4 additions & 5 deletions doc/Usage.md
@@ -39,8 +39,7 @@ graphs, one entry for each gene/ genome region of interest. If you
haven't, you will need a multiple sequence alignment for each graph.
Precompiled collections of MSA representing orthologous gene clusters for
a number of species can be downloaded from [here](http://pangenome.de/)
and converted to graphs using the pipeline from
[here](https://github.com/rmcolq/make_prg).
and converted to graphs using [make_prg](https://github.com/leoisl/make_prg).

# Build index

@@ -193,19 +192,19 @@ Genotyping:
-G,--gt-conf INT Minimum genotype confidence (GT_CONF) required to make a call [default: 1]
```

# Discover novel variants
# Discover novel variants in several samples

This will look for regions in the pangraph where the reads do not map
and attempt to locally assemble these regions to find novel variants.

```
$ pandora discover --help
Quasi-map reads to an indexed PRG, infer the sequence of present loci in the sample and discover novel variants.
Usage: pandora discover [OPTIONS] <TARGET> <QUERY>
Usage: pandora discover [OPTIONS] <TARGET> <QUERY_IDX>
Positionals:
<TARGET> FILE [required] An indexed PRG file (in fasta format)
<QUERY> FILE [required] Fast{a,q} file containing reads to quasi-map
<QUERY_IDX> FILE [required] A tab-delimited file where each line is a sample identifier followed by the path to the fast{a,q} of reads for that sample
Options:
-h,--help Print this help message and exit