diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 83bf5a89..484c13c3 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,47 +1,57 @@ # nf-core/chipseq: Contributing Guidelines -Hi there! Many thanks for taking an interest in improving nf-core/chipseq. +Hi there! +Many thanks for taking an interest in improving nf-core/chipseq. -We try to manage the required tasks for nf-core/chipseq using GitHub issues, you probably came to this page when creating one. Please use the pre-filled template to save time. - -However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;) - -> If you need help using or modifying nf-core/chipseq then the best place to ask is on the pipeline channel on [Slack](https://nf-co.re/join/slack/). +We try to manage the required tasks for nf-core/chipseq using GitHub issues; you probably came to this page when creating one. +Please use the pre-filled template to save time. +However, don't be put off by this template - other more general issues and suggestions are welcome! +Contributions to the code are even more welcome ;) +> If you need help using or modifying nf-core/chipseq then the best place to ask is on the nf-core Slack [#chipseq](https://nfcore.slack.com/channels/chipseq) channel ([join our Slack here](https://nf-co.re/join/slack)). ## Contribution workflow -If you'd like to write some code for nf-core/chipseq, the standard workflow -is as follows: -1. Check that there isn't already an issue about your idea in the - [nf-core/chipseq issues](https://github.com/nf-core/chipseq/issues) to avoid - duplicating work. +If you'd like to write some code for nf-core/chipseq, the standard workflow is as follows: + +1. Check that there isn't already an issue about your idea in the [nf-core/chipseq issues](https://github.com/nf-core/chipseq/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this -2. Fork the [nf-core/chipseq repository](https://github.com/nf-core/chipseq) to your GitHub account +2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/chipseq repository](https://github.com/nf-core/chipseq) to your GitHub account 3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged. - -If you're not used to this workflow with git, you can start with some [basic docs from GitHub](https://help.github.com/articles/fork-a-repo/) or even their [excellent interactive tutorial](https://try.github.io/). +4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). ## Tests -When you create a pull request with changes, [Travis CI](https://travis-ci.com/) will run automatic tests. + +When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests. Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
There are typically two types of tests that run: ### Lint Tests -The nf-core has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. + +`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. ### Pipeline Tests -Each nf-core pipeline should be set up with a minimal set of test-data. -Travis CI then runs the pipeline on this data to ensure that it exists successfully. + +Each `nf-core` pipeline should be set up with a minimal set of test-data. +`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. If there are any failures then the automated tests fail. -These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code. +These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code. + +## Patch + +:warning: Only in the unlikely and regretful event of a release happening with a bug. + +* On your own fork, make a new branch `patch` based on `upstream/master`. +* Fix the bug, and bump version (X.Y.Z+1). +* A PR should be made on `master` from patch to directly address this particular bug. ## Getting help -For further information/help, please consult the [nf-core/chipseq documentation](https://github.com/nf-core/chipseq#documentation) and don't hesitate to get in touch on the [nf-core/chipseq pipeline channel](https://nfcore.slack.com/channels/chipseq) on [Slack](https://nf-co.re/join/slack/). + +For further information/help, please consult the [nf-core/chipseq documentation](https://nf-co.re/nf-core/chipseq/docs) and don't hesitate to get in touch on the nf-core Slack [#chipseq](https://nfcore.slack.com/channels/chipseq) channel ([join our Slack here](https://nf-co.re/join/slack)). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index b1700e7c..f394683e 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,31 +1,42 @@ +# nf-core/chipseq bug report + Hi there! -Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below: +Thanks for telling us about a problem with the pipeline. +Please delete this text and anything that's not relevant from the template below: + +## Describe the bug -#### Describe the bug A clear and concise description of what the bug is. -#### Steps to reproduce +## Steps to reproduce + Steps to reproduce the behaviour: + 1. Command line: `nextflow run ...` 2. See error: _Please provide your error message_ -#### Expected behaviour +## Expected behaviour + A clear and concise description of what you expected to happen. -#### System: - - Hardware: [e.g. HPC, Desktop, Cloud...] - - Executor: [e.g. slurm, local, awsbatch...] - - OS: [e.g. CentOS Linux, macOS, Linux Mint...] - - Version [e.g. 7, 10.13.6, 18.3...]
+## System + +- Hardware: +- Executor: +- OS: +- Version: + +## Nextflow Installation + +- Version: + +## Container engine -#### Nextflow Installation: - - Version: [e.g. 0.31.0] +- Engine: +- Version: +- Image tag: -#### Container engine: - - Engine: [e.g. Conda, Docker or Singularity] - - version: [e.g. 1.0.0] - - Image tag: [e.g. nfcore/chipseq:1.0.0] +## Additional context -#### Additional context Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 1f025b77..c3bbf194 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,16 +1,24 @@ +# nf-core/chipseq feature request + Hi there! -Thanks for suggesting a new feature for the pipeline! Please delete this text and anything that's not relevant from the template below: +Thanks for suggesting a new feature for the pipeline! +Please delete this text and anything that's not relevant from the template below: + +## Is your feature request related to a problem? Please describe -#### Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. + Ex. I'm always frustrated when [...] -#### Describe the solution you'd like +## Describe the solution you'd like + A clear and concise description of what you want to happen. -#### Describe alternatives you've considered +## Describe alternatives you've considered + A clear and concise description of any alternative solutions or features you've considered. -#### Additional context +## Additional context + Add any other context about the feature request here. diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index fa1063da..aed991c4 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,15 +1,19 @@ -Many thanks to contributing to nf-core/chipseq! +# nf-core/chipseq pull request -Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs). +Many thanks for contributing to nf-core/chipseq! + +Please fill in the appropriate checklist below (delete whatever is not relevant). +These are the most common things requested on pull requests (PRs). ## PR checklist - - [ ] This comment contains a description of changes (with reason) - - [ ] If you've fixed a bug or added code that should be tested, add tests! - - [ ] If necessary, also make a PR on the [nf-core/chipseq branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/chipseq) - - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). - - [ ] Make sure your code lints (`nf-core lint .`). - - [ ] Documentation in `docs` is updated - - [ ] `CHANGELOG.md` is updated - - [ ] `README.md` is updated - -**Learn more about contributing:** https://github.com/nf-core/chipseq/tree/master/.github/CONTRIBUTING.md + +- [ ] This comment contains a description of changes (with reason) +- [ ] If you've fixed a bug or added code that should be tested, add tests! +- [ ] If necessary, also make a PR on the [nf-core/chipseq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/chipseq) +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Make sure your code lints (`nf-core lint .`).
+- [ ] Documentation in `docs` is updated +- [ ] `CHANGELOG.md` is updated +- [ ] `README.md` is updated + +**Learn more about contributing:** [CONTRIBUTING.md](https://github.com/nf-core/chipseq/tree/master/.github/CONTRIBUTING.md) diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml index 96b12a70..cc8e0d33 100644 --- a/.github/markdownlint.yml +++ b/.github/markdownlint.yml @@ -3,3 +3,7 @@ default: true, line-length: false no-duplicate-header: siblings_only: true +MD033: + allowed_elements: [details, summary, p, img] +MD007: + indent: 4 diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 00000000..6d9ce127 --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,36 @@ +name: nf-core AWS full size tests +# This workflow is triggered on releases. +# It runs the pipeline with -profile 'test_full' on AWS batch + +on: + release: + types: [published] + +jobs: + run-awstest: + if: github.repository == 'nf-core/chipseq' + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}} + TOWER_ACCESS_TOKEN: ${{secrets.AWS_TOWER_TOKEN}} + #AWS_JOB_DEFINITION: ${{secrets.AWS_JOB_DEFINITION}} + AWS_JOB_QUEUE: ${{secrets.AWS_JOB_QUEUE}} + AWS_S3_BUCKET: ${{secrets.AWS_S3_BUCKET}} + run: | # Submits job to AWS batch using a 'nextflow-4GiB' job definition. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files. + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-chipseq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition nextflow-4GiB \ + --container-overrides '{"command": ["nf-core/chipseq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/chipseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/chipseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 00000000..aeccb336 --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,38 @@ +name: nf-core AWS test +# This workflow is triggered on push to the master branch. +# It runs the pipeline with -profile 'test' on AWS batch + +on: + push: + branches: + - master + - dev # just for testing purposes, to be removed + +jobs: + run-awstest: + if: github.repository == 'nf-core/chipseq' + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}} + TOWER_ACCESS_TOKEN: ${{secrets.AWS_TOWER_TOKEN}} + #AWS_JOB_DEFINITION: ${{secrets.AWS_JOB_DEFINITION}} + AWS_JOB_QUEUE: ${{secrets.AWS_JOB_QUEUE}} + AWS_S3_BUCKET: ${{secrets.AWS_S3_BUCKET}} + run: | # Submits job to AWS batch using a 'nextflow-4GiB' job definition. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files.
+ aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-chipseq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition nextflow-4GiB \ + --container-overrides '{"command": ["nf-core/chipseq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/chipseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/chipseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml new file mode 100644 index 00000000..aa94952d --- /dev/null +++ b/.github/workflows/branch.yml @@ -0,0 +1,16 @@ +name: nf-core branch protection +# This workflow is triggered on PRs to master branch on the repository +# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` +on: + pull_request: + branches: + - master + +jobs: + test: + runs-on: ubuntu-18.04 + steps: + # PRs are only ok if coming from an nf-core `dev` branch or a fork `patch` branch + - name: Check PRs + run: | + { [[ $(git remote get-url origin) == *nf-core/chipseq ]] && [[ ${GITHUB_HEAD_REF} = "dev" ]]; } || [[ ${GITHUB_HEAD_REF} == "patch" ]] diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100755 index 00000000..6c65240b --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,125 @@ +name: nf-core CI +# This workflow is triggered on releases and pull-requests. +# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: + push: + branches: + - dev + pull_request: + release: + types: [published] + +jobs: + test: + name: Run workflow tests + # Only run on push if this is the nf-core dev branch (merged PRs) + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/chipseq') }} + runs-on: ubuntu-latest + env: + NXF_VER: ${{ matrix.nxf_ver }} + NXF_ANSI_LOG: false + strategy: + matrix: + # Nextflow versions: check pipeline minimum and current latest + nxf_ver: ['19.10.0', ''] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . -t nfcore/chipseq:1.2.0 + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/chipseq:dev + docker tag nfcore/chipseq:dev nfcore/chipseq:1.2.0 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - name: Run pipeline with test data + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker + + parameters: + name: Test workflow parameters + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/chipseq') }} + runs-on: ubuntu-latest + env: + NXF_VER: '19.10.0' + NXF_ANSI_LOG: false + strategy: + matrix: + parameters: [--single_end, --skip_trimming, --skip_consensus_peaks] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . 
-t nfcore/chipseq:1.2.0 + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/chipseq:dev + docker tag nfcore/chipseq:dev nfcore/chipseq:1.2.0 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - name: Run pipeline with various parameters + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }} + + push_dockerhub: + name: Push new Docker image to Docker Hub + runs-on: ubuntu-latest + # Only run if the tests passed + needs: test + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/chipseq' && (github.event_name == 'release' || github.event_name == 'push') }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Build new docker image + run: docker build --no-cache . -t nfcore/chipseq:latest + + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/chipseq:latest nfcore/chipseq:dev + docker push nfcore/chipseq:dev + + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/chipseq:latest + docker tag nfcore/chipseq:latest nfcore/chipseq:${{ github.event.release.tag_name }} + docker push nfcore/chipseq:${{ github.event.release.tag_name }} diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml new file mode 100755 index 00000000..eb66c144 --- /dev/null +++ b/.github/workflows/linting.yml @@ -0,0 +1,61 @@ +name: nf-core linting +# This workflow is triggered on pushes and PRs to the repository. 
+# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines +on: + push: + pull_request: + release: + types: [published] + +jobs: + Markdown: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install markdownlint + run: npm install -g markdownlint-cli + - name: Run Markdownlint + run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + YAML: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install yaml-lint + run: npm install -g yaml-lint + - name: Run yaml-lint + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + nf-core: + runs-on: ubuntu-latest + steps: + + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - uses: actions/setup-python@v1 + with: + python-version: '3.6' + architecture: 'x64' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install nf-core + + - name: Run nf-core lint + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core lint ${GITHUB_WORKSPACE} + diff --git a/.gitignore b/.gitignore index 5b54e3e6..6354f370 100644 --- a/.gitignore +++ b/.gitignore @@ -3,5 +3,6 @@ work/ data/ results/ .DS_Store -tests/test_data +tests/ +testing/ *.pyc diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index f86c8abf..00000000 --- a/.travis.yml +++ /dev/null @@ -1,42 +0,0 @@ -sudo: required -language: python -jdk: openjdk8 -services: docker -python: '3.6' -cache: pip -matrix: - fast_finish: true - -before_install: - # PRs to master are only ok if coming from dev branch - - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && ([ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ] || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]))' - # Pull the docker image first so the test doesn't wait for this - - docker pull nfcore/chipseq:dev - # Fake the tag locally so that the pipeline runs properly - # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) - - docker tag nfcore/chipseq:dev nfcore/chipseq:1.1.0 - -install: - # Install Nextflow - - mkdir /tmp/nextflow && cd /tmp/nextflow - - wget -qO- get.nextflow.io | bash - - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow - # Install nf-core/tools - - pip install --upgrade pip - - pip install nf-core - # Reset - - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests - # Install markdownlint-cli - - sudo apt-get install npm && npm install -g markdownlint-cli - -env: - - NXF_VER='19.10.0' # Specify a minimum NF version that should be tested and work - - NXF_VER='' # Plus: get the latest NF version and check that it works - -script: - # Lint the pipeline code - - nf-core lint ${TRAVIS_BUILD_DIR} - # Lint the documentation - - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml - # Run the pipeline with the test profile - - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker -ansi-log false diff --git a/CHANGELOG.md b/CHANGELOG.md index 6c2fbe56..aafc386f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,15 +1,91 @@ # 
nf-core/chipseq: Changelog -All notable changes to this project will be documented in this file. - The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). +## [1.2.0] - 2020-07-02 + +### `Added` + +* [#138](https://github.com/nf-core/chipseq/issues/138) - Add social preview image +* [#153](https://github.com/nf-core/chipseq/issues/153) - Add plotHeatmap +* [#159](https://github.com/nf-core/chipseq/issues/159) - expose bwa mem -T parameter +* [nf-core/atacseq#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore! +* [nf-core/atacseq#75](https://github.com/nf-core/atacseq/issues/75) - Include gene annotation versions in multiqc report +* [nf-core/atacseq#76](https://github.com/nf-core/atacseq/issues/76) - featureCounts coupled to DESeq2 +* [nf-core/atacseq#79](https://github.com/nf-core/atacseq/issues/79) - Parallelize DESeq2 +* [nf-core/atacseq#97](https://github.com/nf-core/atacseq/issues/97) - PBC1, PBC2 from pipeline? +* [nf-core/atacseq#107](https://github.com/nf-core/atacseq/issues/107) - Add options to change MACS2 parameters +* Regenerated screenshots and added collapsible sections for output files in `docs/output.md` +* Update template to tools `1.9` +* Replace `set` with `tuple` and `file()` with `path()` in all processes +* Capitalise process names +* Parameters: + * `--bwa_min_score` to set minimum alignment score for BWA MEM + * `--macs_fdr` to provide FDR threshold for MACS2 peak calling + * `--macs_pvalue` to provide p-value threshold for MACS2 peak calling + * `--skip_peak_qc` to skip MACS2 peak QC plot generation + * `--skip_peak_annotation` to skip annotation of MACS2 and consensus peaks with HOMER + * `--skip_consensus_peaks` to skip consensus peak generation + * `--deseq2_vst` to use variance stabilizing transformation (VST) instead of regularized log transformation (rlog) with DESeq2 + * `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585) + +### `Removed` + +* `--tss_bed` parameter + +### `Fixed` + +* [#118](https://github.com/nf-core/chipseq/issues/118) - Running on with SGE +* [#132](https://github.com/nf-core/chipseq/issues/132) - BigWig Error: sort: cannot create temporary file in '': Read-only file system +* [#154](https://github.com/nf-core/chipseq/issues/154) - computeMatrix.val.mat.gz files not zipped +* [nf-core/atacseq#71](https://github.com/nf-core/atacseq/issues/71) - consensus_peaks.mLb.clN.boolean.intersect.plot.pdf not generated +* [nf-core/atacseq#73](https://github.com/nf-core/atacseq/issues/73) - macs_annotatePeaks.mLb.clN.summary.txt file is not created +* [nf-core/atacseq#86](https://github.com/nf-core/atacseq/issues/86) - bug in the plot_homer_annotatepeaks.r script +* [nf-core/atacseq#102](https://github.com/nf-core/atacseq/issues/102) - Incorrect Group ID assigned by featurecounts_deseq2.r +* [nf-core/atacseq#109](https://github.com/nf-core/atacseq/issues/109) - Specify custom gtf but gene bed is not generated from that gtf? 
+* Make executables in `bin/` compatible with Python 3 + +### `Dependencies` + +* Add bioconductor-biocparallel `1.20.0` +* Add markdown `3.2.2` +* Add pigz `2.3.4` +* Add pygments `2.6.1` +* Add pymdown-extensions `7.1` +* Add python `3.7.6` +* Add r-reshape2 `1.4.4` +* Add r-tidyr `1.1.0` +* Update bedtools `2.27.1` -> `2.29.2` +* Update bioconductor-deseq2 `1.20.0` -> `1.26.0` +* Update bioconductor-vsn `3.46.0` -> `3.54.0` +* Update deeptools `3.2.1` -> `3.4.3` +* Update fastqc `0.11.8` -> `0.11.9` +* Update gawk `4.2.1` -> `5.1.0` +* Update homer `4.9.1` -> `4.11` +* Update macs2 `2.1.2` -> `2.2.7.1` +* Update multiqc `1.7` -> `1.8` +* Update phantompeakqualtools `1.2` -> `1.2.2` +* Update picard `2.19.0` -> `2.23.1` +* Update pysam `0.15.2` -> `0.15.3` +* Update r-base `3.4.1` -> `3.6.2` +* Update r-ggplot2 `3.1.0` -> `3.3.2` +* Update r-lattice `0.20_35` -> `0.20_41` +* Update r-optparse `1.6.0` -> `1.6.6` +* Update r-pheatmap `1.0.10` -> `1.0.12` +* Update r-scales `1.0.0` -> `1.1.1` +* Update r-upsetr `1.3.3` -> `1.4.0` +* Update r-xfun `0.3` -> `0.15` +* Update samtools `1.9` -> `1.10` +* Update subread `1.6.4` -> `2.0.1` +* Update trim-galore `0.5.0` -> `0.6.5` +* Update ucsc-bedgraphtobigwig `377` -> `357` + ## [1.1.0] - 2019-11-05 ### `Added` -* [#46](https://github.com/nf-core/atacseq/issues/46) - Missing gene_bed path in igenomes config +* [nf-core/atacseq#46](https://github.com/nf-core/atacseq/issues/46) - Missing gene_bed path in igenomes config * Update template to tools `1.7` * Add `--trim_nextseq` parameter * Add `CITATIONS.md` file @@ -18,10 +94,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. ### `Fixed` * **Change all parameters from `camelCase` to `snake_case` (see [Deprecated](#Deprecated))** -* [#44](https://github.com/nf-core/atacseq/issues/44) - Output directory missing: macs2/consensus/deseq2 -* [#45](https://github.com/nf-core/atacseq/issues/45) - Wrong x-axis scale for the HOMER: Peak annotation Counts tab plot? -* [#46](https://github.com/nf-core/atacseq/issues/46) - Stage blacklist file in channel properly -* [#50](https://github.com/nf-core/atacseq/issues/50) - HOMER number of peaks does not correspond to found MACS2 peaks +* [nf-core/atacseq#44](https://github.com/nf-core/atacseq/issues/44) - Output directory missing: macs2/consensus/deseq2 +* [nf-core/atacseq#45](https://github.com/nf-core/atacseq/issues/45) - Wrong x-axis scale for the HOMER: Peak annotation Counts tab plot? +* [nf-core/atacseq#46](https://github.com/nf-core/atacseq/issues/46) - Stage blacklist file in channel properly +* [nf-core/atacseq#50](https://github.com/nf-core/atacseq/issues/50) - HOMER number of peaks does not correspond to found MACS2 peaks * Fixed bug in UpSetR peak intersection plot * Increase default resource requirements in `base.config` * Increase process-specific requirements based on user-reported failures diff --git a/CITATIONS.md b/CITATIONS.md index 3a7e6242..987f08bb 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -1,9 +1,14 @@ # nf-core/chipseq: Citations -## Pipeline tools +## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. ReadCube: [Full Access Link](https://rdcu.be/b1GjZ). 
+ +## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/) -* [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/) - > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools * [BWA](https://www.ncbi.nlm.nih.gov/pubmed/19451168/) > Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18. PubMed PMID: 19451168; PubMed Central PMCID: PMC2705234. @@ -98,4 +103,4 @@ * [Singularity](https://www.ncbi.nlm.nih.gov/pubmed/28494014/) > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. -* [Docker](https://www.docker.com/) +* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 1cda7600..cf930c8a 100755 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -34,7 +34,7 @@ This Code of Conduct applies both within project spaces and in public spaces whe ## Enforcement -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
diff --git a/Dockerfile b/Dockerfile index c43fecf6..0628abe4 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,13 +1,17 @@ -FROM nfcore/base:1.7 +FROM nfcore/base:1.9 LABEL authors="Philip Ewels" \ - description="Docker image containing all requirements for nf-core/chipseq pipeline" + description="Docker image containing all software requirements for the nf-core/chipseq pipeline" # Install the conda environment COPY environment.yml / RUN conda env create -f /environment.yml && conda clean -a # Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-chipseq-1.1.0/bin:$PATH +ENV PATH /opt/conda/envs/nf-core-chipseq-1.2.0/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-chipseq-1.1.0 > nf-core-chipseq-1.1.0.yml +RUN conda env export --name nf-core-chipseq-1.2.0 > nf-core-chipseq-1.2.0.yml + +# Instruct R processes to use these empty files instead of clashing with a local version +RUN touch .Rprofile +RUN touch .Renviron diff --git a/README.md b/README.md index 543f009f..09d9d5b4 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,13 @@ # ![nf-core/chipseq](docs/images/nf-core-chipseq_logo.png) -[![Build Status](https://travis-ci.com/nf-core/chipseq.svg?branch=master)](https://travis-ci.com/nf-core/chipseq) +[![GitHub Actions CI Status](https://github.com/nf-core/chipseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/chipseq/actions) +[![GitHub Actions Linting Status](https://github.com/nf-core/chipseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/chipseq/actions) [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3240506.svg)](https://doi.org/10.5281/zenodo.3240506) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/) [![Docker](https://img.shields.io/docker/automated/nfcore/chipseq.svg)](https://hub.docker.com/r/nfcore/chipseq/) -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3240506.svg)](https://doi.org/10.5281/zenodo.3240506) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23chipseq-4A154B?logo=slack)](https://nfcore.slack.com/channels/chipseq) ## Introduction @@ -47,23 +49,23 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool ## Quick Start -i. Install [`nextflow`](https://nf-co.re/usage/installation) +1. Install [`nextflow`](https://nf-co.re/usage/installation) -ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html) +2. Install either [`Docker`](https://docs.docker.com/engine/installation/) or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ -iii. Download the pipeline and test it on a minimal dataset with a single command +3. 
Download the pipeline and test it on a minimal dataset with a single command: -```bash -nextflow run nf-core/chipseq -profile test,<docker/singularity/conda> -``` + ```bash + nextflow run nf-core/chipseq -profile test,<docker/singularity/conda/institute> + ``` -> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile institute` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. -iv. Start running your own analysis! +4. Start running your own analysis! -```bash -nextflow run nf-core/chipseq -profile <docker/singularity/conda> --input design.csv --genome GRCh37 -``` + ```bash + nextflow run nf-core/chipseq -profile <docker/singularity/conda/institute> --input design.csv --genome GRCh37 + ``` See [usage docs](docs/usage.md) for all of the available options when running the pipeline. @@ -82,21 +84,27 @@ The nf-core/chipseq pipeline comes with documentation about the pipeline, found ## Credits -These scripts were orginally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)) for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden. It has since been re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London. +These scripts were originally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)) for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden. The pipeline has since been re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London. -Many thanks to others who have helped out along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@drejom](https://github.com/drejom), [@KevinMenden](https://github.com/KevinMenden), [@crickbabs](https://github.com/crickbabs), [@pditommaso](https://github.com/pditommaso).
+Many thanks to others who have helped out and contributed along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@crickbabs](https://github.com/crickbabs), [@drejom](https://github.com/drejom), [@houghtos](https://github.com/houghtos), [@KevinMenden](https://github.com/KevinMenden), [@mashehu](https://github.com/mashehu), [@pditommaso](https://github.com/pditommaso), [@Rotholandus](https://github.com/Rotholandus), [@sofiahaglund](https://github.com/sofiahaglund), [@tiagochst](https://github.com/tiagochst) and [@winni2k](https://github.com/winni2k). ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). -For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/chipseq) (you can join with [this invite](https://nf-co.re/join/slack)). +For further information or help, don't hesitate to get in touch on the [Slack `#chipseq` channel](https://nfcore.slack.com/channels/chipseq) (you can join with [this invite](https://nf-co.re/join/slack)). ## Citation If you use nf-core/chipseq for your analysis, please cite it using the following doi: [10.5281/zenodo.3240506](https://doi.org/10.5281/zenodo.3240506) -You can cite the `nf-core` pre-print as follows: -> Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1). An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. +You can cite the `nf-core` publication as follows: + +> **The nf-core framework for community-curated bioinformatics pipelines.** +> +> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. +> +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). +> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) diff --git a/assets/multiqc/deseq2_clustering_header.txt b/assets/multiqc/deseq2_clustering_header.txt index 1895dfd1..a5edcb79 100644 --- a/assets/multiqc/deseq2_clustering_header.txt +++ b/assets/multiqc/deseq2_clustering_header.txt @@ -1,11 +1,11 @@ #id: 'deseq2_clustering' -#section_name: 'DESeq2: Sample similarity' +#section_name: 'MERGED LIB: DESeq2 sample similarity' #description: " matrix is generated from clustering by Euclidean distances between # DESeq2 # rlog values for each sample # (see featurecounts_deseq2.r script)." #plot_type: 'heatmap' -#anchor: 'nfcore_chipseq-deseq2_clustering' +#anchor: 'deseq2_clustering' #pconfig: # title: 'DESeq2: Heatmap of the sample-to-sample distances' # xlab: True diff --git a/assets/multiqc/deseq2_pca_header.txt b/assets/multiqc/deseq2_pca_header.txt index c3a1c562..a086d306 100644 --- a/assets/multiqc/deseq2_pca_header.txt +++ b/assets/multiqc/deseq2_pca_header.txt @@ -1,10 +1,10 @@ #id: 'deseq2_pca' -#section_name: 'DESeq2: PCA plot' +#section_name: 'MERGED LIB: DESeq2 PCA plot' #description: "between samples in the experiment. # These values are calculated using DESeq2 # in the featurecounts_deseq2.r script."
#plot_type: 'scatter' -#anchor: 'nfcore_chipseq-deseq2_pca' +#anchor: 'deseq2_pca' #pconfig: # title: 'DESeq2: Principal component plot' # xlab: PC1 diff --git a/assets/multiqc/frip_score_header.txt b/assets/multiqc/frip_score_header.txt index 12f91cf1..82902115 100644 --- a/assets/multiqc/frip_score_header.txt +++ b/assets/multiqc/frip_score_header.txt @@ -1,10 +1,10 @@ #id: 'frip_score' -#section_name: 'MACS2: Peak FRiP score' +#section_name: 'MERGED LIB: MACS2 FRiP score' #description: "is generated by calculating the fraction of all mapped reads that fall # into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted. # See FRiP score." #plot_type: 'bargraph' -#anchor: 'nfcore_chipseq-frip_score' +#anchor: 'frip_score' #pconfig: # title: 'FRiP score' # ylab: 'FRiP score' diff --git a/assets/multiqc/peak_annotation_header.txt b/assets/multiqc/peak_annotation_header.txt index 1aff7522..2b3ee938 100644 --- a/assets/multiqc/peak_annotation_header.txt +++ b/assets/multiqc/peak_annotation_header.txt @@ -1,9 +1,9 @@ #id: 'peak_annotation' -#section_name: 'HOMER: Peak annotation' +#section_name: 'MERGED LIB: HOMER peak annotation' #description: "is generated by calculating the proportion of peaks assigned to genomic features by # HOMER annotatePeaks.pl." #plot_type: 'bargraph' -#anchor: 'nfcore_chipseq-peak_annotation' +#anchor: 'peak_annotation' #pconfig: # title: 'Peak to feature proportion' # ylab: 'Peak count' diff --git a/assets/multiqc/peak_count_header.txt b/assets/multiqc/peak_count_header.txt index 1c690277..aa4dd346 100644 --- a/assets/multiqc/peak_count_header.txt +++ b/assets/multiqc/peak_count_header.txt @@ -1,9 +1,9 @@ #id: 'peak_count' -#section_name: 'MACS2: Peak count' +#section_name: 'MERGED LIB: MACS2 peak count' #description: "is calculated from total number of peaks called by # MACS2" #plot_type: 'bargraph' -#anchor: 'nfcore_chipseq-peak_count' +#anchor: 'peak_count' #pconfig: # title: 'Total peak count' # ylab: 'Peak count' diff --git a/assets/multiqc/spp_correlation_header.txt b/assets/multiqc/spp_correlation_header.txt index 3087a3a7..ad571563 100644 --- a/assets/multiqc/spp_correlation_header.txt +++ b/assets/multiqc/spp_correlation_header.txt @@ -1,9 +1,9 @@ #id: 'strand_shift_correlation' -#section_name: 'spp: Strand-shift correlation plot' +#section_name: 'MERGED LIB: spp strand-shift correlation' #description: "generated using run_spp.R script from # phantompeakqualtools." #plot_type: 'linegraph' -#anchor: 'nfcore_chipseq-strand_shift_correlation' +#anchor: 'strand_shift_correlation' #pconfig: # title: 'Strand-shift correlation plot' # ylab: 'Cross-correlation' diff --git a/assets/multiqc/spp_nsc_header.txt b/assets/multiqc/spp_nsc_header.txt index 21121334..43370f32 100644 --- a/assets/multiqc/spp_nsc_header.txt +++ b/assets/multiqc/spp_nsc_header.txt @@ -1,9 +1,9 @@ #id: 'nsc_coefficient' -#section_name: 'spp: NSC coefficient' +#section_name: 'MERGED LIB: spp NSC coefficient' #description: "generated using run_spp.R script from # phantompeakqualtools." 
#plot_type: 'bargraph' -#anchor: 'nfcore_chipseq-nsc_coefficient' +#anchor: 'nsc_coefficient' #pconfig: # title: 'Normalized strand cross-correlation coefficient' # ylab: 'NSC coefficient' diff --git a/assets/multiqc/spp_rsc_header.txt b/assets/multiqc/spp_rsc_header.txt index 084e056e..bab5e09b 100644 --- a/assets/multiqc/spp_rsc_header.txt +++ b/assets/multiqc/spp_rsc_header.txt @@ -1,9 +1,9 @@ #id: 'rsc_coefficient' -#section_name: 'spp: RSC coefficient' +#section_name: 'MERGED LIB: spp RSC coefficient' #description: "generated using run_spp.R script from # phantompeakqualtools." #plot_type: 'bargraph' -#anchor: 'nfcore_chipseq-rsc_coefficient' +#anchor: 'rsc_coefficient' #pconfig: # title: 'Relative strand cross-correlation coefficient' # ylab: 'RSC coefficient' diff --git a/assets/multiqc/multiqc_config.yaml b/assets/multiqc_config.yaml similarity index 59% rename from assets/multiqc/multiqc_config.yaml rename to assets/multiqc_config.yaml index 5af6e34f..6d5da1d6 100644 --- a/assets/multiqc/multiqc_config.yaml +++ b/assets/multiqc_config.yaml @@ -3,90 +3,85 @@ report_comment: > analysis pipeline. For information about how to interpret these results, please see the documentation. -export_plots: true +data_format: 'yaml' -fn_clean_exts: - - 'fastq.gz' - - '_trimmed' - - '_val' - - 'sorted.bam' - - '.Lb' - - 'mkD' - - 'clN' - - 'mLb' - - '_peaks' - - '_spp' - - '.spp' +run_modules: + - custom_content + - fastqc + - cutadapt + - samtools + - picard + - preseq + - featureCounts + - deeptools + - phantompeakqualtools + +exclude_modules: + - 'general_stats' module_order: - fastqc: - name: 'FastQC (library; raw)' - info: 'This section of the report shows FastQC results before adapter trimming.' + name: 'LIB: FastQC (raw)' + info: 'This section of the report shows FastQC results before adapter trimming for individual libraries.' path_filters: - - '*_fastqc.zip' - path_filters_exclude: - - '*val*_fastqc.zip' - - '*trimmed_fastqc.zip' + - './fastqc/*.zip' - cutadapt: - name: 'cutadapt (library; trimmed)' - info: 'This section of the report shows the length of trimmed reads by cutadapt.' - path_filters: - - '*_trimming_report.txt' + name: 'LIB: cutadapt (trimmed)' + info: 'This section of the report shows the length of trimmed reads by cutadapt for individual libraries.' - fastqc: - name: 'FastQC (library; trimmed)' - info: 'This section of the report shows FastQC results after adapter trimming.' + name: 'LIB: FastQC (trimmed)' + info: 'This section of the report shows FastQC results after adapter trimming for individual libraries.' path_filters: - - '*val*_fastqc.zip' - - '*trimmed_fastqc.zip' + - './trimgalore/fastqc/*.zip' - samtools: - name: 'SAMTools (library)' + name: 'LIB: SAMTools' info: 'This section of the report shows SAMTools results for individual libraries.' path_filters: - - '*.Lb.sorted.bam*' + - './alignment/library/*' - samtools: - name: 'SAMTools (merged library; unfiltered)' + name: 'MERGED LIB: SAMTools (unfiltered)' info: 'This section of the report shows SAMTools results after merging libraries and before filtering.' path_filters: - - '*mLb.mkD.sorted.bam*' + - './alignment/mergedLibrary/*.mLb.mkD.sorted.bam*' - preseq: - name: 'Preseq (merged library; unfiltered)' + name: 'MERGED LIB: Preseq (unfiltered)' info: 'This section of the report shows Preseq results after merging libraries and before filtering.' 
- path_filters: - '*mLb*' - samtools: - name: 'SAMTools (merged library; filtered)' + name: 'MERGED LIB: SAMTools (filtered)' info: 'This section of the report shows SAMTools results after merging libraries and after filtering.' path_filters: - - '*mLb.clN.sorted.bam*' + - './alignment/mergedLibrary/*.mLb.clN.sorted.bam*' - picard: - name: 'Picard (merged library; filtered)' + name: 'MERGED LIB: Picard' info: 'This section of the report shows picard results after merging libraries and after filtering.' path_filters: - - '*mLb*' + - './alignment/mergedLibrary/picard_metrics/*' - deeptools: - name: 'deepTools' + name: 'MERGED LIB: deepTools' + anchor: 'mlib_deeptools' info: 'This section of the report shows ChIP-seq QC plots generated by deepTools.' - path_filters: - - '*.plot*' - featureCounts: - name: 'featureCounts' + name: 'MERGED LIB: featureCounts' + anchor: 'mlib_featurecounts' info: 'This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks.' path_filters: - - '*featureCounts*' + - './macs/consensus/*.summary' report_section_order: peak_count: - order: -1000 + before: mlib_deeptools frip_score: - order: -1100 + before: peak_count + peak_annotation: + before: frip_score strand_shift_correlation: - order: -1200 + before: peak_annotation nsc_coefficient: - order: -1300 + before: strand_shift_correlation rsc_coefficient: - order: -1400 - peak_annotation: - order: -1500 + before: nsc_coefficient + mlib_featurecounts: + before: rsc_coefficient deseq2_pca_1: order: -1600 deseq2_pca_2: @@ -133,9 +128,37 @@ report_section_order: order: -3700 custom_plot_config: - picard-insertsize: + picard_insert_size: cpswitch_c_active: False smooth_points: 1000 - featurecounts: cpswitch_c_active: False + +extra_fn_clean_exts: + - 'fastq.gz' + - '_trimmed' + - '_val' + - 'sorted.bam' + - '.Lb' + - 'mkD' + - 'clN' + - 'mLb' + - '_peaks' + - '_spp' + - '.spp' + - 'ccurve' + +# # Customise the module search patterns to speed up execution time # # - Skip module sub-tools that we are not interested in # # - Replace file-content searching with filename pattern searching # # - Don't add anything that is the same as the MultiQC default # # See https://multiqc.info/docs/#optimise-file-search-patterns for details +sp: + cutadapt: + fn: '*trimming_report.txt' + preseq: + fn: '*.ccurve.txt' + deeptools/plotFingerprintOutRawCounts: + fn: '*plotFingerprint*' + deeptools/plotProfile: + fn: '*plotProfile*' diff --git a/assets/nf-core-chipseq_social_preview.png b/assets/nf-core-chipseq_social_preview.png new file mode 100644 index 00000000..57778def Binary files /dev/null and b/assets/nf-core-chipseq_social_preview.png differ diff --git a/assets/nf-core-chipseq_social_preview.svg b/assets/nf-core-chipseq_social_preview.svg new file mode 100644 index 00000000..078dfb8c --- /dev/null +++ b/assets/nf-core-chipseq_social_preview.svg @@ -0,0 +1,446 @@ + [446 lines of SVG markup (image/svg+xml) stripped during extraction; recoverable embedded text: "ChIP-seq peak-calling, QC and differential analysis pipeline", "chipseq"] diff --git a/bin/bampe_rm_orphan.py b/bin/bampe_rm_orphan.py index c11d4e67..5b0a6f72 100755 --- a/bin/bampe_rm_orphan.py +++ b/bin/bampe_rm_orphan.py @@ -8,6 +8,7 @@ import os import pysam +import errno import argparse ############################################ @@ -61,10 +62,11
@@ def bampe_rm_orphan(BAMIn,BAMOut,onlyFRPairs=False): ## ITERATE THROUGH BAM FILE EOF = 0 - SAMFin = pysam.Samfile(BAMIn,"rb") - SAMFout = pysam.Samfile(BAMOut, "wb",header=SAMFin.header) - currRead = SAMFin.next() - for read in SAMFin: + SAMFin = pysam.AlignmentFile(BAMIn, "rb") + SAMFout = pysam.AlignmentFile(BAMOut, "wb", header=SAMFin.header) + iter = SAMFin.fetch(until_eof=True) + currRead = next(iter) + for read in iter: totalReads += 1 if currRead.qname == read.qname: pair1 = currRead; pair2 = read @@ -103,7 +105,7 @@ def bampe_rm_orphan(BAMIn,BAMOut,onlyFRPairs=False): ## RESET COUNTER try: totalReads += 1 - currRead = SAMFin.next() + currRead = next(iter) except: StopIteration EOF = 1 diff --git a/bin/check_design.py b/bin/check_design.py index 9c263b8f..51a99375 100755 --- a/bin/check_design.py +++ b/bin/check_design.py @@ -8,7 +8,6 @@ import os import sys -import requests import argparse ############################################ @@ -43,7 +42,7 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile): fin = open(DesignFile,'r') header = fin.readline().strip().split(',') if header != HEADER: - print "{} header: {} != {}".format(ERROR_STR,','.join(header),','.join(HEADER)) + print("{} header: {} != {}".format(ERROR_STR,','.join(header),','.join(HEADER))) sys.exit(1) numColList = [] @@ -58,48 +57,48 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile): ## CHECK VALID NUMBER OF COLUMNS PER SAMPLE numCols = len(lspl) if numCols not in [6]: - print "{}: Invalid number of columns (should be 6)!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Invalid number of columns (should be 6)!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) numColList.append(numCols) ## CHECK GROUP ID DOESNT CONTAIN SPACES if group.find(' ') != -1: - print "{}: Group id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Group id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) ## CHECK REPLICATE COLUMN IS INTEGER if not replicate.isdigit(): - print "{}: Replicate id not an integer!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Replicate id not an integer!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) replicate = int(replicate) for fastq in fastQFiles: ## CHECK FASTQ FILE EXTENSION if fastq[-9:] != '.fastq.gz' and fastq[-6:] != '.fq.gz': - print "{}: FastQ file has incorrect extension (has to be '.fastq.gz' or 'fq.gz') - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip()) + print("{}: FastQ file has incorrect extension (has to be '.fastq.gz' or 'fq.gz') - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip())) sys.exit(1) ## CREATE GROUP MAPPING DICT = {GROUP_ID: {REPLICATE_ID:[[FASTQ_FILES]]} - if not sampleMappingDict.has_key(group): + if group not in sampleMappingDict: sampleMappingDict[group] = {} - if not sampleMappingDict[group].has_key(replicate): + if replicate not in sampleMappingDict[group]: sampleMappingDict[group][replicate] = [] sampleMappingDict[group][replicate].append(fastQFiles) ## CHECK BOTH ANTIBODY AND CONTROL COLUMNS HAVE VALID VALUES if antibody: if antibody.find(' ') != -1: - print "{}: Antibody id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Antibody id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) if not control: - print "{}: both Antibody and Control must be specified!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: both Antibody and Control must be specified!\nLine: '{}'".format(ERROR_STR,line.strip())) 
sys.exit(1) if control: if control.find(' ') != -1: - print "{}: Control id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: Control id contains spaces!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) if not antibody: - print "{}: both Antibody and Control must be specified!\nLine: '{}'".format(ERROR_STR,line.strip()) + print("{}: both Antibody and Control must be specified!\nLine: '{}'".format(ERROR_STR,line.strip())) sys.exit(1) ## CREATE ANTIBODY MAPPING CONTROL DICT @@ -112,12 +111,12 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile): ## CHECK IF DATA IS PAIRED-END OR SINGLE-END AND NOT A MIXTURE if min(numColList) != max(numColList): - print "{}: Mixture of paired-end and single-end reads!".format(ERROR_STR) + print("{}: Mixture of paired-end and single-end reads!".format(ERROR_STR)) sys.exit(1) ## CHECK IF ANTIBODY AND CONTROL COLUMNS HAVE BEEN SPECIFIED AT LEAST ONCE if len(antibodyDict) == 0: - print "{}: Antibody and Control must be specified at least once!".format(ERROR_STR) + print("{}: Antibody and Control must be specified at least once!".format(ERROR_STR)) sys.exit(1) ## WRITE READ MAPPING FILE @@ -129,7 +128,7 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile): ## CHECK THAT REPLICATE IDS ARE IN FORMAT 1.. uniq_rep_ids = set(sampleMappingDict[group].keys()) if len(uniq_rep_ids) != max(uniq_rep_ids): - print "{}: Replicate IDs must start with 1..\nGroup: {}, Replicate IDs: {}".format(ERROR_STR,group,list(uniq_rep_ids)) + print("{}: Replicate IDs must start with 1..\nGroup: {}, Replicate IDs: {}".format(ERROR_STR,group,list(uniq_rep_ids))) sys.exit(1) ## RECONSTRUCT LINE FOR SAMPLE IN DESIGN @@ -145,21 +144,21 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile): fout.write(','.join(oList) + '\n') ## EXTRAPOLATE CONTROL COLUMN - if antibodyDict.has_key(group): + if group in antibodyDict: antibody,control = antibodyDict[group] if control in sampleMappingDict.keys(): control_id = "{}_R1".format(control) - if sampleMappingDict[control].has_key(replicate): + if replicate in sampleMappingDict[control]: control_id = "{}_R{}".format(control,replicate) - if not antibodyGroupDict.has_key(antibody): + if antibody not in antibodyGroupDict: antibodyGroupDict[antibody] = {} - if not antibodyGroupDict[antibody].has_key(group): + if group not in antibodyGroupDict[antibody]: antibodyGroupDict[antibody][group] = [] antibodyList = [sample_id[:-3],control_id] if not antibodyList in antibodyGroupDict[antibody][group]: antibodyGroupDict[antibody][group].append(antibodyList) else: - print "{}: Control id not a valid group\nControl id: {}, Valid Groups: {}".format(ERROR_STR,control,sorted(sampleMappingDict.keys())) + print("{}: Control id not a valid group\nControl id: {}, Valid Groups: {}".format(ERROR_STR,control,sorted(sampleMappingDict.keys()))) sys.exit(1) fout.close() diff --git a/bin/featurecounts_deseq2.r b/bin/featurecounts_deseq2.r index c6683cbb..b75335a5 100755 --- a/bin/featurecounts_deseq2.r +++ b/bin/featurecounts_deseq2.r @@ -25,6 +25,7 @@ library(ggplot2) library(RColorBrewer) library(pheatmap) library(lattice) +library(BiocParallel) ################################################ ################################################ @@ -32,11 +33,13 @@ library(lattice) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--featurecount_file"), type="character", default=NULL, help="Feature count file generated by 
the SubRead featureCounts command.", metavar="featurecount_file"), - make_option(c("-b", "--bam_suffix"), type="character", default=NULL, help="Portion of filename after sample name in featurecount file header e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'", metavar="bam_suffix"), +option_list <- list(make_option(c("-i", "--featurecount_file"), type="character", default=NULL, help="Feature count file generated by the SubRead featureCounts command.", metavar="path"), + make_option(c("-b", "--bam_suffix"), type="character", default=NULL, help="Portion of filename after sample name in featurecount file header e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'", metavar="string"), make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"), - make_option(c("-p", "--outprefix"), type="character", default='differential', help="Output prefix", metavar="character"), - make_option(c("-s", "--outsuffix"), type="character", default='', help="Output suffix for comparison-level results", metavar="character")) + make_option(c("-p", "--outprefix"), type="character", default='differential', help="Output prefix", metavar="string"), + make_option(c("-s", "--outsuffix"), type="character", default='', help="Output suffix for comparison-level results", metavar="string"), + make_option(c("-v", "--vst"), type="logical", default=FALSE, help="Run vst transform instead of rlog", metavar="boolean"), + make_option(c("-c", "--cores"), type="integer", default=1, help="Number of cores", metavar="integer")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) @@ -75,7 +78,7 @@ if (file.exists(opt$outdir) == FALSE) { setwd(opt$outdir) samples.vec <- sort(colnames(count.table)) -groups <- substr(samples.vec, 1, nchar(samples.vec)-3) +groups <- sub("_[^_]+$", "", samples.vec) print(unique(groups)) if (length(unique(groups)) == 1) { quit(save = "no", status = 0, runLast = FALSE) @@ -86,8 +89,12 @@ if (file.exists(DDSFile) == FALSE) { counts <- count.table[,samples.vec,drop=FALSE] coldata <- data.frame(row.names=colnames(counts),condition=groups) dds <- DESeqDataSetFromMatrix(countData = round(counts), colData = coldata, design = ~ condition) - dds <- DESeq(dds) - rld <- rlog(dds) + dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(opt$cores)) + if (!opt$vst) { + rld <- rlog(dds) + } else { + rld <- vst(dds) + } save(dds,rld,file=DDSFile) } diff --git a/bin/igv_files_to_session.py b/bin/igv_files_to_session.py index 6dc6aabe..48e749c8 100755 --- a/bin/igv_files_to_session.py +++ b/bin/igv_files_to_session.py @@ -7,6 +7,7 @@ ####################################################################### import os +import errno import argparse ############################################ diff --git a/bin/macs2_merged_expand.py b/bin/macs2_merged_expand.py index 10889947..f4e84a14 100755 --- a/bin/macs2_merged_expand.py +++ b/bin/macs2_merged_expand.py @@ -7,6 +7,7 @@ ####################################################################### import os +import errno import argparse ############################################ @@ -87,9 +88,9 @@ def macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa groupDict = {} for sID in ['_'.join(x.split('_')[:-2]) for x in names]: gID = '_'.join(sID.split('_')[:-1]) - if not groupDict.has_key(gID): + if gID not in groupDict: groupDict[gID] = [] - if not sID in groupDict[gID]: + if sID not in groupDict[gID]: groupDict[gID].append(sID) ## GET SAMPLES THAT PASS REPLICATE THRESHOLD @@ -103,23 +104,23 @@ def 
macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa for idx in range(len(names)): sample = '_'.join(names[idx].split('_')[:-2]) if sample in passRepThreshList: - if not fcDict.has_key(sample): + if sample not in fcDict: fcDict[sample] = [] fcDict[sample].append(str(fcs[idx])) - if not qvalDict.has_key(sample): + if sample not in qvalDict: qvalDict[sample] = [] qvalDict[sample].append(str(qvals[idx])) - if not pvalDict.has_key(sample): + if sample not in pvalDict: pvalDict[sample] = [] pvalDict[sample].append(str(pvals[idx])) - if not startDict.has_key(sample): + if sample not in startDict: startDict[sample] = [] startDict[sample].append(str(starts[idx])) - if not endDict.has_key(sample): + if sample not in endDict: endDict[sample] = [] endDict[sample].append(str(ends[idx])) if isNarrow: - if not summitDict.has_key(sample): + if sample not in summitDict: summitDict[sample] = [] summitDict[sample].append(str(summits[idx])) @@ -138,7 +139,7 @@ def macs2_merged_expand(MergedIntervalTxtFile,SampleNameList,OutFile,isNarrow=Fa fout.write('\t'.join(oList) + '\n') tsamples = tuple(sorted(samples)) - if not combFreqDict.has_key(tsamples): + if tsamples not in combFreqDict: combFreqDict[tsamples] = 0 combFreqDict[tsamples] += 1 totalOutIntervals += 1 diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py new file mode 100755 index 00000000..57cc4263 --- /dev/null +++ b/bin/markdown_to_html.py @@ -0,0 +1,100 @@ +#!/usr/bin/env python +from __future__ import print_function +import argparse +import markdown +import os +import sys + +def convert_markdown(in_fn): + input_md = open(in_fn, mode="r", encoding="utf-8").read() + html = markdown.markdown( + "[TOC]\n" + input_md, + extensions = [ + 'pymdownx.extra', + 'pymdownx.b64', + 'pymdownx.highlight', + 'pymdownx.emoji', + 'pymdownx.tilde', + 'toc' + ], + extension_configs = { + 'pymdownx.b64': { + 'base_path': os.path.dirname(in_fn) + }, + 'pymdownx.highlight': { + 'noclasses': True + }, + 'toc': { + 'title': 'Table of Contents' + } + } + ) + return html + +def wrap_html(contents): + header = """ + + + + + +
+ """ + footer = """ +
+ + + """ + return header + contents + footer + + +def parse_args(args=None): + parser = argparse.ArgumentParser() + parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?', + help='File to convert. Defaults to stdin.') + parser.add_argument('-o', '--out', type=argparse.FileType('w'), + default=sys.stdout, + help='Output file name. Defaults to stdout.') + return parser.parse_args(args) + +def main(args=None): + args = parse_args(args) + converted_md = convert_markdown(args.mdfile.name) + html = wrap_html(converted_md) + args.out.write(html) + +if __name__ == '__main__': + sys.exit(main()) diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r deleted file mode 100755 index abe13350..00000000 --- a/bin/markdown_to_html.r +++ /dev/null @@ -1,51 +0,0 @@ -#!/usr/bin/env Rscript - -# Command line argument processing -args = commandArgs(trailingOnly=TRUE) -if (length(args) < 2) { - stop("Usage: markdown_to_html.r ", call.=FALSE) -} -markdown_fn <- args[1] -output_fn <- args[2] - -# Load / install packages -if (!require("markdown")) { - install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') - library("markdown") -} - -base_css_fn <- getOption("markdown.HTML.stylesheet") -base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) -custom_css <- paste(base_css, " -body { - padding: 3em; - margin-right: 350px; - max-width: 100%; -} -#toc { - position: fixed; - right: 20px; - width: 300px; - padding-top: 20px; - overflow: scroll; - height: calc(100% - 3em - 20px); -} -#toc_header { - font-size: 1.8em; - font-weight: bold; -} -#toc > ul { - padding-left: 0; - list-style-type: none; -} -#toc > ul ul { padding-left: 20px; } -#toc > ul > li > a { display: none; } -img { max-width: 800px; } -") - -markdownToHTML( - file = markdown_fn, - output = output_fn, - stylesheet = custom_css, - options = c('toc', 'base64_images', 'highlight_code') -) diff --git a/bin/plot_homer_annotatepeaks.r b/bin/plot_homer_annotatepeaks.r index 9865357c..4a867d8f 100755 --- a/bin/plot_homer_annotatepeaks.r +++ b/bin/plot_homer_annotatepeaks.r @@ -17,10 +17,10 @@ library(scales) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--homer_files"), type="character", default=NULL, help="Comma-separated list of homer annotated text files.", metavar="anno_files"), - make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with homer annotated text files. Must be unique and in same order as homer files input.", metavar="sampleids"), +option_list <- list(make_option(c("-i", "--homer_files"), type="character", default=NULL, help="Comma-separated list of homer annotated text files.", metavar="path"), + make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with homer annotated text files. 
Must be unique and in same order as homer files input.", metavar="string"), make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"), - make_option(c("-p", "--outprefix"), type="character", default='homer_annotation', help="Output prefix", metavar="character")) + make_option(c("-p", "--outprefix"), type="character", default='homer_annotation', help="Output prefix", metavar="string")) opt_parser <- OptionParser(option_list=option_list) opt <- parse_args(opt_parser) @@ -57,12 +57,21 @@ plot.feature.dat <- data.frame() for (idx in 1:length(HomerFiles)) { sampleid = SampleIDs[idx] - anno.dat <- read.table(HomerFiles[idx], sep="\t", header=TRUE,quote="") + anno.dat <- read.csv(HomerFiles[idx], sep="\t", header=TRUE) anno.dat <- anno.dat[,c("Annotation","Distance.to.TSS","Nearest.PromoterID")] - anno.dat <- anno.dat[which(!is.na(anno.dat$Distance.to.TSS)),] - if (nrow(anno.dat) == 0) { - quit(save = "no", status = 0, runLast = FALSE) - } + + ## REPLACE UNASSIGNED FEATURE ENTRIES WITH SENSIBLE VALUES + unassigned <- which(is.na(as.character(anno.dat$Distance.to.TSS))) + anno.dat$Distance.to.TSS[unassigned] <- 1000000 + + anno.dat$Annotation <- as.character(anno.dat$Annotation) + anno.dat$Annotation[unassigned] <- "Unassigned" + anno.dat$Annotation <- as.factor(anno.dat$Annotation) + + anno.dat$Nearest.PromoterID <- as.character(anno.dat$Nearest.PromoterID) + anno.dat$Nearest.PromoterID[unassigned] <- "Unassigned" + anno.dat$Nearest.PromoterID <- as.factor(anno.dat$Nearest.PromoterID) + anno.dat$name <- rep(sampleid,nrow(anno.dat)) anno.dat$Distance.to.TSS <- abs(anno.dat$Distance.to.TSS) + 1 plot.dat <- rbind(plot.dat,anno.dat) diff --git a/bin/plot_macs_qc.r b/bin/plot_macs_qc.r index 2360505f..b8e25d56 100755 --- a/bin/plot_macs_qc.r +++ b/bin/plot_macs_qc.r @@ -17,10 +17,10 @@ library(scales) ################################################ ################################################ -option_list <- list(make_option(c("-i", "--peak_files"), type="character", default=NULL, help="Comma-separated list of peak files.", metavar="peak_files"), - make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with peak files. Must be unique and in same order as peaks files input.", metavar="sampleids"), +option_list <- list(make_option(c("-i", "--peak_files"), type="character", default=NULL, help="Comma-separated list of peak files.", metavar="path"), + make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with peak files. 
Must be unique and in same order as peaks files input.", metavar="string"),
                    make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"),
-                   make_option(c("-p", "--outprefix"), type="character", default='macs2_peakqc', help="Output prefix", metavar="character"))
+                   make_option(c("-p", "--outprefix"), type="character", default='macs2_peakqc', help="Output prefix", metavar="string"))

 opt_parser <- OptionParser(option_list=option_list)
 opt <- parse_args(opt_parser)
diff --git a/bin/plot_peak_intersect.r b/bin/plot_peak_intersect.r index 2404b8b9..513e44b3 100755 --- a/bin/plot_peak_intersect.r +++ b/bin/plot_peak_intersect.r @@ -15,8 +15,8 @@ library(UpSetR) ################################################ ################################################
-option_list <- list(make_option(c("-i", "--input_file"), type="character", default=NULL, help="Path to tab-delimited file containing two columns i.e sample1&sample2&sample3 indicating intersect between samples set size.", metavar="input_file"),
-                   make_option(c("-o", "--output_file"), type="character", default=NULL, help="Path to output file with '.pdf' extension.", metavar="output_file"))
+option_list <- list(make_option(c("-i", "--input_file"), type="character", default=NULL, help="Path to tab-delimited file containing two columns i.e. sample1&sample2&sample3 indicating the size of each sample intersection set.", metavar="path"),
+                   make_option(c("-o", "--output_file"), type="character", default=NULL, help="Path to output file with '.pdf' extension.", metavar="path"))

 opt_parser <- OptionParser(option_list=option_list)
 opt <- parse_args(opt_parser)
@@ -44,11 +44,13 @@ if (file.exists(OutDir) == FALSE) {
 comb.dat <- read.table(opt$input_file,sep="\t",header=FALSE)
 comb.vec <- comb.dat[,2]
 comb.vec <- setNames(comb.vec,comb.dat[,1])
- sets <- sort(unique(unlist(strsplit(names(comb.vec),split='&'))), decreasing = TRUE)
+
 nintersects = length(names(comb.vec))
 if (nintersects > 70) {
     nintersects <- 70
+    comb.vec <- sort(comb.vec, decreasing = TRUE)[1:70]
+    sets <- sort(unique(unlist(strsplit(names(comb.vec),split='&'))), decreasing = TRUE)
 }

 pdf(opt$output_file,onefile=F,height=10,width=20)
diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 4e8dfbcd..6f61aa86 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -13,7 +13,7 @@
 'BEDTools': ['v_bedtools.txt', r"bedtools v(\S+)"],
 'BamTools': ['v_bamtools.txt', r"bamtools (\S+)"],
 'deepTools': ['v_deeptools.txt', r"plotFingerprint (\S+)"],
- 'Picard': ['v_picard.txt', r"([\d\.]+)-SNAPSHOT"],
+ 'Picard': ['v_picard.txt', r"\n(\S+)"],
 'R': ['v_R.txt', r"R version (\S+)"],
 'Pysam': ['v_pysam.txt', r"(\S+)"],
 'MACS2': ['v_macs2.txt', r"macs2 (\S+)"],
@@ -52,9 +52,9 @@
         results[k] = "v{}".format(match.group(1))
     except IOError:
         results[k] = False
-
+
# Remove software set to false in results
-for k in results:
+for k in list(results):
     if not results[k]:
         del(results[k])
diff --git a/conf/awsbatch.config b/conf/awsbatch.config deleted file mode 100755 index 14af5866..00000000 --- a/conf/awsbatch.config +++ /dev/null @@ -1,18 +0,0 @@
-/*
- * -------------------------------------------------
- *  Nextflow config file for running on AWS batch
- * -------------------------------------------------
- * Base config needed for running with -profile awsbatch
- */
-params {
-  config_profile_name = 'AWSBATCH'
-  config_profile_description = 'AWSBATCH Cloud Profile'
-  config_profile_contact = 'Alexander Peltzer
(@apeltzer)' - config_profile_url = 'https://aws.amazon.com/de/batch/' -} - -aws.region = params.awsregion -process.executor = 'awsbatch' -process.queue = params.awsqueue -executor.awscli = '/home/ec2-user/miniconda/bin/aws' -params.tracedir = './' diff --git a/conf/base.config b/conf/base.config index 722a01ad..f6b9dec3 100644 --- a/conf/base.config +++ b/conf/base.config @@ -12,7 +12,7 @@ process { cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } @@ -22,22 +22,25 @@ process { // Process-specific resource requirements withLabel:process_low { cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 14.GB * task.attempt, 'memory' ) } + memory = { check_max( 12.GB * task.attempt, 'memory' ) } time = { check_max( 6.h * task.attempt, 'time' ) } } withLabel:process_medium { cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 42.GB * task.attempt, 'memory' ) } + memory = { check_max( 36.GB * task.attempt, 'memory' ) } time = { check_max( 8.h * task.attempt, 'time' ) } } withLabel:process_high { cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 84.GB * task.attempt, 'memory' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } time = { check_max( 16.h * task.attempt, 'time' ) } } withLabel:process_long { time = { check_max( 20.h * task.attempt, 'time' ) } } + withLabel:error_ignore { + errorStrategy = 'ignore' + } withName:get_software_versions { cache = false } diff --git a/conf/igenomes.config b/conf/igenomes.config index 37217cf4..2de92422 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -18,6 +18,7 @@ params { bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" mito_name = "MT" macs_gsize = "2.7e9" blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" @@ -42,6 +43,7 @@ params { bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" mito_name = "MT" macs_gsize = "1.87e9" blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" @@ -54,6 +56,7 @@ params { bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" mito_name = "Mt" } 'EB2' { @@ -64,6 +67,7 @@ params { bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = 
"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" } 'UMD3.1' { fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" @@ -73,6 +77,7 @@ params { bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" mito_name = "MT" } 'WBcel235' { @@ -94,6 +99,7 @@ params { bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" mito_name = "MT" } 'GRCz10' { @@ -125,6 +131,7 @@ params { bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" mito_name = "MT" } 'EB1' { @@ -135,6 +142,7 @@ params { bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" } 'Galgal4' { fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" @@ -154,6 +162,7 @@ params { bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" } 'Mmul_1' { fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" @@ -163,6 +172,7 @@ params { bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" mito_name = "MT" } 'IRGSP-1.0' { @@ -183,6 +193,7 @@ params { bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" mito_name = "MT" } 'Rnor_6.0' { @@ -214,6 +225,7 @@ params { bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" bed12 
= "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" mito_name = "MT" macs_gsize = "1.21e7" } @@ -225,6 +237,7 @@ params { bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" } 'Sscrofa10.2' { fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" @@ -234,6 +247,7 @@ params { bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" mito_name = "MT" } 'AGPv3' { @@ -266,6 +280,7 @@ params { bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" mito_name = "chrM" macs_gsize = "2.7e9" blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" @@ -278,6 +293,7 @@ params { bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" mito_name = "chrM" macs_gsize = "1.87e9" blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" @@ -300,6 +316,7 @@ params { bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" mito_name = "chrM" macs_gsize = "9e7" } @@ -311,6 +328,7 @@ params { bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" mito_name = "chrM" } 'danRer10' { @@ -342,6 +360,7 @@ params { bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" mito_name = "chrM" } 'galGal4' { @@ -352,6 +371,7 @@ params { bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" bed12 
= "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" mito_name = "chrM" } 'panTro4' { @@ -362,6 +382,7 @@ params { bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" mito_name = "chrM" } 'rn6' { @@ -380,6 +401,7 @@ params { bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" mito_name = "chrM" macs_gsize = "1.2e7" } @@ -391,6 +413,7 @@ params { bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" mito_name = "chrM" } } diff --git a/conf/test.config b/conf/test.config index 2fdcb4e8..e3b4016c 100644 --- a/conf/test.config +++ b/conf/test.config @@ -4,13 +4,14 @@ * ------------------------------------------------- * Defines bundled input files and everything required * to run a fast and simple test. Use as follows: - * nextflow run nf-core/chipseq -profile test + * nextflow run nf-core/chipseq -profile test, */ params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' + // Limit resources so that this can run on GitHub Actions max_cpus = 2 max_memory = 6.GB max_time = 12.h diff --git a/conf/test_full.config b/conf/test_full.config new file mode 100755 index 00000000..3875f029 --- /dev/null +++ b/conf/test_full.config @@ -0,0 +1,20 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a full pipeline test. 
Use as follows: + * nextflow run nf-core/chipseq -profile test_full, + */ + +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Full test dataset to check pipeline function' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/design_full.csv' + single_end = true + + // Genome references + genome = 'hg19' +} diff --git a/docs/images/mqc_annotatePeaks_feature_percentage_plot.png b/docs/images/mqc_annotatePeaks_feature_percentage_plot.png index 6551111a..76de4813 100755 Binary files a/docs/images/mqc_annotatePeaks_feature_percentage_plot.png and b/docs/images/mqc_annotatePeaks_feature_percentage_plot.png differ diff --git a/docs/images/mqc_cutadapt_plot.png b/docs/images/mqc_cutadapt_plot.png index 43b51c5c..7fcc4348 100755 Binary files a/docs/images/mqc_cutadapt_plot.png and b/docs/images/mqc_cutadapt_plot.png differ diff --git a/docs/images/mqc_deeptools_plotFingerprint_plot.png b/docs/images/mqc_deeptools_plotFingerprint_plot.png index 41cc7181..6bc3a31b 100755 Binary files a/docs/images/mqc_deeptools_plotFingerprint_plot.png and b/docs/images/mqc_deeptools_plotFingerprint_plot.png differ diff --git a/docs/images/mqc_deeptools_plotProfile_plot.png b/docs/images/mqc_deeptools_plotProfile_plot.png index e1959daf..812dffc5 100755 Binary files a/docs/images/mqc_deeptools_plotProfile_plot.png and b/docs/images/mqc_deeptools_plotProfile_plot.png differ diff --git a/docs/images/mqc_deseq2_pca_plot.png b/docs/images/mqc_deseq2_pca_plot.png index 9fd98094..874001a6 100755 Binary files a/docs/images/mqc_deseq2_pca_plot.png and b/docs/images/mqc_deseq2_pca_plot.png differ diff --git a/docs/images/mqc_featureCounts_assignment_plot.png b/docs/images/mqc_featureCounts_assignment_plot.png index 0cb29d88..62546e88 100755 Binary files a/docs/images/mqc_featureCounts_assignment_plot.png and b/docs/images/mqc_featureCounts_assignment_plot.png differ diff --git a/docs/images/mqc_frip_score_plot.png b/docs/images/mqc_frip_score_plot.png index 30fa2307..e0d90129 100755 Binary files a/docs/images/mqc_frip_score_plot.png and b/docs/images/mqc_frip_score_plot.png differ diff --git a/docs/images/mqc_macs2_peak_count_plot.png b/docs/images/mqc_macs2_peak_count_plot.png index 074fbe47..54e46f30 100755 Binary files a/docs/images/mqc_macs2_peak_count_plot.png and b/docs/images/mqc_macs2_peak_count_plot.png differ diff --git a/docs/images/mqc_picard_deduplication_plot.png b/docs/images/mqc_picard_deduplication_plot.png index f19fe4c6..9b0c08f9 100755 Binary files a/docs/images/mqc_picard_deduplication_plot.png and b/docs/images/mqc_picard_deduplication_plot.png differ diff --git a/docs/images/mqc_picard_insert_size_plot.png b/docs/images/mqc_picard_insert_size_plot.png index bb43dab8..3c3d19eb 100755 Binary files a/docs/images/mqc_picard_insert_size_plot.png and b/docs/images/mqc_picard_insert_size_plot.png differ diff --git a/docs/images/mqc_preseq_plot.png b/docs/images/mqc_preseq_plot.png index 962641c2..c4c98f17 100755 Binary files a/docs/images/mqc_preseq_plot.png and b/docs/images/mqc_preseq_plot.png differ diff --git a/docs/images/mqc_samtools_stats_plot.png b/docs/images/mqc_samtools_stats_plot.png index f9c76ce0..cd7a50fd 100755 Binary files a/docs/images/mqc_samtools_stats_plot.png and b/docs/images/mqc_samtools_stats_plot.png differ diff --git a/docs/images/mqc_spp_nsc_plot.png b/docs/images/mqc_spp_nsc_plot.png index 76d87e47..c98aa1f9 100755 Binary files a/docs/images/mqc_spp_nsc_plot.png and b/docs/images/mqc_spp_nsc_plot.png differ 
diff --git a/docs/images/mqc_spp_rsc_plot.png b/docs/images/mqc_spp_rsc_plot.png index 7fc4bee6..a6f0365c 100755 Binary files a/docs/images/mqc_spp_rsc_plot.png and b/docs/images/mqc_spp_rsc_plot.png differ diff --git a/docs/images/mqc_spp_strand_correlation_plot.png b/docs/images/mqc_spp_strand_correlation_plot.png index 870d5ab1..151ca672 100755 Binary files a/docs/images/mqc_spp_strand_correlation_plot.png and b/docs/images/mqc_spp_strand_correlation_plot.png differ diff --git a/docs/images/nf-core-chipseq_logo.png b/docs/images/nf-core-chipseq_logo.png old mode 100755 new mode 100644 diff --git a/docs/images/r_deseq2_ma_plot.png b/docs/images/r_deseq2_ma_plot.png index 5b7c17c4..67c598fc 100755 Binary files a/docs/images/r_deseq2_ma_plot.png and b/docs/images/r_deseq2_ma_plot.png differ diff --git a/docs/images/r_deseq2_volcano_plot.png b/docs/images/r_deseq2_volcano_plot.png index c9872f71..679746b4 100755 Binary files a/docs/images/r_deseq2_volcano_plot.png and b/docs/images/r_deseq2_volcano_plot.png differ diff --git a/docs/output.md b/docs/output.md index b797b869..7a2c654c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,4 +1,4 @@ -# nf-core/chipseq: Output +# ![nf-core/chipseq](images/nf-core-chipseq_logo.png) This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. @@ -14,311 +14,308 @@ The directories listed below will be created in the output directory after the p The initial QC and alignments are performed at the library-level e.g. if the same library has been sequenced more than once to increase sequencing depth. This has the advantage of being able to assess each library individually, and the ability to process multiple libraries from the same sample in parallel. -1. **Raw read QC** +### Raw read QC - *Documentation*: - [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) +
+ Output files - *Description*: - FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. +* `fastqc/` + * `*_fastqc.html`: FastQC report containing quality metrics for read 1 (*and read2 if paired-end*) **before** adapter trimming. +* `fastqc/zips/` + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. - *Output directories*: - * `fastqc/` - FastQC `*.html` files for read 1 (*and read2 if paired-end*) **before** adapter trimming. - * `fastqc/zips/` - FastQC `*.zip` files for read 1 (*and read2 if paired-end*) **before** adapter trimming. +
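As a quick aside on those `*_fastqc.zip` archives: the tab-delimited data file mentioned above is `fastqc_data.txt`, which is handy for scripted QC checks. A minimal Python sketch, with an assumed sample name (this helper is not part of the pipeline):

```python
import io
import zipfile

# Pull the machine-readable report out of a FastQC zip archive.
# "SPT5_T0_R1_fastqc.zip" is an illustrative file name only.
with zipfile.ZipFile("SPT5_T0_R1_fastqc.zip") as zf:
    member = next(n for n in zf.namelist() if n.endswith("fastqc_data.txt"))
    with io.TextIOWrapper(zf.open(member), encoding="utf-8") as fh:
        for line in fh:
            # The Basic Statistics module holds lines like 'Total Sequences<TAB>...'
            if line.startswith(("Total Sequences", "Sequence length", "%GC")):
                print(line.rstrip())
```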
-2. **Adapter trimming** +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. - *Documentation*: - [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) +### Adapter trimming - *Description*: - Trim Galore! is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`usage.md`](usage.md) for more details about the trimming options. +
+ Output files - ![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png) +* `trim_galore/` + * `*fastq.gz`: If `--save_trimmed` is specified, FastQ files **after** adapter trimming will be placed in this directory. +* `trim_galore/logs/` + * `*.log`: Log file generated by Trim Galore!. +* `trim_galore/fastqc/` + * `*_fastqc.html`: FastQC report containing quality metrics for read 1 (*and read2 if paired-end*) **after** adapter trimming. +* `trim_galore/fastqc/zips/` + * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. - *Output directories*: - * `trim_galore/` - If `--save_trimmed` is specified FastQ files **after** adapter trimming will be placed in this directory. - * `trim_galore/logs/` - `*.log` files generated by Trim Galore!. - * `trim_galore/fastqc/` - FastQC `*.html` files for read 1 (*and read2 if paired-end*) **after** adapter trimming. - * `trim_galore/fastqc/zips/` - FastQC `*.zip` files for read 1 (*and read2 if paired-end*) **after** adapter trimming. +
-3. **Alignment** +[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) is a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence. See [`usage.md`](usage.md) for more details about the trimming options. - *Documentation*: - [BWA](http://bio-bwa.sourceforge.net/bwa.shtml), [SAMtools](http://samtools.sourceforge.net/) +![MultiQC - Cutadapt trimmed sequence plot](images/mqc_cutadapt_plot.png) - *Description*: - Adapter-trimmed reads are mapped to the reference assembly using BWA. A genome index is required to run BWA so if this is not provided explicitly using the `--bwa_index` parameter then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times. +### Alignment - ![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png) +
+ Output files - File names in the resulting directory (i.e. `bwa/library/`) will have the '`.Lb.`' (**L**i**b**rary) suffix. +* `bwa/library/` + * `*.bam`: The files resulting from the alignment of individual libraries are not saved by default so this directory will not be present in your results. You can override this behaviour with the use of the `--save_align_intermeds` flag in which case it will contain the coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. +* `bwa/library/samtools_stats/` + * SAMtools `.sorted.bam.flagstat`, `.sorted.bam.idxstats` and `.sorted.bam.stats` files generated from the alignment files. - *Output directories*: - * `bwa/library/` - The files resulting from the alignment of individual libraries are not saved by default so this directory will not be present in your results. You can override this behaviour with the use of the `--saveAlignedIntermediates` flag in which case it will contain the coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. - * `bwa/library/samtools_stats/` - SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. +> **NB:** File names in the resulting directory (i.e. `bwa/library/`) will have the '`.Lb.`' suffix. + +
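Since the pipeline already depends on Pysam, the same `samtools` commands behind the `samtools_stats/` files can be reproduced programmatically. A minimal sketch, assuming a coordinate-sorted library-level BAM named with the `.Lb.` convention noted above (path illustrative):

```python
import pysam

bam = "SPT5_T0_R1.Lb.sorted.bam"   # assumed path
pysam.index(bam)                   # idxstats needs a .bai index

# Same text as `samtools flagstat <bam>`
print(pysam.flagstat(bam))

# Same table as `samtools idxstats <bam>`: chrom, length, mapped, unmapped
for row in pysam.idxstats(bam).strip().split("\n"):
    chrom, length, mapped, unmapped = row.split("\t")
    print(chrom, mapped)
```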
+ +Adapter-trimmed reads are mapped to the reference assembly using [BWA](http://bio-bwa.sourceforge.net/bwa.shtml). A genome index is required to run BWA so if this is not provided explicitly using the `--bwa_index` parameter then it will be created automatically from the genome fasta input. The index creation process can take a while for larger genomes so it is possible to use the `--save_reference` parameter to save the indices for future pipeline runs, reducing processing times. + +![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png) ## Merged library-level analysis The library-level alignments associated with the same sample are merged and subsequently used for the downstream analyses. -1. **Alignment merging, duplicate marking, filtering and QC** +### Alignment merging, duplicate marking, filtering and QC + +
+ Output files + +* `bwa/mergedLibrary/` + * `*.bam`: Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--save_align_intermeds` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`). +* `bwa/mergedLibrary/samtools_stats/` + * SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. +* `bwa/mergedLibrary/picard_metrics/` + * `*_metrics`: Alignment QC files from picard CollectMultipleMetrics. + * `*.metrics.txt`: Metrics file from MarkDuplicates. +* `bwa/mergedLibrary/picard_metrics/pdf/` + * `*.pdf`: Alignment QC plot files from picard CollectMultipleMetrics. +* `bwa/mergedLibrary/preseq/` + * `*.ccurve.txt`: Preseq expected future yield file. + +> **NB:** File names in the resulting directory (i.e. `bwa/mergedLibrary/`) will have the '`.mLb.`' suffix. - *Documentation*: - [picard](https://broadinstitute.github.io/picard/command-line-overview.html), [SAMtools](http://samtools.sourceforge.net/), [BEDTools](https://bedtools.readthedocs.io/en/latest/), [BAMTools](https://github.com/pezmaster31/bamtools/wiki/Tutorial_Toolkit_BamTools-1.0.pdf), [Pysam](https://pysam.readthedocs.io/en/latest/api.html), [Preseq](http://smithlabresearch.org/software/preseq/) +
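To make the `*.mLb.mkD.*` (duplicates marked) versus `*.mLb.clN.*` (cleaned) naming concrete, here is a deliberately simplified Pysam sketch of the filtering idea. The pipeline's actual filtering applies more criteria and different tooling (see the description below and the main README); the paths are illustrative:

```python
import pysam

# Drop unmapped, duplicate-flagged and secondary records from a
# duplicate-marked BAM - a simplified stand-in for the real filtering.
with pysam.AlignmentFile("sample.mLb.mkD.sorted.bam", "rb") as bam_in, \
     pysam.AlignmentFile("sample.mLb.clN.sorted.bam", "wb", template=bam_in) as bam_out:
    for read in bam_in:
        if read.is_unmapped or read.is_duplicate or read.is_secondary:
            continue
        bam_out.write(read)
```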
- *Description*: - Picard MergeSamFiles and MarkDuplicates are used in combination to merge the alignments, and for the marking of duplicates, respectively. If you only have one library for any given replicate then the merging step isnt carried out because the library-level and merged library-level BAM files will be exactly the same. +[Picard MergeSamFiles and MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html) are used in combination to merge the alignments, and for the marking of duplicates, respectively. If you only have one library for any given replicate then the merging step isnt carried out because the library-level and merged library-level BAM files will be exactly the same. - Read duplicate marking is carried out using the Picard MarkDuplicates command. Duplicate reads are generally removed from the aligned reads to mitigate for fragments in the library that may have been sequenced more than once due to PCR biases. There is an option to keep duplicate reads with the `--keep_dups` parameter but its generally recommended to remove them to avoid the wrong interpretation of the results. A similar option has been provided to keep reads that are multi-mapped - `--keep_multi_map`. Other steps have been incorporated into the pipeline to filter the resulting alignments - see [`main README.md`](../README.md) for a more comprehensive listing, and the tools used at each step. +![MultiQC - Picard deduplication stats plot](images/mqc_picard_deduplication_plot.png) - A selection of alignment-based QC metrics generated by Picard CollectMultipleMetrics and MarkDuplicates will be included in the MultiQC report. +Read duplicate marking is carried out using the Picard MarkDuplicates command. Duplicate reads are generally removed from the aligned reads to mitigate for fragments in the library that may have been sequenced more than once due to PCR biases. There is an option to keep duplicate reads with the `--keep_dups` parameter but its generally recommended to remove them to avoid the wrong interpretation of the results. A similar option has been provided to keep reads that are multi-mapped - `--keep_multi_map`. Other steps have been incorporated into the pipeline to filter the resulting alignments - see [`main README.md`](../README.md) for a more comprehensive listing, and the tools used at each step. A selection of alignment-based QC metrics generated by Picard CollectMultipleMetrics and MarkDuplicates will be included in the MultiQC report. - ![MultiQC - Picard deduplication stats plot](images/mqc_picard_deduplication_plot.png) +![MultiQC - Picard insert size plot](images/mqc_picard_insert_size_plot.png) - ![MultiQC - Picard insert size plot](images/mqc_picard_insert_size_plot.png) +The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or to screen multiple libraries to avoid low complexity samples. The dashed line shows a perfectly complex library where total reads = unique reads. Note that these are predictive numbers only, not absolute. 
The MultiQC plot can sometimes give extreme sequencing depth on the X axis - click and drag from the left side of the plot to zoom in on more realistic numbers. - The Preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or to screen multiple libraries to avoid low complexity samples. The dashed line shows a perfectly complex library where total reads = unique reads. +![MultiQC - Preseq library complexity plot](images/mqc_preseq_plot.png) - Note that these are predictive numbers only, not absolute. The MultiQC plot can sometimes give extreme sequencing depth on the X axis - click and drag from the left side of the plot to zoom in on more realistic numbers. +### Normalised bigWig files - ![MultiQC - Preseq library complexity plot](images/mqc_preseq_plot.png) +
+ Output files - File names in the resulting directory (i.e. `bwa/mergedLibrary/`) will have the '`.mLb.`' suffix to denote **m**erged **L**i**b**raries. +* `bwa/mergedLibrary/bigwig/` + * `*.bigWig`: Normalised bigWig files scaled to 1 million mapped reads. - *Output directories*: - * `bwa/mergedLibrary/` - Merged library-level, coordinate sorted `*.bam` files after the marking of duplicates, and filtering based on various criteria. The file suffix for the final filtered files will be `*.mLb.clN.*`. If you specify the `--saveAlignedIntermediates` parameter then two additional sets of files will be present. These represent the unfiltered alignments with duplicates marked (`*.mLb.mkD.*`), and in the case of paired-end datasets the filtered alignments before the removal of orphan read pairs (`*.mLb.flT.*`). - * `bwa/mergedLibrary/samtools_stats/` - SAMtools `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files. - * `bwa/mergedLibrary/picard_metrics/` - Alignment QC files from picard CollectMultipleMetrics and the metrics file from MarkDuplicates: `*_metrics` and `*.metrics.txt`, respectively. - * `bwa/mergedLibrary/picard_metrics/pdf/` - Alignment QC plot files in `*.pdf` format from picard CollectMultipleMetrics. - * `bwa/mergedLibrary/preseq/` - Preseq expected future yield file (`*.ccurve.txt`). +
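If you want to sanity-check the scaling programmatically, the pyBigWig library (not a pipeline dependency - an assumption for this sketch) can summarise any of these files; the file name and UCSC-style chromosome name are illustrative:

```python
import pyBigWig

bw = pyBigWig.open("SPT5_T0_R1.mLb.clN.bigWig")  # assumed file name
print(bw.header())                               # totals: nBasesCovered, min/max, sumData
print(bw.stats("chr1", 1000000, 1001000))        # mean coverage over a 1 kb window
bw.close()
```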
-2. **Normalised bigWig files** +The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM files for data visualisation purposes which will be slower and result in memory issues. The coverage values represented in the bigWig file can also be normalised in order to be able to compare the coverage across multiple samples - this is not possible with BAM files. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting. - *Documentation*: - [BEDTools](https://bedtools.readthedocs.io/en/latest/), [bedGraphToBigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html#Ex3) +### ChIP-seq QC metrics - *Description*: - The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM files for data visualisation purposes which will be slower and result in memory issues. The coverage values represented in the bigWig file can also be normalised in order to be able to compare the coverage across multiple samples - this is not possible with BAM files. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting. +
+ Output files - *Output directories*: - * `bwa/mergedLibrary/bigwig/` - Normalised `*.bigWig` files scaled to 1 million mapped reads. +* `bwa/mergedLibrary/phantompeakqualtools/` + * `*.spp.out`, `*.spp.pdf`: phantompeakqualtools output files. + * `*_mqc.tsv`: MultiQC custom content files. +* `bwa/mergedLibrary/deepTools/plotFingerprint/` + * `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt`: plotFingerprint output files. +* `bwa/mergedLibrary/deepTools/plotProfile/` + * `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.tab`, `*.plotProfile.pdf`, `*.plotProfile.tab`, `*.plotHeatmap.pdf`, `*.plotHeatmap.mat.tab`: plotProfile output files. -3. **ChIP-seq QC metrics** +
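The strand cross-correlation metric described in the next paragraph can be demystified with a toy NumPy example: simulate per-position read-start counts where the reverse strand lags the forward strand by one fragment length, then watch the Pearson correlation peak at that shift (synthetic data only; phantompeakqualtools performs the real calculation):

```python
import numpy as np

rng = np.random.default_rng(1)
fwd = rng.poisson(1.0, 100000).astype(float)        # forward-strand counts
rev = np.roll(fwd, 200) + rng.poisson(0.2, 100000)  # reverse strand lags by ~200 bp

# Pearson correlation between the strands at increasing shifts
shifts = range(0, 400, 10)
cc = [np.corrcoef(fwd, np.roll(rev, -s))[0, 1] for s in shifts]
best_cc, best_shift = max(zip(cc, shifts))
print(best_shift)   # ~200, i.e. the simulated fragment length
```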
- *Documentation*: - [phantompeakqualtools](https://github.com/kundajelab/phantompeakqualtools), [deepTools](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) +[phantompeakqualtools](https://github.com/kundajelab/phantompeakqualtools) plots the strand cross-correlation of aligned reads for each sample. In a strand cross-correlation plot, reads are shifted in the direction of the strand they map to by an increasing number of base pairs and the Pearson correlation between the per-position read count for each strand is calculated. Two cross-correlation peaks are usually observed in a ChIP experiment, one corresponding to the read length ("phantom" peak) and one to the average fragment length of the library. The absolute and relative height of the two peaks are useful determinants of the success of a ChIP-seq experiment. A high-quality IP is characterized by a ChIP peak that is much higher than the "phantom" peak, while often very small or no such peak is seen in failed experiments. - *Description*: - Phantompeakqualtools plots the strand cross-correlation of aligned reads for each sample. In a strand cross-correlation plot, reads are shifted in the direction of the strand they map to by an increasing number of base pairs and the Pearson correlation between the per-position read count for each strand is calculated. Two cross-correlation peaks are usually observed in a ChIP experiment, one corresponding to the read length ("phantom" peak) and one to the average fragment length of the library. The absolute and relative height of the two peaks are useful determinants of the success of a ChIP-seq experiment. A high-quality IP is characterized by a ChIP peak that is much higher than the "phantom" peak, while often very small or no such peak is seen in failed experiments. +![MultiQC - spp strand-correlation plot](images/mqc_spp_strand_correlation_plot.png) - ![MultiQC - spp strand-correlation plot](images/mqc_spp_strand_correlation_plot.png) +Normalized strand coefficient (NSC) is the normalized ratio between the fragment-length cross-correlation peak and the background cross-correlation. NSC values range from a minimum of 1 to larger positive numbers. 1.1 is the critical threshold. Datasets with NSC values much less than 1.1 (< 1.05) tend to have low signal to noise or few peaks (this could be biological e.g. a factor that truly binds only a few sites in a particular tissue type OR it could be due to poor quality). ENCODE cut-off: **NSC > 1.05**. - Normalized strand coefficient (NSC) is the normalized ratio between the fragment-length cross-correlation peak and the background cross-correlation. NSC values range from a minimum of 1 to larger positive numbers. 1.1 is the critical threshold. Datasets with NSC values much less than 1.1 (< 1.05) tend to have low signal to noise or few peaks (this could be biological e.g. a factor that truly binds only a few sites in a particular tissue type OR it could be due to poor quality). ENCODE cut-off: **NSC > 1.05**. +![MultiQC - spp NSC plot](images/mqc_spp_nsc_plot.png) - ![MultiQC - spp NSC plot](images/mqc_spp_nsc_plot.png) +Relative strand correlation (RSC) is the ratio between the fragment-length peak and the read-length peak. RSC values range from 0 to larger positive values. 1 is the critical threshold. RSC values significantly lower than 1 (< 0.8) tend to have low signal to noise. 
The low scores can be due to failed and poor quality ChIP, low read sequence quality and hence lots of mis-mappings, shallow sequencing depth (significantly below saturation) or a combination of these. Like the NSC, datasets with few binding sites (< 200), which is biologically justifiable, also show low RSC scores. ENCODE cut-off: **RSC > 0.8**. - Relative strand correlation (RSC) is the ratio between the fragment-length peak and the read-length peak. RSC values range from 0 to larger positive values. 1 is the critical threshold. RSC values significantly lower than 1 (< 0.8) tend to have low signal to noise. The low scores can be due to failed and poor quality ChIP, low read sequence quality and hence lots of mis-mappings, shallow sequencing depth (significantly below saturation) or a combination of these. Like the NSC, datasets with few binding sites (< 200), which is biologically justifiable, also show low RSC scores. ENCODE cut-off: **RSC > 0.8**. +![MultiQC - spp RSC plot](images/mqc_spp_rsc_plot.png) - ![MultiQC - spp RSC plot](images/mqc_spp_rsc_plot.png) +[deepTools](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) plotFingerprint is a useful QC for ChIP-seq data in order to see the relative enrichment of the IP samples with respect to the controls on a genome-wide basis. The results, however, are expected to look different for example when comparing narrow marks such as transcription factors and broader marks such as histone modifications (see [plotFingerprint docs](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html)). - deepTools plotFingerprint is a useful QC for ChIP-seq data in order to see the relative enrichment of the IP samples with respect to the controls on a genome-wide basis. The results, however, are expected to look different for example when comparing narrow marks such as transcription factors and broader marks such as histone modifications (see [plotFingerprint docs](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html)). +![MultiQC - deepTools plotFingerprint plot](images/mqc_deeptools_plotFingerprint_plot.png) - ![MultiQC - deepTools plotFingerprint plot](images/mqc_deeptools_plotFingerprint_plot.png) +The results from deepTools plotProfile gives you a quick visualisation for the genome-wide enrichment of your samples at the TSS, and across the gene body. During the downstream analysis, you may want to refine the features/genes used to generate these plots in order to see a more specific condition-related effect. - The results from deepTools plotProfile gives you a quick visualisation for the genome-wide enrichment of your samples at the TSS, and across the gene body. During the downstream analysis, you may want to refine the features/genes used to generate these plots in order to see a more specific condition-related effect. +![MultiQC - deepTools plotProfile plot](images/mqc_deeptools_plotProfile_plot.png) - ![MultiQC - deepTools plotProfile plot](images/mqc_deeptools_plotProfile_plot.png) +### Call peaks - *Output directories*: - * `bwa/mergedLibrary/phantompeakqualtools/` - * Output files: `*.spp.out`, `*.spp.pdf`. - * MultiQC custom content files: `*_spp_correlation_mqc.tsv`, `*_spp_nsc_mqc.tsv`, `*_spp_rsc_mqc.tsv`. 
- * `bwa/mergedLibrary/deepTools/plotFingerprint/` - * Output files: `*.plotFingerprint.pdf`, `*.plotFingerprint.qcmetrics.txt`, `*.plotFingerprint.raw.txt` - * `bwa/mergedLibrary/deepTools/plotProfile/` - * Output files: `*.computeMatrix.mat.gz`, `*.computeMatrix.vals.mat.gz`, `*.plotProfile.pdf`, `*.plotProfile.tab`. +
+ Output files
+
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/`
+  * `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak`, `*summits.bed`: MACS2 output files - the files generated will depend on whether MACS2 has been run in *narrowPeak* or *broadPeak* mode.
+  * `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/qc/`
+  * `macs_peak.plots.pdf`: QC plots for MACS2 peaks.
+  * `macs_annotatePeaks.plots.pdf`: QC plots for peak-to-gene feature annotation.
+  * `*.FRiP_mqc.tsv`, `*.count_mqc.tsv`, `macs_annotatePeaks.summary_mqc.tsv`: MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios.
+
+> **NB:** `<PEAK_TYPE>` in the directory structure above corresponds to the type of peak that you have specified to call with MACS2 i.e. `broadPeak` or `narrowPeak`. If you so wish, you can call both narrow and broad peaks without redoing the preceding steps in the pipeline such as the alignment and filtering. For example, if you already have broad peaks then just add `--narrow_peak -resume` to the command you used to run the pipeline, and these will be called too! However, resuming the pipeline will only be possible if you have not deleted the `work/` directory generated by the pipeline.
+
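For intuition about the `*.FRiP_mqc.tsv` files listed above: the FRiP score is simply the fraction of mapped reads that fall inside called peaks. A rough Pysam sketch (paths illustrative; the pipeline computes the score with its own tooling, and overlapping peaks would need merging first for an exact count):

```python
import pysam

bam = pysam.AlignmentFile("SPT5_T0_R1.mLb.clN.sorted.bam", "rb")  # assumed path
total = bam.mapped                      # mapped read count from the BAM index

in_peaks = 0
with open("SPT5_T0_R1_peaks.broadPeak") as peaks:  # assumed MACS2 output
    for line in peaks:
        chrom, start, end = line.split("\t")[:3]
        in_peaks += bam.count(chrom, int(start), int(end))

print("FRiP = {:.3f}".format(in_peaks / total))
bam.close()
```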
- ![MultiQC - MACS2 total peak count plot](images/mqc_macs2_peak_count_plot.png) +[MACS2](https://github.com/taoliu/MACS) is one of the most popular peak-calling algorithms for ChIP-seq data. By default, the peaks are called with the MACS2 `--broad` parameter. If, however, you would like to call narrow peaks then please provide the `--narrow_peak` parameter when running the pipeline. See [MACS2 outputs](https://github.com/taoliu/MACS#output-files) for a description of the output files generated by MACS2. - [HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. *Annotation*, *Distance to TSS* and *Nearest Promoter ID*. +![MultiQC - MACS2 total peak count plot](images/mqc_macs2_peak_count_plot.png) - ![MultiQC - HOMER annotatePeaks peak-to-gene feature ratio plot](images/mqc_annotatePeaks_feature_percentage_plot.png) +[HOMER annotatePeaks.pl](http://homer.ucsd.edu/homer/ngs/annotation.html) is used to annotate the peaks relative to known genomic features. HOMER is able to use the `--gtf` annotation file which is provided to the pipeline. Please note that some of the output columns will be blank because the annotation is not provided using HOMER's in-built database format. However, the more important fields required for downstream analysis will be populated i.e. *Annotation*, *Distance to TSS* and *Nearest Promoter ID*. - Various QC plots per sample including number of peaks, fold-change distribution, [FRiP score](https://genome.cshlp.org/content/22/9/1813.full.pdf+html) and peak-to-gene feature annotation are also generated by the pipeline. Where possible these have been integrated into the MultiQC report. +![MultiQC - HOMER annotatePeaks peak-to-gene feature ratio plot](images/mqc_annotatePeaks_feature_percentage_plot.png) - ![MultiQC - MACS2 peaks FRiP score plot](images/mqc_frip_score_plot.png) +Various QC plots per sample including number of peaks, fold-change distribution, [FRiP score](https://genome.cshlp.org/content/22/9/1813.full.pdf+html) and peak-to-gene feature annotation are also generated by the pipeline. Where possible these have been integrated into the MultiQC report. - `` in the directory structure below corresponds to the type of peak that you have specified to call with MACS2 i.e. `broadPeak` or `narrowPeak`. If you so wish, you can call both narrow and broad peaks without redoing the preceding steps in the pipeline such as the alignment and filtering. For example, if you already have broad peaks then just add `--narrow_peak -resume` to the command you used to run the pipeline, and these will be called too! However, resuming the pipeline will only be possible if you have not deleted the `work/` directory generated by the pipeline. +![MultiQC - MACS2 peaks FRiP score plot](images/mqc_frip_score_plot.png) - *Output directories*: - * `bwa/mergedLibrary/macs//` - * MACS2 output files: `*.xls`, `*.broadPeak` or `*.narrowPeak`, `*.gappedPeak` and `*summits.bed`. - The files generated will depend on whether MACS2 has been run in *narrowPeak* or *broadPeak* mode. - * HOMER peak-to-gene annotation file: `*.annotatePeaks.txt`. 
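+
+> For reference, the FRiP score is simply the fraction of mapped reads that fall within called peak regions, i.e. FRiP = (reads in peaks) / (total mapped reads). As a rough rule of thumb from the ENCODE guidelines linked above, a FRiP of at least 0.01 (1%) is expected for a successful ChIP.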
-  * `bwa/mergedLibrary/macs//qc/`
-    * QC plots for MACS2 peaks: `macs_peak.plots.pdf`
-    * QC plots for peak-to-gene feature annotation: `macs_annotatePeaks.plots.pdf`
-    * MultiQC custom-content files for FRiP score, peak count and peak-to-gene ratios: `*.FRiP_mqc.tsv`, `*.count_mqc.tsv` and `macs_annotatePeaks.summary_mqc.tsv` respectively.
+### Create and quantify consensus set of peaks
-5. **Create consensus set of peaks**
+<details markdown="1">
+<summary>Output files</summary>
-    *Documentation*:
-    [BEDTools](https://bedtools.readthedocs.io/en/latest/)
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/`
+  * `*.bed`: Consensus peak-set across all samples in BED format.
+  * `*.saf`: Consensus peak-set across all samples in SAF format. Required by featureCounts for read quantification.
+  * `*.featureCounts.txt`: Read counts across all samples relative to consensus peak-set.
+  * `*.annotatePeaks.txt`: HOMER peak-to-gene annotation file for consensus peaks.
+  * `*.boolean.annotatePeaks.txt`: Spreadsheet representation of consensus peak-set across samples **with** gene annotation columns. The columns from individual peak files are included in this file along with the ability to filter peaks based on their presence or absence in multiple replicates/conditions.
+  * `*.boolean.txt`: Spreadsheet representation of consensus peak-set across samples **without** gene annotation columns. Same as the file above but without the annotation columns.
+  * `*.boolean.intersect.plot.pdf`, `*.boolean.intersect.txt`: [UpSetR](https://cran.r-project.org/web/packages/UpSetR/README.html) files to illustrate peak intersection.
-    *Description*:
-    In order to perform the differential binding analysis we need to be able to carry out the read quantification for the same intervals across **all** of the samples in the experiment. To this end, the individual peak-sets called per sample have to be merged together in order to create a consensus set of peaks.
+</details>
-    Using the consensus peaks it is possible to assess the degree of overlap between the peaks from a set of samples e.g. *Which consensus peaks contain peaks that are common/unique to a given set of samples?*. This may be useful for downstream filtering of peaks based on whether they are called in multiple replicates/conditions. Please note that it is possible for a consensus peak to contain multiple peaks from the same sample. Unfortunately, this is sample-dependent but the files generated by the pipeline do have columns that report such instances and allow you to factor them into any further analysis.
+In order to perform the differential binding analysis we need to be able to carry out the read quantification for the same intervals across **all** of the samples in the experiment. To this end, the individual peak-sets called per sample have to be merged together in order to create a consensus set of peaks.
-    By default, the peak-sets are not filtered, therefore, the consensus peaks will be generated across the union set of peaks from all samples. However, you can increment the `--min_reps_consensus` parameter appropriately if you are confident you have good reproducibility amongst your replicates to create a "reproducible" set of consensus of peaks. In future iterations of the pipeline more formal analyses such as [IDR](https://projecteuclid.org/euclid.aoas/1318514284) may be implemented to obtain reproducible and high confidence peak-sets with which to perform this sort of analysis.
+Using the consensus peaks it is possible to assess the degree of overlap between the peaks from a set of samples e.g. *Which consensus peaks contain peaks that are common/unique to a given set of samples?* This may be useful for downstream filtering of peaks based on whether they are called in multiple replicates/conditions. Please note that it is possible for a consensus peak to contain multiple peaks from the same sample. Unfortunately, this is sample-dependent but the files generated by the pipeline do have columns that report such instances and allow you to factor them into any further analysis.
-    ![R - UpSetR peak intersection plot](images/r_upsetr_intersect_plot.png)
+![R - UpSetR peak intersection plot](images/r_upsetr_intersect_plot.png)
-    *Output directories*:
-    * `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/`
-      * Consensus peak-set across all samples in `*.bed` format.
-      * Consensus peak-set across all samples in `*.saf` format. Required by featureCounts for read quantification.
-      * HOMER `*.annotatePeaks.txt` peak-to-gene annotation file for consensus peaks.
-      * Spreadsheet representation of consensus peak-set across samples **with** gene annotation columns: `*.boolean.annotatePeaks.txt`.
-        The columns from individual peak files are included in this file along with the ability to filter peaks based on their presence or absence in multiple replicates/conditions.
-      * Spreadsheet representation of consensus peak-set across samples **without** gene annotation columns: `*.boolean.txt`.
-        Same as file above but without annotation columns.
-      * [UpSetR](https://cran.r-project.org/web/packages/UpSetR/README.html) files to illustrate peak intersection: `*.boolean.intersect.plot.pdf` and `*.boolean.intersect.txt`.
+By default, the peak-sets are not filtered; therefore, the consensus peaks will be generated across the union set of peaks from all samples.
However, you can increment the `--min_reps_consensus` parameter appropriately if you are confident you have good reproducibility amongst your replicates to create a "reproducible" consensus set of peaks. In future iterations of the pipeline more formal analyses such as [IDR](https://projecteuclid.org/euclid.aoas/1318514284) may be implemented to obtain reproducible and high confidence peak-sets with which to perform this sort of analysis.
-6. **Read counting and differential binding analysis**
+The [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) tool is used to count the number of reads relative to the consensus peak-set across all of the samples. This essentially generates a file containing a matrix where the rows represent the consensus intervals, the columns represent all of the samples in the experiment, and the values represent the raw read counts.
-    *Documentation*:
-    [featureCounts](http://bioinf.wehi.edu.au/featureCounts/), [DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html), [R](https://www.r-project.org/)
+![MultiQC - featureCounts consensus peak read assignment plot](images/mqc_featureCounts_assignment_plot.png)
-    *Description*:
-    The featureCounts tool is used to count the number of reads relative to the consensus peak-set across all of the samples. This essentially generates a file containing a matrix where the rows represent the consensus intervals, the columns represent all of the samples in the experiment, and the values represent the raw read counts.
+### Read counting and differential binding analysis
-    ![MultiQC - featureCounts consensus peak read assignment plot](images/mqc_featureCounts_assignment_plot.png)
+<details markdown="1">
+<summary>Output files</summary>
-    DESeq2 is more commonly used to perform differential expression analysis for RNA-seq datasets. However, it can also be used for ChIP-seq differential binding analysis, in which case you can imagine that instead of counts per gene for RNA-seq data we now have counts per bound region.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<ANTIBODY>/deseq2/`
+  * `*.results.txt`: Spreadsheet containing differential binding results across all consensus peaks and all comparisons.
+  * `*.plots.pdf`: File containing PCA and hierarchical clustering plots.
+  * `*.log`: Log file with information for number of differentially bound intervals at different FDR and fold-change thresholds for each comparison.
+  * `*.dds.rld.RData`: File containing R `dds` and `rld` objects generated by DESeq2.
+  * `R_sessionInfo.log`: File containing information about R, the OS and attached or loaded packages.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<ANTIBODY>/<comparison>/`
+  * `*.results.txt`: Spreadsheet containing comparison-specific DESeq2 output for differential binding results across all peaks.
+  * `*FDR0.01.results.txt`, `*FDR0.05.results.txt`: Subset of above file for peaks that pass FDR <= 0.01 and FDR <= 0.05.
+  * `*FDR0.01.results.bed`, `*FDR0.05.results.bed`: BED files for peaks that pass FDR <= 0.01 and FDR <= 0.05.
+  * `*deseq2.plots.pdf`: MA, Volcano, clustering and scatterplots at FDR <= 0.01 and FDR <= 0.05.
+* `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
+  * `*.txt`, `*.RData`: Files containing DESeq2 sizeFactors per sample.
-    This pipeline uses a standardised DESeq2 analysis script to get an idea of the reproducibility within the experiment, and to assess the overall differential binding. Please note that this will not suit every experimental design, and if there are other problems with the experiment then it may not work as well as expected.
+</details>
- ![MultiQC - DESeq2 PCA plot](images/mqc_deseq2_pca_plot.png) +[DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) is more commonly used to perform differential expression analysis for RNA-seq datasets. However, it can also be used for ChIP-seq differential binding analysis, in which case you can imagine that instead of counts per gene for RNA-seq data we now have counts per bound region. - ![MultiQC - DESeq2 sample similarity plot](images/mqc_deseq2_sample_similarity_plot.png) +This pipeline uses a standardised DESeq2 analysis script to get an idea of the reproducibility within the experiment, and to assess the overall differential binding. Please note that this will not suit every experimental design, and if there are other problems with the experiment then it may not work as well as expected. For larger experiments, it may be recommended to use the `vst` transformation instead of the default `rlog` option. You can do this by providing the `--deseq2_vst` parameter to the pipeline. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) for a more detailed explanation. - By default, all possible pairwise comparisons across the groups from a particular antibody (as defined in [`design.csv`](usage.md#--design)) are performed. The DESeq2 results are generated by the pipeline in various ways. You can load up the results across all of the comparisons in a single spreadsheet, or individual folders will also be created that contain the results specific to a particular comparison. For the latter, additional files will also be generated where the intervals have been pre-filtered based on a couple of standard FDR thresholds. Please see [DESeq2 output](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis) for a description of the columns generated by DESeq2. +![MultiQC - DESeq2 PCA plot](images/mqc_deseq2_pca_plot.png) - ![R - DESeq2 MA plot](images/r_deseq2_ma_plot.png) +![MultiQC - DESeq2 sample similarity plot](images/mqc_deseq2_sample_similarity_plot.png) - ![R - DESeq2 Volcano plot](images/r_deseq2_volcano_plot.png) +By default, all possible pairwise comparisons across the groups from a particular antibody (as defined in [`design.csv`](usage.md#--design)) are performed. The DESeq2 results are generated by the pipeline in various ways. You can load up the results across all of the comparisons in a single spreadsheet, or individual folders will also be created that contain the results specific to a particular comparison. For the latter, additional files will also be generated where the intervals have been pre-filtered based on a couple of standard FDR thresholds. Please see [DESeq2 output](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis) for a description of the columns generated by DESeq2. - *Output directories*: - * `bwa/mergedLibrary/macs//consensus//deseq2/` - * `.featureCounts.txt` file for read counts across all samples relative to consensus peak-set. - * Differential binding `*.results.txt` spreadsheet containing results across all consensus peaks and all comparisons. - * `*.plots.pdf` file for PCA and hierarchical clustering. - * `*.log` file with information for number of differentially bound intervals at different FDR and fold-change thresholds for each comparison. 
-    * `*.dds.rld.RData` file containing R `dds` and `rld` objects generated by DESeq2.
-    * `R_sessionInfo.log` file containing information about R, the OS and attached or loaded packages.
-  * `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<ANTIBODY>/<comparison>/`
-    * `*.results.txt` spreadsheet containing comparison-specific DESeq2 output for differential binding results across all peaks.
-    * Subset of above file for peaks that pass FDR <= 0.01 (`*FDR0.01.results.txt`) and FDR <= 0.05 (`*FDR0.05.results.txt`).
-    * BED files for peaks that pass FDR <= 0.01 (`*FDR0.01.results.bed`) and FDR <= 0.05 (`*FDR0.05.results.bed`).
-    * MA, Volcano, clustering and scatterplots at FDR <= 0.01 and FDR <= 0.05: `*deseq2.plots.pdf`.
-  * `bwa/mergedLibrary/macs/<PEAK_TYPE>/consensus/<ANTIBODY>/sizeFactors/`
-    Files containing DESeq2 sizeFactors per sample: `*.txt` and `*.RData`.
+![R - DESeq2 MA plot](images/r_deseq2_ma_plot.png)
+
+![R - DESeq2 Volcano plot](images/r_deseq2_volcano_plot.png)
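+As a minimal illustration of the `--deseq2_vst` option mentioned above (the input, genome and profile are the same placeholder values used in the other examples in these docs):
+
+```bash
+nextflow run nf-core/chipseq --input design.csv --genome GRCh37 -profile docker --deseq2_vst
+```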
## Aggregate analysis

-1. **Present QC for the raw read, alignment, peak and differential binding results**
+### Present QC for the raw read, alignment, peak and differential binding results
+
+<details markdown="1">
+<summary>Output files</summary>
+
+* `multiqc/<PEAK_TYPE>/`
+  * `multiqc_report.html`: A standalone HTML file that can be viewed in your web browser.
+  * `multiqc_data/`: Directory containing parsed statistics from the different tools used in the pipeline.
+  * `multiqc_plots/`: Directory containing static images from the report in various formats.
-    *Documentation*:
-    [MultiQC](https://multiqc.info/docs/)
+</details>
-    *Description*:
-    MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.
+[MultiQC](https://multiqc.info/docs/) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.
-    Results generated by MultiQC collate pipeline QC from FastQC, TrimGalore, samtools flagstat, samtools idxstats, samtools stats, picard CollectMultipleMetrics, picard MarkDuplicates, Preseq, deepTools plotProfile, deepTools plotFingerprint, phantompeakqualtools and featureCounts. The default [`multiqc config file`](../assets/multiqc/multiqc_config.yaml) also contains the provision for loading custom-content to report peak counts, FRiP scores, peak-to-gene annnotation proportions, spp NSC coefficient, spp RSC coefficient, PCA plots and sample-similarity heatmaps.
+Results generated by MultiQC collate pipeline QC from FastQC, TrimGalore, samtools flagstat, samtools idxstats, samtools stats, picard CollectMultipleMetrics, picard MarkDuplicates, Preseq, deepTools plotProfile, deepTools plotFingerprint, phantompeakqualtools and featureCounts. The default [`multiqc config file`](../assets/multiqc_config.yaml) also contains the provision for loading custom-content to report peak counts, FRiP scores, peak-to-gene annotation proportions, spp NSC coefficient, spp RSC coefficient, PCA plots and sample-similarity heatmaps.
-    The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see .
+The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <https://multiqc.info>.
-    *Output directories*:
-    * `multiqc/<PEAK_TYPE>/`
-      * `multiqc_report.html` - a standalone HTML file that can be viewed in your web browser.
-      * `multiqc_data/` - directory containing parsed statistics from the different tools used in the pipeline.
-      * `multiqc_plots/` - directory containing static images from the report in various formats.
+### Create IGV session file
-2. **Create IGV session file**
+<details markdown="1">
+<summary>Output files</summary>
-    *Documentation*:
-    [IGV](https://software.broadinstitute.org/software/igv/UserGuide)
+* `igv/<PEAK_TYPE>/`
+  * `igv_session.xml`: Session file that can be directly loaded into IGV.
+  * `igv_files.txt`: File containing a listing of the files used to create the IGV session.
-    *Description*:
-    An IGV session file will be created at the end of the pipeline containing the normalised bigWig tracks, per-sample peaks, consensus peaks and differential sites. This avoids having to load all of the data individually into IGV for visualisation.
+</details>
- The genome fasta file required for the IGV session will be the same as the one that was provided to the pipeline. This will be copied into `reference_genome/` to overcome any loading issues. If you prefer to use another path or an in-built genome provided by IGV just change the `genome` entry in the second-line of the session file. +An [IGV](https://software.broadinstitute.org/software/igv/UserGuide) session file will be created at the end of the pipeline containing the normalised bigWig tracks, per-sample peaks, consensus peaks and differential sites. This avoids having to load all of the data individually into IGV for visualisation. - The file paths in the IGV session file will only work if the results are kept in the same place on your storage. If the results are moved or for example, if you prefer to load the data over the web then just replace the file paths with others that are more appropriate. +The genome fasta file required for the IGV session will be the same as the one that was provided to the pipeline. This will be copied into `genome/` to overcome any loading issues. If you prefer to use another path or an in-built genome provided by IGV just change the `genome` entry in the second-line of the session file. - Once installed, open IGV, go to `File > Open Session` and select the `igv_session.xml` file for loading. +The file paths in the IGV session file will only work if the results are kept in the same place on your storage. If the results are moved or for example, if you prefer to load the data over the web then just replace the file paths with others that are more appropriate. - >NB: If you are not using an in-built genome provided by IGV you will need to load the annotation yourself e.g. in .gtf and/or .bed format. +Once installed, open IGV, go to `File > Open Session` and select the `igv_session.xml` file for loading. - ![IGV screenshot](images/igv_screenshot.png) +![IGV screenshot](images/igv_screenshot.png) - *Output directories*: - * `igv//` - * `igv_session.xml` file. - * `igv_files.txt` file containing a listing of the files used to create the IGV session, and their allocated colours. +> **NB:** If you are not using an in-built genome provided by IGV you will need to load the annotation yourself e.g. in .gtf and/or .bed format. ## Other results -1. **Reference genome files** +### Reference genome files + +
+<summary>Output files</summary>
+
+* `genome/`
+  * A number of genome-specific files are generated by the pipeline in order to aid in the filtering of the data, and because they are required by standard tools such as BEDTools. These can be found in this directory along with the genome fasta file which is required by IGV. If you are using a genome from AWS iGenomes, a `README.txt` file containing information about the annotation version will also be saved in this directory if it exists.
+* `genome/BWAIndex/`
+  * If the `--save_reference` parameter is provided then the alignment indices generated by the pipeline will be saved in this directory. Generating these can be quite a time-consuming process, so saving them permits their reuse for future runs of the pipeline or for other purposes.
-    *Documentation*:
-    [BWA](https://sourceforge.net/projects/bio-bwa/files/), [BEDTools](https://bedtools.readthedocs.io/en/latest/), [SAMtools](http://samtools.sourceforge.net/)
+</details>
-    *Description*:
-    Reference genome-specific files can be useful to keep for the downstream processing of the results.
+Reference genome-specific files can be useful to keep for the downstream processing of the results.
-    *Output directories*:
-    * `reference_genome/`
-      A number of genome-specific files are generated by the pipeline in order to aid in the filtering of the data, and because they are required by standard tools such as BEDTools. These can be found in this directory along with the genome fasta file which is required by IGV.
-    * `reference_genome/BWAIndex/`
-      If the `--save_reference` parameter is provided then the alignment indices generated by the pipeline will be saved in this directory. This can be quite a time-consuming process so it permits their reuse for future runs of the pipeline or for other purposes.
+### Pipeline information
-2. **Pipeline information**
+<details markdown="1">
+<summary>Output files</summary>
-    *Documentation*:
-    [Nextflow!](https://www.nextflow.io/docs/latest/tracing.html)
+* `pipeline_info/`
+  * `pipeline_report.html`, `pipeline_report.txt`, `software_versions.csv`: Reports generated by the pipeline.
+  * `execution_report.html`, `execution_timeline.html`, `execution_trace.txt`, `pipeline_dag.svg`: Reports generated by Nextflow.
+  * `design_reads.csv`, `design_controls.csv`: Reformatted design files used as input to the pipeline.
+  * `results_description.html`: Documentation for interpretation of results in HTML format.
-    *Description*:
-    Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
-    *Output directories*:
-    * `pipeline_info/`
-      * Reports generated by the pipeline - `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`.
-      * Reports generated by Nextflow - `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.svg`.
-      * Reformatted design files used as input to the pipeline - `design_reads.csv` and `design_controls.csv`.
-    * `Documentation/`
-      Documentation for interpretation of results in HTML format - `results_description.html`.
+</details>
- *Output directories*: - * `pipeline_info/` - * Reports generated by the pipeline - `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`. - * Reports generated by Nextflow - `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.svg`. - * Reformatted design files used as input to the pipeline - `design_reads.csv` and `design_controls.csv`. - * `Documentation/` - Documentation for interpretation of results in HTML format - `results_description.html`. +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/usage.md b/docs/usage.md index 76f91076..44d0b95b 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -2,69 +2,75 @@ ## Table of contents - - * [Table of contents](#table-of-contents) * [Introduction](#introduction) * [Running the pipeline](#running-the-pipeline) - * [Updating the pipeline](#updating-the-pipeline) - * [Reproducibility](#reproducibility) + * [Updating the pipeline](#updating-the-pipeline) + * [Reproducibility](#reproducibility) * [Main arguments](#main-arguments) - * [`-profile`](#-profile) - * [`--input`](#--input) + * [`-profile`](#-profile) + * [`--input`](#--input) * [Generic arguments](#generic-arguments) - * [`--single_end`](#--single_end) - * [`--seq_center`](#--seq_center) - * [`--fragment_size`](#--fragment_size) - * [`--fingerprint_bins`](#--fingerprint_bins) + * [`--single_end`](#--single_end) + * [`--seq_center`](#--seq_center) + * [`--fragment_size`](#--fragment_size) + * [`--fingerprint_bins`](#--fingerprint_bins) * [Reference genomes](#reference-genomes) - * [`--genome` (using iGenomes)](#--genome-using-igenomes) - * [`--fasta`](#--fasta) - * [`--gtf`](#--gtf) - * [`--bwa_index`](#--bwa_index) - * [`--gene_bed`](#--gene_bed) - * [`--tss_bed`](#--tss_bed) - * [`--macs_gsize`](#--macs_gsize) - * [`--blacklist`](#--blacklist) - * [`--save_reference`](#--save_reference) - * [`--igenomes_ignore`](#--igenomes_ignore) + * [`--genome` (using iGenomes)](#--genome-using-igenomes) + * [`--fasta`](#--fasta) + * [`--gtf`](#--gtf) + * [`--bwa_index`](#--bwa_index) + * [`--gene_bed`](#--gene_bed) + * [`--macs_gsize`](#--macs_gsize) + * [`--blacklist`](#--blacklist) + * [`--save_reference`](#--save_reference) + * [`--igenomes_ignore`](#--igenomes_ignore) * [Adapter trimming](#adapter-trimming) - * [`--skip_trimming`](#--skip_trimming) - * [`--save_trimmed`](#--save_trimmed) + * [`--skip_trimming`](#--skip_trimming) + * [`--save_trimmed`](#--save_trimmed) * [Alignments](#alignments) - * [`--keep_dups`](#--keep_dups) - * [`--keep_multi_map`](#--keep_multi_map) - * [`--save_align_intermeds`](#--save_align_intermeds) + * [`--bwa_min_score`](#--bwa_min_score) + * [`--keep_dups`](#--keep_dups) + * [`--keep_multi_map`](#--keep_multi_map) + * [`--save_align_intermeds`](#--save_align_intermeds) * [Peaks](#peaks) - * [`--narrow_peak`](#--narrow_peak) - * [`--broad_cutoff`](#--broad_cutoff) - * [`--min_reps_consensus`](#--min_reps_consensus) - * [`--save_macs_pileup`](#--save_macs_pileup) - * [`--skip_diff_analysis`](#--skip_diff_analysis) + * [`--narrow_peak`](#--narrow_peak) + * [`--broad_cutoff`](#--broad_cutoff) + * [`--macs_fdr`](#--macs_fdr) + * [`--macs_pvalue`](#--macs_pvalue) + * 
[`--min_reps_consensus`](#--min_reps_consensus)
+  * [`--save_macs_pileup`](#--save_macs_pileup)
+  * [`--skip_peak_qc`](#--skip_peak_qc)
+  * [`--skip_peak_annotation`](#--skip_peak_annotation)
+  * [`--skip_consensus_peaks`](#--skip_consensus_peaks)
+* [Differential analysis](#differential-analysis)
+  * [`--deseq2_vst`](#--deseq2_vst)
+  * [`--skip_diff_analysis`](#--skip_diff_analysis)
 * [Skipping QC steps](#skipping-qc-steps)
 * [Job resources](#job-resources)
-  * [Automatic resubmission](#automatic-resubmission)
-  * [Custom resource requests](#custom-resource-requests)
+  * [Automatic resubmission](#automatic-resubmission)
+  * [Custom resource requests](#custom-resource-requests)
 * [AWS Batch specific parameters](#aws-batch-specific-parameters)
-  * [`--awsqueue`](#--awsqueue)
-  * [`--awsregion`](#--awsregion)
+  * [`--awsqueue`](#--awsqueue)
+  * [`--awsregion`](#--awsregion)
+  * [`--awscli`](#--awscli)
 * [Other command line parameters](#other-command-line-parameters)
-  * [`--outdir`](#--outdir)
-  * [`--email`](#--email)
-  * [`--email_on_fail`](#--email_on_fail)
-  * [`--max_multiqc_email_size`](#--max_multiqc_email_size)
-  * [`-name`](#-name)
-  * [`-resume`](#-resume)
-  * [`-c`](#-c)
-  * [`--custom_config_version`](#--custom_config_version)
-  * [`--custom_config_base`](#--custom_config_base)
-  * [`--max_memory`](#--max_memory)
-  * [`--max_time`](#--max_time)
-  * [`--max_cpus`](#--max_cpus)
-  * [`--plaintext_email`](#--plaintext_email)
-  * [`--monochrome_logs`](#--monochrome_logs)
-  * [`--multiqc_config`](#--multiqc_config)
-
+  * [`--outdir`](#--outdir)
+  * [`--publish_dir_mode`](#--publish_dir_mode)
+  * [`--email`](#--email)
+  * [`--email_on_fail`](#--email_on_fail)
+  * [`--max_multiqc_email_size`](#--max_multiqc_email_size)
+  * [`-name`](#-name)
+  * [`-resume`](#-resume)
+  * [`-c`](#-c)
+  * [`--custom_config_version`](#--custom_config_version)
+  * [`--custom_config_base`](#--custom_config_base)
+  * [`--max_memory`](#--max_memory)
+  * [`--max_time`](#--max_time)
+  * [`--max_cpus`](#--max_cpus)
+  * [`--plaintext_email`](#--plaintext_email)
+  * [`--monochrome_logs`](#--monochrome_logs)
+  * [`--multiqc_config`](#--multiqc_config)

## Introduction
For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. + +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -* `awsbatch` - * A generic configuration profile to be used with AWS Batch. -* `conda` - * A generic configuration profile to be used with [conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) * `docker` - * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from dockerhub: [`nfcore/chipseq`](http://hub.docker.com/r/nfcore/chipseq/) + * A generic configuration profile to be used with [Docker](http://docker.com/) + * Pulls software from dockerhub: [`nfcore/chipseq`](http://hub.docker.com/r/nfcore/chipseq/) * `singularity` - * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) - * Pulls software from DockerHub: [`nfcore/chipseq`](http://hub.docker.com/r/nfcore/chipseq/) + * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) + * Pulls software from DockerHub: [`nfcore/chipseq`](http://hub.docker.com/r/nfcore/chipseq/) +* `conda` + * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. + * A generic configuration profile to be used with [Conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) * `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters ### `--input` @@ -249,13 +263,13 @@ There are 31 different species supported in the iGenomes references. To run the You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: * Human - * `--genome GRCh37` + * `--genome GRCh37` * Mouse - * `--genome GRCm38` + * `--genome GRCm38` * _Drosophila_ - * `--genome BDGP6` + * `--genome BDGP6` * _S. cerevisiae_ - * `--genome 'R64-1-1'` + * `--genome 'R64-1-1'` > There are numerous others - check the config file for more. @@ -306,14 +320,6 @@ The full path to BED file for genome-wide gene intervals. This will be created f --gene_bed '[path to gene BED file]' ``` -### `--tss_bed` - -The full path to BED file for genome-wide transcription start sites. This will be created from the gene BED file if not specified. - -```bash ---tss_bed '[path to tss BED file]' -``` - ### `--macs_gsize` [Effective genome size](https://github.com/taoliu/MACS#-g--gsize) parameter required by MACS2. These have been provided when `--genome` is set as *GRCh37*, *GRCh38*, *GRCm38*, *WBcel235*, *BDGP6*, *R64-1-1*, *EF2*, *hg38*, *hg19* and *mm10*. For other genomes, if this parameter is not specified then the MACS2 peak-calling and differential analysis will be skipped. 
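+For example, for a genome without a pre-configured value you can pass the effective genome size directly; `2.7e9` below is the standard MACS2 value for human and is purely illustrative:
+
+```bash
+--macs_gsize 2.7e9
+```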
@@ -332,11 +338,11 @@ If provided, alignments that overlap with the regions in this file will be filte

### `--save_reference`

-If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times.
+If the BWA index is generated by the pipeline use this parameter to save it to your results folder. These can then be used for future pipeline runs, reducing processing times (Default: false).

### `--igenomes_ignore`

-Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.
+Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config` (Default: false).

## Adapter trimming

@@ -344,51 +350,63 @@ The pipeline accepts a number of parameters to change how the trimming is done,

You can specify custom trimming parameters as follows:

* `--clip_r1 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 5' end of read 1 (for single-end reads).
+  * Instructs Trim Galore to remove [int] bp from the 5' end of read 1 (for single-end reads).
* `--clip_r2 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 5' end of read 2 (paired-end reads only).
+  * Instructs Trim Galore to remove [int] bp from the 5' end of read 2 (paired-end reads only).
* `--three_prime_clip_r1 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 3' end of read 1 _AFTER_ adapter/quality trimming has been
+  * Instructs Trim Galore to remove [int] bp from the 3' end of read 1 _AFTER_ adapter/quality trimming has been performed.
* `--three_prime_clip_r2 [int]`
-  * Instructs Trim Galore to remove [int] bp from the 3' end of read 2 _AFTER_ adapter/quality trimming has been performed.
+  * Instructs Trim Galore to remove [int] bp from the 3' end of read 2 _AFTER_ adapter/quality trimming has been performed.
* `--trim_nextseq [int]`
-  * This enables the option Cutadapt `--nextseq-trim=3'CUTOFF` option via Trim Galore, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This trimming is in common for the NextSeq- and NovaSeq-platforms, where basecalls without any signal are called as high-quality G bases.
+  * This enables Cutadapt's `--nextseq-trim=3'CUTOFF` option via Trim Galore, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This type of trimming is common on the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.

### `--skip_trimming`

-Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.
+Skip the adapter trimming step. Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data (Default: false).

### `--save_trimmed`

-By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
+By default, trimmed FastQ files will not be saved to the results directory.
Specify this flag (or set to true in your config file) to copy these files to the results directory when complete (Default: false).

## Alignments

+### `--bwa_min_score`
+
+Do not output BWA MEM alignments with a score lower than this parameter (Default: false).
+
### `--keep_dups`

-Duplicate reads are not filtered from alignments.
+Duplicate reads are not filtered from alignments (Default: false).

### `--keep_multi_map`

-Reads mapping to multiple locations in the genome are not filtered from alignments.
+Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false).

### `--save_align_intermeds`

-By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files.
+By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files (Default: false).

## Peaks

### `--narrow_peak`

-MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
+MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode (Default: false).

### `--broad_cutoff`

Specifies broad cut-off value for MACS2. Only used when `--narrow_peak` isn't specified (Default: `0.1`).

+### `--macs_fdr`
+
+Minimum FDR (q-value) cutoff for peak detection; `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false).
+
+### `--macs_pvalue`
+
+p-value cutoff for peak detection; `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false). If the `--macs_pvalue` cutoff is set, the q-value will not be calculated and will be reported as -1 in the final .xls file.
+
### `--min_reps_consensus`

-Number of biological replicates required from a given condition for a peak to contribute to a consensus peak . If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" set of consensus of peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.
+Number of biological replicates required from a given condition for a peak to contribute to a consensus peak. If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a "reproducible" consensus set of peaks. For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded (Default: 1).

```bash
--min_reps_consensus 1
```

@@ -396,11 +414,29 @@ Number of biological replicates required from a given condition for a peak to co

### `--save_macs_pileup`

-Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters.
+Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters (Default: false).
+
+### `--skip_peak_qc`
+
+Skip MACS2 peak QC plot generation (Default: false).
+
+### `--skip_peak_annotation`
+
+Skip annotation of MACS2 and consensus peaks with HOMER (Default: false).
+
+### `--skip_consensus_peaks`
+
+Skip consensus peak generation, annotation and counting (Default: false).
+
+## Differential analysis
+
+### `--deseq2_vst`
+
+Use `vst` transformation instead of `rlog` with DESeq2. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) (Default: false).

### `--skip_diff_analysis`

-Skip read counting and differential analysis step.
+Skip differential binding analysis with DESeq2 (Default: false).

## Skipping QC steps

@@ -430,11 +466,11 @@ Wherever process-specific requirements are set in the pipeline, the default valu

If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.

-If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack/).
+If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack).

## AWS Batch specific parameters

-Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use the `-awsbatch` profile and then specify all of the following parameters.
+Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use [`-profile awsbatch`](https://github.com/nf-core/configs/blob/master/conf/awsbatch.config) and then specify all of the following parameters.

### `--awsqueue`

The JobQueue that you intend to use on AWS Batch.

### `--awsregion`

-The AWS region to run your job in. Default is set to `eu-west-1` but can be adjusted to your needs.
+The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs.
+
+### `--awscli`
+
+The [AWS CLI](https://www.nextflow.io/docs/latest/awscloud.html#aws-cli-installation) path in your custom AMI. Default: `/home/ec2-user/miniconda/bin/aws`.

Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to an S3 storage bucket of your choice - you'll get an error message notifying you if you didn't.

@@ -452,6 +492,10 @@ Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to a

### `--outdir`

The output directory where the results will be saved.

+### `--publish_dir_mode`
+
+Value passed to Nextflow [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir) directive for publishing results in the output directory. Available: 'symlink', 'rellink', 'link', 'copy', 'copyNoFollow' and 'move' (Default: 'copy').
+
### `--email`

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.
@@ -462,7 +506,7 @@ This works exactly as with `--email`, except emails are only sent if the workflo ### `--max_multiqc_email_size` -Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: `25MB`). +Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB). ### `-name` @@ -490,7 +534,7 @@ Note - you can use this to override pipeline defaults. ### `--custom_config_version` -Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default is set to `master`. +Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`. ```bash ## Download and use config file with following git commid id diff --git a/environment.yml b/environment.yml index 40bd3199..11207722 100644 --- a/environment.yml +++ b/environment.yml @@ -1,40 +1,47 @@ # You can use this file to create a conda environment for this pipeline: # conda env create -f environment.yml -name: nf-core-chipseq-1.1.0 +name: nf-core-chipseq-1.2.0 channels: - conda-forge - bioconda - defaults dependencies: ## conda-forge packages - - gawk=4.2.1 - - r-base=3.4.1 - - r-optparse=1.6.0 - - r-rcolorbrewer=1.1_2 - - r-ggplot2=3.1.0 - - r-reshape2=1.4.3 - - r-scales=1.0.0 - - r-pheatmap=1.0.10 - - r-lattice=0.20_35 - - r-upsetr=1.3.3 - - r-xfun=0.3 + - conda-forge::python=3.7.6 + - conda-forge::markdown=3.2.2 + - conda-forge::pymdown-extensions=7.1 + - conda-forge::pygments=2.6.1 + - conda-forge::r-base=3.6.2 + - conda-forge::r-optparse=1.6.6 + - conda-forge::r-rcolorbrewer=1.1_2 + - conda-forge::r-reshape2=1.4.4 + - conda-forge::r-ggplot2=3.3.2 + - conda-forge::r-tidyr=1.1.0 + - conda-forge::r-scales=1.1.1 + - conda-forge::r-pheatmap=1.0.12 + - conda-forge::r-lattice=0.20_41 + - conda-forge::r-upsetr=1.4.0 + - conda-forge::r-xfun=0.15 + - conda-forge::gawk=5.1.0 + - conda-forge::pigz=2.3.4 ## Required for TrimGalore multi-threading ## bioconda packages - - fastqc=0.11.8 - - trim-galore=0.5.0 - - bwa=0.7.17 - - samtools=1.9 - - picard=2.19.0 - - bamtools=2.5.1 - - pysam=0.15.2 - - bedtools=2.27.1 - - ucsc-bedgraphtobigwig=377 - - deeptools=3.2.1 - - macs2=2.1.2 - - homer=4.9.1 - - subread=1.6.4 - - phantompeakqualtools=1.2 - - preseq=2.0.3 - - multiqc=1.7 - - bioconductor-deseq2=1.20.0 - - bioconductor-vsn=3.46.0 + - bioconda::fastqc=0.11.9 + - bioconda::trim-galore=0.6.5 + - bioconda::bwa=0.7.17 + - bioconda::samtools=1.10 + - bioconda::picard=2.23.1 + - bioconda::bamtools=2.5.1 + - bioconda::pysam=0.15.3 + - bioconda::bedtools=2.29.2 + - bioconda::ucsc-bedgraphtobigwig=357 + - bioconda::deeptools=3.4.3 + - bioconda::macs2=2.2.7.1 + - bioconda::homer=4.11 + - bioconda::subread=2.0.1 + - bioconda::phantompeakqualtools=1.2.2 + - bioconda::preseq=2.0.3 + - bioconda::multiqc=1.9 + - bioconda::bioconductor-biocparallel=1.20.0 + - bioconda::bioconductor-deseq2=1.26.0 + - bioconda::bioconductor-vsn=3.54.0 diff --git a/main.nf b/main.nf index 51b84f3e..fa9c455a 100755 --- a/main.nf +++ b/main.nf @@ -19,26 +19,25 @@ def helpMessage() { nextflow run nf-core/chipseq --input design.csv --genome GRCh37 -profile docker Mandatory arguments: - --input [file] Comma-separated file containing information about the samples in the experiment (see docs/usage.md) - --fasta [file] Path to Fasta 
reference. Not mandatory when using reference in iGenomes config via --genome - --gtf [file] Path to GTF file. Not mandatory when using reference in iGenomes config via --genome + --input [file] Comma-separated file containing information about the samples in the experiment (see docs/usage.md) (Default: './design.csv') + --fasta [file] Path to Fasta reference. Not mandatory when using reference in iGenomes config via --genome (Default: false) + --gtf [file] Path to GTF file. Not mandatory when using reference in iGenomes config via --genome (Default: false) -profile [str] Configuration profile to use. Can use multiple (comma separated) Available: conda, docker, singularity, awsbatch, test Generic - --single_end [bool] Specifies that the input is single-end reads - --seq_center [str] Sequencing center information to be added to read group of BAM files + --single_end [bool] Specifies that the input is single-end reads (Default: false) + --seq_center [str] Sequencing center information to be added to read group of BAM files (Default: false) --fragment_size [int] Estimated fragment size used to extend single-end reads (Default: 200) --fingerprint_bins [int] Number of genomic bins to use when calculating fingerprint plot (Default: 500000) References If not specified in the configuration file or you wish to overwrite any of the references - --genome [str] Name of iGenomes reference - --bwa_index [file] Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa - --gene_bed [file] Path to BED file containing gene intervals - --tss_bed [file] Path to BED file containing transcription start sites - --macs_gsize [str] Effective genome size parameter required by MACS2. If using iGenomes config, values have only been provided when --genome is set as GRCh37, GRCm38, hg19, mm10, BDGP6 and WBcel235 - --blacklist [file] Path to blacklist regions (.BED format), used for filtering alignments - --save_reference [bool] If generated by the pipeline save the BWA index in the results directory + --genome [str] Name of iGenomes reference (Default: false) + --bwa_index [file] Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa (Default: false) + --gene_bed [file] Path to BED file containing gene intervals (Default: false) + --macs_gsize [str] Effective genome size parameter required by MACS2. 
If using iGenomes config, values have only been provided when --genome is set as GRCh37, GRCm38, hg19, mm10, BDGP6 and WBcel235 (Default: false)
+      --blacklist [file]              Path to blacklist regions (.BED format), used for filtering alignments (Default: false)
+      --save_reference [bool]         If generated by the pipeline save the BWA index in the results directory (Default: false)

    Trimming
      --clip_r1 [int]                 Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads) (Default: 0)
@@ -46,41 +45,52 @@ def helpMessage() {
      --three_prime_clip_r1 [int]     Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed (Default: 0)
      --three_prime_clip_r2 [int]     Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed (Default: 0)
      --trim_nextseq [int]            Instructs Trim Galore to apply the --nextseq=X option, to trim based on quality after removing poly-G tails (Default: 0)
-      --skip_trimming [bool]          Skip the adapter trimming step
-      --save_trimmed [bool]           Save the trimmed FastQ files in the results directory
+      --skip_trimming [bool]          Skip the adapter trimming step (Default: false)
+      --save_trimmed [bool]           Save the trimmed FastQ files in the results directory (Default: false)

    Alignments
-      --keep_dups [bool]              Duplicate reads are not filtered from alignments
-      --keep_multi_map [bool]         Reads mapping to multiple locations are not filtered from alignments
-      --save_align_intermeds [bool]   Save the intermediate BAM files from the alignment step - not done by default
+      --bwa_min_score [int]           Do not output BWA MEM alignments with a score lower than this parameter (Default: false)
+      --keep_dups [bool]              Duplicate reads are not filtered from alignments (Default: false)
+      --keep_multi_map [bool]         Reads mapping to multiple locations are not filtered from alignments (Default: false)
+      --save_align_intermeds [bool]   Save the intermediate BAM files from the alignment step - not done by default (Default: false)

    Peaks
-      --narrow_peak [bool]            Run MACS2 in narrowPeak mode
+      --narrow_peak [bool]            Run MACS2 in narrowPeak mode (Default: false)
      --broad_cutoff [float]          Specifies broad cutoff value for MACS2.
Only used when --narrow_peak isn't specified (Default: 0.1)
+      --macs_fdr [float]              Minimum FDR (q-value) cutoff for peak detection, --macs_fdr and --macs_pvalue are mutually exclusive (Default: false)
+      --macs_pvalue [float]           p-value cutoff for peak detection, --macs_fdr and --macs_pvalue are mutually exclusive (Default: false)
      --min_reps_consensus [int]      Number of biological replicates required from a given condition for a peak to contribute to a consensus peak (Default: 1)
-      --save_macs_pileup [bool]       Instruct MACS2 to create bedGraph files normalised to signal per million reads
-      --skip_diff_analysis [bool]     Skip differential binding analysis
+      --save_macs_pileup [bool]       Instruct MACS2 to create bedGraph files normalised to signal per million reads (Default: false)
+      --skip_peak_qc [bool]           Skip MACS2 peak QC plot generation (Default: false)
+      --skip_peak_annotation [bool]   Skip annotation of MACS2 and consensus peaks with HOMER (Default: false)
+      --skip_consensus_peaks [bool]   Skip consensus peak generation (Default: false)
+
+    Differential analysis
+      --deseq2_vst [bool]             Use vst transformation instead of rlog with DESeq2 (Default: false)
+      --skip_diff_analysis [bool]     Skip differential binding analysis (Default: false)

    QC
-      --skip_fastqc [bool]            Skip FastQC
-      --skip_picard_metrics [bool]    Skip Picard CollectMultipleMetrics
-      --skip_preseq [bool]            Skip Preseq
-      --skip_plot_profile [bool]      Skip deepTools plotProfile
-      --skip_plot_fingerprint [bool]  Skip deepTools plotFingerprint
-      --skip_spp [bool]               Skip Phantompeakqualtools
-      --skip_igv [bool]               Skip IGV
-      --skip_multiqc [bool]           Skip MultiQC
+      --skip_fastqc [bool]            Skip FastQC (Default: false)
+      --skip_picard_metrics [bool]    Skip Picard CollectMultipleMetrics (Default: false)
+      --skip_preseq [bool]            Skip Preseq (Default: false)
+      --skip_plot_profile [bool]      Skip deepTools plotProfile (Default: false)
+      --skip_plot_fingerprint [bool]  Skip deepTools plotFingerprint (Default: false)
+      --skip_spp [bool]               Skip Phantompeakqualtools (Default: false)
+      --skip_igv [bool]               Skip IGV (Default: false)
+      --skip_multiqc [bool]           Skip MultiQC (Default: false)

    Other
-      --outdir [file]                 The output directory where the results will be saved
-      --email [email]                 Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits
-      --email_on_fail [email]         Same as --email, except only send mail if the workflow is not successful
-      --max_multiqc_email_size [str]  Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB)
-      -name [str]                     Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic
-
-    AWSBatch
+      --outdir [file]                 The output directory where the results will be saved (Default: './results')
+      --publish_dir_mode [str]        Mode for publishing results in the output directory. Available: symlink, rellink, link, copy, copyNoFollow, move (Default: copy)
+      --email [email]                 Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits (Default: false)
+      --email_on_fail [email]         Same as --email, except only send mail if the workflow is not successful (Default: false)
+      --max_multiqc_email_size [str]  Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB)
+      -name [str]                     Name for the pipeline run.
If not specified, Nextflow will automatically generate a random mnemonic (Default: false) + + AWSBatch options: --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch --awsregion [str] The AWS Region for your AWS Batch job to run on + --awscli [str] Path to the AWS CLI tool """.stripIndent() } @@ -92,25 +102,28 @@ def helpMessage() { /////////////////////////////////////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////// -/* - * SET UP CONFIGURATION VARIABLES - */ - // Show help message if (params.help) { helpMessage() exit 0 } -// Check if genome exists in the config file -if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" +// Has the run name been specified by the user? +// this has the bonus effect of catching both -name and --name +custom_runName = params.name +if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { + custom_runName = workflow.runName } //////////////////////////////////////////////////// /* -- DEFAULT PARAMETER VALUES -- */ //////////////////////////////////////////////////// +// Check if genome exists in the config file +if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" +} + // Configurable variables params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false params.bwa_index = params.genome ? params.genomes[ params.genome ].bwa ?: false : false @@ -118,23 +131,20 @@ params.gtf = params.genome ? params.genomes[ params.genome ].gtf ?: false : fals params.gene_bed = params.genome ? params.genomes[ params.genome ].bed12 ?: false : false params.macs_gsize = params.genome ? params.genomes[ params.genome ].macs_gsize ?: false : false params.blacklist = params.genome ? params.genomes[ params.genome ].blacklist ?: false : false +params.anno_readme = params.genome ? params.genomes[ params.genome ].readme ?: false : false // Global variables -def PEAK_TYPE = params.narrow_peak ? "narrowPeak" : "broadPeak" - -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} +def PEAK_TYPE = params.narrow_peak ? 'narrowPeak' : 'broadPeak' //////////////////////////////////////////////////// /* -- CONFIG FILES -- */ //////////////////////////////////////////////////// // Pipeline config +ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? 
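// NB: illustrative Groovy sketch, not part of the pipeline diff. The
// `params.genome ? params.genomes[ params.genome ].fasta ?: false : false`
// pattern above collapses three cases: genome supplied and attribute present,
// genome supplied but attribute missing, and no genome at all. The genomes map
// and paths below are hypothetical; the pipeline guards with containsKey first,
// while ?. keeps this standalone sketch null-safe.
def genomes = [ GRCh37: [ fasta: 's3://igenomes/GRCh37/genome.fa' ], R64: [:] ]
def resolve = { genome, attr -> genome ? (genomes[genome]?.get(attr) ?: false) : false }
assert resolve('GRCh37', 'fasta') == 's3://igenomes/GRCh37/genome.fa'
assert resolve('R64', 'fasta')    == false   // genome known, attribute missing
assert resolve(null, 'fasta')     == false   // --genome not supplied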
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) +ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) // JSON files required by BAMTools for alignment filtering if (params.single_end) { @@ -144,7 +154,6 @@ if (params.single_end) { } // Header files for MultiQC -ch_multiqc_config = file(params.multiqc_config, checkIfExists: true) ch_peak_count_header = file("$baseDir/assets/multiqc/peak_count_header.txt", checkIfExists: true) ch_frip_score_header = file("$baseDir/assets/multiqc/frip_score_header.txt", checkIfExists: true) ch_peak_annotation_header = file("$baseDir/assets/multiqc/peak_annotation_header.txt", checkIfExists: true) @@ -158,11 +167,9 @@ ch_spp_rsc_header = file("$baseDir/assets/multiqc/spp_rsc_header.txt", checkIfEx /* -- VALIDATE INPUTS -- */ //////////////////////////////////////////////////// -// Validate inputs -if (params.input) { ch_input = file(params.input, checkIfExists: true) } else { exit 1, "Samples design file not specified!" } -if (params.gtf) { ch_gtf = file(params.gtf, checkIfExists: true) } else { exit 1, "GTF annotation file not specified!" } +if (params.input) { ch_input = file(params.input, checkIfExists: true) } else { exit 1, 'Samples design file not specified!' } +if (params.gtf) { ch_gtf = file(params.gtf, checkIfExists: true) } else { exit 1, 'GTF annotation file not specified!' } if (params.gene_bed) { ch_gene_bed = file(params.gene_bed, checkIfExists: true) } -if (params.tss_bed) { ch_tss_bed = file(params.tss_bed, checkIfExists: true) } if (params.blacklist) { ch_blacklist = Channel.fromPath(params.blacklist, checkIfExists: true) } else { ch_blacklist = Channel.empty() } if (params.fasta) { @@ -170,30 +177,36 @@ if (params.fasta) { bwa_base = params.fasta.substring(lastPath+1) ch_fasta = file(params.fasta, checkIfExists: true) } else { - exit 1, "Fasta file not specified!" + exit 1, 'Fasta file not specified!' } if (params.bwa_index) { lastPath = params.bwa_index.lastIndexOf(File.separator) - bwa_dir = params.bwa_index.substring(0,lastPath+1) + bwa_dir = params.bwa_index.substring(0,lastPath+1) bwa_base = params.bwa_index.substring(lastPath+1) Channel .fromPath(bwa_dir, checkIfExists: true) .set { ch_bwa_index } } +// Save AWS IGenomes file containing annotation version +if (params.anno_readme && file(params.anno_readme).exists()) { + file("${params.outdir}/genome/").mkdirs() + file(params.anno_readme).copyTo("${params.outdir}/genome/") +} + //////////////////////////////////////////////////// /* -- AWS -- */ //////////////////////////////////////////////////// -if (workflow.profile == 'awsbatch') { +if (workflow.profile.contains('awsbatch')) { // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + if (!params.awsqueue || !params.awsregion) exit 1, 'Specify correct --awsqueue and --awsregion parameters on AWSBatch!' // Check that the outdir path is an S3 bucket if running on AWSBatch // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + if (!params.outdir.startsWith('s3:')) exit 1, 'Outdir not on S3 - specify S3 Bucket to run on AWSBatch!' // Prevent trace files from being stored on S3 since S3 does not support rolling files. - if (workflow.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace!
S3 cannot be used for tracefiles." + if (params.tracedir.startsWith('s3:')) exit 1, 'Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.' } /////////////////////////////////////////////////////////////////////////////// @@ -214,13 +227,15 @@ summary['Genome'] = params.genome ?: 'Not supplied' summary['Fasta File'] = params.fasta summary['GTF File'] = params.gtf if (params.gene_bed) summary['Gene BED File'] = params.gene_bed -if (params.tss_bed) summary['TSS BED File'] = params.tss_bed if (params.bwa_index) summary['BWA Index'] = params.bwa_index if (params.blacklist) summary['Blacklist BED'] = params.blacklist +if (params.bwa_min_score) summary['BWA Min Score'] = params.bwa_min_score summary['MACS2 Genome Size'] = params.macs_gsize ?: 'Not supplied' summary['Min Consensus Reps'] = params.min_reps_consensus if (params.macs_gsize) summary['MACS2 Narrow Peaks'] = params.narrow_peak ? 'Yes' : 'No' if (!params.narrow_peak) summary['MACS2 Broad Cutoff'] = params.broad_cutoff +if (params.macs_fdr) summary['MACS2 FDR'] = params.macs_fdr +if (params.macs_pvalue) summary['MACS2 P-value'] = params.macs_pvalue if (params.skip_trimming) { summary['Trimming Step'] = 'Skipped' } else { @@ -228,7 +243,7 @@ if (params.skip_trimming) { summary['Trim R2'] = "$params.clip_r2 bp" summary["Trim 3' R1"] = "$params.three_prime_clip_r1 bp" summary["Trim 3' R2"] = "$params.three_prime_clip_r2 bp" - summary["NextSeq Trim"] = "$params.trim_nextseq bp" + summary['NextSeq Trim'] = "$params.trim_nextseq bp" } if (params.seq_center) summary['Sequencing Center'] = params.seq_center if (params.single_end) summary['Fragment Size'] = "$params.fragment_size bp" @@ -239,7 +254,11 @@ summary['Save Genome Index'] = params.save_reference ? 'Yes' : 'No' if (params.save_trimmed) summary['Save Trimmed'] = 'Yes' if (params.save_align_intermeds) summary['Save Intermeds'] = 'Yes' if (params.save_macs_pileup) summary['Save MACS2 Pileup'] = 'Yes' -if (params.skip_diff_analysis) summary['Skip Diff Analysis'] = 'Yes' +if (params.skip_peak_qc) summary['Skip MACS2 Peak QC'] = 'Yes' +if (params.skip_peak_annotation) summary['Skip Peak Annotation'] = 'Yes' +if (params.skip_consensus_peaks) summary['Skip Consensus Peaks'] = 'Yes' +if (params.deseq2_vst) summary['Use DESeq2 vst Transform'] = 'Yes' +if (params.skip_diff_analysis) summary['Skip Differential Analysis'] = 'Yes' if (params.skip_fastqc) summary['Skip FastQC'] = 'Yes' if (params.skip_picard_metrics) summary['Skip Picard Metrics'] = 'Yes' if (params.skip_preseq) summary['Skip Preseq'] = 'Yes' @@ -249,15 +268,16 @@ if (params.skip_spp) summary['Skip spp'] = 'Yes' if (params.skip_igv) summary['Skip IGV'] = 'Yes' if (params.skip_multiqc) summary['Skip MultiQC'] = 'Yes' summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output Dir'] = params.outdir summary['Launch Dir'] = workflow.launchDir summary['Working Dir'] = workflow.workDir summary['Script Dir'] = workflow.projectDir summary['User'] = workflow.userName -if (workflow.profile == 'awsbatch') { +if (workflow.profile.contains('awsbatch')) { summary['AWS Region'] = params.awsregion summary['AWS Queue'] = params.awsqueue + summary['AWS CLI'] = params.awscli } summary['Config Profile'] = workflow.profile if (params.config_profile_description) 
summary['Config Description'] = params.config_profile_description @@ -268,7 +288,7 @@ if (params.email || params.email_on_fail) { summary['E-mail on failure'] = params.email_on_fail summary['MultiQC Max Size'] = params.max_multiqc_email_size } -log.info summary.collect { k,v -> "${k.padRight(20)}: $v" }.join("\n") +log.info summary.collect { k,v -> "${k.padRight(20)}: $v" }.join('\n') log.info "-\033[2m--------------------------------------------------\033[0m-" // Check the hostnames against configured profiles @@ -293,18 +313,18 @@ if (!params.macs_gsize) { /////////////////////////////////////////////////////////////////////////////// /* - * PREPROCESSING - REFORMAT DESIGN FILE, CHECK VALIDITY & CREATE IP vs CONTROL MAPPINGS + * PREPROCESSING: Reformat design file, check validity and create IP vs control mappings */ -process CheckDesign { +process CHECK_DESIGN { tag "$design" - publishDir "${params.outdir}/pipeline_info", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode input: - file design from ch_input + path design from ch_input output: - file "design_reads.csv" into ch_design_reads_csv - file "design_controls.csv" into ch_design_controls_csv + path 'design_reads.csv' into ch_design_reads_csv + path 'design_controls.csv' into ch_design_controls_csv script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/ """ @@ -346,20 +366,20 @@ ch_design_controls_csv /////////////////////////////////////////////////////////////////////////////// /* - * PREPROCESSING - Build BWA index + * PREPROCESSING: Build BWA index */ if (!params.bwa_index) { - process BWAIndex { + process BWA_INDEX { tag "$fasta" label 'process_high' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' + publishDir path: { params.save_reference ? "${params.outdir}/genome" : params.outdir }, + saveAs: { params.save_reference ?
it : null }, mode: params.publish_dir_mode input: - file fasta from ch_fasta + path fasta from ch_fasta output: - file "BWAIndex" into ch_bwa_index + path 'BWAIndex' into ch_bwa_index script: """ @@ -370,19 +390,29 @@ if (!params.bwa_index) { } /* - * PREPROCESSING - Generate gene BED file + * PREPROCESSING: Generate gene BED file */ +// If --gtf is supplied along with --genome +// Make gene bed from supplied --gtf instead of using iGenomes one automatically +def MAKE_BED = false if (!params.gene_bed) { - process MakeGeneBED { + MAKE_BED = true +} else if (params.genome && params.gtf) { + if (params.genomes[ params.genome ].gtf != params.gtf) { + MAKE_BED = true + } +} +if (MAKE_BED) { + process MAKE_GENE_BED { tag "$gtf" label 'process_low' - publishDir "${params.outdir}/reference_genome", mode: 'copy' + publishDir "${params.outdir}/genome", mode: params.publish_dir_mode input: - file gtf from ch_gtf + path gtf from ch_gtf output: - file "*.bed" into ch_gene_bed + path '*.bed' into ch_gene_bed script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/ """ @@ -392,42 +422,21 @@ if (!params.gene_bed) { } /* - * PREPROCESSING - Generate TSS BED file - */ -if (!params.tss_bed) { - process MakeTSSBED { - tag "$bed" - publishDir "${params.outdir}/reference_genome", mode: 'copy' - - input: - file bed from ch_gene_bed - - output: - file "*.bed" into ch_tss_bed - - script: - """ - cat $bed | awk -v FS='\t' -v OFS='\t' '{ if(\$6=="+") \$3=\$2+1; else \$2=\$3-1; print \$1, \$2, \$3, \$4, \$5, \$6;}' > ${bed.baseName}.tss.bed - """ - } -} - -/* - * PREPROCESSING - Prepare genome intervals for filtering + * PREPROCESSING: Prepare genome intervals for filtering */ -process MakeGenomeFilter { +process MAKE_GENOME_FILTER { tag "$fasta" - publishDir "${params.outdir}/reference_genome", mode: 'copy' + publishDir "${params.outdir}/genome", mode: params.publish_dir_mode input: - file fasta from ch_fasta - file blacklist from ch_blacklist.ifEmpty([]) + path fasta from ch_fasta + path blacklist from ch_blacklist.ifEmpty([]) output: - file "$fasta" into ch_genome_fasta // FASTA FILE FOR IGV - file "*.fai" into ch_genome_fai // FAI INDEX FOR REFERENCE GENOME - file "*.bed" into ch_genome_filter_regions // BED FILE WITHOUT BLACKLIST REGIONS - file "*.sizes" into ch_genome_sizes_bigwig // CHROMOSOME SIZES FILE FOR BEDTOOLS + path "$fasta" // FASTA FILE FOR IGV + path '*.fai' // FAI INDEX FOR REFERENCE GENOME + path '*.bed' into ch_genome_filter_regions // BED FILE WITHOUT BLACKLIST REGIONS + path '*.sizes' into ch_genome_sizes_bigwig // CHROMOSOME SIZES FILE FOR BEDTOOLS script: blacklist_filter = params.blacklist ? "sortBed -i $blacklist -g ${fasta}.sizes | complementBed -i stdin -g ${fasta}.sizes" : "awk '{print \$1, '0' , \$2}' OFS='\t' ${fasta}.sizes" @@ -447,24 +456,24 @@ process MakeGenomeFilter { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 1 - FastQC + * STEP 1: FastQC */ -process FastQC { +process FASTQC { tag "$name" label 'process_medium' - publishDir "${params.outdir}/fastqc", mode: 'copy', + publishDir "${params.outdir}/fastqc", mode: params.publish_dir_mode, saveAs: { filename -> - filename.endsWith(".zip") ? "zips/$filename" : "$filename" + filename.endsWith('.zip') ? 
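// NB: illustrative Groovy sketch, not part of the pipeline diff. The MAKE_BED
// switch above regenerates the gene BED when none was supplied, or when a
// user-supplied --gtf differs from the iGenomes GTF for the chosen --genome.
// All file paths below are hypothetical:
def needGeneBed = { geneBed, genome, gtf, igenomesGtf ->
    if (!geneBed) return true                              // no gene BED at all
    if (genome && gtf && igenomesGtf != gtf) return true   // custom GTF overrides iGenomes
    return false
}
assert  needGeneBed(null, 'GRCh37', '/data/custom.gtf', '/igenomes/genes.gtf')
assert  needGeneBed('/igenomes/genes.bed', 'GRCh37', '/data/custom.gtf', '/igenomes/genes.gtf')
assert !needGeneBed('/igenomes/genes.bed', 'GRCh37', '/igenomes/genes.gtf', '/igenomes/genes.gtf')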
"zips/$filename" : filename } when: !params.skip_fastqc input: - set val(name), file(reads) from ch_raw_reads_fastqc + tuple val(name), path(reads) from ch_raw_reads_fastqc output: - file "*.{zip,html}" into ch_fastqc_reports_mqc + path '*.{zip,html}' into ch_fastqc_reports_mqc script: // Added soft-links to original fastqs for consistent naming in MultiQC @@ -492,49 +501,61 @@ process FastQC { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 2 - Trim Galore! + * STEP 2: Trim Galore! */ if (params.skip_trimming) { ch_trimmed_reads = ch_raw_reads_trimgalore - ch_trimgalore_results_mqc = [] - ch_trimgalore_fastqc_reports_mqc = [] + ch_trimgalore_results_mqc = Channel.empty() + ch_trimgalore_fastqc_reports_mqc = Channel.empty() } else { - process TrimGalore { + process TRIMGALORE { tag "$name" - label 'process_long' - publishDir "${params.outdir}/trim_galore", mode: 'copy', + label 'process_high' + publishDir "${params.outdir}/trim_galore", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".html")) "fastqc/$filename" - else if (filename.endsWith(".zip")) "fastqc/zips/$filename" - else if (filename.endsWith("trimming_report.txt")) "logs/$filename" + if (filename.endsWith('.html')) "fastqc/$filename" + else if (filename.endsWith('.zip')) "fastqc/zips/$filename" + else if (filename.endsWith('trimming_report.txt')) "logs/$filename" else params.save_trimmed ? filename : null } input: - set val(name), file(reads) from ch_raw_reads_trimgalore + tuple val(name), path(reads) from ch_raw_reads_trimgalore output: - set val(name), file("*.fq.gz") into ch_trimmed_reads - file "*.txt" into ch_trimgalore_results_mqc - file "*.{zip,html}" into ch_trimgalore_fastqc_reports_mqc + tuple val(name), path('*.fq.gz') into ch_trimmed_reads + path '*.txt' into ch_trimgalore_results_mqc + path '*.{zip,html}' into ch_trimgalore_fastqc_reports_mqc script: - // Added soft-links to original fastqs for consistent naming in MultiQC + // Calculate number of --cores for TrimGalore based on value of task.cpus + // See: https://github.com/FelixKrueger/TrimGalore/blob/master/Changelog.md#version-060-release-on-1-mar-2019 + // See: https://github.com/nf-core/atacseq/pull/65 + def cores = 1 + if (task.cpus) { + cores = (task.cpus as int) - 4 + if (params.single_end) cores = (task.cpus as int) - 3 + if (cores < 1) cores = 1 + if (cores > 4) cores = 4 + } + c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : '' c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : '' tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : '' tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : '' nextseq = params.trim_nextseq > 0 ? "--nextseq ${params.trim_nextseq}" : '' + + // Added soft-links to original fastqs for consistent naming in MultiQC if (params.single_end) { """ [ ! -f ${name}.fastq.gz ] && ln -s $reads ${name}.fastq.gz - trim_galore --fastqc --gzip $c_r1 $tpc_r1 $nextseq ${name}.fastq.gz + trim_galore --cores $cores --fastqc --gzip $c_r1 $tpc_r1 $nextseq ${name}.fastq.gz """ } else { """ [ ! -f ${name}_1.fastq.gz ] && ln -s ${reads[0]} ${name}_1.fastq.gz [ ! 
-f ${name}_2.fastq.gz ] && ln -s ${reads[1]} ${name}_2.fastq.gz - trim_galore --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 $nextseq ${name}_1.fastq.gz ${name}_2.fastq.gz + trim_galore --cores $cores --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 $nextseq ${name}_1.fastq.gz ${name}_2.fastq.gz """ } } @@ -549,18 +570,18 @@ if (params.skip_trimming) { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 3.1 - Align read 1 with bwa + * STEP 3.1: Map read(s) with bwa mem */ -process BWAMem { +process BWA_MEM { tag "$name" label 'process_high' input: - set val(name), file(reads) from ch_trimmed_reads - file index from ch_bwa_index.collect() + tuple val(name), path(reads) from ch_trimmed_reads + path index from ch_bwa_index.collect() output: - set val(name), file("*.bam") into ch_bwa_bam + tuple val(name), path('*.bam') into ch_bwa_bam script: prefix = "${name}.Lb" @@ -568,11 +589,13 @@ process BWAMem { if (params.seq_center) { rg = "\'@RG\\tID:${name}\\tSM:${name.split('_')[0..-2].join('_')}\\tPL:ILLUMINA\\tLB:${name}\\tPU:1\\tCN:${params.seq_center}\'" } + score = params.bwa_min_score ? "-T ${params.bwa_min_score}" : '' """ bwa mem \\ -t $task.cpus \\ -M \\ -R $rg \\ + $score \\ ${index}/${bwa_base} \\ $reads \\ | samtools view -@ $task.cpus -b -h -F 0x0100 -O BAM -o ${prefix}.bam - @@ -580,27 +603,27 @@ process BWAMem { } /* - * STEP 3.2 - Convert .bam to coordinate sorted .bam + * STEP 3.2: Convert BAM to coordinate sorted BAM */ -process SortBAM { +process SORT_BAM { tag "$name" label 'process_medium' if (params.save_align_intermeds) { - publishDir path: "${params.outdir}/bwa/library", mode: 'copy', + publishDir path: "${params.outdir}/bwa/library", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" else filename } } input: - set val(name), file(bam) from ch_bwa_bam + tuple val(name), path(bam) from ch_bwa_bam output: - set val(name), file("*.sorted.{bam,bam.bai}") into ch_sort_bam_merge - file "*.{flagstat,idxstats,stats}" into ch_sort_bam_flagstat_mqc + tuple val(name), path('*.sorted.{bam,bam.bai}') into ch_sort_bam_merge + path '*.{flagstat,idxstats,stats}' into ch_sort_bam_flagstat_mqc script: prefix = "${name}.Lb" @@ -622,7 +645,7 @@ process SortBAM { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 4.1 Merge BAM files for all libraries from same replicate + * STEP 4.1: Merge BAM files for all libraries from same sample replicate */ ch_sort_bam_merge .map { it -> [ it[0].split('_')[0..-2].join('_'), it[1] ] } @@ -630,33 +653,33 @@ ch_sort_bam_merge .map { it -> [ it[0], it[1].flatten() ] } .set { ch_sort_bam_merge } -process MergeBAM { +process MERGED_BAM { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if 
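// NB: illustrative Groovy sketch, not part of the pipeline diff. This is how
// the @RG string in BWA_MEM above is assembled when --seq_center is set: the
// SM field strips the trailing run suffix from the sample id, and the doubled
// backslashes yield the literal \t separators that bwa mem -R expects. The
// sample id and centre name below are hypothetical:
def name       = 'WT_H3K4me3_R1_T1'
def seq_center = 'MY_CENTER'
def sm = name.split('_')[0..-2].join('_')
def rg = "@RG\\tID:${name}\\tSM:${sm}\\tPL:ILLUMINA\\tLB:${name}\\tPU:1\\tCN:${seq_center}"
assert sm == 'WT_H3K4me3_R1'
println rg   // @RG\tID:WT_H3K4me3_R1_T1\tSM:WT_H3K4me3_R1\tPL:ILLUMINA\t...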
(filename.endsWith(".metrics.txt")) "picard_metrics/$filename" + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.metrics.txt')) "picard_metrics/$filename" else params.save_align_intermeds ? filename : null } input: - set val(name), file(bams) from ch_sort_bam_merge + tuple val(name), path(bams) from ch_sort_bam_merge output: - set val(name), file("*${prefix}.sorted.{bam,bam.bai}") into ch_merge_bam_filter, - ch_merge_bam_preseq - file "*.{flagstat,idxstats,stats}" into ch_merge_bam_stats_mqc - file "*.txt" into ch_merge_bam_metrics_mqc + tuple val(name), path("*${prefix}.sorted.{bam,bam.bai}") into ch_merge_bam_filter, + ch_merge_bam_preseq + path '*.{flagstat,idxstats,stats}' into ch_merge_bam_stats_mqc + path '*.txt' into ch_merge_bam_metrics_mqc script: prefix = "${name}.mLb.mkD" bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() def avail_mem = 3 if (!task.memory) { - log.info "[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this." + log.info '[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' } else { avail_mem = task.memory.toGiga() } @@ -704,40 +727,40 @@ process MergeBAM { } /* - * STEP 4.2 Filter BAM file at merged library-level + * STEP 4.2: Filter BAM file at merged library-level */ -process MergeBAMFilter { +process MERGED_BAM_FILTER { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> if (params.single_end || params.save_align_intermeds) { - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if (filename.endsWith(".sorted.bam")) filename - else if (filename.endsWith(".sorted.bam.bai")) filename + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.sorted.bam')) filename + else if (filename.endsWith('.sorted.bam.bai')) filename else null } } input: - set val(name), file(bam) from ch_merge_bam_filter - file bed from ch_genome_filter_regions.collect() - file bamtools_filter_config from ch_bamtools_filter_config + tuple val(name), path(bam) from ch_merge_bam_filter + path bed from ch_genome_filter_regions.collect() + path bamtools_filter_config from ch_bamtools_filter_config output: - set val(name), file("*.{bam,bam.bai}") into ch_filter_bam - set val(name), file("*.flagstat") into ch_filter_bam_flagstat - file "*.{idxstats,stats}" into ch_filter_bam_stats_mqc + tuple val(name), path('*.{bam,bam.bai}') into ch_filter_bam + tuple val(name), path('*.flagstat') into ch_filter_bam_flagstat + path '*.{idxstats,stats}' into ch_filter_bam_stats_mqc script: prefix = params.single_end ? "${name}.mLb.clN" : "${name}.mLb.flT" - filter_params = params.single_end ? "-F 0x004" : "-F 0x004 -F 0x0008 -f 0x001" - dup_params = params.keep_dups ? "" : "-F 0x0400" - multimap_params = params.keep_multi_map ? "" : "-q 1" - blacklist_params = params.blacklist ? 
"-L $bed" : "" - name_sort_bam = params.single_end ? "" : "samtools sort -n -@ $task.cpus -o ${prefix}.bam -T $prefix ${prefix}.sorted.bam" + filter_params = params.single_end ? '-F 0x004' : '-F 0x004 -F 0x0008 -f 0x001' + dup_params = params.keep_dups ? '' : '-F 0x0400' + multimap_params = params.keep_multi_map ? '' : '-q 1' + blacklist_params = params.blacklist ? "-L $bed" : '' + name_sort_bam = params.single_end ? '' : "samtools sort -n -@ $task.cpus -o ${prefix}.bam -T $prefix ${prefix}.sorted.bam" """ samtools view \\ $filter_params \\ @@ -759,7 +782,7 @@ process MergeBAMFilter { } /* - * STEP 4.3 Remove orphan reads from paired-end BAM file + * STEP 4.3: Remove orphan reads from paired-end BAM file */ if (params.single_end) { ch_filter_bam @@ -778,33 +801,33 @@ if (params.single_end) { ch_filter_bam_stats_mqc .set { ch_rm_orphan_stats_mqc } } else { - process MergeBAMRemoveOrphan { + process MERGED_BAM_REMOVE_ORPHAN { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".flagstat")) "samtools_stats/$filename" - else if (filename.endsWith(".idxstats")) "samtools_stats/$filename" - else if (filename.endsWith(".stats")) "samtools_stats/$filename" - else if (filename.endsWith(".sorted.bam")) filename - else if (filename.endsWith(".sorted.bam.bai")) filename + if (filename.endsWith('.flagstat')) "samtools_stats/$filename" + else if (filename.endsWith('.idxstats')) "samtools_stats/$filename" + else if (filename.endsWith('.stats')) "samtools_stats/$filename" + else if (filename.endsWith('.sorted.bam')) filename + else if (filename.endsWith('.sorted.bam.bai')) filename else null } input: - set val(name), file(bam) from ch_filter_bam + tuple val(name), path(bam) from ch_filter_bam output: - set val(name), file("*.sorted.{bam,bam.bai}") into ch_rm_orphan_bam_metrics, - ch_rm_orphan_bam_bigwig, - ch_rm_orphan_bam_macs_1, - ch_rm_orphan_bam_macs_2, - ch_rm_orphan_bam_phantompeakqualtools - set val(name), file("${prefix}.bam") into ch_rm_orphan_name_bam_counts - set val(name), file("*.flagstat") into ch_rm_orphan_flagstat_bigwig, - ch_rm_orphan_flagstat_macs, - ch_rm_orphan_flagstat_mqc - file "*.{idxstats,stats}" into ch_rm_orphan_stats_mqc + tuple val(name), path('*.sorted.{bam,bam.bai}') into ch_rm_orphan_bam_metrics, + ch_rm_orphan_bam_bigwig, + ch_rm_orphan_bam_macs_1, + ch_rm_orphan_bam_macs_2, + ch_rm_orphan_bam_phantompeakqualtools + tuple val(name), path("${prefix}.bam") into ch_rm_orphan_name_bam_counts + tuple val(name), path('*.flagstat') into ch_rm_orphan_flagstat_bigwig, + ch_rm_orphan_flagstat_macs, + ch_rm_orphan_flagstat_mqc + path '*.{idxstats,stats}' into ch_rm_orphan_stats_mqc script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/ prefix = "${name}.mLb.clN" @@ -829,39 +852,48 @@ if (params.single_end) { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 5.1 preseq analysis after merging libraries and before filtering + * STEP 5.1: Preseq analysis after merging libraries and before filtering */ -process Preseq { +process PRESEQ { tag "$name" - label 'process_low' - publishDir "${params.outdir}/bwa/mergedLibrary/preseq", mode: 'copy' + label 'process_medium' + label 'error_ignore' + publishDir "${params.outdir}/bwa/mergedLibrary/preseq", mode: params.publish_dir_mode when: !params.skip_preseq input: - set val(name), file(bam) from 
ch_merge_bam_preseq + tuple val(name), path(bam) from ch_merge_bam_preseq output: - file "*.ccurve.txt" into ch_preseq_mqc + path '*.ccurve.txt' into ch_preseq_mqc + path '*.log' script: - prefix = "${name}.mLb.clN" + pe = params.single_end ? '' : '-pe' """ - preseq lc_extrap -v -output ${prefix}.ccurve.txt -bam ${bam[0]} + preseq lc_extrap \\ + -output ${name}.ccurve.txt \\ + -verbose \\ + -bam \\ + $pe \\ + -seed 1 \\ + ${bam[0]} + cp .command.err ${name}.command.log """ } /* - * STEP 5.2 Picard CollectMultipleMetrics after merging libraries and filtering + * STEP 5.2: Picard CollectMultipleMetrics after merging libraries and filtering */ -process CollectMultipleMetrics { +process PICARD_METRICS { tag "$name" label 'process_medium' - publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy', + publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith("_metrics")) "picard_metrics/$filename" - else if (filename.endsWith(".pdf")) "picard_metrics/pdf/$filename" + if (filename.endsWith('_metrics')) "picard_metrics/$filename" + else if (filename.endsWith('.pdf')) "picard_metrics/pdf/$filename" else null } @@ -869,18 +901,18 @@ process CollectMultipleMetrics { !params.skip_picard_metrics input: - set val(name), file(bam) from ch_rm_orphan_bam_metrics - file fasta from ch_fasta + tuple val(name), path(bam) from ch_rm_orphan_bam_metrics + path fasta from ch_fasta output: - file "*_metrics" into ch_collectmetrics_mqc - file "*.pdf" into ch_collectmetrics_pdf + path '*_metrics' into ch_collectmetrics_mqc + path '*.pdf' script: prefix = "${name}.mLb.clN" def avail_mem = 3 if (!task.memory) { - log.info "[Picard CollectMultipleMetrics] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this." + log.info '[Picard CollectMultipleMetrics] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this.' } else { avail_mem = task.memory.toGiga() } @@ -895,60 +927,59 @@ process CollectMultipleMetrics { } /* - * STEP 5.3 Read depth normalised bigWig + * STEP 5.3: Read depth normalised bigWig */ -process BigWig { +process BIGWIG { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/bigwig", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/bigwig", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith("scale_factor.txt")) "scale/$filename" - else if (filename.endsWith(".bigWig")) "$filename" + if (filename.endsWith('scale_factor.txt')) "scale/$filename" + else if (filename.endsWith('.bigWig')) filename else null } input: - set val(name), file(bam), file(flagstat) from ch_rm_orphan_bam_bigwig.join(ch_rm_orphan_flagstat_bigwig, by: [0]) - file sizes from ch_genome_sizes_bigwig.collect() + tuple val(name), path(bam), path(flagstat) from ch_rm_orphan_bam_bigwig.join(ch_rm_orphan_flagstat_bigwig, by: [0]) + path sizes from ch_genome_sizes_bigwig.collect() output: - set val(name), file("*.bigWig") into ch_bigwig_plotprofile - file "*scale_factor.txt" into ch_bigwig_scale - file "*igv.txt" into ch_bigwig_igv + tuple val(name), path('*.bigWig') into ch_bigwig_plotprofile + path '*igv.txt' into ch_bigwig_igv + path '*scale_factor.txt' script: - prefix = "${name}.mLb.clN" - pe_fragment = params.single_end ? "" : "-pc" + pe_fragment = params.single_end ? '' : '-pc' extend = (params.single_end && params.fragment_size > 0) ? 
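// NB: illustrative Groovy sketch, not part of the pipeline diff. The avail_mem
// pattern above (also used for Picard MarkDuplicates earlier) derives a JVM
// heap size in GB from the task.memory directive, defaulting to 3 when none is
// known; elsewhere in the script block (outside this hunk) the value feeds
// Picard's -Xmx flag:
def javaHeapGiga = { taskMemGiga -> taskMemGiga ?: 3 }
assert javaHeapGiga(8)    == 8   // e.g. task.memory.toGiga() under an 8.GB directive
assert javaHeapGiga(null) == 3   // no memory directive known, fall back to 3GB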
"-fs ${params.fragment_size}" : '' """ SCALE_FACTOR=\$(grep 'mapped (' $flagstat | awk '{print 1000000/\$1}') - echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt - genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -k1,1 -k2,2n > ${prefix}.bedGraph + echo \$SCALE_FACTOR > ${name}.scale_factor.txt + genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -T '.' -k1,1 -k2,2n > ${name}.bedGraph - bedGraphToBigWig ${prefix}.bedGraph $sizes ${prefix}.bigWig + bedGraphToBigWig ${name}.bedGraph $sizes ${name}.bigWig - find * -type f -name "*.bigWig" -exec echo -e "bwa/mergedLibrary/bigwig/"{}"\\t0,0,178" \\; > ${prefix}.bigWig.igv.txt + find * -type f -name "*.bigWig" -exec echo -e "bwa/mergedLibrary/bigwig/"{}"\\t0,0,178" \\; > ${name}.bigWig.igv.txt """ } /* - * STEP 5.4 generate gene body coverage plot with deepTools + * STEP 5.4: Generate gene body coverage plot with deepTools plotProfile and plotHeatmap */ -process PlotProfile { +process PLOTPROFILE { tag "$name" label 'process_high' - publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: params.publish_dir_mode when: !params.skip_plot_profile input: - set val(name), file(bigwig) from ch_bigwig_plotprofile - file bed from ch_gene_bed + tuple val(name), path(bigwig) from ch_bigwig_plotprofile + path bed from ch_gene_bed output: - file '*.{gz,pdf}' into ch_plotprofile_results - file '*.plotProfile.tab' into ch_plotprofile_mqc + path '*.plotProfile.tab' into ch_plotprofile_mqc + path '*.{gz,pdf,mat.tab}' script: """ @@ -956,7 +987,7 @@ process PlotProfile { --regionsFileName $bed \\ --scoreFileName $bigwig \\ --outFileName ${name}.computeMatrix.mat.gz \\ - --outFileNameMatrix ${name}.computeMatrix.vals.mat.gz \\ + --outFileNameMatrix ${name}.computeMatrix.vals.mat.tab \\ --regionBodyLength 1000 \\ --beforeRegionStartLength 3000 \\ --afterRegionStartLength 3000 \\ @@ -967,31 +998,34 @@ process PlotProfile { plotProfile --matrixFile ${name}.computeMatrix.mat.gz \\ --outFileName ${name}.plotProfile.pdf \\ --outFileNameData ${name}.plotProfile.tab + + plotHeatmap --matrixFile ${name}.computeMatrix.mat.gz \\ + --outFileName ${name}.plotHeatmap.pdf \\ + --outFileNameMatrix ${name}.plotHeatmap.mat.tab """ } /* - * STEP 5.5 Phantompeakqualtools + * STEP 5.5: Phantompeakqualtools */ -process PhantomPeakQualTools { +process PHANTOMPEAKQUALTOOLS { tag "$name" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/phantompeakqualtools", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/phantompeakqualtools", mode: params.publish_dir_mode when: !params.skip_spp input: - set val(name), file(bam) from ch_rm_orphan_bam_phantompeakqualtools - file spp_correlation_header from ch_spp_correlation_header - file spp_nsc_header from ch_spp_nsc_header - file spp_rsc_header from ch_spp_rsc_header + tuple val(name), path(bam) from ch_rm_orphan_bam_phantompeakqualtools + path spp_correlation_header from ch_spp_correlation_header + path spp_nsc_header from ch_spp_nsc_header + path spp_rsc_header from ch_spp_rsc_header output: - file '*.pdf' into ch_spp_plot - file '*.spp.out' into ch_spp_out, - ch_spp_out_mqc - file '*_mqc.tsv' into ch_spp_csv_mqc + path '*.spp.out' into ch_spp_out_mqc + path '*_mqc.tsv' into ch_spp_csv_mqc + path '*.pdf' script: """ @@ -1025,25 +1059,25 @@ ch_design_controls_csv .map { it -> it[2..-1] } .into { ch_group_bam_macs; 
ch_group_bam_plotfingerprint; - ch_group_bam_deseq } + ch_group_bam_counts } /* - * STEP 6.1 deepTools plotFingerprint + * STEP 6.1: deepTools plotFingerprint */ -process PlotFingerprint { +process PLOTFINGERPRINT { tag "${ip} vs ${control}" label 'process_high' - publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: params.publish_dir_mode when: !params.skip_plot_fingerprint input: - set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), file(ipbam), val(control), file(controlbam), file(ipflagstat) from ch_group_bam_plotfingerprint + tuple val(antibody), val(replicatesExist), val(multipleGroups), val(ip), path(ipbam), val(control), path(controlbam), path(ipflagstat) from ch_group_bam_plotfingerprint output: - file '*.{txt,pdf}' into ch_plotfingerprint_results - file '*.raw.txt' into ch_plotfingerprint_mqc + path '*.raw.txt' into ch_plotfingerprint_mqc + path '*.{txt,pdf}' script: extend = (params.single_end && params.fragment_size > 0) ? "--extendReads ${params.fragment_size}" : '' @@ -1063,15 +1097,15 @@ process PlotFingerprint { } /* - * STEP 6.2 Call peaks with MACS2 and calculate FRiP score + * STEP 6.2: Call peaks with MACS2 and calculate FRiP score */ -process MACSCallPeak { +process MACS2 { tag "${ip} vs ${control}" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".tsv")) "qc/$filename" - else if (filename.endsWith(".igv.txt")) null + if (filename.endsWith('.tsv')) "qc/$filename" + else if (filename.endsWith('.igv.txt')) null else filename } @@ -1079,20 +1113,24 @@ process MACSCallPeak { params.macs_gsize input: - set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), file(ipbam), val(control), file(controlbam), file(ipflagstat) from ch_group_bam_macs - file peak_count_header from ch_peak_count_header - file frip_score_header from ch_frip_score_header + tuple val(antibody), val(replicatesExist), val(multipleGroups), val(ip), path(ipbam), val(control), path(controlbam), path(ipflagstat) from ch_group_bam_macs + path peak_count_header from ch_peak_count_header + path frip_score_header from ch_frip_score_header output: - set val(ip), file("*.{bed,xls,gappedPeak,bdg}") into ch_macs_output - set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), file("*.$PEAK_TYPE") into ch_macs_homer, ch_macs_qc, ch_macs_consensus - file "*igv.txt" into ch_macs_igv - file "*_mqc.tsv" into ch_macs_mqc + tuple val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), path("*.$PEAK_TYPE") into ch_macs_homer, + ch_macs_qc, + ch_macs_consensus + path '*igv.txt' into ch_macs_igv + path '*_mqc.tsv' into ch_macs_mqc + path '*.{bed,xls,gappedPeak,bdg}' script: broad = params.narrow_peak ? '' : "--broad --broad-cutoff ${params.broad_cutoff}" - format = params.single_end ? "BAM" : "BAMPE" - pileup = params.save_macs_pileup ? "-B --SPMR" : "" + format = params.single_end ? 'BAM' : 'BAMPE' + pileup = params.save_macs_pileup ? '-B --SPMR' : '' + fdr = params.macs_fdr ? "--qvalue ${params.macs_fdr}" : '' + pvalue = params.macs_pvalue ? 
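// NB: illustrative Groovy sketch, not part of the pipeline diff. The MACS2
// significance options assembled above: --macs_fdr maps to macs2 --qvalue and
// --macs_pvalue to --pvalue, and the two are documented as mutually exclusive,
// so at most one fragment should be non-empty for a given run:
def macsCutoff = { fdr, pvalue ->
    def f = fdr    ? "--qvalue ${fdr}"    : ''
    def p = pvalue ? "--pvalue ${pvalue}" : ''
    [f, p].findAll().join(' ')   // findAll() drops the empty string(s)
}
assert macsCutoff('0.05', null) == '--qvalue 0.05'
assert macsCutoff(null, '1e-5') == '--pvalue 1e-5'
assert macsCutoff(null, null)   == ''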
"--pvalue ${params.macs_pvalue}" : '' """ macs2 callpeak \\ -t ${ipbam[0]} \\ @@ -1102,6 +1140,8 @@ process MACSCallPeak { -g $params.macs_gsize \\ -n $ip \\ $pileup \\ + $fdr \\ + $pvalue \\ --keep-dup all cat ${ip}_peaks.${PEAK_TYPE} | wc -l | awk -v OFS='\t' '{ print "${ip}", \$1 }' | cat $peak_count_header - > ${ip}_peaks.count_mqc.tsv @@ -1114,23 +1154,23 @@ process MACSCallPeak { } /* - * STEP 6.3 Annotate peaks with HOMER + * STEP 6.3: Annotate peaks with HOMER */ -process AnnotatePeaks { +process MACS2_ANNOTATE { tag "${ip} vs ${control}" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}", mode: params.publish_dir_mode when: - params.macs_gsize + params.macs_gsize && !params.skip_peak_annotation input: - set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), file(peak) from ch_macs_homer - file fasta from ch_fasta - file gtf from ch_gtf + tuple val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), path(peak) from ch_macs_homer + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.txt" into ch_macs_annotate + path '*.txt' into ch_macs_annotate script: """ @@ -1145,23 +1185,23 @@ process AnnotatePeaks { } /* - * STEP 6.4 Aggregated QC plots for peaks, FRiP and peak-to-gene annotation + * STEP 6.4: Aggregated QC plots for peaks, FRiP and peak-to-gene annotation */ -process PeakQC { - label "process_medium" - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/qc", mode: 'copy' +process MACS2_QC { + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/qc", mode: params.publish_dir_mode when: - params.macs_gsize + params.macs_gsize && !params.skip_peak_annotation && !params.skip_peak_qc input: - file peaks from ch_macs_qc.collect{ it[-1] } - file annos from ch_macs_annotate.collect() - file peak_annotation_header from ch_peak_annotation_header + path peaks from ch_macs_qc.collect{ it[-1] } + path annos from ch_macs_annotate.collect() + path peak_annotation_header from ch_peak_annotation_header output: - file "*.{txt,pdf}" into ch_macs_qc_output - file "*.tsv" into ch_macs_qc_mqc + path '*.tsv' into ch_macs_qc_mqc + path '*.{txt,pdf}' script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/ """ @@ -1189,7 +1229,7 @@ process PeakQC { /////////////////////////////////////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////// -// group by ip from this point and carry forward boolean variables +// Group by ip from this point and carry forward boolean variables ch_macs_consensus .map { it -> [ it[0], it[1], it[2], it[-1] ] } .groupTuple() @@ -1197,37 +1237,37 @@ ch_macs_consensus .set { ch_macs_consensus } /* - * STEP 7.1 Consensus peaks across samples, create boolean filtering file, .saf file for featureCounts and UpSetR plot for intersection + * STEP 7.1: Consensus peaks across samples, create boolean filtering file, SAF file for featureCounts and UpSetR plot for intersection */ -process ConsensusPeakSet { +process CONSENSUS_PEAKS { tag "${antibody}" label 'process_long' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}", mode: 'copy', + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null + if 
(filename.endsWith('.igv.txt')) null else filename } when: - params.macs_gsize && (replicatesExist || multipleGroups) + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks input: - set val(antibody), val(replicatesExist), val(multipleGroups), file(peaks) from ch_macs_consensus + tuple val(antibody), val(replicatesExist), val(multipleGroups), path(peaks) from ch_macs_consensus output: - set val(antibody), val(replicatesExist), val(multipleGroups), file("*.bed") into ch_macs_consensus_bed - set val(antibody), file("*.saf") into ch_macs_consensus_saf - file "*.boolean.txt" into ch_macs_consensus_bool - file "*.intersect.{txt,plot.pdf}" into ch_macs_consensus_intersect - file "*igv.txt" into ch_macs_consensus_igv + tuple val(antibody), val(replicatesExist), val(multipleGroups), path('*.bed') into ch_macs_consensus_bed + tuple val(antibody), path('*.saf') into ch_macs_consensus_saf + path '*.boolean.txt' into ch_macs_consensus_bool + path '*igv.txt' into ch_macs_consensus_igv + path '*.intersect.{txt,plot.pdf}' - script: // scripts are bundled with the pipeline, in nf-core/chipseq/bin/ + script: // scripts are bundled with the pipeline in nf-core/chipseq/bin/ prefix = "${antibody}.consensus_peaks" mergecols = params.narrow_peak ? (2..10).join(',') : (2..9).join(',') - collapsecols = params.narrow_peak ? (["collapse"]*9).join(',') : (["collapse"]*8).join(',') - expandparam = params.narrow_peak ? "--is_narrow_peak" : "" + collapsecols = params.narrow_peak ? (['collapse']*9).join(',') : (['collapse']*8).join(',') + expandparam = params.narrow_peak ? '--is_narrow_peak' : '' """ - sort -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ + sort -T '.' -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\ | mergeBed -c $mergecols -o $collapsecols > ${prefix}.txt macs2_merged_expand.py ${prefix}.txt \\ @@ -1248,24 +1288,24 @@ process ConsensusPeakSet { } /* - * STEP 7.2 Annotate consensus peaks with HOMER, and add annotation to boolean output file + * STEP 7.2: Annotate consensus peaks with HOMER, and add annotation to boolean output file */ -process ConsensusPeakSetAnnotate { +process CONSENSUS_PEAKS_ANNOTATE { tag "${antibody}" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}", mode: 'copy' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}", mode: params.publish_dir_mode when: - params.macs_gsize && (replicatesExist || multipleGroups) + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks && !params.skip_peak_annotation input: - set val(antibody), val(replicatesExist), val(multipleGroups), file(bed) from ch_macs_consensus_bed - file bool from ch_macs_consensus_bool - file fasta from ch_fasta - file gtf from ch_gtf + tuple val(antibody), val(replicatesExist), val(multipleGroups), path(bed) from ch_macs_consensus_bed + path bool from ch_macs_consensus_bool + path fasta from ch_fasta + path gtf from ch_gtf output: - file "*.annotatePeaks.txt" into ch_macs_consensus_annotate + path '*.annotatePeaks.txt' script: prefix = "${antibody}.consensus_peaks" @@ -1278,56 +1318,43 @@ process ConsensusPeakSetAnnotate { -cpu $task.cpus \\ > ${prefix}.annotatePeaks.txt - cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -k1,1 -k2,2n"}' | cut -f6- > tmp.txt + cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -T '.' 
-k1,1 -k2,2n"}' | cut -f6- > tmp.txt paste $bool tmp.txt > ${prefix}.boolean.annotatePeaks.txt """ } -// get bam and saf files for each ip -ch_group_bam_deseq +// Get BAM and SAF files for each ip +ch_group_bam_counts .map { it -> [ it[3], [ it[0], it[1], it[2] ] ] } .join(ch_rm_orphan_name_bam_counts) .map { it -> [ it[1][0], it[1][1], it[1][2], it[2] ] } .groupTuple() .map { it -> [ it[0], it[1][0], it[2][0], it[3].flatten().sort() ] } .join(ch_macs_consensus_saf) - .set { ch_group_bam_deseq } + .set { ch_group_bam_counts } /* - * STEP 7.3 Count reads in consensus peaks with featureCounts and perform differential analysis with DESeq2 + * STEP 7.3: Count reads in consensus peaks with featureCounts */ -process ConsensusPeakSetDESeq { +process CONSENSUS_PEAKS_COUNTS { tag "${antibody}" label 'process_medium' - publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}/deseq2", mode: 'copy', - saveAs: { filename -> - if (filename.endsWith(".igv.txt")) null - else filename - } + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}", mode: params.publish_dir_mode when: - params.macs_gsize && replicatesExist && multipleGroups && !params.skip_diff_analysis + params.macs_gsize && (replicatesExist || multipleGroups) && !params.skip_consensus_peaks input: - set val(antibody), val(replicatesExist), val(multipleGroups), file(bams) ,file(saf) from ch_group_bam_deseq - file deseq2_pca_header from ch_deseq2_pca_header - file deseq2_clustering_header from ch_deseq2_clustering_header + tuple val(antibody), val(replicatesExist), val(multipleGroups), path(bams), path(saf) from ch_group_bam_counts output: - file "*featureCounts.txt" into ch_macs_consensus_counts - file "*featureCounts.txt.summary" into ch_macs_consensus_counts_mqc - file "*.{RData,results.txt,pdf,log}" into ch_macs_consensus_deseq_results - file "sizeFactors" into ch_macs_consensus_deseq_factors - file "*vs*/*.{pdf,txt}" into ch_macs_consensus_deseq_comp_results - file "*vs*/*.bed" into ch_macs_consensus_deseq_comp_bed - file "*igv.txt" into ch_macs_consensus_deseq_comp_igv - file "*.tsv" into ch_macs_consensus_deseq_mqc + tuple val(antibody), val(replicatesExist), val(multipleGroups), path('*featureCounts.txt') into ch_macs_consensus_counts + path '*featureCounts.txt.summary' into ch_macs_consensus_counts_mqc script: prefix = "${antibody}.consensus_peaks" bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort() - bam_ext = params.single_end ? ".mLb.clN.sorted.bam" : ".mLb.clN.bam" - pe_params = params.single_end ? '' : "-p --donotsort" + pe_params = params.single_end ? 
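// NB: illustrative Groovy sketch, not part of the pipeline diff. The
// map/join/groupTuple chain above re-keys the IP metadata by sample id,
// attaches each sample's name-sorted BAM, then regroups everything per
// antibody for featureCounts. Sample ids and file names are hypothetical:
def meta = [ WT_R1: ['H3K4me3', true, false], WT_R2: ['H3K4me3', true, false] ]
def bams = [ WT_R1: 'WT_R1.mLb.clN.bam',      WT_R2: 'WT_R2.mLb.clN.bam' ]
def joined = meta.collect { ip, m -> m + [ bams[ip] ] }   // [antibody, reps, groups, bam]
def byAntibody = joined.groupBy { it[0] }.collectEntries { ab, rows ->
    [ (ab): [ rows[0][1], rows[0][2], rows.collect { it[-1] }.sort() ] ]
}
assert byAntibody.H3K4me3 == [ true, false, ['WT_R1.mLb.clN.bam', 'WT_R2.mLb.clN.bam'] ]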
'' : '-p --donotsort' """ featureCounts \\ -F SAF \\ @@ -1338,15 +1365,57 @@ process ConsensusPeakSetDESeq { -a $saf \\ -o ${prefix}.featureCounts.txt \\ ${bam_files.join(' ')} + """ +} - featurecounts_deseq2.r -i ${prefix}.featureCounts.txt -b '$bam_ext' -o ./ -p $prefix -s .mLb +/* + * STEP 7.4: Differential analysis with DESeq2 + */ +process CONSENSUS_PEAKS_DESEQ2 { + tag "${antibody}" + label 'process_medium' + publishDir "${params.outdir}/bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}/deseq2", mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.endsWith('.igv.txt')) null + else filename + } + + when: + params.macs_gsize && replicatesExist && multipleGroups && !params.skip_consensus_peaks && !params.skip_diff_analysis + + input: + tuple val(antibody), val(replicatesExist), val(multipleGroups), path(counts) from ch_macs_consensus_counts + path deseq2_pca_header from ch_deseq2_pca_header + path deseq2_clustering_header from ch_deseq2_clustering_header + + output: + path '*.tsv' into ch_macs_consensus_deseq_mqc + path '*igv.txt' into ch_macs_consensus_deseq_comp_igv + path '*.{RData,results.txt,pdf,log}' + path 'sizeFactors' + path '*vs*/*.{pdf,txt}' + path '*vs*/*.bed' + + script: + prefix = "${antibody}.consensus_peaks" + bam_ext = params.single_end ? '.mLb.clN.sorted.bam' : '.mLb.clN.bam' + vst = params.deseq2_vst ? '--vst TRUE' : '' + """ + featurecounts_deseq2.r \\ + --featurecount_file $counts \\ + --bam_suffix '$bam_ext' \\ + --outdir ./ \\ + --outprefix $prefix \\ + --outsuffix '' \\ + --cores $task.cpus \\ + $vst sed 's/deseq2_pca/deseq2_pca_${task.index}/g' <$deseq2_pca_header >tmp.txt - sed -i -e 's/DESeq2:/${antibody} DESeq2:/g' tmp.txt + sed -i -e 's/DESeq2 /${antibody} DESeq2 /g' tmp.txt cat tmp.txt ${prefix}.pca.vals.txt > ${prefix}.pca.vals_mqc.tsv sed 's/deseq2_clustering/deseq2_clustering_${task.index}/g' <$deseq2_clustering_header >tmp.txt - sed -i -e 's/DESeq2:/${antibody} DESeq2:/g' tmp.txt + sed -i -e 's/DESeq2 /${antibody} DESeq2 /g' tmp.txt cat tmp.txt ${prefix}.sample.dists.txt > ${prefix}.sample.dists_mqc.tsv find * -type f -name "*.FDR0.05.results.bed" -exec echo -e "bwa/mergedLibrary/macs/${PEAK_TYPE}/consensus/${antibody}/deseq2/"{}"\\t255,0,0" \\; > ${prefix}.igv.txt @@ -1362,28 +1431,28 @@ process ConsensusPeakSetDESeq { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 8 - Create IGV session file + * STEP 8: Create IGV session file */ process IGV { - publishDir "${params.outdir}/igv/${PEAK_TYPE}", mode: 'copy' + publishDir "${params.outdir}/igv/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_igv input: - file fasta from ch_fasta - file bigwigs from ch_bigwig_igv.collect().ifEmpty([]) - file peaks from ch_macs_igv.collect().ifEmpty([]) - file consensus_peaks from ch_macs_consensus_igv.collect().ifEmpty([]) - file differential_peaks from ch_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) + path fasta from ch_fasta + path bigwigs from ch_bigwig_igv.collect().ifEmpty([]) + path peaks from ch_macs_igv.collect().ifEmpty([]) + path consensus_peaks from ch_macs_consensus_igv.collect().ifEmpty([]) + path differential_peaks from ch_macs_consensus_deseq_comp_igv.collect().ifEmpty([]) output: - file "*.{txt,xml}" into ch_igv_session + path '*.{txt,xml}' - script: // scripts are bundled with the pipeline, in nf-core/chipseq/bin/ + script: // scripts are bundled with the pipeline in nf-core/chipseq/bin/ """ cat *.txt > igv_files.txt - igv_files_to_session.py igv_session.xml 
igv_files.txt ../../reference_genome/${fasta.getName()} --path_prefix '../../' + igv_files_to_session.py igv_session.xml igv_files.txt ../../genome/${fasta.getName()} --path_prefix '../../' """ } @@ -1399,15 +1468,15 @@ process IGV { * Parse software version numbers */ process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: 'copy', + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, saveAs: { filename -> - if (filename.indexOf(".csv") > 0) filename + if (filename.indexOf('.csv') > 0) filename else null } output: - file 'software_versions_mqc.yaml' into ch_software_versions_mqc - file "software_versions.csv" + path 'software_versions_mqc.yaml' into ch_software_versions_mqc + path 'software_versions.csv' script: """ @@ -1432,10 +1501,10 @@ process get_software_versions { """ } -def create_workflow_summary(summary) { - - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ +Channel.from(summary.collect{ [it.key, it.value] }) + .map { k,v -> "
<dt>$k</dt><dd><samp>
${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>
" } + .reduce { a, b -> return [a, b].join("\n ") } + .map { x -> """ id: 'nf-core-chipseq-summary' description: " - this information is collected when the pipeline is started." section_name: 'nf-core/chipseq Workflow Summary' @@ -1443,61 +1512,59 @@ def create_workflow_summary(summary) { plot_type: 'html' data: | <dl class=\"dl-horizontal\">
-${summary.collect { k,v -> "
<dt>$k</dt><dd><samp>
${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>
" }.join("\n")} + $x </dl>
- """.stripIndent() - - return yaml_file -} + """.stripIndent() } + .set { ch_workflow_summary } /* - * STEP 9 - MultiQC + * STEP 9: MultiQC */ -process MultiQC { - publishDir "${params.outdir}/multiqc/${PEAK_TYPE}", mode: 'copy' +process MULTIQC { + publishDir "${params.outdir}/multiqc/${PEAK_TYPE}", mode: params.publish_dir_mode when: !params.skip_multiqc input: - file multiqc_config from ch_multiqc_config - - file ('software_versions/*') from ch_software_versions_mqc.collect() - file ('workflow_summary/*') from create_workflow_summary(summary) - - file ('fastqc/*') from ch_fastqc_reports_mqc.collect().ifEmpty([]) - file ('trimgalore/*') from ch_trimgalore_results_mqc.collect().ifEmpty([]) - file ('trimgalore/fastqc/*') from ch_trimgalore_fastqc_reports_mqc.collect().ifEmpty([]) - - file ('alignment/library/*') from ch_sort_bam_flagstat_mqc.collect() - file ('alignment/mergedLibrary/*') from ch_merge_bam_stats_mqc.collect() - file ('alignment/mergedLibrary/*') from ch_rm_orphan_flagstat_mqc.collect{it[1]} - file ('alignment/mergedLibrary/*') from ch_rm_orphan_stats_mqc.collect() - file ('alignment/mergedLibrary/picard_metrics/*') from ch_merge_bam_metrics_mqc.collect() - file ('alignment/mergedLibrary/picard_metrics/*') from ch_collectmetrics_mqc.collect() - - file ('macs/*') from ch_macs_mqc.collect().ifEmpty([]) - file ('macs/*') from ch_macs_qc_mqc.collect().ifEmpty([]) - file ('macs/consensus/*') from ch_macs_consensus_counts_mqc.collect().ifEmpty([]) - file ('macs/consensus/*') from ch_macs_consensus_deseq_mqc.collect().ifEmpty([]) - - file ('preseq/*') from ch_preseq_mqc.collect().ifEmpty([]) - file ('deeptools/*') from ch_plotfingerprint_mqc.collect().ifEmpty([]) - file ('deeptools/*') from ch_plotprofile_mqc.collect().ifEmpty([]) - file ('phantompeakqualtools/*') from ch_spp_out_mqc.collect().ifEmpty([]) - file ('phantompeakqualtools/*') from ch_spp_csv_mqc.collect().ifEmpty([]) + path (multiqc_config) from ch_multiqc_config + path (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) + + path ('software_versions/*') from ch_software_versions_mqc.collect() + path workflow_summary from ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml') + + path ('fastqc/*') from ch_fastqc_reports_mqc.collect().ifEmpty([]) + path ('trimgalore/*') from ch_trimgalore_results_mqc.collect().ifEmpty([]) + path ('trimgalore/fastqc/*') from ch_trimgalore_fastqc_reports_mqc.collect().ifEmpty([]) + + path ('alignment/library/*') from ch_sort_bam_flagstat_mqc.collect() + path ('alignment/mergedLibrary/*') from ch_merge_bam_stats_mqc.collect() + path ('alignment/mergedLibrary/*') from ch_rm_orphan_flagstat_mqc.collect{it[1]} + path ('alignment/mergedLibrary/*') from ch_rm_orphan_stats_mqc.collect() + path ('alignment/mergedLibrary/picard_metrics/*') from ch_merge_bam_metrics_mqc.collect() + path ('alignment/mergedLibrary/picard_metrics/*') from ch_collectmetrics_mqc.collect() + + path ('macs/*') from ch_macs_mqc.collect().ifEmpty([]) + path ('macs/*') from ch_macs_qc_mqc.collect().ifEmpty([]) + path ('macs/consensus/*') from ch_macs_consensus_counts_mqc.collect().ifEmpty([]) + path ('macs/consensus/*') from ch_macs_consensus_deseq_mqc.collect().ifEmpty([]) + + path ('preseq/*') from ch_preseq_mqc.collect().ifEmpty([]) + path ('deeptools/*') from ch_plotfingerprint_mqc.collect().ifEmpty([]) + path ('deeptools/*') from ch_plotprofile_mqc.collect().ifEmpty([]) + path ('phantompeakqualtools/*') from ch_spp_out_mqc.collect().ifEmpty([]) + path ('phantompeakqualtools/*') from 
ch_spp_csv_mqc.collect().ifEmpty([]) output: - file "*multiqc_report.html" into ch_multiqc_report - file "*_data" - file "multiqc_plots" + path '*multiqc_report.html' into ch_multiqc_report + path '*_data' script: rtitle = custom_runName ? "--title \"$custom_runName\"" : '' rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' """ - multiqc . -f $rtitle $rfilename --config $multiqc_config \\ - -m custom_content -m fastqc -m cutadapt -m samtools -m picard -m preseq -m featureCounts -m deeptools -m phantompeakqualtools + multiqc . -f $rtitle $rfilename $custom_config_file """ } @@ -1510,20 +1577,21 @@ process MultiQC { /////////////////////////////////////////////////////////////////////////////// /* - * STEP 10 - Output description HTML + * STEP 10: Output description HTML */ process output_documentation { - publishDir "${params.outdir}/Documentation", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode input: - file output_docs from ch_output_docs + path output_docs from ch_output_docs + path images from ch_output_docs_images output: - file "results_description.html" + path 'results_description.html' script: """ - markdown_to_html.r $output_docs results_description.html + markdown_to_html.py $output_docs -o results_description.html """ } @@ -1556,7 +1624,6 @@ workflow.onComplete { if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - if (workflow.container) email_fields['summary']['Docker image'] = workflow.container email_fields['summary']['Nextflow Version'] = workflow.nextflow.version email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp @@ -1622,22 +1689,22 @@ workflow.onComplete { def output_tf = new File(output_d, "pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_red = params.monochrome_logs ? '' : "\033[0;31m"; + c_reset = params.monochrome_logs ? 
'' : "\033[0m"; if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}" - log.info "${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}" - log.info "${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}" + log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" + log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" + log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" } if (workflow.success) { - log.info "${c_purple}[nf-core/chipseq]${c_green} Pipeline completed successfully${c_reset}" + log.info "-${c_purple}[nf-core/chipseq]${c_green} Pipeline completed successfully${c_reset}-" } else { checkHostname() - log.info "${c_purple}[nf-core/chipseq]${c_red} Pipeline completed with errors${c_reset}" + log.info "-${c_purple}[nf-core/chipseq]${c_red} Pipeline completed with errors${c_reset}-" } } @@ -1652,15 +1719,15 @@ workflow.onComplete { def nfcoreHeader() { // Log colors ANSI codes - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; + c_dim = params.monochrome_logs ? '' : "\033[2m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; c_white = params.monochrome_logs ? '' : "\033[0;37m"; + c_yellow = params.monochrome_logs ? 
'' : "\033[0;33m"; return """ -${c_dim}--------------------------------------------------${c_reset}- ${c_green},--.${c_black}/${c_green},-.${c_reset} diff --git a/nextflow.config b/nextflow.config index c758ab92..4e54affd 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,6 +9,7 @@ params { // Options: Generic + input = './design.csv' single_end = false seq_center = false fragment_size = 200 @@ -16,7 +17,6 @@ params { // Options: References genome = false - tss_bed = false save_reference = false // Options: Trimming @@ -29,6 +29,7 @@ params { save_trimmed = false // Options: Alignments + bwa_min_score = false keep_dups = false keep_multi_map = false save_align_intermeds = false @@ -36,8 +37,16 @@ params { // Options: Peaks narrow_peak = false broad_cutoff = 0.1 + macs_fdr = false + macs_pvalue = false min_reps_consensus = 1 save_macs_pileup = false + skip_peak_qc = false + skip_peak_annotation = false + skip_consensus_peaks = false + + // Options: Differential analysis + deseq2_vst = false skip_diff_analysis = false // Options: QC @@ -50,12 +59,8 @@ params { skip_igv = false skip_multiqc = false - // Options: AWSBatch - awsqueue = false - awsregion = 'eu-west-1' - // Options: Config - multiqc_config = "$baseDir/assets/multiqc/multiqc_config.yaml" + multiqc_config = false bamtools_filter_pe_config = "$baseDir/assets/bamtools_filter_pe.json" bamtools_filter_se_config = "$baseDir/assets/bamtools_filter_se.json" @@ -69,6 +74,7 @@ params { // Options: Other help = false outdir = './results' + publish_dir_mode = 'copy' igenomes_base = 's3://ngi-igenomes/igenomes/' igenomes_ignore = false max_multiqc_email_size = 25.MB @@ -90,7 +96,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/chipseq:1.1.0' +process.container = 'nfcore/chipseq:1.2.0' // Load base.config by default for all pipelines includeConfig 'conf/base.config' @@ -103,20 +109,24 @@ try { } profiles { - awsbatch { includeConfig 'conf/awsbatch.config' } conda { process.conda = "$baseDir/environment.yml" } debug { process.beforeScript = 'echo $HOSTNAME' } - docker { docker.enabled = true } - singularity { singularity.enabled = true - singularity.autoMounts = true } + docker { + docker.enabled = true + // Avoid this error: + // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. + // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 + // once this is established and works well, nextflow might implement this behavior as new default. + docker.runOptions = '-u \$(id -u):\$(id -g)' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + } test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } -// Avoid this error: -// WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. -// Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351, once this is established and works well, nextflow might implement this behavior as new default. 
-docker.runOptions = '-u \$(id -u):\$(id -g)' - // Load igenomes.config if required if (!params.igenomes_ignore) { includeConfig 'conf/igenomes.config' @@ -125,9 +135,11 @@ if (!params.igenomes_ignore) { // Increase time available to build conda environment conda { createTimeout = "60 min" } -// Export this variable to prevent local Python libraries from conflicting with those in the container +// Export these variables to prevent local Python/R libraries from conflicting with those in the container env { PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping @@ -157,7 +169,7 @@ manifest { description = 'ChIP-seq peak-calling and differential analysis pipeline.' mainScript = 'main.nf' nextflowVersion = '>=19.10.0' - version = '1.1.0' + version = '1.2.0' } // Function to ensure that resource requirements don't go beyond diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 00000000..5614b18c --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,540 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/chipseq/master/nextflow_schema.json", + "title": "nf-core/chipseq pipeline parameters", + "description": "ChIP-seq peak-calling and differential analysis pipeline.", + "type": "object", + "properties": { + "Input/output options": { + "type": "object", + "properties": { + "input": { + "type": "string", + "description": "Path to comma-separated file containing information about the samples in the experiment.", + "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 4 columns, and a header row. See [usage docs](https://nf-co.re/chipseq/docs/usage#--input).", + "fa_icon": "fas fa-file-csv" + }, + "single_end": { + "type": "boolean", + "description": "Specifies that the input is single-end reads.", + "fa_icon": "fas fa-align-center", + "default": false, + "help_text": "By default, the pipeline expects paired-end data. If you have single-end data, specify this parameter on the command line when you launch the pipeline. It is not possible to run a mixture of single-end and paired-end files in one run."
+ }, + "fragment_size": { + "type": "integer", + "default": 200, + "description": "Estimated fragment size used to extend single-end reads.", + "fa_icon": "fas fa-chart-area", + "help_text": "" + }, + "seq_center": { + "type": "string", + "description": "Sequencing center information to be added to read group of BAM files.", + "fa_icon": "fas fa-synagogue" + }, + "outdir": { + "type": "string", + "description": "Path to the output directory where the results will be saved.", + "default": "./results", + "fa_icon": "fas fa-folder-open", + "help_text": "" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "An email address to send a summary email to when the pipeline is completed.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + } + }, + "required": [ + "input" + ], + "fa_icon": "fas fa-terminal" + }, + "Reference genome options": { + "type": "object", + "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`." + }, + "fasta": { + "type": "string", + "description": "Path to Fasta reference file.", + "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", + "fa_icon": "far fa-file-code" + }, + "gtf": { + "type": "string", + "description": "Path to GTF annotation file.", + "fa_icon": "fas fa-file-invoice", + "help_text": "This parameter is *mandatory* if `--genome` is not specified." + }, + "bwa_index": { + "type": "string", + "description": "Full path to directory containing BWA index, including base name, i.e. `/path/to/index/genome.fa`.", + "fa_icon": "fas fa-bezier-curve", + "help_text": "" + }, + "gene_bed": { + "type": "string", + "description": "Path to BED file containing gene intervals. This will be created from the GTF file if not specified.", + "fa_icon": "fas fa-procedures", + "help_text": "" + }, + "macs_gsize": { + "type": "string", + "description": "Effective genome size parameter required by MACS2.", + "help_text": "[Effective genome size](https://github.com/taoliu/MACS#-g--gsize) parameter required by MACS2. If using an iGenomes reference these have been provided when `--genome` is set as *GRCh37*, *GRCh38*, *GRCm38*, *WBcel235*, *BDGP6*, *R64-1-1*, *EF2*, *hg38*, *hg19* and *mm10*. For other genomes, if this parameter is not specified then the MACS2 peak-calling and differential analysis will be skipped.", + "fa_icon": "fas fa-arrows-alt-h" + }, + "blacklist": { + "type": "string", + "description": "Path to blacklist regions in BED format, used for filtering alignments.", + "help_text": "If provided, alignments that overlap with the regions in this file will be filtered out (see [ENCODE blacklists](https://sites.google.com/site/anshulkundaje/projects/blacklists)). The file should be in BED format.
Blacklisted regions for *GRCh37*, *GRCh38*, *GRCm38*, *hg19*, *hg38*, *mm10* are bundled with the pipeline in the [`blacklists`](../assets/blacklists/) directory, and as such will be automatically used if any of those genomes are specified with the `--genome` parameter.", + "fa_icon": "fas fa-book-dead" + }, + "save_reference": { + "type": "boolean", + "default": false, + "description": "If generated by the pipeline, save the BWA index in the results directory.", + "help_text": "If the BWA index is generated by the pipeline, use this parameter to save it to your results folder. It can then be used for future pipeline runs, reducing processing times.", + "fa_icon": "fas fa-save" + }, + "igenomes_base": { + "type": "string", + "description": "Directory / URL base for iGenomes references.", + "default": "s3://ngi-igenomes/igenomes/", + "fa_icon": "fas fa-cloud-download-alt", + "hidden": true, + "help_text": "" + }, + "igenomes_ignore": { + "type": "boolean", + "description": "Do not load the iGenomes reference config.", + "fa_icon": "fas fa-ban", + "hidden": true, + "default": false, + "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." + } + }, + "fa_icon": "fas fa-dna" + }, + "Adapter trimming options": { + "type": "object", + "properties": { + "clip_r1": { + "type": "integer", + "default": 0, + "description": "Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads).", + "fa_icon": "fas fa-cut", + "help_text": "" + }, + "clip_r2": { + "type": "integer", + "default": 0, + "description": "Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only).", + "fa_icon": "fas fa-cut", + "help_text": "" + }, + "three_prime_clip_r1": { + "type": "integer", + "default": 0, + "description": "Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.", + "fa_icon": "fas fa-cut" + }, + "three_prime_clip_r2": { + "type": "integer", + "default": 0, + "description": "Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed.", + "fa_icon": "fas fa-cut" + }, + "trim_nextseq": { + "type": "integer", + "default": 0, + "description": "Instructs Trim Galore to apply the --nextseq=X option, to trim based on quality after removing poly-G tails.", + "help_text": "This enables Cutadapt's `--nextseq-trim=3'CUTOFF` option via Trim Galore, which sets a quality cutoff (normally given with -q) but ignores the quality of G bases. This type of trimming is common on the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.", + "fa_icon": "fas fa-cut" + }, + "skip_trimming": { + "type": "boolean", + "default": false, + "description": "Skip the adapter trimming step.", + "help_text": "Use this if your input FastQ files have already been trimmed outside of the workflow or if you're very confident that there is no adapter contamination in your data.", + "fa_icon": "fas fa-forward" + }, + "save_trimmed": { + "type": "boolean", + "default": false, + "description": "Save the trimmed FastQ files in the results directory.", + "help_text": "By default, trimmed FastQ files will not be saved to the results directory.
Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.", + "fa_icon": "fas fa-save" + } + }, + "fa_icon": "fas fa-cut" + }, + "Alignment options": { + "type": "object", + "properties": { + "keep_dups": { + "type": "boolean", + "default": false, + "description": "Duplicate reads are not filtered from alignments.", + "fa_icon": "fas fa-cart-arrow-down" + }, + "keep_multi_map": { + "type": "boolean", + "default": false, + "description": "Reads mapping to multiple locations are not filtered from alignments.", + "fa_icon": "fas fa-cart-arrow-down" + }, + "bwa_min_score": { + "type": "integer", + "description": "Don\u2019t output BWA MEM alignments with score lower than this parameter.", + "fa_icon": "fas fa-hand-paper" + }, + "save_align_intermeds": { + "type": "boolean", + "default": false, + "description": "Save the intermediate BAM files from the alignment step.", + "help_text": "By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set this parameter to also save other intermediate BAM files.", + "fa_icon": "fas fa-save" + }, + "bamtools_filter_pe_config": { + "type": "string", + "default": "$baseDir/assets/bamtools_filter_pe.json", + "hidden": true, + "description": "BAMTools JSON file with custom filters for paired-end data.", + "fa_icon": "fas fa-cog", + "help_text": "" + }, + "bamtools_filter_se_config": { + "type": "string", + "default": "$baseDir/assets/bamtools_filter_se.json", + "hidden": true, + "description": "BAMTools JSON file with custom filters for single-end data.", + "fa_icon": "fas fa-cog", + "help_text": "" + } + }, + "fa_icon": "fas fa-map-signs" + }, + "Peak calling options": { + "type": "object", + "properties": { + "narrow_peak": { + "type": "boolean", + "default": false, + "description": "Run MACS2 in narrowPeak mode.", + "help_text": "MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.", + "fa_icon": "fas fa-arrows-alt-h" + }, + "broad_cutoff": { + "type": "number", + "default": 0.1, + "description": "Specifies the broad cutoff value for MACS2. Only used when --narrow_peak isn't specified.", + "fa_icon": "fas fa-hand-scissors" + }, + "macs_fdr": { + "type": "number", + "description": "Minimum FDR (q-value) cutoff for peak detection; --macs_fdr and --macs_pvalue are mutually exclusive.", + "fa_icon": "fas fa-sort-amount-down" + }, + "macs_pvalue": { + "type": "number", + "description": "p-value cutoff for peak detection; --macs_fdr and --macs_pvalue are mutually exclusive. If the --macs_pvalue cutoff is set, the q-value will not be calculated and will be reported as -1 in the final .xls file.", + "fa_icon": "fas fa-sort-amount-down" + }, + "min_reps_consensus": { + "type": "integer", + "default": 1, + "description": "Number of biological replicates required from a given condition for a peak to contribute to a consensus peak.", + "help_text": "If you are confident you have good reproducibility amongst your replicates then you can increase the value of this parameter to create a \"reproducible\" set of consensus peaks.
For example, a value of 2 will mean peaks that have been called in at least 2 replicates will contribute to the consensus set of peaks, and as such peaks that are unique to a given replicate will be discarded.", + "fa_icon": "fas fa-sort-numeric-down" + }, + "save_macs_pileup": { + "type": "boolean", + "default": false, + "description": "Instruct MACS2 to create bedGraph files normalised to signal per million reads.", + "fa_icon": "fas fa-save", + "help_text": "" + }, + "skip_peak_qc": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip MACS2 peak QC plot generation.", + "default": false + }, + "skip_peak_annotation": { + "type": "boolean", + "fa_icon": "fas fa-forward", + "description": "Skip annotation of MACS2 and consensus peaks with HOMER.", + "default": false + }, + "skip_consensus_peaks": { + "type": "boolean", + "default": false, + "description": "Skip consensus peak generation, annotation and counting.", + "fa_icon": "fas fa-forward", + "help_text": "" + } + }, + "fa_icon": "fas fa-chart-area" + }, + "Differential analysis options": { + "type": "object", + "properties": { + "deseq2_vst": { + "type": "boolean", + "default": false, + "description": "Use vst transformation instead of rlog with DESeq2.", + "help_text": "See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization).", + "fa_icon": "fas fa-dolly" + }, + "skip_diff_analysis": { + "type": "boolean", + "default": false, + "description": "Skip differential binding analysis.", + "fa_icon": "fas fa-forward", + "help_text": "" + } + }, + "fa_icon": "fas fa-not-equal" + }, + "Process skipping options": { + "type": "object", + "properties": { + "skip_fastqc": { + "type": "boolean", + "default": false, + "description": "Skip FastQC.", + "fa_icon": "fas fa-forward" + }, + "skip_picard_metrics": { + "type": "boolean", + "default": false, + "description": "Skip Picard CollectMultipleMetrics.", + "fa_icon": "fas fa-forward" + }, + "skip_preseq": { + "type": "boolean", + "default": false, + "description": "Skip Preseq.", + "fa_icon": "fas fa-forward" + }, + "skip_plot_profile": { + "type": "boolean", + "default": false, + "description": "Skip deepTools plotProfile.", + "fa_icon": "fas fa-forward" + }, + "skip_plot_fingerprint": { + "type": "boolean", + "default": false, + "description": "Skip deepTools plotFingerprint.", + "fa_icon": "fas fa-forward" + }, + "skip_spp": { + "type": "boolean", + "default": false, + "description": "Skip Phantompeakqualtools.", + "fa_icon": "fas fa-forward" + }, + "skip_igv": { + "type": "boolean", + "default": false, + "description": "Skip IGV.", + "fa_icon": "fas fa-forward" + }, + "skip_multiqc": { + "type": "boolean", + "default": false, + "description": "Skip MultiQC.", + "fa_icon": "fas fa-forward" + } + }, + "fa_icon": "fas fa-forward" + }, + "Institutional config options": { + "type": "object", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem.
If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", + "fa_icon": "fas fa-users-cog" + }, + "hostnames": { + "type": "string", + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + } + }, + "fa_icon": "fas fa-university" + }, + "Max job request options": { + "type": "object", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + } + }, + "fa_icon": "fab fa-acquisitions-incorporated" + }, + "Generic options": { + "type": "object", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "fa_icon": "fas fa-question-circle", + "default": false, + "hidden": true + }, + "fingerprint_bins": { + "type": "integer", + "default": 500000, + "description": "Number of genomic bins to use when calculating deepTools fingerprint plot.", + "fa_icon": "fas fa-dumpster", + "hidden": true + }, + "publish_dir_mode": { + "type": "string", + "default": "copy", + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", + "fa_icon": "fas fa-copy", + "enum": [ + "symlink", + "rellink", + "link", + "copy", + "copyNoFollow", + "move" + ], + "hidden": true + }, + "name": { + "type": "string", + "description": "Workflow name.", + "fa_icon": "fas fa-address-card", + "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen, this parameter can be reused multiple times, for example if using `-resume`.
Passed through to steps such as MultiQC and used for things like report filenames and titles.", + "hidden": true + }, + "email_on_fail": { + "type": "string", + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.", + "hidden": true + }, + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "fa_icon": "fas fa-remove-format", + "default": false, + "help_text": "", + "hidden": true + }, + "max_multiqc_email_size": { + "type": "string", + "description": "File size limit when attaching MultiQC reports to summary emails.", + "default": "25.MB", + "fa_icon": "fas fa-file-upload", + "help_text": "", + "hidden": true + }, + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "default": false, + "help_text": "", + "hidden": true + }, + "multiqc_config": { + "type": "string", + "description": "Custom config file to supply to MultiQC.", + "fa_icon": "fas fa-cog", + "help_text": "", + "hidden": true + }, + "tracedir": { + "type": "string", + "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "help_text": "", + "hidden": true + }, + "clusterOptions": { + "type": "string", + "description": "Arguments passed to Nextflow clusterOptions.", + "fa_icon": "fas fa-network-wired", + "help_text": "", + "hidden": true + } + }, + "fa_icon": "fas fa-file-import" + } + } +} \ No newline at end of file
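The refactor at the top of this diff replaces the old `create_workflow_summary()` helper with a channel that feeds `collectFile`, a common Nextflow pattern for building MultiQC custom-content YAML. Below is a hedged, self-contained sketch of the same pattern with the summary map and YAML trimmed down; the section id and example values are illustrative, not the pipeline's exact content:

```groovy
// Hedged sketch of the summary -> MultiQC YAML pattern used in the diff above.
def summary = [ 'Run Name': 'example_run', 'Max Memory': '128.GB' ]  // illustrative values

Channel.from(summary.collect { [it.key, it.value] })
    .map { k, v -> "<dt>$k</dt><dd><samp>${v ?: 'N/A'}</samp></dd>" }
    .reduce { a, b -> [a, b].join("\n            ") }
    .map { x ->
        """
        id: 'workflow-summary'
        plot_type: 'html'
        data: |
            <dl class=\"dl-horizontal\">
            $x
            </dl>
        """.stripIndent()
    }
    .set { ch_workflow_summary }

// A downstream process can then stage the rendered YAML as a single file:
//   path workflow_summary from ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
```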
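The new `input` parameter (default `./design.csv` in `nextflow.config`) points at the comma-separated design file described in the schema. As a rough illustration of how such a header-row CSV is typically consumed in a DSL1 pipeline like this one, the sketch below parses it with `splitCsv`; the column names `group` and `fastq_1` are hypothetical placeholders, so consult the usage docs for the real design-file layout:

```groovy
// Hedged sketch only: parse a header-row CSV design file into a channel.
// `group` and `fastq_1` are hypothetical column names for illustration.
params.input = './design.csv'

Channel
    .fromPath(params.input, checkIfExists: true)
    .splitCsv(header: true)
    .map { row -> [ row.group, file(row.fastq_1) ] }
    .set { ch_design_reads }
```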
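The new `bwa_min_score` option ("don't output BWA MEM alignments with score lower than this parameter") maps naturally onto BWA-MEM's `-T` flag. A minimal sketch of how the flag might be spliced into an alignment command, assuming the parameter stays `false` when unset as in `nextflow.config` above; treat the wiring as illustrative rather than the pipeline's exact code:

```groovy
// Hedged sketch: translate an optional score threshold into BWA-MEM's -T flag.
// When params.bwa_min_score is false (the default above), no flag is added.
def score = params.bwa_min_score ? " -T ${params.bwa_min_score}" : ''

// Inside a process script block this would look something like:
// """
// bwa mem -t ${task.cpus}${score} genome.fa sample_R1.fastq.gz > sample.sam
// """
```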
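`--macs_fdr` and `--macs_pvalue` are documented above as mutually exclusive. A minimal sketch of how that contract could be enforced and translated into MACS2 `callpeak` flags (`--qvalue`/`--pvalue` are real MACS2 options; the variable names here are illustrative):

```groovy
// Hedged sketch: enforce mutual exclusivity and build the MACS2 cutoff flag.
if (params.macs_fdr && params.macs_pvalue) {
    exit 1, "Options --macs_fdr and --macs_pvalue are mutually exclusive!"
}
def macs_cutoff = ''
if (params.macs_fdr)    macs_cutoff = "--qvalue ${params.macs_fdr}"
if (params.macs_pvalue) macs_cutoff = "--pvalue ${params.macs_pvalue}"
// `macs_cutoff` would then be interpolated into the macs2 callpeak command.
```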
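Finally, the new `publish_dir_mode` parameter (default `copy`; enum corrected to `move` above, matching Nextflow's valid `publishDir` modes) is threaded through every `publishDir` directive in place of the hard-coded `mode: 'copy'`. A hedged, minimal illustration with a hypothetical process:

```groovy
// Hedged sketch: a hypothetical process publishing via the configurable mode.
// 'copy' duplicates outputs into outdir; 'symlink'/'link' save space; 'move'
// relocates files (only safe when nothing downstream consumes them).
params.outdir           = './results'
params.publish_dir_mode = 'copy'

process EXAMPLE_REPORT {
    publishDir "${params.outdir}/example", mode: params.publish_dir_mode

    output:
    path 'report.txt'

    script:
    """
    echo 'example report' > report.txt
    """
}
```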