Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update usage.md for igenomes warning #1073

Merged
merged 18 commits into from
Oct 13, 2023
Merged

Update usage.md for igenomes warning #1073

merged 18 commits into from
Oct 13, 2023

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Aug 30, 2023

iGenomes are a pain:

  1. VERY outdated gene annotation
  2. Problems for downstream analysis caused by symbols used as primary identifiers in NCBI iGenome references.

So I propose some strong wording to advocate for supplying --gtf and --fasta.

Edit: file downloads added after feedback. I also tidied up the discussion of reference index handling that I thought had gotten a bit wordy, hope I haven't been too brutal.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Aug 30, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit db2b2ac

+| ✅ 143 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
  • multiqc_config - multiqc_config

✅ Tests passed:

Run details

  • nf-core/tools version 2.9
  • Run at 2023-09-05 08:45:40

@tdanhorn
Copy link

Looks good to me, except that [differential abundance workflow](https://nf-co.re/differentialabundance) workflow. in usage.md will be rendered as "... workflow workflow".
This just changes the GitHub documentation, right? I think most users probably look at documentation on the website, so it would make sense to update that as well.

docs/usage.md Outdated
- The igenomes file usage triggered by this option is outdated with respect to gene annotations. This can be particularly problematic for RNA-seq analysis, which relies on accurate gene annotation.
- Some iGenomes references (e.g., GRCh38) point to annotation files that use gene symbols as the primary identifier. This can cause issues for downstream analysis, such as the nf-core [differential abundance workflow](https://nf-co.re/differentialabundance) workflow.

We recommend that you provide reference files directly, via `--gtf` and `--fasta`, and that supplied GTF files do not use gene names as `gene_id`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also worth extending these to provide genome indices too. And maybe even docs and commands to download via Ensembl. I am manually sharing them right now on Slack which isn't ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drpatelh I've added the example download commands, and reworked a bit to make the existing discussion of indices more prominent.

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

--input <SAMPLESHEET> \
--outdir <OUTDIR> \
--gtf <GTF> \
--fasta <GENOME FASTA> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need --igenomes_ignore --genome null too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... actually, could I be more provocative and suggest we lose --igenomes_ignore in favour of a new --load_igenomes? So non-use of iGenomes becomes the default position (since it's what we're recommending)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(realise I've broken the tests, I'll fix things up if folk agree this is the right way to go)

Copy link

@tdanhorn tdanhorn Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does --ignomes_ignore do specifically? I can see the use if there is no valid igenomes.config, but not otherwise. I just run with with --fasta and gtf and don't specify --genome at all (is that equivalent to --igenomes null?), and that works great (I do have igenomes_base in my profile).
Also, is there anything in iGenomes (QC references, etc.?) that is being used by default, even when --fasta and --gtf are provided? I suspect not, because when I don't specify the --genome I don't seem to be missing anything. If that is in fact the case, I am totally fine with not using iGenomes by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does --ignomes_ignore do specifically?

Literally just that conditional around the config load I think. The help in the schema says "Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config." - so clearly someone has encountered clashes of some sort in the past.

conf/test.config Outdated Show resolved Hide resolved
Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pinin4fjords @tdanhorn! Will merge this and make any changes in a follow up PR. Due to the extensive changes it will be easier for me to update bits and bobs.

@@ -23,6 +23,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
- [PR #1054](https://github.com/nf-core/rnaseq/pull/1054) - Template update to nf-core/tools v2.9
- [PR #1058](https://github.com/nf-core/rnaseq/pull/1058) - Use `nf-validation` plugin for parameter and samplesheet validation
- [PR #1068](https://github.com/nf-core/rnaseq/pull/1068) - Update `grep` version for `untar` module
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage --genome
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage --genome
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage usage of `--genome`

@drpatelh drpatelh merged commit c742192 into dev Oct 13, 2023
26 checks passed
@drpatelh drpatelh deleted the igenomes_warning branch November 21, 2023 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants