-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update usage.md for igenomes warning #1073
Conversation
|
Looks good to me, except that |
docs/usage.md
Outdated
- The igenomes file usage triggered by this option is outdated with respect to gene annotations. This can be particularly problematic for RNA-seq analysis, which relies on accurate gene annotation. | ||
- Some iGenomes references (e.g., GRCh38) point to annotation files that use gene symbols as the primary identifier. This can cause issues for downstream analysis, such as the nf-core [differential abundance workflow](https://nf-co.re/differentialabundance) workflow. | ||
|
||
We recommend that you provide reference files directly, via `--gtf` and `--fasta`, and that supplied GTF files do not use gene names as `gene_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably also worth extending these to provide genome indices too. And maybe even docs and commands to download via Ensembl. I am manually sharing them right now on Slack which isn't ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drpatelh I've added the example download commands, and reworked a bit to make the existing discussion of indices more prominent.
@nf-core-bot fix linting |
--input <SAMPLESHEET> \ | ||
--outdir <OUTDIR> \ | ||
--gtf <GTF> \ | ||
--fasta <GENOME FASTA> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need --igenomes_ignore --genome null
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... actually, could I be more provocative and suggest we lose --igenomes_ignore
in favour of a new --load_igenomes
? So non-use of iGenomes becomes the default position (since it's what we're recommending)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(realise I've broken the tests, I'll fix things up if folk agree this is the right way to go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does --ignomes_ignore
do specifically? I can see the use if there is no valid igenomes.config
, but not otherwise. I just run with with --fasta
and gtf
and don't specify --genome
at all (is that equivalent to --igenomes null
?), and that works great (I do have igenomes_base
in my profile).
Also, is there anything in iGenomes (QC references, etc.?) that is being used by default, even when --fasta
and --gtf
are provided? I suspect not, because when I don't specify the --genome
I don't seem to be missing anything. If that is in fact the case, I am totally fine with not using iGenomes by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does
--ignomes_ignore
do specifically?
Literally just that conditional around the config load I think. The help in the schema says "Do not load igenomes.config
when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config
." - so clearly someone has encountered clashes of some sort in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pinin4fjords @tdanhorn! Will merge this and make any changes in a follow up PR. Due to the extensive changes it will be easier for me to update bits and bobs.
@@ -23,6 +23,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements | |||
- [PR #1054](https://github.com/nf-core/rnaseq/pull/1054) - Template update to nf-core/tools v2.9 | |||
- [PR #1058](https://github.com/nf-core/rnaseq/pull/1058) - Use `nf-validation` plugin for parameter and samplesheet validation | |||
- [PR #1068](https://github.com/nf-core/rnaseq/pull/1068) - Update `grep` version for `untar` module | |||
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage --genome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage --genome | |
- [PR #1073](https://github.com/nf-core/rnaseq/pull/1073) - Update documentation to discourage usage of `--genome` |
iGenomes are a pain:
So I propose some strong wording to advocate for supplying
--gtf
and--fasta
.Edit: file downloads added after feedback. I also tidied up the discussion of reference index handling that I thought had gotten a bit wordy, hope I haven't been too brutal.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).