Add rename in the MultiQC report for samples without techreps #1341

Merged
merged 12 commits into from
Jul 15, 2024

Conversation

@pinin4fjords pinin4fjords commented Jul 12, 2024

@MatthiasZepper noted in #1308 that running UMI-tools extract resulted in inconsistent sample naming in the MultiQC report.

This happens because, when a sample has no technical replicates, its input FASTQ files reach the UMI-tools extract process without ever passing through a process that applies a suffix, and the relevant MultiQC module effectively derives the sample name from the input file name.

One fix we should definitely apply is to have MultiQC use the output (possibly prefixed) file name as the source of the identifier: MultiQC/MultiQC#2698.

However, it probably also makes sense to be defensive and tell MultiQC to rename any other related occurrences that come up in future. That's what this PR does, using the sample sheet to derive a set of replacements to pass to MultiQC via --replace-names. (To be clear, this is also an immediate fix for the UMI-tools issue until, if and when, it's fixed in MultiQC.)
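To illustrate the idea (this is not the code in this PR), here is a minimal Python sketch of deriving a two-column, tab-separated replacement table from a sample sheet. The column names follow the nf-core/rnaseq sample sheet convention; the rows, the helper names, and the `_1`/`_2` read suffixes are assumptions made for the example.

```python
import csv
import io
from pathlib import Path

# Hypothetical sample sheet: column names follow the nf-core/rnaseq
# convention, the rows themselves are invented for this sketch.
SAMPLESHEET = """\
sample,fastq_1,fastq_2
SampleA,/data/runX_A_R1.fastq.gz,/data/runX_A_R2.fastq.gz
SampleB,/data/runX_B_R1.fastq.gz,
"""

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def strip_fastq_suffix(path: str) -> str:
    """Return the file name without directory and FASTQ extension."""
    name = Path(path).name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def build_replacements(samplesheet_text: str) -> list[tuple[str, str]]:
    """Map each input file stem to a name based on the sample ID."""
    pairs = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        fastqs = [f for f in (row["fastq_1"], row["fastq_2"]) if f]
        if len(fastqs) == 1:
            # Single-end: the file stem maps straight to the sample ID.
            pairs.append((strip_fastq_suffix(fastqs[0]), row["sample"]))
        else:
            # Paired-end: the _1/_2 suffix is an assumption of this sketch.
            for read, fastq in enumerate(fastqs, start=1):
                pairs.append((strip_fastq_suffix(fastq), f"{row['sample']}_{read}"))
    return pairs


def to_tsv(pairs: list[tuple[str, str]]) -> str:
    """Render the pairs as search<TAB>replacement lines."""
    return "".join(f"{search}\t{replacement}\n" for search, replacement in pairs)


if __name__ == "__main__":
    print(to_tsv(build_replacements(SAMPLESHEET)), end="")
```

Written to a file, the resulting table is the kind of input that `multiqc --replace-names <file>` expects: one search string and one replacement per line, separated by a tab.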

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @pinin4fjords,

It looks like this pull-request has been made against the nf-core/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the nf-core/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@pinin4fjords pinin4fjords changed the base branch from master to dev July 12, 2024 17:12

github-actions bot commented Jul 12, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit f35f51b

+| ✅ 173 tests passed       |+
#| ❔   9 tests were ignored |#
!| ❗   7 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-07-15 15:12:11


@MatthiasZepper MatthiasZepper left a comment

Sorry, but I do not feel competent to review this. The changes to the module are evident (and of course also reviewed for the merge to the modules repo), but the details of the ch_name_replacements construction are beyond my comprehension. Thus, I can't think the edge cases through to spot potential issues.

@pinin4fjords

@MatthiasZepper - it probably was a little arcane (I blame Friday afternoon head) - simplified a bit now

@MatthiasZepper MatthiasZepper left a comment

Wonderful! Much simpler solution that even I can understand!

What I am somewhat worried about are the possible side effects of doing that in the first place.

You are starting from ch_samples, which is directly taken from the sample sheet, so before .groupTuple(), checkSamplesAfterGrouping(it) and CAT_FASTQ are applied.

This means that samples with more than one pair of FastQs will be mapped to the same meta.id in separate lines of the collected file. The MultiQC doc is very clear about what will happen in that case: Samples mapped to the same name will overwrite the preexisting information when processed.

Since we use the concatenated FastQ files for essentially everything in the pipeline, I think that those lines for the single FastQs will simply never apply (in particular since you set sample_names_replace_exact: true), but it might be safer to never include them in the first place?
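The concern above can be made concrete with a toy model in Python. This is an illustration of the argument only, not MultiQC's implementation: `rename` imitates exact versus substring replacement, and `collisions` reports target names that more than one report entry would end up sharing. All file and sample names are invented.

```python
from collections import Counter


def rename(name: str, replacements: dict[str, str], exact: bool) -> str:
    """Toy model of name replacement (not MultiQC's own code)."""
    for search, replacement in replacements.items():
        if exact:
            # Exact mode: only a full-name match is replaced.
            if name == search:
                return replacement
        elif search in name:
            # Substring mode: replace the matching part of the name.
            name = name.replace(search, replacement)
    return name


def collisions(names: list[str], replacements: dict[str, str], exact: bool = True) -> list[str]:
    """Return renamed sample names that more than one input maps to."""
    counts = Counter(rename(n, replacements, exact) for n in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Two lanes of the same sample both point at the same target name.
REPLACEMENTS = {
    "runX_A_L001_R1": "SampleA_1",
    "runX_A_L002_R1": "SampleA_1",
}

if __name__ == "__main__":
    # Names as they would appear after concatenation: nothing matches.
    print(collisions(["SampleA_1", "SampleB"], REPLACEMENTS))
    # If the per-lane names did show up, they would collapse into one.
    print(collisions(["runX_A_L001_R1", "runX_A_L002_R1"], REPLACEMENTS))
```

In this model, the per-lane aliases are harmless as long as only concatenated names reach the report, but they would collide if the raw file names ever appeared, which is the case for leaving them out of the replacement file.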

For that reason, I suggest .cross()ing your ch_name_replacements channel with ch_fastq.single, so that aliases are only written for the samples with one pair of FastQs to avoid duplicates.

Update: I changed my mind. Firstly, I realized that ch_fastq.single is of course the wrong channel (since it contains the unpaired samples and not those with just one pair). I also tried the .cross() myself and just ended up once more being f*cking frustrated with Nextflow, because I simply do not know how to act on errors like

groovy.lang.MissingMethodException: No signature of method: Script_4f6e07c481ddf010$_runScript_closure1$_closure2$_closure4$_closure8.doCall() is applicable for argument types: (ArrayList) values: [[SampleA, [[id:SampleA, single_end:false], [id:SampleA, single_e>
Possible solutions: doCall(java.lang.Object, java.lang.Object), findAll(), findAll(), isCase(java.lang.Object), isCase(java.lang.Object)

So: If you like, I suggest doing something that restricts the list to those samples with only one pair of FastQs or a single unpaired FastQ (essentially no duplicate meta.id) before you write the file, but if you do not feel like doing that, I am fine with that as well.

@pinin4fjords

@MatthiasZepper

ch_fastq.single is of course the wrong channel (since it contains the unpaired samples and not those with just one pair)

No, you were right the first time :-). ch_fastq.single contains all those samples with a single ([read1] or [read1, read2]) tuple after the groupTuple(), so ch_fastq.single is the right way to go!
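As a rough Python analogue of that grouping-then-branching step (the pipeline itself does this with Nextflow channel operators, and the rows and the name of the second branch are assumptions made here for illustration):

```python
from collections import defaultdict

# Hypothetical (sample ID, reads) rows as they might come from a sample sheet.
ROWS = [
    ("SampleA", ["A_L1_R1.fq.gz", "A_L1_R2.fq.gz"]),
    ("SampleA", ["A_L2_R1.fq.gz", "A_L2_R2.fq.gz"]),
    ("SampleB", ["B_R1.fq.gz", "B_R2.fq.gz"]),
    ("SampleC", ["C_R1.fq.gz"]),
]


def group_by_sample(rows):
    """Collect all read tuples per sample ID, loosely like groupTuple()."""
    grouped = defaultdict(list)
    for sample_id, reads in rows:
        grouped[sample_id].append(reads)
    return dict(grouped)


def branch(grouped):
    """Split into samples with exactly one read tuple and samples with several."""
    single = {s: reads[0] for s, reads in grouped.items() if len(reads) == 1}
    multiple = {s: reads for s, reads in grouped.items() if len(reads) > 1}
    return single, multiple


if __name__ == "__main__":
    single, multiple = branch(group_by_sample(ROWS))
    print(sorted(single))    # paired-end SampleB and single-end SampleC
    print(sorted(multiple))  # SampleA, which has two lanes to concatenate
```

Here `single` holds both the paired-end and the single-end sample, because membership depends on the number of read tuples per sample rather than on the number of reads in a tuple. Restricting the rename list to `single` therefore avoids duplicate sample IDs.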

@MatthiasZepper MatthiasZepper left a comment

Brilliant! I probably nudged you to overengineer the whole stuff, but thanks for bearing with me, and sorry for the fuss!

@pinin4fjords pinin4fjords changed the title from "Add wholesale rename in the MultiQC report" to "Add rename in the MultiQC report for samples without techreps" Jul 15, 2024
@pinin4fjords

Thanks @MatthiasZepper, all good :-).

@pinin4fjords pinin4fjords merged commit e1b2ef7 into dev Jul 15, 2024
27 checks passed
@pinin4fjords pinin4fjords deleted the rename_samples branch July 15, 2024 16:03