
Sfitz by readgroup and add FastQC #62

Merged
merged 46 commits into main from sfitz-by-readgroup
Jul 19, 2024
Conversation

sorelfitzgibbon
Collaborator

@sorelfitzgibbon sorelfitzgibbon commented Jun 12, 2024

Description

  • Processing is now done with respect to samples, libraries, and readgroups.
  • SAMtools stats now runs at three levels (by sample, by library, and by readgroup) unless there is only a single library or readgroup.
  • FastQC runs at a single level, set by the input parameter fastqc_level.
  • CollectWgsMetrics and Qualimap are functionally unchanged.
  • Added a downsampled HG003 to NFTest.
  • Standardized algorithms to algorithm.
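
As a sketch of the level selection (the parameter name comes from the description above; 'sample' and 'readgroup' are exercised by the NFTest runs below, while 'library' as an accepted value is an assumption based on the three processing levels):

```groovy
// config sketch, not the pipeline's actual template:
// choose the single level at which FastQC runs
params {
    fastqc_level = 'readgroup'   // assumed values: 'sample', 'library', 'readgroup'
}
```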

Closes #42 #44 #45 #50 #63 #66 #67

Testing Results

  • NFTest Samples:
    1. GIAB HG003_0.05x-selected-readgroups
      • HG003 downsampled to ~0.05x coverage (1.4 GB BAM),
        • then 6 readgroups (3 libraries) selected out of 1,005 total readgroups (-> 11 MB BAM, 33,400 reads).
      • 1 sample (HG003) with...
        • library1: 1 readgroup
        • library2: 2 readgroups
        • library3: 3 readgroups
    2. A_mini
      • 2 samples, each with one library and one readgroup
  • NFTest runs
    • hg003-all-tools (sample-level FastQC)
    • a_mini-all-tools
    • a_mini-all-tools-multiple-samples
      • log for the above three runs: /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-by-readgroup/log-nftest-20240626T210242Z.log
    • hg003-fastqc (readgroup-level FastQC)
      • log: /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-by-readgroup/log-nftest-20240626T204641Z.log
    • a_mini-fastqc (sample-level FastQC)
      • log: /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-by-readgroup/log-nftest-20240626T204702Z.log

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.

@sorelfitzgibbon sorelfitzgibbon requested a review from a team as a code owner June 12, 2024 04:06
Contributor

@yashpatel6 yashpatel6 left a comment


I'll review in greater depth in the next couple of days but a suggestion in the meantime:

Comment on lines 32 to 46
if (task.process == "run_stats_SAMtools_sample") {
    output_filename = generate_standard_filename("SAMtools-${params.samtools_version}",
        params.dataset_id,
        sm_id,
        [:])
    outdir = "."
    log_suffix = "${sm_id}"
} else if (task.process == "run_stats_SAMtools_library") {
    output_filename = generate_standard_filename("SAMtools-${params.samtools_version}",
        params.dataset_id,
        lib_id,
        [:])
    outdir = sm_id
    log_suffix = "${sm_id}/${lib_id}"
} else if (task.process == "run_stats_SAMtools_readgroup") {
    output_filename = generate_standard_filename("SAMtools-${params.samtools_version}",
        params.dataset_id,
        rg_id,
        [:])
    outdir = "${sm_id}/${lib_id}"
    log_suffix = "${sm_id}/${lib_id}/${rg_id}"
}

Contributor

suggestion: Consider re-organizing this block to something like:

if (params.stat_mode == 'library') {
    filename_id = lib_id
    outdir = ...
    log_suffix = ...
} else if (params.stat_mode == 'readgroup') {
    filename_id = rg_id
    outdir = ...
    log_suffix = ...
} else {
    filename_id = sm_id
    outdir = ...
    log_suffix = ...
}

output_filename = generate_standard_filename("SAMtools-${params.samtools_version}",
            params.dataset_id,
            filename_id,
            [:])

and then using addParams when importing the process to set stat_mode.

This reduces the amount of duplicated code (e.g. the standard filename generation call is coded only once) and also removes the need for the process itself to be imported with a specific name; its behavior is instead controlled by a parameter set during import.
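
A minimal sketch of the import side of that suggestion (module path and alias names are illustrative, not necessarily the pipeline's actual ones):

```groovy
// workflow-side sketch: one process definition, parameterized per level at import time;
// aliasing is still needed to invoke the process three times, but the process body
// branches on params.stat_mode instead of task.process
include { run_stats_SAMtools as run_stats_SAMtools_sample    } from './module/samtools' addParams(stat_mode: 'sample')
include { run_stats_SAMtools as run_stats_SAMtools_library   } from './module/samtools' addParams(stat_mode: 'library')
include { run_stats_SAMtools as run_stats_SAMtools_readgroup } from './module/samtools' addParams(stat_mode: 'readgroup')
```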

@yashpatel6 yashpatel6 self-assigned this Jun 13, 2024
@yashpatel6 yashpatel6 self-requested a review June 13, 2024 01:19
@sorelfitzgibbon sorelfitzgibbon changed the title Sfitz by readgroup Sfitz by readgroup and add FastQC Jun 14, 2024
@sorelfitzgibbon
Collaborator Author

@yashpatel6 this is ready for review

Channel
    .fromList(params.libraries_to_process)
    .map { lib ->
        def rg_arg = lib.rgs.collect { "-r ${it}" }.join(' ')
Collaborator Author

I forgot to change this. In a separate PR I will make this a list of readgroups and add the -r within the processes.

Collaborator Author

@sorelfitzgibbon sorelfitzgibbon Jun 18, 2024

Some of the GIAB samples, for example, have >1,000 readgroups. I think I should impose a maximum number of readgroups (and, I suppose, libraries) that will be processed separately by SAMtools stats. For FastQC, though, they will all be processed separately, as this is equivalent to running FastQC on the FASTQ files.

Collaborator Author

I added parameters for the maximum number of readgroups and libraries to process separately, with defaults of 20.

I also parameterized a choice of levels to run FastQC.
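
A config sketch of the caps described above (stats_max_rgs_per_sample matches the config template reviewed later in this PR; the library-level parameter name here is an assumption, so check the README for the actual name):

```groovy
// config sketch: caps on how many units SAMtools stats processes separately
stats_max_rgs_per_sample  = 20   // readgroup cap (default 20)
stats_max_libs_per_sample = 20   // assumed name for the library cap (default 20)
```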

module/fastqc.nf Outdated
Comment on lines 33 to 44
set -euo pipefail
mkdir -p ${sm_id}
samtools view -F 0x900 -h ${rg_arg} ${path} | \
samtools fastq | \
fastqc \
--outdir "${sm_id}" \
--format fastq \
--extract \
--delete \
${params.fastqc_additional_options} \
stdin:${output_filename}


@sorelfitzgibbon
question: I think I understand what this command tries to achieve, but can you elaborate on it a bit (can this -F 0x900 be optional, etc.)? Also, have you tried to incorporate threading? This would probably be really slow for large samples. Is it possible to use fastqc --format bam with threading for the sample-level run?

Collaborator Author

Thanks @tyamaguchi-ucla

  • -F 0x900 removes secondary and supplementary alignments. It is actually the default, but I thought I'd leave it here so there is no question that it's being done.
  • Surprisingly, fastqc --format bam does not allow per-readgroup analysis.
  • I have not tried to incorporate threading. All three commands allow threading; I will try it and test the speedup.
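
As a quick aside on the flag value (an illustration, not pipeline code): 0x900 is the bitwise OR of the SAM SECONDARY (0x100) and SUPPLEMENTARY (0x800) flag bits, which is why -F 0x900 drops exactly those two alignment classes.

```shell
# SAM flag arithmetic: 0x100 (secondary) | 0x800 (supplementary) = 0x900
printf '%d\n' $((0x100 | 0x800))    # combined mask in decimal: 2304
printf '0x%x\n' $((0x100 | 0x800))  # same mask in hex, as passed to -F: 0x900
```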

Collaborator Author

@sorelfitzgibbon sorelfitzgibbon Jun 16, 2024

UPDATE 2: After discussion with @yashpatel6:

  • Since, within metapipeline, this pipeline will eventually be used to gate the running of further pipelines, realtime will be more important than CPU hours; thus the default will be 2 CPUs. More than 2 CPUs provides very little time advantage for the large (more typically sized) readgroup.
  • CPUs will be adjusted down to 1 for input with large numbers of readgroups.

UPDATE: I now think it would be better to avoid threading entirely, at least when running in metapipeline.

  • Testing on a sample with 112 readgroups took twice as long with 2 threads vs 1 (9.5 hours vs 4.7 hours).
    /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-add-mosdepth/slurm-184021.out
    /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-add-mosdepth/slurm-184029.out
  • The per-CPU benefit of 2 threads on samples with fewer readgroups is minimal, if even real.

Below are the results for running with various thread counts on SGSALASC000015-T011-O04-F.bam (185 GB).

  • This sample has five readgroups, with one much larger than the others.
  • All output here: /hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-by-readgroup
  • It looks like the sweet spot is 2 threads.

Speedup (single-thread realtime / realtime)

| threads | 1 | 2 | 3 | 4 | 6 | 12 |
| --- | --- | --- | --- | --- | --- | --- |
| small readgroup | 1 | 1.9 | 2.1 | 3.8 | 4 | 3.4 |
| large readgroup | 1 | 2 | 2.5 | 2.2 | 2.6 | 2.5 |
| total CPU hours | 9.3 | 9.2 | 12.2 | 12.4 | 16.7 | 37.0 |

1 thread; full-run CPU hours = 9.3

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (2) | 1 | 1h 24m 33s | 97.50% | 444.2 MB | 3.3 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (5) | 1 | 1h 27m 31s | 99.70% | 444.8 MB | 3.3 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (4) | 1 | 1h 27m 31s | 99.70% | 447.5 MB | 3.3 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (3) | 1 | 1h 27m 31s | 99.80% | 445.4 MB | 3.3 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (1) | 1 | 3h 31m 22s | 99.60% | 444.5 MB | 3.3 GB | 1.2 TB | 1 TB |

2 threads; full-run CPU hours = 9.2

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (3) | 2 | 44m 11s | 188.30% | 459.8 MB | 3.9 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (4) | 2 | 44m 11s | 190.20% | 465.7 MB | 3.9 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (5) | 2 | 44m 10s | 189.90% | 465.2 MB | 3.9 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (2) | 2 | 38m 54s | 191.90% | 462 MB | 3.9 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (1) | 2 | 1h 45m 35s | 200.20% | 459.1 MB | 3.9 GB | 1.2 TB | 1 TB |

3 threads; full-run CPU hours = 12.2

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (2) | 3 | 40m 48s | 182.00% | 877.1 MB | 4.9 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (3) | 3 | 40m 48s | 181.90% | 885.5 MB | 4.9 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (4) | 3 | 38m 42s | 187.30% | 885.5 MB | 4.9 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (5) | 3 | 38m 43s | 187.50% | 880.5 MB | 5 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (1) | 3 | 1h 24m 36s | 237.80% | 894.7 MB | 5 GB | 1.2 TB | 1 TB |

4 threads; full-run CPU hours = 12.4

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (2) | 4 | 22m 33s | 350.10% | 1.1 GB | 5.7 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (3) | 4 | 22m 19s | 358.50% | 905.9 MB | 5.8 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (5) | 4 | 22m 7s | 356.40% | 970.9 MB | 5.7 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (4) | 4 | 22m 10s | 353.60% | 971.2 MB | 5.7 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (1) | 4 | 1h 37m 8s | 233.10% | 1.1 GB | 5.7 GB | 1.2 TB | 1 TB |

6 threads; full-run CPU hours = 16.7

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (2) | 6 | 21m 36s | 332.00% | 1.5 GB | 6.9 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (3) | 6 | 21m 36s | 319.90% | 1.4 GB | 6.8 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (5) | 6 | 21m 36s | 332.20% | 1.5 GB | 6.9 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (4) | 6 | 21m 36s | 318.70% | 1.5 GB | 6.9 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (1) | 6 | 1h 20m 29s | 263.50% | 1.4 GB | 7 GB | 1.2 TB | 1 TB |

12 threads; full-run CPU hours = 37.0

| name | threads | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| --- | --- | --- | --- | --- | --- | --- | --- |
| assess_ReadQuality_FastQC (5) | 12 | 25m 22s | 271.30% | 2.5 GB | 10.4 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (2) | 12 | 25m 22s | 265.00% | 2.3 GB | 10.6 GB | 369.5 GB | 185.2 GB |
| assess_ReadQuality_FastQC (4) | 12 | 25m 22s | 265.00% | 2.4 GB | 10.5 GB | 369.6 GB | 185.4 GB |
| assess_ReadQuality_FastQC (3) | 12 | 25m 22s | 271.30% | 2.5 GB | 10.4 GB | 370.4 GB | 186.2 GB |
| assess_ReadQuality_FastQC (1) | 12 | 1h 23m 44s | 250.30% | 2.1 GB | 10.5 GB | 1.2 TB | 1 TB |


> -F 0x900 removes secondary and supplementary alignments. It is actually the default, but I thought I'd leave it here so there is no question that it's being done.

Thanks for the benchmarking results. I wasn't aware of this default. Do we want to include them, though (i.e. -f?) to get a complete result for the RG-level FASTQs, etc.?

Collaborator Author

No, I don't think we want to include them. They are duplicates of reads that are already represented; removing them produces output that directly matches the original FASTQs.


@sorelfitzgibbon (suggestion/question: non-blocking) You've generated lots of benchmarking results, and we should post them on GitHub Discussions as well. Also, do you typically adjust for caching performance when conducting benchmarking?

Collaborator Author

> do you typically adjust for caching performance when conducting benchmarking

Is there a recommended way of doing that? I ran almost all of it on nodes that were already running and had space available, at more or less the same time. I could run them all concurrently on a single F72, although I assume even that background will change as jobs start to finish.

Contributor

Caching adjustments are hard to control in general; a way around it could be to make the tests totally independent of the cache by copying the input files to /scratch and then running all benchmarking runs against the files under scratch.
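
A hedged sketch of that staging step (paths and filenames here are illustrative; a real run would copy the actual BAMs from their /hot locations):

```shell
# Stage inputs on node-local scratch so benchmark reads bypass the shared-FS cache
scratch_dir=$(mktemp -d)                                      # stand-in for a /scratch allocation
echo "placeholder BAM contents" > "${scratch_dir}/input.bam"  # real run: cp the BAM from /hot
ls "${scratch_dir}"                                           # confirm the staged copy before benchmarking
rm -rf "${scratch_dir}"                                       # clean up afterwards
```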

Collaborator Author

I will do some cache independent benchmarking.


> Caching adjustments are hard to control in general; a way around it could be to make the tests totally independent of the cache by copying the input files to /scratch and then running all benchmarking runs against the files under scratch.

Yeah, I was going to suggest this. Maybe it's worth creating a Confluence page with general benchmarking recommendations? We've discussed pre-caching, etc. with OHIA and MSFT a while ago, but using /scratch would be the easiest solution.

@sorelfitzgibbon
Collaborator Author

@yashpatel6 I think this is ready for another review. I will update based on cache-independent benchmarking in the next PR.

@sorelfitzgibbon
Collaborator Author

> @yashpatel6 I think this is ready for another review. I will update based on cache-independent benchmarking in the next PR.

Actually hold off. I need to stop single libraries and readgroups being run multiple times.

@yashpatel6
Contributor

> @yashpatel6 I think this is ready for another review. I will update based on cache-independent benchmarking in the next PR.
>
> Actually hold off. I need to stop single libraries and readgroups being run multiple times.

Gotcha, feel free to re-request the review when ready!

@sorelfitzgibbon
Collaborator Author

In order to optimize this pipeline, it would be helpful to start specifying what QC results would trigger a cessation of the metapipeline run. For example, coverage calculations take the longest and may not be necessary before the gatekeeping; results from SAMtools stats should be sufficient, e.g. read QC and count, mapping QC, and % duplicated.

In this case, perhaps this pipeline could be run twice within metapipeline: first targeting the gatekeeping information (SAMtools stats) and second filling in the rest (FastQC and coverage (mosdepth or CollectWgsMetrics)).

Regardless, I hope to merge this PR and implement optimizations after adding mosdepth. I've created issues for the various items left hanging.

I started a Discussion about goals for integrating sample-level QC into metapipeline-DNA.

Contributor

@yashpatel6 yashpatel6 left a comment

A couple of comments and suggestions, but otherwise looks good:

}
}
}
withName: run_stats_SAMtools_readgroup {
Contributor

I'm ok with maintaining the three sections and using run_statsReadgroups_SAMtools

reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
output_dir = '/path/to/output/directory'
blcds_registered_dataset = false // if you want the output to be registered
save_intermediate_files = true

// SAMtools stats options
samtools_remove_duplicates = false
samtools_stats_additional_options = ''
stats_max_rgs_per_sample = 20
Contributor

suggestion: Do we also want to add the max libs parameter here?

Collaborator Author

@sorelfitzgibbon sorelfitzgibbon Jun 27, 2024

@yashpatel6 I left it off in the interest of streamlining the template, since library counts large enough to be an issue are currently unlikely. It is in the README. Do you want me to add it?

Contributor

For consistency, I would suggest adding it

@sorelfitzgibbon
Collaborator Author

done and tested:
/hot/software/pipeline/pipeline-generate-SQC-BAM/Nextflow/development/unreleased/sfitz-by-readgroup/log-nftest-20240627T222505Z.log

Contributor

@yashpatel6 yashpatel6 left a comment

Generally looks good technically! The specific usage in terms of failing/passing QC we can discuss in the metapipeline context.

@sorelfitzgibbon sorelfitzgibbon merged commit ef50a59 into main Jul 19, 2024
7 checks passed
@sorelfitzgibbon sorelfitzgibbon deleted the sfitz-by-readgroup branch July 23, 2024 19:40