Run prettier

nf-core · Sep 5, 2024 · 699c49c
1 parent 90ffcc0

Showing 5 changed files with 82 additions and 74 deletions.
diff --git a/conf/test_screening.config b/conf/test_screening.config
@@ -25,9 +25,12 @@ params {
     crisprcleanr               = "Brunello_Library"
     library                    = params.pipelines_testdata_base_path + "crisprseq/testdata/brunello_target_sequence.txt"
     contrasts                  = params.pipelines_testdata_base_path + "crisprseq/testdata/rra_contrasts.txt"
-    drugz                      = params.pipelines_testdata_base_path + "crisprseq/testdata/rra_contrasts.txt"
-    hit_selection_iteration_nb = 150
+    drugz                      = true
+    hit_selection_iteration_nb = 50
     hitselection               = true
+    bagel2                     = true
+    rra                        = true
+    mle                        = true
 }
 
 process {

diff --git a/docs/usage/screening.md b/docs/usage/screening.md
@@ -72,9 +72,17 @@ Otherwise, if you wish to provide your own file, please provide it in CSV format
 | CTCTACGAGAAGCTCTACAC | NM_021446.2 | 0610007P14Rik | ex2  | 12     | +        | 85822108 |
 | GACTCTATCACATCACACTG | NM_021446.2 | 0610007P14Rik | ex4  | 12     | +        | 85816419 |
 
-### Running MAGeCK MLE and BAGEL2 with a contrast file
+### Running gene essentiality scoring
 
-To run both MAGeCK MLE and BAGEL2, you can provide a contrast file with the flag `--contrasts` with the mandatory headers "treatment" and "reference". These two columns should be separated with a dot comma (;) and contain the `csv` extension. You can also integrate several samples/conditions by comma separating them in each column. Please find an example here below :
+nf-core/crisprseq supports four gene essentiality analysis modules: MAGeCK RRA, MAGeCK MLE,
+BAGEL2 and DrugZ. You can run any of these modules by providing a contrast file with `--contrasts` together with the flag of the tool you wish to use:
+
+- `--rra` for MAGeCK RRA
+- `--mle` for MAGeCK MLE
+- `--drugz` for DrugZ
+- `--bagel2` for BAGEL2
+
+The contrast file must contain the headers "treatment" and "reference". These two columns should be separated by a semicolon (;), and the file should have the `csv` extension. You can also combine several samples/conditions in one column by separating them with commas. Please find an example below:
 
 | reference         | treatment             |
 | ----------------- | --------------------- |
@@ -87,14 +95,13 @@ A full example can be found [here](https://raw.githubusercontent.com/nf-core/tes
 
 Running MAGeCK MLE and BAGEL2 with a contrast file will also output a Venn diagram showing common genes having an FDR < 0.1.
 
-### Running MAGeCK RRA only
+### MAGeCK RRA
 
 MAGeCK RRA performs robust ranking aggregation to identify genes that are consistently ranked highly across multiple replicate screens. To run MAGeCK RRA, you can define the contrasts as previously stated in the last section with --contrasts your_file.txt(with a `.txt` extension) and also specify `--rra`.
-MAGeCK RRA performs robust ranking aggregation to identify genes that are consistently ranked highly across multiple replicate screens. To run MAGeCK RRA, you can define the contrasts as previously stated in the last section with `--contrasts your_file.txt` (with a `.txt` extension) and also specify `--rra`.
 
 ### Running MAGeCK MLE only
 
-#### With design matrices
+#### With your own design matrices
 
 If you wish to run MAGeCK MLE only, you can specify several design matrices (where you state which comparisons you wish to run) with the flag `--mle_design_matrix`.
 MAGeCK MLE uses a maximum likelihood estimation approach to estimate the effects of gene knockout on cell fitness. It models the read count data of guide RNAs targeting each gene and estimates the dropout probability for each gene.
@@ -106,7 +113,11 @@ If there are several designs to be run, you can input a folder containing all th
 
 This label is not mandatory as in case you are running time series. If you wish to run MAGeCK MLE with the day0 label you can do so by specifying `--day0_label` and the sample names that should be used as day0. The contrast will then be automatically adjusted for the other days.
 
-### MAGECKFlute
+#### With the contrast file
+
+To run MAGeCK MLE, you can define the contrasts as described in the previous section with `--contrasts your_file.txt` and also specify `--mle`.
+
+### MAGeCKFlute
 
 The downstream analysis involves distinguishing essential, non-essential, and target-associated genes. Additionally, it encompasses conducting biological functional category analysis and pathway enrichment analysis for these genes. Furthermore, it provides visualization of genes within pathways, enhancing user exploration of screening data. MAGECKFlute is run automatically after MAGeCK MLE and for each MLE design matrice. If you have used the `--day0_label`, MAGeCKFlute will be ran on all the other conditions. Please note that the DepMap data is used for these plots.
 
@@ -117,11 +128,11 @@ You can add the parameter `--mle_control_sgrna` followed by your file (one non t
 ### Running BAGEL2
 
 BAGEL2 (Bayesian Analysis of Gene Essentiality with Location) is a computational tool developed by the Hart Lab at Harvard University. It is designed for analyzing large-scale genetic screens, particularly CRISPR-Cas9 screens, to identify genes that are essential for the survival or growth of cells under different conditions. BAGEL2 integrates information about the location of guide RNAs within a gene and leverages this information to improve the accuracy of gene essentiality predictions.
-BAGEL2 uses the same contrasts from `--contrasts`.
+BAGEL2 uses the same contrasts from `--contrasts` and is run with the extra parameter `--bagel2`.
 
 ### Running drugZ
 
-[DrugZ](https://github.com/hart-lab/drugz) detects synergistic and suppressor drug-gene interactions in CRISPR screens. DrugZ is an open-source Python software for the analysis of genome-scale drug modifier screens. The software accurately identifies genetic perturbations that enhance or suppress drug activity. To run drugZ, you can specify `--drugz` followed a contrast file with the mandatory headers "treatment" and "reference". These two columns should be separated with a dot comma (;) and contain the `csv` extension. You can also integrate several samples/conditions by comma separating them in each column.
+[DrugZ](https://github.com/hart-lab/drugz) detects synergistic and suppressor drug-gene interactions in CRISPR screens. DrugZ is open-source Python software for the analysis of genome-scale drug modifier screens. The software identifies genetic perturbations that enhance or suppress drug activity. To run drugZ, specify `--drugz` together with the contrast file passed to `--contrasts`. The "reference" and "treatment" columns of the contrast file should be separated by a semicolon (;), and the file should have the `csv` extension. You can also combine several samples/conditions in one column by separating them with commas.
 
 | reference         | treatment             |
 | ----------------- | --------------------- |

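As a reading aid for the contrast-file format described in the docs change above, here is a minimal Python sketch. It assumes a semicolon-separated table with the mandatory `reference` and `treatment` headers and comma-separated sample names inside each column. The function name `parse_contrasts` and the sample names are hypothetical and are not part of nf-core/crisprseq; the pipeline itself reads the file with Nextflow's `splitCsv`.

```python
import csv
import io


def parse_contrasts(text):
    """Parse a semicolon-separated contrast table into a list of dicts.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    # Both headers are mandatory according to the usage docs.
    missing = {"reference", "treatment"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing mandatory header(s): {sorted(missing)}")
    contrasts = []
    for row in reader:
        # Several samples/conditions may be listed in one column, comma-separated.
        contrasts.append({
            "reference": [s.strip() for s in row["reference"].split(",")],
            "treatment": [s.strip() for s in row["treatment"].split(",")],
        })
    return contrasts


# Made-up sample names, used only to exercise the parser.
example = (
    "reference;treatment\n"
    "control1,control2;treated1,treated2\n"
    "control1;treated1\n"
)
parsed = parse_contrasts(example)
print(parsed)
```

Each returned entry corresponds to one comparison, with the reference and treatment samples kept as separate lists.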
diff --git a/nextflow.config b/nextflow.config
@@ -27,6 +27,9 @@ params {
     min_reads                  = 30
     min_targeted_genes         = 3
     rra                        = false
+    mle                        = false
+    drugz                      = false
+    bagel2                     = false
     bagel_reference_essentials =    'https://raw.githubusercontent.com/hart-lab/bagel/master/CEGv2.txt'
     bagel_reference_nonessentials = 'https://raw.githubusercontent.com/hart-lab/bagel/master/NEGv1.txt'
     drugz                      = null
@@ -232,7 +235,6 @@ singularity.registry = 'quay.io'
 // Nextflow plugins
 plugins {
     id '[email protected]' // Validation of pipeline parameters and creation of an input channel from a sample sheet
-    id '[email protected]'
 }
 
 // Load igenomes.config if required

diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -202,10 +202,23 @@
                     "description": "Comma-separated file with the conditions to be compared. The first one will be the reference (control)",
                     "fa_icon": "fas fa-adjust"
                 },
+                "mle": {
+                    "type": "boolean",
+                    "description": "Parameter specifying MAGeCK MLE should be run"
+                },
                 "rra": {
                     "type": "boolean",
                     "description": "Parameter indicating if MAGeCK RRA should be ran instead of MAGeCK MLE."
                 },
+                "bagel2": {
+                    "type": "boolean",
+                    "description": "Parameter indicating if BAGEL2 should be run"
+                },
+                "drugz": {
+                    "type": "boolean",
+                    "format": "file-path",
+                    "description": "Parameter indicating if DrugZ should be run"
+                },
                 "count_table": {
                     "type": "string",
                     "format": "file-path",
@@ -238,11 +251,6 @@
                     "description": "Non essential gene set  for BAGEL2",
                     "default": "https://raw.githubusercontent.com/hart-lab/bagel/master/NEGv1.txt"
                 },
-                "drugz": {
-                    "type": "string",
-                    "format": "file-path",
-                    "description": "Specifies drugz to be run and your contrast file on which comparisons should be done"
-                },
                 "drugz_remove_genes": {
                     "type": "string",
                     "description": "Essential genes to remove from the drugZ modules",

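The workflow diff that follows gates each analysis module on its own boolean parameter. The sketch below restates those conditions in Python as a reading aid, under the assumption that only the conditions visible in the diff matter: it ignores MAGeCK RRA (whose gating is not shown) and any enclosing blocks outside the displayed hunks. The function name `modules_to_run` and the step labels are hypothetical.

```python
def modules_to_run(params):
    """Return the analysis steps implied by a dict of pipeline parameters.

    Mirrors only the `if` conditions visible in the workflow diff;
    hypothetical helper, not part of the pipeline.
    """
    def on(key):
        # Treat missing, null, false and empty values as "not set".
        return bool(params.get(key))

    steps = set()
    if on("bagel2"):
        steps.add("bagel2")
    # MLE runs with user design matrices, with a contrast file plus --mle,
    # or with a day0 label.
    if on("mle_design_matrix") or (on("contrasts") and on("mle")) or on("day0_label"):
        steps.add("mle")
    if on("drugz"):
        steps.add("drugz")
    # The Venn diagram joins BAGEL2 and MLE results, so it needs both flags.
    if on("mle") and on("bagel2"):
        steps.add("venndiagram")
    return steps


all_flags = modules_to_run(
    {"contrasts": "rra_contrasts.txt", "mle": True, "bagel2": True, "drugz": True}
)
print(sorted(all_flags))
```

One consequence of this gating is that a contrast file alone no longer triggers MAGeCK MLE or BAGEL2; the matching flag has to be set as well.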
diff --git a/workflows/crisprseq_screening.nf b/workflows/crisprseq_screening.nf
@@ -37,7 +37,6 @@ include { BOWTIE2_ALIGN                                } from '../modules/nf-cor
 include { INITIALISATION_CHANNEL_CREATION_SCREENING    } from '../subworkflows/local/utils_nfcore_crisprseq_pipeline'
 // Functions
 include { paramsSummaryMap                             } from 'plugin/nf-validation'
-include { gptPromptForText                             } from 'plugin/nf-gpt'
 include { paramsSummaryMultiqc                         } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { softwareVersionsToYAML                       } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { methodsDescriptionText                       } from '../subworkflows/local/utils_nfcore_crisprseq_pipeline'
@@ -245,53 +244,56 @@ workflow CRISPRSEQ_SCREENING {
     counts = ch_contrasts.combine(ch_counts)
 
 
+    if(params.bagel2) {
     //Define non essential and essential genes channels for bagel2
-    ch_bagel_reference_essentials= Channel.fromPath(params.bagel_reference_essentials).first()
-    ch_bagel_reference_nonessentials= Channel.fromPath(params.bagel_reference_nonessentials).first()
+        ch_bagel_reference_essentials= Channel.fromPath(params.bagel_reference_essentials).first()
+        ch_bagel_reference_nonessentials= Channel.fromPath(params.bagel_reference_nonessentials).first()
 
-    BAGEL2_FC (
-            counts
-        )
-    ch_versions = ch_versions.mix(BAGEL2_FC.out.versions)
+        BAGEL2_FC (
+                counts
+            )
+        ch_versions = ch_versions.mix(BAGEL2_FC.out.versions)
 
-    BAGEL2_BF (
-        BAGEL2_FC.out.foldchange,
-        ch_bagel_reference_essentials,
-        ch_bagel_reference_nonessentials
-    )
+        BAGEL2_BF (
+            BAGEL2_FC.out.foldchange,
+            ch_bagel_reference_essentials,
+            ch_bagel_reference_nonessentials
+        )
 
-    ch_versions = ch_versions.mix(BAGEL2_BF.out.versions)
+        ch_versions = ch_versions.mix(BAGEL2_BF.out.versions)
 
 
-    ch_bagel_pr = BAGEL2_BF.out.bf.combine(ch_bagel_reference_essentials)
+        ch_bagel_pr = BAGEL2_BF.out.bf.combine(ch_bagel_reference_essentials)
                                         .combine(ch_bagel_reference_nonessentials)
 
-    BAGEL2_PR (
-        ch_bagel_pr
-    )
-    ch_versions = ch_versions.mix(BAGEL2_PR.out.versions)
+        BAGEL2_PR (
+            ch_bagel_pr
+        )
+        ch_versions = ch_versions.mix(BAGEL2_PR.out.versions)
 
-    BAGEL2_GRAPH (
-        BAGEL2_PR.out.pr
-    )
+        BAGEL2_GRAPH (
+            BAGEL2_PR.out.pr
+        )
 
-    ch_versions = ch_versions.mix(BAGEL2_GRAPH.out.versions)
+        ch_versions = ch_versions.mix(BAGEL2_GRAPH.out.versions)
+        // Run hit selection on BAGEL2
+        if(params.hitselection) {
 
-    // Run hit selection on BAGEL2
-    if(params.hitselection) {
+            HITSELECTION_BAGEL2 (
+                BAGEL2_PR.out.pr,
+                INITIALISATION_CHANNEL_CREATION_SCREENING.out.biogrid,
+                INITIALISATION_CHANNEL_CREATION_SCREENING.out.hgnc,
+                params.hit_selection_iteration_nb
+            )
+            ch_versions = ch_versions.mix(HITSELECTION_BAGEL2.out.versions)
+        }
 
-        HITSELECTION_BAGEL2 (
-            BAGEL2_PR.out.pr,
-            INITIALISATION_CHANNEL_CREATION_SCREENING.out.biogrid,
-            INITIALISATION_CHANNEL_CREATION_SCREENING.out.hgnc,
-            params.hit_selection_iteration_nb
-        )
-        ch_versions = ch_versions.mix(HITSELECTION_BAGEL2.out.versions)
-    }
+        }
 
     }
 
-    if((params.mle_design_matrix) || (params.contrasts && !params.rra) || (params.day0_label)) {
+    // Run MLE
+    if((params.mle_design_matrix) || (params.contrasts && params.mle) || (params.day0_label)) {
         //if the user only wants to run mle through their own design matrices
         if(params.mle_design_matrix) {
             INITIALISATION_CHANNEL_CREATION_SCREENING.out.design.map {
@@ -306,7 +308,7 @@ workflow CRISPRSEQ_SCREENING {
         }
 
         //if the user specified a contrast file
-        if(params.contrasts) {
+        if(params.contrasts && params.mle) {
             MATRICESCREATION(ch_contrasts)
             ch_mle = MATRICESCREATION.out.design_matrix.combine(ch_counts)
             MAGECK_MLE (ch_mle, INITIALISATION_CHANNEL_CREATION_SCREENING.out.mle_control_sgrna)
@@ -318,15 +320,11 @@ workflow CRISPRSEQ_SCREENING {
                 INITIALISATION_CHANNEL_CREATION_SCREENING.out.hgnc,
                 params.hit_selection_iteration_nb)
 
-                ch_versions = ch_versions.mix(HITSELECTION_BAGEL2.out.versions)
+                ch_versions = ch_versions.mix(HITSELECTION_MLE.out.versions)
             }
 
             MAGECK_FLUTEMLE_CONTRASTS(MAGECK_MLE.out.gene_summary)
             ch_versions = ch_versions.mix(MAGECK_FLUTEMLE_CONTRASTS.out.versions)
-            ch_venndiagram = BAGEL2_PR.out.pr.join(MAGECK_MLE.out.gene_summary)
-            VENNDIAGRAM(ch_venndiagram)
-            ch_versions = ch_versions.mix(VENNDIAGRAM.out.versions)
-
         }
         if(params.day0_label) {
             ch_mle = Channel.of([id: "day0"]).merge(Channel.of([[]])).merge(ch_counts)
@@ -339,7 +337,7 @@ workflow CRISPRSEQ_SCREENING {
 
     // Launch module drugZ
     if(params.drugz) {
-        Channel.fromPath(params.drugz)
+        Channel.fromPath(params.contrasts)
                 .splitCsv(header:true, sep:';' )
                 .set { ch_drugz }
 
@@ -355,30 +353,16 @@ workflow CRISPRSEQ_SCREENING {
                 INITIALISATION_CHANNEL_CREATION_SCREENING.out.hgnc,
                 params.hit_selection_iteration_nb)
 
-            ch_versions = ch_versions.mix(HITSELECTION_BAGEL2.out.versions)
+            ch_versions = ch_versions.mix(HITSELECTION.out.versions)
         }
 
     }
 
-    //
-    // Parse genes from drugZ to Open AI api
-    //
-    gene_source = DRUGZ.out.per_gene_results.map { meta, genes -> genes}
-    def question = "Which of the following genes enhance or supress drug activity. Only write the gene names with yes or no respectively."
-    PREPARE_GPT_INPUT(
-        gene_source,
-        question
-    )
 
-    PREPARE_GPT_INPUT.out.query.map {
-        it -> it.text
+    if(params.mle && params.bagel2) {
+        ch_venndiagram = BAGEL2_PR.out.pr.join(MAGECK_MLE.out.gene_summary)
+        VENNDIAGRAM(ch_venndiagram)
     }
-    .collect()
-    .flatMap { it -> gptPromptForText(it[0]) }
-    .set { gpt_genes_output }
-
-    gpt_genes_output
-        .collectFile( name: 'gpt_important_genes.txt', newLine: true, sort: false )
 
     //
     // Collate and save software versions