hit selection module #191

Merged (22 commits) on Sep 4, 2024
3 changes: 3 additions & 0 deletions .editorconfig
@@ -31,3 +31,6 @@ indent_size = unset
# ignore python and markdown
[*.{py,md}]
indent_style = unset

[/assets/hgnc_complete_set.txt]
trim_trailing_whitespace = unset
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Add module to classify samples by clonality ([#178](https://github.com/nf-core/crisprseq/pull/178))
- Add DrugZ, a module for chemogenetic interaction ([#168](https://github.com/nf-core/crisprseq/pull/168))
- Add Hitselection, a module that selects a subset of more likely true positives in KO screens based on protein-protein interactions ([#191](https://github.com/nf-core/crisprseq/pull/191))

### Fixed

10 changes: 6 additions & 4 deletions README.md
@@ -67,10 +67,12 @@ For crispr screening:
- ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
3. Optional: CNV correction and normalization with ([`CRISPRcleanR`](https://github.com/francescojm/CRISPRcleanR))
4. Rank sgRNAs and genes ;
a. ([MAGeCK test](https://sourceforge.net/p/mageck/wiki/usage/#test))
b. ([MAGeCK mle](https://sourceforge.net/p/mageck/wiki/Home/#mle))
c. ([BAGEL2](https://github.com/hart-lab/bagel))
5. Visualize analysis
- ([MAGeCK test](https://sourceforge.net/p/mageck/wiki/usage/#test))
- ([MAGeCK mle](https://sourceforge.net/p/mageck/wiki/Home/#mle))
- ([BAGEL2](https://github.com/hart-lab/bagel))
- ([DrugZ](https://github.com/hart-lab/drugz))
Member

Should this go under point 5?

Contributor Author

Actually, no: it's also a gene essentiality analysis, but only on drug-treated cell lines. It does something similar to MAGeCK MLE.

5. Optional: hit selection on KO screens, to select a subset of more likely true positives
6. Visualize analysis

## Usage

84,496 changes: 84,496 additions & 0 deletions assets/biogrid_hgncid_noduplicate_dropna.csv

Large diffs are not rendered by default.
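
The file name suggests a BioGRID protein-protein interaction list keyed by HGNC IDs, with duplicate and missing entries removed. Because the CSV's column layout is not shown in this diff, the following is only a minimal Python sketch of such a cleaning step, assuming the interactions have already been read as pairs of gene identifiers. `build_ppi_adjacency` is a hypothetical helper, not part of the pipeline, and skipping self-interactions is an extra assumption of the sketch.

```python
import math
from typing import Dict, Iterable, Set, Tuple, Union

Gene = Union[str, float, None]

# Tokens treated as "missing" (assumption: mirrors typical NA encodings).
MISSING_TOKENS = {"", "NA", "NaN", "nan"}


def _is_missing(value: Gene) -> bool:
    """Treat None, float NaN and common NA strings as missing identifiers."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() in MISSING_TOKENS


def build_ppi_adjacency(pairs: Iterable[Tuple[Gene, Gene]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from interaction pairs.

    Rows with a missing partner are dropped ("dropna"); A-B and B-A collapse
    into one edge because neighbours are stored in sets ("noduplicate");
    self-interactions are skipped (an assumption of this sketch).
    """
    adjacency: Dict[str, Set[str]] = {}
    for a, b in pairs:
        if _is_missing(a) or _is_missing(b):
            continue
        a, b = str(a).strip(), str(b).strip()
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency
```

Storing neighbours as sets makes deduplication implicit and gives constant-time membership checks when counting edges inside a gene subset.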

43,843 changes: 43,843 additions & 0 deletions assets/hgnc_complete_set.txt

Large diffs are not rendered by default.
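
`hgnc_complete_set.txt` appears to be the HGNC complete set, a tab-separated table linking approved gene symbols to other identifiers, including Entrez IDs. Since Hitselection requires library genes to map to human Entrez IDs, a pre-check of the library can be useful. Below is a minimal Python sketch, assuming the file has a header row with `symbol` and `entrez_id` columns; `map_symbols_to_entrez` is a hypothetical helper, not part of the pipeline.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def map_symbols_to_entrez(
    hgnc_lines: Iterable[str], symbols: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split library gene symbols into those with and without an Entrez ID.

    Assumes a tab-separated HGNC table whose header contains `symbol` and
    `entrez_id`; rows with an empty `entrez_id` count as unmapped.
    """
    # QUOTE_NONE stops stray quote characters in unrelated columns from
    # being treated as field quoting.
    reader = csv.DictReader(hgnc_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    symbol_to_entrez: Dict[str, str] = {}
    for row in reader:
        symbol = (row.get("symbol") or "").strip()
        entrez = (row.get("entrez_id") or "").strip()
        if symbol and entrez:
            symbol_to_entrez[symbol] = entrez

    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for symbol in symbols:
        if symbol in symbol_to_entrez:
            mapped[symbol] = symbol_to_entrez[symbol]
        else:
            unmapped.append(symbol)
    return mapped, unmapped
```

For example, `with open("assets/hgnc_complete_set.txt", newline="") as fh: mapped, unmapped = map_symbols_to_entrez(fh, library_genes)`, where `library_genes` is a hypothetical list of symbols from your library file; a long `unmapped` list would hint that Hitselection will see few of your genes.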

38 changes: 38 additions & 0 deletions conf/modules.config
@@ -144,6 +144,44 @@ process {
]
}

withName: HITSELECTION {
containerOptions = ''
publishDir = [
path: { "${params.outdir}/hitselection/drugz/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: HITSELECTION_MLE {
containerOptions = ''
publishDir = [
path: { "${params.outdir}/hitselection/mle/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}


withName: HITSELECTION_BAGEL2 {
containerOptions = ''
publishDir = [
path: { "${params.outdir}/hitselection/bagel2/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: HITSELECTION_RRA {
containerOptions = ''
publishDir = [
path: { "${params.outdir}/hitselection/rra/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}


withName: VENNDIAGRAM {
publishDir = [
path: { "${params.outdir}/venndiagram/${meta.treatment}_vs_${meta.reference}/" },
14 changes: 8 additions & 6 deletions conf/test_screening.config
@@ -20,12 +20,14 @@ params {
max_time = '6.h'

// Input data
input = params.pipelines_testdata_base_path + "crisprseq/testdata/samplesheet_test.csv"
analysis = 'screening'
crisprcleanr = "Brunello_Library"
library = params.pipelines_testdata_base_path + "crisprseq/testdata/brunello_target_sequence.txt"
contrasts = params.pipelines_testdata_base_path + "crisprseq/testdata/rra_contrasts.txt"
drugz = params.pipelines_testdata_base_path + "crisprseq/testdata/rra_contrasts.txt"
hit_selection_iteration_nb = 150
hitselection = true
}

process {
2 changes: 2 additions & 0 deletions conf/test_screening_rra.config
@@ -26,6 +26,8 @@ params {
library = params.pipelines_testdata_base_path + "crisprseq/testdata/brunello_target_sequence.txt"
contrasts = params.pipelines_testdata_base_path + "crisprseq/testdata/rra_contrasts.txt"
rra = true
hitselection = true
hit_selection_iteration_nb = 150
}

process {
6 changes: 6 additions & 0 deletions docs/output/screening.md
@@ -190,6 +190,12 @@ For further reading and documentation see the [cutadapt helper page](https://cut
- `*.txt`: Pathway view for top enriched pathways.
- `*.png`: Pathway view for top enriched pathways.

### HitSelection

- `hitselection/{bagel2,drugz,mle,rra}/`
  - `*.png`: -logP value vs. gene rank plot, used to determine the rank threshold.
  - `*.txt`: Table of ranked gene symbols and their -logP values.

## MultiQC

<details markdown="1">
12 changes: 12 additions & 0 deletions docs/usage/screening.md
@@ -134,6 +134,18 @@ The contrast from reference to treatment should be ; separated

If you wish to remove specific genes before the drugZ analysis, you can use the `--drugz_remove_genes` option followed by a comma-separated list of genes.

### Running Hitselection

Hitselection provides a rank threshold and a set of genes that are more likely to be true positives, by identifying the most interconnected subnetworks within the ranked gene list. For now, this module is only developed for KO screens on human data mapped to Entrez IDs.

Hitselection is an R script that identifies rank thresholds for CRISPR screen results based on the connectivity of subgraphs of protein-protein interaction (PPI) networks. It is an implementation of RNAiCut (Kaplow et al., 2009), a method for estimating thresholds in RNAi data. The principle behind Hitselection is that true positive hits are densely connected in PPI networks. Moving down the ranked gene list one gene at a time, the script compares the interconnectivity of the real subnetwork with that of degree-matched random subnetworks, running a simulation based on the Poisson distribution to compute a -logP value at each rank. Node degree is used as the interconnectivity metric.
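
To make the procedure concrete, here is a minimal Python sketch of the RNAiCut-style idea described above. It is not the pipeline's R script, and all function names are hypothetical. Assumptions: random subnetworks are matched on exact node degree; the Poisson mean is estimated as the average edge count over `n_iter` degree-matched random draws (the test profiles set `hit_selection_iteration_nb = 150`, which is assumed, not confirmed, to play a similar role); and genes absent from the PPI network are ignored.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    if observed <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i: int) -> float:
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if observed <= lam:
        # The upper tail is not tiny here, so 1 - CDF is numerically safe.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(observed)))
    # Past the mean the terms shrink monotonically; sum until they vanish.
    total, i = 0.0, observed
    while True:
        term = pmf(i)
        total += term
        if term <= 1e-17 * total:
            break
        i += 1
    return min(total, 1.0)


def count_edges(nodes: Sequence[str], adjacency: Dict[str, Set[str]]) -> int:
    """Number of PPI edges among `nodes` (assumes no self-loops)."""
    node_set = set(nodes)
    # Each undirected edge is counted once from each endpoint.
    return sum(len(adjacency[n] & node_set) for n in node_set) // 2


def hit_selection_scores(
    ranked_genes: Sequence[str],
    adjacency: Dict[str, Set[str]],
    n_iter: int = 150,
    seed: int = 0,
) -> List[float]:
    """-log10 P of top-k connectivity versus degree-matched random subnetworks."""
    rng = random.Random(seed)
    nodes_by_degree: Dict[int, List[str]] = defaultdict(list)
    for node, neighbours in adjacency.items():
        nodes_by_degree[len(neighbours)].append(node)

    top: List[str] = []
    in_top: Set[str] = set()
    degree_counts: Dict[int, int] = defaultdict(int)
    scores: List[float] = []
    for gene in ranked_genes:
        # Only new genes present in the PPI network extend the subnetwork.
        if gene in adjacency and gene not in in_top:
            top.append(gene)
            in_top.add(gene)
            degree_counts[len(adjacency[gene])] += 1
        observed = count_edges(top, adjacency)

        # Expected edges: average over random node sets with the same degree profile.
        random_total = 0
        for _ in range(n_iter):
            sample: List[str] = []
            for degree, count in degree_counts.items():
                sample.extend(rng.sample(nodes_by_degree[degree], count))
            random_total += count_edges(sample, adjacency)
        lam = random_total / n_iter

        p_value = poisson_upper_tail(observed, lam)
        scores.append(-math.log10(max(p_value, 1e-300)))
    return scores
```

Matching on degree matters because hub proteins connect to many genes by chance; comparing against random sets with the same degree profile keeps highly studied hubs from inflating the score. The nested loops cost roughly O(ranks × n_iter × k), which is consistent with the warning below that the module is time-intensive.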

To run Hitselection, specify `--hitselection`; it then runs automatically on the results of each gene essentiality algorithm you have chosen. The outputs are a PNG file containing the -logP value vs. gene rank plot and a TXT file containing all the -logP values, edge and average edge values, and the ranked gene symbols.
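
The -logP vs. gene rank plot is meant for choosing the rank threshold. As one illustrative heuristic (an assumption here, not necessarily how the module or RNAiCut fixes the cutoff), the threshold can be placed at the rank where -logP peaks, keeping the genes ranked above it. A minimal Python sketch follows; `pick_rank_threshold` is a hypothetical helper that takes the ranked gene symbols and their -logP values, since the exact column names of the TXT output are not shown here.

```python
from typing import List, Sequence, Tuple


def pick_rank_threshold(
    genes: Sequence[str], neg_log_p: Sequence[float]
) -> Tuple[int, List[str]]:
    """Return the 1-based rank with the highest -logP and the genes up to it.

    Ties resolve to the smallest rank, because max() keeps the first maximum.
    """
    if not genes or len(genes) != len(neg_log_p):
        raise ValueError("genes and neg_log_p must be non-empty and equally long")
    best_index = max(range(len(neg_log_p)), key=lambda i: neg_log_p[i])
    return best_index + 1, list(genes[: best_index + 1])
```

For example, `pick_rank_threshold(["A", "B", "C", "D"], [0.5, 2.0, 3.1, 1.2])` returns `(3, ["A", "B", "C"])`. In practice it is worth checking the plot as well, since a noisy curve can have a sharp early spike that a single maximum would pick up.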

## :warning: The Hitselection algorithm is currently developed only for KO screens and requires the library to map to genes with a Homo sapiens Entrez ID.

## :warning: Please be advised that the Hitselection algorithm is time-intensive and will make the pipeline run longer.
Comment on lines +145 to +147
Member

To show a formatted warning box on the website, the notation is the following:

> [!WARNING]
> The Hitselection algorithm is currently developed only for KO screens and requires the library to map to genes with a Homo sapiens Entrez ID.

> [!WARNING]
> Please be advised that the Hitselection algorithm is time-intensive and will make the pipeline run longer.


Note that the pipeline will create the following files in your working directory:

```bash