Add --control-sgrna parameter for mageck count #63

Closed
zhouzhendiao opened this issue Jul 31, 2023 · 5 comments
Labels
enhancement Improvement for existing functionality

Comments

@zhouzhendiao

Description of feature

I have some non-targeting sgRNAs in my library.

mageck count provides the --control-sgrna parameter for generating the null distribution.

What does the --control-sgrna CONTROL_SGRNA option do? How to use this option?
A: This option tells MAGeCK to use provided negative control sgRNAs to generate the null distribution when calculating the p values. If this option is not specified, MAGeCK generates the null distribution of RRA scores by assuming all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve this if you know some genes are not essential. By providing the corresponding sgRNA IDs in the --control-sgrna option, MAGeCK will have a better estimation of p values.

Could you kindly add this parameter? Thanks!

@zhouzhendiao zhouzhendiao added the enhancement Improvement for existing functionality label Jul 31, 2023
@LaurenceKuhl
Contributor

Hi @zhouzhendiao !
This is already possible with a user.config profile :) Could you please create a config file such as the following:

process {
  withName:MAGECK_MLE {
    ext.args = '--control-sgrna "your-config-file" '
  }
}

and then specify -c user.config on the command line.

Let me know how it goes :)
best,
Laurence

@zhouzhendiao
Author

Hi @LaurenceKuhl ,

I will try this later. Thanks for replying!

@LaurenceKuhl
Contributor

Hi, I will close this issue. Please feel free to reopen it if anything is unclear.

@jeremymsimon

Hi @LaurenceKuhl - my understanding is that these sorts of extra pipeline-specific parameters are better suited to the -params-file than to the -c config.yml specification. Would it be possible to implement something where we would specify the above as:

extra_mageck_mle_args: >-
  --control-sgrna "gRNA_CONTROL_IDs.tsv"

or similar within a supplied -params-file?

@jeremymsimon

Note that when trying the above as

process { 
  withName: 'MAGECK_MLE' { 
    ext.args =  '--control-sgrna "gRNA_annotated_geneSymbol_withID_CONTROLS.tsv" ' 
  } 
}

where my .tsv contains a list of control gRNAs like

sgRNA56555
sgRNA56556
sgRNA56557
sgRNA56558
sgRNA56559
sgRNA56560
sgRNA56561
sgRNA56562
sgRNA56563
sgRNA56564

I get a FileNotFoundError:

Caused by:
  Process `NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE (treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3)` terminated with an error exit status (1)

Command executed:

  mageck \
      mle \
      --control-sgrna "gRNA_annotated_geneSymbol_withID_CONTROLS.tsv"  \
      --threads 6 \
      -k count_table.count.txt \
      -n treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3     \
      -d treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3.txt


  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE":
      mageck: $(mageck -v)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO  @ Wed, 24 Jul 2024 12:58:58: Parameters: /usr/local/bin/mageck mle --control-sgrna gRNA_annotated_geneSymbol_withID_CONTROLS.tsv --threads 6 -k count_table.count.txt -n treated_7hr_1,treated_7hr_2,treated_7hr_3_vs_control_1,control_2,control_3 -d treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3.txt
  INFO  @ Wed, 24 Jul 2024 12:58:59: Cannot parse design matrix as a string; try to parse it as a file name ...
  INFO  @ Wed, 24 Jul 2024 12:58:59: Design matrix:
  INFO  @ Wed, 24 Jul 2024 12:58:59: [[1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 0.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]
  INFO  @ Wed, 24 Jul 2024 12:58:59:  [1. 1.]]
  INFO  @ Wed, 24 Jul 2024 12:58:59: Beta labels:baseline,treated_7hr_1_treated_7hr_2_treated_7hr_3_vs_control_1_control_2_control_3
  INFO  @ Wed, 24 Jul 2024 12:58:59: Included samples:control_1,control_2,control_3,treated_7hr_1,treated_7hr_2,treated_7hr_3
  INFO  @ Wed, 24 Jul 2024 12:59:00: Loaded samples:control_1;control_2;control_3;treated_7hr_1;treated_7hr_2;treated_7hr_3
  INFO  @ Wed, 24 Jul 2024 12:59:00: Sample index: 6;7;8;3;4;5
  INFO  @ Wed, 24 Jul 2024 12:59:00: Loaded 18899 genes.
  Traceback (most recent call last):
    File "/usr/local/bin/mageck", line 66, in <module>
      main();
    File "/usr/local/bin/mageck", line 43, in main
      args=crisprseq_parseargs();
    File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
      mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
    File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 83, in mageckmle_main
      mageckcount_checkcontrolsgrna(args,sgrna2genelist)
    File "/usr/local/lib/python3.9/site-packages/mageck/mageckCount.py", line 457, in mageckcount_checkcontrolsgrna
      controlsglist=[line.strip() for line in open(args.control_sgrna)]
  FileNotFoundError: [Errno 2] No such file or directory: 'gRNA_annotated_geneSymbol_withID_CONTROLS.tsv'
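
Judging only from the traceback above, MAGeCK opens the --control-sgrna value exactly as given, so a bare file name is resolved against the directory the process runs in. Nextflow runs each task in its own work directory, and a file that is only named inside the ext.args string is not staged there, which would explain the error. The sketch below illustrates that behaviour under those assumptions; `load_control_sgrnas` and its `workdir` argument are hypothetical names for illustration, not part of MAGeCK or the pipeline.

```python
import tempfile
from pathlib import Path


def load_control_sgrnas(path_str, workdir=None):
    """Hypothetical helper: read one sgRNA ID per line.

    A relative path is resolved against `workdir` (default: the current
    directory), mimicking how open() behaves inside a task directory.
    """
    base = Path(workdir) if workdir is not None else Path.cwd()
    path = Path(path_str)
    resolved = path if path.is_absolute() else base / path
    if not resolved.is_file():
        raise FileNotFoundError(f"control sgRNA file not found: {resolved}")
    with open(resolved) as handle:
        # Skip blank lines so a trailing newline does not add an empty ID.
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as launch_dir, \
            tempfile.TemporaryDirectory() as task_dir:
        control_file = Path(launch_dir) / "controls.tsv"
        control_file.write_text("sgRNA56555\nsgRNA56556\n")

        # A bare file name is looked up in the task directory and is missing.
        try:
            load_control_sgrnas("controls.tsv", workdir=task_dir)
        except FileNotFoundError as err:
            print("relative path fails:", err)

        # An absolute path does not depend on the task directory.
        print(load_control_sgrnas(str(control_file.resolve()), workdir=task_dir))
```

If this is indeed the cause, giving the absolute path of the .tsv in ext.args may avoid the error. This has not been tested against the pipeline, and whether that path is visible inside a container depends on how the container mounts are set up.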
