Support non pgPEN libraries: Normalization, Documentation, Bug Fixes #92

Open
marissafujimoto opened this issue Feb 12, 2025 · 2 comments

@marissafujimoto
Collaborator

Is your feature request related to a problem? Please describe.
Right now gimap is focused on analyzing genetic interactions using the pgPEN library. Users should be able to bring their own library specification, with its own guides and control setups, and still be able to calculate genetic interactions.

Describe the solution you'd like
The solution might have three components:

  1. Parameterize or break up the gimap_normalize function so that users can opt into individual normalizations. This is motivated by the fact that some libraries were not designed to include certain types of controls, such as double negative controls, and normalization is currently a required step in gimap.
  2. Fix bugs related to using non-pgPEN library annotations. The gimap functions currently let users specify a non-pgPEN library, but this functionality appears to be broken and untested.
  3. Generalize the documentation to support non-pgPEN use cases. There are currently two big issues with the documentation for this use case. First, there is no clear guide on how users should format their library annotation files. While the pgPEN library annotations can be downloaded from figshare and manually inspected, this does not make clear which fields are necessary and in what format. Second, the way controls are described in the documentation is pgPEN specific, and gimap should have a broader definition of controls. For example, some libraries use safe-targeting guides instead of non-targeting guides as controls. Others use non-essential genes from separate paralog groups as controls, or specify the controls in a cell-line-specific manner using computational analysis.
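To make the broader definition of controls concrete, here is a minimal sketch of how an annotation row could be classified from a per-guide role. It is written in Python only for illustration. The field names (`construct_id`, `guide1_role`, `guide2_role`), the role labels, and both functions are hypothetical and are not part of gimap or the pgPEN annotation files.

```python
# Hypothetical guide roles. Several kinds of guide can act as a negative control.
NEGATIVE_ROLES = {"non_targeting", "safe_targeting", "non_essential"}
ALL_ROLES = NEGATIVE_ROLES | {"target", "essential"}

# Hypothetical minimal set of annotation fields.
REQUIRED_FIELDS = ("construct_id", "guide1_role", "guide2_role")


def classify_construct(row):
    """Return the construct category implied by the two guide roles."""
    missing = [f for f in REQUIRED_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing annotation fields: {missing}")
    roles = (row["guide1_role"], row["guide2_role"])
    unknown = [r for r in roles if r not in ALL_ROLES]
    if unknown:
        raise ValueError(f"unknown guide roles: {unknown}")
    if "essential" in roles:
        return "positive_control"
    n_negative = sum(r in NEGATIVE_ROLES for r in roles)
    if n_negative == 2:
        return "double_negative_control"
    if n_negative == 1:
        return "single_knockout"
    return "double_knockout"


def available_controls(rows):
    """Report which control categories a library actually contains."""
    kinds = {classify_construct(r) for r in rows}
    return {
        "negative": "double_negative_control" in kinds,
        "positive": "positive_control" in kinds,
    }


# A library with no double negative controls, as described for Dede et al. 2020.
library = [
    {"construct_id": "c1", "guide1_role": "target", "guide2_role": "target"},
    {"construct_id": "c2", "guide1_role": "target", "guide2_role": "non_essential"},
    {"construct_id": "c3", "guide1_role": "essential", "guide2_role": "non_essential"},
]
print(available_controls(library))
```

A summary like this could be used to decide which normalization steps a given library can support before any of them are run.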

Describe alternatives you've considered
Without this feature, users would have to calculate GI scores with custom scripts or write custom versions of gimap functions (particularly normalization).

Additional context
The Dede et al. 2020 library design specifies its own set of non-essential genes to use as controls when paired with the paralog guides. It also includes its own set of essential genes to use as positive controls. It does not contain any double negative guide controls.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02173-2#Sec4:~:text=and%20BRCA1/PARP1.-,Library%20design,-We%20selected%20Cas12a

This paper also describes their normalization and genetic interaction calculations. One notable step is adding a pseudocount of 5 to the count for each paired guide and then normalizing the reads so that each sample averages 500 reads per guide. The authors were also interested in the correlation between guide orientations (guide for gene 1 first vs. second), and they calculated a Pearson correlation for this as a quality control metric. They also perform a Z-transformation after removing the top 2.5% and bottom 2.5% of results. Other than this, their calculations are similar to gimap's approach of calculating genetic interaction as the observed dLFC minus the sum of the single knockout dLFCs.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02173-2#Sec4:~:text=Output%20reagents%20(Illumina).-,Screen%20analysis,-Construct%20sequences%20were
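The steps described above can be sketched roughly as follows. This is an illustration in Python using only the standard library, not the authors' code and not gimap's implementation. All function names are hypothetical. The trimmed Z-transformation assumes the top and bottom 2.5% are only excluded when estimating the mean and standard deviation, which is one possible reading of the description.

```python
import math
import statistics


def normalize_counts(counts, pseudocount=5, target_mean=500):
    """Add a pseudocount, then rescale so the sample averages target_mean reads per guide."""
    padded = [c + pseudocount for c in counts]
    scale = target_mean / statistics.fmean(padded)
    return [p * scale for p in padded]


def log2_fold_change(end_counts, start_counts):
    """Per-construct log2 fold change between two normalized samples."""
    return [math.log2(e / s) for e, s in zip(end_counts, start_counts)]


def gi_score(observed_double, single_a, single_b):
    """Observed double-knockout dLFC minus the sum of the single-knockout dLFCs."""
    return observed_double - (single_a + single_b)


def trimmed_z(values, trim=0.025):
    """Z-transform using mean and SD estimated after dropping each tail (assumption)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k]
    mu = statistics.fmean(core)
    sd = statistics.stdev(core)
    return [(v - mu) / sd for v in values]


start = normalize_counts([120, 480, 900, 300])
end = normalize_counts([60, 500, 1000, 150])
print(log2_fold_change(end, start))
print(gi_score(observed_double=-2.0, single_a=-0.5, single_b=-0.5))
```

The orientation check the authors describe would be a Pearson correlation between the scores of the two guide orders, which `statistics.correlation` (Python 3.10+) can compute.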

Ito et al. 2021 selected positive control guides through analysis of CCLE and DepMap data. The negative controls were selected in a cell-line-specific manner by choosing pairs of non-expressed, non-synergistic genes using DepMap and RNA-seq data. Single knockout controls were constructed by pairing each guide with a preset guide, sgAAVS1.
https://www.nature.com/articles/s41588-021-00967-z#Sec13:~:text=Positive%20and%20negative%20paralog%20pair%20controls

The same paper doesn't calculate genetic interaction scores, but rather uses a variational Bayesian approach to estimate gene-level and gene-pair interactions, and then uses a Gaussian mixture model to calculate p-values assuming a null distribution from the known "non-synergistic" pairs.
https://www.nature.com/articles/s41588-021-00967-z#Sec13:~:text=Calculation%20of%20LFC%2C%20synergy%20and%20FDR

@cansavvy
Collaborator

I did some initial testing of what the data looks like if we don't use negative controls, but it isn't complete. I think it would be good to see a comparison between the following:

  • Same data with the full normalization adjustment method
  • Same data without the negative controls, so LFC adjusted with the positive controls only
  • Same data without LFC adjustment, using only the raw log2FC
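As a rough sketch of how the three variants could be produced from one set of log2FC values (Python, standard library only): the function `adjust_lfc` and its arguments are hypothetical, and the adjustment shown (shift the negative-control median to 0, then scale the positive-control median to -1) is an assumption about what the full adjustment looks like, not a copy of gimap_normalize.

```python
import statistics


def adjust_lfc(lfc, pos=None, neg=None):
    """Adjust log2FC values using whichever control sets are supplied.

    pos / neg are lists of indices of positive / negative control constructs.
    Passing None skips the corresponding step.
    """
    # Step 1 (optional): center on the negative-control median.
    center = statistics.median(lfc[i] for i in neg) if neg else 0.0
    shifted = [v - center for v in lfc]
    # Step 2 (optional): scale so the positive-control median becomes -1.
    if pos:
        scale = abs(statistics.median(shifted[i] for i in pos))
        shifted = [v / scale for v in shifted]
    return shifted


lfc = [0.2, 0.4, -1.8, -2.2, -0.6]
neg_idx = [0, 1]  # negative-control constructs
pos_idx = [2, 3]  # positive-control constructs

full = adjust_lfc(lfc, pos=pos_idx, neg=neg_idx)  # full adjustment
pos_only = adjust_lfc(lfc, pos=pos_idx)           # no negative controls
raw = adjust_lfc(lfc)                             # raw log2FC, unchanged

for name, values in [("full", full), ("pos_only", pos_only), ("raw", raw)]:
    print(name, [round(v, 3) for v in values])
```

Keeping each step optional in one function makes the three outputs directly comparable, since they differ only in which control sets were passed in.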

I'm going to run this and take a look at how the final data and plots differ and have it ready for Monday.

@cansavvy
Collaborator

Calculation comparisons here: https://www.biorxiv.org/content/10.1101/2024.08.19.608665v1.full.pdf
