Support non pgPEN libraries: Normalization, Documentation, Bug Fixes #92

marissafujimoto · 2025-02-12T19:49:44Z

Is your feature request related to a problem? Please describe.
Right now gimap is focused on analyzing genetic interactions using the pgPEN library. Users should be able to bring their own library specification, with its own guides and control setup, and still be able to calculate genetic interactions.

Describe the solution you'd like
The solution might have three components:

1. Parameterize or break up the gimap_normalize function so that users can opt into individual normalization steps. Some libraries were not designed to include certain types of controls, such as double negative controls, yet normalization is currently a required step in gimap.
2. Fix bugs related to using non-pgPEN library annotations. The gimap functions already let users specify a non-pgPEN library, but this functionality appears to be broken and is untested.
3. Generalize the documentation to support non-pgPEN use cases. The current documentation has two major gaps for this use case. First, there is no clear guide on how users should format their library annotation files. The pgPEN library annotations can be downloaded from figshare and inspected manually, but that does not make clear which fields are required or in what format. Second, the documentation describes controls in pgPEN-specific terms, and gimap should adopt a broader definition of controls. For example, some libraries use safe-targeting guides instead of non-targeting guides as controls. Others use non-essential genes from separate paralog groups as controls, or define controls per cell line through computational analysis.
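One way to make controls library-agnostic is to separate the label a guide carries in an annotation file from the role it plays in normalization. The sketch below is purely illustrative: ControlRole, classify_pair, available_pair_types, and the label names are hypothetical and are not part of gimap. It maps each library's own control labels onto generic roles and reports which pair types a library actually contains, which is the kind of check that could let normalization steps become opt-in.

```python
from enum import Enum


class ControlRole(Enum):
    """Generic roles a guide can play, independent of any one library's labels."""
    NEGATIVE = "negative"    # e.g. non-targeting, safe-targeting, AAVS1, non-essential gene
    POSITIVE = "positive"    # e.g. essential gene
    TARGETING = "targeting"  # an ordinary gene-targeting guide


# Hypothetical label-to-role maps; a real library annotation would supply these.
PGPEN_STYLE = {"nontargeting": ControlRole.NEGATIVE}
DEDE_STYLE = {"nonessential": ControlRole.NEGATIVE, "essential": ControlRole.POSITIVE}
AAVS1_STYLE = {"sgAAVS1": ControlRole.NEGATIVE}


def classify_pair(label1, label2, role_map):
    """Classify a paired-guide construct from the roles of its two guides.

    Labels missing from role_map are treated as ordinary targeting guides.
    """
    roles = [role_map.get(label, ControlRole.TARGETING) for label in (label1, label2)]
    if ControlRole.POSITIVE in roles:
        return "positive_control"
    n_negative = roles.count(ControlRole.NEGATIVE)
    if n_negative == 2:
        return "double_negative"
    if n_negative == 1:
        return "single_targeting"
    return "double_targeting"


def available_pair_types(pairs, role_map):
    """Return the set of pair types present, e.g. to decide which normalization steps can run."""
    return {classify_pair(a, b, role_map) for a, b in pairs}


# A Dede-style library has no double negative constructs, so a normalization
# step that needs them would be skipped rather than required.
dede_pairs = [("GENE_A", "nonessential"), ("GENE_A", "GENE_B"), ("essential", "nonessential")]
if "double_negative" not in available_pair_types(dede_pairs, DEDE_STYLE):
    print("No double negative controls: skip that normalization step")
```

Pair-level controls, such as the non-expressed, non-synergistic gene pairs that Ito et al. 2021 chose per cell line, cannot be expressed as per-guide roles and would need a separate pair-level annotation.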

Describe alternatives you've considered
Without this feature, users would have to calculate GI scores with custom scripts or write custom versions of gimap functions (particularly normalization).

Additional context
The Dede et al. 2020 library design specifies its own set of non-essential genes to use as controls when paired with the paralog guides. It also includes its own set of essential genes to use as positive controls. It does not contain any double negative control guides.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02173-2#Sec4:~:text=and%20BRCA1/PARP1.-,Library%20design,-We%20selected%20Cas12a

This paper also describes its normalization and genetic interaction calculations. In particular, the authors add a pseudocount of 5 to each paired-guide count and then normalize reads so that each sample averages 500 reads per guide. They were also interested in the correlation between guide orientations (the gene 1 guide first vs. second) and calculated a Pearson correlation for this as a quality control metric. They also perform a Z-transformation after removing the top 2.5% and bottom 2.5% of results. Otherwise, their calculations are similar to gimap's approach of calculating genetic interaction as the observed dLFC minus the sum of the single-knockout dLFCs.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02173-2#Sec4:~:text=Output%20reagents%20(Illumina).-,Screen%20analysis,-Construct%20sequences%20were

Ito et al. 2021 selected positive control guides through analysis of CCLE and DepMap data. Negative controls were selected in a cell-line-specific manner by choosing pairs of non-expressed, non-synergistic genes using DepMap and RNA-seq data. Single-knockout controls were constructed by pairing each guide with a preset guide, sgAAVS1.
https://www.nature.com/articles/s41588-021-00967-z#Sec13:~:text=Positive%20and%20negative%20paralog%20pair%20controls

The same paper does not calculate genetic interaction scores. Instead, it uses a variational Bayesian approach to estimate gene-level and gene-pair effects, then uses a Gaussian mixture model to calculate p-values, assuming a null distribution derived from the known "non-synergistic" pairs.
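The core idea of calibrating significance against known non-synergistic pairs can be illustrated with a deliberately simpler model. This sketch is not the paper's method: it replaces the variational Bayesian estimates and the Gaussian mixture model with a single Gaussian fitted to the null pairs' scores, and null_pvalues is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev


def null_pvalues(scores, null_scores):
    """Two-sided p-values for each score under a single Gaussian fitted to the
    scores of known non-synergistic (null) pairs. A simplified stand-in for a
    mixture-model null, not a reimplementation of it."""
    null = NormalDist(mean(null_scores), stdev(null_scores))
    return [2.0 * min(null.cdf(s), 1.0 - null.cdf(s)) for s in scores]


# Scores for candidate pairs, evaluated against a (toy) null set.
pvals = null_pvalues([0.0, -1.96], [-1.0, 0.0, 1.0])
print(pvals)
```

A mixture model can capture a null distribution that is not well described by one Gaussian, which is presumably why the authors used one; the single-Gaussian version only shows where the null pairs enter the calculation.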
https://www.nature.com/articles/s41588-021-00967-z#Sec13:~:text=Calculation%20of%20LFC%2C%20synergy%20and%20FDR


cansavvy · 2025-02-19T18:40:18Z

I did some initial testing of what the data look like without negative controls, but it isn't complete. I think it would be good to compare the following:

1. Same data with the full normalization adjustment method
2. Same data without the negative controls (LFC adjusted with positive controls only)
3. Same data without LFC adjustment (raw log2FC only)

I'm going to run this, look at how the final data and plots differ, and have it ready for Monday.
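The three-way comparison proposed above can be set up as a single function with a mode switch. The formulas are assumptions for illustration, not gimap's code: "full" assumes a pgPEN-style adjustment that centers on the negative-control median and scales by the gap to the positive-control median; "positive_only" is one interpretation of adjusting with positive controls alone; "raw" returns the log2FC unchanged. adjust_lfc is a hypothetical name.

```python
from statistics import median


def adjust_lfc(lfc, pos_lfc, neg_lfc=None, mode="full"):
    """Adjust construct-level LFCs under one of three comparison modes.

    full          : center on the negative-control median and scale so the
                    positive-control median lands at -1 (assumed formula)
    positive_only : no centering; scale so the positive-control median is -1,
                    assuming positive controls are depleted (negative median)
    raw           : return the raw log2FC unchanged
    """
    if mode == "raw":
        return list(lfc)
    pos_median = median(pos_lfc)
    if mode == "positive_only":
        scale = abs(pos_median)
        return [x / scale for x in lfc]
    if mode == "full":
        if neg_lfc is None:
            raise ValueError("mode='full' requires negative-control LFCs")
        neg_median = median(neg_lfc)
        return [(x - neg_median) / (neg_median - pos_median) for x in lfc]
    raise ValueError(f"unknown mode: {mode!r}")


# Run the same toy data through all three modes to compare them side by side.
lfc = [0.5, -2.5, -1.0]
neg = [0.4, 0.5, 0.6]
pos = [-2.4, -2.5, -2.6]
for mode in ("full", "positive_only", "raw"):
    print(mode, adjust_lfc(lfc, pos, neg, mode))
```

Keeping the mode as a parameter means the downstream GI calculation and plots can be rerun unchanged for each variant, so any differences come only from the adjustment step.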

cansavvy · 2025-02-19T19:41:55Z

Calculation comparisons here: https://www.biorxiv.org/content/10.1101/2024.08.19.608665v1.full.pdf

marissafujimoto assigned cansavvy Feb 12, 2025
