Datasets:
- final_X_tcga_processed.hkl: Expression and mutation features for each cell, built from the DepMap 22Q4 datasets OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv. Expression features are z-scored, and the feature vector for each cell is L2-normalized to unit norm.
- final_X_tcga_raw_unnormalized.hkl: Expression and mutation features for each cell, built from the DepMap 22Q4 datasets OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv.
- CRISPRGeneEffect_processed.hkl: CRISPRGeneEffect.csv from DepMap 22Q4, filtered to the cells for which we have mutation and expression features.
- Chronos_Combined_predictability_results.csv: Predictability data from DepMap.
- cancerGeneList.tsv: OncoKB cancer genes (https://www.oncokb.org/cancer-genes).
- sample_info.csv: DepMap metadata for cell lines.
- datasets/tcga_data_processed_figures.hkl: TCGA data downloaded from Xena.
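The processing described for final_X_tcga_processed.hkl (z-scored expression, then L2 normalization per cell) can be sketched roughly as follows. This is an illustration only, not the code used to build the file: the function name process_features is hypothetical, and it assumes that z-scoring is done per gene across cells, that mutation features are left unscaled, and that expression columns come before mutation columns.

```python
import numpy as np

def process_features(expression, mutations, eps=1e-12):
    """Z-score expression columns, append mutation columns, L2-normalize rows.

    expression: (n_cells, n_genes) array of log-transformed expression values.
    mutations:  (n_cells, n_mut_features) array of mutation indicators.
    """
    expression = np.asarray(expression, dtype=float)
    mutations = np.asarray(mutations, dtype=float)

    # Assumption: z-score each gene (column) across all cells.
    mean = expression.mean(axis=0, keepdims=True)
    std = expression.std(axis=0, keepdims=True)
    z = (expression - mean) / np.maximum(std, eps)

    # Assumption: mutation features are appended unchanged.
    features = np.concatenate([z, mutations], axis=1)

    # Scale each cell's feature vector to unit L2 norm.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)

# Small synthetic example (random data, not DepMap values).
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 4))
mut = rng.integers(0, 2, size=(5, 3))
X = process_features(expr, mut)
```

After this step every row of X has unit L2 norm, which is the property the processed file is described as having.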
Files:
- train_and_get_grads.ipynb: Train one kernel regression model per knockout and get feature importances for each KO.
- demo.py: Use calculated feature importances to visualize feature importance distributions for a given KO.
- generate_figures.ipynb: Generate main text figures.
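To give a rough idea of what "one kernel regression model per knockout" with gradient-based feature importances can look like, here is a minimal NumPy sketch. It is not the notebook's implementation: the Gaussian kernel, the ridge term, the bandwidth, and the use of mean absolute gradient as the importance score are all assumptions made for illustration, and every function name below is hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def fit_kernel_ridge(X, y, bandwidth=1.0, ridge=1e-2):
    """Solve (K + ridge * I) alpha = y for the dual coefficients."""
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def predict(X_train, alpha, X_query, bandwidth=1.0):
    """Evaluate the fitted model at the query points."""
    return gaussian_kernel(X_query, X_train, bandwidth) @ alpha

def prediction_gradients(X_train, alpha, X_query, bandwidth=1.0):
    """Gradient of the prediction with respect to each query point's features."""
    K = gaussian_kernel(X_query, X_train, bandwidth)   # (n_query, n_train)
    weighted = K * alpha[None, :]
    # d/dx sum_i alpha_i k(x, x_i) = sum_i alpha_i k(x, x_i) (x_i - x) / bandwidth^2
    return (weighted @ X_train
            - weighted.sum(axis=1, keepdims=True) * X_query) / bandwidth ** 2

# Synthetic data: two hypothetical knockouts, each driven by a single feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.stack([2.0 * X[:, 0], -1.5 * X[:, 2]], axis=1)

importances = []
for ko in range(Y.shape[1]):                 # one model per knockout
    alpha = fit_kernel_ridge(X, Y[:, ko], bandwidth=2.0)
    grads = prediction_gradients(X, alpha, X, bandwidth=2.0)
    # Assumption: importance = mean absolute gradient over cells.
    importances.append(np.abs(grads).mean(axis=0))
importances = np.array(importances)          # (n_knockouts, n_features)
```

In this toy setup each row of importances should be largest at the feature that actually drives the corresponding synthetic knockout response.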
Feel free to direct any questions about the code to caic@mit.edu.