
The Compositional Perturbation Autoencoder (CPA) is a deep generative framework to learn effects of perturbations at the single-cell level. CPA performs OOD predictions of unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.


FrancescaDr/CPA


CPA - Compositional Perturbation Autoencoder

CPA is a collaborative research project from Facebook AI Research (FAIR) and the computational biology group of Prof. Fabian Theis (https://github.com/theislab) at Helmholtz Zentrum München.

What is CPA?


CPA is a framework to learn effects of perturbations at the single-cell level. CPA encodes and learns phenotypic drug response across different cell types, doses and drug combinations. CPA allows:

  • Out-of-distribution predictions for unseen drug combinations at various doses and across different cell types.
  • Learning interpretable drug and cell type latent spaces.
  • Estimating dose-response curves for each perturbation and their combinations.
  • Assessing the uncertainty of the model's estimates.
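To make the compositional idea more concrete, here is a toy Python sketch of how a latent representation could be assembled from a basal (unperturbed) state, a cell type embedding, and one embedding per drug scaled by a dose-response function. This is a hypothetical illustration, not the compert implementation: the additive form, the function names, and the sigmoid-of-log-dose parameterization (beta, bias) are all assumptions, the last one only loosely suggested by the --doser_type sigm training option mentioned in this README.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_dose(dose: float, beta: float = 1.0, bias: float = 0.0) -> float:
    """Hypothetical dose response: sigmoid of log(1 + dose), shifted so dose 0 gives 0."""
    return _sigmoid(beta * math.log1p(dose) + bias) - _sigmoid(bias)


def compose_latent(
    basal: Vector,
    cell_type_embedding: Vector,
    drug_embeddings: Dict[str, Vector],
    doses: Dict[str, float],
    dose_params: Dict[str, Tuple[float, float]],
) -> Vector:
    """Toy compositional latent: basal + cell type + sum over drugs of f(dose) * embedding."""
    z = [b + c for b, c in zip(basal, cell_type_embedding)]
    for drug, dose in doses.items():
        # Each drug has its own (beta, bias); drugs without entries fall back to (1.0, 0.0).
        beta, bias = dose_params.get(drug, (1.0, 0.0))
        scale = sigmoid_dose(dose, beta, bias)
        z = [zi + scale * ei for zi, ei in zip(z, drug_embeddings[drug])]
    return z


# An unseen combination is just a new set of (drug, dose) pairs added to the same basal state.
z_combo = compose_latent(
    basal=[0.2, -0.1],
    cell_type_embedding=[0.5, 0.0],
    drug_embeddings={"drugA": [1.0, 0.0], "drugB": [0.0, 1.0]},
    doses={"drugA": 1.0, "drugB": 10.0},
    dose_params={"drugA": (2.0, -1.0)},
)
```

Because the drug terms simply add up in this toy version, a combination never seen together can still be scored from the individually learned pieces, which is the intuition behind predicting unseen combinations.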

Package Structure

The repository is centered around the compert module:

Additional files and folders:

  • datasets contains both versions of the data: raw and pre-processed.
  • preprocessing contains notebooks to reproduce the dataset pre-processing from raw data.
  • notebooks contains notebooks to reproduce plots from the paper and a detailed analysis of each dataset.
  • pretrained_models contains the best models selected after the sweeps. These models were used for the analysis and figures in the paper.
  • scripts contains bash files for automatic running of the model.

Usage

  • As a first step, download the contents of datasets/ and pretrained_models/ from this tarball.
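Once the tarball has been downloaded, unpacking it into the repository root can be sketched with Python's standard tarfile module. The archive file name in the usage line is a placeholder, the helper name is hypothetical, and the sketch assumes the archive stores datasets/ and pretrained_models/ at its top level. It returns whichever of the two folders is still missing afterwards.

```python
import tarfile
from pathlib import Path
from typing import List, Union

EXPECTED_DIRS = ("datasets", "pretrained_models")


def extract_cpa_tarball(
    tarball_path: Union[str, Path], repo_root: Union[str, Path] = "."
) -> List[str]:
    """Unpack the data/model tarball into repo_root and report missing top-level folders."""
    root = Path(repo_root).resolve()
    with tarfile.open(tarball_path, "r:*") as tar:
        # Refuse archive members that would be written outside repo_root.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" refuses special files
            # and links pointing outside the destination.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Usage (placeholder file name):
# missing = extract_cpa_tarball("cpa_data.tar.gz", ".")
```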

To learn how to use this repository, check ./notebooks/demo.ipynb, and the following scripts:

Examples and Reproducibility

All the examples and the reproducibility notebooks for the plots in the paper can be found in the notebooks/ folder.

Curation of your own data to train CPA

  • To prepare your data to train CPA, you need to add specific fields to the adata object and perform a data split. Examples of how to add the necessary fields for multiple datasets used in the paper can be found in the preprocessing/ folder.
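The exact fields CPA expects are defined by the notebooks in preprocessing/, so the following Python sketch only illustrates the general pattern under assumed names. Each cell gets a treatment label and a matching dose string (the "condition" and "dose_val" names and the "+" separator for combinations are hypothetical conventions), and a split label marks it as train, test, or ood. Combinations held out for out-of-distribution evaluation go entirely to ood so the model never sees them during training; every other cell is assigned to train or test at random. Cells are plain dicts standing in for rows of adata.obs; with AnnData, the returned list could be stored as a column, e.g. adata.obs["split"] = splits.

```python
import random
from typing import Dict, Iterable, List, Set


def make_condition_fields(drugs: List[str], doses: List[float]) -> Dict[str, str]:
    """Encode a (possibly combined) treatment as hypothetical 'condition' and 'dose_val' strings."""
    if len(drugs) != len(doses):
        raise ValueError("each drug needs exactly one dose")
    return {"condition": "+".join(drugs), "dose_val": "+".join(str(d) for d in doses)}


def assign_splits(
    cells: Iterable[Dict[str, str]],
    ood_conditions: Set[str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> List[str]:
    """Label each cell 'ood' if its condition is held out, else 'test' or 'train' at random."""
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    splits = []
    for cell in cells:
        if cell["condition"] in ood_conditions:
            splits.append("ood")
        elif rng.random() < test_fraction:
            splits.append("test")
        else:
            splits.append("train")
    return splits


cells = [
    make_condition_fields(["drugA"], [0.1]),
    make_condition_fields(["drugB"], [1.0]),
    make_condition_fields(["drugA", "drugB"], [0.1, 1.0]),
]
splits = assign_splits(cells, ood_conditions={"drugA+drugB"})
```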

Training a model

There are two ways to train a compert model:

  • Using the command line, e.g.: python -m compert.train --dataset_path datasets/GSM_new.h5ad --save_dir /tmp --max_epochs 1 --doser_type sigm
  • From a Jupyter notebook: see the example in ./notebooks/demo.ipynb
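When several runs are needed, for example one short run per dataset, the command-line form can also be launched from Python. This sketch uses only the four flags that appear in the command-line example in this README; the helper names are hypothetical, and any other training options would need to be checked against compert.train itself.

```python
import subprocess
import sys
from typing import List


def build_train_command(
    dataset_path: str, save_dir: str, max_epochs: int = 1, doser_type: str = "sigm"
) -> List[str]:
    """Assemble the same invocation as 'python -m compert.train ...' with the current interpreter."""
    return [
        sys.executable, "-m", "compert.train",
        "--dataset_path", dataset_path,
        "--save_dir", save_dir,
        "--max_epochs", str(max_epochs),
        "--doser_type", doser_type,
    ]


def run_training(dataset_path: str, save_dir: str, **kwargs) -> None:
    """Run one training job; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_train_command(dataset_path, save_dir, **kwargs), check=True)


# Usage (requires the compert package and the downloaded datasets):
# for path in ["datasets/GSM_new.h5ad"]:
#     run_training(path, "/tmp", max_epochs=1)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.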

Testing

Run bash ./scripts/run_one_epoch.sh to perform automatic testing for one epoch on all the datasets used in the study.

Documentation

Currently you can access the documentation via the help function in IPython. For example:

from compert.api import ComPertAPI
help(ComPertAPI)

from compert.plotting import CompertVisuals
help(CompertVisuals)
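Until the separate documentation page is available, the same help text can be saved to a file with Python's standard pydoc module, which is what the built-in help function relies on. The helper name and output file names are hypothetical; the dotted paths in the usage lines are the two classes named in this section.

```python
import pydoc
from pathlib import Path


def save_help_text(dotted_name: str, out_path: str) -> str:
    """Render the help() output for an importable dotted name and write it as plain text."""
    # render_doc resolves the dotted name like help() does and raises ImportError if it cannot.
    text = pydoc.render_doc(dotted_name, renderer=pydoc.plaintext)
    Path(out_path).write_text(text, encoding="utf-8")
    return text


# Usage (requires compert to be importable):
# save_help_text("compert.api.ComPertAPI", "ComPertAPI_help.txt")
# save_help_text("compert.plotting.CompertVisuals", "CompertVisuals_help.txt")
```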

A separate page with the documentation is coming soon.

Support and contribute

If you have a question or noticed a problem, you can post an issue.

Reference

Please cite the following preprint if you find CPA useful in your research.

@article {Lotfollahi2021.04.14.439903,
	author = {Lotfollahi, Mohammad and Susmelj, Anna Klimovskaia and De Donno, Carlo and Ji, Yuge and Ibarra, Ignacio L. and Wolf, F. Alexander and Yakubova, Nafissa and Theis, Fabian J. and Lopez-Paz, David},
	title = {Learning interpretable cellular responses to complex perturbations in high-throughput screens},
	elocation-id = {2021.04.14.439903},
	year = {2021},
	doi = {10.1101/2021.04.14.439903},
	publisher = {Cold Spring Harbor Laboratory},

	URL = {https://www.biorxiv.org/content/early/2021/05/18/2021.04.14.439903},
	eprint = {https://www.biorxiv.org/content/early/2021/05/18/2021.04.14.439903.full.pdf},
	journal = {bioRxiv}
}

The preprint titled Learning interpretable cellular responses to complex perturbations in high-throughput screens can be found here.

License

This source code is released under the MIT license, included here.
