This repository contains the LIBRA code and online data used for the single-cell multi-omics integration and prediction analyses in the LIBRA manuscript. LIBRA metrics are also provided for quantifying output quality, including a novel PPJI preservation measure. The Seurat code used to analyze the LIBRA input omics, as well as the clustering and visualization pipelines, is also provided.
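The PPJI measure is defined in the manuscript, and this README does not give its formula, so the sketch below does not reproduce it. It only shows a building block that cluster-preservation measures of this kind typically use, and treating it as related to PPJI is an assumption. The building block is the Jaccard index |A ∩ B| / |A ∪ B| between every pair of clusters from two clusterings of the same cells, for example clusters from one input omic versus clusters from the integrated output. The function names are hypothetical and are not part of sc-Libra.

```python
from collections import defaultdict


def clusters(labels):
    """Map each cluster label to the set of cell indices assigned to it."""
    members = defaultdict(set)
    for cell, label in enumerate(labels):
        members[label].add(cell)
    return dict(members)


def pairwise_jaccard(labels_a, labels_b):
    """Jaccard index for every (cluster in a, cluster in b) pair.

    Hypothetical helper for illustration; this is not the PPJI definition.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    clusters_a, clusters_b = clusters(labels_a), clusters(labels_b)
    return {
        (label_a, label_b): len(set_a & set_b) / len(set_a | set_b)
        for label_a, set_a in clusters_a.items()
        for label_b, set_b in clusters_b.items()
    }


if __name__ == "__main__":
    rna_clusters = [0, 0, 1, 1]         # e.g. clusters from one omic alone
    integrated_clusters = [0, 0, 0, 1]  # e.g. clusters from the integrated output
    print(pairwise_jaccard(rna_clusters, integrated_clusters))
```

High values on matching pairs indicate that a cluster found in one representation is largely kept together in the other.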
The Python package sc-Libra was developed to extend and summarize the developer code used in the paper into a user-friendly version, and it is freely available from the PyPI repository. Read the online package documentation for a detailed description and guidelines.
LIBRA is a deep learning model designed for single-cell multi-omics integration and prediction. It uses an unbalanced autoencoder that learns a shared low-dimensional embedding from both omics of an experiment, combining the information unique to each omic to generate an integrated representation that is richer than the original, independently analyzed omics. The tool was first developed in R, and a snapshot of that code is provided for R users. Adaptive LIBRA (aLIBRA) was then developed to parallelize the training of LIBRA models over a grid of hyperparameters and select the optimal ones automatically, so users do not have to tune them by hand, which saves considerable time. A snapshot of the aLIBRA code is provided in Python for conceptual understanding.
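To make the encoder/decoder idea concrete, here is a minimal, framework-free sketch. It assumes that "unbalanced" means the encoder reads one modality (e.g. RNA) while the decoder reconstructs the paired modality (e.g. ATAC), so input and output sizes differ, and that the bottleneck plays the role of the shared embedding. This interpretation is an assumption. The model is a purely linear two-matrix toy trained by full-batch gradient descent on mean squared error. It is not the LIBRA architecture, which is a deep neural network, and all function names are hypothetical.

```python
import random

# Hypothetical linear sketch of an unbalanced encoder/decoder; not LIBRA's code.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mse(pred, target):
    """Mean squared error over all matrix entries."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def init(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def train_encoder_decoder(x, y, latent_dim=2, lr=0.05, epochs=500, seed=0):
    """Fit y ~ (x @ w_enc) @ w_dec by gradient descent on the MSE.

    x: cells x features of the input modality; y: cells x features of the
    paired output modality. Returns the weights and the per-epoch loss.
    """
    rng = random.Random(seed)
    w_enc = init(len(x[0]), latent_dim, rng)
    w_dec = init(latent_dim, len(y[0]), rng)
    n = len(y) * len(y[0])
    history = []
    for _ in range(epochs):
        z = matmul(x, w_enc)        # shared low-dimensional embedding
        y_hat = matmul(z, w_dec)    # prediction of the paired modality
        history.append(mse(y_hat, y))
        # Gradient of the MSE with respect to the prediction, then backpropagate.
        d_y = [[2 * (p - t) / n for p, t in zip(pr, tr)] for pr, tr in zip(y_hat, y)]
        d_w_dec = matmul(transpose(z), d_y)
        d_z = matmul(d_y, transpose(w_dec))
        d_w_enc = matmul(transpose(x), d_z)
        w_dec = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w_dec, d_w_dec)]
        w_enc = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w_enc, d_w_enc)]
    return w_enc, w_dec, history


if __name__ == "__main__":
    rng = random.Random(42)
    rna = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
    hidden_map = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    atac = matmul(rna, hidden_map)  # toy "paired" modality
    w_enc, w_dec, history = train_encoder_decoder(rna, atac)
    print("first loss:", history[0], "last loss:", history[-1])
    embedding = matmul(rna, w_enc)  # cells x latent_dim integrated representation
```

After training, `matmul(x, w_enc)` gives the low-dimensional embedding used for downstream clustering, and `matmul(embedding, w_dec)` gives the cross-modality prediction.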
Building on these raw developer codes, the sc-Libra package provides a built-in resource for running the proposed pipeline.
For further details, please refer to the manuscript, currently available on bioRxiv (this reference will be updated as soon as possible).
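aLIBRA, described above, trains LIBRA models over a hyperparameter grid in parallel and keeps the best one. The sketch below shows that pattern using only the standard library. The search space, the `train_and_score` callback and the toy objective are hypothetical placeholders, not aLIBRA's actual hyperparameters or code. A real callback would train a LIBRA model with the given configuration and return its validation loss.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space; aLIBRA's real hyperparameters may differ.
GRID = {
    "latent_dim": [2, 8, 16],
    "learning_rate": [1e-3, 1e-2],
    "epochs": [50, 100],
}


def expand_grid(grid):
    """Turn {name: [values]} into a list of {name: value} configurations."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*(grid[n] for n in names))]


def grid_search(train_and_score, grid, max_workers=4):
    """Evaluate every configuration in parallel; return (best_config, best_score).

    train_and_score(config) must return a loss where lower is better.
    """
    configs = expand_grid(grid)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_score, configs))  # keeps input order
    best = min(range(len(configs)), key=scores.__getitem__)
    return configs[best], scores[best]


def toy_score(config):
    """Stand-in for training: a synthetic loss lowest at latent_dim=8, lr=1e-2."""
    return (
        abs(config["latent_dim"] - 8)
        + abs(config["learning_rate"] - 1e-2) * 100
        + 1.0 / config["epochs"]
    )


if __name__ == "__main__":
    best_config, best_loss = grid_search(toy_score, GRID)
    print(best_config, best_loss)
```

`ThreadPoolExecutor` keeps the sketch portable. For CPU-bound training written in pure Python, `ProcessPoolExecutor` with a module-level callback would give true parallelism.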
To run the sc-Libra pipeline, the following are required:
- Install Python >=3.7.0.
- Install R >=3.5.2.
- Install the sc-Libra Python package:
$ pip install sc_libra
For a step-by-step guide, follow the online documentation.
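The Python and R requirements listed above can be checked with a short script. This sketch is not part of sc-Libra. It assumes that the `R` executable is on `PATH` and that `R --version` prints a banner containing a version such as `R version 4.3.1`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the requirements list in this README.
MIN_PYTHON = (3, 7, 0)
MIN_R = (3, 5, 2)


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def r_version():
    """Return the installed R version, or None if R is missing or cannot be queried."""
    r_exe = shutil.which("R")  # assumes the R executable is on PATH
    if r_exe is None:
        return None
    try:
        result = subprocess.run(
            [r_exe, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # The banner is expected on stdout; stderr is included as a fallback.
    return parse_version(result.stdout + result.stderr)


def check_requirements():
    """Report whether Python and R meet the minimum versions."""
    r_found = r_version()
    return {
        "python": tuple(sys.version_info[:3]) >= MIN_PYTHON,
        "r": r_found is not None and r_found >= MIN_R,
    }


if __name__ == "__main__":
    for tool, ok in check_requirements().items():
        print(f"{tool}: {'OK' if ok else 'missing or too old'}")
```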
The NeurIPS dataset provided for LIBRA testing can be downloaded from the figshare repository here.
The following datasets contain only the sparse matrices, without cell or feature identities; refer to the corresponding authors' references for the original datasets.
LIBRA name | GSE link | Modalities | Technology | Genomic ref used | Download sparse matrix |
---|---|---|---|---|---|
DataSet1 | GSE126074 | scRNAseq + scATACseq | SNARE-seq | Mus_musculus.GRCm38 Ver: 3.0.0 | RNA and ATAC |
DataSet2 | GSE128639 | scRNAseq + scADT | CITE-seq | Homo_sapiens.GRCh38 Ver: 3.0.0 | RNA and ADT |
DataSet3 | GSE130399 | scRNAseq + scATACseq | Paired-seq | Mus_musculus.GRCm38 Ver: 3.0.0 | RNA and ATAC |
DataSet4 | GSE140203 | scRNAseq + scATACseq | SHARE-seq | Mus_musculus.GRCm38 Ver: 3.0.0 | RNA and ATAC |
DataSet5 | 10X Genomics | scRNAseq + scATACseq | 10X multiome | Homo_sapiens.GRCh38 Ver: 3.0.0 | RNA and ATAC |
DataSet6 | GSE194122 | scRNAseq + scATACseq | 10X multiome | Homo_sapiens.GRCh38 Ver: 3.0.0 | RNA and ATAC |
DataSet7 | GSE194122 | scRNAseq + scADT | CITE-seq | Homo_sapiens.GRCh38 Ver: 3.0.0 | RNA and ADT |
DataSet8 | GSE109262 | scRNAseq + scATACseq | scNMT-seq | Mus_musculus.GRCm38 Ver: 3.0.0 | RNA and ATAC |
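This README does not state the file format of the downloadable sparse matrices. Assuming they are Matrix Market (`.mtx`) coordinate files, a common format for sparse single-cell counts, a minimal standard-library reader could look like the sketch below. Whether rows are features or cells is another assumption to check against each download. `scipy.io.mmread` is the usual library alternative.

```python
import io

# Sketch assuming Matrix Market coordinate files; the real download format may differ.


def _data_lines(handle):
    """Yield non-empty, non-comment lines from an open text file."""
    for line in handle:
        stripped = line.strip()
        if stripped and not stripped.startswith("%"):
            yield stripped


def read_matrix_market(handle):
    """Parse a coordinate Matrix Market file into (shape, entries).

    entries is a list of (row, col, value) triplets with 0-based indices.
    """
    header = handle.readline().lower()
    if not header.startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("only coordinate-format Matrix Market files are supported")
    lines = _data_lines(handle)
    n_rows, n_cols, n_entries = (int(v) for v in next(lines).split())
    entries = []
    for _ in range(n_entries):
        parts = next(lines).split()
        value = float(parts[2]) if len(parts) > 2 else 1.0  # 'pattern' files omit values
        entries.append((int(parts[0]) - 1, int(parts[1]) - 1, value))
    return (n_rows, n_cols), entries


def density(shape, entries):
    """Fraction of non-zero entries in the matrix."""
    return len(entries) / (shape[0] * shape[1])


if __name__ == "__main__":
    example = io.StringIO(
        "%%MatrixMarket matrix coordinate integer general\n"
        "% toy example\n"
        "3 2 3\n1 1 5\n2 2 1\n3 1 2\n"
    )
    shape, entries = read_matrix_market(example)
    print(shape, density(shape, entries))
```

For real files, open them with `open(path)` (or `gzip.open(path, "rt")` for compressed downloads) and pass the handle to `read_matrix_market`.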
- The easiest way to run a LIBRA analysis is through the sc-Libra Python package.
- The package documentation is available online on the Read the Docs platform.
To validate LIBRA's performance, we compared it against other published and available tools:
- Integration performance: BABEL.
- Prediction performance: Seurat3, Seurat4, MOFA+, totalVI, BABEL, multiVI and multigrate.

Further details are provided in the supplementary material of the LIBRA manuscript.