BigSur is a package for principled, robust scRNAseq normalization. Currently it supports feature selection; see BigSurR for correlations.
Basic Informatics and Gene Statistics from Unnormalized Reads (BigSur) is a principled pipeline allowing for feature selection, correlation and clustering in scRNAseq.
- The correlations are detailed in Silkwood et al. 2023.
- The feature selection derivations are detailed in the bioRxiv paper Dollinger et al. 2023.
Currently, the only way to install BigSur is to clone the GitHub repo. We've included an environment.yml file for conda environment installation; the only packages we require that aren't installed with scanpy are mpmath and numexpr. For example:
In terminal:
cd bigsur_dir  # directory to clone into
git clone https://github.com/landerlabcode/BigSur.git
Usage for feature selection is detailed in the example notebook.
TL;DR:
import sys
sys.path.append(bigsur_dir) # directory where git repo was cloned
from BigSur.feature_selection import mcfano_feature_selection as mcfano
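The import above only works if bigsur_dir is the directory the repo was cloned into (the parent of the BigSur folder), not the repo folder itself. As a minimal sketch, assuming that layout, a small helper can check this before touching sys.path. The helper name add_bigsur_parent_to_path and its repo_name parameter are hypothetical and not part of BigSur.

```python
import sys
from pathlib import Path


def add_bigsur_parent_to_path(bigsur_dir, repo_name="BigSur"):
    """Put the directory that contains the cloned repo on sys.path.

    Hypothetical helper, not part of BigSur. It assumes `git clone` was
    run inside bigsur_dir, so the repository lives at bigsur_dir/BigSur.
    """
    parent = Path(bigsur_dir).expanduser().resolve()
    if not (parent / repo_name).is_dir():
        raise FileNotFoundError(
            f"No '{repo_name}' directory found in {parent}; "
            "pass the directory you cloned into, not the repository itself."
        )
    # Avoid adding the same entry twice when a notebook cell is re-run.
    if str(parent) not in sys.path:
        sys.path.append(str(parent))
    return parent
```

After calling add_bigsur_parent_to_path(bigsur_dir), the import of mcfano_feature_selection shown above should resolve as long as the dependencies from environment.yml are installed.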
Replace sc.pp.highly_variable_genes(adata) in your pipeline with mcfano(adata, layer='counts'), where the UMI counts are in adata.layers['counts'].
And that's it! You can read more about how to use BigSur for feature selection, and in particular how to optimize cutoffs for a given dataset, in the example notebook.
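Since mcfano reads the UMI counts from adata.layers['counts'], that layer has to be filled before any step that overwrites adata.X. The sketch below shows one way to do this, assuming adata.X still holds the raw UMI counts when it is called. The function name stash_raw_counts is hypothetical and not part of BigSur or scanpy.

```python
def stash_raw_counts(adata, layer="counts"):
    """Copy adata.X into adata.layers[layer] unless that layer already exists.

    Hypothetical helper. It assumes adata.X holds unnormalized UMI counts
    at the time of the call, and it never overwrites an existing layer.
    """
    if layer not in adata.layers:
        # .copy() keeps the stored counts independent of later changes to X.
        adata.layers[layer] = adata.X.copy()
    return adata
```

In a typical scanpy pipeline this would be called before normalization steps such as sc.pp.normalize_total and sc.pp.log1p, with mcfano(adata, layer='counts') run afterwards in place of sc.pp.highly_variable_genes(adata).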