Principal components analysis, duh

Overview

As the name suggests, this repository implements functions to perform a PCA on a gene-by-cell expression matrix, returning low-dimensional coordinates for each cell that can be used for efficient downstream analyses such as clustering and visualization. The code was originally derived from the scran and batchelor R packages and factored out into a separate C++ library for easier re-use.

Quick start

Given a tatami::Matrix, the scran_pca::simple_pca() function will compute the PCA to obtain a low-dimensional representation of the cells:

#include "scran_pca/scran_pca.hpp"

const tatami::Matrix<double, int>& mat = some_data_source();

// Take the top 20 PCs:
scran_pca::SimplePcaOptions opt;
opt.rank = 20;
auto res = scran_pca::simple_pca(mat, opt);

res.components; // rows are PCs, columns are cells.
res.rotation; // rows are genes, columns correspond to PCs.
res.variance_explained; // one per PC, in decreasing order.
res.total_variance; // total variance in the dataset.
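
The two variance outputs are typically combined to report the fraction of variance captured by each PC. The sketch below shows that arithmetic with a hypothetical helper, proportion_of_variance(), which is not part of scran_pca. It takes plain std::vector input for simplicity, so the values in res.variance_explained may need to be copied into one first, depending on the container type the library returns.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of scran_pca: converts the per-PC variances
// into fractions of the total variance in the dataset.
std::vector<double> proportion_of_variance(const std::vector<double>& variance_explained, double total_variance) {
    assert(total_variance > 0);
    std::vector<double> output;
    output.reserve(variance_explained.size());
    for (double v : variance_explained) {
        output.push_back(v / total_variance);
    }
    return output;
}
```

Because the PCs are reported in decreasing order of variance, the resulting fractions are also decreasing, and their running sum indicates how much of the dataset is retained by the top PCs.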

Advanced users can fiddle with more of the options:

opt.scale = true;
opt.num_threads = 4;
opt.realize_matrix = false;
auto res2 = scran_pca::simple_pca(mat, opt);

In the presence of multiple blocks, we can perform the PCA on the residuals after regressing out the blocking factor. This ensures that the inter-block differences do not contribute to the first few PCs, instead favoring the representation of intra-block variation.

std::vector<int> blocks = some_blocks();

scran_pca::BlockedPcaOptions bopt;
bopt.rank = 10; // taking the top 10 PCs this time.
auto bres = scran_pca::blocked_pca(mat, blocks.data(), bopt);

bres.components; // rows are PCs, columns are cells.
bres.center; // rows are blocks, columns are genes.
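
To make "regressing out the blocking factor" concrete, here is a minimal sketch of the idea for a single gene: each cell's value has the mean of its block subtracted, so that differences between block means no longer contribute variance. The function block_residuals() is hypothetical and only illustrates the concept. It assumes 0-based block assignments and ignores any weighting or other details that the library may apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only, not part of scran_pca: subtracts the per-block mean from
// the values of a single gene. 'block' holds the 0-based block assignment of
// each cell, and every block is assumed to contain at least one cell.
std::vector<double> block_residuals(const std::vector<double>& values, const std::vector<int>& block, int num_blocks) {
    assert(values.size() == block.size());

    // Accumulate the sum and cell count for each block.
    std::vector<double> sums(num_blocks, 0.0);
    std::vector<int> counts(num_blocks, 0);
    for (std::size_t c = 0; c < values.size(); ++c) {
        sums[block[c]] += values[c];
        ++counts[block[c]];
    }

    // Subtract the mean of the corresponding block from each cell.
    std::vector<double> residuals(values.size());
    for (std::size_t c = 0; c < values.size(); ++c) {
        residuals[c] = values[c] - sums[block[c]] / counts[block[c]];
    }
    return residuals;
}
```

For example, values of 1 and 3 in one block and 10 and 14 in another have block means of 2 and 12, giving residuals of -1, 1, -2 and 2. The large shift between the two blocks is gone and only the within-block spread remains.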

The components derived from the residuals will only be free of inter-block differences under certain conditions (equal population composition with a consistent shift between blocks). If this is not the case, more sophisticated batch correction methods are required. If those methods accept a low-dimensional representation for the cells as input, we can use scran_pca::blocked_pca() to obtain an appropriate matrix that focuses on intra-block variation without making assumptions about the inter-block differences:

bopt.components_from_residuals = false;
auto bres2 = scran_pca::blocked_pca(mat, blocks.data(), bopt);
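
One way to picture this mode is as a projection. This sketch assumes that the rotation vectors are still derived from the residuals and that each cell's original expression values are then projected onto them. The function project_cell() below is hypothetical and shows only that projection step, using a column-major rotation matrix with one column per PC. Any centering or weighting performed by the library is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only, not part of scran_pca: projects one cell onto a set of
// rotation vectors. 'rotation' is stored column-major with one row per gene
// and one column per PC; this layout is an assumption made for the sketch.
std::vector<double> project_cell(const std::vector<double>& expression, const std::vector<double>& rotation, std::size_t num_pcs) {
    std::size_t num_genes = expression.size();
    assert(rotation.size() == num_genes * num_pcs);

    // Each coordinate is the dot product of the cell with one rotation vector.
    std::vector<double> coords(num_pcs, 0.0);
    for (std::size_t p = 0; p < num_pcs; ++p) {
        for (std::size_t g = 0; g < num_genes; ++g) {
            coords[p] += rotation[p * num_genes + g] * expression[g];
        }
    }
    return coords;
}
```

Under this view, the axes are chosen to capture intra-block variation, but the coordinates still carry whatever inter-block differences exist in the original values, leaving those for a downstream batch correction method to handle.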

Check out the reference documentation for more details.

Building projects

CMake with `FetchContent`

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(
  scran_pca
  GIT_REPOSITORY https://github.com/libscran/scran_pca
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(scran_pca)

Then you can link to scran_pca to make the headers available during compilation:

# For executables:
target_link_libraries(myexe libscran::scran_pca)

# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_pca)

CMake with `find_package()`

find_package(libscran_scran_pca CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_pca)

To install the library, use:

mkdir build && cd build
cmake .. -DSCRAN_PCA_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DSCRAN_PCA_FETCH_EXTERN=OFF. See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.
