Save Bioconductor objects to file

Overview

The alabaster framework implements methods to save a variety of R/Bioconductor objects to on-disk representations. This is a more robust and portable alternative to the typical approach of saving objects in RDS files.

  • By separating the on-disk representation from the in-memory object structure, we can more easily adapt to changes in S4 class definitions. This improves robustness to R environment updates, especially when updateObject() is not correctly configured.
  • By using standard file formats like HDF5 and JSON, we ensure that Bioconductor objects can be easily read from other languages like Python and JavaScript. This improves interoperability between application ecosystems.
  • By breaking up complex Bioconductor objects into their components, we enable modular reads and writes to the backing store. We can easily read or update part of an object without having to consider the other parts.

The alabaster.base package defines the base generics to read and write the file structures along with the associated metadata. Implementations of these methods for various Bioconductor classes can be found in the other alabaster packages like alabaster.se and alabaster.bumpy.

Quick start

First, we'll install the alabaster.base package. This package is available from Bioconductor, so we can use the standard Bioconductor installation process:

# install.packages("BiocManager")
BiocManager::install("alabaster.base")

The simplest example involves saving a DataFrame inside a staging directory. Let's mock up an object:

library(S4Vectors)
df <- DataFrame(X=1:10, Y=letters[1:10])
df
## DataFrame with 10 rows and 2 columns
##            X           Y
##    <integer> <character>
## 1          1           a
## 2          2           b
## 3          3           c
## 4          4           d
## 5          5           e
## 6          6           f
## 7          7           g
## 8          8           h
## 9          9           i
## 10        10           j

Then we can save it to the staging directory:

tmp <- tempfile()
library(alabaster.base)
saveObject(df, tmp)
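
To see what the save step produced, the staging directory can be inspected with base R. The following is a minimal sketch: it assumes a recent alabaster.base in which each saved object is a directory containing an OBJECT file that records the object type, and it uses readObjectFile() to read that file. The exact set of files listed may differ between versions.

```r
library(S4Vectors)
library(alabaster.base)

# Recreate the example object and save it to a fresh staging directory.
df <- DataFrame(X=1:10, Y=letters[1:10])
tmp <- tempfile()
saveObject(df, tmp)

# List everything that was written for this object.
contents <- list.files(tmp, recursive=TRUE)
print(contents)

# Read the OBJECT file, which describes the type of the saved object.
info <- readObjectFile(tmp)
print(info$type)
```

Because the on-disk layout consists of ordinary files, the same directory can be examined with any file browser or with HDF5 and JSON tooling from other languages.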

We can copy the directory to another location, over a network, etc., and then easily load it back into a new R session:

readObject(tmp)
## DataFrame with 10 rows and 2 columns
##            X           Y
##    <integer> <character>
## 1          1           a
## 2          2           b
## 3          3           c
## 4          4           d
## 5          5           e
## 6          6           f
## 7          7           g
## 8          8           h
## 9          9           i
## 10        10           j
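
The copy-and-reload workflow described above can be sketched as follows. The destination directory here is just another temporary location standing in for a network share or remote machine, and the variable names are illustrative. The sketch also calls validateObject() on the copy before reading it, and compares columns individually rather than assuming the reloaded object is identical() to the original in every attribute.

```r
library(S4Vectors)
library(alabaster.base)

df <- DataFrame(X=1:10, Y=letters[1:10])
tmp <- tempfile()
saveObject(df, tmp)

# Copy the whole staging directory into a different parent directory.
dest_parent <- tempfile()
dir.create(dest_parent)
file.copy(tmp, dest_parent, recursive=TRUE)
copied <- file.path(dest_parent, basename(tmp))

# Check that the copy is a valid on-disk object, then load it.
validateObject(copied)
roundtrip <- readObject(copied)

# Confirm that the contents survived the round trip.
stopifnot(identical(dim(roundtrip), dim(df)))
stopifnot(identical(roundtrip$X, df$X))
stopifnot(identical(roundtrip$Y, df$Y))
```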

Check out the user's guide for more details.

Supported classes

The saving/reading process can be applied to a range of data structures, provided the appropriate alabaster package is installed.

| Package | Object types |
|---|---|
| alabaster.base | list, factor, DataFrame, List |
| alabaster.matrix | matrix, Matrix objects, DelayedArray |
| alabaster.ranges | GRanges, GRangesList and related objects |
| alabaster.se | SummarizedExperiment, RangedSummarizedExperiment |
| alabaster.sce | SingleCellExperiment |
| alabaster.mae | MultiAssayExperiment |
| alabaster.string | XStringSet |
| alabaster.spatial | SpatialExperiment |
| alabaster.bumpy | BumpyMatrix objects |
| alabaster.vcf | VCF objects |
| alabaster.files | Common bioinformatics files, e.g., FASTQ, BAM |

All packages are available from Bioconductor and can be installed with the usual BiocManager::install() process. Alternatively, to install all packages in one go, users can install the alabaster umbrella package.
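
For example, installing the umbrella package or a single class-specific package might look like the following. This assumes that the umbrella package is published on Bioconductor under the name alabaster, as the text above suggests.

```r
# install.packages("BiocManager")

# Install every alabaster package in one go via the umbrella package.
BiocManager::install("alabaster")

# Or install only the package for the classes you need, e.g.,
# SummarizedExperiment support.
BiocManager::install("alabaster.se")
```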

Extensions and applications

Developers can extend this framework to support more R/Bioconductor classes by creating their own alabaster package. Check out the extension section for more details.

Developers can also customize this framework for specific applications, most typically to add bespoke metadata in the staging directory. The metadata can then be indexed by database systems like SQLite and MongoDB to provide search capabilities. Check out the applications section for more details.
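
As a rough illustration of the bespoke metadata idea, the sketch below writes a JSON sidecar file into the staging directory and reads it back, as an indexing job might before loading records into a database. This is a hand-rolled example and not the framework's own customization mechanism: the file name `_metadata.json`, its fields, and the choice of an underscore prefix are all assumptions made for illustration.

```r
library(S4Vectors)
library(alabaster.base)
library(jsonlite)

df <- DataFrame(X=1:10, Y=letters[1:10])
staging <- tempfile()
saveObject(df, staging)

# Hypothetical application-specific metadata; the fields are illustrative.
meta <- list(title="Example data frame", nrow=nrow(df))
meta_path <- file.path(staging, "_metadata.json")
write_json(meta, meta_path, auto_unbox=TRUE)

# An indexer could collect these files and load them into SQLite, MongoDB, etc.
indexed <- fromJSON(meta_path)

# The object itself can still be read as usual.
reloaded <- readObject(staging)
```

Keeping the metadata in a separate file means that a search index can be built without reading any of the object's data.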