Of Exactitude in Science
...In that Empire, the craft of Cartography attained such Perfection that the Map of a Single province covered the space of an entire City, and the Map of the Empire itself an entire Province. In the course of Time, these Extensive maps were found somehow wanting, and so the College of Cartographers evolved a Map of the Empire that was of the same Scale as the Empire and that coincided with it point for point. Less attentive to the Study of Cartography, succeeding Generations came to judge a map of such Magnitude cumbersome, and, not without Irreverence, they abandoned it to the Rigours of sun and Rain. In the western Deserts, tattered Fragments of the Map are still to be found, Sheltering an occasional Beast or beggar; in the whole Nation, no other relic is left of the Discipline of Geography.
---From Travels of Praiseworthy Men (1658) by J. A. Suarez Miranda
borges is a small data visualization package that allows you to plot your single cell RNA-seq dataset (or any other dataset) as an antique or modern cartographic atlas. It uses 2D coordinates - be it UMAP, t-SNE, PCA or anything else - and depicts group labels as continuous territories in an ocean, separated by rivers or seas.
This is all done through the use of oveRlay, ggplot2 and a few other libraries.
borges is still very much under development so any feedback (especially bug reports) is more than welcome.
If you are trying to represent high-dimensional data in 2D, not at all. All dimensionality reduction techniques distort distances when going from high to low dimensionality, and non-linear techniques such as t-SNE and UMAP are very sensitive to tunable parameters (perplexity, number of neighbors, spread, etc.) that do not depend on the input data. You can fiddle with as many of these parameters and RNG seeds as you want until you get something nice to show your friends. You can read more about it here. The purpose of borges is to make your beautiful, useless plots even more beautiful and slightly more useless.
If instead you are representing point clouds that have a rigorous justification for their embedding in a 2D space, then by all means use borges to make your beautiful, useful plots even more beautiful and slightly less useful.
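To get a feel for the distortion mentioned above, here is a minimal sketch in base R. It is not part of borges: the simulated 50-dimensional data and all variable names are assumptions made for illustration. It projects the data onto its first two principal components and compares pairwise distances before and after.

```r
# Illustration only: simulated data, base R, no borges functions involved.
set.seed(42)

# 200 points in 50 dimensions, drawn from independent standard normals
high_dim <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Linear reduction to 2D: keep the first two principal components
pca <- prcomp(high_dim, center = TRUE, scale. = FALSE)
low_dim <- pca$x[, 1:2]

# Pairwise Euclidean distances before and after the reduction
d_high <- as.vector(dist(high_dim))
d_low <- as.vector(dist(low_dim))

# PCA is an orthogonal projection, so no pairwise distance can grow
shrinkage <- d_low / d_high
print(summary(shrinkage))

# How well do the 2D distances track the original ones?
distance_cor <- cor(d_high, d_low)
print(distance_cor)
```

Even this simple linear projection compresses the pairwise distances, and by different amounts for different pairs. Non-linear methods add the parameter sensitivity described above on top of that.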
To install borges you will first need to install oveRlay.
Use remotes::install_github() or devtools::install_github() as follows:
remotes::install_github("gdagstn/oveRlay")
remotes::install_github("gdagstn/borges")
borges has only two functions: prepAtlas() to prepare the atlas coordinates from a SingleCellExperiment object, a matrix or a data.frame, and plotAtlas() to display it as a ggplot2 plot.
For a practical demonstration, let's use the scRNAseq Bioconductor package to download the SingleCellExperiment object from Zeisel et al. 2018, "Molecular architecture of the mouse nervous system".
This is quite a large file which will take a while to download.
# BiocManager::install("scRNAseq")
zeisel = scRNAseq::ZeiselNervousData()
The Zeisel dataset has an "unnamed" reducedDim slot that contains a t-SNE embedding for cells of the nervous system. There are several labels in the colData slot, and we will choose one that offers a good balance between detail and readability.
zeiselatlas = prepAtlas(zeisel,
                        dimred = "unnamed",
                        res = 400,
                        labels = "TaxonomyRank3",
                        as_map = TRUE)
The atlas can be plotted:
plotAtlas(zeiselatlas)
Note that plotting can take a few seconds to a minute due to the high level of detail. To get less detailed maps, you can set the res argument in prepAtlas() to a smaller value, e.g. 250 or 300, and set plot_cells = FALSE in plotAtlas().
The arguments of plotAtlas() allow you to control a few graphical elements:
- plot_cells (logical) to plot cells (as small, semi-transparent dots)
- add_contours (logical) to add 2D kernel density contour estimates, clipped to stay within land masses (mostly)
- show_labels (logical) to show labels using geom_label_repel() from ggrepel
- label_size (numeric) to override the default label size decided by the map theme
- shade_borders (logical) to add an antique-style shading to the boundaries
- shade_offset (numeric) for the size and direction of the border
- shade_skip (numeric) for the spacing between shading lines
- capitalize_labels (logical) to capitalize all labels
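As a sketch of how these options might be combined, the snippet below collects a few of them in a list and passes them to plotAtlas(). The argument names come from the list above; the values are illustrative guesses rather than recommended settings, and the variable plot_args is hypothetical. The call is guarded so the snippet does nothing unless borges is installed and the zeiselatlas object from the previous step exists.

```r
# Sketch only: argument names are taken from the list above,
# the values are illustrative and not recommended defaults.
plot_args <- list(
  plot_cells = FALSE,        # skip the individual dots for a lighter plot
  add_contours = TRUE,       # density contours within land masses
  show_labels = TRUE,        # territory labels via ggrepel
  capitalize_labels = TRUE,  # capitalize all labels
  shade_borders = TRUE       # antique-style border shading
)

# Only runs when borges is installed and the atlas has already been prepared
if (exists("zeiselatlas") && requireNamespace("borges", quietly = TRUE)) {
  p <- do.call(borges::plotAtlas, c(list(zeiselatlas), plot_args))
  print(p)
}
```

Keeping the options in a list makes it easy to reuse the same look across several atlases.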
borges can also be used on any generic 2D point cloud represented as a matrix or data.frame, as long as it has two columns (the first one is taken as the x coordinates and the second one as the y coordinates). If you supply either a matrix or a data.frame, the labels argument must be a character vector with one label for every point.
mats = rbind(matrix(rnorm(1000, 0, 1), ncol = 2),
             matrix(rnorm(1000, 4, 1), ncol = 2),
             matrix(rnorm(1000, -3, 1), ncol = 2))
labels = c(rep("Cluster 1", 500),
           rep("Cluster 2", 500),
           rep("Cluster 3", 500))
atl = prepAtlas(mats, res = 100, labels = labels)
plotAtlas(atl)
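The same requirements apply to a data.frame. The sketch below builds one from simulated data and checks the two conditions stated above (two columns, one character label per point) before calling prepAtlas(). The helper check_atlas_input() is hypothetical and not part of borges, and the borges calls are guarded so the snippet still runs when the package is not installed.

```r
set.seed(7)

# Simulated 2D point cloud as a data.frame: first column x, second column y
coords <- data.frame(x = c(rnorm(300, 0, 1), rnorm(300, 4, 1)),
                     y = c(rnorm(300, 0, 1), rnorm(300, 4, 1)))
point_labels <- rep(c("Cluster A", "Cluster B"), each = 300)

# Hypothetical helper (not part of borges): fail early on malformed input
check_atlas_input <- function(xy, labels) {
  stopifnot(is.matrix(xy) || is.data.frame(xy),
            ncol(xy) == 2,
            is.character(labels),
            length(labels) == nrow(xy),
            !anyNA(labels))
  invisible(TRUE)
}

check_atlas_input(coords, point_labels)

# Only runs when borges is installed
if (requireNamespace("borges", quietly = TRUE)) {
  atl_df <- borges::prepAtlas(coords, res = 100, labels = point_labels)
  print(borges::plotAtlas(atl_df))
}
```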
The as_map argument in plotAtlas() controls whether the atlas is plotted using a geographical projection, and the map_proj argument controls the type of projection. Any single character string accepted as a projection by ggplot2::coord_map() is acceptable.
For instance, we can plot the atlas using a "globular" projection:
plotAtlas(zeiselatlas, as_map = TRUE, map_proj = "globular")
borges comes with a few different themes pre-packaged:
- classic: the default theme
- modern: a modern political atlas-like theme
- renaissance: palette from 16th century maps
- medieval: palette from 14th century maps
plotAtlas(zeiselatlas, map_theme = "renaissance")
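To compare the themes side by side, one option is to loop over their names. This is a sketch under two assumptions: that plotAtlas() returns a ggplot2 object that can be stored and saved, and that the zeiselatlas object from the earlier step exists. The file names and the variables themes and theme_plots are hypothetical.

```r
# Theme names as listed above
themes <- c("classic", "modern", "renaissance", "medieval")

# Only runs when borges is installed and the atlas has already been prepared
if (exists("zeiselatlas") && requireNamespace("borges", quietly = TRUE)) {
  # One plot per theme (assumes plotAtlas() returns a ggplot2 object)
  theme_plots <- lapply(themes, function(th) {
    borges::plotAtlas(zeiselatlas, map_theme = th)
  })
  names(theme_plots) <- themes

  # Save each plot to a temporary directory; the file names are illustrative
  for (th in themes) {
    out_file <- file.path(tempdir(), paste0("zeisel_", th, ".png"))
    ggplot2::ggsave(out_file, plot = theme_plots[[th]], width = 8, height = 8)
  }
}
```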
In the future, themes will support different fonts and additional aesthetic elements.