I know the formatting looks crazy. I can't figure out the syntax.
Goal
a. Get niches defined across multiple patient samples
b. Figure out how many k-means, aka niches, to use
c. Quantify niche content in and around structures of interest
Background
I have single cell multiplex IF data that I pushed into a Seurat Object.
Specifically, I have per-cell mean fluorescence intensity (MFI), xy locations, and phenotype. I figure that I don't need to process the MFI data to get clusters or anything; I already have the phenotype from a different source and can add it to the Seurat Object metadata.
Here's some non-reproducible code, to get the idea.
# init Seurat Object
# create FOV
fov = SeuratObject::CreateFOV(
"coords" = coord.df # rows = cells, 2 columns for x and y location in um
, "type" = "centroids"
, "assay" = "Spatial"
)
# add FOV to SeuratObject
SO@images$whole_image = fov # whole_image is the FOV name
# add cell metadata, which includes a phenotype column
SO[[]] = colData
# change cell Identities to phenotype
Idents(SO) = "phenotype"
Then I can use BuildNicheAssay on 1 sample to get niches.
SO = BuildNicheAssay(
"object" = SO
, "fov" = "whole_image"
, "group.by" = "phenotype"
, "assay" = "niche" # Name for spatial neighborhoods assay
, "cluster.name" = "niches" # Name of output clusters. Need different for each
, "neighbors.k" = 20 # Number of neighbors to consider for each cell
, "niches.k" = 4 # Number of clusters to return based on the niche assay
)
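For anyone wondering what the niche assay holds, here is a minimal base-R sketch of the idea as I understand it: count the phenotypes among each cell's k nearest spatial neighbors, then run k-means on those counts. The toy coordinates, the phenotype labels, and the helper name `neighbor_counts` are made up for illustration; this is not Seurat's internal code.

```r
# Toy data: 60 cells with random xy positions (um) and 3 made-up phenotypes
set.seed(1)
n <- 60
coords <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
phenotype <- factor(sample(c("Tcell", "Bcell", "Tumor"), n, replace = TRUE))

# Hypothetical helper: per cell, count phenotypes among its k nearest neighbors
neighbor_counts <- function(coords, labels, k) {
  d <- as.matrix(dist(coords))           # pairwise Euclidean distances
  onehot <- model.matrix(~ labels - 1)   # cells x phenotypes indicator matrix
  colnames(onehot) <- levels(labels)
  t(sapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]      # k closest cells (includes the cell itself)
    colSums(onehot[nn, , drop = FALSE])
  }))
}

niche_mtx <- neighbor_counts(coords, phenotype, k = 10)

# Cluster cells by neighborhood composition; 3 niches is an arbitrary choice here
niches <- kmeans(niche_mtx, centers = 3, nstart = 10)$cluster
table(niches)
```

Each row of `niche_mtx` sums to k, so the clustering is driven purely by the mix of phenotypes around a cell, not by its own marker intensities.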
questions
How to choose how many niches? The classic "how many k to choose" question, but maybe somebody already has an elegant feature built into Seurat for this.
I'm interested in niches specifically around rare structures in the images. Before I figured out how to get the data into Seurat and build niches here, I did a very similar analysis with kmeans clustering by hand on the whole slides. I think that the biology from the whole slide is swamping out any rare niches that may occur near my rare structures. So should I filter to cells in and around those structures and only build niches from those cells?
I have 20 samples from different patients. How do I build niches with the same niches assigned across all the samples?
I could only find one vignette that mentions BuildNicheAssay: https://satijalab.org/seurat/articles/seurat5_spatial_vignette_2#mouse-brain-vizgen-merscope
Seurat V5 is wonderful! Thanks so much!
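On the rare-structure question, one option I'm considering is to keep only cells within some radius of a structure. A rough sketch, assuming I have structure centroids as xy coordinates; `structure_xy`, the 50 um radius, and the helper `cells_near` are placeholders, not anything from Seurat.

```r
# Toy cell centroids and two made-up structure centroids (all in um)
set.seed(2)
cell_xy <- cbind(x = runif(500, 0, 1000), y = runif(500, 0, 1000))
rownames(cell_xy) <- paste0("cell_", seq_len(nrow(cell_xy)))
structure_xy <- rbind(c(x = 200, y = 300), c(x = 750, y = 800))

# Hypothetical helper: names of cells within `radius` of any structure centroid
cells_near <- function(cell_xy, structure_xy, radius) {
  near <- rep(FALSE, nrow(cell_xy))
  for (s in seq_len(nrow(structure_xy))) {
    dx <- cell_xy[, "x"] - structure_xy[s, "x"]
    dy <- cell_xy[, "y"] - structure_xy[s, "y"]
    near <- near | (sqrt(dx^2 + dy^2) <= radius)
  }
  rownames(cell_xy)[near]
}

keep <- cells_near(cell_xy, structure_xy, radius = 50)
length(keep)
# The kept names could then be used to subset the object, e.g. subset(SO, cells = keep)
```

If I filter before finding neighbors, cells at the edge of the radius lose part of their true neighborhood, so it may be safer to compute the neighbor counts on the whole slide and apply the filter only before the k-means step.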
Edit: add a hack to perform kmeans clustering on multiple samples
this is supposed to be a nicely indented/spaced function, but I haven't figured out how to format it correctly
Description: Use the BuildNicheAssay function as a backbone. Hack it to take a list of SeuratObjects and perform ClusterR::MiniBatchKmeans on the selected FOVs. Returns the list of SeuratObjects, but now the metadata contains cluster IDs for the provided range of K. ClusterR::MiniBatchKmeans is good when you have giant datasets to do kmeans on. This function may poop out on you if there are too many input cells.
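One thing to watch when stacking samples: the function below builds the phenotype columns per sample, so a sample that is missing a phenotype ends up with fewer columns, and `rbind` will either fail or silently misalign them. A small sketch of padding every sample to a shared column set; `align_columns` and the toy matrices are made up for illustration.

```r
# Two toy cells x phenotypes count matrices; sample B has no "Tumor" cells
mtx_a <- matrix(c(3, 1, 2, 0, 4, 1), nrow = 2,
                dimnames = list(c("A_cell1", "A_cell2"), c("Bcell", "Tcell", "Tumor")))
mtx_b <- matrix(c(5, 2, 1, 4), nrow = 2,
                dimnames = list(c("B_cell1", "B_cell2"), c("Bcell", "Tcell")))

# Hypothetical helper: pad each matrix with zero columns so all share one column set
align_columns <- function(mtx_list) {
  all_cols <- sort(unique(unlist(lapply(mtx_list, colnames))))
  lapply(mtx_list, function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(all_cols),
                  dimnames = list(rownames(m), all_cols))
    out[, colnames(m)] <- m   # copy the columns this sample actually has
    out
  })
}

stacked <- do.call(rbind, align_columns(list(mtx_a, mtx_b)))
stacked
```

Padding with zeros is a natural fit for raw neighbor counts; whether it is still appropriate after per-sample scaling is something I have not checked.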
`#' Construct an assay for spatial niche analysis
#'
#' This function will construct a new assay where each feature is a
#' cell label. The values represent the sum of a particular cell label
#' neighboring a given cell.
#'
#' @param list.object list of Seurat objects to do clustering on
#' @param list.fov list of fov names to use for grabbing cells to cluster from list.object. Should be the same length as list.object
#' @param group.by Cell classifications to count in spatial neighborhood
#' @param assay Name for spatial neighborhoods assay
#' @param cluster.name Name of output clusters
#' @param neighbors.k Number of neighbors to consider for each cell
#' @param niches.k.range Range of cluster numbers (k) to try on the niche assay; one metadata column is added per k
#' @param batch_size Size of each mini-batch for ClusterR::MiniBatchKmeans
#' @param num_init Number of times the algorithm will be run with different centroid seeds for ClusterR::MiniBatchKmeans
#'
#' @return List of Seurat objects, each with a new niche assay and one cluster column per k in its metadata
#' @concept clustering
#' @export
#'
BuildNicheAssay.multiple_FOVs.MiniBatchKmeans <- function(
list.object,
list.fov,
group.by,
assay = "niche",
cluster.name = "niches",
neighbors.k = 20,
niches.k.range = 2:30 ,
batch_size = 20,
num_init = 20
) {
# check for fov in sample set
# remove if not found in object
remove = NULL # init list of indices to remove
for ( i in seq_along(list.object) ){ # message(i)
# get object and fov for each object
object = list.object[[i]]
fov = list.fov[[i]]
if( !( fov %in% names(object@images) ) ){ # base R has no %!in% operator
warning( "fov is not found in the i-th object. Removing the object from the list.object and list.fov. i =", i)
remove = c(remove, i)
}
}
for (i in rev(remove) ){
list.object[[i]] = NULL
list.fov[[i]] = NULL
}
for ( i in seq_along(list.object) ){ message(i)
# get object and fov for each object
object = list.object[[i]]
fov = list.fov[[i]]
# initialize an empty cells x groups binary matrix
cells <- Cells( object[[fov]] )
group.labels <- unlist(object[[group.by]][cells, ] )
groups <- sort( unique(group.labels) )
cell.type.mtx <- matrix(
"data" = 0
, "nrow" = length(cells)
, "ncol" = length(groups)
)
rownames(cell.type.mtx) <- cells
colnames(cell.type.mtx) <- groups
# populate the binary matrix
cells.idx <- seq_along(cells)
group.idx <- match(group.labels, groups)
cell.type.mtx[cbind(cells.idx, group.idx)] <- 1
# find neighbors based on tissue position
coords <- Seurat::GetTissueCoordinates( object[[fov]], "which" = "centroids" )
rownames(coords) <- coords[["cell"]]
coords <- as.matrix(coords[ , c("x", "y")])
neighbors <- Seurat::FindNeighbors(
"object" = coords
, "k.param" = neighbors.k # Defines k for the k-nearest neighbor algorithm
, "compute.SNN" = F
)
# create niche assay
sum.mtx <- as.matrix( neighbors[["nn"]] %*% cell.type.mtx )
niche.assay <- CreateAssayObject( "counts" = t(sum.mtx) )
object[[assay]] <- niche.assay
DefaultAssay(object) <- assay
# scale data
object <- ScaleData(object)
# return edited object to list
list.object[[i]] = object
}
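# get aggregate data for ClusterR::MiniBatchKmeans
# columns = features (cell labels)
# rows = cells (cell names must be unique across samples)
# matrix entries = scaled neighbor counts
DAT = lapply( seq_along(list.object), function(i){
t( list.object[[i]][[assay]]@scale.data )
} )
# rbind() stacks by column position, not by name, so every sample
# must have the same cell-label columns in the same order
stopifnot( length( unique( lapply(DAT, colnames) ) ) == 1 )
DAT <- do.call("rbind", DAT)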
res.clusters = data.frame(row.names = rownames(DAT))
for ( k in niches.k.range ){ message("k=", k)
# new column name
newCol = paste0("kmeans_", k)
# get centroids
km_mb = ClusterR::MiniBatchKmeans(
"data" = DAT
, "clusters" = k # the number of clusters
, "batch_size" = batch_size # the size of the mini batches
, "num_init" = num_init # number of times the algorithm will be run with different centroid seeds
, "max_iters" = 100 # the maximum number of clustering iterations.
, "init_fraction" = 0.2 # percentage of data to use for the initialization centroids (applies if initializer is kmeans++ or optimal_init). Should be a float number between 0.0 and 1.0.
, "initializer" = "kmeans++" # the method of initialization. One of, optimal_init, quantile_init, kmeans++ and random. See details for more information
, "early_stop_iter" = 10 # continue that many iterations after calculation of the best within-cluster-sum-of-squared-error
, "verbose" = F
, "CENTROIDS" = NULL
, "tol" = 1e-04
, "tol_optimal_init" = 0.3
, "seed" = 1
)
# use centroids to get clusters
res.clusters[,newCol] = ClusterR::predict_MBatchKMeans( # This function takes the data and the output centroids and returns the clusters.
"data" = DAT
, "CENTROIDS" = km_mb$centroids
)
res.clusters[,newCol] = as.factor( res.clusters[,newCol] ) # change clusters to factors
}
# get clusters back onto the objects
colnames(res.clusters) = paste0(cluster.name,".", colnames(res.clusters))
for ( i in seq_along(list.object) ){ message(i)
# get object and fov for each object
object = list.object[[i]]
# get clusters in correct cell row order into metadata of object
object[[]] = res.clusters[rownames(object[[]]), , drop = FALSE] # drop = FALSE keeps a data.frame when only one k is given
# return edited object to list
list.object[[i]] = object
}
return(list.object)
}
`
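Since the function returns cluster IDs for a whole range of K, I still need a way to compare them. A common heuristic, not a Seurat feature, is to plot the total within-cluster sum of squares against K and look for an elbow. A sketch with `stats::kmeans` on toy data; `toy` stands in for the stacked, scaled niche matrix (`DAT` above), and the range `2:8` is arbitrary.

```r
# Toy stand-in for the stacked niche matrix: 3 well-separated groups in 4 features
set.seed(3)
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 4),
  matrix(rnorm(200, mean = 5), ncol = 4),
  matrix(rnorm(200, mean = 10), ncol = 4)
)

k_range <- 2:8

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_range, function(k) {
  kmeans(toy, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})
names(wss) <- k_range

# Look for the k after which the curve flattens
plot(k_range, wss, type = "b",
     xlab = "k (number of niches)",
     ylab = "Total within-cluster sum of squares")
```

On real niche data the elbow is usually much less obvious than on this toy example, so I would treat it as a starting point and check whether the resulting niches make biological sense.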