BuildNicheAssay on multiple samples, get niches named the same across samples, and filtering cells before building niches #9534

grindhej · 2024-12-06T21:21:05Z

I know the formatting looks crazy. I can't figure out the syntax.

Goal

a. Get niches defined across multiple patient samples
b. Figure out how many k-means, aka niches, to use
c. Quantify niche content in and around structures of interest

Background

I have single-cell multiplex IF data that I pushed into a Seurat object.
Specifically, I have per-cell mean fluorescence intensity (MFI), xy locations, and phenotype. I figure that I don't need to process the MFI data to get clusters or anything; I already have the phenotypes from a different source and can add those to the Seurat object metadata.

Here's some non-reproducible code, to get the idea.

init Seurat Object

SO = CreateSeuratObject(
"counts" = data # columns = cells, rows = cell types, values = MFI (mean fluorescence intensity)
, "assay"= "Spatial"
)

create FOV

fov = SeuratObject::CreateFOV(
"coords" = coord.df # rows = cells, 2 columns for x and y location in µm
, "type" = "centroids"
, "assay" = "Spatial"
)

add FOV to SeuratObject

SO[["whole_image"]] = fov # whole_image is the FOV name

add cell metadata, which includes a phenotype column

SO[[]] = colData

change cell Identities to phenotype

Idents(SO) = "phenotype"

Then I can use BuildNicheAssay on 1 sample to get niches.

SO = BuildNicheAssay(
"object" = SO
, "fov" = "whole_image"
, "group.by" = "phenotype"
, "assay" = "niche" # Name for spatial neighborhoods assay
, "cluster.name" = "niches" # Name of output clusters. Need different for each
, "neighbors.k" = 20 # Number of neighbors to consider for each cell
, "niches.k" = 4 # Number of clusters to return based on the niche assay
)
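
For readers unfamiliar with what the niche assay holds, here is a minimal base R sketch of the idea as I understand it: for every cell, count how many of its k nearest spatial neighbors carry each phenotype label. The toy coordinates and labels are made up, the neighbor search is brute force, and the cell itself is excluded from its own neighborhood, which is an assumption of this sketch and may not match what Seurat does internally.

```r
# Toy data: 6 cells with x/y centroids and a phenotype label (made-up values).
coords <- matrix(
  c(0, 0,
    1, 0,
    0, 1,
    10, 10,
    11, 10,
    10, 11),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), c("x", "y"))
)
phenotype <- c("T", "T", "B", "Tumor", "Tumor", "B")
k <- 2  # neighbors per cell, excluding the cell itself (assumption)

# One-hot matrix: cells x phenotypes.
groups <- sort(unique(phenotype))
one_hot <- matrix(0, nrow(coords), length(groups),
                  dimnames = list(rownames(coords), groups))
one_hot[cbind(seq_along(phenotype), match(phenotype, groups))] <- 1

# Brute-force k nearest neighbors from pairwise Euclidean distances.
d <- as.matrix(dist(coords))
diag(d) <- Inf  # never pick a cell as its own neighbor
nn_idx <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Neighborhood composition: phenotype counts among each cell's neighbors.
niche_counts <- t(apply(nn_idx, 1, function(idx) {
  colSums(one_hot[idx, , drop = FALSE])
}))
print(niche_counts)
```

Each row sums to k, and k-means on these rows (or a scaled version of them) is what groups cells into niches.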

Questions

a. How do I choose how many niches? It's the classic "how many k to choose" question, but maybe somebody already has an elegant feature built into Seurat for this.
b. I'm interested in niches specifically around rare structures in the images. Before I figured out how to get the data into Seurat and build niches here, I did a very similar analysis with k-means clustering by hand on the whole slides. I think that the biology from the whole slide is swamping out any rare niches that may occur near my rare structures. So should I filter to cells in and around those structures and only build niches from those cells?
c. I have 20 samples from different patients. How do I build niches so that the same niches are assigned across all the samples?
d. I could only find one vignette that mentions BuildNicheAssay:
https://satijalab.org/seurat/articles/seurat5_spatial_vignette_2#mouse-brain-vizgen-merscope
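
On the "how many k" question, one generic option that does not depend on Seurat is an elbow check: run k-means over a range of k and look at where the total within-cluster sum of squares stops dropping much. The sketch below uses `stats::kmeans` on a simulated neighborhood-composition matrix; the data, the three niche profiles, and the range of k are all made up for illustration.

```r
set.seed(1)

# Simulated neighborhood-composition matrix: 300 cells x 3 phenotypes,
# drawn from three made-up niche profiles (hypothetical data).
profiles <- rbind(c(15, 3, 2), c(2, 15, 3), c(3, 2, 15))
niche_mat <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rpois(100 * 3, lambda = rep(profiles[i, ], each = 100)), ncol = 3)
}))

# Total within-cluster sum of squares for each candidate k.
k_range <- 1:8
wss <- sapply(k_range, function(k) {
  kmeans(niche_mat, centers = k, nstart = 10)$tot.withinss
})

# Relative drop in WSS when going from k - 1 to k; a small drop suggests
# that the extra cluster adds little.
rel_drop <- -diff(wss) / head(wss, -1)
elbow <- data.frame(k = k_range[-1], wss = wss[-1], rel_drop = round(rel_drop, 3))
print(elbow)
```

The elbow is a heuristic, not a test, so it is worth checking that the niches at the chosen k also make biological sense.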

Seurat V5 is wonderful! Thanks so much!

Edit: add a hack to perform kmeans clustering on multiple samples

this is supposed to be a nicely indented/spaced function, but I haven't figured out how to format it correctly

Description: Use the BuildNicheAssay function as a backbone. Hack it to take a list of Seurat objects and run ClusterR::MiniBatchKmeans on the selected FOV of each. It returns the list of Seurat objects, with cluster IDs for each k in the provided range added to the metadata. ClusterR::MiniBatchKmeans is good when you have giant datasets to run k-means on. This function may still give out on you if there are too many input cells.

`#' Construct an assay for spatial niche analysis across multiple objects
#'
#' This function constructs a new assay in each object where each feature is
#' a cell label. The values represent the number of cells with a particular
#' label among the neighbors of a given cell. The scaled niche assays are
#' then pooled and clustered with ClusterR::MiniBatchKmeans.
#'
#' @param list.object List of Seurat objects to do clustering on
#' @param list.fov List of fov names to use for grabbing cells to cluster from list.object. Should be the same length as list.object
#' @param group.by Cell classifications to count in spatial neighborhood
#' @param assay Name for spatial neighborhoods assay
#' @param cluster.name Name of output clusters
#' @param neighbors.k Number of neighbors to consider for each cell
#' @param niches.k.range Range of cluster numbers (k) to try; one metadata column is returned per k
#' @param batch_size Size of the mini-batches for ClusterR::MiniBatchKmeans
#' @param num_init Number of times the algorithm will be run with different centroid seeds for ClusterR::MiniBatchKmeans
#'
#' @return List of Seurat objects, each containing a new assay and the cluster IDs in its metadata
#' @concept clustering
#' @export
#'
BuildNicheAssay.multiple_FOVs.MiniBatchKmeans <- function(
    list.object,
    list.fov,
    group.by,
    assay = "niche",
    cluster.name = "niches",
    neighbors.k = 20,
    niches.k.range = 2:30,
    batch_size = 20,
    num_init = 20
) {
# check for fov in sample set
# remove if not found in object
remove = NULL # init vector of indices to remove
for ( i in seq_along(list.object) ){ # message(i)
    # get object and fov for each object
    object = list.object[[i]]
    fov = list.fov[[i]]

    # `%!in%` is not defined in base R, so negate `%in%` instead
    if( !( fov %in% names(object@images) ) ){
        warning( "fov is not found in the i-th object.  Removing the object from list.object and list.fov.  i = ", i)
        remove = c(remove, i)
    }
}
for (i in rev(remove) ){
    list.object[[i]] = NULL
    list.fov[[i]] = NULL
}


for ( i in seq_along(list.object) ){ message(i)
    # get object and fov for each object
    object = list.object[[i]]
    fov = list.fov[[i]]
    
    # initialize an empty cells x groups binary matrix
    cells <- Cells( object[[fov]] )
    group.labels <- unlist(object[[group.by]][cells, ] )
    groups <- sort( unique(group.labels) )
    cell.type.mtx <- matrix(
        "data" = 0
        , "nrow" = length(cells)
        , "ncol" = length(groups)
    )
    rownames(cell.type.mtx) <- cells
    colnames(cell.type.mtx) <- groups

    # populate the binary matrix 
    cells.idx <- seq_along(cells)
    group.idx <- match(group.labels, groups)
    cell.type.mtx[cbind(cells.idx, group.idx)] <- 1
    
    # find neighbors based on tissue position
    coords <- Seurat::GetTissueCoordinates( object[[fov]], "which" = "centroids" )
    rownames(coords) <- coords[["cell"]]
    coords <- as.matrix(coords[ , c("x", "y")])
    neighbors <- Seurat::FindNeighbors(
        "object" = coords
        , "k.param" = neighbors.k # Defines k for the k-nearest neighbor algorithm
        , "compute.SNN" = FALSE
    )
    
    # create niche assay
    sum.mtx <- as.matrix( neighbors[["nn"]] %*% cell.type.mtx )
    niche.assay <- CreateAssayObject( "counts" = t(sum.mtx) )
    object[[assay]] <- niche.assay
    DefaultAssay(object) <- assay
    
    # scale data 
    object <- ScaleData(object)
    
    # return edited object to list
    list.object[[i]] = object
    
    
}

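# get aggregate data for ClusterR::MiniBatchKmeans
# columns = features (cell labels)
# rows = cells
# values = scaled neighbor counts
# note: rbind assumes every object has the same features in the same order,
# and the data.frame below needs cell names that are unique across objects
DAT = lapply( seq_along(list.object), function(i){
    t( list.object[[i]][[assay]]@scale.data )
}  )
DAT <- do.call("rbind", DAT)



res.clusters = data.frame(row.names = rownames(DAT))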

for ( k in niches.k.range ){ message("k=", k)
    # new column name
    newCol = paste0("kmeans_", k)
    # get centroids
    km_mb = ClusterR::MiniBatchKmeans(
        "data" = DAT
        , "clusters" = k # the number of clusters
        , "batch_size" = batch_size # the size of the mini batches
        , "num_init" = num_init # number of times the algorithm will be run with different centroid seeds
        , "max_iters" = 100 # the maximum number of clustering iterations. 
        , "init_fraction" = 0.2 # percentage of data to use for the initialization centroids (applies if initializer is kmeans++ or optimal_init). Should be a float number between 0.0 and 1.0.
        , "initializer" = "kmeans++" # the method of initialization. One of, optimal_init, quantile_init, kmeans++ and random. See details for more information
        , "early_stop_iter" = 10 # continue that many iterations after calculation of the best within-cluster-sum-of-squared-error
        , "verbose" = FALSE
        , "CENTROIDS" = NULL
        , "tol" = 1e-04
        , "tol_optimal_init" = 0.3
        , "seed" = 1
    )
    
    # use centroids to get clusters

    res.clusters[,newCol] = ClusterR::predict_MBatchKMeans( # This function takes the data and the output centroids and returns the clusters.
        "data" = DAT
        , "CENTROIDS" = km_mb$centroids
    )
    res.clusters[,newCol] = as.factor( res.clusters[,newCol] ) # change clusters to factors
    
}

# get clusters back onto the objects
colnames(res.clusters) = paste0(cluster.name,".", colnames(res.clusters))
for ( i in seq_along(list.object) ){ message(i)
    # get object and fov for each object
    object = list.object[[i]]
    
    # get clusters in correct cell row order into metadata of object
    # drop = FALSE keeps a data.frame even when only one k was requested
    object[[]] = res.clusters[rownames(object[[]]), , drop = FALSE]
    
    # return edited object to list
    list.object[[i]] = object
}


return(list.object)

}

`
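
One caveat with pooling: `rbind` only lines up if every sample has the same phenotype columns in the same order, and a rare phenotype may be missing from some samples. Below is a minimal base R sketch of one way to handle that, by padding each per-sample matrix to the union of phenotypes before pooling. `pad_to_groups` is a hypothetical helper, not part of Seurat, the toy matrices are made up, and `stats::kmeans` stands in for `ClusterR::MiniBatchKmeans` to keep the example dependency-free.

```r
# Hypothetical helper (not part of Seurat): pad a cells x phenotypes matrix
# with zero columns so that it has exactly the columns in `all_groups`.
pad_to_groups <- function(mat, all_groups) {
  out <- matrix(0, nrow(mat), length(all_groups),
                dimnames = list(rownames(mat), all_groups))
  out[, colnames(mat)] <- mat
  out
}

set.seed(1)
# Toy per-sample neighborhood counts; sample B has no "Tumor" cells at all.
sample_a <- matrix(rpois(40 * 3, 5), ncol = 3,
                   dimnames = list(paste0("A_cell", 1:40), c("B", "T", "Tumor")))
sample_b <- matrix(rpois(30 * 2, 5), ncol = 2,
                   dimnames = list(paste0("B_cell", 1:30), c("B", "T")))
samples <- list(A = sample_a, B = sample_b)

# Align columns across samples, then pool all cells into one matrix.
all_groups <- sort(unique(unlist(lapply(samples, colnames))))
pooled <- do.call(rbind, unname(lapply(samples, pad_to_groups,
                                       all_groups = all_groups)))

# One k-means fit on the pooled cells, so cluster IDs mean the same thing
# in every sample.
km <- kmeans(pooled, centers = 3, nstart = 10)

# Split the cluster IDs back out per sample (named by cell).
sample_of_cell <- rep(names(samples), vapply(samples, nrow, integer(1)))
niches <- split(km$cluster, sample_of_cell)
```

The cell names carry a per-sample prefix here because the cluster IDs are mapped back to cells by name, so names must be unique across samples.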
