This repository contains the supplementary materials and code for the paper "Unification and Benchmarking of Segmentation Methods for Spatial Transcriptomics". The study evaluates cell segmentation methods for spatial transcriptomics, compares their performance, and offers recommendations for best practices in data analysis. We also provide a Nextflow framework that serves as a baseline for future benchmarking efforts. The framework is designed to be easy to use and extend, so new segmentation methods can be incorporated as they emerge in the field.
Spatial transcriptomics has emerged as a pivotal technique for understanding tissue architecture and cellular interactions. However, the rapid development of cell segmentation methods for spatial transcriptomics calls for rigorous benchmarking to help researchers select appropriate tools for their studies. This work systematically evaluates several segmentation methods using a range of performance metrics.
The following methodologies were benchmarked:
- Watershed: The watershed segmentation method uses multi-class Otsu thresholding and peak detection to delineate nuclei in spatial transcriptomics images and separate them from the background.
- Cellpose: Cellpose uses a deep learning network with a U-Net-like architecture to segment cells based on their shape and internal structure. It predicts vector (flow) fields that are used to group pixels into individual cells, and it can improve image quality through noise reduction.
- SCS: Subcellular Spatial Transcriptomics Cell Segmentation (SCS) integrates staining and transcriptomic data. It combines a traditional watershed algorithm with a transformer neural network that predicts which cell each spot belongs to, enabling detailed analyses of RNA localization.
- Baysor: Baysor combines molecular position data with optional staining using a Bayesian mixture model and a Markov Random Field, which delineates cell boundaries while keeping the segmentation spatially coherent across different tissue conditions.
- BIDCell: BIDCell is a self-supervised deep learning framework built on a UNet 3+ architecture. It uses biologically informed losses to segment cells in subcellular spatial transcriptomics data without the need for manual annotations.
- SAM: The Segment Anything Model (SAM) utilizes a Vision Transformer to perform real-time, prompt-based segmentation across diverse tasks, expanding its training dataset through a cycle of model-assisted data annotation for enhanced robustness.
- SAM2: SAM2 builds upon the original SAM framework, incorporating streaming memory and iterative prompting capabilities for effective video and image segmentation, allowing for real-time object tracking and improved accuracy across complex content.
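The watershed baseline starts from Otsu thresholding to separate nuclei from background. As an illustration of that first step only, the sketch below computes a two-class Otsu threshold in pure Python by maximizing the between-class variance over an intensity histogram. It is a simplification under stated assumptions (8-bit integer intensities, a single threshold) and is not the repository's implementation; the multi-class variant searches over several thresholds (for example `skimage.filters.threshold_multiotsu`), and the peak detection and watershed flooding steps are omitted.

```python
def otsu_threshold(values, n_bins=256):
    """Two-class Otsu threshold for integer intensities in [0, n_bins).

    Assumption for this sketch: intensities are already integers (e.g. 8-bit).
    Pixels with intensity > threshold are treated as foreground (nuclei).
    """
    hist = [0] * n_bins
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * c for i, c in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the background class (intensity <= t)
    sum0 = 0  # summed intensity of the background class
    for t in range(n_bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For example, `t = otsu_threshold(pixels)` followed by `mask = [v > t for v in pixels]` yields a binary foreground mask that a watershed step could then split into individual nuclei.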
The datasets below contain only the formatted transcript and image files; see the corresponding authors' references for the original datasets.
Dataset | Raw data link | Technology | scRNA-seq reference data used for annotation | Download data |
---|---|---|---|---|
Brain | MOSTA | Stereo-seq | Paper link | Transcripts and Image |
Breast | 10x Genomics | Xenium | Paper link | Transcripts and Image |
Embryo | MOSTA | Stereo-seq | Paper link | Transcripts and Image |
Lung | NanoString | CosMx | Paper link | Transcripts and Image |
Pancreas | NanoString | CosMx | Paper link | Transcripts and Image |
All methods were evaluated on the five datasets listed above, using metrics derived from those proposed by BIDCell. The code for the evaluation and for generating the visualizations is included in this repository.
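Simple per-cell statistics are a common starting point when comparing segmentations. The sketch below computes a few of them (number of cells, fraction of transcripts assigned to a cell, mean transcripts and genes per cell) from per-transcript assignments. The input format (one cell ID per transcript, `None` for unassigned transcripts) and the function name `assignment_summary` are hypothetical; this is not the repository's evaluation code and does not reproduce the BIDCell metrics exactly.

```python
from collections import Counter, defaultdict


def assignment_summary(cell_ids, genes):
    """Summarize a segmentation from per-transcript cell assignments.

    Hypothetical input format: cell_ids[i] is the cell that transcript i was
    assigned to (None if it lies outside every cell), genes[i] is its gene.
    """
    if len(cell_ids) != len(genes):
        raise ValueError("cell_ids and genes must have the same length")

    # Transcripts per cell, ignoring unassigned transcripts.
    counts = Counter(c for c in cell_ids if c is not None)

    # Distinct genes detected in each cell.
    genes_per_cell = defaultdict(set)
    for c, g in zip(cell_ids, genes):
        if c is not None:
            genes_per_cell[c].add(g)

    n_cells = len(counts)
    n_assigned = sum(counts.values())
    return {
        "n_cells": n_cells,
        "fraction_assigned": n_assigned / len(cell_ids) if cell_ids else 0.0,
        "mean_transcripts_per_cell": n_assigned / n_cells if n_cells else 0.0,
        "mean_genes_per_cell": (
            sum(len(s) for s in genes_per_cell.values()) / n_cells
            if n_cells else 0.0
        ),
    }
```

Computing the same summary for every method on the same dataset gives a quick first comparison before looking at morphology or expression-based metrics.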
The findings show significant differences in performance between the evaluated methods, so the choice of method depends on the specific research question and the characteristics of the data. Detailed results, including comparisons and statistical analyses, are provided in the paper.
To run the code and reproduce the results, please ensure you have the following conda environments installed:
- SCS: This conda environment is used for running SCS, Baysor and Watershed tools
- Cellpose: This conda environment is used for running Cellpose
- BIDCell: This conda environment is used for running BIDCell
- Kernel: This conda environment is used for running the evaluation
If you want to run the Nextflow pipeline instead of an individual segmentation method, make sure these environments are installed before launching the pipeline, since it uses them to run the corresponding scripts.
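Before launching the pipeline, it can help to confirm that the environments exist. The sketch below parses the output of `conda env list --json` (a JSON object whose `envs` key lists environment prefixes) and reports which required environments are missing. The environment names in `REQUIRED_ENVS` are assumptions based on the list above; adjust them to the names you used when creating the environments.

```python
import json
import os
import subprocess

# Assumed environment names; change them to match your own installation.
REQUIRED_ENVS = ["SCS", "Cellpose", "BIDCell", "Kernel"]


def missing_envs(env_list_json, required=REQUIRED_ENVS):
    """Return the required env names absent from `conda env list --json` output."""
    prefixes = json.loads(env_list_json)["envs"]
    # Each prefix ends with the environment name, e.g. /opt/conda/envs/SCS.
    installed = {os.path.basename(os.path.normpath(p)) for p in prefixes}
    return [name for name in required if name not in installed]


def check_conda_envs(required=REQUIRED_ENVS):
    """Query conda directly and return the missing environment names."""
    out = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_envs(out, required)
```

Calling `check_conda_envs()` returns an empty list when every environment is present; otherwise it lists the ones still to be created.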
Each segmentation method can also be run on its own using the tool-specific scripts provided in this repository.
The Nextflow pipeline provides a scalable and user-friendly framework for benchmarking segmentation methods. Extending it is straightforward: create a new entry for the tool you want to benchmark and integrate it into the Nextflow workflow.
Running the Nextflow pipeline is also simple; follow these steps:
- Nextflow directory: Clone this repository to obtain the Nextflow directory it provides.
- Change output directory: In the main.nf file, set the output directory where the results should be stored.
- Add input data: Add the two required files, the transcripts file and the image, to the input_data folder.
- Run the pipeline: Run the following command:
$ nextflow run .../nextflow/main.nf
The pipeline preprocesses the input data, splits it into patches, and runs the added segmentation tools.
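Patch generation amounts to tiling the image into windows that together cover every pixel. The sketch below shows one common way to compute overlapping tile coordinates; the function name `patch_windows` and its `patch_size` and `overlap` parameters are hypothetical and not taken from the pipeline, which may choose patch sizes and borders differently.

```python
def patch_windows(height, width, patch_size, overlap=0):
    """Yield (row0, col0, row1, col1) windows covering a height x width image.

    Hypothetical helper: consecutive windows overlap by `overlap` pixels, and
    the last window along each axis is shifted back so it ends at the border.
    """
    if patch_size <= overlap:
        raise ValueError("patch_size must be larger than overlap")
    step = patch_size - overlap

    def starts(length):
        # A single window suffices when the image is no larger than a patch.
        if length <= patch_size:
            return [0]
        s = list(range(0, length - patch_size + 1, step))
        if s[-1] + patch_size < length:
            s.append(length - patch_size)  # cover the remaining border strip
        return s

    for r in starts(height):
        for c in starts(width):
            yield r, c, min(r + patch_size, height), min(c + patch_size, width)
```

Transcripts can then be routed to a patch by checking whether their pixel coordinates fall inside its window; the overlap lets cells cut by a patch border be reconciled when the per-patch results are merged.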
Contributions to improve this repository are welcome! If you have suggestions or improvements, please open an issue or submit a pull request.