64 changes: 17 additions & 47 deletions README.md
@@ -19,65 +19,41 @@ The method is described in the paper: [SpatialFusion: A lightweight multimodal f
You can find detailed documentation at https://uhlerlab.github.io/spatialfusion/

---
## Prepare SpatialFusion inputs
Before running SpatialFusion, you need to generate unimodal embeddings from:
- spatial transcriptomics data → using **scGPT**
- H&E / whole-slide images → using **UNI**

We provide two ways to generate these, detailed on our [documentation website](https://uhlerlab.github.io/spatialfusion/unimodal-embeddings/).
## Installation

We provide pretrained weights for the **multimodal autoencoder (AE)** and **graph convolutional masked autoencoder (GCN)** under `data/`.

SpatialFusion depends on **PyTorch** and **DGL**, which have different builds for CPU and GPU systems. You can install it using **pip** or inside a **conda/mamba** environment.

---

### 1. Create mamba environment
### 1. Create virtual environment

```bash
mamba create -n spatialfusion python=3.10 -y
mamba activate spatialfusion
# Then install GPU or CPU version below
```

### 2. Install platform-specific libraries (GPU vs CPU)

#### GPU (CUDA 12.4)

```bash
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cu124
conda install -c dglteam/label/th24_cu124 dgl
```

**Note:** TorchText issues exist for this version:
[https://github.com/pytorch/text/issues/2272](https://github.com/pytorch/text/issues/2272) — this may affect scGPT.
### 2. Install platform-specific libraries

---
SpatialFusion depends on PyTorch and DGL, which have different builds for CPU and GPU systems.

#### GPU (CUDA 12.1) — *Recommended if using scGPT*
#### CPU

```bash
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 \
--index-url https://download.pytorch.org/whl/cu121
conda install -c dglteam/label/th21_cu121 dgl

# Optional: embeddings used by scGPT
pip install --no-cache-dir torchtext==0.18.0 torchdata==0.9.0
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cpu

# Optional: UNI (H&E embedding model)
pip install timm
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/repo.html
```
---

#### CPU-only
#### GPU (CUDA 12.4)

```bash
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cpu
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/repo.html

# Optional, used for scGPT
pip install --no-cache-dir torchtext==0.18.0 torchdata==0.9.0
--index-url https://download.pytorch.org/whl/cu124

# Optional, used for UNI
pip install timm
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
```
---

@@ -93,12 +69,9 @@ Includes: `pytest`, `black`, `ruff`, `sphinx`, `matplotlib`, `seaborn`.

```bash
git clone https://github.com/uhlerlab/spatialfusion.git

cd spatialfusion/
pip install -e .
```

```bash
# Optional contributor extras
pip install -e ".[dev,docs]"
```

@@ -278,10 +251,7 @@ Tutorial data is available on Zenodo:

If you use SpatialFusion, please cite:

> Broad Institute Spatial Foundation, *SpatialFusion* (2025).
> [https://github.com/broadinstitute/spatialfusion](https://github.com/broadinstitute/spatialfusion)

Full manuscript citation will be added when available.
> Yates J, Shavakhi M, Choueiri T, Van Allen EM, Uhler C. SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping. _bioRxiv_. 2026. doi:10.64898/2026.03.16.712056

---

2 changes: 1 addition & 1 deletion docs/index.md
@@ -7,7 +7,7 @@
</figcaption>
</figure>

This method is described in the paper (TBD).
The method is described in the paper: [SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping](https://doi.org/10.64898/2026.03.16.712056).

**SpatialFusion** is a lightweight foundation model designed to find niches in tissue using a lower-dimensional embedding. It integrates spatial transcriptomics data with histopathology-derived image features into a shared latent representation, and can be applied to paired spatial transcriptomics and whole-slide images, or to whole-slide images only.

59 changes: 15 additions & 44 deletions docs/installation.md
@@ -1,81 +1,52 @@
# Installation

We provide pretrained weights for the **multimodal autoencoder (AE)** and **graph convolutional masked autoencoder (GCN)** under `data/`.

SpatialFusion depends on **PyTorch** and **DGL**, which have different builds for CPU and GPU systems. You can install it using **pip** or inside a **conda/mamba** environment.

---

### 1. Create mamba environment
### 1. Create virtual environment

```bash
mamba create -n spatialfusion python=3.10 -y
mamba activate spatialfusion
# Then install GPU or CPU version below
```

### 2. Install platform-specific libraries (GPU vs CPU)
### 2. Install platform-specific libraries

#### GPU (CUDA 12.4)
SpatialFusion depends on PyTorch and DGL, which have different builds for CPU and GPU systems.

```bash
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cu124
conda install -c dglteam/label/th24_cu124 dgl
```

**Note:** TorchText issues exist for this version:
[https://github.com/pytorch/text/issues/2272](https://github.com/pytorch/text/issues/2272) — this may affect scGPT.

---

#### GPU (CUDA 12.1) — *Recommended if using scGPT*
#### CPU

```bash
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 \
--index-url https://download.pytorch.org/whl/cu121
conda install -c dglteam/label/th21_cu121 dgl

# Optional: embeddings used by scGPT
pip install --no-cache-dir torchtext==0.18.0 torchdata==0.9.0
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cpu

# Optional: UNI (H&E embedding model)
pip install timm
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/repo.html
```

---

#### CPU-only
#### GPU (CUDA 12.4)

```bash
pip install "torch==2.4.1" "torchvision==0.19.1" \
--index-url https://download.pytorch.org/whl/cpu
conda install -c dglteam -c conda-forge dgl

# Optional, used for scGPT
pip install --no-cache-dir torchtext==0.18.0 torchdata==0.9.0
--index-url https://download.pytorch.org/whl/cu124

# Optional, used for UNI
pip install timm
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
```

> 💡 Replace `cu124` with the CUDA version matching your system (e.g., `cu121`).

---

### 3. Install SpatialFusion package

#### Basic installation — *Recommended for users*
```bash
cd spatialfusion/
pip install -e .
pip install spatialfusion
```
---
#### Developer installation - *Recommended for contributors*
#### Install from source - *Recommended for contributors*
Includes: `pytest`, `black`, `ruff`, `sphinx`, `matplotlib`, `seaborn`.

```bash
git clone https://github.com/uhlerlab/spatialfusion.git

cd spatialfusion/

pip install -e ".[dev,docs]"
```

114 changes: 114 additions & 0 deletions docs/unimodal-embeddings.md
@@ -0,0 +1,114 @@
# Generate SpatialFusion inputs

## Overview

Before running SpatialFusion, you need to generate unimodal embeddings from:

- spatial transcriptomics data → using **scGPT**
- H&E / whole-slide images → using **UNI**


This step requires a GPU to run efficiently, and we provide two ways to run it.

## Which workflow should I choose?

### WDL workflow
Best if you:

- do not have access to a GPU
- use a platform like Terra

Launch via Dockstore:
<https://dockstore.org/workflows/github.com/uhlerlab/spatialfusion/unimodal-embeddings-for-spatialfusion:main?tab=info>

### Local / self-managed GPU workflow (this guide)

Best if you:

- have access to a GPU machine

---


The remainder of this guide covers the **local / self-managed GPU workflow**.

## 1. Requirements

Before running this step, you will need:

- a GPU-enabled machine (tested with NVIDIA Tesla T4)
- Docker installed


## 2. Gather the required files

Your inputs should include:

- `adata`: AnnData (`.h5ad`) used for scGPT embeddings and for the spatial coordinates consumed by UNI. Spatial coordinates are expected in `adata.obsm["spatial"]`.
- `wsi`: whole-slide image / H&E TIFF used to generate UNI image embeddings. TIFF / OME-TIFF format is expected.
- `scgpt_weights`: a directory containing `best_model.pt`, `args.json`, and `vocab.json`.
- Download from <https://doi.org/10.6084/m9.figshare.24747228>
- `uni_weights`: the UNI model weights file `pytorch_model.bin`.
- Request access and download from Mahmood Lab at <https://huggingface.co/MahmoodLab/UNI2-h>
- `input_is_log_normalized`: decide whether your AnnData expression values are already log-normalized. You will pass `True` if they are already log-normalized and `False` if they are not.
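If you are unsure whether your expression matrix is already log-normalized, a rough heuristic can help. The sketch below uses toy matrices in place of `adata.X`; the threshold of 20 and the integer-valuedness check are assumptions for illustration, not an official rule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrices standing in for adata.X (hypothetical data, not real inputs)
raw_counts = rng.poisson(50, size=(100, 200)).astype(float)
log_norm = np.log1p(raw_counts / raw_counts.sum(axis=1, keepdims=True) * 1e4)

def looks_log_normalized(x, threshold=20.0):
    """Heuristic: log-normalized values are small floats; raw counts are larger integers."""
    return bool(x.max() < threshold) and not np.allclose(x, np.round(x))

print(looks_log_normalized(raw_counts))  # False -> pass input_is_log_normalized=False
print(looks_log_normalized(log_norm))    # True  -> pass input_is_log_normalized=True
```

When in doubt, check how the `.h5ad` was produced: a standard scanpy pipeline (`sc.pp.normalize_total` followed by `sc.pp.log1p`) yields log-normalized values.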


## 3. Set local paths

Pull the public Docker image:

```bash
docker pull vanallenlab/unimodal-embeddings:v0.1
```

Set local path variables (absolute paths):

```bash
ADATA=/absolute/path/to/object.h5ad
WSI=/absolute/path/to/image.ome.tif
SCGPT_WEIGHTS_DIR=/absolute/path/to/scgpt
UNI_WEIGHTS=/absolute/path/to/pytorch_model.bin
OUTPUT_DIR=/absolute/path/to/output
# Depends on your data
LOG_NORM="False"
```

Notes:

- `SCGPT_WEIGHTS_DIR` should point to a directory containing `best_model.pt`, `args.json`, and `vocab.json`.
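A quick way to catch a misconfigured weights directory before starting Docker is to list which expected files are absent. A minimal sketch (the path in the example call is a placeholder):

```python
from pathlib import Path

def missing_scgpt_files(weights_dir):
    """Return the expected scGPT weight files that are absent from weights_dir."""
    required = {"best_model.pt", "args.json", "vocab.json"}
    d = Path(weights_dir)
    present = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return sorted(required - present)

# Placeholder path; an empty list means the directory is complete
print(missing_scgpt_files("/absolute/path/to/scgpt"))
```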


## 4. Run embedding generation

```bash
docker run --rm --gpus all \
-v "$ADATA":/inputs/object.h5ad \
-v "$WSI":/inputs/image.ome.tif \
-v "$SCGPT_WEIGHTS_DIR":/weights/scgpt \
-v "$UNI_WEIGHTS":/weights/pytorch_model.bin \
-v "$OUTPUT_DIR":/out \
vanallenlab/unimodal-embeddings:v0.1 \
python /app/unimodal-embeddings.py \
--mode both \
--adata /inputs/object.h5ad \
--input-is-log-normalized "$LOG_NORM" \
--wsi /inputs/image.ome.tif \
--output-dir /out \
--scgpt-weights /weights/scgpt \
--uni-weights /weights/pytorch_model.bin
```

## 5. Expected outputs
After successful execution, you should see:

```
$OUTPUT_DIR/
├── scGPT.parquet
└── UNI.parquet
```
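Each parquet file holds one embedding row per cell/spot. A sketch of loading and aligning the two modalities — toy DataFrames stand in for the real outputs here (in practice you would call `pd.read_parquet` on the files above), and the row index and embedding dimensions are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for scGPT.parquet and UNI.parquet contents (hypothetical shapes)
cells = [f"cell_{i}" for i in range(4)]
scgpt = pd.DataFrame(np.zeros((4, 8)), index=cells).add_prefix("scgpt_")
uni = pd.DataFrame(np.zeros((4, 16)), index=cells).add_prefix("uni_")

# In practice: scgpt = pd.read_parquet(f"{OUTPUT_DIR}/scGPT.parquet"), etc.
merged = scgpt.join(uni)  # align the two modalities on the shared cell index
print(merged.shape)  # (4, 24)
```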


## Notes
- This guide covers the most common use case with minimal inputs.
- Additional optional parameters are available; see
  [`unimodal-embeddings.py`](https://github.com/uhlerlab/spatialfusion/blob/mkdocs-update/workflows/unimodal-embeddings/scripts/unimodal-embeddings.py#L208)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -18,6 +18,7 @@ plugins:

nav:
- Home: index.md
- Prepare inputs: unimodal-embeddings.md
- Installation: installation.md
- Quick Start: quickstart.md
- Concepts: concepts.md