BiomedParse

[Notice] This is v2 of the BiomedParse model, with improved code and model architecture using BoltzFormer, supporting end-to-end 3D inference. Check v1 if you are looking for the original version.

[Paper] [Demo] [Model] [Data] [BibTeX]

This repository hosts the code and resources for BiomedParse, aka "A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities" (Nature Methods). BiomedParse is designed for comprehensive biomedical image analysis. It offers a unified approach to perform segmentation, detection, and recognition across diverse biomedical imaging modalities. By consolidating these tasks, BiomedParse provides an efficient and flexible tool tailored for researchers and practitioners, facilitating the interpretation and analysis of complex biomedical data.

What's New in v2?

Since the publication of BiomedParse, we have been continuously collecting feedback from the community and working to improve and expand its capability and usability. The v2 release provides:

Larger pretraining data at million scale covering 200+ anatomies across different modalities.
Improved segmentation performance for small objects using the BoltzFormer architecture.
SOTA 3D segmentation performance supporting end-to-end volumetric inference (CVPR Challenge).
Built-in object existence detection to suppress false positives (no separate mask checking required).

Should I use v1 or v2?

Short answer: v2 for the 3D modalities, and v1 for the rest.

Version	Image type	Modalities	# tasks	Existence detection
v1	2D	CT, MRI, Ultrasound, X-Ray, Pathology, Endoscopy, Dermoscopy, Fundus, OCT	100+	Post-inference K-S test
v2	3D	CT, MRI, Ultrasound, PET, 3D Microscopy (EM, lightsheet)	200+	Built-in ISD module

News

Oct. 15, 2025: BiomedParse v2 release is complete with full support for inference and finetuning!
Jun. 11, 2025: BiomedParse is #1 in the CVPR 2025: Foundation Models for Text-guided 3D Biomedical Image Segmentation Challenge! We upgraded our model and finetuned it on the challenge dataset, which offers wider and more comprehensive coverage of 3D biomedical imaging data. Check out our model in the containerized [docker image] for direct inference. Please acknowledge the original challenge if you use this version of the model.
Jan. 9, 2025: Refined all object recognition scripts and added a notebook with examples.
Dec. 12, 2024: Uploaded extra datasets for finetuning on [Data]. Added random rotation feature for training.
Dec. 5, 2024: The loading process of target_dist.json is optimized by automatic downloading from HuggingFace.
Dec. 3, 2024: We added inference notebook examples in inference_example_RGB.ipynb and inference_example_NIFTI.ipynb
Nov. 22, 2024: We added negative prediction p-value example in inference_example_DICOM.ipynb
Nov. 18, 2024: BiomedParse is officially online in Nature Methods!

Installation

git clone https://github.com/microsoft/BiomedParse.git

Conda Environment Setup

conda create -n biomedparse_v2 python=3.10.14
conda activate biomedparse_v2

Install dependencies

pip install -r assets/requirements/requirements.txt 

# The requirements file above assumes CUDA 12.4. Adjust it for your system/environment.

pip install azureml-automl-core
pip install opencv-python
pip install git+https://github.com/facebookresearch/detectron2.git

Model Weights

We provide model weights trained on the CVPR 2025 Text-guided 3D Segmentation Challenge dataset. Please acknowledge the original challenge if you use this version of the model. Refer to the original dataset for the necessary image preprocessing.

Option 1: HuggingFace Hub

You can download the pretrained model weights directly from the HuggingFace Hub.

First, install the required package:

pip install huggingface_hub

Then, download the checkpoint file using the HuggingFace Hub API:

from huggingface_hub import hf_hub_download

# Download the checkpoint file
file_path = hf_hub_download(
    repo_id="microsoft/BiomedParse",
    filename="biomedparse_v2.ckpt"
)

print("Model weights downloaded to:", file_path)

Option 2: Direct Download via Command Line

You can also download the file directly using wget or curl:

wget https://huggingface.co/microsoft/BiomedParse/resolve/main/biomedparse_v2.ckpt

or

curl -L -o biomedparse_v2.ckpt https://huggingface.co/microsoft/BiomedParse/resolve/main/biomedparse_v2.ckpt

💡 Note: If the repository is private, log in with your HuggingFace token using:
huggingface-cli login
before attempting to download.

Now you should have the model weights ready for use!

Model Inference

The v2 of BiomedParse supports segmentation of 3D volumes in a slice-by-slice manner, with neighboring 3D context encoded around each slice in RGB format.
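To illustrate what encoding neighboring context in RGB format can look like, here is a minimal sketch that places each slice and its two neighbors into three channels. This is an assumption made for illustration only: the function name `stack_neighbor_slices`, the choice of offsets (-1, 0, +1), and the edge padding are all hypothetical, and the repository's `process_input` and model code may encode context differently.

```python
import numpy as np


def stack_neighbor_slices(volume):
    """Turn a (D, H, W) volume into (D, 3, H, W) pseudo-RGB slices.

    Channel 0 holds the previous slice, channel 1 the current slice, and
    channel 2 the next slice. The first and last slices reuse themselves as
    their missing neighbor (edge padding). All of this is an illustrative
    assumption, not the repository's confirmed encoding.
    """
    padded = np.pad(volume, ((1, 1), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)
```

With this layout, a 2D backbone that expects three-channel images sees a thin slab of the volume around every slice instead of a single plane.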

Inference 3D Examples

import numpy as np
import torch
import torch.nn.functional as F
import hydra
from hydra import compose
from hydra.core.global_hydra import GlobalHydra
from utils import process_input, process_output, slice_nms
from inference import postprocess, merge_multiclass_masks
from skimage import segmentation
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

GlobalHydra.instance().clear()
hydra.initialize(config_path="configs/model", job_name="example_prediction")
cfg = compose(config_name="biomedparse_3D")
model = hydra.utils.instantiate(cfg, _convert_="object")
model.load_pretrained(hf_hub_download(
  repo_id="microsoft/BiomedParse", filename="biomedparse_v2.ckpt"))
model = model.to(device).eval()

# Example image and prompt
file_path = "examples/imgs/CT_AMOS_amos_0018.npz"

npz_data = np.load(file_path, allow_pickle=True)
imgs = npz_data["imgs"]
text_prompts = npz_data["text_prompts"].item()

print("Loaded image shape:", imgs.shape)
print("Text prompts:", text_prompts)

ids = [int(_) for _ in text_prompts.keys() if _ != "instance_label"]
ids.sort()
text = "[SEP]".join([text_prompts[str(i)] for i in ids])

imgs, pad_width, padded_size, valid_axis = process_input(imgs, 512)

imgs = imgs.to(device).int()

input_tensor = {
    "image": imgs.unsqueeze(0),  # Add batch dimension
    "text": [text],
}

with torch.no_grad():
    output = model(input_tensor, mode="eval", slice_batch_size=4)

mask_preds = output["predictions"]["pred_gmasks"]
mask_preds = F.interpolate(mask_preds, size=(512, 512), mode="bicubic", align_corners=False, antialias=True)

mask_preds = postprocess(mask_preds, output["predictions"]["object_existence"])
mask_preds = merge_multiclass_masks(mask_preds, ids)
mask_preds = process_output(mask_preds, pad_width, padded_size, valid_axis)
print("Processed mask shape:", mask_preds.shape)

Please refer to the inference notebook for more examples.
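The example above reads two entries from the npz file: `imgs` and `text_prompts`, where `text_prompts` is a dictionary keyed by class id strings. A minimal sketch of writing your own volume in that layout is shown below. The helper names `save_case` and `build_prompt` are hypothetical, and the axis order and dtype that `process_input` expects for `imgs` are not specified here, so treat those as assumptions to verify against the example file.

```python
import numpy as np


def save_case(path, volume, class_prompts):
    """Write a volume and its prompts under the keys the example reads.

    class_prompts maps integer class ids to text prompts, e.g. {1: "liver"}.
    """
    # The example looks prompts up by string key, so store ids as strings.
    text_prompts = {str(class_id): prompt for class_id, prompt in class_prompts.items()}
    # The dict is stored as a 0-d object array, which is why loading it
    # requires allow_pickle=True followed by .item().
    np.savez_compressed(
        path, imgs=volume, text_prompts=np.array(text_prompts, dtype=object)
    )


def build_prompt(text_prompts):
    """Rebuild the sorted ids and the [SEP]-joined prompt, as in the example."""
    ids = sorted(int(key) for key in text_prompts if key != "instance_label")
    return ids, "[SEP]".join(text_prompts[str(i)] for i in ids)
```

Sorting the ids before joining keeps the order of the prompts aligned with the `ids` list that is later passed to `merge_multiclass_masks`.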

Evaluation

Place the public model checkpoint and the evaluation data under <YOUR MODEL AND DATA DIR>, then set that directory in evaluate_biomedparse.yaml:

mounts:
  external: <YOUR MODEL AND DATA DIR>

Save the model checkpoint under <YOUR MODEL AND DATA DIR>. Download the validation set of the CVPR 2025 Text-guided 3D Segmentation Challenge dataset. Save the validation images to <YOUR MODEL AND DATA DIR>/data/test, and validation masks to <YOUR MODEL AND DATA DIR>/data/test_mask. Run

python -m azureml.acft.image.components.olympus.app.main \
  --config-path <YOUR ABSOLUTE CONFIG DIRECTORY PATH> \
  --config-name evaluate_biomedparse
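Before launching the evaluation, it can help to confirm that the data directories described above are in place. The following is a small sketch of such a check. The helper name `missing_eval_dirs` is hypothetical and not part of the repository, and it only checks the two data subdirectories named above, not the checkpoint file.

```python
from pathlib import Path


def missing_eval_dirs(root):
    """Return the expected evaluation subdirectories that are absent or empty.

    root is the directory used as <YOUR MODEL AND DATA DIR>.
    """
    expected = [
        Path(root) / "data" / "test",       # validation images
        Path(root) / "data" / "test_mask",  # validation masks
    ]
    return [str(p) for p in expected if not p.is_dir() or not any(p.iterdir())]
```

An empty list means both directories exist and contain at least one entry.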

Fine-tuning

Want to improve performance on your specific tasks? See the detailed instructions for end-to-end finetuning on your own data: FINETUNING

Recommended Preprocessing (Important!)

BiomedParse v2 training covered five commonly used 3D biomedical imaging modalities: CT, MRI, PET, Ultrasound, and Microscopy. To achieve reasonable performance, preprocess inference images the same way the training images were preprocessed. Convert all images to npz format with an intensity range of [0, 255]. For CT images, normalize the Hounsfield units using typical window width (W) and level (L) values for the site/anatomy:

soft tissues (W:400, L:40)
lung (W:1500, L:-160)
brain (W:80, L:40)
bone (W:1800, L:400).

For all other images, clip the intensity values to the range between the 0.5th and 99.5th percentiles. Finally, rescale the intensity values to the range of [0, 255]. If the original intensity range is already [0, 255], no preprocessing is needed.
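The recipe above can be sketched with NumPy as follows. This is a minimal illustration rather than the repository's own preprocessing code: the function and preset names are hypothetical, the window is interpreted in the usual way as the interval [L - W/2, L + W/2], and rounding to uint8 is an assumption.

```python
import numpy as np

# Window (width, level) presets in Hounsfield units, copied from the list above.
CT_WINDOWS = {
    "soft_tissue": (400, 40),
    "lung": (1500, -160),
    "brain": (80, 40),
    "bone": (1800, 400),
}


def rescale_to_uint8(volume, lower, upper):
    """Clip to [lower, upper] and map linearly to [0, 255]."""
    lower, upper = float(lower), float(upper)
    if upper <= lower:  # constant image: avoid division by zero
        return np.zeros(np.shape(volume), dtype=np.uint8)
    clipped = np.clip(np.asarray(volume, dtype=np.float64), lower, upper)
    scaled = (clipped - lower) / (upper - lower) * 255.0
    return np.round(scaled).astype(np.uint8)


def preprocess_ct(volume_hu, preset="soft_tissue"):
    """Apply a window width/level preset to a CT volume in Hounsfield units."""
    width, level = CT_WINDOWS[preset]
    return rescale_to_uint8(volume_hu, level - width / 2, level + width / 2)


def preprocess_other(volume):
    """Clip non-CT volumes to the 0.5th-99.5th percentile range, then rescale."""
    lower, upper = np.percentile(volume, [0.5, 99.5])
    return rescale_to_uint8(volume, lower, upper)
```

For example, `preprocess_ct(ct_volume, "lung")` maps the range from -910 HU to 590 HU onto [0, 255], and `preprocess_other(mr_volume)` handles MRI, PET, ultrasound, or microscopy volumes whose intensities are not already in [0, 255].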

Supported Tasks

CT: oncology/pathology (adrenocortical carcinoma, kidney lesions/cysts L/R, liver tumors, lung lesions, pancreas tumors, head–neck cancer, colon cancer primaries, COVID-19, whole-body lesion, lymph nodes); thoracic (lungs L/R, lobes LUL/LLL/RUL/RML/RLL, trachea, airway tree); abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon, esophagus); GU/endocrine (kidneys L/R, adrenal glands L/R, bladder, prostate, uterus); vascular (aorta/tree, SVC, IVC, pulmonary vein, brachiocephalic trunk, subclavian/carotid arteries L/R, brachiocephalic veins L/R, left atrial appendage, portal/splenic vein, iliac arteries/veins L/R); cardiac (heart); head/neck (carotids L/R, submandibular/parotid/lacrimal glands L/R, thyroid, larynx glottic/supraglottic, lips, buccal mucosa, oral cavity, cervical esophagus, cricopharyngeal inlet, arytenoids, eyeball segments ant/post L/R, optic chiasm, optic nerves L/R, cochleae L/R, pituitary, brainstem, spinal cord); neuro/cranial (brain, skull, Circle of Willis CTA); spine/MSK (sacrum, vertebrae C1–S1, humeri/scapulae/clavicles/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
MRI: abdomen/pelvis (spleen, liver, gallbladder, stomach, pancreas, duodenum, small bowel, colon whole, esophagus, bladder, prostate, uterus); colon segments (cecum, appendix, ascending, transverse, descending, sigmoid, rectum); GU (prostate transition zone, prostate lesion); cardiac CMR (LV, RV, myocardium, LA, RA); thoracic (lungs L/R); vascular (aorta, pulmonary artery, SVC, IVC, portal/splenic vein, iliac arteries/veins L/R, carotid arteries L/R, jugular veins L/R); neuro tumors/ischemia (brain, brain tumor, stroke lesion, GTVp/GTVn tumor, vestibular schwannoma intra/extra-meatal, cochleae L/R); glioma components (non-enhancing tumor core, non-enhancing FLAIR hyperintensity, enhancing tissue, resection cavity); white matter disease (WM hyperintensities FLAIR/T1); neurovascular (Circle of Willis MRA); spine/MSK (sacrum, vertebrae regional, discs, spinal canal/cord, humeri/femora/hips L/R, gluteus maximus/medius/minimus L/R, autochthon L/R, iliopsoas L/R).
Ultrasound: cardiac (LV, myocardium, LA), neck (thyroid, carotid artery, jugular vein), neuro (brain tumor), calf MSK (soleus, gastrocnemius medialis/lateralis).
PET: whole-body lesion.
Electron Microscopy: endolysosomes, mitochondria, nuclei, neuronal ultrastructure, synaptic clefts, axon.
Lightsheet Microscopy: brain neural activity, Alzheimer’s plaque, nuclei, vessel.

Citation

Please cite our paper if you use the code, model, or data.

@article{zhao2025foundation,
  title={A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities},
  author={Zhao, Theodore and Gu, Yu and Yang, Jianwei and Usuyama, Naoto and Lee, Ho Hin and Kiblawi, Sid and Naumann, Tristan and Gao, Jianfeng and Crabtree, Angela and Abel, Jacob and others},
  journal={Nature methods},
  volume={22},
  number={1},
  pages={166--176},
  year={2025},
  publisher={Nature Publishing Group US New York}
}

If you use the v2 code or model, please also cite the BoltzFormer paper:

@inproceedings{zhao2025boltzmann,
  title={Boltzmann Attention Sampling for Image Analysis with Small Objects},
  author={Zhao, Theodore and Kiblawi, Sid and Usuyama, Naoto and Lee, Ho Hin and Preston, Sam and Poon, Hoifung and Wei, Mu},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={25950--25959},
  year={2025}
}

Usage and License Notices

The model described in this repository is provided for research and development use only. The model is not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of this model, including incorporation into any product intended for clinical use.
