chanzuckerberg · mlgill · Jun 20, 2025 · Jul 3, 2025 · Jul 7, 2025 · Jul 7, 2025
diff --git a/.github/workflows/docker-build.yml b/.github/workflows/docker-build.yml
diff --git a/.gitignore b/.gitignore
@@ -43,11 +43,11 @@ cover/
 *.ipynb_checkpoints/
 
 # Environments
-.env
-.venv
-env/
-venv/
-ENV/
+.env*
+.venv*
+env*/
+venv*/
+ENV*/
 env.bak/
 venv.bak/
 

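Review note on the `.gitignore` hunk above: the new patterns are prefix globs, so they match more than the five names they replace. The sketch below approximates the matching for a single path component with Python's `fnmatch`. It is not Git's own matcher, and the helper name `is_ignored` is hypothetical. It assumes only two gitignore rules: `*` matches any run of characters within a name, and a trailing `/` restricts a pattern to directories.

```python
from fnmatch import fnmatchcase

# Patterns added by this hunk; a trailing "/" means "directories only".
NEW_PATTERNS = [".env*", ".venv*", "env*/", "venv*/", "ENV*/"]


def is_ignored(name: str, is_dir: bool, patterns=NEW_PATTERNS) -> bool:
    """Approximate gitignore matching for one path component (hypothetical helper)."""
    for pattern in patterns:
        dir_only = pattern.endswith("/")
        if dir_only and not is_dir:
            # Directory-only patterns never match regular files.
            continue
        if fnmatchcase(name, pattern.rstrip("/")):
            return True
    return False


if __name__ == "__main__":
    for name, is_dir in [(".env.local", False), ("venv311", True),
                         ("environment", True), ("env.txt", False)]:
        print(f"{name!r:16} dir={is_dir!s:5} ignored={is_ignored(name, is_dir)}")
```

Under these assumptions, a directory such as `environment/` or `environments/` would also be ignored by `env*/`, which may be worth confirming is intended. The existing `env.bak/` and `venv.bak/` entries also become redundant.
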
diff --git a/Makefile b/Makefile
@@ -1,43 +1,8 @@
-# Default target
+# Default target to run tests and code checks before commit.
 .PHONY: all
-all: scvi uce scgpt scgenept geneformer transcriptformer aido
+all: test lint mypy-checkmake
 
-# Build the scvi image
-.PHONY: scvi
-scvi:
-	docker build -t cz-benchmarks-models:scvi -f docker/scvi/Dockerfile .
-
-# Build the uce image
-.PHONY: uce
-uce:
-	docker build -t cz-benchmarks-models:uce -f docker/uce/Dockerfile .
-
-# Build the scgpt image
-.PHONY: scgpt
-scgpt:
-	docker build -t cz-benchmarks-models:scgpt -f docker/scgpt/Dockerfile .
-
-# Build the scgenept image
-.PHONY: scgenept
-scgenept:
-	docker build -t cz-benchmarks-models:scgenept -f docker/scgenept/Dockerfile .
-
-# Build the geneformer image
-.PHONY: geneformer
-geneformer:
-	docker build -t cz-benchmarks-models:geneformer -f docker/geneformer/Dockerfile .
-
-# Build the geneformer image
-.PHONY: aido
-aido:
-	docker build -t cz-benchmarks-models:aido -f docker/aido/Dockerfile .
-
-# Build the transcriptformer image
-.PHONY: transcriptformer
-transcriptformer:
-	docker build -t cz-benchmarks-models:transcriptformer -f docker/transcriptformer/Dockerfile .
-
-# Clean up images
+# Clean up model images generated by czbenchmarks version 0.9
 .PHONY: clean
 clean:
 	docker rmi cz-benchmarks-models:scvi || true
@@ -46,9 +11,7 @@ clean:
 	docker rmi cz-benchmarks-models:scgenept || true
 	docker rmi cz-benchmarks-models:geneformer || true
 	docker rmi cz-benchmarks-models:transcriptformer || true
-# Helper target to rebuild everything from scratch
-.PHONY: rebuild
-rebuild: clean all
+	docker rmi cz-benchmarks-models:aido || true
 
 # Run all unit tests
 .PHONY: test

diff --git a/README-pypi.md b/README-pypi.md
@@ -5,10 +5,10 @@
 ⚠️ **Warning:** Repository under active development and is in the alpha phase of development, subject to major refactors as outlined in the public-facing [roadmap](https://chanzuckerberg.github.io/cz-benchmarks/roadmap.html).
 
 ### What is cz-benchmarks?
-cz-benchmarks is a package for standardized evaluation and comparison of machine learning models for biological applications (first, in the single-cell transcriptomics domain, with future plans to expand to additional domains). The package provides a toolkit for running containerized models, executing biologically-relevant tasks, and computing performance metrics. We see this tool as a step towards ensuring that large-scale AI models can be harnessed to deliver genuine biological insights -- by building trust, accelerating development, and bridging the gap between ML and biology communities.
+cz-benchmarks is a package for standardized evaluation and comparison of machine learning models for biological applications (first, in the single-cell transcriptomics domain, with future plans to expand to additional domains). The package provides a toolkit for running containerized models, executing biologically-relevant tasks, and computing performance metrics. We see this tool as a step towards ensuring that large-scale ML Models can be harnessed to deliver genuine biological insights -- by building trust, accelerating development, and bridging the gap between ML and biology communities.
 
 ### Why benchmarking? Why now?
-Last year, CZI hosted a workshop focused on benchmarking and evaluation of AI models in biology, and the [insights gained](https://virtualcellmodels.cziscience.com/micro-pub/benchmarking-workshop) have reinforced our commitment to supporting the development of a robust benchmarking infrastructure, which we see as critical to achieving our Virtual Cell vision.
+Last year, CZI hosted a workshop focused on benchmarking and evaluation of ML Models in biology, and the [insights gained](https://virtualcellmodels.cziscience.com/micro-pub/benchmarking-workshop) have reinforced our commitment to supporting the development of a robust benchmarking infrastructure, which we see as critical to achieving our Virtual Cell vision.
 
 ### 💬 Community Feedback & Contributions
 We're working to get the alpha version of cz-benchmarks stable to build with the community. In the meantime, for issues you may identify, feel free to open an issue on GitHub or reach out to us at [[email protected]](mailto:[email protected]).
@@ -22,7 +22,6 @@ To get started with `cz-benchmarks`, refer to the [Quick Start Guide](https://
 
 - [How To Guides](https://chanzuckerberg.github.io/cz-benchmarks/how_to_guides/index.html)
     - [Add a Custom Dataset](https://chanzuckerberg.github.io/cz-benchmarks/how_to_guides/add_custom_dataset.html)
-    - [Add a Custom Model](https://chanzuckerberg.github.io/cz-benchmarks/how_to_guides/add_custom_model.html)
 - [Developer Guides](https://chanzuckerberg.github.io/cz-benchmarks/developer_guides/index.html)
 - [API Reference](https://chanzuckerberg.github.io/cz-benchmarks/api_reference.html)
 - [Assets](https://chanzuckerberg.github.io/cz-benchmarks/assets.html)

diff --git a/README.md b/README.md
@@ -5,10 +5,10 @@
 ⚠️ **Warning:** Repository under active development and is in the alpha phase of development, subject to major refactors as outlined in the public-facing [roadmap](docs/source/roadmap.md).
 
 ### What is cz-benchmarks?
-cz-benchmarks is a package for standardized evaluation and comparison of machine learning models for biological applications (first, in the single-cell transcriptomics domain, with future plans to expand to additional domains). The package provides a toolkit for running containerized models, executing biologically-relevant tasks, and computing performance metrics. We see this tool as a step towards ensuring that large-scale AI models can be harnessed to deliver genuine biological insights -- by building trust, accelerating development, and bridging the gap between ML and biology communities.
+cz-benchmarks is a package for standardized evaluation and comparison of machine learning models for biological applications (first, in the single-cell transcriptomics domain, with future plans to expand to additional domains). The package provides a toolkit for running containerized models, executing biologically-relevant tasks, and computing performance metrics. We see this tool as a step towards ensuring that large-scale ML Models can be harnessed to deliver genuine biological insights -- by building trust, accelerating development, and bridging the gap between ML and biology communities.
 
 ### Why benchmarking? Why now?
-Last year, CZI hosted a workshop focused on benchmarking and evaluation of AI models in biology, and the [insights gained](https://virtualcellmodels.cziscience.com/micro-pub/benchmarking-workshop) have reinforced our commitment to supporting the development of a robust benchmarking infrastructure, which we see as critical to achieving our Virtual Cell vision.
+Last year, CZI hosted a workshop focused on benchmarking and evaluation of ML Models in biology, and the [insights gained](https://virtualcellmodels.cziscience.com/micro-pub/benchmarking-workshop) have reinforced our commitment to supporting the development of a robust benchmarking infrastructure, which we see as critical to achieving our Virtual Cell vision.
 
 ### 💬 Community Feedback & Contributions
 We're working to get the alpha version of cz-benchmarks stable to build with the community. In the meantime, for issues you may identify, feel free to open an issue on GitHub or reach out to us at [[email protected]](mailto:[email protected]).
@@ -21,7 +21,6 @@ We're working to get the alpha version of cz-benchmarks stable to build with the
 
 ### How-To Guides
 - [Add a Custom Dataset](docs/source/how_to_guides/add_custom_dataset.md)
-- [Add a Custom Model](docs/source/how_to_guides/add_custom_model.md)
 - [Add a New Metric](docs/source/how_to_guides/add_new_metric.md)
 - [Add a New Task](docs/source/how_to_guides/add_new_task.md)
 - [Interactive Mode](docs/source/how_to_guides/interactive_mode.md)
@@ -30,7 +29,6 @@ We're working to get the alpha version of cz-benchmarks stable to build with the
 ### Developer Guides
 - [Datasets](docs/source/developer_guides/datasets.md)
 - [Metrics](docs/source/developer_guides/metrics.md)
-- [Models](docs/source/developer_guides/models.md)
 - [Tasks](docs/source/developer_guides/tasks.md)
 - [Debugging](docs/source/developer_guides/debugging.md)
 - [Release Process](docs/source/developer_guides/release_process.md)

diff --git a/docker/README.md b/docker/README.md
diff --git a/docker/aido/Dockerfile b/docker/aido/Dockerfile
diff --git a/docker/aido/config.yaml b/docker/aido/config.yaml