Add General Pathology Foundation Model for embedding tool #61
base: main
Conversation
Pull Request Overview
Adds support for a Generalizable Pathology Foundation Model (GPFM) to the embedding extractor, including a UI option, help text, and a basic test. Key changes:
- Introduces a simplified DinoVisionTransformer-based GPFM implementation with on-demand weight download and preprocessing.
- Wires GPFM into model selection and transforms (a rough sketch of this kind of registry change follows the list); updates tests and help.
- Updates Docker image to CUDA base and installs additional dependencies.
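As a rough illustration only (not the PR's actual code), wiring a new model into the extractor's selection logic usually comes down to adding an entry to a name-to-constructor mapping; every name below is hypothetical:

```python
from torchvision import models

class GPFMModel:  # stand-in for the GPFM class this PR adds to pytorch_embedding.py
    pass

# Hypothetical registry; the real file may organize model selection differently.
MODEL_BUILDERS = {
    "resnet50": lambda: models.resnet50(weights="DEFAULT"),
    "gpfm": lambda: GPFMModel(),  # weights would be downloaded on first use
}

def build_model(name):
    try:
        return MODEL_BUILDERS[name]()
    except KeyError:
        raise ValueError(f"Unsupported model: {name}")
```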
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/galaxy-embedding_extractor/pytorch_embedding.xml | Adds GPFM option, a new test, and updated help text describing the model. |
| tools/galaxy-embedding_extractor/pytorch_embedding.py | Implements GPFM model, weight download, preprocessing, model registry changes, and dataloader tweak. |
| tools/galaxy-embedding_extractor/Docker/Dockerfile | Switches to CUDA base and adds several Python/system dependencies for model support. |
In tools/galaxy-embedding_extractor/pytorch_embedding.py:

```python
        return model

    # Standard torchvision models
    if "weights" in inspect.signature(
```
Copilot AI (Oct 3, 2025):
inspect is not imported; this will raise NameError. Either import inspect at the top or use the already-imported signature function as elsewhere in this file. Suggested fix: replace inspect.signature(...) with signature(...).
| if "weights" in inspect.signature( | |
| if "weights" in signature( |
Also in tools/galaxy-embedding_extractor/pytorch_embedding.py:

```python
            model.get_transformer()
        ])
    else:
        transform = model.get_transformer()
```
Copilot AI (Oct 3, 2025):
The apply_normalization flag is ignored for the GPFM path; model.get_transformer() always includes Normalize, so setting apply_normalization=false has no effect. Recommend updating GPFMModel.get_transformer to accept apply_normalization and omitting the final Normalize when false, then pass that flag here, e.g., model.get_transformer(apply_normalization=apply_normalization).
Suggested change:

```diff
-            model.get_transformer()
+            model.get_transformer(apply_normalization=apply_normalization)
         ])
     else:
-        transform = model.get_transformer()
+        transform = model.get_transformer(apply_normalization=apply_normalization)
```
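A possible shape for the corresponding GPFMModel.get_transformer change, shown only as a hedged sketch; the resize/crop sizes and normalization statistics are placeholders, not the values the PR actually uses:

```python
from torchvision import transforms

class GPFMModel:
    # ... remainder of the class as implemented in the PR ...

    def get_transformer(self, apply_normalization=True):
        """Build the preprocessing pipeline, skipping Normalize when requested."""
        steps = [
            transforms.Resize(224),      # placeholder size
            transforms.CenterCrop(224),  # placeholder size
            transforms.ToTensor(),
        ]
        if apply_normalization:
            # ImageNet statistics shown as placeholders; substitute the PR's real values.
            steps.append(transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ))
        return transforms.Compose(steps)
```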
Still in tools/galaxy-embedding_extractor/pytorch_embedding.py:

```python
    def _download_weights(self, url, filepath):
        """Download GPFM weights from the official repository."""
        if os.path.exists(filepath):
            logging.info(f"GPFM weights already exist at {filepath}")
            return True

        logging.info(f"Downloading GPFM weights from {url}")
        try:
            response = requests.get(url, stream=True, timeout=300)
            response.raise_for_status()

            os.makedirs(os.path.dirname(filepath), exist_ok=True)

            # Get file size for progress tracking
            total_size = int(response.headers.get('content-length', 0))
            downloaded = 0

            with open(filepath, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    if chunk:
                        f.write(chunk)
                        downloaded += len(chunk)
                        if total_size > 0:
                            progress = (downloaded / total_size) * 100
                            if downloaded % (1024 * 1024 * 10) == 0:  # Log every 10MB
                                logging.info(f"Downloaded {downloaded // (1024 * 1024)}MB / {total_size // (1024 * 1024)}MB ({progress:.1f}%)")

            logging.info(f"GPFM weights downloaded successfully to {filepath}")
            return True

        except Exception as e:
            logging.error(f"Failed to download GPFM weights: {e}")
            if os.path.exists(filepath):
                os.remove(filepath)  # Clean up partial download
            return False
```
Copilot AI (Oct 3, 2025):
Weights are downloaded without integrity verification; a compromised or truncated file could be loaded. Add a known SHA256 (or similar) and verify the checksum after download (and before load), failing fast if it does not match.
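One way to address this, as a sketch under assumptions (GPFM_WEIGHTS_SHA256 is a placeholder; the real digest would have to be pinned from a trusted copy of the released checkpoint):

```python
import hashlib
import logging
import os

# Placeholder: pin the known-good SHA256 of the released GPFM checkpoint here.
GPFM_WEIGHTS_SHA256 = "<expected-sha256-hex-digest>"

def _verify_checksum(filepath, expected_sha256, chunk_size=8192):
    """Return True only if the file's SHA256 matches the pinned digest."""
    sha256 = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    digest = sha256.hexdigest()
    if digest != expected_sha256:
        logging.error(f"Checksum mismatch for {filepath}: got {digest}")
        os.remove(filepath)  # discard the suspect file so it is never loaded
        return False
    return True
```

_download_weights could call this immediately after the download loop and return False on a mismatch, so a truncated or tampered checkpoint is never handed to the model loader.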
In tools/galaxy-embedding_extractor/Docker/Dockerfile:

```dockerfile
RUN pip install --no-cache-dir numpy==1.24.4

# Install timm for GigaPath tile encoder (critical for compatibility)
RUN pip install --no-cache-dir timm>=1.0.3
```
Copilot AI (Oct 3, 2025):
The '>' operator will be interpreted by the shell for redirection, so the version constraint may be ignored and output redirected to a file. Quote the specifier or pin via a requirements file, e.g., RUN pip install --no-cache-dir 'timm>=1.0.3'.
Suggested change:

```diff
-RUN pip install --no-cache-dir timm>=1.0.3
+RUN pip install --no-cache-dir 'timm>=1.0.3'
```
Also in tools/galaxy-embedding_extractor/Docker/Dockerfile:

```dockerfile
RUN pip install --no-cache-dir git+https://github.com/prov-gigapath/prov-gigapath.git

# Install remaining Python dependencies
RUN pip install --no-cache-dir Pillow opencv-python pandas fastparquet argparse logging multiprocessing
```
Copilot AI (Oct 3, 2025):
argparse, logging, and multiprocessing are part of Python's standard library; installing similarly named PyPI packages can shadow/break stdlib behavior. Also, requests is required by the GPFM code but is not installed. Replace with: RUN pip install --no-cache-dir Pillow opencv-python pandas fastparquet requests.
Suggested change:

```diff
-RUN pip install --no-cache-dir Pillow opencv-python pandas fastparquet argparse logging multiprocessing
+RUN pip install --no-cache-dir Pillow opencv-python pandas fastparquet requests
```