RumaizaNorova/Telus_Hackathon_2026

AI Pathology Copilot (PCam)

Pathology review is time-intensive, and spotting metastatic tissue across large slide volumes is hard to scale. This project delivers a fast, explainable patch-level assistant built on PatchCamelyon (PCam): a ResNet-based classifier with Grad-CAM evidence, uncertainty estimates, and domain-shift warnings, wrapped in a single-screen Streamlit UI for demo and triage workflows.

Quick Start (Local)

  1. Create a virtual env and install deps: python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
  2. Add model weights (optional):
    • Place outputs/best_model.pt locally, or
    • Set MODEL_URL to a direct download link for the checkpoint.
  3. Run the UI: streamlit run src/ui/app.py

The UI ships with bundled samples in assets/samples/ so the demo works without the full dataset. If no model weights are found, the app falls back to a heuristic demo mode and still runs end-to-end.

Dataset Setup (Optional)

If you want to load dataset samples or retrain, place the full dataset locally in data/pcam/ (not tracked in git). You can also use a tiny subset in data/sample/ for smoke tests.

The repo expects the original PCam structure:

data/pcam/
  pcam/
    training_split.h5
    validation_split.h5
    test_split.h5
  Labels/Labels/
    camelyonpatch_level_2_split_train_y.h5
    camelyonpatch_level_2_split_valid_y.h5
    camelyonpatch_level_2_split_test_y.h5
  Metadata/Metadata/
    train_metadata.csv
    valid_metadata.csv
    test_metadata.csv
  camelyonpatch_level_2_split_train_mask/
    camelyonpatch_level_2_split_train_mask.h5
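
Before loading dataset samples or retraining, it can help to confirm that the tree above is in place. The following is a minimal sketch, not part of the repo: the helper name missing_dataset_files is hypothetical, the paths are copied from the tree, and the mask file is left out of the check on the assumption that it is optional.

```python
from pathlib import Path

# Relative paths copied from the layout above; the train mask is not checked.
REQUIRED_FILES = [
    "pcam/training_split.h5",
    "pcam/validation_split.h5",
    "pcam/test_split.h5",
    "Labels/Labels/camelyonpatch_level_2_split_train_y.h5",
    "Labels/Labels/camelyonpatch_level_2_split_valid_y.h5",
    "Labels/Labels/camelyonpatch_level_2_split_test_y.h5",
    "Metadata/Metadata/train_metadata.csv",
    "Metadata/Metadata/valid_metadata.csv",
    "Metadata/Metadata/test_metadata.csv",
]


def missing_dataset_files(root="data/pcam"):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```

Calling missing_dataset_files() from the repo root returns an empty list when every expected file is present, and otherwise the relative paths that still need to be added.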

Full dataset source (Kaggle): https://www.kaggle.com/datasets/andrewmvd/metastatic-tissue-classification-patchcamelyon

Model Weights

outputs/ is gitignored to keep the repo light. For real predictions, provide a checkpoint locally:

  • outputs/best_model.pt (recommended), or
  • set MODEL_URL in your environment or Streamlit secrets to auto-download.

Example: export MODEL_URL="https://your-hosted-file/best_model.pt"
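
The lookup order described above (local file first, then MODEL_URL, otherwise demo mode) could be sketched roughly as follows. This is an illustration under assumptions, not the app's actual code: resolve_checkpoint is a hypothetical name, only the environment variable is handled (Streamlit secrets are not), and the download uses the standard library.

```python
import os
import urllib.request
from pathlib import Path


def resolve_checkpoint(local_path="outputs/best_model.pt", env=None):
    """Return a Path to the checkpoint, or None to signal demo mode (hypothetical helper)."""
    env = os.environ if env is None else env
    path = Path(local_path)
    if path.is_file():
        return path  # a local checkpoint takes priority
    url = env.get("MODEL_URL")
    if not url:
        return None  # nothing configured: the caller falls back to demo mode
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))  # download under a temporary name
    tmp.replace(path)  # rename only once the download has finished
    return path
```

Downloading to a temporary name and renaming afterwards means an interrupted download is never mistaken for a valid checkpoint on the next start.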

Data Notes

  • Images are stored in HDF5 under dataset key x.
  • Labels are stored in HDF5 under dataset key y.
  • Train masks are optional and stored under dataset key mask.
  • Metadata CSV rows align with the HDF5 sample indices.
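
As a rough illustration of the notes above, the sketch below reads one patch and its label by index and looks up the matching metadata row. The function names are hypothetical, h5py is a third-party dependency assumed to be installed, and the CSV is assumed to have a header row; no column names are assumed.

```python
import csv
from itertools import islice


def load_patch(images_h5, labels_h5, index):
    """Read one image and its label by position, using dataset keys "x" and "y"."""
    import h5py  # third-party; imported lazily so the CSV helper works without it

    with h5py.File(images_h5, "r") as fx, h5py.File(labels_h5, "r") as fy:
        image = fx["x"][index]  # one patch as a NumPy array
        label = int(fy["y"][index].squeeze())  # squeeze drops any singleton axes
    return image, label


def metadata_row(csv_path, index):
    """Return metadata row `index` as a dict; row 0 is the first line after the header."""
    with open(csv_path, newline="") as fh:
        row = next(islice(csv.DictReader(fh), index, None), None)
    if row is None:
        raise IndexError(f"no metadata row {index} in {csv_path}")
    return row
```

With the layout above, load_patch("data/pcam/pcam/training_split.h5", "data/pcam/Labels/Labels/camelyonpatch_level_2_split_train_y.h5", 0) and metadata_row("data/pcam/Metadata/Metadata/train_metadata.csv", 0) would describe the same sample.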

Repo Layout

src/
  data/     # dataset loaders
  models/   # model defs
  train/    # training scripts
  eval/     # metrics + evaluation
  explain/  # Grad-CAM
  ood/      # uncertainty + domain shift
  api/      # inference service
  ui/       # demo UI
scripts/    # helper scripts (sample creation, inspection)

Quick Commands

Inspect data: python3 scripts/inspect_dataset.py --split train --num-samples 5 --metadata

Train (fast smoke run): python3 -m src.train.train --epochs 1 --max-train 2048 --max-val 512

Train (better baseline with balancing + aug): python3 -m src.train.train --epochs 2 --max-train 50000 --max-val 10000 --pretrained --augment --pos-weight --balanced-sampler
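
For readers unfamiliar with the --pos-weight and --balanced-sampler flags, here is a minimal sketch of the quantities such options usually compute. How the training script derives them is not shown in this README, so the formulas below (negatives over positives, and inverse class frequency) are the common convention, stated as an assumption.

```python
from collections import Counter


def pos_weight(labels):
    """Ratio negatives/positives, a common weight for the positive class in a BCE loss."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples")
    return counts[0] / counts[1]


def balanced_sample_weights(labels):
    """Per-sample weight 1/class_count, so each class is drawn equally often in expectation."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]
```

In PyTorch, values like these are typically passed to torch.nn.BCEWithLogitsLoss(pos_weight=...) and torch.utils.data.WeightedRandomSampler; whether this repo does so is not confirmed here.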

Evaluate: python3 -m src.eval.evaluate --checkpoint outputs/best_model.pt --split valid

Compute OOD stats (feature centroid): python3 scripts/compute_ood_stats.py --checkpoint outputs/best_model.pt --max-samples 5000
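
The idea behind a feature-centroid check can be sketched in plain Python: average the reference features, measure how far a new feature vector lies from that average, and flag it when the distance exceeds a cutoff. This is an illustration only; the function names, the Euclidean distance, and the 0.95 quantile cutoff are assumptions, and the script may compute its statistics differently.

```python
import math


def centroid(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def distance(vec, center):
    """Euclidean distance between one feature vector and the centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, center)))


def fit_threshold(features, quantile=0.95):
    """Return the centroid and the distance at the given quantile of reference samples."""
    center = centroid(features)
    dists = sorted(distance(f, center) for f in features)
    cutoff = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, cutoff


def is_shifted(vec, center, cutoff):
    """Flag a sample whose features lie farther from the centroid than the cutoff."""
    return distance(vec, center) > cutoff
```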

Inference demo (prediction + Grad-CAM + uncertainty + OOD): python3 scripts/infer_demo.py --checkpoint outputs/best_model.pt --feature-stats outputs/feature_stats.npz --split test --index 0

Run UI: streamlit run src/ui/app.py

Generate bundled UI samples (optional, from local dataset): python3 scripts/make_ui_samples.py --count 50

Metrics report (ROC-AUC + sensitivity/specificity): python3 scripts/report_metrics.py --checkpoint outputs/best_model.pt --split valid --threshold 0.5 --max-samples 5000
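
For reference, sensitivity and specificity at a fixed threshold reduce to counting the four confusion-matrix cells. The sketch below is not the script's code; it assumes 0/1 labels, tumor probabilities in [0, 1], and that a probability equal to the threshold counts as positive.

```python
def sensitivity_specificity(probs, labels, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels at the given threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

Calling the function over a range of thresholds shows the trade-off directly: lowering the threshold raises sensitivity at the cost of specificity.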

Calibrate temperature (optional): python3 scripts/calibrate_temperature.py --checkpoint outputs/best_model.pt --split valid --max-samples 5000

Threshold sweep (optional): python3 scripts/threshold_sweep.py --checkpoint outputs/best_model.pt --split valid --max-samples 5000
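
Temperature calibration is commonly done with temperature scaling: validation logits are divided by a single scalar T, chosen to minimize the negative log-likelihood. The sketch below assumes a binary model with one logit per patch and picks T by grid search; the function names and the grid are illustrative, and the script may fit T differently.

```python
import math


def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # Stable log(1 + exp(s)); BCE on a logit equals softplus(s) - y * s.
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += softplus - y * s
    return total / len(logits)


def fit_temperature(logits, labels, grid=None):
    """Return the grid value of T with the lowest validation NLL."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]  # 0.25 to 5.0
    return min(grid, key=lambda t: nll(logits, labels, t))
```

A fitted T above 1 indicates the raw probabilities were overconfident; at inference the same T would divide the logit before the sigmoid.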

Generate gallery images (optional): python3 scripts/make_gallery.py --checkpoint outputs/best_model.pt

About

AI-powered tool for identifying metastatic cancer in pathology slides. Upload a tissue patch, get a tumor prediction with visual explanation. Built for trust: includes model interpretability and domain shift awareness for safe use in clinical workflows.
