I build data science and machine learning systems that move cleanly from raw data to defensible insight, with an emphasis on well-motivated problems, reproducible pipelines, and model interpretability.
My work sits at the intersection of applied ML, scientific computing, and software engineering, often using public or operational data to prototype end-to-end analyses that could realistically run in production.
- Distributed-ML
  A distributed, dataset-agnostic CT preprocessing pipeline built on Dask, designed for large clinical imaging datasets and downstream ML workflows.
- publicdata_ca
  A reusable data acquisition and normalization framework for Canadian public datasets (StatCan, CMHC, CIHI), supporting rapid ML case studies such as housing affordability indices and hospital utilization analysis.
- Applied ML Case Studies
  Short, tightly scoped projects demonstrating:
  - Unsupervised learning (Isolation Forests, autoencoders)
  - Feature engineering from messy public datasets
  - Evaluation under limited or noisy ground truth
  - Clear motivation and decision-oriented outputs
- YesChef GPT
  An AI-powered system that structures generative outputs into machine-readable components (ingredients, preparation steps, pickup notes), emphasizing controllability and downstream usability over novelty.
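As a hedged illustration of what "machine-readable components" can mean in practice, here is a minimal sketch. It assumes the model is prompted to reply with JSON containing `ingredients`, `steps`, and `pickup_notes` keys. The schema, the `Recipe`/`Ingredient` types, and the `parse_recipe` helper are hypothetical and are not YesChef GPT's actual interface. Validating against a fixed schema and rejecting anything that does not fit is one way to trade generative freedom for controllability.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Ingredient:
    name: str
    quantity: str  # free text such as "2 cups"; hypothetical field


@dataclass
class Recipe:
    ingredients: List[Ingredient]
    steps: List[str]
    pickup_notes: str


class SchemaError(ValueError):
    """Raised when model output does not match the expected structure."""


REQUIRED_KEYS = {"ingredients", "steps", "pickup_notes"}  # hypothetical schema


def parse_recipe(raw: str) -> Recipe:
    """Parse a model's JSON reply and reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be a JSON object")

    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise SchemaError(f"missing keys: {sorted(missing)}")

    raw_items = data["ingredients"]
    if not isinstance(raw_items, list):
        raise SchemaError("ingredients must be a list")
    ingredients = []
    for item in raw_items:
        if not isinstance(item, dict) or not isinstance(item.get("name"), str):
            raise SchemaError(f"bad ingredient entry: {item!r}")
        # Quantity is optional and normalized to a string for downstream use.
        ingredients.append(Ingredient(item["name"], str(item.get("quantity", ""))))

    steps = data["steps"]
    if not isinstance(steps, list) or not all(isinstance(s, str) and s.strip() for s in steps):
        raise SchemaError("steps must be a list of non-empty strings")

    notes = data["pickup_notes"]
    if not isinstance(notes, str):
        raise SchemaError("pickup_notes must be a string")

    return Recipe(ingredients, steps, notes)
```

One option for a caller is to re-prompt or fall back on `SchemaError` rather than pass malformed output downstream.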
- C++ medical image registration using ITK (Insight Toolkit)
- MATLAB pipelines using Marching Cubes for carotid artery tracing in CT angiography
- Control systems for LED solar simulators supporting photovoltaic research
- Formal training in medical physics, with strong grounding in measurement, uncertainty, and validation
- Anomaly detection in healthcare operations
  Early detection of unusual demand or utilization patterns using unsupervised and semi-supervised methods.
- Public-sector ML pipelines
  Designing reusable ingestion and feature pipelines that make public data viable for rapid experimentation.
- Evaluation without labels
  Practical techniques for validating unsupervised models when ground truth is incomplete or unavailable.
- Bridging notebooks to systems
  Turning exploratory analyses into maintainable, testable services without losing scientific intent.
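The anomaly-detection and label-free evaluation themes above can be sketched together. This is a minimal stdlib illustration and not the Isolation Forest or autoencoder methods named earlier. It swaps in a simple robust z-score detector (median and MAD) for a series of daily counts, then estimates recall by injecting synthetic spikes at known positions. Synthetic injection is one common way to get a partial check when real labels are missing. The 3.5 threshold, the spike size, and all function names here are illustrative assumptions.

```python
import random
import statistics
from typing import List, Sequence, Set, Tuple


def robust_z_flags(values: Sequence[float], threshold: float = 3.5) -> Set[int]:
    """Flag indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return set()  # no spread: this simple rule cannot score deviations
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # under normally distributed data.
    return {i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold}


def inject_spikes(values: Sequence[float], n: int, magnitude: float,
                  seed: int = 0) -> Tuple[List[float], Set[int]]:
    """Copy the series and add `magnitude` at n randomly chosen positions."""
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(values)), n))
    out = list(values)
    for i in positions:
        out[i] += magnitude
    return out, positions


def injection_evaluation(values: Sequence[float], n: int = 5,
                         magnitude: float = 50.0, seed: int = 0) -> Tuple[float, int]:
    """Return (recall on injected spikes, number of flags outside them).

    Flags outside the injected set may be real anomalies already present in
    the data, so that count is an upper bound on false positives.
    """
    perturbed, truth = inject_spikes(values, n, magnitude, seed)
    flagged = robust_z_flags(perturbed)
    recall = len(flagged & truth) / len(truth)
    return recall, len(flagged - truth)
```

On a clean series such as `[100 + (i % 7) for i in range(60)]`, 50-unit spikes are all recovered with no extra flags, while 1-unit spikes are not recovered at all. Sweeping `magnitude` this way gives a rough sensitivity curve without real labels. It only tests anomalies you know how to simulate, so it works best as a sanity check.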
I approach data science as an engineering discipline: start with a clear question, respect the data’s limitations, and build models that can be explained, tested, and trusted.
My goal is to work on problems where statistical thinking, ML techniques, and real-world constraints all matter, especially in healthcare, infrastructure, and public data contexts.



