ajharris/README.md

About

I build data science and machine learning systems that move cleanly from raw data to defensible insight, with an emphasis on well-motivated problems, reproducible pipelines, and model interpretability.

My work sits at the intersection of applied ML, scientific computing, and software engineering, often using public or operational data to prototype end-to-end analyses that could realistically run in production.


Current Focus & Active Projects

  • Distributed-ML
    A distributed, dataset-agnostic CT preprocessing pipeline using Dask, designed for large clinical imaging datasets and downstream ML workflows.

  • publicdata_ca
    A reusable data acquisition and normalization framework for Canadian public datasets (StatCan, CMHC, CIHI), supporting rapid ML case studies such as housing affordability indices and hospital utilization analysis.

  • Applied ML Case Studies
    Short, tightly scoped projects demonstrating:

    • Unsupervised learning (Isolation Forests, autoencoders)
    • Feature engineering from messy public datasets
    • Evaluation under limited or noisy ground truth
    • Clear motivation and decision-oriented outputs

  • YesChef GPT
    An AI-powered system that structures generative outputs into machine-readable components (ingredients, preparation steps, pickup notes), emphasizing controllability and downstream usability over novelty.
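
As a rough illustration of the structured-output idea described for YesChef GPT (not the project's actual code; the `Recipe` fields and the `parse_recipe` helper are hypothetical names chosen for this sketch), a generated response could be validated into typed components before anything downstream consumes it:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical machine-readable form of one generated recipe."""
    ingredients: list[str]
    preparation_steps: list[str]
    pickup_notes: str


def parse_recipe(raw: str) -> Recipe:
    """Validate a model's JSON output and convert it to a Recipe.

    Raises ValueError when the text is not valid JSON or a field
    has the wrong shape, so malformed generations fail loudly.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    def string_list(key: str) -> list[str]:
        # Each list field must be present and contain only strings.
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{key}' must be a list of strings")
        return value

    # Pickup notes are treated as optional in this sketch.
    notes = data.get("pickup_notes", "")
    if not isinstance(notes, str):
        raise ValueError("'pickup_notes' must be a string")
    return Recipe(string_list("ingredients"), string_list("preparation_steps"), notes)
```

Rejecting malformed output at this boundary is one way to favor controllability: downstream code only ever sees a `Recipe`, never free-form text.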


Background in Scientific Computing

  • C++ medical image registration using ITK (Insight Toolkit)
  • MATLAB pipelines using Marching Cubes for carotid artery tracing in CT angiography
  • Control systems for LED solar simulators supporting photovoltaic research
  • Formal training in medical physics, with strong grounding in measurement, uncertainty, and validation

Currently Exploring

  • Anomaly detection in healthcare operations
    Early detection of unusual demand or utilization patterns using unsupervised and semi-supervised methods.

  • Public-sector ML pipelines
    Designing reusable ingestion and feature pipelines that make public data viable for rapid experimentation.

  • Evaluation without labels
    Practical techniques for validating unsupervised models when ground truth is incomplete or unavailable.

  • Bridging notebooks to systems
    Turning exploratory analyses into maintainable, testable services without losing scientific intent.
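
One label-free check that fits the "evaluation without labels" theme is flag stability: refit a detector on random subsamples and measure how consistently the same points are flagged. The sketch below is an assumption-laden illustration, not a method taken from any of the projects above. It swaps in a simple median/MAD modified z-score in place of an Isolation Forest so that it runs on the standard library alone, and the function names and the 3.5 threshold are illustrative choices.

```python
import random
import statistics


def flag_with_fit(values, fit_sample, threshold=3.5):
    """Flag indices in `values` using median/MAD estimated from `fit_sample`."""
    med = statistics.median(fit_sample)
    mad = statistics.median(abs(v - med) for v in fit_sample)
    if mad == 0:
        return set()  # no spread in the fit sample; nothing can be scored
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return {i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold}


def flag_stability(values, n_runs=20, fraction=0.8, seed=0):
    """Mean pairwise Jaccard overlap of flagged sets across subsampled fits.

    Values near 1.0 mean the same points are flagged regardless of which
    subsample was used for fitting; low values suggest fragile flags.
    Requires n_runs >= 2.
    """
    rng = random.Random(seed)
    k = max(2, int(len(values) * fraction))
    runs = [flag_with_fit(values, rng.sample(values, k)) for _ in range(n_runs)]
    overlaps = []
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            union = runs[i] | runs[j]
            # Two empty flag sets agree perfectly, so count them as 1.0.
            overlaps.append(len(runs[i] & runs[j]) / len(union) if union else 1.0)
    return statistics.mean(overlaps)
```

Stability is a necessary rather than sufficient signal: a detector can be consistently wrong, so a check like this is best paired with spot review of the flagged cases.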


Perspective

I approach data science as an engineering discipline:
start with a clear question, respect the data’s limitations, and build models that can be explained, tested, and trusted.

My goal is to work on problems where statistical thinking, ML techniques, and real-world constraints all matter — especially in healthcare, infrastructure, and public data contexts.

Pinned

  1. housing-affordability Public

    Housing Affordability Stress Index

    Jupyter Notebook

  2. Distributed-ML Public

    Distributed-ML: CT Preprocessing Pipeline with Dask. This repository implements a distributed, dataset-agnostic CT preprocessing pipeline designed for large clinical imaging datasets such as NLST, C…

    Python

  3. knock-em-dead-resume Public

    An AI-powered resume builder based on Martin Yate’s Knock ’Em Dead formula. It finds job ads, extracts key skills, and helps craft tailored, achievement-driven resumes optimized for each role, with…

    Python

  4. Andrew-Harris-Tech/EVXchange Public

    EVXchange is a web app that lets electric vehicle (EV) owners find and book nearby charging stations hosted by individuals or businesses. It works like AirBnB, but for EV chargers — people can rent…

    Python 1

  5. BJJ-notebook Public

    Python

  6. yes-chef-gpt Public

    Python