Research Software Engineer · Data Infrastructure · Open Science
I build the pipelines, databases, and architectures that make research data usable at scale. My work sits at the intersection of scientific computing, data engineering, and open-source community building.
Previously RSE at Stanford University (METER-AI / Andrew Ng, Doerr School of Sustainability, SRCC), where I architected cloud data pipelines processing 14M+ records from 200+ sources and led the open-sourcing of a methane emissions dataset now used by Planet and Carbon Mapper for climate mitigation. Before that, I spent nearly three years at UW-Madison Radiology building serverless, containerized tools for neuroimaging data management (BIDS, NIfTI, DICOM, Flywheel.io) and GB-range deidentification pipelines under HIPAA/IRB compliance.
Currently exploring open neuroscience infrastructure — studying how platforms like brainlife.io and FreeSurfer are architected, and how standards like BIDS and NWB govern the flow of data from scanner to archive.
- Neuroimaging & neurophysiology data — BIDS, NIfTI, DICOM, NWB, EEG pipelines, deidentification
- Data versioning & reproducibility — Git internals, containerized workflows, CI/CD for research
- Graph-based data models — Neo4j, provenance modeling (PROV), brain connectivity, cross-archive metadata linking
- Agentic AI for science — LLM-powered development workflows, automated data validation, standards compliance
- Cloud data engineering — GCP (certified), AWS, BigQuery, Terraform, Docker, Kubernetes
- MassGen1 — Framework for AI-augmented workflows
- Contributions and explorations in neuroinformatics tooling, including study of the DataLad, DANDI, and HeuDiConv ecosystems
NVIDIA Certified Professional: Agentic AI (2026) · Google Cloud Professional Data Engineer (2025) · Neo4j Certified Professional (2025) · Neo4j Graph Data Science (2025) · Google Cloud Professional ML Engineer (2020) · Google Cloud Professional Cloud Architect (2018, 2020)
M.S. Applied Statistics, Penn State University (GPA 3.77) · B.S. Finance, Minor in Statistics, Penn State University
- Stanford SRCC, 2023 — METER-AI: BigQuery pipelines, serverless architecture, SAM (Segment Anything Model) integration
- SIIM, 2019 — Healthcare interoperability, serverless functions, and big data analytics
Interested in open-source neuroscience infrastructure, reproducible science, and building tools that make research data FAIR and accessible. Always looking to connect with people working on these problems.