Building the first open-source, publicly available Social Security dynamic microsimulation model
This project develops a comprehensive dynamic microsimulation model for Social Security policy analysis, combining:
- PolicyEngine's existing Social Security rules implementation
- Machine learning-based synthetic panel construction
- Quantile regression forests for earnings trajectory imputation
- Gradient descent calibration to administrative targets
- Full integration with PolicyEngine's web app and Python API
This would be the first open-source Social Security model comparable to proprietary tools like DynaSim (Urban Institute) and CBOLT (Congressional Budget Office), democratizing access to sophisticated lifetime benefit analysis.
This approach has a track record. PolicyEngine's Enhanced CPS is the only publicly available microdata file that produces accurate tax-benefit microsimulation impacts, matching models that use restricted IRS data. This project extends that methodology from cross-sectional to longitudinal analysis.
Full planning documentation is available as a Jupyter Book:
cd jupyterbook
myst build --html
myst start
Or view chapters directly:
- Introduction - Overview and significance
- Literature Review - Academic foundations
- Existing Models - Comparison to DynaSim, MINT, CBOLT
- Technical Specifications - Variables, transitions, behavioral responses
- Data Sources - CPS, PSID, SIPP, SSA data
- Calibration Targets - Validation approach
- Methodology - Technical approach
- Infrastructure - Tools and architecture
- Team - Proposed team leadership
- Roadmap - Development timeline
The same challenge exists in cross-sectional tax modeling: all major models (Tax Policy Center, Penn Wharton, Tax Foundation) rely on the IRS Public Use File, which cannot be publicly shared.
PolicyEngine solved this with the Enhanced CPS (eCPS) - the only publicly available microdata producing accurate tax-benefit impacts:
- CPS base + ML imputation + calibration
- Matches Joint Committee on Taxation revenue estimates
- Matches Tax Policy Center distributional tables
- Fully transparent and reproducible
This project applies the same proven methodology to longitudinal analysis:
- eCPS: CPS + ML + calibration → accurate tax modeling ✓
- This project: CPS + QRF + calibration → accurate Social Security modeling
PolicyEngine-US already captures Social Security rules comprehensively. The main challenge is building a synthetic longitudinal panel with realistic:
- Lifetime earnings trajectories (birth to retirement)
- Demographic transitions (marriage, divorce, fertility, disability, mortality)
- Cross-sectional accuracy (matches current population)
- Longitudinal consistency (realistic earnings mobility)
- Calibration to SSA projections
- Start with Enhanced CPS: ~200,000 individuals, high-quality cross-section
- Impute Earnings Histories: Quantile regression forests trained on PSID
- Model Demographics: Hazard models for transitions (marriage, disability, mortality)
- Calibrate: Gradient descent reweighting to match SSA targets
- Project Forward: Year-by-year aging with continued calibration
- Calculate Benefits: Leverage PolicyEngine-US's existing implementation
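The "Project Forward" step can be illustrated with a minimal sketch. It is not the project's implementation: the `Person` record, the `annual_mortality_hazard` function, and its parameter values are hypothetical, and mortality is applied by shrinking each record's weight by its expected deaths rather than by drawing stochastic transitions. A full model would also cover marriage, divorce, fertility, and disability.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    weight: float  # number of real people this record represents


def annual_mortality_hazard(age: int) -> float:
    """Hypothetical hazard: 1% at age 60, rising 9% per year of age, capped at 1."""
    return min(1.0, 0.01 * 1.09 ** (age - 60))


def project_one_year(people):
    """Age every record by one year and scale its weight by the survival probability."""
    return [
        Person(age=p.age + 1, weight=p.weight * (1.0 - annual_mortality_hazard(p.age)))
        for p in people
    ]


def project(people, years):
    """Return the list of yearly snapshots, starting with the base year."""
    path = [people]
    for _ in range(years):
        people = project_one_year(people)
        path.append(people)
    return path


cohort = [Person(age=60, weight=1000.0), Person(age=70, weight=800.0)]
path = project(cohort, years=5)
totals = [sum(p.weight for p in snapshot) for snapshot in path]
```

Here `totals` holds the weighted population in each projected year. In the pipeline described above, a calibration pass would follow each yearly step so that totals stay aligned with SSA targets.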
Built on PolicyEngine's open-source tools:
- microimpute: Machine learning imputation (quantile regression forests)
- microcalibrate: Gradient descent calibration to targets
- PolicyEngine-US-Data: Enhanced CPS construction pipeline
- PolicyEngine-Core: Microsimulation engine with Social Security rules
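To make "gradient descent calibration to targets" concrete, here is a small self-contained sketch of the idea. It does not use or reproduce the microcalibrate API. The `calibrate` function, the four-record data set, and the two targets (a population count and an earnings total) are made up for illustration and are not SSA figures. Weights are parameterized as `initial_weight * exp(adjustment)` so they stay positive, and the loss is the sum of squared relative errors against the targets.

```python
import math


def calibrate(initial_weights, features, targets, learning_rate=0.5, steps=2000):
    """Reweight records by gradient descent so weighted feature totals approach targets."""
    log_adjust = [0.0] * len(initial_weights)
    for _ in range(steps):
        weights = [w0 * math.exp(a) for w0, a in zip(initial_weights, log_adjust)]
        # Relative error of each weighted total against its target.
        rel_errors = [
            (sum(w * row[j] for w, row in zip(weights, features)) - t) / t
            for j, t in enumerate(targets)
        ]
        # Gradient of the summed squared relative errors w.r.t. each log adjustment.
        for i, (w, row) in enumerate(zip(weights, features)):
            grad = w * sum(2.0 * r * x / t for r, x, t in zip(rel_errors, row, targets))
            log_adjust[i] -= learning_rate * grad
    return [w0 * math.exp(a) for w0, a in zip(initial_weights, log_adjust)]


# Each row: [1.0 (counts the person), annual earnings]. Values are illustrative.
features = [[1.0, 20_000.0], [1.0, 40_000.0], [1.0, 60_000.0], [1.0, 100_000.0]]
targets = [500.0, 30_000_000.0]  # hypothetical population and total-earnings targets
initial_weights = [100.0, 100.0, 100.0, 100.0]

weights = calibrate(initial_weights, features, targets)
estimates = [
    sum(w * row[j] for w, row in zip(weights, features)) for j in range(len(targets))
]
```

Using relative rather than absolute errors keeps targets on very different scales (head counts versus dollar totals) from dominating one another, which is why a single learning rate works for both in this toy example.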
18 months from start to public launch:
- Months 1-3: Proof of concept
- Months 4-6: Full earnings imputation
- Months 7-9: Demographic transitions
- Months 10-12: Benefit calculation and validation
- Months 13-15: Forward projection and calibration
- Months 16-18: Web interface and API deployment
- Max Ghenis: PolicyEngine founder, infrastructure and integration
- Ben Ogorek: PhD Statistics, quantile regression and validation
- John Sabelhaus: Former Fed economist, Social Security expert
This model will enable:
- First open-source dynamic Social Security microsimulation
- Free public access via web interface
- Python API for programmatic analysis
- Full transparency and reproducibility
- Individual-level distributional analysis
- Lifetime benefit calculations across cohorts
- Analysis of reforms (benefit formulas, retirement age, taxation, etc.)
Current: Planning and documentation phase
This repository contains the planning documentation. Code development will begin in Phase 1.
social-security-model/
├── jupyterbook/ # Planning documentation (Jupyter Book)
│ ├── intro.md
│ ├── literature-review.md
│ ├── existing-models.md
│ ├── data-sources.md
│ ├── calibration-targets.md
│ ├── methodology.md
│ ├── infrastructure.md
│ ├── team.md
│ ├── roadmap.md
│ ├── references.bib
│ └── myst.yml
├── README.md # This file
└── pyproject.toml # Python package configuration
After Phase 1 begins, the following directories will be added:
├── data/ # Data preparation scripts
├── imputation/ # Earnings history imputation
├── calibration/ # Reweighting and calibration
├── simulation/ # Projection and benefit calculation
└── tests/ # Test suite
Requires Python 3.13 and MyST:
# Install dependencies
pip install -e ".[dev]"
# Build Jupyter Book
cd jupyterbook
myst build --html
# Start local server
myst start
View at http://localhost:3004
- PolicyEngine-US - US tax-benefit microsimulation
- PolicyEngine-US-Data - Enhanced CPS
- microimpute - ML imputation
- microcalibrate - Survey calibration
- L0 - Sparse reweighting
If you use this model or methodology, please cite:
@misc{policyengine_ss_model,
title={Open-Source Social Security Dynamic Microsimulation Model},
author={Ghenis, Max and Ogorek, Ben and Sabelhaus, John},
year={2025},
publisher={PolicyEngine},
url={https://github.com/PolicyEngine/social-security-model}
}
MIT License - See LICENSE file
- Max Ghenis: [email protected]
- PolicyEngine: https://policyengine.org
- GitHub Issues: For questions and discussion
This project builds on:
- PolicyEngine's open-source infrastructure
- Academic literature on dynamic microsimulation
- SSA's public data and documentation
- PSID and CPS for panel and cross-sectional data
- Open-source machine learning and statistical tools