Billy - Credit Domino

Production-style credit risk scoring and contagion simulation on the real Prosper lending network
Semi-synthetic graph-credit benchmark built with XGBoost, Neo4j, MLflow, Airflow, dbt, FastAPI, ClickHouse, Prometheus, and Grafana

The Problem

A lender needs to answer two questions at decision time:

Should we approve this loan? — Predict the probability of default using both the applicant's financial profile and their position in a borrower network.
What happens if they default? — Simulate how financial stress cascades through co-borrower, guarantor, and employer relationships — the domino effect of interconnected credit risk.

Traditional credit models treat applicants as independent. In reality, defaults propagate through relationship networks. A borrower whose three co-borrowers just defaulted is categorically different from one whose network is healthy — even if their income and credit score are identical.

The Solution

Credit Domino is a production-grade ML system that combines tabular credit features with graph-derived contagion metrics to score borrowers and simulate systemic risk cascades across a real-world P2P lending network.

Key Results

Metric	Value
ROC-AUC	0.829
Accuracy	75.5%
Precision	63.6%
Recall	77.5%
F1 Score	0.699
Scoring latency (p50)	2.2 ms
Scoring latency (p95)	5.6 ms
Scoring latency (p99)	7.8 ms
Graph scale	89,171 nodes · 3.4M edges
Features	29 (12 tabular + 10 graph + 7 interactions)
Training set	71,336 samples (80/20 stratified)
Test set	17,835 samples

Latency measured across 300 sequential requests against the Dockerized API. p50/p95/p99 are reported for cached (returning-customer) requests. First-seen customers incur a one-time Postgres graph feature lookup (~130 ms).
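
As a rough illustration of how latency percentiles like these can be derived from raw timings, the sketch below applies the nearest-rank method to a list of measurements. The function name and the sample values are hypothetical and are not taken from the project's benchmark script.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (illustrative values only).
latencies_ms = [2.0, 2.1, 2.2, 2.3, 2.4, 3.0, 4.5, 5.6, 6.9, 7.8]
summary = {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
print(summary)
```

With only a handful of samples the upper percentiles collapse onto the maximum, which is one reason a benchmark of a few hundred requests gives more meaningful p95/p99 values.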

Architecture

flowchart TB
    subgraph ORCHESTRATION["⚙️  Orchestration — Airflow 3.1"]
        A1[Load Data] --> A2[Neo4j Load]
        A1 --> A3[ClickHouse Init]
        A2 --> A4[Compute Graph Features]
        A4 --> A5[dbt Build]
        A3 --> A5
        A5 --> A6[Train XGBoost]
        A6 --> A7[Evaluate Model]
        A7 --> A8{AUC ≥ 0.72?}
        A8 -->|Yes| A9[Register champion]
        A8 -->|No| A10[Skip]
        A7 --> A11[Drift Check]
        A7 --> A12[Push Metrics]
    end

    subgraph DATA["🗄️  Data Layer"]
        D1[(Prosper P2P\n89K nodes · 3.4M edges)] --> D2[(PostgreSQL 17.9\nOLTP Store)]
        D2 --> D3[(Neo4j + GDS\nGraph DB)]
        D2 --> D4[dbt 1.11\n3 staging + 2 marts]
        D5[(ClickHouse 26.2\nOLAP Store)]
    end

    subgraph ML["🤖  ML Layer"]
        M1[XGBoost 3.2\n1500 trees · 29 features] --> M2[MLflow 3.10\nExperiment Registry\nchampion alias]
        M1 --> M3[SHAP TreeExplainer\nPer-request top-5]
    end

    subgraph SERVING["🚀  Serving Layer"]
        S1[FastAPI\n/score · /simulate-domino\nDual-write · Prometheus] --> S2[Streamlit 1.55\nScoring · Cascade Viz · Analytics]
    end

    subgraph MONITORING["📊  Monitoring Layer"]
        O1[Prometheus 3.10\nScrape /metrics 15s] --> O2[Grafana 12.3\n8-panel dashboard]
        O3[Evidently 0.7\nKS-test drift detection] --> O1
    end

    ORCHESTRATION --> DATA
    DATA --> ML
    ML --> SERVING
    SERVING --> MONITORING
    S1 -.->|dual-write| D2
    S1 -.->|dual-write| D5
    D3 -.->|graph features| S1
    M2 -.->|champion model| S1
    O3 -.->|drift gauges| O1

    style ORCHESTRATION fill:#fff8f0,stroke:#f97316,stroke-width:2px,color:#1a1a1a
    style DATA fill:#f0f9ff,stroke:#0ea5e9,stroke-width:2px,color:#1a1a1a
    style ML fill:#f5f3ff,stroke:#8b5cf6,stroke-width:2px,color:#1a1a1a
    style SERVING fill:#f0fdf4,stroke:#22c55e,stroke-width:2px,color:#1a1a1a
    style MONITORING fill:#fff1f2,stroke:#f43f5e,stroke-width:2px,color:#1a1a1a

All 9 services start with a single `docker compose up -d` and self-configure: Grafana dashboards are auto-provisioned, ClickHouse materialized views are created by the DAG, and the API hot-reloads the champion model on promotion.

What Makes This Different

1. Graph features are first-class citizens, not afterthoughts

The model uses 10 graph-derived features computed from the real Prosper P2P lending network (89,171 borrowers, 3.4M directed edges from KONECT):

Feature	Computation	Signal
`neighbor_default_frac`	1-hop default rate	Direct contagion exposure
`neighbor_default_frac_2hop`	2-hop default rate	Second-order network risk
`degree` / `in_degree` / `out_degree`	Degree centrality	Network connectivity
`norm_in_degree` / `norm_out_degree`	Normalized degree	Relative network position
`pagerank`	PageRank centrality	Systemic importance
`distance_to_prior_default`	Multi-source BFS	Proximity to known defaults
`clustering_coefficient`	Local transitivity	Network density around borrower
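
To make two of these computations concrete, here is a minimal pure-Python sketch of the 1-hop default rate and the multi-source BFS distance. It assumes an undirected adjacency dictionary and a set of defaulted node ids; the function names and the toy graph are illustrative, and the project itself is described as using Neo4j GDS or NetworkX rather than this code.

```python
from collections import deque


def neighbor_default_frac(adj, defaulted, node):
    """Share of a node's direct neighbors that have defaulted (0.0 if it has none)."""
    neighbors = adj.get(node, set())
    if not neighbors:
        return 0.0
    return sum(1 for n in neighbors if n in defaulted) / len(neighbors)


def distance_to_prior_default(adj, defaulted):
    """Multi-source BFS: hop count from each reachable node to its nearest defaulted node."""
    dist = {n: 0 for n in defaulted}
    queue = deque(defaulted)
    while queue:
        current = queue.popleft()
        for nxt in adj.get(current, ()):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


# Toy undirected graph (hypothetical): C - A - B - D - E, with C already defaulted.
adj = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
defaulted = {"C"}

frac_a = neighbor_default_frac(adj, defaulted, "A")
distances = distance_to_prior_default(adj, defaulted)
print(frac_a, distances)
```

Seeding the queue with every defaulted node at distance 0 gives all distances in a single traversal instead of one BFS per defaulted borrower.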

Plus 7 engineered interaction features that capture cross-domain signals:

Feature	Formula	Intuition
`norm_in_x_dti`	`norm_in_degree × loan_percent_income`	High DTI + high inbound connections = contagion amplifier
`norm_in_x_nbr_def`	`norm_in_degree × neighbor_default_frac`	Connected borrowers with defaulting neighbors
`dti_x_cb_default`	`loan_percent_income × cb_person_default_on_file`	Over-leveraged repeat defaulters
`crisis_exposure`	`neighbor_default_frac × loan_percent_income × (1 + cb_default)`	Composite systemic risk score
`degree_ratio`	`in_degree / (out_degree + 1)`	Asymmetric lending relationships
`dti`	`loan_amnt / (person_income + 1)`	Debt-to-income ratio
`is_recent_default`	Binary flag	Recency signal
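
The formulas in this table translate directly into code. The sketch below assumes a single borrower record held in a dictionary keyed by the feature names used in this README, with `cb_person_default_on_file` already encoded as 0 or 1. The function name is hypothetical, and `is_recent_default` is omitted because the table gives no formula for it.

```python
def interaction_features(row):
    """Compute the formula-based interaction features for one borrower record."""
    cb_default = row["cb_person_default_on_file"]  # assumed to be encoded as 0/1
    return {
        "norm_in_x_dti": row["norm_in_degree"] * row["loan_percent_income"],
        "norm_in_x_nbr_def": row["norm_in_degree"] * row["neighbor_default_frac"],
        "dti_x_cb_default": row["loan_percent_income"] * cb_default,
        "crisis_exposure": (
            row["neighbor_default_frac"] * row["loan_percent_income"] * (1 + cb_default)
        ),
        "degree_ratio": row["in_degree"] / (row["out_degree"] + 1),
        "dti": row["loan_amnt"] / (row["person_income"] + 1),
    }


# Hypothetical borrower record (illustrative values only).
borrower = {
    "norm_in_degree": 0.5,
    "neighbor_default_frac": 0.25,
    "loan_percent_income": 0.2,
    "cb_person_default_on_file": 1,
    "in_degree": 4,
    "out_degree": 1,
    "loan_amnt": 9999,
    "person_income": 9998,
}
features = interaction_features(borrower)
print(features)
```

The `+ 1` terms in `degree_ratio` and `dti` keep the denominators non-zero for borrowers with no outgoing edges or zero recorded income.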

2. SHAP explainability on every prediction

Every scoring response includes the top-5 SHAP factors (TreeExplainer) showing why the model scored this borrower the way it did. This isn't a batch report — it's per-request, sub-10ms explainability in production.
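
One plausible way to reduce a full SHAP vector to the top-5 factors is to rank features by absolute contribution. The sketch below assumes the per-feature SHAP values are already available as a dictionary and does not call the SHAP library itself. The first five values are copied from the sample scoring response in this README; the last two are hypothetical fillers, and the function name is illustrative.

```python
def top_factors(shap_values, k=5):
    """Return the k features with the largest absolute SHAP contribution."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [{"feature": name, "shap_value": round(value, 3)} for name, value in ranked[:k]]


shap_values = {
    "in_degree": -0.249,
    "crisis_exposure": -0.517,
    "neighbor_default_frac_2hop": -0.177,
    "neighbor_default_frac": -0.352,
    "norm_in_x_dti": -0.195,
    "pagerank": 0.012,   # hypothetical filler value
    "person_age": -0.004,  # hypothetical filler value
}
factors = top_factors(shap_values)
print(factors)
```

Ranking by absolute value keeps strongly risk-reducing factors (negative SHAP values) in the explanation alongside risk-increasing ones.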

SHAP Interpretation Guide

The SHAP summary plot reveals how each feature drives model predictions:

crisis_exposure (top feature) — The composite systemic risk score. High values (red, right side) strongly push predictions toward default. A single-feature systemic risk indicator that combines network contagion with financial leverage.
neighbor_default_frac — Direct neighborhood default rate. When a borrower's immediate neighbors are defaulting (high values = red), the model aggressively predicts default. This is the core "domino effect" the system is designed to detect.
norm_in_x_dti — The interaction between normalized in-degree and debt-to-income ratio. Borrowers who are both highly connected and over-leveraged carry disproportionate risk. This interaction wouldn't be captured by either feature alone.
in_degree / norm_in_degree — Network connectivity. Counterintuitively, higher in-degree (more borrowers depending on you) reduces predicted default risk — well-connected borrowers tend to be more established.
cb_person_default_on_file — Prior default history. Clean binary signal with strong rightward push when present.
loan_percent_income — Loan-to-income ratio. High values linearly increase predicted default probability.
pagerank — Systemic importance in the lending network. Low PageRank borrowers (blue) tend toward higher default risk — peripheral network nodes are less stable.

The key insight: 5 of the top 10 most important features are graph-derived or graph-interaction features, validating the hypothesis that relationship network structure carries material signal for credit risk prediction.

3. Domino contagion simulation

Beyond individual scoring, the system simulates cascade propagation through the borrower network:

curl -X POST http://localhost:8000/simulate-domino \
  -H "Content-Type: application/json" \
  -d '{"trigger_customer_id":"PROSPER_0","initial_shock":1.0,"decay":0.6,"threshold":0.3,"max_hops":3}'

{
  "trigger_customer_id": "PROSPER_0",
  "total_affected": 10,
  "total_fallen": 2,
  "max_hop": 2,
  "cascade": [
    {"customer_id": "PROSPER_0", "hop": 0, "stress": 1.0, "fallen": true},
    {"customer_id": "PROSPER_1", "hop": 1, "stress": 0.346, "fallen": true},
    {"customer_id": "PROSPER_237", "hop": 2, "stress": 0.17, "fallen": false}
  ]
}

The simulation uses BFS-based stress propagation with:

Edge-type weighting — co-borrower links transmit more stress than employer links
DTI-based vulnerability — over-leveraged borrowers amplify incoming stress
Configurable decay — stress attenuates with network distance
Threshold-based default — nodes "fall" when accumulated stress exceeds their resilience
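
The four mechanics above can be sketched as a short breadth-first traversal. This is a simplified illustration under stated assumptions, not the project's implementation: the edge-type weights, the `1 + dti` vulnerability multiplier, the cap at 1.0, the rule that only fallen nodes keep propagating, and the single visit per node (no accumulation from multiple sources) are all assumptions made for the example.

```python
from collections import deque

# Assumed weights; the README only states that co-borrower links transmit more than employer links.
EDGE_WEIGHTS = {"co_borrower": 1.0, "guarantor": 0.8, "employer": 0.4}


def simulate_domino(edges, dti, trigger, initial_shock=1.0, decay=0.6, threshold=0.3, max_hops=3):
    """edges: {node: [(neighbor, edge_type), ...]}; dti: {node: debt-to-income ratio}."""
    cascade = [{"customer_id": trigger, "hop": 0, "stress": initial_shock,
                "fallen": initial_shock >= threshold}]
    seen = {trigger}
    queue = deque([(trigger, 0, initial_shock)])  # the trigger always propagates
    while queue:
        node, hop, stress = queue.popleft()
        if hop >= max_hops:
            continue
        for neighbor, edge_type in edges.get(node, []):
            if neighbor in seen:
                continue  # simplification: each node is stressed once, on first contact
            seen.add(neighbor)
            vulnerability = 1.0 + dti.get(neighbor, 0.0)  # assumed amplification form
            received = min(1.0, stress * decay * EDGE_WEIGHTS[edge_type] * vulnerability)
            fallen = received >= threshold
            cascade.append({"customer_id": neighbor, "hop": hop + 1,
                            "stress": round(received, 3), "fallen": fallen})
            if fallen:
                queue.append((neighbor, hop + 1, received))
    return cascade


# Hypothetical toy network.
edges = {"T": [("A", "co_borrower"), ("B", "employer")], "A": [("C", "guarantor")]}
dti = {"A": 0.2, "B": 0.0, "C": 0.1}
cascade = simulate_domino(edges, dti, trigger="T")
print(cascade)
```

In this toy run the co-borrower `A` falls and passes stress on to `C`, while the employer-linked `B` absorbs the shock without falling, so the cascade stops on that branch.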

Model Details

Champion Model (MLflow Run `a8b97ef8`)

Parameter	Value
Algorithm	XGBoost (gradient boosted trees)
Trees	1,500
Max depth	7
Learning rate	0.03
Subsample	0.8
Min child weight	3
Gamma	0.05
Scale pos weight	1.73 (class imbalance correction)
Optimal threshold	0.441 (F1-maximized via precision-recall curve)
Total features	29
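
For readers unfamiliar with F1-maximized thresholds, the sketch below scans every distinct score as a candidate cut-off and keeps the one with the highest F1. It is a pure-Python illustration with hypothetical labels and scores; the project is described as deriving its threshold from a precision-recall curve, which this brute-force scan only approximates in spirit. The helper also reproduces the reported F1 from the reported precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximizing F1 when predicting positive for score >= threshold."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        if tp == 0:
            continue
        f1 = f1_score(tp / (tp + fp), tp / (tp + fn))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical labels and model scores (illustrative only).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.90]
threshold, f1 = best_f1_threshold(y_true, scores)

# Consistency check against Key Results: precision 63.6% and recall 77.5% give F1 of about 0.699.
reported_f1 = round(f1_score(0.636, 0.775), 3)
print(threshold, f1, reported_f1)
```

A threshold below 0.5, like the 0.441 reported above, trades some precision for recall, which is often the preferred direction when missed defaults are costlier than declined good borrowers.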

Feature Categories

Category	Count	Features
Tabular	12	`person_age`, `person_income`, `person_home_ownership`, `person_emp_length`, `loan_intent`, `loan_grade`, `loan_amnt`, `loan_int_rate`, `loan_percent_income`, `cb_person_default_on_file`, `cb_person_cred_hist_length`, `is_recent_default`
Graph	10	`degree`, `in_degree`, `out_degree`, `norm_in_degree`, `norm_out_degree`, `pagerank`, `distance_to_prior_default`, `clustering_coefficient`, `neighbor_default_frac`, `neighbor_default_frac_2hop`
Interactions	7	`norm_in_x_dti`, `norm_in_x_nbr_def`, `dti_x_cb_default`, `crisis_exposure`, `degree_ratio`, `dti`, `is_recent_default`

Model Selection Rationale

Three graph embedding approaches were evaluated for hybrid models:

Method	Approach	Result
Spectral (Laplacian Eigenmaps)	Truncated SVD on normalized Laplacian → XGBoost	+0.0002 AUC over vanilla (negligible — hand-crafted graph features already capture the signal)
Node2Vec	Random walk skip-gram	Infeasible at 89K × 10 walks × 20 steps (150M+ pairs)
GraphSAGE	Supervised GNN with mini-batch sampling	AUC 0.50 (random) due to oversmoothing on avg-degree-76 graph

Conclusion: Hand-crafted graph features (neighbor default fractions, 2-hop contagion, PageRank) outperform learned embeddings on this graph topology. The dense, high-degree Prosper network causes GNN oversmoothing, while spectral embeddings capture redundant information. This is consistent with findings from Shchur et al., 2019 on the limitations of GNNs on dense graphs.

The production model uses XGBoost with hand-crafted features — the approach that delivers the best AUC with deterministic, interpretable feature attributions via SHAP.

API Performance

Benchmarked with 300 sequential requests against the Dockerized API (`docker compose up`).

Metric	Value
Throughput	~450 req/s (cached)
p50 latency	2.2 ms
p95 latency	5.6 ms
p99 latency	7.8 ms
First-request latency	~130 ms (Postgres graph feature lookup)
Health check	< 4 ms
Dual-write overhead	Postgres + ClickHouse per scoring event

Scoring API Response

{
  "scoring_event_id": "29a7fa3a-abec-4260-87d3-e0638d99fb3e",
  "customer_id": "PROSPER_0",
  "risk_score": 0.151,
  "decision_band": "low",
  "top_factors": [
    {"feature": "crisis_exposure", "shap_value": -0.517},
    {"feature": "neighbor_default_frac", "shap_value": -0.352},
    {"feature": "in_degree", "shap_value": -0.249},
    {"feature": "norm_in_x_dti", "shap_value": -0.195},
    {"feature": "neighbor_default_frac_2hop", "shap_value": -0.177}
  ],
  "scored_at": "2026-03-08T16:15:34.744506+00:00"
}

Every response is dual-written to:

PostgreSQL — OLTP persistence with indexed lookups
ClickHouse — OLAP analytics with SummingMergeTree materialized views for pre-aggregated hourly metrics

Infrastructure

Services (9 containers, single `docker compose up`)

Service	Version	Port	Role
FastAPI	—	8000	Scoring API + domino simulation + Prometheus metrics
Streamlit	1.55	8501	Interactive dashboard (scoring, cascade viz, analytics)
MLflow	3.10.1	5001	Experiment tracking + model registry with `@champion` alias
Airflow	3.1.7	8080	Pipeline orchestration (12-task DAG with quality gate)
Neo4j	2026.02.2	7474	Graph database + GDS (PageRank, degree centrality)
PostgreSQL	17.9	5433	OLTP store (customers, relationships, graph features, scoring events)
ClickHouse	26.2.4	8123	OLAP analytics (scoring events + materialized hourly aggregation)
Prometheus	3.10.0	9090	Metrics collection (scrapes API `/metrics` every 15s)
Grafana	12.3.0	3000	Pre-provisioned 8-panel monitoring dashboard

Airflow DAG

12-task DAG with conditional branching:

load_and_prepare_data
    ├──▸ load_neo4j ──▸ compute_neo4j_features ──┐
    └──▸ init_clickhouse_tables ──────────────────┤
                                                  ▼
                                              dbt_build
                                                  │
                                            train_model
                                                  │
                                           evaluate_model
                                          ╱       │       ╲
                              check_quality   drift_check   push_metrics
                               ╱        ╲
                     register @champion  skip
                               ╲        ╱
                                 notify

Key design decisions:

Model is trained once, evaluated by reading MLflow metrics (no re-training)
Quality gate: only promoted to @champion if AUC ≥ 0.72
The API hot-reloads the champion model on promotion — zero-downtime deployment
Graph features computed via Neo4j GDS in production, NetworkX in dev/CI (dual backend)
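
The quality gate reduces to a small branching decision. The sketch below shows that decision as a plain function; in Airflow it would typically be the body of a branching task (for example one decorated with `@task.branch`) that returns the id of the task to run next. The task ids and function name here are hypothetical, and only the 0.72 threshold comes from this README.

```python
AUC_GATE = 0.72  # quality gate stated in this README


def choose_branch(auc, gate=AUC_GATE):
    """Return the id of the downstream task to follow, based on the evaluated AUC."""
    # Task ids are illustrative; the real DAG may name them differently.
    return "register_champion" if auc >= gate else "skip_registration"


print(choose_branch(0.829))  # the champion's reported ROC-AUC passes the gate
print(choose_branch(0.70))   # a weaker candidate would be skipped
```

Because evaluation reads the AUC from MLflow instead of retraining, the gate stays cheap and the promoted artifact is exactly the one that was evaluated.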

Monitoring Stack

Grafana dashboard (auto-provisioned, zero config):

Panel	Metric	Source
Request rate	`rate(http_requests_total{handler="/score"}[5m])`	Prometheus
Latency p95	`histogram_quantile(0.95, ...)`	Prometheus
Error rate	5xx / total ratio	Prometheus
Risk band distribution	`credit_domino_scores_total` by band	Prometheus
API uptime	`up{job="credit-domino-api"}`	Prometheus
Model ROC-AUC	`credit_domino_model_auc` gauge	API → Prometheus
Drifted columns	`credit_domino_drift_columns`	Evidently → Prometheus
Drift share	`credit_domino_drift_share`	Evidently → Prometheus

Drift detection: Evidently runs Kolmogorov-Smirnov tests on numeric features after each training cycle. Results are pushed to Prometheus via the API's /monitoring/drift endpoint, visible as Grafana gauges with alerting thresholds.
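
To show what a Kolmogorov-Smirnov drift check measures, here is a pure-Python sketch that compares reference and current samples per column and summarizes the result. It does not use Evidently's API; the function names, the 0.05 significance level, the large-sample critical-value approximation, and the toy data are assumptions for illustration.

```python
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def is_drifted(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the asymptotic critical value at level alpha."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


def drift_summary(reference_cols, current_cols, alpha=0.05):
    """Count drifted columns and their share, mirroring the two drift gauges listed above."""
    drifted = [c for c in reference_cols
               if is_drifted(reference_cols[c], current_cols[c], alpha)]
    return {"drifted_columns": len(drifted), "drift_share": len(drifted) / len(reference_cols)}


# Hypothetical data: one stable column and one shifted by 50 units.
reference_cols = {"person_age": list(range(100)), "loan_amnt": list(range(100))}
current_cols = {"person_age": list(range(100)), "loan_amnt": [x + 50 for x in range(100)]}
summary = drift_summary(reference_cols, current_cols)
print(summary)
```

The KS test is distribution-free, which suits a mixed bag of numeric features, but it only looks at each column on its own and will not notice changes in how features relate to each other.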

Data Pipeline (dbt)

Layer	Model	Materialization	Purpose
Staging	`stg_customers`	View	Type-cast, filter (age > 18, income > 0)
Staging	`stg_relationships`	View	Edge validation, self-loop removal
Staging	`stg_graph_features`	View	Type-cast graph metrics
Mart	`fct_credit_features`	Table	Joined customer + graph features with risk bands
Mart	`fct_scoring_log`	Incremental	Append-only scoring event log (CDC-ready)

Schema enforcement via schema.yml: uniqueness, not-null, accepted values. Custom macro interest_rate_risk_band for SQL-level risk categorization.

Quick Start

Prerequisites

Docker & Docker Compose
Python 3.13+ (for local dev)

1. Start everything (Docker)

git clone https://github.com/<your-username>/credit-domino && cd credit-domino
docker compose up -d --build           # 9 services (~3 min first build)
docker compose ps -a                    # verify all healthy

2. Trigger the pipeline

Open the Airflow UI at localhost:8080. The username is `admin`; to find the generated password, run `docker compose logs airflow | grep password`. Enable the `credit_domino_pipeline` DAG and trigger it. The DAG will:

Load 89K borrowers from the Prosper P2P dataset
Load the graph into Neo4j
Compute graph features (degree, PageRank, neighbor default rates)
Run dbt (staging + marts)
Train XGBoost (1,500 trees, 29 features)
Evaluate and promote to @champion if AUC ≥ 0.72
Run drift detection and push metrics to Prometheus

3. Score a borrower

curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
    "customer_id": "PROSPER_42",
    "person_age": 35,
    "person_income": 60000,
    "person_home_ownership": "RENT",
    "person_emp_length": 5,
    "loan_intent": "PERSONAL",
    "loan_grade": "B",
    "loan_amnt": 10000,
    "loan_int_rate": 11.5,
    "loan_percent_income": 0.17,
    "cb_person_default_on_file": 0,
    "cb_person_cred_hist_length": 8
  }'
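
The same request can be issued from Python with only the standard library. This is a minimal sketch that assumes the API is reachable at the URL used in the curl example; the helper names are hypothetical, and the network call is wrapped in a function so that nothing is sent until `score(payload)` is called explicitly.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/score"  # same endpoint as the curl example


def build_score_request(payload, url=API_URL):
    """Build a JSON POST request for the scoring endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def score(payload, url=API_URL, timeout=5.0):
    """Send the request and return the decoded JSON response (requires a running API)."""
    with urllib.request.urlopen(build_score_request(payload, url), timeout=timeout) as resp:
        return json.load(resp)


payload = {
    "customer_id": "PROSPER_42",
    "person_age": 35,
    "person_income": 60000,
    "person_home_ownership": "RENT",
    "person_emp_length": 5,
    "loan_intent": "PERSONAL",
    "loan_grade": "B",
    "loan_amnt": 10000,
    "loan_int_rate": 11.5,
    "loan_percent_income": 0.17,
    "cb_person_default_on_file": 0,
    "cb_person_cred_hist_length": 8,
}
request = build_score_request(payload)
print(request.get_method(), request.full_url)
```

Once the stack is up, `score(payload)` should return a dictionary shaped like the sample scoring response shown earlier in this README.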

4. Explore

What	Where
Interactive scoring + cascade viz	localhost:8501 (Streamlit)
API documentation	localhost:8000/docs (Swagger UI)
Model experiments	localhost:5001 (MLflow)
Graph exploration	localhost:7474 (Neo4j Browser)
Monitoring dashboard	localhost:3000 (Grafana, `admin`/`admin`)
Pipeline runs	localhost:8080 (Airflow)
Metrics endpoint	localhost:8000/metrics (Prometheus format)

Local development

conda create -n credit-domino python=3.13 -y && conda activate credit-domino
pip install -e ".[dev]"

python scripts/prepare_credit_data.py   # CSV → Postgres + graph features
cd dbt && dbt build --profiles-dir . --project-dir . && cd ..
python scripts/train_model.py           # Train, evaluate, register @champion
uvicorn credit_domino.api:app --reload --port 8000

Testing

61 tests · 8 modules · ~10 seconds

pytest tests/ -v                        # run all tests
ruff check src/ tests/                  # lint
ruff format --check src/ tests/         # format check

Module	Tests	What's Covered
API	8	Request/response contracts, validation errors, health/ready probes
Data loading	12	Column integrity, null handling, outlier filtering, deterministic graph generation
dbt	4	Schema compilation, incremental config, test presence
Graph features	7	Feature completeness, type correctness, BFS distance, determinism
Modeling	14	Feature assembly, label encoding, training convergence, SHAP computation
Drift monitoring	2	Report structure, feature column coverage
Simulation	8	Cascade propagation, decay mechanics, threshold behavior, star graph topology
Smoke	6	Config loading, structured logging, module importability

CI pipeline (GitHub Actions): Lint → Test (with Postgres service container) → dbt compile → Docker build.

Tech Stack

Layer	Tool	Why This Tool
OLTP	PostgreSQL 17.9	Source-of-truth for borrower profiles, relationships, and scoring events. Every prediction is persisted with indexed customer lookups.
OLAP	ClickHouse 26.2	Scoring events are dual-written for sub-second analytical queries. `SummingMergeTree` with materialized views provides pre-aggregated hourly metrics without impacting API latency at scale.
Graph	Neo4j 2026.02 + GDS	PageRank and degree centrality on the 89K-node, 3.4M-edge Prosper graph. GDS provides scalable graph algorithms; a dual NetworkX backend enables CI testing without Neo4j.
Transforms	dbt 1.11	SQL-based transformations with schema enforcement (`schema.yml` tests), incremental materialization for the scoring log, and built-in data lineage.
ML	XGBoost 3.2 + SHAP	Gradient boosted trees with per-prediction explainability via TreeExplainer. Every API response includes the top-5 features that drove the risk score.
Registry	MLflow 3.10	Experiment tracking with artifact persistence (models, label encoders, SHAP plots). Alias-based promotion (`@champion`) enables zero-downtime model updates.
Orchestration	Airflow 3.1	12-task DAG with conditional branching: the model is only promoted if AUC passes the quality gate. XCom-based data passing between tasks.
Serving	FastAPI	Pydantic validation, auto-generated OpenAPI docs, Prometheus instrumentation via middleware. Non-root Docker container with health/readiness probes.
Dashboard	Streamlit 1.55	Interactive scoring form, pyvis network visualization of cascade propagation, ClickHouse-backed analytics tab.
Monitoring	Prometheus 3.10 + Grafana 12.3	8-panel auto-provisioned dashboard. Custom gauges for model AUC and drift metrics, pushed from the Airflow DAG through the API.
Drift	Evidently 0.7	Kolmogorov-Smirnov tests on numeric features. Integrated into the Airflow DAG and pushed to Prometheus for Grafana alerting.
CI	GitHub Actions	4-stage pipeline: lint → test (61 tests, Postgres service container) → dbt compile → Docker build.

Project Structure

├── src/credit_domino/              # Core Python package (~3,000 lines)
│   ├── api/                        #   FastAPI: scoring, simulation, monitoring endpoints
│   ├── dashboard/                  #   Streamlit: 3-tab interactive UI
│   ├── data/                       #   Prosper P2P loader + synthetic data generator
│   ├── graph/                      #   Graph features (NetworkX + Neo4j dual backend)
│   ├── modeling/                   #   Train, evaluate, register (MLflow integration)
│   ├── monitoring/                 #   Evidently drift detection
│   └── simulation/                 #   BFS domino cascade with stress propagation
│
├── dbt/                            # dbt project
│   ├── models/staging/             #   3 staging views (type-cast, validate)
│   ├── models/marts/               #   2 mart tables (features, scoring log)
│   └── macros/                     #   risk_band SQL macro
│
├── airflow/dags/                   # 12-task DAG with quality gate branching
├── tests/                          # 61 pytest tests across 8 modules
├── scripts/                        # CLI tools: prepare data, train, demo walkthrough
├── infra/                          # Docker configs, Postgres init, Prometheus, Grafana
├── .github/workflows/ci.yml        # CI: lint → test → dbt → docker build
└── docker-compose.yml              # 9-service stack (single command startup)

Production Readiness

Concern	Implementation
Security	Non-root Docker user (`appuser`), multi-stage build, no build tools in runtime image
Health checks	`/health` (liveness) + `/ready` (readiness, 503 if model not loaded) on every container
Zero-downtime deploys	MLflow `@champion` alias — model promotion doesn't require container restart
Observability	Structured JSON logging (structlog), Prometheus metrics, Grafana dashboards
Data quality	dbt schema tests (unique, not_null, accepted_values), quality gate in Airflow DAG
Reproducibility	Seed-controlled data generation, MLflow experiment tracking with full parameter logging

License

MIT
