Experimental project around a CFR+ solver (3-handed NLHE), an ML model approximating the policy, and a Next.js UI to explore the results.
- External-sampling CFR+ (3-handed, minimal-raise) in `cfr_solver.py` relies on `poker_game_expresso.PokerGameExpresso` and compact infoset keys in `infoset.py`.
- The average policy is serialized as compact gzipped JSON (bitmask + quantized values) in `policy/avg_policy.json.gz` and duplicated for the UI in `ui/public/avg_policy.json.gz`.
- A PyTorch model (`ml/model.py`) is trained to approximate this policy with `ml/train.py`. Visualizations (e.g., preflop heatmap) are in `ml/viz.py`.
- The UI (`ui/`) is a Next.js/TypeScript app that loads the policy from `public/avg_policy.json.gz`.
The preflop range visualization (aggregate raise/all-in vs. fold ratio across the 13x13 hand grid).
Interactive live-testing table in the web UI where you can play against the GTO policy.
```bash
# Create a venv and install dependencies (example)
python -m venv .venv
source .venv/bin/activate
pip install torch numpy pandas treys tqdm seaborn matplotlib
```

`cfr_solver.py` trains the solver, saves the policy, and exports a copy for the UI.
```bash
python cfr_solver.py
```

Main outputs:
- `policy/avg_policy.json.gz`
- `ui/public/avg_policy.json.gz`
- `policy/avg_policy.csv` (via `stats_policy.extraction_policy_data()`)
Key parameters (edit in `cfr_solver.py`; see the sketch below):
- `iterations` (default 1_000_000)
- `stacks` (e.g., `(100, 100, 100)`)
- `SAVE_EVERY` for checkpoints (0 = disabled)
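As a rough illustration, a top-level loop might consume these parameters as follows. This is a hypothetical sketch, not the actual `cfr_solver.py` code; `run_one_cfr_iteration` and `save_average_policy` are placeholder names.

```python
# Hypothetical sketch: how the parameters above could drive the solver loop.
def run_one_cfr_iteration(stacks):            # placeholder for the real CFR+ traversal
    pass

def save_average_policy(path):                # placeholder for the real policy export
    pass

iterations = 1_000_000        # total external-sampling CFR+ iterations
stacks = (100, 100, 100)      # starting stacks for the 3 seats
SAVE_EVERY = 50_000           # checkpoint interval; 0 disables checkpoints

for t in range(1, iterations + 1):
    run_one_cfr_iteration(stacks)
    if SAVE_EVERY and t % SAVE_EVERY == 0:
        save_average_policy("policy/avg_policy.json.gz")
```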
```bash
python stats_policy.py   # reads policy/avg_policy.json.gz and writes policy/avg_policy.csv
```

`stats_policy.py` reconstructs action distributions, decodes infoset keys, and produces a CSV for quick exploration.
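For a quick first look at the export, something like the snippet below works. It makes no assumptions about the column layout; whatever `stats_policy.py` writes is simply printed.

```python
import pandas as pd

# Peek at the tabular policy export produced by stats_policy.py.
df = pd.read_csv("policy/avg_policy.csv")
print(df.shape)        # number of rows (infosets) and columns
print(df.head())       # first few decoded entries
```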
```bash
cd ml
python train.py   # reads ../policy/avg_policy.json.gz, trains, and saves trained_policy_model.pth
```

Notes:
- Input: 224-dim one-hot features (phase, role, hand 169, board 31, 3 normalized scalars, hero-vs-board 11)
- Output: 5 canonical actions: `FOLD`, `CHECK`, `CALL`, `RAISE`, `ALL-IN`
- Loss: MSE over distributions (the model outputs a softmax); see the sketch below
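For orientation, a minimal network matching these shapes could look like this. The hidden sizes and layer count are assumptions for the sketch, not a copy of `ml/model.py`.

```python
import torch
import torch.nn as nn

# Minimal sketch: 224-dim one-hot input, 5-action softmax output, MSE on distributions.
class PolicyNet(nn.Module):
    def __init__(self, in_dim: int = 224, n_actions: int = 5, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(x), dim=-1)

model = PolicyNet()
criterion = nn.MSELoss()              # MSE between predicted and target action distributions
x = torch.zeros(8, 224)               # dummy batch of one-hot features
target = torch.full((8, 5), 0.2)      # dummy target distributions (sum to 1 per row)
loss = criterion(model(x), target)
loss.backward()
```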
From `ml/`:

```bash
python viz.py   # produces ml/preflop_heatmap.png using trained_policy_model.pth
```

You can adjust `role_id` and bucketing parameters in `ml/viz.py`.
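To give an idea of the kind of output `viz.py` renders, here is a dummy-data sketch of a 13x13 heatmap. The real script queries `trained_policy_model.pth` for each of the 169 hand classes instead of using random values.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative 13x13 preflop grid filled with placeholder aggression values.
RANKS = list("AKQJT98765432")
grid = np.random.rand(13, 13)   # placeholder: raise/all-in frequency per hand class

ax = sns.heatmap(grid, xticklabels=RANKS, yticklabels=RANKS,
                 cmap="RdYlGn_r", vmin=0.0, vmax=1.0, square=True)
ax.set_title("Preflop aggression by hand class (dummy data)")
plt.savefig("preflop_heatmap_example.png", dpi=150)
```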
The UI lives in `ui/`. It reads `public/avg_policy.json.gz`.

```bash
cd ui
npm install
npm run dev   # start the UI in development mode
```

Key sources in `ui/src/`:
- `lib/policy.ts`, `lib/infoset.ts`, `lib/game.ts`: parsing/logic
- `components/` and `app/`: pages and widgets
Each entry in `avg_policy.json.gz` is:

```
{
  "<infoset_key_u64>": {
    "policy": [bitmask, q1, q2, ...],
    "visits": <int, capped>
  }
}
```

- `bitmask`: which actions (by index) are present
- `q_i`: quantized integers (0..255), sum adjusted to 255
- Rebuild: `prob[action] = q_i / sum(q)` in the order of set bits (see the sketch below)
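A small decoding sketch under the format above; the action-index order is assumed to match the five canonical actions listed in the ML notes.

```python
import gzip
import json

# Decode one entry of avg_policy.json.gz into per-action probabilities.
ACTIONS = ["FOLD", "CHECK", "CALL", "RAISE", "ALL-IN"]   # assumed index order

def decode_entry(compact: list[int]) -> dict[str, float]:
    bitmask, *quantized = compact
    total = sum(quantized) or 1
    probs, qi = {}, iter(quantized)
    for idx, name in enumerate(ACTIONS):
        if bitmask & (1 << idx):             # action idx is present in this infoset
            probs[name] = next(qi) / total   # q_i / sum(q), in set-bit order
    return probs

with gzip.open("policy/avg_policy.json.gz", "rt") as f:
    policy = json.load(f)

key, entry = next(iter(policy.items()))
print(key, decode_entry(entry["policy"]), entry["visits"])
```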
Infoset fields are packed into a u64 (see `infoset.py`):

`PHASE` (3 bits), `ROLE` (2), `HAND` (8, 13x13 index), `BOARD` (5), `POT` (8), `RATIO` (8), `SPR` (8), `HEROBOARD` (4)
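The sketch below packs and unpacks these fields using the bit widths listed above. The field order and bit positions (low-to-high here) are an assumption for illustration; `infoset.py` is authoritative.

```python
# Pack/unpack the infoset fields into a u64 using the widths from the README.
FIELDS = [  # (name, width in bits)
    ("PHASE", 3), ("ROLE", 2), ("HAND", 8), ("BOARD", 5),
    ("POT", 8), ("RATIO", 8), ("SPR", 8), ("HEROBOARD", 4),
]

def pack(values: dict[str, int]) -> int:
    key, shift = 0, 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        key |= v << shift
        shift += width
    return key

def unpack(key: int) -> dict[str, int]:
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (key >> shift) & ((1 << width) - 1)
        shift += width
    return out

example = pack({"PHASE": 0, "ROLE": 1, "HAND": 42, "BOARD": 0,
                "POT": 10, "RATIO": 128, "SPR": 200, "HEROBOARD": 3})
assert unpack(example)["HAND"] == 42
```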
```text
GTO_Bot/
  cfr_solver.py              # CFR+ training, policy export
  poker_game_expresso.py     # 3-handed env + betting/pot logic
  infoset.py                 # Bucketing, u64 pack/unpack, 169 mapping
  policy.py                  # Load/sample compact average policy
  stats_policy.py            # Decode policy -> CSV and stats
  utils.py                   # Hand evaluation (Treys) and range I/O
  ml/
    model.py                 # PyTorch network
    train.py                 # Training pipeline on policy
    viz.py                   # Heatmaps and visualizations
  policy/
    avg_policy.json.gz       # Average policy (solver output)
    avg_policy.csv           # Tabular export
  ui/                        # Next.js/TypeScript app
    public/avg_policy.json.gz  # Policy copy for the UI
```

