records/track_10min_16mb/2026-04-01_ApproachH_FocalLoss/README.md (new file, 35 additions)
# Approach H: Focal Loss

## Summary

Builds on Approach B (Int5 GPTQ + 33.6M params + SWA + XSA + VE) by replacing
standard cross-entropy with focal loss during training.

Focal loss: `loss = (1 - p_correct)^gamma * CE_loss`, where `p_correct` is the
model's probability on the target token and gamma=2.0 (configurable via the
`FOCAL_GAMMA` env var). This down-weights easy tokens the model already predicts
well and focuses the gradient signal on hard tokens.
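As a sketch in plain Python (not the repo's actual `torch` code), with `ce` the per-token cross-entropy in nats so that `p_correct = exp(-ce)`:

```python
import math

def focal_loss_mean(ce_values, gamma=2.0):
    """Mean focal loss over per-token cross-entropy values (in nats).

    Since p_correct = exp(-ce), the weight (1 - p_correct)^gamma is
    near 0 for easy tokens (small ce) and near 1 for hard tokens (large ce).
    """
    weighted = [(1.0 - math.exp(-ce)) ** gamma * ce for ce in ce_values]
    return sum(weighted) / len(weighted)
```

With gamma=0.0 every weight is 1 and the expression reduces to the plain mean cross-entropy.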

Inspired by PR #1180, which achieved 1.0577 BPB using a P2 loss `(1-p)^2` among other
techniques (residual mixing, conv token mixer, wallclock-aware warmdown).

## Key changes from Approach B

1. **Focal loss** (training only): replaces `F.cross_entropy(..., reduction="mean")`
with `((1 - exp(-ce))^gamma * ce).mean()` in the model's `forward()` method.
2. `FOCAL_GAMMA` env var (default 2.0, set to 0.0 for standard CE).

No eval changes. No architecture changes. No artifact size impact.
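A hedged sketch of the training-only switch described above, for a single token's raw logits; the function name and env-var handling are illustrative, not the exact code in `train_gpt.py`:

```python
import math
import os

def focal_ce_from_logits(logits, target, gamma=None):
    """Focal cross-entropy for one token from raw logits.

    gamma defaults to the FOCAL_GAMMA env var (2.0 if unset);
    gamma=0.0 recovers standard cross-entropy.
    """
    if gamma is None:
        gamma = float(os.environ.get("FOCAL_GAMMA", "2.0"))
    # log-sum-exp with max-shift for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    ce = log_z - logits[target]               # -log p_correct
    return (1.0 - math.exp(-ce)) ** gamma * ce
```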

## Configuration

```bash
FOCAL_GAMMA=2.0 # default; 0.0 = standard CE
```

All other hyperparameters unchanged from Approach B defaults.

## Expected outcome

Focal loss should improve BPB by focusing training on hard tokens. The
down-weighting factor `(1-p)^2` is strongest early in training when many tokens
are hard, and naturally relaxes as the model improves.
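To make the relaxation concrete, the gamma=2 weight at a few confidence levels (illustrative numbers, not measured from a run):

```python
# weight = (1 - p_correct)^2 for gamma = 2.0
for p in (0.10, 0.50, 0.90, 0.99):
    print(f"p_correct={p:.2f}  weight={(1 - p) ** 2:.4f}")
# p_correct=0.10  weight=0.8100
# p_correct=0.50  weight=0.2500
# p_correct=0.90  weight=0.0100
# p_correct=0.99  weight=0.0001
```

As the model improves and `p_correct` rises, the gradient weight falls off quadratically, which is the relaxation noted above.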
records/track_10min_16mb/2026-04-01_ApproachH_FocalLoss/run.sh (new file, 12 additions)
#!/bin/bash
# Approach H: Focal Loss training run
# Base: ApproachB (Int5 GPTQ + 33.6M params) + focal loss (gamma=2.0)

set -euo pipefail

export NCCL_IB_DISABLE=1
export RUN_ID=approach_h_focal
export FOCAL_GAMMA=2.0

# All other hyperparams inherit from ApproachB defaults in Hyperparameters class
torchrun --standalone --nproc_per_node=8 train_gpt.py 2>&1 | tee train.log
(JSON metadata file, new, 11 additions; filename not shown in the diff)
{
"author": "Alex Ibarra",
"github_id": "elninja",
"name": "ApproachH_FocalLoss",
"blurb": "Focal loss (gamma=2) on top of Int5 GPTQ + 33.6M params + SWA + XSA + VE. Down-weights easy tokens to focus training on hard tokens.",
"date": "2026-04-01",
"val_loss": null,
"val_bpb": null,
"bytes_total": null,
"bytes_code": null
}