@@ -0,0 +1,38 @@
# ConsensusWindow Bypass (FAT-Golf)

Adds a depthwise causal convolution bypass path to the SOTA baseline, derived from the ORC FAT-AR architecture (Factorized Attention Transformer for Autoregressive generation).

## Changes from baseline (abaybektursun, 1.1194 BPB)

Two additions (~90 lines of new code, ~47K params, ~0.2% of ~22M model):

1. **ConsensusWindowEmbed**: replaces SmearGate (1-token lookback, 512 params) with a depthwise causal conv1d (16-token receptive field, ~9K params). Learns per-channel weighted sum over local context at the embedding level.

2. **ConsensusBlockBypass** on deepest 4 layers: gated parallel path alongside attention. Each block gets a depthwise causal conv that processes the same normed input as attention, with a per-dimension sigmoid gate (initialized 80% attention / 20% bypass) blending the outputs.
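The two components above can be sketched in PyTorch. This is a hypothetical reconstruction from the description, not the PR's actual code; class names match the README, but the window size, init details, and tensor layout are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsensusWindowEmbed(nn.Module):
    """Depthwise causal conv over embeddings (sketch; defaults assumed)."""

    def __init__(self, dim: int, window: int = 16):
        super().__init__()
        self.window = window
        # groups=dim makes the conv depthwise: one filter per channel,
        # i.e. a learned per-channel weighted sum over the local window.
        self.conv = nn.Conv1d(dim, dim, kernel_size=window, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        h = x.transpose(1, 2)               # (batch, dim, seq)
        h = F.pad(h, (self.window - 1, 0))  # left-pad only => causal
        return self.conv(h).transpose(1, 2)


class ConsensusBlockBypass(nn.Module):
    """Gated parallel path: blends attention output with a depthwise
    causal conv of the same normed input (sketch)."""

    def __init__(self, dim: int, window: int = 16, attn_frac: float = 0.8):
        super().__init__()
        self.conv = ConsensusWindowEmbed(dim, window)
        # sigmoid(gate) starts near attn_frac (80% attention / 20% bypass)
        init = torch.logit(torch.tensor(attn_frac)).item()
        self.gate = nn.Parameter(torch.full((dim,), init))

    def forward(self, x_normed: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate)  # per-dimension blend weight
        return g * attn_out + (1.0 - g) * self.conv(x_normed)
```

The left-only padding is what makes the conv causal: position `t` only ever sees positions `t - window + 1` through `t`.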

Everything else is identical: Muon, parameter banking, int6 QAT, EMA/SWA, BigramHash, XSA, Partial RoPE, LN Scale, VE, LeakyReLU(0.5)^2, TTT.

## Status

**Small-scale results only** — awaiting H100 compute for full-scale validation.

Tested at 256d, 6 layers, 500 steps on a single 4060 Ti 8GB.

### Results (3-seed means)

| Metric               | Baseline | Ours   | Delta  |
|----------------------|----------|--------|--------|
| Pre-EMA val_bpb      | 2.3477   | 2.3208 | -0.027 |
| Post-EMA+int6 val_bpb| 2.4185   | 2.3438 | -0.075 |

Key finding: the combined architecture produces weights far more robust to EMA averaging and int6 quantization (EMA+quant penalty +0.023 vs baseline's +0.071). Neither component alone beats baseline post-quantization — they must be combined for the synergistic effect.
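The penalty figures follow directly from the reported 3-seed means:

```python
# Recompute deltas and EMA+int6 penalties from the reported means.
base_pre, ours_pre = 2.3477, 2.3208
base_post, ours_post = 2.4185, 2.3438

delta_pre = ours_pre - base_pre      # -0.0269 (reported as -0.027)
delta_post = ours_post - base_post   # -0.0747 (reported as -0.075)

penalty_base = base_post - base_pre  # +0.0708 (reported as +0.071)
penalty_ours = ours_post - ours_pre  # +0.0230 (reported as +0.023)
```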

## Environment variables

```
CONSENSUS_WINDOW_SIZE=32 # Conv1d receptive field (0 = use SmearGate)
CONSENSUS_BYPASS_LAST_N=4 # Number of deepest layers with bypass
CONSENSUS_EMA_EXCLUDE=0 # Exclude consensus from EMA (not recommended)
```
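A minimal sketch of how these flags might be parsed (the PR's actual config code is not shown here; variable names simply mirror the documented environment variables and their defaults):

```python
import os

# Hypothetical flag parsing mirroring the documented defaults.
window = int(os.environ.get("CONSENSUS_WINDOW_SIZE", "32"))
use_smear_gate = window == 0  # 0 falls back to the baseline SmearGate
bypass_last_n = int(os.environ.get("CONSENSUS_BYPASS_LAST_N", "4"))
ema_exclude = os.environ.get("CONSENSUS_EMA_EXCLUDE", "0") == "1"
```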

## Source

Full repository with tests and ablation scripts: https://github.com/TheDryhtscipe/golf-model-1
@@ -0,0 +1,3 @@
torch>=2.5.0
numpy
sentencepiece
@@ -0,0 +1,12 @@
{
"author": "TheDryhtscipe",
"github_id": "TheDryhtscipe",
"name": "ConsensusWindow Bypass (FAT-Golf)",
"blurb": "Depthwise causal conv bypass from ORC FAT-AR. Replaces SmearGate + adds gated deep-layer bypass. Small-scale: -0.075 BPB post-quantization, 3x more robust to EMA+int6.",
"date": "2026-03-27T00:00:00Z",
"val_loss": 0.0,
"val_bpb": 0.0,
"bytes_total": 0,
"bytes_code": 0,
"note": "Small-scale results only (256d/6L/500steps on 4060 Ti). Awaiting H100 compute for full-scale validation."
}