Commit 7ea2371
Swap ReLU² → LeakyReLU(0.5)² in MLP activation
Multiple top PRs (openai#535, openai#549, openai#569) demonstrate a 0.0015 to 0.003 bpb
reduction from this change. LeakyReLU preserves gradient flow through negative
pre-activations while maintaining the sparsity/gating benefits of
squaring. At 22M params, dead neurons from hard ReLU are expensive.
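A minimal, dependency-free sketch of the swap described above (not the repo's actual code; in PyTorch the change would look like `F.relu(x).square()` → `F.leaky_relu(x, 0.5).square()`). The key property is visible in scalar form: the squared hard ReLU has zero output and zero gradient for all negative inputs, while the squared LeakyReLU keeps a scaled response there, so the neuron can never go fully "dead".

```python
def relu_sq(x: float) -> float:
    # Original MLP activation: ReLU(x)^2.
    # For x < 0 both the output and the gradient are exactly zero,
    # which is how neurons die under hard ReLU.
    return max(x, 0.0) ** 2

def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    # New activation: LeakyReLU(x; slope)^2 with slope = 0.5.
    # Negative pre-activations keep a scaled, squared response,
    # so gradients still flow (d/dx = 2 * slope^2 * x for x < 0)
    # while positive inputs behave exactly like ReLU^2.
    y = x if x >= 0.0 else slope * x
    return y * y

# Positive inputs: both activations agree.
print(relu_sq(3.0), leaky_relu_sq(3.0))    # 9.0 9.0
# Negative inputs: hard ReLU^2 is dead, LeakyReLU(0.5)^2 is not.
print(relu_sq(-2.0), leaky_relu_sq(-2.0))  # 0.0 1.0
```

Note that squaring makes the leaky branch non-negative too, so the output range is preserved; only the gradient behavior on negative inputs changes.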
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent: 01724f3
File changed (1, +1 −1): records/track_10min_16mb/2026-03-23_Reproduce414_LegalTTT
[Diff table: 1 line changed at line 617 (1 addition, 1 deletion); diff content not captured in this export.]