Record: Vocab4096 + MLP4.0x + SLOT - val_bpb 1.0925 (3-seed mean) #1291
dentity007 wants to merge 4 commits into openai:main
…er optimization, and SSM exploration
…3168) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DGX Spark GB10 Ablation Data - PROTEUS Feature Integration
Ran overnight ablation tests on NVIDIA DGX Spark (GB10, 128GB unified memory, single GPU) to evaluate PROTEUS features before committing to 8xH100 runs. All tests use sp1024 data, SEED=42, TORCH_COMPILE_DISABLE=1 (Triton/inductor not supported on GB10 ARM).
Phase 1: 3-Run Comparison (1000 iterations each)
Delta (Run 3 vs Run 1): -0.0122 train_bpb, -0.0267 post-EMA, -0.0145 INT6 roundtrip
Phase 1 used TRAIN_BATCH_TOKENS=49152, VAL_BATCH_TOKENS=49152, full sliding window eval. Run 2 crashed during initialization (likely OOM from torch.compile fallback before TORCH_COMPILE_DISABLE was added).
Phase 2: 7-Run Overnight Ablation (500 iterations each)
All runs: VOCAB_SIZE=1024, ITERATIONS=500, WARMUP_STEPS=10, SLIDING_WINDOW_ENABLED=0
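For anyone re-running the ablations, the settings quoted for the two phases can be collected as environment overrides. This is a minimal sketch, not the script used for these runs: the helper `run_env` and the dictionary names are hypothetical, and `ITERATIONS=1000` for Phase 1 is inferred from the phase title rather than stated as a variable.

```python
import os

# Values quoted in the ablation notes; names of these dicts are illustrative.
COMMON = {"SEED": "42", "TORCH_COMPILE_DISABLE": "1"}
PHASE1 = {
    "ITERATIONS": "1000",  # assumption: inferred from "1000 iterations each"
    "TRAIN_BATCH_TOKENS": "49152",
    "VAL_BATCH_TOKENS": "49152",
}
PHASE2 = {
    "VOCAB_SIZE": "1024",
    "ITERATIONS": "500",
    "WARMUP_STEPS": "10",
    "SLIDING_WINDOW_ENABLED": "0",
}


def run_env(phase, overrides=None):
    """Return a copy of the current environment with the phase settings applied.

    Hypothetical helper: later layers win, so per-run overrides can toggle a
    single feature flag without touching the shared settings.
    """
    env = dict(os.environ)
    env.update(COMMON)
    env.update(phase)
    env.update(overrides or {})
    return env


env = run_env(PHASE2)
print({k: env[k] for k in sorted(PHASE2)})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the training script for each ablation run.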
Isolated Feature Contributions (from ablation)
Key Conclusions
Hardware Details
Next Steps
Planning to run the parallel residuals configuration on 8xH100 with sp4096 data to validate BPB improvement at competition scale. The 2.3x throughput boost on GB10 could translate to more training steps within the 600s wallclock, amplifying the architecture advantage.
Record: Vocab4096 + MLP4.0x + SLOT
val_bpb: 1.0925 (3-seed mean, std 0.0018) | ~15.95 MB | 8xH100 SXM | SLOT eval-time optimization
3-Seed Results
Merged SOTA (PR #1019): 1.1147 BPB (1.8822 nats).
This submission: 1.0925 BPB (~1.8432 nats).
Delta: -0.0390 nats (-0.0222 BPB). Clears the 0.005-nat threshold by 7.8x.
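The delta and threshold margin above can be re-derived from the four quoted figures. The snippet below is only a sanity check of that arithmetic; the variable names are illustrative.

```python
# Figures quoted above (merged SOTA from PR #1019 vs. this submission).
sota_bpb, sota_nats = 1.1147, 1.8822
new_bpb, new_nats = 1.0925, 1.8432
threshold_nats = 0.005  # minimum improvement required for a record

delta_bpb = new_bpb - sota_bpb
delta_nats = new_nats - sota_nats
margin = -delta_nats / threshold_nats  # how many times the threshold is cleared

print(f"{delta_bpb:+.4f} BPB, {delta_nats:+.4f} nats, {margin:.1f}x threshold")
# prints: -0.0222 BPB, -0.0390 nats, 7.8x threshold
```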
Architecture
Built on PR #1218 (@clarkkev) with SLOT from PR #1176 (@bigbag).
SLOT: Per-Batch Delta Optimization
After sliding window eval, SLOT optimizes a small delta vector of shape [1,1,512] at the last hidden layer.
The delta is re-initialized to zeros for each batch, so no state carries across batches. SLOT contribution: -0.007 BPB.
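The per-batch procedure can be illustrated with a dependency-free toy. This is a sketch under stated assumptions, not the code from this PR or PR #1176: it assumes the delta is added to the final hidden states before the output projection, that it is fitted to the batch's own next-token loss, and that plain gradient descent is used (the source names neither the objective nor the optimizer). The function names, step count, and learning rate are hypothetical, and a toy width of 8 stands in for 512.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(hidden, targets, w_out, delta):
    """Mean cross-entropy (nats/token) of (h + delta) @ w_out and its gradient w.r.t. delta."""
    dim, vocab = len(w_out), len(w_out[0])
    loss, grad = 0.0, [0.0] * dim
    for h, y in zip(hidden, targets):
        shifted = [h[i] + delta[i] for i in range(dim)]  # one delta shared by every position
        logits = [sum(shifted[i] * w_out[i][v] for i in range(dim)) for v in range(vocab)]
        probs = softmax(logits)
        loss -= math.log(probs[y])
        # d(loss)/d(logits) = probs - onehot(y); chain through w_out to reach delta.
        err = [probs[v] - (1.0 if v == y else 0.0) for v in range(vocab)]
        for i in range(dim):
            grad[i] += sum(err[v] * w_out[i][v] for v in range(vocab))
    n = len(hidden)
    return loss / n, [g / n for g in grad]


def slot_eval_batch(hidden, targets, w_out, steps=5, lr=0.1):
    """Score one batch with and without a freshly fitted delta (hypothetical helper)."""
    delta = [0.0] * len(w_out)  # re-initialized to zeros: no cross-batch state
    base_loss, _ = loss_and_grad(hidden, targets, w_out, delta)
    for _ in range(steps):
        _, grad = loss_and_grad(hidden, targets, w_out, delta)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    tuned_loss, _ = loss_and_grad(hidden, targets, w_out, delta)
    return base_loss, tuned_loss


# Toy data standing in for frozen final-layer activations and an output projection.
random.seed(0)
dim, vocab, tokens = 8, 16, 32
w_out = [[random.gauss(0, 0.3) for _ in range(vocab)] for _ in range(dim)]
hidden = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
targets = [random.randrange(vocab) for _ in range(tokens)]

base, tuned = slot_eval_batch(hidden, targets, w_out)
print(f"loss without delta: {base:.4f}, with delta: {tuned:.4f}")
```

Only the delta receives gradient updates here; the stand-ins for model weights stay fixed, and because the delta is created inside `slot_eval_batch`, each batch starts from zeros.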
Legality
Credits
PR #1218 (@clarkkev), PR #1176 (@bigbag), PR #1019 (@abaybektursun)
Reproduction
Test Plan