Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + Asynchronous Data Loader - val_bpb 1.0803 #1532
… aten::copy_ calls
5695e8f to
2ac8fa9
Community Review - Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + Asynchronous Data Loader - val_bpb 1.0803
Compliance: NEEDS AUTHOR ACTION
What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with: …
A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep: …
Recommendation: Could you run … Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.
Reviewed by @MatoTeziTanka, The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL, SyntaxError: f-string: expecting '}' (line 289). Classification via |
This submission introduces two system optimizations, both of
… ShuffledSequenceLoader:next_batch logic to numpy to prevent redundant copies. By calling torch.from_numpy only at the final return, we reduced aten::copy_ overhead by 50% on 1xH100 benchmarks.
… next_batch call.
Using these optimizations yielded a ~1.5% improvement in throughput (70-80 more steps).
Attaching the relevant code as non-lzma here for anyone who'd like to use it :)
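For readers who want a feel for the first optimization, here is a minimal sketch of the general pattern: do all batch assembly in numpy and convert to tensors once, at the final return. This is not the submission's attached code. The names `assemble_batch_np`, `tokens`, `starts`, and `seq_len`, and this `next_batch` signature, are hypothetical, and the sketch makes no claim about reproducing the reported aten::copy_ numbers.

```python
import numpy as np


def assemble_batch_np(tokens, starts, seq_len):
    """Gather input/target windows entirely in numpy (hypothetical helper)."""
    starts = np.asarray(starts, dtype=np.int64)
    # Row i of idx holds positions starts[i] .. starts[i] + seq_len - 1.
    idx = starts[:, None] + np.arange(seq_len, dtype=np.int64)[None, :]
    # Fancy indexing gathers every window in one vectorized numpy call;
    # astype produces a fresh, contiguous int64 array.
    x = tokens[idx].astype(np.int64)       # inputs
    y = tokens[idx + 1].astype(np.int64)   # targets, shifted by one token
    return x, y


def next_batch(tokens, starts, seq_len):
    """Hypothetical next_batch: tensors are created only at the final return."""
    # Deferred import so the numpy-only helper above is usable without torch.
    import torch

    x, y = assemble_batch_np(tokens, starts, seq_len)
    # torch.from_numpy shares memory with the numpy array, so wrapping the
    # finished batch here does not add another copy on the host side.
    return torch.from_numpy(x), torch.from_numpy(y)
```

Calling `next_batch(tokens, [0, 5], 4)` on a 1-D token array returns two `(2, 4)` int64 tensors. Each window needs `seq_len + 1` tokens, so every entry of `starts` must be at most `len(tokens) - seq_len - 1`.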