Add Stack Integration + Legal TTT submission package by Taleef7 · Pull Request #1050 · openai/parameter-golf

Taleef7 · 2026-03-29T02:02:06Z

Summary

- Add the 2026-03-28 Stack Integration + Legal TTT + Parallel Muon submission folder.
- Include README.md, submission.json, train.log, and the three audited train_seed*.log evidence files.
- Keep the PR diff limited to records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/.

Verification

- `python experiments/audit_submission_package.py records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed1337.log records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed42.log records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed2025.log`
- Reviewed the clean-worktree diff scope against origin/main before push.

Copilot

Pull request overview

Adds a new Track (10min / 16MB) submission package under records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/, bundling the promoted training script plus audited 3-seed evidence logs and metadata for reviewer consumption.

Changes:

- Add promoted train_gpt.py submission script and canonical train.log plus 3 audited train_seed*.log evidence logs.
- Add README.md describing provenance, run contract, and evidence inventory.
- Add submission.json capturing audited metrics and artifact size metadata.

Reviewed changes

Copilot reviewed 3 out of 7 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/README.md | Reviewer-facing provenance + run/verification instructions for the promoted submission package. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/submission.json | Audit-derived metadata (metrics, sizes, seeds) for the submission package. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_gpt.py | The promoted training/eval script included as the submission’s executable artifact. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train.log | Canonical log (intended alias of seed 2025) for quick review. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed42.log | Seeded evidence log (42) for the audited 3-seed package. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed2025.log | Seeded evidence log (2025) used as the canonical `train.log` content. |
| records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed1337.log | Seeded evidence log (1337) for the audited 3-seed package. |


Copilot · 2026-03-29T02:04:59Z

+python experiments/verify_run.py \
+  records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train.log
+```


README references experiments/verify_run.py, but there is no experiments/ directory in this repo. This makes the documented run/verification command unusable; please update the instructions to point at an existing verifier in the repository, or include the verifier script in this submission folder.

Copilot · 2026-03-29T02:05:00Z

+```bash
+python experiments/audit_submission_package.py \
+  records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed1337.log \
+  records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed42.log \
+  records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_seed2025.log
+```


README suggests regenerating metadata via python experiments/audit_submission_package.py, but that script/path does not exist in this repository. Either update the documentation to the correct script location/name or add the referenced audit helper to keep the package reproducible.
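Both README comments above come down to the same check: every script the README tells a reader to run should exist in the repository. A minimal sketch of that check follows. The function name `missing_script_paths` and the regular expression are illustrative assumptions, not part of this PR or of any existing repo tooling.

```python
import re
from pathlib import Path


def missing_script_paths(readme_text: str, repo_root: Path) -> list[str]:
    """Return scripts invoked as `python <path>.py` that are absent under repo_root.

    Hypothetical helper: it only recognises the simple `python path/to/file.py`
    form used in the README snippets quoted in this review.
    """
    pattern = re.compile(r"\bpython3?\s+(\S+\.py)\b")
    # dict.fromkeys de-duplicates while preserving first-seen order.
    referenced = dict.fromkeys(pattern.findall(readme_text))
    return [path for path in referenced if not (repo_root / path).is_file()]
```

Run against the README text with the repository root as `repo_root`, a non-empty result would list the commands a reviewer cannot execute as documented.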

Copilot · 2026-03-29T02:05:00Z

+  "val_bpb": 1.1193666666666668,
+  "val_bpb_std": 0.0004642796092395205,
+  "bytes_total": 15990006,
+  "bytes_code": 91356,


bytes_code appears inconsistent with the evidence logs: in train_seed2025.log, Total submission size int6+lzma (15,990,006) minus Serialized model int6+lzma (15,900,548) implies a code size of 89,458 bytes, but this JSON records 91,356. Please correct bytes_code (and any derived fields) to match the audited artifact size computation.
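The subtraction behind this comment can be reproduced directly from the three numbers quoted above. The sketch assumes, as the comment does, that the total submission size is the serialized model plus the code; the variable names are illustrative.

```python
# Values quoted in the review comment (train_seed2025.log and submission.json).
total_submission_bytes = 15_990_006  # "Total submission size int6+lzma"
serialized_model_bytes = 15_900_548  # "Serialized model int6+lzma"
recorded_bytes_code = 91_356         # value currently recorded in submission.json

# Assumption: total = serialized model + code, so code = total - model.
implied_bytes_code = total_submission_bytes - serialized_model_bytes
discrepancy = recorded_bytes_code - implied_bytes_code

print(implied_bytes_code)  # 89458
print(discrepancy)         # 1898
```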

Suggested change

-  "bytes_code": 91356,
+  "bytes_code": 89458,


MatoTeziTanka · 2026-04-11T20:15:35Z

Community Review — Add Stack Integration + Legal TTT submission package

BPB: (not parsed — see PR title) | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA ef4d3f3ae3ee, file records/track_10min_16mb/2026-03-28_StackIntegration_LegalTTT_ParallelMuon/train_gpt.py):

The TTT path at line 1074 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
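For readers unfamiliar with the pattern, the ordering described above can be illustrated with a toy example. This is not the code in train_gpt.py: an add-one-smoothed unigram counter stands in for the model, its counts stand in for the adapted weights, and `score_first_ttt` is a hypothetical name. Only the control flow (score chunk, then adapt on it, skip adaptation on the last chunk) mirrors what the review describes.

```python
import math
from collections import Counter


def score_first_ttt(tokens, vocab_size, chunk_size):
    """Toy score-first-per-chunk loop; returns (mean NLL in nats, final counts)."""
    counts = Counter()  # adaptation state, playing the role of adapted weights
    seen = 0            # number of tokens the state has been adapted on
    total_nll = 0.0
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    for ci, chunk in enumerate(chunks):
        # 1) Score chunk ci with state adapted only on chunks 0..ci-1.
        #    The state is frozen while scoring, like a no-grad pass.
        for tok in chunk:
            p = (counts[tok] + 1) / (seen + vocab_size)
            total_nll -= math.log(p)
        # 2) Adapt on the chunk that was just scored. The last chunk is
        #    skipped because no later chunk would benefit from it.
        is_last_chunk = ci == len(chunks) - 1
        if not is_last_chunk:
            counts.update(chunk)
            seen += len(chunk)
    return total_nll / len(tokens), counts
```

Swapping steps 1 and 2 would let each token influence the state it is scored under, which is the failure mode the legality rule is meant to exclude.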

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=89458 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.
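Of the pending checks, the artifact cap is the one that can be sanity-checked from numbers already in this thread. The sketch below uses `bytes_total` from submission.json; the exact byte value of the "16MB" cap is not stated here, so both common readings are computed as an assumption rather than as the track's official definition.

```python
BYTES_TOTAL = 15_990_006  # "bytes_total" from submission.json in this PR

# Assumption: the thread does not say whether 16MB is decimal or binary.
CAP_DECIMAL = 16 * 1000 * 1000  # 16,000,000 bytes
CAP_BINARY = 16 * 1024 * 1024   # 16,777,216 bytes


def headroom(total_bytes: int, cap_bytes: int) -> int:
    """Bytes remaining under the cap; a negative value means over the cap."""
    return cap_bytes - total_bytes


print(headroom(BYTES_TOTAL, CAP_DECIMAL))  # 9994
print(headroom(BYTES_TOTAL, CAP_BINARY))   # 787210
```

Under either reading the recorded total fits, although the decimal reading leaves under 10 KB of slack.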

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=89458 B, SMOKE_TEST_PASS. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

Add Stack Integration submission package

ef4d3f3

Copilot AI review requested due to automatic review settings March 29, 2026 02:02

Copilot started reviewing on behalf of Taleef7 March 29, 2026 02:02

Taleef7 wants to merge 1 commit into openai:main from Taleef7:s05-submission
