Starter kit: align submission schema and path safety #1262

Open
SID-6921 wants to merge 11 commits into openai:main from SID-6921:main

Conversation

SID-6921 commented Apr 2, 2026

Follow-up fixes from review:

  • Align submission.json schema and track naming with repository conventions
  • Update prepare_submission.py track options to 10min_16mb / non_record_16mb
  • Sanitize --run-name and add resolve-time safety guard against path traversal
  • Update starter docs/examples to match normalized track values

This keeps generated folders metadata-compatible and safer for PR-ready usage.

Copilot AI review requested due to automatic review settings April 2, 2026 18:48
Copilot AI (Contributor) left a comment


Pull request overview

Adds a small “starter kit” workflow to help users run baseline experiments and generate PR-ready records/ submission folders with normalized track names and some path-safety checks.

Changes:

  • Introduces starter-kit docs/templates for README.md + submission.json.
  • Adds prepare_submission.py to generate a new records/track_*/<date>_<run> folder with required files.
  • Adds RunPod bootstrap + smoke/full run helper scripts and an experiment log template.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| starter_kit/templates/submission.json.template | Adds a submission.json starter template for new records. |
| starter_kit/templates/README_submission_template.md | Adds a README template to accompany generated record folders. |
| starter_kit/START_HERE.md | Adds step-by-step starter documentation for new contributors. |
| starter_kit/scripts/prepare_submission.py | Adds a CLI tool to generate PR-ready records/ folders and sanitize run folder names. |
| starter_kit/scripts/01_runpod_bootstrap.sh | Adds a RunPod bootstrap script (clone + download small dataset slice). |
| starter_kit/scripts/02_smoke_run.sh | Adds a quick smoke run script with a short wallclock cap. |
| starter_kit/scripts/03_full_run.sh | Adds a ~10-minute baseline-style run script. |
| starter_kit/notes/EXPERIMENT_LOG_TEMPLATE.md | Adds an experiment log template for tracking runs/changes/outcomes. |


SID-6921 (Author) commented Apr 2, 2026

Follow-up update: addressed all Copilot review points in the latest commit (3cde174).

What was fixed:

  • Hardened --source-train-script handling in starter_kit/scripts/prepare_submission.py (reject absolute paths, resolve against repo root, and enforce in-repo resolved path).
  • Removed nonstandard size field from generated/template submission.json for schema alignment.
  • Kept human-readable --run-name for display metadata (submission.json[name] and README title), while still using sanitized value for folder slug/path safety.
  • Preserved normalized track usage (10min_16mb / non_record_16mb) to match existing records/ layout.
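The hardened handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual prepare_submission.py code: the function names are made up, and the repo root is stood in by the current working directory.

```python
import re
from pathlib import Path

# Stand-in for the real repo root; prepare_submission.py resolves the actual one.
REPO_ROOT = Path.cwd().resolve()

def slugify_run_name(run_name: str) -> str:
    """Reduce a human-readable --run-name to a filesystem-safe folder slug."""
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", run_name).strip("._-")
    if not slug:
        raise ValueError("--run-name produced an empty slug")
    return slug

def resolve_in_repo(user_path: str) -> Path:
    """Resolve a user-supplied path against the repo root and reject escapes."""
    p = Path(user_path)
    if p.is_absolute():
        raise ValueError("absolute paths are not allowed")
    resolved = (REPO_ROOT / p).resolve()
    if not resolved.is_relative_to(REPO_ROOT):  # Python 3.9+
        raise ValueError(f"{user_path!r} resolves outside the repository")
    return resolved
```

With this shape, a display name like `My Run #1!` can still appear verbatim in submission.json[name] while the folder path uses the sanitized slug, and both `../`-style and absolute `--source-train-script` values fail before any file is written.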

Also updated submission metadata author to: Siddhardha Nanda (github_id: SID-6921).

If this now looks good, could I please get a final review/approval?

SID-6921 (Author) commented Apr 2, 2026

Added one-command experiment automation for immediate non-record iteration:

  • starter_kit/scripts/04_non_record_a40_campaign.sh (10-run A40 campaign)
  • starter_kit/scripts/rank_campaign_results.py (auto ranking by final roundtrip val_bpb)
  • START_HERE updated with usage

This is now in commit 2ed635d and on this PR branch.

Run command from repo root:

```
bash starter_kit/scripts/04_non_record_a40_campaign.sh
```
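The ranking step can be sketched roughly like this. The per-run log path (`<run>/train.log`) and the `val_bpb:` log token are assumptions based on the log excerpts in this thread; rank_campaign_results.py may parse its inputs differently.

```python
import re
from pathlib import Path

VAL_BPB_RE = re.compile(r"val_bpb:([0-9.]+)")

def final_val_bpb(log_text: str):
    """Return the last val_bpb value reported in a training log, or None."""
    hits = VAL_BPB_RE.findall(log_text)
    return float(hits[-1]) if hits else None

def rank_runs(campaign_dir: Path):
    """Rank run folders by final roundtrip val_bpb, best (lowest) first."""
    results = []
    for log in sorted(campaign_dir.glob("*/train.log")):  # assumed log layout
        bpb = final_val_bpb(log.read_text())
        if bpb is not None:
            results.append((log.parent.name, bpb))
    return sorted(results, key=lambda r: r[1])
```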

SID-6921 (Author) commented Apr 2, 2026

Campaign complete! 10 runs finished on A40. Best result: R08 Higher-LR Matrix/Scalar — val_bpb 2.1827 (vs baseline 3.2686, ~33% improvement).
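The quoted improvement follows directly from the two val_bpb figures:

```python
baseline, best = 3.2686, 2.1827
improvement = (baseline - best) / baseline
print(f"{improvement:.1%}")  # -> 33.2%
```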

Full ranking:

| # | Run | val_bpb | Bytes |
| --- | --- | --- | --- |
| 1 | R08_lr_matrix_up | 2.1827 | 9.9 MB |
| 2 | R07_qk_gain_high | 2.2058 | 9.5 MB |
| 3 | R02_warmdown_short | 2.2161 | 9.3 MB |
| 4 | R04_batch_half | 2.2202 | 9.4 MB |

Key changes in R08: MATRIX_LR=0.05, SCALAR_LR=0.04, TIED_EMBED_LR=0.05, TRAIN_BATCH_TOKENS=1M, WARMDOWN_ITERS=400, ITERATIONS=60.

Committed to this PR in d281106 at records/track_non_record_16mb/2026-04-02_R08_LRMatrixUp_A40_17M/.

SID-6921 (Author) commented Apr 3, 2026

Run update: logs were trending around val_bpb ~1.38 and still improving, putting us on track for a stronger submission, but we ran out of credits before that run could finish and be packaged.

SID-6921 (Author) commented Apr 3, 2026

Proof from RunPod terminal logs (H100 run, pre-interruption):

```
step:2000/500000 val_loss:2.4183 val_bpb:1.4323 train_time:647140ms step_avg:323.57ms
step:4000/500000 val_loss:2.3438 val_bpb:1.3881 train_time:1292708ms step_avg:323.18ms
step:6000/500000 val_loss:2.3221 val_bpb:1.3753 train_time:1938046ms step_avg:323.01ms
step:8000/500000 val_loss:2.3079 val_bpb:1.3669 train_time:2584614ms step_avg:323.08ms
step:26000/500000 val_loss:2.2954 val_bpb:1.3595 train_time:8395665ms step_avg:322.91ms
step:30000/500000 val_loss:2.2947 val_bpb:1.3591 train_time:9687638ms step_avg:322.92ms
```

Interruption seen in logs before we could package the run:

```
W0403 ... Received 1 death signal, shutting down workers
W0403 ... closing signal SIGHUP
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1029 got signal: 1
```

Given credit limits, we preserved and submitted the last fully packaged valid run.

MatoTeziTanka commented Apr 11, 2026

[RETRACTED 2026-04-11] — This IMPORT_FAIL was a false positive. Root cause: classifier ast.parse() didn't handle UTF-8 BOM; smoke test actually PASSED. Your code is not broken. See correction below: #1262 (comment)


Community Review — Starter kit: align submission schema and path safety

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

```
syntax error at line 1: invalid non-printable character U+FEFF
```


Recommendation: could you run `python3 -c "import py_compile; py_compile.compile('train_gpt.py')"` on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time, before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.
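For a local pre-check, the suggested command can be run from inside the records folder. Adding `doraise=True` makes py_compile raise on a parse failure instead of printing a message, so the shell exit code reflects the result; substitute `python3.10` for `python3` if you want to match the eval image's interpreter exactly.

```shell
# Exits non-zero if train_gpt.py does not parse under this interpreter
python3 -c "import py_compile; py_compile.compile('train_gpt.py', doraise=True)" \
  && echo "parse OK" \
  || echo "parse FAILED"
```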


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — syntax error at line 1: invalid non-printable character U+FEFF. Classification via the classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

MatoTeziTanka commented

Retraction — this IMPORT_FAIL was a UTF-8 BOM handling bug in my classifier

Sorry @SID-6921, this one's on me. Your CPU smoke test on CT2038 actually passed — the IMPORT_FAIL I reported above came from a separate classifier step, and it was a bug in the classifier, not in your code.

What happened:

My classifier does an ast.parse() walk over your file to check for n-gram family bugs, SLOT masks, Pre-Quant TTT patterns, etc. It opened records/track_non_record_16mb/2026-04-02_R08_LRMatrixUp_A40_17M/train_gpt.py in plain UTF-8 mode, and your file starts with a UTF-8 byte-order mark (U+FEFF, bytes EF BB BF). Python's own parser handles BOMs fine via encoding='utf-8-sig', but my classifier used plain UTF-8, so ast.parse() threw invalid non-printable character U+FEFF at line 1.

The smoke runner's importlib path (which uses Python's real loader) imported your file without issue and reported:

```
SMOKE_TEST_PASS — import OK, HAS_HYPERPARAMETERS=True, HAS_GPT=True
```

Your PR is not broken. Python accepts BOMs; my classifier's ast walk was buggy. I'm retracting the IMPORT_FAIL classification. I'll re-queue the compliance audit now that the BOM-handling bug is identified and post findings separately.
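The failure mode is easy to reproduce in isolation. A minimal sketch (not the actual classifier code):

```python
import ast
import os
import tempfile

# Write a tiny module with a UTF-8 BOM (encoding='utf-8-sig' prepends EF BB BF).
with tempfile.NamedTemporaryFile("w", suffix=".py", encoding="utf-8-sig",
                                 delete=False) as f:
    f.write("x = 1\n")
    path = f.name

# Plain UTF-8 leaves U+FEFF as the first character, which ast.parse rejects.
raw = open(path, encoding="utf-8").read()
try:
    ast.parse(raw)
    print("plain utf-8: parsed")
except SyntaxError as exc:
    print(f"plain utf-8: SyntaxError: {exc.msg}")

# utf-8-sig strips the BOM, matching what Python's own import machinery does.
clean = open(path, encoding="utf-8-sig").read()
ast.parse(clean)
print("utf-8-sig: parsed")
os.unlink(path)
```

Reading source files with `encoding="utf-8-sig"` (or `tokenize.open`, which also honors coding declarations) is the standard fix for tools that feed files to ast.parse.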

Again — sorry for the noise.

