Starter kit: align submission schema and path safety #1262

Open
SID-6921 wants to merge 11 commits into openai:main from SID-6921:main

Conversation

SID-6921 commented Apr 2, 2026

Follow-up fixes from review:

  • Align submission.json schema and track naming with repository conventions
  • Update prepare_submission.py track options to 10min_16mb / non_record_16mb
  • Sanitize --run-name and add resolve-time safety guard against path traversal
  • Update starter docs/examples to match normalized track values

This keeps generated folders metadata-compatible and safer for PR-ready usage.

Copilot AI review requested due to automatic review settings April 2, 2026 18:48
Copilot AI (Contributor) left a comment


Pull request overview

Adds a small “starter kit” workflow to help users run baseline experiments and generate PR-ready records/ submission folders with normalized track names and some path-safety checks.

Changes:

  • Introduces starter-kit docs/templates for README.md + submission.json.
  • Adds prepare_submission.py to generate a new records/track_*/<date>_<run> folder with required files.
  • Adds RunPod bootstrap + smoke/full run helper scripts and an experiment log template.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| starter_kit/templates/submission.json.template | Adds a submission.json starter template for new records. |
| starter_kit/templates/README_submission_template.md | Adds a README template to accompany generated record folders. |
| starter_kit/START_HERE.md | Adds step-by-step starter documentation for new contributors. |
| starter_kit/scripts/prepare_submission.py | Adds a CLI tool to generate PR-ready records/ folders and sanitize run folder names. |
| starter_kit/scripts/01_runpod_bootstrap.sh | Adds a RunPod bootstrap script (clone + download small dataset slice). |
| starter_kit/scripts/02_smoke_run.sh | Adds a quick smoke run script with a short wallclock cap. |
| starter_kit/scripts/03_full_run.sh | Adds a ~10-minute baseline-style run script. |
| starter_kit/notes/EXPERIMENT_LOG_TEMPLATE.md | Adds an experiment log template for tracking runs/changes/outcomes. |


SID-6921 (Author) commented Apr 2, 2026

Follow-up update: addressed all Copilot review points in the latest commit (3cde174).

What was fixed:

  • Hardened --source-train-script handling in starter_kit/scripts/prepare_submission.py (reject absolute paths, resolve against repo root, and enforce in-repo resolved path).
  • Removed nonstandard size field from generated/template submission.json for schema alignment.
  • Kept human-readable --run-name for display metadata (submission.json[name] and README title), while still using sanitized value for folder slug/path safety.
  • Preserved normalized track usage (10min_16mb / non_record_16mb) to match existing records/ layout.
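The hardened handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual prepare_submission.py code: the function names are made up, and the repo root is stood in by the current working directory.

```python
import re
from pathlib import Path

# Stand-in for the real repo root; prepare_submission.py resolves the actual one.
REPO_ROOT = Path.cwd().resolve()

def slugify_run_name(run_name: str) -> str:
    """Reduce a human-readable --run-name to a filesystem-safe folder slug."""
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", run_name).strip("._-")
    if not slug:
        raise ValueError("--run-name produced an empty slug")
    return slug

def resolve_in_repo(user_path: str) -> Path:
    """Resolve a user-supplied path against the repo root and reject escapes."""
    p = Path(user_path)
    if p.is_absolute():
        raise ValueError("absolute paths are not allowed")
    resolved = (REPO_ROOT / p).resolve()
    if not resolved.is_relative_to(REPO_ROOT):  # Python 3.9+
        raise ValueError(f"{user_path!r} resolves outside the repository")
    return resolved
```

With this shape, a display name like `My Run #1!` can still appear verbatim in submission.json[name] while the folder path uses the sanitized slug, and both `../`-style and absolute `--source-train-script` values fail before any file is written.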

Also updated submission metadata author to: Siddhardha Nanda (github_id: SID-6921).

If this now looks good, could I please get a final review/approval?

SID-6921 (Author) commented Apr 2, 2026

Added one-command experiment automation for immediate non-record iteration:

  • starter_kit/scripts/04_non_record_a40_campaign.sh (10-run A40 campaign)
  • starter_kit/scripts/rank_campaign_results.py (auto ranking by final roundtrip val_bpb)
  • START_HERE updated with usage

This is now in commit 2ed635d and on this PR branch.

Run command from repo root:

```
bash starter_kit/scripts/04_non_record_a40_campaign.sh
```
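The ranking step can be sketched roughly like this. The per-run log path (`<run>/train.log`) and the `val_bpb:` log token are assumptions based on the log excerpts in this thread; rank_campaign_results.py may parse its inputs differently.

```python
import re
from pathlib import Path

VAL_BPB_RE = re.compile(r"val_bpb:([0-9.]+)")

def final_val_bpb(log_text: str):
    """Return the last val_bpb value reported in a training log, or None."""
    hits = VAL_BPB_RE.findall(log_text)
    return float(hits[-1]) if hits else None

def rank_runs(campaign_dir: Path):
    """Rank run folders by final roundtrip val_bpb, best (lowest) first."""
    results = []
    for log in sorted(campaign_dir.glob("*/train.log")):  # assumed log layout
        bpb = final_val_bpb(log.read_text())
        if bpb is not None:
            results.append((log.parent.name, bpb))
    return sorted(results, key=lambda r: r[1])
```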

SID-6921 (Author) commented Apr 2, 2026

Campaign complete! 10 runs finished on A40. Best result: R08 Higher-LR Matrix/Scalar — val_bpb 2.1827 (vs baseline 3.2686, ~33% improvement).
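The quoted improvement follows directly from the two val_bpb figures:

```python
baseline, best = 3.2686, 2.1827
improvement = (baseline - best) / baseline
print(f"{improvement:.1%}")  # -> 33.2%
```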

Full ranking:

| # | Run | val_bpb | Bytes |
| --- | --- | --- | --- |
| 1 | R08_lr_matrix_up | 2.1827 | 9.9 MB |
| 2 | R07_qk_gain_high | 2.2058 | 9.5 MB |
| 3 | R02_warmdown_short | 2.2161 | 9.3 MB |
| 4 | R04_batch_half | 2.2202 | 9.4 MB |

Key changes in R08: MATRIX_LR=0.05, SCALAR_LR=0.04, TIED_EMBED_LR=0.05, TRAIN_BATCH_TOKENS=1M, WARMDOWN_ITERS=400, ITERATIONS=60.

Committed to this PR in d281106 at records/track_non_record_16mb/2026-04-02_R08_LRMatrixUp_A40_17M/.

SID-6921 (Author) commented Apr 3, 2026

Run update: logs were trending around val_bpb ~1.38 and still improving, putting us on track for a stronger submission, but we ran out of credits before that run could finish and be packaged.

SID-6921 (Author) commented Apr 3, 2026

Proof from RunPod terminal logs (H100 run, pre-interruption):

```
step:2000/500000 val_loss:2.4183 val_bpb:1.4323 train_time:647140ms step_avg:323.57ms
step:4000/500000 val_loss:2.3438 val_bpb:1.3881 train_time:1292708ms step_avg:323.18ms
step:6000/500000 val_loss:2.3221 val_bpb:1.3753 train_time:1938046ms step_avg:323.01ms
step:8000/500000 val_loss:2.3079 val_bpb:1.3669 train_time:2584614ms step_avg:323.08ms
step:26000/500000 val_loss:2.2954 val_bpb:1.3595 train_time:8395665ms step_avg:322.91ms
step:30000/500000 val_loss:2.2947 val_bpb:1.3591 train_time:9687638ms step_avg:322.92ms
```

Interruption seen in logs before we could package the run:

```
W0403 ... Received 1 death signal, shutting down workers
W0403 ... closing signal SIGHUP
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1029 got signal: 1
```

Given credit limits, we preserved and submitted the last fully packaged valid run.

MatoTeziTanka commented Apr 11, 2026

[RETRACTED 2026-04-11] — This IMPORT_FAIL was a false positive. Root cause: classifier ast.parse() didn't handle UTF-8 BOM; smoke test actually PASSED. Your code is not broken. See correction below: #1262 (comment)


Community Review — Starter kit: align submission schema and path safety

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

```
syntax error at line 1: invalid non-printable character U+FEFF
```


Recommendation: could you run `python3 -c "import py_compile; py_compile.compile('train_gpt.py')"` on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time, before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.
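For a local pre-check, the suggested command can be run from inside the records folder. Adding `doraise=True` makes py_compile raise on a parse failure instead of printing a message, so the shell exit code reflects the result; substitute `python3.10` for `python3` if you want to match the eval image's interpreter exactly.

```shell
# Exits non-zero if train_gpt.py does not parse under this interpreter
python3 -c "import py_compile; py_compile.compile('train_gpt.py', doraise=True)" \
  && echo "parse OK" \
  || echo "parse FAILED"
```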


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — syntax error at line 1: invalid non-printable character U+FEFF. Classification via the classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

MatoTeziTanka commented

Retraction — this IMPORT_FAIL was a UTF-8 BOM handling bug in my classifier

Sorry @SID-6921, this one's on me. Your CPU smoke test on CT2038 actually passed — the IMPORT_FAIL I reported above came from a separate classifier step, and it was a bug in the classifier, not in your code.

What happened:

My classifier does an ast.parse() walk over your file to check for n-gram family bugs, SLOT masks, Pre-Quant TTT patterns, etc. It opened records/track_non_record_16mb/2026-04-02_R08_LRMatrixUp_A40_17M/train_gpt.py in plain UTF-8 mode, and your file starts with a UTF-8 byte-order mark (U+FEFF, bytes EF BB BF). Python's own parser handles BOMs fine via encoding='utf-8-sig', but my classifier used plain UTF-8, so ast.parse() threw invalid non-printable character U+FEFF at line 1.

The smoke runner's importlib path (which uses Python's real loader) imported your file without issue and reported:

```
SMOKE_TEST_PASS — import OK, HAS_HYPERPARAMETERS=True, HAS_GPT=True
```

Your PR is not broken. Python accepts BOMs; my classifier's ast walk was buggy. I'm retracting the IMPORT_FAIL classification. I'll re-queue the compliance audit now that the BOM-handling bug is identified and post findings separately.
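The failure mode is easy to reproduce in isolation. A minimal sketch (not the actual classifier code):

```python
import ast
import os
import tempfile

# Write a tiny module with a UTF-8 BOM (encoding='utf-8-sig' prepends EF BB BF).
with tempfile.NamedTemporaryFile("w", suffix=".py", encoding="utf-8-sig",
                                 delete=False) as f:
    f.write("x = 1\n")
    path = f.name

# Plain UTF-8 leaves U+FEFF as the first character, which ast.parse rejects.
raw = open(path, encoding="utf-8").read()
try:
    ast.parse(raw)
    print("plain utf-8: parsed")
except SyntaxError as exc:
    print(f"plain utf-8: SyntaxError: {exc.msg}")

# utf-8-sig strips the BOM, matching what Python's own import machinery does.
clean = open(path, encoding="utf-8-sig").read()
ast.parse(clean)
print("utf-8-sig: parsed")
os.unlink(path)
```

Reading source files with `encoding="utf-8-sig"` (or `tokenize.open`, which also honors coding declarations) is the standard fix for tools that feed files to ast.parse.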

Again — sorry for the noise.

