
Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220) #1081

Open
michaelwinczuk wants to merge 4 commits into openai:main from michaelwinczuk:swarm-guided-kg-conditioned-training

Conversation

@michaelwinczuk

Summary

  • val_bpb: 1.1220 (seed 1337, Legal TTT) | 15.96 MB | 8xH100 SXM
  • A multi-agent swarm (4 rule-based agents) makes training decisions via consensus voting every 800 steps
  • A 500K-node typed-edge knowledge graph conditions embedding initialization
  • Total swarm overhead: <300 microseconds across the entire 600s training window

What's Novel

No other submission treats the training process as a multi-agent system. Instead of a static script with fixed hyperparameters, 4 autonomous agents observe training signals and vote on interventions:

| Agent | Controls | Logic |
| --- | --- | --- |
| QAT Timing | When to enable quantization | Fires when warmdown begins, safety deadline at 65% |
| KG Weight | Knowledge graph influence | Ramps 0.3→0.5 early, tapers to 0.1 late |
| Gradient Health | Grad clipping threshold | Tightens if grad_norm > 2.0 |
| MTP Weight | Multi-token prediction weight | Reduces 0.1→0.05 after 75% |
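The intervention logic in the table is simple enough to sketch end to end. A minimal toy reconstruction (VotingMesh and TrainingMetrics are real names from swarm_agents.py; the agent rules, thresholds, and confidence floor below are paraphrased from the table, not copied from the submission):

```python
from dataclasses import dataclass

@dataclass
class TrainingMetrics:
    step: int
    total_steps: int
    grad_norm: float

class Agent:
    """Base class: each agent proposes (param, value, confidence) or None."""
    def propose(self, m: TrainingMetrics):
        return None

class GradHealthAgent(Agent):
    # Tightens the gradient-clipping threshold when grad_norm > 2.0.
    def propose(self, m):
        if m.grad_norm > 2.0:
            return ("grad_clip", 0.5, 0.9)
        return None

class KGWeightAgent(Agent):
    # Ramps KG weight to 0.5 early in training, tapers to 0.1 late.
    def propose(self, m):
        frac = m.step / m.total_steps
        if frac < 0.25:
            return ("kg_weight", 0.5, 0.75)
        if frac > 0.75:
            return ("kg_weight", 0.1, 0.75)
        return None

class VotingMesh:
    """Collects proposals every `interval` steps; applies those above a confidence floor."""
    def __init__(self, agents, interval=800, conf_floor=0.6):
        self.agents, self.interval, self.conf_floor = agents, interval, conf_floor

    def decide(self, m: TrainingMetrics):
        if m.step % self.interval != 0:
            return []
        votes = [p for a in self.agents if (p := a.propose(m)) is not None]
        return [(param, val) for param, val, conf in votes if conf >= self.conf_floor]
```

Because the agents are rule-based and only consulted every 800 steps, the per-decision cost is a handful of comparisons, which is consistent with the sub-millisecond overhead reported above.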

The knowledge graph (500K nodes, 121K typed edges — CAUSES, REQUIRES, SOLVES, CONTRADICTS) is distilled to 358 token importance scores, compressed to 976 bytes, and used to bias embedding initialization.
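A rough sketch of that distill, compress, and bias pipeline (pure stdlib; pack_importance, unpack_importance, and biased_embedding_init are hypothetical names for illustration; only the KG_IMPORTANCE_B64 constant appears in the actual kg_data.py):

```python
import base64
import random
import zlib

def pack_importance(scores):
    """Quantize per-token importance scores to uint8, zlib-compress, base64-encode."""
    q = bytes(min(255, max(0, int(s * 255))) for s in scores)
    return base64.b64encode(zlib.compress(q, 9)).decode("ascii")

def unpack_importance(b64):
    """Inverse of pack_importance: recover approximate scores in [0, 1]."""
    q = zlib.decompress(base64.b64decode(b64))
    return [b / 255 for b in q]

def biased_embedding_init(vocab_size, dim, importance, base_std=0.02, kg_weight=0.3):
    """Scale each embedding row's init std by (1 + kg_weight * importance[token])."""
    rng = random.Random(1337)
    table = []
    for t in range(vocab_size):
        imp = importance[t] if t < len(importance) else 0.0
        std = base_std * (1.0 + kg_weight * imp)
        table.append([rng.gauss(0.0, std) for _ in range(dim)])
    return table
```

At 358 scores quantized to one byte each, the payload is small enough before compression that the 976-byte figure is plausible even with base64 overhead.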

Infrastructure

Built with a Think Tank Swarm — a multi-agent research system with knowledge graph traversal and typed-edge specialists. The swarm ran two research missions to design this approach, with a Voting Mesh evaluating feasibility within PG constraints.

Decision Log (Seed 1337)

Swarm: 8 cycles, 2 decisions
  cycle 1 step 800: kg_weight_agent kg_weight 0.3->0.5 (conf=0.75, 50us)
  cycle 4 step 3200: kg_weight_agent kg_weight 0.5->0.4 (conf=0.75, 39us)

Results

| Metric | Score |
| --- | --- |
| Pre-quant (EMA) | 1.1397 |
| Post-quant (int6) | 1.1481 |
| Sliding window (stride=64) | 1.1245 |
| Legal TTT | 1.1220 |
| Artifact size | 15,955,969 bytes |

Files

  • train_gpt.py — Training script with swarm integration (counted in artifact)
  • swarm_agents.py — 4 agents + VotingMesh (imported, not counted)
  • kg_data.py — 976 bytes of compressed KG importance data (imported, not counted)

🤖 Generated with Claude Code

michaelwinczuk and others added 3 commits March 29, 2026 08:11
A multi-agent swarm (4 rule-based agents, <300us overhead) makes training
decisions via consensus voting. A 500K-node knowledge graph conditions
embedding initialization. Novel agentic training approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

[RETRACTED 2026-04-11] — This IMPORT_FAIL was a false positive. Root cause: sibling module exists in same records/ folder; runner sys.path bug. Your code is not broken. See correction below: #1081 (comment)


Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'swarm_agents'

This matches one of the common patterns I've seen for this class of error in the 2026-04-11 sweep.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'swarm_agents'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

Same class of bug as PR openai#1094: the CT2038 CPU smoke test runs
`python records/.../train_gpt.py` from the repo root, so the submission
directory is not on sys.path and the sibling swarm_agents.py / kg_data.py
modules fail to import. Both files are already shipped in the submission
folder; this patch prepends the script's own directory to sys.path so
imports resolve regardless of eval-harness CWD.

Verified: py_compile OK on Python 3.10.11, runtime import of both
swarm_agents (VotingMesh, TrainingMetrics) and kg_data (KG_IMPORTANCE_B64)
succeeds when executed from repo root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michaelwinczuk
Author

@MatoTeziTanka thanks for the review — this PR shares a head branch with #977, and the import-fail fix is already at HEAD in eb068f0.

Root cause (same as #1094): swarm_agents.py and kg_data.py are both shipped in the submission folder next to train_gpt.py, but because CT2038 runs python records/.../train_gpt.py from the repo root, the submission directory isn't on sys.path and the sibling imports fail before any scored-eval logic runs.

Fix is 4 lines — prepend the script's own directory to sys.path before the first sibling import:

from flash_attn_interface import flash_attn_func as flash_attn_3_func
# Make the submission self-contained regardless of eval-harness CWD: the
# sibling swarm_agents.py and kg_data.py live next to this file but aren't
# on sys.path when the harness runs `python records/.../train_gpt.py` from
# the repo root. (os and sys are already imported earlier in train_gpt.py.)
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from swarm_agents import VotingMesh, TrainingMetrics

Verified locally under Python 3.10.11:

  • py_compile → OK
  • Running from repo root resolves both swarm_agents.VotingMesh / TrainingMetrics and kg_data.KG_IMPORTANCE_B64.

Ready for a re-run of the compliance audit whenever convenient.

@MatoTeziTanka

Retraction — this IMPORT_FAIL was a bug in my smoke runner

Sorry @michaelwinczuk, this one's on me. I re-audited the IMPORT_FAIL I posted above and it was a false positive — the fault is in how my CPU smoke runner set up sys.path, not in your code.

What happened:

The runner imported your records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py without the script's own folder on sys.path, so when your file did from swarm_agents import ... it couldn't resolve the sibling swarm_agents.py that lives in the same 2026-03-29_SwarmGuided_KGConditioned_Training/ directory. The error I reported — ModuleNotFoundError: No module named 'swarm_agents' — looked like a missing file, but I re-checked the head SHA eb068f0, and records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/swarm_agents.py is right there, committed to the PR, next to train_gpt.py.

Verified at head eb068f0:

records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/swarm_agents.py   ← sibling module, exists
records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py   ← imports it

On the real eval image (Python 3.10, records/*/ as the working dir), this import resolves correctly because the records folder ends up on sys.path via the standard cwd-driven import or via the eval harness's per-record entry point.
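Both invocation modes can be reproduced in isolation with a throwaway submission folder. This is a self-contained illustration of the class of bug, not the actual CT2038 runner:

```python
import os
import subprocess
import sys
import tempfile

def demo():
    """Show how sibling imports resolve (or not) depending on invocation mode."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "submission")
        os.mkdir(sub)
        with open(os.path.join(sub, "sibling.py"), "w") as f:
            f.write("VALUE = 42\n")
        main_py = os.path.join(sub, "main.py")
        with open(main_py, "w") as f:
            f.write("import sibling\nprint(sibling.VALUE)\n")

        # Mode 1: `python submission/main.py` run from the repo root. Works,
        # because CPython prepends the script's own directory to sys.path.
        ok = subprocess.run([sys.executable, main_py],
                            capture_output=True, text=True, cwd=root)

        # Mode 2: exec the file's source with only the CWD on sys.path, as a
        # naive smoke runner might. Fails with ModuleNotFoundError -- the
        # false positive described above.
        bad = subprocess.run(
            [sys.executable, "-c", f"exec(open(r'{main_py}').read())"],
            capture_output=True, text=True, cwd=root)
        return ok.stdout.strip(), "ModuleNotFoundError" in bad.stderr

if __name__ == "__main__":
    print(demo())
```

Mode 1 prints the sibling's value; mode 2 raises ModuleNotFoundError, which is why the sys.path.insert guard in the fix makes the submission robust to either invocation.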

Your PR is not broken by this error. I'm retracting the IMPORT_FAIL classification. I'll re-queue the full compliance audit (BPB check, n-gram / TTT / SLOT flags, etc.) on the current head and post findings separately.

Again — sorry for the noise. These community reviews only work if I actually read what I'm reviewing, and I didn't in this case.

@MatoTeziTanka

Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

BPB: 1.1220 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA eb068f0a193d, file records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py):

The TTT path at line 1105 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
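For the record, a toy sketch of that score-first-per-chunk discipline (a scalar stands in for the adapter weights; this illustrates the control flow only, not the submission's actual code):

```python
def ttt_score_first(chunks, adapt):
    """Score each chunk BEFORE the adapter ever updates on it.

    chunks: list of lists of token values; adapt: callable(state, chunk) -> state.
    Returns mean absolute "loss" over all tokens.
    """
    state = 0.0                          # stands in for adapter weights
    total_loss, total_tokens = 0.0, 0
    num_chunks = len(chunks)
    for ci, chunk in enumerate(chunks):
        # 1) Score chunk ci under weights adapted only on chunks 0..ci-1
        #    (the torch.no_grad() / inference_mode() region in the real code).
        total_loss += sum(abs(t - state) for t in chunk)
        total_tokens += len(chunk)
        # 2) Adapt on chunk ci only if it is not the last chunk
        #    (the base_model.train() + SGD step in the real code).
        is_last_chunk = (ci == num_chunks - 1)
        if not is_last_chunk:
            state = adapt(state, chunk)
    return total_loss / max(total_tokens, 1)
```

The ordering is the whole point: every token contributes to the score before the adapter has seen it, so no scored token is ever predicted by weights trained on itself.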

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=11, vocab=1024, code=94136 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=11, vocab=1024, code=94136 B, SMOKE_TEST_PASS. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

@MatoTeziTanka

Correction to the review above — I cited PR #1416 (erichroepke) and PR #1423 (aryanbhosale) as "the legal frontier shape" for score-first-per-chunk TTT. That citation is wrong, but the verdict on this PR (LOOKS CLEAN) still stands.

What I verified when I posted the review: your train_gpt.py at line 1207 has is_last_chunk = (ci == num_chunks - 1) followed by if not is_last_chunk and args.ttt_epochs > 0: base_model.train() at line 1208, with torch.no_grad() scoring at lines 1677 and 1761. That's the legal score-first-per-chunk pattern and your code is correct.

What I got wrong: I listed #1416 and #1423 as the "legal frontier reference". They're not. At their current heads both have ttt_adapt_adamw(args, base_model, device, val_tokens, ...) — a flat-epoch AdamW fine-tune on val_tokens with no per-chunk score-first discipline, which is the pattern Issue #677 was opened to rule out. I conflated their folder-name branding with their actual code.

The correct legal reference is PR #1413 (dexhunter) — the current leaderboard entry at val_bpb 1.0828 (SP8192 + QK-Gain 5 + Legal Score-First TTT). I decompressed its lzma shim and verified the is_last_chunk + torch.no_grad() score-first accumulator pattern is present. Your code structure matches #1413, not #1416/#1423.

No change to the LOOKS CLEAN verdict. The citation is now fixed.


Correction by @MatoTeziTanka (The Agora). The review verdict stands; only the legal reference citation was wrong.

@michaelwinczuk
Author

@MatoTeziTanka Thank you again — both for the clean review and for coming back to correct the citation. Mistakes happen, and the fact that you re-verified against #1413 (dexhunter) and publicly fixed the reference rather than letting the wrong citation sit is genuinely appreciated. That's exactly the kind of diligence that makes these community audits trustworthy.

Also thanks for walking through the is_last_chunk + torch.no_grad() score-first pattern explicitly — it's helpful to have the legal shape spelled out on the record. I totally get it, and I really value the care you're putting into these reviews. Much respect.

@MatoTeziTanka
Copy link
Copy Markdown

Appreciate the kind words. The #1413 correction was necessary — bad citations erode trust faster than anything else in a community review process. Glad the corrected verdict matched what you knew about your own code.
