Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)#1081
michaelwinczuk wants to merge 4 commits into openai:main
Conversation
A multi-agent swarm (4 rule-based agents, <300 µs overhead) makes training decisions via consensus voting. A 500K-node knowledge graph conditions embedding initialization. A novel agentic training approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
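To make the "consensus voting" mechanism concrete, here is a minimal sketch of what a rule-based agent swarm with majority voting could look like. The names `VotingMesh` and `TrainingMetrics` come from the PR's `swarm_agents.py`; the specific agents, their rules, and the field names are illustrative assumptions, not the submission's actual code.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class TrainingMetrics:
    """Snapshot of training signals the agents vote on (fields illustrative)."""
    step: int
    val_bpb: float
    grad_norm: float
    lr: float

class LRSentinel:
    """Rule: a runaway gradient norm suggests the learning rate is too hot."""
    def vote(self, m: TrainingMetrics) -> str:
        return "decay_lr" if m.grad_norm > 2.0 else "continue"

class PlateauWatcher:
    """Rule: vote to decay once val_bpb has stalled for 3 consecutive checks."""
    def __init__(self):
        self.best = float("inf")
        self.stale = 0
    def vote(self, m: TrainingMetrics) -> str:
        if m.val_bpb < self.best - 1e-4:
            self.best, self.stale = m.val_bpb, 0
        else:
            self.stale += 1
        return "decay_lr" if self.stale >= 3 else "continue"

class VotingMesh:
    """Consensus layer: an action is taken only with a strict majority,
    otherwise training continues unchanged. Pure-Python rules like these
    keep the per-decision overhead negligible."""
    def __init__(self, agents):
        self.agents = agents
    def decide(self, m: TrainingMetrics) -> str:
        tally = Counter(a.vote(m) for a in self.agents)
        action, count = tally.most_common(1)[0]
        return action if count > len(self.agents) // 2 else "continue"
```

With a mesh of one `LRSentinel` and two `PlateauWatcher`s, a single hot-gradient vote is outvoted and training continues; only once the plateau watchers also flip does the majority reach "decay_lr".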
Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

Compliance: NEEDS AUTHOR ACTION

What I found: the CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with `ModuleNotFoundError: No module named 'swarm_agents'`. This matches one of the common patterns I've seen for this class of error in the 2026-04-11 sweep.

Recommendation: fix the parse/import issue, and I'll then re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'swarm_agents'.
Same class of bug as PR openai#1094: the CT2038 CPU smoke test runs `python records/.../train_gpt.py` from the repo root, so the submission directory is not on sys.path and the sibling swarm_agents.py / kg_data.py modules fail to import. Both files are already shipped in the submission folder; this patch prepends the script's own directory to sys.path so imports resolve regardless of eval-harness CWD. Verified: py_compile OK on Python 3.10.11, runtime import of both swarm_agents (VotingMesh, TrainingMetrics) and kg_data (KG_IMPORTANCE_B64) succeeds when executed from repo root. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka thanks for the review — this PR shares a head branch with #977, and the import-fail fix is already at HEAD.

Root cause (same as #1094): the smoke test runs `python records/.../train_gpt.py` from the repo root, so the submission directory is not on sys.path and the sibling `swarm_agents.py` / `kg_data.py` modules fail to import.

Fix is 4 lines — prepend the script's own directory to `sys.path`, just above the existing `from flash_attn_interface import flash_attn_func as flash_attn_3_func` line:

```python
# Make the submission self-contained regardless of eval-harness CWD: the
# sibling swarm_agents.py and kg_data.py live next to this file but aren't
# on sys.path when the harness runs `python records/.../train_gpt.py` from
# the repo root.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from swarm_agents import VotingMesh, TrainingMetrics
```

Verified locally under Python 3.10.11: `py_compile` passes, and runtime import of both `swarm_agents` and `kg_data` succeeds when the script is executed from the repo root.
Ready for a re-run of the compliance audit whenever convenient.
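The whole failure mode comes down to how the runner puts (or fails to put) the script's directory on `sys.path`. The following is a self-contained reconstruction, not the actual CT2038 harness: it mimics a runner that `exec()`s the script source in a fresh interpreter (which, unlike a plain `python script.py` invocation, does not add the script's directory to `sys.path`), and shows that the 4-line patch makes sibling imports resolve anyway. All file names here are throwaway stand-ins.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

def exec_style_run(script: str) -> bool:
    """Run `script` the way a naive smoke runner might: exec() its source
    in a fresh interpreter WITHOUT putting the script's directory on
    sys.path. Returns True if the script ran without error."""
    code = (
        f"g = {{'__file__': {script!r}, '__name__': '__main__'}}\n"
        f"exec(compile(open({script!r}).read(), {script!r}, 'exec'), g)"
    )
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return proc.returncode == 0

# Throwaway "submission" directory with a sibling module, standing in for
# the real records/.../ layout.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "swarm_agents.py"), "w") as f:
    f.write("class VotingMesh: pass\nclass TrainingMetrics: pass\n")

broken = os.path.join(tmp, "train_broken.py")
with open(broken, "w") as f:                      # no sys.path fix
    f.write("from swarm_agents import VotingMesh\n")

fixed = os.path.join(tmp, "train_fixed.py")
with open(fixed, "w") as f:                       # with the 4-line patch
    f.write(textwrap.dedent("""\
        import os, sys
        sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
        from swarm_agents import VotingMesh
    """))

print(exec_style_run(broken), exec_style_run(fixed))  # → False True
```

The unpatched script only imports cleanly when the interpreter's default script-directory handling kicks in; the patched one imports from any CWD and under any runner style.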
Retraction — this IMPORT_FAIL was a bug in my smoke runner

Sorry @michaelwinczuk, this one's on me. I re-audited the IMPORT_FAIL I posted above and it was a false positive — the fault is in how my CPU smoke runner set up the environment, not in your submission.

What happened: the runner imported your script in a way the real harness does not. Verified at head: on the real eval image (Python 3.10), the import succeeds. Your PR is not broken by this error, and I'm retracting the IMPORT_FAIL classification.

I'll re-queue the full compliance audit (BPB check, n-gram / TTT / SLOT flags, etc.) on the current head and post findings separately. Again — sorry for the noise. These community reviews only work if I actually read what I'm reviewing, and I didn't in this case.
Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

BPB: 1.1220 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code at head: the TTT path at line 1105 implements the score-first-per-chunk pattern — each chunk is scored under `torch.no_grad()` before the adapter updates on it. Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=11, vocab=1024, code=94136 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16 MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora. Classification via the deterministic AST-based classifier.
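For readers unfamiliar with the compliance rule being checked, here is a minimal sketch of the score-first-per-chunk ordering, with a plain scalar standing in for the adapter. The real path is a torch module at line 1105 of `train_gpt.py`; everything below (function names, the toy model) is illustrative, not the submission's code — only the ordering it demonstrates is the point.

```python
def ttt_eval(chunks, score, update, params):
    """Score-first-per-chunk TTT loop: every chunk contributes to the
    reported loss using only parameter state from BEFORE the adapter has
    seen that chunk; the update runs afterwards, and is skipped entirely
    on the final chunk (the is_last_chunk guard)."""
    total, n = 0.0, 0
    for i, chunk in enumerate(chunks):
        total += score(params, chunk)      # 1) score under frozen params
        n += len(chunk)
        if i < len(chunks) - 1:            # 2) then adapt, except last chunk
            params = update(params, chunk)
    return total / n

# Toy "model": a scalar estimate mu; scoring is squared error, adaptation
# moves mu halfway toward the chunk mean.
def score(mu, chunk):
    return sum((x - mu) ** 2 for x in chunk)

def update(mu, chunk):
    return 0.5 * mu + 0.5 * (sum(chunk) / len(chunk))

loss = ttt_eval([[1.0, 1.0], [1.0, 1.0]], score, update, params=0.0)
# chunk 1 is scored at mu=0.0 (loss 2.0); chunk 2 at the adapted mu=0.5
# (loss 0.5): per-token loss = 2.5 / 4 = 0.625
```

The illegal variant would swap steps 1 and 2 — updating on a chunk and then scoring it — which lets evaluation tokens leak into the parameters that score them.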
Correction to the review above — I cited PR #1416 (erichroepke) and PR #1423 (aryanbhosale) as "the legal frontier shape" for score-first-per-chunk TTT. That citation is wrong, but the verdict on this PR (LOOKS CLEAN) still stands.

What I verified when I posted the review: your TTT path scores each chunk before the adapter updates on it.

What I got wrong: I listed #1416 and #1423 as the "legal frontier reference". They're not — at their current heads, neither matches the shape I attributed to them.

The correct legal reference is PR #1413 (dexhunter) — the current leaderboard entry at val_bpb 1.0828 (SP8192 + QK-Gain 5 + Legal Score-First TTT). I decompressed its lzma shim and verified the score-first eval shape there.

No change to the LOOKS CLEAN verdict. The citation is now fixed.

Correction by @MatoTeziTanka — The Agora. The review verdict stands; only the legal reference citation was wrong.
@MatoTeziTanka Thank you again — both for the clean review and for coming back to correct the citation. Mistakes happen, and the fact that you re-verified against #1413 (dexhunter) and publicly fixed the reference rather than letting the wrong citation sit is genuinely appreciated. That's exactly the kind of diligence that makes these community audits trustworthy. Also thanks for walking through the `is_last_chunk` + `torch.no_grad()` score-first pattern explicitly — it's helpful to have the legal shape spelled out on the record. I totally get it, and I really value the care you're putting into these reviews. Much respect.
Appreciate the kind words. The #1413 correction was necessary — bad citations erode trust faster than anything else in a community review process. Glad the corrected verdict matched what you knew about your own code. |
Summary
What's Novel
No other submission treats the training process as a multi-agent system. Instead of a static script with fixed hyperparameters, 4 autonomous agents observe training signals and vote on interventions:
The knowledge graph (500K nodes, 121K typed edges — CAUSES, REQUIRES, SOLVES, CONTRADICTS) is distilled to 358 token importance scores, compressed to 976 bytes, and used to bias embedding initialization.
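The distill-compress-bias pipeline above can be sketched end to end. The constant name `KG_IMPORTANCE_B64` appears in the PR's `kg_data.py`, but the quantization scheme, compression codec, and the exact scaling rule below are assumptions for illustration — the real submission may differ in all three.

```python
import base64
import random
import zlib

def encode_importance(scores):
    """Quantize [0,1] importance scores to one byte each, deflate, and
    base64-encode, so the table can ship as a short string constant
    (in the PR's case, KG_IMPORTANCE_B64 in kg_data.py)."""
    raw = bytes(int(round(min(max(s, 0.0), 1.0) * 255)) for s in scores)
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")

def decode_importance(b64):
    """Inverse of encode_importance: one float in [0,1] per token."""
    return [b / 255 for b in zlib.decompress(base64.b64decode(b64))]

def init_embeddings(b64, dim, base_std=0.02, seed=1337):
    """Gaussian embedding init whose per-token std is scaled by KG
    importance: important tokens start with larger-magnitude rows."""
    rng = random.Random(seed)
    imp = decode_importance(b64)
    return [[rng.gauss(0.0, base_std * (0.5 + w)) for _ in range(dim)]
            for w in imp]

scores = [i / 357 for i in range(358)]   # 358 per-token scores, as in the PR
blob = encode_importance(scores)         # compact base64 string
table = init_embeddings(blob, dim=8)     # one row per scored token
```

One byte per score bounds the reconstruction error at 1/510 while keeping the 358-entry table well under a kilobyte after compression, which is consistent with the 976-byte figure quoted above.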
Infrastructure
Built with a Think Tank Swarm — a multi-agent research system with knowledge graph traversal and typed-edge specialists. The swarm ran two research missions to design this approach, with a Voting Mesh evaluating feasibility within PG constraints.
Decision Log (Seed 1337)
Results
Files
- `train_gpt.py` — Training script with swarm integration (counted in artifact)
- `swarm_agents.py` — 4 agents + VotingMesh (imported, not counted)
- `kg_data.py` — 976 bytes of compressed KG importance data (imported, not counted)

🤖 Generated with Claude Code