
Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220) #1081

Open
michaelwinczuk wants to merge 4 commits into openai:main from michaelwinczuk:swarm-guided-kg-conditioned-training

Conversation

@michaelwinczuk

Summary

  • val_bpb: 1.1220 (seed 1337, Legal TTT) | 15.96 MB | 8xH100 SXM
  • A multi-agent swarm (4 rule-based agents) makes training decisions via consensus voting every 800 steps
  • A 500K-node typed-edge knowledge graph conditions embedding initialization
  • Total swarm overhead: <300 microseconds across the entire 600s training window

What's Novel

No other submission treats the training process as a multi-agent system. Instead of a static script with fixed hyperparameters, 4 autonomous agents observe training signals and vote on interventions:

| Agent | Controls | Logic |
| --- | --- | --- |
| QAT Timing | When to enable quantization | Fires when warmdown begins, safety deadline at 65% |
| KG Weight | Knowledge graph influence | Ramps 0.3→0.5 early, tapers to 0.1 late |
| Gradient Health | Grad clipping threshold | Tightens if grad_norm > 2.0 |
| MTP Weight | Multi-token prediction weight | Reduces 0.1→0.05 after 75% |
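The intervention logic in the table is simple enough to sketch end to end. A minimal toy reconstruction (VotingMesh and TrainingMetrics are real names from swarm_agents.py; the agent rules, thresholds, and confidence floor below are paraphrased from the table, not copied from the submission):

```python
from dataclasses import dataclass

@dataclass
class TrainingMetrics:
    step: int
    total_steps: int
    grad_norm: float

class Agent:
    """Base class: each agent proposes (param, value, confidence) or None."""
    def propose(self, m: TrainingMetrics):
        return None

class GradHealthAgent(Agent):
    # Tightens the gradient-clipping threshold when grad_norm > 2.0.
    def propose(self, m):
        if m.grad_norm > 2.0:
            return ("grad_clip", 0.5, 0.9)
        return None

class KGWeightAgent(Agent):
    # Ramps KG weight to 0.5 early in training, tapers to 0.1 late.
    def propose(self, m):
        frac = m.step / m.total_steps
        if frac < 0.25:
            return ("kg_weight", 0.5, 0.75)
        if frac > 0.75:
            return ("kg_weight", 0.1, 0.75)
        return None

class VotingMesh:
    """Collects proposals every `interval` steps; applies those above a confidence floor."""
    def __init__(self, agents, interval=800, conf_floor=0.6):
        self.agents, self.interval, self.conf_floor = agents, interval, conf_floor

    def decide(self, m: TrainingMetrics):
        if m.step % self.interval != 0:
            return []
        votes = [p for a in self.agents if (p := a.propose(m)) is not None]
        return [(param, val) for param, val, conf in votes if conf >= self.conf_floor]
```

Because the agents are rule-based and only consulted every 800 steps, the per-decision cost is a handful of comparisons, which is consistent with the sub-millisecond overhead reported above.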

The knowledge graph (500K nodes, 121K typed edges — CAUSES, REQUIRES, SOLVES, CONTRADICTS) is distilled to 358 token importance scores, compressed to 976 bytes, and used to bias embedding initialization.
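A rough sketch of that distill, compress, and bias pipeline (pure stdlib; pack_importance, unpack_importance, and biased_embedding_init are hypothetical names for illustration; only the KG_IMPORTANCE_B64 constant appears in the actual kg_data.py):

```python
import base64
import random
import zlib

def pack_importance(scores):
    """Quantize per-token importance scores to uint8, zlib-compress, base64-encode."""
    q = bytes(min(255, max(0, int(s * 255))) for s in scores)
    return base64.b64encode(zlib.compress(q, 9)).decode("ascii")

def unpack_importance(b64):
    """Inverse of pack_importance: recover approximate scores in [0, 1]."""
    q = zlib.decompress(base64.b64decode(b64))
    return [b / 255 for b in q]

def biased_embedding_init(vocab_size, dim, importance, base_std=0.02, kg_weight=0.3):
    """Scale each embedding row's init std by (1 + kg_weight * importance[token])."""
    rng = random.Random(1337)
    table = []
    for t in range(vocab_size):
        imp = importance[t] if t < len(importance) else 0.0
        std = base_std * (1.0 + kg_weight * imp)
        table.append([rng.gauss(0.0, std) for _ in range(dim)])
    return table
```

At 358 scores quantized to one byte each, the payload is small enough before compression that the 976-byte figure is plausible even with base64 overhead.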

Infrastructure

Built with a Think Tank Swarm — a multi-agent research system with knowledge graph traversal and typed-edge specialists. The swarm ran two research missions to design this approach, with a Voting Mesh evaluating feasibility within PG constraints.

Decision Log (Seed 1337)

Swarm: 8 cycles, 2 decisions
  cycle 1 step 800: kg_weight_agent kg_weight 0.3->0.5 (conf=0.75, 50us)
  cycle 4 step 3200: kg_weight_agent kg_weight 0.5->0.4 (conf=0.75, 39us)

Results

| Metric | Score |
| --- | --- |
| Pre-quant (EMA) | 1.1397 |
| Post-quant (int6) | 1.1481 |
| Sliding window (stride=64) | 1.1245 |
| Legal TTT | 1.1220 |
| Artifact size | 15,955,969 bytes |

Files

  • train_gpt.py — Training script with swarm integration (counted in artifact)
  • swarm_agents.py — 4 agents + VotingMesh (imported, not counted)
  • kg_data.py — 976 bytes of compressed KG importance data (imported, not counted)

🤖 Generated with Claude Code

michaelwinczuk and others added 3 commits March 29, 2026 08:11
A multi-agent swarm (4 rule-based agents, <300us overhead) makes training
decisions via consensus voting. A 500K-node knowledge graph conditions
embedding initialization. Novel agentic training approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

[RETRACTED 2026-04-11] — This IMPORT_FAIL was a false positive. Root cause: sibling module exists in same records/ folder; runner sys.path bug. Your code is not broken. See correction below: #1081 (comment)


Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'swarm_agents'

This matches one of the common patterns I've seen for this class of error in the 2026-04-11 sweep.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'swarm_agents'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

Same class of bug as PR openai#1094: the CT2038 CPU smoke test runs
`python records/.../train_gpt.py` from the repo root, so the submission
directory is not on sys.path and the sibling swarm_agents.py / kg_data.py
modules fail to import. Both files are already shipped in the submission
folder; this patch prepends the script's own directory to sys.path so
imports resolve regardless of eval-harness CWD.

Verified: py_compile OK on Python 3.10.11, runtime import of both
swarm_agents (VotingMesh, TrainingMetrics) and kg_data (KG_IMPORTANCE_B64)
succeeds when executed from repo root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michaelwinczuk
Author

@MatoTeziTanka thanks for the review — this PR shares a head branch with #977, and the import-fail fix is already at HEAD in eb068f0.

Root cause (same as #1094): swarm_agents.py and kg_data.py are both shipped in the submission folder next to train_gpt.py, but because CT2038 runs python records/.../train_gpt.py from the repo root, the submission directory isn't on sys.path and the sibling imports fail before any scored-eval logic runs.

Fix is 4 lines — prepend the script's own directory to sys.path before the first sibling import:

from flash_attn_interface import flash_attn_func as flash_attn_3_func
# Make the submission self-contained regardless of eval-harness CWD: the
# sibling swarm_agents.py and kg_data.py live next to this file but aren't
# on sys.path when the harness runs `python records/.../train_gpt.py` from
# the repo root. (os and sys are already imported earlier in train_gpt.py.)
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from swarm_agents import VotingMesh, TrainingMetrics

Verified locally under Python 3.10.11:

  • py_compile → OK
  • Running from repo root resolves both swarm_agents.VotingMesh / TrainingMetrics and kg_data.KG_IMPORTANCE_B64.

Ready for a re-run of the compliance audit whenever convenient.

@MatoTeziTanka

Retraction — this IMPORT_FAIL was a bug in my smoke runner

Sorry @michaelwinczuk, this one's on me. I re-audited the IMPORT_FAIL I posted above and it was a false positive — the fault is in how my CPU smoke runner set up sys.path, not in your code.

What happened:

The runner imported your records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py without the script's own folder on sys.path, so when your file did from swarm_agents import ... it couldn't resolve the sibling swarm_agents.py that lives in the same 2026-03-29_SwarmGuided_KGConditioned_Training/ directory. The error I reported — ModuleNotFoundError: No module named 'swarm_agents' — looked like a missing file, but I re-checked the head SHA eb068f0, and records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/swarm_agents.py is right there, committed to the PR, next to train_gpt.py.

Verified at head eb068f0:

records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/swarm_agents.py   ← sibling module, exists
records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py   ← imports it

On the real eval image (Python 3.10, records/*/ as the working dir), this import resolves correctly because the records folder ends up on sys.path via the standard cwd-driven import or via the eval harness's per-record entry point.
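Both invocation modes can be reproduced in isolation with a throwaway submission folder. This is a self-contained illustration of the class of bug, not the actual CT2038 runner:

```python
import os
import subprocess
import sys
import tempfile

def demo():
    """Show how sibling imports resolve (or not) depending on invocation mode."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "submission")
        os.mkdir(sub)
        with open(os.path.join(sub, "sibling.py"), "w") as f:
            f.write("VALUE = 42\n")
        main_py = os.path.join(sub, "main.py")
        with open(main_py, "w") as f:
            f.write("import sibling\nprint(sibling.VALUE)\n")

        # Mode 1: `python submission/main.py` run from the repo root. Works,
        # because CPython prepends the script's own directory to sys.path.
        ok = subprocess.run([sys.executable, main_py],
                            capture_output=True, text=True, cwd=root)

        # Mode 2: exec the file's source with only the CWD on sys.path, as a
        # naive smoke runner might. Fails with ModuleNotFoundError -- the
        # false positive described above.
        bad = subprocess.run(
            [sys.executable, "-c", f"exec(open(r'{main_py}').read())"],
            capture_output=True, text=True, cwd=root)
        return ok.stdout.strip(), "ModuleNotFoundError" in bad.stderr

if __name__ == "__main__":
    print(demo())
```

Mode 1 prints the sibling's value; mode 2 raises ModuleNotFoundError, which is why the sys.path.insert guard in the fix makes the submission robust to either invocation.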

Your PR is not broken by this error. I'm retracting the IMPORT_FAIL classification. I'll re-queue the full compliance audit (BPB check, n-gram / TTT / SLOT flags, etc.) on the current head and post findings separately.

Again — sorry for the noise. These community reviews only work if I actually read what I'm reviewing, and I didn't in this case.

@MatoTeziTanka

Community Review — Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220)

BPB: 1.1220 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA eb068f0a193d, file records/track_non_record_16mb/2026-03-29_SwarmGuided_KGConditioned_Training/train_gpt.py):

The TTT path at line 1105 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
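For the record, a toy sketch of that score-first-per-chunk discipline (a scalar stands in for the adapter weights; this illustrates the control flow only, not the submission's actual code):

```python
def ttt_score_first(chunks, adapt):
    """Score each chunk BEFORE the adapter ever updates on it.

    chunks: list of lists of token values; adapt: callable(state, chunk) -> state.
    Returns mean absolute "loss" over all tokens.
    """
    state = 0.0                          # stands in for adapter weights
    total_loss, total_tokens = 0.0, 0
    num_chunks = len(chunks)
    for ci, chunk in enumerate(chunks):
        # 1) Score chunk ci under weights adapted only on chunks 0..ci-1
        #    (the torch.no_grad() / inference_mode() region in the real code).
        total_loss += sum(abs(t - state) for t in chunk)
        total_tokens += len(chunk)
        # 2) Adapt on chunk ci only if it is not the last chunk
        #    (the base_model.train() + SGD step in the real code).
        is_last_chunk = (ci == num_chunks - 1)
        if not is_last_chunk:
            state = adapt(state, chunk)
    return total_loss / max(total_tokens, 1)
```

The ordering is the whole point: every token contributes to the score before the adapter has seen it, so no scored token is ever predicted by weights trained on itself.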

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=11, vocab=1024, code=94136 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=11, vocab=1024, code=94136 B, SMOKE_TEST_PASS. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

@MatoTeziTanka

Correction to the review above — I cited PR #1416 (erichroepke) and PR #1423 (aryanbhosale) as "the legal frontier shape" for score-first-per-chunk TTT. That citation is wrong, but the verdict on this PR (LOOKS CLEAN) still stands.

What I verified when I posted the review: your train_gpt.py at line 1207 has is_last_chunk = (ci == num_chunks - 1) followed by if not is_last_chunk and args.ttt_epochs > 0: base_model.train() at line 1208, with torch.no_grad() scoring at lines 1677 and 1761. That's the legal score-first-per-chunk pattern and your code is correct.

What I got wrong: I listed #1416 and #1423 as the "legal frontier reference". They're not. At their current heads both have ttt_adapt_adamw(args, base_model, device, val_tokens, ...) — a flat-epoch AdamW fine-tune on val_tokens with no per-chunk score-first discipline, which is the pattern Issue #677 was opened to rule out. I conflated their folder-name branding with their actual code.

The correct legal reference is PR #1413 (dexhunter) — the current leaderboard entry at val_bpb 1.0828 (SP8192 + QK-Gain 5 + Legal Score-First TTT). I decompressed its lzma shim and verified the is_last_chunk + torch.no_grad() score-first accumulator pattern is present. Your code structure matches #1413, not #1416/#1423.

No change to the LOOKS CLEAN verdict. The citation is now fixed.


Correction by @MatoTeziTanka (The Agora). The review verdict stands; only the legal reference citation was wrong.

@michaelwinczuk
Author

@MatoTeziTanka Thank you again — both for the clean review and for coming back to correct the citation. Mistakes happen, and the fact that you re-verified against #1413 (dexhunter) and publicly fixed the reference rather than letting the wrong citation sit is genuinely appreciated. That's exactly the kind of diligence that makes these community audits trustworthy.

Also thanks for walking through the is_last_chunk + torch.no_grad() score-first pattern explicitly — it's helpful to have the legal shape spelled out on the record. I totally get it, and I really value the care you're putting into these reviews. Much respect.

@MatoTeziTanka
Copy link
Copy Markdown

Appreciate the kind words. The #1413 correction was necessary — bad citations erode trust faster than anything else in a community review process. Glad the corrected verdict matched what you knew about your own code.
