Non-leaderboard experiment log: dual-space MLP, refine pass, and bpb/loss plateau observations #1986
Chiefpetja started this conversation in Show and tell
I did not end up reaching a leaderboard-ready submission, but I wanted to share my experimental branch and notes in case they are useful for others exploring non-standard architectures under the Parameter Golf constraints.
Links
Main branch: https://github.com/Chiefpetja/parameter-golf/tree/runpod-clean
Main experimental script: train_gpt_runpod.py
Personal experiment writeup: EXPERIMENT_README.md

Main observation
The main observation was that smaller-batch runs reached roughly 1.8 bpb surprisingly quickly, sometimes within about 300 steps, while training loss kept improving afterward without translating into further bpb gains.
My current hypothesis is that the model may specialize too aggressively on local training structure and then plateau in tokenizer-agnostic compression/generalization.
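Since the gap here is between token-level training loss and bpb, it may help to make the conversion between the two explicit. A minimal sketch (the tokens-per-byte ratio of 0.25 is an illustrative assumption, not a measurement from my runs) of converting mean cross-entropy in nats/token to bits per byte:

```python
import math

def nats_per_token_to_bpb(loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte (bpb)."""
    # bits/token = nats/token / ln(2); scale by the tokens-per-byte ratio.
    return (loss_nats / math.log(2)) * (total_tokens / total_bytes)

# Illustrative numbers: 5.0 nats/token at ~0.25 tokens per byte lands near
# the ~1.8 bpb plateau described above.
bpb = nats_per_token_to_bpb(5.0, total_tokens=250, total_bytes=1000)
print(round(bpb, 3))  # → 1.803
```

The point of the conversion is that bpb is fixed by the bytes of the underlying text, so training loss can keep dropping (better fit to token-level structure) while bpb stalls if those gains don't compress the raw bytes any further.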
What I experimented with
This is not a record submission. It is shared as an experiment log / research note.