Non-leaderboard experiment log: dual-space MLP, refine pass, and bpb/loss plateau observations #1986
Chiefpetja started this conversation in Show and tell
I did not end up reaching a leaderboard-ready submission, but I wanted to share my experimental branch and notes in case they are useful for others exploring non-standard architectures under the Parameter Golf constraints.
Links
Main branch: https://github.com/Chiefpetja/parameter-golf/tree/runpod-clean
Main experimental script: train_gpt_runpod.py
Personal experiment writeup: EXPERIMENT_README.md

Main observation
The main observation was that smaller-batch runs reached roughly 1.8 bpb surprisingly quickly, sometimes within about 300 steps, while training loss kept improving afterward without translating into further bpb gains.
My current hypothesis is that the model may specialize too aggressively on local training structure and then plateau in tokenizer-agnostic compression/generalization.
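Since the gap here is between token-level training loss and bpb, it may help to make the conversion between the two explicit. A minimal sketch (the tokens-per-byte ratio of 0.25 is an illustrative assumption, not a measurement from my runs) of converting mean cross-entropy in nats/token to bits per byte:

```python
import math

def nats_per_token_to_bpb(loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte (bpb)."""
    # bits/token = nats/token / ln(2); scale by the tokens-per-byte ratio.
    return (loss_nats / math.log(2)) * (total_tokens / total_bytes)

# Illustrative numbers: 5.0 nats/token at ~0.25 tokens per byte lands near
# the ~1.8 bpb plateau described above.
bpb = nats_per_token_to_bpb(5.0, total_tokens=250, total_bytes=1000)
print(round(bpb, 3))  # → 1.803
```

The point of the conversion is that bpb is fixed by the bytes of the underlying text, so training loss can keep dropping (better fit to token-level structure) while bpb stalls if those gains don't compress the raw bytes any further.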
What I experimented with
This is not a record submission. It is shared as an experiment log / research note.