
Non-record: KNN Hidden State Retrieval — Scale Deception from Weak to Strong Models (8xH100)#1259

Open
himanshudongre wants to merge 1 commit into openai:main from himanshudongre:nonrecord/knn-scale-deception-8xh100

Conversation

@himanshudongre

Summary

Novel eval-time technique: KNN Hidden State Retrieval. Zero artifact cost, score-first protocol.

Key finding: KNN retrieval helps weak models (-2% to -4% BPB) but hurts strong, competition-quality models (+1.5% BPB). Lower BPB is better, so a negative delta means the technique helps.

| Model quality | KNN effect on BPB |
| --- | --- |
| Weak (1xH100, 2K steps) | -1.57% (helps) |
| Strong (8xH100, 5665 steps, SOTA) | +1.47% (hurts) |
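The PR does not inline the implementation, but a kNN-LM-style hidden-state retrieval with eval-time prediction mixing can be sketched as follows. This is a minimal sketch under assumptions: function and parameter names are illustrative, the datastore is a flat array searched with L2 distance, and `lam` is the interpolation weight between the retrieved and model distributions.

```python
import numpy as np

def knn_mix(h_query, bank_h, bank_probs, model_probs, k=3, lam=0.25):
    """Interpolate the model's next-token distribution with a kNN
    distribution retrieved from a datastore of hidden states.

    h_query:     (d,)   hidden state at the current eval position
    bank_h:      (N, d) stored hidden states from held-out data
    bank_probs:  (N, V) next-token distributions paired with each state
    model_probs: (V,)   the model's own prediction for this position
    """
    # L2 distance from the query hidden state to every stored state.
    dists = np.linalg.norm(bank_h - h_query, axis=1)
    idx = np.argsort(dists)[:k]

    # Softmax over negative distances weights closer neighbours more.
    w = np.exp(-dists[idx])
    w /= w.sum()
    knn_probs = (w[:, None] * bank_probs[idx]).sum(axis=0)

    # lam controls how much the retrieved distribution is trusted;
    # lam=0 recovers the unmodified model.
    return lam * knn_probs + (1.0 - lam) * model_probs
```

Because this mixing happens entirely at eval time, it adds nothing to the submitted artifact, which is consistent with the "KNN adds 0 bytes" result below.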

Second confirmed case of scale deception in this competition (first: SSM in PR #1013/PR #1227).

8xH100 Results

  • Neural roundtrip BPB: 1.1533
  • KNN BPB: 1.1702 (+1.47% worse)
  • KNN eval time: 168s (fits in 600s)
  • Artifact: 15.8MB (KNN adds 0 bytes)
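The reported +1.47% follows directly from the two BPB numbers above; since higher BPB is worse, a positive delta means the mixing hurt the strong model:

```python
baseline_bpb = 1.1533  # neural roundtrip, no KNN
knn_bpb = 1.1702       # with KNN prediction mixing

delta_pct = (knn_bpb - baseline_bpb) / baseline_bpb * 100
print(f"{delta_pct:+.2f}%")  # matches the reported +1.47% (worse)
```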

Implication

Eval-time prediction mixing has negative returns on strong models. Techniques that adapt the model itself, such as test-time training (TTT), may work better than mixing in external distributions.

Full scaling analysis in README.

… Strong Models

Novel eval-time technique validated on 8xH100. Helps weak models (-2 to -4%), hurts competition-quality models (+1.5%). Definitive scale deception finding.
