
RQZ-Golf v1: Depth recurrence for parameter efficiency#54

Closed
TheCause wants to merge 1 commit into openai:main from TheCause:rqz-golf-v1

Conversation

@TheCause

Non-record experimental submission

Approach: Replace some unique layers with a single shared recurrent layer applied K times, saving parameters while increasing effective depth.

Architecture

  • 7 unique layers (encoder/decoder with U-Net skip connections)
  • 1 recurrent layer applied K=3 times with learned iteration embeddings
  • Effective depth: 10 layers (7 unique + 3 recurrent) vs baseline 9
  • Residual scaling by 1/sqrt(K) for stability

Key ideas

  1. Depth recurrence: sharing one layer's weights across K passes saves roughly 20% of parameters
  2. Iteration embeddings: a learned per-pass vector (psi_k) lets the shared layer condition on which pass it is running
  3. Test-time compute: raise K at inference (K' > K) to trade extra compute for lower BPB without changing model size
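The three ideas above can be sketched together in a few lines. This is a minimal numpy illustration, not the PR's actual code: the shared layer is reduced to a single tanh-linear map, psi_k embeddings are random stand-ins for learned vectors, and cycling psi_k when K' exceeds K at test time is an assumption about how extra passes would be handled.

```python
# Minimal sketch of depth recurrence with iteration embeddings and
# 1/sqrt(K) residual scaling. shared_layer, psi, d_model, and the
# cyclic psi reuse are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, K = 16, 3

# One shared weight matrix reused on every pass (vs. K unique layers).
W_shared = rng.normal(scale=0.02, size=(d_model, d_model))
# Per-pass iteration embeddings psi_k (random stand-ins for learned ones).
psi = rng.normal(scale=0.02, size=(K, d_model))

def shared_layer(x, k):
    """One pass of the shared layer, conditioned on pass index k."""
    return np.tanh((x + psi[k]) @ W_shared)

def recur(x, n_passes):
    """Apply the shared layer n_passes times with scaled residuals.

    Scaling each residual by 1/sqrt(n_passes) keeps the variance of the
    summed updates roughly constant as the number of passes grows.
    """
    scale = 1.0 / np.sqrt(n_passes)
    for k in range(n_passes):
        # Reuse psi cyclically if n_passes exceeds K at test time.
        x = x + scale * shared_layer(x, k % K)
    return x

x = rng.normal(size=(d_model,))
y_train = recur(x, K)   # training configuration: K = 3 passes
y_test = recur(x, 5)    # test-time compute: K' = 5 > K, same weights
```

Note that `y_test` uses more effective depth than `y_train` with an identical parameter count, which is the whole point of idea 3.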

Status

  • Preliminary baseline: 1.5283 BPB (1 shard, 1xA100)
  • RQZ-Golf architecture implemented, not yet benchmarked on full dataset
  • Requesting compute credits for full evaluation

Theoretical basis

Inspired by Universal Transformers (Dehghani et al., 2019) and Deep Equilibrium Models (Bai et al., 2019).

Non-record experimental submission.
Architecture: 7 unique layers + 1 shared recurrent layer (K=3 passes)
with iteration embeddings and 1/sqrt(K) scaling.
Test-time compute: increase K at inference without changing model size.

TheCause commented Apr 7, 2026

Closing this PR. Our depth recurrence findings are independently confirmed and superseded by PR #363 (merged), which documented the same core result (+0.025 BPB degradation) with 35 runs across 8xH100/2xH100/consumer GPUs.

Our additional experiments (E2: 15 runs positional ablation, E3: layer diagnostics on 1x RTX 3090) corroborate that recurrence degrades performance when used as a standalone technique. Meanwhile, PRs #1204 (1.1063 BPB) and #1392 (1.1020 BPB) demonstrate that recurrence does work within a complete stack (11L + MLP3x + XSA + EMA + GPTQ + parallel residuals), confirming the stack-dependency hypothesis.

Full analysis documented internally. Thanks to @evangelinehelsinki for the thorough work on #363.

TheCause closed this on Apr 7, 2026