31 changes: 31 additions & 0 deletions records/track_non_record_16mb/rqz_golf_v1/README.md
@@ -0,0 +1,31 @@
# RQZ-Golf v1 — Depth Recurrence for Parameter Golf

## Approach

Replace some unique layers with a single shared recurrent layer applied K times.
This saves parameters (shared weights) while increasing effective depth.

### Architecture
- 7 unique layers (3 encoder + 4 decoder with U-Net skip connections)
- 1 recurrent layer applied K=3 times with iteration embeddings
- Effective depth: 10 layers (7 unique + 3 recurrent) vs baseline 9
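
As a rough check of the layer counts above, the arithmetic can be sketched as follows. The function names are hypothetical and not taken from the submission's code, and the sketch assumes the shared layer is stored once regardless of K.

```python
def effective_depth(num_unique_layers: int, num_recurrent_passes: int) -> int:
    """Number of layer applications in one forward pass."""
    return num_unique_layers + num_recurrent_passes


def stored_blocks(num_unique_layers: int) -> int:
    """Number of distinct weight sets: the unique layers plus one shared layer.

    Assumption: the recurrent layer is a single block reused on every pass.
    """
    return num_unique_layers + 1


if __name__ == "__main__":
    print(effective_depth(7, 3))  # layer applications per forward pass
    print(stored_blocks(7))       # distinct blocks whose weights are stored
```

With 7 unique layers and K=3, this gives 10 layer applications backed by 8 stored blocks, compared with 9 stored blocks in the baseline.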

### Key ideas
1. **Depth recurrence**: last block shares weights across K passes, saving ~3M params
2. **Iteration embeddings**: learned per-pass vector so the layer knows which pass it's on
3. **Stability scaling**: residual scaled by 1/sqrt(K) to prevent amplitude explosion
4. **Test-time compute**: K can be increased at inference (K'=6, 8, ...), which may improve BPB
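
The ideas above can be illustrated with a minimal, dependency-free sketch. It is not the submission's implementation: the toy `shared_layer` (a tanh of a matrix-vector product), the function names, and the choice to apply the 1/sqrt(K) factor to the residual update are all assumptions made for illustration. The source also does not say how iteration embeddings are chosen when K is raised at inference, so this sketch simply requires one embedding per pass.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def shared_layer(w, x):
    """Toy stand-in for a transformer block: one weight matrix plus tanh."""
    return [math.tanh(v) for v in matvec(w, x)]


def recurrent_block(h, w, iter_emb, k):
    """Apply the same weights k times with per-pass embeddings.

    Assumption: the residual update is scaled by 1/sqrt(k).
    """
    if k > len(iter_emb):
        raise ValueError("need one iteration embedding per pass")
    scale = 1.0 / math.sqrt(k)
    for i in range(k):
        # Add the learned per-pass vector so the layer can tell passes apart.
        x = [a + b for a, b in zip(h, iter_emb[i])]
        update = shared_layer(w, x)  # same weights on every pass
        h = [a + scale * u for a, u in zip(h, update)]
    return h


if __name__ == "__main__":
    random.seed(0)
    dim, k = 4, 3
    w = [[random.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    emb = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(k)]
    print(recurrent_block([0.0] * dim, w, emb, k))
```

Because the toy update is bounded by 1 in magnitude, the total change per coordinate after k passes is at most sqrt(k) in this sketch, rather than k without the scaling.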

### Theoretical basis
Inspired by Universal Transformers (Dehghani et al., 2019) and Deep Equilibrium Models (Bai et al., 2019).
Each recurrent pass reconstructs the residual of the previous pass in latent space.

## Config
```
NUM_UNIQUE_LAYERS=7
NUM_RECURRENT_PASSES=3
# All other params same as baseline
```
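
One way such settings might be consumed is sketched below. It assumes the two values are supplied as environment variables, which the config above suggests but does not confirm; `load_config` is a hypothetical helper, not part of the submission.

```python
import os


def load_config(env=None):
    """Read the two recurrence settings, defaulting to the values shown above.

    Assumption: settings arrive as environment variables (or any mapping
    passed in as `env`, which makes the function easy to exercise in isolation).
    """
    env = os.environ if env is None else env
    num_unique = int(env.get("NUM_UNIQUE_LAYERS", "7"))
    num_passes = int(env.get("NUM_RECURRENT_PASSES", "3"))
    if num_unique < 1 or num_passes < 1:
        raise ValueError("layer and pass counts must be positive")
    return {
        "num_unique_layers": num_unique,
        "num_recurrent_passes": num_passes,
    }


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dict, for example `load_config({"NUM_RECURRENT_PASSES": "6"})`, mirrors the test-time idea of raising K without touching the other settings.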

## Author
Regis Rigaud (@TheCause)
9 changes: 9 additions & 0 deletions records/track_non_record_16mb/rqz_golf_v1/submission.json
@@ -0,0 +1,9 @@
{
"name": "RQZ-Golf v1",
"github_id": "TheCause",
"val_bpb": null,
"summary": "Depth recurrence: 7 unique layers + 1 shared recurrent layer (K=3 passes) with iteration embeddings and 1/sqrt(K) scaling. Test-time compute: increase K at inference. Preliminary baseline: 1.5283 BPB (1 shard, 1xA100).",
"date": "2026-03-19",
"track": "non_record",
"status": "experimental"
}