
[Submission] Random LinearMaps + LoRA Adapters#1295

Open
austinluk wants to merge 3 commits into openai:main from austinluk:submission/random-linear-maps-lora

Conversation

@austinluk

No description provided.

@himanshudongre

Great to see someone else exploring this direction! I've been working on the same wishlist item and just submitted my findings in PR #1301.

TL;DR: Your "Potential Improvements" section nails it — selective freezing is the key.

I tested both full freeze + adapters (your approach) and selective freeze (freeze only MLP gate+up, learn attention fully) on FineWeb data. The results are dramatic:

| Approach | Frozen % | Best CE (FineWeb) | vs Baseline |
| --- | --- | --- | --- |
| Full freeze + VeRA rank=8 | 94% | 2.3388 | +80% gap |
| Full freeze + VeRA rank=16 | 94% | 2.3288 | +79% gap |
| Full freeze + VeRA rank=32 | 94% | 2.3221 | +79% gap |
| Selective freeze (gate+up only) | 37% | 1.2792 | -1.5% (better than baseline) |

Increasing adapter rank from 8→32 barely helps — the bottleneck is frozen attention weights that can't learn relational patterns, not adapter capacity.
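To make the "full freeze + adapter" setup concrete, here is a minimal NumPy sketch of a VeRA-style adapter on one frozen layer. All names and dimensions (`d_model`, `r`, `vera_forward`) are illustrative, not taken from either PR; VeRA keeps the low-rank matrices frozen and random, and trains only two small scaling vectors, which is why raising the rank adds so little capacity:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, r = 16, 8  # hypothetical dimensions for illustration

W0 = rng.standard_normal((d_model, d_model))  # frozen base weight
A = rng.standard_normal((r, d_model))         # frozen random down-projection
B = rng.standard_normal((d_model, r))         # frozen random up-projection
d_vec = np.full(r, 0.1)                       # trainable per-rank scaling
b_vec = np.zeros(d_model)                     # trainable output scaling, zero-init

def vera_forward(x):
    # y = W0 x + diag(b) B diag(d) A x  -- only d_vec and b_vec are trained
    return W0 @ x + b_vec * (B @ (d_vec * (A @ x)))

x = rng.standard_normal(d_model)
# With b_vec zero-initialized, the adapter contributes nothing at init:
assert np.allclose(vera_forward(x), W0 @ x)
```

Only `d_vec` and `b_vec` (here 8 + 16 scalars) are trainable, which keeps the artifact tiny but leaves the frozen attention weights as the bottleneck the comment describes.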

The fix: freeze only the MLP gate and up projections (feature expansion — where Johnson-Lindenstrauss applies naturally), learn everything else. This preserves the model's ability to learn attention patterns while getting artifact savings from frozen random projections.
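A sketch of what that selective-freeze policy could look like, assuming a LLaMA-style parameter naming scheme (`mlp.gate_proj`, `mlp.up_proj`, etc. are assumptions about the backbone, not names from this PR):

```python
# Hypothetical parameter-name suffixes for the MLP feature-expansion projections.
FROZEN_SUFFIXES = ("mlp.gate_proj.weight", "mlp.up_proj.weight")

def should_freeze(name: str) -> bool:
    """Freeze only the MLP gate/up projections; everything else stays trainable."""
    return name.endswith(FROZEN_SUFFIXES)

# Example parameter names for one transformer block:
params = [
    "layers.0.attn.q_proj.weight",
    "layers.0.attn.k_proj.weight",
    "layers.0.mlp.gate_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.0.mlp.down_proj.weight",
]
frozen = [p for p in params if should_freeze(p)]

# In a PyTorch model this would translate to:
#   for name, p in model.named_parameters():
#       p.requires_grad = not should_freeze(name)
```

The attention projections and the MLP down-projection remain trainable, so the model can still learn relational patterns while the random gate/up projections act as a fixed feature expansion.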

On the artifact-normalized comparison (the real competition question), a larger selectively frozen model beats a smaller fully-trained model once you account for artifact size:

| Config | CE (FineWeb) | Artifact |
| --- | --- | --- |
| 6L 192d fully-trained + dropout | 3.2531 | 2.4 MB |
| 12L 384d selective freeze + dropout | 2.8803 | 7.3 MB |

The frozen model has 4× more effective params at 3× the artifact cost — and it wins by 11.5%.

Full details + code in PR #1301. Would be interesting to see if your 12L 768d backbone with selective freeze (learn attention, freeze only MLP gate+up) closes the gap further.

@himanshudongre

Related work: I've been running extensive experiments on selective freeze (freezing gate+up projections only, 37% frozen) as an alternative to your full freeze + LoRA approach.

Key finding: selective freeze (37% frozen) dramatically outperforms full freeze + LoRA (94% frozen). The LoRA approach shows an ~80% quality gap versus baseline, while selective freeze achieves 2.1% lower CE than baseline on H100.

I also developed "progressive freeze" — train all weights fully for N steps, then freeze mid-training. This outperforms random-init freeze by 1.3 percentage points on FineWeb sp4096.
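Progressive freeze can be sketched as a schedule over trainable parameters; the step threshold and suffix names below are illustrative placeholders, not values from PR #1301:

```python
FREEZE_AT = 100  # hypothetical step at which mid-training freezing kicks in
FROZEN_SUFFIXES = ("mlp.gate_proj.weight", "mlp.up_proj.weight")  # assumed naming

def trainable_names(step: int, all_names: list[str]) -> set[str]:
    """Phase 1: train everything. Phase 2: freeze the gate/up projections."""
    if step < FREEZE_AT:
        return set(all_names)
    return {n for n in all_names if not n.endswith(FROZEN_SUFFIXES)}

names = [
    "layers.0.attn.q_proj.weight",
    "layers.0.mlp.gate_proj.weight",
    "layers.0.mlp.up_proj.weight",
]
assert trainable_names(0, names) == set(names)                      # all trainable
assert trainable_names(FREEZE_AT, names) == {names[0]}              # gate/up frozen
```

The difference from random-init freeze is that the frozen projections have already been trained for `FREEZE_AT` steps, so they carry learned features rather than pure random ones.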

Full results with 7 architecture variants across H100 and A40: PR #1301.
