[Submission] Random LinearMaps + LoRA Adapters#1295
austinluk wants to merge 3 commits into openai:main from
Conversation
Great to see someone else exploring this direction! I've been working on the same wishlist item and just submitted my findings in PR #1301. TL;DR: Your "Potential Improvements" section nails it — selective freezing is the key. I tested both full freeze + adapters (your approach) and selective freeze (freeze only MLP gate+up, learn attention fully) on FineWeb data. The results are dramatic:
Increasing adapter rank from 8→32 barely helps — the bottleneck is frozen attention weights that can't learn relational patterns, not adapter capacity. The fix: freeze only the MLP gate and up projections (feature expansion — where Johnson-Lindenstrauss applies naturally), learn everything else. This preserves the model's ability to learn attention patterns while getting artifact savings from frozen random projections. On the artifact-normalized comparison (the real competition question), a larger frozen model beats a smaller fully-trained model at the same artifact budget:
The frozen model has 4× more effective params at 3× the artifact cost, and it wins by 11.5%. Full details and code are in PR #1301. It would be interesting to see whether your 12L 768d backbone with selective freeze (learn attention, freeze only MLP gate+up) closes the gap further.
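The artifact-savings argument above can be made concrete with a small sketch. This is illustrative only; the layer sizes, seed handling, and variable names are assumptions, not code from either PR. The idea: a frozen random linear map is regenerated from a stored seed, so only the rank-r adapter contributes to the artifact, costing r·(d_in + d_out) stored parameters instead of d_in·d_out.

```python
import numpy as np

# Hypothetical sketch of a frozen random linear map + LoRA adapter.
d_in, d_out, rank = 64, 64, 8
seed = 0

# Frozen weights: never shipped in the artifact, always regenerated
# deterministically from the stored seed.
W_frozen = np.random.default_rng(seed).standard_normal((d_in, d_out)) / np.sqrt(d_in)

# Trainable low-rank adapter; B starts at zero so the adapter initially
# contributes nothing and the layer begins as a pure random projection.
A = np.random.default_rng(1).standard_normal((d_in, rank)) * 0.01
B = np.zeros((rank, d_out))

def forward(x):
    # Frozen random projection plus the low-rank learned correction.
    return x @ W_frozen + (x @ A) @ B

full_params = d_in * d_out               # what a fully trained layer would store
adapter_params = rank * (d_in + d_out)   # what actually goes in the artifact
print(adapter_params / full_params)      # rank 8: 1024 / 4096 = 0.25
```

Doubling the rank doubles `adapter_params` but, per the observation above, barely moves quality when attention is frozen, which is why the bottleneck is not adapter capacity.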
Related work: I've been running extensive experiments on selective freeze (freezing the gate and up projections only, 37% of weights frozen) as an alternative to your full freeze + LoRA approach. Key finding: selective freeze (37% frozen) dramatically outperforms full freeze + LoRA (94% frozen). The LoRA approach leaves an ~80% quality gap, while selective freeze improves on the baseline by 2.1% on H100. I also developed "progressive freeze": train all weights fully for N steps, then freeze mid-training. This outperforms random-init freeze by 1.3 percentage points on FineWeb sp4096. Full results with 7 architecture variants across H100 and A40 are in PR #1301.
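The selective-freeze and progressive-freeze rules above reduce to a per-step trainability predicate over parameter names. A minimal framework-agnostic sketch, assuming Llama-style names (`mlp.gate_proj`, `mlp.up_proj`) and a hypothetical `FREEZE_STEP`; both are illustrative, not taken from the PR:

```python
# Illustrative sketch: which parameters receive gradient updates at a given
# training step under progressive selective freeze.
FROZEN_PATTERNS = ("mlp.gate_proj", "mlp.up_proj")  # the subset selected for freezing
FREEZE_STEP = 1000  # hypothetical N; the PR tunes this empirically

def is_trainable(step: int, param_name: str) -> bool:
    """All weights train during the warm-up; after FREEZE_STEP, the MLP
    gate/up projections freeze while attention keeps learning."""
    if step < FREEZE_STEP:
        return True  # progressive freeze: full training first
    return not any(p in param_name for p in FROZEN_PATTERNS)

# Random-init selective freeze is the FREEZE_STEP == 0 special case.
```

In a real training loop this predicate would drive each parameter's `requires_grad` flag (or the optimizer's parameter groups) at the freeze boundary, rather than being checked per step.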
No description provided.