
Conversation

@Goekdeniz-Guelmez
Contributor

No description provided.

@Goekdeniz-Guelmez
Contributor Author

Goekdeniz-Guelmez commented Sep 10, 2025

MLX port of PR

@dirkgr

dirkgr commented Sep 29, 2025

Any progress on this? Is there anything missing?

@awni
Member

awni commented Sep 29, 2025

AFAIK there's no model or config to test this with? It would be good to test the model before landing it.

@dirkgr

dirkgr commented Sep 30, 2025

I can provide you with a model. What do you need?

@awni
Member

awni commented Sep 30, 2025

Great! Access to a Hugging Face repo with the model safetensors, config, and tokenizer would be ideal.

awni closed this Sep 30, 2025
awni reopened this Sep 30, 2025
@Goekdeniz-Guelmez
Contributor Author

@dirkgr that would be amazing! @awni I can test it out and will ping you when it's ready to be revived and merged.

@2015aroras

You can use https://huggingface.co/shanearora/2025-sep-a-base-model-with-yarn to get yourself going. The tokenizer is https://huggingface.co/allenai/dolma2-tokenizer. The given model is not instruction-tuned and it's somewhat early in pretraining but you should expect it to produce long rambling continuations if the model is implemented correctly.
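
A minimal way to sanity-check the port with the mlx-lm Python API, assuming the safetensors, config, and dolma2 tokenizer files have been gathered into one local directory (the path below is just a placeholder, and exact keyword arguments may vary between mlx-lm versions):

from mlx_lm import load, generate

# Placeholder path: a local directory holding the model safetensors and config
# from the repo above, together with the dolma2 tokenizer files.
model, tokenizer = load("./olmo3-base-with-yarn")

# A base checkpoint this early in pretraining should give a long, rambling
# but on-topic continuation if the port is correct.
generate(model, tokenizer, prompt="Michael Jackson was a ", max_tokens=100, verbose=True)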

@2015aroras left a comment

This is looking good (though I'm not an mlx dev)! Just some minor corrections.

@Goekdeniz-Guelmez
Contributor Author

Perfect, thanks, I’ll update it tomorrow!

@Goekdeniz-Guelmez
Contributor Author

Goekdeniz-Guelmez commented Oct 1, 2025

@awni @2015aroras the implementation is finished and can be merged!

Training:

Loading Hugging Face dataset mlx-community/wikisql.
Training
Trainable parameters: 0.274% (19.988M/7298.011M)
Starting training..., iters: 100
Calculating loss...: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.13it/s]
Iter 1: Val loss 2.922, Val took 0.892s
Iter 10: Train loss 2.616, Learning Rate 1.000e-05, It/sec 1.278, Tokens/sec 90.705, Trained Tokens 710, Peak mem 15.844 GB
Iter 20: Train loss 1.610, Learning Rate 1.000e-05, It/sec 1.668, Tokens/sec 137.808, Trained Tokens 1536, Peak mem 16.418 GB
Iter 30: Train loss 1.420, Learning Rate 1.000e-05, It/sec 1.536, Tokens/sec 135.627, Trained Tokens 2419, Peak mem 16.426 GB
Iter 40: Train loss 1.679, Learning Rate 1.000e-05, It/sec 1.491, Tokens/sec 128.841, Trained Tokens 3283, Peak mem 16.426 GB
Iter 50: Train loss 1.293, Learning Rate 1.000e-05, It/sec 1.699, Tokens/sec 138.097, Trained Tokens 4096, Peak mem 16.426 GB
Iter 60: Train loss 1.276, Learning Rate 1.000e-05, It/sec 1.621, Tokens/sec 123.694, Trained Tokens 4859, Peak mem 16.426 GB
Iter 70: Train loss 1.442, Learning Rate 1.000e-05, It/sec 1.705, Tokens/sec 145.604, Trained Tokens 5713, Peak mem 16.426 GB
Iter 80: Train loss 1.129, Learning Rate 1.000e-05, It/sec 1.695, Tokens/sec 143.230, Trained Tokens 6558, Peak mem 16.426 GB
Iter 90: Train loss 1.181, Learning Rate 1.000e-05, It/sec 1.623, Tokens/sec 145.760, Trained Tokens 7456, Peak mem 16.426 GB
Calculating loss...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.10it/s]
Iter 100: Val loss 1.081, Val took 0.329s
Iter 100: Train loss 1.378, Learning Rate 1.000e-05, It/sec 1.693, Tokens/sec 126.978, Trained Tokens 8206, Peak mem 16.426 GB
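
A run like the one above corresponds roughly to the standard mlx-lm LoRA entry point; the local model path here is only a placeholder:

mlx_lm.lora --model ./olmo3-7b-mlx --train --data mlx-community/wikisql --iters 100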

Inference:

--prompt "Michael Jackson was a "
==========
8 year old boy when he first started singing. He was in a church choir and the director of the church was a man by the name of Joe Jackson. Joe Jackson was a very strict man and he would not let his children do anything that he did not approve of. He was also a very talented musician and he taught his children how to play instruments. Joe Jackson was very proud of his children and he wanted them to be successful. He was also very protective of them and he did not want
==========
Prompt: 5 tokens, 58.940 tokens-per-sec
Generation: 100 tokens, 54.572 tokens-per-sec
Peak memory: 4.249 GB
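
The numbers above correspond roughly to an invocation along these lines (model path again a placeholder):

mlx_lm.generate --model ./olmo3-7b-mlx --prompt "Michael Jackson was a " --max-tokens 100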

Member

@awni left a comment

Thanks for the addition! LGTM.

awni merged commit 9a4039a into ml-explore:main Oct 1, 2025
4 checks passed
Goekdeniz-Guelmez deleted the adding-olmo3 branch October 2, 2025 07:05