
Conversation

@Goekdeniz-Guelmez
Contributor

No description provided.

@Goekdeniz-Guelmez
Contributor Author

Goekdeniz-Guelmez commented Sep 10, 2025

MLX port of PR

@dirkgr

dirkgr commented Sep 29, 2025

Any progress on this? Is there anything missing?

@awni
Member

awni commented Sep 29, 2025

AFAIK there's no model or config to test this with? It would be good to test the model before landing it.

@dirkgr

dirkgr commented Sep 30, 2025

I can provide you with a model. What do you need?

@awni
Member

awni commented Sep 30, 2025

Great! Access to a Hugging Face repo with the model safetensors, config, and tokenizer would be ideal.

awni closed this Sep 30, 2025
awni reopened this Sep 30, 2025
@Goekdeniz-Guelmez
Contributor Author

@dirkgr that would be amazing! @awni I can test it out and will ping you when it's ready to be revived and merged.

@2015aroras

You can use https://huggingface.co/shanearora/2025-sep-a-base-model-with-yarn to get yourself going. The tokenizer is https://huggingface.co/allenai/dolma2-tokenizer. The given model is not instruction-tuned and it's somewhat early in pretraining but you should expect it to produce long rambling continuations if the model is implemented correctly.
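
A minimal way to sanity-check the port with the mlx-lm Python API, assuming the safetensors, config, and dolma2 tokenizer files have been gathered into one local directory (the path below is just a placeholder, and exact keyword arguments may vary between mlx-lm versions):

from mlx_lm import load, generate

# Placeholder path: a local directory holding the model safetensors and config
# from the repo above, together with the dolma2 tokenizer files.
model, tokenizer = load("./olmo3-base-with-yarn")

# A base checkpoint this early in pretraining should give a long, rambling
# but on-topic continuation if the port is correct.
generate(model, tokenizer, prompt="Michael Jackson was a ", max_tokens=100, verbose=True)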

@2015aroras left a comment

This is looking good (though I'm not an mlx dev)! Just some minor corrections.

@Goekdeniz-Guelmez
Contributor Author

Perfect, thanks, I’ll update it tomorrow!

@Goekdeniz-Guelmez
Contributor Author

Goekdeniz-Guelmez commented Oct 1, 2025

@awni @2015aroras the implementation is finished and can be merged!

Training:

Loading Hugging Face dataset mlx-community/wikisql.
Training
Trainable parameters: 0.274% (19.988M/7298.011M)
Starting training..., iters: 100
Calculating loss...: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.13it/s]
Iter 1: Val loss 2.922, Val took 0.892s
Iter 10: Train loss 2.616, Learning Rate 1.000e-05, It/sec 1.278, Tokens/sec 90.705, Trained Tokens 710, Peak mem 15.844 GB
Iter 20: Train loss 1.610, Learning Rate 1.000e-05, It/sec 1.668, Tokens/sec 137.808, Trained Tokens 1536, Peak mem 16.418 GB
Iter 30: Train loss 1.420, Learning Rate 1.000e-05, It/sec 1.536, Tokens/sec 135.627, Trained Tokens 2419, Peak mem 16.426 GB
Iter 40: Train loss 1.679, Learning Rate 1.000e-05, It/sec 1.491, Tokens/sec 128.841, Trained Tokens 3283, Peak mem 16.426 GB
Iter 50: Train loss 1.293, Learning Rate 1.000e-05, It/sec 1.699, Tokens/sec 138.097, Trained Tokens 4096, Peak mem 16.426 GB
Iter 60: Train loss 1.276, Learning Rate 1.000e-05, It/sec 1.621, Tokens/sec 123.694, Trained Tokens 4859, Peak mem 16.426 GB
Iter 70: Train loss 1.442, Learning Rate 1.000e-05, It/sec 1.705, Tokens/sec 145.604, Trained Tokens 5713, Peak mem 16.426 GB
Iter 80: Train loss 1.129, Learning Rate 1.000e-05, It/sec 1.695, Tokens/sec 143.230, Trained Tokens 6558, Peak mem 16.426 GB
Iter 90: Train loss 1.181, Learning Rate 1.000e-05, It/sec 1.623, Tokens/sec 145.760, Trained Tokens 7456, Peak mem 16.426 GB
Calculating loss...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.10it/s]
Iter 100: Val loss 1.081, Val took 0.329s
Iter 100: Train loss 1.378, Learning Rate 1.000e-05, It/sec 1.693, Tokens/sec 126.978, Trained Tokens 8206, Peak mem 16.426 GB
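
A run like the one above corresponds roughly to the standard mlx-lm LoRA entry point; the local model path here is only a placeholder:

mlx_lm.lora --model ./olmo3-7b-mlx --train --data mlx-community/wikisql --iters 100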

Inference:

--prompt "Michael Jackson was a "
==========
8 year old boy when he first started singing. He was in a church choir and the director of the church was a man by the name of Joe Jackson. Joe Jackson was a very strict man and he would not let his children do anything that he did not approve of. He was also a very talented musician and he taught his children how to play instruments. Joe Jackson was very proud of his children and he wanted them to be successful. He was also very protective of them and he did not want
==========
Prompt: 5 tokens, 58.940 tokens-per-sec
Generation: 100 tokens, 54.572 tokens-per-sec
Peak memory: 4.249 GB
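
The numbers above correspond roughly to an invocation along these lines (model path again a placeholder):

mlx_lm.generate --model ./olmo3-7b-mlx --prompt "Michael Jackson was a " --max-tokens 100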

Member

@awni left a comment

Thanks for the addition! LGTM.

awni merged commit 9a4039a into ml-explore:main Oct 1, 2025
4 checks passed
Goekdeniz-Guelmez deleted the adding-olmo3 branch October 2, 2025 07:05