Add support gpu for metal m1 (BUG AND FEATURE) #121
Comments
Can you try installing PyTorch from the website? There is an option for Mac there.
I don't have a Mac computer to experiment with, but maybe someone else can help with this.
So basically the install instructions would probably be
If you already have the wrong version of PyTorch installed, uninstall it before running the PyTorch install command for Mac.
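After reinstalling, a quick check along these lines might confirm whether the build you ended up with can see the Metal (MPS) backend at all. This is only a sketch: the helper name `describe_torch_install` is made up for illustration, and it assumes a PyTorch recent enough to ship `torch.backends.mps` (older builds simply report `False`).

```python
import importlib.util


def describe_torch_install():
    """Summarize whether the installed PyTorch build can use Apple's MPS backend."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment at all
        return {"installed": False, "version": None,
                "mps_built": False, "mps_available": False}

    import torch

    # torch.backends.mps is missing on older builds, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    return {
        "installed": True,
        "version": torch.__version__,
        "mps_built": bool(mps and mps.is_built()),          # compiled with MPS support
        "mps_available": bool(mps and mps.is_available()),  # usable on this machine
    }


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

If `mps_built` comes back `False`, the wrong wheel is probably still installed.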
I've actually been playing around with this quite a bit on a previous iteration of oobabooga. There are a few speed bumps with getting it running at the moment.
I have a 32GB M2 Pro and would love to get this working. So far it only runs in CPU mode and performs fine. 4chanGPT works very fast, LLaMA runs quite slowly, and 13B doesn't really load (I think I would need even more RAM). Should I try telling it to use the MPS backend and see which parts aren't implemented?
At this point, I think it's best to wait. There's work being done by PyTorch and without some of those underlying basic functions, it will be playing Whack-a-Mole trying to get this to work. But, if you want to give it a go, I'd love to know what you alter in the code to make progress.
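For anyone who does want to play Whack-a-Mole, a small probe like the one below might help list which operations the backend rejects. It is a rough sketch, not something from the repo: `probe_ops` and `torch_ops` are hypothetical names, the three sample operations are arbitrary picks, and it assumes unsupported operators surface as `NotImplementedError` or another `RuntimeError`, which is how PyTorch usually reports them as far as I know.

```python
def probe_ops(ops, device):
    """Call each op with the device name and record which ones the backend rejects."""
    results = {}
    for name, op in ops.items():
        try:
            op(device)
            results[name] = "ok"
        except RuntimeError as exc:
            # NotImplementedError is a subclass of RuntimeError, so this also
            # catches "operator is not currently implemented" style errors
            results[name] = f"failed: {type(exc).__name__}: {exc}"
    return results


def torch_ops():
    """A few sample operations to probe (arbitrary choices, imported lazily)."""
    import torch

    return {
        "matmul": lambda d: torch.ones(4, 4, device=d) @ torch.ones(4, 4, device=d),
        "topk_20": lambda d: torch.topk(torch.rand(100, device=d), k=20),
        "multinomial": lambda d: torch.multinomial(torch.rand(100, device=d), 1),
    }
```

With PyTorch installed, `probe_ops(torch_ops(), "mps")` would return a dict mapping each name to `"ok"` or to the error text, which is handy to paste into a bug report.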
Well, I got it hooked up using the MPS backend. There are more than a few caveats and I'm not sure why some things are happening. It is a memory pig on even the 1.3B Pyg. I don't know if that's because of how it's allocating memory between CPU and GPU, but it definitely is using the GPU to generate responses. If anyone is interested, I can post the code and my settings to share. I'm working on a 32 GB M2 Mac mini. If you're on anything less than 32GB of RAM, this will likely just not work at all. I'd love to try it on a high end Studio with 128 GB of RAM to see how it fares.
Yeah, I'll consider an even beefier M3 with 128GB of RAM, but for now the whole machine learning community seems kneecapped by the PyTorch implementation being half done (especially compared to CUDA). If you drop the code you have so far here or on your GitHub page, that would be massively helpful in getting started!
I've attached the files I made alterations to, based off yesterday's git clone of the oob repo. You should be able to just replace those. You'll need the PyTorch nightly build rather than stable. I'd also suggest the 2.7B Pyg model to start with and see how you do. It may generate gibberish or it might generate an amazing few blocks of conversation depending on what magic happens in the GPU/RAM. Make sure your top_k is set to 15 or less. For some reason, MPS can't handle anything over 15. Once you have the updated PyTorch and have copied the files to the right spots, run server.py with the --mps flag and it should start using both CPU and GPU. It causes a seg fault on an Intel Mac, but runs on M2.
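To give an idea of the shape of the change, here is a minimal sketch of an `--mps` switch plus a guard that keeps top_k within the limit mentioned above. It is not the code from the attached files: `build_parser`, `effective_top_k`, the `--top-k` option, and the `MPS_TOP_K_LIMIT` constant are all made up for illustration, and the value 15 is just the number reported in this thread.

```python
import argparse

# Limit reported in this thread, not an official PyTorch constant
MPS_TOP_K_LIMIT = 15


def build_parser():
    """Hypothetical launcher options; the real server.py has many more."""
    parser = argparse.ArgumentParser(description="Launcher sketch with an MPS switch")
    parser.add_argument("--mps", action="store_true",
                        help="run the model on Apple's Metal (MPS) backend")
    parser.add_argument("--top-k", dest="top_k", type=int, default=50,
                        help="number of candidate tokens kept when sampling")
    return parser


def effective_top_k(top_k, use_mps):
    """Clamp top_k when running on MPS; leave it untouched elsewhere."""
    if use_mps and top_k > MPS_TOP_K_LIMIT:
        return MPS_TOP_K_LIMIT
    return top_k


if __name__ == "__main__":
    args = build_parser().parse_args()
    device = "mps" if args.mps else "cpu"
    print(f"device={device} top_k={effective_top_k(args.top_k, args.mps)}")
```

Clamping silently is a design choice; printing a warning when the value is lowered would probably be friendlier.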
https://github.com/ggerganov/llama.cpp Looks like someone figured out LLaMA 7B on Apple silicon, in case anyone here is interested!
Looks like they managed to hook things up properly. You'll need 13.3 beta to make it work, but getting great results on my machine now. #393
@GundamWing Thank you for sharing your work! The default repository, even today, does not seem to support Also, what checkout hash did you use for the changes? Not surprisingly, dropping them in as-is to the current repo does not work. A .patch file might be a good approach, especially if you can pull / rebase and then upload the output of I have all of this stuff running on a 3090, but am quite keen to try out the MPS version on my M1 Max 64GB to load larger models as the memory usage stuff is figured out (and/or contribute to bug reports for the platform).
Afaik the zip contains parts of the MPS code working for Pygmalion. It has not been tested for LLaMA and other models. I haven't checked since my initial reply.
Hello everyone, I like your work a lot. It is amazing to be able to use our own computers to train and run our own AI instead of ChatGPT. I have a request: can you please help me get this running on my M1 Mac using Metal? It crashes every time I use the GPU to run the model. It tells me that torch doesn't work with CUDA, which is true: torch.cuda.is_available() == False.
I really want to make it work, please help me.
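For context, CUDA is an NVIDIA technology, so `torch.cuda.is_available()` returning `False` on an M1 is expected; the Metal path in PyTorch is the separate MPS backend. A rough sketch of picking a device with a fallback could look like this. The names `choose_device` and `detect_device` are hypothetical, and the project's own loading code would still need to pass the chosen device through.

```python
def choose_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask the installed PyTorch which backends are usable (imported lazily)."""
    import torch

    # torch.backends.mps is missing on older builds, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    return choose_device(torch.cuda.is_available(),
                         bool(mps and mps.is_available()))
```

The result can then be used as `model.to(detect_device())` instead of a hard-coded `.cuda()` call, so the same script runs on NVIDIA cards, Apple silicon, or CPU only.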