
Unable to Reproduce ALMA-7b-LoRA Performance, Seeking Assistance #58

Open
liangyingshao opened this issue Aug 28, 2024 · 6 comments

@liangyingshao

Thank you for your excellent work.

While fine-tuning the ALMA-7b-Pretrain model and evaluating with the checkpoint you provided, I was unable to reproduce the ALMA-7b-LoRA performance described in the paper. I would appreciate any guidance or suggestions you could offer.
I used the code, data, and scripts provided in this repository (including runs/parallel_ft_lora.sh and evals/alma_7b_lora.sh), with a training batch size of 256 on four V100 GPUs.
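For clarity, the 256 above is the effective batch size, i.e. per-GPU micro-batch × number of GPUs × gradient accumulation steps; the per-GPU and accumulation values in the sketch below are hypothetical and only illustrate the arithmetic.

```python
# Effective batch size = per-GPU micro-batch * number of GPUs * gradient accumulation steps.
# The per-GPU and accumulation values are hypothetical; only num_gpus and the
# target of 256 match my actual setup.
num_gpus = 4
per_device_train_batch_size = 16   # hypothetical micro-batch per V100
gradient_accumulation_steps = 4    # hypothetical accumulation steps

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
assert effective_batch_size == 256, effective_batch_size
```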

Please feel free to ask if you need more details about my experiments.

@fe1ixxu
Owner

fe1ixxu commented Aug 28, 2024

Thanks for your interest! Could you please provide your transformers, accelerate, and deepspeed versions? And could you also share the results you got?
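For convenience, here is a quick, generic way to print those versions (this snippet is not part of the ALMA repo):

```python
# Print the installed versions of the packages in question.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "accelerate", "deepspeed", "peft"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```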

@liangyingshao
Author

> Thanks for your interest! Could you please provide your transformers, accelerate, and deepspeed versions? And could you also share the results you got?
I use transformers==4.33.0, accelerate==0.33.0, and deepspeed==0.14.4.
Here are my results:
[screenshots of evaluation results attached]

@fe1ixxu
Owner

fe1ixxu commented Aug 28, 2024

It looks like the results you got are very close to those of the checkpoint we released when evaluated in the same virtual env. I suspect the main issue comes from a version mismatch.

Please try uninstalling transformers, deepspeed, and accelerate, then reinstalling them as follows:

  1. pip install git+https://github.com/fe1ixxu/ALMA.git@alma-r-hf
  2. pip3 install deepspeed==0.13.1
  3. pip install accelerate==0.27.2
  4. pip install peft==0.5.0
  5. Re-evaluate the checkpoint
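
After reinstalling, a quick generic sanity check (not part of the repo) that the pins above took effect; transformers comes from the fork, so its version string is only printed, not asserted:

```python
# Confirm the reinstalled packages match the pinned versions listed above.
from importlib.metadata import version

assert version("deepspeed") == "0.13.1", version("deepspeed")
assert version("accelerate") == "0.27.2", version("accelerate")
assert version("peft") == "0.5.0", version("peft")
print("transformers (installed from the fe1ixxu/ALMA fork):", version("transformers"))
```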

Hope this is helpful

@liangyingshao
Author

> It looks like the results you got are very close to those of the checkpoint we released when evaluated in the same virtual env. I suspect the main issue comes from a version mismatch.
>
> Please try uninstalling transformers, deepspeed, and accelerate, then reinstalling them as follows:
>
>   1. pip install git+https://github.com/fe1ixxu/ALMA.git@alma-r-hf
>   2. pip3 install deepspeed==0.13.1
>   3. pip install accelerate==0.27.2
>   4. pip install peft==0.5.0
>   5. Re-evaluate the checkpoint
>
> Hope this is helpful

Thank you for your suggestion! I will try it out and provide feedback in this issue.

@liangyingshao
Author

> It looks like the results you got are very close to those of the checkpoint we released when evaluated in the same virtual env. I suspect the main issue comes from a version mismatch.
>
> Please try uninstalling transformers, deepspeed, and accelerate, then reinstalling them as follows:
>
>   1. pip install git+https://github.com/fe1ixxu/ALMA.git@alma-r-hf
>   2. pip3 install deepspeed==0.13.1
>   3. pip install accelerate==0.27.2
>   4. pip install peft==0.5.0
>   5. Re-evaluate the checkpoint
>
> Hope this is helpful

I tried your suggestion, and it does lead to some performance improvement.
[screenshots of updated evaluation results attached]
However, the reproduced performance still falls short of what the paper reports. Could you advise on any other potential factors that might affect the performance?
Any further suggestions for improvement would be greatly appreciated.

@liangyingshao
Author

By the way, could you please share the versions of datasets, tokenizers, and huggingface-hub that you are using?
