Transformers PaliGemma evaluate and compute_loss fail with tensors/device errors #35990
Comments
No, this did not fix the problem. I've upgraded transformers, also had to upgrade accelerate, and then got the following error:

Python version: 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 16:05:46) [GCC 13.3.0]
Now it's a new error 😆 and related to …
Just to clarify, the … My trainer and config are the following; can you let me know if it works for you?
Can you also share the …?
Sure:
(Should be harmless though, hopefully)
@BlGene thanks, I can confirm that the bug is reproducible, and it fails even with models that use the default … Not sure if the accelerate team has a workaround for that already; let's wait for SunMarc.
System Info
My versions are:
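The version list itself did not survive the page capture. As a hedged, stdlib-only sketch (the package names below are assumptions, not taken from the report), the environment info requested in bug reports like this one can be regenerated with a small helper:

```python
# Sketch of a helper to collect the version info for a bug report.
# Uses only the standard library; the package names are assumptions.
import platform
from importlib import metadata

def report_versions(packages=("transformers", "accelerate", "torch")):
    """Return one line per component: Python itself, then each package."""
    lines = [f"Python: {platform.python_version()}"]
    for pkg in packages:
        try:
            lines.append(f"{pkg}: {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg}: not installed")
    return lines

print("\n".join(report_versions()))
```

Alternatively, `transformers-cli env` prints the same kind of summary in the format the issue template asks for.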
Who can help?
@ArthurZucker, @amyeroberts, @qubvel
Information

Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
I'm loading a PaliGemma2 model (`google/paligemma2-3b-pt-224`) and trying to fine-tune it using Trainer/Seq2SeqTrainer. If I add evaluation, this fails. After some digging, I found that this only happens if the model is in eval mode. I've worked around it by monkey-patching `compute_loss_context_manager` as follows:
(Bonus question: Is this safe to do, or will I train on the test set?)
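The monkey-patch itself was not captured on this page. As a hedged reconstruction of what such a workaround could look like (all names below are assumptions, not the reporter's actual code), one way is to bind a no-op replacement for `compute_loss_context_manager` onto the trainer instance, so that loss computation during evaluation skips the context manager that triggers the failure:

```python
# Hypothetical sketch of the monkey-patch described above; the original
# snippet was elided from this page. The idea: replace the trainer's
# compute_loss_context_manager with one that returns a plain null context.
import contextlib

def noop_compute_loss_context_manager(self):
    # Return a do-nothing context instead of the Trainer's default
    # (autocast-based) loss context manager.
    return contextlib.nullcontext()

# Applied to an existing Trainer instance (name `trainer` assumed):
#   trainer.compute_loss_context_manager = \
#       noop_compute_loss_context_manager.__get__(trainer)
```

Note this only changes which context wraps the loss computation; it does not alter which dataset is used, so it should not by itself cause training on the evaluation set.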
Error:
Error from the evaluator (bottom half of the file): https://gist.github.com/BlGene/607c7bee450e03835aa2bf0d2fd2959a
Expected behavior
Training runs with evaluation enabled.