mlscientist2 changed the title from "Instruction tuning of LLama2 is significantly slower compared to claimed 3 hours fine-tuning time on A10G." to "Instruction tuning of LLama2 is significantly slower compared to documented 3 hours fine-tuning time on A10G." on Sep 21, 2023
Hi,
First of all, thanks for putting together the nicely formatted code for fine-tuning LLaMA 2 in 4-bit.
I was able to follow all the steps and set up training of the model as shown in your tutorial / IPython notebook: https://www.philschmid.de/instruction-tune-llama-2
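For reference, this is roughly the 4-bit setup I am running. It is a minimal sketch based on my reading of the notebook; the model ID and LoRA hyperparameters are what I copied over and may not match your latest version exactly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # model used in the tutorial

# 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# prepare the quantized model for LoRA fine-tuning
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    r=64,            # values copied from the notebook, may differ from yours
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
```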
Your tutorial mentions that the training time on a g5.2xlarge without flash attention is around 3 hours. However, running your code shows a training time of about 40 hours! Can you help narrow down the difference/issue?
I am attaching some screenshots. At a high level, I suspect a bottleneck in the data loader (the code is only using one CPU core). I tried adding the num_workers flag in TrainingArguments (see the snippet below), but that did not help. GPU utilization seems decent.
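This is roughly how I tried to raise the number of data-loading workers. It is only a sketch: the argument TrainingArguments exposes for this is `dataloader_num_workers`, and the other hyperparameters here are placeholders rather than your exact tutorial values:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-int4-dolly",   # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=4,       # placeholder, not necessarily the tutorial value
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    bf16=True,
    learning_rate=2e-4,
    logging_steps=10,
    # more workers for the training dataloader; this did not change the ETA in my runs
    dataloader_num_workers=4,
)
```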
Any thoughts, @philschmid?