Two 4090s: loading the dataset is 5-6x slower than on a single 4090 #9693
Sleep-Enough asked this question in Q&A · Unanswered
I'm trying to run LoRA fine-tuning of qwen2-vl-2b on two 4090s.
With a single 4090 and otherwise identical config, loading the dataset finishes in about half an hour:
Running tokenizer on dataset (num_proc=12): 100%|██████████| 96470/96470 [37:11<00:00, 43.23 examples/s]
With two 4090s, loading still hadn't finished after 2 hours.
Here I had first set preprocessing_num_workers: 30:
Running tokenizer on dataset (num_proc=30): 81%|████████ | 78240/96470 [2:34:40<24:37, 12.34 examples/s]
I then set preprocessing_num_workers back to 12:
Running tokenizer on dataset (num_proc=12): 112470 examples [29:49, 15.55 examples/s]
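For context, my understanding is that this progress bar comes from Hugging Face `datasets.Dataset.map`, with `preprocessing_num_workers` passed through as `num_proc` to fork that many CPU worker processes. A minimal sketch of the equivalent call (the function and column names here are illustrative, not LLaMA-Factory's actual internals):

```python
# Illustrative only: names ("text", preprocess) are made up; the real qwen2-vl
# preprocessing also applies the chat template and image placeholder tokens.
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
ds = Dataset.from_dict({"text": ["hello world"] * 1000})

def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# num_proc forks 12 CPU worker processes; tokenization itself is pure CPU
# work, so by itself it should not touch the GPU.
tokenized = ds.map(preprocess, batched=True, num_proc=12,
                   desc="Running tokenizer on dataset")
```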
Either way, preprocessing runs extremely slowly, and both CPU and GPU utilization are very high.
GPU utilization (screenshot):
I don't understand why dataset loading uses the GPU at all, or why it only uses one of the two cards.
CPU utilization (screenshot):
Run command:
CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train /root/autodl-tmp/zhou_vlm/yaml_dir/four_class_train/train.yaml
Attached: train.yaml and the deepspeed config.
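My current guess: with FORCE_TORCHRUN=1 and two visible GPUs, llamafactory-cli launches one process per GPU, and if each rank runs the tokenization map itself, preprocessing_num_workers: 30 turns into 2 × 30 = 60 CPU workers fighting over the same cores, with the work also done twice. The usual guard is a "main process first" pattern: rank 0 preprocesses and fills the datasets cache while the other rank waits, then reuses the cache. A generic sketch of that pattern follows; this is not a claim about LLaMA-Factory's exact code (transformers exposes the same idea as the TrainingArguments.main_process_first context manager):

```python
# Generic "main process first" guard under torch.distributed: rank 0 runs the
# map and fills the datasets fingerprint cache; other ranks wait at a barrier,
# then their own map call becomes a near-instant cache hit.
import torch.distributed as dist

def map_with_main_process_first(dataset, fn, num_proc):
    is_main = (not dist.is_initialized()) or dist.get_rank() == 0
    if not is_main:
        dist.barrier()   # non-main ranks wait here until rank 0 finishes
    result = dataset.map(fn, batched=True, num_proc=num_proc)
    if is_main and dist.is_initialized():
        dist.barrier()   # release the waiting ranks; they hit the cache
    return result
```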
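A workaround I'm considering: tokenize once in a single-process run, persist the result, and point later multi-GPU runs at the saved copy (I believe recent LLaMA-Factory versions expose a tokenized_path option for exactly this, but I haven't verified it). A plain datasets sketch of the idea, with a hypothetical cache path:

```python
# Workaround sketch: preprocess once, save to disk, and reload in later runs
# so the multi-GPU job never re-tokenizes. CACHE_DIR is a hypothetical path.
from datasets import Dataset, load_from_disk
from transformers import AutoTokenizer

CACHE_DIR = "/root/autodl-tmp/tokenized_cache"

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
ds = Dataset.from_dict({"text": ["example"] * 100})  # stand-in for real data

def preprocess(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = ds.map(preprocess, batched=True, num_proc=12)
tokenized.save_to_disk(CACHE_DIR)     # run once, single process

reloaded = load_from_disk(CACHE_DIR)  # later runs load instantly, no re-map
```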