1. I tried setting `batch_size` to 32 and `mini_batch_size` to 1, but after 50 training steps an OOM still occurs when saving the weights.
2. Changing

```python
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float32,
    device_map="auto",
)
```

to

```python
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
```

raises the following error:

```
File "/opt/anaconda3/envs/llm_envs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/llm_envs/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 711, in forward
    KG_infused = tmp.index_copy(0, index_fixed, attn_output)
RuntimeError: index_copy(): self and source expected to have the same dtype, but got (self) Float and (source) Half
```
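For context, `torch.Tensor.index_copy` requires `self` and `source` to share a dtype, so loading in `float16` makes `attn_output` Half while `tmp` stays Float. A minimal sketch of the mismatch and one possible workaround (casting the source before the copy; the tensor names mirror the traceback but the shapes here are made up):

```python
import torch

# Hypothetical shapes; in the real model these come from the forward pass.
tmp = torch.zeros(4, 8, dtype=torch.float32)          # self:   Float
attn_output = torch.randn(2, 8, dtype=torch.float16)  # source: Half
index_fixed = torch.tensor([0, 2])

# tmp.index_copy(0, index_fixed, attn_output) would raise:
# RuntimeError: index_copy(): self and source expected to have the same dtype

# Casting the source to self's dtype avoids the mismatch:
KG_infused = tmp.index_copy(0, index_fixed, attn_output.to(tmp.dtype))
print(KG_infused.dtype)  # torch.float32
```

This is only a sketch of the dtype rule, not a claim about where the cast belongs in `modeling_llama.py`; alternatively, `tmp` could be allocated in `float16` to match.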
3. How can I run multi-GPU training?