
How can I train with only 48 GB of GPU memory? #5

Open
yfangZhang opened this issue Jul 31, 2024 · 0 comments

Comments

@yfangZhang

1. I tried setting batch_size to 32 and mini_batch_size to 1, but after 50 training steps an out-of-memory (OOM) error still occurs while saving the weights. (A possible workaround is sketched after this list.)
2. Changing

   ```python
   model = LlamaForCausalLM.from_pretrained(
       base_model,
       load_in_8bit=True,
       torch_dtype=torch.float32,
       device_map="auto",
   )
   ```

   to

   ```python
   model = LlamaForCausalLM.from_pretrained(
       base_model,
       load_in_8bit=True,
       torch_dtype=torch.float16,
       device_map="auto",
   )
   ```

   raises the following error (a one-line fix is sketched after this list):
File "/opt/anaconda3/envs/llm_envs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/llm_envs/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 711, in forward
KG_infused = tmp.index_copy(0, index_fixed, attn_output)
RuntimeError: index_copy
(): self and source expected to have the same dtype, but got (self) Float and (source) Half
3. How can I run training on multiple GPUs? (A loading sketch follows below.)
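
On point 1, one common workaround (an assumption on my part, not confirmed by this repo) is that saving can allocate extra GPU memory while gathering tensors; copying the state dict to CPU first avoids that. A minimal sketch, where `model` is the trained model from the script and `checkpoint.pt` is a placeholder path:

```python
import torch

# Hypothetical workaround: materialize every tensor on CPU before
# serializing, so saving allocates host RAM instead of GPU memory.
state_dict = {k: v.detach().cpu() for k, v in model.state_dict().items()}
torch.save(state_dict, "checkpoint.pt")  # placeholder output path
```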
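
On point 2, the traceback says the destination of `index_copy` is still fp32 while `attn_output` becomes fp16 once the model is loaded with `torch_dtype=torch.float16`. A minimal, self-contained sketch of the mismatch and a one-line fix (the shapes below are made up; only the dtypes matter):

```python
import torch

tmp = torch.zeros(4, 8, dtype=torch.float32)  # destination buffer (Float)
attn_output = torch.randn(2, 8).half()        # attention output (Half)
index_fixed = torch.tensor([0, 2])

# This line reproduces the RuntimeError: self and source dtypes differ.
# KG_infused = tmp.index_copy(0, index_fixed, attn_output)

# Fix: cast the source to the destination's dtype before the copy.
KG_infused = tmp.index_copy(0, index_fixed, attn_output.to(tmp.dtype))
```

Applying the same `.to(tmp.dtype)` cast at line 711 of the repo's modified modeling_llama.py should clear the error; allocating `tmp` in fp16 from the start would work as well.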
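
On point 3, `device_map="auto"` already shards the layers across every visible GPU (naive model parallelism), so loading on a multi-GPU machine spreads the memory budget automatically; `max_memory` caps how much each card takes. A sketch, assuming two 24 GB cards (the limits and model path are placeholders):

```python
import torch
from transformers import LlamaForCausalLM

base_model = "path/to/llama-checkpoint"  # placeholder checkpoint path

# device_map="auto" splits layers across all visible GPUs;
# max_memory bounds the shard placed on each card.
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB"},  # placeholder per-GPU limits
)
```

For data-parallel training instead, the usual route is launching the script with torchrun, one process per GPU, but that requires the training loop to support DDP.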
