Is there an existing issue for this?
Describe the bug
The command executed is as follows.
formatted_time=$(date +"%Y%m%d%H%M%S")
echo $formatted_time

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

mlx worker launch python finetune.py \
    --model_name_or_path MiniCPM-2B-sft-bf16 \
    --output_dir output/OCNLILoRA/$formatted_time/ \
    --train_data_path data/ocnli_public_chatml/train.json \
    --eval_data_path data/ocnli_public_chatml/dev.json \
    --learning_rate 5e-5 --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 128 --model_max_length 128 --bf16 --use_lora false \
    --gradient_accumulation_steps 1 --warmup_steps 100 \
    --max_steps 1000 --weight_decay 0.01 \
    --evaluation_strategy steps --eval_steps 500 \
    --save_strategy steps --save_steps 500 --seed 42 \
    --log_level info --logging_strategy steps --logging_steps 10 \
    --deepspeed configs/ds_config_zero3_offload.json
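An aside on the --deepspeed configs/ds_config_zero3_offload.json flag: with ZeRO stage 3, transformers needs to see the DeepSpeed config before the model is constructed, so that from_pretrained loads the weights through deepspeed.zero.Init and every parameter is created in the partitioned layout. The following is a minimal sketch of that ordering, assuming a standard Trainer-style script; the names and arguments below are illustrative, not finetune.py's actual code.

# Illustrative sketch of the ZeRO-3 ordering requirement, not finetune.py itself.
from transformers import AutoModelForCausalLM, TrainingArguments

# 1. Build TrainingArguments with the deepspeed config FIRST. This registers the
#    ZeRO-3 config inside transformers, so is_deepspeed_zero3_enabled() is True
#    while the model is being loaded.
training_args = TrainingArguments(
    output_dir="output/OCNLILoRA/",
    deepspeed="configs/ds_config_zero3_offload.json",
    bf16=True,
)

# 2. Only now load the model. Under ZeRO-3, from_pretrained wraps construction
#    in deepspeed.zero.Init, so each rank holds only a shard of every weight.
model = AutoModelForCausalLM.from_pretrained(
    "MiniCPM-2B-sft-bf16", trust_remote_code=True
)

Running the command produces the following output: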
Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
***** Running training *****
Num examples = 50,486
Num Epochs = 1
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 1,000
Number of trainable parameters = 2,949,120
0%| | 0/1000 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "finetune.py", line 221, in
[rank0]: trainer.train()
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/transformers/trainer.py", line 1938, in train
[rank0]: return inner_training_loop(
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/transformers/trainer.py", line 3318, in training_step
[rank0]: loss = self.compute_loss(model, inputs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/transformers/trainer.py", line 3363, in compute_loss
[rank0]: outputs = model(**inputs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/accelerate/utils/operations.py", line 808, in call
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/peft/peft_model.py", line 1577, in forward
[rank0]: return self.base_model(
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
[rank0]: return self.model.forward(*args, **kwargs)
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-2B-sft-bf16/modeling_minicpm.py", line 1196, in forward
[rank0]: outputs = self.model(
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-2B-sft-bf16/modeling_minicpm.py", line 1040, in forward
[rank0]: inputs_embeds = self.embed_tokens(input_ids) * self.config.scale_emb
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 164, in forward
[rank0]: return F.embedding(
[rank0]: File "/root/anaconda3/envs/minicpm/lib/python3.8/site-packages/torch/nn/functional.py", line 2267, in embedding
[rank0]: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank0]: RuntimeError: 'weight' must be 2-D
0%| | 0/1000 [00:01<?, ?it/s]
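The final error, RuntimeError: 'weight' must be 2-D, is ZeRO stage 3's signature failure when code reaches a partitioned parameter directly: stage 3 replaces each weight with a flattened placeholder shard, so F.embedding never sees the full (vocab_size, hidden_size) matrix. Outside the DeepSpeed engine, a partitioned parameter has to be gathered explicitly before use. Below is a small diagnostic sketch using DeepSpeed's documented GatheredParameters context; model here stands for the loaded MiniCPM model.

# Diagnostic sketch: inspect the input-embedding weight under ZeRO-3.
import deepspeed

emb = model.get_input_embeddings()
print(emb.weight.shape)  # partitioned: typically torch.Size([0]), not 2-D

# Gather the full parameter (read-only) on all ranks for inspection.
with deepspeed.zero.GatheredParameters(emb.weight, modifier_rank=None):
    print(emb.weight.shape)  # full (vocab_size, hidden_size) inside this block

Seeing a flattened placeholder outside the engine is normal under ZeRO-3; the forward pass failing this way usually means DeepSpeed's gather hooks never ran for that parameter, for example because the model was constructed before the DeepSpeed config was registered (see the ordering sketch above).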
To Reproduce
Run the command above; training fails immediately at step 0 with RuntimeError: 'weight' must be 2-D.
Expected behavior
No response
Screenshots
No response
Environment
Additional context
No response