Error when training internlm2-base-7b with sft+freeze: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #4101

Open
1 task done
1737686924 opened this issue Jun 6, 2024 · 0 comments
Labels
npu This problem is related to NPU devices pending This problem is yet to be addressed

Comments

1737686924 commented Jun 6, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

ASCEND_RT_VISIBLE_DEVICES=0,1 deepspeed --num_gpus 2 src/train.py \
    --deepspeed examples/deepspeed/ds_z3_offload_config.json \
    --stage sft \
    --do_train true \
    --model_name_or_path /data/applications/lmd-formal/backend/BaseModels/internlm2-base-7b \
    --dataset identity,alpaca_en_demo \
    --template intern2 \
    --finetuning_type freeze \
    --freeze_trainable_layers 8 \
    --freeze_trainable_modules all \
    --use_llama_pro true \
    --output_dir saves/internlm2-base-7b/sft/freeze \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-4 \
    --num_train_epochs 3.0 \
    --val_size 0.001 \
    --ddp_timeout 180000000 \
    --plot_loss \
    --fp16

Training internlm2-base-7b with sft+freeze fails with: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

transformers=4.41.2
torch=2.2.0
torch-npu=2.2.0

Reproduction

Running the command above (sft+freeze training of internlm2-base-7b) fails with: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
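
For anyone triaging this: PyTorch raises this exact message when `backward()` is called on a loss that has no trainable parameter in its autograd graph. The snippet below is only a minimal sketch of that condition using a toy `nn.Linear` as a stand-in (it is not internlm2 and does not go through LLaMA-Factory or DeepSpeed). The helper `count_trainable` is a hypothetical name introduced here for illustration. Whether the freeze settings in the command above actually leave every parameter frozen on this setup is an assumption that would need to be checked on the real model.

```python
import torch
from torch import nn


def count_trainable(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a model.

    Hypothetical helper for illustration; not part of any library.
    """
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Toy stand-in for a model whose parameters all ended up frozen.
model = nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)

# The loss is computed only from frozen parameters and a plain input,
# so it has no grad_fn and does not require grad.
loss = model(torch.randn(2, 4)).sum()

try:
    loss.backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(count_trainable(model))
print(error_message)
```

If a similar check on the real model reports zero trainable parameters right before training starts, the freeze configuration (layer and module selection) is the first thing to look at. Note that under ZeRO-3 parameters are partitioned, so `numel()` may not reflect the full size; the `requires_grad` flags are the part that matters here.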

Expected behavior

No response

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed label Jun 6, 2024
@hiyouga hiyouga added the npu This problem is related to NPU devices label Jun 19, 2024