Reminder
- I have read the README and searched the existing issues.
System Info
Single node with 8x H20 GPUs
Reproduction
[rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 721, in redispatch
[rank5]: return self._handle.redispatch_boxed(keyset, *args, **kwargs)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py", line 324, in backend_impl
[rank5]: result = self._backend_fns[device_type](*args, **kwargs)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 32, in inner
[rank5]: return disable_fn(*args, **kwargs)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank5]: return fn(*args, **kwargs)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py", line 367, in wrapped_fn
[rank5]: return fn(*args, **kwargs)
[rank5]: File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 96, in _flash_attn_forward
[rank5]: out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.fwd(
[rank5]: RuntimeError: v must have shape (batch_size, seqlen_k, num_heads_k, head_size)
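For reference, a minimal sketch (not taken from the report; illustrative sizes, requires a CUDA GPU with flash-attn 2.x installed) of the shape contract that flash_attn_func enforces and that the error above refers to:

```python
import torch
from flash_attn import flash_attn_func

# Illustrative sizes only, not the model's real configuration.
batch_size, seqlen_q, seqlen_k = 1, 128, 128
num_heads, num_heads_k, head_size = 16, 16, 64

q = torch.randn(batch_size, seqlen_q, num_heads, head_size, dtype=torch.bfloat16, device="cuda")
k = torch.randn(batch_size, seqlen_k, num_heads_k, head_size, dtype=torch.bfloat16, device="cuda")
# v must be (batch_size, seqlen_k, num_heads_k, head_size); any other layout
# triggers the RuntimeError shown in the traceback above.
v = torch.randn(batch_size, seqlen_k, num_heads_k, head_size, dtype=torch.bfloat16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch_size, seqlen_q, num_heads, head_size)
```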
deepspeed src/train.py \
    --stage sft \
    --do_train \
    --model_name_or_path /mnt/tidalfs-idc01/dataset/redaccel/models/Moonlight-16B-A3B-Instruct \
    --dataset agent_v1.9_am_thinking_mathcode_only \
    --template deepseek3 \
    --finetuning_type lora \
    --lora_target all \
    --output_dir saves/360/moonlight_16bA3b/lora/sft_sp \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --max_samples 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --save_steps 100 \
    --learning_rate 8.0e-6 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --deepspeed examples/deepspeed/ds_z3_config.json \
    --bf16 True \
    --ddp_timeout 180000000 \
    --sequence_parallel_size 8 \
    --flash_attn fa2 \
    --enable_liger_kernel false \
    --preprocessing_num_workers 128 \
    --trust_remote_code false
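A hypothetical sketch of the per-rank sequence split implied by --cutoff_len 1024 and --sequence_parallel_size 8; the even chunking along the sequence dimension is only an assumption for illustration, not LLaMA-Factory's actual implementation:

```python
import torch

# Assumed split scheme for illustration: chunk the sequence dimension evenly across
# the 8 sequence-parallel ranks configured above.
cutoff_len, sp_size = 1024, 8
batch_size, num_heads_k, head_size = 1, 16, 64  # illustrative sizes

v_full = torch.randn(batch_size, cutoff_len, num_heads_k, head_size)
v_shards = torch.chunk(v_full, sp_size, dim=1)  # one shard per rank
print(v_shards[0].shape)  # (1, 128, 16, 64): per-rank seqlen_k becomes 128

# flash_attn_gpu.fwd expects v as (batch_size, seqlen_k, num_heads_k, head_size);
# if a rank's v shard ends up transposed or sized differently than q and k, the
# RuntimeError shown in the traceback above is raised.
```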
Expected behavior
No response
Others
No response