Description
Reminder
- I have read the README and searched the existing issues.
System Info
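I am using zigzag-ring with Qwen3 and sp=8 on a dataset of 2,700 samples. After enabling sequence parallelism, the dataset size became 2700 * 8, and training also seems to be more than 8 times slower: a run that used to finish in 2 hours now takes over 20 hours. Is there any way to speed this up? From the README, it looks like Qwen3 can't use ulysses either?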
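The following is a rough sketch of one way to reason about the numbers above, not a description of this repository's internals. It assumes a world size of 8 GPUs (not stated in this issue) and that each group of `sp` ranks cooperates on a single sample, so the data-parallel degree drops to `world_size / sp`. The helper `steps_per_epoch` is hypothetical and written only for this illustration.

```python
import math


def steps_per_epoch(num_samples, world_size, sp_size,
                    per_device_batch=1, grad_accum=1):
    """Estimate optimizer steps per epoch (hypothetical helper).

    Assumes each group of `sp_size` ranks processes one sample together,
    leaving `world_size // sp_size` data-parallel replicas.
    """
    if world_size % sp_size != 0:
        raise ValueError("world_size must be divisible by sp_size")
    dp_size = world_size // sp_size
    samples_per_step = per_device_batch * grad_accum * dp_size
    return math.ceil(num_samples / samples_per_step)


# Assumed setup: 8 GPUs, 2700 samples, batch size 1, no gradient accumulation.
baseline = steps_per_epoch(2700, world_size=8, sp_size=1)  # pure data parallel
with_sp = steps_per_epoch(2700, world_size=8, sp_size=8)   # one sample per step

print(baseline, with_sp, with_sp / baseline)
```

Under these assumptions the step count per epoch grows by roughly a factor of `sp`, so a slowdown of about 8x would be expected unless each step gets correspondingly faster, and communication overhead from the ring attention would come on top of that.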
Reproduction
deepspeed --master_port=48765 \
    src/train.py \
    --stage dpo \
    --do_train \
    --model_name_or_path "LLMs/Qwen3-32B" \
    --dataset dpo_zh_demo \
    --template qwen3 \
    --finetuning_type full \
    --pref_beta 0.1 \
    --pref_loss sigmoid \
    --output_dir output/debug \
    --cache_dir .cache \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 32768 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --save_steps 2000 \
    --save_strategy steps \
    --learning_rate 1e-6 \
    --num_train_epochs 3 \
    --plot_loss \
    --save_only_model True \
    --deepspeed examples/deepspeed/ds_z3_offload_config.json \
    --flash_attn fa2 \
    --gradient_checkpointing True \
    --bf16 True \
    --ddp_timeout 180000000 \
    --seed 42 \
    --sequence_parallel_size 8
Expected behavior
No response
Others
No response