Training is very slow with zigzag-ring enabled #68

@xuchanganan

Reminder

  • I have read the README and searched the existing issues.

System Info

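I am using zigzag-ring with Qwen3 and sp=8 on a dataset of 2,700 samples. After enabling SP, the dataset size became 2700 * 8, and training also seems to be more than 8x slower: a run that used to finish in 2 hours now takes over 20 hours. Is there any way to speed this up? Judging from the README, it looks like ulysses cannot be used with Qwen3 either?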
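
To make the arithmetic behind the reported slowdown concrete, here is a rough sketch of how the step count might scale. It assumes (not confirmed by the repository) that each sample is expanded into `sp` entries, one per rank of a sequence-parallel group, and that every rank consumes one entry per micro-step. The world size of 8 GPUs is a guess, since the issue does not state it, and `steps_per_epoch` is a hypothetical helper, not a library API.

```python
import math


def steps_per_epoch(num_samples, world_size, sp_size=1,
                    per_device_batch=1, grad_accum=1):
    """Estimate optimizer steps per epoch (hypothetical helper)."""
    if world_size % sp_size != 0:
        raise ValueError("world_size must be divisible by sp_size")
    # Assumption: each sample becomes sp_size dataset entries,
    # one shard per rank in the sequence-parallel group.
    dataset_entries = num_samples * sp_size
    # Every rank consumes per_device_batch entries per micro-step.
    entries_per_step = world_size * per_device_batch * grad_accum
    return math.ceil(dataset_entries / entries_per_step)


# Assumed setup: 8 GPUs, batch size 1, no gradient accumulation.
baseline = steps_per_epoch(2700, world_size=8)            # plain data parallel
with_sp = steps_per_epoch(2700, world_size=8, sp_size=8)  # sp=8 as in this issue
ratio = with_sp / baseline
print(baseline, with_sp, round(ratio, 2))
```

If these assumptions hold, sp=8 on 8 GPUs leaves a single data-parallel group, so the number of steps per epoch grows by roughly 8x. That alone would account for most of the gap between 2 hours and 20+ hours, with ring communication overhead possibly explaining the rest.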

Reproduction

deepspeed --master_port=48765 \
  src/train.py \
  --stage dpo \
  --do_train \
  --model_name_or_path "LLMs/Qwen3-32B" \
  --dataset dpo_zh_demo \
  --template qwen3 \
  --finetuning_type full \
  --pref_beta 0.1 \
  --pref_loss sigmoid \
  --output_dir output/debug \
  --cache_dir .cache \
  --overwrite_cache \
  --overwrite_output_dir \
  --cutoff_len 32768 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.0 \
  --logging_steps 1 \
  --save_steps 2000 \
  --save_strategy steps \
  --learning_rate 1e-6 \
  --num_train_epochs 3 \
  --plot_loss \
  --save_only_model True \
  --deepspeed examples/deepspeed/ds_z3_offload_config.json \
  --flash_attn fa2 \
  --gradient_checkpointing True \
  --bf16 True \
  --ddp_timeout 180000000 \
  --seed 42 \
  --sequence_parallel_size 8

Expected behavior

No response

Others

No response
