
On an NPU, when doing full-parameter fine-tuning of a large model with ZeRO-3, the initial parameters are very large. What is causing this? #4272

Open
fjw1049 opened this issue Jun 14, 2024 · 0 comments
Labels: npu (This problem is related to NPU devices), pending (This problem is yet to be addressed)

Comments


fjw1049 commented Jun 14, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[attached screenshot: training_loss curve]

Reproduction

```yaml
### model
model_name_or_path: /data1/mixstral

### method
stage: sft
do_train: true
finetuning_type: full

### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z3_offload_config.json

### dataset
dataset: sft_zh_data,alpaca_zh_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/full
logging_steps: 30
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
flash_attn: auto

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
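For reference, the run above points DeepSpeed at examples/deepspeed/ds_z3_offload_config.json. The exact file shipped with LLaMA-Factory may differ, but a typical DeepSpeed ZeRO-3 config with CPU offload for parameters and optimizer state looks roughly like the sketch below. All keys are standard DeepSpeed options; the "auto" values are resolved from the Trainer arguments at launch, so fp16: true in the YAML above is what activates the fp16 block here.

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```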

Expected behavior

No response

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Jun 14, 2024
hiyouga added the npu (This problem is related to NPU devices) label on Jun 19, 2024