[NPU] GLM-4-9B-Chat PPO error #4135

Open
1 task done
hunterhome opened this issue Jun 7, 2024 · 5 comments
Labels
npu (This problem is related to NPU devices), pending (This problem is yet to be addressed)

Comments

@hunterhome

hunterhome commented Jun 7, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[2024-06-07 10:17:14,980] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)

  • llamafactory version: 0.7.2.dev0
  • Platform: Linux-5.10.0-198.0.0.111.oe2203sp3.aarch64-aarch64-with-glibc2.34
  • Python version: 3.10.14
  • PyTorch version: 2.2.0 (NPU)
  • Transformers version: 4.41.2
  • Datasets version: 2.19.2
  • Accelerate version: 0.30.1
  • PEFT version: 0.11.1
  • TRL version: 0.9.3
  • NPU type: Ascend910B2
  • CANN version: 8.0.RC2.alpha001
  • DeepSpeed version: 0.13.2

Reproduction

llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path ZhipuAI/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset disc-law-sft-triplet \
    --cutoff_len 8192 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-44-37 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 \
    --reward_model_type lora \
    --ppo_score_norm True \
    --top_k 0 \
    --top_p 0.9

Expected behavior

_No response_

Others

[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] 
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] *****************************************
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] *****************************************
[2024-06-07 10:11:03,623] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,661] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,705] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,818] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,836] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,905] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,955] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,991] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 0, device: npu:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
2024-06-07 10:11:17,434 - modelscope - INFO - PyTorch version 2.2.0 Found.
2024-06-07 10:11:17,436 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-06-07 10:11:17,490 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 ceb78a2ac746b5506819a47dbbf0e37c and a total number of 976 components indexed
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 7, device: npu:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 4, device: npu:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 6, device: npu:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 2, device: npu:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 1, device: npu:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
06/07/2024 10:11:18 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:18 - INFO - llamafactory.hparams.parser - Process rank: 5, device: npu:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,235 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,235 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file tokenizer.json
06/07/2024 10:11:18 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:18 - INFO - llamafactory.hparams.parser - Process rank: 3, device: npu:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[WARNING|logging.py:314] 2024-06-07 10:11:19,288 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
06/07/2024 10:11:19 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/07/2024 10:11:22 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words.
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json...
Running tokenizer on dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████| 16000/16000 [00:38<00:00, 416.91 examples/s]
input_ids:
[151331, 151333, 151336, 198, 100698, 103309, 101138, 3837, 113094, 110590, 105177, 99312, 8994, 98379, 106170, 117921, 3837, 98546, 20, 98334, 21, 98424, 99146, 98385, 99082, 117225, 3837, 108592, 98696, 105181, 103757, 117537, 98380, 99043, 100451, 102337, 103273, 106156, 118828, 98798, 105181, 101376, 98314, 117055, 98550, 109534, 3837, 98459, 101247, 105079, 98634, 123900, 98324, 117537, 98595, 101676, 111602, 99916, 98760, 101642, 98335, 3837, 108592, 98696, 105181, 98453, 105529, 109290, 98396, 98381, 103941, 98798, 105181, 99195, 118894, 3837, 103078, 98711, 109534, 105079, 98322, 107801, 98993, 114731, 100129, 101242, 3837, 98547, 110664, 99999, 105181, 109487, 98365, 3837, 108592, 98696, 105181, 98701, 107801, 98993, 114731, 103941, 98798, 105181, 98314, 99527, 113995, 3837, 99704, 124187, 116767, 101806, 98583, 109695, 98829, 110960, 99416, 121952, 109055, 112246, 117442, 101242, 3837, 117442, 101242, 100048, 98875, 121424, 99054, 99893, 98649, 105862, 98433, 112998, 99108, 120250, 106318, 100035, 1773, 98365, 98379, 118828, 98798, 105181, 105420, 3837, 101113, 99131, 100588, 98634, 100059, 98493, 108592, 98696, 105181, 98607, 103278, 98344, 98817, 1773, 98379, 103171, 3837, 109534, 108634, 99532, 102492, 20, 11, 124206, 13, 24, 98575, 3837, 109055, 108634, 99532, 102492, 16, 11, 19, 101474, 13, 102486, 98575, 3837, 117442, 101242, 108634, 99532, 102492, 17, 11, 24, 99951, 13, 99082, 98575, 3837, 99054, 99893, 98649, 106508, 99108, 120250, 108634, 99532, 102492, 24, 11, 102114, 21, 98575, 3837, 111086, 101832, 99532, 106234, 102492, 98729, 11, 101135, 17, 13, 21, 98575, 1773, 101409, 100867, 3837, 108592, 98696, 105181, 98319, 119626, 98322, 100297, 98479, 110416, 3837, 118828, 98798, 105181, 5373, 100547, 105181, 5373, 104464, 105181, 110065, 3837, 110664, 99999, 105181, 98314, 98697, 98856, 3837, 100059, 111413, 99565, 98990, 3837, 116550, 99304, 3837, 103171, 102622, 98560, 3837, 108592, 98696, 105181, 98314, 127251, 98381, 102070, 98539, 98404, 102243, 105483, 3837, 106144, 102919, 1773, 151337]
inputs:
[gMASK] <sop> <|user|> 
基于下列案件,推测可能的判决结果。
经审理查明,2015年6月21日15时许,被告人白某某在大东区小河沿公交车站乘坐被害人张某某驾驶的133路公交车,当车辆行驶至沈阳市大东区东陵西路26号附近时,被告人白某某因未能下车而与司机张某某发生争执,并在该公交车行驶中用手拉拽档杆,被证人韩某某拉开后,被告人白某某又用手拉拽司机张某某的右胳膊,导致该车失控撞向右侧马路边停放的轿车和一个路灯杆,路灯杆折断后将福锅记炖品店的牌匾砸坏。后经被害人张某某报警,公安人员赶至现场将被告人白某某传唤到案。经鉴定,公交车受损价值人民币5,189.9元,轿车受损价值人民币1,449.57元,路灯杆受损价值人民币2,927.15元,福锅记饭店牌匾受损价值人民币9,776元,本案损失价值共计人民币19,342.6元。上述事实,被告人白某某在庭审中亦无异议,被害人张某某、朱某某、詹某某陈述,证人韩某某的证言,现场勘察笔录,视听资料,鉴定结论书,被告人白某某的供述与辩解等证据证实,足以认定。 <|assistant|>
[INFO|configuration_utils.py:731] 2024-06-07 10:12:08,107 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:731] 2024-06-07 10:12:08,110 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:796] 2024-06-07 10:12:08,111 >> Model config ChatGLMConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1.5625e-07,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_layers": 40,
  "original_rope": true,
  "pad_token_id": 151329,
  "padded_vocab_size": 151552,
  "post_layer_norm": true,
  "rmsnorm": true,
  "rope_ratio": 500,
  "seq_length": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 151552
}

[INFO|modeling_utils.py:3471] 2024-06-07 10:12:08,159 >> loading weights file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-06-07 10:12:08,160 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-06-07 10:12:08,162 >> Generate config GenerationConfig {
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "pad_token_id": 151329
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.45it/s]
[INFO|modeling_utils.py:4280] 2024-06-07 10:12:15,224 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:4288] 2024-06-07 10:12:15,224 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3797] 2024-06-07 10:12:15,231 >> Generation config file not found, using a generation config created from the model config.
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards:  60%|██████████████████████████████████████████████████████████▏                                      | 6/10 [00:04<00:02,  1.35it/s]06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:15 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
Loading checkpoint shards:  70%|███████████████████████████████████████████████████████████████████▉                             | 7/10 [00:05<00:02,  1.39it/s]06/07/2024 10:12:16 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.51it/s]
06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.42it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.36it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.35it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.34it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards:  90%|███████████████████████████████████████████████████████████████████████████████████████▎         | 9/10 [00:07<00:00,  1.19it/s]06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:18 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.19it/s]
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards:  90%|███████████████████████████████████████████████████████████████████████████████████████▎         | 9/10 [00:09<00:01,  1.02s/it]06/07/2024 10:12:20 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:20 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:21 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00,  1.05s/it]
06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:22 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:22 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:23 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - ***** Running training *****
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num examples = 16000
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num Epochs = 3.0
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Instantaneous batch size per device = 1
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Total train batch size (w. parallel, buffer, distributed & accumulation) = 64
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Gradient Accumulation steps = 8
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num optimization epochs per batch = 4
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Total training steps = 750
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Number of trainable parameters = 21180417
  0%|                                                                                                                                   | 0/750 [00:00<?, ?it/s]/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
[rank1]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank0]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank7]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank2]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank6]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank3]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank4]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank5]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
  0%|                                                                                                                                   | 0/750 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 213 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 67 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 379 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 390 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 408 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 499 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 501 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 488 is out of bounds for dimension 1 with size 1
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227860 closing signal SIGTERM
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227861 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227862 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227863 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227864 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227865 closing signal SIGTERM
[2024-06-07 10:12:46,451] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2227858) of binary: /data/anaconda3/envs/llama_factory/bin/python
Traceback (most recent call last):
  File "/data/anaconda3/envs/llama_factory/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/data/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2227859)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2227858)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
hiyouga added the pending (This problem is yet to be addressed) label on Jun 7, 2024
@hiyouga
Owner

hiyouga commented Jun 7, 2024

Try changing this variable to False?

self.is_chatglm_model = getattr(unwrapped_model.config, "model_type", None) == "chatglm"
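For reference, a minimal standalone sketch of why forcing this flag to False may help. This is not LLaMA-Factory source; the shapes and names are assumptions taken from the log above (per-device batch size 1, response end index 213). On the chatglm path the trainer appears to transpose the value-head output before indexing, which leaves dimension 1 with size 1 and produces exactly the IndexError shown in the traceback.

# Illustration only, not LLaMA-Factory code; shapes assumed from the log.
import torch

values = torch.zeros(1, 512)   # value-head output, assumed (batch, seq_len) layout
i, end_index = 0, 213          # end_index taken from the reported error

print(values[i, end_index])    # works: dimension 1 has size 512 here

# If the model is handled as an old-style ChatGLM, the tensor is transposed
# first, giving shape (seq_len, batch) = (512, 1); indexing 213 on dimension 1
# then fails exactly as in the traceback.
values_t = torch.transpose(values, 0, 1)
try:
    values_t[i, end_index]
except IndexError as err:
    print(err)                 # index 213 is out of bounds for dimension 1 with size 1

If the transpose is skipped (i.e. is_chatglm_model is False), the indexing matches the (batch, seq_len) layout that the GLM-4 value head appears to return.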

@1737686924

inputs:
[gMASK] <sop> <|user|> 
基于下列案件,推测可能的判决结果。
经审理查明,2015年6月21日15时许,被告人白某某在大东区小河沿公交车站乘坐被害人张某某驾驶的133路公交车,当车辆行驶至沈阳市大东区东陵西路26号附近时,被告人白某某因未能下车而与司机张某某发生争执,并在该公交车行驶中用手拉拽档杆,被证人韩某某拉开后,被告人白某某又用手拉拽司机张某某的右胳膊,导致该车失控撞向右侧马路边停放的轿车和一个路灯杆,路灯杆折断后将福锅记炖品店的牌匾砸坏。后经被害人张某某报警,公安人员赶至现场将被告人白某某传唤到案。经鉴定,公交车受损价值人民币5,189.9元,轿车受损价值人民币1,449.57元,路灯杆受损价值人民币2,927.15元,福锅记饭店牌匾受损价值人民币9,776元,本案损失价值共计人民币19,342.6元。上述事实,被告人白某某在庭审中亦无异议,被害人张某某、朱某某、詹某某陈述,证人韩某某的证言,现场勘察笔录,视听资料,鉴定结论书,被告人白某某的供述与辩解等证据证实,足以认定。 <|assistant|>
[INFO|configuration_utils.py:731] 2024-06-07 10:12:08,107 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:731] 2024-06-07 10:12:08,110 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json
[INFO|configuration_utils.py:796] 2024-06-07 10:12:08,111 >> Model config ChatGLMConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1.5625e-07,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_layers": 40,
  "original_rope": true,
  "pad_token_id": 151329,
  "padded_vocab_size": 151552,
  "post_layer_norm": true,
  "rmsnorm": true,
  "rope_ratio": 500,
  "seq_length": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 151552
}

[INFO|modeling_utils.py:3471] 2024-06-07 10:12:08,159 >> loading weights file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-06-07 10:12:08,160 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-06-07 10:12:08,162 >> Generate config GenerationConfig {
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "pad_token_id": 151329
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.45it/s]
[INFO|modeling_utils.py:4280] 2024-06-07 10:12:15,224 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:4288] 2024-06-07 10:12:15,224 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3797] 2024-06-07 10:12:15,231 >> Generation config file not found, using a generation config created from the model config.
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards:  60%|██████████████████████████████████████████████████████████▏                                      | 6/10 [00:04<00:02,  1.35it/s]
06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:15 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
Loading checkpoint shards:  70%|███████████████████████████████████████████████████████████████████▉                             | 7/10 [00:05<00:02,  1.39it/s]
06/07/2024 10:12:16 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00,  1.51it/s]
06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.42it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.36it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.35it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00,  1.34it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
Loading checkpoint shards:  90%|███████████████████████████████████████████████████████████████████████████████████████▎         | 9/10 [00:07<00:00,  1.19it/s]
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:18 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.19it/s]
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards:  90%|███████████████████████████████████████████████████████████████████████████████████████▎         | 9/10 [00:09<00:01,  1.02s/it]
06/07/2024 10:12:20 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:20 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:21 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00,  1.05s/it]
06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/07/2024 10:12:22 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03
06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files..
06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model.
06/07/2024 10:12:22 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248
06/07/2024 10:12:23 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - ***** Running training *****
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num examples = 16000
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num Epochs = 3.0
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Instantaneous batch size per device = 1
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Total train batch size (w. parallel, buffer, distributed & accumulation) = 64
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Gradient Accumulation steps = 8
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Num optimization epochs per batch = 4
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Total training steps = 750
06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer -   Number of trainable parameters = 21180417
  0%|                                                                                                                                   | 0/750 [00:00<?, ?it/s]
/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
[rank1]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank0]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank7]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank2]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank6]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank3]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank4]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
[rank5]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
  0%|                                                                                                                                   | 0/750 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 213 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 67 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 379 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 390 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 408 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 499 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 501 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 488 is out of bounds for dimension 1 with size 1
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227860 closing signal SIGTERM
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227861 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227862 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227863 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227864 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227865 closing signal SIGTERM
[2024-06-07 10:12:46,451] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2227858) of binary: /data/anaconda3/envs/llama_factory/bin/python
Traceback (most recent call last):
  File "/data/anaconda3/envs/llama_factory/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/data/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2227859)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2227858)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
===========================================================
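
For anyone hitting the same failure: `get_rewards` indexes the value-head output as `values[i, end_index]`, which assumes a batch-first `(batch_size, seq_len)` tensor, whereas the message "out of bounds for dimension 1 with size 1" suggests the tensor returned for GLM-4 is sequence-first, i.e. `(seq_len, batch_size)` with the per-device batch of 1 used here. The snippet below is only a minimal sketch of that failure mode and a defensive transpose under this assumption; the variable names, the arbitrary sequence length, and the shape check are illustrative, not the project's actual code.

import torch

# Shapes matching the log above: per-device batch size 1; end_index 213 comes from rank 0's error.
# The sequence length 500 is an arbitrary illustrative value (assumption).
batch_size, seq_len, end_index = 1, 500, 213

# Expected layout (batch_size, seq_len): indexing the last response token works.
values = torch.randn(batch_size, seq_len)
reward = values[0, end_index].float().detach().cpu()

# Sequence-first layout (seq_len, batch_size): dimension 1 now has size 1, and the same
# indexing raises "IndexError: index 213 is out of bounds for dimension 1 with size 1".
values_seq_first = torch.randn(seq_len, batch_size)

# Defensive workaround (an assumption, not the library's official fix): transpose back to
# batch-first before indexing whenever the first dimension is not the batch dimension.
if values_seq_first.size(0) != batch_size:
    values_seq_first = values_seq_first.transpose(0, 1)
reward_fixed = values_seq_first[0, end_index].float().detach().cpu()
print(reward.shape, reward_fixed.shape)  # both are 0-dim (scalar) tensors

A more targeted condition would key off the backbone's model_type ("chatglm" in the config printed above) rather than a shape comparison, but either way the symptom points to a transposed value tensor rather than corrupted data.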

Hey, did you manage to fine-tune GLM-4 successfully on Ascend? Could you share your script?

@jesuswa

jesuswa commented Jun 14, 2024

Does anyone have an inference script?

@zhoushaoxiang

Has anyone managed to fine-tune GLM-4 successfully on Ascend? I keep getting
The param dtype not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_DOUBLE,DT_INT8,DT_UINT8,DT_INT16,DT_INT32,DT_INT64,DT_BOOL,DT_COMPLEX64,DT_COMPLEX128,]
Can anyone help with this?

@DaozeZhang

I'd also like a fine-tuning script for GLM-4; a recommended set of configuration parameters or a command would work too. Thanks.

@hiyouga added the npu label (This problem is related to NPU devices) on Jun 19, 2024