[2024-06-05 14:39:53,109] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:53,937] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-06-05 14:39:53,937] [INFO] [runner.py:568:main] cmd = /root/anaconda3/envs/kh/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_clm_lora.py --model_name_or_path /AI2024/kh/Models/Meta-Llama-3-8B-Instruct --train_files ../../data/train_sft.csv --validation_files ../../data/dev_sft.csv ../../data/dev_sft_sharegpt.csv --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --use_fast_tokenizer false --output_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model --evaluation_strategy steps --max_eval_samples 800 --learning_rate 1e-4 --gradient_accumulation_steps 8 --num_train_epochs 10 --warmup_steps 400 --load_in_bits 4 --lora_r 8 --lora_alpha 32 --target_modules q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj --logging_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs --logging_strategy steps --logging_steps 10 --save_strategy steps --preprocessing_num_workers 10 --save_steps 20 --eval_steps 20 --save_total_limit 2000 --seed 42 --disable_tqdm false --ddp_find_unused_parameters false --block_size 2048 --report_to tensorboard --overwrite_output_dir --deepspeed ds_config_zero2.json --ignore_data_skip true --bf16 --gradient_checkpointing --bf16_full_eval --ddp_timeout 18000000
[2024-06-05 14:39:55,793] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:56,609] [INFO] [launch.py:139:main] 0 NCCL_P2P_DISABLE=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0]}
[2024-06-05 14:39:56,609] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-06-05 14:39:56,609] [INFO] [launch.py:164:main] dist_world_size=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:256:main] process 87440 spawned with command: ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000']
[2024-06-05 14:39:58,562] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
Traceback (most recent call last):
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1535, in _get_module
return importlib.import_module("." + module_name, self.name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/kh/lib/python3.11/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 1204, in _gcd_import
File "", line 1176, in _find_and_load
File "", line 1147, in _find_and_load_unlocked
File "", line 690, in _load_unlocked
File "", line 940, in exec_module
File "", line 241, in _call_with_frames_removed
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/trainer.py", line 180, in
from apex import amp
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/apex/init.py", line 13, in
from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/AI2024/kh/Finetune/Llama-Chinese/train/sft/finetune_clm_lora.py", line 48, in
from transformers import (
File "", line 1229, in _handle_fromlist
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1525, in getattr
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1537, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
[2024-06-05 14:40:01,613] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 87440
[2024-06-05 14:40:01,614] [ERROR] [launch.py:325:sigkill_handler] ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000'] exits with return code = 1
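For anyone else hitting this: the failure is not in the LoRA script itself. `transformers/trainer.py` does `from apex import amp`, and the `apex` that is importable in this environment is the unrelated PyPI package of the same name (a Pyramid web toolkit whose `__init__.py` pulls in `pyramid.session`), not NVIDIA Apex. Since the run uses `--bf16`, NVIDIA Apex is not needed at all, and Transformers only attempts the import when a package named `apex` is present. A minimal sketch of a fix, assuming the stray packages were installed with pip in this conda env (verify with `pip show apex` first):

```bash
# Remove the wrong "apex" (the Pyramid toolkit from PyPI) and its pyramid
# dependency so transformers no longer tries to import it on startup.
pip uninstall -y apex pyramid

# Optional: if NVIDIA Apex mixed precision is actually wanted, build it from
# source instead of installing the PyPI "apex" (a different project):
#   git clone https://github.com/NVIDIA/apex
#   cd apex && pip install -v --no-build-isolation .
# (check NVIDIA's apex README for the exact build flags for your CUDA setup)
```

After removing the package, rerunning the same deepspeed command should get past the `transformers.trainer` import.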