[2024-06-05 14:39:53,109] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:53,937] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-06-05 14:39:53,937] [INFO] [runner.py:568:main] cmd = /root/anaconda3/envs/kh/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_clm_lora.py --model_name_or_path /AI2024/kh/Models/Meta-Llama-3-8B-Instruct --train_files ../../data/train_sft.csv --validation_files ../../data/dev_sft.csv ../../data/dev_sft_sharegpt.csv --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --use_fast_tokenizer false --output_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model --evaluation_strategy steps --max_eval_samples 800 --learning_rate 1e-4 --gradient_accumulation_steps 8 --num_train_epochs 10 --warmup_steps 400 --load_in_bits 4 --lora_r 8 --lora_alpha 32 --target_modules q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj --logging_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs --logging_strategy steps --logging_steps 10 --save_strategy steps --preprocessing_num_workers 10 --save_steps 20 --eval_steps 20 --save_total_limit 2000 --seed 42 --disable_tqdm false --ddp_find_unused_parameters false --block_size 2048 --report_to tensorboard --overwrite_output_dir --deepspeed ds_config_zero2.json --ignore_data_skip true --bf16 --gradient_checkpointing --bf16_full_eval --ddp_timeout 18000000
[2024-06-05 14:39:55,793] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:56,609] [INFO] [launch.py:139:main] 0 NCCL_P2P_DISABLE=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0]}
[2024-06-05 14:39:56,609] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-06-05 14:39:56,609] [INFO] [launch.py:164:main] dist_world_size=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:256:main] process 87440 spawned with command: ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000']
[2024-06-05 14:39:58,562] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
Traceback (most recent call last):
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1535, in _get_module
return importlib.import_module("." + module_name, self.name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/kh/lib/python3.11/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 1204, in _gcd_import
File "", line 1176, in _find_and_load
File "", line 1147, in _find_and_load_unlocked
File "", line 690, in _load_unlocked
File "", line 940, in exec_module
File "", line 241, in _call_with_frames_removed
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/trainer.py", line 180, in
from apex import amp
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/apex/init.py", line 13, in
from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/AI2024/kh/Finetune/Llama-Chinese/train/sft/finetune_clm_lora.py", line 48, in
from transformers import (
File "", line 1229, in _handle_fromlist
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1525, in getattr
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1537, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
[2024-06-05 14:40:01,613] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 87440
[2024-06-05 14:40:01,614] [ERROR] [launch.py:325:sigkill_handler] ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000'] exits with return code = 1
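For anyone else hitting this: the failure is not in the LoRA script itself. `transformers/trainer.py` does `from apex import amp`, and the `apex` that is importable in this environment is the unrelated PyPI package of the same name (a Pyramid web toolkit whose `__init__.py` pulls in `pyramid.session`), not NVIDIA Apex. Since the run uses `--bf16`, NVIDIA Apex is not needed at all, and Transformers only attempts the import when a package named `apex` is present. A minimal sketch of a fix, assuming the stray packages were installed with pip in this conda env (verify with `pip show apex` first):

```bash
# Remove the wrong "apex" (the Pyramid toolkit from PyPI) and its pyramid
# dependency so transformers no longer tries to import it on startup.
pip uninstall -y apex pyramid

# Optional: if NVIDIA Apex mixed precision is actually wanted, build it from
# source instead of installing the PyPI "apex" (a different project):
#   git clone https://github.com/NVIDIA/apex
#   cd apex && pip install -v --no-build-isolation .
# (check NVIDIA's apex README for the exact build flags for your CUDA setup)
```

After removing the package, rerunning the same deepspeed command should get past the `transformers.trainer` import.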