Model: jina-embeddings-v2-base-zh

Fine-tuning reference: modified from the scripts under FlagEmbedding/examples/finetune/embedder/encoder_only/.

Problem: with --fp16 enabled, the training loss and grad_norm are 0.0 from the very first step, while the otherwise-identical fp32 run trains normally (logs for both runs below).
Fine-tuning command:
export WANDB_MODE=disabled
train_data="/xxx/xxx/data/finetune_data_score_v2.jsonl"
num_train_epochs=4
per_device_train_batch_size=256
num_gpus=2
if [ -z "$HF_HUB_CACHE" ]; then
export HF_HUB_CACHE="$HOME/.cache/huggingface/hub"
fi
model_args="
--model_name_or_path /SharedNFS/LLM_model/jina-embeddings-v2-base-zh
--cache_dir $HF_HUB_CACHE
--trust_remote_code True
"
data_args="
--train_data $train_data
--cache_path ~/.cache
--train_group_size 15
--query_max_len 32
--passage_max_len 32
--pad_to_multiple_of 8
--query_instruction_for_retrieval 'Represent this sentence for searching relevant passages: '
--query_instruction_format '{}{}'
--knowledge_distillation True
"
# The only difference between the two runs is whether --fp16 is specified here.
training_args="
--output_dir ./knowledge_distillation_agent_minedHN_score_test_encoder_only_base_jina-embeddings-v2-base-zh
--overwrite_output_dir
--learning_rate 1e-5
--fp16
--num_train_epochs $num_train_epochs
--per_device_train_batch_size $per_device_train_batch_size
--dataloader_drop_last True
--warmup_ratio 0.1
--gradient_checkpointing
--deepspeed ../../ds_stage0.json
--logging_steps 1
--save_steps 1000
--negatives_cross_device
--temperature 0.02
--sentence_pooling_method mean
--normalize_embeddings True
--kd_loss_type kl_div
"
cmd="torchrun --nproc_per_node $num_gpus
-m FlagEmbedding.finetune.embedder.encoder_only.base
$model_args
$data_args
$training_args
"
echo $cmd
eval $cmd
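For reference, since --knowledge_distillation True is set, each line of finetune_data_score_v2.jsonl should look roughly like the following (field names per my reading of the FlagEmbedding finetune docs; the values are made-up placeholders):

```json
{"query": "示例查询", "pos": ["相关段落"], "neg": ["不相关段落一", "不相关段落二"], "pos_scores": [0.95], "neg_scores": [0.31, 0.12]}
```

With --train_group_size 15, one positive plus 14 negatives are drawn per query, so each line needs to supply enough negatives (or, if I read the sampler correctly, sampling repeats them).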
Log output (fp16):
/data/anaconda3/envs/c-mteb/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)
0%| | 1/452 [00:04<32:47, 4.36s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0, 'epoch': 0.01}
0%| | 1/452 [00:04<32:47, 4.36s/it]
0%| | 2/452 [00:07<26:18, 3.51s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0, 'epoch': 0.02}
0%| | 2/452 [00:07<26:18, 3.51s/it]
1%| | 3/452 [00:10<24:00, 3.21s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0, 'epoch': 0.03}
1%| | 3/452 [00:10<24:00, 3.21s/it]
1%| | 4/452 [00:13<25:15, 3.38s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0, 'epoch': 0.04}
1%| | 4/452 [00:13<25:15, 3.38s/it]
1%| | 5/452 [00:16<24:40, 3.31s/it]
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0, 'epoch': 0.04}
1%| | 5/452 [00:16<24:40, 3.31s/it]
1%|▏ | 6/452 [00:19<23:28, 3.16s/it]
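Since loss and grad_norm are exactly 0.0 from step 1 in the fp16 run (and the displayed learning rate stays at 0), one thing worth checking is whether the model's forward pass already produces nan/inf under half precision before the loss is even computed. A quick standalone diagnostic (my own sketch, not FlagEmbedding code; the model path is the one from the script above):

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "/SharedNFS/LLM_model/jina-embeddings-v2-base-zh"  # same path as in the script
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True).cuda().eval()

batch = tok(["Represent this sentence for searching relevant passages: 测试查询"],
            return_tensors="pt").to("cuda")
with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    hidden = model(**batch).last_hidden_state
emb = hidden.mean(dim=1)  # mean pooling, matching --sentence_pooling_method mean
print("nan:", torch.isnan(emb).any().item(), "inf:", torch.isinf(emb).any().item())
```

If this prints nan/inf, the collapse happens inside the fp16 forward itself rather than in the training loop.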
Log output (fp32):
/data/anaconda3/envs/c-mteb/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)
0%| | 1/452 [00:07<59:15, 7.88s/it]
{'loss': 5.9738, 'learning_rate': 0.0, 'epoch': 0.01}
0%| | 1/452 [00:07<59:15, 7.88s/it]
0%| | 2/452 [00:14<52:21, 6.98s/it]
{'loss': 5.2496, 'learning_rate': 1.8104259678004022e-06, 'epoch': 0.02}
0%| | 2/452 [00:14<52:21, 6.98s/it]
1%| | 3/452 [00:20<50:08, 6.70s/it]
{'loss': 5.9845, 'learning_rate': 2.8694572692954448e-06, 'epoch': 0.03}
1%| | 3/452 [00:20<50:08, 6.70s/it]
1%| | 4/452 [00:28<54:47, 7.34s/it]
{'loss': 5.684, 'learning_rate': 3.6208519356008044e-06, 'epoch': 0.04}
1%| | 4/452 [00:28<54:47, 7.34s/it]
1%| | 5/452 [00:36<54:48, 7.36s/it]
{'loss': 5.5646, 'learning_rate': 4.203678918349396e-06, 'epoch': 0.04}
1%| | 5/452 [00:36<54:48, 7.36s/it]
1%|▏ | 6/452 [00:42<52:31, 7.07s/it]
Also, I'd like to ask: around what loss value does the model roughly converge?
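For context, a rough back-of-envelope yardstick (my own reasoning, not from the FlagEmbedding docs): for an InfoNCE-style contrastive loss over N candidate passages, a model ranking at chance scores about ln(N), so the "converged" value depends on batch size, group size, and cross-device negatives rather than being a universal constant:

```python
import math

# Candidates per query under the settings above (an approximation; the exact
# count depends on how FlagEmbedding assembles in-batch and cross-device negatives)
per_device_train_batch_size = 256
num_gpus = 2
train_group_size = 15
n_candidates = per_device_train_batch_size * num_gpus * train_group_size  # 7680

print(math.log(n_candidates))  # ~8.95: the loss of a random-chance model
```

The fp32 run starts near 5.5-6.0, i.e. already better than chance; note also that with --kd_loss_type kl_div the logged loss includes a distillation term, so in practice it is probably more useful to watch for the curve flattening than for a fixed target value.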