Finetune BGE-M3 #1346

Open

tenafrangelos opened this issue Jan 20, 2025 · 2 comments
@tenafrangelos

How can I finetune only the dense and sparse embeddings?
I tried using this script:

```bash
%%bash
torchrun --nproc_per_node 1 \
    -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
    --model_name_or_path /home/alex/ejada/developers/martina/my_cache/models--BAAI--bge-m3 \
    --cache_dir ./cache/model \
    --train_data ./ft_data/training.json \
    --train_group_size 4 \
    --query_max_len 256 \
    --passage_max_len 256 \
    --pad_to_multiple_of 4 \
    --query_instruction_for_retrieval 'Represent this sentence for searching relevant passages: ' \
    --query_instruction_format '{}{}' \
    --knowledge_distillation False \
    --output_dir ./test_encoder \
    --learning_rate 1e-5 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --dataloader_drop_last True \
    --warmup_ratio 0.1 \
    --logging_steps 1 \
    --save_steps 1000 \
    --negatives_cross_device \
    --temperature 0.02 \
    --sentence_pooling_method cls \
    --normalize_embeddings True \
    --kd_loss_type m3_kd_loss \
    --unified_finetuning True \
    --use_self_distill True \
    --fix_encoder True \
    --colbert_dim 0 \
    --self_distill_start_step 0
```
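
For context, each line of `./ft_data/training.json` follows the query/pos/neg JSONL layout from the FlagEmbedding finetuning examples. A minimal sketch of how one record is written (the texts are placeholders, not my real data):

```python
import json

# Minimal sketch of one training record in the query / pos / neg JSONL layout
# used by the FlagEmbedding finetuning examples. Texts below are placeholders.
record = {
    "query": "how to reset a forgotten account password",
    "pos": ["Open Settings, choose Security, then select 'Reset password'."],
    "neg": [
        "Our support office is open Monday to Friday, 9am to 5pm.",
        "The mobile app is available for both iOS and Android.",
    ],
}

# training.json holds one JSON object per line.
with open("ft_data/training.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```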
@545999961
Collaborator

The ColBERT vectors and sparse embeddings are finetuned together. If you want to drop the ColBERT vectors, you need to remove the corresponding code in the finetune module.

@tenafrangelos
Author

Thanks for your answer. It works.
I updated the loss function as follows:
Before:

1. `return dense_scores + 0.3 * sparse_scores + colbert_scores`
2. `loss = (loss + ensemble_loss + 0.1 * sparse_loss + colbert_loss) / 4`
3. `loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss + colbert_self_distill_loss) / 3`

After:

1. `return dense_scores + 0.3 * sparse_scores`
2. `loss = (loss + ensemble_loss + 0.1 * sparse_loss) / 3`
3. `loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss) / 2`
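
Written out (using $\mathcal{L}_{\text{dense}}$ for the first `loss` term, which I read as the dense contrastive loss, and $s$ for the scores), the updated objective is:

$$
s_{\text{ensemble}} = s_{\text{dense}} + 0.3\, s_{\text{sparse}}, \qquad
\mathcal{L} = \frac{\mathcal{L}_{\text{dense}} + \mathcal{L}_{\text{ensemble}} + 0.1\,\mathcal{L}_{\text{sparse}}}{3}
+ \frac{\mathcal{L}^{\text{self}}_{\text{dense}} + 0.1\,\mathcal{L}^{\text{self}}_{\text{sparse}}}{2}
$$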

Is that valid, or is there a better equation? Should I reduce the weight on sparse_scores?
