📃 Paper • 🤗 Huggingface Collection • Model & Dataset (Google Drive)
Source code, trained models, and data of our paper "PairDistill: Pairwise Relevance Distillation for Dense Retrieval", accepted to EMNLP 2024 Main Conference.
Please cite the following reference if you find our code, models, and datasets useful.
@inproceedings{huang-chen-2024-pairdistill,
    title = "{P}air{D}istill: Pairwise Relevance Distillation for Dense Retrieval",
    author = "Huang, Chao-Wei and
      Chen, Yun-Nung",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1013",
    doi = "10.18653/v1/2024.emnlp-main.1013",
    pages = "18225--18237"
}
PairDistill is a pairwise relevance distillation framework designed to enhance the retrieval performance of dense retrieval models. It leverages pairwise relevance signals from a pairwise reranker to guide the distillation process, and achieves superior performance on MS MARCO, BEIR, and LoTTE.
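At a high level, the pairwise reranker (the teacher) outputs the probability that one passage is more relevant than another, and the dense retriever (the student) is trained so that its score differences reproduce those preferences. The PyTorch sketch below is a minimal illustration of this idea under those assumptions; it is not the exact loss implemented in this repo.

```python
import torch
import torch.nn.functional as F

def pairwise_distill_loss(s_i: torch.Tensor, s_j: torch.Tensor,
                          teacher_prob: torch.Tensor) -> torch.Tensor:
    """KL between teacher and student pairwise preference distributions.

    s_i, s_j: student retriever scores for passages d_i and d_j (shape: [batch]).
    teacher_prob: teacher probability that d_i is more relevant than d_j.
    """
    # Student preference for d_i over d_j, derived from the score difference.
    student_prob = torch.sigmoid(s_i - s_j)
    student_dist = torch.stack([student_prob, 1.0 - student_prob], dim=-1)
    teacher_dist = torch.stack([teacher_prob, 1.0 - teacher_prob], dim=-1)
    # KL(teacher || student), averaged over pairs in the batch.
    return F.kl_div(student_dist.clamp_min(1e-8).log(), teacher_dist,
                    reduction="batchmean")
```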
Make a new Python 3.9+ environment using virtualenv or conda.
conda create -n pair-distill python=3.10
conda activate pair-distill
# Install Python dependencies. We specify the versions in requirements.txt, but newer versions should generally work as well.
pip install -r requirements.txt
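If you prefer virtualenv over conda, an equivalent setup looks like the following (assuming a Python 3.10 interpreter is available as `python3.10`; the stdlib `venv` module is shown here):

```bash
# Create and activate the environment, then install dependencies.
python3.10 -m venv pair-distill
source pair-distill/bin/activate
pip install -r requirements.txt
```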
PairDistill supports two dense retrieval models: ColBERT, and DPR via the dpr-scale library. Please install the corresponding dependencies for the model you want to use.
# Install ColBERT dependencies
pip install -r ColBERT/requirements.txt
# Install DPR dependencies
pip install -r dpr-scale/requirements.txt
In order to train PairDistill, please download the following checkpoints and datasets:
- Pretrained ColBERTv2 checkpoint: unzip it to `ColBERT/colbertv2.0`
- Preprocessed training dataset: put the files into `data/msmarco/`
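For reference, the expected layout after these downloads looks like the sketch below; the archive name is an assumption and may differ:

```bash
# Unzip the pretrained ColBERTv2 checkpoint (archive name may differ).
unzip colbertv2.0.zip -d ColBERT/
mkdir -p data/msmarco
# Expected layout afterwards:
#   ColBERT/colbertv2.0/   <- pretrained ColBERTv2 checkpoint
#   data/msmarco/          <- preprocessed training files
```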
Please navigate to `ColBERT/` for ColBERT training. You can directly run
python3 train.py
to launch the training. Adjust the number of GPUs according to your setup.
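For example, to restrict training to specific GPUs you can set the standard `CUDA_VISIBLE_DEVICES` environment variable (generic CUDA behavior, not a repo-specific flag):

```bash
# Train on the first two GPUs only.
CUDA_VISIBLE_DEVICES=0,1 python3 train.py
```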
PairDistill is directly compatible with ColBERT. Most of the instructions for running inference can be found in the original ColBERT repo.
In this repo, we provide instructions on how to run inference on the MS MARCO dev set and perform evaluation.
Please download our pretrained PairDistill checkpoint from Google Drive or Huggingface, and put it in `ColBERT/PairDistill`.
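For example, with the `huggingface_hub` CLI installed, the download could look like the following; `<repo-id>` is a placeholder for the model id in the Huggingface collection linked above:

```bash
huggingface-cli download <repo-id> --local-dir ColBERT/PairDistill
```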
Please navigate to `ColBERT/` for inference. Run
python3 index.py
to build the index with the trained checkpoint. You might need to adjust the path to the trained model.
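If `index.py` follows the standard ColBERTv2 indexing entry point, adjusting the checkpoint path looks roughly like the sketch below; the experiment name and collection path are assumptions, and the script in this repo may differ in details:

```python
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    # Run from inside ColBERT/, so the checkpoint directory is PairDistill/.
    with Run().context(RunConfig(nranks=1, experiment="msmarco")):
        config = ColBERTConfig(nbits=2)  # matches the nbits=2 in the output filenames
        indexer = Indexer(checkpoint="PairDistill", config=config)
        indexer.index(name="msmarco.nbits=2", collection="/path/to/collection.tsv")
```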
Then, run
python3 retrieve.py
to run retrieval. The retrieval results will be saved to `ColBERT/PairDistill/index/{DATETIME}/msmarco.nbits=2.ranking.tsv`.
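For reference, retrieval against a ColBERTv2 index typically follows the pattern below; the queries path and `k` are assumptions, and `retrieve.py` may differ in details:

```python
from colbert import Searcher
from colbert.data import Queries
from colbert.infra import Run, RunConfig

if __name__ == "__main__":
    with Run().context(RunConfig(nranks=1, experiment="msmarco")):
        searcher = Searcher(index="msmarco.nbits=2")
        queries = Queries("/path/to/queries.dev.small.tsv")
        ranking = searcher.search_all(queries, k=1000)
        ranking.save("msmarco.nbits=2.ranking.tsv")  # saved under the experiment directory
```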
You can use the provided script for evaluation. Run
python3 -m utility.evaluate.msmarco_passages \
--ranking ColBERT/PairDistill/index/{DATETIME}/msmarco.nbits=2.ranking.tsv \
--qrels ../data/msmarco/qrels.dev.small.tsv