Skip to content

Latest commit

 

History

History
306 lines (260 loc) · 16.1 KB

experiments-vector-prf.md

File metadata and controls

306 lines (260 loc) · 16.1 KB

Pyserini: Reproducing Vector PRF Results

This guide provides instructions to reproduce the Vector PRF in the following work and on all datasets and DR models available in Pyserini:

Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

And the ANCE-PRF model from the following work:

HongChien Yu, Chenyan Xiong, Jamie Callan. Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

Starting with v0.12.0, you can reproduce these results directly from the Pyserini PyPI package. Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature. See package installation notes for more details.

Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS). However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective. Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.

Summary

Here's how our results stack up against all available models and datasets in Pyserini:

Passage Ranking Datasets

TREC DL 2019 Passage

Model Method MAP nDCG@100 Recall@1000
ANCE Original 0.3710 0.5540 0.7554
ANCE Average PRF 3 0.4247 0.5937 0.7739
ANCE Rocchio PRF 5 A0.4 B0.6 0.4211 0.5928 0.7825
ANCE-PRF PRF 3 0.4250 0.5966 0.7915
TCT-ColBERT V1 Original 0.3906 0.5730 0.7916
TCT-ColBERT V1 Average PRF 3 0.4336 0.6119 0.8230
TCT-ColBERT V1 Rocchio PRF 5 A0.4 B0.6 0.4463 0.6143 0.8393
TCT-ColBERT V2 Original 0.4269 0.6129 0.8342
TCT-ColBERT V2 Average PRF 3 0.4766 0.6487 0.8574
TCT-ColBERT V2 Rocchio PRF 5 A0.4 B0.6 0.4709 0.6435 0.8496
DistillBERT KD Original 0.3759 0.5765 0.6853
DistillBERT KD Average PRF 3 0.4362 0.6217 0.7180
DistillBERT KD Rocchio PRF 5 A0.4 B0.6 0.4378 0.6189 0.7291
DistillBERT Balanced Original 0.4761 0.6360 0.7826
DistillBERT Balanced Average PRF 3 0.5057 0.6526 0.8054
DistillBERT Balanced Rocchio PRF 5 A0.4 B0.6 0.5249 0.6684 0.8352
SBERT Original 0.4060 0.5985 0.7872
SBERT Average PRF 3 0.4354 0.6149 0.7937
SBERT Rocchio PRF 5 A0.4 B0.6 0.4371 0.6149 0.7941
ADORE Original 0.4188 0.5946 0.7759
ADORE Average PRF 3 0.4672 0.6263 0.7890
ADORE Rocchio PRF 5 A0.4 B0.6 0.4629 0.6325 0.7950

TREC DL 2020 Passage

Model Method MAP nDCG@100 Recall@1000
ANCE Original 0.4076 0.5679 0.7764
ANCE Average PRF 3 0.4325 0.5793 0.7909
ANCE Rocchio PRF 5 A0.4 B0.6 0.4315 0.5800 0.7957
ANCE-PRF PRF 3 0.4338 0.5815 0.7942
TCT-ColBERT V1 Original 0.4290 0.5826 0.8181
TCT-ColBERT V1 Average PRF 3 0.4725 0.6101 0.8667
TCT-ColBERT V1 Rocchio PRF 5 A0.4 B0.6 0.4625 0.6056 0.8576
TCT-ColBERT V2 Original 0.4717 0.6200 0.8407
TCT-ColBERT V2 Average PRF 3 0.4701 0.6209 0.8739
TCT-ColBERT V2 Rocchio PRF 5 A0.4 B0.6 0.4819 0.6324 0.8760
DistillBERT KD Original 0.3909 0.5728 0.6893
DistillBERT KD Average PRF 3 0.3955 0.5755 0.7279
DistillBERT KD Rocchio PRF 5 A0.4 B0.6 0.3990 0.5760 0.7222
DistillBERT Balanced Original 0.4755 0.6346 0.8009
DistillBERT Balanced Average PRF 3 0.4873 0.6449 0.8392
DistillBERT Balanced Rocchio PRF 5 A0.4 B0.6 0.4846 0.6470 0.8262
SBERT Original 0.4124 0.5734 0.7937
SBERT Average PRF 3 0.4258 0.5781 0.8169
SBERT Rocchio PRF 5 A0.4 B0.6 0.4342 0.5851 0.8226
ADORE Original 0.4418 0.5949 0.8151
ADORE Average PRF 3 0.4706 0.6176 0.8323
ADORE Rocchio PRF 5 A0.4 B0.6 0.4760 0.6193 0.8251

MS MARCO Passage V1

The PRF does not perform well with sparse judgements like in MS MARCO, the results here are just complements.

Model Method MAP nDCG@100 Recall@1000
ANCE Original 0.3363 0.4457 0.9584
ANCE Average PRF 3 0.3132 0.4246 0.9490
ANCE Rocchio PRF 5 A0.4 B0.6 0.3116 0.4250 0.9547
ANCE-PRF PRF 3 0.3468 0.4554 0.9596
TCT-ColBERT V1 Original 0.3416 0.4514 0.9640
TCT-ColBERT V1 Average PRF 3 0.2882 0.4014 0.9452
TCT-ColBERT V1 Rocchio PRF 5 A0.4 B0.6 0.2809 0.3988 0.9543
TCT-ColBERT V2 Original 0.3504 0.4625 0.9679
TCT-ColBERT V2 Average PRF 3 0.3055 0.4189 0.9547
TCT-ColBERT V2 Rocchio PRF 5 A0.4 B0.6 0.3002 0.4190 0.9627
DistillBERT KD Original 0.3309 0.4391 0.9553
DistillBERT KD Average PRF 3 0.2830 0.3940 0.9325
DistillBERT KD Rocchio PRF 5 A0.4 B0.6 0.2787 0.3937 0.9432
DistillBERT Balanced Original 0.3515 0.4651 0.9771
DistillBERT Balanced Average PRF 3 0.2978 0.4150 0.9613
DistillBERT Balanced Rocchio PRF 5 A0.4 B0.6 0.2969 0.4178 0.9702
SBERT Original 0.3373 0.4453 0.9558
SBERT Average PRF 3 0.3094 0.4183 0.9446
SBERT Rocchio PRF 5 A0.4 B0.6 0.3034 0.4157 0.9529
ADORE Original 0.3523 0.4637 0.9688
ADORE Average PRF 3 0.3188 0.4330 0.9583
ADORE Rocchio PRF 5 A0.4 B0.6 0.3209 0.4376 0.9669

Reproducing Results

To reproduce the Average Vector PRF on different models, same command with different parameter values can be used:

$ python -m pyserini.dsearch --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg

To reproduce the Rocchio Vector PRF on different models, similar with Average:

$ python -m pyserini.dsearch --topics topic \
    --index index \
    --encoder encoder \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.rocchio_prf5_a0.4_b0.6.trec \
    --prf-depth 5 \
    --prf-method rocchio \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6

For different models and datasets, the --topics, --index, and --encoder are different, since Pyserini has all these datasets available, we can pass in different values to run on different datasets.

--topics:
    TREC DL 2019 Passage: dl19-passage
    TREC DL 2020 Passage: dl20
    MS MARCO Passage V1: msmarco-passage-dev-subset

--index:
    ANCE index with MS MARCO V1 passage collection: msmarco-passage-ance-bf
    TCT-ColBERT V1 index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-bf
    TCT-ColBERT V2 index with MS MARCO V1 passage collection: msmarco-passage-tct_colbert-v2-bf
    DistillBERT KD index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-margin_mse-T2-bf
    DistillBERT Balanced index with MS MARCO V1 passage collection: msmarco-passage-distilbert-dot-tas_b-b256-bf
    SBERT index with MS MARCO V1 passage collection: msmarco-passage-sbert-bf

Note: TREC DL 2019, TREC DL 2020, and MS MARCO Passage V1 use the same passage collection, so the index of the same model will be the same among these three datasets.

--encoder:
    ANCE: castorini/ance-msmarco-passage
    TCT-ColBERT V1: castorini/tct_colbert-msmarco
    TCT-ColBERT V2: castorini/tct_colbert-v2-hnp-msmarco
    DistillBERT KD: sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco
    DistillBERT Balanced: sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco
    SBERT: sentence-transformers/msmarco-distilbert-base-v3

Note: If you have pre-computed queries available, the --encoder can be replaced with --encoded-queries to avoid "on-the-fly" query encoding by passing in the path to your pre-computed query file. For example, Pyserini has the ANCE pre-computed query available for MS MARCO Passage V1, so instead of using --encoder castorini/ance-msmarco-passage, one can use --encoded-queries ance-msmarco-passage-dev-subset. For ADORE model, you can only use --encoded-queries, otf encoding is not available.

With these parameters, one can easily reproduce the results above, for example, to reproduce TREC DL 2019 Passage with ANCE Average Vector PRF 3 the command will be:

$ python -m pyserini.dsearch --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.average_prf3.trec \
    --prf-depth 3 \
    --prf-method avg

To reproduce TREC DL 2019 Passage with ANCE Rocchio Vector PRF 5 Alpha 0.4 Beta 0.6, the command will be:

$ python -m pyserini.dsearch --topics dl19-passage \
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 64 \
    --threads 12 \
    --output runs/run.ance.dl19-passage.rocchio_prf5_a0.4_b0.6.trec \
    --prf-depth 5 \
    --prf-method rocchio \
    --rocchio-alpha 0.4 \
    --rocchio-beta 0.6

To evaluate, we use trec_eval built in Pyserini:

For TREC DL 2019, use this command to evaluate your run file:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl19-passage runs/run.ance.dl19-passage.average_prf3.trec
map                 all     0.4247
ndcg_cut_100        all     0.5937
recall_1000         all     0.7739

Qrels file is available in Pyserini, so just replace the runs/run.ance.dl19-passage.average_prf3.trec with your own run file path to test your reproduced results.

Similarly, for TREC DL 2020:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 -l 2 dl20-passage runs/run.ance.dl20-passage.average_prf3.trec
map                 all     0.4325
ndcg_cut_100        all     0.5793
recall_1000         all     0.7909

Qrels file also available in Pyserini, just replace the runs/run.ance.dl20-passage.average_prf3.trec with your own run file path to test your reproduced results.

For MS MARCO Passage V1, no need to use -l 2 option:

$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.100 -m recall.1000 msmarco-passage-dev-subset runs/run.ance.msmarco-passage.average_prf3.trec
map                 all     0.3132
ndcg_cut_100        all     0.4246
recall_1000         all     0.9490

Qrels file already available, replace the runs/run.ance.msmarco-passage.average_prf3.trec with your own run file path to test your reproduced results.

Reproducing ANCE-PRF Results

To reproduce the ANCE-PRF results, it has certain limitations.

First of all, different PRF depths need to use different --ance-prf-encoder, the one we provided is only for k=3, which means it can only do --prf-depth 3.

Second, it takes two more parameters, one --ance-prf-encoder which points to the checkpoint directory, and --sparse-index that points to a lucene index.

For the lucene index, it needs to have --storeRaw enabled when building the index.

To reproduce TREC DL 2019 Passage, use the command below, change --ance-prf-encoder to the path that stores the checkpoint:

$ python -m pyserini.dsearch --topics dl19-passage \                                               
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 32 \
    --output runs/run.dl19-passage.ance-prf3.trec \
    --prf-depth 3 \
    --prf-method ance-prf \
    --threads 12 \
    --sparse-index msmarco-passage \
    --ance-prf-encoder ckpt/ance_prf_k3_checkpoint

To evaluate:

$ python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 -l 2 dl19-passage runs/run.dl19-passage.ance-prf3.trec
ndcg_cut_10         all     0.6776
recall_1000         all     0.7915

For TREC DL 2020 Passage:

$ python -m pyserini.dsearch --topics dl20 \                                               
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 32 \
    --output runs/run.dl20-passage.ance-prf3.trec \
    --prf-depth 3 \
    --prf-method ance-prf \
    --threads 12 \
    --sparse-index msmarco-passage \
    --ance-prf-encoder ckpt/ance_prf_k3_checkpoint

To evaluate:

$ python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 -l 2 dl20-passage runs/run.dl20-passage.ance-prf3.trec
ndcg_cut_10         all     0.6736
recall_1000         all     0.7942

For MS MARCO V1 Passage:

$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \                                               
    --index msmarco-passage-ance-bf \
    --encoder castorini/ance-msmarco-passage \
    --batch-size 32 \
    --output runs/run.marco-passagev1.ance-prf3.tsv \
    --prf-depth 3 \
    --prf-method ance-prf \
    --threads 12 \
    --sparse-index msmarco-passage \
    --ance-prf-encoder ckpt/ance_prf_k3_checkpoint \
    --output-format msmarco

To evaluate:

$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.marco-passagev1.ance-prf3.tsv
#####################
MRR @10: 0.34179202483285515
QueriesRanked: 6980
#####################
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.marco-passagev1.ance-prf3.tsv --output runs/run.marco-passagev1.ance-prf3.trec
$ python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 msmarco-passage-dev-subset runs/run.marco-passagev1.ance-prf3.trec
ndcg_cut_10         all     0.3991
recall_1000         all     0.9596

Reproduction Log*