Add Evaluation of Rankers and RRF Techniques for Retrieval Pipelines #1

Merged 1 commit on Jun 24, 2024
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
  - package-ecosystem: 'github-actions'
    directory: '/'
    schedule:
      interval: 'daily'
26 changes: 26 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,26 @@
name: Release

on:
  push:
    tags:
      - "v[0-9].[0-9]+.[0-9]+*"

jobs:
  release-on-pypi:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Hatch
        run: pip install hatch

      - name: Build
        run: hatch build

      - name: Publish on PyPI
        env:
          HATCH_INDEX_USER: __token__
          HATCH_INDEX_AUTH: ${{ secrets.PYPI_API_TOKEN }}
        run: hatch publish -y
45 changes: 45 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,45 @@
name: Test

on:
  push:
    branches:
      - main
  pull_request:

concurrency:
  group: test-${{ github.head_ref }}
  cancel-in-progress: true

env:
  PYTHONUNBUFFERED: "1"
  FORCE_COLOR: "1"
  HF_API_TOKEN: ${{ secrets.HF_API_TOKEN }}

jobs:
  run:
    name: Python ${{ matrix.python-version }} on ${{ startsWith(matrix.os, 'macos-') && 'macOS' || startsWith(matrix.os, 'windows-') && 'Windows' || 'Linux' }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-12]
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]

    steps:
      - name: Support longpaths
        if: matrix.os == 'windows-latest'
        run: git config --system core.longpaths true

      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Hatch
        run: pip install --upgrade hatch

      - name: Lint
        if: matrix.python-version == '3.9' && runner.os == 'Linux'
        run: hatch run lint:all
65 changes: 64 additions & 1 deletion README.md
@@ -1 +1,64 @@
# rrf
## Performance Evaluation of Rankers and RRF Techniques for Retrieval Pipelines

**Paper:** [Performance Evaluation of Rankers and RRF Techniques for Retrieval Pipelines](paper/rankers_rrf.pdf)

In the intricate world of Long-form Question Answering (LFQA) and Retrieval Augmented Generation (RAG), making the most of the LLM’s context window is paramount. Any wasted space or repetitive content limits the depth and breadth of the answers we can extract and generate. It’s a delicate balancing act to lay out the content of the context window appropriately.

With the addition of three rankers, viz., the Diversity Ranker, the Lost In The Middle Ranker, and the Similarity Ranker, together with RRF techniques, we aim to address these challenges and improve the answers generated by LFQA/RAG pipelines. We performed a comparative study of different combinations of rankers in a retrieval pipeline and evaluated the results on four metrics: Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP), Recall, and Precision.

In our study, we consider the following cases of retrieval:

<img src="plots/pipelines_taxonomy.png" alt="RAG Pipelines Taxonomy" align="middle" width="600" height="300">

The following rankers were used:

- **Diversity Ranker:** The Diversity Ranker enhances the diversity of the paragraphs selected for the context window.

- **Lost In The Middle Ranker:** The Lost In The Middle Ranker optimizes the layout of the selected documents in the LLM’s context window.

- **Transformers Similarity Ranker:** The Transformers Similarity Ranker ranks Documents based on how similar they are to the query. It uses a pre-trained cross-encoder model that jointly encodes each query-document pair and produces a similarity score, and the Documents are then ordered by these scores.
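
These rankers can be chained in sequence. The sketch below is illustrative rather than taken from this repository: it wires the three rankers together using Haystack 2.x component names (`TransformersSimilarityRanker`, `SentenceTransformersDiversityRanker`, `LostInTheMiddleRanker`); the exact models and parameters used in our pipelines may differ.

```python
from haystack import Document, Pipeline
from haystack.components.rankers import (
    LostInTheMiddleRanker,
    SentenceTransformersDiversityRanker,
    TransformersSimilarityRanker,
)

docs = [
    Document(content="RRF fuses rankings produced by multiple retrievers."),
    Document(content="Reciprocal rank fusion combines several ranked lists."),
    Document(content="BM25 is a sparse, lexical retrieval method."),
]

pipeline = Pipeline()
# Score documents against the query with a cross-encoder.
pipeline.add_component("similarity", TransformersSimilarityRanker(model="BAAI/bge-reranker-large"))
# Reorder the ranked documents to reduce redundancy.
pipeline.add_component("diversity", SentenceTransformersDiversityRanker())
# Place the strongest documents at the edges of the context window.
pipeline.add_component("litm", LostInTheMiddleRanker())
pipeline.connect("similarity.documents", "diversity.documents")
pipeline.connect("diversity.documents", "litm.documents")

query = "How does reciprocal rank fusion work?"
result = pipeline.run({"similarity": {"query": query, "documents": docs}, "diversity": {"query": query}})
print([d.content for d in result["litm"]["documents"]])
```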

**Dense Retrieval:**

For dense retrieval, the `INSTRUCTOR-XL` and `all-mpnet-base-v2` embedding models were employed.

<img src="plots/rankers_dense_pipeline.png" alt="Dense Pipeline with Rankers" align="middle" width="550" height="100">

**Hybrid Retrieval:**

BM25 was used for sparse retrieval in the hybrid pipelines. The `bge-reranker-large` model was used in the Similarity Ranker, and `ms-marco-MiniLM-L-12-v2` in the Diversity Ranker.

**Reciprocal Rank Fusion** (RRF) was used to combine the results for Hybrid retrieval.

<img src="plots/rankers_hybrid_pipeline.png" alt="Hybrid Pipeline with Rankers" align="middle" width="820" height="230">
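
For reference, here is a minimal, self-contained sketch of how RRF combines ranked lists. It uses the conventional constant k = 60; the constant used in our pipelines may differ.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a dense ranking with a BM25 ranking.
dense = ["d1", "d2", "d3"]
sparse = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```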

## Usage

To run the pipelines, you will need to clone this repository and install the required libraries.

1. Install the `rrf` package:

```bash
git clone https://github.com/avnlp/rrf
cd rrf
pip install -e .
```

2. To add the data to an index in Pinecone using the INSTRUCTOR-XL embedding model (a rough sketch of this step is shown below the list):

```bash
cd src/rrf/indexing_pipeline/fiqa
python pinecone_instructor_index.py
```

3. To run a specific pipeline, navigate to its directory and run the corresponding file.
For example, to run the pipeline that uses dense retrieval with a combination of the Diversity Ranker, Lost In The Middle Ranker, and Similarity Ranker:

```bash
cd src/rrf/pointwise/instructor_xl/fiqa/
python dense_similarity_diversity_litm.py
```
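
The indexing step (step 2 above) roughly follows the pattern below. This is an illustrative sketch, not the repository's exact script: the constructor arguments for `PineconeDocumentStore` and `InstructorDocumentEmbedder` are assumptions and vary across `pinecone-haystack` and `instructor-embedders-haystack` versions.

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.instructor_embedders import InstructorDocumentEmbedder
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Assumes PINECONE_API_KEY is set in the environment; the index name and
# dimension (768 for INSTRUCTOR-XL embeddings) are illustrative values.
document_store = PineconeDocumentStore(index="fiqa", dimension=768)

indexing = Pipeline()
indexing.add_component("embedder", InstructorDocumentEmbedder(model="hkunlp/instructor-xl"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder.documents", "writer.documents")

# Write a toy document; the real scripts embed and write the BEIR corpus.
indexing.run({"embedder": {"documents": [Document(content="Example FiQA passage.")]}})
```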

## License

The source files are distributed under the [MIT License](https://github.com/avnlp/rrf/blob/main/LICENSE).
Binary file added paper/rankers_rrf.pdf
Binary file not shown.
Binary file added plots/pipelines_taxonomy.png
Binary file not shown.
Binary file added plots/ranker_pipeline.jpg
Binary file not shown.
Binary file added plots/rankers_dense_pipeline.png
Binary file not shown.
Binary file added plots/rankers_hybrid_pipeline.png
Binary file not shown.
181 changes: 181 additions & 0 deletions pyproject.toml
@@ -0,0 +1,181 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "rrf"
dynamic = ["version"]
description = 'Performance Evaluation of Rankers and RRF Techniques for Retrieval Pipelines'
readme = "README.md"
requires-python = ">=3.8"
license = "MIT"
keywords = ["RAG", "Retrieval", "Rankers", "LLMs", "RRF"]
authors = [
{ name = "Ashwin Mathur", email = "" },
{ name = "Varun Mathur", email = "" },
]
maintainers = [
{ name = "Ashwin Mathur", email = "" },
{ name = "Varun Mathur", email = "" },
]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Science/Research",
"License :: Freely Distributable",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: Implementation :: PyPy",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]

dependencies = [
"typing_extensions",
"haystack-ai",
"sentence-transformers",
"instructor-embedders-haystack",
"beir",
"pinecone-haystack",
"chroma-haystack",
"weaviate-haystack",
"llama-cpp-haystack",
]


[project.urls]
Documentation = "https://github.com/avnlp/rrf#readme"
Issues = "https://github.com/avnlp/rrf/issues"
Source = "https://github.com/avnlp/rrf"

[tool.hatch.build.targets.wheel]
packages = ["src/rrf"]

[tool.hatch.version]
path = "src/rrf/__about__.py"

[tool.hatch.envs.default]
dependencies = ["coverage[toml]>=6.5", "coveralls", "pytest"]

[tool.hatch.envs.default.scripts]
test = "pytest {args:tests}"
test-cov = "coverage run -m pytest {args:tests}"
cov-report = ["- coverage combine", "coverage xml"]
cov = ["test-cov", "cov-report"]

[[tool.hatch.envs.all.matrix]]
python = ["3.8", "3.9", "3.10", "3.11", "3.12"]

[tool.hatch.envs.lint]
detached = true
dependencies = ["black>=23.1.0", "mypy>=1.0.0", "ruff>=0.0.243"]

[tool.hatch.envs.lint.scripts]
typing = "mypy --install-types --non-interactive {args:src/rrf}"
style = ["ruff check {args:.}", "black --check --diff {args:.}"]
fmt = ["black {args:.}", "ruff check --fix {args:.}", "style"]
all = ["fmt", "typing"]


[tool.hatch.metadata]
allow-direct-references = true

[tool.black]
target-version = ["py37"]
line-length = 120
skip-string-normalization = true

[tool.ruff]
target-version = "py37"
line-length = 120
lint.select = [
"A",
"ARG",
"B",
"C",
"DTZ",
"E",
"EM",
"F",
"FBT",
"I",
"ICN",
"ISC",
"N",
"PLC",
"PLE",
"PLR",
"PLW",
"Q",
"RUF",
"S",
"T",
"TID",
"UP",
"W",
"YTT",
]
lint.ignore = [
# Allow non-abstract empty methods in abstract base classes
"B027",
# Allow boolean positional values in function calls, like `dict.get(... True)`
"FBT003",
# Ignore checks for possible passwords
"S105",
"S106",
"S107",
# Ignore complexity
"C901",
"PLR0911",
"PLR0912",
"PLR0913",
"PLR0915",
# Ignore print statements
"T201",
"E501",
]
lint.unfixable = [
# Don't touch unused imports
"F401",
]

[tool.ruff.lint.isort]
known-first-party = ["rrf"]

[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.ruff.lint.per-file-ignores]
# Tests can use magic values, assertions, and relative imports
"tests/**/*" = ["PLR2004", "S101", "TID252"]

[tool.coverage.run]
source_pkgs = ["rrf", "tests"]
branch = true
parallel = true
omit = ["src/rrf/__about__.py", "examples"]

[tool.coverage.paths]
rrf = [
"src/rrf",
"*/rrf/src/rrf",
]
tests = ["tests", "*rrf/tests"]

[tool.coverage.report]
exclude_lines = ["no cov", "if __name__ == .__main__.:", "if TYPE_CHECKING:"]

[tool.pytest.ini_options]
minversion = "6.0"
addopts = "-vv"
markers = ["unit: unit tests", "integration: integration tests"]

[tool.mypy]
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = ["haystack.*", "pytest.*"]
ignore_missing_imports = true
1 change: 1 addition & 0 deletions src/rrf/__about__.py
@@ -0,0 +1 @@
__version__ = "0.0.1"
4 changes: 4 additions & 0 deletions src/rrf/__init__.py
@@ -0,0 +1,4 @@
from rrf.beir_dataloader import BeirDataloader
from rrf.beir_evaluator import BeirEvaluator

__all__ = ["BeirEvaluator", "BeirDataloader"]
26 changes: 26 additions & 0 deletions src/rrf/beir_dataloader.py
@@ -0,0 +1,26 @@
import os
from typing import Any, Dict, Optional, Tuple

from beir import util
from beir.datasets.data_loader import GenericDataLoader


class BeirDataloader:
    """Downloads and loads a BEIR benchmark dataset."""

    def __init__(self, dataset: str):
        self.dataset = dataset

    def download_and_unzip(self) -> str:
        """Download the dataset zip from the BEIR repository and unzip it."""
        url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{self.dataset}.zip"
        out_dir = os.path.join(os.getcwd(), "datasets")
        self.data_path = util.download_and_unzip(url, out_dir)
        print(f"Dataset downloaded here: {self.data_path}")
        return self.data_path

    def load(
        self, data_path: Optional[str] = None, split: str = "test"
    ) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
        """Load the corpus, queries, and qrels for the given split."""
        if data_path:
            self.data_path = data_path
        corpus, queries, qrels = GenericDataLoader(self.data_path).load(split=split)
        return corpus, queries, qrels
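
For context, a usage sketch for the loader above (illustrative, not part of the PR; `EvaluateRetrieval` is BEIR's standard evaluation API from the `beir` dependency, and `results` is a placeholder for a retrieval pipeline's output):

```python
from beir.retrieval.evaluation import EvaluateRetrieval

from rrf import BeirDataloader

# Download the FiQA dataset and load its test split.
dataloader = BeirDataloader(dataset="fiqa")
data_path = dataloader.download_and_unzip()
corpus, queries, qrels = dataloader.load(data_path=data_path, split="test")

# `results` maps query IDs to {doc_id: score}, as produced by any of the
# retrieval pipelines; NDCG, MAP, Recall, and Precision come from BEIR.
results = {qid: {} for qid in queries}  # placeholder for real pipeline output
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, k_values=[1, 3, 5, 10])
print(ndcg, _map, recall, precision)
```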