121 changes: 79 additions & 42 deletions www/docs/api-reference/search-apis/reranking.md
@@ -10,7 +10,6 @@ import {vars} from '@site/static/variables.json';

import CodePanel from '@site/src/theme/CodePanel';


Initial search results often fail to capture nuanced relevance or diversity,
potentially leading to suboptimal user experiences. Utilizing Vectara's
reranking can significantly enhance the quality and usefulness of
@@ -23,17 +22,24 @@ more accurate results.

## Available rerankers

Vectara currently provides the following rerankers:
Vectara offers multiple reranking models that enable you to choose the best one
that for your data and use case. You can evaluate different models
Copilot AI Oct 1, 2025

There's a grammatical error in line 25-26. 'the best one that for your data' should be 'the best one for your data' - remove the word 'that'.

Suggested change
that for your data and use case. You can evaluate different models
for your data and use case. You can evaluate different models

against your own dataset to determine which provides optimal results for your
domain and accuracy and latency requirements.

* [**Multilingual Reranker v1**](/docs/learn/vectara-multi-lingual-reranker) (`type=customer_reranker` and `reranker_name=Rerank_Multilingual_v1`)
also known as Slingshot, provides more accurate neural ranking than the
initial Boomerang retrieval. While computationally more expensive, it offers
improved text scoring across a wide range of languages, making it suitable
for diverse content.
* [**Maximal Marginal Relevance (MMR) Reranker**](/docs/learn/mmr-reranker) (`type=mmr`)
for diversifying results while maintaining relevance.
* [**User Defined Function Reranker**](/docs/learn/user-defined-function-reranker) (`type=userfn`) for
custom scoring based on metadata.
| Reranker Name | API Name | Description |
|--------------|----------|-------------|
| **Qwen3 Reranker** (default) | `qwen3-reranker` | High-performance multilingual neural reranker optimized for accuracy. In many benchmarks, Qwen3 demonstrates strong performance, though results vary by dataset. |
| **Mixbread Reranker** | `mxbai-rerank-base-v2` | Efficient production-friendly model offering a good balance between speed and accuracy. |
| [**Multilingual Reranker v1**](/docs/learn/vectara-multi-lingual-reranker) (Slingshot) | `Rerank_Multilingual_v1` | Neural reranker providing more accurate ranking than initial Boomerang retrieval. While computationally more expensive, it offers improved text scoring across a wide range of languages. |
| [**Maximal Marginal Relevance (MMR) Reranker**](/docs/learn/mmr-reranker) | `type=mmr` | Diversifies results while maintaining relevance. |
| [**User Defined Function Reranker**](/docs/learn/user-defined-function-reranker) | `type=userfn` | Applies custom scoring based on metadata or business rules. |

:::tip
To enable reranking in the Vectara console, navigate to the
Query tab of a corpus and select **Retrieval**. Use this for exploration
and experimenting with the API.
:::

### Chain reranking

@@ -51,32 +57,66 @@ precision while maintaining recall.

## Enable reranking

To enable reranking, specify the appropriate value for the `type` in the
`reranker` object. For the MMR reranker, use `mmr`. In most scenarios,
it makes sense to use the default query `start` value of `0` so that you're
reranking all of the best initial results. You can also set the `limit` of the
`query` to the total number of documents you wish to rerank. The default value
To enable reranking, specify the appropriate value for the `type` in the
`reranker` object. For the MMR reranker, use `mmr`. In most scenarios,
it makes sense to use the default query `start` value of `0` so that you're
reranking all of the best initial results. You can also set the `limit` of the
`query` to the total number of documents you wish to rerank. The default value
is `25`.

The following example shows the `limit` and `type` values in a query. Note that
The following example shows the `limit` and `type` values in a query. Note that
this simplified example intentionally omits several parameter values.

<CodePanel snippets={[{language: "json", code: `{
"query": "What is my question?",
"stream_response": false,
"search": {
"start": 0,
"limit": 25,
"context_configuration": {},
},
"reranker": {
"type": "mmr",
"diversity_bias": "0.4"
},
"generation": [],
"enable_factual_consistency_score": true
  "query": "What is my question?",
  "stream_response": false,
  "search": {
    "start": 0,
    "limit": 25,
    "context_configuration": {}
  },
  "reranker": {
    "type": "mmr",
    "diversity_bias": 0.4
  },
  "generation": [],
  "enable_factual_consistency_score": true
}`}]} title="Code Example" layout="stacked" />
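
As a rough illustration, the request body above could be assembled in Python before it is sent to the query endpoint. This is a minimal sketch, not an official client: the helper name `build_mmr_query` is hypothetical, and the assumptions that `diversity_bias` is a number between 0 and 1 and that omitted fields fall back to service defaults are not confirmed by this page.

```python
import json


def build_mmr_query(query: str, limit: int = 25, diversity_bias: float = 0.4) -> dict:
    """Build a query body that reranks the top `limit` results with MMR.

    Hypothetical helper: field names come from the JSON example above.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Assumption: diversity_bias is a number in the range [0, 1].
    if not 0.0 <= diversity_bias <= 1.0:
        raise ValueError("diversity_bias must be between 0 and 1")
    return {
        "query": query,
        "search": {
            "start": 0,  # rerank starting from the best initial result
            "limit": limit,  # number of results passed to the reranker
        },
        "reranker": {"type": "mmr", "diversity_bias": diversity_bias},
    }


body = build_mmr_query("What is my question?")
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.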

### Using neural rerankers

For neural rerankers like Qwen3, Mixbread, or Multilingual v1, use
`type=customer_reranker` and specify the `reranker_name`.

<CodePanel snippets={[{language: "json", code: `{
"query": "What is quantum computing?",
"reranker": {
"type": "customer_reranker",
"reranker_name": "qwen3-reranker"
}
}`}]} title="Query with Qwen3 Reranker" layout="stacked" />

<CodePanel snippets={[{language: "json", code: `{
"query": "What is quantum computing?",
"reranker": {
"type": "customer_reranker",
"reranker_name": "mxbai-rerank-base-v2"
}
}`}]} title="Query with Mixbread Reranker" layout="stacked" />
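
The two snippets above differ only in `reranker_name`, so a small helper can produce either one. The sketch below is illustrative: `neural_reranker` and `KNOWN_RERANKERS` are hypothetical names, and the set only lists the API names mentioned on this page.

```python
from typing import Optional

# API names taken from the "Available rerankers" table on this page.
KNOWN_RERANKERS = {
    "qwen3-reranker",
    "mxbai-rerank-base-v2",
    "Rerank_Multilingual_v1",
}


def neural_reranker(reranker_name: str, cutoff: Optional[float] = None) -> dict:
    """Return a `reranker` object for one of the neural rerankers.

    Hypothetical helper; it only validates against the names listed above.
    """
    if reranker_name not in KNOWN_RERANKERS:
        raise ValueError(f"unknown reranker: {reranker_name}")
    config = {"type": "customer_reranker", "reranker_name": reranker_name}
    if cutoff is not None:
        config["cutoff"] = cutoff  # optional minimum score, see "Search cutoffs"
    return config


qwen = neural_reranker("qwen3-reranker")
mixbread = neural_reranker("mxbai-rerank-base-v2", cutoff=0.5)
print(qwen)
print(mixbread)
```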

## Best practices

When working with multiple rerankers, consider the following best practices:

* **Experimentation**: Each reranker behaves differently depending on your
content and queries. Evaluate each reranker on your own dataset to determine
which provides the best results for your specific use case.
* **Latency vs. accuracy**: Larger models like Qwen3 tend to provide more
accurate results but can add more latency compared to smaller models like
Mixbread. Test both models to find the right balance for your application.
* **Fallback handling**: Ensure your application handles reranker errors
gracefully and can fall back to retrieval-only results if a reranker fails
or times out.
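
The fallback handling described above might look like the following sketch. It assumes a caller-supplied `run_query` function that sends the request and raises on failure. `run_query`, `RerankerError`, and `fake_run_query` are hypothetical stand-ins, not part of any Vectara SDK.

```python
from typing import Callable, List, Optional


class RerankerError(Exception):
    """Hypothetical error raised when the reranking stage fails."""


def query_with_fallback(
    run_query: Callable[[Optional[dict]], List[str]],
    reranker: dict,
) -> List[str]:
    """Try a reranked query first; on failure, retry without a reranker."""
    try:
        return run_query(reranker)
    except (TimeoutError, RerankerError):
        # Retrieval-only results are better than no results at all.
        return run_query(None)


def fake_run_query(reranker: Optional[dict]) -> List[str]:
    """Stand-in for a real client call; fails whenever a reranker is set."""
    if reranker is not None:
        raise TimeoutError("reranker timed out")
    return ["doc-1", "doc-2"]


results = query_with_fallback(
    fake_run_query,
    {"type": "customer_reranker", "reranker_name": "qwen3-reranker"},
)
print(results)
```

Catching only the specific failure types keeps unrelated bugs, such as a malformed request, from being silently masked by the fallback path.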

## Search cutoffs

@@ -92,9 +132,9 @@ level of relevance. For example, when you set the `cutoff` to `0.5`, only result
with a score of `0.5` or higher are considered. For example:

<CodePanel snippets={[{language: "json", code: `"reranker": {
"type": "customer_reranker",
"reranker_name": "Rerank_Multilingual_v1",
"cutoff": 0.5
"type": "customer_reranker",
"reranker_name": "Rerank_Multilingual_v1",
"cutoff": 0.5
}`}]} title="Code Example" layout="stacked" />
When a reranker is applied with a cutoff, it performs the following steps:

@@ -111,10 +151,11 @@ cutoff is applied first, followed by the limit.
:::

:::caution
Search cutoffs are most effective when used with neural rerankers like
the Vectara Multilingual reranker (Slingshot). This provides normalized
scores between 0 and 1. If you use hybrid search methods that involve BM25,
scores may be unbounded, making cutoff values less predictable.
Search cutoffs are most effective when used with neural rerankers like
Qwen3, Mixbread, or the Vectara Multilingual reranker (Slingshot), which
provide normalized scores between 0 and 1. If you use hybrid search methods
that involve BM25, scores may be unbounded, making cutoff values less
predictable.
:::
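
To make the ordering concrete, the sketch below models "cutoff first, then limit" on a list of already-scored results. It is a client-side illustration of the described behavior under the assumption that scores are normalized to the 0 to 1 range, not the service's implementation. `apply_cutoff_then_limit` is a hypothetical name.

```python
from typing import List, Tuple

ScoredResult = Tuple[str, float]  # (document id, reranker score)


def apply_cutoff_then_limit(
    results: List[ScoredResult], cutoff: float, limit: int
) -> List[ScoredResult]:
    """Drop results scoring below `cutoff`, then keep the top `limit`."""
    # Step 1: the cutoff keeps only results with score >= cutoff.
    kept = [(doc_id, score) for doc_id, score in results if score >= cutoff]
    # Step 2: order by score, highest first.
    kept.sort(key=lambda item: item[1], reverse=True)
    # Step 3: the limit is applied to whatever survived the cutoff.
    return kept[:limit]


scored = [("a", 0.9), ("b", 0.45), ("c", 0.7), ("d", 0.5)]
top = apply_cutoff_then_limit(scored, cutoff=0.5, limit=2)
print(top)  # [('a', 0.9), ('c', 0.7)]
```

Because the cutoff runs first, a query can return fewer results than `limit` when few results clear the threshold.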

## Search limits
@@ -204,8 +245,4 @@ only highly relevant and recent documents for summarization.
2. The next stage prioritizes documents based on their `publish_ts` value,
which represents the publication timestamp.

:::tip
You can also enable reranking in the Vectara console after navigating to the
Query tab of a corpus and selecting **Retrieval**. Use this for exploration
and experimenting with the API.
:::

26 changes: 19 additions & 7 deletions www/docs/learn/knee-reranking.md
@@ -15,17 +15,17 @@ distributions across queries. Knee reranking addresses this challenge by
detecting natural boundaries between relevant and irrelevant results
automatically.

Knee reranking combines statistical analysis with configurable parameters
to provide intelligent, adaptive filtering. Designed specifically to work
after the Slingshot reranker, it analyzes score patterns to identify
significant drops in relevance while maintaining safeguards against
over-aggressive filtering. For more details about how this reranker works, see
Knee reranking combines statistical analysis with configurable parameters
to provide intelligent, adaptive filtering. Designed specifically to work
after the Slingshot reranker, it analyzes score patterns to identify
significant drops in relevance while maintaining safeguards against
over-aggressive filtering. For more details about how this reranker works, see
this [**blog post**](https://www.vectara.com/blog/introducing-the-knee-reranking-smart-result-filtering-for-better-results).

## Enable knee reranking

Enable knee reranking by adding it your reranking chain after the Slingshot
reranker. The default settings balance precision and recall, making them
Enable knee reranking by adding it to your reranking chain after the Slingshot
reranker. The default settings balance precision and recall, making them
suitable for most use cases.

<CodePanel snippets={[{language: "json", code: `{
@@ -38,6 +38,18 @@ suitable for most use cases.
}
}`}]} title="Default Configuration Example" layout="stacked" />

Knee reranking also works with the new neural rerankers like Qwen3 and Mixbread:

<CodePanel snippets={[{language: "json", code: `{
"reranker": {
"type": "chain",
"rerankers": [
{ "type": "customer_reranker", "reranker_name": "qwen3-reranker" },
{ "type": "userfn", "user_function": "knee()", "cutoff": 0.5 }
]
}
}`}]} title="Knee Reranking with Qwen3" layout="stacked" />
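
The same chain can be generated for any of the neural rerankers. The following sketch mirrors the JSON above. The helper name `knee_chain` is hypothetical, and the default `cutoff` of `0.5` is simply copied from the example rather than a recommended value.

```python
import json


def knee_chain(reranker_name: str, cutoff: float = 0.5) -> dict:
    """Build a chain that runs a neural reranker, then knee filtering.

    Hypothetical helper; the structure follows the JSON example above.
    """
    return {
        "reranker": {
            "type": "chain",
            "rerankers": [
                # Stage 1: neural reranker produces the scores.
                {"type": "customer_reranker", "reranker_name": reranker_name},
                # Stage 2: knee() filters on the score pattern from stage 1.
                {"type": "userfn", "user_function": "knee()", "cutoff": cutoff},
            ],
        }
    }


chain = knee_chain("mxbai-rerank-base-v2")
print(json.dumps(chain, indent=2))
```

Order matters in the list: the knee stage must come after the neural reranker because it analyzes the scores that stage produces.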

Customize the behavior of knee reranking through two key parameters:

* **Sensitivity:** Controls how sharply the score must drop to identify a cutoff.
46 changes: 24 additions & 22 deletions www/docs/learn/vectara-multi-lingual-reranker.md
@@ -11,28 +11,30 @@ import {vars} from '@site/static/variables.json';
import CodePanel from '@site/src/theme/CodePanel';


Generative AI applications often struggle with ranking the most relevant
information, leading to hallucinations and irrelevant responses. The new
Vectara Multilingual Reranker V1, also known as Slingshot, is a
state-of-the-art reranking model that significantly enhances the precision of
retrieved results. Providing advanced neural ranking, it refines the output of
initial models like [Boomerang](https://vectara.com/blog/introducing-boomerang-vectaras-new-and-improved-retrieval-model/),
offering even more accurate document scoring and response quality in Retrieval
Augmented Generation (RAG) pipelines.

The Vectara Multilingual Reranker operates as a second-pass refinement tool,
building on Boomerang’s high-recall capabilities. While Boomerang quickly
retrieves a broad set of relevant documents, the Multilingual Reranker
delivers more precise results, ensuring that the top-ranked documents are the
most relevant. This reranker also excels across both English and multilingual
datasets, making it a powerful tool for global use cases.

While more computationally expensive and introducing some additional latency,
the multilingual reranker improves neural ranking beyond Boomerang’s initial
selection by providing more precise text scoring. Think of the Slingshot
reranker as a "better Boomerang" for refining results, with the multilingual
capability serving primarily as a differentiator from other rerankers in the
market, which are often English-only.
Generative AI applications often struggle with ranking the most relevant
information, leading to hallucinations and irrelevant responses. The Vectara
Multilingual Reranker V1, also known as Slingshot, is a neural reranking model
that enhances the precision of retrieved results. Providing advanced neural
ranking, it refines the output of initial models like [Boomerang](https://vectara.com/blog/introducing-boomerang-vectaras-new-and-improved-retrieval-model/),
offering improved document scoring and response quality in Retrieval Augmented
Generation (RAG) pipelines.

The Vectara Multilingual Reranker operates as a second-pass refinement tool,
building on Boomerang's high-recall capabilities. While Boomerang quickly
retrieves a broad set of relevant documents, the Multilingual Reranker
delivers more precise results, ensuring that the top-ranked documents are the
most relevant. This reranker excels across both English and multilingual
datasets, making it a strong choice for global use cases.

While more computationally expensive and introducing some additional latency,
the multilingual reranker improves neural ranking beyond Boomerang's initial
selection by providing more precise text scoring. The multilingual capability
serves as a key differentiator, as many market rerankers are English-only.

Vectara now offers multiple reranking models including Qwen3 (the default for
SaaS) and Mixbread. You should evaluate different rerankers on your own
dataset to determine which provides the best results for your specific use
case and latency requirements.

Using this reranker requires both the `type` and `reranker_name` in the
`reranker` object. Set the `type` as `customer_reranker` and the `reranker_name`
28 changes: 28 additions & 0 deletions www/docs/release-notes.mdx
@@ -20,6 +20,34 @@ and how these product and documentation changes can benefit your enterprise.

---

## New Reranking Models: Qwen3 and Mixbread

_September 30, 2025_

Vectara now offers two new neural reranking models: **Qwen3** and **Mixbread**.
These models provide more flexibility to optimize search result relevance
based on your specific accuracy and latency requirements.

**Why it matters:** Each use case demands different tradeoffs. With multiple
available rerankers, you can evaluate and select the model that best fits your
data and performance needs.

**What's new:**

- **Qwen3 Reranker** (`qwen3-reranker`): High-accuracy multilingual model, now the
default option. Optimized for precision across diverse datasets.
- **Mixbread Reranker** (`mxbai-rerank-base-v2`): Efficient model balancing speed
and accuracy for high-volume production workloads.
- Override defaults per query or combine rerankers in chains for advanced strategies.
- Evaluate all rerankers against your data to find the optimal fit.

**More information:**

* [Reranking](/docs/api-reference/search-apis/reranking)
* [Chain Reranker](/docs/learn/chain-reranker)

---

## Vectara Agents Framework

_September 3, 2025_