
Commit a4316cc

Elastic Rerank model landing page (#2884) (#2891)
(cherry picked from commit 65aa83a) Co-authored-by: Liam Thompson <[email protected]>
1 parent 03eea32 commit a4316cc

File tree

2 files changed: +357 −0 lines changed

docs/en/stack/ml/nlp/index.asciidoc

+1 −0 lines changed

@@ -9,6 +9,7 @@ include::ml-nlp-inference.asciidoc[leveloffset=+1]
 include::ml-nlp-apis.asciidoc[leveloffset=+1]
 include::ml-nlp-built-in-models.asciidoc[leveloffset=+1]
 include::ml-nlp-elser.asciidoc[leveloffset=+2]
+include::ml-nlp-elastic-rerank.asciidoc[leveloffset=+2]
 include::ml-nlp-e5.asciidoc[leveloffset=+2]
 include::ml-nlp-lang-ident.asciidoc[leveloffset=+2]
 include::ml-nlp-model-ref.asciidoc[leveloffset=+1]
docs/en/stack/ml/nlp/ml-nlp-elastic-rerank.asciidoc

+356 −0 lines changed
@@ -0,0 +1,356 @@

[[ml-nlp-rerank]]
= Elastic Rerank

Elastic Rerank is a state-of-the-art cross-encoder reranking model trained by Elastic that helps you improve search relevance with a few simple API calls.
Elastic Rerank is Elastic's first semantic reranking model and is available out-of-the-box in supporting Elastic deployments using the {es} Inference API.

Use Elastic Rerank to improve existing search applications including:

* Traditional BM25 scoring
* Hybrid semantic search
* Retrieval Augmented Generation (RAG)

The model can significantly improve search result quality by reordering results based on a deeper semantic understanding of queries and documents.

When reranking BM25 results, it provides an average 40% improvement in ranking quality on a diverse benchmark of retrieval tasks, matching the performance of models 11x its size.

[discrete]
[[ml-nlp-rerank-availability]]
== Availability and requirements

experimental[]

[discrete]
[[ml-nlp-rerank-availability-serverless]]
=== Elastic Cloud Serverless

Elastic Rerank is available in {es} Serverless projects as of November 25, 2024.

[discrete]
[[ml-nlp-rerank-availability-elastic-stack]]
=== Elastic Cloud Hosted and self-managed deployments

Elastic Rerank is available in Elastic Stack version 8.17+. To use it, you need:

* The appropriate subscription level or an active trial period
* A 4GB ML node
+
[IMPORTANT]
====
Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. The current maximum size for trial ML nodes is 4GB (defaults to 1GB).
====

[discrete]
[[ml-nlp-rerank-deploy]]
== Download and deploy

To download and deploy Elastic Rerank, use the {ref}/infer-service-elasticsearch.html[create inference API] to create an {es} service `rerank` endpoint.

[discrete]
[[ml-nlp-rerank-deploy-steps]]
=== Create an inference endpoint

. In {kib}, navigate to the *Dev Console*.

. Create an {infer} endpoint with the Elastic Rerank service by running:
+
[source,console]
----------------------------------
PUT _inference/rerank/my-rerank-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 10
    },
    "num_threads": 1,
    "model_id": ".rerank-v1"
  }
}
----------------------------------
+
NOTE: The API request automatically downloads and deploys the model. This example uses <<ml-nlp-auto-scale,autoscaling>> through adaptive allocation.

[NOTE]
====
You might see a 502 bad gateway error in the response when using the {kib} Console.
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
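
To confirm that the endpoint was created and the model deployed, you can retrieve the endpoint's configuration (a minimal check, using the endpoint name from the example above):

[source,console]
----------------------------------
GET _inference/rerank/my-rerank-model
----------------------------------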

After creating the Elastic Rerank {infer} endpoint, it's ready to use with a {ref}/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank[`text_similarity_reranker`] retriever.
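
For example, the following sketch reranks the top BM25 hits for a query; it assumes a hypothetical index `my-index` with a text field `content`:

[source,console]
----------------------------------
GET my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": { "content": "how does reranking improve relevance?" }
          }
        }
      },
      "field": "content",
      "inference_id": "my-rerank-model",
      "inference_text": "how does reranking improve relevance?"
    }
  }
}
----------------------------------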

[discrete]
[[ml-nlp-rerank-deploy-verify]]
== Deploy in an air-gapped environment

If you want to deploy the Elastic Rerank model in a restricted or closed network, you have two options:

* Create your own HTTP/HTTPS endpoint with the model artifacts on it.
* Put the model artifacts into a directory inside the config directory on all master-eligible nodes.

[discrete]
[[ml-nlp-rerank-model-artifacts]]
=== Model artifact files

For the cross-platform version, you need the following files in your system:
```
https://ml-models.elastic.co/rerank-v1.metadata.json
https://ml-models.elastic.co/rerank-v1.pt
https://ml-models.elastic.co/rerank-v1.vocab.json
```
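
For example, you can fetch the artifacts with `wget` (a minimal sketch, run on a machine with internet access before transferring the files into your network):

[source, shell]
--------------------------------------------------
wget https://ml-models.elastic.co/rerank-v1.metadata.json
wget https://ml-models.elastic.co/rerank-v1.pt
wget https://ml-models.elastic.co/rerank-v1.vocab.json
--------------------------------------------------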

// For the optimized version, you need the following files in your system:
// ```
// https://ml-models.elastic.co/rerank-v1_linux-x86_64.metadata.json
// https://ml-models.elastic.co/rerank-v1_linux-x86_64.pt
// https://ml-models.elastic.co/rerank-v1_linux-x86_64.vocab.json
// ```

[discrete]
=== Using an HTTP server

NOTE: If you use an existing HTTP server, note that the model downloader only
supports passwordless HTTP servers.

You can use any HTTP service to deploy the model. This example uses the official
Nginx Docker image to set up a new HTTP download service.

. Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>.
. Put the files into a subdirectory of your choice.
. Run the following commands:
+
--
[source, shell]
--------------------------------------------------
export ELASTIC_ML_MODELS="/path/to/models"
docker run --rm -d -p 8080:80 --name ml-models -v ${ELASTIC_ML_MODELS}:/usr/share/nginx/html nginx
--------------------------------------------------

Don't forget to change `/path/to/models` to the path of the subdirectory where
the model artifact files are located.

These commands start a local Docker container running an Nginx server that
serves the subdirectory containing the model files. As the Docker image has to
be downloaded the first time, the initial start might take longer. Subsequent
runs start quicker.
--
. Verify that Nginx runs properly by visiting the following URL in your
browser:
+
--
```
http://{IP_ADDRESS_OR_HOSTNAME}:8080/rerank-v1.metadata.json
```

If Nginx runs properly, you see the content of the metadata file of the model.
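
Alternatively, you can check it from the command line, for example with `curl` (substitute your server's address):

[source, shell]
--------------------------------------------------
curl "http://{IP_ADDRESS_OR_HOSTNAME}:8080/rerank-v1.metadata.json"
--------------------------------------------------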
--

. Point your {es} deployment to the model artifacts on the HTTP server
by adding the following line to the `config/elasticsearch.yml` file:
+
--
```
xpack.ml.model_repository: http://{IP_ADDRESS_OR_HOSTNAME}:8080
```

If you use your own HTTP or HTTPS server, change the address accordingly. It is
important to specify the protocol ("http://" or "https://"). Ensure that all
master-eligible nodes can reach the server you specify.
--
. Repeat step 5 on all master-eligible nodes.
. {ref}/restart-cluster.html#restart-cluster-rolling[Restart] the
master-eligible nodes one by one.
. Create an inference endpoint to deploy the model per <<ml-nlp-rerank-deploy-steps,these steps>>.

The HTTP server is only required for downloading the model. After the download
has finished, you can stop and delete the service. You can stop the Docker
container used in this example by running the following command:

[source, shell]
--------------------------------------------------
docker stop ml-models
--------------------------------------------------

[discrete]
=== Using file-based access

For file-based access, follow these steps:

. Download the <<ml-nlp-rerank-model-artifacts,model artifact files>>.
. Put the files into a `models` subdirectory inside the `config` directory of
your {es} deployment (see the sketch after these steps).
. Point your {es} deployment to the model directory by adding the
following line to the `config/elasticsearch.yml` file:
+
--
```
xpack.ml.model_repository: file://${path.home}/config/models/
```
--
. Repeat steps 2 and 3 on all master-eligible nodes.
. {ref}/restart-cluster.html#restart-cluster-rolling[Restart] the
master-eligible nodes one by one.
. Create an inference endpoint to deploy the model per <<ml-nlp-rerank-deploy-steps,these steps>>.
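
For example, on a self-managed node, steps 2 and 3 might look like the following (a sketch; `/etc/elasticsearch` is a placeholder for your deployment's config directory):

[source, shell]
--------------------------------------------------
mkdir -p /etc/elasticsearch/models
cp rerank-v1.metadata.json rerank-v1.pt rerank-v1.vocab.json /etc/elasticsearch/models/
echo 'xpack.ml.model_repository: file://${path.home}/config/models/' >> /etc/elasticsearch/elasticsearch.yml
--------------------------------------------------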

[discrete]
[[ml-nlp-rerank-limitations]]
== Limitations

* English language only
* Maximum context window of 512 tokens
+
When using the {ref}/semantic-text.html[`semantic_text` field type], text is divided into chunks. By default, each chunk contains 250 words (approximately 400 tokens). Be cautious when increasing the chunk size: if the combined length of your query and chunk text exceeds 512 tokens, the model won't have access to the full content.
+
When the combined inputs exceed the 512 token limit, a balanced truncation strategy is used: if the query and the input text are each longer than 255 tokens, both are truncated; otherwise, the longer one is truncated.

[discrete]
[[ml-nlp-rerank-perf-considerations]]
== Performance considerations

Note that if you rerank to depth `n`, you need to run `n` inferences per query. Each inference includes the document text, so reranking is significantly more expensive than inference for query embeddings. Hardware can be scaled to run these inferences in parallel, but for CPU inference we recommend shallow reranking: no more than the top 30 results. You may find that the preview version is cost-prohibitive for high query rates and low query latency requirements. We plan to address performance issues for GA.
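
In a `text_similarity_reranker` retriever, the reranking depth is controlled by the `rank_window_size` parameter. For example, the following sketch caps the depth at 30 (index and field names are placeholders):

[source,console]
----------------------------------
GET my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": { "match": { "content": "shallow reranking example" } }
        }
      },
      "field": "content",
      "inference_id": "my-rerank-model",
      "inference_text": "shallow reranking example",
      "rank_window_size": 30
    }
  }
}
----------------------------------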

[discrete]
[[ml-nlp-rerank-model-specs]]
== Model specifications

* Purpose-built for English language content
* Relatively small: 184M parameters (86M backbone + 98M embedding layer)
* Matches the performance of billion-parameter reranking models
* Built directly into {es}: no external services or dependencies needed

[discrete]
[[ml-nlp-rerank-arch-overview]]
== Model architecture

Elastic Rerank is built on the https://arxiv.org/abs/2111.09543[DeBERTa v3] language model architecture.

The model employs several key architectural features that make it particularly effective for reranking:

* *Disentangled attention mechanism* enables the model to:
** Process word content and position separately
** Learn more nuanced relationships between query and document text
** Better understand the semantic importance of word positions and relationships

* *ELECTRA-style pre-training* uses:
** A GAN-like approach to token prediction
** Simultaneous training of token generation and detection
** Enhanced parameter efficiency compared to traditional masked language modeling

[discrete]
[[ml-nlp-rerank-arch-training]]
== Training process

Here is an overview of the Elastic Rerank model training process:

* *Initial relevance extraction*
** Fine-tunes the pre-trained DeBERTa [CLS] token representation
** Uses a GeLU activation and dropout layer
** Preserves important pre-trained knowledge while adapting to the reranking task

* *Trained by distillation*
** Uses an ensemble of bi-encoder and cross-encoder models as a teacher
** Bi-encoder provides nuanced negative example assessment
** Cross-encoder helps differentiate between positive and negative examples
** Combines strengths of both model types

[discrete]
[[ml-nlp-rerank-arch-data]]
=== Training data

The training data consists of:

* Open domain question-answering datasets
* Natural document pairs (like article headings and summaries)
* 180,000 synthetic query-passage pairs with varying relevance
* A total of approximately 3 million queries

The data preparation process includes:

* Basic cleaning and fuzzy deduplication
* Multi-stage prompting for diverse topics (on the synthetic portion of the training data only)
* Varied query types:
** Keyword search
** Exact phrase matching
** Short and long natural language questions

[discrete]
[[ml-nlp-rerank-arch-sampling]]
=== Negative sampling

The model uses an advanced sampling strategy to ensure high-quality rankings:

* Samples from the top 128 documents per query using multiple retrieval methods
* Uses five negative samples per query, more than typical approaches
* Applies a probability distribution shaped by document scores for sampling

* Deep sampling benefits:
** Improves model robustness across different retrieval depths
** Enhances score calibration
** Provides better handling of document diversity

[discrete]
[[ml-nlp-rerank-arch-optimization]]
=== Training optimization

The training process incorporates several key optimizations:

It uses a cross-entropy loss function to:

* Model relevance as a probability distribution
* Learn relationships between all document scores
* Fit scores through maximum likelihood estimation

It also averages model parameters along the optimization trajectory, which:

* Eliminates the need for traditional learning rate scheduling and improves final model quality

[discrete]
[[ml-nlp-rerank-performance]]
== Performance

Elastic Rerank shows significant improvements in search quality across a wide range of retrieval tasks.

[discrete]
[[ml-nlp-rerank-performance-overview]]
=== Overview

* Average 40% improvement in ranking quality when reranking BM25 results
* 184M parameter model matches performance of 2B parameter alternatives
* Evaluated across 21 different datasets using the BEIR benchmark suite

[discrete]
[[ml-nlp-rerank-performance-benchmarks]]
=== Key benchmark results

* Natural Questions: 90% improvement
* MS MARCO: 85% improvement
* Climate-FEVER: 80% improvement
* FiQA-2018: 76% improvement

For detailed benchmark information, including complete dataset results and methodology, refer to the https://www.elastic.co/search-labs/introducing-elastic-rerank[Introducing Elastic Rerank blog].

// [discrete]
// [[ml-nlp-rerank-benchmarks-hw]]
// === Hardware benchmarks
// Note: these are more for GA timeframe

[discrete]
[[ml-nlp-rerank-resources]]
== Further resources

*Documentation*:

* {ref}/semantic-reranking.html#semantic-reranking-in-es[Semantic re-ranking in {es} overview]
* {ref}/infer-service-elasticsearch.html#inference-example-elastic-reranker[Inference API example]

*Blogs*:

* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-1[Part 1]
* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2[Part 2]
* https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-3[Part 3]
