Add Split Response Processor to 2.17 Search Pipeline docs #8081

Merged Aug 25, 2024 (1 commit)
1 change: 1 addition & 0 deletions _search-plugins/search-pipelines/search-processors.md
@@ -45,6 +45,7 @@ Processor | Description | Earliest available version
[`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16
[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.17
[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12


236 changes: 236 additions & 0 deletions _search-plugins/search-pipelines/split-processor.md
@@ -0,0 +1,236 @@
---
layout: default
title: Split
nav_order: 140
has_children: false
parent: Search processors
grand_parent: Search pipelines
---

# Split processor
Introduced 2.17
{: .label .label-purple }

The `split` processor splits a string field into an array of substrings based on a specified delimiter.

## Request fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
`field` | String | The field containing the string to be split. Required.
`separator` | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
`preserve_trailing` | Boolean | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`.
`target_field` | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. Optional.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
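
As a rough illustration of how `separator` and `preserve_trailing` interact, the following Python sketch models the splitting step locally. It is not OpenSearch code: the function name `split_field` is hypothetical, the sketch assumes the separator is always applied as a regular expression, and Python's `re` syntax differs in places from the Java regular expressions that OpenSearch uses.

```python
import re


def split_field(value, separator, preserve_trailing=False):
    """Local model (an assumption, not the OpenSearch implementation)
    of splitting a string and optionally dropping empty trailing items."""
    # Treat the separator as a regular expression pattern.
    parts = re.split(separator, value)
    if not preserve_trailing:
        # Remove empty strings from the end of the result only;
        # empty strings in the middle are kept.
        while parts and parts[-1] == "":
            parts.pop()
    return parts


print(split_field("ingest, search, visualize, and analyze data", ", "))
# ['ingest', 'search', 'visualize', 'and analyze data']
print(split_field("a,b,,", ","))
# ['a', 'b']
print(split_field("a,b,,", ",", preserve_trailing=True))
# ['a', 'b', '', '']
```

In this model, `preserve_trailing` only affects empty items at the end of the array, so `"a,,b"` produces three items with either setting.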

## Example

The following example demonstrates using a search pipeline with a `split` processor.

### Setup

Create an index named `my_index` and index a document containing the field `message`:

```json
POST /my_index/_doc/1
{
  "message": "ingest, search, visualize, and analyze data",
  "visibility": "public"
}
```
{% include copy-curl.html %}

### Creating a search pipeline

The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field:

```json
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "split_message"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Using a search pipeline

Search for documents in `my_index` without a search pipeline:

```json
GET /my_index/_search
```
{% include copy-curl.html %}

The response contains the field `message`:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "ingest, search, visualize, and analyze data",
          "visibility": "public"
        }
      }
    ]
  }
}
```
</details>

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
```
{% include copy-curl.html %}

The `message` field is split and the results are stored in the `split_message` field:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
```
</details>
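
The difference between setting `target_field` and updating the field in place can be modeled locally on a response shaped like the preceding one. The following Python sketch is an illustration only: `apply_split` is a hypothetical helper, not part of any OpenSearch client, and skipping hits whose field is not a string is a choice made for this sketch rather than documented processor behavior.

```python
import copy
import re


def apply_split(response, field, separator, target_field=None):
    """Return a copy of `response` with `field` split in each hit's `_source`.

    Hypothetical local model; it does not call OpenSearch.
    """
    result = copy.deepcopy(response)
    for hit in result["hits"]["hits"]:
        source = hit.get("_source", {})
        value = source.get(field)
        if not isinstance(value, str):
            continue  # sketch-only choice: leave such hits unchanged
        parts = re.split(separator, value)
        # Mirror the default `preserve_trailing: false` setting.
        while parts and parts[-1] == "":
            parts.pop()
        # Without a target field, the original field is overwritten.
        source[target_field or field] = parts
    return result


response = {
    "hits": {
        "hits": [
            {
                "_index": "my_index",
                "_id": "1",
                "_source": {
                    "message": "ingest, search, visualize, and analyze data",
                    "visibility": "public",
                },
            }
        ]
    }
}

with_target = apply_split(response, "message", ", ", target_field="split_message")
in_place = apply_split(response, "message", ", ")

print(with_target["hits"]["hits"][0]["_source"]["split_message"])
# ['ingest', 'search', 'visualize', 'and analyze data']
print(in_place["hits"]["hits"][0]["_source"]["message"])
# ['ingest', 'search', 'visualize', 'and analyze data']
```

With `target_field` set, the original `message` string stays in `_source` next to the new array. Without it, the string is replaced by the array.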

You can also use the `fields` option to retrieve specific fields from a document:

```json
POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
  "fields": ["visibility", "message"]
}
```
{% include copy-curl.html %}

In the response, the `message` field is split and the results are stored in the `split_message` field, which appears in both `_source` and `fields`:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        },
        "fields": {
          "visibility": [
            "public"
          ],
          "message": [
            "ingest, search, visualize, and analyze data"
          ],
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
```
</details>