Merge branch 'main' into Character-filters-1483
vagimeli authored Oct 8, 2024
2 parents 8c41636 + 3f77141 commit 55bcf61
Showing 25 changed files with 166 additions and 51 deletions.
1 change: 1 addition & 0 deletions _analyzers/index-analyzers.md
@@ -2,6 +2,7 @@
layout: default
title: Index analyzers
nav_order: 20
parent: Analyzers
---

# Index analyzers
16 changes: 3 additions & 13 deletions _analyzers/index.md
@@ -45,20 +45,9 @@ An analyzer must contain exactly one tokenizer and may contain zero or more char

There is also a special type of analyzer called a ***normalizer***. A normalizer is similar to an analyzer except that it does not contain a tokenizer and can only include specific types of character filters and token filters. These filters can perform only character-level operations, such as character or pattern replacement, and cannot perform operations on the token as a whole. This means that replacing a token with a synonym or stemming is not supported. See [Normalizers]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) for further details.
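As an illustrative sketch (the index name, normalizer name, and filter choices here are assumptions, not prescriptions), a normalizer is defined in index settings much like an analyzer but contains only character filters and token filters:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tag": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```
{% include copy-curl.html %}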

## Built-in analyzers
## Supported analyzers

The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.

Analyzer | Analysis performed | Analyzer output
:--- | :--- | :---
**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
For a list of supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).

## Custom analyzers

@@ -195,3 +184,4 @@ Normalization ensures that searches are not limited to exact term matches, allow
## Next steps

- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
- See the list of [supported analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
9 changes: 5 additions & 4 deletions _analyzers/language-analyzers.md
@@ -1,14 +1,15 @@
---
layout: default
title: Language analyzers
nav_order: 10
nav_order: 100
parent: Analyzers
redirect_from:
- /query-dsl/analyzers/language-analyzers/
---

# Language analyzer
# Language analyzers

OpenSearch supports the following language values with the `analyzer` option:
OpenSearch supports the following language analyzers:
`arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`.

To use a language analyzer, specify it when mapping an index. For example, to map your index with the French language analyzer, specify the `french` value in the `analyzer` field:
@@ -40,4 +41,4 @@ PUT my-index
}
```
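Only the tail of that request is shown above. As a sketch of the full mapping (the `text` field name is an assumption for illustration), it might look like the following:

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "french"
      }
    }
  }
}
```
{% include copy-curl.html %}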

<!-- TO do: each of the options needs its own section with an example. Convert table to individual sections, and then give a streamlined list with valid values. -->
<!-- TO do: each of the options needs its own section with an example. Convert table to individual sections, and then give a streamlined list with valid values. -->
3 changes: 2 additions & 1 deletion _analyzers/search-analyzers.md
@@ -2,6 +2,7 @@
layout: default
title: Search analyzers
nav_order: 30
parent: Analyzers
---

# Search analyzers
@@ -42,7 +43,7 @@ GET shakespeare/_search
```
{% include copy-curl.html %}
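As a sketch of the collapsed request above, a complete search that overrides the analyzer at query time (the `shakespeare` sample index and its `text_entry` field are assumptions here) might look like the following:

```json
GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": {
        "query": "speak the speech",
        "analyzer": "english"
      }
    }
  }
}
```
{% include copy-curl.html %}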

Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
For more information about supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).

## Specifying a search analyzer for a field

32 changes: 32 additions & 0 deletions _analyzers/supported-analyzers/index.md
@@ -0,0 +1,32 @@
---
layout: default
title: Analyzers
nav_order: 40
has_children: true
has_toc: false
redirect_from:
- /analyzers/supported-analyzers/index/
---

# Analyzers

The following sections list all analyzers that OpenSearch supports.

## Built-in analyzers

The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.

Analyzer | Analysis performed | Analyzer output
:--- | :--- | :---
**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `to`, `opensearch`]
**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`, `brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
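You can verify any row of this table with the `_analyze` API. For example, the following request applies the `standard` analyzer to the sample string:

```json
GET _analyze
{
  "analyzer": "standard",
  "text": "It’s fun to contribute a brand-new PR or 2 to OpenSearch!"
}
```
{% include copy-curl.html %}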

## Language analyzers

OpenSearch supports analyzers for various languages. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
2 changes: 2 additions & 0 deletions _analyzers/token-filters/index.md
@@ -4,6 +4,8 @@ title: Token filters
nav_order: 70
has_children: true
has_toc: false
redirect_from:
- /analyzers/token-filters/index/
---

# Token filters
2 changes: 2 additions & 0 deletions _analyzers/tokenizers/index.md
@@ -4,6 +4,8 @@ title: Tokenizers
nav_order: 60
has_children: false
has_toc: false
redirect_from:
- /analyzers/tokenizers/index/
---

# Tokenizers
2 changes: 1 addition & 1 deletion _benchmark/glossary.md
@@ -1,7 +1,7 @@
---
layout: default
title: Glossary
nav_order: 10
nav_order: 100
---

# OpenSearch Benchmark glossary
@@ -1,10 +1,12 @@
---
layout: default
title: Configuring OpenSearch Benchmark
title: Configuring
nav_order: 7
parent: User guide
grand_parent: User guide
parent: Install and configure
redirect_from:
- /benchmark/configuring-benchmark/
- /benchmark/user-guide/configuring-benchmark/
---

# Configuring OpenSearch Benchmark
12 changes: 12 additions & 0 deletions _benchmark/user-guide/install-and-configure/index.md
@@ -0,0 +1,12 @@
---
layout: default
title: Install and configure
nav_order: 5
parent: User guide
has_children: true
---

# Installing and configuring OpenSearch Benchmark

This section details how to install and configure OpenSearch Benchmark.

@@ -1,10 +1,12 @@
---
layout: default
title: Installing OpenSearch Benchmark
title: Installing
nav_order: 5
parent: User guide
grand_parent: User guide
parent: Install and configure
redirect_from:
- /benchmark/installing-benchmark/
- /benchmark/user-guide/installing-benchmark/
---

# Installing OpenSearch Benchmark
@@ -2,12 +2,12 @@
layout: default
title: Running distributed loads
nav_order: 15
parent: User guide
parent: Optimizing benchmarks
grand_parent: User guide
---

# Running distributed loads


OpenSearch Benchmark loads always run on the same machine on which the benchmark was started. However, you can use multiple load drivers to generate additional benchmark testing loads, particularly for large clusters on multiple machines. This tutorial describes how to distribute benchmark loads across multiple machines in a single cluster.

## System architecture
11 changes: 11 additions & 0 deletions _benchmark/user-guide/optimizing-benchmarks/index.md
@@ -0,0 +1,11 @@
---
layout: default
title: Optimizing benchmarks
nav_order: 25
parent: User guide
has_children: true
---

# Optimizing benchmarks

This section details different ways you can optimize the benchmark tools for your cluster.
@@ -2,7 +2,10 @@
layout: default
title: Target throughput
nav_order: 150
parent: User guide
parent: Optimizing benchmarks
grand_parent: User guide
redirect_from:
- /benchmark/user-guide/target-throughput/
---

# Target throughput
8 changes: 0 additions & 8 deletions _benchmark/user-guide/telemetry.md

This file was deleted.

12 changes: 12 additions & 0 deletions _benchmark/user-guide/understanding-results/index.md
@@ -0,0 +1,12 @@
---
layout: default
title: Understanding results
nav_order: 20
parent: User guide
has_children: true
---

After [running a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/running-workloads/), OpenSearch Benchmark produces a series of metrics. The following pages detail:

- [How metrics are reported]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-results/summary-reports/)
- [How to visualize metrics]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-results/telemetry/)
@@ -1,10 +1,14 @@
---
layout: default
title: Understanding benchmark results
title: Summary reports
nav_order: 22
parent: User guide
grand_parent: User guide
parent: Understanding results
redirect_from:
- /benchmark/user-guide/understanding-results/
---

# Understanding the summary report

At the end of each test run, OpenSearch Benchmark creates a summary of test result metrics like service time, throughput, latency, and more. These metrics provide insights into how the selected workload performed on a benchmarked OpenSearch cluster.

21 changes: 21 additions & 0 deletions _benchmark/user-guide/understanding-results/telemetry.md
@@ -0,0 +1,21 @@
---
layout: default
title: Enabling telemetry devices
nav_order: 30
grand_parent: User guide
parent: Understanding results
redirect_from:
- /benchmark/user-guide/telemetry
---

# Enabling telemetry devices

Telemetry results will not appear in the summary report. To visualize telemetry results, ingest the data into OpenSearch and visualize the data in OpenSearch Dashboards.

To view a list of the available telemetry devices, use the command `opensearch-benchmark list telemetry`. After you've selected a [supported telemetry device]({{site.url}}{{site.baseurl}}/benchmark/reference/telemetry/), you can activate the device when running a test with the `--telemetry` flag. For example, if you want to use the `jfr` device with the `geonames` workload, enter the following command:

```shell
opensearch-benchmark workload --workload=geonames --telemetry=jfr
```
{% include copy-curl.html %}

2 changes: 1 addition & 1 deletion _benchmark/user-guide/understanding-workloads/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Understanding workloads
nav_order: 7
nav_order: 10
parent: User guide
has_children: true
---
@@ -2,7 +2,10 @@
layout: default
title: Sharing custom workloads
nav_order: 11
parent: User guide
grand_parent: User guide
parent: Working with workloads
redirect_from:
- /benchmark/user-guide/contributing-workloads/
---

# Sharing custom workloads
@@ -2,7 +2,8 @@
layout: default
title: Creating custom workloads
nav_order: 10
parent: User guide
grand_parent: User guide
parent: Working with workloads
redirect_from:
- /benchmark/user-guide/creating-custom-workloads/
- /benchmark/creating-custom-workloads/
@@ -2,7 +2,10 @@
layout: default
title: Fine-tuning custom workloads
nav_order: 12
parent: User guide
grand_parent: User guide
parent: Working with workloads
redirect_from:
- /benchmark/user-guide/finetine-workloads/
---

# Fine-tuning custom workloads
16 changes: 16 additions & 0 deletions _benchmark/user-guide/working-with-workloads/index.md
@@ -0,0 +1,16 @@
---
layout: default
title: Working with workloads
nav_order: 15
parent: User guide
has_children: true
---

# Working with workloads

Once you [understand workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/index/) and have [chosen a workload]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/choosing-a-workload/) to run your benchmarks with, you can begin working with workloads.

- [Running workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/running-workloads/): Learn how to run an OpenSearch Benchmark workload.
- [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/creating-custom-workloads/): Create a custom workload with your own datasets.
- [Fine-tuning workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/finetune-workloads/): Fine-tune your custom workload according to the needs of your cluster.
- [Contributing workloads]({{site.url}}{{site.baseurl}}/benchmark/user-guide/working-with-workloads/contributing-workloads/): Contribute your custom workload for the OpenSearch community to use.
@@ -2,7 +2,10 @@
layout: default
title: Running a workload
nav_order: 9
parent: User guide
grand_parent: User guide
parent: Working with workloads
redirect_from:
- /benchmark/user-guide/running-workloads/
---

# Running a workload
21 changes: 11 additions & 10 deletions _ingest-pipelines/processors/text-chunking.md
@@ -31,16 +31,17 @@ The following is the syntax for the `text_chunking` processor:

The following table lists the required and optional parameters for the `text_chunking` processor.

| Parameter | Data type | Required/Optional | Description |
|:---|:---|:---|:---|
| `field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to the output field. |
| `field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating chunked passages. |
| `field_map.<output_field>` | String | Required | The name of the field in which to store the chunked results. |
| `algorithm` | Object | Required | Contains at most one key-value pair that specifies the chunking algorithm and parameters. |
| `algorithm.<name>` | String | Optional | The name of the chunking algorithm. Valid values are [`fixed_token_length`](#fixed-token-length-algorithm) or [`delimiter`](#delimiter-algorithm). Default is `fixed_token_length`. |
| `algorithm.<parameters>` | Object | Optional | The parameters for the chunking algorithm. By default, contains the default parameters of the `fixed_token_length` algorithm. |
| `description` | String | Optional | A brief description of the processor. |
| `tag` | String | Optional | An identifier tag for the processor. Useful when debugging in order to distinguish between processors of the same type. |
| Parameter | Data type | Required/Optional | Description |
|:----------------------------|:----------|:---|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to the output field. |
| `field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating chunked passages. |
| `field_map.<output_field>` | String | Required | The name of the field in which to store the chunked results. |
| `algorithm` | Object | Required | Contains at most one key-value pair that specifies the chunking algorithm and parameters. |
| `algorithm.<name>` | String | Optional | The name of the chunking algorithm. Valid values are [`fixed_token_length`](#fixed-token-length-algorithm) or [`delimiter`](#delimiter-algorithm). Default is `fixed_token_length`. |
| `algorithm.<parameters>` | Object | Optional | The parameters for the chunking algorithm. By default, contains the default parameters of the `fixed_token_length` algorithm. |
| `ignore_missing` | Boolean | Optional | If `true`, empty fields are excluded from the output. If `false`, the output will contain an empty list for every empty field. Default is `false`. |
| `description` | String | Optional | A brief description of the processor. |
| `tag` | String | Optional | An identifier tag for the processor. Useful when debugging in order to distinguish between processors of the same type. |

To perform chunking on nested fields, specify `input_field` and `output_field` values as JSON objects. Dot paths of nested fields are not supported. For example, use `"field_map": { "foo": { "bar": "bar_chunk"} }` instead of `"field_map": { "foo.bar": "foo.bar_chunk"}`.
{: .note}
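As a hedged sketch of how these parameters fit together (the pipeline name, `token_limit` value, and field names here are illustrative assumptions), a `text_chunking` ingest pipeline might be defined as follows:

```json
PUT _ingest/pipeline/text-chunking-pipeline
{
  "description": "A pipeline that chunks text into passages",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10
          }
        },
        "field_map": {
          "passage_text": "passage_chunks"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}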
