Refactor analyzers section (#8477) (#8478)

opensearch-project · Oct 8, 2024 · 87932d2
1 parent 32b7731
commit 87932d2

Showing 7 changed files with 44 additions and 15 deletions.
diff --git a/_analyzers/index-analyzers.md b/_analyzers/index-analyzers.md
@@ -2,6 +2,7 @@
 layout: default
 title: Index analyzers
 nav_order: 20
+parent: Analyzers
 ---
 
 # Index analyzers

diff --git a/_analyzers/index.md b/_analyzers/index.md
@@ -45,20 +45,9 @@ An analyzer must contain exactly one tokenizer and may contain zero or more char
 
 There is also a special type of analyzer called a ***normalizer***. A normalizer is similar to an analyzer except that it does not contain a tokenizer and can only include specific types of character filters and token filters. These filters can perform only character-level operations, such as character or pattern replacement, and cannot perform operations on the token as a whole. This means that replacing a token with a synonym or stemming is not supported. See [Normalizers]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) for further details.
 
-## Built-in analyzers
+## Supported analyzers
 
-The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
-
-Analyzer | Analysis performed | Analyzer output 
-:--- | :--- | :---
-**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase  | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
-**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
-**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
-**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
-**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
-**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
+For a list of supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
 
 ## Custom analyzers
 
@@ -195,3 +184,4 @@ Normalization ensures that searches are not limited to exact term matches, allow
 ## Next steps
 
 - Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
+- See the list of [supported analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
diff --git a/_analyzers/language-analyzers.md b/_analyzers/language-analyzers.md
@@ -1,7 +1,8 @@
 ---
 layout: default
 title: Language analyzers
-nav_order: 10
+nav_order: 100
+parent: Analyzers
 redirect_from:
   - /query-dsl/analyzers/language-analyzers/
 ---

diff --git a/_analyzers/search-analyzers.md b/_analyzers/search-analyzers.md
@@ -2,6 +2,7 @@
 layout: default
 title: Search analyzers
 nav_order: 30
+parent: Analyzers
 ---
 
 # Search analyzers
@@ -42,7 +43,7 @@ GET shakespeare/_search
 ```
 {% include copy-curl.html %}
 
-Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
+For more information about supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
 
 ## Specifying a search analyzer for a field
 

diff --git a/_analyzers/supported-analyzers/index.md b/_analyzers/supported-analyzers/index.md
@@ -0,0 +1,32 @@
+---
+layout: default
+title: Analyzers
+nav_order: 40
+has_children: true
+has_toc: false
+redirect_from:
+    - /analyzers/supported-analyzers/index/
+---
+
+# Analyzers
+
+The following sections list all analyzers that OpenSearch supports.
+
+## Built-in analyzers
+
+The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
+
+Analyzer | Analysis performed | Analyzer output 
+:--- | :--- | :---
+**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase  | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
+**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
+**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
+**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
+**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
+**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
+
+## Language analyzers
+
+OpenSearch supports analyzers for various languages. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md
@@ -4,6 +4,8 @@ title: Token filters
 nav_order: 70
 has_children: true
 has_toc: false
+redirect_from:
+    - /analyzers/token-filters/index/
 ---
 
 # Token filters

diff --git a/_analyzers/tokenizers/index.md b/_analyzers/tokenizers/index.md
@@ -4,6 +4,8 @@ title: Tokenizers
 nav_order: 60
 has_children: false
 has_toc: false
+redirect_from:
+    - /analyzers/tokenizers/index/
 ---
 
 # Tokenizers
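
Review note: the built-in analyzers table moved into `_analyzers/supported-analyzers/index.md` unchanged, so its sample outputs are worth a quick sanity check. The sketch below is a rough approximation in plain Python of what the table describes for the Whitespace, Simple, and Stop rows. It is not OpenSearch or Lucene code. The function names are hypothetical, and the stop word set is an assumption recalled from Lucene's default English list, so treat it as illustrative only.

```python
# Rough approximations of three rows of the built-in analyzers table.
# These helpers are hypothetical and are NOT OpenSearch's implementation.

SAMPLE = "It’s fun to contribute a brand-new PR or 2 to OpenSearch!"

# Assumption: believed to mirror Lucene's default English stop words.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def whitespace_tokens(text):
    """Split on white space only; case and punctuation are kept."""
    return text.split()


def simple_tokens(text):
    """Split on any non-letter character and lowercase each token."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            # A non-letter ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def stop_tokens(text):
    """Same as simple_tokens, then drop the assumed stop words."""
    return [t for t in simple_tokens(text) if t not in ENGLISH_STOP_WORDS]


if __name__ == "__main__":
    print(whitespace_tokens(SAMPLE))
    print(simple_tokens(SAMPLE))
    print(stop_tokens(SAMPLE))
```

On the sample string, these helpers reproduce the Whitespace, Simple, and Stop rows of the table. The authoritative check would be to run the same string through a real cluster and compare.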