Added persian_stem.
Signed-off-by: dblock <[email protected]>
dblock committed Sep 30, 2024
1 parent 39efae2 commit 6e544b2
Showing 9 changed files with 156 additions and 61 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -98,7 +98,8 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Added `/_bulk/stream` ([#584](https://github.com/opensearch-project/opensearch-api-specification/pull/584))
- Added `/_plugins/_ml/agents/_register`, `/_plugins/_ml/connectors/_create`, `DELETE /_plugins/_ml/agents/{agent_id}`, `DELETE /_plugins/_ml/connectors/{connector_id}` ([#228](https://github.com/opensearch-project/opensearch-api-specification/issues/228))
- Added the `context` query param to the `put_script` APIs ([#586](https://github.com/opensearch-project/opensearch-api-specification/pull/586))

- Added `persian_stem` filter ([#](https://github.com/opensearch-project/opensearch-api-specification/))
-
### Changed

- Replaced Smithy with a native OpenAPI spec ([#189](https://github.com/opensearch-project/opensearch-api-specification/issues/189))
12 changes: 12 additions & 0 deletions spec/schemas/_common.analysis.yaml
@@ -474,6 +474,7 @@ components:
        - $ref: '#/components/schemas/NoriPartOfSpeechTokenFilter'
        - $ref: '#/components/schemas/PatternCaptureTokenFilter'
        - $ref: '#/components/schemas/PatternReplaceTokenFilter'
        - $ref: '#/components/schemas/PersianStemTokenFilter'
        - $ref: '#/components/schemas/PorterStemTokenFilter'
        - $ref: '#/components/schemas/PredicateTokenFilter'
        - $ref: '#/components/schemas/RemoveDuplicatesTokenFilter'
@@ -894,6 +895,17 @@ components:
          required:
            - pattern
            - type
    PersianStemTokenFilter:
      allOf:
        - $ref: '#/components/schemas/TokenFilterBase'
        - type: object
          properties:
            type:
              type: string
              enum:
                - persian_stem
          required:
            - type
    PorterStemTokenFilter:
      allOf:
        - $ref: '#/components/schemas/TokenFilterBase'
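Read on its own, the `PersianStemTokenFilter` schema in this hunk accepts an object whose required `type` is the literal string `persian_stem`, combined with whatever `TokenFilterBase` allows. The following is a minimal sketch of that check, not the project's validator: it assumes nothing about `TokenFilterBase` (which this diff does not show), and the function name `is_persian_stem_filter` is hypothetical.

```python
def is_persian_stem_filter(definition):
    """Return True if `definition` fits the object form described by the schema:
    a mapping whose required `type` property equals 'persian_stem'.
    Properties inherited from TokenFilterBase are deliberately not checked."""
    return isinstance(definition, dict) and definition.get("type") == "persian_stem"


# The object form is accepted; a different type or a missing `type` is not.
print(is_persian_stem_filter({"type": "persian_stem"}))
print(is_persian_stem_filter({"type": "porter_stem"}))
print(is_persian_stem_filter({}))
```

The test story added in this commit uses the bare string form (`- persian_stem`) rather than the object form, so this check only illustrates the schema entry itself.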
60 changes: 0 additions & 60 deletions tests/default/_core/analyze.yaml
@@ -30,63 +30,3 @@ chapters:
          - Moneyball, directed by Bennett Miller
    response:
      status: 200
  - synopsis: Apply a filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - uppercase
        text: Moneyball
    response:
      status: 200
      payload:
        tokens:
          - token: MONEYBALL
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
  - synopsis: Apply a character filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - lowercase
        char_filter:
          - html_strip
        text: <b>Moneyball</b>
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 3
            end_offset: 16
            position: 0
  - synopsis: Combine a lowercase translation with a stop filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: whitespace
        filter:
          - lowercase
          - type: stop
            stopwords:
              - in
              - to
        text: Moneyball directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/asciifolding.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply an asciifolding filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - asciifolding
        text: Léon
    response:
      status: 200
      payload:
        tokens:
          - token: Leon
            type: word
            start_offset: 0
            end_offset: 4
            position: 0
24 changes: 24 additions & 0 deletions tests/default/_core/analyze/filter/lowercase.yaml
@@ -0,0 +1,24 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Apply a lowercase filter and an html_strip character filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - lowercase
        char_filter:
          - html_strip
        text: <b>Moneyball</b>
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 3
            end_offset: 16
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/persian_stem.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply a persian_stem filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - persian_stem
        text: جامدات
    response:
      status: 200
      payload:
        tokens:
          - token: جامد
            type: word
            start_offset: 0
            end_offset: 6
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/porterstem.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply a porter_stem filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - porter_stem
        text: Directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: Directed by Bennett Mil
            type: word
            start_offset: 0
            end_offset: 26
            position: 0
26 changes: 26 additions & 0 deletions tests/default/_core/analyze/filter/stop.yaml
@@ -0,0 +1,26 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Combine a lowercase translation with a stop filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: whitespace
        filter:
          - lowercase
          - type: stop
            stopwords:
              - in
              - to
        text: Moneyball directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/uppercase.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Apply an uppercase filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - uppercase
        text: Moneyball
    response:
      status: 200
      payload:
        tokens:
          - token: MONEYBALL
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
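
The expected `end_offset` values in the keyword-tokenizer stories in this commit can be sanity-checked offline. The sketch below is an illustration only and not part of the commit. It assumes the keyword tokenizer emits the whole input as a single token, and it relies on Lucene reporting token offsets in UTF-16 code units. The helper names `analyze_body` and `utf16_length` are hypothetical, and the `html_strip` story is left out because a character filter shifts its offsets.

```python
import json


def analyze_body(text, filters, tokenizer="keyword"):
    """Build the JSON body for GET /_analyze, mirroring the YAML request payloads."""
    return {"tokenizer": tokenizer, "filter": list(filters), "text": text}


def utf16_length(text):
    """Length of `text` in UTF-16 code units, the unit used for token offsets."""
    return len(text.encode("utf-16-le")) // 2


# (filter, input text, end_offset expected by the corresponding story)
stories = [
    ("uppercase", "Moneyball", 9),
    ("porter_stem", "Directed by Bennett Miller", 26),
    ("asciifolding", "L\u00e9on", 4),  # precomposed e-acute, a single code unit
    ("persian_stem", "جامدات", 6),
]

for name, text, expected_end in stories:
    body = analyze_body(text, [name])
    # A token filter may change the token text, but the offsets still span the input.
    assert utf16_length(text) == expected_end, (name, utf16_length(text))
    print(json.dumps(body, ensure_ascii=False))
```

Stemming and folding change the token text (`جامدات` becomes `جامد`, `Léon` becomes `Leon`) without changing the offsets, which is why `end_offset` stays at the input length in each story.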
