Chinese more_like_this query highlights the wrong terms #1077

@xuetaofeng

Description

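A more_like_this query on Chinese text highlights the wrong terms. For example, querying for “项目经理” (project manager) returns a result whose highlight is “高<em>级项目经</em>理(”: the tags are shifted one character to the left of the matched term.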

Steps to reproduce

Create an index that uses the ik_smart tokenizer:

#!/usr/bin/bash
curl -X DELETE "localhost:9201/my_index"
curl -X PUT "localhost:9201/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ik_smart": {
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_ik_smart",
        "position_increment_gap": 1,
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}
'

Index the documents:

#!/usr/bin/bash

curl -X POST "localhost:9201/my_index/_doc/1" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "项目经理",
    "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
    "销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)"
  ]
}
EOF

curl -X POST "localhost:9201/my_index/_doc/2" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "开发工程师",
    "前任 Google 软件工程师经理",
    "现任 Facebook 高级开发工程师"
  ]
}
EOF

curl -X POST "localhost:9201/my_index/_doc/3" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "title": [
    "数据分析师",
    "前任 IBM 数据分析师",
    "现任 Amazon 数据科学家",
    "现任 Amazon 项目数据科学家"
  ]
}
EOF

Run a more_like_this query with highlighting:

#!/usr/bin/bash
curl -X POST "localhost:9201/my_index/_search?pretty" -H 'Content-Type: application/json' -d @- << 'EOF'
{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like": "项目经理",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "analyzer": "my_ik_smart"
    }
  },
  "highlight": {
    "fields": {
      "title": {
        "type": "fvh",
        "fragment_size": 150,
        "number_of_fragments": 3
      }
    }
  }
}
EOF

Expected behavior

The term 项目经理 should be highlighted.

Actual behavior

The query returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.3648179,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.3648179,
        "_source" : {
          "title" : [
            "项目经理",
            "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
            "销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)"
          ]
        },
        "highlight" : {
          "title" : [
            "项目经理",
            "ex Mingyuan - 前任 明源福州 销售负责人(till 06/2019)/ 前任 用友 高级项目经理(till 03/2020)",
            "销售负责人(till 06/2019)/ 前任 用友 高<em>级项目经</em>理(till 03/2020)"
          ]
        }
      }
    ]
  }
}
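
The mismatch in the last highlight fragment can be checked mechanically. The following is only a rough sketch, not part of the original report: `extract_highlight` and `analyze_text` are hypothetical helper names, `ES_URL` is an assumed environment variable, and the fragment is written with Elasticsearch's default `<em>`/`</em>` tags. `extract_highlight` pulls out the first highlighted span so it can be compared with the expected term. `analyze_text` wraps the `_analyze` API, which lists the `start_offset` and `end_offset` that the analyzer reports for each token. Since the `fvh` highlighter works from the stored term vector offsets, those values may help narrow down where the shift comes from.

```shell
#!/usr/bin/bash

# Hypothetical helper: print the text inside the first <em>...</em> pair
# of a highlight fragment; print nothing if the fragment has no such pair.
extract_highlight() {
  local fragment=$1
  case $fragment in
    *'<em>'*'</em>'*) ;;
    *) return 0 ;;
  esac
  local rest=${fragment#*'<em>'}        # drop everything up to the first <em>
  printf '%s\n' "${rest%%'</em>'*}"     # drop everything from the first </em> on
}

# Hypothetical helper: ask the _analyze API for the tokens and offsets that
# my_ik_smart produces. ES_URL is an assumed variable (defaults to the host
# used in this issue). The text is interpolated into JSON as-is, so it must
# not contain double quotes or backslashes.
analyze_text() {
  curl -s -X POST "${ES_URL:-localhost:9201}/my_index/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"my_ik_smart\", \"text\": \"$1\"}"
}

# Compare the highlighted span from the response above with the query term.
fragment='销售负责人(till 06/2019)/ 前任 用友 高<em>级项目经</em>理(till 03/2020)'
expected='项目经理'
actual=$(extract_highlight "$fragment")

if [ "$actual" = "$expected" ]; then
  echo "ok: highlighted '$actual'"
else
  echo "mismatch: highlighted '$actual', expected '$expected'"
fi
```

With a running cluster, calling `analyze_text '高级项目经理'` would show the per-token offsets for comparison against the highlighted span.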

Environment
