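Pinyin first-letter query problem: no results when the first letter of the second character's pinyin is the final (vowel) of the first character's pinyin #293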

@veynor-guo

{
"settings":{
"number_of_shards":3,
"number_of_replicas":1,
"default_pipeline":"biz_timestamp_pipeline",
"analysis":{
"analyzer":{
"pinyin_analyzer":{
"tokenizer":"my_pinyin"
}
},
"tokenizer":{
"my_pinyin":{
"type":"pinyin",
"keep_separate_first_letter":true,
"keep_full_pinyin":true,
"keep_joined_full_pinyin":false,
"keep_original":true,
"limit_first_letter_length":16,
"lowercase":true,
"remove_duplicated_term":true,
"ignore_pinyin_offset":false
}
}
}
},
"mappings":{
"properties":{
"vendorName":{
"type":"text",
"analyzer":"pinyin_analyzer",
"search_analyzer":"pinyin_analyzer",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
}

Example 1:
Chinese text: 刘德华阿里巴巴
Analysis result:
{
"tokens": [
{
"token": "l",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "liu",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "刘德华阿里巴巴",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 0
},
{
"token": "ldhalbb",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 0
},
{
"token": "d",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "de",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "h",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "hua",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "a",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 3
},
{
"token": "li",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 4
},
{
"token": "b",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 5
},
{
"token": "ba",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 5
}
]
}

Query:
{
"query": {
"match_phrase": {
"vendorName": {
"query": "ldha"
}
}
}
}

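The analysis result contains the first letters l, d, h, and a, yet the query returns no results. It looks as if the first letter a of "阿" is affected by the a in "华" (hua), so nothing matches.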
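A simplified model of phrase matching may help narrow this down. The sketch below is not Elasticsearch's implementation; it only checks whether a list of query terms occurs at consecutive positions among the tokens listed in Example 1. The names `EXAMPLE_1_TOKENS`, `build_positions`, and `phrase_matches` are made up for illustration, and the `l, d, ha` split is a hypothetical example of how a search analyzer might group the letters, not something the output above confirms.

```python
from collections import defaultdict

# (token, position) pairs copied from the Example 1 analysis output.
EXAMPLE_1_TOKENS = [
    ("l", 0), ("liu", 0), ("刘德华阿里巴巴", 0), ("ldhalbb", 0),
    ("d", 1), ("de", 1), ("h", 2), ("hua", 2),
    ("a", 3), ("li", 4), ("b", 5), ("ba", 5),
]


def build_positions(tokens):
    """Map each term to the set of positions at which it was indexed."""
    positions = defaultdict(set)
    for term, position in tokens:
        positions[term].add(position)
    return dict(positions)


def phrase_matches(positions, query_terms):
    """Return True if the query terms line up with indexed positions (slop 0).

    query_terms is a list of (term, relative_position) pairs, i.e. what an
    analyzer would emit for the query string.
    """
    if not query_terms:
        return False
    first_term, first_pos = query_terms[0]
    for start in positions.get(first_term, set()):
        if all(start + (pos - first_pos) in positions.get(term, set())
               for term, pos in query_terms[1:]):
            return True
    return False


index = build_positions(EXAMPLE_1_TOKENS)

# One term per letter, as the issue expects "ldha" to be analyzed.
per_letter = [("l", 0), ("d", 1), ("h", 2), ("a", 3)]
# Hypothetical alternative split (an assumption, not taken from the issue).
grouped = [("l", 0), ("d", 1), ("ha", 2)]

print(phrase_matches(index, per_letter))  # True
print(phrase_matches(index, grouped))     # False
```

Under this model the four single letters do line up at positions 0 to 3, so the indexed tokens themselves are not the obstacle. It may be worth running the `_analyze` API with `pinyin_analyzer` on the text `ldha` to see which terms the query side actually produces.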

Example 2:
Chinese text: 深圳健安医药有限公司
Analysis result:
{
"tokens": [
{
"token": "s",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "shen",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "深圳健安医药有限公司",
"start_offset": 0,
"end_offset": 10,
"type": "word",
"position": 0
},
{
"token": "szjayyyxgs",
"start_offset": 0,
"end_offset": 10,
"type": "word",
"position": 0
},
{
"token": "z",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "zhen",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "j",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "jian",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "a",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 3
},
{
"token": "an",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 3
},
{
"token": "y",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 4
},
{
"token": "yi",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 4
},
{
"token": "yao",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 5
},
{
"token": "you",
"start_offset": 6,
"end_offset": 7,
"type": "word",
"position": 6
},
{
"token": "x",
"start_offset": 7,
"end_offset": 8,
"type": "word",
"position": 7
},
{
"token": "xian",
"start_offset": 7,
"end_offset": 8,
"type": "word",
"position": 7
},
{
"token": "g",
"start_offset": 8,
"end_offset": 9,
"type": "word",
"position": 8
},
{
"token": "gong",
"start_offset": 8,
"end_offset": 9,
"type": "word",
"position": 8
},
{
"token": "si",
"start_offset": 9,
"end_offset": 10,
"type": "word",
"position": 9
}
]
}

Query:
{
"query": {
"match_phrase": {
"vendorName": {
"query": "szja"
}
}
}
}

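The analysis result contains the first letters s, z, j, and a, yet the query returns no results. It looks as if the first letter a of "安" is affected by the a in "健" (jian), so nothing matches.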

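Other Chinese text behaves the same way. For example, 深圳恩 cannot be found with sze either: the first letter e of 恩 seems to be affected by the e in 深 (shen).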
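To see how the three failing query strings are tokenized on the search side, one option is to send each of them to the `_analyze` API using the same analyzer. The sketch below only builds the request path and body and does not contact a cluster. The index name `vendor_index` and the helper `analyze_request` are assumptions, since the issue does not name the index.

```python
import json

INDEX_NAME = "vendor_index"  # hypothetical; the issue does not name the index


def analyze_request(text, analyzer="pinyin_analyzer", index=INDEX_NAME):
    """Build the path and JSON body for an index-scoped _analyze call."""
    path = f"/{index}/_analyze"
    body = {"analyzer": analyzer, "text": text}
    return path, body


# The three query strings reported as returning no results.
for query in ["ldha", "szja", "sze"]:
    path, body = analyze_request(query)
    print("POST", path)
    print(json.dumps(body, ensure_ascii=False))
```

Comparing the returned terms and positions with the indexed tokens shown above should indicate whether the query string is being split one letter per term or grouped differently.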

I have tried adjusting many parameters and still cannot solve this. Can anyone help?
