Description
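IK does not tokenize according to the main.dic dictionary. For example, 创立 ("to found") is already in the dictionary, but ik_smart does not emit it as a token.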
Steps to reproduce
POST _analyze
{
"analyzer": "ik_smart",
"text": "什么时候创立了公司?"
}
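For anyone reproducing this outside Kibana, the same request can be sketched in Python using only the standard library. The base URL `http://localhost:9200` and the helper names are assumptions for illustration, not part of the report. Running `ik_max_word` (the other analyzer shipped with IK) alongside `ik_smart` may help show whether 创立 is matched in the dictionary at all.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text):
    """Build (but do not send) a POST _analyze request for the given analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_analyze(base_url, analyzer, text):
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(base_url, analyzer, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compare_analyzers(base_url, text):
    """Print the tokens produced by ik_smart and ik_max_word for the same text."""
    for analyzer in ("ik_smart", "ik_max_word"):
        tokens = run_analyze(base_url, analyzer, text)["tokens"]
        print(analyzer, [t["token"] for t in tokens])
```

Calling `compare_analyzers("http://localhost:9200", "什么时候创立了公司?")` against a node with the IK plugin installed would print both token lists; the node address is an assumption and should be adjusted to the actual cluster.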
Actual result
{
"tokens" : [
{
"token" : "什么时候",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "创",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "立了",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "公司",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
了 is a stop word, so it is unclear why the token "立了" is produced.
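The mismatch can be checked mechanically. The following is a minimal sketch with hypothetical helper names (`token_texts`, `covering_tokens`) that are not part of IK or Elasticsearch. It takes the actual response reported above and lists which tokens cover the character span where 创立 occurs. It assumes the reported offsets line up with Python string indices, which holds for this text.

```python
def token_texts(response):
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in response["tokens"]]


def covering_tokens(response, start, end):
    """Return the tokens whose offsets overlap the half-open span [start, end)."""
    return [
        t["token"]
        for t in response["tokens"]
        if t["start_offset"] < end and t["end_offset"] > start
    ]


# Actual ik_smart response from this report (type and position omitted).
actual = {
    "tokens": [
        {"token": "什么时候", "start_offset": 0, "end_offset": 4},
        {"token": "创", "start_offset": 4, "end_offset": 5},
        {"token": "立了", "start_offset": 5, "end_offset": 7},
        {"token": "公司", "start_offset": 7, "end_offset": 9},
    ]
}

text = "什么时候创立了公司?"
word = "创立"
start = text.find(word)   # 4
end = start + len(word)   # 6

print(word in token_texts(actual))          # False
print(covering_tokens(actual, start, end))  # ['创', '立了']
```

The output shows that the span of 创立 is split between "创" and "立了", which is the behavior being reported.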
Expected behavior
{
"tokens" : [
{
"token" : "什么时候",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "创立",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "公司",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
Environment
- Versions: 7.13.4