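IK does not tokenize according to the main.dic dictionary: for example, 创立 is already in the dictionary, but ik_smart does not produce it as a token #1060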

@jiankunking

Description

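IK does not tokenize according to the main.dic dictionary. For example, 创立 is already in the dictionary, but ik_smart does not produce it as a token.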

Steps to reproduce

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "什么时候创立了公司?"
}

Tokenization result

{
  "tokens" : [
    {
      "token" : "什么时候",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "创",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "立了",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "公司",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

了 is a stop word, so I don't know why "立了" is produced as a token.

Expected behavior

{
  "tokens" : [
    {
      "token" : "什么时候",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "创立",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "公司",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
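
To make the comparison between the actual and expected output repeatable, the check can be scripted. The following is only a minimal sketch: it assumes an Elasticsearch node with the IK plugin reachable at http://localhost:9200 without authentication, and the helper names `analyze` and `missing_words` are hypothetical, not part of IK or Elasticsearch.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated node; adjust to your cluster.
ES_URL = "http://localhost:9200"


def analyze(text, analyzer="ik_smart", es_url=ES_URL):
    """POST the text to the _analyze API and return the list of token dicts."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        es_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tokens"]


def missing_words(tokens, expected_words):
    """Return the expected words that were not emitted as a token."""
    produced = {t["token"] for t in tokens}
    return [w for w in expected_words if w not in produced]
```

With the setup described in this issue, `missing_words(analyze("什么时候创立了公司?"), ["创立"])` would be expected to return `['创立']` for as long as the problem persists, and an empty list once 创立 is emitted as its own token.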

Environment

  • Versions: 7.13.4
