Add support for Japanese #39

@KevinDanikowski

Description

Japanese is not currently supported. It is a popular enough language that I think it should be.

Current behavior is to guess that Japanese text is English, apparently because Japanese characters are not recognized: they belong to a character set that none of the supported languages use.

Sample: "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵"

Result:

[
  [ 'english', 0.030795454545454626 ],
  [ 'somali', 0.026553030303030245 ],
  [ 'estonian', 0.021590909090909105 ],
  [ 'hungarian', 0.021098484848484755 ],
  [ 'danish', 0.019962121212121264 ],
  [ 'albanian', 0.019053030303030183 ],
  [ 'hawaiian', 0.015946969696969737 ],
  [ 'french', 0.015643939393939377 ],
  [ 'latin', 0.015606060606060623 ],
  [ 'german', 0.015454545454545388 ],
  [ 'hausa', 0.01435606060606065 ],
  [ 'swedish', 0.012575757575757462 ],
  [ 'welsh', 0.011325757575757489 ],
  [ 'portuguese', 0.010909090909090868 ],
  [ 'czech', 0.010833333333333361 ],
  [ 'spanish', 0.010492424242424137 ],
  [ 'latvian', 0.01041666666666663 ],
  [ 'swahili', 0.010227272727272751 ],
  [ 'norwegian', 0.009356060606060645 ],
  [ 'pidgin', 0.00920454545454541 ],
  [ 'vietnamese', 0.007348484848484826 ],
  [ 'dutch', 0.006212121212121224 ],
  [ 'icelandic', 0.005113636363636487 ],
  [ 'indonesian', 0.003901515151515156 ],
  [ 'lithuanian', 0.0012499999999999734 ]
]
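
One possible approach, sketched here only as an illustration: check which Unicode script the input is written in before falling back to the existing scoring. The function name `looksJapanese` and the `threshold` parameter are hypothetical and not part of this library's API. The sketch assumes a runtime that supports Unicode property escapes in regular expressions (Node.js 10 or later).

```javascript
// Hypothetical helper, not part of the library.
// Returns true when a large enough share of the letters in `text`
// are hiragana or katakana, which strongly suggests Japanese.
function looksJapanese(text, threshold = 0.2) {
  // Keep letters only, so digits, spaces and punctuation do not skew the ratio.
  const letters = Array.from(text).filter((ch) => /\p{L}/u.test(ch));
  if (letters.length === 0) return false;

  // Kanji alone is ambiguous (shared with Chinese), so only kana is counted.
  const kana = letters.filter((ch) =>
    /[\p{Script=Hiragana}\p{Script=Katakana}]/u.test(ch)
  );
  return kana.length / letters.length >= threshold;
}

console.log(looksJapanese("これは日本語です"));
console.log(looksJapanese("Sherlock Holmes is a fictional detective"));
```

A check like this could run before the current detection and return Japanese directly when it matches. The threshold value is an arbitrary starting point and would need tuning on real text.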
