Japanese is not currently supported. It is a widely used language, so I think support should be added.
At present, Japanese input is guessed to be English, apparently because Japanese characters are not recognized; the language uses its own character set.
Sample: "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵"
Result:
[
[ 'english', 0.030795454545454626 ],
[ 'somali', 0.026553030303030245 ],
[ 'estonian', 0.021590909090909105 ],
[ 'hungarian', 0.021098484848484755 ],
[ 'danish', 0.019962121212121264 ],
[ 'albanian', 0.019053030303030183 ],
[ 'hawaiian', 0.015946969696969737 ],
[ 'french', 0.015643939393939377 ],
[ 'latin', 0.015606060606060623 ],
[ 'german', 0.015454545454545388 ],
[ 'hausa', 0.01435606060606065 ],
[ 'swedish', 0.012575757575757462 ],
[ 'welsh', 0.011325757575757489 ],
[ 'portuguese', 0.010909090909090868 ],
[ 'czech', 0.010833333333333361 ],
[ 'spanish', 0.010492424242424137 ],
[ 'latvian', 0.01041666666666663 ],
[ 'swahili', 0.010227272727272751 ],
[ 'norwegian', 0.009356060606060645 ],
[ 'pidgin', 0.00920454545454541 ],
[ 'vietnamese', 0.007348484848484826 ],
[ 'dutch', 0.006212121212121224 ],
[ 'icelandic', 0.005113636363636487 ],
[ 'indonesian', 0.003901515151515156 ],
[ 'lithuanian', 0.0012499999999999734 ]
]
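Until Japanese is supported, one possible workaround is to check the script of the input before calling the detector. The following is only a minimal sketch: `looksJapanese` and its `threshold` parameter are hypothetical names and not part of the library. It assumes that kana (Hiragana and Katakana) are a strong enough signal for Japanese, and treats Han ideographs as supporting evidence only, since they are shared with Chinese.

```javascript
// Hypothetical pre-check, not part of any library API.
// Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) are used almost
// exclusively for Japanese. CJK ideographs (U+4E00-U+9FFF) are shared with
// Chinese, so they count toward the ratio but are not sufficient alone.
const KANA = /[\u3040-\u309F\u30A0-\u30FF]/;
const KANA_OR_HAN = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/;

function looksJapanese(text, threshold = 0.3) {
  // Iterate by code point and ignore whitespace.
  const chars = Array.from(text).filter((ch) => !/\s/.test(ch));
  if (chars.length === 0) return false;

  // Require at least one kana character, then check how much of the
  // text is written in kana or Han ideographs.
  const hasKana = chars.some((ch) => KANA.test(ch));
  const japaneseCount = chars.filter((ch) => KANA_OR_HAN.test(ch)).length;

  return hasKana && japaneseCount / chars.length >= threshold;
}

console.log(looksJapanese("シャーロック・ホームズは架空の探偵"));
console.log(looksJapanese("Sherlock Holmes is a fictional detective"));
```

A caller could return something like "japanese" when this check passes and fall back to the existing detection otherwise. The 0.3 threshold is an arbitrary assumption and would need tuning for mixed-language text.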