non ascii phrases aren't correctly determined #6

Tarnak-public · 2021-06-04T22:08:16Z

When using a custom model with non-English phrases (specifically Polish words with accented characters), I had problems getting is_spam() to classify texts correctly.
As a workaround, I applied an accent remover both during training and when checking (code: https://gist.github.com/AdoHaha/a76157c6de5155bf6b0adc77988724d9), which works great.
So, could you add a normalization parameter to the code, or fix the handling of accents some other way?
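
For illustration, here is a minimal sketch of the kind of accent removal described above. It is not the code from the linked gist and not part of this library; the helper name `strip_accents` and the extra mapping for `ł`/`Ł` are assumptions made for the example. The idea is to run the same function on every training text and on every text passed to is_spam(), so both sides see identical ASCII-folded input.

```python
import unicodedata

# Assumption: "ł"/"Ł" have no Unicode decomposition, so NFKD alone
# leaves them untouched; map them explicitly before normalizing.
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def strip_accents(text: str) -> str:
    """Return text with diacritics removed (hypothetical helper)."""
    # Split each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text.translate(_NO_DECOMPOSITION))
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("zażółć gęślą jaźń"))
```

The same normalization has to be applied at training time and at check time; normalizing only one side would make the mismatch worse rather than better.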
