Looks like langdetect is getting fooled by bytes #94

Open
Fratso opened this issue Oct 11, 2021 · 0 comments

Hi,
I tried to use langdetect as a plaintext detector, to check whether it could tell an English sentence apart from a random deciphered string.

Here's an example:

>>> from langdetect import detect
>>> from langdetect import detect_langs

>>> deciphered_string = b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
>>> deciphered_string.decode("utf-8")
'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'

>>> detect_langs(deciphered_string.decode("utf-8"))
[en:0.999994546875217]
>>> detect(deciphered_string.decode("utf-8"))
'en'

I expected the function to raise an error rather than return a wrong result.
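As a possible workaround, here is a minimal sketch of a pre-check that rejects strings containing control characters before they ever reach langdetect. The helper name `looks_like_plaintext` and its `max_control_ratio` parameter are hypothetical and not part of langdetect; the check relies only on the standard library and assumes that genuine plaintext contains no control characters other than tab, newline, and carriage return.

```python
import unicodedata

# Whitespace control characters that are normal in plaintext.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def looks_like_plaintext(text: str, max_control_ratio: float = 0.0) -> bool:
    """Hypothetical helper: True if the share of control characters
    (Unicode category "Cc", excluding tab/newline/carriage return)
    does not exceed max_control_ratio."""
    if not text:
        return False
    control_count = sum(
        1
        for ch in text
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
    )
    return control_count / len(text) <= max_control_ratio


deciphered_string = (
    b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP'
    b'\x04R\x07TRT\x02\x04WSVPQRS'
)

# The string from this report contains \x00, \x02, \x04, \x05 and \x07.
print(looks_like_plaintext(deciphered_string.decode("utf-8")))  # False
print(looks_like_plaintext("This is an English sentence."))     # True
```

Calling `detect` or `detect_langs` only when this check passes would filter out the example above, although it would not catch wrong decryptions that happen to consist solely of printable characters.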
