Wrong detection for simple string "nbs" - U_FILE_ACCESS_ERROR (ArgumentError) #182

Open
iuri-gg opened this issue Jul 20, 2024 · 0 comments


iuri-gg commented Jul 20, 2024

When I run CharlockHolmes::EncodingDetector.detect "nbs", I get a high-confidence detection (confidence 75), but the result is wrong: {:type=>:text, :encoding=>"IBM420_ltr", :ruby_encoding=>"binary", :confidence=>75, :language=>"ar"}.

Moreover, when I try to convert that string to UTF-8, I get a U_FILE_ACCESS_ERROR (ArgumentError). The code is below:

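require "charlock_holmes"
input = "nbs"
encoding = Encoding::UTF_8
# Detection reports IBM420_ltr with confidence 75 for this input
detection = CharlockHolmes::EncodingDetector.detect(input)
# This call raises U_FILE_ACCESS_ERROR (ArgumentError)
CharlockHolmes::Converter.convert(input, detection[:encoding], encoding.to_s)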

I am using Ruby 3.3.4 and charlock_holmes gem version 0.7.8.
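
As a possible stopgap, here is a minimal sketch of a guard that could sit in front of the converter. It is not a fix for the detector and not part of charlock_holmes: pick_source_encoding and MIN_DETECTION_BYTES are hypothetical names, and the 16-byte threshold is an assumption chosen only for illustration. The idea is that a pure ASCII or already valid UTF-8 string needs no detection at all, and that a statistical guess on a very short input is probably not worth trusting.

```ruby
# Hypothetical helper, not part of charlock_holmes: decide which source
# encoding (if any) to hand to the converter for a given input.
MIN_DETECTION_BYTES = 16 # assumed threshold; tune for your data

def pick_source_encoding(input, detection)
  # Pure ASCII is already valid UTF-8, so detection is unnecessary.
  return "UTF-8" if input.ascii_only?

  # Bytes that already form valid UTF-8 need no conversion either.
  return "UTF-8" if input.dup.force_encoding(Encoding::UTF_8).valid_encoding?

  # Assumption: a guess made on very short input is not trusted.
  return nil if input.bytesize < MIN_DETECTION_BYTES

  # The detector returned nothing usable.
  return nil if detection.nil? || detection[:encoding].nil?

  detection[:encoding]
end

# The detection hash reported above for "nbs".
detection = { type: :text, encoding: "IBM420_ltr", ruby_encoding: "binary",
              confidence: 75, language: "ar" }
source = pick_source_encoding("nbs", detection)
```

With this guard, the "nbs" case would skip the IBM420_ltr result because the string is ASCII-only. A nil return means the caller has to decide what to do with bytes whose encoding could not be determined.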

Have I come across a bug, or am I using it wrong?
