Provide an option for charset #1570

Open
fluviusmagnus opened this issue Nov 13, 2024 · 4 comments

Comments

@fluviusmagnus

Is your feature request related to a problem? Please describe.
Unfortunately, not all websites use UTF-8. For example, Zeno is still written in ISO-8859-1. The same problem has already been mentioned in several earlier issues (such as #129 and #1317).

Describe the solution you'd like
Detecting the charset of the page, or just providing an option to specify it manually, would solve the problem permanently.

@dteviot
Owner

dteviot commented Nov 13, 2024

@fluviusmagnus

I am aware of the problem.

> Detect the charset of the page

Unfortunately, this is not an easy thing to do. Yes, I know browsers do this, but last I looked they don't provide an API or similar to access this functionality. If you can find a way to do this, I will be very happy to add this ability.
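For illustration only, here is a rough sketch of a partial workaround: instead of true detection, read the charset the page declares about itself (the `Content-Type` header, then a `<meta>` tag) and pass that label to the standard `TextDecoder`. It assumes the raw response bytes and the header value are available. All function names are hypothetical and none of this is WebToEpub's actual code. A page with a missing or wrong declaration would still decode incorrectly.

```typescript
// Hypothetical helpers; a sketch, not WebToEpub's implementation.

/** Pull a charset label out of a Content-Type header value, if present. */
function charsetFromContentType(contentType: string | null): string | null {
    if (contentType === null) {
        return null;
    }
    const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType);
    return match ? match[1].toLowerCase() : null;
}

/** Look for <meta charset=...> (or the http-equiv form) near the start of the page. */
function charsetFromMeta(bytes: Uint8Array): string | null {
    // The tags that declare a charset are ASCII, so mapping each byte to a
    // char code is enough to scan the first 1024 bytes.
    let head = "";
    const limit = Math.min(bytes.length, 1024);
    for (let i = 0; i < limit; i++) {
        head += String.fromCharCode(bytes[i]);
    }
    const match = /<meta[^>]*charset\s*=\s*["']?([^\s"'\/>;]+)/i.exec(head);
    return match ? match[1].toLowerCase() : null;
}

/** Header wins over the meta tag; UTF-8 is the assumed default. */
function sniffCharset(bytes: Uint8Array, contentType: string | null): string {
    return charsetFromContentType(contentType) || charsetFromMeta(bytes) || "utf-8";
}

function decodeHtml(bytes: Uint8Array, contentType: string | null): string {
    const label = sniffCharset(bytes, contentType);
    try {
        return new TextDecoder(label).decode(bytes);
    } catch (err) {
        // TextDecoder throws a RangeError for labels it does not know.
        return new TextDecoder("utf-8").decode(bytes);
    }
}
```

With `fetch`, the bytes would come from `response.arrayBuffer()` and the header from `response.headers.get("content-type")`.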

> an option to specify it manually

I have considered this as well. The problem is, most users seem unable to grasp CSS. I suspect that explaining charsets to them would be impossible.

That said, I'm prepared for you to convince me that I'm wrong.

@fluviusmagnus
Author

@dteviot

Yes, I agree that it could sometimes be very confusing to many users.

But as a compromise, maybe hiding this option under 'Advanced Options' would be more acceptable (at the expense of a working 'Test' workflow)?

@dteviot
Owner

dteviot commented Nov 14, 2024

@fluviusmagnus

> at the expense of a working 'Test' workflow

I don't understand. Can you expand on this?

I'll add

  1. WebToEpub DOES handle sites that don't use UTF-8 (mostly the assorted Chinese charsets). It's just that I have to add some code to the parser for each site.
  2. The Advanced Settings apply to ALL sites. You'd really want to set the charset on a per-site basis, so it would be a field on the default parser.
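To make the per-site idea concrete, here is a minimal sketch of a charset setting keyed by hostname, with UTF-8 as the fallback. Everything in it is an assumption for illustration: the `CharsetOverrides` type, the function names, and the `legacy.example.org` hostname are hypothetical and do not reflect how WebToEpub stores default parser settings.

```typescript
// Hypothetical per-site settings; names and hostnames are illustrative only.
type CharsetOverrides = Map<string, string>;

/** Return the charset configured for the URL's hostname, or the fallback. */
function charsetForUrl(
    url: string,
    overrides: CharsetOverrides,
    fallback: string = "utf-8"
): string {
    const hostname = new URL(url).hostname.toLowerCase();
    return overrides.get(hostname) || fallback;
}

/** Decode raw page bytes using the charset configured for that site. */
function decodePage(bytes: Uint8Array, url: string, overrides: CharsetOverrides): string {
    return new TextDecoder(charsetForUrl(url, overrides)).decode(bytes);
}

// Example: one site is known to serve ISO-8859-1; every other site uses UTF-8.
const overrides: CharsetOverrides = new Map([["legacy.example.org", "iso-8859-1"]]);
```

Keying on the hostname keeps the setting out of the global options, so a value entered for one site cannot garble pages from another.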

@fluviusmagnus
Author

@dteviot

Sorry for the ambiguity. I WAS talking about the default parser. But all I had in mind then was finding a place to show this option exclusively to advanced users. If it’s not on the default parser page, one must move on to the next step even if the test result looks weird.

But now that I realize the default parser is already aimed at advanced users, I think a field on the default parser page would be great, and quite logical. Thank you for mentioning that.
