Provide text with UTF-8 MIME Type by default #970

Open: wants to merge 3 commits into base branch next


ArneBab (Contributor) commented Jul 25, 2024

Providing text with a UTF-8 MIME type by default avoids very common text encoding problems.
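To show what "UTF-8 by default" means for the MIME type, here is a minimal sketch. It assumes a hypothetical helper, `withDefaultCharset`, which is not the project's actual code. The helper appends `charset=UTF-8` to a `text/*` type only when no charset parameter is already declared.

```java
import java.util.Locale;

public class DefaultCharset {
    // Hypothetical helper (not the project's real API): adds "; charset=UTF-8"
    // to text/* MIME types that do not already declare a charset parameter.
    static String withDefaultCharset(String mimeType) {
        String lower = mimeType.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("text/")) {
            return mimeType; // leave non-text types untouched
        }
        // Look at each parameter after the first ';' for an explicit charset.
        String[] parts = lower.split(";");
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].trim().startsWith("charset=")) {
                return mimeType; // respect an explicitly declared charset
            }
        }
        return mimeType + "; charset=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withDefaultCharset("text/plain"));                     // text/plain; charset=UTF-8
        System.out.println(withDefaultCharset("text/plain; charset=ISO-8859-1")); // unchanged
        System.out.println(withDefaultCharset("image/png"));                      // unchanged
    }
}
```

The sketch deliberately never overrides an explicit charset. It also does not handle unusual spacing such as `charset = x`, which a real MIME parser would need to accept.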

Bombe (Contributor) commented Jul 25, 2024

You have not avoided the very common test writing problem! 😄

ArneBab (Contributor, Author) commented Sep 22, 2024

> You have not avoided the very common test writing problem! 😄

I had also not avoided the very common "my change does not have any effect and a test would have shown that" problem 😓

Now it’s fixed: our plain text filter actually detects the charset from the BOM and uses UTF-8 by default.
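Detecting a charset from a real byte order mark only requires inspecting the first few bytes of the input. The following is a minimal sketch that assumes a hypothetical `BomSniffer.detectFromBom`, which is not the filter's real code. It recognizes the UTF-8 and UTF-16 BOMs and falls back to UTF-8 otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Hypothetical sketch: choose a charset from a leading byte order mark,
    // falling back to UTF-8 when no BOM is present.
    // UTF-32 BOMs are omitted for brevity. A full version would have to test
    // for the UTF-32LE BOM (FF FE 00 00) before UTF-16LE (FF FE).
    static Charset detectFromBom(byte[] buf, int length) {
        if (length >= 3 && (buf[0] & 0xFF) == 0xEF && (buf[1] & 0xFF) == 0xBB && (buf[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (length >= 2 && (buf[0] & 0xFF) == 0xFE && (buf[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (length >= 2 && (buf[0] & 0xFF) == 0xFF && (buf[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8; // no BOM: default to UTF-8
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] plain = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectFromBom(utf16le, utf16le.length)); // UTF-16LE
        System.out.println(detectFromBom(plain, plain.length));     // UTF-8
    }
}
```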

```java
if (handler.takesACharset && ((charset == null) || (charset.isEmpty()))) {
    byte[] charsetBuffer = new byte[CHARSET_DETECTION_FALLBACK_BUFFERSIZE];
    int offset = readIntoBuffer(input, CHARSET_DETECTION_FALLBACK_BUFFERSIZE, charsetBuffer);
    BOMDetection bom = CSSReadFilter.detectCharsetFromBOM(charsetBuffer, CHARSET_DETECTION_FALLBACK_BUFFERSIZE);
    // ... (remainder of the change not shown in this review excerpt)
}
```
Review comment (Contributor):

I’m pretty sure this is 100% wrong. That method detects an encoding from the representation of the string `@charset`. It is also gloriously misnamed, as it has nothing to do with a BOM. 😀
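For comparison, a CSS-style `@charset` sniff reads the text of the rule itself rather than looking for a BOM. The sketch below follows the ASCII-compatible case described in CSS Syntax Level 3: the input starts with the bytes of `@charset "`, followed by the encoding label and `";`, and only the first 1024 bytes are examined. The class and method names are hypothetical. The sketch does not attempt to reproduce whatever other byte patterns `CSSReadFilter.detectCharsetFromBOM` may check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class CssCharsetSniffer {
    private static final byte[] PREFIX = "@charset \"".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical sketch: returns the label from a leading @charset "<label>"; rule.
    // Only the ASCII-compatible form is handled, within the first 1024 bytes.
    static Optional<String> detectFromCharsetRule(byte[] buf, int length) {
        int limit = Math.min(length, 1024);
        if (limit < PREFIX.length) return Optional.empty();
        for (int i = 0; i < PREFIX.length; i++) {
            if (buf[i] != PREFIX[i]) return Optional.empty(); // not an @charset rule
        }
        StringBuilder label = new StringBuilder();
        for (int i = PREFIX.length; i + 1 < limit; i++) {
            int b = buf[i] & 0xFF;
            if (b == '"') {
                // The closing quote must be followed by ';' for the rule to count.
                return buf[i + 1] == ';' ? Optional.of(label.toString()) : Optional.empty();
            }
            if (b > 0x7F) return Optional.empty(); // label bytes must be ASCII
            label.append((char) b);
        }
        return Optional.empty(); // no closing quote within the limit
    }

    public static void main(String[] args) {
        byte[] css = "@charset \"ISO-8859-1\"; body {}".getBytes(StandardCharsets.US_ASCII);
        byte[] bomOnly = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'p'};
        System.out.println(detectFromCharsetRule(css, css.length));         // Optional[ISO-8859-1]
        System.out.println(detectFromCharsetRule(bomOnly, bomOnly.length)); // Optional.empty
    }
}
```

A UTF-8 BOM on its own yields no result here. This is the distinction the review comment draws between `@charset` detection and BOM detection.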

2 participants