fix: correct Chinese and special characters display in HTML renderer #305

wangyinyuan · 2025-02-20T06:29:44Z

Issue Description

The current HTML renderer does not display non-ASCII characters (such as Chinese, Japanese, or Korean) correctly. There are two reasons:

The original code only applies character encoding handling when a charset is explicitly specified in the data URL.
When the data URL has no explicit charset, the decoding step is skipped entirely, which produces garbled characters.

Root Cause

The issue occurs because Base64-encoded HTML content needs character encoding handling whether or not the charset is explicitly specified. When atob() decodes Base64 content, it returns a binary string in which each character stands for one byte (code units 0 to 255, as in Latin-1). Those bytes must then be decoded with the correct charset to recover the original text. For more information about Base64, see the MDN documentation.
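The effect can be reproduced outside the renderer. The snippet below is a minimal sketch, not code from this PR: the variable names are illustrative, and it assumes an environment where atob, btoa, TextEncoder, and TextDecoder are available as globals (browsers and recent Node.js).

```javascript
// Illustrative setup: Base64-encode the UTF-8 bytes of a short Chinese string.
const original = "你好";
const utf8Bytes = new TextEncoder().encode(original);
const base64 = btoa(String.fromCharCode(...utf8Bytes));

// atob() yields one character per byte, so each 3-byte UTF-8 sequence
// appears as three separate Latin-1 characters (garbled if shown as-is).
const binary = atob(base64);

// Turn the binary string back into bytes, then decode them as UTF-8.
const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
const decoded = new TextDecoder("utf-8").decode(bytes);

console.log(binary.length);        // number of bytes, not number of characters
console.log(decoded === original); // the text survives the round trip
```

Displaying `binary` directly is what produces the garbled output described above; only the final TextDecoder step restores the original characters.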

Changes Made

// Before
if (encoding) {
  const buffer = new Uint8Array(body.length);
  for (let i = 0; i < body.length; i++) buffer[i] = body.charCodeAt(i);
  body = new TextDecoder(encoding).decode(buffer);
}

// After
// Always handle encoding with utf-8 as fallback
encoding = charset || "utf-8";
const buffer = Uint8Array.from(body, (c) => c.charCodeAt(0));
body = new TextDecoder(encoding).decode(buffer);

Key Improvements

Always perform character encoding conversion, not just when charset is specified
Use "utf-8" as fallback encoding when charset is not specified
Use Uint8Array.from() for more concise buffer creation
Ensure consistent handling of all non-ASCII characters
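As a rough illustration of the behavior listed above, the decoding step can be wrapped in a small function. This is a sketch under assumptions: `decodeBase64Html` and `toBase64Utf8` are hypothetical names used only for this example and do not appear in the renderer's source.

```javascript
// Hypothetical helper mirroring the "After" logic: always decode,
// falling back to UTF-8 when no charset is supplied.
function decodeBase64Html(base64Body, charset) {
  const binary = atob(base64Body);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder(charset || "utf-8").decode(bytes);
}

// Demo-only helper: Base64-encode the UTF-8 bytes of a string.
function toBase64Utf8(text) {
  let binary = "";
  for (const byte of new TextEncoder().encode(text)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

const html = "<p>中文 and ASCII</p>";
const withCharset = decodeBase64Html(toBase64Utf8(html), "utf-8");
const withoutCharset = decodeBase64Html(toBase64Utf8(html), undefined);

console.log(withCharset === html, withoutCharset === html);
```

Note that TextDecoder throws a RangeError for an unrecognized charset label, so a caller that accepts arbitrary data URLs may want to catch that case; whether the renderer does so is not stated in this PR.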

Testing

Tested with HTML files containing:

Chinese characters
Mixed ASCII and non-ASCII content

All characters now display correctly regardless of whether charset is explicitly specified in the data URL.
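A check along these lines could be scripted as follows. This is only a sketch: `renderDataUrl` is a hypothetical stand-in for the renderer's data URL handling, and the regular expressions are a simplified assumption about how the charset parameter is extracted, not the project's actual parsing code.

```javascript
// Hypothetical stand-in for the renderer: parse a Base64 data URL,
// pick up an optional charset parameter, and decode the payload.
function renderDataUrl(url) {
  const match = /^data:([^,]*?);base64,(.*)$/.exec(url);
  if (!match) throw new Error("not a Base64 data URL");
  const charsetMatch = /;\s*charset=([^;]+)/i.exec(match[1]);
  const charset = charsetMatch ? charsetMatch[1].trim() : undefined;
  const bytes = Uint8Array.from(atob(match[2]), (c) => c.charCodeAt(0));
  return new TextDecoder(charset || "utf-8").decode(bytes);
}

// Mixed ASCII and non-ASCII sample content.
const sample = "<h1>你好, world</h1>";
const payload = btoa(String.fromCharCode(...new TextEncoder().encode(sample)));

const results = [
  renderDataUrl(`data:text/html;base64,${payload}`),
  renderDataUrl(`data:text/html;charset=utf-8;base64,${payload}`),
];

console.log(results.every((text) => text === sample));
```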

fix: correct Chinese and special characters display in HTML renderer

c358c9d
