Output HTML contains NULL characters in at least CJK languages #9985

Open
2 of 7 tasks
tats-u opened this issue Mar 26, 2024 · 28 comments
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@tats-u
Contributor

tats-u commented Mar 26, 2024

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Docusaurus sometimes contaminates output HTML with NULL characters.
NULL characters confuse some HTML parsers used by document scrapers such as https://github.com/meilisearch/docs-scraper (which uses lxml, a Python library).
They also prevent Windows' copy and paste from copying the complete source code.

Reproducible demo

No response

Steps to reproduce

curl -LsSf https://docusaurus.io/zh-CN/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus-i18n-staging.netlify.app/ja/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'
curl -LsSf https://docusaurus.io/ko/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

  • rg is ripgrep.
  • Perl is used to trim the results.
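For readers without ripgrep or Perl, here is a rough TypeScript equivalent of the pipeline above, offered as a sketch rather than anything from Docusaurus. `highlightNulls` is a hypothetical helper: it keeps only lines containing U+0000, replaces each NUL with `[[NULL]]` (like `rg -r`), and, like the Perl step, trims each line to the 50 characters before the first marker. One assumption to note: JavaScript counts UTF-16 code units rather than code points, which only differs from Perl for characters outside the BMP.

```typescript
// Hypothetical helper (not part of Docusaurus): a rough TypeScript equivalent of
//   rg '\x00' -a -r '[[NULL]]' | perl -C -pe 's/^.+?(.{50})(?=\[\[NULL)/...\1/'
const NULL_MARKER = '[[NULL]]';

function highlightNulls(text: string, context = 50): string[] {
  const results: string[] = [];
  for (const line of text.split('\n')) {
    // ripgrep only prints lines that contain a match.
    if (line.indexOf('\0') === -1) continue;
    // Replace every NUL with a visible marker, like `rg -r '[[NULL]]'`.
    const marked = line.split('\0').join(NULL_MARKER);
    const first = marked.indexOf(NULL_MARKER);
    // Like the Perl step: keep `context` characters before the first marker,
    // trimming only when at least one more character precedes them.
    results.push(
      first > context ? '...' + marked.slice(first - context) : marked,
    );
  }
  return results;
}
```

Assuming Node.js 18+ (global `fetch`), something like `highlightNulls(await (await fetch(url)).text()).forEach((l) => console.log(l))` should print roughly what the shell command prints, minus the colors.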

For your own documents

Write your documents in CJK or possibly other non-Latin languages, then run:

npm run build
rg '\x00' -a -r '[[NULL]]' --color=always -t html build | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Note

Built JS files do not seem to be affected (no NULs are found in them).
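To run the same check over a whole build directory without ripgrep, a small Node script can walk the output folder and count NUL bytes per file. This is a sketch: `findFilesWithNulls` is a hypothetical name, and it scans raw bytes because in UTF-8 the byte 0x00 only ever encodes U+0000, so no decoding is needed. Passing `['.html', '.js']` also covers the JS bundles mentioned in the note above.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

type NullHit = {file: string; count: number};

// Hypothetical helper: recursively find files under `dir` whose extension is in
// `exts` and that contain at least one NUL byte (0x00). In UTF-8 the byte 0x00
// only ever encodes U+0000, so scanning raw bytes is enough.
function findFilesWithNulls(
  dir: string,
  exts: readonly string[] = ['.html'],
): NullHit[] {
  const hits: NullHit[] = [];
  for (const entry of fs.readdirSync(dir, {withFileTypes: true})) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...findFilesWithNulls(fullPath, exts));
    } else if (entry.isFile() && exts.includes(path.extname(entry.name))) {
      const bytes = fs.readFileSync(fullPath);
      let count = 0;
      for (let i = 0; i < bytes.length; i++) {
        if (bytes[i] === 0) count++;
      }
      if (count > 0) hits.push({file: fullPath, count});
    }
  }
  return hits;
}
```

For example, `findFilesWithNulls('build', ['.html', '.js'])` run from the site root should list the affected pages and bundles with their NUL counts.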

Expected behavior

No output (no NULL characters are found).

Actual behavior

🇨🇳

...res"><span itemprop="name">Markdown 特[[NULL]][[NULL]]性</span></a><meta itemprop="position" content="2"></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" itemprop="name">标题和目录</span><meta itemprop="position" content="3"></li></ul></nav><span class="theme-doc-version-badge badge badge--secondary">版本:3.1.1</span><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">本页总览</button></div><div class="theme-doc-markdown markdown"><h1>标题和目录</h1>
...ia-label="链接到 示例小节 1 a III" title="链接[[NULL]][[NULL]]到 示例小节 1 a III">​</a></h4>

🇯🇵

...ocusaurus</b></a><nav aria-label="ドキュ[[NULL]]メントのサイドバー" class="menu thin-scrollbar menu_rWGR menuWithAnnouncementBar_Pf08"><ul class="theme-doc-sidebar-menu menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-1 menu__list-item"><a class="menu__link" href="/ja/docs">はじめに</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/category/getting-started">入門編</a><button aria-label="Collapse sidebar category &#x27;入門編&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/installation">インストール</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/configuration">設定</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/playground">プレイグラウンド</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/typescript-support">TypeScript サポート</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" href="/ja/docs/category/guides">ガイド</a><button aria-label="Collapse sidebar category &#x27;ガイド&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/creating-pages">Pages</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/docs-introduction">ドキュメント</a><button aria-label="Expand sidebar category &#x27;ドキュメント&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/blog">ブログ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist menu__link--active" tabindex="0" href="/ja/docs/markdown-features">マークダウンの機能</a><button aria-label="Collapse sidebar category &#x27;マークダウンの機能&#x27;" aria-expanded="true" type="button" class="clean-btn menu__caret"></button></div><ul style="display:block;overflow:visible;height:auto" class="menu__list"><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/react">MDX and React</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/tabs">Tabs</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/code-blocks"> コードブロック</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/admonitions">注意書き</a></li><li class="theme-doc-sidebar-item-link 
theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link menu__link--active" aria-current="page" tabindex="0" href="/ja/docs/markdown-features/toc">見出しと目次</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/assets">Assets</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/links">Markdown links</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/plugins">MDX Plugins</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/math-equations">数式</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/diagrams">図</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-3 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/markdown-features/head-metadata">Head metadata</a></li></ul></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/styling-layout">Styling and Layout</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/swizzling">スウィズリング(Swizzling)</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/static-assets">静的アセット</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" 
href="/ja/docs/search">検索</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/browser-support">ブラウザ対応</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/seo">SEO</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/using-plugins">プラグインの利用</a></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/deployment">デプロイ</a></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-2 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" tabindex="0" href="/ja/docs/i18n/introduction">国際化 (i18n)</a><button aria-label="Expand sidebar category &#x27;国際化 (i18n)&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-link theme-doc-sidebar-item-link-level-2 menu__list-item"><a class="menu__link" tabindex="0" href="/ja/docs/guides/whats-next">What&#x27;s next?</a></li></ul></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/advanced">上級者向けガイド</a><button aria-label="Expand sidebar category &#x27;上級者向けガイド&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li><li class="theme-doc-sidebar-item-category theme-doc-sidebar-item-category-level-1 menu__list-item menu__list-item--collapsed"><div class="menu__list-item-collapsible"><a class="menu__link menu__link--sublist" href="/ja/docs/migration">Upgrading</a><button 
aria-label="Expand sidebar category &#x27;Upgrading&#x27;" aria-expanded="false" type="button" class="clean-btn menu__caret"></button></div></li></ul></nav><button type="button" title="サ イドバーを隠す" aria-label="サイドバーを隠す" class="button button--secondary button--outline collapseSidebarButton_PUyN"><svg width="20" height="20" aria-hidden="true" class="collapseSidebarButtonIcon_DI0B"><g fill="#7a7a7a"><path d="M9.992 10.023c0 .2-.062.399-.172.547l-4.996 7.492a.982.982 0 01-.828.454H1c-.55 0-1-.453-1-1 0-.2.059-.403.168-.551l4.629-6.942L.168 3.078A.939.939 0 010 2.528c0-.548.45-.997 1-.997h2.996c.352 0 .649.18.828.45L9.82 9.472c.11.148.172.347.172.55zm0 0"></path><path d="M19.98 10.023c0 .2-.058.399-.168.547l-4.996 7.492a.987.987 0 01-.828.454h-3c-.547 0-.996-.453-.996-1 0-.2.059-.403.168-.551l4.625-6.942-4.625-6.945a.939.939 0 01-.168-.55 1 1 0 01.996-.997h3c.348 0 .649.18.828.45l4.996 7.492c.11.148.168.347.168.55zm0 0"></path></g></svg></button></div></div></aside><main class="docMainContainer_EfwR"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_n6xZ"><div class="docItemContainer_RhpI"><article><nav class="theme-doc-breadcrumbs breadcrumbsContainer_Wvrh" aria-label="パンくずリスト"><ul class="breadcrumbs" itemscope="" itemtype="https://schema.org/BreadcrumbList"><li class="breadcrumbs__item"><a aria-label="ホーム画面" class="breadcrumbs__link" href="/ja/"><svg viewBox="0 0 24 24" class="breadcrumbHomeIcon_uaSn"><path d="M10 19v-5h4v5c0 .55.45 1 1 1h3c.55 0 1-.45 1-1v-7h1.7c.46 0 .68-.57.33-.87L12.67 3.6c-.38-.34-.96-.34-1.34 0l-8.36 7.53c-.34.3-.13.87.33.87H5v7c0 .55.45 1 1 1h3c.55 0 1-.45 1-1z" fill="currentColor"></path></svg></a></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a class="breadcrumbs__link" itemprop="item" href="/ja/docs/category/guides"><span itemprop="name">ガイド</span></a><meta itemprop="position" content="1"></li><li itemscope="" 
itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item"><a class="breadcrumbs__link" itemprop="item" href="/ja/docs/markdown-features"><span itemprop="name">マークダウンの機能</span></a><meta itemprop="position" content="2"></li><li itemscope="" itemprop="itemListElement" itemtype="https://schema.org/ListItem" class="breadcrumbs__item breadcrumbs__item--active"><span class="breadcrumbs__link" itemprop="name">見出しと目次</span><meta itemprop="position" content="3"></li></ul></nav><div class="tocCollapsible_BEWm theme-doc-toc-mobile tocMobile_NSfz"><button type="button" class="clean-btn tocCollapsibleButton_IbtT">このページ</button></div><div class="theme-doc-markdown markdown"><h1>見出しと目次</h1>
...itle="Example subsubsection 3 b I への直[[NULL]]リンク">​</a></h4>

🇰🇷

...를 사용하는 경우에는 각 ID가 각 페이지에서 정확하게 한 번만 표[[NULL]]시되는지 확인하세요. 그렇지 않으면 같은 ID를 가진 두 개의 DOM 요소가 존재하게 됩니다. 이는 잘못된 HTML이며 제목과 적절하게 연결할 수 없게 됩니다.</p></div></div>
...iv class="admonitionContent_Knsx"><p>[[NULL]][[NULL]]아래는 현재 페이지에서 더 많은 목차 항목을 사용할 수 있는 더미 콘텐츠입니다.</p></div></div>

Note

  • Other pages are likely affected as well.
  • The same pages in Latin-script languages are not affected.

Your environment

I first found this on a private documentation site written in Japanese:

  • Public source code: N/A
  • Public site URL: N/A
  • Docusaurus version used: 3.1.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Node 20 (latest LTS)
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Ubuntu (GitHub Actions)

The commands above were run on Ubuntu 22.04 under WSL on Windows 11.

Self-service

  • I'd be willing to fix this bug myself.
@tats-u tats-u added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Mar 26, 2024
@Josh-Cena
Collaborator

Have you checked whether it's an MDX issue? It's hard to believe Docusaurus has anything to do with this. I can also test later.

@Josh-Cena Josh-Cena added status: needs more information There is not enough information to take action on the issue. domain: markdown Related to Markdown parsing or syntax and removed status: needs triage This issue has not been triaged by maintainers labels Mar 26, 2024
@tats-u
Contributor Author

tats-u commented Mar 26, 2024

I will check other CJK sites built with other tools (e.g. Astro and Nextra).

@Josh-Cena
Collaborator

When debugging this kind of issue, I usually isolate an MDX compiler with the same setup as Docusaurus and invoke it programmatically.

@tats-u
Contributor Author

tats-u commented Mar 27, 2024

None of the Astro or Nextra sites seem to be affected.

Rspress, which also uses MDX (possibly through mdxjs-rs or markdown-rs instead), is not affected.

However, the Ant Design documentation is affected. (It uses neither Docusaurus nor MDX, only remark.)

Also, the demo of @easyops-cn/docusaurus-search-local is affected only when the UI language is Chinese, even though the page content is the same Chinese text either way. This is strange and interesting.

@slorber
Collaborator

slorber commented Mar 28, 2024

Hey

To be honest I'm not super familiar with any of those concepts and won't have the bandwidth to investigate much 😅

I was just wondering, couldn't this be a Crowdin translation issue?

I'm not very skilled with rg and Perl; can you tell me if you see anything weird in these input MD files?

zh-CN.zip

@tats-u
Contributor Author

tats-u commented Mar 28, 2024

can you tell me if you see anything weird in these input MD files?

No NULL characters are found in html, md, mdx, json, or css files in your ZIP archive.

I was just wondering, couldn't this be a Crowdin translation issue?

I found this issue on my (our) site, where i18n is not used, so I am convinced Crowdin is not involved.

@slorber
Collaborator

slorber commented Mar 28, 2024

Thanks for investigating.

It's also worth trying this env variable when building your site: process.env.SKIP_HTML_MINIFICATION === 'true'

@tats-u
Contributor Author

tats-u commented Mar 29, 2024

Neither $env:SKIP_HTML_MINIFICATION = "true" (I am using PowerShell) nor --no-minify helped.
The MDX part was still minified.
Changing the locale from "ja" to "en" did not help either.

@tats-u
Contributor Author

tats-u commented Mar 29, 2024

https://typescriptbook.jp/ (https://github.com/yytypescript/book)

This site uses Docusaurus 2.4.1, and no NULL chars are found there.

@Josh-Cena
Collaborator

I will check this afternoon. There's a chance that there's something environment specific.

@tats-u
Contributor Author

tats-u commented May 6, 2024

I found that both the Docusaurus and Ant Design websites have a div whose class includes markdown.
However, none of the Nextra, Rspress, or Astro sites do.

And it looks like https://ant.design/docs/blog/line-ellipsis-cn no longer contains NULL.

@tats-u
Contributor Author

tats-u commented Jun 3, 2024

I found that the top page of the Docusaurus website contains NULL in some languages:

@tats-u
Contributor Author

tats-u commented Oct 1, 2024

I found the following pages contain NULL, too.

curl -LsSf https://docusaurus-archive-october-2023.netlify.app/zh-cn/docs/2.0.1/markdown-features/toc | rg '\x00' -a -r '[[NULL]]'
<div class="theme-admonition theme-admonition-warning admonition_o5H7 alert alert--warning"><div class="admonitionHeading_FzoX"><span class="admonitionIcon_rXq6"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>Avoid colliding IDs</div><div class="admonitionContent_Knsx"><p>自动生成的标题 ID 会保证在每个页面上都是唯一的,但如果你使用了自定义 ID,请确保每个 ID 都[[NULL]]只出现一次。否则会出现两个有相同 ID 的 DOM 元素,而这是无效的 HTML 语义,会导致一个标题无法被链接到。</p></div></div>
PS C:\Users\tatsu> curl -LsSf https://docusaurus-archive-october-2023.netlify.app/zh-cn/docs/installation | rg '\x00' -a -r '[[NULL]]'
<p>Ask for help on <a href="https://stackoverflow.com/questions/tagged/docusaurus" target="_blank" rel="noopener noreferrer">Stack Overflow</a>, on our <a href="https://github.com/facebook/docusaurus" target="_blank" rel="noopener noreferrer">GitHub repository</a>, our <a href="https://discordapp.com/invite/docusaurus" target="_blank" rel="noopener noreferrer">Discord server</a>, or <a href="https://twitter.com/docusaurus" target="_blank" rel="noopener noreferrer">Twitter</a>.</p></div><footer class="theme-doc-footer docusaurus-mt-lg"><div class="theme-doc-footer-edit-meta-row row"><div class="col"><a href="https://crowdin.com/project/docusaurus-v2/zh-CN" target="_blank" rel="noopener noreferrer" class="theme-edit-this-page"><svg fill="currentColor" height="20" width="20" viewBox="0 0 40 40" class="iconEdit_IMw_" aria-hidden="true"><g><path d="m34.5 11.7l-3 3.1-6.3-6.3 3.1-3q0.5-0.5 1.2-0.5t1.1 0.5l3.9 3.9q0.5 0.4 0.5 1.1t-0.5 1.2z m-29.5 17.1l18.4-18.5 6.3 6.3-18.4 18.4h-6.3v-6.2z"></path></g></svg>[[NULL]]编辑此页</a></div><div class="col lastUpdated_DtqZ"></div></div></footer></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="文件选项卡"><a class="pagination-nav__link pagination-nav__link--prev" href="/zh-CN/docs/category/getting-started"><div class="pagination-nav__sublabel">上一页</div><div class="pagination-nav__label">Getting Started</div></a><a class="pagination-nav__link pagination-nav__link--next" href="/zh-CN/docs/configuration"><div class="pagination-nav__sublabel">下一页</div><div class="pagination-nav__label">配置</div></a></nav></div></div><div class="col col--3"><div class="tableOfContents_RLlU thin-scrollbar theme-doc-toc-desktop"><ul class="table-of-contents table-of-contents__left-border"><li><a href="#requirements" class="table-of-contents__link toc-highlight">Requirements</a></li><li><a href="#scaffold-project-website" class="table-of-contents__link toc-highlight">Scaffold project website</a></li><li><a href="#project-structure" 
class="table-of-contents__link toc-highlight">Project structure</a><ul><li><a href="#project-structure-rundown" class="table-of-contents__link toc-highlight">Project structure rundown</a></li><li><a href="#monorepos" class="table-of-contents__link toc-highlight">Monorepos</a></li></ul></li><li><a href="#running-the-development-server" class="table-of-contents__link toc-highlight">Running the development server</a></li><li><a href="#build" class="table-of-contents__link toc-highlight">Build</a></li><li><a href="#updating-your-docusaurus-version" class="table-of-contents__link toc-highlight">Updating your Docusaurus version</a></li><li><a href="#problems" class="table-of-contents__link toc-highlight">遇到了什么问题吗?</a></li></ul></div></div></div></div></main></div></div></div><footer class="footer footer--dark"><div class="container container-fluid"><div class="row footer__links"><div class="col footer__col"><div class="footer__title">Learn</div><ul class="footer__items clean-list"><li class="footer__item"><a class="footer__link-item" href="/zh-CN/docs">Introduction</a></li><li class="footer__item"><a class="footer__link-item" href="/zh-CN/docs/installation">Installation</a></li><li class="footer__item"><a class="footer__link-item" href="/zh-CN/docs/migration">Migration from v1 to v2</a></li></ul></div><div class="col footer__col"><div class="footer__title">Community</div><ul class="footer__items clean-list"><li class="footer__item"><a href="https://stackoverflow.com/questions/tagged/docusaurus" target="_blank" rel="noopener noreferrer" class="footer__link-item">Stack Overflow<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_Rdzz"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a class="footer__link-item" href="/zh-CN/feature-requests">Feature Requests</a></li><li class="footer__item"><a 
href="https://discordapp.com/invite/docusaurus" target="_blank" rel="noopener noreferrer" class="footer__link-item">Discord<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_Rdzz"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a class="footer__link-item" href="/zh-CN/community/support">Help</a></li></ul></div><div class="col footer__col"><div class="footer__title">More</div><ul class="footer__items clean-list"><li class="footer__item"><a class="footer__link-item" href="/zh-CN/blog">Blog</a></li><li class="footer__item"><a class="footer__link-item" href="/zh-CN/changelog">Changelog</a></li><li class="footer__item"><a href="https://github.com/facebook/docusaurus" target="_blank" rel="noopener noreferrer" class="footer__link-item">GitHub<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_Rdzz"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item"><a href="https://twitter.com/docusaurus" target="_blank" rel="noopener noreferrer" class="footer__link-item">Twitter<svg width="13.5" height="13.5" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_Rdzz"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li class="footer__item">
PS C:\Users\tatsu> curl -LsSf https://docusaurus-archive-october-2023.netlify.app/zh-cn/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]'
<h4 class="anchor anchorWithHideOnScrollNavbar_SSbb" id="示例小节-2-b-i">示例小节 2 b I<a href="#示[[NULL]][[NULL]]例小节-2-b-i" class="hash-link" aria-label="示例小节 2 b I的直接链接" title="示例小节 2 b I的直接链接">​</a></h4>

This shows that Docusaurus 2.4.3 also has this problem.

@guyskk

guyskk commented Oct 3, 2024

I ran into the same issue, but cannot reproduce it with a simple '@mdx-js/mdx' demo.

I found a similar issue in the terser plugin; maybe the NUL byte is caused by some core dependency?
terser/terser#942

@guyskk

guyskk commented Oct 3, 2024

Running SKIP_HTML_MINIFICATION=true npx docusaurus build --no-minify still produces NULL bytes.

@slorber
Collaborator

slorber commented Oct 4, 2024

While working on #10554, I also noticed errors reported by the new minifier, even on our own website.

The minifier reported NULL chars for these paths:

  - "/blog/2017/12/14/introducing-docusaurus"
  - "/blog/releases/3.5"
  - "/changelog"
  - "/changelog/2.0.0-alpha.51"
  - "/changelog/2.0.0-beta.10"
  - "/changelog/2.3.0"
  - "/tests/docs/toc/toc-test-bad"
  - "/docs/migration/v3"
        Error: Can't render static file for pathname "/docs/migration/v3"
            at generateStaticFile (/Users/sebastienlorber/Desktop/projects/docusaurus/packages/docusaurus/lib/ssg.js:118:15)
            at async /Users/sebastienlorber/Desktop/projects/docusaurus/node_modules/p-map/index.js:57:22 {
          [cause]: Error: HTML minification failed (SWC)
              at Object.minifyHtmlWithSwc [as minify] (/Users/sebastienlorber/Desktop/projects/docusaurus/packages/docusaurus-bundler/lib/minifyHtml.js:107:23)
              at async generateStaticFile (/Users/sebastienlorber/Desktop/projects/docusaurus/packages/docusaurus/lib/ssg.js:106:25)
              at async /Users/sebastienlorber/Desktop/projects/docusaurus/node_modules/p-map/index.js:57:22 {
            [cause]: Error: HTML minification diagnostic errors:
            - [error] Unexpected null character - {"primary_spans":[{"end":111132,"start":111131}],"span_labels":[]}
            - [error] Unexpected null character - {"primary_spans":[{"end":111132,"start":111131}],"span_labels":[]}

Source MDX

{/* prettier-ignore */}
```mdx title="japanese.mdx"
<strong>「。」の後に文を続けると`**`が意図した動作をしません。</strong>また、<strong>[リンク](https://docusaurus.io/)</strong>や<strong>`コード`</strong>のすぐ外側に`**`、そのさらに外側に句読点以外がある場合も同様です。
```

More precisely, the NULL char occurs here: </strong>や<strong>

I used this local function to get a better estimate of the position:

function reportNullChar(str: string) {
  const nullPos = str.indexOf('\0');
  if (nullPos !== -1) {
    const printAround = 100;
    const before = str.substring(Math.max(0, nullPos - printAround), nullPos);
    const after = str.substring(
      nullPos + 1,
      Math.min(str.length, nullPos + printAround),
    );
    console.warn(`HTML contains NULL char
Before: ${before}
After: ${after}
`);
  }
}

From my analysis, the MDX output doesn't contain NULL chars, but the React renderer output does, so the NULL char is probably introduced somewhere in between.
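One way to narrow down where the NUL appears, in the spirit of isolating each step and invoking it programmatically, is a tiny harness that runs the build steps in order and reports the first one whose output gains a NUL. The sketch below is hypothetical: `Stage`, `firstStageIntroducingNull`, and the stage names are illustrative, and in practice each `run` would wrap a real step such as the MDX compile or the SSR render.

```typescript
// Hypothetical debugging harness: each stage wraps one build step
// (e.g. an MDX compile or an SSR render) as a string -> string function.
type Stage = {
  name: string;
  run: (input: string) => string | Promise<string>;
};

type NullOrigin = {stage: string; index: number};

// Runs the stages in order and returns the first stage that turns NUL-free
// input into output containing a NUL, plus the index of that NUL in the
// output. Returns null if no stage does so.
async function firstStageIntroducingNull(
  input: string,
  stages: readonly Stage[],
): Promise<NullOrigin | null> {
  let current = input;
  for (const stage of stages) {
    const inputHadNull = current.indexOf('\0') !== -1;
    current = await stage.run(current);
    const index = current.indexOf('\0');
    if (!inputHadNull && index !== -1) {
      return {stage: stage.name, index};
    }
  }
  return null;
}
```

Because the stages are plain functions, the same harness can compare variants (for example two different SSR renderers) on the same MDX input.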

I'm pretty sure this is not limited to CJK languages: I also get an error around this heading of "/blog/releases/3.5", and removing the heading fixes the error:

[Screenshot: CleanShot 2024-10-04 at 18 13 26]


@tats-u I'm not very familiar with your Bash commands. How could I easily check whether the website/build/__server folder contains NULL chars?

@slorber
Collaborator

slorber commented Oct 4, 2024

Note that it doesn't seem to be a Webpack problem: I tried Rspack and still get the error (unless they ported the bug 🤷‍♂️).

This reproduces consistently on our v3.5 blog post.

I was able to shrink it to this smaller version:

---
title: Docusaurus 3.5
authors: [slorber]
tags: [release]
image: ./img/social-card.png
date: 2024-08-09
---

We are happy to announce **Docusaurus 3.5**.

This release contains many **new exciting blog features**.

Upgrading should be easy. Our [release process](/community/release-process) respects [Semantic Versioning](https://semver.org/). Minor versions do not include any breaking changes.

![Docusaurus blog post social card](./img/social-card.png)

{/* truncate */}

## Highlights

### Blog Social Icons

In [#10222](https://github.com/facebook/docusaurus/pull/10222), we added the possibility to associate social links to blog authors, for inline authors declared in front matter or global through the `authors.yml` file.

```yml title="blog/authors.yml"
slorber:
  name: Sébastien Lorber
  # other author properties...
  # highlight-start
  socials:
    x: sebastienlorber
    linkedin: sebastienlorber
    github: slorber
    newsletter: https://thisweekinreact.com
  # highlight-end
```

![Author socials screenshot displaying `slorber` author with 4 social platform icons](./img/author-socials.png)

Icons and handle shortcuts are provided for pre-defined platforms `x`, `linkedin`, `github` and `stackoverflow`. It's possible to provide any additional platform entry (like `newsletter` in the example above) with a full URL.

### Blog Authors Pages

The NULL char appears in the markup around the last heading. Deleting the heading fixes it.

Even more surprising, removing the bold (**) around **Docusaurus 3.5** or changing the URL of an imported relative image fixes it too. But changing some random text doesn't necessarily fix it. It's as if the structure of the MDX doc leads to a NULL char appearing, yet consistently: for a given source file, it either always reproduces or never does.

@slorber
Collaborator

slorber commented Oct 4, 2024

Finally found the bug location.

The problem is in our React 18 SSG integration.

Replacing renderToPipeableStream with the former renderToString method fixes the problem.

https://github.com/facebook/docusaurus/blob/main/packages/docusaurus/src/client/renderToHtml.tsx

import type {ReactNode} from 'react';
import {renderToPipeableStream} from 'react-dom/server';
import {Writable} from 'stream';

export async function renderToHtml(app: ReactNode): Promise<string> {
  // Inspired from
  // https://react.dev/reference/react-dom/server/renderToPipeableStream#waiting-for-all-content-to-load-for-crawlers-and-static-generation
  // https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/cache-dir/static-entry.js
  const writableStream = new WritableAsPromise();

  const {pipe} = renderToPipeableStream(app, {
    onError(error) {
      writableStream.destroy(error as Error);
    },
    onAllReady() {
      pipe(writableStream);
    },
  });

  return writableStream.getPromise();
}

// WritableAsPromise inspired by https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/cache-dir/server-utils/writable-as-promise.js

/* eslint-disable no-underscore-dangle */
class WritableAsPromise extends Writable {
  private _output: string;
  private _deferred: {
    promise: Promise<string> | null;
    resolve: (value: string) => void;
    reject: (reason: Error) => void;
  };

  constructor() {
    super();
    this._output = ``;
    this._deferred = {
      promise: null,
      resolve: () => null,
      reject: () => null,
    };
    this._deferred.promise = new Promise((resolve, reject) => {
      this._deferred.resolve = resolve;
      this._deferred.reject = reject;
    });
  }

  override _write(
    chunk: {toString: () => string},
    _enc: unknown,
    next: () => void,
  ) {
    this._output += chunk.toString();
    next();
  }

  override _destroy(error: Error | null, next: (error?: Error | null) => void) {
    if (error instanceof Error) {
      this._deferred.reject(error);
    } else {
      next();
    }
  }

  override end() {
    this._deferred.resolve(this._output);
    return this.destroy();
  }

  getPromise(): Promise<string> {
    return this._deferred.promise!;
  }
}

Edit: it could be a React bug: https://x.com/joshcstory/status/1842254523194314900

@kurtextrem

kurtextrem commented Oct 4, 2024

At Jochen Schweizer, I've used TextDecoder's streaming option: https://nodejs.org/api/util.html#textdecoderdecodeinput-options and haven't run into any wrong outputs from React 18. Maybe that helps?

For example:

class WritableStream extends Writable {
    html = '';

    decoder = new TextDecoder();

    _write(chunk, enc, next) {
        this.html += this.decoder.decode(chunk, { stream: true });
        next();
    }

    destroy() {
        this.decoder = null;
        this.html = null;
    }
}
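Why a streaming decoder matters here: most CJK characters take 3 bytes in UTF-8, so a chunk boundary can land in the middle of one. The sketch below (hypothetical helper names, relying only on Node's global `Buffer`, `TextEncoder` and `TextDecoder`) splits the bytes of a Japanese string into 4-byte chunks and decodes them two ways. Decoding each chunk on its own, like `chunk.toString()` in `WritableAsPromise` above, produces U+FFFD replacement characters, while a single `TextDecoder` with `{stream: true}` carries the partial bytes over to the next chunk. Note that this failure mode yields U+FFFD rather than U+0000, so it is a separate problem from the NULL chars.

```typescript
// Demonstrates why per-chunk Buffer#toString() can corrupt multi-byte text
// and how a streaming TextDecoder avoids it. Helper names are hypothetical.

/** Split bytes into fixed-size chunks, ignoring character boundaries. */
function splitBytes(bytes: Uint8Array, size: number): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let i = 0; i < bytes.length; i += size) {
    chunks.push(bytes.subarray(i, i + size));
  }
  return chunks;
}

/** Decode each chunk independently, like `chunk.toString()` in a naive _write(). */
function decodePerChunk(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => Buffer.from(chunk).toString('utf8')).join('');
}

/** Decode with one TextDecoder that buffers incomplete sequences across chunks. */
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8');
  let out = '';
  for (const chunk of chunks) {
    out += decoder.decode(chunk, {stream: true});
  }
  // A final call without {stream: true} flushes any buffered bytes.
  return out + decoder.decode();
}

const text = '見出しのテスト'; // 7 CJK characters, 3 bytes each in UTF-8
const chunks = splitBytes(new TextEncoder().encode(text), 4);

console.log(decodePerChunk(chunks)); // contains U+FFFD replacement characters
console.log(decodeStreaming(chunks)); // 見出しのテスト
```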

@slorber
Collaborator

slorber commented Oct 5, 2024

Thanks

Yes, our stream-to-promise code is buggy. I got at least 3 better solutions here, one of them being TextDecoder:

https://x.com/phry/status/1842301184763425075?t=CqSlq7pLVEjyu2fpHqs_sQ&s=19

@slorber
Collaborator

slorber commented Oct 7, 2024

TLDR: I reported a React bug facebook/react#31134


I thought our WritableAsPromise code was buggy, as others suggested. However, I tried the various suggested alternatives and still get an extra NULL char, so I'll try to create a minimal repro and report a bug to the React team.


For example, take this code:

import type {ReactNode} from 'react';
import {renderToPipeableStream, renderToString} from 'react-dom/server';
import {PassThrough} from 'node:stream';
import {text} from 'node:stream/consumers';

export async function renderToHtml(app: ReactNode): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const passThrough = new PassThrough();
    const {pipe} = renderToPipeableStream(app, {
      onError(error) {
        reject(error);
      },
      onAllReady() {
        pipe(passThrough);
        text(passThrough).then(resolve, reject);
      },
    });
  });
}

When I add this small check:

  if (html.includes('\0')) {
    const goodHtml = renderToString(app);
    throw new Error(`renderToPipeableStream HTML contains null chars
renderToPipeableStream HTML length = ${html.length}
renderToString HTML length = ${goodHtml.length}
renderToString HTML contains null chars??? = ${goodHtml.includes('\0')}
    `);
  }

This will error with:

(Screenshot of the resulting error output)

renderToPipeableStream and renderToString give the exact same output, except for 6 routes of our Docusaurus site that contain an extra NULL char in between.


Exact same for:

import type {ReactNode} from 'react';
import {renderToPipeableStream, renderToString} from 'react-dom/server';
import {PassThrough, Readable} from 'node:stream';

export async function renderToHtml(app: ReactNode): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const {pipe} = renderToPipeableStream(app, {
      onError(error) {
        reject(error);
      },
      onAllReady() {
        const passThrough = new PassThrough();
        pipe(passThrough);
        const webStream = Readable.toWeb(passThrough);
        // @ts-expect-error: temp
        new Response(webStream).text().then(resolve, reject);
      },
    });
  });
}

Exact same result for:

class WritableStream extends Writable {
  html = '';
  decoder = new TextDecoder();
  // @ts-expect-error: temp
  _write(chunk, enc, next) {
    this.html += this.decoder.decode(chunk, {stream: true});
    next();
  }
}

export async function renderToHtml(app: ReactNode): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const {pipe} = renderToPipeableStream(app, {
      onError(error) {
        reject(error);
      },
      onAllReady() {
        const writeableStream: WritableStream = new WritableStream();
        pipe(writeableStream);
        resolve(writeableStream.html);
      },
    });
  });
}

Not sure if I'm supposed to use a specific TextDecoder encoding, but I tried various ones and didn't get any improvement.
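A more defensive version of the `WritableStream` variant above would flush the decoder in `_final()` and resolve only when the stream emits `'finish'`, instead of reading `.html` right after `pipe()` (which assumes `pipe()` writes everything synchronously). Below is a sketch with hypothetical `TextCollector` and `collectText` names. As reported above, changes on the collecting side did not remove the NULL chars, so this only rules out one variable.

```typescript
import {Writable} from 'node:stream';

/** Hypothetical Writable that decodes UTF-8 chunks incrementally into one string. */
class TextCollector extends Writable {
  text = '';
  private readonly decoder = new TextDecoder('utf-8');

  override _write(
    chunk: Buffer | Uint8Array,
    _encoding: BufferEncoding,
    callback: (error?: Error | null) => void,
  ): void {
    // {stream: true} keeps a character split across chunks intact.
    this.text += this.decoder.decode(chunk, {stream: true});
    callback();
  }

  override _final(callback: (error?: Error | null) => void): void {
    // Flush bytes of any multi-byte character still buffered in the decoder.
    this.text += this.decoder.decode();
    callback();
  }
}

/**
 * Hypothetical helper: hand a fresh TextCollector to `pipeInto` (for example
 * React's `pipe`) and resolve with the full text once the collector finishes.
 */
function collectText(pipeInto: (destination: Writable) => void): Promise<string> {
  return new Promise((resolve, reject) => {
    const collector = new TextCollector();
    collector.on('finish', () => resolve(collector.text));
    collector.on('error', reject);
    pipeInto(collector);
  });
}
```

Inside `onAllReady()`, `collectText(pipe).then(resolve, reject)` would replace the `pipe(...)`/`resolve(...)` pair; this assumes `pipe()` calls `end()` on the destination once everything has been flushed.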


Note: the paths that generate NULL chars on our Docusaurus website are:

  [cause]: Error: Docusaurus static site generation failed for 8 paths:
  - "/blog/2017/12/14/introducing-docusaurus"
  - "/blog/releases/3.5"
  - "/changelog"
  - "/changelog/2.0.0-alpha.51"
  - "/changelog/2.0.0-beta.10"
  - "/changelog/2.3.0"
  - "/tests/docs/toc/toc-test-bad"
  - "/docs/migration/v3"

Some paths generate more than one NULL char; for example, /changelog/2.0.0-beta.10 has 3.
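To locate the extra characters in a rendered page (roughly what the rg/perl pipeline in the issue description does, printing up to 50 characters before each NULL), a small helper can return each NULL offset with its preceding context. This is a sketch with a hypothetical `findNullChars` name; the 50-character default only mirrors that Perl regex.

```typescript
/** One NULL occurrence: its UTF-16 offset and the text just before it. */
interface NullCharHit {
  index: number;
  before: string;
}

/**
 * Hypothetical debugging helper: find every U+0000 in an HTML string and
 * keep up to `context` characters preceding each one.
 */
function findNullChars(html: string, context = 50): NullCharHit[] {
  const hits: NullCharHit[] = [];
  let index = html.indexOf('\0');
  while (index !== -1) {
    hits.push({index, before: html.slice(Math.max(0, index - context), index)});
    index = html.indexOf('\0', index + 1);
  }
  return hits;
}

// Usage with `html` produced by renderToPipeableStream:
// for (const {index, before} of findNullChars(html)) {
//   console.log(`NULL at offset ${index}: ...${before}[[NULL]]`);
// }
```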

The extra chars are always NULL chars, and this always prints true:

`Equal without null chars = ${html.replace(/\0/g, '') === goodHtml}`

Note: I doubt React v18 will fix it, so maybe for Docusaurus v3.x we could just apply this workaround temporarily: html.replace(/\0/g, '')
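A minimal sketch of that temporary workaround, under a hypothetical `stripNullChars` name. It also returns how many characters it dropped so the build can log a warning, and it relies on the observation above that the streamed HTML equals the renderToString output once the NULL chars are removed.

```typescript
/** Cleaned HTML plus the number of NULL characters that were removed. */
interface StripResult {
  html: string;
  removed: number;
}

/**
 * Hypothetical workaround: remove every U+0000 from SSG output. U+0000 is a
 * parse error in HTML text, so it should never be meaningful content.
 */
function stripNullChars(html: string): StripResult {
  const cleaned = html.replace(/\0/g, '');
  return {html: cleaned, removed: html.length - cleaned.length};
}

// Usage after rendering a route:
// const {html, removed} = stripNullChars(renderedHtml);
// if (removed > 0) console.warn(`Removed ${removed} NULL char(s) from the HTML`);
```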

@slorber
Collaborator

slorber commented Oct 7, 2024

Looks like using renderToReadableStream is not affected by this problem. This PR might fix it: #10562
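For context: renderToReadableStream resolves to a web `ReadableStream`, and React's docs recommend awaiting its `allReady` promise before reading when generating static HTML. Reading that stream into a string only needs a streaming `TextDecoder`. The sketch below uses a hypothetical `readableStreamToString` name and is not necessarily what #10562 implements.

```typescript
/**
 * Hypothetical helper: read a web ReadableStream of UTF-8 bytes to the end
 * and return its contents as one string. `new Response(stream).text()`
 * is a shorter alternative.
 */
async function readableStreamToString(
  stream: ReadableStream<Uint8Array>,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    // {stream: true} keeps a character split across chunks intact.
    text += decoder.decode(result.value, {stream: true});
  }
  // Flush any bytes still buffered by the decoder.
  return text + decoder.decode();
}
```

With React 18 this would be used roughly as `const stream = await renderToReadableStream(app); await stream.allReady; const html = await readableStreamToString(stream);`.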

@slorber slorber removed the status: needs more information There is not enough information to take action on the issue. label Oct 7, 2024
@tats-u
Contributor Author

tats-u commented Oct 7, 2024

I'm not super familiar with your Bash commands. How could I easily check if the website/build/__server folder contains NULL chars?

How about this? (Update: false positive)

grep -lF $'\x00' *.html

(You can combine this with find)

find -name '*.html' -type f -exec grep -lF $'\x00' {} +

Anyway, I'm glad we were able to find that this is presumably due to a bug in React itself.
We can replace domain: markdown with another label.

@slorber slorber removed the domain: markdown Related to Markdown parsing or syntax label Oct 7, 2024
@tats-u
Contributor Author

tats-u commented Oct 10, 2024

Sorry, I should have used the -P option instead (on macOS, use -E instead).

grep -lPa '\x00' *.html
find \( -name '*.html' -o -name '*.js' \) -type f -exec grep -lPa '\x00' {} +
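The first command was a false positive because a command-line argument cannot contain a NUL byte: `$'\x00'` ends up as an empty pattern, and `grep -F` with an empty pattern matches every line. A portable alternative that does not depend on GNU versus BSD grep flags is to scan the build output from Node. This is a sketch with a hypothetical `findFilesWithNullChars` helper; `website/build` is simply the folder used in this thread.

```typescript
import {readdirSync, readFileSync} from 'node:fs';
import {extname, join} from 'node:path';

/**
 * Hypothetical helper: recursively list files under `dir` whose extension is
 * in `extensions` and whose raw bytes contain 0x00.
 */
function findFilesWithNullChars(
  dir: string,
  extensions: ReadonlySet<string> = new Set(['.html', '.js']),
): string[] {
  const matches: string[] = [];
  for (const entry of readdirSync(dir, {withFileTypes: true})) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      matches.push(...findFilesWithNullChars(fullPath, extensions));
    } else if (entry.isFile() && extensions.has(extname(entry.name))) {
      // In UTF-8 a 0x00 byte only ever encodes U+0000, so a byte search is enough.
      if (readFileSync(fullPath).includes(0)) {
        matches.push(fullPath);
      }
    }
  }
  return matches;
}

// Usage, e.g. after `yarn build:website:fast`:
// console.log(findFilesWithNullChars('website/build').join('\n'));
```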

@slorber
Collaborator

slorber commented Oct 11, 2024

@tats-u I believe our new HTML minifier (available in canary, upcoming v3.6) fixes the null chars: #10554

With this new minifier, building with minification skipped still emits nulls:

SKIP_HTML_MINIFICATION=true yarn build:website:fast

rg '\x00' -a -r '[[NULL]]' --color=always -t html website/build | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

This doesn't emit nulls, but the minifier reports a warning instead:

yarn build:website:fast

rg '\x00' -a -r '[[NULL]]' --color=always -t html website/build | perl -C -pe 'use utf8; s/^.+?(.{50})(?=\[\[NULL)/...\1/'

Until we figure out the React SSR/SSG bug, I'll silence that minifier warning.

Can you please check on our website or using a canary locally and tell us if you still see any NULL char?

@tats-u
Contributor Author

tats-u commented Oct 12, 2024

@slorber I confirmed it on the official website and the Japanese staging site.

 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus.io/zh-CN/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus-i18n-staging.netlify.app/ja/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus.io/ko/docs/markdown-features/toc | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus.io/ko/ | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus.io/zh-CN/ | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~  curl -LsSf https://docusaurus-i18n-staging.netlify.app/ja/ | rg '\x00' -a -r '[[NULL]]' --color=always
 tatsu@TATSU-DPC-2ND  ~ 

@slorber
Collaborator

slorber commented Oct 12, 2024

Great, so at least we have a decent workaround for the possible React bug, available in canary and soon v3.6.
