PCDATA invalid Char value #36

Open
billdenney opened this issue Dec 18, 2024 · 2 comments
Comments

@billdenney

I'm not sure whether this is a bug in commonmark or in xml2. I'm reporting it here because I think the problem lies in the original encoding step. This happened when I tried to use usethis::use_spell_check() on the janitor package.

When converting the attached NEWS.md file, commonmark does not seem to parse and escape some values correctly. (It is also possible that xml2 is the one mishandling these values.)

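# Read the attached NEWS.md and convert it to CommonMark XML with source positions
text <- readLines("https://github.com/user-attachments/files/18189322/NEWS.md")
md <- commonmark::markdown_xml(text, sourcepos = TRUE)
# Parsing the generated XML with xml2 fails
doc <- xml2::xml_ns_strip(xml2::read_xml(md))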
#> Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, : PCDATA invalid Char value 19 [9]

Created on 2024-12-18 with reprex v2.1.1
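For context: in libxml2's message, "PCDATA invalid Char value 19" refers to the character with code 19 (U+0013), a C0 control character that XML 1.0 does not allow in a document; among the C0 controls, only tab, line feed, and carriage return are permitted. Such a character is invisible, so it would not stand out when reading the generated XML. The sketch below is a minimal base-R check, assuming the failure is caused by a character of this kind. `find_xml_invalid_controls()` is a hypothetical helper, not part of commonmark or xml2.

```r
# Hypothetical helper (not part of commonmark or xml2): returns the indices of
# elements of `lines` that contain a C0 control character XML 1.0 forbids,
# i.e. anything in U+0001..U+001F except tab (U+0009), LF (U+000A), CR (U+000D).
# Code 19 from the error message is U+0013, which falls in this range.
find_xml_invalid_controls <- function(lines) {
  # PCRE interprets the \xhh escapes, so the pattern string stays plain ASCII
  pattern <- "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]"
  which(grepl(pattern, lines, perl = TRUE))
}

# Small self-contained demonstration: only the second element is flagged
example <- c("plain text", "contains \x13 a control char", "tab\tis allowed")
find_xml_invalid_controls(example)  # returns 2
```

The same check can be pointed at the Markdown input (`find_xml_invalid_controls(text)`) or at the generated XML (`find_xml_invalid_controls(strsplit(md, "\n")[[1]])`). If only the XML output is flagged, that would suggest the character is introduced during conversion rather than present verbatim in NEWS.md.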

@jeroen
Member

jeroen commented Dec 21, 2024

Thanks, I can reproduce this. Strangely, I don't see any PCDATA in the generated XML text. Do you want to help narrow down which line of NEWS.md is triggering the bug?

@billdenney
Author

The error comes from line 295. (I found it by looping over the lines and converting each one separately.)

# Converting only line 295 of NEWS.md reproduces the error
text <- readLines("https://github.com/user-attachments/files/18189322/NEWS.md")
md <- commonmark::markdown_xml(text[[295]], sourcepos = TRUE)
doc <- xml2::xml_ns_strip(xml2::read_xml(md))
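The line-by-line search mentioned in this comment can be sketched as follows. `find_failing_lines()` is a hypothetical helper; by default it runs each line through the same `commonmark::markdown_xml()` and `xml2::read_xml()` calls used in the reprex and records which lines fail. The conversion step is passed in as an argument, so it can be swapped for a stand-in (as in the demonstration) without needing either package. Converting lines one at a time ignores multi-line Markdown context such as fenced code blocks or lists, so a flagged line is best confirmed against the whole document.

```r
# Hypothetical helper: returns the indices of `lines` for which `convert` errors.
# The default `convert` mirrors the reprex: Markdown -> CommonMark XML -> xml2.
find_failing_lines <- function(lines,
                               convert = function(line) {
                                 md <- commonmark::markdown_xml(line, sourcepos = TRUE)
                                 xml2::read_xml(md)
                               }) {
  failed <- vapply(seq_along(lines), function(i) {
    # Convert one line in isolation and record whether it raised an error
    tryCatch({
      convert(lines[[i]])
      FALSE
    }, error = function(e) TRUE)
  }, logical(1))
  which(failed)
}

# Demonstration with a stand-in converter that rejects one specific line
fake_convert <- function(line) if (line == "bad") stop("parse error") else line
find_failing_lines(c("good", "bad", "also good"), convert = fake_convert)  # returns 2
```

With the reprex's `text`, calling `find_failing_lines(text)` uses the default commonmark/xml2 converter.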
