-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn that Infra's and XML's notion of ascii whitespace aren't the same. #649
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM although I'd like @annevk to take a look and in particular comment on XML 1.1 vs. other versions, as I remember there being some confusion there about whether 1.1 is actually used.
infra.bs
Outdated
@@ -933,6 +933,8 @@ SPACE. | |||
|
|||
<p class=note>"Whitespace" is a mass noun. | |||
|
|||
<p class=note>[[xml11#NT-S|XML's definition of whitespace]] omits U+000C FF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WebKit has a "ASCII whitespace without FF" definition that is used for JSON/HTTP/XML. Perhaps we should add the same definition to Infra?
See also https://fetch.spec.whatwg.org/#http.
And yeah, it's not clear to me that 1.1 is actually supported. Chromium and WebKit seem rather lenient and allow data:text/xml,<?xml version="1.2"?><test/>
which I don't think is permitted by the XML specifications, and Gecko rejects that and also rejects 1.1
.
I'm happy with any of Anne's suggestions. It sounds like referring to XML 1.0 is clearly better. What name do y'all prefer for the new whitespace-sans-FF definition? |
I would slightly prefer not introducing a new definition. Because, I doubt XML, JSON, or HTTP specifications will update to be based on Infra, and I think we prefer that other specifications match "the rest of the web" when possible instead of adopting the without-FF definition. However, it could be good to update the note to something like:
(maybe it should even be a WDYT? (I'm also happy to land this PR as-is and iterate on something like the above in a followup, because I don't want to require you to do more work on top of what is already a clear, helpful improvement.) |
Done. I'm not sure of the HTTP citation because HTTP itself says whitespace is just space and tab. Fetch and mimesniff include CR/LF and FF inconsistently. So I went with "parts of HTTP" and a link to the Fetch definition. I don't have a strong opinion between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, will give @annevk a day or two to check it as well.
This change is inspired by Candidate Correction 4 in EPUB, where they accidentally used the Infra definition of whitespace where they'd meant the XML one. See w3c/epub-specs#2637 by @mattgarrish. Their citation is to XML 1.0, but the definitions are the same between 1.0 and 1.1.
I'm not tied to the exact wording here, but it seems worth warning about the semantic difference as Infra gets more widely used in non-HTML contexts.
Preview | Diff