Define "normalized absolute URL" #58
Comments
I strongly feel like it is normali[sz]ation. Just like how RFC 3986 refers to it. Canonicali[sz]ation to me refers to what rel-canonical is used for, matching the definition from Wikipedia:
There is no way for a parser like the mf2 parser to figure out that value, since it only has the string to work on. (I would be very much opposed to requiring mf2 parsers to fetch resources, look for rel-canonicals, etc.)
"Normali[sz]ation" for the reasons @Zegnat noted above.
Maybe something like:
Are paths always |
👍 on using "normalization." And good catch -- we should differentiate schemes as part of the steps. Loose ideas (not in spec language yet):
While we're updating this section of text, I think we should include text to cover #48 (comment) and microformats/php-mf2#186.
Is this the root cause of microformats/mf2py#177 (comment)? ie, is it undefined whether normalizing |
@snarfed I think that's a good question to clarify for this issue, but with php-mf2 I think it's more a side effect than an explicit choice. RFC3986 Component Recomposition seems to indicate the "?" should be preserved with the pseudocode and note:
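For reference, a rough Python rendering of the recomposition steps in RFC 3986 section 5.3. This is a sketch, not any parser's actual code: the function name `recompose` and the use of `None` to mean "undefined" are assumptions made for illustration. The point it shows is that an empty-but-present query keeps its "?", while an undefined query adds nothing.

```python
from typing import Optional


def recompose(
    scheme: Optional[str],
    authority: Optional[str],
    path: str,
    query: Optional[str],
    fragment: Optional[str],
) -> str:
    """Reassemble URL components, following RFC 3986 section 5.3.

    None means "undefined" (the separator was absent from the input);
    an empty string means "present but empty". The two are kept distinct.
    """
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path  # the path is always defined, though it may be empty
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


# Empty query: the "?" survives recomposition.
print(recompose("http", "example.com", "/", "", None))
# Undefined query: no "?" is emitted.
print(recompose("http", "example.com", "/", None, None))
```

Under this reading, whether a trailing "?" survives depends on whether the URL-splitting step reports an empty query as empty or as undefined.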
This issue is split from #9 and is intended to focus only on the process of normalizing URLs when parsing u-*.

Current language:
One of the simplest things, which microformats/tests#112 is waiting on, is whether to normalize an empty URL path component to "/". @jgarber623 detailed some specs and software that include this normalization, so I think this would be pretty agreeable among implementers.
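As a minimal sketch of that single rule, assuming Python's standard `urllib.parse` and a hypothetical helper name `normalize_empty_path` (this is not taken from any of the parsers mentioned here): only URLs that have an authority get the empty path replaced, so something like a `mailto:` URL is left alone.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url: str) -> str:
    """Replace an empty path component with "/" when the URL has an authority."""
    parts = urlsplit(url)
    # Only URLs with an authority (e.g. http, https) are touched;
    # a mailto: URL has no authority and keeps its path as-is.
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalize_empty_path("http://example.com"))
print(normalize_empty_path("http://example.com?a=1"))
print(normalize_empty_path("mailto:user@example.com"))
```

Whether this should apply to every scheme with an authority or only to http and https is one of the things the spec text would need to settle; the sketch simply picks the broader option.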
@Zegnat raised the concern of defining what we mean by "path component" since parts of URLs have been renamed over the years. The IndieAuth spec includes a normative reference to WHATWG's URL standard and explains "path component" with a simple example instead of a spec definition:
So perhaps that would be sufficient for the microformats parsing spec, too?
RFC 3986 lists some additional normalizations that could be nice-to-have but I'm not sure if they are strictly necessary for parsers:
RFC 3986 also describes remove_dot_segments to normalize "." and ".." path segments. From a quick check, it appears at least php-mf2, mf2py, and Ruby parsers are all doing this, which makes sense since it's necessary to correctly handle <base href>.
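For anyone who wants to see the algorithm concretely, here is a sketch of remove_dot_segments following the steps in RFC 3986 section 5.2.4. It is a direct transcription into Python for illustration and is not how php-mf2, mf2py, or the Ruby parser necessarily implement it.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path (RFC 3986 section 5.2.4)."""
    output = []  # completed segments, each including its leading "/"
    while path:
        # Step A: drop a leading "../" or "./".
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step B: replace a leading "/./" or a final "/." with "/".
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # Step C: replace a leading "/../" or a final "/.." with "/"
        # and remove the last segment already written to the output.
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step D: a path that is only "." or ".." is discarded.
        elif path in (".", ".."):
            path = ""
        # Step E: move the first segment (with its leading "/", if any)
        # to the output, up to but not including the next "/".
        else:
            next_slash = path.find("/", 1)
            if next_slash == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:next_slash])
                path = path[next_slash:]
    return "".join(output)


# The two examples given in RFC 3986 section 5.2.4:
print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```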

Questions: