Update "Character encoding" and related provisions #438 #461
Changes from 2 commits
0beba16
91f07a0
e391329
e7119e8
90b017d
bc98ead
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -227,7 +227,7 @@ Character encoding | |||||
| Permitted characters | ||||||
| -------------------- | ||||||
|
|
||||||
| A canonical ``purl`` is composed of these characters ("Permitted Characters"): | ||||||
| A canonical ``purl`` is composed of these Permitted Characters: | ||||||
|
|
||||||
| - alphanumeric characters ``A to Z``, ``a to z``, ``0 to 9``, | ||||||
| - the ``purl`` separators ``:/@?=&#`` (colon ':', slash '/', at sign '@', | ||||||
|
|
@@ -257,31 +257,26 @@ These ``purl`` separator characters MUST NOT be percent-encoded when used as | |||||
| Percent-encoding rules | ||||||
| ---------------------- | ||||||
|
|
||||||
| Unless otherwise provided in this specification, when applying percent-encoding | ||||||
| or decoding to a string, use the rules of RFC 3986 section 2 | ||||||
| (https://datatracker.ietf.org/doc/html/rfc3986#section-2). In the event of any | ||||||
| conflict between this specification and RFC 3986 section 2, this specification | ||||||
| governs. | ||||||
|
|
||||||
| In the "Rules for each ``purl`` component" section above, each component | ||||||
| defines when and how to apply percent-encoding and decoding to its content. | ||||||
|
|
||||||
| When percent-encoding is required, all Permitted Characters MUST be encoded as | ||||||
| UTF-8 and then percent-encoded except for the following: | ||||||
|
|
||||||
| - the alphanumeric characters, | ||||||
|
|
||||||
| - the ASCII characters ``.-_~`` (period '.', dash '-', underscore | ||||||
| '_' and tilde '~'), | ||||||
|
|
||||||
| - the percent sign '%' when used to represent a percent-encoded character, | ||||||
|
|
||||||
| - a ``purl`` separator when being used as a ``purl`` separator, and | ||||||
|
|
||||||
| - the colon ':', whether used as a ``purl`` separator or otherwise. | ||||||
|
|
||||||
| In addition, where the space ' ' is permitted, it MUST be percent-encoded as | ||||||
| '%20'. | ||||||
| - In the "Rules for each ``purl`` component" section above, each component | ||||||
| defines when and how to apply percent-encoding and decoding to its content, | ||||||
| including which characters to percent-encode and when percent-encoding is | ||||||
| required. | ||||||
| - When percent-encoding is required by a component definition, each | ||||||
| codepoint MUST be replaced by the percent-encoded bytes of the codepoint's | ||||||
| UTF-8 encoding using the percent-encoding mechanism defined in RFC 3986 | ||||||
| section 2.1 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). | ||||||
| - With the exception of the percent-encoding mechanism, the rules regarding | ||||||
| percent-encoding are defined by this specification alone. | ||||||
| - Where the space ' ' is permitted, it MUST be percent-encoded as | ||||||
| '%20'. | ||||||
pombredanne marked this conversation as resolved.
| - The following characters do not need to be percent-encoded: | ||||||
pombredanne marked this conversation as resolved.
|
||||||
|
|
||||||
| - the alphanumeric characters ``A to Z``, ``a to z``, ``0 to 9``, | ||||||
| - the ASCII characters ``.-_~`` (period '.', dash '-', underscore | ||||||
| '_' and tilde '~'), | ||||||
| - the percent sign '%' when used to represent a percent-encoded character, | ||||||
| - a ``purl`` separator when being used as a ``purl`` separator, and | ||||||
| - the colon ':', whether used as a ``purl`` separator or otherwise. | ||||||
|
||||||
| These ``purl`` separator characters MUST NOT be percent-encoded when used as | |
| ``purl`` separators: |
Do we need to repeat it here too?
- line 279: perhaps I am misunderstanding your point here -- without line 279, how will users know that colons do not need to be percent-encoded?
Sure, we need to say that the colon `:` does not need to be percent-encoded, but I don't think we need to repeat that it also does not need to be encoded when used as a separator.
Maybe we could make this paragraph less descriptive and more imperative, like:
To percent-encode a string of characters:
1. encode it using UTF-8,
2. for each byte of the encoded string:
   - if the byte corresponds to:
     - an alphanumeric ASCII character (``A to Z``, ``a to z``, ``0 to 9``),
     - or one of the ASCII characters `.`, `-`, `_`, `~` and `:`,

     copy the byte to the output;
   - otherwise, append the percent-encoding of the byte to the output, as defined in RFC 3986
     section 2.1 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.1).
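For illustration, the steps proposed above could be sketched in Python roughly as follows. This is only a minimal sketch of the suggested wording, not a reference implementation: the names `PASS_THROUGH` and `percent_encode` are hypothetical, and it assumes the input is the decoded value of a single component, so every `%` and every separator other than `:` gets encoded.

```python
# Hypothetical helper illustrating the imperative wording proposed above.
# Assumption: the input is one decoded component value, not a whole purl.

# Bytes copied to the output unchanged: ASCII alphanumerics plus . - _ ~ :
PASS_THROUGH = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
    b".-_~:"
)


def percent_encode(text: str) -> str:
    """Percent-encode a decoded component value, byte by byte."""
    out = []
    # Step 1: encode the string as UTF-8.
    for byte in text.encode("utf-8"):
        # Step 2: copy pass-through bytes, percent-encode everything else.
        if byte in PASS_THROUGH:
            out.append(chr(byte))
        else:
            # RFC 3986 section 2.1 form: '%' followed by two hex digits
            # (uppercase is the recommended form).
            out.append(f"%{byte:02X}")
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode("a b"))     # space becomes %20
    print(percent_encode("a:b/c"))   # colon is kept, slash is encoded
```

Working on bytes rather than codepoints keeps the rule uniform: a non-ASCII character simply produces one `%XX` triplet per UTF-8 byte.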