Update 'qualifiers' rule in core spec #382 #398
Changes from 7 commits
1c1fb31
b621d65
17685b2
654e1b9
9d849b9
ccf07c5
2a13760
42e7ec0
9a7089b
898c64b
3d2a128
a990290
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -176,25 +176,25 @@ The rules for each component are: | |
|
|
||
| - **qualifiers**: | ||
|
|
||
| - The ``qualifiers`` string is prefixed by a '?' separator when not empty | ||
| - This '?' is not part of the ``qualifiers`` | ||
| - This is a query string composed of zero or more ``key=value`` pairs each | ||
| separated by a '&' ampersand. A ``key`` and ``value`` are separated by the equal | ||
| '=' character | ||
| - These '&' are not part of the ``key=value`` pairs. | ||
| - ``key`` must be unique within the keys of the ``qualifiers`` string | ||
| - ``value`` cannot be an empty string: a ``key=value`` pair with an empty ``value`` | ||
| is the same as no key/value at all for this key | ||
| - For each pair of ``key`` = ``value``: | ||
|
|
||
| - The ``key`` must be composed only of ASCII letters and numbers, '.', '-' and | ||
| '_' (period, dash and underscore) | ||
| - A ``key`` cannot start with a number | ||
| - A ``key`` must NOT be percent-encoded | ||
| - A ``key`` is case insensitive. The canonical form is lowercase | ||
| - A ``key`` cannot contain spaces | ||
| - A ``value`` must be a percent-encoded string | ||
| - The '=' separator is neither part of the ``key`` nor of the ``value`` | ||
| - The ``qualifiers`` component MUST be prefixed by a '?' separator when not empty. | ||
| - The '?' separator is not part of the ``qualifiers`` component. | ||
| - The ``qualifiers`` component is a query string composed of one or more ``key=value`` | ||
| pairs. Multiple ``key=value`` pairs MUST be separated by an ampersand '&'. | ||
| A ``key`` and ``value`` MUST be separated by the equal '=' character. | ||
| - Neither the '&' nor the '=' separator is part of the ``key`` or the ``value``. | ||
| - Each ``key`` MUST be unique among the keys of the ``qualifiers`` string. | ||
| - A ``value`` MUST NOT be an empty string: a ``key=value`` pair with an empty ``value`` | ||
| is the same as if no ``key=value`` pair exists for this ``key``. | ||
|
|
||
| - For each ``key=value`` pair: | ||
|
|
||
| - The ``key`` MUST be composed only of ASCII letters and numbers, '.', '-' and | ||
| '_' (period, dash and underscore). | ||
| - A ``key`` MUST start with an ASCII letter. | ||
| - A ``key`` MUST NOT be percent-encoded. | ||
| - A ``key`` is case insensitive. The canonical form is lowercase. | ||
| - A ``value`` MAY be composed of any character. A ``value`` MUST be | ||
| percent-encoded as described in the "Character encoding" section. | ||
|
|
||
|
|
||
| - **subpath**: | ||
|
|
@@ -206,44 +206,78 @@ The rules for each component are: | |
| in the canonical form | ||
| - Each ``subpath`` segment MUST be a percent-encoded string | ||
| - When percent-decoded, a segment: | ||
|
|
||
| - MUST NOT contain a '/' | ||
| - MUST NOT be any of '..' or '.' | ||
| - MUST NOT be empty | ||
|
|
||
| - The ``subpath`` MUST be interpreted as relative to the root of the package | ||
|
|
||
|
|
||
| Character encoding | ||
| ~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| For clarity and simplicity a ``purl`` is always an ASCII string. To ensure that | ||
| there is no ambiguity when parsing a ``purl``, separator characters and non-ASCII | ||
| characters must be UTF-encoded and then percent-encoded as defined at:: | ||
| A canonical ``purl`` is always an ASCII string composed only of these characters: | ||
|
|
||
| - ``A to Z``, | ||
| - ``a to z``, | ||
| - ``0 to 9`` and | ||
| - the punctuation marks ``:/@?#%.-_~`` . | ||
|
||
|
|
||
| To ensure that there is no ambiguity when parsing a ``purl``, separator characters | ||
| and non-ASCII characters MUST be UTF-encoded and then percent-encoded as defined at | ||
| https://en.wikipedia.org/wiki/Percent-encoding and as further defined below. | ||
|
|
||
| ---- | ||
|
|
||
| Use these rules for percent-encoding and decoding the characters that comprise | ||
| a ``purl`` string. Except as otherwise provided in the "Rules for each | ||
| ``purl`` component" section above: | ||
|
|
||
| - A character used in a ``purl`` component MUST be percent-encoded unless it is: | ||
|
|
||
| - an unreserved character as defined in RFC 3986 section 2.3 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.3), | ||
|
|
||
| - expressly defined in this PURL-SPECIFICATION.rst as a ``purl`` separator (and only when used as such a separator), or | ||
|
|
||
| https://en.wikipedia.org/wiki/Percent-encoding | ||
| - expressly permitted in that ``purl`` component. | ||
|
|
||
| Use these rules for percent-encoding and decoding ``purl`` components: | ||
| - All non-ASCII characters MUST be encoded as UTF-8 and then percent-encoded. | ||
|
|
||
| - the ``type`` must NOT be encoded and must NOT contain separators | ||
| - The characters used as ``purl`` separators are listed below. These characters: | ||
|
|
||
| - the '#', '?', '@' and ':' characters must NOT be encoded when used as | ||
| separators. They may need to be encoded elsewhere | ||
| - MUST NOT be percent-encoded when used as separators. | ||
|
|
||
| - the ':' ``scheme`` and ``type`` separator does not need to and must NOT be encoded. | ||
| It is unambiguous unencoded everywhere | ||
| - MUST be percent-encoded when not used as separators unless expressly permitted | ||
| by a ``purl`` component. | ||
|
|
||
| - the '/' used as ``type``/``namespace``/``name`` and ``subpath`` segments separator | ||
| does not need to and must NOT be percent-encoded. It is unambiguous unencoded | ||
| everywhere | ||
| - ``purl`` separators: | ||
|
|
||
| - the '@' ``version`` separator must be encoded as ``%40`` elsewhere | ||
| - the '?' ``qualifiers`` separator must be encoded as ``%3F`` elsewhere | ||
| - the '=' ``qualifiers`` key/value separator must NOT be encoded | ||
| - the '#' ``subpath`` separator must be encoded as ``%23`` elsewhere | ||
| ':' (colon) | ||
| - between ``scheme`` and ``type`` | ||
|
|
||
| - All non-ASCII characters must be encoded as UTF-8 and then percent-encoded | ||
| '@' (at sign) | ||
| - ``version`` prefix | ||
|
|
||
| It is OK to percent-encode ``purl`` components otherwise except for the ``type``. | ||
| Parsers and builders must always percent-decode and percent-encode ``purl`` | ||
| '?' (question mark) | ||
| - ``qualifiers`` prefix | ||
|
|
||
| '#' (number sign) | ||
| - ``subpath`` prefix | ||
|
|
||
| '/' (slash) | ||
| - ``type``/``namespace``/``name`` separator | ||
| - ``subpath`` segments separator | ||
|
|
||
| '=' (equals) | ||
| - ``qualifiers`` ``key``/``value`` separator | ||
|
|
||
| '&' (ampersand) | ||
| - ``qualifiers`` ``key=value`` separator | ||
|
|
||
| ---- | ||
|
|
||
| Parsers and builders MUST always percent-decode and percent-encode ``purl`` | ||
| components and component segments as explained in the "How to parse" and "How to | ||
| build" sections. | ||
|
|
||
|
|
@@ -486,3 +520,12 @@ License | |
| ~~~~~~~ | ||
|
|
||
| This document is licensed under the MIT license | ||
|
|
||
| Definitions | ||
| ~~~~~~~~~~~ | ||
|
|
||
| [ASCII] See, e.g., | ||
|
|
||
| - American National Standards Institute, "Coded Character Set -- 7-bit | ||
| American Standard Code for Information Interchange", ANSI X3.4, 1986. | ||
| - https://en.wikipedia.org/wiki/ASCII. | ||
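
The revised ``qualifiers`` and "Character encoding" rules in the diff above can be sketched roughly as follows. This is a minimal illustration, not the project's reference implementation: the names `encode_value`, `parse_qualifiers` and `build_qualifiers` are hypothetical, and the sketch assumes that every character outside the RFC 3986 unreserved set is percent-encoded in a ``value``, which may be stricter than the spec requires where a component expressly permits a character. Ordering of pairs is not addressed in this part of the diff, so the sketch simply keeps the input order.

```python
import re
from urllib.parse import quote, unquote

# Key rule from the revised text: ASCII letters, digits, '.', '-' and '_',
# starting with an ASCII letter.
_KEY_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]*\Z")


def encode_value(value: str) -> str:
    """Percent-encode a qualifier value (hypothetical helper)."""
    # quote() leaves RFC 3986 unreserved characters untouched and encodes
    # everything else from its UTF-8 bytes. safe="" also encodes '/',
    # which is an assumption of this sketch, not a rule quoted from the spec.
    return quote(value, safe="")


def parse_qualifiers(qualifiers: str) -> dict:
    """Parse a qualifiers string (without the leading '?') into a dict."""
    result = {}
    if not qualifiers:
        return result
    for pair in qualifiers.split("&"):  # '&' separates key=value pairs
        key, sep, value = pair.partition("=")  # '=' separates key and value
        if not sep:
            raise ValueError(f"missing '=' in qualifier: {pair!r}")
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid qualifier key: {key!r}")
        key = key.lower()  # keys are case insensitive; canonical form is lowercase
        if key in result:
            raise ValueError(f"duplicate qualifier key: {key!r}")
        value = unquote(value)
        if value:  # an empty value is treated as if the pair did not exist
            result[key] = value
    return result


def build_qualifiers(pairs: dict) -> str:
    """Build a qualifiers string (without the leading '?') from a dict."""
    parts = []
    for key, value in pairs.items():
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid qualifier key: {key!r}")
        if value:  # skip empty values, mirroring the parsing rule
            parts.append(f"{key.lower()}={encode_value(value)}")
    return "&".join(parts)
```

Under these assumptions, `parse_qualifiers("Arch=x86_64&repository_url=https%3A%2F%2Fexample.org")` yields `{"arch": "x86_64", "repository_url": "https://example.org"}`, and a ``key`` that starts with a digit is rejected with a `ValueError`.
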
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -6,7 +6,7 @@ Scheme | |||||
|
|
||||||
| **QUESTION**: Can the ``scheme`` component be followed by a colon and two slashes, like a URI? | ||||||
|
|
||||||
| No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3:: | ||||||
| **ANSWER**: No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3:: | ||||||
|
|
||||||
| If a URI does not contain an authority component, then the path | ||||||
| cannot begin with two slash characters ("//"). | ||||||
|
|
@@ -24,9 +24,10 @@ For example, although these two purls are strictly equivalent, the first is in c | |||||
|
|
||||||
| pkg://gem/ruby-advisory-db-check@0.12.4 | ||||||
|
|
||||||
|
|
||||||
| **QUESTION**: Is the colon between ``scheme`` and ``type`` encoded? Can it be encoded? If yes, how? | ||||||
|
|
||||||
| The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'. | ||||||
| **ANSWER**: The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'. | ||||||
|
||||||
| **ANSWER**: The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'. | |
| **ANSWER**: The "Rules for each ``purl`` component" section provides that the ``scheme`` MUST be followed by an unencoded colon ':'. |
Good point -- fixed.
@johnmhoran Can we refine this with the new wording, and remove the weird square brackets in [t]he?
@pombredanne I've fixed the use of square brackets (thanks for catching that) and will commit and push these updates. I'm not sure what you are referring to by "the new wording" aside from the square brackets -- please clarify as needed once the revised faq.rst has been pushed.
We are missing the "ampersand" in that list:
Ampersand '&' added. Good eye.