-
Notifications
You must be signed in to change notification settings - Fork 313
fix: only percent-encode characters in the userinfo encode set #1852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CodSpeed Performance ReportMerging #1852 will not alter performanceComparing Summary
|
src/url.rs
Outdated
| /// Note that this doesn't actually include % itself - see the note in | ||
| /// https://url.spec.whatwg.org/#string-percent-encode-after-encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 3986 says:
Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI.
And the note in the WHATWG spec isn't clear to me:
The other values for the percentEncodeSet argument — which happen to be used by the URL parser — leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented.
As I understand it, it says that some encoding sets don't encode %, and as such % need special-casing and must be percent-encoded?
Before #1829, it seems that % wasn't encoded:
import pydantic as py
scheme = "mysql+pymysql"
host = "my_host"
port = 3306
path = "my_db"
username = "my_user"
password = "my_passwor%d"
url_1 = py.AnyUrl(f"{scheme}://{username}:{password}@{host}:{port}/{path}")
print(url_1)
#> mysql+pymysql://my_user:my_passwor%d@my_host:3306/my_dbBut following what I said previously this should be a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// | ||
| /// Note that this doesn't actually include % itself - see the note in | ||
| /// https://url.spec.whatwg.org/#string-percent-encode-after-encoding | ||
| const USERINFO_ENCODE_SET: &AsciiSet = &CONTROLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional note:
In previous Pydantic versions (without #1829 included), @ was already percent-encoded:
from pydantic_core import Url
scheme = "mysql+pymysql"
host = "my_host"
port = 3306
path = "my_db"
username = "my_user"
password = "my_passwor@d"
Url.build(
scheme=scheme, username=username, password=password, host=host, port=port, path=path
)
#> mysql+pymysql://my_user:my_passwor%40d@my_host:3306/my_dbit turns out that when we use build(), we build a string representation of the URL, which is then passed to the pydantic_core.Url constructor, which will use a url validator. It delegates to the url crate. All this to say, I think it is right for our build() API to properly percent encode components (btw, should this apply to only username and password? Why not others?). However, we should document that when using the constructor with a string directly (e.g. AnyUrl(...)), components should already be percent encoded. It seems like the url crate does best effort to encode @s, but I don't know about others.
I'm mentioning this because we got a report (#1829) of build() and the constructor now producing different results. I think we can consider it as expected with what I said?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this to say, I think it is right for our build() API to properly percent encode components (btw, should this apply to only username and password? Why not others?).
I agree with you. Given the experience of changing userinfo has been a little uncomfortable, shall we defer encoding the rest until 2.13?
Change Summary
See #1829 (comment)
The fix released in 2.41.2 was slightly too aggressive, this corrects the character set used to match the URL spec.
Related issue number
N/A
Checklist
pydantic-core(except for expected changes)