Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibWeb/URL: Add strip_trailing_spaces_from_an_opaque_path() #21021

Merged
merged 3 commits into from
Sep 15, 2023

Conversation

kemzeb
Copy link
Contributor

@kemzeb kemzeb commented Sep 11, 2023

Adds https://url.spec.whatwg.org/#potentially-strip-trailing-spaces-from-an-opaque-path.

Tested the code paths in set_search() and set_hash() and they appear fine, though I think the design may be questionable, specifically with exposing more methods and delegating the work in those methods to AK::URL.

@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Sep 11, 2023
AK/URL.h Outdated Show resolved Hide resolved
Userland/Libraries/LibWeb/URL/URL.cpp Show resolved Hide resolved
Userland/Libraries/LibWeb/URL/URL.h Outdated Show resolved Hide resolved
Userland/Libraries/LibWeb/URL/URL.cpp Outdated Show resolved Hide resolved
@kemzeb kemzeb force-pushed the libweb/url branch 3 times, most recently from 82b1c27 to ff2099b Compare September 11, 2023 03:56
@shannonbooth
Copy link
Member

Interestingly, seems like there is a change being proposed that is removing this section of the spec: whatwg/url#785 (only up for discussion right now)

@kemzeb
Copy link
Contributor Author

kemzeb commented Sep 11, 2023

Nice find! From reading it (and my basic knowledge of character encoding that I should to work on), I believe some of the possible approaches they give would not have too much of an impact on my implementation.

@shannonbooth shannonbooth added ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author and removed 👀 pr-needs-review PR needs review from a maintainer or community member labels Sep 11, 2023
@github-actions github-actions bot added 👀 pr-needs-review PR needs review from a maintainer or community member and removed ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author labels Sep 11, 2023
@kemzeb
Copy link
Contributor Author

kemzeb commented Sep 11, 2023

Recent snapshot implements AK::serialize_path() to spec, but not sure if the spec is correct. "If url has an opaque path, then return url’s path." Why don't we prefix a /?

Even weirder, when I tried to do it myself by returning a DeprecatedString::formatted("/{}", m_paths[0]), the test suite hangs. Not sure what's going on, but the test suite should fail right now since the expected opaque pathnames won't have a / prefixed.

@kemzeb
Copy link
Contributor Author

kemzeb commented Sep 11, 2023

The most recent change removes AK::URL::is_opaque_path(). It also reverts AK::URL::serialize_path() back to using cannot_be_a_base_url() as its first step because of the problems mentioned in this comment.

I also had to modify 2 AK/TestURL.cpp test cases since they fail, but appear to be incorrect. Running the test cases "unicode" and "file_url_with_encoded_characters" on Chrome by doing the following in the DevTools console:

let url = new URL("http://example.com/_ünicöde_téxt_©");
undefined
url.pathname
'/_%C3%BCnic%C3%B6de_t%C3%A9xt_%C2%A9'

let url = new URL("http://example.com/a%23test");
undefined
url.pathname
'/a%23test'

their expected output follows inline with the changes I made to these test cases.

I also made changes to the strip_trailing_spaces_from_an_opaque_path() due to the the removal of AK::URL::is_opaque_path(). I also changed the expected test cases to have URL-encoded space characters.

@kemzeb kemzeb force-pushed the libweb/url branch 2 times, most recently from ff97b57 to 0cc239b Compare September 12, 2023 03:48
@kemzeb
Copy link
Contributor Author

kemzeb commented Sep 12, 2023

My recent change fixes a comment typo

@shannonbooth
Copy link
Member

The most recent change removes AK::URL::is_opaque_path(). It also reverts AK::URL::serialize_path() back to using cannot_be_a_base_url() as its first step because of the problems mentioned in this comment.

I also had to modify 2 AK/TestURL.cpp test cases since they fail, but appear to be incorrect. Running the test cases "unicode" and "file_url_with_encoded_characters" on Chrome by doing the following in the DevTools console:

let url = new URL("http://example.com/_ünicöde_téxt_©");
undefined
url.pathname
'/_%C3%BCnic%C3%B6de_t%C3%A9xt_%C2%A9'

let url = new URL("http://example.com/a%23test");
undefined
url.pathname
'/a%23test'

their expected output follows inline with the changes I made to these test cases.

I also made changes to the strip_trailing_spaces_from_an_opaque_path() due to the the removal of AK::URL::is_opaque_path(). I also changed the expected test cases to have URL-encoded space characters.

I think we need to preserve the behaviour of the existing AK tests here. Existing code in serenity outside of Web specs expect that the path is percent decoded. I removed this flag in: db5ad0c since nothing was actually setting that flag, and also fixed some places that were percent decoding when we shouldn't have.

But it appears I shouldn't have removed that flag for "serialized_path" (or we need a different API which is a bit more clear?) - we need to be setting it here to follow web specs but not for serenity code outside of LibWeb which expected a percent-decoded path.

@shannonbooth shannonbooth added ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author and removed 👀 pr-needs-review PR needs review from a maintainer or community member labels Sep 12, 2023
This commit also reverts db5ad0c since code outside of the web spec
expects serialized paths to be percent decoded.

Also, there are issues trying to implement the concept "opaque
path". For now, we still use the old cannot_be_a_base_url(), but its
usage needs to be removed in favor of a has_opaque_path() as the spec
has changed since then.
@github-actions github-actions bot added 👀 pr-needs-review PR needs review from a maintainer or community member and removed ⏳ pr-waiting-for-author PR is blocked by feedback / code changes from the author labels Sep 12, 2023
@BuggieBot
Copy link
Member

Hello!

One or more of the commit messages in this PR do not match the SerenityOS code submission policy, please check the lint_commits CI job for more details on which commits were flagged and why.
Please do not close this PR and open another, instead modify your commit message(s) with git commit --amend and force push those changes to update this PR.

Since the spec expects us to use AK::URL::serialize_path() without
doing any percent decoding.
@kemzeb
Copy link
Contributor Author

kemzeb commented Sep 12, 2023

I should note that I changed the signature of LibWeb::URL::path_segment_at_index() to just directly return a DeprecatedString. I also added LibWeb::URL::set_paths() which delegates its work to AK::URL::set_paths() because LibWeb::URL::set_pathname() returns immediately if it finds a URL that is an opaque path.

Also remove 2 FIXMEs by including this function.
Copy link
Member

@shannonbooth shannonbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me

@shannonbooth shannonbooth added ✅ pr-community-approved PR has been approved by a community member and removed 👀 pr-needs-review PR needs review from a maintainer or community member labels Sep 14, 2023
builder.append('/');
builder.append(percent_decode(path));

// Let output be the empty string.
Copy link
Member

@ADKaster ADKaster Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Kind of weird that the spec has a bulleted list for this instead of a numbered list. Might be worth a spec issue in https://github.com/whatwg/url if you're feeling speccy :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I wasn't too sure if this was intentional too. I'll file an issue there 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue at whatwg/url#786

@ADKaster ADKaster merged commit 824c54a into SerenityOS:master Sep 15, 2023
14 checks passed
@github-actions github-actions bot removed the ✅ pr-community-approved PR has been approved by a community member label Sep 15, 2023
@kemzeb kemzeb deleted the libweb/url branch September 15, 2023 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants