Fix file:// URL roundtrip bugs (#1101, #1102) #1103
Open
jrey8343 wants to merge 3 commits into servo:main from
Add 7 fuzz targets covering the entire rust-url workspace:
- fuzz_url_parse_roundtrip: URL parse/serialize roundtrip invariant checking
- fuzz_url_differential: relative URL resolution and make_relative roundtrip
- fuzz_url_setters: URL mutation via setters with validity invariant checks
- fuzz_idna: IDNA domain_to_ascii/domain_to_unicode roundtrip + Punycode
- fuzz_data_url: data: URL processing and base64 decoding
- fuzz_form_urlencoded: form-urlencoded parse/serialize roundtrip
- fuzz_percent_encoding: percent encode/decode roundtrip across ASCII sets
Also includes:
- Seed corpus with representative URL samples
- Fuzzing dictionary for URL/IDNA/data-url tokens
- CIFuzz workflow to fuzz all pull requests automatically
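As a rough, std-only illustration of the percent-encoding roundtrip property, here is a sketch. The functions percent_encode and percent_decode and the two predicates are hypothetical stand-ins, not the percent_encoding crate's API; a predicate plays the role of an ASCII set. Decoding an encoded string gives back the original bytes only when the set encodes '%' itself, because an unencoded "%41" in the input would otherwise be decoded to "A".

```rust
/// Percent-encodes `input`. Non-ASCII bytes are always encoded; an ASCII byte
/// is encoded when `in_set` returns true (the predicate stands in for an ASCII set).
fn percent_encode(input: &str, in_set: impl Fn(u8) -> bool) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        if !b.is_ascii() || in_set(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Decodes `%XX` sequences; malformed escapes are copied through unchanged.
fn percent_decode(input: &str) -> Vec<u8> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(h), Some(l)) = (hi, lo) {
                out.push((h * 16 + l) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Encodes everything except ASCII alphanumerics, so '%' is always escaped.
fn non_alphanumeric(b: u8) -> bool {
    !b.is_ascii_alphanumeric()
}

/// Encodes only controls and space, so '%' passes through unescaped.
fn controls_and_space(b: u8) -> bool {
    b.is_ascii_control() || b == b' '
}
```

With non_alphanumeric, "100%41" encodes to "100%2541" and decodes back to itself; with controls_and_space the "%41" survives encoding untouched and decodes to "A", which is exactly the kind of spurious mismatch a roundtrip assertion would flag.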
- fuzz_percent_encoding: use NON_ALPHANUMERIC for roundtrip assertions since it encodes '%', preventing spurious decode mismatches
- fuzz_url_differential: use char_indices() to split UTF-8 input on valid character boundaries, preventing panics on multi-byte chars
- fuzz.dict: replace C-style escapes (\t, \n, \r, \\) with \xHH hex escapes required by the libFuzzer dictionary format
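The char_indices() fix can be sketched as follows. split_near is a hypothetical helper, not code from the fuzz target: it splits at the last character boundary at or before a byte offset taken from the fuzz input, so str::split_at never receives an index inside a multi-byte character (which would panic).

```rust
/// Splits `input` at the last char boundary at or before `byte_hint`.
/// Slicing a &str at an arbitrary byte offset panics if the offset falls inside
/// a multi-byte UTF-8 sequence; char_indices() only yields valid boundaries.
fn split_near(input: &str, byte_hint: usize) -> (&str, &str) {
    let idx = input
        .char_indices()
        .map(|(i, _)| i)
        // The end of the string is also a valid boundary.
        .chain(std::iter::once(input.len()))
        .take_while(|&i| i <= byte_hint)
        .last()
        .unwrap_or(0);
    input.split_at(idx)
}
```

For "aé b" and a hint of 2, a naive &input[..2] would panic because byte 2 is the second byte of 'é'; split_near backs off to byte 1 and returns ("a", "é b").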
This commit fixes two bugs found through fuzzing that caused file:// URLs to fail roundtrip tests (parse → serialize → parse).

Bug servo#1101: File URLs with hosts and paths starting with multiple slashes were losing their host component during roundtrip. The path normalization logic was too aggressive in stripping leading slashes, which changed how the URL was interpreted on re-parsing.
Fix: Preserve path structure when a host component is present, only normalizing leading slashes for hostless file:// URLs.

Bug servo#1102: Calling set_host("localhost") on file:// URLs didn't apply the same normalization as the parser, which converts "localhost" to an empty host per the WHATWG URL Standard.
Fix: Normalize "localhost" to an empty host in set_host() for file:// URLs, matching parser behavior.

Both fixes improve WHATWG URL spec compliance and resolve 4 previously failing Web Platform Tests:
- file://spider///
- file://monkey/ with pathname set to \\\\
- file:///unicorn with pathname set to //\\/
- file:///unicorn with pathname set to //monkey/..//
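The parse → serialize → parse property can be phrased as a fixed-point check: after one roundtrip, serializing the re-parsed value must reproduce the same string. The sketch below is generic over hypothetical parse and serialize closures and does not use rust-url; inputs that fail to parse are treated as uninteresting.

```rust
/// Checks that serialization is stable across one parse → serialize → parse cycle.
/// Returns Ok(()) for inputs the parser rejects, since there is nothing to compare.
fn check_roundtrip<T, E>(
    input: &str,
    parse: impl Fn(&str) -> Result<T, E>,
    serialize: impl Fn(&T) -> String,
) -> Result<(), String> {
    let first = match parse(input) {
        Ok(value) => value,
        Err(_) => return Ok(()),
    };
    let serialized = serialize(&first);
    // A serialized value that no longer parses is itself a roundtrip failure.
    let second = parse(&serialized).map_err(|_| format!("re-parse of {serialized:?} failed"))?;
    let reserialized = serialize(&second);
    if serialized == reserialized {
        Ok(())
    } else {
        Err(format!("{serialized:?} re-serialized as {reserialized:?}"))
    }
}
```

A serializer that drops one leading slash per pass (a simplified stand-in for over-aggressive slash stripping) fails this check: "//a" serializes to "/a", which then re-serializes to "a".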
This was referenced Feb 8, 2026
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1103 +/- ##
=======================================
Coverage ? 86.41%
=======================================
Files ? 27
Lines ? 5337
Branches ? 0
=======================================
Hits ? 4612
Misses ? 725
Partials ? 0
Member: Could you rebase this fix off of the fuzz PR?
Summary
This PR fixes two bugs found through OSS-Fuzz that caused file:// URLs to fail roundtrip tests (parse → serialize → parse).
Fixes #1101 - File URLs with hosts and paths containing multiple slashes were losing their host component during roundtrip
Fixes #1102 - set_host("localhost") on file:// URLs didn't normalize localhost to an empty host like the parser does
Changes
Bug #1101: Path structure preservation
Problem: When parsing file URLs like file://host//path, the path normalization was stripping all leading slashes, causing the host to be lost on re-parse.
Fix: Modified parse_path() in parser.rs to preserve path structure when a host component is present. Leading-slash normalization now only applies to hostless file:// URLs.
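The PR's concrete example for this bug did not survive in this copy. As a hypothetical illustration only (a toy model, not rust-url's parser or the real serialization rules), the sketch below shows why leading slashes in the path matter when a host is present: in the serialized form "file://" + host + path, the path's first '/' is what marks where the host ends.

```rust
/// Toy serializer: "file://" followed by the host and the path.
fn serialize_file(host: &str, path: &str) -> String {
    format!("file://{host}{path}")
}

/// Toy parser: the host runs from after "file://" up to the first '/'.
fn parse_file(url: &str) -> Option<(String, String)> {
    let rest = url.strip_prefix("file://")?;
    let end = rest.find('/').unwrap_or(rest.len());
    Some((rest[..end].to_string(), rest[end..].to_string()))
}

/// Over-aggressive normalization (hypothetical): strips every leading slash.
fn strip_leading_slashes(path: &str) -> &str {
    path.trim_start_matches('/')
}
```

Keeping "//path" intact re-parses to host "host" and path "//path". Stripping it first yields "file://hostpath", which the toy parser reads as host "hostpath" with an empty path, so the original host does not survive the roundtrip.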
Bug #1102: localhost normalization in set_host()
Problem: The URL parser normalizes "localhost" to an empty host for file:// URLs per the WHATWG spec, but set_host() wasn't applying the same normalization, causing asymmetric behavior.
Fix: Modified set_host() in lib.rs to normalize "localhost" to None for file:// URLs, matching the parser's behavior.
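A minimal model of the setter fix, assuming a hypothetical FileUrl struct that stores the host as Option<String> (None meaning the empty host, as the PR describes); it is not rust-url's Url type. The comparison is ASCII case-insensitive because host parsing lowercases ASCII domains before the "localhost" check; treat that detail as an assumption of this sketch rather than something stated in the PR.

```rust
/// Hypothetical stand-in for a file: URL; `host: None` means the empty host.
#[derive(Debug, PartialEq)]
struct FileUrl {
    host: Option<String>,
    path: String,
}

impl FileUrl {
    /// Mirrors the parser's rule: "localhost" (and the empty string) become
    /// the empty host. ASCII case is ignored because host parsing lowercases
    /// ASCII domains before the check (an assumption of this sketch).
    fn set_host(&mut self, host: &str) {
        self.host = if host.is_empty() || host.eq_ignore_ascii_case("localhost") {
            None
        } else {
            Some(host.to_string())
        };
    }

    /// Serializes as "file://" + host (empty when None) + path.
    fn serialize(&self) -> String {
        format!("file://{}{}", self.host.as_deref().unwrap_or(""), self.path)
    }
}
```

After set_host("localhost") the model serializes as file:///a, the same string a spec-conforming parser produces for file://localhost/a, so setter and parser agree.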
Impact
- file://spider///
- file://monkey/ with pathname set to \\\\
- file:///unicorn with pathname set to //\\/
- file:///unicorn with pathname set to //monkey/..//
Testing
Added a comprehensive test suite in url/tests/roundtrip_bugs.rs that reproduces both bugs and verifies the fixes.
WHATWG Spec Compliance
Both fixes align with the WHATWG URL Standard.
Found while integrating rust-url with OSS-Fuzz for continuous fuzzing.