[PM-9550] Fixes a mojibake bug in web authenticators in some multibyte locales #3346
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #3345
Make the implementation of the character conversion consistent with their consumers in the web app.
Type of change
Objective
This PR fixes a mojibake issue in the FIDO2 Web Authn.
You can see the detailed description of the issue and and screenshots in the issue #3345. In short, the current implementation displays a wrong, unreadable text "WebAuthn ã�®èª�証"" where it should display "Web Authn の認証" when the user locale is Japanese.
You can also find a minimal reproducible example of the issue at #3345 (comment).
I have actually tried reproducing the issue only in Japanese locale. In addition, theoretically it can happen in any locales whose localized text of
Fido2AuthenticateWebAuthn
message contains multibyte characters. You should be also able to reproduce the issue in an arbitrary locale by inserting a Unicode emoji into the text.Code changes
This PR removes the extra character conversion steps from
Bit.App.Utilities.AppHelpers::EncodeDataParameter
.This change makes the data encoding in the method consistent with the decoding at
apps/web/src/connectors/captcha.ts#L50
andapps/web/src/connectors/webauthn.ts#L88
.Analysis of the issue in the original implementation
The issue in the following code in the original implementation is that it applies unnecessary encoding to multibyte characters before the essential part, and there is no corresponding decoding in the end-consumer of the return value.
Design of the current implementation
The intention of this original implementation is to convert the given set of query parameters for web authenticators (web apps) into a url-safe string. The generated strings are passed into the query parameters of the web authenticators (https://vault.bitwarden.com/webauthn-mobile-connector.html) or (https://vault.bitwarden.com/captcha-mobile-connector.html).
The callers of
EncodeDataParameter
,Bit.App.Pages.CaptchaProtectedViewModel.HandleCaptchaAsync
andBit.App.Pages.TwoFactorPageViewModel.Fido2AuthenticateAsync
place some localized UI texts into the parameterobj
. Therefore, the implementation ofEncodeDataParameter
needs to safely convert multibyte characters in the message into url-safe forms, and the conversion must be consistent with the reverse conversion in the consumers.For example,
obj
has a fieldbtnText
whose value is "WebAuthn の認証" whenTwoFactorPageViewModel.Fido2AuthenticateAsync
calls the method in Japanese locale.The value of
btnText
corresponds to theFido2AuthenticateWebAuthn
entry in the resource file.What is actually happening behind the scene
Let's take a closer look at the first multibyte character in the
btnText
value, for example. The multibyte character is "の" (U+306E)."
(double quotation) or\
(backslash).Uri.EscapeDataString
internally callsSystem.UriHelper.EscapeStringToBuilder
and it encodes the character into its UTF-8 byte sequence representation, and then individual bytes in the sequence are encoded into the form of "%xx".0xE3, 0x81, 0xAE
in UTF-8Regex.Replace
replaces the individual occurrences of "%xx" form by callingEncodeMultibyte
Convert.ToUInt32
inEncodeMultibyte
converts the regex captures of "E3", "81", and "AE" into0xE3u
,0x81u
, and0xAEu
, respectivelyConvert.ToChar
converts individual integer values into their corresponding unicode code points.0xE3u
,0x81u
, and0xAEu
are converted into the following characters, respectively:ã
(U+00E3, LATIN SMALL LETTER A WITH TILDE)®
(U+00AE, REGISTERED SIGN)Encoding.UTF8.GetBytes
returns the UTF-8 byte sequence representation of the given string.ã
, , and®
are converted into their corresponding byte sequences as the followings:ã
(U+00E3) --> 0xC3, 0xA3®
(U+00AE) --> 0xC2", 0xAEConvert.ToBase64String
converts the 6 bytes into "w6PCgcKu".The consumer-side implementation in
apps/web/src/connectors/captcha.ts#L50
or apps/web/src/connectors/webauthn.ts#L88 reverses just the step (5), (4), and (1). Therefore, we get�
(U+00E3, U+0031, U+00AE) instead of "の" (U+306E).Justification of the proposed fix
Just
WebUtility.UrlEncode
would be sufficient to keep the return value url-safe. It internally converts the given string into its UTF-8 byte sequence representation, and then convert the individual bytes into url-safe characters.In practice, Base64 encoding is also necessary for consistency with the decoding in the end-consumer of the data.
Anyway, the earlier steps of (2), and (3) are not necessary in terms of the url-safety. Both of
WebUtility.UrlEncode
andConvert.ToBase64String(Encoding.UTF8.GetBytes(str))
are safe to multibyte characters. Actually, just (1), (4), (5) are what the consumer-side implementation reverses.The steps (2) and (3) actually just make the encoded value incompatible to the implementation of the decoding.
This is the reason why this PR removes the extra steps (2) and (3).
Comparison with ASCII characters.
You might wonder why this issue did not happen in earlier characters in the
btnText
string or any other messages in other many locales, e.g. in US English.It is because the byte sequence representations in UTF-8 are equal to the Unicode codepoint values themselves for ASCII characters. Therefore, the combination of the extra steps (2) and (3) is effectively no-op for the characters.
For example, the character 'W' (U+0057) is represented as a 1-length sequence of bytes
0x57
. The step (2) converts it into a string "%57". TheEncodeMultibyte
in the step (3) converts the regex capture of "57" into an integer0x57u
, and thenConvert.ToChar
converts it into a char value 'W' (U+0057).The issue happens only when the strings in the target JSON object contains characters whose UTF-8 byte sequence is different from the Unicode codepoint itself.
Before you submit
dotnet format --verify-no-changes
) (required)