Add error-tolerant mode #19

darrachequesne · 2016-10-16T20:58:59Z

Closes #2 and #5

coveralls · 2016-10-16T21:01:36Z

Coverage increased (+0.4%) to 92.958% when pulling c373d19 on darrachequesne:patch-1 into 2fa80fa on mathiasbynens:master.

darrachequesne · 2016-10-18T08:20:59Z

@mathiasbynens does that implementation comply with what you had in mind? Could you please review when you have time?

mathiasbynens · 2016-10-18T09:36:17Z

Of course! It might take a while until I get around to it, though.

darrachequesne · 2016-10-18T09:59:31Z

No problem! Please tell me if I can help in any way.

darrachequesne · 2016-11-21T21:45:51Z

Hi @mathiasbynens! Do you know when you'll be able to review this PR?

coveralls · 2016-12-18T23:05:21Z

Coverage increased (+0.4%) to 92.958% when pulling 41c4eef on darrachequesne:patch-1 into 5566334 on mathiasbynens:master.

chharvey · 2021-02-26T16:52:27Z

@darrachequesne Does this handle the case of missing or extra continuation bytes?

The encoding 1110xxxx 10xxxxxx 10xxxxxx 0xxxxxxx (a 3-sequence followed by a 1-sequence) is well-formed and decodes to two code points. But if one of the “continuation bytes” were lost in transmission, the resulting 1110xxxx 10xxxxxx 0xxxxxxx would error. With {strict: false}, we would want the first character to resolve to U+FFFD instead of erroring, and the second character to resolve as normal. Example:

utf8.decode(
	'\xE2\xAC\xE2\x82\xAC', // 11100010 10101100 11100010 10000010 10101100
	{strict: false},
) === '\uFFFD\u20AC';
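
The bit patterns in the comment of the example above can be checked with a short sketch. The helpers `toBits` and `classify` are hypothetical names introduced here for illustration; they are not part of utf8.js or of this PR. The sketch assumes the input is a byte string, meaning every character has a code unit in the range 0x00 to 0xFF.

```javascript
// Hypothetical helpers for inspecting a byte string; not part of utf8.js.

// Render each byte of a byte string as an 8-digit binary group.
function toBits(byteString) {
	return Array.from(byteString, (ch) =>
		ch.charCodeAt(0).toString(2).padStart(8, '0')
	).join(' ');
}

// Classify a single byte by its leading bits.
function classify(byte) {
	if ((byte & 0x80) === 0x00) return 'single';       // 0xxxxxxx
	if ((byte & 0xC0) === 0x80) return 'continuation'; // 10xxxxxx
	if ((byte & 0xE0) === 0xC0) return 'lead2';        // 110xxxxx
	if ((byte & 0xF0) === 0xE0) return 'lead3';        // 1110xxxx
	if ((byte & 0xF8) === 0xF0) return 'lead4';        // 11110xxx
	return 'invalid';                                  // 11111xxx
}

const input = '\xE2\xAC\xE2\x82\xAC';
console.log(toBits(input));
// 11100010 10101100 11100010 10000010 10101100
console.log(Array.from(input, (ch) => classify(ch.charCodeAt(0))));
// [ 'lead3', 'continuation', 'lead3', 'continuation', 'continuation' ]
```

The first `lead3` byte is followed by only one continuation byte before the next lead byte, which is the truncated sequence described above.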

Likewise, 1110xxxx 10xxxxxx 10xxxxxx 10xxxxxx is not well-formed either. With strict turned off, the first character (the 3-sequence) should resolve as normal, but then a single U+FFFD should be returned for the run of remaining continuation bytes, up to the next “header byte” (that is, a byte whose two leading bits are 00, 01, or 11). Example:

utf8.decode(
	'\xE2\x82\xAC\x82\xAC\xE2\x82\xAC', // 11100010 10000010 10101100 10000010 10101100 11100010 10000010 10101100
	{strict: false},
) === '\u20AC\uFFFD\u20AC';
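
The behaviour requested in both examples can be sketched as a standalone decoder. This is a minimal illustration under stated assumptions, not the implementation in this PR and not the utf8.js API: the function name `decodeLenient` is hypothetical, the input is assumed to be a byte string (code units 0x00 to 0xFF), and the choice to emit one U+FFFD per run of stray continuation bytes follows the expected output in the second example.

```javascript
// Hypothetical error-tolerant UTF-8 decoder; not the PR's code.
function decodeLenient(byteString) {
	const bytes = Array.from(byteString, (ch) => ch.charCodeAt(0));
	const isContinuation = (b) => (b & 0xC0) === 0x80;
	let out = '';
	let i = 0;
	while (i < bytes.length) {
		const b = bytes[i];
		if (b < 0x80) { // 0xxxxxxx: single-byte sequence
			out += String.fromCharCode(b);
			i++;
			continue;
		}
		if (isContinuation(b)) {
			// Stray continuation bytes: skip the whole run, emit one U+FFFD.
			while (i < bytes.length && isContinuation(bytes[i])) i++;
			out += '\uFFFD';
			continue;
		}
		let needed, cp, min;
		if ((b & 0xE0) === 0xC0) { needed = 1; cp = b & 0x1F; min = 0x80; }
		else if ((b & 0xF0) === 0xE0) { needed = 2; cp = b & 0x0F; min = 0x800; }
		else if ((b & 0xF8) === 0xF0) { needed = 3; cp = b & 0x07; min = 0x10000; }
		else { out += '\uFFFD'; i++; continue; } // 11111xxx is never valid
		i++;
		// Consume up to `needed` continuation bytes, stopping at the next header byte.
		let got = 0;
		while (got < needed && i < bytes.length && isContinuation(bytes[i])) {
			cp = (cp << 6) | (bytes[i] & 0x3F);
			i++;
			got++;
		}
		const truncated = got < needed;
		const overlong = cp < min;
		const surrogate = cp >= 0xD800 && cp <= 0xDFFF;
		if (truncated || overlong || surrogate || cp > 0x10FFFF) {
			out += '\uFFFD';
		} else {
			out += String.fromCodePoint(cp);
		}
	}
	return out;
}

console.log(decodeLenient('\xE2\xAC\xE2\x82\xAC') === '\uFFFD\u20AC');
// true
console.log(decodeLenient('\xE2\x82\xAC\x82\xAC\xE2\x82\xAC') === '\u20AC\uFFFD\u20AC');
// true
```

Collapsing a run of stray continuation bytes into one U+FFFD is a design choice taken from the example above. Decoders that follow the WHATWG Encoding Standard, such as `TextDecoder`, emit one U+FFFD per stray byte instead, so the two approaches produce different output for the second input.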

darrachequesne mentioned this pull request Oct 20, 2016

[fix] Sanitize strings by removing lone surrogates socketio/engine.io-parser#72

Closed

Add error-tolerant mode

41c4eef

darrachequesne force-pushed the patch-1 branch from c373d19 to 41c4eef Compare December 18, 2016 23:02

darrachequesne mentioned this pull request Dec 18, 2016

Sanitize strings by removing lone surrogates socketio/engine.io-parser#82

Merged

mathiasbynens force-pushed the master branch 4 times, most recently from 4eff386 to d4de352 Compare December 4, 2017 03:24
