`/(?<foo>A)\k<foo>/` is a syntax error unless using Annex B #2434

mysticatea · 2021-06-13T22:58:46Z

Description:

This is about `/(?<foo>A)\k<foo>/` without the `u` flag and without Annex B.

22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u ) says, in effect: "if there is no `u` flag, parse the pattern without the N parameter; then, if there are no syntax errors and named capturing groups exist, re-parse the pattern with the N parameter."
However, `\k` is a syntax error when there is no N parameter and no Annex B.

As a result, named backreferences without the `u` flag are always a syntax error in the core grammar, because the first pass fails and the second pass never runs.
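The two-pass rule described above can be sketched as a toy model. This is only an illustration, not spec text: `tryParse` and `parsePattern` are hypothetical names, and the "parser" looks at nothing but `\k` escapes and `(?<name>` group specifiers (it does not check that a referenced group actually exists).

```javascript
// Toy stand-in for the regexp grammar (hypothetical helper, not a real parser).
// N: whether the grammar is parameterized with [+N].
// annexB: whether the Annex B grammar is in effect.
function tryParse(patternText, { N, annexB }) {
  const hasNamedGroups = /\(\?<[A-Za-z_$][\w$]*>/.test(patternText);
  let syntaxError = false;
  // Walk escapes left to right; `\k` may be followed by a GroupName.
  for (const m of patternText.matchAll(/\\(?:k(<[A-Za-z_$][\w$]*>)?|[^])/g)) {
    if (m[0][1] !== "k") continue; // other escapes are ignored by this toy
    if (N) {
      // With [+N], `\k` must be followed by a GroupName.
      if (m[1] === undefined) syntaxError = true;
    } else if (!annexB) {
      // Core grammar without N: `\k` is not a valid IdentityEscape.
      syntaxError = true;
    }
    // Annex B without N: `\k` is an identity escape for the letter k.
  }
  return { syntaxError, hasNamedGroups };
}

// Model of the two-pass ParsePattern rule as quoted in this issue.
function parsePattern(patternText, { u, annexB }) {
  if (u) return tryParse(patternText, { N: true, annexB });
  const first = tryParse(patternText, { N: false, annexB });
  if (!first.syntaxError && first.hasNamedGroups) {
    return tryParse(patternText, { N: true, annexB });
  }
  return first; // the second pass is skipped, including when the first pass failed
}

const source = String.raw`(?<foo>A)\k<foo>`;
console.log(parsePattern(source, { u: false, annexB: false }).syntaxError);
console.log(parsePattern(source, { u: false, annexB: true }).syntaxError);
```

In this model the core-grammar call reports a syntax error because the first pass rejects `\k`, while the Annex B call reaches the second pass and succeeds.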

I think IdentityEscape needs `[~N] k` or something similar, so that the first pass accepts `\k` and named backreferences work without the `u` flag.

eshost Output:

(I'm not sure about the environment without Annex B)

bakkot · 2021-06-13T23:14:48Z

It's a syntax error on the first pass, I agree, but that means that it will re-parse with the N parameter set, and when parsing with the N parameter we have the production AtomEscape :: k GroupName, which matches \k<foo> as in this example. Since it parses successfully on the second pass, it's allowed.

Edit: oh, except that it has to parse successfully on the first pass in order for the second pass to happen, I see. I think that's a spec bug, yes. (I wish we didn't have two grammars. As far as I'm aware every major implementation uses the Annex B grammar; the non-Annex B one is basically irrelevant and therefore does not get maintained properly.)

mysticatea · 2021-06-14T01:08:52Z

Yes, it skips the second pass if the first pass had a syntax error, and \k in the first pass is a syntax error.

Maybe both the N parameter and the two-pass parsing should move to Annex B? They look like they exist only to support Annex B behavior.

devsnek · 2021-06-14T02:06:57Z

I think this is correct as specified. The first pass will parse \k<foo> as the atoms k < f o o >, not a syntax error.

mysticatea · 2021-06-14T02:13:26Z

@devsnek That's right in Annex B. But it's a syntax error in the spec core because the sequence \ k is not allowed.
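For reference, the Annex B behavior discussed here can be observed in any engine that implements Annex B. The snippet below is a small sketch that assumes such an engine (for example Node.js or a browser); `throwsSyntaxError` is a helper introduced only for this illustration.

```javascript
// Assumes an engine that implements the Annex B regexp grammar.
const withGroup = new RegExp(String.raw`(?<foo>A)\k<foo>`);
console.log(withGroup.test("AA"));      // \k<foo> acts as a named backreference
console.log(withGroup.test("Ak<foo>")); // so the literal text "k<foo>" does not match

// Without any named group, the pattern is never re-parsed with N,
// and \k<foo> is read as the atoms k < f o o >.
const withoutGroup = new RegExp(String.raw`\k<foo>`);
console.log(withoutGroup.test("k<foo>"));

// Hypothetical helper: reports whether constructing the regexp throws a SyntaxError.
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// With the u flag, \k must be a valid named backreference.
console.log(throwsSyntaxError(String.raw`\k<foo>`, "u"));
```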

devsnek · 2021-06-14T02:22:17Z

hm... is that the "but not UnicodeIDContinue" in the spec? i must have a bug in engine262

bakkot · 2021-06-14T02:42:05Z

> is that the "but not UnicodeIDContinue" in the spec?

Yes. Keep in mind that's only there for the non-Annex B grammar, i.e. the one which is not used in actual implementations. (The Annex B variant does not have that restriction.)
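As a rough sketch of that restriction: without the `u` flag, the core grammar only allows an IdentityEscape for a character that is not UnicodeIDContinue, whereas Annex B allows any character except `c` (and also except `k` when N is set). The function names below are hypothetical, and the check approximates UnicodeIDContinue with the `ID_Continue` Unicode property.

```javascript
// Approximates UnicodeIDContinue via the ID_Continue Unicode property.
const isIDContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

// Core grammar, no u flag: SourceCharacter but not UnicodeIDContinue.
function coreIdentityEscapeAllowed(ch) {
  return !isIDContinue(ch);
}

// Annex B, no u flag: SourceCharacter but not c, and also not k under [+N].
function annexBIdentityEscapeAllowed(ch, { N }) {
  if (ch === "c") return false;
  if (N && ch === "k") return false;
  return true;
}

console.log(coreIdentityEscapeAllowed("k"));               // k is ID_Continue
console.log(coreIdentityEscapeAllowed("-"));               // - is not ID_Continue
console.log(annexBIdentityEscapeAllowed("k", { N: false }));
console.log(annexBIdentityEscapeAllowed("k", { N: true }));
```

This is why `\k` fails on the first (no N) pass in the core grammar but is accepted there under Annex B.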

mysticatea · 2021-06-14T03:48:50Z

mysticatea@7579276

That may be a candidate to patch this issue.

Remove the two-pass parsing from 22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u ), so that it always parses the pattern with `[+N]`. This should not cause problems, because the only difference N makes is allowing named backreferences.
Add an Annex B section that extends 22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u ) to reintroduce the two-pass parsing, for backward compatibility around `\k`.
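The control flow of that proposal could look roughly like the sketch below. It is an assumption-laden illustration, not the text of the linked commit: the function names are made up, and `parse` is a hypothetical callback standing in for the grammar, returning `{ syntaxError, hasNamedGroups }`.

```javascript
// Proposed core behavior: a single pass, always with [+N].
function parsePatternCoreProposed(parse, patternText) {
  return parse(patternText, { N: true });
}

// Proposed Annex B extension: keep two passes for patterns without the u flag.
function parsePatternAnnexBProposed(parse, patternText, u) {
  if (u) return parse(patternText, { N: true });
  const first = parse(patternText, { N: false });
  if (!first.syntaxError && first.hasNamedGroups) {
    return parse(patternText, { N: true });
  }
  return first;
}

// Usage with a stub parser that records which N values were requested.
const requestedN = [];
const stubParse = (patternText, { N }) => {
  requestedN.push(N);
  return { syntaxError: false, hasNamedGroups: patternText.includes("(?<") };
};

parsePatternAnnexBProposed(stubParse, String.raw`(?<foo>A)\k<foo>`, false);
console.log(requestedN); // the stub is asked for a no-N pass, then an N pass
```

The point of the split is that the core grammar no longer depends on a first pass that rejects `\k`, while Annex B keeps treating `\k` as a literal when no named groups are present.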

scole66 · 2021-06-24T02:31:09Z

This is also #1888, sort of.

mysticatea mentioned this issue Jun 13, 2021

Bug: Named backreferences will always cause a syntax error for non-Unicode regexes in strict parsing mode mysticatea/regexpp#23

Open

mysticatea changed the title ~~Regular expression /(?<foo>A)\k<foo>/ (without u flag) is a syntax error~~ /(?<foo>A)\k<foo>/ is a syntax error unless using Annex B Jun 14, 2021

mysticatea mentioned this issue Jun 15, 2021

Normative: allow named backreferences without u flag in core grammar #2436

Merged

ljharb closed this as completed in #2436 Feb 15, 2023
