Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/(?<foo>A)\k<foo>/ is a syntax error unless using Annex B #2434

Closed
mysticatea opened this issue Jun 13, 2021 · 8 comments · Fixed by #2436
Closed

/(?<foo>A)\k<foo>/ is a syntax error unless using Annex B #2434

mysticatea opened this issue Jun 13, 2021 · 8 comments · Fixed by #2436

Comments

@mysticatea
Copy link
Contributor

Description: Adding 1 and 1 is two, but it should be...

From mysticatea/regexpp#23.

This is about /(?<foo>A)\k<foo>/ without u flag and without Annex B.

As a result, named capturing groups without u flag are always a syntax error.

I think IdentityEscape requires [~N] k or something like that in order to allow named capturing groups with no u flag.

eshost Output:

(I'm not sure about the environment without Annex B)

@bakkot
Copy link
Contributor

bakkot commented Jun 13, 2021

It's a syntax error on the first pass, I agree, but that means that it will re-parse with the N parameter set, and when parsing with the N parameter we have the production AtomEscape :: k GroupName, which matches \k<foo> as in this example. Since it parses successfully on the second pass, it's allowed.

Edit: oh, except that it has to parse successfully on the first pass in order for the second pass to happen, I see. I think that's a spec bug, yes. (I wish we didn't have two grammars. As far as I'm aware every major implementation uses the Annex B grammar; the non-Annex B one is basically irrelevant and therefore does not get maintained properly.)

@mysticatea
Copy link
Contributor Author

Yes, it skips the second pass if the first pass had a syntax error, and \k in the first pass is a syntax error.

Maybe, should both the N parameter and two passes parsing move to Annex B? Those look to be for Annex B behaviors.

@devsnek
Copy link
Member

devsnek commented Jun 14, 2021

I think this is correct as specified. The first pass will parse \k<foo> as the atoms k < f o o >, not a syntax error.

@mysticatea
Copy link
Contributor Author

@devsnek That's right in Annex B. But it's a syntax error in the spec core because the sequence \ k is not allowed.

@devsnek
Copy link
Member

devsnek commented Jun 14, 2021

hm... is that the "but not UnicodeIDContinue" in the spec? i must have a bug in engine262

@bakkot
Copy link
Contributor

bakkot commented Jun 14, 2021

is that the "but not UnicodeIDContinue" in the spec?

Yes. Keep in mind that's only there for the non-Annex B grammar, i.e. the one which is not used in actual implementations. (The Annex B variant does not have that restriction.)

@mysticatea
Copy link
Contributor Author

mysticatea@7579276

That may be a candidate to patch this issue.

  1. Remove two passes parsing from 22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u ). It parses the pattern with [+N] always. This should be no problem because there are no other differences than allowing named backreferences.
  2. Add an Annex B section to extend 22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u ) to introduce two passes parsing for backward compatibility about \k.

@mysticatea mysticatea changed the title Regular expression /(?<foo>A)\k<foo>/ (without u flag) is a syntax error /(?<foo>A)\k<foo>/ is a syntax error unless using Annex B Jun 14, 2021
@scole66
Copy link

scole66 commented Jun 24, 2021

This is also #1888, sort of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants