Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use fewer negative lookahead assertions in regular expressions #4209

Conversation

rogpeppe
Copy link
Contributor

The latest version of the JSON Schema specification mentions the fact that there are many different regular expression variants and says:

given the high disparity in regular expression constructs support,
schema authors SHOULD limit themselves to the following regular expression tokens:

(See https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-01#name-regular-expressions)

One example of such a variant is Go's regular expression evaluator which, despite, and in fact because of, some limitations, such as no backreferences or lookahead, has some technical advantages such as a guaranteed non-exponential runtime (see
https://swtch.com/~rsc/regexp/regexp1.html for some background).

Our JSON Schema checker uses that implementation for matching regular expressions, which means that it cannot check the regular expressions in ci.json that contain negative lookahead assertions.

Given the above, this PR changes two schemas to avoid using negative lookahead assertions in their regular expressions.

The first, in package.json is straightforward: the replacement pattern is exactly equivalent to the original..

The second, in stylelintrc.json, is somewhat less so.

As pointed out in #4047, this schema has many problems. One of those is that in many places, stylelint accepts an array of two values, the first of which is the actual value and the second of which corresponds to the coreRule schema. However the stylelint schema squashes both of those into a single anyOf schema where it could instead use an array value for items, defining a tuple that expresses exactly the constraint needed.

I've done that for objectRule here, which avoids the need to use the negative-lookahead regexp in patternProperties, which wasn't actually expressing the right constraint in any case (any field containing the substring message or severity would just be ignored.

Hopefully this should point the way towards future fixes in the rest of the schema.

In fixing this, it became clear that there were a couple of other issues that needed to be fixed to make the tests pass:

I verified the above by actually checking a .stylelintrc.json file with stylelint itself.

As an aside, this schema does not actually represent the current version of stylelint: various fields have been renamed or removed (for example all the *-whitelist and *-blacklist fields have been renamed to *-allowed-list and *-disallowed-list.

Copy link
Contributor

Thanks for the PR!

This section of the codebase is owned by @madskristensen and @hyperupcall - if they write a comment saying "LGTM" then it will be merged.

@rogpeppe rogpeppe force-pushed the rog-004-less-negative-lookaheads branch 3 times, most recently from cf2befb to 356ce8b Compare November 12, 2024 14:27
@rogpeppe
Copy link
Contributor Author

For the record, I don't understand why the precommit hook is failing on this PR. I've run npm run prettier:fix and it did not change any files. The CI run says "files were modified by this hook" but does seem to include any details of what files were modified or what the diffs were, so it's hard to know what to fix.

@hyperupcall
Copy link
Member

hyperupcall commented Nov 15, 2024

@rogpeppe Thanks! Agreed, I think features such as backreferences, lookaheads and lookbehinds should generally be avoided in JSON Schemas, especially schemas that are hosted here.

The CI run says "files were modified by this hook" but does seem to include any details of what files were modified or what the diffs were, so it's hard to know what to fix.

Yeah I run into this exact issue every time I make a pull request. pre-commit.ci seems to fail whenever files in src/test/prettierrc/* are formatted (even if they are supposed to be formatted). I've been planning to stop using pre-commit.ci for this reason (#4194), but haven't gotten around to actually finishing it. Info is available in CONTRIBUTING[1] to fix this - I usually do something like (if I have already made a commit):

git reset --soft HEAD~
git checkout HEAD -- 'src/test/prettierrc/*'
git add -A
git commit --message 'blah'

[1] https://github.com/SchemaStore/schemastore/blob/master/CONTRIBUTING.md#pre-commit-fails-to-format-files-in-ci

The latest version of the JSON Schema specification mentions the fact
that there are many different regular expression variants and says:

> given the high disparity in regular expression constructs support,
> schema authors SHOULD limit themselves to the following regular expression tokens:

(See https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-01#name-regular-expressions)

One example of such a variant is Go's regular expression evaluator
which, despite, and in fact because of, some limitations, such as no
backreferences or lookahead, has some technical advantages such as a
guaranteed non-exponential runtime (see
https://swtch.com/~rsc/regexp/regexp1.html for some background).

Our JSON Schema checker uses that implementation for matching regular
expressions, which means that it cannot check the regular expressions in
`ci.json` that contain negative lookahead assertions.

Given the above, this PR changes two schemas to avoid using negative
lookahead assertions in their regular expressions.

The first, in `package.json` is straightforward: the replacement pattern
is exactly equivalent to the original..

The second, in `stylelintrc.json`, is somewhat less so.

As pointed out in SchemaStore#4047, this schema has many problems. One of those is
that in many places, `stylelint` accepts an array of two values, the
first of which is the actual value and the second of which corresponds
to the coreRule schema. However the `stylelint` schema squashes both of
those into a single `anyOf` schema where it could instead use an array
value for `items`, defining a tuple that expresses exactly the
constraint needed.

I've done that for `objectRule` here, which avoids the need to use the
negative-lookahead regexp in `patternProperties`, which wasn't actually
expressing the right constraint in any case (any field containing the
substring `message` or `severity` would just be ignored).

Hopefully this should point the way towards future fixes in the rest of
the schema.

In fixing this, it became clear that there were a couple of other issues
that needed to be fixed to make the tests pass:

- boolean values [are not always required to be true](https://stylelint.io/user-guide/configure/#reportdescriptionlessdisables)
- array values [_can_ be empty](https://stylelint.io/user-guide/rules/declaration-property-unit-allowed-list/)
- the values inside `objectRule` [_can_ contain strings and empty arrays](https://stylelint.io/user-guide/rules/declaration-property-unit-allowed-list/)

I verified the above by actually checking a `.stylelintrc.json`
file with `stylelint` itself.

As an aside, this schema does not actually represent the current version
of `stylelint`: various fields have been renamed or removed (for example
all the `*-whitelist` and `*-blacklist` fields have been renamed to
`*-allowed-list` and `*-disallowed-list`.
@rogpeppe rogpeppe force-pushed the rog-004-less-negative-lookaheads branch from 356ce8b to eb59ee8 Compare November 18, 2024 10:09
@rogpeppe
Copy link
Contributor Author

@rogpeppe Thanks! Agreed, I think features such as backreferences, lookaheads and lookbehinds should generally be avoided in JSON Schemas, especially schemas that are hosted here.

The CI run says "files were modified by this hook" but does seem to include any details of what files were modified or what the diffs were, so it's hard to know what to fix.

Yeah I run into this exact issue every time I make a pull request. pre-commit.ci seems to fail whenever files in src/test/prettierrc/* are formatted (even if they are supposed to be formatted). I've been planning to stop using pre-commit.ci for this reason (#4194), but haven't gotten around to actually finishing it. Info is available in CONTRIBUTING[1] to fix this - I usually do something like (if I have already made a commit):

Ah, thanks! I had missed (or forgotten) that detail.

git reset --soft HEAD~
git checkout HEAD -- 'src/test/prettierrc/*'
git add -A
git commit --message 'blah'

FWIW, the above can be slightly simpler, avoiding the need to add the commit message again:

git restore -s 'HEAD^' src/test/prettierrc
git commit -a --amend --no-edit

I have the latter command aliased as "git addcommit" because I do it so much...

@hyperupcall
Copy link
Member

Thanks!

I have the latter command aliased as "git addcommit" because I do it so much...

I have an aAcamnepf alias for git add -A && git commit --amend --no-edit && git push --force-with-lease alias 🙈

@hyperupcall hyperupcall merged commit a3795b4 into SchemaStore:master Nov 19, 2024
3 checks passed
benpops89 pushed a commit to benpops89/schemastore that referenced this pull request Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants