-
-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree-sitter rolling fixes (January edition) #859
Tree-sitter rolling fixes (January edition) #859
Conversation
Angle brackets in type parameter lists (e.g., `Array<string>`) should not be scoped like comparison operators.
* Highlight namespaced JSX tags (e.g., `<React.Fragment>`) like other JSX tags * Don't scope `/` punctuation in JSX tags like a division operator * Scope ordinary functions as `support.other.function` just as in JavaScript * Scope the last segment of `namespace` declarations as an `entity`, other segments as `support.class`
…and move some files around for consistency.
This is our first venture into _configurable_ indentation. Exciting times.
…like * Constructors (the "Foo" in `new Foo()`) * Certain function-like builtins like `list`, `unset`, and `print`
…and include some highlighting fixes from pulsar-edit#869.
677fbb1
to
f171f33
Compare
Choosing the other side of some dilemmas I had to resolve six months ago. Highlighting between modern and legacy Tree-sitters now appears to be identical to me (with rare exception) on the One Dark theme.
…and add a new style to `one-dark-syntax` and `one-light-syntax` to ensure continuity of appearance now that object properties are not all treated as `variable`s. Also, scope private field method definitions in both JS and TS.
.github/workflows/build.yml
Outdated
- name: Upload Video Artifacts | ||
# Run whether this job passed or failed, unless explicitly cancelled. | ||
if: '!cancelled()' | ||
uses: actions/upload-artifact@v3 | ||
with: | ||
name: ${{ matrix.os }} Videos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can support this. 👍
Optional-to-read commentary:
Seems kind of arbitrary to do this in this PR specifically, although I can imagine how it might have come up.
Nevertheless I am in favor of this, we basically want to upload these if they've been made. And if they haven't been made, the 'upload video artifacts' job just warns us about it. So, this is effectively a best-effort to upload the test vids whenever possible, which I can get behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I needed to see the integration test failures to know what on earth was going on, and this is just the PR where it happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There's a hard-to-grok setting for language injections that allows a deeper layer to monopolize the scope application for a range. In most cases, an injection is placed into a node that the parser doesn't know much about (like a `script` block in HTML); but Rust and C parsers needed a way to inject themselves _into themselves_ so that they could add syntax highlighting to macros. Because they were applying highlighting to a range that the base layer _already_ had plans to highlight, they needed a way to block the shallower layer from acting. This mode has never worked briliantly, but it's been made smarter in several ways since the invention of modern Tree-sitter. And here's another one: if the highlight iterator is at a position where an injection range is _about_ to begin, it shouldn't be able to stop any other layer from _closing_ a scope; and if the highlight iterator is at a position where an injection range has just _finished_, it shouldn't be able to stop any other layer from _opening_ a scope. Because of this, we can now fix a bug that I think might've been present for a while in the application of scopes to rust macros like `println!` — the position after the exclamation point is one of those injection-layer boundaries, to the effect that a scope name was opened that would persist until at least the end of the screen line.
A leading space was being accounted for when doing some math to compute indent level… but not on both sides of the equation.
…when assigning, reassigning, or incrementing. Also add `++` and `--` as operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm making progress on this review.
Yeah, I'm blindly trusting a great portion of the actual grammar changes, but I am testing the bins, too and they're working for me so far. And I am making an effort to get up to speed over time so I can be more literate as to the particulars of these changes.
In the mean time, a few more files/packages down, several more to go! In the coming days I hope/no doubt (?). Soon ™️. Maybe. Probably. I'm trying, at least.
Mostly just commentary so I can recall what I looked at and what impressions I had of them at the time when picking the review back up or for posterity. A few questions here and there. Nothing crucial. (Also hopefully shows I'm paying attention and not just 100% rubber-stamping.)
Thank you for these efforts!! Major big props to you on this!
const TODO_PATTERN = /\b(TODO|FIXME|CHANGED|XXX|IDEA|HACK|NOTE|REVIEW|NB|BUG|QUESTION|COMBAK|TEMP|DEBUG|OPTIMIZE|WARNING)\b/; | ||
const HYPERLINK_PATTERN = /\bhttps?:/ | ||
|
||
atom.grammars.addInjectionPoint(`source.${language}`, { | ||
type: 'comment', | ||
language: (node) => { | ||
return TODO_PATTERN.test(node.text) ? 'todo' : undefined; | ||
}, | ||
content: (node) => node, | ||
languageScope: null | ||
exports.consumeHyperlinkInjection = (hyperlink) => { | ||
for (const language of ['c', 'cpp']) { | ||
hyperlink.addInjectionPoint(`source.${language}`, { | ||
types: ['comment', 'string_literal'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the big TODO_PATTERN
as a service ("TAAS") change Ive been hearing about?
Sign me up. /j
If I totally understood, I'd say more. But uh, I'll say code deduplication is nice! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of a hack.
We could inject the TODO
grammar into all comments just in case there's a TODO there, and there didn't seem to be a performance problem with that approach, but in a large file that's hundreds of unneeded injections into line comments. Didn't feel right, even if it was fast.
We could have a single TODO injection layer that operated on the top-level node and just ignored any ranges that weren't comments. That felt less wasteful, but the performance sucked.
The middle ground is to inject into individual nodes, but only after doing a coarse regex check on the content to see if it's worth the trouble. This is as fast as option 1 and as polite and discerning as option 2. We use a similar strategy for the hyperlink injection.
@@ -8,6 +8,7 @@ fileTypes: [ | |||
] | |||
|
|||
treeSitter: | |||
parserSource: 'github:tree-sitter/tree-sitter-css#98c7b3dceb24f1ee17f1322f3947e55638251c37' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 again for this kind of documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to include these — @confused-Techie wrote the CI check that enforces it! :-)
; WORKAROUND: | ||
|
||
; NOTE: `tree-sitter-css` recovers poorly from invalidity inside a block when | ||
; you're adding a new property-value pair above others in a list. When the user | ||
; is typing and the file is temporarily invalid, it will make incorrect guesses | ||
; about tokens that occur between the cursor and the end of the block. | ||
; | ||
; The fix here is for `tree-sitter-css` to get better at recovering from its | ||
; parsing error, but parser authors don't currently have much control over | ||
; that. In the meantime, this query is a decent mitigation: it colors the | ||
; affected tokens like plain text instead of assuming (nearly always | ||
; incorrectly) them to be tag names. | ||
; | ||
; When you're typing a new property name inside of a list, tree-sitter-css will | ||
; assume the thing you're typing is a descendant selector tag name until you | ||
; get to the colon. This prevents it from highlighting the incomplete line like | ||
; a selector tag name. | ||
; Ideally, this is temporary, and we can remove it soon. Until then, it makes | ||
; syntax highlighting less obnoxious. | ||
|
||
(descendant_selector | ||
(tag_name) @_IGNORE_ | ||
(#set! capture.final true)) | ||
((tag_name) @_IGNORE_ | ||
(#is? test.descendantOfType "ERROR") | ||
(#set! capture.final)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood more, I'd say this sounds like a good thing.
For now, I'll just indicate I've seen this and made some attempt to understand it. 👀
Doing my best out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my syntax theme, when I'm typing on line 2 of this example…
div {
displ
color: blue;
padding: 10px;
margin: 10px;
}
…color
, padding
and margin
turn red when they should be brown. It's distracting. padding
and margin
go back to normal after I type a colon, but color
doesn't turn brown again until I type the semicolon.
This minimizes how obviously tree-sitter-css
is screwing up the parsing by making these tokens plain until the tree isn't in an error state anymore.
* Style `*` (universal selector) * Add support for namespaced tags and attributes (now that `tree-sitter-css` supports them)
…by namespacing them as `@_IGNORE_.foo`, `@_IGNORE_.bar`, etc. It's sometimes necessary to define a capture in a query not because you want to apply a scope name, but because you need to use it as the argument to a predicate. `@_IGNORE_` was intended for that purpose, but it was the _only_ capture name with that special effect. Now, you can specify any number of captures that don't apply scope names. As long as it equals `@_IGNORE_` or _starts with_ `@_IGNORE_.`, it'll have the same effect. This lets you target two or more nodes and use them all in predicates in the same query without any of them applying a scope name.
…when a project-specific config is present. Most people don't use a project-specific config, which is why this bug has been present for ages. Read the new spec for an explanation.
We are crossing the Rubicon. https://github.com/orgs/pulsar-edit/discussions/249 is a discussion of specific ways that indentation is different in JavaScript because of the new grammar. We intentionally didn’t try to preserve the indentation decisions in the old TextMate grammar, but this is (predictably) disruptive to some people. We don’t want to punt like legacy Tree-sitter did and rely on the TextMate grammar’s coarse regexes to decide what gets indented, but we can at least make the new indentation system granularly configurable. This commit makes practically every indentation decision configurable in JavaScript and TypeScript. It groups these settings under a new “Indentation” section in each package’s settings. I might decide that this goes too far and roll back the granular settings for `{`, `[`, and `(`, but the other indentation hints are absolutely matters of taste, so it’s only proper that users can choose to disable them. In the long run, we could _consider_ allowing users to swap in another indentation engine altogther, much like pulsar-edit#895 proposes for folds. That would let us legitimately offer users a way to opt into “the way the TextMate grammar does it” without affecting other users. Another stopgap option would be to add a way for a user to disable indentation hinting altogether for a given grammar.
…no matter how deeply they’re nested. Also highlight `?` in an incomplete ternary. (Also fix a lot of final segments in the TypeScript highlights file; I’ve been lazily copying-and-pasting and forgetting to change them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a long one.
I left some feedback that could be acted upon, but much of it relies on your judgment ultimately. (EDIT: And a lot of not-that-actionable fluff or notes-for-posterity, it seems like.)
With that being the case, I leave an "Approve" such that the feedback is sort of optional to address. And I will be watching for any updates, so that I can hopefully be rather timely with any round two's on this, to encourage what I hope will be an accurate sense that any action taken won't result in inordinate delays.
(i.e. if action is warranted, please do go ahead and take it, and/or ask clarifying questions to my feedback. I'll do my best to respond quickly.)
What a whopper. But yeah, approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HAAS HQ, located. /j
Seems fine, after a quick skim. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that a bunch of files were renamed in this package, but that the highlighting still works in testing, so it must have been done properly. Also, there's CI and all that. So, 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of making some of this configurable if it works better with certain indentation styles people might prefer out there (whitespace stuff). 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QUESTION:
Do these settings require the user to be more opinionated than before, as opposed to seamlessly supporting various indentation styles? Or is it "not that simple"?
I take it it's "not that simple" per the Discussion that was linked to, but thought I'd ask (perhaps serving as a recap) just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we pick good defaults here. If too many people find that the indentation is getting in their way, then more of these settings need to be disabled by default.
There are a couple rules here that I think just make gigantic sense, and don't even need a setting. Probably the most inarguable of them is that the closing }
of a statement block should always match the indentation of the line on which the statement block started.
if (foo) {
bar();
}
Because the braces are on lines 1 and 3, the indentation of line 3 should match that of line 1.
if (foo)
{
bar;
}
Because the block begins on lines 2 and 4, those lines should match in indentation.
If your coding style is so ornery as to violate this rule, then all IDEs probably drive you nuts, not just Pulsar.
Of course, those people could still be accommodated by a feature where they could choose between multiple indentation providers, one of which is “none,” and that's on my to-do list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, to be clear, I don't think that most of the defaults are new behavior compared to TextMate grammars, but it will vary by the grammar. The JavaScript TextMate grammar already indented {
, [
, (
the way we are now, and the legacy Tree-sitter grammar used that same system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No action needed, just remarking: PHP seems like a strange language, per the comments in here.
if (name === '_IGNORE_') { | ||
if (name === '_IGNORE_' || name.startsWith('_IGNORE_.')) { | ||
// "@_IGNORE_" is a magical variable in an SCM file that will not be | ||
// applied in the grammar, but which allows us to prevent other kinds of | ||
// scopes from matching. We purposefully allowed this syntax node to set | ||
// data for a given range, but not to apply its scope ID to any | ||
// boundaries. | ||
// | ||
// A query can also use multiple different @_IGNORE_-style variables by | ||
// adding segments after the @_IGNORE_, such as @_IGNORE_.foo.bar. | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, but it seems considered or deliberate based on there being comments explaining it. I barely know what this is about, but the intention seems good (??).
As I frequently do, I wish I could be more insightful reviewing this stuff. But thanks for the efforts. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes you need to target things in Tree-sitter queries without giving them side effects through naming.
If you give something a capture name in a highlights.scm
, you are automatically adding a scope name. But you might just capture it because you need to run a predicate against it: “scope X as string
when its grandparent node Y passes a certain test.” You don't want to add a scope name to Y, but you need to give it a name so you can designate which node is being tested.
@_IGNORE_
is a special node name in all query files that has no side-effects. But for even more complex queries, you might need more than one ignored node, so now @_IGNORE_
can also be a namespace: @_IGNORE_.foo
, @_IGNORE_.bar
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
large diffs are not rendered by default
Oh no.
UPDATE: Well, I've read (or skimmed for the rather dense / alien-to-me parts) this entire diff. I don't totally understand, but I consider reading it a sort of immersion learning. Nothing jumped out as alarming. I barely know what I'm looking at, but *gestures like Buzz Lightyear* "Ranges. Ranges, everywhere."
I commend and salute your efforts in maintaining an upgrading these systems.
if (this.currentScopeIsCovered) { | ||
// console.log( | ||
// iterator.name, | ||
// iterator.depth, | ||
// 'would close', | ||
// iterator._inspectScopes( | ||
// iterator.getCloseScopeIds() | ||
// ), | ||
// 'at', | ||
// iterator.getPosition().toString(), | ||
// 'but scope is covered!' | ||
// ); | ||
} else { | ||
// console.log( | ||
// iterator.name, | ||
// iterator.depth, | ||
// 'CLOSING', | ||
// iterator.getPosition().toString(), | ||
// iterator._inspectScopes( | ||
// iterator.getCloseScopeIds() | ||
// ) | ||
// ); | ||
} | ||
// if (this.currentIteratorIsCovered === true || this.currentIteratorIsCovered === 'close') { | ||
// console.log( | ||
// iterator.name, | ||
// iterator.depth, | ||
// 'would close', | ||
// iterator._inspectScopes( | ||
// iterator.getCloseScopeIds() | ||
// ), | ||
// 'at', | ||
// iterator.getPosition().toString(), | ||
// 'but scope is covered!' | ||
// ); | ||
// } else { | ||
// console.log( | ||
// iterator.name, | ||
// iterator.depth, | ||
// 'CLOSING', | ||
// iterator.getPosition().toString(), | ||
// iterator._inspectScopes( | ||
// iterator.getCloseScopeIds() | ||
// ) | ||
// ); | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can/should this large, commented-out section of this file be deleted, now that there's a separate if
below seemingly for the same check?
if (this.currentIteratorIsCovered === true || this.currentIteratorIsCovered === 'close') {
There's one more similar-looking chunk like this later in this file as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there's a low-level bug in highlighting, these blocks are the first ones that I uncomment. If we were writing C, these would probably be macros or something. I'd take them out entirely if I didn't use them so often.
// ) | ||
// ); | ||
} | ||
// if (this.currentIteratorIsCovered === true || this.currentIteratorIsCovered === 'open') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto the above comment.
Can/should this commented out chunk just be deleted? Do we need this still, for reference purposes, or...
); | ||
}); | ||
let covers = false; | ||
for (let it of rest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OPTIONAL CODE NIT:
Honestly not important, but I noticed it:
Theoretically things like this can be const it of rest
instead of let it of rest
.
Since it's assigning a variable within the block scope of the loop ("loop scope"?) that only survives for one iteration of the loop and is re-created next iteration, and which we don't anticipate changing inside the loop scope (phrasing?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a militant on Team let
. It is shorter to type, so it is what I type.
The guy who proposed const
in JavaScript said that he never would have envisioned that people would react to it by thinking, “great, now we can declare things as const
by default.” It adds a tiny bit of friction to every single variable declaration you add for the rest of your life, and the upside is that you theoretically avoid falling victim to one of the most obscure kinds of bugs there is.
If other people write const
, I won't rewrite it out of spite. Hell, if there's a linting rule that rewrites let
to const
, I'm fine with it! Just don't make me type const
. As with most matters of style, we should let robots do the drudgery.
They don’t gain us anything, and in fact actually cause this grammar to be ranked _below_ the TextMate PHP grammar when they fail.
I offer to personally fix the merge conflict in |
I'll take you up on that, assuming it lets you edit this PR. I never turn down anyone's help with merge conflicts. |
NOTE: This pull request will have a large number of syntax highlighting fixes for issues that have been reported since the release of 1.113. These aggregated Tree-sitter PRs typically get merged shortly before the next release — in this case, a few days before Pulsar 1.114 is released on February 15 — but the large volume of changes in this PR may mean that we merge this one earlier so that we can get the fixes to you sooner in the form of a rolling release.
If you're curious whether a specific problem has been fixed, please read the commit descriptions. You can also check the other issues to see if there's a Mega-Issue™ for your particular language — if so, then any submitted fixes for that language will be detailed in the issue comments. Once I've commented that a particular issue has been fixed, CI will usually generate usable binaries for Windows, macOS, and Linux (on Intel or ARM32 platforms) within an hour.
If you have reported any of these issues yourself, we are very grateful.
If you like, you can download a CI build of Pulsar for your platform (unless you’re on an M1/M2 or ARM64 platform) to verify these fixes and tide yourself over until they land in 1.114:
Getting this one in early.
@claytonrcarter asked about all the duplication of
TODO_PATTERN
andHYPERLINK_PATTERN
in a bunch of languages (for the purpose of TODO/hyperlink injections) and I rubberducked myself into an answer. I hadn't even tried to convert those usages into services, but it was dead simple. Still, I didn't want to force it into 1.113 at the last minute, and it can certainly wait a month.Also, it's finally no longer mandatory for a grammar to have a
highlightsQuery
.