Retain comments in AST #308

turbolent · 2020-08-10T16:49:01Z

SupunS · 2021-03-09T05:22:39Z

I think we may have to capture the comments at a further granular level. Because a comment can exist at any place where whitespace can occur, not only at declarations/statements/expressions level, and we would have to retain those as well.
e.g. let /*Im a comment*/ x: Int

A solution would be to:

Capture doc comments only at the declaration level. (It is useful to have them separately)
Capture other comments (and any trivia including whitespace, newlines, etc, if needed) at the token level.

Roslyn (C#) compiler follows a similar approach: https://github.com/dotnet/roslyn/blob/main/docs/wiki/Roslyn-Overview.md#syntax-trivia

This way, it also divides the whitespace/comments between two consecutive non-trivia tokens as leading and trailing trivia (rather than having leading trivia only, or trailing trivia only), so that the comments are attached to the correct node/token. And moving that node around (code organizing/formatting), would also move the comment along with it.

e.g: See the bellow comments, which is the 'usual' way followed by most people.

// (1) The comment immediately before the decl is usually about 'x' decl
let x: Int  // (2) A comment on the same line also talks about 'x' decl (or about `Int`)
// (3) Similarly, this is a comment about 'y' decl
let y: Int  // (4) This is also a comment about 'y' decl

In the above, comments (1) and (2) would be a part of x-var-decl. Moving let x: 5 around would also move the comments (1) and (2) along with it.
Similarly, (3) and (4) would be a part of y-var-decl.

lkadian · 2021-03-10T02:56:08Z

hi @SupunS Great comment, agree with your proposed solution.
Have you started work on this issue? I ask this because I had actually worked on this but never got around to completing it. I was planning to put up a PR soon, so would just like to confirm if you are also working on it so that we can avoid duplicate work!

SupunS · 2021-03-10T03:49:28Z

@lkadian I haven't done or started any work on this. Great to hear that you have already been working on it! 👏
Would love to have a PR!

Looping in @MaxStalker.

lkadian · 2021-03-29T03:28:56Z

This issue has been a bit trickier than I thought as it touches a lot of parts of the parsing code, and as a result it affects a lot of the tests. Posting some info on how I am implementing this, so that if anything seems completely off, someone can let me know! Also to mention that it's still in progress

The original issue mentions retaining trivia for declarations, statement and expressions, but as @SupunS pointed out it looks like we may need to capture leading/trailing trivia for each token. Basically, we'll have a full syntax tree where we hold every token, and each token will know about its leading/trailing trivia.

Restructure lexer so that tokens and trivia are separate concepts
Fix broken lexer tests
Add lexertests

Currently we hold trivia as regular tokens, but we need these to be a separate entity. Then when scanning, we will emit each token accompanied with it's leading and trailing trivia:

type Token struct {
	Type  TokenType
	Value interface{}
	ast.Range
	LeadingTrivia  []Trivia
	TrailingTrivia []Trivia
}

Rules for leading/trailing trivia (taken from Swift):

A token owns all of its trailing trivia up to, but not including, the next newline character.
Looking backward in the text, a token owns all of the leading trivia up to and including the first contiguous sequence of newlines characters

Restructure parser to work with above changes and support all tokens in the AST (full syntax tree)
Fix broken parser tests
Add parser tests
Currently not all tokens are stored in the AST as they are not needed in later stages. But if we want this sort of "full syntax tree", we will need to keep all tokens for each syntax node.

SupunS · 2021-04-13T04:10:54Z

Thanks, @lkadian for taking this forward and for the great progress!
Sorry for not being in touch recently - I somehow miss updates from this thread, probably have drowned with other stuff.

Currently we hold trivia as regular tokens, but we need these to be a separate entity. Then when scanning, we will emit each token accompanied with it's leading and trailing trivia:
type Token struct {
	Type  TokenType
	Value interface{}
	ast.Range
	LeadingTrivia  []Trivia
	TrailingTrivia []Trivia
}
Rules for leading/trailing trivia (taken from Swift):

A token owns all of its trailing trivia up to, but not including, the next newline character.

Looking backward in the text, a token owns all of the leading trivia up to and including the first contiguous sequence of newlines characters

This sounds like a pretty good solution to me! 👏

Restructure parser to work with above changes and support all tokens in the AST (full syntax tree)

Fix broken parser tests

Add parser tests
Currently not all tokens are stored in the AST as they are not needed in later stages. But if we want this sort of "full syntax tree", we will need to keep all tokens for each syntax node.

That's a good point. I think we'll have to retain the full syntax tree, if we need to retain all the comments.

In the future, we could add an option to switch capturing the full syntax-tree on and off, depending on the use-case. For example, we need the full syntax tree and the trivia, when using in the IDE/playground during the development time, but don't really need them when parsing and running the code on-chain.

Maybe we can keep a flag for now, but have it 'on' by default, and later we can make it configurable. I'm not sure how big would that change (adding a flag) be. Another option is to add the flag altogether as a separate change, so that current changes don't get too big.

j1010001 · 2023-01-30T23:26:32Z

We can split this into 2 milestones:

Cover the majority of cases – we don't need to have perfect solution to cover every edge-case, good benchmark is to preserve all comments in contracts on Mainnet.
Cover the remaining edge-cases

turbolent added Good First Issue Good for newcomers Feature Feedback and removed Feedback labels Aug 10, 2020

turbolent mentioned this issue Aug 10, 2020

Implement pretty print for Cadence #209

Open

turbolent mentioned this issue Sep 13, 2022

Beautify Cadence onflow/vscode-cadence#72

Open

This was referenced Jul 31, 2024

Preserve comments in AST bartolomej/cadence#1

Closed

[WIP] Retain comments in AST #3509

Draft

Cadence formatter onflow/developer-grants#266

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retain comments in AST #308

Retain comments in AST #308

turbolent commented Aug 10, 2020 •

edited

Loading

SupunS commented Mar 9, 2021 •

edited

Loading

lkadian commented Mar 10, 2021

SupunS commented Mar 10, 2021

lkadian commented Mar 29, 2021 •

edited

Loading

SupunS commented Apr 13, 2021

j1010001 commented Jan 30, 2023 •

edited by turbolent

Loading

Retain comments in AST #308

Retain comments in AST #308

Comments

turbolent commented Aug 10, 2020 • edited Loading

Issue To Be Solved

Definition of Done

SupunS commented Mar 9, 2021 • edited Loading

lkadian commented Mar 10, 2021

SupunS commented Mar 10, 2021

lkadian commented Mar 29, 2021 • edited Loading

SupunS commented Apr 13, 2021

j1010001 commented Jan 30, 2023 • edited by turbolent Loading

turbolent commented Aug 10, 2020 •

edited

Loading

SupunS commented Mar 9, 2021 •

edited

Loading

lkadian commented Mar 29, 2021 •

edited

Loading

j1010001 commented Jan 30, 2023 •

edited by turbolent

Loading