Parse comments #106

pablohirafuji · 2020-07-01T20:49:43Z

Add comment parsing and storing. This was the best strategy I could think of, but if you have any ideas (it would be nice to keep the (|.) and (|=) infixes), let me know.

Janiczek · 2020-07-01T21:11:43Z

Hi @pablohirafuji, wow, thanks!

I'll review this, hopefully this week.

I'd also like to validate this approach with folks:

Aaron V. (how does elm-format solve this?)
Evan (is there a viable approach that doesn't fork elm/parser? Should elm/parser be extended?)
and perhaps look at other languages and papers, functional pearls etc.

Perhaps there are more insights to be gained.

Thanks so much for the work you've done here! ❤️

Janiczek · 2020-07-02T00:34:03Z

src/Stage/Parse/AdvancedWithState.elm

+        |> keep myParser
+        |> ignore spaces


I believe I've seen keep / skip naming somewhere, which mighit be nicer because the two functions have names of the same length? WDYT?

Ah, got it. The same length might actually be a disadvantage, not an advantage. Cool to see it on specific code snippets.

Janiczek · 2020-07-02T16:27:57Z

One of the feedback I've received is that ParserWithState state a is roughly equivalent to Parser (a, state). We could use the elm/parser as-is and just track the state explicitly. WDYT?

Re other approaches: I'm still researching.

elm-format

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Take look at eg. declarations: https://github.com/avh4/elm-format/blob/master/parser/src/AST/Declaration.hs#L14-L31
which corresponds roughly to our https://github.com/elm-in-elm/compiler/blob/master/src/Elm/Data/Declaration.elm#L45-L67.

Or Exprs: elm-format https://github.com/avh4/elm-format/blob/master/parser/src/AST/Expression.hs#L33 where many of the constructors contain the Comments type / Commented wrapper. (Compared to our https://github.com/elm-in-elm/compiler/blob/master/src/Elm/AST/Frontend.elm#L53)

I'm not entirely sure this is the approach to pursue, although it would remove a need we'd probably have with the current approach of this PR (Module.comments : List Comment), which is that when putting somehow transformed Frontend.Exprs back into String, we'd have to somehow thread the comments back in during the emit. That sounds like going through a list N times (unless we'd have some clever data structure) and searching whether one of the comments relates to the range you're currently emitting. The elm-format approach starts to sound good to me in comparison, although I'm worried how the types would look afterwards.

Anyways, as said, I'm still researching and asking around.

Janiczek · 2020-07-02T18:32:56Z

Another option might possibly be having | Comment Expr as another Expr variant? Unsure.

Also I found this article: https://github.com/terrajobst/minsk/blob/master/docs/episode-24.md

pablohirafuji · 2020-07-02T21:34:19Z

One of the feedback I've received is that ParserWithState state a is roughly equivalent to Parser (a, state). We could use the elm/parser as-is and just track the state explicitly. WDYT?

I tried this approach first. It produces code like this:

P.succeed
    (\moduleType_ comments1 moduleName_ comments2 comments3 exposing_ ->
        { content = ( moduleType_, moduleName_, exposing_ )
        , comments = comments1 ++ comments2 ++ comments3
        }
    )
    |= moduleType
    |= spacesCommentsAndGreaterIndent
    |= moduleName
    |= spacesCommentsAndGreaterIndent
    |. (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
    |= spacesCommentsAndGreaterIndent
    |= exposingList

In contrast, the WithState forked parser module let you do this:

P.succeed
    (\moduleType_ moduleName_ exposing_ ->
        ( moduleType_, moduleName_, exposing_ )
    )
    |> P.keep moduleType
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.keep moduleName
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.ignore (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.keep exposingList

My thoughts were:

The (|.) infix will rarely be used;
Coding the intented parser already requires intensive focus without having to handle the comments;
Appending the comments in every parser/sub-parser is error prone;

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Interesting. Only the Frontend AST would have to look like this? What happens to the comments on the other stages?

Another option might possibly be having "| Comment Expr" as another Expr variant? Unsure.

I don't think this solves or facilitate anything. What happens with the comments between the if...then expression? And I'm not sure how the incredible useful Pratt parser would behave.

Janiczek · 2020-07-02T21:57:44Z

I tried this approach first. It produces code like this:
In contrast, the WithState forked parser module let you do this:

I'll have to think about this a bit. I've been doing something similar before in the InferTypes phase, basically creating abstractions that would thread some state around for me, but in the end it didn't work out very well and we ditched it. I might have a tendency towards the explicit state threading as a result 😅

Weighing the pros and cons, also the goals of "being a learning resource" etc, I'm leaning towards not having our own parser library forks at the moment.

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Interesting. Only the Frontend AST would have to look like this? What happens to the comments on the other stages?

Yes, IMHO canonical + typed AST can throw all that away and be "pure". If we're after the "desugar" / lowering stage, we're free to tweak the AST as we see fit.

Another option might possibly be having "| Comment Expr" as another Expr variant? Unsure.

I don't think this solves or facilitate anything. What happens with the comments between the if...then expression? And I'm not sure how the incredible useful Pratt parser would behave.

Yes it's unlikely that anything would come out of this. ❌

Re "comments on the side" vs "comments in every type",perhaps we should check how would working with each look like (after the parsing). Some real-ish tasks one might want to do with the AST and the comments, eg. for elm-format-like tool (say, moving comments around). I have to say I don't like how the types would look after "comments in every type" were implemented, but it seems much easier to work with than module.comments : List Comment.

pablohirafuji · 2020-07-02T22:15:16Z

I have to say I don't like how the types would look after "comments in every type" were implemented, but it seems much easier to work with than module.comments : List Comment.

I agree. I love elm-format, it is very reliable. I thought no one would do such approach (didn't even consider it so much), but makes sense. Can I implement it?

Janiczek · 2020-07-02T22:17:58Z

@pablohirafuji Could you try? 🙏
I feel bad asking you to basically duplicate the effort, but I believe it's an important decision so it deserves two spikes in different directions :)

pablohirafuji · 2020-07-02T22:19:42Z

No worries, I am striving for the best code the package can have . I'm idealist, not pragmatist. That's why I love Elm so much 😄

Janiczek · 2020-07-02T22:29:57Z

We'll likely need to remember all whitespace "inside" expressions

One thing that I've learned from today's discussions that I probably should disseminate here (apparently it was a lesson learned the hard way) is that expressions shouldn't eagerly eat whitespace that "doesn't belong to them".

      fn {- blabla -} x    y
(YES) ^^ span of `Var fn`
(YES)                 ^ span of `Var x`
(YES)                      ^ span of `Var y`
      ^^^^^^^^^^^^^^^^^^^^^^ span of `FunctionCall`

      fn {- blabla -} x    y
(NOT) ^^^^^^^^^^^^^^^^ span of `Var fn`
(NOT)                 ^^^^^ span of `Var x`
(NOT)                  ^^^^^ span of `Var y`

So... perhaps that will come in handy :)

pablohirafuji · 2020-08-11T15:49:09Z

In essence, the parser is ready. The PR is missing some adjustments, like:

Store some parsed but not stored comments;
Some locations are not right;
Store empty list and record comments [ {-comment-} ];
Tests;
Documentations;

pablohirafuji · 2020-08-11T15:58:45Z

I'm storing the comments before any declaration with the declaration itself (commentsBefore), instead of creating another Declaration type just for the comments. It's easy to change it if you prefer to store the comments as declarations, but note that the declarations are being stored as a Dict String Declaration, so the comments outside any declaration will be throw in a comments : List Comment, like it was before. We can store the declarations in a List DeclarationOrComment, but I don't know the implications of this change.

Janiczek · 2020-08-11T16:19:09Z

Woot woot! I'll take a look at this relatively-soonish 🙂 Thanks @pablohirafuji !

Merge master

6d1fba3

Janiczek reviewed Jul 2, 2020

View reviewed changes

Change comment storing approach

8c72c2f

pablohirafuji added 2 commits August 11, 2020 18:50

Remove unused helpers

9aee903

Fix UserDefinedType parser

d04c032

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse comments #106

Parse comments #106

pablohirafuji commented Jul 1, 2020 •

edited

Loading

Janiczek commented Jul 1, 2020

Janiczek Jul 2, 2020

pablohirafuji Jul 2, 2020

Janiczek Jul 2, 2020

Janiczek commented Jul 2, 2020

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 •

edited

Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 •

edited

Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 •

edited

Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Aug 11, 2020

pablohirafuji commented Aug 11, 2020 •

edited

Loading

Janiczek commented Aug 11, 2020

Parse comments #106

Are you sure you want to change the base?

Parse comments #106

Conversation

pablohirafuji commented Jul 1, 2020 • edited Loading

Janiczek commented Jul 1, 2020

Janiczek Jul 2, 2020

Choose a reason for hiding this comment

pablohirafuji Jul 2, 2020

Choose a reason for hiding this comment

Janiczek Jul 2, 2020

Choose a reason for hiding this comment

Janiczek commented Jul 2, 2020

elm-format

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 • edited Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 • edited Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Jul 2, 2020 • edited Loading

Janiczek commented Jul 2, 2020

pablohirafuji commented Aug 11, 2020

pablohirafuji commented Aug 11, 2020 • edited Loading

Janiczek commented Aug 11, 2020

pablohirafuji commented Jul 1, 2020 •

edited

Loading

pablohirafuji commented Jul 2, 2020 •

edited

Loading

pablohirafuji commented Jul 2, 2020 •

edited

Loading

pablohirafuji commented Jul 2, 2020 •

edited

Loading

pablohirafuji commented Aug 11, 2020 •

edited

Loading