Skip to content
This repository was archived by the owner on Oct 5, 2024. It is now read-only.

Parse comments #106

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Parse comments #106

wants to merge 4 commits into from

Conversation

pablohirafuji
Copy link
Contributor

@pablohirafuji pablohirafuji commented Jul 1, 2020

Add comment parsing and storing. This was the best strategy I could think of, but if you have any ideas (it would be nice to keep the (|.) and (|=) infixes), let me know.

@Janiczek
Copy link
Contributor

Janiczek commented Jul 1, 2020

Hi @pablohirafuji, wow, thanks!

I'll review this, hopefully this week.

I'd also like to validate this approach with folks:

  • Aaron V. (how does elm-format solve this?)
  • Evan (is there a viable approach that doesn't fork elm/parser? Should elm/parser be extended?)
    and perhaps look at other languages and papers, functional pearls etc.

Perhaps there are more insights to be gained.

Thanks so much for the work you've done here! ❤️

Comment on lines 58 to 59
|> keep myParser
|> ignore spaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I've seen keep / skip naming somewhere, which mighit be nicer because the two functions have names of the same length? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is nicer, but I think it is harder to distinguish each one, eg.:

|> P.keep moduleType
|> P.skip spacesCommentsAndGreaterIndent
|> P.keep moduleName
|> P.skip spacesCommentsAndGreaterIndent
|> P.skip (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
|> P.skip spacesCommentsAndGreaterIndent
|> P.keep exposingList

With the ignore:

|> P.keep moduleType
|> P.ignore spacesCommentsAndGreaterIndent
|> P.keep moduleName
|> P.ignore spacesCommentsAndGreaterIndent
|> P.ignore (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
|> P.ignore spacesCommentsAndGreaterIndent
|> P.keep exposingList

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. The same length might actually be a disadvantage, not an advantage. Cool to see it on specific code snippets.

@Janiczek
Copy link
Contributor

Janiczek commented Jul 2, 2020

One of the feedback I've received is that ParserWithState state a is roughly equivalent to Parser (a, state). We could use the elm/parser as-is and just track the state explicitly. WDYT?


Re other approaches: I'm still researching.

elm-format

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Take look at eg. declarations: https://github.com/avh4/elm-format/blob/master/parser/src/AST/Declaration.hs#L14-L31
which corresponds roughly to our https://github.com/elm-in-elm/compiler/blob/master/src/Elm/Data/Declaration.elm#L45-L67.

Or Exprs: elm-format https://github.com/avh4/elm-format/blob/master/parser/src/AST/Expression.hs#L33 where many of the constructors contain the Comments type / Commented wrapper. (Compared to our https://github.com/elm-in-elm/compiler/blob/master/src/Elm/AST/Frontend.elm#L53)

I'm not entirely sure this is the approach to pursue, although it would remove a need we'd probably have with the current approach of this PR (Module.comments : List Comment), which is that when putting somehow transformed Frontend.Exprs back into String, we'd have to somehow thread the comments back in during the emit. That sounds like going through a list N times (unless we'd have some clever data structure) and searching whether one of the comments relates to the range you're currently emitting. The elm-format approach starts to sound good to me in comparison, although I'm worried how the types would look afterwards.

Anyways, as said, I'm still researching and asking around.

@Janiczek
Copy link
Contributor

Janiczek commented Jul 2, 2020

Another option might possibly be having | Comment Expr as another Expr variant? Unsure.

Also I found this article: https://github.com/terrajobst/minsk/blob/master/docs/episode-24.md

@pablohirafuji
Copy link
Contributor Author

pablohirafuji commented Jul 2, 2020

One of the feedback I've received is that ParserWithState state a is roughly equivalent to Parser (a, state). We could use the elm/parser as-is and just track the state explicitly. WDYT?

I tried this approach first. It produces code like this:

P.succeed
    (\moduleType_ comments1 moduleName_ comments2 comments3 exposing_ ->
        { content = ( moduleType_, moduleName_, exposing_ )
        , comments = comments1 ++ comments2 ++ comments3
        }
    )
    |= moduleType
    |= spacesCommentsAndGreaterIndent
    |= moduleName
    |= spacesCommentsAndGreaterIndent
    |. (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
    |= spacesCommentsAndGreaterIndent
    |= exposingList

In contrast, the WithState forked parser module let you do this:

P.succeed
    (\moduleType_ moduleName_ exposing_ ->
        ( moduleType_, moduleName_, exposing_ )
    )
    |> P.keep moduleType
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.keep moduleName
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.ignore (P.keyword (P.Token "exposing" ExpectingExposingKeyword))
    |> P.ignore spacesCommentsAndGreaterIndent
    |> P.keep exposingList

My thoughts were:

  1. The (|.) infix will rarely be used;
  2. Coding the intented parser already requires intensive focus without having to handle the comments;
  3. Appending the comments in every parser/sub-parser is error prone;

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Interesting. Only the Frontend AST would have to look like this? What happens to the comments on the other stages?


Another option might possibly be having "| Comment Expr" as another Expr variant? Unsure.

I don't think this solves or facilitate anything. What happens with the comments between the if...then expression? And I'm not sure how the incredible useful Pratt parser would behave.

@Janiczek
Copy link
Contributor

Janiczek commented Jul 2, 2020

I tried this approach first. It produces code like this:
In contrast, the WithState forked parser module let you do this:

I'll have to think about this a bit. I've been doing something similar before in the InferTypes phase, basically creating abstractions that would thread some state around for me, but in the end it didn't work out very well and we ditched it. I might have a tendency towards the explicit state threading as a result 😅

Weighing the pros and cons, also the goals of "being a learning resource" etc, I'm leaning towards not having our own parser library forks at the moment.

Judging from the source code, the way elm-format works is by remembering all parsed whitespace (including comments) and holding that inside all the types like Expr, Declaration, etc.

Interesting. Only the Frontend AST would have to look like this? What happens to the comments on the other stages?

Yes, IMHO canonical + typed AST can throw all that away and be "pure". If we're after the "desugar" / lowering stage, we're free to tweak the AST as we see fit.

Another option might possibly be having "| Comment Expr" as another Expr variant? Unsure.

I don't think this solves or facilitate anything. What happens with the comments between the if...then expression? And I'm not sure how the incredible useful Pratt parser would behave.

Yes it's unlikely that anything would come out of this. ❌


Re "comments on the side" vs "comments in every type",perhaps we should check how would working with each look like (after the parsing). Some real-ish tasks one might want to do with the AST and the comments, eg. for elm-format-like tool (say, moving comments around). I have to say I don't like how the types would look after "comments in every type" were implemented, but it seems much easier to work with than module.comments : List Comment.

@pablohirafuji
Copy link
Contributor Author

pablohirafuji commented Jul 2, 2020

I have to say I don't like how the types would look after "comments in every type" were implemented, but it seems much easier to work with than module.comments : List Comment.

I agree. I love elm-format, it is very reliable. I thought no one would do such approach (didn't even consider it so much), but makes sense. Can I implement it?

@Janiczek
Copy link
Contributor

Janiczek commented Jul 2, 2020

@pablohirafuji Could you try? 🙏
I feel bad asking you to basically duplicate the effort, but I believe it's an important decision so it deserves two spikes in different directions :)

@pablohirafuji
Copy link
Contributor Author

pablohirafuji commented Jul 2, 2020

No worries, I am striving for the best code the package can have . I'm idealist, not pragmatist. That's why I love Elm so much 😄

@Janiczek
Copy link
Contributor

Janiczek commented Jul 2, 2020

We'll likely need to remember all whitespace "inside" expressions

One thing that I've learned from today's discussions that I probably should disseminate here (apparently it was a lesson learned the hard way) is that expressions shouldn't eagerly eat whitespace that "doesn't belong to them".

      fn {- blabla -} x    y
(YES) ^^ span of `Var fn`
(YES)                 ^ span of `Var x`
(YES)                      ^ span of `Var y`
      ^^^^^^^^^^^^^^^^^^^^^^ span of `FunctionCall`

      fn {- blabla -} x    y
(NOT) ^^^^^^^^^^^^^^^^ span of `Var fn`
(NOT)                 ^^^^^ span of `Var x`
(NOT)                  ^^^^^ span of `Var y`

So... perhaps that will come in handy :)

@pablohirafuji
Copy link
Contributor Author

In essence, the parser is ready. The PR is missing some adjustments, like:

  • Store some parsed but not stored comments;
  • Some locations are not right;
  • Store empty list and record comments [ {-comment-} ];
  • Tests;
  • Documentations;

@pablohirafuji
Copy link
Contributor Author

pablohirafuji commented Aug 11, 2020

I'm storing the comments before any declaration with the declaration itself (commentsBefore), instead of creating another Declaration type just for the comments. It's easy to change it if you prefer to store the comments as declarations, but note that the declarations are being stored as a Dict String Declaration, so the comments outside any declaration will be throw in a comments : List Comment, like it was before. We can store the declarations in a List DeclarationOrComment, but I don't know the implications of this change.

@Janiczek
Copy link
Contributor

Woot woot! I'll take a look at this relatively-soonish 🙂 Thanks @pablohirafuji !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants