Change the AST to make it easier to pattern match and remove ambiguities #24

doorgan · 2021-07-11T14:38:02Z

This is derived from a discussion at #23. The tl;dr is that wrapping literals in :__block__ works, but makes it cumbersome to pattern match against. Some other nodes could be changed as well, like sigils.

The changes to be made would be such that the following syntaxes would be mapped as follows:
foo -> {:var, metadata, :foo}
400_000 -> {:int, [token: "400_000" | metadata], 400000}
42.0 -> {:float, [token: "42.0" | metadata], 42.0}
:foo -> {:atom, metadata, :foo}
[1, 2] -> {[], metadata, [1, 2]}
{1, 2} -> {:{}, metadata, [1. 2]}
~w[a b c] -> {{:sigil, "w"}, metadata, [segments, modifiers]}

{:sigil, meta, ["w", segments, modifiers]} for sigils would be ambiguous with function calls, for example. ~w/foo #{:bar}/a and foo("w", "foo #{:bar}", 'a') already produce the very same AST, with the only exception that the sigil has a :delimiter metadata that lets us recognize it as an actual sigil. Because AST nodes metadata order is not guaranteed, we can't just pattern match the metadata to assert it's a sigil. Hence the idea of using a tuple as the node type instead.

The same goes for lists. If the list node is {:list, meta, elements}, then [1, 2, 3] and list(1, 2, 3) would produce exactly the same AST, so I was thinking on using {[], meta, elements} instead. I'm not very happy with the loss in readability there, but it would work.

There are some opportunities to enrich the AST further, for example, the AST for A.B.C.fun is this:

{{:., _, [{:__aliases__, [], [:A, :B, :C], :fun]}, _, []}

By leveraging the :static_atoms_encoder parser option, we could transform it to this:

{{:., _, [{:__aliases__, [], [
  {:atom, _, :A}, {:atom, _, :B}, {:atom, _, :C}
], {:atom, _, :fun}]}, _, []}

Which, of course, it a lot more verbose, but on the other hand it allows us to:
a) Have a more consistent AST, meaning we no longer have some places were atoms are wrapped and others were they're not
b) We can have more precise information about where exactly each alias segment is happening

For b we would still miss information about where exactly the dots are located, but at least we could accurately replace the inner segments.

The final change I had thought of is to make the parsing safe by default, meaning that parsing a file wont create new atoms. The downside is that the Formatter only works with atoms, so Sourceror.to_string would create new atoms anyways and we're back to square one.

The text was updated successfully, but these errors were encountered:

doorgan · 2021-07-11T14:40:08Z

Some side effects of this:

We need a custom traversal function instead of relying on Macro.traverse
The Zipper API may need to be updated to handle this new AST, meaning it may no longer work as expected with the regular AST
We need an extra step before passing the AST to the formatter, so we give it something it can understand

All of these are easily addressable.

doorgan · 2021-07-11T15:18:13Z

You can try these changes(with the exception of the static_atoms_encoder one) in the rich_ast branch:

{:sourceror, github: "doorgan/sourceror", branch: "rich_ast"}

CC @marcandre

doorgan · 2021-07-11T20:40:44Z

Sigils were changed to {:"~", metadata, [letter, contents, modifiers]}

msaraiva · 2021-07-27T13:59:03Z

@doorgan would it be wiser to wait for these changes to be applied before using sourceror? I have a couple of mix tasks I want to add to Surface that could make good use of sourceror but I believe I may have to change the implementation a lot when the new AST comes out.

doorgan · 2021-07-27T14:06:13Z

Hi @msaraiva, I don't see these changes applied to the main branch anytime soon, as it's still too experimental and I realize it will take quite some time to find something that works fine for Sourceror use cases.

When the time to merge them comes closer I'll let you know and I'll be happy to help you migrate the Surface scripts :)

msaraiva · 2021-07-27T14:13:04Z

@doorgan sounds good!

I didn't know these changes were that far away. I'll give v0.8.1 a try then.

Cheers.

doorgan · 2021-07-27T14:21:46Z

@msaraiva I initially intended to apply them in 0.9, but I started to deviate too much from the original goal and I want to have a clearer spec first. There's lot of nuances when it comes to having completely unambiguous node types and keeping the 3-tuple shape that I need to resolve 🙂. So maybe they will land in 0.10 or 0.11.

doorgan added the enhancement New feature or request label Jul 11, 2021

zachallaun mentioned this issue Mar 14, 2024

Proposal: Introduce some hook to change encoding of parsed result elixir-tools/spitfire#18

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change the AST to make it easier to pattern match and remove ambiguities #24

Change the AST to make it easier to pattern match and remove ambiguities #24

doorgan commented Jul 11, 2021

doorgan commented Jul 11, 2021

doorgan commented Jul 11, 2021 •

edited

Loading

doorgan commented Jul 11, 2021

msaraiva commented Jul 27, 2021

doorgan commented Jul 27, 2021

msaraiva commented Jul 27, 2021

doorgan commented Jul 27, 2021 •

edited

Loading

Change the AST to make it easier to pattern match and remove ambiguities #24

Change the AST to make it easier to pattern match and remove ambiguities #24

Comments

doorgan commented Jul 11, 2021

doorgan commented Jul 11, 2021

doorgan commented Jul 11, 2021 • edited Loading

doorgan commented Jul 11, 2021

msaraiva commented Jul 27, 2021

doorgan commented Jul 27, 2021

msaraiva commented Jul 27, 2021

doorgan commented Jul 27, 2021 • edited Loading

doorgan commented Jul 11, 2021 •

edited

Loading

doorgan commented Jul 27, 2021 •

edited

Loading