Planned Features: Compiler Internals

Jake Chitel edited this page Nov 14, 2017 · 1 revision

Parser DSL

With macros, it will be very easy to define DSLs in Ren.

One place where there is a clear use case for a DSL is in the parser. At the moment it is quite cumbersome to add logic to the parser, despite significant efforts to simplify it. A strongly-typed DSL would be a perfect way to handle this.

Syntax (IN PROGRESS)

NonTerminalDefinition ::= "export"? identifier "::=" Expansion
Expansion ::= (identifier ExpansionModifiers? ":" ExpansionType ExpansionTypeModifiers?)+(sep ",")
            | (ExpansionType)+(sep "|")
ExpansionType ::= ("(" Expansion ")")
                | identifier
                | "\"" string "\""
ExpansionModifiers ::= "!"
                     | ("(" identifier ")")
ExpansionTypeModifiers ::= "?"
                         | ("*" SeparatorModifier?)
                         | ("+" SeparatorModifier?)
SeparatorModifier ::= ("(" "sep" (identifier | ("\"" string "\"")) ")")

Explanation

A grammar is built of non-terminals. Every non-terminal has a name and an expansion. A non-terminal corresponds to a node in the resulting AST, so the expansion corresponds to the fields of that node. An expansion is a list of one or more field names, each with a type.

An expansion field name can carry one of two suffixes: a "!", which marks the field as "definite", meaning that once it is parsed, the containing component can be labelled as parsed; or an error message, written as a parenthesized Ren expression that resolves to either a string or a function that takes a Token instance and returns a string.
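As a rough illustration of the two suffixes, here is a minimal Python sketch of how a field and its modifiers might be modelled. The names `Token`, `Field`, and `resolve_error` are hypothetical and chosen only for this example; nothing here reflects the actual Ren parser. The grammar above allows at most one suffix per field, which this sketch does not enforce.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Token:
    type: str   # token type, e.g. "identifier"
    image: str  # exact source text of the token


# An error message is either a fixed string or a function of the offending token.
ErrorMessage = Union[str, Callable[[Token], str]]


@dataclass
class Field:
    name: str                             # field name in the resulting AST node
    type: str                             # token type, token image, or non-terminal name
    definite: bool = False                # the "!" suffix
    error: Optional[ErrorMessage] = None  # the parenthesized suffix


def resolve_error(field: Field, token: Token) -> Optional[str]:
    """Turn a field's error message into a string for the given token."""
    if field.error is None:
        return None
    if callable(field.error):
        return field.error(token)
    return field.error


# Hypothetical usage: one field with a message function, one with a fixed string.
name_field = Field("name", "identifier",
                   error=lambda t: f"expected a name, found '{t.image}'")
keyword_field = Field("keyword", '"func"', error="expected 'func'")
message = resolve_error(name_field, Token("number", "42"))
```

Accepting both a string and a function lets simple cases stay short while still letting a message mention the token that caused the failure.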

Expansion types can be:

  • an identifier, corresponding to either a token type or another non-terminal
  • a string, corresponding to an exact token image
  • a sub-expansion, which will be parsed as a separate node that will be placed at this field
  • a choice expansion, which is composed of a "|"-separated list of other expansion types, each of which is attempted in order, resolving to the first one that matches
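The ordered behaviour of a choice expansion can be sketched with two small parser combinators. This is an assumed model written in Python for illustration only: tokens are plain strings, and `exact` and `choice` are hypothetical helper names, not part of any existing Ren API.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[str]
# A parser takes the tokens and a position, and returns (value, new position),
# or None when it does not match.
Parser = Callable[[Tokens, int], Optional[Tuple[object, int]]]


def exact(image: str) -> Parser:
    """Match a single token whose image is exactly `image`."""
    def parse(tokens: Tokens, pos: int):
        if pos < len(tokens) and tokens[pos] == image:
            return image, pos + 1
        return None
    return parse


def choice(*options: Parser) -> Parser:
    """Try each option in order and resolve to the first one that matches."""
    def parse(tokens: Tokens, pos: int):
        for option in options:
            result = option(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


# Roughly corresponds to the choice expansion `"true" | "false"`.
bool_literal = choice(exact("true"), exact("false"))
matched = bool_literal(["false"], 0)
unmatched = bool_literal(["maybe"], 0)
```

Because the options are tried in order, an earlier option shadows any later one that would match the same input, so the order in which choices are written matters.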

Choice expansions can also be used in place of a normal expansion at the top level of a non-terminal definition. In that case the non-terminal does not resolve to its own node, but to whichever choice is parsed.

When a choice expansion is used at the top level, it can also specify suffix choices, which are used to define left-recursive non-terminals. A suffix choice names the field under which the previously parsed choice is placed within the suffix's own node. NOTE: This is not included in the above grammar yet.
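Since the suffix-choice syntax is not yet specified, the following Python sketch shows only one plausible reading of the idea: parse a base choice first, then repeatedly apply a suffix, each time placing the previously parsed node under a named field (here `left`) of the new node. The function name, node shapes, and token representation are all assumptions made for this example.

```python
def parse_expression(tokens):
    """Parse `number ("+" number)*` left-associatively without left recursion.

    Assumes a non-empty token list that starts with a number token.
    """
    pos = 0
    # Base choice: a plain number token.
    node = {"type": "Number", "value": tokens[pos]}
    pos += 1
    # Suffix choice: `"+" right`, with the previously parsed node stored
    # under the field name "left" of the new node.
    while pos + 1 < len(tokens) and tokens[pos] == "+":
        right = {"type": "Number", "value": tokens[pos + 1]}
        node = {"type": "Add", "left": node, "right": right}
        pos += 2
    return node


tree = parse_expression(["1", "+", "2", "+", "3"])
```

Here `1 + 2 + 3` ends up as an `Add` node whose `left` field is itself an `Add` node, which is the shape a left-recursive rule would produce, but it is built with a loop so the parser never calls itself on the same input position.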

Macro

The syntax and functionality of macros have not been solidly defined yet, but ideally they will allow the above functionality to be strongly typed. For example, a field could not be marked optional unless the corresponding field in the node type is also optional, and a field could not be marked repeated unless the corresponding field in the node type is a list.
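The two example rules can be sketched as a check of a field's modifier against the declared type of the matching node field. The sketch below uses Python type hints and runs at run time, whereas the design above envisions the check happening in the macro at compile time. `ImportNode` and `check_modifier` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, get_args, get_origin, get_type_hints


@dataclass
class ImportNode:
    module: str           # required field
    alias: Optional[str]  # optional field
    names: List[str]      # repeated field


def check_modifier(node_type, field_name: str, modifier: str) -> bool:
    """Return True if `modifier` is compatible with the node field's type."""
    hint = get_type_hints(node_type)[field_name]
    if modifier == "?":
        # An optional expansion needs an optional node field.
        return type(None) in get_args(hint)
    if modifier in ("*", "+"):
        # A repeated expansion needs a list-typed node field.
        return get_origin(hint) is list
    return True


optional_ok = check_modifier(ImportNode, "alias", "?")
optional_bad = check_modifier(ImportNode, "module", "?")
repeated_ok = check_modifier(ImportNode, "names", "*")
repeated_bad = check_modifier(ImportNode, "alias", "+")
```

A grammar that marks `module` as optional, or `alias` as repeated, would be rejected by this check, which mirrors the kind of mismatch the macro is meant to catch.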

This would also integrate into the main language in that non-terminals can be exported to be called by other modules.