Description
Summary
Make white-space handling less confusing and more consistent by introducing an "adjacent selector": ",".
Motivation
- The rules soi, eoi, white-space & comment look like any other rule, and with whitespace and comment being names that users are extremely likely to use themselves, the hidden behaviour can be confusing, unexpected and even impractical for what the user wants to achieve.
- The ~ selector implicitly allows white-space only when a whitespace rule is defined. This is sometimes not what the user wants, and they have to resort to the @ modifier even though it affects the entire rule and not just one selector. This can make it very difficult to achieve certain effects without resorting to nested rules.
Guide-level explanation
Adjacent elements in a rule can be separated by a ",":
rule = { a, b, c }
Here rule a is followed by rule b, which is followed by rule c. No white-space is assumed between the rules, though note that any rule can itself contain a whitespace rule.
Contrast this with:
rule = { a ~ b ~ c }
This will check for optional white-space and comments between rules a, b and c.
The adjacent selector (,) and the white-space selector (~) can be combined in a rule; this can be used to control exactly where white-space or comments are automatically allowed.
rule = { a, b ~ c }
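The intended difference between the two selectors can be sketched with a small hand-rolled matcher in Rust. This is only an illustration of the proposed semantics, not how Pest works internally: the names lit, skip_ws, Sep and seq are hypothetical, the whitespace rule is assumed to match plain spaces only, comments are not modelled, and the whole input is required to match.

```rust
/// Match the literal `s` at byte offset `pos`; return the offset just past it.
fn lit(input: &str, pos: usize, s: &str) -> Option<usize> {
    if input[pos..].starts_with(s) {
        Some(pos + s.len())
    } else {
        None
    }
}

/// Skip zero or more spaces (a stand-in for a user-defined whitespace rule).
fn skip_ws(input: &str, pos: usize) -> usize {
    pos + input[pos..].chars().take_while(|c| *c == ' ').count()
}

/// How an element is joined to the one before it.
#[derive(Clone, Copy)]
enum Sep {
    Adjacent, // the proposed `,`: nothing may sit between the two elements
    Spaced,   // the existing `~`: optional white-space may sit between them
}

/// Match `first`, then each (separator, literal) pair in order.
/// Assumption: the whole input must be consumed for the match to succeed.
fn seq(input: &str, first: &str, rest: &[(Sep, &str)]) -> bool {
    let mut pos = match lit(input, 0, first) {
        Some(p) => p,
        None => return false,
    };
    for &(sep, s) in rest {
        if let Sep::Spaced = sep {
            pos = skip_ws(input, pos);
        }
        pos = match lit(input, pos, s) {
            Some(p) => p,
            None => return false,
        };
    }
    pos == input.len()
}

fn main() {
    // rule = { a, b ~ c }
    let rule = [(Sep::Adjacent, "b"), (Sep::Spaced, "c")];
    println!("{}", seq("ab c", "a", &rule)); // true: space allowed before c
    println!("{}", seq("a bc", "a", &rule)); // false: no space allowed before b
}
```

With this model, a rule using only "," rejects "a b c" but accepts "abc", while a rule using only "~" accepts both.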
The behaviour of the adjacent selector remains consistent within nested rules making use of modifiers.
In the example below, the behaviour of rule a is unaffected by the parent rules using different white-space modifiers.
rule1 = @{ a }
rule2 = ${ a }
rule3 = !{ a }
a = { b, c }
Reference-level explanation
I can't comment on the inner workings of Pest, as I haven't even completed my first ever Rust program yet, but based on writing parsers in other languages (VB6, Go, Perl6), the adjacent selector should pose minimal difficulty. It doesn't add functionality, and it expresses the natural state of one token following another.
Drawbacks
- Even with the addition of such a simple feature there is still the cost of implementation, testing, documentation and compatibility.
- There could be unforeseen consequences with parsing behaviour with complex combinations of features; adding another feature increases the available complexity.
Rationale and alternatives
This is the simplest possible design to resolve the need for adjacent selection without changing the behaviour of existing selectors (for compatibility). The choice of actual character used (,), the name, the terminology etc. can be debated.
In some languages (e.g. PHP, some functional languages) a dot is used for concatenation:
alternative = { a . b . c }
This may be desirable over a comma as it is more explicitly an "operator" between words, whereas a comma could be confused for something more general and peppered by the user in places where it shouldn't be.
Adjacency could also be communicated without the use of a character, where one rule separated from another by white-space implies adjacency, e.g.:
alternate = { a b c }
Whilst this form exists in BNF and some derivatives, the use of an explicit separator avoids potential parse-errors or unintentional behaviour from the user and also provides visual balance as every rule is always separated in all cases, regardless of separator.
Alternative options include changing the functionality of existing features, such as how the "~" selector or "@" modifier operates.
The automagic behaviour of rules with specific names is a documentation / support / learning hindrance, but I believe that can be resolved separately, outside of this RFC, as any such changes will have much more drastic implications.
The proposed feature adds to the project and provides benefits whilst not taking away from existing features. At the cost of additional documentation, it may help users avoid issues when starting out.
Prior art
The use of a comma for adjacency is present in EBNF, which Pest roughly follows.
Unresolved questions
?