RFC: Make white-space handling less confusing / more consistent with the introduction of an "adjacent selector": , #271

@Kroc

Summary

Make white-space handling less confusing and more consistent by introducing an "adjacent selector": ","

Motivation

  • The rules soi, eoi, whitespace & comment look like any other rule. Because whitespace and comment are names that users are extremely likely to choose themselves, the hidden behaviour attached to them can be confusing, unexpected and even at odds with what the user wants

  • The ~ selector implicitly skips white-space only when a whitespace rule is defined. This is sometimes not what the user wants, and they have to resort to the @ modifier even though it affects the entire rule and not just one selector. This can make certain effects very difficult to achieve without resorting to nested rules

Guide-level explanation

Adjacent elements in a rule can be separated by a ",":

rule = { a, b, c }

Here rule a is followed by rule b, which is followed by rule c. No white-space is assumed between the rules, though note that any rule can itself contain a whitespace rule.

Contrast this with:

rule = { a ~ b ~ c }

This checks for optional white-space and comments between rules a, b & c.
The adjacent (,) and white-space (~) selectors can be combined in a rule; this can be used to control exactly where white-space or comments are automatically assumed.

rule = { a, b ~ c }
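
The difference between the two selectors in the rule above can be sketched outside of Pest. The following is a minimal model, not Pest's implementation or API: the `Sep` enum and `match_seq` function are hypothetical names, rules are reduced to string literals, and "white-space" is assumed to mean any run of whitespace characters (comments are left out).

```rust
/// How two neighbouring elements are joined (hypothetical model, not pest's API).
#[derive(Clone, Copy)]
enum Sep {
    Adjacent, // proposed `,`: nothing may sit between the elements
    Spaced,   // existing `~`: optional whitespace is skipped first
}

/// Match `first`, then each `(separator, literal)` pair in order.
/// Returns the number of bytes consumed, or `None` on failure.
fn match_seq(input: &str, first: &str, rest: &[(Sep, &str)]) -> Option<usize> {
    let mut pos = if input.starts_with(first) {
        first.len()
    } else {
        return None;
    };
    for &(sep, lit) in rest {
        if let Sep::Spaced = sep {
            // Skip optional whitespace before the next element.
            let tail = &input[pos..];
            pos += tail.len() - tail.trim_start().len();
        }
        if !input[pos..].starts_with(lit) {
            return None;
        }
        pos += lit.len();
    }
    Some(pos)
}

fn main() {
    // Models: rule = { a, b ~ c } with a = "a", b = "b", c = "c"
    let rule = [(Sep::Adjacent, "b"), (Sep::Spaced, "c")];
    for input in ["abc", "ab  c", "a bc"] {
        println!("{:?} -> {:?}", input, match_seq(input, "a", &rule));
    }
}
```

In this model "abc" and "ab  c" match, while "a bc" fails because the adjacent selector allows nothing between a and b.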

The behaviour of the adjacent selector remains consistent within nested rules that make use of modifiers.
In the example below, the behaviour of rule a is unaffected by the parent rules using different white-space modifiers.

rule1 = @{ a }
rule2 = ${ a }
rule3 = !{ a }

a = { b, c }
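
As a rough illustration of why the parent's modifier would not matter, here is a small sketch under stated assumptions: `matches_pair` is a hypothetical helper, rules are reduced to string literals, and an atomic parent is assumed to switch off the implicit white-space of ~ in the rules beneath it. It is a model of the proposal, not of Pest's internals.

```rust
/// Match literal `a` followed by literal `b` (hypothetical model, not pest's API).
/// `spaced` is true for `~` and false for the proposed `,`.
/// `atomic` models an enclosing atomic parent such as `@{ ... }`.
fn matches_pair(input: &str, a: &str, b: &str, spaced: bool, atomic: bool) -> bool {
    let rest = match input.strip_prefix(a) {
        Some(rest) => rest,
        None => return false,
    };
    // Only `~` outside an atomic context skips whitespace; `,` never does.
    let rest = if spaced && !atomic { rest.trim_start() } else { rest };
    rest.starts_with(b)
}

fn main() {
    for atomic in [false, true] {
        println!(
            "atomic={}: `b, c` on \"b c\" = {}, `b ~ c` on \"b c\" = {}",
            atomic,
            matches_pair("b c", "b", "c", false, atomic),
            matches_pair("b c", "b", "c", true, atomic),
        );
    }
}
```

The result for the adjacent selector is the same in both contexts, whereas the result for ~ depends on whether the parent is atomic.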

Reference-level explanation

I can't comment on the inner workings of Pest, as I haven't even completed my first Rust program yet, but based on writing parsers in other languages (VB6, Go, Perl6), the adjacent selector should pose minimal difficulty. It adds no new matching functionality; it simply expresses the natural state of one token directly following another.

Drawbacks

  • Even with the addition of such a simple feature there is still the cost of implementation, testing, documentation and compatibility

  • There could be unforeseen consequences for parsing behaviour when features are combined in complex ways; every additional feature increases the complexity available to the user

Rationale and alternatives

This is the simplest possible design that meets the need for adjacent selection without changing the behaviour of existing selectors (for compatibility). The choice of actual character used (,), the name, the terminology, etc. can be debated.

In some languages (e.g. PHP, some functional languages) a dot is used for concatenation:

alternative = { a . b . c }

This may be preferable to a comma as it is more explicitly an "operator" between words, whereas a comma could be mistaken for something more general and peppered by the user in places where it shouldn't be.

Adjacency could also be communicated without any character at all, so that one rule separated from another only by white-space implies adjacency; e.g.:

alternate = { a b c }

Whilst this form exists in BNF and some derivatives, an explicit separator avoids potential parse errors or unintended behaviour on the user's part, and also provides visual balance, as every rule is then separated from the next by some separator.

Alternative options include changing the functionality of existing features, such as how the "~" selector or "@" modifier operates.

The automagic behaviour of rules with specific names is a documentation / support / learning hindrance, but I believe it can be resolved separately, outside of this RFC, as any change there would have much more drastic implications.

The proposed feature adds to the project and provides benefits without taking anything away from existing features. At the cost of additional documentation, it may help users avoid issues when starting out.

Prior art

The use of a comma for adjacency is present in EBNF, which Pest roughly follows.

Unresolved questions

?
