pest language evolution

# Summary

This RFC hopes to address the concerns in #197, #261, #271, and #329 by laying the foundation of pest's evolution and transition.

# Motivation

While pest grammars offer an expressive language for building grammars, they lack certain features we've become accustomed to in programming languages, which weakens their effectiveness as expressive and reusable tools. With the growing popularity of the project, more and more discussion has focused on improving the predictability of pest as a language, and a number of needs have been put forth: trivia handling, reusability, expressiveness, and general consistency.

## Trivia handling complexity

Probably the hardest concept to grasp when first learning the ropes is how trivia, i.e. whitespace and comments, is handled. pest has an automatic mechanism, controlled by atomicity, that simply permits trivia to live between expressions. Since atomic rules are cascading, it's not immediately obvious whether two sequenced expressions `a ~ b` accept trivia: it depends wholly on whether or not the current rule inherits atomicity.

```pest
atomic                = @{ definitely_not_atomic }
not_atomic            =  { confusing }
definitely_not_atomic = !{ confusing }

confusing = { a ~ b }
```

The example above illustrates how `confusing` can accept trivia in some cases but not others.
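
To make the cascade concrete, here is a minimal Rust sketch of how the effective atomicity of a rule could be derived from the chain of rules that leads to it. `Modifier` and `accepts_trivia` are hypothetical names chosen for illustration, not part of pest's API, and compound-atomic rules (`$`) are left out for brevity.

```rust
/// Rule modifiers relevant to trivia handling (hypothetical model, not pest's API).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Modifier {
    /// `{ ... }`: inherits atomicity from the calling rule.
    Normal,
    /// `@{ ... }`: atomic, no trivia between sequenced expressions.
    Atomic,
    /// `!{ ... }`: non-atomic, trivia is accepted again.
    NonAtomic,
}

/// Walks a call chain from the outermost rule inwards and reports whether
/// the innermost rule accepts trivia in `a ~ b`.
pub fn accepts_trivia(chain: &[Modifier]) -> bool {
    // Assumption: a rule that is not reached through an atomic rule is non-atomic.
    let mut atomic = false;
    for modifier in chain {
        atomic = match modifier {
            Modifier::Normal => atomic,
            Modifier::Atomic => true,
            Modifier::NonAtomic => false,
        };
    }
    !atomic
}
```

Under this model, `confusing` accepts trivia when reached through `not_atomic`, and also when reached through `atomic` and then `definitely_not_atomic`, but it would not if an atomic rule referenced it directly.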

## Reusability of expressions and rules

While rules can be composed from one another, there is currently no means to parametrize them. Parametrization can be extremely useful in cases where some idioms are often reused, e.g. repeated, separated values. Currently, you need to repeat some form of `e ~ ("," ~ e)*` which is less readable than `separated(e, ",")`.

Though less immediately useful, another addition would be the ability to use rules from different grammars.

## Expressiveness

Improving expressiveness is somewhat of a continuously open question. In 2.0 we've added additional stack calls that help recognize indentation-sensitive languages, namely `PEEK_ALL`, `POP_ALL`, and `DROP`. This conservative design was adopted in order to better understand what exactly is needed in real-world examples.

However, a legitimate need for more refined localization within the stack has been illustrated in #329. Being able to accurately slice the stack for each of the `PEEK`, `POP`, and `DROP` calls seems to be required going forward.

## General consistency

With the introduction of built-in rules, capitalization was selected as a way to differentiate them from user-defined rules. Stack calls, start- and end-of-input calls, and Unicode categories are capitalized as well. The only way to tell them apart is to know ahead of time what they do.

# Guide-level explanation

## Versioning

The pest language will be versioned according to the [semver] guide, and the grammar language version will be optionally selected before parsing. This will ensure a smoother transition to 3.0 and beyond by enabling users to opt in to the newer version early on.

[semver]: https://semver.org/

## Modules

Akin to Rust's modules, a module can contain rules or other modules. This removes the need to capitalize built-in rules, since they can live in separate modules.

```pest
/// Modules can be created by importing other grammars and are immediately public.
use "cool.pest";
use "this.pest" as that;

/// pest has its own sub-modules.
any     = { pest::any }
stack   = { pest::stack::peek }
unicode = { pest::unicode::binary::punctuation }
```
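
As a rough illustration of how a path such as `pest::stack::peek` might be resolved, the following Rust sketch models modules as a tree. The `Module` type, `has_rule`, and `builtin_root` are assumptions made for this example and do not reflect how `pest_meta` is or will be structured.

```rust
use std::collections::HashMap;

/// A module holding rule names and nested modules (hypothetical model).
#[derive(Default)]
pub struct Module {
    pub rules: Vec<String>,
    pub modules: HashMap<String, Module>,
}

impl Module {
    /// Returns true if `path` (segments separated by `::`) names a rule
    /// reachable from this module.
    pub fn has_rule(&self, path: &str) -> bool {
        let mut segments: Vec<&str> = path.split("::").collect();
        // The last segment is the rule name; the rest are module names.
        let rule = match segments.pop() {
            Some(rule) => rule,
            None => return false,
        };
        let mut current = self;
        for segment in segments {
            match current.modules.get(segment) {
                Some(module) => current = module,
                None => return false,
            }
        }
        current.rules.iter().any(|r| r == rule)
    }
}

/// Builds a small part of the built-in tree used in the example above.
pub fn builtin_root() -> Module {
    let stack = Module {
        rules: vec!["peek".to_string(), "pop".to_string(), "drop".to_string()],
        modules: HashMap::new(),
    };
    let mut pest = Module {
        rules: vec!["any".to_string()],
        modules: HashMap::new(),
    };
    pest.modules.insert("stack".to_string(), stack);
    let mut root = Module::default();
    root.modules.insert("pest".to_string(), pest);
    root
}
```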

## Parametrizable rules

Rules will accept optional arguments. A rule's definition will be parametrizable with argument names, and any valid pest expression can be passed as an argument.

```pest
/// Definition
separated(e, s) = _{ e ~ (s ~ e)* }

/// Use
comma_separated(e) = _{ separated(e, ",") }
```
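
One way to think about a parametrized rule is as a template whose parameter names are replaced by the argument expressions at each use site. The Rust sketch below shows that substitution on a deliberately tiny expression tree. `Expr`, `substitute`, and `separated_body` are hypothetical and much simpler than the actual `pest_meta` AST.

```rust
use std::collections::HashMap;

/// A tiny expression tree (hypothetical; not `pest_meta`'s actual AST).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    /// A string literal such as `","`.
    Str(String),
    /// A reference to a rule or to a parameter name.
    Ident(String),
    /// `lhs ~ rhs`
    Seq(Box<Expr>, Box<Expr>),
    /// `e*`
    Rep(Box<Expr>),
}

/// Replaces every parameter name in `body` with the argument bound to it.
/// Identifiers that are not parameters are kept as rule references.
pub fn substitute(body: &Expr, args: &HashMap<String, Expr>) -> Expr {
    match body {
        Expr::Str(s) => Expr::Str(s.clone()),
        Expr::Ident(name) => args
            .get(name)
            .cloned()
            .unwrap_or_else(|| Expr::Ident(name.clone())),
        Expr::Seq(lhs, rhs) => Expr::Seq(
            Box::new(substitute(lhs, args)),
            Box::new(substitute(rhs, args)),
        ),
        Expr::Rep(inner) => Expr::Rep(Box::new(substitute(inner, args))),
    }
}

/// Body of `separated(e, s) = _{ e ~ (s ~ e)* }`.
pub fn separated_body() -> Expr {
    let e = || Box::new(Expr::Ident("e".to_string()));
    let s = Box::new(Expr::Ident("s".to_string()));
    Expr::Seq(e(), Box::new(Expr::Rep(Box::new(Expr::Seq(s, e())))))
}
```

Binding `e` to a rule named `value` and `s` to `","` turns the body into the equivalent of `value ~ ("," ~ value)*`, which is the idiom that currently has to be written out by hand.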

## Controlled trivia

The infix sequence operator `~` itself will be a user-defined rule:

```pest
~(lhs, rhs) = { lhs ~ " "* ~ rhs }
```

Without any `~` defined, `~`, `*`, `+`, and `{}` operators will all run according to their definitions without accepting any trivia between expressions. When it *is* defined, the repetitions will make use of the sequence operator:

```pest
*(e) = { e? ~ e* }
+(e) = { e ~ e* }
/// ... etc.
```

In order to be able to have both trivia-accepting and non-trivia-accepting operators working together, separate non-trivia operators will be introduced, namely `-` for sequence and all repetitions preceded by it:

| Operator                         | Trivia    | Non-trivia |
|----------------------------------|:---------:|:----------:|
| Sequence                         | `~`       | `-`        |
| Repeat zero or more times        | `*`       | `-*`       |
| Repeat one or more times         | `+`       | `-+`       |
| Repeat exactly n times           | `{n}`     | `-{n}`     |
| Repeat minimum of n times        | `{n..}`   | `-{n..}`   |
| Repeat maximum of n - 1 times    | `{..n}`   | `-{..n}`   |
| Repeat maximum of n times        | `{..=n}`  | `-{..=n}`  |
| Repeat between m and n - 1 times | `{m..n}`  | `-{m..n}`  |
| Repeat between m and n times     | `{m..=n}` | `-{m..=n}` |
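
The difference between the two operator families can be sketched with a few hand-written matchers in Rust. This is only an illustration of the intended semantics, assuming trivia is defined as `" "*` as in the `~` example above and that operands are plain string literals. The function names are hypothetical.

```rust
/// Matches the literal `lit` at byte offset `pos`, returning the offset after it.
pub fn literal(input: &str, pos: usize, lit: &str) -> Option<usize> {
    if input[pos..].starts_with(lit) {
        Some(pos + lit.len())
    } else {
        None
    }
}

/// Skips trivia; trivia is assumed to be spaces only, as in `" "*`.
pub fn skip_trivia(input: &str, pos: usize) -> usize {
    pos + input[pos..].bytes().take_while(|&b| b == b' ').count()
}

/// `lhs ~ rhs`: trivia is accepted between the two operands.
pub fn seq_trivia(input: &str, pos: usize, lhs: &str, rhs: &str) -> Option<usize> {
    let after_lhs = literal(input, pos, lhs)?;
    literal(input, skip_trivia(input, after_lhs), rhs)
}

/// `lhs - rhs`: the operands must be directly adjacent.
pub fn seq_strict(input: &str, pos: usize, lhs: &str, rhs: &str) -> Option<usize> {
    let after_lhs = literal(input, pos, lhs)?;
    literal(input, after_lhs, rhs)
}

/// `e*` built on `~`: trivia is accepted between repetitions, but trailing
/// trivia after the last match is not consumed.
pub fn star_trivia(input: &str, pos: usize, e: &str) -> usize {
    if e.is_empty() {
        return pos; // guard against looping forever on an empty literal
    }
    let mut end = match literal(input, pos, e) {
        Some(next) => next,
        None => return pos,
    };
    while let Some(next) = literal(input, skip_trivia(input, end), e) {
        end = next;
    }
    end
}
```

With these definitions, `"a   b"` matches `"a" ~ "b"` but not `"a" - "b"`, while `"ab"` matches both.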

## Stack slicing

Stack slicing will work similarly to Rust slicing with the exception that ranges will accept negative end values, similarly to Python. Slicing will happen from bottom to top such that for a stack `[a, b, c, d, e]`:

* `[0] == a`
* `[-1] == e`
* `[1..4] == [b, c, d]`
* `[1..-1] == [b, c, d]`
* `[1..=-1] == [b, c, d, e]`
* `[..-2] == [a, b, c]`

As such, `pest::stack::*`, i.e. `peek`, `pop`, `drop`, can be optionally sliced or indexed, e.g. `pest::stack::peek[..-1]`. The indices will be constant, with the exception of negative ones: those are relative to the top of the stack, and the stack's size is variable.
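
The index arithmetic can be sketched in Rust as follows, assuming zero-based indexing from the bottom of the stack as in Rust slices, and negative values counting back from the top as in Python. `resolve_index` and `resolve_range` are hypothetical helpers, not existing pest functions. Allowing a negative start bound is an extra assumption, since the text only mentions negative end values.

```rust
use std::ops::Range;

/// Resolves a possibly negative index against a stack of `len` elements.
/// Non-negative indices count from the bottom; negative ones from the top.
pub fn resolve_index(len: usize, index: isize) -> Option<usize> {
    let resolved = if index < 0 {
        len.checked_sub(index.unsigned_abs())?
    } else {
        index as usize
    };
    if resolved < len {
        Some(resolved)
    } else {
        None
    }
}

/// Resolves a range bound, where `len` itself is a valid exclusive position.
fn resolve_bound(len: usize, bound: isize) -> Option<usize> {
    let resolved = if bound < 0 {
        len.checked_sub(bound.unsigned_abs())?
    } else {
        bound as usize
    };
    if resolved <= len {
        Some(resolved)
    } else {
        None
    }
}

/// Resolves `start..end` or `start..=end` (when `inclusive` is true) into a
/// concrete range over a stack of `len` elements. Missing bounds default to
/// the bottom and the top of the stack respectively.
pub fn resolve_range(
    len: usize,
    start: Option<isize>,
    end: Option<isize>,
    inclusive: bool,
) -> Option<Range<usize>> {
    let start = match start {
        Some(s) => resolve_bound(len, s)?,
        None => 0,
    };
    let end = match end {
        // An inclusive end must name an existing element.
        Some(e) if inclusive => resolve_index(len, e)? + 1,
        Some(e) => resolve_bound(len, e)?,
        None => len,
    };
    if start <= end {
        Some(start..end)
    } else {
        None
    }
}
```

Because negative values depend on `len`, they can only be resolved once the current stack size is known, whereas non-negative ones are fixed by the grammar.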

# Reference-level explanation

The grammar's version will be selected through the grammar attribute:

```rust
#[grammar = "grammar.pest", version = "3.0"]
```

`pest_meta` will handle both grammar language versions during the `2.*` transition period, then migrate to 3.0. This will need to be enforced if we want to take advantage of the more concise grammars during optimization and generation.
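
A minimal sketch of how the version string might be mapped to a grammar language during the transition period is shown below. `GrammarVersion` and `select_version` are hypothetical names. Falling back to the 2.x language when no version is given, and dispatching on the major version only, are assumptions of this example rather than decisions made in this RFC.

```rust
/// Grammar language generations handled during the transition (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum GrammarVersion {
    V2,
    V3,
}

/// Maps the optional `version` attribute value to a grammar language.
pub fn select_version(version: Option<&str>) -> Result<GrammarVersion, String> {
    match version {
        // Assumption: grammars without an explicit version stay on 2.x.
        None => Ok(GrammarVersion::V2),
        // Assumption: only the major component selects the language.
        Some(v) => match v.split('.').next() {
            Some("2") => Ok(GrammarVersion::V2),
            Some("3") => Ok(GrammarVersion::V3),
            _ => Err(format!("unsupported grammar version: {}", v)),
        },
    }
}
```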

Much of the rest of this RFC is straightforward:

1. add second grammar
2. implement validation
3. add module resolution to AST (in `pest_meta` and `pest_generator`)
4. add rule parameters to AST (in `pest_meta` and `pest_generator`)

# Drawbacks

Breaking compatibility so early could be dangerous, but we can offer help for people migrating to 3.0. If need be, we could also offer a pest fix tool that would be able to convert 2.0 to 3.0 grammars.

Some of the syntax introduced in the trivia handling might be a little heavy on the eye, and we might want to fine-tune it before it's set in stone.


