@ydah ydah commented Feb 21, 2025

I considered introducing **Token Categories** into Lrama, inspired by Chevrotain's Token Categories.
This mechanism allows multiple token types to be logically grouped together so they can be matched collectively in grammar rules.

Reference: [Chevrotain Token Categories](https://chevrotain.io/docs/features/token_categories.html)

For example, in `parse.y`, we currently define grammar rules like this:

```yacc
p_cases : opt_else
        | p_case_body
        ;
```

This rule matches two token types: `opt_else` and `p_case_body`.
If we introduce a **Token Category** named `p_cases` to group these token types, we could define the rule more concisely:

```yacc
%token-categories <node> p_cases: opt_else p_case_body
```

By grouping multiple token types into a single category, grammar rules only need to consume the category, simplifying rule definitions.
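
To make the intended equivalence concrete, here is a minimal Python sketch that expands a `%token-categories` line into the rule it would stand for. It is an illustration only: the regular expression `DECL` and the helper `expand_category` are hypothetical names, and nothing here reflects how Lrama actually parses or implements the directive.

```python
import re

# Hypothetical pattern for the proposed declaration (an assumption, not Lrama's parser):
#   %token-categories <tag> name: member member ...
DECL = re.compile(
    r"^%token-categories\s+(?:<(?P<tag>\w+)>\s+)?(?P<name>\w+)\s*:\s*(?P<members>.+)$"
)

def expand_category(declaration: str) -> str:
    """Return the rule text that a category declaration would be shorthand for."""
    match = DECL.match(declaration.strip())
    if match is None:
        raise ValueError(f"not a token category declaration: {declaration!r}")
    name = match.group("name")
    members = match.group("members").split()
    # One alternative per member, aligned under the colon as in parse.y.
    indent = " " * (len(name) + 1)
    lines = [f"{name} : {members[0]}"]
    lines += [f"{indent}| {member}" for member in members[1:]]
    lines.append(f"{indent};")
    return "\n".join(lines)

rule = expand_category("%token-categories <node> p_cases: opt_else p_case_body")
print(rule)
```

Under these assumptions, the printed text matches the hand-written `p_cases` rule shown earlier, which is the sense in which the one-line declaration is more concise.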

In Chevrotain, tokens can belong to multiple categories by specifying categories in their definitions.
However, this approach makes it difficult to see which tokens belong to a given category at a glance.
To address this, the syntax proposed here for Lrama lists a category's members explicitly in the category definition, so membership can be read in one place.
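
The readability point can be sketched in Python. When membership is declared on each token, answering "which tokens are in this category?" requires scanning every token definition and inverting the mapping; when the category lists its members, that answer is the declaration itself. The dictionary and the helper `members_by_category` below are hypothetical illustrations, not Chevrotain's or Lrama's API.

```python
from collections import defaultdict

# Token-side declaration: each token names the categories it belongs to
# (hypothetical data, reusing the names from the example above).
token_to_categories = {
    "opt_else": ["p_cases"],
    "p_case_body": ["p_cases"],
}

def members_by_category(token_to_categories):
    """Invert a token -> categories mapping into category -> member tokens."""
    grouped = defaultdict(list)
    for token, categories in token_to_categories.items():
        for category in categories:
            grouped[category].append(token)
    return dict(grouped)

# Category-side view: the form that a category definition states directly.
category_members = members_by_category(token_to_categories)
print(category_members)
```
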
@ydah ydah requested a review from yui-knk February 21, 2025 17:22
@ydah ydah marked this pull request as draft August 28, 2025 09:32