[PoC] Support Token Categories #551

ydah · 2025-02-21T17:21:51Z

I considered introducing Token Categories into Lrama, inspired by Chevrotain's Token Categories. This mechanism allows multiple token types to be logically grouped together so they can be matched collectively in grammar rules.

Reference: Chevrotain Token Categories (https://chevrotain.io/docs/features/token_categories.html)

For example, in parse.y, we currently define grammar rules like this:

p_cases : opt_else
        | p_case_body
        ;

This rule matches two token types: opt_else and p_case_body. If we introduce a Token Category named p_cases to group these token types, we could define the rule more concisely:

%token-categories <node> p_cases: opt_else p_case_body

By grouping multiple token types into a single category, grammar rules only need to consume the category, simplifying rule definitions.
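To make the equivalence concrete, here is a minimal Ruby sketch of how a category declaration could be read and expanded back into the hand-written rule shown earlier. It is not the code in this PR: the `TokenCategory` struct, the `parse_token_category` and `expand_to_rule` helpers, and the assumption that the `<node>` tag is optional are all hypothetical and only serve as illustration.

```ruby
# Hypothetical sketch: not Lrama's actual implementation.
# Represents one `%token-categories` declaration.
TokenCategory = Struct.new(:tag, :name, :members)

# Matches a declaration such as
#   %token-categories <node> p_cases: opt_else p_case_body
# The `<tag>` part is treated as optional here (an assumption).
DECLARATION = /\A%token-categories\s+(?:<(?<tag>[^>]+)>\s+)?(?<name>\w+)\s*:\s*(?<members>.+)\z/

def parse_token_category(line)
  match = DECLARATION.match(line.strip)
  raise ArgumentError, "not a %token-categories declaration: #{line}" unless match

  TokenCategory.new(match[:tag], match[:name], match[:members].split)
end

# Expands a category into the equivalent hand-written rule,
# with one alternative per member.
def expand_to_rule(category)
  indent = " " * (category.name.length + 1)
  alternatives = category.members.join("\n#{indent}| ")
  "#{category.name} : #{alternatives}\n#{indent};"
end

category = parse_token_category("%token-categories <node> p_cases: opt_else p_case_body")
puts expand_to_rule(category)
```

Under these assumptions, the printed rule has the same shape as the original `p_cases` rule, which is the sense in which the category declaration is a more concise spelling of it.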

In Chevrotain, tokens can belong to multiple categories by specifying categories in their definitions. However, this approach makes it difficult to see which tokens belong to a given category at a glance. To address this, Lrama explicitly lists token members in the category definition.
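The two styles record the same membership information from opposite directions. The Ruby sketch below uses made-up data and a hypothetical `group_by_category` helper (neither comes from Chevrotain or Lrama) to show that the token-centric form has to be inverted before the members of a category can be read off, whereas the category-centric form states them directly.

```ruby
# Hypothetical data, for illustration only.
# Token-centric form (the Chevrotain-style approach described above):
# each token names the categories it belongs to.
token_to_categories = {
  "opt_else"    => ["p_cases"],
  "p_case_body" => ["p_cases"],
}

# Inverts the token-centric form into the category-centric form
# (the approach taken in this PoC), where each category lists
# its members in one place.
def group_by_category(token_to_categories)
  token_to_categories.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(token, categories), result|
    categories.each { |category| result[category] << token }
  end
end

p group_by_category(token_to_categories)
```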


ydah requested a review from yui-knk February 21, 2025 17:22

ydah marked this pull request as draft August 28, 2025 09:32
