Commit

variables in reduce action now take the value itself, including terminals
    - remove slice capture
        - remove end_stack
    - remove TermData, NonTermData
    - reduce action can be omitted if the rule has only one token holding data
    - bootstrapped new syntax
    - fix calculator examples to follow the new syntax
    - return Err if RuleType is defined but action is not defined
ehwan committed Aug 7, 2024
1 parent 8af6cba commit 75f0c0f
Showing 16 changed files with 690 additions and 1,183 deletions.
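The last rules in the commit message (an omitted reduce action is accepted when only one token holds data, and an error is returned when `RuleType` is defined but no action is given) can be modeled as a small check. The sketch below is a hypothetical illustration combining the commit message with the README's "`<ReduceAction>` can be omitted if" list; `check_rule` and its parameters are assumptions, not code from `rusty_lr_parser`.

```rust
/// Hypothetical model of the "RuleType defined but no action" check described
/// in this commit. NOT the actual rusty_lr_parser implementation.
///
/// * `rule_type`    - the declared `<RuleType>`, if any (e.g. `Some("f32")`)
/// * `has_action`   - whether a `{ ... }` reduce action is written
/// * `value_tokens` - how many tokens in the production hold a value
fn check_rule(rule_type: Option<&str>, has_action: bool, value_tokens: usize) -> Result<(), String> {
    match (rule_type, has_action) {
        // No RuleType: the rule carries no value, so an action is optional.
        (None, _) => Ok(()),
        // RuleType plus an explicit action: the action's result is the rule's value.
        (Some(_), true) => Ok(()),
        // RuleType without an action: accepted only when exactly one token
        // holds a value, which is passed through (e.g. `E(f32) : A ;`).
        (Some(_), false) if value_tokens == 1 => Ok(()),
        (Some(ty), false) => Err(format!(
            "rule type `{ty}` is defined but no reduce action is given ({value_tokens} value-holding tokens)"
        )),
    }
}

fn main() {
    // `E(f32) : A ;` -> one value-holding token, action may be omitted.
    println!("{:?}", check_rule(Some("f32"), false, 1));
    // `M(f32): M star m2=M` without an action -> rejected in this model.
    println!("{:?}", check_rule(Some("f32"), false, 3));
    // `WS0: space*;` -> no RuleType, no action needed.
    println!("{:?}", check_rule(None, false, 1));
}
```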
65 changes: 29 additions & 36 deletions README.md
@@ -3,7 +3,7 @@ yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from C

```
[dependencies]
rusty_lr = "0.12.3"
rusty_lr = "1.0.0"
```

## Features
@@ -17,12 +17,12 @@ rusty_lr = "0.12.3"
#### Why proc-macro, not external executable?
- Decent built-in lexer, with consideration of unicode and comments.
- Can generate *pretty* error messages, by just passing `Span` data.
- With modern IDE, can see errors in real-time with specific location.
- With modern IDE, auto-completion and error highlighting can be done in real-time.

## Sample

- [Calculator](example/calculator): calculator with enum `Token`
- [Calculator with `u8`](example/calculator_u8): calculator with `u8`
- [Calculator u8](example/calculator_u8): calculator with `u8`
- [Bootstrap](rusty_lr_parser/src/parser.rs): bootstrapped line parser of `lr1!` and `lalr1!` macro

Please refer to the [example](example) directory for the full example.
@@ -61,28 +61,27 @@ lr1! {

WS0: space*;

Digit: zero | one | two | three | four | five | six | seven | eight | nine;
Digit(u8): zero | one | two | three | four | five | six | seven | eight | nine;

Number(i32): WS0 Digit+ WS0 { std::str::from_utf8(Digit.slice).unwrap().parse().unwrap() };
Number(i32): WS0 Digit+ WS0 { std::str::from_utf8(&Digit).unwrap().parse().unwrap() };

A(f32): A plus a2=A {
*data += 1; // access userdata by `data`
println!( "{:?} {:?} {:?}", A.slice, *plus, a2.slice );
*A + *a2
println!( "{:?} {:?} {:?}", A, plus as char, a2 );
A + a2
}
| M { *M }
| M
;

M(f32): M star m2=M { *M * *m2 }
| P { *P }
M(f32): M star m2=M { M * m2 }
| P
;

P(f32): Number { *Number as f32 }
| WS0 lparen E rparen WS0 { *E }
P(f32): Number { Number as f32 }
| WS0 lparen E rparen WS0 { E }
;

E(f32) : A { *A };

E(f32) : A ;
}
```

@@ -116,8 +115,8 @@ fn main() {

The result will be:
```
[51, 32] 43 [32, 52, 32]
[32, 32, 49, 32] 43 [32, 32, 50, 48, 32, 42, 32, 32, 32, 40, 51, 32, 43, 32, 52, 32, 41, 32, 32, 32]
3.0 '+' 4.0
1.0 '+' 140.0
result: 141
userdata: 2
```
@@ -235,43 +234,37 @@ Define the type of value that this production rule holds.
Define the action to be executed when the rule is matched and reduced.
If `<RuleType>` is defined, `<ReduceAction>` itself must be the value of `<RuleType>` (i.e. no semicolon at the end of the statement).

`<ReduceAction>` can be omitted if:
- `<RuleType>` is not defined
- Only one token is holding value in the production rule

### Accessing token data in ReduceAction

**predefined variables** can be used in `<ReduceAction>`:
- `s` : slice of shifted terminal symbols `&[<TermType>]` captured by current rule.
- `data` : userdata passed to `feed()` function.

To access the data of each token, you can directly use the name of the token as a variable.
For non-terminal symbols, the type of data is [`rusty_lr::NonTermData`](rusty_lr_core/src/nontermdata.rs).
You can access the value of `<RuleType>` by `NonTerm.value`(by value) or `*NonTerm`(by `Deref`), slice by `NonTerm.slice`.
For terminal symbols, the type of data is [`rusty_lr::TermData`](rusty_lr_core/src/termdata.rs).
You can access the shifted terminal symbol by `Term.value`(`&TermType`) or `*Term`(`&TermType`).
For non-terminal symbols, the type of variable is `<RuleType>`.
For terminal symbols, the type of variable is `%tokentype`.

If multiple variables are defined with the same name, the variable on the front-most will be used.

For regex-like pattern, `<RuleType>` will be modified by following:
For regex-like pattern, type of variable will be modified by following:
| Pattern | Non-Terminal<br/>`<RuleType>=T` | Non-Terminal<br/>`<RuleType>=(not defined)` | Terminal |
|:-------:|:--------------:|:--------------------------:|:--------:|
| '*' | `Vec<T>` | (not defined) | (not defined) |
| '+' | `Vec<T>` | (not defined) | (not defined) |
| '?' | `Option<T>` | (not defined) | (not defined) |
| '*' | `Vec<T>` | (not defined) | `Vec<TermType>` |
| '+' | `Vec<T>` | (not defined) | `Vec<TermType>` |
| '?' | `Option<T>` | (not defined) | `Option<TermType>` |

For example, following code will print the value of each `A`, and the slice of each `A` and `plus` token in the production rule `E -> A plus A`.
```rust
%token plus ...;

E : A plus? a2=A*
{
println!("Value of 1st A: {}", A.value); // i32
println!("Slice of 1st A: {:?}", A.slice);
println!("Value of 2nd As: {:?}", a2.value); // Vec<i32>
println!("Slice of 2nd As: {:?}", a2.slice);

if let plus.slice.len() == 0 {
// plus is not captured
}else {
// plus is captured
}
println!("Value of 1st A: {}", A); // i32
println!("Value of plus: {:?}", plus); // Option<TermType>
println!("Value of 2nd As: {:?}", a2); // Vec<i32>
}
;

@@ -291,7 +284,7 @@ Define the type of `Err` variant in `Result<(), Err>` returned from `<ReduceActi
## Start Parsing
`lr1!` and `lalr1!` will generate struct `<StartSymbol>Parser`.

The struct has the following functions:
The parser struct has the following functions:
- `new()` : create new parser
- `begin(&self)` : create new context
- `feed(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>` : feed token to the parser
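Under the new rules, the README's `E : A plus? a2=A*` example receives plain values typed per the regex-pattern table: `A` is `i32`, `plus?` on a terminal is `Option<TermType>`, and `a2=A*` is `Vec<i32>`. The plain-Rust function below is a hypothetical stand-in for that reduce action, not generated code; `TermType = u8` and the name `reduce_e` are assumptions, and summing the values is only there to give the sketch a result.

```rust
/// Assumed terminal type (the u8 calculator example uses `u8` tokens).
type TermType = u8;

/// Hypothetical stand-in for the action of `E : A plus? a2=A* { ... }`.
/// `A(i32)` arrives as `i32`, `plus?` as `Option<TermType>`,
/// and `a2=A*` as `Vec<i32>`.
fn reduce_e(a: i32, plus: Option<TermType>, a2: Vec<i32>) -> i32 {
    println!("Value of 1st A: {}", a);
    println!("Value of plus: {:?}", plus);
    println!("Value of 2nd As: {:?}", a2);

    // Whether `plus` was matched is now a plain `Option` check
    // instead of inspecting a captured slice.
    match plus {
        Some(tok) => println!("plus is captured: {:?}", tok as char),
        None => println!("plus is not captured"),
    }

    // Sum the values only so the sketch has an observable result.
    a + a2.iter().sum::<i32>()
}

fn main() {
    // Shaped like `1 + 2 3`: `plus` present, two trailing `A`s.
    println!("{}", reduce_e(1, Some(b'+'), vec![2, 3]));
    // Shaped like `1`: no `plus`, no trailing `A`s.
    println!("{}", reduce_e(1, None, Vec::new()));
}
```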
29 changes: 14 additions & 15 deletions example/calculator/src/parser.rs
@@ -74,33 +74,32 @@ lalr1! {
// s is slice of shifted terminal symbols captured by current rule
// userdata can be accessed by `data` ( &mut i32, for this situation )
A(i32) : A plus a2=A {
println!("{:?} {:?} {:?}", A.slice, *plus, a2.slice );
// ^ ^ ^
// | | |- slice of 2nd 'A'
// | |- &Token
// |- slice of 1st 'A'
println!( "{:?}", s );
println!("{:?} {:?} {:?}", A, plus, a2 );
// ^ ^ ^
// | | |- value of 2nd 'A'
// | |- Token
// |- value of 1st 'A'
*data += 1;
A.value + a2.value // --> this will be new value of current 'A'
// ^ ^
// | |- value of 2nd 'A'
A + a2 // --> this will be new value of current 'A'
// ^ ^
// | |- value of 2nd 'A'
// |- value of 1st 'A'
}
| M { M.value }
| M
;

M(i32) : M star m2=M { *M * *m2 }
| P { *P }
M(i32) : M star m2=M { M * m2 }
| P
;

P(i32) : num {
if let Token::Num(n) = *num { *n }
if let Token::Num(n) = num { n }
else { return Err(format!("{:?}", num)); }
// ^^^^^^^^^^^^^^^^^^^^^^^^^^
// reduce action returns Result<(), String>
}
| lparen E rparen { *E }
| lparen E rparen { E }
;

E(i32) : A { *A };
E(i32) : A;
}
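After this change the calculator's actions work on values directly, e.g. `if let Token::Num(n) = num { n }` instead of dereferencing `*num`. The sketch below restates those actions as ordinary Rust functions to show the types involved. The `Token` enum is a simplified assumption (the example's real enum is outside this hunk), the function names are hypothetical, and `Err(String)` models the action's `return Err(format!(...))`.

```rust
/// Simplified stand-in for the calculator example's `Token` enum.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i32),
    Plus,
    Star,
}

/// `P(i32) : num { if let Token::Num(n) = num { n } else { return Err(...) } }`
/// `num` is the token itself, so `n` is an `i32` rather than `&i32`.
fn reduce_p_num(num: Token) -> Result<i32, String> {
    if let Token::Num(n) = num {
        Ok(n)
    } else {
        Err(format!("{:?}", num))
    }
}

/// `A(i32) : A plus a2=A { ...; *data += 1; A + a2 }` with `data: &mut i32`.
fn reduce_a_plus(a: i32, plus: Token, a2: i32, data: &mut i32) -> i32 {
    println!("{:?} {:?} {:?}", a, plus, a2);
    *data += 1;
    a + a2
}

/// `M(i32) : M star m2=M { M * m2 }`
fn reduce_m_star(m: i32, _star: Token, m2: i32) -> i32 {
    m * m2
}

fn main() {
    let mut data = 0; // userdata passed to `feed()`
    // 3 + 4 * 5, reduced bottom-up as the parser would.
    let three = reduce_p_num(Token::Num(3)).unwrap();
    let four = reduce_p_num(Token::Num(4)).unwrap();
    let five = reduce_p_num(Token::Num(5)).unwrap();
    let product = reduce_m_star(four, Token::Star, five);
    let sum = reduce_a_plus(three, Token::Plus, product, &mut data);
    println!("result: {}, userdata: {}", sum, data);
    // A non-number token makes the `P` action fail.
    println!("{:?}", reduce_p_num(Token::Plus));
}
```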
20 changes: 10 additions & 10 deletions example/calculator_u8/src/parser.rs
@@ -27,26 +27,26 @@ lr1! {

WS0: space*;

Digit: zero | one | two | three | four | five | six | seven | eight | nine;
Digit(u8): zero | one | two | three | four | five | six | seven | eight | nine;

Number(i32): WS0 Digit+ WS0 { std::str::from_utf8(Digit.slice).unwrap().parse().unwrap() };
Number(i32): WS0 Digit+ WS0 { std::str::from_utf8(&Digit).unwrap().parse().unwrap() };

A(f32): A plus a2=A {
*data += 1; // access userdata by `data`
println!( "{:?} {:?} {:?}", A.slice, *plus, a2.slice );
*A + *a2
println!( "{:?} {:?} {:?}", A, plus as char, a2 );
A + a2
}
| M { *M }
| M
;

M(f32): M star m2=M { *M * *m2 }
| P { *P }
M(f32): M star m2=M { M * m2 }
| P
;

P(f32): Number { *Number as f32 }
| WS0 lparen E rparen WS0 { *E }
P(f32): Number { Number as f32 }
| WS0 lparen E rparen WS0 { E }
;

E(f32) : A { *A };
E(f32) : A ;

}
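Giving `Digit` a type (`Digit(u8)`) is what lets `Digit+` arrive as a `Vec<u8>`, so `std::str::from_utf8(&Digit)` replaces `Digit.slice`; terminals are raw `u8` bytes, hence `plus as char`. The plain-Rust sketch below mirrors those two actions; the function names are hypothetical and this is not the macro's generated code.

```rust
/// `Number(i32): WS0 Digit+ WS0 { std::str::from_utf8(&Digit).unwrap().parse().unwrap() }`
/// Each `Digit` is an ASCII byte, so `Digit+` is a `Vec<u8>` such as `b"140"`.
fn reduce_number(digit: Vec<u8>) -> i32 {
    std::str::from_utf8(&digit).unwrap().parse().unwrap()
}

/// `A(f32): A plus a2=A { *data += 1; println!(..., A, plus as char, a2); A + a2 }`
/// The terminal `plus` is the raw byte `b'+'`.
fn reduce_a_plus(a: f32, plus: u8, a2: f32, data: &mut i32) -> f32 {
    *data += 1;
    println!("{:?} {:?} {:?}", a, plus as char, a2);
    a + a2
}

fn main() {
    let mut data = 0;
    let n = reduce_number(b"140".to_vec());
    // prints "1.0 '+' 140.0"
    let sum = reduce_a_plus(1.0, b'+', n as f32, &mut data);
    println!("result: {}, userdata: {}", sum, data);
}
```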
6 changes: 3 additions & 3 deletions rusty_lr/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr"
version = "0.12.3"
version = "1.0.0"
edition = "2021"
license = "MIT"
description = "yacc-like, proc-macro based LR(1) and LALR(1) parser generator and code generation"
@@ -10,8 +10,8 @@ keywords = ["parser", "yacc", "context-free-grammar", "lr", "compiler"]
categories = ["parsing"]

[dependencies]
rusty_lr_core = "0.10.0"
rusty_lr_derive = "0.11.6"
rusty_lr_core = "1.0.0"
rusty_lr_derive = "1.0.0"
# rusty_lr_core = { path = "../rusty_lr_core" }
# rusty_lr_derive = { path = "../rusty_lr_derive" }

2 changes: 1 addition & 1 deletion rusty_lr_core/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_core"
version = "0.10.0"
version = "1.0.0"
edition = "2021"
license = "MIT"
description = "yacc-like, proc-macro based LR(1) and LALR(1) parser generator and code generation"
5 changes: 0 additions & 5 deletions rusty_lr_core/src/lib.rs
@@ -1,9 +1,7 @@
pub(crate) mod grammar;
pub(crate) mod nontermdata;
pub(crate) mod parser;
pub(crate) mod rule;
pub(crate) mod state;
pub(crate) mod termdata;
pub(crate) mod token;

// reexport
@@ -26,6 +24,3 @@ pub use parser::callback::DefaultCallback;
pub use parser::context::Context;
pub use parser::error::ParseError;
pub use parser::parser::Parser;

pub use nontermdata::NonTermData;
pub use termdata::TermData;
35 changes: 0 additions & 35 deletions rusty_lr_core/src/nontermdata.rs

This file was deleted.

28 changes: 0 additions & 28 deletions rusty_lr_core/src/termdata.rs

This file was deleted.
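With `NonTermData` and `TermData` deleted, actions no longer go through a wrapper's `.value`/`.slice` fields or `Deref`. The contrast below uses a simplified, hypothetical `Wrapped<T>` modeled only on the README's description of the old API (`.value`, `.slice`, `Deref`); the deleted files' actual definitions are not shown in this diff.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for the removed wrappers: a value plus the slice of
/// terminals it was reduced from. NOT the deleted rusty_lr_core definition.
struct Wrapped<'a, T> {
    value: T,
    slice: &'a [u8],
}

impl<T> Deref for Wrapped<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

/// Old style: `M(f32): M star m2=M { *M * *m2 }` dereferenced both operands.
fn old_style(m: Wrapped<'_, f32>, m2: Wrapped<'_, f32>) -> f32 {
    *m * *m2
}

/// New style: `M(f32): M star m2=M { M * m2 }` receives the values directly.
fn new_style(m: f32, m2: f32) -> f32 {
    m * m2
}

fn main() {
    let input = b"2*3";
    let m = Wrapped { value: 2.0_f32, slice: &input[0..1] };
    let m2 = Wrapped { value: 3.0_f32, slice: &input[2..3] };
    // The old API also exposed the captured slices; the new one does not.
    println!("old slices: {:?} {:?}", m.slice, m2.slice);
    println!("old: {}", old_style(m, m2));
    println!("new: {}", new_style(2.0, 3.0));
}
```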

4 changes: 2 additions & 2 deletions rusty_lr_derive/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_derive"
version = "0.11.6"
version = "1.0.0"
edition = "2021"
license = "MIT"
description = "yacc-like, proc-macro based LR(1) and LALR(1) parser generator and code generation"
@@ -15,4 +15,4 @@ proc-macro = true
[dependencies]
proc-macro2 = "1.0.86"
# rusty_lr_parser = { path = "../rusty_lr_parser" }
rusty_lr_parser = "0.3.2"
rusty_lr_parser = "1.0.0"
2 changes: 1 addition & 1 deletion rusty_lr_expand/Cargo.toml
@@ -8,4 +8,4 @@ proc-macro2 = "1.0.86"
quote = "1.0"
clap = { version = "4.5.7", features = ["derive"] }
# rusty_lr_parser = { path = "../rusty_lr_parser" }
rusty_lr_parser = "0.3.2"
rusty_lr_parser = "1.0.0"
4 changes: 2 additions & 2 deletions rusty_lr_parser/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_parser"
version = "0.3.2"
version = "1.0.0"
edition = "2021"
license = "MIT"
description = "macro line parser for rusty_lr"
@@ -12,5 +12,5 @@ categories = ["parsing"]
[dependencies]
proc-macro2 = "1.0.86"
quote = "1.0"
rusty_lr_core = "0.10.0"
rusty_lr_core = "1.0.0"
# rusty_lr_core = { path = "../rusty_lr_core" }