refactored and clean for real world usage
    - bump version to 2.0.0 (breaking API change to ParseError)
    - core: fix ParseError variants
    - core, parser: remove feed_callback()
    - core: add type aliases for emitted Parser
    - exec: fix print shift/reduce conflict resolving only on verbose option
    - parser: replace expect with unwrap() for guaranteed unreachable
    - add(parser) exclamation pattern
ehwan committed Aug 14, 2024
1 parent 4946715 commit f0ef458
Showing 35 changed files with 2,013 additions and 2,938 deletions.
500 changes: 199 additions & 301 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion example/lrtest/Cargo.toml
@@ -4,4 +4,4 @@ version = "0.1.0"
edition = "2021"

[dependencies]
rusty_lr = { path = "../../rusty_lr" }
rusty_lr = { path = "../../rusty_lr", features = ["fxhash", "builder"] }
6 changes: 3 additions & 3 deletions example/lrtest/src/main.rs
@@ -1,7 +1,7 @@
use rusty_lr::*;

fn main() {
let mut grammar = Grammar::new();
let mut grammar = builder::Grammar::new();

let _ = grammar.set_reduce_type('0', ReduceType::Right);

@@ -23,8 +23,8 @@ fn main() {
Ok(_) => {
println!("Build successful");
}
Err(e) => {
println!("Build failed: {:?}", e);
Err(_) => {
println!("Build failed");
}
}
}
Binary file modified images/error1.png
Binary file modified images/error2.png
Binary file removed images/error3.png
Binary file not shown.
7 changes: 4 additions & 3 deletions rusty_lr/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr"
version = "1.6.1"
version = "2.0.0"
edition = "2021"
license = "MIT"
description = "yacc-like, LR(1) and LALR(1) parser generator and code generation"
@@ -10,14 +10,15 @@ keywords = ["parser", "yacc", "context-free-grammar", "lr", "compiler"]
categories = ["parsing"]

[dependencies]
rusty_lr_core = "1.4"
rusty_lr_derive = "1.3"
rusty_lr_core = "2.0"
rusty_lr_derive = "1.4"
# rusty_lr_core = { path = "../rusty_lr_core" }
# rusty_lr_derive = { path = "../rusty_lr_derive" }

[features]
default = []
fxhash = ["rusty_lr_core/fxhash"]
builder = ["rusty_lr_core/builder"]
# default = ["core", "derive"]
# core = ["rusty_lr_core"]
# derive = ["rusty_lr_derive"]
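The hunk above adds a `builder` feature next to the existing `fxhash` feature, each forwarding to the matching feature of `rusty_lr_core`. A downstream manifest that enables both might look like the following sketch. The version requirement `"2.0"` is an assumption taken from the version bump in this commit; note that Cargo expects the plural key `features`.

```toml
[dependencies]
# Assumed version requirement, based on the 2.0.0 bump in this commit.
# `fxhash` swaps in FxHashMap; `builder` enables the `builder` module.
rusty_lr = { version = "2.0", features = ["fxhash", "builder"] }
```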
285 changes: 181 additions & 104 deletions rusty_lr/src/lib.rs
@@ -1,146 +1,223 @@
//! # RustyLR
//! yacc-like LR(1) and LALR(1) Deterministic Finite Automata (DFA) generator from Context Free Grammar (CFGs).
//!
//! RustyLR provides both [executable](#executable-rustylr) and [procedural macros](#proc-macro-syntax) to generate LR(1) and LALR(1) parser.
//!
//! `features=["fxhash"]` to replace `std::collections::HashMap` with [`FxHashMap`](https://github.com/rust-lang/rustc-hash)
//! RustyLR provides both an [executable](#executable-rustylr) and [procedural macros](#proc-macro) to generate LR(1) and LALR(1) parsers.
//! The generated parser is pure Rust code, and the DFA is built at compile time.
//! Reduce action can be written in Rust code,
//! and the error messages are readable and detailed with [executable](#executable-rustylr).
//! For huge and complex grammars, it is recommended to use the [executable](#executable-rustylr) version.
//!
//! By default, RustyLR uses [`std::collections::HashMap`] for the parser tables.
//! If you want to use `FxHashMap` from [`rustc-hash`](https://github.com/rust-lang/rustc-hash), add `features=["fxhash"]` to your `Cargo.toml`.
//! ```toml
//! [dependencies]
//! rusty_lr = { version = "...", features = ["fxhash"] }
//! ```
//!
//! ## Features
//! - pure Rust implementation
//! - readable error messages, both for grammar building and parsing
//! - compile-time DFA construction from CFGs
//! - customizable reduce action
//! - resolving conflicts of ambiguous grammar
//! - tracing parser action with callback
//! - regex patterns partially supported
//! - executable for generating parser tables from CFGs
//!
//! ### Simple definition of CFG
//! ```rust
//! lr1! {
//! // userdata type
//! %userdata i32;
//! // token type
//! %tokentype char;
//! // start symbol
//! %start E;
//! // eof symbol
//! %eof '\0';
//!
//! // token definition
//! %token zero '0';
//! %token one '1';
//! %token two '2';
//! %token three '3';
//! %token four '4';
//! %token five '5';
//! %token six '6';
//! %token seven '7';
//! %token eight '8';
//! %token nine '9';
//! %token plus '+';
//! %token star '*';
//! %token lparen '(';
//! %token rparen ')';
//! %token space ' ';
//!
//! // conflict resolving
//! %left plus;
//! %left star;
//!
//! // context-free grammars
//! WS0: space*;
//!
//! Digit(char): [zero-nine];
//!
//! Number(i32): WS0 Digit+ WS0 { Digit.into_iter().collect::<String>().parse().unwrap() };
//!
//! A(f32): A plus a2=A {
//! *data += 1; // access userdata by `data`
//! println!( "{:?} {:?} {:?}", A, plus, a2 );
//! A + a2 // this will be the new value of A
//! }
//! | M
//! ;
//!
//! M(f32): M star m2=M { M * m2 }
//! | P
//! ;
//!
//! P(f32): Number { Number as f32 }
//! | WS0 lparen E rparen WS0 { E }
//! ;
//!
//! E(f32) : A ;
//! }
//! ```
//! ## proc-macro syntax
//! - executable for generating parser tables
//!
//! ## proc-macro
//! Below procedural macros are provided:
//! - `lr1!`, `lalr1!`
//!
//! These macros will define three structs: `Parser`, `Context`, and `enum NonTerminals`, prefixed by `<StartSymbol>`.
//! In most cases, what you want is the `Parser` struct, which contains the DFA states and `feed()` functions.
//! Please refer to the [Start Parsing](#start-parsing) section below for actual usage of the `Parser` struct.
//! - [`lr1!`] : LR(1) parser
//! - [`lalr1!`] : LALR(1) parser
//!
//! Those macros (those without '_runtime' suffix) will generate `Parser` struct at compile-time.
//! The calculation of building DFA will be done at compile-time, and the generated code will be *TONS* of `insert` of tokens one by one.
//!
//! [Bootstrap](https://github.com/ehwan/RustyLR/blob/main/rusty_lr_parser/src/parser/parser.rs), [Expanded Bootstrap](https://github.com/ehwan/RustyLR/blob/main/rusty_lr_parser/src/parser/parser_expanded.rs) would be a good example to understand the syntax and generated code. It is RustyLR syntax parser written in RustyLR itself.
//!
//! Every line in the macro must follow the syntax [here](https://github.com/ehwan/RustyLR?tab=readme-ov-file#proc-macro-syntax)
//! These macros will generate structs:
//! - `Parser` : contains DFA tables and production rules
//! - [`ParseError`] : type alias for `Error` returned from `feed()`
//! - `Context` : contains current state and data stack
//! - `enum NonTerminals` : a list of non-terminal symbols
//! - [`Rule`](`ProductionRule`) : type alias for production rules
//! - [`State`] : type alias for DFA states
//!
//! All structs above are prefixed by `<StartSymbol>`.
//! In most cases, what you want is the `Parser` and `ParseError` structs, and the others are used internally.
//!
//! ## Start Parsing
//! `<StartSymbol>Parser` will be generated by the procedural macros.
//!
//! The parser struct has the following functions:
//! The `Parser` struct has the following functions:
//! - `new()` : create new parser
//! - `begin(&self)` : create new context
//! - `feed(&self, &mut Context, TermType, &mut UserData) -> Result<(), ParseError>` : feed token to the parser
//! - `feed_callback(&self, &mut Context, &mut C: Callback, TermType, &mut UserData) -> Result<(), ParseError>` : feed token with callback
//! - `feed(&self, &mut Context, TerminalType, &mut UserData) -> Result<(), ParseError>` : feed token to the parser
//!
//! Note that the parameter `&mut UserData` is omitted if `%userdata` is not defined.
//! Once the input sequence has been fed (including the `eof` token) without errors, you can get the value of the start symbol by calling `context.accept()`.
//! All you need to do is to call `new()` to generate the parser, and `begin()` to create a context.
//! Then, you can feed the input sequence one by one with `feed()` function.
//! Once the input sequence has been fed (including the `eof` token) without errors,
//! you can get the value of start symbol by calling `context.accept()`.
//!
//! ```ignore
//! let parser = parser::EParser::new();
//! // create context
//! let mut context = parser.begin();
//! // define userdata
//! let mut userdata: i32 = 0;
//!
//! // start feeding tokens
//! ```rust
//! let parser = Parser::new();
//! let mut context = parser.begin();
//! for token in input_sequence {
//! match parser.feed(&mut context, token, &mut userdata) {
//! // ^^^^^ ^^^^^^^^^^^^ userdata passed here as `&mut i32`
//! // |- feed token
//! match parser.feed(&mut context, token) {
//! Ok(_) => {}
//! Err(e) => {
//! Err(e) => { // e: ParseError
//! println!("{}", e);
//! // println!( "{}", e.long_message() ); // for more detailed error message
//! return;
//! }
//! }
//! }
//! // res = value of start symbol
//! let res = context.accept();
//! println!("{}", res);
//! println!("userdata: {}", userdata);
//! let start_symbol_value = context.accept();
//! ```
//!
//!
//! ## executable `rustylr`
//! ## Error Handling
//! There are two error variants returned from `feed()` function:
//! - `InvalidTerminal(InvalidTerminalError)` : when invalid terminal symbol is fed
//! - `ReduceAction(ReduceActionError)` : when the reduce action returns `Err(Error)`
//!
//! For `ReduceActionError`, the error type can be defined by [`%err`](#error-type-optional) directive. If not defined, `String` will be used.
//!
//! When printing the error message, there are two ways to get the error message:
//! - `e.long_message( &parser.rules, &parser.states, &context.state_stack )` : get the error message as `String`, in a detailed format
//! - `e as Display` : briefly print the short message through `Display` trait.
//!
//! The `long_message` function requires the reference to the parser's rules and states, and the context's state stack.
//! It will make a detailed error message of what current state was trying to parse, and what the expected terminal symbols were.
//! ### Example of long_message
//! ```text
//! Invalid Terminal: *
//! Expected one of: , (, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
//! -------------------------------Backtracing state--------------------------------
//! WS0 -> • _RustyLRGenerated0
//! _RustyLRGenerated1 -> •
//! _RustyLRGenerated1 -> • _RustyLRGenerated1
//! _RustyLRGenerated0 -> • _RustyLRGenerated1
//! _RustyLRGenerated0 -> •
//! Number -> • WS0 _RustyLRGenerated3 WS0
//! M -> • M * M
//! M -> M * • M
//! M -> • P
//! P -> • Number
//! P -> • WS0 ( E ) WS0
//! -----------------------------------Prev state-----------------------------------
//! M -> M • * M
//! -----------------------------------Prev state-----------------------------------
//! A -> • A + A
//! A -> A + • A
//! A -> • M
//! M -> • M * M
//! -----------------------------------Prev state-----------------------------------
//! A -> A • + A
//! -----------------------------------Prev state-----------------------------------
//! A -> • A + A
//! E -> • A
//! Augmented -> • E
//! ```
//! cargo install rustylr
//!
//! ## Syntax
//! To start writing down a context-free grammar, you need to define necessary directives first.
//! This is the syntax of the procedural macros.
//!
//! ```rust
//! lr1! {
//! // %directives
//! // %directives
//! // ...
//! // %directives
//!
//! // NonTerminalSymbol(RuleType): ProductionRules
//! // NonTerminalSymbol(RuleType): ProductionRules
//! // ...
//! }
//! ```
//!
//! `lr1!` macro will generate a parser struct with LR(1) DFA tables.
//! If you want to generate LALR(1) parser, use `lalr1!` macro.
//! Every line in the macro must follow the syntax below.
//!
//! Syntax can be found in [repository](https://github.com/ehwan/RustyLR/tree/coreclean?tab=readme-ov-file#syntax).
//!
//!
//! ## executable `rustylr`
//! An executable version of `lr1!` and `lalr1!` macro.
//! Converts a context-free grammar into deterministic finite automaton (DFA) tables,
//! and generates Rust code that can be used as a parser for that grammar.
//!
//! ```
//! cargo install rustylr
//! ```
//!
//! This executable will provide much more detailed, pretty-printed error messages than the procedural macros.
//! If you are writing a huge, complex grammar, it is recommended to use this executable rather than the procedural macros.
//! `--verbose` option is useful for debugging the grammar. It will print the auto-generated rules and the resolving process of shift/reduce conflicts.
//! `--verbose` option is useful for debugging the grammar.
//! It will print where the auto-generated rules originated and how shift/reduce conflicts were resolved.
//! [like](https://github.com/ehwan/RustyLR/blob/main/images/example1.png) [this](https://github.com/ehwan/RustyLR/blob/main/images/example2.png)
//!
//! That said, the proc-macros remain convenient for small grammars,
//! since modern IDE features (rust-analyzer's auto completion, inline error messages) can be enabled.
//!
//! This program searches for `%%` in the input file. ( Not the `lr1!`, `lalr1!` macro )
//!
//! The contents before `%%` will be copied into the output file as it is.
//! The context-free grammar must follow `%%`.
//! Each line must follow the syntax of [rusty_lr#syntax](#syntax)
//!
//! ```rust
//! // my_grammar.rs
//! use some_crate::some_module::SomeStruct;
//!
//! enum SomeTypeDef {
//! A,
//! B,
//! C,
//! }
//!
//! %% // <-- input file split here
//!
//! %tokentype u8;
//! %start E;
//! %eof b'\0';
//!
//! %token a b'a';
//! %token lparen b'(';
//! %token rparen b')';
//!
//! E: lparen E rparen
//! | P
//! ;
//!
//! P: a;
//! ```
//!
//! Running the command will generate the Rust file `my_parser.rs`.
//! ```
//! $ rustylr my_grammar.rs my_parser.rs --verbose
//! ```
//!
//!
//! Possible options can be found by `--help`.
//! ```
//! $ rustylr --help
//! Usage: rustylr [OPTIONS] <INPUT_FILE> [OUTPUT_FILE]
//!
//! Arguments:
//! <INPUT_FILE>
//! input_file to read
//!
//! [OUTPUT_FILE]
//! output_file to write
//!
//! [default: out.tab.rs]
//!
//! Options:
//! --no-format
//! do not rustfmt the output
//!
//! -l, --lalr
//! build LALR(1) parser
//!
//! -v, --verbose
//! print debug information.
//!
//! print the auto-generated rules, and where they are originated from.
//! print the shift/reduce conflicts, and the resolving process.
//! ```
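The docs above say the executable looks for `%%` in the input file, copies everything before it verbatim, and treats what follows as the grammar. A minimal sketch of that split is shown here; `split_grammar_file` is a hypothetical helper written for illustration, not rustylr's actual implementation.

```rust
/// Hypothetical helper (not part of rustylr): splits an input file at the
/// first `%%` marker. The first half is the Rust prelude to copy as-is,
/// the second half is the grammar definition. Returns `None` when the
/// marker is missing.
fn split_grammar_file(source: &str) -> Option<(&str, &str)> {
    source.split_once("%%")
}
```

With the `my_grammar.rs` example from the docs, the first half would hold the `use` line and the `enum SomeTypeDef` definition, and the second half would start at `%tokentype u8;`.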
// re-exports
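The documentation in this hunk describes a `new()` / `begin()` / `feed()` / `accept()` cycle and a `ParseError` with two variants, `InvalidTerminal` and `ReduceAction`. The sketch below imitates that shape with toy types so the control flow can be read in isolation. Every name prefixed with `Toy` is a hypothetical stand-in, and the payload types (`char`, `String`) are assumptions; the real generated types carry richer data.

```rust
use std::fmt;

// Hypothetical stand-in for the generated `ParseError`; payloads are assumed.
#[derive(Debug, PartialEq)]
enum ToyParseError {
    // A terminal the current state cannot accept.
    InvalidTerminal(char),
    // A reduce action returned an error.
    ReduceAction(String),
}

impl fmt::Display for ToyParseError {
    // Short message, analogous to printing the error through `Display`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyParseError::InvalidTerminal(c) => write!(f, "Invalid Terminal: {}", c),
            ToyParseError::ReduceAction(msg) => write!(f, "Reduce action failed: {}", msg),
        }
    }
}

// Hypothetical context: holds the running state, here just a digit sum.
struct ToyContext {
    sum: u32,
}

impl ToyContext {
    // Returns the final value once all tokens were fed without error.
    fn accept(self) -> u32 {
        self.sum
    }
}

// Hypothetical parser: accepts decimal digits only.
struct ToyParser;

impl ToyParser {
    fn new() -> Self {
        ToyParser
    }

    fn begin(&self) -> ToyContext {
        ToyContext { sum: 0 }
    }

    // Takes the context by `&mut`, since feeding a token changes its state.
    fn feed(&self, context: &mut ToyContext, token: char) -> Result<(), ToyParseError> {
        let digit = token
            .to_digit(10)
            .ok_or(ToyParseError::InvalidTerminal(token))?;
        context.sum = context
            .sum
            .checked_add(digit)
            .ok_or_else(|| ToyParseError::ReduceAction("sum overflow".to_string()))?;
        Ok(())
    }
}

// Feeds every token, stopping at the first error.
fn parse_all(input: &str) -> Result<u32, ToyParseError> {
    let parser = ToyParser::new();
    let mut context = parser.begin();
    for token in input.chars() {
        parser.feed(&mut context, token)?;
    }
    Ok(context.accept())
}
```

Calling `parse_all("123")` feeds three tokens and accepts, while `parse_all("1*")` stops at `*` with the `InvalidTerminal` variant. The toy has no counterpart to `long_message()`, which in the real crate additionally needs the parser's rules and states and the context's state stack.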

3 changes: 2 additions & 1 deletion rusty_lr_core/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty_lr_core"
version = "1.4.1"
version = "2.0.0"
edition = "2021"
license = "MIT"
description = "yacc-like, LR(1) and LALR(1) parser generator and code generation"
@@ -16,3 +16,4 @@ rustc-hash = { version = "2.0", optional = true }
default = []
# use `rustc-hash` crate for hash map
fxhash = ["dep:rustc-hash"]
builder = []