PoC: Pratt parsing with shunting yard algorithm #618
39555 wants to merge 34 commits into winnow-rs:main
// what we are expecting to parse next
let mut waiting_operand = true;
// a stack for computing the result
let mut value_stack = Vec::<Operand>::new();
We'll need to mark this as requiring std. That is the one benefit to recursion that it can operate in no_std environments.
A user provided stack similar to Accumulate
Ah, curious idea to explore. I wouldn't put this as a blocker but we can create an issue and see if it garners interest
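A minimal sketch of what a user-provided stack could look like, assuming a small trait in the spirit of `Accumulate`. The names `Stack`, `ArrayStack`, and `sum_with` are hypothetical and not part of winnow; the fixed-capacity variant avoids heap allocation, which is the property a `no_std` user would be after.

```rust
/// Hypothetical trait: the minimal stack interface the algorithm would need.
pub trait Stack<T> {
    /// Hands the value back if there is no room left.
    fn push(&mut self, value: T) -> Result<(), T>;
    fn pop(&mut self) -> Option<T>;
    fn len(&self) -> usize;
}

/// Heap-backed stack: what the PoC uses today.
impl<T> Stack<T> for Vec<T> {
    fn push(&mut self, value: T) -> Result<(), T> {
        Vec::push(self, value);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        Vec::pop(self)
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

/// Hypothetical fixed-capacity stack that never allocates.
pub struct ArrayStack<T, const N: usize> {
    items: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayStack<T, N> {
    pub fn new() -> Self {
        // `std::array::from_fn` is a re-export of `core::array::from_fn`.
        Self { items: std::array::from_fn(|_| None), len: 0 }
    }
}

impl<T, const N: usize> Stack<T> for ArrayStack<T, N> {
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value); // capacity exhausted
        }
        self.items[self.len] = Some(value);
        self.len += 1;
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        self.items[self.len].take()
    }
    fn len(&self) -> usize {
        self.len
    }
}

/// Stand-in for the parser: generic over whichever stack the caller supplies.
/// Returns `None` if the stack runs out of room.
pub fn sum_with<S: Stack<i32>>(stack: &mut S, values: &[i32]) -> Option<i32> {
    for &v in values {
        stack.push(v).ok()?;
    }
    let mut total = 0;
    while let Some(v) = stack.pop() {
        total += v;
    }
    Some(total)
}

fn main() {
    let mut heap = Vec::<i32>::new();
    assert_eq!(sum_with(&mut heap, &[1, 2, 3]), Some(6));

    let mut fixed = ArrayStack::<i32, 2>::new();
    assert_eq!(sum_with(&mut fixed, &[1, 2, 3]), None);
}
```

A fallible `push` is one possible design choice here: it lets a bounded stack report exhaustion as a parse error instead of panicking.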
let mut value_stack = Vec::<Operand>::new();
let mut operator_stack = Vec::<Operator<'_, Operand>>::new();

'parse: loop {
Our use of a loop with waiting_operand reminds me of rust-lang/rfcs#3720
src/combinator/shunting_yard.rs
Outdated
nit: I think I'd name this something like unwind_operator_stack_to to make it clear what the condition is for unwinding
        _ => fail
    },
),
trace("postfix", fail),
precedence could put these traces on the parameters it passes to shunting_yard
Granted, encouraging users to do it makes the parameter list easier to read
dispatch! {peek(any);
    '(' => delimited('(', trace("recursion", parser), cut_err(')')),
    _ => digit1.parse_to::<i32>()
},
This still requires recursion to handle parentheses, but avoiding that is likely only possible for trivial expressions. This also puts the responsibility for recursion on the user's side, so they know it's happening and can account for it as needed (e.g. with a depth check).
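A minimal sketch of such a depth check, written without winnow so it stands alone. The `operand` function, `OperandError`, and the `MAX_DEPTH` limit are hypothetical names chosen for illustration; a real user would thread the depth through their own operand parser.

```rust
/// Hypothetical nesting limit chosen by the user.
const MAX_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum OperandError {
    TooDeep,
    Unexpected,
}

/// Parses either a run of digits or `( operand )`, tracking nesting depth.
/// On success, `input` is advanced past the consumed text.
fn operand(input: &mut &str, depth: usize) -> Result<i32, OperandError> {
    if depth > MAX_DEPTH {
        return Err(OperandError::TooDeep);
    }
    let s = *input;
    if let Some(rest) = s.strip_prefix('(') {
        *input = rest;
        // The recursion is explicit here, so the caller controls the limit.
        let value = operand(input, depth + 1)?;
        let after = *input;
        *input = after.strip_prefix(')').ok_or(OperandError::Unexpected)?;
        return Ok(value);
    }
    let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, rest) = s.split_at(end);
    let value = digits.parse::<i32>().map_err(|_| OperandError::Unexpected)?;
    *input = rest;
    Ok(value)
}

fn main() {
    let mut input = "((42))";
    assert_eq!(operand(&mut input, 0), Ok(42));
    assert_eq!(input, "");

    let deep = "(".repeat(1000);
    let mut input = deep.as_str();
    assert_eq!(operand(&mut input, 0), Err(OperandError::TooDeep));
}
```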
src/combinator/shunting_yard.rs
Outdated
It's not clear to me what problem you are having here
The user needs to manually convert &|_| {} into &dyn Fn
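A minimal sketch of the coercion issue, independent of winnow. Every closure has its own anonymous type, so two match arms that each return `(power, &closure)` do not unify unless something forces them to a common `&dyn Fn` type. The names `UnaryOp` and `apply_prefix` are hypothetical.

```rust
/// Hypothetical alias for what a prefix parser would return: (power, operation).
type UnaryOp<'a> = (u32, &'a dyn Fn(i32) -> i32);

fn apply_prefix(symbol: char, value: i32) -> Option<i32> {
    let neg = |x: i32| -x;
    let not = |x: i32| !x;

    // Without the `as &dyn Fn(i32) -> i32` casts (and with no type annotation),
    // the two arms would have different tuple types, because `neg` and `not`
    // are distinct closure types.
    let (_power, op) = match symbol {
        '-' => (9u32, &neg as &dyn Fn(i32) -> i32),
        '!' => (9u32, &not as &dyn Fn(i32) -> i32),
        _ => return None,
    };
    Some(op(value))
}

fn main() {
    assert_eq!(apply_prefix('-', 5), Some(-5));
    assert_eq!(apply_prefix('!', 0), Some(-1)); // bitwise NOT of 0
    assert_eq!(apply_prefix('?', 1), None);

    // Annotating the binding with the alias is another way to name the target type.
    let double = |x: i32| x * 2;
    let op: UnaryOp<'_> = (1, &double);
    assert_eq!((op.1)(4), 8);
}
```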
src/combinator/shunting_yard.rs
Outdated
the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.
So the two powers are more of a raw implementation, and the associativity enums are an abstraction over them?
Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0
The enum in chumsky automatically bumps the value with a clever trick:
impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }
    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}

// if eval_stack.len() > 1 {
//     // Error: value left on stack
// }
Error kinds. Algorithm has missing operand and value left on stack
I can see it being important to know of a "missing operand".
What end-user condition leaves a value on the stack or is that more of an assert?
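A minimal sketch of where the two conditions could surface on a value stack, assuming plain `i32` operands. `EvalError`, `apply_infix`, and `finish` are hypothetical names. Whether "value left on stack" is reachable by an end user or is purely an internal assert depends on how strictly the parser alternates between operands and operators, which this sketch does not settle.

```rust
#[derive(Debug, PartialEq)]
enum EvalError {
    MissingOperand,
    ValueLeftOnStack,
}

/// Reduces one infix operator: pops two operands, pushes the result.
fn apply_infix(stack: &mut Vec<i32>, op: fn(i32, i32) -> i32) -> Result<(), EvalError> {
    let rhs = stack.pop().ok_or(EvalError::MissingOperand)?;
    let lhs = stack.pop().ok_or(EvalError::MissingOperand)?;
    stack.push(op(lhs, rhs));
    Ok(())
}

/// Called once every operator has been reduced: exactly one value must remain.
fn finish(mut stack: Vec<i32>) -> Result<i32, EvalError> {
    let value = stack.pop().ok_or(EvalError::MissingOperand)?;
    if stack.is_empty() {
        Ok(value)
    } else {
        Err(EvalError::ValueLeftOnStack)
    }
}

fn main() {
    let add: fn(i32, i32) -> i32 = |a, b| a + b;

    // `1 + 2`: both operands are present.
    let mut stack = vec![1, 2];
    assert_eq!(apply_infix(&mut stack, add), Ok(()));
    assert_eq!(finish(stack), Ok(3));

    // `1 +`: the operator has nothing on its right.
    let mut stack = vec![1];
    assert_eq!(apply_infix(&mut stack, add), Err(EvalError::MissingOperand));

    // Two operands with no operator between them.
    assert_eq!(finish(vec![1, 2]), Err(EvalError::ValueLeftOnStack));
}
```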
If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.
I assume this style of API, removing RefCell and allowing dispatch!, could be applied to #614.
What is your overall impression of the two?
I'm also curious about the performance of recursion vs iteration but I suspect some differences, like use of dispatch! would bias things
I will write a benchmark with dispatches and stripped-down RefCells and tuples. We will see which is best. The explicit stack is nice if the interface allows the user to customize the type, such as VecDeque or SmallVec, or something for no_std. Both functions are really similar except for the recursion part.
A really great description of this algorithm: https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md and a nice implementation in C#: https://github.com/erikeidt/Draconum/blob/master/src/3.%20Expression%20Parser/Expression%20Parser%20Library/ExpressionParser.cs
Co-authored-by: Ed Page <eopage@gmail.com>
This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)
works without it
…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)
The API has a soundness problem: the `Parser` trait is implemented on `&Operator`, but inside `parse_next` a mutable ref and `RefCell::borrow_mut` are used, which can lead to potential problems. We can return to the API later. For now, let's keep only the essential algorithm and pass the affix parsers as 3 separate entities. Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)
I will write the documentation later
- require explicit `trace` for operators - fix associativity handling for infix operators: `1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`
- ternary operator - function call - index
- fix failing tests related to the ternary operator and commas
# Conflicts: # src/combinator/mod.rs
updates from precedence.rs: - enum `Assoc` - associativity `Neither` - function pointers `fn()` instead of `&dyn Fn`
This commit:
* Fixes errors due to winnow 0.7 migration
* Adapts winnow-rs/winnow#620 to work outside winnow
* Uncomments test cases
For the migration steps, I've attached CHANGELOG.md line numbers for context. Use winnow-rs/winnow@73c6e05 for the version of CHANGELOG.md in winnow-rs/winnow, as the line numbers will inevitably change with time. The main migration steps are:
* `PResult` replaced with `winnow::Result` (L153, L92-L98)
* Use `winnow::Result` over `ModalResult` when `cut_err` isn't used
* Swap `ErrMode::from_error_kind` to `ParserError::from_input` (L89)
References: winnow-rs/winnow#618
References: winnow-rs/winnow#620
References: winnow-rs/winnow#622
Attempt №2 #614
This is a much smaller implementation based on the modified shunting yard from https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:
- A `Vec` stack is now used, with one stack for operands and another for operators.
- `RefCell`s

Differences from https://en.wikipedia.org/wiki/Shunting_yard_algorithm:
- `operand` using recursive sub-expression similar to Precedence parsing rust-bakery/nom#1362 (Wikipedia hardcodes braces into the algorithm itself)
- The `operator_precedence_parsing` repository introduces special `prefix_action` and `postfix_action` mutable closures for handling braces. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for braces is undesirable.

This is extremely barebones for now, without the fancy UX we will agree on later. It is 3 parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return `(power, &dyn Fn(O) -> O)` and the infix should return the ugly `(left_power, right_power, &dyn Fn(O) -> O)`, just 2 powers for now without a trick with the `Assoc` enum.

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

Minor things to consider
- A user provided stack similar to `Accumulate`
- Error kinds. Algorithm has `missing operand` and `value left on stack`

EDIT: This algorithm is described as `The-Double-E-Method` in https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md
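A minimal sketch of the two-stack loop described above, written against a plain string instead of winnow parsers. It is an illustration under stated assumptions, not the PR's implementation: operators are a hypothetical `Op` enum instead of `&dyn Fn`, the `Assoc` power mapping follows the chumsky snippet quoted in the review, the prefix power of 5 is an arbitrary choice, and overflow is not handled.

```rust
#[derive(Clone, Copy)]
enum Assoc {
    Left(u32),
    Right(u32),
}

impl Assoc {
    /// (left_power, right_power), using the doubling trick quoted in the review.
    fn powers(self) -> (u32, u32) {
        match self {
            Assoc::Left(p) => (p * 2, p * 2 + 1),
            Assoc::Right(p) => (p * 2 + 1, p * 2),
        }
    }
}

#[derive(Clone, Copy)]
enum Op {
    Neg,
    Add,
    Sub,
    Mul,
    Pow,
}

/// Applies one operator to the value stack; `None` means a missing operand.
fn reduce(values: &mut Vec<i32>, op: Op) -> Option<()> {
    let rhs = values.pop()?;
    let result = match op {
        Op::Neg => -rhs,
        Op::Add => values.pop()? + rhs,
        Op::Sub => values.pop()? - rhs,
        Op::Mul => values.pop()? * rhs,
        Op::Pow => {
            if rhs < 0 {
                return None;
            }
            values.pop()?.pow(rhs as u32)
        }
    };
    values.push(result);
    Some(())
}

/// Reduces stacked operators whose right power exceeds `left_power`.
fn unwind_to(values: &mut Vec<i32>, ops: &mut Vec<(u32, Op)>, left_power: u32) -> Option<()> {
    while let Some(&(right_power, op)) = ops.last() {
        if right_power <= left_power {
            break;
        }
        ops.pop();
        reduce(values, op)?;
    }
    Some(())
}

fn eval(input: &str) -> Option<i32> {
    let mut values: Vec<i32> = Vec::new(); // operand stack
    let mut ops: Vec<(u32, Op)> = Vec::new(); // (right power, operator)
    let mut waiting_operand = true; // what we are expecting to parse next
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        if waiting_operand {
            match c {
                // Prefix minus: power 5 is an assumption (between `*` and `^`).
                '-' => ops.push((5, Op::Neg)),
                '0'..='9' => {
                    let mut n = c.to_digit(10)? as i32;
                    while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                        n = n * 10 + d as i32;
                        chars.next();
                    }
                    values.push(n);
                    waiting_operand = false;
                }
                _ => return None,
            }
        } else {
            let (assoc, op) = match c {
                '+' => (Assoc::Left(1), Op::Add),
                '-' => (Assoc::Left(1), Op::Sub),
                '*' => (Assoc::Left(2), Op::Mul),
                '^' => (Assoc::Right(3), Op::Pow),
                _ => return None,
            };
            let (left_power, right_power) = assoc.powers();
            unwind_to(&mut values, &mut ops, left_power)?;
            ops.push((right_power, op));
            waiting_operand = true;
        }
    }
    if waiting_operand {
        return None; // input ended where an operand was expected
    }
    unwind_to(&mut values, &mut ops, 0)?;
    let result = values.pop()?;
    if values.is_empty() { Some(result) } else { None }
}

fn main() {
    assert_eq!(eval("8 - 3 - 2"), Some(3)); // left-associative: (8 - 3) - 2
    assert_eq!(eval("2 ^ 3 ^ 2"), Some(512)); // right-associative: 2 ^ (3 ^ 2)
    assert_eq!(eval("1 + 2 * 3"), Some(7));
    assert_eq!(eval("1 +"), None);
}
```

Parentheses are deliberately left out; as in the PR, they would be handled by the operand parser recursing into a sub-expression.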