PoC: Pratt parsing with `shunting yard` algorithm by 39555 · Pull Request #618 · winnow-rs/winnow

39555 · 2024-11-14T13:20:39Z

Attempt №2 #614

This is a much smaller implementation based on the modified shunting yard algorithm from https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:

No more recursion. The explicit Vec stack is now used, with one stack for operands and another for operators.
Without RefCells

Differences from the https://en.wikipedia.org/wiki/Shunting_yard_algorithm:

Parsing is done in a single pass without first converting the expression to Polish notation.
Parentheses '(' are handled as an operand through a recursive sub-expression, similar to "Precedence parsing" rust-bakery/nom#1362 (Wikipedia hardcodes parentheses into the algorithm itself)
- The operator_precedence_parsing repository introduces special prefix_action and postfix_action mutable closures for handling parentheses. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for parentheses is undesirable.

This is extremely barebones for now, without the fancy UX we will agree on later. It is 3 parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return (power, &dyn Fn(O) -> O) and the infix parser should return the ugly (left_power, right_power, &dyn Fn(O) -> O): just 2 powers for now, without the Assoc enum trick.
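To make the two-stack idea concrete, here is a minimal sketch that evaluates an already tokenized expression. It is not the PR's code: the `Token`, `Pending`, `EvalError`, `unwind_to` and `eval` names are hypothetical, plain `fn` pointers stand in for the parsers' fold closures, the infix fold is assumed to take two operands, and the rule "fold while the pending power is greater than the incoming left power" is an assumption modelled on the description above.

```rust
// Hypothetical token model: the PoC takes three winnow parsers instead.
#[derive(Clone, Copy)]
enum Token {
    Num(i64),
    Prefix(i64, fn(i64) -> i64),          // (power, fold)
    Postfix(i64, fn(i64) -> i64),         // (power, fold)
    Infix(i64, i64, fn(i64, i64) -> i64), // (left_power, right_power, fold)
}

// An operator waiting on the operator stack for its right-hand operand.
#[derive(Clone, Copy)]
enum Pending {
    Unary(i64, fn(i64) -> i64),
    Binary(i64, fn(i64, i64) -> i64), // stores the right power
}

#[derive(Debug, PartialEq)]
enum EvalError {
    MissingOperand,
    UnexpectedToken,
}

// Fold pending operators while they bind tighter than `power`.
fn unwind_to(power: i64, values: &mut Vec<i64>, ops: &mut Vec<Pending>) -> Result<(), EvalError> {
    while let Some(&top) = ops.last() {
        let top_power = match top {
            Pending::Unary(p, _) | Pending::Binary(p, _) => p,
        };
        if top_power <= power {
            break;
        }
        ops.pop();
        match top {
            Pending::Unary(_, fold) => {
                let v = values.pop().ok_or(EvalError::MissingOperand)?;
                values.push(fold(v));
            }
            Pending::Binary(_, fold) => {
                let rhs = values.pop().ok_or(EvalError::MissingOperand)?;
                let lhs = values.pop().ok_or(EvalError::MissingOperand)?;
                values.push(fold(lhs, rhs));
            }
        }
    }
    Ok(())
}

fn eval(tokens: &[Token]) -> Result<i64, EvalError> {
    let mut values = Vec::new(); // operand stack
    let mut ops = Vec::new(); // operator stack
    let mut waiting_operand = true; // what we expect to see next
    for &token in tokens {
        match (waiting_operand, token) {
            (true, Token::Num(n)) => {
                values.push(n);
                waiting_operand = false;
            }
            (true, Token::Prefix(power, fold)) => ops.push(Pending::Unary(power, fold)),
            (false, Token::Postfix(power, fold)) => {
                unwind_to(power, &mut values, &mut ops)?;
                let v = values.pop().ok_or(EvalError::MissingOperand)?;
                values.push(fold(v));
            }
            (false, Token::Infix(left, right, fold)) => {
                unwind_to(left, &mut values, &mut ops)?;
                ops.push(Pending::Binary(right, fold));
                waiting_operand = true;
            }
            (true, _) => return Err(EvalError::MissingOperand),
            (false, _) => return Err(EvalError::UnexpectedToken),
        }
    }
    if waiting_operand {
        return Err(EvalError::MissingOperand);
    }
    // End of input: fold everything that is still pending.
    unwind_to(i64::MIN, &mut values, &mut ops)?;
    values.pop().ok_or(EvalError::MissingOperand)
}
```

With `+` given powers (1, 2) and `*` given (3, 4), `eval` folds `1 + 2 * 3` as `1 + (2 * 3)`, and a trailing infix operator is reported as a missing operand.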

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

Minor things to consider

A user provided stack similar to Accumulate
Error kinds. The algorithm has "missing operand" and "value left on stack"

EDIT: This algorithm is described as The-Double-E-Method in https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md

epage · 2024-11-14T14:54:24Z

src/combinator/shunting_yard.rs

+    // what we expecting to parse next
+    let mut waiting_operand = true;
+    // a stack for computing the result
+    let mut value_stack = Vec::<Operand>::new();


We'll need to mark this as requiring std. That is the one benefit of recursion: it can operate in no_std environments.

A user provided stack similar to Accumulate

Ah, curious idea to explore. I wouldn't put this as a blocker but we can create an issue and see if it garners interest

coveralls · 2024-11-14T14:55:30Z

Pull Request Test Coverage Report for Build 11838052714

Details

0 of 54 (0.0%) changed or added relevant lines in 1 file are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.7%) to 40.843%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/combinator/shunting_yard.rs	0	54	0.0%

Files with Coverage Reduction	New Missed Lines	%
src/stream/mod.rs	1	24.95%

Totals
Change from base Build 11802515306:	-0.7%
Covered Lines:	1298
Relevant Lines:	3178

💛 - Coveralls

epage · 2024-11-14T14:57:35Z

src/combinator/shunting_yard.rs

+    let mut value_stack = Vec::<Operand>::new();
+    let mut operator_stack = Vec::<Operator<'_, Operand>>::new();
+
+    'parse: loop {


Our use of a loop with waiting_operand reminds me of rust-lang/rfcs#3720

epage · 2024-11-14T15:00:00Z

src/combinator/shunting_yard.rs

nit: I think I'd name this something like unwind_operator_stack_to to make it clear what the condition is for unwinding

epage · 2024-11-14T15:01:12Z

src/combinator/shunting_yard.rs

+                    _ => fail
+                },
+            ),
+            trace("postfix", fail),


precedence could put these traces on the parameters it passes to shunting_yard

Granted, encouraging users to do it makes the parameter list easier to read

epage · 2024-11-14T15:03:25Z

src/combinator/shunting_yard.rs

+                dispatch! {peek(any);
+                    '(' => delimited('(', trace("recursion",  parser), cut_err(')')),
+                    _ => digit1.parse_to::<i32>()
+                },


This still requires recursion to handle parentheses, but avoiding that is likely only feasible with trivial expressions. This also puts the responsibility for recursion on the user's side, so they know it's happening and can account for it as needed (e.g. with a depth check)
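A depth check on the user's side could look roughly like the sketch below. It uses plain `&mut &str` functions rather than winnow parsers, supports only `+` and parentheses, and the names `MAX_DEPTH`, `ParseError`, `expr` and `operand` are all hypothetical.

```rust
// Hypothetical limit; pick whatever fits the target's stack size.
const MAX_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
    Unexpected,
}

// expr := operand ('+' operand)*
fn expr(input: &mut &str, depth: usize) -> Result<i64, ParseError> {
    let mut acc = operand(input, depth)?;
    while let Some(rest) = input.strip_prefix('+') {
        *input = rest;
        acc += operand(input, depth)?;
    }
    Ok(acc)
}

// operand := digits | '(' expr ')'
fn operand(input: &mut &str, depth: usize) -> Result<i64, ParseError> {
    if let Some(rest) = input.strip_prefix('(') {
        // The recursion is explicit here, so the caller can bound it.
        if depth >= MAX_DEPTH {
            return Err(ParseError::TooDeep);
        }
        *input = rest;
        let value = expr(input, depth + 1)?;
        *input = input.strip_prefix(')').ok_or(ParseError::Unexpected)?;
        return Ok(value);
    }
    // Take the leading run of ASCII digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    let value = digits.parse::<i64>().map_err(|_| ParseError::Unexpected)?;
    *input = rest;
    Ok(value)
}
```

Calling `expr(&mut input, 0)` on `"1+(2+(3+4))"` consumes the whole input, while an input that opens more than `MAX_DEPTH` parentheses fails with `TooDeep` instead of overflowing the call stack.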

epage · 2024-11-14T15:03:52Z

src/combinator/shunting_yard.rs

It's not clear to me what problem you are having here

The user needs to manually convert &|_| {} into &dyn Fn
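As an illustration of that conversion, here is a small sketch outside winnow. Each closure has its own anonymous type, so two match arms returning `&|x| ...` do not unify unless each is cast to the trait object. The function `apply_prefix` and the power value `9` are hypothetical.

```rust
// Pick a prefix operator at run time and apply it to `operand`.
// Returns the operator's (hypothetical) power together with the result.
fn apply_prefix(symbol: char, operand: i64) -> Option<(i64, i64)> {
    // Without the `as &dyn Fn(i64) -> i64` casts the arms would have
    // two different closure types and the match would not type-check.
    let (power, fold) = match symbol {
        '-' => (9_i64, (&|x: i64| -x) as &dyn Fn(i64) -> i64),
        '+' => (9_i64, (&|x: i64| x) as &dyn Fn(i64) -> i64),
        _ => return None,
    };
    Some((power, fold(operand)))
}
```

The parentheses around each closure matter: `&|x| -x as T` would apply the cast to the closure body `-x` rather than to the reference.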

epage · 2024-11-14T15:11:54Z

src/combinator/shunting_yard.rs

the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.

So the two powers are more of a raw implementation and the associativity enums are an abstraction over it?

Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html

expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0

The enum in chumsky automatically bumps the value with a clever trick:

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }

    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}
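A self-contained sketch of how such powers decide grouping follows. The `Assoc` enum mirrors the snippet above with `u32` levels, and `folds_first` is a hypothetical helper that assumes the comparison used in matklad's loop (stop when the next operator's left power is lower than the current minimum).

```rust
#[derive(Debug, Clone, Copy)]
enum Assoc {
    Left(u32),
    Right(u32),
}

impl Assoc {
    fn left_power(self) -> u32 {
        match self {
            Assoc::Left(x) => x * 2,
            Assoc::Right(x) => x * 2 + 1,
        }
    }

    fn right_power(self) -> u32 {
        match self {
            Assoc::Left(x) => x * 2 + 1,
            Assoc::Right(x) => x * 2,
        }
    }
}

// In `a OP1 b OP2 c`, is `a OP1 b` folded before OP2 is consumed?
// Assumption: OP1 wins when OP2's left power is below OP1's right power.
fn folds_first(op1: Assoc, op2: Assoc) -> bool {
    op2.left_power() < op1.right_power()
}
```

For two operators on the same level, `Left` yields powers (2n, 2n + 1), so the first operator folds first, and `Right` yields (2n + 1, 2n), so the second operator keeps consuming operands.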

epage · 2024-11-14T15:14:15Z

src/combinator/shunting_yard.rs

+    // if eval_stack.len() > 1 {
+    //     // Error: value left on stack
+    // }


Error kinds. The algorithm has "missing operand" and "value left on stack"

I can see it being important to know of a "missing operand".

What end-user condition leaves a value on the stack or is that more of an assert?

epage · 2024-11-14T15:16:58Z

src/combinator/shunting_yard.rs

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

I assume this style of API, removing RefCell and allowing dispatch!, could be applied to #614.

What is your overall impression of the two?

I'm also curious about the performance of recursion vs iteration, but I suspect some differences, like the use of dispatch!, would bias things

I will write a benchmark with dispatches and stripped-down RefCells and tuples. We will see which is best. The explicit stack is nice if the interface allows the user to customize the type, such as VecDeque or SmallVec or something for no_std. Both functions are really similar except for the recursion part
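One possible shape for a user-provided stack is sketched below. Nothing here exists in winnow: the `Stack` trait, `ArrayStack` and the `add_all` stand-in for the algorithm are hypothetical, and `ArrayStack` only illustrates that a fixed-capacity stack can avoid an allocator.

```rust
use std::collections::VecDeque;

// Hypothetical abstraction over the explicit stack.
trait Stack<T> {
    // Returns the value back when the stack is full.
    fn push(&mut self, value: T) -> Result<(), T>;
    fn pop(&mut self) -> Option<T>;
}

impl<T> Stack<T> for Vec<T> {
    fn push(&mut self, value: T) -> Result<(), T> {
        Vec::push(self, value);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        Vec::pop(self)
    }
}

impl<T> Stack<T> for VecDeque<T> {
    fn push(&mut self, value: T) -> Result<(), T> {
        self.push_back(value);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        self.pop_back()
    }
}

// Fixed-capacity stack that needs no allocator.
struct ArrayStack<T, const N: usize> {
    items: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayStack<T, N> {
    fn new() -> Self {
        Self { items: core::array::from_fn(|_| None), len: 0 }
    }
}

impl<T, const N: usize> Stack<T> for ArrayStack<T, N> {
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = Some(value);
        self.len += 1;
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        self.items[self.len].take()
    }
}

// Stand-in for the parsing loop: it only sees the `Stack` trait.
fn add_all<S: Stack<i64>>(stack: &mut S, values: &[i64]) -> Option<i64> {
    for &v in values {
        stack.push(v).ok()?; // a full stack becomes `None`
    }
    let mut total = 0;
    while let Some(v) = stack.pop() {
        total += v;
    }
    Some(total)
}
```

A fallible `push` is the main design consequence: with a fixed-capacity stack the algorithm needs an error path for "expression too deep" that `Vec` never triggers.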

39555 · 2024-11-15T17:58:27Z

A really great description of this algorithm https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md and a nice implementation in C# https://github.com/erikeidt/Draconum/blob/master/src/3.%20Expression%20Parser/Expression%20Parser%20Library/ExpressionParser.cs

The API has a soundness problem. The `Parser` trait is implemented on `&Operator`, but inside `parse_next` a mutable ref and `RefCell::borrow_mut` are used, which can lead to potential problems. We can return to the API later. But for now let's keep only the essential algorithm and pass the affix parsers as 3 separate entities. Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)

This commit:
* Fixes errors due to winnow 0.7 migration
* Adapts winnow-rs/winnow#620 to work outside winnow
* Uncomments test cases

For the migration steps, I've attached CHANGELOG.md line numbers for context. Use winnow-rs/winnow@73c6e05 for the version of CHANGELOG.md in winnow-rs/winnow, as the line numbers will inevitably change with time.

The main migration steps are:
* `PResult` replaced with `winnow::Result` (L153, L92-L98)
* Use `winnow::Result` over `ModalResult` when `cut_err` isn't used
* Swap `ErrMode::from_error_kind` to `ParserError::from_input` (L89)

References: winnow-rs/winnow#618
References: winnow-rs/winnow#620
References: winnow-rs/winnow#622

feat: implement Pratt parser

fed8c90

epage reviewed Nov 14, 2024

epage mentioned this pull request Nov 15, 2024

perf: bench for pratt and shunting yard parsers #620

Closed

39555 and others added 10 commits November 16, 2024 12:34

commit suggestion

ee4459d

Co-authored-by: Ed Page <eopage@gmail.com>

remove spaces from #[doc(alias = "...")]

4b1499d

remove UnaryOp and BinaryOp in favor of Fn

acf4577

This feature was overengineering, based on the suggestion "Why make our own trait" in winnow-rs#614 (comment)

remove redundant trait impl

a816a1c

works without it

remove allow_unused, move allow(non_snake_case) to where it should be

2a80e65

- based on review "Why allow non_snake_case?" in winnow-rs#614 (comment)
- remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)

stop dumping pratt into combinator namespace

29fe18d

until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"

move important things to go first

5a4f4b4

based on "Organizationally, I prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)

remove wrong and long doc for now

0273a29

I will write the documentation later

fix: precedence for associativity, remove trace()

f218911

- require explicit `trace` for operators
- fix associativity handling for infix operators: `1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`

39555 mentioned this pull request Nov 17, 2024

feat: implement Pratt parsing #614

Closed

8 tasks

39555 added 4 commits November 18, 2024 01:32

switch from &dyn Fn(O) -> O to fn(O) -> O

3d7ef41

feat: pass Input into operator closures

a6cbc1a

add trace for tests parser

29b64fa

feat: operator closures must return PResult

b31a3a3

epage mentioned this pull request Nov 18, 2024

Pratt parsing support #131

Closed

2 tasks

feat: allow the user to specify starting power

33c82f3

39555 added 12 commits November 19, 2024 12:36

feat: enum Assoc for infix operators. Add Neither associativity

040dd85

fix: switch to i64, fix precedence checking

6d88dff

example: pratt expression parser

8f18fc2

feat: complex postfix operators

a4ad844

- ternary operator
- function call
- index

pratt_example: operator closures return PResult

54cb315

test: add tests

d6da343

specify the parser start precedence

c1a8535

- fix failing tests related to the ternary operator and commas

style: fix indentation

a85291b

winnow-rs#622 (comment)

refactor: remove unnecessarily multispace0

39cc484

winnow-rs#622 (comment)

fix: failed tests

c52c10d

use Assoc enum. tests for associativity Neither

d3c3d0a

fix: switch to i64

b7b0629

39555 mentioned this pull request Nov 19, 2024

Pratt example #622

Closed

3 tasks

39555 added 6 commits November 19, 2024 15:19

tests ill-formed expressions

5e7fb65

update benchmark

7b6e3e0

PoC: Pratt parsing with shunting yard algorithm

63e30e1

rename unwind_operators_stack -> unwind_operators_stack_to

4ff9b25

# Conflicts: # src/combinator/mod.rs

refactor: make steps more distinct

7b82b0e

update shunting_yard

be02d0a

updates from precedence.rs:
- enum `Assoc`
- associativity `Neither`
- function pointers `fn()` instead of `&dyn Fn`

39555 force-pushed the shunting-yard branch from ebddbb8 to be02d0a on November 19, 2024 18:55

ssmendon mentioned this pull request Jul 27, 2025

feat: implement Pratt parsing #804

Merged
