Skip to content

PoC: Pratt parsing with shunting yard algorithm#618

Draft
39555 wants to merge 34 commits intowinnow-rs:mainfrom
39555:shunting-yard
Draft

PoC: Pratt parsing with shunting yard algorithm#618
39555 wants to merge 34 commits intowinnow-rs:mainfrom
39555:shunting-yard

Conversation

@39555
Copy link
Contributor

@39555 39555 commented Nov 14, 2024

Attempt №2 #614

This is much smaller implementation based on the modified shunting yard from the https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:

  • No more recursion. The explicit Vec stack is now used, with one stack for operands and another for operators.
  • Without RefCells

Differences from the https://en.wikipedia.org/wiki/Shunting_yard_algorithm:

  • Parsing is done in a single pass without first converting the expression to Polish notation.
  • Braces '(' are handled as an operand using recursive sub-expression similar to Precedence parsing rust-bakery/nom#1362 (Wikipedia hardcodes braces into the algorithm itself)
    • The operator_precedence_parsing repository introduces special prefix_action and postfix_action mutable closures for handling braces. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for braces is undesirable.

This is extremely barebones for now without the fancy UX we will agree later. it is 3 parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return
(power, &dyn Fn(O) -> O) and the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.

If this is the way to go I will apply your rewiew suggestions from #614 where they’re still applicable.

Minor things to consider

  • A user provided stack similar to Accumulate
  • Error kinds. Algorithm has missing operand and value left on stack

EDIT: This algorithm is described as The-Double-E-Method in https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md

// what we expecting to parse next
let mut waiting_operand = true;
// a stack for computing the result
let mut value_stack = Vec::<Operand>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to mark this as requiring std. That is the one benefit to recursion that it can operate in no_std environments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user provided stack similar to Accumulate

Ah, curious idea to explore. I wouldn't put this as a blocker but we can create an issue and see if it garners interest

@coveralls
Copy link

Pull Request Test Coverage Report for Build 11838052714

Details

  • 0 of 54 (0.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.7%) to 40.843%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/combinator/shunting_yard.rs 0 54 0.0%
Files with Coverage Reduction New Missed Lines %
src/stream/mod.rs 1 24.95%
Totals Coverage Status
Change from base Build 11802515306: -0.7%
Covered Lines: 1298
Relevant Lines: 3178

💛 - Coveralls

let mut value_stack = Vec::<Operand>::new();
let mut operator_stack = Vec::<Operator<'_, Operand>>::new();

'parse: loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our use of a loop with waiting_operand reminds me of rust-lang/rfcs#3720

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think I'd name this something like unwind_operator_stack_to to make it clear what the condition is for unwinding

_ => fail
},
),
trace("postfix", fail),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precedence could put these traces on the parameters it passes to shunting_yard

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, encouraging users to do it makes the parameter list easier to read

Comment on lines +172 to +246
dispatch! {peek(any);
'(' => delimited('(', trace("recursion", parser), cut_err(')')),
_ => digit1.parse_to::<i32>()
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still requires recursion to do parenthesis but avoiding that is likely only something that can be handled with trivial expressions. This also puts the responsibility for recursion on the users side so they know its happening and can account for it as needed (e.g. having a depth check)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not clear to me what problem you are having here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user needs to manually convert &|_| {} into &dyn Fn

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.

So the two powers is more of a raw implementation and the associativity enums are an abstraction over it?

Copy link
Contributor Author

@39555 39555 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html

expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0

the enum in chumsky automatically bumps the value with some clever trick

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }

    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}

Comment on lines +104 to +142
// if eval_stack.len() > 1 {
// // Error: value left on stack
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error kinds. Algorithm has missing operand and value left on stack

I can see it being important to know of a "missing operand".

What end-user condition leaves a value on the stack or is that more of an assert?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the way to go I will apply your rewiew suggestions from #614 where they’re still applicable.

I assume this API style of API, removing RefCell and allowing dispatch! could be applied to #614.

What is your overall impression of the two?

I'm also curious about the performance of recursion vs iteration but I suspect some differences, like use of dispatch! would bias things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will write a benchmark with dispatches and stripped down ReffCells and tuples. We will see what is the best. The explicit stack is nice if the interface allows the user to customize the type such as VecDeque or SmallVec or something for no_std. Both functions are really similar except the recursion part

@39555
Copy link
Contributor Author

39555 commented Nov 15, 2024

39555 and others added 10 commits November 16, 2024 12:34
Co-authored-by: Ed Page <eopage@gmail.com>
This feature was an overengineering

based on suggestion "Why make our own trait" in
winnow-rs#614 (comment)
…d be

- based on review "Why allow non_snake_case?"

in winnow-rs#614 (comment)

- remove `allow_unused` based on "Whats getting unused?"
winnow-rs#614 (comment)
until we find a satisfactory api

based on
winnow-rs#614 (comment)

> "We are dumping a lot of stray types into combinator. The single-line
summaries should make it very easy to tell they are related to
precedence"
based on "Organizationally, O prefer the "top level" thing going first
and then branching out from there. In this case, precedence is core."

winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the
`&Operator` but inside `parse_next` a mutable ref and
`ReffCell::borrow_mut` are used which can lead to potential problems.

We can return to the API later. But for now lets keep only the essential
algorithm and pass affix parsers as 3 separate entities

Also add left_binding_power and right_binding_power to the operators
based on
winnow-rs#614 (comment)
I will write the documentation later
- require explicit `trace` for operators
- fix associativity handling for infix operators:
`1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`
@39555 39555 mentioned this pull request Nov 17, 2024
8 tasks
@epage epage mentioned this pull request Nov 18, 2024
2 tasks
@39555 39555 mentioned this pull request Nov 19, 2024
3 tasks
updates from precedence.rs:
- enum `Assoc`
- associativity `Neither`
- function pointers `fn()` instead of `dyn& Fn`
ssmendon added a commit to ssmendon/rollers that referenced this pull request May 31, 2025
This commit:
* Fixes errors due to winnow 0.7 migration
* Adapts winnow-rs/winnow#620 to work outside winnow
* Uncomments test cases

For the migration steps, I've attached CHANGELOG.md line numbers for
context.

Use winnow-rs/winnow@73c6e05 for the
version of CHANGELOG.md in winnow-rs/winnow, as the line numbers will
inevitably change with time.

The main migration steps are:

  * `PResult` replaced with `winnow::Result` (L153, L92-L98)
  * Use `winnow::Result` over `ModalResult` when `cut_err` isn't used
  * Swap `ErrMode::from_error_kind` to `ParserError::from_input` (L89)

References: winnow-rs/winnow#618
References: winnow-rs/winnow#620
References: winnow-rs/winnow#622
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments