
perf: bench for pratt and shunting yard parsers #620

Closed
39555 wants to merge 47 commits into winnow-rs:main from 39555:pratt-perf

Conversation

@39555
Contributor

39555 commented Nov 14, 2024

The results are quite interesting. When parsing without recursive brackets, the two parsers perform on par:

pratt/pratt             time:   [44.447 µs 44.870 µs 45.396 µs]
                        change: [-2.2112% -0.9080% +0.4095%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe


pratt/shunting yard     time:   [45.557 µs 46.063 µs 46.590 µs]
                        change: [-0.0566% +1.2500% +2.5518%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

But when parsing with recursive bracketed subexpressions, the performance of the shunting yard parser declines dramatically:

pratt/pratt             time:   [52.585 µs 53.006 µs 53.450 µs]
                        change: [+16.321% +17.706% +19.164%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild


pratt/shunting yard     time:   [72.964 µs 73.782 µs 74.596 µs]
                        change: [+58.952% +61.360% +63.907%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Linux x86_64, Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz.

The same holds on macOS (M2).

Here are the corresponding flamegraphs (the parsing includes bracketed expressions):

I think it is related to frequent raw_vec::allocate calls followed by drop -> deallocate when parsing recursive bracketed subexpressions such as (1).

@epage
Collaborator

epage commented Nov 14, 2024

Oh, interesting.

If only there was a way to share an "allocator" between each recursive instance of the algorithm. If we provided a custom allocator, users could at least seed it with an expected capacity. Granted, in that case, the user could specify smallvec and put things back on the stack, if the typical stack size is relatively small. The problem is implementing whatever trait we make for smallvec: the orphan rules prevent users from doing it themselves, and I'd prefer to keep external types out of our API.
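
To make that trade-off concrete, the kind of abstraction being discussed might look like the following. This is purely a hypothetical sketch, not a proposed winnow API; the trait name and methods are made up.

trait PrecedenceStack<T>: Default {
    fn with_capacity(capacity: usize) -> Self;
    fn push(&mut self, value: T);
    fn pop(&mut self) -> Option<T>;
}

// winnow could ship the obvious impl for Vec...
impl<T> PrecedenceStack<T> for Vec<T> {
    fn with_capacity(capacity: usize) -> Self {
        Vec::with_capacity(capacity)
    }
    fn push(&mut self, value: T) {
        Vec::push(self, value);
    }
    fn pop(&mut self) -> Option<T> {
        Vec::pop(self)
    }
}

// ...but a user crate could not write `impl PrecedenceStack<T> for SmallVec<[T; 4]>`
// itself: both the trait and the type would be foreign to it, so the orphan rules
// reject the impl, which is the problem described above.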

@epage
Collaborator

epage commented Nov 14, 2024

I did try

#[derive(Default)]
enum Stack<T> {
    #[default]
    Empty,
    One(T),
    Many(Vec<T>),
}

and that saw no performance change, despite my quick and dirty hacks showing the stack sizes were usually 0-2 items, up to 4 items for the benchmark.
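
For reference, pushing onto such a stack promotes the inline variants into a Vec, roughly like this (a sketch of the idea, not necessarily the exact code that was benchmarked):

impl<T> Stack<T> {
    fn push(&mut self, value: T) {
        // Temporarily swap in `Empty` so the current contents can be moved out.
        match std::mem::replace(self, Stack::Empty) {
            Stack::Empty => *self = Stack::One(value),
            Stack::One(first) => *self = Stack::Many(vec![first, value]),
            Stack::Many(mut items) => {
                items.push(value);
                *self = Stack::Many(items);
            }
        }
    }
}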

@epage
Collaborator

epage commented Nov 14, 2024

A capacity of 4 didn't make a difference either.

@epage
Collaborator

epage commented Nov 14, 2024

If we had built-in parenthesis handling, then this would likely not be an issue.

@epage
Collaborator

epage commented Nov 14, 2024

I'm leaning towards going the safety route. The level of nesting used in the benchmark is an extreme case.

@39555
Contributor Author

39555 commented Nov 15, 2024

SmallVec::<[_; 4]> did the trick and made the shunting yard parser slightly faster than Pratt when running without brackets, with comparable timings when brackets are used.
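
For context, the change amounts to swapping the backing storage of the evaluation stacks, roughly as below. This is a sketch with made-up names, not the exact diff in this branch; the first 4 entries stay on the stack and only deeper nesting spills to the heap.

use smallvec::SmallVec;

// Hypothetical shunting-yard state with inline capacity for 4 entries.
struct EvalState<Operand, Operator> {
    operands: SmallVec<[Operand; 4]>,
    operators: SmallVec<[Operator; 4]>,
}

impl<Operand, Operator> EvalState<Operand, Operator> {
    fn new() -> Self {
        Self {
            operands: SmallVec::new(),
            operators: SmallVec::new(),
        }
    }
}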

Report (numbers from M2)

pratt/pratt             time:   [26.492 µs 26.529 µs 26.582 µs]
                        change: [-4.4518% -4.1741% -3.8992%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

pratt/shunting yard     time:   [23.692 µs 23.732 µs 23.776 µs]
                        change: [+0.6546% +0.9260% +1.1760%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Small expressions

Unfortunately, this is not the case for small expressions such as 1+(1+1*3)*1+(1+(1*4))*1*5, where the shunting yard parser is consistently slower.

Report (numbers from M2)


pratt/pratt             time:   [136.57 ns 136.96 ns 137.51 ns]
                        change: [-49.206% -49.000% -48.769%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

pratt/shunting yard     time:   [196.34 ns 196.97 ns 197.86 ns]
                        change: [-38.493% -38.304% -38.096%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Without SmallVec, the performance degrades significantly:

Report (numbers from M2)

pratt/pratt             time:   [136.90 ns 136.99 ns 137.11 ns]
                        change: [-0.8013% -0.3646% +0.0220%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

pratt/shunting yard     time:   [280.28 ns 280.48 ns 280.70 ns]
                        change: [+42.683% +43.255% +43.776%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  13 (13.00%) high severe

And the performance is not much better without brackets:

Report (numbers from M2)

pratt/pratt             time:   [116.45 ns 116.54 ns 116.63 ns]
                        change: [-3.5605% -3.3230% -3.1042%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

pratt/shunting yard     time:   [164.99 ns 165.11 ns 165.23 ns]
                        change: [+2.2790% +2.6136% +2.9324%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

@epage
Collaborator

epage commented Nov 15, 2024

I worry about pulling in a dependency just for one algorithm that most won't use, or about coming up with an appropriate abstraction for the feature. That mostly leaves smallvec support down to how well we can make the stack customizable for people to use it.

Btw, something I forgot to call out in the discussion of trade-offs in this thread: recursion is a more trivial solution for no_std than using the heap.

Also, API-wise, a pratt parser could return a "Pratt" struct with a method for controlling recursion depth, so we could do it on behalf of users.
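
Something along these lines, purely as an illustration; none of these names exist in winnow:

// Hypothetical builder-style return value: the combinator hands back a struct
// whose methods tune the algorithm before it is used as a parser.
pub struct Pratt<Operand, Operator> {
    operand: Operand,
    operator: Operator,
    max_depth: usize,
}

impl<Operand, Operator> Pratt<Operand, Operator> {
    /// Cap how deep prefix chains and right-associative operators may recurse;
    /// past the cap the parser would return an error instead of overflowing.
    pub fn max_recursion_depth(mut self, depth: usize) -> Self {
        self.max_depth = depth;
        self
    }
}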

Over at #618 (comment) you called out the Double E Method.

What's your opinion on:

  • Pratt vs. shunting yard
  • Whether it's worth experimenting with the Double E Method to reduce the pains associated with shunting yard

@39555
Contributor Author

39555 commented Nov 15, 2024

Whether it's worth experimenting with the Double E Method to reduce the pains associated with shunting yard

That link, as I understand it, describes the same approach that I've implemented here and called shunting yard. It's a modification of the original shunting yard algorithm, but it can handle unary operators and can reject ill-formed expressions, similar to the Pratt parser.
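
For readers unfamiliar with the distinction: the key addition over textbook shunting yard is tracking whether an operand or an operator is expected next, which is what makes prefix operators and rejection of ill-formed input possible. A minimal sketch of just that state machine (illustrative only, with single-character tokens; parentheses and the precedence stacks are omitted):

#[derive(Clone, Copy, PartialEq)]
enum Expect {
    Operand,
    Operator,
}

fn validate(tokens: &[char]) -> Result<(), String> {
    let mut expect = Expect::Operand;
    for &t in tokens {
        match (expect, t) {
            // In operand position, '-' is a prefix operator: keep expecting an operand.
            (Expect::Operand, '-') => {}
            (Expect::Operand, d) if d.is_ascii_digit() => expect = Expect::Operator,
            // In operator position, a binary operator sends us back to operand position.
            (Expect::Operator, '+' | '-' | '*' | '/') => expect = Expect::Operand,
            _ => return Err(format!("unexpected token {t:?}")),
        }
    }
    if expect == Expect::Operator {
        Ok(())
    } else {
        Err("expression ended after an operator".into())
    }
}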

We already have two algorithms with similar interfaces. I'm planning to create a comprehensive example, similar to the JSON one, which will be useful to have anyway. Something like parsing function calls or ternary operators might show the advantages and disadvantages of each approach.

39555 and others added 14 commits November 16, 2024 12:34
Co-authored-by: Ed Page <eopage@gmail.com>
This feature was over-engineering

based on suggestion "Why make our own trait" in
winnow-rs#614 (comment)
…d be

- based on review "Why allow non_snake_case?"

in winnow-rs#614 (comment)

- remove `allow_unused` based on "Whats getting unused?"
winnow-rs#614 (comment)
until we find a satisfactory api

based on
winnow-rs#614 (comment)

> "We are dumping a lot of stray types into combinator. The single-line
summaries should make it very easy to tell they are related to
precedence"
based on "Organizationally, O prefer the "top level" thing going first
and then branching out from there. In this case, precedence is core."

winnow-rs#614 (comment)
The API has a soundness problem: the `Parser` trait is implemented on
`&Operator`, but inside `parse_next` a mutable ref and
`RefCell::borrow_mut` are used, which can lead to problems.

We can return to the API later, but for now let's keep only the essential
algorithm and pass the affix parsers as 3 separate entities.

Also add left_binding_power and right_binding_power to the operators
based on
winnow-rs#614 (comment)
I will write the documentation later
# Conflicts:
#	src/combinator/shunting_yard.rs
@39555
Contributor Author

39555 commented Nov 20, 2024

Update with feature-complete parsers

I pushed an updated benchmark inside the example using the bumpalo allocator.
Input: a = 2*-2 * (a ? 1 + 2 * 4 - --a.bar + 2 : 2) / ( &**foo.a->p! -+1) + 3^1 / 4 == 1 * (2 - 7 + 567 *12 /2) + 3*(1+2*( 45 /2))
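
For context, the two arena strategies benchmarked below differ roughly as follows. The Expr type and parse_into helper are stand-ins, not the code in the example; only the bumpalo calls are real API.

use bumpalo::Bump;

#[allow(dead_code)]
enum Expr<'arena> {
    Num(i64),
    Binary(char, &'arena Expr<'arena>, &'arena Expr<'arena>),
}

// Stand-in for running one of the parsers over `input`, allocating every AST
// node with `arena.alloc(...)` inside the fold callbacks.
fn parse_into<'arena>(arena: &'arena Bump, input: &str) -> &'arena Expr<'arena> {
    let _ = input;
    arena.alloc(Expr::Num(0))
}

fn main() {
    // "Reusing the bump arena": allocate once, reset() between parses,
    // so most iterations never touch the global allocator.
    let mut arena = Bump::new();
    for _ in 0..3 {
        parse_into(&arena, "1 + 2 * 3");
        arena.reset();
    }

    // "A new bump arena each time": pay for arena creation and teardown per parse.
    for _ in 0..3 {
        let arena = Bump::new();
        parse_into(&arena, "1 + 2 * 3");
    }
}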

Without bumpalo

pratt/pratt        
                        time:   [2.4638 µs 2.4669 µs 2.4703 µs]
                        change: [+0.4618% +0.6850% +0.9004%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

pratt/shunting_yard     
                        time:   [2.5804 µs 2.5814 µs 2.5829 µs]
                        change: [-0.6435% -0.3945% -0.1386%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

Reusing the bump arena

pratt/pratt             
                        time:   [1.6877 µs 1.6896 µs 1.6915 µs]
                        change: [-1.6832% -1.4469% -1.2231%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

pratt/shunting_yard     
                        time:   [1.9346 µs 1.9364 µs 1.9385 µs]
                        change: [-1.3229% -1.0127% -0.7373%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

A new bump arena each time

pratt/pratt_with_new_bump_each_time
                        time:   [1.7935 µs 1.7950 µs 1.7973 µs]
                        change: [-0.8932% -0.7001% -0.5126%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

spratt/shunting_yard_with_new_bump_each_time
                        time:   [2.0403 µs 2.0420 µs 2.0450 µs]
                        change: [-1.7721% -1.5693% -1.3480%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

@39555
Contributor Author

39555 commented Nov 20, 2024

It is hard to beat the application stack in terms of flexibility, platform support, and speed. The recursion grows only with prefix chains like -+--++-++--+1 and with right-associative infix operators such as pow: 1 ** 2 ** 3 ** 4 ** 5 ** 7 ** 2 ** 1. I assume the user can track the recursion inside the parser and the fold function using the Stateful<> input, as the fold occurs only after all the operands have been parsed.

|i: &mut Stateful<...>| {
    if i.state.recursion > 99 {
        return Err(...);
    }
    i.state.recursion += 1;
    ...
}.value((Right(20), |i: _, a: _, b: _| {
    i.state.recursion -= 1;
    a + b
}))
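
A slightly more concrete version of that sketch, written against the winnow 0.6 API this PR targets; the State type, the depth threshold, and the helper names are made up for illustration:

use winnow::combinator::fail;
use winnow::error::PResult;
use winnow::stream::Stateful;

#[derive(Debug, Clone, Copy, Default)]
struct State {
    recursion: usize,
}

type Input<'i> = Stateful<&'i str, State>;

// Called by a prefix-operator (or right-associative infix) branch before it
// descends: turns runaway nesting into a parse error instead of a stack overflow.
fn enter_recursion(i: &mut Input<'_>) -> PResult<()> {
    if i.state.recursion > 99 {
        return fail(i);
    }
    i.state.recursion += 1;
    Ok(())
}

// Called from the fold callback once the nested operand has been folded.
fn leave_recursion(i: &mut Input<'_>) {
    i.state.recursion = i.state.recursion.saturating_sub(1);
}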

ssmendon added a commit to ssmendon/rollers that referenced this pull request May 31, 2025
This commit:
* Fixes errors due to winnow 0.7 migration
* Adapts winnow-rs/winnow#620 to work outside winnow
* Uncomments test cases

For the migration steps, I've attached CHANGELOG.md line numbers for
context.

Use winnow-rs/winnow@73c6e05 for the
version of CHANGELOG.md in winnow-rs/winnow, as the line numbers will
inevitably change with time.

The main migration steps are:

  * `PResult` replaced with `winnow::Result` (L153, L92-L98)
  * Use `winnow::Result` over `ModalResult` when `cut_err` isn't used
  * Swap `ErrMode::from_error_kind` to `ParserError::from_input` (L89)

References: winnow-rs/winnow#618
References: winnow-rs/winnow#620
References: winnow-rs/winnow#622
ssmendon added a commit to ssmendon/rollers that referenced this pull request Jun 1, 2025
I also edited some benchmark IDs to be consistent. For now, I'm fine
having criterion write to the same ID. It's convenient for my CLI
testing.
@lu-zero lu-zero mentioned this pull request Jul 6, 2025
@epage epage closed this Nov 27, 2025