Conversation
epage
left a comment
There was a problem hiding this comment.
Wow, this looks great. You put a lot of work into this and I see care is taken in cases like the handling of infinite loops and such
src/combinator/precedence.rs
Outdated
| #[doc(alias = "shunting yard")] | ||
| #[doc(alias = "precedence climbing")] | ||
| #[inline(always)] | ||
| pub fn precedence<I, ParseOperand, Operators, Operand: 'static, E>( |
There was a problem hiding this comment.
I like names to make their role in the grammar clear. I'm trying to decide how this does and if there is anything better. We use separated for this kind of thing but unsure how to tie it in without it being overly verbose
There was a problem hiding this comment.
Maybe expression? Similar to how combine names this parser https://docs.rs/combine-language/latest/combine_language/fn.expression_parser.html
There was a problem hiding this comment.
Oh interesting, I wasn't aware of that crate. We should at least add expression_parser to our aliases.
We likely should pick one and deal with it and rename it if a better name comes up. expression seems just as good as any other. It fits with how this would be used in a language grammar.
src/combinator/precedence.rs
Outdated
| }; | ||
|
|
||
| /// An adapter for the [`Parser`] trait to enable its use in the [`precedence`] parser. | ||
| pub trait PrecedenceParserExt<I, E> { |
There was a problem hiding this comment.
Our rule of thumb is grammar-level concepts are standalone functions and value processing are trait functions. These seem grammar-level.
There was a problem hiding this comment.
Thanks for reviewing! Do you suggest making prefix, infix.. standalone functions? I followed the concern in the attached issue to make <parser>.prefix(...) work:
So I assume we'd go with
let calc = pratt( digits1.map(Expr::Int), ( '-'.prefix(Right(1), |r| unary(r, Op::Neg)); '+'.infix(Left(0), |l, r| binary(l, r, Op::Add)); '!'.prefix(Right(3), |r| unary(r, Op::Fact)); ) );
These prefix, infix, postfix are strange beasts in this implementation. They are both grammar-level constructs and value processing functions because the data processing is required by the parser to advance further.
I though about the interface like prefix(0, "-").map(|(a, b)| a - b). It looks clean but what should be the default operator when we don't call map. For a unary it is easy -- just return the argument, but for a binary it is unclear how to merge operands.
There was a problem hiding this comment.
Ha! I previously looked at this and viewed the associativity as not grammar level and now looked at it and viewed it as grammar level. Normally trait methods are reserved for purely operating on the data, like map. This is actually changing how we parse the results. The one weird middle ground we have is cut_err. Unsure if this should be treated like that or not.
I also feel like a bunch of free functions floating around might not be the easiest for discovery and use.
I also suspect the ultimate answer will be dependent on the answer to other questions as I alluded to the interplay of my comments in #614 (comment)
src/combinator/precedence.rs
Outdated
| #[doc(alias = "shunting yard")] | ||
| #[doc(alias = "precedence climbing")] | ||
| #[inline(always)] | ||
| pub fn precedence<I, ParseOperand, Operators, Operand: 'static, E>( |
There was a problem hiding this comment.
The list of parsers will need to be updated
src/combinator/precedence.rs
Outdated
| #[doc(alias = "shunting yard")] | ||
| #[doc(alias = "precedence climbing")] |
There was a problem hiding this comment.
Don't know if aliases with spaces works
src/combinator/precedence.rs
Outdated
| /// Type-erased unary predicate that folds an expression into a new expression. | ||
| /// Useful for supporting not only closures but also arbitrary types as operator predicates within the [`precedence`] parser. | ||
| pub trait UnaryOp<O> { | ||
| /// Invokes the [`UnaryOp`] predicate. | ||
| fn fold_unary(&mut self, o: O) -> O; | ||
| } | ||
| /// Type-erased binary predicate that folds two expressions into a new expression similar to | ||
| /// [`UnaryOp`] within the [`precedence`] parser. | ||
| pub trait BinaryOp<O> { | ||
| /// Invokes the [`BinaryOp`] predicate. | ||
| fn fold_binary(&mut self, lhs: O, rhs: O) -> O; | ||
| } | ||
|
|
||
| impl<O, F> UnaryOp<O> for F | ||
| where | ||
| F: Fn(O) -> O, | ||
| { | ||
| #[inline(always)] | ||
| fn fold_unary(&mut self, o: O) -> O { | ||
| (self)(o) | ||
| } | ||
| } | ||
| impl<O, F> BinaryOp<O> for F | ||
| where |
There was a problem hiding this comment.
I tend to prefer that impls be closely organized with their trait
src/combinator/mod.rs
Outdated
| pub use self::debug::*; | ||
| pub use self::multi::*; | ||
| pub use self::parser::*; | ||
| pub use self::precedence::*; |
There was a problem hiding this comment.
We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence (maybe be the first word) and somehow help the user know what, if any, they should care about
src/combinator/precedence.rs
Outdated
| } | ||
| } | ||
|
|
||
| macro_rules! impl_parser_for_tuple { |
There was a problem hiding this comment.
So we find the right "mode" by falling through an alt of parsers for that mode and then get the weight and fold function from the first success.
Looking at the example
precedence(
digit1.try_map(|d: &str| d.parse::<i32>()),
(
"-".value(2).prefix(|x| -1 * x),
"+".value(2).prefix(|x| x),
"!".value(2).postfix(|x| factorial(x)),
"+".value(0).infix(|a, b| a + b),
"-".value(0).infix(|a, b| a + b),
"*".value(1).infix(|a, b| a * b),
"/".value(1).infix(|a, b| a / b),
),
)
.parse_next(i)the alt seems unfortunate for performance reasons. dispatch can be used in this case and I suspect a lot of other cases. It would be a big help to find a way to set this up so dispatch can be used as well, e.g.
precedence(
digit1.try_map(|d: &str| d.parse::<i32>()),
dispatch!{any;
'-' => prefix(2, |x| -1 * x),
'+' => prefix(2, |x| x),
'!' => postfix(2, |x| factorial(x)),
'+' => infix(0, |a, b| a + b),
'-' => infix(0, |a, b| a + b),
'*' => infix(1, |a, b| a * b),
'/' => infix(1, |a, b| a / b),
_ => fail,
),
)
.parse_next(i)(note: I might give other suggestions that run counter to this, we'll need to weigh out each one and figure out how we want to balance the different interests)
There was a problem hiding this comment.
Another idea. The implementation in nom rust-bakery/nom#1362 uses 3 parsers one for each affix in the function signature. It has an advantage that you can use dispatch and there is no prefix, postfix.. functions. The downside is that from just the function signature it's hard to remember which parser is which. And also fail is used if the parser is not provided.
Maybe abuilder pattern would be ideal e.g:
precedence(digit1.parse_to::<i32>())
.prefix(dispatch!{any;
"+" => empty.map(|_| move |a, b| a + b)
"-" => empty.map(|_| move |a, b| a + b)
})
.postfix(dispatch!{any; ...})
.infix(dispatch!{any; ...})
.build()
.parse_next(i)Methods are similar to repeat().fold. This approach would eliminate stray functions, reduce the performance impact of iterating over each parser for each affix, and allow users to use dispatch or any custom logic they want.
I’m not sure yet how this would interact with binding power. I'm going to research whether we can drop the explicit index argument and rely on array indexing, how binding power interact between affixes .etc.
There was a problem hiding this comment.
I came across another idea. Instead of dispathing by the parser type maybe we could dispatch by the input type:
enum Input<'i, I>{
Prefix(&'i mut I),
Postfix(&'i mut I),
Infix(&'i mut I)
}
then it would be possible to (I'm not sure yet if Deref would work or not)
dispatch!{any;
'-' => prefix(2, |x| -1 * x),
'+' => prefix(2, |x| x),
'!' => postfix(2, |x| factorial(x)),
'+' => infix(0, |a, b| a + b),
'-' => infix(0, |a, b| a + b),
'*' => infix(1, |a, b| a * b),
'/' => infix(1, |a, b| a / b),
_ => fail,
)then prefix parsers would do
if let Prefix() = input {
...
} else {
fail
}There was a problem hiding this comment.
This is nice because we only run the needed parsers (less alt, no dispatch just to get an error).
I agree about the opacity of parameter order. Its also not great if a parameter is unused (e.g. no postfix).
If we go with the "builder" approach, we are again violating the API design guidelines as discussed in #614 (comment). However, it does make it nice for naming the parameters and only requiring what is needed.
Maybe one way of looking at this is the error reporting. What happens if I use an operator in the wrong location?
There was a problem hiding this comment.
Not quite getting the input type and dealing with custom input types in the middle of a parser adds a lot of complexity.
src/combinator/precedence.rs
Outdated
| #[doc(alias = "shunting yard")] | ||
| #[doc(alias = "precedence climbing")] | ||
| #[inline(always)] | ||
| pub fn precedence<I, ParseOperand, Operators, Operand: 'static, E>( |
There was a problem hiding this comment.
Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core.
There was a problem hiding this comment.
Streaming support
This looks to be agnostic of streaming support like separated is
There was a problem hiding this comment.
Hide behind a feature flagunstable-pratt
If this is going to start off unstable, then its fine noting most of my feedback in the "tracking" issue and not resolving all of it here
src/combinator/precedence.rs
Outdated
| // recursive function | ||
| fn precedence_impl<I, ParseOperand, Operators, Operand: 'static, E>( |
There was a problem hiding this comment.
Is there a way we can do this without recursion? Parser authors are generally cautious around recursion due to blowing up the stack.
Hmm, seems all of the other ones use recursion
src/combinator/precedence.rs
Outdated
| } | ||
|
|
||
| // recursive function | ||
| fn precedence_impl<I, ParseOperand, Operators, Operand: 'static, E>( |
There was a problem hiding this comment.
From #614 (comment)
While looking at it I found a potential error. My current implementation is unsound. Consider the 2 + 4 + 7. In the current parser it evaluates as (4 + 7) + 2 instead of (2 + 4) + 7 when operators have equal binding power . That is why the algorithm uses associativity. While + or * are commutative someone might want to parse function calls for example
Co-authored-by: Ed Page <eopage@gmail.com>
This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)
works without it
…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the `&Operator` but inside `parse_next` a mutable ref and `ReffCell::borrow_mut` are used which can lead to potential problems. We can return to the API later. But for now lets keep only the essential algorithm and pass affix parsers as 3 separate entities Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)
I will write the documentation later
This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)
…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the `&Operator` but inside `parse_next` a mutable ref and `ReffCell::borrow_mut` are used which can lead to potential problems. We can return to the API later. But for now lets keep only the essential algorithm and pass affix parsers as 3 separate entities Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)
- require explicit `trace` for operators - fix associativity handling for infix operators: `1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
|
Another difference from the #618 algorithm is that this recursive one backtracks when the left binding power of an operand is less than the current right binding power. It could affect the performance if parsing the operators is expensive. The #618 always stacks all parsed operators into a vector. |
|
Operator callbacks should be able to return fn ternary(cond: Expr) -> ... {
move |i: &mut Input| {
let (a, b) = separated_pair(pratt_parser , ':' , pratt_parser).parse_next(i)?;
Ok(Expr::Ternary(cond, a, b)
}
}
fn pratt_parser() -> ... {
precedence(
...,
trace("postfix", '?'.value((5, ternary))),
...
)
} |
- builder pattern - `Prefix` `Postfix` `Infix` struct tuples
|
Updated with the new API.
use Infix::*;
expression(0, digit1.parse_to::<i32>())
.prefix(dispatch! {any;
'+' => Prefix(12, |_, a| Ok(a)),
'-' => Prefix(12, |_, a: i32| Ok(-a)),
_ => fail
})
.infix(dispatch! {any;
'+' => Left(5, |_, a, b| Ok(a + b)),
'-' => Left(5, |_, a, b| Ok(a - b)),
'*' => Left(7, |_, a, b| Ok(a * b)),
'/' => Left(7, |_, a, b| Ok(a / b)),
'%' => Left(7, |_, a, b| Ok(a % b)),
'^' => Right(9, |_, a, b| Ok(a ^ b)),
_ => fail
})
.parse_next(i) |
src/combinator/expression.rs
Outdated
| #[doc(alias = "precedence_climbing")] | ||
| #[inline(always)] | ||
| pub fn expression<I, ParseOperand, O, E>( | ||
| start_power: i64, |
There was a problem hiding this comment.
Should start_power also be a "builder" method?
- Only used in very specific cases
- Can better focus documentation on it
There was a problem hiding this comment.
Also, would referring to it as precedence or precedence_level be a more approachable user-facing term?
There was a problem hiding this comment.
start_precedence_level or min_precedence_level, allowed_precedence ? This is needed for if the binding power of the operator < start_precedence the parser stops parsing. This is exposed to allow parsing a subset of an expression such as a ? b : c, 1 + 2 (parsing stops at ,)
There was a problem hiding this comment.
When a user sets it, is it the same precedence level they set for their operator? If so, I'd use current_ as the prefix
| #[inline(always)] | ||
| pub fn prefix<NewParsePrefix>( | ||
| self, | ||
| parser: NewParsePrefix, | ||
| ) -> Expression<I, O, ParseOperand, NewParsePrefix, Post, Pix, E> | ||
| where | ||
| NewParsePrefix: Parser<I, Prefix<I, O, E>, E>, | ||
| { | ||
| Expression { | ||
| start_power: self.start_power, | ||
| parse_operand: self.parse_operand, | ||
| parse_prefix: parser, | ||
| parse_postfix: self.parse_postfix, | ||
| parse_infix: self.parse_infix, | ||
| i: Default::default(), | ||
| o: Default::default(), | ||
| e: Default::default(), | ||
| } | ||
| } |
There was a problem hiding this comment.
For these, how come we don't do
| #[inline(always)] | |
| pub fn prefix<NewParsePrefix>( | |
| self, | |
| parser: NewParsePrefix, | |
| ) -> Expression<I, O, ParseOperand, NewParsePrefix, Post, Pix, E> | |
| where | |
| NewParsePrefix: Parser<I, Prefix<I, O, E>, E>, | |
| { | |
| Expression { | |
| start_power: self.start_power, | |
| parse_operand: self.parse_operand, | |
| parse_prefix: parser, | |
| parse_postfix: self.parse_postfix, | |
| parse_infix: self.parse_infix, | |
| i: Default::default(), | |
| o: Default::default(), | |
| e: Default::default(), | |
| } | |
| } | |
| #[inline(always)] | |
| pub fn prefix<NewParsePrefix>( | |
| mut self, | |
| parser: NewParsePrefix, | |
| ) -> Expression<I, O, ParseOperand, NewParsePrefix, Post, Pix, E> | |
| where | |
| NewParsePrefix: Parser<I, Prefix<I, O, E>, E>, | |
| { | |
| self.parse_prefix = parser; | |
| self | |
| } |
There was a problem hiding this comment.
Different types of the Expression due to the different type of the new parser
This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)
…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the `&Operator` but inside `parse_next` a mutable ref and `ReffCell::borrow_mut` are used which can lead to potential problems. We can return to the API later. But for now lets keep only the essential algorithm and pass affix parsers as 3 separate entities Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)
Closes winnow-rs#131 Implementation history in winnow-rs#614
Closes winnow-rs#131 Implementation history in winnow-rs#614
Closes winnow-rs#131 Implementation history in winnow-rs#614
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
This commit adds the `expression` module, which implements an API for operator precedence parsing. It specifically leverages a Pratt parsing algorithm. The core of the API is the `Expression` struct, which implements a builder pattern for configuring the Pratt parser. You create it using the `expression()` method. This is based on the work by @39555 in PR winnow-rs#614, with some fixes and style changes by @ssmendon. Closes winnow-rs#131 Co-authored-by: Igor <ig_@tutamail.com>
Closes: #131
This is the initial implementation of the
operator precedenceparser based on the Pratt algorithm https://en.wikipedia.org/wiki/Operator-precedence_parser#Pratt_parsing.The interface:
Any parser that returns a number can be considered as operator with its
binding power. To convert the parser intoprefix,infixorpostfix, we use the extension trait on top of theimpl Parser.Implementation details
Binding power, associativity, Pratt and Dijkstra
The Pratt algorithm only uses theEDIT: I am wrong here. We need to use associativity forbinding power.Leftandrightassociativity are used in Dijkstra'sShunting Yardalgorithm. (It seems likechumskyincorrectly names its implementation asprattwhile usingshunting yardinternally).infixoperators to determine the order of the operators with equal binding power.I named the combinator
precedencewith aliases topratt,shunting yard,precedence climbing,separated. If any of the names are confusing, we may consider better alternatives.The implementation is partially inspired by https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770.
But it doesn't use any
Rc<...>allocations for closures; instead, it uses uses& RefCell<& dyn ...>references.Operators from the tuple are statically dispatched by the requested affix so
prefix.as_postfix()will returnfailwhileprefix.as_prefix()with return the actual parser. This may have a performance issue because it iterates through the whole tuple each time the parser is requested for any affix:See the trace:
(I tried to make a some sort of a linked list where each parser holds an index to the next one with the same affix but, it quickly became a nightmare of lifetimes, complexity, and compilation errors so it was rejected).
Performance
A quick criterion benchmark with a simple input &str
1-2*4+12-561-5-6*6/9-3+1*-2*4-758*3shows:TODO
Implement the api for Vec<..> and &[...]unstable-prattEDIT:
Things to consider
i64but may overflow due to operations likeprecedence + 1andprecedence - 1. Consider replacing+and-with checks against theAssocenum instead.