Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Challenge: match Parboiled2 result value expressiveness #81

Open
AndreVanDelft opened this issue May 17, 2015 · 1 comment
Open

Challenge: match Parboiled2 result value expressiveness #81

AndreVanDelft opened this issue May 17, 2015 · 1 comment

Comments

@AndreVanDelft
Copy link
Owner

Parboiled2 is quite expressive for result values; with help of Shapeless.

E.g. the following lines are extracted from the CSV parser example:

  case class CsvFile(header: Option[Record], records: immutable.Seq[Record])
  case class Record(fields: immutable.Seq[String])

    val file = rule {
      OWS ~ optional(test(ctx.headerPresent) ~ header ~ NL) ~
        oneOrMore(record).separatedBy(NL) ~ optional(NL) ~ EOI ~> CsvFile
    }

    val header = rule { record }

    val record = rule { oneOrMore(field).separatedBy(ctx.fieldDelimiter) ~> Record }

    val NL = rule { optional('\r') ~ '\n' }

    val OWS = rule { zeroOrMore(' ') }

I have no idea how ~> CsvFile gets two parameters for the construction of a CsvFile; one from the optional header and one from the one-or-more records.

Anyway, it is a challenge to improve SubScript's expressiveness so that it would understand a similar parser specification.
If no result values had been required the specification would simply be:

file = ows; if (ctx.headerPresent) header NL; record .. NL; .NL; EOI

However, result values are needed. The main challenge is to hand the two parameters to the constructor of CsvFile.
Suppose we would write:

file = OWS 
       optional: [header^ NL] ^1
       oneOrMore: record, separatedBy: NL ^2
       NL; EOI
       ~~~> CsvFile

^1 and 2 would imply that the result of the enclosing script (not file here, but a lambda, because of the arrow), must be a tupel with 2 fields. This tuple is filled in 2 steps. It is then transferred using the arrow, and handed as 2 parameters to CsvFile.
But as long as I don't understand the Parboiled2 code, I cannot be more specific here.

Scripts such as optional could probably be defined as:

script..

  success[T](v:T) = {: v :} (+)

  optional(r: Script[R]): Option[R] = success(None) + r^{Option(_)}

  zeroOrMore:separatedBy[R](r: Script[R], s: Script[_]): Seq[R] = {:Nil:}^; .; r^{$ :+ _}..s
   oneOrMore:separatedBy[R](r: Script[R], s: Script[_]): Seq[R] = {:Nil:}^;    r^{$ :+ _}..s

Note that r^{Option(_)} sets the result of the enclosing script to the result of the call to r, wrapped in an Option.

There would be more issues, but it would be good to have a proper explanation of the Parboiled2 code first.

BTW the script success may look a bit weird. It starts with a tiny code fragment that sets the result value. This code fragment behaves successfully, but success should act like a 1. Therefore the tiny code fragment is put in a sequential composition with (+) i.e. the 1 element.
Maybe a shorter notation would be needed; then we may think of

  success[T](v:T) = {+ v +}

and

  failure(e:Throwable) = {- throw e -}
@anatoliykmetyuk
Copy link
Collaborator

I have no idea how ~> CsvFile gets two parameters for the construction of a CsvFile; one from the optional header and one from the one-or-more records.

This rule:

OWS ~ optional(test(ctx.headerPresent) ~ header ~ NL) ~
        oneOrMore(record).separatedBy(NL) ~ optional(NL) ~ EOI

must have a type of Option[Record] :: immutable.Seq[Record], where :: is a thing Shapeless constructs lists of types with. This is so, because only header and record are of type Rule1[_] there, everything else is of type Rule0. Their behaviour on successful match differs: Rule0 does nothing and Rule1[T] pushes the matched value of type T to the value stack. So we can't use the value matched by Rule0, but we can use value matched by Rule1[_] in future.

When two rules of type Rule1[A] and Rule1[B] are concatenated by the ~ operator, the resulting rule is of type Rule1[A :: B] - meaning it will push a list of values (A and B) to the stack.

I don't know how ~> is implemented internally, but the intuition is as follows.
The lhs of the ~> is a rule and the rhs is a function that transforms the values outputted by this rule. The result is a rule of the same type as the rhs's output.
On compile time, the ~> looks at the left-hand side rule's type - which is a list of types of the values that are pushed by it to the value stack. It'll then typecheck the right-hand side operand to be a function arguments of which are of same types as the values pushed to the stack by the left-hand side rule. Since rules are macros, operations with types must be easy.
If everything's OK, ~> produces a new rule. This rule, on runtime, will invoke the ~>'s lhs rule to do a match for it, then pop its matched values from the value stack, transform them with its rhs function and push the result to the stack as its own result.

I don't think SubScript's implementation will be similar to Parboiled2. Parboiled2 parses linearly, hence it is possible to have a value stack and safely work with it. SubScript traverses its call graph in much more complicated way - so we can't use the value stack.

I think, if we're intended to use SubScript for parsing, we should create some parsing tests-challenges for it, and also a module that will turn SubScript into a parser. So that we can see how it performs as we add new features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants