
[Postgresql] Grammar is not idiomatic ANTLR #4291

@masonwheeler


If I had to guess, this Postgres parser grammar was produced by taking a grammar written for a different parser generator and translating it as literally as possible. As a result, a lot of it is very un-idiomatic ANTLR.

For example, you see this pattern a lot:

from_clause
    : FROM from_list
    |
    ;

Ending a rule with an empty alternative (| ;) means the rule always matches, even when it consumes no input. This is painful to work with in a visitor: for any rule that contains from_clause as a sub-rule, a from_clause context is still created when there is no FROM clause at all, so you can't simply write if (context.from_clause() != null) to see whether you have a real match. Instead you have to look inside the context (e.g. check whether its FROM token is present), which is significantly messier.

The idiomatic way to do this in ANTLR is to define it as:

from_clause
    : FROM from_list
    ;

and then have every rule that uses it reference it as from_clause?, so the optionality lives at the call site.
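A minimal sketch of what a calling rule could look like under this approach. The simple_select rule and its target_list and where_clause sub-rules are hypothetical placeholders, not taken from the actual Postgres grammar:

```antlr
// Hypothetical calling rule: the "may be absent" decision is made here,
// with ?, instead of inside from_clause via an empty alternative.
simple_select
    : SELECT target_list from_clause? where_clause?
    ;

// from_clause no longer has an empty alternative, so a context for it
// only exists in the parse tree when a FROM clause was actually parsed.
from_clause
    : FROM from_list
    ;
```

With this shape, the generated from_clause() accessor on the calling rule's context returns null when no FROM clause was present, so the simple null check works as expected.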

Also, some things are just really weird. For example, the following:

from_list
    : non_ansi_join
    | table_ref (COMMA table_ref)*
    ;

non_ansi_join
    : table_ref (COMMA table_ref)+
    ;

Unless I'm overlooking some crucial detail, the non_ansi_join rule here is entirely superfluous: the second alternative of from_list, table_ref (COMMA table_ref)*, already matches everything non_ansi_join matches, plus the single-table case. Again, this feels like it was translated overly literally from some other parser generator's grammar.
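A minimal sketch of the consolidated rule, assuming nothing downstream depends on a separate non_ansi_join node. One caveat worth checking first: for input with two or more table references both original alternatives match, and ANTLR 4 resolves such ambiguities in favor of the alternative listed first, so in the current grammar every comma-separated list is parsed as non_ansi_join. Visitor code that relies on that node would instead need to look at the table_ref children (for example, whether there is more than one).

```antlr
// One rule covers both the single-table and the comma-separated case.
// If non_ansi_join is referenced anywhere else in the grammar, those
// references would need to be updated before deleting the rule.
from_list
    : table_ref (COMMA table_ref)*
    ;
```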

Would it be possible to clean this grammar up a bit?
