-
Notifications
You must be signed in to change notification settings - Fork 32
Add documentation example of retrieving position information of an arbitrary node #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
alberth
wants to merge
1
commit into
igordejanovic:master
Choose a base branch
from
alberth:doc-add-position-example
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function assumes that all
nodeshavelendefined which might not be true as the elements ofnodescan be arbitrary objects produced by lower level actions.The only way I see to track positions of individual nodes is to take note of them in all actions. That is handled by parglare if tree building is used.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contexthas a start and end position, so it must know the length of everything. Unfortunately, it does not provide a start position for each token, it only does that for the production rule as a whole. (That is, what I really want iscontext.token_start_position[token_index].) As a work-around, I need to calculate the offset from the production rule start to the token, so I can use the start position of the rule, and applyparglare.pos_to_line_colto get a human-readable terminal position.I wrote this code, since in my opinion, line (and column) information is required in almost all non-trivial programs that use a parser. Parglare doesn't provide a standard function for it, so I think almost every Parglare user will have to dig through the API in the manual like I did, figuring out how to get useful position information. The example code shows how to do it in a common use case (ie parsing plain text).
While for you, parsing is the one and only job, for me, parsing is just the simple first step (takes a day or two if the language design is finished, and if it fits nicely in the parser assumptions).
Right behind it is static semantics checking and/or interpretation of the input, and error detection and reporting about the input is a dominant part there (takes several months at least rather than 2 days). In my experience, you always need line/column position from the input, with ascii-ish text being the dominant stuff being parsed, and the need to be able to fallback to dumping error messages onto
stderr. (That is,incorrect type at line 32, column 54is so much nicer thanincorrect type in the type definition starting at offset 33536.)I agree you need length of a token in my code, but I don't know how else to get a
(line, column)position of an arbitrary terminal in a production rule. As a secondary consideration, in case there is no length of a token, my guess is that a(line, column)notion won't make sense either, ie, you don't need this code in that case.I didn't look at building a parse tree, since my actions immediately abstract to an AST. I don't want a tree with all irrelevant stuff still in there, since I will have to crawl over it with manually written code rather than with your automagically generated callbacks. (That is, a parse tree is more work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, probably I wasn't clear enough. It's not that I think that position is not important. It is very much important to me as well as the parsing is not the only thing I do. I do stuff after parsing :)
The note I made was that the code above assumes that all objects returned from the lower level actions in the bottom-up processing have length defined which might not be true. So, this example should make that clear.
parglare idea is to be general enough and flexible to be used in different context. So I try to provide bare minimum functionality and build on top of it. For example, support for assignments and auto-building of ASTs is provided using built-in
objaction which uses the same action calling mechanism as any other action.Keeping positions of all tokens might be a significant waste of space/time if the information is not needed. If it is needed than a simple action decorator that is applied to all actions and keeps track of all tokens shouldn't be more than a few lines of code in Python. I've been doing recently some support for action decorators to make action registration easier.
It sounds like a good idea to provide decorator for position tracking in parglare, and that would be a more general approach which won't have assumption about results of the previous actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically need the position of one token in a production rule (which gets copied into the AST). I can even point out which token(s) exactly in advance. In another parser we use the
@to denote a token should be an argument in the callback, so we can retrieve its position, egIt's the wrong programming language, but you get the idea :p
This parser is much more aimed at parsing text, so 'Token.position' is a line and column already, which avoids a lot of boilerplate position conversions.
Another option could be to drop all the position conversion here, and just get and store the offset of a token in the AST. Then, conversion becomes a separate problem outside parsing, although I can see a user wanting to convert the offset into position while constructing the AST rather than at error reporting time. You'd likely want to provide a simple additional API for the 'text input' case though as a service.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. You can easily do that now in parglare. Default actions in parglare will convert token to string it matches and non-terminal rule to a Python list. If you use assignments,
objaction will be executed on those rules by default which creates object with attributes created by the LHS of assignments and values matched by RHS. The class of the object will be named after the grammar rule. Thus, you have automatic AST building out-of-the box. Position is kept in all AST created objects as_pg_start_positionand_pg_end_position(that is done byobjaction). The only thing you might need in addition is the position of tokens/terminal matches. That can be easily provided by overriding default shift action that converts token to a string.For example:
where
tokenis action you provide that will take what is matched and create object with matched content and position for example. Notice that in this way you can choose which token keeps its position.The other way to achieve similar is by converting each terminal rule to a non-terminal rule using assignments:
Now, these rules are actually non-terminal rules using assignments. They will be processed by
objaction and they will get_pg_start/end_positionattributes. Matched value will be in thevalueattribute.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great that it's already possible. The main problem might be lack of a "how to get positions in my ast" guide?
I typically don't use automagic AST generators, since the structure of the parsed text normally does not match the structure of my AST. As a simple example, consider
This declares variables in C style, ie
int i, j, k;In my AST, I'd likeThis makes further processing of variable declarations easier. Automagic AST generation can't handle such structural changes, for the simple reason that it doesn't know what I want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a lack of this in the docs. I'll see to provide some form of action decorator for this and document it.
You got the point. Indeed, there is a need to manually handle such cases. That's why I didn't go with textX route like before where all you have is just auto-AST building. In parglare you have some sane defaults (at least sane in my view :) ) but you can override what you don't like. Here you could override default
objaction forvardefand provide your own as there is the change in the structure. But for the part where the syntactic structure fits wanted AST you don't need to do anything.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's less work, but I prefer to have a proper class for each AST node, since it has room for documentation, and it serves as a stable, independent, easy to find reference, while working on semantics checking. That's likely a personal preference though, ymmv.