-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
missing lexer rules if tokenVocab defined #157
Conversation
grammar.getTokenType(t.getText())==Label.INVALID ) | ||
Character.isLowerCase(currentRuleName.charAt(0))) && | ||
(hasTokenVocabAndIsParserOrCombined || | ||
grammar.getTokenType(t.getText())==Label.INVALID )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this is completely correct.
- Why do you change the behavior for parser grammars? A parser grammar would use a
tokenVocab = LexerGrammarName
, and that lexer grammar name could itself have atokenVocab = CustomTokensFile
. It appears that only combined grammars are impacted by the scenario you are describing. - Does the condition assume that the imported
tokenVocab
defines tokens which are referenced in the parser? What happens if the referencedtokenVocab
file is empty, or otherwise does not define a literal which is referenced in a parser rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback.
- I will change the condition to combined grammars only, my mistake.
- I guess you are right that the relaxation of the condition might be too relaxed. I assumed the use cases you described are covered by existing tests (and they all passed). I have only written a test for the use case were the defined tokens in
tokenVocab
are referenced in the parser (as a side notice, should I open an issue for this bug?)
Have you tried separating your grammar into separate lexer and parser grammars, and then using the |
…d normal parser before) - fixed token vocab parser, it got into an endless loop when there was an error at the end of the file - added tests for: -- empty token vocab file -- token vocab file with errors -- token vocab file which includes referenced tokens -- token vocab file which includes non-referenced tokens
As mentioned in my previous comment I have changed the condition to combined parsers only. Furthermore I have added the desired tests and also fixed a bug in the token vocab parser on the go. Btw. separating the grammar is not an option since I do not want more maintenance than necessary (and it's just matter of adding this feature and we are good to go). |
…t empty lines in a tokenVocab
Are there any open questions or issues left? |
we will look at this for the next release. |
I created an alternate implementation of this feature. Note that you don't need to include each token twice in the .tokens file. For example, the following is sufficient to cover the assignment for all of
|
Sounds promising. I will try it out as soon as your pull request is merged. |
Please consider to cherry pick the commit: robstoll@20d859a Would be nice to be able to add comments and new lines to token files. I am sorry that I did not put that into a separate pull request. Could create one though if you like. |
Fix is due to a problem I encountered, see http://stackoverflow.com/questions/22285912/antlr-v3-token-file-array-and-null
Problem was: lexer rules for tokens defined in the tokenVocab where not generated which resulted in NullPointerExceptions in the lexer.
This pull request fixes this issue (and I have fixed some broken links in the README.txt). The reasons why I want to use a tokenVocab already in the parser are explained in the StackOverflow thread