R: rewrite the parser #2452
Hey, I pulled the branch from your fork and was not able to build it for some reason. I can build the ctags master branch, not sure what the difference is. I am on OSX. |
Is the error in building ctags, or is it trouble with a git operation? |
It's a build issue. I couldn't find the documentation for how to run it so I just ran autoconfig then make. It gave an error about no symbols for x86_64. Sorry I'm afk at the moment. So I can build the master branch of ctags but can't remember how it was configured. |
gcc, gnu make, automake, autoconf, and pkg-config must be installed.
|
I have verified all dependencies and followed your instructions. I had the same error. Here is the output:
|
Very strange. The debug mode is partially enabled. I cannot imagine the reason. I would like you to try both:
and
|
Just built with the debugging enabled flag successfully. Strange |
Found one missed variable.
Continuing to review |
From the r-extended.d directory, we miss the library attachment and sourcing of a local file. I am not sure how ctags generally handles this.
|
I also noticed in the file r-scoped/input.r that the following line:
Defines a5 as an argument to f5, and this is not picked up in the output:
|
|
I have reviewed all the provided test cases, I have one more observation; when
Is executed, the output is:
Because the Sorry for the delay in getting back to you, hopefully this helps. |
@suntzuisafterU, thank you very much. I must fix the other items you spotted first. |
In `(' and `)', newline between 'f' and '<-' should be ignored.

    $ cat Units/parser-r.r/r-scope.d/input-3.r
    f <- function () { d <- ( 1 + 2 + 3 + (a <- 4) + 5) print(a) }

In this input, "a" must be captured though "<-" is not at the same line. The original code doesn't ignore the newline.

The issue is suggested by @suntzuisafterU in universal-ctags#2452.

Signed-off-by: Masatake YAMATO <[email protected]>
This one may be fixed in the new commits. |
In this case, r1 on the left side of '<<-' and '<-' is not a definition. It should be captured as a reference tag because these assignments just update the value of the variable. I found one derived input.
In this case 'r1' is defined in the code of f. To make a better tags file, ctags needs to know whether a name has already appeared. With the symbol table, the R parser can know whether r1 is already defined.
In this case, r1 is not defined before parsing f. So we can say r1 is defined here.
ctags works at the lexical level, so r1 in f and r1 in g are both captured as definition tags. What I have not figured out is how to handle the scope.
In R's semantics, the scope of r1 is empty; r1 is in the top-level namespace.
In this case the tag entry for r1 is...
executionScope is not specific to R, so it is better to make it short: xscope. Maybe a better term is defined in compiler theory. |
|
In this example:
There are actually 2 instances of variables with the symbol r1. Also, when a variable is defined within a function via Also note that:
Prints the following:
Indicating that Handling multiple definitions in the same scope may be problematic, but semantically this |
The original code doesn't consider the case where a scope is empty. As a result, foreachEntriesInScope (index, NULL,...) raises SEGV. Signed-off-by: Masatake YAMATO <[email protected]>
Using the newly introduced facility called "symbol table", I have refined the original pull request. TODO
The last one may be important when thinking about improving the JavaScript parser. |
…unctions Signed-off-by: Masatake YAMATO <[email protected]>
…holders Signed-off-by: Masatake YAMATO <[email protected]>
anyKindsEntryInScopeRecursive does mostly the same as anyKindsEntryInScope(), but it searches recursively upward from the given scope to the root scope. Signed-off-by: Masatake YAMATO <[email protected]>
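The difference between the single-scope lookup and the recursive one can be illustrated with a toy model. This is only a sketch in Python, not the ctags C code: the `Scope` class, both function names, and the kind name `functionVar` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A toy scope: named entries plus a link to the enclosing scope."""
    name: str
    parent: Optional["Scope"] = None
    entries: dict = field(default_factory=dict)  # symbol name -> kind

    def define(self, symbol: str, kind: str) -> None:
        self.entries[symbol] = kind


def find_in_scope(scope: Scope, symbol: str, kinds: set) -> Optional[str]:
    """Look in one scope only; return the kind if it is one of `kinds`."""
    kind = scope.entries.get(symbol)
    return kind if kind in kinds else None


def find_in_scope_recursive(scope: Optional[Scope], symbol: str, kinds: set) -> Optional[str]:
    """Same lookup, repeated upward from `scope` until the root is passed."""
    while scope is not None:
        kind = find_in_scope(scope, symbol, kinds)
        if kind is not None:
            return kind
        scope = scope.parent
    return None


# Hypothetical data: r1 lives at the top level, tmp inside function f.
root = Scope("<root>")
root.define("r1", "globalVar")
f = Scope("f", parent=root)
f.define("tmp", "functionVar")
```

With this data, `find_in_scope(f, "r1", {"globalVar"})` finds nothing because it stops at `f`, while `find_in_scope_recursive(f, "r1", {"globalVar"})` walks up to the root and finds the entry. Passing `None` as the scope simply yields no result.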
Codecov Report
@@            Coverage Diff             @@
##           master    #2452      +/-   ##
==========================================
+ Coverage   86.47%   86.52%   +0.04%
==========================================
  Files         182      182
  Lines       37850    38280     +430
==========================================
+ Hits        32732    33122     +390
- Misses       5118     5158      +40
|
13f19b7 to 30628a6
The coverage of the scanner is too low. |
* A broken ifdef condition for the debugging macros was spotted by @suntzuisafterU.

* The following issue in the original pull request was reported by @suntzuisafterU.

  In `(' and `)', newline between 'f' and '<-' should be ignored.

      $ cat Units/parser-r.r/r-scope.d/input-3.r
      f <- function () { d <- ( 1 + 2 + 3 + (a <- 4) + 5) print(a) }

  In this input, "a" must be captured though "<-" is not on the same line. The original code doesn't ignore the newline.

Signed-off-by: Masatake YAMATO <[email protected]>
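The newline rule in the commit message above can be sketched as a small tokenizer that tracks parenthesis depth. This is a toy model in Python, not the scanner from the pull request; the token regex, the function names, and the exact position of the line break in the sample input are assumptions, since the line breaks of the real test input are not visible in this thread.

```python
import re

# Assumed token set: just enough of R for this example.
TOKEN = re.compile(r"<<-|<-|[A-Za-z_.][A-Za-z0-9_.]*|\d+|\n|[(){}+]")


def tokenize(src: str) -> list:
    """Split R-like text into tokens, dropping newlines inside ( ... )."""
    tokens, depth = [], 0
    for tok in TOKEN.findall(src):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth = max(depth - 1, 0)
        if tok == "\n" and depth > 0:
            continue  # a line break inside parentheses does not end the expression
        tokens.append(tok)
    return tokens


def assigned_names(src: str) -> list:
    """Names that are immediately followed by `<-` in the token stream."""
    toks = tokenize(src)
    return [a for a, b in zip(toks, toks[1:])
            if b == "<-" and re.match(r"[A-Za-z_.]", a)]


# Hypothetical input: the break between `a` and `<-` is placed by assumption.
src = "f <- function () {\n  d <- ( 1 + 2 + 3 + (a\n    <- 4) + 5)\n  print(a)\n}\n"
```

Because the newline after `a` is dropped while the depth is non-zero, `a` and `<-` become adjacent tokens and `assigned_names(src)` reports `a` along with `f` and `d`. Newlines inside `{ ... }` are kept, since only parentheses change the depth.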
3273574 to 71bdf7d
They were kinds for definition tags. Now libraries attached by the library function and scripts loaded by the source function are captured as reference tags.

The new code accepts only a string literal or an identifier as a library or a script. It doesn't recognize compound expressions like:

    source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))

Signed-off-by: Masatake YAMATO <[email protected]>
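The restriction described in this commit message (a single string literal or identifier as the argument) can be approximated with a regular expression. The following is a hedged Python sketch, not the parser's actual logic; the pattern, the `reference_tags` helper, and the tuple layout are hypothetical.

```python
import re

# One call to library() or source() whose only argument is
# a string literal or a bare identifier (assumed grammar).
CALL = re.compile(
    r"""\b(library|source)\s*\(\s*
        (?: "([^"]*)" | '([^']*)' | ([A-Za-z.][A-Za-z0-9._]*) )
        \s*\)""",
    re.VERBOSE,
)


def reference_tags(src: str) -> list:
    """Return (name, function) pairs for the calls the pattern accepts."""
    tags = []
    for m in CALL.finditer(src):
        func = m.group(1)
        # Exactly one of the three argument groups matched.
        name = next(g for g in m.groups()[1:] if g is not None)
        tags.append((name, func))
    return tags


src = (
    'library(ggplot2)\n'
    'source("util.R")\n'
    'source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))\n'
)
```

Here `reference_tags(src)` yields entries for `ggplot2` and `util.R` only. The `file.path(...)` call is skipped because its argument is a compound expression, which matches the limitation stated in the commit message.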
R is a dynamic language; a variable is not declared explicitly. When a value is assigned to a variable for the first time, ctags should capture it as a variable definition. ctags doesn't and should not evaluate the code, so detecting the first appearance of a variable is not easy. However, there are areas we can improve.

* parameter

  If a value is assigned to a variable listed in the function parameter list, the assignment should not be tagged. Instead, the parameter should be tagged.

* only the first assignment in the scope

  Only the first assignment to a variable in the scope should be tagged. Later assignments to the same variable in the same scope should be ignored.

This behavior is very different from the old R parser. The old R parser tags all assignments.

Unlike the JavaScript parser, the new R parser tags all assignments at the top level if the values are functions.

Signed-off-by: Masatake YAMATO <[email protected]>
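The two rules above (parameters win over assignments, and only the first assignment in a scope is tagged) can be modeled for a single function scope. This Python sketch is an illustration under assumptions: the function name and the kind labels `parameter` and `variable` are placeholders, and the top-level exception for function values is not modeled.

```python
def tag_assignments(params: list, assigned: list) -> list:
    """Return (name, kind) tags for one function scope.

    params:   parameter names of the function, in order
    assigned: left-hand-side names of assignments in the body, in order
    """
    tags = [(p, "parameter") for p in params]
    seen = set(params)  # assigning to a parameter adds no new tag
    for name in assigned:
        if name in seen:
            continue  # a later assignment in the same scope is ignored
        seen.add(name)
        tags.append((name, "variable"))
    return tags
```

For a function with parameter `a` whose body assigns to `a`, `x`, `x`, and `y` in that order, the sketch emits one parameter tag for `a` and one variable tag each for `x` and `y`.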
…ariables

    f <- function () { x <<- 1 }

In this case x should be captured as globalVar kind object.

    y <- 1
    g <- function () { y <<- 2 }

In this case y in g should not be captured.

Signed-off-by: Masatake YAMATO <[email protected]>
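The `<<-` rule from this commit message can be stated as a small decision function. This is a simplified Python model, not the parser's code: the helper name is hypothetical, and it only consults a set of top-level names, ignoring enclosing function scopes that R itself would search.

```python
def tag_super_assignment(name: str, toplevel: set) -> list:
    """Tags produced by `name <<- value` inside a function.

    toplevel: names already defined at the top level; updated on capture.
    """
    if name in toplevel:
        return []  # already defined at the top level: only an update
    toplevel.add(name)
    return [(name, "globalVar")]
```

With an empty top level, `x <<- 1` inside `f` produces a `globalVar` tag for `x`. With `y` already defined at the top level, `y <<- 2` inside `g` produces no tag.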
Close #2393.
Still considering how to handle 'a$b'.
The original parser may capture 'a$b'. However, capturing just 'b' looks better.
In that case, I would like to resolve a, which requires the symbol table feature developed in #2427.
Class-related code is not implemented yet.