Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[systemverilog] parser mistakes typedef for a module's port #2413

Closed
antoinemadec opened this issue Feb 4, 2020 · 24 comments · Fixed by #2492
Closed

[systemverilog] parser mistakes typedef for a module's port #2413

antoinemadec opened this issue Feb 4, 2020 · 24 comments · Fixed by #2492

Comments

@antoinemadec
Copy link

The name of the parser: verilog.c

The command line you used to run ctags:

$ ctags --options=NONE foo.sv

The content of input file:
foo.sv

typedef bit[31:0] int32_t;
module mod(
  input bit clk,
  input int32_t a
);
endmodule

The tags output you are not satisfied with:

clk     foo.sv  /^  input bit clk,$/;"  p       module:mod
int32_t foo.sv  /^  input int32_t a$/;" p       module:mod
int32_t foo.sv  /^typedef bit[31:0] int32_t;$/;"        T
mod     foo.sv  /^module mod($/;"       m
mod.clk foo.sv  /^  input bit clk,$/;"  p       module:mod
mod.int32_t     foo.sv  /^  input int32_t a$/;" p       module:mod

The tags output you expect:

clk     foo.sv  /^  input bit clk,$/;"  p       module:mod
a       foo.sv  /^  input int32_t a$/;" p       module:mod
int32_t foo.sv  /^typedef bit[31:0] int32_t;$/;"        T
mod     foo.sv  /^module mod($/;"       m
mod.clk foo.sv  /^  input bit clk,$/;"  p       module:mod
mod.a   foo.sv  /^  input int32_t a$/;" p       module:mod

The version of ctags:

$ ctags --version
Universal Ctags 0.0.0(81fa5b1a), Copyright (C) 2015 Universal Ctags Team                    
  Universal Ctags is derived from Exuberant Ctags.                                            
  Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert                                 
    Compiled: Feb  4 2020, 14:36:50                                                           
    URL: https://ctags.io/                                                                    
    Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, 
  +interactive, +sandbox, +yaml, +packcc
masatake added a commit to masatake/ctags that referenced this issue Feb 15, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 15, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 24, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 26, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 26, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 26, 2020
masatake added a commit to masatake/ctags that referenced this issue Feb 27, 2020
masatake added a commit to masatake/ctags that referenced this issue Apr 4, 2020
Close universal-ctags#2413.

The original code could not capture "a" in "mod" in the following input because
the parser didn't recognize "int32_t" before "a" is a name of type:

    typedef bit[31:0] int32_t;
    module mod(
      input bit clk,
      input int32_t a
    );
    endmodule

This change makes the parser look up the cork symbol table when the
parser reads a token with unknown kind. A typedef like int32_t is
stored to the symbol table when making a tag for it. As the result, the
parser can resolve the kind for the token as typedef when parsing
int32_t in "input int32_t a".

Signed-off-by: Masatake YAMATO <[email protected]>
@masatake
Copy link
Member

masatake commented Apr 4, 2020

Thank you for reporting.
Could you try #2492 ?

@antoinemadec
Copy link
Author

antoinemadec commented Apr 5, 2020

Thanks a lot for your response!

#2492 fixes my original file. Fix is definitely improving things.
However, it still fails if I do: ctags --options=NONE a.sv b.sv

Where:
a.sv

`include "b.sv"

module mod(
  input bit clk,
  input int32_t a
);
endmodule

b.sv

typedef bit[31:0] int32_t;

@masatake
Copy link
Member

masatake commented Apr 5, 2020

Currently, ctags cannot track cross-input file dependencies like a.sv and b.sv.
I'm trying to add features handling the cross-input file dependencies:
#2428

However, it will take more than a year to implement it.
So I will merge #2492 first.

@antoinemadec
Copy link
Author

@masatake got it, makes sense!

I think there is a way to fix this which does not require cross-input file dependencies.
It rather simple: the name of the input/output/inout port is always the last word before a comma or the last parenthesis.

module mod(
  input dont_need_to_know_what_this_is port_name[64]
);

What do your think, would this be implementable?

Thanks,
Antoine

@masatake masatake reopened this Apr 5, 2020
@masatake
Copy link
Member

masatake commented Apr 5, 2020

Obviously Implementable.
The question is who implements that algorithm.

The original maintainer of the systemverilog parser didn't want to take such algorithm.

#1488
#80
#1495

I'm looking for a new maintainer who implements the algorithm.
If we cannot find such person, we have to wait till I complete the "multi-pass/multi-file" infrastructure.
It will take longer time.

So as usual, I have to say "pull requests are well come."

@masatake
Copy link
Member

masatake commented Apr 5, 2020

BTW, could you help me about writing a commit log for fixing the SystemVerilog parser?
You know SystemVerilog well. So I would like to see #2489 .

@antoinemadec
Copy link
Author

@masatake , got it.
I already looked at the code and tried to fix it before opening this ticket.
The C code was a bit out of my league TBH.
Will take another shot at it without any guarantee of results 😃

I just responded to your question on 2489.
Don't hesitate to ask me if you have more question regarding SV, I would be happy to help 😉

Thanks,
Antoine

@masatake
Copy link
Member

masatake commented Apr 6, 2020

@antoinemadec, thank you.

One of the difficulties for changing the logic in verilog.c is that the parsers (systemverilog and verilog) parsers use "kind" (verilogKind) as token type like:


typedef struct sTokenInfo {
	verilogKind         kind;

kind is a thing assigned to a tag entry in the output.
In other hand, "token" is a parser private thing.

See python.c. It defines types for token like:

typedef enum eTokenType {
	/* 0..255 are the byte's value */
	TOKEN_EOF = 256,
	TOKEN_UNDEFINED,
	TOKEN_INDENT,
	TOKEN_KEYWORD,
	TOKEN_OPERATOR,
	TOKEN_IDENTIFIER,
	TOKEN_STRING,
	TOKEN_WHITESPACE,
} tokenType;

typedef struct {
	int				type;

Defining types for token brings great flexibility to the parser code.
If you cannot find easy way to implement what you want, "defining types for token and using the kinds only for emitting tag entries" may be a hint.

Thank you for trying improving the parser.
Feel free to ask me the question. I don't know SystemVerilog. However, I know ctags, especially language common part.

I'm not good at English. So the ways to express my "thanks" are very limited.
"Thank you very much", "very much" suffixed version of "thank you", is kept for the time when I receive a pull request for the issue from you:-)

Good luck.

p.s. #2489, thank you.

I'm thinking about organizing , "language consultants".
Can I allow me to contact you when I get a question about SystemVerilog in the future?
I would like to see #2494.

@masatake
Copy link
Member

masatake commented Apr 6, 2020

@vhda, could you look at this one and give us comments if you have.
I also would like you to see #2494.
BTW, fnally I implemented an infrastructure for multi-pass parsing on an input file is implemented.
I'm working on multi-pass parsing on multiple input files.

@vhda
Copy link
Contributor

vhda commented Apr 19, 2020

An issue with Verilog is that you can have both

input name,

and

input type name,

That is, when you parse the word after type you don't know if it's a type or a name.
But it should be possible to store each string and only determine the token name when , or ; are found. In fact, I vaguely remember doing something like this at the time to solve some other problem.

Let's do as following: let me pick up on a work in progress I have here to extend support to structs. This should help me remember how everything is coded and refresh my C, which has not been used since. I'll then rebase this change over master branch to learn what changed. Finally I'll look into this problem.

Cheers,
Vitor

@masatake
Copy link
Member

Feel free to revert 23297f6 .

@vhda
Copy link
Contributor

vhda commented Apr 19, 2020

Far from that! Preferably we would even talk a bit about cork, to see if I finally understand it.

@masatake
Copy link
Member

masatake commented Apr 20, 2020

See http://docs.ctags.io/en/latest/internal.html#output-tag-stream about cork.
And feel free to ask me question about ctags itself.

@masatake
Copy link
Member

SystemVerilog parser and Ada parser, both have the same limitation in their design level.
They reuse their kind indexes for their token types.
They must be separated. The kind indexes are for tag emission. The token types are for tokenizing.
They are different. This separation brings us great flexibility to improve and change the parsers
without exposing parser internal changes to users' eyes.

@masatake
Copy link
Member

masatake commented Jun 16, 2020

diff --git a/parsers/verilog.c b/parsers/verilog.c
index 737fc815..a3f4ce6e 100644
--- a/parsers/verilog.c
+++ b/parsers/verilog.c
@@ -607,8 +607,8 @@ static void dropEndContext (tokenInfo *const token)
 	vStringDelete(endTokenName);
 }
 
-
-static void createTag (tokenInfo *const token)
+static int createTag (tokenInfo *const token);
+static int createTagFull (tokenInfo *const token, int *corkIndexFQ)
 {
 	tagEntryInfo tag;
 	verilogKind kind;
@@ -627,7 +627,7 @@ static void createTag (tokenInfo *const token)
 	if (vStringLength (token->name) == 0 || ! kindEnabled (kind))
 	{
 		verbose ("Unexpected empty token or kind disabled\n");
-		return;
+		return CORK_NIL;
 	}
 
 	/* Create tag */
@@ -668,7 +668,10 @@ static void createTag (tokenInfo *const token)
 		tag.name = vStringValue (scopedName);
 
 		markTagExtraBit (&tag, XTAG_QUALIFIED_TAGS);
-		makeTagEntry (&tag);
+		if (corkIndexFQ)
+			*corkIndexFQ = makeTagEntry (&tag);
+		else
+			makeTagEntry (&tag);
 
 		vStringDelete (scopedName);
 	}
@@ -705,6 +708,12 @@ static void createTag (tokenInfo *const token)
 
 	/* Clear no longer required inheritance information */
 	vStringClear (token->inheritance);
+	return corkIndex;
+}
+
+static int createTag (tokenInfo *const token)
+{
+	return createTagFull (token, NULL);
 }
 
 static bool findBlockName (tokenInfo *const token)
@@ -1162,7 +1171,7 @@ static void tagNameList (tokenInfo* token, int c)
 {
 	verilogKind localKind;
 	bool repeat;
-
+	int cork = CORK_NIL, fqcork = CORK_NIL;
 	/* Many keywords can have bit width.
 	*   reg [3:0] net_name;
 	*   inout [(`DBUSWIDTH-1):0] databus;
@@ -1206,7 +1215,7 @@ static void tagNameList (tokenInfo* token, int c)
 			else
 			/* Create tag in case name is not a known kind ... */
 			{
-				createTag (token);
+				cork = createTagFull (token, &fqcork);
 			}
 		}
 		else
@@ -1237,9 +1246,22 @@ static void tagNameList (tokenInfo* token, int c)
 		}
 		if (c == ',')
 		{
+			cork = CORK_NIL;
+			fqcork = CORK_NIL;
 			c = skipWhite (vGetc ());
 			repeat = true;
 		}
+		else if (isIdentifierCharacter(c))
+		{
+			tagEntryInfo *e;
+			e = getEntryInCorkQueue (cork);
+			if (e)
+				e->placeholder = 1;
+			e = getEntryInCorkQueue (fqcork);
+			if (e)
+				e->placeholder = 1;
+			repeat = true;
+		}
 	} while (repeat);
 	vUngetc (c);
 }
[jet@living]~/var/ctags% cat /tmp/foo.sv 
cat /tmp/foo.sv 
pack.sv:
package PACK;
    localparam N = 100;
    typedef logic [N-1:0] PORT_TYPE_T;
endpackage // PACK

port.sv:
import PACK::*;

module port (
    input  PORT_TYPE_T  port,
    input  PORT_TYPE_T  port2);
endmodule // port
[jet@living]~/var/ctags% ./ctags  --sort=no -o - /tmp/foo.sv
./ctags  --sort=no -o - /tmp/foo.sv
PACK	/tmp/foo.sv	/^package PACK;$/;"	K
N	/tmp/foo.sv	/^    localparam N = 100;$/;"	c	package:PACK
PORT_TYPE_T	/tmp/foo.sv	/^    typedef logic [N-1:0] PORT_TYPE_T;$/;"	T	package:PACK
port	/tmp/foo.sv	/^module port ($/;"	m
port	/tmp/foo.sv	/^    input  PORT_TYPE_T  port,$/;"	p	module:port
port2	/tmp/foo.sv	/^    input  PORT_TYPE_T  port2);$/;"	p	module:port
[jet@living]~/var/ctags% cat /tmp/bar.sv 
cat /tmp/bar.sv 
`include "b.sv"

module mod(
  input bit clk,
  input int32_t a
);
endmodule // mod
[jet@living]~/var/ctags% ./ctags  --sort=no -o - /tmp/bar.sv
./ctags  --sort=no -o - /tmp/bar.sv
mod	/tmp/bar.sv	/^module mod($/;"	m
clk	/tmp/bar.sv	/^  input bit clk,$/;"	p	module:mod
a	/tmp/bar.sv	/^  input int32_t a$/;"	p	module:mod

@masatake
Copy link
Member

masatake commented Jun 16, 2020

input t a,

The algorithm:

  1. create a tag for t as we did before.
  2. if you find a character that can be part of an identifier as 1, cancel the emission of the tag for t.
  3. create a tag for the identifier a, goto 2.

The step 2. is based on what @vhda wrote.

I guess this doesn't work well because createTag() is called recursively. So the cancellation works partially.

@hirooih
Copy link
Contributor

hirooih commented Sep 20, 2020

Hi,
Are anyone working on this issue? If not, I'd like to try implement @antoinemadec 's algorithm (#2413 (comment)).


SystemVerilog BNF is very complex, so I squashed it.
https://docs.google.com/document/d/1Sb0-iO9HofB7BR52XUarhGYPph6mFMx5oIFcU8eeFj0/edit?usp=sharing

Here is my basic idea.

do {
  skip light-gray tokens
  If the first token is identifier:
     the identifier is a type_identifier or an identifier with implicit_data_type (https://github.com/universal-ctags/ctags/issues/2413#issuecomment-609580989)
  else
     the token is a keyword: (already implemented)
}

@masatake san,

#2413 (comment)

One of the difficulties for changing the logic in verilog.c is that the parsers (systemverilog and verilog) parsers use "kind" (verilogKind) as token type like:
...
kind is a thing assigned to a tag entry in the output.
In other hand, "token" is a parser private thing.

I think the current implementation of verilog.c is straight-forward by reading as follows;

  • enum verilogKind and KeywordTable[] are for kinds of tokens
  • VerilogKinds[] and SystemVerilogKinds[] are for kind of identifier-tokens (assigned to a tag entry in the output)

Any comments or feedback are welcome. Thank you.

@masatake
Copy link
Member

@hirooih, I would like you to know the typical good parser implementation.

In ctags, generally, "kind" is for tags, not for tokens.
A user can know what kinds are available in a parser with "ctags --list-kinds-full=".

"kinds" defined in a parser is part of APIs. We should keep their compatibility as much as possible.
We can implement a parser for a language in various ways. However, changes in the implementation should not be visible
as changes of "kinds" definition.

When writing a token oriented parser, you may want to assign a type to a token like NUMBER, SYMBOL, SEPARATOR, STRING-LITERAL, ... You can define them as you want. They should be used in parser internal. If you change the type definition and/or assignment, the "kinds" definition of parser should not be changed. The types of token and kinds of tags are different things. Users should not know the types of tokens. Actually we don't provide --list-token-types option.

A good parser like Python parser uses token types and token keywords.

typedef enum eTokenType {
	/* 0..255 are the byte's value */
	TOKEN_EOF = 256,
	TOKEN_UNDEFINED,
	TOKEN_INDENT,
	TOKEN_KEYWORD,
	TOKEN_OPERATOR,
	TOKEN_IDENTIFIER,
	TOKEN_STRING,
	TOKEN_ARROW,				/* -> */
	TOKEN_WHITESPACE,
} tokenType;

If a token has TOKEN_KEYWORD type, it can have a subtype called keyword:

enum {
	KEYWORD_as,
	KEYWORD_async,
	KEYWORD_cdef,
	KEYWORD_class,
	KEYWORD_cpdef,
	KEYWORD_def,
	KEYWORD_extern,
	KEYWORD_from,
	KEYWORD_import,
	KEYWORD_inline,
	KEYWORD_lambda,
	KEYWORD_pass,
	KEYWORD_return,
};

These constants are used in the parser internally.
The kinds of python are defined separately:

typedef enum {
	K_CLASS,
	K_FUNCTION,
	K_METHOD,
	K_VARIABLE,
	K_NAMESPACE,
	K_MODULE,
	K_UNKNOWN,
	K_PARAMETER,
	K_LOCAL_VARIABLE,
	COUNT_KIND
} pythonKind;

SystemVerilog is one of the parsers that use keywords and kinds mixed-way.
I think this mixture makes improving the parser harder (without breaking the APIs).

I wrote much. HOWEVER, YOU CAN DO AS YOU WANT for fixing the issue.

What I would like to know is "kinds" is part of the APIs of the parser. We should try to keep compatibility.
SystemVerilog parser has enough test cases. They will detect incompatibility unexpectedly introduced.


enum verilogKind and KeywordTable[] are for kinds of tokens

I don't think this is good. Using kind, a user-visible thing in tokens for typing.

VerilogKinds[] and SystemVerilogKinds[] are for kind of identifier-tokens (assigned to a tag entry in the output)

Your code reading is correct. However, using kind for tokens is not a good idea. kinds are for tags.
tags and tokens are different things. tokens represent the result of reading the input source file.
tags represent the output of ctags.

@masatake
Copy link
Member

BTW, if you need explanations for registerEntry() and foreachEntriesInScope() for fixing this issue, please let me know.

@hirooih
Copy link
Contributor

hirooih commented Sep 20, 2020

@masatake san

I believe I understood what you wrote. I read your similar explanations in #2576, #2618, and this issue.
I just wrote that we could use "kind" as a general English word.

I also think enum verilogKind is a little bit tricky. It bridges KeywordTable[] and VerilogKinds[]/SystemVerilogKinds[]. If verilogKind is greater than or equal to 0, it can be used as an index of VerilogKinds[]/SystemVerilogKinds[].

And APIs ( initTagEntry, makeTagEntry) refers only VerilogKinds[]/SystemVerilogKinds[] and enum verilogKind (>=0). The verilog.c does not break APIs.

BTW, if you need explanations for registerEntry() and foreachEntriesInScope() for fixing this issue, please let me know.

Thank you. I don't think I have to change this part, but I will ask your help if I need.


Before I start working, I would like to propose changing Units/parser-verilog.r/*/args.ctags as follows to output all supported information. We can detect any change of output not to break compatibility. And --sort=no helps debugging.

--extras=+q
--fields=+i
--kinds-systemverilog=+Q
--sort=no

(This is the output of cat */args.ctags | sort | uniq at Units/parser-verilog.r/.)

Off course I also update */expected.tags.

May I do that?

@masatake
Copy link
Member

--sort=no is acceptable. However, I don't think unifying args.ctags is a good idea.
Each test is developed for its own purpose. And each args.ctags is designed for its purpose.
Unifying args.ctags hides the original intention of the test cases.

You are the only person who improves the SystemVerilog parser on the earth. So If you really need unified args.ctags, you can do that.

@hirooih
Copy link
Contributor

hirooih commented Sep 20, 2020

I don't think unifying args.ctags is a good idea.

I tried and found you were right. They added only noises.
I added only "--sort=no".
I sent pull-request #2650


--extras=+q

    parser-verilog.r/verilog-sf_bug73_3
    parser-verilog.r/systemverilog-assignment
    parser-verilog.r/systemverilog-2d-array
    parser-verilog.r/systemverilog-github2635
    parser-verilog.r/verilog-sf_bug108_2
    parser-verilog.r/verilog-memleak

emits only non-useful messages.

--fields=+i

no change

--kinds-systemverilog=+Q

no change

@hirooih
Copy link
Contributor

hirooih commented Oct 4, 2020

@antoinemadec,
I've commit the fix for this issue on #2653. Please try it.

@hirooih
Copy link
Contributor

hirooih commented Oct 5, 2020

Let me close this issue.

@hirooih hirooih closed this as completed Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants