-
Notifications
You must be signed in to change notification settings - Fork 9
semantic: Discrepancy in parsing of header comments #56
Comments
This is how Go identifies which comments correspond to which node: if the comment is immediately before the line, it's attached to a node. In terms of Babelfish, this may be considered a bug, but not for the Native mode, since it's by definition "whatever the native parser produces". However, I think the same is true for Semantic, so we should place comments to corresponding places at lest in that mode. |
Okay, updating the title to |
It's somehow similar to how CDT handles comments in cpp-driver. I think we've already discussed it, and we decided that it's hard to define which comment belongs to what statement moreover what does it mean that comments belongs to some nested node in tree. |
@kuba-- Correct, but I was thinking about CDT issue in terms of Native mode. Then we can add a processing stage for Semantic to place those comments (as separate nodes) inside the tree based on positions. Same can be done for Go here. But we may need to pull some comments out from existing nodes first. |
But I'm not sure if it's really needed. Comments are comments: text + position. I'm not sure if we really need to complicate it. So far (based on use cases), I don't see the reason. Personally, I would keep them as a flat, separate branch from root node. |
Yeah, I agree about simplicity of traversal. This is one of the reasons why I advocate for graphs: comments may point to nodes, not nodes to comments. This way they won't appear in AST during the traversal, but are still easily accessible. But for now we have to work with trees. Having a flat list of comments with positions works for advanced users that can map them back to nodes, but not for a simple use case. For example, if you open play.bblf.sh and paste the code with comments, you will expect them to appear near relevant nodes (but not in Native, as the issue points out). |
I would ask opposite question - do people really care where they see comments in tree. I know what we have in bblfsh playground, so far. My question is what could be wrong if we change it? |
From a user's point of view I would say this. Currently in some drivers, this one included, we have both implemented:
Now, I would tend to say you ought to keep only the first and actually generalize this, in semantic mode, to all drivers. That is because as you said it's easy to map the comments using positional information if need be, comments do not need to be augmented with structural information imo, and I tend to think that for a lot of use cases one would simply extract all comments. That being said, I use Babelfish often, so I might be biased, and I can see how having comments at their correct position might help. What I think is that, unless you want to follow a data-sparse approach in which case picking one or the other should be done, then there is no problem with having both. But in that case, I think it would be good to say so in the doc, possibly port the flat representation to drivers where it does not exist, and finally fill the gaps where they exist, e.g. here where some comments appear only in the flat representation - because these kind of discrepancies will definitely confuse users. |
Mapping comments to syntactic constructs is very tricky to get right, and relies on conventions that differ across languages. There is no single rule that gives the "right answer" across languages, because the way people write comments feeds back on how the tools interpret them—meaning you write comments in different layouts if you expect different interpretation. If we want to pick a universal rule, it's going to have to be pretty simplistic: Having a flat list of comments for the whole file, for example, should always work—even if we have to synthesize it for languages that don't provide it natively. The corner cases about what counts as "adjacent" are very tricky, and are not treated the same way between tools like Doxygen, Javadoc, gofmt, clang-format, and so on. I agree we should try to provide users with a reasonable experience, but I don't foresee being able to pick a simple rule that accomplishes that. I did an extensive survey of comment layout rules at my last job, and if I still have the notes from it I will try to post them somewhere so you can get a sense of how annoyingly inconsistent people are about their expectations. 😀 |
Basically, if we have:
Then:
comment 1
,comment 2
,comment 3
andcomment 4
go in theComments
attribute of the root nodecomment 1
andcomment 2
go in theDoc
attribute of the root nodecomment 3
andcomment 4
go in theDoc
attribute of theGenDecl
node in the root node'sDecl
attributeHowever, if we have:
Then:
comment 1
,comment 2
,comment 3
andcomment 4
still go in theComments
attribute of the root nodecomment 2
goes in theDoc
attribute of the root nodecomment 4
goes in theDoc
attribute of theGenDecl
node in the root node'sDecl
attributeBasically, any comments placed before a newline before the
package
/import
keywords get removed from theDoc
attributes of the respective nodes.The text was updated successfully, but these errors were encountered: