Adding autoformatting #328

KarelPeeters · 2024-04-22T17:57:52Z

KarelPeeters
Apr 22, 2024

Intro

I'm investigating how hard it would be to add automating as a feature, as part of #156. This project already has a parser, now we "only" need to add the other direction 🙂.

Inspiration can be taken from rustfmt, notably Design.md and the more up-to-date Contributing.md.

Current state

I noticed the file display.rs has implementations for some, but not all, AST nodes. As far as I can tell statements and anything that would require them (up to and including AnyDesignUnit::Secondary) does not yet have a Display implementation.
Is the intention that this gets extended to cover the entire AST?

Some limitations of the current design:

Indentation is hard-coded per AST node type, which does not extend to arbitrarily nested expressions.
Parenthesis insertion is decided based on precedence levels, potentially overriding user intent.
Comments are dropped, these need to be preserved.
There is no line wrapping, and it's not obvious how to add this.

Future

Some decisions need to be made:

Should the formatter be whitespace only or more all-encompassing (including line formatting and blank lines) like rustfmt?
- I think white-space only is much easier to implement, since we don't need to make any complicated line-breaking decisions.
How much information do we want to add to the AST to enable formatting? For example, comments can be handled in two ways: either add them to the AST and then emit them again, or keep them in a separate data structure and re-insert them during stringifying. The same is true for line breaks.

I'll do some more research and maybe start on a prototype, I just wanted to start the discussion as early as possible.

kraigher · 2024-04-22T18:56:39Z

kraigher
Apr 22, 2024
Maintainer

First of all the Display trait is a dead end for formatting. To do proper formatting you need to have a context such as the current indentation level and probably also settings.

How much information do we want to add to the AST to enable formatting? For example, comments can be handled in two ways: either add them to the AST and then emit them again, or keep them in a separate data structure and re-insert them during stringifying. The same is true for line breaks.

The current architecture has been to not include comments in the AST but rather have each AST be aware of the span of Tokens it consists of. The comments can be extracted from the token span and be kept in this way. Each Token is associated with either leading or trailing comments. The comments are stored in the tokens and are not tokens themselves.

This separation between AST and Tokens is to keep the different layers of the architecture simpler and focused om their role. To have all information in one big structure would be very complex and also all information is not availablein a single pass. VHDL-LS has three clear layers:

Tokens, the result of tokenization
AST, the result of parsing
Named entities, the result of semantic analys (type checking etc)

Layers willhave references to each other, for example AST will reference a span of tokens. AST will also reference named entities which are either decarations or uses of for example signals, entities etc.

Should the formatter be whitespace only or more all-encompassing (including line formatting and blank lines) like rustfmt?

Rust format is a good guiding star to lead the way, its a very nice tool. Of course you got to start somewhere and it will not be as good as rustfmt in the first iterations.

0 replies

Schottkyc137 · 2024-04-22T21:15:44Z

Schottkyc137
Apr 22, 2024
Maintainer

Just want to add that I am very interested in pushing this forward. Should anyone need any help (and I know that implementing such capabilities can mean a lot of work), feel free to contact me

0 replies

KarelPeeters · 2024-04-22T21:59:05Z

KarelPeeters
Apr 22, 2024
Author

Thanks for the feedback @kraigher and for the offer @Schottkyc137. I'm interested in working on this, I'll start doing some initial experiments and see if I get stuck anywhere.

0 replies

Schottkyc137 · 2024-04-23T17:30:20Z

Schottkyc137
Apr 23, 2024
Maintainer

First of all the Display trait is a dead end for formatting. To do proper formatting you need to have a context such as the current indentation level and probably also settings.

How much information do we want to add to the AST to enable formatting? For example, comments can be handled in two ways: either add them to the AST and then emit them again, or keep them in a separate data structure and re-insert them during stringifying. The same is true for line breaks.

The current architecture has been to not include comments in the AST but rather have each AST be aware of the span of Tokens it consists of. The comments can be extracted from the token span and be kept in this way. Each Token is associated with either leading or trailing comments. The comments are stored in the tokens and are not tokens themselves.

This separation between AST and Tokens is to keep the different layers of the architecture simpler and focused om their role. To have all information in one big structure would be very complex and also all information is not availablein a single pass. VHDL-LS has three clear layers:

Tokens, the result of tokenization

AST, the result of parsing

Named entities, the result of semantic analys (type checking etc)

Layers willhave references to each other, for example AST will reference a span of tokens. AST will also reference named entities which are either decarations or uses of for example signals, entities etc.

Should the formatter be whitespace only or more all-encompassing (including line formatting and blank lines) like rustfmt?

Rust format is a good guiding star to lead the way, its a very nice tool. Of course you got to start somewhere and it will not be as good as rustfmt in the first iterations.

One thing that currently hinders a straightforward implementation of such features is that tokens are not always part of the AST. Instead, some AST elements are enriched with the source code information. Do you think that, in the long run, this source code information should entirely be replaced with the token information?
This is a pretty significant change which encompasses the whole stack, making the implementation fairly difficult. Is there a better way?

0 replies

kraigher · 2024-04-24T07:02:41Z

kraigher
Apr 24, 2024
Maintainer

One thing that currently hinders a straightforward implementation of such features is that tokens are not always part of the AST. Instead, some AST elements are enriched with the source code information. Do you think that, in the long run, this source code information should entirely be replaced with the token information?
This is a pretty significant change which encompasses the whole stack, making the implementation fairly difficult. Is there a better way?

Yes I think all AST nodes should have information on which token span they come from in the form of 32-bit indexes into the tokens of the file. In most cases this would mean you could remove the WithPos etc from AST nodes.

Sure it will be a lot of grinding to achive this but I do not see a better way forward. Whay would be the alternative? Somehow adding comments directly to the AST would be even more work, less elegant and would not preserve information about other aspects that a formatter would care about such as extra paranthesis and casing.

0 replies

Shivlocin · 2024-06-16T14:39:27Z

Shivlocin
Jun 16, 2024

Maybe this project is relevant to the conversation.

0 replies

SethGower · 2024-07-22T14:17:08Z

SethGower
Jul 22, 2024

Maybe this project is relevant to the conversation.

There has already been some discussion over there from @kraigher jeremiah-c-leary/vhdl-style-guide#920

0 replies

jeremiah-c-leary · 2024-07-24T12:59:56Z

jeremiah-c-leary
Jul 24, 2024

Morning everyone,

Maybe this project is relevant to the conversation.

I am the creator of this project and have spent over 6 years on it's development. I believe I am at the point where I need to re-evaluate my implementation and was wondering if using rust_hdl would be an option to build upon.

Honestly not sure where to start, but maybe this question will do:

Is there an API documented somewhere? I tried, briefly, to look for something but I did not see a link to one.

Regards,

--Jeremy

0 replies

Schottkyc137 · 2024-07-24T16:20:42Z

Schottkyc137
Jul 24, 2024
Maintainer

Unfortunately there is no stable API yet. Attempts have been made, but not much has happened since quite some time. One of the main reasons (at least according to what I think) is that there are still a few fundamental design decisions that have to be made before one can publish an API. For example, for the formatter, a lot of changes concerning the AST were necessary, mostly to incorporate source locations into the nodes.
You will find the exported documentation for what is available here, but note that what is documented is often lacking.

0 replies

jeremiah-c-leary · 2024-07-25T02:29:16Z

jeremiah-c-leary
Jul 25, 2024

Thanks @Schottkyc137 ,

I looked over this thread and this pull request. It looks like some form of output formatting has been implemented.

How do you see this collaboration working out? My knowledge of rust is limited at best, so code contributions would be out of the question. I do have a lot of experience on the user side of formatting VHDL code, what users are looking for and their use cases.

--Jeremy

0 replies

Schottkyc137 · 2024-07-28T12:49:43Z

Schottkyc137
Jul 28, 2024
Maintainer

So I'm not quite sure what the best model is here. But what is lacking from the formatter as designed in #321 is some way of enabling user input. So basically, what I am looking for is a "configuration blueprint". What is it that users actually want to configure?
I can obviously create this myself, but it would probably reflect my preferences / style. Therefore, I am hoping to somehow build on your expertise in this field. The actual implementation of capabilities described in the config file should actually not be too hard (or at least way easier than building the full formatter, which has been done now)

On of the problem that I see with this "proposal" is that there is a lot of commonality in this project and yours. Essentially, after all is done, vhdl_lang would be able to replace your project. Cannibalising other open source projects is not something that I aim for. Unfortunately, I'm not sure what the best way forward is here.
With my opinion on the matter being outlined, do you see any potential in a mutually beneficial collaboration?

0 replies

jeremiah-c-leary · 2024-07-28T14:58:49Z

jeremiah-c-leary
Jul 28, 2024

Morning @Schottkyc137 ,

So I'm not quite sure what the best model is here. But what is lacking from the formatter as designed in #321 is some way of enabling user input. So basically, what I am looking for is a "configuration blueprint". What is it that users actually want to configure?
I can obviously create this myself, but it would probably reflect my preferences / style. Therefore, I am hoping to somehow build on your expertise in this field.

Users want to configure everything. VHDL does not have a standard formatting guide similar to Python's PEP8. All formatting decisions are left to the user, and like anything left to the users there will be variations. If you have not reviewed the documentation for VSG, here are some examples:

We went through a whole discussion on tabs vs spaces, treating prefixes and suffixes of names differently than the base name. Some requests have been quite surprising to me as I would never have thought to format code in requested manner.

I wrote some documentation on how VSG rules should be created. I wish I had taken the time to create that document before I started coding. I believe there are 900 rules for formatting code, but that is a small fraction that could be created.

On of the problem that I see with this "proposal" is that there is a lot of commonality in this project and yours. Essentially, after all is done, vhdl_lang would be able to replace your project. Cannibalising other open source projects is not something that I aim for. Unfortunately, I'm not sure what the best way forward is here.

I was curious about the roadmap for vhdl_lang. The first goal in the README states:

This project aims to provide a fully featured open source VHDL frontend that is easy to integrate into other tools.

Is adding formatting to vhdl_lang going against this goal? I was under the impression it would parse and lex VHDL files and then provide a data structure which could then be queried and manipulated by another application. For example, I was contacted about using the parser out of VSG to help autogenerate some documentation. I could see applications that create block diagrams, track code statistics, get compile order and register documentation.

The key to enabling other use cases is a good API. If the formatting feature you are adding is a separate tool, then it would force the development of the API so others can use it effectively. At the moment, the formatting is within vhdl_lang, so the community does not benefit. One might argue the best way forward would be to update VSG to use vhdl_lang as its parser.

As for cannibalizing VSG, I have mixed feelings. It has been an amazing journey for me and has pushed my coding skills forward. I have had the opportunity to talk with people from around the world and at places like CERN. At the same time it has taken a large amount of my time and I can not focus on other areas of interest:

Cannibalizing would only occur if formatting was added to vhdl_lang. Whether it should be added would be determined by the road map for vhdl_lang. Without a road map, there is nothing to say what is in or out of scope for this project.

With my opinion on the matter being outlined, do you see any potential in a mutually beneficial collaboration?

I am more than willing to share my experiences in creating VSG and any lessons learned.

Would you want to start with the #321 pull request. I do have questions about the data structure and opinions on how the formatting should be implemented.

Regards,

--Jeremy

0 replies

Schottkyc137 · 2024-07-28T16:54:28Z

Schottkyc137
Jul 28, 2024
Maintainer

Good evening @jeremiah-c-leary

I wrote some documentation on how VSG rules should be created.

I read the document on how to create rules. This is really interesting and might give a good lead. I have also read some of the documentation. It's truly amazing how many different styles there are :D

Is adding formatting to vhdl_lang going against this goal?

In my opinion, it's not. There are two main reasons:

It is the goal of vhdl_lang to be a standalone tool / API, but moreover it is built to support the capabilities of (without actually being dependent on) the Language Server Protocol (LSP) with vhdl_ls being the reference implementation. As LSP supports formatting, vhdl_lang should support that too.
One potential use-case is source-to source conversion, for example, to convert a VHDL 2019 file to VHDL 2008 file (or even earlier). I believe that such use-cases could already be implemented, based on the AST only. However, after one obtains a modified AST, printing that AST to a file is currently not natively supported. Formatting would allow this.

In general, the roadmap resp. goals of this project are – broadly speaking

To be a free and open source language tool for VHDL.
To provide an efficient language front-end for other tools.

The use-cases you outlined (e.g., block diagrams, code statistics, compile order, e.t.c.) fall under the second category. I find both of these use-cases interesting, but will mostly work on the former (out of personal interest). That being said, I'm also more than happy to help and put work into creating a good API if there are concrete use-cases and helpful user input.

Cannibalizing would only occur if formatting was added to vhdl_lang. Whether it should be added would be determined by the road map for vhdl_lang. Without a road map, there is nothing to say what is in or out of scope for this project.

I will definitely add the formatting capabilities (and I have merged the PR). It's simply too much work that has gone into the tool. In what capacity this is going to be integrated into the whole eco-system is another question though.

Would you want to start with the #321 pull request

Even though it's already merged I would really love to hear your opinions and I'm happy to answer any questions of yours.

Best,
Lukas

0 replies

LukasVik · 2024-08-05T13:08:58Z

LukasVik
Aug 5, 2024

Hello Lukas, Jeremy, and others,

I tried the formatting from the latest master branch. Worked well on the 30k lines of code across 160 files that I tested it on. Really cool to see, great job!

Some thoughts:

I think that supporting user-configured style is a black hole of endless work that will never be finished. I prefer the approach of the Python formatter black with a single pre-defined style and almost no user switches.

a. I think that having a fast, functional and reliable formatter is the main goal of most people. Having it produce a consistent style of correctly-formatted code is more important than producing "exactly the exotic style that I prefer".

b. Supporting user-configured style settings adds an incredible amount of complexity. It is highly probable that this complexity will result in a tool that is not reliable.

c. In the end, I don't think everyone is gonna be happy anyway. There's always gonna be corner cases and people with crazy preferences. For me, it's a lesson learned in life and in open source that you're never gonna make everyone happy. Better to focus on making 80% of people happy.
In my opinion, step one should be to have a fast and stable formatter that produces good-looking code. Current master is almost all the way there. Step two should be to add support for automatic line-breaking of statements so the file fits within a user-specified line length. Like black does.

a. To me, this is a way more interesting feature, and adds way more value, than supporting 10 M different combinations of user style preferences.
I think that tools and projects should be neatly scoped. To avoid scope creep, bloat, and to keep complexity down. Think specifically here about the division between formatter and linter. For example, rules about e.g. generic/port/signal name prefixes do not, IMO, belong in a formatter.
From my perspective, it makes more sense if the formatter is a separate project with a separate binary. Some of my reasoning:

a. Some tools that could be built on rust_hdl: formatter (like black), linter (like pylint), import sorter (like isort), documentation extractor, 2019-to-2008 converter, etc. I don't think anyone can make the case that all of this should be in the rust_hdl repo and compiled into the vhdl_lang binary. Most people would argue that they should be separate projects with separate binaries. And if some of them should be separate then I think, logically, all of them should.

b. Main reason for this is of course scoping and complexity.

b. In a typical VSCode Python setup, most LSP duties are handled by pylance, but formatting is handled by black. Nothing wrong with this division.

c. It has been a large historical problem in the VHDL world that there is no generic reliable parser that we can build tools on. Instead every tool, e.g. each formatter, has built its own (limited and application-specific) parser first. Making a formatter in the same repo and tightly-knit with the parser seems to me like repeating a historical mistake.

d. Making the formatter as a separate tool is a great way to showcase how to build "third-party" tools on top of rust_hdl.

I would be very happy to assist in discussions, code reviews, trying new things out, etc. Other than that, I don't really have the Rust knowledge nor the computer science or compiler tech background to understand these things. And I can't allocate time for even more open source work. So unfortunately, I won't contribute any code, but I hope I can be of assistance anyway.

0 replies

Schottkyc137 · 2024-08-05T20:01:17Z

Schottkyc137
Aug 5, 2024
Maintainer

Hi @LukasVik

First of all thanks a lot for the detailed comment! Getting this kind of feedback is always massively helpful and they will shape the future of this project.

Some comments of mine:

I think that supporting user-configured style is a black hole of endless work that will never be finished. I prefer the approach of the Python formatter black with a single pre-defined style and almost no user switches.

I'm a bit on the edge on this. On the one hand I understand the push to have a simple to use configuration and "simply call a tool" that will format a VHDL file in a reasonable manner. On the other hand, @jeremiah-c-leary's point is also valid that VHDL is not python and does not have a style guide. Consider, for example, the following piece of code:

port (
    clk : in std_logic;
    reset: in std_logic;

    s_axis_tdata: in std_logic_vector(7 downto 0);
    s_axis_tvalid: in std_logic;
    s_axis_tready: in std_logic;
);

In my opinion, there are at least three ways that this code could be formatted:
a) Indent everything

port (
    clk                     : in std_logic;
    reset                 : in std_logic;

    s_axis_tdata    : in std_logic_vector(7 downto 0);
    s_axis_tvalid   : in std_logic;
    s_axis_tready : in std_logic;
);

b) Indent per block

port (
    clk     : in std_logic;
    reset : in std_logic;

    s_axis_tdata   : in std_logic_vector(7 downto 0);
    s_axis_tvalid   : in std_logic;
    s_axis_tready : in std_logic;
);

c) Don't add any extra white space.

If I were to guess, I would say that the preference is split between the three options and there are valid reasons to use either. Implementing one option would then not yield the 80%-20% satisfaction rate but rather a 33%-66% rate.

That being said I agree that a configuration file that is too detailed and allows too much customisation can be a huge bottleneck.
For example, I won't implement tabs as separators and only have whitespaces.

Therefore, my current stand is to add some configuration that is designed from the bottom up with lessons learned from other projects and with much user input while not overdoing it. But I'm happy to discuss this.

Step two should be to add support for automatic line-breaking of statements so the file fits within a user-specified line length. Like black does.

I agree on this one. Line length limitation is actually not an easy task to tackle and might take as long as the current formatter (or even longer). Tackling this issue may also show limitations in the current formatter design.

I think that tools and projects should be neatly scoped. To avoid scope creep, bloat, and to keep complexity down. Think specifically here about the division between formatter and linter. For example, rules about e.g. generic/port/signal name prefixes do not, IMO, belong in a formatter.

This I agree fully with. In my opinion, the formatter should leave the tokens 100% unchanged. This includes capitalisation, or "fixing" end labels, e.t.c. Tasks like those belong to a linter.

From my perspective, it makes more sense if the formatter is a separate project with a separate binary. Some of my reasoning:

I'm also on the edge on this one, but you are making a good point. My main reason to keep the two projects unified is that a lot of the tools that people might want to build will emit VHDL code (examples include a 2019 to 2008 converter, conversions from other HDL languages or the import sorter you mentioned). It's nice if you only have one library that you must include to do such tasks and not multiple.
But on the other hand I agree that such tools would probably then include two libraries; one for analyzing VHDL code and one for formatting. Other tools (like documentation extraction) would not include the formatter. This would mean that the language server then depends on the language frontend and the formatter.

0 replies

jeremiah-c-leary · 2024-08-07T03:20:25Z

jeremiah-c-leary
Aug 7, 2024

Evening @LukasVik and @Schottkyc137,

I prefer the approach of the Python formatter black with a single pre-defined style and almost no user switches.

That being said I agree that a configuration file that is too detailed and allows too much customisation can be a huge bottleneck.
For example, I won't implement tabs as separators and only have whitespaces.

The problem with a strongly opinionated formatter is whose opinion is it based on. VHDL does not have an equivalent of PEP8 or rusts style guide. I run black on my code base and it changes all of my single quotes to double quotes. Why? PEP8 says you can use either as long as you are consistent. It irritates me every time I look at my code.

a. I think that having a fast, functional and reliable formatter is the main goal of most people. Having it produce a consistent style of correctly-formatted code is more important than producing "exactly the exotic style that I prefer".

I believe the VSG's user base would disagree. You just have to look at the amount of configuration VSG provides to see the demand. Even though VSG would be slower, it would still be preferred over a formatter that did not match a particular style. I could be wrong though. It could be that VSG currently fills a need and most people would not care about the formatting.

In the end, I don't think everyone is gonna be happy anyway. There's always gonna be corner cases and people with crazy preferences. For me, it's a lesson learned in life and in open source that you're never gonna make everyone happy. Better to focus on making 80% of people happy.

Would it be 80%, or 20% or 1%. We do not know. We do know an opinionated formatter will have a smaller adoption than a configurable one.

I think that tools and projects should be neatly scoped. To avoid scope creep, bloat, and to keep complexity down. Think specifically here about the division between formatter and linter. For example, rules about e.g. generic/port/signal name prefixes do not, IMO, belong in a formatter.

I am struggling to see the difference between a formatter and a linter. If a formatter changes anything, then it essentially detected a DRC error and made a change. The formatter is acting like a linter. The DRCs are implied in the re-formatting. It is just not reporting anything explicitly to the user.

Supporting user-configured style settings adds an incredible amount of complexity. It is highly probable that this complexity will result in a tool that is not reliable.

It would add complexity, but I did not find it too complex. Can you elaborate on how the complexity would make the tool unreliable?

Step two should be to add support for automatic line-breaking of statements so the file fits within a user-specified line length.

How would you know the context to make any reasonable decision? There is nothing more irritating than fighting a tool that insists it knows better than you. Now you need to decide on the rules for breaking lines. What would they be? I have thought about this for VSG and decided not to implement such a feature. I figured the user had their reasons for their line length violations. Those can be fixed in code reviews or pull requests.

From my perspective, it makes more sense if the formatter is a separate project with a separate binary. Some of my reasoning:

I agree with this statement. The formatter, or any other tool, should be separate from rust_hdl. This would force the API to mature and become usable for others to build on. To gain adoption, you have to start somewhere.

Therefore, my current stand is to add some configuration that is designed from the bottom up with lessons learned from other projects and with much user input while not overdoing it. But I'm happy to discuss this.

I'm also on the edge on this one, but you are making a good point. My main reason to keep the two projects unified is that a lot of the tools that people might want to build will emit VHDL code (examples include a 2019 to 2008 converter, conversions from other HDL languages or the import sorter you mentioned).

My suggestion would be to have rust_hdl provide a data structure for other tools to process. All whitespace and comments must be stored in this data structure. Formatting tools would then use the API to make adjustments to the data structure: case rules, whitespace rules, alignment rules.

If you keep the formatting in the data structure, you can then call a rendering function to output a VHDL file. I could see the renderer being part of rust_hdl and called form the API. The renderer doesn't care about the formatting, as that was handled by a formatter, and just dumps the contents of the data structure.

My opinion is rust_hdl should focus it's efforts on parsing VHDL and providing a valuable API to interrogate and manipulate it. Everything else should be handled in a different project. If a compelling API can be created, people will use it.

Regards,

--Jeremy

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding autoformatting #328

{{title}}

Replies: 16 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Adding autoformatting #328

Intro

Current state

Future

Replies: 16 comments

kraigher Apr 22, 2024 Maintainer

Schottkyc137 Apr 22, 2024 Maintainer

KarelPeeters Apr 22, 2024 Author

Schottkyc137 Apr 23, 2024 Maintainer

kraigher Apr 24, 2024 Maintainer

Schottkyc137 Jul 24, 2024 Maintainer

Schottkyc137 Jul 28, 2024 Maintainer

Schottkyc137 Jul 28, 2024 Maintainer

Schottkyc137 Aug 5, 2024 Maintainer

kraigher
Apr 22, 2024
Maintainer

Schottkyc137
Apr 22, 2024
Maintainer

KarelPeeters
Apr 22, 2024
Author

Schottkyc137
Apr 23, 2024
Maintainer

kraigher
Apr 24, 2024
Maintainer

Schottkyc137
Jul 24, 2024
Maintainer

Schottkyc137
Jul 28, 2024
Maintainer

Schottkyc137
Jul 28, 2024
Maintainer

Schottkyc137
Aug 5, 2024
Maintainer