Sphinx builder and autodoc plugin for HDL #244
My very quick reaction (before going through the details of your message) is: that has been one of @Paebbels' main goals for years. As you can see in Paebbels/pyVHDLParser: Use Cases, he designed the parser the way he did because he wanted to preserve all the original content and produce a Document Object Model (DOM) for documentation purposes. Long story short, in Dec 2020 - Jan 2021 he split vhdl/pyVHDLModel from pyVHDLParser, and then, in a joint effort with Tristan, they created pyGHDL.dom, which is pyVHDLModel + libghdl. In other words, pyGHDL uses GHDL as a frontend (or backend) to fill the abstraction provided by pyVHDLModel. Note the "Consumers" in vhdl/pyVHDLModel: Use Cases. NOTE: In order to be 100% usable for pyVHDLModel, some enhancements are required in libghdl (GHDL), because by default GHDL ignores e.g. all the comments (which are not needed for simulation or synthesis). Therefore, libghdl needs some "third elaboration mode" which preserves all the details for documentation purposes. Another related enhancement is the resolution of all the symbols. That is all slow work in progress, because pyGHDL.dom is a very exciting use case of GHDL, but not the main one. pyVHDLModel and pyGHDL.dom have been usable since July 2021. A few months ago, I did some proofs of concept to showcase how they can be used. See Open Source Verification Bundle (OSVB): Project:
As the name suggests, OSVB: Documentation generation » Integration with Sphinx » Lists and tables shows lists and tables for entities, ports, generics, architectures, etc., generated automatically with Python using pyGHDL.dom and tabulate. Furthermore, subsection VHDL Domain contains references to "proper" Sphinx extension projects: I suggest reading VHDL/VHDLDomain#4, which is the "latest" discussion regarding the syntax and directives to use/create. Then, read/check CESNET/sphinx-vhdl. See also the last commit in https://github.com/umarcor/osvb/commits/sphinx-directives, which is something I drafted together with Patrick this summer, after chatting with @bradleyharden. On a different but related topic, see subsection Diagrams. Apart from the references to SymbiFlow's sphinxcontrib-hdl-diagrams (using ghdl + yosys), Symbolator and pyHDLParser were recently forked to org hdl. There is naturally some overlap between pyHDLParser, pyVHDLModel and pyVHDLParser. We would like to plug pyVHDLModel into Symbolator, in order to decouple it from the parser. Last, but not least, although we are focused on VHDL, there is also pySVModel (see edaa-org/pySVModel#11).
@GlenNicholls I would be happier with
Distribution is probably the second most complex challenge here. The first one is knowing/learning how to write proper Sphinx extensions, since the Sphinx codebase/architecture is not the most comfortable to work with. Then, since HDLs are such complex languages, it's not easy to have an extension that works through pip only. In practice, the extension will depend on some compiled/shared library. That is true for VHDL and also for SystemVerilog, because in both cases the most complete open source parsers are written in compiled languages. Therefore, most projects trade functionality for distribution complexity. VUnit uses a regexp parser in Python, which is known to be limited, but it avoids depending on anything other than Python. In fact, we discussed replacing the parser with rust_hdl in the past, and the main reason not to do so was that it is written in Rust, a compiled language. Similarly, TerosHDL uses tree-sitter mainly because it's written in JavaScript, and the current implementation tries to use as much JS as possible, so that it can be installed as "just a VSCode extension". pyVHDLParser would be a very interesting solution for both VUnit and TerosHDL, because it would require Python only. TerosHDL is mainly written in JS, but it uses VUnit for relevant features, hence the dependency on Python is there already. Unfortunately, pyVHDLParser was started more than a decade later than GHDL, and it's not ready-to-use per se. That's why, a year ago, we decided to split pyVHDLModel from pyVHDLParser and plug it into GHDL (libghdl). We wanted to take the best (most complete) piece from inside pyVHDLParser and make it usable with the best open source VHDL parser/analyzer/elaborator (GHDL). NOTE: GHDL, rust_hdl and pyVHDLParser all evolved naturally to a very similar internal architecture based on tokens and multiple phases. The main differences between them are the implementation language and how many person-hours were devoted to their development.
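To make the trade-off concrete, here is a deliberately naive regexp extractor in Python. It is a hypothetical sketch, not VUnit's actual parser: it needs nothing but the standard library, but it already reports an entity that only exists inside a comment and silently misses ports declared as `a, b : in ...`, which is the kind of limitation that pushes projects toward real parsers.

```python
from __future__ import annotations

import re

# Deliberately naive patterns (hypothetical, NOT VUnit's actual parser).
ENTITY_RE = re.compile(r"\bentity\s+(\w+)\s+is\b", re.IGNORECASE)
PORT_RE = re.compile(
    r"^\s*(\w+)\s*:\s*(in|out|inout|buffer)\s+([\w.]+)",
    re.IGNORECASE | re.MULTILINE,
)


def naive_entities(source: str) -> list[str]:
    """Return every 'entity <name> is' match, even inside comments."""
    return ENTITY_RE.findall(source)


def naive_ports(source: str) -> list[tuple[str, str, str]]:
    """Return (name, mode, type) for ports declared one name per line."""
    return [(name, mode.lower(), typ) for name, mode, typ in PORT_RE.findall(source)]


SOURCE = """\
-- entity commented_out is
entity and_gate is
  port (
    clk  : in std_logic;
    a, b : in std_logic;
    z    : out std_logic
  );
end entity;
"""

print(naive_entities(SOURCE))  # the commented-out entity is reported too
print(naive_ports(SOURCE))     # 'a' and 'b' are silently missed
```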
The challenge is "how to install the parsers to extract data from sources", which is independent of the type of extension we talk about (documentation as text, entity/component block symbols, post-synthesis block diagrams, GUIs such as hwstudio...). For instance, SymbiFlow/sphinxcontrib-hdl-diagrams currently supports Verilog only, because it installs Yosys through WASM-pip (YoWASP) or Conda, and Yosys by default does not support SystemVerilog or VHDL. There is work in progress to support installing the tools through other mechanisms (SymbiFlow/sphinxcontrib-hdl-diagrams#72), but then we have the same issues we find when distributing tools in general (e.g. SymbiFlow/sphinxcontrib-hdl-diagrams#73). To me, the dependency on GHDL (and Surelog) is perfectly acceptable from a pragmatic point of view, even though the distribution burden is increased. For my use cases, Yosys is not useful without ghdl-yosys-plugin. I need to be able to simulate the pre-synthesis model and verify its behaviour before I deal with post-synthesis results. For that same reason, I would not personally get a meaningful benefit from replacing pyGHDL.dom with pyVHDLParser, or from having GHDL with mcode on YoWASP. It would streamline some containers, but I would need "a regular GHDL" in my stack anyway (because I need to simulate, cosimulate and test the design I'm going to synthesise and document). Nonetheless, I understand that not all the target users of TerosHDL and VUnit depend on GHDL, since some of them might use only Siemens', Aldec's, Xilinx's or Intel's tools.
HDL tooling has been blocked for years (decades) because parsing and analysing is very hard, and it's impossible to build tooling without a library that understands the language. In parallel, we can and should work on Sphinx directives, because those can be used manually and then plugged into an autodoc. For instance, in GHDL's docs we declare everything manually, because it's an Ada program documented using Sphinx.
This is something we also found when talking about the preliminary implementation of pyEDAA.ProjectModel, first between Patrick and me, and then with @ktbarrett. Terms "Project" and "Design" fall short to describe all the kinds of "things" that we want to work with.
A single "Project" might have multiple independent "Designs", each of them with their shared and unique sources and "Targets" for testing, synthesis and/or implementation. The meaning of "target" varies, e.g. if we are talking about simulation entrypoints: OSVVM, cocotb and VUnit have very different entrypoints at the moment. pyEDAA.ProjectModel uses pyVHDLModel and pySVModel internally, so that filesets can be declared only once and then used for simulation, synthesis or documentation purposes. So far so good. There is work to be done yet, but it's doable on the Python side. However, as explained, the main usable open source VHDL-2008 parser and elaborator is currently GHDL, and it does not distinguish between Projects and Designs. For GHDL (libghdl) everything is a single "whatever": all the libraries belong to a single design of a single project. Therefore, there is some complexity to deal with when "running" pyGHDL.dom to "auto-document" the whole content of a ProjectModel instance. That's something we might deal with on the Python side, or GHDL (libghdl) itself might be enhanced to support it. Another related enhancement in this regard is dependency/hierarchy resolution, as done by VUnit, used by TerosHDL and wanted by FuseSoC (to allow incremental compilation/analysis). The pyEDAA.ProjectModel -> pyVHDLModel -> pyGHDL.dom -> libghdl chain can allow us to use GHDL for resolving the symbols and the hierarchy. That is on the table, but not actively developed at the moment. In order to boost it, we (as a community) need to buy time for Tristan or train other developers who can code Ada (or Rust) and have a strong compiler and hardware design background (such a unicorn). Without going into details, the distribution of components/modules also has to be taken into account.
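Dependency/hierarchy resolution ultimately boils down to ordering design units so that each one is analysed after the units it uses. A minimal sketch with Python's standard `graphlib` (3.9+), assuming the dependency sets have already been extracted by some parser; the unit names below are made up:

```python
from graphlib import TopologicalSorter

# Hypothetical design units mapped to the units they depend on, e.g. collected
# from "use work.pkg.all;" clauses and entity instantiations by some parser.
DEPENDENCIES = {
    "work.top": {"work.and_gate", "work.pkg"},
    "work.and_gate": {"ieee.std_logic_1164"},
    "work.pkg": {"ieee.std_logic_1164"},
    "ieee.std_logic_1164": set(),
}


def compile_order(deps):
    """Return design units ordered so that each unit comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


print(compile_order(DEPENDENCIES))
```

`static_order()` raises `graphlib.CycleError` if the dependencies are circular, which a real tool would need to report to the user.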
Some projects rely on git submodules for dependency control, others use specific tools such as FuseSoC or apio, and the industry standard is IP-XACT (see pyEDAA.IPXACT and edaa-org.github.io/pyEDAA.IPXACT: References). Intersphinx is a powerful feature for building tightly cross-referenced but distributed documentation sites (e.g. ghdl, ghdl-cosim, edaa-org, hdl/constraints, openFPGALoader, etc.); however, we need to "resolve" the dependency mechanisms in order to automate it.
Maybe this is something to be proposed to the VHDL WG. The PAR is open until 2024? 2025? Now is a good time to discuss whether we want to have some docstring format in the language. It is sensible to have that discussion in parallel with developing a set of Sphinx directives. The comments used by TerosHDL, which seem to be inspired by doxygen, can be a good starting point for the discussion. If it does not suit the WG, it can be done in the Open Source VHDL Group (next to pyVHDLModel).
That's what I thought as well. However, a few days ago I had to document some dataclasses, and I found the following syntax to be required: https://github.com/hdl/containers/blob/d50f46d67aa4494dfdb4a01abc67c7e49bf9efea/utils/pyHDLC/__init__.py#L70-L81. Note that
I prefer to have the documentation as close as possible to what you are documenting. E.g., I don't feel comfortable with Anyway, I believe this might be thinking too far now. IMHO, we should first deal with what's part of the language(s) already, and have directives to document either manually or automatically. Then, we might better see which is the information that needs to be provided through comments because the language cannot hold it otherwise.
I think that the Python approach is OK: have a docstring for the file. It might be:
```vhdl
--| This is the docstring of a VHDL file.
--|
--| It's multiline!
--: IEEE library
library ieee;
--: Standard context from the IEEE library
context ieee.ieee_std_context;
entity and_gate is
  --| Documentation for and_gate.
  --|
  --| ``reST`` is supported and can link to other modules :data:`~lib.pkg.SOME_CONST`.
  port (
    --: The clock.
    clk : in std_logic; -- this is a regular comment
    --: Enable.
    --:
    --: This can also be multiline!
    -- But this part is not included in the documentation.
    -- It can be multiline as well.
    -- It's just a regular comment after all.
    en : in std_logic;
    z : out std_logic -- this output is undocumented
  );
end entity;
```
Note that
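The placement rules in this proposal (`--|` documents the file or the enclosing entity, `--:` documents the next declaration, plain `--` is ignored) are simple enough to prototype line by line. The following Python sketch is hypothetical: it uses regular expressions instead of a real VHDL parser, so string literals containing `--`, multiple entities per file and many other constructs are not handled.

```python
from __future__ import annotations

import re

DECL_RE = re.compile(r"^\s*(\w+)\s*:")  # port/generic/signal: "name : ..."
ENTITY_RE = re.compile(r"^\s*entity\s+(\w+)\s+is\b", re.IGNORECASE)


def extract_docs(source: str) -> dict[str, str]:
    """Collect '--|' and '--:' documentation comments, keyed by what they document."""
    docs: dict[str, list[str]] = {}
    scope = "<file>"         # '--|' lines attach to the file, then to the current entity
    pending: list[str] = []  # '--:' lines wait for the next declaration
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("--|"):
            docs.setdefault(scope, []).append(line[3:].strip())
        elif line.startswith("--:"):
            pending.append(line[3:].strip())
        elif not line or line.startswith("--"):
            continue  # blank lines and regular comments are not documentation
        else:
            code = line.split("--", 1)[0].strip()  # drop a trailing regular comment
            entity = ENTITY_RE.match(code)
            if entity:
                scope = entity.group(1)
            if pending:
                decl = DECL_RE.match(code)
                key = decl.group(1) if decl else code.rstrip(";")
                docs[key] = pending
                pending = []
    return {key: "\n".join(lines).strip() for key, lines in docs.items()}


SAMPLE = """\
--| This is the docstring of a VHDL file.
--|
--| It's multiline!
--: IEEE library
library ieee;
entity and_gate is
  --| Documentation for and_gate.
  port (
    --: The clock.
    clk : in std_logic; -- this is a regular comment
    --: Enable.
    -- But this part is not included in the documentation.
    en : in std_logic;
    z : out std_logic -- this output is undocumented
  );
end entity;
"""

for key, text in extract_docs(SAMPLE).items():
    print(f"{key}: {text!r}")
```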
I believe this is more related to the theme than to Sphinx itself. Most Sphinx themes have very little JavaScript (they are mostly HTML + CSS). So, the challenge is finding a frontend developer who can adapt a theme. Another interesting feature in this regard would be pan-and-zoom viewers for diagrams, because hardware designs can produce many large and complex representations.
Rather than a declarative configuration file, I would focus on a Python object/class, i.e. on pyEDAA.ProjectModel. That can be read in the
I suggest we focus on the very last step: how to represent VHDL semantics in terms of Sphinx language domain features. Note: pyVHDLModelUtils converts pyVHDLModel to reStructuredText, not to Sphinx. I.e., it's not an extension but "Python code executed at build time".
Some quick points that come to mind on my end. As mentioned by @JimLewis, I quite like the approach of Colibri, which allows adding a special character at the end of the comment symbol in order to specify that it is a comment for documentation, like As of today, we have started using this feature of Colibri at work with nice results, but there are a few points that could be improved. Off the top of my head, these are my pros/cons about the current Colibri way of working:
Support of Markdown and Wavedrom
The fact that Colibri works in plain Markdown (easy to learn, understand, write) is great. The fact that it supports Wavedrom syntax natively is an absolute gem for documentation. If possible, a documentation system should use it too. (We've done some insane diagrams with this.)
In-source documentation
The fact that the documentation is in source "only" is by itself a bit of an issue. Hence a question: does Sphinx support having documentation for an arbitrary element from outside of the source file? I don't remember exactly, but I believe that either doxygen or Sphinx allows this. In the same vein, I wouldn't like having extensive blocks of documentation between ports. A one-liner at the end of the port declaration would be my go-to.
Ports documentation
Please take into account that some IOs of a module might require extensive documentation. While some trivial ports (the system clock, for example) might be fine with a one-liner, others might be related to timings, protocols and such.
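For the one-liner style, extracting a trailing documentation comment per port is straightforward. A minimal sketch, assuming the Colibri-style `--!` marker is written after the declaration on the same line; it splits on the marker first so that types such as `std_logic_vector(WIDTH-1 downto 0)` are not confused with comments. Standalone `--!` lines placed before a port are not handled here.

```python
import re

# "name : ..." or "a, b : ..." at the start of a declaration.
DECL_RE = re.compile(r"^\s*(?P<names>\w+(?:\s*,\s*\w+)*)\s*:")


def trailing_port_docs(source):
    """Map each port name to the text of a trailing '--!' comment on its line."""
    docs = {}
    for line in source.splitlines():
        code, marker, doc = line.partition("--!")
        if not marker:
            continue  # no documentation marker on this line
        match = DECL_RE.match(code)
        if match:
            for name in match.group("names").split(","):
                docs[name.strip()] = doc.strip()
    return docs


PORTS = """\
port (
  clk  : in std_logic;                           --! System clock
  a, b : in std_logic_vector(WIDTH-1 downto 0);  --! Operands
  z    : out std_logic -- regular comment, not documentation
);
"""

print(trailing_port_docs(PORTS))
```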
Is a great thing. This table could act as a summary and link to more extensive documentation, should it exist.
I'm away; next week I will write a more detailed comment. @suzizecat, about the external documentation: is it similar to this? https://terostechnology.github.io/terosHDLdoc/documenter/start.html#special-tags
@qarlosalberto Yay! (Didn't even know you had those.) Therefore you can have the whole documentation separated from the code (including, for example, documentation for internal signals and such).
There is a wavedrom extension for Sphinx already. Maybe docstrings can have an optional language keyword, similarly to code blocks in markdown or rst, so that users can specify the markup language used in the docstrings.
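If the markup language were encoded in the comment marker itself, a tool could classify each documentation line as it reads it. The mapping below (`--|` for reST, `--!` for Markdown) is purely hypothetical; no such convention exists in VHDL today.

```python
# Hypothetical marker-to-language mapping; VHDL defines no such convention today.
MARKERS = {"|": "rst", "!": "markdown"}


def classify_comment(line):
    """Return (language, text) for a documentation comment, or None otherwise."""
    stripped = line.strip()
    if not stripped.startswith("--") or len(stripped) < 3:
        return None
    language = MARKERS.get(stripped[2])
    if language is None:
        return None  # plain "-- ..." comments and "-----" separators
    return language, stripped[3:].strip()


for sample in ["--| See :data:`SOME_CONST`.", "--! **Bold** Markdown", "-- plain comment"]:
    print(repr(sample), "->", classify_comment(sample))
```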
Markdown was designed for single-file documents. Therefore, there is no standard solution to include other files. However, Sphinx and reStructuredText do support For example, fields "Constraints" and "openFPGALoader" in https://hdl.github.io/constraints/Data/Boards/ are generated automatically from YAML sources located in multiple repositories, but shown together (per board). Another example is http://vunit.github.io/examples.html#vhdl, where all the content (the description of each example) is extracted from the docstrings in
The doxygen style of comments ( I don't suggest using Markdown as a documentation language, because it's not powerful enough for most purposes. A language like reStructuredText (reST) is needed. Maybe a "pragma" or global option should define how documentation comments are interpreted, because Markdown can't be distinguished from reST content. Alternatively, the third character after the double comment characters could denote the documentation language used. There is no need for different comment markers before and after an object: other languages have managed to define rules so that the location of a comment defines which object it annotates. As @umarcor wrote, there are currently 2 approaches to get comments from VHDL files:
Why is it on hold?
First, I want to say that everyone here is much more active and invested in the VHDL ecosystem than I am. So take my opinions for whatever you think they are worth. That being said, I see a lot of what I would consider very preliminary bikeshedding going on in this thread.
All of these things are trivial details that could be decided and changed later. In my opinion, there are three major pieces that deserve the most attention:
There are several potential options for (1). As @umarcor mentioned, distribution of the selected tool could be a challenge. But overall, this area is the most developed of the three and the easiest to get working. (2) is what I was trying to get at in Paebbels/sphinxcontrib-vhdldomain#4. I would consider this the biggest challenge. This should be your main focus. (3) can't be handled until (2) is handled anyway.
I agree with this 100%. Back in 2019, I spent a few weeks working my way through the Sphinx codebase to understand how Python documentation is generated with an eye toward VHDL. I tried to capture some of that in the linked issue. I came away with a few conclusions:
I would absolutely love a Sphinx plugin for VHDL. But creating it will be no small project. Before discussing anything else, I would strongly suggest gaining a better understanding of Sphinx and reStructuredText. Otherwise, it will be very easy to accidentally design a reST syntax that ends up being difficult to parse and translate into the correct reST nodes. In my opinion, everything else in this thread can and should take a back seat to the development of a plan for reST and Sphinx. Edit: I just took a quick look at my old post. I want to highlight this one particular line, because I think it's important for this discussion:
According to one of the Sphinx developers, the .NET domain is the most complex domain they have done and could serve as a starting point for VHDL. Other domains might be too simple. But as @bradleyharden said, Sphinx might be the main challenge after somehow getting comments from VHDL files. Even with a regexp approach like VUnit's, we would get quite decent results for comment extraction, but we would still have no VHDL domain in Sphinx.
I like this idea overall. My one counterpoint would be that none of the software languages may be a good model for VHDL. I would suggest reading the .NET domain implementation while focusing on the broader Sphinx architecture. When I went through the Python implementation, there were a lot of parts that wouldn't really work for VHDL entities. I did learn a bit about how Sphinx nodes are created and recursively processed, but I wish I had understood what to focus on before I started.
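Much of a Sphinx domain's work happens when a directive's signature line is turned into nodes (in Sphinx this is `ObjectDescription.handle_signature`). Keeping the signature syntax close to VHDL itself makes that step easy. A minimal sketch of parsing a hypothetical `.. vhdl:port:: clk : in std_logic` signature into its parts; the directive name and the grammar (`name : [mode] subtype [:= default]`) are assumptions for illustration, and building the actual nodes is left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical signature grammar for a ".. vhdl:port::" directive:
#   <name> : [<mode>] <subtype> [:= <default>]
SIG_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z]\w*)\s*:\s*"
    r"(?:(?P<mode>in|out|inout|buffer)\s+)?"
    r"(?P<subtype>.+?)"
    r"(?:\s*:=\s*(?P<default>.+?))?\s*$",
    re.IGNORECASE,
)


@dataclass
class PortSignature:
    name: str
    mode: str | None     # None for generics, which have no mode
    subtype: str
    default: str | None


def parse_signature(sig: str) -> PortSignature:
    """Split a directive signature into name, mode, subtype and default value."""
    match = SIG_RE.match(sig)
    if match is None:
        raise ValueError(f"cannot parse signature: {sig!r}")
    mode = match["mode"]
    return PortSignature(match["name"], mode.lower() if mode else None,
                         match["subtype"], match["default"])


print(parse_signature("clk : in std_logic"))
print(parse_signature("d : in std_logic_vector(7 downto 0) := (others => '0')"))
```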
My two cents: if one wants to get the hang of Sphinx for HDL, wouldn't it be simpler to start from the C++ domain declaration and make changes in order to support SV, which is rather close to C++? This could allow a simpler parser side while helping to get the internal data structure straight, as the internal structure would be closer-ish to VHDL than any other supported (application) language.
That's an idea. I think there are probably lots of ways to go about it. I was thinking about it a little last night, and the questions that came to mind were:
FYI, someone is advertising the CESNET repo on Reddit. Maybe it would make more sense to start from that project and expand?
See https://gitter.im/ghdl1/Lobby?at=61336cb85cfd665e5208f098
@umarcor @LarsAsplund @JimLewis @eine @suzizecat @Paebbels @nfrancque I have been talking a bit with @qarlosalberto about potentially creating a sphinx builder/autodoc plugin for HDL, although I'm more interested in VHDL. I would really like the ability to automatically document VHDL code with autodoc and a VHDL language domain capability. TerosHDL already has autodoc capability for MD/HTML, but it is difficult to manage for large or distributed projects/repos. I am hoping that Sphinx solves this problem and it'd be really nice to eventually have a generic HDL extension in sphinx-contrib that anyone can use with pip and sphinx. Plus, I see a lot of potential for this to be very widely used as HDL documentation is an area where I have not seen a solution that meets most people's needs. I decided to ping y'all as I feel that you might find this valuable for your projects, you might have some good ideas, and I'm also hoping to get some feedback about how you think this should look. Feel free to ping others.
Anyway, if you feel that this discussion belongs on Gitter, in the VHDL WG issues, or elsewhere, let me know and I can move it. I am planning to talk with management at my company about sponsoring this effort, and Carlos was the first person I thought might be interested in taking this on, based on Colibri. I previously attempted to go from Colibri to doxygen/asciidoctor. This proved to be a huge headache for distributed projects and is not going to be easy to maintain. reST and Sphinx seemed like a natural choice, as there is an interface to create extensions for other languages, builders, etc. The other reason I want to have a discussion about this is that there are already quite a few VHDL autodoc plugins for Asciidoctor and others, but most are dead, and I don't think any of them really found the right way to document VHDL or present it in the actual HTML/PDF/etc.
Documenting Source Code
Currently, Colibri and others tend to use some pattern in the comments to differentiate between a comment and documentation. I believe Colibri can use any symbol, but it defaults to `!`, i.e. `--!`. I think this is a natural direction because VHDL doesn't have any notion of a docstring or similar, and I know parsing VHDL is not easy. However, I'm not the biggest fan of that pattern because it's extra typing. If your editor/TerosHDL supports carrying on this comment and symbol with ENTER, then it's not that big of a deal. But if people don't have that, then it's annoying to write and update documentation. Using a different pattern in a multi-line (block) comment makes a little bit more sense to me, but block comments are only valid in VHDL-2008+. I don't think this is viable though.
Instead, I think it should be based on the placement of the comment, similar to Python. In Python, you can add a docstring by placing a string literal directly underneath the declaration:
For VHDL, I think it makes sense to put the documentation after things as well:
Or even this maybe, adapting from google's style guide for python:
Block comments would also be allowed in the same place. I don't like that the documentation is below the entire entity declaration; maybe it makes sense to adopt Python's notation for classes, where the docstring comes right after the class declaration, or something like this?
Personally I think right before the entity makes sense. However, a lot of people put comment headers in their code so for things like signals/constants/etc. this might jack up the documentation but that could also be something that just has to change if they want to use the tool.
One unknown here is how to provide documentation for a file, i.e. if you have a license at the top of your file, how does the parser know whether something below it should be documentation, like https://github.com/VUnit/vunit/blob/master/vunit/__init__.py#L7-L9? I don't necessarily think that this is even needed, though. People generally put one package/entity per file, so it seems like something that doesn't really matter. Even in cases where someone puts multiple entities/packages in the same file, the documentation for the library already has everything, so I can't think of any useful reason to have a global file documentation section.
Layout
Once the documentation is generated, I HATE the idea of using tables for ports/generics. I don't think it's intuitive; I feel that tables make you focus too much on port attributes like the type or size instead of what the port is and how to use it. Instead, I think that it should just look like it does for Python:
With VHDL, we care more about types/ranges and whatnot, so for this we should include that information in the parameters/generics list.
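Sphinx's Python domain shows parameters as a field list (`:param x:` / `:type x:`) rather than a table. A hypothetical VHDL equivalent could emit the same shape, with the mode and subtype in the type field. The `Interface` dataclass and the `:generic:`/`:port:` field names below are assumptions, not pyVHDLModel's API or an existing Sphinx domain:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interface:
    kind: str          # "generic" or "port" (hypothetical model, not pyVHDLModel)
    name: str
    subtype: str
    mode: str = ""     # empty for generics
    doc: str = ""


def render_field_list(items: list[Interface]) -> str:
    """Render interface items as a reST field list, Python-domain style."""
    lines = []
    for item in items:
        lines.append(f":{item.kind} {item.name}: {item.doc}".rstrip())
        type_info = f"{item.mode} {item.subtype}".strip()
        lines.append(f":type {item.name}: {type_info}")
    return "\n".join(lines)


print(render_field_list([
    Interface("generic", "WIDTH", "natural", doc="Data width."),
    Interface("port", "clk", "std_logic", "in", "The clock."),
]))
```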
As for the documentation itself, like in the image above but for an entity? I think that should be it. All you really care about when instantiating it is the API and how to use the component; you don't care about the implementation details, and they should be considered private. However, sometimes this is critical to understand, so I think that within the documentation for `and_gate` there should be a drop-down to expand the architecture (or architectures) so you can view the internal/private documentation, like info about signals, FSMs, etc. Teros has some neat features, and it would be cool to reference internal FSMs etc. in an entity's documentation so they show up when someone scrolls past `and_gate` in the documentation. Another neat idea would be to display the block hierarchy, e.g. `and_gate` and how it connects sub-components. This would be super useful for project documentation, where you can go into the top level's documentation and see exactly how everything is connected.
Now for subprograms, packages, etc., it'd just expand on the above ideas to determine a documentation convention along with how it should be displayed.
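A first approximation of the block hierarchy can be built by following instantiations from the top level down. This Python sketch only recognises direct entity instantiations (`label : entity lib.name`), ignores component instantiations and generate statements, and assumes the hierarchy is acyclic; the `SOURCES` mapping from entity name to architecture text is hypothetical.

```python
import re

# Direct entity instantiations such as "u_gate : entity work.and_gate(rtl)".
INST_RE = re.compile(r"^\s*(\w+)\s*:\s*entity\s+\w+\.(\w+)", re.IGNORECASE | re.MULTILINE)


def instances(arch_source):
    """Return (label, entity) pairs instantiated in one architecture body."""
    return INST_RE.findall(arch_source)


def hierarchy(top, sources, depth=0):
    """Return indented 'label: entity' lines for the instance tree below `top`.

    Assumes the hierarchy is acyclic; entities missing from `sources` are leaves.
    """
    lines = []
    for label, entity in instances(sources.get(top, "")):
        lines.append("  " * depth + f"{label}: {entity}")
        lines.extend(hierarchy(entity, sources, depth + 1))
    return lines


SOURCES = {
    "top": """
architecture rtl of top is
begin
  u_gate : entity work.and_gate port map (a => a, b => b, z => z);
  u_reg  : entity work.dff(rtl) port map (clk => clk, d => z, q => q);
end architecture;
""",
    "dff": """
architecture rtl of dff is
begin
  u_inv : entity work.inverter port map (a => d, z => d_n);
end architecture;
""",
}

print("\n".join(hierarchy("top", SOURCES)))
```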
Annotations/Autodoc
So for autodoc with Python in Sphinx, you can do stuff like this for automagically inserting documentation in a reST document:
How would these `automodule`, `autoclass`, ... look for VHDL/HDL? I don't like the idea of specifying files directly; I will talk about how we can handle that later. I'm kind of thinking that we could use `autolibrary`, which would populate all entities, packages, architectures, etc. in a library:
As for other annotations, I think referencing other components, subprograms, etc. would use the library if it exists, otherwise the name itself if it is not in a library:
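Both ideas can be prototyped against a simple index of documented objects: `autolibrary` expands to one directive per unit of a library, and a reference is resolved by its qualified name first, falling back to a trailing-name match when it is unique. The index, the `vhdl:` directive names and the resolution rules are assumptions for illustration only:

```python
from __future__ import annotations

# Hypothetical index of documented objects: qualified name -> object kind.
INDEX = {
    "lib.and_gate": "entity",
    "lib.pkg": "package",
    "lib.pkg.SOME_CONST": "constant",
    "other.and_gate": "entity",
    "standalone_fifo": "entity",  # not placed in any library
}


def expand_autolibrary(library: str, index: dict[str, str]) -> str:
    """Emit one hypothetical '.. vhdl:<kind>::' directive per top-level unit of `library`."""
    prefix = library + "."
    units = sorted(
        name for name in index
        if name.startswith(prefix) and "." not in name[len(prefix):]
    )
    return "\n".join(f".. vhdl:{index[name]}:: {name}" for name in units)


def resolve(target: str, index: dict[str, str]) -> str:
    """Resolve a reference by exact qualified name, else by a unique trailing match."""
    if target in index:
        return target
    matches = sorted(name for name in index if name.endswith("." + target))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"unknown VHDL object: {target!r}")
    raise KeyError(f"ambiguous reference {target!r}: {matches}")


print(expand_autolibrary("lib", INDEX))
print(resolve("SOME_CONST", INDEX))
```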
Sphinx Building
I am thinking that only an HDL extension would be needed, as the included HTML builder should still work once it gets the correct information. However, the HDL extension would provide a flag to specify a configuration file. This file would be CSV (or TOML, YAML, or whatever makes sense) and would define all files in the project, their associated library, and anything else needed for building the documentation.
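A minimal sketch of reading such a file, assuming a CSV layout with `library` and `file` columns (the column names are made up). In a real extension the path would presumably be registered as a configuration value, e.g. via Sphinx's `app.add_config_value`, rather than hard-coded:

```python
import csv
import io
from collections import defaultdict

# Hypothetical project description: one row per source file.
PROJECT_CSV = """\
library,file
lib,src/pkg.vhd
lib,src/and_gate.vhd
other,vendor/fifo.vhd
"""


def load_project(text):
    """Group source files by library, keeping the order in which they are listed."""
    libraries = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        libraries[row["library"].strip()].append(row["file"].strip())
    return dict(libraries)


print(load_project(PROJECT_CSV))
```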
There's probably other stuff I haven't thought about, but this is the gist. Thoughts, suggestions, concerns?