feat(doc): add schema doc generator and initial Docusaurus content #1758
dhower-qc wants to merge 31 commits into riscv:main from
- Add schema doc generator gem (tools/ruby-gems/schema_doc_gen/) that produces MDX pages from JSON Schema files; invoked via bin/chore gen schema-docs (D7)
- Generator renders bare enum values as pipe-separated backtick values
- Add AnchorOpenDetails React component for collapsible schema blocks
- Generate schema docs for all v0.1 and v0.2 schemas under doc/docs/schemas/
- Improve config_schema.json and ext_schema.json descriptions and examples
- Add spec_state description to schema_defs.json (all six ratification states)
- Add IDL syntax highlighting, landing page, and language reference pages (D5/D6)
- Add idlc compiler page (doc/docs/idl/idlc.md)
- Add configurations concept overview (doc/docs/concepts/configurations/)
- Configure navbar, theming, UDB/IDL logos with CSS variable support (D8)
- Add CI build-only job via bin/npm wrapper (D4)
- Update planning docs (decisions.md D7/D8, implementation plan task status)
Pull request overview
This PR adds a Ruby-based JSON Schema → MDX documentation generator and seeds a Docusaurus documentation site with initial content (schema reference, IDL pages, concepts), plus CI wiring to keep generated docs from drifting.
Changes:
- Introduce `schema_doc_gen` (Ruby gem + CLIs) and `bin/chore gen schema-docs` to generate MDX schema reference pages (and an index) into `doc/docs/schemas/`.
- Add Docusaurus site improvements (navbar/logo behavior, Prism language loading + YAML highlighting, new “open details on anchor” helper component).
- Update several JSON Schemas with richer descriptions/examples and add CI/regress coverage for schema-doc generation.
Reviewed changes
Copilot reviewed 49 out of 52 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/test/regress-tests.yaml | Adds a PR-stage smoke test to run schema doc generation. |
| tools/ruby-gems/schema_doc_gen/schema_doc_gen.gemspec | New gemspec for the schema doc generator. |
| tools/ruby-gems/schema_doc_gen/lib/schema_doc_gen/version.rb | Defines schema_doc_gen version. |
| tools/ruby-gems/schema_doc_gen/lib/schema_doc_gen/index_generator.rb | Generates the schema reference index page (MDX). |
| tools/ruby-gems/schema_doc_gen/lib/schema_doc_gen.rb | Main JSON Schema → MDX generator implementation. |
| tools/ruby-gems/schema_doc_gen/bin/schema-docs-all | CLI to generate docs for all schemas + version categories + index. |
| tools/ruby-gems/schema_doc_gen/bin/schema-doc-gen | CLI to generate docs for a single schema. |
| spec/schemas/schema_defs.json | Improves shared schema definitions (descriptions/examples). |
| spec/schemas/ext_schema.json | Expands extension schema description and adds examples/field docs. |
| spec/schemas/config_schema.json | Adds richer top-level documentation and examples for configurations. |
| doc/src/theme/prism-include-languages.js | Adjusts Prism language loading to support extra languages (YAML/IDL). |
| doc/src/theme/Logo/index.tsx | Updates navbar logo rendering to be logo-only and forwards props. |
| doc/src/css/custom.css | Adds YAML syntax highlighting tweaks for Prism themes. |
| doc/src/components/AnchorOpenDetails/index.tsx | New component intended to open collapsible blocks when navigating by anchor. |
| doc/planning/documentation-implementation-plan.md | Updates planning status and notes reflecting implemented docs work. |
| doc/planning/decisions.md | Records decisions D7 (schema docs generator) and D8 (logo-only navbar). |
| doc/docusaurus.config.ts | Updates navbar links and adds Prism additional languages. |
| doc/docs/schemas/v0.2/csr_schema.mdx | Generated schema reference page (v0.2). |
| doc/docs/schemas/v0.2/category.json | Generated Docusaurus category metadata for v0.2 schemas. |
| doc/docs/schemas/v0.1/schema_defs.mdx | Generated schema reference page for shared defs (v0.1). |
| doc/docs/schemas/v0.1/register_file_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/profile_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/profile_release_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/profile_family_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/proc_cert_model_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/proc_cert_class_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/prm_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/param_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/non_isa_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/mmr_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/manual_version_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/manual_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/interrupt_code_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_variable_metadatas.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_var_type_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_var_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_type_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_subtype_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/inst_opcode_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/ext_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/exception_code_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/config_schema.mdx | Generated schema reference page (v0.1). |
| doc/docs/schemas/v0.1/category.json | Generated Docusaurus category metadata for v0.1 schemas. |
| doc/docs/schemas/index.mdx | Generated schema reference landing/index page. |
| doc/docs/schemas/category.json | Adds top-level “Schema Reference” sidebar/category config. |
| doc/docs/idl/idlc.md | New hand-authored IDL compiler internals documentation page. |
| doc/docs/concepts/configurations/overview.md | New configurations concept overview page. |
| doc/docs/concepts/configurations/category.json | Adds Concepts → Configurations category metadata. |
| doc/docs/concepts/category.json | Adds top-level Concepts category metadata. |
| bin/chore | Adds gen schema-docs subcommand and wires it into gen all. |
| .github/workflows/regress.yml | Adds regress-schema-docs job to CI and includes it in required checks. |
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #1758 |
|---|---|---|
| Coverage | 72.07% | 72.07% |
| Files | 55 | 55 |
| Lines | 27799 | 27799 |
| Branches | 6009 | 6009 |
| Hits | 20035 | 20035 |
| Misses | 7764 | 7764 |
Add docs-preview.yml workflow that builds and deploys the Docusaurus site to GitHub Pages on every push to doc_next. Update docusaurus.config.ts to read url/baseUrl from env vars (with upstream defaults) so the fork can override them without creating a diff that blocks upstreaming. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot reviewed 51 out of 54 changed files in this pull request and generated 5 comments.
Copilot reviewed 51 out of 54 changed files in this pull request and generated 7 comments.
Copilot reviewed 57 out of 60 changed files in this pull request and generated 1 comment.
Copilot reviewed 71 out of 74 changed files in this pull request and generated 2 comments.
Copilot reviewed 82 out of 86 changed files in this pull request and generated 6 comments.
| :::note Widening operators are an exception
| Widening operators (`` `+ ``, `` `- ``, `` `* ``, `` `<< ``) produce a result wider than both operands and do not follow the standard conversion rules above. See [Widening Operators](./operators#widening-operators) for details.
| :::
The widening-operators note uses nested backticks with extra spaces (e.g., `+) and an unbalanced final token (`<<). This is likely to render incorrectly in MDX. Use consistent inline-code formatting like `+`, `-`, `*`, `<<`.
|   ```idl
|   Bits<4> x = 4'd-1; # -1 in 4 bits = 0b1111 = 15 (unsigned) — OK
| - Bits<3> y = 3'd-1; # truncates to 3 bits: 0b111 = 7 — sign bit lost
| + Bits<3> y = 4'd-1; # truncates to 3 bits: 0b111 = 7 — sign bit lost
|   ```
This example is introduced as a case where the literal width is too narrow, but after the edit the literal is 4'd-1 in both lines and the narrowing is coming from assigning into Bits<3>. Either adjust the surrounding explanation to talk about assignment truncation, or change the example back to demonstrate a too-narrow literal width.
| :::note Auto-generated
| This page is generated from [`inst_schema.json`](https://github.com/riscv/riscv-unified-db/blob/main/spec/schemas/inst_schema.json) by the [schema doc generator](https://github.com/riscv/riscv-unified-db/blob/main/tools/ruby-gems/schema_doc_gen/lib/schema_doc_gen.rb). To update this page, edit the schema file and run `bin/chore gen schema-docs`.
| :::
This generated page links to the generator at tools/ruby-gems/schema_doc_gen/..., but the generator in this PR lives under tools/internal-gems/schema_doc_gen/... (as used by other generated pages). This will be a broken link unless the output is regenerated/fixed to point at the correct path.
| @output_dir.children
|   .select(&:directory?)
|   .map { |d| d.basename.to_s }
|   .select { |d| d.start_with?("v") }
|   .sort
|   .reverse # Highest version first
| end
scan_versions sorts version directory names lexicographically, so v0.2 will be considered newer than v0.10. Parse the numeric components and sort by semantic version (e.g., Gem::Version.new(dir.delete_prefix('v'))) to keep the index ordering correct as versions grow.
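The ordering problem described above can be reproduced in a few lines. This is a minimal sketch, not the generator's code: `sort_versions_desc` is a hypothetical helper name, and it assumes every version directory is named `v` followed by a dotted numeric version.

```ruby
require "rubygems" # provides Gem::Version

# Hypothetical helper: order version directory names newest-first by
# comparing parsed versions instead of raw strings.
def sort_versions_desc(dir_names)
  dir_names
    .select { |d| d.start_with?("v") && Gem::Version.correct?(d.delete_prefix("v")) }
    .sort_by { |d| Gem::Version.new(d.delete_prefix("v")) }
    .reverse
end

names = ["v0.2", "v0.10", "v0.1"]

# String comparison places "v0.2" ahead of "v0.10".
p names.sort.reverse          # => ["v0.2", "v0.10", "v0.1"]

# Version-aware comparison places "v0.10" first.
p sort_versions_desc(names)   # => ["v0.10", "v0.2", "v0.1"]
```

Filtering with `Gem::Version.correct?` keeps an unrelated directory such as `vendor` from raising while it is being parsed.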
| - name: Install Node dependencies
|   run: npm ci
|
| - name: Build Docusaurus site
|   env:
|     DOCUSAURUS_URL: https://dhower-qc.github.io
|     DOCUSAURUS_BASE_URL: /riscv-unified-db/
|   run: npm run build --workspace=doc
This workflow runs npm ci/npm run build without setting up the repo's pinned toolchain. The repo pins Node via mise (.mise.toml currently uses Node 24.x) and other workflows use ./.github/actions/mise-setup + ./bin/npm .... Without that, ubuntu-latest's default Node version can drift and break the docs build. Use the mise setup action and the bin/npm wrapper here as well (or explicitly set up the required Node version + caching).
Suggested change, replacing:
| - name: Install Node dependencies
|   run: npm ci
| - name: Build Docusaurus site
|   env:
|     DOCUSAURUS_URL: https://dhower-qc.github.io
|     DOCUSAURUS_BASE_URL: /riscv-unified-db/
|   run: npm run build --workspace=doc
with:
| - name: Setup pinned toolchain
|   uses: ./.github/actions/mise-setup
| - name: Install Node dependencies
|   run: ./bin/npm ci
| - name: Build Docusaurus site
|   env:
|     DOCUSAURUS_URL: https://dhower-qc.github.io
|     DOCUSAURUS_BASE_URL: /riscv-unified-db/
|   run: ./bin/npm run build --workspace=doc
|   ## Worked Example
|
| - The following example shows how the Branch if Less Than or Equal Unsigned (`BLTU`) instruction is specified in IDL. `rs1`, `rs2`, and `imm` are decode fields extracted automatically from the instruction encoding before execution.
| + The following example shows how the Branch if Less Than or Equal Unsigned (`BLTU`) instruction is specified in IDL. `xs1`, `xs2`, and `imm` are decode fields extracted automatically from the instruction encoding before execution.
|
|   ```idl title="BLTU instruction"
| - Bits<MXLEN> src1 = X[rs1]; # (1) Read X[rs1]
| - Bits<MXLEN> src2 = X[rs2]; # (2) Read X[rs2]
| + Bits<MXLEN> src1 = X[xs1]; # (1) Read X[xs1]
| + Bits<MXLEN> src2 = X[xs2]; # (2) Read X[xs2]
|
|   if (src1 <= src2) { # (3) Unsigned comparison
| -   jump(PC + $signed(imm)); # (4) Jump to target
| +   jump($pc + $signed(imm)); # (4) Jump to target
|   }
RISC-V BLTU is “branch if less-than unsigned” (strict <), not “less than or equal”. If the example intends BLTU, update the prose and the step (3) comparison accordingly; otherwise use the correct mnemonic for the described behavior.
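To make the difference concrete, the branch decision can be modeled outside IDL. The sketch below is a hypothetical Ruby model, not IDL and not code from this PR; it assumes a 32-bit register width, and `to_unsigned` and `bltu_taken?` are illustrative names.

```ruby
# Hypothetical model of the BLTU branch decision; XLEN is assumed to be 32.
XLEN = 32
MASK = (1 << XLEN) - 1

# Reinterpret an integer as an unsigned XLEN-bit register value.
def to_unsigned(value)
  value & MASK
end

# BLTU takes the branch only when src1 is strictly less than src2 (unsigned).
def bltu_taken?(src1, src2)
  to_unsigned(src1) < to_unsigned(src2)
end

puts bltu_taken?(1, 2)   # strictly less: taken
puts bltu_taken?(2, 2)   # equal: not taken, although a <= comparison would take it
puts bltu_taken?(-1, 1)  # -1 reads as 0xFFFF_FFFF when unsigned: not taken
```

The equal-operands case is the one where `<` and `<=` disagree, so it is the case the worked example needs to get right.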
| name: Docs Preview
| on:
|   push:
|     branches: [doc_next]
Is this from testing, or expected to remain? If so, we need to document what's going on here.
| - name: Build Docusaurus site
|   env:
|     DOCUSAURUS_URL: https://dhower-qc.github.io
Is this expected to remain (for now)?
| - Validating that a design conforms to a RISC-V profile
| - Submitting a design for certification
|
| **Key property**: All IDL code can be fully evaluated and optimized; no unknowns remain. Generated artifacts are tailored to this exact configuration.
We haven't really introduced "IDL" here. Shall we substitute "semantic", instead, throughout?
| # All configs: parameter values
| params: # Omit for unconfigured
These 2 lines appear to be contradictory.
| ```
|
| :::note
| `arch_overlay` is the current name for the spec overlay feature. This will be renamed to `spec_overlay` in a future version for clarity.
Shall we make that change now?
|   ```
|
| - Use `implemented?()` for static (compile-time) presence checks and `CSR[misa]` for dynamic enable/disable via software.
| + Use `implemented?()` for static (compile-time) checks of whether an extension is implemented and `CSR[misa]` for dynamic (runtime) checks of whether an extension is enabled.
Suggested change, replacing:
| Use `implemented?()` for static (compile-time) checks of whether an extension is implemented and `CSR[misa]` for dynamic (runtime) checks of whether an extension is enabled.
with:
| Use `implemented?()` for static (compile-time) checks of whether an extension is implemented and `CSR[misa]` to check whether extensions that can be dynamically unimplemented at runtime are implemented.
(I'm not even sure I like that suggested wording, but I think it needs to at least be scoped to misa extensions and use "implemented" instead of "enabled".)
| ### The `value_try` / `value_else` Pattern
|
| The compiler uses a value-error mechanism (similar to exceptions, but lighter-weight) to handle unknowns gracefully:
|
| ```ruby
| value_result = value_try do
|   v = some_expression.value(symtab)
|   # Use v here
| end
| value_else(value_result) do
|   # Expression has unknown value; handle the fallback
| end
| ```
Is this a custom API specific to the IDL compiler? (I don't think it's a native Ruby API.) Presuming so, the pseudo-namespace "value_" is slightly overloaded here IMHO, with value_try and value_else as parts of the API, but value_result is not, as I understand. Maybe change value_result to just result?
What happens at "Use v here" when some_expression.value() returns something that is at least partially unknown?
We haven't yet really talked about expressions. I think there is some below (haven't read forward yet.) Maybe that discussion needs to come before this?
Also, I think the "expression" here is really referring to a compiled expression, so should make that more clear. In the abstract, an expression is just a string, to my understanding, but a compiled expression is an object with at least a value() method.
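One way to reason about these questions is with a toy model of the pattern. The sketch below is an assumption about how such a mechanism could be built on Ruby's `catch`/`throw`; it is not taken from the idlc sources. `ToyExpr`, `describe`, and the `value_error` helper are hypothetical stand-ins, and the local is named `result` as suggested above.

```ruby
# Toy stand-ins for illustration only; NOT the idlc implementation.
UNKNOWN = :unknown_value

# Signal that a value cannot be determined; unwinds to the nearest value_try.
def value_error(_reason)
  throw(:value_error, UNKNOWN)
end

# Run the block; returns UNKNOWN if value_error was called inside it.
def value_try
  catch(:value_error) { yield }
end

# Run the fallback block only when value_try reported an unknown value.
def value_else(result)
  yield if result == UNKNOWN
end

# Hypothetical compiled expression: nil models "not known at compile time".
class ToyExpr
  def initialize(known_value)
    @known_value = known_value
  end

  def value(_symtab)
    value_error("not known at compile time") if @known_value.nil?
    @known_value
  end
end

def describe(expr)
  outcome = nil
  result = value_try do
    v = expr.value(nil)
    outcome = "known: #{v}" # skipped entirely when value() signals unknown
  end
  value_else(result) { outcome = "unknown" }
  outcome
end

puts describe(ToyExpr.new(5))
puts describe(ToyExpr.new(nil))
```

Under this model the code after the `value` call never runs for an unknown expression, because `throw` unwinds out of the block first. Whether idlc behaves the same way, including for partially known values, is for the page itself to state.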
| #### `compile_inst_scope(idl, symtab:, input_file:, input_line:)`
|
| Compiles an instruction `operation()` body, which includes decode variable declarations in addition to statements.
includes decode variable declarations
What does that mean, exactly? Does it mean that the body has to include decode variables, or that the result will automatically include decode variables? An example might help?
| require 'pathname'
|
| # Assume we have:
| # - global_symtab: a SymbolTable with global function and type definitions
How does one get a symbol table? (We haven't even instantiated a compiler instance yet.)
| },
| "requirement_string": {
|   "type": "string",
|   "description": "Version requirement string (e.g., `>= 2.0`, `~> 1.5`, `= 2.1`)",
Should we describe what ~> means here?
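For reference, this is how RubyGems interprets `~>` (the "pessimistic" operator). The sketch assumes the schema's `~>` is meant to follow the same semantics, which this PR does not confirm; UDB may define compatibility differently.

```ruby
require "rubygems" # provides Gem::Requirement and Gem::Version

# In RubyGems, "~> 1.5" means ">= 1.5 and < 2.0": the last listed segment may grow.
req = Gem::Requirement.new("~> 1.5")

%w[1.4 1.5 1.9 2.0].each do |v|
  puts "#{v}: #{req.satisfied_by?(Gem::Version.new(v))}"
end

# With three segments the bound tightens: "~> 1.5.2" means ">= 1.5.2 and < 1.6".
patch_req = Gem::Requirement.new("~> 1.5.2")
puts patch_req.satisfied_by?(Gem::Version.new("1.5.9"))
puts patch_req.satisfied_by?(Gem::Version.new("1.6.0"))
```

If the schema intends something other than this, the description should say so explicitly, because Ruby users are likely to assume the RubyGems meaning.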
| # Ensure output directory exists
| FileUtils.mkdir_p(File.dirname(output_path))
Not really unix convention to create directories in a given path if they don't already exist. This is conventionally the user's responsibility. But, I won't die on this hill.
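The two behaviors can be compared side by side. This is a hypothetical sketch rather than the generator's code: `write_doc` and its `create_dirs:` keyword are illustrative names, showing how directory creation could be made opt-in.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical writer: creates missing parent directories only when asked,
# otherwise fails with a clear message and leaves the filesystem untouched.
def write_doc(output_path, content, create_dirs: false)
  dir = File.dirname(output_path)
  if create_dirs
    FileUtils.mkdir_p(dir)
  elsif !Dir.exist?(dir)
    raise ArgumentError, "output directory does not exist: #{dir}"
  end
  File.write(output_path, content)
end

Dir.mktmpdir do |tmp|
  target = File.join(tmp, "v0.1", "example.mdx")

  begin
    write_doc(target, "# Example\n")
  rescue ArgumentError => e
    puts "refused: #{e.message}"
  end

  write_doc(target, "# Example\n", create_dirs: true)
  puts File.exist?(target)
end
```

For a generator that owns its whole output tree, creating directories by default is arguably the more convenient choice; the opt-in form matters mainly for a CLI that writes to user-supplied paths.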
| schema_name = schema_file.basename(".json")
| output_path = File.join(output_dir, version, "#{schema_name}.mdx")
|
| FileUtils.mkdir_p(File.dirname(output_path))
| Generates Markdown documentation from JSON Schema files for the
| RISC-V Unified Database documentation site.
| DESC
| s.date = "2024-01-01"
Live preview of these changes is here:
https://dhower-qc.github.io/riscv-unified-db/