feat: add backend for an instruction table by thecharlesjenkins · Pull Request #1780 · riscv/riscv-unified-db

thecharlesjenkins · 2026-04-07T23:41:55Z

This new backend that can be generated with "bundle exec rake gen:inst_table" is designed to organize all of the requested instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will use the table to generate the necessary instruction headers.

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

codecov · 2026-04-07T23:48:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.21%. Comparing base (7259e0e) to head (39ee8dd).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1780   +/-   ##
=======================================
  Coverage   72.21%   72.21%           
=======================================
  Files          52       52           
  Lines       27740    27740           
  Branches     6012     6012           
=======================================
  Hits        20033    20033           
  Misses       7707     7707

Flag	Coverage Δ
idlc	`76.16% <ø> (ø)`
udb	`66.25% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

ThinkOpenly · 2026-04-08T04:36:59Z

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.

organize all of the requested instructions into a table modeled after the syscall table in Linux

What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?)

thecharlesjenkins · 2026-04-08T04:52:24Z

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.

Oh interesting, if it's preferred to have a different license I can do that to.

organize all of the requested instructions into a table modeled after the syscall table in Linux

What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?)

"requested instructions" for Linux is all instructions but if other projects what to use this table they can take a subset of instructions if they prefer, I was just trying to be general.

Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series.

ThinkOpenly · 2026-04-08T16:07:38Z

Oh interesting, if it's preferred to have a different license I can do that to.

To me, it seems like the "easy button" is to use the same license for content generated from other content as the original content. The YAML files here are BSD-based (described as compatible with GPL at the link above as "The Clear BSD License").

Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series.

That seems very, very cool! I'd love to look at the script that performs the generation, but it doesn't appear that patch (01/16) made it to the mailing list archives. :-/

There is a lot of code that is manually manipulating riscv instructions. Constructing these headers for new instructions is time consuming and error prone. The first patch in this series introduces the instruction definition table along with a script that runs at compile time to create all of the associated instruction manipulation functions. The remaining patches migrate all of the manual instruction code to use these generated functions. The instruction definition table is generated from the riscv-unified-db[1], an open source RVI project for instruction specifications. I am sending this series at the same time as I am sending the patch to the riscv-unified-db for the format of the instruction table. That patch can be accessed on github [2]. A lot of the validation for these changes is from running all possible 32-bit or 16-bit integers through these functions to see that any given instruction will produce to expected result. I give more detail on the test cases in the notes of the first couple of patches. For the KVM patches, I also introduce two test cases to ensure that the instruction manipulation is working as expected, along with a bug fix of the current implemention. [1] https://github.com/riscv/riscv-unified-db [2] riscv/riscv-unified-db#1780 # Describe the purpose of this series. The information you put here # will be used by the project maintainer to make a decision whether # your patches should be reviewed, and in what priority order. Please be # very detailed and link to any relevant discussions or sites that the # maintainer can review to better understand your proposed changes. If you # only have a single patch in your series, the contents of the cover # letter will be appended to the "under-the-cut" portion of the patch. # Lines starting with # will be removed from the cover letter. You can # use them to add notes or reminders to yourself. If you want to use # markdown headers in your cover letter, start the line with ">#". # You can add trailers to the cover letter. Any email addresses found in # these trailers will be added to the addresses specified/generated # during the b4 send stage. You can also run "b4 prep --auto-to-cc" to # auto-populate the To: and Cc: trailers based on the code being # modified. To: Paul Walmsley <pjw@kernel.org> To: Palmer Dabbelt <palmer@dabbelt.com> To: Alexandre Ghiti <alex@ghiti.fr> To: Anup Patel <anup@brainfault.org> To: Atish Patra <atish.patra@linux.dev> To: Conor Dooley <conor@kernel.org> To: Paolo Bonzini <pbonzini@redhat.com> To: Andrew Morton <akpm@linux-foundation.org> To: Shuah Khan <shuah@kernel.org> Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org Cc: kvm-riscv@lists.infradead.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com> --- Changes in v2: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v1: https://lore.kernel.org/r/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20260114-riscv_insn_table-38495495dfd3", "prefixes": [], "history": { "v1": [ "20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com" ] } } }

thecharlesjenkins · 2026-04-08T18:08:39Z

I resent the patch and then right after I sent it the original one showed up on the archives so now it is there twice... It might have been throttled since it is a large patch. Here it is https://lore.kernel.org/lkml/20260407-riscv_insn_table-v1-1-54b4736a1e77@gmail.com/

ThinkOpenly

One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.

thecharlesjenkins · 2026-04-10T02:21:18Z

One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.

Yes I agree that makes sense. However, I am not sure how to add tests in the new generator infrastructure, and how I would go about making a test that wouldn't break each time a new instruction is added. It looks like the only existing test case in this new method is the type linting:

riscv-unified-db/tools/ruby-gems/udb-gen/Rakefile

Line 71 in ba151af

namespace :test do

. I can mess with it longer, but I don't really know how this is supposed to fit together.

ThinkOpenly · 2026-04-10T18:44:31Z

I am not sure ... how I would go about making a test that wouldn't break each time a new instruction is added.

It's supposed to break when new content is added. We're trying to catch unintentionally added content, and for intentionally added content, to see it is processed as expected. With no representation of the output of the script, there's nothing to test against and script modifications go untested.

I am not sure how to add tests in the new generator infrastructure

I'm thinking more like the testing done in backends/instruction_appendix or tools/python (profile_extensions). Nothing fancy.

You could take it further, of course:

Add the kernel generation script (insn_tbl.sh) here as well and use it to generate insn_gen.h here just to make sure that script continues working forever.
Maybe instead of adding the script, live pull the latest version of that script from the kernel tree instead of storing a copy here. (Or get "renovate" to keep it up to date here automatically, once we get "renovate" working satisfactorily... if that's within renovate's scope.)
Compile something meaningful that uses the generated insn_gen.h to verify inst_tbl.sh is generating valid content. "Something meaningful" could be as sophisticated as desired, including more pieces directly from the kernel.

All of that is a little risky if paths or file names change in the kernel, of course.

thecharlesjenkins · 2026-04-15T21:32:58Z

I added a couple of test cases. One of the test cases is to compare against the full output, and the other ones check that hypothetical instruction match expected output.

thecharlesjenkins · 2026-04-16T20:51:56Z

I added mocha to mock the instruction objects to improve the testing. How do I trigger the dockerfile to rebuild so that it gets the updated Rakefile and pulls in mocha?

This new backend that can be generated with "bundle exec ruby bin/container/udb-gen inst-table -o inst_table.txt" is designed to organize the UDB instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will then use the table to generate the necessary instruction headers. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

Tapioca annotations in rbi-central are community maintained and are not always correct. Remove them from the chore script to avoid them being forcefully added by the CI. This was making it impossible to add "mocha" as a dependency because the rbi-central types are conflicting with the sorbet-typed versions. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

Add the Mocha gem to use for mocking objects in test cases. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

Validate the instruction table generation against all existing instructions. This test file will need to be regenerated each time instructions change, but can be useful to catch unexpected changes. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

ThinkOpenly · 2026-04-16T13:45:28Z

    puts "All IDL passed type checking"
  end
+
+  task :inst_table_gen do


This whole stanza should probably not be in the main Rakefile, but closer to the generator itself, and included here. We have a mechanism already set up well above that automatically pulls in tools/*/tasks.rake. So, I suggest creating a new tools/ruby-gems/tasks.rake and move this content there.

ThinkOpenly · 2026-04-16T13:47:48Z

+  task :inst_table_gen do
+    require_relative "tools/ruby-gems/udb-gen/lib/udb-gen/generators/inst_table/table_builder"
+    golden_path = "tests/data/golden/test_table.txt"
+    cfg = "_"


This should allow other configs, at the user's discretion. You can find a use of CFG environment variable for allowing that use-case above.

ThinkOpenly · 2026-04-16T14:01:31Z

+      actual_output = Dir.chdir(dir) do
+        builder.generate
+      end
+
+      if File.exist?(golden_path)
+        expected_output = File.read(golden_path)
+
+        if actual_output == expected_output
+          puts "SUCCESS: Generated output matches #{golden_path}"
+        else
+          puts "FAILURE: Output has diverged!"
+
+          File.write(actual_path, actual_output)
+          system("diff -u #{golden_path} #{actual_path}")
+
+          raise "Golden file mismatch for #{cfg}"
+        end
+      else
+        puts "Golden file not found. Creating #{golden_path}..."
+        FileUtils.mkdir_p(File.dirname(golden_path))
+        File.write(golden_path, actual_output)
+        puts "Done."
+      end


This is carrying a big load. Let's split this into three different tasks:

gen:inst_table
Just generates the table (optionally based on a specified configuration).
Used in local testing and by the next two tasks.

test:inst_table
Verify that a newly generated (via gen:inst_table) table is identical to the golden reference.
Mostly used by CI and maybe pre-commit.

chore:update_inst_table
Generate a new inst_table and makes it the golden reference.
Used locally to complete a PR, maybe pre-commit as well, and ideally autofix.

ThinkOpenly · 2026-05-05T12:33:26Z

-
-  # tapioca annotations requires the ext/rbi-central submodule to be initialized.
-  if [ ! -f "${UDB_ROOT}/ext/rbi-central/index.json" ]; then
-    echo "Initializing ext/rbi-central submodule..."
-    git -C "${UDB_ROOT}" submodule update --init ext/rbi-central
-  fi
-  "${UDB_ROOT}"/bin/bundle exec --gemfile "${UDB_ROOT}"/Gemfile tapioca annotations


@dhower-qc and/or @jordancarlin, thoughts here? @thecharlesjenkins explained this change in the commit message a3f4b4f:

Tapioca annotations in rbi-central are community maintained and are not
always correct. Remove them from the chore script to avoid them being
forcefully added by the CI. This was making it impossible to add "mocha"
as a dependency because the rbi-central types are conflicting with the
sorbet-typed versions.

ThinkOpenly · 2026-05-05T12:35:33Z

Whoa. I guess I forgot to actually submit comments from an earlier review session. sigh.

thecharlesjenkins requested review from ThinkOpenly and dhower-qc as code owners April 7, 2026 23:41

ThinkOpenly requested changes Apr 9, 2026

View reviewed changes

thecharlesjenkins force-pushed the dev/charlie/insn_table branch from 61274ee to a1f6fe8 Compare April 10, 2026 02:27

thecharlesjenkins force-pushed the dev/charlie/insn_table branch from a1f6fe8 to 3d473cf Compare April 15, 2026 21:32

thecharlesjenkins force-pushed the dev/charlie/insn_table branch 6 times, most recently from 21cfdff to 982a22d Compare April 21, 2026 03:44

thecharlesjenkins requested a review from ThinkOpenly April 21, 2026 04:28

thecharlesjenkins added 4 commits May 4, 2026 23:10

feat: add mocha to udb-gen

38a3168

Add the Mocha gem to use for mocking objects in test cases. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

thecharlesjenkins force-pushed the dev/charlie/insn_table branch from 982a22d to 39ee8dd Compare May 5, 2026 06:14

ThinkOpenly requested changes May 5, 2026

View reviewed changes

Conversation

thecharlesjenkins commented Apr 7, 2026

Uh oh!

codecov Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ThinkOpenly commented Apr 8, 2026

Uh oh!

thecharlesjenkins commented Apr 8, 2026

Uh oh!

ThinkOpenly commented Apr 8, 2026

Uh oh!

thecharlesjenkins commented Apr 8, 2026

Uh oh!

ThinkOpenly left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thecharlesjenkins commented Apr 10, 2026

Uh oh!

ThinkOpenly commented Apr 10, 2026

Uh oh!

thecharlesjenkins commented Apr 15, 2026

Uh oh!

thecharlesjenkins commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

ThinkOpenly Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThinkOpenly Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThinkOpenly Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThinkOpenly May 5, 2026

Choose a reason for hiding this comment

Uh oh!

ThinkOpenly commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Apr 7, 2026 •

edited

Loading