Skip to content

feat: add backend for an instruction table#1780

Open
thecharlesjenkins wants to merge 4 commits intoriscv:mainfrom
thecharlesjenkins:dev/charlie/insn_table
Open

feat: add backend for an instruction table#1780
thecharlesjenkins wants to merge 4 commits intoriscv:mainfrom
thecharlesjenkins:dev/charlie/insn_table

Conversation

@thecharlesjenkins
Copy link
Copy Markdown
Contributor

This new backend that can be generated with "bundle exec rake gen:inst_table" is designed to organize all of the requested instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will use the table to generate the necessary instruction headers.

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.21%. Comparing base (7259e0e) to head (39ee8dd).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1780   +/-   ##
=======================================
  Coverage   72.21%   72.21%           
=======================================
  Files          52       52           
  Lines       27740    27740           
  Branches     6012     6012           
=======================================
  Hits        20033    20033           
  Misses       7707     7707           
Flag Coverage Δ
idlc 76.16% <ø> (ø)
udb 66.25% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ThinkOpenly
Copy link
Copy Markdown
Collaborator

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.

organize all of the requested instructions into a table modeled after the syscall table in Linux

What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?)

@thecharlesjenkins
Copy link
Copy Markdown
Contributor Author

To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.

I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.

Oh interesting, if it's preferred to have a different license I can do that to.

organize all of the requested instructions into a table modeled after the syscall table in Linux

What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?)

"requested instructions" for Linux is all instructions but if other projects what to use this table they can take a subset of instructions if they prefer, I was just trying to be general.

Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series.

@ThinkOpenly
Copy link
Copy Markdown
Collaborator

Oh interesting, if it's preferred to have a different license I can do that to.

To me, it seems like the "easy button" is to use the same license for content generated from other content as the original content. The YAML files here are BSD-based (described as compatible with GPL at the link above as "The Clear BSD License").

Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series.

That seems very, very cool! I'd love to look at the script that performs the generation, but it doesn't appear that patch (01/16) made it to the mailing list archives. :-/

thecharlesjenkins added a commit to thecharlesjenkins/linux that referenced this pull request Apr 8, 2026
There is a lot of code that is manually manipulating riscv instructions.
Constructing these headers for new instructions is time consuming and
error prone.

The first patch in this series introduces the instruction definition
table along with a script that runs at compile time to create all of the
associated instruction manipulation functions. The remaining patches
migrate all of the manual instruction code to use these generated
functions.

The instruction definition table is generated from the riscv-unified-db[1],
an open source RVI project for instruction specifications. I am sending
this series at the same time as I am sending the patch to the
riscv-unified-db for the format of the instruction table. That patch can
be accessed on github [2].

A lot of the validation for these changes is from running all possible
32-bit or 16-bit integers through these functions to see that any given
instruction will produce to expected result. I give more detail on the
test cases in the notes of the first couple of patches.

For the KVM patches, I also introduce two test cases to ensure that the
instruction manipulation is working as expected, along with a bug fix of
the current implemention.

[1] https://github.com/riscv/riscv-unified-db
[2] riscv/riscv-unified-db#1780


# Describe the purpose of this series. The information you put here
# will be used by the project maintainer to make a decision whether
# your patches should be reviewed, and in what priority order. Please be
# very detailed and link to any relevant discussions or sites that the
# maintainer can review to better understand your proposed changes. If you
# only have a single patch in your series, the contents of the cover
# letter will be appended to the "under-the-cut" portion of the patch.

# Lines starting with # will be removed from the cover letter. You can
# use them to add notes or reminders to yourself. If you want to use
# markdown headers in your cover letter, start the line with ">#".

# You can add trailers to the cover letter. Any email addresses found in
# these trailers will be added to the addresses specified/generated
# during the b4 send stage. You can also run "b4 prep --auto-to-cc" to
# auto-populate the To: and Cc: trailers based on the code being
# modified.

To: Paul Walmsley <pjw@kernel.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Alexandre Ghiti <alex@ghiti.fr>
To: Anup Patel <anup@brainfault.org>
To: Atish Patra <atish.patra@linux.dev>
To: Conor Dooley <conor@kernel.org>
To: Paolo Bonzini <pbonzini@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Shuah Khan <shuah@kernel.org>
Cc: linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: kvm-riscv@lists.infradead.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>

---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com



--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 2,
    "change-id": "20260114-riscv_insn_table-38495495dfd3",
    "prefixes": [],
    "history": {
      "v1": [
        "20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com"
      ]
    }
  }
}
@thecharlesjenkins
Copy link
Copy Markdown
Contributor Author

I resent the patch and then right after I sent it the original one showed up on the archives so now it is there twice... It might have been throttled since it is a large patch. Here it is https://lore.kernel.org/lkml/20260407-riscv_insn_table-v1-1-54b4736a1e77@gmail.com/

Copy link
Copy Markdown
Collaborator

@ThinkOpenly ThinkOpenly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.

Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/bin/generate_inst_table Outdated
Comment thread backends/inst_table/lib/udb/generate_inst_table.rb Outdated
Comment thread backends/inst_table/tasks.rake Outdated
Comment thread tools/ruby-gems/udb/lib/udb/obj/instruction.rb
@thecharlesjenkins
Copy link
Copy Markdown
Contributor Author

One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.

Yes I agree that makes sense. However, I am not sure how to add tests in the new generator infrastructure, and how I would go about making a test that wouldn't break each time a new instruction is added. It looks like the only existing test case in this new method is the type linting:

. I can mess with it longer, but I don't really know how this is supposed to fit together.

@ThinkOpenly
Copy link
Copy Markdown
Collaborator

I am not sure ... how I would go about making a test that wouldn't break each time a new instruction is added.

It's supposed to break when new content is added. We're trying to catch unintentionally added content, and for intentionally added content, to see it is processed as expected. With no representation of the output of the script, there's nothing to test against and script modifications go untested.

I am not sure how to add tests in the new generator infrastructure

I'm thinking more like the testing done in backends/instruction_appendix or tools/python (profile_extensions). Nothing fancy.

You could take it further, of course:

  • Add the kernel generation script (insn_tbl.sh) here as well and use it to generate insn_gen.h here just to make sure that script continues working forever.
  • Maybe instead of adding the script, live pull the latest version of that script from the kernel tree instead of storing a copy here. (Or get "renovate" to keep it up to date here automatically, once we get "renovate" working satisfactorily... if that's within renovate's scope.)
  • Compile something meaningful that uses the generated insn_gen.h to verify inst_tbl.sh is generating valid content. "Something meaningful" could be as sophisticated as desired, including more pieces directly from the kernel.

All of that is a little risky if paths or file names change in the kernel, of course.

@thecharlesjenkins
Copy link
Copy Markdown
Contributor Author

I added a couple of test cases. One of the test cases is to compare against the full output, and the other ones check that hypothetical instruction match expected output.

@thecharlesjenkins
Copy link
Copy Markdown
Contributor Author

I added mocha to mock the instruction objects to improve the testing. How do I trigger the dockerfile to rebuild so that it gets the updated Rakefile and pulls in mocha?

@thecharlesjenkins thecharlesjenkins force-pushed the dev/charlie/insn_table branch 6 times, most recently from 21cfdff to 982a22d Compare April 21, 2026 03:44
This new backend that can be generated with "bundle exec ruby
bin/container/udb-gen inst-table -o inst_table.txt" is designed to organize the
UDB instructions into a table modeled after the syscall table in Linux. The
usecase for this is to commit the generated table into Linux which will then use
the table to generate the necessary instruction headers.

Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Tapioca annotations in rbi-central are community maintained and are not
always correct. Remove them from the chore script to avoid them being
forcefully added by the CI. This was making it impossible to add "mocha"
as a dependency because the rbi-central types are conflicting with the
sorbet-typed versions.

Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Add the Mocha gem to use for mocking objects in test cases.

Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Validate the instruction table generation against all existing instructions.
This test file will need to be regenerated each time instructions change, but
can be useful to catch unexpected changes.

Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
@thecharlesjenkins thecharlesjenkins force-pushed the dev/charlie/insn_table branch from 982a22d to 39ee8dd Compare May 5, 2026 06:14
Comment thread tools/ruby-gems/udb-gen/Gemfile Outdated
Comment thread tools/ruby-gems/udb-gen/Rakefile Outdated
Comment thread Rakefile
puts "All IDL passed type checking"
end

task :inst_table_gen do
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole stanza should probably not be in the main Rakefile, but closer to the generator itself, and included here. We have a mechanism already set up well above that automatically pulls in tools/*/tasks.rake. So, I suggest creating a new tools/ruby-gems/tasks.rake and move this content there.

Comment thread Rakefile
task :inst_table_gen do
require_relative "tools/ruby-gems/udb-gen/lib/udb-gen/generators/inst_table/table_builder"
golden_path = "tests/data/golden/test_table.txt"
cfg = "_"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should allow other configs, at the user's discretion. You can find a use of CFG environment variable for allowing that use-case above.

Comment thread Rakefile
Comment on lines +281 to +303
actual_output = Dir.chdir(dir) do
builder.generate
end

if File.exist?(golden_path)
expected_output = File.read(golden_path)

if actual_output == expected_output
puts "SUCCESS: Generated output matches #{golden_path}"
else
puts "FAILURE: Output has diverged!"

File.write(actual_path, actual_output)
system("diff -u #{golden_path} #{actual_path}")

raise "Golden file mismatch for #{cfg}"
end
else
puts "Golden file not found. Creating #{golden_path}..."
FileUtils.mkdir_p(File.dirname(golden_path))
File.write(golden_path, actual_output)
puts "Done."
end
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is carrying a big load. Let's split this into three different tasks:

  • gen:inst_table
    Just generates the table (optionally based on a specified configuration).
    Used in local testing and by the next two tasks.
  • test:inst_table
    Verify that a newly generated (via gen:inst_table) table is identical to the golden reference.
    Mostly used by CI and maybe pre-commit.
  • chore:update_inst_table
    Generate a new inst_table and makes it the golden reference.
    Used locally to complete a PR, maybe pre-commit as well, and ideally autofix.

Comment thread bin/chore
Comment on lines -447 to -453

# tapioca annotations requires the ext/rbi-central submodule to be initialized.
if [ ! -f "${UDB_ROOT}/ext/rbi-central/index.json" ]; then
echo "Initializing ext/rbi-central submodule..."
git -C "${UDB_ROOT}" submodule update --init ext/rbi-central
fi
"${UDB_ROOT}"/bin/bundle exec --gemfile "${UDB_ROOT}"/Gemfile tapioca annotations
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhower-qc and/or @jordancarlin, thoughts here? @thecharlesjenkins explained this change in the commit message a3f4b4f:

Tapioca annotations in rbi-central are community maintained and are not
always correct. Remove them from the chore script to avoid them being
forcefully added by the CI. This was making it impossible to add "mocha"
as a dependency because the rbi-central types are conflicting with the
sorbet-typed versions.

@ThinkOpenly
Copy link
Copy Markdown
Collaborator

Whoa. I guess I forgot to actually submit comments from an earlier review session. sigh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants