feat: add backend for an instruction table#1780
feat: add backend for an instruction table#1780thecharlesjenkins wants to merge 4 commits intoriscv:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1780 +/- ##
=======================================
Coverage 72.21% 72.21%
=======================================
Files 52 52
Lines 27740 27740
Branches 6012 6012
=======================================
Hits 20033 20033
Misses 7707 7707
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.
What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?) |
Oh interesting, if it's preferred to have a different license I can do that to.
"requested instructions" for Linux is all instructions but if other projects what to use this table they can take a subset of instructions if they prefer, I was just trying to be general. Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series. |
To me, it seems like the "easy button" is to use the same license for content generated from other content as the original content. The YAML files here are BSD-based (described as compatible with GPL at the link above as "The Clear BSD License").
That seems very, very cool! I'd love to look at the script that performs the generation, but it doesn't appear that patch (01/16) made it to the mailing list archives. :-/ |
There is a lot of code that is manually manipulating riscv instructions. Constructing these headers for new instructions is time consuming and error prone. The first patch in this series introduces the instruction definition table along with a script that runs at compile time to create all of the associated instruction manipulation functions. The remaining patches migrate all of the manual instruction code to use these generated functions. The instruction definition table is generated from the riscv-unified-db[1], an open source RVI project for instruction specifications. I am sending this series at the same time as I am sending the patch to the riscv-unified-db for the format of the instruction table. That patch can be accessed on github [2]. A lot of the validation for these changes is from running all possible 32-bit or 16-bit integers through these functions to see that any given instruction will produce to expected result. I give more detail on the test cases in the notes of the first couple of patches. For the KVM patches, I also introduce two test cases to ensure that the instruction manipulation is working as expected, along with a bug fix of the current implemention. [1] https://github.com/riscv/riscv-unified-db [2] riscv/riscv-unified-db#1780 # Describe the purpose of this series. The information you put here # will be used by the project maintainer to make a decision whether # your patches should be reviewed, and in what priority order. Please be # very detailed and link to any relevant discussions or sites that the # maintainer can review to better understand your proposed changes. If you # only have a single patch in your series, the contents of the cover # letter will be appended to the "under-the-cut" portion of the patch. # Lines starting with # will be removed from the cover letter. You can # use them to add notes or reminders to yourself. If you want to use # markdown headers in your cover letter, start the line with ">#". # You can add trailers to the cover letter. Any email addresses found in # these trailers will be added to the addresses specified/generated # during the b4 send stage. You can also run "b4 prep --auto-to-cc" to # auto-populate the To: and Cc: trailers based on the code being # modified. To: Paul Walmsley <pjw@kernel.org> To: Palmer Dabbelt <palmer@dabbelt.com> To: Alexandre Ghiti <alex@ghiti.fr> To: Anup Patel <anup@brainfault.org> To: Atish Patra <atish.patra@linux.dev> To: Conor Dooley <conor@kernel.org> To: Paolo Bonzini <pbonzini@redhat.com> To: Andrew Morton <akpm@linux-foundation.org> To: Shuah Khan <shuah@kernel.org> Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org Cc: kvm-riscv@lists.infradead.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com> --- Changes in v2: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v1: https://lore.kernel.org/r/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20260114-riscv_insn_table-38495495dfd3", "prefixes": [], "history": { "v1": [ "20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com" ] } } }
|
I resent the patch and then right after I sent it the original one showed up on the archives so now it is there twice... It might have been throttled since it is a large patch. Here it is https://lore.kernel.org/lkml/20260407-riscv_insn_table-v1-1-54b4736a1e77@gmail.com/ |
ThinkOpenly
left a comment
There was a problem hiding this comment.
One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.
Yes I agree that makes sense. However, I am not sure how to add tests in the new generator infrastructure, and how I would go about making a test that wouldn't break each time a new instruction is added. It looks like the only existing test case in this new method is the type linting: . I can mess with it longer, but I don't really know how this is supposed to fit together. |
61274ee to
a1f6fe8
Compare
It's supposed to break when new content is added. We're trying to catch unintentionally added content, and for intentionally added content, to see it is processed as expected. With no representation of the output of the script, there's nothing to test against and script modifications go untested.
I'm thinking more like the testing done in You could take it further, of course:
All of that is a little risky if paths or file names change in the kernel, of course. |
a1f6fe8 to
3d473cf
Compare
|
I added a couple of test cases. One of the test cases is to compare against the full output, and the other ones check that hypothetical instruction match expected output. |
|
I added mocha to mock the instruction objects to improve the testing. How do I trigger the dockerfile to rebuild so that it gets the updated Rakefile and pulls in mocha? |
21cfdff to
982a22d
Compare
This new backend that can be generated with "bundle exec ruby bin/container/udb-gen inst-table -o inst_table.txt" is designed to organize the UDB instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will then use the table to generate the necessary instruction headers. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Tapioca annotations in rbi-central are community maintained and are not always correct. Remove them from the chore script to avoid them being forcefully added by the CI. This was making it impossible to add "mocha" as a dependency because the rbi-central types are conflicting with the sorbet-typed versions. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Add the Mocha gem to use for mocking objects in test cases. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Validate the instruction table generation against all existing instructions. This test file will need to be regenerated each time instructions change, but can be useful to catch unexpected changes. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
982a22d to
39ee8dd
Compare
| puts "All IDL passed type checking" | ||
| end | ||
|
|
||
| task :inst_table_gen do |
There was a problem hiding this comment.
This whole stanza should probably not be in the main Rakefile, but closer to the generator itself, and included here. We have a mechanism already set up well above that automatically pulls in tools/*/tasks.rake. So, I suggest creating a new tools/ruby-gems/tasks.rake and move this content there.
| task :inst_table_gen do | ||
| require_relative "tools/ruby-gems/udb-gen/lib/udb-gen/generators/inst_table/table_builder" | ||
| golden_path = "tests/data/golden/test_table.txt" | ||
| cfg = "_" |
There was a problem hiding this comment.
This should allow other configs, at the user's discretion. You can find a use of CFG environment variable for allowing that use-case above.
| actual_output = Dir.chdir(dir) do | ||
| builder.generate | ||
| end | ||
|
|
||
| if File.exist?(golden_path) | ||
| expected_output = File.read(golden_path) | ||
|
|
||
| if actual_output == expected_output | ||
| puts "SUCCESS: Generated output matches #{golden_path}" | ||
| else | ||
| puts "FAILURE: Output has diverged!" | ||
|
|
||
| File.write(actual_path, actual_output) | ||
| system("diff -u #{golden_path} #{actual_path}") | ||
|
|
||
| raise "Golden file mismatch for #{cfg}" | ||
| end | ||
| else | ||
| puts "Golden file not found. Creating #{golden_path}..." | ||
| FileUtils.mkdir_p(File.dirname(golden_path)) | ||
| File.write(golden_path, actual_output) | ||
| puts "Done." | ||
| end |
There was a problem hiding this comment.
This is carrying a big load. Let's split this into three different tasks:
- gen:inst_table
Just generates the table (optionally based on a specified configuration).
Used in local testing and by the next two tasks. - test:inst_table
Verify that a newly generated (via gen:inst_table) table is identical to the golden reference.
Mostly used by CI and maybe pre-commit. - chore:update_inst_table
Generate a new inst_table and makes it the golden reference.
Used locally to complete a PR, maybe pre-commit as well, and ideally autofix.
|
|
||
| # tapioca annotations requires the ext/rbi-central submodule to be initialized. | ||
| if [ ! -f "${UDB_ROOT}/ext/rbi-central/index.json" ]; then | ||
| echo "Initializing ext/rbi-central submodule..." | ||
| git -C "${UDB_ROOT}" submodule update --init ext/rbi-central | ||
| fi | ||
| "${UDB_ROOT}"/bin/bundle exec --gemfile "${UDB_ROOT}"/Gemfile tapioca annotations |
There was a problem hiding this comment.
@dhower-qc and/or @jordancarlin, thoughts here? @thecharlesjenkins explained this change in the commit message a3f4b4f:
Tapioca annotations in rbi-central are community maintained and are not
always correct. Remove them from the chore script to avoid them being
forcefully added by the CI. This was making it impossible to add "mocha"
as a dependency because the rbi-central types are conflicting with the
sorbet-typed versions.
|
Whoa. I guess I forgot to actually submit comments from an earlier review session. sigh. |
This new backend that can be generated with "bundle exec rake gen:inst_table" is designed to organize all of the requested instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will use the table to generate the necessary instruction headers.
To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.