Refactoring the RISCV architecture to Auto-Sync on LLVM #2756

moste00 · 2025-07-16T22:57:51Z

Your checklist for this pull request

I've documented or updated the documentation of every API function and struct this PR changes.
I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This PR attempts to refactor the RISCV architecture so that it can be updated by the Auto-Sync tool and therefore automatically picks up RISCV additions and fixes from LLVM. It follows the refactoring guide.

depends-on: capstone-engine/llvm-capstone#83

Test plan

...

Closing issues

...

Rot127 · 2025-07-25T10:59:04Z

arch/RISCV/RISCVInstPrinter.c

+			SStream_concat0(markup(O, Markup_Register), "x18");
+		}
+		break;
+	case RISCVZC_RLISTENCODE::RA_S0_S3:


These namespace specifiers should have been replaced by AS. This might be a bug.
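For illustration: plain C has no scope operator, so `RISCVZC_RLISTENCODE::RA_S0_S3` cannot compile as a case label. A minimal sketch of the flattened form, assuming the translator joins the namespace and the member with an underscore. The enum and `demo_rlist_top_reg` below are hypothetical stand-ins with illustrative values, not the generated RISCV code.

```c
/* Hypothetical stand-in for the generated enum; values are illustrative.
 * The C++ spelling RISCVZC_RLISTENCODE::RA_S0_S3 is assumed to become
 * RISCVZC_RLISTENCODE_RA_S0_S3 after translation. */
typedef enum {
	RISCVZC_RLISTENCODE_RA,
	RISCVZC_RLISTENCODE_RA_S0,
	RISCVZC_RLISTENCODE_RA_S0_S1,
	RISCVZC_RLISTENCODE_RA_S0_S2,
	RISCVZC_RLISTENCODE_RA_S0_S3,
} demo_rlist;

/* Highest-numbered register covered by a register list, using the
 * standard ABI mapping (ra = x1, s0 = x8, s1 = x9, s2 = x18, s3 = x19). */
const char *demo_rlist_top_reg(demo_rlist rlist)
{
	switch (rlist) {
	case RISCVZC_RLISTENCODE_RA:
		return "x1";
	case RISCVZC_RLISTENCODE_RA_S0:
		return "x8";
	case RISCVZC_RLISTENCODE_RA_S0_S1:
		return "x9";
	case RISCVZC_RLISTENCODE_RA_S0_S2:
		return "x18";
	case RISCVZC_RLISTENCODE_RA_S0_S3:
		return "x19";
	}
	return "";
}
```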

Rot127 · 2025-07-25T11:10:59Z

arch/RISCV/RISCVInstPrinter.c

+
+	if (PrintBranchImmAsAddress) {
+		uint64_t Target = Address + MCOperand_getImm(MO);
+		if (!STI.hasFeature(RISCV_Feature64Bit))


The STI.hasFeature should probably be automated away as well.
You can check the STIFeatureBits patch and just copy and adapt it if you have some time.
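A rough sketch of what the translated check could look like once the C++ `STI` object is gone. Everything prefixed `demo_`/`DEMO_` is hypothetical; in particular, the way the 64-bit feature is stored in a mode word is an assumption, not what the STIFeatureBits patch actually generates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bit standing in for however the handle records
 * that the target is 64-bit. */
#define DEMO_MODE_64BIT (1u << 0)

/* Hypothetical replacement for STI.hasFeature(RISCV_Feature64Bit). */
static bool demo_has_64bit(unsigned mode)
{
	return (mode & DEMO_MODE_64BIT) != 0;
}

/* Compute a branch target; on a 32-bit target the address wraps at 32 bits. */
uint64_t demo_branch_target(unsigned mode, uint64_t address, int64_t imm)
{
	uint64_t target = address + (uint64_t)imm;
	if (!demo_has_64bit(mode))
		target &= 0xffffffffULL;
	return target;
}
```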

Rot127 · 2025-07-25T11:11:53Z

arch/RISCV/RISCVInstPrinter.c

+	if (SysReg && SysReg->haveRequiredFeatures(STI.getFeatureBits()))
+		SStream_concat0(markup(O, Markup_Register), SysReg->Name);
+	else
+		SStream_concat0(markup(O, Markup_Register), formatImm(Imm));


Yeah, better get rid of these formatImm calls. Otherwise we have two places to take care of formatting and everything gets less consistent.

Rot127 · 2025-07-25T11:13:19Z

arch/RISCV/RISCVInstPrinter.c

+		// which will not print trailing zeros and will use scientific notation
+		// if it is shorter than printing as a decimal. The smallest value requires
+		// 12 digits of precision including the decimal.
+		if (FPVal == (int)(FPVal))


I'd need to check the standard, but casting a float to an integer just truncates it (rounds toward zero).
Please just use the SStream function for printing it.
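As a side note on why the bare cast is fragile: converting a float that does not fit in an `int` (or a NaN) is undefined behaviour in C. If an integrality test is needed at all, one hedged way to write it is to range-check first. `demo_is_small_integer` is a hypothetical helper for illustration, not part of the PR or of SStream.

```c
#include <stdbool.h>

/* True if v holds an integral value that fits in a 32-bit int.
 * The range test comes first because converting an out-of-range float
 * to int is undefined; NaN fails both comparisons and is rejected too. */
bool demo_is_small_integer(float v)
{
	if (!(v >= -2147483648.0f && v < 2147483648.0f))
		return false;
	return v == (float)(int)v;
}
```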

Rot127 · 2025-08-14T13:47:49Z

cc @hainest You are probably aware of this PR, just wanted to ping you as well. Because you mentioned in the Zydis discussion you use Capstone for RISCV as well.

hainest · 2025-08-14T18:52:10Z

cc @hainest You are probably aware of this PR, just wanted to ping you as well. Because you mentioned in the Zydis discussion you use Capstone for RISCV as well.

I had not seen this. @wxrdnx have you looked at this?

tests/MC/RISCV/XCVmac_valid_riscv32_xcvmac.txt.yaml

Rot127 · 2025-08-25T13:20:34Z

suite/cstest/include/test_mapping.h

+	{ .str = "riscv32", .val = CS_MODE_RISCV32 },
+	{ .str = "riscv64", .val = CS_MODE_RISCV64 },


In response to the above: please stick with the Capstone flag naming convention.
Is RISCV32 anything other than just 32-bit?

You have a point, but RISCV[32|64] and RISCVC were legacy flags that were already in the old plugin, so I kept them.

Removing them would probably not affect anything, but it would be a hassle. The generated code may also reference them in many places, and changing generated code is always a hassle.

generated code references them

Which one? The test files from the MCUpdater? Those ones can be replaced as described here #2756 (comment)

@Rot127 sorry for the late reply. Yes, only the YAML MC tests reference those values, plus a few references in handwritten code such as cstool.c, so replacing them is doable; will do.

This might break APIs, though, because my editor search shows some Python usage. Until now I have avoided deleting enum members from includes/riscv.h, which is why we still have CS_MODE_RISCV_C and CS_MODE_RISCVC.

include/capstone/capstone.h

suite/cstest/include/test_mapping.h

wargio · 2025-09-17T02:55:58Z

cstool/cstool.c

+	{ "+c", "Enables RISCV C extension", {
+		CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_C },
+	{ "+fd", "Enables RISCV F and D extensions ", {
+		CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_FD},
+	{ "+v", "Enables RISCV V extension ", {
+		CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_V},
+	{ "+inx", "Enables RISCV Zfinx, Zdinx, and Zhinx extensions," 
+			  " zhinxmin is also enabled as it's subset of zhinx ", {
+		CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_ZFINX},
+	{ "+zcmp-t-e", "Enables the following RISCV code size reduction extensions: zcmp, zcmt and zce", {
+		CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_ZCMP_ZCMT_ZCE},


Looks like many modes are missing:

CS_MODE_RISCV_C = 1 << 2, ///< RISCV compressed instructure mode
CS_MODE_RISCV_FD = 1 << 3,
CS_MODE_RISCV_V = 1 << 4,
CS_MODE_RISCV_ZFINX = 1 << 5,
CS_MODE_RISCV_ZCMP_ZCMT_ZCE = 1 << 6,
CS_MODE_RISCV_ZICFISS = 1 << 7,
CS_MODE_RISCV_E = 1 << 8,
CS_MODE_RISCV_A = 1 << 9,
CS_MODE_RISCV_COREV = 1 << 10,
CS_MODE_RISCV_THEAD = 1 << 11,
CS_MODE_RISCV_SIFIVE = 1 << 12,
CS_MODE_RISCV_BITMANIP = 1 << 13,
CS_MODE_RISCV_ZBA = 1 << 14,
CS_MODE_RISCV_ZBB = 1 << 15,
CS_MODE_RISCV_ZBC = 1 << 16,
CS_MODE_RISCV_ZBKB = 1 << 17,
CS_MODE_RISCV_ZBKC = 1 << 18,
CS_MODE_RISCV_ZBKX = 1 << 19,
CS_MODE_RISCV_ZBS = 1 << 20,

Right, ideally we need at least one cstool test per mode. Just my thoughts. @Rot127 WDYT?

These should be independent flags, like the CPU options you would set up for gcc/clang.

Just my thoughts. @Rot127 WDYT?

Yes, good idea. For the sake of simplicity we can implement them in Python in a new sub directory tests/integration/cstool I would suggest.

@wargio sorry for the late reply, but do you mean we should have a separate flag for each RISCV extension? My impression was that the goal is to combine as much as possible into coarse-grained flags.

(I recognize that at the moment some RISCV extensions don't feature in any flag. That just means no test failed because they were enabled, so they are always enabled as a catch-all in the feature-checking logic. We can always add flags later.)

I think we can't have a separate flag per extension, because RISCV kept dividing and sub-dividing extensions until there were more than 32, and we're using an int32 to store the feature bit vector.

but do you mean we should have a separate flag for each RISCV extension?

Yes, for each individual one, not for combinations. We don't know what the use cases of people are. So allowing for finer configs is important.
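To make the "independent flags" idea concrete, here is a small sketch of how single-bit mode flags combine and are tested. The bit positions are copied from the mode list quoted earlier in this thread; the `DEMO_` names and `demo_mode_has` are local stand-ins, not the real header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; bit positions follow the mode list quoted in this thread. */
enum {
	DEMO_MODE_RISCV_C = 1 << 2,
	DEMO_MODE_RISCV_FD = 1 << 3,
	DEMO_MODE_RISCV_V = 1 << 4,
};

/* True only if every bit of `flag` is set in `mode`. */
bool demo_mode_has(uint32_t mode, uint32_t flag)
{
	return (mode & flag) == flag;
}
```

A caller would OR together exactly the extensions it wants, e.g. `DEMO_MODE_RISCV_C | DEMO_MODE_RISCV_V`, much like passing several `+ext` options to a compiler.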

Also there are not that many if I am not mistaken?
https://en.wikichip.org/wiki/risc-v/standard_extensions

I agree with the fine-grained config, but the link is probably incomplete: off the top of my head, I can't see the half-precision floating-point extension. I'm sure we will find others not listed if we look carefully.

You can also group some of the LLVM extensions in a reasonable way.
We don't need and should not copy them one to one.

E.g., from a casual look, RISCVFeatures.td seems to have common prefixes for the different Z extensions, where Zb, Zc, Zf etc. are certain extensions and Zfhmin, Zfh etc. are just sub-categories.

So just creating an extension for Zf, Zb, Zc etc. would be sufficient IMO.
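The prefix-based grouping suggested here could be sketched as follows. The group names, their bit values, and `demo_group_for_extension` are all hypothetical; a real mapping would need to decide case by case whether sub-extensions that share a prefix really belong in the same flag.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical coarse-grained groups, one bit each. */
enum {
	DEMO_GROUP_NONE = 0,
	DEMO_GROUP_ZB = 1 << 0,
	DEMO_GROUP_ZC = 1 << 1,
	DEMO_GROUP_ZF = 1 << 2,
};

/* Map a lower-case extension name to its group by two-letter prefix,
 * so "zfh" and "zfhmin" both land in DEMO_GROUP_ZF. */
uint32_t demo_group_for_extension(const char *ext)
{
	static const struct {
		const char *prefix;
		uint32_t group;
	} table[] = {
		{ "zb", DEMO_GROUP_ZB },
		{ "zc", DEMO_GROUP_ZC },
		{ "zf", DEMO_GROUP_ZF },
	};

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strncmp(ext, table[i].prefix, strlen(table[i].prefix)) == 0)
			return table[i].group;
	}
	return DEMO_GROUP_NONE;
}
```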

Rot127

Please check out the helper functions in Mapping.h. They should always be used for these tasks. Because they do bounds checking and such. Accessing the struct members directly (e.g. like details->groups[details->groups_count]) should be the absolute exception, not the rule.

You can also see how the helpers are used in other architectures and just do it the same way. LoongArch is a good example, because it is relatively simple.
AArch64 has many very complex operands. You can check its code if you need examples for these cases.
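To show why a helper is preferable to writing `details->groups[details->groups_count]` directly, here is a self-contained sketch of a bounds-checked append. The struct, the array size, and `demo_add_group` are hypothetical and only mirror the idea; the actual helpers and their signatures are the ones declared in Mapping.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_GROUPS 8 /* hypothetical capacity */

/* Hypothetical, cut-down detail record. */
typedef struct {
	uint8_t groups[DEMO_MAX_GROUPS];
	uint8_t groups_count;
} demo_detail;

/* Append a group id; refuses to write past the end of the array. */
bool demo_add_group(demo_detail *detail, uint8_t group)
{
	if (!detail || detail->groups_count >= DEMO_MAX_GROUPS)
		return false;
	detail->groups[detail->groups_count++] = group;
	return true;
}
```

With the check in one place, every call site gets the same overflow protection instead of each one having to remember it.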

arch/RISCV/RISCVMapping.c

Rot127 · 2025-09-17T12:33:03Z

The conflicts are coming from the clang-format formatting we did recently btw. So nothing to preserve there.
You can just accept all your changes.

Rot127 · 2025-09-26T09:05:00Z

suite/cstest/include/test_compare.h

 		} \
 	}

+#define compare_string_from_int_ret(actual, expected, converter, ret_val) \


Rot127 · 2025-10-10T13:26:34Z

arch/RISCV/RISCVMapping.c

+// for weird reasons some instructions end up with valid operands that are
+// interspersed with invalid operands, i.e. the operands array is an "island"
+// of valid operands with invalid gaps between them, this function will compactify
+// all the valid operands and pad the rest of the array to invalid


Have you debugged one or two of them? It could be an indicator that something went wrong during the C++ -> C translation.

Hmm, I didn't think of that. The operands array already behaves strangely; for example, it can be filled out of order (the operand at [1] is put in the array before the operand at [0]). I just dismissed those erratic behaviours as how the generated code behaves.

I will look at one of them in detail.
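For readers following along, the compaction described in the quoted comment amounts to a stable in-place filter. A minimal sketch, assuming an operand is "invalid" when its type field is zero; the types and `demo_compact_operands` are hypothetical and not the code from RISCVMapping.c.

```c
#include <stddef.h>

typedef enum { DEMO_OP_INVALID = 0, DEMO_OP_REG, DEMO_OP_IMM } demo_op_type;

typedef struct {
	demo_op_type type;
	long value;
} demo_op;

/* Move valid operands to the front in their original order and mark the
 * remaining slots invalid. Returns the number of valid operands. */
size_t demo_compact_operands(demo_op *ops, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (ops[i].type != DEMO_OP_INVALID) {
			if (out != i)
				ops[out] = ops[i];
			out++;
		}
	}
	for (size_t i = out; i < n; i++) {
		ops[i].type = DEMO_OP_INVALID;
		ops[i].value = 0;
	}
	return out;
}
```

Note that this only hides the gaps; it does not explain why they appear, which is the question raised in the review.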

Rot127 · 2025-10-22T12:46:53Z

arch/RISCV/RISCVBaseInfo.c

+
+float getFPImm(unsigned Imm)
+{
+	CS_ASSERT(Imm != 1 && Imm != 30 && Imm != 31 &&


Imm can still be greater than the array boundaries.
Please check this as well.
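One possible shape for the requested check, as a sketch only: test the index against the table length before reading, and report failure instead of indexing out of bounds. The table contents and `demo_get_fp_imm` are hypothetical and do not reproduce the real getFPImm table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup table; the real one lives in RISCVBaseInfo.c. */
static const float demo_fp_table[] = { 0.0625f, 0.125f, 0.25f, 0.5f };

#define DEMO_FP_TABLE_LEN (sizeof(demo_fp_table) / sizeof(demo_fp_table[0]))

/* Look up an encoded FP immediate. Returns false, leaving *out untouched,
 * when the index is outside the table. */
bool demo_get_fp_imm(unsigned imm, float *out)
{
	if (!out || imm >= DEMO_FP_TABLE_LEN)
		return false;
	*out = demo_fp_table[imm];
	return true;
}
```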

github-actions bot added RISCV Arch Auto-Sync-files Auto-Sync LLVM-generated-files labels Jul 16, 2025

moste00 marked this pull request as draft July 16, 2025 22:58

Rot127 mentioned this pull request Jul 17, 2025

auto-sync progress tracker: Refactor and implement architectures #2015

Open

47 tasks

notxvilka mentioned this pull request Jul 17, 2025

RISC-V - Update Capstone, improve analysis, add RzIL uplifting rizinorg/rizin#5275

Open

9 tasks

moste00 mentioned this pull request Jul 17, 2025

Adding RISCV to supported architectures capstone-engine/llvm-capstone#83

Draft

Rot127 reviewed Jul 25, 2025

github-actions bot added CS-core-files auto-sync LLVM-core-files auto-sync labels Jul 26, 2025

Rot127 requested changes Aug 25, 2025

Rot127 added this to the v6 - Beta milestone Sep 1, 2025

Rot127 added the blocker Must be finished with the assigned milestone. label Sep 1, 2025

wargio reviewed Sep 17, 2025

Rot127 requested changes Sep 17, 2025

Rot127 reviewed Sep 26, 2025

suite/cstest/include/test_compare.h

} \

}

#define compare_string_from_int_ret(actual, expected, converter, ret_val) \

Good idea

Rot127 reviewed Oct 10, 2025

moste00 added 9 commits October 17, 2025 03:28

Refactoring RISCV: INC generatione done, RISCV added to arch_config.json, and riscv.h comments for generated content added

aa8e587

the C++ translator runs successfully and first output under arch/RISCV is ready

fb08d9d

fixed Disassembler.c, in the process of fixing InstPrinter

0c2cc3b

more compile errors fixed, RISCVInstPrinter error-free

ad513b1

plugins compiles and runs, several test failures were fixed, current test failures: 1230 out of 4757

8a9626c

compressed instructions compression and uncompression logic was generated, it doesn't compile yet but is mostly valid C except for a few problems

a275efd

massive fix handling various issues, test failures down to 35

cfc5ef8

test failures down to 2, related to inline option arch directives

012099c

added options to cstool, and started working on details test, 39/106 pass

b4d9e1b

moste00 added 5 commits October 17, 2025 04:01

more successful details test, added CSR operand type and changed cstool and the test yaml schema accordingly

e074c3d

more progress

0949896

failures decreased to 2

0d8581c

milestone: test failures 0

e1eaf2e

add additional files generated from ASUpdate and update CMakeLists

1d59145

moste00 force-pushed the refactor_riscv_autosync branch from 12882dc to 1d59145 Compare October 17, 2025 01:16

moste00 marked this pull request as ready for review October 17, 2025 01:19

Rot127 reviewed Oct 17, 2025

moste00 added 2 commits October 20, 2025 02:01

fix most issues tests except the one about c_srli allowing illegal shift amounts, as it's a bug wtih LLVM upstream

e15a39b

fixed c_srli bug by updating the generated files from a fixed LLVM version

7f45480

Rot127 reviewed Oct 22, 2025

5 participants