Skip to content

Conversation

moste00
Copy link
Contributor

@moste00 moste00 commented Jul 16, 2025

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This PR is attempting to refactor the RISCV architecture to be updatable from the Auto-Sync tool, such that it automatically follows RISCV-in-LLVM additions and fixes. Following the refactoring guide

depends-on: capstone-engine/llvm-capstone#83

Test plan

...

Closing issues

...

SStream_concat0(markup(O, Markup_Register), "x18");
}
break;
case RISCVZC_RLISTENCODE::RA_S0_S3:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These namespace specifiers should have been replaced by AS. This might be a bug.


if (PrintBranchImmAsAddress) {
uint64_t Target = Address + MCOperand_getImm(MO);
if (!STI.hasFeature(RISCV_Feature64Bit))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The STI.hasFeature should probably be automated away as well.
You can check the STIFeatureBits patch and just copy and adapt it if you have some time.

if (SysReg && SysReg->haveRequiredFeatures(STI.getFeatureBits()))
SStream_concat0(markup(O, Markup_Register), SysReg->Name);
else
SStream_concat0(markup(O, Markup_Register), formatImm(Imm));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, better get rid of these formatImm calls. Otherwise we have two places to take care of formatting and everything gets less consistent.

// which will not print trailing zeros and will use scientific notation
// if it is shorter than printing as a decimal. The smallest value requires
// 12 digits of precision including the decimal.
if (FPVal == (int)(FPVal))
Copy link
Collaborator

@Rot127 Rot127 Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd need to check the standard, but casting an float to an integer just rounds it up/down.
Please just use the SStream function for printing it.

@Rot127
Copy link
Collaborator

Rot127 commented Aug 14, 2025

cc @hainest You are probably aware of this PR, just wanted to ping you as well. Because you mentioned in the Zydis discussion you use Capstone for RISCV as well.

@hainest
Copy link
Contributor

hainest commented Aug 14, 2025

cc @hainest You are probably aware of this PR, just wanted to ping you as well. Because you mentioned in the Zydis discussion you use Capstone for RISCV as well.

I had not seen this. @wxrdnx have you looked at this?

Comment on lines 188 to 189
{ .str = "riscv32", .val = CS_MODE_RISCV32 },
{ .str = "riscv64", .val = CS_MODE_RISCV64 },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In response to above. Please stick with the Capstone flag naming convention.
Is RSICV32 is anything else than just 32bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point, but I don't know, RISCV[32|64] and RISCVC were legacy flags that were already in the old plugin, so I kept them.

Removing them would probably not affect anything but just be a hassle, it could also be the case that the generated code references them from lots of places and changing the generated code is always a hassle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generated code references them

Which one? The test files from the MCUpdater? Those ones can be replaced as described here #2756 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rot127 sorry for the late reply, yes, only the yaml MC tests reference those values, plus a few references in handwritten code in the cstool.c and so on, so it's doable to replace them, will do.

Although this might break APIs, because I see some Python usage from my editor search, until now I have avoided deleting enum members from includes/riscv.h, that's why we still have CS_MODE_RISCV_C and CS_MODE_RISCVC.

@Rot127 Rot127 added this to the v6 - Beta milestone Sep 1, 2025
@Rot127 Rot127 added the blocker Must be finished with the assigned milestone. label Sep 1, 2025
cstool/cstool.c Outdated
Comment on lines 77 to 87
{ "+c", "Enables RISCV C extension", {
CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_C },
{ "+fd", "Enables RISCV F and D extensions ", {
CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_FD},
{ "+v", "Enables RISCV V extension ", {
CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_V},
{ "+inx", "Enables RISCV Zfinx, Zdinx, and Zhinx extensions,"
" zhinxmin is also enabled as it's subset of zhinx ", {
CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_ZFINX},
{ "+zcmp-t-e", "Enables the following RISCV code size reduction extensions: zcmp, zcmt and zce", {
CS_ARCH_RISCV, CS_ARCH_MAX }, 0, CS_MODE_RISCV_ZCMP_ZCMT_ZCE},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like many modes are missing:

	CS_MODE_RISCV_C   = 1 << 2,        ///< RISCV compressed instructure mode
	CS_MODE_RISCV_FD  = 1 << 3,
	CS_MODE_RISCV_V   = 1 << 4,
	CS_MODE_RISCV_ZFINX = 1 << 5,
	CS_MODE_RISCV_ZCMP_ZCMT_ZCE = 1 << 6,
	CS_MODE_RISCV_ZICFISS = 1 << 7,
	CS_MODE_RISCV_E = 1 << 8,
	CS_MODE_RISCV_A = 1 << 9,
	CS_MODE_RISCV_COREV = 1 << 10,
	CS_MODE_RISCV_THEAD = 1 << 11,
	CS_MODE_RISCV_SIFIVE = 1 << 12,
	CS_MODE_RISCV_BITMANIP = 1 << 13,
	CS_MODE_RISCV_ZBA  = 1 << 14,  
	CS_MODE_RISCV_ZBB  = 1 << 15,  
	CS_MODE_RISCV_ZBC  = 1 << 16,  
	CS_MODE_RISCV_ZBKB = 1 << 17,  
	CS_MODE_RISCV_ZBKC = 1 << 18, 
	CS_MODE_RISCV_ZBKX = 1 << 19, 
	CS_MODE_RISCV_ZBS  = 1 << 20, 

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, ideally we need at least one cstool test per mode. Just my thoughts. @Rot127 WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be independent flags like you would setup gcc/clang when specifying the cpu options

Copy link
Collaborator

@Rot127 Rot127 Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my thoughts. @Rot127 WDYT?

Yes, good idea. For the sake of simplicity we can implement them in Python in a new sub directory tests/integration/cstool I would suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wargio sorry for the late reply, but do you mean we should have a seperate flag per each RISCV extension ? my impression is that we said that the goal is to combine as much as possible into coarse-grained flags.

(I recognize that at the moment some RISCV extensions don't feature in any flags, that just means that no test failed because they were enabled, so they're always catch-all enabled in the feautre-checking logic. We can always add flags later.)

I think we can't have a seperate flag per extension because RISCV kept dividing and sub-dividing extensions till they exceeded 32, and we're using an int32 to store the feature bitvector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but do you mean we should have a seperate flag per each RISCV extension ?

Yes, for each individual one, not for combinations. We don't know what the use cases of people are. So allowing for finer configs is important.

Also there are not that many if I am not mistaken?
https://en.wikichip.org/wiki/risc-v/standard_extensions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the fine-grained config, but the link is probably incomplete, off the top of my head I can't see the half-percision floating point extension. I'm sure we find others not listed if we look carefully.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also group some of the LLVM extensions in a reasonable way.
We don't need and should not copy them one to one.

E.g. casually looking at RISCVFeatures.td seems to have common prefixes for the different Z extensions. Where Zb, Zc, Zf etc. are certain extensions and Zfhmin, Zfh etc, are just sub-categories.

So just creating an extension for Zf, Zb, Zc etc. would be sufficient IMO.

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check out the helper functions in Mapping.h. They should always be used for these tasks. Because they do bounds checking and such. Accessing the struct members directly (e.g. like details->groups[details->groups_count]) should be the absolute exception, not the rule.

You can also see how the helpers are used in other architectures and just do it the same way. LoongArch is a good example, because it is relatively simple.
AArch64 has many very complex operands. You can check its code if you need examples for these cases.

@Rot127
Copy link
Collaborator

Rot127 commented Sep 17, 2025

The conflicts are coming from the clang-format formatting we did recently btw. So nothing to preserve there.
You can just accept all your changes.

} \
}

#define compare_string_from_int_ret(actual, expected, converter, ret_val) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Comment on lines +137 to +116
// for weird reasons some instructions end up with valid operands that are
// interspersed with invalid operands, i.e. the operands array is an "island"
// of valid operands with invalid gaps between them, this function will compactify
// all the valid operands and pad the rest of the array to invalid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you debugged one or two of them? It could be an indicator something went wrong during the C++ -> C translation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, didn't think of that.. the operands array already behaves strangely, for example it can be filled out of order (the operand at [1] will be put in the array before the operand at [0]), I just dismissed those erratic behaviours as how the generated code behaves

I will see one of them in details.

@moste00 moste00 force-pushed the refactor_riscv_autosync branch from 12882dc to 1d59145 Compare October 17, 2025 01:16
@moste00 moste00 marked this pull request as ready for review October 17, 2025 01:19

float getFPImm(unsigned Imm)
{
CS_ASSERT(Imm != 1 && Imm != 30 && Imm != 31 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imm can still be greater than the array boundaries.
Please check this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Auto-Sync-files Auto-Sync blocker Must be finished with the assigned milestone. CS-core-files auto-sync LLVM-core-files auto-sync LLVM-generated-files RISCV Arch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants