feat: optimize C extensions #1814

Open
henry-hsieh wants to merge 1 commit into riscv:main from henry-hsieh:feat-optimize-c-ext

Conversation

@henry-hsieh
Contributor

Add 2 new attributes to instruction schema:

  1. add: Used to adjust operand value by adding an integer to binary value. Useful for describing x8-x15 registers in C extensions.
  2. implicit: Used to hint that the instruction is using an operand without explicitly allocating a field in the encoding. Useful for describing sp-/ra-relative instructions in C extensions.

Although implicit is not used by riscv-unified-db itself at present, it may be useful for libraries that rely on it.
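
As a rough illustration of the two attributes described above, here is a minimal Ruby sketch. The hash layout, the constant names, and the `operand_value` helper are all assumptions made for this example; only the attribute names `add` and `implicit` come from the PR.

```ruby
# Hypothetical field descriptions. Only the "add" and "implicit" keys mirror
# the attributes proposed in this PR; everything else is illustrative.
XS1_PRIME   = { name: "xs1", msb: 9, lsb: 7, add: 8 }.freeze
SP_IMPLICIT = { name: "xs1", implicit: 2 }.freeze

# Returns the operand value for a field, given the raw 16-bit encoding.
def operand_value(field, encoding)
  # An implicit operand has no bits in the encoding; its value is fixed.
  return field[:implicit] if field.key?(:implicit)

  width = field[:msb] - field[:lsb] + 1
  raw = (encoding >> field[:lsb]) & ((1 << width) - 1)
  # "add" shifts the 3-bit range 0..7 onto registers x8..x15.
  raw + field.fetch(:add, 0)
end
```

With these definitions, a 3-bit field holding `0b010` resolves to register 10, while the implicit operand always resolves to register 2 (sp) regardless of the encoding.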

Other notable changes:

  1. inst_schema.json version is bumped to v0.3
  2. Because sign_extend defaults to false, I have added it to the C instructions that require sign extension.
  3. Fix several C instructions that were missing aliases for rd'/rs1' operands.
  4. Update pretty_print that if not array size is more than its reversion, use it instead.

Add 2 new attributes to instruction schema:
1. `add`: Used to adjust operand value by adding an integer to binary value.
   Useful for describing x8-x15 registers in C extensions.
2. `implicit`: Used to hint that the instruction is using an operand
   without explicitly allocating a field in the encoding. Useful for
   describing sp-/ra-relative instructions in C extensions.
@henry-hsieh henry-hsieh force-pushed the feat-optimize-c-ext branch from ed1337f to d1b4718 Compare April 30, 2026 01:55
@dhower-qc
Collaborator

This has some commonalities with #1532. @ThinkOpenly, take a look and see what you think about any overlap.

@henry-hsieh
Contributor Author

I didn't notice the ongoing refactor in #1527. Feel free to close this one.
After a quick look, I think the add field is covered by the new encode / decode field.
But I think implicit is not covered.

Collaborator

@ThinkOpenly ThinkOpenly left a comment

Thank you for submitting this PR!

I suggest creating a separate PR with the bug fixes. That's an easy objective review which I expect would be merged quickly.

Aliases are probably fine and could go in a separate PR to get them reviewed/merged relatively quickly.

I have reservations for "add" described in a review comment.

I think "implicit" is probably needed, but associating it with operands instead of opcode fields is perhaps a better approach, and maybe it should be carried in #1527.

Comment on lines +270 to +274
"add": {
  "type": "integer",
  "default": 0,
  "description": "Amount the field should be added before use, e.g., source register is ranged between 8 to 15, add is 8"
},

Collaborator

IMHO, while this is useful in isolation, it presents some challenges by only solving one arithmetic issue, and leaves others (like additional operators and "precedence") unresolved. In #1527, I'm trying to lump all of the operand<->field transformations into new, general encode() and decode() methods. These are IDL, which allows most any transformation, but also complicates their use. Would these methods accommodate your use-case well enough?

Contributor Author

True. encode() and decode() should be enough for this. With these 2 functions, I think left_shift() could be merged into them, too.

Collaborator

> I think left_shift() could be merged into them, too.

True.
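
To make the point of this thread concrete, here is a small sketch of how a general decode/encode pair could subsume both `add` and `left_shift()`. In #1527 these methods are written in IDL; the Ruby lambdas, the `Transform` struct, and the constant names below are stand-ins invented for this example.

```ruby
# Hypothetical stand-in for the encode()/decode() pair discussed above.
# decode: field value -> operand value; encode: operand value -> field value.
Transform = Struct.new(:decode, :encode)

# Covers the "add" use-case: a 3-bit field maps onto registers x8..x15.
COMPRESSED_REG = Transform.new(
  ->(field)   { field + 8 },
  ->(operand) { operand - 8 }
)

# Covers the left_shift use-case: the field stores the immediate divided by 4.
SCALED_BY_4 = Transform.new(
  ->(field)   { field << 2 },
  ->(operand) { operand >> 2 }
)
```

A general pair like this also leaves room for transformations that combine several operators, which a single `add` attribute cannot express.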

Comment on lines +275 to +278
"implicit": {
  "type": "integer",
  "description": "Implicitly use an operand with the value"
},

Collaborator

I think we do need something like this. You've applied it to opcode fields, but the variable fields are by definition, variable, and "implicit" by definition is not really variable, so it's not a great fit.

The implicit content seems to be more closely associated with operands (which is the approach in #1527): the input to the instruction.

It is a bit subjective.

Either way, I think it needs to be generalized to at least cover:

  • stack pointer (as you do here)
  • program counter (which can't be represented as an "integer" as is done here)
  • the unmentioned register in register pair usage (this is awkward because it depends on one of the non-implicit operands, so we'll need a way to represent that). I guess this case is sort of implicit and variable.
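
The three cases listed above could be modelled roughly as follows. This is only a sketch: the `ImplicitOperand` struct, the `resolve` helper, and the rule that the unmentioned pair register is the explicit register plus one are assumptions for illustration, not something defined in this PR or in #1527.

```ruby
# Hypothetical model of an implicit operand: a kind plus a resolver that may
# look at the instruction's explicit operands.
ImplicitOperand = Struct.new(:kind, :resolver)

# Stack pointer: a fixed integer register number.
IMPLICIT_SP = ImplicitOperand.new(:x_register, ->(_operands) { 2 })

# Program counter: not an integer register, so it gets its own kind.
IMPLICIT_PC = ImplicitOperand.new(:pc, ->(_operands) { :pc })

# Second register of a pair: implicit, but derived from an explicit operand
# (assumed here to be the next register after xd).
IMPLICIT_PAIR_HIGH = ImplicitOperand.new(
  :x_register,
  ->(operands) { operands.fetch(:xd) + 1 }
)

def resolve(implicit, operands = {})
  implicit.resolver.call(operands)
end
```

The resolver taking the explicit operands as input is what allows the "implicit and variable" register-pair case to sit alongside the fixed ones.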

Contributor Author

I'm fine with moving it to something other than the variable fields. To integrate with the new operand field, I have some ideas:

  1. Implicit operand (with a new implicit attribute)
$schema: "operand_schema.json#"
kind: operand
name: sp-implicit
data:
  $inherits: operand/xs.yaml#/data
  possible_values: [2]
  implicit: true
  2. A name that maps to multiple actual registers
  • Introduce a name-to-values mapping
  • The name could be an array, for ABI name swapping
  • The values can be an array with all included values
$schema: "operand_schema.json#"
kind: operand
name: xs-pair
data:
  type: reg_pair
  name: reg_pair_xs
  possible_values:
    - $inherits: operand/xs.yaml#/data
      name: ["x0", "zero"]
      values: [0, 1]
    - $inherits: operand/xs.yaml#/data
      name: ["x2", "sp"]
      values: [2, 3]
    - $inherits: operand/xs.yaml#/data
      name: ["x4", "tp"]
      values: [4, 5]
...

Collaborator

Good ideas. Feel free to post them in #1527. (Or, I'll copy them there if you wish.)

Contributor Author

Thanks! I'll copy them to the PR.

- name: xd
  location: 11-7
  not: 0
  alias: xs1

Collaborator

I didn't even know UDB had this feature. Are you adding this just to match the ISA manual? Do you have a use-case for it? Just curious. It seems pretty harmless, so no objections.

Contributor Author

Yes, just to match the ISA manual. But I think it is also beneficial for anyone who wants to use this database for compiler scheduling or RTL decoder generation. For these purposes, you must know that these operands are used as both source and destination. With the new operand system, we could just create a new xsdc operand for this purpose.

$schema: "operand_schema.json#"
kind: operand
name: xsdc
data:
  $inherits: operand/xdc.yaml#/data
  source: true

Collaborator

Agree. I didn't yet model a source/destination register, but I think the capability is already there (as in your example).
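
For the scheduling use-case mentioned in this thread, a consumer could derive read and write sets from per-operand flags along these lines. The `Operand` struct, the `destination` flag, and the `xs2c` operand name are assumptions for this sketch; only `xsdc` and `source: true` appear in the discussion above.

```ruby
# Hypothetical operand record with source/destination flags.
Operand = Struct.new(:name, :source, :destination, keyword_init: true)

# Names of operands the instruction reads.
def reads(operands)
  operands.select(&:source).map(&:name)
end

# Names of operands the instruction writes.
def writes(operands)
  operands.select(&:destination).map(&:name)
end

# A two-operand compressed instruction whose first operand is both read and
# written, as with the proposed xsdc operand.
EXAMPLE_OPERANDS = [
  Operand.new(name: "xsdc", source: true, destination: true),
  Operand.new(name: "xs2c", source: true, destination: false)
].freeze
```

Here `reads(EXAMPLE_OPERANDS)` includes both operands while `writes(EXAMPLE_OPERANDS)` includes only `xsdc`, which is the information a scheduler or decoder generator needs.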

- name: imm
  location: 12|6-2
  not: 0
  sign_extend: true

Collaborator

These sign_extend fixes should probably go in their own PR, since they are not subjective and we could get them merged much faster.
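
For readers unfamiliar with what sign_extend changes, here is a minimal sketch of decoding a field at location 12|6-2 with and without sign extension. The `extract` and `sign_extend` helpers are written for this example and are not UDB functions; the sample encoding 0x157d is c.addi a0, -1.

```ruby
# Gathers the bits named by a location string such as "12|6-2" (most
# significant part first) and returns [value, width].
def extract(encoding, location)
  location.split("|").reduce([0, 0]) do |(value, width), part|
    msb, lsb = part.split("-").map(&:to_i)
    lsb ||= msb # a single bit such as "12"
    n = msb - lsb + 1
    [(value << n) | ((encoding >> lsb) & ((1 << n) - 1)), width + n]
  end
end

# Interprets a width-bit value as a two's-complement signed integer.
def sign_extend(value, width)
  sign_bit = 1 << (width - 1)
  (value ^ sign_bit) - sign_bit
end

raw, width = extract(0x157d, "12|6-2") # raw is 63, width is 6
signed = sign_extend(raw, width)       # signed is -1
```

Without sign_extend the immediate would be read as 63 rather than -1, which is why the missing attribute is an objective bug rather than a style choice.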

Contributor Author

@henry-hsieh henry-hsieh May 7, 2026

Sure. I'll file another PR for this in a few days.

Edit: A few minutes actually 😅

Comment on lines +27 to +28
- name: xd
  location: 11-7

Collaborator

I'm guessing this fails to validate properly, as this new "variable" is hardcoded to 2 ("00010") in the "match" string just above. Try running ./do gen:resolved_arch.

An open question, perhaps: where is the syntax defined for c.addi16sp? Is it c.addi16sp sp, imm or c.addi16sp imm? (Why would "sp" need to be an actual operand when it's attached to the mnemonic?)
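
The validation concern can be illustrated with a small check that a variable's location only covers don't-care bits of the match string. This is a sketch, not the check that ./do gen:resolved_arch actually performs; the helper name is invented, and the match string is assumed from the c.addi16sp encoding (funct3 011, bits 11-7 fixed to 00010, opcode 01).

```ruby
# Returns true when every bit in msb..lsb is a "-" (don't care) in the match
# string. The leftmost character of the string is the highest bit.
def variable_bits_free?(match, msb, lsb)
  top = match.length - 1
  (lsb..msb).all? { |bit| match[top - bit] == "-" }
end

# Assumed match string for c.addi16sp.
ADDI16SP_MATCH = "011-00010-----01"

variable_bits_free?(ADDI16SP_MATCH, 11, 7) # false: bits 11-7 are hardcoded
variable_bits_free?(ADDI16SP_MATCH, 6, 2)  # true: bits 6-2 are variable
```

A variable declared at 11-7 collides with the fixed 00010, so either the match string has to be relaxed or the operand has to be expressed without claiming those bits.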

Contributor Author

I forgot to change the mask from "00010" to "-----". I think two approaches are valid:

  1. Keep the mask as is, but add an implicit operand to this instruction.
  2. Change the mask to "-----" and add a new operand which has only 1 possible value, i.e., 2.

I don't think the syntax of c.addi16sp is defined in any official document. I think we use this syntax because official GCC uses it: riscv-collab/riscv-gnu-toolchain#372

Collaborator

Oh. Thanks for digging up that (very sad) reference. 😕 Not a choice I would've made.

Collaborator

This is a strange case, indeed. Making the field a "variable with only one possible value" seems a little odd. Making "sp" an implicit operand when it is explicitly there also seems odd. This instruction is odd.

I could be swayed either way, but prefer the "variable with one possible value", if you want to update the mask.

Contributor Author

I prefer the "variable with one possible value", too. But I want to use the new operand system to do it instead of hand-coding an exclusion of every value in 0-31 except 2.

Collaborator

I think we should do both the field and the operand. We do want to properly constrain the field so that simple decode won't match (if the purpose of the decode was to execute rather than disassemble, for example).
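
A sketch of why the field still needs a constraint once the mask is relaxed: with bits 11-7 turned into don't-cares, the bit pattern alone also accepts encodings with other register numbers there. The helper names and the constraint layout are assumptions for this example; 0x6141 is c.addi16sp sp, 16, and 0x61c1 is the same bit pattern with 3 in bits 11-7.

```ruby
# True when the encoding agrees with every fixed bit of the match string.
def matches?(encoding, match)
  width = match.length
  match.each_char.with_index.all? do |ch, i|
    ch == "-" || ((encoding >> (width - 1 - i)) & 1).to_s == ch
  end
end

# True when the encoding matches and every constrained field holds an
# allowed value.
def decodes_as?(encoding, match, constraints)
  return false unless matches?(encoding, match)

  constraints.all? do |c|
    n = c[:msb] - c[:lsb] + 1
    c[:allowed].include?((encoding >> c[:lsb]) & ((1 << n) - 1))
  end
end

# Relaxed match string (bits 11-7 no longer fixed) plus a one-value field.
RELAXED_MATCH = ("011" + ("-" * 11) + "01").freeze
SP_ONLY = [{ msb: 11, lsb: 7, allowed: [2] }].freeze
```

With the constraint in place, `decodes_as?(0x6141, RELAXED_MATCH, SP_ONLY)` accepts the sp form and rejects 0x61c1, even though `matches?` alone accepts both.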

  description: |
    C.MV (move register) performs copy of the data in register xs2 to register xd
-   C.MV expands to addi xd, x0, xs2.
+   C.MV expands to add xd, x0, xs2.

Collaborator

Another bug fix for a separate PR.
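
For context on why this is a bug fix: addi takes an immediate as its third operand, so "addi xd, x0, xs2" cannot be right, whereas add takes two registers. A minimal sketch of the expansion, with a helper name chosen for this example:

```ruby
# Expands a 16-bit C.MV encoding into the 32-bit encoding of
# "add xd, x0, xs2". C.MV keeps xd in bits 11-7 and xs2 in bits 6-2.
def expand_c_mv(encoding)
  xd  = (encoding >> 7) & 0x1f
  xs2 = (encoding >> 2) & 0x1f
  # R-type ADD: funct7 = 0, rs2, rs1 = x0, funct3 = 0, rd, opcode = 0110011
  (xs2 << 20) | (0 << 15) | (xd << 7) | 0b0110011
end

expand_c_mv(0x852e) # c.mv a0, a1 becomes 0x00b00533 (add a0, zero, a1)
```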

Comment thread spec/std/isa/inst/C/c.sdsp.yaml
if effective_xlen.nil?
  if defined_in_base?(32)
    encoding(32).decode_variables.each do |d|
      next if d.size.zero?

Collaborator

I'm not a Ruby expert... is this ignoring variables which are "implicit"?

Contributor Author

Yes, otherwise the Ruby script fails to run. If implicit is added to operands instead of fields, I think this change should be discarded.

# implicitly use an operand with the value
attr_reader :implicit

def implicit? = @implicit != 0

Collaborator

You have a use of implicit: 0 in c.beqz above. What does this do?

Contributor Author

I wanted to use this field to skip references to variables with the implicit attribute. However, I eventually used checks like if @encoding_fields.empty? and if !field_data["location"].nil? instead. These lines should be removed.

Comment on lines +789 to +790
return [] if @encoding_fields.empty?

Collaborator

Is this needed because of the changes in this PR?

Contributor Author

Yes. @encoding_fields may be empty for implicit variables. If we reach consensus that implicit should be added to operands instead of variables, all the changes in this Ruby file can be discarded.

@henry-hsieh
Contributor Author

If everything is addressed, I would like to close this PR and wait for the new operand system. What do you think?

@ThinkOpenly
Collaborator

> If everything is addressed, I would like to close this PR and wait for the new operand system. What do you think?

How did you want to handle the operand alias names that you added here?

dhower-qc pushed a commit to dhower-qc/riscv-unified-db that referenced this pull request May 7, 2026