diff --git a/docs/codegen.md b/docs/codegen.md
new file mode 100644
index 000000000..b97f3ee4e
--- /dev/null
+++ b/docs/codegen.md
@@ -0,0 +1,187 @@
+# Code Generation
+
+## Overview
+rv32emu employs a tiered execution strategy with an interpreter and a two-tier JIT compiler.
+The interpreter provides baseline execution while the JIT compiler generates native machine code for hot paths,
+significantly improving performance for compute-intensive workloads.
+
+The code generation infrastructure supports both x86-64 and Arm64 host architectures through
+an abstraction layer that maps common operations to architecture-specific instructions.
+
+## Execution Tiers
+The emulator uses three execution tiers:
+1. Interpreter: Direct execution of RISC-V instructions via tail-call threaded dispatch
+2. Tier-1 JIT: Template-based native code generation for basic blocks
+3. Tier-2 JIT: LLVM-based compilation for frequently executed hot paths
+
+When a basic block reaches a configurable execution threshold, it is promoted from the interpreter to the Tier-1 JIT.
+Blocks that continue to execute frequently may be further compiled by the Tier-2 LLVM backend for additional optimization.
+
+## Source File Organization
+
+| File | Purpose |
+|------|---------|
+| `src/rv32_template.c` | Interpreter instruction implementations using the `RVOP` macro |
+| `src/rv32_jit.c` | Tier-1 JIT code generators using the `GEN` macro (included by jit.c) |
+| `src/rv32_constopt.c` | IR-level constant folding and optimization |
+| `src/jit.c` | Tier-1 JIT infrastructure, emit_* API, and fused instruction handlers |
+| `src/t2c.c` | Tier-2 JIT driver (includes t2c_template.c) |
+| `src/t2c_template.c` | Tier-2 JIT instruction handlers using the `T2C_OP` macro |
+| `src/emulate.c` | Main execution loop, tail-call dispatch, and macro-op fusion |
+
+## Interpreter Implementation
+The interpreter uses the `RVOP` macro to define instruction handlers.
+Each handler receives the emulator state and decoded instruction, then performs the operation directly in C:
+```c
+RVOP(name, { body })
+```
+
+This expands to a function that:
+- Receives: rv (emulator state), ir (decoded instruction), cycle (counter), PC
+- Returns: bool indicating whether to continue execution
+
+Example implementation for the `addi` instruction:
+```c
+RVOP(addi, { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; })
+```
+
+The interpreter uses Tail Call Threaded Code (TCTC) for efficient instruction dispatch.
+Each handler ends with a tail call (using the `musttail` attribute) to the next instruction's handler,
+avoiding function call overhead and enabling better branch prediction compared to switch-based dispatch.
+
+## Tier-1 JIT Code Generation
+The Tier-1 JIT compiler uses the `GEN` macro to define native code generators:
+```c
+GEN(name, { body })
+```
+
+Each generator emits native machine code that performs the equivalent operation using host CPU instructions.
+The `emit_*` functions append bytes to the JIT buffer.
+
+Example: The `addi` instruction generates a host instruction sequence like:
+```
+mov VR0, [memory address of (rv->X + rs1)]
+mov VR1, VR0
+add VR1, imm
+```
+
+Note: This is conceptual assembly.
+Actual instruction generation depends on the dynamic register allocator state and whether source registers are already mapped.
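+
+For reference, the concrete Tier-1 generator for `addi` (produced by the
+`GEN_ALU_IMM` helper macro in `src/rv32_jit.c`) expands to roughly:
+```c
+GEN(addi, {
+    /* Map rs1 to a host register, loading it from rv->X if needed */
+    vm_reg[0] = ra_load(state, ir->rs1);
+    /* Map rd to a host register while keeping rs1's register reserved */
+    vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]);
+    if (vm_reg[0] != vm_reg[1]) {
+        emit_mov(state, vm_reg[0], vm_reg[1]);
+    }
+    emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_ADD, vm_reg[1], ir->imm);
+})
+```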
+
+## Register Allocation (Tier-1)
+The Tier-1 JIT maintains a register mapping between RISC-V and host registers:
+
+| Register | Purpose |
+|----------|---------|
+| `vm_reg[0..2]` | Host registers mapped to VM registers for the current operation |
+| `temp_reg` | Scratch register for intermediate calculations |
+| `parameter_reg[0]` | Points to the `riscv_t` structure |
+
+Register allocation is performed dynamically.
+When a RISC-V register value is needed,
+the allocator either returns an already-mapped host register or loads the value from memory into an available host register.
+
+## Tier-2 JIT Compilation
+The Tier-2 JIT uses LLVM to compile frequently executed blocks into highly optimized native code.
+Instruction handlers are defined in `src/t2c_template.c` using the `T2C_OP` macro:
+```c
+T2C_OP(name, { body })
+```
+
+Each handler translates the RISC-V instruction semantics into LLVM IR using the LLVM C API.
+LLVM then applies its optimization passes and register allocation,
+producing native code that typically outperforms Tier-1 for hot paths.
+
+Tier-2 compilation requires LLVM 18 and is enabled with `ENABLE_JIT=1` at build time.
+
+## IR Optimization
+Before execution or JIT compilation,
+the emulator performs optimization passes on the internal instruction representation (IR).
+Defined in `src/rv32_constopt.c`,
+these passes analyze basic blocks to identify opportunities for constant propagation and folding.
+For example, a `lui` followed by `addi` to construct a 32-bit constant can be optimized to reduce runtime computation.
+This optimization benefits both the interpreter and JIT tiers.
+
+## Code Generation Macros (Tier-1)
+Helper macros reduce code duplication for common Tier-1 instruction patterns:
+
+| Macro | Instructions |
+|-------|--------------|
+| `GEN_BRANCH` | beq, bne, blt, bge, bltu, bgeu |
+| `GEN_CBRANCH` | cbeqz, cbnez (compressed) |
+| `GEN_ALU_IMM` | addi, xori, ori, andi |
+| `GEN_ALU_REG` | add, sub, xor, or, and |
+| `GEN_SHIFT_IMM` | slli, srli, srai |
+| `GEN_SHIFT_REG` | sll, srl, sra |
+| `GEN_SLT_IMM` | slti, sltiu |
+| `GEN_SLT_REG` | slt, sltu |
+| `GEN_LOAD` | lb, lh, lw, lbu, lhu |
+| `GEN_STORE` | sb, sh, sw |
+
+Each macro encapsulates the common code generation pattern for its instruction class,
+taking only the instruction-specific parameters (opcode, condition code, etc.).
+
+## Dual-Architecture Support
+The JIT supports both x86-64 and Arm64 through an abstraction layer in `jit.c`.
+Constants use x86-64 encoding values but serve as symbolic identifiers on both architectures.
+The `emit_*` functions contain architecture-specific implementations guarded by preprocessor conditionals:
+```c
+#if defined(__x86_64__)
+    // x86-64 specific code generation
+#elif defined(__aarch64__)
+    // Arm64 specific code generation
+#endif
+```
+
+For example, jump condition codes (`JCC_JE`, `JCC_JNE`, etc.) match x86-64 Jcc opcodes
+but are mapped to equivalent Arm64 condition codes by `emit_jcc_offset()`.
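+
+A condensed sketch of that mapping inside `emit_jcc_offset()` (abridged from
+`src/jit.c`; the default/error case is omitted):
+```c
+#if defined(__x86_64__)
+    /* JCC_* values are literal Jcc opcodes; only the unconditional
+     * jump (JCC_JMP, 0xe9) lacks the 0x0f prefix. */
+    if (code != JCC_JMP)
+        emit1(state, 0x0f);
+    emit1(state, code);
+    emit4(state, 0); /* 32-bit displacement, patched later */
+#elif defined(__aarch64__)
+    switch (code) { /* translate symbolic x86 opcodes to Arm64 conditions */
+    case JCC_JE:  code = COND_EQ; break;
+    case JCC_JNE: code = COND_NE; break;
+    case JCC_JL:  code = COND_LT; break;
+    case JCC_JGE: code = COND_GE; break;
+    case JCC_JB:  code = COND_LO; break;
+    case JCC_JAE: code = COND_HS; break;
+    case JCC_JMP: code = COND_AL; break;
+    }
+#endif
+```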
+
+## Emit API (Tier-1)
+The `emit_*` functions provide the low-level interface for Tier-1 machine code generation:
+
+| Function | Description |
+|----------|-------------|
+| `emit_alu32_imm32` | ALU operation with 32-bit immediate |
+| `emit_alu32_imm8` | ALU operation with 8-bit immediate |
+| `emit_alu32` | ALU operation between registers |
+| `emit_load_imm` | Load immediate value into register |
+| `emit_load` / `emit_store` | Memory access operations |
+| `emit_mov` | Register-to-register move |
+| `emit_cmp32` / `emit_cmp_imm32` | Comparison operations |
+| `emit_jcc_offset` | Conditional jump |
+| `emit_jmp` | Unconditional jump |
+| `emit_call` | Function call to runtime helper |
+
+## Block Chaining
+Translated blocks can be chained together to avoid returning to the dispatcher between consecutive blocks.
+When a block ends with a direct branch to another translated block,
+the JIT patches the branch target to jump directly to the target block's native code.
+
+Block chaining is controlled by the `ENABLE_BLOCK_CHAINING` configuration option.
+
+## Macro-op Fusion
+The emulator fuses common instruction sequences into single operations,
+benefiting both the interpreter and JIT tiers. Fusion is implemented in `src/emulate.c`
+via `match_pattern` and `do_fuse*` handlers, which transform the IR before execution.
+
+For example, a `lui` followed by `addi` to construct a 32-bit constant can be fused into a single load-immediate operation.
+The fused instructions are then handled by dedicated handlers in each tier:
+- Interpreter: `do_fuse*` functions in `src/emulate.c`
+- Tier-1 JIT: `do_fuse*` functions in `src/jit.c`
+- Tier-2 JIT: `T2C_OP(fuse*, ...)` handlers in `src/t2c_template.c`
+
+Macro-op fusion is controlled by the `ENABLE_MOP_FUSION` configuration option.
+
+## Memory Access and MMIO
+Load and store instructions check for memory-mapped I/O regions when system emulation is enabled.
+The `GEN_LOAD` and `GEN_STORE` macros include conditional code generation for MMIO handling:
+```c
+IIF(RV32_HAS(SYSTEM_MMIO))(
+    // MMIO check and handler call
+,
+    // Direct memory access
+)
+```
+
+When MMIO is not enabled, the generated code performs direct memory access
+without the overhead of region checking.
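+
+As a concrete sketch, the MMIO dispatch emitted by `GEN_LOAD` (abridged from
+`src/rv32_jit.c`, shown here with the `lw` parameters substituted) tests
+`jit_mmu.is_mmio` at runtime to pick the data source:
+```c
+/* After emit_jit_mmu_handler() has translated the address: */
+emit_load(state, S32, parameter_reg[0], temp_reg,
+          offsetof(riscv_t, jit_mmu.is_mmio));
+emit_cmp_imm32(state, temp_reg, 0);
+vm_reg[1] = map_vm_reg(state, ir->rd);
+uint32_t jump_loc_0 = state->offset;
+emit_jcc_offset(state, JCC_JE);  /* not MMIO: jump to the direct path */
+
+/* MMIO path: the handler already stored the result in X[rd]; reload it */
+emit_load(state, S32, parameter_reg[0], vm_reg[1],
+          offsetof(riscv_t, X) + 4 * ir->rd);
+uint64_t jump_loc_1 = state->offset;
+emit_jcc_offset(state, JCC_JMP); /* skip the direct load */
+
+/* Direct path: load from host memory at mem_base + translated paddr */
+emit_jump_target_offset(state, JUMP_LOC_0, state->offset);
+emit_load(state, S32, parameter_reg[0], vm_reg[0],
+          offsetof(riscv_t, jit_mmu.paddr));
+emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base);
+emit_alu64(state, ALU_OP_ADD, vm_reg[0], temp_reg);
+emit_load(state, S32, temp_reg, vm_reg[1], 0);
+emit_jump_target_offset(state, JUMP_LOC_1, state->offset);
+```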
diff --git a/src/emulate.c b/src/emulate.c index b9650383a..a283267c3 100644 --- a/src/emulate.c +++ b/src/emulate.c @@ -618,7 +618,7 @@ static uint32_t peripheral_update_ctr = 64; } while (0) #endif -#define RVOP(inst, code, asm) \ +#define RVOP(inst, code) \ static PRESERVE_NONE bool do_##inst(riscv_t *rv, const rv_insn_t *ir, \ uint64_t cycle, uint32_t PC) \ { \ diff --git a/src/jit.c b/src/jit.c index 98fc6124d..ad2b38b3b 100644 --- a/src/jit.c +++ b/src/jit.c @@ -875,31 +875,31 @@ static inline void emit_jcc_offset(struct jit_state *state, int code) { #if defined(__x86_64__) /* unconditional jump instruction does not have 0x0f prefix */ - if (code != 0xe9) + if (code != JCC_JMP) emit1(state, 0x0f); emit1(state, code); emit4(state, 0); #elif defined(__aarch64__) switch (code) { - case 0x84: /* BEQ */ + case JCC_JE: /* BEQ */ code = COND_EQ; break; - case 0x85: /* BNE */ + case JCC_JNE: /* BNE */ code = COND_NE; break; - case 0x8c: /* BLT */ + case JCC_JL: /* BLT */ code = COND_LT; break; - case 0x8d: /* BGE */ + case JCC_JGE: /* BGE */ code = COND_GE; break; - case 0x82: /* BLTU */ + case JCC_JB: /* BLTU */ code = COND_LO; break; - case 0x83: /* BGEU */ + case JCC_JAE: /* BGEU */ code = COND_HS; break; - case 0xe9: /* AL */ + case JCC_JMP: /* AL */ code = COND_AL; break; default: @@ -1194,7 +1194,7 @@ static inline void emit_jmp(struct jit_state *state, uint32_t target_satp UNUSED) { #if defined(__x86_64__) - emit1(state, 0xe9); + emit1(state, JCC_JMP); emit_jump_target_address(state, target_pc, target_satp); #elif defined(__aarch64__) assert(state->n_jumps < MAX_JUMPS); @@ -1240,7 +1240,7 @@ static inline void emit_call(struct jit_state *state, intptr_t target) static inline void emit_exit(struct jit_state *state) { #if defined(__x86_64__) - emit1(state, 0xe9); + emit1(state, JCC_JMP); emit_jump_target_offset(state, state->offset, state->exit_loc); emit4(state, 0); #elif defined(__aarch64__) @@ -1293,7 +1293,7 @@ static void divmod(struct jit_state *state, if (sign) { /* handle overflow */ uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); + emit_jcc_offset(state, JCC_JNE); emit_cmp_imm32(state, rm, -1); if (mod) emit_load_imm(state, R10, 0); @@ -1402,7 +1402,7 @@ static void muldivmod(struct jit_state *state, /* handle DIV overflow */ emit1(state, 0x9d); /* popfq */ uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); + emit_jcc_offset(state, JCC_JNE); emit_cmp_imm32(state, RCX, 0x80000000); emit_conditional_move(state, RCX, RAX); emit_jump_target_offset(state, JUMP_LOC_0, state->offset); @@ -1417,7 +1417,7 @@ static void muldivmod(struct jit_state *state, /* handle REM overflow */ emit1(state, 0x9d); /* popfq */ uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); + emit_jcc_offset(state, JCC_JNE); emit_cmp_imm32(state, RCX, 0x80000000); emit_load_imm(state, RCX, 0); emit_conditional_move(state, RCX, RDX); @@ -1832,7 +1832,6 @@ static int liveness_cmp(const void *l, const void *r) return 0; } -/* TODO: this function could be generated by "tools/gen-jit-template.py" */ static inline void liveness_calc(block_t *block) { uint32_t idx; @@ -2316,7 +2315,7 @@ void parse_branch_history_table(struct jit_state *state, emit_load_imm(state, register_map[0].reg_idx, bt->PC[max_idx]); emit_cmp32(state, temp_reg, register_map[0].reg_idx); uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); + emit_jcc_offset(state, JCC_JNE); #if RV32_HAS(SYSTEM) emit_jmp(state, bt->PC[max_idx], bt->satp[max_idx]); #else diff --git a/src/jit.h 
b/src/jit.h index 4769eeb48..13f87e7bb 100644 --- a/src/jit.h +++ b/src/jit.h @@ -10,6 +10,19 @@ #include "riscv_private.h" #include "utils.h" +/* Jump condition codes for branch instruction code generation. + * Values match x86-64 Jcc opcodes; on Arm64, emit_jcc_offset() maps these + * to equivalent condition codes. Used by both architectures as symbolic + * constants for conditional/unconditional jumps. + */ +#define JCC_JE 0x84 /* Jump if Equal (conditional) */ +#define JCC_JNE 0x85 /* Jump if Not Equal (conditional) */ +#define JCC_JL 0x8c /* Jump if Less - signed (conditional) */ +#define JCC_JGE 0x8d /* Jump if Greater or Equal - signed (conditional) */ +#define JCC_JB 0x82 /* Jump if Below - unsigned (conditional) */ +#define JCC_JAE 0x83 /* Jump if Above or Equal - unsigned (conditional) */ +#define JCC_JMP 0xe9 /* Jump unconditional */ + struct jump { uint32_t offset_loc; uint32_t target_pc; diff --git a/src/rv32_jit.c b/src/rv32_jit.c index b7127ddf8..a56ee1059 100644 --- a/src/rv32_jit.c +++ b/src/rv32_jit.c @@ -1,3 +1,324 @@ +/* + * JIT Code Generator for RISC-V Instructions (Tier-1) + * + * This file contains native code generation handlers for RISC-V instructions, + * supporting x86-64 and Arm64 host architectures. + * + * Architecture Overview: + * - GEN(name, { body }): Defines a generator that emits host machine code + * equivalent to the RISC-V instruction 'name'. + * - Register Allocation: Maps RISC-V registers (X[rd]) to host registers + * (vm_reg[0..2]) using a farthest-liveness eviction policy. + * - Manual Maintenance: Handlers are manually optimized for host performance + * and are independent of the interpreter implementations in rv32_template.c. + * + * Key Registers: + * - vm_reg[0..2]: Host registers allocated for VM register operations + * - temp_reg: Scratch register for intermediate calculations + * - parameter_reg[0]: Points to riscv_t structure + * + * Code Generation (emit_*) API: + * --------------------------------------------------------------------------- + * Function | Description + * --------------------------------------------------------------------------- + * emit_alu32/64 | Emits arithmetic/logic (ADD, SUB, XOR, OR, AND). + * emit_alu32_imm32/8 | Emits ALU operations with immediate operands. + * emit_load/store | Emits memory access with MMIO/System support. + * emit_load_sext | Emits sign-extending memory loads (LB, LH). + * emit_cmp32/imm32 | Emits comparison logic for branches/SLT. + * emit_jcc_offset | Emits conditional jumps (using JCC_* identifiers). + * emit_jmp | Emits unconditional jumps to a target PC. + * emit_exit | Emits the epilogue to return from JIT execution. + * --------------------------------------------------------------------------- + * + * Code Generation Macros: + * Helper macros reduce duplication for common instruction patterns: + * - GEN_BRANCH: Conditional branch instructions (beq, bne, blt, etc.) 
+ * - GEN_CBRANCH: Compressed branch instructions (cbeqz, cbnez) + * - GEN_ALU_IMM: ALU with immediate operand (addi, xori, ori, andi) + * - GEN_ALU_REG: ALU with register operands (add, sub, xor, or, and) + * - GEN_SHIFT_IMM: Shift by immediate (slli, srli, srai) + * - GEN_SHIFT_REG: Shift by register (sll, srl, sra) + * - GEN_SLT_IMM: Set-less-than immediate (slti, sltiu) + * - GEN_SLT_REG: Set-less-than register (slt, sltu) + * - GEN_LOAD: Memory load with MMIO support (lb, lh, lw, lbu, lhu) + * - GEN_STORE: Memory store with MMIO support (sb, sh, sw) + * + * Host Abstraction Layer: + * The emit_* API abstracts architecture differences by using x86-64 bit + * patterns (e.g., JCC_JE=0x84, ALU_OP_ADD=0x01) as symbolic identifiers across + * all hosts. The backend in src/jit.c maps these to native Arm64 or x86 + * instructions. + * + * Memory Access Patterns: + * Handlers use IIF(RV32_HAS(SYSTEM_MMIO)) to switch between direct RAM access + * (User mode) and the JIT MMU handler path (System mode). + * + * See rv32_template.c for the corresponding interpreter implementations. + */ + +/* Branch epilogue helper - emits fall-through and taken paths. + * Used by both regular (4-byte) and compressed (2-byte) branch instructions. + */ +#define EMIT_BRANCH_EPILOGUE(inst_size) \ + do { \ + if (ir->branch_untaken) { \ + emit_jmp(state, ir->pc + (inst_size), rv->csr_satp); \ + } \ + emit_load_imm(state, temp_reg, ir->pc + (inst_size)); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, PC)); \ + emit_exit(state); \ + emit_jump_target_offset(state, JUMP_LOC_0, state->offset); \ + if (ir->branch_taken) { \ + emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); \ + } \ + emit_load_imm(state, temp_reg, ir->pc + ir->imm); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, PC)); \ + emit_exit(state); \ + } while (0) + +/* Branch instruction handler macro - all branch instructions follow + * the same pattern, differing only in the condition code. + */ +#define GEN_BRANCH(inst, cond) \ + GEN(inst, { \ + ra_load2(state, ir->rs1, ir->rs2); \ + emit_cmp32(state, vm_reg[1], vm_reg[0]); \ + store_back(state); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, cond); \ + EMIT_BRANCH_EPILOGUE(4); /* 4-byte instruction */ \ + }) + +/* Compressed branch instruction handler macro - compares rs1 with zero + * and uses pc+2 instead of pc+4 for compressed instruction size. + */ +#define GEN_CBRANCH(inst, cond) \ + GEN(inst, { \ + vm_reg[0] = ra_load(state, ir->rs1); \ + emit_cmp_imm32(state, vm_reg[0], 0); \ + store_back(state); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, cond); \ + EMIT_BRANCH_EPILOGUE(2); /* 2-byte instruction */ \ + }) + +/* Group 1 ALU opcode for immediate operand (x86-64 encoding). + * On Arm64, this opcode is ignored; ALU_* selectors determine the operation. + */ +#define ALU_GRP1_OPCODE 0x81 + +/* ALU operation selectors for group 1 operations. + * On x86-64: ModR/M reg field values. On Arm64: switch case selectors + * in emit_alu32_imm32() that map to native instructions. 
+ */ +#define ALU_ADD 0 +#define ALU_OR 1 +#define ALU_AND 4 +#define ALU_XOR 6 + +/* ALU immediate instruction handler macro */ +#define GEN_ALU_IMM(inst, op) \ + GEN(inst, { \ + vm_reg[0] = ra_load(state, ir->rs1); \ + vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); \ + if (vm_reg[0] != vm_reg[1]) { \ + emit_mov(state, vm_reg[0], vm_reg[1]); \ + } \ + emit_alu32_imm32(state, ALU_GRP1_OPCODE, op, vm_reg[1], ir->imm); \ + }) + +/* Shift operation identifiers. + * Values match x86-64 ModR/M reg field; used on both architectures. + */ +#define SHIFT_SHL 4 +#define SHIFT_SHR 5 +#define SHIFT_SAR 7 + +/* Shift opcodes (x86-64 encoding). + * On Arm64, emit_alu32_imm8() and emit_alu32() map SHIFT_* values to native + * instructions. + */ +#define SHIFT_IMM_OPCODE 0xc1 /* Shift by immediate */ +#define SHIFT_REG_OPCODE 0xd3 /* Shift by register */ + +/* RV32 shift amount mask - only lower 5 bits used */ +#define RV32_SHIFT_MASK 0x1f + +/* Shift immediate instruction handler macro */ +#define GEN_SHIFT_IMM(inst, op) \ + GEN(inst, { \ + vm_reg[0] = ra_load(state, ir->rs1); \ + vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); \ + if (vm_reg[0] != vm_reg[1]) { \ + emit_mov(state, vm_reg[0], vm_reg[1]); \ + } \ + emit_alu32_imm8(state, SHIFT_IMM_OPCODE, op, vm_reg[1], \ + ir->imm & RV32_SHIFT_MASK); \ + }) + +/* ALU opcodes for register-to-register operations (x86-64 encoding). + * On Arm64, emit_alu32() maps these to equivalent instructions. + */ +#define ALU_OP_ADD 0x01 +#define ALU_OP_SUB 0x29 +#define ALU_OP_XOR 0x31 +#define ALU_OP_OR 0x09 +#define ALU_OP_AND 0x21 + +/* ALU register instruction handler macro */ +#define GEN_ALU_REG(inst, op) \ + GEN(inst, { \ + ra_load2(state, ir->rs1, ir->rs2); \ + vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); \ + emit_mov(state, vm_reg[1], temp_reg); \ + emit_mov(state, vm_reg[0], vm_reg[2]); \ + emit_alu32(state, op, temp_reg, vm_reg[2]); \ + }) + +/* Shift register instruction handler macro */ +#define GEN_SHIFT_REG(inst, op) \ + GEN(inst, { \ + ra_load2(state, ir->rs1, ir->rs2); \ + vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); \ + emit_mov(state, vm_reg[1], temp_reg); \ + emit_mov(state, vm_reg[0], vm_reg[2]); \ + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_AND, temp_reg, \ + RV32_SHIFT_MASK); \ + emit_alu32(state, SHIFT_REG_OPCODE, op, vm_reg[2]); \ + }) + +/* Set-less-than immediate instruction handler macro (slti/sltiu) */ +#define GEN_SLT_IMM(inst, cond) \ + GEN(inst, { \ + vm_reg[0] = ra_load(state, ir->rs1); \ + emit_cmp_imm32(state, vm_reg[0], ir->imm); \ + vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); \ + emit_load_imm(state, vm_reg[1], 1); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, cond); \ + emit_load_imm(state, vm_reg[1], 0); \ + emit_jump_target_offset(state, JUMP_LOC_0, state->offset); \ + }) + +/* Set-less-than register instruction handler macro (slt/sltu) */ +#define GEN_SLT_REG(inst, cond) \ + GEN(inst, { \ + ra_load2(state, ir->rs1, ir->rs2); \ + vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); \ + emit_cmp32(state, vm_reg[1], vm_reg[0]); \ + emit_load_imm(state, vm_reg[2], 1); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, cond); \ + emit_load_imm(state, vm_reg[2], 0); \ + emit_jump_target_offset(state, JUMP_LOC_0, state->offset); \ + }) + +/* Load instruction handler macro - handles MMIO path when SYSTEM_MMIO enabled. 
+ * Parameters: + * inst: instruction name (lb, lh, lw, lbu, lhu) + * insn_type: rv_insn_* constant for MMIO handler + * size: memory access size (S8, S16, S32) + * load_fn: emit_load or emit_load_sext + */ +#define GEN_LOAD(inst, insn_type, size, load_fn) \ + GEN(inst, { \ + memory_t *m = PRIV(rv)->mem; \ + vm_reg[0] = ra_load(state, ir->rs1); \ + IIF(RV32_HAS(SYSTEM_MMIO))( \ + { \ + emit_load_imm_sext(state, temp_reg, ir->imm); \ + emit_alu32(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, jit_mmu.vaddr)); \ + emit_load_imm(state, temp_reg, insn_type); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, jit_mmu.type)); \ + \ + store_back(state); \ + emit_jit_mmu_handler(state, ir->rd); \ + reset_reg(); \ + \ + /* If MMIO, load from X[rd]; otherwise load from memory */ \ + emit_load(state, S32, parameter_reg[0], temp_reg, \ + offsetof(riscv_t, jit_mmu.is_mmio)); \ + emit_cmp_imm32(state, temp_reg, 0); \ + vm_reg[1] = map_vm_reg(state, ir->rd); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, JCC_JE); \ + \ + emit_load(state, S32, parameter_reg[0], vm_reg[1], \ + offsetof(riscv_t, X) + 4 * ir->rd); \ + uint64_t jump_loc_1 = state->offset; \ + emit_jcc_offset(state, JCC_JMP); \ + \ + emit_jump_target_offset(state, JUMP_LOC_0, state->offset); \ + emit_load(state, S32, parameter_reg[0], vm_reg[0], \ + offsetof(riscv_t, jit_mmu.paddr)); \ + emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); \ + emit_alu64(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + load_fn(state, size, temp_reg, vm_reg[1], 0); \ + emit_jump_target_offset(state, JUMP_LOC_1, state->offset); \ + }, \ + { \ + emit_load_imm_sext(state, temp_reg, \ + (intptr_t) (m->mem_base + ir->imm)); \ + emit_alu64(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + vm_reg[1] = map_vm_reg(state, ir->rd); \ + load_fn(state, size, temp_reg, vm_reg[1], 0); \ + }) \ + }) + +/* Store instruction handler macro - handles MMIO path when SYSTEM_MMIO enabled. 
+ * Parameters: + * inst: instruction name (sb, sh, sw) + * insn_type: rv_insn_* constant for MMIO handler + * size: memory access size (S8, S16, S32) + */ +#define GEN_STORE(inst, insn_type, size) \ + GEN(inst, { \ + memory_t *m = PRIV(rv)->mem; \ + vm_reg[0] = ra_load(state, ir->rs1); \ + IIF(RV32_HAS(SYSTEM_MMIO))( \ + { \ + emit_load_imm_sext(state, temp_reg, ir->imm); \ + emit_alu32(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, jit_mmu.vaddr)); \ + emit_load_imm(state, temp_reg, insn_type); \ + emit_store(state, S32, temp_reg, parameter_reg[0], \ + offsetof(riscv_t, jit_mmu.type)); \ + store_back(state); \ + emit_jit_mmu_handler(state, ir->rs2); \ + reset_reg(); \ + \ + /* If MMIO, skip store (handled by MMIO handler) */ \ + emit_load(state, S32, parameter_reg[0], temp_reg, \ + offsetof(riscv_t, jit_mmu.is_mmio)); \ + emit_cmp_imm32(state, temp_reg, 1); \ + uint32_t jump_loc_0 = state->offset; \ + emit_jcc_offset(state, JCC_JE); \ + \ + emit_load(state, S32, parameter_reg[0], vm_reg[0], \ + offsetof(riscv_t, jit_mmu.paddr)); \ + emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); \ + emit_alu64(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + vm_reg[1] = ra_load(state, ir->rs2); \ + emit_store(state, size, vm_reg[1], temp_reg, 0); \ + emit_jump_target_offset(state, JUMP_LOC_0, state->offset); \ + reset_reg(); \ + }, \ + { \ + emit_load_imm_sext(state, temp_reg, \ + (intptr_t) (m->mem_base + ir->imm)); \ + emit_alu64(state, ALU_OP_ADD, vm_reg[0], temp_reg); \ + vm_reg[1] = ra_load(state, ir->rs2); \ + emit_store(state, size, vm_reg[1], temp_reg, 0); \ + }) \ + }) + GEN(nop, {}) GEN(lui, { vm_reg[0] = map_vm_reg(state, ir->rd); @@ -21,8 +342,9 @@ GEN(jal, { GEN(jalr, { vm_reg[0] = ra_load(state, ir->rs1); emit_mov(state, vm_reg[0], temp_reg); - emit_alu32_imm32(state, 0x81, 0, temp_reg, ir->imm); - emit_alu32_imm32(state, 0x81, 4, temp_reg, ~1U); + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_ADD, temp_reg, ir->imm); + /* RISC-V spec: target address LSB is always cleared */ + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_AND, temp_reg, ~1U); if (ir->rd) { vm_reg[1] = map_vm_reg(state, ir->rd); emit_load_imm(state, vm_reg[1], ir->pc + 4); @@ -32,683 +354,46 @@ GEN(jalr, { emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); emit_exit(state); }) -GEN(beq, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(bne, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if 
(ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(blt, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x8c); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(bge, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x8d); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(bltu, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x82); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(bgeu, { - ra_load2(state, ir->rs1, ir->rs2); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x83); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 4, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 4); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(lb, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_lb); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - - store_back(state); - emit_jit_mmu_handler(state, ir->rd); - /* clear register mapping 
*/ - reset_reg(); - - /* - * If it's MMIO, assign the read value to host register, otherwise, - * load from memory. - */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 0); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - vm_reg[1] = map_vm_reg(state, ir->rd); - - emit_load(state, S32, parameter_reg[0], vm_reg[1], - offsetof(riscv_t, X) + 4 * ir->rd); - /* skip regular loading */ - uint64_t jump_loc_1 = state->offset; - emit_jcc_offset(state, 0xe9); - - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - emit_load_sext(state, S8, temp_reg, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_1, state->offset); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = map_vm_reg(state, ir->rd); - emit_load_sext(state, S8, temp_reg, vm_reg[1], 0); - }) -}) -GEN(lh, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_lh); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - - store_back(state); - emit_jit_mmu_handler(state, ir->rd); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, assign the read value to host register, otherwise, - * load from memory. - */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 0); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - vm_reg[1] = map_vm_reg(state, ir->rd); - - emit_load(state, S32, parameter_reg[0], vm_reg[1], - offsetof(riscv_t, X) + 4 * ir->rd); - /* skip regular loading */ - uint64_t jump_loc_1 = state->offset; - emit_jcc_offset(state, 0xe9); - - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - emit_load_sext(state, S16, temp_reg, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_1, state->offset); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = map_vm_reg(state, ir->rd); - emit_load_sext(state, S16, temp_reg, vm_reg[1], 0); - }) -}) -GEN(lw, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_lw); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - - store_back(state); - emit_jit_mmu_handler(state, ir->rd); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, assign the read value to host register, otherwise, - * load from memory. 
- */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 0); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - vm_reg[1] = map_vm_reg(state, ir->rd); - - emit_load(state, S32, parameter_reg[0], vm_reg[1], - offsetof(riscv_t, X) + 4 * ir->rd); - /* skip regular loading */ - uint64_t jump_loc_1 = state->offset; - emit_jcc_offset(state, 0xe9); - - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - emit_load(state, S32, temp_reg, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_1, state->offset); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = map_vm_reg(state, ir->rd); - emit_load(state, S32, temp_reg, vm_reg[1], 0); - }) -}) -GEN(lbu, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_lbu); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - - store_back(state); - emit_jit_mmu_handler(state, ir->rd); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, assign the read value to host register, otherwise, - * load from memory. - */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 0); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - vm_reg[1] = map_vm_reg(state, ir->rd); - - emit_load(state, S32, parameter_reg[0], vm_reg[1], - offsetof(riscv_t, X) + 4 * ir->rd); - /* skip regular loading */ - uint64_t jump_loc_1 = state->offset; - emit_jcc_offset(state, 0xe9); - - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - emit_load(state, S8, temp_reg, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_1, state->offset); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = map_vm_reg(state, ir->rd); - emit_load(state, S8, temp_reg, vm_reg[1], 0); - }) -}) -GEN(lhu, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_lhu); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - - store_back(state); - emit_jit_mmu_handler(state, ir->rd); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, assign the read value to host register, otherwise, - * load from memory. 
- */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 0); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - vm_reg[1] = map_vm_reg(state, ir->rd); - - emit_load(state, S32, parameter_reg[0], vm_reg[1], - offsetof(riscv_t, X) + 4 * ir->rd); - /* skip regular loading */ - uint64_t jump_loc_1 = state->offset; - emit_jcc_offset(state, 0xe9); - - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - emit_load(state, S16, temp_reg, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_1, state->offset); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = map_vm_reg(state, ir->rd); - emit_load(state, S16, temp_reg, vm_reg[1], 0); - }) -}) -GEN(sb, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_sb); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - store_back(state); - emit_jit_mmu_handler(state, ir->rs2); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, it does not need to do the storing since it has - * been done in the mmio handler, otherwise, store the value into - * memory. - */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S8, vm_reg[1], temp_reg, 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - /* - * Clear register mapping since we do not ensure operand "ir->rs2" - * is loaded or not. - */ - reset_reg(); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S8, vm_reg[1], temp_reg, 0); - }) -}) -GEN(sh, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_sh); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - store_back(state); - emit_jit_mmu_handler(state, ir->rs2); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, it does not need to do the storing since it has - * been done in the mmio handler, otherwise, store the value into - * memory. 
- */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S16, vm_reg[1], temp_reg, 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - /* - * Clear register mapping since we do not ensure operand "ir->rs2" - * is loaded or not. - */ - reset_reg(); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S16, vm_reg[1], temp_reg, 0); - }) -}) -GEN(sw, { - memory_t *m = PRIV(rv)->mem; - vm_reg[0] = ra_load(state, ir->rs1); - IIF(RV32_HAS(SYSTEM_MMIO))( - { - emit_load_imm_sext(state, temp_reg, ir->imm); - emit_alu32(state, 0x01, vm_reg[0], temp_reg); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.vaddr)); - emit_load_imm(state, temp_reg, rv_insn_sw); - emit_store(state, S32, temp_reg, parameter_reg[0], - offsetof(riscv_t, jit_mmu.type)); - store_back(state); - emit_jit_mmu_handler(state, ir->rs2); - /* clear register mapping */ - reset_reg(); - - /* - * If it's MMIO, it does not need to do the storing since it has - * been done in the mmio handler, otherwise, store the value into - * memory. - */ - emit_load(state, S32, parameter_reg[0], temp_reg, - offsetof(riscv_t, jit_mmu.is_mmio)); - emit_cmp_imm32(state, temp_reg, 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - - emit_load(state, S32, parameter_reg[0], vm_reg[0], - offsetof(riscv_t, jit_mmu.paddr)); - emit_load_imm_sext(state, temp_reg, (intptr_t) m->mem_base); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S32, vm_reg[1], temp_reg, 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - /* - * Clear register mapping since we do not ensure operand "ir->rs2" - * is loaded into host register "vm_reg[1]" or not. 
- */ - reset_reg(); - }, - { - emit_load_imm_sext(state, temp_reg, - (intptr_t) (m->mem_base + ir->imm)); - emit_alu64(state, 0x01, vm_reg[0], temp_reg); - vm_reg[1] = ra_load(state, ir->rs2); - emit_store(state, S32, vm_reg[1], temp_reg, 0); - }) -}) -GEN(addi, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm32(state, 0x81, 0, vm_reg[1], ir->imm); -}) -GEN(slti, { - vm_reg[0] = ra_load(state, ir->rs1); - emit_cmp_imm32(state, vm_reg[0], ir->imm); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - emit_load_imm(state, vm_reg[1], 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x8c); - emit_load_imm(state, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); -}) -GEN(sltiu, { - vm_reg[0] = ra_load(state, ir->rs1); - emit_cmp_imm32(state, vm_reg[0], ir->imm); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - emit_load_imm(state, vm_reg[1], 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x82); - emit_load_imm(state, vm_reg[1], 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); -}) -GEN(xori, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm32(state, 0x81, 6, vm_reg[1], ir->imm); -}) -GEN(ori, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm32(state, 0x81, 1, vm_reg[1], ir->imm); -}) -GEN(andi, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm32(state, 0x81, 4, vm_reg[1], ir->imm); -}) -GEN(slli, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm8(state, 0xc1, 4, vm_reg[1], ir->imm & 0x1f); -}) -GEN(srli, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm8(state, 0xc1, 5, vm_reg[1], ir->imm & 0x1f); -}) -GEN(srai, { - vm_reg[0] = ra_load(state, ir->rs1); - vm_reg[1] = map_vm_reg_reserved(state, ir->rd, vm_reg[0]); - if (vm_reg[0] != vm_reg[1]) { - emit_mov(state, vm_reg[0], vm_reg[1]); - } - emit_alu32_imm8(state, 0xc1, 7, vm_reg[1], ir->imm & 0x1f); -}) -GEN(add, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32(state, 0x01, temp_reg, vm_reg[2]); -}) -GEN(sub, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32(state, 0x29, temp_reg, vm_reg[2]); -}) -GEN(sll, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32_imm32(state, 0x81, 4, temp_reg, 0x1f); - emit_alu32(state, 0xd3, 4, 
vm_reg[2]); -}) -GEN(slt, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - emit_load_imm(state, vm_reg[2], 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x8c); - emit_load_imm(state, vm_reg[2], 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); -}) -GEN(sltu, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_cmp32(state, vm_reg[1], vm_reg[0]); - emit_load_imm(state, vm_reg[2], 1); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x82); - emit_load_imm(state, vm_reg[2], 0); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); -}) -GEN(xor, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32(state, 0x31, temp_reg, vm_reg[2]); -}) -GEN(srl, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32_imm32(state, 0x81, 4, temp_reg, 0x1f); - emit_alu32(state, 0xd3, 5, vm_reg[2]); -}) -GEN(sra, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32_imm32(state, 0x81, 4, temp_reg, 0x1f); - emit_alu32(state, 0xd3, 7, vm_reg[2]); -}) -GEN(or, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32(state, 0x09, temp_reg, vm_reg[2]); -}) -GEN(and, { - ra_load2(state, ir->rs1, ir->rs2); - vm_reg[2] = map_vm_reg_reserved2(state, ir->rd, vm_reg[0], vm_reg[1]); - emit_mov(state, vm_reg[1], temp_reg); - emit_mov(state, vm_reg[0], vm_reg[2]); - emit_alu32(state, 0x21, temp_reg, vm_reg[2]); -}) +/* RV32I Branch Instructions */ +GEN_BRANCH(beq, JCC_JE) +GEN_BRANCH(bne, JCC_JNE) +GEN_BRANCH(blt, JCC_JL) +GEN_BRANCH(bge, JCC_JGE) +GEN_BRANCH(bltu, JCC_JB) +GEN_BRANCH(bgeu, JCC_JAE) +/* RV32I Load Instructions */ +GEN_LOAD(lb, rv_insn_lb, S8, emit_load_sext) +GEN_LOAD(lh, rv_insn_lh, S16, emit_load_sext) +GEN_LOAD(lw, rv_insn_lw, S32, emit_load) +GEN_LOAD(lbu, rv_insn_lbu, S8, emit_load) +GEN_LOAD(lhu, rv_insn_lhu, S16, emit_load) +/* RV32I Store Instructions */ +GEN_STORE(sb, rv_insn_sb, S8) +GEN_STORE(sh, rv_insn_sh, S16) +GEN_STORE(sw, rv_insn_sw, S32) +/* RV32I ALU Immediate Instructions */ +GEN_ALU_IMM(addi, ALU_ADD) +GEN_SLT_IMM(slti, JCC_JL) +GEN_SLT_IMM(sltiu, JCC_JB) +GEN_ALU_IMM(xori, ALU_XOR) +GEN_ALU_IMM(ori, ALU_OR) +GEN_ALU_IMM(andi, ALU_AND) +/* RV32I Shift Immediate Instructions */ +GEN_SHIFT_IMM(slli, SHIFT_SHL) +GEN_SHIFT_IMM(srli, SHIFT_SHR) +GEN_SHIFT_IMM(srai, SHIFT_SAR) +/* RV32I ALU Register Instructions */ +GEN_ALU_REG(add, ALU_OP_ADD) +GEN_ALU_REG(sub, ALU_OP_SUB) +/* RV32I Shift Register Instructions */ +GEN_SHIFT_REG(sll, SHIFT_SHL) +GEN_SLT_REG(slt, JCC_JL) +GEN_SLT_REG(sltu, JCC_JB) +GEN_ALU_REG(xor, ALU_OP_XOR) +GEN_SHIFT_REG(srl, SHIFT_SHR) +GEN_SHIFT_REG(sra, SHIFT_SAR) +GEN_ALU_REG(or, ALU_OP_OR) +GEN_ALU_REG(and, ALU_OP_AND) GEN(fence, { assert(NULL); }) GEN(ecall, { store_back(state); @@ -757,7 +442,7 @@ GEN(mulh, { emit_mov(state, vm_reg[1], temp_reg); 
emit_mov(state, vm_reg[0], vm_reg[2]); muldivmod(state, 0x2f, temp_reg, vm_reg[2], 0); - emit_alu64_imm8(state, 0xc1, 5, vm_reg[2], 32); + emit_alu64_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SHR, vm_reg[2], 32); }) GEN(mulhsu, { ra_load2_sext(state, ir->rs1, ir->rs2, true, false); @@ -765,7 +450,7 @@ GEN(mulhsu, { emit_mov(state, vm_reg[1], temp_reg); emit_mov(state, vm_reg[0], vm_reg[2]); muldivmod(state, 0x2f, temp_reg, vm_reg[2], 0); - emit_alu64_imm8(state, 0xc1, 5, vm_reg[2], 32); + emit_alu64_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SHR, vm_reg[2], 32); }) GEN(mulhu, { ra_load2(state, ir->rs1, ir->rs2); @@ -773,7 +458,7 @@ GEN(mulhu, { emit_mov(state, vm_reg[1], temp_reg); emit_mov(state, vm_reg[0], vm_reg[2]); muldivmod(state, 0x2f, temp_reg, vm_reg[2], 0); - emit_alu64_imm8(state, 0xc1, 5, vm_reg[2], 32); + emit_alu64_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SHR, vm_reg[2], 32); }) GEN(div, { ra_load2_sext(state, ir->rs1, ir->rs2, true, true); @@ -852,7 +537,8 @@ GEN(caddi4spn, { if (vm_reg[0] != vm_reg[1]) { emit_mov(state, vm_reg[0], vm_reg[1]); } - emit_alu32_imm32(state, 0x81, 0, vm_reg[1], (uint16_t) ir->imm); + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_ADD, vm_reg[1], + (uint16_t) ir->imm); }) GEN(clw, { memory_t *m = PRIV(rv)->mem; @@ -873,7 +559,8 @@ GEN(csw, { GEN(cnop, {}) GEN(caddi, { vm_reg[0] = ra_load(state, ir->rd); - emit_alu32_imm32(state, 0x81, 0, vm_reg[0], (int16_t) ir->imm); + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_ADD, vm_reg[0], + (int16_t) ir->imm); }) GEN(cjal, { vm_reg[0] = map_vm_reg(state, rv_reg_ra); @@ -890,7 +577,7 @@ GEN(cli, { }) GEN(caddi16sp, { vm_reg[0] = ra_load(state, ir->rd); - emit_alu32_imm32(state, 0x81, 0, vm_reg[0], ir->imm); + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_ADD, vm_reg[0], ir->imm); }) GEN(clui, { vm_reg[0] = map_vm_reg(state, ir->rd); @@ -898,15 +585,15 @@ GEN(clui, { }) GEN(csrli, { vm_reg[0] = ra_load(state, ir->rs1); - emit_alu32_imm8(state, 0xc1, 5, vm_reg[0], ir->shamt); + emit_alu32_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SHR, vm_reg[0], ir->shamt); }) GEN(csrai, { vm_reg[0] = ra_load(state, ir->rs1); - emit_alu32_imm8(state, 0xc1, 7, vm_reg[0], ir->shamt); + emit_alu32_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SAR, vm_reg[0], ir->shamt); }) GEN(candi, { vm_reg[0] = ra_load(state, ir->rs1); - emit_alu32_imm32(state, 0x81, 4, vm_reg[0], ir->imm); + emit_alu32_imm32(state, ALU_GRP1_OPCODE, ALU_AND, vm_reg[0], ir->imm); }) GEN(csub, { ra_load2(state, ir->rs1, ir->rs2); @@ -943,49 +630,13 @@ GEN(cj, { emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); emit_exit(state); }) -GEN(cbeqz, { - vm_reg[0] = ra_load(state, ir->rs1); - emit_cmp_imm32(state, vm_reg[0], 0); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x84); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 2, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + 2); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) -GEN(cbnez, { - vm_reg[0] = ra_load(state, ir->rs1); - emit_cmp_imm32(state, vm_reg[0], 0); - store_back(state); - uint32_t jump_loc_0 = state->offset; - emit_jcc_offset(state, 0x85); - if (ir->branch_untaken) { - emit_jmp(state, ir->pc + 2, rv->csr_satp); - } 
- emit_load_imm(state, temp_reg, ir->pc + 2); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); - emit_jump_target_offset(state, JUMP_LOC_0, state->offset); - if (ir->branch_taken) { - emit_jmp(state, ir->pc + ir->imm, rv->csr_satp); - } - emit_load_imm(state, temp_reg, ir->pc + ir->imm); - emit_store(state, S32, temp_reg, parameter_reg[0], offsetof(riscv_t, PC)); - emit_exit(state); -}) +/* RV32C Compressed Branch Instructions */ +GEN_CBRANCH(cbeqz, JCC_JE) +GEN_CBRANCH(cbnez, JCC_JNE) GEN(cslli, { vm_reg[0] = ra_load(state, ir->rd); - emit_alu32_imm8(state, 0xc1, 4, vm_reg[0], (uint8_t) ir->imm); + emit_alu32_imm8(state, SHIFT_IMM_OPCODE, SHIFT_SHL, vm_reg[0], + (uint8_t) ir->imm); }) GEN(clwsp, { memory_t *m = PRIV(rv)->mem; diff --git a/src/rv32_template.c b/src/rv32_template.c index eaa016836..efc6307e9 100644 --- a/src/rv32_template.c +++ b/src/rv32_template.c @@ -1,226 +1,105 @@ -/* RV32I Base Instruction Set */ -/* conforming to the instructions specified in chapter 2 of the unprivileged - * specification version 20191213. +/* RV32I Base Instruction Set + * + * Conforming to the instructions specified in chapter 2 of the RISC-V + * unprivileged specification version 20191213. */ -/* This file establishes a low-level instruction execution abstraction layer, - * crucial for both the interpreter's instruction dispatching and the - * execution of native functions written to memory. The JIT compiler currently - * supports only x86-64 (x64) and Aarch64 (Arm64) architectures, which - * simplifies the process due to their abundant registers and register-based - * calling conventions. It effectively navigates the limitations associated - * with self-modifying code. - * - * To accommodate the specific needs of these platforms, a highly selective - * approach in practices and design is adopted. The file is designed as a - * foundational template for code generation in both the interpreter and JIT - * contexts. To facilitate this, a domain-specific language (DSL) is utilized, - * augmented by a C macro named 'RVOP'. Furthermore, a Python script is employed - * to convert code templates efficiently, enabling automatic generation of the - * JIT code generator and thus eliminating the need for repetitive manual - * coding. - * - * Example: - * - * RVOP( - * addi, - * { rv->X[ir->rd] = (int32_t) (rv->X[ir->rs1]) + ir->imm; }, - * GEN({ - * rald, VR0, rs1; - * map, VR1, rd; - * cond, regneq; - * mov, VR0, VR1; - * end; - * alu32imm, 32, 0x81, 0, VR1, imm; - * })) - * - * VR0, VR1, VR2 are host registers for storing calculated value - * during execution. TMP are host registers for storing temporary calculated - * value or memory address during execution. The block defined as 'GEN' is - * mapped to the generic C code used in the interpreter. The following - * instructions will be generated by JIT compiler: - * - Load X->rs1 (target field) from the rv data structure to VR0 - * (destination register), if X->rs1 has been loaded to the host register, the - * host register number would be assigned to VR0. - * - Map the host register to VM register X->rd. - * - Move the register value of VR0 (X->rs1) into VR1 (X->rd) if the - * VR0 (X->rs1) is not equal to VR1 (X->rd). 
- * - Add imm to VR1 (X->rd) +/* Interpreter instruction implementations * - * The sequence of host instructions generated during dynamic binary translation - * for the addi instruction: - * mov VR0, [memory address of (rv->X + rs1)] - * mov VR1, VR0 - * add VR1, imm + * This file contains the purely semantic implementations of RISC-V instructions + * for the interpreter. It uses the RVOP macro to define the behavior of each + * instruction by directly manipulating the emulator state. * - * The parameter of x64 or arm64 instruction API - * - size: size of data - * - op: opcode - * - src: source register - * - dst: destination register - * - pc: program counter + * Architecture: + * - RVOP(name, { body }): Defines an interpreter handler function. + * - Parameters: rv (emulator state), ir (decoded instruction), + * cycle (cycle counter), PC (program counter). + * - Return: 'bool' indicating whether to continue execution. * - * Here is the mnemonic listing for the DSL. + * Example: + * RVOP(addi, { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; }) * - * | Mnemonic | Meaning | - * |--------------------------------+----------------------------------------| - * | alu[32|64]imm, size, op, | Do ALU operation on src and imm and | - * | src, dst, imm; | store the result into dst. | - * | alu[32|64], op, src, dst; | Do ALU operation on src and dst and | - * | | store the result into dst. | - * | ldimm, dst, imm32; | Load immediate into dst. (zero-extend) | - * | ldimms, dst, imm; | Load immediate into dst. | - * | lds, size, src, dst, | Load data of a specified size from | - * | offset; | memory and sign-extend it into the dst,| - * | | using the memory address calculated as | - * | | the sum of the src and the specified | - * | | offset. | - * | rald, dst, field | Map VM register to host register, and | - * | | load the target field from rv data | - * | | if needed. | - * | rald2, field1, field2 | Map 2 VM register to 2 host register, | - * | | and load the target fields from rv data| - * | | respectively if needed. | - * | rald2s, field1, field2 | Map 2 VM register to 2 host register, | - * | | and load the target fields from rv data| - * | | and sign-extend it respectively. | - * | map, dst, field | Map VM register to host register. | - * | ld, size, dst, member, field; | load the target field from rv data | - * | | structure to dst. | - * | st, size, src, member, field; | store src value to the target field of | - * | | rv data structure. | - * | cmp, src, dst; | compare the value between src and dst. | - * | cmpimm, src, imm; | compare the value of src and imm. | - * | jmp, pc, imm; | jump to the program counter of pc + imm| - * | jcc, op; | jump with condition. | - * | setjmpoff; | set the location of jump with condition| - * | | instruction. | - * | jmpoff; | set the jump target of jump with | - * | | condition instruction. | - * | mem; | get memory base. | - * | call, handler; | call function handler stored in rv->io | - * | exit; | exit machine code execution. | - * | mul, op, src, dst, imm; | Do mul operation on src and dst and | - * | | store the result into dst. | - * | div, op, src, dst, imm; | Do div operation on src and dst and | - * | | store the result into dst. | - * | mod, op, src, dst, imm; | Do mod operation on src and dst and | - * | | store the result into dst. | - * | cond, src; | set condition if (src) | - * | end; | set the end of condition if (src) | - * | predict; | parse the branch table of indirect | - * | | jump and search the jump target with | - * | | maximal frequency. 
Then, comparing | - * | | and jumping to the target if the | - * | | program counter matches. | - * | break; | In the end of a basic block, we need | - * | | to store all VM register value to rv | - * | | data, because the register allocation | - * | | is only applied on a basic block. | + * Implementation notes: + * - Changes to instruction semantics should be applied here, while JIT-specific + * optimizations should be applied in src/rv32_jit.c. */ /* Internal */ -RVOP(nop, { rv->X[rv_reg_zero] = 0; }, GEN({/* no operation */})) +RVOP(nop, { rv->X[rv_reg_zero] = 0; }) /* LUI is used to build 32-bit constants and uses the U-type format. LUI * places the U-immediate value in the top 20 bits of the destination * register rd, filling in the lowest 12 bits with zeros. The 32-bit * result is sign-extended to 64 bits. */ -RVOP( - lui, - { rv->X[ir->rd] = ir->imm; }, - GEN({ - map, VR0, rd; - ldimm, VR0, imm; - })) +RVOP(lui, { rv->X[ir->rd] = ir->imm; }) /* AUIPC is used to build pc-relative addresses and uses the U-type format. * AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the * lowest 12 bits with zeros, adds this offset to the address of the AUIPC * instruction, then places the result in register rd. */ -RVOP( - auipc, - { rv->X[ir->rd] = ir->imm + PC; }, - GEN({ - map, VR0, rd; - ldimm, VR0, pc, imm; - })) +RVOP(auipc, { rv->X[ir->rd] = ir->imm + PC; }) /* JAL: Jump and Link * store successor instruction address into rd. * add next J imm (offset) to pc. */ -RVOP( - jal, - { - const uint32_t pc = PC; - /* Jump */ - PC += ir->imm; - /* link with return address */ - if (ir->rd) - rv->X[ir->rd] = pc + 4; +RVOP(jal, { + const uint32_t pc = PC; + /* Jump */ + PC += ir->imm; + /* link with return address */ + if (ir->rd) + rv->X[ir->rd] = pc + 4; /* check instruction misaligned */ #if !RV32_HAS(EXT_C) - RV_EXC_MISALIGN_HANDLER(pc, INSN, false, 0); + RV_EXC_MISALIGN_HANDLER(pc, INSN, false, 0); #endif - struct rv_insn *taken = ir->branch_taken; - if (taken) { + struct rv_insn *taken = ir->branch_taken; + if (taken) { #if RV32_HAS(JIT) - IIF(RV32_HAS(SYSTEM)(if (!rv->is_trapped && !reloc_enable_mmu), )) + IIF(RV32_HAS(SYSTEM)(if (!rv->is_trapped && !reloc_enable_mmu), )) + { + IIF(RV32_HAS(SYSTEM))(block_t *next =, ) + cache_get(rv->block_cache, PC, true); + IIF(RV32_HAS(SYSTEM))( + if (next->satp == rv->csr_satp && !next->invalidated), ) { - IIF(RV32_HAS(SYSTEM))(block_t *next =, ) - cache_get(rv->block_cache, PC, true); - IIF(RV32_HAS(SYSTEM))( - if (next->satp == rv->csr_satp && !next->invalidated), ) - { - if (!set_add(&pc_set, PC)) - has_loops = true; - if (cache_hot(rv->block_cache, PC)) - goto end_op; - } + if (!set_add(&pc_set, PC)) + has_loops = true; + if (cache_hot(rv->block_cache, PC)) + goto end_op; } + } #endif #if RV32_HAS(SYSTEM) - if (!rv->is_trapped) + if (!rv->is_trapped) #endif - { - /* - * The last_pc should only be updated when not in the trap path. - * Updating it during the trap path could lead to incorrect - * block chaining in rv_step(). Specifically, an interrupt might - * occur before locating the previous block with last_pc, and - * since __trap_handler() uses the same RVOP, the last_pc could - * be updated incorrectly during the trap path. - * - * This rule also applies to same statements elsewhere in this - * file. - */ - last_pc = PC; - - MUST_TAIL return taken->impl(rv, taken, cycle, PC); - } + { + /* The last_pc should only be updated when not in the trap path. 
+ * Updating it during the trap path could lead to incorrect + * block chaining in rv_step(). Specifically, an interrupt might + * occur before locating the previous block with last_pc, and + * since __trap_handler() uses the same RVOP, the last_pc could + * be updated incorrectly during the trap path. + * + * This rule also applies to same statements elsewhere in this + * file. + */ + last_pc = PC; + + MUST_TAIL return taken->impl(rv, taken, cycle, PC); } - goto end_op; - }, - GEN({ - cond, rd; - map, VR0, rd; - ldimm, VR0, pc, 4; - end; - break; - jmp, pc, imm; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) + } + goto end_op; +}) /* The branch history table records historical data pertaining to indirect jump * targets. This functionality alleviates the need to invoke block_find() and * incurs overhead only when the indirect jump targets are not previously - * recorded. Additionally, the C code generator can reference the branch history - * table to link he indirect jump targets. + * recorded. Additionally, this table lets the interpreter fast-path indirect + * jumps without repeatedly calling block_find(). */ #if !RV32_HAS(JIT) #define LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE() \ @@ -303,62 +182,43 @@ RVOP( * register rd. Register x0 can be used as the destination if the result is * not required. */ -RVOP( - jalr, - { - const uint32_t pc = PC; - /* jump */ - PC = (rv->X[ir->rs1] + ir->imm) & ~1U; - /* link */ - if (ir->rd) - rv->X[ir->rd] = pc + 4; +RVOP(jalr, { + const uint32_t pc = PC; + /* jump */ + PC = (rv->X[ir->rs1] + ir->imm) & ~1U; + /* link */ + if (ir->rd) + rv->X[ir->rd] = pc + 4; /* check instruction misaligned */ #if !RV32_HAS(EXT_C) - RV_EXC_MISALIGN_HANDLER(pc, INSN, false, 0); + RV_EXC_MISALIGN_HANDLER(pc, INSN, false, 0); #endif - LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE(); + LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE(); #if RV32_HAS(SYSTEM) - /* - * relocate_enable_mmu is the first function called to set up the MMU. - * Inside the function, at address 0x98, an invalid PTE is accessed, - * causing a fetch page fault and trapping into the trap_handler, and - * it will not return via sret. - * - * After the jalr instruction at physical address 0xc00000b4 - * (the final instruction of relocate_enable_mmu), the MMU becomes - * available. - * - * Based on this, we need to manually escape from the trap_handler after - * the jalr instruction is executed. - */ - if (!reloc_enable_mmu && reloc_enable_mmu_jalr_addr == 0xc00000b4) { - reloc_enable_mmu = true; - need_retranslate = true; - rv->is_trapped = false; - } + /* + * relocate_enable_mmu is the first function called to set up the MMU. + * Inside the function, at address 0x98, an invalid PTE is accessed, + * causing a fetch page fault and trapping into the trap_handler, and + * it will not return via sret. + * + * After the jalr instruction at physical address 0xc00000b4 + * (the final instruction of relocate_enable_mmu), the MMU becomes + * available. + * + * Based on this, we need to manually escape from the trap_handler after + * the jalr instruction is executed. + */ + if (!reloc_enable_mmu && reloc_enable_mmu_jalr_addr == 0xc00000b4) { + reloc_enable_mmu = true; + need_retranslate = true; + rv->is_trapped = false; + } #endif /* RV32_HAS(SYSTEM) */ - goto end_op; - }, - GEN({ - /* The register which stores the indirect address needs to be loaded - * first to avoid being overriden by other operation. 
- */ - rald, VR0, rs1; - mov, VR0, TMP; - alu32imm, 32, 0x81, 0, TMP, imm; - alu32imm, 32, 0x81, 4, TMP, ~1U; - cond, rd; - map, VR1, rd; - ldimm, VR1, pc, 4; - end; - break; - predict; - st, S32, TMP, PC; - exit; - })) + goto end_op; +}) /* clang-format off */ #define BRANCH_COND(type, x, y, cond) \ @@ -449,160 +309,35 @@ RVOP( */ /* BEQ: Branch if Equal */ -RVOP( - beq, - { BRANCH_FUNC(uint32_t, !=); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x84; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) +RVOP(beq, { BRANCH_FUNC(uint32_t, !=); }) /* BNE: Branch if Not Equal */ -RVOP( - bne, - { BRANCH_FUNC(uint32_t, ==); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x85; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) +RVOP(bne, { BRANCH_FUNC(uint32_t, ==); }) /* BLT: Branch if Less Than */ -RVOP( - blt, - { BRANCH_FUNC(int32_t, >=); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x8c; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) +RVOP(blt, { BRANCH_FUNC(int32_t, >=); }) /* BGE: Branch if Greater Than */ -RVOP( - bge, - { BRANCH_FUNC(int32_t, <); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x8d; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) +RVOP(bge, { BRANCH_FUNC(int32_t, <); }) /* BLTU: Branch if Less Than Unsigned */ -RVOP( - bltu, - { BRANCH_FUNC(uint32_t, >=); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x82; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) +RVOP(bltu, { BRANCH_FUNC(uint32_t, >=); }) /* BGEU: Branch if Greater Than Unsigned */ -RVOP( - bgeu, - { BRANCH_FUNC(uint32_t, <); }, - GEN({ - rald2, rs1, rs2; - cmp, VR1, VR0; - break; - setjmpoff; - jcc, 0x83; - cond, branch_untaken; - jmp, pc, 4; - end; - ldimm, TMP, pc, 4; - st, S32, TMP, PC; - exit; - jmpoff; - cond, branch_taken; - jmp, pc, imm; - end; - ldimm, TMP, pc, imm; - st, S32, TMP, PC; - exit; - })) - -/* RAM Fast-Path Memory Access Macros +RVOP(bgeu, { BRANCH_FUNC(uint32_t, <); }) + +/* There are 5 types of loads: two for byte and halfword sizes, and one for word + * size. Two instructions are required for byte and halfword loads because they + * can be either zero-extended or sign-extended to fill the register. However, + * for word-sized loads, an entire register's worth of data is read from memory, + * and no extension is needed. + */ + +/* RAM fast-path memory access macros * * In non-SYSTEM mode, bypass io callback indirection for direct RAM access. - * This eliminates ~5-10 cycles of function pointer dispatch overhead per - * memory operation. In SYSTEM mode, use io callbacks for MMU/TLB handling. + * This eliminates function pointer dispatch overhead per memory operation. 
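+ * For example, MEM_READ_W(rv, addr) below expands to the direct helper
+ * ram_read_w(rv, addr) instead of the indirect call
+ * (rv)->io.mem_read_w(rv, addr).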
+ * In SYSTEM mode, use io callbacks for MMU/TLB handling. */ #if !RV32_HAS(SYSTEM) #define MEM_READ_W(rv, addr) ram_read_w(rv, addr) @@ -620,95 +355,38 @@ RVOP( #define MEM_WRITE_B(rv, addr, val) (rv)->io.mem_write_b(rv, addr, val) #endif -/* There are 5 types of loads: two for byte and halfword sizes, and one for word - * size. Two instructions are required for byte and halfword loads because they - * can be either zero-extended or sign-extended to fill the register. However, - * for word-sized loads, an entire register's worth of data is read from memory, - * and no extension is needed. - */ - /* LB: Load Byte */ -RVOP( - lb, - { - uint32_t addr = rv->X[ir->rs1] + ir->imm; - rv->X[ir->rd] = sign_extend_b(MEM_READ_B(rv, addr)); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - lds, S8, TMP, VR1, 0; - })) +RVOP(lb, { + uint32_t addr = rv->X[ir->rs1] + ir->imm; + rv->X[ir->rd] = sign_extend_b(MEM_READ_B(rv, addr)); +}) /* LH: Load Halfword */ -RVOP( - lh, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(1, LOAD, false, 1); - rv->X[ir->rd] = sign_extend_h(MEM_READ_S(rv, addr)); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - lds, S16, TMP, VR1, 0; - })) +RVOP(lh, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(1, LOAD, false, 1); + rv->X[ir->rd] = sign_extend_h(MEM_READ_S(rv, addr)); +}) /* LW: Load Word */ -RVOP( - lw, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - rv->X[ir->rd] = MEM_READ_W(rv, addr); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - ld, S32, TMP, VR1, 0; - })) +RVOP(lw, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + rv->X[ir->rd] = MEM_READ_W(rv, addr); +}) /* LBU: Load Byte Unsigned */ -RVOP( - lbu, - { - uint32_t addr = rv->X[ir->rs1] + ir->imm; - rv->X[ir->rd] = MEM_READ_B(rv, addr); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - ld, S8, TMP, VR1, 0; - })) +RVOP(lbu, { + uint32_t addr = rv->X[ir->rs1] + ir->imm; + rv->X[ir->rd] = MEM_READ_B(rv, addr); +}) /* LHU: Load Halfword Unsigned */ -RVOP( - lhu, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(1, LOAD, false, 1); - rv->X[ir->rd] = MEM_READ_S(rv, addr); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - ld, S16, TMP, VR1, 0; - })) +RVOP(lhu, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(1, LOAD, false, 1); + rv->X[ir->rd] = MEM_READ_S(rv, addr); +}) /* There are 3 types of stores: byte, halfword, and word-sized. 
Unlike loads, * there are no signed or unsigned variants, as stores to memory write exactly @@ -717,159 +395,65 @@ RVOP( */ /* SB: Store Byte */ -RVOP( - sb, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - const uint32_t value = rv->X[ir->rs2]; - MEM_WRITE_B(rv, addr, value); +RVOP(sb, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + const uint32_t value = rv->X[ir->rs2]; + MEM_WRITE_B(rv, addr, value); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value); + check_tohost_write(rv, addr, value); #endif - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - rald, VR1, rs2; - st, S8, VR1, TMP, 0; - })) +}) /* SH: Store Halfword */ -RVOP( - sh, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(1, STORE, false, 1); - const uint32_t value = rv->X[ir->rs2]; - MEM_WRITE_S(rv, addr, value); +RVOP(sh, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(1, STORE, false, 1); + const uint32_t value = rv->X[ir->rs2]; + MEM_WRITE_S(rv, addr, value); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value); + check_tohost_write(rv, addr, value); #endif - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - rald, VR1, rs2; - st, S16, VR1, TMP, 0; - })) +}) /* SW: Store Word */ -RVOP( - sw, - { - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); - const uint32_t value = rv->X[ir->rs2]; - MEM_WRITE_W(rv, addr, value); +RVOP(sw, { + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); + const uint32_t value = rv->X[ir->rs2]; + MEM_WRITE_W(rv, addr, value); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value); + check_tohost_write(rv, addr, value); #endif - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - rald, VR1, rs2; - st, S32, VR1, TMP, 0; - })) +}) /* ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic * overflow is ignored and the result is simply the low XLEN bits of the * result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler * pseudo-instruction. */ -RVOP( - addi, - { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 32, 0x81, 0, VR1, imm; - })) +RVOP(addi, { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; }) /* SLTI place the value 1 in register rd if register rs1 is less than the * signextended immediate when both are treated as signed numbers, else 0 is * written to rd. */ -RVOP( - slti, - { rv->X[ir->rd] = ((int32_t) (rv->X[ir->rs1]) < ir->imm) ? 1 : 0; }, - GEN({ - rald, VR0, rs1; - cmpimm, VR0, imm; - map, VR1, rd; - ldimm, VR1, 1; - setjmpoff; - jcc, 0x8c; - ldimm, VR1, 0; - jmpoff; - })) +RVOP(slti, { rv->X[ir->rd] = ((int32_t) (rv->X[ir->rs1]) < ir->imm) ? 1 : 0; }) /* SLTIU places the value 1 in register rd if register rs1 is less than the * immediate when both are treated as unsigned numbers, else 0 is written to rd. */ -RVOP( - sltiu, - { rv->X[ir->rd] = (rv->X[ir->rs1] < (uint32_t) ir->imm) ? 1 : 0; }, - GEN({ - rald, VR0, rs1; - cmpimm, VR0, imm; - map, VR1, rd; - ldimm, VR1, 1; - setjmpoff; - jcc, 0x82; - ldimm, VR1, 0; - jmpoff; - })) +RVOP(sltiu, { rv->X[ir->rd] = (rv->X[ir->rs1] < (uint32_t) ir->imm) ? 
1 : 0; }) /* XORI: Exclusive OR Immediate */ -RVOP( - xori, - { rv->X[ir->rd] = rv->X[ir->rs1] ^ ir->imm; }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 32, 0x81, 6, VR1, imm; - })) +RVOP(xori, { rv->X[ir->rd] = rv->X[ir->rs1] ^ ir->imm; }) /* ORI: OR Immediate */ -RVOP( - ori, - { rv->X[ir->rd] = rv->X[ir->rs1] | ir->imm; }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 32, 0x81, 1, VR1, imm; - })) +RVOP(ori, { rv->X[ir->rd] = rv->X[ir->rs1] | ir->imm; }) /* ANDI performs bitwise AND on register rs1 and the sign-extended 12-bit * immediate and place the result in rd. */ -RVOP( - andi, - { rv->X[ir->rd] = rv->X[ir->rs1] & ir->imm; }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 32, 0x81, 4, VR1, imm; - })) +RVOP(andi, { rv->X[ir->rd] = rv->X[ir->rs1] & ir->imm; }) FORCE_INLINE void shift_func(riscv_t *rv, const rv_insn_t *ir) { @@ -892,311 +476,136 @@ FORCE_INLINE void shift_func(riscv_t *rv, const rv_insn_t *ir) /* SLLI performs logical left shift on the value in register rs1 by the shift * amount held in the lower 5 bits of the immediate. */ -RVOP( - slli, - { shift_func(rv, ir); }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 8, 0xc1, 4, VR1, imm, 0x1f; - })) +RVOP(slli, { shift_func(rv, ir); }) /* SRLI performs logical right shift on the value in register rs1 by the shift * amount held in the lower 5 bits of the immediate. */ -RVOP( - srli, - { shift_func(rv, ir); }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 8, 0xc1, 5, VR1, imm, 0x1f; - })) +RVOP(srli, { shift_func(rv, ir); }) /* SRAI performs arithmetic right shift on the value in register rs1 by the * shift amount held in the lower 5 bits of the immediate. */ -RVOP( - srai, - { shift_func(rv, ir); }, - GEN({ - rald, VR0, rs1; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 8, 0xc1, 7, VR1, imm, 0x1f; - })) +RVOP(srai, { shift_func(rv, ir); }) /* ADD */ -RVOP( - add, - { rv->X[ir->rd] = rv->X[ir->rs1] + rv->X[ir->rs2]; }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32, 0x01, TMP, VR2; - })) +RVOP(add, { rv->X[ir->rd] = rv->X[ir->rs1] + rv->X[ir->rs2]; }) /* SUB: Subtract */ -RVOP( - sub, - { rv->X[ir->rd] = rv->X[ir->rs1] - rv->X[ir->rs2]; }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32, 0x29, TMP, VR2; - })) +RVOP(sub, { rv->X[ir->rd] = rv->X[ir->rs1] - rv->X[ir->rs2]; }) /* SLL: Shift Left Logical */ -RVOP( - sll, - { rv->X[ir->rd] = rv->X[ir->rs1] << (rv->X[ir->rs2] & 0x1f); }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32imm, 32, 0x81, 4, TMP, 0x1f; - alu32, 0xd3, 4, VR2; - })) +RVOP(sll, { rv->X[ir->rd] = rv->X[ir->rs1] << (rv->X[ir->rs2] & 0x1f); }) /* SLT: Set on Less Than */ -RVOP( - slt, - { - rv->X[ir->rd] = - ((int32_t) (rv->X[ir->rs1]) < (int32_t) (rv->X[ir->rs2])) ? 1 : 0; - }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - cmp, VR1, VR0; - ldimm, VR2, 1; - setjmpoff; - jcc, 0x8c; - ldimm, VR2, 0; - jmpoff; - })) +RVOP(slt, { + rv->X[ir->rd] = + ((int32_t) (rv->X[ir->rs1]) < (int32_t) (rv->X[ir->rs2])) ? 1 : 0; +}) /* SLTU: Set on Less Than Unsigned */ -RVOP( - sltu, - { rv->X[ir->rd] = (rv->X[ir->rs1] < rv->X[ir->rs2]) ? 
1 : 0; }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - cmp, VR1, VR0; - ldimm, VR2, 1; - setjmpoff; - jcc, 0x82; - ldimm, VR2, 0; - jmpoff; - })) +RVOP(sltu, { rv->X[ir->rd] = (rv->X[ir->rs1] < rv->X[ir->rs2]) ? 1 : 0; }) /* XOR: Exclusive OR */ -RVOP( - xor, - { - rv->X[ir->rd] = rv->X[ir->rs1] ^ rv->X[ir->rs2]; - }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32, 0x31, TMP, VR2; - })) +RVOP(xor, { + rv->X[ir->rd] = rv->X[ir->rs1] ^ rv->X[ir->rs2]; +}) /* SRL: Shift Right Logical */ -RVOP( - srl, - { rv->X[ir->rd] = rv->X[ir->rs1] >> (rv->X[ir->rs2] & 0x1f); }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32imm, 32, 0x81, 4, TMP, 0x1f; - alu32, 0xd3, 5, VR2; - })) +RVOP(srl, { rv->X[ir->rd] = rv->X[ir->rs1] >> (rv->X[ir->rs2] & 0x1f); }) /* SRA: Shift Right Arithmetic */ -RVOP( - sra, - { rv->X[ir->rd] = ((int32_t) rv->X[ir->rs1]) >> (rv->X[ir->rs2] & 0x1f); }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32imm, 32, 0x81, 4, TMP, 0x1f; - alu32, 0xd3, 7, VR2; - })) +RVOP(sra, + { rv->X[ir->rd] = ((int32_t) rv->X[ir->rs1]) >> (rv->X[ir->rs2] & 0x1f); }) /* OR */ -RVOP( - or - , - { rv->X[ir->rd] = rv->X[ir->rs1] | rv->X[ir->rs2]; }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32, 0x09, TMP, VR2; - })) +RVOP(or, { rv->X[ir->rd] = rv->X[ir->rs1] | rv->X[ir->rs2]; }) /* AND */ /* clang-format off */ RVOP( and, - { rv->X[ir->rd] = rv->X[ir->rs1] & rv->X[ir->rs2]; }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - alu32, 0x21, TMP, VR2; - })) + { rv->X[ir->rd] = rv->X[ir->rs1] & rv->X[ir->rs2]; }) /* clang-format on */ /* * FENCE: order device I/O and memory accesses as viewed by other * RISC-V harts and external devices or coprocessors */ -RVOP( - fence, - { - PC += 4; - /* FIXME: fill real implementations */ - goto end_op; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fence, { + PC += 4; + /* FIXME: fill real implementations */ + goto end_op; +}) /* ECALL: Environment Call */ -RVOP( - ecall, - { - rv->compressed = false; - rv->csr_cycle = cycle; - rv->PC = PC; - rv->io.on_ecall(rv); - return true; - }, - GEN({ - break; - ldimm, TMP, pc; - st, S32, TMP, PC; - call, ecall; - exit; - })) +RVOP(ecall, { + rv->compressed = false; + rv->csr_cycle = cycle; + rv->PC = PC; + rv->io.on_ecall(rv); + return true; +}) /* EBREAK: Environment Break */ -RVOP( - ebreak, - { - rv->compressed = false; - rv->csr_cycle = cycle; - rv->PC = PC; - rv->io.on_ebreak(rv); - return true; - }, - GEN({ - break; - ldimm, TMP, pc; - st, S32, TMP, PC; - call, ebreak; - exit; - })) +RVOP(ebreak, { + rv->compressed = false; + rv->csr_cycle = cycle; + rv->PC = PC; + rv->io.on_ebreak(rv); + return true; +}) /* WFI: Wait for Interrupt */ -RVOP( - wfi, - { - PC += 4; - /* FIXME: Implement */ - goto end_op; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(wfi, { + PC += 4; + /* FIXME: Implement */ + goto end_op; +}) /* URET: return from traps in U-mode */ -RVOP( - uret, - { - /* FIXME: Implement */ - return false; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(uret, { + /* FIXME: Implement */ + return false; +}) /* SRET: return from traps in S-mode */ #if RV32_HAS(SYSTEM) -RVOP( - sret, - { - rv->is_trapped = false; - rv->priv_mode = (rv->csr_sstatus & SSTATUS_SPP) >> SSTATUS_SPP_SHIFT; - rv->csr_sstatus &= ~(SSTATUS_SPP); +RVOP(sret, { + rv->is_trapped = false; + rv->priv_mode = (rv->csr_sstatus & SSTATUS_SPP) >> 
SSTATUS_SPP_SHIFT; + rv->csr_sstatus &= ~(SSTATUS_SPP); - const uint32_t sstatus_spie = - (rv->csr_sstatus & SSTATUS_SPIE) >> SSTATUS_SPIE_SHIFT; - rv->csr_sstatus |= (sstatus_spie << SSTATUS_SIE_SHIFT); - rv->csr_sstatus |= SSTATUS_SPIE; + const uint32_t sstatus_spie = + (rv->csr_sstatus & SSTATUS_SPIE) >> SSTATUS_SPIE_SHIFT; + rv->csr_sstatus |= (sstatus_spie << SSTATUS_SIE_SHIFT); + rv->csr_sstatus |= SSTATUS_SPIE; - rv->PC = rv->csr_sepc; + rv->PC = rv->csr_sepc; - return true; - }, - GEN({ - assert; /* FIXME: Implement */ - })) + return true; +}) #endif /* HRET: return from traps in H-mode */ -RVOP( - hret, - { - /* FIXME: Implement */ - return false; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(hret, { + /* FIXME: Implement */ + return false; +}) /* MRET: return from traps in M-mode */ -RVOP( - mret, - { - rv->priv_mode = (rv->csr_mstatus & MSTATUS_MPP) >> MSTATUS_MPP_SHIFT; - rv->csr_mstatus &= ~(MSTATUS_MPP); - - const uint32_t mstatus_mpie = - (rv->csr_mstatus & MSTATUS_MPIE) >> MSTATUS_MPIE_SHIFT; - rv->csr_mstatus |= (mstatus_mpie << MSTATUS_MIE_SHIFT); - rv->csr_mstatus |= MSTATUS_MPIE; - - rv->PC = rv->csr_mepc; - return true; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(mret, { + rv->priv_mode = (rv->csr_mstatus & MSTATUS_MPP) >> MSTATUS_MPP_SHIFT; + rv->csr_mstatus &= ~(MSTATUS_MPP); + + const uint32_t mstatus_mpie = + (rv->csr_mstatus & MSTATUS_MPIE) >> MSTATUS_MPIE_SHIFT; + rv->csr_mstatus |= (mstatus_mpie << MSTATUS_MIE_SHIFT); + rv->csr_mstatus |= MSTATUS_MPIE; + + rv->PC = rv->csr_mepc; + return true; +}) /* SFENCE.VMA: synchronize updates to in-memory memory-management data * structures with current execution. @@ -1209,60 +618,45 @@ RVOP( * VA→PA mappings. This is necessary when PTEs are modified without changing * SATP (e.g., munmap + mmap to different PA, or mprotect changes). */ -RVOP( - sfencevma, - { - PC += 4; +RVOP(sfencevma, { + PC += 4; #if RV32_HAS(SYSTEM) - if (ir->rs1 == 0) { - /* Global flush: invalidate all TLB entries */ - mmu_tlb_flush_all(rv); + if (ir->rs1 == 0) { + /* Global flush: invalidate all TLB entries */ + mmu_tlb_flush_all(rv); #if RV32_HAS(JIT) - /* Invalidate JIT blocks with current SATP */ - cache_invalidate_satp(rv->block_cache, rv->csr_satp); + /* Invalidate JIT blocks with current SATP */ + cache_invalidate_satp(rv->block_cache, rv->csr_satp); #endif - } else { - /* Selective flush: invalidate TLB entry for specific VA */ - uint32_t va = rv->X[ir->rs1]; - mmu_tlb_flush(rv, va); + } else { + /* Selective flush: invalidate TLB entry for specific VA */ + uint32_t va = rv->X[ir->rs1]; + mmu_tlb_flush(rv, va); #if RV32_HAS(JIT) - /* Invalidate JIT blocks in the target VA page */ - cache_invalidate_va(rv->block_cache, va, rv->csr_satp); + /* Invalidate JIT blocks in the target VA page */ + cache_invalidate_va(rv->block_cache, va, rv->csr_satp); #endif - } + } #endif - goto end_op; - }, - GEN({ - assert; /* FIXME: Implement */ - })) + goto end_op; +}) #if RV32_HAS(Zifencei) /* RV32 Zifencei Standard Extension */ -RVOP( - fencei, - { - PC += 4; - /* FIXME: fill real implementations */ - rv->csr_cycle = cycle; - rv->PC = PC; - return true; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fencei, { + PC += 4; + /* FIXME: fill real implementations */ + rv->csr_cycle = cycle; + rv->PC = PC; + return true; +}) #endif #if RV32_HAS(Zicsr) /* RV32 Zicsr Standard Extension */ /* CSRRW: Atomic Read/Write CSR */ -RVOP( - csrrw, - { - uint32_t tmp = csr_csrrw(rv, ir->imm, rv->X[ir->rs1], cycle); - rv->X[ir->rd] = ir->rd ? 
tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrw, { + uint32_t tmp = csr_csrrw(rv, ir->imm, rv->X[ir->rs1], cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) /* CSRRS: Atomic Read and Set Bits in CSR */ /* The initial value in integer register rs1 is treated as a bit mask that @@ -1273,139 +667,75 @@ RVOP( * * See Page 56 of the RISC-V Unprivileged Specification. */ -RVOP( - csrrs, - { - uint32_t tmp = csr_csrrs( - rv, ir->imm, (ir->rs1 == rv_reg_zero) ? 0U : rv->X[ir->rs1], cycle); - rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrs, { + uint32_t tmp = csr_csrrs( + rv, ir->imm, (ir->rs1 == rv_reg_zero) ? 0U : rv->X[ir->rs1], cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) /* CSRRC: Atomic Read and Clear Bits in CSR */ -RVOP( - csrrc, - { - uint32_t tmp = csr_csrrc( - rv, ir->imm, (ir->rs1 == rv_reg_zero) ? 0U : rv->X[ir->rs1], cycle); - rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrc, { + uint32_t tmp = csr_csrrc( + rv, ir->imm, (ir->rs1 == rv_reg_zero) ? 0U : rv->X[ir->rs1], cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) /* CSRRWI */ -RVOP( - csrrwi, - { - uint32_t tmp = csr_csrrw(rv, ir->imm, ir->rs1, cycle); - rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrwi, { + uint32_t tmp = csr_csrrw(rv, ir->imm, ir->rs1, cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) /* CSRRSI */ -RVOP( - csrrsi, - { - uint32_t tmp = csr_csrrs(rv, ir->imm, ir->rs1, cycle); - rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrsi, { + uint32_t tmp = csr_csrrs(rv, ir->imm, ir->rs1, cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) /* CSRRCI */ -RVOP( - csrrci, - { - uint32_t tmp = csr_csrrc(rv, ir->imm, ir->rs1, cycle); - rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(csrrci, { + uint32_t tmp = csr_csrrc(rv, ir->imm, ir->rs1, cycle); + rv->X[ir->rd] = ir->rd ? tmp : rv->X[ir->rd]; +}) #endif /* RV32M Standard Extension */ #if RV32_HAS(EXT_M) /* MUL: Multiply */ -RVOP( - mul, - { - const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; - const int64_t multiplier = (int32_t) rv->X[ir->rs2]; - rv->X[ir->rd] = - ((uint64_t) (multiplicand * multiplier)) & ((1ULL << 32) - 1); - }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mul, 0x28, TMP, VR2, 0; - })) +RVOP(mul, { + const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; + const int64_t multiplier = (int32_t) rv->X[ir->rs2]; + rv->X[ir->rd] = + ((uint64_t) (multiplicand * multiplier)) & ((1ULL << 32) - 1); +}) /* MULH: Multiply High Signed Signed */ /* It is important to first cast rs1 and rs2 to i32 so that the subsequent * cast to i64 sign-extends the register values. 
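+ * For example, when rv->X[rs1] is 0xFFFFFFFF, (int64_t) (int32_t) 0xFFFFFFFF
+ * yields -1, whereas widening the raw unsigned value would yield 4294967295
+ * and corrupt the upper 32 bits of the product.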
*/ -RVOP( - mulh, - { - const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; - const int64_t multiplier = (int32_t) rv->X[ir->rs2]; - rv->X[ir->rd] = ((uint64_t) (multiplicand * multiplier)) >> 32; - }, - GEN({ - rald2s, rs1, rs2, true, true; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mul, 0x2f, TMP, VR2, 0; - alu64imm, 8, 0xc1, 5, VR2, 32; - })) +RVOP(mulh, { + const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; + const int64_t multiplier = (int32_t) rv->X[ir->rs2]; + rv->X[ir->rd] = ((uint64_t) (multiplicand * multiplier)) >> 32; +}) /* MULHSU: Multiply High Signed Unsigned */ /* It is essential to perform an initial cast of rs1 to i32, ensuring that the * subsequent cast to i64 results in sign extension of the register value. * Additionally, rs2 should not undergo sign extension. */ -RVOP( - mulhsu, - { - const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; - const uint64_t umultiplier = rv->X[ir->rs2]; - rv->X[ir->rd] = ((uint64_t) (multiplicand * umultiplier)) >> 32; - }, - GEN({ - rald2s, rs1, rs2, true, false; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mul, 0x2f, TMP, VR2, 0; - alu64imm, 8, 0xc1, 5, VR2, 32; - })) +RVOP(mulhsu, { + const int64_t multiplicand = (int32_t) rv->X[ir->rs1]; + const uint64_t umultiplier = rv->X[ir->rs2]; + rv->X[ir->rd] = ((uint64_t) (multiplicand * umultiplier)) >> 32; +}) /* MULHU: Multiply High Unsigned Unsigned */ -RVOP( - mulhu, - { - rv->X[ir->rd] = - ((uint64_t) rv->X[ir->rs1] * (uint64_t) rv->X[ir->rs2]) >> 32; - }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mul, 0x2f, TMP, VR2, 0; - alu64imm, 8, 0xc1, 5, VR2, 32; - })) +RVOP(mulhu, { + rv->X[ir->rd] = + ((uint64_t) rv->X[ir->rs1] * (uint64_t) rv->X[ir->rs2]) >> 32; +}) /* DIV: Divide Signed */ /* +------------------------+-----------+----------+-----------+ @@ -1415,24 +745,14 @@ RVOP( * | Overflow (signed only) | −2^{L−1} | −1 | −2^{L−1} | * +------------------------+-----------+----------+-----------+ */ -RVOP( - div, - { - const int32_t dividend = (int32_t) rv->X[ir->rs1]; - const int32_t divisor = (int32_t) rv->X[ir->rs2]; - rv->X[ir->rd] = !divisor ? ~0U - : (divisor == -1 && rv->X[ir->rs1] == 0x80000000U) - ? rv->X[ir->rs1] /* overflow */ - : (unsigned int) (dividend / divisor); - }, - GEN({ - rald2s, rs1, rs2, true, true; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - div, 0x38, TMP, VR2, 1; - /* FIXME: handle overflow */ - })) +RVOP(div, { + const int32_t dividend = (int32_t) rv->X[ir->rs1]; + const int32_t divisor = (int32_t) rv->X[ir->rs2]; + rv->X[ir->rd] = !divisor ? ~0U + : (divisor == -1 && rv->X[ir->rs1] == 0x80000000U) + ? rv->X[ir->rs1] /* overflow */ + : (unsigned int) (dividend / divisor); +}) /* DIVU: Divide Unsigned */ /* +------------------------+-----------+----------+----------+ @@ -1441,20 +761,11 @@ RVOP( * | Division by zero | x | 0 | 2^L − 1 | * +------------------------+-----------+----------+----------+ */ -RVOP( - divu, - { - const uint32_t udividend = rv->X[ir->rs1]; - const uint32_t udivisor = rv->X[ir->rs2]; - rv->X[ir->rd] = !udivisor ? ~0U : udividend / udivisor; - }, - GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - div, 0x38, TMP, VR2, 0; - })) +RVOP(divu, { + const uint32_t udividend = rv->X[ir->rs1]; + const uint32_t udivisor = rv->X[ir->rs2]; + rv->X[ir->rd] = !udivisor ? ~0U : udividend / udivisor; +}) /* clang-format off */ /* REM: Remainder Signed */ @@ -1472,15 +783,7 @@ RVOP(rem, { : (divisor == -1 && rv->X[ir->rs1] == 0x80000000U) ? 
0 : (dividend % divisor); -}, -GEN({ - rald2s, rs1, rs2, true, true; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mod, 0x98, TMP, VR2, 1; - /* FIXME: handle overflow */ -})) +}) /* REMU: Remainder Unsigned */ /* +------------------------+-----------+----------+----------+ @@ -1494,14 +797,7 @@ RVOP(remu, { const uint32_t udivisor = rv->X[ir->rs2]; rv->X[ir->rd] = !udivisor ? udividend : udividend % udivisor; -}, -GEN({ - rald2, rs1, rs2; - map, VR2, rd; - mov, VR1, TMP; - mov, VR0, VR2; - mod, 0x98, TMP, VR2, 0; -})) +}) /* clang-format on */ #endif @@ -1530,404 +826,279 @@ GEN({ */ /* LR.W: Load Reserved */ -RVOP( - lrw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - if (ir->rd) - rv->X[ir->rd] = MEM_READ_W(rv, addr); - /* skip registration of the 'reservation set' - * FIXME: unimplemented - */ - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(lrw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + if (ir->rd) + rv->X[ir->rd] = MEM_READ_W(rv, addr); + /* skip registration of the 'reservation set' + * FIXME: unimplemented + */ +}) /* SC.W: Store Conditional */ -RVOP( - scw, - { - /* assume the 'reservation set' is valid - * FIXME: unimplemented - */ - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); - const uint32_t value = rv->X[ir->rs2]; - MEM_WRITE_W(rv, addr, value); - rv->X[ir->rd] = 0; +RVOP(scw, { + /* assume the 'reservation set' is valid + * FIXME: unimplemented + */ + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); + const uint32_t value = rv->X[ir->rs2]; + MEM_WRITE_W(rv, addr, value); + rv->X[ir->rd] = 0; #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value); + check_tohost_write(rv, addr, value); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOSWAP.W: Atomic Swap */ -RVOP( - amoswapw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - MEM_WRITE_W(rv, addr, value2); +RVOP(amoswapw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + MEM_WRITE_W(rv, addr, value2); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value2); + check_tohost_write(rv, addr, value2); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOADD.W: Atomic ADD */ -RVOP( - amoaddw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t res = value1 + value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amoaddw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t res = value1 + value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOXOR.W: Atomic XOR */ -RVOP( - amoxorw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - 
const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t res = value1 ^ value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amoxorw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t res = value1 ^ value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOAND.W: Atomic AND */ -RVOP( - amoandw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t res = value1 & value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amoandw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t res = value1 & value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOOR.W: Atomic OR */ -RVOP( - amoorw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t res = value1 | value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amoorw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t res = value1 | value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOMIN.W: Atomic MIN */ -RVOP( - amominw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const int32_t a = value1; - const int32_t b = value2; - const uint32_t res = a < b ? value1 : value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amominw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const int32_t a = value1; + const int32_t b = value2; + const uint32_t res = a < b ? value1 : value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOMAX.W: Atomic MAX */ -RVOP( - amomaxw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const int32_t a = value1; - const int32_t b = value2; - const uint32_t res = a > b ? 
value1 : value2; - MEM_WRITE_W(rv, addr, res); +RVOP(amomaxw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const int32_t a = value1; + const int32_t b = value2; + const uint32_t res = a > b ? value1 : value2; + MEM_WRITE_W(rv, addr, res); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, res); + check_tohost_write(rv, addr, res); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOMINU.W */ -RVOP( - amominuw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t ures = value1 < value2 ? value1 : value2; - MEM_WRITE_W(rv, addr, ures); +RVOP(amominuw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t ures = value1 < value2 ? value1 : value2; + MEM_WRITE_W(rv, addr, ures); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, ures); + check_tohost_write(rv, addr, ures); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* AMOMAXU.W */ -RVOP( - amomaxuw, - { - const uint32_t addr = rv->X[ir->rs1]; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - const uint32_t value1 = MEM_READ_W(rv, addr); - const uint32_t value2 = rv->X[ir->rs2]; - if (ir->rd) - rv->X[ir->rd] = value1; - const uint32_t ures = value1 > value2 ? value1 : value2; - MEM_WRITE_W(rv, addr, ures); +RVOP(amomaxuw, { + const uint32_t addr = rv->X[ir->rs1]; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + const uint32_t value1 = MEM_READ_W(rv, addr); + const uint32_t value2 = rv->X[ir->rs2]; + if (ir->rd) + rv->X[ir->rd] = value1; + const uint32_t ures = value1 > value2 ? 
value1 : value2; + MEM_WRITE_W(rv, addr, ures); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, ures); + check_tohost_write(rv, addr, ures); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) #endif /* RV32_HAS(EXT_A) */ /* RV32F Standard Extension */ #if RV32_HAS(EXT_F) /* FLW */ -RVOP( - flw, - { - /* copy into the float register */ - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); - rv->F[ir->rd].v = MEM_READ_W(rv, addr); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(flw, { + /* copy into the float register */ + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1); + rv->F[ir->rd].v = MEM_READ_W(rv, addr); +}) /* FSW */ -RVOP( - fsw, - { - /* copy from float registers */ - const uint32_t addr = rv->X[ir->rs1] + ir->imm; - RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); - const uint32_t value = rv->F[ir->rs2].v; - MEM_WRITE_W(rv, addr, value); +RVOP(fsw, { + /* copy from float registers */ + const uint32_t addr = rv->X[ir->rs1] + ir->imm; + RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1); + const uint32_t value = rv->F[ir->rs2].v; + MEM_WRITE_W(rv, addr, value); #if RV32_HAS(ARCH_TEST) - check_tohost_write(rv, addr, value); + check_tohost_write(rv, addr, value); #endif - }, - GEN({ - assert; /* FIXME: Implement */ - })) +}) /* FMADD.S */ -RVOP( - fmadds, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = - f32_mulAdd(rv->F[ir->rs1], rv->F[ir->rs2], rv->F[ir->rs3]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmadds, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_mulAdd(rv->F[ir->rs1], rv->F[ir->rs2], rv->F[ir->rs3]); + set_fflag(rv); +}) /* FMSUB.S */ -RVOP( - fmsubs, - { - set_rounding_mode(rv, ir->rm); - riscv_float_t tmp = rv->F[ir->rs3]; - tmp.v ^= FMASK_SIGN; - rv->F[ir->rd] = f32_mulAdd(rv->F[ir->rs1], rv->F[ir->rs2], tmp); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmsubs, { + set_rounding_mode(rv, ir->rm); + riscv_float_t tmp = rv->F[ir->rs3]; + tmp.v ^= FMASK_SIGN; + rv->F[ir->rd] = f32_mulAdd(rv->F[ir->rs1], rv->F[ir->rs2], tmp); + set_fflag(rv); +}) /* FNMSUB.S */ -RVOP( - fnmsubs, - { - set_rounding_mode(rv, ir->rm); - riscv_float_t tmp = rv->F[ir->rs1]; - tmp.v ^= FMASK_SIGN; - rv->F[ir->rd] = f32_mulAdd(tmp, rv->F[ir->rs2], rv->F[ir->rs3]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fnmsubs, { + set_rounding_mode(rv, ir->rm); + riscv_float_t tmp = rv->F[ir->rs1]; + tmp.v ^= FMASK_SIGN; + rv->F[ir->rd] = f32_mulAdd(tmp, rv->F[ir->rs2], rv->F[ir->rs3]); + set_fflag(rv); +}) /* FNMADD.S */ -RVOP( - fnmadds, - { - set_rounding_mode(rv, ir->rm); - riscv_float_t tmp1 = rv->F[ir->rs1]; - riscv_float_t tmp2 = rv->F[ir->rs3]; - tmp1.v ^= FMASK_SIGN; - tmp2.v ^= FMASK_SIGN; - rv->F[ir->rd] = f32_mulAdd(tmp1, rv->F[ir->rs2], tmp2); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fnmadds, { + set_rounding_mode(rv, ir->rm); + riscv_float_t tmp1 = rv->F[ir->rs1]; + riscv_float_t tmp2 = rv->F[ir->rs3]; + tmp1.v ^= FMASK_SIGN; + tmp2.v ^= FMASK_SIGN; + rv->F[ir->rd] = f32_mulAdd(tmp1, rv->F[ir->rs2], tmp2); + set_fflag(rv); +}) /* FADD.S */ -RVOP( - fadds, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = f32_add(rv->F[ir->rs1], rv->F[ir->rs2]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fadds, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_add(rv->F[ir->rs1], rv->F[ir->rs2]); + set_fflag(rv); +}) /* 
FSUB.S */ -RVOP( - fsubs, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = f32_sub(rv->F[ir->rs1], rv->F[ir->rs2]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fsubs, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_sub(rv->F[ir->rs1], rv->F[ir->rs2]); + set_fflag(rv); +}) /* FMUL.S */ -RVOP( - fmuls, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = f32_mul(rv->F[ir->rs1], rv->F[ir->rs2]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmuls, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_mul(rv->F[ir->rs1], rv->F[ir->rs2]); + set_fflag(rv); +}) /* FDIV.S */ -RVOP( - fdivs, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = f32_div(rv->F[ir->rs1], rv->F[ir->rs2]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fdivs, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_div(rv->F[ir->rs1], rv->F[ir->rs2]); + set_fflag(rv); +}) /* FSQRT.S */ -RVOP( - fsqrts, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = f32_sqrt(rv->F[ir->rs1]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fsqrts, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = f32_sqrt(rv->F[ir->rs1]); + set_fflag(rv); +}) /* FSGNJ.S */ -RVOP( - fsgnjs, - { - rv->F[ir->rd].v = - (rv->F[ir->rs1].v & ~FMASK_SIGN) | (rv->F[ir->rs2].v & FMASK_SIGN); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fsgnjs, { + rv->F[ir->rd].v = + (rv->F[ir->rs1].v & ~FMASK_SIGN) | (rv->F[ir->rs2].v & FMASK_SIGN); +}) /* FSGNJN.S */ -RVOP( - fsgnjns, - { - rv->F[ir->rd].v = - (rv->F[ir->rs1].v & ~FMASK_SIGN) | (~rv->F[ir->rs2].v & FMASK_SIGN); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fsgnjns, { + rv->F[ir->rd].v = + (rv->F[ir->rs1].v & ~FMASK_SIGN) | (~rv->F[ir->rs2].v & FMASK_SIGN); +}) /* FSGNJX.S */ -RVOP( - fsgnjxs, - { rv->F[ir->rd].v = rv->F[ir->rs1].v ^ (rv->F[ir->rs2].v & FMASK_SIGN); }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fsgnjxs, + { rv->F[ir->rd].v = rv->F[ir->rs1].v ^ (rv->F[ir->rs2].v & FMASK_SIGN); }) /* FMIN.S * In IEEE754-201x, fmin(x, y) return @@ -1936,175 +1107,113 @@ RVOP( * - if both are NaN, return NaN * When input is signaling NaN, raise invalid operation */ -RVOP( - fmins, - { - if (f32_isSignalingNaN(rv->F[ir->rs1]) || - f32_isSignalingNaN(rv->F[ir->rs2])) - rv->csr_fcsr |= FFLAG_INVALID_OP; - bool less = f32_lt_quiet(rv->F[ir->rs1], rv->F[ir->rs2]) || - (f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]) && - (rv->F[ir->rs1].v & FMASK_SIGN)); - if (is_nan(rv->F[ir->rs1].v) && is_nan(rv->F[ir->rs2].v)) - rv->F[ir->rd].v = RV_NAN; - else - rv->F[ir->rd] = (less || is_nan(rv->F[ir->rs2].v) ? rv->F[ir->rs1] - : rv->F[ir->rs2]); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmins, { + if (f32_isSignalingNaN(rv->F[ir->rs1]) || + f32_isSignalingNaN(rv->F[ir->rs2])) + rv->csr_fcsr |= FFLAG_INVALID_OP; + bool less = f32_lt_quiet(rv->F[ir->rs1], rv->F[ir->rs2]) || + (f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]) && + (rv->F[ir->rs1].v & FMASK_SIGN)); + if (is_nan(rv->F[ir->rs1].v) && is_nan(rv->F[ir->rs2].v)) + rv->F[ir->rd].v = RV_NAN; + else + rv->F[ir->rd] = (less || is_nan(rv->F[ir->rs2].v) ? 
rv->F[ir->rs1] + : rv->F[ir->rs2]); +}) /* FMAX.S */ -RVOP( - fmaxs, - { - if (f32_isSignalingNaN(rv->F[ir->rs1]) || - f32_isSignalingNaN(rv->F[ir->rs2])) - rv->csr_fcsr |= FFLAG_INVALID_OP; - bool greater = f32_lt_quiet(rv->F[ir->rs2], rv->F[ir->rs1]) || - (f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]) && - (rv->F[ir->rs2].v & FMASK_SIGN)); - if (is_nan(rv->F[ir->rs1].v) && is_nan(rv->F[ir->rs2].v)) - rv->F[ir->rd].v = RV_NAN; - else - rv->F[ir->rd] = - (greater || is_nan(rv->F[ir->rs2].v) ? rv->F[ir->rs1] - : rv->F[ir->rs2]); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmaxs, { + if (f32_isSignalingNaN(rv->F[ir->rs1]) || + f32_isSignalingNaN(rv->F[ir->rs2])) + rv->csr_fcsr |= FFLAG_INVALID_OP; + bool greater = f32_lt_quiet(rv->F[ir->rs2], rv->F[ir->rs1]) || + (f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]) && + (rv->F[ir->rs2].v & FMASK_SIGN)); + if (is_nan(rv->F[ir->rs1].v) && is_nan(rv->F[ir->rs2].v)) + rv->F[ir->rd].v = RV_NAN; + else + rv->F[ir->rd] = (greater || is_nan(rv->F[ir->rs2].v) ? rv->F[ir->rs1] + : rv->F[ir->rs2]); +}) /* FCVT.W.S and FCVT.WU.S convert a floating point number to an integer, * the rounding mode is specified in rm field. */ /* FCVT.W.S */ -RVOP( - fcvtws, - { - set_rounding_mode(rv, ir->rm); - uint32_t ret = f32_to_i32(rv->F[ir->rs1], softfloat_roundingMode, true); - if (ir->rd) - rv->X[ir->rd] = ret; - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fcvtws, { + set_rounding_mode(rv, ir->rm); + uint32_t ret = f32_to_i32(rv->F[ir->rs1], softfloat_roundingMode, true); + if (ir->rd) + rv->X[ir->rd] = ret; + set_fflag(rv); +}) /* FCVT.WU.S */ -RVOP( - fcvtwus, - { - set_rounding_mode(rv, ir->rm); - uint32_t ret = - f32_to_ui32(rv->F[ir->rs1], softfloat_roundingMode, true); - if (ir->rd) - rv->X[ir->rd] = ret; - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fcvtwus, { + set_rounding_mode(rv, ir->rm); + uint32_t ret = f32_to_ui32(rv->F[ir->rs1], softfloat_roundingMode, true); + if (ir->rd) + rv->X[ir->rd] = ret; + set_fflag(rv); +}) /* FMV.X.W */ -RVOP( - fmvxw, - { - if (ir->rd) - rv->X[ir->rd] = rv->F[ir->rs1].v; - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmvxw, { + if (ir->rd) + rv->X[ir->rd] = rv->F[ir->rs1].v; +}) /* FEQ.S performs a quiet comparison: it only sets the invalid operation * exception flag if either input is a signaling NaN. */ -RVOP( - feqs, - { - uint32_t ret = f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]); - if (ir->rd) - rv->X[ir->rd] = ret; - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(feqs, { + uint32_t ret = f32_eq(rv->F[ir->rs1], rv->F[ir->rs2]); + if (ir->rd) + rv->X[ir->rd] = ret; + set_fflag(rv); +}) /* FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as * signaling comparisons: that is, they set the invalid operation exception * flag if either input is NaN. 
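+ * For example, flt.s with a quiet NaN operand writes 0 to rd and raises the
+ * invalid operation flag, whereas feq.s raises it only for a signaling NaN.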
*/ -RVOP( - flts, - { - uint32_t ret = f32_lt(rv->F[ir->rs1], rv->F[ir->rs2]); - if (ir->rd) - rv->X[ir->rd] = ret; - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) - -RVOP( - fles, - { - uint32_t ret = f32_le(rv->F[ir->rs1], rv->F[ir->rs2]); - if (ir->rd) - rv->X[ir->rd] = ret; - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(flts, { + uint32_t ret = f32_lt(rv->F[ir->rs1], rv->F[ir->rs2]); + if (ir->rd) + rv->X[ir->rd] = ret; + set_fflag(rv); +}) + +RVOP(fles, { + uint32_t ret = f32_le(rv->F[ir->rs1], rv->F[ir->rs2]); + if (ir->rd) + rv->X[ir->rd] = ret; + set_fflag(rv); +}) /* FCLASS.S */ -RVOP( - fclasss, - { - if (ir->rd) - rv->X[ir->rd] = calc_fclass(rv->F[ir->rs1].v); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fclasss, { + if (ir->rd) + rv->X[ir->rd] = calc_fclass(rv->F[ir->rs1].v); +}) /* FCVT.S.W */ -RVOP( - fcvtsw, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = i32_to_f32(rv->X[ir->rs1]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fcvtsw, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = i32_to_f32(rv->X[ir->rs1]); + set_fflag(rv); +}) /* FCVT.S.WU */ -RVOP( - fcvtswu, - { - set_rounding_mode(rv, ir->rm); - rv->F[ir->rd] = ui32_to_f32(rv->X[ir->rs1]); - set_fflag(rv); - }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fcvtswu, { + set_rounding_mode(rv, ir->rm); + rv->F[ir->rd] = ui32_to_f32(rv->X[ir->rs1]); + set_fflag(rv); +}) /* FMV.W.X */ -RVOP( - fmvwx, - { rv->F[ir->rd].v = rv->X[ir->rs1]; }, - GEN({ - assert; /* FIXME: Implement */ - })) +RVOP(fmvwx, { rv->F[ir->rd].v = rv->X[ir->rs1]; }) #endif /* RV32C Standard Extension */ @@ -2116,65 +1225,35 @@ RVOP( * This instruction is used to generate pointers to stack-allocated variables, * and expands to addi rd', x2, nzuimm[9:2]. */ -RVOP( - caddi4spn, - { rv->X[ir->rd] = rv->X[rv_reg_sp] + (uint16_t) ir->imm; }, - GEN({ - rald, VR0, rv_reg_sp; - map, VR1, rd; - cond, regneq; - mov, VR0, VR1; - end; - alu32imm, 32, 0x81, 0, VR1, uint, 16, imm; - })) +RVOP(caddi4spn, { rv->X[ir->rd] = rv->X[rv_reg_sp] + (uint16_t) ir->imm; }) /* C.LW loads a 32-bit value from memory into register rd'. It computes an * effective address by adding the zero-extended offset, scaled by 4, to the * base address in register rs1'. It expands to lw rd', offset[6:2](rs1'). */ -RVOP( - clw, - { - const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm; - RV_EXC_MISALIGN_HANDLER(3, LOAD, true, 1); - rv->X[ir->rd] = MEM_READ_W(rv, addr); - }, - GEN({ - mem; - rald, VR0, rs1; - ldimms, TMP, mem; - alu64, 0x01, VR0, TMP; - map, VR1, rd; - ld, S32, TMP, VR1, 0; - })) +RVOP(clw, { + const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm; + RV_EXC_MISALIGN_HANDLER(3, LOAD, true, 1); + rv->X[ir->rd] = MEM_READ_W(rv, addr); +}) /* C.SW stores a 32-bit value in register rs2' to memory. It computes an * effective address by adding the zero-extended offset, scaled by 4, to the * base address in register rs1'. * It expands to sw rs2', offset[6:2](rs1'). 
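+ * The zero-extended 5-bit offset scaled by 4 reaches word-aligned offsets
+ * 0 through 124 bytes from the base address in rs1'.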
 */
-RVOP(
-    csw,
-    {
-        const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, STORE, true, 1);
-        const uint32_t value = rv->X[ir->rs2];
-        MEM_WRITE_W(rv, addr, value);
+RVOP(csw, {
+    const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, STORE, true, 1);
+    const uint32_t value = rv->X[ir->rs2];
+    MEM_WRITE_W(rv, addr, value);
 #if RV32_HAS(ARCH_TEST)
-        check_tohost_write(rv, addr, value);
+    check_tohost_write(rv, addr, value);
 #endif
-    },
-    GEN({
-        mem;
-        rald, VR0, rs1;
-        ldimms, TMP, mem;
-        alu64, 0x01, VR0, TMP;
-        rald, VR1, rs2;
-        st, S32, VR1, TMP, 0;
-    }))
+})

/* C.NOP */
-RVOP(cnop, {/* no operation */}, GEN({/* no operation */}))
+RVOP(cnop, {/* no operation */})

/* C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
 * register rd then writes the result to rd. C.ADDI expands into
@@ -2182,79 +1261,50 @@ RVOP(cnop, {/* no operation */})
 * with both rd=x0 and nzimm=0 encodes the C.NOP instruction; the remaining
 * code points with either rd=x0 or nzimm=0 encode HINTs.
 */
-RVOP(
-    caddi,
-    { rv->X[ir->rd] += (int16_t) ir->imm; },
-    GEN({
-        rald, VR0, rd;
-        alu32imm, 32, 0x81, 0, VR0, int, 16, imm;
-    }))
+RVOP(caddi, { rv->X[ir->rd] += (int16_t) ir->imm; })

/* C.JAL */
-RVOP(
-    cjal,
-    {
-        rv->X[rv_reg_ra] = PC + 2;
-        PC += ir->imm;
-        struct rv_insn *taken = ir->branch_taken;
-        if (taken) {
+RVOP(cjal, {
+    rv->X[rv_reg_ra] = PC + 2;
+    PC += ir->imm;
+    struct rv_insn *taken = ir->branch_taken;
+    if (taken) {
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC))
-                    goto end_op;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC))
+                goto end_op;
+        }
 #endif
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return taken->impl(rv, taken, cycle, PC);
-            }
+        {
+            last_pc = PC;
+            MUST_TAIL return taken->impl(rv, taken, cycle, PC);
        }
-        goto end_op;
-    },
-    GEN({
-        map, VR0, rv_reg_ra;
-        ldimm, VR0, pc, 2;
-        break;
-        jmp, pc, imm;
-        ldimm, TMP, pc, imm;
-        st, S32, TMP, PC;
-        exit;
-    }))
+    }
+    goto end_op;
+})

/* C.LI loads the sign-extended 6-bit immediate, imm, into register rd.
 * C.LI expands into addi rd, x0, imm[5:0].
 * C.LI is only valid when rd≠x0; the code points with rd=x0 encode HINTs.
 */
-RVOP(
-    cli,
-    { rv->X[ir->rd] = ir->imm; },
-    GEN({
-        map, VR0, rd;
-        ldimm, VR0, imm;
-    }))
+RVOP(cli, { rv->X[ir->rd] = ir->imm; })

/* C.ADDI16SP is used to adjust the stack pointer in procedure prologues
 * and epilogues. It expands into addi x2, x2, nzimm[9:4].
 * C.ADDI16SP is only valid when nzimm≠0; the code point with nzimm=0 is
 * reserved.
 */
-RVOP(
-    caddi16sp,
-    { rv->X[ir->rd] += ir->imm; },
-    GEN({
-        rald, VR0, rd;
-        alu32imm, 32, 0x81, 0, VR0, imm;
-    }))
+RVOP(caddi16sp, { rv->X[ir->rd] += ir->imm; })

/* C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the
 * destination register, clears the bottom 12 bits, and sign-extends bit
@@ -2263,407 +1313,232 @@ RVOP(
 * C.LUI is only valid when rd≠{x0, x2}, and when the immediate is not equal
 * to zero.
 */
-RVOP(
-    clui,
-    { rv->X[ir->rd] = ir->imm; },
-    GEN({
-        map, VR0, rd;
-        ldimm, VR0, imm;
-    }))
+RVOP(clui, { rv->X[ir->rd] = ir->imm; })

/* C.SRLI is a CB-format instruction that performs a logical right shift
 * of the value in register rd' then writes the result to rd'. The shift
 * amount is encoded in the shamt field. C.SRLI expands into srli rd',
 * rd', shamt[5:0].
 */
-RVOP(
-    csrli,
-    { rv->X[ir->rs1] >>= ir->shamt; },
-    GEN({
-        rald, VR0, rs1;
-        alu32imm, 8, 0xc1, 5, VR0, shamt;
-    }))
+RVOP(csrli, { rv->X[ir->rs1] >>= ir->shamt; })

/* C.SRAI is defined analogously to C.SRLI, but instead performs an
 * arithmetic right shift. C.SRAI expands to srai rd', rd', shamt[5:0].
 */
-RVOP(
-    csrai,
-    {
-        const uint32_t mask = 0x80000000 & rv->X[ir->rs1];
-        rv->X[ir->rs1] >>= ir->shamt;
-        for (unsigned int i = 0; i < ir->shamt; ++i)
-            rv->X[ir->rs1] |= mask >> i;
-    },
-    GEN({
-        rald, VR0, rs1;
-        alu32imm, 8, 0xc1, 7, VR0, shamt;
-        /* FIXME: Incomplete */
-    }))
+RVOP(csrai, {
+    const uint32_t mask = 0x80000000 & rv->X[ir->rs1];
+    rv->X[ir->rs1] >>= ir->shamt;
+    for (unsigned int i = 0; i < ir->shamt; ++i)
+        rv->X[ir->rs1] |= mask >> i;
+})

/* C.ANDI is a CB-format instruction that computes the bitwise AND of the
 * value in register rd' and the sign-extended 6-bit immediate, then writes
 * the result to rd'. C.ANDI expands to andi rd', rd', imm[5:0].
 */
-RVOP(
-    candi,
-    { rv->X[ir->rs1] &= ir->imm; },
-    GEN({
-        rald, VR0, rs1;
-        alu32imm, 32, 0x81, 4, VR0, imm;
-    }))
+RVOP(candi, { rv->X[ir->rs1] &= ir->imm; })

/* C.SUB */
-RVOP(
-    csub,
-    { rv->X[ir->rd] = rv->X[ir->rs1] - rv->X[ir->rs2]; },
-    GEN({
-        rald2, rs1, rs2;
-        map, VR2, rd;
-        mov, VR1, TMP;
-        mov, VR0, VR2;
-        alu32, 0x29, TMP, VR2;
-    }))
+RVOP(csub, { rv->X[ir->rd] = rv->X[ir->rs1] - rv->X[ir->rs2]; })

/* C.XOR */
-RVOP(
-    cxor,
-    { rv->X[ir->rd] = rv->X[ir->rs1] ^ rv->X[ir->rs2]; },
-    GEN({
-        rald2, rs1, rs2;
-        map, VR2, rd;
-        mov, VR1, TMP;
-        mov, VR0, VR2;
-        alu32, 0x31, TMP, VR2;
-    }))
+RVOP(cxor, { rv->X[ir->rd] = rv->X[ir->rs1] ^ rv->X[ir->rs2]; })

-RVOP(
-    cor,
-    { rv->X[ir->rd] = rv->X[ir->rs1] | rv->X[ir->rs2]; },
-    GEN({
-        rald2, rs1, rs2;
-        map, VR2, rd;
-        mov, VR1, TMP;
-        mov, VR0, VR2;
-        alu32, 0x09, TMP, VR2;
-    }))
+RVOP(cor, { rv->X[ir->rd] = rv->X[ir->rs1] | rv->X[ir->rs2]; })

-RVOP(
-    cand,
-    { rv->X[ir->rd] = rv->X[ir->rs1] & rv->X[ir->rs2]; },
-    GEN({
-        rald2, rs1, rs2;
-        map, VR2, rd;
-        mov, VR1, TMP;
-        mov, VR0, VR2;
-        alu32, 0x21, TMP, VR2;
-    }))
+RVOP(cand, { rv->X[ir->rd] = rv->X[ir->rs1] & rv->X[ir->rs2]; })

/* C.J performs an unconditional control transfer. The offset is sign-extended
 * and added to the pc to form the jump target address.
 * C.J can therefore target a ±2 KiB range.
 * C.J expands to jal x0, offset[11:1].
 */
-RVOP(
-    cj,
-    {
-        PC += ir->imm;
-        struct rv_insn *taken = ir->branch_taken;
-        if (taken) {
+RVOP(cj, {
+    PC += ir->imm;
+    struct rv_insn *taken = ir->branch_taken;
+    if (taken) {
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC))
-                    goto end_op;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC))
+                goto end_op;
+        }
 #endif
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return taken->impl(rv, taken, cycle, PC);
-            }
+        {
+            last_pc = PC;
+            MUST_TAIL return taken->impl(rv, taken, cycle, PC);
        }
-        goto end_op;
-    },
-    GEN({
-        break;
-        jmp, pc, imm;
-        ldimm, TMP, pc, imm;
-        st, S32, TMP, PC;
-        exit;
-    }))
+    }
+    goto end_op;
+})

/* C.BEQZ performs conditional control transfers. The offset is sign-extended
 * and added to the pc to form the branch target address.
 * It can therefore target a ±256 B range. C.BEQZ takes the branch if the
 * value in register rs1' is zero. It expands to beq rs1', x0, offset[8:1].
 */
-RVOP(
-    cbeqz,
-    {
-        if (rv->X[ir->rs1]) {
-            is_branch_taken = false;
-            struct rv_insn *untaken = ir->branch_untaken;
-            if (!untaken)
-                goto nextop;
+RVOP(cbeqz, {
+    if (rv->X[ir->rs1]) {
+        is_branch_taken = false;
+        struct rv_insn *untaken = ir->branch_untaken;
+        if (!untaken)
+            goto nextop;
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC + 2, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC + 2))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC + 2))
-                    goto nextop;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC + 2, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC + 2))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC + 2))
+                goto nextop;
+        }
 #endif
-            PC += 2;
+        PC += 2;
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return untaken->impl(rv, untaken, cycle, PC);
-            }
-
-            goto end_op;
+        {
+            last_pc = PC;
+            MUST_TAIL return untaken->impl(rv, untaken, cycle, PC);
        }
-        is_branch_taken = true;
-        PC += ir->imm;
-        struct rv_insn *taken = ir->branch_taken;
-        if (taken) {
+
+        goto end_op;
+    }
+    is_branch_taken = true;
+    PC += ir->imm;
+    struct rv_insn *taken = ir->branch_taken;
+    if (taken) {
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC))
-                    goto end_op;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC))
+                goto end_op;
+        }
 #endif
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return taken->impl(rv, taken, cycle, PC);
-            }
+        {
+            last_pc = PC;
+            MUST_TAIL return taken->impl(rv, taken, cycle, PC);
        }
-        goto end_op;
-    },
-    GEN({
-        rald, VR0, rs1;
-        cmpimm, VR0, 0;
-        break;
-        setjmpoff;
-        jcc, 0x84;
-        cond, branch_untaken;
-        jmp, pc, 2;
-        end;
-        ldimm, TMP, pc, 2;
-        st, S32, TMP, PC;
-        exit;
-        jmpoff;
-        cond, branch_taken;
-        jmp, pc, imm;
-        end;
-        ldimm, TMP, pc, imm;
-        st, S32, TMP, PC;
-        exit;
-    }))
+    }
+    goto end_op;
+})

/* C.BNEZ */
-RVOP(
-    cbnez,
-    {
-        if (!rv->X[ir->rs1]) {
-            is_branch_taken = false;
-            struct rv_insn *untaken = ir->branch_untaken;
-            if (!untaken)
-                goto nextop;
+RVOP(cbnez, {
+    if (!rv->X[ir->rs1]) {
+        is_branch_taken = false;
+        struct rv_insn *untaken = ir->branch_untaken;
+        if (!untaken)
+            goto nextop;
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC + 2, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC + 2))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC + 2))
-                    goto nextop;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC + 2, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC + 2))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC + 2))
+                goto nextop;
+        }
 #endif
-            PC += 2;
+        PC += 2;
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return untaken->impl(rv, untaken, cycle, PC);
-            }
-
-            goto end_op;
+        {
+            last_pc = PC;
+            MUST_TAIL return untaken->impl(rv, untaken, cycle, PC);
        }
-        is_branch_taken = true;
-        PC += ir->imm;
-        struct rv_insn *taken = ir->branch_taken;
-        if (taken) {
+
+        goto end_op;
+    }
+    is_branch_taken = true;
+    PC += ir->imm;
+    struct rv_insn *taken = ir->branch_taken;
+    if (taken) {
 #if RV32_HAS(JIT)
-            IIF(RV32_HAS(SYSTEM))(block_t *next =, )
-            cache_get(rv->block_cache, PC, true);
-            IIF(RV32_HAS(SYSTEM))(
-                if (next->satp == rv->csr_satp && !next->invalidated), )
-            {
-                if (!set_add(&pc_set, PC))
-                    has_loops = true;
-                if (cache_hot(rv->block_cache, PC))
-                    goto end_op;
-            }
+        IIF(RV32_HAS(SYSTEM))(block_t *next =, )
+        cache_get(rv->block_cache, PC, true);
+        IIF(RV32_HAS(SYSTEM))(
+            if (next->satp == rv->csr_satp && !next->invalidated), )
+        {
+            if (!set_add(&pc_set, PC))
+                has_loops = true;
+            if (cache_hot(rv->block_cache, PC))
+                goto end_op;
+        }
 #endif
 #if RV32_HAS(SYSTEM)
-            if (!rv->is_trapped)
+        if (!rv->is_trapped)
 #endif
-            {
-                last_pc = PC;
-                MUST_TAIL return taken->impl(rv, taken, cycle, PC);
-            }
+        {
+            last_pc = PC;
+            MUST_TAIL return taken->impl(rv, taken, cycle, PC);
        }
-        goto end_op;
-    },
-    GEN({
-        rald, VR0, rs1;
-        cmpimm, VR0, 0;
-        break;
-        setjmpoff;
-        jcc, 0x85;
-        cond, branch_untaken;
-        jmp, pc, 2;
-        end;
-        ldimm, TMP, pc, 2;
-        st, S32, TMP, PC;
-        exit;
-        jmpoff;
-        cond, branch_taken;
-        jmp, pc, imm;
-        end;
-        ldimm, TMP, pc, imm;
-        st, S32, TMP, PC;
-        exit;
-    }))
+    }
+    goto end_op;
+})

/* C.SLLI is a CI-format instruction that performs a logical left shift of
 * the value in register rd then writes the result to rd. The shift amount
 * is encoded in the shamt field. C.SLLI expands into slli rd, rd, shamt[5:0].
 */
-RVOP(
-    cslli,
-    { rv->X[ir->rd] <<= (uint8_t) ir->imm; },
-    GEN({
-        rald, VR0, rd;
-        alu32imm, 8, 0xc1, 4, VR0, uint, 8, imm;
-    }))
+RVOP(cslli, { rv->X[ir->rd] <<= (uint8_t) ir->imm; })

/* C.LWSP */
-RVOP(
-    clwsp,
-    {
-        const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, LOAD, true, 1);
-        rv->X[ir->rd] = MEM_READ_W(rv, addr);
-    },
-    GEN({
-        mem;
-        rald, VR0, rv_reg_sp;
-        ldimms, TMP, mem;
-        alu64, 0x01, VR0, TMP;
-        map, VR1, rd;
-        ld, S32, TMP, VR1, 0;
-    }))
+RVOP(clwsp, {
+    const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, LOAD, true, 1);
+    rv->X[ir->rd] = MEM_READ_W(rv, addr);
+})

/* C.JR */
-RVOP(
-    cjr,
-    {
-        PC = rv->X[ir->rs1];
-        LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE();
-        goto end_op;
-    },
-    GEN({
-        rald, VR0, rs1;
-        mov, VR0, TMP;
-        break;
-        predict;
-        st, S32, TMP, PC;
-        exit;
-    }))
+RVOP(cjr, {
+    PC = rv->X[ir->rs1];
+    LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE();
+    goto end_op;
+})

/* C.MV */
-RVOP(
-    cmv,
-    { rv->X[ir->rd] = rv->X[ir->rs2]; },
-    GEN({
-        rald, VR0, rs2;
-        map, VR1, rd;
-        cond, regneq;
-        mov, VR0, VR1;
-        else;
-        pollute, VR1;
-        end;
-    }))
+RVOP(cmv, { rv->X[ir->rd] = rv->X[ir->rs2]; })

/* C.EBREAK */
-RVOP(
-    cebreak,
-    {
-        rv->compressed = true;
-        rv->csr_cycle = cycle;
-        rv->PC = PC;
-        rv->io.on_ebreak(rv);
-        return true;
-    },
-    GEN({
-        break;
-        ldimm, TMP, pc;
-        st, S32, TMP, PC;
-        call, ebreak;
-        exit;
-    }))
+RVOP(cebreak, {
+    rv->compressed = true;
+    rv->csr_cycle = cycle;
+    rv->PC = PC;
+    rv->io.on_ebreak(rv);
+    return true;
+})

/* C.JALR */
-RVOP(
-    cjalr,
-    {
-        /* Unconditional jump and store PC+2 to ra */
-        const int32_t jump_to = rv->X[ir->rs1];
-        rv->X[rv_reg_ra] = PC + 2;
-        PC = jump_to;
-        LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE();
-        goto end_op;
-    },
-    GEN({
-        /* The register which stores the indirect address needs to be loaded
-         * first to avoid being overriden by other operation.
-         */
-        rald, VR0, rs1;
-        mov, VR0, TMP;
-        map, VR1, rv_reg_ra;
-        ldimm, VR1, pc, 2;
-        break;
-        predict;
-        st, S32, TMP, PC;
-        exit;
-    }))
+RVOP(cjalr, {
+    /* Unconditional jump and store PC+2 to ra */
+    const int32_t jump_to = rv->X[ir->rs1];
+    rv->X[rv_reg_ra] = PC + 2;
+    PC = jump_to;
+    LOOKUP_OR_UPDATE_BRANCH_HISTORY_TABLE();
+    goto end_op;
+})

/* C.ADD adds the values in registers rd and rs2 and writes the result to
 * register rd.
@@ -2672,95 +1547,56 @@ RVOP(
 * the C.JALR and C.EBREAK instructions. The code points with rs2=x0 and rd=x0
 * are HINTs.
 */
-RVOP(
-    cadd,
-    { rv->X[ir->rd] = rv->X[ir->rs1] + rv->X[ir->rs2]; },
-    GEN({
-        rald2, rs1, rs2;
-        map, VR2, rd;
-        mov, VR1, TMP;
-        mov, VR0, VR2;
-        alu32, 0x01, TMP, VR2;
-    }))
+RVOP(cadd, { rv->X[ir->rd] = rv->X[ir->rs1] + rv->X[ir->rs2]; })

/* C.SWSP */
-RVOP(
-    cswsp,
-    {
-        const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, STORE, true, 1);
-        const uint32_t value = rv->X[ir->rs2];
-        MEM_WRITE_W(rv, addr, value);
+RVOP(cswsp, {
+    const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, STORE, true, 1);
+    const uint32_t value = rv->X[ir->rs2];
+    MEM_WRITE_W(rv, addr, value);
 #if RV32_HAS(ARCH_TEST)
-        check_tohost_write(rv, addr, value);
+    check_tohost_write(rv, addr, value);
 #endif
-    },
-    GEN({
-        mem;
-        rald, VR0, rv_reg_sp;
-        ldimms, TMP, mem;
-        alu64, 0x01, VR0, TMP;
-        rald, VR1, rs2;
-        st, S32, VR1, TMP, 0;
-    }))
+})
 #endif

 #if RV32_HAS(EXT_C) && RV32_HAS(EXT_F)
/* C.FLWSP */
-RVOP(
-    cflwsp,
-    {
-        const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1);
-        rv->F[ir->rd].v = MEM_READ_W(rv, addr);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(cflwsp, {
+    const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1);
+    rv->F[ir->rd].v = MEM_READ_W(rv, addr);
+})

/* C.FSWSP */
-RVOP(
-    cfswsp,
-    {
-        const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1);
-        const uint32_t value = rv->F[ir->rs2].v;
-        MEM_WRITE_W(rv, addr, value);
+RVOP(cfswsp, {
+    const uint32_t addr = rv->X[rv_reg_sp] + ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1);
+    const uint32_t value = rv->F[ir->rs2].v;
+    MEM_WRITE_W(rv, addr, value);
 #if RV32_HAS(ARCH_TEST)
-        check_tohost_write(rv, addr, value);
+    check_tohost_write(rv, addr, value);
 #endif
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+})

/* C.FLW */
-RVOP(
-    cflw,
-    {
-        const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1);
-        rv->F[ir->rd].v = MEM_READ_W(rv, addr);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(cflw, {
+    const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, LOAD, false, 1);
+    rv->F[ir->rd].v = MEM_READ_W(rv, addr);
+})

/* C.FSW */
-RVOP(
-    cfsw,
-    {
-        const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
-        RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1);
-        const uint32_t value = rv->F[ir->rs2].v;
-        MEM_WRITE_W(rv, addr, value);
+RVOP(cfsw, {
+    const uint32_t addr = rv->X[ir->rs1] + (uint32_t) ir->imm;
+    RV_EXC_MISALIGN_HANDLER(3, STORE, false, 1);
+    const uint32_t value = rv->F[ir->rs2].v;
+    MEM_WRITE_W(rv, addr, value);
 #if RV32_HAS(ARCH_TEST)
-        check_tohost_write(rv, addr, value);
+    check_tohost_write(rv, addr, value);
 #endif
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+})
 #endif

/* RV32Zba Standard Extension */
@@ -2768,28 +1604,13 @@ RVOP(
 #if RV32_HAS(Zba)

/* SH1ADD */
-RVOP(
-    sh1add,
-    { rv->X[ir->rd] = (rv->X[ir->rs1] << 1) + rv->X[ir->rs2]; },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(sh1add, { rv->X[ir->rd] = (rv->X[ir->rs1] << 1) + rv->X[ir->rs2]; })

/* SH2ADD */
-RVOP(
-    sh2add,
-    { rv->X[ir->rd] = (rv->X[ir->rs1] << 2) + rv->X[ir->rs2]; },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(sh2add, { rv->X[ir->rd] = (rv->X[ir->rs1] << 2) + rv->X[ir->rs2]; })

/* SH3ADD */
-RVOP(
-    sh3add,
-    { rv->X[ir->rd] = (rv->X[ir->rs1] << 3) + rv->X[ir->rs2]; },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(sh3add, { rv->X[ir->rd] = (rv->X[ir->rs1] << 3) + rv->X[ir->rs2]; })
 #endif

@@ -2798,205 +1619,115 @@ RVOP(
 #if RV32_HAS(Zbb)

/* ANDN */
-RVOP(
-    andn,
-    { rv->X[ir->rd] = rv->X[ir->rs1] & (~rv->X[ir->rs2]); },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(andn, { rv->X[ir->rd] = rv->X[ir->rs1] & (~rv->X[ir->rs2]); })

/* ORN */
-RVOP(
-    orn,
-    { rv->X[ir->rd] = rv->X[ir->rs1] | (~rv->X[ir->rs2]); },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(orn, { rv->X[ir->rd] = rv->X[ir->rs1] | (~rv->X[ir->rs2]); })

/* XNOR */
-RVOP(
-    xnor,
-    { rv->X[ir->rd] = ~(rv->X[ir->rs1] ^ rv->X[ir->rs2]); },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(xnor, { rv->X[ir->rd] = ~(rv->X[ir->rs1] ^ rv->X[ir->rs2]); })

/* CLZ */
-RVOP(
-    clz,
-    {
-        if (rv->X[ir->rs1])
-            rv->X[ir->rd] = rv_clz(rv->X[ir->rs1]);
-        else
-            rv->X[ir->rd] = 32;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(clz, {
+    if (rv->X[ir->rs1])
+        rv->X[ir->rd] = rv_clz(rv->X[ir->rs1]);
+    else
+        rv->X[ir->rd] = 32;
+})

/* CTZ */
-RVOP(
-    ctz,
-    {
-        if (rv->X[ir->rs1])
-            rv->X[ir->rd] = rv_ctz(rv->X[ir->rs1]);
-        else
-            rv->X[ir->rd] = 32;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(ctz, {
+    if (rv->X[ir->rs1])
+        rv->X[ir->rd] = rv_ctz(rv->X[ir->rs1]);
+    else
+        rv->X[ir->rd] = 32;
+})

/* CPOP */
-RVOP(
-    cpop,
-    { rv->X[ir->rd] = rv_popcount(rv->X[ir->rs1]); },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(cpop, { rv->X[ir->rd] = rv_popcount(rv->X[ir->rs1]); })

/* MAX */
-RVOP(
-    max,
-    {
-        const int32_t x = rv->X[ir->rs1];
-        const int32_t y = rv->X[ir->rs2];
-        rv->X[ir->rd] = x > y ? rv->X[ir->rs1] : rv->X[ir->rs2];
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(max, {
+    const int32_t x = rv->X[ir->rs1];
+    const int32_t y = rv->X[ir->rs2];
+    rv->X[ir->rd] = x > y ? rv->X[ir->rs1] : rv->X[ir->rs2];
+})

/* MIN */
-RVOP(
-    min,
-    {
-        const int32_t x = rv->X[ir->rs1];
-        const int32_t y = rv->X[ir->rs2];
-        rv->X[ir->rd] = x < y ? rv->X[ir->rs1] : rv->X[ir->rs2];
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(min, {
+    const int32_t x = rv->X[ir->rs1];
+    const int32_t y = rv->X[ir->rs2];
+    rv->X[ir->rd] = x < y ? rv->X[ir->rs1] : rv->X[ir->rs2];
+})

/* MAXU */
-RVOP(
-    maxu,
-    {
-        const uint32_t x = rv->X[ir->rs1];
-        const uint32_t y = rv->X[ir->rs2];
-        rv->X[ir->rd] = x > y ? rv->X[ir->rs1] : rv->X[ir->rs2];
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(maxu, {
+    const uint32_t x = rv->X[ir->rs1];
+    const uint32_t y = rv->X[ir->rs2];
+    rv->X[ir->rd] = x > y ? rv->X[ir->rs1] : rv->X[ir->rs2];
+})

/* MINU */
-RVOP(
-    minu,
-    {
-        const uint32_t x = rv->X[ir->rs1];
-        const uint32_t y = rv->X[ir->rs2];
-        rv->X[ir->rd] = x < y ? rv->X[ir->rs1] : rv->X[ir->rs2];
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(minu, {
+    const uint32_t x = rv->X[ir->rs1];
+    const uint32_t y = rv->X[ir->rs2];
+    rv->X[ir->rd] = x < y ? rv->X[ir->rs1] : rv->X[ir->rs2];
+})

/* SEXT.B */
-RVOP(
-    sextb,
-    {
-        rv->X[ir->rd] = rv->X[ir->rs1] & 0xff;
-        if (rv->X[ir->rs1] & (1U << 7))
-            rv->X[ir->rd] |= 0xffffff00;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(sextb, {
+    rv->X[ir->rd] = rv->X[ir->rs1] & 0xff;
+    if (rv->X[ir->rs1] & (1U << 7))
+        rv->X[ir->rd] |= 0xffffff00;
+})

/* SEXT.H */
-RVOP(
-    sexth,
-    {
-        rv->X[ir->rd] = rv->X[ir->rs1] & 0xffff;
-        if (rv->X[ir->rs1] & (1U << 15))
-            rv->X[ir->rd] |= 0xffff0000;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(sexth, {
+    rv->X[ir->rd] = rv->X[ir->rs1] & 0xffff;
+    if (rv->X[ir->rs1] & (1U << 15))
+        rv->X[ir->rd] |= 0xffff0000;
+})

/* ZEXT.H */
-RVOP(
-    zexth,
-    { rv->X[ir->rd] = rv->X[ir->rs1] & 0x0000ffff; },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(zexth, { rv->X[ir->rd] = rv->X[ir->rs1] & 0x0000ffff; })

/* ROL */
-RVOP(
-    rol,
-    {
-        const unsigned int shamt = rv->X[ir->rs2] & 0b11111;
-        rv->X[ir->rd] =
-            (rv->X[ir->rs1] << shamt) | (rv->X[ir->rs1] >> (32 - shamt));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(rol, {
+    const unsigned int shamt = rv->X[ir->rs2] & 0b11111;
+    rv->X[ir->rd] =
+        (rv->X[ir->rs1] << shamt) | (rv->X[ir->rs1] >> (32 - shamt));
+})

/* ROR */
-RVOP(
-    ror,
-    {
-        const unsigned int shamt = rv->X[ir->rs2] & 0b11111;
-        rv->X[ir->rd] =
-            (rv->X[ir->rs1] >> shamt) | (rv->X[ir->rs1] << (32 - shamt));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(ror, {
+    const unsigned int shamt = rv->X[ir->rs2] & 0b11111;
+    rv->X[ir->rd] =
+        (rv->X[ir->rs1] >> shamt) | (rv->X[ir->rs1] << (32 - shamt));
+})

/* RORI */
-RVOP(
-    rori,
-    {
-        const unsigned int shamt = ir->imm & 0b11111;
-        rv->X[ir->rd] =
-            (rv->X[ir->rs1] >> shamt) | (rv->X[ir->rs1] << (32 - shamt));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(rori, {
+    const unsigned int shamt = ir->imm & 0b11111;
+    rv->X[ir->rd] =
+        (rv->X[ir->rs1] >> shamt) | (rv->X[ir->rs1] << (32 - shamt));
+})

/* ORCB */
-RVOP(
-    orcb,
-    {
-        const uint32_t x = rv->X[ir->rs1];
-        rv->X[ir->rd] = 0;
-        for (int i = 0; i < 4; i++)
-            if (x & (0xffu << (i * 8)))
-                rv->X[ir->rd] |= 0xffu << (i * 8);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(orcb, {
+    const uint32_t x = rv->X[ir->rs1];
+    rv->X[ir->rd] = 0;
+    for (int i = 0; i < 4; i++)
+        if (x & (0xffu << (i * 8)))
+            rv->X[ir->rd] |= 0xffu << (i * 8);
+})

/* REV8 */
-RVOP(
-    rev8,
-    {
-        rv->X[ir->rd] = (((rv->X[ir->rs1] & 0xffU) << 24) |
-                         ((rv->X[ir->rs1] & 0xff00U) << 8) |
-                         ((rv->X[ir->rs1] & 0xff0000U) >> 8) |
-                         ((rv->X[ir->rs1] & 0xff000000U) >> 24));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(rev8, {
+    rv->X[ir->rd] =
+        (((rv->X[ir->rs1] & 0xffU) << 24) | ((rv->X[ir->rs1] & 0xff00U) << 8) |
+         ((rv->X[ir->rs1] & 0xff0000U) >> 8) |
+         ((rv->X[ir->rs1] & 0xff000000U) >> 24));
+})
 #endif

@@ -3005,46 +1736,31 @@ RVOP(
 #if RV32_HAS(Zbc)

/* CLMUL */
-RVOP(
-    clmul,
-    {
-        uint32_t output = 0;
-        for (int i = 0; i < 32; i++)
-            if ((rv->X[ir->rs2] >> i) & 1)
-                output ^= rv->X[ir->rs1] << i;
-        rv->X[ir->rd] = output;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(clmul, {
+    uint32_t output = 0;
+    for (int i = 0; i < 32; i++)
+        if ((rv->X[ir->rs2] >> i) & 1)
+            output ^= rv->X[ir->rs1] << i;
+    rv->X[ir->rd] = output;
+})

/* CLMULH */
-RVOP(
-    clmulh,
-    {
-        uint32_t output = 0;
-        for (int i = 1; i < 32; i++)
-            if ((rv->X[ir->rs2] >> i) & 1)
-                output ^= rv->X[ir->rs1] >> (32 - i);
-        rv->X[ir->rd] = output;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(clmulh, {
+    uint32_t output = 0;
+    for (int i = 1; i < 32; i++)
+        if ((rv->X[ir->rs2] >> i) & 1)
+            output ^= rv->X[ir->rs1] >> (32 - i);
+    rv->X[ir->rd] = output;
+})

/* CLMULR */
-RVOP(
-    clmulr,
-    {
-        uint32_t output = 0;
-        for (int i = 0; i < 32; i++)
-            if ((rv->X[ir->rs2] >> i) & 1)
-                output ^= rv->X[ir->rs1] >> (32 - i - 1);
-        rv->X[ir->rd] = output;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(clmulr, {
+    uint32_t output = 0;
+    for (int i = 0; i < 32; i++)
+        if ((rv->X[ir->rs2] >> i) & 1)
+            output ^= rv->X[ir->rs1] >> (32 - i - 1);
+    rv->X[ir->rd] = output;
+})
 #endif

@@ -3053,91 +1769,51 @@ RVOP(
 #if RV32_HAS(Zbs)

/* BCLR */
-RVOP(
-    bclr,
-    {
-        const unsigned int index = rv->X[ir->rs2] & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] & (~(1U << index));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bclr, {
+    const unsigned int index = rv->X[ir->rs2] & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] & (~(1U << index));
+})

/* BCLRI */
-RVOP(
-    bclri,
-    {
-        const unsigned int index = ir->imm & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] & (~(1U << index));
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bclri, {
+    const unsigned int index = ir->imm & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] & (~(1U << index));
+})

/* BEXT */
-RVOP(
-    bext,
-    {
-        const unsigned int index = rv->X[ir->rs2] & (32 - 1);
-        rv->X[ir->rd] = (rv->X[ir->rs1] >> index) & 1;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bext, {
+    const unsigned int index = rv->X[ir->rs2] & (32 - 1);
+    rv->X[ir->rd] = (rv->X[ir->rs1] >> index) & 1;
+})

/* BEXTI */
-RVOP(
-    bexti,
-    {
-        const unsigned int index = ir->imm & (32 - 1);
-        rv->X[ir->rd] = (rv->X[ir->rs1] >> index) & 1;
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bexti, {
+    const unsigned int index = ir->imm & (32 - 1);
+    rv->X[ir->rd] = (rv->X[ir->rs1] >> index) & 1;
+})

/* BINV */
-RVOP(
-    binv,
-    {
-        const unsigned int index = rv->X[ir->rs2] & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] ^ (1U << index);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(binv, {
+    const unsigned int index = rv->X[ir->rs2] & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] ^ (1U << index);
+})

/* BINVI */
-RVOP(
-    binvi,
-    {
-        const unsigned int index = ir->imm & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] ^ (1U << index);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(binvi, {
+    const unsigned int index = ir->imm & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] ^ (1U << index);
+})

/* BSET */
-RVOP(
-    bset,
-    {
-        const unsigned int index = rv->X[ir->rs2] & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] | (1U << index);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bset, {
+    const unsigned int index = rv->X[ir->rs2] & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] | (1U << index);
+})

/* BSETI */
-RVOP(
-    bseti,
-    {
-        const unsigned int index = ir->imm & (32 - 1);
-        rv->X[ir->rd] = rv->X[ir->rs1] | (1U << index);
-    },
-    GEN({
-        assert; /* FIXME: Implement */
-    }))
+RVOP(bseti, {
+    const unsigned int index = ir->imm & (32 - 1);
+    rv->X[ir->rd] = rv->X[ir->rs1] | (1U << index);
+})
 #endif
diff --git a/tools/gen-jit-template.py b/tools/gen-jit-template.py
deleted file mode 100755
index 6ee1edc1e..000000000
--- a/tools/gen-jit-template.py
+++ /dev/null
@@ -1,367 +0,0 @@
-#!/usr/bin/env python3
-
-"""
-This script serves as a code generator for creating JIT code templates
-based on existing code files in the 'src' directory, eliminating the need
-for writing duplicated code.
-""" - -import re -import sys - -INSN = { - "Zifencei": ["fencei"], - "Zicsr": ["csrrw", "csrrs", "csrrc", "csrrwi", "csrrsi", "csrrci"], - "EXT_M": ["mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"], - "EXT_A": [ - "lrw", - "scw", - "amoswapw", - "amoaddw", - "amoxorw", - "amoandw", - "amoorw", - "amominw", - "amomaxw", - "amominuw", - "amomaxuw", - ], - "EXT_F": [ - "flw", - "fsw", - "fmadds", - "fmsubs", - "fnmsubs", - "fnmadds", - "fadds", - "fsubs", - "fmuls", - "fdivs", - "fsqrts", - "fsgnjs", - "fsgnjns", - "fsgnjxs", - "fmins", - "fmaxs", - "fcvtws", - "fcvtwus", - "fmvxw", - "feqs", - "flts", - "fles", - "fclasss", - "fcvtsw", - "fcvtswu", - "fmvwx", - ], - "EXT_C": [ - "caddi4spn", - "clw", - "csw", - "cnop", - "caddi", - "cjal", - "cli", - "caddi16sp", - "clui", - "csrli", - "csrai", - "candi", - "csub", - "cxor", - "cor", - "cand", - "cj", - "cbeqz", - "cbnez", - "cslli", - "clwsp", - "cjr", - "cmv", - "cebreak", - "cjalr", - "cadd", - "cswsp", - ], - "EXT_FC": [ - "cflwsp", - "cfswsp", - "cflw", - "cfsw", - ], - "SYSTEM": ["sret"], - "Zba": [ - "sh3add", - "sh2add", - "sh1add", - ], - "Zbb": [ - "rev8", - "orcb", - "rori", - "ror", - "rol", - "zexth", - "sexth", - "sextb", - "minu", - "maxu", - "min", - "max", - "cpop", - "ctz", - "clz", - "xnor", - "orn", - "andn", - ], - "Zbc": [ - "clmulr", - "clmulh", - "clmul", - ], - "Zbs": [ - "bseti", - "bset", - "binvi", - "binv", - "bexti", - "bext", - "bclri", - "bclr", - ], -} -EXT_LIST = [ - "Zifencei", - "Zicsr", - "EXT_M", - "EXT_A", - "EXT_F", - "EXT_C", - "SYSTEM", - "Zba", - "Zbb", - "Zbc", - "Zbs", -] -SKIP_LIST = [] -# check enabled extension in Makefile - - -def parse_argv(EXT_LIST, SKIP_LIST): - for argv in sys.argv: - if argv.find("RV32_FEATURE_") != -1: - ext = argv[argv.find("RV32_FEATURE_") + 13 : -2] - if argv[-1:] == "1" and EXT_LIST.count(ext): - EXT_LIST.remove(ext) - for ext in EXT_LIST: - SKIP_LIST += INSN[ext] - if "EXT_F" in EXT_LIST or "EXT_C" in EXT_LIST: - SKIP_LIST += INSN["EXT_FC"] - - -parse_argv(EXT_LIST, SKIP_LIST) -# prepare PROLOGUE -output = "" -f = open("src/rv32_template.c", "r") -lines = f.read() -# remove_comment -lines = re.sub(r"/\*[\s|\S]+?\*/", "", lines) -# remove exception handler -lines = re.sub(r"RV_EXC[\S]+?\([\S|\s]+?\);\s", "", lines) -# collect functions -emulate_funcs = re.findall(r"RVOP\([\s|\S]+?}\)", lines) -codegen_funcs = re.findall(r"GEN\([\s|\S]+?}\)", lines) -op = [] -impl = [] -for i in range(len(emulate_funcs)): - op.append(emulate_funcs[i][5 : emulate_funcs[i].find(",")].strip()) - impl.append(codegen_funcs[i]) - -f.close() - -fields = { - "imm", - "pc", - "rs1", - "rs2", - "rd", - "shamt", - "branch_taken", - "branch_untaken", -} -virt_regs = {"VR0", "VR1", "VR2"} -# generate jit template -for i in range(len(op)): - if not SKIP_LIST.count(op[i]): - output += impl[i][0:4] + op[i] + ", {" - IRs = re.findall(r"[\s|\S]+?;", impl[i][5:]) - # parse_and_translate_IRs - for i in range(len(IRs)): - IR = IRs[i].strip()[:-1] - items = [s.strip() for s in IR.split(",")] - asm = "" - for i in range(len(items)): - if items[i] in fields: - items[i] = "ir->" + items[i] - if items[i] in virt_regs: - items[i] = "vm_reg[" + items[i][-1] + "]" - if items[i] == "TMP": - items[i] = "temp_reg" - if items[0] == "alu32imm": - if len(items) == 8: - asm = "emit_alu32_imm{}(state, {}, {}, {}, ({}{}_t) {});".format( - items[1], - items[2], - items[3], - items[4], - items[5], - items[6], - items[7], - ) - elif len(items) == 7: - asm = ( - "emit_alu32_imm{}(state, {}, {}, {}, {} & 
{});".format( - items[1], - items[2], - items[3], - items[4], - items[5], - items[6], - ) - ) - else: - asm = "emit_alu32_imm{}(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4], items[5] - ) - elif items[0] == "alu64imm": - asm = "emit_alu64_imm{}(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4], items[5] - ) - elif items[0] == "alu64": - asm = "emit_alu64(state, {}, {}, {});".format( - items[1], items[2], items[3] - ) - elif items[0] == "alu32": - asm = "emit_alu32(state, {}, {}, {});".format( - items[1], items[2], items[3] - ) - elif items[0] == "ldimm": - if len(items) == 4: - asm = "emit_load_imm(state, {}, {} + {});".format( - items[1], items[2], items[3] - ) - else: - asm = "emit_load_imm(state, {}, {});".format( - items[1], items[2] - ) - elif items[0] == "ldimms": - if items[2] == "mem": - asm = "emit_load_imm_sext(state, {}, (intptr_t) (m->mem_base + ir->imm));".format( - items[1] - ) - elif len(items) == 4: - asm = "emit_load_imm_sext(state, {}, {} + {});".format( - items[1], items[2], items[3] - ) - else: - asm = "emit_load_imm_sext(state, {}, {});".format( - items[1], items[2] - ) - elif items[0] == "lds": - if items[3] == "X": - asm = "emit_load_sext(state, {}, parameter_reg[0], {}, offsetof(riscv_t, X) + 4 * {});".format( - items[1], items[2], items[4] - ) - else: - asm = "emit_load_sext(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4] - ) - elif items[0] == "rald": - asm = "{} = ra_load(state, {});".format(items[1], items[2]) - elif items[0] == "rald2": - asm = "ra_load2(state, {}, {});".format(items[1], items[2]) - elif items[0] == "rald2s": - asm = "ra_load2_sext(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4] - ) - elif items[0] == "map": - asm = "{} = map_vm_reg(state, {});".format(items[1], items[2]) - elif items[0] == "ld": - if items[3] == "X": - asm = "emit_load(state, {}, parameter_reg[0], {}, offsetof(riscv_t, X) + 4 * {});".format( - items[1], items[2], items[4] - ) - else: - asm = "emit_load(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4] - ) - elif items[0] == "st": - if items[3] == "X": - asm = "emit_store(state, {}, {}, parameter_reg[0], offsetof(riscv_t, X) + 4 * {});".format( - items[1], items[2], items[4] - ) - elif items[3] == "PC" or items[3] == "compressed": - asm = "emit_store(state, {}, {}, parameter_reg[0], offsetof(riscv_t, {}));".format( - items[1], items[2], items[3] - ) - else: - asm = "emit_store(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4] - ) - elif items[0] == "mov": - asm = "emit_mov(state, {}, {});".format(items[1], items[2]) - elif items[0] == "cmp": - asm = "emit_cmp32(state, {}, {});".format(items[1], items[2]) - elif items[0] == "cmpimm": - asm = "emit_cmp_imm32(state, {}, {});".format( - items[1], items[2] - ) - elif items[0] == "jmp": - asm = "emit_jmp(state, {} + {});".format(items[1], items[2]) - elif items[0] == "jcc": - asm = "emit_jcc_offset(state, {});".format(items[1]) - elif items[0] == "setjmpoff": - asm = "uint32_t jump_loc = state->offset;" - elif items[0] == "jmpoff": - asm = "emit_jump_target_offset(state, JUMP_LOC, state->offset);" - elif items[0] == "mem": - asm = "memory_t *m = PRIV(rv)->mem;" - elif items[0] == "call": - asm = "emit_call(state, (intptr_t) rv->io.on_{});".format( - items[1] - ) - elif items[0] == "exit": - asm = "emit_exit(state);" - elif items[0] == "mul": - asm = "muldivmod(state, {}, {}, {}, {});".format( - items[1], items[2], items[3], items[4] - 
-                )
-            elif items[0] == "div":
-                asm = "muldivmod(state, {}, {}, {}, {});".format(
-                    items[1], items[2], items[3], items[4]
-                )
-            elif items[0] == "mod":
-                asm = "muldivmod(state, {}, {}, {}, {});".format(
-                    items[1], items[2], items[3], items[4]
-                )
-            elif items[0] == "cond":
-                if items[1] == "regneq":
-                    items[1] = "vm_reg[0] != vm_reg[1]"
-                asm = "if({})".format(items[1]) + "{"
-            elif items[0] == "else":
-                asm = "} else {"
-            elif items[0] == "end":
-                asm = "}"
-            elif items[0] == "pollute":
-                asm = "set_dirty({}, true);".format(items[1])
-            elif items[0] == "break":
-                asm = "store_back(state);"
-            elif items[0] == "assert":
-                asm = "assert(NULL);"
-            elif items[0] == "predict":
-                asm = "parse_branch_history_table(state, ir);"
-            output += asm + "\n"
-        output += "})\n"
-
-sys.stdout.write(output)