Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions docs/codegen.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# Code Generation

## Overview
rv32emu employs a tiered execution strategy with an interpreter and a two-tier JIT compiler.
The interpreter provides baseline execution while the JIT compiler generates native machine code for hot paths,
significantly improving performance for compute-intensive workloads.

The code generation infrastructure supports both x86-64 and Arm64 host architectures through
an abstraction layer that maps common operations to architecture-specific instructions.

## Execution Tiers
The emulator uses three execution tiers:
1. Interpreter: Direct execution of RISC-V instructions via tail-call threaded dispatch
2. Tier-1 JIT: Template-based native code generation for basic blocks
3. Tier-2 JIT: LLVM-based compilation for frequently executed hot paths

When a basic block reaches a configurable execution threshold, it gets promoted from interpreter to Tier-1 JIT.
Blocks that continue to execute frequently may be further compiled by the Tier-2 LLVM backend for additional optimizations.

## Source File Organization

| File | Purpose |
|------|---------|
| `src/rv32_template.c` | Interpreter instruction implementations using RVOP macro |
| `src/rv32_jit.c` | Tier-1 JIT code generators using GEN macro (included by jit.c) |
| `src/rv32_constopt.c` | IR-level constant folding and optimization |
| `src/jit.c` | Tier-1 JIT infrastructure, emit_* API, and fused instruction handlers |
| `src/t2c.c` | Tier-2 JIT driver (includes t2c_template.c) |
| `src/t2c_template.c` | Tier-2 JIT instruction handlers using T2C_OP macro |
| `src/emulate.c` | Main execution loop, tail-call dispatch, and macro-op fusion |

## Interpreter Implementation
The interpreter uses the `RVOP` macro to define instruction handlers.
Each handler receives the emulator state and decoded instruction, then performs the operation directly in C:
```c
RVOP(name, { body })
```

This expands to a function that:
- Receives: rv (emulator state), ir (decoded instruction), cycle (counter), PC
- Returns: bool indicating whether to continue execution

Example implementation for the addi instruction:
```c
RVOP(addi, { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; })
```

The interpreter uses Tail Call Threaded Code (TCTC) for efficient instruction dispatch.
Each handler ends with a tail call (using the `musttail` attribute) to the next instruction's handler,
avoiding function call overhead and enabling better branch prediction compared to switch-based dispatch.

## Tier-1 JIT Code Generation
The Tier-1 JIT compiler uses the GEN macro to define native code generators:
```c
GEN(name, { body })
```

Each generator emits native machine code that performs the equivalent operation using host CPU instructions.
The `emit_*` functions append bytes to the JIT buffer.

Example: The addi instruction generates a host instruction sequence like:
```
mov VR0, [memory address of (rv->X + rs1)]
mov VR1, VR0
add VR1, imm
```

Note: This is conceptual assembly.
Actual instruction generation depends on the dynamic register allocator state and whether source registers are already mapped.

## Register Allocation (Tier-1)
The Tier-1 JIT maintains a register mapping between RISC-V and host registers:

| Register | Purpose |
|----------|---------|
| `vm_reg[0..2]` | Host registers mapped to VM registers for current operation |
| `temp_reg` | Scratch register for intermediate calculations |
| `parameter_reg[0]` | Points to the `riscv_t` structure |

Register allocation is performed dynamically.
When a RISC-V register value is needed,
the allocator either returns an already-mapped host register or loads the value from memory into an available host register.

## Tier-2 JIT Compilation
The Tier-2 JIT uses LLVM to compile frequently executed blocks into highly optimized native code.
Instruction handlers are defined in `src/t2c_template.c` using the `T2C_OP` macro:
```c
T2C_OP(name, { body })
```

Each handler translates the RISC-V instruction semantics into LLVM IR using the LLVM C API.
LLVM then applies its optimization passes and register allocation,
producing native code that typically outperforms Tier-1 for hot paths.

Tier-2 compilation requires LLVM 18 and is enabled with `ENABLE_JIT=1` at build time.

## IR Optimization
Before execution or JIT compilation,
the emulator performs optimization passes on the internal instruction representation (IR).
Defined in `src/rv32_constopt.c`,
these passes analyze basic blocks to identify opportunities for constant propagation and folding.
For example, a `lui` followed by `addi` to construct a 32-bit constant can be optimized to reduce runtime computation.
This optimization benefits both the interpreter and JIT tiers.

## Code Generation Macros (Tier-1)
Helper macros reduce code duplication for common Tier-1 instruction patterns:

| Macro | Instructions |
|-------|--------------|
| `GEN_BRANCH` | beq, bne, blt, bge, bltu, bgeu |
| `GEN_CBRANCH` | cbeqz, cbnez (compressed) |
| `GEN_ALU_IMM` | addi, xori, ori, andi |
| `GEN_ALU_REG` | add, sub, xor, or, and |
| `GEN_SHIFT_IMM` | slli, srli, srai |
| `GEN_SHIFT_REG` | sll, srl, sra |
| `GEN_SLT_IMM` | slti, sltiu |
| `GEN_SLT_REG` | slt, sltu |
| `GEN_LOAD` | lb, lh, lw, lbu, lhu |
| `GEN_STORE` | sb, sh, sw |

Each macro encapsulates the common code generation pattern for its instruction class,
taking only the instruction-specific parameters (opcode, condition code, etc.).

## Dual-Architecture Support
The JIT supports both x86-64 and Arm64 through an abstraction layer in jit.c.
Constants use x86-64 encoding values but serve as symbolic identifiers on both architectures.
The `emit_*` functions contain architecture-specific implementations guarded by preprocessor conditionals:
```c
#if defined(__x86_64__)
// x86-64 specific code generation
#elif defined(__aarch64__)
// Arm64 specific code generation
#endif
```

For example, jump condition codes (`JCC_JE`, `JCC_JNE`, etc.) match x86-64 Jcc opcodes
but are mapped to equivalent Arm64 condition codes by `emit_jcc_offset()`.

## Emit API (Tier-1)
The `emit_*` functions provide the low-level interface for Tier-1 machine code generation:

| Function | Description |
|----------|-------------|
| `emit_alu32_imm32` | ALU operation with 32-bit immediate |
| `emit_alu32_imm8` | ALU operation with 8-bit immediate |
| `emit_alu32` | ALU operation between registers |
| `emit_load_imm` | Load immediate value into register |
| `emit_load` / `emit_store` | Memory access operations |
| `emit_mov` | Register-to-register move |
| `emit_cmp32` / `emit_cmp_imm32` | Comparison operations |
| `emit_jcc_offset` | Conditional jump |
| `emit_jmp` | Unconditional jump |
| `emit_call` | Function call to runtime helper |

## Block Chaining
Translated blocks can be chained together to avoid returning to the dispatcher between consecutive blocks.
When a block ends with a direct branch to another translated block,
the JIT patches the branch target to jump directly to the target block's native code.

Block chaining is controlled by the `ENABLE_BLOCK_CHAINING` configuration option.

## Macro-op Fusion
The emulator fuses common instruction sequences into single operations,
benefiting both the interpreter and JIT tiers. Fusion is implemented in `src/emulate.c`
via `match_pattern` and `do_fuse*` handlers, which transform the IR before execution.

For example, a lui followed by addi to construct a 32-bit constant can be fused into a single load-immediate operation.
The fused instructions are then handled by dedicated handlers in each tier:
- Interpreter: `do_fuse*` functions in `src/emulate.c`
- Tier-1 JIT: `do_fuse*` functions in `src/jit.c`
- Tier-2 JIT: `T2C_OP(fuse*, ...)` handlers in `src/t2c_template.c`

Macro-op fusion is controlled by the `ENABLE_MOP_FUSION` configuration option.

## Memory Access and MMIO
Load and store instructions check for memory-mapped I/O regions when system emulation is enabled.
The `GEN_LOAD` and `GEN_STORE` macros include conditional code generation for MMIO handling:
```c
IIF(RV32_HAS(SYSTEM_MMIO))(
// MMIO check and handler call
,
// Direct memory access
)
```

When MMIO is not enabled, the generated code performs direct memory access
without the overhead of region checking.
2 changes: 1 addition & 1 deletion src/emulate.c
Original file line number Diff line number Diff line change
Expand Up @@ -618,7 +618,7 @@ static uint32_t peripheral_update_ctr = 64;
} while (0)
#endif

#define RVOP(inst, code, asm) \
#define RVOP(inst, code) \
static PRESERVE_NONE bool do_##inst(riscv_t *rv, const rv_insn_t *ir, \
uint64_t cycle, uint32_t PC) \
{ \
Expand Down
29 changes: 14 additions & 15 deletions src/jit.c
Original file line number Diff line number Diff line change
Expand Up @@ -875,31 +875,31 @@ static inline void emit_jcc_offset(struct jit_state *state, int code)
{
#if defined(__x86_64__)
/* unconditional jump instruction does not have 0x0f prefix */
if (code != 0xe9)
if (code != JCC_JMP)
emit1(state, 0x0f);
emit1(state, code);
emit4(state, 0);
#elif defined(__aarch64__)
switch (code) {
case 0x84: /* BEQ */
case JCC_JE: /* BEQ */
code = COND_EQ;
break;
case 0x85: /* BNE */
case JCC_JNE: /* BNE */
code = COND_NE;
break;
case 0x8c: /* BLT */
case JCC_JL: /* BLT */
code = COND_LT;
break;
case 0x8d: /* BGE */
case JCC_JGE: /* BGE */
code = COND_GE;
break;
case 0x82: /* BLTU */
case JCC_JB: /* BLTU */
code = COND_LO;
break;
case 0x83: /* BGEU */
case JCC_JAE: /* BGEU */
code = COND_HS;
break;
case 0xe9: /* AL */
case JCC_JMP: /* AL */
code = COND_AL;
break;
default:
Expand Down Expand Up @@ -1194,7 +1194,7 @@ static inline void emit_jmp(struct jit_state *state,
uint32_t target_satp UNUSED)
{
#if defined(__x86_64__)
emit1(state, 0xe9);
emit1(state, JCC_JMP);
emit_jump_target_address(state, target_pc, target_satp);
#elif defined(__aarch64__)
assert(state->n_jumps < MAX_JUMPS);
Expand Down Expand Up @@ -1240,7 +1240,7 @@ static inline void emit_call(struct jit_state *state, intptr_t target)
static inline void emit_exit(struct jit_state *state)
{
#if defined(__x86_64__)
emit1(state, 0xe9);
emit1(state, JCC_JMP);
emit_jump_target_offset(state, state->offset, state->exit_loc);
emit4(state, 0);
#elif defined(__aarch64__)
Expand Down Expand Up @@ -1293,7 +1293,7 @@ static void divmod(struct jit_state *state,
if (sign) {
/* handle overflow */
uint32_t jump_loc_0 = state->offset;
emit_jcc_offset(state, 0x85);
emit_jcc_offset(state, JCC_JNE);
emit_cmp_imm32(state, rm, -1);
if (mod)
emit_load_imm(state, R10, 0);
Expand Down Expand Up @@ -1402,7 +1402,7 @@ static void muldivmod(struct jit_state *state,
/* handle DIV overflow */
emit1(state, 0x9d); /* popfq */
uint32_t jump_loc_0 = state->offset;
emit_jcc_offset(state, 0x85);
emit_jcc_offset(state, JCC_JNE);
emit_cmp_imm32(state, RCX, 0x80000000);
emit_conditional_move(state, RCX, RAX);
emit_jump_target_offset(state, JUMP_LOC_0, state->offset);
Expand All @@ -1417,7 +1417,7 @@ static void muldivmod(struct jit_state *state,
/* handle REM overflow */
emit1(state, 0x9d); /* popfq */
uint32_t jump_loc_0 = state->offset;
emit_jcc_offset(state, 0x85);
emit_jcc_offset(state, JCC_JNE);
emit_cmp_imm32(state, RCX, 0x80000000);
emit_load_imm(state, RCX, 0);
emit_conditional_move(state, RCX, RDX);
Expand Down Expand Up @@ -1832,7 +1832,6 @@ static int liveness_cmp(const void *l, const void *r)
return 0;
}

/* TODO: this function could be generated by "tools/gen-jit-template.py" */
static inline void liveness_calc(block_t *block)
{
uint32_t idx;
Expand Down Expand Up @@ -2316,7 +2315,7 @@ void parse_branch_history_table(struct jit_state *state,
emit_load_imm(state, register_map[0].reg_idx, bt->PC[max_idx]);
emit_cmp32(state, temp_reg, register_map[0].reg_idx);
uint32_t jump_loc_0 = state->offset;
emit_jcc_offset(state, 0x85);
emit_jcc_offset(state, JCC_JNE);
#if RV32_HAS(SYSTEM)
emit_jmp(state, bt->PC[max_idx], bt->satp[max_idx]);
#else
Expand Down
13 changes: 13 additions & 0 deletions src/jit.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,19 @@
#include "riscv_private.h"
#include "utils.h"

/* Jump condition codes for branch instruction code generation.
* Values match x86-64 Jcc opcodes; on Arm64, emit_jcc_offset() maps these
* to equivalent condition codes. Used by both architectures as symbolic
* constants for conditional/unconditional jumps.
*/
#define JCC_JE 0x84 /* Jump if Equal (conditional) */
#define JCC_JNE 0x85 /* Jump if Not Equal (conditional) */
#define JCC_JL 0x8c /* Jump if Less - signed (conditional) */
#define JCC_JGE 0x8d /* Jump if Greater or Equal - signed (conditional) */
#define JCC_JB 0x82 /* Jump if Below - unsigned (conditional) */
#define JCC_JAE 0x83 /* Jump if Above or Equal - unsigned (conditional) */
#define JCC_JMP 0xe9 /* Jump unconditional */

struct jump {
uint32_t offset_loc;
uint32_t target_pc;
Expand Down
Loading
Loading