Skip to content

RISC-V | TL-Verilog | Pipe-lining | Micro-Architecture | Makechip

Notifications You must be signed in to change notification settings

emillal/RISC-V_based_MYTH_IIITB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 

Repository files navigation

RISC-V based MYTH

This repository summarises work done in ASIC class on the RISC-V workshop mentored by Kunal Ghosh of VSD Corp. Pvt. Ltd.

Day 1-Introduction to RISC-V ISA And GNU compiler toolchain

Introduction
RISC-V is an open-source instruction set architecture (ISA) for computer processors.
An instruction set architecture defines the set of instructions that a processor can execute and the organization and behaviour of those instructions. RISC-V is unique in that any single company or organization does not own it. and it is freely available for anyone to use, modify, and implement without the need for licensing fees or proprietary restrictions.

The RISC-V project began at the University of California, Berkeley in 2010, and it has since gained significant traction in both academia and industry. Its open nature has led to a growing ecosystem of hardware and software developers collaborating to create a wide range of products, from simple embedded devices to high-performance supercomputers.

Screenshot from 2023-08-19 12-10-50


Application software (apps) and hardware are linked by 'system software'.There are various layers of **system software**. This includes major components like Compiler and Assembler.

The compiler compiles high-level codes like C and C++ to Instructions(eg: the codes inside .exe files) that can be read by the Assembler.
The Assembler converts it into binary codes which the machine can understand. The instructions act as an interface between the high-level language and the machine language.

The converted binary is then given to an RTL snippet that understands the instruction. This is done by a Hardware Description Language (HDL).
This is basically called RTL implementation and a netlist is being generated. with this, a physical design implementation of the design is generated.
Lab 1 : RISC-V software toolchain | GCC Compile and Disassemble
First, let us write a basic C program to find the sum of n numbers.
#include <stdio.h>
int main(){
int n = 100,sum=0,i;
for(i=0;i<=n;i++)
{
    sum= sum +i;
}
printf("The sum of %d consecutive numbers is :%d \n",n,sum);
return 0;
}

We need to install the necessary Rriscv_workshop_collateralsISC-V toolchain for RISC-V activities.
Steps to set up the toolchain:

  1. git clone https://github.com/kunalg123/riscv_workshop_collaterals.git using terminal

  2. go to the riscv_workshop_collaterals folder.

  3. On terminal chmod 755 run.sh and then ./run.sh

  4. During the installation, an error might pop up at the end. Ignore it for now.

  5. The riscv_toolchain folder will be on the home.

  6. change directory(cd) to /riscv_toolchain/iverilog/

  7. Then do the following commands:

    git checkout --track -b v10-branch origin/v10-branch
    git pull 
    chmod 777 autoconf.sh 
    ./autoconf.sh 
    ./configure 
    make
    sudo make install
    
  8. We need to set the PATH variable in .bashrc.Do the following commands.

    gedit .bashrc
    **within .bashc file**
    export PATH="/home/<username>/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin:$PATH"
    **within .bashrc file **
    source .bashrc
    

So to compile the .c file with the RISC-V compiler tool. we are using the below command.

riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o <filename>.o <filename>.c

To view the assembly code for the same, do the following command.

riscv64-unknown-elf-objdump -d <filename>.o

Screenshot from 2023-08-19 07-51-55

To view the detailed code do the following command.

riscv64-unknown-elf-objdump -d <filename>.o | less

To find main we type /main in the : and press the 'N' button.

Screenshot from 2023-08-19 07-59-21

In the above screenshot, we can see the memory address for the instructions. Where it starts and where another one begins.
If we subtract '00000000000101c0'(end of main) and '0000000000010184'(beginning of main) and then divide by 4 we get 15. Which is the number of instructions within that particular block(main).

Now let's execute the below commands:

riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o <filename>.o <filename>.c
riscv64-unknown-elf-objdump -d <filename>.o | less

Screenshot from 2023-08-19 08-14-21

In the above screenshot, if we subtract '00000000000100e0'(end of main) and '00000000000100b0'(beginning of main) and then divide by 4 we get 12. Which is the number of instructions within that particular block(main).

Lab 2 : Spike Simulation and Debug
In this lab, we are going to Debug the '.o' file that we generated using the RISC-V compiler.
For that, we use the following command:
```
spike pk <filename>.o // to give an output
spike -d pk <filename>.o //open debugger
```

The debugger mode will be open.
We use the until command to move to a particular address.
Here we are debugging the highlighted instructions.

Screenshot from 2023-08-19 09-49-10

The following commands are used in the debugger:

until pc 0 100b0 //moves the program counter(PC) to the address
reg 0 a2    //views content of the address
**We press enter to go to the next instruction**

Screenshot from 2023-08-19 09-49-25

In the above screenshot, we can see the register pertaining to a particular instruction getting updated.

Integer number representation
Integer number representation refers to the method used to represent whole numbers (integers) within a computer's memory or processor. There are different ways to represent integers in binary form, which is the fundamental language of computers.
On a high level, we recognize numbers as decimals but computers recognize them as binary (1's or 0's). So binary conversion is key here.

Unsigned Numbers


Unsigned numbers are a type of integer representation that only includes non-negative integers. These numbers do not have a sign bit to indicate whether they are positive or negative; they represent values greater than or equal to zero. In binary representation, all the bits are used to represent the magnitude of the number, and there is no need to allocate a bit for the sign.

Unsigned numbers are often used in situations where negative values are not relevant or meaningful. They can represent quantities, indices, counts, and other values that are always positive or zero.

Screenshot from 2023-08-19 11-56-34

It is important to note that:

  • 1 byte is 8 bits
  • 1 word is 4 bytes
  • 2 words (double word) is 8 bytes

So A word is 32 bits and a double word is 16 bits.

### 2 bit
the total number of representations = **2^2=4**
0 -> (2^2 -1)

### 4 bit
the total number of representations = **2^4=16**
0 -> (2^4 -1)

Similarly, we can follow the same for the **RISC-V** which has 64-bit Architecture.
the total number of representations = **2^16**
0 -> (2^16 -1)

Signed Numbers


Signed numbers are a type of integer representation that includes both positive and negative integers. In computer systems, signed numbers are represented using various methods to indicate the sign and magnitude of the number.
Two's complement is the most widely used method for representing signed integers in computers. In this method, the leftmost bit (the most significant bit) is the sign bit. A value of 0 in the sign bit represents a positive number, and a value of 1 represents a negative number. The remaining bits represent the magnitude of the number in binary form.

To negate a number in two's complement, you invert all the bits (change 0s to 1s and vice versa) and then add 1 to the result. This method simplifies arithmetic operations and eliminates the need for a separate subtraction circuit.
Lab 3 : Signed and unsigned

LAB 3A

Here we are going to execute the following code.
The output is given for unsigned numbers and we are just viewing if it is within the range or it goes out in which case displays either the minimum or maximum values.


#include <stdio.h>
#include <math.h>
int main()
{
    unsigned long long int max = (unsigned long long int)(pow(2,64)-1);       //statement 1  //will display the max number
    //unsigned long long int max = (unsigned long long int)(pow(2,127)-1);    //statement 2  //(out of range) will display the max number(within range)
    //unsigned long long int max = (unsigned long long int)(pow(2,10)-1);     //statement 3  //will display number (since it is within range)
    //unsigned long long int max = (unsigned long long int)(pow(2,64) * -1);  //statement 4  //will display 0 since it is a -ve number
    printf("highest number represented by unsigned long long int is %llu\n",max);
    return 0;
}

The output is given below in sequential order.
Screenshot from 2023-08-19 13-10-09

LAB 3B

Here we are going to execute the following code.

#include <stdio.h>
#include <math.h>
int main()
{
    long long int max = (long long int)(pow(2,63)-1);        //will display the max number
    long long int min = (long long int)(pow(2,63) * -1);     //will display the min number
    printf("highest number represented by long long int is %lld\n",max);
    printf("lowest number represented by long long int is %lld\n",min);
    return 0;
}

The below screenshot shows the output of the same.

Screenshot from 2023-08-19 14-41-36

Day 2-Introduction to Application Binary Interface And Basic Error Flow

Introduction to ABI and verification Flows
The Application Binary Interface (ABI) is a set of rules and conventions that dictate how binary code communicates and interacts with other binary code, usually across different components of a software system or even across different software systems. In simpler terms, it defines how functions are called, how data is organized, and how components cooperate at the binary level.

There are multiple layers between the Application program and the Hardware. One of these layers is the ABI.

Screenshot from 2023-08-19 15-20-31
ABIs are crucial for enabling interoperability between different software components, whether they're compiled by the same or different compilers, or even running on different hardware architectures. Operating systems, libraries, and various programming languages need to adhere to a specific ABI to ensure that their binary components can work together seamlessly.

What is XLEN?


'XLEN' typically refers to the X Register Length. The XLEN value represents the number of bits in the general-purpose registers, which are used for storing data and performing computations. In a RISC-V system with a 32-bit XLEN, the general-purpose registers would be 32 bits wide.
Similarly, in a RISC-V system with a 64-bit XLEN, the general-purpose registers would be 64 bits wide. The XLEN value has a significant impact on the performance, memory-addressing capabilities, and overall capabilities of the processor.
A larger XLEN allows for more addressable memory space and potentially more complex computations. However, it also comes with increased hardware. The XLEN value has a significant impact on the performance, memory-addressing capabilities, and overall capabilities of the processor.

There are two ways in which we can load data into the 64-bit register.

  • Can be directly loaded
  • Can be loaded into memory and then loaded to reg

RISC-V follows the little-endian condition.
Little Endian is a byte order or endianness used in computer architecture to represent multi-byte data types such as integers and floating-point numbers in memory.
In a Little Endian system, the least significant byte (LSB) of a multi-byte value is stored at the lowest memory address, while the most significant byte (MSB) is stored at the highest memory address.

RISC-V little-endian is illustrated by the figure below:

Screenshot from 2023-08-19 15-44-13

Load,Add and Store Instructions

Firstly let us load an array 'M' of 3 double-word as shown in the below figure.

Screenshot from 2023-08-19 15-59-44
Here let us load it into the 'x8' register. let's say reg 'x23' contains the base address of array M and it is '0'.
as in it starts at 0. Let us consider the instruction below,ld(load doubleword).

ld   x8, 16(x23)

breaking down the instruction:

  • ld: load double
  • x8: destination register 'rd'
  • 16: offset immediate
  • x23: Souce register 'rs1'

Here the immediate offset gets added to the contents of the source register to form the final address and loads the value from '16' in the figure above into the x8 reg.

Screenshot from 2023-08-19 16-12-32
The diagram above represents the instruction(I-type).
It is important to note that even though the register size is of 64-bit in RV64 the instruction size is 32-bit.
The opcode and the funct3 determine the type of instruction, whether it is an 'ld' or 'addi' etc.
rs1 and rd are the source and destination registers respectively.
The immediate bits will contain the offset.

Let us look at another instruction,add.

add x8, x24,x8

The structure for the above command is given below (R-type), unlike the one before here there are two source registers instead of one and another funct register also.

Screenshot from 2023-08-19 16-23-38

There is another instruction going by sd(store doubleword).

sd x8, 8(x23)

Since there is a limited availability of registers we need to store values to registers after loading and doing operations.
The Instruction structure will look like so(S-type):
Screenshot from 2023-08-19 16-35-09

Instruction set in RISC-V

There are mainly three types of Instructions:

  1. I-type: I-type instructions are used for operations that involve an immediate value (constant) and a register.
  2. R-type: R-type instructions are used for operations that involve two source registers and a destination register.
    These instructions typically perform arithmetic, logical, or bitwise operations.
  3. S-type: S-type instructions are used for memory store operations.
    These instructions store data from a source register in memory at an address determined by an offset from another register.
LAB work using ABI function calls
For this lab, we are going to create a C program and simulate it with a Function call.
We are going to be following the below algorithm.

Screenshot from 2023-08-19 18-27-06
The C program used is:

#include <stdio.h>
extern int load(int x,int y);
int main()
 {
 	int result = 0;
	int count =9;
 	result = load(0x0,count+1);
 	printf("Sum of numbers from 1 to %d is %d\n",count,result);
 }

We also create a load.S file as below:

.section .text
.global load
.type load, @function

load: 
     add   a4,a0,zero    //initialize sum register a4 with 0x0
     add   a2,a0,a1      //store count of 10 in reg a. reg a1 is loaded with 0xa(decimal 10) from main
     add   a3,a0,zero    //initialize intermediate sum reg a3 by 0x0

loop:
 add   a4,a3,a4     // Incremental addition
     addi  a3,a3,1      // Increment intermediate register by 1
     blt   a3,a2,loop   // If a3 is less than a2,branch to label <loop> 
     add   a0,a4,zero   // store final result to reg a0 so that it can be read by main pgm
     ret

We do the following commands in the terminal:

riscv64-unknown-elf-gcc -ofast -mabi=lp64 -march=rv64i -o <filename>.o <filename>.c load.S
spike pk <filename>.o
riscv64-unknown-elf-objdump -d <filename>.o | less

The output screenshots are shown below:
Screenshot from 2023-08-19 18-34-28 Screenshot from 2023-08-19 18-35-44

Basic verification flow using verilog
In this lab, we are basically going to generate a hex file and a bitstream of the same code done above.
We run the below code to generate the same.
riscv64-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o <filename>.o <filename>.c
riscv64-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o load.o load.S

riscv64-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o syscalls.o syscalls.c
riscv64-unknown-elf-gcc -mabi=ilp32 -march=rv32im -Wl,--gc-sections -o firmware.elf load.o <filename>.o syscalls.o -T riscv.ld -lstdc++
chmod -x firmware.elf
riscv64-unknown-elf-gcc -mabi=ilp32 -march=rv32im -nostdlib -o start.elf start.S -T start.ld -lstdc++
chmod -x start.elf
riscv64-unknown-elf-objcopy -O verilog start.elf start.tmp
riscv64-unknown-elf-objcopy -O verilog firmware.elf firmware.tmp
cat start.tmp firmware.tmp > firmware.hex
python3 hex8tohex32.py firmware.hex > firmware32.hex
rm -f start.tmp firmware.tmp
iverilog -o testbench.vvp testbench.v picorv32.v
chmod -x testbench.vvp
vvp -N testbench.vvp

The file firmware.hex is the hex file and firmware32.hex is the bitstream generated.
The below file is firmware.hex.
Screenshot from 2023-08-19 19-03-24

The below file is firmware32.hex (bitstream).
Screenshot from 2023-08-19 19-03-42

Day 3-Introduction to TL Verilog and Makerchip

Digital logic with TL-Verilog and Makerchip

In this part of the workshop, we are going to look at:

  1. Logic gates
  2. MakerChip platform(IDE)
  3. Combinational Logic
  4. Sequential Logic
  5. Piplining logic
  6. Slate

Introduction to logic gates

Logic gates are fundamental building blocks of digital circuits. They are electronic devices that perform basic logical operations on one or more binary inputs (usually 0 or 1) to produce a single binary output. These gates are the foundation of all digital systems, including computers, microcontrollers, and other digital devices. Logic gates are typically implemented using electronic components such as transistors.

The most common logic gates are:

  1. NOT
  2. AND
  3. OR
  4. NAND
  5. NOR
  6. XOR
  7. XNOR

Screenshot from 2023-08-19 21-30-21

Makerchip

Makerchip is an online platform that provides an integrated development environment (IDE) for digital design and verification using SystemVerilog and TL Verilog. It allows engineers, students, and enthusiasts to design and simulate digital circuits, develop RTL (Register Transfer Level) code, and explore hardware design concepts without requiring the local installation of tools.

To Familiarize oneself with the IDE there are tutorials on the platform where we can experiment.

Combinational Logic

For this, I have started with a basic inverter logic program, logical operations and a multi-bit mux program. Here we are using Transitional Level Verilog(TL-V), which is quite different from standard verilog code, syntax varies quite a bit. There is no need to declare $out and $in unlike Verilog. There is also no need to assign $in. A random stimulus is provided, and a warning is produced.

Attaching output of the inverter in Makerchip IDE for reference:
invertor

Now let us do a bit more of a complex combinational circuit. Let us create the following circuit.
Below shown is a calculator which uses a mux.

combi_calc1

The code for this here and the output is shown as below.

combi

Sequential Logic

Sequential Circuits will always use a clock. The basic element required for a circuit to be sequential is a D-FF. The DFF transitions
to the next state on the positive edge of the clock.
A basic circuit design implementing a Fibonacci series is done on Makerchip.
Also, a free-running counter having the following design is implemented.

freerncntr
Right-click open the code to get the tlv code for the free running counter.

The output for the same is shown below.
feecnte

LAB Sequential Calculator
Here we are going to implement a FF to the combinational calculator we did before.
The design of the sequential calculator is like so:

seqcalc0
The output of the previous clock cycle is assigned to the input1.
The tlv code is available here.

The output for the same is shown below.
seqcalc

Pipeling Logic
Here firstly we are trying to make the following pipeline and see the output waveform.

err
Click to view the code.
The output in Makerchip is shown below.

error

Two-cycle calculator

We are required to write code and view the waveform of the below design.

calc_pipeline
As shown above, the 2-cycle calculator clears the output alternatively and the output of given inputs is observed at the next cycle.
Click to view the code.

The output on Makerchip is shown below.

cyclepipe

Validity
First, we shall see a distance accumulator coupled with a Pythagorean pipeline as shown below.

validity_fig
Click to view the code for the Distance accumulator.

The output for it generated in MakerChip is given below.
validity_ss

Cycle Calculator with Validity

The following design is what we are required to create.
one

Click here to open the code

The output in Makerchip is given below.
oneout

Cycle Calculator with Validity and memory

The design we have to implement is given below.
memory1

Click here to open code.

The output in Makerchip is given below.

memory2

Wrapping up
Here we are just going to look at some special codes.

Conways Game of life

The Game of Life, also known simply as "Life," is a cellular automaton devised by mathematician John Conway in 1970. It's not a traditional game with players,
but rather a simulation that follows a set of rules to create patterns and behaviours.

The Game of Life takes place on an infinite grid of cells, each of which can be in one of two states: alive or dead. The state of each cell evolves over discrete time steps based on its current state and the states of its eight neighbouring cells. The evolution is determined by the following rules:

  1. Any live cell with fewer than two live neighbours dies as if caused by underpopulation.
  2. Any live cell with two or three live neighbours survives to the next generation.
  3. Any live cell with more than three live neighbours dies, as if by overpopulation.
  4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

game of life

Pythagoras's theorem

Screenshot from 2023-08-20 21-14-23
The output on Makechip is:
pygorus

Day 4-Basic RISC-V CPU micro-architecture

Introduction to RISC-V microacrchitecture
Microarchitecture refers to the internal design and organization of a CPU that implements a particular ISA.
RISC-V CPUs can have different microarchitectures that optimize for various aspects such as performance, power efficiency, and area (size of the chip).
Here are some key features and concepts commonly found in RISC-V microarchitecture:
  1. Instruction Fetch (IF)
  2. Instruction Decode (ID)
  3. Execution Units

It's important to note that RISC-V is an instruction set architecture, and microarchitectures based on RISC-V can vary widely depending on the design goals of the processor manufacturer. Different companies and research institutions may develop their own microarchitectures that implement the RISC-V ISA in unique ways, tailored to specific use cases and performance goals.

Here we are designing the basic processor of 3 stages fetch, decode and execute based on RISC-V ISA.
For starting the implementation a starter code is present in the below link.

https://github.com/stevehoover/RISC-V_MYTH_Workshop

The block diagram of a basic RISC-V microarchitecture is shown in the figure below.

Screenshot from 2023-08-21 14-22-08


Using the Makerchip platform the implementation of the RISC-V microarchitecture or core is done.
For starting the implementation a starter code present here is used. The starter code shell consists of:
  1. RISC-V Assembler
  2. Test program
  3. Visualization(Viz)
  4. Commented code for register, file and memory.
Fetch and Decode
Here we are going to design a simple RISC-V Core CPU as per the below diagram.

1

The sample template is given below.

\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   
   m4_include_lib(['https://raw.githubusercontent.com/BalaDhinesh/RISC-V_MYTH_Workshop/master/tlv_lib/risc-v_shell_lib.tlv'])

\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
\TLV

   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   //  r10 (a0): In: 0, Out: final sum
   //  r12 (a2): 10
   //  r13 (a3): 1..10
   //  r14 (a4): Sum
   // 
   // External to function:
   m4_asm(ADD, r10, r0, r0)             // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0)            // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010)         // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0)            // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14)           // Incremental addition
   m4_asm(ADDI, r13, r13, 1)            // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0)            // Store final result to register a0 so that it can be read by main program
   
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)

   |cpu
      @0
         $reset = *reset;



      // YOUR CODE HERE
      // ...

      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      //       be sure to avoid having unassigned signals (which you might be using for random inputs)
      //       other than those specifically expected in the labs. You'll get strange errors for these.

   
   // Assert these to end simulation (before Makerchip cycle limit).
   *passed = *cyc_cnt > 40;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      //m4+imem(@1)    // Args: (read stage)
      //m4+rf(@1, @1)  // Args: (read stage, write stage) - if equal, no register bypass is required
      //m4+dmem(@4)    // Args: (read/write stage)
      //m4+myth_fpga(@0)  // Uncomment to run on fpga

   //m4+cpu_viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic. @4 would work for all labs.
\SV
   endmodule

PC Logic

In RISC-V, the Program Counter (PC) is a special-purpose register that holds the memory address of the next instruction to be fetched and executed.
The PC is also commonly referred to as the instruction pointer (IP) in other architectures.
The PC is a crucial component of the processor's control flow, as it determines the sequence of instructions that are fetched and executed.

2
The below piece of code is implemented.

|cpu
      @0
         $reset = *reset;
         
         $pc[31:0] = >>1$reset ? 32'b0 : >>1$pc + 32'd4;

The output of the same in Makerchip is given below:

3

Fetch

Instruction fetch is the initial stage of the instruction processing pipeline in a RISC-V processor.
In this stage, the processor retrieves the instruction located at the memory address pointed to by the Program Counter (PC),
and prepares it for further processing.

The below diagram is what we need to implement in this step.

4

The code we have to change is:


|cpu
      @0
         $reset = *reset;
         
         $pc[31:0] = >>1$reset ? 32'b0 : >>1$pc + 32'd4;
         
         
         
      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      //       be sure to avoid having unassigned signals (which you might be using for random inputs)
      //       other than those specifically expected in the labs. You'll get strange errors for these.

   
   // Assert these to end simulation (before Makerchip cycle limit).
   *passed = *cyc_cnt > 40;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      m4+imem(@1)    // Args: (read stage)
      //m4+rf(@1, @1)  // Args: (read stage, write stage) - if equal, no register bypass is required
      //m4+dmem(@4)    // Args: (read/write stage)
   
   m4+cpu_viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic
                      

The output as shown on Makerchip is:

5

The corrected Fetch code block is:

6

|cpu
      @0
         $reset = *reset;
         
         $pc[31:0] = >>1$reset ? 32'b0 : >>1$pc + 32'd4;
      @1
         $imem_rd_addr[M4_IMEM_INDEX_CNT-1:0] = $pc[M4_IMEM_INDEX_CNT+1:2];
         $imem_rd_en = !$reset;
         $instr[31:0] = $imem_rd_data[31:0];
      ?$imem_rd_en
         @1
            $imem_rd_data[31:0] = /imem[$imem_rd_addr]$instr;

// Assert these to end simulation (before Makerchip cycle limit).
   *passed = *cyc_cnt > 40;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      m4+imem(@1)    // Args: (read stage)
      //m4+rf(@1, @1)  // Args: (read stage, write stage) - if equal, no register bypass is required
      //m4+dmem(@4)    // Args: (read/write stage)
   
   m4+cpu_viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic

The output as shown on Makerchip is:

7 Screenshot from 2023-08-21 15-41-46

Decode

The instruction decode stage is the second stage in the instruction processing pipeline of a RISC-V processor.
In this stage, the fetched instruction is decoded to determine its operation, operands, and any immediate values associated with it.
The instruction decode stage prepares the instruction for execution by the subsequent stages of the pipeline.
There are 6 instructions type in RISC-V :

  1. Register (R) type
  2. Immediate (I) type
  3. Store (S) type
  4. Branch (B) type
  5. Upper immediate (U) type
  6. Jump (J) type

Screenshot from 2023-08-21 15-48-45

Instruction type decode:

19
The code for this is.


@1
         $is_u_instr = $instr[6:2] ==? 5'b0x101;
         
         $is_s_instr = $instr[6:2] ==? 5'b0100x;
         
         $is_r_instr = $instr[6:2] ==? 5'b01011 ||
                       $instr[6:2] ==? 5'b011x0 ||
                       $instr[6:2] ==? 5'b10100;
         
         $is_j_instr = $instr[6:2] ==? 5'b11011;
         
         $is_i_instr = $instr[6:2] ==? 5'b0000x ||
                       $instr[6:2] ==? 5'b001x0 ||
                       $instr[6:2] ==? 5'b11001;
         
         $is_b_instr = $instr[6:2] ==? 5'b11000;

The output as shown on Makerchip is:
11

Instruction immediate decode

12

The code below is added.

      $imm[31:0] = $is_i_instr ? {{21{$instr[31]}}, $instr[30:20]} :
 		$is_s_instr ? {{21{$instr[31]}}, $instr[30:25], $instr[11:7]} :
			 $is_b_instr ? {{20{$instr[31]}}, $instr[7], $instr[30:25], $instr[11:8], 1'b0} :
				 $is_u_instr ? {$instr[31:12], 12'b0} :
 					 $is_j_instr ? {{12{$instr[31]}}, $instr[19:12], $instr[20], $instr[30:21], 1'b0} :
	              					32'b0;

The output as shown on Makerchip is:
13

Lab to extract other instruction fields

Here we are going to try to extract other instruction fields such as $funct7 , $funct3, $rs1 ,$rs2 ,$rd, $opcode.

14
Using the code below.

	 $rs2[4:0] = $instr[24:20];
         $rs1[4:0] = $instr[19:15];
         $rd[4:0]  = $instr[11:7];
         $opcode[6:0] = $instr[6:0];
         $func7[6:0] = $instr[31:25];
         $func3[2:0] = $instr[14:12];
         

The output as shown on Makerchip is:
15

Lab to Decode Instruction field based on the Instruction type

Here we are going to try to extract other instruction fields such as $funct7 , $funct3, $rs1 ,$rs2 ,$rd, $opcode based on the instruction type.

16

Using the code below.


	$rs2_valid = $is_r_instr || $is_s_instr || $is_b_instr;
         ?$rs2_valid
            $rs2[4:0] = $instr[24:20];
            
         $rs1_valid = $is_r_instr || $is_i_instr || $is_s_instr || $is_b_instr;
         ?$rs1_valid
            $rs1[4:0] = $instr[19:15];
         
         $funct3_valid = $is_r_instr || $is_i_instr || $is_s_instr || $is_b_instr;
         ?$funct3_valid
            $funct3[2:0] = $instr[14:12];
            
         $funct7_valid = $is_r_instr ;
         ?$funct7_valid
            $funct7[6:0] = $instr[31:25];
            
         $rd_valid = $is_r_instr || $is_i_instr || $is_u_instr || $is_j_instr;
         ?$rd_valid
            $rd[4:0] = $instr[11:7];

The output as shown on Makerchip is:
17

Lab on individual decode

Here we are going to try to get the instructions that are circled in red.

18


	 $dec_bits [10:0] = {$funct7[5], $funct3, $opcode};
         $is_beq = $dec_bits ==? 11'bx_000_1100011;
         $is_bne = $dec_bits ==? 11'bx_001_1100011;
         $is_blt = $dec_bits ==? 11'bx_100_1100011;
         $is_bge = $dec_bits ==? 11'bx_101_1100011;
         $is_bltu = $dec_bits ==? 11'bx_110_1100011;
         $is_bgeu = $dec_bits ==? 11'bx_111_1100011;
         $is_addi = $dec_bits ==? 11'bx_000_0010011;
         $is_add = $dec_bits ==? 11'b0_000_0110011;
         

The output as shown on Makerchip is:

19 20

Click here to view the final code for fetch and decode.

RISC-V Control Logic

Execute and Register file read/write

Here we are going to 'read from' and 'write into' the registers.
two read and one write operations can be carried out simultaneously here.
1
We are going to use the below code.

	 $rf_wr_en = 1'b0;
         $rf_wr_index[4:0] = 5'b0;
         $rf_wr_data[31:0] = 32'b0;
         $rf_rd_en1 = $rs1_valid;
         $rf_rd_index1[4:0] = $rs1;
         $rf_rd_en2 = $rs2_valid;
         $rf_rd_index2[4:0] = $rs2;

	 $src1_value[31:0] = $rf_rd_data1; //new
         $src2_value[31:0] = $rf_rd_data2;

	 `BOGUS_USE ($rf_rd_data1)
         `BOGUS_USE ($rf_rd_data2)
         

The output as shown on Makerchip is:

2

LAB on ALU

We are going to follow the design below:

3

The following code is added to the tvl code.

	$result[31:0] = $is_addi ? $src1_value + $imm :
                        $is_add ? $src1_value + $src2_value :
                        32'bx ;

The output as shown on Makerchip is:

4

Lab On Register File Write

We are going to follow the design below:
We just need to hook up the write control registers and the data we need to write.

5

The following code is added to the tvl code.

	 $rf_wr_en = $rd_valid && $rd != 5'b0;
         $rf_wr_index[4:0] = $rd;
         $rf_wr_data[31:0] = $result;

The output as shown on Makerchip is:

6

Arrays

Arrays in RISC-V, as in most other computer architectures, are collections of elements of the same data type stored in contiguous memory locations. RISC-V processors support arrays through their memory access instructions and addressing modes. In the RISC-V architecture, arrays are commonly managed using a combination of memory locations and registers.

Typically, arrays are stored within memory, with each element occupying a specific memory address. This setup enables the processor to utilize load and store instructions for interacting with array elements.

Alternatively, if an array is relatively small and can fit within the available registers, elements can be partially stored in registers. This strategy, in situations where feasible, can yield enhanced performance for certain operations due to the swifter nature of register access compared to memory access.

In essence, the RISC-V register file encompasses an assemblage of registers designated for swift data access and temporary data storage, while arrays encompass collections of elements that can be accommodated either within memory or registers. These arrays are manipulated via a spectrum of instructions, including load, store, and computational directives.

Screenshot from 2023-08-21 20-30-08

Lab For Implementing Branch Instructions

Moving forward in the construction of the RISC-V microarchitecture involves the incorporation of branching mechanisms. In addition to straightforward immediate additions or regular additions, specific conditions might necessitate redirecting the Program Counter (PC) to a designated branch target address. At this juncture, we have introduced a range of branch instructions and accordingly updated the PC, factoring in various conditional scenarios.
Some Branching Instructions are :

  1. BEQ;==
  2. BNE:!=
  3. BLT: (x1 < Ă—2) ^ (x1[311!=Ă—2[311)
  4. BGE: (Ă—1 >= Ă—2) ^ (x1[31]!=Ă—2[31])
  5. BLTU: <
  6. BGEU: >=

Screenshot from 2023-08-21 20-48-25

The code below is implemented into the main code:

	   $taken_branch = $is_beq ? ($src1_value == $src2_value):
                         $is_bne ? ($src1_value != $src2_value):
                         $is_blt ? (($src1_value < $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bge ? (($src1_value >= $src2_value) ^ ($src1_value[31] != $src2_value[31])):
                         $is_bltu ? ($src1_value < $src2_value):
                         $is_bgeu ? ($src1_value >= $src2_value):
                                    1'b0;
         `BOGUS_USE($taken_branch)
         
         `BOGUS_USE($taken_branch)

The output as shown on Makerchip is:
Screenshot from 2023-08-21 20-44-18

Lab For Complementing Branch Instructions

The below design change is made.

Screenshot from 2023-08-21 20-58-09

The code below is implemented into the main code:

	$br_target_pc[31:0] = $pc + $imm;

The output as shown on Makerchip is:
Screenshot from 2023-08-21 20-56-12

Lab For Testbench

The code below is implemented into the main code:

*passed = |cpu/xreg[10]>>5$value == (1+2+3+4+5+6+7+8+9) ;

The output as shown on Makerchip is:
Screenshot from 2023-08-21 21-07-40

The final code for this module can be viewed here.

Day 5-Complete Pipelined RISC-V CPU Microarchitecture

Pipeling the CPU
Pipelining of the CPU core is going to be implemented, streamlining the process of retiming and considerably reducing the occurrence of functional errors. This technique enables faster computational tasks. As previously explained, establishing the pipeline is a straightforward process of incorporating stages labelled as @1, @2, and so on. A visual representation of the pipelining setup is provided below. In TL Verilog, it's important to note that there is no strict requirement to define the pipeline stages in a specific systematic order, providing an extra layer of benefit.

LAB on Cycle valid signal

We are required to get implement the logic in the following block diagram:
Screenshot from 2023-08-22 11-58-45
the following code snippet is used here.


$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ;
         $start_int = $reset ? 1'b0 : 1'b1;
         $start = $reset ? 1'b0 : ($start_int && !>>1$start_int);

The following output is obtained on MakerChip.

Screenshot from 2023-08-22 12-02-15

LAB to take care of invalid cycles

Screenshot from 2023-08-22 12-51-10

The code snippet required to implement is.

 $pc[31:0] = (>>1$reset) ? 32'b0 : (>>3$valid_taken_branch) ? (>>3$br_tgt_pc) :  (>>3$int_pc)  ;
 $valid_taken_branch = $valid && $taken_br;

Screenshot from 2023-08-22 12-49-45

LAB to distribute logic

Pipelining is done in this step. Code is distributed and output is obtained.

Screenshot from 2023-08-22 12-51-10
Below given is the output on MakerChip.
Screenshot from 2023-08-22 13-01-57

Solutions to pipeline hazzards

Lab for register file Bypass to address rd-after-wr hazard

We are required to implement the logic as per the given figure.
Screenshot from 2023-08-22 15-57-48

The logic snippet required is given below.

$src1_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs1 )) ? (>>1$result): $rf_rd_data1; 
$src2_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs2 )) ? (>>1$result) : $rf_rd_data2;

There should be no noticeable changes at this stage.
Screenshot from 2023-08-22 15-53-54

Lab for branches to correct the branch target path

The logic snippet required is given below.

 $pc[31:0] = (>>1$reset) ? 32'b0 : (>>3$valid_taken_br) ? (>>3$br_tgt_pc) :  (>>3$int_pc)  ;
         //$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ; no need for this

Screenshot from 2023-08-22 16-21-25

Lab for complete RV32I instruction set (except FENCE, ECALL, EBREAK)


Added some more instructions to the existing ones and removed bogus codes and added some real values.

Screenshot from 2023-08-22 17-19-00

Load store and data memory | Wrap up
Similar to branch,load will also have 3 cycle delay. So, added a Data Memory 1 write/read memory. Added test case to check the functionality of load/store.
	uncomment enable m4+dmem(@4)    // Args: (read/write stage)
 	connect interface signals using address bits[5:2] to perform load and store (when valid)
	Additionally Incorporation of Jump feature (JAL and JALR instructions).

Below is the output Makerchip after including load/store instructions:

Screenshot from 2023-08-22 17-50-57

The Final Diagram is shown below.
Screenshot from 2023-08-22 17-51-30

Acknowledgements

References

  1. https://github.com/kunalg123/riscv_workshop_collaterals
  2. https://github.com/stevehoover
  3. https://github.com/shivanishah269
  4. https://redwoodeda.com
  5. https://makerchip.com/

About

RISC-V | TL-Verilog | Pipe-lining | Micro-Architecture | Makechip

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published