Team Details
Semester: 3rd Sem B. Tech. CSE
Section: S1
Team ID: 19
Member-1: Akhil Sakthieswaran, 231CS108
Member-2: Raunil Singh, 231CS148
Member-3: Sanjay S Bhat, 231CS153
Overview
Implementing ChaCha20 on a small-scale hardware system offers an efficient and secure solution for resource-constrained devices such as IoT nodes and embedded systems. The cipher is built only from simple operations (addition, XOR and bit rotation), so it achieves low power consumption and high performance without specialized hardware support while still providing robust encryption. This makes it well suited to real-time data protection in energy-sensitive applications.
The problem addressed in our project is the design of a hardware circuit implementation of the ChaCha20 encryption algorithm that improves throughput, reduces latency and minimizes power consumption. The design aims to enable efficient processing for secure communication in embedded systems, addressing the limitations of software implementations while remaining adaptable to various applications and maintaining high security standards.
The main feature of our project is that it addresses the drawbacks of implementing the ChaCha20 algorithm in software rather than in dedicated circuits. High latency in software can hinder real-time applications, whereas hardware circuits can reduce processing time through dedicated data paths and pipelining. Software also tends to consume more power because of CPU overhead, while circuits can be designed for low-power operation, which suits battery-operated devices. Another advantage of a hardware implementation is that physical design features can mitigate certain side-channel attacks, improving overall security. Finally, software may require significant RAM and CPU resources, whereas a circuit can use smaller, dedicated memory and logic components, giving a more efficient implementation.
Working
ChaCha20 is a stream cipher: the key, together with a counter, a nonce and fixed constants, is scrambled to generate a key stream, which is XORed with the plaintext to produce the ciphertext. Our implementation is a scaled-down version of the cipher (the full ChaCha20 in RFC 7539 works on a state of sixteen 32-bit words for 20 rounds) and contains a 2-bit counter. The user inputs the counter's initial state, the key and the nonce. The plaintext is entered bit by bit using two switches, one for 0 and the other for 1, and the counter is incremented whenever an input bit is received.
The 2 counter bits, 8 key bits, 2 nonce bits and 4 constant bits are passed to the key stream generator. These 16 bits are placed in a virtual 4x4 matrix and undergo two rounds of mixing: the first round mixes the columns and the second mixes the diagonals of the matrix. Each round contains four quarter rounds, and each quarter round takes 4 bits as input and produces 4 bits as output (implemented as a fixed 4-bit lookup table in the qrg module).
After the two rounds, the 16 output bits go to the bit selector. On observation, 6 of these bits toggle after every input bit (denoted D), 4 toggle on every alternate input bit (denoted A), and 6 remain static (denoted S). The bit selector picks bits in the order D S A D S A D S A D S A D S. Interleaving the bits this way is intended to spread the changing bits as evenly as possible through the generated key stream. Finally, the selected key-stream bit is XORed with the input bit to produce the output bit.
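For comparison, the full-size quarter round defined in RFC 7539 operates on four 32-bit words using only addition modulo 2^32, XOR and left rotation by 16, 12, 8 and 7 bits, instead of a lookup table. The module below is a minimal combinational sketch of that operation; the module and port names (chacha_qr32, a_in, a_out and so on) are our own and are not part of the project code.

```verilog
// Minimal sketch of the full-width ChaCha20 quarter round (RFC 7539, section 2.1).
// Hypothetical module/port names; purely combinational, no pipelining.
module chacha_qr32 (
    input  wire [31:0] a_in, b_in, c_in, d_in,
    output wire [31:0] a_out, b_out, c_out, d_out
);
    // Step 1: a += b; d ^= a; d <<<= 16  (32-bit wires make the addition mod 2^32)
    wire [31:0] a1  = a_in + b_in;
    wire [31:0] d1x = d_in ^ a1;
    wire [31:0] d1  = {d1x[15:0], d1x[31:16]};
    // Step 2: c += d; b ^= c; b <<<= 12
    wire [31:0] c1  = c_in + d1;
    wire [31:0] b1x = b_in ^ c1;
    wire [31:0] b1  = {b1x[19:0], b1x[31:20]};
    // Step 3: a += b; d ^= a; d <<<= 8
    wire [31:0] a2  = a1 + b1;
    wire [31:0] d2x = d1 ^ a2;
    wire [31:0] d2  = {d2x[23:0], d2x[31:24]};
    // Step 4: c += d; b ^= c; b <<<= 7
    wire [31:0] c2  = c1 + d2;
    wire [31:0] b2x = b1 ^ c2;
    wire [31:0] b2  = {b2x[24:0], b2x[31:25]};

    assign a_out = a2;
    assign b_out = b2;
    assign c_out = c2;
    assign d_out = d2;
endmodule
```

With the RFC 7539 section 2.1.1 test vector (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), the outputs should be a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e and d = 0x5881c4bb. Scaling the project toward the real cipher would roughly mean replacing the 4-bit qrg table with this kind of add-rotate-XOR datapath and widening every signal.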
Usage
Step 1:
Set the initial counter value, the 8-bit key and the nonce.
Step 2:
Set lock to 1 to load the initial counter value, then set it back to 0.
Step 3:
Enter the plaintext using the switches for 0 and 1 respectively. For example, to input a 1, set the '1' switch to 1, read the output bit, then set the switch back to 0. Repeat this for every input bit.
Step 4:
To reset the circuit, set lock to 1 and then back to 0.
Verilog Code
//quarter round generator module: a fixed 4-bit lookup table in which every input maps to a distinct output
module qrg (
input A, B, C, D, // 4 inputs
output reg a, b, c, d // 4 outputs
);
always @(*) begin
case ({A, B, C, D})
4'b0000: {a, b, c, d} = 4'b0000;
4'b0001: {a, b, c, d} = 4'b1010;
4'b0010: {a, b, c, d} = 4'b1101;
4'b0011: {a, b, c, d} = 4'b0111;
4'b0100: {a, b, c, d} = 4'b1110;
4'b0101: {a, b, c, d} = 4'b0100;
4'b0110: {a, b, c, d} = 4'b0011;
4'b0111: {a, b, c, d} = 4'b1001;
4'b1000: {a, b, c, d} = 4'b0101;
4'b1001: {a, b, c, d} = 4'b1111;
4'b1010: {a, b, c, d} = 4'b1000;
4'b1011: {a, b, c, d} = 4'b0010;
4'b1100: {a, b, c, d} = 4'b1011;
4'b1101: {a, b, c, d} = 4'b0001;
4'b1110: {a, b, c, d} = 4'b0110;
4'b1111: {a, b, c, d} = 4'b1100;
endcase
end
endmodule
//2-bit counter module
module two_bit_counter (
input wire clk, // Clock input
input wire reset, // Asynchronous reset input
input wire [1:0] init_value, // Initial value input
input wire lock, // Lock input to set initial value
output reg [1:0] count // 2-bit counter output
);
always @(posedge clk or posedge reset or posedge lock) begin
if (reset) begin
count <= 2'b00; // Reset counter to 0
end else if (lock) begin
count <= init_value; // Set counter to initial value
end else begin
count <= count + 1; // Increment counter
end
end
endmodule
//key stream generator module
module ksg (
input [3:0] constant,
input [7:0] key,
input [1:0] counter,
input [1:0] nonce,
output [3:0] final_out1,
output [3:0] final_out2,
output [3:0] final_out3,
output [3:0] final_out4
);
wire [3:0] qrg_out1, qrg_out2, qrg_out3, qrg_out4;
// First round: one quarter round per column of the 4x4 state matrix
// (row 0 = constant, rows 1-2 = key, row 3 = counter/nonce);
// qrg_outN[r] holds the mixed bit for row r of column N-1
qrg qrg1 ( // column 0
.A(constant[3]),
.B(key[7]),
.C(key[6]),
.D(counter[1]),
.a(qrg_out1[0]),
.b(qrg_out1[1]),
.c(qrg_out1[2]),
.d(qrg_out1[3])
);
qrg qrg2 ( // column 1
.A(constant[2]),
.B(key[5]),
.C(key[4]),
.D(counter[0]),
.a(qrg_out2[0]),
.b(qrg_out2[1]),
.c(qrg_out2[2]),
.d(qrg_out2[3])
);
qrg qrg3 ( // column 2
.A(constant[1]),
.B(key[3]),
.C(key[2]),
.D(nonce[1]),
.a(qrg_out3[0]),
.b(qrg_out3[1]),
.c(qrg_out3[2]),
.d(qrg_out3[3])
);
qrg qrg4 ( // column 3
.A(constant[0]),
.B(key[1]),
.C(key[0]),
.D(nonce[0]),
.a(qrg_out4[0]),
.b(qrg_out4[1]),
.c(qrg_out4[2]),
.d(qrg_out4[3])
);
// Second round: one quarter round per (wrapped) diagonal of the matrix
qrg qrg5 (
.A(qrg_out1[0]),
.B(qrg_out2[1]),
.C(qrg_out3[2]),
.D(qrg_out4[3]),
.a(final_out1[3]),
.b(final_out1[2]),
.c(final_out1[1]),
.d(final_out1[0])
);
qrg qrg6 (
.A(qrg_out1[1]),
.B(qrg_out2[2]),
.C(qrg_out3[3]),
.D(qrg_out4[0]),
.a(final_out2[3]),
.b(final_out2[2]),
.c(final_out2[1]),
.d(final_out2[0])
);
qrg qrg7 (
.A(qrg_out1[2]),
.B(qrg_out2[3]),
.C(qrg_out3[0]),
.D(qrg_out4[1]),
.a(final_out3[3]),
.b(final_out3[2]),
.c(final_out3[1]),
.d(final_out3[0])
);
qrg qrg8 (
.A(qrg_out1[3]),
.B(qrg_out2[0]),
.C(qrg_out3[1]),
.D(qrg_out4[2]),
.a(final_out4[3]),
.b(final_out4[2]),
.c(final_out4[1]),
.d(final_out4[0])
);
endmodule
//plain text input module
module plain_text (
input wire plain_text_input1, // '1' switch: high while entering a 1
input wire plain_text_input2, // '0' switch: high while entering a 0
output wire flag, // Flag output, set to 1 if any input bit is 1
output wire bit_value // Output the bit value that is set
);
// Set the flag if any of the input bits is 1
or(flag,plain_text_input1,plain_text_input2);
// Output the bit value that is set (1 if bit1 is set, otherwise 0)
assign bit_value = plain_text_input1 ? 1'b1 : 1'b0;
endmodule
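//4-bit position counter that drives the bit selector
module four_counter (
input clk, // Clock input (input-switch flag)
input reset, // Asynchronous reset input
input lock, // Lock signal
output reg [3:0] count // 4-bit counter output
);
// lock is asynchronous, as in two_bit_counter, so toggling lock realigns
// the bit selector even when no input bit is being entered
always @(posedge clk or posedge reset or posedge lock) begin
if (reset) begin
count <= 4'b0000; // Reset counter to 0
end else if (lock) begin
count <= 4'b1111; // Preset to 15 so the next input bit wraps it to 0
end else begin
count <= count + 1; // Increment counter
end
end
endmodule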
module bit_selector (
input [15:0] data_in, // 16-bit input data
input [3:0] clock_in, // 4-bit select value from four_counter (not a clock)
output reg out // Selected bit output
);
// Decoder logic
always @(*) begin
case (clock_in)
4'b0000: out = data_in[15]; // Count 0 selects the MSB
4'b0001: out = data_in[14]; // Count 1 selects the next bit down
4'b0010: out = data_in[13];
4'b0011: out = data_in[12];
4'b0100: out = data_in[11];
4'b0101: out = data_in[10];
4'b0110: out = data_in[9];
4'b0111: out = data_in[8];
4'b1000: out = data_in[7];
4'b1001: out = data_in[6];
4'b1010: out = data_in[5];
4'b1011: out = data_in[4];
4'b1100: out = data_in[3];
4'b1101: out = data_in[2];
4'b1110: out = data_in[1];
4'b1111: out = data_in[0]; // Count 15 selects the LSB
default: out = 1'b0;
endcase
end
endmodule
//main module
module main (
input wire clk, // Not used: both counters are clocked by the input-switch flag
input wire reset,
input wire [7:0] key,
input wire [1:0] nonce,
input wire plain_text_input1,
input wire plain_text_input2,
input wire [1:0] init_value,
input wire lock, // Lock input for both counter and bit selector
output wire final_output
);
// Internal signals
wire [1:0] counter_output;
wire [3:0] ksg_output1, ksg_output2, ksg_output3, ksg_output4;
wire [3:0] constant = 4'b1101;
wire bit_selector_output;
wire plain_text_flag;
wire plain_text_bit_value;
// Instantiate the two_bit_counter
two_bit_counter counter_uut (
.clk(plain_text_flag),
.reset(reset),
.init_value(init_value),
.lock(lock),
.count(counter_output)
);
// Instantiate the ksg module
ksg ksg_uut (
.constant(constant),
.key(key),
.counter(counter_output),
.nonce(nonce),
.final_out1(ksg_output1),
.final_out2(ksg_output2),
.final_out3(ksg_output3),
.final_out4(ksg_output4)
);
// Instantiate the plain-text input module
plain_text plain_text_uut (
.plain_text_input1(plain_text_input1),
.plain_text_input2(plain_text_input2),
.flag(plain_text_flag),
.bit_value(plain_text_bit_value)
);
// Concatenate ksg outputs to form a 16-bit input for bit selector
wire [15:0] ksg_combined_output = {ksg_output1, ksg_output2, ksg_output3, ksg_output4};
wire [3:0] four_out;
four_counter counter_4 (
.clk(plain_text_flag),
.reset(reset),
.lock(lock),
.count(four_out)
);
// Instantiate the bit selector
bit_selector bit_selector_uut (
.data_in(ksg_combined_output),
.clock_in(four_out), // Position counter selects which key-stream bit is used
.out(bit_selector_output)
);
// XOR the bit selector output with the plain-text bit value
assign final_output = bit_selector_output ^ plain_text_bit_value;
endmodule
//testbench for main module
module main_tb;
// Inputs
reg clk;
reg reset;
reg [7:0] key;
reg [1:0] nonce;
reg plain_text_input1;
reg plain_text_input2;
reg lock;
reg [1:0] init_value;
// Outputs
wire final_output;
// Instantiate the main module
main uut (
.clk(clk),
.reset(reset),
.key(key),
.nonce(nonce),
.plain_text_input1(plain_text_input1),
.plain_text_input2(plain_text_input2),
.init_value(init_value),
.lock(lock),
.final_output(final_output)
);
// Clock generation
initial begin
clk = 0;
forever #5 clk = ~clk; // 10ns period clock
end
// Test sequence
initial begin
// Initialize inputs
reset = 1;
key = 8'h00;
nonce = 2'b00;
plain_text_input1 = 1'b0;
plain_text_input2 = 1'b0;
lock = 0;
init_value = 2'b00;
// Dump waveform data
$dumpfile("ChaCha.vcd");
$dumpvars(0, main_tb);
// Apply test vectors
#10 reset=0;
#10 key = 8'b11011011; nonce = 2'b11; lock = 1;init_value = 2'b01;
#10 lock = 0;
#10 plain_text_input1 = 1'b1;
#10 plain_text_input1 = 1'b0;
#10 plain_text_input1 = 1'b1;
#10 plain_text_input1 = 1'b0;
#10 plain_text_input1 = 1'b1;
#10 plain_text_input1 = 1'b0;
// #50 reset = 1;
#10 reset = 0; key = 8'b11011011; nonce = 2'b00; plain_text_input1 = 1'b0;
// Finish simulation
#100 $finish;
end
initial begin
// Monitor the outputs
$monitor("At time %t, key = %b, nonce = %b, plain_text_input1 = %b, plain_text_input2 = %b, lock = %b, counter_init_value = %b, final_output = %b",
$time, key, nonce, plain_text_input1, plain_text_input2, lock, init_value, final_output);
end
endmodule
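In the listing above, both counters are clocked directly by the switch flag, and two_bit_counter loads init_value asynchronously through lock. This simulates correctly, but asynchronous loading of a non-constant value maps poorly onto typical FPGA flip-flops, which usually offer only asynchronous set or clear to a constant. One possible alternative, sketched below, samples the flag on a free-running clock and loads the counter synchronously. It assumes such a clock is available (for example the currently unused clk input of main); the module name two_bit_counter_sync is hypothetical, and switch debouncing and input synchronizers are omitted.

```verilog
// Sketch: single-clock version of two_bit_counter (hypothetical name).
// flag is the OR of the two input switches, as produced by the plain_text module.
module two_bit_counter_sync (
    input  wire       clk,        // free-running system clock (assumed available)
    input  wire       reset,      // synchronous, active-high reset
    input  wire       lock,       // load init_value while high
    input  wire [1:0] init_value, // initial counter value
    input  wire       flag,       // high while either input switch is on
    output reg  [1:0] count       // 2-bit counter output
);
    reg  flag_d;                      // flag as sampled on the previous clock edge
    wire flag_rise = flag & ~flag_d;  // detects a 0-to-1 change of flag

    always @(posedge clk) begin
        if (reset) begin
            flag_d <= 1'b0;
            count  <= 2'b00;
        end else begin
            flag_d <= flag;
            if (lock)
                count <= init_value;     // synchronous load replaces the async load
            else if (flag_rise)
                count <= count + 2'b01;  // advance once per input bit
        end
    end
endmodule
```

Because the flag is compared with its value on the previous clock edge, holding a switch for several clock cycles still advances the counter only once per press. The same edge pulse could, under the same assumptions, enable four_counter as well.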
References
- Computerphile, YouTube: https://youtu.be/UeIpq-C-GSA?si=nAy34VoO6TG0Eg_5
- ChaCha20 and Poly1305 for IETF Protocols (RFC 7539): https://datatracker.ietf.org/doc/html/rfc7539
- Wikipedia, ChaCha20-Poly1305: https://en.wikipedia.org/wiki/ChaCha20-Poly1305