
Conversation

Contributor

@Rexicon226 Rexicon226 commented Sep 14, 2025

This is a WIP PR where I focus on improving performance around SHA-256-related things. The first commit reorganizes the API a bit to make it easier to streamline the hashing later on. Single hashes do not need to go through an incremental API and can instead be pipelined when done in a loop, something to be implemented. And when doing mixins with just two inputs of a known size, we can pipeline those as well, achieving much faster hashing than what we currently do.
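
To make the two-input mixin case concrete, here is a minimal sketch (illustration only, not code from this PR) of how a fixed-size mixin can bypass the incremental API entirely; `mixin` is a hypothetical name and both inputs are assumed to be 32 bytes:

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

// Concatenate two fixed-size inputs into one fixed-size buffer and hash it
// in a single call. The compiler sees exactly one 64-byte SHA-256 of known
// length instead of a chain of init/update/final calls, and a loop over
// many such mixins is what a pipelined backend could process in parallel.
fn mixin(a: *const [32]u8, b: *const [32]u8) [32]u8 {
    var buf: [64]u8 = undefined;
    @memcpy(buf[0..32], a);
    @memcpy(buf[32..64], b);
    var out: [32]u8 = undefined;
    Sha256.hash(&buf, &out, .{});
    return out;
}
```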

@github-project-automation github-project-automation bot moved this to 🏗 In progress in Sig Sep 14, 2025
@Rexicon226 Rexicon226 force-pushed the Rexicon226/hash-refactor branch from 8838aba to 5985c27 Compare September 14, 2025 15:57

codecov bot commented Sep 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| src/consensus/optimistic_vote_verifier.zig | 98.90% <100.00%> (ø) |
| src/consensus/vote_listener.zig | 92.24% <100.00%> (ø) |
| src/core/blockhash_queue.zig | 100.00% <100.00%> (ø) |
| src/core/entry.zig | 95.14% <100.00%> (+0.04%) ⬆️ |
| src/core/hash.zig | 95.93% <100.00%> (+0.03%) ⬆️ |
| src/core/poh.zig | 98.92% <100.00%> (ø) |
| src/core/shred.zig | 100.00% <100.00%> (ø) |
| src/gossip/fuzz_service.zig | 96.61% <100.00%> (ø) |
| src/gossip/ping_pong.zig | 98.23% <100.00%> (ø) |
| src/gossip/service.zig | 82.03% <100.00%> (ø) |

... and 10 more

... and 3 files with indirect coverage changes


@Rexicon226 Rexicon226 force-pushed the Rexicon226/hash-refactor branch from 5985c27 to 20694bb Compare September 26, 2025 03:22
@Rexicon226 Rexicon226 marked this pull request as ready for review September 26, 2025 13:50
Contributor

@yewman yewman left a comment


Looks good to me, just one small comment on a missed signature_count being hashed in conformance, and a small style preference.

@Rexicon226 Rexicon226 force-pushed the Rexicon226/hash-refactor branch from 20694bb to be5589e Compare September 29, 2025 09:57
Comment on lines +43 to +58
```diff
 pub fn init(data: []const u8) Hash {
     var out: [32]u8 = undefined;
     Sha256.hash(data, &out, .{});
     return .{ .data = out };
 }

-/// re-hashes the current hash with the mixed-in byte slice(s).
-pub fn extendAndHash(self: Hash, data: anytype) Hash {
-    return generateSha256(.{ self.data, data });
+/// Does the same thing as `init`, but updates the hash with each
+/// input slice from the `data` list.
+pub fn initMany(data: []const []const u8) Hash {
+    var new = Sha256.init(.{});
+    for (data) |d| new.update(d);
+    return .{ .data = new.finalResult() };
 }

-fn update(hasher: *Sha256, data: anytype) void {
-    const T = @TypeOf(data);
-
-    if (T == Hash or T == *const Hash or T == *Hash) {
-        hasher.update(&data.data);
-    } else if (@typeInfo(T) == .@"struct") {
-        inline for (data) |val| update(hasher, val);
-    } else if (std.meta.Elem(T) == u8) switch (@typeInfo(T)) {
-        .array => hasher.update(&data),
-        else => hasher.update(data),
-    } else {
-        for (data) |val| update(hasher, val);
-    }
+/// re-hashes the current hash with the mixed-in byte slice(s).
+pub fn extend(self: Hash, data: []const u8) Hash {
```
Contributor


I don't see how this API is an improvement. This makes it less ergonomic. What's the benefit? I would prefer to keep it as is.

Contributor Author


The reflection is not necessary, introduces a ton of code bloat, and makes it impossible to optimize hashing in any meaningful way. I believe this PR proves that the reflection is not necessary, given how small a diff was required to conform to exactly `[]const []const u8` once you think a bit about the code around call sites. We need to stop abstracting every little thing in the codebase.
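
For illustration, a hedged sketch of what such a call-site conversion looks like, assuming the `Hash.initMany` signature from the diff above (`sig_bytes` and `payload` are hypothetical inputs):

```zig
fn exampleMixin(self: Hash, sig_bytes: []const u8, payload: []const u8) Hash {
    // Before: the reflective form hashed an anonymous tuple.
    //   return self.extendAndHash(.{ sig_bytes, payload });
    // After: the caller spells out the exact `[]const []const u8`.
    return Hash.initMany(&.{ &self.data, sig_bytes, payload });
}
```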

Contributor


> The reflection is not necessary,

I agree.

> introduces a ton of code bloat

You're saying because there are separate versions of this function for every usage?

Let's say we decided to eliminate these functions and forced every call site to directly use the hasher. I don't see code bloat as a major problem with that. Do you think this function introduces more bloat than inlining the usage of the hasher?

> makes it impossible to optimize hashing in any meaningful way.

I don't understand why, but regardless I think this only justifies the change when it coincides with optimizations.

> think a bit about the code around call sites

This is the annoyance I'd like to avoid. I'd rather think about it only the one time that I write this function instead of every time I use it.

Contributor Author

@Rexicon226 Rexicon226 Oct 1, 2025


> You're saying because there are separate versions of this function for every usage?

Almost all usages create an anonymous tuple over which the function becomes generalized. So pretty much, yes.
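
For example (a hedged sketch assuming the reflective `update` helper and `Hash` type from the diff above; `prev_hash` and `payload` are hypothetical):

```zig
fn demo(hasher: *Sha256, prev_hash: Hash, payload: []const u8) void {
    // Each distinct anonymous-tuple type instantiates its own copy of the
    // generic `update`, so these two calls compile to two specializations:
    update(hasher, .{ prev_hash, payload }); // struct{ Hash, []const u8 }
    update(hasher, .{ payload, prev_hash }); // struct{ []const u8, Hash }
}
```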

> Let's say we decided to eliminate these functions and forced every call site to directly use the hasher. I don't see code bloat as a major problem with that. Do you think this function introduces more bloat than inlining the usage of the hasher?

Yes, it does. As I said above, it creates a new version of the function for nearly all of our usages, which bloats the binary and makes inlining the hasher implementation itself nearly impossible (LLVM almost always refuses because of how many call sites there are). Using the hasher directly would cause the same problem in most cases, which is specifically why I don't do that. I would honestly prefer it, but since we are trying to maximize performance, having a single entry point (or a couple, in our case; we can have specialized ones for PoH, which I was working on a while ago) that is then unrolled properly a single time is optimal.
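
As a hedged sketch of the single-entry-point argument (the PoH specialization mentioned above is not part of this PR): recursive hashing that funnels through one function gives LLVM a single hot call site to inline and unroll.

```zig
// Hypothetical PoH-style loop: every iteration goes through the same
// `Hash.init` entry point, so the hasher body can be inlined and unrolled
// once, rather than re-instantiated for each distinct tuple type.
fn pohTicks(seed: Hash, num_hashes: u64) Hash {
    var h = seed;
    var i: u64 = 0;
    while (i < num_hashes) : (i += 1) {
        h = Hash.init(&h.data);
    }
    return h;
}
```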

> This is the annoyance I'd like to avoid. I'd rather think about it only the one time that I write this function instead of every time I use it.

You want to be thinking about it each time. Hashing is an expensive operation, and a lot of hashes need to be computed over the lifetime of the validator. Increasing friction at those places is a good thing.

I believe this sort of mindset is one of the main driving factors behind our current performance problems. We are trying to hide everything away behind abstractions: cloning everything, refcounting everything, allocating everything. This leads to the death-by-a-thousand-cuts situation that we are currently in.

We need to start thinking about the performance of every little thing. The validator itself is not a particularly complex piece of software, but it has a lot of moving parts and nuances. Each of them is small and seemingly inexpensive, and they all need to happen; added together, they cost a lot of performance.

@github-project-automation github-project-automation bot moved this from 🏗 In progress to 👀 In review in Sig Oct 1, 2025