Use GPU registers more effectively #277

ashvardanian · 2025-10-29T18:16:37Z

All of our GPU algorithms lean heavily on shared memory and warp-level synchronization. That may be suboptimal for very small inputs, where we should keep everything in registers instead.

So I'm suggesting a new set of kernels that process the DP matrix row by row but keep only one row and one scalar in GPU registers. Moreover, they process the matrix of uint8_t cells in slices of 4 contiguous entries, forming one uint32_t per slice in each row. That minimizes the number of loads and stores, as well as conversions between 8-bit and 32-bit representations, assuming GPUs don't have much custom logic for 8-bit types and upcast practically every time.
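To make the idea concrete, here is a minimal host-side C++ sketch of a single-row Levenshtein pass over uint8_t cells packed four per uint32_t. It is not the kernel from this PR: the names `levenshtein_packed_row`, `max_len`, and `row_words` are hypothetical, and the 64-character cap is an assumption chosen so that every distance fits in one byte. In an actual CUDA kernel the `row` array would presumably need fully unrolled loops with compile-time indices to stay in registers rather than spill to local memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical cap for this sketch: with at most 64 columns every distance
// is <= 64, so each cell fits in one byte and the whole row fits in 16 words.
constexpr std::size_t max_len = 64;
constexpr std::size_t row_words = max_len / 4;

inline std::uint32_t levenshtein_packed_row(std::string_view a, std::string_view b) {
    assert(a.size() <= max_len && b.size() <= max_len);
    if (b.empty()) return static_cast<std::uint32_t>(a.size());

    // Row 0 of the DP matrix without its boundary column: cell k holds k + 1.
    std::uint32_t row[row_words];
    for (std::uint32_t w = 0; w < row_words; ++w) {
        std::uint32_t const first = 4 * w + 1;
        row[w] = first | ((first + 1) << 8) | ((first + 2) << 16) | ((first + 3) << 24);
    }

    std::size_t const used_words = (b.size() + 3) / 4;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Boundary column: the previous row holds i there, the current row i + 1.
        std::uint32_t diag = static_cast<std::uint32_t>(i);
        std::uint32_t left = static_cast<std::uint32_t>(i) + 1;
        for (std::size_t w = 0; w < used_words; ++w) {
            std::uint32_t const old_word = row[w]; // one 32-bit load covers 4 cells
            std::uint32_t new_word = 0;
            for (std::size_t lane = 0; lane < 4; ++lane) {
                std::size_t const k = 4 * w + lane;
                std::uint32_t const up = (old_word >> (8 * lane)) & 0xFFu;
                // Lanes past the end of `b` are padding and compare against '\0';
                // they never feed into cells that belong to `b`.
                char const b_char = k < b.size() ? b[k] : '\0';
                std::uint32_t const cost = a[i] != b_char ? 1u : 0u;
                std::uint32_t const cell = std::min(std::min(up, left) + 1, diag + cost);
                new_word |= cell << (8 * lane);
                diag = up;   // becomes the above-left cell for the next lane
                left = cell; // becomes the left neighbor for the next lane
            }
            row[w] = new_word; // one 32-bit store writes 4 cells back
        }
    }

    std::size_t const last = b.size() - 1;
    return (row[last / 4] >> (8 * (last % 4))) & 0xFFu;
}
```

Calling `levenshtein_packed_row("kitten", "sitting")` returns 3. The lanes are unpacked into 32-bit temporaries for the arithmetic and repacked once per word, which mirrors the assumption above that 8-bit values get upcast anyway.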

Improve: Draft more register-optimal variant

5aacb9c

ashvardanian changed the base branch from main to main-dev October 29, 2025 18:16

ashvardanian added 8 commits October 29, 2025 20:46

Improve: Simplify the template

f75405b

Improve: Cover single-byte slots by benchmarks

8bd04bc

Fix: Matrix out-of-bound access

b42ce43

Release builds still crash

Improve: 1-row Wagner-Fisher for minimum register pressure

d9aa7fd

Still crashes in release builds

Fix: Cache second string char

13628a4

Yields 4x improvement

Improve: Register-optimal Levenshtein algorithm

4efb2c4

Add: Desperate __launch_bounds__ attempt

376dc13

Improve: Tune block size

b7bbd63
