hi all -

I've been reviewing the various Halide tutorials that cover the .tile() scheduling directive. They say it promotes "memory locality", but in what way? As an example, I'll borrow the blur3x3 pipeline from the tutorials.

In it, the result Func has the following schedule:

result.compute_root().tile(x, y, xi, yi, 512, 20).vectorize(xi, 32).parallel(y)

So I follow the mechanics of this schedule: the output is computed in 512x20 tiles, the inner x loop of each tile is vectorized in chunks of 32, and the rows of tiles are processed in parallel. But how does memory locality benefit from this scheme? You're still jumping around memory locations within each tile, and potentially invalidating cache lines, whenever you change rows.

It would seem to me that this would be better (vectorizing x directly, since xi only exists once you've tiled or split):

result.compute_root().vectorize(x, 32).parallel(y)

The tutorials somewhat shed light on this topic, but not enough to quell my confusion about what's going on in the processor, or why tiling is the better scheduling approach.

Thanks again, Charles.
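P.S. For concreteness, here's a minimal, self-contained sketch of the pipeline I'm asking about. The Func definition and the float element type are my own reconstruction, not necessarily the tutorial's exact code:

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    // Hypothetical single-channel float input; the tutorial's actual
    // element type may differ.
    ImageParam input(Float(32), 2);

    Var x("x"), y("y"), xi("xi"), yi("yi");
    Func result("result");

    // A 3x3 box blur: every output reads a 3x3 window of the input,
    // so each interior input value is touched 9 times.
    result(x, y) =
        (input(x - 1, y - 1) + input(x, y - 1) + input(x + 1, y - 1) +
         input(x - 1, y)     + input(x, y)     + input(x + 1, y)     +
         input(x - 1, y + 1) + input(x, y + 1) + input(x + 1, y + 1)) / 9.0f;

    // The schedule in question: 512x20 tiles, the inner x loop
    // vectorized in chunks of 32, rows of tiles run in parallel
    // (after tile(), y names the outer tile-row loop).
    result.compute_root()
          .tile(x, y, xi, yi, 512, 20)
          .vectorize(xi, 32)
          .parallel(y);

    result.compile_jit();
    return 0;
}
```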
-
Replies: 1 comment

Consider what's happening to the input buffer. It's a 3x3 stencil, so a 512x20 tile of output reads a 514x22 region of the input. That region fits in cache, and most of those loaded values get used 9 times while they're still resident. If instead we processed one scanline at a time in parallel (so adjacent scanlines may run on different cores), each loaded value would only be reused 3 times before being evicted from L1.
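To put rough numbers on that (assuming 4-byte samples; the actual element type may differ): a 512x20 tile computes 512 × 20 = 10,240 outputs, each reading 9 inputs, so 92,160 loads land on only 514 × 22 = 11,308 distinct input values, a bit over 8 uses per value on average (interior values get the full 9). That region is about 45 KB, small enough to stay resident in a typical L2 for the lifetime of the tile, and for 8-bit pixels (~11 KB) it fits in L1. One 512-wide scanline, by contrast, issues 512 × 9 = 4,608 loads over 514 × 3 = 1,542 distinct values, roughly 3 uses each, and with neighboring scanlines scheduled on different cores the vertical reuse between rows is lost entirely.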