The Midgard Shader Core #44

ysh329 · 2021-02-18T15:52:34Z

https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/the-midgard-shader-core/single-page

https://developer.arm.com/documentation/100614/0314/OpenCL-optimizations-list/Mali-Midgard-GPU-specific-optimizations

ysh329 · 2021-03-06T13:12:43Z

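Mali Midgard GPU architecture optimization details

The Midgard architecture covers the T600, T700, and T800 series. Arm has published optimization details for this architecture; I go through them one by one below and add my own understanding. Most of this content comes from Arm's official OpenCL documentation for Midgard GPUs.

Reference: https://developer.arm.com/documentation/100614/0314/OpenCL-optimizations-list/Mali-Midgard-GPU-specific-optimizations

All threads in a kernel should finish at the same time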

Branches are computationally cheap on Mali Midgard GPUs, so you can use loops in kernels without any performance impact.
Your kernels can include different code segments, but try to ensure that all threads exit at the same time.
If the work per thread varies too much for that, a workaround is to use a bucket algorithm.
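
The Arm text does not spell out what the bucket algorithm looks like, so the following is only one possible reading, sketched in plain C on the host side. It assumes each work item has an estimated cost (for example a loop trip count) and splits the item indices into a "light" and a "heavy" bucket, so that each bucket could be enqueued as its own kernel launch with threads that finish at roughly the same time. The name `bucket_by_cost`, the cost array, and the threshold are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not from the Arm guide): split work-item
 * indices into two buckets by estimated cost. Returns the number of light
 * items and writes the number of heavy items to *n_heavy. Both output
 * arrays must have room for n indices. */
size_t bucket_by_cost(const int *cost, size_t n, int threshold,
                      size_t *light, size_t *heavy, size_t *n_heavy)
{
    assert(cost && light && heavy && n_heavy);
    size_t n_light = 0;
    *n_heavy = 0;
    for (size_t i = 0; i < n; ++i) {
        if (cost[i] < threshold)
            light[n_light++] = i;      /* cheap items go together */
        else
            heavy[(*n_heavy)++] = i;   /* expensive items go together */
    }
    return n_light;
}
```

Each bucket would then be processed by a separate launch, so a thread handling a cheap item does not sit next to one that runs many more iterations. Two buckets are used here only to keep the sketch short; a real workload might need more.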

Make your kernel code as simple as possible

This assists the auto-vectorization process.
Using loops and branches might make auto-vectorization more difficult.

Use vector operations in kernel code

Use vector operations in kernel code to help the compiler to map them to vector instructions.
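
As a rough illustration, the sketch below models whole-vector operations in plain C. In an actual OpenCL kernel you would use the built-in `float4` type with `vload4`/`vstore4`; here the struct `vec4f` and its helper functions are hypothetical stand-ins so the example can be compiled and checked on a host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4: four 32-bit lanes = 128 bits. */
typedef struct { float s[4]; } vec4f;

static vec4f vec4f_load(const float *p)
{
    vec4f v = {{p[0], p[1], p[2], p[3]}};
    return v;
}

static void vec4f_store(vec4f v, float *p)
{
    for (int i = 0; i < 4; ++i)
        p[i] = v.s[i];
}

/* One whole-vector multiply-add: r = a * x + y, lane by lane. */
static vec4f vec4f_madd(float a, vec4f x, vec4f y)
{
    vec4f r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = a * x.s[i] + y.s[i];
    return r;
}

/* y[i] = a * x[i] + y[i], written as operations on four-lane vectors.
 * Assumes n is a multiple of 4 to keep the sketch short. */
void saxpy_vec4(float a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        vec4f_store(vec4f_madd(a, vec4f_load(x + i), vec4f_load(y + i)), y + i);
}
```

Writing the arithmetic on whole vectors states the intent directly, rather than relying on the compiler to rediscover it from a scalar loop.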

Vectorize your code

Mali Midgard GPUs perform computation with vectors. These enable you to perform multiple operations per instruction.
Vectorizing your code makes the best use of the Mali Midgard GPU hardware so ensure that you vectorize your code for maximum performance.
Mali Midgard GPUs contain 128-bit wide vector registers.

Note

The Midgard compiler can auto-vectorize some scalar code.

Vectorize incrementally

Vectorize in incremental steps. For example, start processing one pixel at a time, then two, then four.
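
The incremental steps might look like the plain-C sketch below, which is my own illustration rather than code from the Arm guide. It applies a made-up `brighten` operation to 8-bit pixels one, two, and four at a time, and includes a small checker so that each wider version can be compared against the one-pixel reference before moving on. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up per-pixel operation: add 50 with saturation at 255. */
static uint8_t brighten(uint8_t p)
{
    return p > 205 ? 255 : (uint8_t)(p + 50);
}

/* Step 1: one pixel per iteration (the reference version). */
void brighten_x1(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = brighten(in[i]);
}

/* Step 2: two pixels per iteration. Assumes n is a multiple of 2. */
void brighten_x2(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        out[i]     = brighten(in[i]);
        out[i + 1] = brighten(in[i + 1]);
    }
}

/* Step 3: four pixels per iteration. Assumes n is a multiple of 4. */
void brighten_x4(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        for (size_t lane = 0; lane < 4; ++lane)
            out[i + lane] = brighten(in[i + lane]);
}

/* Returns 1 when both buffers hold the same n pixels, 0 otherwise. */
int same_pixels(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Checking each step against the previous one keeps a vectorization bug confined to the step that introduced it.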

Avoid processing single values

Avoid writing kernels that operate on single bytes or other small values. Write kernels that work on vectors.

Use 128-bit vectors

Vector sizes of 128 bits are optimal. Vectors wider than 128 bits are broken into 128-bit parts that are operated on separately. For example, adding two 256-bit vectors takes twice as long as adding two 128-bit vectors. You can use vector sizes smaller than 128 bits without issue.

The disadvantage of using vectors greater than 128 bits is that they can increase code size. Increased code size uses more instruction cache space and this can reduce performance.
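
To make the arithmetic concrete, here is a small plain-C sketch based only on the statements above. `lanes_per_128` gives how many elements of a given width fit in one 128-bit register (so an OpenCL `float4` or `char16` fills it exactly), and `parts_128` gives how many 128-bit parts a wider vector is split into. Both function names are hypothetical, and the part count is a rough cost proxy, not a measured figure.

```c
#include <assert.h>

/* Number of elements of element_bits width that fit in one 128-bit register. */
unsigned lanes_per_128(unsigned element_bits)
{
    assert(element_bits > 0 && 128 % element_bits == 0);
    return 128 / element_bits;
}

/* Number of 128-bit parts needed for a vector of vector_bits width.
 * Vectors of 128 bits or fewer need a single part. */
unsigned parts_128(unsigned vector_bits)
{
    assert(vector_bits > 0);
    return (vector_bits + 127) / 128;   /* round up */
}
```

Under this model a 256-bit `float8` addition counts as two parts, matching the "twice as long" statement, while a 64-bit `float2` still counts as one.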

ysh329 pinned this issue Feb 20, 2021
