Looking for info on NCHWc convolution algorithm - why is performance so much better? #11747
Unanswered · philtomson asked this question in Other Q&A
(Sorry, I asked this over in Issues, but it's probably more appropriate here.)
I'm curious about the NCHWc layout and convolution. If I use maximum optimization (-o 99) vs. -o 2, I get a 2x speedup (for a ResNet18 network). Looking at the output ONNX model, I see that with maximum optimization many ops get converted to NCHWc convolutions: Adds and other element-wise math ops become pointwise NCHWc convolutions, and BatchNorms get fused into conv ops.
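To make concrete what I mean by the NCHWc layout, here's a minimal numpy sketch of the repacking; the block size c=16 is my own assumption (one AVX-512 register holds 16 fp32 lanes), and none of this is ONNX Runtime code:

```python
import numpy as np

# Minimal sketch of NCHW -> NCHWc repacking, assuming a channel block
# size of c=16 (one AVX-512 register holds 16 fp32 lanes). My own
# illustration, not ONNX Runtime internals.

N, C, H, W, c = 1, 64, 56, 56, 16

x = np.random.rand(N, C, H, W).astype(np.float32)

# Split C into C//c blocks of c channels, move the c-sized block to the
# innermost (fastest-varying) axis, and materialize the copy, so the c
# channels of each pixel end up contiguous in memory.
x_blocked = np.ascontiguousarray(
    x.reshape(N, C // c, c, H, W).transpose(0, 1, 3, 4, 2)
)

print(x_blocked.shape)  # (1, 4, 56, 56, 16)
```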
Is NCHWc being used because it's more cache efficient (i.e., fewer cache misses), or are there other reasons why NCHWc convolutions perform so much better? For example, the hand-coded macro assembler in SconvKernelAvx512F.S: is the algorithm there somehow fundamentally different, and is NCHWc being used to avoid replicating that hand-optimized kernel for other memory layouts?
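Here's a rough illustration of the cache argument I have in mind (again just my own sketch, not anything from the runtime):

```python
import numpy as np

# For one output pixel, a conv accumulates over input channels; the
# element stride between consecutive channels shows how scattered those
# reads are in each layout. c=16 is my assumed block size.

N, C, H, W, c = 1, 64, 56, 56, 16

x_nchw = np.zeros((N, C, H, W), dtype=np.float32)
x_nchwc = np.zeros((N, C // c, H, W, c), dtype=np.float32)

# Stride (in elements) between channel k and channel k+1 of one pixel:
print(x_nchw.strides[1] // 4)   # 3136 == H*W -> a new cache line per channel
print(x_nchwc.strides[4] // 4)  # 1 -> 16 channels fit in one 64-byte line

# So in NCHWc the inner c channels map to a single contiguous vector
# load, which seems to be what a hand-tiled AVX-512 kernel wants.
```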