Replies: 7 comments
-
It's nice that you are using the MKL-DNN backend :) Winograd outperforms direct convolution for small filter sizes, and we plan to enable it as a next step.
-
Thanks, let me keep this thread open to track progress.
-
@masahi Thanks for asking. Since Winograd may impact numerical stability and model accuracy, I don't think it can be chosen by the library implicitly. Besides, Winograd doesn't always outperform direct convolution in MKL-DNN; it depends on the implementation and the kernel size/shape. To make this feature request clearer, could you also explain what your model looks like and what you expect from MKL-DNN Winograd?
-
@TaoLv Good point on the numerical accuracy of Winograd. The performance/accuracy trade-off is an interesting topic. Since you already have int8 convolution (which trades accuracy for performance far more than Winograd does), it also makes sense to make Winograd available so users can opt in. I want to test the performance of MKL-DNN's AVX-512 Winograd against my homegrown Winograd implementation written in TVM. I can't talk about my architecture, but you can imagine a network where almost all convolutions are 3 x 3, similar to VGG. I tested my network on both MXNet + MKL-DNN and TVM + NNVM. At the moment, the TVM-compiled model with Winograd convolution is much faster than MXNet + MKL-DNN with AVX-512 direct convolution. I expect a big speedup in MXNet from enabling MKL-DNN Winograd.
-
I meant that MKL-DNN shouldn't "choose the best algo automatically" because of the accuracy issue. At least there should be an environment variable to enable it at the framework level. INT8 is also enabled by the user explicitly, and it needs to be tuned to exclude some layers for accuracy. If we agree that Winograd cannot be applied to all convolution layers, this kind of tuning should also be applied at the model level.
Did you get a chance to compare TVM and MKL-DNN Winograd at the kernel level? Maybe you can try benchdnn for that.
-
Yes, something like an environment variable to enable Winograd is also what I had in mind. Thanks for suggesting benchdnn. I'll try it.
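As an editorial aside, here is a minimal sketch of the kind of framework-level switch discussed above. It is not MXNet code: the environment variable name MXNET_MKLDNN_WINOGRAD and the helper PickConvAlgorithm are hypothetical, and an MKL-DNN 0.x-style C++ API is assumed.

```cpp
#include <cstdlib>
#include <string>
#include "mkldnn.hpp"  // MKL-DNN C++ API (now oneDNN)

// Hypothetical opt-in: read an environment variable to decide whether the
// framework should request Winograd instead of hard-coding convolution_direct.
// The variable name MXNET_MKLDNN_WINOGRAD is made up for illustration.
inline mkldnn::algorithm PickConvAlgorithm() {
    const char *flag = std::getenv("MXNET_MKLDNN_WINOGRAD");
    if (flag != nullptr && std::string(flag) == "1")
        return mkldnn::algorithm::convolution_winograd;
    return mkldnn::algorithm::convolution_direct;
}
```

A per-layer tuning pass, like the one described for INT8 above, could then override this default for layers where Winograd hurts accuracy.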
-
@mxnet-label-bot [Question]
-
Hi, I am trying MXNet + MKL-DNN on an AVX-512-capable machine.
Looking at the code in mkldnn_convolution.cc, I'm assuming that MXNet always chooses the direct algorithm for MKL-DNN convolution (by specifying mkldnn::algorithm::convolution_direct). But MKL-DNN implements a Winograd algorithm for AVX-512, and it should be much faster than the direct algorithm when applicable.
Why not enable it when AVX-512 support is present? The complications are that we need to check the CPU features and that the Winograd algorithm is applicable only to certain filter shapes. Ideally, MKL-DNN should choose the best algorithm automatically so that MXNet doesn't need to specify which algorithm to use. @pengzhao-intel @TaoLv
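For reference, a rough sketch of how such a selection could work on top of the MKL-DNN C++ API referenced above: try to create a Winograd primitive descriptor first and fall back to direct convolution when the library reports the combination as unsupported. This is not MXNet's actual code; signatures follow the MKL-DNN 0.x-style API and vary across versions, and the helper name CreateConvPd is made up here.

```cpp
#include <stdexcept>
#include "mkldnn.hpp"

using namespace mkldnn;

// Sketch only: prefer Winograd, fall back to direct convolution when MKL-DNN
// has no Winograd implementation for this shape/ISA (primitive descriptor
// creation then throws mkldnn::error).
convolution_forward::primitive_desc CreateConvPd(
        const engine &eng, const memory::desc &src_md,
        const memory::desc &weights_md, const memory::desc &dst_md,
        const memory::dims &strides, const memory::dims &padding) {
    for (auto alg : {algorithm::convolution_winograd,
                     algorithm::convolution_direct}) {
        try {
            auto desc = convolution_forward::desc(
                    prop_kind::forward_inference, alg, src_md, weights_md,
                    dst_md, strides, padding, padding, padding_kind::zero);
            return convolution_forward::primitive_desc(desc, eng);
        } catch (const mkldnn::error &) {
            // This algorithm is not applicable here; try the next one.
        }
    }
    throw std::runtime_error("no suitable convolution implementation");
}
```

Later MKL-DNN/oneDNN releases also expose algorithm::convolution_auto, which lets the library pick the algorithm itself; whether that is acceptable given the accuracy concerns raised in this thread is a separate question.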