[QNN] ReduceL2 Support #22636
Conversation
You can commit the suggested changes from lintrunner.
LGTM. @adrianlizarraga Any concern?
std::move(param_tensor_names),
logger, do_op_validation, GetQnnOpType(node_unit.OpType())));
if (node_unit.OpType() == "ReduceL2") {
  const auto& input = node_unit.Inputs()[0];
Can you please add a quick comment about what is happening in this branch? From what I can see, this is building the QNN node sequence Mul -> ReduceSum -> Sqrt to reproduce the effect of an ONNX ReduceL2.
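For readers following along, the decomposition rests on this identity (a restatement of the approach for context, not text from the PR):

```math
\operatorname{ReduceL2}(x) = \sqrt{\textstyle\sum_{i \in \text{axes}} x_i^2}
  = \operatorname{Sqrt}\big(\operatorname{ReduceSum}(\operatorname{Mul}(x, x))\big)
```

so chaining ElementWiseMultiply, ReduceSum, and Sqrt over the reduced axes reproduces the ONNX semantics.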
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
QnnTensorWrapper pow2_tensorwrapper(pow2_name, QNN_TENSOR_TYPE_NATIVE, qnn_data_type, output_quantize_param.Copy(),
This is assigning the output of the Mul the same quantization parameters (scale/zp) as the original ReduceSum's output. Is this accurate? I would expect that the output scale of a Pow should be larger than its input scale because you are squaring the input.
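To make the concern concrete, here is a rough scale calculation under an asymmetric 8-bit quantization assumption (illustrative only, not numbers or code from the PR):

```math
s_x \approx \frac{x_{\max} - x_{\min}}{255}, \qquad
s_{x^2} \approx \frac{\max(x_{\min}^2,\, x_{\max}^2)}{255}
```

The ReduceSum and Sqrt outputs likewise have their own ranges, so reusing the final ReduceL2 output scale for the intermediate Mul output could clip large values or waste precision.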
Qnn_TensorType_t output_tensor_type =
    qnn_model_wrapper.IsGraphOutput(output.node_arg.Name()) ? QNN_TENSOR_TYPE_APP_READ : QNN_TENSOR_TYPE_NATIVE;
QnnTensorWrapper sqrt_tensorwrapper(output.node_arg.Name(), output_tensor_type, qnn_data_type,
                                    std::move(output_quantize_param), std::move(output_shape));
The quantization parameters don't seem appropriate, but maybe I'm missing something. I don't see why every tensor in the sequence Mul -> ReduceSum -> Sqrt would use the exact same quantization parameters.
I see that the unit tests are not running in the CI. Could you please merge the latest main branch and get the unit tests running? Maybe we can add more tests to explore the accuracy of this approach.
I actually ran the tests locally and they passed. But you are right, I think the quantization parameters are not accurate, so I updated the code to support non-quantized tensors only for now.
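A minimal sketch of that restriction, assuming the op builder can simply reject quantized ReduceL2 inputs early (the status macro arguments and message wording here are illustrative, not the actual change):

```cpp
// Illustrative guard: fall back to another EP for quantized ReduceL2 until
// per-step quantization parameters are derived correctly.
if (node_unit.OpType() == "ReduceL2" && is_quantized_tensor) {
  return ORT_MAKE_STATUS(ONNXRUNTIME, NOT_IMPLEMENTED,
                         "Quantized ReduceL2 is not yet supported by the QNN EP.");
}
```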
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get shape");
I try to think about log messages from the perspective of the person reading them alongside the broken model: what do they need to know for the message to be meaningful rather than cryptic?
Suggested change:
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get input shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get output shape");
Qnn_DataType_t qnn_data_type = QNN_DATATYPE_FLOAT_32;
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
nit: can we break up the multiple steps here using whitespace so it's easier to quickly process the different parts? When it's one big block I have to read every single line to figure out the steps, which requires keeping a lot of state in limited short-term memory. If I can skip from step to step via whitespace, it's easier to recognize which previous parts are no longer relevant and the cognitive load is much lower, i.e. I have a much higher chance of correctly interpreting the code in a shorter amount of time.
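One possible grouping, purely as an illustration of the nit; argument lists are elided and only calls already visible in this diff are referenced:

```cpp
// Step 1: square the input: pow2 = x * x (ElementWiseMultiply).
// QnnTensorWrapper pow2_tensorwrapper(pow2_name, /* ... */);
// qnn_model_wrapper.CreateQnnNode(pow2_name, QNN_OP_PACKAGE_NAME_QTI_AISW,
//                                 QNN_OP_ELEMENT_WISE_MULTIPLY, /* ... */);

// Step 2: sum the squares over the requested axes (ReduceSum).
// ...

// Step 3: take the square root to produce the ReduceL2 output (Sqrt).
// ...
```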
std::move(input_shape));
ORT_RETURN_IF_NOT(qnn_model_wrapper.AddTensorWrapper(std::move(pow2_tensorwrapper)), "AddTensorWrapper failed");
ORT_RETURN_IF_NOT(
    qnn_model_wrapper.CreateQnnNode(pow2_name, QNN_OP_PACKAGE_NAME_QTI_AISW, QNN_OP_ELEMENT_WISE_MULTIPLY,
nit: add a comment explaining why QNN_OP_ELEMENT_WISE_MULTIPLY is being used
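For example, something along these lines above the CreateQnnNode call; the wording and the stated rationale are my reading of the approach, not text from the PR:

```cpp
// ReduceL2 is lowered as Sqrt(ReduceSum(x * x)). The squaring step is expressed as
// ElementWiseMultiply(x, x) instead of a dedicated power op, so the whole decomposition
// stays within element-wise ops that the QNN backends broadly support.
```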
Force-pushed from 35e6d6b to 599579d.
Add ReduceL2 support to the QNN EP. Some of the QNN AI Hub models contain ReduceL2, such as openai_clip_CLIPTextEncoder and openai_clip_CLIPImageEncoder. Without this PR, the ReduceL2 node is assigned to the CPU and the graph is split into two QNN graphs; with this PR, all nodes are placed on the QNN EP.
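As a quick way to check the partitioning improvement, one can load such a model with the QNN EP and inspect the verbose log's node-to-EP assignments. A minimal sketch using the ONNX Runtime C++ API, where the model path and backend_path value are placeholders:

```cpp
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  // Verbose logging prints which execution provider each node is assigned to.
  Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "qnn_reducel2_check");
  Ort::SessionOptions so;

  // Placeholder backend library; use the HTP or CPU QNN backend available on the device.
  std::unordered_map<std::string, std::string> qnn_options{{"backend_path", "QnnHtp.dll"}};
  so.AppendExecutionProvider("QNN", qnn_options);

  // Placeholder model path: a model containing ReduceL2 (e.g. a CLIP encoder export).
  Ort::Session session(env, ORT_TSTR("clip_text_encoder.onnx"), so);
  return 0;
}
```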