[QNN] ReduceL2 Support #22636
Conversation
You can commit the suggested changes from lintrunner.
LGTM. @adrianlizarraga Any concern?
std::move(param_tensor_names),
logger, do_op_validation, GetQnnOpType(node_unit.OpType())));
if (node_unit.OpType() == "ReduceL2") {
  const auto& input = node_unit.Inputs()[0];
Can you please add a quick comment about what is happening in this branch? From what I can see, this is building the QNN node sequence Mul -> ReduceSum -> Sqrt to reproduce the effect of an ONNX ReduceL2.
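For readers following along, the decomposition rests on this identity (a restatement of the approach for context, not text from the PR):

```math
\operatorname{ReduceL2}(x) = \sqrt{\textstyle\sum_{i \in \text{axes}} x_i^2}
  = \operatorname{Sqrt}\big(\operatorname{ReduceSum}(\operatorname{Mul}(x, x))\big)
```

so chaining ElementWiseMultiply, ReduceSum, and Sqrt over the reduced axes reproduces the ONNX semantics.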
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
QnnTensorWrapper pow2_tensorwrapper(pow2_name, QNN_TENSOR_TYPE_NATIVE, qnn_data_type, output_quantize_param.Copy(),
This is assigning the output of the Mul the same quantization parameters (scale/zp) as the original ReduceSum's output. Is this accurate? I would expect that the output scale of a Pow should be larger than its input scale because you are squaring the input.
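To make the concern concrete, here is a rough scale calculation under an asymmetric 8-bit quantization assumption (illustrative only, not numbers or code from the PR):

```math
s_x \approx \frac{x_{\max} - x_{\min}}{255}, \qquad
s_{x^2} \approx \frac{\max(x_{\min}^2,\, x_{\max}^2)}{255}
```

The ReduceSum and Sqrt outputs likewise have their own ranges, so reusing the final ReduceL2 output scale for the intermediate Mul output could clip large values or waste precision.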
Qnn_TensorType_t output_tensor_type =
    qnn_model_wrapper.IsGraphOutput(output.node_arg.Name()) ? QNN_TENSOR_TYPE_APP_READ : QNN_TENSOR_TYPE_NATIVE;
QnnTensorWrapper sqrt_tensorwrapper(output.node_arg.Name(), output_tensor_type, qnn_data_type,
                                    std::move(output_quantize_param), std::move(output_shape));
The quantization parameters don't seem appropriate, but maybe I'm missing something. I don't see why every tensor in the sequence Mul -> ReduceSum -> Sqrt would use the exact same quantization parameters.
I see that the unit tests are not running in the CI. Could you please merge the latest main branch and get the unit tests running? Maybe we can add more tests to explore the accuracy of this approach.
I actually ran the tests locally and they passed. But you are right, I think the quantization parameters are not accurate, so I updated the code to support non-quantized tensors only for now.
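A minimal sketch of that restriction, assuming the op builder can simply reject quantized ReduceL2 inputs early (the status macro arguments and message wording here are illustrative, not the actual change):

```cpp
// Illustrative guard: fall back to another EP for quantized ReduceL2 until
// per-step quantization parameters are derived correctly.
if (node_unit.OpType() == "ReduceL2" && is_quantized_tensor) {
  return ORT_MAKE_STATUS(ONNXRUNTIME, NOT_IMPLEMENTED,
                         "Quantized ReduceL2 is not yet supported by the QNN EP.");
}
```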
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get shape");
I try to think about log messages from the perspective of the person reading them alongside the broken model: what do they need to know for the message to be meaningful rather than cryptic?
Suggested change:
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get input shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get output shape");
Qnn_DataType_t qnn_data_type = QNN_DATATYPE_FLOAT_32;
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
nit: can we break up the multiple steps here using whitespace so it's easier to quickly process the different parts? When it's one big block I have to read every single line to figure out the steps, which requires keeping a lot of state in limited short-term memory. If I can skip from step to step via whitespace, it's easier to recognize which previous parts are no longer relevant and the cognitive load is much lower, i.e. I have a much higher chance of correctly interpreting the code in a shorter amount of time.
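One possible grouping, purely as an illustration of the nit; argument lists are elided and only calls already visible in this diff are referenced:

```cpp
// Step 1: square the input: pow2 = x * x (ElementWiseMultiply).
// QnnTensorWrapper pow2_tensorwrapper(pow2_name, /* ... */);
// qnn_model_wrapper.CreateQnnNode(pow2_name, QNN_OP_PACKAGE_NAME_QTI_AISW,
//                                 QNN_OP_ELEMENT_WISE_MULTIPLY, /* ... */);

// Step 2: sum the squares over the requested axes (ReduceSum).
// ...

// Step 3: take the square root to produce the ReduceL2 output (Sqrt).
// ...
```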
std::move(input_shape));
ORT_RETURN_IF_NOT(qnn_model_wrapper.AddTensorWrapper(std::move(pow2_tensorwrapper)), "AddTensorWrapper failed");
ORT_RETURN_IF_NOT(
    qnn_model_wrapper.CreateQnnNode(pow2_name, QNN_OP_PACKAGE_NAME_QTI_AISW, QNN_OP_ELEMENT_WISE_MULTIPLY,
nit: add a comment explaining why QNN_OP_ELEMENT_WISE_MULTIPLY is being used
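For example, something along these lines above the CreateQnnNode call; the wording and the stated rationale are my reading of the approach, not text from the PR:

```cpp
// ReduceL2 is lowered as Sqrt(ReduceSum(x * x)). The squaring step is expressed as
// ElementWiseMultiply(x, x) instead of a dedicated power op, so the whole decomposition
// stays within element-wise ops that the QNN backends broadly support.
```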
Force-pushed from 35e6d6b to 599579d.
Add ReduceL2 support to the QNN EP. Some of the QNN AI Hub models contain ReduceL2, such as openai_clip_CLIPTextEncoder and openai_clip_CLIPImageEncoder. Without this PR, the ReduceL2 node is assigned to the CPU and the graph is split into two QNN graphs; with this PR, all nodes are placed on the QNN EP.
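As a quick way to check the partitioning improvement, one can load such a model with the QNN EP and inspect the verbose log's node-to-EP assignments. A minimal sketch using the ONNX Runtime C++ API, where the model path and backend_path value are placeholders:

```cpp
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  // Verbose logging prints which execution provider each node is assigned to.
  Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "qnn_reducel2_check");
  Ort::SessionOptions so;

  // Placeholder backend library; use the HTP or CPU QNN backend available on the device.
  std::unordered_map<std::string, std::string> qnn_options{{"backend_path", "QnnHtp.dll"}};
  so.AppendExecutionProvider("QNN", qnn_options);

  // Placeholder model path: a model containing ReduceL2 (e.g. a CLIP encoder export).
  Ort::Session session(env, ORT_TSTR("clip_text_encoder.onnx"), so);
  return 0;
}
```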