
[QNN] ReduceL2 Support #22636

Open · wants to merge 3 commits into main

Conversation

centwang (Contributor)

Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models, such as openai_clip_CLIPTextEncoder and openai_clip_CLIPImageEncoder, contain ReduceL2. Without this PR, the ReduceL2 node is assigned to the CPU EP and the graph is split into 2 QNN graphs; with this PR, all nodes are placed on the QNN EP.

centwang added the ep:QNN label (issues related to QNN execution provider) on Oct 29, 2024
github-actions bot left a comment

You can commit the suggested changes from lintrunner.

onnxruntime/test/providers/qnn/reduce_op_test.cc (outdated; resolved)
cloudhan (Contributor) left a comment

LGTM. @adrianlizarraga Any concern?

std::move(param_tensor_names),
logger, do_op_validation, GetQnnOpType(node_unit.OpType())));
if (node_unit.OpType() == "ReduceL2") {
const auto& input = node_unit.Inputs()[0];
Contributor

Can you please add a quick comment about what is happening in this branch? From what I can see, this is building the QNN node sequence Mul -> ReduceSum -> Sqrt to reproduce the effect of an ONNX ReduceL2.
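For reference, a minimal standalone sketch (an editor's addition, not ORT/QNN code) of the identity this node sequence reproduces, ReduceL2(x) = Sqrt(ReduceSum(x * x)):

#include <cassert>
#include <cmath>
#include <vector>

// Reference ReduceL2 over all elements, written as the same three steps the
// builder emits as QNN nodes: Mul (square), ReduceSum, Sqrt.
float ReduceL2(const std::vector<float>& x) {
  float sum_of_squares = 0.0f;
  for (float v : x) sum_of_squares += v * v;  // Mul + ReduceSum
  return std::sqrt(sum_of_squares);           // Sqrt
}

int main() {
  std::vector<float> x = {3.0f, 4.0f};
  assert(std::fabs(ReduceL2(x) - 5.0f) < 1e-6f);  // sqrt(9 + 16) = 5
  return 0;
}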

ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
QnnTensorWrapper pow2_tensorwrapper(pow2_name, QNN_TENSOR_TYPE_NATIVE, qnn_data_type, output_quantize_param.Copy(),
adrianlizarraga (Contributor) commented on Nov 6, 2024

This is assigning the output of the Mul the same quantization parameters (scale/zp) as the original ReduceSum's output. Is this accurate? I would expect that the output scale of a Pow should be larger than its input scale because you are squaring the input.
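To make the concern concrete, a back-of-the-envelope sketch with made-up numbers (the input range and element count below are assumptions, not values from the PR), showing that the Mul, ReduceSum, and Sqrt outputs span very different ranges and would therefore want different scales:

#include <cmath>
#include <cstdio>

int main() {
  const float input_max = 2.0f;   // assume the ReduceL2 input covers [-2, 2]
  const int reduced_elems = 512;  // assume 512 elements are reduced

  const float mul_max = input_max * input_max;    // x*x  -> up to 4
  const float sum_max = mul_max * reduced_elems;  // sum  -> up to 2048
  const float sqrt_max = std::sqrt(sum_max);      // sqrt -> up to ~45.3

  // Per-tensor uint8 scales if each tensor got its own quantization range.
  std::printf("mul scale  ~ %f\n", mul_max / 255.0f);   // ~0.016
  std::printf("sum scale  ~ %f\n", sum_max / 255.0f);   // ~8.03
  std::printf("sqrt scale ~ %f\n", sqrt_max / 255.0f);  // ~0.18
  return 0;
}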

Qnn_TensorType_t output_tensor_type =
qnn_model_wrapper.IsGraphOutput(output.node_arg.Name()) ? QNN_TENSOR_TYPE_APP_READ : QNN_TENSOR_TYPE_NATIVE;
QnnTensorWrapper sqrt_tensorwrapper(output.node_arg.Name(), output_tensor_type, qnn_data_type,
std::move(output_quantize_param), std::move(output_shape));
Contributor

The quantization parameters don't seem appropriate, but maybe I'm missing something. I don't see why every tensor in the sequence Mul -> ReduceSum -> Sqrt would use the same exact quantization parameters.

I see that the unit tests are not running in the CI. Could you please merge the latest main branch and get the unit tests running? Maybe we can add more tests to explore the accuracy of this approach.

centwang (Contributor, Author)

I actually ran the tests locally and they passed. But you are right; I think the quantization parameters are not accurate, so I updated the code to support non-quantized tensors only for now.
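A minimal sketch of the kind of guard this describes, assuming an IsOpSupported-style check in the reduce op builder; the names below follow common ORT QNN EP conventions but are illustrative, not the actual diff:

// Illustrative only: reject quantized ReduceL2 so that only the float path
// takes the Mul -> ReduceSum -> Sqrt decomposition.
Status ReduceOpBuilder::IsOpSupported(QnnModelWrapper& qnn_model_wrapper,
                                      const NodeUnit& node_unit,
                                      const logging::Logger& logger) const {
  if (node_unit.OpType() == "ReduceL2" && node_unit.Inputs()[0].quant_param.has_value()) {
    return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL,
                           "QNN EP: quantized ReduceL2 is not supported yet.");
  }
  return BaseOpBuilder::IsOpSupported(qnn_model_wrapper, node_unit, logger);
}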

Comment on lines 256 to 258
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get shape");
Contributor

I try to think about log messages from the perspective of the person reading it with the broken model and what they need to know for the message to be meaningful/not cryptic.

Suggested change:
- ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get shape");
- std::vector<uint32_t> output_shape;
- ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get shape");
+ ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get input shape");
+ std::vector<uint32_t> output_shape;
+ ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get output shape");

Comment on lines 263 to 265
Qnn_DataType_t qnn_data_type = QNN_DATATYPE_FLOAT_32;
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";
Contributor

nit: can we break up the multiple steps here using whitespace so it's easier to quickly process the different parts? when it's one big block I have to read every single line to figure out the steps, which requires keeping a lot of state in limited short term memory. if I'm skipping from step to step via whitespace it's easier to recognize the previous parts that may no longer be relevant and the cognitive load is much lower. i.e. I have a much higher chance of correctly interpreting the code in a shorter amount of time.
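Illustrative only, using the lines already visible in this review: the same sequence grouped into steps with blank lines and step comments, along the lines the reviewer suggests (surrounding variables such as input, output, type_proto, is_quantized_tensor, and input_names come from the enclosing builder function):

// 1. Resolve the ONNX input/output shapes.
std::vector<uint32_t> input_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(input.node_arg, input_shape), "Cannot get input shape");
std::vector<uint32_t> output_shape;
ORT_RETURN_IF_NOT(qnn_model_wrapper.GetOnnxShape(output.node_arg, output_shape), "Cannot get output shape");

// 2. Determine the QNN data type of the tensors.
Qnn_DataType_t qnn_data_type = QNN_DATATYPE_FLOAT_32;
ORT_RETURN_IF_ERROR(utils::GetQnnDataType(is_quantized_tensor, type_proto, qnn_data_type));

// 3. Name the intermediate x*x tensor produced by ElementWiseMultiply.
const std::string input_name = input_names[0];
const std::string pow2_name = input_name + "_ort_qnn_ep_pow2";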

std::move(input_shape));
ORT_RETURN_IF_NOT(qnn_model_wrapper.AddTensorWrapper(std::move(pow2_tensorwrapper)), "AddTensorWrapper failed");
ORT_RETURN_IF_NOT(
qnn_model_wrapper.CreateQnnNode(pow2_name, QNN_OP_PACKAGE_NAME_QTI_AISW, QNN_OP_ELEMENT_WISE_MULTIPLY,
Contributor

nit: add a comment explaining why QNN_OP_ELEMENT_WISE_MULTIPLY is being used
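One possible wording for that comment, placed directly above the CreateQnnNode call shown above; the rationale here is this editor's reading of the decomposition, not text from the PR:

// Square the input by passing the same tensor twice to ElementWiseMultiply
// (x * x); together with the ReduceSum and Sqrt nodes added below, this
// reproduces ONNX ReduceL2 = Sqrt(ReduceSum(x * x)).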

Labels: ep:QNN (issues related to QNN execution provider)
Projects: None yet
4 participants