
pre_explicit_broadcast should not expand scalar tensor #573

Open
peiciwu opened this issue Jan 18, 2024 · 13 comments
Labels
GPU Delegate, OP:Add, OP:Div, OP:Gemm, OP:Mul, OP:Sub, TODO

Comments


peiciwu commented Jan 18, 2024

Issue Type

Feature Request

OS

Linux

onnx2tf version number

1.19.7

onnx version number

1.15.0

onnxruntime version number

1.16.3

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0

Download URL for ONNX

Parameter Replacement JSON

none

Description

  1. The converted tflite graph cannot run through ArmNN with the Arm Compute Library GPU backend (which is faster than tflite on the native CPU).

  2. For the ONNX node sequence Input -> Gemm -> Mul (with a scalar), with batch=1:

[screenshot: original ONNX graph]

The converted tflite graph contains a FullyConnected layer whose bias has two dimensions, while the original bias is one-dimensional:
[screenshot: converted tflite graph]

Since the new dimension is 1, mathematically the converted tflite graph is still correct. Unfortunately, Arm Compute Library currently errors out if the bias has more than one dimension: https://github.com/ARM-software/ComputeLibrary/blob/c2a79a4b8c51ce835eaf984f3a1370447b3282c4/src/cpu/operators/CpuFullyConnected.cpp#L431

  3. It appears that the extra dimension is added in pre_explicit_broadcast (https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858). Could this function be changed to do nothing when the tensor is a scalar?
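For illustration, here is a rough sketch of the guard I have in mind (the signature and parameter names are my guess, not the actual code in common_functions.py):

```python
import numpy as np

def pre_explicit_broadcast(const_or_var_1, const_or_var_2):
    # Sketch of the proposed early return: if either operand is a
    # scalar (rank 0), normal TF/NumPy broadcasting already handles it,
    # so skip the explicit shape expansion that later surfaces as the
    # extra [1, ...] dimension on the FullyConnected bias.
    if np.ndim(const_or_var_1) == 0 or np.ndim(const_or_var_2) == 0:
        return const_or_var_1, const_or_var_2
    # ...the existing expand/transpose/broadcast logic would run here...
    return const_or_var_1, const_or_var_2
```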
PINTO0309 added the TODO and Lack of reproduction methods labels Jan 18, 2024

PINTO0309 commented Jan 18, 2024

It is a pain in the ass to generate and test the ONNX myself for your request. Please provide me with an ONNX file, even a partial model. I know all I have to do is write a few lines of code in PyTorch. But even that is a pain in the ass.


peiciwu commented Jan 19, 2024

For sure, here is an ONNX file to reproduce: https://mythic.box.com/s/wmwsrpd41vpe27pqv1k3c7z3lb2daxqr

PINTO0309 removed the Lack of reproduction methods label Jan 19, 2024
PINTO0309 added the GPU Delegate and OP:Gemm labels and removed the TODO label Jan 19, 2024

PINTO0309 commented Jan 19, 2024

The tool has an option to optimize for the GPU Delegate, but I noticed a bug in the Gemm implementation while testing, so I fixed it and released v1.19.8.

https://github.com/PINTO0309/onnx2tf/releases/tag/1.19.8

  • Gemm
    • Fixed an Abort bug when converting models with --optimization_for_gpu_delegate or -ofgd.

      [images: onnx / before tflite / after tflite]


peiciwu commented Jan 19, 2024

Thanks for the fix. Unfortunately, I don't want to use --optimization_for_gpu_delegate, since it actually makes ArmNN run slower. Is it possible to have a fix that does not require --optimization_for_gpu_delegate?


PINTO0309 commented Jan 20, 2024

I think you have misunderstood something; pre_explicit_broadcast is irrelevant here. The value corresponding to the bias (onnx: C) is already processed as a scalar inside onnx2tf. However, when the model is converted to tflite, the converter generates a [1,1000] bias on its own. What is at stake in this issue is ONNX's C, i.e., the bias. NNAPI has a long-standing problem with properly handling bias.

onnx2tf/onnx2tf/ops/Gemm.py

Lines 194 to 205 in 6d0c3d8

```python
# cast to attribute data type
x = tf.cast(x, tf.float32)
y = tf.cast(y, tf.float32)
z = tf.cast(z, tf.float32)
if not optimization_for_gpu_delegate:
    if z is not None:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta) * z  # <----- here
    else:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta)
else:
    result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) - (tf.convert_to_tensor(beta) * z) * tf.convert_to_tensor(-1.0, dtype=z.dtype)
```
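(Note that the -ofgd branch is mathematically equivalent to the first: subtracting (beta * z) * (-1.0) is the same as adding beta * z; the rewrite presumably only changes how the tflite converter fuses the ops.)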

Here is a debug printout proving that the conversion treats beta as a scalar and keeps z one-dimensional:

```python
beta
1.0

# onnx: C, tflite: z
z.shape
TensorShape([1000])

(tf.convert_to_tensor(beta) * z).shape
TensorShape([1000])
```

I have known from the beginning that the GPU Delegate cannot handle a bias with two or more dimensions. However, there is a bug in the tflite converter where it does not produce a scalar bias, so I implemented a workaround that I don't like: the GPU optimization option.

You should file an issue with the NNAPI repository or the TensorFlow repository. By the way, I already posted that discussion in a TensorFlow issue over a year ago, but it has been ignored.

Good luck.


peiciwu commented Jan 22, 2024

I ran an ONNX model that does not have a Mul node following the Gemm. The converted tflite graph has a FullyConnected node with a one-dimensional bias, as expected. Here is the ONNX graph: https://mythic.box.com/s/e3n6a0grvtvhgf5x37ve4c568n87ilnq

In my original ONNX graph, if I change https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858 so that it does not expand the tensor when the input is a scalar, the converted tflite graph fuses the Mul into the FullyConnected layer, and that layer's bias is one-dimensional as expected.


PINTO0309 commented Jan 22, 2024

This is very strange... It will take some time to do a little research. I can only guess at this point, but it is possible that the tflite optimizer is doing something when it optimizes the following Mul, rather than this being a Gemm conversion issue.

Somehow, I could roughly guess what the optimizer was doing.

repro_new_onnx_model_v2.onnx.zip

repro_new_onnx_model_v2_1.onnx.zip

repro_new_onnx_model_v2_1_float32.tflite.zip


PS:

  • Batch size? The problem appears only when the batch size is changed to a fixed number such as 1.
    • Without the -b option:
    onnx2tf -i repro_new_onnx_model_v2.onnx
    [image: resulting graph]
    • With the -b option:
    onnx2tf -i repro_new_onnx_model_v2.onnx -b 1
    [image: resulting graph]


peiciwu commented Jan 23, 2024

Thanks for the quick fix. The fix won't work if the graph has a Gemm followed by a series of Mul or Div nodes; both the Mul and the Div get fused into the Gemm, which again produces a two-dimensional bias.

Here is an ONNX file with Gemm -> Mul -> Div: https://mythic.box.com/s/i6ginfybruy1ibeg5pmzluxh6vgj9ye3


PINTO0309 commented Jan 23, 2024

Understood. However, I am skeptical that onnx2tf really needs to support that pattern automatically. I understand that a clean conversion would be great for everyone, but the two consecutive scalar Mul and Div should be possible to pre-fuse before exporting to ONNX.

I don't know how you are generating your model, but couldn't you just calculate 1.9921875 / 1.072122573852539 = 1.858171396 in advance?
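For example (plain Python; the two constants are assumed to be the scalar values from the Mul and Div nodes in the model above):

```python
# Fold the consecutive scalar Mul and Div into a single constant
# before exporting to ONNX, leaving only one scalar Mul after the Gemm.
mul_const = 1.9921875            # scalar applied by the Mul node
div_const = 1.072122573852539    # scalar applied by the Div node
fused_const = mul_const / div_const
print(fused_const)               # ~1.858171396
```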


peiciwu commented Jan 23, 2024

This is an ONNX model that relies on ONNX Runtime to do graph optimization (including fusion), but ONNX Runtime doesn't export the optimized model, as its optimization is done in memory only. I might be able to trace its PyTorch model and do the fusion there, where the operators are kept separate to serve different quantization needs per hardware backend, even though the separation becomes irrelevant for float32 in the final tflite model.

Is it absolutely necessary for pre_explicit_broadcast to expand the scalar tensor? explicit_broadcast does not broadcast scalar tensors, so why does pre_explicit_broadcast need to do so? It would be an easier change than adding pattern matching.


PINTO0309 commented Jan 23, 2024

> why does pre_explicit_broadcast need to do so? It would be an easier change than adding pattern matching.

Skipping pre_explicit_broadcast would suit the convenience of your model only. I have already tested about 600 different model conversions, including a more complex model with 150,000 OPs. The dimensional conversion is not yet perfect, but so far I have determined that expanding the dimensions and then transposing and broadcasting is the best means of reducing the probability of error. The problem is not as simple as you think.

I'll look into a solution for Mul -> Div, Mul -> Mul, Div -> Mul -> Div -> Mul, Mul -> Add, Add -> Mul, ... in a bit.

Idea note: search for all consecutive scalar operations after the Gemm; it doesn't matter whether there are 2 or a million, just keep looking for the next scalar operation. However, if an output has two branches, the process becomes too complicated, so the search ends there.
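A minimal sketch of that search (all helper names here are hypothetical, not onnx2tf APIs):

```python
def collect_scalar_chain(gemm_node, consumers, is_scalar_elementwise):
    """Collect the consecutive scalar Mul/Div/Add/Sub ops after a Gemm.

    Assumes `consumers[node]` lists the nodes reading that node's output,
    and `is_scalar_elementwise(node)` is True for an elementwise op whose
    second operand is a scalar constant. The walk stops at a branch
    (more than one consumer) or at the first non-scalar op.
    """
    chain = []
    node = gemm_node
    while True:
        nexts = consumers.get(node, [])
        if len(nexts) != 1:                 # branch or graph output: stop
            break
        nxt = nexts[0]
        if not is_scalar_elementwise(nxt):  # non-scalar op: stop
            break
        chain.append(nxt)
        node = nxt
    return chain
```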

PINTO0309 added the TODO label Jan 23, 2024

peiciwu commented Jan 23, 2024

I think the consecutive-scalar pattern search would unfortunately add some complexity.

I looked at the code change (621bd27), and it seems the original intent was not about scalar tensors, so perhaps a workaround is an extra user-specified option that skips expanding the tensor for Mul/Div nodes, etc. But again, it's a user-defined option, which I understand adds extra maintenance liability.

I'm fine with closing the issue as is; I will try to clean up the graph on my end.

PINTO0309 added and removed the TODO label Apr 22, 2024