
pre_explicit_broadcast should not expand scalar tensor #573

Open
peiciwu opened this issue Jan 18, 2024 · 13 comments
Labels
GPU Delegate, OP:Add, OP:Div, OP:Gemm, OP:Mul, OP:Sub, TODO

Comments


peiciwu commented Jan 18, 2024

Issue Type

Feature Request

OS

Linux

onnx2tf version number

1.19.7

onnx version number

1.15.0

onnxruntime version number

1.16.3

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0

Download URL for ONNX

Parameter Replacement JSON

none

Description

  1. The converted tflite graph cannot run through ArmNN with the Arm Compute Library GPU backend (which is faster than tflite on the native CPU).

  2. For the ONNX node sequence Input -> Gemm -> Mul (with a scalar), with batch=1:

[screenshot: original ONNX graph]

The converted tflite graph contains a FullyConnected layer whose bias has two dimensions, while the original bias is one-dimensional:
[screenshot: converted tflite graph]

Since the new dimension is 1, mathematically the converted tflite graph is still correct. Unfortunately, Arm Compute Library currently errors out if the bias has more than one dimension: https://github.com/ARM-software/ComputeLibrary/blob/c2a79a4b8c51ce835eaf984f3a1370447b3282c4/src/cpu/operators/CpuFullyConnected.cpp#L431

  3. It appears that the extra dimension is added in pre_explicit_broadcast (https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858). Could this function be changed to do nothing when the tensor is a scalar?
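For illustration, here is a rough sketch of the guard I have in mind (the signature and parameter names are my guess, not the actual code in common_functions.py):

```python
import numpy as np

def pre_explicit_broadcast(const_or_var_1, const_or_var_2):
    # Sketch of the proposed early return: if either operand is a
    # scalar (rank 0), normal TF/NumPy broadcasting already handles it,
    # so skip the explicit shape expansion that later surfaces as the
    # extra [1, ...] dimension on the FullyConnected bias.
    if np.ndim(const_or_var_1) == 0 or np.ndim(const_or_var_2) == 0:
        return const_or_var_1, const_or_var_2
    # ...the existing expand/transpose/broadcast logic would run here...
    return const_or_var_1, const_or_var_2
```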
PINTO0309 added the TODO and Lack of reproduction methods labels Jan 18, 2024

PINTO0309 commented Jan 18, 2024

It is a pain in the ass to generate and test the ONNX myself for your request. Please provide me with an ONNX file, even a partial model. I know all I have to do is write a few lines of code in PyTorch. But even that is a pain in the ass.


peiciwu commented Jan 19, 2024

For sure, here is an ONNX file to reproduce: https://mythic.box.com/s/wmwsrpd41vpe27pqv1k3c7z3lb2daxqr

PINTO0309 removed the Lack of reproduction methods label Jan 19, 2024
PINTO0309 added the GPU Delegate and OP:Gemm labels and removed the TODO label Jan 19, 2024

PINTO0309 commented Jan 19, 2024

The tool has an option to optimize for the GPU Delegate, but I noticed a bug in the Gemm implementation while testing, so I fixed it and released v1.19.8.

https://github.com/PINTO0309/onnx2tf/releases/tag/1.19.8

  • Gemm
    • Fixed an Abort bug when converting models with --optimization_for_gpu_delegate or -ofgd.

      [images: onnx / before tflite / after tflite]


peiciwu commented Jan 19, 2024

Thanks for the fix. Unfortunately, I don't want to use --optimization_for_gpu_delegate, since it actually makes ArmNN run slower. Is it possible to have a fix that does not require --optimization_for_gpu_delegate?


PINTO0309 commented Jan 20, 2024

I think you have misunderstood something; pre_explicit_broadcast is irrelevant here. The value corresponding to the bias (onnx: C) is already processed as a scalar inside onnx2tf. However, when the model is converted to tflite, the converter generates a [1,1000] bias on its own. What is at stake in this issue is ONNX's C, i.e., the bias. NNAPI has a long-standing problem with properly handling bias.

onnx2tf/onnx2tf/ops/Gemm.py

Lines 194 to 205 in 6d0c3d8

```python
# cast to attribute data type
x = tf.cast(x, tf.float32)
y = tf.cast(y, tf.float32)
z = tf.cast(z, tf.float32)
if not optimization_for_gpu_delegate:
    if z is not None:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta) * z  # <----- here
    else:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta)
else:
    result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) - (tf.convert_to_tensor(beta) * z) * tf.convert_to_tensor(-1.0, dtype=z.dtype)
```
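(Note that the -ofgd branch is mathematically equivalent to the first: subtracting (beta * z) * (-1.0) is the same as adding beta * z; the rewrite presumably only changes how the tflite converter fuses the ops.)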

Here is a debug printout proving that the conversion treats beta as a scalar and keeps z one-dimensional:

```python
beta
1.0

# onnx: C, tflite: z
z.shape
TensorShape([1000])

(tf.convert_to_tensor(beta) * z).shape
TensorShape([1000])
```

I have known from the beginning that the GPU Delegate cannot handle a bias with two or more dimensions. However, there is a bug in the tflite converter where it does not produce a scalar bias, so I implemented a workaround that I don't like: the GPU optimization option.

You should file an issue with the NNAPI repository or the TensorFlow repository. By the way, I already posted that discussion in a TensorFlow issue over a year ago, but it has been ignored.

Good luck.


peiciwu commented Jan 22, 2024

I ran an ONNX model that does not have a Mul node following the Gemm. The converted tflite graph has a FullyConnected node with a one-dimensional bias, as expected. Here is the ONNX graph: https://mythic.box.com/s/e3n6a0grvtvhgf5x37ve4c568n87ilnq

In my original ONNX graph, if I change https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858 so that it does not expand the tensor when the input is a scalar, the converted tflite graph fuses the Mul into the FullyConnected layer, and that layer's bias is one-dimensional as expected.


PINTO0309 commented Jan 22, 2024

This is very strange... It will take some time to do a little research. I can only guess at this point, but it is possible that the tflite optimizer is doing something when it optimizes the following Mul, rather than this being a Gemm conversion issue.

Somehow, I could roughly guess what the optimizer was doing.

repro_new_onnx_model_v2.onnx.zip

repro_new_onnx_model_v2_1.onnx.zip

repro_new_onnx_model_v2_1_float32.tflite.zip


PS:

  • Batch size? The problem appears only when the batch size is changed to a fixed number such as 1.
    • Without the -b option:
    onnx2tf -i repro_new_onnx_model_v2.onnx
    [image: resulting graph]
    • With the -b option:
    onnx2tf -i repro_new_onnx_model_v2.onnx -b 1
    [image: resulting graph]


peiciwu commented Jan 23, 2024

Thanks for the quick fix. The fix won't work if the graph has a Gemm followed by a series of Mul or Div nodes; both the Mul and the Div get fused into the Gemm, which again produces a two-dimensional bias.

Here is an ONNX file with Gemm -> Mul -> Div: https://mythic.box.com/s/i6ginfybruy1ibeg5pmzluxh6vgj9ye3


PINTO0309 commented Jan 23, 2024

Understood. However, I am skeptical that onnx2tf really needs to support that pattern automatically. I understand that a clean conversion would be great for everyone, but the two consecutive scalar Mul and Div should be possible to pre-fuse before exporting to ONNX.

I don't know how you are generating your model, but couldn't you just calculate 1.9921875 / 1.072122573852539 = 1.858171396 in advance?
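For example (plain Python; the two constants are assumed to be the scalar values from the Mul and Div nodes in the model above):

```python
# Fold the consecutive scalar Mul and Div into a single constant
# before exporting to ONNX, leaving only one scalar Mul after the Gemm.
mul_const = 1.9921875            # scalar applied by the Mul node
div_const = 1.072122573852539    # scalar applied by the Div node
fused_const = mul_const / div_const
print(fused_const)               # ~1.858171396
```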


peiciwu commented Jan 23, 2024

This is an ONNX model that relies on ONNX Runtime to do graph optimization (including fusion), but ONNX Runtime doesn't export the optimized model, as its optimization is done in memory only. I might be able to trace its PyTorch model and do the fusion there, where the operators are kept separate to serve different quantization needs per hardware backend, even though the separation becomes irrelevant for float32 in the final tflite model.

Is it absolutely necessary for pre_explicit_broadcast to expand the scalar tensor? explicit_broadcast does not broadcast scalar tensors, so why does pre_explicit_broadcast need to do so? It would be an easier change than adding pattern matching.


PINTO0309 commented Jan 23, 2024

> why does pre_explicit_broadcast need to do so? It would be an easier change than adding pattern matching.

Skipping pre_explicit_broadcast would suit the convenience of your model only. I have already tested about 600 different model conversions, including a more complex model with 150,000 OPs. The dimensional conversion is not yet perfect, but so far I have determined that expanding the dimensions and then transposing and broadcasting is the best means of reducing the probability of error. The problem is not as simple as you think.

I'll look into a solution for Mul -> Div, Mul -> Mul, Div -> Mul -> Div -> Mul, Mul -> Add, Add -> Mul, ... in a bit.

Idea note: search for all consecutive scalar operations after the Gemm; it doesn't matter whether there are 2 or a million, just keep looking for the next scalar operation. However, if an output has two branches, the process becomes too complicated, so the search ends there.
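A minimal sketch of that search (all helper names here are hypothetical, not onnx2tf APIs):

```python
def collect_scalar_chain(gemm_node, consumers, is_scalar_elementwise):
    """Collect the consecutive scalar Mul/Div/Add/Sub ops after a Gemm.

    Assumes `consumers[node]` lists the nodes reading that node's output,
    and `is_scalar_elementwise(node)` is True for an elementwise op whose
    second operand is a scalar constant. The walk stops at a branch
    (more than one consumer) or at the first non-scalar op.
    """
    chain = []
    node = gemm_node
    while True:
        nexts = consumers.get(node, [])
        if len(nexts) != 1:                 # branch or graph output: stop
            break
        nxt = nexts[0]
        if not is_scalar_elementwise(nxt):  # non-scalar op: stop
            break
        chain.append(nxt)
        node = nxt
    return chain
```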

PINTO0309 added the TODO label Jan 23, 2024

peiciwu commented Jan 23, 2024

I think the consecutive-scalar pattern search would unfortunately add some complexity.

I looked at the code change (621bd27), and it seems the original intent was not about scalar tensors, so perhaps a workaround is an extra user-specified option that skips expanding the tensor for Mul/Div nodes, etc. But again, it's a user-defined option, which I understand adds extra maintenance liability.

I'm fine with closing the issue as is; I will try to clean up the graph on my end.

PINTO0309 added and removed the TODO label Apr 22, 2024