
1.19: Clip operator with type FLOAT16 defaults to min or max value 0.0 if not explicitly given, breaking many models using FLOAT16 #21957

Closed
lesters opened this issue Sep 2, 2024 · 4 comments
Labels: regression (issues that demonstrate a regression in ORT functionality and need to be addressed immediately)

Comments

lesters commented Sep 2, 2024

Describe the issue

On version 1.19, for type FLOAT16, the Clip operator's min and max values default to 0.0 if they are not explicitly set. This is the wrong behaviour, and it breaks quite a few models that use FLOAT16.

For instance, clipping the tensor [-1, 0.5, 1.0] with a min value of 0.0 (and no max value set) results in [0.0, 0.0, 0.0] instead of the expected [0.0, 0.5, 1.0]. The same happens when a max value is given but no min value. The behaviour is correct when both are set.

If not given, the expected values of min and max are -inf and +inf respectively, not 0.0.

The regular FLOAT type, for instance, is not affected.

This was not a problem on 1.18.

To reproduce

The following Python code fails when run with FLOAT16:

import onnx
import numpy as np
import numpy.testing as npt
import onnxruntime as ort
from onnx import helper


def test_model(onnx_float_type, np_float_type):
    shape = (3,1)
    input_tensor = helper.make_tensor_value_info(name='input', elem_type=onnx_float_type, shape=shape)
    output_tensor = helper.make_tensor_value_info(name='output', elem_type=onnx_float_type, shape=shape)

    min_tensor = helper.make_tensor(
        name='min_tensor',
        data_type=onnx_float_type,
        dims=(1,),
        vals=[np_float_type(0.0)]
    )

    min_node = helper.make_node(
        'Constant',
        inputs=[],
        outputs=['min'],
        value=min_tensor
    )

    clip_node = helper.make_node(
        'Clip',
        inputs=['input', 'min'],  # no 'max' input
        outputs=['output'],
        name='clip'
    )

    graph = helper.make_graph(
        nodes=[min_node, clip_node],
        name='clip_model',
        inputs=[input_tensor],
        outputs=[output_tensor]
    )

    model = helper.make_model(graph)

    model_path = 'clip_model.onnx'
    onnx.save_model(model, model_path)
    session = ort.InferenceSession(model_path)

    test_input = np.array([[-1.0], [0.5], [1.5]], dtype=np_float_type)
    expected_output = np.array([[0.0], [0.5], [1.5]], dtype=np_float_type)

    outputs = session.run(None, {"input": test_input})

    print("Output:", outputs[0])
    npt.assert_array_equal(outputs[0], expected_output)


test_model(onnx.TensorProto.FLOAT, np.float32)
test_model(onnx.TensorProto.FLOAT16, np.float16)

This code does not fail on 1.18.

Urgency

URGENT: This is a blocker for us to upgrade to 1.19 as it breaks quite a few models using FLOAT16.

Platform

Linux

OS Version

all

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@arnej27959 (Contributor) commented:

We think this is triggered by #21493, which enables MLFloat16 in onnxruntime/core/providers/cpu/math/clip.cc.
The implementation uses std::numeric_limits<T>::max() and lowest() as the default values for max_val and min_val, but there is no specialisation of std::numeric_limits for MLFloat16, so the unspecialized primary template is used and the defaults become 0 instead of the extremes Clip needs.

@tianleiwu (Contributor) commented:

@yihonglyu, please fix the CPU Clip with code like this.

@arnej27959 (Contributor) commented:

add-numeric-limits-patch.txt

I suggest adding specializations of std::numeric_limits for the extra floating-point types; that could also be generally useful for other operators. The attached patch works for MLFloat16 and BFloat16.

@sophies927 sophies927 added the regression issues that demonstrate a regression in ORT functionality and need to be addressed immediately label Sep 5, 2024
tianleiwu added a commit that referenced this issue Sep 26, 2024
### Description
* Add std::numeric_limits for MLFloat16 and BFloat16.
* Update some comments in csharp ORTFloat16.shared.cs.
* Add unit tests (including Clip)

Note that the canonical NaN is not consistent between C++ and C#: C# uses a negative quiet NaN as the canonical NaN, while C++ uses a positive quiet NaN. The CSharp Float16.NaN was chosen to be consistent with System.Half.NaN.

FP16 data returned from CUDA might have 0x7FFF as NaN; FP16 data from the CPU provider might have 0x7E00 as NaN. In any case, there is no consistent canonical NaN in ORT right now. Because all of these NaNs conform to the IEEE spec, this should not be an issue downstream.

### Motivation and Context
std::numeric_limits is used in the codebase but was not defined for MLFloat16 and BFloat16. This caused bugs like
#21957, introduced by
#21493.
@yihonglyu (Contributor) commented:

@arnej27959 Now that #22197 is merged, please close this issue if it is resolved.
