Corrupted value for model outputs that are also model inputs #21922

Open
adrianlizarraga opened this issue Aug 29, 2024 · 3 comments

Labels: core runtime (issues related to core runtime) · quantization (issues related to quantization) · stale (issues that have not been addressed in a while; categorized by a bot)


@adrianlizarraga (Contributor)

Describe the issue

There seems to be a memory corruption bug for model outputs that are also model inputs. Consider a model with an input that is also a model output:
[figure: model graph in which 'input_0' feeds an Add node (producing 'plus_10') and is also directly a graph output]

The above model is run multiple times, and each run's outputs are saved into a list. After the run loop, only the saved outputs for input_0 (the output that is also a graph input) have incorrect values, and only for some of the runs. Here's some pseudo-code (full repro script included below):

all_run_outputs = []
for run_index in range(num_runs):
    inputs = get_inputs(run_index)  # build the input dict for this run
    outputs = session.run(None, inputs)

    # All graph outputs are always correct here (immediately after session.run()).
    assert all_outputs_correct(outputs, ...)  # Passes

    # Add outputs to list
    all_run_outputs.append(outputs)

# After the run loop, the saved outputs for 'input_0' (which is also a graph input) are incorrect/corrupted.
for saved_run_outputs in all_run_outputs:
    assert all_outputs_correct(saved_run_outputs, ...)  # Fails
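
For reference, here is a minimal, self-contained sketch (plain numpy, no onnxruntime) of the aliasing behavior that the repro script's ctypes.data assertion detects: when the returned "output" array wraps the same buffer as another array, any later reuse of that buffer silently changes the saved output.

import numpy as np

inp = np.zeros((1, 2, 2, 2), dtype=np.float32)
out = inp.view()                    # stand-in for an output array that aliases the input buffer
assert np.shares_memory(inp, out)   # what the ctypes.data comparison in the repro detects
inp[...] = 6.0                      # later reuse of the buffer...
print(out.flatten()[0])             # ...silently changes the saved "output": prints 6.0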

One workaround is to copy the output numpy arrays for the problematic outputs (e.g., input_0). Here's pseudo-code:

all_run_outputs = []
for run_index in range(num_runs):
    inputs = get_inputs(run_index)  # build the input dict for this run
    outputs = session.run(None, inputs)

    # All graph outputs are always correct here (immediately after session.run()).
    assert all_outputs_correct(outputs, ...)  # Passes

    # Workaround: call np.ndarray.copy() on graph outputs that are also graph inputs.
    fixed_outputs = []
    for i, output in enumerate(outputs):
        if output_names[i] in input_names:
            fixed_outputs.append(output.copy())
        else:
            fixed_outputs.append(output)

    # Add fixed outputs to list
    all_run_outputs.append(fixed_outputs)

# After the run loop, all saved outputs are still correct.
for saved_run_outputs in all_run_outputs:
    assert all_outputs_correct(saved_run_outputs, ...)  # Passes
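
The per-output copy logic can also be factored into a small helper. This is just a sketch of the workaround using the standard onnxruntime Python API (session.get_inputs() / session.get_outputs()), not code from the repository:

def copy_aliased_outputs(session, outputs):
    # Copy any output whose name is also an input name, so the saved
    # array no longer aliases the input buffer.
    input_names = {node_arg.name for node_arg in session.get_inputs()}
    return [
        out.copy() if node_arg.name in input_names else out
        for node_arg, out in zip(session.get_outputs(), outputs)
    ]

With this helper, the loop body becomes all_run_outputs.append(copy_aliased_outputs(session, outputs)).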

To reproduce

Here's a full Python script (repro.py) that reproduces the issue.

from __future__ import annotations
import argparse
import onnx
import onnxruntime
import numpy as np
import ctypes

"""
Reproduces memory corruption error when a graph input is also a graph output

USAGE:
  python repro.py --num_runs 7

USAGE (apply workaround):
  python repro.py --num_runs 7 --use_workaround
"""

SHAPE = (1, 2, 2, 2)

def make_model():
    """
    Makes onnx model with an input that is also a graph output.
    'input_0' ---+----> (is graph output)
                 | 
                 +--> Add(+ 10) --> 'plus_10'
    """
    inp_shape = SHAPE
    input_0 = onnx.helper.make_tensor_value_info("input_0", onnx.TensorProto.FLOAT, inp_shape)
    output_0 = onnx.helper.make_tensor_value_info("plus_10", onnx.TensorProto.FLOAT, inp_shape)
    ten_const = onnx.numpy_helper.from_array(np.array(10, dtype=np.float32), "ten_const")

    add_node = onnx.helper.make_node("Add", ["input_0", "ten_const"], ["plus_10"], name="Add0")
    graph = onnx.helper.make_graph(
        [add_node],
        "AddTen_f32",
        [input_0],
        [output_0, input_0],
        initializer=[ten_const],
    )
    opset_imports = [onnx.helper.make_opsetid("", 21)]
    model = onnx.helper.make_model(graph, opset_imports=opset_imports)
    model = onnx.shape_inference.infer_shapes(model)
    onnx.checker.check_model(model, True)
    return model

def get_inputs(run_index: int):
    return {
        "input_0": np.full(SHAPE, float(run_index), dtype=np.float32),  # elems equal to run_index
    }

def get_expected_outputs(run_index: int):
    return {
        "plus_10": np.full(SHAPE, run_index + 10.0, dtype=np.float32),  # elems equal to run_index + 10
        "input_0": np.full(SHAPE, float(run_index), dtype=np.float32),  # elems equal to run_index
    }

def check_outputs(run_index, output_names, outputs, verbose=False) -> list[bool]:
    """
    Checks that the outputs for a run match the expected values.
    """
    expected_outputs = get_expected_outputs(run_index)
    output_correctness = [True] * len(outputs)
    for i, output in enumerate(outputs):
        output_name = output_names[i]
        expected_output = expected_outputs[output_name]

        if not np.array_equal(output, expected_output):
            if verbose:
                print(f"\tGraph output '{output_name}' is WRONG")
                print(f"\t\texpected: {expected_output.flatten().tolist()}")
                print(f"\t\tactual:   {output.flatten().tolist()}")
            output_correctness[i] = False
        else:
            if verbose:
                print(f"\tGraph output '{output_name}' is correct")
            output_correctness[i] = True
    
    return output_correctness

def parse_args():
    parser = argparse.ArgumentParser(description="Reproduces memory corruption error when a graph input is also a graph output")
    parser.add_argument("--use_workaround", action="store_true", default=False, help="Use a workaround for the problem")
    parser.add_argument("--num_runs", type=int, default=7, help="The number of times to run the model")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    model = make_model()
    sess_options = onnxruntime.SessionOptions()
    sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
    session = onnxruntime.InferenceSession(
        model.SerializeToString(),
        sess_options=sess_options,
        providers=['CPUExecutionProvider'],
    )

    input_names = [node_arg.name for node_arg in session.get_inputs()]
    output_names = [node_arg.name for node_arg in session.get_outputs()]

    # Run the model multiple times and collect all run outputs in a list.
    # For each run, the expected values of the graph outputs 'plus_10' and 'input_0'
    # are (run_index + 10) and run_index, respectively.
    all_run_outputs = []
    for run_index in range(args.num_runs):
        inputs = get_inputs(run_index)
        outputs = session.run(None, inputs)

        # All graph outputs are always correct at this point (immediately after session.run()).
        # However, outputs that are also graph inputs will be incorrect **after** this loop due to a memory corruption(?).
        output_correctness = check_outputs(run_index, output_names, outputs, verbose=False)
        assert all(output_correctness), "All outputs should be correct immediately after session.run()"

        # Check that the input and output numpy arrays for 'input_0' point to the same memory.
        output_index_for_input_0 = output_names.index('input_0')
        assert inputs['input_0'].ctypes.data == outputs[output_index_for_input_0].ctypes.data, "input and output arrays should point to the same data"

        # Add this run's outputs to a list
        if not args.use_workaround:
            all_run_outputs.append(outputs)  # Doesn't work
            #all_run_outputs.append(outputs[:])  # Doesn't work either
        else:
            # The following would work: call np.ndarray.copy() on graph outputs that are also graph inputs.
            fixed_outputs = []
            for i, output in enumerate(outputs):
                if output_names[i] in input_names:
                    fixed_outputs.append(output.copy())
                    # Storing the input np.array would also work, but shouldn't have to do this.
                    #fixed_outputs.append(inputs[output_names[i]])
                else:
                    fixed_outputs.append(output)
            all_run_outputs.append(fixed_outputs)
            
            # The following one-liner also WORKS, but copies all np.arrays unnecessarily
            #all_run_outputs.append([o.copy() for o in outputs])


    assert len(all_run_outputs) == args.num_runs, "Unexpected number of elements in all_run_outputs"

    # Check if the outputs for each run are correct.
    times_output_is_wrong = [0] * len(output_names)
    for run_index, outputs in enumerate(all_run_outputs):
        print(f"\nRun {run_index}")
        output_correctness = check_outputs(run_index, output_names, outputs, verbose=True)

        # Count how many times a graph output has been incorrect
        for i, output_is_correct in enumerate(output_correctness):
            if not output_is_correct:
                times_output_is_wrong[i] += 1
    
    
    # Print a summary of results
    if any(wrong_count > 0 for wrong_count in times_output_is_wrong):
        print("\nFAILURE")
        for i, wrong_count in enumerate(times_output_is_wrong):
            print(f"Number of incorrect '{output_names[i]}' graph outputs = {wrong_count}")
    else:
        print("\nALL OUTPUTS OK")
        

Here's the console output from a sample run:

$ python repro.py --num_runs 7

Run 0
        Graph output 'plus_10' is correct
        Graph output 'input_0' is WRONG
                expected: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
                actual:   [6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0]

Run 1
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 2
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 3
        Graph output 'plus_10' is correct
        Graph output 'input_0' is WRONG
                expected: [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
                actual:   [6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0]

Run 4
        Graph output 'plus_10' is correct
        Graph output 'input_0' is WRONG
                expected: [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
                actual:   [14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0]

Run 5
        Graph output 'plus_10' is correct
        Graph output 'input_0' is WRONG
                expected: [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
                actual:   [15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]

Run 6
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

FAILURE
Number of incorrect 'plus_10' graph outputs = 0
Number of incorrect 'input_0' graph outputs = 4

Here's a run that applies the workaround described above:

$ python repro.py --num_runs 7 --use_workaround

Run 0
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 1
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 2
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 3
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 4
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 5
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

Run 6
        Graph output 'plus_10' is correct
        Graph output 'input_0' is correct

ALL OUTPUTS OK

Urgency

This issue is causing a crash when running the Python quantization tools with the Percentile, Distribution, and Entropy calibration methods. These calibration methods create an augmented ONNX model that makes all model inputs into model outputs. The output data from this augmented model is corrupted, which causes an eventual crash.

def collect_data(self, data_reader: CalibrationDataReader):
    """
    Entropy Calibrator collects operators' tensors as well as generates tensor histogram for each operator.
    """
    while True:
        inputs = data_reader.get_next()
        if not inputs:
            break
        self.intermediate_outputs.append(self.infer_session.run(None, inputs))
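
For comparison, here is a sketch of the same collection loop with the copy workaround applied. This is a hypothetical standalone function (not the actual code later merged in the workaround PR):

def collect_data_with_copy_workaround(infer_session, data_reader):
    input_names = {i.name for i in infer_session.get_inputs()}
    output_names = [o.name for o in infer_session.get_outputs()]
    intermediate_outputs = []
    while True:
        inputs = data_reader.get_next()
        if not inputs:
            break
        outputs = infer_session.run(None, inputs)
        # Copy any output that aliases a model input before storing it.
        intermediate_outputs.append([
            arr.copy() if name in input_names else arr
            for name, arr in zip(output_names, outputs)
        ])
    return intermediate_outputs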

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@adrianlizarraga (Contributor, Author)

Hi @yuslepukhin,
I believe you've dealt with code related to how we wrap numpy arrays over OrtValues. Do you know what could be happening?

adrianlizarraga added a commit that referenced this issue on Sep 9, 2024: …calibrators (#21972)

### Description
- Applies a workaround that prevents the histogram-based calibrators
(percentile, entropy, distribution) from crashing. The workaround
involves copying inference outputs that come directly from model inputs.
A description of the bug is here:
#21922. **This PR does
not fix the root bug, but instead provides a workaround to _unblock_
users using histogram-based calibration.**
- Adds a unit test that runs all histogram-based calibrators to help
catch future regressions. We didn't have unit tests that ran these
calibration methods.

### Motivation and Context
Trying to quantize a model with the percentile, entropy, or distribution
calibration methods raises an exception:
```shell
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 691, in quantize
    quantize_static(
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 525, in quantize_static
    calibrator.collect_data(calibration_data_reader)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 571, in collect_data
    self.collector.collect(clean_merged_dict)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 746, in collect
    return self.collect_value(name_to_arr)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 836, in collect_value
    hist, hist_edges = np.histogram(data_arr, self.num_bins, range=(-threshold, threshold))
  File "<__array_function__ internals>", line 180, in histogram
  File ".../site-packages/numpy/lib/histograms.py", line 793, in histogram
    bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
  File "/.../site-packages/numpy/lib/histograms.py", line 426, in _get_bin_edges
    first_edge, last_edge = _get_outer_edges(a, range)
  File "/.../site-packages/numpy/lib/histograms.py", line 315, in _get_outer_edges
    raise ValueError(
ValueError: supplied range of [nan, nan] is not finite
```

The calibrators create an augmented model with all tensors (including
model inputs) set as model outputs. The data for outputs that are also
model inputs is corrupted as described in
#21922. The corrupted
data sometimes contains `NaN` values that cause numpy's histogram
utilities to raise an exception.
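
The numpy side of the failure is easy to demonstrate in isolation. A minimal standalone sketch (plain numpy, unrelated to the quantization code): a NaN in the collected tensor data makes the histogram range non-finite, so np.histogram raises.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan], dtype=np.float32)
try:
    np.histogram(data, bins=4)
except ValueError as err:
    print(err)  # e.g. "autodetected range of [nan, nan] is not finite"
```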
@github-actions (bot)

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Sep 29, 2024
@kshpv commented Nov 19, 2024

Hi, I've hit the same issue. Do you plan to take a look at it?
Note: the issue does not reproduce with ONNXRuntime==1.17.3.
