Corrupted value for model outputs that are also model inputs #21922
Comments
Hi @yuslepukhin,
…calibrators (#21972)

### Description
- Applies a workaround that prevents the histogram-based calibrators (percentile, entropy, distribution) from crashing. The workaround involves copying inference outputs that come directly from model inputs. A description of the bug is here: #21922. **This PR does not fix the root bug, but instead provides a workaround to _unblock_ users using histogram-based calibration.**
- Adds a unit test that runs all histogram-based calibrators to help catch future regressions. We didn't have unit tests that ran these calibration methods.

### Motivation and Context
Trying to quantize a model with the percentile, entropy, or distribution calibration methods raises an exception:

```shell
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 691, in quantize
    quantize_static(
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 525, in quantize_static
    calibrator.collect_data(calibration_data_reader)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 571, in collect_data
    self.collector.collect(clean_merged_dict)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 746, in collect
    return self.collect_value(name_to_arr)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 836, in collect_value
    hist, hist_edges = np.histogram(data_arr, self.num_bins, range=(-threshold, threshold))
  File "<__array_function__ internals>", line 180, in histogram
  File ".../site-packages/numpy/lib/histograms.py", line 793, in histogram
    bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
  File "/.../site-packages/numpy/lib/histograms.py", line 426, in _get_bin_edges
    first_edge, last_edge = _get_outer_edges(a, range)
  File "/.../site-packages/numpy/lib/histograms.py", line 315, in _get_outer_edges
    raise ValueError(
ValueError: supplied range of [nan, nan] is not finite
```

The calibrators create an augmented model with all tensors (including model inputs) set as model outputs. The data for outputs that are also model inputs is corrupted as described in #21922. The corrupted data sometimes contains `NaN` values that cause numpy's histogram utilities to raise an exception.
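The final `ValueError` in the traceback can be reproduced with NumPy alone. The sketch below is not the calibrator's code: `histogram_with_symmetric_range` is a hypothetical helper, and it assumes (based only on the `range=(-threshold, threshold)` call shown in the traceback) that the threshold is the largest absolute value in the data, so a single `NaN` makes the range non-finite.

```python
import numpy as np


def histogram_with_symmetric_range(data_arr, num_bins=8):
    # Hypothetical helper. Assumption: threshold = max |x|, which
    # mirrors the range=(-threshold, threshold) call in the traceback.
    threshold = np.max(np.abs(data_arr))  # NaN propagates through np.max
    return np.histogram(data_arr, num_bins, range=(-threshold, threshold))


clean = np.array([0.5, -1.0, 2.0], dtype=np.float32)
corrupted = np.array([0.5, np.nan, 2.0], dtype=np.float32)

# Finite data: the histogram is built normally.
hist, hist_edges = histogram_with_symmetric_range(clean)

# Data containing NaN: np.histogram rejects the non-finite range.
try:
    histogram_with_symmetric_range(corrupted)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This suggests why the crash is intermittent: it would only occur on runs where the corrupted buffer happens to contain a `NaN`.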
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Hi. I faced the same issue. Do you plan to take a look at it?
Describe the issue
There seems to be a memory corruption bug for model outputs that are also model inputs. Consider a model with an input that is also a model output:
The above model is run multiple times, and the model outputs are saved into a list. After the run loop, only the outputs for `input_0` will have incorrect values. Here's some pseudo-code (full repro script included below):

One workaround is to copy the output numpy arrays for the problematic outputs (e.g., `input_0`). Here's pseudo-code:

To reproduce

Here's a full python script (`repro.py`) to reproduce the issue.

Here's the console output from a sample run:
Here's a run that applies the workaround described above:
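As a rough illustration of the copy workaround (not the original repro script), the sketch below copies every output array whose name is also a model input. `copy_outputs_aliasing_inputs` and the name `output_0` are hypothetical, and the reused `shared_buffer` is only a stand-in for whatever memory gets overwritten between runs. It does not model ONNX Runtime internals.

```python
import numpy as np


def copy_outputs_aliasing_inputs(output_names, outputs, input_names):
    """Return the outputs, copying arrays whose names are also model inputs."""
    input_names = set(input_names)
    return [
        np.copy(arr) if name in input_names else arr
        for name, arr in zip(output_names, outputs)
    ]


# Assumption: a single buffer that every "run" rewrites in place.
shared_buffer = np.zeros(3, dtype=np.float32)
names = ["input_0", "output_0"]  # "output_0" is a made-up name

saved_raw, saved_copied = [], []
for step in range(3):
    shared_buffer[:] = step  # simulate the next run overwriting the memory
    outputs = [shared_buffer, np.full(3, step * 10.0, dtype=np.float32)]
    saved_raw.append(outputs[0])  # keeps a reference to the shared memory
    safe = copy_outputs_aliasing_inputs(names, outputs, ["input_0"])
    saved_copied.append(safe[0])  # keeps an independent copy

print([a.tolist() for a in saved_raw])
print([a.tolist() for a in saved_copied])
```

In this simulation every entry of `saved_raw` ends up showing the last run's values, while `saved_copied` keeps one distinct snapshot per run.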
Urgency
This issue is causing a crash when running the python quantization tools with the `Percentile`, `Distribution`, and `Entropy` calibration methods. These calibration methods create an augmented onnx model that makes all model inputs into model outputs. The output data from this augmented model is corrupted, which causes an eventual crash.

onnxruntime/onnxruntime/python/tools/quantization/calibrate.py
Lines 564 to 572 in 0223e86
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response