Description
Describe the issue
Since upgrading to any 1.24.0 dev build (tested with the 20251104 dev build in Node.js and with a local build from source on the main branch), I've been encountering non-deterministic output for certain models, particularly those containing Conv nodes. Affected models often produce infinite values originating from Conv nodes, though other operators may be affected too.
After hours of debugging, it appears to be reproducible for models like Depth Anything V2 and this test model.
Interestingly, it's quite difficult to produce a minimal reproduction: any attempt to run a single node with a fixed input produces the correct output, so the bug appears to be tied to some larger global state.
Related to this error: #26144
To reproduce
This self-contained script should reproduce the bug (you may need to run it a few times; for me, it triggers ~75% of the time).
from huggingface_hub import hf_hub_download
import onnxruntime as ort
import numpy as np
from onnx.utils import extract_model
import tempfile
model_id = "hf-internal-testing/tiny-random-DPTForDepthEstimation"
model_path = hf_hub_download(repo_id=model_id, filename="onnx/model.onnx")
session_cpu = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
input_name = session_cpu.get_inputs()[0].name
input_shape = [1, 3, 32, 32]
# Run once
np.random.seed(0)
input_data = np.random.rand(*input_shape).astype(np.float32)
inputs = {input_name: input_data}
# NOTE: If we remove this line, the error does not occur
outputs = session_cpu.run(None, inputs) # <--- THIS CAUSES A GLOBAL ERROR STATE ON CPU #
#############################################
# NOTE: We are now in an error state on CPU #
#############################################
with tempfile.TemporaryDirectory() as tmpdir:
tmp_file = tmpdir + "/culprit_model.onnx"
extract_model(
input_path=model_path,
output_path=tmp_file,
input_names=["/neck/reassemble_stage/layers.3/projection/Conv_output_0"],
output_names=["/neck/reassemble_stage/layers.3/resize/Conv_output_0"],
check_model=False,
)
extracted_session_cpu = ort.InferenceSession(tmp_file, providers=['CPUExecutionProvider'])
extracted_session_webgpu = ort.InferenceSession(tmp_file, providers=['WebGpuExecutionProvider'])
inputs = {
    "/neck/reassemble_stage/layers.3/projection/Conv_output_0": np.random.randn(1, 768, 1, 1).astype(np.float32)
}
print("Running inference to verify...")
cpu_outputs = extracted_session_cpu.run(None, inputs)
print(np.max(cpu_outputs[0])) # -3.4028235e+38
print(np.min(cpu_outputs[0])) # -3.4028235e+38
print(np.mean(cpu_outputs[0])) # RuntimeWarning: overflow encountered in reduce
print("Running inference to verify...")
webgpu_outputs = extracted_session_webgpu.run(None, inputs)
print(np.max(webgpu_outputs[0])) # 1.5953784
print(np.min(webgpu_outputs[0])) # -1.6664962
print(np.mean(webgpu_outputs[0])) # 0.029494071
On the correct hardware, this bug can be reproduced consistently by toggling this line:
outputs = session_cpu.run(None, inputs) # <--- THIS CAUSES A GLOBAL ERROR STATE ON CPU
Note that the webgpu session is simply used here as a reference and can be replaced by any functional EP.
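For anyone trying to reproduce this on other models, a quick way to flag the bad runs is to check whether an output tensor is saturated at the float32 limits (the CPU run above returns arrays full of -3.4028235e+38, i.e. the most negative finite float32) or contains non-finite entries. This is a NumPy-only sketch; the helper name `looks_corrupted` is mine and not part of ONNX Runtime.

```python
import numpy as np

def looks_corrupted(arr):
    """Heuristic check for the pathological outputs described above:
    entries saturated at +/- float32 max (3.4028235e+38) or non-finite."""
    f32_max = np.finfo(np.float32).max
    saturated = bool(np.any(np.abs(arr) >= f32_max))
    non_finite = not bool(np.all(np.isfinite(arr)))
    return saturated or non_finite

# Example: the broken CPU run yields arrays filled with -3.4028235e+38.
bad = np.full((1, 768, 2, 2), -np.finfo(np.float32).max, dtype=np.float32)
good = np.array([[1.5953784, -1.6664962]], dtype=np.float32)

print(looks_corrupted(bad))   # True
print(looks_corrupted(good))  # False
```

In my testing, running this check after each `session.run` call makes it easy to measure how often the bug triggers across repeated runs.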
Urgency
high -- completely breaks functionality
Platform
Mac
OS Version
Sequoia 15.6
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response