
[cpu] Loading certain models leads to global error state on M4 Max #26669

@xenova

Describe the issue

Recently, since upgrading to any 1.24.0 dev build (tested with the Node.js dev build 20251104 and with a local build from source on the main branch), I've been encountering non-deterministic output from certain models, particularly those containing Conv nodes: models often produce infinite values originating from Conv nodes, though other operators may be affected as well.

After hours of debugging, the issue appears to be reproducible with models such as Depth Anything V2 and this test model.

Interestingly, it's quite difficult to produce a small reproduction: any attempt to extract a single node and its input seems to produce the correct output, so the failure appears to be tied to some larger execution context.
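For illustration, a minimal single-Conv reduction of the kind described (the shapes and random weights below are illustrative, not extracted from the affected model) runs cleanly in isolation:

import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# Build a standalone 1x1 Conv, roughly shaped like the culprit node
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 768, 1, 1])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 32, 1, 1])
W = helper.make_tensor(
    "W", TensorProto.FLOAT, [32, 768, 1, 1],
    np.random.randn(32, 768, 1, 1).astype(np.float32).flatten().tolist(),
)
conv = helper.make_node("Conv", ["X", "W"], ["Y"], kernel_shape=[1, 1])
graph = helper.make_graph([conv], "single_conv", [X], [Y], initializer=[W])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])

sess = ort.InferenceSession(model.SerializeToString(), providers=['CPUExecutionProvider'])
out = sess.run(None, {"X": np.random.randn(1, 768, 1, 1).astype(np.float32)})[0]
print(np.isfinite(out).all())  # True -- the corruption does not appear in isolation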

Related to this error: #26144

To reproduce

This self-contained script should reproduce the bug (you may need to run it a few times; for me, it triggers roughly 75% of the time).

from huggingface_hub import hf_hub_download
import onnxruntime as ort
import numpy as np
from onnx.utils import extract_model
import tempfile

model_id = "hf-internal-testing/tiny-random-DPTForDepthEstimation"
model_path = hf_hub_download(repo_id=model_id, filename="onnx/model.onnx")

session_cpu = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

input_name = session_cpu.get_inputs()[0].name
input_shape = [1, 3, 32, 32]

# Run once
np.random.seed(0)
input_data = np.random.rand(*input_shape).astype(np.float32)
inputs = {input_name: input_data}
# NOTE: If we remove this line, the error does not occur
outputs = session_cpu.run(None, inputs)  # <--- THIS CAUSES A GLOBAL ERROR STATE ON CPU

#############################################
# NOTE: We are now in an error state on CPU #
#############################################

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_file = tmpdir + "/culprit_model.onnx"
    extract_model(
        input_path=model_path,
        output_path=tmp_file,
        input_names=["/neck/reassemble_stage/layers.3/projection/Conv_output_0"],
        output_names=["/neck/reassemble_stage/layers.3/resize/Conv_output_0"],
        check_model=False,
    )
    # Sessions load the model at construction, so they remain valid after the tempdir is removed
    extracted_session_cpu = ort.InferenceSession(tmp_file, providers=['CPUExecutionProvider'])
    extracted_session_webgpu = ort.InferenceSession(tmp_file, providers=['WebGpuExecutionProvider'])

inputs = {
    "/neck/reassemble_stage/layers.3/projection/Conv_output_0": np.random.randn(1, 768, 1, 1).astype(np.float32)
}

print("Running inference to verify...")
cpu_outputs = extracted_session_cpu.run(None, inputs)
print(np.max(cpu_outputs[0])) # -3.4028235e+38
print(np.min(cpu_outputs[0])) # -3.4028235e+38
print(np.mean(cpu_outputs[0])) # RuntimeWarning: overflow encountered in reduce


print("Running inference to verify...")
webgpu_outputs = extracted_session_webgpu.run(None, inputs)
print(np.max(webgpu_outputs[0])) # 1.5953784
print(np.min(webgpu_outputs[0])) # -1.6664962
print(np.mean(webgpu_outputs[0])) # 0.029494071
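
A programmatic check at the end of the script makes the failure binary instead of eyeballed (assuming, as one would expect on a healthy build, that the two EPs agree to within ordinary floating-point tolerance):

print("CPU output finite:", np.isfinite(cpu_outputs[0]).all())  # False when the bug triggers
print("EPs agree:", np.allclose(cpu_outputs[0], webgpu_outputs[0], rtol=1e-3, atol=1e-5))  # False when the bug triggers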

On affected hardware, this bug can be reproduced consistently by toggling this line:

outputs = session_cpu.run(None, inputs) # <--- THIS CAUSES A GLOBAL ERROR STATE ON CPU

Note that the WebGPU session is used here only as a reference and can be replaced by any functional EP.
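
For instance, on macOS the reference could be a CoreML-backed session instead (a sketch, assuming onnxruntime was built with CoreML support; like the other extracted sessions, it must be created inside the with block before the temporary directory is removed):

    # Hypothetical alternative reference EP: CoreML instead of WebGPU
    extracted_session_ref = ort.InferenceSession(tmp_file, providers=['CoreMLExecutionProvider'])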

Urgency

high -- completely breaks functionality

Platform

Mac

OS Version

Sequoia 15.6

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

f02a640

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Metadata

Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)
