
Can't generate image: `return torch._C._cuda_memoryStats(device)` RuntimeError: invalid argument to memory_allocated #471

kai1040112 opened this issue Jun 2, 2024 · 12 comments
Labels: zluda (About ZLUDA)

Comments


kai1040112 commented Jun 2, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I am running Stable Diffusion on a laptop with an AMD Radeon RX 7700S, but it doesn't generate anything after I enter a prompt and click the Generate button.

Steps to reproduce the problem

  1. Download Stable Diffusion web UI
  2. Run webui.bat
  3. Enable ONNX and Olive

What should have happened?

An image should have been generated; it seems Stable Diffusion couldn't use my GPU because of some errors.

What browsers do you use to access the UI?

Microsoft Edge

Sysinfo

sysinfo-2024-06-02-12-55.json

Console logs

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb50e5cd3592b3ea831fe20b17588a2edb4
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.5s (prepare environment: 19.7s, initialize shared: 2.6s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.10s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Applying attention optimization: InvokeAI... done.
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:07<00:00,  1.54s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(hy03hugzn8jrn39)', <gradio.routes.Request object at 0x000001FF8F29CCA0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
ONNX: Successfully exported converted model: submodel=text_encoder
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.12s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(9o5ycnkv8wdtd7b)', <gradio.routes.Request object at 0x000001FF8C1CD120>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

Additional information

(Screenshot 2024-06-02 205311)
GPU usage is very low when I try to generate a picture (but it fails all the time).
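
For context on the tracebacks above: webui's MemMon thread calls torch.cuda memory-statistics functions even though the ZLUDA device never passed the basic operation test, which is what surfaces as the "invalid argument" RuntimeError. A minimal sketch of guarding such calls (illustrative only, not the actual webui code):

```python
# Minimal sketch, not the actual webui code: degrade to "no stats" when the
# CUDA backend cannot report memory statistics, instead of crashing the
# monitoring thread with "RuntimeError: invalid argument to memory_allocated".
import torch

def safe_cuda_memory_stats(device=None):
    """Return CUDA memory stats, or an empty dict if the backend can't report them."""
    if not torch.cuda.is_available():
        return {}
    try:
        return dict(torch.cuda.memory_stats(device))
    except RuntimeError as err:
        # ZLUDA/HIP setups that fail the basic operation test end up here.
        print(f"memory stats unavailable: {err}")
        return {}

if __name__ == "__main__":
    print(len(safe_cuda_memory_stats()))  # 0 when no working CUDA device
```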


lshqqytiger commented Jun 3, 2024

The RX 7700S (gfx1102) is not officially supported by the AMD HIP SDK.
However, you can use unofficially built BLAS libraries:
https://github.com/Na3MnO4/ROCmLibs-Fallback
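
If you want to check whether the card passes the kind of check the log reports failing ("ZLUDA device failed to pass basic operation test"), here is a hedged sketch; the exact test webui runs may differ:

```python
# Hedged sketch of a "basic operation test"; the exact check webui-amdgpu
# performs may differ. A small matmul exercises the BLAS path, which is what
# the unofficial gfx1102 libraries linked above provide.
import torch

def basic_operation_test(index: int = 0) -> bool:
    try:
        a = torch.ones(2, 2, device=f"cuda:{index}")
        b = a @ a  # matmul goes through cuBLAS (which ZLUDA maps to rocBLAS)
        torch.cuda.synchronize()
        return bool((b == 2).all().item())
    except Exception as err:
        print(f"basic operation test failed: {err}")
        return False

print(basic_operation_test())
```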

lshqqytiger added the zluda label on Jun 3, 2024

kai1040112 commented Jun 3, 2024

I followed the steps Copilot told me:
(image)

but I still got the error (and still couldn't generate anything):

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
ROCm Toolkit was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb
Using ZLUDA in C:\sd\stable-diffusion-webui-amdgpu\.zluda
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 26.4s (prepare environment: 33.0s, initialize shared: 2.8s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00, 2.01s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
Exception in thread MemMon:
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
torch.cuda.reset_peak_memory_stats()
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Applying attention optimization: InvokeAI... done.
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:09<00:00, 1.89s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(570ykia0tb9ihw7)', <gradio.routes.Request object at 0x000001AF04E056C0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable


Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

I saw this video: https://www.youtube.com/watch?v=YazUwPNsdzE. It told me to add %hip_path%bin to PATH, but when I type %hip_path%bin into Windows Explorer, Windows says it can't find it, so instead of %hip_path%bin I added C:\Program Files\AMD\ROCm\5.7\bin to PATH. Is that why I get the error?
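
For reference, a quick illustrative check of what HIP_PATH is set to and where %HIP_PATH%bin would point. (The HIP SDK installer conventionally sets HIP_PATH ending in a backslash, which is why guides write %HIP_PATH%bin with no separator; treat that as an assumption about the setup, not something confirmed by the logs.)

```python
# Illustrative check: is HIP_PATH set, and does its bin directory exist?
import os

hip_path = os.environ.get("HIP_PATH")
print("HIP_PATH =", hip_path)
if hip_path:
    bin_dir = os.path.join(hip_path, "bin")
    print(bin_dir, "exists:", os.path.isdir(bin_dir))
else:
    print("HIP_PATH is not set, so %hip_path%bin cannot resolve")
```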

lshqqytiger commented

Make sure that the environment variable ZLUDA is not set.
Try again after removing the .zluda folder.
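
A quick illustrative sanity check for both conditions (the webui path below is the one shown in this issue's logs):

```python
# Illustrative sanity check: a stale ZLUDA environment variable or a cached
# .zluda folder can shadow a freshly downloaded ZLUDA build.
import os

print("ZLUDA env var:", os.environ.get("ZLUDA", "<not set>"))
print(".zluda cached:",
      os.path.isdir(r"C:\sd\stable-diffusion-webui-amdgpu\.zluda"))
```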

kai1040112 commented

I removed the zluda folder from PATH, but it didn't change anything.


Aelzaire commented Jun 6, 2024

Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.

Tried with two different models as well, no change.

Final error is TypeError: 'OnnxRawPipeline' object is not callable

Edit: My bad, the error is different from OP's. But same outcome.
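
For scale, that single failed allocation is already 1.25 GiB on top of everything else resident at conversion time:

```python
# The failed allocation from the DefaultCPUAllocator message, in GiB.
print(f"{1342177280 / 2**30:.2f} GiB")  # -> 1.25 GiB
```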


lshqqytiger commented Jun 6, 2024

You need lots of memory to convert/optimize XL models. How much system memory do you have?


Aelzaire commented Jun 6, 2024

> You need lots of memory to convert/optimize XL models. How much system memory do you have?

32GB. Would I need more than this to convert? Thanks for the quick reply.

lshqqytiger commented

Please try again after closing unnecessary processes. If it still runs out of memory, you may need more.


CS1o commented Jun 7, 2024

> Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.
>
> Tried with two different models as well, no change.
>
> Final error is TypeError: 'OnnxRawPipeline' object is not callable
>
> Edit: My bad, the error is different from OP's. But same outcome.

With a 7900XT, DirectML or ONNX is not the best way to go.
To get the best performance on Windows and lower VRAM usage, you should install the ZLUDA version.
I'm running it myself on a 7900XTX with no problems.

For any AMD or Nvidia user, I made a lot of guides for ZLUDA, DirectML, and all common Stable Diffusion webuis like Auto1111, ComfyUI, Fooocus, etc.
You can find the install guides here:
https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides


Aelzaire commented Jun 7, 2024

> Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.
> Tried with two different models as well, no change.
> Final error is TypeError: 'OnnxRawPipeline' object is not callable
> Edit: My bad, the error is different from OP's. But same outcome.

> With a 7900XT, DirectML or ONNX is not the best way to go. To get the best performance on Windows and lower VRAM usage, you should install the ZLUDA version. I'm running it myself on a 7900XTX with no problems.
>
> For any AMD or Nvidia user, I made a lot of guides for ZLUDA, DirectML, and all common Stable Diffusion webuis like Auto1111, ComfyUI, Fooocus, etc. You can find the install guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides

Thanks, CS1o!
Any downsides or drawbacks to ZLUDA?


CS1o commented Jun 7, 2024

@Aelzaire No problem!
No downsides compared to ONNX and DirectML at all!
The only thing is that some special extensions might not work. But I've tested a lot and can't name any that don't work right now.
ZLUDA is very fast and uses less VRAM while being compatible with almost anything.

Edit:
Downsides of each backend:
ONNX: poor compatibility with a lot of extensions, higher VRAM usage, and model conversion required.
DirectML: slower, with higher VRAM usage.
ZLUDA: does not support very old GPUs, as ROCm support is needed for it to work.


Aelzaire commented Jun 7, 2024

> @Aelzaire No problem! No downsides compared to ONNX and DirectML at all! The only thing is that some special extensions might not work. But I've tested a lot and can't name any that don't work right now. ZLUDA is very fast and uses less VRAM while being compatible with almost anything.
>
> Edit: Downsides of each backend: ONNX: poor compatibility with a lot of extensions, higher VRAM usage, and model conversion required. DirectML: slower, with higher VRAM usage. ZLUDA: does not support very old GPUs, as ROCm support is needed for it to work.

Yo, thank you so much for this. I just got it set up earlier, and yeah, this is way faster. Not quite as fast as ONNX, but no limitations or anything; I'll take it. That's amazing. Thanks again so much. Had no idea about this.
