
Can't generate image: `return torch._C._cuda_memoryStats(device)` RuntimeError: invalid argument to memory_allocated #471

kai1040112 opened this issue Jun 2, 2024 · 12 comments
Labels: zluda (About ZLUDA)

Comments


kai1040112 commented Jun 2, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I am running Stable Diffusion on a laptop with an AMD Radeon RX 7700S, but it doesn't generate anything after I enter a prompt and click the Generate button.

Steps to reproduce the problem

  1. Download Stable Diffusion web UI
  2. Run webui.bat
  3. Enable ONNX and Olive

What should have happened?

An image should have been generated; it seems Stable Diffusion couldn't use my GPU because of some errors.

What browsers do you use to access the UI?

Microsoft Edge

Sysinfo

sysinfo-2024-06-02-12-55.json

Console logs

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb50e5cd3592b3ea831fe20b17588a2edb4
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.5s (prepare environment: 19.7s, initialize shared: 2.6s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.10s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Applying attention optimization: InvokeAI... done.
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:07<00:00,  1.54s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(hy03hugzn8jrn39)', <gradio.routes.Request object at 0x000001FF8F29CCA0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
ONNX: Successfully exported converted model: submodel=text_encoder
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.12s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(9o5ycnkv8wdtd7b)', <gradio.routes.Request object at 0x000001FF8C1CD120>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

Additional information

(Screenshot 2024-06-02 205311)
GPU usage is very low when I try to generate a picture (but it fails all the time).
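
For context on the tracebacks above: webui's MemMon thread calls torch.cuda memory-statistics functions even though the ZLUDA device never passed the basic operation test, which is what surfaces as the "invalid argument" RuntimeError. A minimal sketch of guarding such calls (illustrative only, not the actual webui code):

```python
# Minimal sketch, not the actual webui code: degrade to "no stats" when the
# CUDA backend cannot report memory statistics, instead of crashing the
# monitoring thread with "RuntimeError: invalid argument to memory_allocated".
import torch

def safe_cuda_memory_stats(device=None):
    """Return CUDA memory stats, or an empty dict if the backend can't report them."""
    if not torch.cuda.is_available():
        return {}
    try:
        return dict(torch.cuda.memory_stats(device))
    except RuntimeError as err:
        # ZLUDA/HIP setups that fail the basic operation test end up here.
        print(f"memory stats unavailable: {err}")
        return {}

if __name__ == "__main__":
    print(len(safe_cuda_memory_stats()))  # 0 when no working CUDA device
```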


lshqqytiger commented Jun 3, 2024

The RX 7700S (gfx1102) is not officially supported by the AMD HIP SDK.
However, you can use unofficially built BLAS libraries:
https://github.com/Na3MnO4/ROCmLibs-Fallback
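
If you want to check whether the card passes the kind of check the log reports failing ("ZLUDA device failed to pass basic operation test"), here is a hedged sketch; the exact test webui runs may differ:

```python
# Hedged sketch of a "basic operation test"; the exact check webui-amdgpu
# performs may differ. A small matmul exercises the BLAS path, which is what
# the unofficial gfx1102 libraries linked above provide.
import torch

def basic_operation_test(index: int = 0) -> bool:
    try:
        a = torch.ones(2, 2, device=f"cuda:{index}")
        b = a @ a  # matmul goes through cuBLAS (which ZLUDA maps to rocBLAS)
        torch.cuda.synchronize()
        return bool((b == 2).all().item())
    except Exception as err:
        print(f"basic operation test failed: {err}")
        return False

print(basic_operation_test())
```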

lshqqytiger added the zluda label on Jun 3, 2024

kai1040112 commented Jun 3, 2024

I followed the steps Copilot told me:
(image)

but I still got the error (and still couldn't generate anything):

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
ROCm Toolkit was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb
Using ZLUDA in C:\sd\stable-diffusion-webui-amdgpu\.zluda
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 26.4s (prepare environment: 33.0s, initialize shared: 2.8s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00, 2.01s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
Exception in thread MemMon:
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
torch.cuda.reset_peak_memory_stats()
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Applying attention optimization: InvokeAI... done.
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:09<00:00, 1.89s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(570ykia0tb9ihw7)', <gradio.routes.Request object at 0x000001AF04E056C0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable


Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

I saw this video: https://www.youtube.com/watch?v=YazUwPNsdzE. It told me to add %hip_path%bin to PATH, but when I type %hip_path%bin into Windows Explorer, Windows says it can't find it, so instead of %hip_path%bin I added C:\Program Files\AMD\ROCm\5.7\bin to PATH. Is that why I get the error?
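
For reference, a quick illustrative check of what HIP_PATH is set to and where %HIP_PATH%bin would point. (The HIP SDK installer conventionally sets HIP_PATH ending in a backslash, which is why guides write %HIP_PATH%bin with no separator; treat that as an assumption about the setup, not something confirmed by the logs.)

```python
# Illustrative check: is HIP_PATH set, and does its bin directory exist?
import os

hip_path = os.environ.get("HIP_PATH")
print("HIP_PATH =", hip_path)
if hip_path:
    bin_dir = os.path.join(hip_path, "bin")
    print(bin_dir, "exists:", os.path.isdir(bin_dir))
else:
    print("HIP_PATH is not set, so %hip_path%bin cannot resolve")
```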

lshqqytiger commented

Make sure that the environment variable ZLUDA is not set.
Try again after removing the .zluda folder.
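
A quick illustrative sanity check for both conditions (the webui path below is the one shown in this issue's logs):

```python
# Illustrative sanity check: a stale ZLUDA environment variable or a cached
# .zluda folder can shadow a freshly downloaded ZLUDA build.
import os

print("ZLUDA env var:", os.environ.get("ZLUDA", "<not set>"))
print(".zluda cached:",
      os.path.isdir(r"C:\sd\stable-diffusion-webui-amdgpu\.zluda"))
```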

kai1040112 commented

I removed the zluda folder from PATH, but it didn't change anything.


Aelzaire commented Jun 6, 2024

Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.

Tried with two different models as well, no change.

Final error is TypeError: 'OnnxRawPipeline' object is not callable

Edit: My bad, the error is different from OP's. But same outcome.
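
For scale, that single failed allocation is already 1.25 GiB on top of everything else resident at conversion time:

```python
# The failed allocation from the DefaultCPUAllocator message, in GiB.
print(f"{1342177280 / 2**30:.2f} GiB")  # -> 1.25 GiB
```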


lshqqytiger commented Jun 6, 2024

You need lots of memory to convert/optimize XL models. How much system memory do you have?


Aelzaire commented Jun 6, 2024

> You need lots of memory to convert/optimize XL models. How much system memory do you have?

32GB. Would I need more than this to convert? Thanks for the quick reply.

lshqqytiger commented

Please try again after closing unnecessary processes. If it still runs out of memory, you may need more.


CS1o commented Jun 7, 2024

> Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.
>
> Tried with two different models as well, no change.
>
> Final error is TypeError: 'OnnxRawPipeline' object is not callable
>
> Edit: My bad, the error is different from OP's. But same outcome.

With a 7900XT, DirectML or ONNX is not the best way to go.
To get the best performance on Windows and lower VRAM usage, you should install the ZLUDA version.
I'm running it myself on a 7900XTX with no problems.

For any AMD or Nvidia user, I made a lot of guides for ZLUDA, DirectML, and all common Stable Diffusion webuis like Auto1111, ComfyUI, Fooocus, etc.
You can find the install guides here:
https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides


Aelzaire commented Jun 7, 2024

> Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.
> Tried with two different models as well, no change.
> Final error is TypeError: 'OnnxRawPipeline' object is not callable
> Edit: My bad, the error is different from OP's. But same outcome.

> With a 7900XT, DirectML or ONNX is not the best way to go. To get the best performance on Windows and lower VRAM usage, you should install the ZLUDA version. I'm running it myself on a 7900XTX with no problems.
>
> For any AMD or Nvidia user, I made a lot of guides for ZLUDA, DirectML, and all common Stable Diffusion webuis like Auto1111, ComfyUI, Fooocus, etc. You can find the install guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides

Thanks, CS1o!
Any downsides or drawbacks to ZLUDA?


CS1o commented Jun 7, 2024

@Aelzaire No problem!
No downsides compared to ONNX and DirectML at all!
The only thing is that some special extensions might not work. But I've tested a lot and can't name any that don't work right now.
ZLUDA is very fast and uses less VRAM while being compatible with almost anything.

Edit:
Downsides of each backend:
ONNX: poor compatibility with a lot of extensions, higher VRAM usage, and model conversion required.
DirectML: slower, with higher VRAM usage.
ZLUDA: does not support very old GPUs, as ROCm support is needed for it to work.


Aelzaire commented Jun 7, 2024

> @Aelzaire No problem! No downsides compared to ONNX and DirectML at all! The only thing is that some special extensions might not work. But I've tested a lot and can't name any that don't work right now. ZLUDA is very fast and uses less VRAM while being compatible with almost anything.
>
> Edit: Downsides of each backend: ONNX: poor compatibility with a lot of extensions, higher VRAM usage, and model conversion required. DirectML: slower, with higher VRAM usage. ZLUDA: does not support very old GPUs, as ROCm support is needed for it to work.

Yo, thank you so much for this. I just got it set up earlier, and yeah, this is way faster. Not quite as fast as ONNX, but no limitations or anything; I'll take it. That's amazing. Thanks again so much. Had no idea about this.
