
[Bug]: SDXL Olive-optimized model runs out of memory on the second generation #410

Open
Jay19751103 opened this issue Mar 5, 2024 · 7 comments
Labels: ONNX (About ONNX), question (Further information is requested)

Comments

@Jay19751103

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

System:
AMD 7600XT, 16 GB VRAM
32 GB system RAM
200 GB swap file

  1. When using Olive directly, every inference runs fine.
  2. When using this webUI, the second generation fails with the following:
    Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
    ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00, 8.44s/it]
    2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Steps to reproduce the problem

Follow the SD 1.5 installation steps, then download sd_xl_base_1.0.safetensors from Hugging Face and copy it to models\Stable-diffusion\sd_xl_base_1.0.safetensors.
Change Settings -> OnnxRuntime -> Diffusers pipeline -> ONNX Stable Diffusion XL.

What should have happened?

It should behave the same as running Olive inference directly.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-03-05-10-00.json

Console logs

2nd generate console log
Olive: Parameter change detected
Olive: Recompiling base model
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.44s/it]
2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Additional information

No response

@lshqqytiger
Owner

Close everything except the necessary processes and webui, then try again.
If it still fails, download pre-optimized models from Hugging Face or elsewhere.
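If it helps, here is a minimal sketch of downloading a pre-optimized model with huggingface_hub (the repo_id and local_dir below are placeholders, not a confirmed source of Olive-optimized SDXL weights):

# Hypothetical example: fetch a pre-optimized ONNX/Olive SDXL model from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="someuser/sdxl-base-1.0-olive-onnx",   # placeholder repository id
    local_dir="models/ONNX/sd_xl_base_1.0-olive",  # assumed target folder for webui
)
print(f"Downloaded optimized model to {local_path}")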

@lshqqytiger added the question (Further information is requested) and ONNX (About ONNX) labels on Mar 6, 2024
@Jay19751103
Author

Jay19751103 commented Mar 12, 2024

Hi,

I have closed everything, and I also tried a GPU with more VRAM, but it still enters recompiling even though I don't change anything (screenshot attached).
The first time it can reach 3.53 it/s.
After 10 images are generated, I regenerate 1 image (batch count) and it enters Recompiling (generating 10 images again gives the same result), and the system slows down to 4.29 s/it.

@lshqqytiger
Owner

The stored compilation parameters may have been changed somewhere (by the code or by the user). Are you sure that you don't change any parameters/options after the 10 images are generated? Is it possible to reproduce that issue on SD.Next?
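As a quick way to check, here is a minimal debugging sketch (not the actual webui code; shared.compiled_model_state.width/height appear in the pdb session later in this thread, everything else is illustrative) that logs the stored compile dimensions next to the incoming request before the comparison runs:

from modules import shared

def log_compile_params(p):
    # Stored dimensions from the last Olive compilation vs. the incoming request.
    state = shared.compiled_model_state
    print(f"stored  : {state.width}x{state.height}")
    print(f"request : {p.width}x{p.height}")
    if state.width != p.width or state.height != p.height:
        print("mismatch -> Olive will recompile the base model")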

@Jay19751103
Author

Jay19751103 commented Mar 13, 2024

Hi,
Before generating an image, I change the width/height to 1024 and it hits the breakpoint.
The following is the value printed on the first run:

To create a public link, set share=True in launch().
Startup time: 1.9s (prepare environment: 6.2s, initialize shared: 0.8s, load scripts: 0.3s, create ui: 0.2s, gradio launch: 0.3s).
Applying attention optimization: InvokeAI... done.
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxRawPipeline'
(Pdb) c
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
2024-03-13 16:14:36.5365812 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:36.5414884 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.0030069 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.0072032 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.3923077 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.3964782 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:42.3300983 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:42.3344864 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:43.0725744 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:43.0767242 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00, 3.47it/s]

After the image is displayed in the webui, I click Generate again.

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxStableDiffusionXLPipeline'
(Pdb) l
154
155 def check_parameters_changed(p, refiner_enabled: bool):
156 from modules import shared, sd_models
157
158 breakpoint()
159 -> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
160 return shared.sd_model
161
162 breakpoint()
163 compile_height = p.height
164 compile_width = p.width

backtrace
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(973)_bootstrap()
-> self._bootstrap_inner()
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(1016)_bootstrap_inner()
-> self.run()
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\anyio\_backends\_asyncio.py(807)run()
-> result = context.run(func, *args)
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\gradio\utils.py(707)wrapper()
-> response = f(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(57)f()
-> res = list(func(*args, **kwargs))
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(36)f()
-> res = func(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py(110)txt2img()
-> processed = processing.process_images(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(787)process_images()
-> res = process_images_inner(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(848)process_images_inner()
-> shared.sd_model = check_parameters_changed(p, False)

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):

@Jay19751103
Author

Hi,
The first time the class name is OnnxRawPipeline; the second time it is OnnxStableDiffusionXLPipeline.

shared.compiled_model_state.height != compile_height
or shared.compiled_model_state.width != compile_width

One set is 1024, 1024 and the other is 512, 512, so the condition is true and it enters Recompiling (see the sketch after the pdb output below).
(Pdb) p shared.compiled_model_state.width
512
(Pdb) p shared.compiled_model_state.height
512
(Pdb) p compile_height
1024
(Pdb) p compile_width
1024
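A minimal sketch of the mismatch shown above (using the names from the pdb session; the 512x512 starting value is taken from the printed state, and the rest is illustrative, not the actual webui code):

# On the first run compiled_model_state still holds 512x512, but the SDXL image
# is requested at 1024x1024, so the second run sees a mismatch and recompiles.
class CompiledModelState:
    def __init__(self, width=512, height=512):  # 512x512 as printed by pdb above
        self.width = width
        self.height = height

state = CompiledModelState()                 # what shared.compiled_model_state held
compile_width, compile_height = 1024, 1024   # what the SDXL request asks for

needs_recompile = state.height != compile_height or state.width != compile_width
print(needs_recompile)  # True -> "Olive: Recompiling base model"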

@Jay19751103
Author

Hi,
Any progress on this issue?
I also tried with an Nvidia card; it has the same issue as with DirectML.
The second generation enters recompiling and then the system gets the "Not enough memory" error.
Following are the logs; after the UNet is processed, it enters the VAE decoder and the issue occurs.

ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:55<00:00, 2.80s/it]
2024-03-26 16:17:34.7141774 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.

*** Error completing request
*** Arguments: ('task(p85gmeh2y6y2m8t)', <gradio.routes.Request object at 0x00000280252DA3E0>, 'a cat', '', [], 20, 'PNDM', 5, 1, 7, 1024, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py", line 110, in txt2img
processed = processing.process_images(p)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 787, in process_images
res = process_images_inner(p)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 892, in process_images_inner
result = shared.sd_model(**kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in call
[self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in
[self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 482, in call
return self.forward(*args, **kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 528, in forward
outputs = self.session.run(None, onnx_inputs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.

@Jay19751103
Author

Hi,
Is there any fix yet for this issue where the second generation enters Olive recompiling?
