
[Bug]: SDXL Olive-optimized model runs out of memory on the second generation #410

Open
Jay19751103 opened this issue Mar 5, 2024 · 7 comments
Labels: ONNX (About ONNX), question (Further information is requested)

Comments

@Jay19751103

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

System:
AMD 7600XT, 16 GB VRAM
32 GB system RAM
200 GB swap file

  1. When using Olive directly, every inference runs fine.
  2. When using this webUI, the second generation fails with the following:
    Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
    ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00, 8.44s/it]
    2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Steps to reproduce the problem

Follow the SD 1.5 installation steps, then download sd_xl_base_1.0.safetensors from Hugging Face and copy it to models\Stable-diffusion\sd_xl_base_1.0.safetensors.
Change Settings -> OnnxRuntime -> Diffusers pipeline -> ONNX Stable Diffusion XL.

What should have happened?

It should behave the same as running Olive inference directly.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-03-05-10-00.json

Console logs

2nd generate console log
Olive: Parameter change detected
Olive: Recompiling base model
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.44s/it]
2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Additional information

No response

@lshqqytiger
Owner

Close everything except the necessary processes and webui, then try again.
If it still fails, download pre-optimized models from Hugging Face or elsewhere.
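If it helps, here is a minimal sketch of downloading a pre-optimized model with huggingface_hub (the repo_id and local_dir below are placeholders, not a confirmed source of Olive-optimized SDXL weights):

# Hypothetical example: fetch a pre-optimized ONNX/Olive SDXL model from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="someuser/sdxl-base-1.0-olive-onnx",   # placeholder repository id
    local_dir="models/ONNX/sd_xl_base_1.0-olive",  # assumed target folder for webui
)
print(f"Downloaded optimized model to {local_path}")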

@lshqqytiger added the question (Further information is requested) and ONNX (About ONNX) labels on Mar 6, 2024
@Jay19751103
Author

Jay19751103 commented Mar 12, 2024

Hi,

I have closed everything, and I also tried a GPU with more VRAM, but it still enters recompiling even though I don't change anything (screenshot attached).
The first time it can reach 3.53 it/s.
After 10 images are generated, I regenerate 1 image (batch count) and it enters Recompiling (generating 10 images again gives the same result), and the system slows down to 4.29 s/it.

@lshqqytiger
Owner

The stored compilation parameters may have been changed somewhere (by the code or by the user). Are you sure that you don't change any parameters/options after the 10 images are generated? Is it possible to reproduce that issue on SD.Next?
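As a quick way to check, here is a minimal debugging sketch (not the actual webui code; shared.compiled_model_state.width/height appear in the pdb session later in this thread, everything else is illustrative) that logs the stored compile dimensions next to the incoming request before the comparison runs:

from modules import shared

def log_compile_params(p):
    # Stored dimensions from the last Olive compilation vs. the incoming request.
    state = shared.compiled_model_state
    print(f"stored  : {state.width}x{state.height}")
    print(f"request : {p.width}x{p.height}")
    if state.width != p.width or state.height != p.height:
        print("mismatch -> Olive will recompile the base model")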

@Jay19751103
Author

Jay19751103 commented Mar 13, 2024

Hi,
Before generating an image, I change the width/height to 1024 and it hits the breakpoint.
The following is the value printed on the first run:

To create a public link, set share=True in launch().
Startup time: 1.9s (prepare environment: 6.2s, initialize shared: 0.8s, load scripts: 0.3s, create ui: 0.2s, gradio launch: 0.3s).
Applying attention optimization: InvokeAI... done.
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxRawPipeline'
(Pdb) c
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
2024-03-13 16:14:36.5365812 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:36.5414884 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.0030069 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.0072032 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.3923077 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.3964782 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:42.3300983 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:42.3344864 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:43.0725744 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:43.0767242 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00, 3.47it/s]

After the image is displayed in the webui, I click Generate again.

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxStableDiffusionXLPipeline'
(Pdb) l
154
155 def check_parameters_changed(p, refiner_enabled: bool):
156 from modules import shared, sd_models
157
158 breakpoint()
159 -> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
160 return shared.sd_model
161
162 breakpoint()
163 compile_height = p.height
164 compile_width = p.width

backtrace
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(973)_bootstrap()
-> self._bootstrap_inner()
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(1016)_bootstrap_inner()
-> self.run()
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\anyio\_backends\_asyncio.py(807)run()
-> result = context.run(func, *args)
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\gradio\utils.py(707)wrapper()
-> response = f(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(57)f()
-> res = list(func(*args, **kwargs))
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(36)f()
-> res = func(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py(110)txt2img()
-> processed = processing.process_images(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(787)process_images()
-> res = process_images_inner(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(848)process_images_inner()
-> shared.sd_model = check_parameters_changed(p, False)

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):

@Jay19751103
Author

Hi,
The first time the class name is OnnxRawPipeline; the second time it is OnnxStableDiffusionXLPipeline.

shared.compiled_model_state.height != compile_height
or shared.compiled_model_state.width != compile_width

One set is 1024, 1024 and the other is 512, 512, so the condition is true and it enters Recompiling (see the sketch after the pdb output below).
(Pdb) p shared.compiled_model_state.width
512
(Pdb) p shared.compiled_model_state.height
512
(Pdb) p compile_height
1024
(Pdb) p compile_width
1024
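A minimal sketch of the mismatch shown above (using the names from the pdb session; the 512x512 starting value is taken from the printed state, and the rest is illustrative, not the actual webui code):

# On the first run compiled_model_state still holds 512x512, but the SDXL image
# is requested at 1024x1024, so the second run sees a mismatch and recompiles.
class CompiledModelState:
    def __init__(self, width=512, height=512):  # 512x512 as printed by pdb above
        self.width = width
        self.height = height

state = CompiledModelState()                 # what shared.compiled_model_state held
compile_width, compile_height = 1024, 1024   # what the SDXL request asks for

needs_recompile = state.height != compile_height or state.width != compile_width
print(needs_recompile)  # True -> "Olive: Recompiling base model"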

@Jay19751103
Author

Hi,
Any progress on this issue?
I also tried with an Nvidia card; it has the same issue as with DirectML.
The second generation enters recompiling and then the system gets the "Not enough memory" error.
Following are the logs; after the UNet is processed, it enters the VAE decoder and the issue occurs.

ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:55<00:00, 2.80s/it]
2024-03-26 16:17:34.7141774 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.

*** Error completing request
*** Arguments: ('task(p85gmeh2y6y2m8t)', <gradio.routes.Request object at 0x00000280252DA3E0>, 'a cat', '', [], 20, 'PNDM', 5, 1, 7, 1024, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py", line 110, in txt2img
processed = processing.process_images(p)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 787, in process_images
res = process_images_inner(p)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 892, in process_images_inner
result = shared.sd_model(**kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in call
[self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in
[self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 482, in call
return self.forward(*args, **kwargs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 528, in forward
outputs = self.session.run(None, onnx_inputs)
File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.

@Jay19751103
Author

Hi,
Is there any fix yet for this issue where the second generation enters Olive recompiling?
