[Bug]: CPU is being used instead of GPU for AMD 7800xt on Arch Linux #15542

Open
4 of 6 tasks
keystroke3 opened this issue Apr 17, 2024 · 3 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

keystroke3 commented Apr 17, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

Following the AMD and Arch Linux instructions in the Wiki, I get:

glibc version is 2.39
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Launching Web UI with arguments: --skip-torch-cuda-test --no-half
Unable to find TSan function AnnotateHappensAfter.
Unable to find TSan function AnnotateHappensBefore.
Unable to find TSan function AnnotateIgnoreWritesBegin.
Unable to find TSan function AnnotateIgnoreWritesEnd.
Unable to find TSan function AnnotateNewMemory.
Unable to find TSan function __tsan_func_entry.
Unable to find TSan function __tsan_func_exit.
Warning: please export TSAN_OPTIONS='ignore_noninstrumented_modules=1' to avoid false positive reports from the OpenMP runtime!
[atlas:365908:0:365908] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 365908) ====
 0 0x000000000003c770 __sigaction()  ???:0
=================================
./webui.sh: line 297: 365908 Segmentation fault      (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

With python launch.py --precision full --no-half --skip-torch-cuda-test I get:

python launch.py --precision full --no-half --skip-torch-cuda-test
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
python: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.2/hipamd/src/hip_code_object.cpp:762: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
zsh: IOT instruction (core dumped)  python launch.py --precision full --no-half --skip-torch-cuda-test

When the virtual env is created without the --system-site-packages flag, the models load and I am able to access the web UI. When I generate images, CPU usage goes to 56% while the GPU sits idle at 6%. It takes about 50-60 seconds to generate one 512x512 image.

Steps to reproduce the problem

  1. Have AMD Ryzen 7700 and Radeon RX 7800 XT
  2. Perform a full system update
  3. Follow the install instructions for AMD and Arch Linux, with python-pytorch-opt-rocm
  4. Fail to launch web ui
  5. Recreate venv without --system-site-packages flag
  6. WebUI launches
  7. Generate image with prompt "spaceship"
  8. CPU Usage goes up
  9. GPU stays Idle
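
Steps 4 to 6 depend on whether the venv can see system-wide packages such as python-pytorch-opt-rocm. Below is a minimal sketch for checking how an existing venv was created. It assumes the standard `pyvenv.cfg` file that Python's `venv` module writes at the venv root. The helper names `include_system_site_packages` and `venv_uses_system_packages` are hypothetical and not part of webui.

```python
from pathlib import Path


def include_system_site_packages(cfg_text: str) -> bool:
    """Parse the text of a pyvenv.cfg file (hypothetical helper).

    Returns True if the venv was created with --system-site-packages.
    """
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    # Key missing: treat the venv as isolated.
    return False


def venv_uses_system_packages(venv_dir: str) -> bool:
    """Read <venv_dir>/pyvenv.cfg and report the flag (hypothetical helper)."""
    return include_system_site_packages((Path(venv_dir) / "pyvenv.cfg").read_text())
```

Calling `venv_uses_system_packages` on the venv directory shows which of the two setups described above is active, which may help tell a system-provided torch from one installed by pip inside the venv.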

What should have happened?

The GPU should be doing the work, and image generation should take less time if the hardware is being fully utilised.

What browsers do you use to access the UI ?

Mozilla Firefox

Sysinfo

{
    "Platform": "Linux-6.8.5-arch1-1-x86_64-with-glibc2.39",
    "Python": "3.11.8",
    "Version": "v1.9.0",
    "Commit": "adadb4e3c7382bf3e4f7519126cd6c70f4f8557b",
    "Script path": "/home/salvaje/ai/stable-diffusion-webui",
    "Data path": "/home/salvaje/ai/stable-diffusion-webui",
    "Extensions dir": "/home/salvaje/ai/stable-diffusion-webui/extensions",
    "Checksum": "383ec2eb7ee2d68ae18b219bc2bd835a08274a646231fb7858c0391f36a83132",
    "Commandline": [
        "launch.py",
        "--skip-torch-cuda-test",
        "--no-half"
    ],
    "Torch env info": {
        "torch_version": "2.2.2+cu121",
        "is_debug_build": "False",
        "cuda_compiled_version": "12.1",
        "gcc_version": "(GCC) 13.2.1 20230801",
        "clang_version": null,
        "cmake_version": "version 3.29.2",
        "os": "Arch Linux (x86_64)",
        "libc_version": "glibc-2.39",
        "python_version": "3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801] (64-bit runtime)",
        "python_platform": "Linux-6.8.5-arch1-1-x86_64-with-glibc2.39",
        "is_cuda_available": "False",
        "cuda_runtime_version": null,
        "cuda_module_loading": "N/A",
        "nvidia_driver_version": null,
        "nvidia_gpu_models": null,
        "cudnn_version": null,
        "pip_version": "pip3",
        "pip_packages": [
            "numpy==1.26.2",
            "open-clip-torch==2.20.0",
            "pytorch-lightning==1.9.4",
            "torch==2.2.2",
            "torchdiffeq==0.2.3",
            "torchmetrics==1.3.2",
            "torchsde==0.2.6",
            "torchvision==0.17.2",
            "triton==2.2.0"
        ],
        "conda_packages": null,
        "hip_compiled_version": "N/A",
        "hip_runtime_version": "N/A",
        "miopen_runtime_version": "N/A",
        "caching_allocator_config": "",
        "is_xnnpack_available": "True",
        "cpu_info": [
            "Architecture:                         x86_64",
            "CPU op-mode(s):                       32-bit, 64-bit",
            "Address sizes:                        48 bits physical, 48 bits virtual",
            "Byte Order:                           Little Endian",
            "CPU(s):                               16",
            "On-line CPU(s) list:                  0-15",
            "Vendor ID:                            AuthenticAMD",
            "Model name:                           AMD Ryzen 7 7700 8-Core Processor",
            "CPU family:                           25",
            "Model:                                97",
            "Thread(s) per core:                   2",
            "Core(s) per socket:                   8",
            "Socket(s):                            1",
            "Stepping:                             2",
            "CPU(s) scaling MHz:                   93%",
            "CPU max MHz:                          5389.0000",
            "CPU min MHz:                          400.0000",
            "BogoMIPS:                             7602.28",
            "Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d",
            "Virtualization:                       AMD-V",
            "L1d cache:                            256 KiB (8 instances)",
            "L1i cache:                            256 KiB (8 instances)",
            "L2 cache:                             8 MiB (8 instances)",
            "L3 cache:                             32 MiB (1 instance)",
            "NUMA node(s):                         1",
            "NUMA node0 CPU(s):                    0-15",
            "Vulnerability Gather data sampling:   Not affected",
            "Vulnerability Itlb multihit:          Not affected",
            "Vulnerability L1tf:                   Not affected",
            "Vulnerability Mds:                    Not affected",
            "Vulnerability Meltdown:               Not affected",
            "Vulnerability Mmio stale data:        Not affected",
            "Vulnerability Reg file data sampling: Not affected",
            "Vulnerability Retbleed:               Not affected",
            "Vulnerability Spec rstack overflow:   Mitigation; Safe RET",
            "Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl",
            "Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization",
            "Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected",
            "Vulnerability Srbds:                  Not affected",
            "Vulnerability Tsx async abort:        Not affected"
        ]
    },
    "Exceptions": [],
    "CPU": {
        "model": "",
        "count logical": 16,
        "count physical": 8
    },
    "RAM": {
        "total": "31GB",
        "used": "14GB",
        "free": "414MB",
        "active": "17GB",
        "inactive": "10GB",
        "buffers": "244MB",
        "cached": "16GB",
        "shared": "375MB"
    },
    "Extensions": [],
    "Inactive extensions": [],
    "Environment": {
        "GIT": "git",
        "GRADIO_ANALYTICS_ENABLED": "False",
        "TORCH_COMMAND": "pip install torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7"
    },
    "Config": {
        "ldsr_steps": 100,
        "ldsr_cached": false,
        "SCUNET_tile": 256,
        "SCUNET_tile_overlap": 8,
        "SWIN_tile": 192,
        "SWIN_tile_overlap": 8,
        "SWIN_torch_compile": false,
        "hypertile_enable_unet": false,
        "hypertile_enable_unet_secondpass": false,
        "hypertile_max_depth_unet": 3,
        "hypertile_max_tile_unet": 256,
        "hypertile_swap_size_unet": 3,
        "hypertile_enable_vae": false,
        "hypertile_max_depth_vae": 3,
        "hypertile_max_tile_vae": 128,
        "hypertile_swap_size_vae": 3,
        "sd_model_checkpoint": "v1-5-pruned-emaonly.safetensors [6ce0161689]",
        "sd_checkpoint_hash": "6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa"
    },
    "Startup": {
        "total": 18.180575847625732,
        "records": {
            "initial startup": 0.003721475601196289,
            "prepare environment/checks": 2.4557113647460938e-05,
            "prepare environment/git version info": 0.005462646484375,
            "prepare environment/torch GPU test": 0.0008912086486816406,
            "prepare environment/install clip": 3.529914617538452,
            "prepare environment/clone repositores": 0.025794506072998047,
            "prepare environment/install requirements": 11.73979926109314,
            "prepare environment/run extensions installers": 0.00019598007202148438,
            "prepare environment": 15.302018404006958,
            "launcher": 0.0003914833068847656,
            "import torch": 1.302628993988037,
            "import gradio": 0.3022737503051758,
            "setup paths": 0.4390909671783447,
            "import ldm": 0.0013229846954345703,
            "import sgm": 1.6689300537109375e-06,
            "initialize shared": 0.029368877410888672,
            "other imports": 0.17979073524475098,
            "opts onchange": 0.00017023086547851562,
            "setup SD model": 2.5510787963867188e-05,
            "setup codeformer": 0.00023031234741210938,
            "setup gfpgan": 0.0022432804107666016,
            "set samplers": 1.33514404296875e-05,
            "list extensions": 0.0004019737243652344,
            "restore config state file": 3.814697265625e-06,
            "list SD models": 0.010065555572509766,
            "list localizations": 8.940696716308594e-05,
            "load scripts/custom_code.py": 0.0005052089691162109,
            "load scripts/img2imgalt.py": 0.0001728534698486328,
            "load scripts/loopback.py": 7.390975952148438e-05,
            "load scripts/outpainting_mk_2.py": 8.7738037109375e-05,
            "load scripts/poor_mans_outpainting.py": 5.7697296142578125e-05,
            "load scripts/postprocessing_codeformer.py": 4.220008850097656e-05,
            "load scripts/postprocessing_gfpgan.py": 3.504753112792969e-05,
            "load scripts/postprocessing_upscale.py": 8.463859558105469e-05,
            "load scripts/prompt_matrix.py": 6.103515625e-05,
            "load scripts/prompts_from_file.py": 6.151199340820312e-05,
            "load scripts/sd_upscale.py": 4.839897155761719e-05,
            "load scripts/xyz_grid.py": 0.0006356239318847656,
            "load scripts/ldsr_model.py": 0.03894925117492676,
            "load scripts/lora_script.py": 0.03897452354431152,
            "load scripts/scunet_model.py": 0.006529808044433594,
            "load scripts/swinir_model.py": 0.006334066390991211,
            "load scripts/hotkey_config.py": 6.461143493652344e-05,
            "load scripts/extra_options_section.py": 8.535385131835938e-05,
            "load scripts/hypertile_script.py": 0.013138532638549805,
            "load scripts/hypertile_xyz.py": 5.364418029785156e-05,
            "load scripts/postprocessing_autosized_crop.py": 7.271766662597656e-05,
            "load scripts/postprocessing_caption.py": 4.506111145019531e-05,
            "load scripts/postprocessing_create_flipped_copies.py": 4.220008850097656e-05,
            "load scripts/postprocessing_focal_crop.py": 0.00027251243591308594,
            "load scripts/postprocessing_split_oversized.py": 5.054473876953125e-05,
            "load scripts/soft_inpainting.py": 0.00012445449829101562,
            "load scripts/comments.py": 0.006476640701293945,
            "load scripts/refiner.py": 7.152557373046875e-05,
            "load scripts/sampler.py": 5.602836608886719e-05,
            "load scripts/seed.py": 6.29425048828125e-05,
            "load scripts": 0.11328625679016113,
            "load upscalers": 0.0005030632019042969,
            "refresh VAE": 0.00031948089599609375,
            "refresh textual inversion templates": 2.09808349609375e-05,
            "scripts list_optimizers": 0.00021505355834960938,
            "scripts list_unets": 8.821487426757812e-06,
            "reload hypernetworks": 0.000141143798828125,
            "initialize extra networks": 0.0038666725158691406,
            "scripts before_ui_callback": 0.007918834686279297,
            "create ui": 0.2715001106262207,
            "gradio launch": 0.20488810539245605,
            "add APIs": 0.00393366813659668,
            "app_started_callback/lora_script.py": 0.00014781951904296875,
            "app_started_callback": 0.0001506805419921875
        }
    },
    "Packages": [
        "accelerate==0.21.0",
        "aenum==3.1.15",
        "aiofiles==23.2.1",
        "aiohttp==3.9.5",
        "aiosignal==1.3.1",
        "altair==5.3.0",
        "annotated-types==0.6.0",
        "antlr4-python3-runtime==4.9.3",
        "anyio==3.7.1",
        "attrs==23.2.0",
        "blendmodes==2022",
        "certifi==2024.2.2",
        "charset-normalizer==3.3.2",
        "clean-fid==0.1.35",
        "click==8.1.7",
        "clip==1.0",
        "contourpy==1.2.1",
        "cycler==0.12.1",
        "deprecation==2.1.0",
        "diskcache==5.6.3",
        "einops==0.4.1",
        "facexlib==0.3.0",
        "fastapi==0.94.0",
        "ffmpy==0.3.2",
        "filelock==3.13.4",
        "filterpy==1.4.5",
        "fonttools==4.51.0",
        "frozenlist==1.4.1",
        "fsspec==2024.3.1",
        "ftfy==6.2.0",
        "gitdb==4.0.11",
        "gitpython==3.1.32",
        "gradio-client==0.5.0",
        "gradio==3.41.2",
        "h11==0.12.0",
        "httpcore==0.15.0",
        "httpx==0.24.1",
        "huggingface-hub==0.22.2",
        "idna==3.7",
        "imageio==2.34.0",
        "importlib-resources==6.4.0",
        "inflection==0.5.1",
        "jinja2==3.1.3",
        "jsonmerge==1.8.0",
        "jsonschema-specifications==2023.12.1",
        "jsonschema==4.21.1",
        "kiwisolver==1.4.5",
        "kornia-rs==0.1.3",
        "kornia==0.6.7",
        "lark==1.1.2",
        "lazy-loader==0.4",
        "lightning-utilities==0.11.2",
        "llvmlite==0.42.0",
        "markupsafe==2.1.5",
        "matplotlib==3.8.4",
        "mpmath==1.3.0",
        "multidict==6.0.5",
        "networkx==3.3",
        "numba==0.59.1",
        "numpy==1.26.2",
        "nvidia-cublas-cu12==12.1.3.1",
        "nvidia-cuda-cupti-cu12==12.1.105",
        "nvidia-cuda-nvrtc-cu12==12.1.105",
        "nvidia-cuda-runtime-cu12==12.1.105",
        "nvidia-cudnn-cu12==8.9.2.26",
        "nvidia-cufft-cu12==11.0.2.54",
        "nvidia-curand-cu12==10.3.2.106",
        "nvidia-cusolver-cu12==11.4.5.107",
        "nvidia-cusparse-cu12==12.1.0.106",
        "nvidia-nccl-cu12==2.19.3",
        "nvidia-nvjitlink-cu12==12.4.127",
        "nvidia-nvtx-cu12==12.1.105",
        "omegaconf==2.2.3",
        "open-clip-torch==2.20.0",
        "opencv-python==4.9.0.80",
        "orjson==3.10.1",
        "packaging==24.0",
        "pandas==2.2.2",
        "piexif==1.1.3",
        "pillow==9.5.0",
        "pip==24.0",
        "protobuf==3.20.3",
        "psutil==5.9.5",
        "pydantic-core==2.18.1",
        "pydantic==1.10.15",
        "pydub==0.25.1",
        "pyparsing==3.1.2",
        "python-dateutil==2.9.0.post0",
        "python-multipart==0.0.9",
        "pytorch-lightning==1.9.4",
        "pytz==2024.1",
        "pywavelets==1.6.0",
        "pyyaml==6.0.1",
        "referencing==0.34.0",
        "regex==2024.4.16",
        "requests==2.31.0",
        "resize-right==0.0.2",
        "rpds-py==0.18.0",
        "safetensors==0.4.2",
        "scikit-image==0.21.0",
        "scipy==1.13.0",
        "semantic-version==2.10.0",
        "sentencepiece==0.2.0",
        "setuptools==69.2.0",
        "six==1.16.0",
        "smmap==5.0.1",
        "sniffio==1.3.1",
        "spandrel==0.1.6",
        "starlette==0.26.1",
        "sympy==1.12",
        "tifffile==2024.2.12",
        "timm==0.9.16",
        "tokenizers==0.13.3",
        "tomesd==0.1.3",
        "toolz==0.12.1",
        "torch==2.2.2",
        "torchdiffeq==0.2.3",
        "torchmetrics==1.3.2",
        "torchsde==0.2.6",
        "torchvision==0.17.2",
        "tqdm==4.66.2",
        "trampoline==0.1.2",
        "transformers==4.30.2",
        "triton==2.2.0",
        "typing-extensions==4.11.0",
        "tzdata==2024.1",
        "urllib3==2.2.1",
        "uvicorn==0.29.0",
        "wcwidth==0.2.13",
        "websockets==11.0.3",
        "wheel==0.43.0",
        "yarl==1.9.4"
    ]
}
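
A note on reading the "Torch env info" block above: `torch_version` carries a `+cu121` tag, `cuda_compiled_version` is 12.1, `hip_compiled_version` is N/A, and `is_cuda_available` is False. Those values look like a CUDA wheel of torch rather than a ROCm build, which would fit the CPU fallback, though this is an interpretation of the reported fields and not a confirmed diagnosis. The sketch below classifies such a block. The function name `classify_torch_build` is hypothetical, and the input is assumed to be the "Torch env info" dictionary from the sysinfo dump.

```python
def classify_torch_build(env: dict) -> tuple:
    """Classify a 'Torch env info' dict as ('rocm' | 'cuda' | 'cpu', gpu_visible).

    Hypothetical helper: it only inspects the reported strings and does not
    import torch or touch the GPU.
    """
    def present(key: str) -> bool:
        # The sysinfo dump uses null or "N/A" for fields that do not apply.
        return env.get(key) not in (None, "", "N/A", "None")

    if present("hip_compiled_version"):
        backend = "rocm"
    elif present("cuda_compiled_version"):
        backend = "cuda"
    else:
        backend = "cpu"

    # The dump stores booleans as the strings "True" / "False".
    gpu_visible = str(env.get("is_cuda_available")) == "True"
    return backend, gpu_visible


reported = {
    "torch_version": "2.2.2+cu121",
    "cuda_compiled_version": "12.1",
    "hip_compiled_version": "N/A",
    "is_cuda_available": "False",
}
print(classify_torch_build(reported))
```

For the values reported in this issue the result is a CUDA build with no visible GPU, so on an AMD card the work would be expected to land on the CPU.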

Console logs

./webui.sh --skip-torch-cuda-test --no-half

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on salvaje user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /home/salvaje/.virtualenvs/stable-diffusion-webui-xxxg
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.39
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Installing clip
Installing requirements
Launching Web UI with arguments: --skip-torch-cuda-test --no-half
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
Loading weights [6ce0161689] from /home/salvaje/ai/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/salvaje/ai/stable-diffusion-webui/configs/v1-inference.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 18.2s (prepare environment: 15.3s, import torch: 1.3s, import gradio: 0.3s, setup paths: 0.4s, other imports: 0.2s, load scripts: 0.1s, create ui: 0.3s, gradio launch: 0.2s).
/usr/bin/xdg-open: line 1045: x-www-browser: command not found
Applying attention optimization: InvokeAI... done.
Model loaded in 1.0s (create model: 0.4s, apply weights to model: 0.5s).
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:49<00:00,  2.49s/it]
Total progress: 100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [00:49<00:00,  2.50s/it]
Total progress: 100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [00:49<00:00,  2.52s/it]

Additional information

I looked at issue #15432, which was closed with lshqqytiger#433 marked as providing a solution, but none of the options mentioned there worked. I am on Linux, so DirectML is not an option.
Thanks

keystroke3 added the bug-report label Apr 17, 2024
Chris2000SP commented

I am not a developer, but the 7800 XT has 16GB VRAM, right? I got it running on my 6800 XT with 16GB VRAM after setting --medvram and --medvram-sdxl. Without those flags I cannot run it on my GPU.

keystroke3 (author) commented

I changed my Python version from 3.12 to 3.10 and it kind of solved the issue. ROCm compilation can be a problem, but other than that, it works now. I did not need the --medvram flag.
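
For anyone checking the same thing after switching Python versions, here is a minimal sketch that reports which interpreter and torch build are active inside the venv. The function name `torch_backend_report` is hypothetical. It relies only on `torch.__version__`, `torch.version.hip`, and `torch.cuda.is_available()`; ROCm builds of PyTorch expose the GPU through the `torch.cuda` API, so `is_available()` is the relevant check on AMD as well.

```python
import sys


def torch_backend_report() -> dict:
    """Summarise the active Python and torch build (hypothetical helper)."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,         # stays None if torch is not importable
        "hip": None,           # HIP version string on ROCm builds, else None
        "gpu_visible": False,  # True only if torch can see a usable GPU
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = str(torch.__version__)
    report["hip"] = getattr(torch.version, "hip", None)
    report["gpu_visible"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    print(torch_backend_report())
```

If `hip` is None and `gpu_visible` is False, the venv most likely holds a non-ROCm torch build, and generation would be expected to run on the CPU.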

hqnicolas commented
