How does ComfyUI run on AMD graphics cards via WSL? #9969

FengZi-lv · 2025-09-21T14:43:03Z

I have completed all the steps here and successfully installed ROCm and PyTorch.

However, when I run ComfyUI, it doesn't seem to use my GPU. Instead, all the work appears to be done on the CPU, which is very slow. Is there a way to fix this?

Thanks.
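
A quick way to narrow this down is to ask PyTorch directly, from the same venv, whether it sees the GPU. The following is a minimal sketch, assuming a ROCm build of PyTorch (which exposes AMD GPUs through the `torch.cuda` API). The helper name `describe_torch_device` is made up for illustration.

```python
def describe_torch_device():
    """Report what PyTorch sees; returns a plain dict that is easy to paste."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable from this interpreter (wrong venv?).
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on non-ROCm builds, a version string on ROCm builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "device_name": None,
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If `gpu_available` is `False` here, the problem is probably in the ROCm/PyTorch installation rather than in ComfyUI itself.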

How I run it

fengzi@Zephyr-PC:~$ cd ~/ComfyUI
fengzi@Zephyr-PC:~/ComfyUI$ source venv/bin/activate
(venv) fengzi@Zephyr-PC:~/ComfyUI$ python main.py
[START] Security scan
[ComfyUI-Manager] Using uv as Python module for pip operations.
Using Python 3.12.3 environment at: venv
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-09-21 22:23:12.238
** Platform: Linux
** Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
** Python executable: /home/fengzi/ComfyUI/venv/bin/python
** ComfyUI Path: /home/fengzi/ComfyUI
** ComfyUI Base Folder Path: /home/fengzi/ComfyUI
** User directory: /home/fengzi/ComfyUI/user
** ComfyUI-Manager config path: /home/fengzi/ComfyUI/user/default/ComfyUI-Manager/config.ini
** Log path: /home/fengzi/ComfyUI/user/comfyui.log
Using Python 3.12.3 environment at: venv
Using Python 3.12.3 environment at: venv

Prestartup times for custom nodes:
   3.2 seconds: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager

Checkpoint files will always be loaded safely.
Total VRAM 16304 MB, total RAM 23759 MB
pytorch version: 2.6.0+rocm6.4.2.git76481f7c
AMD arch: gfx1201
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 XT : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
ComfyUI version: 0.3.59
ComfyUI frontend version: 1.26.13
[Prompt Server] web root: /home/fengzi/ComfyUI/venv/lib/python3.12/site-packages/comfyui_frontend_package/static
### Loading: ComfyUI-Manager (V3.37)
[ComfyUI-Manager] network_mode: public
### ComfyUI Revision: 3898 [72212fef] *DETACHED | Released on '2025-09-10'

Import times for custom nodes:
   0.0 seconds: /home/fengzi/ComfyUI/custom_nodes/websocket_image_save.py
   0.4 seconds: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager

Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH ComfyRegistry Data: 5/98
FETCH ComfyRegistry Data: 10/98
FETCH ComfyRegistry Data: 15/98
FETCH ComfyRegistry Data: 20/98
FETCH ComfyRegistry Data: 25/98
FETCH ComfyRegistry Data: 30/98
FETCH ComfyRegistry Data: 35/98
FETCH ComfyRegistry Data: 40/98
FETCH ComfyRegistry Data: 45/98
FETCH ComfyRegistry Data: 50/98
FETCH ComfyRegistry Data: 55/98
FETCH ComfyRegistry Data: 60/98
FETCH ComfyRegistry Data: 65/98
FETCH ComfyRegistry Data: 70/98
FETCH ComfyRegistry Data: 75/98
FETCH ComfyRegistry Data: 80/98
FETCH ComfyRegistry Data: 85/98
FETCH ComfyRegistry Data: 90/98
FETCH ComfyRegistry Data: 95/98
FETCH ComfyRegistry Data [DONE]
[ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json[ComfyUI-Manager] Due to a network error, switching to local mode.
=> custom-node-list.json
=> Cannot connect to host raw.githubusercontent.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL:
CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')]
FETCH DATA from: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager/custom-node-list.json [DONE]
[ComfyUI-Manager] All startup tasks have been completed.

The output when I start generating images

model weight dtype torch.float16, manual cast: None
model_type FLOW
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load SD3ClipModel_
0 models unloaded.
loaded completely 9.5367431640625e+25 10644.189453125 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
clip missing: ['text_projection.weight']
0 models unloaded.
0 models unloaded.
Requested to load SD3
loaded partially 4666.70763671875 4662.8812255859375 0
 20%|████████████████▊                                                                   | 8/40 [01:02<03:10,  5.97s/it]
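
The log above names `cuda:0` as the load device for the VAE and the text encoder, and reports the SD3 model as "loaded partially", which may mean that not all of it fit in VRAM. A small sketch for pulling those lines out of a longer log is shown below. The function name `summarize_load_lines` is hypothetical, and since the log does not say what the numbers after "loaded ..." mean, they are kept as raw strings.

```python
import re

# "<something> load device: <device>", e.g. "VAE load device: cuda:0, ..."
LOAD_DEVICE = re.compile(r"^(?P<what>.+?) load device: (?P<device>[\w:]+)")
# "loaded partially ..." / "loaded completely ..."
LOADED = re.compile(r"^loaded (?P<mode>partially|completely) (?P<values>.+)$")


def summarize_load_lines(log_text):
    """Collect load-device and 'loaded ...' lines from a ComfyUI log excerpt."""
    devices, loads = {}, []
    for line in log_text.splitlines():
        line = line.strip()
        match = LOAD_DEVICE.match(line)
        if match:
            devices[match.group("what")] = match.group("device")
            continue
        match = LOADED.match(line)
        if match:
            # Keep the trailing fields uninterpreted.
            loads.append((match.group("mode"), match.group("values").split()))
    return devices, loads


sample = """\
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
loaded completely 9.5367431640625e+25 10644.189453125 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
loaded partially 4666.70763671875 4662.8812255859375 0
"""

devices, loads = summarize_load_lines(sample)
print(devices)
print(loads)
```

For the excerpt above, both components report `cuda:0` as their load device, with one complete load and one partial load.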

rocminfo output

fengzi@Zephyr-PC:~$ rocminfo
WSL environment detected.
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
Runtime Ext Version:     1.7
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========
HSA Agents
==========
*******
Agent 1
*******
 Name:                    Intel(R) Core(TM) Ultra 7 265K
 Uuid:                    CPU-XX
 Marketing Name:          Intel(R) Core(TM) Ultra 7 265K
 Vendor Name:             CPU
 Feature:                 None specified
 Profile:                 FULL_PROFILE
 Float Round Mode:        NEAR
 Max Queue Number:        0(0x0)
 Queue Min Size:          0(0x0)
 Queue Max Size:          0(0x0)
 Queue Type:              MULTI
 Node:                    0
 Device Type:             CPU
 Cache Info:
   L1:                      49152(0xc000) KB
 Chip ID:                 0(0x0)
 Cacheline Size:          64(0x40)
 Internal Node ID:        0
 Compute Unit:            20
 SIMDs per CU:            0
 Shader Engines:          0
 Shader Arrs. per Eng.:   0
 Memory Properties:
 Features:                None
 Pool Info:
   Pool 1
     Segment:                 GLOBAL; FLAGS: FINE GRAINED
     Size:                    24329392(0x1733cb0) KB
     Allocatable:             TRUE
     Alloc Granule:           4KB
     Alloc Recommended Granule:4KB
     Alloc Alignment:         4KB
     Accessible by all:       TRUE
   Pool 2
     Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
     Size:                    24329392(0x1733cb0) KB
     Allocatable:             TRUE
     Alloc Granule:           4KB
     Alloc Recommended Granule:4KB
     Alloc Alignment:         4KB
     Accessible by all:       TRUE
   Pool 3
     Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
     Size:                    24329392(0x1733cb0) KB
     Allocatable:             TRUE
     Alloc Granule:           4KB
     Alloc Recommended Granule:4KB
     Alloc Alignment:         4KB
     Accessible by all:       TRUE
   Pool 4
     Segment:                 GLOBAL; FLAGS: COARSE GRAINED
     Size:                    24329392(0x1733cb0) KB
     Allocatable:             TRUE
     Alloc Granule:           4KB
     Alloc Recommended Granule:4KB
     Alloc Alignment:         4KB
     Accessible by all:       TRUE
 ISA Info:
*******
Agent 2
*******
 Name:                    gfx1201
 Marketing Name:          AMD Radeon RX 9070 XT
 Vendor Name:             AMD
 Feature:                 KERNEL_DISPATCH
 Profile:                 BASE_PROFILE
 Float Round Mode:        NEAR
 Max Queue Number:        128(0x80)
 Queue Min Size:          64(0x40)
 Queue Max Size:          131072(0x20000)
 Queue Type:              MULTI
 Node:                    1
 Device Type:             GPU
 Cache Info:
   L1:                      32(0x20) KB
   L3:                      65536(0x10000) KB
 Chip ID:                 30032(0x7550)
 Cacheline Size:          64(0x40)
 Max Clock Freq. (MHz):   2570
 Internal Node ID:        1
 Compute Unit:            64
 SIMDs per CU:            2
 Shader Engines:          4
 Shader Arrs. per Eng.:   2
 Coherent Host Access:    FALSE
 Memory Properties:
 Features:                KERNEL_DISPATCH
 Fast F16 Operation:      TRUE
 Wavefront Size:          32(0x20)
 Workgroup Max Size:      1024(0x400)
 Workgroup Max Size per Dimension:
   x                        1024(0x400)
   y                        1024(0x400)
   z                        1024(0x400)
 Max Waves Per CU:        32(0x20)
 Max Work-item Per CU:    1024(0x400)
 Grid Max Size:           4294967295(0xffffffff)
 Grid Max Size per Dimension:
   x                        4294967295(0xffffffff)
   y                        4294967295(0xffffffff)
   z                        4294967295(0xffffffff)
 Max fbarriers/Workgrp:   32
 Packet Processor uCode:: 58
 SDMA engine uCode::      0
 IOMMU Support::          None
 Pool Info:
   Pool 1
     Segment:                 GLOBAL; FLAGS: COARSE GRAINED
     Size:                    16695296(0xfec000) KB
     Allocatable:             TRUE
     Alloc Granule:           4KB
     Alloc Recommended Granule:2048KB
     Alloc Alignment:         4KB
     Accessible by all:       FALSE
   Pool 2
     Segment:                 GROUP
     Size:                    64(0x40) KB
     Allocatable:             FALSE
     Alloc Granule:           0KB
     Alloc Recommended Granule:0KB
     Alloc Alignment:         0KB
     Accessible by all:       FALSE
 ISA Info:
   ISA 1
     Name:                    amdgcn-amd-amdhsa--gfx1201
     Machine Models:          HSA_MACHINE_MODEL_LARGE
     Profiles:                HSA_PROFILE_BASE
     Default Rounding Mode:   NEAR
     Default Rounding Mode:   NEAR
     Fast f16:                TRUE
     Workgroup Max Size:      1024(0x400)
     Workgroup Max Size per Dimension:
       x                        1024(0x400)
       y                        1024(0x400)
       z                        1024(0x400)
     Grid Max Size:           4294967295(0xffffffff)
     Grid Max Size per Dimension:
       x                        4294967295(0xffffffff)
       y                        4294967295(0xffffffff)
       z                        4294967295(0xffffffff)
     FBarrier Max Size:       32
   ISA 2
     Name:                    amdgcn-amd-amdhsa--gfx12-generic
     Machine Models:          HSA_MACHINE_MODEL_LARGE
     Profiles:                HSA_PROFILE_BASE
     Default Rounding Mode:   NEAR
     Default Rounding Mode:   NEAR
     Fast f16:                TRUE
     Workgroup Max Size:      1024(0x400)
     Workgroup Max Size per Dimension:
       x                        1024(0x400)
       y                        1024(0x400)
       z                        1024(0x400)
     Grid Max Size:           4294967295(0xffffffff)
     Grid Max Size per Dimension:
       x                        4294967295(0xffffffff)
       y                        4294967295(0xffffffff)
       z                        4294967295(0xffffffff)
     FBarrier Max Size:       32
*** Done ***
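
The output above is long, so a short parser helps when checking that a GPU agent is listed. This is a rough sketch that assumes the "Agent N" / "Key: Value" layout shown above. The names `list_agents` and `gpu_agents` are invented for this example, and the sample is an abbreviated excerpt of the same output.

```python
def list_agents(rocminfo_text):
    """Return one dict per HSA agent holding its first-seen key/value fields."""
    agents = []
    current = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        if line.startswith("Agent "):
            current = {}
            agents.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            # setdefault keeps the agent-level field when an ISA block reuses a key.
            current.setdefault(key.strip(), value.strip())
    return agents


def gpu_agents(rocminfo_text):
    """Filter the agents down to those reporting 'Device Type: GPU'."""
    return [a for a in list_agents(rocminfo_text) if a.get("Device Type") == "GPU"]


sample = """\
*******
Agent 1
*******
 Name:                    Intel(R) Core(TM) Ultra 7 265K
 Device Type:             CPU
*******
Agent 2
*******
 Name:                    gfx1201
 Marketing Name:          AMD Radeon RX 9070 XT
 Device Type:             GPU
 ISA Info:
   ISA 1
     Name:                    amdgcn-amd-amdhsa--gfx1201
"""

gpus = gpu_agents(sample)
for gpu in gpus:
    print(gpu["Name"], "|", gpu["Marketing Name"])
```

To run it against live output instead of the sample, the text could be captured with `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`.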

Replies: 0 comments
