When I run ComfyUI, it does not seem to use my GPU. Instead, all the work appears to be done on the CPU, which is very bad. Is there a way to fix this problem?
Thanks.
How I start ComfyUI
fengzi@Zephyr-PC:~$ cd ~/ComfyUI
fengzi@Zephyr-PC:~/ComfyUI$ source venv/bin/activate
(venv) fengzi@Zephyr-PC:~/ComfyUI$ python main.py
[START] Security scan
[ComfyUI-Manager] Using uv as Python module for pip operations.
Using Python 3.12.3 environment at: venv
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-09-21 22:23:12.238
** Platform: Linux
** Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
** Python executable: /home/fengzi/ComfyUI/venv/bin/python
** ComfyUI Path: /home/fengzi/ComfyUI
** ComfyUI Base Folder Path: /home/fengzi/ComfyUI
** User directory: /home/fengzi/ComfyUI/user
** ComfyUI-Manager config path: /home/fengzi/ComfyUI/user/default/ComfyUI-Manager/config.ini
** Log path: /home/fengzi/ComfyUI/user/comfyui.log
Using Python 3.12.3 environment at: venv
Using Python 3.12.3 environment at: venv
Prestartup times for custom nodes:
3.2 seconds: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager
Checkpoint files will always be loaded safely.
Total VRAM 16304 MB, total RAM 23759 MB
pytorch version: 2.6.0+rocm6.4.2.git76481f7c
AMD arch: gfx1201
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 XT : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
ComfyUI version: 0.3.59
ComfyUI frontend version: 1.26.13
[Prompt Server] web root: /home/fengzi/ComfyUI/venv/lib/python3.12/site-packages/comfyui_frontend_package/static
### Loading: ComfyUI-Manager (V3.37)
[ComfyUI-Manager] network_mode: public
### ComfyUI Revision: 3898 [72212fef] *DETACHED | Released on '2025-09-10'
Import times for custom nodes:
0.0 seconds: /home/fengzi/ComfyUI/custom_nodes/websocket_image_save.py
0.4 seconds: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://127.0.0.1:8188
FETCH ComfyRegistry Data: 5/98
FETCH ComfyRegistry Data: 10/98
FETCH ComfyRegistry Data: 15/98
FETCH ComfyRegistry Data: 20/98
FETCH ComfyRegistry Data: 25/98
FETCH ComfyRegistry Data: 30/98
FETCH ComfyRegistry Data: 35/98
FETCH ComfyRegistry Data: 40/98
FETCH ComfyRegistry Data: 45/98
FETCH ComfyRegistry Data: 50/98
FETCH ComfyRegistry Data: 55/98
FETCH ComfyRegistry Data: 60/98
FETCH ComfyRegistry Data: 65/98
FETCH ComfyRegistry Data: 70/98
FETCH ComfyRegistry Data: 75/98
FETCH ComfyRegistry Data: 80/98
FETCH ComfyRegistry Data: 85/98
FETCH ComfyRegistry Data: 90/98
FETCH ComfyRegistry Data: 95/98
FETCH ComfyRegistry Data [DONE]
[ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] Due to a network error, switching to local mode.
=> custom-node-list.json
=> Cannot connect to host raw.githubusercontent.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')]
FETCH DATA from: /home/fengzi/ComfyUI/custom_nodes/ComfyUI-Manager/custom-node-list.json [DONE]
[ComfyUI-Manager] All startup tasks have been completed.
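The startup log above already states which device ComfyUI selected, so it can be checked mechanically. Below is a minimal sketch that pulls the relevant fields out of the log text. It assumes the line formats shown above stay the same; the function name `summarize_startup` and the `STARTUP_LOG` excerpt are hypothetical helpers, not part of ComfyUI. Note that ROCm builds of PyTorch expose AMD GPUs under the `cuda` device name, so `cuda:0` in the `Device:` line refers to the Radeon card named next to it.

```python
import re

# Excerpt copied from the startup log above.
STARTUP_LOG = """\
Total VRAM 16304 MB, total RAM 23759 MB
pytorch version: 2.6.0+rocm6.4.2.git76481f7c
AMD arch: gfx1201
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 XT : native
"""


def summarize_startup(log: str) -> dict:
    """Pull the device, PyTorch build and VRAM figure out of a startup log.

    Hypothetical helper: it only relies on the line formats shown above.
    """
    device = re.search(r"^Device: (\S+) (.+?) : \w+$", log, re.MULTILINE)
    torch_build = re.search(r"^pytorch version: (\S+)$", log, re.MULTILINE)
    vram = re.search(r"^Total VRAM (\d+) MB", log, re.MULTILINE)
    return {
        "device": device.group(1) if device else None,
        "device_name": device.group(2) if device else None,
        # A "+rocm" suffix in the version string marks a ROCm build of PyTorch.
        "rocm_build": bool(torch_build and "+rocm" in torch_build.group(1)),
        "vram_mb": int(vram.group(1)) if vram else None,
    }


if __name__ == "__main__":
    print(summarize_startup(STARTUP_LOG))
```

If `device` came back as `cpu` or `None`, that would point at a PyTorch or driver problem; a `cuda:N` value means ComfyUI picked a GPU at startup.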
The output when I start generating images
model weight dtype torch.float16, manual cast: None
model_type FLOW
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load SD3ClipModel_
0 models unloaded.
loaded completely 9.5367431640625e+25 10644.189453125 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
clip missing: ['text_projection.weight']
0 models unloaded.
0 models unloaded.
Requested to load SD3
loaded partially 4666.70763671875 4662.8812255859375 0
20%|████████████████▊ | 8/40 [01:02<03:10, 5.97s/it]
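The generation output above names a load device and an offload device for each model, and the progress line gives the per-step time. A minimal sketch, assuming those line formats, that lists the reported devices and estimates the total sampling time from the progress line; `load_devices` and `estimated_total_seconds` are hypothetical names used only for illustration.

```python
import re

# Excerpt copied from the generation output above.
GENERATION_LOG = """\
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
20%|████████████████▊ | 8/40 [01:02<03:10, 5.97s/it]
"""


def load_devices(log: str) -> list:
    """Return (model, load device, offload device) for each 'load device' line."""
    pattern = r"^(.+?) load device: ([^,\s]+), offload device: ([^,\s]+)"
    return re.findall(pattern, log, re.MULTILINE)


def estimated_total_seconds(log: str):
    """Estimate the full sampling time as total steps times seconds per step."""
    match = re.search(r"(\d+)/(\d+) \[.*?([\d.]+)s/it\]", log)
    if match is None:
        return None
    total_steps = int(match.group(2))
    seconds_per_step = float(match.group(3))
    return total_steps * seconds_per_step


if __name__ == "__main__":
    for model, device, offload in load_devices(GENERATION_LOG):
        print(f"{model}: runs on {device}, offloads to {offload}")
    print(f"estimated total: {estimated_total_seconds(GENERATION_LOG):.1f} s")
```

For the run above this works out to roughly 40 steps at 5.97 s each, which is about four minutes per image.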
rocminfo output
fengzi@Zephyr-PC:~$ rocminfo
WSL environment detected.
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
Runtime Ext Version: 1.7
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Core(TM) Ultra 7 265K
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) Ultra 7 265K
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 49152(0xc000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Internal Node ID: 0
Compute Unit: 20
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 24329392(0x1733cb0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 24329392(0x1733cb0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 24329392(0x1733cb0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 24329392(0x1733cb0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1201
Marketing Name: AMD Radeon RX 9070 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L3: 65536(0x10000) KB
Chip ID: 30032(0x7550)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2570
Internal Node ID: 1
Compute Unit: 64
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 58
SDMA engine uCode:: 0
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16695296(0xfec000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1201
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx12-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
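The `rocminfo` output above can be cross-checked against the ComfyUI startup log. A minimal sketch, assuming the field names shown above, that lists the agents reported as GPUs and converts the GPU memory pool size from KB to MB; the excerpt is abridged from the output above, and `gpu_agents` and `kb_to_mb` are hypothetical helper names.

```python
import re

# Abridged excerpt of the rocminfo output above (only the fields used here).
ROCMINFO_EXCERPT = """\
Agent 1
Marketing Name: Intel(R) Core(TM) Ultra 7 265K
Device Type: CPU
Agent 2
Marketing Name: AMD Radeon RX 9070 XT
Device Type: GPU
"""


def gpu_agents(text: str) -> list:
    """Return the marketing names of all agents whose Device Type is GPU."""
    names = []
    # Split the report into one block per "Agent N" header line.
    blocks = re.split(r"^[ \t]*Agent \d+[ \t]*$", text, flags=re.MULTILINE)[1:]
    for block in blocks:
        name = re.search(r"^[ \t]*Marketing Name:[ \t]*(.+)$", block, re.MULTILINE)
        kind = re.search(r"^[ \t]*Device Type:[ \t]*(\S+)", block, re.MULTILINE)
        if name and kind and kind.group(1) == "GPU":
            names.append(name.group(1).strip())
    return names


def kb_to_mb(size_kb: int) -> int:
    """Convert a pool size in KB (as printed by rocminfo) to whole MB."""
    return size_kb // 1024


if __name__ == "__main__":
    print(gpu_agents(ROCMINFO_EXCERPT))
    # Pool 1 of Agent 2 is listed as 16695296 KB in the output above.
    print(kb_to_mb(16695296), "MB")
```

The 16695296 KB pool of Agent 2 converts to 16304 MB, the same figure as the `Total VRAM 16304 MB` line in the ComfyUI startup log, so both tools are describing the same GPU memory.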
I have completed all the steps here and successfully installed ROCm and PyTorch.