This improves GPU utilization from ~50% on average to ~95% on my 1070, but it then requires >8GB VRAM for medium-sized or larger images; otherwise it will spill into shared memory and be much slower. Forge needs a full restart for the change to take effect.
Currently investigating modifications to the pipe - seems like it encodes the prompt for every tile.
I've reworked the model handling. It now encodes the prompt only once, VAE-encodes all tiles, then runs inference on all tiles, then VAE-decodes all tiles, which reduces VRAM usage and means much less model moving on GPUs with low VRAM. (The original HF Space implementation did this for each tile: encode prompt > VAE encode > inference > VAE decode, with potential CLIP/VAE/UNet moving before each stage.)
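To make the difference in ordering concrete, here is a minimal sketch in plain Python. It is not the actual extension or HF Space code: `StageTracker`, `per_tile_pipeline` and `staged_pipeline` are hypothetical names, the models are replaced by string-building stubs, and it assumes the worst case where only one model fits on the GPU at a time, so every change of model counts as one move.

```python
from collections import Counter


class StageTracker:
    """Hypothetical stand-in for a model manager.

    Counts calls per stage and how often the "loaded" model changes,
    assuming only one model can be resident at a time.
    """

    def __init__(self):
        self.loaded = None
        self.moves = 0
        self.calls = Counter()

    def run(self, model, stage, value):
        # A different model than the one currently loaded means one move.
        if self.loaded != model:
            self.loaded = model
            self.moves += 1
        self.calls[stage] += 1
        # Stub result: just record which stage processed which input.
        return f"{stage}({value})"


def per_tile_pipeline(tiles, prompt, tracker):
    """Original ordering: every tile goes through all four stages in turn."""
    outputs = []
    for tile in tiles:
        cond = tracker.run("clip", "encode_prompt", prompt)
        latent = tracker.run("vae", "vae_encode", tile)
        denoised = tracker.run("unet", "inference", f"{latent}, {cond}")
        outputs.append(tracker.run("vae", "vae_decode", denoised))
    return outputs


def staged_pipeline(tiles, prompt, tracker):
    """Reworked ordering: finish each stage for all tiles before the next."""
    cond = tracker.run("clip", "encode_prompt", prompt)
    latents = [tracker.run("vae", "vae_encode", t) for t in tiles]
    denoised = [tracker.run("unet", "inference", f"{l}, {cond}") for l in latents]
    return [tracker.run("vae", "vae_decode", d) for d in denoised]
```

Under these assumptions, running both functions on four tiles gives identical outputs, but the per-tile version encodes the prompt four times and switches models 16 times, while the staged version encodes the prompt once and switches models 4 times. On a GPU where all models stay resident the move count matters less, and the saving comes mainly from the single prompt encode.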
Also uses Forge's model movement, which should be better than diffusers model offloading, although this might not make any real difference after the main changes.
Thanks, now it works much better/faster. Anyway, I think this SR is a bit unpolished (I can't get results like those described in zsyOAOA/InvSR#5): it needs good-quality input (for example, when upscaling real images with it as a separate tool in the Spaces tab) and better face handling, but that's a question for its developer, not you.
accelerate==0.26.0
diffusers==0.32.0
P.S. Btw, processing is very slow on an NVIDIA RTX 4070, and only about 1/4 of its power is used.