[enhancement]: CFG++ #6516
As I understand it, the math is straightforward, but it clashes badly with how Schedulers are abstracted in diffusers (and thus in Invoke). I've created this issue as a place to keep notes about that.
In current https://github.com/huggingface/diffusers (0.29), Schedulers perform two jobs: scaling the model's input for the current timestep (`scale_model_input`), and taking the model's output and computing the previous sample from it (`step`).

The snag is that there's also the guidance function, which extrapolates a result from the combination of the unconditioned and conditioned model outputs. In CFG++, they start with the same guidance function, but use the unconditioned output rather than the guided one when renoising inside the step. That seems like an entirely reasonable choice to me, but I don't see any way of accommodating it in the current Scheduler API. (The current reference implementation for CFG++ uses its own modified samplers.)
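To make the clash concrete, here's a rough scalar sketch of the two update rules as I understand them from the paper. Pure Python, with placeholder alpha values rather than real schedule coefficients; `cfg_eps` and `ddim_step` are names I made up for illustration, not diffusers API:

```python
import math

def cfg_eps(eps_uncond, eps_cond, w):
    # Standard linear CFG: extrapolate from the two model outputs.
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(x_t, eps_denoise, eps_renoise, alpha_t, alpha_prev):
    # One deterministic DDIM step, with its two uses of the noise prediction
    # split apart: estimating x0 (denoising) and re-adding noise (renoising).
    x0_hat = (x_t - math.sqrt(1 - alpha_t) * eps_denoise) / math.sqrt(alpha_t)
    return math.sqrt(alpha_prev) * x0_hat + math.sqrt(1 - alpha_prev) * eps_renoise

eps_u, eps_c = 0.2, 0.5  # toy model outputs

# Standard CFG: the guided prediction plays both roles.
guided = cfg_eps(eps_u, eps_c, w=7.5)
x_cfg = ddim_step(1.0, guided, guided, alpha_t=0.9, alpha_prev=0.95)

# CFG++ (as I read it): guide the x0 estimate, but renoise with the
# unconditioned prediction; the guidance scale is kept in [0, 1].
guided_pp = cfg_eps(eps_u, eps_c, w=0.8)
x_cfgpp = ddim_step(1.0, guided_pp, eps_u, alpha_t=0.9, alpha_prev=0.95)
```

The point of friction is `eps_renoise`: it differs between the two methods, and it lives inside the step, which is exactly the part the current Scheduler API doesn't let callers control.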
Possible approaches:

A. use existing Schedulers, take their …

B. make a new CFGPlusPlusScheduler, and add something like this before our call to `scheduler.step()`:

```python
if scheduler_handles_cfg(scheduler):  # check for a flag, or introspect method args, or something
    scheduler_extra_kwargs.update({
        "unconditioned_model_output": unconditioned_model_output,
        "conditioned_model_output": conditioned_model_output,
    })
```

Cons: still not a nicely-typed interface, and it requires re-implementing each Scheduler we want to use.
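For the `scheduler_handles_cfg(scheduler)` check, here's a sketch of the "introspect method args" variant. The `CFGPlusPlusScheduler` here is a hypothetical stub standing in for a real implementation, just to show the shape of the check:

```python
import inspect

def scheduler_handles_cfg(scheduler) -> bool:
    # A CFG-aware scheduler would accept both raw model outputs in step();
    # detect that by inspecting the method's parameter names.
    params = inspect.signature(scheduler.step).parameters
    required = {"unconditioned_model_output", "conditioned_model_output"}
    return required.issubset(params)

class PlainScheduler:
    def step(self, model_output, timestep, sample): ...

class CFGPlusPlusScheduler:  # hypothetical
    def step(self, model_output, timestep, sample,
             unconditioned_model_output=None, conditioned_model_output=None): ...
```

An explicit class attribute (e.g. a flag on the scheduler) would be less magical than introspection, but introspection works without touching upstream classes.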
Additional design considerations: *Guiding a Diffusion Model with a Bad Version of Itself* [Karras, 2024] computes the two components of CFG with different models. That might sound horrible at first glance, but if the "unconditioned" model is much cheaper to run, it's potentially a big win, because each step is then slow + cheap instead of slow × 2.
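At the guidance-function level, supporting that only means the two terms come from different callables. A sketch (all names here are made up for illustration):

```python
from typing import Callable

def guided_prediction(x, t, cond,
                      cond_model: Callable, uncond_model: Callable, w: float):
    # Same linear guidance formula, but the two components need not come
    # from the same network: the unconditioned term can use a much cheaper
    # model, so each step costs (cheap + full) rather than (2 x full).
    eps_c = cond_model(x, t, cond)
    eps_u = uncond_model(x, t)  # cheap model, no conditioning input
    return eps_u + w * (eps_c - eps_u)
```

Any API that passes both model outputs into the Scheduler (as in option B above) would be agnostic to which networks produced them.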
There is a CFG++ implementation available in https://github.com/dunkeroni/InvokeAI_ModularDenoiseNodes for anyone who wants to try it out. In my testing, I find it to be almost identical to regular CFG as long as the CFG scale is kept to sane levels. The premise of the paper seems to be "CFG looks bad at 9–12 on SDXL, so we made a different CFG and artificially capped it at 1."

CFG++ is certainly capable of the same faults as CFG, including two tails on a dog and other things their paper points out as fixed. I have not been able to show any improvement in text generation either. Most results end up looking nearly identical to the non-++ version if you can find the normal CFG level that matches it. I will say that there are occasionally smaller details that show up with more prevalence or better regularity in CFG++, but they are the sort of details that also show up better with different schedulers (I have only implemented the DDIM version).

Part of the problem might be that what is considered "out of manifold" is not guaranteed to be a worse image; assuming everything in-manifold is good and everything out-of-manifold is bad (while theoretically the correct way to reason about these models) might be oversimplifying. However, for models that sometimes produce useless mush (looking at you, Pony), CFG++ does have the potential to save it. This was one in about 20 comparison gens where the rest were identical, but when you are specifically prompting for things a model is bad at, CFG++ occasionally comes out well above standard CFG.
Their other claim is that it helps with inversion-based denoise methods. I can see that, because that technique is prone to ending up far outside the distribution as it attempts to work with an image that may or may not fit the prompt or model well. I don't know if folks have been using that type of workflow. I did hack together a quick-and-dirty Invert Denoise Invocation a while back, but it probably needs a bit of updating for Invoke 4.2.
CFG++, like CFG Rescale, is an attempt to address the way the linear Classifier-Free Guidance function is prone to producing out-of-distribution values.
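For comparison, a simplified 1-D sketch of the CFG Rescale idea (from the "Common Diffusion Noise Schedules and Sample Steps Are Flawed" paper: shrink the guided prediction's standard deviation back toward the conditioned prediction's, then blend by a factor phi; real implementations do this per-channel on tensors, not flat lists):

```python
import statistics

def cfg_rescale(pred_cond, pred_cfg, phi=0.7):
    # Rescale the guided prediction so its spread matches the conditioned
    # prediction's, then linearly blend with the unrescaled version.
    scale = statistics.pstdev(pred_cond) / statistics.pstdev(pred_cfg)
    rescaled = [v * scale for v in pred_cfg]
    return [phi * r + (1 - phi) * c for r, c in zip(rescaled, pred_cfg)]
```

Both techniques pull the guided prediction back toward plausible values, but CFG Rescale does it as a post-hoc correction to the combined output, whereas CFG++ changes which prediction is used inside the sampling step.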