
[enhancement]: CFG++ #6516

Open · keturn opened this issue Jun 15, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

keturn (Contributor) commented Jun 15, 2024

CFG++, like CFG Rescale, is an attempt to address the way the linear Classifier-Free Guidance function is prone to producing out-of-distribution values.

keturn added the enhancement label Jun 15, 2024
keturn (Contributor, Author) commented Jun 15, 2024

As I understand it, the math is straightforward. But it clashes in an awful way with how Schedulers are abstracted in diffusers (and thus Invoke). I've created this issue so there's a place to keep notes about that.

keturn (Contributor, Author) commented Jun 15, 2024

In current https://github.com/huggingface/diffusers (0.29), Schedulers perform two jobs:

  1. defining the schedule of timesteps the diffusion model is run at; e.g. a 5-step schedule results in timesteps [1000, 800, 600, 400, 200]. Thus the name Scheduler.
  2. extrapolating from the outputs of unet.forward(latent, t, …) calls to get the latents for the next timestep in the schedule, as sketched below. (Sometimes this is trivial, but more sophisticated schedulers extrapolate based on the results of multiple previous steps.)
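
A minimal sketch of how those two jobs show up in a diffusers-style sampling loop, with CFG and conditioning omitted for clarity; `unet` stands in for whatever denoiser is being run:

```python
import torch
from diffusers import DDIMScheduler

def sample(unet, num_steps: int = 5):
    scheduler = DDIMScheduler()
    scheduler.set_timesteps(num_steps)  # job 1: define the timestep schedule
    latents = torch.randn(1, 4, 64, 64)
    for t in scheduler.timesteps:
        model_output = unet(latents, t).sample  # noise prediction at this timestep
        # job 2: extrapolate from the model output to the latents at the next timestep
        latents = scheduler.step(model_output, t, latents).prev_sample
    return latents
```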

The snag is that there's also the guidance function that extrapolates a result from the combination of conditioned and unconditioned results. In diffusers, that guidance function is hardcoded into the pipeline's sampling loop, and the scheduler's step method has no access to it.

In CFG++, they start with the same guidance function guided_model_output = unconditioned_model_output + guidance_scale * (conditioned_model_output - unconditioned_model_output), but then they have the extrapolation to the next timestep take both unconditioned_model_output and guided_model_output into account.

That seems like an entirely reasonable choice to me, but I don't see any way of accommodating it in the current Scheduler API.

(The current reference implementation for CFG++ uses diffusers for the unet model, but eschews both the diffusers-provided Scheduler and Pipeline.)
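
For reference, here's my rough reading of what a CFG++ step does in the DDIM (eta=0), epsilon-prediction case. This is a sketch based on the paper and its reference implementation, not a drop-in diffusers scheduler, and all the names are mine:

```python
import torch

def cfgpp_ddim_step(latents, t, t_prev, eps_uncond, eps_cond, alphas_cumprod, guidance_scale):
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)

    # the usual linear guidance (the paper keeps guidance_scale in [0, 1])
    eps_guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # predict x0 from the *guided* noise estimate ...
    pred_x0 = (latents - (1 - a_t).sqrt() * eps_guided) / a_t.sqrt()

    # ... but renoise toward the next timestep with the *unconditional* estimate.
    # This last part is what a stock Scheduler.step can't do: it only ever sees
    # one combined model_output.
    return a_prev.sqrt() * pred_x0 + (1 - a_prev).sqrt() * eps_uncond
```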

@keturn
Copy link
Contributor Author

keturn commented Jun 15, 2024

Possible approaches:

A. use existing Schedulers, take their SchedulerOutput.pred_original_sample, exploit the fact that scheduler.alphas is accessible, and recompute the next sample using CFG++ (see the sketch after this list). Cons: many schedulers don't output pred_original_sample, and there's no real guarantee that schedulers expose an alphas property or use it in the same way. It makes some of scheduler.step's calculations redundant, and our after-the-fact recompute might not take advantage of everything the scheduler has to offer.

B. make a new CFGPlusPlusScheduler and add something like this before our call to scheduler.step:

```python
if scheduler_handles_cfg(scheduler):  # check for a flag or introspect method args or something
    scheduler_extra_kwargs.update({
        "unconditioned_model_output": unconditioned_model_output,
        "conditioned_model_output": conditioned_model_output,
    })
```

Cons: still not a nicely-typed interface, and it requires re-implementing each Scheduler we want to use. Sketches of both approaches follow.
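
A hypothetical sketch of approach A; all the names (t_prev, eps_uncond, eps_guided) are mine, and it assumes an epsilon-prediction scheduler that populates pred_original_sample, like DDIM:

```python
import torch

def step_with_cfgpp_override(scheduler, eps_guided, eps_uncond, t, t_prev, latents):
    step_output = scheduler.step(eps_guided, t, latents)
    pred_x0 = getattr(step_output, "pred_original_sample", None)
    if pred_x0 is not None and hasattr(scheduler, "alphas_cumprod"):
        a_prev = scheduler.alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)
        # discard the scheduler's prev_sample and redo the renoising with eps_uncond
        return a_prev.sqrt() * pred_x0 + (1 - a_prev).sqrt() * eps_uncond
    return step_output.prev_sample  # scheduler doesn't expose what we need; plain CFG
```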
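
And a hypothetical CFGPlusPlusScheduler for approach B, subclassing the diffusers DDIMScheduler. For simplicity this version takes the already-guided output as model_output plus just the unconditional component, rather than both raw components as in the snippet above, and it only handles the eta=0, epsilon-prediction path:

```python
import torch
from diffusers import DDIMScheduler
from diffusers.schedulers.scheduling_ddim import DDIMSchedulerOutput

class CFGPlusPlusDDIMScheduler(DDIMScheduler):
    def step(self, model_output, timestep, sample, unconditioned_model_output=None, **kwargs):
        if unconditioned_model_output is None:
            return super().step(model_output, timestep, sample, **kwargs)

        prev_timestep = timestep - self.config.num_train_timesteps // self.num_inference_steps
        a_t = self.alphas_cumprod[timestep]
        a_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod

        # x0 from the guided estimate, renoising from the unconditional one (CFG++)
        pred_x0 = (sample - (1 - a_t).sqrt() * model_output) / a_t.sqrt()
        prev_sample = a_prev.sqrt() * pred_x0 + (1 - a_prev).sqrt() * unconditioned_model_output
        return DDIMSchedulerOutput(prev_sample=prev_sample, pred_original_sample=pred_x0)
```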

keturn (Contributor, Author) commented Jun 16, 2024

additional design considerations:

We already have InvokeAIDiffuserComponent._combine factored out, guidance_rescale_multiplier implemented inline before the scheduler.step call, and arbitrary additional_guidance functions called after it. We don't currently pass the two components of CFG to it, but we could.
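
Hypothetically, the wiring could look something like this; the handles_cfg flag is a made-up stand-in for the capability check, and the first line is what _combine does today:

```python
def combine_and_step(scheduler, uncond, cond, guidance_scale, t, latents):
    guided = uncond + guidance_scale * (cond - uncond)  # what _combine does today
    extra = {}
    if getattr(scheduler, "handles_cfg", False):  # hypothetical flag on CFG++-aware schedulers
        extra["unconditioned_model_output"] = uncond
    return scheduler.step(guided, t, latents, **extra).prev_sample
```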

additional additional design considerations:

Guiding a Diffusion Model with a Bad Version of Itself [Karras, 2024] computes the two components of CFG with different models. That might sound horrible at first glance, but if the "unconditioned" model is much cheaper to run, it's potentially a big win: each step costs slow + cheap instead of slow × 2.
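
A sketch of that two-model guidance; the names are mine, with the hypothetical small_unet playing the role of the cheap guiding model:

```python
def autoguided_eps(big_unet, small_unet, latents, t, prompt_embeds, guidance_scale):
    # the guiding branch runs a cheaper model, so a step costs slow + cheap, not slow x 2
    eps_bad = small_unet(latents, t, encoder_hidden_states=prompt_embeds).sample
    eps_good = big_unet(latents, t, encoder_hidden_states=prompt_embeds).sample
    return eps_bad + guidance_scale * (eps_good - eps_bad)
```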

dunkeroni (Contributor) commented
There is a CFG++ implementation available in https://github.com/dunkeroni/InvokeAI_ModularDenoiseNodes for anyone who wants to try it out. In my testing, I find it to be almost identical to regular CFG as long as the CFG scale is kept to sane levels. The premise of the paper seems to be "CFG looks bad at 9–12 on SDXL, so we made a different CFG and artificially capped it at 1."

CFG++ is certainly capable of the same faults as CFG, including two tails on a dog and other artifacts the paper claims it fixes. I have not been able to show any improvement in text generation either. Most results end up looking nearly identical to the non-++ version once you find the normal CFG level that matches it.
[image]

I will say that occasionally smaller details show up more often, or more consistently, in CFG++, but they are the sort of details that also show up better in other schedulers (I have only implemented the DDIM version).
[image]

Part of the problem might be that what is considered "out of manifold" is not guaranteed to be a worse image; assuming everything in-manifold is good and everything out-of-manifold is bad (while theoretically the correct way to think about these models) might be an oversimplification.
[image]

However, for models that sometimes produce useless mush (looking at you, Pony), CFG++ does have the potential to save it. This was one out of about 20 comparison gens where the rest were identical, but when you are specifically prompting for things a model is bad at, CFG++ occasionally comes out well above standard CFG.
[image]

keturn (Contributor, Author) commented Jun 27, 2024

Their other claim is that it helps with inversion-based denoising methods. I can see that, because that technique is prone to ending up far outside the distribution as it tries to work with an image that may or may not be a good fit for the prompt or model.

idk if folks have been using that type of workflow. I did hack together a quick and dirty Invert Denoise Invocation a while back, but it probably needs a bit of updating for Invoke 4.2.
