Merge pull request #26 from blepping/attention_improvements
October update
blepping authored Oct 14, 2024
2 parents 86afeb7 + 71435fc commit 4e66c60
Showing 7 changed files with 1,121 additions and 308 deletions.
120 changes: 120 additions & 0 deletions README.md
@@ -187,6 +187,67 @@ of 32, 64 or 128 (may need to experiment). Known to work with ELLA, FreeU (V2),
Input blocks downscale and output blocks upscale so the biggest effect on performance will be applying this
to input blocks with a low block number and output blocks with a high block number.

<details>

<summary>YAML parameters</summary>

This input can be converted to a multi-line text widget and allows setting advanced/rare parameters. You can also override the node parameters here. JSON is valid YAML, so you can use that if you prefer.
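
For example, since JSON is valid YAML, an override like this is accepted as-is (the values here are illustrative, not recommendations):

```yaml
{"scale_mode": "skip", "verbose": 1}
```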

Default parameter values:

```yaml
# In addition to the extra advanced options, you can override any fields from
# the node here. For example:
# time_mode: percent

# Scale mode used as a fallback only when image sizes are not multiples of 64. May decrease image quality.
# May also be set to "disabled" to disable the workaround or "skip" to skip MSW-MSA attention on incompatible sizes.
scale_mode: nearest-exact

# Scale mode used to reverse the scale_mode scaling.
reverse_scale_mode: nearest-exact

# One of global, block, both, ignore
last_shift_mode: global

# One of decrement, increment, retry
last_shift_strategy: decrement

# Set to true to suppress the log warning about incompatible image sizes.
silent: false

# Multipliers applied to the tensor before/after the window or window reverse operation.
pre_window_multiplier: 1.0
post_window_multiplier: 1.0
pre_window_reverse_multiplier: 1.0
post_window_reverse_multiplier: 1.0

# Positive/negative distance to search for candidate rescales when dealing with incompatible
# resolutions. Can possibly be used to brute force attn2 application (you can set it to something
# absurd like 32).
rescale_search_tolerance: 1

# Not recommended. Forces applying the attention patch to attn2.
force_apply_attn2: false

# Logging verbosity level. 1 - Dumps config at startup. 2 - Warnings are also no longer throttled.
verbose: 0
```
* `scale_mode`: Scale mode used as a fallback only when image sizes are not multiples of 64. May decrease image quality. Use `disabled` to bypass the fallback (which may result in an error) or `skip` to skip using MSW-MSA attention when the image size is incompatible. Any of the available scaling modes may be used here.
* `reverse_scale_mode`: Scale mode used to reverse the scaling done by `scale_mode`. No effect when `scale_mode` is not being applied.
* `last_shift_mode`: `global` - tracking is independent of blocks. `block` - remembers the last shift by block. `both` - avoids using the last shift both by block and globally. `ignore` - just uses whatever shift was randomly picked.
* `last_shift_strategy`: Only has an effect when `last_shift_mode` is not `ignore`. There are four possible shift types. `decrement` - decrements the shift type. `increment` - increments the shift type. `retry` - keeps generating random shifts until it hits one not on the ignore list (changes seeds most significantly).
* `pre_window_multiplier` (etc.): You can multiply the tensor before/after the window or window reverse operation. There's generally no difference between doing it before or after unless you're using weird upscale modes from [ComfyUI-bleh](https://github.com/blepping/ComfyUI-bleh). I don't know why/when this would be useful, but it's there if you want to mess with it!
* `force_apply_attn2`: Forces applying to attn2 rather than attn1. **Warning**: MSW-MSA attention was not made for `attn2` and the sizes are guaranteed to be incompatible and require scaling. Using it also doesn't seem to improve performance, so there isn't much reason to enable this unless you're a weirdo like me and just like trying strange things.

The last shift options are for trying to avoid choosing the same shift size consecutively. This may or may not actually be helpful.
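
As a hedged illustration, an override combining several of the options above might look like this (the values are examples, not recommendations):

```yaml
# Skip MSW-MSA attention entirely when the image size is incompatible.
scale_mode: skip
# Avoid repeating the last shift both per block and globally.
last_shift_mode: both
# Reroll random shifts until an unused one is found (changes seeds most significantly).
last_shift_strategy: retry
# Dump the effective configuration at startup.
verbose: 1
```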

**Note**: Normal error checking generally doesn't apply to parameters set/overridden here. You are allowed to shoot yourself in the foot and will likely just get an exception if you enter the wrong type/an absurd value.

</details>

### `ApplyRAUNet`

**Use case**: Helps avoid artifacts when generating at resolutions significantly higher than what the model
@@ -258,6 +319,65 @@ probably work best if you don't want to manually set segments.
other scaling effects that target the same blocks (e.g. Deep Shrink). By itself, I think it should be fine with
HyperTile and Deep Cache, though I haven't actually tested that. May not work properly with ControlNet.

<details>

<summary>YAML parameters</summary>

This input can be converted to a multi-line text widget and allows setting advanced/rare parameters. You can also override the node parameters here. JSON is valid YAML, so you can use that if you prefer.

Default parameter values:

```yaml
# In addition to the extra advanced options, you can override any fields from
# the node here. For example:
# time_mode: percent
# Patches input blocks after the skip connection when enabled (similar to Kohya deep shrink).
ca_input_after_skip_mode: false
# Either null or set to a time (with the same time mode as the other times).
# Starts fading out the CA scaling effect, starting from the specified time.
ca_fadeout_start_time: null
# Maximum fadeout, specified as a percentage of the total scaling effect.
ca_fadeout_cap: 0.0
# null or float. Allows setting the width scale separately. When null the same
# factor will be used for height and width.
ca_downscale_factor_w: null
# When applying CA scaling, ensures the rescaled latent is divisible by the specified increment.
ca_latent_pixel_increment: 8
# When using the avg_pool2d method, enable ceil mode.
# See: https://pytorch.org/docs/stable/generated/torch.nn.functional.avg_pool2d.html
ca_avg_pool2d_ceil_mode: true
# Multipliers applied to the tensor: can be set separately for before/after upscale or downscale,
# and for CA vs. non-CA scaling.
pre_upscale_multiplier: 1.0
post_upscale_multiplier: 1.0
pre_downscale_multiplier: 1.0
post_downscale_multiplier: 1.0
ca_pre_upscale_multiplier: 1.0
ca_post_upscale_multiplier: 1.0
ca_pre_downscale_multiplier: 1.0
ca_post_downscale_multiplier: 1.0
# Logging verbosity level. 1 - Dumps configuration on startup.
verbose: 0
```

* `ca_input_after_skip_mode`: When applying CA scaling, the effect will occur after the skip connection. This is the default for Kohya Deep Shrink and may produce less noisy results. **Note**: This changes the corresponding output block you need to set if not targeting a downscale block (i.e. ones you can target with the main RAUNet effect). It seems like you generally just subtract one. Example: Using SD15 and targeting input 4, you'd normally use output 8 - use output 7 instead.
* `ca_latent_pixel_increment`: Ensures the scaled sizes are a multiple of the latent pixel increment. The default of 8 should ensure the scaled size is compatible with MSW-MSA attention without scaling workarounds. *Note*: Has no effect when downscaling with `avg_pool2d`.
* `ca_fadeout_start_time`: Will start fading out the CA downscale factor starting from the specified time (which uses the same time mode as other configured times). The fadeout occurs such that the downscale factor will reach `1.0` (no downscaling) at `ca_end_time`. This can (sometimes) help decrease artifacts compared to simply ending the scale effect abruptly.
* `ca_fadeout_cap`: Only has an effect when fadeout is in effect (see above). This is expressed as a percentage of the scaling effect, so, for example, you could set it to `0.5` to fade out the first 50% of the downscale effect and after that the downscale would stay at 50% (of the total downscale effect) until `ca_end_time` is reached.
* `pre_upscale_multiplier` (etc.): You can multiply the tensor before/after it's upscaled or downscaled. There's generally no difference between doing it before or after unless you're using weird upscale modes from [ComfyUI-bleh](https://github.com/blepping/ComfyUI-bleh). Should you multiply it? Maybe not! It's a setting to possibly mess with and (not very scientifically) it seems like applying a mild positive multiplier can help.

**Note**: Normal error checking generally doesn't apply to parameters set/overridden here. You are allowed to shoot yourself in the foot and will likely just get an exception if you enter the wrong type/an absurd value.
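
As a hedged illustration, a RAUNet override using the fadeout and width-factor options might look like this (all values below are made up for the example, not recommendations):

```yaml
# Apply CA scaling after the skip connection (Deep Shrink style).
ca_input_after_skip_mode: true
# Start fading out the CA downscale effect at 25% (assuming time_mode: percent).
ca_fadeout_start_time: 0.25
# Stop fading once 50% of the total downscale effect remains.
ca_fadeout_cap: 0.5
# Downscale width by a different factor than height.
ca_downscale_factor_w: 2.0
```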

</details>

## Credits

Code based on the HiDiffusion original implementation: https://github.com/megvii-research/HiDiffusion
17 changes: 17 additions & 0 deletions changelog.md
@@ -2,6 +2,23 @@

Note, only relatively significant changes to user-visible functionality will be included here. Most recent changes at the top.

## 20241014

_Note_: Advanced MSW-MSA Attention node parameters changed. May break workflows.

_Note_: This update may slightly change seeds.

* MSW-MSA attention can now work with all image sizes. When the size is incompatible, it will scale the latent, which may affect quality. Contributed by @pamparamm. Thanks!
* Scaling now tries to make the output size a multiple of 8 so it's compatible with MSW-MSA attention. May change seeds; set `ca_latent_pixel_increment: 1` in the YAML parameters for the old behavior. *Note*: Does not apply if you use `avg_pool2d` for downscaling.
* CA downscaling now uses `adaptive_avg_pool2d` as the default method, which supports fractional downscale sizes. As far as I know, it's the same as `avg_pool2d` with integer sizes, but it's possible this will change seeds.
* Simple nodes now support an "auto" model type parameter that will try to guess the model from the latent type.
* Added a `yaml_parameters` input to the advanced nodes which allows specifying advanced/uncommon parameters. See main README for possible settings.
* You can now use a different scale factor for width and height in RAUNet CA scaling. See `ca_downscale_factor_w` in YAML parameters.
* You can now fade out the CA scaling effect in RAUNet node. See `ca_fadeout_start_time` and `ca_fadeout_cap` in YAML parameters.
* Simple nodes' default parameters for SDXL models were adjusted to match the official HiDiffusion ones more closely.

Check the expandable "YAML Parameters" sections in the main README for more information about advanced parameters added in this update.
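
For example, to restore the previous scaling behavior, the advanced nodes' `yaml_parameters` input can be set to:

```yaml
ca_latent_pixel_increment: 1
```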

## 20240827

* Fixed (hopefully) an issue with RAUNet model patching that could cause semi-non-deterministic output. Unfortunately the fix also may change seeds.