
[Bug]: DoRA for LoCon and LoRA aren't loadable: "Version Mismatch" #635

CoffeeVampir3 opened this issue Apr 3, 2024 · 5 comments

@CoffeeVampir3

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

Kohaku LoRA/LyCORIS modules of the LoKr or LoHa type trained with the DoRA decomposition method (dora_wd in LyCORIS) can be loaded, but if the module is a vanilla LoRA or LoCon (both handled by the same loader, locon), it comes up with this error: error line

The primary difference DoRA contributes is an additional decomposed weight parameter. It seems the LyCORIS module correctly loads this parameter for LoHa/LoKr, but it is not being used for LoCon/LoRA. I couldn't identify where the exact issue is.
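As a rough illustration of the difference, here is a minimal sketch of detecting DoRA by key inspection instead of treating the extra key as an unknown version. This assumes the LyCORIS convention of saving the decomposed magnitude under a `dora_scale` suffix and the usual kohya/LyCORIS suffixes for LoHa, LoKr, and LoRA/LoCon; the exact key names should be verified against an actual checkpoint, and this is not the Forge loader's real code:

```python
# Hypothetical sketch: classify a LoRA-style state dict by its key suffixes.
# "dora_scale" is assumed to be the suffix LyCORIS uses for DoRA's
# decomposed-magnitude parameter; verify against the actual checkpoint.

def classify_module(keys):
    """Return (module_type, has_dora) for a set of state-dict key names."""
    has_dora = any(k.endswith(".dora_scale") for k in keys)
    if any(".hada_w1_a" in k for k in keys):
        return "loha", has_dora
    if any(".lokr_w1" in k for k in keys):
        return "lokr", has_dora
    if any(".lora_up.weight" in k for k in keys):
        return "lora/locon", has_dora
    return "unknown", has_dora
```

For example, a key set containing `lora_up.weight`, `lora_down.weight`, and `dora_scale` would classify as `("lora/locon", True)`, which a loader could then route through the DoRA-aware path rather than rejecting it as a version mismatch.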

Steps to reproduce the problem

I've uploaded an example model here to test https://huggingface.co/Blackroot/SD-DORA-Example-LORA/tree/main
This is a LyCORIS LoRA trained using the kohya-ss sd-scripts with the following network parameters:
https://gist.github.com/CoffeeVampir3/dde66b3df88d32fa88f4d02d4bc0e901#file-dora_train2-sh-L93

Attempting to use this LoRA gives the aforementioned version error, and the LoRA is not applied.

What should have happened?

The LoRA should be applied.

What browsers do you use to access the UI ?

No response

Sysinfo

sysinfo-2024-04-03-13-06.json

Console logs

(base) ➜  stable-diffusion-webui-forge git:(main) ✗ ./webui.sh --cuda-stream --pin-shared-memory

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on blackroot user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.39
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Legacy Preprocessor init warning: Unable to install insightface automatically. Please try run `pip install insightface` manually.
Launching Web UI with arguments: --cuda-stream --pin-shared-memory
Total VRAM 24238 MB, total RAM 128676 MB
Set vram state to: NORMAL_VRAM
Always pin shared GPU memory
Device: cuda:0 NVIDIA GeForce RTX 3090 Ti : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  True
Using pytorch cross attention
ControlNet preprocessor location: /home/blackroot/Desktop/stable-diffusion-webui-forge/models/ControlNetPreprocessor
Loading weights [8ea2b6e4e2] from /home/blackroot/Desktop/stable-diffusion-webui-forge/models/Stable-diffusion/CHEYENNE_v16.safetensors
2024-04-03 07:05:43,361 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Startup time: 5.5s (prepare environment: 1.2s, import torch: 1.8s, import gradio: 0.3s, setup paths: 0.4s, other imports: 0.2s, load scripts: 0.7s, create ui: 0.4s, gradio launch: 0.4s).
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['denoiser.sigmas'])
To load target model SDXLClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  9310.80859375
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  6142.453895568848
Moving model(s) has taken 0.20 seconds
Model loaded in 2.9s (load weights from disk: 0.4s, forge load real models: 2.1s, calculate empty prompt: 0.3s).
[LORA] LoRA version mismatch for SDXL: /mnt/ue/loras/dpr1.safetensors

Additional information

No response

@jetjodh

jetjodh commented Apr 3, 2024

Hopefully #608 will solve this.

@altarofwisdom

Facing this issue as well with a Kohya-generated LoRA of mine...

@jt-michels

jt-michels commented Apr 15, 2024

I was going to report an issue, but I think it's related to, or the same as, this one. However, I don't use DoRA; I use almost exclusively IA3 due to its lightweight efficiency, and I cannot get most of my IA3s to work in Forge.

I get the same error (LoRA: Version Mismatch), and the IA3 (or sometimes DyLoRA or other LyCORIS) modules are not applied at inference.

I have some that do work, though, so I am going to see which parameters differ on those to find the culprit (will definitely check weight decomposition first).
Edit: the table below shows a direct comparison of the parameter deltas (other than basic settings like number of steps, etc.) between an IA3 I trained on 03/07 that works with no issue in Forge and one I trained the other day that does not get applied and shows the mismatch issue (but works in SD WebUI with no issue).
Hopefully this helps figure out the root cause.

| IA3 Hyperparameter | Older IA3 (works in Forge) | New IA3 (doesn't work on Forge) |
| --- | --- | --- |
| Trained date | 2024-03-07T10:41:28 | 2024-04-13T07:35:39 |
| Base model used | Kohya SDXL base | Local SDXL base w/ VAE |
| SNR gamma | 1.0 min SNR gamma | No min SNR gamma |
| timestep_range | N/A (was not an option) | Default timestep_range [0, 1000]* |
| loss_type | N/A (was not an option) | L2 (default)* |
| bucket_no_upscale | TRUE | FALSE |
| debiased_estimation | FALSE | TRUE |
| noise_offset_random_strength | N/A (was not an option) | FALSE* |
| ia3_train_on_input | FALSE (was not an option) | TRUE |
| huber_c | N/A (was not an option) | 0.1* |
| huber_schedule | N/A (was not an option) | SNR* |
| max_token_length | N/A (was not an option) | 150* |
| gradient_accumulation | 1 | 2 |
| safeguard_warmup | FALSE | TRUE |
| betas | [0.9, 0.999] | [0.935, 0.985] |
| weight_decay | 0.1 | 0.15 |
| scale_weight_norms | 5 | FALSE* |
| precision | bf16 | fp16 |
| ip_noise_gamma | N/A (was not an option) | 0.1* |
| ip_noise_gamma_random_strength | N/A (was not an option) | TRUE |
| keep_tokens | 0 | 1 |
| shuffle_caption | FALSE | TRUE |

^ A strikethrough in the third column means I went back and tried to recreate an IA3 with the old settings from the working IA3 in the second column (it did not produce a working version).
* An asterisk means I could not revert the parameter to its previous value because of changes in the UI (mostly new fields, except for scale_weight_norms).
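The manual comparison above can be sketched as a small helper that diffs two training-config dicts, where a parameter that did not exist in the older trainer shows up as `None` (the key names here are illustrative, not the exact kohya field names):

```python
# Sketch: report every hyperparameter that differs between two training runs.
# A key missing from one config (a parameter that was not yet an option)
# is reported with the value None on that side.

def diff_configs(old, new):
    """Return {param: (old_value, new_value)} for every differing parameter."""
    diffs = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Example with illustrative field names:
# diff_configs({"precision": "bf16", "keep_tokens": 0},
#              {"precision": "fp16", "keep_tokens": 0, "huber_c": 0.1})
# returns {"huber_c": (None, 0.1), "precision": ("bf16", "fp16")}
```

Diffing the saved training configs directly avoids transcription errors when hunting for the parameter that triggers the mismatch.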

@catboxanon
Collaborator

IA3 is not supported in Forge, because it is also not supported in ComfyUI (the LoRA backend Forge uses). comfyanonymous/ComfyUI#1071 (comment)

I'm not sure where comfyanonymous "officially" mentioned this; probably Matrix or 4chan. But if you look at the source, it's clear no implementation exists, unlike in the webui.

@jt-michels

jt-michels commented Apr 15, 2024

> IA3 is not supported in Forge, because it is also not supported in ComfyUI (the LoRA backend Forge uses). comfyanonymous/ComfyUI#1071 (comment)
>
> I'm not sure where comfyanonymous "officially" mentioned this. Probably Matrix or 4chan. If you look at the source it's clear no implementation exists though, unlike the webui.

Hmm, thanks... So it's weird that most of my 'older' (1+ month) IA3s do function as expected in Forge, but I can't recreate them with the existing parameters from Kohya on the current version.

PS: Y'all are missing out! I appreciate the inference speedup in Forge, but it's not worth the trade for me. An IA3 takes 15 minutes to train and weighs in at 1.2 MB for an SDXL finetune.
