
img2img pipeline #22

Open
realimposter opened this issue Mar 3, 2023 · 3 comments

realimposter commented Mar 3, 2023

Any chance you can add an example of using ControlNet with img2img to the Colab doc (without inpainting)?

I followed the instructions and tried adding the StableDiffusionControlNetInpaintImg2ImgPipeline class, without any luck:

```python
import torch

from diffusers.utils import load_image
from diffusers import StableDiffusionInpaintPipeline, StableDiffusionControlNetInpaintImg2ImgPipeline, ControlNetModel

# we have downloaded the models locally; you can also load them from Hugging Face
# control_openjourney-v2_depth was converted from the depth ControlNet weights using the instructions above
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")
pipe_control = StableDiffusionControlNetInpaintImg2ImgPipeline.from_pretrained(
    "C:/Users/User/Desktop/cnet_img2img/diffusers/control_openjourney-v2_depth",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "prompthero/openjourney-v2",
    torch_dtype=torch.float16,
).to("cuda")

# yes, we can directly replace the UNet
pipe_control.unet = pipe_inpaint.unet
pipe_control.unet.in_channels = 4

# we use the same example images as stable-diffusion-inpainting
image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
mask = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png")

# the depth map is generated from https://huggingface.co/spaces/hysts/ControlNet
control_image = load_image("https://raw.githubusercontent.com/haofanwang/ControlNet-for-Diffusers/main/images/desk_depth.png")

image = pipe_control(prompt="Face of a yellow cat, high resolution, sitting on a park bench", 
                     negative_prompt="lowres, bad anatomy, worst quality, low quality",
                     controlnet_hint=control_image, 
                     image=image,
                     mask_image=mask,
                     width=448,
                     height=640,
                     num_inference_steps=100).images[0]

image.save("inpaint_seg.jpg")

```


This gives me the error:

```
Incorrect configuration settings! The config of `pipeline.unet`: FrozenDict([('sample_size', 64), ('in_channels', 4), ('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0), ('down_block_types', ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D']), ('mid_block_type', 'UNetMidBlock2DCrossAttn'), ('up_block_types', ['UpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D']), ('only_cross_attention', False), ('block_out_channels', [320, 640, 1280, 1280]), ('layers_per_block', 2), ('downsample_padding', 1), ('mid_block_scale_factor', 1), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05), ('cross_attention_dim', 768), ('attention_head_dim', 8), ('dual_cross_attention', False), ('use_linear_projection', False), ('class_embed_type', None), ('num_class_embeds', None), ('upcast_attention', False), ('resnet_time_scale_shift', 'default'), ('time_embedding_type', 'positional'), ('timestep_post_act', None), ('time_cond_proj_dim', None), ('conv_in_kernel', 3), ('conv_out_kernel', 3), ('projection_class_embeddings_input_dim', None), ('_class_name', 'UNet2DConditionModel'), ('_diffusers_version', '0.11.0.dev0'), ('_name_or_path', 'C:\\Users\\User\\.cache\\huggingface\\hub\\models--prompthero--openjourney-v2\\snapshots\\32e0aa8629c1d5ed82ff19f1543017bceb5f84d6\\unet')]) expects 4 but received `num_channels_latents`: 4 + `num_channels_mask`: 1 + `num_channels_masked_image`: 4 = 9. Please verify the config of `pipeline.unet` or your `mask_image` or `image` input.
```
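The error message itself points at the mismatch: the inpaint-img2img code path concatenates the noise latents (4 channels), the downscaled mask (1 channel) and the masked-image latents (4 channels) into a 9-channel UNet input, but the openjourney-v2 UNet's `conv_in` only accepts 4 channels, and overwriting `unet.in_channels` does not change the actual layer. That path needs a genuine 9-channel inpainting UNet (such as runwayml/stable-diffusion-inpainting). For ControlNet + img2img without a mask, which is what the question asks for, a regular 4-channel UNet is enough. A minimal sketch, not taken from this repo: recent diffusers releases ship a `StableDiffusionControlNetImg2ImgPipeline`, and the model ids and image URLs below are simply the ones already used above.

```python
import torch
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel
from diffusers.utils import load_image

# plain ControlNet + img2img: no mask, so the regular 4-channel text-to-image UNet works
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "prompthero/openjourney-v2", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
depth_map = load_image("https://raw.githubusercontent.com/haofanwang/ControlNet-for-Diffusers/main/images/desk_depth.png")

# both images should share the same (width, height), and each side should be a multiple of 8
init_image = init_image.resize((448, 640))
depth_map = depth_map.resize((448, 640))

image = pipe(
    prompt="Face of a yellow cat, high resolution, sitting on a park bench",
    negative_prompt="lowres, bad anatomy, worst quality, low quality",
    image=init_image,          # init image that gets noised and denoised (img2img)
    control_image=depth_map,   # ControlNet conditioning image (depth map)
    strength=0.8,              # how far the result may drift from the init image
    num_inference_steps=50,
).images[0]
image.save("img2img_depth.jpg")
```

`strength` plays the usual img2img role: lower values keep more of the init image, higher values follow the prompt and the depth map more.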
@haofanwang (Owner)
It should be easy to implement if you are familiar with diffusers pipelines; I just don't want to make this project too redundant. But I can put together some guidance soon in my free time!

@realimposter (Author)
Thanks so much! I'm new to diffusers and have been struggling a bit to figure it out.

@un1tz3r0
@haofanwang I'd also really like to know how to do this. I don't quite see why I can't prepare my own latents for the pipeline's `latents=` option by using the VAE to encode an init image, the way the img2img and inpaint pipelines do. I keep getting mismatched tensor sizes when the pipeline tries to add the ControlNet output to the sample in the timestep loop.
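A common cause of those size mismatches is the control image rather than the latents: the ControlNet residuals are added directly to the UNet's down- and mid-block feature maps, so the control image has to be preprocessed to exactly the same height and width as the init image you encode (and both sides should be multiples of 8). As for the init latents themselves, the img2img pipelines roughly do the following. This is only a sketch under those assumptions; `encode_init_latents` is an illustrative helper name, and `pipe` is assumed to be a Stable Diffusion pipeline exposing `vae`, `scheduler` and `unet`.

```python
import torch
import numpy as np

def encode_init_latents(pipe, init_image, strength=0.8, num_inference_steps=50, generator=None):
    # hypothetical helper, not part of diffusers; mirrors what img2img pipelines do internally
    device, dtype = pipe.device, pipe.unet.dtype

    # PIL image -> normalized tensor in [-1, 1], shape (1, 3, H, W); H and W must be multiples of 8
    image = torch.from_numpy(np.array(init_image.convert("RGB"))).float() / 127.5 - 1.0
    image = image.permute(2, 0, 1).unsqueeze(0).to(device=device, dtype=dtype)

    # encode with the VAE and apply the latent scaling factor (0.18215 for SD 1.x VAEs)
    latents = pipe.vae.encode(image).latent_dist.sample(generator)
    latents = latents * pipe.vae.config.scaling_factor

    # pick the timestep that corresponds to `strength` and add matching noise,
    # the same way the img2img pipelines do before their denoising loop
    pipe.scheduler.set_timesteps(num_inference_steps, device=device)
    t_start = max(num_inference_steps - int(num_inference_steps * strength), 0)
    timesteps = pipe.scheduler.timesteps[t_start:]
    noise = torch.randn(latents.shape, generator=generator, device=device, dtype=dtype)
    latents = pipe.scheduler.add_noise(latents, noise, timesteps[:1])
    return latents, timesteps
```

Note that simply passing these latents to a txt2img-style pipeline's `latents=` argument is usually not enough on its own: that argument is normally expected to be pure noise for the full schedule, so the denoising loop also has to start at the truncated timestep, which is why the img2img pipelines reimplement the loop.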
