
img2img pipeline #22

Open
realimposter opened this issue Mar 3, 2023 · 3 comments

realimposter commented Mar 3, 2023

Any chance you can add an example of using ControlNet with img2img to the Colab doc (without inpainting)?

I followed the instructions and tried adding the StableDiffusionControlNetInpaintImg2ImgPipeline class, without any luck:

```python
import torch

from diffusers.utils import load_image
from diffusers import StableDiffusionInpaintPipeline, StableDiffusionControlNetInpaintImg2ImgPipeline, ControlNetModel

# we have downloaded the models locally; you can also load them from Hugging Face
# control_openjourney-v2_depth was converted from the depth ControlNet weights using the instructions above
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")
pipe_control = StableDiffusionControlNetInpaintImg2ImgPipeline.from_pretrained(
    "C:/Users/User/Desktop/cnet_img2img/diffusers/control_openjourney-v2_depth",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "prompthero/openjourney-v2",
    torch_dtype=torch.float16,
).to("cuda")

# yes, we can directly replace the UNet
pipe_control.unet = pipe_inpaint.unet
pipe_control.unet.in_channels = 4

# we use the same example images as stable-diffusion-inpainting
image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
mask = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png")

# the depth map is generated from https://huggingface.co/spaces/hysts/ControlNet
control_image = load_image("https://raw.githubusercontent.com/haofanwang/ControlNet-for-Diffusers/main/images/desk_depth.png")

image = pipe_control(prompt="Face of a yellow cat, high resolution, sitting on a park bench", 
                     negative_prompt="lowres, bad anatomy, worst quality, low quality",
                     controlnet_hint=control_image, 
                     image=image,
                     mask_image=mask,
                     width=448,
                     height=640,
                     num_inference_steps=100).images[0]

image.save("inpaint_seg.jpg")

```


This gives me the error:

```
Incorrect configuration settings! The config of `pipeline.unet`: FrozenDict([('sample_size', 64), ('in_channels', 4), ('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0), ('down_block_types', ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D']), ('mid_block_type', 'UNetMidBlock2DCrossAttn'), ('up_block_types', ['UpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D']), ('only_cross_attention', False), ('block_out_channels', [320, 640, 1280, 1280]), ('layers_per_block', 2), ('downsample_padding', 1), ('mid_block_scale_factor', 1), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05), ('cross_attention_dim', 768), ('attention_head_dim', 8), ('dual_cross_attention', False), ('use_linear_projection', False), ('class_embed_type', None), ('num_class_embeds', None), ('upcast_attention', False), ('resnet_time_scale_shift', 'default'), ('time_embedding_type', 'positional'), ('timestep_post_act', None), ('time_cond_proj_dim', None), ('conv_in_kernel', 3), ('conv_out_kernel', 3), ('projection_class_embeddings_input_dim', None), ('_class_name', 'UNet2DConditionModel'), ('_diffusers_version', '0.11.0.dev0'), ('_name_or_path', 'C:\\Users\\User\\.cache\\huggingface\\hub\\models--prompthero--openjourney-v2\\snapshots\\32e0aa8629c1d5ed82ff19f1543017bceb5f84d6\\unet')]) expects 4 but received `num_channels_latents`: 4 + `num_channels_mask`: 1 + `num_channels_masked_image`: 4 = 9. Please verify the config of `pipeline.unet` or your `mask_image` or `image` input.
```
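The error message itself points at the mismatch: the inpaint-img2img code path concatenates the noise latents (4 channels), the downscaled mask (1 channel) and the masked-image latents (4 channels) into a 9-channel UNet input, but the openjourney-v2 UNet's `conv_in` only accepts 4 channels, and overwriting `unet.in_channels` does not change the actual layer. That path needs a genuine 9-channel inpainting UNet (such as runwayml/stable-diffusion-inpainting). For ControlNet + img2img without a mask, which is what the question asks for, a regular 4-channel UNet is enough. A minimal sketch, not taken from this repo: recent diffusers releases ship a `StableDiffusionControlNetImg2ImgPipeline`, and the model ids and image URLs below are simply the ones already used above.

```python
import torch
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel
from diffusers.utils import load_image

# plain ControlNet + img2img: no mask, so the regular 4-channel text-to-image UNet works
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "prompthero/openjourney-v2", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png")
depth_map = load_image("https://raw.githubusercontent.com/haofanwang/ControlNet-for-Diffusers/main/images/desk_depth.png")

# both images should share the same (width, height), and each side should be a multiple of 8
init_image = init_image.resize((448, 640))
depth_map = depth_map.resize((448, 640))

image = pipe(
    prompt="Face of a yellow cat, high resolution, sitting on a park bench",
    negative_prompt="lowres, bad anatomy, worst quality, low quality",
    image=init_image,          # init image that gets noised and denoised (img2img)
    control_image=depth_map,   # ControlNet conditioning image (depth map)
    strength=0.8,              # how far the result may drift from the init image
    num_inference_steps=50,
).images[0]
image.save("img2img_depth.jpg")
```

`strength` plays the usual img2img role: lower values keep more of the init image, higher values follow the prompt and the depth map more.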
@haofanwang (Owner)
It should be easy to implement if you are familiar with diffusers pipelines; I just don't want to make this project too redundant. But I can put together some guidance soon in my free time!

@realimposter (Author)
Thanks so much! I'm new to diffusers and have been struggling a bit to figure it out.

@un1tz3r0
@haofanwang I'd also really like to know how to do this. I don't quite see why I can't prepare my own latents for the pipeline's `latents=` option by using the VAE to encode an init image, the way the img2img and inpaint pipelines do. I keep getting mismatched tensor sizes when the pipeline tries to add the ControlNet output to the sample in the timestep loop.
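A common cause of those size mismatches is the control image rather than the latents: the ControlNet residuals are added directly to the UNet's down- and mid-block feature maps, so the control image has to be preprocessed to exactly the same height and width as the init image you encode (and both sides should be multiples of 8). As for the init latents themselves, the img2img pipelines roughly do the following. This is only a sketch under those assumptions; `encode_init_latents` is an illustrative helper name, and `pipe` is assumed to be a Stable Diffusion pipeline exposing `vae`, `scheduler` and `unet`.

```python
import torch
import numpy as np

def encode_init_latents(pipe, init_image, strength=0.8, num_inference_steps=50, generator=None):
    # hypothetical helper, not part of diffusers; mirrors what img2img pipelines do internally
    device, dtype = pipe.device, pipe.unet.dtype

    # PIL image -> normalized tensor in [-1, 1], shape (1, 3, H, W); H and W must be multiples of 8
    image = torch.from_numpy(np.array(init_image.convert("RGB"))).float() / 127.5 - 1.0
    image = image.permute(2, 0, 1).unsqueeze(0).to(device=device, dtype=dtype)

    # encode with the VAE and apply the latent scaling factor (0.18215 for SD 1.x VAEs)
    latents = pipe.vae.encode(image).latent_dist.sample(generator)
    latents = latents * pipe.vae.config.scaling_factor

    # pick the timestep that corresponds to `strength` and add matching noise,
    # the same way the img2img pipelines do before their denoising loop
    pipe.scheduler.set_timesteps(num_inference_steps, device=device)
    t_start = max(num_inference_steps - int(num_inference_steps * strength), 0)
    timesteps = pipe.scheduler.timesteps[t_start:]
    noise = torch.randn(latents.shape, generator=generator, device=device, dtype=dtype)
    latents = pipe.scheduler.add_noise(latents, noise, timesteps[:1])
    return latents, timesteps
```

Note that simply passing these latents to a txt2img-style pipeline's `latents=` argument is usually not enough on its own: that argument is normally expected to be pure noise for the full schedule, so the denoising loop also has to start at the truncated timestep, which is why the img2img pipelines reimplement the loop.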
