cubiq · LichAcademy · Sep 9, 2024 · Sep 9, 2024 · Sep 10, 2024 · Sep 10, 2024
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,7 @@
 /__pycache__/
 /models/*.bin
 /models/*.safetensors
-.directory
+.directory
+.pylintrc
+ComfyUI_IPAdapter_plus.code-workspace
+docs/.markdownlint.json
diff --git a/docs/assets/input_images.zip b/docs/assets/input_images.zip
diff --git a/docs/assets/workflows.zip b/docs/assets/workflows.zip
diff --git a/docs/images/input_images/African_man_jacket.png b/docs/images/input_images/African_man_jacket.png
diff --git a/docs/images/input_images/Venus_botticelli_detail.jpg b/docs/images/input_images/Venus_botticelli_detail.jpg
diff --git a/docs/images/input_images/ai_face2.png b/docs/images/input_images/ai_face2.png
diff --git a/docs/images/input_images/face2.png b/docs/images/input_images/face2.png
diff --git a/docs/images/input_images/girl.png b/docs/images/input_images/girl.png
diff --git a/docs/images/input_images/girl_face.png b/docs/images/input_images/girl_face.png
diff --git a/docs/images/input_images/italian_businesswoman.png b/docs/images/input_images/italian_businesswoman.png
diff --git a/docs/images/input_images/italian_businesswoman_face_center.png b/docs/images/input_images/italian_businesswoman_face_center.png
diff --git a/docs/images/input_images/italian_businesswoman_face_only.png b/docs/images/input_images/italian_businesswoman_face_only.png
diff --git a/docs/images/input_images/river.png b/docs/images/input_images/river.png
diff --git a/docs/images/input_images/statue.png b/docs/images/input_images/statue.png
diff --git a/docs/images/input_images/stone.png b/docs/images/input_images/stone.png
diff --git a/docs/images/input_images/vermeer.jpg b/docs/images/input_images/vermeer.jpg
diff --git a/docs/images/input_images/warrior_woman.png b/docs/images/input_images/warrior_woman.png
diff --git a/docs/images/input_images/warrior_woman_512.png b/docs/images/input_images/warrior_woman_512.png
diff --git a/docs/images/input_images/warrior_woman_img.png b/docs/images/input_images/warrior_woman_img.png
diff --git a/docs/images/input_images/woman_on_bench_img.png b/docs/images/input_images/woman_on_bench_img.png
diff --git a/docs/images/input_images/woman_on_bench_square.png b/docs/images/input_images/woman_on_bench_square.png
diff --git a/docs/images/input_images/woman_on_bench_square_face.png b/docs/images/input_images/woman_on_bench_square_face.png
diff --git a/docs/images/input_images/woman_on_bench_square_outpaint.png b/docs/images/input_images/woman_on_bench_square_outpaint.png
diff --git a/docs/images/other/Apply_IP_Adapter.png b/docs/images/other/Apply_IP_Adapter.png
diff --git a/docs/images/other/Apply_IP_Adapter_thumb.png b/docs/images/other/Apply_IP_Adapter_thumb.png
diff --git a/docs/images/other/eyes_up_here.png b/docs/images/other/eyes_up_here.png
diff --git a/docs/images/other/eyes_up_here_thumb.png b/docs/images/other/eyes_up_here_thumb.png
diff --git a/docs/images/other/using_nonsquare_images.png b/docs/images/other/using_nonsquare_images.png
diff --git a/docs/images/other/using_nonsquare_images_thumb.png b/docs/images/other/using_nonsquare_images_thumb.png
diff --git a/docs/images/workflows/01_ipadapter_sd15.png b/docs/images/workflows/01_ipadapter_sd15.png
diff --git a/docs/images/workflows/01_ipadapter_sd15_thumb.png b/docs/images/workflows/01_ipadapter_sd15_thumb.png
diff --git a/docs/images/workflows/02_ipadapter_sd15.png b/docs/images/workflows/02_ipadapter_sd15.png
diff --git a/docs/images/workflows/02_ipadapter_sd15_thumb.png b/docs/images/workflows/02_ipadapter_sd15_thumb.png
diff --git a/docs/images/workflows/03_ipadapter_sd15.png b/docs/images/workflows/03_ipadapter_sd15.png
diff --git a/docs/images/workflows/03_ipadapter_sd15_thumb.png b/docs/images/workflows/03_ipadapter_sd15_thumb.png
diff --git a/docs/images/workflows/04_ipadapter_sd15_plus.png b/docs/images/workflows/04_ipadapter_sd15_plus.png
diff --git a/docs/images/workflows/04_ipadapter_sd15_plus_thumb.png b/docs/images/workflows/04_ipadapter_sd15_plus_thumb.png
diff --git a/docs/images/workflows/05_ipadapter_sd15_plus.png b/docs/images/workflows/05_ipadapter_sd15_plus.png
diff --git a/docs/images/workflows/05_ipadapter_sd15_plus_thumb.png b/docs/images/workflows/05_ipadapter_sd15_plus_thumb.png
diff --git a/docs/images/workflows/06_ipadapter_sd15_plus.png b/docs/images/workflows/06_ipadapter_sd15_plus.png
diff --git a/docs/images/workflows/06_ipadapter_sd15_plus_thumb.png b/docs/images/workflows/06_ipadapter_sd15_plus_thumb.png
diff --git a/docs/images/workflows/07_batch_input.png b/docs/images/workflows/07_batch_input.png
diff --git a/docs/images/workflows/07_batch_input_thumb.png b/docs/images/workflows/07_batch_input_thumb.png
diff --git a/docs/images/workflows/08_ipadapter_sd15_plus_face.png b/docs/images/workflows/08_ipadapter_sd15_plus_face.png
diff --git a/docs/images/workflows/08_ipadapter_sd15_plus_face_thumb.png b/docs/images/workflows/08_ipadapter_sd15_plus_face_thumb.png
diff --git a/docs/images/workflows/09_ipadapter_sdxl.png b/docs/images/workflows/09_ipadapter_sdxl.png
diff --git a/docs/images/workflows/09_ipadapter_sdxl_thumb.png b/docs/images/workflows/09_ipadapter_sdxl_thumb.png
diff --git a/docs/images/workflows/10_ipadapter_sdxl_noise.png b/docs/images/workflows/10_ipadapter_sdxl_noise.png
diff --git a/docs/images/workflows/10_ipadapter_sdxl_noise_thumb.png b/docs/images/workflows/10_ipadapter_sdxl_noise_thumb.png
diff --git a/docs/images/workflows/11_ipadapter_img2img.png b/docs/images/workflows/11_ipadapter_img2img.png
diff --git a/docs/images/workflows/11_ipadapter_img2img_thumb.png b/docs/images/workflows/11_ipadapter_img2img_thumb.png
diff --git a/docs/images/workflows/12_ipadapter_inpaint.png b/docs/images/workflows/12_ipadapter_inpaint.png
diff --git a/docs/images/workflows/12_ipadapter_inpaint_thumb.png b/docs/images/workflows/12_ipadapter_inpaint_thumb.png
diff --git a/docs/installation.md b/docs/installation.md
@@ -0,0 +1,49 @@
+# Installation Guide
+
+## Models Checklist
+
+```tree
+📦models
+ ┣ 📂clip_vision
+ ┃ ┣ 📜CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors
+ ┃ ┣ 📜CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors
+ ┣ 📂insightface
+ ┃ ┣ 📂models
+ ┃ ┃ ┣ 📂antelopev2
+ ┃ ┃ ┃ ┣ 📜1k3d68.onnx
+ ┃ ┃ ┃ ┣ 📜2d106det.onnx
+ ┃ ┃ ┃ ┣ 📜genderage.onnx
+ ┃ ┃ ┃ ┣ 📜glintr100.onnx
+ ┃ ┃ ┃ ┗ 📜scrfd_10g_bnkps.onnx
+ ┃ ┃ ┣ 📂buffalo_l
+ ┃ ┃ ┃ ┣ 📜1k3d68.onnx
+ ┃ ┃ ┃ ┣ 📜2d106det.onnx
+ ┃ ┃ ┃ ┣ 📜det_10g.onnx
+ ┃ ┃ ┃ ┣ 📜genderage.onnx
+ ┃ ┃ ┃ ┗ 📜w600k_r50.onnx
+ ┃ ┃ ┗ 📜buffalo_l.zip
+ ┃ ┗ 📜inswapper_128.onnx
+ ┣ 📂instantid
+ ┃ ┗ 📂SDXL
+ ┃ ┃ ┗ 📜ip-adapter.bin
+ ┣ 📂ipadapter
+ ┃ ┣ 📜ip-adapter-faceid-plusv2_sd15.bin
+ ┃ ┣ 📜ip-adapter-faceid-plusv2_sdxl.bin
+ ┃ ┣ 📜ip-adapter-full-face_sd15.safetensors
+ ┃ ┣ 📜ip-adapter-plus-face_sd15.safetensors
+ ┃ ┣ 📜ip-adapter-plus-face_sdxl_vit-h.safetensors
+ ┃ ┣ 📜ip-adapter-plus_sd15.safetensors
+ ┃ ┣ 📜ip-adapter-plus_sdxl_vit-h.safetensors
+ ┃ ┣ 📜ip-adapter_sd15.safetensors
+ ┃ ┣ 📜ip-adapter_sd15_light.safetensors
+ ┃ ┣ 📜ip-adapter_sd15_light_v11.bin
+ ┃ ┣ 📜ip-adapter_sd15_vit-G.safetensors
+ ┃ ┣ 📜ip-adapter_sdxl.safetensors
+ ┃ ┣ 📜ip-adapter_sdxl_vit-h.safetensors
+ ┃ ┣ 📜ip_plus_composition_sd15.safetensors
+ ┃ ┗ 📜ip_plus_composition_sdxl.safetensors
+ ┣ 📂loras
+    ┗ 📂ipadapter
+       ┣ 📜ip-adapter-faceid-plusv2_sd15_lora.safetensors
+       ┗ 📜ip-adapter-faceid-plusv2_sdxl_lora.safetensors
+```
diff --git a/docs/user-guide.md b/docs/user-guide.md
@@ -0,0 +1,188 @@
+# IP Adapter User Guide
+
+Welcome to IP Adapter User Guide. Please note that this guide is very much work in progress at this stage.
+
+- [1. Quick Start](#1-quick-start)
+  - [1.1 Before You Begin](#11-before-you-begin)
+  - [1.2 Stable Diffusion 1.5](#12-stable-diffusion-15)
+    - [1.2.1 Image-Only Prompts](#121-image-only-prompts)
+    - [1.2.2 Noise Injection](#122-noise-injection)
+  - [1.3 Multimodal Input](#13-multimodal-input)
+  - [1.4 Nonsquare Reference](#14-nonsquare-reference)
+  - [1.5 Multi-Image Input](#15-multi-image-input)
+  - [1.6 Plus Face Model](#16-plus-face-model)
+  - [1.7 SDXL Models](#17-sdxl-models)
+  - [1.8 Image-To-Image](#18-image-to-image)
+  - [1.9 Inpainting with IP Adapter](#19-inpainting-with-ip-adapter)
+- [2. IP Adapter: Advanced Guide](#2-ip-adapter-advanced-guide)
+- [3. Models Reference Table](#3-models-reference-table)
+
+# 1. Quick Start
+
+Anyone completely new to this should watch [How to use IPAdapter models in ComfyUI](https://www.youtube.com/watch?v=7m9ZZFU3HWo) on YouTube, by Matteo Spinelli.
+
+> **IMPORTANT** As of this writing, over a year has passed since the video got published, and in that time IP Adapter nodes have undergone several major updates. For a number of reasons, including the changes in how the weights are calculated, reproducing exact outcomes as shown in the video are no longer possible.
+
+Below are the adapted workflows in more-or-less chronological order, which you can load into ComfyUI via drag-and-drop.
+
+## 1.1 Before You Begin
+
+To begin, you will need to:
+
+- Use Node Manager to download IP Adapter models (avoid depricated ones)
+- Download two ClipVision models using the same
+
+Optionally, if you wish to reproduce exact results as shown in workflows, you may wish to:
+
+- Download Lycon's Dreamshaper 8 checkpoint from Civitai (download [link](https://civitai.com/api/download/models/128713?type=Model&format=SafeTensor&size=pruned&fp=fp16)) and place the safetensor file in `~/ComfyUI/models/checkpoints`.
+- Download input images (download [link](assets/input_images.zip)) and place them in `~/ComfyUI/input`.
+
+Alternatively, you are welcome to use checkpoint and inputs of your choosing.
+
+You can drag-and-drop each workflow <u>directly from this page</u>, and ComfyUI will load it. However, should you wish to save them on your computer, you can download [workflows.zip](assets/workflows.zip).
+
+## 1.2 Stable Diffusion 1.5
+
+### 1.2.1 Image-Only Prompts
+
+Given that 'IP' in 'IP Adapter' stands for **I**mage **P**rompt, prompting using images is a reasonable place to start.
+
+Each adapter model must be matched with the corresponding ClipVision model (ViT-H/BigG), and and used with the correct checkpoint (SD1.5/SDXL). Detailed matchup is shown at the bottom of this document. However, IP Adapter now has **Unified Loader**, which automatically loads the correct ClipVision model.
+
+Below workflows show the use of reference image as a replacement for text prompt using 'standard' and 'plus' models. Note that 'Plus' model is stronger and will have greated impact on the output compared to regular one; whereas standard model describes reference image using only 4 tokens, Plus model uses 16 tokens.
+
+**Workflow with 'Standard' model:**
+
+![IP Adapter: Simple](images/workflows/01_ipadapter_sd15_thumb.png)
+
+**Workflow with 'Plus' model:**
+
+![IP Adapter: Simple](images/workflows/04_ipadapter_sd15_plus_thumb.png)
+
+### 1.2.2 Noise Injection
+
+**Workflow with 'Standard' model:**
+
+If **Apply IPAdapter** is no longer available, how are we to inject noise?
+
+![Apply IPAdapter](images/other/Apply_IP_Adapter_thumb.png)
+
+**Answer**: we use a node called **IPAdapter Noise**.
+
+IP Adapter now has a dedicated node for noise injection. However, if you wish to replicate noise injection that was previously available via **Apply IPAdapter** node, load IPAdapter Noise, connect it to the reference image and choose 'shuffle'. See below workflows.
+
+![Injecting Noise with Standard model](images/workflows/02_ipadapter_sd15_thumb.png)
+
+**Workflow with 'Plus' model:**
+
+![Injecting Noise with Plus model](images/workflows/05_ipadapter_sd15_plus_thumb.png)
+
+## 1.3 Multimodal Input
+
+If prompting with both image and text, weights on IP Adapter need to be reduced, as below workflows show.
+
+**Workflow with 'Standard' model:**
+
+![Workflow showing image prompt and text prompt](images/workflows/03_ipadapter_sd15_thumb.png)
+
+**Workflow with 'Plus' model:**
+
+![Image + Text + Noise Injection](images/workflows/06_ipadapter_sd15_plus_thumb.png)
+
+## 1.4 Nonsquare Reference
+
+If you are using nonsquare image as your input, CLIP image processor will make it a square by cropping it at the center.
+
+This is fine, if the centre of the image is the focus. For portraits of people, this will likely give unexpected results:
+
+![Nonsquare image as input](images/other/using_nonsquare_images_thumb.png)
+
+Unless, of course this is what you want, we need a way to tell ClipVision 'Hey! Eyes up here!'
+
+You can accomplish that using **Prep Image for ClipVision** node with `crop_position` set to `Top`:
+
+![prep image for ClipVision](images/other/eyes_up_here_thumb.png)
+
+You can, of course, crop the image manually into a square. The only alternative is to outpaint image sides until it is a square image. Either way, ClipVision will receive a square image in the end.
+
+## 1.5 Multi-Image Input
+
+You can use IP Adapter to make multiple images serve as input
+
+![IP Adapter: Multi-Image Input](images/workflows/07_batch_input_thumb.png)
+
+## 1.6 Plus Face Model
+
+![IP Adapter: Plus Face](images/workflows/08_ipadapter_sd15_plus_face_thumb.png)
+
+## 1.7 SDXL Models
+
+SDXL 'Standard':
+
+![SDXL workflow](images/workflows/09_ipadapter_sdxl_thumb.png)
+
+SDXL 'Standard' with noise injection:
+
+![SDXL with noise injection](images/workflows/10_ipadapter_sdxl_noise_thumb.png)
+
+## 1.8 Image-To-Image
+
+![IP Adapter: Image-to-Image](images/workflows/11_ipadapter_img2img_thumb.png)
+
+## 1.9 Inpainting with IP Adapter
+
+![Inpainting with IP Adapter](images/workflows/12_ipadapter_inpaint_thumb.png)
+
+# 2. IP Adapter: Advanced Guide
+
+(To be continued...)
+
+# 3. Models Reference Table
+
+Below tables show the matching of models with visual transformers, checkpoints and/or LoRAs.
+
+**IP Adapter with SD1.5**
+
+| IP Adapter                            | Matching Visual Transformer (ViT)              | Alias[^1]         |
+| ------------------------------------- | ---------------------------------------------- | ----------------- |
+| ip-adapter_sd15.safetensors           | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | STANDARD          |
+| ip-adapter_sd15_light_v11.bin[^2]     | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | LIGHT - SD15 Only |
+| ip-adapter_sd15_vit-G.safetensors     | CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors | VIT-G             |
+| ip-adapter-plus_sd15.safetensors      | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | PLUS              |
+| ip-adapter-plus-face_sd15.safetensors | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | PLUS FACE         |
+
+**IP Adapter with SDXL**
+
+| IP Adapter                                  | Matching Visual Transformer (ViT)              | Alias[^1] |
+| ------------------------------------------- | ---------------------------------------------- | --------- |
+| ip-adapter_sdxl.safetensors                 | CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors | STANDARD  |
+| ip-adapter_sdxl_vit-h.safetensors           | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | VIT-G     |
+| ip-adapter-plus_sdxl_vit-h.safetensors      | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | PLUS      |
+| ip-adapter-plus-face_sdxl_vit-h.safetensors | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors    | PLUS FACE |
+
+**FaceID with SD1.5**
+
+| IP Adapter                              | Matching Visual Transformer (ViT)           | LoRA                                           | Alias[^1]       |
+| --------------------------------------- | ------------------------------------------- | ---------------------------------------------- | --------------- |
+| ip-adapter-faceid_sd15.bin              | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | ip-adapter-faceid_sd15_lora.safetensors        | FACEID          |
+| ip-adapter-faceid-plusv2_sd15.bin       | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | ip-adapter-faceid-plusv2_sd15_lora.safetensors | FACEID PLUS V2  |
+| ip-adapter-faceid-portrait-v11_sd15.bin | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | None                                           | FACEID PORTRAIT |
+
+**FaceID with SDXL**
+
+| IP Adapter                          | Matching Visual Transformer (ViT)           | LoRA                                           | Alias[^1]       |
+| ----------------------------------- | ------------------------------------------- | ---------------------------------------------- | --------------- |
+| ip-adapter-faceid_sdxl.bin          | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | ip-adapter-faceid_sdxl_lora.safetensors        | FACEID          |
+| ip-adapter-faceid-plusv2_sdxl.bin   | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | ip-adapter-faceid-plusv2_sdxl_lora.safetensors | FACEID PLUS V2  |
+| ip-adapter-faceid-portrait_sdxl.bin | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | None                                           | FACEID PORTRAIT |
+
+[^1]: When using Unified Loader
+[^2]: Version 1 (`ip-adapter_sd15_light.safetensors`) is depricated. You need version v1.1.
+
+**Community Models**
+
+| IP Adapter                                 | Matching Visual Transformer (ViT)           | LoRA | Alias[^1]                          |
+| ------------------------------------------ | ------------------------------------------- | ---- | ---------------------------------- |
+| ip-adapter-faceid-portrait_sdxl_unnorm.bin | CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors | None | FACEID PORTRAIT UNNORM - SDXL Only |
+
+(to be continued)