Text-driven image synthesis has emerged as a popular research area. Existing approaches that invert an image into text embeddings face challenges such as requiring multiple images, converging slowly, or overfitting, which limits their editing capability. In this paper, we propose a novel initialization method for textual inversion that uses off-the-shelf classification or captioning models. This approach enables multi-token embedding learning from a single input image while eliminating the need for fine-tuning and ensuring faster convergence. We demonstrate a significant improvement in convergence speed compared to vanilla Textual Inversion (TI).
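As a rough illustration of the initialization idea, the sketch below seeds several learnable placeholder embeddings from the words of a caption. It is an assumption about how such an initialization could look, not the project's actual code: the function name `init_placeholder_embeddings`, the toy vocabulary, the 3-dimensional embedding table, and the hard-coded caption (standing in for the output of a captioning or classification model) are all hypothetical.

```python
import random


def init_placeholder_embeddings(caption, vocab, embedding_table, num_tokens, seed=0):
    """Return `num_tokens` initial vectors for learnable placeholder tokens.

    Hypothetical helper: each placeholder starts as a copy of the embedding of
    one caption word; leftover placeholders get small random vectors.
    """
    rng = random.Random(seed)
    dim = len(embedding_table[0])

    # Keep caption words that exist in the vocabulary, in order, without repeats.
    words = []
    for word in caption.lower().split():
        if word in vocab and word not in words:
            words.append(word)

    init = []
    for i in range(num_tokens):
        if i < len(words):
            # Copy the row so later optimization does not modify the table.
            init.append(list(embedding_table[vocab[words[i]]]))
        else:
            # Assumed fallback when the caption has fewer usable words.
            init.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return init


# Toy vocabulary and embedding table (assumptions for illustration only).
vocab = {"a": 0, "red": 1, "ceramic": 2, "teapot": 3}
embedding_table = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Stand-in for the text an off-the-shelf captioning model might return.
caption = "red ceramic teapot"
init = init_placeholder_embeddings(caption, vocab, embedding_table, num_tokens=4)
```

In a real setup the table would presumably be the text encoder's token embedding matrix, and the returned vectors would be the starting point for the usual embedding optimization on the single input image.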
jenci2114/csc413-project