This repository documents different generative learning approaches using the Keras library, drawing on tutorials for synthetic data, the book Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play, and Hugging Face. The repo implements the following models:
- Auto-Encoders
- Autoregressive Models
- Diffusion Models
- MultiModal Models
Gif created using LatentSpaceGifMaker from one text prompt, "dogs drinking coffee in outer space overlooking earth", with random walk and circular walk enabled, using 12 random steps, a step size of 0.005, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps.
Gif created using LatentSpaceGifMaker from one text prompt, "dogs drinking coffee in outer space overlooking earth", with circular walk enabled, using 12 random steps, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps.
Gif created using LatentSpaceGifMaker from one text prompt, "dogs drinking coffee in outer space overlooking earth", with random walk enabled, using 12 random steps, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps.
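The two walk types named in these captions can be illustrated with a small NumPy sketch. This is not the LatentSpaceGifMaker code: the function names, the latent shape, the stand-in text encoding, and the use of Gaussian steps for the random walk are all assumptions made for illustration. The circular walk blends two noise tensors with cos/sin weights so the last frame leads back to the first, and the random walk accumulates small steps away from a starting encoding.

```python
import numpy as np


def circular_walk(noise_x, noise_y, steps):
    """Trace one full circle through noise space, one noise tensor per frame."""
    angles = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    # An outer product (axes=0) gives an array of shape (steps, *noise_x.shape).
    return (np.tensordot(np.cos(angles), noise_x, axes=0)
            + np.tensordot(np.sin(angles), noise_y, axes=0))


def random_walk(encoding, steps, step_size, rng):
    """Accumulate small Gaussian steps away from a starting encoding."""
    deltas = rng.normal(scale=step_size, size=(steps, *encoding.shape))
    deltas[0] = 0.0  # the first frame is the unmodified encoding
    return encoding + np.cumsum(deltas, axis=0)


rng = np.random.default_rng(0)

# Assumed latent shape for illustration only.
latent_shape = (64, 64, 4)
noise_x = rng.normal(size=latent_shape)
noise_y = rng.normal(size=latent_shape)
noise_frames = circular_walk(noise_x, noise_y, steps=12)

# Stand-in for a text encoding; a real one would come from the text encoder.
encoding = rng.normal(size=(77, 768))
encoding_frames = random_walk(encoding, steps=12, step_size=0.005, rng=rng)
```

Each entry of `noise_frames` or `encoding_frames` would then be passed to the image generator to render one frame of the gif.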
Prompt: an oil painting of {placeholder_token}
Generated Images created using StabeDiffusion-TextualInversion
Prompt(s): "man in fancy suit with {placeholder_token} walking in New York", "high quality, highly detailed, elegant, sharp focus", "character concepts, mystery, adventure"
Generated Images created using MyPersonalizedWeights
Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts
My pre-trained weights can be found here and must be loaded into layer two of Stable Diffusion/CLIP's text encoder before generating images or gifs.
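Why the weights have to be loaded before generation can be seen in a toy NumPy sketch: textual inversion adds one new row to the token-embedding table for the placeholder token, and the saved weights hold the learned values for that row. The sizes, the function name, and the array contents below are made up for illustration; this is not the Keras loading call itself.

```python
import numpy as np

# Toy sizes for illustration; the real table is far larger.
VOCAB_SIZE, EMBED_DIM = 8, 4


def add_placeholder_row(token_embeddings, learned_vector):
    """Return a copy of the embedding table with one extra row appended."""
    learned_vector = np.asarray(learned_vector, dtype=token_embeddings.dtype)
    if learned_vector.shape != (token_embeddings.shape[1],):
        raise ValueError("learned vector must match the embedding width")
    return np.concatenate([token_embeddings, learned_vector[None, :]], axis=0)


base_table = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype=np.float32)
learned = np.arange(EMBED_DIM, dtype=np.float32)  # stand-in for trained values

extended_table = add_placeholder_row(base_table, learned)
placeholder_id = extended_table.shape[0] - 1  # the new token takes the last id
```

Without the extended table in place, the tokenizer's id for `{placeholder_token}` would point at a row that does not exist in the text encoder.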
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.5; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 25; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.9; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 25; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.9; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = unscaled; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = (technically) None; diffusion_noise = (technically) None; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = (technically) None; diffusion_noise = (technically) None; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus
Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled by min_freq 1 max freq 1000; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
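The settings in these captions imply some simple arithmetic about each run. The sketch below collects them in a hypothetical `WalkConfig` dataclass (the class and its property names are not from the repo) and assumes one gif frame per walk step, which the captions do not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WalkConfig:
    """Hypothetical container for the hyperparameters listed in the captions."""
    cfg_scale: float = 7.5
    walk_steps: int = 60
    batch_size: int = 3
    num_diffusion_steps: int = 25
    negative_prompt: Optional[str] = None
    frames_per_second: int = 10

    @property
    def num_batches(self) -> int:
        # Ceiling division: a partial final batch still needs a model call.
        return -(-self.walk_steps // self.batch_size)

    @property
    def gif_seconds(self) -> float:
        # Assumes one frame per walk step.
        return self.walk_steps / self.frames_per_second

    @property
    def denoising_iterations(self) -> int:
        # Every frame is denoised for num_diffusion_steps iterations.
        return self.walk_steps * self.num_diffusion_steps


first_run = WalkConfig()
last_run = WalkConfig(cfg_scale=8.0, num_diffusion_steps=50)
```

Under that assumption, doubling the diffusion steps from 25 to 50 doubles the denoising work for a gif of the same length.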
The left image is the input image; the right image is the newly generated image based on the prompt, negative prompt, strength, and guidance.
Prompt: wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k
Prompt: my face with afro hairstyle
Prompt: black king with crown sitting on throne holding sword, detailed, fantasy, dark, Pixar, Disney, 8k
Images created using StableDiffusion-Image2Image
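The strength setting controls how far the input image is pushed toward noise before denoising starts. The sketch below mirrors how the Hugging Face diffusers image-to-image pipeline maps strength to the number of denoising steps it runs; whether the StableDiffusion-Image2Image notebook uses that pipeline is an assumption, and the function name is illustrative.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (first_step_index, steps_actually_run) for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Higher strength adds more noise, so more of the schedule has to be run.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step = max(num_inference_steps - steps_to_run, 0)
    return first_step, steps_to_run


half = img2img_schedule(50, 0.5)   # keeps much of the input image
full = img2img_schedule(50, 1.0)   # effectively ignores the input image
```

A low strength keeps the output close to the left-hand input image, while a strength near 1 lets the prompt dominate.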
Original Image
Gif created using StableVideoDiffusion-Image2Video using the text prompt: "suba diver swimming in ocean next to sharks, detailed, photo-realistic, 8k"
Generated Gifs
Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=100, noise_aug_strength=0.02, latents=None
Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=127, noise_aug_strength=0.1, latents=None
Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=200, noise_aug_strength=0.02, latents=None
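A minimal sketch of how runs like these could be reproduced with Hugging Face's `StableVideoDiffusionPipeline` follows. The checkpoint name, the seed, the resize to 1024x576, `decode_chunk_size=8`, and the `render_gif` helper are assumptions, not settings taken from the notebook. The heavy imports sit inside the function so the settings list can be inspected without a GPU.

```python
# The three hyperparameter combinations from the captions above.
SVD_RUNS = [
    {"motion_bucket_id": 100, "noise_aug_strength": 0.02},
    {"motion_bucket_id": 127, "noise_aug_strength": 0.1},
    {"motion_bucket_id": 200, "noise_aug_strength": 0.02},
]


def render_gif(image_path, out_path, motion_bucket_id, noise_aug_strength,
               seed=42,
               model_id="stabilityai/stable-video-diffusion-img2vid-xt"):
    """Animate one still image and save the frames as a gif (needs a GPU)."""
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_gif, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")

    image = load_image(image_path).resize((1024, 576))
    generator = torch.manual_seed(seed)
    frames = pipe(
        image,
        decode_chunk_size=8,  # smaller values lower peak memory use
        generator=generator,
        motion_bucket_id=motion_bucket_id,      # higher means more motion
        noise_aug_strength=noise_aug_strength,  # noise added to the input image
        latents=None,                           # let the pipeline sample noise
    ).frames[0]
    export_to_gif(frames, out_path)
    return out_path
```

For example, `render_gif("diver.png", "diver_0.gif", **SVD_RUNS[0])` would render the first configuration, where both file names are placeholders.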
Gif created with textual inversion of my face using Hugging Face's textual inversion tutorial, as found in the notebook HuggingFace_textualInversion_Myweights
Gif created with textual inversion of my face using Hugging Face's textual inversion tutorial, as found in the notebook HuggingFace_textualInversion_Myweights
Combining Keras Model Weights with Pytorch Model
The previous gifs were created using Hugging Face's textual inversion tutorial with the default parameters of the script and the model ID runwayml/stable-diffusion-v1-5. After training for 1 hour with my placeholder token, I was able to generate very basic images with prompts such as "man with {placeholder_token}" or "man with {placeholder_token} swimming." I later fed the high-quality images into Hugging Face's implementation of Stable Video Diffusion to create the gifs seen above. However, even with prompt weighting and different values of guidance_scale, I was not able to generate an accurate image from long prompts such as: "man with {placeholder_token} in fancy suit driving ferrari on highway in Berlin, side view." There could be many reasons for this (probably something I am missing in the training script regarding embedding longer prompts).
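For context, a learned embedding from that training script is typically used along the following lines. This is a sketch only: the token string `<my-face>`, the helper names, and the embedding directory argument are hypothetical, while `load_textual_inversion` and the `guidance_scale` argument are part of the diffusers `StableDiffusionPipeline`.

```python
def fill_prompt(template, placeholder_token):
    """Substitute the learned token into a prompt template."""
    return template.format(placeholder_token=placeholder_token)


def generate(prompt, embedding_dir, placeholder_token, guidance_scale=7.5,
             model_id="runwayml/stable-diffusion-v1-5"):
    """Generate one image with a learned embedding loaded (needs a GPU)."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    # Registers the new token and its learned embedding with the text encoder.
    pipe.load_textual_inversion(embedding_dir, token=placeholder_token)
    return pipe(prompt, guidance_scale=guidance_scale).images[0]


TOKEN = "<my-face>"  # hypothetical; the real token is not given in the text
short_prompt = fill_prompt("man with {placeholder_token}", TOKEN)
long_prompt = fill_prompt(
    "man with {placeholder_token} in fancy suit driving ferrari "
    "on highway in Berlin, side view",
    TOKEN,
)
```

Comparing outputs for `short_prompt` and `long_prompt` at several `guidance_scale` values is one way to reproduce the short-versus-long prompt behaviour described above.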
With that being said, I wanted to generate images of me driving a nice car in a fancy (bougie) suit, so I used my pretrained weights from the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts to generate a decent image and fed that image into Hugging Face's implementation of Stable Video Diffusion. The end result was acceptable (except it was an old Ferrari, but at least it gave me some cool glasses lol).
Image created with pretrained weights from the Keras tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="man with {placeholder_token} in fancy suit driving ferrari on highway in Berlin, side view", unconditional_guidance_scale=12, num_steps=100
Gif created with SVD with the following parameters: motion = 100, augmentation = 0.02 and latent/pre-generated=None
Gif created with SVD with the following parameters: motion = 50, augmentation = 0.02 and latent/pre-generated=None
Gif created with SVD with the following parameters: motion = 50, augmentation = 0.02 and latent/pre-generated=torch.normal(0, 1, size=(1, 25, 4, 72, 128), generator=generator, dtype=torch.float16)
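The size passed to `torch.normal` in the last caption is consistent with a 25-frame video at 576x1024 pixels, assuming 4 latent channels and a VAE that downsamples by a factor of 8. A small sketch of that arithmetic, with a hypothetical helper name and defaults chosen as assumptions to match the caption:

```python
def svd_latent_shape(height=576, width=1024, num_frames=25, batch_size=1,
                     latent_channels=4, vae_scale_factor=8):
    """Return (batch, frames, channels, latent_height, latent_width)."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be divisible by the scale factor")
    return (
        batch_size,
        num_frames,
        latent_channels,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


shape = svd_latent_shape()
```

Pre-generated latents of any other shape would not match the frame count or resolution the pipeline is asked to produce.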
Image created with Keras-SVD from the Keras tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="man with {placeholder_token} in fancy suit dancing", unconditional_guidance_scale=12, num_steps=100
Gif created with Keras-SVD with the following parameters: motion = 50, augmentation = 0.02 and latent/pre-generated=none
Gif created with Keras-SVD with the following parameters: motion = 60, augmentation = 0.02 and latent/pre-generated=none
Image created with Keras-SVD from the Keras tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="my {placeholder_token} on face of group of dancing monkeys", unconditional_guidance_scale=10, num_steps=100
Gif created with Keras-SVD with the following parameters: motion = 50, augmentation = 0.02 and latent/pre-generated=none