AdamClarkStandke / GenerativeDeepLearning Public


Repository for documenting different generative learning approaches for creating and implementing synthetic data for machine learning



Generative Deep Learning Repo

This repository documents different generative learning approaches using the Keras library, drawing on tutorials for synthetic data, the book Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play, and Hugging Face. This repo implements the following models:

Auto-Encoders
- Generic Auto-Encoder
- Variational Auto-Encoder
Autoregressive Models
Diffusion Models
- Denoising Diffusion Probabilistic Models
- Denoising Diffusion Implicit Models
MultiModal Models

Traversing Along Stable Diffusion's Latent Space

dogs drinking coffee in outer space overlooking earth

Gif created using LatentSpaceGifMaker from a single text prompt, dogs drinking coffee in outer space overlooking earth, with random walk and circular walk enabled, using 12 random steps, a step size of 0.005, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps

dogs drinking coffee in outer space overlooking earth

Gif created using LatentSpaceGifMaker from a single text prompt, dogs drinking coffee in outer space overlooking earth, with circular walk enabled, using 12 random steps, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps

dogs drinking coffee in outer space overlooking earth

Gif created using LatentSpaceGifMaker from a single text prompt, dogs drinking coffee in outer space overlooking earth, with random walk enabled, using 12 random steps, a cfg_scale of 7.5, a batch size of 3, and 25 diffusion steps
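
The two walk types named in these captions can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code in LatentSpaceGifMaker: the function names `random_walk` and `circular_walk`, the Gaussian step direction, and the toy array shapes are all hypothetical. A random walk nudges the prompt encoding by a small step per frame, while a circular walk mixes two noise tensors with cosine and sine weights so that the sequence loops back toward its start.

```python
import numpy as np


def random_walk(encoding, walk_steps=12, step_size=0.005, seed=0):
    """Return walk_steps encodings, each a small random step from the previous one."""
    rng = np.random.default_rng(seed)
    frames = [encoding]
    for _ in range(walk_steps - 1):
        # Assumption: each step is Gaussian noise scaled by step_size.
        frames.append(frames[-1] + step_size * rng.standard_normal(encoding.shape))
    return np.stack(frames)


def circular_walk(noise_x, noise_y, walk_steps=12):
    """Return walk_steps noise tensors on the circle spanned by noise_x and noise_y."""
    # endpoint=False stops one step short of a full turn, so a looping gif
    # does not show the first frame twice.
    theta = np.linspace(0.0, 2.0 * np.pi, walk_steps, endpoint=False)
    # Reshape the per-frame weights so they broadcast over the noise dimensions.
    weights_shape = (walk_steps,) + (1,) * noise_x.ndim
    cos_w = np.cos(theta).reshape(weights_shape)
    sin_w = np.sin(theta).reshape(weights_shape)
    return cos_w * noise_x + sin_w * noise_y


rng = np.random.default_rng(42)
encoding = rng.standard_normal((77, 768))   # toy stand-in for a text encoding
noise_x = rng.standard_normal((64, 64, 4))  # toy stand-in for a latent noise tensor
noise_y = rng.standard_normal((64, 64, 4))

prompt_frames = random_walk(encoding)
noise_frames = circular_walk(noise_x, noise_y)
print(prompt_frames.shape, noise_frames.shape)
```

Each row of `prompt_frames` or `noise_frames` would then be fed to the diffusion model to render one gif frame.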

Textual Inversion of Stable Diffusion's Embedding Space using Non-Style prompts

Input Images:

Generated Images and Prompts Used:

Prompt: an oil painting of {placeholder_token}

Generated Images created using StableDiffusion-TextualInversion

Prompt(s): "man in fancy suit with {placeholder_token} walking in New York" "high quality, highly detailed, elegant, sharp focus" "character concepts, mystery, adventure"

Generated Images created using MyPersonalizedWeights

Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts

My pre-trained weights can be found here and must be loaded into layer two of Stable Diffusion's CLIP text encoder before generating images/gifs.
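
Conceptually, textual inversion learns one new embedding vector for the placeholder token, so loading the weights means the text encoder's token-embedding table gains one extra row. The sketch below shows only that idea with NumPy. The table sizes, the function name `add_placeholder_embedding`, and the random stand-in vectors are hypothetical, and the code does not reproduce the actual Keras loading step or the weight file linked above.

```python
import numpy as np


def add_placeholder_embedding(token_table, learned_vector):
    """Append a learned vector as a new last row of the embedding table."""
    if learned_vector.shape != (token_table.shape[1],):
        raise ValueError("learned vector must match the embedding width")
    # The new token's id is the index of the appended row.
    new_token_id = token_table.shape[0]
    extended = np.vstack([token_table, learned_vector[np.newaxis, :]])
    return extended, new_token_id


rng = np.random.default_rng(0)
token_table = rng.standard_normal((1000, 16))  # toy vocabulary: 1000 tokens, width 16
learned_vector = rng.standard_normal(16)       # stand-in for the trained placeholder embedding

extended_table, placeholder_id = add_placeholder_embedding(token_table, learned_vector)
print(extended_table.shape, placeholder_id)
```

All original rows stay untouched, which is why the rest of the vocabulary behaves as before once the placeholder token is added.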

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.5; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 25; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.9; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 25; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 7.9; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled cos/sin; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = unscaled; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = (technically) None; diffusion_noise = (technically) None; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = (technically) None; diffusion_noise = (technically) None; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10

Prompt: man with {placeholder_token} in fancy suit in a red ferrari driving in Frankfurt high quality, highly detailed, elegant, sharp focus

Gif created using MyPersonalizedWeights with the following hyperparameters/configurations: cfg_scale = 8; walk_steps = 60; batch_size = 3; noise_start = normal distribution; diffusion_noise = scaled by min_freq 1, max freq 1000; num_of_Diffusion_steps = 50; negative_prompt = None; frame_per_seconds = 10
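
The cfg_scale values in these captions control classifier-free guidance. A minimal sketch of the standard guidance formula follows. The function name `apply_guidance` and the toy arrays are hypothetical; in a real pipeline the two inputs would be the denoiser's noise predictions without and with the text prompt.

```python
import numpy as np


def apply_guidance(uncond_pred, cond_pred, cfg_scale=7.5):
    """Move the prediction away from the unconditional one, toward the prompt."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)


uncond_pred = np.array([0.0, 1.0, 2.0])  # toy prediction with an empty prompt
cond_pred = np.array([1.0, 1.0, 0.0])    # toy prediction with the text prompt
for scale in (1.0, 7.5, 8.0):
    print(scale, apply_guidance(uncond_pred, cond_pred, scale))
```

A scale of 1 returns the conditional prediction unchanged; larger values exaggerate the difference between the two predictions, which is why small changes such as 7.5 versus 8 can visibly alter the result.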

Stable Diffusion Image-to-Image Application

The left image is the input image; the right image is newly generated based on the prompt, negative prompt, strength, and guidance.

Prompt: wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k

Prompt: my face with afro hairstyle

Prompt: black king with crown sitting on throne holding sword, detailed, fantasy, dark, Pixar, Disney, 8k

Images created using StableDiffusion-Image2Image
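
In image-to-image generation, strength usually decides how much noise is added to the input image and therefore how many denoising steps actually run. The sketch below assumes the step arithmetic used by Hugging Face diffusers' image-to-image pipeline; whether StableDiffusion-Image2Image follows the same scheme is an assumption, and the function name `img2img_schedule` is hypothetical.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (first_step_index, steps_run) for a strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Higher strength means more noise on the input and more denoising steps.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    # The remaining early steps of the schedule are skipped.
    first_step_index = max(num_inference_steps - steps_run, 0)
    return first_step_index, steps_run


for strength in (0.25, 0.5, 1.0):
    print(strength, img2img_schedule(50, strength))
```

Under this scheme a strength of 1.0 runs the full schedule and largely ignores the input image, while a low strength keeps the output close to the input.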

Stable Video Diffusion

Original Image

Gif created using StableVideoDiffusion-Image2Video using the text prompt: "suba diver swimming in ocean next to sharks, detailed, photo-realistic, 8k"

Generated Gifs

Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=100, noise_aug_strength=0.02, latents=None

Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=127, noise_aug_strength=0.1, latents=None

Gif created using StableVideoDiffusion-Image2Video using the following hyperparameters: motion_bucket_id=200, noise_aug_strength=0.02, latents=None

Gif created with textual inversion of my face using Hugging Face's textual inversion tutorial, as found in the notebook HuggingFace_textualInversion_Myweights

Gif created with textual inversion of my face using Hugging Face's textual inversion tutorial, as found in the notebook HuggingFace_textualInversion_Myweights

Combining Keras Model Weights with Pytorch Model

The previous gifs were created using Hugging Face's textual inversion tutorial with the script's default parameters and the model ID runwayml/stable-diffusion-v1-5. After training for 1 hour with my placeholder token, I was able to generate very basic images with the prompt(s): "man with {placeholder_token}" or "man with {placeholder_token} swimming." I later used the high-quality images in Hugging Face's implementation of Stable Video Diffusion to create the gifs seen above. However, even with prompt weighting and a range of guidance_scale values, I was not able to generate an accurate image from long prompts such as: "man with {placeholder_token} in fancy suit driving ferrari on highway in Berlin, side view." There could be many reasons for this (probably something I am missing in the training script regarding the embedding of longer prompts).

With that being said, I wanted to generate images of me driving a nice car in a fancy (boogie) suit, so I used my pretrained weights from the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts to generate a decent image, and then fed that image into Hugging Face's implementation of Stable Video Diffusion. The end result was acceptable (except it was an old Ferrari, but at least it gave me some cool glasses lol).

Image created with pretrained weights from Keras's tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="man with {placeholder_token} in fancy suit driving ferrari on highway in Berlin, side view", unconditional_guidance_scale=12, num_steps=100

Gif created with SVD with the following parameters: motion = 100, augmentation = 0.02, and latent/pre-generated = None

Gif created with SVD with the following parameters: motion = 50, augmentation = 0.02, and latent/pre-generated = None

Gif created with SVD with the following parameters: motion = 50, augmentation = 0.02, and latent/pre-generated = torch.normal(0, 1, size=(1, 25, 4, 72, 128), generator=generator, dtype=torch.float16)
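
The latent shape (1, 25, 4, 72, 128) in the last caption is consistent with 25 frames at 576x1024 pixels, 4 latent channels, and a VAE that downsamples each spatial dimension by a factor of 8. The helper below reproduces that arithmetic. The function name `svd_latent_shape` and its default values are assumptions chosen to match the caption, not values read from the notebook.

```python
def svd_latent_shape(batch_size=1, num_frames=25, height=576, width=1024,
                     latent_channels=4, vae_scale_factor=8):
    """Return the (batch, frames, channels, height, width) shape of the video latents."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be divisible by the VAE scale factor")
    return (batch_size, num_frames, latent_channels,
            height // vae_scale_factor, width // vae_scale_factor)


print(svd_latent_shape())  # (1, 25, 4, 72, 128)
```

The returned tuple is what would be passed as `size` when pre-generating latents, so changing the frame count or output resolution changes the required latent shape accordingly.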

Image created with Keras-SVD from Keras's tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="man with {placeholder_token} in fancy suit dancing", unconditional_guidance_scale=12, num_steps=100

Gif created with Keras-SVD with the following parameters: motion = 50, augmentation = 0.02, and latent/pre-generated = None

Gif created with Keras-SVD with the following parameters: motion = 60, augmentation = 0.02, and latent/pre-generated = None

Image created with Keras-SVD from Keras's tutorial on textual inversion, using the images of my face as found in the section Combining Stable Diffusion's Textual Embedding Space with its Image Manifold through Textual Inversion and non-style prompts, with the following parameters/prompts: negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy", prompt="my {placeholder_token} on face of group of dancing monkeys", unconditional_guidance_scale=10, num_steps=100

Gif created with Keras-SVD with the following parameters: motion = 50, augmentation = 0.02, and latent/pre-generated = None
