AttnGAN with noise suppression

This repository is a fork of the implementation of AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks.

Problem:

The original implementation does not include any noise-suppression mechanism, so the sentence and word feature vectors are exposed to noise in the input text.

Proposed solution:

The paper A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts proposes an approach for handling noise in texts. In essence, the input vector is passed through a fully-connected layer that reduces its size (from 11083 to 1000); noise is then added and the result is scaled up to a 4096-dimensional feature vector.

Basically, this works like an autoencoder with one hidden layer that reduces noise in the input vector. The same idea is applied in this repository in order to get better results from the original attentional GAN.
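As a rough illustration of that idea, here is a minimal sketch of such a denoising block, using the dimensions mentioned above; the noise magnitude, activations, and class name are assumptions for illustration, not the paper's exact implementation:

import torch
import torch.nn as nn

class DenoisingTextBlock(nn.Module):
    # Sketch of the idea: compress the noisy text vector, inject noise, expand.
    def __init__(self, in_dim=11083, bottleneck=1000, out_dim=4096, noise_std=0.1):
        super().__init__()
        self.compress = nn.Linear(in_dim, bottleneck)
        self.expand = nn.Linear(bottleneck, out_dim)
        self.noise_std = noise_std

    def forward(self, x):
        h = torch.relu(self.compress(x))              # reduce 11083 -> 1000
        h = h + torch.randn_like(h) * self.noise_std  # add noise
        return torch.relu(self.expand(h))             # scale up to 4096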

The autoencoder is applied to the outputs of the RNN encoder in AttnGAN: the word feature vectors and the sentence feature vector. They represent features of the input text, so we can try to reduce noise in them.

Solution

import torch.nn as nn

# Single-hidden-layer autoencoder applied to the text features.
autoencoder = nn.Sequential(
    nn.Linear(nhidden, nhidden // 20),  # compress to a small bottleneck
    nn.LeakyReLU(),
    nn.Linear(nhidden // 20, nhidden),  # reconstruct the original size
    nn.Tanh())

nhidden is the number of hidden output features of the RNN encoder.
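A hypothetical usage sketch follows; the 256-dimensional embedding size, the tensor shapes, and the variable names are assumptions made for illustration and are not taken from the repository code:

import torch
import torch.nn as nn

nhidden = 256   # assumed text embedding size of the RNN encoder

# Same autoencoder as above, repeated so the snippet runs standalone.
autoencoder = nn.Sequential(
    nn.Linear(nhidden, nhidden // 20),
    nn.LeakyReLU(),
    nn.Linear(nhidden // 20, nhidden),
    nn.Tanh())

# Dummy encoder outputs with AttnGAN-like shapes (batch of 4 captions, 18 words).
sent_emb = torch.randn(4, nhidden)         # sentence feature vector
words_emb = torch.randn(4, nhidden, 18)    # word feature vectors (left untouched)

sent_emb = autoencoder(sent_emb)           # noise-suppressed sentence features
print(sent_emb.shape)                      # torch.Size([4, 256])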

The first attempt applied noise suppression to all outputs (word features and sentence features, using two separate autoencoders), but this degraded the quality of picture details:

[Comparison images: original solution | word and sentence autoencoder | sentence autoencoder only]

So the final solution contains only the sentence autoencoder.

There is also intuition behind this: sentence features are especially important because they are used to generate the first low-resolution image, and the quality and realism of the final result depend on it. Word features, in turn, are used to generate finer details of the image, and we want the image to keep as many visual features as possible.

The suggested approach helped remove noise in some examples:

[Comparison images: original | AE (25 hidden) | AE (17 hidden) | AE (12 hidden) | AE (10 hidden)]
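For reference, if nhidden is 256 (the text embedding size commonly used in AttnGAN configurations; an assumption here), the hidden sizes in the comparison above would correspond to different bottleneck divisors, e.g.:

import torch.nn as nn

def sentence_autoencoder(nhidden, bottleneck):
    # Same structure as the autoencoder above, parameterised by bottleneck width.
    return nn.Sequential(
        nn.Linear(nhidden, bottleneck),
        nn.LeakyReLU(),
        nn.Linear(bottleneck, nhidden),
        nn.Tanh())

# With nhidden = 256, integer division yields the hidden sizes compared above:
# 256 // 10 = 25, 256 // 15 = 17, 256 // 20 = 12, 256 // 25 = 10
variants = {h: sentence_autoencoder(256, h) for h in (25, 17, 12, 10)}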

The autoencoder with 12 hidden features shows a good result here: the bird does not have two beaks as in the original example. This architecture was therefore trained for more epochs:

[Side-by-side comparison images: original vs. AE, three examples]

Conclusion

The sentence autoencoder fixed some cases for which the original GAN generated non-relevant examples, but it may also reduce image details in some cases.

So there is room for further exploration.
