Recently, we wrote about a research paper that tries to understand the internal representations of Generative Adversarial Networks (GANs). Despite efforts like this, many aspects of the image synthesis process in GANs remain elusive.
GANs have achieved great success, especially recently, in terms of the resolution and quality of the generated images. Many GAN-based methods also support latent space interpolation. Despite this success, the properties of the latent space itself are not well understood.
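To make the idea of latent space interpolation concrete, here is a minimal sketch in Python with NumPy. It only builds the path between two latent vectors; the generator that would turn each point into an image is not shown. The function names, the latent size of 512, and the choice of plain linear interpolation are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors for t in [0, 1]."""
    return (1.0 - t) * z_a + t * z_b

def interpolation_path(z_a, z_b, steps):
    """Return an array of shape (steps, latent_dim) going from z_a to z_b."""
    return np.stack([lerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, steps)])

# Hypothetical setup: two random latent codes of an assumed size of 512.
rng = np.random.default_rng(0)
z_start = rng.standard_normal(512)
z_end = rng.standard_normal(512)

# Each row of `path` would be passed to a trained generator (not shown here).
path = interpolation_path(z_start, z_end, steps=5)
```

If the latent space is well behaved, the images generated along such a path should change smoothly from the first image to the second.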
A new paper, published by NVIDIA Research this week, introduces a novel generator architecture called StyleGAN. The new method demonstrates better interpolation properties and better disentanglement of the latent factors of variation, both significant improvements. The researchers show that the new architecture automatically learns to separate high-level attributes (e.g., face pose and identity) from stochastic variation in the generated images (e.g., freckles, hair).
The idea behind StyleGAN is that it adjusts the “style” of the image at each convolution layer. This adjustment is based on the latent code, so it directly controls the strength of image features at different scales. The main difference from an ordinary generator network is that the latent vector is first mapped to an intermediate latent space. This intermediate latent vector is then fed into the synthesis (generator) network along with some noise.
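The flow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the layer sizes, the random weights, the three-layer mapping network, and all function names are hypothetical. The style adjustment is written here as a per-channel normalization followed by a scale and bias computed from the intermediate latent vector, which is one common way to express this kind of modulation.

```python
import numpy as np

def mapping_network(z, layers):
    """Map latent z to an intermediate latent w with a small MLP (depth is illustrative)."""
    w = z
    for weight, bias in layers:
        pre = w @ weight + bias
        w = np.maximum(0.2 * pre, pre)  # leaky ReLU
    return w

def add_noise(features, strength, rng):
    """Add one shared noise image to all channels, scaled per channel."""
    noise = rng.standard_normal((1,) + features.shape[1:])
    return features + strength[:, None, None] * noise

def apply_style(features, w, to_scale, to_bias, eps=1e-8):
    """Normalize each channel of a (C, H, W) map, then scale and shift it using w."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    scale = 1.0 + w @ to_scale  # per-channel scale derived from w
    bias = w @ to_bias          # per-channel bias derived from w
    return scale[:, None, None] * normalized + bias[:, None, None]

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
latent_dim, channels, size = 16, 4, 8
layers = [(rng.standard_normal((latent_dim, latent_dim)) * 0.1, np.zeros(latent_dim))
          for _ in range(3)]
to_scale = rng.standard_normal((latent_dim, channels)) * 0.1
to_bias = rng.standard_normal((latent_dim, channels)) * 0.1
strength = np.full(channels, 0.05)

z = rng.standard_normal(latent_dim)                      # input latent vector
w = mapping_network(z, layers)                           # intermediate latent vector
features = rng.standard_normal((channels, size, size))   # stand-in for a convolution output
noisy = add_noise(features, strength, rng)
styled = apply_style(noisy, w, to_scale, to_bias)
```

In a full generator, a step like this would be repeated after each convolution layer, so the same intermediate vector influences coarse features at low resolutions and fine features at high resolutions.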
The researchers show that the quality of the generated images is preserved and even improved compared to an ordinary GAN. The evaluation was done on the well-known CelebA-HQ dataset and a new dataset called FFHQ.
The paper is published on arXiv, and a video explaining the method is available here. The source code is expected to be published soon.