A group of researchers from the Department of Computer Science at West Virginia University has built the first autoencoder able to match the realistic image-generation capabilities of GANs.
In the paper, titled “Adversarial Latent Autoencoders”, the researchers explore the idea of an autoencoder that has the same generative power as GANs while also learning disentangled representations. The resulting model, called ALAE (Adversarial Latent Autoencoder), leverages recent advances in Generative Adversarial Networks and their training.
In the proposed architecture, the original GAN paradigm is modified by decomposing the generator and the discriminator networks. Each network becomes a chain of two sub-networks (the generator into F and G, the discriminator into E and D), and the researchers assume that the latent space between F and G and the latent space between E and D are the same. This setup allows E and G to be used as the encoder and decoder (generator) of an autoencoder.
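The decomposition above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the sub-networks F, G, E, and D are stood in for by untrained random linear maps, and all dimensions are made up for illustration; only the wiring between the pieces follows the described setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    # Hypothetical stand-in for a trained sub-network: a random linear map.
    W = rng.standard_normal((n_in, n_out)) * 0.1
    return lambda x: x @ W

z_dim, w_dim, x_dim = 8, 16, 32  # illustrative sizes, not from the paper

# Generator chain: generator = G ∘ F, with the latent space between F and G.
F = linear(z_dim, w_dim)   # maps prior samples z to intermediate latents w
G = linear(w_dim, x_dim)   # maps latents w to data space (images)

# Discriminator chain: discriminator = D ∘ E, sharing that same latent space.
E = linear(x_dim, w_dim)   # encoder: maps data back into the latent space
D = linear(w_dim, 1)       # scores latents for the adversarial game

z = rng.standard_normal((4, z_dim))
x = G(F(z))                # generator path:     z -> w -> x
score = D(E(x))            # discriminator path: x -> w -> score

# Because E and G share the latent space, the pair (G, E) acts as a
# decoder/encoder: w_hat = E(G(w)) can be trained to reconstruct w.
w = F(z)
w_hat = E(G(w))
print(x.shape, score.shape, w_hat.shape)
```

Note that the autoencoding here happens in latent space (w versus E(G(w))) rather than in image space, which is what lets E double as the front half of the discriminator.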
Two such autoencoders were designed using this paradigm: one based on an MLP encoder, and the other based on the StyleGAN generator, called StyleALAE.
The researchers evaluated the proposed model on several datasets, including MNIST, FFHQ, LSUN, and CelebA-HQ. The evaluations show that the new approach learns representations with a higher degree of disentanglement. Additionally, the StyleALAE model generated realistic, high-resolution (1024×1024) face images with results comparable to StyleGAN's output.