GLIDE: OpenAI model for generating images from text

GLIDE is an OpenAI model that generates an image from a text description. Its output compares favorably with DALL-E's, even though it has less than one-third as many parameters.

In January 2021, OpenAI introduced DALL-E, a 12-billion-parameter version of the GPT-3 language model designed to create photorealistic images from text captions used as prompts. After that, NVIDIA released its counterpart, GauGAN2.

GLIDE is a diffusion model that achieves performance competitive with DALL-E while using less than one-third of its parameters (3.5 billion versus 12 billion). Recent studies have shown that diffusion models can generate high-quality synthetic images.
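
The size comparison can be checked with a few lines of arithmetic. The sketch below uses only the two parameter counts quoted above; the constant and function names are illustrative and do not come from any GLIDE or DALL-E code.

```python
# Parameter counts as quoted in the text (approximate, in raw parameters).
GLIDE_PARAMS = 3.5e9
DALLE_PARAMS = 12e9


def parameter_ratio(smaller: float, larger: float) -> float:
    """Return the smaller model's size as a fraction of the larger one."""
    return smaller / larger


ratio = parameter_ratio(GLIDE_PARAMS, DALLE_PARAMS)

print(f"GLIDE / DALL-E parameter ratio: {ratio:.3f}")
print(f"DALL-E is about {1 / ratio:.2f}x larger")
print("Below one-third:", ratio < 1 / 3)
```

With these figures the ratio comes out slightly under 0.3, which is why the comparison is phrased as "less than one-third" rather than exactly a third.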

In addition to generating images from text, GLIDE can edit existing images using natural-language text prompts, for example by inserting new objects or adding shadows and reflections. It can also transform simple sketches into photorealistic images.