Microsoft has introduced Fake It Till You Make It, a dataset of synthetic facial images. The dataset is intended for pre-training facial recognition algorithms before they are deployed in real-world scenarios.
Synthetic data has been used in biometrics for several years, but the gap between real and synthetic data remains one of the key problems, especially in facial recognition. To address it, Microsoft has developed a generative neural network that creates a parametric 3D model of a face. Face and hair textures are then applied to this model at random, so that it can be rendered with a high degree of realism and diversity.
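The random combination of a parametric face with texture and hair assets can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the asset names, the number of shape parameters, and the `FaceRecipe` and `sample_face` names are hypothetical and do not come from Microsoft's pipeline, and no rendering is performed.

```python
import random
from dataclasses import dataclass

# Hypothetical asset libraries; placeholder names, not the dataset's real contents.
SKIN_TEXTURES = ["skin_a", "skin_b", "skin_c"]
HAIR_STYLES = ["short", "long", "curly", "none"]


@dataclass
class FaceRecipe:
    shape_params: list  # coefficients that would drive a parametric 3D face model
    skin_texture: str
    hair_style: str


def sample_face(rng, n_shape_params=8):
    """Draw one random face description: shape coefficients plus random assets."""
    shape = [rng.gauss(0.0, 1.0) for _ in range(n_shape_params)]
    return FaceRecipe(shape, rng.choice(SKIN_TEXTURES), rng.choice(HAIR_STYLES))


# A seeded generator makes the sampled set reproducible.
rng = random.Random(0)
recipes = [sample_face(rng) for _ in range(5)]
```

Because every face is drawn from distributions chosen by the dataset's authors, the mix of shapes, textures, and hairstyles can be adjusted directly instead of depending on which photographs happen to be available.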
This approach gives full control over the variability of the dataset, which helps avoid the problem of bias. Another important feature of the dataset is that it comes with per-pixel segmentation and maps of key points that are almost 100 percent accurate. Curiously, the face generation model was not trained on real data.
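One plausible reason such labels can be so accurate is that, in a rendered scene, the 3D position of every key point is already known, so the 2D map can be computed rather than annotated by hand. The Python sketch below shows this with a simple pinhole camera. The camera model, the `project_keypoints` name, the focal length, and the sample points are all assumptions made for illustration, not details of Microsoft's renderer.

```python
def project_keypoints(points_3d, focal_length, image_size):
    """Project 3D points (camera coordinates, z > 0) to 2D pixel coordinates."""
    # Assume the principal point sits at the centre of a square image.
    cx = cy = image_size / 2.0
    keypoints_2d = []
    for x, y, z in points_3d:
        u = focal_length * x / z + cx  # horizontal pixel coordinate
        v = focal_length * y / z + cy  # vertical pixel coordinate
        keypoints_2d.append((u, v))
    return keypoints_2d


# Hypothetical key points taken from a face mesh, in metres in front of the camera.
mesh_keypoints = [(0.0, 0.0, 1.0), (0.1, -0.05, 1.0), (-0.1, -0.05, 2.0)]
labels = project_keypoints(mesh_keypoints, focal_length=500.0, image_size=512)
```

The same idea extends to segmentation: a renderer knows which surface produced each pixel, so a per-pixel class map can be written out alongside the image.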
The dataset consists of 100,000 synthetic face images and will be made publicly available in the near future, along with two-dimensional maps of key points.