A group of researchers has collected and released a new dataset of facial expressions extracted from pre-modern Japanese art.
The dataset, called KaoKore, contains 5,552 RGB image files of size 256 × 256 pixels, along with several labels for each image. The images were extracted from the Collection of Facial Expressions, an earlier effort by the ROIS-DS Center for Open Data in the Humanities (CODH). The researchers processed this collection, prepared the labeled images, and studied how the dataset can be used for image classification as well as generative modeling.
The KaoKore dataset contains gender and social status labels for all the faces that appear in the images. Gender has two classes, male and female, while social status has four mutually exclusive classes: noble, warrior, incarnation, and commoner. Classification models were trained and evaluated on these classes using the newly created dataset. The results show that popular neural network architectures achieve more than 92% accuracy for gender and more than 78% for social status. The dataset was also used in several creativity and generative-modeling tasks: the researchers explored Generative Adversarial Networks, neural painting models, and intrinsic style transfer methods on KaoKore data.
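The classification setup above can be sketched in code. The following is a minimal illustration in PyTorch, not the authors' actual architecture: a small convolutional network whose output size is set to the number of classes for the chosen task (2 for gender, 4 for social status), applied to 256 × 256 RGB inputs matching the dataset's image size.

```python
import torch
import torch.nn as nn

class SmallFaceClassifier(nn.Module):
    """Toy CNN for 256x256 RGB inputs; use num_classes=2 for the gender
    task or num_classes=4 for the social status task. This is an
    illustrative sketch, not the architecture used in the paper."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # -> 16 x 128 x 128
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> 32 x 64 x 64
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> 32 features
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

# Forward pass on a dummy batch of two 256x256 RGB images (social status task):
model = SmallFaceClassifier(num_classes=4)
logits = model(torch.randn(2, 3, 256, 256))
```

In practice the reported accuracies come from standard pretrained architectures fine-tuned on KaoKore; this sketch only shows how the label spaces map onto a classifier's output layer.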
According to the researchers, the goal of the KaoKore project was to foster research by building datasets that are socially and culturally relevant. They mention Japanese artworks as one example of a rich and largely uncharted area in history and the humanities, which motivated them to create the KaoKore dataset.
The dataset is publicly available and was released together with standard data loaders for PyTorch and TensorFlow. More about the KaoKore dataset can be found in the paper, and the data can be downloaded by following the instructions in the GitHub repository.
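Once downloaded, a label file mapping image filenames to integer class labels can be parsed with the standard library alone. The sketch below assumes a hypothetical CSV layout with columns `image`, `gender`, and `status`; check the actual repository for the real file names and format.

```python
import csv
from collections import Counter
from pathlib import Path

def load_labels(csv_path):
    """Read a labels CSV with columns image,gender,status.
    The column names here are an assumed layout for illustration."""
    rows = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append((row["image"], int(row["gender"]), int(row["status"])))
    return rows

# Tiny stand-in file demonstrating the assumed format:
sample = "image,gender,status\n00001.jpg,0,1\n00002.jpg,1,0\n"
path = Path("labels_sample.csv")
path.write_text(sample)

rows = load_labels(path)
gender_counts = Counter(g for _, g, _ in rows)   # class distribution per task
status_counts = Counter(s for _, _, s in rows)
```

Computing the class distribution this way is a quick sanity check before training, since the social status classes in KaoKore are not evenly balanced.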