The GoEmotions dataset consists of Reddit comments labeled with the emotions they express. It is designed for training neural networks to perform fine-grained emotion analysis of text.
Most existing emotion classification datasets cover narrow domains (for example, news headlines or movie subtitles), are small, and use a scale of only six basic emotions (anger, surprise, disgust, joy, fear, and sadness). Expanding the range of emotions covered by a dataset could make it possible to build more empathetic chatbots and models that detect harmful online behavior, as well as to improve customer support services.
GoEmotions is a dataset of 58,000 Reddit comments extracted from popular English-language subreddits and manually annotated with 27 emotion categories plus a neutral label. According to its authors, at the time of release it was the largest manually annotated English-language dataset for fine-grained emotion classification. The emotion categories were identified by Google together with psychologists and comprise 12 positive, 11 negative, and 4 ambiguous emotions. Together with the neutral label, this makes the dataset suitable for tasks that require subtle differentiation between emotions.
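To make the label structure concrete, the following minimal Python sketch groups the emotion categories by sentiment and encodes a comment's labels as a multi-hot vector. It makes several assumptions that are not stated in the text above: the category names are assumed to match the published taxonomy and should be checked against the dataset's own label files, the label ordering is illustrative, and a comment is assumed to be able to carry more than one label. The helpers `encode_labels` and `sentiment_groups` are hypothetical names introduced here.

```python
# Sentiment groups for the 27 emotion categories.
# ASSUMPTION: category names follow the published GoEmotions taxonomy;
# verify them against the label files shipped with the dataset.
SENTIMENT_GROUPS = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "gratitude", "joy", "love", "optimism", "pride", "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
}

# Flat label list: 27 emotions followed by neutral.
# ASSUMPTION: this ordering is illustrative, not the dataset's official index order.
LABELS = [e for group in SENTIMENT_GROUPS.values() for e in group] + ["neutral"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}

# Reverse lookup from each label to its coarse group; neutral maps to itself.
EMOTION_TO_GROUP = {e: g for g, es in SENTIMENT_GROUPS.items() for e in es}
EMOTION_TO_GROUP["neutral"] = "neutral"


def encode_labels(emotions):
    """Return a multi-hot vector with 1 at the index of each given label."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


def sentiment_groups(emotions):
    """Return the set of coarse sentiment groups covered by the given labels."""
    return {EMOTION_TO_GROUP[name] for name in emotions}


if __name__ == "__main__":
    comment_labels = ["joy", "surprise"]  # hypothetical annotation for one comment
    print(len(LABELS), "labels in total")
    print(encode_labels(comment_labels))
    print(sentiment_groups(comment_labels))
```

A multi-hot encoding like this is a common target format for multi-label classifiers, where each label gets its own independent output probability.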
The GoEmotions dataset was released together with a detailed tutorial that demonstrates how to train a neural network model (available in the TensorFlow Model Garden) on GoEmotions and how to apply it to the task of suggesting emojis based on the text of a message.
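As a rough illustration of the emoji suggestion idea, the sketch below maps per-emotion scores to emojis. It is not the tutorial's code: the emoji table, the `suggest_emojis` helper, the 0.5 threshold, and the example scores are all hypothetical, and the scores are assumed to come from some trained emotion classifier that outputs one probability per emotion.

```python
# HYPOTHETICAL emotion-to-emoji table; not taken from the GoEmotions tutorial.
EMOJI_FOR_EMOTION = {
    "joy": "😄",
    "sadness": "😢",
    "anger": "😠",
    "surprise": "😮",
    "gratitude": "🙏",
}


def suggest_emojis(scores, threshold=0.5, max_suggestions=3):
    """Return emojis for the highest-scoring emotions at or above the threshold.

    `scores` maps emotion names to probabilities, assumed to be produced by a
    trained multi-label emotion classifier. Emotions without an entry in the
    emoji table are skipped.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    picks = [
        EMOJI_FOR_EMOTION[name]
        for name, score in ranked
        if score >= threshold and name in EMOJI_FOR_EMOTION
    ]
    return picks[:max_suggestions]


if __name__ == "__main__":
    # Made-up scores for a message such as "Thank you so much, this made my day!"
    example_scores = {"gratitude": 0.91, "joy": 0.74, "sadness": 0.03}
    print(suggest_emojis(example_scores))
```

Thresholding each score independently, rather than taking only the single best label, lets a message that expresses several emotions receive several suggestions.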