Natural language contains idioms, sarcasm, and other devices that make it hard for neural networks to recognize the meaning of a text. The TextBlob and VADER libraries let you estimate the sentiment (tonality) of a text in a few lines of code.
Let's see how the two libraries perform by analyzing the sentiment of tweets from the Sentiment140 dataset. Sentiment140 includes 1.6 million training tweets labeled positive or negative, plus a smaller, manually annotated test set in which each tweet is labeled positive, negative, or neutral. That test set of 498 tweets is used here to evaluate TextBlob and VADER by comparing their output with the annotations. First, import the libraries:
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd

analyzer = SentimentIntensityAnalyzer()
Then load the tweets:
completeDF = pd.read_csv("testdata.manual.2009.06.14.csv", names=["polarity", "id", "date", "query", "user", "text"])
df = completeDF.drop(columns=["date", "query", "user"])
df.polarity = df.polarity.replace({0: -1, 2: 0, 4: 1})
In the dataset, positive, neutral, and negative tone are encoded as 4, 2, and 0, respectively. TextBlob and VADER instead express sentiment as a real number from -1 to +1, where -1 is the most negative and +1 the most positive. The code above therefore replaces the dataset's codes with 1, 0, and -1 so they can be compared with the analyzers' output. Now iterate over the tweets and pass each one through both analyzers:
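df["textblob"] = [TextBlob(text).sentiment.polarity for text in df["text"]]  # polarity in [-1, 1]
df["vader"] = [analyzer.polarity_scores(text)["compound"] for text in df["text"]]  # compound score in [-1, 1]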
Next, each analyzer's continuous score has to be mapped to one of the three sentiment categories. The choice of mapping affects the accuracy: the best accuracy was obtained by mapping any score above 0 to 1 and any score below 0 to -1, leaving a score of exactly 0 as neutral. The results were as follows:
Overall length 498
VADER agreements/disagreements 360/138
Accuracy: 72.28915662650603%
TextBlob agreements/disagreements 324/174
Accuracy: 65.06024096385542%
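The mapping rule behind these numbers can be sketched as a small threshold function. This is a minimal illustration rather than the author's code; the name `discretize` and the `threshold` parameter are hypothetical. With `threshold=0.0` it reproduces the rule described above (above 0 becomes 1, below 0 becomes -1, exactly 0 stays neutral). A positive threshold widens the neutral band: the vaderSentiment README, for example, suggests ±0.05 as a cut-off for the compound score, although it counts a score of exactly ±0.05 as non-neutral, whereas this sketch treats it as neutral.

```python
def discretize(score: float, threshold: float = 0.0) -> int:
    """Map a polarity score in [-1, 1] to -1 (negative), 0 (neutral) or 1 (positive).

    Hypothetical helper: threshold=0.0 gives the rule used for the results above;
    a larger threshold treats every score in [-threshold, threshold] as neutral.
    """
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0


scores = [0.42, 0.0, -0.1, 0.03]  # made-up example scores, not from the dataset
print([discretize(s) for s in scores])                  # [1, 0, -1, 1]
print([discretize(s, threshold=0.05) for s in scores])  # [1, 0, -1, 0]
```

Applied to a pandas column of scores (assuming one named `vader`, a hypothetical name), this would be `df["vader"].apply(discretize)`.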
On this test set, TextBlob and VADER reached an accuracy of about 65% and 72%, respectively. These values are typical for similar sentiment analyzers. The Sentiment140 data is available from the project's website.
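The accuracy figures are the share of tweets whose mapped label matches the annotation: 360 / 498 ≈ 72.29% for VADER and 324 / 498 ≈ 65.06% for TextBlob. A minimal sketch of that comparison, assuming both label lists already use the -1/0/1 encoding; the function name `agreement_stats` is hypothetical:

```python
def agreement_stats(gold: list[int], predicted: list[int]) -> tuple[int, int, float]:
    """Return (agreements, disagreements, accuracy in percent) for two label lists."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    agreements = sum(1 for g, p in zip(gold, predicted) if g == p)
    disagreements = len(gold) - agreements
    return agreements, disagreements, 100 * agreements / len(gold)


# Toy labels, not taken from the dataset
print(agreement_stats([1, 0, -1, 1], [1, 0, 1, 1]))  # (3, 1, 75.0)

# Re-deriving the reported accuracies from the agreement counts
print(round(100 * 360 / 498, 2), round(100 * 324 / 498, 2))  # 72.29 65.06
```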