  • TextBlob and VADER: libraries for sentiment analysis

    Natural language contains idioms, sarcasm, and other devices that make it difficult for neural networks to recognize the meaning of a text. The TextBlob and VADER libraries let you estimate the sentiment of a text with just a few lines of code.

    Let’s see how the libraries work by analyzing the sentiment of tweets from the Sentiment140 dataset, which includes 1.6 million tweets annotated as having a positive, negative, or neutral tone. From this dataset, about 500 manually annotated tweets (the test set) were used to evaluate TextBlob and VADER by comparing their output against the annotations. First, import the libraries:

    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    import pandas as pd

    analyzer = SentimentIntensityAnalyzer()

    Then load tweets:

    completeDF = pd.read_csv("testdata.manual.2009.06.14.csv",
                             names=["polarity", "id", "date", "query", "user", "text"])
    df = completeDF.drop(columns=["date", "query", "user"])
    df.polarity = df.polarity.replace({0: -1, 2: 0, 4: 1})

    In the dataset, positive, neutral, and negative tones are encoded by the numbers 4, 2, and 0, respectively. TextBlob and VADER use the same three-way classification, but score sentiment with real numbers in the range from -1 to 1: -1 is the most negative and +1 is the most positive. Therefore, in the code above, the dataset labels are remapped so they can be compared with the analyzers’ output. Now you can iterate over the tweets and pass each one through the two analyzers:

    df["textblob"] = df.text.apply(lambda text: TextBlob(text).sentiment.polarity)
    df["vader"] = df.text.apply(lambda text: analyzer.polarity_scores(text)["compound"])

    Then the analyzers’ continuous output needs to be rounded to assign each tweet one of the three sentiment categories. The choice of rounding threshold affects the accuracy of the results; the best accuracy is achieved by mapping scores above 0 to 1 and scores below 0 to -1. The result was as follows:

    Overall length 498
    VADER agreements/disagreements 360/138
    Accuracy: 72.28915662650603%
    TextBlob agreements/disagreements 324/174
    Accuracy: 65.06024096385542%
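    The thresholding and accuracy computation described above can be sketched in plain Python. The scores and gold labels here are hypothetical placeholders, not the actual Sentiment140 tweets:

    ```python
    def to_label(score):
        """Map a continuous score in [-1, 1] to a discrete label -1 / 0 / 1."""
        if score > 0:
            return 1
        if score < 0:
            return -1
        return 0

    gold = [1, -1, 0, -1]                # hypothetical annotated polarities
    scores = [0.62, -0.35, 0.0, 0.14]    # hypothetical analyzer outputs

    predicted = [to_label(s) for s in scores]
    agreements = sum(p == g for p, g in zip(predicted, gold))
    accuracy = 100 * agreements / len(gold)
    print(f"Agreements/disagreements: {agreements}/{len(gold) - agreements}")
    print(f"Accuracy: {accuracy}%")
    ```

    Applying the same `to_label` mapping to the `textblob` and `vader` columns and comparing against `df.polarity` yields the agreement counts above.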

    Thus, TextBlob and VADER showed accuracies of 65% and 72%, respectively. These values are typical for lexicon-based sentiment analyzers of this kind. The Sentiment140 dataset is publicly available.
