Simple Sentiment Analysis using Python NLTK library

Sentiment analysis is the process of determining which emotions a sentence expresses, such as happiness, sadness, liking, disliking, or anger. It is mainly used to decide whether a sentence is positive or negative. Positive: “I’m glad it’s sunny today”. Negative: “I’m tired and lazy at work”. Harder to classify: “I don’t mind crunchy” and “I like you, but I want to break up with you”.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis tool tuned for social media text, and it is available in the Python NLTK package.

VADER evaluates the intensity of emotions along with their polarity (positive/negative), and handles negations (“not”) and their contractions (“n’t”). It also covers slang such as “kinda” (colloquial form of “kind of”) and “sux”. Beyond words, it recognizes exclamation points and emoticons as signals of emotional intensity and reflects them in the sentiment score.
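
To make the rule side more concrete, here is a toy sketch of how a negation and exclamation points might adjust a word's score. It is not VADER's actual code: the three-word lexicon and its values are made up, toy_score is a hypothetical helper, and the constants (-0.74 for negation, 0.292 per exclamation point, at most four counted) are assumed to match the ones in the VADER source.

```python
# Toy illustration only: a hand-made lexicon, not vader_lexicon.txt.
TOY_LEXICON = {"good": 1.9, "bad": -2.5, "happy": 2.7}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74    # assumed: flips and dampens the next word's score
EXCLAMATION_BOOST = 0.292  # assumed: added per "!" (at most 4 are counted)


def toy_score(text):
    """Sum word scores, applying a negation flip and an exclamation boost."""
    n_excl = min(text.count("!"), 4)
    words = [w.strip("!.,?").lower() for w in text.split()]
    total = 0.0
    for i, word in enumerate(words):
        valence = TOY_LEXICON.get(word, 0.0)  # unknown words score 0
        if valence and i > 0:
            previous = words[i - 1]
            # "not good" and contractions such as "isn't good" flip the sign.
            if previous in NEGATIONS or previous.endswith("n't"):
                valence *= NEGATION_SCALAR
        total += valence
    # Exclamation points push a non-zero total further away from zero.
    if total > 0:
        total += n_excl * EXCLAMATION_BOOST
    elif total < 0:
        total -= n_excl * EXCLAMATION_BOOST
    return total


for text in ("good", "not good", "isn't good", "good!!"):
    print(text, "->", toy_score(text))
```

The point is only the direction of the changes: a negation flips and weakens the score, while exclamation points strengthen whatever sentiment is already there.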

VADER uses a combination of a “dictionary” (lexicon) and “rules” to compute sentiment values, and requires no training.
The VADER dictionary, vader_lexicon.txt, contains 7,520 words. It was built by collecting vocabulary from various language resources, manually scoring each word’s degree of positivity/negativity on a scale of [-4, +4], and filtering out words whose polarity is likely to change depending on the context.
In other words, the dictionary simply assigns each word a degree of positivity or negativity.
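
As a rough picture of what “dictionary” means here, the sketch below parses a few lines in the tab-separated layout that vader_lexicon.txt uses (token, mean rating, standard deviation, raw ratings) into a plain Python dict. The sample lines and their numbers are invented for illustration, and load_lexicon is a hypothetical helper, not an NLTK function.

```python
# Invented sample lines in the lexicon's tab-separated layout:
# token <TAB> mean rating <TAB> standard deviation <TAB> raw ratings
SAMPLE = (
    "happy\t2.7\t0.45826\t[3, 3, 2, 3, 3, 3, 2, 3, 3, 2]\n"
    "sad\t-2.1\t0.3\t[-2, -2, -3, -2, -2, -2, -2, -2, -2, -2]\n"
    "sux\t-1.5\t0.5\t[-1, -2, -1, -2, -2, -1, -1, -2, -1, -2]\n"
)


def load_lexicon(text):
    """Map each token to its mean rating, ignoring the other columns."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, mean = line.split("\t")[:2]
        lexicon[token] = float(mean)
    return lexicon


lexicon = load_lexicon(SAMPLE)
print(lexicon)
# A word that is not in the lexicon contributes nothing to the score.
print(lexicon.get("table", 0.0))
```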

The goal here is to try out the Python NLTK library by analyzing a few simple strings, then plotting the ‘compound’ sentiment in a bar chart.
The ‘compound’ score is the sum of the scores of all words in the sentence, adjusted by the rules and normalized to the range [-1, +1].
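
The normalization step can be sketched in a few lines. In the NLTK implementation the summed score x is, as far as can be told from the source, mapped to x / sqrt(x^2 + alpha) with alpha = 15. The function below is a standalone re-implementation for illustration, and the input sums are hypothetical.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded summed score into the range [-1, +1]."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp to guard against floating-point drift at the extremes.
    return max(-1.0, min(1.0, norm))


for total in (-8.0, -1.0, 0.0, 1.0, 8.0):  # hypothetical summed scores
    print(total, round(normalize(total), 4))
```

For example, a summed score of 1.0 becomes 1 / sqrt(16) = 0.25, while larger sums approach +1 or -1 without ever exceeding them.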

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import pandas as pd

nltk.download('vader_lexicon')

sentences = ["I am happy",
             "I am sad",
             "I am geek",
             "I got covid",
             "I love geek"]

analyzer = SentimentIntensityAnalyzer()

result = []
for s in sentences:
    result.append(analyzer.polarity_scores(s))

# One row per sentence; the columns are the keys returned by polarity_scores
# ('neg', 'neu', 'pos', 'compound').
df = pd.DataFrame(result, index=sentences)

print(df)

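# Set the font size before creating the figure so it applies to all labels.
plt.rcParams.update({'font.size': 16})
fig, ax = plt.subplots(figsize=(7,6))
ax.bar(df.index, df['compound'])
ax.axhline(0, color='grey', linewidth=0.8)
ax.set_ylabel('Compound sentiment')
plt.xticks(rotation=40)
plt.tight_layout()  # keep the rotated tick labels inside the figure
plt.show()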
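
A common follow-up is to turn the compound score into a label. The sketch below uses cut-offs of +0.05 and -0.05, which are the thresholds suggested by the VADER project's documentation rather than something NLTK applies for you. The label helper and the sample scores are hypothetical.

```python
def label(compound, threshold=0.05):
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, not actual VADER output.
samples = {"sentence A": 0.57, "sentence B": -0.48, "sentence C": 0.0}
for name, score in samples.items():
    print(name, label(score))
```

With the DataFrame built earlier, something like df['compound'].apply(label) would add such a label for each sentence.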
