Sentiment Analysis of Stock headline news using Python NLTK library

The use of numerical values such as stock prices, trading volume, economic indicators, and financial data is fundamental to analyzing markets, and machine learning systems are no exception. However, another important approach is to use natural language processing and other methods to handle text data to see its relevance to financial markets.

The goal of this note is to understand sentiment analysis in practical situation, but not intend to use them to predict high return stocks etc. This is because just use of simple natural language processing will not work for that purpose, because humans tend to create baseline based on their own self-determined scales.

We tend to create baseline numbers such as “the Dow Jones Industrial Average has exceeded $27,000” or “the dollar is on the verge of 100 yen”. This is because it is easy for human beings to understand, but there is no inevitability in the neatness of the numbers when financial markets are viewed as a physical phenomenon.

However, it can be used to measure how the sentiment to the stock or cryptocurrency changes by time, such as sentiment to the bitcoin.

Below code perform sentiment analysis from python nltk package to the recent headline news of given ticker symbol, and estimate average of sentiment. The code available here for basic usage of nltk package.

#%%

import pandas as pd
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
from urllib.request import urlopen, Request
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


# Parameters 
tickers = ['CVX', 'XOM', 'TSLA', 'AMZN', 'GOOG', 'MSFT', 'AAPL', 'DOCS', 'JNJ', 'V', 'PFE']
nltk.download('vader_lexicon')

# Get Data
finwiz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}

for ticker in tickers:
    url = finwiz_url + ticker
    req = Request(url=url,headers={'user-agent': 'my-app/0.0.1'}) 
    resp = urlopen(req)    
    html = BeautifulSoup(resp, features="lxml")
    news_table = html.find(id='news-table')
    news_tables[ticker] = news_table



# Iterate through the news
parsed_news = []
for file_name, news_table in news_tables.items():
    for x in news_table.findAll('tr'):
        text = x.a.get_text() 
        date_scrape = x.td.text.split()

        if len(date_scrape) == 1:
            time = date_scrape[0]
            
        else:
            date = date_scrape[0]
            time = date_scrape[1]

        ticker = file_name.split('_')[0]
        
        parsed_news.append([ticker, date, time, text])
        
# Sentiment Analysis
analyzer = SentimentIntensityAnalyzer()

columns = ['Ticker', 'Date', 'Time', 'Headline']
news = pd.DataFrame(parsed_news, columns=columns)
scores = news['Headline'].apply(analyzer.polarity_scores).tolist()

df_scores = pd.DataFrame(scores)
news = news.join(df_scores, rsuffix='_right')


# View Data 
news['Date'] = pd.to_datetime(news.Date).dt.date

unique_ticker = news['Ticker'].unique().tolist()
news_dict = {name: news.loc[news['Ticker'] == name] for name in unique_ticker}

values = []
for ticker in tickers: 
    df_sentiment = news_dict[ticker]
    df_sentiment = df_sentiment.set_index('Ticker')
    
    mean = round(df_sentiment['compound'].mean(), 4)
    values.append(mean)
    
df = pd.DataFrame(list(zip(tickers, values)), columns =['Ticker', 'Mean Sentiment']) 
df = df.set_index('Ticker')
df = df.sort_values('Mean Sentiment', ascending=False)

plt.rcParams.update({'font.size': 16})
df.plot.bar(figsize=(8,6))
# %%

Sentiment Analysis of Stock headline news using Python NLTK library

Published by dr.T

Leave a Reply Cancel reply