Social Media Sentiment Analysis

In this post I will demonstrate a small POC that I completed a few weeks back. If you spend much of your time on social sites like Facebook, Twitter, LinkedIn, etc., this post may well give you goosebumps the moment you realise how your day-to-day activities can be mined through different techniques to detect behavioural patterns. I will not go deep into the technical details, as heavy algorithms are involved behind the scenes. We will take two giant eCommerce websites, Flipkart and Amazon, and see how people are tweeting about them: are they really writing something positive, or venting their frustration over the internet? I will try my best to keep this post up to date with the latest integration techniques as they come to the market.

Before we start, here is a basic PPT that I prepared for my demo.

We will categorise each user tweet as positive, negative or neutral. I will share a link to download the Python project, as it would take too long to describe each and every line. Basically, we will use a Twitter account that has permission to create applications (follow this link: https://developer.twitter.com/apps). Once you are logged in, create an app, provide the justification that you are trying this application for learning purposes, and agree to all of Twitter's terms and conditions, including that you will not exploit it for personal gain. Once the Twitter team approves the request, go to the API Keys tab, where you will find your consumer key (API key) and consumer secret. Copy both from the screen into the Python program that is shared at the end of this post.

Sample screenshot of twitter App.

 

Then you need the following set of files to configure and run the Python program. Refer to https://d3js.org/ for the different types of chart you can use to display the tweets; here the size of each circle is based on the user's follower count.

Link to the code base for download.

This is the method that takes the search keyword.

Here the listener keeps listening for tweets, parses each one and writes it to the file. Today this data could instead be pumped into a data delivery stream such as Amazon Firehose or a Kafka topic for real-time integration. There are also Twitter connectors available as Kafka source connectors. You can opt for Pub/Sub from Google Cloud or Event Hubs from Microsoft Azure, but you need a process that pushes these tweets into the message queue.
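To give a feel for the parse-and-write step without the full project, here is a minimal sketch. It is not the code from the download; it assumes the listener hands over each tweet as a raw JSON string with the usual "text" and "user" fields, and the function names parse_tweet and append_tweet as well as the one-record-per-line output format are my own choices for illustration.

```python
import json
from typing import Optional


def parse_tweet(raw: str) -> Optional[dict]:
    """Pull out the few fields we need from one raw JSON payload.

    Returns None for anything that is not a tweet (bad JSON, or stream
    messages that carry no "text" field).
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "text" not in payload:
        return None
    user = payload.get("user") or {}
    return {
        "user": user.get("screen_name", ""),
        "followers": int(user.get("followers_count") or 0),
        # Collapse line breaks so each tweet stays on a single line.
        "text": payload["text"].replace("\n", " ").strip(),
    }


def append_tweet(raw: str, path: str) -> bool:
    """Append the parsed tweet to the file; return False if it was skipped."""
    record = parse_tweet(raw)
    if record is None:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return True
```

If you later move to Kafka, Firehose or Pub/Sub, only append_tweet would need to change: instead of writing to a file it would publish the same record to the stream.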

Now we will start two threads. One continuously listens and writes to the file, and the other reads and parses the tweets for D3.js. Since the bubble chart uses a CSV file behind the scenes, the second method processes the tweets, categorises them as positive, negative or neutral, and writes them out in a predefined format.
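The two-thread arrangement can be sketched roughly as below. This is a simplified stand-in and not the project code: the writer thread is fed from a plain list instead of a live Twitter stream, the CSV columns (user, followers, sentiment) are an assumed format, and the classify function is passed in as a parameter so that any classifier can be plugged in.

```python
import csv
import json
import threading
import time


def writer(lines, raw_path, done):
    """Stand-in for the listener thread: append each record to the raw file."""
    with open(raw_path, "a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
            handle.flush()
    done.set()  # signal only after the file has been closed


def reader(raw_path, csv_path, done, classify, poll_seconds=0.05):
    """Follow the raw file and write one CSV row per tweet for the chart."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out, \
            open(raw_path, "r", encoding="utf-8") as src:
        rows = csv.writer(out)
        rows.writerow(["user", "followers", "sentiment"])
        pending = ""
        while True:
            finished = done.is_set()  # check before reading to avoid losing lines
            chunk = src.readline()
            if chunk:
                pending += chunk
                if pending.endswith("\n"):  # only handle complete lines
                    record = json.loads(pending)
                    rows.writerow([record["user"], record["followers"],
                                   classify(record["text"])])
                    pending = ""
                continue
            if finished:
                break
            time.sleep(poll_seconds)


def run_pipeline(lines, raw_path, csv_path, classify):
    """Start both threads and wait until every record has been processed."""
    open(raw_path, "a", encoding="utf-8").close()  # make sure the reader can open it
    done = threading.Event()
    threads = [
        threading.Thread(target=writer, args=(lines, raw_path, done)),
        threading.Thread(target=reader, args=(raw_path, csv_path, done, classify)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

In a real run the writer never finishes, so the reader would simply keep polling; the done event is there only so that this sketch can stop cleanly.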

Inside the StartMain we have the algorithms used for classification.
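If you only want to see the idea of classifying a tweet without downloading the project, a very small word-list approach looks like the sketch below. Please note this is not the algorithm used inside StartMain; it is a deliberately simple stand-in, and the word lists and the function name classify are made up for illustration.

```python
import re

# Hypothetical, tiny word lists; a real lexicon would be far larger.
POSITIVE_WORDS = {"good", "great", "love", "happy", "fast", "awesome"}
NEGATIVE_WORDS = {"bad", "worst", "hate", "late", "broken", "refund"}


def classify(text: str) -> str:
    """Label a tweet by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = (sum(word in POSITIVE_WORDS for word in words)
             - sum(word in NEGATIVE_WORDS for word in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet with no known words, or with equal positive and negative counts, comes out as neutral. This approach misses sarcasm and negation ("not good"), which is one reason a trained model usually does better.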

 

That's it for today. Let me know if you find this interesting. I know this can be improved a great deal, but it is enough to get you started.

See you in my next post.

 

Thanks

Bhabani

About Bhabani
Bhabani has 10-plus years of experience in data warehousing and analytics projects spanning multiple domains such as travel, banking and finance, and the betting and gaming industries. His solution areas focus on designing the data warehouse and integrating it with cloud platforms like AWS or GCP. He has also been an Elite-level contributor at the OTN forum for more than 9 years. He loves to experiment and build POCs with different integration tools and services. Some of his favorite skills are Redshift, BigQuery, Python, Apache Airflow, Kafka, HDFS, MapReduce, Hive, HBase, Sqoop, Drill and Impala.
