By Arafath Hossain
A report published by the Daily Star on December 1, 2017 stated that, according to Google’s records, a total of 40 million people actively use the internet in Bangladesh. Of these 40 million users, 35% go online every day, and a majority of them spend a significant portion of their time online on social networking sites. This has made Dhaka the second most active city on Facebook, according to the Global Digital Statshot, a report published last year by Hootsuite and We Are Social. It is evident that we Bangladeshis have accepted social media as a part of our lives. Social media platforms have also evolved: once merely tools for personal networking, they have become global opinion-sharing platforms that now even shape public opinion. For Bangladesh, this large, mostly young population thriving on the internet could be a great way to represent the country to the world. But how do we collectively represent our country on social media? What do we care about most, and how do we feel, when we talk about Bangladesh? Studying social media content with advanced data analysis techniques and machine learning algorithms may give us some idea. This article offers a glimpse of the findings from such a study.
According to a 2015 CNN report, Facebook is the most widely used social media platform in Bangladesh, followed by Twitter. Facebook would therefore be the natural first choice, but it does not let data analysis tools collect the large-scale data such a study needs. Twitter, on the other hand, offers a publicly accessible API (application programming interface) for fetching large volumes of public tweets, which makes it a good candidate for this analysis. Since the study aimed to learn what impression we give to a global audience, only messages written in English were considered. In total, 20,000 English-language tweets containing ‘Bangladesh’, all posted on a single day in January 2018, were collected. The tweets were collected and analyzed using R, an open-source data analysis tool, and several machine learning models, discussed briefly along the way, were used to derive insights.
To put our discussion in context, let’s think of this study as a random walk down a street called ‘Twitter’ with a celebrity named ‘Bangladesh’. During our walk, we recorded all the comments made about Bangladesh. Now we will analyze those records (tweets) to understand what people talked about most and what emotions they expressed.
To unfold the story, we start with the words in these comments (in this context, tweets) that caught our attention. A word cloud built from the tweets gives a first impression. The word cloud of the most frequently used words shows that Rohingya, News, Refuge and Cricket are among the words people used most often. Apparently, a lot of the chatter concerns the Rohingya, cricket and news. But what about the other words we heard? How do they relate to each other, if at all? For example, were people talking about news on the Rohingya or news on something else? And what about the country names in the word cloud? Why do people care about India and Pakistan?
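Behind a word cloud is a simple frequency count: each word is drawn at a size proportional to how often it occurs. The study did this in R; as an illustration only, here is a minimal Python sketch of the counting step. The tokenization rules, the short stop-word list and the sample tweets are assumptions, not the study’s actual preprocessing.

```python
import re
from collections import Counter

# Illustrative stop words only; real pipelines use a much longer list.
# "rt" is the retweet marker that often prefixes tweets.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "for", "on", "is", "at", "rt"}


def word_frequencies(tweets):
    """Count how often each non-stop word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, drop URLs, then keep only alphabetic tokens.
        text = re.sub(r"https?://\S+", " ", tweet.lower())
        tokens = re.findall(r"[a-z]+", text)
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    # Hypothetical example tweets, not data from the study.
    sample = [
        "Rohingya refugees in Bangladesh need help",
        "Cricket news: Bangladesh beat Zimbabwe",
        "More news on Rohingya refugees https://example.com",
    ]
    for word, n in word_frequencies(sample).most_common(5):
        print(word, n)
```

A word-cloud library would then take these counts as its input; the counting itself is the part that carries the information.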
To answer these questions, we turn to a machine learning algorithm called Latent Dirichlet Allocation (LDA), which treats each document as a mixture of topics and each topic as a collection of words. As part of its output, LDA assigns each word a probability for each topic, showing how strongly the word is associated with that topic. In the context of our walk, we can think of LDA as an intelligence consultant who helps us make sense of the noisy gossip we heard about Bangladesh by organizing the words into structured groups.
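To make the description concrete, here is a small collapsed Gibbs sampler for LDA in Python. It is a teaching sketch, not the R implementation used in the study (the article does not say which package or settings were used); the hyperparameters `alpha` and `beta`, the iteration count and the toy documents are illustrative assumptions. The output `phi[k][w]` is the per-topic word probability described above, and sorting it gives each topic’s top words.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (phi, theta): phi[k][w] estimates P(word w | topic k) and
    theta[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens in doc d with topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # times word w has topic k
    topic_total = [0] * n_topics                              # tokens with topic k

    # Start by giving every token a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...and resample its topic from the full conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    phi = [{w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
           for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return phi, theta


if __name__ == "__main__":
    # Toy, already-stemmed "tweets", invented for illustration.
    docs = [
        ["rohingya", "refuge", "camp", "women", "children"],
        ["rohingya", "refuge", "camp", "aid"],
        ["cricket", "news", "zimbabwe", "india"],
        ["cricket", "news", "pakistan", "match"],
    ]
    phi, _ = lda_gibbs(docs, n_topics=2)
    for k, dist in enumerate(phi):
        top = sorted(dist, key=dist.get, reverse=True)[:4]
        print(f"topic {k + 1:02d}:", top)
```

Gibbs sampling is only one way to fit LDA; variational inference is another common choice, and the two can give somewhat different topics on small data.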
We asked LDA to find the two most discussed topics. It came up with the following two collections of top 10 words, one for each topic:
*Words with unusual spellings are not spelling mistakes. They have been reduced to their stem, or core form, as part of the data processing required for the analysis.
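Stemming strips suffixes so that related word forms collapse into one token, which is how a word like ‘refugees’ can end up displayed as ‘refuge’. The study does not say which stemmer it used. The Python function below is a deliberately crude suffix-stripper written for illustration only; real stemmers such as the Porter algorithm apply many more rules and conditions, and the suffix list here is an assumption.

```python
def crude_stem(word):
    """Very rough suffix stripping, for illustration only (not Porter)."""
    w = word.lower()
    # Plural "s", but leave words ending in "ss" and very short words alone.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]  # "camps" -> "camp"
    # Common verb endings, keeping at least three letters of stem.
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]  # "helping" -> "help"
            break
    # A trailing "e" on longer words.
    if w.endswith("e") and len(w) > 5:
        w = w[:-1]  # "refugee" -> "refuge", "people" -> "peopl"
    return w


if __name__ == "__main__":
    for w in ["refugees", "camps", "helping", "people", "Rohingya"]:
        print(w, "->", crude_stem(w))
```

Because stems only need to be consistent, not readable, outputs like ‘peopl’ are acceptable: every occurrence of ‘people’ maps to the same token.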
The word clusters give us a much clearer idea of the context of the most common words in our word cloud. A brief look at the words reveals that the two topics are Rohingya and Cricket. Looking at the words under topic 01, we can infer that people were concerned about the Rohingya people at the refugee camp and the plight of Rohingya women and children. Similarly, topic 02 suggests that people talked about cricket updates in the news, and that other cricket-playing countries such as Zimbabwe, India and Pakistan also came up in their discussion.
Now that we understand the most common topics around Bangladesh, we move on to people’s emotions and sentiments. In other words, we will try to see whether people expressed positive or negative opinions overall, and how they felt: sad, angry, happy and so on. To do that, we broke all the tweets down into lists of words and matched those words against a lexicon. A lexicon is essentially a list of words (in our case, standard English words) together with a record of the general sentiment each word carries. For example, ‘cry’ is associated with sadness and ‘laugh’ with joy. For our purpose, we used the NRC Emotion Lexicon (EmoLex), a crowd-sourced lexicon created by Dr. Saif Mohammad, Senior Research Officer at the National Research Council Canada. The NRC lexicon classifies words by eight basic emotions (trust, surprise, sadness, joy, fear, disgust, anticipation and anger) and two sentiments (positive and negative).
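The matching step can be sketched in a few lines of Python. The tiny lexicon below is a hypothetical stand-in with made-up entries, not rows from the NRC Emotion Lexicon (the real lexicon covers thousands of words and is distributed separately). The function reports each emotion or sentiment as a percentage of all tokens; that denominator is an assumption, and the study may have normalized differently.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration; NOT actual NRC EmoLex entries.
LEXICON = {
    "cry": {"sadness", "negative"},
    "laugh": {"joy", "positive"},
    "hope": {"anticipation", "trust", "positive"},
    "afraid": {"fear", "negative"},
    "win": {"joy", "anticipation", "positive"},
}


def emotion_shares(tokens, lexicon=LEXICON):
    """Return each emotion/sentiment count as a percentage of all tokens."""
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        # A word can carry several labels, so it may count toward several.
        counts.update(lexicon.get(token, ()))
    return {label: 100.0 * n / len(tokens) for label, n in counts.items()}


if __name__ == "__main__":
    tokens = "we hope they win but many cry".split()
    for label, pct in sorted(emotion_shares(tokens).items()):
        print(f"{label}: {pct:.1f}%")
```

Note that the lookup ignores context entirely: ‘cry’ counts as sadness even in "cry of joy", which is exactly the limitation of lexicon-based analysis.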
As the figure above shows, 20% of the words used in the tweets conveyed positive sentiment, against around 12% conveying negative sentiment. Among the emotions, words related to trust and anticipation dominate. Words of joy outnumber words of fear, but only by a very small margin. One important note of caution: in a standard lexicon-based analysis, the same word may carry different emotions depending on context, and such an analysis does not capture that context. In the terms of our story, we are just visitors on the street of Twitter, so we won’t read too much detail into the emotions. Instead, we will aim for an overall picture of the emotions and sentiments expressed. Comparing the overall emotional state with the emotional state around a specific topic may give us a clearer understanding of how emotions shift. Overall, positive sentiment clearly prevails over negative sentiment, and joy outweighs sadness. We would like to see whether these emotions and sentiments stay consistent or change with the topics people talk about.
To check this, we examined topic 01, Rohingya, a bit further. We selected all tweets containing ‘Rohingya’ from the full set and ran the same sentiment analysis on them as we did on the full set.
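The subset step amounts to a filter followed by the same scoring. The Python sketch below assumes case-insensitive substring matching on the keyword and whitespace tokenization (the article does not specify either), and takes the scoring function as a parameter, for example a lexicon-based scorer. The function name `compare_subset` is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def compare_subset(tweets: List[str], keyword: str,
                   score: Callable[[List[str]], Scores]) -> Tuple[Scores, Scores, int]:
    """Score all tweets and the keyword subset with the same function.

    Returns (overall_scores, subset_scores, number_of_subset_tweets).
    """
    def tokens_of(texts: List[str]) -> List[str]:
        # Simple lowercase whitespace tokenization (an assumption).
        return [w for t in texts for w in t.lower().split()]

    subset = [t for t in tweets if keyword.lower() in t.lower()]
    return score(tokens_of(tweets)), score(tokens_of(subset)), len(subset)


if __name__ == "__main__":
    # Hypothetical scorer: share of tokens found in a tiny made-up word list.
    negative_words = {"fear", "cry", "crisis"}

    def negative_share(tokens: List[str]) -> Scores:
        hits = sum(t in negative_words for t in tokens)
        return {"negative": 100.0 * hits / max(len(tokens), 1)}

    tweets = ["Cricket win today", "Rohingya crisis brings fear", "rohingya children cry"]
    overall, subset, n = compare_subset(tweets, "Rohingya", negative_share)
    print(f"overall: {overall}, Rohingya ({n} tweets): {subset}")
```

Passing the same scorer to both sets keeps the comparison like-for-like, so any difference reflects the tweets rather than the method.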
The results shown above reveal a drastic change in the distribution of emotions and sentiments. Unlike the overall analysis, the analysis of the Rohingya tweets shows positive and negative sentiment at similar levels. People expressed a significant amount of sadness about the Rohingya. As in the overall analysis, trust and anticipation remained quite dominant. The most striking change was the prevalence of fear: though fear was not prominent in the overall tweets, it was the second most dominant emotion, after sadness, in the Rohingya tweets.
From our random walk on the street of Twitter, we have seen that cricket and the Rohingya were the two subjects people cared about most when they talked about Bangladesh. Overall, people were more positive than negative, expressing trust and anticipation most of all. When it came to the Rohingya crisis, however, things were a bit different. Sentiment was split evenly between positive and negative, and people felt sorry for the Rohingya but also expressed a heightened sense of fear.
So, what does this mean for Bangladesh? Recently, different organizations have made considerable efforts toward nation branding and digitization. For a country that aspires to become a technology-savvy destination for tourists and investors, its digital footprint could be a valuable asset to manage. What people say about a country online, and how they say it, collectively forms part of that country’s digital footprint, which shapes the world’s impression of it. Social media analysis of this kind can give us an idea of that impression by showing where, and how, general perception is moving. That, in turn, may help identify areas for improvement, such as the strategic use of communications to manage perceptions. For example, sheltering 700,000 Rohingya refugees during one of the worst humanitarian crises of recent times was a great feat for Bangladesh. Was Bangladesh able to spread this news effectively enough to create a good impression of herself? Although this small study suggests that people have mixed feelings, it cannot answer that question decisively. However, similar studies conducted regularly and at a larger scale may lead us to the answers. In the terms we set earlier, we should take more walks with Bangladesh down other online streets, such as Facebook, YouTube and Instagram, to learn how people perceive her and where to focus efforts to manage that perception.
About the Writer
ARAFATH HOSSAIN is a graduate student in the MBA and Quality Management programs at Illinois State University, United States. Before starting his graduate studies, he worked for three years in market research and product management, at Millward Brown Bangladesh and Robi Axiata Limited respectively. He is keenly interested in applying data analytics techniques as a tool for informed decision making. For any questions or comments, reach Arafath at [email protected].