1 Introduction

The Internet is now used worldwide, and the rapid growth of social media has made it a cost-effective platform for carrying information. Content from social media such as blogs, reviews, posts, and tweets is processed to extract people’s opinions about a particular product, organization, or situation. Attitudes and feelings, known as sentiments, form an essential part of evaluating an individual’s behaviour. Analyzing these sentiments towards an entity is known as sentiment analysis or opinion mining. Using sentiment analysis, we can interpret the sentiments or emotions of others and classify them into categories, which helps an organization understand people’s emotions and act accordingly. The form of the analysis depends on the expected outcome: for example, analyzing text by its polarity and emotions, gathering feedback about a particular feature, or analyzing text in different languages, which requires detecting the respective language. Sentiment analysis requires a large amount of data that may not be properly structured. Therefore, preprocessing techniques are used to construct the final data set from the extracted data. Moreover, real-time analysis helps us to examine the current scenario and make decisions that lead to better results. The coronavirus disease (COVID-19) has broken out in many parts of the world, and people have been affected on a very large scale. The outbreak has caused a major loss of human life, and people hold different views on it. In the literature, various researchers have performed sentiment analysis on COVID-19 from different perspectives and on different data sets to categorize people’s opinions (Sailunaz and Alhajj 2019; Dubey 2020; Muthusami et al. 2020; Rajput et al. 2020; Pastor 2020; Alhajji et al. 2020; Medford et al. 2020; Prabhakar Kaila et al. 2020).

In this paper, we analyse the opinions of people around the world, including India, to understand whether the situation is favorable or unfavorable for them. Our main focus is therefore to perform sentiment analysis on COVID-19 and draw conclusions about people’s opinions. It has been observed that a large number of people actively participate in social media such as Facebook and Twitter. This work uses Twitter to collect public opinions on COVID-19 in the form of reviews, comments, and posts. In the proposed model, we scrape data from Twitter using the existing Twitter APIs and prepare two data sets: a world-specific data set and an India-specific data set. Sentiment analysis is then performed using different metrics: Average Likes and Re-tweets over a period, Intensity Analysis, Polarity & Subjectivity, and Wordcloud. In addition, the Bidirectional Encoder Representations from Transformers (BERT) model is used to classify public opinions on the coronavirus.

Fig. 1 Internal working of sentiment analysis system (Castillo et al. 2015)

The rest of this paper is organized as follows: Sect. 2 discusses the related work, and Sect. 3 describes the proposed work on COVID-19. The performance analysis of this work is presented in Sect. 4. Finally, Sect. 5 concludes the work and outlines its possible applications and usefulness.

2 Related work

Sentiment analysis, also known as opinion mining, uses natural language processing techniques to analyse a person’s opinions and emotions (Alsaeedi and Khan 2019). In the recent past, researchers have drawn insights from such information and classified the emotions it expresses (Bakshi et al. 2016; Liu 2012). The literature offers several techniques for sentiment analysis, including extracting lexical sentiments from documents (Kim and Hovy 2006) and associating features with sentiment using bi-grams and tri-grams (Dave et al. 2003). Because emojis are now a common way to express feelings, they can be used to indicate positive, neutral, and negative thoughts (Agarwal et al. 2011). Figure 1 shows the internal working of a system that performs sentiment analysis using any one of the existing methods (Castillo et al. 2015), that is, the process of classifying text into sentiment groups such as positive, negative, and neutral.

The coronavirus is considerably changing social and personal lives all over the world. Therefore, many researchers are observing sentiments towards the novel coronavirus from different perspectives and presenting their conclusions in different ways using the available tools and techniques (Sailunaz and Alhajj 2019; Alsaeedi and Khan 2019). In this line of work, Dubey et al. collected tweets over 20 days in March from four states of the European continent to analyse the outbreak of the novel coronavirus (Dubey 2020). Medford et al. applied unsupervised machine learning techniques to analyse the collected data on coronavirus (Medford et al. 2020). In other work, Alhajji et al. used the Naive Bayes classifier to analyse and classify data scraped from Twitter (Alhajji et al. 2020), and Rajput et al. examined the tweets published in January 2020 (Rajput et al. 2020), emphasizing word occurrence patterns and sentiment recognition using bi-gram and tri-gram methods. Kaila et al. applied the Latent Dirichlet Allocation (LDA) technique to collected data to analyse the outbreak of COVID-19 and inferred that people are frightened of the novel coronavirus spreading around the globe (Prabhakar Kaila et al. 2020). Muthusami et al. considered the outbreak of COVID-19 and analysed the sentiments using different machine learning approaches (Muthusami et al. 2020). Kaur et al. used the NLTK library to clean the data and the TextBlob library to analyse the tweets and categorize public sentiments (Kaur and Sharma 2020). Pastor et al. scraped tweets from the Philippines region and studied the consequences of the symptoms of the novel coronavirus on community quarantine (Pastor 2020). All of the above work centres on the coronavirus and its implications, studied by analyzing social media text.

Therefore, we perform sentiment analysis on Twitter data to characterize the situation in India as well as in the rest of the world. During this pandemic, lockdowns have been announced throughout the world, and people including celebrities, sports personalities, and top leaders from various organizations have been expressing their views on Twitter. It has therefore become a most valuable platform for researchers to collect data for analysing people’s opinions.

3 Sentiment analysis on COVID-19

Natural Language Processing provides ways to analyse textual data using approaches that range from manual work to automatic processing with in-built libraries. The rule-based approach is one form, in which rules are manually defined to perform stemming and convert the text data into tokens, and the tokens are then classified into positive, neutral, and negative categories according to their meaning. These categories are used to calculate the polarity of the sentences, which in turn helps to decide the polarity of the text. Automatic approaches, on the other hand, employ machine learning techniques to categorize the given sentences. Both approaches have their own pros and cons. Therefore, in this work, a hybrid of the aforementioned techniques is used to design the proposed model. Figure 2 shows the working of the proposed model, which is divided into two phases: a training phase and a prediction phase. In the training phase, tagged data are provided as input to a machine learning algorithm to build a classifier model. In the prediction phase, untagged textual data are categorized using the built classifier.

Fig. 2 Working of the proposed sentiment analysis model

Table 1 Description of scraped data of COVID-19

The sentiment analysis process requires two phases:

  1. Data set preparation phase

  2. Sentiment analysis phase

The data set preparation phase requires the following steps: scraping data from Twitter, cleaning the data, and selecting the relevant features. We scrape tweets from Twitter using the Twitter scraper and the tweepy Python APIs, and filter the scraped data according to the requirements of the sentiment analysis; for example, we may scrape country-specific data on COVID-19. The scraped data have a number of features; however, we need only selected features for the subsequent steps. In the sentiment analysis phase, we examine the cleaned data using measures such as polarity, subjectivity, and wordcloud, and use the BERT model for emotion classification (Sun et al. 2019a). The following subsections explain the complete process of the proposed model.

3.1 Data set preparation phase

The data set preparation phase comprises the following steps: data scraping and cleaning, and selection of the relevant features.

3.1.1 Data scraping and cleaning

We select Twitter, a social media platform, for extracting tweets on COVID-19, and use the Twitter scraper and the tweepy APIs for data scraping. Table 1 lists the total number of tweets extracted from Twitter to prepare the data sets. We then clean the scraped data using regular expressions: the tweet text is matched against patterns that filter out links, images, and emoticons. The time stamp feature is a composite of date and time, of which we require only the date of a tweet. The Twitter scraper API is used to extract data with the query (#COVID2019 OR #COVID19 OR corona&virus) from 20 Jan 2020 to 25 April 2020, specifying the begin and end date of each query to get the required attributes. We construct two data sets: the first contains tweets from the entire world, and the second contains tweets from India only. We use the keywords ’India’ and ’Modi’ to select tweets from India. The Indian tweets 1 data set is created using the keyword ’India’, while the Indian tweets 2 data set is created using the keyword ’Modi’. We merge these two data sets to obtain the resultant Indian data set. Because we do not have an appropriate method for applying a location filter, we use these specific keywords to create the second data set. All tweets in our data sets are in English. The APIs return a large number of features for each tweet, some of which may be irrelevant. Therefore, a feature selection technique is used to extract only the relevant features.
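The cleaning and keyword-filtering steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the patterns and the helper names `clean_tweet` and `filter_by_keywords` are assumptions made for this example, since the exact expressions used in the paper are not listed.

```python
import re

# Hypothetical patterns; the exact expressions used in the paper are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+|pic\.twitter\.com/\S+")
# Simple text emoticons such as :) ;-( =D, only when they stand alone.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-o*']?[)\](\[dDpP/\\|](?!\S)")
# Common emoji code-point ranges (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip links, image links, emoticons, and emoji from a tweet."""
    for pattern in (URL_RE, EMOTICON_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removed pieces.
    return WHITESPACE_RE.sub(" ", text).strip()


def filter_by_keywords(tweets, keywords):
    """Keep tweets that mention any keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.lower() for k in lowered)]


raw = [
    "Lockdown in India :) https://t.co/abc",
    "Modi addresses the nation pic.twitter.com/xyz",
    "New cases confirmed in Italy",
]
cleaned = [clean_tweet(t) for t in raw]
india_tweets = filter_by_keywords(cleaned, ["India", "Modi"])
```

Filtering with both keywords in one pass yields the union of the two keyword-specific sets, which corresponds to merging the Indian tweets 1 and Indian tweets 2 data sets. A substring match is deliberately crude and would also keep a word such as "Indiana".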

3.1.2 Feature selection

After cleaning the data, we select the relevant features using Minimum Redundancy Maximum Relevance (mRMR) (Agarwal and Mittal 2013; Shirzad and Keyvanpour 2015). The scraped data set has various features such as has_media, hashtags, img_urls, is_replied, is_reply_to, likes, links, parent_tweets, replies, reply_to_users, retweets, screen_name, text_html, timestamp, tweet_id, tweet_url, user_id, username, video_url, etc. Among these features, we select only the tweet_id, likes, retweets, timestamp, and text_html features of a tweet to create our resultant data set for the sentiment analysis.
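As a small illustration of the outcome of this step, the sketch below keeps only the five selected fields of a scraped record and reduces the time stamp to its date. It does not implement mRMR itself; it only shows the resulting projection. The helper name `select_features` and the time stamp layout `YYYY-MM-DD HH:MM:SS` are assumptions made for this example.

```python
from datetime import datetime

# The five features retained for the sentiment analysis.
SELECTED = ("tweet_id", "likes", "retweets", "timestamp", "text_html")


def select_features(record):
    """Keep the selected fields of one scraped tweet and reduce the time stamp to a date."""
    row = {name: record[name] for name in SELECTED}
    # Assumed time stamp layout: "YYYY-MM-DD HH:MM:SS".
    row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S").date()
    return row


record = {
    "tweet_id": "1",
    "username": "example_user",
    "likes": 3,
    "retweets": 1,
    "timestamp": "2020-03-26 14:05:11",
    "text_html": "<p>Stay home</p>",
}
row = select_features(record)
```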

3.2 Sentiment analysis phase

After preparing the resultant data sets, we analyse both of them and calculate various measures using the in-built libraries and functions provided by Numpy and Pandas. We select five metrics for analysis (Average Likes over the period, Average Re-tweets over the period, Intensity Analysis, Polarity & Subjectivity, and Wordcloud) and use the BERT model for classification (Devlin et al. 2018; Sun et al. 2019b). The details of each metric are given in the following subsections.

3.2.1 Average likes over a period

More likes on a tweet indicate that more people share the thoughts it expresses. People read tweets online, relate them to tweets by other people, and like them accordingly. As we have date-wise data, we sum the likes of the tweets for each day and calculate the mean using appropriate functions to create a time series plot of the average likes.

3.2.2 Average re-tweets over a period

Re-tweeting a tweet expresses agreement with it; people may also re-tweet tweets that have already been re-tweeted. We therefore calculate the mean re-tweets in the same way as the mean likes, which gives the time series plot of the average re-tweets.
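The daily averages of likes and re-tweets described in the two subsections above can be sketched with Pandas as follows. The column names `date`, `likes`, and `retweets`, the helper name `daily_means`, and the sample values are illustrative assumptions, not data from the paper.

```python
import pandas as pd


def daily_means(df):
    """Return the per-day mean of likes and retweets, indexed by date."""
    return df.groupby("date")[["likes", "retweets"]].mean().sort_index()


# Made-up sample rows: one row per tweet.
sample = pd.DataFrame(
    {
        "date": ["2020-03-25", "2020-03-25", "2020-03-26"],
        "likes": [10, 30, 8],
        "retweets": [4, 6, 2],
    }
)
means = daily_means(sample)
```

Plotting each column of `means` against its index would give time series of the kind described above.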

Fig. 3 Method used to assign sentiment value to the text

3.2.3 Intensity analysis

Every text has an intensity value, which signifies how positive or negative the text is. We use the Vader sentiment analyzer to calculate the intensity. This analyzer provides a \(polarity\_scores()\) method that takes the tweet text as input and produces the polarity of the entire sentence on a word-by-word basis. We divide the polarity into seven categories (neutral, weakly positive, mildly positive, strongly positive, weakly negative, mildly negative, and strongly negative) according to the polarity ranges shown in Fig. 3. In the literature, the sentiments of a text are mainly classified into three groups: positive, negative, and neutral (Agarwal et al. 2011; Akhtar et al. 2018; Yu et al. 2017). However, some researchers further divide the positive and negative sentiments into sub-categories depending on the application requirements (Bouazizi and Ohtsuki 2017; Tian et al. 2018). Therefore, we extract different keywords from the tweets and classify the positive and negative sentiments of the text into the aforementioned sub-categories using the polarity method defined in Fig. 3.
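A minimal sketch of mapping a polarity score to the seven categories is given below. The cut-off values 0.3 and 0.6 are hypothetical placeholders, because the actual ranges of Fig. 3 are not reproduced in the text; `intensity_category` and `vader_compound` are names introduced only for this illustration. The second helper assumes that the `vaderSentiment` package is installed and uses the `compound` entry of the dictionary returned by `polarity_scores()`.

```python
# Hypothetical cut-offs; the actual ranges are those of Fig. 3.
WEAK_LIMIT = 0.3
MILD_LIMIT = 0.6


def intensity_category(score):
    """Map a polarity score in [-1, 1] to one of seven intensity categories."""
    if score == 0:
        return "neutral"
    side = "positive" if score > 0 else "negative"
    magnitude = abs(score)
    if magnitude <= WEAK_LIMIT:
        return "weakly " + side
    if magnitude <= MILD_LIMIT:
        return "mildly " + side
    return "strongly " + side


def vader_compound(text):
    """Return the VADER compound score of a text (needs the vaderSentiment package)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
```

With the package installed, `intensity_category(vader_compound(tweet_text))` would give the category of a single tweet.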

3.2.4 Polarity and subjectivity

Polarity and subjectivity are among the most important metrics for sentiment analysis. Polarity is the intensity or strength of the emotion expressed in the text, which indicates the behaviour of the person. Subjectivity, on the other hand, reflects a person’s opinion or view on a specific issue or thing. A subjective sentence does not necessarily carry polarity or depict a behaviour; it can be a plain statement made as an opinion. We use the TextBlob library for this purpose: it takes a sentence as input and returns the polarity and the subjectivity of the text. We take the mean of these values for each date and plot the results. Figure 4 shows the working of the TextBlob method and its respective result (Ahuja and Dubey 2017; Hasan et al. 2018). The polarity score ranges from \(-1\) to \(+1\), and the subjectivity score ranges from 0 to 1.
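The per-date averaging of polarity and subjectivity can be sketched as follows. The helper names `textblob_scores` and `daily_sentiment` are assumptions made for this example. The default scorer relies on the `sentiment` property of a `TextBlob` object and needs the `textblob` package; any function returning a (polarity, subjectivity) pair can be passed instead, as the stub at the end does.

```python
from collections import defaultdict
from statistics import mean


def textblob_scores(text):
    """Return (polarity, subjectivity) of a text using TextBlob."""
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def daily_sentiment(rows, scorer=textblob_scores):
    """rows: iterable of (date, text) pairs.

    Returns {date: (mean polarity, mean subjectivity)}, ordered by date.
    """
    by_date = defaultdict(list)
    for date, text in rows:
        by_date[date].append(scorer(text))
    return {
        date: (mean(p for p, _ in scores), mean(s for _, s in scores))
        for date, scores in sorted(by_date.items())
    }


# Stub scorer with made-up values, so the averaging can be tried without TextBlob.
stub = {"good news": (0.5, 1.0), "bad news": (-0.25, 0.5), "cases": (0.0, 0.0)}
result = daily_sentiment(
    [("2020-03-26", "cases"), ("2020-03-25", "good news"), ("2020-03-25", "bad news")],
    scorer=stub.get,
)
```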

Fig. 4 Working of the TextBlob method

3.2.5 Wordcloud

Each sentence consists of several words with different intensity and behaviour. In the previous step, we calculated the type of polarity, i.e., positive, negative, or neutral. Each type has a wordcloud showing the words that fall into that category. A wordcloud displays the words with sizes that reflect their frequencies: a bigger word occurs more frequently in the text. Stop words occur more frequently than most other words, but they carry no meaning of their own and do not show emotion. Therefore, we remove the stop words before making the wordcloud.
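The word frequencies behind a wordcloud can be sketched with the standard library as below. The stop-word set is a tiny illustrative sample rather than the list used in the paper, and `word_frequencies` is a name introduced for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; a real list would be much longer.
STOP_WORDS = {"the", "is", "of", "a", "an", "and", "in", "to", "for", "on"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count the words across texts, ignoring case and skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


freqs = word_frequencies(["The virus is spreading", "Fear of the virus"])
```

The resulting counts determine the relative word sizes; a rendering library such as the `wordcloud` package can draw a cloud from a frequency mapping of this kind.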

Fig. 5 Calculated mean of likes over a period

Fig. 6 Calculated mean of re-tweets over a period

3.3 Emotion classification using BERT

In this phase, the Bidirectional Encoder Representations from Transformers (BERT) model is used for emotion classification. The meaning of a word in a given sentence depends on the words surrounding it. BERT processes the whole input sequence at once to handle dependencies among words, and it comes in two variants: the BERT-base model and the BERT-large model. The BERT-base model uses 12 transformer encoder layers, while the BERT-large model uses 24. The BERT model can be fine-tuned easily to get the desired results. We use the following steps to create the attention mask and the encoder representation of BERT using the Hugging Face library with PyTorch for emotion classification:

  • Dividing the collected data into training and testing sets using train-test split.

  • Converting the training set into respective torch tensors for the model.

  • Defining the batch size to create tensors and iterators to fine-tune the BERT model.

  • Training the BERT model using the model parameters and validating its accuracy.

  • Evaluating the performance of the model
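The steps listed above might look roughly like the sketch below. It is an illustration under assumptions, not the implementation used in this work: the checkpoint name `bert-base-uncased`, the hyperparameters (batch size 16, learning rate 2e-5, two epochs, maximum length 128), and all helper names are chosen only for this example. The code expects integer class labels and needs the `torch` and `transformers` packages, which are imported inside the functions that use them.

```python
import random


def split_train_test(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle the examples and split them into training and testing sets."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(texts, train_idx), pick(labels, train_idx),
            pick(texts, test_idx), pick(labels, test_idx))


def make_loader(tokenizer, texts, labels, batch_size, shuffle):
    """Tokenize texts and wrap input ids, attention masks, and labels in a DataLoader."""
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                            torch.tensor(labels))
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


def fine_tune_and_evaluate(texts, labels, num_labels=6, epochs=2, batch_size=16):
    """Fine-tune a BERT classifier and return its accuracy on the held-out split."""
    import torch
    from transformers import BertForSequenceClassification, BertTokenizerFast

    train_x, train_y, test_x, test_y = split_train_test(texts, labels)
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels)
    train_loader = make_loader(tokenizer, train_x, train_y, batch_size, shuffle=True)
    test_loader = make_loader(tokenizer, test_x, test_y, batch_size, shuffle=False)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):
        for input_ids, mask, y in train_loader:
            optimizer.zero_grad()
            loss = model(input_ids=input_ids, attention_mask=mask, labels=y).loss
            loss.backward()
            optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for input_ids, mask, y in test_loader:
            logits = model(input_ids=input_ids, attention_mask=mask).logits
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total
```

The attention mask returned by the tokenizer marks the padded positions, so that padding does not influence the encoder representation.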

4 Performance analysis

This section presents the implementation and experimental details of the work. The sentiment analysis on COVID-19 is evaluated using five metrics: Average Likes over the period, Average Re-tweets over the period, Intensity Analysis, Polarity & Subjectivity, and Wordcloud.

4.1 Experimental setup

We implemented the proposed model on an Intel Core i5 CPU at 2.40 GHz with 4 GB RAM and an IDE disk, running the Windows 10 64-bit operating system. We use Anaconda 2019.10, an open-source distribution for developing machine learning and artificial intelligence projects, with existing Python libraries such as Numpy, Pandas, and scikit-learn in Python 3.7 on Jupyter Notebook, together with an Integrated Development Environment (IDE) made for programming in Python. We use Twitter, a social media platform, to build our data set for the sentiment analysis on the coronavirus. We scraped the data from Twitter over an 80-day period between 20 Jan 2020 and 25 April 2020.

4.2 Results and discussion

In this section, we discuss in detail the results obtained from our experimental setup using the five aforementioned performance metrics.

Table 2 Percent values of the sentiments’ group for world data set
Table 3 Percent values of the sentiments’ group for India data set
Fig. 7 Multi-class classification of Sentiments

4.2.1 Number of likes and re-tweets over a period

We calculate the mean likes for each day, which gives the average likes over 80 days for both data sets. Figure 5a and b show the mean likes, while Fig. 6a and b show the mean re-tweets over the given period for the world and Indian data sets, respectively. After the coronavirus outbreak in Wuhan, China, the public around the world became curious about the coronavirus and how it was affecting people’s lives. In Figs. 5a and 6a, we observe large spikes in likes and re-tweets on 29 Jan 2020 and 9 Mar 2020, owing to the large number of deaths reported around the world. People from different countries were evacuated from China after the outbreak, and the USA also sent flights to bring back its citizens from China. As the evacuation plans were delayed, people followed the situation much more actively and, at the same time, interacted more through social media.

In Figs. 5b and 6b, a sizeable rise in likes and re-tweets can be seen on 26 Feb 2020 and 26 Mar 2020. The first peak, in February, corresponds to the time when India decided to deploy a raft of naval ships as well as military aircraft to deliver relief goods and evacuate its citizens from China; people reacted strongly with tweets at that time. The next peak came in March, when a three-week lockdown was announced for the entire country; hence we can observe a spike in likes there too.

Fig. 8 Calculated mean of polarity over the period

Fig. 9 Calculated mean of subjectivity over the period

Fig. 10 Wordcloud of the world data set

Fig. 11 Wordcloud of the India data set

Fig. 12 Classification using BERT

4.2.2 Intensity analysis

In Sect. 3.2.3, we described the seven categories used to analyse tweet intensity, which ranges between \(-1\) and \(+1\); values beyond the given ranges are treated as strong intensity. The selected range indicates the polarity of a tweet, that is, how readers will judge the tweet once they have read it. The intensity categories are neutral, weakly positive, mildly positive, strongly positive, weakly negative, mildly negative, and strongly negative. Tables 2 and 3 give the intensity values in numbers and percentages for each category for the world data set and the Indian data set, respectively. Figure 7 presents the same data for the two data sets as pie charts, which are easier to interpret and show the segment of each category.

From Tables 2 and 3, we observe that the world data set has more neutral sentiments than the Indian data set, whereas strongly negative tweets are more frequent in the Indian data set owing to hatred of the coronavirus in the initial stages. We can also observe that strongly positive tweets are much more frequent in the Indian data set than in the world data set because of the suitable and efficient actions taken by the Indian government to help people and control the pandemic.

4.2.3 Polarity and subjectivity

This metric helps to check the sentiments and their intensity over the 80 days. We take the mean value for each date, which gives 80 records, one per date. Figure 8 shows the polarity over the period for the world data set and the Indian data set, respectively. The figure reveals that the mean value of the world sentiments mostly ranges between 0.03 and 0.07 and does not drop below zero, whereas the mean value for the Indian data set drops below zero at various intervals. The mean value shows the most negative polarity during the lockdown phase, when the virus was spreading rapidly.

Figure 9 shows the mean subjectivity values over the period for the world data set and the Indian data set, respectively. We plot the mean subjectivity values using a scatter plot. These plots change the color of the markers according to the values on the y-axis in all four cases. We observe that in the first case the result stays within a narrow band and forms two segments, increasing gradually in March with some fluctuation before the behaviour returns to normal. In the second case, however, it varies over a much wider range, and in some cases it changes abruptly from one day to the next.

4.2.4 Wordcloud

After plotting all the numeric information from Twitter, we use the text data to plot wordclouds for three categories: neutral, positive, and negative. We create three different data sets using the sentiment value calculated previously. The tweet text is used to plot these clouds, and additional stop words are added to filter the words. Figure 10 shows the wordclouds made for the positive, neutral, and negative text sentiments of the world data set. The data set comprises several types of words that reflect people’s opinions and can be used to classify the text into the aforementioned categories. Some groups of words, with examples, are given below:

  • Words such as flu, coronavirus, and novel, along with many others, appear in all the wordclouds and do not belong to any one of the categories.

  • Words such as new cases, Wuhan, treatment, Chinese, Germany, Italy, confirmed, via, etc., are commonly used to express people’s opinions and reactions to the spreading virus. These words appear to be neither positive nor negative.

  • Words such as health, commodities, crew, charge, vaccine, God, save, hospital, etc., represent positiveness and signify the actions taken during the outbreak.

  • Words such as pandemic, death, fear, etc., can be classified as negative and show the issues faced by people during the crisis around the world.

  • Words representing actions, such as travel ban, fight back, infected with, etc., express strong emotions that may include hatred or anger towards China for being so careless about the pandemic.

  • Words such as worse, cry, etc., appear as the death rates started increasing in various parts of the globe.

Figure 11 shows the wordclouds for the positive, neutral, and negative text sentiments of the Indian data set. India had a mixed reaction to the coronavirus outbreak. Repeated words can be observed in the plots, and questions were constantly asked of the government about the pandemic. Whenever the infection rate increases, tweets with words such as new, infected, confirmed, deaths, etc., appear on a large scale in the wordcloud.

4.2.5 BERT implementation

Section 3.3 described how we fine-tune the BERT model and prepare data for training. We fetched pre-labeled data from a GitHub repository that worked with similar data (Savan 2020; Sun et al. 2019a). Figure 12a shows the six emotion groups and the corresponding labels assigned by the label encoder. Figure 12c shows the encoded labels for the actual and predicted values. We can observe that the actual label for serial number 10 is 1, i.e., fear, whereas the predicted label is 5, i.e., surprise; all the others are predicted correctly. Figure 12b shows a validation accuracy of 0.9389 (i.e., 93.89%). The MCC validation accuracy is based on the Matthews correlation coefficient (MCC), a widely used statistical measure that yields a high score only when the predictions are good across all classes. The performance can also be shown in terms of training quantities such as the training loss, plotted as a line chart. Overall, the evaluated accuracy indicates the strength of the proposed model for sentiment analysis.
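For reference, the multi-class form of the Matthews correlation coefficient can be computed from label counts as in the sketch below. This is a plain-Python illustration of the standard multi-class formula, not the evaluation code used in this work; the function name `multiclass_mcc` is introduced only for this example.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for two label sequences of equal length."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    true_counts = Counter(y_true)                     # occurrences of each true label
    pred_counts = Counter(y_pred)                     # occurrences of each predicted label
    labels = set(true_counts) | set(pred_counts)

    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in labels)
    spread_true = s * s - sum(n * n for n in true_counts.values())
    spread_pred = s * s - sum(n * n for n in pred_counts.values())
    denominator = sqrt(spread_true * spread_pred)
    # By convention, return 0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0
```

A value of 1 indicates perfect agreement between actual and predicted labels, 0 indicates agreement no better than chance, and negative values indicate disagreement.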

5 Conclusion and future scope

In this work, sentiment analysis is performed on Twitter data sets with the help of the BERT model. The data are divided, on the basis of the location of the tweets, into tweets by people of India and tweets by people in the rest of the world. The tweets were collected at a time when there was a lack of positiveness about the coronavirus around the world, which affected people’s personal and professional lives. At the same time, we observe that people from India communicate relatively more positively on Twitter and show less tendency towards spreading negativity. The emotion analysis indicates the success or failure of the measures adopted by the government of a country in various circumstances. Further, the observed efficacy of the measures taken for the people of a country can support the government in making more significant decisions to tackle the novel coronavirus. The overall performance of the proposed model, in terms of validation accuracy on the collected data sets, is approximately 94%.