
Arabic sentiment analysis of Monkeypox using deep neural network and optimized hyperparameters of machine learning algorithms

January 2024
Social Network Analysis and Mining 14(1)


DOI:10.1007/s13278-023-01188-4

Authors:

Hasan Gharaibeh

Yarmouk University

Rabia Emhamd Al Mamlook

Western Michigan University

Ghassan Samara

Zarqa University

Ahmad Nasayreh

Yarmouk University




Social Network Analysis and Mining (2024) 14:30

https://doi.org/10.1007/s13278-023-01188-4

ORIGINAL ARTICLE

Arabic sentiment analysis ofMonkeypox using deep neural network

andoptimized hyperparameters ofmachine learning algorithms

HasanGharaibeh1· RabiaEmhamedAlMamlook2,3· GhassanSamara4· AhmadNasayreh1· SajaSmadi1·

KhalidM.O.Nahar1· MohammadAljaidi4· EssamAl‑Daoud4· MohammadGharaibeh5· LaithAbualigah6,7,8,9,10,11,12

Received: 18 October 2023 / Accepted: 18 December 2023

Abstract

Sentiment analysis, a branch of natural language processing (NLP), has gained significant attention for its applications in various domains. This study focuses on utilizing machine learning and deep learning algorithms for sentiment analysis in the context of analyzing Monkeypox using Arabic sentiment text. The objective is to develop an accurate and efficient model capable of classifying Arabic text into sentiment categories, facilitating the understanding of public perceptions toward Monkeypox. The study begins by collecting a diverse dataset of Arabic text containing sentiments related to Monkeypox. Machine learning algorithms, such as Support Vector Machines, Naive Bayes, and Random Forest, along with deep learning (DNN) techniques, including Recurrent Neural Networks and Transformer models, are employed for sentiment classification. Hyperparameter optimization techniques were implemented to fine-tune the models for optimal performance. The impact of various hyperparameters on the model is assessed to select the best configuration. Experimental results demonstrate the effectiveness of the proposed sentiment analysis models in accurately classifying Arabic sentiment text related to Monkeypox. The DNN models based on Leaky ReLU showcased the significance of leveraging complex representations for NLP tasks with 92%. Hyperparameter optimization aids in selecting suitable configurations, improving model accuracy, and reducing overfitting. The findings from this study contribute to advancing sentiment analysis techniques in Arabic text and provide valuable insights into public sentiments toward Monkeypox. The developed models can be utilized in public health monitoring, crisis management, and policymaking, offering valuable insights into the sentiment landscape surrounding the disease.

Keywords Arabic sentiment analysis · Machine learning · Optimization · Hyperparameter tuning · DNN · NLP

1 Introduction

Sentiment analysis has become an important research topic in recent years. It analyzes people's opinions about a specific problem or topic and determines whether each opinion is negative, positive, or neutral. Information is gathered from social media platforms such as Facebook, Twitter, and YouTube (Cambria 2022). Natural language processing plays a significant role in this field through steps such as stop-word removal and stemming. Sentiment analysis is applied in various fields, including politics, economics, social issues, and health, and it allows companies to assess customer satisfaction with their products or services. Arabic adds complexity to the task because of its multiple dialects and morphological characteristics (El-Beltagy and Ali 2013).

One area where sentiment analysis can have a significant impact is health care and disease monitoring. Accurate and timely identification of sentiments related to specific diseases can help public health authorities and medical professionals understand public perception, identify potential outbreaks, and respond effectively. In this context, Monkeypox, a viral disease that affects humans and animals, presents a unique challenge. Monkeypox outbreaks can lead to panic, misinformation, and fear among the public. Monitoring sentiment related to Monkeypox can provide early warning signs, identify areas of concern, and guide public health interventions. Deep learning has recently been applied to sentiment analysis of Arabic-language datasets, but the number of such studies remains very limited. The dataset for this study was collected from Twitter, given the increased interaction on social media platforms. These platforms serve as a space for expressing opinions and exchanging information on various aspects of life. According to a study conducted by Arab social media outlets, the number of Arabic users on Twitter reached 11 million, and the number of tweets exceeded 849 million (Salem 2017).

This paper addresses Monkeypox by proposing a new sentiment analysis approach that uses machine learning and deep learning to identify the features that play a significant role in capturing sentiment from tweets. Tweets written in Arabic were classified into three categories: positive, neutral, and negative. The following models were used to conduct this study: Support Vector Machine (SVM), AdaBoost, XGBoost, and LGBM (Light Gradient Boosting Machine). A robust sentiment analysis methodology designed specifically for Arabic text related to Monkeypox makes it possible to understand public sentiment, address concerns swiftly, and counter the spread of false information. The results of this study can contribute to public health initiatives by equipping decision-makers with tools for effective disease management and control.

The paper is organized as follows: Sect. 2 reviews related previous work on sentiment analysis in the Arabic language. Section 3 gives a detailed explanation of the dataset and its preprocessing, and describes the models used in this study. Section 4 presents the results obtained after applying the proposed approach to the dataset.

2 Related work

Researchers have conducted extensive studies on sentiment analysis and disease monitoring, covering various aspects and employing different techniques. A study by Atoum and Nouman (2019) proposed a model for categorizing tweets in the Jordanian dialect into negative, neutral, and positive classes. They utilized Naive Bayes and Support Vector Machine (SVM) algorithms, with SVM achieving an accuracy of 82.1%. The study collected 1000 tweets from Twitter using a dedicated reading application.

Tan etal. (2023) proposed a novel model for sentiment

analysis. They applied emotional variance analysis on stu-

dent journals garnered from an experiential course, and

they found that EVA is helpful for proﬁling variations in

sentiment polarity, with results showing an accuracy of

88.7% using a Multi-Layer Perceptron (MLP) machine

learning model. Jain etal. (2023) presented two methods

to visualize the results in order to interpret the system’s

decisions. The study used sentiment analysis based on

Natural Language Processing. Valence Aware Diction-

ary for Sentiment Reasoning (“VADER”) and Locally

Interpretable Model-Agnostic Explanations (“LIME”) have

been used for visual justiﬁcation of the result, increasing the

understanding.

Rukhsar etal. (2023) performed experiments on a dataset

of 90,000 tweets relevant to the COVID-19 pandemic using

both deep learning and machine learning methods in order

to understand the psychological impact of the pandemic on

people. The deep learning model Long Short-Term Memory

(LSTM) and the Support Vector Machine (SVM) classiﬁer

both achieved 90% accuracy. Huang etal. (2023) detected

the techniques used for sentiment analysis (SA) in current

e-commerce platforms and the future directions for e-com-

merce. The paper has chosen 54 experimental papers for

review, 26 of which have employed machine learning tech-

niques. In contrast, 24 employed SA through deep learning

techniques, and four employed both machine learning and

deep learning techniques.

Rodríguez-Ibánez et al. (2023) conducted a comprehensive review of the multifaceted reality of sentiment analysis of any topic in social networks. Their paper reviews the domains where these techniques have been applied, including academic perspective, causal relationships, temporal dynamics, and applications in industry. Heikal et al. (2018) focused on deep learning approaches for sentiment analysis of Arabic tweets. They employed Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models to predict sentiment. Their study utilized the Arabic sentiment tweets dataset (ASTD), which contained 10,000 tweets categorized into negative, neutral, positive, and objective sentiments. The models achieved an F1-score of 64.46%.

Sayed et al. (2020) conducted a classification task on a dataset of hotel reviews from the Booking.com website. The dataset consisted of 6318 reviews written in various forms of the Arabic language. The study utilized nine models: Naive Bayes, K-Nearest Neighbor (KNN), Multi-layer Perceptron (MLP), Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Gradient Boosting (GB), Ridge Classifier (RC), and Logistic Regression (LR). The Ridge Classifier attained the highest accuracy of 95.21%. Alayba et al. (2018) introduced a sentiment analysis approach for Arabic text using CNN and LSTM models. Multiple datasets were utilized, including datasets related to Arabic health-care services, Twitter, and Arabic sentiment tweets. The study achieved accuracies of 94.24%, 95.68%, and 92% on the respective datasets.

Oussous et al. (2020) proposed an approach employing CNN, LSTM, Naive Bayes, and SVM models for sentiment analysis. Various preprocessing techniques, such as normalization, stop-word removal, stemming, and tokenization, were applied. The Moroccan Sentiment Analysis Corpus (MSAC), comprising 2000 reviews, was used for evaluation, resulting in an accuracy of 96%. Hadwan et al. (2022) presented an approach to determine user opinions on different applications in the Kingdom of Saudi Arabia. The researchers collected a dataset of 8000 reviews from social media platforms and Google Play, which was reduced to 7759 reviews after preprocessing. Multiple classifiers, including Naive Bayes, Decision Tree, SVM, and KNN, were employed. The approach achieved an accuracy of 78.46%.

Baker etal. (2020) conducted a study on the sentiment

classiﬁcation of Arabic tweets related to inﬂuenza. Multi-

ple classiﬁers, such as Decision Tree (DT), Support Vec-

tor Machine (SVM), Naive Bayes, and K-Nearest Neighbor

(KNN), were utilized. The dataset consisted of 6300 tweets

collected from Twitter. The Naive Bayes classiﬁer achieved

the highest accuracy of 89.06%. Gamal etal. (2019) focused

on analyzing emotions in various Arabic dialects. They

gathered a large collection of 151,000 tweets and classiﬁed

them into positive and negative sentiments. Support Vec-

tor Machine (SVM), Naive Bayes, Ridge Regression, and

AdaBoost models were applied, with the Ridge Regression

model achieving an impressive accuracy of 99.9%. In their

study, Mohammed and Kora (2019) proposed an advanced

approach utilizing deep learning techniques to analyze emo-

tions across multiple topics. They employed a dataset com-

prising 40,000 tweets collected from Twitter. Three classi-

ﬁers, namely, Convolutional Neural Network (CNN), Long

Short-Term Memory (LSTM), and RCNN, were utilized,

along with the utilization of word embeddings. The LSTM

model demonstrated the highest accuracy of 81.31%. Nota-

bly, the study observed an 8.3% increase in accuracy when

increasing the dataset size. Aloqaily etal. (2020) focused

on sentiment analysis in the context of the Syrian crisis

and civil wars using the Arabic tweets dataset. This data-

set consisted of 2000 tweets, and various machine learn-

ing algorithms, including Simple Logistic, Logistic Model

Trees (LMT), Support Vector Machine (SVM), K-Nearest

Neighbors (KNN), and Vote, were employed. The proposed

approach achieved commendable results, including an accu-

racy of 85.55%, an AUC of 86%, and an F1-score of 92%.

Hnaif etal. (2021) proposed an approach for sentiment

analysis, incorporating normalization, stemming, and stop-

word removal techniques. They utilized classiﬁers such as

Naive Bayes, Support Vector Machine (SVM), K-Nearest

Neighbor (KNN), Random Forest, and Decision Tree. The

database consisted of 6138 tweets and posts collected from

Facebook and Twitter at both local and international levels.

The Support Vector Machine achieved the highest F1-score,

reaching 83%.

In (Al-Tamimi etal. 2017), supervised models were

employed to classify comments on YouTube. The research-

ers collected a database from YouTube encompassing 8053

comments from 23 Arab countries. They utilized models

including K-Nearest Neighbor (KNN), Bernoulli Naive

Bayes, and Support Vector Machine with RBF kernel,

achieving an accuracy of 88.8%. Abu-Farha and Magdy

(2020) focused on the detection of sarcastic tweets using the

ArSarcasm database, which contained 10,547 tweets with

16% of them being sarcastic. They employed a model called

BiLSTM, which achieved an accuracy of 46%. Alwakid etal.

(2017) analyzed Arabic sentiments related to unemployment

in the Kingdom of Saudi Arabia using a dataset of 4000

tweets. They utilized Support Vector Machine (SVM) and

Naive Bayes classiﬁers, achieving an accuracy of 73%. In

the study by Alayba etal. (2017), tweets regarding Arab

opinions on health-care services were collected from Twit-

ter, resulting in a ﬁltered database of 2026 tweets. Various

machine learning models, including Logistic Regression,

Support Vector Machine (SVM), Naive Bayes, Stochas-

tic Gradient Descent, and Convolutional Neural Network

(CNN), were employed. The study achieved an accuracy of

90.14%. Table1 shows the previous studies closely to pro-

posed work.

The previous studies have made valuable contributions to sentiment analysis in different contexts, but they also have gaps and limitations that need to be addressed. One common limitation is the scarcity of data, which can reduce the accuracy of the sentiment analysis approach. This emphasizes the need for larger and more diverse datasets to improve the robustness of the models. A few studies also encountered problems with the suitability of the applied methods for the type of data being analyzed. Techniques must match the characteristics and nuances of the dataset to produce accurate sentiment analysis results. While these studies explored sentiment analysis in domains such as health care and social media, there is a gap in the literature on sentiment analysis of Arabic text about Monkeypox. Research on the Arabic language remains scarce, and Monkeypox has attracted wide attention regarding its cause and source, which has prompted both negative and positive opinions about the disease. This highlights an opportunity to fill the gap by developing a comprehensive sentiment analysis approach tailored to Arabic sentiment text related to Monkeypox.

This study aims to contribute in three ways. First, it improves disease monitoring: a robust sentiment analysis approach for Arabic text about Monkeypox enables early detection of potential outbreaks, identification of areas of concern, and proactive intervention strategies. Second, it offers insights into public sentiment toward Monkeypox. Understanding public opinions, emotions, and attitudes toward the disease is crucial for effective public health campaigns and communication strategies, and these insights can guide targeted interventions, address concerns, and dispel misinformation to foster trust and cooperation within affected communities. Finally, it provides a decision-making tool for public health professionals in disease management. By analyzing sentiment trends in Arabic text, authorities can prioritize resources, allocate personnel, and tailor intervention strategies to the observed sentiment patterns, which optimizes efforts and ensures efficient use of limited resources.

3 Methodology

This section describes the proposed DNN method used for classification and the ML algorithms used for comparison with it. Figure 1 shows the framework. It starts with data collection and preprocessing, which consists of removing duplicate values, replacing each emoji with text, removing Arabic stop words, removing non-Arabic words, removing diacritics, numbers, and punctuation, removing hashtags and links, and reducing words to their roots. Preprocessing ends with feature extraction from the text using CountVectorizer and TfidfTransformer, followed by SMOTE. Finally, the classical ML algorithms (Light Gradient Boosting, AdaBoost, XGBoost, and SVM) are applied to the data, differential evolution optimization is used with the ML algorithms, and then the proposed DNN approach is applied.

Table 1 Summary of the previous work

| References | Technique | Dataset | Best result |
|---|---|---|---|
| Atoum and Nouman (2019) | Naive Bayes and SVM | 1000 tweets collected from Twitter | Accuracy of 82.1% |
| Tan et al. (2023) | Multi-Layer Perceptron (MLP) machine learning model | Emotional data | Accuracy of 88.7% |
| Rukhsar et al. (2023) | LSTM and SVM | 90,000 tweets relevant to COVID-19 | Accuracy of 90% |
| Heikal et al. (2018) | LSTM and CNN | Arabic sentiment tweets dataset (ASTD) | F1-score: 64.46% |
| Sayed et al. (2020) | NB, KNN, MLP, RF, SVM, DT, GB, RC, and LR | 6318 reviews written in various forms of the Arabic language | 95.21% |
| Alayba et al. (2018) | LSTM and CNN | Arabic health-care services dataset, Twitter dataset, and Arabic sentiment tweets dataset | First dataset: 94.24%, second dataset: 95.68%, third dataset: 92% |
| Oussous et al. (2020) | CNN, LSTM, SVM, and Naive Bayes | Moroccan Sentiment Analysis Corpus (MSAC) | 96% |
| Hadwan et al. (2022) | Naive Bayes, DT, SVM, and KNN | Collected dataset of 7759 reviews | 78.46% |
| Baker et al. (2020) | KNN, Naive Bayes, SVM, and DT | 6300 tweets about influenza | 89.06% |
| Gamal et al. (2019) | SVM, Naive Bayes, Ridge Regression, and AdaBoost | 151,000 tweets collected in various Arabic dialects | 99.9% |
| Mohammed and Kora (2019) | LSTM, CNN, and RCNN | 40,000 tweets | 81.31% |
| Aloqaily et al. (2020) | LMT, KNN, SVM, Vote, and Simple Logistic | Arabic tweets dataset | Acc = 85.55%, F1 = 92%, and AUC = 86% |
| Hnaif et al. (2021) | SVM, Naive Bayes, KNN, Random Forest, and Decision Tree | 6138 tweets and posts collected from Facebook and Twitter | F1-score: 83% |
| Al-Tamimi et al. (2017) | Bernoulli NB, SVM-RBF, and KNN | 8053 comments from YouTube | 88.8% |
| Abu-Farha and Magdy (2020) | BiLSTM | ArSarcasm (10,547 tweets) | 46% |
| Alwakid et al. (2017) | SVM and Naive Bayes | 4000 tweets | 73% |
| Alayba et al. (2017) | Logistic Regression, Stochastic Gradient Descent, CNN, SVM, and Naive Bayes | 2026 tweets | 90.14% |


3.1 Data collection

The data used in this work were collected using the Twitter API and the Tweepy library. Tweepy, a Python module for accessing the Twitter API, provides developers with access to Twitter content such as tweets, retweets, and timestamps. Monkeypox was chosen because it is an epidemic that spread quickly and attracted considerable interest, especially in the Arab world, and Arabic tweets related to these polarizing events were selected for analysis. The data collection phase extended for eight months, from 28/7/2022 to 4/4/2023. In total, 4763 tweets were collected and labeled by language specialists as positive, negative, or neutral based on the meaning of the sentence: 1102 tweets were neutral, 944 were positive, and 2717 were negative. The data contained Modern Standard Arabic (MSA) and Arabic dialects, with dialects making up the majority. Some samples of the data with translations are shown in Table 2.
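The class counts reported in this section imply a noticeably imbalanced dataset, which is relevant to the oversampling step used later in the pipeline. The short check below uses only the counts stated above; the variable names are illustrative.

```python
# Class counts as reported in Sect. 3.1.
counts = {"negative": 2717, "neutral": 1102, "positive": 944}

total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # 4763
print(shares)  # {'negative': 57.0, 'neutral': 23.1, 'positive': 19.8}
```

The negative class accounts for more than half of the tweets, while the positive class accounts for roughly a fifth.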

3.2 Data preprocessing

In the realm of natural language processing (NLP), data preprocessing encompasses a series of steps taken to clean, convert, and prepare raw text data for analysis and modeling. The following subsections outline various preprocessing tasks specific to NLP.

3.2.1 Eliminating duplicate values

This stage entails identifying and eliminating redundant data. Duplicates arise when identical text appears multiple times within the dataset. Removing them ensures that each unique instance is represented only once. This step was performed twice, before and after the other preprocessing steps, because some tweets were duplicates that differed only in their emojis or hashtags. Once those differences were removed, the tweets became identical and could be detected as duplicates.
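A minimal sketch of why duplicate removal is run twice is shown below. The tweets are made-up examples, and `strip_hashtags` is a hypothetical stand-in for the full cleaning stage, not the authors' code.

```python
import re

def strip_hashtags(text):
    """Illustrative cleaning step: drop hashtag tokens and collapse whitespace."""
    return " ".join(re.sub(r"#\S+", " ", text).split())

def drop_duplicates(texts):
    """Remove exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(texts))

# Made-up tweets: the second is an exact copy, the third differs only by a hashtag.
tweets = [
    "جدري القرود ينتشر",
    "جدري القرود ينتشر",
    "جدري القرود ينتشر #صحة",
]

first_pass = drop_duplicates(tweets)                  # removes the exact copy
cleaned = [strip_hashtags(t) for t in first_pass]     # tweets now look identical
second_pass = drop_duplicates(cleaned)                # removes the hidden duplicate

print(len(tweets), len(first_pass), len(second_pass))  # 3 2 1
```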

Fig. 1 The proposed methodology of work

Table 2 Sample tweets with their translations and labels

| Tweet text | Translation of tweet text | Label |
|---|---|---|
| (Arabic text not preserved) | Investors are racing to acquire shares of companies that may help eradicate Monkeypox, which has become a source of concern, at a time when health authorities around the world intensify their search to stop the outbreak of the disease | Positive |
| (Arabic text not preserved) | New York declares a public emergency due to the high incidence of "monkeypox," and the mayor of the city confirms that nearly a thousand residents of the city are at risk of infection | Negative |
| (Arabic text not preserved) | There is nothing but a well-known observation of Monkeypox, and those who were before us knew it | Neutral |


3.2.2 Removing Arabic stop words

In NLP, it is customary to eliminate stop words to reduce data dimensionality and prioritize more meaningful words. Arabic stop words are specific to the Arabic language and comprise articles, prepositions, pronouns, and similar terms. By eliminating these stop words, we focus on the most pertinent content words.

3.2.3 Removing diacritics, numbers, punctuation, hashtags, and links

Diacritics encompass marks or symbols added to characters to indicate pronunciation or other linguistic features. Removing diacritics involves stripping these markings from the text. Removing numbers and punctuation serves to simplify the text and remove potentially irrelevant or distracting elements. Hashtags and links frequently appear in social media or web-based data, and their removal facilitates concentration on the textual content rather than metadata or web-related information.
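The removals described above can be sketched with regular expressions from the Python standard library. The patterns below are assumptions chosen for illustration (for example, a hashtag is removed as a whole token, and the diacritics range covers the common Arabic harakat plus tatweel); the paper does not list its exact rules.

```python
import re

# Assumed patterns, not taken from the paper.
LINKS = re.compile(r"https?://\S+|www\.\S+")
HASHTAGS = re.compile(r"#\S+")                              # whole hashtag token
DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")      # harakat, dagger alif, tatweel
DIGITS = re.compile("[0-9\u0660-\u0669]+")                  # Western and Arabic-Indic digits
PUNCTUATION = re.compile(r"[^\w\s]|_")                      # anything that is not a letter or space

def clean(text):
    """Apply the removals in an order that keeps URLs intact until they are dropped."""
    text = LINKS.sub(" ", text)
    text = HASHTAGS.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = DIGITS.sub(" ", text)
    text = PUNCTUATION.sub(" ", text)
    return " ".join(text.split())

sample = "مَرْحَبًا بكم #جدري_القرود https://example.com 2022!"
print(clean(sample))
```

Links are removed first because they contain punctuation and digits that the later patterns would otherwise break into fragments.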

3.2.4 Emoji extraction

Emojis are graphical symbols used to express emotions or convey ideas in written communication. Replacing each emoji with text enhances the interpretability of the data for NLP algorithms. In this work, we extracted all the emojis from the existing texts, defined for each emoji the Arabic word that represents it, and then replaced every emoji according to that definition.
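The extract, define, and replace procedure can be sketched as follows. The emoji-to-word dictionary and the Unicode ranges used to detect emojis are hypothetical; the paper's actual mapping is not given in this section.

```python
import re

# Approximate emoji detection: common pictograph blocks only (an assumption).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical mapping from emoji to an Arabic word that represents it.
EMOJI_TO_ARABIC = {
    "\U0001F622": "حزين",   # crying face -> "sad"
    "\U0001F600": "سعيد",   # grinning face -> "happy"
    "\U0001F637": "مريض",   # face with medical mask -> "sick"
}

def extract_emojis(texts):
    """Collect the distinct emojis that occur anywhere in the corpus."""
    found = set()
    for text in texts:
        found.update(EMOJI_PATTERN.findall(text))
    return found

def replace_emojis(text, mapping=EMOJI_TO_ARABIC):
    """Replace each mapped emoji with its word, padded so it becomes a separate token."""
    for emoji, word in mapping.items():
        text = text.replace(emoji, f" {word} ")
    return " ".join(text.split())

corpus = ["خبر سيئ \U0001F622", "الحمد لله \U0001F600"]
print(extract_emojis(corpus))
print([replace_emojis(t) for t in corpus])
```

In practice the set returned by `extract_emojis` would be reviewed by hand to write the dictionary, so that every emoji present in the data has a definition before replacement.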

Algorithm 1 Pseudo-code of preprocessing steps

Input: Dataset

Output: Cleaned data

1. Begin

2. data <- Remove Duplicates(data)

3. data <- Replace Emoji with Text(data)

4. data <- ExtractEmoji(data)

5. constants <- Load EmojiConstants()

6. data <- Replace Emoticons with Emoji(data)

7. data <- Remove Arabic Stopwords(data)

8. data <- Remove Non-Arabic Words(data)

9. data <- Remove Diacritics(data)

10. data <- Remove Numbers(data)

11. data <- Remove Hashtags(data)

12. data <- Remove Links(data)

13. data <- Remove Punctuations(data)

14. data <- ReduceWordsToRoots(data)

15. data <- Remove Duplicates(data)

16. End

3.2.5 Stemming words

Stemming converts words back to their base or root form, thereby reducing variations of the same word to a common representation. For instance, the Arabic words for "writing", "written", and "writes" would all be reduced to the root form meaning "write".

3.2.6 Extracting features from text using CountVectorizer, TfidfTransformer, and SMOTE

Extracting features from text entails converting textual data into numeric representations that can be processed by machine learning algorithms. CountVectorizer transforms text into an array indicating the frequency of each word in the text. TfidfTransformer calculates term frequency-inverse document frequency (TF-IDF) values, reflecting the significance of words in a document relative to the entire corpus. SMOTE (Synthetic Minority Oversampling Technique) (Chawla et al. 2002; Kovács et al. 2020) addresses class imbalance in datasets by oversampling the minority class with synthetic samples, thus enhancing the performance of classification models.
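To make the three steps concrete, the sketch below re-implements them in plain Python rather than calling the libraries. It assumes the smoothed IDF and L2 row normalization that scikit-learn's TfidfTransformer uses by default, and it reduces SMOTE to its core interpolation step with the neighbor supplied by the caller, whereas the real technique picks it among the k nearest minority samples. The documents are placeholder tokens.

```python
import math
import random
from collections import Counter

def count_matrix(docs):
    """Word-count matrix: one row per document, one column per vocabulary word."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for word, n in Counter(d.split()).items():
            row[index[word]] = n
        rows.append(row)
    return vocab, rows

def tfidf(rows):
    """TF-IDF with smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2-normalized rows."""
    n_docs = len(rows)
    df = [sum(1 for r in rows if r[j] > 0) for j in range(len(rows[0]))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    out = []
    for r in rows:
        weighted = [c * w for c, w in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        out.append([x / norm for x in weighted])
    return out

def smote_point(x, neighbor, rng):
    """Core SMOTE step: a synthetic sample on the segment between x and a neighbor."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]

docs = ["virus outbreak fear", "virus vaccine hope", "virus outbreak"]
vocab, counts = count_matrix(docs)
features = tfidf(counts)
synthetic = smote_point(features[1], features[1], random.Random(0))
print(vocab)
print([round(v, 3) for v in features[0]])
```

A word that appears in every document, such as "virus" here, receives the lowest IDF weight, so rarer words dominate each row.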

3.3 Machine learning algorithms

In this research, we built and evaluated multiple models using several machine learning techniques: LGBM, AdaBoost, XGBoost, and SVM. We split the dataset into 70% for training, 15% for validation, and 15% for testing. The default hyperparameter settings in sklearn were used for all classical machine learning algorithms. Then, the differential evolution optimization algorithm was used to raise accuracy by adjusting the hyperparameters of each machine learning algorithm. The following subsections discuss the implemented techniques.
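A minimal sketch of a 70/15/15 split is given below. The shuffling seed and the use of the raw tweet count are assumptions for illustration; the paper does not state whether the split was stratified or whether it was taken before or after oversampling.

```python
import random

def split_70_15_15(items, seed=0):
    """Shuffle indices once, then slice into train, validation, and test parts."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * len(items))
    n_val = int(0.15 * len(items))
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]   # remainder goes to test
    return train, val, test

# 4763 is the number of collected tweets; integers stand in for the samples.
train, val, test = split_70_15_15(list(range(4763)))
print(len(train), len(val), len(test))  # 3334 714 715
```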

3.3.1 Light gradient boosting machine (LGBM)

LightGBM is a gradient boosting learning algorithm that is similar to XGBoost in efficiency, speed, and performance. It is distinguished by reducing the amount of training data through a technique called Gradient-based One-Side Sampling (GOSS) and by using a leaf-wise tree growth method. These features set it apart from other boosting algorithms and increase its performance and efficiency in classification (Ke et al. 2017).

3.3.2 Adaptive boosting (AdaBoost)

AdaBoost is a boosting-based classification algorithm, like XGBoost. Its working principle is to build a strong classifier from a group of weak classifiers. In each iteration, the weights of the misclassified samples are adjusted so that they can be corrected in the next iteration, which contributes to enhancing efficiency and obtaining high accuracy (Margineantu and Dietterich 1997).
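The re-weighting idea can be illustrated with one round of the classical binary (discrete) AdaBoost update. This is a simplification: the study's task has three classes, and library implementations use a multi-class variant, so the function below is a sketch of the principle rather than the procedure used in the experiments.

```python
import math

def adaboost_round(weights, correct):
    """One re-weighting step of binary discrete AdaBoost.

    weights: current sample weights, summing to 1
    correct: booleans, True where the weak learner classified the sample correctly
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)          # vote of this weak learner
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    scaled = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(scaled)
    return alpha, [w / z for w in scaled]

# Four equally weighted samples; the weak learner gets the last one wrong.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.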

3.3.3 Support vector machine (SVM)

SVM is a learning algorithm for classification and regression. It separates the classes by finding the best hyperplane and improves generalization by widening the margin between the decision boundary and the nearest data points. SVM is also suitable for high-dimensional data (Jakkula 2011).

3.3.4 Extreme gradient boosting (XGBoost)

XGBoost is a scalable gradient boosting algorithm characterized by speed and performance. It builds an ensemble of weak decision trees to produce strong predictions. Trees are added iteratively to reduce the loss function, and regularization helps prevent overfitting (Chen et al. 2018).

3.3.5 Hyperparameter tuning using differential evolution

The differential evolution (DE) algorithm (Derviş and Selçuk 2004) is a heuristic method that offers three important advantages: it can find the true global minimum regardless of the initial parameter values, it converges quickly, and it requires only a few control parameters. DE is a population-based algorithm that is similar to genetic algorithms and uses the same operators: crossover, mutation, and selection. The main difference between genetic algorithms and DE lies in their approach to building better solutions. While genetic algorithms rely heavily on crossover, DE relies mainly on mutation.

The DE mutation process depends on the differences between pairs of solutions chosen at random from the population. This process acts as a search mechanism, while the selection process directs the search toward the most promising regions of the search space.

In DE, the optimization task is represented using D

parameters as a D-dimensional vector. The algorithm begins

by randomly generating a set of NP solvent vectors. These

vectors are then iteratively optimized by applying mutation,

crossover, and selection factors. This iterative process aims

to ﬁnd the best solution for the optimization problem in the

speciﬁed search space.

For each target vector $x_{i,G}$, a mutant vector $v_{i,G+1}$ is produced by Eq. (1), where $i, r_1, r_2, r_3 \in \{1, 2, \ldots, NP\}$ are randomly chosen and must be different from each other. In Eq. (1), F is the scaling factor applied to the difference vector $(x_{r_2,G} - x_{r_3,G})$, and K is the combination factor. DE also uses non-uniform crossover, which can take child vector parameters from one parent more often than from the other. Trial vectors are built from components of existing population members, so the recombination (crossover) operator efficiently shuffles information about successful combinations, making it easier to search for better regions of the solution space. The parent vector is mixed with the mutant vector to produce a trial vector $u_{j,i,G+1}$ according to Eq. (2), where $j = 1, 2, \ldots, D$; $rnd_j \in [0, 1]$ is a random number; $CR \in [0, 1]$ is the crossover constant; and $rn_i \in \{1, 2, \ldots, D\}$ is a randomly chosen index.

All solutions in the population have an equal chance of being selected as parents, regardless of their fitness value. The child produced by mutation and crossover is evaluated, its performance is compared with that of its parent, and the better of the two is kept. If the parent is still better, it remains in the population. Algorithm 2 shows the steps of DE optimization.

Algorithm2 Pseudo-code of DE optimization

1. Initialization

2. Repeat

3. Mutation

4. Crossover

5. Evaluation

6. Selection

7. Until (termination criteria are met)
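The loop in Algorithm 2 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function name `differential_evolution`, the control-parameter values (`np_size`, `F`, `K`, `CR`, `generations`), the clipping of trial vectors to the bounds, and the `sphere` test objective are all assumptions chosen for the example. The mutation and crossover lines follow Eq. (1) and Eq. (2).

```python
import random


def differential_evolution(objective, bounds, np_size=20, F=0.8, K=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize `objective` inside box `bounds`, following Algorithm 2."""
    rng = random.Random(seed)
    dim = len(bounds)
    # 1. Initialization: NP random solution vectors inside the search space.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_size)]
    fitness = [objective(x) for x in population]
    for _ in range(generations):  # 2. Repeat (fixed budget as termination criterion)
        for i in range(np_size):
            x_i = population[i]
            # r1, r2, r3 are distinct from each other and from i.
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            x_r1, x_r2, x_r3 = population[r1], population[r2], population[r3]
            # 3. Mutation, Eq. (1).
            v = [x_i[j] + K * (x_r1[j] - x_i[j]) + F * (x_r2[j] - x_r3[j])
                 for j in range(dim)]
            # 4. Crossover, Eq. (2): take v_j if rnd_j <= CR or j == rn_i.
            rn_i = rng.randrange(dim)
            u = [v[j] if (rng.random() <= CR or j == rn_i) else x_i[j]
                 for j in range(dim)]
            # Assumption (not in the paper): clip the trial vector to the bounds.
            u = [min(max(u[j], bounds[j][0]), bounds[j][1]) for j in range(dim)]
            # 5-6. Evaluation and selection: keep the better of child and parent.
            f_u = objective(u)
            if f_u <= fitness[i]:
                population[i], fitness[i] = u, f_u
    best = min(range(np_size), key=lambda k: fitness[k])
    return population[best], fitness[best]


def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

For hyperparameter tuning, `objective` would instead return a validation error for the model built from the candidate vector; selection stays the same because it only compares objective values.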

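$$v_{i,G+1} = x_{i,G} + K \cdot (x_{r_1,G} - x_{i,G}) + F \cdot (x_{r_2,G} - x_{r_3,G}) \tag{1}$$

$$u_{j,i,G+1} = \begin{cases} v_{j,i,G+1} & \text{if } (rnd_j \le CR) \text{ or } j = rn_i \\ x_{j,i,G} & \text{if } (rnd_j > CR) \text{ and } j \ne rn_i \end{cases} \tag{2}$$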

Table 3 Hyperparameters tuning for all models

| ML model | Best hyperparameters |
| --- | --- |
| SVM | C: 7.59; Kernel: rbf; Gamma: 0.68 |
| AdaBoost | Learning rate: 0.319; n_estimators: 300 |
| XGBoost | Learning rate: 0.433; max_depth: 9 |
| LGBM | Learning rate: 0.542; max_depth: 8 |


Table3 shows hyperparameter tuning for different

machine learning models (SVM, AdaBoost, XGBoost,

and LGBM) and identiﬁed the best hyperparameters for

each model.
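DE works on real-valued vectors, so a mixed set of hyperparameters such as the SVM row of Table 3 has to be decoded from such a vector before a model can be built. The sketch below shows one possible encoding. The candidate kernel list `KERNELS`, the search ranges in `BOUNDS`, and the helper `decode` are hypothetical; the paper does not state the ranges or the encoding it used.

```python
# Hypothetical candidate kernels; the categorical choice is encoded as a real index.
KERNELS = ["linear", "poly", "rbf", "sigmoid"]

# Hypothetical search ranges for (C, gamma, kernel index).
BOUNDS = [(0.01, 10.0), (0.001, 1.0), (0.0, len(KERNELS) - 1e-9)]


def decode(vector):
    """Map a real-valued DE vector to an SVM hyperparameter dictionary."""
    # Keep every component inside its search range.
    clipped = [min(max(v, lo), hi) for v, (lo, hi) in zip(vector, BOUNDS)]
    c, gamma, kernel_pos = clipped
    # Truncating the third component selects one of the candidate kernels.
    return {"C": c, "gamma": gamma, "kernel": KERNELS[int(kernel_pos)]}


params = decode([7.59, 0.68, 2.4])
```

With this encoding, the vector `[7.59, 0.68, 2.4]` decodes to the SVM configuration reported in Table 3.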

3.4 Proposed DNN approach

An artificial neural network (ANN) was used as the computational model; ANNs are inspired by the properties of biological neural networks and bring learned intelligence into our proposed method. A feed-forward neural network (FFN), a type of ANN, is represented as a directed graph that passes information along the edges from one node to another without forming a cycle (Ullah and Mahmoud 2022). We adopt a multi-layer perceptron (MLP), a kind of FFN, consisting of one input layer, three hidden layers, and an output layer with three labels; each layer contains many neurons, or units in mathematical notation. The number of hidden layers was chosen experimentally. Information is passed forward from one layer to the next, with the neurons in each layer fully connected, as shown in Fig. 2. The input has 7008 dimensions, which is the number of features, and the input fully connected layer maps it to 512 units. Three hidden fully connected layers follow, with shapes (512, 256), (256, 128), and (128, 32), and the final fully connected output layer has shape (32, 3), where 3 is the number of classes. Every fully connected layer is followed by an activation function. Figure 2 shows the general architecture of the DNN.

Fig. 2 DNN architecture

Fig. 3 Flowchart for ReLU, Leaky ReLU, and ReLU6
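The size of this architecture can be checked with a short calculation. The sketch below counts trainable parameters for the five fully connected layers listed above, assuming each layer has one weight per input-output pair plus one bias per output unit (the usual linear-layer convention, which the paper does not state explicitly). The names `LAYER_SHAPES` and `count_parameters` are introduced only for this example.

```python
# (inputs, outputs) of each fully connected layer, as listed in the text.
LAYER_SHAPES = [(7008, 512), (512, 256), (256, 128), (128, 32), (32, 3)]


def count_parameters(shapes):
    """Weights plus biases for each fully connected layer."""
    return [n_in * n_out + n_out for n_in, n_out in shapes]


per_layer = count_parameters(LAYER_SHAPES)
total = sum(per_layer)
# per_layer -> [3588608, 131328, 32896, 4128, 99]; total -> 3757059
```

Under this assumption, about 95% of the parameters sit in the first layer, because the 7008-dimensional input dominates the model size.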

The MLP is defined mathematically as a mapping $\mathbb{R}^m \rightarrow \mathbb{R}^n$, where m is the size of the input $x = (x_1, x_2, x_3, \ldots, x_m)$ and n is the size of the output. Each unit $h_i(x)$ is defined mathematically as follows:

$$h_i(x) = f\left(w_i^{T} x + b_i\right) \tag{3}$$

where $w_i$ and $b_i$ are the weight vector and bias of unit i, and f is the ReLU, Leaky ReLU, or ReLU6 activation function. The three functions are used in three separate experiments, both to mitigate the vanishing-gradient problem and to compare them. The chosen function is applied after every layer. Stacking hidden layers in this way is typically called a deep neural network (DNN), as shown in Fig. 2. The activation functions are defined in the equations below and illustrated in Fig. 3.

ReLU (Rectified Linear Unit): ReLU activates only for positive input values and sets negative values to zero. Its mathematical expression is as follows:

$$\mathrm{ReLU}(x) = \max(0, x) \tag{4}$$

The SReLU variant is like ReLU but introduces a piecewise linear curve with two additional parameters (alpha and beta) that allow it to have different slopes for positive input values, enabling more flexibility in modeling complex data distributions (Gustineli 2022).

Leaky ReLU: Leaky ReLU is a variant of the traditional ReLU that addresses a potential issue called “dying ReLU.” In ReLU, neurons with negative inputs output zero, which effectively deactivates them. If a neuron is deactivated for all inputs during training, it may become stuck in that state and stop updating its weights, so the neuron “dies” and no longer contributes to the network’s learning process (Gustineli 2022; Apicella et al. 2021). The mathematical expression for Leaky ReLU is as follows:

$$\mathrm{LeakyReLU}(x) = \max(0.1x, x) \tag{5}$$

Instead of setting negative values to zero, Leaky ReLU applies a small slope (0.1 in this case) to negative inputs. This slope lets the neuron carry a small gradient even for negative inputs, preventing the “dying ReLU” problem and promoting better learning in deeper neural networks. Figure 3 shows a flowchart for ReLU, Leaky ReLU, and ReLU6.

ReLU6: ReLU6 is another variant of the ReLU activation function that bounds the output at a maximum value, usually 6 (Kim et al. 2021). The mathematical expression for ReLU6 is as follows:

$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6) \tag{6}$$

It behaves like a regular ReLU for inputs between 0 and 6, passing them through unchanged, and clips the output to 6 when the input is greater than 6. This bounded behavior can help prevent extremely large activations that might cause numerical instability or other issues in certain neural network architectures. Algorithm 3 gives the pseudo-code of a simple neural network model with a Leaky ReLU activation function, designed for a multi-class classification problem with three output classes.
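The three activation functions compared in the experiments are simple enough to write out directly. The following scalar sketch mirrors the definitions of ReLU, Leaky ReLU (with the slope of 0.1 stated in the text), and ReLU6; the function names are chosen for this example, and a real model would apply the same rules element-wise to tensors.

```python
def relu(x):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def leaky_relu(x, slope=0.1):
    """Leaky ReLU: max(slope * x, x); valid as written for 0 < slope < 1."""
    return max(slope * x, x)


def relu6(x):
    """ReLU6: ReLU with the output clipped at 6."""
    return min(max(0.0, x), 6.0)


# Compare the three functions on a negative, a mid-range, and a large input.
samples = [-2.0, 3.0, 8.0]
table = [(x, relu(x), leaky_relu(x), relu6(x)) for x in samples]
```

The three functions differ only outside the interval [0, 6]: Leaky ReLU keeps a small negative output for negative inputs, and ReLU6 saturates for large inputs.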


Algorithm3 Pseudo-code of neural net model

Input:

X_train (input data), shape (n_samples, n_features)

Output:A deep neural network model: NeuralNet

1. Begin

//Data Collection: Data were gathered using Twitter API and Tweepy library which are 4763 tweets divided

to 1102 tweets were neutral, 944 tweets were positive and 2717 were negative)

2. Data <- Preprocessing

3. function NeuralNet():

4. fc1 <- Linear layer with input size matching the number of features and output size 512

5. fc2 <- Linear layer with input size 512 and output size 256

6. fc3 <- Linear layer with input size 256 and output size 128

7. fc4 <- Linear layer with input size 128 and output size 32

8. fc5 <- Linear layer with input size 32 and output size 3

//(3 output classes)

9. forward function:

10.Input: x

11.Output: x

12.x = Leakyrelu(fc1(x))

13.x = Leakyrelu (fc2(x))

14.x = Leakyrelu (fc3(x))

15.x = Leakyrelu (fc4(x))

16.x = fc5(x)

17.x = Leakyrelu (x)

18.return x

19.return forward

20.function train(X_train):

21.model = NeuralNet()

22.optimizer = Adam(model.parameters())

23.criterion = CrossEntropyLoss()

24.for epoch in range(num_epochs):

25.optimizer.zero_grad()

26.outputs = model.forward(X_train)

27.loss = criterion(outputs, y_true)

28.loss.backward()

29.optimizer.step()

30.function test(X_test):

31.model = NeuralNet()

32.outputs = model.forward(X_test)

33. predicted_labels = argmax(outputs, axis=1)

34.return predicted_labels

35.

End
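Two operations in Algorithm 3 are easy to misread: the cross-entropy loss is computed on the raw network outputs, and the predicted label is the index of the largest output. The plain-Python sketch below illustrates both for a single sample. It assumes the softmax-based cross-entropy used by common deep learning libraries; the helper names and the example output vector are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw outputs to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


def predict(logits):
    """Index of the largest output, as in the argmax step of the test function."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Hypothetical network outputs for one tweet over the three sentiment classes.
outputs = [2.0, 0.5, -1.0]
predicted = predict(outputs)
loss_if_correct = cross_entropy(outputs, 0)
loss_if_wrong = cross_entropy(outputs, 2)
```

The loss is small when the true class has the largest output and grows as the network assigns it less probability, which is what drives the weight updates in the training loop.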

3.5 Performance models

Performance evaluation of machine learning models is a crucial step in assessing their effectiveness on specific tasks. In this research, model evaluation is based on several performance metrics: accuracy, precision, recall, and F1-score, with a preference for F1-score and accuracy.
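One reason to report F1-score alongside accuracy is class imbalance. The sketch below uses the class counts given in Algorithm 3 (1102 neutral, 944 positive, and 2717 negative tweets) to compute the accuracy of a trivial classifier that always predicts the largest class. It assumes that the evaluation data preserve these proportions, which the paper does not state; the variable names are chosen for this example.

```python
# Class counts of the collected dataset, as reported in Algorithm 3.
CLASS_COUNTS = {"neutral": 1102, "positive": 944, "negative": 2717}

total = sum(CLASS_COUNTS.values())
shares = {label: count / total for label, count in CLASS_COUNTS.items()}

# A classifier that always predicts the most frequent class.
majority_label = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
majority_baseline_accuracy = CLASS_COUNTS[majority_label] / total
```

Under that assumption, always predicting the negative class would already reach an accuracy of about 0.57, so reported accuracies are best read against this baseline rather than against zero.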

Accuracy: It is the most widely used metric, and probably the first choice, for evaluating an algorithm's performance on classification problems. It is defined as the ratio of correct classifications to all classifications, correct or incorrect (Eusebi 2013):

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{7}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

Precision: It measures how many of the observations that the algorithm predicted as positive are actually positive (Juba and Le 2019). Precision is equal to the number of true positives divided by the total number of true positives and false positives:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$


Recall: It measures how many of the actual positive observations the algorithm predicted correctly. Recall equals the number of true positives divided by the total number of true positives and false negatives (Otten et al. 2005):

$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$

F1-score: This metric, often known as the F-score, combines the precision and recall of an algorithm to evaluate its performance. It is defined as the harmonic mean of precision and recall (Fourure et al. 2021) and is the preferred measure for some classification problems:

$$\text{F1-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{10}$$

4 Results and discussion

In this section, we present a comprehensive exploration of the performance of various machine learning algorithms, with and without differential evolution optimization, in comparison with the proposed DNN model. The performance of each model is evaluated based on accuracy,

Table 4 Performance of DNN and ML algorithms with DE and without it

| Type of algorithm | Model | Avg precision | Avg recall | Avg F1-score | Accuracy |
| --- | --- | --- | --- | --- | --- |
| ML algorithm without DE | SVM | 0.86 | 0.86 | 0.86 | 0.86 |
| | LGBM | 0.85 | 0.85 | 0.85 | 0.86 |
| | AdaBoost | 0.74 | 0.73 | 0.73 | 0.73 |
| | XGBoost | 0.85 | 0.85 | 0.85 | 0.85 |
| ML algorithm with DE | SVM | 0.90 | 0.90 | 0.90 | 0.90 |
| | LGBM | 0.85 | 0.85 | 0.85 | 0.86 |
| | AdaBoost | 0.80 | 0.79 | 0.79 | 0.79 |
| | XGBoost | 0.87 | 0.86 | 0.86 | 0.87 |
| Proposed DNN | DNN + ReLU | 0.9109 | 0.9091 | 0.9105 | 0.9125 |
| | DNN + Leaky ReLU | 0.9169 | 0.9165 | 0.9167 | 0.9182 |
| | DNN + ReLU6 | 0.9109 | 0.9104 | 0.9106 | 0.9125 |

Fig. 4 Performance comparison of ML algorithm

precision, recall, and F1-score; the results of the classical ML algorithms are shown in Table 4. Our goal is to shed light on how these algorithms perform and to understand the factors that contribute to their effectiveness. To achieve this, we divide our analysis into three parts: machine learning (ML) algorithms without optimization, ML algorithms with optimization, and Deep Neural Network (DNN) algorithms, as shown in Table 4. Each category represents a distinct approach to the prediction task.
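The averaged metrics used throughout this section can be derived from a multi-class confusion matrix. The sketch below computes accuracy and macro-averaged precision, recall, and F1-score for three classes. It assumes that the averages reported in Table 4 are macro averages, which the paper does not state, and the confusion matrix `cm` is invented for illustration and is not taken from the study.

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k; rows are true labels, columns predictions."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(len(cm))) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                             # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_report(cm):
    """Accuracy plus macro-averaged (precision, recall, F1)."""
    n = len(cm)
    rows = [per_class_metrics(cm, k) for k in range(n)]
    accuracy = sum(cm[k][k] for k in range(n)) / sum(sum(row) for row in cm)
    macro = tuple(sum(column) / n for column in zip(*rows))
    return accuracy, macro


# Hypothetical 3-class confusion matrix (neutral, positive, negative).
cm = [[50, 5, 5],
      [4, 40, 6],
      [6, 4, 80]]
accuracy, (macro_precision, macro_recall, macro_f1) = macro_report(cm)
```

Because macro averaging weights every class equally, accuracy and the macro-averaged scores can differ when the classes are imbalanced, which is one way small gaps between columns such as those in Table 4 can arise.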

The first category covers machine learning algorithms that run without any optimization. This phase of our study assesses the inherent capabilities of these algorithms and identifies those with the most promising metrics.

Four algorithms, Support Vector Machine (SVM), LightGBM (LGBM), AdaBoost, and XGBoost, are evaluated. SVM, a widely employed classification algorithm, performs consistently across all metrics, with an average precision, recall, F1-score, and accuracy of 0.86, as shown in Fig. 4. Even without tuning, SVM stands out for its reliability and well-balanced results. LGBM is similarly reliable, with an average precision, recall, and F1-score of 0.85 and an accuracy of 0.86 (Table 4), which shows that it can generate dependable predictions without extensive optimization.

In contrast, AdaBoost reaches an average precision of 0.74 and a recall, F1-score, and accuracy of 0.73, clearly trailing the other algorithms. AdaBoost may still be a viable option for tasks that suit its strengths. XGBoost, recognized for its ensemble approach, achieves an average precision, recall, F1-score, and accuracy of 0.85, which makes it suitable for tasks that demand well-balanced and reliable predictions.

In the second category, we examine the same machine learning algorithms with optimization, to understand its impact on performance. Figure 5 shows the four algorithms, SVM, LGBM, AdaBoost, and XGBoost, after optimization. SVM gains the most in absolute level, reaching an average precision, recall, F1-score, and accuracy of 0.90, which confirms its standing as a robust contender. LGBM is unchanged by optimization, with an average precision, recall, and F1-score of 0.85 and an accuracy of 0.86, which indicates stable performance.

Fig. 5 Performance comparison of ML algorithm with optimization

Optimization also raises AdaBoost's metrics, to an average precision of 0.80 and a recall, F1-score, and accuracy of 0.79 (Table 4), although it remains the weakest of the four algorithms. Similarly, XGBoost improves after optimization, reaching an average precision and accuracy of 0.87 and a recall and F1-score of 0.86.
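The effect of DE on each classical model can be read directly from the accuracy column of Table 4. The short sketch below tabulates the change in accuracy; the dictionary and variable names are chosen for this example, and the numbers are copied from Table 4.

```python
# Accuracy without and with DE, taken from Table 4.
ACCURACY = {
    "SVM": (0.86, 0.90),
    "LGBM": (0.86, 0.86),
    "AdaBoost": (0.73, 0.79),
    "XGBoost": (0.85, 0.87),
}

# Absolute accuracy gain per model, rounded to the precision of the table.
gains = {model: round(after - before, 2) for model, (before, after) in ACCURACY.items()}
largest_gain_model = max(gains, key=gains.get)
best_model_after_de = max(ACCURACY, key=lambda model: ACCURACY[model][1])
```

AdaBoost gains the most in absolute terms (0.06), while SVM remains the most accurate classical model after optimization (0.90) and LGBM does not change.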

In the final phase of our analysis, shown in Fig. 6, we turn to the Deep Neural Network (DNN) models and assess their performance under different activation functions. The first variant, DNN + ReLU, achieves an average precision of 0.9109, recall of 0.9091, F1-score of 0.9105, and accuracy of 0.9125. DNN + Leaky ReLU performs best, with an average precision of 0.9169, recall of 0.9165, F1-score of 0.9167, and accuracy of 0.9182, which highlights the benefit of the Leaky ReLU activation function. DNN + ReLU6 attains an average precision of 0.9109, recall of 0.9104, F1-score of 0.9106, and accuracy of 0.9125, close to the ReLU results.

Fig. 6 Performance comparison of DL algorithm with optimization

Fig. 7 Receiver operating characteristic (ROC) for Leaky ReLU

The proposed model obtained higher accuracy than the other models. Across the three experiments, the highest accuracy was achieved with Leaky ReLU, as shown in Table 4, followed by ReLU6 and ReLU, which tie on accuracy (0.9125), with ReLU6 marginally ahead on recall and F1-score. Even the lowest-scoring DNN variant outperformed all of the machine learning models. Leaky ReLU is superior to ReLU and ReLU6 because it avoids the dying ReLU problem: instead of converting negative values into zeros and leaving some neurons inactive, it maps them to small negative values. A slight slowdown in computation can occur as a result.

Fig. 8 Confusion matrix for DNN with the activation functions used

By examining the box plot in the figure, we can gain insight into the differences in performance and in distribution between the machine learning models for each metric. The proposed DNN model has the best results. Figure 8 displays the confusion matrix of the DNN model with the three activation functions. The confusion matrix shows a large number of true positives for certain categories, indicating that the model predicted these cases correctly. As shown in Fig. 7, the AUC value of 0.97 indicates that the model is highly effective at differentiating between positive and negative instances: it has a high true-positive rate (sensitivity) and a low false-positive rate (1 − specificity).
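The AUC can be interpreted as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it that way for a binary problem. For a three-class task such as this one, the same computation would be applied per class in a one-vs-rest fashion, which is an assumption about how the reported AUC was obtained; the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = class of interest) and model scores for six tweets.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

In this toy example, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9; a value of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking.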

As shown in Fig. 8, 41 instances were falsely classified as negative sentiment when they should have been categorized as neutral or positive. This could be attributed to the model's difficulty in recognizing subtle differences in text that indicate a more neutral or negative sentiment. The misclassifications may result from complex language usage, sarcasm, or nuanced expressions that the model struggled to interpret accurately. As the field of NLP continues to advance, minimizing false positives will play a significant role in realizing the potential of sentiment analysis in diverse applications, from market research to customer support and beyond.

When comparing our study to previous research (Ke et al. 2017; Margineantu and Dietterich 1997; Jakkula 2011; Chen et al. 2018; Derviş and Selçuk 2004; Ullah and Mahmoud 2022; Gustineli 2022; Apicella et al. 2021; Kim et al. 2021; Eusebi 2013; Juba and Le 2019; Otten et al. 2005; Fourure et al. 2021), certain distinctive features come to light. Our study stands out for its specific emphasis on Arabic Monkeypox tweets and its use of a Deep Neural Network (DNN) approach, as shown in Table 5. Whereas preceding research covered a variety of diseases and languages, our focused investigation of a specific disease within a particular linguistic community shows how Monkeypox is perceived and discussed among Arabic speakers. This focus reveals cultural and linguistic nuances that can offer deeper insight into public perceptions of the outbreak.

Some of the previous studies report higher accuracy than ours. We attribute this to the small amount of data we collected and its restriction to a single topic, Monkeypox, in contrast with previous research that worked on datasets covering broader fields. In addition, the strong similarity between positive, negative, and neutral tweets affects the classification process. The data are real tweets written in colloquial Arabic from more than one Arab country, so they span diverse Arabic dialects; this makes them harder to work with, and the extracted features are more complex.

Table 5 Performance of related work on Arabic and English sentiment analysis datasets

| References | Methodology | Application language | Accuracy (%) |
| --- | --- | --- | --- |
| Abdurrahim et al. (2022) | Gaussian Naïve Bayes, Bernoulli Naïve Bayes, and Naïve Bayes Multinomial | English | 80 |
| Mohbey et al. (2022) | CNN and LSTM | English | 94 |
| Bengesi et al. (2023) | Random Forest, Logistic Regression, Multi-layer Perceptron (MLP), SVM, KNN, Naïve Bayes, and XGBoost | English | 93.48 |
| Oscar et al. (2017) | ML classifiers | English | 95.15 |
| Chintalapudi et al. (2021) | Logistic regression, SVM, and LSTM | English | 87.33 |
| Dangi et al. (2022) | Decision Tree, SVM, Logistic Regression, Random Forest, and Multinomial NB | English | 98 |
| Iparraguirre-Villanueva et al. (2023) | CNN and LSTM | English | 88 |
| Musleh et al. (2022) | SVM, Random Forest (RF), Logistic Regression (LR), KNN, AdaBoost, and Naïve Bayes (NB) | Arabic | 82.39 |
| Al-Musallam and Al-Abdullatif (2022) | SVM, KNN, Logistic Regression (LR), and Naïve Bayes | Arabic | 82 |
| Baker et al. (2020) | Naive Bayes, SVM, KNN, and Decision Trees | Arabic | 89.06 |
| Aljameel et al. (2021) | SVM, KNN, and Naïve Bayes | Arabic | 85 |
| Waheeb et al. (2020) | Random Forest, SVM, CNN (LSTM), RNN (LSTM), and Naïve Bayes | Arabic | 89 |
| Boulesnane et al. (2022) | SVM, BiLSTM, GRU, and LSTM | Arabic | 89.8 |
| Alabid and Katheeth (2021) | Support Vector Machine (SVM) and Naive Bayes | Arabic | 81 |
| Our research | DNN | Arabic | 92 |

In previous work, most researchers relied on ML alone and did not extend their methodology. The proposed approach is therefore distinct from the rest: it includes effective preprocessing to prepare the data for training, a DNN evaluated with three different activation functions, and DE to help tune the hyperparameters.

Furthermore, the adoption of a Deep Neural Network (DNN) approach places our research among contemporary methods for analyzing social media data. DNNs have proven adept at uncovering intricate patterns and connections within data, which makes them well suited to capturing the subtleties of language and sentiment in social media discussions. Our accuracy of 92% lies within the range observed in prior studies and exceeds that of every Arabic-language study listed in Table 5, which indicates that the chosen DNN approach is on par with, or potentially surpasses, the methods used in earlier investigations. This is a meaningful result given the difficulty of interpreting social media data, and it supports the credibility and validity of our findings. Our research also has implications for the wider field of disease-related social media analysis. By focusing on Arabic Monkeypox tweets, our study examines a little-explored facet of disease discussions. This improves our understanding of public perceptions and sentiments and provides insights for shaping public health interventions and tailoring policies to the Arabic-speaking population. The distinctive aspects of our research, namely its focus on Arabic Monkeypox tweets, its DNN methodology, and its competitive accuracy, jointly position it as a valuable addition to the field of disease-related social media analysis. These strengths underscore its capacity both to deepen our understanding of disease discussions within the Arabic-speaking community and to provide actionable insights for health-care professionals and researchers.

5 Conclusion

This study applied sentiment analysis to Arabic text about Monkeypox, with an emphasis on hyperparameter optimization for machine learning and deep learning algorithms. The deep learning model based on Leaky ReLU achieved an accuracy of 92% and outperformed the traditional machine learning algorithms, highlighting the importance of leveraging complex representations for NLP tasks. By gaining insights into public sentiments toward Monkeypox, health-care authorities and policymakers can make more informed decisions and address public concerns effectively. This research contributes to advancing sentiment analysis techniques for Arabic text and sheds light on the potential applications of sentiment analysis to public health issues. As the field of sentiment analysis continues to evolve, these models can be further refined to yield more accurate and efficient results, ultimately aiding decision-making processes and improving public health outcomes.

Limitation: The main limitations of this research are the difficulty of collecting data, which takes a long time, and the scarcity of data, given that Monkeypox is still a new topic. In addition, preprocessing the data took a great deal of time and effort because of the difficulty of processing the Arabic language. In the future, we plan to collect a larger dataset and to include more topics in order to understand people's feelings about them.

Author contributions HG worked in software, resources, writing (original draft), supervision, methodology, conceptualization, formal analysis, and review and editing. REAM worked in supervision, methodology, conceptualization, and writing (original draft). GS helped in formal analysis and writing (review and editing). AN helped in formal analysis and writing (review and editing). SS helped in formal analysis and writing (review and editing). KMON helped in formal analysis and writing (review and editing). MA helped in formal analysis and writing (review and editing). EAD helped in formal analysis and writing (review and editing). MG helped in formal analysis and writing (review and editing). LA helped in formal analysis and writing (review and editing). All authors read and approved the final paper.

Funding Not applicable.

Data availability Data are available from the authors upon reasonable request.

Declarations

Conflict of interest The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent Informed consent was obtained from all individual participants included in the study.

References

Abdurrahim A, Syafa’ah L, Lestandy M (2022) Sentiment analysis of Covid-19 vaccine tweets utilizing Naïve Bayes. AIP Conference Proceedings, vol. 2453. https://doi.org/10.1063/5.0094607


Abu-Farha I, Magdy W (2020) From Arabic sentiment analysis to sarcasm detection: the ArSarcasm dataset, Aclweb.Org, European L, pp 32–39

Alabid NN, Katheeth ZD (2021) Sentiment analysis of twitter posts related to the covid-19 vaccines. Indones J Electr Eng Comput Sci 24(3):1727–1734. https://doi.org/10.11591/ijeecs.v24.i3.pp1727-1734

Alayba AM, Palade V, England M, Iqbal R (2017) Arabic language sentiment analysis on health services, pp 114–118. https://doi.org/10.1109/asar.2017.8067771

Alayba AM, Palade V, England M, Iqbal R (2018) A combined CNN and LSTM model for Arabic sentiment analysis. In: Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 11015 LNCS, pp 179–191. https://doi.org/10.1007/978-3-319-99740-7_12

Aljameel SS et al (2021) A sentiment analysis approach to predict an individual’s awareness of the precautionary procedures to prevent covid-19 outbreaks in Saudi Arabia. Int J Environ Res Public Health 18(1):1–12. https://doi.org/10.3390/ijerph18010218

Al-Musallam N, Al-Abdullatif M (2022) Depression detection through identifying depressive Arabic tweets from Saudi Arabia: machine learning approach. In: Proceedings 2022 5th National Conference Saudi Computer Colleges, NCCC 2022, pp 11–18. https://doi.org/10.1109/NCCC57165.2022.10067346

Aloqaily A, Al-Hassan M, Salah K, Elshqeirat B, Almashagbah M (2020) Sentiment analysis for Arabic tweets datasets: Lexicon-based and machine learning approaches. J Theor Appl Inf Technol 98(4):612–623

Al-Tamimi AK, Shatnawi A, Bani-Issa E (2017) Arabic sentiment analysis of YouTube comments. In: 2017 IEEE Jordan conference on applied electrical engineering and computing technologies, AEECT, pp 1–6. https://doi.org/10.1109/AEECT.2017.8257766

Alwakid G, Osman T, Hughes-Roberts T (2017) Challenges in sentiment analysis for Arabic social networks. Proc Comput Sci 117:89–100. https://doi.org/10.1016/j.procs.2017.10.097

Apicella A, Donnarumma F, Isgrò F, Prevete R (2021) A survey on modern trainable activation functions. Neural Netw 138(June):14–32. https://doi.org/10.1016/j.neunet.2021.01.026

Atoum JO, Nouman M (2019) Sentiment analysis of Arabic Jordanian dialect tweets. Int J Adv Comput Sci Appl 10(2):256–262. https://doi.org/10.14569/ijacsa.2019.0100234

Baker QB, Shatnawi F, Rawashdeh S, Al-Smadi M, Jararweh Y (2020) Detecting epidemic diseases using sentiment analysis of arabic tweets. J Univers Comput Sci 26(1):50–70. https://doi.org/10.3897/jucs.2020.004

Bengesi S, Oladunni T, Olusegun R, Audu H (2023) A machine learning-sentiment analysis on Monkeypox outbreak: an extensive dataset to show the polarity of public opinion from twitter tweets. IEEE Access 11(February):11811–11826. https://doi.org/10.1109/ACCESS.2023.3242290

Boulesnane A, Meshoul S, Aouissi K (2022) Influenza-like illness detection from Arabic Facebook posts based on sentiment analysis and 1D convolutional neural network. Mathematics 10(21):1–22. https://doi.org/10.3390/math10214089

Cambria E (2022) Sentic computing. In: Encyclopedia of big data, pp 821–827. Springer International Publishing, Cham

Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

Chen T, He T, Benesty M (2018) XGBoost: eXtreme Gradient Boosting. R Packag. version 0.71–2, pp 1–4

Chintalapudi N, Battineni G, Amenta F (2021) Sentimental analysis of COVID-19 tweets using deep learning models. Infect Dis Rep 13(2):329–339. https://doi.org/10.3390/IDR13020032

Dangi D, Dixit DK, Bhagat A (2022) Sentiment analysis of COVID-19 social media data through machine learning. Multimed Tools Appl 81(29):42261–42283. https://doi.org/10.1007/s11042-022-13492-w

Derviş K, Selçuk Ö (2004) A simple and global optimization algorithm for engineering problems: differential evolution algorithm. Turkish J Electr Eng Comput Sci 12(1):53–60

El-Beltagy SR, Ali A (2013) Open issues in the sentiment analysis of Arabic social media: a case study. In: 2013 9th international conference innovation information technology, IIT 2013, pp 215–220. https://doi.org/10.1109/Innovations.2013.6544421

Eusebi P (2013) Diagnostic accuracy measures. Cerebrovasc Dis 36(4):267–272. https://doi.org/10.1159/000353863

Fourure D, Javaid MU, Posocco N, Tihon S (2021) Anomaly detection: how to artificially increase your F1-Score with a biased evaluation protocol. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol. 12978 LNAI, pp 3–18. https://doi.org/10.1007/978-3-030-86514-6_1

Gamal D, Alfonse M, El-Horbaty E-SM, Salem A-BM (2019) Twitter benchmark dataset for Arabic sentiment analysis. Int J Mod Educ Comput Sci 11(1):33–38. https://doi.org/10.5815/ijmecs.2019.01.04

Gustineli M (2022) A survey on recently proposed activation functions for deep learning, pp 1–7. http://arxiv.org/abs/2204.02921

Hadwan M, Al-Hagery MA, Al-Sarem M, Saeed F (2022) Arabic sentiment analysis of users’ opinions of governmental mobile applications. Comput Mater Contin 72(3):4675–4689. https://doi.org/10.32604/cmc.2022.027311

Heikal M, Torki M, El-Makky N (2018) Sentiment analysis of Arabic Tweets using deep learning. Proc Comput Sci 142:114–122. https://doi.org/10.1016/j.procs.2018.10.466

Hnaif AA, Kanan E, Kanan T (2021) Sentiment analysis for arabic social media news polarity. Intell Autom Soft Comput 28(1):107–119. https://doi.org/10.32604/iasc.2021.015939

Huang H, Zavareh AA, Mustafa MB (2023) Sentiment analysis in E-Commerce platforms: a review of current techniques and future directions. IEEE Access 11(July):90367–90382. https://doi.org/10.1109/ACCESS.2023.3307308

Iparraguirre-Villanueva O et al (2023) The public health contribution of sentiment analysis of Monkeypox tweets to detect polarities using the CNN-LSTM model. Vaccines 11(2):1–12. https://doi.org/10.3390/vaccines11020312

Jain R et al (2023) Explaining sentiment analysis results on social media texts through visualization. Multimed Tools Appl 82(15):22613–22629. https://doi.org/10.1007/s11042-023-14432-y

Jakkula V (2011) Tutorial on Support Vector Machine (SVM). School

of EECS, Washington State University, pp 1–13

Juba B, Le HS (2019) Precision-Recall versus accuracy and the role

of large data sets. In: 33rd AAAI confernce artiﬁal intelligence

AAAI 2019, 31st innovative applied artiﬁal intelligence confer-

ence IAAI 2019 9th AAAI symposium education advance artiﬁal

intelligence, EAAI 2019, pp 4039–4048. https:// doi. org/ 10. 1609/

aaai. v33i01. 33014 039

Ke G etal. (2017) LightGBM: a highly eﬃcient gradient boosting

decision tree. Adv Neural Inf Process Syst, vol. 2017-Decem, pp

3147–3155

Kim H, Park J, Lee C, Kim JJ (2021) Improving accuracy of binary

neural networks using unbalanced activation distribution. In: Proc.

IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp

7858–7867. https:// doi. org/ 10. 1109/ CVPR4 6437. 2021. 00777.

Kovács B, Tinya F, Németh C, Ódor P (2020) Unfolding the eﬀects

of diﬀerent forestry treatments on microclimate in oak forests:

results of a 4-yr experiment. Ecol Appl 30(2):321–357. https://

doi. org/ 10. 1002/ eap. 2043

Social Network Analysis and Mining (2024) 14:30 30 Page 18 of 18

Margineantu DD, Dietterich TG (1997) Pruning Adaptive Boosting

*** ICML-97 Final Draft ***"

Mohammed A, Kora R (2019) Deep learning approaches for Arabic

sentiment analysis. Soc Netw Anal Min 9(1):1–12. https:// doi. o rg/

10. 1007/ s13278- 019- 0596-4

Mohbey KK, Meena G, Kumar S, Lokesh K (2022) A CNN-LSTM-

based hybrid deep learning approach to detect sentiment polarities

on Monkeypox tweets, pp 1–11

Musleh DA etal (2022) Twitter arabic sentiment analysis to detect

depression using machine learning. Comput Mater Contin

71(2):3463–3477. https:// doi. org/ 10. 32604/ cmc. 2022. 022508

Oscar N, Fox PA, Croucher R, Wernick R, Keune J, Hooker K (2017)

Machine learning, sentiment analysis, and tweets: an examination

of Alzheimer’s disease stigma on Twitter. J Gerontol Ser B Psy-

chol Sci Soc Sci 72(5):742–751. https:// doi. org/ 10. 1093/ geronb/

gbx014

Otten JDM etal (2005) Eﬀect of recall rate on earlier screen detection

of breast cancers based on the Dutch performance indicators. J

Natl Cancer Inst 97(10):748–754. https:// doi. org/ 10. 1093/ jnci/

dji131

Oussous A, Benjelloun FZ, Lahcen AA, Belfkih S (2020) ASA: a

framework for Arabic sentiment analysis. J Inf Sci 46(4):544–559.

https:// doi. org/ 10. 1177/ 01655 51519 849516

Rodríguez-Ibánez M, Casánez-Ventura A, Castejón-Mateos F, Cuenca-

Jiménez PM (2023) A review on sentiment analysis from social

media platforms. Expert Syst Appl. https:// doi. org/ 10. 1016/j. eswa.

2023. 119862

Rukhsar S etal (2023) Artiﬁcial intelligence based sentence level sen-

timent analysis of COVID-19. Comput Syst Sci Eng 47(1):791–

807. https:// doi. org/ 10. 32604/ csse. 2023. 038384

Salem F (2017) Social Media and the Internet of Things (The Arab

Social Media Report 2017), Arab Social Media Report Series

Sayed AA, Elgeldawi E, Zaki AM, Galal AR (2020) Sentiment analysis

for Arabic reviews using machine learning classiﬁcation algo-

rithms. In: Proceedings 2020 international conference innovative

trends communications computere engineering, ITCE 2020, pp

56–63. https:// doi. org/ 10. 1109/ ITCE4 8509. 2020. 90478 22

Tan L, Tan OK, Sze CC, Bin Goh WW (2023) Emotional variance

analysis: a new sentiment analysis feature set for artiﬁcial intel-

ligence and machine learning applications. PLoS One 18(1):1–22.

https:// doi. org/ 10. 1371/ journ al. pone. 02742 99

Ullah I, Mahmoud QH (2022) An anomaly detection model for iot

networks based on ﬂow and ﬂag features using a feed-forward

neural network. In: Proceedings - IEEE consumer communica-

tions network conference CCNC, pp 363–368. https:// doi. org/ 10.

1109/ CCNC4 9033. 2022. 97005 97

Waheeb SA, Khan NA, Chen B, Shang X (2020) Machine learning

based sentiment text classiﬁcation for evaluating treatment qual-

ity of discharge summary. Information. https:// doi. org/ 10. 3390/

INFO1 10502 81

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Authors and Aliations

HasanGharaibeh1· RabiaEmhamedAlMamlook2,3· GhassanSamara4· AhmadNasayreh1· SajaSmadi1·

KhalidM.O.Nahar1· MohammadAljaidi4· EssamAl‑Daoud4· MohammadGharaibeh5· LaithAbualigah6,7,8,9,10,11,12

* Laith Abualigah

Aligah.2020@gmail.com

Essam Al-Daoud

essamdz@zu.edu.jo

1 Department ofInformation Technology andComputer

Sciences, Yarmouk University, Irbid211633, Jordan

2 Department ofBusiness Administration, Trine University,

Indiana, USA

3 Department ofMechanical andIndustrial Engineering,

University ofZawia, Tripoli, Libya

4 Department ofComputer Science, Faculty ofInformation

Technology, Zarqa University, Zarqa13110, Jordan

5 Department ofMedicine, Faculty ofMedicine, Hashemite

University, Zarqa13133, Jordan

6 Artiﬁcial Intelligence andSensing Technologies (AIST)

Research Center, University ofTabuk, Tabuk71491,

SaudiArabia

7 Hourani Center forApplied Scientiﬁc Research, Al-Ahliyya

Amman University, Amman19328, Jordan

8 MEU Research Unit, Middle East University, Amman,

Jordan

9 Department ofElectrical andComputer Engineering,

Lebanese American University, Byblos13-5053, Lebanon

10 School ofComputer Sciences, Universiti Sains Malaysia,

11800GeorgeTown, PulauPinang, Malaysia

11 School ofEngineering andTechnology, Sunway University

Malaysia, 27500PetalingJaya, Malaysia

12 Computer Science Department, Al al-Bayt University,

Mafraq25113, Jordan
