
Social Media Sentiment Analysis and Opinion Mining in Public Security: Taxonomy, Trend Analysis, Issues and Future Directions

Authors:
Mohd Suhairi Md Suhaimin (a,b), Mohd Hanafi Ahmad Hijazi (a,c), Ervin Gubin Moung (a),
Puteri Nor Ellyza Nohuddin (d,e), Stephanie Chua (f), Frans Coenen (g)
a Data Technology and Applications Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
b Polytechnic and Community College Education Department, Galeria PjH Aras 4-7, Jalan P4W Persiaran Perdana, 62100 Putrajaya, Malaysia
c Creative Advanced Machine Intelligence Research Centre, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
d Institute of IR4.0, Universiti Kebangsaan Malaysia, 43000 Selangor, Malaysia
e Faculty of Business, Higher Colleges of Technology, United Arab Emirates
f Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
g Department of Computer Science, University of Liverpool, United Kingdom
article info
Article history:
Received 28 March 2023
Revised 31 August 2023
Accepted 19 September 2023
Available online 26 September 2023
Keywords:
Sentiment analysis
Opinion mining
Public security
Public threat
Taxonomy
abstract
The interest in social media sentiment analysis and opinion mining for public security events has
increased over the years. The availability of social media platforms for communication provides a
valuable source of information for sentiment analysis and opinion mining research. The content shared
across these platforms offers insight into the physical environment and social phenomena related to
public security threats. This content has been used to monitor public security threats or emergency
events, to analyze sentiment and opinionated data for threat management, and to detect public security
threat events using geographic location-based sentiment analysis. However, a systematic survey that
describes the trends and latest developments in this domain is unavailable. This paper presents a survey
of social media sentiment analysis and opinion mining for public security. It aims to understand the
progress of the current state of the art, identify the research gaps, and propose potential future
directions. In total, 200 articles published from 2016 to 2023 were considered in this survey. The
taxonomy shows the key attributes and limitations of the work presented in the surveyed articles.
Subsequently, potential future directions of work on sentiment analysis in the public security domain
are suggested for interested researchers.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Journal of King Saud University Computer and Information Sciences 35 (2023) 101776
https://doi.org/10.1016/j.jksuci.2023.101776
1319-1578/© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University.
Corresponding author.
E-mail addresses: mohd_suhairi_di21@iluv.ums.edu.my (M.S. Md Suhaimin), hanafi@ums.edu.my (M.H. Ahmad Hijazi), ervin@ums.edu.my (E.G. Moung), pnohuddin@hct.ac.ae (P.N.E. Nohuddin), chlstephanie@unimas.my (S. Chua), coenen@liverpool.ac.uk (F. Coenen).
Peer review under responsibility of King Saud University. Production and hosting by Elsevier.
Contents
1. Introduction
2. Methodology
3. An overview of sentiment analysis and opinion mining for public security
4. The taxonomy for sentiment analysis and opinion mining for public security
4.1. Objective of sentiment analysis and opinion mining for public security
4.1.1. Analysis of events
4.1.2. Improvement of techniques
4.1.3. Corpus generation
4.1.4. Multipurpose
4.2. Domain of interest in public security
4.2.1. Natural
4.2.2. Non-natural (human-made)
4.3. Public security event timeframe
4.3.1. Pre-event
4.3.2. During-event
4.3.3. Post-event
4.4. Social media platform
4.5. Dataset
4.6. Language
4.7. Sentiment analysis and opinion mining approach
4.7.1. Machine learning-based
4.7.2. Lexicon approaches
4.7.3. Hybrid
4.7.4. Manual coding
5. Analysis of trend, issues and future directions of sentiment analysis and opinion mining for public security
5.1. Trend analysis of sentiment analysis and opinion mining objectives for public security
5.2. Trend analysis of domain of interest in public security
5.3. Trend analysis of event timeframe
5.4. Trend analysis of social media platforms used
5.5. Trend analysis of language of dataset used
5.6. Trend analysis of dataset type used
5.7. Trend analysis of approaches for sentiment analysis and opinion mining in public security
5.8. Issues and future direction of social media sentiment analysis and opinion mining for public security
5.8.1. Issues
5.8.2. Potential future works
6. Discussion
7. Conclusions
Author contributions
Declaration of competing interest
Acknowledgement
References
1. Introduction
Ensuring public security has long been a core factor of a stable country. Over time, the definition
of security has broadened to encompass a range of sectors and domains, including environmental,
societal, economic, and political. Additionally, the concept of security has deepened to include
individual safety and well-being, not just national or state-level security (Stevens and
Vaughan-Williams, 2016). According to the Oxford dictionary, security encompasses the activities
involved in protecting a country, building, or person against attack and danger, as well as the state
of feeling happy and safe from danger or worry (Hornby and Cowie, 1995). In general, public security
consists of maintaining social privacy, eliminating risks, and making optimal use of opportunities to
ensure sustainable development and well-being (Dehdezi and Sardi, 2016). The common definition of
public security is the protection and safety of persons or property against the threat of attack and
danger (Manunta, 1999; Ortmeier, 1998). Threats can be criminal or non-criminal. Criminal threats
typically arise from non-natural causes such as terrorism, riots, protests, crises, conflicts,
accidents and crime. Non-criminal threats, on the other hand, are caused by natural events such as
natural disasters, disease outbreaks and pandemics. Ensuring public security is vital for protecting
the general public from significant threats, danger, injury, harm, damage and/or loss of life, whether
caused by natural or non-natural events (Bansal et al., 2021; Chung and Zeng, 2018; Ortmeier, 1998).
These events have seriously threatened human life and safety for a considerable time, causing
significant economic and cultural loss.
Opinion mining, also known as sentiment analysis, is the field of study that analyzes people’s
opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes
expressed in written text. Although the terms “opinion” and “sentiment” are related, there is a subtle
difference between them. Opinion mining primarily deals with a person’s concrete view of something,
while sentiment refers to an attitude or thought prompted by a feeling. Opinion mining involves two
levels of abstraction, a single opinion and a set of opinions, whereas sentiment analysis mainly
focuses on opinions that express or imply positive or negative sentiment (Liu, 2020). However, the
terms “sentiment analysis” and “opinion mining” are used interchangeably as an umbrella for different
tasks, such as opinion extraction, sentiment mining, subjectivity analysis, affect analysis, review
mining, entity extraction and emotion analysis (Bhatia et al., 2020; Gupta and Agrawal, 2020; Pang
and Lee, 2008).
Sentiment analysis and opinion mining were initially used for product review applications but have
since been applied to other areas, including stock markets, elections, disasters, healthcare and
software engineering (Mäntylä et al., 2018). In the context of public security, sentiment analysis
and opinion mining have been used to analyze sentiment and public opinion about an event or disaster
so as to trigger warnings to the public. More generally, sentiment analysis can be said to comprise
the analysis of the sentiment, emotions, opinions and attitudes expressed by individuals towards
events, phenomena and particular crises. Due to the availability of social media data and the huge
amount of sentiment and opinionated data it contains, sentiment analysis and opinion mining have been
used in a wide range of domains, such as business, marketing, entertainment, hospitality, politics,
social issues, healthcare and disasters. These techniques have been employed not only by
organizations and individuals but also by local and federal governments. The occurrence of events is
typically accompanied by public opinion on social media, which serves as a medium for disseminating
or expressing sentiment. The content shared across social media platforms provides a valuable source
of knowledge about the physical environment and social phenomena (Alfarrarjeh et al., 2017). The
availability of textual data from social media platforms has encouraged research on sentiment
analysis and opinion mining. As a result, the public security domain has become an important
application domain in sentiment analysis and opinion mining.
Researchers have leveraged sentiment analysis and opinion mining to achieve a range of purposes. For
example, much reported work applies sentiment analysis and opinion mining to social media in order to
analyze public security threats. This includes the analysis of social media data that expresses
opinions and/or sentiment for the monitoring of public security threats and emergency events, as well
as the prediction or detection of events from social media data using sentiment analysis.
Additionally, researchers have employed geographic location-based sentiment analysis for the
detection of public security threats or emergency events (Sattaru et al., 2021).
Several literature reviews and survey papers on sentiment analysis and opinion mining in the public
security domain have been published. de Carvalho and Seixas Costa (2021) provided a comprehensive
review of social web mining and sentiment analysis in public security and proposed a research agenda
for future work. Boukabous and Azizi (2020) reviewed the latest trends in learning-based sentiment
analysis techniques for security intelligence purposes, with a focus on cybersecurity, security
attacks, crimes, extremism, disasters and hate speech. They suggested that future research consider
combining learning-based security for sentiment analysis. Meanwhile, Sharma and Jain (2020) surveyed
the sentiment analysis approaches and techniques for social media security and analytics, covering
various security domains such as deception detection, anomaly detection, risk management, and
disaster relief. Finally, Razali et al. (2021) examined the literature on text-based opinion mining
in multiple domains, highlighting the potential of the Kansei approach in national security research.
However, these papers did not provide a descriptive taxonomy of sentiment analysis and opinion mining
in the public security domain. A taxonomy categorizes previous work based on specified attributes,
which could help stakeholders to better comprehend the issues involved. Therefore, this survey paper
aims to fill this gap by categorizing relevant work and developing a taxonomy based on the most
recent research on sentiment analysis and opinion mining for public security. Additionally, this
paper provides an overview and analysis of current trends in sentiment analysis and opinion mining in
public security, which previous surveys and studies on the topic did not include. Finally, the paper
discusses the current issues and future directions of sentiment analysis and opinion mining in the
public security domain.
This survey paper has three primary objectives: (1) to develop a taxonomy of the current
state-of-the-art sentiment analysis and opinion mining techniques specifically applicable to the
public security domain; (2) to visualize and analyze recent trends in the field; and (3) to identify
any remaining issues and suggest potential future research directions. The structure of the paper is
as follows. Section 2 outlines the methodology used to conduct this survey. Section 3 provides an
overview of sentiment analysis and opinion mining in the context of public security. Section 4
presents the developed taxonomy and provides detailed explanations of each sub-branch within it.
Section 5 analyzes trends, research gaps, and remaining issues in the field, and suggests potential
future directions for sentiment analysis and opinion mining in public security. Section 6 discusses,
from the authors’ perspective, the critical areas arising from these issues and future directions.
Finally, Section 7 concludes the paper.
2. Methodology
In this section, the methodology used to conduct the survey of recent work directed at sentiment
analysis and opinion mining for public security is described. Fig. 1 shows the methodology used.
First, several relevant databases of peer-reviewed articles were identified. The Scopus¹, IEEE
Xplore², and Science Direct³ databases were selected due to their wide coverage of scientific
peer-reviewed articles and strict evaluation of the journals they index. A set of relevant keywords
was used to search for all possible articles related to sentiment analysis and opinion mining in
public security. Various combinations of terms and operators were used, including “sentiment AND
analysis AND public AND security”, “opinion AND mining AND public AND security”, “sentiment AND
analysis AND public AND threat”, “opinion AND mining AND public AND threat”, “sentiment AND analysis
AND public AND disaster”, “opinion AND mining AND public AND disaster”, “sentiment AND analysis AND
public AND event”, and “opinion AND mining AND public AND event”. The search yielded a total of 2097
articles in Scopus, 669 in IEEE Xplore, and 239 in Science Direct.
Second, the articles found in each database were screened. Non-academic articles were removed,
leaving journal articles, conference proceedings and serials. Duplicate articles within each of the
selected databases were also removed, leaving 1903 articles in Scopus, 663 in IEEE Xplore, and 236 in
Science Direct (Fig. 1). To ensure that only state-of-the-art work was considered in this paper, only
articles published in recent years (2016–2023) were included. After the screening, the remaining
numbers of articles were Scopus: 1231, IEEE Xplore: 388, and Science Direct: 166 (Fig. 1).
Third, we performed the eligibility filtering process, which involved two sub-processes: (i)
combining the databases and removing duplicates across them, and (ii) a brief review of the articles.
Combining the databases gave 1785 articles (Fig. 1); after removing duplicates, the number of
articles was reduced to 1485. Then, we manually reviewed the titles, abstracts, and keywords of each
article to identify works that employed sentiment analysis/opinion mining in the public security
domain and were written in English. Based on these criteria, we selected a total of 280 articles. We
conducted a thorough review of the content of these articles and found 200 articles to be eligible
for inclusion in this paper.
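The selection flow can be checked with simple arithmetic. The sketch below tallies the per-database counts reported in Fig. 1 and the combined counts reported in the text; the variable names are illustrative assumptions, and the per-stage "removed" figures are derived here rather than reported by the authors.

```python
# Per-database counts at each screening step, as reported in Fig. 1.
identified = {"Scopus": 2097, "IEEE Xplore": 669, "Science Direct": 239}
academic_unique = {"Scopus": 1903, "IEEE Xplore": 663, "Science Direct": 236}
in_year_range = {"Scopus": 1231, "IEEE Xplore": 388, "Science Direct": 166}

# Counts after the databases are merged, as reported in the text and Fig. 1.
after_cross_db_dedup = 1485
after_title_abstract_review = 280
included = 200

combined = sum(in_year_range.values())
stages = [
    ("identified", sum(identified.values())),
    ("academic, de-duplicated within database", sum(academic_unique.values())),
    ("published 2016-2023 (combined)", combined),
    ("de-duplicated across databases", after_cross_db_dedup),
    ("title/abstract/keyword review", after_title_abstract_review),
    ("full-content review (included)", included),
]

# Report how many articles each stage removed relative to the previous one.
for (name, count), (_, previous) in zip(stages[1:], stages):
    print(f"{name}: {count} (removed {previous - count})")
```

The combined figure of 1785 in Fig. 1 equals the sum of the three per-database counts after the year filter (1231 + 388 + 166).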
¹ https://www.scopus.com/
² https://ieeexplore.ieee.org/Xplore/home.jsp
³ https://www.sciencedirect.com/
3. An overview of sentiment analysis and opinion mining for
public security
This section provides an overview of opinion mining and sentiment analysis in the context of public
security. The overview is based on the general framework found in the majority of the articles.
Fig. 2 shows the general framework typically adopted when conducting sentiment analysis and opinion
mining in the public security domain.
As noted in the introduction to this paper, the aim of sentiment analysis and opinion mining is to
analyze written text to identify sentiments, opinions, attitudes, emotions, and appraisals toward
entities and their attributes. In the context of the public security domain, this involves analyzing
text related to events that could potentially threaten the public, whether in the form of criminal
activity or non-criminal incidents caused by natural or non-natural factors. The initial step in
conducting sentiment analysis and opinion mining is to acquire a dataset of text from a social media
platform or other relevant sources using keywords, geographic location information, or specific
timeframes based on the objectives of the research. In some cases, publicly shared datasets may be
used, which eliminates the need for data acquisition. For language-focused events, the social media
platform most relevant to that language is used, as it gives better insight into the true sentiment
about the event.
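The acquisition criteria named above (keywords, geographic location, timeframe) can be illustrated as a filter over already-collected posts. This is a minimal sketch under stated assumptions: the `Post` record, the `select_posts` helper, the bounding-box convention, and the sample posts are all hypothetical, and the sketch applies all three criteria together, whereas a given study may use only one of them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical record for one social media post."""
    text: str
    created_at: datetime
    lat: float
    lon: float


def select_posts(posts, keywords, start, end, bbox):
    """Keep posts that match any keyword, fall in [start, end], and lie inside bbox.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    lowered = [k.lower() for k in keywords]
    selected = []
    for post in posts:
        has_keyword = any(k in post.text.lower() for k in lowered)
        in_window = start <= post.created_at <= end
        in_area = min_lat <= post.lat <= max_lat and min_lon <= post.lon <= max_lon
        if has_keyword and in_window and in_area:
            selected.append(post)
    return selected


# Synthetic posts, for illustration only.
posts = [
    Post("Flood water rising near the river", datetime(2021, 12, 18, 9, 0), 3.07, 101.52),
    Post("Great lunch today", datetime(2021, 12, 18, 12, 0), 3.10, 101.60),
    Post("Flood relief centre is open", datetime(2022, 3, 1, 8, 0), 3.08, 101.55),
]
dataset = select_posts(
    posts,
    keywords=["flood"],
    start=datetime(2021, 12, 17),
    end=datetime(2021, 12, 25),
    bbox=(2.5, 101.0, 3.5, 102.0),
)
print(len(dataset), "post(s) selected")
```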
The acquired dataset undergoes pre-processing using text-processing techniques to transform the raw
data into a suitable form. The subsequent analysis can follow various approaches, such as machine
learning-based, lexicon or rule-based, hybrid, or manual coding, depending on the preference of the
researcher. Supervised and semi-supervised machine learning, however, require a data annotation step
to label the data before pre-processing. The pre-processing stage is required to remove noise and
irrelevant data in preparation for the feature engineering stage (Liu, 2020). Pre-processing
techniques commonly used include text cleaning, normalization (Wadawadagi and Pagi, 2021; Zhang and
Cheng, 2021), replacement (Geeta, 2016), and stopword removal (Mohamed Ridhwan and Hargreaves, 2021).
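The four pre-processing techniques named above can be sketched in a few lines. The example below is an illustration only: the stopword list, the replacement table, and the specific cleaning rules are assumptions made for this sketch and are not the procedures of the cited works.

```python
import re

# A tiny illustrative stopword list; real studies typically use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "to", "of", "and"}

# Hypothetical replacement table for informal spellings.
REPLACEMENTS = {"u": "you", "pls": "please", "ppl": "people"}


def preprocess(text):
    """Clean, normalize, replace, and remove stopwords from one post."""
    text = text.lower()                          # normalization: case folding
    text = re.sub(r"https?://\S+", " ", text)    # cleaning: drop URLs
    text = re.sub(r"@\w+", " ", text)            # cleaning: drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)        # cleaning: drop digits and punctuation
    tokens = text.split()
    tokens = [REPLACEMENTS.get(tok, tok) for tok in tokens]  # replacement
    return [tok for tok in tokens if tok not in STOPWORDS]   # stopword removal


tokens = preprocess("Pls help!! The flood is rising in KK @rescue https://example.com/x #flood")
print(tokens)
```

Note that the punctuation rule strips the "#" symbol but keeps the hashtag word, on the assumption that hashtags often carry the topic of a post; dropping them entirely is an equally valid choice.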
Fig. 1. Methodology used to conduct this survey. [Flowchart. Identification: select databases (Scopus, IEEE Xplore, Science Direct) and query string keywords (sentiment analysis, opinion mining, public security, public threat, public disaster, public event); Scopus = 2097, IEEE Xplore = 669, Science Direct = 239. Screening: remove non-academic articles and duplication within database; Scopus = 1903, IEEE Xplore = 663, Science Direct = 236; select relevant years, 2016 to 2023; Scopus = 1231, IEEE Xplore = 388, Science Direct = 166. Eligibility: combine articles, total = 1785; remove duplication across database, total = 1485; review title/abstract/keywords, total = 280. Included: review full relevant content, total = 200.]
Fig. 2. General sentiment analysis and opinion mining framework in the public security domain.
After pre-processing, the next step is feature engineering, which involves feature extraction,
selection, and representation (Eke et al., 2020). Features are extracted from the pre-processed data.
The extracted features represent the original text in a meaningful form, typically numerical, that is
compatible with the algorithms used (Kulkarni and Shivananda, 2019). The techniques used to extract
features include statistical, Natural Language Processing (NLP), or rule-based techniques. Deep
learning has also been proposed as a feature engineering method, as it can learn multiple levels of
representation from raw data and reduce the effort required for feature extraction and selection
(Bhatia et al., 2020).
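As one concrete instance of a statistical feature extraction technique, the sketch below turns tokenized posts into TF-IDF vectors. TF-IDF is chosen here purely as an illustration and is not singled out by the text; the weighting used (raw term frequency multiplied by log(N/df)) is one of several common variants, and the sample documents are synthetic.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Uses raw term frequency and idf = log(N / df); other weightings are common.
    """
    vocabulary = sorted({tok for doc in documents for tok in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([
            counts[term] * math.log(n_docs / doc_freq[term]) for term in vocabulary
        ])
    return vocabulary, vectors


# Synthetic tokenized posts, for illustration only.
docs = [
    ["flood", "rising", "help"],
    ["flood", "relief", "open"],
    ["earthquake", "help", "help"],
]
vocab, vectors = tfidf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, [round(value, 3) for value in vector])
```

Each post becomes a numerical vector of the same length, which is the form most classification algorithms expect.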
After the feature engineering stage, sentiment or opinion classification can be performed. This is
typically done with a classification algorithm, in many cases one founded on a lexicon. In the
lexicon-based approach, sentiment classification is based on sentiment resources, such as a lexicon
or a corpus database. Usually, as a result of the classification, a sentiment score is calculated and
evaluated according to the associated sentiment orientation and strength (Pang and Lee, 2008). Other
approaches use topic modeling to categorize the topics within the dataset (Lee and Nerghes, 2017).
The system’s performance on test data is typically recorded for evaluation purposes.
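A lexicon-based classifier of the kind described above can be sketched as a lookup-and-sum over tokens. The lexicon entries, their strengths, the negation rule, and the threshold below are all assumptions made for illustration; published lexicons and scoring rules are considerably richer.

```python
# Toy lexicon: word -> polarity strength. Values are illustrative only.
LEXICON = {
    "safe": 1.0, "rescued": 2.0, "help": 0.5,
    "danger": -2.0, "trapped": -1.5, "afraid": -1.0,
}
NEGATORS = {"not", "no", "never"}


def sentiment_score(tokens, lexicon=LEXICON):
    """Sum lexicon strengths, flipping the sign of a word that follows a negator."""
    score = 0.0
    for i, tok in enumerate(tokens):
        strength = lexicon.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            strength = -strength
        score += strength
    return score


def classify(tokens, threshold=0.0):
    """Map a sentiment score to an orientation label."""
    score = sentiment_score(tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for sample in (["family", "rescued", "and", "safe"],
               ["we", "are", "not", "safe"],
               ["road", "closed"]):
    print(sample, sentiment_score(sample), classify(sample))
```

The sign of the score gives the orientation and its magnitude the strength, matching the two quantities mentioned in the text.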
The evaluation of system performance entails a comprehensive understanding of several key measures.
The evaluation metrics used can be categorized based on the approach adopted for analysis. Machine
learning-based approaches frequently adopt metrics such as accuracy, precision, recall, and the
F1-measure. Accuracy is the proportion of total predictions that were correctly identified,
encompassing both positive and negative sentiments. Precision, on the other hand, indicates the
proportion of positive identifications that were actually correct. Recall, also known as sensitivity,
is the proportion of actual positives that were correctly identified. The F1-measure is the harmonic
mean of precision and recall and so provides a balance between the two. For lexicon approaches,
performance is usually measured as accuracy based on a sentiment score, which is determined from the
individual sentiment scores of the words or phrases in the text. These sentiment scores are typically
derived from a pre-compiled sentiment lexicon, where each word or phrase is associated with a
sentiment score (Fan et al., 2020; Thorat and Namrata Mahender, 2019).
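The four measures defined above can be computed directly from the counts of true and false positives and negatives. The sketch below does this for a binary positive/negative task; the function name and the sample labels are assumptions for illustration, and multi-class settings would require per-class or averaged variants.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Synthetic gold labels and predictions, for illustration only.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "positive", "positive"]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this sample there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so precision (3/5) and recall (3/4) differ and the F1-measure falls between them.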
Finally, the resulting sentiment analysis is used to provide
insight into the event of interest, either by predicting future occur-
rences or by providing a retrospective analysis of the event.
4. The taxonomy for sentiment analysis and opinion mining for
public security
In this section, a taxonomy of the recent work directed at sentiment analysis and opinion mining,
derived from the overview discussed in Section 3, is presented. The aim of the taxonomy is to
summarize, and provide a clear picture of, the main concepts expressed in the recent works and the
similarities between them. Seven attributes were identified for inclusion in the taxonomy, shown as
ovals in Fig. 3. These attributes were chosen as they were consistent across all articles surveyed.
The seven attributes are: (1) objectives of the work conducted, (2) the domain of public security,
(3) the public security event timeframe, (4) the social media platform used for data acquisition,
(5) dataset type, (6) the language of the dataset, and (7) the sentiment analysis or opinion mining
approach employed. Sections 4.1–4.7 provide a detailed description of each attribute and a survey of
relevant works in that area.
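The seven attributes can be written down as a small data structure, which is convenient when annotating articles against the taxonomy. In the sketch below the attribute names follow the list above and the sub-branches follow the sub-section titles of Section 4; attributes whose sub-branches are not enumerated in those titles are left open. The dictionary layout, the `validate` helper, and the example record are assumptions made for illustration.

```python
# Attribute -> allowed sub-branches. An empty list means the values are not
# enumerated in the section titles, so any value is accepted in this sketch.
TAXONOMY = {
    "objective": ["analysis of events", "improvement of techniques",
                  "corpus generation", "multipurpose"],
    "domain": ["natural", "non-natural (human-made)"],
    "event timeframe": ["pre-event", "during-event", "post-event"],
    "social media platform": [],
    "dataset": [],
    "language": [],
    "approach": ["machine learning-based", "lexicon", "hybrid", "manual coding"],
}


def validate(record, taxonomy=TAXONOMY):
    """Return the list of problems found in one article's annotation."""
    problems = []
    for attribute, allowed in taxonomy.items():
        if attribute not in record:
            problems.append(f"missing attribute: {attribute}")
        elif allowed and record[attribute] not in allowed:
            problems.append(f"unknown {attribute}: {record[attribute]}")
    return problems


# Hypothetical annotation of a single article.
example = {
    "objective": "analysis of events",
    "domain": "natural",
    "event timeframe": "during-event",
    "social media platform": "Twitter",
    "dataset": "self-collected",
    "language": "English",
    "approach": "lexicon",
}
print(validate(example))
```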
Fig. 3. Taxonomy of sentiment analysis and opinion mining in public security.
4.1. Objective of sentiment analysis and opinion mining for public
security
In this survey, the first attribute of the taxonomy is the categorization of recent work based on
their objectives, as shown in Table 1. Identifying the objective is necessary for understanding the
issues addressed and for proposing potential solutions. It was found that most of the previous work
focused on a specific task of sentiment analysis in public security threats, although some work did
consider several tasks. The taxonomy categorizes the objectives of recent works into four groups:
(1) analysis of events, (2) improvement of techniques, (3) corpus generation, and (4) multipurpose.
Analysis of events refers to work in which sentiment analysis or opinion mining was applied to a
specific event in order either to gain insight into the event and mine related information, or to
analyze the event in support of disaster and emergency management. The technique improvement
objective refers to efforts to improve the performance of sentiment analysis and opinion mining
frameworks in general. It can be further divided into feature engineering, classification, or event
prediction techniques. The corpus generation objective describes work aimed at producing a corpus or
dataset in the public security domain. Finally, the multipurpose objective covers work that combines
two or more of the preceding objectives. Sub-sections 4.1.1 to 4.1.4 present the attributes and
details of each group.
4.1.1. Analysis of events
The most common objective of the reviewed work was to per-
form an analysis of some public security event. Table 1 categorizes
this objective into three groups: (i) analysis of a specific event, (ii)
disease outbreak or pandemic analysis, and (iii) disaster or emer-
Table 1
Natural domain’s event of related works.
Objectives Details Related works
Analysis specific
event
Natural disasters Astuti et al. (2023),Becken et al. (2022),Chenxi et al. (2022),Du et al. (2023),Dudani et al., (2020),Hasegawa
et al. (2020),He et al. (2019),Henríquez-Coronel et al. (2019),Karami et al. (2020),Karimiziarani and
Moradkhani (2023),Lian et al. (2020),Loureiro and Alló (2020),Ma et al. (2020);Mendon et al. (2021),
Mustakim et al. (2021),Ray and Kumar (2023);Uthirapathy and Sandanam (2023);Wu and Cui 2018,Xiong
et al. (2020),Yuan and Liu (2020),Yue et al. (2018),Zander et al. (2023),Zeng (2022)
Non-natural disasters (Abdul Reda et al. 2023;Al-Agha and Abu-Dahrooj (2019);Backfried and Shalunts (2016);Berhoum et al.
(2023);Chandra et al. (2020);Chaudhary and Bansal (2021);Chung and Zeng (2016);Duan et al. (2020);Gascó
et al. (2017);Geeta and Niyogi (2016);Gong et al. (2022);Gu et al. (2021);Khatua et al. (2020);Kostakos et al.
(2018);Kovács et al. (2021);Koytak and Celik 2022;Kumar (2018);Lee and Nerghes (2017);Lee and Nerghes
(2018);Lee et al. (2020);Li et al. (2020);Li et al. (2016);Lyu and Lu (2023);Pope and Griffith (2016);Prathap
and Ramesha (2019);Pu et al. (2022);Qi et al. (2019);Tan et al. (2021);Wang et al. (2016);Wu and Lu (2017);
Zhou and Jing (2020)
Analyze outbreak/
pandemic
Epidemic/ pandemic/
coronavirus
Abusaqer et al. (2023);Adamu et al. (2021);Ali et al. (2023);Anuratha and Parvathy (2023);Arias et al. (2022);
Azmi et al. (2022);Bashar (2022);Cai et al. (2021);Cao et al. (2021);Che et al. (2023);Chen et al. (2021);
Dimitrov et al. (2020);Fakhry et al. (2020);Garcia and Berton (2021);Ghanem et al. (2021);Gu et al. (2021);
Gupta et al. (2021);Han et al. (2022);Hu, (2022);Iksan et al. (2021);Khandelwal and Chaudhary, (2022);Li
et al. (2021);Liu et al. (2023);Liu et al. (2021);Lyu et al. (2020);Mathayomchan et al. (2022);Mohamed
Ridhwan and Hargreaves (2021);Pan et al. (2022);Pran et al. (2020);Praveen et al. (2022);Quach et al. (2022);
Rathke et al. (2023);Samaras et al. (2023);Sari and Ruldeviyani (2020);Sukhwal and Kankanhalli (2022);Sun et al.
(2023);Sun et al. (2022);Tsao et al. (2022);Xie and Chen (2022);Yu et al. (2021);Yu et al. (2020);Zhang et al. (2022);
Zhang et al. (2021);Zhao et al. (2022);Zhao et al. (2020);Zhu et al. (2020)
Disease outbreak Hyo Jin et al. (2016);Olusegun et al. (2023);Qin and Ronchieri (2022);Su and Li (2018);Zhou and Zhang (2017)
Disaster/ emergency
management
Improve disaster
management
Alomari et al. (2020);Behl et al. (2021);Chen and Ji (2021);Chung and Zeng (2018);Dong et al. (2021);Fan et al.
(2020);Liu et al. (2023);Obiedat et al. (2021);Subramaniyaswamy et al. (2017);Xu et al. (2017);Zhang et al. (2020)
Explore disaster/
emergency management
Bai and Yu (2016);El Ali et al. (2018);Fang et al. (2022);Gangadhari and Khanzode (2021);Li et al. (2020);Zhou
(2021);Zhou (2023);Zhou (2023);Zhuang et al. (2021)
Feature engineering Information enrichment Alfarrarjeh et al. (2017);Bala et al. (2017);Chen et al. (2019);Dahal et al. (2019);Du et al. (2023);Fakhry et al.
(2020);Karami et al. (2020);Laudy et al. (2017);Litvak et al. (2016);Mohamed Ridhwan and Hargreaves (2021);
Sun et al. (2023);To et al. (2017);Yan et al. (2020);Yu et al. (2021);Yue et al. (2018);Zhang and Cheng (2021);
Zhu et al. (2020)
Lexicon expansion Abid et al. (2017);Bai and Yu (2016);Li and Wang (2022);Li and Wang (2021);Li and Sun (2023);Ma et al.
(2016);Moraes Silva et al. (2022);Qin et al. (2022);Samaras et al. (2023);Sufi and Khalil (2022);Sun et al.
(2022);Xie and Chen (2022);Yin et al. (2017);Zhou (2021)
Classification Dual/ multilevel sentiment Adamu et al. (2021);Al-Agha and Abu-Dahrooj (2019);Anuratha et al. (2021);Anuratha and Parvathy (2023);
Backfried and Shalunts (2016);Fuadvy and Ibrahim (2019);Geeta and Niyogi (2016);Laudy et al. (2017);Lyu
et al. (2020);Olusegun et al. (2023);Pope and Griffith (2016);Theja Bhavaraju et al. (2019);Thukral et al.
(2021);Zhang and Xu (2021);Zhou and Zhang (2017)
Deep learning Amin et al. (2021);Andhale et al. (2021);Bansal et al. (2021);Cao (2020);Fattoh et al. (2022);Li et al. (2021);
Lin and Moh (2021);Lydiri et al. (2023);Srikanth et al. (2022);Wadawadagi and Pagi (2021)
Classification technique Akbar et al. (2021);Bashar (2022);Bhullar et al. (2022);Chen et al. (2019);Chenxi et al. (2022);Chouhan
(2021);Dimitrov et al. (2020);Jiang et al. (2017);Li and Sun (2023);Li et al. (2016);Mendon et al. (2021);
Moraes Silva et al. (2022);Qin et al. (2022);Sadiq et al. (2020);Sliva et al. (2019);Tsai and Wang (2020);Wahid
et al. (2021);Wang et al. (2020);Xia et al. (2022);Zhang and Ma (2023)
Topic modeling geo-related Choirul Rahmadan et al. (2020);Kovács et al. (2021);Tan et al. (2021);Tang et al. (2022);Xiong et al. (2020);
Yan et al. (2020);Yuan et al. (2020);Zhuang et al. (2021)
Annotation and pipeline Zhang et al. (2020);Zhang et al. (2019)
Event prediction Early event detection Adamu et al. (2021);Almehmadi et al. (2017);Alomari et al. (2020);An et al. (2019);An et al. (2021);Bai and Yu
(2016);Barachi et al. (2022);Chowdhury et al. (2022);Daou (2021);Duan et al. (2020);Lamsal et al. (2022);Li
et al. (2017);Li and Liu (2020);Litvak et al. (2016);Luna et al. (2022);Nagapudi et al. (2021);Shi et al. (2021);
Smith et al. (2018);Tang et al. (2022);Wang et al. (2020);Wu and Cui (2018);Xu et al. (2017);Yu et al. (2021);
Zhong (2021)
Real-time monitoring Al-khateeb and Agarwal (2016);Albayrak and Gray-Roncal (2019);Ali et al. (2023);Kabbani et al. (2022);Laudy
et al. (2017);Sattaru et al. (2021);Sufi and Khalil (2022)
Corpus generation Corpus initiation Backfried and Shalunts (2016);de Carvalho et al. (2020);Effrosynidis et al. (2022);Imran et al. (2022);Li and
Wang (2022);Mamta et al. (2020)
Generation technique
enhancement
Abid et al. (2017);Bai and Yu (2016);de Carvalho and Costa (2022);Ma et al. (2016);Olusegun et al. (2023);Tsai
and Wang (2020);Zhou (2021)
Multipurpose Abid et al. (2017);Adamu et al. (2021);Al-Agha and Abu-Dahrooj (2019);Ali et al. (2023);Alomari et al. (2020);Anuratha and Parvathy
(2023);Backfried and Shalunts (2016);Bai and Yu (2016);Bashar (2022);Chenxi et al. (2022);Dimitrov et al. (2020);Du et al. (2023);Duan
et al. (2020);Fakhry et al. (2020);Geeta and Niyogi (2016);Karami et al. (2020);Kovács et al. (2021);Laudy et al. (2017);Li et al. (2020);Li and
Wang (2022);Li and Sun (2023);Litvak et al. (2016);Liu et al. (2023);Lyu et al. (2020);Mendon et al. (2021);Mohamed Ridhwan and
Hargreaves (2021);Moraes Silva et al. (2022);Olusegun et al. (2023);Pope and Griffith (2016);Qin et al. (2022);Samaras et al. (2023);Sufi and
Khalil (2022);Sun et al. (2023);Sun et al. (2022);Tan et al. (2021);Tang et al. (2022);Wu and Cui (2018);Xie and Chen (2022);Xiong et al.
(2020);Xu et al. (2017);Yan et al. (2020);Yu et al. (2021);Yue et al. (2018);Zhang et al. (2021);Zhou (2021);Zhu et al. (2020);Zhuang et al.
(2021)
M.S. Md Suhaimin, M.H. Ahmad Hijazi, E.G. Moung et al. Journal of King Saud University Computer and Information Sciences 35 (2023) 101776
6
gency management. In the analysis of specific event groups, most
work focused on analyzing natural disasters, war and terrorist
activity, and crime and chaos. Other analyses covered immigrant and
refugee issues, energy/ nuclear activity, infrastructure, and
economic threats. The main output of this group was an analytical
study that measures and analyzes the sentiment or opinion of
the public towards a certain event. It typically included an in-
depth analysis to better understand: the severity level and demo-
graphics (Al-Agha and Abu-Dahrooj, 2019; Chandra et al., 2020;
Zhang et al., 2021), public behaviors and responses (Chung and
Zeng, 2016; Gascó et al., 2017; Kostakos et al., 2018), temporal sen-
timent analysis (Chaudhary and Bansal, 2021; Kovács et al., 2021;
Yu et al., 2021), event scenarios (Gu et al., 2021; Lian et al., 2020),
and sentiment distribution and visualization (Wang et al., 2016;
Wu and Lu, 2017; Xiong et al., 2020).
In the analysis of the disease outbreak and pandemic group, the
aim was to analyze diseases and pandemics that threatened the
public, such as Middle East Respiratory Syndrome (MERS), Zika
virus, monkeypox, multiple diseases, and the recent coronavirus
disease (COVID-19). The objectives included sentiment analysis and
topic modeling of COVID-19. Unlike the first group described in the
foregoing paragraph, this group focused only on analyzing pandemics
or disease outbreaks. Thus, the output of this group was an
analytical study of an outbreak or pandemic that included outbreak
identification (Yu et al., 2020; Zhao et al., 2022), tracking of the
pandemic (Arias et al., 2022; Fakhry et al., 2020), and statistical
analysis during a pandemic (Ghanem et al., 2021; Zhang et al., 2021).
For the disaster or emergency management group, the work
considered focused on improving disaster management and on
exploring emergency management of specific events. The output
of this group was typically an analysis of emergency and disaster
events that might include: a sentiment and disaster concern index
(Bai and Yu, 2016), an event urgency level (Chen and Ji, 2021;
Subramaniyaswamy et al., 2017), a quantitative analysis of public
resilience during emergency events (Fan et al., 2020; Li et al.,
2020; Zhang et al., 2020), and decision support systems for
improving disaster management (Behl et al., 2021; Obiedat et al.,
2021; Xu et al., 2017).
4.1.2. Improvement of techniques
The second objective identified in recent work was to improve the
techniques used in sentiment analysis and opinion mining systems.
Proposed improvements have focused on three main areas: (i) feature
engineering, (ii) classification improvement, and (iii) event
prediction. Table 1 lists the work related to these technique
improvements. Each area is described in the following paragraphs.
For feature engineering improvement, the surveyed work tended to
focus on feature enhancement and representation techniques. In this
context, features refer to input variables or attributes derived
from distinct raw data (Guyon et al., 2008). The goal of improving
features is to find a good representation of data relevant to a
specific domain and associated measurement. The surveyed work
identified two means of enhancing features: information enrichment
and lexicon expansion. For information enrichment, all relevant
work considered geographical and location information from
harvested social media data. Specifically, proposed approaches
included: the development of geographical sentiment models, the
exploration of spatio-temporal factors, geographical sentiment
analysis with topic modeling, and a framework for analyzing social
media using geographic locations during disasters. The objective of
lexicon expansion, on the other hand, is focused on improving the
vocabulary and dictionary to keep pace with the rapid evolution of
language. This task presents greater challenges, which has resulted
in comparatively fewer studies pursuing this objective. The output
of the information enrichment work tended to be a methodology for
sentiment or opinion analysis from a geographic location
perspective, including geographic location-based sentiment models
(Dahal et al., 2019; Fakhry et al., 2020; Yue et al., 2018), and
spatio-temporal approaches for sentiment analysis in the public
security domain (Dahal et al., 2019; Karami et al., 2020; Zhang and
Cheng, 2021).
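As an illustration of the information enrichment idea described above, the following minimal sketch groups sentiment scores by location and by day. It is not taken from any of the surveyed studies; the region labels, timestamps, scores, and the function name `aggregate_by_region_and_day` are all hypothetical, and the sentiment scores are assumed to have been computed beforehand.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical geotagged posts: (ISO timestamp, region label, sentiment score in [-1, 1]).
posts = [
    ("2023-01-01T08:00:00", "region_a", -0.6),
    ("2023-01-01T09:30:00", "region_a", -0.2),
    ("2023-01-01T10:00:00", "region_b", 0.4),
    ("2023-01-02T11:00:00", "region_a", 0.1),
]

def aggregate_by_region_and_day(posts):
    """Return {(region, date): mean sentiment} for the given posts."""
    buckets = defaultdict(list)
    for timestamp, region, score in posts:
        day = datetime.fromisoformat(timestamp).date()
        buckets[(region, day)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

summary = aggregate_by_region_and_day(posts)
for (region, day), average in sorted(summary.items()):
    print(region, day, round(average, 2))
```

Grouping on a (region, day) key is the simplest spatio-temporal breakdown; finer time buckets or real coordinates would follow the same pattern.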
Regarding lexicon expansion, the aim was to enrich or enlarge the
lexicon to improve sentiment analysis performance. For example, a
sentiment dictionary was built to address the shortage of sentiment
analysis on hot events in social media. Improving short text
classification through domain sentiment lexicon expansion with
sentiment orientation analysis has also been explored. In addition,
an emoji lexicon was developed that provides sentiment polarity
signs and intensity scores. Other work has been directed at
generating corpora based on the domain or language used. The main
outputs of this group include: feature combination sets derived
from expanded lexicons (Abid et al., 2017; Bai and Yu, 2016; Li and
Wang, 2021), feature sets based on the language used (Anuratha et
al., 2021; Fuadvy and Ibrahim, 2019; Laudy et al., 2017), lexical
and semantic feature combinations for deep learning (Li et al.,
2021), multi-feature based deep learning frameworks for
classification (Sun et al., 2022), and textual feature sets based
on disaster visual content (Amin et al., 2021; Sadiq et al., 2020).
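To make the effect of lexicon expansion concrete, the sketch below scores one post with a small base lexicon and again after merging in domain and emoji entries. This is an assumption-laden toy example, not the method of any cited study: the words, emoji, polarity values, and the helper names `expand_lexicon` and `score` are invented for illustration.

```python
# Hypothetical lexicons; every entry and score here is illustrative only.
BASE_LEXICON = {"safe": 1.0, "damage": -1.0, "fear": -1.0}
DOMAIN_EXTENSION = {"aftershock": -0.8, "evacuated": -0.5, "rescued": 0.9}
EMOJI_EXTENSION = {"😢": -0.7, "🙏": 0.5}

def expand_lexicon(base, *extensions):
    """Merge extension lexicons into a copy of the base; later entries override earlier ones."""
    merged = dict(base)
    for extension in extensions:
        merged.update(extension)
    return merged

def score(text, lexicon):
    """Sum the polarity of every whitespace-separated token found in the lexicon."""
    return sum(lexicon.get(token, 0.0) for token in text.lower().split())

expanded = expand_lexicon(BASE_LEXICON, DOMAIN_EXTENSION, EMOJI_EXTENSION)
post = "Aftershock again 😢 but family rescued"
print(score(post, BASE_LEXICON))  # none of the tokens are in the base lexicon
print(score(post, expanded))      # domain and emoji entries now contribute
```

With the base lexicon the post carries no signal at all, which is the kind of coverage gap that lexicon expansion is meant to close.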
For the improvement of classification techniques, most of the
surveyed work focused on modifying the classifiers used to generate
classification models, using machine learning, deep learning, and
hybrid/ ensemble approaches. Other work has contributed to
improving associated techniques for: annotation, pipeline and
parsing, utilization of lexicons for classification, topic
modeling, and multi-task or multi-level classification. The output
for this group was typically a technique or model (Bansal et al.,
2021; Bashar, 2022; Li et al., 2021), a framework (Mendon et al.,
2021; Thukral et al., 2021; Wahid et al., 2021), or an algorithm of
some kind (Jiang et al., 2017; Moraes Silva et al., 2022; Wang et
al., 2020) for sentiment analysis or opinion mining in the public
security domain.
For event prediction, the surveyed work focused on improving the
techniques used to detect and predict an event according to its
timeframe. The objectives included emergency event detection, early
event detection, and real-time monitoring of the progress of an event.
The ability to predict or detect events early can significantly
enhance response times and potentially mitigate harm or damage.
The output for this group tended to be an improved approach to
the detection and prediction of an emergency/ disaster event
related to the public security domain before the event occurred
(An et al., 2021; Duan et al., 2020; Zhong, 2021).
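One simple way to picture early event detection is to flag a time step when the number of negative posts rises far above its recent level. The sketch below does this with a trailing-window z-score. It is a generic illustration under stated assumptions, not the approach of the cited work; the counts, the window size, the threshold, and the function name `detect_bursts` are hypothetical.

```python
from statistics import mean, pstdev

def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by more than `threshold` standard deviations."""
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history) or 1.0  # fall back to 1.0 when the history is flat
        if (counts[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

# Hypothetical hourly counts of negative-sentiment posts mentioning one location.
hourly_negative_posts = [4, 5, 3, 4, 6, 5, 4, 30, 42, 38]
print(detect_bursts(hourly_negative_posts))
```

Because the window trails the current step, a sustained surge raises its own baseline and later steps stop triggering, so the first alert is the one that matters for early warning.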
4.1.3. Corpus generation
The third objective was corpus generation, whereby a specific
corpus or dataset related to the respective domain was produced.
The work was divided into corpus initiation and generation
technique enhancement. The former created a new corpus from scratch
or added new data to an existing corpus, while the latter refined
and enhanced existing techniques to better curate data. The outputs
from this objective include: compiled corpora of texts concerning
specific events related to public security (Backfried and Shalunts,
2016; Effrosynidis et al., 2022; Thorat and Namrata Mahender,
2019), annotated corpora (de Carvalho et al., 2020; Mamta,
2020), a data collection methodology (Abid et al., 2017), and
dataset manipulation (Olusegun et al., 2023).
4.1.4. Multipurpose
The multipurpose objectives refer to work that addressed more
than one issue concurrently, such as analysis of a specific event,
and improvement of features or techniques. For example, the work
presented in Li and Wang (2021), Mendon et al. (2021), Mohamed
Ridhwan and Hargreaves (2021), Moraes Silva et al. (2022), and Qin
et al. (2022) proposed improved classification and feature
extraction techniques to enhance the quality of their models. Other
work proposed new frameworks that addressed several issues
pertaining to the domain of interest (Adamu et al., 2021; Garcia
and Berton, 2021; Kovács et al., 2021; Yu et al., 2021; Zhang et
al., 2021).
4.2. Domain of interest in public security
The second attribute of the taxonomy is the domain of the pub-
lic security work identified. Based on the survey, the domain was
divided into two: the natural and the non-natural security threat
domains. Sub-sections 4.2.1 and 4.2.2 briefly explain each group.
4.2.1. Natural
The domain of natural events is defined by the underlying natural
causes that trigger an event, and includes natural disasters and
public health crises, as provided in Table 2. Examples of the
natural disaster events that have been studied include earthquakes,
floods, wildfires, hurricanes, storms, rainstorms, tornadoes,
droughts, tsunamis, climate change and blizzards. Examples of
public health-related events include disease outbreaks, such as
Zika virus, monkeypox and avian influenza, as well as epidemics and
the COVID-19 pandemic. A few studies have also explored events that
span multiple categories. For example, the combination of
earthquakes and the swine influenza pandemic has been explored (Li
et al., 2017). Similarly, the joint occurrence of floods,
earthquakes, and the spread of influenza has been studied (Albayrak
and Gray-Roncal, 2019).
4.2.2. Non-natural (human-made)
Non-natural events are those caused by humans, that is, human-made
events. They fall into the following categories: emergencies, chaos,
crises and cyber or technology threats, as provided in Table 3.
The emergency category includes energy-related incidents, nuclear
accidents, electricity/ power issues, transportation/ airlines inci-
dents, emergency traffic issues, infrastructure collisions, crowd
accidents that triggered emergencies, water crises, and local public
emergency issues. The chaos category includes terrorism, riot, pro-
test and crime. The crises category includes: conflicts between
countries, immigrant issues and economic crises. The cyber and
technology threat category includes computer hacking, data pri-
vacy issues, and internet-related incidents. Mamta et al. (2020)
reported a study that encompassed multiple non-natural domains,
including terrorism, cyber-attacks, and crime. Recently, de Carvalho
and Seixas Costa (2021) reported a review of sentiment analysis in
the context of public security that concentrated on the conceptual
framework of sentiment analysis for public emergency events.
4.3. Public security event timeframe
The public security event timeframe refers to the time at which
sentiment analysis was applied relative to a public security event.
The timeframe is divided into pre-event, during-event, and
post-event. Fig. 4 depicts the chronology of this timeframe.
Sub-sections 4.3.1 to 4.3.3 explain each timeframe.
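The three-way split can be sketched as a small function that labels a post by comparing its timestamp with an event window. This is only an illustrative reading of the taxonomy; the event start and end dates and the function name `event_timeframe` are hypothetical, and in practice the onset and end of an event would have to be defined per event type.

```python
from datetime import datetime

def event_timeframe(post_time, event_start, event_end):
    """Label a post as pre-event, during-event, or post-event relative to an event window."""
    if post_time < event_start:
        return "pre-event"
    if post_time <= event_end:
        return "during-event"
    return "post-event"

# Hypothetical event window, chosen only for the example.
start = datetime(2023, 2, 6, 4, 0)
end = datetime(2023, 2, 20, 0, 0)
for ts in (datetime(2023, 2, 1), datetime(2023, 2, 10), datetime(2023, 3, 1)):
    print(ts.date(), event_timeframe(ts, start, end))
```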
4.3.1. Pre-event
The pre-event timeframe covers sentiment analysis and opinion
mining, related to natural and non-natural disasters or events,
prior to their occurrence. This timeframe has been the least
explored in the literature. It holds significant potential for
enabling governments and law enforcement agencies to remain
vigilant based on predicted potential threats or unrest. The idea
is to leverage previous and current datasets to analyze and predict
events, allowing for the implementation of necessary precautions or
preventive measures. For instance, Abid et al. (2017) proposed a
methodology to extract, annotate, and unify suspicious social media
content to predict potential threats to public security. They
focused on terrorism related to ISIS and Daesh using platforms such
as Twitter⁵, Facebook⁶ and YouTube⁷. Similarly, Almehmadi et al.
(2017) used public Twitter data to predict crime rates in Houston
and New York City. They collected offensive and non-offensive
tweets based on geographic location and analyzed the relationship
between tweet classification and crime rates. Similar work that
used geolocation data and sentiment analysis to monitor and predict
disasters was reported in Albayrak and Gray-Roncal (2019) and Sufi
and Khalil (2022).
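One simple way to quantify a relationship such as the one between offensive-tweet share and crime rate is a Pearson correlation over matching areas or periods. The sketch below is an assumption for illustration only and is not necessarily the analysis used in the cited study; the numbers are toy values and the function name `pearson` is hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)

# Toy values per district: share of tweets classified as offensive, and a crime rate.
offensive_share = [0.10, 0.15, 0.22, 0.30]
crime_rate = [3.1, 3.4, 4.0, 4.6]
print(round(pearson(offensive_share, crime_rate), 3))
```

A high coefficient on real data would indicate association only; it would not by itself show that tweet content predicts crime.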
4.3.2. During-event
The during-event timeframe is where sentiment analysis or
opinion mining is conducted in real-time during an ongoing event.
From the survey, the during-event timeframe is the second most
common area of study in social media sentiment analysis related
to public security. This approach is particularly useful for time-
sensitive issues with a large social media following, as it enables
Table 2
Natural domain’s event of related works.

| Domain | Natural disaster events | Related works |
|---|---|---|
| Natural disaster | Earthquake | Chen et al. (2019); Chenxi et al. (2022); Laudy et al. (2017); Xu et al. (2017); Yue et al. (2018) |
| Natural disaster | Flood | Choirul Rahmadan et al. (2020); Dudani et al. (2020); He et al. (2019); Karami et al. (2020) |
| Natural disaster | Wildfire | Astuti et al. (2023); Zander et al. (2023) |
| Natural disaster | Hurricane | Wu and Cui (2018); Yuan et al. (2020); Zhang et al. (2020) |
| Natural disaster | Storm | Theja Bhavaraju et al. (2019) |
| Natural disaster | Rainstorm | Zhang and Ma (2023) |
| Natural disaster | Tornado | Sadiq et al. (2020) |
| Natural disaster | Drought | Thorat and Namrata Mahender (2019) |
| Natural disaster | Tsunami | Chen et al. (2019) |
| Natural disaster | Climate change | Becken et al. (2022); Effrosynidis et al. (2022); Lydiri et al. (2023); Zeng (2022) |
| Natural disaster | Blizzards | Dong et al. (2021); Theja Bhavaraju et al. (2019) |
| Public health crises | Disease outbreaks | Hyo Jin et al. (2016); Olusegun et al. (2023); Qin and Ronchieri (2022); Su and Li (2018); Zhou and Zhang (2017) |
| Public health crises | Pandemics/ COVID-19 | Rathke et al. (2023); Samaras et al. (2023); Shi et al. (2021); Srikanth et al. (2022); Sukhwal and Kankanhalli (2022); Sun et al. (2023); Sun et al. (2022); Tsao et al. (2022); Wahid et al. (2021); Xia et al. (2022); Xie and Chen (2022); Yu et al. (2021); Zhang et al. (2022); Zhao et al. (2022); Zhuang et al. (2021) |
⁵ https://twitter.com/
⁶ https://www.facebook.com
⁷ https://www.youtube.com
real-time insights into public sentiment. For instance, analyzing
public sentiment during the 2011 Egyptian revolution in real-time
provided more valuable insights than retrospective data from
previous years. The amount of data that must be acquired before any
analysis can be performed depends on the event. For example, an
emergency event may necessitate the collection of social media data
from the event’s inception through to the recovery period, while
for disasters such as earthquakes, floods, and tsunamis, the
timespan could be shorter. In contrast, public health events, such
as disease outbreaks, require a longer timespan for data
collection. Once the data has been collected within the specified
timeframe, it can be analyzed to support relief and disaster
management efforts. Various studies have used the during-event
timeframe to analyze specific events, such as disease outbreaks and
the coronavirus pandemic (Garcia and Berton, 2021; Koytak and
Celik, 2022; Yu et al., 2021). Some reported work adopted a longer
timeframe, including data collected prior to the known occurrence
of an event, to fully utilize the available data for analysis
(Fakhry et al., 2020; Quach et al., 2022; Zhao et al., 2022).
4.3.3. Post-event
Based on the survey, post-event sentiment analysis or opinion
mining is the most prevalent type of study in the field of public
security. This approach involves analyzing data after a specific
event has occurred, either using the event date itself, or data from
an earlier time frame that triggered the public security threat. The
purpose of the analysis is usually to re-examine issues related to
public security and provide insight into what happened (Geeta,
2016; Li et al., 2016; Ma et al., 2016) and to prepare for future
endeavors (Alfarrarjeh et al., 2017; Wang et al., 2016). The
analysis also provides support for policy making (Chung and Zeng,
2016; de Carvalho and Costa, 2022) and management (Gascó et al.,
2017). Examples of events analyzed using post-event sentiment
analysis include immigration and border security breaches (Chung
and Zeng, 2016), shooting incidents (Wang et al., 2016), collision
accidents (Li et al., 2016; Ma et al., 2016), nuclear accidents (Li
et al., 2016), and hurricane and earthquake disasters (Alfarrarjeh
et al., 2017).
4.4. Social media platform
The fourth attribute of the recent work on social media sentiment
analysis and opinion mining for public security is the social media
platform used. Work can be grouped into that using a single social
media platform and that using a mix of multiple platforms. Most of
the recent work in the public security domain used a dataset from a
single social media platform. From the survey, it was found that
Twitter and Sina Weibo⁸ were the most utilized, followed by
YouTube, Facebook and other microblogs. Only a few instances of
reported work used multiple platforms, or combined them with web
feeds. There were examples of work that combined: Twitter, Facebook
and web feeds (Backfried and Shalunts, 2016), Twitter, Facebook and
YouTube (Ghanem et al., 2021), Twitter and Flickr⁹ (Alfarrarjeh et
al., 2017), Twitter and Reddit¹⁰ (Qi et al., 2019), Twitter and
ReliefWeb¹¹ (Li et al., 2017), Twitter and Sina Weibo (Ma et al.,
2020), Twitter/ Sina Weibo and online news feeds (de Carvalho and
Costa, 2022; Xu et al., 2017; Yu et al., 2020), and Sina Weibo and
WeChat¹² (Lian et al., 2020; Liu et al., 2021). Among these
platforms, Twitter was found to be the most
Table 3
Non-natural domain’s event of related works.

| Domain | Non-natural disaster events | Related works |
|---|---|---|
| Emergency | Energy incident | Gangadhari and Khanzode (2021); Pu et al. (2022); Tan et al. (2021); Wu and Lu (2017) |
| Emergency | Nuclear accidents | Gu et al. (2021); Hasegawa et al. (2020); Li et al. (2016) |
| Emergency | Electricity/ power issue | Li et al. (2020) |
| Emergency | Transportation/ airlines incidents | Jiang et al. (2017); Lee et al. (2020); Li and Wang (2021); Li et al. (2016); Ma et al. (2016); Moraes Silva et al. (2022); Yin et al. (2017); Zhou and Jing (2020) |
| Emergency | Emergency traffic issue | Alomari et al. (2020); Chen and Ji (2021); Kabbani et al. (2022); Li and Liu (2020) |
| Emergency | Infrastructure collisions | Gu et al. (2021); Subramaniyaswamy et al. (2017) |
| Emergency | Crowd accidents | Daou (2021); Zhang et al. (2020) |
| Emergency | Water crises | Xiong et al. (2020) |
| Emergency | Local public emergency | Cao (2020); Duan et al. (2020); Zhong (2021); Zhou (2021) |
| Chaos | Terrorism | Abid et al. (2017); An et al. (2019); An et al. (2021); Chaudhary and Bansal (2021); Chouhan (2021); El Ali et al. (2018); Geeta and Niyogi (2016); Kostakos et al. (2018); Smith et al. (2018) |
| Chaos | Riot | Gascó et al. (2017) |
| Chaos | Protest | Kovács et al. (2021); Qi et al. (2019) |
| Chaos | Crime | Almehmadi et al. (2017); de Carvalho and Costa (2022); de Carvalho et al. (2020); Li et al. (2021); Li and Wang (2022); Prathap and Ramesha (2019); Wang et al. (2016); Zhang et al. (2019) |
| Crises | Countries’ conflict | Al-Agha and Abu-Dahrooj (2019); Backfried and Shalunts (2016); Lee and Nerghes (2017); Litvak et al. (2016) |
| Crises | Immigrant | Arias et al. (2022); Chung and Zeng (2016); Koytak and Celik (2022); Lee and Nerghes (2018) |
| Crises | Economic crises | Chandra et al. (2020); Kumar (2018) |
| Cyber/ technology | Computer hacking | Abusaqer et al. (2023); Sliva et al. (2019) |
| Cyber/ technology | Data privacy | Al-khateeb and Agarwal (2016) |
| Cyber/ technology | Internet-related incident | Wang et al. (2020) |
[Figure panels, in chronological order: Pre-event (before event), During-event (during event), Post-event (after event)]
Fig. 4. Public security event timeframe chronology.
⁸ https://us.weibo.com
⁹ https://www.flickr.com
¹⁰ https://www.reddit.com
¹¹ https://reliefweb.int
¹² https://www.wechat.com
commonly used, followed by Sina Weibo. One of the reasons for their
popularity is their real-time information dissemination and data
acquisition tools (Al-khateeb and Agarwal, 2016; Bai and Yu,
2016). Platforms that prohibit data acquisition, such as Facebook,
are less popular. It is argued that a combination of social media plat-
forms is required to improve sentiment analysis system performance
and present the actual sentiment or opinion of the public on a larger
scale (Dahal et al., 2019; Mendon et al., 2021; Mohamed Ridhwan
and Hargreaves, 2021).
4.5. Dataset
The fifth attribute of the taxonomy pertains to the type of dataset
used in sentiment analysis and opinion mining for public security.
The dataset attribute can be grouped into two types: (i) public
datasets, and (ii) private datasets. Public datasets, as the name
suggests, are in the public domain. They are important because they
allow researchers to compare results. They are also useful with
respect to transfer learning (Anuratha et al., 2021; Bansal et al.,
2021). However, due to the shortage of public datasets, the
majority of the work on sentiment analysis and opinion mining has
been conducted using private datasets. Furthermore, researchers
prefer to acquire their own private dataset in order to achieve a
specific objective or conduct an analysis based on a specific
event. Table 4 presents a summary of accessible public datasets
used for sentiment analysis and opinion mining in the public
security domain. Existing public datasets include: geotagged tweets
from within the United States (Pfeffer and Morstatter, 2016), the
Hurricane Harvey Twitter Dataset (Phillips, 2017), tweets on the
Nepal and Italy earthquakes (Basu et al., 2019), the international
disaster dataset (Guha-Sapir and Below, 2016), tweets on climate
change
Table 4
Summary of public datasets used for sentiment analysis and opinion mining in public security domain.

| Author | Domain | Description | Link |
|---|---|---|---|
| Imran et al. (2013) | Natural disaster | Human-labeled tweets collected during 2012 Hurricane Sandy and 2011 Joplin tornado | https://crisisnlp.qcri.org/# |
| Alam et al. (2018) | Natural disaster | Human-labeled tweets collected from 2015 Nepal earthquake and 2013 Queensland floods | https://crisisnlp.qcri.org/# |
| Pfeffer and Morstatter (2016) | Natural disaster | United States geotagged Twitter daily posts from June to November 2014 and 2015 | https://search.gesis.org/research_data/SDN-10.7802–1166?doi =https://doi.org/10.7802/1166 |
| Zhang and Cheng (2021) | Natural disaster | Internet archive database for historical tweets using hashtags related to Typhoon Haiyan from October to December 2013 | https://archive.org/ |
| Phillips (2017) | Natural disaster | Hurricane Harvey Twitter dataset from August to September 2017 | https://digital.library.unt.edu/ark:/67531/metadc993940/ |
| Basu et al. (2019) | Natural disaster | Tweets collected during Nepal Earthquake 2015 and Italy Earthquake 2016 | https://zenodo.org/record/2649794#.YzL5iXZByv8 |
| Qian (2019) | Natural disaster | Tweets related to climate change sentiment dataset from April 2015 to February 2018 | https://www.kaggle.com/datasets/edqian/twitter-climate-change-sentiment-dataset |
| Effrosynidis et al. (2022) | Natural disaster | Climate change Twitter dataset from June 2006 to October 2019 | https://data.mendeley.com/datasets/mw8yd7z9wc/ |
| Li and Sun (2023) | Natural disaster | Twitter public opinion comments about natural disasters from August 2013 to June 2014 | https://figshare.com/articles/dataset/tweets_csv_gz/3465974/2 |
| Lee et al. (2020) | Emergency | Mixed platform posts related to Sewol Ferry Disaster from April 2014 to March 2015 | https://www.frontiersin.org/articles/10.3389/fpsyt.2020.505673/full |
| Moraes Silva et al. (2022) | Emergency | Twitter US Airline Sentiment in February 2015 | https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment |
| Al-Agha and Abu-Dahrooj (2019) | Chaos | Tweets related to terrorism keywords from December 2015 to 2016 in the United States and European countries | https://github.com/odahroug2010/2017 |
| Berhoum et al. (2023) | Chaos | Tweets originating from Pro-ISIS supporters since November 2015 Paris Attack | https://www.kaggle.com/datasets/fifthtribe/how-isis-uses-twitter |
| Rudra et al. (2015) | Multi-domain | Tweets related to 5 different events of public security domain | https://cse.iitkgp.ac.in/krudra/disaster_dataset.html |
| Guha-Sapir and Below (2016) | Multi-domain | International Disaster Database for public access (multi-domain) | https://www.emdat.be/ |
| Mamta et al. (2020) | Multi-domain | Tweets related to public security domain keywords from January to April 2019 | https://www.iitp.ac.in/ai-nlp-ml/resources.html |
| Dimitrov et al. (2020) | Public health | Semantically annotated corpus of COVID-19-related tweets | https://data.gesis.org/tweetscov19/ |
| Lyu et al. (2020) | Public health | Weibo labeled and unlabeled sentiment posts about COVID-19 from January to May 2020 | https://github.com/COVID-19-Weibo-data/COVID-19-sentiment-analysis-dataset-Weibo |
| Kumar, S. (2020) | Public health | COVID-19-related sentiment labeled dataset collected in India from March to July 2020 | https://www.kaggle.com/datasets/surajkum1198/twitterdata |
| Zhang and Xu (2021) | Public health | Emotional dataset during epidemic period | https://www.datafountain.cn/datasets |
| Chen et al. (2020) | Public health | Multilingual COVID-19 Twitter dataset collected from January to March 2020 | https://github.com/echen102/COVID-19-TweetIDs |
| Lamsal (2020) | Public health | Daily updated COVID-19 tweets dataset | https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset |
| Lamsal (2020a) | Public health | Daily updated COVID-19 geo-tagged tweets dataset | https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset |
| Mathayomchan et al. (2022) | Public health | Pandemic tweets dataset mentioning the SEA countries from January 2020 to July 2021 | https://github.com/viriyatae/pandemictweets |
| Banda et al. (2021) | Public health | Large scale COVID-19 Twitter dataset from January 2020 to June 2021 | https://zenodo.org/record/7179601#.Y0Th43ZByv8 |
| Imran et al. (2022) | Public health | Two billion multilingual tweets related to the COVID-19 pandemic from February 2020 to March 2021 | https://crisisnlp.qcri.org/tbcov |
(Effrosynidis et al., 2022; Qian, 2019), and an emotion dataset of
internet users during the epidemic period from DataFountain
(Zhang and Xu, 2021). Most of the public datasets available for
the public health domain are COVID-19-related (Banda et al.,
2021; Dimitrov et al., 2020; Kumar, 2020; Lamsal, 2020a;
Lamsal, 2020; Lyu et al., 2020; Pfeffer and Morstatter, 2016). Some
authors have shared their datasets with the public. Examples
include datasets on: the refugee crisis (Lee and Nerghes, 2017), the
Palestinian-Israeli conflict (Al-Agha and Abu-Dahrooj, 2019), the
Sewol Ferry Disaster (Lee et al., 2020), terrorism and cyber security
(Mamta, 2020) and the Typhoon Haiyan disaster (Zhang and
Cheng, 2021). Recent efforts have also utilized public datasets for
multi-domain purposes (Mamta, 2020; Rudra et al., 2015).
4.6. Language
The sixth attribute of the taxonomy pertains to the language of
the dataset used. The language attribute can be grouped into three:
(i) monolingual, (ii) bilingual, and (iii) multilingual. A monolingual
dataset is composed of text in a single language, while a bilingual
dataset comprises two languages, and a multilingual dataset
includes three or more languages. In terms of pre-processing and
feature extraction support, with respect to the tools available and
the source of the lexicon, monolingual datasets have an advantage over the
others. To fully exploit this benefit, some reported work has
excluded text in languages other than the preferred language, such
as keeping only English tweets and discarding others (Almehmadi
et al., 2017; Subramaniyaswamy et al., 2017). However, removing
data in other languages may hamper sentiment detection, because the
discarded text could carry critical information about the disaster or
event (Li et al., 2016). The most common language used in the
public security domain is English, followed by Chinese. Other languages
include Indonesian, Arabic, Hindi, Japanese, Korean, Bangla,
Telugu, Vietnamese, German and Portuguese.
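The grouping rule above, together with the English-only filtering step that some studies apply, can be illustrated with a minimal sketch. It assumes that each post already carries a language label; the function names `language_group` and `keep_only` and the toy posts are hypothetical and are not taken from any surveyed work.

```python
def language_group(languages):
    """Return the taxonomy group for a dataset, given its language labels."""
    distinct = {lang.strip().lower() for lang in languages if lang.strip()}
    if not distinct:
        raise ValueError("at least one language label is required")
    if len(distinct) == 1:
        return "monolingual"
    if len(distinct) == 2:
        return "bilingual"
    return "multilingual"  # three or more languages


def keep_only(posts, preferred="en"):
    """Keep the text of posts in one language and discard the rest."""
    return [text for lang, text in posts if lang == preferred]


# Toy (language, text) pairs, written by hand for illustration only.
posts = [("en", "flood warning issued"), ("ms", "banjir di sini"), ("en", "stay safe")]
print(language_group(lang for lang, _ in posts))  # bilingual
print(len(keep_only(posts)))                      # 2
```

In this toy case the filter discards one of three posts, which is the kind of information loss that the paragraph above warns about.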
Bilingual datasets typically comprise English combined with
another language, such as Chinese, Filipino, Malay, Hindi, German,
Portuguese, and Spanish. In existing works, translation is often per-
formed from one language into another, such as translating written
text in Filipino to English (Zhang and Cheng, 2021) and Malay to
English (Azmi et al., 2022), before the sentiment analysis is con-
ducted. Some reported work has combined a language with
another translated language to maximize the available resources
for representing features (Fuadvy and Ibrahim, 2019). However,
bilingual datasets pose a challenge when different languages are
mixed within a single sentence or comment (Farzindar and Inkpen, 2020),
referred to as “code-switched” or “code-mixed” data. This type of
data presents a new research challenge in terms of language filtering,
preprocessing, and feature representation for each
language, while maintaining contextual meaning. Examples of
work that uses code-mixed datasets are Malay with English
(Fuadvy and Ibrahim, 2019), Filipino with English (Zhang and
Cheng, 2021), Indonesian with English (Yan et al., 2020), and Nige-
rian with English (Adamu et al., 2021).
Sentiment analysis and opinion mining for public security
have also been conducted in multilingual settings, including combinations
of: English, French, and German (Smith et al., 2018), English,
Arabic, French, and German (El Ali et al., 2018), and English
in conjunction with other European languages (Kovács et al.,
2021; Laudy et al., 2017). Additionally, there have been studies
that utilized multilingual datasets from 218 countries worldwide
(Imran et al., 2022). However, this reported work commonly used
separate datasets for each language and generated separate models
based on each dataset. In some countries, however, it was observed
that multiple languages are used within a single dataset instead (Suhaimin
et al., 2017; Suhaimin et al., 2019).
4.7. Sentiment analysis and opinion mining approach
The seventh and last attribute of the taxonomy is the approach
used to perform sentiment analysis and opinion mining. It is criti-
cal to select an appropriate approach to achieve the objectives
described in Section 4.3 and guide each methodology in completing
the analysis or mining. According to the taxonomy, the recent
approaches found in the surveyed work, in the public security
domain, can be divided into machine learning-based, lexicon-based,
hybrid, and manual coding groups. Subsections 4.7.1
to 4.7.4 present a detailed explanation of each group, and include
some surveyed works that have used each approach.
4.7.1. Machine learning-based
The machine learning method is divided into five groups: (i) supervised
learning, (ii) unsupervised learning, (iii) semi-supervised learning,
(iv) ensemble learning, and (v) deep learning. The following subsections
describe the details of each.
i Supervised learning
Supervised learning uses labeled data to learn and classify the
sentiment from opinionated text. It involves preprocessing textual
data and extracting features (Razali et al., 2021). The tasks include
the removal of special characters and leaving only specific, mean-
ingful words before features, such as Bag Of Words (BOW) and
Word2Vec features, are extracted. Recent work in the public secu-
rity domain has shown that supervised learning is the most popu-
lar approach. The algorithms that have been most frequently
employed include: Random Forest (RF), Support Vector Machine
(SVM), Naïve Bayes (NB), K-Nearest Neighbor (K-NN), Logistic
Regression (LR) and Maximum Entropy (ME). Researchers usually
experimented with multiple algorithms coupled with the
extracted features to find the best performing classifier (Wang
et al., 2016). Bai and Yu (2016) showed that RF with a Distribution
Representation of Words (DRW) performed the best in the detection
of messages related to disasters. Yue et al. (2018) found that
RF and SVM algorithms were the best in mapping different event
correlations with social media data. An et al. (2019) predicted
the influence of social media in the context of terrorist events with
multiple algorithms and concluded that LR algorithms produced
the best prediction model. The work presented in (Fakhry et al.,
2020) evaluated COVID-19 cases using social media data and its
geographic location, and found that NB produced the best perfor-
mance. Gupta et al. (2021) analyzed social media data for public
perception on COVID-19 issues using various classification models
and found that the SVM model performed the best.
Supervised learning has the advantage of using labeled or anno-
tated data to train the classifier, enabling the extraction of patterns
to provide more accurate predictions. However, a notable disadvantage
of this approach is that it is domain-dependent, which
means that the performance of the classifier is impacted if it
encounters data that is outside of its domain or was not seen during the
learning stage (An et al., 2019; To et al., 2017). A further disadvantage
is the effort required to create the training set for the inference
process (Gu et al., 2022; Xie and Chen, 2022).
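The supervised pipeline described above (clean the text, extract BOW features, train a classifier on labeled data) can be sketched in a few lines. The example below is an illustrative multinomial NB with add-one smoothing written with the Python standard library only; the class name, the toy training posts and their labels are assumptions made for this sketch and do not reproduce the setup of any cited study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text, drop special characters, and return word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written training set (illustrative only, not from a cited dataset).
train_texts = [
    "rescue teams saved many people",
    "great support from volunteers",
    "terrible flood destroyed homes",
    "people are scared and angry",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("volunteers saved homes"))  # positive
```

Because words outside the training vocabulary are ignored, a post from a different domain contributes no evidence at all, which is one concrete way the domain dependence noted above shows up.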
ii Unsupervised learning
The unsupervised learning approach does not require labeled
training data. It leverages hidden structures or semantic associa-
tions in unlabeled data and can be applied to text data without
manual intervention (Aggarwal and Zhai, 2012; Pedrycz and
Chen, 2016). Two common unsupervised approaches used in senti-
ment analysis and opinion mining for public security are clustering
and topic modeling. Clustering is the process of grouping data into
distinct classes or clusters so that items belonging to the same
cluster have a high degree of similarity while objects belonging
to separate clusters exhibit a high degree of variety (Yue et al.,
2019). Clustering has been employed to assess the impact of topics
in the context of predicted events (An et al., 2019). In sentiment
analysis, researchers have set thresholds for keyword frequency
in documents related to an event (Xu et al., 2017) or orientation
of the sentiment (Ma et al., 2016). In recent work, the K-Means,
K-Medoids and Density-Based Spatial Clustering of Applications
with Noise (DBSCAN) algorithms are the most commonly used. In
clustering public opinion on natural disasters, Mustakim (2021)
compared K-Medoids and DBSCAN and found that DBSCAN per-
formed better with a higher Silhouette Index (SI). In Bansal et al.
(2021) it was also found that DBSCAN outperformed other algo-
rithms in clustering multi-domain non-natural events. Topic mod-
eling has been achieved through Latent Dirichlet Allocation (LDA),
a commonly used model found in this survey. Li et al. (2016) proposed
an unsupervised topic sentiment model for the automatic classification
of Chinese social media data. Gu et al. (2021) used the
LDA model to extract emerging topics amongst Weibo users during
an emergency, and Cao et al. (2021) used the LDA model for categorizing
topics in social media posts during the Wuhan lockdown.
The advantage of unsupervised learning is that it does not require
labeled or annotated data. This approach also outperforms manual
selection of keywords for sentiment analysis in identifying
event-related keywords for public security purposes (Li et al., 2017).
However, unsatisfactory results have been reported in sentiment
analysis for non-dependent features or out-of-domain keywords
(de Carvalho and Costa, 2022; Li et al., 2016).
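As an illustration of clustering without labels, the sketch below runs a plain K-Means on posts represented by two keyword-frequency counts. The feature choice (counts of flood-related and virus-related keywords), the toy points and the deterministic initialisation at the first k points are assumptions made for this example; the surveyed work uses richer features and also K-Medoids and DBSCAN.

```python
def kmeans(points, k, iterations=20):
    """Plain K-Means on tuples of numbers; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Each post is (flood keyword count, virus keyword count); values are made up.
points = [(3, 0), (0, 4), (4, 1), (1, 3), (5, 0), (0, 5)]
assignment, centroids = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

No labels are used at any step; the two groups emerge only from the distances between posts, so the meaning of each cluster still has to be interpreted afterwards.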
iii Semi-supervised learning
Semi-supervised learning uses a small amount of labeled or
annotated data along with a larger amount of unlabeled data for
classification. However, this approach has been the least used in
the recent work considered in the review reported here due to
its lower classification performance (Sharma and Jain,
2020). Zhang et al. (2020) leveraged a semi-supervised SVM to
train a model using only a small training set of labeled data to classify
emotions in the emergency situation of Hurricane Irma.
Recently, Qin and Ronchieri (2022) utilized 20% of labeled data to
predict the classes of the remaining data and produce a classifica-
tion model for sentiment related to pandemic events.
Semi-supervised learning takes advantage of the small amount
of labeled data to automate the data labeling process and over-
come the expense of data labeling and annotation. However,
semi-supervised learning is not applicable to all sentiment analysis
problems, especially for data that require rule-based feature
extraction (Behl et al., 2021).
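One common way to use a small labeled set to label the rest of the data is self-training. The sketch below is a minimal illustration under stated assumptions: posts are already numeric feature vectors, the base classifier is a nearest-centroid rule, and a pseudo-label is accepted only when the gap between the two nearest centroids reaches a hypothetical `margin`. It is not the method of any study cited above.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """labeled: list of (vector, label) with at least two labels; unlabeled: vectors.
    Returns the enlarged labeled list and the vectors left unlabeled."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        by_label = {}
        for vec, lab in labeled:
            by_label.setdefault(lab, []).append(vec)
        centroids = {lab: centroid(vecs) for lab, vecs in by_label.items()}
        confident, remaining = [], []
        for vec in pool:
            ranked = sorted(centroids, key=lambda lab: sq_dist(vec, centroids[lab]))
            best, runner_up = ranked[0], ranked[1]
            gap = sq_dist(vec, centroids[runner_up]) - sq_dist(vec, centroids[best])
            (confident if gap >= margin else remaining).append((vec, best))
        if not confident:
            break  # nothing can be pseudo-labeled with enough confidence
        labeled.extend(confident)
        pool = [vec for vec, _ in remaining]
    return labeled, pool


# One-dimensional toy scores; two labeled seeds and four unlabeled posts.
seeds = [((0.0,), "negative"), ((10.0,), "positive")]
enlarged, leftover = self_train(seeds, [(1.0,), (9.0,), (4.0,), (5.0,)], margin=10.0)
print(len(enlarged), leftover)  # 5 [(5.0,)]
```

The ambiguous post at 5.0 stays unlabeled, which shows the trade-off: a stricter margin reduces wrong pseudo-labels but leaves more data for manual annotation.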
iv Ensemble learning
The idea of the ensemble approach is to combine outputs from multiple
base learners to produce a single output that yields better classification
performance (Guyon et al., 2008). Among the most
commonly implemented types of ensembles are bagging and
boosting, in addition to stacking and RF (Bhosale and Patnaik,
2023; Hung et al., 2015). Bagging trains each base learner on a different
bootstrap sample taken from the input data, while boosting
builds a model incrementally by increasing the weights of the data
misclassified by the base classifiers (Zong et al., 2021). Chouhan
(2021) developed a gradient boosting classifier to differentiate
types of non-situational tweets during the Pulwama Terrorist
Attack. Tsai and Wang (2020) proposed an ensemble method by
combining pre-trained and early-trained models for COVID-19-related
tweet analysis. Mamta et al. (2020) performed sentiment
analysis on multi-domain sentiment corpora of crime, emergency,
terrorism and cyber security using an ensemble of Convolutional
Neural Network (CNN), Long Short-Term Memory (LSTM), and
Gated Recurrent Unit (GRU).
Ensemble learning presents a robust strategy for addressing
challenges associated with multiple learning models by facilitating
the creation of intelligently integrated models (Bhosale and
Patnaik, 2023). It has been shown to improve sentiment analysis
performance in recent work, with boosting generally outperforming
bagging (Chouhan, 2021; Tsai and Wang, 2020; Wang et al.,
2016). However, ensemble methods require a deeper understanding
of the dataset for feature extraction and may require complex
architectures (Mamta, 2020).
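The bagging idea described above (train each base learner on its own bootstrap sample, then combine the outputs) can be sketched as follows. The base learner here is a deliberately simple word-to-label counter invented for this example, and the function names, seed and toy posts are assumptions; real systems in the surveyed work use far stronger base learners such as CNN, LSTM or gradient-boosted trees.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, as bagging does for each learner."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine base-learner labels; ties go to the label that appears first."""
    counts = Counter(predictions)
    top = max(counts.values())
    return next(label for label in predictions if counts[label] == top)


def fit_word_counter(sample):
    """Toy base learner: remembers which label each training word appeared with."""
    table = {}
    for text, label in sample:
        for word in text.lower().split():
            table.setdefault(word, Counter())[label] += 1

    def predict(text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(table.get(word, {}))
        return votes.most_common(1)[0][0] if votes else "unknown"

    return predict


def bagged_predict(train, text, fit=fit_word_counter, n_learners=5, seed=0):
    """Train n_learners on bootstrap samples and return the majority label."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(train, rng)) for _ in range(n_learners)]
    return majority_vote([model(text) for model in models])


# Hand-written toy posts, for illustration only.
train = [
    ("everyone is safe now", "positive"),
    ("shelter is safe and open", "positive"),
    ("danger on the main road", "negative"),
    ("danger from rising water", "negative"),
]
print(bagged_predict(train, "is the road safe"))
```

Boosting would differ in the training loop rather than in the vote: instead of independent bootstrap samples, each new learner would be trained with more weight on the posts that earlier learners misclassified.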
v Deep learning
Deep learning is a subset of machine learning that employs
deep neural networks to learn data representations (Razali
et al., 2021; Yue et al., 2019). Deep learning can be utilized in
supervised, partially supervised, or unsupervised learning settings
(Gupta and Agrawal, 2020). In machine learning methods, deep
learning offers solutions to complex problems by deriving insight-
ful knowledge from straightforward representations. The promi-
nence of neural network approaches is largely due to their ability
to learn precise representations, a critical structural feature
(Bhosale and Patnaik, 2022). In the field of sentiment analysis
and opinion mining for public security, researchers have shown a
growing interest in applying deep learning techniques. For
instance, Gu et al. (2021) proposed a Bi-Directional Long Short-
Term Memory (Bi-LSTM) to identify a positive, negative, or neutral
sentiment from social media activities during emergency events.
Pran et al. (2020) compared CNN and LSTM for sentiment analysis
towards COVID-19 using the Bangla language and found that CNN
performed better in accuracy. Chen et al. (2021) experimented
with a deep transformer network, namely the Bidirectional Encoder
Representations from Transformers (BERT), with a Neural Network
(NN) for sentiment classification both before and after the
COVID-19 pandemic. Recently, Zhang et al. (2022) fine-tuned the
BERT-inspired model, ERNIE, on pandemic outbreak datasets to
achieve better classification performance.
The deep learning approach has the advantage of producing
better classification performance results using the coarse-grained
and deep features of the dataset (Chandra et al., 2020; Garcia
and Berton, 2021). The deep learning approach, specifically in
domain-related tasks, can perform better with a few adjustments
made at the feature extraction level (Lin and Moh, 2021). However,
the disadvantages of deep learning include the large amount of
data required to generate the model (Behl et al., 2021; El Ali
et al., 2018) and the high cost of training to build the model (Lin
and Moh, 2021; Sadiq et al., 2020).
4.7.2. Lexicon approaches
Based on the recent work surveyed, another approach for senti-
ment analysis and opinion mining for public security is the
lexicon-based approach. The lexicon-based approach is divided into two groups:
dictionary-based and corpus-based. The following subsections
describe the details of each approach.
i Dictionary-based
Dictionary-based lexicon approaches utilize a dictionary that
integrates the polarity of the terms. When a word is discovered
in a text, a lookup in the dictionary is conducted, and then the sentiment
score is calculated. From the survey, many dictionaries have
been used, such as the Valence Aware Dictionary and Sentiment Reasoner
(VADER) (Hutto and Gilbert, 2014), WordNet (Miller, 1995),
SentiWordNet (Baccianella et al., 2010), SentiStrength (Thelwall,
2017), Linguistic Inquiry and Word Count (LIWC) (Pennebaker
et al., 2001), AFINN (Nielsen, 2011), Word-Emotion Association
Lexicon (EmoLex) (Mohammad and Turney, 2013), and TextBlob
(Loria et al., 2014). For Chinese sentiment analysis during the
COVID-19 pandemic, BosonNLP (Min et al., 2015), a sentiment dictionary
constructed from labeled Weibo, news and forum data, was
used (Yu et al., 2020). Additionally, HowNet (a large-scale knowledge
database of Chinese) was used for feature engineering to feed
into a deep learning model for emergency event detection (Li and
Wang, 2021).
The advantage of the dictionary-based method is that it is sim-
ple to implement and does not rely on labeled or annotated data
(Becken et al., 2022). However, the sentiment analysis performance
of a dictionary-based method is limited by the quality of the rules
that integrate the polarity and lexicons in the dictionary (Chen and
Ji, 2021; Li et al., 2020; Zhang and Cheng, 2021). Additionally, the
coverage of social media words in existing dictionaries is often
incomplete and requires expansion (Pan et al., 2022; Qin et al.,
2022).
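The lookup-and-score procedure described above can be sketched with a small hand-made dictionary. The lexicon entries and their scores below are hypothetical and far smaller than any of the dictionaries named in this subsection; the sketch only shows the mechanism of summing term polarities.

```python
import re

# Hypothetical toy lexicon; real sentiment dictionaries cover far more terms.
TOY_LEXICON = {"safe": 2, "help": 1, "rescued": 2, "panic": -2, "damage": -2, "afraid": -1}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every token found in the dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def polarity(text, lexicon=TOY_LEXICON):
    """Map the summed score to a positive, negative or neutral label."""
    score = sentiment_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Everyone was rescued and is safe"))  # positive
print(polarity("Panic after the damage"))            # negative
print(polarity("smh this is lit"))                   # neutral
```

The last post is scored as neutral only because none of its slang terms are in the dictionary, which illustrates the coverage limitation for social media vocabulary mentioned above.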
ii Corpus-based
The corpus-based lexicon approach relies on co-occurrence pat-
terns and a seed list of words to find other sentiment words in a
large corpus (Liu, 2011). Hence the corpus-based approach
requires a large training corpus to compute the polarity of the
words (Agarwal and Mittal, 2016). In the survey presented in
(Zhou, 2021), microblog corpora were obtained for multiple emer-
gency events by retrieving event keywords and then measuring the
influence of the event using a calculated equation. The work pre-
sented in Abid et al. (2017) adopted diverse data sources to pro-
duce heterogeneous content to improve a corpus. The authors
later obtained other XML data structures from different social
media and unified them into a single structure, targeting the detec-
tion of suspicious content circulated on social media that could
threaten public security. Jiang et al. (2017) proposed the Word
Emotion Association Network (WEAN) to compute the emotion
of words based on nouns, verbs, adjectives and cyber words in
the corpus acquired. The approach produced better sentiment analysis
performance when applied to the emergency event of the
Malaysia Airlines MH370 dataset from Sina Weibo. Thorat and
Namrata Mahender (2019) created a corpus based on drought syn-
onym tweets and later generated patterns for predicting tweet
polarity.
The advantage of the corpus-based approach is that it provides
better performance for the specific domain for which it is devel-
oped (Abid et al., 2017; Thorat and Namrata Mahender, 2019).
However, it requires a large amount of data to build an informative
corpus (Sufi and Khalil, 2022; Theja Bhavaraju et al., 2019). Fur-
thermore, the performance of sentiment analysis in a domain-
specific corpus is usually low if transferred to other domains
(Zhou, 2021; Zong et al., 2021).
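The role of seed words and co-occurrence patterns can be illustrated with a simplified scoring rule: a word that tends to share a document with positive seeds receives a positive score, and vice versa. The smoothed log ratio used below, the seed lists and the toy corpus are assumptions chosen for this sketch; it is not the equation of Zhou (2021), WEAN, or any other cited method.

```python
import math
from collections import Counter


def induce_polarity(corpus, pos_seeds, neg_seeds):
    """Score each non-seed word by how often it shares a document with
    positive seeds versus negative seeds (smoothed log ratio)."""
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    with_pos, with_neg = Counter(), Counter()
    for doc in corpus:
        words = set(doc.lower().split())
        has_pos = bool(words & pos_seeds)
        has_neg = bool(words & neg_seeds)
        for w in words - pos_seeds - neg_seeds:
            with_pos[w] += has_pos
            with_neg[w] += has_neg
    vocab = set(with_pos) | set(with_neg)
    # Add-one smoothing keeps the ratio finite for words seen on one side only.
    return {w: math.log((with_pos[w] + 1) / (with_neg[w] + 1)) for w in vocab}


# Tiny hand-written corpus; a usable lexicon needs a far larger one.
corpus = [
    "good rescue effort",
    "good shelter opened",
    "bad flooding downtown",
    "bad flooding again",
    "rescue was good",
]
scores = induce_polarity(corpus, pos_seeds={"good"}, neg_seeds={"bad"})
print(scores["rescue"] > 0, scores["flooding"] < 0)  # True True
```

With only five documents, function words such as "was" also pick up a polarity by chance, which is one reason the approach needs a large corpus to be informative.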
4.7.3. Hybrid
From the literature, some of the reported work has adopted a
hybrid approach to analyze sentiment concerning public threats,
which combines the machine learning and lexicon-based
approaches. This method was developed to compensate for the
shortcomings of machine learning and lexicon-based approaches
when used separately, and is the most preferred approach after
supervised learning. The reported work directed at hybrid
approaches tends to leverage lexicons to extract features and use
machine learning to classify the sentiment, rather than the calcula-
tion of the polarity based solely on the lexicon. For example, Akbar
et al. (2021) extracted a feature set based on the corpus lexicon of
COVID-19 lockdown public opinion before applying a series of
supervised machine learning classifiers to analyze the sentiment.
Bernoulli NB achieved the best results among the
classifiers. Recently, Pan et al. (2022) expanded the corpus-based
lexicon for the Weibo COVID-19 dataset before experimenting with
several machine learning algorithms. The RF showed the highest
classification performance among all tested algorithms.
The advantage of the hybrid approach is that it improves the
performance of the sentiment analysis compared to using
either the lexicon or machine learning approach alone (An et al.,
2019; Chandra et al., 2020; Garcia and Berton, 2021; Zhang et al.,
2019). However, the hybrid approach requires a complex framework
or structure to implement (Chen et al., 2019; Khatua et al.,
2020; Tang et al., 2022; Xia et al., 2020).
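The division of labour in a hybrid approach (the lexicon supplies features, a learned classifier makes the decision) can be sketched as follows. A perceptron is used here purely for brevity, in place of the Bernoulli NB and RF classifiers reported above, and the lexicon, feature layout and toy posts are hypothetical.

```python
# Hypothetical toy lexicon: +1 marks positive terms, -1 marks negative terms.
TOY_LEXICON = {"safe": 1, "relief": 1, "recovered": 1, "fear": -1, "death": -1, "shortage": -1}


def lexicon_features(text):
    """Map a post to (positive hits, negative hits, bias term)."""
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if TOY_LEXICON.get(t, 0) > 0)
    neg = sum(1 for t in tokens if TOY_LEXICON.get(t, 0) < 0)
    return (pos, neg, 1)


def train_perceptron(samples, epochs=10):
    """samples: list of (text, label) pairs with label +1 or -1."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in samples:
            x = lexicon_features(text)
            activation = sum(w * xi for w, xi in zip(weights, x))
            if label * activation <= 0:  # misclassified: nudge the weights
                weights = [w + label * xi for w, xi in zip(weights, x)]
    return weights


def classify(text, weights):
    x = lexicon_features(text)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1


# Hand-written toy posts, for illustration only.
samples = [
    ("patients recovered and feel safe", 1),
    ("relief arrived today", 1),
    ("fear of shortage", -1),
    ("death toll rises", -1),
]
weights = train_perceptron(samples)
print(weights)                                  # [2.0, -2.0, 0.0]
print(classify("safe and recovered", weights))  # 1
print(classify("fear and death", weights))      # -1
```

Unlike the purely dictionary-based calculation, the weight given to positive and negative hits is learned from the labeled posts rather than fixed in advance.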
4.7.4. Manual coding
Manual coding is an approach that relies on human coders to
read and code subsampled datasets using pre-defined rules (Li
et al., 2016). The particular rules can be performed repeatedly or
differently. For example, manual coding was performed separately
for positive statements and negative statements (Gascó et al.,
2017). The manual codes were then verified before being assigned,
and were then used for training and testing using a larger dataset
(Henríquez-Coronel et al., 2019).
The advantage of manual coding is that specific rules can be
applied to produce a better performance analysis system. Despite
being expensive, this approach has the potential to overcome the
problem of scalability and replicability by providing deeper
insights into each task (Van Atteveldt et al., 2021). The disadvantage
is that it is time consuming and not easily transferable to other
problems or domains.
Fig. 5. (a) The trend of published work for the objectives in social media sentiment analysis and opinion mining for public security in recent years and (b) the distribution of the objectives in recent years.
5. Analysis of trend, issues and future directions of sentiment
analysis and opinion mining for public security
This section presents the trend analysis of work considered in
this survey. The trend for each of the aforementioned attributes
is presented with the goal of demonstrating the progress of the
work and the direction in which the researchers have moved over
the last seven years. Based on the selected works in this survey, a
year is included only if it has at least five articles, which gives
a clearer trend visualization. This threshold was also chosen to
balance the representativeness of articles against the practical limitations
of the considered work (Groves et al., 2011), and to allow
meaningful conclusions about trends or patterns in the data
(Cohen, 1988). The demonstrated trends may be useful in determining
the direction of future research in this domain. Sections 5.1
to 5.7 present the analysis of the trends within the work considered
in this survey. Section 5.8 identifies some remaining issues
that the authors deem pertinent and describes possible future
directions for research in social media sentiment analysis and
opinion mining in public security.
5.1. Trend analysis of sentiment analysis and opinion mining
objectives for public security
Fig. 5a depicts the trend associated with the objectives attri-
bute. The analysis of events and improvement of techniques have
emerged as the primary areas of interest within the research com-
munity, with the number of publications focused on these objec-
tives showing an upward trend since 2019. This trend could be
caused by the COVID-19 pandemic, whereby most work focused
on efforts to understand events and explore techniques for mitigat-
ing them. Meanwhile, the number of multipurpose publications
has slightly increased since 2020, while corpus generation remains
limited over time.
Fig. 5b depicts the distribution of the objectives from 2016 to
2023. Analysis of events and technique improvement are the most
common, accounting for 41% and 39% of the existing work respec-
tively. This observation demonstrates that researchers were inter-
ested in understanding events by analyzing them and proposing
techniques to improve the detection or the prediction ability of
models. The distribution of multipurpose objectives came third at
16%. In contrast, corpus generation is the least common objective,
because most of the published work considered used existing corpora
to validate and verify the proposed techniques (Backfried
and Shalunts, 2016; Kostakos et al., 2018; Wang et al., 2016).
5.2. Trend analysis of domain of interest in public security
Fig. 6a depicts the trend of work that concentrated on public
security in the natural and non-natural domain. From the figure,
compared to other domains, it can be seen that the natural disaster
domain increased linearly until 2020, before it decreased. Mean-
while, the public health domain has experienced a sudden increase
from 2020 until recently. This is due to the sudden demand for
research on the COVID-19 pandemic, which is reflected in the public
health domain surge from 2020. Compared to other domains, the
non-natural domain of emergency demonstrated a rapid increase
from 2019 to 2020. This category’s focus groups included emer-
gency situations, accidents, and energy incidents. However, it
decreased in 2021 and 2022, while chaos and multi-domain inter-
ests increased, indicating a desire to combine multiple domain
types for the analysis of multiple events.
Fig. 6b depicts the distribution of recent work in the public
security domain. Public health and natural disasters have dominated
the recent work, accounting for 32% and 25% of the total
work respectively. The emergency domain follows with 16%,
and chaos with 13%. The relatively high
percentage of works in the emergency domain could be due to
the frequent sharing of data related to emergency events on social
media. Crisis comes in fifth place with 5%, followed by multi-domain
with 4%. The cyber and technology domain has the second
lowest percentage at 3%, which could be because only a small number
of people directly involved in the cyber security domain share
or discuss related topics. However, the number of published works
that integrate both natural disaster and public health remains limited,
comprising only 2% of the total. This could be due to a lack of
work that connects the two domains based on public reaction.
Moreover, combining both domains also leads to large-scale data
coverage (Albayrak and Gray-Roncal, 2019; Li et al., 2017).
Fig. 6. (a) The trend of recent published work in the non-natural public security domain and (b) the distribution of the works in the domain.
5.3. Trend analysis of event timeframe
Fig. 7a depicts the trend of the event timeframe attribute. From
the figure, it can be seen that the post-event timeframe attracted
the most interest up until 2020, when the during-event timeframe
featured a significant increase. This increase is caused by the emergence
of the COVID-19 pandemic, which led to a shift in research
attention towards the existing pandemic-focused public health
domain. The trend of work concerning the pre-event timeframe
remains low throughout the sample period considered.
Fig. 7b depicts the distribution of published work across differ-
ent timeframes. In recent years, a high number of publications
have been dedicated to the during-event and post-event time-
frames, which account for 50% and 36%, respectively. Meanwhile,
the number of publications that cover the multiple and pre-event
timeframes was low at 10% and 4% respectively. This indicates
that current work focused on addressing the public needs for a
specific event during or after its occurrence, with less emphasis on prediction
or early detection of events.
5.4. Trend analysis of social media platforms used
Fig. 8a shows the trend of the social media platforms used for
research in the public security domain. Twitter and Sina Weibo
have grown in popularity over the years for communication and
message dissemination. These platforms offer researchers an easy
way to acquire data, and several publicly available datasets har-
vested from these platforms have been made available by
researchers (Mathayomchan et al., 2022; Nagapudi et al., 2021).
The mixed platforms recorded similar popularity to Sina Weibo
but demonstrated a slight decrease starting in 2021. This
could be due to users’ preference to use a single social media platform
to share news.
Fig. 7. (a) The trend of published work in the event timeframe in public security in recent years and (b) the distribution of the work in recent years.
Fig. 8. (a) The trend of platforms used for published work in social media sentiment analysis and opinion mining for public security in recent years and (b) the distribution of the platform used in recent years.
Fig. 9. (a) The trend of the language of the dataset used for published work in social media sentiment analysis and opinion mining for public security in recent years and (b) the distribution of the language used in recent years.
Fig. 8b depicts the distribution of the social media platforms
used for sentiment analysis and opinion mining in public security.
Twitter was found to be the dominant platform, with 59% of the
published work considered. Sina Weibo comes in second with
25%, followed by mixed platforms with 11%. In contrast, Facebook,
YouTube, and other platforms are still underutilized for public
security data acquisition. This could be due to the low popularity
of these platforms for real-time discussion and the difficulty of
acquiring data from these platforms for research purposes.
5.5. Trend analysis of language of dataset used
Fig. 9a depicts the language of the datasets used in the context
of the considered published work in sentiment analysis and opin-
ion mining for public security. The use of monolingual datasets has
steadily increased over time. This trend can be attributed to the
wide availability of pre-processing approaches and resources for
feature engineering specific to monolingual data. Based on the
trend, the number of studies using bilingual and multilingual public security
datasets is low. However, it has been gradually increasing since 2020,
although interest remains relatively low compared to monolingual
work. This trend suggests a growing interest among
researchers in investigating sentiment analysis and opinion mining
across different languages, in particular in the bilingual context,
within the public security domain. It was found that bilingual stud-
ies were conducted because there was a need for the analysis of
compounding language in the dataset as a result of the uniqueness
of the language itself. This is particularly relevant as bilingual com-
ments are common among the public, making them a valuable
source of information for sentiment analysis and opinion mining.
Fig. 9b depicts the language distribution of the dataset used.
Monolingual datasets dominate the work surveyed, comprising
89% of the total. The bilingual and multilingual datasets remain
significantly low, at 7% and 4%, respectively.
5.6. Trend analysis of dataset type used
Fig. 10a depicts the trend in the types of datasets used with
respect to published work in sentiment analysis and opinion min-
ing for public security. Private datasets have exhibited a steady
increase in usage over the years, whereas the use of public datasets
has remained constant with a slight increase in recent years. This
trend suggests that researchers in the public security domain pre-
fer to use their own datasets for analysis and testing their senti-
ment analysis approach, which may raise concerns about the
generalizability of the output. The marginal increase in the usage
of public datasets in 2020 and 2021 can be attributed to the wide
availability and sharing of COVID-19 datasets for urgent global
analysis.
The distribution of dataset types used is illustrated in Fig. 10b.
The chart shows that the vast majority of the surveyed work,
77%, used private datasets, while only 23% used public datasets.
Despite the predominance of private datasets over public ones,
the public datasets that see the most use are predominantly from
the public health domain, followed by natural disaster. Fig. 11
shows the overall dataset usage distribution among related works
in the public security domain. Public health datasets constitute the
major chunk, amounting to 38% of overall use, followed by natural
disaster datasets at 35%. The most frequently used public health
dataset is the daily updated COVID-19 tweets dataset (Lamsal,
2020), while the Nepal Earthquake 2015 and Italy Earthquake
2016 datasets (Basu et al., 2019) top the list for natural disasters.
The popularity of these datasets can be attributed to their early
availability as large-scale public resources within their respective
domains, making them highly valuable for sentiment analysis
and opinion mining during and in the aftermath of specific events.
Fig. 10. (a) The trend of the dataset type used for published work in social media sentiment analysis and opinion mining for public security in recent years and (b) the distribution of the dataset type used in recent years.
Fig. 11. Distribution of public dataset usage by public security domain: public health 38%, natural disaster 35%, multi-domain 11%, emergency 8%, chaos 8%.
5.7. Trend analysis of approaches for sentiment analysis and opinion
mining in public security
Fig. 12a depicts the approaches evident from the considered
published work on sentiment analysis and opinion mining for public
security. The dictionary-based approach was found to be the
most preferred; the number of publications increased steadily over
time. Meanwhile, the rise in popularity of the hybrid and deep
learning approaches since 2020 suggests that researchers are taking
advantage of the availability of high-end processing hardware
capable of multitasking, which might not have been easily accessible
before. Supervised learning has been consistently used over the
years, while the unsupervised learning and other approaches have
seen slower growth.
Fig. 12b depicts the recent distribution of the approaches considered.
The dictionary-based and hybrid approaches were found to be the most
common, each at 27%. The deep learning approach ranks next with 18%,
followed by the supervised learning approach with 16%, corpus-based
learning with 5%, and unsupervised learning with 3%. The remaining
approaches, namely ensemble, semi-supervised, and manual coding,
account for smaller shares. This suggests that, among the available
algorithms, the ensemble and semi-supervised approaches are still
underutilized for sentiment analysis and opinion mining in public security.
5.8. Issues and future direction of social media sentiment analysis and
opinion mining for public security
This section presents the remaining issues, as well as potential
future directions, of research work in social media sentiment analysis
and opinion mining for public security. State-of-the-art social media
sentiment analysis for public security faces several issues that can be
identified from the trend analysis presented and the work highlighted
in this survey. The issues are presented in Subsection
5.8.1, and potential future work is presented in Subsection 5.8.2.
5.8.1. Issues
Based on the surveyed articles, there were five major issues
identified: (i) shortage of multi-class and different level analysis
approaches; (ii) insufficient availability of public security
domain-independent datasets; (iii) inadequate prediction based
on timeframe coverage; (iv) lack of supporting approaches for vari-
ations across different languages; and (v) limited information
availability.
(i) Shortage of multi-class and different level analysis
approaches: When dealing with multi-objective studies,
researchers can benefit from multi-level sentiment analysis
and opinion mining. Such scenarios encounter challenges
in multi-class classification (Bashar, 2022; Gupta et al.,
2021). Specifically, tri-class classification can be problematic,
as some data points may straddle the borders of multiple
categories (Bhullar et al., 2022; Garcia and Berton, 2021;
Gupta et al., 2021). Additionally, different levels of analysis,
such as fine-grained emotion analysis, may lead to low clas-
sification performance (Zhou and Zhang, 2017). Further-
more, addressing each problem may require distinct
techniques (Zhou and Jing, 2020).
(ii) Insufficient availability of public security domain-
independent datasets: As sentiment analysis and opinion
mining are increasingly applied to public security domains,
it is crucial to have a domain-independent dataset to ensure
reliable results (de Carvalho and Costa, 2022). However, the
availability of training and validation datasets for events
that threaten public security is minimal (Behl et al., 2021;
Zhou, 2021). The trend analysis of dataset types revealed a
lack of public dataset availability in the public security
domain, and recent work was found to often rely on datasets
from other domains for training (Fuadvy and Ibrahim, 2019;
To et al., 2017). This presents a challenge for sentiment analysis
and opinion mining models: although corpora from other domains are
typically larger than a domain-specific corpus, training on them can
lead to reduced performance on public security data.
(iii) Inadequate prediction based on timeframe coverage: The
published work that focused on trend analysis for automatic
detection or prediction at the pre-event timeframe, as
described in Section 4.2.1, is very limited. The detection of
an event’s starting point, such as a natural disaster, an emergency
event, or an outbreak of a disease or a pandemic, is the key factor
for early warning and forecasting of public threats (Lamsal et al.,
2022). Misinformation, which creates noise or false detections, was
found to often hinder prediction performance (Fang et al., 2022; Luna
et al., 2022; Sufi and Khalil, 2022).
(iv) Lack of supporting approaches for variations across different
languages: Language has become a major issue hampering the performance
of sentiment analysis and opinion mining models for public security.
Multilanguage variation across different regions (Dahal et al., 2019;
Geeta, 2016; Kostakos et al., 2018; Li et al., 2016), figurative
language such as irony and sarcasm (Andhale et al., 2021; Gupta
et al., 2021; Tsao et al., 2022), and non-English processing (Li
et al., 2016; Lin and Moh, 2021) are among the language-related
issues identified. Furthermore, the limited lexicon sources and
analysis approaches have served to further complicate this issue.
The problem is compounded by the limitations of feature extraction
techniques (Andhale et al., 2021; de Carvalho and Costa, 2022;
Mohamed Ridhwan and Hargreaves, 2021).
(v) Limited information availability: The unavailability of
comprehensive data was a major limitation found in the surveyed
studies, attributed to users withholding crucial information or to
restrictions imposed by social media platforms. Users refrained from
disclosing important details, such as the geographical location or
event-related references (Sattaru et al., 2021), while platform
restrictions, such as limited access to user profiles and data
acquisition timeframes, added to the scarcity of information. These
constraints within the message content have adversely impacted the
overall sentiment distribution (Alomari et al., 2020; Chowdhury et al.,
2022; Wu and Cui, 2018). Consequently, imbalanced sentiment
distribution has resulted in flawed analyses (Zhu et al.,
2020), limiting the accuracy and reliability of the findings.
Fig. 12. (a) The trend of the main approach used for social media sentiment analysis and opinion mining for public security in recent years and (b) the distribution of the approach used in recent years.
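As a rough illustration of issues (i) and (v), the sketch below shows how a tri-class labelling rule treats scores that sit near a class border and how the resulting class imbalance can be measured. It is not drawn from any of the surveyed works: the function names, the 0.05 neutral band, and the sample scores are all hypothetical, chosen only to make the border effect visible.

```python
from collections import Counter


def tri_class(score, band=0.05):
    """Map a polarity score in [-1, 1] to one of three labels.

    `band` is a hypothetical neutral margin around zero; scores whose
    magnitude does not exceed it are treated as neutral.
    """
    if score > band:
        return "positive"
    if score < -band:
        return "negative"
    return "neutral"


def imbalance_ratio(labels):
    """Largest class count divided by the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Made-up polarity scores for eight messages about a single event.
scores = [0.62, 0.04, -0.06, -0.71, 0.05, -0.48, -0.90, -0.33]
labels = [tri_class(s) for s in scores]

print(Counter(labels))
print(imbalance_ratio(labels))
```

With these made-up scores, the two values nearest zero on the positive side (0.04 and 0.05) fall into the neutral class while -0.06 just crosses into the negative class, and the negative class ends up five times larger than the positive one. Widening or narrowing the band moves borderline items between classes, which is one reason tri-class results can be hard to compare across studies.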
5.8.2. Potential future works
This sub-section outlines potential work for further advance-
ment in the field of social media sentiment analysis and opinion
mining, specifically in relation to public security. These projections
are based on the trend analysis and issue identification discussed
in the preceding section. The recommended future works focus
on intensifying analysis levels, improving the relevance of the
dataset, utilizing variations of features to enhance outbreak
detection, offering better language and lexicon support, and
mitigating information scarcity. It is anticipated that addressing
these focal points may tackle critical challenges in the public
security domain, thereby contributing substantively to the evolution
of sentiment analysis and opinion mining. The suggested future work is
enumerated as follows:
(i) More extensive multi-class and multi-level analysis: Multitasking
with multidimensional sentiment analysis and opinion mining could
address the lack of approaches tailored to the problem domain. It can
also ease the difficulties faced by researchers who need to analyze a
specific event quickly. According to the trend analysis of the
objectives of the published work considered, the number of studies
with multipurpose objectives is still low when compared to other
studies. This highlights a significant gap in the current research,
where the value of nuanced, detailed analysis is not being fully
realized. Future work should focus extensively on multi-class and
multi-level approaches, especially to address subtle class and level
differences. Pre-trained transformers with unique embeddings can be
experimented with, as they offer a superior classification approach
among language models for comprehensive multi-class and multi-level
analysis (Sun et al., 2022; Xia et al., 2022). Utilizing these advanced
computational tools can yield more detailed and nuanced sentiment
analysis, enabling the capture of subtleties in public
opinion that might otherwise go unnoticed.
(ii) Production of multi-domain public security datasets: As
noted earlier in this survey, much existing published work
used private datasets for specific events. Extra effort is
required to obtain a larger dataset that encompasses multiple
public security domains. The rationale for this is that larger
datasets provide more representative and diverse samples that can
help in improving the robustness and generalizability of the models.
Access to more data can substantially enhance the performance of
sentiment analysis and opinion mining models (Alfred and Obit, 2021).
Additionally, the
diversity of data will also serve as a catalyst for the develop-
ment of new relevant techniques. Having access to a wider
variety of data enables researchers to gain insights into dif-
ferent types of public sentiment, which can lead to the cre-
ation of more effective and tailored strategies for each sub-
domain.
(iii) Utilization of a greater variety of features and techniques for
automatic detection of outbreaks: Due to the limited data for
the pre-event timeframe, it is essential to employ diverse
features and prediction techniques to build better models.
Utilizing data from multiple domains, as recommended in
(ii), can yield diverse datasets to facilitate feature extraction
and transfer learning for detection and implementation.
Moreover, network-based studies can be used to monitor
scenarios within a short event window. The analysis of dissemination
networks, for example through social network analysis, provides
monitoring tools based on structures and relationships to identify key
nodes. This is essential as key nodes often play pivotal roles
in information propagation, and early detection of unusual
activity may serve as an indicator of potential security
threats. Such studies can simulate changes in community
activity and explain the virality phenomenon within the designated
timeframe (Chung and Zeng, 2018; Ma et al., 2020),
providing valuable insight for predicting and detecting outbreaks
in a timely manner.
(iv) Establishment of cross-language corpora and supporting
approaches: A cross-language lexicon corpus built from datasets in
the corresponding languages, together with larger datasets for
validation, could be used to address language processing issues.
This becomes increasingly important considering the global
nature of social media and public security issues. Different
communities, cultures, and countries express sentiments in
unique ways, and it is important to account for these linguis-
tic differences to ensure accurate sentiment analysis and
opinion mining. According to the objective trend analysis
presented in Fig. 12, there has been a lack of work focusing
on corpus generation. For the development of sentiment
analysis and opinion mining for public security, an approach
that does not rely solely on translation accuracy must be
investigated. NLP features can be diversified by using a domain
corpus for feature extraction. A modification of transfer
learning that supports multiple languages in the public security
domain can also be proposed. This approach could potentially
create a more inclusive model, capable of understanding
a wider range of sentiment from various linguistic
communities, thereby improving the overall effectiveness
and reach of sentiment analysis and opinion mining in the
public security domain.
(v) Expansion of data acquisition and geographical coverage:
Addressing the issues of limited data availability and imbal-
anced sentiment data distribution requires expanding data
acquisition across multiple platforms. This would not only provide
a larger pool of data but also help in achieving a more representative
sample, capturing a broader array of sentiments
from various demographics and geographical locations. Geographical
coverage areas can be extended, for instance, by
increasing the radius of the area (Yuan and Liu, 2020). Furthermore,
acquiring more data and integrating the Named
Entity Recognition (NER) approach in the pre-processing
stage for location detection in message dissemination can
also improve sentiment data distribution (Barachi et al.,
2022). Expanding geographical coverage and improving
data acquisition can lead to more accurate sentiment analy-
sis, as it accounts for regional differences in sentiment
expression. The integration of NER can help identify the ori-
gin of sentiments, which is critical for public security agen-
cies in developing localized and effective strategies.
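The idea of locating key nodes in a dissemination network, mentioned in (iii), can be sketched with a simple in-degree centrality over reshare edges. This is a minimal illustration under stated assumptions, not the method of Chung and Zeng (2018) or Ma et al. (2020): the edge convention, the function names, and the account identifiers are hypothetical, and real studies would typically use richer centrality measures and explicit time windows.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Normalised in-degree for a directed graph given as (source, target) pairs.

    Assumed convention: an edge (a, b) means "account a reshared a post
    by account b", so a high in-degree marks widely propagated content.
    """
    nodes = set()
    in_deg = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        in_deg[dst] += 1
    # Divide by the maximum possible in-degree (all other nodes).
    denom = max(len(nodes) - 1, 1)
    return {node: in_deg[node] / denom for node in nodes}


def key_nodes(edges, top_k=2):
    """Return the top_k accounts by in-degree centrality (ties broken by name)."""
    scores = in_degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_k]


# Made-up reshare edges observed in a short event window.
reshares = [
    ("u1", "u4"), ("u2", "u4"), ("u3", "u4"),
    ("u5", "u2"), ("u1", "u2"), ("u4", "u6"),
]

print(key_nodes(reshares))
```

In this toy network, account u4 is reshared by three of the five other accounts and u2 by two, so they are returned as the key nodes. Recomputing the ranking over successive windows is one simple way to notice a sudden shift in who drives propagation.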
6. Discussion
This survey paper set out with the aim of constructing a taxon-
omy for state-of-the-art sentiment analysis and opinion mining
methods applicable to the public security domain, analyzing recent
trends, and identifying gaps and potential future areas of research.
First, the objective trends in sentiment analysis and opinion
mining within the public security domain are projected to display
an increasing focus on specific events, following the trends indi-
cated in Section 5.1. This shift may be due to the growing realiza-
tion that event-specific approaches allow for fine-tuned analytics
that can provide nuanced and actionable insights. However, this
requires advancements in the system’s flexibility and adaptability
to different events, thus indicating a key area for future innovation.
Despite the need for larger and more varied datasets, the genera-
tion of new corpora is predicted to remain low, primarily due to
high computational costs and challenges associated with multi-
domain and multilingual data. However, this gap presents an
opportunity for future research to explore more efficient and effec-
tive data generation methods.
Second, regarding the focus within the public security domain,
an anticipated surge is expected in the domain of public health.
This projection is informed by trend analyses discussed in Sec-
tion 5.2 and is further reinforced by a substantial amount of related
work concentrating on during-event scenarios, such as COVID-19.
This assumption is grounded in the continued global emphasis
on public health and the potential of sentiment analysis to provide
real-time public reactions and insights, a resource crucial for
responsive and efficient policy-making.
Third, as for the utilization of social media platforms, Twitter
and Sina Weibo are likely to maintain their dominance as indicated
in Section 5.4. This trend is likely to persist due to their ease of data
acquisition, extensive user base, and the availability of robust ana-
lytical tools, enabling more comprehensive sentiment analysis and
opinion mining.
Fourth, in terms of the language used in datasets, as discussed
in Section 5.5, monolingual data is expected to continue to domi-
nate over bilingual and multilingual data. However, as methods
for monolingual analysis become more established, research inter-
est in bilingual and multilingual data is likely to grow. This shift
will be crucial in the globalized context of social media, where
public opinion is diverse and multilingual. On the topic of dataset
usage in the public security domain in Section 5.6, although private
datasets are often employed, it is expected that public datasets will
be shared more frequently as they are studied and analyzed. This
shift can enhance transparency and enable collaborative efforts
across different studies.
Fifth, as discussed in Section 5.7, the hybrid approach for senti-
ment analysis and opinion mining will likely remain popular due
to its proven effectiveness in combining multiple approaches.
However, with the exponential advancements in deep learning,
an increase in research interest is expected in this area. This pre-
diction is based on the recent surge in computational power and
the potential of deep learning to automate and enhance sentiment
analysis. In light of this trend, future taxonomies or surveys on sen-
timent analysis and opinion mining in the public security domain
may focus on deep learning methods. This shift could catalyze fur-
ther innovations in the field and provide valuable insights into the
evolving landscape of sentiment analysis.
While this survey has made significant strides in mapping the
landscape of sentiment analysis and opinion mining in the domain
of public security, it is important to acknowledge its limitations.
Primarily, this survey has not extensively covered public security
within the domain of cybersecurity. The nexus of cybersecurity
and public safety is indeed significant and increasingly salient in
an ever-more digitally connected world. However, the search queries
returned only a limited number of related works specific to this
intersection. Moreover, this area was not within the
intended scope of this survey. Future surveys could address
this by specifically including cybersecurity within their purview. A
second limitation lies in the lack of a comparative evaluation of the
performance of machine learning and lexicon-based approaches. The
works under consideration differed considerably in their objec-
tives, methodologies, and levels of detail provided. For instance,
while some research focused on the successful application of an
approach, they did not necessarily provide detailed results from
their experiments or implementations. In other cases, the focus
was primarily on statistical analysis, informed by the specific
objectives of the study. The heterogeneity in the focus and depth
of the analyzed studies precluded a thorough head-to-head
comparison of these techniques. Given these limitations,
additional review works could yield further valuable
insights into the performance comparison of the existing work.
7. Conclusions
Research on sentiment analysis and opinion mining for the public
security domain has grown over the years. However, there have been
limited systematic surveys on the current state of this work.
Therefore, a more systematic survey, supported by a descriptive
taxonomy that classifies the key attributes of existing work, was
presented here.
This survey paper summarizes the key concepts of the existing
work on social media sentiment analysis and opinion mining for
public security by organizing them into a taxonomy. Relevant indexed
articles from Scopus, IEEE Xplore, and ScienceDirect were extracted
using keyword searching. Related journals, conference proceedings,
and serials underwent identification, screening, and review
for inclusion in this survey. The taxonomy includes seven key
attributes: the objectives of the work conducted, the domain of public
security, the public security event timeframe, the social media
platform used for data acquisition, the dataset type, the language of
the dataset, and the sentiment analysis or opinion mining approach
used. An analysis of the trends in the recent work considered, and of
the distribution of that work across the attributes identified in the
taxonomy, was also presented.
The remaining issues and future directions of social media sentiment
analysis and opinion mining for public security were established
and described in the second half of the survey. Five issues
were identified: (i) shortage of multi-class and different level
analysis approaches; (ii) insufficient availability of public security
domain-independent datasets; (iii) inadequate prediction based
on timeframe coverage; (iv) lack of supporting approaches for
variations across different languages; and (v) limited information
availability. Five potential areas of future work for social media
sentiment analysis and opinion mining, within the public security
domain, were suggested: (i) more extensive multi-class and multi-level
analysis; (ii) production of multi-domain public security
datasets; (iii) utilization of a greater variety of features and
techniques for automatic detection of outbreaks; (iv) establishment of
cross-language corpora and supporting approaches; and (v) expansion
of data acquisition and geographical coverage.
To conclude, social media sentiment analysis and opinion mining
for the public security domain needs extra effort and extensive work
to ensure that the field develops and that related applications are
delivered for the public benefit. The presented taxonomy could be used by
other researchers to plan their works and research activities. The
potential future directions suggested could further improve the
existing approaches or address the remaining gaps.
Author contributions
All authors contributed to the paper conceptualization and
design. Material preparation and analysis were performed by
M.S.M.S. and M.H.A.H. Writing of the original draft was conducted
by M.S.M.S., M.H.A.H. and E.G.M. The writing review and editing
were performed by P.N.E.N. and S.C., supervised by F.C. All authors
have read and agreed to the published version of the manuscript.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Acknowledgement
This research was funded by the Ministry of Higher Education
Malaysia, grant number FRGS/1/2022/ICT02/UMS/02/3.
References
Abdul Reda, A., Sinanoglu, S., Aboussalah, A., 2023. Out of sight, out of mind: The
impact of lockdown measures on sentiment towards refugees. Journal of
Information Technology & Politics.
https://doi.org/10.1080/19331681.2023.2183301.
Abid, A., Ameur, H., Mbarek, A., Rekik, A., Jamoussi, S., & Ben Hamadou, A. 2017. An
extraction and unification methodology for social networks data: An
application to public security. Paper presented at the ACM International
Conference Proceeding Series.
Abusaqer, M., Benaoumeur Senouci, M., & Magel, K. 2023. Twitter User Sentiments
Analysis: Health System Cyberattacks Case Study. Paper presented at the 5th
International Conference on Artificial Intelligence in Information and
Communication, ICAIIC 2023.
Adamu, H., Lutfi, S.L., Malim, N.H.A.H., Hassan, R., Di Vaio, A., Mohamed, A.S.A., 2021.
Framing twitter public sentiment on Nigerian government COVID-19 palliatives
distribution using machine learning. Sustainability (Switzerland) 13 (6).
https://doi.org/10.3390/su13063497.
Agarwal, B., Mittal, N., 2016. Prominent feature extraction for sentiment analysis.
Springer.
Aggarwal, C.C., Zhai, C., 2012. Mining text data. Springer Science & Business Media.
Akbar, A.F., Santoso, A.B., Putra, P.K., Budi, I., 2021. A classification model to identify
public opinion on the lockdown policy using Indonesian tweets. Journal of
Theoretical and Applied Information Technology 99 (14), 3394–3402. Retrieved from
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85111596289&partnerID=40&md5=a7e6e851c96f32b1b426cbfd50525da9.
Al-Agha, I., Abu-Dahrooj, O., 2019. Multi-level analysis of political sentiments using
twitter data: A case study of the Palestinian-Israeli conflict. Jordanian Journal of
Computers and Information Technology 5 (3), 195–215.
https://doi.org/10.5455/jjcit.71-1562700251.
Alam, F., Joty, S., & Imran, M. 2018. Domain adaptation with adversarial training and
graph embeddings. arXiv preprint arXiv:1805.05151.
Albayrak, M. D., & Gray-Roncal, W. 2019. Data Mining and Sentiment Analysis of
Real-Time Twitter Messages for Monitoring and Predicting Events. Paper
presented at the 2019 9th IEEE Integrated STEM Education Conference, ISEC
2019.
Alfarrarjeh, A., Agrawal, S., Kim, S. H., & Shahabi, C. 2017. Geo-spatial multimedia
sentiment analysis in disasters. Paper presented at the Proceedings - 2017
International Conference on Data Science and Advanced Analytics, DSAA 2017.
Alfred, R., Obit, J.H., 2021. The roles of machine learning methods in limiting the
spread of deadly diseases: A systematic review. Heliyon 7 (6), e07371.
Ali, M.F., Irfan, R., Lashari, T.A., 2023. Comprehensive sentimental analysis of tweets
towards COVID-19 in Pakistan: a study on governmental preventive measures.
PeerJ Computer Science 9. https://doi.org/10.7717/PEERJ-CS.1220.
Al-khateeb, S., & Agarwal, N. 2016. The rise & fall of #NoBackDoor on Twitter: The
apple vs. FBI case. Paper presented at the 2016 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining (ASONAM).
Almehmadi, A., Joudaki, Z., & Jalali, R. 2017. Language Usage on Twitter Predicts
Crime Rates. Paper presented at the ACM International Conference Proceeding
Series.
Alomari, E., Mehmood, R., & Katib, I. 2020. Sentiment analysis of Arabic tweets for
road traffic congestion and event detection. In: EAI/Springer Innovations in
Communication and Computing (pp. 37-54).
Amin, M. S., Ahn, H., & Choi, Y. B. 2021. Human Sentiments and Associated Physical
Actions Detection in Disasters with Deep Learning. Paper presented at the 3rd
International Conference on Artificial Intelligence in Information and
Communication, ICAIIC 2021.
An, L., Han, Y., Yi, X., & Li, G. 2019. Prediction of microblogging influence and
measuring of topical influence in the context of terrorist events. Paper
presented at the 17th International Conference on Scientometrics and
Informetrics, ISSI 2019 - Proceedings.
An, L., Han, Y., Yi, X., Li, G., Yu, C., 2021. Prediction and Evolution of the Influence of
Microblog Entries in the Context of Terrorist Events. Social Science Computer
Review. https://doi.org/10.1177/08944393211029193.
Andhale, S., Mane, P., Vaingankar, M., Karia, D., & Talele, K. T. 2021. Twitter
Sentiment Analysis for COVID-19. Paper presented at the Proceedings -
International Conference on Communication, Information and Computing
Technology, ICCICT 2021.
Anuratha, K., Joshi, S., Sharmila, P., Nandhini, J. M. N., & Paravthy, M. 2021. Topical
Sentiment Classification to Unmask the Concerns of General Public during
COVID-19 Pandemic using Indian Tweets. Paper presented at the Proceedings of
the 2021 4th International Conference on Computing and Communications
Technologies, ICCCT 2021.
Anuratha, K., Parvathy, M., 2023. Multi-label Emotion Classification of COVID-19
Tweets with Deep Learning and Topic Modelling. Computer Systems Science
and Engineering 45 (3), 3005–3021.
https://doi.org/10.32604/csse.2023.031553.
Arias, F., Guerra-Adames, A., Zambrano, M., Quintero-Guerra, E., Tejedor-Flores, N.,
2022. Analyzing Spanish-Language Public Sentiment in the Context of a
Pandemic and Social Unrest: The Panama Case. International Journal of
Environmental Research and Public Health 19 (16).
https://doi.org/10.3390/ijerph191610328.
Astuti, I. F., Widagdo, P. P., Tanro, M. L. R., Cahyadi, D., & Suntara, A. A. 2023.
Sentiment analysis on land and forest fire management in Twitter using Naïve
Bayes method. Paper presented at the AIP Conference Proceedings.
Azmi, P. A. R., Abidin, A. W. Z., Mutalib, S., Zawawi, I. S. M., & Halim, S. A. 2022.
Sentiment Analysis on MySejahtera Application during COVID-19 Pandemic.
Paper presented at the 2022 3rd International Conference on Artificial
Intelligence and Data Sciences (AiDAS).
Baccianella, S., Esuli, A., & Sebastiani, F. 2010. Sentiwordnet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining. Paper presented at the
Proceedings of the Seventh International Conference on Language Resources
and Evaluation (LREC’10).
Backfried, G., & Shalunts, G. 2016. Sentiment analysis of media in German on the
refugee crisis in Europe. In: Vol. 265. Lecture Notes in Business Information
Processing (pp. 234-241).
Bai, H., Yu, G., 2016. A Weibo-based approach to disaster informatics: incidents
monitor in post-disaster situation via Weibo text negative sentiment analysis.
Natural Hazards 83 (2), 1177–1196.
https://doi.org/10.1007/s11069-016-2370-5.
Bala, M.M., Srinivasa Rao, M., Ramesh Babu, M., 2017. Sentiment trends on natural
disasters using location based twitter opinion mining. International Journal of
Civil Engineering and Technology 8 (8), 9–19. Retrieved from
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85028308036&partnerID=40&md5=477be37d86174572cf160dcdda832d71.
Banda, J.M., Tekumalla, R., Wang, G., Yu, J., Liu, T., Ding, Y., Chowell, G., 2021. A
large-scale COVID-19 Twitter chatter dataset for open scientific research—an
international collaboration. Epidemiologia 2 (3), 315–324.
Bansal, D., Grover, R., Saini, N., Saha, S., 2021. GenSumm: A Joint Framework for
Multi-task Tweet Classification and Summarization using Sentiment Analysis
and Generative Modelling. IEEE Transactions on Affective Computing.
https://doi.org/10.1109/TAFFC.2021.3131516.
Barachi, M. E., Mathew, S. S., & Alkhatib, M. 2022. Combining Named Entity
Recognition and Emotion Analysis of Tweets for Early Warning of Violent
Actions. Paper presented at the 2022 7th International Conference on Smart and
Sustainable Technologies, SpliTech 2022.
Bashar, M.K., 2022. A Hybrid Approach to Explore Public Sentiments on COVID-19.
SN Computer Science 3 (3). https://doi.org/10.1007/s42979-022-01112-1.
Basu, M., Shandilya, A., Khosla, P., Ghosh, K., Ghosh, S., 2019. Extracting resource
needs and availabilities from microblogs for aiding post-disaster relief
operations. IEEE Transactions on Computational Social Systems 6 (3), 604–618.
Becken, S., Stantic, B., Chen, J., Connolly, R.M., 2022. Twitter conversations reveal
issue salience of aviation in the broader context of climate change. Journal of Air
Transport Management 98, 102157. https://doi.org/10.1016/j.jairtraman.2021.102157.
Behl, S., Rao, A., Aggarwal, S., Chadha, S., Pannu, H.S., 2021. Twitter for disaster relief
through sentiment analysis for COVID-19 and natural hazard crises.
International Journal of Disaster Risk Reduction 55.
https://doi.org/10.1016/j.ijdrr.2021.102101.
Berhoum, A., Meftah, M.C.E., Laouid, A., Hammoudeh, M., 2023. An Intelligent
Approach Based on Cleaning up of Inutile Contents for Extremism Detection and
Classification in Social Networks. ACM Transactions on Asian and Low-Resource
Language Information Processing 22 (5). https://doi.org/10.1145/3575802.
Bhatia, S., Chaudhary, P., Dey, N., 2020. Opinion mining in information retrieval.
Springer.
Bhosale, Y.H., Patnaik, K.S., 2022. Application of deep learning techniques in
diagnosis of covid-19 (coronavirus): a systematic review. Neural Processing
Letters, 1–53.
Bhosale, Y.H., Patnaik, K.S., 2023. PulDi-COVID: Chronic obstructive pulmonary
(lung) diseases with COVID-19 classification using ensemble deep
convolutional neural network from chest X-ray images to minimize severity
and mortality rates. Biomedical Signal Processing and Control 81, 104445.
Bhullar, G., Khullar, A., Kumar, A., Sharma, A., Pannu, H.S., Malhi, A., 2022. Time
series sentiment analysis (SA) of relief operations using social media (SM)
platform for efficient resource management. International Journal of Disaster
Risk Reduction 75, 102979. https://doi.org/10.1016/j.ijdrr.2022.102979.
Boukabous, M., & Azizi, M. 2020. Review of Learning-Based Techniques of Sentiment
Analysis for Security Purposes. Paper presented at The Proceedings of the
Third International Conference on Smart City Applications.
Cai, M., Luo, H., Cui, Y., 2021. A Study on the Topic-Sentiment Evolution and
Diffusion in Time Series of Public Opinion Derived from Emergencies.
Complexity 2021. https://doi.org/10.1155/2021/2069010.
Cao, G., Shen, L., Evans, R., Zhang, Z., Bi, Q., Huang, W., et al., 2021. Analysis of social
media data for public emotion on the Wuhan lockdown event during the
COVID-19 pandemic. Computer Methods and Programs in Biomedicine 212, 106468.
https://doi.org/10.1016/j.cmpb.2021.106468.
Cao, H., & Lian, S. 2020. Internet Forum Texual Information Filtering Mechanism for
Emergency based on Bi-LSTM Neural Network. Paper presented at the 2020 IEEE
9th Joint International Information Technology and Artificial Intelligence
Conference (ITAIC).
Chandra, J. K., Cambria, E., & Nanetti, A. 2020. One Belt, One Road, One Sentiment? A
Hybrid Approach to Gauging Public Opinions on the New Silk Road Initiative.
Paper presented at the IEEE International Conference on Data Mining
Workshops, ICDMW.
Chaudhary, M., & Bansal, D. 2021. Agitation on Social Media Amid Terrorist Attacks:
A Case Study on Jammu and Kashmir. In: Vol. 62. Lecture Notes on Data
Engineering and Communications Technologies (pp. 383-391).
Che, S.P., Wang, X., Zhang, S., Kim, J.H., 2023. Effect of daily new cases of COVID-19
on public sentiment and concern: Deep learning-based sentiment classification
and semantic network analysis. Journal of Public Health (Germany).
https://doi.org/10.1007/s10389-023-01833-4.
Chen, S., Mao, J., & Li, G. 2019. Spatiotemporal Analysis on Sentiments and Retweet
Patterns of Tweets for Disasters. In: Vol. 11420 LNCS. Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence
and Lecture Notes in Bioinformatics) (pp. 436-443).
Chen, Y., Ji, W., & Wang, Q. 2019. A Bayesian-Based Approach for Public Sentiment
Modeling. Paper presented at the 2019 Winter Simulation Conference (WSC).
Chen, X., Zeng, H., Xu, H., & Di, X. 2021. Sentiment Analysis of Autonomous Vehicles
after Extreme Events Using Social Media Data. Paper presented at the IEEE
Conference on Intelligent Transportation Systems, Proceedings, ITSC.
Chen, Y., Ji, W., 2021. Public demand urgency for equitable infrastructure
restoration planning. International Journal of Disaster Risk Reduction 64.
https://doi.org/10.1016/j.ijdrr.2021.102510.
Chen, E., Lerman, K., Ferrara, E., 2020. Tracking social media discourse about the
covid-19 pandemic: Development of a public coronavirus twitter data set. JMIR
Public Health and Surveillance 6 (2), e19273.
Chenxi, L., Jilin, F., Meng, H., & Zhonghao, W. 2022. Research on Post Earthquake
Public Opinion Analysis Based on XLNet-BiGRU-A Algorithm. Paper presented at
the 2022 IEEE 2nd International Conference on Computer Communication and
Artificial Intelligence (CCAI).
Choirul Rahmadan, M., Nizar Hidayanto, A., Swadani Ekasari, D., Purwandari, B., &
Theresiawati. 2020. Sentiment Analysis and Topic Modelling Using the LDA
Method related to the Flood Disaster in Jakarta on Twitter. Paper presented at
the Proceedings - 2nd International Conference on Informatics, Multimedia,
Cyber, and Information System, ICIMCIS 2020.
Chouhan, R. L. 2021. Sentiment analysis of pulwama attack using twitter data. In:
Vol. 135. Lecture Notes in Networks and Systems (pp. 119-126).
Chowdhury, S.R., Basu, S., Maulik, U., 2022. Disastrous Event and Sub-Event
Detection From Microblog Posts Using Bi-clustering Method. IEEE Transactions
on Computational Social Systems 1–0.
https://doi.org/10.1109/TCSS.2022.3213794.
Chung, W., & Zeng, D. 2018. Social-media-based policy informatics: Cyber-
surveillance for homeland security and public health informatics. In: Vol. 25.
Public Administration and Information Technology (pp. 363-385).
Chung, W., Zeng, D., 2016. Social-media-based public policy informatics: Sentiment
and network analyses of U.S. Immigration and border security. Journal of the
Association for Information Science and Technology 67 (7), 1588–1606.
https://doi.org/10.1002/asi.23449.
Cohen, J., 1988. Statistical power analysis for the behavioral sciences. Academic
Press, Orlando, FL.
Dahal, B., Kumar, S.A.P., Li, Z., 2019. Topic modeling and sentiment analysis of global
climate change tweets. Social Network Analysis and Mining 9 (1).
https://doi.org/10.1007/s13278-019-0568-8.
Daou, H., 2021. Sentiment of the public: the role of social media in revealing
important events. Online Information Review 45 (1), 157–173.
https://doi.org/10.1108/OIR-12-2019-0373.
de Carvalho, V.D.H., Costa, A.P.C.S., 2022. Towards corpora creation from social web
in Brazilian Portuguese to support public security analyses and decisions.
Library Hi Tech. https://doi.org/10.1108/LHT-08-2022-0401.
de Carvalho, V. D. H., Nepomuceno, T. C. C., & Costa, A. P. C. S. 2020. An Automated
Corpus Annotation Experiment in Brazilian Portuguese for Sentiment Analysis
in Public Security. In: Vol. 384 LNBIP. Lecture Notes in Business Information
Processing (pp. 99-111).
de Carvalho, V.D.H., Seixas Costa, A.P.C., 2021. Public Security Sentiment Analysis on
Social Web: A Conceptual Framework for the Analytical Process and a Research
Agenda. International Journal of Decision Support System Technology 13 (1),
1–20. https://doi.org/10.4018/IJDSST.2021010101.
Dehdezi, A. K., & Sardi, F. K. Q. 2016. The Role of Public Security in Society.
International Journal of Humanities and Cultural Studies (IJHCS) ISSN 2356-
5926, 2090-2096.
Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., & Dietze, S. 2020.
TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the
COVID-19 Pandemic. Paper presented at the International Conference on
Information and Knowledge Management, Proceedings.
Dong, Z.S., Meng, L., Christenson, L., Fulton, L., 2021. Social media information
sharing for natural disaster response. Natural Hazards 107 (3), 2077–2104.
https://doi.org/10.1007/s11069-021-04528-9.
Du, Q., Li, Y., Li, Y., Zhou, J., Cui, X., 2023. Data mining of social media for urban
resilience study: A case of rainstorm in Xi’an. International Journal of Disaster
Risk Reduction 95, 103836. https://doi.org/10.1016/j.ijdrr.2023.103836.
Duan, J., Zhai, W., Cheng, C., 2020. Crowd detection in mass gatherings based on
social media data: A case study of the 2014 shanghai new year’s eve stampede.
International Journal of Environmental Research and Public Health 17 (22),
1–14. https://doi.org/10.3390/ijerph17228640.
Dudani, A., Srividya, V., Sneha, B., & Tripathy, B. K. 2020. Sentiment Analysis on
Kerala Floods. In: Vol. 1087. Advances in Intelligent Systems and Computing
(pp. 107-124).
Effrosynidis, D., Karasakalidis, A.I., Sylaios, G., Arampatzis, A., 2022. The climate
change Twitter dataset. Expert Systems with Applications 204, 117541.
https://doi.org/10.1016/j.eswa.2022.117541.
Eke, C.I., Norman, A.A., Shuib, L., Nweke, H.F., 2020. Sarcasm identification in textual
data: systematic review, research challenges and open directions. Artificial
Intelligence Review 53 (6), 4215–4258.
El Ali, A., Stratmann, T. C., Park, S., Schöning, J., Heuten, W., & Boll, S. C. J. 2018.
Measuring, understanding, and classifying news media sympathy on Twitter
after crisis events. Paper presented at the Conference on Human Factors in
Computing Systems - Proceedings.
Fakhry, N.N., Kassam, G., Asfoura, E., 2020. Tracking coronavirus pandemic diseases
using social media: a machine learning approach. International Journal of
Advanced Computer Science and Applications 11 (10), 211–219.
https://doi.org/10.14569/IJACSA.2020.0111028.
Fan, C., Farahmand, H., & Mostafavi, A. 2020. Rethinking infrastructure resilience
assessment with human sentiment reactions on social media in disasters. Paper
presented at the Proceedings of the Annual Hawaii International Conference on
System Sciences.
Fang, F., Wang, T., Tan, S., Chen, S., Zhou, T., Zhang, W., et al., 2022. Network
Structure and Community Evolution Online: Behavioral and Emotional Changes
in Response to COVID-19. Frontiers in Public Health 9.
https://doi.org/10.3389/fpubh.2021.813234.
Farzindar, A.A., Inkpen, D., 2020. Natural language processing for social media.
Synthesis Lectures on Human Language Technologies 13 (2), 1–219.
Fattoh, I.E., Kamal Alsheref, F., Ead, W.M., Youssef, A.M., 2022. Semantic Sentiment
Classification for COVID-19 Tweets Using Universal Sentence Encoder.
Computational Intelligence and Neuroscience 2022.
https://doi.org/10.1155/2022/6354543.
Fuadvy, M. J., & Ibrahim, R. 2019. Multilingual Sentiment Analysis on Social Media
Disaster Data. Paper presented at the ICEEIE 2019 - International Conference on
Electrical, Electronics and Information Engineering: Emerging Innovative
Technology for Sustainable Future.
Gangadhari, R. K., Khanzode, V., & Murthy, S. 2021. Disaster impacts analysis using
social media data. Paper presented at the 2021 International Conference on
Maintenance and Intelligent Asset Management, ICMIAM 2021.
Garcia, K., Berton, L., 2021. Topic detection and sentiment analysis in Twitter
content related to COVID-19 from Brazil and the USA. Applied Soft Computing
101. https://doi.org/10.1016/j.asoc.2020.107057.
Gascó, M., Bayerl, P.S., Denef, S., Akhgar, B., 2017. What do citizens communicate
about during crises? Analyzing twitter use during the 2011 UK riots.
Government Information Quarterly 34 (4), 635–645.
https://doi.org/10.1016/j.giq.2017.11.005.
Geeta, & Niyogi, R. 2016. Demographic analysis of Twitter users. Paper presented at
the 2016 International Conference on Advances in Computing, Communications
and Informatics, ICACCI 2016.
Ghanem, A., Asaad, C., Hafidi, H., Moukafih, Y., Guermah, B., Sbihi, N., Baina, K., 2021.
Real-time infoveillance of moroccan social media users’ sentiments towards the
covid-19 pandemic and its management. International Journal of
Environmental Research and Public Health 18 (22).
https://doi.org/10.3390/ijerph182212172.
Gong, P., Wang, L., Wei, Y., Yu, Y., 2022. Public attention, perception, and attitude
towards nuclear power in China: A large-scale empirical analysis based on
social media. Journal of Cleaner Production 373.
https://doi.org/10.1016/j.jclepro.2022.133919.
Groves, R.M., Fowler Jr, F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R.,
2011. Survey methodology. John Wiley & Sons.
Gu, M., Guo, H., Zhuang, J., 2021. Social media behavior and emotional evolution
during emergency events. Healthcare (Switzerland) 9 (9).
https://doi.org/10.3390/healthcare9091109.
Gu, M., Guo, H., Zhuang, J., Du, Y., Qian, L., 2022. Social Media User Behavior and
Emotions during Crisis Events. International Journal of Environmental Research
and Public Health 19 (9). https://doi.org/10.3390/ijerph19095197.
Guha-Sapir, D., Below, R., & Hoyois, P. 2016. EM-DAT: the CRED/OFDA international
disaster database.
Gupta, N., Agrawal, R., 2020. Application and techniques of opinion mining. In:
Hybrid Computational Intelligence. Elsevier, pp. 1–23.
Gupta, M., Bansal, A., Jain, B., Rochelle, J., Oak, A., Jalali, M.S., 2021. Whether the
weather will help us weather the COVID-19 pandemic: Using machine learning
to measure twitter users’ perceptions. International Journal of Medical
Informatics 145. https://doi.org/10.1016/j.ijmedinf.2020.104340.
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A., 2008. Feature extraction: Foundations
and applications, Vol. 207. Springer.
Han, F., Cao, Y.D., Zhang, Z.H., Zhang, H.J., Aoki, T., Ogasawara, K., 2022. Weibo users
perception of the COVID-19 pandemic on Chinese social networking service
(Weibo): sentiment analysis and fuzzy-c-means model. Journal of Medical
Artificial Intelligence 5. https://doi.org/10.21037/jmai-21-36.
Hasegawa, S., Suzuki, T., Yagahara, A., Kanda, R., Aono, T., Yajima, K., Ogasawara, K.,
2020. Changing emotions about fukushima related to the fukushima nuclear
power station accident-how rumors determined people’s attitudes: Social
media sentiment analysis. Journal of Medical Internet Research 22 (9).
https://doi.org/10.2196/18662.
He, Y., Wen, L., & Zhu, T. 2019. Area Definition and Public Opinion Research of
Natural Disaster Based on Micro-blog Data. Paper presented at the Procedia
Computer Science.
Henríquez-Coronel, P., García García, J., & Herrera-Tapia, J. 2019. Management of
Natural Disasters Based on Twitter Analytics. 2017 Mexico Earthquake. In: Vol.
918. Advances in Intelligent Systems and Computing (pp. 3-12).
Hornby, A. S., & Cowie, A. P. 1995. Oxford advanced learner’s dictionary (Vol. 1428):
Oxford University Press, Oxford.
Hu, N., 2022. Sentiment Analysis of Texts on Public Health Emergencies Based on
Social Media Data Mining. Computational and Mathematical Methods in
Medicine 2022. https://doi.org/10.1155/2022/3964473.
Hung, L.P., Alfred, R., Ahmad Hijazi, M.H., Ibrahim, A., Asri, A., 2015. A review on the
ensemble framework for sentiment analysis. Advanced Science Letters 21 (10),
2957–2962.
Hutto, C., & Gilbert, E. 2014. Vader: A parsimonious rule-based model for sentiment
analysis of social media text. Paper presented at the Proceedings of the
international AAAI conference on web and social media.
Hyo Jin, D., Chae-Gyun, L., You Jin, K., & Ho-Jin, C. 2016. Analyzing emotions in
twitter during a crisis: A case study of the 2015 Middle East Respiratory
Syndrome outbreak in Korea. Paper presented at the 2016 International
Conference on Big Data and Smart Computing (BigComp).
Iksan, N., Widodo, D. A., Sunarko, B., Udayanti, E. D., & Kartikadharma, E. 2021.
Sentiment Analysis of Public Reaction to COVID19 in Twitter Media using Naïve
Bayes Classifier. Paper presented at the InHeNce 2021 - 2021 IEEE International
Conference on Health, Instrumentation and Measurement, and Natural
Sciences.
Imran, M., Elbassuoni, S., Castillo, C., Diaz, F., & Meier, P. 2013. Practical extraction of
disaster-relevant information from social media. Paper presented at the
Proceedings of the 22nd international conference on world wide web.
Imran, M., Qazi, U., Ofli, F., 2022. TBCOV: Two Billion Multilingual COVID-19 Tweets
with Sentiment, Entity, Geo, and Gender Labels. Data 7 (1).
https://doi.org/10.3390/data7010008.
Jiang, D., Luo, X., Xuan, J., Xu, Z., 2017. Sentiment computing for the news event
based on the social media big data. IEEE Access 5, 2373–2382.
https://doi.org/10.1109/ACCESS.2016.2607218.
Kabbani, O., Klumpenhouwer, W., El-Diraby, T., Shalaby, A., 2022. What do riders
say and where? The detection and analysis of eyewitness transit tweets. Journal of
Intelligent Transportation Systems: Technology, Planning, and Operations.
https://doi.org/10.1080/15472450.2022.2026773.
Karami, A., Shah, V., Vaezi, R., Bansal, A., 2020. Twitter speaks: A case of national
disaster situational awareness. Journal of Information Science 46 (3), 313–324.
https://doi.org/10.1177/0165551519828620.
Karimiziarani, M., Moradkhani, H., 2023. Social response and Disaster management:
Insights from twitter data Assimilation on Hurricane Ian. International Journal
of Disaster Risk Reduction 95. https://doi.org/10.1016/j.ijdrr.2023.103865.
Khandelwal, S., Chaudhary, A., 2022. COVID-19 pandemic & cyber security issues:
Sentiment analysis and topic modeling approach. Journal of Discrete
Mathematical Sciences and Cryptography 25 (4), 987–997.
https://doi.org/10.1080/09720529.2022.2072421.
Khatua, A., Cambria, E., Ho, S. S., & Na, J. C. 2020. Deciphering Public Opinion of
Nuclear Energy on Twitter. Paper presented at the Proceedings of the
International Joint Conference on Neural Networks.
Kostakos, P., Nykanen, M., Martinviita, M., Pandya, A., & Oussalah, M. 2018. Meta-
terrorism: Identifying linguistic patterns in public discourse after an attack.
Paper presented at the Proceedings of the 2018 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining, ASONAM
2018.
Kovács, T., Kovács-Győri, A., Resch, B., 2021. #allforjan: How twitter users in europe
reacted to the murder of ján kuciak—revealing spatiotemporal patterns through
sentiment analysis and topic modeling. ISPRS International Journal of Geo-
Information 10 (9). https://doi.org/10.3390/ijgi10090585.
Koytak, H.Z., Celik, M.H., 2022. A Text Mining Approach to Determinants of Attitude
Towards Syrian Immigration in the Turkish Twittersphere. Social Science
Computer Review. https://doi.org/10.1177/08944393221117460.
Kumar, N. 2018. Sentiment Analysis of Twitter Messages: Demonetization a Use
Case. Paper presented at the 2nd International Conference on Computational
Systems and Information Technology for Sustainable Solutions, CSITSS 2017.
Kumar, S. 2020. Covid 19 Indian Sentiments on covid19 and lockdown. Retrieved
from: https://www.kaggle.com/datasets/surajkum1198/twitterdata.
Kulkarni, A., Shivananda, A., 2019. Natural language processing recipes. Apress.
Lamsal, R., 2020. Coronavirus (COVID-19) tweets dataset. IEEE Dataport 10.
Lamsal, R., Harwood, A., Read, M.R., 2022. Twitter conversations predict the daily
confirmed COVID-19 cases. Applied Soft Computing 129.
https://doi.org/10.1016/j.asoc.2022.109603.
Lamsal, R. 2020a. Coronavirus (COVID-19) geo-tagged tweets. Retrieved from:
https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset.
Laudy, C., Ruini, F., Zanasi, A., Przybyszewski, M., & Stachowicz, A. 2017. Using social
media in crisis management: SOTERIA fusion center for managing information
gaps. Paper presented at the 20th International Conference on Information
Fusion, Fusion 2017 - Proceedings.
Lee, J. S., & Nerghes, A. 2017. Labels and sentiment in social media: On the role of
perceived agency in online discussions of the refugee crisis. Paper presented at
the ACM International Conference Proceeding Series.
Lee, M.J., Lee, T.R., Lee, S.J., Jang, J.S., Kim, E.J., 2020. Machine Learning-Based Data
Mining Method for Sentiment Analysis of the Sewol Ferry Disaster’s Effect on
Social Stress. Frontiers in Psychiatry 11.
https://doi.org/10.3389/fpsyt.2020.505673.
Lee, J.S., Nerghes, A., 2018. Refugee or Migrant Crisis? Labels, Perceived Agency, and
Sentiment Polarity in Online Discussions. Social Media and Society 4 (3).
https://doi.org/10.1177/2056305118785638.
Li, N., Akin, H., Yi-Fan, L., Brossard, D., Xenos, M., Scheufele, D.A., 2016. Tweeting
disaster: An analysis of online discourse about nuclear power in the wake of the
Fukushima Daiichi nuclear accident. Journal of Science Communication 15 (5),
1–20. https://doi.org/10.22323/2.15050202.
Li, Z., & Liu, W. 2020. Automatic decision support for public opinion governance of
urban public events. In: Vol. 1031 AISC. Advances in Intelligent Systems and
Computing (pp. 47-53).
Li, X., Li, Z., Tian, Y., 2021. Sentimental knowledge graph analysis of the covid-19
pandemic based on the official account of chinese universities. Electronics
(Switzerland) 10 (23). https://doi.org/10.3390/electronics10232921.
Li, L., Ma, Z., Cao, T., 2020. Leveraging social media data to study the community
resilience of New York City to 2019 power outage. International Journal of
Disaster Risk Reduction 51. https://doi.org/10.1016/j.ijdrr.2020.101776.
Li, S., Sun, X., 2023. Application of public emotion feature extraction algorithm
based on social media communication in public opinion analysis of natural
disasters. PeerJ Computer Science 9. https://doi.org/10.7717/PEERJ-CS.1417.
Li, X., Wang, Z., Gao, C., Shi, L., 2017. Reasoning human emotional responses from
large-scale social and public media. Applied Mathematics and Computation
310, 182–193. https://doi.org/10.1016/j.amc.2017.03.031.
Li, P., Wang, Q., 2021. A Multichannel Model for Microbial Key Event Extraction
Based on Feature Fusion and Attention Mechanism. Security and
Communication Networks 2021. https://doi.org/10.1155/2021/7800144.
Li, J., Wang, Y., Wang, J., 2021. An analysis of emotional tendency under the network
public opinion: Deep learning. Informatica (Slovenia) 45 (1), 149–156.
https://doi.org/10.31449/inf.v45i1.3402.
Li, L., Wang, X.T., 2022. Nonverbal communication with emojis in social media:
dissociating hedonic intensity from frequency. Language Resources and
Evaluation. https://doi.org/10.1007/s10579-022-09611-6.
Li, Y., Zhou, X., Sun, Y., Zhang, H., 2016. Design and implementation of Weibo
sentiment analysis based on LDA and dependency parsing. China
Communications 13 (11), 91–105. https://doi.org/10.1109/CC.2016.7781721.
Lian, Y., Liu, Y., Dong, X., 2020. Strategies for controlling false online information
during natural disasters: The case of Typhoon Mangkhut in China. Technology
in Society 62, 101265. https://doi.org/10.1016/j.techsoc.2020.101265.
Lin, H. Y., & Moh, T. S. 2021. Sentiment analysis on COVID tweets using COVID-
Twitter-BERT with auxiliary sentence approach. Paper presented at the
Proceedings of the 2021 ACMSE Conference - ACMSE 2021: The Annual ACM
Southeast Conference.
Litvak, M., Vanetik, N., Levi, E., & Roistacher, M. 2016. What’s up on Twitter? Catch
up with TWIST! Paper presented at the COLING 2016 - 26th International
Conference on Computational Linguistics, Proceedings of COLING 2016: System
Demonstrations.
Liu, B., 2011. Web data mining: exploring hyperlinks, contents, and usage data, Vol.
1. Springer.
Liu, B., 2020. Sentiment analysis: Mining opinions, sentiments, and emotions.
Cambridge University Press.
Liu, R., Liu, M., Li, Y., Wu, L., 2023. Crisis Management Experience from Social Media:
Public Response to the Safety Crisis of Imported Aquatic Products in China
during the Pandemic. Foods 12 (5). https://doi.org/10.3390/foods12051033.
Liu, X., Zheng, L., Jia, X., Qi, H., Yu, S., Wang, X., 2021. Public Opinion Analysis on
Novel Coronavirus Pneumonia and Interaction With Event Evolution in Real
World. IEEE Transactions on Computational Social Systems 8 (4), 1042–1051.
https://doi.org/10.1109/TCSS.2021.3087346.
Loria, S., Keen, P., Honnibal, M., Yankovsky, R., Karesh, D., & Dempsey, E. 2014.
TextBlob: simplified text processing.
Loureiro, M.L., Alló, M., 2020. Sensing climate change and energy issues: Sentiment
and emotion analysis with social media in the U.K. and Spain. Energy Policy 143.
https://doi.org/10.1016/j.enpol.2020.111490.
Luna, S., Guerrero, A., Gonzalez, K., & Akundi, A. 2022. Social media and pandemic
events: challenges for alert-warning systems. Paper presented at the 2022 17th
Annual System of Systems Engineering Conference, SOSE 2022.
Lydiri, M., El Mourabit, Y., El Habouz, Y., Fakir, M., 2023. A performant deep learning
model for sentiment analysis of climate change. Social Network Analysis and
Mining 13 (1). https://doi.org/10.1007/s13278-022-01014-3.
Lyu, S., & Lu, Z. 2023. Exploring Temporal and Multilingual Dynamics of Post-
Disaster Social Media Discourse: A Case of Fukushima Daiichi Nuclear Accident.
In: Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1).
https://doi.org/10.1145/3579484.
Lyu, X., Chen, Z., Wu, D., & Wang, W. 2020. Sentiment Analysis on Chinese Weibo
Regarding COVID-19. In: Vol. 12430 LNAI. Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics) (pp. 710-721).
Ma, L., Zhang, D., Yang, J., & Luo, X. 2016. Sentiment orientation analysis of short
text based on background and domain sentiment lexicon expansion. Paper
presented at the 2016 5th International Conference on Computer Science and
Network Technology (ICCSNT).
Ma, X., Liu, W., Zhou, X., Qin, C., Chen, Y., Xiang, Y., et al., 2020. Evolution of online
public opinion during meteorological disasters. Environmental Hazards 19 (4),
375–397. https://doi.org/10.1080/17477891.2019.1685932.
Mamta, Ekbal, A., Bhattacharyya, P., Srivastava, S., Kumar, A., & Saha, T. 2020. Multi-
domain tweet corpora for sentiment analysis: Resource creation and evaluation.
Paper presented at the LREC 2020 - 12th International Conference on Language
Resources and Evaluation, Conference Proceedings.
Mäntylä, M.V., Graziotin, D., Kuutila, M., 2018. The evolution of sentiment analysis—
A review of research topics, venues, and top cited papers. Computer Science
Review 27, 16–32. https://doi.org/10.1016/j.cosrev.2017.10.002.
Manunta, G., 1999. What is security? Security Journal 12 (3), 57–66.
Mathayomchan, B., Taecharungroj, V., Wattanacharoensil, W., 2022. Evolution of
COVID-19 tweets about Southeast Asian Countries: topic modelling and
sentiment analyses. Place Branding and Public Diplomacy.
https://doi.org/10.1057/s41254-022-00271-5.
Mendon, S., Dutta, P., Behl, A., Lessmann, S., 2021. A Hybrid Approach of Machine
Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter
Data of Natural Disasters. Information Systems Frontiers.
https://doi.org/10.1007/s10796-021-10107-x.
Miller, G.A., 1995. WordNet: a lexical database for English. Communications of the
ACM 38 (11), 39–41.
Min, K., Ma, C., Zhao, T., & Li, H. 2015. BosonNLP: An ensemble approach for word
segmentation and POS tagging. In: Natural language processing and chinese
computing (pp. 520-526): Springer.
Mohamed Ridhwan, K., Hargreaves, C.A., 2021. Leveraging Twitter data to
understand public sentiment for the COVID-19 outbreak in Singapore.
International Journal of Information Management Data Insights 1 (2), 100021.
https://doi.org/10.1016/j.jjimei.2021.100021.
Mohammad, S.M., Turney, P.D., 2013. Crowdsourcing a word–emotion association
lexicon. Computational Intelligence 29 (3), 436–465.
Moraes Silva, L.M., Valêncio, C.R., Donegá Zafalon, G.F., Columbini, A.C., 2022.
Feature Selection with Hybrid Bio-inspired Approach for Classifying Multi-
idiom Social Media Sentiment Analysis. Paper presented at the International
Conference on Enterprise Information Systems, ICEIS - Proceedings.
Mustakim, Fauzi, M. Z., Mustafa, Abdullah, A., & Rohayati. 2021. Clustering of Public
Opinion on Natural Disasters in Indonesia Using DBSCAN and K-Medoids
Algorithms. Paper presented at the Journal of Physics: Conference Series.
Nagapudi, V., Agrawal, A., Bulusu, N., 2021. Extracting Physical Events from Digital
Chatter for Covid-19. Paper presented at the 2021 IEEE International Conference
on Smart Computing (SMARTCOMP).
Nielsen, F. Å. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in
microblogs. arXiv preprint arXiv:1103.2903.
Obiedat, R., Harfoushi, O., Qaddoura, R., Al-Qaisi, L., Al-Zoubi, A.M., 2021. An
evolutionary-based sentiment analysis approach for enhancing government
decisions during covid-19 pandemic: The case of Jordan. Applied Sciences
(Switzerland) 11 (19). https://doi.org/10.3390/app11199080.
Olusegun, R., Oladunni, T., Audu, H., Houkpati, Y., Bengesi, S., 2023. Text Mining and
Emotion Classification on Monkeypox Twitter Dataset: A Deep Learning-Natural
Language Processing (NLP) Approach. IEEE Access 11, 49882–49894.
https://doi.org/10.1109/ACCESS.2023.3277868.
Ortmeier, P.J., 1998. Public safety and security administration. Gulf Professional
Publishing.
Pan, W., Han, Y., Li, J., Zhang, E., He, B., 2022. The positive energy of netizens:
development and application of fine-grained sentiment lexicon and emotional
intensity model. Current Psychology.
https://doi.org/10.1007/s12144-022-03876-4.
Pang, B., Lee, L., 2008. Opinion Mining and Sentiment Analysis. Foundations and
Trends® in Information Retrieval 2 (1–2), 1–135.
https://doi.org/10.1561/1500000011.
Pedrycz, W., & Chen, S.-M. 2016. Sentiment Analysis and Ontology Engineering.
Studies in Computational Intelligence, 639.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. 2001. Linguistic inquiry and word
count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001), 2001.
Pfeffer, J., & Morstatter, F. 2016. Geotagged Twitter posts from the United States: A
tweet collection to investigate representativeness. Version: 1. GESIS Data
Archive. Dataset.
Phillips, M. E. 2017. Hurricane harvey twitter dataset. Univ. North Texas, Denton,
TX, USA, Tech. Rep. ark:/67531/metadc993940.
Pope, D., Griffith, J., 2016. An analysis of online twitter sentiment surrounding the
european refugee crisis. Paper presented at the IC3K 2016 - Proceedings of the
8th International Joint Conference on Knowledge Discovery, Knowledge
Engineering and Knowledge Management.
Pran, M.S.A., Bhuiyan, M.R., Hossain, S.A., Abujar, S., 2020. Analysis Of Bangladeshi
People’s Emotion During Covid-19 In Social Media Using Deep Learning. Paper
presented at the 2020 11th International Conference on Computing,
Communication and Networking Technologies (ICCCNT).
Prathap, B. R., & Ramesha, K. 2019. Twitter sentiment for analysing different types
of crimes. Paper presented at the Proceedings of the 2018 International
Conference On Communication, Computing and Internet of Things, IC3IoT 2018.
Praveen, S.V., Ittamalla, R., Subramanian, D., 2022. Challenges in successful
implementation of Digital contact tracing to curb COVID-19 from global
citizen’s perspective: a text analysis study. International Journal of Pervasive
Computing and Communications 18 (5), 491–498.
https://doi.org/10.1108/IJPCC-09-2020-0147.
Pu, X., Jiang, Q., Fan, B., 2022. Chinese public opinion on Japan’s nuclear wastewater
discharge: A case study of Weibo comments based on a thematic model. Ocean
and Coastal Management 225.
https://doi.org/10.1016/j.ocecoaman.2022.106188.
Python, D. L. U., Kulkarni, A., & Shivananda, A. Natural Language Processing Recipes.
Qi, H., Jiang, H., Bu, W., Zhang, C., & Shim, K. J. 2019. Tracking Political Events in
Social Media: A Case Study of Hong Kong Protests. Paper presented at the
Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019.
Qian, E. 2019. Twitter Climate Change Sentiment Dataset. Retrieved from: https://
www.kaggle.com/datasets/edqian/twitter-climate-change-sentiment-dataset.
Qin, Z., Ronchieri, E., 2022. Exploring Pandemics Events on Twitter by Using
Sentiment Analysis and Topic Modelling. Applied Sciences (Switzerland) 12
(23). https://doi.org/10.3390/app122311924.
Qin, T., Wang, B., Liu, Z., Chen, Z., Ding, J., 2022. High-quality tweet generation for
online behavior security management based on semantics measurement.
Transactions on Emerging Telecommunications Technologies 33 (6).
https://doi.org/10.1002/ett.3811.
Quach, H. L., Pham, T. Q., Hoang, N. A., Phung, D. C., Nguyen, V. C., Le, S. H., et al.
2022. Using ‘infodemics’ to understand public awareness and perception of
SARS-CoV-2: A longitudinal analysis of online information about COVID-19
incidence and mortality during a major outbreak in Vietnam, July—September
2020. PLoS ONE 17 (4). https://doi.org/10.1371/journal.pone.0266299.
Rathke, B., Yu, H., Huang, H., 2023. What Remains Now That The Fear Has Passed?
Developmental Trajectory Analysis of COVID-19 Pandemic for Co-occurrences
of Twitter, Google Trends, and Public Health Data. Disaster Medicine and Public
Health Preparedness. https://doi.org/10.1017/dmp.2023.101.
Ray, S., Kumar, A.M.S., 2023. Prediction and Analysis of Sentiments of Reddit Users
towards the Climate Change Crisis. Paper presented at the 2023 International
Conference on Networking and Communications (ICNWC).
Razali, N.A.M., Malizan, N.A., Hasbullah, N.A., Wook, M., Zainuddin, N.M., Ishak, K.K.,
et al., 2021. Opinion mining for national security: techniques, domain
applications, challenges and research opportunities. Journal of Big Data 8 (1).
https://doi.org/10.1186/s40537-021-00536-5.
Rudra, K., Ghosh, S., Ganguly, N., Goyal, P., Ghosh, S., 2015. Extracting situational
information from microblogs during disaster events: a classification-
summarization approach. Paper presented at the Proceedings of the 24th
ACM international on conference on information and knowledge management.
Sadiq, A.M., Ahn, H., Choi, Y.B., 2020. Human sentiment and activity recognition in
disaster situations using social media images based on deep learning. Sensors
(Switzerland) 20 (24), 1–26. https://doi.org/10.3390/s20247115.
Samaras, L., García-Barriocanal, E., Sicilia, M.-A., 2023. Sentiment analysis of
COVID-19 cases in Greece using Twitter data. Expert Systems with Applications
230, 120577. https://doi.org/10.1016/j.eswa.2023.120577.
Sari, I. C., & Ruldeviyani, Y. 2020. Sentiment Analysis of the Covid-19 Virus Infection
in Indonesian Public Transportation on Twitter Data: A Case Study of Commuter
Line Passengers. Paper presented at the 2020 International Workshop on Big
Data and Information Security, IWBIS 2020.
Sattaru, J.S., Bhatt, C.M., Saran, S., 2021. Utilizing Geo-Social Media as a Proxy Data
for Enhanced Flood Monitoring. Journal of the Indian Society of Remote Sensing
49 (9), 2173–2186. https://doi.org/10.1007/s12524-021-01376-9.
Sharma, S., Jain, A., 2020. Role of sentiment analysis in social media security and
analytics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery 10 (5), e1366.
Shi, J., Li, W., Yang, Y., Yao, N., Bai, Q., Yongchareon, S., Yu, J., 2021. Automated
Concern Exploration in Pandemic Situations - COVID-19 as a Use Case. In: Vol.
12280 LNAI. Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 178–
185.
Sliva, A., Shu, K., & Liu, H. 2019. Using social media to understand cyber attack
behavior. In: Vol. 783. Advances in Intelligent Systems and Computing (pp. 636-
645).
Smith, K.S., McCreadie, R., Macdonald, C., Ounis, I., 2018. Regional Sentiment Bias in
Social Media Reporting During Crises. Information Systems Frontiers 20 (5),
1013–1025. https://doi.org/10.1007/s10796-018-9827-x.
Srikanth, J., Damodaram, A., Teekaraman, Y., Kuppusamy, R., Thelkar, A.R., 2022.
Sentiment Analysis on COVID-19 Twitter Data Streams Using Deep Belief Neural
Networks. Computational Intelligence and Neuroscience 2022. https://doi.org/
10.1155/2022/8898100.
Stevens, D., Vaughan-Williams, N., 2016. Citizens and Security Threats: Issues,
Perceptions and Consequences Beyond the National Frame. British Journal of
Political Science 46 (1), 149–175. https://doi.org/10.1017/s0007123414000143.
M.S. Md Suhaimin, M.H. Ahmad Hijazi, E.G. Moung et al. Journal of King Saud University Computer and Information Sciences 35 (2023) 101776
Su, C. J., & Li, Y. 2018. Sentiment analysis and information diffusion on social media:
The case of the Zika virus. Paper presented at the International Conference on
Biological Information and Biomedical Engineering, BIBE 2018.
Subramaniyaswamy, V., Logesh, R., Abejith, M., Umasankar, S., Umamakeswari, A.,
2017. Sentiment analysis of tweets for estimating criticality and security of
events. Journal of Organizational and End User Computing 29 (4), 51–71.
https://doi.org/10.4018/JOEUC.2017100103.
Sufi, F.K., Khalil, I., 2022. Automated Disaster Monitoring From Social Media Posts
Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Transactions
on Computational Social Systems 1–11. https://doi.org/10.1109/
TCSS.2022.3157142.
Suhaimin, M.S.M., Hijazi, M.H.A., Alfred, R., Coenen, F., 2017. Natural language
processing based features for sarcasm detection: An investigation using
bilingual social media texts. Paper presented at the 2017 8th International
Conference on Information Technology (ICIT).
Suhaimin, M.S.M., Hijazi, M.H.A., Alfred, R., Coenen, F., 2019. Modified framework
for sarcasm detection and classification in sentiment analysis. Indonesian
Journal of Electrical Engineering and Computer Science 13 (3), 1175–1183.
Sukhwal, P. C., & Kankanhalli, A. 2022. Determining containment policy impacts on
public sentiment during the pandemic using social media data. Proceedings
of the National Academy of Sciences of the United States of America 119 (19).
https://doi.org/10.1073/pnas.2117292119.
Sun, R., An, L., Li, G., Yu, C., 2022. Predicting social media rumours in the context of
public health emergencies. Journal of Information Science. https://doi.org/
10.1177/01655515221137879.
Sun, J., Zeng, Z., Li, T., Sun, S., 2023. Analyzing the spatiotemporal coupling
relationship between public opinion and the epidemic during COVID-19.
Library Hi Tech. https://doi.org/10.1108/LHT-10-2022-0462.
Tan, K., Xie, L., & Lin, L. 2021. Research on the Evolution Path of Public Opinion in
Environmental Emergencies. Paper presented at the E3S Web of Conferences.
Tang, H., Xu, H., Rui, X., Heng, X., Song, Y., 2022. The Identification and Analysis of
the Centers of Geographical Public Opinions in Flood Disasters Based on
Improved Naïve Bayes Network. International Journal of Environmental Research and Public
Health 19 (17). https://doi.org/10.3390/ijerph191710809.
Theja Bhavaraju, S.K., Beyney, C., Nicholson, C., 2019. Quantitative analysis of social
media sensitivity to natural disasters. International Journal of Disaster Risk
Reduction 39. https://doi.org/10.1016/j.ijdrr.2019.101251.
Thelwall, M., 2017. The heart and soul of the web? Sentiment strength detection in
the social web with SentiStrength. In: Cyberemotions. Springer, pp. 119–134.
Thorat, S., & Namrata Mahender, C. 2019. Domain-specific fuzzy rule-based opinion
mining. In: Vol. 56. Lecture Notes in Networks and Systems (pp. 287-294).
Thukral, T., Varshney, A., Gaur, V., 2021. Intensity quantification of public opinion
and emotion analysis on climate change. International Journal of Advanced
Technology and Engineering Exploration 8 (83), 1351–1366. https://doi.org/
10.19101/IJATEE.2021.874417.
To, H., Agrawal, S., Kim, S. H., & Shahabi, C. 2017. On Identifying Disaster-Related
Tweets: Matching-Based or Learning-Based. Paper presented at the
Proceedings - 2017 IEEE 3rd International Conference on Multimedia Big
Data, BigMM 2017.
Tsai, M. H., & Wang, Y. 2020. A New Ensemble Method for Classifying Sentiments of
COVID-19-Related Tweets. Paper presented at the Proceedings - 2020
International Conference on Computational Science and Computational
Intelligence, CSCI 2020.
Tsao, S.F., MacLean, A., Chen, H., Li, L., Yang, Y., Butt, Z.A., 2022. Public Attitudes
During the Second Lockdown: Sentiment and Topic Analyses Using Tweets
From Ontario, Canada. International Journal of Public Health 67. https://doi.org/
10.3389/ijph.2022.1604658.
Uthirapathy, S.E., Sandanam, D., 2023. Topic Modelling and Opinion Analysis On
Climate Change Twitter Data Using LDA And BERT Model. Procedia Computer
Science 218, 908–917. https://doi.org/10.1016/j.procs.2023.01.071.
Van Atteveldt, W., van der Velden, M.A., Boukes, M., 2021. The validity of sentiment
analysis: Comparing manual annotation, crowd-coding, dictionary approaches,
and machine learning algorithms. Communication Methods and Measures 15
(2), 121–140.
Wadawadagi, R., & Pagi, V. 2021. Disaster Severity Analysis from Micro-Blog Texts
Using Deep-NN. In: Vol. 1176. Advances in Intelligent Systems and Computing
(pp. 145-157).
Wahid, J. A., Hussain, S., Wang, H., Wu, Z., Shi, L., & Gao, Y. 2021. Aspect oriented
sentiment classification of COVID-19 twitter data; An enhanced LDA based text
analytic approach. Paper presented at the Proceedings - 2021 International
Conference on Computer Engineering and Artificial Intelligence, ICCEAI 2021.
Wang, N., Varghese, B., & Donnelly, P. D. 2016. A machine learning analysis of
Twitter sentiment to the Sandy Hook shootings. Paper presented at the 2016
IEEE 12th International Conference on e-Science (e-Science).
Wang, M., Wu, H., Zhang, T., & Zhu, S. 2020. Identifying critical outbreak time
window of controversial events based on sentiment analysis. PLoS ONE 15 (10).
https://doi.org/10.1371/journal.pone.0241355.
Wang, K., Qiu, Q., Wu, M., & Qiu, J. 2020. Topic analysis of internet public opinion on
natural disasters based on time division. Paper presented at the Proceedings -
2020 3rd International Conference on Advanced Electronic Materials,
Computers and Software Engineering, AEMCSE 2020.
Wu, Z., & Lu, Y. 2017. A study on micro-blog sentiment analysis of public
emergencies under the environment of big data. Paper presented at the
Proceedings of the 29th Chinese Control and Decision Conference, CCDC 2017.
Wu, D., Cui, Y., 2018. Disaster early warning and damage assessment analysis using
social media data and geo-location information. Decision Support Systems 111,
48–59. https://doi.org/10.1016/j.dss.2018.04.005.
Xia, H., An, W., Li, J., Zhang, Z.J., 2020. Outlier knowledge management for extreme
public health events: Understanding public opinions about COVID-19 based on
microblog data. Socio-Economic Planning Sciences. https://doi.org/10.1016/j.
seps.2020.100941.
Xia, H., An, W., Li, J., Zhang, Z., 2022. Outlier knowledge management for extreme
public health events: Understanding public opinions about COVID-19 based on
microblog data. Socio-Economic Planning Sciences 80, 100941.
https://doi.org/10.1016/j.seps.2020.100941.
Xie, X. H., & Chen, L. 2022. Analysis of Sentiment Tendency Based on Major Public
Health Events. Paper presented at the 2022 4th International Conference on
Natural Language Processing (ICNLP).
Xiong, J., Hswen, Y., Naslund, J.A., 2020. Digital surveillance for monitoring
environmental health threats: A case study capturing public opinion from
twitter about the 2019 Chennai water crisis. International Journal of
Environmental Research and Public Health 17 (14), 1–15. https://doi.org/
10.3390/ijerph17145077.
Xu, W., Liu, L., Shang, W., 2017. Leveraging cross-media analytics to detect events
and mine opinions for emergency management. Online Information Review 41
(4), 487–506. https://doi.org/10.1108/OIR-08-2015-0286.
Yan, Y., Chen, J., Wang, Z., 2020. Mining public sentiments and perspectives from
geotagged social media data for appraising the post-earthquake recovery of
tourism destinations. Applied Geography 123. https://doi.org/10.1016/j.
apgeog.2020.102306.
Yin, F., Beibei, Z., Su, P., & Chai, J. 2017. Research on the text sentiment classification
about the social hot events on Weibo. Paper presented at the Proceedings of
2016 IEEE Advanced Information Management, Communicates, Electronic and
Automation Control Conference, IMCEC 2016.
Yu, S., Eisenman, D., Han, Z., 2021. Temporal dynamics of public emotions during
the COVID-19 pandemic at the epicenter of the outbreak: Sentiment analysis of
weibo posts from Wuhan. Journal of Medical Internet Research 23 (3). https://
doi.org/10.2196/27078.
Yu, X., Zhong, C., Li, D., & Xu, W. 2020. Sentiment analysis for news and social media
in COVID-19. Paper presented at the Proceedings of the 6th ACM SIGSPATIAL
International Workshop Emergency Management using GIS 2020, EM-GIS 2020.
Yu, X., Ferreira, M.D., Paulovich, F.V., 2021. Senti-COVID19: An interactive visual
analytics system for detecting public sentiment and insights regarding COVID-
19 from social media. IEEE Access 9, 126684–126697. https://doi.org/10.1109/
ACCESS.2021.3111833.
Yuan, F., Li, M., Liu, R., 2020. Understanding the evolutions of public responses using
social media: Hurricane Matthew case study. International Journal of Disaster
Risk Reduction 51. https://doi.org/10.1016/j.ijdrr.2020.101798.
Yuan, F., Liu, R., 2020. Mining Social Media Data for Rapid Damage Assessment
during Hurricane Matthew: Feasibility Study. Journal of Computing in Civil
Engineering 34 (3). https://doi.org/10.1061/(ASCE)CP.1943-5487.0000877.
Yue, L., Chen, W., Li, X., Zuo, W., Yin, M., 2019. A survey of sentiment analysis in
social media. Knowledge and Information Systems 60 (2), 617–663.
Yue, S., Kondari, J., Musaev, A., Smith, R., & Yue, S. 2018. Using twitter data to
determine hurricane category: An experiment. Paper presented at the
Proceedings of the International ISCRAM Conference.
Zander, K.K., Garnett, S.T., Ogie, R., Alazab, M., Nguyen, D., 2023. Trends in bushfire
related tweets during the Australian ‘Black Summer’ of 2019/20. Forest Ecology
and Management 545. https://doi.org/10.1016/j.foreco.2023.121274.
Zeng, L., 2022. Chinese Public Perception of Climate Change on Social Media: An
Investigation Based on Data Mining and Text Analysis. Journal of Environmental
and Public Health 2022. https://doi.org/10.1155/2022/6294436.
Zhang, X., & Xu, Z. 2021. Research on Sentiment Analysis and Entity Recognition of
COVID-19 Based on Multi-task Sentiment Analysis Model in Artificial
Intelligence. Paper presented at the 2021 International Conference on
Artificial Intelligence and Electromechanical Automation (AIEA).
Zhang, T., Cheng, C., 2021. Temporal and spatial evolution and influencing factors of
public sentiment in natural disasters: a case study of Typhoon Haiyan. ISPRS
International Journal of Geo-Information 10 (5). https://doi.org/10.3390/
ijgi10050299.
Zhang, B., Lin, J., Luo, M., Zeng, C., Feng, J., Zhou, M., Deng, F., 2022. Changes in Public
Sentiment under the Background of Major Emergencies—Taking the Shanghai
Epidemic as an Example. International Journal of Environmental Research and
Public Health 19 (19). https://doi.org/10.3390/ijerph191912594.
Zhang, X., Ma, Y., 2023. An ALBERT-based TextCNN-Hatt hybrid model enhanced
with topic knowledge for sentiment analysis of sudden-onset disasters.
Engineering Applications of Artificial Intelligence 123. https://doi.org/10.1016/
j.engappai.2023.106136.
Zhang, W., Wang, M., Zhu, Y.-C., 2020. Does government information release really
matter in regulating contagion-evolution of negative emotion during public
emergencies? From the perspective of cognitive big data analytics. International
Journal of Information Management 50, 498–514. https://doi.org/10.1016/j.
ijinfomgt.2019.04.001.
Zhang, L., Wei, J., Boncella, R.J., 2020. Emotional communication analysis of
emergency microblog based on the evolution life cycle of public opinion.
Information Discovery and Delivery 48 (3), 151–163. https://doi.org/10.1108/
IDD-10-2019-0074.
Zhang, C., Xu, S., Li, Z., Hu, S., 2021. Understanding concerns, sentiments, and
disparities among population groups during the COVID-19 pandemic via twitter
data mining: Large-scale cross-sectional study. Journal of Medical Internet
Research 23 (3). https://doi.org/10.2196/26482.
Zhang, W., Zhu, Y.C., Wang, J.P., 2019. An intelligent textual corpus big data
computing approach for lexicons construction and sentiment classification of
public emergency events. Multimedia Tools and Applications 78 (21), 30159–
30174. https://doi.org/10.1007/s11042-018-7018-x.
Zhao, Y., Cheng, S., Yu, X., Xu, H., 2020. Chinese public’s attention to the COVID-19
epidemic on social media: Observational descriptive study. Journal of Medical
Internet Research 22 (5). https://doi.org/10.2196/18825.
Zhao, S., Chen, L., Liu, Y., Yu, M., & Han, H. 2022. Deriving anti-epidemic policy from
public sentiment: A framework based on text analysis with microblog data.
PLoS ONE 17 (8). https://doi.org/10.1371/journal.pone.0270953.
Zhong, Z. 2021. Computer Intelligent Prediction Method for Hot Event by Long
Short-Term Memory Model. Paper presented at the Proceedings of 2021 IEEE
International Conference on Emergency Science and Information Technology,
ICESIT 2021.
Zhou, Q., 2021. Detecting the public’s information behaviour preferences in
multiple emergency events. Journal of Information Science. https://doi.org/
10.1177/01655515211027789.
Zhou, Q., 2023. Fine-grained detection on the public’s multi-dimensional
communication preferences in emergency events. Heliyon 9 (6). https://doi.
org/10.1016/j.heliyon.2023.e16312.
Zhou, Q., 2023. Support towards emergency event processing via fine-grained
analysis on users’ expressions. Aslib Journal of Information Management.
https://doi.org/10.1108/AJIM-05-2022-0263.
Zhou, Q., Jing, M., 2020. Multidimensional mining of public opinion in emergency
events. Electronic Library 38 (3), 545–560. https://doi.org/10.1108/EL-12-2019-
0276.
Zhou, Q., Zhang, C., 2017. Emotion evolutions of sub-topics about popular events on
microblogs. Electronic Library 35 (4), 770–782. https://doi.org/10.1108/EL-09-
2016-0184.
Zhu, B., Zheng, X., Liu, H., Li, J., Wang, P., 2020. Analysis of spatiotemporal
characteristics of big data on social media sentiment with COVID-19 epidemic
topics. Chaos, Solitons and Fractals 140. https://doi.org/10.1016/j.
chaos.2020.110123.
Zhuang, M., Li, Y., Tan, X., Xing, L., Lu, X., 2021. Analysis of public opinion evolution
of COVID-19 based on LDA-ARMA hybrid model. Complex and Intelligent
Systems 7 (6), 3165–3178. https://doi.org/10.1007/s40747-021-00514-7.
Zong, C., Xia, R., & Zhang, J. 2021. Text Data Mining (Vol. 711): Springer.
... This activity is important as it provides insights into how presidential candidates are perceived by the public, potentially influencing campaign directions and political strategies. On the other hand, sentiment analysis faces the challenge of complexity in interpreting large and diverse text data (Krishna, 2023;Suhaimin, 2023). In this context, classification is at the heart of sentiment analysis, as it allows the sorting of opinions into different categories (positive, negative, neutral), offering a clearer and more structured view of public sentiment (Errami, 2023;Hung, 2023;Lasri, 2023;G. ...
Article
Optimizing classification methods (forward selection, backward elimination, and optimized selection) and ensemble techniques (AdaBoost and Bagging) are essential for accurate sentiment analysis, particularly in political contexts on social media. This research compares advanced classification models with standard ones (Decision Tree, Random Tree, Naive Bayes, Random Forest, K- NN, Neural Network, and Generalized Linear Model), analyzing 1,200 tweets from December 10-11, 2023, focusing on "Indonesia" and "capres." It encompasses 490 positive, 355 negative, and 353 neutral sentiments, reflecting diverse opinions on presidential candidates and political issues. The enhanced model achieves 96.37% accuracy, with the backward selection model reaching 100% accuracy for negative sentiments. The study suggests further exploration of hybrid feature selection and improved classifiers for high-stakes sentiment analysis. With forward feature selection and ensemble method, Naive Bayes stands out for classifying negative sentiments while maintaining high overall accuracy (96.37%).
... However, there is a lack of consistency in the results obtained by different approaches on the same dataset or by the same approach on different datasets (He, Yin, & Zheng, 2022). Recently, pre-trained models have gained widespread popularity in representing texts (de Andrade et al., 2023), leading to advanced results in sentiment analysis for social media (Melton et al., 2022;Suhaimin et al., 2023). Nevertheless, the high noise in informal language locks down its forward performance. ...
... Sentiment analysis has been explored in the literature in several contexts currently, trying to emphasize the perceptions of the end user and understand the user's experience [53,54]. In the work by Chen et al. [55], the authors used deep learning to analyze emotional cues in earnings conference call audio, finding that positive emotions in statements positively influence analysts' report issuance, negative emotions have the opposite effect, and non-negative emotions in questions and responses positively impact analysts. ...
Article
Full-text available
Cyber-attacks have become increasingly prevalent with the widespread integration of technology into various aspects of our lives. The surge in social media platform usage has prompted users to share their firsthand experiences with cyber-attacks. Despite this, previous literature has not extensively investigated individuals' experiences with these attacks. This study aims to comprehensively explore and analyze the content shared by cyber-attack victims in Saudi Arabia, encompassing text, video, and audio formats. The primary objective is to investigate the factors influencing victims' perceptions of the security risks associated with these attacks. Following data collection, preparation, and cleaning, Latent Dirichlet Allocation (LDA) is employed for topic modeling, shedding light on potential factors impacting victims. Sentiment analysis is then utilized to examine the nuanced negative and positive perceptions of individuals. NVivo is deployed for data inspection, facilitating the presentation of insightful inferences. Hierarchical clustering is implemented to explore distinct clusters within the textual dataset. The study's results underscore the critical importance of spreading awareness among individuals regarding the various tactics employed by cyber attackers. Doi: 10.28991/ESJ-2024-08-01-010 Full Text: PDF
Article
Full-text available
Social media is widely used in emergencies, but the nature of the communication is poorly understood. We employed unsupervised topic modelling and sentiment analysis to analyse more than 80,000 Twitter tweets posted by users in Australia over a six-month period before, during and after the severe bushfires in 2019-2020, dubbed the 'Black Summer'. While tweets about bushfire updates dominated, politics, donations and support, impacts and public opinion were also prominent themes. Social impacts were important in the early phase of the fires (Sept & Oct 2019) with health impacts discussed when the fires were most intense and ecological impacts becoming important in the recovery phase. Twitter users also talked about emergency responses, mainly evacuation , as the fires were starting, showing that Twitter played an important role in communicating advice to leave early to avoid harm. Although the bushfires caused death, destruction and disruption, the sentiment of tweets was balanced − 40% of all tweets were positive, 36% negative and the remainder neutral (24%). Sentiments shifted little over time, but some topics were strongly associated with the expression of negative sentiments. The unusual severity of the fires was attributed to climate change in the early and recovery phases but misinformation about arson as a cause briefly diverted attention from climate change in the middle. Twitter was used to express anger about a lack of action by the Australian government to address climate change. Unexpectedly , Twitter users from the most affected areas were more likely to post positive tweets than those further away, particular during recovery, suggesting both resilience and gratitude for support provided to them. 
Analysis of trend data on Twitter using machine learning has the potential to identify, in real time, whether appropriate messages are reaching and being disseminated effectively, provide early warning of potentially harmful misinformation and, if undertaken repeatedly, provide a metric against which responses to fire can be measured.
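The sentiment shares reported in this abstract (positive, negative and neutral percentages, compared across phases of the fires) amount to a per-phase tally of labelled tweets. The following is a minimal sketch of that tally; the phase names, labels and records are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict

# Hypothetical labelled tweets as (phase, sentiment) pairs; not study data.
tweets = [
    ("before", "positive"), ("before", "negative"), ("before", "neutral"),
    ("during", "negative"), ("during", "negative"), ("during", "positive"),
    ("during", "neutral"),
    ("recovery", "positive"), ("recovery", "positive"), ("recovery", "negative"),
]


def sentiment_shares(records):
    """Return {phase: {sentiment: percentage}} for (phase, sentiment) pairs."""
    counts = defaultdict(Counter)
    for phase, sentiment in records:
        counts[phase][sentiment] += 1
    shares = {}
    for phase, counter in counts.items():
        total = sum(counter.values())
        # Percentages are rounded to one decimal place for reporting.
        shares[phase] = {s: round(100 * n / total, 1) for s, n in counter.items()}
    return shares


shares = sentiment_shares(tweets)
for phase, dist in shares.items():
    print(phase, dist)
```

Comparing the resulting distributions between phases, or between regions instead of phases, is one simple way to check whether sentiment shifts over time or with distance from the affected areas.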
Article
Full-text available
formation about the event. Here, we investigate the public response to the large and destructive Hurricane Ian in late September 2022 by examining the textual content of tweets shared on Twitter across the contiguous United States (CONUS). We mined and processed over twenty million tweets for discovering the main topics of discussion and relationship between them, and classifying tweets into humanitarian topics and categories to help disaster management with thorough sentiment analysis. We employed a variety of algorithms in Artificial Intelligence for Natural Language Processing (NLP) including sentiment analysis, topic modeling, and text classification to assimilate the information content in massive Twitter data. The findings of this study provide insights on how people utilize social media to learn and disseminate information about hurricane events, which accordingly aid emergency responders and disaster managers in mitigating the negative consequences of such catastrophes and improving community preparedness. Social response and Disaster management: Insights from twitter data Assimilation on Hurricane Ian. Available from: https://www.researchgate.net/publication/372347281_Social_response_and_Disaster_management_Insights_from_twitter_data_Assimilation_on_Hurricane_Ian [accessed Aug 25 2023].
Article
Full-text available
Objective: The rapid onset of coronavirus disease 2019 (COVID-19) created a complex virtual collective consciousness. Misinformation and polarization were hallmarks of the pandemic in the United States, highlighting the importance of studying public opinion online. Humans express their thoughts and feelings more openly than ever before on social media; co-occurrence of multiple data sources has become valuable for monitoring and understanding public sentimental preparedness and response to an event within our society. Methods: In this study, Twitter and Google Trends data were used as the co-occurrence data for understanding the dynamics of sentiment and interest during the COVID-19 pandemic in the United States from January 2020 to September 2021. Developmental trajectory analysis of Twitter sentiment was conducted using corpus linguistic techniques and word cloud mapping to reveal 8 positive and negative sentiments and emotions. Machine learning algorithms were used for opinion mining, to examine how Twitter sentiment was related to Google Trends interest and to historical COVID-19 public health data. Results: The sentiment analysis went beyond polarity to detect specific feelings and emotions during the pandemic. Conclusions: The emotion detection, when associated with the historical COVID-19 data and Google Trends data, revealed how emotions behaved at each stage of the pandemic.
Article
Full-text available
With the rapid development of Internet technologies, the public can participate in the information communication of emergency events more conveniently and quickly. Once an emergency occurs, the public will immediately express and disseminate massive information about the causes, processes and results of the emergency. In the process of information communication, the public often adopts diversified communication modes, and then shows differential communication preferences. Detecting the public's communication preferences makes it possible to understand the public's information demands in events more accurately, and thus contributes to the rational allocation of resources and improves processing efficiency. Therefore, this paper conducted finer-grained mining on the public's online expressions in multiple events, so as to detect the public's communication preferences. Specifically, we collected the public's expressions related to emergency events from social media and then analyzed the expressions from multiple dimensions to obtain the corresponding communication features. Finally, based on the comparative analysis of diversified communication features, static and dynamic communication preferences were obtained. The experimental results indicate that the public's communication preferences do exist, and that they are universal and consistent. Meanwhile, constructing a better social environment and improving people's livelihood are the fundamental strategies to guide public opinion.
Article
Purpose: The outbreak of COVID-19 has become a major public health emergency worldwide. How to effectively guide public opinion and implement precise prevention and control is a hot topic in current research. Mining the spatiotemporal coupling between online public opinion and offline epidemics can provide decision support for the precise management and control of future emergencies. Design/methodology/approach: This study focuses on analyzing the spatiotemporal coupling relationship between public opinion and the epidemic. First, based on Weibo information and confirmed case information, a field framework is constructed using field theory. Second, SnowNLP is used for sentiment mining and LDA is utilized for topic extraction to analyze the topic evolution and the sentiment evolution of public opinion in each coupling stage. Finally, the spatial model is used to explore the coupling relationship between public opinion and the epidemic in space. Findings: The findings show that there is a certain coupling between online public opinion sentiment and offline epidemics, with a significant coupling relationship in the time dimension, while there is no remarkable coupling relationship in space. In addition, the core topics of public concern are different at different coupling stages. Originality/value: This study deeply explores the spatiotemporal coupling relationship between online public opinion and offline epidemics, adding a new research perspective to related research. The result can help the government and relevant departments understand the dynamic development of epidemic events and achieve precise control while mastering the dynamics of online public opinion.
Article
Natural disasters are usually sudden and unpredictable, so they are difficult to anticipate. Managing public opinion about disasters and supporting post-disaster reconstruction through social media is an effective way to reduce the impact of sudden natural disasters on the economy and society. Thus, we propose a public sentiment feature extraction method based on social media transmission to realize the intelligent analysis of natural disaster public opinion. Firstly, we offer a public opinion analysis method based on emotional features, which uses feature extraction and Transformer technology to perceive the sentiment in public opinion samples. Then, the extracted features are used to identify the public emotions intelligently, and the collection of public emotions in natural disasters is realized. Finally, through the collected emotional information, the public’s demands and needs in natural disasters are obtained, and the natural disaster public opinion analysis system based on social media communication is realized. Experiments demonstrate that our algorithm can identify the category of public opinion on natural disasters with an accuracy of 90.54%. In addition, our natural disaster public opinion analysis system can deconstruct the current situation of natural disasters from point to point and grasp the disaster situation in real-time.
Article
Background: Syndromic surveillance with the use of Internet data has been used to track and forecast epidemics for the last two decades, using different sources from social media to search engine records. More recently, studies have addressed how the World Wide Web could be used as a valuable source for analysing the reactions of the public to outbreaks and revealing emotions and sentiment impact from certain events, notably that of pandemics. Objective: The objective of this research is to evaluate the capability of Twitter messages (tweets) in estimating the sentiment impact of COVID-19 cases in Greece in real time as related to cases. Methods: 153,528 tweets were gathered from 18,730 Twitter users totalling 2,840,024 words for exactly one year and were examined against two sentiment lexicons: one in English language translated into Greek (using the Vader library) and one in Greek. We then used the specific sentiment ranking included in these lexicons to track i) the positive and negative impact of COVID-19, ii) six types of sentiments: Surprise, Disgust, Anger, Happiness, Fear and Sadness, and iii) the correlations between real cases of COVID-19 and sentiments and correlations between sentiments and the volume of data. Results: Surprise (25.32%) mainly and secondly Disgust (19.88%) were found to be the prevailing sentiments of COVID-19. The correlation coefficient (R2) for the Vader lexicon is -0.07454 related to cases and -0.70668 to the tweets, while the other lexicon had 0.167387 and -0.93095 respectively, all measured at significance level of p < 0.01. Evidence shows that the sentiment does not correlate with the spread of COVID-19, possibly since the interest in COVID-19 declined after a certain time.
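The approach in this abstract, scoring tweets against a sentiment lexicon and then correlating the scores with case counts, can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the toy lexicon, the `score` and `pearson` helpers, and the daily data are all hypothetical, and the sketch computes Pearson's r, which may not be the exact statistic the authors report.

```python
import math

# Hypothetical toy lexicon; the study used a translated VADER lexicon
# and a separate Greek lexicon, neither of which is reproduced here.
LEXICON = {"fear": -1.0, "sad": -0.8, "angry": -0.9, "happy": 0.9, "relief": 0.7}


def score(text):
    """Mean lexicon score of the words in `text`; 0.0 if no word matches."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical daily data: one tweet per day and a reported case count.
daily_tweets = ["fear and sad news", "angry about rules", "happy today", "relief at last"]
daily_cases = [120, 150, 90, 60]

daily_sentiment = [score(t) for t in daily_tweets]
r = pearson(daily_sentiment, daily_cases)
print(f"r = {r:.3f}")
```

In practice the daily sentiment would be an average over many tweets per day, and the same `pearson` helper could be applied to sentiment against tweet volume, the second correlation the abstract mentions.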