
Review on Analysis of Classifiers for Fake News Detection

Mayank Kumar Jain¹ (corresponding author), Ritika Garg¹, Dinesh Gopalani², and Yogesh Kumar Meena²
¹ Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur 302017, Rajasthan, India
mayank.jain@skit.ac.in
² Malaviya National Institute of Technology, Jaipur 302017, Rajasthan, India
{dgopalani.cse,ymeena.cse}@mnit.ac.in
Abstract. The spread of false news on online social media platforms has been a major concern in recent years. Many sources, such as news stations, websites, and newspaper websites, post news pieces on social media. However, most of the news material on social media is questionable and, in some circumstances, deliberately misleading. Such information is described as fake news. Large volumes of fake news on the internet have the potential to cause major societal problems, and accepting such stories as true is extremely harmful to our community. Many people believe that false news affected the 2016 presidential election in the United States, and the term has become commonplace since that election. It has also attracted the interest of industry and academia, who are trying to figure out where it comes from, how it spreads, and what impact it has. In this work, we review a number of papers and compare the strategies they use for detecting false news.
Keywords: Natural language processing · Support vector machine · N-gram analysis · Machine learning
1 Introduction
The expansion of the Internet and recent technological advances have had a huge influence on social relationships. Using social media to obtain information is becoming increasingly common, and people also use various social media platforms to discuss their own activities, hobbies, and opinions. The benefits of social media include easy access to information, low cost, and rapid dissemination of information. Because of these benefits, many individuals prefer to get their news through social media rather than from traditional news sources such as television or newspapers [1]. As a result, social news is rapidly displacing traditional news sources. Although social media has numerous advantages, news on the internet is not as reliable as news from traditional sources.
However, social media content may occasionally be manipulated to serve other objectives, so false news and rumors spread rapidly and widely. In this environment, incorrect news stories are produced and disseminated. Furthermore, well-intentioned individuals distribute false news and disinformation without carefully vetting it [1,3,5]. Many websites exist on social media with the sole purpose of producing fake news.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
V. E. Balas et al. (Eds.): ICETCE 2022, CCIS 1591, pp. 395–407, 2022.
https://doi.org/10.1007/978-3-031-07012-9_34
After the mid-1990s, the way people communicate with one another changed. Multiple online social networks (OSNs), such as Facebook and Twitter, make it easier for users to share real-time information with others on the same or other networks [9]. OSNs have become a key means of communication and information sharing because of qualities such as ease of use, fast dissemination, and low cost. Almost all social network users nowadays get their news from internet sources [7]. However, as OSNs become more widespread, the Internet has become an effective medium for communicating and distributing fake news. False news is propagated through deceptive material, fake reviews, fake stories, advertising, false political statements, and other means [5]. Fake news currently spreads faster on social media than in traditional media, and some of the information circulating on social media leaves people confused and distrustful. Detecting and identifying fake news on social media platforms is difficult: fake news spreads quickly, affecting millions of people and their real-world surroundings. The spread of false news is not a new issue on social media platforms [5, 8]. Several firms and well-known individuals use a variety of social media networks to promote their products and build their reputations, and many users are influenced by these operations to share and like the news. As a result, fake news propagates over the network. The topics, content, style, and media platforms of false news change over time, and fake news tries to distort its linguistic form.
Furthermore, users produce, share, like, and comment on a huge amount of material on online social networking sites. Many phony identities use repeated postings to propagate bogus news across social media sites [10]. Because of the large amount of data on social networks, it is impossible to examine all of these shared items manually. Yet fake news that spreads across the internet cannot simply be ignored: certain features of false news must be examined in order to distinguish it from genuine news. Growing public awareness of the problem has led to numerous articles and blogs on the topic (see Fig. 1).
Users’ decisions are influenced by user reviews, comments, and news posted on social media. The dissemination of low-quality news, particularly false news, has a detrimental impact on societal and individual attitudes [11]. Fake news is damaging to individuals, society, companies, and governments. For example, fake news about a company, spread by spammers or malicious individuals, might cause significant harm. As a result, detecting false news has become a major research topic.
Many websites, such as deversuardian.com and ABCnews.com, have produced bogus news. False claims, fraudulent ads, conspiracy theories, and satirical news are just a few of the forms that fake news takes. These variations affect people’s lives in many ways and have shaped public opinion, interest, and decision-making. Multiple authors and researchers have agreed on numerous criteria for detecting false news, pertaining to the text, the reaction it receives in the form of shares and likes, and the source from which it originated. The traditional technique of relying on human editors and professional journalists to detect false news cannot keep up with the volume of content generated by social media platforms, so new computational approaches are necessary to identify fake news in a timely manner [14]. Even so, professional manual verification is still required alongside computational techniques for identifying false news.

Fig. 1. Fake news and all that goes along with it [5]
Many of the older approaches for detecting false news follow a similar detection procedure. Preprocessing is first applied to all datasets to eliminate noisy information; even with a large and noisy dataset, fake stories on Twitter have been detected with a high level of accuracy. Features extracted from the news data include linguistic features, linguistic cue approaches, tweet-level characteristics, and NLP features [6]. Feature extraction approaches include Linguistic Inquiry and Word Count (LIWC), semantic analysis, Probabilistic Context-Free Grammar (PCFG), Term Frequency-Inverse Document Frequency (TF-IDF), bag-of-words, n-grams, and Doc2Vec. Once the features have been extracted, they are fed into machine learning classifiers, which are trained and then tested on how accurately they predict the labels of new, unlabeled news items. The machine learning classifiers most often investigated in previous fake news detection methods are Naive Bayes, neural networks, Support Vector Machine (SVM), Random Forest, XGBoost, Decision Tree, Linear Regression, Logistic Regression, K-Nearest Neighbors (KNN), AdaBoost, Stochastic Gradient Descent (SGD), and Linear SVM [6, 9].
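The generic procedure above (preprocess, extract features, train a classifier, predict on unlabeled news) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the method of any surveyed paper: the stop-word list is a tiny hypothetical sample, the TF-IDF weights follow the unsmoothed TF and IDF definitions of Sect. 3.2, and a nearest-centroid cosine classifier stands in for the ML classifiers listed above. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "to", "is", "was", "and"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def fit_idf(token_lists):
    """IDF(x) = log(c / d) computed over the training documents."""
    c = len(token_lists)
    df = Counter(w for tokens in token_lists for w in set(tokens))
    return {w: math.log(c / d) for w, d in df.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector; words unseen in training are ignored."""
    counts, b = Counter(tokens), max(len(tokens), 1)
    return {w: (a / b) * idf[w] for w, a in counts.items() if w in idf}

def cosine(u, v):
    dot = sum(val * v.get(w, 0.0) for w, val in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def train(texts, labels):
    """Stand-in classifier: one mean TF-IDF vector (centroid) per class."""
    token_lists = [preprocess(t) for t in texts]
    idf = fit_idf(token_lists)
    centroids = {}
    for label in set(labels):
        vectors = [vectorize(tok, idf) for tok, lab in zip(token_lists, labels) if lab == label]
        total = Counter()
        for vec in vectors:
            total.update(vec)  # Counter.update adds values from a mapping
        centroids[label] = {w: s / len(vectors) for w, s in total.items()}
    return idf, centroids

def predict(model, text):
    """Assign the class whose centroid is most cosine-similar to the text."""
    idf, centroids = model
    vec = vectorize(preprocess(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

model = train(
    ["Shocking miracle cure doctors hate", "Secret miracle cure revealed",
     "Senate passes annual budget", "Budget report released by senate"],
    ["fake", "fake", "real", "real"],
)
print(predict(model, "miracle cure"))  # fake
```

In practice a trained classifier such as a linear SVM would replace the centroid step; the preprocessing and weighting stages stay the same.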
Our major goal is to distinguish between “true” and “false” news. We first need to determine whether existing platforms detect false news based on context and content. Following this initial identification, large-scale experiments must be conducted using multiclass datasets and a variety of machine learning algorithms. Prior models report fixed results, so we compare these results to determine how well each approach separates true news from false news. This matters on social media sites such as Instagram and Facebook pages, as well as messaging apps like WhatsApp and Skype, where fake news receives a significant boost and spreads among individuals. There has also been a large resurgence of work on multi-class article identification. Our goal is to build on previous fake news detection work, in the hope that this will be valuable for future study in this field. The suggested model helps determine the genuineness of news. We examine and compare six different supervised machine learning approaches, including SVM, Decision Tree, K-Nearest Neighbor, and Logistic Regression. In this study, we aim to develop a model that uses machine learning to detect fake news accurately.
The rest of the paper is organized as follows. Section 2 discusses related work from the recent past. Section 3 covers the model description; data preparation techniques (stop word removal, stemming); vectorization techniques, namely count vectorizer or bag of words (BoW) and term frequency-inverse document frequency (TF-IDF); and the classification procedure and machine learning strategies. Section 4 describes challenges related to the datasets and those faced in the survey. The last section summarizes the conclusions, including data normalization, data word cloud representation, and future scope.
2 Related Work
Several researchers have used machine learning to build models and algorithms for detecting fake news. The problem of detecting fake news has only recently emerged, but it has quickly attracted the interest of researchers. Several methods have been developed to identify falsification in various forms of data.
Feyza et al. (2019) [1] proposed a two-stage approach for identifying fake news on social media platforms. In the first stage, several pre-processing steps are applied to the entire data set, converting it from unstructured to structured form, and a document-term matrix is built using the TF weighting approach. In the second stage, twenty-three supervised artificial intelligence algorithms are applied to the structured data set using data mining. The results are reported on several datasets: 1) on the BuzzFeed Political data set, the accuracy of J48 is 0.655, whereas the reported F-measure is 1.000; 2) on the Random Political News data set, the reported accuracy is 1.000; and 3) on the ISOT Fake News data set, the best result is 1.000.
Somya et al. (2020) [2] use an automated detection approach in the Chrome environment to detect fabricated stories on Facebook, analyzing particular features of Facebook posts with deep learning. The authors use user profile features, news content features, and a combination of the two.
Pedro et al. (2020) [3] used five datasets to categorize and identify bogus articles using text attributes. They classify both texts and media stories in languages from three distinct families: Slavic, Germanic, and Latin. They also use NLP representations such as bag-of-words and Word2vec, reaching an accuracy of 92%.
Julio C. S. Reis et al. (2019) [4] proposed a new set of features, including lexical, psycholinguistic, semantic, and subjectivity features, and compared their performance with the features used in prior datasets. They report a true positive rate of 1 and a false positive rate of 0.4.
Xichen Zhang et al. (2019) [5] describe many kinds of features for detecting fake publications and outline a complete fake news detection ecosystem, which includes intervention, fact checking, alert systems, fake news detection, potential fake news prediction, suspiciousness analysis, and third-party verification to determine whether the system is trustworthy.
M. Jain et al. (2020) [6] introduced a new set of features for ML classifiers and evaluated it on two datasets of political articles. They extract linguistic/stylometric features, a bag-of-words TF vector, and a BoW TF-IDF vector from the text fields of the datasets, and then apply various ML approaches with bagging and boosting methods. The model reaches an accuracy of 87.26% with stylometric features, while word vector features give the highest accuracy, 89.41%.
Mohamed K. Elhadad et al. (2019) [7] present a systematic survey of fake news detection, in which many types of data and various extracted features are used for detection on certain predefined datasets.
Marco L. Della Vedova et al. [8] develop a machine learning (ML) fake news detection system that combines news content with social signals and outperforms existing algorithms. They also implement the technique in a Messenger chatbot and validate it with a real-world application, achieving an accuracy of 81.7%.
H. Liu et al. [9] established a system, the Fake News Detector based on Multi-Source Scoring (FNDMS), for detecting false news from several news sources. Content-based and author-based features are used to assess the reliability of each individual news source, and a Dempster-Shafer Theory (DST) model combines the veracity of the sources to reach a conclusion about the authenticity of a news event. The framework’s efficacy was confirmed in comparison with SVM, Logistic Regression, Random Forest, and AdaBoost. It would be preferable for this framework to also take the source’s purpose into account.
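The evidence-fusion step can be illustrated with Dempster's rule of combination, the standard way DST merges two mass functions. This sketch is not the FNDMS scoring pipeline of [9]: the two sources and their mass values are invented for illustration, and the frame of discernment is assumed to be simply {fake, real}. Mass placed on the whole frame represents a source's uncertainty.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets, discard the
    mass that falls on the empty set (conflict), and renormalize the rest."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

FAKE = frozenset({"fake"})
REAL = frozenset({"real"})
EITHER = FAKE | REAL  # mass on the whole frame expresses "undecided"

# Hypothetical credibility evidence from two news sources.
source_a = {FAKE: 0.6, REAL: 0.1, EITHER: 0.3}
source_b = {FAKE: 0.5, REAL: 0.2, EITHER: 0.3}
fused = combine(source_a, source_b)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 3))
```

With these numbers the conflicting mass is 0.17, and the fused mass on {fake} is 0.63 / 0.83 ≈ 0.76, higher than either source alone, because both sources lean the same way.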
S. Helmstetter and H. Paulheim [10] used a large and noisy dataset to detect bogus news on Twitter. Features derived from each tweet include tweet-level features, subject features, user-level features, emotion features, and text features. The proposed method was designed to detect both false news tweets and their sources, where the source is the user account from which the tweets were sent. Multiple classifiers were used to categorize the tweets, including neural networks, Random Forest, SVM, Naive Bayes, and XGBoost, and XGBoost performed exceptionally well in classifying the tweets. The technique has the disadvantage of focusing on only one source, namely Twitter, and it does not address news articles from a variety of sources.
H. Ahmed et al. [11] suggested an n-gram analysis technique for automatically detecting fake news. Features were extracted from the text using the TF and TF-IDF feature extraction approaches. To verify the reliability of the news, six machine learning classifiers and two feature extraction methods were examined; the classifiers compared include SVM, KNN, SGD, Linear SVM, and Decision Tree. The model achieved its highest accuracy with unigram features, TF-IDF, and the Linear SVM classifier. This approach does not consider the multi-source nature of news or the legitimacy of its sources.
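Word n-grams, the features behind the n-gram analysis in [11], are contiguous runs of n tokens. The helper below is a minimal sketch that assumes simple lowercase whitespace tokenization; weighting the resulting n-gram counts with TF or TF-IDF then yields the feature vectors that such a study feeds to its classifiers.

```python
from collections import Counter

def word_ngrams(text, n):
    """Contiguous sequences of n lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n_values=(1, 2)):
    """Count unigrams and bigrams (by default) together in one feature bag."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts

print(word_ngrams("Breaking news about the election", 2))
# ['breaking news', 'news about', 'about the', 'the election']
```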
Researchers have recommended a number of techniques for recognizing fake news in news reports and on social media platforms, typically relying on standard natural language preprocessing steps. Harita et al. [12] use stylometric features and word vector features of news content to detect fake news with an accuracy of up to 95.49%.
Yang et al. [13] used a multimodal approach to detect false news in articles, applying a convolutional neural network to explicit and latent feature sets of text and image data. Table 1 summarizes the limitations of some of these papers.
Table 1. Comparative analysis

| Authors | Dataset | Technology used | Results | Limitations |
|---|---|---|---|---|
| Hager et al. 2021 [24] | Dataset1, FakeNewsNet, FA-KES5, The ISOT | Grid search, hyperopt optimization techniques | Accuracy: OPCNN-FAKE = 97.84%, RNN = 86.76% | Does not use knowledge-base or fact-based approaches |
| Somya et al. 2020 [2] | More than 15,000 news contents from different Facebook users, including both fake and real news | Fake news detection approach in Chrome environment; KNN, SVM, Logistic Regression | Accuracy: KNN = 99.3%, SVM = 99.3%, Logistic Regression = 99.0%, Decision Tree = 99.1%, LSTM = 99.4% | Compared with machine learning algorithms, deep learning algorithms take more time for testing and analysis |
| Abdullah et al. 2020 [25] | Fake news dataset from Kaggle | Hybrid CNN-LSTM model | Accuracy: 99.7% (training) and 97.5% (testing) | Does not use pre-processing |
| Feyza et al. 2019 [1] | BuzzFeed Political News Data set, Random Political News Data set, ISOT Fake News Data set | TF weighting method and Document-Term Matrix; twenty-three supervised artificial intelligence algorithms | Accuracy of 65.5% and 64.4% | Works on a smaller data set of 1500 news articles |
| K. Shu et al. 2019 [22] | FakeNewsNet | Fake news data repository FakeNewsNet, convolutional neural network | Accuracy: 92% for BuzzFeed news and 93.6% for PolitiFact news | Lower accuracy due to a smaller number of features |
| J.C et al. 2019 [23] | None | Measures the prediction performance of the proposed approaches to design an auto-detection system; SVM, KNN | Accuracy: RF = 85%, KNN = 80%, SVM = 79% | Accuracy for detecting fake accounts is very low due to the small dataset |
3 Methodology
3.1 Data Pre-processing
Preparing raw data for use in a machine learning model is known as data pre-processing. It is the first and most crucial step in creating a machine learning model. Data pre-processing consists of three phases:
Data Cleaning - A method used to improve the quality of data. This process includes normalizing data, smoothing noisy data, handling missing data, identifying unnecessary observations, and correcting errors.
Data Transformation - Raw data is converted into a format from which data mining can extract strategic information quickly and efficiently. Because raw data is difficult to work with directly, any information collected must first be pre-processed.
Data Reduction - A technique for shrinking the original data so that it can be represented in less space. Data reduction strategies maintain data integrity while reducing data volume (Fig. 2).
Fig. 2. Data types in the news
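As a minimal sketch of the Data Cleaning phase (and the duplicate-removal side of Data Reduction), the functions below lowercase text, strip URLs and non-letter characters, and drop records with missing text or labels as well as exact duplicates. The record layout (dicts with hypothetical `text` and `label` keys) and the specific cleaning rules, including keeping only ASCII letters, are assumptions for illustration; real pipelines tune them to the dataset.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTERS = re.compile(r"[^a-z\s]")  # anything that is not an ASCII letter or whitespace

def clean_text(text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = NON_LETTERS.sub(" ", text)
    return " ".join(text.split())

def clean_corpus(records):
    """Drop rows with missing text/label, rows empty after cleaning, and duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        text, label = record.get("text"), record.get("label")
        if not text or label is None:              # missing data
            continue
        normalized = clean_text(text)
        if not normalized or normalized in seen:   # empty or duplicate observation
            continue
        seen.add(normalized)
        cleaned.append({"text": normalized, "label": label})
    return cleaned

print(clean_text("BREAKING: Visit https://example.com NOW!!!"))  # breaking visit now
```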
Stop Words Removal: Stop words carry little meaning on their own; they gain significance only in combination with other words. In text classification, stop words are treated as noise when features are extracted from text. They are used in sentences to connect ideas and maintain sentence structure, and they include prepositions, conjunctions, and articles, for example by, for, from, how, of, on, that, the, too, was, what, when, and about. These terms are removed before feature extraction. After preprocessing, we record the total number of words in all articles and the number of unique words in the listed articles.
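A sketch of stop-word removal, together with the word statistics mentioned above (total and unique word counts after filtering). The stop-word set combines the examples listed in the text with a few common articles and prepositions; it is an illustrative assumption, much smaller than the lists shipped with NLP toolkits.

```python
# Examples from the text plus a few common function words (illustrative only).
STOP_WORDS = {
    "by", "for", "from", "how", "of", "on", "that", "the", "too", "was",
    "what", "when", "about", "a", "an", "and", "in", "is", "to",
}

def remove_stop_words(tokens):
    """Keep only tokens that are not stop words (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def word_statistics(articles):
    """Total and unique word counts after stop-word removal."""
    filtered = [remove_stop_words(article.split()) for article in articles]
    total = sum(len(tokens) for tokens in filtered)
    unique = len({t.lower() for tokens in filtered for t in tokens})
    return total, unique

print(remove_stop_words("The senator spoke about the vote".split()))
# ['senator', 'spoke', 'vote']
```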
Stemming: The next stage in text normalization reduces the tokens to their base forms. Stemming transforms a word into its root (stem) form by removing or changing its affixes, which reduces the number of distinct word forms in the data. We used the Porter stemming algorithm because it produces good stemming results. In our data collection, words such as extreme, very, government, transgender, and minister were stemmed; for example, government was reduced to govern, transgender to transgend, and minister to minist.
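The effect of suffix stripping can be seen with the deliberately simplified stemmer below. It is not the Porter algorithm used in this work, which applies several ordered rule steps with conditions on the stem's vowel-consonant structure; it only removes the first matching suffix from a short, hypothetical list, keeping at least three characters of stem. It happens to reproduce the government, transgender, and minister examples above, but it will over- or under-stem many other words.

```python
# Hypothetical suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("ments", "ment", "ness", "ings", "ing", "edly", "ed", "er", "ly", "s")

def simple_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem characters remain.
    A toy suffix stripper, NOT the Porter algorithm."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["government", "transgender", "minister", "governments"]:
    print(w, "->", simple_stem(w))
```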
3.2 Feature Extraction and Selection
Handling high-dimensional data is one of the challenges of text classification. Documents contain many terms, phrases, and expressions that add to the computational cost of the learning process. Furthermore, irrelevant and redundant features can degrade the accuracy and performance of classifiers.
Term Frequency: Term frequency, commonly computed with a count vectorizer (CV), represents texts using the bag-of-words (BoW) technique so that they can be compared. Each document is represented as a fixed-length vector over the vocabulary, where each entry records how often the corresponding word appears in the document; in the binary variant, the value is one if the word occurs in the document and zero otherwise. The TF weight of a word x in a text is given by Eq. (1).
Here, a = number of times word x occurs in the text,
b = total number of words in the text,
c = total number of documents, and
d = number of documents that contain word x.

$$\mathrm{TF}(x) = \frac{a}{b} \qquad (1)$$
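Eq. (1) and the binary bag-of-words variant translate directly into code. In this minimal sketch, tokens are assumed to be already preprocessed, and the vocabulary order for the binary vector is supplied by the caller.

```python
from collections import Counter

def term_frequency(tokens):
    """Eq. (1): TF(x) = a / b for every word x in one tokenized text."""
    counts = Counter(tokens)   # a for each word
    b = len(tokens)            # total number of words in the text
    return {word: a / b for word, a in counts.items()}

def binary_bow(tokens, vocabulary):
    """Binary bag of words: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

tokens = "fake news spreads fake claims".split()
print(term_frequency(tokens)["fake"])                 # 0.4
print(binary_bow(tokens, ["fake", "real", "news"]))   # [1, 0, 1]
```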
TF-IDF: Term Frequency-Inverse Document Frequency (TF-IDF) is a widely used approach for determining the importance of a word in a document through a weighted numerical representation of text, and it is a frequently used feature extraction approach in Natural Language Processing (NLP). A key property of IDF is that it down-weights frequent terms while boosting rare ones. For example, words like “the” and “at that point” appear frequently in text, and if we used only TF, such terms would dominate the frequency counts; IDF reduces their impact. IDF and TF-IDF are defined in Eqs. (2) and (3).
$$\mathrm{IDF}(x) = \log\left(\frac{c}{d}\right) \qquad (2)$$
$$\mathrm{TFIDF}(x) = \mathrm{TF}(x) \times \mathrm{IDF}(x) \qquad (3)$$
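Eqs. (1)–(3) combine into the function below, a minimal sketch that uses the natural logarithm and no smoothing, exactly as the equations are written. Library implementations differ: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed IDF and L2-normalizes each document vector by default, so its numbers will not match these.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Eqs. (1)-(3): TF(x) * log(c / d) for each word of each tokenized document."""
    c = len(documents)                                            # total documents
    d = Counter(word for doc in documents for word in set(doc))   # docs containing each word
    weights = []
    for doc in documents:
        counts, b = Counter(doc), len(doc)
        weights.append({x: (a / b) * math.log(c / d[x]) for x, a in counts.items()})
    return weights

docs = [["the", "vote", "was", "rigged"], ["the", "vote", "was", "fair"]]
for weights in tf_idf(docs):
    print(weights)
```

Words shared by both documents ("the", "vote", "was") get weight 0 because log(2/2) = 0, while "rigged" and "fair" each get 0.25 · log 2 ≈ 0.173, which is exactly the down-weighting of common terms described above.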
Content-Based Features: Content-based approaches make decisions based on the similarity of features derived from the item itself; they are frequently employed in recommender systems, which recommend items to people based on information gathered about them. Content-based features for fake news detection include the number of sentences, words, question marks, exclamation marks, and capital letters; punctuation; negations (no, not); and frequently used words.
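A sketch of how the listed content-based features might be counted from raw text. The regular expressions, the punctuation set, and the negation list (limited to the "no" and "not" named above) are simplifying assumptions, and splitting sentences on `.`, `!`, and `?` is crude enough to miscount abbreviations.

```python
import re
from collections import Counter

NEGATIONS = {"no", "not"}                # the negations named in the text
PUNCTUATION = set(".,;:!?\"'()-")        # assumed punctuation set

def content_features(text):
    """Count simple stylistic cues in one news text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "capital_letters": sum(ch.isupper() for ch in text),
        "punctuation": sum(ch in PUNCTUATION for ch in text),
        "negations": sum(w.lower() in NEGATIONS for w in words),
        "top_words": Counter(w.lower() for w in words).most_common(3),
    }

print(content_features("SHOCKING! This is not a drill. Is it true?"))
```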
Context-Based Features: Contextual information helps users understand content on the web; for example, such details can help distinguish between ambiguous scene photographs that show intra-class variance and inter-class similarity. In fake news detection, context-based features describe the user behind a post, such as User_Name, User_Age, Registration Date, Registration Time, Follower, Following, and Number of Posts.
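Raw profile fields like those listed above are usually turned into numeric features before classification. The sketch below derives account age, a follower/following ratio, and a posting rate from a user record; the dictionary keys and the derived features are hypothetical choices for illustration, not the feature set of any surveyed paper.

```python
from datetime import date

def context_features(user, today):
    """Derive numeric context features from a (hypothetical) user record."""
    account_age_days = max((today - user["registration_date"]).days, 1)
    return {
        "user_age": user["user_age"],
        "account_age_days": account_age_days,
        "follower_following_ratio": user["followers"] / max(user["following"], 1),
        "posts_per_day": user["num_posts"] / account_age_days,
    }

user = {"user_age": 30, "registration_date": date(2022, 1, 1),
        "followers": 50, "following": 500, "num_posts": 300}
print(context_features(user, today=date(2022, 1, 31)))
```

A young account that posts heavily while following far more users than follow it back is one pattern such derived features can expose.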
3.3 Machine Learning Strategies for Detecting Fake News
Machine learning is used to create models that make predictions based on past or historical data. It is a branch of AI: it learns from previous experience and can anticipate our needs. It is typically used to analyze massive amounts of data, and it is now used to identify fraud. There are many machine learning approaches, but we focus on supervised machine learning [15–22].
Naive Bayes: Naive Bayes is a simple, straightforward, and effective classification procedure that predicts the class of new data from a previously labeled dataset. It makes quick predictions based on the probability of an object belonging to each class, and it performs well in multi-class prediction. It is mostly used for text categorization and can handle large quantities of data; in NLP tasks such as sentiment analysis, it produces strong results.
It entails the following steps:
First, make a frequency table of the terms for each class.
Calculate the likelihood of each term for each class from the frequency table.
Calculate the posterior probability for each class.
The class with the greatest posterior probability is the classifier’s prediction.
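The four steps map onto the multinomial Naive Bayes sketch below. It works in log space to avoid numerical underflow and uses add-one (Laplace) smoothing so that an unseen term-class pair does not zero out a class; both are standard choices but are assumptions here, since the text does not specify them. Words never seen in training are ignored, and the toy documents are invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Step 1: per-class term frequency table, class counts, and vocabulary."""
    freq = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        freq[label].update(tokens)
    vocab = {word for counter in freq.values() for word in counter}
    return freq, Counter(labels), vocab

def predict_nb(model, tokens):
    """Steps 2-4: smoothed likelihoods, log posterior per class, pick the maximum."""
    freq, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, counter in freq.items():
        total = sum(counter.values())
        score = math.log(class_counts[label] / n_docs)              # log prior
        for word in tokens:
            if word in vocab:                                        # skip unseen words
                score += math.log((counter[word] + 1) / (total + len(vocab)))
        scores[label] = score        # log posterior, up to a constant shared by all classes
    return max(scores, key=scores.get)

docs = [["shocking", "secret", "cure"], ["miracle", "secret"],
        ["senate", "passes", "budget"], ["budget", "report"]]
labels = ["fake", "fake", "real", "real"]
model = train_nb(docs, labels)
print(predict_nb(model, ["secret", "miracle"]))  # fake
```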
K-Nearest Neighbours (KNN): KNN is a supervised machine learning method. KNN classification is often used to discover and identify abnormalities such as false news. It is mostly used for classification problems such as detecting fake news; it is simple to use, takes little time, and can reduce the effect of noise. It measures closeness with the Euclidean distance, the square root of the sum of squared differences between a training point and a test point across all input features, $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_i (p_i - q_i)^2}$, and assigns the test point the majority class among its K nearest training points.
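A minimal KNN sketch using the Euclidean distance described above. The two-dimensional feature vectors (imagine a sensationalism score and a source-reliability score) are hypothetical; in practice the inputs would be TF-IDF or other extracted features. Ties in the vote go to whichever label `Counter.most_common` lists first.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Square root of the summed squared differences across all features."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g., sensationalism score, source reliability).
train_X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
train_y = ["fake", "fake", "fake", "real", "real", "real"]
print(knn_predict(train_X, train_y, (0.7, 0.3)))  # fake
```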
Random Forest (RF): Random Forest is a supervised learning algorithm used for text classification. It reduces overfitting by combining the predictions of many trees rather than relying on a single one. The “forest” is a collection of decorrelated decision trees whose outputs are combined to reduce variance and provide more accurate predictions.
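The bagging-plus-random-features idea can be shown with a deliberately small forest. To stay short, each "tree" below is a decision stump (a single Gini-optimal split) trained on a bootstrap sample and restricted to a random subset of features, and predictions are combined by majority vote. Real random forests grow deep trees and resample features at every split, so treat this as an illustration of the ensemble mechanism only; the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def fit_stump(X, y, features):
    """Best single split among the allowed features, by weighted Gini impurity.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return features[0], float("inf"), majority, majority
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))                 # features tried per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]       # bootstrap sample
        features = rng.sample(range(n_features), k)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

X = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.95, 0.85),
     (0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.05, 0.15)]
y = ["fake"] * 4 + ["real"] * 4
forest = fit_forest(X, y)
print(predict_forest(forest, (0.95, 0.95)))
```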
Extra Tree Classifier: The Extra Trees classifier is an ensemble learning algorithm that outputs a classification result by aggregating several decorrelated decision trees collected in a “forest”. It is very similar to a Random Forest classifier; the main difference lies in how the forest’s decision trees are built, since Extra Trees choose split thresholds at random instead of searching for the best one. We did not use it as a classifier in this case; instead, we used it as a feature selection approach to choose the most suitable features, which we then used in the classifiers to improve results and performance.
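The feature-selection use of Extra Trees described above can be sketched as follows. Each depth-1 "tree" draws one random feature and one random threshold, which is the fully randomized extreme of the Extra Trees method. A feature's importance is its total Gini impurity decrease, normalized to sum to one, and features at or above the mean importance are kept (the mean is also the default threshold of scikit-learn's `SelectFromModel`). Depth-1 trees, the tree count, and the toy data are simplifications for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def extra_trees_importance(X, y, n_trees=200, seed=0):
    """Total Gini decrease per feature over depth-1 trees whose feature AND
    threshold are both drawn at random; normalized to sum to 1."""
    rng = random.Random(seed)
    n_features = len(X[0])
    importance = [0.0] * n_features
    parent = gini(y)
    for _ in range(n_trees):
        f = rng.randrange(n_features)
        lo, hi = min(row[f] for row in X), max(row[f] for row in X)
        if lo == hi:                  # constant feature: no split possible
            continue
        t = rng.uniform(lo, hi)       # random cut-point, not an optimized one
        left = [lab for row, lab in zip(X, y) if row[f] <= t]
        right = [lab for row, lab in zip(X, y) if row[f] > t]
        children = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        importance[f] += parent - children
    total = sum(importance)
    return [v / total for v in importance] if total else importance

def select_features(importance):
    """Indices of features whose importance is at least the mean importance."""
    mean = sum(importance) / len(importance)
    return [i for i, v in enumerate(importance) if v >= mean]

# Feature 0 separates the classes, feature 1 is constant, feature 2 is noise.
X = [(0.9, 0.5, 0.3), (0.8, 0.5, 0.7), (0.85, 0.5, 0.2),
     (0.1, 0.5, 0.6), (0.2, 0.5, 0.4), (0.15, 0.5, 0.8)]
y = ["fake"] * 3 + ["real"] * 3
importance = extra_trees_importance(X, y)
print(importance, select_features(importance))
```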
Support Vector Machine: The purpose of this algorithm is to find the best line or decision boundary that divides n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This optimal decision boundary is called a hyperplane. Machine learning classification is shown in Fig. 3.
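A linear SVM can be trained with a few lines of stochastic subgradient descent on the L2-regularized hinge loss, following the Pegasos update rule. This is a minimal sketch: labels must be +1/-1, a constant feature is appended so that the bias is learned (and regularized) together with the weights, and the toy points are hypothetical. Dedicated solvers such as LIBLINEAR are what one would use in practice.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    Labels must be +1 or -1; a constant 1.0 feature acts as the bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)                                  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]               # step on the regularizer
            if margin < 1.0:                                       # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def svm_predict(w, x):
    """Which side of the hyperplane x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

X = [(2, 2), (3, 3), (2.5, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(svm_predict(w, (2.5, 2.5)), svm_predict(w, (-2.5, -2.5)))  # 1 -1
```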
Logistic Regression: Logistic regression predicts a categorical dependent variable from a set of independent variables, so the outcome must be a discrete or categorical value, such as Yes or No, 0 or 1, or true or false. However, instead of giving exact values such as 0 and 1, it outputs probabilities between 0 and 1.
Fig. 3. Machine learning architecture
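The probabilistic output of logistic regression comes from the logistic (sigmoid) function applied to a weighted sum of the inputs. The sketch below fits the weights by batch gradient descent on the log-loss; the learning rate, epoch count, and one-feature toy data (think of a single hypothetical "sensationalism" score, with 1 = fake) are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; y holds 0/1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, target in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target
            grad_w = [g + error * xj for g, xj in zip(grad_w, x)]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

X = [[0.9], [0.8], [0.7], [0.2], [0.1], [0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.95]), predict_proba(w, b, [0.05]))
```

Thresholding the probability at 0.5 turns it into the discrete Yes/No decision described above.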
Convolutional Neural Networks: A CNN is a deep learning method that takes an image as input and assigns learnable weights and biases to different regions of the image so that they can be distinguished from one another. Once trained, CNN models can handle a variety of tasks in the image processing domain, such as image recognition, image classification, and object and face detection.
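Although the description above is phrased for images, the same operation underlies text CNNs such as the one used in [13]: a filter slides over consecutive word vectors. The sketch below shows a single filter's forward pass (valid 1-D convolution, ReLU, then max-over-time pooling). The embeddings and weights are hand-picked hypothetical values; a real network learns many filters by backpropagation. Like most deep learning libraries, it computes cross-correlation, with no kernel flip.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Valid 1-D convolution of a filter over a sequence of word vectors.
    kernel holds one weight vector per window position; assumes
    len(sequence) >= len(kernel)."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias + sum(w * v
                           for w_vec, x_vec in zip(kernel, window)
                           for w, v in zip(w_vec, x_vec))
        outputs.append(total)
    return outputs

def relu(values):
    return [max(0.0, v) for v in values]

def filter_feature(sequence, kernel, bias=0.0):
    """One CNN filter: convolution, ReLU, then max-over-time pooling."""
    return max(relu(conv1d(sequence, kernel, bias)))

# Four tokens with hypothetical 2-D embeddings and one width-2 filter.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
print(conv1d(embeddings, kernel))                     # [2.0, 1.0, 1.0]
print(filter_feature(embeddings, kernel, bias=-1.5))  # 0.5
```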
Recurrent Neural Networks: Recurrent Neural Networks (RNNs) are neural networks that process temporal or sequential data. To make the best predictions, RNNs use the other data points in a sequence: they take in an input and influence the output by reusing the activations of previous nodes (or, in bidirectional RNNs, later nodes) in the sequence. A deep learning approach is shown in Fig. 4.
Fig. 4. Deep learning architecture
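The recurrence that lets an RNN reuse earlier activations is a single update applied at every time step. The sketch below is the forward pass of a simple (Elman) RNN with a tanh activation; the weights are hand-picked and the one-dimensional inputs are toy values, whereas a real model learns its weights and typically uses gated variants such as LSTM for long sequences.

```python
import math

def rnn_forward(inputs, W_x, W_h, b):
    """Simple (Elman) RNN forward pass:
    h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    Returns the hidden state after every time step."""
    hidden_size = len(b)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        # The comprehension reads the previous h; the new h replaces it afterwards.
        h = [math.tanh(sum(W_x[i][j] * x[j] for j in range(len(x)))
                       + sum(W_h[i][k] * h[k] for k in range(hidden_size))
                       + b[i])
             for i in range(hidden_size)]
        states.append(h)
    return states

# One input feature, one hidden unit, hand-picked weights.
states = rnn_forward([[1.0], [0.0]], W_x=[[1.0]], W_h=[[1.0]], b=[0.0])
print(states)
```

The second hidden state equals tanh(tanh(1)) and is non-zero even though its input is 0: the recurrence carries information forward from the earlier step.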
4 Challenges
Deep Learning: Deep learning technology can deal with many types of data, including text, photos, video, and audio. It can be adapted to new types of problems and avoids manual feature engineering, which is the most time-consuming yet essential aspect of a machine-learning framework. On the other hand, deep learning technologies require a significant amount of training time and relatively large amounts of data, and they do not provide interpretations of what the model has actually learned, which makes the processing inside the model almost a black box.
Multimedia False Information Detection: Data analytics, computer vision, and signal processing techniques are needed to detect fabricated and manipulated audio, images, and videos. Machine learning and deep learning are essential for discovering the signature traits of manipulated and generated multimedia.
Unsupervised Models: The majority of the current work is done utilizing supervised
learning techniques. Due to the large amount of unlabelled data available on social media,
unsupervised models must be constructed.
Datasets: Because most research is done on customised datasets, producing convincing gold-standard datasets in this field is critical. A benchmark comparison between multiple algorithms is impossible without publicly available large-scale datasets.
Multilingual Platform: The majority of the research focuses on linguistic characteristics of English-language texts. Other popular and regional languages are largely not yet considered, which calls for multilingual platforms for fake news identification.
Early Detection: Detecting fake news in its early phases, before it spreads widely, is a difficult task that must be completed quickly in order to mitigate and intervene. It is very difficult to change people’s minds after bogus news has been widely accepted and trusted.
Cross Domain Analysis: Most present systems concentrate on a single aspect of deception detection, whether content, dissemination, style, or other factors. Cross-domain analysis, which takes into account a variety of factors such as topic, website, language, photos, and URL, helps identify distinctive, invariant traits and allows for the early detection of fake content.
5 Conclusion
In recent years, people have found it more difficult to obtain accurate and reliable infor-
mation due to the growing volume of information available on social media. We cover a
variety of tools and methodologies for spotting fake news. In this work, we identified a number of machine learning approaches for detecting false news, including SVM, LR, NB, KNN, and RF. Linear SVM performed better on all types of datasets used by the authors. Several researchers have also employed deep learning models such as CNN, RNN, and hybrid models to detect false news. In the future, we will strive to construct a new real-time dataset for false news and use graph convolutional neural networks to detect false news.
References
1. Ozbay, F.A., Alatas, B.: Fake news detection within online social media using super-
vised artificial intelligence algorithms. Phys. A (2019). https://doi.org/10.1016/j.physa.2019.
123174
2. Sahoo, S.R., Gupta, B.B.: Multiple features-based approach for automatic fake news detection
on social networks using deep learning. Appl. Soft Comput. 106983 (2020). https://doi.org/
10.1016/j.asoc.2020.106983
3. Arruda Faustini, P.H., Covões, T.F.: Fake news detection in multiple platforms and languages.
Expert Syst. Appl. 113503 (2020). https://doi.org/10.1016/j.eswa.2020.113503
4. Reis, J.C.S., Correia, A., Murai, F., Veloso, A., Benevenuto, F., Cambria, E.: Supervised
learning for fake news detection. IEEE Intell. Syst. 34(2), 76–81 (2019). https://doi.org/10.
1109/mis.2019.2899143
5. Zhang, X., Ghorbani, A.A.: An overview of online fake news: characterization, detection,
and discussion. Inf. Process. Manage. 57,102025(2019).https://doi.org/10.1016/j.ipm.2019.
03.004
6. Kumar Jain, M., Gopalani, D., Kumar Meena, Y., Kumar, R.: Machine learning based fake news detection using linguistic features and word vector features. In: 2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6 (2020). https://doi.org/10.1109/UPCON50219.2020.9376576
7. Elhadad, M.K., Fun Li, K., Gebali, F.: Fake news detection on social media: a systematic
survey. In: 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal
Processing (PACRIM) (2019). https://doi.org/10.1109/pacrim47961.2019.8985
8. Della Vedova, M.L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., de Alfaro, L.: Automatic online fake news detection combining content and social signals. In: 2018 22nd Conference of Open Innovations Association (FRUCT) (2018). https://doi.org/10.23919/fruct.2018.846830
9. Liu, H., Wang, L., Han, X., Zhang, W., He, X.: Detecting fake news on social media: a multi-source scoring framework. In: 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), pp. 524–531. IEEE, April 2020
10. Helmstetter, S., Paulheim, H.: Weakly supervised learning for fake news detection on Twitter. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 274–277. IEEE (2018)
11. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using n-gram analysis and machine learning techniques. In: Traore, I., Woungang, I., Awad, A. (eds.) ISDDC 2017. LNCS, vol. 10618, pp. 127–138. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69155-8_9
12. Reddy, H., Raj, N., Gala, M., Basava, A.: Text-mining-based fake news detection using ensemble methods. Int. J. Autom. Comput. 1–12 (2020)
13. Yang, Y., Zheng, L., Zhang, J., Cui, Q., Li, Z., Yu, P.S.: TI-CNN: convolutional neural networks
for fake news detection. arXiv preprint arXiv:1806.00749 (2018)
14. Meel, P., Vishwakarma, D.K.: Fake news, rumor, information pollution in social media and web: a contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst. Appl. 153, 112986 (2019)
15. Pérez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. arXiv preprint arXiv:1708.07104 (2017)
16. Kumar, T., Mahrishi, M., Meena, G.: A comprehensive review of recent automatic speech
summarization and keyword identification techniques. In: Fernandes, S.L., Sharma, T.K.
(eds.) Artificial Intelligence in Industrial Applications. LAIS, vol. 25, pp. 111–126. Springer,
Cham (2022). https://doi.org/10.1007/978-3-030-85383-9_8
17. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002). https://doi.org/10.3115/1118108.1118117
18. Rubin, V.L., Chen, Y., Conroy, N.J.: Deception detection for news: three types of fakes. Proc.
Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015). https://doi.org/10.1002/pra2.2015.145052010083
19. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017a). Shu, K., Wang, S., Liu, H.: Exploiting tri-relationship for fake news detection. arXiv:1712.07709 (2017b)
20. Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29(1), 24–54 (2010). https://doi.org/10.1177/0261927X09351676
21. Mahrishi, M., et al.: Video index point detection and extraction framework using custom YoloV4 darknet object detection model. IEEE Access (2021). https://doi.org/10.1109/ACCESS.2021.3118048
22. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: FakeNewsNet: a data repository with news content, social context and dynamic information for studying fake news on social media. arXiv preprint arXiv:1809.01286 (2018)
23. Reis, J.C., Correia, A., Murai, F., Veloso, A., Benevenuto, F., Cambria, E.: Supervised learning
for fake news detection. IEEE Intell. Syst. 34(2), 76–81 (2019)
24. Saleh, H., Alharbi, A., Alsamhi, S.H.: OPCNN-FAKE: optimized convolutional neural network for fake news detection. IEEE Access 9, 129471–129489 (2021). https://doi.org/10.1109/ACCESS.2021.3112806
25. Abdullah, A., Awan, M., Shehzad, M., Ashraf, M.: Fake news classification bimodal using convolutional neural network and long short-term memory. Int. J. Emerg. Technol. Learn. 11, 209–212 (2020)