
Review on Analysis of Classifiers for Fake News Detection

January 2022


DOI:10.1007/978-3-031-07012-9_34

In book: Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT (pp.395-407)

Authors:

Mayank Kumar Jain

Manipal University Jaipur

Dinesh Gopalani

Malaviya National Institute of Technology Jaipur

Yogesh Meena

Malaviya National Institute of Technology Jaipur



Review on Analysis of Classifiers for Fake News Detection

Mayank Kumar Jain1(B), Ritika Garg1, Dinesh Gopalani, and Yogesh Kumar Meena2

1Swami Keshvanand Institute of Technology, Management and Gramothan, Jaipur 302017,

Rajasthan, India

mayank.jain@skit.ac.in

2Malaviya National Institute of Technology, Jaipur 302017, Rajasthan, India

{dgopalani.cse,ymeena.cse}@mnit.ac.in

Abstract. The spread of false news on online social media platforms has been a major concern in recent years. Many sources, such as news stations, websites, and even newspaper websites, post news pieces on social media. Meanwhile, most of the new material on social media is suspect and, in some circumstances, deliberately misleading. Fake news is a term used to describe this type of information. Large volumes of bogus news on the internet have the potential to generate major societal issues. Accepting such stories and treating them as true is extremely harmful to our community. Many people believe that false news affected the 2016 presidential election in the United States. The term has since become commonplace as a result of the election. It has also attracted the interest of industry and academics, who are trying to figure out where it comes from, how it spreads, and what impacts it has. In this work, we review a number of papers and compare their strategies for detecting false news.

Keywords: Natural language processing · Support vector machine · N-gram analysis · Machine learning

1 Introduction

The expansion of the Internet and recent technological advancements have had a huge influence on social relationships. People increasingly use social media to get information, and they also use various social media platforms to discuss their own activities, hobbies, and opinions. The benefits of social media include simple access to information, minimal cost, and rapid dissemination of information. Because of these benefits, many individuals prefer to get their news through social media rather than traditional news sources like television or newspapers [1]. As a result, social news is rapidly displacing traditional news sources. Although social platforms have numerous advantages, news on the internet is not as reliable as news from traditional sources.

However, the content of social media may occasionally be altered to serve other objectives. As a result, false news and rumors spread rapidly and widely, and incorrect news stories are produced and disseminated. Furthermore, well-intentioned individuals distribute false news and disinformation without carefully vetting it [1,3,5]. Many websites exist on social media with the sole purpose of producing fake news.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

V. E. Balas et al. (Eds.): ICETCE 2022, CCIS 1591, pp. 395–407, 2022.

https://doi.org/10.1007/978-3-031-07012-9_34

After the mid-1990s, the way people communicate with one another changed. Online social networks (OSNs), such as Facebook and Twitter, make it easier for users to share real-time information with others on the same or other networks [9]. OSNs have become a key means of communication and information sharing due to qualities such as simplicity of use, faster transmission, and lower cost. Almost all social network users nowadays get their news from internet sources [7]. However, as OSNs become more widespread, the Internet has become an excellent medium for communicating and distributing bogus news. False news is propagated through deceptive material, fake reviews, fake stories, advertising, false political statements, and other means [5]. Fake news currently spreads more quickly on social media than in traditional media. Some information circulating on social media leaves people perplexed and distrustful. Detecting and identifying fake news on social media platforms is a difficult process. Fake news spreads quickly, affecting millions of people and their real-world surroundings. The spread of false news is not a new issue on social media platforms [5, 8]. Several firms and well-known individuals use a variety of social media networks to promote their products and build their reputation. Many users are influenced by these activities to share and like the news, and fake news propagates over the network as a result. The topic, content, style, and media platform of false news change over time, and fake news tries to distort linguistic form.

Furthermore, users produce, share, like, and comment on a huge amount of material on online social networking sites. Many phony identities use repeated postings to propagate bogus news around a social media site [10]. Due to the large amount of data on the social network, it is impossible to examine all of these shared items. Fake news that spreads throughout the internet cannot be exposed by turning a blind eye. Some features of false news must be examined in order to distinguish it from genuine news. Numerous articles and blogs have been written as a result of public awareness; see Fig. 1.

Users’ decisions are influenced by user reviews, comments, and news posted on social media. The dissemination of low-quality news, particularly false news, has a detrimental impact on societal and individual attitudes [11]. Fake news is damaging to individuals, society, and companies, as well as governments. Fake news about a company, for example, spread by spam or malevolent individuals, might cause significant harm. As a result, detecting false news has become a major study topic.

Many websites, such as deversuardian.com and ABCnews.com, produced bogus news. False claims, fraudulent ads, conspiracy theories, satirical news, and fake news are just a few examples of false information. These variations affect people’s lives in every way, and such stories have dominated public opinion, interest, and decision-making. Multiple authors and researchers have agreed on numerous criteria to detect false news, pertaining to the text, the reaction it receives in the form of shares and likes, and the source from which it originated. The traditional technique of using a human editor and professional journalist to detect false news cannot keep up with the volume of content generated by social media platforms. To identify it in a timely manner, new computational approaches are necessary [14]. However, professional manual verification is still required alongside a computational technique to identify false news.

Fig. 1. Fake news and all that goes along with it [5]

Many of the older approaches for detecting false news use a similar detection procedure. Preprocessing is applied to all datasets in order to eliminate noisy information. Even with a large and noisy dataset, bogus stories on Twitter have been detected with a high level of accuracy. Linguistic features, linguistic cue approaches, tweet-level characteristics, and NLP features are among the features extracted from the news data [6]. Linguistic Inquiry and Word Count (LIWC), semantic analysis, Probabilistic Context-Free Grammar, Term Frequency-Inverse Document Frequency (TF-IDF), bag-of-words, n-grams, and Doc2Vec are some of the feature extraction approaches. Once the features have been extracted, they are fed into machine learning classifiers, which are trained and then tested on how accurately they predict fresh, unlabeled news. Naive Bayes, neural networks, Support Vector Machine (SVM), Random Forest, XGBoost, Decision Tree, Linear Regression, Logistic Regression, K-Nearest Neighbors (KNN), AdaBoost, Stochastic Gradient Descent (SGD) [6, 9], and Linear SVM are the machine learning classifiers most often investigated in previous fake news detection methods.

Our major goal is to distinguish between “true” and “false” news. We need to figure out whether there is an existing platform for detecting false news depending on context and content. Following this first identification, we must conduct large-scale experiments using multiclass datasets and various machine learning algorithms. Some fixed findings are available from prior models; therefore, we must compare these results to determine which news is true and which is false. The problem is most visible on social media sites such as Instagram and Facebook pages, as well as messaging apps like WhatsApp and Skype, where fake news receives a significant boost and spreads among individuals. For multi-class article identification, there has been a massive resurgence of adjustments. Our goal is to improve on previous fake news detection, with the hope that it will be valuable for future study in this field. The suggested model aids in determining the genuineness of news. We examine and compare six different supervised machine learning approaches, including SVM, Decision Tree, K-Nearest Neighbor, and Logistic Regression. In this study, we aim to develop a model that can accurately estimate the viability of a particular research project using machine learning to detect fake news.

The rest of the paper is organized as follows. Section 2 discusses relevant work completed in the recent past. Section 3 covers the model description, data preparation techniques (stop-word removal, stemming), vectorization techniques (count vectorizer or bag of words (BoW), and term frequency-inverse document frequency (TF-IDF)), the classification procedure, and machine learning strategies. Section 4 presents the challenges of the datasets as well as those encountered in the survey. The last section summarizes the conclusions, which include data normalization, data word cloud representation, and future scope.

2 Related Work

Using machine learning, several academics have built models and algorithms for detecting bogus news. The challenge of detecting fake news has been raised only recently, but it has piqued the interest of those who have looked into it. Several methods have been developed to identify falsification in various forms of data.

Feyza et al. (2019) [1] proposed a two-stage strategy for identifying fake news on social media platforms. In the first stage, several pre-processing steps are applied to the entire data set to convert it from unstructured to structured form, producing a Document-Term matrix with the TF weighting approach. In the second stage, twenty-three supervised AI algorithms are applied to the structured data set using data mining. The findings are reported on several datasets: 1) on the BuzzFeed Political data set, the accuracy of J48 is 0.655, whereas the F-measure is 1.000; 2) on the random political news data set, the reported value is 1.000; 3) on the ISOT fake news dataset, the best result is 1.000.

Somya et al. (2020) [2] use an automated identification approach in the Chrome environment to detect fabricated stories on Facebook by analyzing particular features on Facebook using deep learning. The authors employ user profile features, news content features, and a combination of the two.

Pedro et al. (2020) [3] used five datasets to categorize and identify bogus articles using text attributes. They classify both texts and media stories in three distinct language groups: Slavic, Germanic, and Latin. NLP tools such as bag-of-words and Word2vec are also used, reaching an accuracy of 92%.

Julio C. S. et al. (2019) [4] proposed a new set of features, including lexical, psycholinguistic, semantic, and subjectivity features, and compared their performance with features from prior work. They report a true positive rate of 1 at a false positive rate of 0.4.

Xichen Zhang et al. (2019) [5] describe many kinds of features for detecting bogus publications. They also outline a complete fake news detection ecosystem, which includes components such as intervention, fact checking, alert systems, fake news detection, potential fake news prediction, suspiciousness analysis, and third-party verification to determine whether the system is trustworthy.

M. Jain et al. (2020) [6] introduced a new set of features for ML classifiers, evaluated on two datasets of political articles. They extract linguistic/stylometric features, a bag-of-words TF vector, and a BoW TF-IDF vector from the text fields of the datasets, then apply various ML approaches with bagging and boosting methods. The model reaches an accuracy of 87.26% with stylometric features, while word vector features give the highest accuracy of 89.41%.

Mohamed K. et al. (2019) [7] present a systematic survey of fake story detection. It covers the many types of data and the various extracted features used for detection, along with certain predefined datasets.

Marco L. et al. [8] develop a machine learning (ML) fake story detection system that blends news content with social signals and outperforms existing algorithms. They then implement the technique in a Messenger chatbot and validate it against a real-world application, achieving a best accuracy of 81.7%.

H. Liu et al. [9] established a framework for detecting false news from several news sources, called the Fake News Detector Based on Multi-Source Scoring (FNDMS). To assess the reliability of a single news source, content-based and author-based features are used. The Dempster-Shafer Theory (DST) model combines the veracity scores of many sources to reach a conclusion on the authenticity of a news event. The framework’s efficacy was confirmed by comparison with SVM, Logistic Regression, Random Forest, and AdaBoost. It would be preferable if this framework concentrated just on the source’s purpose.

S. Helmstetter and H. Paulheim [10] used a large and noisy dataset to detect bogus news on Twitter. Features derived from a tweet include tweet-level features, subject features, user-level features, emotion features, and text features. The suggested method was designed to detect both false news tweets and their sources, where the source is the user account from which the tweets were sent. Multiple classifiers were used to categorize the tweets, including neural networks, Random Forest, SVM, Naive Bayes, and XGBoost. The XGBoost algorithm performed exceptionally well in classifying the tweets. The suggested technique has the disadvantage of focusing on only one source, namely Twitter, and it does not target news pieces from a variety of sources.

H. Ahmed et al. [11] suggested an n-gram analysis technique for automatically detecting bogus news. The features were extracted from the text using the TF and TF-IDF feature extraction approaches. To verify the reliability of the news, six different machine learning classifiers and two feature extraction approaches were examined. The classifiers compared included SVM, KNN, SGD, Linear SVM, and Decision Tree. The suggested model achieved its highest accuracy using unigram features, TF-IDF, and the Linear SVM classifier. This approach does not consider the multi-source nature or legitimacy of the news.
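As a rough illustration of what n-gram analysis operates on, the following minimal sketch extracts word-level n-grams from a headline. The function name `word_ngrams`, the lowercasing, and the whitespace tokenization are assumptions made for this example; they are not taken from the cited work.

```python
def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as space-joined strings.

    Assumption for this sketch: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


headline = "Senate passes new budget bill"  # hypothetical example text
unigrams = word_ngrams(headline, 1)
bigrams = word_ngrams(headline, 2)
```

Unigrams correspond to n = 1 and bigrams to n = 2; the resulting n-gram counts can then be weighted with TF or TF-IDF before classification.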

Several techniques have been recommended for recognizing fake news in news reports and on social media platforms. The developers use common language processing steps to identify fake news. Harita et al. [12] use stylometric features and word vector features of the content of news items to detect fake news with an accuracy of up to 95.49%.

Yang et al. [13] employed a multimodal approach to detect false news in articles, applying a convolutional neural network to explicit and latent feature sets of text and image data to recognize fake news. Table 1 shows the limitations of some of these papers.

Table 1. Comparative analysis

| Authors | Dataset | Technology used | Results | Limitations |
|---|---|---|---|---|
| Hager et al. 2021 [24] | Dataset1, FakeNewsNet, FA-KES5, the ISOT | Grid search, hyperopt optimization techniques | Accuracy: OPCNN-FAKE = 97.84%, RNN = 86.76% | Does not use knowledge-based and fact-based approaches |
| Somya et al. 2020 [2] | More than 15 000 news contents from different Facebook users, including both fake and real news | Fake news detection approach in Chrome environment; KNN, SVM, Logistic Regression | Accuracy: KNN = 99.3%, SVM = 99.3%, Logistic regression = 99.0%, Decision tree = 99.1%, LSTM = 99.4% | Compared to machine learning algorithms, deep learning algorithms take more time for testing and analysis |
| Abdullah et al. 2020 [25] | Fake news dataset from Kaggle | Hybrid CNN-LSTM model | Accuracy: 99.7% (training), 97.5% (testing) | Does not use pre-processing |
| Feyza et al. 2019 [1] | BuzzFeed Political News Data set, Random Political News Data set, ISOT Fake News Data set | TF weighting method and Document-Term Matrix; twenty-three supervised artificial intelligence algorithms | Accuracy of 65.5% and 64.4% | Works on a smaller data set of 1500 news articles |
| K. Shu et al. 2019 [22] | FakeNewsNet | Fake news data repository FakeNewsNet; convolutional neural network | Accuracy: 92% for BuzzFeed news, 93.6% for PolitiFact news | Accuracy is lower due to a smaller number of features |
| J.C et al. 2019 [23] | None | Measures the prediction performance of proposed approaches to design an automatic detection system; SVM, KNN | Accuracy: RF = 85%, KNN = 80%, SVM = 79% | Accuracy for detecting fake accounts is very low due to the small dataset |


3 Methodology

3.1 Data Pre-processing

Data pre-processing is the preparation of raw data for use in a machine learning model. It is the first and most crucial step in creating a machine learning model. Data pre-processing consists of three phases:

Data Cleaning - This is a method used to improve the quality of data. Normalizing data, smoothing noisy data, addressing missing data, finding unnecessary observations, and correcting errors are all part of this process.

Data Transformation - In this approach, raw data is converted into a format that allows data mining to extract strategic information quickly and efficiently. Because raw data is difficult to work with, any information collected must first be pre-processed.

Data Reduction - Data reduction is a technique for shrinking the size of the original data so that it can be represented in a smaller amount of space. Data reduction strategies maintain data integrity while lowering data volume (Fig. 2).

Fig. 2. Data types in the news

Stop Words Removal: Stop words carry little meaning on their own until they are combined with other words. They are considered noise in text classification when features are extracted from text. These are the terms used in article sentences to connect ideas and provide sentence structure. Stop words include prepositions, conjunctions, and articles; examples are by, for, from, how, of, on, that, the, too, was, what, when, about, and so on. These terms are removed from the text. The total number of words in all articles and the unique word count in the listed articles are computed after preprocessing.
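A minimal sketch of stop-word removal is given below. The stop-word set contains only the examples listed in the text, so it is illustrative rather than complete, and the regular-expression tokenizer and the name `remove_stop_words` are assumptions made for this example.

```python
import re

# Illustrative stop-word list built from the examples in the text;
# real stop-word lists are considerably longer.
STOP_WORDS = {
    "by", "for", "from", "how", "of", "on", "that",
    "the", "too", "was", "what", "when", "about",
}


def remove_stop_words(text):
    """Lowercase the text, tokenize it, and drop tokens found in STOP_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = remove_stop_words("The report was about the vote on the bill")
total_words = len(cleaned)         # word count after preprocessing
unique_words = len(set(cleaned))   # unique word count after preprocessing
```

Only the content-bearing tokens remain, and the two counts at the end correspond to the total and unique word counts computed after preprocessing.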


Stemming: The tokens are then converted into base words as the next stage in text normalization. Stemming is a method of reducing a word to its root form by stripping its affixes, which reduces the number of distinct word forms in the data. We used the Porter Stemmer algorithm, for example to convert the term Pythonly to Python, because it produces good stemming results. In our data collection, words such as government, transgender, and minister were reduced to govern, transgend, and minist, respectively.

3.2 Feature Extraction and Selection

Learning from high-dimensional data is one of the challenges of text classification. Documents contain a large number of terms, phrases, and expressions, which places a high computational load on the learning process. Furthermore, irrelevant and redundant features can degrade the accuracy and performance of classifiers.

Term Frequency: Term frequency (TF), commonly implemented as a count vectorizer (CV), is a technique that measures the similarity of texts using the bag of words (BoW) technique. Each document is represented as a fixed-length vector with one entry per word in the vocabulary. Each entry holds the number of times the word appears in the document; in the binary variant, the value is one if the word exists in the document and zero otherwise. TF (BoW) thus assigns weights to words from the corpus, using the formula in Eq. (1).

Here, a = number of times the word occurs in the text,

b = total number of words in the text,

c = total number of documents,

d = number of documents with word x in it.

$$\mathrm{TF} = \frac{a}{b} \tag{1}$$
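Eq. (1) can be sketched directly in code. The example below assumes the document has already been tokenized; the function name `term_frequency` and the sample tokens are hypothetical.

```python
from collections import Counter


def term_frequency(tokens):
    """Return TF = a / b for every word in one tokenized document.

    a: number of times the word occurs in the document
    b: total number of words in the document
    """
    counts = Counter(tokens)
    b = len(tokens)
    return {word: a / b for word, a in counts.items()}


# Hypothetical tokenized document.
tf = term_frequency(["fake", "news", "spreads", "fake", "claims"])
```

Here the word "fake" occurs twice among five tokens, so its TF is 2/5, and the TF values of one document always sum to one.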

TF-IDF: Term Frequency-Inverse Document Frequency, also referred to as TF-IDF, is a widely used approach for determining the importance of a word in a document through a weighted numerical representation of the text. It is a frequently used feature extraction approach in Natural Language Processing (NLP). One of the most important properties of IDF is that it down-weights frequently occurring terms while scaling up rare ones. For example, words like “the” and “then” appear frequently in text, and if we use TF alone, such words will dominate the frequency counts. The IDF, on the other hand, reduces the impact of these words. The formulas are given in Eqs. (2) and (3), where c is the total number of documents and d is the number of documents containing the word.

$$\mathrm{IDF} = \log\left(\frac{c}{d}\right) \tag{2}$$

$$\text{TF-IDF} = \mathrm{TF} \times \mathrm{IDF} \tag{3}$$
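The following sketch combines Eqs. (1) to (3) on a toy corpus. It assumes the natural logarithm, since the text does not state the base, and it applies no smoothing; the function name `tf_idf` and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: TF * IDF} dictionary per tokenized document.

    c: total number of documents
    d: number of documents containing the word (df[word] below)
    Assumption: natural logarithm, no smoothing.
    """
    c = len(docs)
    # Document frequency: count each word once per document.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        b = len(doc)
        weights.append(
            {word: (a / b) * math.log(c / df[word]) for word, a in counts.items()}
        )
    return weights


docs = [
    ["the", "senate", "vote"],
    ["the", "alien", "vote"],
    ["the", "alien", "hoax"],
]
weights = tf_idf(docs)
```

Because "the" appears in every document, its IDF is log(3/3) = 0 and its weight vanishes, while the rarer word "senate" receives the largest weight in the first document.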

Content-Based Feature: Content-based features are derived from the content of the news item itself, and the ML approach makes decisions based on the similarity of these features. The same idea is frequently employed in recommender systems, which are algorithms that advertise or recommend items to people based on information gathered about them. Examples include the number of sentences, words, question marks, exclamation marks, capital letters, punctuation marks, and negations (no, not), as well as frequently used words.
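Several of the content-based counts listed above can be computed with a few lines of code. This is only a sketch: the sentence splitter is a simple punctuation-based heuristic, and the name `content_features` and the sample text are assumptions made for this example.

```python
import re

NEGATIONS = {"no", "not"}


def content_features(text):
    """Count simple surface features of a news text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Heuristic: a sentence ends at one or more of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "question_marks": text.count("?"),
        "exclamations": text.count("!"),
        "capital_letters": sum(ch.isupper() for ch in text),
        "negations": sum(word.lower() in NEGATIONS for word in words),
    }


features = content_features("SHOCKING! Is this real? No, it is not.")
```

The resulting dictionary can be appended to a TF or TF-IDF vector so that a classifier sees both vocabulary and style cues.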


Context-Based Feature: Contextual information aids in maximizing the consumers’ understanding of the scene photographs on the web. Such details can help distinguish between ambiguous scene photos with intra-class variance and inter-class similarities. Examples include User_Name, User_Age, Registration Date, Registration Time, Follower, Following, and Number of Posts.

3.3 Machine Learning Strategies for Detecting Fake News

Machine learning is used to create models that make predictions based on past or historical data. It is a branch of AI that learns from previous experience and can anticipate our needs. It is typically used to store and then analyze massive amounts of data, and nowadays it is also used to identify fraud. There are many machine learning approaches, but we focus on supervised machine learning [15–22].

Naive Bayes: This is a basic, straightforward, and effective classification procedure for predicting the class of an instance in a dataset. This machine learning classifier is used to make quick predictions. It predicts based on the probability of an object belonging to each class. It performs well in multi-class prediction. It is mostly used for text categorization, where it can handle a significant quantity of data, and it produces good results on NLP tasks such as sentiment analysis.

It entails completing the following steps:

• First, make a frequency table based on the terms.

• Calculate the likelihood for each class based on the frequency table.

• Calculate the posterior probability for each class.

• The class with the greatest posterior probability is the prediction of the classifier.
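The steps above can be sketched as a small multinomial Naive Bayes classifier. Two details are assumptions of this sketch rather than statements from the text: add-one (Laplace) smoothing is used so that unseen word and class pairs do not give zero probability, and log-probabilities are summed instead of multiplying raw probabilities. The function names and the four training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Step 1: build per-class word frequency tables and class counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Steps 2 to 4: likelihoods, log-posterior per class, then the best class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokens:
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothed likelihood (assumption of this sketch).
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


docs = [
    ["shocking", "miracle", "cure"],
    ["miracle", "secret", "exposed"],
    ["senate", "passes", "budget"],
    ["court", "reviews", "budget"],
]
labels = ["fake", "fake", "real", "real"]
model = train_naive_bayes(docs, labels)
prediction = predict_naive_bayes(model, ["miracle", "cure"])
```

With this toy data, a headline containing "miracle" and "cure" scores higher under the "fake" class because those words only occur in the fake training documents.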

K-Nearest Neighbours (KNN): This is a supervised machine learning method. KNN is mostly used for classification problems, such as detecting bogus news and other anomalies; it is simple to use and takes little time to set up. To classify a test point, it finds the K training points at the shortest distance from it using the Euclidean distance formula, that is, the square root of the sum of the squared differences between a training point and the test point across all input features.
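A minimal sketch of KNN classification with Euclidean distance follows. The two-dimensional feature vectors, the labels, the default k = 3, and the function names are hypothetical and chosen only to keep the example small.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Square root of the sum of squared differences across all features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors (for example, two TF-IDF weights per article).
train_points = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1)]
train_labels = ["fake", "fake", "fake", "real", "real"]
prediction = knn_predict(train_points, train_labels, (0.85, 0.75), k=3)
```

K is the number of neighbours that vote, so it is a parameter of the method rather than the distance itself.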

Random Forest (RF): It is a supervised learning algorithm used here for text classification. The “forest” is a collection of largely uncorrelated decision trees whose outputs are combined, which reduces variance, helps prevent overfitting, and provides more accurate predictions.

Extra Tree Classifier: It is a form of ensemble learning algorithm that outputs a classification result by aggregating several de-correlated decision trees collected in a “forest”. It is quite similar to a Random Forest classifier, and the only difference is in how the forest’s decision trees are constructed. We did not use it as a classifier in this case; instead, we used it as a feature selection approach to choose the most suitable features, which we then used in the classifiers to improve results and performance.


Support Vector Machine: This algorithm’s purpose is to find the best line or decision boundary that divides n-dimensional space into classes so that new data points can be readily placed in the correct category in the future. This optimal decision boundary is called a hyperplane. Machine learning classification is shown in Fig. 3.

Logistic Regression: It is a method for predicting a categorical dependent variable from a set of independent variables. The result must therefore be a discrete or categorical value. It can be Yes or No, 0 or 1, true or false, and so on, but instead of giving exact values like 0 and 1, the model delivers probabilistic values that lie between 0 and 1.
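The way logistic regression turns a weighted sum of features into a probability between 0 and 1 can be sketched as follows. The weights, bias, feature values, and the 0.5 decision threshold are hypothetical; in practice the weights are learned from labeled data.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(prob, threshold=0.5):
    """Turn the probability into a discrete label (1 or 0)."""
    return 1 if prob >= threshold else 0


# Hypothetical, already-trained parameters and one feature vector.
weights, bias = [2.0, -1.0], -0.5
p = predict_proba(weights, bias, [1.0, 0.5])
label = classify(p)
```

The model itself outputs the probability `p`; the categorical answer only appears after the threshold is applied.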

Fig. 3. Machine learning architecture

Convolutional Neural Networks: A CNN is a deep learning method that takes an image as input and assigns learnable weights and biases to different sections of the picture so that they can be distinguished from one another. Once these sections can be differentiated, the Convolutional Neural Network model can handle a variety of tasks in the image processing domain, such as image recognition, image classification, object and face detection, and so on.

Recurrent Neural Networks: Recurrent Neural Networks (RNNs) are neural networks that process temporal or sequential data. To make the best predictions, RNNs use other data points in the sequence. They achieve this by taking in input and influencing the output by reusing the activations of previous nodes or later nodes in the sequence. The deep learning approach is shown in Fig. 4.
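The reuse of earlier activations can be illustrated with a single-unit recurrent step. This sketch assumes a plain (Elman-style) recurrence with a tanh activation and scalar weights, which is a simplification chosen for readability; the parameter values and the name `rnn_forward` are hypothetical.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent layer over a sequence.

    Each hidden state depends on the current input and on the previous
    hidden state: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Hypothetical weights; only the first time step has a non-zero input.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

Although the second and third inputs are zero, their hidden states are still non-zero, because the activation from the first step is carried forward through `w_h`.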


Fig. 4. Deep learning architecture

4 Challenges

Deep Learning: Deep learning technology can deal with any type of data, including text, photos, video, and audio. It can be adapted to new types of problems and avoids feature engineering, which is the most time-consuming yet essential aspect of a machine learning framework. Deep learning technologies, on the other hand, have the disadvantage of requiring a significant amount of time for model training on a relatively large amount of data and of not providing interpretations of what the model has actually learned, which makes the model almost a black box.

Multimedia False Information Detection: Detecting fabricated and modified sounds, images, and videos requires data analytics, computer vision, and signal processing techniques. Machine learning and deep learning are essential for discovering signature traits of modified and generated multimedia.

Unsupervised Models: The majority of the current work is done utilizing supervised

learning techniques. Due to the large amount of unlabelled data available on social media,

unsupervised models must be constructed.

Datasets: Because most research is done on customised datasets, the production of

persuasive gold standard datasets in this sector is critical. A benchmark comparison

between multiple algorithms is impossible due to the absence of publicly available

large-scale datasets.

Multilingual Platform: The majority of the research focuses on linguistic characteristics in English-language texts. Other popular and regional languages are not yet considered (multilingual platform for fake news identification).

Early Detection: Detecting fake news in its early phases, before it spreads widely, is a difficult effort that must be completed quickly in order to mitigate and intervene. It is nearly impossible to change people’s minds after bogus news has been widely accepted and trusted.


Cross Domain Analysis: The majority of present systems concentrate solely on the

method of deception detection, whether in the form of content, dissemination, style,

or other factors. Cross domain analysis, which takes into account a variety of factors

such as topic, website, language, photos, and URL, aids in the identiﬁcation of unique

non-varying traits and allows for the early detection of fake content.

5 Conclusion

In recent years, people have found it more difficult to obtain accurate and reliable information due to the growing volume of information available on social media. We cover a variety of tools and methodologies for spotting fake news. In this work, we identified a number of machine learning approaches for detecting false news, including SVM, LR, NB, KNN, and RF. Across the datasets used by the authors, linear SVM performed better. To detect false news, several researchers employed deep learning models such as CNN, RNN, and hybrid models. In the future, we will strive to construct a new real-time dataset for false news and use graph convolutional neural networks to detect false news.

References

1. Ozbay, F.A., Alatas, B.: Fake news detection within online social media using supervised artificial intelligence algorithms. Phys. A (2019). https://doi.org/10.1016/j.physa.2019.123174

2. Sahoo, S.R., Gupta, B.B.: Multiple features-based approach for automatic fake news detection on social networks using deep learning. Appl. Soft Comput. 106983 (2020). https://doi.org/10.1016/j.asoc.2020.106983

3. Arruda Faustini, P.H., Covões, T.F.: Fake news detection in multiple platforms and languages. Expert Syst. Appl. 113503 (2020). https://doi.org/10.1016/j.eswa.2020.113503

4. Reis, J.C.S., Correia, A., Murai, F., Veloso, A., Benevenuto, F., Cambria, E.: Supervised learning for fake news detection. IEEE Intell. Syst. 34(2), 76–81 (2019). https://doi.org/10.1109/mis.2019.2899143

5. Zhang, X., Ghorbani, A.A.: An overview of online fake news: characterization, detection, and discussion. Inf. Process. Manage. 57, 102025 (2019). https://doi.org/10.1016/j.ipm.2019.03.004

6. Kumar Jain, M., Gopalani, D., Kumar Meena, Y., Kumar, R.: Machine learning based fake news detection using linguistic features and word vector features. In: 2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6 (2020). https://doi.org/10.1109/UPCON50219.2020.9376576

7. Elhadad, M.K., Fun Li, K., Gebali, F.: Fake news detection on social media: a systematic survey. In: 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2019). https://doi.org/10.1109/pacrim47961.2019.8985

8. Della Vedova, M.L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., de Alfaro, L.: Automatic online fake news detection combining content and social signals. In: 2018 22nd Conference of Open Innovations Association (FRUCT) (2018). https://doi.org/10.23919/fruct.2018.846830


9. Liu, H., Wang, L., Han, X., Zhang, W., He, X.: Detecting fake news on social media: a multi-source scoring framework. In: 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), pp. 524–531. IEEE, April 2020

10. Helmstetter, S., Paulheim, H.: Weakly supervised learning for fake news detection on Twitter. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 274–277. IEEE (2018)

11. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using n-gram analysis and machine learning techniques. In: Traore, I., Woungang, I., Awad, A. (eds.) ISDDC 2017. LNCS, vol. 10618, pp. 127–138. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69155-8_9

12. Reddy, H., Raj, N., Gala, M., Basava, A.: Text-mining-based fake news detection using ensemble methods. Int. J. Autom. Comput. 1–12 (2020)

13. Yang, Y., Zheng, L., Zhang, J., Cui, Q., Li, Z., Yu, P.S.: TI-CNN: convolutional neural networks for fake news detection. arXiv preprint arXiv:1806.00749 (2018)

14. Meel, P., Vishwakarma, D.K.: Fake news, rumor, information pollution in social media and web: a contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst. Appl. 153, 112986 (2019)

15. Perez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. arXiv preprint arXiv:1708.07104 (2017)

16. Kumar, T., Mahrishi, M., Meena, G.: A comprehensive review of recent automatic speech summarization and keyword identification techniques. In: Fernandes, S.L., Sharma, T.K. (eds.) Artificial Intelligence in Industrial Applications. LAIS, vol. 25, pp. 111–126. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-85383-9_8

17. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002). https://doi.org/10.3115/1118108.1118117

18. Rubin, V.L., Chen, Y., Conroy, N.J.: Deception detection for news: three types of fakes. Proc. Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015). https://doi.org/10.1002/pra2.2015.145052010083

19. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017a). Shu, K., Wang, S., Liu, H.: Exploiting tri-relationship for fake news detection. arXiv:1712.07709 (2017b)

20. Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29(1), 24–54 (2010). https://doi.org/10.1177/0261927X09351676

21. Mahrishi, M., et al.: Video index point detection and extraction framework using custom YoloV4 darknet object detection model. IEEE Access (2021). https://doi.org/10.1109/ACCESS.2021.3118048

22. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: FakeNewsNet: a data repository with news content, social context and dynamic information for studying fake news on social media. arXiv preprint arXiv:1809.01286 (2018)

23. Reis, J.C., Correia, A., Murai, F., Veloso, A., Benevenuto, F., Cambria, E.: Supervised learning for fake news detection. IEEE Intell. Syst. 34(2), 76–81 (2019)

24. Saleh, H., Alharbi, A., Alsamhi, S.H.: OPCNN-FAKE: optimized convolutional neural network for fake news detection. IEEE Access 9, 129471–129489 (2021). https://doi.org/10.1109/ACCESS.2021.3112806

25. Abdullah, A., Awan, M., Shehzad, M., Ashraf, M.: Fake news classification bimodal using convolutional neural network and long short-term memory. Int. J. Emerg. Technol. Learn. 11, 209–212 (2020)
