Automatic Identification of Fake News Using Deep Learning

October 2019

DOI:10.1109/SNAMS.2019.8931873

Conference: 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)

Authors:

Ethar Qawasmeh

The Ohio State University

Mais Ali Tawalbeh

Jordan University of Science and Technology

Malak A. Abdullah

Jordan University of Science and Technology

Automatic Identification of Fake News Using Deep Learning

Ethar Qawasmeh, Mais Tawalbeh, Malak Abdullah

College of Computer and Information Technology

Jordan University of Science and Technology

Email: {eaqawasmeh16,matawalbeh18}@cit.just.edu.jo,mabdullah@just.edu.jo

Abstract: The rapid development of computing trends, wireless communications, and the smart devices industry has contributed to the spread of the internet. People can access internet services and applications from anywhere in the world at any time. There is no doubt that these technological advances have made our lives easier and saved us time and effort. On the other hand, the internet and its applications, including online platforms, are also misused. For example, online platforms have been involved in spreading fake news all over the world to serve certain purposes (political, economic, or social). Detecting fake news is considered one of the hard challenges for the existing content-based analysis of traditional methods. Recently, neural network models have outperformed traditional machine learning methods due to their outstanding feature extraction ability. Still, there is a lack of research on detecting fake news in breaking news and time-critical events. Therefore, in this paper, we investigate the automatic identification of fake news over online communication platforms and propose an approach based on modern machine learning techniques. The proposed model is a bidirectional LSTM concatenated model that achieves 85.3% accuracy on the FNC-1 dataset.

Index Terms: fake news detection, social media, machine learning, deep learning, Fake News Challenge

I. INTRODUCTION

The recent trends and advances in communication and mobile technologies, along with the spread of the Internet, have simplified access to news all over the world. Moreover, the uncontrolled growth of web-based applications and the race among international organizations to share and spread news-related data in the field of social communication have affected the reliability of the news [ ]. Recently, social media has become one of the main sources of information. The main factors behind this are its low cost, speed of access, ease of use, and availability on all digital devices, including desktops, smartphones, iPods, and others.

The "fake news" concept refers to the intentional dissemination of false information on social media that aims to confuse and mislead the reader to achieve economic or political agendas. In addition, the diverse and growing number of industry players in news writing and distribution has led to news articles whose credibility is difficult to judge [2].

The massive growth of fake news on social media has motivated researchers in academic institutions and industry to seek solutions that limit this phenomenon [ ]. The spread of fake news that preceded the 2016 United States presidential election is considered a controversial issue that affected public opinion [ ]. The risk of catastrophic impacts from the rapid spread of false news over social network sites is increasing dramatically. Spreading fake news is therefore a worldwide problem, and many countries are criminalizing the creation and distribution of misinformation online¹.

Automatically detecting fake news is considered a challenge for the existing content-based analysis methods. There is an urgent need to investigate machine learning approaches for detecting fake news. Several models and approaches have been used to identify fake news, including traditional learning [ ] and deep learning models [ ]. These approaches fall into three main categories: content, social context, and propagation [ ]. The existing neural network models have outperformed the traditional ones due to their outstanding feature extraction ability, but these methods still struggle to detect fake news in newly emerged news and time-critical events [10].

The remainder of this paper is organized as follows. Section II gives a brief description of existing work on detecting fake news. Section III provides background on the definition and detection of fake news. Section IV proposes a system architecture to determine the presence of fake news in an article. Section V presents the evaluation and the results. Finally, Section VI concludes with future directions for this research.

II. RELATED WORK

A significant number of studies have been conducted on fake news detection in the context of machine learning. Pérez-Rosas et al. [11] constructed two new datasets that covered seven news domains. One dataset was collected by crowd-sourcing and covered six news domains. The second dataset was collected directly from the web and covered one domain. They conducted several exploratory analyses to identify linguistic features that were present in fake news content and how they differed from real news. They built a fake news detector using a linear SVM classifier and five-fold cross-validation based on a combination of lexical, syntactic, and semantic information as linguistic features. Their model achieved accuracy of up to 78%.

¹ https://www.poynter.org/ifcn/anti-misinformation-actions/

Davis and Proctor [12] proposed a bag-of-words model followed by a three-layer multi-layer perceptron (BoW MLP). This model achieved the best results compared with the other three neural architectures in their study. They used the Fake News Challenge dataset (FNC-1), which was presented in a public competition that aimed to find automatic methods for detecting fake news. The objective was to classify headline-text pairs as unrelated, agreeing, disagreeing, or discussing. They achieved 93% accuracy.

In addition, Miller and Oswalt [13] used the same dataset to detect fake news using a neural network model with an attention mechanism. They built a network architecture using multiple bidirectional LSTMs and an attention mechanism to predict the entailment of the articles to their paired headlines. The best result was achieved by the BiLSTM + MLP (multilayer perceptron) with 57% accuracy. Other trials with attention models reached about 55% accuracy.

Advances in artificial intelligence have made it easy to create fake visual content, such as images and videos, that is hard to distinguish from real content. This visual content can easily be used to accelerate the spread of fake news through social media. In 2018, Wang et al. [14] proposed a framework named Event Adversarial Neural Network (EANN), which aimed to derive event-invariant features so that it can identify fake news based on multi-modal features and learn transferable features. To evaluate their model, they collected two real social media datasets from Twitter and Weibo. The experimental results showed that the proposed EANN model outperformed the state-of-the-art methods at that time.

In 2018, Zhang et al. [15] built a deep diffusive network model named FAKEDETECTOR that relies on a set of explicit and latent features extracted from the textual information. The proposed model aimed to detect the labels of news articles, creators, and subjects simultaneously. Furthermore, they performed extensive experiments on a real-world fake news dataset from PolitiFact to compare FAKEDETECTOR with several state-of-the-art models. They concluded that the proposed model had outstanding performance in identifying fake news articles, creators, and subjects in the network.

Recently, in 2019, Monti et al. [16] proposed a fake news detection model based on geometric deep learning on a Twitter social network. They collected news articles from the archives of popular fact-checking organizations such as Snopes, PolitiFact, and Buzzfeed. They filtered out all data that did not contain at least one URL reference on Twitter. Their experiments showed that social network features, such as structure and propagation, allowed the fake news detection model to achieve high accuracy (92.7% ROC AUC).

III. FAKE NEWS

The broadcasting of news has changed drastically, moving from newsprint, telephone, radio, and television to the internet through online news and social media. In this section, we discuss the main definitions of fake news. We also explore fake news detection methods, especially stance detection, which is used in this paper.

A. Problem definition

In the past, news was passed by word of mouth between two or more parties. Then, printed newspapers circulated around the world as trusted sources that followed strict codes of practice. Before the birth of the internet, we got our news from traditional sources produced by professionals in the form of newspapers, radio, or television. However, the internet has enabled a whole new way to publish and broadcast information and news with less regulation and fewer editorial standards. Three years ago, "fake news" became a common concept that has since grown in popularity. Fake news is defined as news that intentionally includes false information and aims to mislead the reader [ ]. This definition has two main characteristics: authenticity and intent. First, fake news includes false information that can be verified as such. Second, fake news is created with a dishonest intention to mislead consumers. This definition of fake news has been widely adopted in recent research [18, 19].

B. Stance Detection

The rapid growth in the number of players in the news distribution field, especially on social media, has led to an increase in the number of news items whose credibility is hard to judge. A common way to highlight a specific news item is to choose a headline that grabs the attention of readers. Unfortunately, the news content does not always reflect the headline. Researchers have studied this problem by analyzing several datasets that focus on the headline stance detection task, which identifies whether a news headline is, in reality, related or unrelated to the corresponding news article. A further step determines whether the news article agrees with, disagrees with, or discusses the headline. The dataset used in this paper was provided by the Fake News Challenge (FNC-Stage 1)². This challenge was organized in 2016 by a group of academics and volunteers with the goal of addressing the fake news problem. FNC-1's experimental setup and the top three results are discussed by a group of researchers in [ ]. The same data was studied and analyzed using different methods by different researchers [ ]. In addition, an attention model was applied to the same data in 2017 by Chopra et al. [22], who obtained 51% accuracy. Besides the FNC-1 dataset, other datasets have been proposed for the same purpose of detecting fake news [23, 24].

IV. APPROACH

This section describes the dataset used. In addition, we discuss the data cleaning process and the feature extraction method used in our approach. Then, we present the model architectures.

² Fake News Challenge, http://www.fakenewschallenge.org/

A. Dataset

We used a dataset that serves our goal of addressing the fake news issue, distinguishing untrustworthy news, and improving automatic detection tools. The dataset contains news articles that carry a class label indicating whether the article is trustworthy. The dataset is derived from the Emergent dataset created by Craig Silverman [ ] and was used in February 2017 for the Fake News Challenge Stage 1 (FNC-1). The dataset can be downloaded from the challenge's GitHub page³. It is provided as two CSV files (train_bodies and train_stances). Table I provides the dataset format.

TABLE I: Dataset format (FNC-1 dataset)

train_bodies (1683 articles)
Field           Description
Article Body    The body text of the article
Body ID         The article ID

train_stances (49972 headlines)
Field           Description
Body ID         The article ID
Headline        The article headline
Stance          The label
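
The two files in Table I share the Body ID field, so each headline row can be paired with its article body. The following is a minimal sketch of that pairing using only the Python standard library. The column names (`Body ID`, `articleBody`, `Headline`, `Stance`) and the sample rows are assumptions made for illustration; the real files may use different headers.

```python
import csv
import io

# Tiny in-memory stand-ins for train_bodies.csv and train_stances.csv.
# The column names and rows below are assumptions, not the real data.
BODIES_CSV = '''Body ID,articleBody
0,"A small meteorite crashed near the capital, officials said."
1,"The company denied reports that it plans to close stores."
'''

STANCES_CSV = '''Headline,Body ID,Stance
Meteorite lands near capital,0,agree
Company to close all stores,1,disagree
Company comments on store rumours,1,discuss
Meteorite lands near capital,1,unrelated
'''


def pair_headlines_with_bodies(bodies_file, stances_file):
    """Return one (headline, body, stance) tuple per row of the stances file."""
    bodies = {row["Body ID"]: row["articleBody"] for row in csv.DictReader(bodies_file)}
    pairs = []
    for row in csv.DictReader(stances_file):
        # Several headlines may point at the same body, and the same
        # headline text may appear with several bodies.
        pairs.append((row["Headline"], bodies[row["Body ID"]], row["Stance"]))
    return pairs


pairs = pair_headlines_with_bodies(io.StringIO(BODIES_CSV), io.StringIO(STANCES_CSV))
for headline, body, stance in pairs:
    print(f"{stance:<10} {headline!r} -> {body[:30]!r}")
```

With the real files, the same function could be called on two opened file handles instead of the `io.StringIO` stand-ins, which would yield one pair per row of train_stances.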

Because the number of articles differs from the number of headlines in the dataset files, we used a many-to-many mapping between them. As a result, we obtained 49972 pairs of headline and body text, each with a corresponding class label. Table II and Fig. 1 show the description and distribution of the stance classes in the train_stances file. We assumed that all features in the dataset are important, so we did not apply a cleaning process (removing stop words, etc.). We split the dataset into training, validation, and test sets with 60%, 20%, and 20% of the data, respectively. The test data was separated before training and was used only for the final model evaluation.
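
The 60/20/20 split described above can be sketched as follows. This is an illustration under stated assumptions: the paper does not say whether the pairs were shuffled or stratified before splitting, so the shuffle, the seed, and the function name `split_60_20_20` are hypothetical.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle a copy of items and split it into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: random shuffle before splitting
    n = len(shuffled)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 20%
    return train, val, test


# Integer indices stand in for the 49972 headline-body pairs.
train, val, test = split_60_20_20(range(49972))
print(len(train), len(val), len(test))  # 29983 9994 9995
```

Holding out the test portion first, as the text describes, keeps it untouched by any tuning done on the validation set.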

TABLE II: Stance classes description in train_stances file

Label       Description
Agree       The body text agrees with the headline.
Disagree    The body text disagrees with the headline.
Discuss     The body text discusses the same topic as the headline, but does not take a position.
Unrelated   The body text discusses a different topic than the headline.

B. Feature extraction

To extract features from the text data, we transformed the inputs into a vector space using the pre-trained GoogleNews⁴ word vector model, whose 300-dimensional vectors were trained on about 100 billion running words of Google News text.

It is worth mentioning that we also used a Gensim word embedding model for word representation, to which we applied "dropping stop words" as a data cleaning step. When the word embeddings were used in our final classification architectures, the Word2Vec approach with Google's pre-trained word embedding model achieved the best results. Table III gives a simple comparison between these approaches.

³ Fake News Challenge dataset, https://github.com/FakeNewsChallenge/fnc-1

⁴ word2vec-GoogleNews-vectors, https://github.com/mmihaltz/word2vec-GoogleNews-vectors

Fig. 1: Stance classes distribution rate in train_stances file

TABLE III: Pre-trained word embedding models

Model         Description                                                        Unique Words                  Vector Length
Gensim        Trained on our dataset, which has roughly 49972 records            23977 words                   100-300 features
Google News   Trained on roughly 100 billion words from a Google News dataset    3 million words and phrases   300 features

The term word embedding refers to mapping words into vectors of real numbers. When these vectors are only initial weights, the weight values are improved in the embedding layer during training, which takes time. Using Word2Vec allows us to use a pre-trained word embedding model, so the weight values are learned before the embedding layer is built, which reduces the training time of the final classification model [26].
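
To make the idea of initializing an embedding layer from pre-trained vectors concrete, the sketch below builds an embedding matrix from a word-to-vector table. Everything in it is illustrative: the toy 4-dimensional vectors stand in for the 300-dimensional pre-trained ones, and the helper name `build_embedding_matrix`, the padding row, and the handling of out-of-vocabulary words are assumptions rather than details taken from the paper.

```python
import random

EMBED_DIM = 4  # toy size; the paper uses 300-dimensional vectors

# Toy stand-in for a pre-trained word-vector table (word -> vector).
pretrained = {
    "fake": [0.1, 0.3, -0.2, 0.5],
    "news": [0.4, -0.1, 0.0, 0.2],
    "report": [-0.3, 0.2, 0.1, 0.0],
}


def build_embedding_matrix(vocabulary, vectors, dim, seed=0):
    """Return (word_to_index, matrix); row 0 is reserved for padding."""
    rng = random.Random(seed)
    word_to_index = {}
    matrix = [[0.0] * dim]  # index 0: padding vector
    for word in vocabulary:
        word_to_index[word] = len(matrix)
        if word in vectors:
            matrix.append(list(vectors[word]))  # copy the pre-trained weights
        else:
            # Assumption: out-of-vocabulary words get small random vectors.
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return word_to_index, matrix


word_to_index, matrix = build_embedding_matrix(
    ["fake", "news", "hoax"], pretrained, EMBED_DIM
)
sentence = ["fake", "news"]
embedded = [matrix[word_to_index[w]] for w in sentence]
print(embedded)
```

Because the rows are filled in before training starts, the classifier can begin from meaningful word representations instead of learning them from scratch.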

C. Model architectures

After running tens of experiments across different neural network architectures and hyper-parameters, two models showed satisfying results: the FND_Bidirectional LSTM concatenated model and the FND_Multi-head LSTM model, both described in detail below (FND refers to Fake News Detection).

FND_Bidirectional LSTM concatenated model: The headline and article texts are passed through two separate embedding layers that use Google's pre-trained model. The two resulting embeddings are concatenated to feed the model, as shown in Figure 2. The model consists of two CNN layers with 32 and 64 filters, respectively. The CNN layers are followed by max pooling to avoid over-fitting⁵ before the result is passed through a bidirectional LSTM layer with 100 memory units.

⁵ Pooling, http://ufldl.stanford.edu/tutorial/supervised/Pooling/

Fig. 2: Model1 Diagram; FND_Bidirectional LSTM concatenated

The output of the bidirectional LSTM is passed through three dense layers (512, 128, and 4 units) separated by one dropout layer. With a softmax activation function in the output layer, the result is a stance classification (unrelated, agree, disagree, or discuss). During training, the loss function was sparse_categorical_crossentropy⁶ with the rmsprop optimizer.
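
The layer sequence described for Model1 could be sketched in Keras roughly as follows. This is not the authors' code. The filter counts, LSTM units, dense sizes, loss, and optimizer come from the text; the kernel size, pool size, activations, dropout rate and position, sequence lengths, frozen embeddings, and the function name `build_fnd_bilstm_concat` are assumptions chosen only to make the sketch complete.

```python
def build_fnd_bilstm_concat(embedding_matrix, headline_len=30, body_len=300):
    """Build a Keras model that follows the Model1 description.

    embedding_matrix: 2-D NumPy array of shape (vocab_size, embed_dim)
    holding the pre-trained word vectors.
    """
    # Imported here so this file can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size, embed_dim = embedding_matrix.shape

    def embed(name):
        return layers.Embedding(
            vocab_size,
            embed_dim,
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            trainable=False,  # assumption: pre-trained vectors stay frozen
            name=name,
        )

    headline_in = keras.Input(shape=(headline_len,), dtype="int32", name="headline")
    body_in = keras.Input(shape=(body_len,), dtype="int32", name="body")

    # Two separate embedding layers, concatenated along the time axis.
    x = layers.Concatenate(axis=1)(
        [embed("headline_embedding")(headline_in), embed("body_embedding")(body_in)]
    )
    # Two convolutional layers with 32 and 64 filters (kernel size assumed).
    x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Bidirectional(layers.LSTM(100))(x)
    # Three dense layers (512, 128, 4) with one dropout layer (position assumed).
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(4, activation="softmax")(x)

    model = keras.Model(inputs=[headline_in, body_in], outputs=out)
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer="rmsprop",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_fnd_bilstm_concat(matrix).summary()` with a NumPy embedding matrix would print the resulting layer shapes, which is a quick way to check the sketch against Figure 2.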

FND_Multi-head LSTM model: In this model, the headline and article texts are merged before being passed to the input layer. Then, the embedding layer using Google's pre-trained model is applied. The resulting embedding is passed through two CNN layers with 32 and 64 filters, respectively, as shown in Figure 3. This step is followed by max pooling to avoid over-fitting. Then, the output is passed through five Multi-Head LSTM layers⁷ with 150 memory units. A flatten layer is added, and the result is passed through one dense layer with 4 units and a softmax activation function to classify the stance into four classes (unrelated, agree, disagree, or discuss). The loss function during training was sparse_categorical_crossentropy with the Adam optimizer [27].
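
Both models end in a 4-unit softmax layer trained with sparse categorical cross-entropy. For a single sample, that computation can be sketched in plain Python as below. The class order and the example scores are hypothetical; only the softmax and the loss definition are standard.

```python
import math

CLASSES = ["unrelated", "agree", "disagree", "discuss"]  # index order is an assumption


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_index, probabilities):
    """Negative log-probability assigned to the true class (an integer label)."""
    return -math.log(probabilities[true_index])


logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical outputs of the 4-unit dense layer
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
loss = sparse_categorical_crossentropy(CLASSES.index("unrelated"), probs)
print(predicted, round(loss, 4))
```

The "sparse" variant takes the true class as an integer index, so the stance labels do not need to be one-hot encoded.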

Finally, we note that the number of layers and units was chosen carefully after several experiments.

⁶ Losses, https://keras.io/losses/

⁷ keras-multi-head, https://pypi.org/project/keras-multi-head/

Fig. 3: Model2 Diagram; FND_Multi-head LSTM

V. RESULTS

A. Models Evaluation

For our experiments, we used Colab, the online Python environment provided by Google⁸. Tuning the hyper-parameters to select the best values and trying different loss and optimizer functions were very challenging in this project.

Table IV reports a set of statistics (precision, recall, F1 score, and accuracy) for our two models. Model1 (FND_Bidirectional LSTM concatenated) has the best overall results, with the highest recall, F1 score, and accuracy. Moreover, Figures 4a and 4b show the accuracy and loss for the training and validation sets over time for Model1. The confusion matrix for Model1 is shown in Table V.
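
The F1 scores in Table IV follow from the reported precision and recall through the usual harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them; the function name `f1_score` is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in Table IV.
reported = {
    "Model1": (0.7711, 0.7561),
    "Model2": (0.8451, 0.5079),
}
for name, (p, r) in reported.items():
    print(name, round(f1_score(p, r), 4))
# Model1 0.7635
# Model2 0.6345
```

Model2's higher precision is offset by its much lower recall, which is why its F1 score falls below that of Model1.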

B. Discussion

As mentioned earlier, we used a dataset provided by the FNC-1 challenge. The article in [ ] provided a deep analysis of the three top-performing systems in the challenge⁹. Other researchers invested their time in developing new fake news detection techniques based on the challenge rules and score computation [ ]. We used different split points for the training, validation, and test sets. In our experiments, we divided the dataset into training, validation, and test sets with 60%, 20%, and 20% of the data, respectively. In [ ], the dataset was divided into 80% training and 20% test data. Furthermore, in [ ], the complete training set consisted of 94% of the dataset, and the remainder was left as a development set for performance evaluation. As a result, it is hard to compare the performance of our techniques with existing work, as there is no standard test data shared with the other studies.

⁸ Google Colab, https://colab.research.google.com/

⁹ http://www.fakenewschallenge.org/#fnc1results

TABLE IV: Models statistics

Models                                         Precision   Recall   F1_score   Accuracy
Model1: FND_Bidirectional LSTM concatenated    0.7711      0.7561   0.7635     0.853
Model2: FND_Multi-head LSTM                    0.8451      0.5079   0.6345     0.8294

Fig. 4: Model1 results: (a) Accuracy; and (b) Loss

TABLE V: Model1 Confusion Matrix

            Unrelated   Disagree   Agree   Discuss
Unrelated   6711        0          100     502
Disagree    67          1          107     9
Agree       255         1          406     43
Discuss     322         0          53      1408
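
As a cross-check, the overall accuracy can be recomputed from the counts in Table V. The sketch below also prints the share of each row that lies on the diagonal. Reading rows as the true class is an assumption, since the paper does not state the orientation of the matrix; the accuracy itself does not depend on it.

```python
LABELS = ["Unrelated", "Disagree", "Agree", "Discuss"]

# Counts copied from Table V. Treating rows as the true class is an
# assumption; the paper does not state the orientation.
CONFUSION = [
    [6711, 0, 100, 502],
    [67, 1, 107, 9],
    [255, 1, 406, 43],
    [322, 0, 53, 1408],
]

total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(len(LABELS)))
accuracy = correct / total
print(f"accuracy = {correct}/{total} = {accuracy:.4f}")

# Share of each row on the diagonal (per-class recall if rows are true labels).
for i, label in enumerate(LABELS):
    row = CONFUSION[i]
    print(f"{label:<10} {row[i] / sum(row):.4f}")
```

The recomputed accuracy of 8526/9985, about 0.854, is close to the 0.853 reported in Table IV. The diagonal also shows that only one sample in the Disagree row is classified correctly, which the overall accuracy hides.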

VI. CONCLUSION

The emergence of new trends in computing (mobile cloud computing, Wi-Fi, and smart device applications) has led to the spread of internet usage. As a result, sharing information (text, audio, and video) is getting easier. Recently, online platforms have played a significant role in sharing false data and spreading fake news to serve several purposes, such as manipulating the stock market or spreading negative emotions such as anger and hate among platform users. Consequently, there is an increasing demand for accurate and efficient automated fake news detection systems. In this research, we investigated the automatic identification of fake news over online communication platforms. This task is considered a challenge for the existing content-based analysis of traditional methods. We proposed a model for the automatic identification of fake news using modern machine learning techniques, mainly deep learning and neural networks. We explored two models in detail, namely the Bidirectional LSTM concatenated model and the Multi-head LSTM model. These models were applied to the FNC-1 dataset, and the results showed that the Bidirectional LSTM concatenated model has the highest accuracy at 85%, followed by the Multi-head LSTM model at about 83%. In terms of precision, the LSTM model has the highest precision of 88%, followed by the Multi-head LSTM model with 84.5% precision. Overall, we recommend using the Multi-head LSTM model since it provides high precision and accuracy.

REFERENCES

[1] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.
[2] J. Gottfried and E. Shearer, News Use Across Social Media Platforms 2016. Pew Research Center, 2016.
[3] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
[4] A. Bovet and H. A. Makse, “Influence of fake news in Twitter during the 2016 US presidential election,” Nature Communications, vol. 10, no. 1, p. 7, 2019.
[5] N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding fake news,” Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4, 2015.
[6] Z. Jin, J. Cao, Y. Zhang, J. Zhou, and Q. Tian, “Novel visual and statistical image features for microblogs news verification,” IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 598–608, 2017.
[7] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, “Detecting rumors from microblogs with recurrent neural networks,” in IJCAI, 2016, pp. 3818–3824.
[8] N. Ruchansky, S. Seo, and Y. Liu, “CSI: A hybrid deep model for fake news detection,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 2017, pp. 797–806.
[9] X. Zhou and R. Zafarani, “Fake news: A survey of research, detection methods, and opportunities,” arXiv preprint arXiv:1812.00315, 2018.
[10] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake news detection on social media: A data mining perspective,” ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, pp. 22–36, 2017.
[11] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, “Automatic detection of fake news,” in Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 3391–3401.
[12] R. Davis and C. Proctor, “Fake news, real consequences: Recruiting neural networks for the fight against fake news,” 2017.
[13] K. Miller and A. Oswalt, “Fake news headline classification using neural networks with attention,” California State University, Tech. Rep., 2017.
[14] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, “EANN: Event adversarial neural networks for multi-modal fake news detection,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 849–857.
[15] J. Zhang, L. Cui, Y. Fu, and F. B. Gouza, “Fake news detection with deep diffusive network model,” arXiv preprint arXiv:1805.08751, 2018.
[16] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, “Fake news detection on social media using geometric deep learning,” arXiv preprint arXiv:1902.06673, 2019.
[17] H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of Economic Perspectives, vol. 31, no. 2, pp. 211–36, 2017.
[18] N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding fake news,” Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4, 2015.
[19] D. Klein and J. Wueller, “Fake news: A legal perspective,” Journal of Internet Law (Apr. 2017), 2017.
[20] A. Hanselowski, P. Avinesh, B. Schiller, F. Caspelherr, D. Chaudhuri, C. M. Meyer, and I. Gurevych, “A retrospective analysis of the fake news challenge stance-detection task,” in Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 1859–1874.
[21] A. K. Chaudhry, D. Baker, and P. Thun-Hohenstein, “Stance detection for the fake news challenge: Identifying textual relationships with deep neural nets,” CS224n: Natural Language Processing with Deep Learning, 2017.
[22] S. Chopra, S. Jain, and J. M. Sholar, “Towards automatic identification of fake news: Headline-article stance detection with LSTM attention models,” 2017.
[23] N. Lozhnikov, L. Derczynski, and M. Mazzara, “Stance prediction for Russian: Data and analysis,” in International Conference in Software Engineering for Defence Applications. Springer, 2018, pp. 176–186.
[24] I. Augenstein, T. Rocktäschel, A. Vlachos, and K. Bontcheva, “Stance detection with bidirectional conditional encoding,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 876–885.
[25] W. Ferreira and A. Vlachos, “Emergent: A novel data-set for stance classification,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1163–1168.
[26] L. Ma and Y. Zhang, “Using word2vec to process big text data,” in 2015 IEEE International Conference on Big Data (Big Data). IEEE, 2015, pp. 2895–2897.
[27] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in The International Conference on Learning Representations (ICLR), 2015.


Introduction: At present, people are more dependent on Internet sources for any sort of information or news. So, the news/information needs to be preserved and should not be modified by any user. Providing security for the news data is a major concern. The decentralized approach of a chain of blocks is used in order to strengthen the security of the news. The existing blockchain framework that offers openness, tamper-proofing, privacy, controlling information, and monitoring is inherited in the proposed work. Precisely, the idea is to build a safe platform that can detect bogus news on social media platforms. Even if the environment is fragile, the decentralized peer-to-peer environment based on the chain of blocks provides security to the published information. Objectives: As a result of recent innovations and advancements in the field of computer technology, social media networks have emerged as one of the most crucial aspects of contemporary human existence. Social media has developed into a well-known platform for information dissemination and news, as well as for daily reports. Social media offers a variety of benefits; conversely, it carries a great deal of misleading news and data that can mislead the reader. One of the major issues with social media is the dearth of reliable information and real-world news. Because of misleading news on social media, users are misled. So, to build a trustful environment, early detection of misleading news is necessary. Innovative machine learning methodologies are useful to identify and recognize misleading news more accurately. Methods: Misleading news is more viral than real news. People instantly believe the false information. So, there is a need to reduce the dissemination of misleading information on social media. In order to minimize the spread of misleading news, the source of the news needs to be traced.
Overall, the proposed system utilizes the chain of blocks and applies the proposed machine learning methodologies in order to identify misleading news and thereafter reduce the propagation of misleading information by blocking the fake user. Results: An experimental analysis reveals that the proposed classification algorithm obtains a better accuracy rate. In order to produce useful training rules and evaluate the test classifier, a number of features, such as TF-IDF, N-gram features, and dependency-oriented NLP features, are extracted from the input data. Conclusions: The proposed method analyzes every user's uploaded information, identifies fraudulent users, and reduces the propagation of false information by blocking the user.

Stance Prediction for Russian: Data and Analysis

Chapter

Full-text available

Jan 2020

Stance detection is a critical component of rumour and fake news identification. It involves the extraction of the stance a particular author takes related to a given claim, both expressed in text. This paper investigates stance classification for Russian. It introduces a new dataset, RuStance, of Russian tweets and news comments from multiple sources, covering multiple stories, as well as text classification approaches to stance detection as benchmarks over this data in this language. As well as presenting this openly-available dataset, the first of its kind for Russian, the paper presents a baseline for stance prediction in the language.

Stance Prediction for Russian: Data and Analysis

Conference Paper

Full-text available

Mar 2019

EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection

Conference Paper

Full-text available

Jul 2018

As news reading on social media becomes more and more popular, fake news becomes a major issue concerning the public and government. Fake news can take advantage of multimedia content to mislead readers and spread, which can cause negative effects or even manipulate public events. One of the unique challenges for fake news detection on social media is how to identify fake news on newly emerged events. Unfortunately, most of the existing approaches can hardly handle this challenge, since they tend to learn event-specific features that cannot be transferred to unseen events. In order to address this issue, we propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events. It consists of three main components: the multi-modal feature extractor, the fake news detector, and the event discriminator. The multi-modal feature extractor is responsible for extracting the textual and visual features from posts. It cooperates with the fake news detector to learn the discriminable representation for the detection of fake news. The role of the event discriminator is to remove the event-specific features and keep shared features among events. Extensive experiments are conducted on multimedia datasets collected from Weibo and Twitter. The experimental results show that our proposed EANN model can outperform the state-of-the-art methods and learn transferable feature representations.

Influence of fake news in Twitter during the 2016 US presidential election

Article

Full-text available

Jan 2019

The dynamics and influence of fake news on Twitter during the 2016 US presidential election remain to be clarified. Here, we use a dataset of 171 million tweets in the five months preceding the election day to identify 30 million tweets, from 2.2 million users, which contain a link to news outlets. Based on a classification of news outlets curated by www.opensources.co, we find that 25% of these tweets spread either fake or extremely biased news. We characterize the networks of information flow to find the most influential spreaders of fake and traditional news and use causal modeling to uncover how fake news influenced the presidential election. We find that, while top influencers spreading traditional center and left leaning news largely influence the activity of Clinton supporters, this causality is reversed for the fake news: the activity of Trump supporters influences the dynamics of the top fake news spreaders.

The science of fake news

Article

Full-text available

Mar 2018

Addressing fake news requires a multidisciplinary effort

Fake News Detection on Social Media: A Data Mining Perspective

Article

Full-text available

Aug 2017

Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

Automatic Detection of Fake News

Conference Paper

Jan 2019

The proliferation of misleading information in everyday access media outlets such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insights into the reliability of online content. In this paper, we focus on the automatic identification of fake content in online news. Our contribution is twofold. First, we introduce two novel datasets for the task of fake news detection, covering seven different news domains. We describe the collection, annotation, and validation process in detail and present several exploratory analyses on the identification of linguistic differences in fake and legitimate news content. Second, we conduct a set of learning experiments to build accurate fake news detectors. In addition, we provide comparative analyses of the automatic and manual identification of fake news.

The spread of true and false news online

Article

Mar 2018

Lies spread faster than the truth. There is worldwide concern over false news and the possibility that it can influence political, economic, and social well-being. To understand how false news spreads, Vosoughi et al. used a data set of rumor cascades on Twitter from 2006 to 2017. About 126,000 rumors were spread by ∼3 million people. False news reached more people than the truth; the top 1% of false news cascades diffused to between 1000 and 100,000 people, whereas the truth rarely diffused to more than 1000 people. Falsehood also diffused faster than the truth. The degree of novelty and the emotional reactions of recipients may be responsible for the differences observed. Science, this issue p. 1146

CSI: A Hybrid Deep Model for Fake News Detection

Conference Paper

Nov 2017

The topic of fake news has drawn attention both from the public and the academic communities. Such misinformation has the potential of affecting public opinion, providing an opportunity for malicious parties to manipulate the outcomes of public events such as elections. Because such high stakes are at play, automatically detecting fake news is an important, yet challenging problem that is not yet well understood. Nevertheless, there are three generally agreed upon characteristics of fake news: the text of an article, the user response it receives, and the source users promoting it. Existing work has largely focused on tailoring solutions to one particular characteristic which has limited their success and generality. In this work, we propose a model that combines all three characteristics for a more accurate and automated prediction. Specifically, we incorporate the behavior of both parties, users and articles, and the group behavior of users who propagate fake news. Motivated by the three characteristics, we propose a model called CSI which is composed of three modules: Capture, Score, and Integrate. The first module is based on the response and text; it uses a Recurrent Neural Network to capture the temporal pattern of user activity on a given article. The second module learns the source characteristic based on the behavior of users, and the two are integrated with the third module to classify an article as fake or not. Experimental analysis on real-world data demonstrates that CSI achieves higher accuracy than existing models, and extracts meaningful latent representations of both users and articles.

Social Media and Fake News in the 2016 Election

Article

May 2017
J ECON PERSPECT

Following the 2016 US presidential election, many have expressed concern about the effects of false stories ("fake news"), circulated largely through social media. We discuss the economics of fake news and present new data on its consumption prior to the election. Drawing on web browsing data, archives of fact-checking websites, and results from a new online survey, we find: 1) social media was an important but not dominant source of election news, with 14 percent of Americans calling social media their "most important" source; 2) of the known false news stories that appeared in the three months before the election, those favoring Trump were shared a total of 30 million times on Facebook, while those favoring Clinton were shared 8 million times; 3) the average American adult saw on the order of one or perhaps several fake news stories in the months around the election, with just over half of those who recalled seeing them believing them; and 4) people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks.
