A Tool for Fake News Detection using Machine Learning Techniques

K Raghavendra Asish
Department of CSE,
SRM University-AP, India
kalva raghavendra@srmap.edu.in
Adarsh Gupta
Department of CSE,
SRM University-AP, India
adarsh gupta@srmap.edu.in
Arpit Kumar
Department of CSE,
SRM University-AP, India
arpit kumar@srmap.edu.in
Alex Mason
Department of CSE,
SRM University-AP, India
alex martin@srmap.edu.in
Murali Krishna Enduri
Department of CSE,
SRM University-AP, India
muralikrishna.e@srmap.edu.in
Satish Anamalamudi
Department of CSE,
SRM University-AP, India
satish.a@srmap.edu.in
Abstract—The web and the internet are important to a very large
number of people, who use them for many different purposes. Many
social media platforms are available to these users, and any user
can post or spread news and messages through them. Even though the
algorithms used by social media platforms are meticulously updated,
they are still not effective enough to filter out fake news, or to
make essential information go viral first in the region where it is
needed, so that it benefits the people living there before the news
reaches the rest of the world. One of the biggest contributors to
fake news is social bots, which generate content automatically and
spread it among social media users.
In this work, we propose an effective approach to detect fake
news and false information using machine learning techniques. We
provide a tool that detects fake news using the Naive Bayes
technique with high accuracy. We show the results of our tool on
two data sets.
Index Terms—Fake news, Machine learning, Text processing,
LSTM, Naive Bayes.
I. INTRODUCTION
The world is changing rapidly. There was a time when receiving
a letter from a loved one made us feel special. Today, social
networking websites enable users across the world to consume
content as well as share information. A person sitting on the
14th floor of a building in Manhattan gets, within seconds, the
live score of a cricket match being played at Eden Gardens, India.
This digital world has many advantages, but it has just as many
disadvantages. The way content spreads so rapidly and effectively
across platforms every day indicates that individuals are
potentially vulnerable to being misled, manipulated and deceived,
which might result in a long-lasting backlash. News and
information are available at our fingertips these days. In recent
years, web-based media have transformed the manner in which
information spreads across the web and the world [1], [2], [3],
[4], [5].
Some of the common social factors involved are truth bias, naive
realism and confirmation bias: a person consumes the news (which
might be written in a way that shapes the reader's belief) as it
is, without worrying about its authenticity, then forms his or her
opinions and disregards others' views as irrational or biased,
because people favour information that confirms their own current
views. For instance, a journalist posted a false report that Steve
Jobs had suffered a heart attack through a CNN channel, which is a
trusted website [6], [7].
Even though the algorithms used by social media platforms are
meticulously updated, they are still not effective enough to
filter out fake news or to make essential information go viral
first where it is needed, so that it benefits the people living in
that region before the news reaches the rest of the world [8],
[9]. One of the biggest contributors to fake news is social bots.
Social bots generate content automatically and spread it among
social media users. In the 2016 US election, a huge number of
accounts tweeted in favour of either Trump or Clinton, but those
were not genuine users; they were social bots, which distorted the
election.
II. RELATED WORK
At present, spreading misinformation has become an extremely
simple task because of web-based media. To stop this, we need to
determine whether news is fake or real. For this purpose we have
built a model that recognises whether a given news item is fake by
using several ML and NLP concepts and algorithms.
Bag of Words (BoW): Bag of Words is widely used in document
classification [10], [11]. BoW is a technique from Natural
Language Processing (NLP) and information retrieval. NLP models
operate on numbers, so raw text cannot be fed into a model
directly. The BoW model is therefore used to preprocess text by
converting it into a bag of words, in which the frequency of each
word is used as a feature for classification. BoW is an orderless
document representation model in which only the frequency of
words matters.
2022 2nd International Conference on Intelligent Technologies (CONIT), Karnataka, India. June 24-26, 2022
978-1-6654-8407-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/CONIT55038.2022.9848064
Authorized licensed use limited to: SRM University. Downloaded on September 07,2022 at 04:45:32 UTC from IEEE Xplore. Restrictions apply.
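To make the bag-of-words representation concrete, here is a minimal sketch in plain Python. The function name `bag_of_words` and the three toy sentences are illustrative assumptions, not part of the authors' tool; whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count_vectors) for a list of text documents."""
    # Whitespace tokenization is a simplification chosen for this sketch.
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is the sorted set of all words seen in any document.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Word order is discarded; only the frequency of each word is kept.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Toy documents (hypothetical, for illustration only).
docs = ["fake news spreads fast", "real news spreads slowly", "fake fake news"]
vocab, vectors = bag_of_words(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Each document becomes a vector with one entry per vocabulary word, so the third sentence is represented only by the counts of "fake" and "news", regardless of their order.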
N-grams: The n-gram model is a text representation model that is
widely used in NLP and text mining [12], [6], [13]. N-grams are
sets of co-occurring words in a given text; they are computed by
sliding a window forward one word at a time.
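The sliding-window idea can be sketched in a few lines of Python. The helper name `ngrams` and the sample sentence are assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the list of n-grams (as tuples of words) found in text."""
    tokens = text.lower().split()
    # Move a window of n words forward one word at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "fake news spreads very fast"  # hypothetical example text
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
print(bigrams)
print(trigrams)
```

A text with fewer than n words simply yields an empty list.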
TF-IDF: Term Frequency–Inverse Document Frequency is a numerical
statistic intended to reflect how important a word is to a
document in a dataset [14], [15]. It is frequently used as a
weighting factor in information retrieval and text mining.
Term Frequency (TF): Term Frequency measures how often words
occur in a document, and can be used to find the differences
between documents [12], [16]. Each document is represented by a
vector that contains the word counts. The term frequency is the
number of times the term appears in a document divided by the
total number of terms in the document [6].
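The following sketch computes term frequency exactly as defined above and combines it with an inverse document frequency. The paper does not state which IDF variant it uses, so the common textbook form log(N / df) is an assumption here, as are the function names and the toy documents.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: occurrences of each term divided by the total number of terms."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF as log(N / df); this variant is assumed, not taken from the paper."""
    n_docs = len(tokenized_docs)
    # df counts the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokenized_docs):
    """Return one {term: weight} dictionary per document."""
    idf = inverse_document_frequency(tokenized_docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in tokenized_docs
    ]


# Toy documents (hypothetical, for illustration only).
docs = [d.split() for d in
        ["fake news spreads fast", "real news spreads slowly", "fake fake news"]]
tf_last = term_frequency(docs[2])
weights = tf_idf(docs)
print(tf_last)
print(weights[0])
```

With this IDF variant, a word such as "news" that occurs in every document receives weight zero, while a word that occurs in only one document is weighted most heavily.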
Word Embedding: Word embedding is a set of language modelling
and feature extraction techniques in Natural Language Processing
(NLP). In word embedding, words from the vocabulary are converted
into vectors of real numbers. Word embedding is a type of word
representation that allows words with similar meaning to have a
similar representation.
III. EXPERIMENTAL STUDY AND RESULT ANALYSIS:
LSTM: Long Short-Term Memory is a kind of recurrent neural
network (RNN). In an RNN, the output from the previous step is fed
as input to the current step. By default, an LSTM can retain
information for a long period of time. It is used for processing,
predicting and classifying on the basis of time series data.
LSTM units are a building block for the layers of a Recurrent
Neural Network (RNN). An LSTM unit is made up of a cell and three
gates, namely the input gate, the output gate and the forget gate.
The cell tracks values over a long time interval, so that a word
at the beginning of the text can influence the output for a word
later in the sentence. Conventional neural networks cannot
remember or keep a record of what was passed through them earlier;
this prevents words that come earlier in the sentence from having
the desired effect on later words, which is a significant
shortcoming [12].
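The role of the cell and the three gates can be illustrated with a single-unit LSTM step written in plain Python. This is a minimal sketch of the standard LSTM update with scalar input and state; the parameter layout, the names and the hand-picked weights are assumptions for illustration and are not the trained model from this work.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input and state)."""
    def gate(name, activation):
        # Each gate has an input weight w, a recurrent weight u and a bias b.
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate value for the cell state
    c = f * c_prev + i * g             # updated cell state (the "memory")
    h = o * math.tanh(c)               # updated hidden state (the output)
    return h, c


# Hand-picked weights (hypothetical): the forget gate stays almost fully open
# and the input gate almost closed, so the cell state is carried along.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.3] * 50:
    h, c = lstm_step(x, h, c, params)
print(round(c, 4))
```

With these settings the cell state stays close to its initial value after 50 steps, which is the behaviour that lets an early word influence a much later output.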
The LSTM addresses the problem of long-term dependencies, in
which an RNN cannot predict a word stored in long-term memory but
can give more accurate predictions from recent information. By
default, an LSTM can retain information for a long period of time
[17], [18].
Overview of Dataset: Data Set 1: This dataset is taken from the
Kaggle platform. It has the following attributes: id: unique id
for a news article; title: the title of a news article; author:
the author of the news article; text: the content of the news
article. The data set consists of 18212 news articles for training
and testing of models. It combines real and fake news pertaining
to the US, especially politics.
Data Set 2: We used a dataset called IFND from Kaggle [19]. This
dataset covers news exclusive to India and was created by scraping
Indian fact-checking websites. It contains fake and legitimate
news, further divided into different types of articles on
different topics. It has the following attributes: id: unique id
for a news article; title: the title of a news article; website:
the news website that published the news; label: real or fake. The
dataset consists of 56714 news articles for training and testing
of models.
Implementation: Preprocessing: To transform the data into a
suitable format, the data set needs preprocessing. First, we
removed all the NaN values from the dataset. A vocabulary size of
5000 words is chosen. Then the Natural Language Toolkit (NLTK) is
used to remove all the stop words from the dataset. Stop words
are taken from the list provided by the NLTK toolkit, for example
words such as 'and', 'the' and 'I', which do not convey much
information. The text is also converted to lowercase and
punctuation is removed. For each word in the documents, if it is
not a stop word, that word's tag is taken from the POS tagger.
This collection of words is then appended to the document.
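The cleaning steps above can be sketched as follows. To keep the example self-contained, a tiny hand-written stop word set stands in for the NLTK list, missing values are represented by `None` or empty strings, and POS tagging is omitted; all names and sample rows are hypothetical.

```python
import string

# Tiny illustrative stop word set; the paper uses the NLTK list instead.
STOP_WORDS = {"and", "the", "i", "a", "an", "is", "of", "to", "in"}


def drop_missing(rows, field):
    """Keep only the rows whose given field is present and non-empty."""
    return [row for row in rows if row.get(field) not in (None, "")]


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip punctuation and remove stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in stop_words]


# Hypothetical rows with the same attribute names as the datasets above.
rows = [
    {"id": 1, "title": "The President and I signed the bill!"},
    {"id": 2, "title": None},
]
cleaned = [preprocess(row["title"]) for row in drop_missing(rows, "title")]
print(cleaned)
```

Removing stop words and punctuation before indexing keeps the 5000-word vocabulary for words that actually carry information.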
Word index of the tokenized dataset: Word tokenization adds the
text to a list, which is then treated as a document. The output of
this stage is a list of all the words in the text.
Word Embedding: One-hot Representation: We cannot give text as
input to the algorithm, so we need to convert it into numeric
form, for which we use a one-hot representation. In the one-hot
representation, each word in the dataset is given its index from
the defined vocabulary size, and these indices replace the words
in the sentences. The input to the word embedding must have a
fixed length. To convert each sentence to a fixed length, sequence
padding is used. We have considered a max length of 20 words while
padding the title. Padding can be applied either before the
sentence (pre) or after the sentence (post), and the padded
sentences are then passed as input to the word embedding. The word
embedding applies feature extraction to the given input vector. In
total, 40 vector features are considered.
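The index encoding and padding steps can be sketched in plain Python. The vocabulary size (5000) and maximum length (20) come from the text; mapping words to indices with a CRC32 hash, reserving index 0 for padding, and the truncation rule are assumptions of this sketch. The 40-dimensional embedding layer itself is not shown.

```python
import zlib

VOCAB_SIZE = 5000  # vocabulary size stated in the paper
MAX_LEN = 20       # padded title length stated in the paper


def one_hot_indices(text, vocab_size=VOCAB_SIZE):
    """Map each word to an integer index in [1, vocab_size - 1]."""
    # A stable hash keeps the mapping reproducible; index 0 is reserved for
    # padding. Distinct words may collide, which this sketch accepts.
    return [zlib.crc32(word.encode("utf-8")) % (vocab_size - 1) + 1
            for word in text.lower().split()]


def pad_sequence(indices, max_len=MAX_LEN, padding="pre"):
    """Pad with zeros (or truncate) so that the result has max_len entries."""
    if padding == "pre":
        trimmed = indices[-max_len:]
        return [0] * (max_len - len(trimmed)) + trimmed
    trimmed = indices[:max_len]
    return trimmed + [0] * (max_len - len(trimmed))


title = "Fake news spreads fast"  # hypothetical title
encoded = one_hot_indices(title)
pre_padded = pad_sequence(encoded, padding="pre")
post_padded = pad_sequence(encoded, padding="post")
print(pre_padded)
print(post_padded)
```

Either padding mode yields vectors of identical length, which is what the embedding layer requires; the choice only determines whether the zeros precede or follow the words.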
Model: The output of the word embedding is given to the model.
The machine learning model implemented here is a sequential model
whose first layer is an embedding layer, which takes the
vocabulary size, the number of features and the length of the
sentence. It is followed by an LSTM layer, and a sigmoid
activation is used because we want a single final output. The
model flow for the data processing is shown in Fig. 1.
Classification: The result is predicted for the preprocessed test
data of both data sets. If the predicted value is greater than
0.5, the article is classified as 1 (real); otherwise it is
classified as 0 (fake). Accuracy is computed as

$$\text{Accuracy} = \frac{TP + TN}{\text{Total}}$$
Fig. 1. Module Flow for Data Processing
The following terms were used: True Negative (TN), i.e., the
prediction was negative and the test case was actually negative;
True Positive (TP), i.e., the prediction was positive and the test
case was actually positive; False Negative (FN), i.e., the
prediction was negative, but the test case was actually positive;
False Positive (FP), i.e., the prediction was positive, but the
test case was actually negative.
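The decision rule and the four terms can be tied together in a short sketch: scores are thresholded at 0.5, the four outcome counts are tallied, and accuracy is (TP + TN) divided by the total. The scores and labels below are invented purely for illustration.

```python
def classify(probabilities, threshold=0.5):
    """Label 1 (real) when the score exceeds the threshold, else 0 (fake)."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical model scores and ground-truth labels.
scores = [0.92, 0.40, 0.51, 0.08, 0.73, 0.50]
actual = [1, 0, 0, 0, 1, 1]
predicted = classify(scores)
counts = confusion_counts(actual, predicted)
print(predicted, counts, accuracy(*counts))
```

A score of exactly 0.5 falls on the "fake" side here because the rule requires the value to be strictly greater than 0.5.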
IV. RESULTS
The classification accuracy for real news articles and fake news
articles is roughly the same, although the classification
accuracy for fake news deviates slightly. The accuracy of each
individual model is then measured using the confusion matrix and
the classification report.
TABLE I
CLASSIFICATION TABLE OF LSTM WITH ACCURACY 0.91177 FOR DATA SET 1
N = 8014       Predicted: NO    Predicted: YES
Actual: NO     4191             374
Actual: YES    333              3116
Fig. 2. LSTM Model Accuracy for Data Set1
TABLE II
CLASSIFICATION TABLE OF LSTM WITH ACCURACY 0.91865 FOR DATA SET 2
N = 24955      Predicted: NO    Predicted: YES
Actual: NO     15627            923
Actual: YES    1107             7298
Fig. 3. LSTM Model Accuracy for Data Set 2
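As a consistency check, the reported LSTM accuracies follow from the counts in Tables I and II, reading the Actual: NO / Predicted: NO cell as TN and the Actual: YES / Predicted: YES cell as TP (this reading of the table layout is an assumption):

```latex
\begin{align*}
\text{Accuracy}_{\text{Data Set 1}} &= \frac{TP + TN}{\text{Total}}
  = \frac{3116 + 4191}{8014} = \frac{7307}{8014} \approx 0.9118 \\
\text{Accuracy}_{\text{Data Set 2}} &= \frac{TP + TN}{\text{Total}}
  = \frac{7298 + 15627}{24955} = \frac{22925}{24955} \approx 0.9187
\end{align*}
```

Both values agree with the accuracies stated in the table captions (0.91177 and 0.91865) to the reported precision.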
Limitations:
1) It takes longer to train LSTMs.
2) To train, LSTMs demand additional memory.
3) The LSTM algorithm is easily overfitted.
Naive Bayes Model: In machine learning, Naive Bayes classifiers
belong to the family of simple classifiers. Naive Bayes is a
well-known algorithm, which is used here to determine whether
news is real or fake using multinomial NB and pipelining
concepts. A number of algorithms build on common principles, so
it is not the only algorithm for training such classifiers. Naive
Bayes can be used to check whether news is fake or real. It is a
type of algorithm used in text classification. In the Naive Bayes
classifier, the occurrence of tokens is correlated with news that
may or may not be fake, and the likelihood that the news is
genuine is then calculated using Bayes' theorem. There are
essentially three types of Naive Bayes model: Gaussian Naive
Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes. We have
used the Multinomial Naive Bayes model for our task of identifying
fake news [12], [20]. A benefit of the Naive Bayes classifier is
that it requires less training data for classification.
Naive Bayes Formula Details: The following equation for Naive
Bayes classification uses the probability of the prior event and
compares it with the current event. The probability of each event
is calculated, and finally the overall probability of the news
with respect to the dataset is calculated. After computing the
overall probability, we obtain an approximate value and can
determine whether the news is real or fake.

$$P(E_1 \mid E_2) = \frac{P(E_2 \mid E_1)\, P(E_1)}{P(E_2)} \tag{1}$$

This gives the probability of event $E_1$ when event $E_2$ is
true, where

$$P(E_1) = \text{prior probability} \tag{2}$$

$$P(E_1 \mid E_2) = \text{posterior probability} \tag{3}$$

If the probability of a word is 0, add-one (Laplace) smoothing is
applied:

$$P(W) = \frac{W_c + 1}{W_n + W_u} \tag{4}$$

where $W$ is a word, $W_c$ is the word count, $W_n$ is the total
number of words and $W_u$ is the number of unique words.
Therefore, by using this formula one can estimate how likely the
news is to be accurate.
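Equations (1) and (4) can be combined into a small multinomial Naive Bayes classifier. The sketch below is written from scratch for illustration and is not the authors' code: it assumes that the word count and the total number of words are taken per class, that the number of unique words is the size of the whole training vocabulary, and it sums logarithms instead of multiplying probabilities. The four training sentences are invented.

```python
import math
from collections import Counter


def train(labelled_docs):
    """labelled_docs: list of (tokens, label). Returns priors, counts, vocab."""
    class_docs = Counter(label for _, label in labelled_docs)
    word_counts = {label: Counter() for label in class_docs}
    for tokens, label in labelled_docs:
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    priors = {label: n / len(labelled_docs) for label, n in class_docs.items()}
    return priors, word_counts, vocabulary


def word_probability(word, counts, vocabulary):
    """Eq. (4): (Wc + 1) / (Wn + Wu), so unseen words never get probability 0."""
    return (counts[word] + 1) / (sum(counts.values()) + len(vocabulary))


def predict(tokens, priors, word_counts, vocabulary):
    """Return (best_label, log_scores) following Bayes' theorem, Eq. (1)."""
    scores = {}
    for label, prior in priors.items():
        log_score = math.log(prior)
        for word in tokens:
            if word in vocabulary:  # words never seen in training are skipped
                log_score += math.log(
                    word_probability(word, word_counts[label], vocabulary))
        scores[label] = log_score
    return max(scores, key=scores.get), scores


# Invented training examples, for illustration only.
training = [
    ("shocking miracle cure revealed".split(), "fake"),
    ("celebrity secret shocking truth".split(), "fake"),
    ("government announces new budget".split(), "real"),
    ("court announces verdict today".split(), "real"),
]
priors, word_counts, vocabulary = train(training)
label, scores = predict("shocking secret cure".split(),
                        priors, word_counts, vocabulary)
print(label, scores)
```

The denominator P(E2) of Eq. (1) is the same for both classes, so it is dropped when the two scores are compared.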
Fig. 4. Module Flow for Data Processing
Implementation: Data Pre-processing: This step covers all the
data that must be checked thoroughly and preprocessed. First, we
go through the train, test and validation data files and then
perform preprocessing such as tokenizing, stemming and so on.
Here the data is checked thoroughly for missing values.
Feature Extraction: For this dataset we performed feature
extraction and selection using scikit-learn and Python. To
perform feature selection, we use a method called tf-idf. We have
also used word-to-vector embeddings to extract the features, and
pipelining has been used to simplify the code.
Classification: Here the data is split into two parts, test data
and train data, and the train dataset is grouped into classes with
similar content. The test data is then matched and assigned to the
group it belongs to, after which the Naive Bayes classifier is
applied and the probability of each word is calculated separately.
Prediction: Finally, our model was saved with the name
finalised model.sav. This model is copied to the user's machine
and is used by the app.py file to classify fake news. It takes a
news article as input from the user; the model is then used to
produce the final classification output, which is shown to the
user along with the probability of truth.
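The save-and-reload step can be sketched with Python's standard `pickle` module. The paper does not say which serialisation library produced the .sav file, so `pickle`, the file name `model.sav` and the dictionary standing in for a trained model are all assumptions of this example.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (hypothetical values).
model = {
    "priors": {"fake": 0.5, "real": 0.5},
    "word_counts": {"fake": {"shocking": 2}, "real": {"announces": 2}},
}


def save_model(obj, path):
    """Serialise the trained model to disk."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_model(path):
    """Load a previously saved model, as the serving script would do."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.sav")  # hypothetical file name
    save_model(model, model_path)
    restored = load_model(model_path)

print(restored == model)
```

Saving the trained model once means the serving script only has to load the file and classify, without retraining on each request.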
Fig. 5. Confusion Matrix for Data Set1
Fig. 6. Confusion Matrix for Data Set2
TABLE III
ACCURACY FOR DIFFERENT DATA SETS
Results     Data Set 1    Data Set 2
Accuracy    0.89693       0.93216
Advantages: Thus, by using the Naive Bayes theorem, we can
conclude that any article from a large or small dataset can be
classified as fake or real news by matching it against the
previous dataset values in a short time, which in turn helps users
to trust a particular news item. When the assumption of
independence holds, a Naive Bayes classifier performs better than
other models such as logistic regression, and it needs less
training data.
V. PROPOSED APPROACH
There are a few ways to identify fake news. One way of handling
fake news is to classify news manually as legitimate or fake.
Although this is straightforward and seems like the simplest
solution, it is not feasible to label the massive amount of news
that already exists and is produced daily. Therefore, we must look
for a realistic technical solution that does the same in minimal
time. The other way is to develop a classification model, trained
with a machine learning algorithm, that labels the data so that
the labels can be retrieved as and when required.
Although a few algorithms exist, they are not readily available
to ordinary people, which raises a pressing question: what, then,
is the point of fake news detection? We therefore set out to
create something that makes detecting fake news much easier and
more accessible. To that end, we developed a Google Chrome
extension and linked it to a Naive Bayes-based machine learning
model that runs on the server side. We chose Naive Bayes because
it takes less time to train and test, and because its performance
on text matches that of the LSTM. All a person needs to do is copy
any sentence from any website, paste it into the extension and
press the analyse button; within seconds the extension shows
whether the news is reliable or unreliable.
Fig. 7. Google Chrome extension
As for the dataset, we developed this extension on the basis of
Data Set 2, as we assumed that most of its users will be from
India. That is why it is important to make the extension relevant
to Indian politics, sports, developments, markets, etc. Users of
the extension will be able to tell whether the data they are
consuming is reliable or not.
Advantages:
1) Our extension provides results instantly because of the
low time complexity of the backend process.
2) It is a lightweight extension that is completely free of
cost.
3) This is the first extension to detect fake news based on
an Indian dataset.
4) The other extensions available on the Chrome Web Store
provide a platform for users to categorise and tag
whether particular news is true or not. This makes the
result unreliable, as everyone has their own view of the
data and categorises it accordingly, which affects the
actual result. Our extension gives users no platform for
tagging whether particular data is true or not; the data
is verified directly by the AI model that runs at the
backend.
VI. CONCLUSION AND FUTURE WORK
Fake news spreads extraordinarily fast and can be
indistinguishable from accurate reporting. Nowadays, people
download information and re-share it without even checking
whether it is true, and by the end of the day the false
information has travelled so far from its source that it can no
longer be told apart from the truth. We proposed an effective
approach to detect fake news and false information using machine
learning techniques. We provide a tool that detects fake news
using the Naive Bayes technique with high accuracy, and we show
its results on two data sets. The tool is used as a Google Chrome
extension. In the future, we will try to implement a model that
can predict the reason for unreliability.
REFERENCES
[1] S. B. Parikh and P. K. Atrey, “Media-rich fake news detection: A survey,”
in 2018 IEEE conference on multimedia information processing and
retrieval (MIPR). IEEE, 2018, pp. 436–441.
[2] H. S. Al-Ash and W. C. Wibowo, “Fake news identification charac-
teristics using named entity recognition and phrase detection,” in 2018
10th International Conference on Information Technology and Electrical
Engineering (ICITEE). IEEE, 2018, pp. 12–17.
[3] K. Stahl, “Fake news detection in social media,” California State
University Stanislaus, vol. 6, pp. 4–15, 2018.
[4] A. A. A. Ahmed, A. Aljabouh, P. K. Donepudi, and M. S. Choi,
“Detecting fake news using machine learning: A systematic literature
review,” arXiv preprint arXiv:2102.04458, 2021.
[5] A. Jain, A. Shakya, H. Khatter, and A. K. Gupta, “A smart system for
fake news detection using machine learning,” in 2019 International
Conference on Issues and Challenges in Intelligent Computing Techniques
(ICICT), vol. 1. IEEE, 2019, pp. 1–4.
[6] H. Ahmed, I. Traore, and S. Saad, “Detecting opinion spams and fake
news using text classification,” Security and Privacy, vol. 1, no. 1, p. e9,
2018.
[7] S. Shabani and M. Sokhn, “Hybrid machine-crowd approach for fake
news detection,” in 2018 IEEE 4th International Conference on
Collaboration and Internet Computing (CIC). IEEE, 2018, pp. 299–306.
[8] S. Gupta, R. Thirukovalluru, M. Sinha, and S. Mannarswamy,
“Cimtdetect: a community infused matrix-tensor coupled factorization based
method for fake news detection,” in 2018 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining
(ASONAM). IEEE, 2018, pp. 278–281.
[9] Z. Jin, J. Cao, Y.-G. Jiang, and Y. Zhang, “News credibility evaluation
on microblog with a hierarchical propagation model,” in 2014 IEEE
international conference on data mining. IEEE, 2014, pp. 230–239.
[10] J. Wang, P. Liu, M. F. She, S. Nahavandi, and A. Kouzani, “Bag-of-
words representation for biomedical time series classification,” Biomed-
ical Signal Processing and Control, vol. 8, no. 6, pp. 634–644, 2013.
[11] B. Srinivasa-Desikan, Natural Language Processing and Computational
Linguistics: A practical guide to text analysis with Python, Gensim,
spaCy, and Keras. Packt Publishing Ltd, 2018.
[12] R. K. Kaliyar, “Fake news detection using a deep neural network,” in
2018 4th International Conference on Computing Communication and
Automation (ICCCA). IEEE, 2018, pp. 1–7.
[13] M. D. Ibrishimova and K. F. Li, “A machine learning approach to
fake news detection using knowledge verification and natural language
processing,” in International Conference on Intelligent Networking and
Collaborative Systems. Springer, 2019, pp. 223–234.
[14] S. Vijayarani, M. J. Ilamathi, M. Nithya et al., “Preprocessing techniques
for text mining-an overview,” International Journal of Computer Science
& Communication Networks, vol. 5, no. 1, pp. 7–16, 2015.
[15] H. Ahmed, I. Traore, and S. Saad, “Detection of online fake news
using n-gram analysis and machine learning techniques,” in International
conference on intelligent, secure, and dependable systems in distributed
and cloud environments. Springer, 2017, pp. 127–138.
[16] Q. He, “Knowledge discovery through co-word analysis,” 1999.
[17] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, “Fndnet–a
deep convolutional neural network for fake news detection,” Cognitive
Systems Research, vol. 61, pp. 32–44, 2020.
[18] J. C. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto,
“Supervised learning for fake news detection,” IEEE Intelligent Systems,
vol. 34, no. 2, pp. 76–81, 2019.
[19] D. K. Sharma and S. Garg, “Ifnd: a benchmark dataset for fake news
detection,” Complex & Intelligent Systems, pp. 1–21, 2021.
[20] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro,
and L. de Alfaro, “Automatic online fake news detection combining
content and social signals,” in 2018 22nd conference of open innovations
association (FRUCT). IEEE, 2018, pp. 272–279.