A Tool for Fake News Detection using Machine Learning Techniques

June 2022

DOI:10.1109/CONIT55038.2022.9848064

Conference: 2022 International Conference on Intelligent Technologies (CONIT)

Authors:

K Raghavendra Asish

Department of CSE,

SRM University-AP, India

kalva_raghavendra@srmap.edu.in

Adarsh Gupta

Department of CSE,

SRM University-AP, India

adarsh_gupta@srmap.edu.in

Arpit Kumar

Department of CSE,

SRM University-AP, India

arpit_kumar@srmap.edu.in

Alex Mason

Department of CSE,

SRM University-AP, India

alex_martin@srmap.edu.in

Murali Krishna Enduri

Department of CSE,

SRM University-AP, India

muralikrishna.e@srmap.edu.in

Satish Anamalamudi

Department of CSE,

SRM University-AP, India

satish.a@srmap.edu.in

Abstract—The web is central to the lives of a very large number of people, who use it for many different purposes. Many social media platforms are available to these users, and any user can post or spread news and messages through them. Even though the algorithms used by social media platforms are updated meticulously, they are still not effective enough to filter out fake news, or to make essential information go viral first in the region where it is needed, so that the people living there benefit before the news reaches the rest of the world. Social bots, which generate content automatically and spread it among social media users, are among the biggest contributors to fake news. In this work, we propose an approach to detect fake news and false information using machine learning techniques. We provide a tool that detects fake news using the Naive Bayes technique with high accuracy, and we show results on two data sets obtained with our tool.

Index Terms—Fake news, Machine learning, Text processing, LSTM, Naive Bayes.

I. INTRODUCTION

The world is changing rapidly. There was a time when receiving a letter from a loved one felt special; today, social networking websites let users across the world consume and share content instantly. A person sitting on the 14th floor of a building in Manhattan can get the live score of a cricket match at Eden Gardens, India, within seconds. The digital world, however, has as many disadvantages as advantages. News and information are now available at one's fingertips, and the speed and reach with which content spreads across platforms every day mean that individuals are vulnerable to being misled, manipulated, and deceived, which may result in a long-lasting backlash. In recent years, web-based media have transformed the way information spreads across the web and the world [1], [2], [3], [4], [5].

Common social factors behind this vulnerability are truth bias, naive realism, and confirmation bias: a person consumes news (which may be written in a way that shapes the reader's beliefs) as it is, without questioning its authenticity, then forms an opinion and dismisses other views as irrational or biased, because people favour information that confirms their existing views. For instance, a journalist posted a false report that Steve Jobs had suffered a heart attack through CNN, which is a trusted channel [6], [7].

Even though the algorithms used by social media platforms are updated meticulously, they are still not effective enough to filter out fake news, or to make essential information go viral first in the region where it is needed, so that the people living there benefit before the news reaches the rest of the world [8], [9]. Social bots are among the biggest contributors to fake news: they generate content automatically and spread it among social media users. In the 2016 US election, a huge number of accounts tweeted in favour of either Trump or Clinton, but these were not genuine users; they were social bots, which distorted the elections.

II. RELATED WORK

Social media has made spreading misinformation exceptionally easy. To stop it, we need to determine whether a news item is fake or genuine. To this end, we built a model that classifies a given news item as fake or genuine using several machine learning (ML) and natural language processing (NLP) concepts and algorithms.

2022 2nd International Conference on Intelligent Technologies (CONIT)

Karnataka, India. June 24-26, 2022

Authorized licensed use limited to: SRM University. Downloaded on September 07,2022 at 04:45:32 UTC from IEEE Xplore. Restrictions apply.

Bag of Words (BoW): Bag of Words is one of the most widely used techniques for document classification [10], [11]. It is used in both natural language processing (NLP) and information retrieval. NLP models operate on numbers, so raw text cannot be fed into a model directly. The BoW model therefore preprocesses text by converting it into a bag of words, in which the frequency of each word is used as a feature for classification. BoW is an orderless document representation: only the frequency of words matters, not their position.
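As a minimal sketch of the bag-of-words idea described above, the following Python snippet counts word frequencies while discarding word order. The function name `bag_of_words`, the whitespace tokenisation, and the example sentences are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return word frequencies for `text`, ignoring word order."""
    # Assumption: lowercasing plus a whitespace split stands in for a real tokenizer.
    return Counter(text.lower().split())


# Two sentences with the same words in a different order
# map to the same bag-of-words representation.
bow_a = bag_of_words("fake news spreads fast")
bow_b = bag_of_words("fast spreads fake news")
print(bow_a == bow_b)   # True: word order is discarded
print(bow_a["news"])    # 1
```

Because only counts are kept, any downstream classifier sees the two example sentences as identical, which is exactly the orderless property the paragraph describes.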

N-grams: The n-gram model is a text representation widely used in NLP and text mining [12], [6], [13]. An n-gram is a sequence of n co-occurring words in a given text; n-grams are computed by sliding a window of n words forward one word at a time.
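The sliding-window computation can be sketched in a few lines of Python. The helper name `ngrams` and the example sentence are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Slide a window of n tokens over the list, moving one token at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "social bots spread fake news".split()
bigrams = ngrams(tokens, 2)
print(bigrams)
# [('social', 'bots'), ('bots', 'spread'), ('spread', 'fake'), ('fake', 'news')]
```

A text with fewer than n tokens yields an empty list, since no full window fits.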

TF-IDF: Term Frequency–Inverse Document Frequency is a numerical statistic that reflects how important a word is to a document in a dataset [14], [15]. It is frequently used as a weighting factor in information retrieval and text mining.

Term Frequency (TF): Term frequency counts the words present in a document and can be used to compare documents [12], [16]. Each document is represented as a vector of word counts. The term frequency of a word is the number of times it appears in a document divided by the total number of terms in that document [6].
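The TF definition above, combined with an inverse document frequency, can be sketched as follows. The text does not state which IDF variant was used, so the plain form log(N / df) is an assumption here, as are the function names and the two toy documents.

```python
import math
from collections import Counter
from typing import Dict, List


def term_frequency(tokens: List[str]) -> Dict[str, float]:
    """TF as defined in the text: term count divided by total terms in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """Assumed IDF variant: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Hypothetical toy corpus, for illustration only.
docs = [
    "fake news spreads fast".split(),
    "real news spreads slowly".split(),
]
idf = inverse_document_frequency(docs)
tf_idf = [{t: tf * idf[t] for t, tf in term_frequency(d).items()} for d in docs]

print(term_frequency(docs[0])["fake"])  # 0.25 (1 occurrence out of 4 terms)
print(tf_idf[0]["news"])                # 0.0: "news" occurs in every document
print(tf_idf[0]["fake"] > 0)            # True: "fake" is specific to the first document
```

Words that occur in every document receive a weight of zero under this variant, which is why TF-IDF is useful as a weighting factor: it suppresses terms that do not discriminate between documents.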

Word Embedding: Word embedding is a set of language modelling and feature extraction techniques in natural language processing (NLP). In word embedding, words from the vocabulary are mapped to vectors of real numbers. It is a type of word representation that allows words with similar meanings to have similar representations.

III. EXPERIMENTAL STUDY AND RESULT ANALYSIS

LSTM: Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN). In an RNN, the output of the previous step is fed as input to the current step. By default, an LSTM can retain information for a long period of time. It is used for processing, predicting, and classifying time-series data.

LSTM units are building blocks for the layers of a recurrent neural network (RNN). An LSTM unit is made up of a cell and three gates: an input gate, an output gate, and a forget gate. The cell tracks values over long time spans, so a word at the beginning of a text can influence the output for a word later in the sentence. Conventional neural networks cannot remember what was passed through them earlier, which prevents words that occur early in a sentence from having any effect on the words at its end; this is a significant shortcoming [12].

LSTM addresses the problem of long-term dependencies, in which a plain RNN cannot predict a word that depends on information stored far back in the sequence, although it can give more accurate predictions from recent information. By default, an LSTM can retain information for a long period of time [17], [18].
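For reference, the cell and the three gates described above are commonly written as the following update equations. This is the standard textbook formulation, not taken from the paper; the weight matrices $W$, $U$, the biases $b$, the input $x_t$, and the hidden state $h_t$ are notation introduced here for illustration.

```latex
\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
\]
```

The cell state $c_t$ is what carries information across long spans: when the forget gate stays close to 1, the contribution of an early word persists until much later in the sentence.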

Overview of Datasets: Data Set 1: This dataset is taken from the Kaggle platform. It has the following attributes: id (unique id of a news article), title (title of the news article), author (author of the news article), and text (body of the news article). The dataset consists of 18212 news articles for training and testing the models. It combines real and fake news pertaining to the US, especially politics.

Data Set 2: We used the IFND dataset from Kaggle [19]. It covers news exclusive to India and was created by scraping Indian fact-checking websites. The dataset contains fake and legitimate news, further divided into different types of articles on different topics. It has the following attributes: id (unique id of a news article), title (title of the news article), website (news website that published the article), and label (real or fake). The dataset consists of 56714 news articles for training and testing the models.

Implementation: Preprocessing: The dataset must be preprocessed to bring the data into a suitable format. First, we removed all NaN values from the dataset. A vocabulary size of 5000 words was chosen. Then the Natural Language Toolkit (NLTK) was used to remove all stop words from the dataset. Stop words, taken from the NLTK stop-word list, are words such as 'and', 'the', and 'I' that carry little information. The text is also converted to lowercase and punctuation is removed. For each word in a document, if it is not a stop word, its tag is taken from the part-of-speech (POS) tagger. This collection of words is then added to the document.
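The cleaning steps above (lowercasing, punctuation removal, stop-word filtering) can be sketched as follows. To keep the example self-contained, a small hand-written stop-word set stands in for the NLTK list; that set, the function name `preprocess`, and the sample sentence are assumptions made for illustration, and POS tagging is omitted.

```python
import string
from typing import List

# Assumption: a tiny hand-written stop-word set stands in for the NLTK list.
STOP_WORDS = {"and", "the", "i", "a", "an", "is", "of", "to", "in"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


print(preprocess("The Minister and I visited the flooded region!"))
# ['minister', 'visited', 'flooded', 'region']
```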

Word index of the tokenized dataset: Word tokenization splits the text into a list of words, which is then referred to as a document. The output of this stage is a list of all the words in the text.

Word Embedding: One-hot Representation: Text cannot be given to the algorithm directly, so it must be converted into numeric form, for which we use a one-hot representation. In the one-hot representation, each word in the dataset is assigned an index within the defined vocabulary size, and these indices replace the words in the sentences. The input to the word embedding must have a fixed length, so padding is used to convert every sentence to a fixed length. We used a maximum length of 20 words when padding the title. Padding can be added either before the sentence (pre) or after it (post), and the padded sentences are then passed as input to the word embedding. The word embedding performs feature extraction on the input vector. In total, 40 vector features are used.
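The index encoding and padding steps can be illustrated with plain Python. The vocabulary size (5000) and maximum length (20) come from the text; the hash-based `word_index` helper and the `pad` helper are hypothetical stand-ins for whatever library routines the authors used, and index 0 is reserved for padding by assumption.

```python
import hashlib
from typing import List

VOCAB_SIZE = 5000   # vocabulary size stated in the text
MAX_LEN = 20        # maximum title length stated in the text


def word_index(word: str, vocab_size: int = VOCAB_SIZE) -> int:
    """Map a word to a stable index in [1, vocab_size - 1]; 0 is reserved for padding.

    Assumption: a hash-based index stands in for the encoder used by the authors.
    """
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (vocab_size - 1) + 1


def pad(sequence: List[int], max_len: int = MAX_LEN, padding: str = "pre") -> List[int]:
    """Pad with zeros (before or after) or truncate so the result has max_len entries."""
    sequence = sequence[-max_len:] if padding == "pre" else sequence[:max_len]
    zeros = [0] * (max_len - len(sequence))
    return zeros + sequence if padding == "pre" else sequence + zeros


title = "minister visited flooded region".split()
encoded = [word_index(w) for w in title]
pre_padded = pad(encoded, padding="pre")     # zeros first, then the four word indices
post_padded = pad(encoded, padding="post")   # four word indices first, then zeros
print(len(pre_padded), len(post_padded))     # 20 20
```

With hashing, two different words can collide on the same index; a larger vocabulary size reduces the chance of this at the cost of a larger embedding table.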

Model: The output of the word embedding is given to the model. The machine learning model implemented here is a sequential model whose first layer is an embedding layer, parameterised by the vocabulary size, the number of features, and the sentence length. It is followed by an LSTM layer and a sigmoid activation, since a single final output is required. The model flow for data processing is shown in Fig. 1.

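Classification: The result is predicted for the preprocessed test data of both data sets. If the predicted value is greater than 0.5, the article is classified as 1 (real); otherwise it is classified as 0 (fake). Accuracy is computed as

\[ \text{Accuracy} = \frac{TP + TN}{\text{Total}} \]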

Fig. 1. Module Flow for Data Processing

The following terms are used: True Negative (TN), i.e., the prediction was negative and the test case was actually negative; True Positive (TP), i.e., the prediction was positive and the test case was actually positive; False Negative (FN), i.e., the prediction was negative but the test case was actually positive; False Positive (FP), i.e., the prediction was positive but the test case was actually negative. Total is the total number of test cases.
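The thresholding rule and the accuracy formula can be combined in a short sketch. The scores and labels below are hypothetical values chosen only to exercise the code, and the function names are illustrative.

```python
from typing import List, Tuple


def classify(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Label 1 (real) if the predicted value exceeds the threshold, else 0 (fake)."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating label 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / Total."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical model outputs and ground-truth labels, for illustration only.
scores = [0.92, 0.40, 0.51, 0.07, 0.66]
actual = [1, 0, 0, 0, 1]
predicted = classify(scores)              # [1, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)                     # 2 2 1 0
print(accuracy(tp, tn, fp, fn))           # 0.8
```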

IV. RESULTS

The classification accuracy for genuine and fake news articles is roughly the same, although the accuracy for fake news deviates slightly. The accuracy of each individual model is further measured using the confusion matrix and the classification report.

TABLE I
CLASSIFICATION TABLE OF LSTM WITH ACCURACY 0.91177 FOR DATA SET1

N = 8014       Predicted: NO    Predicted: YES
Actual: NO     4191             374
Actual: YES    333              3116

Fig. 2. LSTM Model Accuracy for Data Set1

TABLE II
CLASSIFICATION TABLE OF LSTM WITH ACCURACY 0.91865 FOR DATA SET2

N = 24955      Predicted: NO    Predicted: YES
Actual: NO     15627            923
Actual: YES    1107             7298

Fig. 3. LSTM Model Accuracy for Data Set2
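As a quick consistency check, the accuracies reported in the captions of Tables I and II can be recomputed from the confusion-matrix counts. The sketch below also derives precision and recall under the assumption that YES is the positive class; the paper does not report these two metrics, and the function name `metrics` is illustrative.

```python
# Confusion matrices as reported in Tables I and II:
# rows are actual (NO, YES), columns are predicted (NO, YES).
tables = {
    "Data Set 1": [[4191, 374], [333, 3116]],
    "Data Set 2": [[15627, 923], [1107, 7298]],
}


def metrics(matrix):
    """Compute total, accuracy, precision, and recall with YES as the positive class."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


for name, matrix in tables.items():
    m = metrics(matrix)
    print(name, m["total"], round(m["accuracy"], 4),
          round(m["precision"], 4), round(m["recall"], 4))
```

The totals come out to 8014 and 24955, matching the N values in the tables, and the accuracies round to 0.9118 and 0.9187, which agrees with the reported 0.91177 and 0.91865 if those figures were truncated rather than rounded.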

Limitations:

1) It takes longer to train LSTMs.

2) To train, LSTMs demand additional memory.

3) LSTMs are prone to overfitting.

Naive Bayes Model: Naive Bayes classifiers are a family of simple machine learning models. Naive Bayes is a popular algorithm, which we use to determine whether a news item is genuine or fake, using multinomial NB and pipelining. It is not a single algorithm for training such classifiers but a family of algorithms based on a common principle. It is commonly used in text classification. In the Naive Bayes classifier, the tokens of a news item are associated with the item being fake or not fake, and the probability that the news is genuine is then computed using Bayes' theorem. There are essentially three types of Naive Bayes model: Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. We used the Multinomial Naive Bayes model to detect fake news [12], [20]. An advantage of the Naive Bayes classifier is that it requires less training data for classification.

Naive Bayes Formula Details: The following equation for Naive Bayes classification uses the probability of the prior event and compares it with the current event. The probability of each event is computed, and finally the overall probability of the news item with respect to the dataset is computed. From this overall probability we obtain an approximate value and can determine whether the news is genuine or fake.

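\[ P(E_1 \mid E_2) = \frac{P(E_2 \mid E_1)\, P(E_1)}{P(E_2)} \tag{1} \]

This gives the probability of event $E_1$ when event $E_2$ is true, where

\[ P(E_1) = \text{prior probability} \tag{2} \]

\[ P(E_1 \mid E_2) = \text{posterior probability} \tag{3} \]

If the probability of a word would otherwise be 0, it is computed as

\[ P(W) = \frac{W_c + 1}{W_n + W_u} \tag{4} \]

where $W$ is a word, $W_c$ is the count of that word, $W_n$ is the total number of words, and $W_u$ is the number of unique words. Using this formula, one can estimate whether the news is genuine.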
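Equations (1) and (4) can be put together in a small multinomial Naive Bayes sketch. It assumes that $W_c$ and $W_n$ are counted per class and that $W_u$ is the size of the overall vocabulary, which is one common reading of Eq. (4); the function names and the four toy training sentences are hypothetical. Log-probabilities are summed instead of multiplying raw probabilities, to avoid numeric underflow.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def train(docs: List[Tuple[List[str], str]]) -> Model:
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts: Dict[str, Counter] = {}
    class_counts: Counter = Counter()
    for tokens, label in docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        class_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def word_probability(word: str, counts: Counter, vocab_size: int) -> float:
    """Eq. (4): (Wc + 1) / (Wn + Wu), so unseen words never get probability 0."""
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)


def predict(tokens: List[str], model: Model) -> str:
    """Return the class with the highest log prior plus summed log likelihoods."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(class_counts[label] / total_docs)
        for word in tokens:
            score += math.log(word_probability(word, counts, len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
training = [
    ("shocking miracle cure revealed".split(), "fake"),
    ("aliens endorse miracle diet".split(), "fake"),
    ("parliament passes budget bill".split(), "real"),
    ("court hears budget appeal".split(), "real"),
]
model = train(training)
print(predict("miracle cure".split(), model))   # fake
print(predict("budget bill".split(), model))    # real
```

The denominator $P(E_2)$ of Eq. (1) is the same for every class, so it can be dropped when only the most probable class is needed.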

Fig. 4. Module Flow for Data Processing

Implementation: Data Pre-processing: All of the data must be checked thoroughly and preprocessed. First, we read the training, test, and validation data files and then perform preprocessing such as tokenizing and stemming. The data is also checked thoroughly for missing values.

Feature Extraction: For this dataset we performed feature extraction and selection using scikit-learn and Python. For feature selection we used TF-IDF. We also used word-to-vector representations to extract features, and pipelining was used to simplify the code.

Classification: The data is split into two parts, test data and training data, and the training data is grouped into classes of similar items. Each test item is then matched and assigned to the class it belongs to; the Naive Bayes classifier is applied and the probability of each word is computed independently.

Prediction: Finally, the model was saved under the name finalised_model.sav. This model is copied to the user's machine and is used by the app.py file to classify fake news accurately. It takes a news story as input from the user; the model then produces the final classification, which is shown to the user along with the probability of truth.
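The save-and-reload step can be sketched with Python's standard `pickle` module. The paper does not say how the model file is serialised, so the use of `pickle`, the file name with an underscore, and the dictionary standing in for a trained model are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: any picklable Python object works.
model = {"classes": ["fake", "real"], "class_log_prior": [-0.69, -0.69]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "finalised_model.sav")
    with open(path, "wb") as f:      # training side: save the model
        pickle.dump(model, f)
    with open(path, "rb") as f:      # application side: load it before predicting
        restored = pickle.load(f)

print(restored == model)  # True
```

Unpickling executes code embedded in the file, so a model file should only be loaded from a trusted source.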

Fig. 5. Confusion Matrix for Data Set1

Fig. 6. Confusion Matrix for Data Set2

TABLE III
ACCURACY FOR DIFFERENT DATA SETS

Results     Data Set1    Data Set2
Accuracy    0.89693      0.93216

Advantages: Using Naive Bayes, any article from a large or small dataset can be classified as fake or genuine by matching it against previous dataset values in little time, which in turn helps users decide whether to trust a particular news item. When the independence assumption holds, a Naive Bayes classifier performs better than other models such as logistic regression and needs less training data.

V. PROPOSED APPROACH

There are a few ways to identify fake news. One is to classify news manually as legitimate or fake. Although this is straightforward and seems the simplest solution, it is not feasible to label the massive amount of news that already exists and is produced daily. We must therefore look for a practical technical solution that does the same in minimal time. The other way is to develop a classification model, trained with a machine learning algorithm, that labels the data and can be queried as and when required.

Although a few algorithms exist, they are not readily available to the general public, which raises the question of what fake news detection is for. We therefore set out to create something that makes detecting fake news easier and more accessible. To that end, we developed a Google Chrome extension and linked it to a Naive Bayes-based machine learning model that runs on the server side. We chose Naive Bayes because it takes less time to train and test, and because its performance on text is comparable to that of the LSTM. All a person needs to do is copy any sentence from any website, paste it into the extension, and press the Analyse button; within seconds it shows whether the news is reliable or unreliable.

Fig. 7. Google Chrome extension

As for the dataset, we built this extension on Data Set 2, as we assumed that most users of the extension will be from India. It is therefore important to make the extension relevant to Indian politics, sports, developments, markets, and so on. With this extension, users can tell whether the information they are consuming is reliable.

Advantages:

1) Our extension returns results almost instantly because the backend process has very low time complexity.

2) It is a lightweight extension that is completely free of charge.

3) This is the first extension to detect fake news based on an Indian dataset.

4) Unlike the other extensions available on the Chrome Web Store, which give users a platform to categorise and tag whether a particular news item is true, our extension offers no such tagging platform. User tagging makes the result unreliable, because everyone categorises the data according to their own views, which affects the outcome. In our extension, the data is verified directly by the AI model running at the backend.

VI. CONCLUSION AND FUTURE WORK

Fake news spreads extraordinarily fast and can be indistinguishable from accurate reporting. Nowadays, people download information and re-share it without checking whether it is true, and by the end of the day the false information has travelled so far from its source that it becomes indistinguishable from the truth. We proposed an approach to detect fake news and false information using machine learning techniques, and we provide a tool that detects fake news using the Naive Bayes technique with high accuracy. We showed results on two data sets obtained with our tool. The tool is deployed as a Google Chrome extension. In the future, we will try to implement a model that can predict the reason for unreliability.

REFERENCES

[1] S. B. Parikh and P. K. Atrey, “Media-rich fake news detection: A survey,” in 2018 IEEE conference on multimedia information processing and retrieval (MIPR). IEEE, 2018, pp. 436–441.

[2] H. S. Al-Ash and W. C. Wibowo, “Fake news identification characteristics using named entity recognition and phrase detection,” in 2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE). IEEE, 2018, pp. 12–17.

[3] K. Stahl, “Fake news detection in social media,” California State University Stanislaus, vol. 6, pp. 4–15, 2018.

[4] A. A. A. Ahmed, A. Aljabouh, P. K. Donepudi, and M. S. Choi, “Detecting fake news using machine learning: A systematic literature review,” arXiv preprint arXiv:2102.04458, 2021.

[5] A. Jain, A. Shakya, H. Khatter, and A. K. Gupta, “A smart system for fake news detection using machine learning,” in 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), vol. 1. IEEE, 2019, pp. 1–4.

[6] H. Ahmed, I. Traore, and S. Saad, “Detecting opinion spams and fake news using text classification,” Security and Privacy, vol. 1, no. 1, p. e9, 2018.

[7] S. Shabani and M. Sokhn, “Hybrid machine-crowd approach for fake news detection,” in 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC). IEEE, 2018, pp. 299–306.

[8] S. Gupta, R. Thirukovalluru, M. Sinha, and S. Mannarswamy, “Cimtdetect: a community infused matrix-tensor coupled factorization based method for fake news detection,” in 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018, pp. 278–281.

[9] Z. Jin, J. Cao, Y.-G. Jiang, and Y. Zhang, “News credibility evaluation on microblog with a hierarchical propagation model,” in 2014 IEEE international conference on data mining. IEEE, 2014, pp. 230–239.

[10] J. Wang, P. Liu, M. F. She, S. Nahavandi, and A. Kouzani, “Bag-of-words representation for biomedical time series classification,” Biomedical Signal Processing and Control, vol. 8, no. 6, pp. 634–644, 2013.

[11] B. Srinivasa-Desikan, Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing Ltd, 2018.

[12] R. K. Kaliyar, “Fake news detection using a deep neural network,” in 2018 4th International Conference on Computing Communication and Automation (ICCCA). IEEE, 2018, pp. 1–7.

[13] M. D. Ibrishimova and K. F. Li, “A machine learning approach to fake news detection using knowledge verification and natural language processing,” in International Conference on Intelligent Networking and Collaborative Systems. Springer, 2019, pp. 223–234.

[14] S. Vijayarani, M. J. Ilamathi, M. Nithya et al., “Preprocessing techniques for text mining-an overview,” International Journal of Computer Science & Communication Networks, vol. 5, no. 1, pp. 7–16, 2015.

[15] H. Ahmed, I. Traore, and S. Saad, “Detection of online fake news using n-gram analysis and machine learning techniques,” in International conference on intelligent, secure, and dependable systems in distributed and cloud environments. Springer, 2017, pp. 127–138.

[16] Q. He, “Knowledge discovery through co-word analysis,” 1999.

[17] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, “Fndnet–a deep convolutional neural network for fake news detection,” Cognitive Systems Research, vol. 61, pp. 32–44, 2020.

[18] J. C. Reis, A. Correia, F. Murai, A. Veloso, and F. Benevenuto, “Supervised learning for fake news detection,” IEEE Intelligent Systems, vol. 34, no. 2, pp. 76–81, 2019.

[19] D. K. Sharma and S. Garg, “Ifnd: a benchmark dataset for fake news detection,” Complex & Intelligent Systems, pp. 1–21, 2021.

[20] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro, and L. de Alfaro, “Automatic online fake news detection combining content and social signals,” in 2018 22nd conference of open innovations association (FRUCT). IEEE, 2018, pp. 272–279.
