Fake News Detection using NLP

March 2023

DOI:10.1109/ICIDCA56705.2023.10100305

Conference: 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA)

Fake News Detection using NLP

Mohammed Ali Shaik
Department of Computer Science & Artificial Intelligence
SR University
Warangal, Telangana State, India
niharali@gmail.com

Thogiti Ganesh
Department of Computer Science & Engineering
SR University
Warangal, Telangana State, India
19k41a05e7@sru.edu.in

Makkaji Yasha Sree
Department of Computer Science & Engineering
SR University
Warangal, Telangana State, India
19k41a05g2@sru.edu.in

Dasari Sushmitha
Department of Computer Science & Engineering
SR University
Warangal, Telangana State, India
19k41a05c5@sru.edu.in

Sanka Sri Vyshnavi
Department of Computer Science & Engineering
SR University
Warangal, Telangana State, India
19k41a05g8@sru.edu.in

Narmetta Shreya
Department of Computer Science & Engineering
SR University
Warangal, Telangana State, India
19k41a05g5@sru.edu.in

Abstract—In the age of digital media, fake news is a serious problem because it spreads misinformation and harms individuals, organizations, and even entire nations, which makes its detection a challenging task. This study proposes a machine learning approach for detecting fake news. In the proposed approach, a classification model is developed with four different machine learning algorithms that evaluate the content and aesthetic components of news stories. The performance of the proposed model is analyzed on a large dataset of real and fake news articles, and the results show that it outperforms many existing systems. The findings demonstrate the potential of machine learning techniques such as logistic regression, decision tree, random forest, and passive aggressive algorithms to address the challenges of fake news detection.

Keywords—Machine learning, Data, Prediction model, Classification, Logistic regression, Random Forest, Decision tree, Passive Aggressive, Fake news detection.

I. INTRODUCTION

The deliberate spread of incorrect or misleading information through different media is referred to as fake news, also known as disinformation. Fake news has become a widespread issue with the rapid rise of the internet and social media, and it now poses a threat to society in many ways, including by inciting fear and distrust, influencing public opinion and decision-making, and even producing political instability. Therefore, it has become crucial for governments, media outlets, and individuals to identify and stop the spread of fake news.

This study aims to create a system that can accurately recognize false news articles. To accomplish this, we examine the content of news items to establish their veracity using machine learning algorithms and natural language processing methods [30].

The system is trained on a large dataset of news articles labeled as real or fake, and it extracts features such as the type of language used, the presence of certain keywords, and the sentiment expressed in the text. The machine learning model then uses these features to predict the authenticity of a news article.

The final product will be a reliable and accurate fake news identification system that can help halt the spread of false information and encourage the spread of true information. The system will be assessed against other current techniques for fake news identification using common measures including accuracy, precision, recall, and F1 score. By contributing a solution to the fake news problem, this paper has the potential to have a substantial impact.
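The four measures named above can be computed directly from the counts of correct and incorrect predictions. The following is a minimal sketch in plain Python, not the code used in this study; the function name, the toy labels, and the choice of "real" (1) as the positive class are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (1 = real, 0 = fake), made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class dominates the data, which is why precision, recall, and F1 are usually reported alongside it.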

II. LITERATURE REVIEW

Paper [1]: "Combining Textual and Network Features for Fake News Detection on Social Media" by S. S. Alqahtani, M. Alshomrani, and A. Alshomrani (2021). The authors used the passive aggressive (PA) algorithm to classify news articles as real or fake based on a combination of textual and network features. They evaluated their method on a dataset of real and fake news articles collected from various sources on the web and found that the PA algorithm combined with textual and network features outperformed several other methods for fake news detection [12].

Merits: The PA algorithm is simple and fast, making it well suited to fake news detection on social media. By combining textual and network features, the authors were able to improve the performance of the PA algorithm for fake news detection [13].

Demerits: The algorithm may not perform well when the data is noisy or highly unstructured, as is often the case on social media platforms [14].

Paper [2]: "An Approach for Fake News Detection using Passive Aggressive Algorithm on Social Media" by H. R. Nandini and H. B. Kavyashree (2020). The authors classified news pieces on social media as real or fake using the passive aggressive algorithm, based on characteristics such as the story's source, the presence of keywords, and the sentiment the text conveyed. They tested their methodology on a dataset of news pieces gathered from several social media networks [15] and found that the PA algorithm outperformed a number of other techniques, including Naive Bayes and Decision Tree algorithms, for fake news detection [16].

Merits: By incorporating a variety of features, the authors were able to improve the performance of the PA algorithm for fake news detection.

International Conference on Innovative Data Communication Technologies and Application (ICIDCA-2023)

IEEE Xplore Part Number: CFP23CR5-ART; ISBN: 979-8-3503-9720-8

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on June 08,2023 at 07:17:03 UTC from IEEE Xplore. Restrictions apply.

Demerits: The algorithm is sensitive to the choice of hyperparameters, and its performance can degrade if the hyperparameters are not set appropriately.

Paper [3]: "Fake News Detection using Random Forest with Sentiment Analysis" by B. K. Singh and S. Jain (2021). The authors used the random forest (RF) algorithm to classify news articles as real or fake, based on features such as the source of the news, the presence of keywords, and the sentiment expressed in the text. They evaluated their method on a dataset of news articles collected from various sources on the web [17] and found that the RF algorithm combined with sentiment analysis outperformed several other methods for fake news detection, including Naive Bayes and Logistic Regression algorithms [18].

Merits: The RF algorithm is robust and can handle complex data distributions and non-linear relationships between features and the target variable.

Demerits: The RF algorithm is computationally intensive and requires a lot of memory, making it less suitable for real-time detection of fake news on social media platforms.

Paper [4]: "A Random Forest Based Approach for Fake News Detection in Social Media" by M. J. Aslam and A. S. F. Zaidi (2019) [19]. The authors used the random forest algorithm to classify news articles on social media as real or fake, based on features such as the source of the news, the presence of keywords, and the sentiment expressed in the text. They evaluated their method on a dataset of news articles collected from various social media platforms [20] and found that the RF algorithm outperformed several other methods for fake news detection, including Naive Bayes and Logistic Regression algorithms [21].

Merits: By incorporating a variety of features, the authors were able to improve the performance of the RF algorithm for fake news detection.

Demerits: The algorithm can be prone to overfitting, especially when the data is highly unstructured or noisy.

Paper [5]: "Combating misinformation in social media with machine learning: a survey" by Nikolaos Aletras, Arkaitz Zubiaga, and David Corney (2017). The authors provide an overview of the various ML algorithms used for fake news detection, including logistic regression. They also discuss the challenges and future directions of the field, including the need for large annotated datasets and the development of robust evaluation metrics [22].

Merits: The advantages of logistic regression in this context include its simplicity, its interpretability, and its ability to handle large datasets efficiently.

Demerits: Logistic regression is not suitable for more complex problems where the relationship between the features and the target is not linear.

Paper [6]: "Fake News Detection on Social Media: A Data Mining Perspective" by Arjun Mukherjee, Dmitry Davidov, and Eugene Agichtein. This study used logistic regression to classify fake news articles based on features such as sentiment, subjectivity, and credibility of the source.

Merits: Logistic regression can handle large datasets, making it well suited to the problem of fake news detection on social media.

Demerits: Logistic regression may not perform well when the data is highly imbalanced, as in fake news detection, where the proportion of fake news is small relative to the amount of real news.

Paper [7]: "Fake News Detection Using Decision Trees and Naive Bayes" by J. Chen and J. Liu (2020) [23]. The authors classified news stories as real or fake using decision tree (DT) and Naive Bayes algorithms, based on characteristics including the news source, the presence of keywords, and the sentiment conveyed in the text. They tested their approach on a dataset of news stories gathered from diverse online sources and found that the combination of DT and Naive Bayes outperformed a number of other techniques for identifying fake news, including Logistic Regression and Random Forest algorithms [24].

Merits: DT algorithms are capable of handling complex relationships between features and target variables.

Demerits: DT algorithms are sensitive to small changes in the training data, making them unstable.

Paper [8]: "Fake News Detection Using Decision Trees and Random Forest" by Y. Zhang and L. Wang (2019). The authors classified news stories as legitimate or fraudulent using decision tree and random forest algorithms. They evaluated their approach on a dataset of news stories compiled from multiple online sources [25] and found that the DT and random forest algorithms outperformed a number of other techniques for identifying fake news, including Logistic Regression and Naive Bayes algorithms [26].

Merits: Random Forest algorithms are robust to overfitting, making them a popular choice for many classification tasks.

Demerits: DT algorithms can be prone to overfitting if not properly tuned.

III. PROBLEM DEFINITION

Given a set of news articles, the task is to accurately classify each article as either real or fake. The difficulty of defining what constitutes "fake news", together with the difficulty of detecting such news automatically, is at the heart of this problem.

Misinformation, propaganda, satire, and even conspiracy theories are all examples of fake news. It can be disseminated via a variety of media, including traditional news outlets, social media platforms, and even personal websites. The content of fake news articles can also be designed to appeal to emotions, biases, or beliefs, making them difficult to distinguish from legitimate news.

Furthermore, it has become challenging for people to distinguish between true and fake news because false information spreads so quickly online. This is particularly problematic in politics because false information has the power to influence people's opinions and actions.

As a result, identifying fake news is an urgent issue from both a sociological and a technical perspective. Advanced natural language processing methods and machine learning algorithms must be combined in order to effectively discern between authentic and false news stories.

A. DATASET

For this paper we used two datasets obtained from Kaggle: 1) News.csv and 2) Fake-Real. The dataset consists of 12 attributes [27].

The first dataset contains three attributes:

• Title
• Text
• Label (Fake or Real)

Fig. 1. Dataset attribute description

The dataset contains 6335 rows of data with three columns.

The second dataset contains four attributes:

• Title
• Text
• Subject
• Date

The dataset contains 21417 rows of real data and 23481 rows of fake data, with four columns each. The real data CSV file contains Political News and World News; the fake data CSV file contains Political News, Left News, Govt News, US News, and Middle-East News. Concatenating the real and fake news yields around 44k rows with four columns.

Fig. 2. Cleaned dataset description

The graph below shows the different types of news, such as Political News, Left News, Govt News, US News, and Middle-East News.

Fig. 3. Classification of data attributes

The graph below shows the different types of news that the real.csv file contains, such as Political News and World News.

Fig. 4. Types of data

B. DATA PRE-PROCESSING

Data preprocessing is an important step in fake news detection because it prepares the data for further analysis and modeling. The following steps are involved in data preprocessing for fake news detection:

1) Data Gathering: The first stage is to gather pertinent news articles and other important data, such as the date of publication, the topic, and the headline.

2) Data Cleaning: The gathered data must then be cleaned by eliminating extraneous information, fixing mistakes, and handling missing values. This can be achieved with strategies such as deleting stop words, lowercasing all text, and removing special characters.

3) Text Normalization: The process of putting text into a generally accepted format. This can be achieved by deleting numerals, stemming words, and changing all text to lowercase.

4) Text Tokenization: The process of breaking a sentence into its constituent words. Tools such as the Natural Language Toolkit (NLTK) or regular expressions can be used for this.

5) Feature Engineering: New features are created from the existing data in order to improve the data representation for the fake news detection model.

6) Text Vectorization: The process of transforming text into numerical data that can be fed into a machine learning model. Techniques such as bag-of-words and term frequency-inverse document frequency (TF-IDF) can be used for this.

7) Split Data into Training and Testing Sets: The preprocessed data must be divided into training and testing sets in order to properly train and test the fake news detection model.
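The cleaning, tokenization, vectorization, and splitting steps listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the pipeline used in this study: the stop-word list, the example sentences, the smoothed IDF formula, the 75/25 split, and all function names are assumptions, and stemming is left out for brevity.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}


def clean_and_tokenize(text):
    """Lowercase, drop digits and special characters, split into words, remove stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, vectors) with one dense TF-IDF vector per document."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    n_docs = len(tokenized_docs)
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    # Smoothed IDF: a term that occurs in every document still gets a positive weight.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


# Made-up example texts; 0 = fake, 1 = real.
docs = [
    "BREAKING: Aliens endorse the candidate in 2023!!!",
    "The parliament passed the budget on Tuesday.",
    "Doctors are shocked by this miracle cure",
    "The central bank is holding interest rates",
]
labels = [0, 1, 0, 1]

tokens = [clean_and_tokenize(d) for d in docs]
vocab, X = tfidf_vectors(tokens)
train, test = train_test_split(list(zip(X, labels)))
print(tokens[0], len(vocab), len(train), len(test))
```

Fixing the random seed keeps the split reproducible, so that accuracy figures obtained with different algorithms refer to the same test articles.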

A word cloud is a graphic depiction of the words that appear most frequently in a text or group of texts. The words are arranged into a cloud-like pattern, with the size of each word corresponding to how frequently it appears in the text. Less frequently used terms are shown in smaller font sizes, while the most often used words are presented in bigger font sizes. Word clouds are frequently used for summarising and presenting vast volumes of text data in text analysis and data visualisation. They can be helpful for highlighting words and phrases which are often used as well as for rapidly recognising the most crucial ideas or topics in a document. To enhance visual interest as well as convey more information, word clouds can also be altered with different colours, forms, and font styles.
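The relationship between word frequency and font size described above can be made concrete with a short sketch. It only computes the sizes and does not draw anything; the linear scaling, the size range, the headlines, and the function name are assumptions made for illustration.

```python
from collections import Counter


def word_sizes(texts, min_size=10, max_size=48, top_n=5):
    """Map the top_n most frequent words to font sizes, scaled linearly by count."""
    counts = Counter(word for text in texts for word in text.lower().split())
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in top
    }


# Made-up headlines for illustration only.
headlines = [
    "election results shock voters",
    "election fraud claims",
    "voters doubt election claims",
]
sizes = word_sizes(headlines)
print(sizes)
```

Here the most frequent word receives the largest size and the least frequent of the selected words receives the smallest, which is the behaviour visible in Figs. 5 to 8.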

Fig. 5. Word cloud of real news

Fig. 6. Word cloud of fake news

Fig. 7. Word cloud of real news of dataset-2

Fig. 8. Word cloud of fake news of dataset-2

C. ALGORITHMS

We use a variety of machine learning models, all of which are classification models, to solve our prediction problem. The top four are:

• Decision Tree [28]
• Random Forest [29]
• Passive Aggressive [31]
• Logistic Regression [27]

The data is then prepared for training and testing these machine learning models.

• Decision Tree: The decision tree algorithm belongs to the family of supervised learning algorithms. Its main purpose is to build a training model that predicts the class or value of a target variable by learning decision rules drawn from the training data. It can be applied to both regression and classification problems, although it is mostly used for classification, and it is a common classification model in data mining. Every tree is made up of nodes and branches. Each node represents a feature of the items to be classified, and each subset (branch) specifies a value that the node can take. Decision trees are used in many application areas because they are simple to analyse and accurate on several forms of data. Decision tree classifiers are praised for providing a clear view of performance results. Established decision tree algorithms (ID3 [18], C4.5 [19], CART [20], CHAID [21], and QUEST [22]) rely on optimized splitting parameters and improved tree pruning techniques to achieve high precision. The training samples are drawn from a large record set, and the choice of samples affects the precision obtained on the test set.

• Random Forest: Random Forest is an ensemble of decision trees. As described here, it imports a previously trained version of the model, trained on thousands of records from a "Laptops database". It builds up a library of additional features and is reported to reach an accuracy of 87% and an r2 score of 0.15%, the best among the compared algorithms.

• Passive Aggressive: The Passive Aggressive (PA) algorithm is an online machine learning technique for classification and regression problems. It is intended to be quick and effective, making it appropriate for real-time applications and large-scale datasets. In the PA method, a linear classifier or regressor is adjusted incrementally with each training example. The discrepancy between the predicted label or value and the actual label or value determines the update step. The update step is computed so that it changes the model's previous predictions as little as possible while still classifying the current example with a high degree of confidence. One of the important strengths of the PA algorithm is its ability to handle data instances with significant prediction errors or incorrect classifications, which occur often in real-world applications. The algorithm stays passive on examples that are classified correctly and becomes aggressive in correcting the model when a prediction error occurs.

• Logistic Regression: Logistic regression is a statistical technique for analysing a dataset in which one or more predictor variables affect an outcome. It is applied to categorical outcomes or dependent variables in classification problems. A logistic function, which generates a probability between 0 and 1, is used to represent the connection between the independent variables and the dependent variable. Given the values of the independent variables, the logistic function models the likelihood that the dependent variable (such as class membership) takes on a specific value. Due to its ease of use and interpretability, logistic regression is a common machine learning technique. It models a linear relationship between the independent variables and the log-odds of the outcome; non-linear effects can be captured only through transformed or interaction features. In its basic form it applies to binary classification; problems with multiple classes are handled by training several models and combining them.
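The passive and aggressive behaviour described above can be illustrated with a short sketch of one common variant of the algorithm (often called PA-I), which uses the hinge loss and caps the step size with a constant C. This is not the implementation used in this study; the labels are encoded as -1 (fake) and +1 (real) for convenience, and the function names, the value of C, and the toy feature vectors are assumptions.

```python
def pa_fit(samples, labels, n_features, C=1.0, epochs=5):
    """Train a linear Passive Aggressive classifier; labels must be -1 or +1."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            if loss == 0.0:
                continue  # passive: the example is already classified with enough margin
            norm_sq = sum(xi * xi for xi in x)
            if norm_sq == 0.0:
                continue  # an all-zero vector carries no information to update on
            # Aggressive: move just far enough to fix this example, capped by C.
            tau = min(C, loss / norm_sq)
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w


def pa_predict(w, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy feature vectors, made up for illustration (+1 = real, -1 = fake).
X = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0]]
y = [1, -1, 1, -1]

weights = pa_fit(X, y, n_features=3)
predictions = [pa_predict(weights, x) for x in X]
print(weights, predictions)
```

Because each update touches only the current example, new articles can be processed one at a time without retraining on the full dataset, which is what makes the method attractive for online learning.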

D. BUILDING THE MODEL

• Defining the Problem: The goal of building a fake news detection model is to determine whether a news item is authentic or fake. This is an important issue because fake news has the potential to harm individuals and society by spreading misinformation and influencing public opinion.

• Preparing the Data: The next step is to pre-process the data: gathering pertinent news articles, data cleaning, text normalisation, text tokenization, and development of new features. We gathered two distinct datasets with title, text, subject, date, and label attributes. We then consolidated the title and text into a single column named article. We removed the date column because the publication date was not useful for classification. The label column, which indicates whether a news item is fake or real, was recoded as "0" for fake news and "1" for true news.

• Selecting a Model: Out of the many machine learning methods that can be used for fake news identification, we identified four key models to train on the data: Decision Tree, Logistic Regression, Random Forest, and Passive Aggressive.

• Model Training: Using the training set, we trained each model. To do this, the pre-processed data are fed to the model so that it can learn to recognize patterns.

• Evaluating the Model: Using the testing data, we compared the predicted results to the actual results and calculated accuracy to assess the model's performance.
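The data preparation described in this section (merging title and text, dropping the date, and recoding the label) can be sketched as follows. The column names, the spelling of the label values, and the example rows are assumptions made for illustration and may differ from the actual Kaggle files.

```python
def prepare_rows(rows):
    """Merge title and text into 'article', drop 'date', and encode the label as 0/1."""
    label_map = {"FAKE": 0, "REAL": 1}
    prepared = []
    for row in rows:
        prepared.append({
            # Title and text are joined into one field; the date is simply not copied over.
            "article": f"{row['title']} {row['text']}".strip(),
            "label": label_map[row["label"].upper()],
        })
    return prepared


# Made-up rows with hypothetical column names.
raw_rows = [
    {"title": "Moon made of cheese", "text": "Scientists stunned.",
     "date": "2017-01-01", "label": "Fake"},
    {"title": "Budget approved", "text": "Parliament voted today.",
     "date": "2017-01-02", "label": "Real"},
]
prepared = prepare_rows(raw_rows)
print(prepared)
```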

IV. RESULTS AND COMPARATIVE STUDY

• Passive Aggressive: It is a linear classifier that can be used for binary or multiclass classification problems. It is known for its fast training time and its ability to handle large datasets efficiently. It is mainly used for online learning, where the model is updated continuously as new data arrive. Its main weakness is that it can be sensitive to outliers and irrelevant features, which can negatively impact its performance.

We achieved 97.86% accuracy using the PASSIVE AGGRESSIVE algorithm.

• Decision Tree: It is a straightforward but effective approach that works for both regression and classification problems. It is well suited to explaining outcomes to stakeholders who are unfamiliar with technical details because it is simple to interpret and visualise. It handles non-linear relationships between features and target variables easily, since the algorithm divides the data into progressively smaller subsets based on the features. However, when the tree is allowed to grow too deep, it is particularly prone to overfitting, which can result in poor performance on unseen data.

We achieved 95.29% accuracy using the DECISION TREE algorithm.

• Logistic Regression: It is a linear algorithm for binary and multiclass classification problems. On small to medium-sized datasets, it is quick to train and effective. It is simple to understand and offers information about how the features relate to the target variable. However, its assumption of a linear relationship between features and target variable does not necessarily hold in real-world data.

We achieved 96.65% accuracy using the LOGISTIC REGRESSION algorithm.

• Random Forest: Random forest is an ensemble learning technique used in data mining and machine learning to produce predictions based on many decision trees. Several decision trees are trained on randomly chosen data subsets, and their predictions are then combined using a weighted average or majority voting. Compared with a single decision tree, it is significantly more accurate and much less prone to overfitting, although it requires more computation.

We achieved 95.81% accuracy using the RANDOM FOREST algorithm.
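The two ingredients of a random forest mentioned above, random data subsets and majority voting, can be sketched without training any actual trees. The sketch below is illustrative only; the function names, the seed, and the list of votes are assumptions, and the individual tree predictions are simply given as input.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, giving each tree its own data subset."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the label predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(10))           # stand-in for ten training articles
subset = bootstrap_sample(rows, rng)

tree_votes = [1, 0, 1, 1, 0]     # hypothetical predictions of five trees (1 = real, 0 = fake)
forest_prediction = majority_vote(tree_votes)
print(subset, forest_prediction)
```

Averaging over many trees that have each seen a different subset is what reduces the overfitting of a single deep tree, at the cost of training and storing all of them.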

A. PREDICTIONS

ALGORITHM                   ACCURACY
Random Forest [12]          95.81%
Passive Aggressive [13]     97.86%
Logistic Regression [14]    96.65%
Decision Tree [15]          95.29%

The accuracies obtained with these models are shown in Fig. 9.

Fig. 9. Obtained Accuracies

B. ENVIRONMENT

Google Colab is a free cloud-based platform that provides access to powerful computing resources and a Jupyter notebook environment for data scientists and machine learning engineers. The platform is designed to allow users to collaborate and share their work, making it an ideal environment for developing machine learning applications, including fake news detection.

In Google Colab, users have access to GPUs and TPUs, which can greatly speed up the training of machine learning models. This is particularly useful for large and complex models that would otherwise require a lot of computational resources and time to train. With Google Colab, users can start training their models in minutes, without having to worry about setting up their own hardware or software environment.

The Jupyter notebook environment in Google Colab provides a convenient and interactive way to write, execute, and visualize code. Users can write and run their code in the browser, without having to install any software or dependencies on their local machine. The notebooks can be easily shared with others, making it easy to collaborate with team members or share the results with a wider audience.

In the context of a fake news detection paper, Google Colab can be used to train and evaluate machine learning models on large datasets, and to perform data preprocessing and feature extraction. The Jupyter notebooks can be used to document the steps taken in this paper, to record the results, and to share the findings with others.

V. CONCLUSION

To summarize, detecting fake news is a complex problem that necessitates a multidisciplinary approach. Collecting and preprocessing data, selecting and training machine learning algorithms, and fine-tuning the models for improved performance are all steps in the development of effective fake news detection models. The quality and quantity of data, as well as the algorithms and features used, all have a significant impact on the performance of fake news detection models.

Despite these obstacles, fake news detection models have the potential to make a significant contribution to the fight against misinformation. These models can help to mitigate the spread of false information and protect individuals and society from its harmful effects by automatically identifying and flagging it. Furthermore, as technology advances and machine learning algorithms become more sophisticated, fake news detection models are expected to become even more effective in detecting and combating fake news.

REFERENCES

[1] N. R. de Oliveira, D. S. V. Medeiros, and D. M. F. Matt os, ‘‘A

sensitivestylistic approach to identify fake n ews on social

networkin g,’’ IEEE SignalProcess. Lett., vol. 27, pp. 1250–

1254, 2020.

[2] A. R. Merryton an d G. Augasta, ‘‘A survey on recent advances in

machinelearning techniques for fake news detection,’’ Test Eng.

Manag, vol. 83 ,pp. 11572–11582, 2020.

[3] M. Mahyoob, J. Algaraady , and M. Alrah aili, ‘‘Linguistic-based

detectionof fake news in social media,’’ Int. J. English Linguistics,

vol. 11, no. 1,p. 99, Nov. 2020Journal of Advan ced Research in

Comput er and Communicat ion Engineering ISO 3297 :2007 Certified

Vol. 6 , Issue 12, December 2017

[4] N. R. de Oliveira, P . S. P isa, M. A. Lopez, D. S. V. de Medeiros,

andD. M. F. Matto s, ‘‘Ident ifying fake news on social networks

based onnatural language processing: T ren ds and challenges,’’

Information, v ol. 12,no. 1, p. 38, Jan. 2021.

[5] D. D. N. Caragea, D. M. Caragea and K. C. Al-Kofahi, "A Machine

Learning-based System for Det ectin g Fake News in Social Media,"

in Proceedings of the 56th Hawaii International Conference on

Sy stem Scien ces, 2 019, pp. 3 220 -3229.

[6] H. Farid an d K. Abdullah, "Fak e News Detection Based on Machine

Learning Algorith ms," in 201 9 Int ernational Conference on

Information Technology, Islam ic Republic of Iran, 2019, pp. 1-6.

[7] S. R. S. S. P rabaharan and R. Priyan ka, "Fake News Detect ion using

Machine Learning Algorithms," in 2020 International Conference on

Advances in Comput ing, Comm unications an d Informatics

(ICACCI), 2020, pp. 1190-1195.

[8] J. Kim and J. Kim, "Fake News Detect ion Using Mult i-Source

Informat ion and Machine Learning T echniques," in P roceedings of

the 2019 International Conference on Computational Science and

Comput ational Intelligence, 2019, pp. 3 29-334.

[9] Amjad M, Sido rov G, Zh ila A, Gómez-Adorno H, Voronkov I,

Gelbukh A. 2020. Bend t he truth: benchmark dataset fo r fake news

detection in Urdu language and its evaluat ion. Journal of

Int elligent & Fuzzy SystemsRegr ession Analysis,” vol. 6, no. 5.

[10] Mohammed Ali Shaik, Praveen Pappula, T Sampath Kumar,

Predict ing Hypothyroid Disease usin g Ensemble Models through

Machine Learning Approach , European Journal of Molecular &

Clinical Medicine, , 9(7), 6738-6745, (2022).

[11] M. A. Shaik, S. k. Koppula, M. Rafiuddin and B. S. Preethi,

COVID-19 Detector Usin g Deep Learning, International Conference

on Applied Artificial Intelligen ce and Computing (ICAAIC), 443-

449 ,(2022).

[12] T.Sampath Kumar, B.Manjula, Security Issue Analysis on Cloud

Comput ing Based System,Int ernational Journal of Future Generation

Communication and Networking, 12(5),143 – 150, (2019).

[13] Mohammed Ali Shaik and Dh anraj Verma, Prediction of Heart

Disease usin g Swarm Intelligence based Machine Learning

Algorithms, Int ernational Conference on Research in Sciences,

Engineering &Technology, Published by AIP Publishing. 978-0-

7354-4368-6, 2022, pp. 02002 5-1 to 020025-9.

[14] Yerrolla Chanti, Bandi Bhaskar, Nagendar Yamsani, "Li-Fi Technology Utilized In Leveraged To Power In Aviation System Entertainment Through Wireless Communication", J. Mech. Cont. & Math. Sci., Vol.-15, No.-6, June (2020) pp 405-412.

[15] Mohammed Ali Shaik and Dhanraj Verma, "Predicting Present Day Mobile Phone Sales using Time Series based Hybrid Prediction Model", International Conference on Research in Sciences, Engineering & Technology, AIP Conf. Proc. 2418, (2022) pp. 020073-1 to 020073-9.

[16] Mohammed Ali Shaik, MD. Riyaz Ahmed, M. Sai Ram and G. Ranadheer Reddy, "Imposing Security in the Video Surveillance", International Conference on Research in Sciences, Engineering & Technology, AIP Conf. Proc. 2418, (2022), pp. 020012-1 to 020012-8.

[17] Mohammed Ali Shaik, Geetha Manoharan, B Prashanth, Nune Akhil, Anumandla Akash and Thudi Raja Shekhar Reddy, "Prediction of Crop Yield using Machine Learning", International Conference on Research in Sciences, Engineering & Technology, AIP Conf. Proc. 2418, (2022), pp. 020072-1 to 020072-8.

[18] Mohammed Ali Shaik and Dhanraj Verma, Enhanced ANN training model to smooth and time series forecast, IOP Conf. Ser.: Mater. Sci. Eng. 981, (2020), pp. 022038.

[19] T. Sampath Kumar, B. Manjula, Mohammed Ali Shaik, P. Praveen, A Comprehensive Study on Single Sign on Technique, International Journal of Advanced Science and Technology (IJAST), 127, 156-162, (2019).

[20] Mohammed Ali Shaik, Dhanraj Verma, P Praveen, K Ranganath and Bonthala Prabhanjan Yadav, RNN based prediction of spatiotemporal data mining, IOP Conf. Ser.: Mater. Sci. Eng. 981, (2020), pp. 022027.

[21] T. Sampath Kumar, B. Manjula, D. Srinivas, A New Technique to Secure Data Over Cloud, Jour of Adv Research in Dynamical & Control Systems, 11, (2017), pp 145-149.

[22] Mohammed Ali Shaik and Dhanraj Verma, Deep learning time series to forecast COVID-19 active cases in INDIA: A comparative study, IOP Conf. Ser.: Mater. Sci. Eng. 981, (2020), pp. 022041.

[23] P Praveen, M Ranjith Kumar, Mohammed Ali Shaik, R Ravi Kumar and R Kiran, The comparative study on agglomerative hierarchical clustering using numerical data, IOP Conf. Ser.: Mater. Sci. Eng. 981, 2020, pp. 022071.

[24] P. Praveen, B. Rama, An Efficient Smart Search Using R Tree on Spatial Data, Journal of Advanced Research in Dynamical and Control Systems, 4, (2019), pp. 1943-1949.

[25] D Kothandaraman, N Praveena, K Varadarajkumar, B Madhav Rao, Dharmesh Dhabliya, Shivaprasad Satla, Worku Abera, Intelligent Forecasting of Air Quality and Pollution Prediction Using Machine Learning, Adsorption Science & Technology, 2022, (2022).

[26] Sallauddin Mohmmad, Ramesh Dadi, D Kothandaraman, E Sudarshan, Syed Nawaz Pasha, Mohammed Ali Shaik, A survey machine learning based object detections in an image, AIP Conference Proceedings, 2418(1), (2022), pp. 020024.

[27] A Balasundaram, D Kothandaraman, S Ashokkumar, E Sudarshan, Chest X-ray image based COVID prediction using machine learning, AIP Conference Proceedings, 2418(1), 2022, pp. 020079.

[28] Mohammed Ali Shaik, "Time Series Forecasting using Vector quantization", International Journal of Advanced Science and Technology (IJAST), 29(4), 169-175, (2020).

[29] Mohammed Ali Shaik, T. Sampath Kumar, P. Praveen, R. Vijayaprakash, "Research on Multi-Agent Experiment in Clustering", International Journal of Recent Technology and Engineering (IJRTE), 8(1S4), 1126-1129, (2019).

[30] Mohammed Ali Shaik, "A Survey on Text Classification methods through Machine Learning Methods", International Journal of Control and Automation (IJCA), 12(6), 390-396, (2019).

[31] R. Ravi Kumar, M. Babu Reddy and P. Praveen, A review of feature subset selection on unsupervised learning, Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), (2017), pp. 163-167.
