Fake News Detection using NLP

Authors:
Mohammed Ali Shaik
Department of Computer Science
& Artificial Intelligence
SR University
Warangal, Telangana State, India
niharali@gmail.com
Thogiti Ganesh
Department of Computer Science &
Engineering
SR University
Warangal, Telangana State, India
19k41a05e7@sru.edu.in
Makkaji Yasha Sree
Department of Computer Science &
Engineering
SR University
Warangal, Telangana State, India
19k41a05g2@sru.edu.in
Dasari Sushmitha
Department of Computer Science &
Engineering
SR University
Warangal, Telangana State, India
19k41a05c5@sru.edu.in
Sanka Sri Vyshnavi
Department of Computer Science &
Engineering
SR University
Warangal, Telangana State, India
19k41a05g8@sru.edu.in
Narmetta Shreya
Department of Computer Science &
Engineering
SR University
Warangal, Telangana State, India
19k41a05g5@sru.edu.in
Abstract: In the age of digital media, fake news is a serious
and challenging problem because it spreads misinformation and
harms individuals, organizations, and even entire nations.
This study proposes a machine learning approach for detecting
fake news. In the proposed approach, a classification model is
developed with four different types of machine learning
algorithms, evaluating the content and aesthetic components of
news stories. The performance of the proposed model is
analyzed using a large dataset of real and fake news articles,
and the results show that it outperforms many existing
systems. The findings demonstrate the potential of machine
learning techniques, such as logistic regression, decision
tree, random forest, and passive aggressive algorithms, to
address the challenges of fake news detection.
Keywords: Machine learning, Data, Prediction model,
Classification, Logistic regression, Random forest, Decision
tree, Passive aggressive, Fake news detection.
I. INTRODUCTION
The deliberate spread of incorrect or misleading
information through different media is referred to as fake
news, also known as disinformation. Fake news has become
a widespread issue with the rapid rise of the internet and
social media, and it now poses a threat to society in many
ways, including by inciting fear and distrust, influencing
public opinion and decision-making, and even producing
political instability. Therefore, it has become crucial for
governments, media outlets, and individuals to identify and
stop the spread of fake news.
This study aims to create a system that can accurately
recognize fake news articles. To accomplish this, we examine
the content of news items to establish their veracity using
machine learning algorithms and natural language processing
methods [30].
The system will be trained on a large dataset of news
articles labeled as real or fake, and it will extract features
such as the type of language used, the presence of certain
keywords, and the sentiment expressed in the text. The
machine learning model will then use these features to make
a prediction about the authenticity of the news article.
The final product will be a reliable and accurate fake
news identification system that can aid in halting the spread
of false information and encouraging the spread of true
information. The system will be assessed against other
current techniques for fake news identification using
common measures including accuracy, precision, recall, and
F1 score. By contributing a solution to the fake news
problem, this work has the potential to have a significant
impact.
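To make the evaluation measures named above concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed from predicted and true labels. The function name, the toy labels, and the choice of "1" (real) as the positive class are illustrative assumptions, not details taken from the paper's implementation.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only: 1 = real, 0 = fake.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class is much rarer than the other, which is why precision, recall, and F1 are usually reported alongside it.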
II. LITERATURE REVIEW
Paper [1]: "Combining Textual and Network Features
for Fake News Detection on Social Media" by S. S.
Alqahtani, M. Alshomrani, and A. Alshomrani. In this study,
the authors used the passive aggressive (PA) algorithm to
classify news articles as real or fake, based on a combination
of textual and network features. The study was published in
2021. The authors evaluated their method on a dataset of real
and fake news articles collected from various sources on the
web. They found that the PA algorithm combined with
textual and network features outperformed several other
methods for fake news detection [12].
Merits: The PA algorithm is simple and fast, making it
well-suited to the problem of fake news detection on social
media. By combining textual and network features, the
authors were able to improve the performance of the PA
algorithm for fake news detection [13].
Demerits: The algorithm may not perform well when the
data is noisy or highly unstructured, as is often the case on
social media platforms [14].
Paper [2]: "An Approach for Fake News Detection using
Passive Aggressive Algorithm on Social Media" by H. R.
Nandini and H. B. Kavyashree. In this study, the authors
classified news pieces on social media as real or fake using
the passive aggressive algorithm based on characteristics
such as the story's source, the presence of keywords, and the
sentiment the text conveyed. The study was published in 2020.
The authors tested their methodology on a dataset of news
pieces gathered from several social media networks [15].
They found that the PA algorithm outperformed a number of
other techniques, including Naive Bayes and Decision Tree
algorithms, for the detection of fake news [16].
Merits: By incorporating a variety of features, the authors
were able to improve the performance of the PA algorithm
for fake news detection.
International Conference on Innovative Data Communication Technologies and Application (ICIDCA-2023)
IEEE Xplore Part Number: CFP23CR5-ART; ISBN: 979-8-3503-9720-8
979-8-3503-9720-8/23/$31.00 ©2023 IEEE 399
2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA) | 979-8-3503-9720-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICIDCA56705.2023.10100305
Demerits: The algorithm is sensitive to the choice of
hyperparameters, and its performance can degrade if the
hyperparameters are not set appropriately.
Paper [3]: "Fake News Detection using Random
Forest with Sentiment Analysis" by B. K. Singh and S. Jain.
In this study, the authors used the random forest (RF)
algorithm to classify news articles as real or fake, based on
features such as the source of the news, the presence of
keywords, and the sentiment expressed in the text. The study
was published in 2021. The authors evaluated their method on
a dataset of news articles collected from various sources on
the web [17]. They found that the RF algorithm combined with
sentiment analysis outperformed several other methods for
fake news detection, including Naive Bayes and Logistic
Regression algorithms [18].
Merits: The RF algorithm is robust and can handle
complex data distributions and non-linear relationships
between features and the target variable.
Demerits: The RF algorithm is computationally intensive
and requires a lot of memory, making it less suitable for
real-time detection of fake news on social media platforms.
Paper [4]: "A Random Forest Based Approach for Fake
News Detection in Social Media" by M. J. Aslam and A. S.
F. Zaidi. In this study, the authors used the random forest
algorithm to classify news articles on social media as real or
fake, based on features such as the source of the news, the
presence of keywords, and the sentiment expressed in the
text. The study was published in 2019 [19]. The authors
evaluated their method on a dataset of news articles collected
from various social media platforms [20]. They found that
the RF algorithm outperformed several other methods for
fake news detection, including Naive Bayes and Logistic
Regression algorithms [21].
Merits: By incorporating a variety of features, the authors
were able to improve the performance of the RF algorithm
for fake news detection.
Demerits: The algorithm can be sensitive to overfitting,
especially when the data is highly unstructured or noisy.
Paper [5]: "Combating misinformation in social media
with machine learning: a survey" by Nikolaos Aletras,
Arkaitz Zubiaga, and David Corney (2017). The authors
provide an overview of the various ML algorithms used for
fake news detection, including logistic regression. They also
discuss the challenges and future directions of the field,
including the need for large annotated datasets and the
development of robust evaluation metrics [22].
Merits: The advantages of logistic regression in this context
include its simplicity, its interpretability, and its ability
to handle large datasets efficiently.
Demerits: Logistic regression is not suitable for more
complex problems where the relationship between the
features and the target is not linear.
Paper [6]: "Fake News Detection on Social Media: A
Data Mining Perspective" by Arjun Mukherjee, Dmitry
Davidov, and Eugene Agichtein. This study used logistic
regression to classify fake news articles based on features
such as sentiment, subjectivity, and credibility of the source.
Merits: Logistic regression can handle large datasets,
making it well-suited to the problem of fake news detection
on social media.
Demerits: Logistic regression may not perform well when
the data is highly imbalanced, such as in the case of fake
news detection, where the proportion of fake news is small
relative to the amount of real news.
Paper [7]: "Fake News Detection Using Decision Trees
and Naive Bayes" by J. Chen and J. Liu. In this study, the
authors classified news stories as real or fake using decision
tree (DT) and Naive Bayes algorithms based on characteristics
including the news source, the presence of keywords, and
the sentiment conveyed in the text. The study was published
in 2020 [23]. The authors tested their methods on a dataset of
news stories gathered from diverse online sources.
They found that the combination of DT and Naive Bayes
algorithms outperformed a number of other techniques for
identifying fake news, including Logistic Regression and
Random Forest algorithms [24].
Merits: DT algorithms are capable of handling complex
relationships between features and target variables.
Demerits: DT algorithms are sensitive to small changes in
the training data, making them unstable.
Paper [8]: "Fake News Detection Using Decision Trees
and Random Forest" by Y. Zhang and L. Wang. In this
study, the authors classified news stories as real or fake
using decision tree and random forest algorithms. The study
was published in 2019. The authors evaluated their approach
on a dataset of news stories compiled from multiple online
sources [25]. They found that the DT and random forest
algorithms outperformed a number of other techniques for
identifying fake news, including Logistic Regression and
Naive Bayes algorithms [26].
Merits: Random Forest algorithms are robust to
overfitting, making them a popular choice for many
classification tasks.
Demerits: DT algorithms can be prone to overfitting if
not properly tuned.
III. PROBLEM DEFINITION
The task is to accurately classify each news article as
either real or fake, given a set of news articles. The difficulty
in defining what constitutes "fake news," as well as the
difficulty in automatically detecting such news, is at the heart
of this problem.
Misinformation, propaganda, satire, and even conspiracy
theories are all examples of fake news. It can be
disseminated via a variety of media, including traditional
news outlets, social media platforms, and even personal
websites. The content of fake news articles can also be
designed to appeal to emotions, biases, or beliefs, making them
difficult to distinguish from legitimate news.
Furthermore, it has become challenging for people to
distinguish between true and fake news due to the quick
dissemination of false information online. This is particularly
problematic in politics because false information has the
power to influence people's opinions and actions.
As a result, the issue of identifying fake news is urgent
from both a sociological and a technical perspective.
Advanced natural language processing methods and machine
learning algorithms must be combined in order to effectively
distinguish between authentic and fake news stories.
A. DATASET
For this paper we used two datasets obtained from
Kaggle:
1) News.csv
2) Fake-Real
The dataset consists of 12 attributes [27].
The first dataset contains three attributes, which are
Title
Text
Label (Fake or Real)
Fig. 1. Dataset attribute description
The dataset contains 6335 rows of data with three
columns.
The second dataset contains four attributes, which are
Title
Text
Subject
Date
The dataset contains 21417 rows of real data and 23481
rows of fake data, with four columns each. The real data CSV
file contains Political News and World News; the fake data
CSV file contains Political News, Left News, Govt News, US
News, and Middle-East News. By concatenating the real and
fake news, we obtain 44898 rows (around 44k) with four columns.
Fig. 2. Cleaned dataset description
The graph below shows the different types of news, such as
Political News, Left News, Govt News, US News, and
Middle-East News.
Fig. 3. Classification of data attributes
The graph below shows the different types of news that the
real.csv file contains, such as Political News and World News.
Fig. 4. Types of data
B. DATA PRE-PROCESSING
Data preprocessing is an important step in the fake news
detection process as it helps to prepare the data for further
analysis and modeling. The following are the steps involved
in data preprocessing for fake news detection:
1) Data Gathering: The first stage is to gather pertinent
news articles and other important data, such as the date of
publication, the topic, and the headline.
2) Data Cleaning: The gathered data must then be
cleaned by eliminating extraneous information, fixing
mistakes, and handling missing values. This can be
achieved by employing strategies like deleting stop words,
lowercasing all text, and removing special characters.
3) Text Normalization: This is the process of putting text
into a generally accepted standard format. This can be
achieved by deleting numerals, stemming words, and changing
all text to lowercase.
4) Text Tokenization: This is the method of breaking a
sentence down into its constituent words. Tools like the
Natural Language Toolkit (NLTK) or regular expressions can be
used for this.
5) Feature Engineering: In order to improve the data
representation for the fake news detection model, new
features are created from the existing data.
6) Text Vectorization: This is the process of transforming
text into numerical data that can be fed into a machine
learning model. Techniques like "bag-of-words" analysis and
term frequency-inverse document frequency (TF-IDF) can be
used for this.
7) Split Data into Training and Testing Sets: The
preprocessed data must be divided into training and testing
sets in order to properly train and test the fake news
detection model.
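The steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' pipeline: the stop-word list is deliberately tiny, stemming is omitted, the TF-IDF variant (relative term frequency times log(N/df) + 1) is one of several common definitions, and the function names and sample headlines are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real pipelines use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}


def preprocess(text):
    """Lowercase, remove numerals and special characters, tokenize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs_tokens):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    vocab = sorted({tok for doc in docs_tokens for tok in doc})
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for doc in docs_tokens for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs_tokens:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle deterministically and split into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) - round(len(items) * test_ratio)
    return items[:cut], items[cut:]


docs = [
    "Senate passes the budget bill in 2023!",
    "SHOCKING: aliens wrote the budget bill",
]
tokens = [preprocess(d) for d in docs]
vocab, vectors = tfidf(tokens)
train, test = train_test_split(range(10))
print(tokens[0], len(vocab), len(train), len(test))
```

Terms that occur in every document (here "budget" and "bill") receive the lowest weight, while terms specific to one document are weighted more heavily, which is the property that makes TF-IDF useful for separating classes of articles.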
A word cloud is a graphic depiction of the words that
appear most frequently in a text or group of texts. The words
are arranged into a cloud-like pattern, with the size of each
word corresponding to how frequently it appears in the text.
Less frequently used terms are shown in smaller font sizes,
while the most often used words are presented in bigger font
sizes. Word clouds are frequently used for summarising and
presenting vast volumes of text data in text analysis and data
visualisation. They can be helpful for highlighting words and
phrases which are often used as well as for rapidly
recognising the most crucial ideas or topics in a document.
To enhance visual interest as well as convey more
information, word clouds can also be altered with different
colours, forms, and font styles.
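The core of a word cloud is the mapping from word frequency to font size. The sketch below shows one simple way to compute it, using linear scaling between a minimum and maximum size; the function name, the size range, and the sample tokens are assumptions for illustration, and the actual layout and rendering are left to a plotting library.

```python
from collections import Counter


def word_sizes(tokens, top_n=5, min_size=10, max_size=40):
    """Map the top_n most frequent tokens to font sizes by linear scaling."""
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = (highest - lowest) or 1  # avoid division by zero when all counts match
    return {
        word: min_size + (freq - lowest) * (max_size - min_size) / span
        for word, freq in counts
    }


tokens = "budget vote budget senate vote budget tax".split()
sizes = word_sizes(tokens)
print(sizes)
```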
Fig. 5. Word cloud of real news
Fig. 6. Word cloud of fake news
Fig. 7. Word cloud of real news of dataset-2
Fig. 8. Word cloud of fake news of dataset-2
C. ALGORITHMS
We use a variety of machine learning models, all of
which are supervised classification models, to solve our
prediction problem. The top four are:
Decision Tree [28]
Random Forest [29]
Passive Aggressive [31]
Logistic Regression [27].
Next, we prepare our data for training and testing the
machine learning models.
Decision Tree: The decision tree algorithm belongs to
the family of supervised learning algorithms. Its main
purpose is to build a training model that can be used
to predict the class or value of target variables by
learning decision rules drawn from training data. The
decision tree technique can be used to solve both
regression and classification problems, although it is
mostly used for classification, and it is a common
classification model in data mining. Every tree is
made up of nodes and branches. Each node represents a
feature of the items to be classified, and each subset
specifies a value that the node can take. Decision
trees have found many fields of application due to
their simple analysis and their accuracy on several
forms of data. Decision tree classifiers are
praised for providing a clear view of
performance results. Because of their high precision,
optimized splitting parameters and improved tree
pruning techniques (ID3 [18], C4.5 [19], CART [20],
CHAID [21], and QUEST [22]) are commonly employed by
all known data classifiers. Separate datasets are used
to draw training samples from a large record set,
which in turn affects the precision on the test set.
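To illustrate how a decision tree learns a decision rule from training data, the following sketch searches for the best threshold on a single numeric feature using Gini impurity, the splitting criterion associated with CART. The feature (number of exclamation marks per article) and the toy values are hypothetical, and a full tree would apply this search recursively over all features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values.

    Returns (threshold, weighted_gini) of the best split, or
    (None, gini(labels)) if no split improves on the unsplit node.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical feature: exclamation marks per article; 1 = real, 0 = fake.
exclamations = [0, 1, 0, 4, 5, 6]
labels = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(exclamations, labels)
print(threshold, impurity)
```

On this toy data the rule "exclamations <= 2.5" separates the two classes perfectly, so the weighted impurity after the split drops from 0.5 to 0.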
Random Forest: In order to import a previously
trained version of the network used for having to
implement training over thousands of Laptops data,
Random Forest, an ensemble of decision trees, uses a
"Laptops database." As a result, it will build up a
library of additional features that denotes accuracy of
87% and r2 score is 0.15%, which are best when
compared to other algorithms.
Passive Aggressive: The Passive Aggressive (PA)
algorithm is an online machine learning technique for
classification and regression problems. The technique
is intended to be quick and effective, making it
appropriate for real-time applications and large-scale
datasets. In the PA method, a linear classifier or
regressor is incrementally adjusted with each training
example. The update step is determined by the
discrepancy between the predicted label or value and
the actual label or value. The update is chosen to
change the model as little as possible, so that it has
a limited effect on prior predictions, while still
ensuring a confident, correct prediction for the
current example. One of the important strengths of the
PA algorithm is its capability to manage data
instances with significant prediction errors or
incorrect classifications, which may occur often in
real-world applications. The algorithm is intended to
be passive for examples that are correctly classified
while being more aggressive in correcting prediction
errors.
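The passive and aggressive behaviour described above can be seen in a single update step. The sketch below implements the PA-I variant for binary classification with hinge loss, which is one standard formulation and is an assumption here, since the paper does not state which variant it used. Labels are encoded as +1 and -1 (rather than the 1 and 0 used elsewhere in this paper), and the aggressiveness parameter C and the toy vectors are illustrative.

```python
def pa_update(weights, x, y, C=1.0):
    """One PA-I step for a linear classifier; y must be +1 or -1.

    Returns a new weight list and leaves the input unchanged.
    """
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return list(weights)       # passive: margin is already large enough
    norm_sq = sum(xi * xi for xi in x)
    if norm_sq == 0.0:
        return list(weights)       # an all-zero example carries no information
    tau = min(C, loss / norm_sq)   # aggressive: step size, capped by C
    return [w + tau * y * xi for w, xi in zip(weights, x)]


w0 = [0.0, 0.0]
w1 = pa_update(w0, [1.0, 0.0], +1)   # loss > 0, so the weights move
w2 = pa_update(w1, [2.0, 0.0], +1)   # margin >= 1, so nothing changes
w3 = pa_update(w2, [1.0, 1.0], -1)   # misclassified, so a corrective step
print(w1, w2, w3)
```

Because each step touches only one example, the model can be updated as new articles arrive, which is what makes the method suitable for online learning.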
Logistic Regression: Logistic regression is a
statistical technique for analysing a dataset in which
one or more predictor variables determine an outcome.
It is applied to categorical outcomes (dependent
variables) in classification problems. Logistic
regression uses a logistic function, which generates a
probability between 0 and 1, to represent the
relationship between the independent variables and the
dependent variable. Given the values of the
independent variables, the logistic function models
the probability that the dependent variable (such as
class membership) takes on a specific value. Due to
its ease of use and interpretability, logistic
regression is a common machine learning technique. It
is very simple to use, but it models a linear
relationship between the independent variables and the
log-odds of the outcome, so non-linear relationships
require transformed features. In addition, it can only
be applied to binary classification, or to multi-class
problems when several models are trained and then
combined.
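As a small worked example of the logistic function, the sketch below turns a weighted sum of feature values into a probability and then into a class label. The weights, bias, feature vector, and the 0.5 decision threshold are hypothetical values chosen for illustration, not parameters learned in this study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability of the positive class for feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


# Hypothetical parameters and features (assumptions for illustration).
weights, bias = [1.5, -2.0], 0.0
p = predict_proba(weights, bias, [1.0, 0.25])  # z = 1.5 - 0.5 = 1.0
label = 1 if p >= 0.5 else 0                   # 1 = real, 0 = fake
print(round(p, 4), label)
```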
D. BUILDING THE MODEL
Defining the Problem: The goal of building a fake
news detection model is to determine whether a
news item is authentic or fake. This is an important
issue because fake news has the potential to harm
individuals and society by spreading
misinformation and influencing public opinion.
Preparing the Data: The next step is to pre-process
the data in order to prepare it. This includes
gathering pertinent news articles, data cleaning, text
normalisation, text tokenization, and the development
of new features. We gathered two distinct
datasets with title, text, subject, date, and
label attributes. We then consolidated the title and
text into a single column, designated as the article.
We removed the date column because the article's
publication date was not useful. The labels that
indicate whether the news is fake or real were
changed to "0" for fake news and "1" for real news.
We prepared the data in this way.
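The data preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the rows are plain dictionaries with lowercase keys (title, text, subject, date), the sample rows are invented, and keeping the subject column is a guess, since the text only states that the date column was removed.

```python
def prepare(records, label):
    """Merge title and text into 'article', drop 'date', attach a numeric label."""
    prepared = []
    for row in records:
        prepared.append({
            "article": f"{row['title']} {row['text']}".strip(),
            "subject": row.get("subject", ""),
            "label": label,  # 0 = fake, 1 = real
        })
    return prepared


# Invented sample rows, for illustration only.
real_rows = [{"title": "Budget passed", "text": "The senate voted today.",
              "subject": "politicsNews", "date": "2017-01-01"}]
fake_rows = [{"title": "Aliens wrote the budget", "text": "Sources say so.",
              "subject": "News", "date": "2017-01-02"}]

dataset = prepare(fake_rows, 0) + prepare(real_rows, 1)
print(dataset[0]["article"], dataset[0]["label"])
```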
Selecting a Model: Out of the many machine learning
methods that can be used for fake news
identification, we identified four key models to
train on the data: Decision Tree, Logistic
Regression, Random Forest, and Passive Aggressive.
Model Training: We trained each model using the
training set. To do this, the pre-processed data is
fed to the model so that it can learn to recognize
patterns.
Evaluating the Model: Using the testing data, we
compared the predicted results to the actual results
and calculated accuracy to assess the model's
performance.
IV. RESULTS AND COMPARATIVE STUDY
Passive Aggressive: It is a linear classifier algorithm
and can be used for binary or multiclass
classification problems. It is known for its fast
training time and ability to handle large datasets
efficiently. It is mainly used for online learning,
where new data can continuously be added and the
model updated. Its main weakness is that it can
be sensitive to outliers and irrelevant features,
which can negatively impact its performance.
We achieved 97.86% accuracy using the PASSIVE
AGGRESSIVE algorithm.
Decision Tree: It is a straightforward but effective
approach that works for both regression and
classification problems. It is well suited to
describing outcomes to stakeholders who are unfamiliar
with technical nuances because it is simple to
interpret and visualise. It handles non-linear
relationships between features and target
variables easily, since the algorithm divides the data
into progressively smaller subsets based on the
features. However, when the tree is allowed to grow
too deep, it is particularly prone to overfitting,
which can result in poor performance on
unseen data.
We achieved 95.29% accuracy using the
DECISION TREE algorithm.
Logistic Regression: It is a linear algorithm for
binary and multi-class classification problems. With
small to medium-sized datasets, it is quick to train
and effective. It is simple to
understand and offers information about how
features relate to the target variable. However, the
assumption that features and target variables
have a linear relationship may not
necessarily hold in real-world data.
We achieved 96.65% accuracy using the
LOGISTIC REGRESSION algorithm.
Random Forest: Random forest is an ensemble learning
technique, used in data mining and machine learning,
that produces predictions based on many decision
trees. Several decision trees are
trained using randomly chosen data subsets, and
then their predictions are combined using weighted
averaging or majority voting. Compared to a single
decision tree, it is significantly more
accurate and much less prone to overfitting,
although it requires more computation.
We achieved 95.81% accuracy using the RANDOM
FOREST algorithm.
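The two ingredients of a random forest described above, training on randomly chosen data subsets and combining predictions by majority voting, can be sketched as follows. The three "trees" are hand-written rules standing in for trained decision trees, and the feature names and thresholds are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as used to train each tree."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most common prediction among the trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Ask every tree for a label and combine the answers by majority vote."""
    return majority_vote([tree(x) for tree in trees])


# Stand-in "trees": simple rules instead of trained models (assumption).
trees = [
    lambda x: 0 if x["exclamations"] > 2 else 1,
    lambda x: 0 if x["caps_ratio"] > 0.3 else 1,
    lambda x: 0 if x["length"] < 50 else 1,
]
article = {"exclamations": 5, "caps_ratio": 0.1, "length": 20}
prediction = forest_predict(trees, article)  # votes: 0, 1, 0
sample = bootstrap_sample(list(range(10)), random.Random(0))
print(prediction, sample)
```

Using an odd number of trees avoids ties in a two-class problem; individual trees may disagree, but the vote smooths out their individual errors.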
A. PREDICTIONS
ALGORITHM                   ACCURACY
Random Forest [12]          95.81%
Passive Aggressive [13]     97.86%
Logistic Regression [14]    96.65%
Decision Tree [15]          95.29%
We have used these models, and the accuracies obtained are
shown in Fig. 9.
Fig. 9. Obtained accuracies
B. ENVIRONMENT
Google Colab is a free cloud-based platform that provides
access to powerful computing resources and a Jupyter
notebook environment for data scientists and machine
learning engineers. The platform is designed to allow users
to collaborate and share their work, making it an ideal
environment for developing machine learning applications,
including fake news detection.
In Google Colab, users have access to GPUs and TPUs,
which can greatly speed up the training of machine learning
models. This is particularly useful for large and complex
models that would otherwise require a lot of computational
resources and time to train. With Google Colab, users can
start training their models in minutes, without having to
worry about setting up their own hardware or software
environment.
The Jupyter notebook environment in Google Colab
provides a convenient and interactive way to write, execute,
and visualize code. Users can write and run their code in the
browser, without having to install any software or
dependencies on their local machine. The notebooks can be
easily shared with others, making it easy to collaborate with
team members or share the results with a wider audience.
In the context of a fake news detection paper, Google Colab
can be used to train and evaluate machine learning models
on large datasets, and to perform data preprocessing and
feature extraction. The Jupyter notebooks can be used to
document the steps taken in this paper, to record the results,
and to share the findings with others.
V. CONCLUSION
To summarize, detecting fake news is a complex problem
that necessitates a multidisciplinary approach. Collecting
and preprocessing data, selecting and training machine
learning algorithms, and fine-tuning the models for
improved performance are all steps in the development of
effective fake news detection models. The quality and
quantity of data, as well as the algorithms and features used,
all have a significant impact on the performance of fake
news detection models.
Despite these obstacles, fake news detection models have
the potential to make a significant contribution to the fight
against misinformation. These models can help to mitigate
the spread of false information and protect individuals and
society from its harmful effects by automatically identifying
and flagging it. Furthermore, as technology advances and
machine learning algorithms become more sophisticated,
fake news detection models are expected to become even
more effective in detecting and combating fake news.
REFERENCES
[1] N. R. de Oliveira, D. S. V. Medeiros, and D. M. F. Mattos, "A
sensitive stylistic approach to identify fake news on social
networking," IEEE Signal Process. Lett., vol. 27, pp. 1250-1254,
2020.
[2] A. R. Merryton and G. Augasta, "A survey on recent advances in
machine learning techniques for fake news detection," Test Eng.
Manag., vol. 83, pp. 11572-11582, 2020.
[3] M. Mahyoob, J. Algaraady, and M. Alrahaili, "Linguistic-based
detection of fake news in social media," Int. J. English Linguistics,
vol. 11, no. 1, p. 99, Nov. 2020. Journal of Advanced Research in
Computer and Communication Engineering, ISO 3297:2007 Certified,
Vol. 6, Issue 12, December 2017.
[4] N. R. de Oliveira, P. S. Pisa, M. A. Lopez, D. S. V. de Medeiros,
and D. M. F. Mattos, "Identifying fake news on social networks
based on natural language processing: Trends and challenges,"
Information, vol. 12, no. 1, p. 38, Jan. 2021.
[5] D. D. N. Caragea, D. M. Caragea and K. C. Al-Kofahi, "A Machine
Learning-based System for Detecting Fake News in Social Media,"
in Proceedings of the 56th Hawaii International Conference on
System Sciences, 2019, pp. 3220-3229.
[6] H. Farid and K. Abdullah, "Fake News Detection Based on Machine
Learning Algorithms," in 2019 International Conference on
Information Technology, Islamic Republic of Iran, 2019, pp. 1-6.
[7] S. R. S. S. Prabaharan and R. Priyanka, "Fake News Detection using
Machine Learning Algorithms," in 2020 International Conference on
Advances in Computing, Communications and Informatics
(ICACCI), 2020, pp. 1190-1195.
[8] J. Kim and J. Kim, "Fake News Detection Using Multi-Source
Information and Machine Learning Techniques," in Proceedings of
the 2019 International Conference on Computational Science and
Computational Intelligence, 2019, pp. 329-334.
[9] Amjad M, Sidorov G, Zhila A, Gómez-Adorno H, Voronkov I,
Gelbukh A. 2020. Bend the truth: benchmark dataset for fake news
detection in Urdu language and its evaluation. Journal of
Intelligent & Fuzzy Systems. Regression Analysis, vol. 6, no. 5.
[10] Mohammed Ali Shaik, Praveen Pappula, T Sampath Kumar,
Predicting Hypothyroid Disease using Ensemble Models through
Machine Learning Approach, European Journal of Molecular &
Clinical Medicine, 9(7), 6738-6745, (2022).
[11] M. A. Shaik, S. k. Koppula, M. Rafiuddin and B. S. Preethi,
COVID-19 Detector Using Deep Learning, International Conference
on Applied Artificial Intelligence and Computing (ICAAIC),
443-449, (2022).
[12] T. Sampath Kumar, B. Manjula, Security Issue Analysis on Cloud
Computing Based System, International Journal of Future Generation
Communication and Networking, 12(5), 143-150, (2019).
[13] Mohammed Ali Shaik and Dhanraj Verma, Prediction of Heart
Disease using Swarm Intelligence based Machine Learning
Algorithms, International Conference on Research in Sciences,
Engineering & Technology, Published by AIP Publishing,
978-0-7354-4368-6, 2022, pp. 020025-1 to 020025-9.
[14] Yerrolla Chanti, Bandi Bhaskar, Nagendar Yamsani, "Li-Fi
Technology Utilized In Leveraged To Power In Aviation System
Entertainment Through Wireless Communication", J. Mech. Cont. &
Math. Sci., Vol.-15, No.-6, June (2020) pp 405-412.
[15] Mohammed Ali Shaik and Dhanraj Verma, "Predicting Present Day
Mobile Phone Sales using Time Series based Hybrid Prediction
Model", International Conference on Research in Sciences,
Engineering & Technology, AIP Conf. Proc. 2418, (2022), pp.
020073-1 to 020073-9.
[16] Mohammed Ali Shaik, MD. Riyaz Ahmed, M. Sai Ram and G.
Ranadheer Reddy, "Imposing Security in the Video Surveillance",
International Conference on Research in Sciences, Engineering &
Technology, AIP Conf. Proc. 2418, (2022), pp.
020012-1 to 020012-8.
[17] Mohammed Ali Shaik, Geetha Manoharan, B Prashanth, Nune Akhil,
Anumandla Akash and Thudi Raja Shekhar Reddy, "Prediction of
Crop Yield using Machine Learning", International Conference on
Research in Sciences, Engineering & Technology, AIP Conf. Proc.
2418, (2022), pp. 020072-1 to 020072-8.
[18] Mohammed Ali Shaik and Dhanraj Verma, Enhanced ANN training
model to smooth and time series forecast, IOP Conf. Ser.: Mater.
Sci. Eng. 981, (2020), pp. 022038.
[19] T. Sampath Kumar, B. Manjula, Mohammed Ali Shaik, P. Praveen,
A Comprehensive Study on Single Sign on Technique, International
Journal of Advanced Science and Technology (IJAST), 127,
156-162, (2019).
[20] Mohammed Ali Shaik, Dhanraj Verma, P Praveen, K Ranganath and
Bonthala Prabhanjan Yadav, RNN based prediction of
spatiotemporal data mining, IOP Conf. Ser.: Mater. Sci. Eng. 981,
(2020), pp. 022027.
[21] T. Sampath Kumar, B. Manjula, D. Srinivas, A New Technique to
Secure Data Over Cloud, Jour of Adv Research in Dynamical &
Control Systems, 11, (2017), pp. 145-149.
[22] Mohammed Ali Shaik and Dhanraj Verma, Deep learning time series
to forecast COVID-19 active cases in INDIA: A comparative study,
IOP Conf. Ser.: Mater. Sci. Eng. 981, (2020), pp. 022041.
[23] P Praveen, M Ranjith Kumar, Mohammed Ali Shaik, R Ravi Kumar
and R Kiran, The comparative study on agglomerative hierarchical
clustering using numerical data, IOP Conf. Ser.: Mater. Sci. Eng.
981, 2020, pp. 022071.
[24] P. Praveen, B. Rama, An Efficient Smart Search Using R Tree on
Spatial Data, Journal of Advanced Research in Dynamical and
Control Systems, 4, (2019), pp. 1943-1949.
[25] D Kothandaraman, N Praveena, K Varadarajkumar, B Madhav Rao,
Dharmesh Dhabliya, Shivaprasad Satla, Worku Abera, Intelligent
Forecasting of Air Quality and Pollution Prediction Using Machine
Learning, Adsorption Science & Technology, 2022, (2022).
[26] Sallauddin Mohmmad, Ramesh Dadi, D Kothandaraman, E
Sudarshan, Syed Nawaz Pasha, Mohammed Ali Shaik, A survey
machine learning based object detections in an image, AIP
Conference Proceedings, 2418(1), (2022), pp. 020024.
[27] A Balasundaram, D Kothandaraman, S Ashokkumar, E Sudarshan,
Chest X-ray image based COVID prediction using machine learning,
AIP Conference Proceedings, 2418(1), 2022, pp. 020079.
[28] Mohammed Ali Shaik, “Time Series Forecasting using Vector
quantization”, International Journal of Advanced Science and
Technology (IJAST), 29(4), 169-175, (2020).
[29] Mohammed Ali Shaik, T. Sampath Kumar, P. Praveen, R.
Vijayaprakash, “Research on Multi-Agent Experiment in Clustering”,
International Journal of Recent Technology and Engineering
(IJRTE), 8(1S4), 1126-1129, (2019).
[30] Mohammed Ali Shaik, “A Survey on Text Classification methods
through Machine Learning Methods”, International Journal of
Control and Automation (IJCA), 12(6), 390-396, (2019).
[31] R. Ravi Kumar, M. Babu Reddy and P. Praveen, A review of feature
subset selection on unsupervised learning, Third International
Conference on Advances in Electrical, Electronics, Information,
Communication and Bio-Informatics (AEEICB), (2017), pp.
163-167.