Computer Vision Technique to Detect Accidents

Abstract

Detecting accidents in smart cities is a challenging day-to-day task. It is hard for traffic police to control: the police cannot be available 24 × 7, and as a result many accidents pass by unnoticed. Many people lose their lives for lack of timely first-aid support from a hospital, and it takes at least 5 min to pass accident information to the hospital. To overcome this problem, we use a computer vision technique to identify an accident at a specific location, and messages are passed automatically to the nearby hospital. When an accident is detected, the local hospital and patrol are intimated by Gmail, or else by SMS, through SMTP so that they can take the necessary action. Using deep learning techniques, we are able to achieve a promising solution to this problem.

Keywords: Deep learning · CCTV surveillance footage · Faster R-CNN · Alarm system · ZF · VGG-16
Smart Innovation, Systems and Technologies
Volume 315
Series Editors
Robert J. Howlett, Bournemouth University and KES International,
Shoreham-by-Sea, UK
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics
of knowledge, intelligence, innovation and sustainability. The aim of the series is to
make available a platform for the publication of books on all aspects of single and
multi-disciplinary research on these themes in order to make the latest results available in a readily accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought.
The series covers systems and paradigms that employ knowledge and intelligence
in a broad sense. Its scope is systems having embedded knowledge and intelligence,
which may be applied to the solution of world problems in industry, the environment
and the community. It also focusses on the knowledge-transfer methodologies and
innovation strategies employed to make this happen effectively. The combination
of intelligent systems tools and a broad range of applications introduces a need
for a synergy of disciplines from science, technology, business and the humanities.
The series will include conference proceedings, edited collections, monographs,
handbooks, reference books, and other relevant types of book in areas of science and
technology where smart systems and technologies can offer innovative solutions.
High quality content is an essential feature for all book proposals accepted for the
series. It is expected that editors of all accepted volumes will ensure that contributions
are subjected to an appropriate level of reviewing process and adhere to KES quality
principles.
Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH,
Japanese Science and Technology Agency (JST), SCImago, DBLP.
All books published in the series are submitted for consideration in Web of Science.
B. Narendra Kumar Rao · R. Balasubramanian · Shiuh-Jeng Wang · Richi Nayak
Editors
Intelligent Computing
and Applications
Proceedings of ICDIC 2020
Editors
B. Narendra Kumar Rao
Department of Computer Science
and Engineering
Sree Vidyanikethan Engineering College
Tirupati, India
Shiuh-Jeng Wang
Department of Information Management
Central Police University
Taoyuan, Taiwan
R. Balasubramanian
Department of Computer Science
and Engineering
Indian Institute of Technology Roorkee
Roorkee, India
Richi Nayak
School of Electrical Engineering
and Computer Science
Queensland University of Technology
Brisbane, QLD, Australia
ISSN 2190-3018 ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-981-19-4161-0 ISBN 978-981-19-4162-7 (eBook)
https://doi.org/10.1007/978-981-19-4162-7
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Conference Committee
International Conference on Data Analytics, Intelligent Computing, and Cyber
Security (ICDIC 2020)
Department of Computer Science and Engineering
Sree Vidyanikethan Engineering College
(Affiliated to JNTUA, Anantapuramu)
Sree Sainath Nagar, Tirupathi-517102, India
Chief Patrons
Dr. M. Mohan Babu, Chairman, SVET
Mr. Vishnu Manchu, CEO, SVET
Patrons
Dr. L. Venu Gopal Reddy, Advisor and Director, SVET
Dr. P. Giridhara Reddy, Director, Academics and Research, SVEC
Dr. B. M. Satish, Principal, SVEC
Advisors
Dr. T. Nageswara Prasad, Vice Principal, SVEC
Dr. P. V. Ramana, Dean Academics, SVEC
Dr. B. Raveendra Babu, Dean, CSE, SVEC
International Technical Committee
Dr. Mathew Dailey, Asian Institute of Technology, Thailand
Dr. Lakshmi C. Jain, University of Technology, Sydney, Australia
Dr. Farook Hussain, University of Technology, Sydney, Australia
Dr. Basim Alhadidi, Al-Balqa’ Applied University, Amman, Jordan
Dr. Habibollah Haron, Qaiwan International University, Iraq
Dr. Sam Goundar, Victoria University of Wellington, New Zealand
Dr. Marjan Kuchaki Rafsanjan, Shahid Bahonar University of Kerman, Kerman, Iran
Dr. M. S. Mekala, Yeungnam University, Gyeongsan, Korea
Dr. T. V. Ramana, TVET, Addis Ababa 1000, Ethiopia
Dr. Srinivas Nowduri, USA
Dr. Siva Ram Rajeyyagari, Shaqra University, Shaqra
National Technical Committee
Dr. R. Balasubramanian, IIT Roorkee, India (Chairman)
Dr. R. B. V. Subramanyam, NIT Warangal
Dr. V. K. Govindham, NIT Calicut
Dr. P. Viswanadh, IIIT Sricity
Dr. Devender Gurjat, NIT Silchar
Dr. K. Jairam Naik, NIT Raipur
Dr. G. C. Nandi, IIIT, Allahabad, Jhalwa, Uttar Pradesh
Dr. Deepak Garg, Bennett University, Greater Noida, Uttar Pradesh
Dr. R. Jagadish Kannan, VIT, Chennai
Dr. A. Rama Mohan Reddy, SV University, Tirupati
Dr. A. Vinaya Babu, JNTUH, Hyderabad
Dr. S. Viswanadha Raju, JNTUHCEJ, Jagityal, Karimnagar
Dr. B. Eswar Reddy, JNTUA College of Engineering, Kalikiri
Dr. P. Chenna Reddy, JNTUA, Ananthapuramu
Dr. C. Shoba Bindhu, JNTUA, Ananthapuramu
Dr. V. Valli Kumari, Andhra University, Vishakhapatnam
Dr. B. K. Tripathy, VIT, Vellore
Dr. S. Jyothi, SPMVV, Tirupati
Dr. Praveen Shukla, BBD University, Lucknow
Dr. K. Srujan Raju, CMR Technical Campus, Hyderabad
Dr. B. Balamurugan, Galgotias University, Greater Noida, Uttar Pradesh
Dr. K. Thirunavukkarasu, Karnavati University, Uvarsad, Gandhinagar, Gujarat
Dr. M. Rajasekhar Babu, VIT University, Vellore
Dr. G. Vara Prasad, BMS College of Engineering, Bengaluru
Dr. K. Suneetha, Jain-deemed-to-be University, Bengaluru
Dr. Praveen Shukla, BBD University, Lucknow
Dr. P. Nagendra, Vishnu Institute of Technology, Bheemavaram
Dr. E. S. Madhan, SRM University, Chennai
Honorary Chair
Dr. Suresh Chandra Satapathy, KIIT, Bhubaneswar, India
Conference Chair
Dr. B. Narendra Kumar Rao, SVEC
Convener
Dr. K. Reddy Madhavi, SVEC
Organizing Committee
Dr. A. V. Sri Harsha, SVEC
Dr. G. Sunitha, SVEC
Dr. J. Avanija, SVEC
Dr. K. Suresh, SVEC
Dr. S. Sreenivasa Chakravarthi, SVEC
Dr. D. Ganesh, SVEC
Preface
The maiden International Conference on Data Analytics, Intelligent Computing, and
Cyber Security (ICDIC 2020), organized by the Department of Computer Science and
Engineering, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh,
India, was held during December 29–30, 2021. Sree Vidyanikethan Engineering
College (Autonomous) was established in 1996 by Sree Vidyanikethan Educational
Trust under the stewardship of Dr. M. Mohan Babu, renowned Film Artiste and
Former Member of Parliament (Rajya Sabha). The College was established in the
backward region of Rayalaseema to serve the cause of technical education with
an initial intake of 180. The intake has since grown to 2,382 as of the academic year 2021–22. The College now offers 15 B.Tech. programs; four M.Tech. programs; MCA programs; and three Doctoral Programs. AICTE has also accorded permission for a second-shift polytechnic from the academic year 2009–10, and presently five diploma courses are offered. Today, Sree Vidyanikethan Engineering College is one of the largest, most admired, and most sought-after institutions in
Andhra Pradesh. The College is approved by AICTE and affiliated with JNTUA,
Ananthapuramu. The College has been accorded Autonomous Status by the UGC,
New Delhi, in 2010–2011. The College is known for its quality initiatives which are
amply reflected in accreditations by the National Board of Accreditation (NBA) for
UG and PG programs and Accredited by the National Assessment and Accreditation
Council (NAAC) with an A Grade. The College has been accorded ‘UGC-Colleges
with Potential for Excellence’ status under CPE Scheme by UGC, New Delhi. It
also has been accorded ‘PLATINUM’ category by CII-AICTE Survey and was
conferred with A-Grade’ by the Department of Higher Education, Andhra Pradesh.
The College participated in the National Institutional Ranking Framework (NIRF) 2020 and was awarded a rank of 184. SIEMENS and APSSDC have established six
state-of-the-art laboratories in the institution.
ICDIC 2020 aims at providing a platform for scientists, scholars, engineers,
and students to present theoretical research and practical advancements at national
and international levels in the fields of data analytics, intelligent computing, cyber-
security, and its allied areas. Experts from different parts of the globe are involved in
the interaction on respective fields of the conference theme. Out of 176 submissions
from all over the world, only 48 papers were selected after thorough reviewing, for
publication in the Springer Book Series on Smart Innovation, Systems and Technolo-
gies (SIST). International Conference on Data Analytics, Intelligent Computing and
Cyber Security, ICDIC-20, comprises the comprehensive state-of-the-art technical
contributions in the areas of data analytics, intelligent computing, cyber-security, and
emerging technologies. Selected papers were divided into five tracks, well balanced in
the content, and created enough discussion space for trending concepts. The purpose
of the conference has been served satisfactorily through international and national
speakers, 48 oral presentations by delegates, exchanging and sharing research knowl-
edge among peers. This conference created an ample opportunity for discussions,
debate, and exchange of ideas and information among participants. We are very grateful to the international and national advisory committees, session chairs, and peer reviewers, who provided critical reviews in selecting quality papers, and to the organizing committee members, student volunteers, and faculty of the Department of Computer Science and Engineering, who contributed to the success of this conference. We are also thankful to all the authors who submitted quality papers and communicated with their peers through the presentation of their work, which led to the grand success of the conference.
We are very much thankful to the management of Sree Vidyanikethan Engineering College for their support at every step of the journey toward the success of this conference, which inspired the organizers and motivated many others.
Tirupati, India
Roorkee, India
Taoyuan, Taiwan
Brisbane, QLD, Australia
B. Narendra Kumar Rao
R. Balasubramanian
Shiuh-Jeng Wang
Richi Nayak
Contents
1 Prediction of Depression-Related Posts in Instagram Social
Media Platform ............................................... 1
M. Harini and B. Sivakumar
2 Classification of Credit Card Frauds Using Autoencoded
Features ...................................................... 9
Kerenalli Sudarshana, C. MylaraReddy, and Zameer Ahmed Adhoni
3 BIVFN: Blockchain-Enabled Intelligent Vehicular Fog
Networks ..................................................... 19
Priyanka Gaba and Ram Shringar Raw
4 Deep Learning Approach for Pedestrian Detection, Tracking,
and Suspicious Activity Recognition in Academic Environment .... 29
Kamal Hajari, Ujwalla Gawande, and Yogesh Golhar
5 Data-Driven Approach to Deflate Consumption in Delay
Tolerant Networks ............................................ 39
C. Venkata Subbaiah and K. Govinda
6 Code-Level Self-adaptive Approach for Building Reusable
Software Components ......................................... 49
Sampath Korra, V. Biksham, Kotte Vinaykumar, and T. Bhaskar
7 Design of a Deep Network Model for Weed Classification ......... 59
M. Vaidhehi and C. Malathy
8 E-Voting System Using U-Net Architecture with Blockchain
Technology ................................................... 69
Nuthalapati Sudha and A. Brahmananda Reddy
9 Multi-layered Architecture to Monitor and Control
the Energy Management in Smart Cities ........................ 81
A. K. Damodaram, S. Sreenivasa Chakravarthi,
L. Venkateswara Reddy, and K. Reddy Madhavi
10 Bio-Inspired Firefly Algorithm for Polygonal Approximation
on Various Shapes ............................................. 95
L. Venkateswara Reddy, Ganesh Davanam, T. Pavan Kumar,
M. Sunil Kumar, and Mekala Narendar
11 An Efficient IoT Security Solution Using Deep Learning
Mechanisms .................................................. 109
Maganti Venkatesh, Marni Srinu, Vijaya Kumar Gudivada,
Bibhuti Bhusan Dash, and Rabinarayan Satpathy
12 Intelligent Disease Analysis Using Machine Learning ............. 119
Nagendra Panini Challa, J. S. Shyam Mohan,
M. Naga Badra Kali, and P. Venkata Rama Raju
13 Automated Detection of Skin Lesions Using Back Propagation
Neural Network ............................................... 127
Nagendra Panini Challa, A. Mohan, Narendra Kumar Rao,
Bhaskar Kumar Rao, J. S. Shyam Mohan, and B. Balaji Bhanu
14 Detection of COVID-19 Using CNN and ML Algorithms .......... 135
M. Raghav Srinivaas, Khanjan Shah, B. Abhishek,
R. Jagadeesh Kannan, and A. Balasundaram
15 Prioritization of Watersheds Using GIS and Fuzzy Analytical
Hierarchy (FAHP) Method ..................................... 149
K. Anil, S. Sivaprakasam, and P. Sridhar
16 A Narrative Framework with Ensemble Learning for Face
Emotion Recognition .......................................... 159
S. Naveen Kumar Polisetty, T. Sivaprakasam, and S. Indraneel
17 Modified Cloud-Based Malware Identification Technique
Using Machine Learning Approach ............................. 169
Gavini Sreelatha, Aishwarya Govindkar, and Sarukolla Ushaswini
18 Design and Deployment of the Road Safety System
in Vehicular Network Based on a Distance and Speed ............. 179
Thalakola Syamsundararao, Badugu Samatha,
Nagarjuna Karyemsetty, Subbarao Gogulamudi, and V. Deepak
19 Diagnosis of COVID-19 Using Artificial Intelligence
Techniques ................................................... 189
Pattan Afrid Ahmed, Prabhu Gantayat, Sarika Jay,
Venkata Sai Satvik, Jagadeesh Kannan Raju, and A. Balasundaram
20 Location Tracking via Bluetooth ................................ 203
Jasthi Siva Sai, Mukkamala Namitha, Routhu Ramya Dedeepya,
Mulugu Suma Anusha, Angadi Lakshmi, and Mukesh Chinta
21 Shrimp Surfacing Recognition System in the Pond Using
Deep Computer Vision ......................................... 217
Gadhiraju Tej Varma and Sri Krishna Adusumalli
22 Sign Language Recognition for Needy People Using Machine
Learning Model ............................................... 227
Pavan Kumar Vadrevu, M. R. M. Veeramanickam,
Sri Krishna Adusumalli, and Sasi Kumar Bunga
23 Efficient Usage of Spectrum by Using Joint Optimization
Channel Allocation Method .................................... 235
Padyala Venkata Vara Prasad, K. V. D. Kiran,
Rajasekhar Kommaraju, and N. Gayathri
24 An Intelligent Energy-Efficient Routing Protocol for Wearable
Body Area Networks .......................................... 249
Muniraju Naidu Vadlamudi and Md. Asdaque Hussian
25 Enhanced Video Classification System with Convolutional
Neural Networks Using Representative Frames as Input Data ..... 259
K. Jayasree and Sumam Mary Idicula
26 Text Recognition from Images Using Deep Learning
Techniques ................................................... 265
B. Narendra Kumar Rao, Kondra Pranitha, Ranjana,
C. V. Krishnaveni, and Midhun Chakkaravarthy
27 Early Detection and Diagnosis of Oral Cancer Using Fusioned
Deep Neural Network .......................................... 281
Sree T. Sucharitha, I. Kannan, and K. A. Varun Kumar
28 Fine-tuning for Transfer Learning of ResNet152 for Disease
Identification in Tomato Leaves ................................. 295
Lakshmi Ramani Burra, Janakiramaiah Bonam,
Praveen Tumuluru, and B Narendra Kumar Rao
29 AI-Based Mental Fatigue Recognition and Responsive
Recommendation System ...................................... 303
Korupalli V. Rajesh Kumar, B. Rupa Devi, M. Sudhakara,
Gabbireddy Keerthi, and K. Reddy Madhavi
30 Multiple Slotted Triple-Band PIFA Antenna for Wearable
Medical Applications at 2.5–9 GHz ............................. 315
T. V. S. Divakar and G. Anantha Rao
31 Fish Classification System Using Customized Deep Residual
Neural Networks on Small-Scale Underwater Images ............. 327
M. Sudhakara, Y. Vijaya Shambhavi, R. Obulakonda Reddy,
N. Badrinath, and K. Reddy Madhavi
32 Multiple Face Recognition System Using OpenFace ............... 339
Janakiramaiah Bonam, Lakshmi Ramani Burra,
Roopasri Sai Varshitha Godavarthi, Divya Jagabattula,
Sowmya Eda, and Soumya Gogulamudi
33 EDAARP-Efficient and Data-Aggregative Authentic Routing
Protocol for Wireless Sensor Networks .......................... 351
Kurakula Arun Kumar and Karthikeyan Jayaraman
34 Mobile-Based Selfie Sign Language Recognition System
(SSLRS) Using Statistical Features and ANN Classifier ........... 361
G. Anantha Rao, K. Syamala, and T. V. S. Divakar
35 An Effective Model for Malware Detection ...................... 377
V. Valli Kumari and Shaik Jani
36 An Efficient Approach to Retrieve Information for Desktop
Search Engine ................................................ 387
S. A. Karthik, G. Lalitha, Y. Md. Riyazuddin, and R. Venkataramana
37 Baggage Recognition and Collection at Airports .................. 397
Aviral Pulast and S. Asha
38 Computer Vision Technique to Detect Accidents ................. 407
A. Jafflet Trinishia and S. Asha
39 An Efficient Machine Learning Approach for Apple Leaf
Disease Detection .............................................. 419
K. R. Bhavya, S. Pravinth Raja, B. Sunil Kumar, S. A. Karthik,
and Subhash Chavadaki
40 Precipitation Estimation Using Deep Learning ................... 431
Mohammad Gouse Galety, Fanar Fareed Hanna Rofoo,
and Rebaz Maaroof
41 The Adaptive Strategies Improving Digital Twin Using
the Internet of Things ......................................... 439
N. Venkateswarulu, P. Sunil Kumar Reddy, O. Obulesu,
and K. Suresh
42 Deep Learning for Breast Cancer Diagnosis Using
Histopathological Images ...................................... 447
Mohammad Gouse Galety, Firas Husham Almukhtar,
Rebaz Jamal Maaroof, and Fanar Fareed Hanna Rofoo
43 Implementation of 12 Band Integer Filter-Bank for Digital
Hearing Aid .................................................. 455
K. Ayyappa Swamy and Zachariah C. Alex
44 Comparative Analysis on Heart Disease Prediction
Using Convolutional Neural Network with Adapted
Backpropagation .............................................. 465
K. Suneetha, Kamala Challa, J. Avanija, Yaswanth Raparthi,
and Suresh Kallam
45 Applying Machine Learning to Enhance COVID-19
Prediction and Diagnosis of COVID-19 Treatment Using
Convalescent Plasma .......................................... 479
Lavanya Kongala, Thoutireddy Shilpa, K. Reddy Madhavi,
Pradeep Ghantasala, and Suresh Kallam
46 Analysis of Disaster Tweets Using Natural Language
Processing .................................................... 491
Thulasi Bikku, Pathakamuri Chandrika, Anuhya Kanyadari,
Vuyyuru Prathima, and Borra Bhavana Sai
Author Index ...................................................... 503
About the Editors
Dr. B. Narendra Kumar Rao is currently Professor and Head of the Department
of Computer Science and Engineering at Sree Vidyanikethan Engineering College,
Tirupati, Andhra Pradesh, India. He has published in reputed journals and has authored books. He is part of the Intelligent Computing Research Centre at Sree
Vidyanikethan Engineering College, Tirupati. His areas of research include Software
Engineering and Deep Learning.
R. Balasubramanian is currently with the Department of Computer Science and
Engineering, Indian Institute of Technology, Roorkee, India. His areas of interest
include computer vision, image processing, machine learning and other allied areas.
Being an active researcher he is part of several reputed research labs such as SeSaMe
Research Centre, National University of Singapore, Singapore, School of Computer
Sciences, Universiti Sains Malaysia, Pulau Pinang, Malaysia, Faculty of Computing
and Informatics, Multimedia University, Cyberjaya Campus, Malaysia, and Depart-
ment of Computer Science, University at Albany-State University of New York
(SUNY), NY, USA. So far, he has published more than 130 international journal papers, 125 conference papers, seven book chapters, and a technical report. He is the
recipient of BOYSCAST fellowship (awarded by DST, India). He is also the recipient
of “Outstanding Teacher Award 2010” awarded by IIT Roorkee.
Prof. Shiuh-Jeng Wang is currently with the Department of Information Manage-
ment at Central Police University, Taoyuan, Taiwan, where he directs the Informa-
tion Cryptology and Construction Laboratory (ICCL). He was a recipient of the 5th
Acer Long-Tung Master Thesis Award and the 10th Acer Long-Tung Ph.D. Disser-
tation Award in 1991 and 1996, respectively. He served the editor-in-chief of the
journal of Communications of the CCISA in Taiwan from 2000 to 2006. He authored eight books: Information Security, Cryptography and Network Security, State of the Art on Internet Security and Digital Forensics, Eyes of Privacy–Information Security and Computer Forensics, Information Multimedia Security, Computer Forensics and Digital Evidence, Computer Forensics and Security Systems, and Computer and Network Security in Practice, published between 2003 and 2009.
Assoc. Prof. Richi Nayak is with the School of Electrical Engineering and Computer
Science, Queensland University of Technology, Brisbane, Australia. She is Head of
the Data Science Discipline in EECS. She is an internationally recognised expert in
data mining, text mining, and web intelligence. She has combined knowledge in these
areas very successfully with diverse disciplines such as Social Science, Science, and
Engineering for technology transfer to real-world problems to change their practices
and methodologies. Her particular research interests are machine learning, and in
recent years, she has concentrated her work on text mining, personalization, automa-
tion, and social network analysis. She has published high-quality conference and
journal articles and highly cited in her research field. She has received a number of
awards and nominations for teaching, research, and service activities.
Chapter 1
Prediction of Depression-Related Posts
in Instagram Social Media Platform
M. Harini and B. Sivakumar
Abstract Depression is a vital issue: it is a leading cause of disability worldwide and a major contributor to serious medical illness, which may even lead to suicide. Depression causes feelings of sadness and loss of interest in activities once enjoyed, and it seriously affects the way a person thinks, feels, and acts. Millions of people suffer from depression, but only a few of them undergo proper and adequate treatment. Since most of us are closely connected to social media today, we decided to explore depression-related behavior in the posts users make there. Depression is usually caused by a person's day-to-day activities, such as work, relationship issues, and studies, and it is a serious challenge in our everyday lives. Nowadays, people spend much time on social media forums, so detecting depression-related posts is important to avoid negative posts being shared in the community and to spread positivity. Determining depression levels and a person's negative responses is important because it reveals the underlying negativity; ML classifier techniques are used, and negative posts are determined automatically. The proposed system can also help children see only positive posts on social media forums.
1.1 Introduction
Nowadays, Instagram has become the most well-liked social media platform: its user interface attracts people to use it for a long time, there is no end to the content, and one can keep on scrolling. People keep sharing their day-to-day activities on the platform, and users share all kinds of emotions and activities in the form of images and texts. Depression is increasing vigorously and is often treated dismissively; it is a mental disorder marked by an unremitting feeling of hopelessness and worthlessness. It affects the way users think in their normal lives, and they often find it very difficult to interact with people.
M. Harini · B. Sivakumar (B)
Computer Science and Engineering Department SRMIST, Kattankulathur, India
e-mail: sivakumb2@srmist.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_1
We use sentiment analysis [15], a thriving topic that has been researched for a long time, with the goal of determining the character of a text and categorizing its polarity. In today's information age, a wealth of data is available for sentiment classification of both images and text via social networking sites [68]; however, using data from some social networking sites raises privacy concerns. Instagram is thus a good social networking site for gathering enough data while also avoiding conflicts with privacy laws.
For post classification, a variety of machine learning algorithms and natural language processing (NLP) approaches are applicable, including Naive Bayes, SVM, random forest, PCA, LIWC, and LDA. This work aims to use ML and NLP [2, 3] approaches on text data from social media in order to understand the emotions of users, with a particular focus on depression, together with feature extraction techniques for image classification. In this article, we largely deal with social media postings, which are brief messages with a character restriction. Users express their thoughts and feelings regarding current events in their lives and the world around them in this short form.
1.2 Literature Survey
A "A Computational Approach to Include Extraction for Recognizable Proof of Self-Destructive Ideation in Tweets" by T. Deepa and M. Kiran Mayee (2020) depicts the self-destructive, depressive behavior of a person as reflected in the tweets they post on Twitter, one of the trending social media platforms. The paper mainly focuses on linguistic inquiry and word count (LIWC) to procure the preferred outcomes [9].
B "Detection of Depression-Related Posts in Reddit Social Media Forum" by Michael M. Tadesse, Hongfei Lin, Bo Xu, and Liang Yang (2019) illustrates contemporary depression-related behavior in social media. The paper mainly focuses on NLP techniques such as N-grams, LIWC, and LDA to obtain the desired outputs.
C "Data Mining Approach to the Detection of Suicide in Social Media: A Case Study of Singapore" by Jane H. K. Seah and Kyong Jin Shim (2018) delineates the analysis of posts and comments related to depression and suicide in social media. The paper mainly focuses on using the LDA NLP technique to obtain results.
D "Predicting Depression Levels Using Social Media Posts" by Maryam Mohammed Aldarwish and Hafiz Farooq Ahmed (2017) presents the association between SNS users' activities and mental health illness in social media. The paper mainly focuses on classifying texts in different social media platforms.
E "Nonparametric Discovery of Online Mental Health-Related Communities" by Bo Dao, Thin Nguyen, Svetha Venkatesh, and Dinh Phung (2015) presents a distribution-free discovery of online mental health-related communities. The paper mainly focuses on classifying texts in different social media platforms.
1.3 Proposed Work
As depicted in Fig. 1.1, in the initial phase the dataset (or metadata) is obtained from the Instagram API and public repositories, and filters are applied while collecting the posts. The collected dataset contains seven different kinds of emotions and is divided into two categories: training and testing datasets. The training set is used to train and create the model, while the testing set is used to test the model and check the output data. Around 80% of the data is used for training, and the remaining 20% for testing. Figure 1.1 depicts the steps involved in emotion identification from images.
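The 80/20 split described above can be sketched as follows (a minimal illustration; the post identifiers, labels, and random seed are hypothetical, not taken from the chapter's dataset):

```python
import random

def train_test_split(samples, train_frac=0.8, seed=42):
    """Shuffle the collected posts and split them ~80/20 into train and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = samples[:]          # copy, so the original dataset order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled posts: (post_id, emotion_label) pairs over seven emotions.
posts = [(i, i % 7) for i in range(100)]
train, test = train_test_split(posts)
```

With 100 posts this yields 80 training and 20 testing samples, and no post appears in both sets.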
A Preprocessing: In preprocessing, the input image taken from the dataset is enhanced by magnifying it and removing the existing distortions; removing noise also improves its features for the next stage of processing. The details of the input image are preserved while the redundancy that causes noise is removed. Overall, this stage reads the input image and rescales it through filtration and normalization to remove noise. The rotated, uniformly sized image is then sent as input to the segmentation stage.
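A minimal sketch of the filtration-and-normalization step on a plain grayscale pixel grid (the 3 × 3 image and the target range are hypothetical; a real pipeline would operate on full CCTV or Instagram frames):

```python
def normalize(image, new_min=0.0, new_max=1.0):
    """Min-max normalize a grayscale image (a list of rows of pixel intensities)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1                      # avoid division by zero on flat images
    scale = (new_max - new_min) / span
    return [[new_min + (p - lo) * scale for p in row] for row in image]

def mean_filter(image):
    """3x3 mean filter to suppress noise; border pixels are left unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

img = [[10, 10, 10], [10, 250, 10], [10, 10, 10]]   # one bright noisy pixel
smoothed = mean_filter(img)
norm = normalize(smoothed)
```

The mean filter pulls the noisy center pixel toward its neighborhood average, and the normalization rescales the result into a uniform intensity range for the next stage.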
B Segmentation: Segmentation partitions the digital image received from the preprocessing stage into multiple fragments. The input image is divided into compatible, analogous, and coherent regions corresponding to the different entities in the image, on the basis of disposition, boundary, and potency. The image obtained from the segmentation stage is taken as input to the feature extraction stage.

Fig. 1.1 Workflow of proposed method for image
C Feature extraction: In this stage, the variable dimensions of the input image are compressed. For the objective of facial emotion recognition, in order to achieve real-time performance and reduce time complexity, only the eye region and the mouth region of the image are considered. The combination of these two features is sufficient to identify the emotion accurately. To obtain this combination, the prominent points of the characteristic regions of the image are located using a point detection algorithm (PDA).
In a similar way, preprocessing of text is the first step applied to the input text dataset. This stage includes removal of hashtags, taggings (mentions), stop words, and redundant words from the input text.
The output of the preprocessing stage is then subjected to feature extraction. Feature extraction is used to convert the text into a matrix or vector form. There are many ways to do feature extraction; here we use a bag of words. This representation can be combined with various machine learning algorithms and extracts the data in many ways. It is a simple way to transform tokens into features: it does not care about the order of the words, only about which words from the vocabulary are present (and how often). The output of this stage is then fed to the RNN classifier, which determines the categorization of emotion the text originally belongs to. Figure 1.2 depicts the series of steps involved in detecting the type of emotion from texts.
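The text preprocessing and bag-of-words steps above can be sketched as follows; the stop-word list and the sample post are illustrative assumptions, not the actual lists used in this work:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "am", "so", "i"}  # illustrative subset

def preprocess_text(post):
    """Remove hashtags, @-mentions (taggings), and stop words from a raw post."""
    post = re.sub(r"[#@]\w+", " ", post.lower())   # strip hashtags and taggings
    tokens = re.findall(r"[a-z']+", post)
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens, vocabulary):
    """Map a token list onto a fixed vocabulary as a count vector."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

post = "I am so sad today #depressed @friend sad and alone"
tokens = preprocess_text(post)
vocab = sorted(set(tokens))
vector = bag_of_words(tokens, vocab)
print(vocab)   # ['alone', 'and', 'sad', 'today']
print(vector)  # [1, 1, 2, 1]
```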
Fig. 1.2 Workflow of proposed method for text
1.4 Implementation
A Eye extraction: The eye region of the image is the most crucial segment for recognition due to the visible white space around the iris. The region containing strong vertical edges is to be detected. Therefore, the Sobel filter (edge detector) is applied to the input image to detect the prominent points of vertical edges. It evaluates the gradient in both the x-direction and y-direction in order to improve accuracy.
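The Sobel gradient computation described above can be sketched as below; the synthetic test image with a single vertical edge is an assumption made for illustration:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(image, kernel):
    """Valid-mode 2-D filtering (cross-correlation) with a 3x3 kernel."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(image[y:y + 3, x:x + 3] * kernel)
    return out

def sobel_gradients(image):
    """Gradients in the x- and y-directions, as used for vertical-edge detection."""
    return filter2d(image, SOBEL_X), filter2d(image, SOBEL_Y)

# Synthetic image with a sharp vertical edge down the middle.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
gx, gy = sobel_gradients(img)
print(np.abs(gx).max() > np.abs(gy).max())  # vertical edge -> strong x-gradient
```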
B Eyebrow extraction: The eyebrow region of the input image needs to be detected and precisely segmented in order to perform eyebrow region analysis. The two elliptical regions of characteristic boundaries in the image that lie directly above each eye fragment are selected as the eyebrow regions. The boundary fragments of these two regions are extracted for further filtering. Sobel detection is again employed to obtain the edge image, as it can recognize more boundaries than Roberts' method. These extracted edge images are then dilated, and the holes are filled. The resulting boundary images are used in filtering the eyebrow portions.
C Mouth extraction: The mouth region of the input image needs to be detected and strictly segmented, and its features extracted, in order to perform mouth region analysis. The top (upper jaw), bottom (lower jaw), rightmost, and leftmost positions of the mouth are fetched, and the centroid of the mouth is computed. Graphs of the classification accuracy (Y-axis: accuracy in percentage) and of the computational time (Y-axis: time in hours) required for the Xception model in CNN are depicted below.
D RNN classifier: Classification tells us which category the text comes under [10–13]; here we use an RNN classifier for both the text [14, 15] and the emotion of user posts. The RNN classifier works in three stages: in the first stage, it moves through the hidden layer and makes its prediction; in the second stage, it compares the prediction with the data present; and in the final stage, it calculates the gradient for each node. This model is used because each layer provides short-term memory, which lets us predict the next step easily; it is mainly used in sentiment analysis, speech tagging, etc. Figure 1.3 shows the amount of each emotion expressed in the image, and Fig. 1.4 shows the emotion detected from the image along with the proportion of other emotions that are possible.
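The forward pass of such an RNN (the first two stages; the gradient computation of the third stage is omitted) can be sketched as follows. The layer sizes, random weights, and three-class output are illustrative assumptions, not the configuration used in this work:

```python
import numpy as np

def rnn_forward(inputs, Wxh, Whh, Why, bh, by):
    """Run a simple RNN over a sequence: each step updates a hidden state
    (short-term memory); the final state is projected onto the classes."""
    h = np.zeros(Whh.shape[0])
    for x in inputs:                       # stage 1: move through hidden layer
        h = np.tanh(Wxh @ x + Whh @ h + bh)
    logits = Why @ h + by                  # stage 2: score each emotion class
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 4, 8, 3        # hypothetical sizes
params = (rng.standard_normal((n_hidden, n_in)) * 0.1,
          rng.standard_normal((n_hidden, n_hidden)) * 0.1,
          rng.standard_normal((n_classes, n_hidden)) * 0.1,
          np.zeros(n_hidden), np.zeros(n_classes))
sequence = [rng.standard_normal(n_in) for _ in range(5)]
probs = rnn_forward(sequence, *params)
print(probs.shape, abs(probs.sum() - 1.0) < 1e-9)
```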
1.5 Results and Discussions
The main objective and outcome of our work is to analyze the emotions of users from the kind of activities they do, using posts extracted from the Instagram forum, by applying the PDA (CNN) classification technique to the image dataset and the RNN classifier to the text dataset [3, 15–17]. The image emotion detected in posts is classified into seven categories, and the text emotion is classified into three categories of nature.
Fig. 1.3 Illustration of proportion of each emotion in Fig. 1.4 using CNN
Fig. 1.4 Emotion depicted in image of Instagram post using PDA
References
1. Goel, A., Gautam, J., & Kumar, S. (2016). Real time sentiment analysis of tweets using
Naive Bayes. In 2nd International Conference on Next Generation Computing Technologies,
Dehradun.
2. Praveen, P., Sudheer, P., & Sudheer Kumar, K. (2018). Public sentiment analysis on movie
reviews. In Computer Science and Engineering Department.
3. Bouazizi, M., & Ohtsuki, T. (2016). Sentiment analysis in Twitter: From classification to
quantification of sentiments within tweets. In Proceedings IEEE GLOBECOM (pp. 1–6).
4. Gao, W., & Sebastiani, F. (2015). Tweet sentiment: From classification to quantification. In
Qatar Computing Research Institute Hamad bin Khalifa University PO Box 5825, Doha, Qatar
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
5. Hadeghatta, U. R. (2013). Sentiment analysis of Hollywood movies on Twitter. In Proceedings
of the IEEE/ACM ASONAM (pp. 1401-1).
6. Kumar, A., Sharma, A., & Arora, A. (2019). Anxious depression prediction in real-time
social data. In International Conference on Advances in Engineering Science Management &
Technology (ICAESMT)-2019, Uttaranchal University, Dehradun, India.
1 Prediction of Depression-Related Posts 7
7. Smys, S., & Raj, J. S. (2021). Analysis of deep learning techniques for early detection of
depression on social media network—A comparative study. Journal of trends in Computer
Science and Smart technology (TCSST), 3(01), 24–39.
8. Hossain, M. T., Talukder, M. A. R., & Jahan, N. (2021). Social networking sites data analysis
using NLP and ML to predict depression. In 12th International Conference on Computing
Communication and Networking Technologies (ICCCNT) (pp. 1–5). IEEE.
9. Chiu, C. Y., Lane, H. Y., Koh, J. L., & Chen, A. L. (2021). Multimodal depression detection
on Instagram considering time interval of posts. Journal of Intelligent Information Systems,
56(1), 25–47.
10. Nanomi Arachchige, I. A., Sandanapitchai, P., & Weerasinghe, R. (2021). Investigating machine learning and natural language processing techniques applied for predicting depression disorder from online support forums: A systematic literature review. Information, 12(11), 444.
11. Bouazizi, M., & Ohtsuki, T. (2017). A pattern-based approach for multi-class sentiment analysis
in Twitter. In Proceedings of the IEEE Access (pp. 20617–20639).
12. Mohammad, S. M., & Kiritchenko, S. (2015). Using hashtags to capture fine emotion categories
from tweets. In Computational Intelligence (Vol. 31, No. 2, pp. 301–326).
13. Plank, B., & Hovy, D. (2015). Personality traits on Twitter or how to get 1500 personality tests
in a week. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity,
Sentiment and Social Media Analysis (pp. 92–98).
14. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2016). Sentiment analysis
of Twitter data. In Department of Computer Science Columbia University New York.
15. Ferragina, P., Piccinno, F., & Santoro, R. (2016). On analyzing hashtags in Twitter. In Dipartimento di Informatica, University of Pisa, Proceedings of the Ninth International AAAI Conference on Web and Social Media.
16. Garimella, A., & Mihalcea, R. (2016). Zooming in on gender differences in social media. In
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality,
and Emotions in Social Media.
17. Bamman, D., & Smith, N. A. (2015). Contextualized sarcasm detection on Twitter. In
Proceedings of the 9th International AAAI Conference on Web and Social Media Citeseer
(pp. 574–577).
Chapter 2
Classification of Credit Card Frauds
Using Autoencoded Features
Kerenalli Sudarshana, C. MylaraReddy, and Zameer Ahmed Adhoni
Abstract With the advent of online payment and business systems, safety of trans-
actions has become an essential factor. In 2020, Internet Crime Complaint Center
(IC3) had received 2,211,396 complaints culminating in a $13.3 billion loss which
is approximately equal to cumulative loss from 2016–19. Credit card fraud detection
is hindered by concept drift, uneven class distribution, identifying a smaller set of
features, and detecting real-time frauds. To overcome the above issues, in this article,
we propose an autoencoder-based classification scheme to extract the characteristics
such as credit card type, credit grade, credit line, book balance, and other features
from a European credit card dataset. Also, the performance of different machine learning algorithms is compared for classification consistency using the encoded features. The results obtained are as follows: 99.95% accuracy, 97.45% precision, 88.26% recall, and 92.36% F1-score.
2.1 Introduction
With the introduction of online payment systems and the migration of businesses to
the Internet, a secure cyber-transaction has become a critical component of payment
security. In 2020, the Internet Crime Complaint Center (IC3) published the number
of complaints and its supporting data (Fig. 2.1). IC3 received 2,211,396 complaints
within that period, resulting in a $13.3 billion loss [4]. A stolen, misplaced, or cloned
credit card can result in fraud. Furthermore, the rise of online purchasing has escalated
the number of cases of card-not-present fraud or the use of a credit card number in an
e-commerce transaction.
K. Sudarshana (B) · C. MylaraReddy · Z. A. Adhoni
Department of CSE, School of Technology, GITAM, Bengaluru, India
e-mail: kerenalli@gmail.com
C. MylaraReddy
e-mail: mchinnai@gitam.edu
Z. A. Adhoni
URL: https://www.gitam.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation, Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_2
Fig. 2.1 Top five cybercrimes and corresponding losses from 2016–20. Source https://www.fbi.gov
Application fraud, account takeover fraud, social engineering fraud, and skimming fraud are some of the most frequent categories of payment
card fraud. The number of relevant research initiatives in the literature has increased
because of the significant socioeconomic implications of recognizing credit card
frauds. Recent approaches for identifying fraudulent credit card transactions include
classification [3,14,17], clustering [8], anomaly detection, and association analy-
sis [13]. Credit card fraud detection is challenged by concept drift, imbalanced class
distribution, discovering a smaller collection of attributes, and detecting real-time
frauds [1]. New fraud issues are also arising as mobile devices become more widely
utilized [6]. Finding a representative set of attributes for a classification task is diffi-
cult. Some of the most common ways for reducing the attribute set include sampling,
aggregation, linear algebra-based principal component analysis, and discrete wavelet
analysis.
In this proposed method, a nonlinear principal component analysis methodology, the “autoencoder,” is used to extract representative features by decreasing the number of
attributes. The proposed approach uses the European credit card dataset to explore the
effectiveness of multiple classifiers that use this set of characteristics. The classifier’s
accuracy, precision, recall, and F1-score assess its performance.
The paper is organized as follows: In Sect. 2.2, literature survey is given.
Section 2.3 outlines the autoencoder mathematical description. Section 2.4 provides a
brief description of the proposed technique. In Sect. 2.5, the outcomes are examined,
and a brief report on the experimentation results is provided. Finally, in Sect. 2.6,we
conclude the article with a scope for further investigation.
2 Classification of Credit Card Frauds Using Autoencoded Features 11
Table 2.1 Performance summary of recent methods
Author | Year | Method | Algorithms | Results
Sudha and Akila [14] | 2021 | MVE | WMSP, RF, SVM | F1-score (87%)
Visalakshi and others [16] | 2021 | Classification | RF, SVM, GNBL | Accuracy (99.78%)
Janvitha and others [8] | 2021 | Clustering | HMM | AUC (RP = 69%)
Asha and Suresh [2] | 2021 | Classification | SVM, k-NN, ANN | F1-score (78.58%)
Jaiswal and others [7] | 2021 | IFLOF analysis | IFLOF | Accuracy (97%)
Taha and Malebary [15] | 2020 | Classification | OLGBM | F1-score (56.95%)
Zhang and others [18] | 2019 | Deep learning | HOBA | F1-score (48.80%)
Puh and Brkić [12] | 2019 | Classification | LR, SVM, RF | Accuracy (91.07%)
Navanshu and others [11] | 2018 | Data mining | LR, DT, SVM, RF | Accuracy (98.6%)
2.2 Literature Review
This section examines several notable studies for credit card fraud detection. The
authors have used several computational methods for detecting such a fraud trans-
actions in their respective work. The number of relevant research initiatives in the
literature has increased because of the significant socioeconomic implications of
recognizing credit card frauds.
Homogeneity-oriented behavior analysis (HOBA) and deep learning were employed in [18]. SMOTE was used to balance the European cardholders dataset, and different machine learning approaches were applied in [12].
A generative adversarial network (GAN) model was used to produce fake instances and address the imbalanced-class dataset problem [5]. The goal was to increase the effectiveness of fraud detection methods. An undercomplete autoencoder network was presented using a supervised learning technique [9] for dimensionality reduction and fraud detection. Table 2.1 presents the various approaches that have been used recently and the outcomes acquired.
2.3 Autoencoders
In this section, we discuss the autoencoders and their mathematical model.
2.3.1 Basic Architecture
Autoencoder is an unsupervised, nonlinear approach for detecting and eliminating
nonlinear correlations in data. The autoencoder is used to reduce dimensionality by
Fig. 2.2 Autoencoder architecture. Source www.wikipedia.org
removing redundant data. A feed-forward, non-recurrent architecture is the simplest architecture of an autoencoder. It consists of two components, encoder and decoder, as shown in Fig. 2.2. The number of neurons in the input and output layers remains equal.
2.3.2 Mathematical Model for Autoencoder
Let X and F represent the input space and the feature space. Let φ and ψ be the encoder and decoder functions, such that:
φ : X → F
and
ψ : F → X
The input x of set X is mapped to a latent vector h of set F as in Eq. 2.1, where σ is an activation function, W is the set of weights, and b is the set of biases:
h = σ(Wx + b)    (2.1)
The decoder network then constructs the approximation x′ of x as given in Eq. 2.2, where σ′ is an activation function, W′ is the set of weights, and b′ is the set of biases:
x′ = σ′(W′h + b′)    (2.2)
The overall network then tries to minimize the reconstruction loss L(x, x′) as given in Eq. 2.3:
L(x, x′) = ||x − x′||²    (2.3)
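Equations 2.1–2.3 can be sketched numerically as below; the sigmoid activation, the random weights, and the 30-to-15 layer sizes (matching the feature-extraction phase) are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    """Eq. 2.1: h = sigma(W x + b)."""
    return sigmoid(W @ x + b)

def decode(h, W_p, b_p):
    """Eq. 2.2: x' = sigma'(W' h + b')."""
    return sigmoid(W_p @ h + b_p)

def reconstruction_loss(x, x_prime):
    """Eq. 2.3: L(x, x') = ||x - x'||^2."""
    return float(np.sum((x - x_prime) ** 2))

rng = np.random.default_rng(1)
n_in, n_latent = 30, 15                      # 30 attributes -> 15 latent features
W, b = rng.standard_normal((n_latent, n_in)) * 0.1, np.zeros(n_latent)
W_p, b_p = rng.standard_normal((n_in, n_latent)) * 0.1, np.zeros(n_in)

x = rng.standard_normal(n_in)
h = encode(x, W, b)
loss = reconstruction_loss(x, decode(h, W_p, b_p))
print(h.shape, loss >= 0.0)
```

Training would adjust W, b, W′, b′ by gradient descent to minimize this loss over the dataset.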
Fig. 2.3 Model for feature extraction
Fig. 2.4 Model for classification phase
2.4 Proposed Method
The proposed method consists of the following steps:
1. Feature Extraction Phase
2. Classification Phase.
2.4.1 Feature Extraction Phase
The model's architecture for feature extraction has an input layer, an encoder, a bottleneck, a decoder, and an output layer. Each input record x has 30 features; hence, the input layer has 30 neurons. They are connected to the neurons in the encoder. The encoder and decoder each comprise three layers of a feedforward network. The output of each layer is forwarded to the next layer. The output of the last encoder layer is input to the neurons of the bottleneck layer, whose function is to reduce the total number of attributes to 15. These attributes are used as input to the decoder. The decoder network reconstructs the approximation x′ of the original input. This is depicted in Fig. 2.3.
2.4.2 Classification Phase
In the proposed method, the latent feature vector represents the encoded version of
the input vector. The small set of fraud class instances are encoded efficiently by the
encoder network during the training phase. Thereby, it eliminates the skewed data
distribution and feature engineering problems. Once the network learns the better
approximation, decoder is discarded, and the classifier model is connected to the end
of the bottleneck layer. It is as shown in Fig. 2.4.
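The classification phase, in which the decoder is discarded and a classifier is trained on the latent features, can be sketched as follows. A simple logistic-regression head and synthetic latent features stand in for the actual classifier and encoder output; all names and values here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_on_encoded(H, y, lr=0.5, epochs=500, seed=0):
    """Train a logistic-regression head on latent features H (decoder discarded)."""
    rng = np.random.default_rng(seed)
    w, b = rng.standard_normal(H.shape[1]) * 0.01, 0.0
    for _ in range(epochs):
        p = sigmoid(H @ w + b)
        grad = p - y                       # gradient of the log loss
        w -= lr * (H.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Hypothetical 15-dimensional latent features: fraud class shifted from genuine.
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(0, 1, (50, 15)), rng.normal(2, 1, (50, 15))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_on_encoded(H, y)
preds = (sigmoid(H @ w + b) > 0.5).astype(int)
print((preds == y).mean() > 0.9)
```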
2.5 Results and Discussion
This section discusses the dataset used, experimental setup, evaluation metrics con-
sidered, and a brief description of the experimental results.
2.5.1 Dataset
Credit card features [1] are used to figure out account holders’ buying habits, which
are strongly linked to their attributes, such as income and age. The details on credit
card purchases can be found at [10]. There are 284,807 instances; 284,315 are valid
transactions and 492 fraudulent. Each record has 30 attributes. Fraudulent transac-
tions are negative, and legal ones are positive [9].
2.5.2 Experimental Setup
An Intel Core i3-7000 processor running at 3.90 GHz with 8 GB of RAM was used to execute the proposed credit card fraud detection experiment. The tests were carried out on a 64-bit Windows operating system. The TensorFlow
framework was used to create the autoencoder method, and the sci-kit learn package
was used to implement and evaluate the machine learning approaches.
2.5.3 Evaluation Metrics
Let T be the number of examples, TP the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Following are some of the assessment measures evaluated in this study:
Accuracy = Acc. = (TP + TN) / (TP + TN + FP + FN)    (2.4)
Precision = Pre. = TP / (TP + FP)    (2.5)
Recall = Rec. = TP / (TP + FN)    (2.6)
F1-score = F1 = (2 × Precision × Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)    (2.7)
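These measures can be computed directly from the confusion-matrix counts; the counts below are hypothetical, chosen only to mimic a heavily imbalanced fraud dataset:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score (Eqs. 2.4-2.7)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)   # equal to 2PR / (P + R)
    return accuracy, precision, recall, f1

# Hypothetical counts: 100 fraud cases among 10,000 transactions.
acc, pre, rec, f1 = classification_metrics(tp=90, tn=9890, fp=10, fn=10)
print(round(acc, 4), round(pre, 2), round(rec, 2), round(f1, 2))
```

Note how accuracy stays near 1 even for mediocre fraud detectors on such imbalanced data, which is why precision, recall, and F1-score are reported alongside it.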
Table 2.2 Performance of proposed method against the existing techniques
Classifier | Precision | Recall | F1-score | Accuracy
ANN [2] | 81.13 | 76.19 | NA | 99.92
HOBA [18] | 36.18 | 75.00 | 48.80 | 96.51
OLGBM [15] | 97.34 | 40.59 | 56.95 | 98.40
MVE method [14] | 92.50 | 81.50 | 87.00 | 98.50
Proposed method | 97.45 | 88.26 | 92.36 | 99.95
Fig. 2.5 Comparison of different ML algorithms
2.5.4 Results and Discussion
Our experiments yielded the findings shown in Table 2.2. Random forest and CatBoost outperformed the other algorithms, while logistic regression (LR) and light gradient boosting machines (LGBM) performed the worst. Out of 98 fraud transactions, the random forest algorithm correctly classified the fraud samples 88.26% of the time; missed (false-negative) predictions accounted for just 11.64% of the fraudulent transactions. Furthermore, the precision of 97.45% indicates that just 2.55% of genuine transactions were misclassified as fraud. However, the noticeable drawback of this approach is that it required a lot of computational resources and time to execute.
The CatBoost algorithm correctly classified the true class examples 99.95% of the time, and the fraud cases were correctly classified 86.20% of the time; missed fraud cases accounted for just 13.80% of the fraudulent transactions. Furthermore, the precision of 97.70% indicates that just 2.30% of genuine transactions were incorrectly labeled as fraud (Fig. 2.5).
When the proposed method's outcomes were compared to the benchmark results, we achieved a precision of 97.45%, recall of 88.26%, F1-score of 92.36%, and accuracy of 99.95%. It has 2.69 times improved specificity, 84.94% higher recall, 1.89 times stronger F1-score, and 96.55% better accuracy than the HOBA technique [18].
2.6 Conclusion
Increased credit card usage demands the detection of credit card fraud. Due to the
technological complexity and continued financial and commercial losses, developing
an efficient system for identifying fraudulent credit card transactions is required. This
paper proposes an effective technique for detecting fraud in credit card transactions
by using an autoencoder for feature selection. The proposed method’s performance
was determined by comparing it to other findings and state-of-the-art techniques.
The proposed strategy outperformed the other approaches in accuracy, precision, and F1-score according to the experiments. The results highlight the importance of a
practical representative feature in strengthening the prediction performance of the
proposed strategy.
References
1. Abdallah, A., Maarof, M. A., & Zainal, A. (2016). Fraud detection system: A survey. Journal
of Network and Computer Applications, 68, 90–113.
2. Asha, R., & Suresh Kumar, K. R. (2021). Credit card fraud detection using artificial neural
network. Global Transitions Proceedings, 2(1), 35–41.
3. Dhankhad, S., Mohammed, E., & Far, B. (2018). Supervised machine learning algorithms for
credit card fraudulent transaction detection: A comparative study. In 2018 IEEE International
Conference on Information Reuse and Integration (IRI) (pp. 122–125). IEEE.
4. FBI. FBI releases the internet crime complaint center 2020 internet crime report, including
covid-19 scam statistics. https://www.fbi.gov/news/pressrel/pressreleases/fbireleasestheintern
etcrimecomplaintcenter2020internetcrimereportincludingcovid19scamstatistics
5. Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial
networks for improving classification effectiveness in credit card fraud detection. Information
Sciences, 479, 448–455.
6. Gullapalli, V., & Sireeshkumar Kalli, A. V. Cards and payments. https://www.capgemini.com/
in-en/service/cards-and-payments/
7. Jaiswal, S., Brindha, R., & Lakhotia, S. (2021). Credit card fraud detection using isolation
forest and local outlier factor. Annals of the Romanian Society for Cell Biology, 4391–4396.
8. Janvitha, K., Vasavi, C. R. S., Sruthi, A., Praharshitha, K., & Anguraj, D.K. (2021). Survey
on detection of credit card frauds using HMM and various clustering approaches. In 2021
6th International Conference on Inventive Computation Technologies (ICICT) (pp. 101–107).
IEEE.
9. Misra, S., Thakur, S., Ghosh, M., & Saha, S. K. (2020). An autoencoder based model for
detecting fraudulent credit card transaction. Procedia Computer Science, 167, 254–262.
10. mlg ulb: Credit card fraud detection. https://www.kaggle.com/mlg-ulb/creditcardfraud
11. NavanshuKhare, S. Y. S. (2018). Credit card fraud detection using machine learning techniques.
International Journal of Pure and Applied Mathematics, 557, 825–837.
12. Puh, M., & Brkić, L. (2019). Detecting credit card fraud using selected machine learning algorithms. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1250–1255). IEEE.
13. Seeja, K., & Zareapoor, M. (2014). Fraudminer: A novel credit card fraud detection model
based on frequent itemset mining. The Scientific World Journal, 2014.
14. Sudha, C., & Akila, D. (2021). Majority vote ensemble classifier for accurate detection of credit
card frauds. Materials Today: Proceedings.
15. Taha, A. A., & Malebary, S. J. (2020). An intelligent approach to credit card fraud detection
using an optimized light gradient boosting machine. IEEE Access, 8, 25579–25587.
16. Visalakshi, P., Madhuvani, K., et al. (2021). Detecting credit card frauds using different machine learning algorithms. Annals of the Romanian Society for Cell Biology, 4681–4688.
17. Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., & Jiang, C. (2018). Random forest for credit
card fraud detection. In: 2018 IEEE 15th International Conference on Networking, Sensing
and Control (ICNSC) (pp. 1–6). IEEE.
18. Zhang, X., Han, Y., Xu, W., & Wang, Q. (2021). HOBA: A novel feature engineering methodology
for credit card fraud detection with a deep learning architecture. Information Sciences, 557,
302–316.
Chapter 3
BIVFN: Blockchain-Enabled Intelligent
Vehicular Fog Networks
Priyanka Gaba and Ram Shringar Raw
Abstract Vehicular ad-hoc network (VANET) is a network that facilitates inter-vehicle communication regarding road safety or entertainment. The popularity of VANET is due to its increasing demand and its blend with the latest technologies like cloud, IoT, AI, and ML. The growth in vehicles on the road and in their communication leads to the need to use fog rather than cloud to provide low latency to VANET, a real-time system. Fog computing together with VANET is termed a vehicular fog network (VFN); although it provides various advantages, it can also compromise the network's security. To improve the security and privacy of the VANET system, blockchain, an immutable, peer-to-peer, decentralized, and distributed ledger-based technology, seems suitable for making VFN a safer and more reliable system. The combination of fog computing and blockchain technology with VANET is termed a blockchain-enabled intelligent vehicular fog network (BIVFN). This paper discusses the VFN, and the architecture and phases of BIVFN are explored in detail. BIVFN can attain the security and privacy requirements well because of blockchain and fog computing.
3.1 Introduction
Vehicular ad-hoc network (VANET) is a special type of mobile ad-hoc network
(MANET), which is a self-organized wireless communication network that lets the
vehicles transmit information between vehicles and road-side units (RSU) in real
time [1]. VANET provides value-added services and a well-organized road to create a
more efficient and safe traffic environment for vehicles [2]. The connections between
P. Gaba (B)
Department of Computer Science and Engineering, School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Dwarka, New Delhi, India
e-mail: priyanka.gaba2202@gmail.com
R. S. Raw
Department of Computer Science and Engineering, Netaji Subhas University of Technology, East Campus, Delhi, India
e-mail: rsrao@aiactr.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation, Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_3
vehicle and infrastructure are utilized for communicating vehicles regarding road
conditions for safety-related and non-safety-related applications. Connected vehicles
provide a platform for cloud or fog computing to share safe and secure information
and evolve smart transportation systems for future VANET. Privacy and security are
the key challenges of connected vehicles in VANET. Any person connected with a vehicle, like a car owner, mechanic, or any official personnel involved, could breach vehicle data security and cause harm easily. The possible security threats caused by attackers concern the security of devices and linkages, data validation, access control, and the data privacy of drivers and vehicles [3]. Therefore, developing security and privacy solutions for connected vehicles in VANET is a highly challenging task.
Security of the VANET with connected vehicles includes the protection of safety
control and infotainment systems, hardware security, software maintenance, inter-
facing security, pedestrian communications with vehicles, and RSU through smart-
phones, etc. We focus here on securing the connected vehicle platforms, where we identify a novel method by which all stakeholders come together and communicate with each other about threats, hacking, and attacks [4]. Researchers in this field have done much research; still, some issues exist in the current system.
Cloud computing helps in providing services like storage, infrastructure, and better
computing power to connected vehicles and charging them as per their requirements
[5]. Cloud computing suffers from various threats like data breaches, data loss, weak
identity, access management, denial of service, and many more. Fog computing
brings the functionality of cloud computing at the edge of the network by utilizing
the devices capable of offering its features as a fog device to the required vehicle
[6]. Thus, the network integrated with vehicles, RSU, and fog devices is known as
vehicular fog network (VFN).
One significant problem with VFN is that data is stored on a single centralized
cloud server, which causes major security issues; if one entity is hacked, the whole
system gets compromised. Blockchain is a decentralized and distributed technology,
removes this drawback of one entity being hacked. It deals with many connected vehi-
cles involved in the process of writing and verifying transactional data and carrying
all verified transactions in a block [7].
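The idea of carrying verified transactions in hash-linked blocks can be sketched as follows; the block fields and transaction contents here are illustrative assumptions, not the actual BIVFN block format:

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    """A minimal block: verified transactions plus a hash link to the
    previous block, so that tampering with any block breaks the chain."""
    block = {
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    # Hash the block contents deterministically (sorted keys) with SHA-256.
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# Hypothetical vehicular events recorded as transactions.
genesis = make_block([{"event": "registration", "vehicle": "V1"}], "0" * 64)
nxt = make_block([{"event": "accident-report", "vehicle": "V2"}], genesis["hash"])
print(nxt["prev_hash"] == genesis["hash"])  # blocks are chained by hash
```

Because each block embeds the previous block's hash, altering any stored transaction would change that block's hash and invalidate every later block, which is the immutability property the paper relies on.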
The paper’s organization is as follows: Sect. 3.2 introduces the vehicular fog
network. The complete architecture and phases of BIVFN are discussed in Sect. 3.3.
Section 3.4 discusses the security requirements attainments by BIVFN and also its
challenges. Section 3.5 summarizes the paper.
3.2 Vehicular Fog Networking
The term fog computing was coined by Cisco. Unlike the cloud, a centralized server, fog provides decentralized, distributed computing features at the network's edge. Using this feature, fog offers a better solution to the limitations of cloud computing [8]. Fog functionality can be provided by any device, known as a fog device, that is capable enough to share resources on rent. Fog computing is
Fig. 3.1 Vehicular fog networking (VFN)
best applicable for those applications which are time-sensitive and require a quick
response.
IoV is one of the substantial applications of fog computing, and this integration is known as the vehicular fog network (VFN) [9]. VFN gives the advantages of low latency, less network bandwidth requirement, security, and more reliability, as vehicles need not communicate data to the cloud. Apart from this, fog is also involved in segregating data, forwarding it, or making real-time decisions for vehicular communication [10]. Instead of sending complete data to the cloud, fog sends only the data required for future analysis. VFN deals with the mobility management of vehicles between a number of fog servers to maintain service quality and provide essential solutions in the network.
Components of VFN, with their functionalities and connections, are shown in Fig. 3.1. Various authors have proposed architectures, algorithms, and ideas for VFN to make the system efficient. VFN still faces a security challenge, which could be handled by applying blockchain to it.
3.3 BIVFN: Blockchain-Enabled Intelligent Vehicular Fog
Network for Connected Vehicles
This section presents BIVFN, a novel vehicular network architecture that combines a blockchain security framework with vehicular fog computing. The vehicular network architecture combined with fog computing, referred to as VFN, provides the features of cloud computing at the edge of the network, making blockchain security transactions faster. To make VFN more secure, the blockchain concept is applied to it, storing the reward points and trustworthiness of vehicles in the traffic environment. Together with fog computing, blockchain can resolve the major security concerns in the IoV environment. We have therefore integrated blockchain concepts with VFN and propose a new network called the blockchain-enabled intelligent vehicular fog network for connected vehicles (BIVFN).
22 P. Gaba and R. S. Raw
3.3.1 Architecture
The architecture of BIVFN comprises four layers: the VANET layer, the fog layer, the blockchain layer, and the cloud layer, as shown in Fig. 3.2. Each layer has its distinct components and roles. The VANET layer consists of vehicles on the road, which are responsible for inter-vehicle communication. Each vehicle can take part in the system in the following ways. First, every new vehicle must register with the system to obtain certificates and keys; this registration helps the system identify and track each vehicle. Second, a vehicle can initiate an event that happened on the road and report it to the system. Third, for any initiated event, nearby vehicles act as validating nodes. Fourth, vehicles receive messages about validated events that have just occurred near their location and may impact their journey.
The fog layer contains fog devices that are responsible for the following functions. First, fog provides cloud features such as IaaS, SaaS, and PaaS to vehicles on the road that need them [11]. Second, fog devices are involved in the registration and authentication of vehicles to assist the blockchain network. Third, a fog node also performs the role of an RSU by carrying out the initial verification of a vehicle's identity and location. Fourth, fog devices identify events near vehicles and provide them to the blockchain network so that the nearby vehicles can be utilized as validating nodes. Fifth, fog devices also act as nodes of the blockchain network, each keeping a replica of the ledger.

Fig. 3.2 Architecture of BIVFN (cloud layer, blockchain nodes 1–5, and the chain of blocks n − 1, n, n + 1)
The blockchain layer consists of all the components that are involved in the
complete functioning of the system. Blockchain can be implemented with any of
the different available platforms depending on one’s need and suitability.
The cloud layer is responsible for supporting the fog devices for carrying out the
task. It also stores the data of registered users, map-related data, and traffic or other
data that may be required for future analysis.
The BIVFN system demands some security measures from the entities of the network to keep the system reliable and secure. One requirement is that every user keep their keys safe, using encryption, secret sharing, or physical locks. Another is that the system must have a key-compromise policy to reduce the risk in case a key is compromised.
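The secret-sharing option mentioned above can be sketched as a toy two-of-two XOR split: either share alone is indistinguishable from random noise, and only both shares together recover the key. A real deployment would use an established scheme such as Shamir's secret sharing; this is only an illustration:

```python
import secrets

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two XOR shares; each share alone reveals nothing."""
    share1 = secrets.token_bytes(len(key))
    share2 = bytes(a ^ b for a, b in zip(key, share1))
    return share1, share2

def recover_key(share1: bytes, share2: bytes) -> bytes:
    """XOR the shares back together to reconstruct the original key."""
    return bytes(a ^ b for a, b in zip(share1, share2))

vehicle_key = secrets.token_bytes(32)   # stand-in for a vehicle's private key
s1, s2 = split_key(vehicle_key)         # store s1 and s2 in separate places
assert recover_key(s1, s2) == vehicle_key
```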
3.3.2 Phases of BIVFN
BIVFN, a blockchain-enabled vehicular fog network, will perform the tasks corre-
sponding to VANET in the blockchain network using a fog node. The tasks related
to VANET include sharing road status like traffic information, red light failure, jam
information; borrowing services or infrastructure from fog node; registration of a new
vehicle in the network; authentication of vehicles; vehicle’s score updates according
to involvement; and accessing scores of vehicles. All these tasks act as transactions of
the blockchain, which are combined to create a block. The phases carried out in this
complete process include registration, transaction creation, and block creation. The
entities involved in these phases involve vehicles, fog nodes, application interfaces,
and blockchain networks. The steps followed in each phase are shown in Fig. 3.3 and discussed below.
1. Registration phase
The vehicle enters its basic and owner details on the application interface for
registration in blockchain. The application interface then forwards these details
to the fog node. After that, the fog node forwards the details to blockchain after
performing the initial verification of the vehicle. In blockchain, the new vehicle
details are stored as transactions, and transactions are added to the unverified
transaction pool. Blockchain will generate the certificate and key pair for the
vehicle for further communication in the network. The generated certificate and
key pair are forwarded to the vehicle through the application interface. Then, the
application interface informs the vehicle about successful registration.
2. Transaction creation
Any transaction from the above list can be initiated by either a vehicle or a fog node and reported on the application interface. The application interface forwards the details of a new transaction to the fog node if that transaction is requested by
any vehicle. The fog node also performs the functions of an RSU, so it evaluates the event for initial verification. The fog node checks the basic details as first-level verification and performs one of the following actions. If the transaction fails the initial verification performed by fog, the event is rejected and its status is sent back to the vehicle through the application interface. If the transaction passes the initial verification or is initiated by the fog node, it is forwarded to the blockchain. A list of probable nearby vehicles, which act as validating nodes, is attached to the transaction. The blockchain network then validates the transaction by involving the nearby vehicles suggested by the fog node.

Fig. 3.3 Phases of BIVFN among the vehicle, application interface, fog device, and blockchain network: phase 1, registration (steps 1.1–1.6: registration request, certificate and key generation, registration response); phase 2, transaction creation (steps 2.1–2.5: transaction initiation, initial verification, rejection or forwarding for validation); phase 3, block creation (steps 3.1–3.4: block creation, broadcasting, validation, and blockchain update)
3. Block creation
All leftover transactions from the unverified transaction pool are collected
and ordered, and a block is created out of those transactions. The newly created
block is broadcasted to the vehicles and fog nodes of the network. The vehicles
and fog node will validate the block. After validation, the newly created block is
updated to the ledger of each node, and thus blockchain is updated.
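The three phases can be condensed into a minimal sketch: pooled transactions are ordered into a block, each block links to its predecessor via a SHA-256 hash, and every node re-validates the chain before updating its ledger. This is an illustration of the hashing and validation ideas only; it omits the certificates, signatures, and consensus that BIVFN would require:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash every field of the block except its stored hash."""
    payload = json.dumps({k: v for k, v in block.items() if k != "hash"},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def create_block(pool: list, chain: list) -> dict:
    """Phase 3: order the pooled transactions and link to the previous block."""
    block = {"index": len(chain),
             "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
             "transactions": sorted(pool)}
    block["hash"] = block_hash(block)
    return block

def validate_chain(chain: list) -> bool:
    """Each node re-hashes every block and checks the links (integrity)."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
chain.append(create_block(["register:vehicle-42",
                           "event:red-light-failure@junction-7"], chain))
chain.append(create_block(["event:jam-reported"], chain))
assert validate_chain(chain)
chain[0]["transactions"][0] = "register:attacker"   # tamper with the ledger
assert not validate_chain(chain)                    # integrity check fails
```

Tampering with any stored transaction changes its block's hash and breaks the link to the next block, which is exactly the integrity property relied on in Sect. 3.4.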
3.4 Security Requirements Attainment in BIVFN
BIVFN, a blockchain- and fog computing-based VANET, can overcome VANET's issues and fulfill the expected requirements. Blockchain can attain VANET's requirements through features such as an immutable, cryptography-based, distributed, and decentralized ledger. Fog computing provides cloud-like features at the edge, which helps satisfy real-time constraint requirements. How the distinct features of fog and blockchain attain VANET's security requirements is listed below, which also illustrates the efficiency of our proposed model compared with conventional models.
Authentication: It is ensured by verifying that every involved vehicle is already a
registered user of the system having a well-defined identity.
Non-repudiation: The vehicle’s identity is recorded during communication, and
hence, the vehicle cannot deny its involvement in the system.
Confidentiality: The blockchain network encrypts the identity of the involved
vehicles, and only the public key is visible. The attacker is not able to identify the
actual vehicle behind any transaction.
Integrity: The concept of hashing is applied on transactions and complete blocks,
which does not allow any alteration and ensures the network’s integrity.
Access control: Authorized users are granted only the access appropriate to their roles in the network, which ensures access control.
Privacy: Unauthorized users cannot access the details of the registered vehicles in the network, as the data is stored entirely in hashed form.
Data verification: Data transmitted in the network acts as a transaction and is validated before being added to the blockchain; it is added only if it proves valid.
Real-time constraints: Vehicles move at high speed and thus need fast notifications and responses to any event, which fog computing can achieve well.
3.5 Conclusion
This paper presents a novel architecture of blockchain-enabled intelligent vehicular
fog network for connected vehicles (BIVFN). BIVFN is a VANET incorporating blockchain technology and fog computing, which together provide the following features. (i) Fog
computing provides cloud services, memory, and infrastructure on edge, enabling it
to work in real time and give fast responses. (ii) Blockchain being immutable keeps
the data secure and system reliable. (iii) Blockchain’s certificate and key generation
process ensures the authentication and access control in the VANET system. (iv)
The concept of cryptography ensures the integrity of the system. The phases of
BIVFN, covering the complete process, are also discussed. The proposed system is reliable and fits well with today's market demands.
References
1. Pandey, K., Raina, S. K., & Raw, R. S. (2016). Distance and direction-based location aided
multi-hop routing protocol for vehicular ad-hoc networks. International Journal of Commu-
nication Networks and Distributed Systems, 16(1), 71–98. https://doi.org/10.1504/IJCNDS.
2016.073410
2. Raw, R. S., & Lobiyal, D. K. (2011). E-DIR: A directional routing protocol for VANETs in a
city traffic environment. International Journal of Information and Communication Technology,
3(3), 242–257. https://doi.org/10.1504/IJICT.2011.041927
3. Woo, S., Jo, H. J., & Lee, D. H. (2015). A practical wireless attack on the connected car
and security protocol for in-vehicle CAN. IEEE Transactions on Intelligent Transportation
Systems, 16(2), 993–1006. https://doi.org/10.1109/TITS.2014.2351612
4. Raw, R. S., Kumar, M., & Singh, N. (2013). Security challenges, issues and their solutions for
VANET. 5(5), 95–105.
5. Raw, R. S., Loveleen, Kumar, A., Kadam, A., & Singh, N. (2016). Analysis of message propa-
gation for intelligent disaster management through vehicular cloud network. In ACM Interna-
tional Conference on Proceeding Series (Vol. 04–05-Marc). https://doi.org/10.1145/2905055.
2905252
6. Fan, K., Wang, J., Wang, X., Li, H., & Yang, Y. (2018). Secure, efficient and revocable data
sharing scheme for vehicular fogs. Peer-to-Peer Networking and Applications, 11(4), 766–777.
https://doi.org/10.1007/s12083-017-0562-8
7. Priyanka, & Raw, R. S. (2020). The amalgamation of blockchain with smart and connected
vehicles: Requirements, attacks, and possible solution. In Proceedings—IEEE 2020 2nd Inter-
national Conference on Advances in Computing, Communication Control and Networking,
ICACCCN (pp. 896–902). https://doi.org/10.1109/ICACCCN51052.2020.9362906
8. Siddiqui, S. A., & Mahmood, A. (2018). Towards fog-based next generation internet of vehicles
architecture. In Proceedings ACM Conference on Computer and Communications Security
(pp. 15–21). https://doi.org/10.1145/3267195.3267200
9. Zhang, L., et al. (2019). Blockchain based secure data sharing system for Internet of vehicles: A
position paper. Vehicular Communications, 16, 85–93. https://doi.org/10.1016/j.vehcom.2019.
03.003
10. Gaba, P., & Raw, R. S. (2020). Vehicular cloud and fog computing architecture, applications,
services, and challenges. In R. S. Rao, V. Jain, O. Kaiwartya & S. Nanhay (Eds.), IoT and
Cloud Computing Advancements in Vehicular Ad-Hoc Networks (pp. 268–296). IGI Global.
11. Aliyu, A., et al. (2018). Cloud computing in VANETs: Architecture, taxonomy, and challenges.
IETE Technical Review, 35(5), 523–547. https://doi.org/10.1080/02564602.2017.1342572
Chapter 4
Deep Learning Approach for Pedestrian
Detection, Tracking, and Suspicious
Activity Recognition in Academic
Environment
Kamal Hajari, Ujwalla Gawande, and Yogesh Golhar
Abstract Pedestrian detection, tracking, and suspicious activity recognition have
grown increasingly significant in computer vision applications in recent years as
security threats have increased. Continuous monitoring of private and public areas in
high-density areas is very difficult, so active video surveillance that can track pedes-
trian behavior in real time is required. We present an innovative and robust deep
learning system as well as a unique pedestrian dataset that includes student behavior
like as test cheating, laboratory equipment theft, student disputes, and danger situ-
ations in institutions. It is the first of its kind to provide pedestrians with a unified
and stable ID annotation. Again, we also presented a comparative analysis of result
achieved by the recent deep learning approach of pedestrian detection, tracking,
and suspicious activity recognition methods on a recent benchmark dataset. Our
investigation will provide new research directions in vision-based surveillance for
practitioners and research scholars.
4.1 Introduction
Video surveillance is now installed everywhere to track and monitor pedestrians or criminals in streets, airports, banks, prisons, laboratories, shopping centers, etc. [1].
The surveillance system is based on a closed-circuit television (CCTV) system.
Recently, pan-tilt-zoom (PTZ) cameras have gained many advantages over traditional CCTV cameras. The main advantage of a PTZ camera is that it allows users to view more content than a fixed camera. The features of a PTZ camera include: (1) the user can pan left and right and tilt up and down to obtain a complete 180° view in either direction. If installed and positioned correctly,
K. Hajari (B)·U. Gawande
Department of Information Technology, Yeshwantrao Chavan College of Engineering, Nagpur,
Maharashtra 441110, India
e-mail: kamalhajari123@gmail.com
Y. G o l h a r
Department of Computer Engineering, St. Vincent Palloti College of Engineering and
Technology, Nagpur 441108, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_4
advanced PTZ cameras can provide a complete 360° field of view. Therefore, a single pan/tilt camera can replace two or even three fixed-view cameras and can eliminate most of the blind spots that fixed-angle cameras leave. (2) A PTZ camera can be programmed to rotate automatically in multiple directions to cover different views of an area. Researchers are currently working on creating video surveillance systems that can analyze pedestrian behavior in real time [2].
Identifying pedestrians in crowded environments becomes extremely challenging in real time in the presence of low-resolution images, motion blur, contrast and illumination changes, changes in pedestrian scale or size, and entirely or partially obscured outlines. Figure 4.1 illustrates the motivation for the proposed effort. In the
Caltech [1], INRIA [2], MS COCO [3], ETH [4], and KITTI [5] datasets, pedestrian
instances are typically small. Localizing these small instances in the presence of illumination change and occlusion is challenging due to issues such as (1) hazy appearance, (2) blurred and unclear boundaries, (3) overlapping pedestrian instances, and (4) small and large instances having different characteristics. Advanced research on pedestrian analysis is conducted on publicly available benchmark datasets. These datasets have several limitations: (1) a limited variety of pedestrian stances captured in a controlled context; (2) a short time interval between each unique ID's succeeding frames; and (3) recordings made only in urban areas, specifically on city roads, in car parking, and in public and private places. No student behavior dataset is available. A robust, novel deep learning model and a student academic environment dataset are proposed in this paper. For each sequence of frames in the video, human experts annotated the behavior of student pedestrians. We provide data in three categories: (1) bounding boxes to locate pedestrians, (2) full labels, and (3) unique IDs used as class categories for annotated pedestrians.
The contributions of this paper are outlined as follows:
1. To solve existing state-of-the-art database concerns such as size and illumination
variance in pedestrian images, we present the unique enhanced mask R-CNN
deep learning architecture.
2. We propose a student activity dataset in which we have recorded students' normal and suspicious activities.
3. Within the framework of the proposed pedestrian dataset for academic settings,
we conduct a comprehensive review of previous work and compare state-of-the-
art methods.
The remainder of this paper is organized as follows: Sect. 4.2 covers the most significant studies on pedestrian datasets, as well as concerns and challenges in the academic context. The deep learning architecture is described in Sect. 4.3. The outcomes of the empirical examination are discussed in Sect. 4.4. Conclusions are presented in Sect. 4.5, along with future research directions.
Fig. 4.1 Issues and challenges of the ETH [2] and Caltech [3] datasets. a Illumination variation: a pedestrian's appearance changes significantly as the illumination changes. b Pedestrian size variation: pedestrian scale or size changes significantly across images. c Pedestrian occlusion affects the detection and tracking results. d Object occlusion: occlusion by other road objects affects detection accuracy. e Pedestrian clothing variation affects detection algorithm accuracy. f Multi-camera variation: different capture directions present different visual appearances
4.2 Literature Survey
This section describes the most relevant and recent pedestrian datasets. In addition,
we discuss the advanced deep learning approaches of pedestrian detection, tracking,
and suspicious activity recognition, along with their limitations.
4.2.1 Pedestrian Dataset
In this section, we describe the ten commonly used pedestrian datasets by researchers
for pedestrian detection, tracking, and suspicious activity recognition. First, the
Caltech dataset contains 2300 unique pedestrians and 350,000 annotated bounding
boxes representing these pedestrians. The dataset was created on city roads using a camera mounted on a vehicle [3]. Second, the MIT dataset is the first
pedestrian dataset, consisting of high-quality images of 709 unique pedestrians. The range of poses, captured in city streets in front or back view, is relatively limited [4]. Third, the Daimler dataset captures people walking on the street, using cameras installed on vehicles in an urban environment during the day. The dataset includes pedestrian tracking attributes, annotated labeled bounding boxes, ground truth images, and floating-point disparity map files.
The training set contains 15,560 pedestrian images and 6744 annotated pedestrian
images. The test set contains 21,790 pedestrian images and 56,492 annotated images
[5]. The ATCI dataset is a pedestrian database acquired by a normal car’s rear-view
camera, and it is used to test pedestrian recognition in indoor and outdoor parking
lots, city streets, and private lanes. The dataset contains 250 video clips totaling 76 minutes and 200,000 marked pedestrian bounding boxes, captured in day and night scenes under different weather conditions [6]. The ETH dataset is used to observe traffic scenes from a moving platform: the behavior of pedestrians is recorded using a stereo rig mounted on a stroller. In an urban setting, the dataset can be used for pedestrian recognition and tracking from mobile platforms. Different traffic agents, such as cars and pedestrians, are included in the dataset [7].
TUD-Brussels dataset was created using a mobile platform in an urban environment.
Crowded urban street behavior was recorded using a camera mounted on the front
side of the vehicle. It can be used in car safety scenarios in urban environments [8].
One of the most often used static pedestrian detection datasets is the INRIA dataset.
It incorporates human behavior, as well as a mobile camera and complex background
scenes, with various variations in posture, appearance, dress, background, lighting,
contrast, etc. [9]. The PASCAL Visual Object Classes (VOC) 2012 and 2007 collections contain static objects in an urban setting with various viewpoints and positions. This dataset was created with the goal of recognizing visual object classes in real-world scenarios. Animals, trees, road signs, vehicles, and people are among the 20 categories in this collection [10]. The MS COCO 2018 dataset [11] covers common objects in context (COCO); it was recently utilized to recognize distinct objects in context while focusing on salient object detection.
The annotations include different examples of things connected to 80 different
object categories and 91 different human segmentation categories. For pedestrian
instances, there are keypoint annotations and five captions per sample image. The COCO 2018 dataset challenges include (1) real-scene object detection with segmentation masks, (2) panoptic segmentation, (3) pedestrian keypoint evaluation, and (4) DensePose estimation in congested scenes [12]. For street image segmentation, the
Mapillary Vistas research dataset is employed [13]. Pedestrian and non-living categories are handled using panoptic segmentation, which successfully merges the concepts of semantic and instance segmentation. A comparison of pedestrian databases and their video surveillance purposes is shown in Table 4.1; it also includes our proposed dataset, which is introduced in the next section. The comparison is based on each dataset's use, size, environment, labels, and annotations. These details are used to verify the accuracy of object detection and tracking algorithms.
Table 4.1 Comparison of benchmark pedestrian datasets for pedestrian detection and tracking

| Dataset | Dataset size | Annotation | Environment | Year | Ref. | Issues and challenges |
|---|---|---|---|---|---|---|
| Caltech | 250,000 frames | 2300 unique pedestrians | City street | 2012 | [3] | Only urban roads are captured |
| MIT | 709 unique pedestrians | No annotated pedestrians | Daylight scenario | 2000, 2005 | [4] | Missing annotations do not allow users to verify different techniques |
| Daimler | 15,560 unique pedestrians | Ground truth with bounding boxes | City street | 2016 | [5] | Only urban roads are captured |
| GM-ATCI | 250 video clips | 200 K annotated pedestrian bounding boxes | Day; complex weather and lighting | 2015 | [6] | Only urban roads are captured; side view of road not captured |
| ETH | Videos | Annotated cars and pedestrians | City street | 2010 | [7] | Small dataset; limited scenarios covered |
| TUD-Brussels | 1092 frames | 1776 pedestrian annotations | City street | 2009 | [8] | Only urban roads are captured |
| INRIA | 498 images | Manual annotations | City street | 2005 | [9] | Only urban roads are captured |
| PASCAL VOC 2012 | 11,530 images; 20 object classes | 27,450 annotated ROIs | City street | 2012 | [10] | Only urban roads are captured |
| MS COCO 2017 | 328,124 images | Segmented people objects | City street | 2017 | [11] | Only urban roads are captured |
| MS COCO 2015 | 328,124 images | Segmented people objects | City street | 2015 | [12] | Only urban roads are captured |
| Mapillary Vistas 2017 | 25,000 images; 152 object categories | Pixel-accurate instance pedestrian annotations | City street | 2017 | [13] | Only urban roads are captured; side view of road not captured |
4.2.2 Proposed Deep Learning Architecture and Academic
Environment Pedestrian Dataset
In this section, we describe the proposed academic environment dataset, captured from different perspectives by a high-quality DSLR camera. The proposed video acquisition framework records video at 30 frames/s at 3840 × 2160 resolution, and the dataset size is 100 GB. Sample student behavior frames are shown in Fig. 4.2. The camera orientation is in the range of 45° to 90°. The dataset records the academic activity of students at Yeshwantrao Chavan College of Engineering (YCCE), Nagpur. The students' ages range from 22 to 27, with 65% male and 35% female. The academic environment dataset covers different behaviors such as lab activities, exam hall and classroom scenes, student cheating behavior, disputes, and the theft of mobile phones and lab electronic devices [14].
At the frame level, domain experts annotate the pedestrian video sequences. The labeling stage contains three phases: (1) human identification, (2) tracking, and (3) detection of suspicious activities. First, the Mask R-CNN [12] method is used to determine the location of each pedestrian in the frame, followed by manual validation and correction of the data. Next, the Deep SORT [15] model is used to extract tracking information. With these two basic operations, we obtain a rectangular bounding box around each pedestrian that defines the ROI for each human. The last stage of the labeling process is performed manually by a human expert familiar with the academic environment. For each pedestrian instance in the frame, the label specifies height, age, bounding box ID, feet, frame, body volume, haircut, hair color, head accessories, apparel, mustache/beard, activity, and accessories.
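Deep SORT itself combines a Kalman filter with appearance features; as a much-simplified stand-in, the sketch below keeps stable pedestrian IDs across frames by greedily matching detections to existing tracks using bounding-box IoU alone. The function names and the 0.3 threshold are illustrative assumptions, not part of the actual pipeline:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_ids(tracks, detections, next_id, thresh=0.3):
    """Greedily match each detection to the best-overlapping existing track,
    otherwise start a new stable ID (simplified stand-in for Deep SORT)."""
    ids, used = [], set()
    for det in detections:
        best_id, best_iou = None, thresh
        for tid, box in tracks.items():
            if tid not in used and iou(box, det) > best_iou:
                best_id, best_iou = tid, iou(box, det)
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        tracks[best_id] = det     # update the track with the newest box
        ids.append(best_id)
    return ids, next_id

tracks = {}
ids, nxt = assign_ids(tracks, [(0, 0, 10, 20), (50, 50, 60, 70)], next_id=1)
print(ids)   # [1, 2]: two new pedestrians
ids, nxt = assign_ids(tracks, [(1, 1, 11, 21)], next_id=nxt)
print(ids)   # [1]: overlaps track 1, so it keeps its stable ID
```

Each returned ID can then serve as the stable class category attached to the annotated pedestrian.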
Fig. 4.2 Examples from the designed database. The first row depicts a lab fight between two girls. The second row depicts a phone-snatching scenario. The third row depicts a student making threats. The fourth row depicts the same critical situation. The fifth row depicts students stealing lab material. The sixth row depicts exam cheating in an examination hall
4.3 Recent Deep Learning Architecture
The current deep learning-based pedestrian detection, tracking, and suspicious activity recognition systems are not as accurate and fast as human vision [2]. Pedestrian detection, tracking, and activity recognition approaches fall into two categories: conventional handcrafted-feature methods and deep learning methods. The approach of Dollar et al. [3] has been used for pedestrian detection, and the HOG [16] approach is also used for this task. The deformable part model (DPM) [4] and the multi-scale histogram of oriented gradients [6] are further examples of conventional approaches. These methods are computationally intensive and time-consuming, and they require human involvement. CNN-based deep learning techniques have grown in prominence as a result of their accuracy in pedestrian identification [7, 8]. R-CNN [9] was the first deep learning model for object detection. Deep learning detectors include two-stage detectors such as R-CNN [9], SPPNet [10], Fast R-CNN [11], Faster R-CNN [12], and Mask R-CNN [13], and single-stage detectors such as SSD [15] and YOLO [17]. Two-stage detectors are generally too slow for real-time pedestrian detection. The YOLO network [17], a regression-based object detection architecture, was therefore introduced to increase detection speed and accuracy. Later, to improve both accuracy and speed when detecting smaller and densely distributed pedestrians, researchers proposed different variants of YOLO (v1, v2, v3, v4, and v5) [4]. The proposed improved YOLOv5 method effectively detects small and densely packed pedestrians.
4.4 Experimental Results
In this section, we describe the results of three tasks, pedestrian detection, tracking, and suspicious activity recognition, performed using state-of-the-art methods. We present not only the results obtained with these methods on the academic environment database but also our baseline results using the same techniques on well-known datasets. Pedestrian detection is the first task. The R-FCN [17], RetinaNet [18], and Mask R-CNN [19] deep learning frameworks have excelled in the PASCAL VOC [19] challenges, particularly in pedestrian identification. The results of these approaches on the proposed dataset were compared with their results on PASCAL VOC 2007 and 2012. RetinaNet uses a feature pyramid network (FPN) on top of a ResNet baseline [20, 21]. To capture changes in position, R-FCN [17] uses position-sensitive convolutional layers, with a ROI pooling layer in place of a fully connected layer. Mask R-CNN was also used to evaluate the proposed dataset [22, 23]. The dataset was divided into three parts: 60% for training, 20% for real-time queries, and 20% for testing. The standard evaluation measure is average precision (AP) at an intersection-over-union (IoU) threshold of 0.5 (AP at IoU = 0.5). For pedestrian tracking, the Tracktor CV [2] and V-IOU [16] approaches provide the state of the art [24, 25], for two reasons: (1) they are the highest performers in the MOT challenge, and (2) they are open-source, pre-built frameworks. The Tracktor
CV approach comprises two stages: (1) a regression component that uses the detector step's input to update the bounding box's current position, and (2) a detector that keeps track of the bounding boxes for future frames [26, 27]. We noticed that the failure cases of both methodologies were associated with crowds, in two problematic situations: (1) scenes where trajectories constantly intersect due to dense pedestrian traffic, and (2) scenes with significant occlusions of human silhouettes. For suspicious activity recognition, it is harder to identify or track specific pedestrians within a crowd, and the aforementioned techniques are not suitable for such situations. Abnormal activities in a spatial context are detected from a sequence of events with variations in appearance, scale, lighting, and pose. Other recent academic work has focused on the use of motion in this direction [28, 29]. Finally, collecting regional spatial cuboids from optical flow has been attempted for crowd behavior analysis [30–32]. Although the approaches listed above have been shown to be effective in studies, most of them are limited to detecting anomalous behaviors in local or global domains [33, 34]. Furthermore, we suggest that the coupled consideration of motion flow patterns, variable object sizes, and interactions between neighboring objects in a frame may be used to describe pedestrian behaviors in a high-density scene, resulting in improved anomalous activity detection performance.
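The AP at IoU = 0.5 criterion used above can be made concrete: a detection counts as a true positive only if its overlap with some unmatched ground-truth box reaches 0.5. This is a minimal sketch with hypothetical helper names; full AP evaluation additionally sorts detections by confidence and integrates the precision-recall curve:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_detections(dets, gts, thresh=0.5):
    """Count true positives under the IoU >= 0.5 rule; each ground-truth
    box may be matched at most once."""
    matched, tp = set(), 0
    for det in dets:
        best_j, best = None, thresh
        for j, gt in enumerate(gts):
            if j not in matched and iou(det, gt) >= best:
                best_j, best = j, iou(det, gt)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    return tp

gts = [(0, 0, 100, 200)]
print(match_detections([(10, 20, 100, 200)], gts))  # 1: IoU is about 0.81, above 0.5
print(match_detections([(60, 0, 160, 200)], gts))   # 0: IoU is 0.25, below 0.5
```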
4.5 Conclusions
We propose the academic environment database in this paper, which comprises video
sequences of pedestrians in indoor academic environments that are annotated at the
frame level. The pedestrian database contains the behavior of students in the institution.
This is the first dataset of its kind that provides a unified and stable pedestrian
ID annotation, making it suitable for pedestrian detection, tracking, and behavior
detection. We have also proposed an improved Mask R-CNN architecture for pedestrian
detection, tracking, and suspicious activity recognition on recent benchmark
databases. This well-organized comparison helps to identify problems and challenges
in this domain. In the future, more experimentation is required for pose estimation
and pedestrian trajectory identification and detection.
References
1. Ahmed, M., Jahangir, M., & Afzal, H. (2015). Using crowd-source based features from social
media and conventional features to predict the movies popularity. In IEEE International
Conference on Smart Cities, Social Communication and Sustained Communication, China
(pp. 273–278).
2. Bergmann, P., Meinhardt, T., & Taixe, L. (2019). Tracking without bells and whistles. In IEEE
ICCV, Seoul, Korea (pp. 1–16).
4 Deep Learning Approach for Pedestrian Detection, Tracking… 37
3. Dollar, P., Wojek, C., Schiele, B., & Perona, P. (2012). Pedestrian detection: An evaluation of
the state of the art. IEEE TPAMI, 34(4), 743–761.
4. Samsi, S., Weiss, M. L., Bestor, D., Li, D., Jones, M., Reuther, A., Edelman, D., Arcand, W., &
Byun, C. (2021). The MIT supercloud dataset. arXiv preprint arXiv:2108.02037.
5. Silberstein, S., Levi, D., Kogan, V., & Gazit, R. (2014). Vision-based pedestrian detection for
rear-view cameras. In IEEE Intelligent Vehicles Symposium Proceedings, Dearborn, MI, USA
(pp. 853–860).
6. Alom, M. Z., & Taha, T. M. (2017). Robust multi-view pedestrian tracking using neural
networks. In IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH,
USA (pp. 17–22).
7. Zhang, X., Park, S., Beeler, T., Bradley, D., Tang, S., & Hilliges, O. (2020). ETH-XGaze: A
large scale dataset for gaze estimation under extreme head pose and gaze variation. In ECCV.
Lecture Notes in Computer Science (Vol. 12350). Springer.
8. Wojek, C., Walk, S., & Schiele, B. (2009). Multi-cue onboard pedestrian detection. In IEEE CVPR, Miami, Florida, USA.
9. Nguyen, T., Kim, S., & Na, I. (2013). Fast pedestrian detection using histogram of oriented
gradients and principal components analysis. International Journal of Contents.
10. Everingham, M., Van Gool, L., & Williams, C. K. I. (2010). The Pascal visual object classes
(VOC) challenge. International Journal of Computer Vision, 88, 303–338.
11. Lin, T. Y., et al. (2014). Microsoft COCO: Common objects in context. In ECCV. Lecture Notes
in Computer Science (Vol. 8693, pp. 740–755). Springer.
12. Nicolai, W., Bewley, A., & Dietrich, P. (2017). Simple online and real-time tracking with a
deep association metric. In IEEE ICIP (pp. 3645–3649).
13. Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully
convolutional networks. In NIPS (pp. 1–11).
14. Hajari, K., Gawande, U., & Golhar, Y. (2021). Deep learning approach to pedestrian detection:
An evaluation of the state of the art. In Computing technologies and applications paving
path towards society 5.0 (1st ed.,). Routledge and CRC Press, Taylor & Francis Group.
ISBN:9780367763701.
15. Everingham, M., Eslami, S., Gool, V., Williams, C., & Winn, J. (2015). The PASCAL VOC
challenge: A retrospective. IJCV, 111, 98–136.
16. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE
Computer Society Conference on CVPR, San Diego, CA, USA (pp. 886–893).
17. Kaiming, H., Georgia, G., Dollar, P., & Girshick, R. (2020). Mask R-CNN. IEEE TPAMI, 42(2),
386–397.
18. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features.
In Proceedings of the IEEE Computer Society Conference on CVPR, HI, USA (pp. I-I).
19. Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with
discriminatively trained part-based models. TPAMI, 32(9), 1627–1645.
20. He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. In IEEE International
Conference on Computer Vision, Venice, Italy (pp. 2980–2988).
21. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., et al. (2016). SSD: Single shot multibox
detector. In European Conference on Computer Vision (pp. 21–37). Springer.
22. Redmon, J., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object
detection. In IEEE Conference on CVPR, Las Vegas, NV, USA (pp. 779–788).
23. Bochinski, E., Senst, T., & Sikora, T. (2018). Extending IOU based multi-object tracking by
visual information. In IEEE International Conference on Advanced Video and Signal Based
Surveillance, Auckland, New Zealand.
24. Bernardin, K., & Stiefelhagen, R. (2008). Evaluating multiple object tracking performance:
The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008, 246309.
25. Kratz, L., & Nishino, K. (2012). Tracking pedestrians using local spatio-temporal motion
patterns in extremely crowded scenes. IEEE TPAMI, 34(5), 987–1002.
26. Wang, S., & Miao, Z. (2010). Anomaly detection in crowd scene. In 10th IEEE International
Conference on Signal Processing, Beijing, China (pp. 1220–1223).
27. Wang, S., & Miao, Z. (2010). Anomaly detection in crowd scene using historical information. In
IEEE International Symposium on Intelligent Signal Processing and Communication Systems,
Chengdu, China (pp. 1–4).
28. Muhammad, G., Hossain, M., & Kumar, N. (2021). EEG-based pathology detection for home
health monitoring. IEEE Journal on Selected Areas in Communications, 39(2), 603–610.
29. Muhammad, G., Alhamid, M. F., & Long, X. (2019). Computing and processing on the edge:
Smart pathology detection for connected healthcare. IEEE Network, 33, 44–49.
30. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional
networks for visual recognition. TPAMI, 37(9), 1904–1916.
31. Muhammad, N., Hussain, M., Muhammad, G., & Bebis, G. (2011). Copy-move forgery detec-
tion using dyadic wavelet transform. In Eighth International Conference on CGIV, Singapore
(pp. 103–108).
32. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate
object detection and semantic segmentation. In CVPR, Columbus, OH, USA (pp. 580–587).
33. Girshick, R. (2015). Fast R-CNN. In IEEE International Conference on CV, Santiago, Chile
(pp. 1440–1448).
34. Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards real-time object
detection with region proposal networks. IEEE TPAMI, 39(6), 1137–1149.
Chapter 5
Data-Driven Approach to Deflate
Consumption in Delay Tolerant Networks
C. Venkata Subbaiah and K. Govinda
Abstract Delay-tolerant networking (DTN) is an approach to computer network
design that attempts to deal with the specific issues of heterogeneous networks
that may lack continuous connectivity. The ability to move, or route, information
from a transmitting end to a receiving end is an essential capability that all data-exchanging
systems must have. Delay-tolerant networks (DTNs) are characterized by
their lack of connectivity, leading to temporary loss of association between nodes.
Under such conditions, well-known ad-hoc routing protocols such as DSR and AODV
fail to provide communication paths between the routes. We propose a data-driven
approach to avoid unnecessary consumption of resources by selfish or malicious
nodes while evaluating a node's trust dynamically in response to changes in
environmental and node conditions. We additionally propose a distributed provenance-based
trust management protocol in which each node is assumed to be able to monitor
its neighboring nodes with known probabilities of false alarms and misses in
detecting attack behaviors or energy levels. We first determine the trust of the
destination through the message carrier and then send the message. To reduce energy
consumption during transmission, we use a clustering method that splits the whole
message into packets.
5.1 Introduction
Although transmission through networks has become very easy at present, it is still
very difficult to communicate data in some networks that are frequently subject
to delay and disruption. Because of changes in network structure and harsher
environments, route delay and network interruption are common. To address this
concern, several researchers have proposed solutions over the last ten years.
However, such methods all
C. Venkata Subbaiah ·K. Govinda (B)
Scope, Vit University, Vellore, Tamilnadu, India
e-mail: kgovinda@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_5
40 C. Venkata Subbaiah and K. Govinda
Fig. 5.1 Delay tolerant network
attempt to resolve the issue based on traditional network protocols, so these
schemes are not viable in some particular cases, which gave rise to the idea of the DTN.
The DTN evolved from the mobile ad-hoc network (MANET). It is a sparsely and
intermittently connected network in which reliable end-to-end connectivity is not
available for message transmission. DTN is intended to work effectively over very
long distances, for instance in space communication at interplanetary scale. In such
settings, long latency, measured in hours or days, is unavoidable. Sensor-based
networks, wireless networks, terrestrial wireless networks, and underwater acoustic
networks with delays are examples of DTNs (Fig. 5.1).
5.2 Related Work
Delay-tolerant networks (DTNs) are often found in emerging applications such as
emergency response, special operations, smart environments, habitat monitoring,
and vehicular ad-hoc networks, where numerous nodes join in group communication
to achieve a common task. The core characteristic of DTNs is that there is no
assurance of end-to-end connectivity, which creates high delay or disruption due to
inherent characteristics or deliberately misbehaving nodes. Managing trust
efficiently and effectively is critical to enabling cooperation or collaboration
and decision-making tasks in DTNs while meeting system goals such as trustworthiness,
availability, quality of service (QoS), and scalability. Accurate trust assessment
is particularly difficult in
5 Data-Driven Approach to Deflate Consumption 41
DTN conditions since nodes are sparsely distributed and do not frequently encounter
one another. Thus, encounter-based evidence exchange among nodes may not always
be possible.
Quality of service (QoS) routing plays a significant role in QoS provisioning
in mobile ad-hoc networks. Bidirectional query was utilized in Shin and Lee (2018)
to achieve efficient QoS routing in the presence of multiple constraints.
With this scheme, QoS-satisfying paths were arrived at with efficient resource usage through
a shortest-path algorithm and a cost-efficient routing algorithm. However,
with multiple constraints and requirements, not all QoS requirements can
be satisfied. In Lal et al. (2016), a heuristic neighbor selection strategy with
a geographic routing procedure was considered to improve network lifetime. Another
hybrid algorithm enhancing Cuckoo search was introduced in Deepa and
Suguna (2017), with a novel position-update model based on a differential evolution
algorithm, with the goal of determining the optimal route. To guarantee energy-aware
routing, an enhanced link-state routing protocol was designed in Deepa and Suguna
(2017) using a time-series technique based on an autoregressive model. With
this, efficient energy conservation was expected to be achieved. However,
given the nature of wireless transmission, high levels of inter-path interference are
said to exist.
5.2.1 DTN Features
Compared with the conventional Internet, ad-hoc topologies, and WLANs, DTN has the
following characteristics.
5.2.2 Intermittent Connectivity
DTN frequently becomes disconnected because of limitations in the mobility and capability of
nodes, leading to frequent changes in the topology. This means that DTN operates
with intermittent connectivity and incomplete links, so paths between routes
cannot be guaranteed.
5.2.3 Low Speed, Inefficiency, and High Queuing Delay
The overall node-to-node delay is obtained by adding the delay of each hop on the
route. The delay is a combination of propagation time, transmission time, and waiting
time. Because of the intermittent topology of DTN, the per-hop delay
may be extremely high, resulting in a very low data rate and
asymmetric data-rate characteristics. Along with this, traffic delay plays
a vital role in node-to-node delay, and frequent variations in the DTN create high
queuing delay.
5.2.4 Limited Resource
A node's processing and handling capacity, communication capability, and storage
space are weaker than those of a normal PC because of price, volume, and energy
constraints. Moreover, the limited storage results in a higher packet
loss rate.
5.2.5 Node Lifetime
Nodes generally run on battery energy, and in certain conditions the topologies
are deployed in harsh environments, which reduces the lifetime of a node.
When battery power is lost, there is no assurance that the node will
work; message transmission is impossible during the loss
of energy.
5.2.6 Dynamic Topology
The DTN topology changes dynamically for several reasons, including
environmental changes, energy exhaustion, or various failures, which result
in nodes dropping out of the network. Nodes joining the
DTN also change the topology.
5.2.7 Performance Issues
Protection procedures within a DTN impose bandwidth costs on the DTN
links and computational costs at the DTN nodes.
In our current work, we have considered routing-protocol security and traffic
assessment. Delay- or disruption-tolerant networks (DTNs) are regularly
found in emerging applications such as emergency response, special operations, smart environments,
habitat monitoring, and vehicular ad-hoc networks, where several nodes take part
in group communication to achieve a common task. The core characteristic of
DTNs is that there is no assurance of end-to-end connectivity, thereby incurring
high delay or disruption due to inherent traits (e.g., wireless medium,
resource constraints, or high mobility) or deliberately misbehaving nodes (e.g., malicious
or selfish). Managing trust efficiently and effectively is critical to
enabling cooperation or collaboration and decision-making tasks in DTNs while
meeting system goals such as reliability, availability, quality
of service, and/or scalability. Accurate trust assessment is especially challenging
in DTN conditions since nodes are sparsely dispersed and do not frequently
encounter one another. Therefore, encounter-based
evidence exchange among nodes may not always be feasible.
We essentially refine the previous trust model by considering the following
enhancements:
(a) minimizing trust bias and (b) minimizing the communication cost caused by
trust assessment, and improving QoS by minimizing message delivery delay and energy
consumption.
The very low rate of direct encounters in DTN environments delays continuous evidence
collection and may produce inaccurate trust assessment, leading to
poor application performance. A critical challenge for a provenance-based
scheme is that it must be secured against attackers who may manipulate or drop
messages such as provenance records or spread false information.
Delay-tolerant networking (DTN) is an approach to computer network
design that aims to cope with the specific challenges of heterogeneous systems
that may lack constant network connectivity. The capacity to transport, or route,
information from a source to a destination is a fundamental ability that every
communication system must have. Delay- and disruption-tolerant networks (DTNs) are
characterized by their lack of connectivity, resulting in the absence of
instantaneous end-to-end paths. In these circumstances, well-known ad-hoc
routing protocols such as AODV and DSR fail to establish routes. When
instantaneous node-to-node paths are difficult or infeasible to set up, routing
protocols must resort to a "store and forward" approach, in which data are
incrementally moved and stored throughout the network in the expectation that
they will eventually reach their destination. A standard strategy used
to improve the likelihood of a message being successfully delivered is to replicate
multiple copies of the message in the hope that one will reach
its destination.
5.3 Proposed Method
In the present study, we suggest a distributed provenance-based trust management
protocol. Each node is assumed to have the capacity to monitor its neighboring
nodes with known probabilities of false alarms and misses in detecting attack
behaviors or energy levels.
The provenance of a node is propagated to all its neighbor nodes. When a source
node picks its destination and sends packets, the node that forwards the packets
acts as the packet modifier; it may then present itself as a normal node to its
neighbors and forward the packets. Direct evidence is obtained upon every encounter
with another node, while indirect evidence is gathered when a destination node
receives a message enclosing provenance information.
In our work, we first assess the trust of the node with the help of the message
carrier and then send the message. To reduce energy consumption during
transmission, we use a clustering technique that splits the entire
message into packets. In a cluster, n nodes (n1, n2, n3, n4, …)
are available, and the sensor node with the greatest energy is
considered the cluster head; it transmits first. A router service provider
can see the node details, the routing path, and the time delay. The router
receives the file from the service provider; the cluster head is chosen first,
and its energy is reduced in proportion to the file size, so the next time we send a
file, another node becomes the cluster head. The cluster head thus rotates among
the nodes based on the highest remaining energy. The time delay is determined
based on the routing delay.
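The rotation just described can be sketched as follows. The cost model (energy reduced by one unit per character sent) and the function names are assumptions made for illustration, not details from this chapter:

```python
def send_message(node_energy, message, packet_size=4):
    """Split a message into packets; for each packet, the node with the
    highest remaining energy acts as cluster head (CH) and transmits it,
    its energy dropping by the packet size."""
    packets = [message[i:i + packet_size] for i in range(0, len(message), packet_size)]
    log = []
    for packet in packets:
        head = max(node_energy, key=node_energy.get)  # highest-energy node becomes CH
        node_energy[head] -= len(packet)              # CH pays the transmission cost
        log.append((head, packet))
    return log
```

With node_energy = {'n1': 10, 'n2': 8} and an eight-character message, 'n1' sends the first packet and, its energy now lower, 'n2' sends the second, so the CH role rotates as the text describes.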
Smart forwarding is a reasonable and suitable way to achieve a tradeoff
between network performance and energy usage, so as to obtain
maximum energy efficiency.
We use distance-based energy-efficient opportunistic forwarding (DEEOF). In this
scheme, if any relay node exists within the maximum broadcast
range of the source node and its value is less than the threshold,
then the source node forwards the packet to that closest node without
replicating the information. This computation is performed at the source at each contact
time.
We also consider a node's traffic requirements along with its energy levels
when making the CH selection. The CH selection relies on the CH role-rotation approach:
a node becomes CH in the current round r if the random number
picked by the node falls below the threshold T(i, r).
Fig. 5.2 System architecture
The threshold is

T(i, r) = P_i(r) / (1 − P_i(r) · (r mod (1/P_i(r))))  if node i ∈ G(r), and T(i, r) = 0 otherwise,

where P_i(r) is the CH selection probability for node i during round r and G(r) is a set of
eligible nodes for round r; a node that has served as CH becomes eligible again
after 1/P_i(r) rounds (Fig. 5.2).
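Read as the usual LEACH-style rotation (an interpretation; the chapter gives only the formula), the threshold and the per-round election can be sketched as:

```python
import random

def ch_threshold(p, r):
    """T(i, r) for a node with CH selection probability p in round r; after
    serving as CH, a node stays ineligible for the next 1/p rounds."""
    period = round(1 / p)
    return p / (1 - p * (r % period))

def elect_cluster_heads(eligible_ids, p, r, rng=random.random):
    """Each eligible node draws a uniform number in [0, 1) and becomes a
    cluster head for round r if the draw falls below T(i, r)."""
    t = ch_threshold(p, r)
    return [i for i in eligible_ids if rng() < t]
```

With p = 0.2 the threshold grows from 0.2 in round 0 to 1.0 in round 4, so every node still eligible is certain to be elected by the end of each five-round period.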
5.4 Algorithm
Distance-based energy-efficient opportunistic forwarding
1. Set the threshold value.
2. Check whether any relay node exists within the maximum transmission range of
the source. If so, estimate the distance of the relay node
without replicating the packet.
3. Calculate the forwarding equivalent energy-efficiency distance (FEED, which
characterizes the relationship between two nodes and the delay for
equal energy efficiency) and P (the likelihood that the distance to the source
node is smaller by at least one relay).
4. If P is smaller than the threshold value, then forward a copy of the packet
to the nearest relay node. Else there is no need to forward the packet copy.
5. Reset the threshold value and repeat at the next contact time.
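A sketch of the forwarding decision in steps 2–4 follows. How P is derived from FEED is not fully specified above, so here P is estimated as the fraction of in-range relays that sit closer to the destination than the source does; that estimate, and the geometric helpers, are assumptions:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def deeof_forward(source, destination, relays, max_range, p_threshold):
    """Return the relay that should receive a packet copy, or None if the
    source should store and carry the packet until the next contact time."""
    in_range = [r for r in relays if dist(source, r) <= max_range]  # step 2
    closer = [r for r in in_range if dist(r, destination) < dist(source, destination)]
    if not in_range or not closer:
        return None
    p = len(closer) / len(in_range)                                 # step 3 (assumed estimate)
    if p < p_threshold:                                             # step 4
        return min(closer, key=lambda r: dist(r, destination))      # nearest relay to the destination
    return None
```

Returning None corresponds to the store-and-forward behavior described in Sect. 5.2: the source keeps the packet rather than spend energy on a copy unlikely to help.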
5.5 Conclusion
In this study, we proposed a data-driven strategy to reduce the consumption
of resources in the presence of selfish or malicious nodes while assessing
a node's trust dynamically in light of changes in environmental and
node conditions. We also proposed a distributed provenance-based trust
management protocol in which every node is assumed to have the ability to monitor
its neighboring nodes with known probabilities of false alarms and misses in detecting
attack behaviors or energy levels.
We first assess the trust of the node with the help of the message carrier and then
send the message. To reduce energy consumption during transmission,
we use a clustering method that splits the whole message into packets.
We also considered node traffic requirements along with energy levels when
making the CH selection, which relies on the CH role-rotation approach.
References
1. Spyropoulos, T., Rais, R., Turletti, T., Obraczka, K., & Vasilakos, A. (2010). Routing for
disruption tolerant networks: Taxonomy and design. Wireless Networks, 16(8), 2349–2370.
2. Chen, I.-R., Bao, F., Chang, M., & Cho, J.-H. (2010). Trust management for encounter-based
routing in delay tolerant networks. In IEEE Global Telecommunications Conference, Miami,
FL (pp. 1–6).
3. Chen, I.-R., Bao, F., Chang, M., & Cho, J.-H. (2014). Dynamic trust management for delay tolerant networks and its application to secure
routing. IEEE Transactions on Parallel and Distributed Systems, 25(5), 1200–1210.
4. Buneman, P., Khanna, S., & Tan, W. (2001). Why and where: A characterization of data
provenance. In Proceedings of International Conference on Database Theory (pp. 316–330).
Springer-Verlag.
5. Safaei, F., Boustead, P., Nguyen, C. D., Brun, J., & Dowlatshahi, M. (2005). Latency-
driven distribution: Infrastructure needs of participatory entertainment applications. IEEE
Communications Magazine, 43(5), 106–112.
6. Mauve, M., Vogel, J., Hilt, V., & Effelsberg, W. (2004). Local-lag and timewarp: Providing
consistency for replicated continuous applications. IEEE Transactions on Multimedia, 6(1),
47–57.
7. Valancius, V., Laoutaris, N., Massoulie, L., Diot, C., & Rodriguez, P. (2009). Greening the
internet with nano data centers. In Proceedings of the ACM 5th International Conference on
Emerging Networking Experiments and Technologies (pp. 37–48).
8. Choy, S., Wong, B., Simon, G., & Rosenberg, C. (2012). The brewing storm in cloud gaming:
A measurement study on cloud to end-user latency. In Proceedings of the ACM 11th Annual
Workshop on Network and Systems Support for Games (pp. 1–6).
9. Delaney, D., Ward, T., & McLoone, S. (2006). On consistency and network latency in distributed
interactive applications: A survey part I. Presence: Teleoperators and Virtual Environment,
15(2), 218–234.
10. Freire, J., Koop, D., Santos, E., & Silva, C. (2008). Provenance for computational tasks: A
survey. IEEE Computing in Science and Engineering, 10(3), 11–21.
5 Data-Driven Approach to Deflate Consumption 47
11. Somula, R., & Sasikala, R. (2019). A honey bee inspired cloudlet selection for resource
allocation. In Smart Intelligent Computing and Applications (pp. 335–343). Springer.
12. Nalluri, S., Ramasubbareddy, S., & Kannayaram, G. (2019). Weather prediction using clus-
tering strategies in machine learning. Journal of Computational and Theoretical Nanoscience,
16(5–6), 1977–1981.
13. Sahoo, K. S., Tiwary, M., Mishra, P., Reddy, S. R. S., Balusamy, B., & Gandomi, A. H. (2019).
Improving end-users utility in software-defined wide area network systems. IEEE Transactions
on Network and Service Management.
14. Sahoo, K. S., Tiwary, M., Sahoo, B., Mishra, B. K., RamaSubbaReddy, S., & Luhach, A. K.
(2019). RTSM: Response time optimisation during switch migration in software-defined wide
area network. IET Wireless Sensor Systems.
15. Somula, R., Kumar, K. D., Aravindharamanan, S., & Govinda, K. (2020). Twitter senti-
ment analysis based on US presidential election 2016. In Smart Intelligent Computing and
Applications (pp. 363–373). Springer.
16. Sai, K. B. K., Subbareddy, S. R., & Luhach, A. K. (2019). IOT based air quality monitoring
system using MQ135 and MQ7 with machine learning analysis. Scalable Computing: Practice
and Experience, 20(4), 599–606.
17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., & Sree, K. V. (2019). POUPR:
Properly utilizing user-provided recourses for energy saving in mobile cloud computing.
In Proceedings of the 2nd International Conference on Data Engineering and Communication
Technology (pp. 585–595). Springer.
18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., & Nalluri, S. (2017). Genetic
algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians
Diabetes dataset. In International Conference on Computing Networking and Informatics
(ICCNI) (pp. 1–5). IEEE.
19. Somula, R., & Sasikala, R. (2019). A research review on energy consumption of different
frameworks in mobile cloud computing. In Innovations in Computer Science and Engineering
(pp. 129–142). Springer, Singapore.
20. Kumar, I. P., Sambangi, S., Somukoa, R., Nalluri, S., & Govinda, K. (2020). Server security
in cloud computing using block-chaining technique. In Data Engineering and Communication
Technology (pp. 913–920). Springer.
21. Kumar, I. P., Gopal, V. H., Ramasubbareddy, S., Nalluri, S., & Govinda, K. (2020). Dominant
color palette extraction by K-means clustering algorithm and reconstruction of image. In Data
Engineering and Communication Technology (pp. 921–929). Springer.
Chapter 6
Code-Level Self-adaptive Approach
for Building Reusable Software
Components
Sampath Korra, V. Biksham, Kotte Vinaykumar, and T. Bhaskar
Abstract The code-level framework is based on the notion that separate process
units can be well defined within the broader process of coding, compatible
with the life-cycle model. A code-centric framework is chosen to meet the
needs of domain-specific organizations and special projects. The framework is also
independent of any life-cycle model such as the waterfall model or spiral model. We
are able to create adaptive components that show the form and the content of each
type of artifact related to the software, including documentation, code, and the test-plan
specifications of the component. This article presents the building of adaptive code-level
reusable software components. A set of related components can be adapted
to create or modify a sample work product. Within the work product, each product
component is adaptive in nature.
S. Korra (B)
Department of CSE, Sri Indu College of Engineering and Technology, Sheriguda(V), Hyderabad,
India
e-mail: sampath_korra@yahoo.co.in
V. Biksham
Department of CSE, Sreyas Institute of Engineering and Technology, Hyderabad, India
e-mail: vbm2k2@gmail.com
K. Vinaykumar
Department of CSE, KITSW, Warangal, Telangana, India
e-mail: kvk.cse@kitsw.ac.in
T. Bhaskar
Department of CSE, CMRCET, Hyderabad, India
e-mail: bhalu7cs@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_6
50 S. Korra et al.
6.1 Introduction
The software development life cycle (SDLC) is a series of steps that provide a
common understanding of the software development process: how to develop software
by capturing business needs and translating those ideas and requirements
into features and functionality, until the software is used and supported to meet
the business needs. Software engineers must have sufficient knowledge to select an SDLC
model based on the project environment and business requirements. The
software engineering life cycle has a series of phases, and these phases are broadly the
same: requirements, analysis, design, implementation, testing, and debugging.
The phases are the same across models, although not necessarily with the same steps, emphasis,
or context. Part of the software product is available to be changed in order to develop
new products as well as to test and fix bugs. Before a test can be applied, the
design and/or implementation must be completed, and it is necessary to finish the
requirements before beginning the design phase [1]. In domain-specific engineering
(DsE), the component is the basic construction unit used to develop adaptive software
components. Components are combined to form work products, and these work
products are aggregated to develop adaptive components.
Adaptive components represent a uniform group of similar parts or components,
each of which can be used for the same purpose. The aim of adaptive components is
to make components easy to access and quick to use, especially for reuse in new products or
for modifying existing products to meet changing needs [2].
6.2 Literature Survey
6.2.1 Adaptive Reuse
The concept of adaptability is widely used in electronic circuit
modeling, automatic control, and other areas of system construction. Adaptive software
systems can quickly adapt to external changes. Adapting to external changes during life-cycle
maintenance requires little operator intervention, so such systems can
also greatly reduce maintenance costs and time [3].
6.2.2 Adaptable Software System Versus Traditional Software
System
The adaptive software system has many advantages over conventional software
systems. The main advantage is that adaptive system software accommodates external
changes, whereas a traditional system is hard to adapt: changed external conditions
require reprogramming, changes to the overall design and layout tests, and possibly
even a thorough re-analysis. Such changes over the system life cycle require highly
skilled software engineers, company operators, programmers, designers, analysts,
and staff [4].
6 Code-Level Self-adaptive Approach for Building Reusable 51
6.2.3 Adaptive Methods to Improve the Software System
Adaptive methods aim to improve the adaptability of system software. To adapt to
changing business needs and external environments, the adaptability of system
software can be improved in the following ways [5].
6.2.3.1 Improve Interoperability of the Software
Software design is based on business requirements, which must account not only for
the present but also for anticipated demands. Possible changes should therefore be
considered during the design phase of the system, so that when external conditions
change, the system can adapt and continue to work stably [6].
6.2.3.2 Improve the Description of the Software
The system should have a certain ability to describe itself, with changes in external
state expressed as far as possible in the form of parameters. If the conditions
change, only the corresponding parameters need to be adjusted, and there is no need
to make extensive changes to the software. Parts of the software that remain
essentially unchanged, often packaged as modules, can be treated as tools for
solving problems in the design phase, increasing the flexibility and efficiency of
the system [7].
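As an illustrative sketch of this parameterization idea (Python is used here for illustration; the names `RetryPolicy`, `fetch_with_policy`, and `flaky` are hypothetical, not from the chapter), a component's behavior can be driven entirely by a parameter object, so adapting to changed external conditions means adjusting parameter values rather than rewriting code:

```python
from dataclasses import dataclass

@dataclass
class RetryPolicy:
    """External conditions captured as adjustable parameters."""
    max_attempts: int = 3

def fetch_with_policy(fetch, policy: RetryPolicy):
    """Run `fetch` under the given policy. Adapting to a harsher
    environment only requires a new RetryPolicy, not a code change."""
    last_error = None
    for _ in range(policy.max_attempts):
        try:
            return fetch()
        except Exception as err:  # illustrative only; real code would be stricter
            last_error = err
    raise last_error

# A flaky operation that succeeds on its third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(fetch_with_policy(flaky, RetryPolicy(max_attempts=5)))  # prints: ok
```

The same component thus serves both a stable and an unstable environment; only the parameter values differ.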
6.3 Proposed Work
6.3.1 Software Engineering with Reusable Components
Reuse builds the solution for new products and processes from earlier software
development projects. By reusing the solution of a particular development effort,
the formation of new products and new technologies becomes possible [8].
52 S. Korra et al.
6.3.2 Strategy Development Reuse
A strategy is needed to create and manage the reuse of both code and processes. The
activities required to define the strategy depend on the organization and on how
the company markets its reusable components and systems. Based on an application
framework developed for the given domain, reuse supports the development of a
particular system and the maintenance of its source code [9].
6.3.3 The Incorporation Process to Reuse Project
Incorporating the reuse process into a project tailors a general, organization-wide
program to the particular project. The parent organization's common reuse program
is adapted to create and improve the production process and to manage code in the
specific situation [10].
6.3.4 Process Measurement and Evolution
The reuse process is measured and evaluated using captured data as input, in order
to create code and to manage the use of reuse processes and products [11].
6.3.5 Management Reuse
A component is the basic unit of software creation and an independent part of the
software that interacts with other components to accomplish a given task. Reusable
software components are designed to leverage both the software and the development
process, and each component has its own properties [12].
In the field of software reuse, the goal is a good technique for analyzing a domain
and the best classification schemes to use. Extending the principles of software
reuse to other areas that support reusability and management leads to a domain and
technical-field analysis model for concrete (non-abstract) components. The
different stages of the reuse life cycle are as follows [13, 14]:
(i) Understand: Understand the problem and determine the structure of a solution
based on the predefined components.
(ii) Reconfiguration: Reconfigure the solution structure to increase the possibility
of using predefined parts in the next stage.
(iii) Recovery: Acquire, evaluate, and instantiate the predefined components.
(iv) Search: Download exemplary and predefined parts.
(v) Compatibility: Modify and adapt the components [15].
(vi) Integration: Integrate the pieces into the product in this phase.
(vii) Assessment: Evaluate the reusability prospects of the components; components
obtained by editing predefined components should be added to the collection of
predefined components.
The different component metrics are as follows [16].
Component Efficiency Metrics (CEM): This metric measures the time and resources of
the development process based on the behavior of the component. Time may include
the business-component development time and the integration time.
Component Semantic Efficiency Measurement (CSEM): This metric measures the time and
resources spent on the semantic aspect of the component's behavior.
Component Reliability Metrics (CRM): This metric estimates the probability that the
component operates without defects for longer than a specified duration. Developers
can use the same techniques as in traditional systems to obtain such metrics,
taking into account factors such as portability, fault tolerance, and
recoverability.
Component Functional Metrics (CFM): This metric measures functional criteria such
as the precision of components, their interoperability, and their acceptance of the
needs of component-based development (CBD).
Component Customer Satisfaction Measurement (CCSM): This metric measures customer
satisfaction and expectations, i.e., how well the component software accords with
the customer's needs. Such estimates guide the decision to reuse a component and
can help in developing a test plan that accurately reflects the use of the product.
Component Cost Metrics (CCM): This metric measures the overall costs incurred
during component-based development, including component acquisition, component
integration, and improving the quality of the system.
Proposed Algorithm
The final goal is to develop an adaptive component so that it can be reused directly
in other software development:

$$S = \bigcup_{i=1}^{n} s_i$$

where $s_i$ and $S$ denote any software component and the complete software
component set, respectively.
6.4 Results and Discussions
In Table 6.1, we allow the values for component compatibility test metrics randomly
by assuming a range of values to get the adaptable components. In the first result,
we provided the values out of boundary level. Hence, none of the components is
compatible with adaptation. CSEM metric and CCSM metrics are an upper bound.
CRM and CCM are lower bound. CEM and CFM metrics are within the range. Table
6.1 is describing the range of software metrics values.
Figure 6.1 presents various out-of-range values for the different criteria; because
the selected metric values did not satisfy the requirements, none of the components
is selected as adaptive.

Table 6.1 Software metrics with range values

Software metrics   Range of values
CEM                1.5–2.0
CSEM               1.0–9.0
CRM                2.0–11.0
CFM                1.0–10.0
CCSM               1.0–15.0
CCM                3.0–11.0

Fig. 6.1 None of the components is adapted at code-based self-adaptive reuse
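The compatibility test behind Table 6.1 and Figs. 6.1 and 6.2 can be sketched as a simple range check: a component is selected as adaptive only if every metric value lies within its bounds. The following is an illustrative Python sketch (the per-component values are invented; only the ranges come from Table 6.1):

```python
# Metric ranges from Table 6.1: (lower bound, upper bound).
RANGES = {
    "CEM": (1.5, 2.0), "CSEM": (1.0, 9.0), "CRM": (2.0, 11.0),
    "CFM": (1.0, 10.0), "CCSM": (1.0, 15.0), "CCM": (3.0, 11.0),
}

def is_adaptive(metrics):
    """A component is adaptable only if every metric is within its range."""
    return all(lo <= metrics[name] <= hi for name, (lo, hi) in RANGES.items())

components = {
    # comp1 violates bounds (CSEM and CCSM too high, CRM and CCM too low).
    "comp1": {"CEM": 1.8, "CSEM": 12.0, "CRM": 1.0,
              "CFM": 5.0, "CCSM": 20.0, "CCM": 2.0},
    # comp2 and comp3 lie within all ranges.
    "comp2": {"CEM": 1.6, "CSEM": 4.0, "CRM": 6.0,
              "CFM": 7.0, "CCSM": 9.0, "CCM": 5.0},
    "comp3": {"CEM": 1.9, "CSEM": 8.0, "CRM": 10.0,
              "CFM": 2.0, "CCSM": 3.0, "CCM": 4.0},
}

adaptive = [name for name, m in components.items() if is_adaptive(m)]
print(adaptive)  # prints: ['comp2', 'comp3']
```

With out-of-range values the list is empty (as in Fig. 6.1); with in-range values, components such as comp2 and comp3 are selected (as in Fig. 6.2).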
We then set the values for the component compatibility test metrics again. The
second result depicts the successful identification of adaptive components among
the different components: here, we provided values within the range, so the
components are compatible for adaptation. In Fig. 6.2, different values were given
for the different metrics, and the two components comp2 and comp3 are found to be
code-level components that are adaptive in nature.
We supplied the metric values as actual values1 (result1) and actual values2
(result2) for comparison of the results. In the second result, the supplied values
lie within the range; hence, the components are compatible for adaptation in
result2.
Fig. 6.2 Two components adapted at code-based self-adaptive reuse
6.5 Conclusion and Future Work
Software reuse is practiced in almost all engineering disciplines, and reusing the
source code of an existing software system reduces maintenance costs. Reuse is
possible at a range of levels, from simple functions up to framework-level
integrated sets of software artifacts. Software reuse product lines require upfront
investment whose value is not realized immediately, so the plan must commit to
starting the reuse program. The reuse processes and programs should be incorporated
into the existing software development process, and the parts must be designed
specifically for reuse. Using different software metrics, we discussed adaptive
software reuse for different technologies; it helps to develop adaptive software.
Domain engineering leverages the inherent similarities of systems within a domain.
The domain model establishes the definition of the basic structure of current and
future needs. The right software processes can bring significant benefits to almost
every aspect of software development. Code-level compliance shows how software
reuse can be used to solve problems, which must be understood in any area of
software. In order to capture the essence of the technology developed for software
reuse and apply it to other, non-application areas, we need to identify processes
and products for future utilization.
References
1. Prieto-Diaz, R. (1990). Domain analysis: An introduction. ACM SIGSOFT Software Engi-
neering Notes, 15(2).
2. Kelly, T. P., & Whittle, B. R. (1995). Applying lessons learnt from software reuse to other
domains. In The Seventh Annual Workshop on Software Reuse, St. Charles, Illinois, USA
(pp. 28–30).
3. Ionital, M. T., Hammer, D. K., & Obbink, H. (2003). Scenario based software architecture
evaluation methods: An overview. Technical University.
4. Collier, R. W. (2002). Agent factory: A framework for the engineering of agent-oriented
applications.
5. Taylor, R. N., Tracz, W., & Coglianese, L. (1995). Software development using domain-specific
software architecture. ACM.
6. Grabow, P. C. (1984). Reusable software implementation technology reviews. Hughes Aircraft
Company.
7. Fonseca, S. P. (2002). An internal agent architecture for dynamic composition of reusable
agent subsystems—Part 1: Problem analysis and decomposition framework. Hewlett Packard
Company.
8. Sommerville, I. (2015). Software engineering (10th ed.).
9. Sametinger, J. (1997). Software engineering with reusable components. Springer Science &
Business Media.
10. Tracz, W. (1995). DSSA (domain-specific software architecture) pedagogical example. ACM.
11. Korra, S., Babu, A. V., & Raju, S. V. (2014). The adaptive approach to software reuse. In
International Conference on Contemporary Computing and Informatics (IC3I). IEEE.
12. Sedigh-Ali, S., Ghafoor, A., & Paul, R. A. (2001). Software engineering metrics for COTS-
based systems. IEEE Computer (pp. 44–50).
13. Hu, X., & Tian, Z. (2003). Research on software metrics. Information Technology, 27(5), 58–60.
14. Reddy, K. V., & Korra, S. (2018). Object-oriented analysis and design using UML. BS
Publications.
15. Korra, S., Vasumathi, D., & Vinayababu, A. (2018). An approach for cognitive software reuse
framework. In Second International Conference on Intelligent Computing and Control Systems
(ICICCS) (pp. 1–6). IEEE.
16. Gill, N. S., Lycett, M., & deCesare, S. (2002). Measurement of component-based software:
Some important issues. In 7th Annual UKAIS Conference Proceedings, Leeds, UK (pp. 373–
381).
Chapter 7
Design of a Deep Network Model
for Weed Classification
M. Vaidhehi and C. Malathy
Abstract Deep learning has attained superior performance in various aspects of
human life and the agricultural field. The primary target of deep learning for farming
is to predict the precise crop and weed location over the farmland. Here, a novel dense
convolutional neural networks (DCNN)-based network is proposed to differentiate
weeds from crops. The overall architecture of crop and weed prediction is extensive,
with enormous parameters requiring a longer training time. To handle certain limita-
tions, a novel idea of a training network is proposed to acquire a coarse–fine predic-
tion model, which is merged to achieve outcomes. The evaluation of the anticipated
model and comparison with other prevailing network models is performed using a
real-time dataset collected during the stage-by-stage growth of paddy. The experimental
outcomes and comparison proclaim that the anticipated network model
outperforms prevailing approaches such as Inception-V4, ResNet, LSTM-RNN, etc.
The simulation is done in MATLAB 2020a simulation environment and evaluated
with metrics like accuracy, precision, F1-score, and recall.
7.1 Introduction
Deep learning (DL) was depicted as a subset of machine learning (ML) as early as
the 1990s, when threshold reasoning was utilized to provision a computerized model
resembling biological pathways. This subject of study is still developing; it
may be split into two time periods: 1943–2006 and 2012 to the present [1]. Several
advancements were made during the first phase, including training algorithms, the
chain rule, the neocognitron, handwritten text recognition (the LeNet structure),
and overcoming the learning problem. Cutting-edge algorithms and architectures were
created for various applications in the second phase, including self-driving vehicles, medical,
M. Vaidhehi (B)·C. Malathy
SRM Institute of Science and Technology, SRM Nagar, Kattankulathur 603203, India
e-mail: vaidhehm@srmist.edu.in
C. Malathy
e-mail: malathyc@srmist.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_7
60 M. Vaidhehi and C. Malathy
language processing, earthquake forecasting, branding, finance, and image recognition.
AlexNet, which won the ImageNet object-identification competition known as the
ILSVRC, is regarded as a milestone in the area of DL [2].
As deep learning architectures evolved, researchers applied them to image prediction
and classification. These designs have also been used for a variety of agricultural
purposes. In [3], leaves were categorized among the available species using a CNN
and an RF encoder, with a CA score of 97.3%; it was, however, less effective in
detecting obstructed objects. Furthermore, the DL method is used to recognize
various plants. Sarker and Kim [4] utilized a user-modified CNN architecture,
whereas another author used the AlexNet architecture. The DL method for plant
identification was investigated and found to have a success rate of 91.78%. DL
methods are also utilized for essential tasks such as plant disease identification,
which is the subject of this paper. Some previous survey papers have summarized
DL's agricultural research (including plant disease identification), but they lack
some aspects, i.e., visualization techniques used in conjunction with DL and
stacked versions of diverse DL models adopted for plant disease prediction. The
research contributions are as follows:
(1) The input is collected in a real-time environment based on crop growth stages,
i.e., a real-time dataset with a large number of samples, well suited to the DL
classification process.
(2) A novel dense CNN (DCNN) is designed to perform the classification and generate
the outcomes. The model acts as a predictor approach to help the physicians
during the time of complexity.
(3) The approach is trained to handle over-fitting issues and is evaluated with
various metrics, namely prediction accuracy, precision, recall, and F1-score.
The work is structured as follows: Sect. 7.2 provides an extensive review of the
various existing approaches used for weed prediction, along with their pros and
cons. Section 7.3 gives a detailed description of the anticipated DCNN model for
weed prediction and its structure. In Sect. 7.4, the numerical results attained
with the proposed model are provided, and a detailed analysis against various
existing approaches shows the model's superiority. In Sect. 7.5, the research
summary is given, with certain research constraints and ideas for future research
enhancements.
7.2 Related Works
The detection of plant density and distribution is a crucial topic of research. We can
assess weed coverage and dispersion over a field using automatic pattern recognition
and image processing approaches. These estimates can then be used to treat weeds
appropriately. Developing a sprayer that can autonomously treat weedy regions in
crops is an intriguing application [5]. Unexpected field circumstances, changing
weather conditions might alter dataset, and differences in lighting conditions are all
critical obstacles in this study domain [6]. With each growth stage, plant morphology,
7 Design of a Deep Network Model for Weed Classification 61
texture, and the color of crops and weeds change dramatically. The majority of
computer vision research aims at detecting distinct weed kinds in diverse images,
such as segmenting narrow- and broad-leaf images. Using the wavelet transform,
Dyrmann et al. [7] categorized weeds as narrow or wide. After that, background
objects are removed using the difference between two channels. Weed classification
is based on tightness, momentum, curvature, and Fourier coefficients, and wheat
images are utilized for training. To categorize the images, both supervised and
unsupervised clustering are employed. The plant border, the interior plant region,
and the breadth and height with the shortest distance of the plant are the
parameters utilized for classification in [8], which suggested weed detection for
maize plants. Image analysis accuracy is measured for both static and moving
robotic images, and weeds are identified and classified using an artificial neural
network (ANN).
Another study in this field looked at the identification of maize plants and weeds
in various environments. Using the db1 wavelet [8], two-level wavelet decomposition
is performed, and the power of the wavelet coefficients at the two levels is
determined. A backpropagation network is then used to classify seven
characteristics, along with the frequency and fractal dimensions of the wavelet
coefficients. Potena et al. [9] developed a weed control spraying system that uses
features derived from GLCM, FFT, and the scale-invariant feature transform (SIFT)
to identify weeds as broad or narrow leaf. In the two-dimensional FFT, maximum and
minimum coefficients are used to categorize weeds into two kinds; SIFT is based on
the image's difference of Gaussians. In terms of accuracy, SIFT surpassed the other
two methods. Simonyan and Zisserman [10] developed a method for distinguishing
grass weeds from rice using UAV data, which used a deep convolutional network to
classify images into weed patches, rice patches, and other ground, among other
things. Another study used an SVM- and ANN-based feature set of invariant moments
and Fourier descriptors to detect weed in sugar beet. Mubeen et al. [2] modeled an
ML approach for detecting weed patches in images collected from a height of 50 m
using a drone. The classifiers are trained using nine texture features: Haralick's
and GLCM descriptors. For weeds that resemble rice seedlings, such as grasses, this
approach is inaccurate. The closest study, by dos Santos Ferreira et al. [11],
concentrates on vegetation categorization in soybean using LBP features followed by
GLCM. It uses grass density to classify images and identify those that pose a fire
danger along roadways. As a voting-majority predictor, they utilized a mixture of
three classifiers: ANN [13], k-NN [14], and linear SVM [15]. Broadleaf weed density
categorization into three wheat crop classes is another effort similar to our
approach [16, 17]. Another approach is to assess weed density in a recorded video
frame or image and spray based on density rather than the precise weed location
[18–20]. There have been studies on weed coverage, but none have specifically
addressed grass density categorization in rice fields.
7.3 Methodology
This section involves two major phases: (1) dataset acquisition and (2) classification
with dense CNN model. The simulation is done in the MATLAB 2020a environment,
and metrics like accuracy, precision, F1-score, and recall are evaluated and compared
with other approaches to depict the significance of the anticipated model.
7.3.1 Dataset Description
The dataset is built from real-time samples collected over a period of time. The
images are captured using a Canon SX730HS 20.3 MP digital camera, and 1000 samples
are collected. Here, 700 samples are used for training and 300 samples for testing.
The label counts for the corresponding dataset are 81 for paddy and 320 for weed,
respectively. Each input image is resized to 200 rows and 200 columns.
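The data preparation above can be sketched as follows (illustrative Python, although the chapter's experiments are run in MATLAB; the file names are placeholders): the 1000 samples are shuffled and divided into 700 training and 300 testing samples, each to be resized to 200 × 200.

```python
import random

# Placeholder identifiers standing in for the 1000 captured images.
samples = [f"img_{i:04d}.jpg" for i in range(1000)]

random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(samples)

train, test = samples[:700], samples[700:]  # 700 training / 300 testing
TARGET_SIZE = (200, 200)   # each input image resized to 200 rows x 200 columns

print(len(train), len(test))  # prints: 700 300
```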
7.3.2 Dense Convolutional Neural Networks (DCNN)
Multiple fully connected layers (FCLs) are adopted to adapt the network to the
target domain, a step generally known as fine-tuning. It is considered an essential
step for transfer learning with pre-trained networks. Figure 7.1 depicts the
experimental flow of the anticipated model.
The activation features of the initial convolution layers respond to pixels with
specific attributes such as color, edges, and lines within the filter window. The
edge-based features pass through the intermediate CNN layers, where they are
combined by many filters whose weights are initialized randomly and updated through
backpropagation training.
The intermediate layers tend to identify the activated image parts, while the final
layers learn the discriminative features (pattern and shape) between the target
regions. When training reaches convergence, the weights no longer change and the
training accuracy reaches its maximum value, thereby terminating the training
process. Consequently, the proposed CNN model is trained and acts as a generic
feature extractor, analogous to the conventional way of generating features. Here,
multiple 2D filters produce 4D outputs in every connected layer, i.e., one feature
map per filter. Convolving the features with the provided filters predicts the
occurrence of the features over the image. The numbers of filters are 64, 128, and
256. The convolution window moves according to the stride size. The network layer
convolves the input by sliding the filters horizontally and vertically: it
evaluates the weighted dot product of the filter and the given input and
subsequently adds the bias term. As the filter moves along the input, it uses the
same weight set and bias for each convolution, thereby forming a feature map.
Fig. 7.1 Experimentation workflow
During the training process, the gradient of the error is backpropagated through
the transformations to evaluate the gradients of parameters such as the batch
normalization transformation. The initial layers, i.e., convolution, batch
normalization, max-pooling, and ReLU, are defined based on the optimal layers with
an input of 64 × 64 (2D scan). The final feature size after convolution is
[2 × 2 × 2] for all 64 filters, which shows that the filter kernels possess two
pixels for all filters. Therefore, this work does not consider more than five
convolution layers. Training and validation are provided to project how the
proposed 2D CNN influences the training process and to assist in a better
understanding of the convergence process for all CNNs. To better understand the
feature extraction of the provided convolutional layers, an image from every domain
is passed through the network and every layer is monitored. The observations help
to distinguish the patterns, intensities, lines, and edges. The layers are
visualized for the entire testing set, which therefore helps to segregate the
features from the provided framework in a superior manner. Finally, the outcomes
for the various hyper-parameter settings are provided.
The filter size depicts the scanning window during the convolution process, and a
stride of two increases the window size for consecutive layers. Therefore, features
are extracted sequentially at the lower, intermediate, and higher levels. Here,
lower-level features are extracted with a 3 × 3 × 3 filter window and maximum
pooling over 2 × 2 × 2 windows with the stride of the convolutional layers. The
filter size increases based on the step size; however, the number of filters in
every layer is kept at 64 so as to preserve the channel size for the given
64 × 64 × 64 input. Thus, the variations are captured more straightforwardly. As
the layers go deeper, features are accumulated by increasing the window size of
each layer. The max-pooling strides are also increased to diminish feature
redundancy. The kernel size is uniform at 3 × 3 × 3 for all convolutional layers.
7.4 Numerical Results and Discussion
Here, four metrics are utilized for quantitative comparison and evaluation, including
accuracy, precision, recall, and F1-score. In the equations given below, True
Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN)
counts are used for evaluation, expressed as in Eqs. (7.1)–(7.4):

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{7.1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7.2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7.3}$$

$$\text{F1-score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{7.4}$$
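Equations (7.1)–(7.4) can be computed directly from the confusion-matrix counts. The sketch below evaluates them for invented counts (these numbers are illustrative only, not the chapter's results):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score per Eqs. (7.1)-(7.4)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # Eq. (7.1)
    precision = tp / (tp + fp)                   # Eq. (7.2)
    recall = tp / (tp + fn)                      # Eq. (7.3)
    f1 = 2 * tp / (2 * tp + fp + fn)             # Eq. (7.4)
    return accuracy, precision, recall, f1

# Illustrative confusion-matrix counts.
acc, prec, rec, f1 = classification_metrics(tp=85, tn=90, fp=10, fn=15)
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
# prints: acc=0.875 prec=0.895 rec=0.850 f1=0.872
```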
Table 7.1 depicts the DCNN performance against the existing LSTM-RNN, SAE-DNN,
2DCNN, and standard CNN models; metrics including accuracy, precision, recall, and
F1-score are evaluated and compared among these models. Table 7.2 depicts the DCNN
performance against the existing Inception-v4, ResNet, and CNN models, with the
same metrics evaluated and compared. Figure 7.2 represents the training and
validation accuracy, whereas Fig. 7.3 represents the training and validation loss.
Figure 7.4 shows the accurate classification result between paddy and weed.

Table 7.1 Performance metrics evaluation
Metrics LSTM-RNN (%) SAE-DNN (%) 2DCNN (%) CNN (%) DCNN (%)
Accuracy 78 77 83 76 89.47
Precision 68 73 84 76 86.66
Recall 76 77 83 75 89.47
F1-score 72 75 82 64 90.41
DL is considered an excellent alternative for dealing with such constrained data.
Even though hybrid approaches have achieved relatively superior outcomes, they do
not benefit from DL, which extracts features automatically from a large amount of
imaging data. The most commonly adopted DL approaches in computer vision studies
are CNNs, which specialize in extracting image characteristics. The DCNN model
proposed in this work shows superior performance for weed classification.

Table 7.2 Performance evaluation of proposed versus existing
Metrics Inception-v4 (%) ResNet (%) CNN (%) DCNN (%)
Accuracy 75 82 83 89.47
Precision 67 68 84 86.66
Recall 75 82 83 89.47
F1-score 71 75 82 90.41

Fig. 7.2 Representation of training and validation accuracy
Fig. 7.3 Representation of training and validation loss
Fig. 7.4 Classified paddy and weed
7.5 Conclusion
In this research, a novel deep CNN-based network model is proposed to differentiate
weeds from crops. To handle certain limitations of the general approaches, a novel
idea of a training network is proposed to acquire a coarse–fine prediction model,
which is merged to achieve the outcomes. Some research limitations are the
complexity of testing specific weed growth stages and weed (grass) types. Also,
these network models do not target the prediction of other kinds of weeds such as
sedges, broadleaf weeds, etc.
In the future, the detection of these kinds of grasses at multiple growth stages
will be considered. Further research directions are to predict weeds in
multiple-weed images, i.e., images with various weeds, and to evaluate the density
over a specific region for predicting weed density. Next, the concatenation of
diverse optimization approaches is highly solicited to acquire global outcomes.
References
1. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural
networks. Science, 313(5786), 504–507.
2. Mubeen, K., Nadeem, M. A., Tanveer, A., & Jhala, A. J. (2014). Effects of seeding time and
weed control methods in direct seeded rice (Oryza sativa L.). Journal of Animal and Plant
Science, 24(2), 534–542.
3. Rao, A. N., & Ladha, J. K. (2013). Economic weed management approaches for rice in Asia.
In The Role of Weed Science in Supporting Food Security by 2020. Proceedings of the 24th
Asian-Pacific Weed Science Society Conference, Bandung, Indonesia, October 22–25, 2013
(pp. 500–509). Weed Science Society of Indonesia.
4. Sarker, M. I., & Kim, H. (2019). Farm land weed detection with region-based deep convolutional
neural networks. arXiv preprint arXiv:1906.01885.
5. Wendel, A., & Underwood, J. (2016, May). Self-supervised weed detection in vegetable crops
using ground based hyperspectral imaging. In 2016 IEEE International Conference on Robotics
and Automation (ICRA) (pp. 5128–5135). IEEE.
6. Garcia-Ruiz, F. J., Wulfsohn, D., & Rasmussen, J. (2015). Sugar beet (Beta vulgaris L.)
and thistle (Cirsium arvensis L.) discrimination based on field spectral data. Biosystems
Engineering, 139, 1–15.
7. Dyrmann, M., Jørgensen, R. N., & Midtiby, H. S. (2017). RoboWeedSupport—Detection
of weed locations in leaf occluded cereal crops using a fully convolutional neural network.
Advances in Animal Biosciences, 8(2), 842–847.
8. Lottes, P., Behley, J., Milioto, A., & Stachniss, C. (2018). Fully convolutional networks with
sequential information for robust crop and weed detection in precision farming. IEEE Robotics
and Automation Letters, 3(4), 2870–2877.
9. Potena, C., Nardi, D., & Pretto, A. (2016, July). Fast and accurate crop and weed identification
with summarized train sets for precision agriculture. In International Conference on Intelligent
Autonomous Systems (pp. 105–121). Springer.
10. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
11. dos Santos Ferreira, A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T. (2017).
Weed detection in soybean crops using ConvNets. Computers and Electronics in Agriculture,
143, 314–324.
12. Myers, D., Ross, C. M., & Liu, B. (2015). A review of unmanned aircraft system (UAS)
applications for agriculture. In 2015 ASABE Annual International Meeting (p. 1). American
Society of Agricultural and Biological Engineers.
13. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for
biomedical image segmentation. In International Conference on Medical Image Computing
and Computer-Assisted Intervention (pp. 234–241). Springer.
14. Ramprakash, T., Madhavi, M., & Yakadri, M. (2013). Influence of bispyribac sodium on soil
properties and persistence in soil, plant and grain in transplanted rice. Progressive Research,
8(1), 16–20.
15. Rao, A. N., & Chauhan, B. S. (2015). Weeds and weed management in India—A review.
16. Rashid, M. H., Alam, M. M., Rao, A. N., & Ladha, J. K. (2012). Comparative efficacy of
pretilachlor and hand weeding in managing weeds and improving the productivity and net
income of wet-seeded rice in Bangladesh. Field Crops Research, 128, 17–26.
17. Bakhshipour, A., Jafari, A., Nassiri, S. M., & Zare, D. (2017). Weed segmentation using texture
features extracted from wavelet sub-images. Biosystems Engineering, 157, 1–12.
18. Lavania, S., & Matey, P. S. (2015, February). Novel method for weed classification in
maize field using Otsu and PCA implementation. In 2015 IEEE International Conference
on Computational Intelligence & Communication Technology (pp. 534–537). IEEE.
19. Rumpf, T., Römer, C., Weis, M., Sökefeld, M., Gerhards, R., & Plümer, L. (2012). Sequential
support vector machine classification for small-grain weed species discrimination with special
regard to Cirsium arvense and Galium aparine.Computers and Electronics in Agriculture, 80,
89–96.
20. Yu, J., Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2019). Detection of broadleaf weeds
growing in turfgrass with convolutional neural networks. Pest Management Science, 75(8),
2211–2218.
Chapter 8
E-Voting System Using U-Net
Architecture with Blockchain Technology
Nuthalapati Sudha and A. Brahmananda Reddy
Abstract In order to give power to the right person, any democracy must have a fair voting system that meets the needs of the people. In India, there are two types of voting system: paper ballots and electronic voting machines (EVMs), both of which have similar drawbacks. Providing security in a digital voting system has long been a major concern. The proposed work implements an interface in which the user first registers and is then authenticated and validated through several phases, such as OTP and IRIS recognition, to establish whether they are a valid or fraudulent voter. The election process is conducted on blockchain technology, which provides transparency, security and reliability. Therefore, the voting percentage should increase, because each and every vote counts and there is no delay in counting ballots.
8.1 Introduction
In 1989, the Election Commission of India collaborated with Bharat Electronics Limited and Electronics Corporation of India Limited to create the Indian electronic voting machine (EVM). The EVMs were designed by faculty members at IIT Bombay's Industrial Design Centre. Traditional voting methods such as paper voting, punch-card voting and ballot systems face practical problems; during covid-19, for example, the voting percentage in the recent GHMC elections decreased drastically because of the pandemic. Conventional voting also suffers from time consumption, mountains of paperwork and limited mobility; officials play no direct role once a machine is damaged, and there are no updates. An online voting system (OVS) can overcome these weak points by offering a better voting method in terms of correctness, convenience, flexibility, confidentiality, reliability, validity and portability, and a voter can cast his/her vote from anywhere through the OVS.
N. Sudha ·A. Brahmananda Reddy (B)
VNR VJIET, Bachupally, Hyderabad, India
e-mail: brahmanandareddy_a@vnrvjiet.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_8
Electronic voting (e-voting) has been one of the blockchain’s (BC) newer appli-
cations, with academics trying to take advantage of benefits including integrity,
anonymity and non-repudiation, all of which are vital in a voting application. A
block is a collection of data. Mining is the process of gathering and processing data so that it fits into a block. Each block can be identified by a cryptographic hash (also known as a digital fingerprint). Every block contains the hash of the previous block, so the blocks form a chain running back to the first block ever created (known as the Genesis block); in this way, all of the data are linked together in a linked-list structure. In 2008, a person (or group of persons)
going by the name Satoshi Nakamoto created the BC to serve as the public transac-
tion record for the cryptocurrency bitcoin. Satoshi Nakamoto’s true identity is still
unknown. With the creation of the BC for bitcoin, it became the first digital currency
to solve the double-spending problem without the need for a trusted authority or
central server. Every user on the BC network has the ability to connect, send new
transactions, validate transactions and generate new blocks. A cryptographic hash is
assigned to each block, which remains true as long as the information in the block
is not changed. The voter's transaction sends a voting token to the preferred candidate's address as the receiver's address. If one of the assigned miners confirms this transaction into a block, it becomes part of the main BC. This may help to increase voter participation. The key advantages of developing decentralised applications are that there is no single point of failure, so our programme will continue to function even if some machines fail. These applications are more secure, and they are transparent: any participant can check that everyone else is following the rules and that no one is tampering with the data. Ethereum is a flexible platform that can be used to create both private and public decentralised applications. The basic concepts of an e-voting system using blockchain technology and the system architecture were introduced in the research article titled "E-Voting system using Deep Learning and Blockchain technology."
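The chain of hash-linked blocks described above can be sketched in a few lines of Python. This is a minimal illustration of the linking mechanism only; the block fields and the vote strings are hypothetical, not the chapter's actual data format.

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents deterministically (sorted keys).
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(data, prev_hash):
    # Each block stores the hash of its predecessor, forming the chain.
    return {"data": data, "prev_hash": prev_hash}

# Genesis block: the first block, with no real predecessor.
genesis = make_block("genesis", prev_hash="0" * 64)
chain = [genesis]

# Append two illustrative vote transactions; each new block links back
# to the hash of the block before it.
for tx in ["voter_1 -> candidate_A", "voter_2 -> candidate_B"]:
    chain.append(make_block(tx, prev_hash=block_hash(chain[-1])))

def chain_is_valid(chain):
    # Tampering with any block changes its hash and breaks every later link.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Because each block's hash covers its data and its predecessor's hash, altering any stored vote invalidates every subsequent link, which is what makes tampering detectable.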
8.2 Literature Survey
Gomathi and Veena Priyadarshini [1] provide an interface implemented with three security measures: a magnetic-coated card scan, fingerprint recognition and password validation. These measures secure the interface so that no fraudulent voter can be entertained, and a voter cannot vote more than once. However, the system requires an additional device for fingerprint recognition, and more advanced or enhanced operations need to be implemented in order to safeguard the system from malicious threats.
Yu et al. [2] introduce Hyperledger as the BC framework, implement consensus using PBFT, and perform authentication with a ring-signature method, which increases the number of voters that can be handled. However, transactions take more time, and the framework does not support transactional parallelism.
Hjálmarsson et al. [3] describe a system comprising district nodes that manage the boot node's smart contract. Exonum, Quorum and Geth are the three frameworks recommended. Exonum is a premium system that can be used with cryptocurrency, making it prohibitively expensive for larger implementations; other open-source frameworks are available that can give better performance. Quorum and Geth are Ethereum-based frameworks that do not allow parallel transaction execution, limiting scalability and speed.
Zhang and Romero [4] use Paillier homomorphic encryption, together with other cryptographic methods such as blind signatures and zero-knowledge proofs, in their e-voting system. The technique also meets the requirements of a generic voting system, such as eligibility, accuracy, simplicity, privacy and robustness. It is written in C++ and uses GMP, a multiple-precision arithmetic library, and the "Paillier" library for the computations required for encrypting, decrypting, vote validation and tallying. To increase overall system performance, some parts of the system, such as certifying the sealed ballot, verifying legitimate votes and counting the votes, could be made to run in parallel.
Zagórski et al. [5] describe a system in which the voter registers for the election and receives a QR code from the voting assistant; this QR code is scanned at the time of the election. The voter can then cast a vote for his/her favourite candidate. The vote is encrypted and is decrypted only when the results are tallied.
Kshetri and Voas [6] work with BC technology in which each voter holds a "wallet" with user credentials. Each voter receives a single coin in his wallet that represents a vote, and he can spend the coin only once; voting moves the coin of the elector into the pocket of a nominee. Voters can change their vote from one candidate to another before a pre-set deadline, and because a voter may vote for the wrong candidate by mistake, a confirmation is requested before the vote is submitted. Eligible electors vote anonymously on a machine or mobile, and the blockchain-enabled voting uses an encrypted personal identification key, which limits each voter to one vote and prevents voting fraud.
Ismail and Kintu [7] deal with fingerprint recognition, where fingerprints are collected traditionally using ink or electronically using screen sensors such as a fingerprint reader/scanner. The fingerprints are used for authentication, which requires a user's verification and identification. Verification matches a captured fingerprint against biometric fingerprints already in the database. The system involves four distinct components: a matching module with a decision module, in which the user is checked against the claimed identity; a device storage module; and a database in which data collected during fingerprint registration are stored for later retrieval. Matching against the corresponding user's record is 1:1 verification, whereas comparing against the fingerprints of all registered users is 1:N identification. The system does not run on various devices, for example, mobile phones.
Arshad et al. [8] provide a detailed description of how their system prevents double voting (voting by the same person in multiple areas). The voter must first register and be authenticated using fingerprint recognition; after the authentication process is completed, the person can vote. This information is then used to generate a cryptographic hash, which contributes to the creation of a transaction ID that is transmitted to the voter's email address. Fingerprint recognition requires the use of another device and direct contact, which is an undesirable safety risk at this moment of covid, as cases are spreading widely.
8.3 Methodology
Registration Phase: The user has to register within the application by providing a few details such as name, Aadhaar number and phone number. Later, the voter will be validated as to whether he/she is a valid voter or not.
Slogan: Vote for the best candidate and bring shine into your life.
Authentication Phase
Authentication Phase 1: The UIDAI ("Authority") issues a 12-digit number to Indian citizens who have completed the Government's identity verification process. Everyone residing in India, irrespective of age, is eligible to apply for an Aadhaar number (Fig. 8.1).
Authentication Phase 2: In the second phase, the voter receives a code on their mobile number; this checks whether the information submitted by the voter is correct or not. By delivering a one-time code to a mobile phone number, we can also prevent voter fraud. SMS validation, also known as SMS two-factor authentication (2FA) or SMS one-time passcode (OTP), allows users to confirm their identities through a code texted to them. It is a type of 2FA that functions as a second verifier for users attempting to access a system, device or app, and it is a solid initial step toward added protection. Following sign-in, the user receives a text message containing an SMS authentication token. To gain access, the user simply inputs the code on the interface, and the voter's code is validated (Fig. 8.2).
Fig. 8.1 Aadhaar card
Fig. 8.2 Process of authentication
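The OTP step can be illustrated with a minimal sketch. This is a generic illustration using only Python's standard library, not the API of the 2Factor service used in the actual implementation; the server-side key handling is an assumption.

```python
import hmac
import hashlib
import secrets

def generate_otp(digits=6):
    # A uniformly random numeric one-time code, as delivered over SMS.
    return str(secrets.randbelow(10 ** digits)).zfill(digits)

def store_otp(otp, server_key):
    # The server keeps only an HMAC of the code, never the code itself.
    return hmac.new(server_key, otp.encode(), hashlib.sha256).hexdigest()

def verify_otp(submitted, stored_digest, server_key):
    candidate = hmac.new(server_key, submitted.encode(),
                         hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, stored_digest)

key = secrets.token_bytes(32)   # per-session server secret (illustrative)
otp = generate_otp()            # this value would be texted to the voter
digest = store_otp(otp, key)
```

In practice the stored digest would also carry an expiry timestamp so that a code cannot be replayed later.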
Authentication Phase 3: IRIS recognition using convolutional neural networks
(Fig. 8.3).
Pre-processing an image: In pre-processing, the images are first converted from BGR into grayscale; then, using thresholding and Canny edge detection, we localise the iris region of the image. Secondly, normalisation is a critical step that ensures each input parameter (each pixel, in this case) has a similar data distribution. This speeds up convergence whilst training the network (Fig. 8.4).
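The grayscale conversion and normalisation steps can be sketched in plain Python. A real pipeline would use a library such as OpenCV, which also supplies the thresholding and Canny edge detection; the luminance weights below are the conventional BT.601 values, an assumption not stated in the chapter.

```python
from statistics import mean, pstdev

def bgr_to_gray(pixel):
    # Conventional luminance weights; note the OpenCV-style channel
    # order is B, G, R.
    b, g, r = pixel
    return 0.114 * b + 0.587 * g + 0.299 * r

def preprocess(image):
    # image: rows of (B, G, R) tuples with values in 0..255.
    gray = [[bgr_to_gray(px) for px in row] for row in image]
    flat = [v for row in gray for v in row]
    mu, sigma = mean(flat), pstdev(flat) or 1.0
    # Normalise to zero mean / unit variance so every input pixel has a
    # similar data distribution, which speeds up training convergence.
    return [[(v - mu) / sigma for v in row] for row in gray]
```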
Segmentation: Image segmentation is the process of dividing an image into parts with similar attributes, based on their features and properties. Its primary goal is to simplify the image so that it can be analysed more easily; it is a subset of digital image processing.
Train and Test: The data set is split 60/40: 60% of the images are used for training and 40% for testing (Fig. 8.5).
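A reproducible 60/40 split can be sketched as follows; the file names and seed are illustrative, not the chapter's actual data set.

```python
import random

def train_test_split(samples, train_frac=0.6, seed=42):
    # Shuffle a copy so the split is random but reproducible.
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# Hypothetical image file names standing in for the iris data set.
images = [f"iris_{i:03d}.png" for i in range(100)]
train, test = train_test_split(images)
```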
Fig. 8.3 Flow process of IRIS recognition
Fig. 8.4 Input image: IRIS pattern
Architecture (model): The U-Net model is a widely used deep convolutional network (CNN) architecture for biomedical and other image translation tasks. Its advantage is its capacity to produce relatively accurate models from (very) tiny data sets, which is a prevalent issue in data-constrained image processing applications like iris recognition. U-Net uses an encoder–decoder architecture composed of deep neural encoding and decoding levels: the encoding path is on the left side of the network, and the decoding path is on the right. The encoder adopts a conventional CNN architecture popularised by the visual geometry group (VGG) network, which comprises two 3 * 3 convolutions followed by a rectified linear unit (ReLU) activation layer with maximum pooling. Every encoder down-sampling step doubles the number of feature channels whilst halving the image resolution. The decoder part then up-samples the lower layer's feature maps and concatenates and crops the encoder part's output of the same depth. This ensures that data are propagated at all scales from the encoder to the decoder and that no information is lost during the encoder's down-sampling on the way to the U-Net model's output. The network's final layer is a 1 × 1 convolutional layer that integrates the output signals from the previous layer and generates a segmentation (into two classes: iris vs. non-iris) that represents the U-Net model's result (Fig. 8.6).
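The channel-doubling and resolution-halving bookkeeping described above can be traced with a small helper. The base channel count (64) and depth (4) follow the classic U-Net of Ronneberger et al. [13] and are assumptions, since the chapter does not state its exact configuration.

```python
def unet_feature_shapes(resolution=256, base_channels=64, depth=4):
    """Track (resolution, channels) through a U-Net encoder and decoder.

    Each encoder step halves the spatial resolution and doubles the
    number of feature channels; the decoder mirrors this, up-sampling
    and concatenating the encoder map of the same depth (skip link).
    """
    encoder = []
    res, ch = resolution, base_channels
    for _ in range(depth):
        encoder.append((res, ch))
        res, ch = res // 2, ch * 2
    bottleneck = (res, ch)
    decoder = []
    for enc_res, enc_ch in reversed(encoder):
        # After up-sampling, the skip concatenation momentarily doubles
        # the channels before convolutions reduce them back to enc_ch.
        decoder.append((enc_res, enc_ch))
    # Final 1x1 convolution maps to 2 classes: iris vs. non-iris.
    return encoder, bottleneck, decoder, (resolution, 2)
```

Calling `unet_feature_shapes()` makes the symmetry of the "U" explicit: the decoder visits exactly the resolutions and channel counts of the encoder, in reverse.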
Fig. 8.5 U-Net model
Fig. 8.6 Flow process of model
Election process:
The voter has to select and cast a vote for the best candidate, the one who deserves it or who can handle the government.
After the registered voter has been validated, the ballot paper becomes visible to the voter.
The voter submits the vote and then receives an acknowledgement (alert message) that the vote has been cast for the particular candidate (example: Id: 234,561 your vote has been casted to candidate1 and thank you for voting).
After the election is over, the votes are automatically displayed to the user.
Fig. 8.7 Chain of blocks in blockchain [1]
Vote transaction: When a person votes, he or she interacts with a smart contract called a ballot. The BC interacts with this smart contract, which adds the vote to the BC. After the voter submits his or her preferred candidate, the votes are held in a mempool. Miners then pick the transaction up from the mempool and add it to a block in the BC. Each vote is saved as a transaction on the BC, and each voter is given the transaction ID for their vote. On the BC, each transaction has information about who voted for which candidate. The data in the BC are stored using a Merkle tree, which requires very little memory. Each vote is appended to the BC via the relevant ballot smart contract (Fig. 8.7).
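The Merkle-tree storage of vote transactions can be sketched as follows. The vote strings are illustrative; a single root hash then commits to every vote in the block, so any tampering with a stored vote changes the root.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    """Compute the Merkle root (hex) of a list of vote transactions."""
    if not transactions:
        return _h(b"").hex()
    # Hash each vote transaction to form the leaf level.
    level = [_h(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        # Pairwise-hash adjacent nodes up to the next level.
        level = [_h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()

votes = ["Id: 234561 -> candidate1", "Id: 234562 -> candidate2",
         "Id: 234563 -> candidate1"]
root = merkle_root(votes)
```

Only the 32-byte root needs to be kept in the block header, which is why the chapter notes that Merkle-tree storage requires very little memory.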
Announce the results: The election commission will publish the results in the
website once the voting period has ended.
8.4 Results and Discussion
An OTP is sent to the user using the 2Factor service (Fig. 8.8).
IRIS Recognition: The model achieved an overall accuracy of 95%, which is significantly higher than traditional methods (Figs. 8.9, 8.10 and 8.11).
Conditions to be followed
According to Article 58 of the Constitution, no one may be elected President unless he is a citizen of India.
Every individual voter should be at least 18 years of age at the time of registering, so that they are eligible to vote according to the Election Commission of India.
Fig. 8.8 OTP has been sent to a user
Fig. 8.9 IRIS pattern
8.5 Conclusion
The three levels of authentication (Aadhaar ID verification, OTP sent to the registered mobile number and finally IRIS recognition) are used, and a CNN technique is applied to detect fraudulent voters. Once the election process is done, the system also helps in declaring accurate results.
Fig. 8.10 Accuracy
Fig. 8.11 Election process
References
1. Gomathi, B., & Veena Priyadarshini, S. (2013). Modernized voting machine using finger print
recognition. IJSER, 4(5). ISSN: 2229-5518.
2. Yu, B., Liu, J. K., Sakzad, A., Nepal, S., Steinfeld, R., Rimba, P., & Au, M. H. (2018). Platform
independent secure blockchain-based voting system (pp. 369–386). Springer. https://doi.org/10.
1007/978-3-319-99136-8_20
3. Hjálmarsson, F., Hreiðarsson, G. K., Hamdaqa, M., & Hjálmtýsson, G. (2018). Blockchain-based
e-voting system (pp. 983–986). IEEE.
4. Zhang, M., & Romero, S. (2020). Design and implementation of an e-voting system based on
Paillier encryption (pp. 815–831). Springer.
5. Gaweł, D., Kosarzecki, M., Vora, P. L., Wu, H., & Zagórski, F. (2017). Apollo—End-to-end
verifiable internet voting with recovery from vote manipulation. Springer.
6. Kshetri, N., & Voas, J. (2018). Blockchain-enabled E-voting. IEEE Software, 35, 95–99.
7. Ismail, M., & Kintu, N. B. (2018). A secure e-voting system using biometric fingerprint and
crypt-watermark methodology. IEEE.
8. Khan, K. M., Arshad, J., & Khan, M. M. (2018). Secure digital voting system based on blockchain
technology. International Journal of Electronic Government Research (IJEGR), 14, 53–62.
Chapter 9
Multi-layered Architecture to Monitor
and Control the Energy Management
in Smart Cities
A. K. Damodaram, S. Sreenivasa Chakravarthi, L. Venkateswara Reddy,
and K. Reddy Madhavi
Abstract Energy meter monitoring and control management in real time still have
several limitations in smart city establishments. One of the essential components of
smart city is energy meter monitoring and control in real time. Since it is a herculean
task for the energy companies to monitor and control energy meter in collecting the
data on energy consumption, billing, payments and other managerial aspects in smart
city environment, a suitable wireless sensor network architecture that encompasses
emerging technologies like smart devices, Cloud–Fog computing, IoT and drones
may be an approach to provide a solution in this regard. Present-day communication technologies have revolutionized the way smart phones and smart gadgets/devices are deployed to access, acquire, store and share data. The Internet of Things (IoT) has, in the recent past, facilitated several applications and solutions that make our lives 'smart', leading to 'smart city' establishments. The present article proposes a novel 'three-layered' wireless sensor network architecture for energy meter monitoring and control management in real time.
9.1 Introduction
In a conventional metering system, the energy-providing authority sends personnel to acquire the data manually and record the energy consumption [1], so that billing can be done accordingly. Manual reading systems have disadvantages: they require a large number of personnel to read the meters and are prone to erroneous readings, tampering of the readings, etc. This calls for the adoption of emerging technologies like
A. K. Damodaram (B)
Department of ME, Sree Vidyanikethan Engineering College (Autonomous), Tirupati, AP, India
e-mail: akdamodaram@vidyanikethan.edu
S. Sreenivasa Chakravarthi ·K. Reddy Madhavi
Department of CSE, Sree Vidyanikethan Engineering College (Autonomous), Tirupati, AP, India
e-mail: sschakravarthi@ieee.org
L. Venkateswara Reddy
Department of CSE, KG Reddy College of Engineering, Hyderabad, AP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_9
IoT to provide a comprehensive solution to the problem [2]. Energy consumption monitoring using IoT is an emerging technology today. An IoT-based system transforms the way people and organizations use and control power consumption. Internet of Things (IoT) products have recently become an essential part of any home, in conjunction with the great advancements in Internet speeds and services [3]. The system consists of smart devices, hardware and software, and allows real-time monitoring of power-consumption telemetry and predictive calculation of consumption. The smart meter can automatically send its reading to the energy supplier for accurate billing, removing estimation-based billing [4].
9.1.1 Proposed Questions
The following questions were raised in relation to the development of an architecture for energy meter monitoring and control management for smart cities:
1. What are the architectural components of the wireless sensor network required for energy meter monitoring?
2. What is a suitable architecture framework for the smart city environment?
3. How are computing and communicating technologies distributed in the wireless sensor network architecture for smart cities?
4. What is the 'threat landscape' of the to-be-proposed architecture?
9.1.2 Methodology
The present article assumes a generic four-stage process which describes the data flow from the IoT devices and sensors through the Cloud or Fog layers. The to-be-proposed architecture shall fulfil the following:
a. All data acquisition and collection happen in real time at the set time intervals
b. Acquired data on energy consumption from devices and sensors are conveyed to the Cloud, the Fog or both, with the objective of reducing or avoiding the burden on the system of sending large raw data to the core computing capabilities
c. Provide gateways for easy access between the separated layers for effective utilization of the computing resources
d. Allow preprocessing of the data to reduce large unstructured data
e. Allow core computing facilities on the cloud to deliver better performance.
Based on the above-mentioned aspects, a novel first-hand architecture was developed from an understanding of the information sourced from books, journals, whitepapers and technical reports available on the Internet.
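Requirement (d), preprocessing at the Fog layer to reduce large raw data before it reaches the core Cloud computing capabilities, can be sketched as a simple aggregation step; the meter IDs and readings below are illustrative.

```python
from statistics import mean

def fog_preprocess(raw_readings):
    """Aggregate raw smart-meter samples at the Fog layer.

    raw_readings: (meter_id, kwh) samples from one acquisition interval.
    Returns one averaged record per meter, shrinking the payload that
    must travel on to the core (Cloud) computing layer.
    """
    by_meter = {}
    for meter_id, kwh in raw_readings:
        by_meter.setdefault(meter_id, []).append(kwh)
    return {m: round(mean(vals), 3) for m, vals in by_meter.items()}

# Four raw samples collapse into two per-meter records.
raw = [("M1", 0.50), ("M1", 0.54), ("M2", 1.20), ("M2", 1.16)]
summary = fog_preprocess(raw)
```

Pushing this reduction to the Fog node is what keeps the large unstructured raw stream off the Cloud link, as stated in objective (b).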
9.2 Literature
9.2.1 Smart Meters
A smart meter is an electronic device intended to record consumption of energy, current, voltage, power factor, etc., and to communicate this information in real time. The functional aspects of smart meters make them best suited for automating data acquisition in an IoT environment. In IoT, everything is configured with an Internet protocol address and can be monitored, controlled and accessed remotely in accordance with Web technology [5]. However, the battery capacity of the meters limits the quantity and frequency of data sent to the nodes and modules of the monitoring system. Moreover, the location of meters often limits real-time signal transmission [6]. Benefits of using smart meters in an IoT environment include:
Readings can be automatically sent to the electricity department.
No need to prepare estimated bills.
No room for human errors/mistakes.
9.2.2 Smart City
A smart city is a developed and modernized urban area characterized by the inte-
gration of urban infrastructural facilities with the information and communication
technologies (ICT) through Internet. Smart city establishments permit the city offi-
cials to interact directly with community in providing the urban infrastructure like
utilities, energy, water supply, waste disposal and other civic amenities. Smart city
establishment allows the use of wide range of electronic and digital technologies
to connect urban infrastructure with a proper technology architecture. It also embeds information and communication technologies (ICT) into government systems.
Smart city energy sector module involves collection of data on energy consumption,
processing, billing, control management, etc. Energy providers see opportunities for
information and communication technology (ICT) enabled smart energy applications
[7].
9.2.3 Internet of Things (IoT)
The concept of the Internet of Things first became popular in 1999, through the
Auto-ID Center at MIT and related market-analysis publications [8,9]. IoT is an
information technology foundation which enables communication between elec-
tronic devices and sensors with the assistance of Internet. The Internet of Things
(IoT) is emerging as a significant development in information technology, with the
potential to increase convenience and efficiency in daily life [10]. The technical concept of the IoT is to enable these different physical objects to sense information using sensors and send this information to a server [11].
9.2.4 Cloud–Fog-IoT Environment
In Fog environments, nodes are not always computationally active: a node is put in operation mode only when required, can be turned OFF when there is no data, and can be turned ON again when required. The Fog environment can also be made scalable on each communication link among the nodes, and connectivity features can be applied for data acquisition; thus, consistent real-time data transfer can be ensured. Fog and Cloud platforms differ from each other in resource capacity and capability; to facilitate deployment of a Cloud platform in a Fog environment, a cluster-based Fog system architecture is commonly used. A three-tier IoT system architecture consists of smart devices in tier 1, an edge gateway in tier 2 and the Cloud in tier 3. The networked things, such as sensors and actuators, use protocols such as Modbus, Bluetooth, Zigbee or other proprietary protocols to connect to an edge gateway [12]. The Cloud tier in most Cloud-based IoT platforms facilitates the event queuing and messaging that transpires across all tiers. To build an Internet of Things, the Web of Things acts as an architecture for the application layer. The IoT also includes devices such as Nest thermostats, security cameras, electric appliances, voice assistants such as Google Home or Alexa, and sensors [13]. Programming and controlling the flow of information in the IoT environment needs an architectural design and environment with a blend of traditional process mining and special automation capabilities (Fig. 9.1).
9.2.5 A Network Architecture
The Internet of Things (IoT) requires huge scalability in the network space to handle the surge of devices [1]. Fog computing is a viable alternative that prevents a large amount of data from flowing through the Internet [14].
9.2.6 Drones for Smart Cities
The use of drones is being witnessed across all human technological interventions in day-to-day life. Drones provide flexibility in adapting to various aspects of smart city life, including safety and security, policing, delivery of small consignments, infrastructure planning, traffic control and monitoring, smart lighting, smart metering, smart banking, smart governance, etc. The capability of drones to carry small to medium loads is a high priority in research and development.
Fig. 9.1 Cloud–Fog-IoT environment
In the context of the present work, if drones are mounted with the necessary sensory devices to collect data in real time from smart meters in a given locality, we can eliminate unnecessary investment in computing and communication technologies at the terminal-perception layer interfaces. A sensor network embedding drones and IoT can be a valuable option in the smart city environment. Generally, IoT is most abundant in manufacturing, transportation and utility organizations, which make use of sensors and other IoT devices [15].
9.2.7 Network Architectures for Smart Cities
The sensor network architecture for a smart city needs to provide connectivity to the IoT devices and the computing and communicating technologies [16]. Where connectivity is required, a zoned architecture is adopted, with firewalls and/or demilitarized zones used to protect the core control system components [17]. The following technologies need to be incorporated into the architecture:
a. SCADA—Supervisory Control and Data Acquisition
b. WPAN—Wireless Personal Area Network
c. LPWAN—Low-Power Wide Area Network.
The architecture for a wireless sensor network for smart city energy meter monitoring and control management should facilitate the sharing of information across the different IoT devices (smart meters), applications and the computing and communicating layers. It is also expected to consolidate the network, Cloud, Fog and IoT devices in the smart city environment with the following capabilities:
i. IoT device provisioning with access controls
ii. Network node provisioning in an automated approach.
Based on the application domain, these solutions can be classified under five different categories: (1) smart wearable, (2) smart home, (3) smart city, (4) smart environment and (5) smart enterprise [18].
9.2.8 Objectives of Energy Meter Monitoring Device System in Smart Cities
An energy consumption monitoring device system in the smart city has the following functions to fulfil:
(a) Web-based system for remotely tracking energy consumption
(b) Real-time data acquisition through sensor-based IoT devices
(c) Monitor the data from the grid to compute energy losses
(d) Quick and reliable connectivity to the entire network through the IoT platform
(e) Enable real-time reading and billing
(f) Enable access through Android/iOS apps.
There are multiple benefits of using IoT for power monitoring and management, including real-time monitoring with total control over the acquired data. Data from domestic and industrial establishments pertaining to energy consumption would be enormous and hence need consolidation using suitable database management tools. These data may be used further to establish trends in energy usage and as a check for power distribution losses. Other benefits are predictive cost calculation, if required, and automated energy consumption administration. Despite its intricate connectivity and loaded functionality, the energy monitoring system is easy to install and configure using a smart device.
The proposed systems have the following features:
Real-time data processing for energy consumption and management for the
domestic and industrial establishments
Electronic device recognition based on smart IoT devices platform
Detection of energy consumption trends at distribution level and grid level
Automated system with error-free energy consumption monitoring
Automation of data collection, analytics, visualization and reporting on energy
consumption
9 Multi-layered Architecture to Monitor and Control… 87
Detection of misuse of energy and tracing of faulty or tampered energy meters
Scalability depending on requirements.
9.3 A Novel Approach to Data Acquisition for Energy
Monitoring System
IoT provides a three-layer network through which sensors can share data with one
another. The Internet of Things belongs to the class of application programs for
collecting data in real time from multiple devices installed at remote locations and
for controlling those devices by setting conditions. The major requirement for
communication among these devices is the Internet. A suitable wireless sensor network
architecture is a prerequisite to encompass all the capabilities in the cloud environ-
ment through Fog. IoT facilitates connecting and networking a wide variety of
energy monitoring devices in commercial and residential use. IoT offers
sophisticated methods and techniques to analyze and optimize the use of energy
utilization monitoring devices in the entire distribution system. IoT can provide
suitable models, algorithms and architectures for energy consumption monitoring
and management. IoT can also provide a scope to discover faulty smart meters
and erroneous consumption from outdated appliances, damaged/under-performing
appliances and faulty system components. Energy consumption should be monitored
effectively to avoid misuses, losses at the user side and to ensure proper auditing of
energy both at generation and distribution sides. Automated meter reading systems
make visits by the reader unnecessary and also allow energy companies to bill based
on real-time data instead of estimates over time. If the automated meter reading
devices are modified to smart devices with a capability to connect to the network
modules like Wi-Fi, IoT-Drone adapter and Arduino through Internet, it is possible
to establish a sound system of energy meter control and management effectively.
In this present work, we propose a novel three layers architecture for acquiring
energy consumption data which uses real-time interfacing technology-based smart
devices/smart meters, Wi-Fi and drones.
9.3.1 Proposed Architecture for Cloud-Fog-IoT
Environment—Framework
In the proposed approach, the device interfaces enable 'real-time' operation across
the three layers. The interfacing technologies for smart devices and 'cryptography'
will enable real-time accessibility between the Cloud and Fog layers (Fig. 9.2).
A three-layer architecture for Cloud–Fog-IoT deployment.
A three-layered architecture for a 'Cloud–Fog-IoT'-based system is proposed in the
present work. This architecture comprises a terminal layer, a perception layer and a
network layer (Fig. 9.2 shows a typical Cloud–Fog-IoT environment). The terminal
layer contains IoT devices deployed in home/office
establishments to read the energy meters and acquire the data. The IoT devices include
smart meters, remote terminal units and information collection devices. This layer
acquires the data and transmits it to the perception layer. The perception layer
collects the data from all the devices in the terminal layer through 'data analytics'.
The data thus acquired is subjected to analysis and interpretation regarding the type
of use and the consumption of energy, and is converted into standard datasets for
every home/business establishment according to the norms set by the energy supplying
authority. Data validation shall also be incorporated at this perception layer, with
suitable software and hardware, for detecting faulty energy meters, metering errors,
etc. The data is then transferred to the network layer for storage, retrieval, analysis
and interpretation at the central level. Hence, the proposed three-layer network model
enables data acquisition through the IoT tools/devices deployed across the layers
(Fig. 9.3).
Fig. 9.3 Proposed Cloud–Fog-IoT environment
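The terminal → perception → network data flow described above can be sketched in code. This is an illustrative Python sketch only; the class and function names, and the simple range check used for validation at the perception layer, are our own assumptions, not part of the proposed system:

```python
from dataclasses import dataclass

@dataclass
class MeterReading:
    meter_id: str
    kwh: float

# Terminal layer: IoT devices acquire raw readings from smart meters.
def terminal_layer(raw):
    return [MeterReading(mid, kwh) for mid, kwh in raw]

# Perception layer (Fog): validate readings and flag suspect meters
# before forwarding, reducing the load on the network layer.
def perception_layer(readings, max_kwh=100.0):
    valid, faulty = [], []
    for r in readings:
        (valid if 0.0 <= r.kwh <= max_kwh else faulty).append(r)
    return valid, faulty

# Network layer (Cloud): store consolidated data for trend analysis.
def network_layer(valid, store):
    for r in valid:
        store.setdefault(r.meter_id, []).append(r.kwh)
    return store

store = {}
valid, faulty = perception_layer(terminal_layer([("m1", 12.5), ("m2", -3.0)]))
network_layer(valid, store)
print(store)                          # {'m1': [12.5]}
print([r.meter_id for r in faulty])   # ['m2']
```

The point of the sketch is the separation of concerns: the faulty reading never reaches the network-layer store, mirroring the validation role assigned to the perception layer.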
9.3.2 The Activities in Three-Layer Network
The following are the activities envisaged in the proposed 'three-layer architecture'
for smart meter monitoring and control management in real time for the smart city
environment (Table 9.1).
IoT-Fog–Cloud layers in the proposed architecture (Fig. 9.4):
The information thus collected is conveyed to the data processing capabilities
in the Fog computing infrastructure. The perception layer is the intermediary in the
architecture that reduces the data processing burden on the network layer, which
handles energy meter usage trends, storage of structured data pertaining to energy
consumption, and interpretation of the data by a suitable application to support
decision-making regarding control management and comprehensive energy
consumption monitoring authorizations.
Table 9.1 Three-layer architecture proposed activities

S. No.  Layer             Activities
1       Terminal layer    Data collection—raw data on energy consumption
2       Perception layer  Data processing—data consolidation to resource description
                          framework (RDF) using semantic Web technologies; also, data
                          integration and reasoning
3       Network layer     Data storage and interpretation—device controls, alerts and
                          overall control management of the energy consumption trends
Fig. 9.4 IoT-Fog–Cloud layers (proposed): the physical world (energy consumption at smart
meter-equipped home/office/industry establishments) connects through smart meter IoT nodes;
enablers (communication protocols, drone-based data collection, SDN) handle data collection,
filtering and consolidation; data integration feeds the application layer (energy metering trends,
storage, interpretation, monitoring and control modules)
Fig. 9.5 Block diagram of proposed system for IoT environment with Wi-Fi and drone modules
9.3.3 Arduino-Based Energy Meter Control
and Management
Figure 9.5 shows the Arduino-based energy meter control and management modules.
The figure elaborates various components and modules and their connectivity logic.
In addition to the usual features, the proposed model is expected to provide the
following:
a. Tamper proof
b. Low/High voltage protection
c. IoT adaptability.
9.3.4 Components of IoT-Based Energy Meter Control
and Management
9.3.4.1 Smart Energy Meter Module
To record and display the energy consumption in kWh at the user's premises, it is
required to measure the frequency and power factor along with the current and voltage [19].
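From these measurements, real power follows as P = V·I·pf, and energy as power integrated over time. A minimal Python sketch of the arithmetic (the function names are ours, for illustration only):

```python
def real_power_kw(v_rms: float, i_rms: float, power_factor: float) -> float:
    """Real power P = V_rms * I_rms * pf, converted from W to kW."""
    return v_rms * i_rms * power_factor / 1000.0

def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy consumed over a period, in kWh (what the meter records)."""
    return power_kw * hours

p = real_power_kw(230.0, 10.0, 0.9)                 # 2.07 kW
print(round(p, 2), round(energy_kwh(p, 5.0), 2))    # 2.07 10.35
```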
9.3.4.2 Arduino
With the required uplink/downlink capabilities, it has to send information to the
Wi-Fi module or the IoT-Drone adapter module. The selected Arduino shall have low
power consumption, shall be controlled via AT commands and needs to be embedded
with TCP/UDP protocols. It is also required to support communication protocols
like TTL. Proper Arduino code needs to be developed to facilitate interfacing,
Internet testing and sending and receiving data from the different modules of
the energy meter control and management system.
9.3.4.3 Wi-Fi Module
To provide seamless wireless connectivity to IoT three-layer platforms with Internet
connectivity.
9.3.4.4 IoT to Drone Adapter
The drone adapter shall be equipped with interfacing modules to provide seamless
connectivity to the energy meter control and management system as and when required.
9.3.4.5 LCD Display
The LCD module has to display parameters like voltage, current, real-time power,
energy consumption, power factor and the consolidated energy bill in real
time.
9.3.4.6 Relays and Regulators
Relays and regulators are required to connect and disconnect the smart energy meter
system from the grid. Rectifiers, filter capacitors and voltage regulators, along with
a step-down transformer, are also required for the system.
In the proposed system, the smart energy meter provides information about energy
usage to the Arduino, which in turn displays the same for the consumer. The reading
shall be sent either to the Wi-Fi module or to the IoT-Drone adapter module and on
to the proposed three layers.
In the case of far-reach and scattered user points, a drone shall be equipped with
the receiver module to receive the information from the IoT-Drone board in real time.
In the case of users in regular domestic establishments like gated communities, open
communities and town/city layouts, the Wi-Fi module shall convey the information about
the energy meter reading through the Arduino.
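The uplink choice just described reduces to a simple dispatch rule. The following hedged Python sketch illustrates one way to express it; the category labels and function name are our own illustrative assumptions:

```python
def choose_uplink(establishment: str) -> str:
    """Pick the uplink for a meter reading: scattered/far-reach user points
    go via the IoT-Drone adapter; regular establishments (gated communities,
    open communities, town/city layouts) go via the Wi-Fi module."""
    scattered = {"remote-farm", "scattered-settlement"}
    return "iot_drone_adapter" if establishment in scattered else "wifi_module"

print(choose_uplink("gated-community"))   # wifi_module
print(choose_uplink("remote-farm"))       # iot_drone_adapter
```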
9.3.5 Features of Proposed 'Three-Layered Architecture'
for Smart Meter Monitoring and Management
with Integration of Cloud, Fog, IoT and Wireless Sensor
Network Model
The features of the proposed model, as shown in Fig. 9.3, are as follows:
a. Smart meters and devices connectivity through terminal layer with the help of
interfacing device/meters/technologies.
b. Consolidation and data analytics through Fog in perception layer.
c. Storage of data after consolidation and analytics in the form of trends and
databases in Cloud in network layer.
The proposed architecture includes three platforms (Fog, Cloud and the IoT environ-
ment) to facilitate data collection, data analytics and data storage at three different
layers of the architecture. Domestic establishments consist of several premises like
homes, campuses and buildings, while office and industry establishments consist of
factory space, offices, departments, labs and rooms. Each establishment in the smart
city environment has different energy needs and utilization that are significant to
take into account when managing energy consumption as a single unit. Even in a single
home, each family member has his/her own energy consumption requirements which
need to be considered. Hence, this framework has a tree-like control plane structure,
as presented in Fig. 9.3, comprising a 'three-layer network' for energy consumption
monitoring and management from single-unit-level consumption up to smart city
level. Today's IoT systems include event-driven smart applications (apps) that
interact with sensors and actuators [20].
9.3.6 Threat Landscape in Three-Layer Architecture
Smart city IoT establishments are susceptible to cyber-attacks. An intruder may
breach the architecture's communication technologies and may tamper with, manip-
ulate or misuse the data pertaining to energy consumption and usage trends.
Among the security threats in smart city establishments, data manipulation threats are
crucial. Data manipulation in the smart city may lead to tampering with the data,
corrupting the data, misusing the data and disrupting decision-making.
Data tampering: Data tampering may result in faulty smart meter readings
conveyed to the perception layer.
Data corruption: Data corruption may result in faulty meter readings conveyed to
the network layer.
Data misuse: Data misuse may result in fraudulent transactions related to energy
consumption. And finally,
Disruptions: Disruption may affect decision-making regarding energy usage trends
in the city's individual establishments like homes, buildings, offices and factory
premises, and may influence the structured data stored in the network layer for further
decision-making.
Since the intruder first attacks the smart devices and smart meters in the terminal
layer, it is essential to deploy security measures at the perception layer with
the help of anti-virus software, firewalls, etc. Hence, the smart meters and devices
should be enabled with protection against tampering, manipulation and reprogramming.
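One standard counter-measure against data tampering in transit is a message authentication code: each meter signs its readings with a provisioned shared key, and the perception layer verifies the signature before accepting the data. A minimal Python sketch using only the standard library; the key, field names and message format are illustrative assumptions, not part of the proposed system:

```python
import hashlib
import hmac
import json

SECRET = b"meter-shared-key"  # assumed to be provisioned per meter

def sign_reading(reading: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical JSON payload."""
    payload = json.dumps(reading, sort_keys=True).encode()
    return {"reading": reading,
            "mac": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}

def verify(msg: dict) -> bool:
    """Recompute the tag at the perception layer and compare in constant time."""
    payload = json.dumps(msg["reading"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["mac"])

msg = sign_reading({"meter_id": "m1", "kwh": 12.5})
print(verify(msg))                 # True
msg["reading"]["kwh"] = 99.9       # reading tampered with in transit
print(verify(msg))                 # False
```

Any modification of the reading after signing invalidates the tag, so tampered data can be rejected before it reaches the network layer.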
9.4 Conclusions
The proposed framework for energy consumption control and management through a
three-layered Cloud–Fog-IoT deployment architecture ensures decentralization of the
functions of a conventional central IoT platform. In general, Cloud architecture has
some limitations due to its geographically centralized and multi-hop nature with
respect to the IoT devices that are the data sources; hence the inclusion of the Fog
layer in the proposed framework. This inclusion shall address those limitations by
endorsing decentralization of devices, applications and platforms. The
decentralization shall relieve the total data burden on the central system by
distributing it through the terminal and perception layers, where much of the data
analytics is done at the initial stages of data acquisition. The proposed framework
is developed for an energy monitoring system in which the limitations of conventional
energy monitoring are addressed and eliminated. In this work, the motivation to
propose a Cloud–Fog-IoT-based architecture framework for energy consumption
monitoring is the flexibility of three different layers with three different
approaches, with the inclusion of drones. The present work gives scope for further
work on performance evaluation of the proposed architecture by considering parameters
like network delay, latency, cost, etc., through simulation toolkits to demonstrate
its feasibility.
References
1. Pal, A. (2015). Internet of Things: Making the hype a reality (PDF). IT Pro, 17(3), 2–4. https://
doi.org/10.1109/MITP.2015.36
2. Rashdi, A., Malik, R., Rashid, S., Ajmal, A., & Sadiq, S. (2012). Remote energy monitoring,
profiling and control through GSM network. In 2012 International Conference on Innovations
in Information Technology (IIT).
3. Alfandi, O., Hasan, M., & Balbahaith, Z. (2019). Assessment and hardening of IoT development
boards. Lecture notes in computer science (pp. 27–39). Springer. https://doi.org/10.1007/978-
3-030-30523-9_3. ISBN: 978-3-030-30522-2
4. Chauhuri, A. (2018). Internet of things, for things, and by things. CRC Press.
ISBN: 9781138710443.
5. Das, H., & Saikia, L. C. (2015). GSM enabled smart energy meter and automation of home
appliances. IEEE. ISBN: 978-1-4678-6503-1.
6. Lloret, J., Tomas, J., Canovas, A., & Parra, L. (2016). An integrated IoT architecture for smart
metering. IEEE Communications Magazine, 54(12), 50–57.
7. Solaimani, S., Keijzer-Broers, W., & Bouwman, H. (2015). What we do—And don’t—Know
about the Smart Home: An analysis of the Smart Home literature. Indoor and Built Environ-
ment, 24(3), 370–383. https://doi.org/10.1177/1420326X13516350. ISSN: 1420-326X. S2CID
59443602.
8. Why edge computing is an IIoT requirement: How edge computing is poised to jump-start the
next industrial revolution. Retrieved June 3, 2019 from iotworldtoday.com
9. Staff, Investopedia. (2011). Cloud computing. Investopedia. Retrieved October 8, 2018.
10. Hsu, C.-L., & Lin, J.C.-C. (2016). An empirical examination of consumer adoption of Internet
of Things services: Network externalities and concern for information privacy perspectives.
Computers in Human Behavior, 62, 516–527.
11. Alsulami, M. M., & Akkari, N. (2018). The role of 5G wireless networks in the internet-
of-things (IoT). In 2018 1st International Conference on Computer Applications Information
Security (ICCAIS) (pp. 1–8). https://doi.org/10.1109/CAIS.2018.8471687. ISBN: 978-1-5386-
4427-0
12. Hassan, Q., Khan, A., & Madani, S. (2018). Internet of things: Challenges, advances, and
applications (p. 198). CRC Press. ISBN: 9781498778510.
13. Hamilton, E. (2019). What is edge computing: The network edge explained. Retrieved June
14, 2019 from cloudwards.net
14. Reza Arkian, H. (2017). MIST: Fog-based data analytics scheme with cost-efficient resource
provisioning for IoT crowdsensing applications. Journal of Network and Computer Applica-
tions, 82, 152–165. https://doi.org/10.1016/j.jnca.2017.01.012
15. Rouse, M. (2019). Internet of things (IoT). IOT Agenda. Retrieved August 14, 2019.
16. Raji, R. S. (1994). Smart networks for control. IEEE Spectrum, 31(6), 49–55. https://doi.org/
10.1109/6.284793. S2CID: 42364553.
17. Boyes, H., Hallaq, B., Cunningham, J., & Watson, T. (2018). The industrial internet of things
(IIoT): An analysis framework. Computers in Industry, 101, 1–12.
18. Perera, C., Liu, C. H., & Jayawardena, S. (2015). The emerging internet of things marketplace
from an industrial perspective: A survey. IEEE Transactions on Emerging Topics in Computing,
3(4), 585–598.
19. Kabir, Y., Mohsin, Y. M., & Khan, M. M. (2017). Automated power factor correction and
energy monitoring system. IEEE
20. Nguyen, D. T., Song, C., Qian, Z., Krishnamurthy, S. V., Colbert, E. J., & McDaniel, P.
(2018). IoTSan: Fortifying the safety of IoT systems. In Proceedings of the 14th International
Conference on Emerging Networking Experiments and Technologies (CoNEXT '18),
Heraklion, Greece. arXiv:1810.09551. https://doi.org/10.1145/3281411.3281440
Chapter 10
Bio-Inspired Firefly Algorithm
for Polygonal Approximation on Various
Shapes
L. Venkateswara Reddy, Ganesh Davanam, T. Pavan Kumar,
M. Sunil Kumar, and Mekala Narendar
Abstract Polygonal approximation (PA) is a challenging problem in the representation
of images in computer vision, pattern recognition and image analysis. This paper
proposes a stochastic firefly algorithm (FA)-based technique for PA. The technique
customizes a kind of randomization by searching over a set of solutions; in contrast,
PA requires many combinations of approximations to find an optimal solution. The algo-
rithm involves several steps to produce better results. The attractiveness and bright-
ness of the firefly are used efficiently to solve the approximation problem.
Compared to other similar algorithms, FA is independent of velocities, which
is an advantage of this algorithm. Furthermore, the multi-swarm
nature of FA allows finding multiple optimal solutions concurrently. The technique
achieves the main goal of PA, that is, a minimum error value with a smaller number
of dominant points. The experimental results show that the proposed algorithm
generates better solutions than other algorithms.
10.1 Introduction
Extraction of boundary points from a shape and representation of the same shape
with a smaller number of boundary points is a state-of-the-art problem. This technique
is termed polygonal approximation (PA) for closed curves. It is applied in various
L. Venkateswara Reddy
Department of Computer Science and Engineering, KG Reddy College of Engineering and
Technology, Chilikuru (Village), Moinabad (M), R R District, Hyderabad, Telangana, India
G. Davanam (B)·M. Sunil Kumar
Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College
(Autonomous), Tirupati, Andhra Pradesh, India
e-mail: dgani05@gmail.com
T. Pavan Kumar
Department of Computer Science and Engineering, Koneru Lakshmaiah Educational Foundation,
Vaddeswaram, Andhra Pradesh, India
M. Narendar
Jayamukhi Institute of Technological Sciences, Hanmkonda, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_10
96 L. Venkateswara Reddy et al.
applications such as image analysis, computer vision and pattern recognition. The
accuracy depends on how closely the approximated polygon matches the original
polygon shape. Depending on the compactness, the error values differ and produce
different approximations. The points which are considered for the approximation are
named dominant points (DP) of the approximation. PA involves approximation of a
digital planar curve using line segments that are categorized by curvature maxima
and minima points. The technique involves determining an object boundary
by tracing straight lines and approximating its outline in the best possible manner.
However, the time and space complexity of the algorithm increases with the number
of DP, which usually cannot be manipulated in real time. This paper attempts to create
a novel mechanism that enables one to decide the accuracy required for tracing an
object outline using the least possible number of dominant points. The paper begins by
outlining the background behind this research, discusses various related works
pioneered by other researchers and then goes on to propose a sub-optimal algorithm.
The authors subsequently explain their derivation. The paper concludes with the
advantages and limitations of the algorithm as well as provides an overview of the
various avenues where this algorithm might find its usage. Works of the past few
decades have used split, merge, split-and-merge and sequential techniques for solving
the polygonal approximation problem [1–10].
The polygonal approximation follows one of these criteria:
min-ε: fix the value of M; search for the optimal M vertices that approximate the
polygon with the minimum total distortion error.
min-#: fix ε; identify the minimum number of vertices that approximate the polygon
such that the approximation error does not exceed ε.
Ramer [9] used a split-and-merge approach and proposed an iterative method
taking the initial boundary points as input. The segments are split in each iteration
at the farthest point until the approximation error no longer exceeds the
specified error value.
Masood [11] proposed an algorithm based on dominant point deletion for polyg-
onal approximation. The initial points are detected in a pre-processing step, and
points are then eliminated one by one depending on the error associated with each DP.
Once a point is deleted, the neighbouring points' error values are updated. The
algorithm does not guarantee optimal results.
Marji [12] proposed a nonparametric approach for dominant point detection.
The 8-connected chain code is used efficiently for PA based on the strength of
the point (left support region + right support region), the length of the support
region and the centroid.
Existing meta-heuristic approaches to this problem include iPSO, GA, DPSO-EDA,
DPSO, PSO, mPSO, etc. In this paper, Sect. 10.2 summarizes the problem formulation,
and Sect. 10.2.1 gives an overview of Freeman's chain code computation and explains
the rules for computing the chain code of a digital shape with illustrations.
Section 10.3 narrates the significance of the firefly algorithm (FA), and
Sect. 10.3.1 suggests the proposed firefly algorithm for polygonal approximation.
Section 10.4 explains the experimental results of the proposed
10 Bio-Inspired Firefly Algorithm for Polygonal 97
algorithm in comparison with similar existing algorithms. Finally, Sect. 10.5 presents
the conclusion of the paper.
10.2 Problem Formulation
A closed digital curve CD with N points, a clockwise-ordered sequence of contour
points, is represented as $C = \{C_i(X_i, Y_i) \mid i = 1, 2, 3, \ldots, N\}$, where
the ith contour point $C_i$ holds $(X_i, Y_i)$ and $C_{N+1} = C_1$. Let
$B_{ij} = \{C_i, C_{i+1}, \ldots, C_j\}$ represent a bend from $C_i$ to $C_j$.
$LS_{ij}$ represents the line segment between $C_i$ and $C_j$, i.e. the chord of
$B_{ij}$. The distortion error associated with approximating the bend $B_{ij}$ using
its chord $LS_{ij}$ is determined using Eq. (10.1):

$$e(LS_{ij}, B_{ij}) = \sum_{k=i}^{j} d(C_k, LS_{ij}) \quad (10.1)$$

where $d(C_k, LS_{ij})$ is the perpendicular distance from the line segment $LS_{ij}$
to the contour point $C_k$. The approximated polygon retains the shape with M contour
points and is represented as

$$NC = \{nc_i(x_i, y_i) \mid i = 1, 2, 3, \ldots, M\}$$

where M is always less than N.
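The distortion error of Eq. (10.1) can be computed directly. A minimal Python sketch (the function names are ours, for illustration) summing the perpendicular distances of the bend's points to its chord:

```python
import math

def perp_distance(p, a, b):
    """Perpendicular distance d(C_k, LS_ij) from point p to the line through a, b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    return num / math.hypot(bx - ax, by - ay)

def segment_error(curve, i, j):
    """Eq. (10.1): distortion error of approximating bend B_ij by its chord LS_ij.
    The endpoints C_i and C_j lie on the chord, so only interior points contribute."""
    return sum(perp_distance(curve[k], curve[i], curve[j])
               for k in range(i + 1, j))

curve = [(0, 0), (1, 1), (2, 0)]
print(segment_error(curve, 0, 2))   # 1.0: the point (1, 1) lies 1 unit off the chord
```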
Figure 10.1 shows a sample approximation of the shape apple-5 from
MPEG7_CE-Shape-1_Part_B. Figure 10.1a shows the original image with N = 881,
and Fig. 10.1b shows the sample approximation with M = 55. The approximated
image retains the same shape as the original image with a smaller number
of contour points. The points which retain the shape are designated dominant
points; the points which are suppressed are designated weak points of the shape.
Many techniques are used for the approximation of the shape with different criteria
and error metrics. Most of the techniques are split, merge, split-and-merge, sequential
or iterative in nature [9, 10]. Most of the methods involve obtaining an initial
set of candidates using Freeman chain code assignment and then either computing the
redundant break point deviation from the curve or creating pseudo points and
attempting to calculate the deviation of each point with respect to the median value.
10.2.1 Basic Pre-processing
The basic pre-processing, used to suppress collinear points and to detect the initial
break points in most of the algorithms, was proposed by Marji et al. [12] and
Masood [11]. These break points are detected using Freeman's chain code
Fig. 10.1 Sample approximation. a Original contour N = 881 (apple-5) and b approximated
contour M = 55 (apple-5)
using eight different directions, each separated by 45°, as shown in Fig. 10.2a.
A closed digital curve CD with N points, a clockwise-ordered sequence of contour
points, is represented as $C = \{C_i(x_i, y_i) \mid i = 1, 2, 3, \ldots, N\}$, where
the ith contour point $C_i$ holds $(x_i, y_i)$ and $C_{N+1} = C_1$. The chain code
generated using this approach is shown in Fig. 10.2b, and the extracted break points
for the chromosome shape are shown in Fig. 10.2c.
The following condition should be satisfied when computing the break points, as
given in Eq. (10.2):

$$CC_i \neq CC_{i-1} \quad (10.2)$$

where $CC_i$ represents the chain code of the ith point; it is compared with
$CC_{i-1}$, and if the two are not equal, the point is treated as a break point of
the shape.
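The eight-direction chain code and the break point rule of Eq. (10.2) can be sketched together as follows. This is an illustrative Python implementation; the direction numbering below is one common convention (code 0 = east, increasing counter-clockwise by 45°) and may differ from the figure:

```python
# Freeman 8-direction codes: (dx, dy) -> code, directions 45 degrees apart.
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(contour):
    """Chain code of a closed 8-connected contour (wraps around: C_{N+1} = C_1)."""
    n = len(contour)
    codes = []
    for i in range(n):
        (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
        codes.append(DIRS[(x2 - x1, y2 - y1)])
    return codes

def break_points(contour):
    """Eq. (10.2): C_i is a break point when CC_i != CC_{i-1} (direction changes)."""
    cc = chain_code(contour)
    return [contour[i] for i in range(len(contour)) if cc[i] != cc[i - 1]]

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(break_points(square))   # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

On the 8-point square contour, only the four corners survive: the collinear midpoints are exactly the points Eq. (10.2) suppresses.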
10.3 Firefly Algorithm
The proposed algorithm for polygonal approximation is a nature-inspired algorithm
suitable for producing a better approximation. The firefly algorithm is one such
nature-inspired algorithm.
In hot and moderate regions, fireflies' flashing light is a wonderful sight. There
are thousands of fireflies which produce periodic, short flashes. The flashing
Fig. 10.2 Chain code illustration: a eight directions of Freeman's chain code, b chain code for
the chromosome shape and c extracted break points using Freeman's chain code
light of fireflies is produced by bioluminescence and serves as a signalling system.
The flashing is quite rhythmic. These flashes attract mating partners and potential
prey. The light intensity L of the flash decreases as the distance d increases; it
obeys L ∝ 1/d². As the distance increases, the intensity of the flash becomes weaker.
For these reasons a firefly is visible only to a limited distance, which is typically
sufficient for fireflies to communicate with each other. The objective function to be
optimized is formulated and associated with the flashing light, which leads to the
articulation of a new optimization algorithm.
In describing the new firefly algorithm (FA), the following three rules are
idealized:
1. All fireflies are unisex, so one firefly will be attracted to other fireflies
irrespective of their sex.
2. Attractiveness is proportionate to brightness; a less bright firefly is
attracted to the brighter one. Both attractiveness and brightness decrease
as the distance between fireflies increases. If no brighter firefly is available,
a firefly moves randomly.
3. A firefly's brightness is determined by the landscape of the objective function;
the objective function defines the brightness factor.
Algorithm for the firefly algorithm (FA)
Input: Firefly population P = (p_1, …, p_N), objective function OF(x)
Output: The best solution p_best and its value OF_min = min(OF(p_best))
Generate the initial population P = (p_1, …, p_N)
Evaluate OF(p_i) for every firefly and set the light intensities
k = 0
while (k < Max_Gen)
  for i = 1 : N
    for j = 1 : i
      if (OF(p_j) < OF(p_i)) then
        move firefly i towards firefly j; attractiveness varies with distance d
        calculate the new solution and upgrade the light intensity
      end if
    end for j
  end for i
  Rank the fireflies and identify the current best
  k = k + 1
end while
The firefly algorithm is another population-based algorithm; every member
represents a candidate solution of the problem to be solved and thus signifies a point
in the search space. A solution is denoted as in Eq. (10.3):

$$C_i = (C_{i1}, \ldots, C_{iD}) \quad \text{for } i = 1, \ldots, N \quad (10.3)$$

where N represents the size of the population and D represents the problem's
dimensionality.
The initial solution is given by Eq. (10.4):

$$C^{(0)}_{ij} = U(0, 1) \cdot (ub_j - lb_j) + lb_j, \quad \text{for } i = 1, \ldots, N \quad (10.4)$$

where U(0, 1) denotes a uniformly distributed random number in the interval [0, 1],
and $lb_j$ and $ub_j$ refer to the lower and upper limits of the jth problem
variable.
The variation operator works on the light intensity, $L \propto e^{-\gamma r^2}$,
which can be formulated as Eq. (10.5):

$$L(r) = L_0 e^{-\gamma r^2} \quad (10.5)$$

where $L_0$ denotes the source light intensity and $\gamma$ is the fixed light
absorption coefficient. Usually a firefly gets attracted when it meets a brighter
firefly in its environment. Similar to the intensity of light, the attractiveness
$\beta$ is influenced by the distance r. It can be computed by Eq. (10.6):

$$\beta(r) = \beta_0 e^{-\gamma r^m}, \quad m \geq 1 \quad (10.6)$$

where $\beta_0$ indicates the attractiveness at distance r = 0.
The Euclidean distance is used to represent the distance between any two fireflies
and is given by Eq. (10.7):

$$r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{D} (x_{ik} - x_{jk})^2} \quad (10.7)$$

where $x_i$ and $x_j$ represent any two fireflies, $x_{ik}$ and $x_{jk}$ represent
the kth element of the ith and jth firefly positions in the search space, and D
represents the problem's dimensionality. Firefly i gets attracted to firefly j, and
its movement is represented by Eq. (10.8):

$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2}(x_j - x_i) + \alpha \epsilon_i \quad (10.8)$$

where the step-size scaling factor is denoted by $\alpha$ and the randomization
parameter by $\epsilon_i$. Equation (10.8) is the sum of the position of the ith
firefly, the social component of the ith firefly's attraction and movement towards
the jth firefly, and the ith firefly's randomized move within the search space.
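Equations (10.4) and (10.6)–(10.8) combine into the following compact Python sketch of the algorithm, shown here minimizing the sphere function. The parameter values (β₀ = 1, γ = 1, α = 0.2) and the function names are illustrative choices of ours, not the paper's settings:

```python
import math
import random

def move_firefly(xi, xj, rng, beta0=1.0, gamma=1.0, alpha=0.2):
    """One application of Eq. (10.8): firefly i moves towards brighter firefly j."""
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))   # squared distance, Eq. (10.7)
    beta = beta0 * math.exp(-gamma * r2)             # attractiveness, Eq. (10.6)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]

def firefly_minimize(f, dim, n=15, lb=-5.0, ub=5.0, max_gen=100, seed=1):
    rng = random.Random(seed)
    # Eq. (10.4): uniformly random initial population in [lb, ub]^dim.
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    for _ in range(max_gen):
        for i in range(n):
            for j in range(n):
                if f(pop[j]) < f(pop[i]):   # j is brighter (lower objective)
                    pop[i] = move_firefly(pop[i], pop[j], rng)
    return min(pop, key=f)

# Sphere function: global minimum at the origin.
best = firefly_minimize(lambda x: sum(v * v for v in x), dim=2)
print("best objective:", sum(v * v for v in best))
```

Note that the current best firefly never moves (no firefly is brighter), so the population minimum is non-increasing across generations, which is the convergence property the multi-swarm discussion below relies on.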
10.3.1 Firefly for Polygonal Approximation
The firefly algorithm has advantages which include automatic subdivision and the
ability to deal with multimodal problems.
1. The firefly algorithm works on the concept of attractiveness. Attraction decreases
as distance increases, which automatically subdivides the whole population
into subgroups. Within every subgroup, the members swarm around a mode or
local optimum. The best global solution is identified among all these modes.
Similarly, polygonal approximation also considers local optima, which lead to the
global optimum solution. The perpendicular distance is calculated for three
consecutive points in the shape, and the points with higher distance are considered
for approximation. If the perpendicular distance is small, the points are suppressed
during the approximation.
2. This subdivision permits the fireflies to detect all optima at the same time if the
population size is adequately greater than the number of modes. Split and merge
combined with an iterative process are used to identify the dominant
points. The algorithm works for the multi-objective problem in polygonal approxi-
mation, namely minimizing the total distortion while the number of points decreases.
Thus the approximation is achieved.
Algorithm for polygonal approximation using FA
Input: The contour points P = (p_1, …, p_N) with coordinates
{(x_1, y_1), …, (x_N, y_N)}, objective function f(p)
Output: The best solution p_best and its value f_min = min(f(p_best))
Generate the initial population from the contour points
Evaluate f(p_i) for every point and upgrade
k = 0
while (k < Max_Gen)
  for i = 1 : N
    for j = 1 : N
      if (f(p_j) < f(p_i)) then
        move towards the better point; attractiveness varies with distance d
        calculate the new solution and upgrade
      end if
    end for j
  end for i
  Rank the points and identify the current best
  k = k + 1
end while
The original contour points are treated as the input for the algorithm, and the
output is the best optimal solution.
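The loop structure of the pseudocode above can be sketched as a generic minimization skeleton. This is not the chapter's implementation: the objective f, the initial population, and the neighbor (movement) rule are placeholders supplied by the caller. For polygonal approximation, a candidate would be a set of dominant-point indices and f the ISE.

```python
import random

def firefly_search(f, init_pop, neighbor, max_gen=100, seed=0):
    """Skeleton of the FA loop: each firefly moves towards every brighter
    firefly (lower objective value), with a perturbation supplied by `neighbor`.

    f        : objective function over a candidate solution (to be minimized).
    init_pop : list of initial candidate solutions.
    neighbor : function (xi, xj, rng) -> candidate between/near xi and xj.
    """
    rng = random.Random(seed)
    pop = list(init_pop)
    fit = [f(p) for p in pop]
    for _ in range(max_gen):
        for i in range(len(pop)):
            for j in range(len(pop)):
                if fit[j] < fit[i]:            # firefly j is brighter
                    cand = neighbor(pop[i], pop[j], rng)
                    fc = f(cand)
                    if fc < fit[i]:            # keep only improving moves
                        pop[i], fit[i] = cand, fc
    best = min(range(len(pop)), key=fit.__getitem__)
    return pop[best], fit[best]
```

As a toy usage, minimizing f(x) = x² with a midpoint-plus-noise movement rule converges towards zero.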
The squared perpendicular distance is treated as the associated error value (AEV);
points where this value is high are treated as dominant points. The sum of these
associated error values is called the integral square error (ISE), which measures
the total deviation of the approximated digital curve from the original digital curve.
Initialize the firefly structure and the population array, and create the initial
population along with the best solution found so far. Each iteration compares the
objective-function values of the solutions with each other; the minimum value found
is used to drive the search towards the optimal solution. Different solutions are
generated from different combinations of the original population over the iterations.
The minimum value obtained by the objective function is updated with the most recent
minimum and finally compared against the best solution modelled so far.
The best solution obtained is treated as the optimal solution. Figure 10.3
explains the detailed flow of the entire procedure, in which the objective function is
used to evaluate solutions drawn from a wide range of random candidates, which in turn
helps in finalizing the best solution.
10.4 Experimental Results and Discussion
The experiments have been conducted on two digital curves: a curve containing
four semicircles with 102 points, as given in Fig. 10.4a, and the leaf-shaped
10 Bio-Inspired Firefly Algorithm for Polygonal 103
Fig. 10.3 Algorithm flow
curve of 120 points as in Fig. 10.5a. These curve shapes are benchmark shapes
for testing PA algorithms, and the performance of the proposed algorithm is compared
with other similar algorithms on those shapes.
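The chain codes shown below in Figs. 10.4b and 10.5b follow Freeman's 8-direction scheme. A minimal sketch of how such a code can be computed from an 8-connected sequence of (x, y) points (the function and table names are illustrative, not from the chapter):

```python
# Freeman 8-direction codes: 0 = E, 1 = NE, 2 = N, 3 = NW,
#                            4 = W, 5 = SW, 6 = S, 7 = SE
DIR = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
       (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points, closed=True):
    """Freeman chain code of an 8-connected digital curve given as (x, y) points."""
    pairs = list(zip(points, points[1:] + (points[:1] if closed else [])))
    return [DIR[(x2 - x1, y2 - y1)] for (x1, y1), (x2, y2) in pairs]
```

For a closed unit square traversed counter-clockwise, the code cycles through the four axis directions.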
The synthesized shapes and their computed chain codes are shown in Figs. 10.4a, b
and 10.5a, b. Figure 10.4a shows the digital curve with the semicircle shape before
approximation with 102 points, and Fig. 10.4b shows the chain code of the semicircle-
shaped curve computed based on Freeman's technique. Figure 10.5a shows the
leaf shape before approximation with 120 points, and Fig. 10.5b shows the chain
code of the leaf shape computed based on Freeman's technique. Similarly, Table 10.1
shows the comparison of results based on the number of dominant points (DP), M, and
the ISE value. The ISE value is a performance measure used here to quantify the
associated error. The objective of our approach is to retain the shape of the digital
5 4 5 4 4 3 4 2
3 2 2 1 2 1 3 2
2 2 2 2 2 2 2 1
2 2 1 1 1 1 1 1
0 0 1 0 0 0 0 0
0 0 0 7 0 0 7 7
7 7 7 7 6 6 7 6
6 6 6 6 6 6 6 5
7 6 7 6 6 5 6 4
5 4 4 3 4 3 6 6
(a) (b)
Fig. 10.4 a Semicircle shape with 102 points and b chain code of semicircle shape
7 6 6 6 1 1 1 1
1 6 6 6 5 6 6 5
5 0 0 0 1 0 0 5
6 6 5 6 5 5 0 0
1 1 0 6 6 5 6 5
6 5 5 5 5 5 6 6
6 7 6 6 6 6 6 6
6 6 6 4 2 2 2 2
2 2 2 2 2 2 2 3
2 2 4 4 3 4 3 3
(a) (b)
Fig. 10.5 a Leaf shape with 120 points and b chain code of leaf shape
curve with a minimum number of DP as well as minimum error. The results achieved
by the proposed FA-based PA technique are given in Table 10.1 for reference.
Usually, the quality of PA is assessed by how well the shape of the digital curve
is retained with a minimum number of DP. ISE is one of the most significant measures
of PA quality and is widely used in different PA algorithms.
The ISE refers to the error caused during the approximation of a polygonal shape.
ISE is computed as given in Eq. (10.10).

ISE = Σ_{k=1}^{N} E_k (10.10)
Table 10.1 Comparison of the results of semicircle and leaf obtained by the bPSO, mPSO, GA,
iPSO and the proposed (FA)-based PA algorithm

Shape of digital curve   Algorithm   M    ISE
Semicircle               bPSO        21   9.11
                         mPSO [13]   20   9.57
                         GA [14]     20   9.68
                         iPSO [15]   20   9.2
                         FA          20   9.93
                         bPSO        13   27.12
                         mPSO [13]   12   28.78
                         GA [14]     12   27.87
                         iPSO [15]   12   26
                         FA          12   28.17
Leaf                     bPSO        24   9.62
                         mPSO [13]   24   9.56
                         GA [14]     24   9.48
                         iPSO [15]   23   9.46
                         FA          23   9.46
                         bPSO        16   27.6
                         mPSO [13]   16   27.56
                         GA [14]     16   27.56
                         iPSO [15]   16   26.6
                         FA          16   27.26
where E_k is the squared perpendicular distance of the kth point of the digital curve
from the approximating polygon. A higher ISE value corresponds to a higher compression
ratio, while a lower ISE value corresponds to a lower compression ratio.
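A direct way to compute Eq. (10.10) is to sum the squared perpendicular distances of the curve points to the approximating segments. The sketch below assumes an open curve given as (x, y) tuples and dominant points given as indices into it; these assumptions are illustrative, not from the chapter.

```python
def ise(curve, dp):
    """Integral square error (Eq. 10.10): sum of squared perpendicular
    distances of every curve point to the segment it falls between."""
    dp = sorted(dp)
    total = 0.0
    for a_idx, b_idx in zip(dp, dp[1:]):
        (x1, y1), (x2, y2) = curve[a_idx], curve[b_idx]
        dx, dy = x2 - x1, y2 - y1
        L2 = dx * dx + dy * dy or 1     # guard against duplicate points
        for k in range(a_idx + 1, b_idx):
            xk, yk = curve[k]
            # squared distance of (xk, yk) from the chord via the cross product
            cross = dx * (yk - y1) - dy * (xk - x1)
            total += cross * cross / L2
    return total
```

A collinear curve approximated by its endpoints yields ISE = 0, which is a convenient sanity check.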
The results for the semicircle shape after PA are shown in Fig. 10.6a, b with M = 20
and M = 12 dominant points. The results for the leaf shape after PA with M = 23 and
M = 16 dominant points are given in Fig. 10.7a, b.
10.5 Conclusion
The firefly algorithm is well suited for the approximation problem and approximates
the shape with a small number of DP. The algorithm detects the DP using the
attractiveness and brightness concepts that form the basis of FA. FA takes
the original contour points as the input and generates the best optimal solution as
the output. A set of random contour points is selected from the original popu-
lation and passed to the objective function to find the minimum value. FA-based PA
a. M = 20 b. M = 12
Fig. 10.6 PA of semicircle a with 20 DP and b with 12 DP
a. M = 23 b. M = 16
Fig. 10.7 PA of leaf a with 23 DP and b with 16 DP
successfully achieved the objective of retaining the shape of the digital curve with
a minimum error value and a minimum number of DP. The present FA-based PA has
presented a comparison for different shapes based on the ISE error criterion, since
ISE is a significant measure in establishing the quality of PA. The error values
obtained for the leaf and semicircle shapes are considerably low, and the method
generates a better approximation.
References
1. Kalaivani, S., & Ray, B. K. (2019). A heuristic method for initial dominant point detection for
polygonal approximations. Soft Computing, 23(18), 8435–8452.
2. Jung, J.-W., So, B.-C., Kang, J.-G., Lim, D.-W., & Son, Y. (2019). Expanded Douglas-Peucker
polygonal approximation and opposite angle-based exact cell decomposition for path planning
with curvilinear obstacles. Applied Sciences, 9, 638.
3. Madrid-Cuevas, F. J., Aguilera-Aguilera, E. J., Carmona-Poyato, A., Muñoz-Salinas, R.,
Medina-Carnicer, R., & Fernández-García, N. L. (2016). An efficient unsupervised method
for obtaining polygonal approximations of closed digital planar curves. Journal of Visual
Communication and Image Representation, 39, 152–163.
4. Fernández-García, N. L., Del-Moral Martínez, L., Carmona-Poyato, A., Madrid-Cuevas, F.
J., & Medina-Carnicer, R. (2016) A new thresholding approach for automatic generation of
polygonal approximations. Journal of Visual Communication and Image Representation, 35,
155–168.
5. Masood, A. (2008). Optimized polygonal approximation by dominant point deletion. Pattern
Recognition, 41(1), 227–239.
6. Marji, M., & Siy, P. (2003). A new algorithm for dominant points detection and polygonization
of digital curves. Pattern Recognition, 36(10), 2239–2251.
7. Kolesnikov, A., & Fränti, P. (2003). Reduced-search dynamic programming for approximation
of polygonal curves. Pattern Recognition Letters, 24(14), 2243–2254.
8. Salotti, M. (2001). An efficient algorithm for the optimal polygonal approximation of digitized
curves. Pattern Recognition Letters, 22(2), 215–221.
9. Pikaz, A., & Dinstein, I. (1995). An algorithm for polygonal approximation based on iterative
point elimination. Pattern Recognition Letters, 16(6), 557–563.
10. Ray, B. K., & Ray, K. S. (1995). A new split-and-merge technique for polygonal approximation
of chain coded curves. Pattern Recognition Letters, 16(2), 161–169.
11. Masood, A. (2008). Dominant point detection by reverse polygonization of digital curves.
Image and Vision Computing, 26(5), 702–715.
12. Marji, M., & Siy, P. (2004). Polygonal representation of digital planar curves through dominant
point detection—A nonparametric algorithm. Pattern Recognition, 37(11), 2113–2130.
13. Wang, B., Shu, H.-Z., Li, B.-S., & Niu, Z.-M. (2008). A mutation-particle swarm algorithm
for error-bounded polygonal approximation of digital curves. In Lecture Notes in Computer
Science (pp. 1149–1155).
14. Wang, B., Shu, H., & Luo, L. (2009). A genetic algorithm with chromosome-repairing for
min-# and min-ε polygonal approximation of digital curves. Journal of Visual Communication
and Image Representation, 20(1), 45–56.
15. Wang, B., Brown, D., Zhang, X., Li, H., Gao, Y., & Cao, J. (2014). Polygonal approximation
using integer particle swarm optimization. Information Sciences, 278, 311–326.
Chapter 11
An Efficient IoT Security Solution Using
Deep Learning Mechanisms
Maganti Venkatesh, Marni Srinu, Vijaya Kumar Gudivada,
Bibhuti Bhusan Dash, and Rabinarayan Satpathy
Abstract The Internet has become an inextricable element of human life, and the
number of Internet-connected gadgets is rapidly growing. Internet of Things (IoT)
gadgets, in particular, have become an integral component of modern life. IoT network
participants are generally resource constrained, rendering them vulnerable to cyber-
threats. Classic cryptographic techniques have been extensively used to deal with
the safety and confidentiality problems in IoT systems in this regard. Due to the
exclusive qualities of IoT nodes, available results are unable to cover the complete
defense spectrum of IoT networks. However, some difficulties are becoming more
prevalent, and their remedies are unclear. The IoT is posing an increasing number of
issues in terms of technological security. The Internet of Things, on the other hand,
has been shown to be prone to security breaches. To address security concerns, it is
critical to establish effective solutions through the progress of the latest technologies
or the integration of obtainable technology. Deep learning, a division of machine
learning, has previously demonstrated potential for finding security vulnerabilities.
IoT devices also generate a lot of data with a lot of variety and veracity. As a result,
by using big data technologies, it is possible to achieve enhanced speed and data
management. In this research, we examine the safety necessities, assault vectors,
and existing safety resolutions for IoT systems and offer a ground-breaking deep
learning strategy for IoT security.
M. Venkatesh ·M. Srinu
Department of Computer Science and Engineering, Aditya Engineering College Surampalem,
East Godavari, AP, India
V. K. Gudivada
Department of Information Technology, Nawab Shah Alam Khan College of Engineering and
Technology, Hyderabad, TS, India
B. B. Dash
School of Computer Applications, KIIT Deemed to be University, KOEL Campus, Patia,
Bhubaneswar, Odisha, India
R. Satpathy (B)
Professor CSE (FET) and Director of the Office of the VC, Sri Sri University, Cuttack, Odisha,
India
e-mail: rabinarayan.satpathy@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_11
11.1 Introduction
In view of the rapid development of emerging technologies such as sensors, smart-
phones, 5G communication, and augmented reality, innovative applications such as
connected industries, smart cities, and smart energy are being created. These
applications include connected businesses, connected cars, connected agriculture,
connected manufacturing complexes, connected health care, smart retail outlets,
and smart supply chains, all of which are contributing to the accumulation of massive
amounts of data. It is estimated that 50.1 billion Internet of Things (IoT) devices
were to be connected to the Internet by 2020, according to a survey conducted by the
National Cable and Telecommunications Association (NCTA). The security of Internet
of Things devices has become a source of controversy as the number of devices has
increased [1, 2].
As of January 1, 2018, virtually, every industry has been impacted by an onslaught
of cyberattacks and data breaches, according to McAfee Security. Aside from that, a
large number of these attacks were directed at Internet of Things (IoT) devices. As the
use of Internet of Things devices continues to rise, cybercriminals are increasingly
targeting them. Furthermore, because of the possibility of linkage, Internet of Things
devices are vulnerable [3]. According to VDC Research Group Inc., a study was
conducted to analyze the difficulties associated with developing connected devices.
Approximately 60% of the issues associated with developing connected devices are
related to security requirements [4], according to the report. According to Kaspersky
Lab data, the number of malware samples targeting IoT devices surged substantially
from 3219 samples in 2016 to 121,588 samples in 2018 [5]. There is no denying that
there are a number of vulnerabilities in Internet of Things devices.
Network-based risk monitoring is a challenge for many firms, according to [6, 7],
particularly for the government, energy, and healthcare sectors as well as for banks
and research institutes. These sectors also invest in security-monitoring systems to
defend and lock down their infrastructure.
defend and lock their transportation. Massive amounts of information were created by
IoT devices, as previously stated, and this data travel via networks. Network attacks
can compromise data flow across a network. According to the research, current tech-
nologies and approaches are unable to detect hackers’ innovative attacks because of
the volume, speed, variety, and veracity of available data. Because of this, a weekly
or monthly security analytics report will fall short when dealing with significant
amounts of data. Another benefit of big data technology is that it can deal with data
volume, velocity, variety, and validity difficulties.
It is possible that data transmissions through a network will be subject to network
assaults. In the research, it is said that current methodologies and approaches are
insufficient for detecting novel assaults conducted by cybercriminals because of the
quantity of data, the rate at which it is generated, the variety of data. A weekly
or monthly security analytics report is also insufficient when dealing with massive
amounts of data in order to detect and prevent risks. According to the paper, big data
technology will also be able to deal with challenges such as data volume, velocity,
variety, and validity.
11.2 Related Work
Deep learning and big data technologies may be utilized to enhance the security of IoT
systems, according to current studies. Feature engineering, unsupervised pre-training,
and robustness in deep learning have lately gained appeal, making it a hot
topic right now. Deep learning may be employed even in networks with minimal
resources because of these properties. With the ability to learn on its own, produce
highly accurate findings, and handle data more quickly, deep learning has become
quite popular. A resource-constrained system may run into further concerns, such as
out-of-memory access and unsafe programming languages [8]. This is crucial. Many
studies look at only one aspect of IoT security, such as deep learning, big data, or big
data analytics. Deep learning [9, 10] or big data has been studied for IoT protection.
To the best of our knowledge, no prior study comprehensively examines the viability
of merging these two techniques in the context of IoT security.
IoT-based monitoring of COVID-19 patients in home isolation was presented in [11],
and smart-city traffic management using IoT in [12]. An AI-based framework for 5G
integrated spectrum selection and spectrum access in IoT-based sensor networks was
presented in [13].
11.2.1 Safety in IoT Operation
Two of the most important considerations in the commercialization of IoT services
and applications are security and privacy. Security attacks range from basic hacks to
finely coordinated enterprise-level security violations that have impacted various
industries, including health care and manufacturing, on today's Internet, which
makes IoT an attractive target for attackers. The limitations of IoT devices, together
with the surroundings in which they function, contribute to security challenges that
affect both applications and devices. Security and privacy issues in the Internet of
Things (IoT) have been studied from a variety of angles, for example communication
security, information security, confidentiality, and architectural security.
11.2.2 Gaps in the Presented Security Resolutions for IoT
Networks
To make effective use of the Internet of Things, it is critical to understand where
security and privacy concerns originated. Furthermore, because the moniker "IoT"
grew out of past technologies, it is crucial to determine whether security risks in IoT
are novel or merely a continuation of the legacy of previous technologies. IoT and traditional
IT devices were compared and contrasted by Fernandes et al. Their attention was
drawn to the issue of privacy, as well. Hardware, software, networks, and applications
all play a role in the debate over the similarities and distinctions between them.
According to these classifications, the security concerns in traditional IT and IoT are
basically comparable. While the IoT’s key concern is resource limitations, existing
advanced security solutions will have a tough time adapting to IoT networks. To
tackle the safety and confidentiality issue posed by the Internet of Things, a multi-
layered architecture and enhanced algorithms are necessary. To handle safety and
confidentiality, IoT devices may, for example, require a new generation of efficient
cryptography and other algorithms due to computational limitations. The sheer number
of IoT devices also poses significant challenges for security processes.
The majority of security concerns are complex, and discrete solutions are not
possible. False positives can occur when dealing with security threats like DDoS
or infiltration, rendering the solutions ineffective. Furthermore, consumer confi-
dence will be eroded, decreasing the efficacy of these remedies. The creation of
new intelligent, robust, evolutionary, and scalable techniques to deal with IoT safety
concerns will be part of a holistic strategy for IoT security and privacy that includes
contributions from existing security systems.
11.2.3 Machine Learning: A Solution to IoT Security
Challenges
Intelligent ways of optimizing performance criteria by learning from examples or
previous experience are referred to as machine learning (ML). Algorithms based on
machine learning construct behavior models from vast datasets by utilizing mathe-
matical principles. With ML, smart devices may also learn on their own, without any
assistance from a human programmer. These models are then utilized to produce
forecasts on newly added data. A few fields, such as artificial intelligence,
optimization theory, and cognitive science, have
had a significant impact on machine learning. In situations where human knowledge is
either unavailable or ineffectual, such as when operating in a hostile environment,
machine learning is used. It is also used where individuals cannot apply their
expertise, as in robotics and speech recognition. This method can also be employed
when the answer to a specific problem varies over time. Aside from that, machine
learning is utilized in real-world smart systems; for
example, Google utilizes machine learning to identify risks associated with Android
endpoints and applications. It can also be used to detect and remove malware from
infected phones. As another example, Amazon has developed Macie, a tool
that uses machine learning to organize and categorize data stored on the company's
cloud storage platform. Machine learning techniques are successful in a variety of
fields, but there is a risk of false positives and false negatives when applying them.
Hence, ML methods necessitate supervision and model correction in the event that
imprecise predictions are generated. A new type of machine learning, deep learning
(DL), on the other hand, empowers models by allowing them to independently deter-
mine the correctness of their predictions. In particular, because of their self-learning
nature, deep learning models are better suited for classification and prediction
tasks in new IoT services that provide context-aware and tailored support.
However, despite the widespread use of traditional approaches for various aspects
of the Internet of Things (such as application and service development, architec-
tural design and protocol development, information aggregation, resource allocation,
and clustering) as well as security, the huge-scale adoption of the IoT calls for the
development of intelligent, robust, and reliable methods.
11.3 Methodology
Deep learning (DL), an artificial neural network (ANN)-based computing system, is
among the most effective and proficient branches of machine learning today. DL is
a subset of machine learning inspired by the natural brain and able to learn from a
large number of training examples. Various domains have made use of DL, and it is
well known for its capability to extract the best features from raw data by applying
a series of nonlinear transformations, each of which gains in complexity and
abstraction. Deep learning is classified into three types: supervised, unsupervised,
and semi-supervised (a combination of the two).
The earliest neural network proposed is the feed-forward neural network (FNN), or
deep FNN (DFNN). The layers of a FNN include an input layer, one or more hidden
layers, and a final output layer. Every neuron in a layer is connected to all neurons
of the following layer without forming a cycle back; in other words, the connections
are non-recursive. Each connection indicates the relationship between one neuron
and another, and in a neural network the weight coefficient indicates how important
a particular connection is (Fig. 11.1).
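The forward pass of such a feed-forward network can be sketched in a few lines. This is a minimal illustration: the tanh activation and the layer shapes are assumptions, not taken from the chapter.

```python
import numpy as np

def forward(x, layers):
    """Forward pass of a feed-forward network.

    Each layer is a (W, b) pair; every neuron connects to all neurons of the
    next layer, and the connection weights in W scale each connection.
    """
    a = np.asarray(x, float)
    for W, b in layers:
        a = np.tanh(W @ a + b)   # weighted sum followed by a nonlinearity
    return a
```

A network with all-zero weights maps any input to zero, since tanh(0) = 0, which makes the wiring easy to verify.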
In 1998, LeCun built on Fukushima's work from 1980 to create the "Convolutional
Neural Network" (CNN) deep learning model, an extension of the MLP neural network.
CNN has had a lot of success in computer vision applications over the
previous two decades. It is an advanced version of the DFNN, with additional layers
for convolution, activation, and pooling. A filter (a matrix of numbers) is used in the
convolutional layer to transform the data (usually pictures) according to the values of
the filter. The convolution kernel is used in this layer (see Fig. 11.2). The input
data are represented by f, the kernel by h, and the row and column indices of
the output matrix by m and n, respectively; these are used to construct the feature-map
values. Each kernel value is multiplied pairwise by the corresponding
data value, and the resulting sum is saved in the output feature
map. The pooling layer reduces the size of subsequent layers by using max or
average pooling, which helps prevent overfitting. Max pooling partitions the input into
non-overlapping groups and selects the greatest value of each group from the previous
layer.
Fig. 11.1 Feed-forward neural network layers
G[m, n] = (f ∗ h)[m, n] = Σ_j Σ_k h[j, k] · f[m − j, n − k]
A flattened input is used in the fully connected layer, where it is connected to every
neuron. The activation function of a node determines its output from a
set of inputs. All volume elements are activated using ReLU, the
rectified linear unit, whose objective is to increase the network's nonlinearity.
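The convolution, ReLU, and max-pooling operations described above can be sketched with small arrays. Note that, as in most deep-learning frameworks, the "convolution" below is implemented as cross-correlation (h[j, k] · f[m + j, n + k]) rather than the flipped-kernel form of the equation; the function names are illustrative.

```python
import numpy as np

def conv2d(f, h):
    """Valid 2-D cross-correlation producing the feature map G[m, n]."""
    H, W = h.shape
    M, N = f.shape[0] - H + 1, f.shape[1] - W + 1
    G = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            # pairwise multiply kernel values with the data patch and sum
            G[m, n] = np.sum(h * f[m:m + H, n:n + W])
    return G

def relu(x):
    """Rectified linear unit: zero out negative activations."""
    return np.maximum(x, 0)

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling over the previous layer."""
    M, N = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:M, :N].reshape(M // s, s, N // s, s).max(axis=(1, 3))
```

For example, a 2×2 all-ones kernel over a 3×3 all-ones input yields a 2×2 map of fours, and 2×2 max pooling keeps the largest value of each block.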
CNN, in contrast to standard feature selection algorithms, can automatically learn
new features and classify traffic. Since it shares the same convolution matrix across
the input, it can classify better and discover more features from traffic data,
which reduces the number of parameters and training computations considerably.
Fig. 11.2 CNN layers
11.4 Results and Discussion
11.4.1 Dataset Used
Datasets such as KDDCUP99 and its expanded version NSL-KDD, ISCXIDS2012,
and CSE-CIC-IDS 2018 are widely used for network IDS, but, as previously stated,
these datasets are not appropriate for the Internet of Things (IoT). The UNSW-NB15
dataset appears to be a promising dataset, and the Bot-IoT dataset was generated by
building a realistic IoT network environment with a mix of normal and botnet
traffic. These two datasets were identified during our investigation.
11.4.2 Training and Test Data
Training machine learning algorithms is accomplished through the use of statistical
features derived from innocuous network traffic data. Capturing raw network traffic
data via port mirroring on a network switch is an effective method of gathering data.
As soon as the device was linked to the network, the IoT network traffic was captured
and analyzed. An overview of the network traffic gathering is given below.
(1) The network traffic came from a well-known IP address.
(2) All network sources got their MAC and IP addresses from the same place.
(3) Between the source and destination IP addresses, known data are sent.
(4) Information about the destination TCP/UDP/IP port is gathered.
A total of 115 features were extracted over five time windows of 100 ms, 500 ms,
1.5 s, 10 s, and 1 min. These features can be computed quickly and incrementally,
allowing rogue packets to be identified in real time. The statistical range of normal
network traffic feature values is also displayed across the maximum sample collection
time window (1 min). These features are extremely useful for catching source IP
spoofing and other typical malware behaviors. When a hacked IoT device spoofs an
IP address, the features aggregated by source MAC/IP (feature variable MI) and
IP/channel (feature variable HP) will quickly flag a high anomaly score because of
the hacked device's behavior.
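One way to compute such window statistics incrementally, in the spirit of the per-source features over decayed time windows described above, is sketched below. The decay values and the keying by source address are illustrative assumptions, not the chapter's feature extractor.

```python
from collections import defaultdict

class IncStat:
    """Damped incremental statistics (weight, mean, variance) so features can
    be updated per packet without storing the window's raw traffic."""
    def __init__(self, decay):
        self.decay = decay                 # decay rate tied to the window length
        self.w = self.s1 = self.s2 = 0.0
        self.t_last = None

    def update(self, t, x):
        if self.t_last is not None:        # fade old mass by elapsed time
            d = 2.0 ** (-self.decay * (t - self.t_last))
            self.w, self.s1, self.s2 = d * self.w, d * self.s1, d * self.s2
        self.t_last = t
        self.w += 1.0
        self.s1 += x
        self.s2 += x * x

    def stats(self):
        mean = self.s1 / self.w
        var = max(self.s2 / self.w - mean * mean, 0.0)
        return self.w, mean, var

# One IncStat per (source, window); the five windows of 100 ms ... 1 min map
# to decay rates (assumed values for illustration).
WINDOWS = {"100ms": 5.0, "500ms": 3.0, "1.5s": 1.0, "10s": 0.1, "1min": 0.01}
tracker = defaultdict(lambda: {name: IncStat(d) for name, d in WINDOWS.items()})

def observe(src, t, size):
    """Feed one packet (source key, timestamp, payload size) into all windows."""
    for st in tracker[src].values():
        st.update(t, size)
```

With a decay of zero the statistics reduce to the plain running mean and variance, which makes the update rule easy to verify.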
Understanding the data can aid in determining whether a machine learning model
would be effective for categorizing data using a classifier approach based on the
information included in it. Figure 11.3 shows the data characteristics of both ordinary
traffic and malicious attack traffic placed on top of one another. In this case, there is
no direct relationship between the data attributes and the other data properties. It is
not possible to determine the data for IoT traffic merely by examining the correlations
between normal and attack traffic. The types of attacks, as well as the characteristics
of the network, are critical indicators for protecting against malware attacks.
Fig. 11.3 Comparison of feature data against attacks
11.5 Conclusion
The growing number of IoT devices has prompted researchers to explore the security
threats they pose. Due to recent large-scale attacks such as the Carna and Mirai
botnets, IoT devices have been demonstrated to be vulnerable. Furthermore, IoT
devices generate a tremendous amount, speed, and diversity of information. Existing
methods become less competent as a result, necessitating the use of modern
alternatives. Deep learning has achieved widespread acceptance among researchers
and organizations as a result of its high accuracy, its capability to learn deep
features, and its lack of need for human supervision. Based on the findings of our inquiry, we
can infer that much effort has been put into studying IoT security in recent years. In
fact, deep learning was used in a diversity of research areas, including cybersecurity
and intrusion detection systems (IDSs), where cybersecurity has yielded a number of
promising discoveries, opening the way for more powerful security in IoT contexts.
The study focused on datasets for training models; nevertheless, the findings should
be applicable to autonomous IDSs in real-world IoT contexts by acquiring fresh infor-
mation and utilizing deep reinforcement learning algorithms to produce powerful and
efficient models.
References
1. Mohan, N., & Kangasharju, J. (2016). Edge-fog cloud: A distributed cloud for internet of things
computations. In Proceedings of cloudification of the internet of things (CIoT) (pp. 1–6). https://
doi.org/10.1109/CIOT.2016.7872914
2. Habeeb, R. A. A., Nasaruddin, F., Gani, A., Hashem, I. A. T., Ahmed, E., & Imran, M.
(2019). Real-time big data processing for anomaly detection: A survey. International Journal
of Information Management, 45, 289–307.
3. Davis, G., & Davis, G. (2018). Trending: IoT malware attacks of 2018. https://securingt
omorrow.mcafee.com/consumer/mobile-and-iot-security/top-trending-iot-malware-attacks-
of-2018. Accessed on May 10, 2019.
4. Wong, W. G. (2015). Developers discuss IoT security and platforms trends. https://www.electr
onicdesign.com/embedded/developers-discuss-iot-security-and-platforms-trends. Accessed
on May 1, 2019.
5. New trends in the world of iot threats. https://securelist.com/new-trends-in-the-world-of-iot-
threats/87991/. Accessed on May 10, 2019
6. Katal, A., Wazid, M., & Goudar, R. H. (2013). Big data: Issues, challenges, tools and good
practices. In: 2013 6th International Conference on Contemporary Computing (IC3), IEEE.
https://doi.org/10.1109/ic3.2013.6612229
7. Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE
Security & Privacy, 11(6), 74–76. https://doi.org/10.1109/msp.2013.138
8. McDermott, C. D., Majdani, F., & Petrovski, A. V. (2018). Botnet detection in the internet
of things using deep learning approaches. In: 2018 International Joint Conference on Neural
Networks (IJCNN) (pp. 1–8). IEEE
9. Aly, M., Khomh, F., Haoues, M., Quintero, A., & Yacout, S. (2019). Enforcing security in
internet of things frameworks: A systematic literature review. Internet of Things, 100050.
10. Pan, J., & Yang, Z. (2018). Cybersecurity challenges and opportunities in the new edge
computing+ IoT world. In: Proceedings of the 2018 ACM International Workshop on Security
in Software Defined Networks & Network Function Virtualization (pp. 29–32). ACM
11. Reddy Madhavi, K., Vijaya Sambhavi, Y., Sudhakara, M., & Srujan Raju, K. (2021). Covid-
19 isolation monitoring System, Springer series—Lecture Notes on Data Engineering and
Communication Technology. https://doi.org/10.1007/978-981-16-0081-4_60
12. Rizwan, P., Suresh, K., & Babu, M. R. (2016). Real-time smart traffic management system
for smart cities by using Internet of Things and big data. In 2016 International Conference on
Emerging Technological Trends (ICETT). IEEE.
13. Sekaran, R., Goddumarri, S. N., Kallam, S., Patan, R., Ramachandran, M., Al-Turjman, F.
(2021). 5G integrated spectrum selection and spectrum access using AI-based framework for
IoT-based sensor networks. Computer Networks, 186.
Chapter 12
Intelligent Disease Analysis Using
Machine Learning
Nagendra Panini Challa , J. S. Shyam Mohan, M. Naga Badra Kali,
and P. Venkata Rama Raju
Abstract Heart diseases involve disordered functioning of the heart, and lives can be
saved through early diagnosis. Diagnosis takes considerable time to assess the patient
accurately for treatment. Technical advancements are a boon to the
healthcare domain for analyzing the huge amounts of data generated by various hospitals.
These data can be further preprocessed and filtered for disease analysis.
In this paper, logistic regression (LR) and support vector machine (SVM) models are
incorporated for effective prediction of heart disease. Accuracy of 93% is
achieved on the datasets with the SVM model.
12.1 Introduction
The heart is the most essential organ of any individual; it involuntarily pumps blood
to all parts of the body, from the brain to the smallest tissue. Any minor problem
within the heart can cause many disruptions throughout the whole body [1]. The symptoms
of heart disease vary from person to person depending on related attributes such as
heartbeat, chest pain, and their associated symptoms [2]. Heart diseases cause a large
share of deaths every year, affecting more than 16 million people [3]. With technical
advancements, attribute-based disease prediction has become the most popular technique
for providing accurate diagnosis of chronic heart diseases at an early stage.
N. P. Challa (B)
School of Computer Science and Engineering (SCOPE), VIT-AP University, Amaravati, India
e-mail: paninichalla123@gmail.com; nagendra.challa@vitap.ac.in
J. S. Shyam Mohan
Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswati Viswa
Mahavidyalaya, Kanchipuram 631561, India
M. Naga Badra Kali · P. Venkata Rama Raju
Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women,
Bhimavaram 534202, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_12

P. Shaji and P. Mamatha Alex (2019) developed a model for heart disease
prediction which uses different machine learning algorithms such as support vector
machine, artificial neural network, K-nearest neighbors, and random forest. The
best accuracy was produced by artificial neural networks with respect to their dataset
[4]. Vikas Deep et al. worked on heart disease prediction using machine learning,
adopting a rule-learning approach. Their research greatly reduces manual effort, as
the data can be retrieved directly from equipped electronic gadgets. They built strong
association rules and performed data mining on a dataset of patients' medical histories,
which reduced the required services and showed that many rules together enable the most
accurate prediction of cardiovascular disease [5].
12.2 Literature Survey
B. Gomathy et al. have developed a model that applies big data techniques
powered by the MapReduce algorithm; the linear scaling technique in turn
increased the accuracy of the model [6]. Sai Deepak Ravikanti et al. proposed
a model based on the cryptographic AES algorithm for safe transfer of the data and
Naive Bayesian techniques for classification of data and prediction of cardiovascular
disease [7] (Table 12.1).
Thomas and Theresa Princy R. J. together implemented various machine learning
models for predicting heart disease accurately to find out the accuracy based on
Table 12.1 Literature survey

[1] Heart disease prediction using new age computing techniques: different new-age computing approaches such as machine learning and deep learning are applied to the UCI dataset for result analysis.
[2] Prediction of coronary heart disease using risk factor categories: examines the association of JNC-V blood pressure and NCEP cholesterol categories with coronary heart disease (CHD) risk.
[3] Heart disease prediction using machine learning techniques: the prediction is carried out using various supervised learning techniques such as Naive Bayes, decision tree, and so on.
[4] Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques: a novel algorithm is proposed on a medical dataset which predicts heart disease at an early stage of diagnosis.
[5] Heart disease prediction using exploratory data analysis: the attributes are collected from various risk factors and predicted using the k-means algorithm with a publicly available dataset.
various attributes [8]. C. Chetan et al. applied machine learning classification and
experimented on predicting heart disease, using mean absolute error (MAE), root mean
squared error, and sum of squares error as performance measures [9]. This research
identified the support vector machine as the most accurate algorithm; SVM gave better
accuracy than Naive Bayes [10]. After incorporating the given attributes related to
cardiovascular diseases, a deep learning neural network model was developed, with
nearly 120 hidden layers administered by an output perceptron model [11]. Fahd Saleh
developed a machine learning model comparing the performance of various algorithms,
using a rapid mining tool that led to higher accuracy than other tools such as Weka [12].
The five algorithms used were support vector machine, random forest, Naive Bayes,
decision tree, and logistic regression; among these, the decision tree classifier had
the highest accuracy [13]. Another heart disease prediction used different algorithms
such as K-nearest neighbors, logistic regression, and random forest classifier, where
each algorithm was observed to have its own objectives and strengths [14]. A. M.
Karandikar et al. worked on cardiovascular disease prediction by applying a couple of
machine learning algorithms [15]: Naive Bayes and decision tree. They obtained higher
accuracy for the decision tree than for Naive Bayes [16].
12.3 Proposed Work
The main objective of this work is to efficiently predict whether a person is suffering
from heart disease or not. After the input values (the attribute values of a person's
health report) are entered, the output (whether that patient has heart disease or not)
is obtained [17, 18]. For this work, we used the UCI Cleveland dataset for the analysis;
the source of this dataset is Kaggle [19]. The dataset has 76 factors in total, of which
we have used 14. The dataset on Kaggle is already preprocessed and contains no null
values. The attributes used in this work are shown below [20] (Table 12.2).
This analysis is done using various APIs in the Python programming language, which
facilitate data collection in the initial stage. For this research, the UCI repository
from the Kaggle website is chosen. The next step is data preprocessing and splitting
the data: all 13 attributes except the target variable go into X, and the target
attribute into Y (Fig. 12.1).
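A minimal sketch of this X/Y separation, assuming each preprocessed record is a list whose last element is the target; the two sample rows and column order below are illustrative, not taken from the actual dataset:

```python
# Hypothetical preprocessed rows: 13 predictor attributes followed by the target.
# Column order (illustrative): age, sex, cp, trestbps, chol, fbs, restecg,
# thalach, exang, oldpeak, slope, ca, thal, target
rows = [
    [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1],
    [57, 0, 0, 140, 241, 0, 1, 123, 1, 0.2, 1, 0, 3, 0],
]

X = [row[:-1] for row in rows]   # the 13 independent attributes
y = [row[-1] for row in rows]    # the target attribute (0 = no disease, 1 = disease)

print(len(X[0]), y)  # each feature vector has 13 entries
```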
Now the data is passed into the algorithms used in this research, namely support vector
machine and logistic regression. Finally, the models are evaluated based on different
performance metrics like accuracy, F-score, etc.
First, logistic regression handles classification problems with two different
classes. Here, we implemented a logistic function to get an output value between
0 and 1. In this study, there are 13 independent attributes and one target attribute.
When there are multiple attributes and the result should lie between the values 0 and
Table 12.2 Dataset attributes

1. Age (age of the person): numerical value
2. Gender (sex of the person): 0 [f], 1 [m]
3. Chest pain (intensity of chest pain of a person): values from 0 to 3
4. Rest blood pressure (the person's blood pressure value): numerical value
5. Cholesterol (the cholesterol level of a person): numerical value
6. Fasting blood sugar (the fasting blood sugar of the person): 1 or 0
7. Electrocardiogram (the ECG result of a person): 0 to 2
8. Heartbeat raise (the maximum heartbeat of a person): numerical value
9. Exercise angina (whether the person has exercise-induced angina): 0 or 1
10. Old peak (the depression level of a person): numerical value
11. Slope (the heartbeat fluctuations of a person during exercise): values between 1 and 3
12. CA (the fluoroscopy results of a person): values between 0 and 3
13. Thallium test result: values between 0 and 3
14. Target: 0 = No, 1 = Yes
Fig. 12.1 Proposed system architecture
1, logistic regression could be the best fit.
Sigmoid/logistic function: Y = 1 / (1 + e^(−z))
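The logistic function above can be written directly in a few lines; this is a generic illustration of the formula, not code from the authors:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5: the point of maximum uncertainty (decision boundary)
print(sigmoid(4.0))  # close to 1, so the model would predict class 1
```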
Second, SVM identifies hyperplanes and obtains the most accurate decision
boundary. This algorithm divides the whole n-dimensional space into
classes (two classes for binary classification) based on the boundary, so new
data can be put into the correct category based on the decision boundary.
maximize f(c_1, …, c_n) = Σ_{i=1..n} c_i − (1/2) Σ_{i=1..n} Σ_{j=1..n} y_i c_i (φ(x_i) · φ(x_j)) y_j c_j
                        = Σ_{i=1..n} c_i − (1/2) Σ_{i=1..n} Σ_{j=1..n} y_i c_i k(x_i, x_j) y_j c_j
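The kernel k(x_i, x_j) in the second line replaces the explicit feature map φ. One common choice is the RBF (Gaussian) kernel; the sketch below is a generic illustration, and the gamma value is an assumption rather than a parameter reported in this chapter:

```python
import math

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF (Gaussian) kernel: k(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

# The kernel is symmetric and equals 1 when both points coincide.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # exp(-1.0), about 0.3679
```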
12.4 Results and Analysis
The results of this study after applying the machine learning algorithms are
shown below. Performance is analyzed using accuracy, recall, precision, and
F-measure. Accuracy is computed as the share of correct predictions among all
predictions made; it is the most widely used performance metric for evaluating
machine learning models. Precision is the ratio of the relevant records to all
the instances retrieved. Recall shows how efficiently the model identifies the
given data; it is also referred to as sensitivity or the true positive rate.
True Positives [TP]: the person has the heart disease, and the prediction correctly says so.
False Positives [FP]: the person does not have the heart disease, but the model predicts
that they do.
True Negatives [TN]: the person does not have the disease, and the prediction correctly
says so.
False Negatives [FN]: the person has the disease, but the model predicts that they do not
(Fig. 12.2).
Accuracy (A) = [Total number of correct predictions] / [Total number of predictions made]
Recall (R) = [True Positives] / [True Positives + False Negatives]
Precision (P) = [True Positives] / [True Positives + False Positives]
F-measure (Fs) = [2 × Precision × Recall] / [Precision + Recall]
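These four formulas can be checked with a small generic helper; the TP/FP/FN/TN counts used below are the logistic regression values from Table 12.3:

```python
def metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Logistic regression counts from Table 12.3: TP=80, FP=17, FN=11, TN=140
a, p, r, f = metrics(80, 17, 11, 140)
print(round(a, 3), round(p, 3), round(r, 3), round(f, 3))  # 0.887 0.825 0.879 0.851
```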
Fig. 12.2 Results comparison
Table 12.3 Confusion matrix
Algorithm TP FP FN TN
Logistic regression 80 17 11 140
Support vector machine 89 8 6 105
Table 12.4 Performance analysis
Algorithm P R Fs A(%)
Logistic regression 0.87 0.83 0.85 87
Support vector machine 0.86 0.88 0.87 93.40
The above-mentioned performance metrics (accuracy, F-score, recall, and precision)
can be derived from the confusion matrix, which helps us assess the performance of
the model. The confusion matrix values obtained for logistic regression and support
vector machine are given in Table 12.3, and the resulting metrics in Table 12.4.
12.5 Conclusion
Heart-related problems constitute the majority of the most widely occurring deaths
around the globe. This research applies machine learning algorithms, namely support
vector machine and logistic regression, for prediction of cardiovascular diseases.
The results clearly show that the support vector machine is the best algorithm when
the models are compared, with a prediction accuracy of 93.40%. In
the future, the research may be extended to a web-based application or basic GUI
built on the support vector machine, perhaps with a larger dataset. This may
produce more accurate results and efficiently help doctors.
References
1. Pavan Kumar, T., & Avinash, G. (2019). Heart disease prediction using effective machine
learning techniques. International Journal of Recent Technology and Engineering, 8, 944–950
2. Chetty, N., Vaisla, K. S., & Patil, N. (2015). An improved method for disease prediction using
fuzzy approach. ACCE 2015.
3. Chilukuri, S. K., Challa, N. P., Mohan, J. S. S., Gokulakrishnan, S., Mehta, R. V. K., & Suchita,
A. P. (2021). Effective predictive maintenance to overcome system failures—A machine
learning approach. In H. Sharma, M. Saraswat, A. Yadav, J. H. Kim & J. C. Bansal (Eds.),
Congress on intelligent systems. CIS 2020. Advances in intelligent systems and computing (vol.
1334). Springer. https://doi.org/10.1007/978-981-33-6981-8_28
4. Mamatha Alex, P., & Shaji, S. P. (2019). Prediction and diagnosis of heart disease patients using
data mining technique. In International Conference on Communication and Signal Processing
2019.
5. Chauhan, A., Jain, A., Sharma, P., Deep, V. (2018). Heart disease prediction using evolutionary
rule learning. In International Conference on “Computational Intelligence and Communication
Technology” (CICT 2018).
6. Nagamani, T., Logeswari, S., & Gomathy, B. (2019). Heart disease prediction using data mining
with MapReduce algorithm. International Journal of Innovative Technology and Exploring
Engineering (IJITEE), 8(3). ISSN: 2278-3075.
7. Repaka, A. N., Ravikanti, S. D., Franklin, R. G. (2019). Design and implementation of heart
disease prediction using Naives Bayesian. In International Conference on Trends in Electronics
and Information (ICOEI 2019).
8. Theresa Princy R., & Thomas, J. (2016). Human heart disease prediction system using data
mining techniques. In International Conference on CircuitPower and Computing Technologies,
Bangalore.
9. Lutimath, N. M., Chethan, C., & Pol, B. S. (2019). Prediction of heart disease using machine
learning. International Journal of Recent Technology and Engineering, 8, 474–477.
10. Kiyasu, J. Y. (1982). U.S. Patent No. 4,338,396. Washington, DC: U.S. Patent and Trademark
Office.
11. Alotaibi, F. S. (2019). Implementation of a machine learning model to predict heart failure
disease (IJACSA). International Journal of Advanced Computer Science and Applications,
10(6).
12. Ganna, A., Magnusson, P. K., Pedersen, N. L., de Faire, U., Reilly, M., Ärnlöv, J., & Ingelsson,
E. (2013). Multilocus genetic risk scores for coronary heart disease prediction. Arteriosclerosis,
thrombosis, and vascular biology, 33(9), 2267–2272.
13. Nikhar,S., & Karandikar, A. M. (2016). Prediction of heart disease using machine learning algo-
rithms. International Journal of Advanced Engineering, Management and Science (IJAEMS)
Infogain Publication, 2(6); Jacobs, I. S., & Bean, C. P. (1963). Fine particles, thin films and
exchange anisotropy. In G. T. Rado & H. Suhl (Eds.), Magnetism (vol. III, pp. 271–350).
Academic.
14. Piller, L. B., Davis, B. R., Cutler, J. A., Cushman, W. C., Wright, J. T., Williamson, J. D., &
Haywood, L. J. (2002). Validation of heart failure events in the Antihypertensive and lipid
lowering treatment to prevent heart attack trial (ALLHAT) participants assigned to doxazosin
and chlorthalidone. Current Controlled Trials in Cardiovascular Medicine, 3(1), 10.
15. Folsom, A. R., Prineas, R. J., Kaye, S. A., & Soler, J. T. (1989). Body fat distribution and
self-reported prevalence of hypertension, heart attack, and other heart disease in older women.
International Journal of Epidemiology, 18(2), 361–367.
16. Zhang, Y., Fogoros, R., Thompson, J., Kenknight, B. H., Pederson, M. J., Patangay, A., &
Mazar, S. T. (2011). U.S. Patent No. 8,014,863. U.S. Patent and Trademark Office.
17. Lee, I., et al. (2012). Challenges and research directions in medical cyber–physical systems.
Proceedings of the IEEE, 100, 75–90.
18. Rajkumar, R. (2012). A cyber–physical future. Proceedings of the IEEE, 100, 1309–1312.
19. Shyam Mohan, J. S., Vedantham, H., Vanam, V., Challa, N. P. (2021). Product recommendation
systems based on customer reviews using machine learning techniques. In I. Jeena Jacob, S.
Kolandapalayam Shanmugam, S. Piramuthu, P. Falkowski-Gilski (Eds.), Data Intelligence and
Cognitive Informatics. Algorithms for Intelligent Systems. Springer. https://doi.org/10.1007/
978-981-15-8530-2_21
20. Zhou, J., Cao, Z., Dong, X., & Vasilakos, A. V. (2017). Security and privacy for cloud-based
IoT: Challenges. IEEE Communications Magazine, 55(1), 26–33.
Chapter 13
Automated Detection of Skin Lesions
Using Back Propagation Neural Network
Nagendra Panini Challa , A. Mohan, Narendra Kumar Rao,
Bhaskar Kumar Rao, J. S. Shyam Mohan, and B. Balaji Bhanu
Abstract Many skin diseases impact our human body in a drastic way. Many skin-related
problems exist, but this paper focuses on skin lesions, which show abnormal
growth compared to the surrounding skin. The diagnosis is analyzed using a
neural network (NN) model, where data preprocessing, skin texture identification,
and classification are performed using the support vector machine (SVM) method on a
skin disease dataset. The results obtained show that a good accuracy of 93% is achieved
by reducing the data loss.
13.1 Introduction
Skin-related problems like cancer occur due to the abnormal growth of unwanted or
harmful skin cells, which have the ability to spread to other places of the human body
when not addressed properly [1]. This occurs when the human skin is exposed abnormally
to the sun at different durations of the day. Generally, skin cancers are classified
into two groups, namely non-melanoma (NMSC) and melanoma (MSC) skin cancer
[2]. NMSC is the most common type of cancer and develops on the upper layer of the
actual skin.

N. P. Challa (B)
School of Computer Science and Engineering (SCOPE), VIT-AP University, Amaravati, India
e-mail: paninichalla123@gmail.com; nagendra.challa123@gmail.com
A. Mohan
Department of Computer Science and Engineering, Lovely Professional University, Punjab, India
N. K. Rao · B. K. Rao
Department of Computer Science and Engineering, Sree Vidyaniketan Engineering College,
Tirupathi, India
J. S. Shyam Mohan
Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswati Viswa
Mahavidyalaya, Kanchipuram 631561, India
B. Balaji Bhanu
Department of Electronics, Andhra Loyola College, Vijayawada 520010, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_13

There are two main types of cancer found in NMSC: squamous-cell (SCC)
skin cancer and basal-cell (BCC) skin cancer. MSC develops various harmful skin
cells and can be more serious than NMSC [3]. These skin-related
problems occur due to overexposure to ultraviolet (UV) radiation from the sun and other
indoor light sources. DNA gets damaged on exposure, which triggers the
basal cells in the epidermis and causes abnormal growth in the skin [4]. SCC occurs due
to skin exposure to the sun, which makes the skin red and very itchy; it is considered a
dangerous cancer type when compared with MSC.
Most MSC cells consist of various color shades, red and pink, which tend to be very
harmful [5]. They change in shape and size abnormally depending on the skin tissue.
Skin exposure to UV radiation from the sun causes 90% of skin-related cases in our
country. Another radiation source is tanning beds; exposure of the human body to them
during childhood is harmful and enormously raises MSC and NMSC cells [6]. There is
also a chance for MSC to affect the human skin through moles on the body. Direct DNA
damage results from the exposure of human skin to UV rays, and indirect DNA alteration
occurs through reactive oxygen species [7].
Skin cancer can be identified by various symptoms such as ulcering of the skin,
skin that does not heal, changes in moles, discolored skin, and many more. The profession
most vulnerable to skin cancer is farming, where farmers are exposed to the sun at
different intervals of time. Other risk factors include smoking, HPV infections,
immune problems, and many more. Further, many partial solutions have been proposed for
early diagnosis of skin cancer, including different surgical procedures and
chemotherapy [8]. The most common solution for skin problems is surgical excision;
these surgeries mainly focus on skin reconstruction according to the size and location
of the skin defect. The mortality rate is very low in these skin-related problems, as
there are many approaches for their cure and containment.
13.2 Literature Survey and Proposed Work
A convolution neural network (CNN) is very similar to the neurons of the human brain,
responding to the input dataset based on the relevant filters necessary for the spatial
and temporal dependencies of the image [9]. CNN fits the given input dataset well,
involving various parameters and weight reusability. Classification of the
correct image is successful only when the image pixels are near the intensity
values of the input image [10] (Fig. 13.1).
13.2.1 Image Segmentation
This refers to splitting of the images into various segments where the required region
is identified for analysis [11,12]. Popular Otsu algorithm is used for segmentation
Fig. 13.1 Convolution neural network [9]
Table 13.1 Literature survey

1. An introduction to ROC data analysis: ROC graphs play a major role in identifying accurate classifiers for performance visualization.
2. Automatic segmentation of clustered breast cancer cells using watershed and concave vertex graph: automatic detection and analysis of quantum dots (DQ) are performed using a fuzzy approach.
3. Brain tumor segmentation and its area calculation in brain MR images using k-mean clustering and fuzzy c-mean algorithm: a novel approach for shape detection of tumors in MR brain images.
4. Brain tumor detection using color-based k-means clustering segmentation: a novel algorithm based on color-based segmentation using the k-means clustering technique for classifying tumor objects.
which is well suited for image thresholding, an alternative to the region-based
segmentation method [13, 14] (Fig. 13.2).
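Otsu's method can be sketched compactly in pure Python; this is a generic illustration of the algorithm, not the MATLAB toolbox code used later in this chapter. It searches for the gray level that maximizes the between-class variance:

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w_b = 0       # background (below-threshold) pixel count
    sum_b = 0.0   # background intensity sum
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between_var = w_b * w_f * (mean_b - mean_f) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Two well-separated intensity clusters: the threshold lands between them.
dark, bright = [10, 12, 14, 16], [200, 202, 204, 206]
print(otsu_threshold(dark + bright))  # 16: the largest dark intensity
```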
In this work, the RGB image is converted into grayscale, and a ConvNet is utilized
to process the image without losing the features necessary for good prediction
on our scalable input dataset [15]. The main objective of utilizing this function
is to extract the high-level features from the input image, while the low-level
features are identified by the ConvLayer function. The pooling layer in a CNN plays a
major role in reducing the spatial size of the input. This work focuses on various skin
lesion patterns based on a convolution neural network [16]. The model is a type of
deep learning algorithm that takes an input image with its associated weights. The
inbuilt functions in many tools preprocess the data using the ConvNet function, which
learns various features from the input dataset. The data is preprocessed effectively
with a dimensionality reduction technique for the input images, where the pooling process
Fig. 13.2 4×4×3 RGB input image
will be continued until the necessary features are achieved with higher computation
power [17].
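The RGB-to-grayscale conversion step can be sketched with the standard luminance weights (0.299, 0.587, 0.114); these weights are the common ITU-R BT.601 convention, assumed here rather than stated in the chapter:

```python
def to_grayscale(rgb_image):
    """Convert an H x W nested-list image of (R, G, B) tuples to grayscale."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

# A 1 x 3 image: pure red, pure green, pure blue pixels.
img = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(to_grayscale(img))  # [[76, 150, 29]]
```

Note how green contributes the most to perceived brightness, which is why its weight is the largest.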
13.2.2 Morphological Image Processing
This refers to a combination of nonlinear operations that depend on the relative
ordering of image pixel values and are generally suitable for binary images [18, 19].
The structuring element in the image is identified, positioned, and compared
with the nearest/neighboring pixels for easy classification of the input image. The most
important operations, namely erosion and dilation, are applied to the input dataset
for the addition and removal of boundary objects of the input images [20].
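Erosion and dilation can be sketched in pure Python with a 3-pixel structuring element on a 1-D binary signal; this is a generic illustration of the two operations (with a simplified boundary rule that only considers in-range neighbors), not the chapter's implementation:

```python
def dilate(bits):
    """3-element dilation: a pixel becomes 1 if it or any neighbor is 1 (grows objects)."""
    n = len(bits)
    return [1 if any(bits[j] for j in range(max(0, i - 1), min(n, i + 2))) else 0
            for i in range(n)]

def erode(bits):
    """3-element erosion: a pixel stays 1 only if it and all in-range neighbors are 1."""
    n = len(bits)
    return [1 if all(bits[j] for j in range(max(0, i - 1), min(n, i + 2))) else 0
            for i in range(n)]

signal = [0, 1, 1, 1, 0, 0, 1, 0]
print(dilate(signal))  # [1, 1, 1, 1, 1, 1, 1, 1]: boundaries expand
print(erode(signal))   # [0, 0, 1, 0, 0, 0, 0, 0]: the lone pixel disappears
```

Applying erosion followed by dilation (an "opening") removes small noise objects while roughly preserving larger ones, which is the typical use on segmented lesion masks.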
13.3 Results
This work was implemented using the MATLAB 2018 tool, where the image processing
toolbox is readily available and provides different machine learning algorithms
[21–23]. This helps to classify the input dataset easily and detect the skin lesions
accurately (Figs. 13.3, 13.4 and 13.5).
Fig. 13.3 Training progress of the datasets
Fig. 13.4 Segmented image of detected object
Fig. 13.5 Cancer area detection in input image
13.4 Conclusion and Future Enhancement
This work mainly focuses on identifying skin disease through a computer-based
technique, which reduces the disease identification time for any individual
(dermatologist). A suitable segmentation technique is applied to the input set of
images, which are classified using the CNN technique. The skin disease is accurately
identified based on the efficient classification algorithm incorporated in this work.
This CNN-based work can be further extended to other medical images for efficient
disease analysis.
References
1. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8),
861–874.
2. Mouelhi, A., Sayadi, M., & Fnaiech, F. (2011). Automatic segmentation of clustered breast
cancer cells using watershed and concave vertex graph. In International Conference on
Communications, Computing and Control Applications (CCCA).
3. Sivakumar, J., Lakshmi, A., & Arivoli, T. (2012). Brain tumor segmentation and its area calcu-
lation in brain mr images using k-mean clustering and fuzzy c-mean algorithm. In International
Conference on Advances in Engineering, Science and Management (ICAESM).
4. Wu, M.-N., Lin, C.-C., & Chang, C.-C. (2007). Brain tumor detection using color-based
k-means clustering segmentation. In Third International Conference on Intelligent Information
Hiding and Multimedia Signal Processing (IIH-MSP).
5. Zhou, H., Rehg, J. M., & Chen, M. (2010). Exemplar-based segmentation of pigmented skin
lesions from dermoscopy images. In IEEE International Symposium on Biomedical Imaging:
From Nano to Macro.
6. Jones, T. D., & Plassmann, P. (2000). An active contour model for measuring the area of leg
ulcers. IEEE Transactions on Medical Imaging, 19(12), 1202–1210.
7. Tsap, L. V., Goldgof, D. B., Sarkar, S., & Powers, P. S. (1998). A vision-based technique for
objective assessment of burn scars. IEEE Transactions on Medical Imaging, 17 (4), 620–633.
8. Keke, S., Peng, Z., & Guohui, L. (2010). Study on skin color image segmentation used by fuzzy-
c-means arithmetic. In Seventh International Conference on Fuzzy Systems and Knowledge
Discovery.
9. Sarrafzade, O., Baygi, M. H. M., & Ghassemi, P. (2010). Skin lesion detection in dermoscopy
images using wavelet transform and morphology operations. In 17th Iranian Conference of
Biomedical Engineering(ICBME).
10. Challa, N. P., Gokulakrishnan, et al. (2020). A method of an artificially intelligent build
repository management system. Indian Patent Application Number: 202041026135, July 2020.
11. Cula, O. G., & Dana, K. J. (2002). Image-based skin analysis. In Proceedings of Texture
(pp. 35–40). Copenhagen.
12. Kundin, J. I. (1989). A new way to size up wounds. AJN The American Journal of Nursing,
89(2), 206–207.
13. Lucas, C., Classen, J., Harrison, D., & De, H. (2002). Pressure ulcer surface area measurement
using instant full-scale photography and transparency tracings. Advances in Skin and Wound
Care, 15(1), 17–23.
14. Keas, D. H., & Keith, C. (2004). Measure: A proposed assessment framework for developing
best practice recommendations for wound assessment. Wound Repair and Regeneration, 12(3),
1–17.
15. Shen, S., Sandham, W. A., Granat, M. H., Dempsey, M. F. & Patterson, J. (2003). Fuzzy
clustering-based application to medical image segmentation. In Proceedings of the 25th Annual
International Conference of the IEEE EMBS (pp. 747–750).
16. Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions
on Systems, Man, and Cybernetics, 9(1), 62–66.
17. Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural
Networks, 16(3), 645–678.
18. Chan, T. F., & Vese, L. A. (2001). Active contours without edges. IEEE Transactions on image
processing, 10(2), 266–277.
19. Challa, N. P., Rao, N. K., et al. (2019). Predictive maintenance for monitoring heritage buildings
and digitization of structural information. International Journal of Innovative Technology and
Exploring Engineering, 8(8), 1463–1468.
20. Yun, T., Zhou, M.-q., Wu, Z.-k., & Wang, X.-c. (2009). A region based active contour model for
image segmentation. In International Conference on Computational Intelligence and Security.
21. Dawod, A. Y., Abdullah, J., & Alam, M. J. (2010). A new method for hand segmentation using
free-form skin color model. In 3rd International Conference on Advanced Computer Theory
and Engineering (ICACTE).
22. Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum
press.
23. Abolfazl, K., Hadi, S., & Ali, A. (2011). A modified fcm algorithm for MRI brain image
segmentation. In 7th Iranian Conference on Machine Vision and Image Processing.
Chapter 14
Detection of COVID-19 Using CNN
and ML Algorithms
M. Raghav Srinivaas, Khanjan Shah, B. Abhishek, R. Jagadeesh Kannan,
and A. Balasundaram
Abstract Coronavirus causes a very dangerous disease, and identifying it in a person's
body is not easy. During identification there are many false positive cases, where the
person does not have corona and yet the prediction says they do, and also false negative
cases, where the person has corona but it does not get detected. Due to this problem, we
come up with two approaches, compare them, and decide which one is better at analyzing
the disease in the body. We use CNN to scan the chest X-ray dataset and ML algorithms
for the tabular dataset, as it also contains much text information. In this project, we
explain in detail what CNN is, what ML is, how to implement CNN and ML algorithms on a
particular dataset, and what output we get as a comparison.
14.1 Introduction
COVID-19 is one of the worst pandemics seen around the world. Many people were
affected by it, and many lost their lives to it. COVID-19 started around December
2019 in Wuhan, China, and began to spread around the world by the first quarter of
2020. From March 2020, lockdowns were imposed in India and lasted about 6 months.
Vaccines were developed in early 2021, and now almost 30% of the population is
vaccinated. During the early stages of the pandemic, detection of COVID-19 was a
tedious job: it took about 2 to 3 days to get the test results, and by that time the
condition of the people affected by the disease worsened.
In this paper, we are proposing a system to detect COVID-19 in an easy manner.
Two datasets are taken for training and testing purpose. The first one is an image
dataset consisting of chest X-ray images. The second one is a text dataset.

M. Raghav Srinivaas · K. Shah · B. Abhishek · R. Jagadeesh Kannan · A. Balasundaram (B)
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
e-mail: balasundaram.a@vit.ac.in
R. Jagadeesh Kannan
e-mail: jagadeeshkannan.r@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_14

The image
dataset is used with convolution neural network, and the text dataset is used with
recurrent neural network. Using this we are able to create a system that can detect
COVID-19 in an easier as well as faster manner.
14.2 Literature Survey
14.2.1 Pneumonia and COVID-19 Detection Using
Convolution Neural Network
A system to detect COVID-19 and pneumonia using neural networks is proposed.
CNN and VGG-16 were used because deep learning methods based on deep CNNs
applied to chest X-ray images are gaining recognition and have produced encouraging
outcomes in diverse applications. The Kaggle chest X-ray database containing 15,798
chest X-ray images of normal (healthy), viral, and bacterial pneumonia cases was
taken. A CNN is constructed containing a convolution layer followed by a batch
normalization layer and then a max pooling layer with an activation layer. These
layers are repeated multiple times, with a flatten and a dense layer at the end. The
Adam optimizer with a learning rate of 0.0001 and a categorical cross-entropy loss
were used. A training accuracy of 95% was achieved. During testing, an accuracy of
92% was achieved for pneumonia and 93% for COVID-19.
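The architecture described above (repeated convolution, batch normalization, max pooling, and activation blocks, ending in flatten and dense layers, compiled with Adam at a 0.0001 learning rate and categorical cross-entropy) can be sketched in Keras as follows. The filter counts, input size, and number of classes are illustrative assumptions, not values from the surveyed paper.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_cnn(input_shape=(64, 64, 1), num_classes=3):
    """Sketch of the repeated conv/batch-norm/max-pool/activation block."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # The conv -> batch norm -> max pool -> activation pattern, repeated
    for filters in (32, 64, 128):  # filter sizes are assumed, not from the paper
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Activation("relu")(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    # Adam with learning rate 0.0001 and categorical cross-entropy, as reported
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


model = build_cnn()
```

Training would then proceed with `model.fit` on the prepared X-ray batches.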
14.2.2 COVID-19 Detection Using Recurrent Neural
Network
A system to detect COVID-19 using recurrent neural networks is proposed. A speech
corpus consisting of 60 healthy speakers and 20 COVID-19 patients was considered.
Three different types of sounds were recorded: cough sounds, breathing sounds, and
voice. Mel-frequency cepstral coefficients were used to extract features. A long
short-term memory (LSTM) network was used for detection. 70% of the data was used
for training and 30% for testing. Accuracies of 98% for breathing sounds, 97% for
cough sounds, and 88% for voice were achieved. The end result was that breathing
and cough sounds alone were enough for detecting COVID-19.
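The pipeline above (MFCC feature sequences fed to an LSTM, with a 70/30 train/test split) can be sketched as below. The sequence length, feature count, layer sizes, and the synthetic placeholder data are all assumptions for illustration; the surveyed paper's actual corpus and hyperparameters are not reproduced here.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_lstm(num_frames=100, num_mfcc=13):
    """Binary classifier over a sequence of MFCC feature vectors."""
    inputs = tf.keras.Input(shape=(num_frames, num_mfcc))
    x = layers.LSTM(64)(inputs)                  # summarize the audio sequence
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # COVID vs. healthy
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_lstm()

# 70/30 train/test split as in the paper (synthetic placeholder data here)
X = np.random.rand(80, 100, 13).astype("float32")
y = np.random.randint(0, 2, size=(80, 1))
split = int(0.7 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

In practice the MFCC matrices would be extracted from the recorded cough, breathing, and voice audio before being fed to the network.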
14 Detection of COVID-19 Using CNN and ML Algorithms 137
14.2.3 Transfer Learning to Detect COVID-19 Automatically
from X-ray Images Using Convolutional Neural
Networks
A system that uses transfer learning to detect COVID-19 automatically from X-ray
images using convolutional neural networks is designed. An X-ray image dataset
was downloaded from Kaggle. This dataset consists of chest X-ray images from
1200 individuals with COVID-19, 1341 images from healthy individuals, and 1345
images from individuals with other types of viral pneumonia. The transfer learning
technique was applied using ImageNet data. Three more layers were added on top
of each model, namely a fully connected layer (FC2) with an output size of 512,
a dropout layer, and another fully connected layer (FC1) with a softmax classifier.
The dropout layer was added to prevent overfitting. The network was trained with the
softmax classifier for 15 epochs using an RMSprop optimizer, with a learning rate
of 0.00001 and a batch size of 32. An accuracy of 96% was achieved.
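A hedged sketch of the head described above: a frozen pretrained backbone, a 512-unit fully connected layer, dropout, and a softmax classifier, compiled with RMSprop at a 0.00001 learning rate. VGG16 is used here only as a stand-in backbone (the paper applies the head to several models), and `weights=None` avoids the ImageNet weight download; in practice one would pass `weights="imagenet"`.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_transfer_model(num_classes=3):
    """Frozen backbone + FC(512) + dropout + softmax transfer-learning head."""
    # weights=None is only for illustration; use weights="imagenet" in practice
    base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                       input_shape=(224, 224, 3))
    base.trainable = False                       # freeze the pretrained features
    x = layers.Flatten()(base.output)
    x = layers.Dense(512, activation="relu")(x)  # fully connected, output 512
    x = layers.Dropout(0.5)(x)                   # dropout to limit overfitting
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


model = build_transfer_model()
# model.fit(train_ds, epochs=15, batch_size=32)  # schedule as in the paper
```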
14.2.4 Automatic Detection of Coronavirus Disease
(COVID-19) Using X-ray Images and Deep
Convolutional Neural Networks
A system for automatic detection of coronavirus disease (COVID-19) using X-ray
images and deep convolutional neural networks is designed. Chest X-ray images
of 341 COVID-19 patients were obtained from the open-source GitHub repository
shared by Dr. Joseph Cohen. This repository consists of chest X-ray/computed
tomography (CT) images of patients mainly with acute respiratory distress syndrome
(ARDS), COVID-19, Middle East respiratory syndrome (MERS), pneumonia, and
severe acute respiratory syndrome (SARS). 2800 normal (healthy) chest X-ray
images were selected from the “ChestX-ray8” database, and 2772 bacterial and 1493
viral pneumonia chest X-ray images were used from the Kaggle repository “chest
X-ray images (pneumonia)”. Data augmentation is performed, after which pretrained
models such as ResNet and InceptionV3 are loaded. The output is then passed through
a global average pooling layer and a fully connected layer. An accuracy of 95.4%
was achieved using the InceptionV3 model, and an accuracy of 96.1% was achieved
using the ResNet model.
14.2.5 Use of Fuzzy Soft Set in Decision Making
of COVID-19 Risks
To make human-like decisions, fuzzy inference systems with fuzzy reasoning are
used; this works only with the help of professionals who make decisions on complex
issues through their continuous hard work and knowledge. This type of fuzzy system
is useful when human information is involved, because such information requires
processing through natural language. The problem is that we are often unable to
feed in the whole information, or we do not have sufficient information to feed into
a conventional mathematical model. The most commonly used fuzzy inference
method is the Mamdani model. The Mamdani fuzzy inference technique is executed
in four consecutive stages: fuzzification, rule evaluation, rule output aggregation, and
defuzzification. The fuzzification module maps the crisp input value into degrees
of membership of fuzzy sets by applying fuzzification membership functions. A
membership function returns a value between 0 (for non-membership) and 1 (for full
membership). The knowledge base includes the IF-THEN rules, which are provided
by domain experts.
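The four Mamdani stages described above can be illustrated with a minimal pure-Python sketch. The membership functions, the fever-to-risk rules, and all numeric ranges are invented for illustration; they are not taken from the surveyed work.

```python
def tri(x, a, b, c):
    """Triangular membership function returning a degree in [0, 1]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def mamdani_risk(temp):
    """Toy Mamdani inference: body temperature -> COVID risk in [0, 1]."""
    # 1. Fuzzification: map the crisp temperature into membership degrees
    low_fever = tri(temp, 35.0, 36.5, 38.0)
    high_fever = tri(temp, 37.0, 39.5, 42.0)
    # 2. Rule evaluation (example IF-THEN rules a domain expert might give):
    #    IF fever is low  THEN risk is low
    #    IF fever is high THEN risk is high
    xs = [i / 100.0 for i in range(101)]
    # 3. Aggregation: clip each output set by its rule strength, take the max
    agg = [max(min(low_fever, tri(x, 0.0, 0.2, 0.5)),
               min(high_fever, tri(x, 0.5, 0.8, 1.0))) for x in xs]
    # 4. Defuzzification: centroid of the aggregated fuzzy set
    num = sum(x * m for x, m in zip(xs, agg))
    den = sum(agg)
    return num / den if den else 0.0


r_hot = mamdani_risk(40.0)    # strongly activates the high-fever rule
r_cool = mamdani_risk(36.5)   # strongly activates the low-fever rule
```

A real system would use many inputs and rules, but the four stages remain the same.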
14.2.6 CNN-LSTM Combination for COVID-19 Prediction
Time-series prediction is a forecasting approach that analyzes historical data to
capture the relationships and tendencies of a random variable. It is then applied
to forecast the value of that random variable in the future. This technique is
particularly useful if the underlying data-generating distribution or process is
unknown, or if there is no explanatory model capable of precisely linking the
prediction variable with other explanatory variables. A great deal of effort and
research output has gone into the development and improvement of time-series
forecasting techniques over the last several decades. The next paragraph summarizes
many fruitful studies that demonstrate several models for forecasting COVID-19
cases. Alternatively, algorithms based on artificial intelligence (AI) learn from
historical data to forecast future results. Machine-learning and deep-learning
algorithms are types of AI algorithms; this is a discipline focused on computer
algorithms that learn and improve on their own. Machine-learning-based forecasting
regressors adjust their parameters to fit their forecasts to the actual data. Some
related studies that used machine-learning algorithms to forecast the spread of the
COVID-19 disease are discussed in the following paragraphs.
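A common first step for the learning-based forecasting approaches surveyed here is to frame the historical case counts as a supervised problem with a sliding window: the previous `window` values predict the next value. The window size and the toy case series below are illustrative assumptions.

```python
import numpy as np


def make_windows(series, window=7):
    """Return (X, y): each row of X holds `window` past values, and y holds
    the value that immediately follows that window."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


# Hypothetical daily case counts, for illustration only
daily_cases = [10, 12, 15, 20, 26, 33, 41, 50, 62, 75]
X, y = make_windows(daily_cases, window=3)
# X[0] = [10, 12, 15] is paired with the next observation y[0] = 20
```

The resulting `(X, y)` pairs can be fed to any regressor, from classical machine-learning models to a CNN-LSTM network.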
14.2.7 Statistical Techniques for Combating COVID-19
Computational intelligence was formally defined by Bezdek in 1994: a system is
called “computationally intelligent” if it deals with data at a fundamental level
(such as the pixels of an image), contains a pattern-recognition module, and does
not utilize prior knowledge in the sense of artificial intelligence. According to
Bezdek’s definition, computational intelligence is one branch of artificial
intelligence. Actually, the goals of both artificial intelligence and computational
intelligence are the same, which is to realize general intelligence. Marks clarified
the distinction between artificial intelligence and computational intelligence by
claiming that the former is built from hard-computing technologies, while the latter
is built from soft-computing technologies. Therefore, we can presume that two types
of machine intelligence exist:
Artificial intelligence involves using advanced concepts of computing to impart
intelligence to a system [1]. Compared to hard-computing-based artificial
intelligence, computational intelligence can adapt to many different situations
through the benefits of soft computing. Hard-computing strategies are designed
using Boolean logic based only on true or false values, on which data engineering
relies. One crucial problem with Boolean logic is that Boolean values are unable to
interpret natural language easily.
14.2.8 Computational Intelligence Techniques for Combating
COVID-19: A Survey
In the disciplines of health sciences and epidemiology, computational intelligence has
been employed in a variety of applications [15]. Because of the fast and widespread
spread of COVID-19, many researchers all over the world have been working hard
to create computational intelligence methods and systems to battle the pandemic.
Despite the fact that over 200,000 scientific publications have been published on
COVID-19, SARS-CoV-2, and other coronaviruses [610], none of them properly
addressed the essential challenges of using computational intelligence to battle
COVID-19. To close this gap, this survey categorizes and evaluates the present state
of computational intelligence in the battle against this deadly illness. The authors
compile and synthesize the most recent discoveries and ideas in computational
intelligence techniques, such as machine learning, evolutionary computation, soft
computing, and big data analytics, as turned into practical applications against
COVID-19. They also look into some possible computational intelligence research
topics for combating the epidemic. According to the findings, the majority of
research papers addressing the challenge of characterizing viral infection symptoms
are based on neural networks. These achieved the best results (97.62% true-positive
rate),
indicating that deep neural networks for detecting symptoms from CT images are
well-developed.
14.2.9 Toward Using Recurrent Neural Networks
for Predicting Influenza-Like Illness: Case Study
of COVID-19 in Morocco
Influenza does not just damage people’s health; it is also a major concern for
governments and health-care facilities. The most effective control technique for flu
outbreaks is early analysis, forecast, and reaction. Scientists working on artificial
intelligence (AI) are attempting to analyze epidemics and create supervised and
unsupervised models. In this work, they described the most commonly used machine
learning (ML) and deep learning (DL) models for analyzing time-series data to
better understand COVID-19’s behavior. Among many algorithms of ML, recurrent
neural network (RNN) was chosen for tracking this pandemic and forecasting its
future emergence. Since the first case of COVID-19 was reported in Morocco, the
total number of documented infectious cases has continued to rise, albeit the number
varies by area of the country. In addition, they offer an analysis and prediction model
for the influenza-like disease COVID-19 based on geographical distribution in this
study. The use of machine learning and deep learning models for epidemic prediction
and analysis was proposed in this study, with the LSTM model being used to fore-
cast COVID-19 pandemic expansion in Morocco. This prediction model can assist
in making critical decisions that will result in a quicker response and management
of the issue.
14.2.10 The Role of Chest Imaging in Patient Management
During the COVID-19 Pandemic: A Multinational
Consensus Statement from the Fleischner Society
The coronavirus disease 2019 (COVID-19) pandemic has arisen as an extraordi-
nary health-care catastrophe, with over 900,000 confirmed cases globally and almost
50,000 deaths in the first three months of 2020. COVID-19 has spread in a variety of
ways, resulting in sporadic transmission and a small number of hospitalized COVID-
19 patients in some areas, and community transmission in others, resulting in a large
number of severe cases. Critical resource restrictions in diagnostic tests, hospital
beds, ventilators, and health-care professionals who have been ill as a result of the
virus have interrupted and impaired health-care delivery in these locations, which has
been worsened by a lack of personal protective equipment. Although initial instances
resemble normal upper respiratory virus infections, as the disease progresses, respi-
ratory dysfunction becomes the leading cause of morbidity and death. Chest radiog-
raphy and computed tomography (CT) are important diagnostic and treatment tech-
niques for pulmonary illness. The findings were compiled into five primary and
three supplementary suggestions to assist medical professionals in the use of chest
radiography and CT in the treatment of COVID-19.
14.3 Proposed Methodology
14.3.1 Detecting COVID-19 Using CNN
14.3.1.1 Dataset Info
Dataset: COVID-19 Radiography Database
A team of researchers from Qatar University, Doha, Qatar, and the University
of Dhaka, Bangladesh along with their collaborators from Pakistan and Malaysia in
collaboration with medical doctors have created a database of chest X-ray images
for COVID-19 positive cases along with normal and viral pneumonia images.
There are a total of 21,165 samples, which are classified into four categories:
1. COVID-19
2. Lung opacity
3. Normal
4. Viral pneumonia
The photos are all in the Portable Network Graphics (PNG) file type and are
299 ×299 pixels in size. The database presently contains 3616 COVID-19 positive
cases, 10,192 normal, 6012 lung opacity (non-COVID lung infection), and 1345 viral
pneumonia pictures, according to the most recent update.
We will only train our model on two of these four classes, namely the “normal”
and “COVID” classes.
14.3.1.2 Training Phase
1. The image dataset is used as an input to CNN.
2. The data is either chest X-ray or CT scan of the chest.
3. Once the data is provided as input, the CNN starts to extract the features of the
data.
4. The image is passed through a set of convolution and max pooling layers to
reduce the features and focus on the point of infection.
Fig. 14.1 Pneumonia detection using convolutional neural network (CNN)
5. After the features are extracted, the image is flattened from 2D to 1D.
6. In this way, we first train the model.
7. This model is used for detection of pneumonia as well as COVID-19 (Fig. 14.1).
14.3.1.3 Testing Phase
1. The testing phase starts with passing the input image.
2. The image is passed through a set of convolution and max pooling layers to
reduce the features and focus on the point of infection.
3. After the features are extracted, the image is flattened from 2D to 1D.
4. Now the image is classified as positive if infection is present and negative if
it is not.
5. Various types of CNN can be used, such as AlexNet, ResNet, or region-based CNN.
6. We can also use the transfer learning approach, where we take an already
trained model.
14.3.1.4 Exploratory Data Analysis
In the next section, we present an exploratory data analysis that deals with the
dimensions of the images in the respective classes and the number of images in each
class. Here, we found a large imbalance in the data between classes, for which
we applied the concept of “focal loss”. After that, we preprocessed the images
and tried to examine a clearer image with the help of computer vision techniques.
The graph for each technique was plotted below the same image, and this step was
repeated for the COVID image. Then, we applied data augmentation and split the data
for training and testing. After this step, we built the model from scratch and applied
various concepts of deep learning such as pooling, dropouts, and activation functions,
the details of which are given in the model summary (model.summary()). We defined
a custom loss function and compiled the model. After that, we trained our model for
25 epochs, which gave an accuracy of 89.29%. This model was saved under the name
“my_model_1.h5”. We plotted a graph of training and validation with epoch and
accuracy as the grid values; the same step was executed to plot the training and
validation loss.
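The custom focal loss used to handle the class imbalance can be sketched as follows. Focal loss down-weights well-classified examples so training concentrates on hard ones; with `gamma = 0` it reduces to ordinary weighted cross-entropy. This NumPy version and its parameter values (`gamma = 2.0`, `alpha = 0.25`) are a common formulation assumed for illustration, not the exact function defined in the project notebook.

```python
import numpy as np


def focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss, averaged over samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    # p_t is the predicted probability assigned to the true class
    p_t = np.where(y_true == 1, y_prob, 1 - y_prob)
    a_t = np.where(y_true == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the loss for confident, correct predictions
    return float(np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t)))


y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.1, 0.6, 0.4])
loss_focal = focal_loss(y_true, y_prob)             # gamma = 2
loss_plain = focal_loss(y_true, y_prob, gamma=0.0)  # reduces to cross-entropy
```

In training, the same formula would be expressed with TensorFlow ops and passed to `model.compile(loss=...)`.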
Then, we trained the model using “InceptionResNetV2”. This model was trained
for 12 epochs, achieving an accuracy of 95.47%, and was saved under the name
“my_model_2.h5”. We plotted a graph to observe the accuracy and loss at different
points (Fig. 14.2).
After that, we fine-tuned the model using the concept of focal loss, and the same
step was executed for model 2 above. This time, the model ran for 20 epochs,
attaining an accuracy of 99.24% (Fig. 14.3).
Fig. 14.2 InceptionResNetV2 accuracy and loss of model
Fig. 14.3 Accuracy and loss of model after focal loss
14.3.2 Detecting COVID-19 Using Random Forest
14.3.2.1 Dataset Information for ML: Covid_Symptoms Checker
This data helps identify whether a person has coronavirus disease based on some
pre-defined standard symptoms. These symptoms are based on guidelines given by the
World Health Organization (WHO, who.int) and the Ministry of Health and Family
Welfare, India.
The dataset contains seven major variables that have an impact on whether someone
has coronavirus disease: country, age, symptoms, any other experienced symptoms,
severity, and contact. With all these categorical variables, a combination for each
label in each variable is generated; in total, 316,800 combinations are created.
14.3.2.2 Implementation Using Random Forest
1. The text dataset is used as an input to the random forest.
2. The data is either a report containing a sequence of text or labeled CSV files.
3. We initially set a threshold for each feature value, such as oxygen level, SATs,
and BP.
4. Once the data is provided as input, the random forest starts to extract the features.
5. After the features are extracted, they are compared with the threshold values to
find whether the case is positive or negative.
This approach is a form of supervised learning: we have the labeled data as well as
the outcome. We perform random forest to extract the features, compare them with
the threshold values, and classify the case as positive or negative. Then, we compare
the result with the known output to check whether we identify cases properly.
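The steps above can be sketched with scikit-learn. The symptom features, the labeling rule, and all data below are invented placeholders standing in for the Kaggle symptom-checker dataset; only the overall fit/score pattern reflects the described pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Binary symptom indicators (e.g., fever, dry cough, breathing difficulty,
# contact with a patient) -- hypothetical features, not the real columns
X = rng.integers(0, 2, size=(n, 4))
# Toy labeling rule: positive when at least two symptoms are present
y = (X.sum(axis=1) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)            # learn from the labeled combinations
accuracy = clf.score(X_test, y_test) # compare predictions with known outputs
```

With the real 316,800-combination dataset, the categorical variables would first be encoded (e.g., one-hot) before fitting the forest.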
14.3.2.3 Exploratory Data Analysis of COVID-19 Symptoms Data
See Figs. 14.4, 14.5, 14.6 and 14.7.
14.4 Experimental Results
Using CNN, we created two models and tested them with different loss functions.
Initially, a model was developed from scratch, and while testing it we obtained an
accuracy of 89.29%. Then, we trained the model using InceptionResNetV2 and
achieved an accuracy of 95.47%. Later, the same model was trained with the focal
loss function, which gave a better accuracy of 99.24%. When we used the random
forest model, a lower accuracy of 75.09% was achieved. From this, we can infer
that convolutional neural networks perform better compared to machine learning
algorithms.
Fig. 14.4 Symptoms counts
Fig. 14.5 Percentage of symptoms heavily occurred
Fig. 14.6 Type of symptoms checker
Fig. 14.7 Random forest results
14.5 Conclusion
Two models were created to detect COVID symptoms using X-ray images. The
performance was evaluated in terms of classification accuracy, and it was observed
that the CNN model using InceptionResNetV2 provided a greater accuracy. Future
work will be toward using this model to enhance the accuracy for other datasets as
well.
References
1. Militante, S. V., Dionisio, N. V., & Sibbaluca, B. G. (2020). Pneumonia and COVID-19 detec-
tion using convolutional neural networks. In International Conference on Vocational Education
and Electrical Engineering (ICVEE).
2. Hasan, A., Shahin, I., & Alzabek, M. B. (2020). COVID-19 detection using recurrent
neural networks. IEEE.
3. Taresh, M. M., Zhu, N., Ali, T. A., Hameed, A. S., & Mutar, M. L. (2021). Transfer learning
to detect COVID-19 automatically from X-ray images using convolutional neural networks.
International Journal of Biomedical Imaging, Article ID 8828404.
4. Narin, A., Kaya, C., & Pamuk, Z. (2021). Automatic detection of coronavirus disease
(COVID-19) using X-ray images and deep convolutional neural networks. Pattern Analysis
and Application.
5. Awasthi, A., & Srivastava, S. K. (2021). A fuzzy soft set theoretic approach in decision making
of covid-19 risk in different regions. Communications in Mathematics and Applications, 12(2),
285–294. ISSN: 0975-8607.
6. Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for
the detection of novel coronavirus (COVID-19) using X-ray images. Informatics in
Medicine Unlocked.
7. Statistical techniques for combating COVID-19. (2020). IEEE Computational
Intelligence Magazine. https://doi.org/10.1109/MCI.2020.3019873
8. Tseng, V. S., Ching Ying, J. J., Wong, S. T. C., Cook, D. J., & Lui, J. (2020). Computational
intelligence techniques for combating COVID-19: A survey. IEEE Computational
Intelligence Magazine.
9. Taj, R. M., El Mouden, Z. A., Jakimi, A., & Hajar, M. (2020). Towards using recurrent neural
networks for predicting influenza-like illness: Case study of covid-19 in Morocco. International
Journal of Advanced Trends in Computer Science and Engineering, 9(5).
10. Rubin G. D., et al. (2020). The role of chest imaging in patient management during the COVID-
19 pandemic: A multinational consensus statement from the Fleischner society. Radiology,
296(1), 172–180.
Chapter 15
Prioritization of Watersheds Using GIS
and Fuzzy Analytical Hierarchy (FAHP)
Method
K. Anil, S. Sivaprakasam, and P. Sridhar
Abstract Evaluating watershed characteristics with GIS tools and the FAHP method
to prioritize them based on erodability is crucial for researchers. In the present study,
the Kaddam watershed, part of the Godavari basin, has been selected for watershed
prioritization. The study area has a watershed divide with watershed codes 4E3C4
(Sikkumanu River), classified into eleven watersheds (4E3C4a to 4E3C4k), and 4E3C5
(Kaddam River), classified into seven watersheds (4E3C5a to 4E3C5g). The geospatial
thematic layer of the drainage network was extracted from Survey of India topographical
maps, and the watershed boundaries were delineated from the Watershed Atlas of India
(WAI) using ArcGIS software. The GIS software provides users a database of the
drainage network’s physical shape and length and the watershed areas. This data is
helpful for analyzing watershed characteristics computed through morphometric
parameter equations. The results showed seven watersheds as very severe, five as
severe, and six as moderate; these were checked and compared with the soil map
properties of the study area. FAHP simulations matched the erodability of the soil
map with 85–87% accuracy. Hence, the study showed that FAHP simulation is beneficial
for prioritizing watersheds with erodability as the deciding factor.
15.1 Introduction
Prioritization of watersheds based on hydrological parameters such as runoff,
sediment yield, or erodability has gained importance in watershed management
practices [1]. A watershed is defined as a drained area or natural hydrological boundary
that constitutes a closed polygon, with an exit point at the lower elevation that collects
K. Anil (B)·S. Sivaprakasam
Department of Civil Engineering, Annamalai University, Chidambaram, India
e-mail: anilkodimela@gmail.com
K. Anil
Department of Civil Engineering, Bapatla Engineering College, Bapatla, India
P. Sridhar
Shri Vishnu Engineering College for Women, Bhimavaram, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_15
the rainwater, melting snow, or ice cover, which meets another water body [2].
Delineation of the watershed boundary depends on the size of the streams and how
they join the mainstream. The lengths of the streams and the areas of the watersheds,
extracted from topographical maps, satellite images, or aerial photographs with the
help of Geographical Information System (GIS) software, are treated as morphometric
parameters. The several morphometric parameters, expressed as mathematical
equations, measure the geometrical shape of these earth features and describe the
characteristics of the watershed. Assessing watershed characteristics by employing
advanced technologies, such as medium- to high-resolution remote sensing satellite
images, digital elevation models (DEM), and GIS software, has proved significant in
the last few decades [3]. GIS tools help figure out the nature of the topography, the
hydrological behavior, and the causes of erodability, which are helpful for prioritizing
the watershed [4]. The use of erodability as the decision-making parameter for
prioritizing watersheds, in order to identify soil and water conservation measures,
has increased in recent times with morphometric and fuzzy analytical hierarchy
process (FAHP) methods [3, 5]. Apart from this, several researchers have earlier
investigated watershed prioritization on different terrains by applying morphometry
with the Sediment Yield Index (SYI), LULC combinations, or the compound parameter
method [7].
Nevertheless, many studies have recently focused on the FAHP method to prioritize
watersheds based on erodability values, which are FAHP outcomes. The FAHP
method evaluates the fuzzy behavioral process and identifies, with better accuracy,
which factors show higher or lower influence on a given problem [8]. Therefore,
the present research attempts to prioritize watersheds using a unique approach based
on an examination of the natural drainage system, applying the FAHP method to
avoid the complex information that comes with it and improving recognition and
prioritization accuracy with varied morphological parameters [9].
15.2 Relevant Study
The Kaddam watershed is a part of the Godavari river basin and is located on the
northern side of Telangana state, in the present districts of Adilabad and Nirmal. The
coordinates of the study area are 19°5′ to 19°35′ N latitude and 78°10′ to 78°55′ E
longitude. The study area covers 64% clayey soils, 4.6% cracking clay soils, 11.47%
gravelly clay soils, 1.6% gravelly loam soils, and 18.58% loamy soils. The mean
annual rainfall of the study area for the period 1996–2020 was 1031.75 mm, with a
minimum temperature of 9 °C in January and 18 to 32 °C in May, according to the
Bureau of Economics and Statistics department, Hyderabad. The study area map is
represented in Fig. 15.1. The datasets were collected from Survey of India
topographical maps (toposheet numbers 56 I/3, I/6, I/7, I/8, I/10, I/11, I/12, I/14,
I/15, and 56 I/16) at a scale of 1:50,000. The software utilized here is ArcGIS 10.3
and Microsoft Excel.
15 Prioritization of Watersheds Using GIS and FAHP... 151
Fig. 15.1 Study Area
15.3 Methodology
15.3.1 Topographical Maps
Survey of India (SOI) topographical maps of the study area have been collected,
scanned, and added into the ArcMap environment. All the maps were geo-referenced,
rectified, and projected from a three-dimensional spherical coordinate system to a
two-dimensional system to get the real-world distances and areas of the physical
objects in the ArcMap environment [12]. A geospatial database was created, and the
linear features of the drainage or stream network in the study area were extracted
from the topographical maps. Strahler stream ordering was attributed to the drainage
network as follows: first order is attributed to streams that do not have any connected
streams; when two first-order streams meet or join at a point, the stream is second
order from that intersection onwards; likewise, when two second-order streams join,
the stream ahead of the intersection point is third order, and so on.
15.3.2 Watershed Atlas of India (WAI)
The watershed divide, delineation, and attribution of mini watersheds of the study
area are referred from sheet no. 4 of the WAI [12].
The morphometry equations were used to compute the characteristics of the
watershed listed in Table 15.1.
Fig. 15.2 Proposed flow chart of methodology
Table 15.1 Morphometric parameters

S.No.  Basic criterion           Technique                      Mentioned by
1      Basin length              Lb = 1.312 × A^0.568           Nookaratnam (11)
2      Compactness coefficient   Cc = 0.2821 × P / A^0.5        Horton (6)
3      Bifurcation ratio         Rb = Nu / Nu+1                 Schumm (17)
4      Elongation ratio          Re = (2 / Lb) × (A / π)^0.5    Schumm (17)
5      Drainage density          Dd = Lu / A                    Horton (6)
6      Drainage texture          T = Nu / P                     Horton (6)
7      Form factor               Rf = A / Lb^2                  Horton (6)
8      Circularity ratio         Rc = 4 × π × A / P^2           Miller (10)
9      Stream frequency          Fs = N / A                     Horton (6)

Nu total number of streams of all orders, Lb basin length (km), Lu total stream length
of all orders, N total number of streams, Nu+1 number of stream segments of the next
higher order, A area of the basin (km^2), and P perimeter of the basin (km)
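The formulas in Table 15.1 can be applied directly; the sketch below computes a few of them for one example watershed. The area, perimeter, and stream-length values are illustrative placeholders, not measurements from the study area.

```python
import math


def basin_length(A):
    """Lb = 1.312 * A^0.568 (Nookaratnam)."""
    return 1.312 * A ** 0.568


def compactness_coefficient(P, A):
    """Cc = 0.2821 * P / A^0.5 (Horton)."""
    return 0.2821 * P / A ** 0.5


def elongation_ratio(Lb, A):
    """Re = (2 / Lb) * (A / pi)^0.5 (Schumm)."""
    return (2.0 / Lb) * math.sqrt(A / math.pi)


def drainage_density(Lu, A):
    """Dd = Lu / A (Horton), in km/km^2."""
    return Lu / A


def circularity_ratio(A, P):
    """Rc = 4 * pi * A / P^2 (Miller)."""
    return 4.0 * math.pi * A / P ** 2


# Example watershed measurements (km^2, km, km) -- assumed values
A, P, Lu = 120.0, 55.0, 310.0
Lb = basin_length(A)
Dd = drainage_density(Lu, A)
Rc = circularity_ratio(A, P)
```

In the study, the inputs A, P, Lu, and the stream counts would come from the GIS database extracted in Sect. 15.3.1.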
15.3.3 Fuzzy Analytical Hierarchy Process (FAHP)
Several researchers have utilized statistical models such as AHP and MCDM
techniques to prioritize watersheds based on morphometric parameters, with outputs
such as erosion driving the decision-making process, as found in recent and older
literature. To assess and prioritize the watersheds of the study area, the morphometric
outputs are framed in a matrix with nine rows and columns [13]. The nine parameters
of the matrix are basin length (Lb), circularity ratio (Rc), stream frequency (Fs),
drainage density (Dd), drainage texture (T), elongation ratio (Re), bifurcation ratio
(Rb), form factor (Rf), and compactness coefficient (Cc) [14].
The matrix parameters are fixed with numeric values, and their reciprocal values are
derived from linguistic terms as follows: very severe for a value of nine, severe for
eight, and moderate for seven in the FAHP analysis. These values then provide the
weightage of each morphometric parameter. The weightage value of each parameter
is multiplied by the corresponding morphometric parameter value, and the summation
of these outcomes provides the normalized weightage value of each watershed, from
which the prioritization ranks of the watersheds are derived [15]. The FAHP analysis
requires ensuring consistency in the matrix simulations using Saaty’s proposed
consistency index (CI) and consistency ratio (CR). A consistency ratio (CR) of less
than 10% indicates that the decision is consistent [16]. The consistency ratio and
index formulas are listed below.
CR = (CI / RI) × 100

Consistency Index (CI) = (λmax − n) / (n − 1)
where n is the number of factors used in the matrix; for the present study area, a
9 × 9 matrix was adopted for the watershed prioritization.
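Saaty's consistency check can be computed as follows: λmax is the principal eigenvalue of the pairwise comparison matrix, CI = (λmax − n)/(n − 1), and CR = (CI/RI) × 100 as in the text. The 3 × 3 example matrix is illustrative; the study uses a 9 × 9 matrix. The RI values are Saaty's standard random-index table.

```python
import numpy as np

# Saaty's random index (RI) for matrix sizes 1..9
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency_ratio(M):
    """Return (CI, CR%) for a pairwise comparison matrix M (n >= 3)."""
    n = M.shape[0]
    # lambda_max: the principal (largest real) eigenvalue of M
    lam_max = max(np.linalg.eigvals(M).real)
    ci = (lam_max - n) / (n - 1)
    return ci, (ci / RI[n]) * 100.0


# A perfectly consistent pairwise matrix (every a_ik == a_ij * a_jk),
# built from weights (4, 2, 1); its CI and CR should be ~0
M = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
ci, cr = consistency_ratio(M)
```

A CR below 10% would confirm that the judgments encoded in the matrix are acceptably consistent.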
15.4 Results and Discussion
15.4.1 Morphometric Analysis
Each watershed characteristic is calculated using Table 15.1 formulas, and the results
are represented in Table 15.2.
The minimum and maximum bifurcation ratios are found in the 4E3C5e and 4E3C5a watersheds, respectively. The bifurcation ratio measures flooding proneness in the watershed: the higher the bifurcation ratio, the greater the flooding probability. Bifurcation ratios typically range between 2 and 5, but the present study area has values greater than five (Table 15.2), which means severe floods have occurred in the respective watersheds. The drainage density expresses the potential of a hydrological parameter such as runoff in the watershed or basin. Presently, the highest drainage density is found in 4E3C4d at 3.51 km/km2 and the lowest in 4E3C4i at 2.1 km/km2 (Table 15.2). The elongation ratio lies between 0 and 1; a value near 1 indicates that the basin has less control from geomorphological parameters. The relief, infiltration capacity, and permeability of the watershed or basin depend upon the stream frequency. The highest stream frequency, 7.2, was found in the 4E3C4c watershed and the lowest, 3.6, in 4E3C4i (Table 15.2). The runoff degree of
dissection is directly related to stream frequency and in direct proportion to the mean annual rainfall of the study area [18].
154 K. Anil et al.
Table 15.2 Characteristics of morphometric parameters

Code     Lb     Cc    Dd    Fs     Rc    Rf    Re    Rb    T
4E3C4a   41.65  2.2   2.82  4.6    0.2   0.25  0.56  5.35  12.4
4E3C4b   24     1.4   2.8   5.1    0.4   0.29  0.6   6.4   13.1
4E3C4c   25.11  1.7   3.4   7.2    0.33  0.28  0.6   4.73  15.8
4E3C4d   12.88  1.2   3.51  6.036  0.6   0.33  0.65  4.47  10.2
4E3C4e   16.48  2     2.9   4.215  0.2   0.31  0.63  5.99  5.39
4E3C4f   10.64  1.5   3.06  5.593  0.41  0.35  0.67  10.7  6.43
4E3C4g   8.54   1.5   3.07  5.468  0.4   0.37  0.68  6.5   5.09
4E3C4h   10.8   1.3   3.4   6.086  0.5   0.35  0.66  6.55  8.33
4E3C4i   17.98  1.8   2.1   3.627  0.28  0.31  0.62  6.04  5.43
4E3C4j   9.6    1.2   2.43  3.693  0.6   0.36  0.67  6.66  4.7
4E3C4k   25.96  1.3   3.14  5.098  0.57  0.28  0.6   8     15.1
4E3C5a   44.46  2.4   2.82  5.251  0.16  0.25  0.56  11.9  13.4
4E3C5b   19.24  1.5   2.4   3.7    0.43  0.3   0.62  4.38  7.3
4E3C5c   22.43  1.8   2.9   4.6    0.3   0.29  0.61  4.8   8.7
4E3C5d   9.78   1.7   3.25  5      0.31  0.35  0.67  5.36  4.6
4E3C5e   26.24  1.2   3.04  5      0.6   0.2   0.6   2     15.5
4E3C5f   15.81  1.4   2.94  4.7    0.46  0.32  0.63  4.2   8.1
4E3C5g   28.65  1.4   2.6   4.9    0.49  0.2   0.59  4.19  14.7

Very high runoff in the watersheds leads to
severe erosion problems, which can hamper reservoir deposition. The texture ratio is related to the infiltration rate: a lower texture value means high infiltration and less susceptibility to erosion, whereas a higher texture value indicates poorer infiltration and increased exposure to erosion. In the study area, the lowest texture value, 4.6, is found in the 4E3C5d watershed and the highest, 15.8, in the 4E3C4c watershed (Table 15.2) [19].
15.4.2 Watershed Prioritization Utilizing the Fuzzy
Analytical Hierarchy Process (FAHP) Technique
The pairwise comparison matrix and the weightage values computed for the morphometric parameters basin length (Lb), circularity ratio (Rc), drainage density (Dd), stream frequency (Fs), elongation ratio (Re), form factor (Rf), drainage texture (T), bifurcation ratio (Rb), and compactness constant (Cc) are represented in Table 15.3. Field experience and experts' suggestions help to fill the matrix between two pairs, which determines the degree of importance in
the FAHP method [20]. Initially, the method needs to set the linguistic terms with values such as very severe (9), severe (8), and moderate (7), and the reciprocals of the same importance, to calculate each morphometric parameter and procure the relative weights shown in Table 15.3.
15 Prioritization of Watersheds Using GIS and FAHP... 155
Table 15.3 Pairwise comparison matrix values and weightage

Parameters  Lb      Cc      Dd      Fs      Rc      Rf      Re      Rb     T
Weights     0.1012  0.1157  0.0613  0.1765  0.1269  0.1561  0.0393  0.142  0.0804
The relative weights are multiplied with the morphometric parameters to provide the normalized values. The normalized values of each parameter are summed to obtain the final value that decides the watershed's priority. The priority values are not unique for any study area; different researchers or users may obtain other values for priority ranking. However, for the final normalized values obtained in the study area, values between 0 and 4 were taken as moderate priority, values between 4 and 5 as severe, and values above 5 as very severe erodability chances in the watersheds.
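The weighting-and-summation step can be illustrated as follows. The weights are those of Table 15.3, while the single watershed's normalized parameter values are invented for illustration, not taken from the study's data.

```python
# Sketch of the FAHP prioritization step: each watershed's (normalized)
# morphometric values are multiplied by the FAHP weights (Table 15.3)
# and summed; the total decides the erodability class. The watershed
# values below are hypothetical.

WEIGHTS = {"Lb": 0.1012, "Cc": 0.1157, "Dd": 0.0613, "Fs": 0.1765,
           "Rc": 0.1269, "Rf": 0.1561, "Re": 0.0393, "Rb": 0.1420,
           "T": 0.0804}

def priority_class(values):
    score = sum(WEIGHTS[p] * values[p] for p in WEIGHTS)
    if score > 5:
        return score, "very severe"
    if score > 4:
        return score, "severe"
    return score, "moderate"

# Hypothetical normalized parameter values for one watershed
watershed = {"Lb": 6.0, "Cc": 4.5, "Dd": 5.0, "Fs": 7.0,
             "Rc": 3.0, "Rf": 4.0, "Re": 2.5, "Rb": 6.5, "T": 5.5}
score, label = priority_class(watershed)
print(round(score, 4), label)
```

The thresholds (0 to 4 moderate, 4 to 5 severe, above 5 very severe) follow the classification stated above.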
The watersheds with very severe erodability chances are 4E3C4a, 4E3C4b, 4E3C4c, 4E3C4k, 4E3C5a, 4E3C5c, 4E3C5e, and 4E3C5g; the severe watersheds are 4E3C4d, 4E3C4e, 4E3C4f, 4E3C4h, 4E3C4i, 4E3C5b, and 4E3C5f; and the moderate watersheds are 4E3C4g, 4E3C4j, and 4E3C5d. The FAHP simulations were validated with the soil map and taxonomy; from the results, 85–87% of the erodability nature of the watersheds matched accurately.
The various maps of the study area are represented in Fig. 15.3.
Fig. 15.3 Description of various study area maps
15.5 Conclusion
GIS tools are very effective in extracting information about physical objects at exact geospatial locations from various sources, which provides valuable input to the morphometric equations. The morphometric outcomes make the hydrological behavior of the watersheds over time easy to understand spatially for experts, analysts, or working professionals in the field. A statistical model like FAHP requires expertise and an understanding of each parameter to frame the comparison matrix, which is its only disadvantage; the remaining computations in the model can be carried out easily by any person. However, integrating the simulations with the FAHP method to resolve fuzziness or vagueness when deciding on a particular theme of erodability and watershed prioritization is not an easy task for planners. The present study showed that the morphometric and FAHP methods are prominent tools to prioritize watersheds when erodability is the decision-making parameter. Finally, the spatial distribution map of prioritization, which is the outcome of FAHP, is helpful information for watershed planners, enabling them to work on a priority basis across very severe, severe, and moderate erodability watersheds. These maps reduce valuable time and cost for the planners. Hence, the methods are recommended and encouraged for researchers and planners in watershed management planning activities to achieve better results effectively.
References
1. Ahmed, R., Sajjad, H., & Husain, I. (2018). Morphometric parameters-based prioritization
of sub-watersheds using fuzzy analytical hierarchy process: A case study of lower Barpani
watershed, India. Natural Resources Research, 27(1), 67–75. https://doi.org/10.1007/s11053-
017-9337-4
2. Bali, R., Agarwal, K. K., Nawaz Ali, S., Rastogi, S. K., & Krishna, K. (2012). Drainage
morphometry of Himalayan Glacio-fluvial basin, India: Hydrologic and neotectonic implica-
tions. Environmental Earth Sciences, 66(4), 1163–1174. https://doi.org/10.1007/s12665-011-
1324-1
3. Banerjee, A., Singh, P., & Pratap, K. (2017). Morphometric evaluation of Swarnrekha water-
shed, Madhya Pradesh, India: An integrated GIS-based approach. Applied Water Science, 7 (4),
1807–1815. https://doi.org/10.1007/s13201-015-0354-3
4. Bhattacharya, R. K., Chatterjee, N. D., & Das, K. (2020). Sub-basin prioritization for assess-
ment of soil erosion susceptibility in Kangsabati, a plateau basin: A comparison between
MCDM and SWAT models. Science of the Total Environment, 734, 139474. https://doi.org/10.
1016/j.scitotenv.2020.139474
5. Hembram, T. K., & Saha, S. (2020). Prioritization of sub-watersheds for soil erosion based on
morphometric attributes using fuzzy AHP and compound factor in Jainti Riverbasin, Jharkhand,
Eastern India. Environment, Development, and Sustainability, 22(2), 1241–1268. https://doi.
org/10.1007/s10668-018-0247-3
6. Horton, R.E. (1945) Erosional development of streams and their drainage basins: Hydro-
physical approach to quantitative morphology. GSA Bulletin, 56:275–370. https://doi.org/10.
1130/0016-7606(1945)56[275:EDOSAT]2.0.CO;2
7. Jhariya, D. C., Kumar, T., & Pandey, H. K. (2020). Watershed prioritization based on soil and
water hazard model using remote sensing, geographical information system, and multi-criteria
decision analysis approach. Geocarto International, 35(2), 188–208. https://doi.org/10.1080/
10106049.2018.1510039
8. Magesh, N. S., Jitheshlal, K. V., Chandrasekar, N., & Jini, K. V. (2013). Geographical informa-
tion system-based morphometric analysis of Bharathapuzha river basin, Kerala, India. Applied
Water Science, 3(2), 467–477. https://doi.org/10.1007/s13201-013-0095-0
9. Meshram, S. G., Alvandi, E., Singh, V. P., & Meshram, C. (2019). Comparison of AHP and
fuzzy AHP models for prioritization of watersheds. Soft Computing, 23(24), 13615–13625.
https://doi.org/10.1007/s00500-019-03900-z
10. Miller, V.C. (1953) A quantitative geomorphologic study of drainage basin characteristics
in the clinch mountain area, Virginia and Tennessee. Columbia University, Department of
Geology, Technical Report, No. 3, Contract N6 ONR. 271–300
11. Nooka Ratnam, N., Srivastava, Y. K., Rao, V. V., Amminedu, E., & Murthy, K. S. R. (2005). Check dam positioning by prioritization of micro-watersheds using SYI model and morphometric analysis—remote sensing and GIS perspective. Journal of the Indian Society of Remote Sensing, 33(1), 25–38. https://doi.org/10.1007/BF02989988
12. Parupalli, S., Padma Kumari, K., & Ganapuram, S. (2019). Assessment and planning for inte-
grated river basin management using remote sensing, SWAT model, and morphometric analysis
(case study: Kaddam river basin, India). Geocarto International, 34(12), 1332–1362. https://
doi.org/10.1080/10106049.2018.1489420
13. Rahaman, S. A., Ajeez, S. A., Aruchamy, S., & Jegankumar, R. (2015). Prioritization of sub
watershed based on morphometric characteristics using fuzzy analytical hierarchy process
and geographical information system—A study of Kallar Watershed, Tamil Nadu. Aquatic
Procedia, 4(Icwrcoe), 1322–1330. https://doi.org/10.1016/j.aqpro.2015.02.172
14. Rai, P. K., Mohan, K., Mishra, S., Ahmad, A., & Mishra, V. N. (2017). A GIS-based approach
in drainage morphometric analysis of Kanhar River Basin, India. Applied Water Science, 7 (1),
217–232. https://doi.org/10.1007/s13201-014-0238-y
15. Sangma, F., & Guru, B. (2020). Watersheds characteristics and prioritization using morpho-
metric parameters and fuzzy analytical hierarchal process (FAHP): A part of Lower Subansiri
Sub-Basin. Journal of the Indian Society of Remote Sensing, 48(3), 473–496. https://doi.org/
10.1007/s12524-019-01091-6
16. Sarma, S., & Saikia, T. (2012). Prioritization of Sub-watersheds in Khanapara-Bornihat Area
of Assam-Meghalaya (India) based on land use and slope analysis using remote sensing and
GIS. Journal of the Indian Society of Remote Sensing, 40(3), 435–446. https://doi.org/10.1007/
s12524-011-0163-6
17. Schumm, S.A. (1956) Evolution of drainage systems and slopes in badlands at perth amboy,
New Jersey. GSA Bulletin 67(5):597–646. https://doi.org/10.1130/0016-7606(1956)67[597:
EODSAS]2.0.CO;2
18. Singh, P., Thakur, J. K., & Singh, U. C. (2013). Morphometric analysis of Morar River
Basin, Madhya Pradesh, India, using remote sensing and GIS techniques. Environmental Earth
Sciences, 68(7), 1967–1977. https://doi.org/10.1007/s12665-012-1884-8
19. Sridhar, P., & Ganapuram, S. (2021). Morphometric analysis using fuzzy analytical hierarchy
process (FAHP) and geographic information systems (GIS) for the prioritization of watersheds.
Arabian Journal of Geosciences, 14(4), 236. https://doi.org/10.1007/s12517-021-06539-z
20. Thomas, J., Joseph, S., Thrivikramji, K. P., Abe, G., & Kannan, N. (2012). Morphometrical anal-
ysis of two tropical mountain river basins of contrasting environmental settings, the southern
Western Ghats, India. Environmental Earth Sciences, 66(8), 2353–2366. https://doi.org/10.
1007/s12665-011-1457-2
Chapter 16
A Narrative Framework with Ensemble
Learning for Face Emotion Recognition
S. Naveen Kumar Polisetty, T. Sivaprakasam, and S. Indraneel
Abstract Face recognition plays an essential role in many different sectors. For the past four decades, face recognition (FR) features have been extracted for security, entertainment, employment, and so on. The most auspicious part of research across the world is its focus on living beings and nature. Our ancient myths state that the mind is the origin of all thoughts and that it can be expressed through the eyes. Artificial intelligence plays a vital role in a living being's life, for instance in education, research, medicine, and the forecasting of many kinds of natural disasters. In this paper, we focus on the deep relation between mind and matter and on how the mind is emotionally connected with the eyes. We extract the features of the mind and its correlated activities, because the mind acts as the creator of any event and the face is the output of the mind. Through these features, we plan to support security and employment areas, such as the candidate hiring process in various industry sectors. We propose ensemble learning techniques to model mind functions and how they are imposed on eye vision.
16.1 Introduction
The mind is directly proportional to the body, which is its outward manifestation, so the body simply acts as an actuator of the mind. In particular, the face is the ultimate platform of the mind and states its different actions. The body always follows the mind: if the mind thinks of moving upward, the physical body automatically prepares itself and reflects the external signals as a sign.
S. Naveen Kumar Polisetty (B)
Research Scholar, Department of Computer Science and Engineering, Annamalai University,
Chidambaram, Tamil Nadu, India
e-mail: naveenmtech28@gmail.com
T. Sivaprakasam
Department of Computer Science and Engineering, Annamalai University, Chidambaram, Tamil
Nadu, India
S. Indraneel
St. Anns College of Engineering and Technology, Chirala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_16
160 S. Naveen Kumar Polisetty et al.
Facial Recognition System
A facial recognition system is one of the vital techniques used to check the identity of a subject from input sources such as video or image frames. A biomedical-enabled artificial intelligence system incorporates multiple image processing techniques with optimum result criteria.
Emotional Intelligence
Emotion is a distinguishing characteristic of human beings and exposes their inner thoughts. It may be discussed in terms of four different parameters: self-management, awareness, social impacts, and relativities. This is one of the arts of study that elucidates managing skills against relativities and other long-lasting objects. In fact, emotion recognition plays a pivotal role in the experience of empathy [1].
Cat and Rat Theory
Considering the figures below, Fig. 16.1 shows how the rat is affected by the cat: the rat feels that it is not in a safe zone. Figure 16.2 shows a kitten in the cat's mouth, yet the kitten feels that it is in a comfortable place. Even though the two animals are in the same place, their feelings are different. Here, mind analysis is calculated between the animals. Our myths speak about the mind in different aspects, and the mind is the origin of all kinds of matter like God, planets, evilness and so on. Of our five major organs, the eye, nose, ear, mouth, and body, the eye is the primary agent of the mind.
Based on this analogy, we deeply analyzed the mind and eye relativity concerns. In the past three decades, artificial intelligence methodologies have had a very big impact and rule the world with different machine learning techniques.
Fig. 16.1 Cat catch rat
16 A Narrative Framework with Ensemble Learning for Face Emotion 161
Fig. 16.2 Cat catch kitten
16.2 Related Work
Nurulhuda Ismail et al. [1] discussed a few different face recognition algorithms and reviewed their results. Principal component analysis (PCA) is a method for simplifying the problem by reducing the dimension of the representation space while retaining the original information of the data. Linear discriminant analysis (LDA) is a useful algorithm for feature extraction in face images; this approach requires several training images for each face. The skin color-based algorithm is used for extracting features of human faces: skin pixels are the parameter for distinguishing whether a pixel is skin-colored or not, and the model is constructed with a Gaussian probability density. In the wavelet-based algorithm, each face image is described by a subset of band-filtered images containing wavelet coefficients. The artificial neural network-based algorithm is commonly used at all industry levels and in commercial applications. It is applied once a face has been detected, to identify and recognize who the person is by calculating the weights of the facial information. The ANN is considered the foundation of AI and refers to the pieces of the processing system that computers use to perform human-like, intelligent problem solving.
Alvappillai and Barrina [2] implemented a facial recognition system using a global approach to feature extraction based on the histogram of oriented gradients. They extracted feature vectors from different facial images in the AT&T and Yale data repositories. They used a support vector machine (SVM) for learning the image set. The SVM supports regression and classification and comes from the supervised learning sector. An SVM is mostly used for finding a hyperplane in the feature space of the data points. Hyperplanes are decision boundaries that aid in data classification: different classes can be assigned to data points on either side of the hyperplane. The hyperplane's dimension is determined by the number of features. Support vectors are the data points that are closest to the hyperplane and influence the hyperplane's position and orientation. The classifier's margin is increased by using these support vectors; the hyperplane's position will be altered if the support vectors are deleted.
Bah and Ming [3] demonstrated an attendance monitoring system for a real-life scenario by implementing an algorithm with local binary pattern (LBP) inputs, incorporating contrast adjustment and bilateral filtering into the face recognition system. Their experiment consists of two interconnected parts. First, an LBP cascade classifier was implemented, with images captured by digital sources such as a camera or JPEG pictures; a Haar cascade classifier algorithm was implemented for face detection accuracy and for minimizing the number of false positives and false negatives. In the second phase, they applied an image unification technique on their trained image datasets (Fig. 16.3).
They concluded that using LBP with contrast adjustment, bilateral filtering, and histogram equalization is evolving into a sophisticated imaging technique and is functional for the training image processing technique. The K2 sectors have two subsets; the LBP code is calculated for every pixel in a region of the input face image by comparing the center with the surrounding pixels.
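The LBP computation described here can be sketched for a single 3 × 3 patch; the pixel values and the clockwise neighbor ordering below are illustrative choices.

```python
# Sketch of the LBP step: each of the 8 neighbors is compared with the
# center pixel, and the resulting bits form one 8-bit code per pixel.
def lbp_code(patch):
    """patch: 3x3 list of gray values; returns the 8-bit LBP code."""
    center = patch[1][1]
    # clockwise neighbor order starting at the top-left corner
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:      # neighbor at least as bright: bit = 1
            code |= 1 << bit
    return code

patch = [[90, 120, 60],
         [200, 100, 40],
         [150, 110, 130]]
print(lbp_code(patch))                 # 242: bits 1, 4, 5, 6, 7 are set
```

A histogram of these codes over a face region is what the LBP cascade classifier actually consumes.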
Narmatha C. and Manimegalai P. implemented modified steganography for images (MSI) with encoding and decoding techniques for image protection during transmission. The MSI technique emphasizes stegano-image creation using an encode process and reconstruction of the secret data in a decode process.
The 'stegano' is the cover image, and the hidden content cannot be identified from the 'stegano' image Σ_{i,j=0}^{N,n} ST(i, j), as shown in Fig. 16.4. A major feature of the research is the shuffling of the cover image. They tested with 1000 gray-scale images with cover and secret parameters: 512 × 512 pixels were fixed for cover images and 256 × 256 pixels for secret images, which were taken as input and applied to MSI processing. Processing all the data takes only 1–1.865 s, and they improved complexity strength with testing software [4].
Fig. 16.4 Cover and encoded images
16.3 Proposed Work
This architecture depicts the framework of face recognition: we detect the face with different trained parameters and analyze the eye impression as input to the machine. We already have the mind parameters of the particular person and synchronize them with ensemble learning. The human brain has 10 billion neurons and can be incorporated with the different duties of a human. Based on the eye impression, the decision support or expert system produces the output as the expected hypothesis. If the hypotheses do not nearly meet the requirement, we apply the booster function in the ensemble learning part and then train the network for a stipulated time.
Ensemble Learning
Ensemble learning is one of the supervised learning methods: instead of learning from a single hypothesis, it learns multiple hypotheses and combines their predictions. When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced. Regarding the hypothesis space, it is important to note that the possibility or impossibility of finding a simple, consistent hypothesis depends strongly on the hypothesis space chosen. It is always good to prefer the simplest hypothesis which is consistent with the data, but consistency is not always achievable.
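A minimal sketch of this idea combines the predictions of several hypotheses by majority vote; the three rule-based "classifiers" and the toy eye-feature thresholds are invented assumptions, not the chapter's trained models.

```python
# Minimal ensemble-learning sketch: several weak hypotheses each predict a
# label, and independent errors tend to cancel under majority voting.
from collections import Counter

def h1(x): return "happy" if x["eye_brow"] < 0.5 else "angry"
def h2(x): return "happy" if x["cheek"] > 0.3 else "angry"
def h3(x): return "happy" if x["eye_ball"] < 0.4 else "angry"

def ensemble_predict(x, hypotheses=(h1, h2, h3)):
    votes = Counter(h(x) for h in hypotheses)
    return votes.most_common(1)[0][0]   # label with the most votes wins

sample = {"eye_brow": 0.3, "cheek": 0.2, "eye_ball": 0.1}
print(ensemble_predict(sample))         # two of three hypotheses vote "happy"
```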
Figure 16.5 depicts the detailed processing of our learning model; we train the face images for the expected outcome, which comes from his/her mind. We have taken as input parameters the eyebrow, eyeball rotation, skin color, and the intensity of the eyes and cheek. From the combination of all our trained hypotheses, the model combiner is generated. If it meets our expectation, we conclude with the statement; otherwise, we use the AdaBoost algorithm to improve the efficiency.
Fig. 16.5 Ensemble learning with mind parameters
16.4 Results and Discussion
In this scenario, three classifiers have been analyzed: decision tree, nearest neighbor, and logistic regression. The data for each classifier are divided into two parts: one consists of the independent variables of the training data, and the other incorporates the target variable for the training data.
Boost Algorithm
1. For each learner, start with uniform weights
2. Fit and train the learner with the current weights
3. All data must be trained
4. Identify the errors and update the weights
5. Combine into the ensemble value; end for
Boost Features:
1. It is a sequential algorithm
2. It concentrates on misclassified data
3. It is a very popular algorithm
4. It is used to improve accuracy over the prior tree
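The boosting steps above can be sketched as a minimal AdaBoost with decision stumps; the one-dimensional toy data and thresholds are illustrative assumptions, not the chapter's image features.

```python
import math

# Toy 1-D dataset: feature value -> label in {-1, +1}
X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.45, 0.55]
y = [-1, -1, -1, +1, +1, +1, +1, -1]

def stump_predict(thresh, sign, x):
    """A one-split weak learner: predict `sign` when x > thresh."""
    return sign if x > thresh else -sign

def best_stump(weights):
    """Pick the stump with the lowest weighted error (step 2)."""
    best = None
    for thresh in X:
        for sign in (+1, -1):
            err = sum(w for xi, yi, w in zip(X, y, weights)
                      if stump_predict(thresh, sign, xi) != yi)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best

def adaboost(rounds=5):
    n = len(X)
    w = [1.0 / n] * n                        # step 1: uniform weights
    learners = []
    for _ in range(rounds):
        err, thresh, sign = best_stump(w)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # step 4: boost the weights of misclassified points
        w = [wi * math.exp(-alpha * yi * stump_predict(thresh, sign, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        learners.append((alpha, thresh, sign))
    return learners

def predict(learners, x):                    # step 5: weighted ensemble vote
    score = sum(a * stump_predict(t, s, x) for a, t, s in learners)
    return 1 if score > 0 else -1

model = adaboost()
print([predict(model, xi) for xi in X])
```

No single stump separates this data, but a few boosting rounds combine stumps into a classifier that fits all the training labels.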
We used the Keras tool with Python to simulate our research work. In this part, a tree has been constructed with all image parameters. X is the actual image accelerations, and Y is our expectation. A collection of X is also considered as a hypothesis; n such hypotheses are grouped together to confine the statement. If X does not support it, we can boost the algorithm for X and get a value nearer to Y.
Figure 16.6 is trained by our network to predict the optimum result. In psychology, decision making is the major cognitive process, drawing on past memories and real expectations or depending on the current scenario. Our aim is to analyze the mind values through face recognition, so we train 1200 different images with different actions, which are stored in the database for future reference. Input images may be given in a recorded format like JPEG or video, or through a live video stream to the framework; then feature extraction is enabled on the given input. For instance, when black-and-white images are given, skin color features are difficult to identify, so the image is split into 512 × 512 pixels to analyze the features.
How it works:
Figure 16.7 shows the reaction of deep looking or staring at something, but this picture is not in the preferred pixel size. We have to implement the boosting algorithm to achieve the result; for this image, the parameters color, eyeball view, and eyeball position have low pixel values. Hence, our trained model suggests that 17% brightness and 21% contrast be added to the image to confine the result.
Fig. 16.6 Face—different actions on eye view (panels: peace, angry, expected, comfort, question to others, not happy, not able to answer, stare)
Fig. 16.7 Stare image
Our narrative framework defines how the mind flows from the eye; therefore, we identified the parameters eyeball, eyebrow, cheek, color, and intensity of the eyes, which are called the instances of training data. Each classifier is able to identify or recognize the test instances from its trained set.
Strictly following the feature extraction, there are n similarities in the set and no non-similarities in the given set. In the next phase, all the n similarity probabilities
Table 16.1 Parameterized analysis

Trained parameters  Eye ball rotation  Eye brow  Cheek  Intensity of eyes  Skin color
Image 1             0.0                0.05      0.14   0.32               0.21
Image 10            0.15               0.32      0.21   0.42               0.25
Image 50            0.22               0.12      0.11   0.31               0.12
Image 80            0.24               0.14      0.16   0.32               0.12
are combined in the region index for further inference:

y = arg max_{c_j ∈ C} Σ_{h_i ∈ H} P(c_j | h_i) P(T | h_i) P(h_i)
Table 16.1 depicts the images for decision making from ensemble learning. Initially, we analyzed one image with different properties and ten images of two persons, then proceeded with ten and then sixteen persons within our narrative framework. Eventually, we evaluate with Y as the expected value, where C is the set of possible classes, H is the hypothesis set, P identifies the probability, and T is the training data.
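A toy illustration of this combiner follows: each hypothesis h carries a prior P(h), a data likelihood P(T|h), and class probabilities P(c|h), and the predicted class maximizes the weighted sum. All the probability values below are invented for the sketch.

```python
# Toy sketch of the Bayes-weighted combiner: score each class c by
# sum over hypotheses h of P(c|h) * P(T|h) * P(h), then take the argmax.
hypotheses = {
    "h1": {"prior": 0.5, "likelihood": 0.6, "P_class": {"calm": 0.7, "angry": 0.3}},
    "h2": {"prior": 0.3, "likelihood": 0.8, "P_class": {"calm": 0.2, "angry": 0.8}},
    "h3": {"prior": 0.2, "likelihood": 0.5, "P_class": {"calm": 0.4, "angry": 0.6}},
}
classes = ["calm", "angry"]

def combine(hyps, classes):
    scores = {c: sum(h["P_class"][c] * h["likelihood"] * h["prior"]
                     for h in hyps.values())
              for c in classes}
    return max(scores, key=scores.get)

print(combine(hypotheses, classes))   # "angry": 0.342 beats "calm": 0.298
```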
16.5 Conclusion
We conclude that this research can be used for employability selection processes, security surveillance, smart home applications, etc. For instance, through face detection we may support decision making for face-to-face interviews. Even though many researchers have contributed effective ideas about face recognition, we contribute the relation of the mind with the eye in different aspects. Decision making is the toughest task when analyzing a single hypothesis from face recognition, because the human mind varies in every aspect; hence we narrate this framework with more than one parameter for the face recognition system. We achieved 94% accuracy on the given trained set, and it will be very useful for security-related applications and other humanoid applications.
References
1. Ismail, N., & Sabri, M. I. (2018). Review of existing algorithms for face detection and
recognition. Recent Advances in Computational Intelligence, Man-Machine Systems and
Cybernetics.
2. Alvappillai, A., & Barrina, P. N. Face recognition using machine learning. UCSD.
3. Bah, S. M., & Ming, F. (2020). An improved face recognition algorithm and its application in
attendance management system. Elsevier, 2590-0056.
4. Li, Y., Face Recognition System Thesis.
Chapter 17
Modified Cloud-Based Malware
Identification Technique Using Machine
Learning Approach
Gavini Sreelatha, Aishwarya Govindkar, and Sarukolla Ushaswini
Abstract There has been a drastic improvement in wireless signals, which is associated with the IoT environment and has resulted in the development of mobile devices. As the impact of mobile devices grows, threat developers are also active in spreading malware day by day to weaken users' data privacy and integrity, which are essential needs for mobile users. So, there is a need for an effective framework to identify the malware that exists in smartphone Android applications and to analyze the devices dynamically from time to time. We aim to develop a machine learning-based web framework that is able to identify malware on mobile devices. For this framework to be effective for real-time applications, it needs salient features, gained through feature selection. For the analysis, we take many samples of various Android applications and evaluate the framework with parameters like F-measure and accuracy. For feature selection, we consider chi-square, gain ratio, information gain, logistic regression analysis, One R, and PCA.
17.1 Introduction
In recent days, the wireless network has grown rapidly due to the increase in the usage of smartphones, which have become a daily element of human life. Humans are physically and mentally dependent on mobile phones, and various applications are needed to process different activities such as bank transactions, chat
G. Sreelatha (B)·A. Govindkar ·S. Ushaswini
Department of Information Technology, Stanley College of Engineering and Technology for
Women, Hyderabad, India
e-mail: sreelathaprince13@gmail.com
A. Govindkar
e-mail: aishwaryagovindkar@gmail.com
S. Ushaswini
e-mail: sushaswini@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_17
170 G. Sreelatha et al.
interaction, and social networking. To utilize these mobile applications, vast Internet access and huge data storage are needed with the mobile phones, where the cloud plays a significant role. With the installation of various applications, malware detection has a major role, as malware extracts useful information, and this happens day to day [1]. Statistical surveys forecast that a few thousand new threats penetrate via mobile applications. So, a significant challenge faced by the mobile user is that new intruder threats are a major problem.
Based on the mobile applications, threat detection and identification play a significant role and can be processed based on feature selection. Feature selection can be performed by training the datasets through learning techniques. There are two broad classes of feature selection: feature ranking and the subset selection approach. Regarding feature ranking:
Gain Ratio: It is determined from the essential information of a decision learning tree. It helps to reduce the predisposition toward multi-valued attributes, and attribute choice is based on the size and count of branches [2].
Chi-Squared Test: Considering a classification problem, categorical data are taken as the input variables, and the output variables are generated based on this statistical test. The test determines whether the output is dependent on or independent of the input attributes.
Information Gain: It measures the reduction in entropy obtained when the data are split on a given attribute, subject to certain limitations. It determines how consistently an attribute separates the data and thus identifies the attributes most related to the problem [3].
One R: It follows the classification of data, obtaining data predictions using a one-rule method. One rule is applied per attribute to determine the total error rate, and prediction is performed based on the created rule. For constructing the prediction, a frequency table is built against the data target [4].
Principal Component Analysis (PCA): It is a statistical method that converts a number of correlated variables into a number of uncorrelated variables using a transform method. It is mainly used for data analysis and various prediction models.
Logistic Regression Analysis: It refers to logical regression; the inputs are the data weights joined with their coefficient values. These input values are used to predict the output value, which is obtained as a binary value of 0 or 1 [5].
For the data subset features:
Feature Selection Correlation: It is based on a statistical test analyzed using various machine learning techniques, and it is a feature selection method that makes execution faster [6]. This process helps in eliminating unwanted and unused data in order to improve performance with the machine learning algorithm.
17 Modified Cloud-Based Malware Identification Technique 171
Rough Set Analysis: Here, analysis is performed on data uncertainty through rough datasets. The analysis helps to calculate the object attributes and to construct both the upper and lower approximation sets. For real applications, the size and complexity of data vary while performing data analysis and managing execution; the method [7] reduces the size of the data while maintaining data consistency.
Consistency Subset Evaluation Approach: A feature selection approach that performs effectively with reduced dimensionality. Data classification determines the feature selection in an optimal way, with good accuracy and reduced data size [8]. This feature selection comprises two methods: candidate subset selection and feature space search.
Filtered Subset Evaluation: A statistical filter approach that analyses the association between the input data and the output data generated at the destination [9]. During filtering, a score computed on the input data is used to choose the filter.
The above machine learning techniques can be used to identify and analyse malware.
In this paper, a modified cloud-based malware identification technique is proposed, which identifies and analyses malware applications more effectively and optimally using cloud concepts. It operates through the application process interface. In general, the proposed technique executes the machine learning algorithm on the application, supporting both Android and iOS. The technique supports supervised, unsupervised, and hybrid machine learning approaches. In the performance analysis, the proposed technique attains high accuracy in detecting malware on the large datasets considered.
The proposed cloud-based malware identification technique identifies and analyses malware applications, attaining high accuracy in malware detection on the large datasets considered. Three phases are associated with the proposed work, as mentioned below. Input is taken from various repositories, where all the malware-based applications are stored. Mobile application files are scanned with the malware scanner for detection. Features associated with application permissions and API calls are extracted; in this feature extraction, feature selection based on machine learning helps to perform any of the tests mentioned in Sect. 17.1. The malware detection model identifies the significant features, and the analysis is performed using machine learning algorithms. To authenticate malware, real applications are analysed on various parameters such as accuracy and F-measure, based on P value and t value. Based on the datasets, deployment, detection rate, and data availability are evaluated (Fig. 17.1).
172 G. Sreelatha et al.
Fig. 17.1 Cloud malware detection technique
17.2 Literature Survey
Here, existing malware detection techniques are discussed and various research gaps are highlighted; these gaps are then addressed by the proposed work. In [10], a malware detection model associated with cloud computing and based on network packets is proposed. The packets taken as input are processed with data mining techniques to reduce the packet knowledge, which helps to validate whether malware is detected or not. While data mining analyses the extracted data, the learning algorithm learns from the input dataset; thus, the SMMDS-based malware detection model follows machine learning techniques. Then, [11] proposes an approach to detect malware-based applications during the installation period; the approach carries out a set of instructions in mobile applications. The framework extracts features during installation of the mobile application and executes them. Using machine learning algorithms, the features are then classified, which helps to identify malware. The limitations of this approach are its large system resource requirements and data loading time. The work in [12] constructs a detection model for malware when anomalous behaviour occurs in mobile applications. Machine learning algorithms such as naïve Bayes and logistic regression operate on the feature information to calculate the accuracy rate of the data. The limitation of this approach is that certain parameters, such as CPU utilization, data memory, battery, and trained data, are not considered.
In [13], a malware-aware detection model based on a Gaussian mixture is proposed, performing feature selection with machine learning algorithms. Features such as CPU utilization, battery, and data memory are collected and extracted. The limitation of this model is that it requires enabling a remote (cloud) server. Until [14], mobile behaviour under malware infection, i.e. threats to mobile phones, had not been represented; that work proposes a model which manages the behaviour of the battery life when the mobile phone is infected with malware threats. Its limitation is that more effective models are still needed to detect the malware. The detection model discussed in [15] uses the cloud to make malware detection and power saving more effective and optimal. Here, the machine-learning-based detection model outperforms the cloud-based detection model on power-saving parameters; the limitation is that real-time applications are not considered with respect to the power-saving parameter. Deep-learning-based intrusion detection in cloud services for resilience management was discussed in [16]: deep learning combined with machine learning algorithms based on the Knowledge Decision Database (KDD), along with a discussion of information security while exchanging information between mobile devices. The parameters used to analyse the model are F-measure and data accuracy across various machine learning classifiers.
(i) Problem Formulation:
There has been drastic improvement in wireless signals associated with the IoT environment, resulting in the development of mobile devices. As the impact of mobile devices grows, malware developers are also actively spreading malware day by day, weakening users' data privacy and integrity, which mobile users ultimately need. Certain limitations must be considered while managing the mobile device physically, such as CPU utilization, phone battery, data memory, data accuracy based on F-measure, and other machine-learning-related tests.
(ii) Research Objectives:
The proposed cloud-based malware identification technique identifies and analyses malware applications, attaining high accuracy in malware detection on the large datasets considered. Three phases are associated with the proposed work, as mentioned below.
Input is taken from various repositories, where all the malware-based applications are stored.
Mobile application files are scanned with the malware scanner for detection.
Features associated with application permissions and API calls are extracted; in this feature extraction, feature selection based on machine learning helps to perform any of the tests mentioned in Sect. 17.1.
The malware detection model identifies the significant features, and the analysis is performed using machine learning algorithms.
To authenticate malware, real applications are analysed on various parameters such as accuracy and F-measure, based on P value and t value. Based on the datasets, deployment, detection rate, and data availability are evaluated.
17.3 Proposed Methodology
The proposed cloud-based malware identification technique identifies and analyses malware applications, attaining high accuracy in malware detection on the large datasets considered. Input is taken from various repositories, where all the malware-based applications are stored. Mobile application files are scanned with the malware scanner for detection. Features associated with application permissions and API calls are extracted; in this feature extraction, feature selection based on machine learning helps to perform any of the tests mentioned in Sect. 17.1.
Algorithm 1: Dataset Formulation Based on Machine Learning
Input: Dataset files
Output: Extracted feature data
// Collection of data files
Collect categorical datasets (.apk files)
Remove duplicate files from the datasets
Consider the normal files and perform data generalization
Train the datasets; validate the data; test the data
Data generalization outcome
// Extracting the features from the information
Collect the unique data samples
Extract the file permissions and API calls
Revoke the permission to use the resource
Execute the collected data and extract the file
If (API calls) return true else return false
Extracted feature data
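The deduplication and feature-extraction steps of Algorithm 1 can be sketched as follows (a hypothetical illustration with invented record fields, not the authors' implementation):

```python
def formulate_dataset(records):
    """Deduplicate collected .apk records (here keyed by a file hash) and
    extract the permission and API-call features from each unique sample."""
    seen, features = set(), []
    for rec in records:
        if rec["hash"] in seen:          # remove duplicate files
            continue
        seen.add(rec["hash"])
        features.append({
            "name": rec["name"],
            "permissions": sorted(rec["permissions"]),
            "uses_api_calls": bool(rec["api_calls"]),  # true if any API calls found
        })
    return features

# Hypothetical collected records (field names are assumptions)
records = [
    {"name": "a.apk", "hash": "h1", "permissions": ["SEND_SMS"], "api_calls": ["sendTextMessage"]},
    {"name": "a.apk", "hash": "h1", "permissions": ["SEND_SMS"], "api_calls": ["sendTextMessage"]},
    {"name": "b.apk", "hash": "h2", "permissions": ["INTERNET"], "api_calls": []},
]
print(formulate_dataset(records))
```

The resulting feature records would then be split into training, validation, and test sets as the algorithm describes.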
The malware detection model identifies the significant features, and the analysis is performed using machine learning algorithms. To authenticate malware, real applications are analysed on various parameters such as accuracy and F-measure, based on P value and t value. Based on the datasets, deployment, detection rate, and data availability are evaluated.
Algorithm 2: Feature Selection
Input: Extracted feature data
Output: Malware detection model
Identify the effective datasets based on feature extraction
Categorize classified and unclassified errors based on the datasets
// Feature ranking on training datasets
Apply filtering techniques to perform ranking: Mutual Information, Relief-F
Combine the two filtering techniques to rank the features
// Feature subset selection
Optimize the feature selection based on training the dataset
If (feature indices are identified)
Select the feature set based on the training set
Else
Classify the data with a machine learning algorithm
Predict and validate whether malware is present or not
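The ranking-combination step of Algorithm 2 can be sketched as follows (a hypothetical illustration: the feature names and the two filter rankings below are invented, and the fusion rule, average rank position, is one simple choice among several):

```python
def combine_rankings(rank_a, rank_b):
    """Fuse two filter rankings (e.g. Mutual Information and Relief-F) by
    summed rank position; a lower combined rank means a more important feature.
    Assumes both rankings cover the same feature set."""
    pos_a = {f: i for i, f in enumerate(rank_a)}
    pos_b = {f: i for i, f in enumerate(rank_b)}
    return sorted(pos_a, key=lambda f: pos_a[f] + pos_b[f])

# Hypothetical rankings produced by the two filters
mi_rank     = ["perm_sms", "api_net", "file_size", "perm_gps"]
relief_rank = ["api_net", "perm_sms", "perm_gps", "file_size"]
print(combine_rankings(mi_rank, relief_rank))
```

The top of the combined ranking would then feed the feature subset search before the final classifier is trained.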
Limitations:
Certain limitations of the existing work are mentioned below:
Limited datasets
Detection rate efficiency
Effective computation rate (Figs. 17.2 and 17.3).
Expected parameters:
Feature ranking by F-measure and accuracy.
Supervised machine learning techniques.
Unsupervised machine learning techniques.
Semi-supervised machine learning techniques.
Hybrid machine learning techniques.
Feature selection techniques based on significance and insignificance difference.
Data accuracy with respect to detection rate.
Malware families’ detection with respect to various machine learning techniques.
17.4 Conclusion
A malware detection approach is constructed by selecting featured datasets to identify whether an application is malware or not. The feature selection approach minimizes the featured datasets and performs better. The selected feature set is then able to identify malware, with misclassification-based errors, at better accuracy. Machine learning algorithms are then applied to provide better detection and computation rates.
The same analysis process is applied to an image file that is malware-free, and its result is below 70%, as expected. From the various file results above, the groovemonitor.exe file is blocked, while Image File1.bin and the other files pass to the second process.
Fig. 17.2 Feature extraction process
Fig. 17.3 Feature ranking and feature subset
References
1. Singh, K. U., Gupta, P. K., & Ghrera, S. P. (2015). Performance evaluation of AOMDV routing
algorithm with local repair for wireless mesh networks. CSI Trans ICT, 2(4), 253–260.
2. Novakovic, J. (2010). The impact of feature selection on the accuracy of Naïve Bayes classifier.
In 18th Telecommunications forum TELFOR (vol. 2, pp. 1113–1116).
3. Plackett, R. L. (1983). Karl Pearson and the chi-squared test. International statistical
review/revue internationale de statistique, 51(1), 59–72; Wang, W., Wang, X., Feng, D., Liu,
J., Han, Z., & Zhang, X. (2014). Exploring permission-induced risk in android applications
for malicious application detection. IEEE Transactions on Information Forensics and Security,
9(11), 1869–1882.
4. Cruz, C., Erika, A., & Ochimizu, K. (2009). Towards logistic regression models for predicting
fault-prone code across software projects. In Proceedings of the 2009 3rd International Sympo-
sium on Empirical Software Engineering and Measurement (pp. 460–463). IEEE Computer
Society
5. Hall, M. A. (1999). Correlation-based feature selection for machine learning (Doctoral disserta-
tion, The University of Waikato, Department of Computer Science); Pawlak, Z. (1982). Rough
sets. International Journal of Computer and Information Sciences, 11(5), 341–356.
6. Dash, M., & Liu, H. (2003). Consistency-based search in feature selection. Artificial Intel-
ligence, 151(1–2), 155–176; Kohavi, R., & John, G. H. (1997). Wrappers for feature subset
selection. Artificial Intelligence, 97(1–2), 273–324.
7. Arp, D., Michael, S., Malte, H., Hugo, G., Konrad, R., & Siemens, C. E. R. T. (2014). Drebin:
Effective and explainable detection of android malware in your pocket. NDSS, 14, 23–26.
8. Cui, B., Jin, H., Carullo, G., & Liu, Z. (2015). Service-oriented mobile malware detection
system based on mining strategies. Pervasive and Mobile Computing, 24, 101–116.
9. Enck, W., Ongtang, M., & McDaniel, P. (2009). On lightweight mobile phone application
certification. In Proceedings of the 16th ACM Conference on Computer and Communications
Security (pp. 235–245). ACM.
10. Narudin, F. A., Ali, F., Nor, B. A., & Abdullah, G. (2016). Evaluation of machine learning
classifiers for mobile malware detection. Soft Computing, 20(1), 343–357.
11. Wei, T.-E., Mao, C.-H., Jeng, A. B., Lee, H.-M., Wang, H.-T., & Wu, D.-J. (2012). Android
malware detection via a latent network behavior analysis. In 2012 IEEE 11th International
Conference on Trust, Security and Privacy in Computing and Communications (pp. 1251–
1258). IEEE.
12. El Attar, A., Khatoun, R., & Lemercier, M. (2014). A Gaussian mixture model for dynamic
detection of abnormal behavior in smartphone applications. In: 2014 Global Information
Infrastructure and Networking Symposium (GIIS) (pp. 1–6). IEEE.
13. Dixon, B., & Mishra, S. (2013). Power based malicious code detection techniques for smart-
phones. In 2013 12th IEEE International Conference on Trust, Security and Privacy in
Computing and Communications (pp. 142–149). IEEE.
14. Suarez-Tangil, G., Tapiador, J. E., Peris-Lopez, P., & Pastrana, S. (2015). Power-aware anomaly detection in smartphones: An analysis of on-platform versus externalized operation. Pervasive and Mobile Computing, 18, 137–151.
15. Chen, P. S., Lin, S.-C., & Sun, C.-H. (2015). Simple and effective method for detecting abnormal
internet behaviors of mobile devices. Information Sciences, 321, 193–204.
16. Chakravarthi, S. S., Kannan, R. J., Natarajan, V. A., & Gao, X. (2022). Deep learning based
intrusion detection in cloud services for resilience management. CMC-Computers, Materials &
Continua, 71(3), 5117–5133.
Chapter 18
Design and Deployment of the Road
Safety System in Vehicular Network
Based on a Distance and Speed
Thalakola Syamsundararao, Badugu Samatha, Nagarjuna Karyemsetty,
Subbarao Gogulamudi, and V. Deepak
Abstract In the age of intelligent communication technology, intelligent mobile
phones play a critical role in road accidents. Their effect on driving, resulting in
vehicle crashes over the last two decades, has become a significant risk. Speed was
identified as an important risk factor for road traffic accidents. By monitoring vehicle
speed and position, one can help prevent accidents by sending out alert messages and
limiting the consequences for unprotected road users such as pedestrians and cyclists.
To manage and control road accidents, the position and speed of the vehicle were
communicated to nearby cars, and the great circle method was used to calculate the distance between the leading vehicle and the trailing vehicle. This method is based
on the zero point of the earth’s equator and GPS. The application was tested in this
paper using mobile devices. The experiment used various smartphone modules, such
as GPS receivers, digital road maps, and communication systems. A prototype was
developed and evaluated using mobile phones in highway and city scenarios with
varying speeds and network sizes. As a result of the experiment, location and speed
accuracy were determined, and alert messages were generated when the distance
between vehicles fell below the standard or government-specified length. The inves-
tigation could be expanded further by connecting to the Internet, storing data in the
cloud, performing analytics, and involving insurance agents, relatives, and nearby
hospitals.
T. Syamsundararao
Department of CSE, Kallam Haranadhareddy Institute of Technology (A), Chowdavaram, Guntur,
India
e-mail: syamsundar@khitguntur.ac.in
B. Samatha
Department of CSE, Vignan’s Foundation for Science, Technology, and Research, Guntur, India
e-mail: drbs_cse@vignan.ac.in
N. Karyemsetty (B)·S. Gogulamudi ·V. Deepak
Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh,
India
e-mail: nagarjunak@kluniversity.in
V. Deepak
e-mail: v.d@live.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_18
180 T. Syamsundararao et al.
18.1 Introduction
In transportation traffic, safety becomes one of the significant problems, and speed
is acknowledged as a high risk for causing road traffic accidents. As vehicles are
growing in number, levels of traffic accidents have significantly increased [1,2]. The
cars and vulnerable road users like pedestrians and cyclists also face traffic accidents.
According to the World Health Organization (WHO), road fatalities will increase to
become one of the major causes of death by 2030 [3]. More than 50% of road
injuries and accidents are caused by unsafe driving behavior and attending phone calls. Moreover, driving conditions are difficult for vulnerable road users such as pedestrians and cyclists, as reported by researchers in the United States [4]. Over the past two decades, millions of people have died from road injuries every year, losing their valuable lives. Various companies and vehicle manufacturers have developed solutions for continuously watching vehicle and driver behavior; the problem with these solutions is that they are costly to buy [3,5]. We developed a real-time Android mobile application using Java as the frontend and a Firebase database as the backend. This application tracks the speed of the vehicle. When the vehicle exceeds a certain speed, the user's mobile automatically switches to silent mode. If the user gets a call at that moment, an alert message is sent to the caller to avoid talking while driving. The background communication network among vehicles' on-board units (OBUs) is shown in Fig. 18.1.
Fig. 18.1 Safety message communication among OBUs in VANET
18 Design and Deployment of the Road Safety System 181
18.2 Literature Review
A comprehensive summary of previous research on mobile phones [2,3,5] for driving, speed tracking, and smartphone sensing is provided in this section. In addition, the literature survey categorizes information according to different applications [4,6]. The data are:
Driver information.
Traffic information.
Vehicle information.
Environmental information.
The survey of recent projects is discussed in the remainder of this section. Ali et al. proposed Crowd-ITS [1], whose server provides a web interface to aggregated traffic information [6]. In Signal Guru [7,8], various traffic signal detection, data filtering, and traffic signal scheduling schemes have been suggested.
JamEyes [9], a traffic jam awareness and observation system (TJAS), provides critical information such as the length of the traffic queue and the amount of time spent in the bottleneck. Verma [10] integrated the OBD-2 system with smartphone- and web-based data gathering technologies to create a hybrid electric car CAN-bus and data monitoring system, then used the data from the servers for remote monitoring.
Accelerometers and audio sensors attached to smartphones immediately forward information to the dispatch server in the event of an accident on a freeway [11,12]. The accident photos are sent directly using GPS coordinates and VOIP channels. Unlike the Zaldivar et al. system, the White et al. system incorporates data recording detection technology: accident detection uses accelerometer values from smartphones rather than electronic control unit (ECU) values.
Mednis et al. [13] designed a smartphone-based accelerometer application to help correct bad driving habits. The study by Eriksson et al. proposed the first road condition monitoring system that uses an accelerometer and GPS to detect and map road anomalies such as potholes. Ghose and colleagues and Mednis and colleagues have developed road monitoring applications that detect potholes and collect data via sensors, which are then sent to remote servers from which alerts are issued to drivers.
Mohan et al. developed a mobile phone system based on multiple sensors to identify bumps, pits, braking, and honking while driving, and to localize the phone to save energy. Castignani et al. proposed a mobile app to monitor and track a fleet using GPS and GSM modules, interacting quickly and efficiently to automate real-time fleet management. Based on the literature survey, this paper proposes the Safe Drive mobile application, which addresses some limitations of present systems and tries to advance road safety by controlling speed and avoiding talking on phones.
18.3 Proposed System
Based on the limitations of the existing systems, we have found a solution for avoiding speeding and talking on mobiles while driving at high speed. We developed the Safe Drive mobile-based Android application, which tracks the vehicle's speed in real time using GPS to locate the vehicle's position. When the vehicle exceeds a specific speed, the user's mobile automatically switches to silent mode. If the user gets a call, an alert message is sent to the caller. If the user receives calls more than three times from the same caller, the mobile phone automatically switches to vibrate or general mode to handle that emergency.
The speed-tracking mobile application was developed as the 'Safe Drive' app. The platform used to design the app is Android Studio version 2.4, with Java and XML as the front end and Firebase as the (mobile) back end.
'Widgets' are the group of options in Android that can be dragged and dropped easily. The XML code is not viewed directly in the running app, but the layout it defines is exposed while appearing on the screen. To define the behavior of the application, we go to 'MainActivity.java.' The tabs under application > Java hold the code and layout design needed to run the application. Initially, we test the application by running it on an Android emulator, which facilitates a virtual run time. After successful testing, we test the application on a real-time instrument. The overall process is shown in diagrammatic form in Fig. 18.1. In the project window, click the application and then execute the toolbar module in Android Studio. In the deployment target window, select the device and click the 'OK' button. Android Studio installs the program, and all connected devices begin to function. The Safe Drive app created on the smartphone runs in real time. It uses the great circle distance (GCD) method to determine the distance between two OBUs (Fig. 18.2).
18.3.1 Experimental Methodology
Each OBU consists of the following modules:
1. GPS module associated with the digital road map
2. RSU interface
3. Wireless communication modules like IEEE 802.11a & 4G
4. Location tracking module
5. Safety alert module.
Location accuracy depends on the accuracy of the GPS receiver; in real time, military GPS or dual-band GPS can be used. The wireless communication modules receive and transmit the speed and location data of nearby OBUs. The GCD method is used to calculate the distance between two OBUs. Equations 18.2 and 18.3 calculate the polar angles at vehicle A and vehicle B, as shown in Fig. 18.3.
Fig. 18.2 Proposed architecture for speed monitoring
Fig. 18.3 Great circle distance method
cos(AB) = cos(PA) cos(PB) + sin(PA) sin(PB) cos(P)   (18.1)
PB = lat_P - lat_B   (18.2)
PA = lat_P - lat_A   (18.3)
Depending on the distance between the two vehicles, warning alerts are triggered for the driver. A standard distance of 50 m is considered when generating alert messages to prevent accidents.
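Equations 18.1–18.3 can be sketched as follows (assuming angles in degrees, the pole P at latitude 90°, the pole angle P as the longitude difference, and a mean earth radius of 6371 km; the coordinates below are hypothetical OBU positions):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean earth radius (assumption)

def great_circle_distance(lat_a, lon_a, lat_b, lon_b):
    """Spherical law of cosines (Eq. 18.1): the polar angles PA and PB are
    measured from the pole P (Eqs. 18.2, 18.3), and the angle P at the pole
    is the longitude difference between the two vehicles."""
    pa = math.radians(90.0 - lat_a)          # PA = lat_P - lat_A
    pb = math.radians(90.0 - lat_b)          # PB = lat_P - lat_B
    p = math.radians(lon_b - lon_a)          # angle at the pole
    cos_ab = (math.cos(pa) * math.cos(pb)
              + math.sin(pa) * math.sin(pb) * math.cos(p))
    ab = math.acos(max(-1.0, min(1.0, cos_ab)))  # clamp for rounding safety
    return EARTH_RADIUS_M * ab               # arc length in metres

d = great_circle_distance(16.3067, 80.4365, 16.3070, 80.4368)  # two nearby OBUs
print(round(d), "m")
if d < 50:  # standard alert distance from the text
    print("ALERT: vehicles closer than 50 m")
```

For the very short distances between OBUs, a numerically sturdier formula (e.g. haversine) is often preferred, but the law-of-cosines form above matches Eq. 18.1 directly.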
18.3.2 Flow of Execution of an Application
The main advantages are:
The application gives an 'alert message.'
The call will be activated after specific rings.
The phone automatically switches to 'silent mode' when vehicle speed exceeds 40 km/h.
The proposed application has been tested on the road, giving users safe and comfortable driving (Fig. 18.4).
To create a new application, the developer must choose the project template as 'empty activity,' click on 'Next,' accept the default activity name (main activity), and select 'finish.' The Android package files then have to be uploaded to the Google Play store.
Fig. 18.4 Results of speed tracking
18.3.3 Modules of the Application
Installation of application
We need to install the 'Safe Drive' mobile app for smartphone speed tracking; the app is then installed on the mobile.
Device permissions
The user is asked for permission to access 'do not disturb' mode while the app is running, and for permission to access the device location to track the vehicle's speed. In addition, the screen provides access to send and view messages, so that an alert message can be sent back to the mobile number from which a call is received. The app then asks for permission to access phone calls so that it can send messages to that specific number, and, if the same number repeats more than three times, the mobile switches back to general mode.
Registration, Log in, and Home pages
If the user is a first-time user, they have to register before proceeding, entering their full name, email ID, mobile number, and password, and then clicking the sign-in button. Next, the app navigates to the log-in screen. If the user forgets the password, an email with a password update link is sent to the registered email address. If the user has already registered, they can log in by giving their mobile number and password and tapping 'sign-in.' On the home page, there are options to select the vehicle mode, user profile, and history. Here, the user can choose the vehicle type; after that, a green color indication shows that the app has started. To see their profile, the user can click on the screen's profile option.
Speedometer screen
Next is the speedometer screen, which shows the vehicle's current speed, initially zero ('0'). When the vehicle exceeds the speed limit of 40 km/h, the app switches the phone to silent mode; if the vehicle is below the speed limit, the phone returns to normal mode. When a caller tries to reach a person who is driving, the user can attend the call while the vehicle's speed is within the limit. If the user exceeds that speed, the alert message 'I am driving, call me later' is generated automatically, so no miscommunication arises between the user and the caller. If the same caller calls more than three times, the mobile switches from silent mode to general mode so that an emergency call can be attempted in such situations. Figure 18.5 shows the trace of two OBUs in the mobile application.
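The call-handling rules above can be sketched as the following decision logic (a hypothetical illustration, not the app's Java source; the 40 km/h threshold and three-call limit come from the text, while the function and message strings are assumptions):

```python
SPEED_LIMIT_KMPH = 40   # silent-mode threshold from the text
MAX_REPEAT_CALLS = 3    # attempts by the same caller before ringing through

def handle_call(speed_kmph, caller, call_counts):
    """Decide how the phone reacts to an incoming call while driving: below the
    limit the call rings normally; above it the phone stays silent and replies
    with an alert SMS, unless the same caller has already tried more than three
    times (treated as an emergency, switching back to general mode)."""
    if speed_kmph <= SPEED_LIMIT_KMPH:
        return "ring"
    call_counts[caller] = call_counts.get(caller, 0) + 1
    if call_counts[caller] > MAX_REPEAT_CALLS:
        return "ring"   # emergency: general mode
    return "silent + SMS: I am driving, call me later"

counts = {}
print(handle_call(30, "caller-1", counts))      # within the limit: rings
for _ in range(3):
    print(handle_call(60, "caller-1", counts))  # speeding: silent + alert SMS
print(handle_call(60, "caller-1", counts))      # fourth attempt: rings through
```

In the real app this decision would run inside the call-state listener, with the SMS sent via the messaging permission granted earlier.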
The Safe Drive mobile application does not need any embedded system. The application works automatically with the Global Positioning System (GPS) to find the user's location with the help of latitudes and longitudes. The application checks whether GPS is on; if it is not, it asks the user to switch GPS on. The last screen is the stop screen: when the user reaches the destination and stops the app, the mobile application turns to a red color indication, implying that the app has stopped working. Figure 18.6 shows the sharing of location and speed between OBUs in different cases.
18.4 Conclusion
The risk of a crash increases significantly when traveling above the speed limit. Adequate speed can be imposed on traffic via vehicle design features that restrict the vehicle's speed; specific safe-drive characteristics are proposed for the application. The proposed system utilizes mobile GPS to determine the vehicle's location and speed, and the great circle distance method to determine the distance between two cars. Based on the calculated distance, warning and safety alert messages were sent to drivers, and the alerts were recorded for further analysis. This application can significantly improve quality of life, and such applications also help drivers feel more secure on the road. We can save the lives of numerous humans, such as pedestrians and cyclists, by utilizing these real-time applications, and we can avoid talking on our phones while driving, thus avoiding accidents. Appropriate investment will ensure the journey's safety, as fulfilling the social responsibility of these apps will benefit us in the future. We anticipate that smartphone-based vehicle and driver tracking will be a critical component of the intelligent transportation system domain in emerging countries such as India.
References
1. Ali, K., Al Yaseen, D., Ejaz, A., Javed, T., & Hassanein, H. S. (2012). CrowdITS: Crowd-
sourcing in intelligent transportation systems. In Wireless Communications and Networking
Conference (WCNC) (pp. 3307–3311). IEEE.
2. Koukoumidis, E., Peh, L. S., & Martonosi, M. R. (2011). Signalguru: Leveraging mobile
phones for collaborative traffic signal schedule advisory. In Proceedings of the 9th International
Conference on Mobile Systems, Applications, and Services (pp. 127–140). ACM.
3. Zhang, X., Gong, H., Xu, Z., Tang, J., & Liu, B. (2012). Jam eyes: A traffic jam awareness and observation system using mobile phones. International Journal of Distributed Sensor Networks.
4. Yang, Y., Chen, B., Su, L., & Qin, D. (2013). Research and development of hybrid electric vehicles CAN-bus data monitor and diagnostic system through OBD-II and Android-based smartphones. Advances in Mechanical Engineering.
5. Koukoumidis, E., Martonosi, M., & Peh, L. S. (2012). Leveraging smartphone cameras for
collaborative road advisories. IEEE Transactions on Mobile Computing, 11(5), 707–723.
6. White, J., Thompson, C., Turner, H., Dougherty, B., & Schmidt, D. C. (2011). WreckWatch:
Automatic traffic accident detection and notification with smartphones. Mobile Networks and
Applications, 16(3), 285–303.
7. Zaldivar, J., Calafate, C. T., Cano, J. C., & Manzoni, P. (2011). Providing accident detec-
tion in vehicular networks through OBDII devices and Android-based smartphones. In 36th
Conference on Local Computer Networks (LCN) (pp. 813–819). IEEE.
8. Magaña, V. C., & Organero, M. M. Artemisa: Using an Android device as an eco-driving
assistant. Cyber Journals: Multidisciplinary Journals in Science and Technology: Journal of
Selected Areas in Mechatronics (JMTC).
9. Castignani, G., Derrmann, T., Frank, R., & Engel, T. (2015). Driver behavior profiling using
smartphones: A low-cost platform for driver monitoring. Intelligent Transportation Systems
Magazine, IEEE, 7(1), 91–102. https://doi.org/10.1109/MITS.2014.2328673
10. Verma, N. (2018). Development of native mobile application using android studio for cabs and
some glimpse of cross-platform apps. International Journal of Applied Engineering Research,
13(16), 12527–12530. ISSN 0973-4562. © Research India Publications. http://www.ripublica
tion.com
11. Eriksson, J., Girod, L., Hull, B., Newton, R., Madden, S., & Balakrishnan, H. The pothole
patrol: Using a mobile sensor network for road surface monitoring. In Proceedings of the 6th
International Conference on Mobile Systems, Applications, Street-Safety.
188 T. Syamsundararao et al.
12. Ghose, A., Biswas, P., Bhaumik, C., Sharma, M., Pal, A., & Jha, A. (2012). Road condition
monitoring and alert application: Using-vehicle smartphone as an internet-connected sensor.
In 10th International Conference on Pervasive Computing and Communications Workshops
(PerComWorkshops) (pp. 489–491). IEEE.
13. Mednis, A., Strazdins, G., Zviedris, R., Kanonirs, G., & Selavo, L. (2011). Real-time pothole
detection using android smartphones with accelerometers. In International Conference on
Distributed Computing in Sensor Systems and Workshops (DCOSS) (pp. 1–6). IEEE.
14. Mohan, P., Padmanabhan, V. N., & Ramjee, R. (2008). Nericell: Rich monitoring of road
and track conditions using mobile smartphones. In Proceedings of the 6th ACM conference
Embedded Network Sensor Systems (pp. 323–336). ACM.
Chapter 19
Diagnosis of COVID-19 Using Artificial
Intelligence Techniques
Pattan Afrid Ahmed, Prabhu Gantayat, Sarika Jay, Venkata Sai Satvik,
Jagadeesh Kannan Raju, and A. Balasundaram
Abstract Artificial intelligence is being used in a variety of ways by those trying to address variants and to manage data. AI not only uses historical data; it also draws inferences about the data without applying a predefined set of rules, which allows the software to learn and adapt to information patterns in near real time. The numerous sources of medical images (e.g., X-ray, CT, and MRI) make deep learning a powerful technique for combating the COVID-19 outbreak, and motivated by this fact, a large number of research works have been proposed and developed. Chest CT is an emergency diagnostic tool for identifying lung disease, and artificial intelligence (AI) provides significant assistance in the rapid analysis of CT scans to differentiate variants of COVID-19 findings. This work focuses on the recent advances of COVID-19 drug and vaccine development using artificial intelligence and the potential of intelligent training for the discovery of COVID-19 therapeutics.
19.1 Introduction
Coronavirus disease (COVID-19) is a respiratory illness caused by infection with the severe acute respiratory syndrome coronavirus 2, also known as SARS-CoV-2. COVID-19 has directly infected more than 16.6 crore individuals worldwide, causing more than 34.3 lakh deaths. A significant issue in the diagnosis of COVID-19 is the unreliability and shortage of clinical tests. Accordingly, several efforts have been devoted to seeking alternative techniques for determining COVID-19's occurrence. Computed tomography (CT) examinations are seen as promising for the identification of patients with suspected COVID-19 infection. CT shows clear radiological findings in patients with COVID-19, serving as a more effective and accessible test strategy. The principal issue with this strategy is that it relies upon an expert to examine the CT images; the process is time-consuming and tiring for the expert because of the number of images to be examined, causing fatigue, which can lead to errors in diagnosis [1].
P. A. Ahmed ·P. Gantayat ·S. Jay ·V. S. Satvik ·J. K. Raju ·A. Balasundaram (B)
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
e-mail: balasundaram.a@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_19
Even though rapid point-of-care COVID-19 tests are expected to be used in clinical settings eventually, for the present the time taken for COVID-19 test results typically ranges from 3 h to 3 days, and most likely not all countries will have access to test kits that give results quickly [1]. To manage these issues, CT scan-based strategies have been proposed in various recent works, showing great promise for the use of deep learning and machine learning-based methodologies for effective recognition of the disease from chest CT images. Motivated by the need for an extensive assessment of such AI-based diagnostic methodologies, in this work, deep CNN-based strategies are investigated and evaluated experimentally to gauge the usefulness of these methodologies in the current emergency.
Artificial intelligence instruments have produced consistent and accurate outcomes in applications that utilize either image-based or other sorts of information. Apostolopoulos and Mpesiana performed one of the principal assessments of COVID-19 identification using X-ray images. In their investigation, they considered transfer learning using pre-trained networks such as Xception, Inception-ResNet-V2, VGG19, MobileNet-V2, and Inception, which are among the most frequently used pre-trained models. Several evaluation metrics were utilized to assess the outcomes obtained from two distinct datasets. The final conclusion was drawn by the authors using the obtained confusion matrices rather than the accuracy results, due to the imbalanced data [1].
Overall, neural networks that attempt to replicate human visual processing on computers once required pre-processing of images or data before feeding them to the network. When the ConvNet was first developed, however, it was characterized as a neural network that requires minimal pre-processing of images before feeding them to the network, and as a structure capable of extracting features from images to improve the learning performance of the neural network.
The ConvNet includes both feature extraction and classification stages in a single network. A conventional ConvNet involves three kinds of layers: convolution, pooling, and fully connected (FC) layers. Feature extraction is performed in the convolutional layer by applying a mask (filter), which is the process of dividing images into regions of a predefined size and using filters to extract features from the image. Then, a feature map, which is the projection of features onto a 2D grid, is produced by applying an activation function to the values obtained by the mask. The activation functions select the most responsive neurons in a nonlinear way and reduce the computational cost of the neural network.
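The masking-and-filtering step described above — sliding a small kernel over the image and applying an activation to produce a feature map — can be sketched in NumPy (a toy illustration with a hypothetical edge-detecting kernel; real frameworks add padding, strides, and many filters):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image, no padding."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation: keeps only the most responsive outputs."""
    return np.maximum(x, 0.0)

image = np.array([[1., 2., 0., 1.],
                  [0., 1., 3., 1.],
                  [2., 0., 1., 0.],
                  [1., 1., 0., 2.]])
edge_kernel = np.array([[1., -1.],
                        [1., -1.]])  # crude vertical-edge detector
feature_map = relu(conv2d_valid(image, edge_kernel))  # 3x3 feature map
```

A 4 × 4 input with a 2 × 2 kernel yields a 3 × 3 feature map, and the ReLU zeroes out all negative responses.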
The discipline of medical imaging has seen revolutionary changes in the recent past, owing to advances in the fields of deep learning and computer vision, for several contagious diseases such as pneumonia, MERS, SARS, and ARDS [2–14]. In pandemic situations such as this one, it has become far more important for such deep learning-based approaches to be applied in practice. Effective utilization of such deep learning-based methodologies can be of high utility for people, particularly concerning rapid testing and recognition of the illness, that is, quick diagnosis and prediction of COVID-19.
Several activation functions are available in CNNs, and the rectified linear unit (ReLU) is the most commonly used; it does not activate all the neurons at the same time and therefore gives faster convergence as the weights find the optimal values during training. A pooling operation is performed on the produced feature map to reduce the dimensions of the images.
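The pooling operation can likewise be sketched as a non-overlapping 2 × 2 max pool, which halves each spatial dimension of the feature map (toy NumPy illustration):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling: halves each spatial dimension."""
    h, w = fmap.shape
    h2, w2 = h // 2, w // 2
    trimmed = fmap[:h2 * 2, :w2 * 2]        # drop odd trailing rows/columns
    blocks = trimmed.reshape(h2, 2, w2, 2)  # group into 2x2 tiles
    return blocks.max(axis=(1, 3))          # keep the strongest response per tile

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [2., 2., 0., 3.]])
pooled = max_pool_2x2(fmap)  # -> [[4., 2.], [2., 5.]]
```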
Finally, the feature map is flattened into a vector and sent to the fully connected layer. The assembly of the neural network and the classification of the input patterns are performed in the fully connected layer, and its training relies on error backpropagation to update the weights within this layer [15]. Each trained neural network receives data for the particular task that is considered. While the fundamental principle of artificial neural networks is to simulate human behavior and perception, transfer learning in artificial neural networks is used to apply the knowledge stored for a specific task to another related task. Deep learning for image recognition applications is capable of learning from a great many images, and several large models have been trained with various architectures. These pre-trained models have been made freely available so that all researchers can utilize the stored knowledge. The state-of-the-art pre-trained, openly accessible networks, namely VGG16, Inception-V3, VGG19, MobileNet-V2, ResNet50, and DenseNet121, were considered in the comparison [16].
This paper aims to survey methodologies currently implemented in this arena, covering both vanilla custom CNN-based learning and transfer learning, in order to understand the implementations and bring out the challenges that were faced. This work also explores the dataset used for constructing the model, namely the COVID CT dataset. Another layer is added by implementing a model which distinguishes COVID-positive patients from pneumonia-positive patients. This step is essential because there is a high chance that pneumonia-positive patients can be diagnosed as COVID-positive, as the CT scans of patients affected by these diseases are quite similar. So, we add an extra model which differentiates COVID positive from pneumonia positive as a final step, applied if the patient is detected as COVID positive.
Also, there has been work on preprocessing the CT scan images using the OpenCV library [17]. Histogram equalization, adjusted log, and rank equalization are applied to the images to get a clear picture of all the abnormalities in the lung CT scan and to obtain more accurate results while training the deep learning models.
Fig. 19.1 CT scan of lungs of COVID affected patients
19.2 Materials and Frameworks
19.2.1 Materials and Datasets
The COVID CT scan dataset consists of 370 corona-positive cases and 398 corona-non-positive chest CT images. These CT pictures come in various sizes with respect to height (average = 491, maximum = 1853, and minimum = 153) and width (average = 383, maximum = 1485, and minimum = 124). The COVID-positive images were gathered from GitHub. The average age for the COVID-19 group was 45–65 years, and the group included 130 male patients and 65 female patients affected by COVID-19. A few patients' information is missing; this is because the data collection used in this examination does not come with complete metadata, since it is the very first freely available COVID-19 CT scan image collection and was created in a limited time. Some examples of corona-positive and corona-negative chest scan images are shown in Fig. 19.1. For pneumonia classification, we use pneumonia-affected patients' CT scans, which we procured from Kaggle. It is a dataset of 400 pneumonia-positive cases, which are compared head-to-head with COVID-19-positive cases' CT scans. Here too we had a properly balanced dataset with a binary classification model. To organize our final data collection for experiments, all the photos were converted into ".png".
19.2.2 Frameworks
Recent advances in the field of deep learning, particularly in the medical imaging area, demonstrate the potential use of different deep convolutional neural network architectures. Initially, the models trained upon were standard models, including VGG16, VGG19, Inception-V3, ResNet50, etc.; in this research work, we went with a model that contained a hybrid of multiple models working together to produce a more robust output that gives stronger results when working on real-time data. On top of the ReLU-activated layers, a single-node prediction layer with sigmoid activation is added. Apart from this baseline model, a decision fusion-based methodology is additionally considered (Fig. 19.2).
The main notion of this decision fusion method is that the individual models can be combined by consolidating their individual predictions through a majority voting technique, which can potentially enhance the overall effectiveness of the standard models (Fig. 19.3).
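A minimal sketch of this majority-voting decision fusion, combining binary predictions from several models (the per-model predictions below are made up for illustration, not the chapter's actual outputs):

```python
import numpy as np

def majority_vote(predictions):
    """Fuse per-model binary predictions (models x samples) by strict majority vote."""
    predictions = np.asarray(predictions)
    votes = predictions.sum(axis=0)                        # models voting "1" per sample
    return (votes * 2 > predictions.shape[0]).astype(int)  # 1 iff more than half agree

# Three models' binary predictions (1 = COVID positive) for five scans
p_vgg16 =    [1, 0, 1, 1, 0]
p_resnet18 = [1, 1, 1, 0, 0]
p_custom =   [0, 0, 1, 1, 1]
fused = majority_vote([p_vgg16, p_resnet18, p_custom])  # -> [1, 0, 1, 1, 0]
```

Each scan's fused label is the one at least two of the three models agree on, which is why a single model's mistake can be outvoted.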
The models we used in our research for transfer learning, with which we obtained good results, are introduced below, with a basic description of each model's working, architecture, and network flow.
VGG16 Architecture. VGG-Net is by far one of the most popularly known deep convolutional neural network models; it secured the first and second positions in the ILSVRC 2014 localization and classification tasks. In this architecture, the principal idea was that increasing the depth of the convolutional neural network and replacing large kernels with multiple smaller kernels was potentially more accurate for carrying out computer vision [16] tasks. VGG-Net variants are still widely utilized for many computer vision undertakings, for extracting deep image features and for further training, particularly in the medical imaging field.
Fig. 19.2 Transfer learning models
Fig. 19.3 Layer breakdown
Inception-V3 Architecture. In the Inception designs, the principal notion is that handling the extreme variability in the salient parts of images is feasible by permitting the network to include different sizes of kernels at the same level, which essentially "widens" the network. This possibility of multiple kernels at the same level is realized by modules called Inception modules. Later, the Inception-V2 and Inception-V3 structures were proposed; they improve over the Inception-V1 design by addressing representational bottlenecks and auxiliary classifiers, by employing factorized convolutions, and by adding batch normalization to the auxiliary classifiers. The Inception-V3 design was the first runner-up in the ILSVRC 2015 image classification task.
ResNet18 Architecture. The main thought in ResNet designs is that stacking layers, namely conv2D and pool2D layers, one on another can cause the network performance to degrade as a result of the vanishing gradient problem; to deal with this, identity shortcut connections can be used, which skip one or more layers. These arrangements of layers that contain identity connections are known as residual blocks. Adding skip connections eliminates the high training error that is ordinarily seen in a very deep design. ResNet18 is one of the variants of the ResNet design and contains 18 layers.
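The identity shortcut at the heart of a residual block can be sketched numerically: the output is the transformed input plus the untouched input, so the signal (and its gradient) can bypass the transformation. A toy NumPy illustration, not a full convolutional block:

```python
import numpy as np

def residual_block(x, transform):
    """y = ReLU(F(x) + x): the skip connection adds the input back unchanged."""
    return np.maximum(transform(x) + x, 0.0)

x = np.array([1.0, -2.0, 3.0])
# If the learned transform collapses to zero, the block reduces to ReLU(x):
identity_out = residual_block(x, lambda v: np.zeros_like(v))  # -> [1., 0., 3.]
# A non-trivial transform perturbs rather than replaces the signal:
scaled_out = residual_block(x, lambda v: 0.1 * v)             # -> [1.1, 0., 3.3]
```

Because the block can fall back to the identity mapping, adding more such blocks cannot easily make training error worse, which is the intuition behind very deep ResNets.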
DenseNet Architecture. The DenseNet architecture, proposed by Huang et al. [14], enhances the ResNet architecture by adding dense connections, which connect every layer to every other layer. It is a densely connected model in which each layer receives the feature maps of every preceding layer and passes its own feature map to every subsequent layer. The role of this design is to enable feature reuse while keeping the number of parameters low. There are numerous widely used variants of the DenseNet design, of which the DenseNet-201 architecture was also used in this research.
VGG19 Architecture. VGG19 is a variant of the VGG model which, in short, comprises a total of 19 weight layers: 16 convolution layers and 3 fully connected layers, along with 5 MaxPool layers and 1 SoftMax layer. VGG19 has approximately 19 billion FLOPs. VGG-Net variants are still widely utilized for many computer vision undertakings, for extracting deep image features and for further training, particularly in the medical imaging field.
19.3 Approach
The planned strategy was discussed previously; the approach comprises: (i) acquisition of CT images, (ii) extraction of features using a convolutional neural network, (iii) classification of images using XGBoost, and (iv) validation of results using metrics commonly utilized in CAD frameworks [16].
Image Acquisition. The coronavirus chest scans dataset comes from GitHub, where it is an open-source project to which people from different parts of the world can contribute scans of COVID-positive patients, and the maintainers make sure to always keep the dataset balanced with COVID-negative scans. There are two class groupings in the COVID-19 dataset. The dataset involves 745 images, of which 349 are COVID-19-positive cases and 397 are COVID-19-negative cases. Example images from these corona chest scans will be used for our research purposes.
This work has used the above dataset to obtain more precise results (Fig. 19.4). The quality of the images has been preserved. After extracting the data from the structure of the images, the captions related to the images were identified. The selection of CT images was done manually. Then, the caption or text related to every CT image was read for classification into COVID-19 and non-COVID-19 [1] (Fig. 19.5).
Feature Extraction. Feature extraction is the main step in the development of an automatic image classification framework [18]. The performance of the classification can be affected by the quality of the extracted information, possibly leading to lost performance by the framework. Lately, deep pre-trained models have been suggested for feature extraction. A convolutional neural network is a deep learning model that has hierarchical structures of learned features of high quality in its layers. Convolutional neural networks can diminish network complexity and parameter counts through local receptive fields, operation sharing, and weight sharing.
Fig. 19.4 Architectural flow design
Fig. 19.5 CNN-SVM model layer architecture
The adjustment of the convolution kernels is done by backpropagation [15], which depends on the stochastic gradient descent algorithm, used to lessen the gap between the network output and the training labels.
A convolutional neural network comprises alternating layers of convolution and subsampling, changing into fully connected layers when moving toward the output layer. Figure 19.3 shows how a convolutional neural network is designed. To exploit a convolutional neural network as a feature extractor, the remaining fully connected layers in the network are removed or simply ignored, and the final output of the new model, which we trained with our own data, is used as the features that describe the input image. All the transfer learning models maintain this standard architecture; it involves replacing the pre-trained network part for the different models. Figure 19.6 represents this.
Convolutional neural networks can extract commonly helpful information features, and can distinguish and eliminate input redundancies, saving just the fundamental parts of the information in powerful and discriminative representations [19]. Their semi-connected and fully connected layers give a sensible environment for advancing the training and learning process [20]. In this way, the convolution layers serve as a productive feature extractor, specialized in diminishing the size of the information and delivering a less-redundant data collection.
Preprocessing with OpenCV. The lung CT scan is transformed using a stack of OpenCV techniques, including CLAHE, adjusted log, and rank equalization, for better detection of diseases by our model. Our models, with the help of the pre-trained models, will try to classify the CT scans into the COVID-positive class or the COVID-negative class (Fig. 19.7).
Fig. 19.6 TL model layer architecture
Fig. 19.7 OpenCV processing
CLAHE gives the image some extra contrast, making blacks blacker and whites whiter. The adjusted log helps enhance the image with a little extra brightness, bringing out the suppressed parts of the image that were compressed after we applied CLAHE.
Finally, the rank equalizer gives the input image some final touches, as it helps bring out all the small patterns or light structures in the image and enhances them so that those small formations are easily caught by our deep learning model.
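The chapter applies these steps through OpenCV; to make the operations concrete, here is a plain-NumPy sketch of two of them — an adjusted-log transform that lifts dark regions, and global histogram equalization that spreads the intensity distribution (CLAHE adds tiling and contrast clipping on top of the latter). The synthetic `scan` array is a stand-in for a real CT slice:

```python
import numpy as np

def adjusted_log(img):
    """Map intensities with s = c * log(1 + r), scaled back into 0..255."""
    img = img.astype(np.float64)
    c = 255.0 / np.log(1.0 + img.max())
    return (c * np.log1p(img)).astype(np.uint8)

def hist_equalize(img):
    """Global histogram equalization via the cumulative distribution function."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255.0 / (cdf.max() - cdf.min())
    return cdf[img].astype(np.uint8)

scan = np.tile(np.arange(0, 256, 4, dtype=np.uint8), (8, 1))  # stand-in "CT slice"
enhanced = hist_equalize(adjusted_log(scan))
```

After equalization the output stretches over the full 0–255 range, which is what makes faint lung structures easier for the downstream model to pick up.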
Adding Regularization to Images. During our research, we found that the models were heavily overfitting due to limited data; we also observed that the models were memorizing the image patterns and formations, so we had to use some form of image regularization. Well-known regularization methods include L1 (Lasso) regularization, L2 (ridge) regularization, and hybrids of both. But when it comes to images, it is better to use random image patching, or random image hiding, where we hide some part of the image and then send it as the input to our model. Every time, a random patch is generated and a random position is patched. This helps our model avoid memorizing parts and structures, forcing it to become more generalized, handle real-time data well, and not fail on unseen data. In this way we add a little bias to the model and reduce the already high variance of the overfitting model (Fig. 19.8).
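The random patching described above, closely related to the cutout/random-erasing augmentations, can be sketched as follows; the patch size and fill value are illustrative choices:

```python
import numpy as np

def random_patch_mask(img, patch=8, fill=0.0, rng=None):
    """Hide one randomly positioned patch x patch square so the model cannot
    memorize any single local structure; returns a masked copy."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[top:top + patch, left:left + patch] = fill
    return out

scan = np.ones((32, 32), dtype=np.float32)
masked = random_patch_mask(scan, patch=8, rng=np.random.default_rng(0))
```

Because a fresh position is drawn for every training pass, no single region of the scan is reliably visible, which is exactly the effect used here to curb memorization.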
Classification. Classification consists of recognizing to which of a set of categories a new observation belongs, given previous training on a data set containing observations whose categories are known [21]. In AI, tree development is a highly successful and widely utilized technique. Our model, with the help of the pre-trained models, will try to classify the CT scans into the COVID-positive class or the COVID-negative class. The images that gave the best outcomes in the ConvNet experiments and statistical measurement tests, which were the original images, were compared against the pre-trained networks referenced in the previous section.
Fig. 19.8 Random masking
Validation and Results. To validate the model, we use statistical assessment metrics commonly utilized in the literature. These metrics are computed from the confusion matrix; given the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), they can be expressed numerically as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (19.1)
Precision = TP / (TP + FP)   (19.2)
Recall = TP / (TP + FN)   (19.3)
F-Score = (2 × Recall × Precision) / (Recall + Precision)   (19.4)
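Equations (19.1)–(19.4) translate directly into code; the following small sketch computes all four metrics from confusion-matrix counts (the counts are made-up numbers, not the chapter's results):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

m = confusion_metrics(tp=90, tn=85, fp=10, fn=15)
# accuracy = 175/200 = 0.875, precision = 0.9, recall = 90/105 ≈ 0.857
```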
The results show accuracy for the suggested models of 97.24% for VGG16, 95.89% for VGG19, 93.15% for Inception-V3, 94.52% for SeResNet18, 87.67% for MobileNet, and 90.42% for ResNet18, while the best custom model gave 94.52%.
Also, our pneumonia-versus-COVID classifier gave a 95.65% validation score. We notice that the transfer learning models perform better and give higher accuracy than plain CNN models. VGG16 has the best performance with nearly 98% accuracy, closely followed by VGG19 at 96% and then our custom CNN model at 94%. Also, our pneumonia classification model gave 95.65% accuracy on the validation dataset. We took the pneumonia dataset from Kaggle and used it against our COVID-positive cases dataset for the classification part. Both datasets were balanced, and the models were kept with similar hyperparameter tuning (Figs. 19.9 and 19.10).
Fig. 19.9 Genome data visualization
Fig. 19.10 Vaccination forecasting
19.4 Conclusion and Future Work
In this paper, AI-based deep learning models have been worked upon, and pre-trained models show better and markedly more efficient performance and accuracy than handmade CNN models. Preprocessing techniques using OpenCV have been used for the better working of our model. Also, pneumonia cases are distinguished from usual COVID-19 cases, which was a serious concern in all previous research papers. Various modeling approaches are applied to the case studies, and experimentation has produced good performance and precise outcomes. Future work emerges from these examinations: deep convolutional neural networks along with pre-trained models show better and markedly more efficient model performance and accuracy.
Additionally, a decision fusion-based methodology is likewise proposed, which consolidates the predictions of each of the individual deep convolutional neural network models to improve predictive performance. Coronavirus is a worldwide issue; it not only hugely affects the health of residents but also the worldwide economy, and we show how to recognize it using lung CT scan images from deliberately chosen data of lung CT scans of COVID-19-infected patients from around the world.
Besides, even though the proposed technique shows extraordinary promise, there is still much room for potentially improving the predictive performance of the approach. Ideas such as image augmentation and transfer learning can be incorporated as part of future enhancements; these ideas need to be investigated as part of future work [16].
References
1. Carvalho, E. D., Carvalho, E. D., de Carvalho Filho, A. O., de Araújo, F. H. D., & Andrade Lira Rabêlo, R. D. (2020). Diagnosis of COVID-19 in CT image using CNN and XGBoost. In IEEE Symposium on Computers and Communications (ISCC), Rennes, France (pp. 1–6). https://doi.org/10.1109/ISCC50000.2020.9219726
2. Sekeroglu, B., & Ozsahin, I. (2020). Detection of COVID-19 from chest X-ray images using convolutional neural networks. SLAS Technology: Translating Life Sciences Innovation. https://doi.org/10.1177/2472630320958376
3. Shuja, J., Alanazi, E., Alasmary, W., et al. (2020). COVID-19 open source data sets: A comprehensive survey. Applied Intelligence. https://doi.org/10.1007/s10489-020-01862-6
4. Carvalho, E. D., de Carvalho Filho, A. O., de Sousa, A. D., Silva, A. C., & Gattass, M. (2018).
Method of differentiation of benign and malignant masses in digital mammograms using texture
analysis based on phylogenetic diversity. Computers Electrical Engineering, 67, 210–222.
5. de Carvalho, A. S. V., Jr., Carvalho, E. D., de Carvalho Filho, A. O., de Sousa, A. D., Silva,
A. C., & Gattass, M. (2018). Automatic methods for diagnosis of glaucoma using texture
descriptors based on phylogenetic diversity. Computers Electrical Engineering, 71, 102–114.
6. He, X., Yang, X., Zhang, S., Zhao, J., Zhang, Y., Xing, E., & Xie, P. (2020). Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. medRxiv.
7. Abbas, A., Abdelsamea, M., & Gaber, M. (2020). Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. medRxiv.
8. Zhao, J., Zhang, Y., He, X., & Xie, P. (2020). COVID-CT-dataset: A CT scan dataset about COVID-19.
9. Narin, A., Kaya, C., & Pamuk, Z. (2020). Automatic detection of Coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks.
10. Carvalho, E. D., Filho, A. O., Silva, R. R., Araujo, F. H., Diniz, J. O., Silva, A. C., Paiva, A. C., & Gattass, M. (2020). Breast cancer diagnosis from histopathological images using textural features and CBIR. Artificial Intelligence in Medicine, 105, 101845.
11. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. https://arxiv.org/abs/1409.1556
12. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–
778). IEEE.
13. Szegedy, C., Liu, W., Jia, Y., et al. (2015). Going deeper with convolutions. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition.
14. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected
convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (pp. 4700–4708). IEEE.
15. Zipser, D. & Andersen, R. (1988). A back-propagation programmed network that simulates
response properties of a subset of posterior parietal neurons. Nature, 331(6158), 679–684.
[Online]. https://doi.org/10.1038/331679a0
16. Mishra, A. K., Das, S. K., Roy, P., & Bandyopadhyay, S. (2020). Identifying COVID-19 from chest CT images: A deep convolutional neural networks based approach. Journal of Healthcare Engineering, 2020, 8843664. https://doi.org/10.1155/2020/8843664. PMID: 32832047; PMCID: PMC7424536
17. Ripley, B. D. (1996). Pattern recognition and neural networks. Cambridge University Press.
18. Ren, X., Guo, H., Li, S., Wang, S., & Li, J. (2017). A novel image classification method with
CNN-XGboost model (pp. 378–390).
19. Kothandaraman, D., Balasundaram, A., Dhanalakshmi, R., Sivaraman, A. K., Ashokkumar,
S., et al. (2022). Energy and bandwidth based link stability routing algorithm for IoT. CMC-
Computers, Materials & Continua, 70(2), 3875–3890.
20. Balasundaram, A., Dilip, G., Manickam, M., Sivaraman, A. K., Gurunathan, K., et al. (2022).
Abnormality Identification in video surveillance system using DCT. Intelligent Automation &
Soft Computing, 32(2), 693–704.
21. Arunachalam, P., Janakiraman, N., Sivaraman, A. K., Balasundaram, A., Vincent, R., et al.
(2022). Synovial sarcoma classification technique using support vector machine and structure
features. Intelligent Automation & Soft Computing, 32(2), 1241–1259.
22. Masci, J., Meier, U., Cire¸san, D., & Schmidhuber, J. (2011). Stacked convolutional auto-
encoders for hierarchical feature extraction. In T. Honkela, W. Duch, M. Girolami, & S. Kaski,
(Eds.), Artificial neural networks and machine learning—ICANN (pp. 52–59). Springer Berlin
Heidelberg.
23. Liu, Y. (2018). Feature extraction and image recognition with convolutional neural networks.
Journal of Physics: Conference Series, 1087, 062032.
Chapter 20
Location Tracking via Bluetooth
Jasthi Siva Sai, Mukkamala Namitha, Routhu Ramya Dedeepya,
Mulugu Suma Anusha, Angadi Lakshmi, and Mukesh Chinta
Abstract The inclination to forget and misplace things gradually increases with
aging, which makes it difficult for elderly people to remember and locate objects
without assistance. In the present era, it is quite common for people of all age groups
to misplace frequently used objects, for instance, vehicle keys, glasses, wallets, and
so forth. Finding misplaced objects is usually time-consuming and stressful; if the
situation is prolonged, it can cause an individual to miss deadlines and struggle to
manage daily activities. The proposed application reduces the effort of finding
misplaced objects and supports elderly people in managing their activities. Our
application, titled I AM HERE, works using Bluetooth technology and an iTag, and
can track the location of objects using just a mobile phone. The I AM HERE
application mainly focuses on identifying the location of misplaced objects. An iTag
is attached to objects that are used regularly and are commonly misplaced. The iTags
are identified individually by the I AM HERE application using Bluetooth technology.
The mobile application is developed on the Kodular platform, which is an open-source
tool. The application mainly targets elderly people who rely on others for everyday
activities, enabling them to identify the location of their objects by using a smart
mobile phone.
20.1 Introduction
The pandemic has had a huge impact on people's lives. The way of living has been
altered to a great extent; in order to stay healthy, people are advised to follow
social distancing protocols. This situation has created turbulence in people's
lifestyles, especially for the elderly, as they mostly depend on supporters to
perform their day-to-day activities. As the tendency to misplace and forget objects is
usually high in elderly people, it would be difficult for them to cope with the present
scenario. Our main motive is to reduce the dependency on people and to support
J. S. Sai (B)·M. Namitha ·R. Ramya Dedeepya ·M. S. Anusha ·A. Lakshmi ·M. Chinta
VR Siddhartha Engineering College, Vijayawada, India
e-mail: jasthi.sivasai@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_20
people of different age groups. Our application I AM HERE can track various
objects that are heavily used in daily activities and are usually misplaced.
Bluetooth technology is prominently used for connecting devices within a small
range. Using Bluetooth technology and an iTag, our application is developed on the
Kodular platform. The I AM HERE application can be used to find the approximate
location of objects within a certain range. In today's era, mobile applications are
utilized to a significant extent. According to a recent study from the Pew Research
Center, among seniors aged 65 and above, about 85% own a cell phone, and of those,
46% use a smartphone. As smartphone usage among seniors is considerable, the mobile
application interacts with them and assists in finding the location of a specific
object. Bluetooth tags are highly advantageous, as iTags can withstand harsh
environments and are of low cost. Hence, iTags can be utilized for a long period
and require little maintenance.
20.2 Related Work
Kanyanee Phutcharoen, Monchai Chamchoy, and Pichaya Supanakoon proposed a
study on improving the accuracy of indoor positioning by utilizing Bluetooth
low-energy beacons. These beacons were placed inside an indoor environment and
connected to electronic devices running a beacon analyzer application to measure
the strength of the RSS signals from each device. The position of the user
equipment is obtained by utilizing the fingerprinting technique with least-RMS-error
matching. The accuracy is obtained from each measurement, and five such
measurements were taken into consideration. The cumulative distribution function of
the distance error is found, and the average of the considered measurements is
inferred. From the obtained results, it is concluded that averaging the measurements
reduces the error in positioning objects in an indoor environment [1]. The work of
Hannah, Francis, and Tan Shunhua involved designing an Arduino-based device that
can detect and track, in real time, any items (premium/important for the user)
that a user is most likely to keep close by, i.e., not more than 10 m away at any
time. Bluetooth technology is used to maintain a connection between the user's
smartphone and the device at all times. When the system detects a disconnection
between the device and the phone, the user's device receives a call along with a
warning message including the current location, alerting the user of the
disconnection. From then on, newly updated coordinates are received on the phone
every 1000 ms, which helps keep track of the device even if its location has changed
[2]. Lu Bai and Fabio Ciravegna proposed an indoor positioning system based on
Bluetooth low energy (BLE) to monitor the daily lifestyle of senior citizens or people
with disabilities. The proposed detection system consists of several sensors
established at different positions in a domestic area. With BLE beacons attached, and
based on the captured raw received signal strength indicator, the user's specific
location within the building is captured. For determining the indoor location, a
trilateration-based and a fingerprint-based method are suggested. To verify and test
the system's performance, simulations were done in different environments. Based on
the results, it has been observed that accurate information on the user's location
can be obtained, allowing the user's lifestyle to be tracked, through which the
user's health condition can be monitored and judged. The work also demonstrated
that the placement location of the BLE beacons and the quality of the beacons do
not have a huge impact on accuracy [3].
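The least-RMS-error fingerprint matching used in [1] can be sketched in a few lines. This is only an illustrative sketch; the reference points, beacon count, and RSS values below are invented for illustration and do not come from the cited work.

```python
import math

def rms_error(measured, fingerprint):
    """Root-mean-square difference between a measured RSS vector
    and a stored fingerprint (one RSS value per beacon)."""
    return math.sqrt(
        sum((m - f) ** 2 for m, f in zip(measured, fingerprint)) / len(measured)
    )

def locate(measured, fingerprint_db):
    """Return the reference point whose stored fingerprint has the
    least RMS error against the measured RSS vector."""
    return min(fingerprint_db, key=lambda p: rms_error(measured, fingerprint_db[p]))

# Hypothetical radio map: RSS (dBm) from three beacons at two reference points.
radio_map = {
    "hallway": [-60, -75, -80],
    "kitchen": [-82, -58, -70],
}
print(locate([-61, -74, -79], radio_map))  # closest to the "hallway" fingerprint
```

Averaging several such measurements before matching, as done in [1], further reduces the positioning error.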
20.3 Methodology
This section presents the proposed system, which consists of the following stages.
Initialization of the mobile application.
Navigation to the second screen.
Initializing the iTags.
Connection of the required iTag.
Finding the required object.
A. Initialization of the mobile application
The name of the mobile application we developed is "I AM HERE". To
initialize the mobile application, simply open it; the initial screen of the
application is displayed. The user also needs to enable Bluetooth on their
smartphone (Fig. 20.1).
B. Navigation to the second screen
In order to navigate to the second screen, the user clicks the "go" button
widget. Since the entire activity takes place on the second screen, the user needs
to navigate to it.
C. Initializing the iTags
Now, the iTags are attached to the frequently lost or misplaced objects
and initialized by giving them a long press of approximately three seconds.
D. Connection of the required iTag
All the iTags that fall within the smartphone's range of reachability,
that is, all the iTags that can be detected by the smartphone, are displayed
on the screen. The user selects the particular iTag he or she intends to find
by simply clicking on the detected iTag. After this step, a connection is
established between the selected iTag and the mobile application.
E. Finding the required object
Now, in order to find the required object, the user clicks the "alarm" button
widget. This results in the generation of a beep sound from the iTag attached to the
object. The user then reaches the object by moving in the direction of the sound.
The alarm button widget is clicked again to stop the beep sound (Fig. 20.2).
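The connect/alarm flow of steps D and E can be summarized as a small state machine. This is only an illustrative Python sketch with hypothetical names; the actual app is assembled from Kodular blocks rather than written in code.

```python
class ITagController:
    """Illustrative sketch of the second-screen logic described above
    (hypothetical class and method names)."""

    def __init__(self):
        self.connected_tag = None
        self.alarm_on = False

    def connect(self, tag_mac):
        # In the app, the user taps a detected iTag and then "connect".
        self.connected_tag = tag_mac
        return "connected"          # connect button turns green

    def toggle_alarm(self):
        if self.connected_tag is None:
            raise RuntimeError("no iTag connected")
        self.alarm_on = not self.alarm_on
        # Red while the iTag beeps, blue once the object is found.
        return "red/beeping" if self.alarm_on else "blue/silent"

ctl = ITagController()
print(ctl.connect("AA:BB:CC:DD:EE:FF"))  # connected
print(ctl.toggle_alarm())                # red/beeping
print(ctl.toggle_alarm())                # blue/silent
```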
Fig. 20.1 Flowchart representing the proposed methodology
20.4 Implementation
20.4.1 System Components
Different components are used in the application: Kodular, Bluetooth
technology, and the iTag.
Kodular Kodular allows the development of Android applications by
providing modules that require no coding skills. Kodular offers features as compo-
nents and blocks through which interactive applications can be developed. Material
Design UI is a Kodular module that enhances the user interface. In addition,
Fig. 20.2 Proposed system diagram
Kodular enables sharing and distribution of applications on a free online app store
that contains apps developed using Kodular. The IDE is a service that permits users
to create extensions and distribute apps. The status page specifies which services
are available for usage and which are under maintenance. In the Kodular community,
users interact, generate, and refine ideas. Projects developed on Kodular can be
viewed and stored using My Kodular, the control panel of a Kodular account. The
"I AM HERE" application is developed on Kodular.
Bluetooth Bluetooth is a prominent wireless technology that enables the transfer
of information between devices. Bluetooth uses UHF radio waves in the ISM bands,
ranging from 2.402 to 2.48 GHz, and can be attained by embedding low-cost
transceivers within devices. The Bluetooth connectivity range is about 30 ft.
Bluetooth consumes low power and is cost-effective. Bluetooth sends and receives
signals on 79 different channels centered around 2.4 GHz. Bluetooth devices connect
automatically upon detection and allow a minimum of two and a maximum of eight
devices to communicate at once, of which one device acts as the master and the
others as slaves. The master device initiates the communication and controls the
traffic between itself and the connected slave devices; the slave devices respond
to the master. The device roles can be switched between master and slave. The master
device's Bluetooth device address (BD_ADDR) determines the frequency-hopping
sequence.
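As an illustration of address-seeded hopping, the sketch below derives a deterministic pseudorandom sequence over the 79 channels from a BD_ADDR string. Note this is a conceptual sketch only: real Bluetooth derives the hop sequence from the master's address and clock via an algorithm defined in the specification, not a general-purpose PRNG.

```python
import random

def hop_sequence(bd_addr: str, hops: int):
    """Illustrative frequency-hopping sketch: the same master address
    always yields the same deterministic sequence of channels 0..78."""
    rng = random.Random(bd_addr)  # seed with the master's BD_ADDR
    return [rng.randrange(79) for _ in range(hops)]

seq = hop_sequence("00:1A:7D:DA:71:13", 5)
print(seq)
# The sequence is reproducible from the same address:
assert seq == hop_sequence("00:1A:7D:DA:71:13", 5)
```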
iTag The iTag is a Bluetooth 4.0, low-energy product that functions through the
"I AM HERE" application. An iTag can be attached to the things we usually lose or
tend to misplace, and it works with a smartphone to prevent loss. In addition, an
iTag can provide a last-seen pin drop on a map to assist in recovering items. It is
compact, consumes little energy, supports Bluetooth 4.0, and has an effective range
of up to 25 m. It functions like an anti-lost alarm (Fig. 20.3).
Brand and model: iTag.
Compatible mobile phones: any iPhone from the 4th generation onward and other
Apple handheld devices (touch, Air, iPad); any Android device with Bluetooth 4.0
(Android 4.3 and upgraded versions).
Fig. 20.3 iTag
Battery: CR2032 lithium coin battery.
Standby time: six months.
20.4.2 Application Implementation
The home page is displayed when the application is initialized. The go button is
used to navigate to the other screen, where the actual activity takes place, as
shown in Fig. 20.4.
The detected iTags are displayed on the screen along with their MAC addresses, as
shown in Fig. 20.5.
To establish a connection with an iTag, the required iTag is selected, and then
the connect button is clicked. After that, a message "connecting..." is displayed
on the screen, as shown in Fig. 20.6.
Once the iTag has been connected successfully, the connect button changes its color
to green and its text to "connected", indicating that the connection has been
established and the iTag is ready to be utilized, as shown in Fig. 20.7.
To find the object's location, the alarm widget is clicked. When the alarm widget
is clicked, its color changes to red, and the iTag begins to produce a sound
continuously. This helps the user find the object, as shown in Fig. 20.8.
The beep sound is produced to track the location of the object; when the object is
found, the user can tap the alarm widget again, after which the iTag stops producing
the sound and the color of the alarm widget changes to blue, indicating that the
required object has been found, as shown in Fig. 20.9.
Fig. 20.4 Application home screen
20.5 Conclusion
The application mainly focuses on identifying lost items. The proposed appli-
cation helps in tracking misplaced objects through Bluetooth technology via a
mobile app, with the help of iTags attached to these items. The functionality of
the application can be extended as per the requirements of the client. The tags and
the app were donated to one of the old age homes, and we received satisfactory
feedback from our clients. The beep sound is clearly audible, making it quite easy
to locate the objects. Further, this application can be improved by including the
location of the object, so that the object's location is displayed on a map, making
it much easier to find. Voice recognition features can also be implemented to
connect the iTags by recognizing their names through user instructions. This is
specifically helpful for those who are not comfortable using a keypad or a touch
interface.
Fig. 20.5 Application detected iTag
Fig. 20.6 Connecting an iTag
Fig. 20.7 iTag connected
Fig. 20.8 Selecting alarm button
Fig. 20.9 Sound produced
References
1. Phutcharoen, K., Chamchoy, M., & Supanakoon, P. (2020). Accuracy study of indoor posi-
tioning with Bluetooth low energy beacons. In 2020 Joint International Conference on Digital
Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics,
Computer and Telecommunications Engineering (ECTI DAMT & NCON) (pp. 24–27).
2. Adjei, H. A. S., Oduro-Gyimah, F. K., Shunhua, T., Agordzo, G. K., & Musariri, M. (2020).
Developing a Bluetooth based tracking system for tracking devices using Arduino. In 2020
5th International Conference on Computing, Communication and Security (ICCCS) (pp. 1–5).
3. Bai, L., Ciravegna, F., Bond, R., & Mulvenna, M. (2020). A low cost indoor positioning system
using Bluetooth low energy. IEEE Access, 8, 136858–136871.
4. Bapat, A. C., & Nimbhorkar, S. U. (2016). Designing RFID based object tracking system by
applying multilevel security. In International Conference on Wireless Communications, Signal
Processing and Networking (WiSPNET).
5. Samual, J. (2015). Implementation of GPS based object location and route tracking on Android
device. International Journal of Information System and Engineering, 3(2). ISSN: 2289-7615.
6. Gaikwad, P. V., & Kalshetty, Y. R. (2017). Bluetooth based smart automation system using
Android. International Journal of New Innovations in Engineering and Technology.
7. Ahmad, S., Rouyu, L., & Hussain, M. J. (2014). Never lose! Smart phone based personal
tracking via Bluetooth. International Journal of Academic Research in Business and Social
Sciences, 4(3). ISSN: 2222-6990.
8. Bisio, I., Sciarrone, A., & Zappatore, S. (2015). Asset tracking architecture with Bluetooth low
energy tags and ad hoc smartphone applications. In European Conference on Networks and
Communications (EuCNC).
9. Tiwari, M., & Singhai, R. (2017). A review of detection and tracking of object from image
and video sequences. International Journal of Computational Intelligence Research, 13(5),
745–765. ISSN: 0973-1873.
10. Coustasse, A., Tomblin, S., & Slack, C. (2013). Impact of radio-frequency identification
(RFID) technologies on the hospital supply chain: A literature review. Perspectives in Health
Information Management, 10, 1d.
11. Friesen, M. R., & McLeod, R. D. (2015). International Journal of Intelligent Transportation
Systems Research, 13(3), 143–153.
12. Kalaiselvi, K., & Karunya, S. (2017). Tracking system—A proposed model on literature
review. In 2017 International Conference on Inventive Computing and Informatics (ICICI)
(pp. 766–770).
Chapter 21
Shrimp Surfacing Recognition System
in the Pond Using Deep Computer Vision
Gadhiraju Tej Varma and Sri Krishna Adusumalli
Abstract Shrimp productivity has greatly increased and has high economic value,
along with its impact on the GDP of our country. Considering this impact on
farmers, this paper proposes a convolutional neural network model that assists the
farmer in understanding the symptoms that shrimp exhibit during critical
conditions such as viral infections or a decrease in dissolved oxygen levels in the
water. The proposed system also synthesizes a set of images using image data
augmentation techniques to obtain a considerably sized image set. These images
are used as the training images for the model. The custom shrimp detection system
uses the faster_rcnn_inception_v2_coco model, which effectively detects shrimp and
marks them with bounding boxes. Whenever surfacing-like symptoms are
exhibited, the farmer is alerted to identify the risk factor and take counter-
measures.
21.1 Introduction
Indian aquaculture and fisheries are not only an important source of nutritional nour-
ishment for the people but also contribute to the economic and financial well-being
of around 28 million people. India is also the third largest producer and contributes
7.7% of global fish products. An area of 2.36 million ha of tanks and ponds supports
culture-based fisheries that contribute to total fish production. The current produc-
tion is around 8.5 million MT and is targeted to reach 13.5 million MT. The National
Fisheries Policy 2020 states the goal of optimizing the capture and culture
fisheries potential of our country, which in turn contributes to the economic well-
being of the country as well as the farmer [1]. More than 50 different varieties of
G. Tej Varma (B)
Department of Computer Science and Engineering, Centurion University of Technology and
Management [CUTM], Vizianagaram, Andhra Pradesh 535003, India
e-mail: tejvarma.varma@gmail.com
S. K. Adusumalli
Department of AI, Shri Vishnu Engineering College for Women (A), Bhimavaram, West
Godavari, Andhra Pradesh 534202, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_21
shrimp and fish are cultivated and exported to different countries, prominently
including tiger shrimp (Penaeus monodon), Pacific whiteleg shrimp (Penaeus
vannamei), Indian white shrimp (Penaeus indicus), etc. During the year 2020–21, the
export of frozen shrimp and fish was worth Rs. 36,946.48 crores in value. In addition
to these numbers, a target of 5 million tonnes of productivity has been set for the
year 2022 [2]. In spite of these remarkable numbers, the gray area is the influence
of disease and its spread, which causes huge losses to the farmer.
One of the major products of aquaculture is shrimp, and the cultivation dura-
tion for each crop ranges from 3 to 4 months, depending entirely on the
geographical location and the soil where the shrimp is grown. A lot of effort
and investment is involved, ranging from 5 to 10 lakhs per acre. The water
parameters that influence shrimp health and productivity are the dissolved
oxygen, pH, ammonia, and minerals present in the water. Apart from these parame-
ters, there are various other influences that affect the health of the shrimp,
including virus attacks. The most common early symptom of such virus attacks or
a lack of dissolved oxygen is that the shrimp reach the surface of the water.
The table below gives a brief overview of different viral, bacterial, and fungal
infections of shrimp and the clinical signs or symptoms that the shrimp exhibit
during the affected period [3] (Table 21.1).
These symptoms require close monitoring by the farmer to understand the
condition of the shrimp, which becomes a tedious job even when CCTV cameras are
fitted around the pond. In this context, we have developed a new solution for
monitoring the pond with the help of CCTV cameras and an image recognition model.
Table 21.1 Various viral or bacterial infections and symptoms for the shrimp

Names of viral or bacterial infections: Clinical signs or symptoms
Hepatopancreatic parvo-like virus (HPV): reduced feed, poor growth, body surfacing
Yellow head disease (YHD): appears swimming slowly near the surface of the pond
Luminous vibriosis: vertical swimming behavior
Vibriosis: corkscrew swimming behavior; appears at the edge of the pond
Soft shell syndrome: appears toward the edge of the ponds
Decrease in dissolved oxygen (4–10 ppm): appears on the surface of the water when DO is less than 4 ppm
21.2 Literature Survey
There is a lot of research taking place in deep learning and image
processing along with IoT. Dah-Jye Lee, Guangming Xiong, Robert M. Lace, and
Dong Zhang describe grading and shrimp quality detection in the shrimp shipping
process, which also detects whether a whole shrimp body or a broken part of a shrimp
is present [4]. Dmitry A. Konovalov, Alzayat Saleh, Dina B. Efremova, Jose A.
Domingos, and Dean R. Jerry describe a system that can automate the weighing of
harvested fish, trained with 100 whole fish and 200 defective fish without tails or
fins [5]. Zihao Liu, Fang Cheng, and Wei Zhang used a deep learning technique to
identify soft shells in shrimps as they pass along a conveyor belt; they
experimented with images of 200 shrimps with soft shells and 200 shrimps in good
condition and developed a neural network model for automatically identifying
shrimps with soft or loose shells [6]. Weber, V. A. M., Weber, F. L., Gomes, R. C.,
Oliveira Junior, A. S., Menezes, G. V., Abreu, U. G. P., Belete, N. A. S., and
Pistori, H. proposed a model for identifying the body weight of cattle by
experimenting on 34 cattle; using the dorsum image and lateral body area, features
such as hip width, rib height, and other parameters are used to identify the weight
of the cattle [7].
Wu-Chih Hu, Hsin-Te Wu, Yi Fan Zhang, Shu Huan Zhang, and Chun Hung Lo
describe a two convolutional neural network architecture with fully connected
layers, which can successfully detect six different types of shrimp with 95.48%
accuracy [8]. Takehiro Morimoto, Thi Thi Zin, and Toshiaki Itami discuss the
detection of infected shrimp through various abnormal behaviors such as low food
intake, appearing in shallow waters, or making sudden movements inside the water,
but this may not be effective for large ponds with high shrimp density [9]. Joseph
Lemley, Shabab Bazrafkan, and Peter Corcoran propose a new smart augmentation
technique that showed a positive sign of increasing accuracy and reducing network
losses by creating a network that learns while creating augmented data [10]. Ryo
Takahashi, Takashi Matsubara, and Kuniaki Uehara propose a new image data
augmentation method for deep convolutional neural networks called random image
cropping and patching (RICAP), in which a new image is created by taking portions
of several images [11].
21.3 Proposed Work
The figure below gives an architectural overview of the proposed system. The
deep learning architecture model is broadly divided into two modules: one module
synthesizes the required image data set through image data augmentation, and the
other module uses convolutional neural networks to train the machine to detect
shrimp on the surface of the water (Fig. 21.1).
Fig. 21.1 Block diagram of proposed system
21.3.1 Image Data Augmentation
As mentioned earlier, the first module creates the image data set. It is a complex
job to capture images of shrimp on the surface of the water, so we managed to
collect a few images from a nearby pond while the shrimp were exhibiting such symp-
toms. Unfortunately, it is not possible to train the machine using convolutional
neural networks with such a small data set. In order to amplify the number of
images, we made use of data augmentation techniques. Data augmentation is a
technique used to artificially create new training data from an existing small set
of training data. Image data augmentation is the most renowned data augmentation
approach; it applies various transformations to an image to create a new artificial
data set. The figure above depicts the various image augmentation techniques. We
make use of geometric transformations, which are among the basic image manipula-
tion techniques. The original captured image has dimensions of 4032 × 3024, and
the synthesized image generated is also 4032 × 3024.
Fig. 21.2 Before (left side) and after (right side) shift augmentation
21.3.2 Horizontal and Vertical Shift Augmentation
While keeping the image dimensions as same, this shifts the pixel of the image in a
either horizontal or vertical direction. A part of the image is been clipped, and the
new values of the replaced pixels are same the last pixel value in the row or column,
respectively. The left part of the image is the original, and right is after transformation
(Fig. 21.2).
21.3.3 Horizontal and Vertical Flip Augmentation
In this case of image augmentation, the pixels of the image are flipped with
respect to the horizontal or vertical axis (Fig. 21.3).
Fig. 21.3 Before (left side) and after (right side) flip augmentation
Fig. 21.4 Before (left side) and after (right side) random rotation augmentation
21.3.4 Random Rotation Augmentation
The image transformation undergoes a rotation of pixels with some rotation angle
which is ranging from 0 to 360 ° (Fig. 21.4).
In each of the augmentation techniques, the left image indicates the original image,
and right image indicates an example of an image with augmentation techniques. In
total from the available set of 20 original images, each image is further synthesized
to 50 images, and we all put together which brings up the image count to a 1000
images.
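The shift, flip, and rotation transformations described above can be illustrated on a tiny pixel grid. This pure-Python sketch mirrors the clipping behavior described for shift augmentation (replaced pixels take the last pixel value in the row); it is an illustration only, not the library code used in the actual pipeline.

```python
def hshift(img, dx):
    """Horizontal shift: clipped pixels are replaced by each row's last value."""
    return [row[dx:] + [row[-1]] * dx for row in img]

def hflip(img):
    """Horizontal flip: reverse each row of pixels."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the pixel grid 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hshift(img, 1))  # [[2, 3, 3], [5, 6, 6]]
print(hflip(img))      # [[3, 2, 1], [6, 5, 4]]
print(rot90(img))      # [[4, 1], [5, 2], [6, 3]]
```

In practice, each 4032 × 3024 image is processed the same way, just with far more pixels; applying several such transformations per image is how 20 originals grow to 1000 training images.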
Module 2: Custom object detection for shrimp detection.
The second module is a customized object detection model using deep convo-
lutional neural network techniques, in which we use image processing along with
deep learning to train the system for shrimp detection. The first step involves the
installation of libraries in Anaconda and the dependent model from GitHub.
Following the installation process, it is important to annotate the images produced
by image data augmentation with the help of the labelImg package, which automates
the process and generates a .xml file for each annotated image. A green rectangular
box marks each annotated shrimp surfacing on the water.
Once the .xml files have been generated and the annotation is complete, it is impor-
tant to partition the images into a training set and a test set. To train and create
the model, we use the training set, and to evaluate the developed model, we test it
with the remaining images, known as the test set. The images are partitioned in a
90–10% ratio, meaning 90% for training the model and 10% for evaluating it. The
ratio can also be taken as 80–20%. Apart from these partitions, a label map is also
created, which indicates the desired object in the annotations. In the proposed
system, we are training the model to identify only shrimp in the images. The next
step involves the creation of the TensorFlow records from the obtained .xml
annotation files. Initially, the .xml files are converted into two .csv files based
on the distribution of the training and the test sets. These
Table 21.2 List of various convolutional neural networks versus speed of respective models

Name of RCNN model (speed in ms):
Faster_rcnn_inception_v2_coco: 58
Faster_rcnn_resnet50_coco: 89
Faster_rcnn_resnet50_lowproposals_coco: 64
Faster_rcnn_inception_resnet_v2_atrous_coco: 620
Faster_rcnn_resnet101_coco: 106
Mask_rcnn_inception_v2_coco: 79
Mask_rcnn_resnet50_atrous_coco: 343
CSVs are then taken as inputs and converted into the TFRecord format, producing
the TensorFlow records. For the creation of the training pipeline model, we
consider the faster_rcnn_inception_v2_coco model, which is one of the most powerful
R-CNN models available. The table above gives a glimpse of the various R-CNN models
available on GitHub for creating the training model. From the numbers given,
it is evident that the faster_rcnn_inception_v2_coco model is the fastest of
these convolutional neural networks (Table 21.2).
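The annotation-to-CSV step can be sketched as follows. The XML snippet imitates a labelImg (Pascal VOC style) file with made-up coordinates, and xml_to_rows is a hypothetical helper, not the actual conversion script used in the pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical labelImg output (Pascal VOC style) for one annotated frame.
VOC_XML = """
<annotation>
  <filename>pond_001.jpg</filename>
  <size><width>4032</width><height>3024</height></size>
  <object>
    <name>shrimp</name>
    <bndbox><xmin>120</xmin><ymin>340</ymin><xmax>260</xmax><ymax>480</ymax></bndbox>
  </object>
</annotation>
"""

def xml_to_rows(xml_text):
    """Flatten one annotation file into CSV-style rows, one per bounding box,
    as done before converting the CSVs into TFRecords."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    w = int(root.findtext("size/width"))
    h = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((fname, w, h, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows

print(xml_to_rows(VOC_XML))
# [('pond_001.jpg', 4032, 3024, 'shrimp', 120, 340, 260, 480)]
```

One row is emitted per bounding box, so an image with several surfacing shrimp contributes several rows to the training or test CSV.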
Upon downloading the required model from GitHub along with the .config
file, we customize the file with the modifications necessary to run the
model on the image data set that has already been labeled using the image
annotations. Running the model takes a large amount of time depending on the
hardware available. A GPU-based system takes considerably less time compared with
CPU systems; CPU systems may take from a few hours to days, depending also on the
model and the number of steps involved. For a stronger model, a large number of
steps is typically required. At regular intervals, the model creates a checkpoint
named model_<number of steps>.ckpt. Various graphs are generated for each step
based on the precision, learning rate, loss, etc. It is evident from the graphs
that the total number of steps involved in training the model is 42 thousand. In
order to develop a stronger model, we can continue training with more steps. The
loss function for identifying the shrimp in an image can be written as
L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)
where i indicates the index of an annotation in the image and p_i is the probability of annotation i being an object. p_i^* is the ground-truth label, denoted by 0 if the annotation is negative and 1 if it is positive; N_cls and N_reg are normalization terms, and λ balances the two loss components. There are also various tables which represent the learning curve of the model. The final step of the custom object detection for the shrimp is to evaluate the developed model. This makes use of the 10% of evaluation images which are partitioned prior to training. Intersection-over-union (IoU) values can also be evaluated using the ground-truth bounding boxes and the predicted bounding boxes.
224 G. Tej Varma and S. K. Adusumalli
IoU = Overlapping Area / Combined Area
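Given the corner-format boxes from the annotations (Xmin, Ymin, Xmax, Ymax), the ratio above can be computed directly; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0, xb - xa) * max(0, yb - ya)  # zero when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / float(area_a + area_b - overlap)
```

Identical boxes give 1.0, disjoint boxes give 0.0, and partial overlaps fall in between.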
21.4 Results and Analysis
For the development of this work, we made use of a MacBook Air with a 1.6 GHz Intel Core i5 processor, 8 GB 1600 MHz DDR3 RAM, and Intel HD Graphics 600 with 1536 MB; the original images were collected with an Apple iPhone 7 with RGB color space and Display P3 profile. As mentioned, the dimensions of all captured images are 4032 × 3024. Annotation is very useful in object classification of an image; bounding boxes are used in the annotation process and also for training the model. The screenshot below shows the annotation, done by marking green boxes on the image. Each image may contain more than one box representing the presence of a shrimp. To mark a rectangular box, we require Xmin, Ymin, Xmax, and Ymax; in the same way, an image may have more than one rectangular box to be represented. This annotation process has to be done for all the partitioned training and test images. An .xml file is generated from each image after annotation with the help of the labeling package (Fig. 21.5).
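The .xml files written by the labeling tool follow the Pascal VOC layout, with one <object> element per box holding a <bndbox> with xmin/ymin/xmax/ymax. Assuming that layout, reading the boxes back is a few lines:

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_string):
    """Extract (name, xmin, ymin, xmax, ymax) tuples from a Pascal VOC
    annotation string, the format written by common labeling tools."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return boxes
```

This is the same parsing the CSV-conversion step performs before the records are generated.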
After annotation and partitioning into the training and test sets, creating the label map is the step where the item classes present in the images are defined. In the model we are creating, we have a single entity, the shrimp, so we had only one item and id. As mentioned, for creating the training model, the
Fig. 21.5 Shrimp detection using proposed model
21 Shrimp Surfacing Recognition System in the Pond Using 225
faster_rcnn_inception_v2_coco model is used, and on the system we used it took around 2 days to complete training for 42,000 steps. The first two graphs show the total mean average precision and mAP (large); the second set of graphs shows the various losses, such as box classifier loss and localization loss, from the training model; and the third set indicates the consistent learning rate and loss rate throughout training (Fig. 21.6).
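For reference, a single-class label map of the kind described above, in the .pbtxt format the TensorFlow Object Detection API expects (the class name shown is the one the text implies):

```
item {
  id: 1
  name: 'shrimp'
}
```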
After the training model is created, it has to be evaluated using the test set partitioned just after annotating the images. The images below are from the test set and show all the shrimps with their bounding boxes (Fig. 21.7). We also obtain the bounding-box values, represented as Ymin, Xmin, Ymax, and Xmax, respectively, in the image below.
Fig. 21.6 Performance of annotation for image partition
Fig. 21.7 Shrimp detection using training data
21.5 Conclusion
The proposed model uses a convolutional neural network-based deep learning architecture for the shrimp surfacing detection system, in order to assist farmers, considering the economic impact and the difficulty of surveilling the pond throughout the lifetime of the crop. The proposed model produces good results using faster_rcnn_inception_v2_coco to identify shrimp with bounding boxes on images. In this work we studied shrimp detection in shallow water as well as surfacing. We would like to enhance the model by developing an IoT device that alerts the farmer, and the system, which currently works on captured images, can be enhanced to detect shrimp in live video streams so that it can be integrated directly with CCTV cameras.
Chapter 22
Sign Language Recognition for Needy
People Using Machine Learning Model
Pavan Kumar Vadrevu, M. R. M. Veeramanickam, Sri Krishna Adusumalli,
and Sasi Kumar Bunga
Abstract In this digital era, gesture identification is one of the revolutions to serve specially-abled (deaf and mute) people, and it has been investigated for several decades. Unfortunately, existing research studies have their restrictions and are not ready for commercial use. A few studies have been recognized as effective for gesture identification, but they require a high cost to be marketed. Currently, researchers are giving more consideration to developing gesture identification systems that can be used commercially. Scholars conduct their studies in innumerable ways, starting from the data acquisition approach. Data acquisition approaches differ because a good device is costly, whereas a low-cost device is needed for a hand gesture identification system to be made saleable. The approaches used to implement such systems are diverse between studies; each approach has its strong points compared to the others, and research is still using different approaches to develop gesture identification, with each approach also having setbacks compared to others. This manuscript intends to review gesture identification systems for needy people, so that other studies can get more information about the approaches used and develop better applications in the future.
22.1 Introduction
The problem statement of this manuscript is that mute people cannot communicate with other people directly; they can communicate only with the help of sign language. As most people do not know the sign language that specially-abled people communicate with, the key objective of the proposed model is to predict the given sign, so that it helps us communicate with them easily. Even though we get the sign as written
P. K. Vadrevu (B) · M. R. M. Veeramanickam · S. K. Bunga
Department of Information Technology, Shri Vishnu Engineering College for Women
(Autonomous), Bhimavaram, India
e-mail: vadrevu.pavan@svecw.edu.in
S. K. Adusumalli
Department of AI, Shri Vishnu Engineering College for Women (Autonomous), Bhimavaram,
India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_22
228 P. K. Vadrevu et al.
text, people who do not know how to read will still have a problem. If we give it only in speech format, some people cannot understand any language other than their native one (for example, Telugu). Considering this problem statement, the authors experimented with a sign detector that detects the sign and displays it as both English and Telugu text and also as Telugu speech. The resulting system is very simple to understand and interpret.
22.2 Relevant Study
In the existing systems we have sign detectors, but some people cannot understand any language other than their native one; a few systems convert the sign into text as well as speech, but not into Telugu. The existing hand gesture recognition systems recognize hand gestures statically: every hand gesture has to be captured manually by a person and sent into the system to identify the gesture [1]. People also have to control all the operations and conditions in the environment, which increases manual work, and the existing systems need special requirements and working conditions. Another existing system uses distinct data gloves. A data glove is an interactive device, resembling a glove worn on the hand, which enables tactile sensing and fine-motion control in robotics and virtual reality [2]. Data gloves are one of several types of electromechanical devices used in haptic applications. Motion control involves the use of a sensor to detect the movement of the user's hand and fingers, and the translation of these motions into signals that can be used by a virtual hand (for example, in gaming) or a robotic hand (for example, in remote-control surgery). This method uses devices (mechanical or optical) attached to a glove that transduce finger flexion into an electrical signal for determining the hand position [2].
22.2.1 Pros and Cons of the Existing and Planned Models
The existing state-of-the-art systems are not helpful for users who know only their own native language: it is difficult to get the sign in Telugu speech and text formats, and it is tough to communicate with specially-abled people. It is not always possible to capture gestures with the state-of-the-art systems; every model needs a plain, uniform background and is not portable, and system functionality drops as the distance between the user and the camera increases. In the proposed system, we develop the model to reduce these drawbacks. We created a sign detector so that the detected sign is given in both Telugu speech and Telugu text. If no hand is placed for detection, the system shows a warning that the hand is not placed. If a valid sign is given, the sign is predicted; if the sign is not valid, it is not recognized. The system detects the hand gesture and identifies it as a sign. This
22 Sign Language Recognition for Needy People 229
sign will be converted into both Telugu text and Telugu speech. The advantages of this model are improved accuracy and easier maintenance compared with the state-of-the-art models. It helps people understand sign language better, and it helps users who cannot read or who do not understand languages other than Telugu.
22.3 Proposed Method
22.3.1 Data Preparation
We use a total of six different hand gestures. We create six different folders, one per class, in the directory, with each folder name representing the number of fingers in the hand gesture, i.e., from 0 to 5. During setup, we generate our data set by capturing the live camera feed with the system's camera [3]. While the camera captures continuously, we press the corresponding folder name on the keyboard, which saves the current frame into the corresponding directory. Algorithm:
Step 1: The camera input frame is converted into a grayscale image.
Step 2: This image is then thresholded and converted into a binary image.
Step 3: The resulting image has just one channel, which makes the CNN learn more easily.
Step 4: Image augmentation is applied to all the collected data to increase the size of the training data set.
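Steps 1–3 can be sketched without OpenCV using plain NumPy; the luminance weights and threshold value here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def preprocess(frame, thresh=127):
    """Convert an RGB frame to grayscale, then threshold it into a
    single-channel binary image (steps 1-3 of the algorithm)."""
    # Standard RGB -> luminance weights (assumed; any grayscale map works)
    gray = frame @ np.array([0.299, 0.587, 0.114])
    binary = (gray > thresh).astype(np.uint8) * 255
    return binary[..., np.newaxis]  # keep an explicit single channel
```

The single-channel output is what would be fed to the CNN in step 3.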
CNNs are part of deep learning techniques. A neural network is a layered processing algorithm containing input, output, and other intermediate layers; the intermediate layers include processing layers such as convolution, pooling, recurrent, dropout, noise, normalization, etc. [4]. This model consists of layers that process images quickly and learn features, depending on how long training runs and how many samples are trained. "Keras" is the built-in neural network library that is imported to make the CNN work on the system [4]. The architecture below gives the complete details of the model, which is used to convert the given text into voice (Fig. 22.1).
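The convolution and pooling layers named above can be illustrated in a few lines of NumPy. This is a didactic sketch of the two operations, not the Keras model itself:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D correlation over a single-channel image (the
    convolution used in CNNs, which skips the kernel flip)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling, shrinking each spatial dimension."""
    h, w = img.shape[0] // size, img.shape[1] // size
    return img[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```

Stacking such layers (with nonlinearities) is exactly what the Keras model assembles internally.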
Hand Segmentation from the Background: A histogram method is used to separate the hand from the background image. Background elimination techniques are used to obtain the finest result (Fig. 22.2).
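A minimal background-elimination sketch (the histogram refinement is omitted, and the difference threshold is an assumed value):

```python
import numpy as np

def segment_hand(frame, background, diff_thresh=30):
    """Simple background subtraction: pixels that differ from a captured
    background frame by more than diff_thresh are kept as the hand mask."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return (diff > diff_thresh).astype(np.uint8) * 255
```

In practice a background frame is captured once before the hand enters the scene, and the mask is then cleaned up morphologically.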
Fig. 22.1 Proposed model
Fig. 22.2 System architecture
22.4 Results and Discussion
Gesture Labeling: The identified hand is then processed and represented by finding contours and a convex hull to identify finger and palm positions and dimensions. Finally, a gesture object is formed from the recognized pattern, which is compared against a defined gesture dictionary.
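The convex-hull step can be illustrated with Andrew's monotone-chain algorithm over 2-D contour points, a stand-in for the OpenCV routine typically used here:

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull over (x, y) tuples: the kind of
    geometric step used to outline the hand before counting fingers."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for seq, out in ((pts, lower), (reversed(pts), upper)):
        for p in seq:
            # pop points that would make a clockwise (non-left) turn
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
    return lower[:-1] + upper[:-1]
```

Fingertips then appear as hull vertices far from the palm centroid, and the gaps between fingers as convexity defects.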
The experimentation is conducted by collecting the data set from different Internet sources. The collected data set undergoes training with the model, and this
Fig. 22.3 Test case specification
can be tested using a ten-fold cross-validation technique to assess the model's efficiency and accuracy [5]. The system is implemented in a Python environment on a Windows-based machine, as Python has a rich set of built-in libraries for obtaining accurate results. Based on the obtained results, this implementation gives more accurate results compared with a few existing models. 80% of the data set is used for training and 20% for testing the model. The figure below gives the test case specification for gesture recognition with five valid test cases; the experimental results are also shown (Figs. 22.3 and 22.4).
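The ten-fold cross-validation mentioned above partitions the sample indices as sketched below (a dependency-free stand-in for scikit-learn's KFold):

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each of the k rounds trains on k−1 folds and evaluates on the held-out fold; averaging the k accuracies gives the cross-validated estimate.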
22.5 Conclusion
This paper deals with hand gesture recognition for HCI automation. User-friendly gestures improve communication between specially-abled people and other people, and controlling things by hand is more natural, easier, more flexible, and cheaper; there is no need to fix problems caused by hardware devices, since none are required. The CNN algorithm has given an accuracy of about 94%, which indicates that it performs accurately on the given hand gestures [6–8]. The implemented system serves as an extensible foundation for future research, and
Fig. 22.4 Experimental results
extensions to the current system have been proposed. As an extension of our existing work, with the help of a high-range camera we can capture depth data, segment the hand, and then classify it using a chamfer matching method to measure the similarities between the candidate hand image and hand templates in the database [6–8]. With infrared high-resolution cameras, there is a chance of controlling appliances even during the night. A comparison study against other models needs to be conducted and can be presented in a forthcoming paper.
References
1. Wang, H., Chai, X., Zhou, Y., & Chen, X. (2015). Fast sign language recognition benefited
from low rank approximation. In: 2015 11th IEEE International Conference and Workshops
on Automatic Face and Gesture Recognition, FG 2015.
2. Tewari, D., & Srivastava, S. (2012). A visual recognition of static hand gestures in Indian
sign language based on Kohonen self-organizing map algorithm. International Journal of
Engineering and Advanced Technology (IJEAT), 2(2), 165–170.
3. Huang, J., Zhou, W., Li, H., Li, W. (2015). Sign language recognition using 3D convolutional
neural networks. In 2015 IEEE International Conference on Multimedia and Expo (ICME),
(pp. 1–6). IEEE.
4. Suryapriya, A. K., Sumam, S., & Idicula M. (2009). Design and development of a frame
based MT system for english-to-ISL. In World Congress on Nature and Biologically Inspired
Computing (pp. 1382–1387).
5. Georghiades, A. S., Belhumeur, P. N., & Kriegman, D. J. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.
6. Vadrevu, P. K., Adusumalli, S. K., & Mangalapall, V. K. (2020) Motion detection to preserve
personal privacy from surveillance data using contrary motion. International Journal of Recent
Technology and Engineering (IJRTE), 8(6).
7. Vadrevu, P. K., Adusumalli, S. K., & Mangalapall, V. K. (2019) A survey on personal
privacy preserving data publication in IoT. International Journal of Innovative Technology
and Exploring Engineering (IJITEE), 8(6C2). ISSN: 2278–3075.
8. Vadrevu, P. K., Adusumalli, S. K., & Mangalapalli, V. K., Survey: privacy preserving data
publication in the age of big data in IoT Era. International Journal of Engineering, Science
and Mathematics, 6(8), 938–944.
9. Ong, S., & Ranganath, S. (2005). Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 873–891.
10. In Human Friendly Mechatronics: Selected Papers of the International Conference on Machine Automation (ICMA 2000), September 27–29, 2000, Osaka, Japan (2001), pp. 11–16.
11. Bantupalli K., & Xie, Y. (2018). American sign language recognition using deep learning and
computer vision. In 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA,
USA, 2018, (pp. 4896–4899). https://doi.org/10.1109/BigData.2018.8622141.
12. Cabrera, M., Bogado, J., Fermãn, L., Acuna, R., & Ralev, D. (2012). Glove-based gesture
recognition system. https://doi.org/10.1142/9789814415958_0095.
13. He, S. (2019). Research of a sign language translation system based on deep learning. 392–396.
https://doi.org/10.1109/AIAM48774.2019.00083.
14. Herath, H. C. M., Kumari, W. A. L. V., Senevirathne, W. A. P. B., & Dissanayake, M. (2013).
Image based sign language recognition system for Sinhala sign language
15. Geetha, M., & Manjusha, U. C. (2012). A vision based recognition of Indian sign language
alphabets and numerals using b-spline approximation. International Journal on Computer
Science and Engineering (IJCSE), 4(3), 406–415.
16. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale
hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition,
CVPR 2009, (pp. 248–255). IEEE. Miami, FL, USA.
17. Feng, Z., & Jiang, Y. (2013). The review of the gesture recognition research. In Proceeding of
the Jinan University: Natural Sciences column (vol. 4, pp. 336–341).
18. Zafrulla, Z., Brashear, H., Starner, T.,et al. (2011). American sign language recognition with the
Kinect. In Proceedings of the 13th international conference on multimodal interfaces (pp. 279–
286).
Chapter 23
Efficient Usage of Spectrum by Using
Joint Optimization Channel Allocation
Method
Padyala Venkata Vara Prasad, K. V. D. Kiran, Rajasekhar Kommaraju,
and N. Gayathri
Abstract Traditionally, cognitive radio can be accessed by a secondary user only when the primary user is absent, and the secondary user must vacate the idle spectrum when the presence of the primary user is detected; hence, the bandwidth is reduced in the traditional scheme. To overcome this problem, non-orthogonal multiple access (NOMA) is used to increase spectrum efficiency in 5G communications. NOMA allows the secondary user to access the spectrum whether or not the primary user is occupying the channel. A primary-user decoding technique and a secondary-user decoding technique are introduced to decode the non-orthogonal signals. Through these decoding techniques, secondary user throughput can be achieved; to increase the primary user throughput, the sub-channel transmission power must be kept within a limit. However, due to the interference caused by the primary user, the secondary throughput may decrease. Oriented toward primary-user-first decoding and secondary-user-first decoding, we come up with two optimization problems to enhance the throughput of both the primary and the secondary user. This is done by jointly optimizing the spectrum resources (Gayatri et al. in Mobile Netw Appl 6:435–441 [1]): how much sub-channel transmission power is used and how many sub-channels are allocated. A joint optimization algorithm is introduced to solve these problems. This is achieved by receiving the signals and calculating the time needed to sense the spectrum and whether the primary user is occupying the channel, so as to decode the data sent while the primary user is absent (Ghasemi and Sousa in IEEE Commun Mag 46:32–39 [2]). The simulation outcomes show the superior transmission efficiency of NOMA-based cognitive radio.
P. V. V. Prasad (B) · K. V. D. Kiran · N. Gayathri
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, Andhra Pradesh, India
e-mail: padyalaprasad@gmail.com
R. Kommaraju
IT Department, LakiReddy BaliReddy College of Engineering, Mylavaram, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_23
236 P. V. V. Prasad et al.
23.1 Introduction
In contrast to past generations of mobile communication systems, the evolution from 1G to 2G, 3G, and 4G is not comparable to the new cutting-edge 5G technology, which provides a new methodology offering ubiquitous connectivity. The 5G architecture is remarkably precise. Past generations were driven mainly by which parts of the new technology could be realized, whereas diverse industry applications have driven 5G development. For applications as varied as vehicular communications, remote control with haptic feedback, massive video downloads, as well as very low data rate applications such as hard-to-reach sensors (known as the IoT, the Web of Things), 5G mobile communications are driven by the need to provide ubiquitous connectivity. 5G can supply significantly greater versatility and can support a considerably broader range of applications [3], from low data rate Internet of Things requirements through to very high data rate and extremely low latency applications.
An ideal joint sensing threshold and sub-band power allocation is proposed for multichannel cognitive radio (CR) by formulating a mixed-variable optimization problem that maximizes the total throughput of the cognitive signal while meeting all constraints: the interference limit to the PU, the total power of the CR, and the false alarm probability on each sub-channel. Based on the bi-level optimization strategy, the designed optimization problem is divided into two single-variable sub-problems: the upper level for optimizing the threshold and the lower level for optimizing power. The simulations support the observation that joint optimization over the individual sub-channels achieves an attractive improvement in throughput. Some of the existing procedures utilized in cognitive radio include spectrum sensing, spectrum databases, and pilot channels. These strategies are either so complex that they require high computational power to identify unused spectrum, or they fail to capitalize on spectrum space in real time. Cognitive radio (CR) is a form of wireless communication in which the transceiver can intelligently discern which transmission channels are in use. It immediately moves into empty channels while avoiding occupied ones, and it does not cause any interference to the licensed user. Figure 23.1 shows a way of spectrum sharing by applying the joint optimization process in channel allocation.
Cognitive radio is a concept introduced to attack the coming spectrum crunch problem. Cognitive radio users are unlicensed users who dynamically discover unused licensed spectrum for their own use without causing any interference to licensed users [4]. As noted above, existing procedures such as spectrum sensing, spectrum databases, and pilot channels are either computationally demanding or fail to exploit spectrum space in real time, which is exactly where a CR transceiver
23 Efficient Usage of Spectrum by Using Joint Optimization 237
Fig. 23.1 Flowchart for joint optimization
that intelligently discerns which transmission channels are in use can help. The consolidated approach highlighted in this paper can help create simpler, cost-effective arrangements. It can drastically reduce operators' investment, as CR uses unlicensed spectrum. As the IoT phenomenon grows, tens of billions of devices will have to communicate with each other in real time. The highlighted CR approach will help operators cater to enormous spectrum requirements and help build a connected world.
Traditionally, cognitive radio can be accessed by a secondary user only when the primary user is absent, and the secondary user must vacate the idle spectrum when the presence of the primary user is detected; hence, the bandwidth is reduced in the traditional scheme. To overcome this problem, non-orthogonal multiple access (NOMA) is used to increase spectrum efficiency in 5G communications: it allows the secondary user to access the spectrum whether or not the primary user is occupying the channel. Primary-user and secondary-user decoding techniques are introduced to decode the non-orthogonal signals. Through these decoding techniques, secondary user throughput can be achieved; to raise the primary user throughput, the sub-channel transmission power must be kept within a limit. However, due to the interference caused by the primary user, the secondary throughput may decrease [5]. Oriented toward primary-user-first decoding and secondary-user-first decoding, we come up with two optimization problems to enhance the throughput of both the primary and the secondary user. This is done by jointly optimizing the spectrum resources: how much sub-channel transmission power is used and how many sub-channels are allocated. A joint optimization algorithm is introduced to solve these problems. This is achieved by receiving the signals and measuring the time needed to sense the spectrum and whether the primary user is occupying the channel, so as to decode the data sent while the primary user is absent. The simulation outcomes show the superior transmission efficiency of NOMA-based cognitive radio. The article is organized as follows: Sect. 23.1 introduces channel allocation in wireless communication; Sect. 23.2 is a literature survey covering the calculation formulas used while allocating channels to PU and SU users; Sect. 23.3 presents the proposed joint optimization method for channel allocation; Sect. 23.4 gives experimental results; Sect. 23.5 compares the traditional and the proposed joint channel allocation processes; and Sect. 23.6 concludes and outlines future work.
23.2 Literature Survey
A spectrum-sharing mechanism improves the current spectrum utilization of cognitive radio by permitting a secondary user to access idle spectrum allotted to the primary user. However, the secondary user is not permitted to disturb the normal communications of the primary user. The secondary user senses the licensed spectrum by performing spectrum sensing, which can be recognized as a binary hypothesis detection problem. In the conventional schemes, the secondary user can access only the idle channel; to detect the existence of the primary user, energy detection is used, comparing the received energy of the primary signal against a threshold. The primary user is identified as absent if the power falls below the threshold. The performance of energy detection is reflected by the false alarm probability and the detection probability.
x = cos(2π · 1000t)  (23.1)
a = ammod(x, F, Fs)  (23.2)
Pxx = periodogram(a)  (23.3)
where F is the frequency of the user and Fs is the sampling frequency of the spectrum.
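Equations (23.1)–(23.3) are MATLAB-style; the underlying energy-detection decision they feed can be sketched in Python as follows (the threshold and sampling rate are illustrative assumptions):

```python
import numpy as np

def energy_detect(signal, threshold):
    """Energy detection for spectrum sensing: declare the primary user
    present when the average received energy exceeds the threshold."""
    energy = np.mean(np.abs(signal) ** 2)
    return bool(energy > threshold)

# A 1 kHz tone as in Eq. (23.1), sampled at an assumed Fs of 10 kHz
Fs = 10_000
t = np.arange(0, 0.01, 1 / Fs)
x = np.cos(2 * np.pi * 1000 * t)
```

For the pure tone the average energy is 0.5, so any threshold below that declares the band occupied, while an empty band falls below it.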
If the energy value is below the threshold, the PU is determined not to be present. Energy detector effectiveness is characterized by the probability of false alarm and the probability of detection: a lower false alarm probability offers highly reliable idle-channel recognition, while a higher detection probability means that the existence of the primary user is identified very accurately. Earlier listen-before-talk spectrum schemes divided each frame into a sensing slot and a transmission slot, with the secondary user entering the transmission slot once an idle channel is recognized during sensing. A sensing-throughput trade-off was then studied to find the best sensing time that maximizes the secondary user's yield. Multichannel CR schemes have been suggested that improve the secondary user's yield by allowing the secondary user to access several inactive sub-channels at the same time [6]. To maximize the throughput of the secondary user in the absence of the PU, capacity optimization of multichannel CR was investigated. Nevertheless, if the primary user is present, the secondary user does not use the spectrum in the above schemes, and the throughput therefore falls. Non-orthogonal multiple access (NOMA) has been adopted to improve 5G spectrum efficiency and can assist spectrum access of multiple users in the power domain, multiplexing different users on the same sub-channel by adding superposition coding at the transmitter and successive interference cancellation at the receiver. Therefore, even when the PU is present, the secondary user may also access the spectrum to boost the throughput using NOMA.
Hpsd = dspdata.psd(Pxx, Fs, Fs) (23.4)
c = Pxx(25) × 10,000 (23.5)
Cognitive radio clients are unlicensed users that opportunistically discover unused licensed spectrum for their own use without causing interference to licensed users. Existing techniques used in cognitive radio include spectrum sensing, spectrum databases, and pilot channels. These methods are either complex, requiring high computational power to identify unused spectrum, or fail to capitalize on spectrum holes as they appear in real time. A cognitive radio (CR) is a form of wireless transceiver that can intelligently discern which transmission channels are in use; it immediately moves into vacant channels while avoiding occupied ones.
aa = ammod(x, F, Fs) (23.6)
In earlier methods, the throughput of the primary user is reduced by interference from the secondary user when both users access the channel at the
240 P. V. V. Prasad et al.
same time; this interference decreases the primary user's throughput. To overcome the problem, we use non-orthogonal multiple access to enhance the primary user's throughput. Hence, in contrast with the traditional scheme, in our algorithm the secondary user accesses the channel even while the primary user is occupying it, and the throughput of both users is maintained.
23.3 Proposed Method
Targeting primary-user-first (PF) decoding and secondary-user-first (SF) decoding, we formulate two optimization problems: first, accessing the channel whether or not the primary user is occupying it; second, enhancing the throughput of both the primary user and the secondary user through non-orthogonal multiple access. A joint optimization algorithm is introduced that lets the secondary user enter the sub-channel whether the primary user is present or absent, while improving throughput.
This is achieved by receiving the signals, computing the time needed to sense the spectrum, and determining whether the primary user occupies the channel, so that the secondary user can decrypt the data sent during the primary user's absence. The simulation results will show the superior transmission throughput of NOMA-based cognitive radio.
Through these decoding techniques the secondary user's throughput can be achieved; to increase the primary user's throughput, the sub-channel transmit energy must stay within a limit. However, interference from the primary user may reduce the secondary user's throughput. Oriented toward primary-user-first decoding and secondary-user-first decoding, we pose two enhancement problems to improve the throughput of both the primary user and the secondary user by jointly optimizing the spectrum resources. The optimization constrains how much sub-channel transmission power is used and determines the number of sub-channels allocated. A joint optimization algorithm is introduced to eliminate the limitations of the existing schemes.
Algorithm for Sub-channel Power Optimization
Initialize: time = 0, var = 0 and Freq > 0;
1. with given t, calculate {x} using (pi);
2. with {t, pi}, update 'x' through (23.1);
3. set F = 0, Fs = max;
4. with given F, Fs, calculate {a} using (x);
5. with {F, Fs, x}, update 'a' through (23.2);
6. repeat steps (2) to (5) until F = 5.
Output: graph{frequencies}
In the algorithm, we assign the frequencies of the different users, initialize time to 0, compute the lambda values, and use the ammod function to compute the corresponding spectrum from the frequency of the user and the sampling frequency of the spectrum [7]. In Fig. 23.2, the secondary user enters the channel while the primary user is occupying it. The throughput, however, is reduced by the interference the secondary user causes in the spectrum while the primary user is active.
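A Python stand-in for the sub-channel power optimization loop of the algorithm above; the per-user carrier grid and signal duration are illustrative assumptions, and ammod is mirrored as DSB amplitude modulation:

```python
import numpy as np

Fs = 10_000                        # spectrum sampling frequency (assumed)
t = np.arange(0, 0.05, 1 / Fs)
x = np.cos(2 * np.pi * 1000 * t)   # baseband signal of Eq. (23.1)

modulated = {}
F = 0
while F < 5:                       # steps 2-6: repeat until F = 5
    F += 1
    Fc = 1000 * F                  # carrier assigned to user F (assumed grid)
    # Eq. (23.2): ammod(x, Fc, Fs) as DSB amplitude modulation
    modulated[F] = x * np.cos(2 * np.pi * Fc * t)
# modulated now holds one waveform per user frequency, ready for PSD estimation
```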
In the algorithm, we specify the number of primary users and then compute the primary user's detection threshold. If the measured value lies within the given range, the presence or absence of the primary user is concluded. If the primary user is not occupying the channel, the secondary user can access it to send its encrypted data. When the primary user is occupying the channel, the secondary user can receive the data that was sent for it during the primary user's absence and decrypt it. Using non-orthogonal multiple access, the throughput of both the primary user and the secondary user is improved while the frequencies of both are maintained. A graph is plotted to show the throughput and
Fig. 23.2 Primary user is present
frequencies of both users, illustrating the outputs and the interaction between the primary and secondary users. In the old method, the throughput of the primary user is reduced by the interference the secondary user causes when both users access the channel at the same time [8]. To overcome this, we use non-orthogonal multiple access to enhance the primary user's throughput. In contrast with the traditional scheme, in our algorithm the secondary user accesses the channel even while the primary user is occupying it, and the throughput of both users is maintained. The secondary user sends its data while the primary user is absent and, while the primary user is present, retrieves its data by decrypting it from the channel. The throughput of both users is still somewhat degraded by the mutual interference between the primary and secondary users.
Algorithm for Jointly Optimization
Initialize: gamma = 8000 and Freq > 0;
1. with given F, calculate {a} using Algorithm 1;
2. with {a}, update 'Pxx' through (23.3);
3. with {Fs}, update 'Hpsd' through (23.4);
4. with {Pxx, F}, update 'c' through (23.5);
5. if c < gamma:
6. with given {F}, calculate {aa} using (x);
7. with {x, F, Fs}, update 'aa' through (23.6);
8. repeat step (3).
Output: graph{frequencies}
In this algorithm, the secondary user accesses the spectrum whether or not the primary user is occupying the channel. Figure 23.3 shows that the throughput of the primary user is achieved using non-orthogonal multiple access, and that the throughput of both the primary user and the secondary user is maintained. The efficiency can be computed from the frequency difference; since the primary and secondary users operate on different frequencies, throughput is enhanced.
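A Python sketch of the joint-optimization sensing decision: the PSD bin index 25, the scale factor 10,000, and gamma = 8000 follow the algorithm's initialization, while the signal amplitudes and sampling setup are illustrative assumptions:

```python
import numpy as np
from scipy.signal import periodogram

GAMMA = 8000.0      # threshold from the algorithm's initialization

def secondary_action(received, fs, bin_index=25, scale=10_000):
    """Sense one PSD bin (Eqs. 23.3-23.5) and decide the secondary user's action."""
    _, Pxx = periodogram(received, fs)   # Eq. (23.3): PSD estimate
    c = Pxx[bin_index] * scale           # Eq. (23.5): scaled bin energy
    if c < GAMMA:                        # primary user absent
        return "transmit"                # secondary sends its encrypted data
    return "decode"                      # primary present: recover data via NOMA decoding

Fs = 10_000
t = np.arange(0, 0.1, 1 / Fs)            # 1000 samples, so PSD bin 25 is 250 Hz
rng = np.random.default_rng(0)
idle_channel = 1e-4 * rng.standard_normal(t.size)   # noise-only channel
primary_signal = 5 * np.cos(2 * np.pi * 250 * t)    # strong primary tone at bin 25
```

On the idle channel the bin energy stays far below gamma and the secondary user transmits; with the strong primary tone the scaled bin energy exceeds gamma and the secondary user switches to decoding.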
Fig. 23.3 Usage of channel by primary user and secondary user
In Fig. 23.3, the joint optimization algorithm removes this limitation: the secondary user accesses the channel even while the primary user is occupying it, by jointly optimizing the spectrum resources, the sub-channel transmission power, and the number of sub-channels [9]. This is achieved by receiving the signals, computing the time needed to sense the spectrum, determining whether the primary user occupies the channel, and decrypting the data sent during the primary user's absence. The simulation results will show the superior transmission throughput of NOMA-based cognitive radio.
23.4 Experimental Results
The algorithm takes the number of primary users and computes the primary user's detection threshold; if the measured value lies within the given range, the presence or absence of the primary user is concluded. When the primary user is absent, the secondary user accesses the channel to send its encrypted data; when the primary user is present, the secondary user receives and decrypts the data that was sent for it during the primary user's absence. With non-orthogonal multiple access, the throughput of both the primary user and the secondary user is improved while their frequencies are maintained, in contrast with the old method, where interference from the secondary user reduced the primary user's throughput whenever both users accessed the channel simultaneously. A graph is plotted to show the throughput and frequencies of the users.
23.5 Comparison of Results
In the traditional method, the secondary user can access the primary user's channel only when the primary user is not occupying it. But in the modern scheme,
the secondary user can access the primary user's channel even while the primary user is occupying it. In this algorithm, therefore, the secondary user can access the primary channel whether or not the primary user is present. First, the channel is sensed to detect the primary user: if the sensed statistic lies within the gamma threshold range, the secondary user knows whether the primary user is occupying the channel. In Fig. 23.4, the results show the secondary user accessing the primary user's sub-channel with very effective power usage. If the primary user is not occupying the channel, the secondary user sends its encrypted data; whenever the secondary user wants to receive data while the primary user is active, the data is decrypted and delivered to the secondary user. In this way, both the primary and secondary users can share the sub-channel whether or not the primary user is present. The drawback is that when both users occupy the sub-channel, their throughput decreases, so the joint optimization technique is used to enhance it. Using MATLAB, we showed the difference between the old approach and ours: frequencies are first assigned to the variables, the ammod function computes the output, and the outcome is plotted. Figure 23.4 shows the throughput comparison between the traditional and joint optimization schemes in terms of received power and bits transmitted in the allocated channel.
Fig. 23.4 Throughput comparison between traditional and joint optimization
Here, the secondary user senses the channel, identifies the presence or absence of the primary user, and decides accordingly whether to enter the channel or send data on it. Since sensing alone fails to improve throughput, we introduced the joint optimization algorithm, in which non-orthogonal multiple access eliminates the interference caused by the secondary user and enhances the primary user's throughput.
23.6 Conclusion and Future Scope
In the traditional method, the secondary user can access the primary user's channel only when the primary user is not occupying it; in the proposed scheme, the secondary user can access the channel whether or not the primary user is present. The channel is first sensed, and the gamma threshold range tells the secondary user whether the primary user is active. If the primary user is absent, the secondary user sends its encrypted data; if the primary user is present, the requested data is decrypted and delivered to the secondary user. In this way, both users share the sub-channel at all times.
The drawback is that simultaneous access reduces the users' throughput, so the joint optimization technique is applied to enhance it. Using MATLAB, we showed the difference between the old approach and ours: frequencies are assigned to the variables, the ammod function computes the output, and the outcome is plotted. The secondary user senses the channel, identifies the presence or absence of the primary user, and decides whether to enter the channel or send data. Non-orthogonal multiple access then eliminates the interference caused by the secondary user and enhances the primary user's throughput.
References
1. Gayatri, T., Sharma, V. K., & Anveshkuar, N. (2019). A survey on conceptualization of cogni-
tive radio and dynamic spectrum access for next generation wireless communication. Mobile
Networks and Applications, 6(5), 435–441.
2. Ghasemi, A., & Sousa, E. S. (2018). Spectrum sensing in cognitive radio networks: Require-
ments, challenges and design tradeoffs. IEEE Communications Magazine, 46(4), 32–39.
3. Liu, X., Jia, M., Gu, X., & Tan, X. (2017). Optimal periodic cooperative spectrum sensing based
on weight fusion in cognitive radio networks. Sensors, 13(4), 5251–5272.
4. Shen, J., Liu, S., Wang, Y., Xie, G., Rashvand, H. F., & Liu, Y. (2019). Robust energy detection
in cognitive radio. IET Communications, 3(6), 1016–1023.
5. Yang, L., Han, Z., Huang, Z., & Ma, J. (2018). A remotely keyed file encryption scheme under
mobile cloud computing. Journal of Network and Computer Applications, 106, 90–99.
6. Unde, A. S., & Deepthi, P. P. (2020, January). Design and analysis of compressive sensing based lightweight encryption scheme for multimedia IoT. IEEE Transactions on Circuits and Systems II: Express Briefs, 67(1), 167–171.
7. Liu, X., & Jia, M. (2017). Joint optimal fair cooperative spectrum sensing and transmission in
cognitive radio. Physical Communication, 25, 445–453.
8. Liu, X., Li, F., & Na, Z. (2017). Optimal resource allocation in simultaneous cooperative
spectrum sensing and energy harvesting for multichannel cognitive radio. IEEE Access, 5,
3801–3812.
9. Muhammad, K., Hamza, R., Ahmad, J., Lloret, J., Wang, H., & Baik, S. W. (2018). Secure surveillance framework for IoT systems using probabilistic image encryption. IEEE Transactions on Industrial Informatics, 14(8), 3679–3689.
Chapter 24
An Intelligent Energy-Efficient Routing
Protocol for Wearable Body Area
Networks
Muniraju Naidu Vadlamudi and Md. Asdaque Hussian
Abstract The wireless body area network (WBAN) is a wireless sensor network that measures physiological parameters for specialized applications using wireless sensor nodes in and around the human body. Routing protocols used in WBANs must account for energy, topology, temperature, location, sensor radio choice, and the required quality of service at the sensor nodes. Achieving energy efficiency in WBANs is one of the central challenges, as energy efficiency ultimately determines the durability of the network. Most current applications of the wireless body sensor network (WBSN) require efficient transmission schemes for remote monitoring of wearable-system data on demand and in a timely manner. This paper proposes a reliable, power-efficient, and highly stable routing protocol for wireless body area sensor networks. Simulations have been carried out, and the findings are consistent with the design goals.
24.1 Introduction
The WSN consists of nodes deployed in a specific geographical area to sense or monitor parameters such as temperature, humidity, pressure, noise level, and the movement of vehicles or people. The WBAN IEEE standard was issued as IEEE 802.15.6, 'a communication standard optimized for low-power devices and operation on, in or around the human body (but not limited to humans) to serve a variety of applications including medical, consumer electronics/personal entertainment, and others' [1].
Information aggregation procedures are classified as follows:
M. N. Vadlamudi (B)
Assistant Professor, Department of Computer Science and Engineering, School of Engineering,
Malla Reddy University, Hyderabad, Telangana, India
e-mail: munirajunaidu.v@gmail.com
M. N. Vadlamudi ·Md. A. Hussian
Associate Professor, Faculty of Computer Studies, Arab Open University, Manama, Bahrain
e-mail: m.asdaque@aou.org.bh
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_24
(a) A prearranged method, in which analyzing and arranging the network topology drains the battery faster and leads to network failure, and
(b) A topology-free method, in which power is not wasted on setting up the network topology [2]. Information aggregation was made possible for large body area networks by reducing the set of active nodes to lower network usage [3]. For heterogeneous networks, data aggregation trades off information quality against power [4]. Tree-based topology aggregation is made dependable by supplying time slots for forwarding data and a predefined transmission power to improve network lifetime [5].
24.2 Issues in Designing Routing Protocols
Because of its specific conditions, such as architecture, node density, and data rate, WBAN cannot directly reuse the protocols of WSNs even though it is a subset of WSN [6, 7]. Factors that make choosing a WBAN routing protocol difficult include A. mobility, B. efficient communication choice, C. service quality, and D. safety and confidentiality (Fig. 24.1).
Fig. 24.1 Deployment of nodes
24.2.1 Existing Energy-Efficient Routing Protocols in WBAN
Tang et al. [8] proposed an energy-proficient and thermally aware routing protocol for WBANs that reduces node temperatures and the delay of critical data. Tauqir et al. [9] propose distance aware relaying energy efficient (DARE) routing for monitoring the patient's biological status. Ahmed et al. [10] proposed LAEEBA, in which the route with the fewest nodes is chosen for broadcast. Nadeem et al. [11] were the first to propose a WBAN routing model that was both energy and power efficient.
24.3 Proposed Approach
24.3.1 Intelligent Energy-Efficient Routing Protocol
24.3.1.1 Power Consumption Analysis
The sensors need to sense and process the sensed information and transmit it to the sink. Equation 24.1 is a mathematical representation of the energy consumed during transmission.
FTX = (Famp + Felec) · s · d² (24.1)
In Eq. 24.1, d is the transmission distance and s is the packet size in bits; sensors consume more transmission energy as the distance between them increases. Equation 24.2 gives the amount of energy used by the WBASN sensor node to transmit data.
Fnode = Ftx + Fretx + Fack + Facc (24.2)
Here Fnode is the total energy, Ftx is the transmission energy, Fretx is the retransmission energy, Fack is the energy used to send an acknowledgment (ACK) packet, and Facc is the energy consumed by the channel access procedures. In this study, the Nordic nRF2401A transceiver was used because of its very low power consumption. It communicates in the 2.4 GHz ISM (industrial, scientific, and medical) band and, when powered by AA or AAA batteries, has a battery life of several months to years. Its parameters are listed in Table 24.1.
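As a concrete illustration, the per-packet energy model of Eqs. (24.1)–(24.2), taken as written in the text, can be sketched in Python. The constants mirror Table 24.1; the packet size and distance are assumed values:

```python
# Energy model of Eqs. (24.1)-(24.2); constants follow Table 24.1
# (nRF2401-style values), while s and d below are assumptions.
F_ELEC = 14.6e-9    # transmitter electronics energy, J/bit (Ftx-elec)
F_AMP = 2.36e-10    # amplifier energy coefficient, J/bit/m^2 (Eamp)

def tx_energy(s_bits: int, d_m: float) -> float:
    """Eq. (24.1): FTX = (Famp + Felec) * s * d^2."""
    return (F_AMP + F_ELEC) * s_bits * d_m ** 2

def node_energy(f_tx: float, f_retx: float, f_ack: float, f_acc: float) -> float:
    """Eq. (24.2): total energy spent by a node to transmit data."""
    return f_tx + f_retx + f_ack + f_acc

# a 250-byte packet (2000 bits) sent over 0.5 m (both assumed)
e_tx = tx_energy(2000, 0.5)
```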
24.3.1.2 Proposed Algorithm
A total of eight sensors will be used in the proposed scheme. The coordinates of all the proposed scheme's sensor nodes are shown in Table 24.2. Figure 24.2
Table 24.1 Nordic parameters

Parameter                Value
Tx DC current            12.5 mA
Rx DC current            19 mA
Minimum voltage supply   1.6 V
Ftx-elec                 14.6 nJ/bit
Frx-elec                 33.6 nJ/bit
Eamp                     2.36e-10 J/bit
depicts the sensor deployment on the human body. The sensors are represented by circular dots, while the base station is represented by a rectangular box. The sensor nodes have equal power and computation capabilities, with an initial energy of 1.3 J. The threshold energy level is set to 1.1 J; a sensor is considered dead once its energy falls below this threshold. The sink node is located at the center of the body. Multi-hopping is used to save energy: as the distance between far-apart nodes is reduced, data is sent to the sink through a forwarder node, which collects the information from the other sensors and relays it to the sink. A new forwarder is chosen whenever a round ends. The sink knows the identity of the sensor nodes as well as their energy status. The forwarder node is chosen carefully, based on the cost function [8], to determine the next hop on the route and to decide which sensors can act as a forwarder: the forwarder is the node that minimizes the ratio of distance to the sink over residual energy. The cost function can be represented as
CFi = Ei / FRi (24.3)
Table 24.2 Sensor coordinates

Sensor number   X coordinate   Y coordinate
S1              0.35           0.2
S2              0.6            0.3
S3              0.15           0.4
S4              0.6            0.5
S5              0.5            0.65
S6              0.35           0.75
S7              0.7            0.9
S8              0.3            0.85
Fig. 24.2 Sensor deployment
where Ei denotes sensor i's distance from the sink and FRi denotes sensor i's residual energy.
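A minimal Python sketch of this forwarder selection; the sensor tuples and their values are illustrative assumptions, and the 1.1 J dead threshold follows the text:

```python
DEAD_THRESHOLD = 1.1  # J; nodes at or below this are considered dead (from the text)

def choose_forwarder(sensors):
    """Return the index of the forwarder minimizing CF_i = E_i / FR_i (Eq. 24.3),
    where E_i is the distance to the sink and FR_i the residual energy.
    Dead nodes are skipped; None is returned if no node is alive."""
    candidates = [(dist / energy, i)
                  for i, (dist, energy) in enumerate(sensors)
                  if energy > DEAD_THRESHOLD]
    return min(candidates)[1] if candidates else None

# (distance_to_sink, residual_energy) per sensor; values are illustrative
sensors = [(0.40, 1.30), (0.25, 1.25), (0.30, 1.05)]
forwarder = choose_forwarder(sensors)  # node 1: close to the sink, energy above threshold
```

Node 2 is skipped because its residual energy (1.05 J) is below the dead threshold, so the nearest sufficiently charged node wins.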
Algorithm
Intelligent Energy-Efficient Routing Protocol
Inputs: Neighbor list N, energy F, threshold th, sensor S
Output: Efficient energy management scheme
1. Start
2. Initialize neighboring nodes
3. Initialize F, sensors S
4. Initialize consistency to true
5. For each sensor S
6. Calculate the energy consumed FTX
7. FTX = computeFTX(n)
8. End For
9. maxftx = findMaxFTX(pmap)
10. IF maxftx < th THEN
11. Reassign the sensor energy
12. Amount of energy used by the WBASN sensor node to transmit data
13. Assign the value to Fnode
14. Consider both the transmission and retransmission energies
15. Both are less than the threshold level ‘th’
16. Find the resultant sensor nodes available in the limit
17. Calculate cost function ‘CF’ of the nodes
Fig. 24.3 Evaluation of energy consumption
18. IF CF <= FTX THEN
19. Notify (‘node is energy efficient and useful for transmission’)
20. Else
21. Notify (‘new node will be considered’);
22. END IF
23. END FOR
24. Stop
Network stability refers to the time until the first node dies, while network lifetime refers to the time until all of the sensor nodes have died (Fig. 24.3).
24.4 Simulation Setup
We compare the performance of intelligent energy efficient routing scheme (IEERS)
with distance aware relaying energy efficient (DARE) [9] and energy efficient scheme
for body area network (LAEEBA) [10]. Network simulator (NS2) is used to evaluate
our IEERS. The evaluation is carried out on a network with a size of 200 m * 200 m,
a packet size of 250 bytes, and a number of packets transmitted per session of 500.
The total available spectrum (BW) is set to range from 10 to 20 MHz. The bandwidth
available to nodes is limited to 2, 4, and 6 MHz. The traffic is constant bit rate, and the nodes are distributed uniformly at random. The parameters α, β, a, b, γ, θ are set to the values 2, 3, 0.5, 0.5, 0.5. The duration of the simulation is set to 900 s. The Tw, Tm, and Tassoc parameters are set to 5, 1, and 10.
Energy consumption: Each node consumes a certain amount of energy to complete its work. The average energy is measured as
E = (1/Ns) Σ e[n] (24.4)
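Eq. (24.4) is simply the sample mean of the per-node energy readings; a one-line Python sketch (the sample vector below is illustrative):

```python
def average_energy(e):
    """Eq. (24.4): E = (1/Ns) * sum of e[n] over n = 1..Ns."""
    return sum(e) / len(e)

samples = [2.8, 4.2, 6.3]   # illustrative sampled energy values e[n]
avg = average_energy(samples)
```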
Table 24.3 Simulation results for energy consumption

Number of WBAN nodes   Energy consumption (kJ)
                       IEERS   DARE    LAEEBA
25                     2.8     3.7     4.6
50                     4.2     5.7     6.8
75                     6.3     7.5     8.2
100                    7.5     11.5    14.5
125                    8.6     12.3    14.2
150                    9.3     13.3    16.2
175                    9.8     14.5    18.2
200                    11.3    16.5    19.3
225                    12.4    17.6    20.36
250                    13.6    18.3    22.36
From the above equation, the energy E is measured from the sampled energy vector e[n], where n = 1, 2, 3, ..., Ns.
Simulation Results: The energy consumption calculated using the three methods is shown in Table 24.3. The energy use of IEERS is thereby reduced by 35% compared with DARE and 51% compared with LAEEBA (Fig. 24.3).
Packet Delivery Ratio: The packet delivery ratio (PDR) is defined as the ratio of the total packets received at the destination nodes to the total packets transmitted by the source nodes; performance improves as the packet delivery ratio increases. It is mathematically denoted by the equation
Packet Delivery Ratio = (Total packets received by all destination nodes) / (Total packets sent by all source nodes)
Therefore, the IEERS packet delivery ratio is improved by 23% compared with DARE and 41% compared to LAEEBA (Fig. 24.4).
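The PDR definition above reduces to a single ratio; a Python sketch (the packet counts are assumed, echoing the 500-packets-per-session setup):

```python
def packet_delivery_ratio(received: int, sent: int) -> float:
    """PDR (%) = total packets received at destinations / total packets sent."""
    return 100.0 * received / sent

# e.g. 490 of the 500 packets per session delivered (counts assumed)
pdr = packet_delivery_ratio(490, 500)
```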
24.5 Conclusion
The wireless body area network is a common form of sensor network in a number of applications, including medical services, entertainment, gaming, fire-fighting, and military applications. One method of achieving energy efficiency is energy-efficient routing. This paper proposes an intelligent WBASN routing protocol for energy efficiency. The cost function of each sensor is first calculated to select the forwarder node, which collects data from the other sensors and relays it to the sink.
No. of WBAN nodes   Packet delivery ratio (%)
                    IEERS   DARE   LAEEBA
25                  98      96     93
50                  90      88     86
75                  84      80     76
100                 79      71     62
125                 70      62     58
150                 68      60     51
175                 62      53     42
Fig. 24.4 Evaluation of packet delivery ratio
After a round completes successfully, a forwarder node is chosen based on the least distance from the sink and the highest residual energy among all sensor nodes. As the simulation results show, the proposed scheme achieves a better energy consumption and delivery ratio.
References
1. Jain, K. L., & Mohapatra, S. (2019). Grid base energy efficient coverage aware routing protocol
for wireless sensor network. In Proceedings of the 2nd International Conference on Software
Engineering and Information Management, ACM (pp. 49–53).
2. Merzoug, M. A., Boukerche, A., Mostefaoui, A., & Chouali, S. (2019). Spreading aggrega-
tion: A distributed collision-free approach for data aggregation in large-scale wireless sensor
networks. Journal of Parallel and Distributed Computing, 125, 121–134.
3. Soltani, M., Hempel, M., & Sharif, H. (2014). Data fusion utilization for optimizing large-scale
wireless sensor networks. In 2014 IEEE International Conference on Communications, ICC
(pp. 367–372). IEEE.
4. Baskar, S., Periyanayagi, S., Shakeel, P. M., & Dhulipala, V. S. (2019). An energy persis-
tent range-dependent regulated transmission communication model for vehicular network
applications. Computer Networks. https://doi.org/10.1016/j.comnet.2019.01.027
5. Gong, D., & Yang, Y. (2014). Low-latency SINR-based data gathering in wireless sensor
networks. IEEE Transactions on Wireless Communications, 13(6), 3207–3221.
6. Khan, Z., Sivakumar, S., Philips, W., & Robertson, B. (2013). A QoS-aware routing protocol
for reliability sensitive data in hospital body area networks. In The 4th International Conference
on Ambient Systems, Networks and Technologies, Procedia Computer Science.
7. Macwan, S., Gondaliya, N., & Raja, N. (2016). Survey on wireless body area network.
International Journal of Advanced Research in Computer and Communication Engineering.
8. Tang, Q., Tummala, N., Gupta, & Schwiebert, L. (2005). TARA: thermal aware routing algo-
rithm for implanted sensor networks. In International Conference on Distributed Computing
in Sensor Systems (pp. 206–217). Springer Berlin Heidelberg.
9. Tauqir, A., Javaid, N., Akram, S., Rao, A., & Mohammad, S. (2013). Distance aware
relaying energy-efficient: Dare to monitor patients in multi-hop body area sensor networks.
In Broadband and Wireless Computing, Communication and Applications (BWCCA), Eighth
International Conference (pp. 206–213). IEEE
24 An Intelligent Energy-Efficient Routing Protocol 257
10. Ahmed, S., Javaid, N., Akbar, M., Iqbal, A., Khan, Z., & Qasim, U. (2014). Laeeba: Link aware
and energy efficient scheme for body area networks. In IEEE 28th International Conference
on Advanced Information Networking and Applications (pp. 435–440). IEEE.
11. Nadeem, Q., Javaid, N., Mohammad, S., Khan, M., Sarfraz, S., & Gull, M. (2013). Simple:
stable increased-throughput multi-hop protocol for link efficiency in wireless body area
networks. In 2013 Eighth International Conference on Broadband and Wireless Computing,
Communication and Applications (BWCCA) (pp. 221–226). IEEE.
Chapter 25
Enhanced Video Classification System
with Convolutional Neural Networks
Using Representative Frames as Input
Data
K. Jayasree and Sumam Mary Idicula
Abstract Convolutional neural networks are extensively used in video classification
systems. This work uses a video action recognition data set consisting of 13,320
videos grouped into 101 classes. The proposed method offers a promising way of
speeding up training by using representative frames from each video in place of
the entire video data set. The advantage is that the training time can be drastically
decreased with the compact input video data. The Inceptionv3 pretrained model
is used for classification, where the weights and architecture can be reused for
similar types of input. Instead of giving all the video frames to the classifier, the
proposed method selects representative frames only, still producing comparable
results. On the action recognition data set of 101 classes and 13,320 videos, full
frames give an F1-score of 0.717, while representative frames yield a comparable
0.704. When all the frames of the videos are given as input to the pretrained model,
the performance is marginally higher, but at a direct cost in training time. The main
objective of this research work is to enhance the computational efficiency of the
video classifier by using representative frames to speed up the process of training
on the data set.
25.1 Introduction
Earlier classification methods primarily involve two major stages: feature
extraction and classification. Feature extraction includes the extraction of
handcrafted features such as motion vectors, edge histograms, and group-of-frames
features. For classification, different classifiers like the support vector machine, HMM, Bayesian
K. Jayasree (B)
Department of Computer Engineering, Model Engineering College, Ernakulam, Kerala 682021,
India
e-mail: jayasreek@mec.ac.in
S. M. Idicula
Department of Computer Science, Muthoot Institute of Technology and Science, Ernakulam,
Kerala 682308, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_25
classifier, and neural networks are used. Nowadays, convolutional neural networks
(CNNs), which use the concepts of deep learning, are extensively applied to image
recognition problems. Hence, they can also be extended to video classification
problems, as videos comprise sequences of images.
The network acquires knowledge from the data in the form of its weights. These
learned values can be transferred to another neural network model instead of
training it from scratch. Training with transfer learning entails initializing all the
pre-final layer weights to the pretrained values and training just the last few layers
at a relatively low learning rate. Training a neural network from scratch takes a lot
of time; a pre-constructed network structure with pretrained weights makes the
process much faster. Thus, a large-scale training data set is not required once the
learning outcomes are transferred. Hence, pretrained convolutional neural
networks are commonly used for classifying video data.
The pretrained model is obtained by training on a large data set. The weights and
architecture of the pretrained model can be reused for further classification with
the same or a different data set. This process is known as transfer learning. The
selection of the pretrained model must be done with utmost care. Many pretrained
architectures are available in the Keras library. The ImageNet data set is
comfortably large enough for creating generalized models and is therefore widely
used for building various architectures. Keras is a widely used API that runs on top
of the TensorFlow platform and the Theano library. Keras applications are deep
learning models which can be used for prediction, feature extraction, and
fine-tuning. These models come with pretrained weights, which are downloaded
automatically when a model is instantiated.
Inceptionv3 is a pretrained deep convolutional neural network trained on the
ImageNet data set. In the proposed method, Inceptionv3 is used for classifying
the video data. Usually, all the video frames are given to the Inceptionv3 pretrained
model for training and testing. In our proposed method, instead of the full set of
frames, selected frames that best represent the data are given for training and
testing; the results are found to be very close. The representative frames are
created using a block-based adaptive threshold method. The proposed method,
which processes fewer frames, still produces results comparable to using the entire
set of frames as input to the classifier.
25.2 Related Works
In the past few years, one of the most crucial problems in the video analysis and
multimedia database area has been automatic content-based video classification.
The usual requirement for querying a video database is that the user must provide
an example clip so as to get similar clips as outputs. Searching for similar clips
across the entire database can be burdensome. To make searching more efficient,
the video data must be classified into different categories, so methods need to be
developed for categorizing video data into one of a set of predefined classes.
Ferman and Tekalp [1] have used a probabilistic framework
for the construction of descriptors based on the location, object, and events. In order
to categorize content domain and for extracting the relevant semantic information,
Bayesian belief networks and hidden Markov models (HMMs) are used at multiple
levels. A video indexing technique using decision tree is considered in [2] to classify
video in categories such as commercials, music, and sports. For efficient video
classification at large scale, a convolutional neural network is introduced in [3],
where the run-time performance is improved by a CNN architecture that
incorporates input at two spatial resolutions. For action recognition, Simonyan and
Zisserman [4] make use of a two-stream architecture for video classification that
helps in encoding static frames
and optical flow frames. In [5], compute-efficient video classification models are
built by processing fewer frames, thereby complementing the research on
memory-efficient video classification. In [6], a recurrent convolutional neural
network architecture extracts local features from image frames, and temporal
features between consecutive frames are used for video classification. In [7],
classification includes the extraction of features like frames and audio from video
data, accomplished using a convolutional recurrent neural network.
In the proposed system, we give representative frames extracted from the whole
set of frames as input to the convolutional neural network and compare the
performance with a system using all the frames of the video data for classification.
Our proposed method achieves a classification accuracy using only representative
frames from the UCF101 data set that is competitive with state-of-the-art methods.
25.3 Methodology
Convolutional neural networks are a widely accepted class of models for image
recognition tasks and can be extended to video classification problems. Nowadays,
pretrained convolutional neural network models are used for classifying video
data. Inceptionv3 is a deep convolutional neural network pretrained on data from
the ImageNet Large Scale Visual Recognition Challenge 2012, and it can
discriminate among 1000 different classes.
Usually, for video classification, all frames are extracted from each video, given to
the pretrained model, and the performance is evaluated. This paper proposes a
novel method to effectively classify video data into predefined classes. Here,
instead of using all images, only the frames that best represent the video are
taken into consideration, and only these representative frames are given to the
CNN. This provides performance comparable to that obtained when all frames are
used. The representative frames are created using the 'two-pass block-based
adaptive threshold technique' [8]. The schematic diagram for the proposed method
is given in Fig. 25.1.
The block-based adaptive threshold algorithm consists of two passes. During the
first pass, each frame of the video is segmented into four corner blocks and a
middle block of size 60 × 60 pixels. Each of these blocks is named BottomLeft [BL],
BottomRight [BR], TopLeft [TL], TopRight [TR], and Middle [MID] in accordance
Fig. 25.1 Schematic representation of video classification
with its position in the frame. For each block, we create a quantized 64-bin RGB
colour histogram. From the list of frames, every two consecutive frames are taken,
and the accumulated histogram-based dissimilarity S(f_m, f_n) is determined for all
frame pairs using the following formula.
S(f_m, f_n) = Σ_{i=1}^{r} B_i · S_p(f_m, f_n, i)    (25.1)
In this equation, B_i is the weighting factor predetermined for each block,
S_p(f_m, f_n, i) is the partial match obtained through the histogram matching
method, f_m and f_n are the consecutive frames, and r is the number of blocks.
The first pass yields a sequence of dissimilarity measures. In the second pass,
these computed values are used to detect the representative frames with a single
adaptive threshold based on the Dugad model [9]. To set the threshold adaptively,
the dissimilarity measures from consecutive frames are used: the procedure slides
a window of a given size over the sequence and considers the dissimilarity
measures within this window. The means and standard deviations are estimated
from the samples on the left side and on the right side of the middle sample of
the window. The
threshold is set as given in Eq. (25.2). If both of the conditions given below are
satisfied, the middle sample m_r marks a representative frame.
1. m_r is the maximum value in the window.
2. m_r satisfies the condition given in the following equation:
m_r > max(μ_left + σ_left, μ_right + σ_right)    (25.2)
where m_r is the dissimilarity value calculated for the two consecutive frames,
μ_left is the mean of the samples on the left side of the middle sample within the
window, μ_right is the mean of the samples on the right side, and σ_left and
σ_right are the corresponding standard deviations.
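The second pass can be sketched as follows, under the assumption that the window size is odd and the dissimilarity sequence is given as a list (the function and variable names are ours):

```python
from statistics import mean, stdev

def representative_indices(diss, window=9):
    """Dugad-style adaptive threshold: the middle sample m_r of a sliding
    window marks a representative frame when (1) it is the maximum in the
    window and (2) m_r > max(mu_left + sigma_left, mu_right + sigma_right)."""
    half = window // 2
    keeps = []
    for c in range(half, len(diss) - half):
        left, right = diss[c - half:c], diss[c + 1:c + half + 1]
        m_r = diss[c]
        if m_r == max(diss[c - half:c + half + 1]) and m_r > max(
                mean(left) + stdev(left), mean(right) + stdev(right)):
            keeps.append(c)
    return keeps
```

An isolated spike in the dissimilarity sequence is kept; a flat sequence yields no representative frames.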
Video classification is done using Inceptionv3 pretrained convolutional neural
network model. UCF101 action recognition data set is used as input to the classifier.
25.4 Experiments and Results
Experiments are conducted on UCF101, a video action recognition data set
containing 101 classes and 13,320 videos. To the Inceptionv3 model architecture,
we add a fully connected last layer of 101 nodes for the 101 classes. The last layer
is SoftMax-activated, and we use the ADAM optimizer to train this deep model.
We first trained the model for 50 epochs; in this stage, only the last-layer weights
are updated, keeping all the other layers frozen. After this, we fine-tuned the
model for 250 epochs with all layers trainable. We obtained a best accuracy of
72.14% with representative frames as input.
The standard UCF101 data set gave an F1-score of 0.717 when classified on 101
classes of 13,320 videos with full frames, and 0.704 when classified with
representative frames. From the results obtained, it is evident that using
representative frames as input to the classifier gives a result almost the same as
that obtained using all frames of all videos, while saving considerable training
time.
Table 25.1 gives the comparison of the F1-scores obtained for the different models
with all frames taken into consideration for classification. Here, with full frames,
the Inceptionv3 model gives an F1-score of 0.717.
Table 25.2 gives the comparison of the F1-scores obtained for the different models
with representative frames taken into consideration for classification. Here, with
representative frames, the Inceptionv3 model gives an F1-score of 0.704. The
difference is small, giving approximately the same performance.
Table 25.1 Comparison of various classification models without representative frames

No.  Model        Recall  Precision  F1-score  No. of parameters  Model size  Videos per min
1    ResNet 50    58.47   67.14      0.625     42,548,524         175.54 MB   15
2    MobileNet    57.15   66.68      0.615     4,294,462          17.85 MB    41
3    VGG 16       62.49   57.37      0.598     15,145,245         60.45 MB    23
4    VGG 19       65.99   63.71      0.648     20,448,235         85.41 MB    19
5    Inceptionv3  72.85   70.63      0.717     23,784,238         94.78 MB    21
Table 25.2 Comparison of various classification models—with representative frames

No.  Model        Recall  Precision  F1-score  No. of parameters  Model size  Videos per min
1    ResNet 50    54.18   59.47      0.567     42,548,524         175.54 MB   22
2    MobileNet    52.12   58.15      0.549     4,294,462          17.85 MB    53
3    VGG 16       61.49   54.28      0.576     15,145,245         60.45 MB    29
4    VGG 19       64.91   61.58      0.632     20,448,235         85.41 MB    26
5    Inceptionv3  71.45   69.43      0.704     23,784,238         94.78 MB    32
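The reported F1-scores can be checked against the precision and recall columns of Tables 25.1 and 25.2, since F1 is the harmonic mean of the two:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; inputs in percent,
    result expressed as a ratio, as in Tables 25.1 and 25.2."""
    return 2 * precision * recall / (precision + recall) / 100

# Inceptionv3 rows: full frames and representative frames.
full = round(f1(70.63, 72.85), 3)   # 0.717
rep = round(f1(69.43, 71.45), 3)    # 0.704
```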
25.5 Conclusion
Convolutional neural networks are a prominent class of models for video
classification tasks. The performance of the pretrained model taking all the frames
of the videos as input was comparable to that of the pretrained model taking
representative frames alone as input. Our proposed method of using representative
frames as input to the classifier performs well with a significant reduction in
computational time and hence expedites the training process.
References
1. Ferman, A. M., & Tekalp, A. M. (1999). Probabilistic analysis and extraction of video content.
In Proceedings of ICIP (Vol. 2, pp. 91–95).
2. Yuan, Y., Song, Q.-B., & Shen, J.-Y. (2002). Automatic video classification using decision tree
method. In Proceedings of Machine Learning and Cybernetics (pp. 1153–1157).
3. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-
scale video classification with convolutional neural networks. In Proceedings of International
Computer Vision and Pattern Recognition (CVPR 2014). IEEE.
4. Simonyan, K., & Zisserman, A. (2014). Two stream convolutional networks for action
recognition in videos. CoRR, abs/1406.2199, 1–8.
5. Bhardwaj, S., Srinivasan, M., & Khapra, M. M. (2019). Efficient video classification using
fewer frames. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition.
6. Xu, Z., Hu, J., & Deng, W. (2016). Recurrent convolutional neural network for video
classification. In 2016 IEEE International Conference on Multimedia and Expo (ICME). IEEE.
7. Prasanna Lakshmi, K., Solanki, M., Jyothi, & Bhargav, A. (2020). Video genre classification
using convolutional recurrent neural networks. International Journal of Advanced Computer
Science and Applications, 11(3).
8. Gomathi, V. (2005). Content based video indexing and retrieval. M.Tech thesis, Dept. of
Computer Science and Engineering, Indian Institute of Technology Madras.
9. Yusoff, Y., Christmas, W., & Kittler, J. (2000). Video shot cut detection using adaptive thresh-
olding. In Proceedings of the British Machine Vision Conference 2000, British Machine Vision
Association and Society for Pattern Recognition, Bristol, UK, 11–14 Sept 2000 (p. 37).
Chapter 26
Text Recognition from Images Using
Deep Learning Techniques
B. Narendra Kumar Rao, Kondra Pranitha, Ranjana, C. V. Krishnaveni,
and Midhun Chakkaravarthy
Abstract One of the most significant methods utilized in the deep learning approach
is text recognition. Text recognition is now a very significant activity that is utilized in
many applications of current gadgets to recognize images in a detailed manner. Auto-
matic number plate recognition, for example, is an image processing approach that
detects the vehicle’s number (license) plate. The automatic number plate recognition
system (ANPR) is a key feature that is used to manage traffic congestion. The goal of
ANPR is to devise a method for automatically identifying permitted vehicles using
vehicle numbers. Automatic number plate recognition (ANPR) is utilized in a variety
of applications, including traffic control, vehicle tracking, and automatic payment
of tolls on roads and bridges, as well as monitoring systems, parking management
systems, and toll collecting stations. The established approach first detects the
vehicle and captures a picture of it. After that, the number plate region of the car
is localized using a neural network, and the image is segmented. Using a character
recognition approach, the characters are retrieved from the plate. The results,
together with the time stamp, are then saved in the database. The system is
implemented in Python, and the results are tested on real pictures.
B. Narendra Kumar Rao (B)
Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India
e-mail: narendrakumarraob@gmail.com
K. Pranitha
Computer Science and Engineering, B V Raju Institute of Technology, Narasapur, Medak,
Telangana, India
Ranjana
Department of IT, Sairam Institute of Technology, Chennai, Tamil Nadu, India
C. V. Krishnaveni
SKR & SKR GCW(A), Kadapa, India
M. Chakkaravarthy
Faculty of Computer Science and Multimedia, Lincoln University College, Kota Bharu, Malaysia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_26
26.1 Introduction
Text recognition is one of the important methods used in deep learning.
Nowadays, text recognition performs a most important task in many applications
of modern devices, where it is used to identify the contents of an image in a
detailed manner. For example, an automatic number plate recognition system is
used to recognize the number plate of a vehicle. The method presented here gives
good results in recognizing text from images with many lines, images of roads,
and car plate numbers.
To eliminate inadequacies in CCTV camera surveillance, an automatic number
plate recognition system is implemented [13]. The ANPR system is widely utilized
in developed nations such as the United States, the United Kingdom, and Germany.
It has been in use for a long time, but became particularly essential in the 1990s
due to growth in the number of cars. The data acquired from the license plate
are generally utilized by law enforcement agencies for traffic monitoring, parking,
motorway road tolling, access control, border control, travel time measurement for
toll booths, etc. The recognition problem is generally subdivided into five parts:
(1) image capturing, i.e., capturing the image of the license plate; (2) preprocessing
the image; (3) localizing the license plate; (4) character segmentation, i.e.,
identifying the characters on the plate; (5) optical character recognition. It helps
to fine-tune the system with parameters such as the number of characters in the
license plate and the text luminance level. The problem can be simplified based
on the application in a specific country. For example, in India the standard is to
print the license plate number in black on a white background for private vehicles,
while a yellow background is used for commercial vehicles (Fig. 26.1).
The image binarization mechanism is used to convert an image to black and white.
Here, a threshold is used to classify certain pixels as white and others as black. The
issue is finding the exact threshold value for a particular image; sometimes it is
impossible to select a single optimal threshold. To overcome this problem, an
adaptive threshold is used, and this mechanism of selecting the threshold is called
automatic threshold selection.
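A minimal sketch of mean-based adaptive thresholding on a grayscale image stored as a list of rows; the neighbourhood size and offset parameters are illustrative choices, not taken from the chapter:

```python
def adaptive_binarize(image, block=3, offset=0):
    """Adaptive threshold: compare each pixel against the mean of its
    (2*block+1) x (2*block+1) neighbourhood instead of one global value,
    so uneven lighting does not swamp dark characters."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - block), min(h, y + block + 1))
                    for i in range(max(0, x - block), min(w, x + block + 1))]
            t = sum(vals) / len(vals) - offset
            out[y][x] = 255 if image[y][x] > t else 0
    return out
```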
Many operational algorithms are used for edge detection [4], among them Canny,
Canny–Deriche, differential, Sobel, and Prewitt. The Hough transform is a feature
extraction technique initially used for line detection; the method can also find the
position of parametric shapes such as circles or ellipses.
Blob detection is a method used to detect points or regions that differ in
brightness or color from their surroundings [5]. This method is
Fig. 26.1 Block diagram of
ANPR model
used to find complementary regions that cannot be detected by edge or corner
detection algorithms.
After the number plate is detected, the selected characters are tested in the
further processing stages. For plate segmentation, many methods are used to
recognize the characters; currently, image binarization and connected component
analysis (CCA) are among the methods used for character segmentation. Characters
are isolated using character segmentation, and it is an important prerequisite for
performing character recognition with good accuracy: errors in character
segmentation can make character recognition impossible. Vertical and horizontal
projections give good results for segmentation.
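Vertical-projection segmentation can be sketched as follows, assuming a binarized plate image with ink pixels marked 1; characters are split wherever a column contains no ink (the function name is ours):

```python
def segment_columns(binary):
    """Vertical-projection character segmentation: sum each column of a
    binary plate image and cut at empty columns, returning the
    [start, end) column range of each character."""
    w = len(binary[0])
    proj = [sum(row[x] for row in binary) for x in range(w)]
    segments, start = [], None
    for x, v in enumerate(proj):
        if v and start is None:
            start = x
        elif not v and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, w))
    return segments
```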
Character recognition converts the image text into editable text. It is a process of
transforming data from a bitmap representation into a form of representation apt
for computers. Here, character recognition should be invariant toward the user's
font type or deformations caused by skew.
An artificial neural network model [6–9] is used to differentiate the characters. It
contains three layers: input, hidden, and output. The input layer receives the data
for decision-making; the hidden layer computes more complex relations, and the
output layer provides the final result. A neural network is mathematically termed
an artificial neural network (ANN), and many training algorithms exist; here, the
ANN is trained using the feed-forward backpropagation (BP) algorithm, which is
suitable where a processing time of 0.06 s is essential.
Template matching is used to recognize fixed-size letters and to find objects in
face detection and medical image processing [10]. There are two types of template
matching in this context: feature-based matching and template-based matching.
When the template picture includes prominent characteristics, feature-based
matching is advantageous; otherwise, template-based matching is preferable. To
achieve an 85% character recognition rate, statistical feature extraction is used
here [11]. To adjust characters of varying size, we use a linear normalization
algorithm.
The objectives include:
To review other algorithms which recognize the number plates of vehicles.
To propose a method that solves the automated number plate recognition problem.
To propose an algorithm used to evaluate and test the process, and to present the
evaluation results.
An automatic number plate recognition system is used to overcome the drawbacks
and deficiencies of CCTV camera surveillance [12, 13]. Automatic number plate
identification does well in recognizing text from images with several lines, images
with road names, and automobile plate numbers.
The text recognition problem is generally subdivided into five parts:
Image acquisition, i.e., capturing the image of the license plate [14].
Preprocessing the image, i.e., normalization, adjusting the brightness and contrast
of the image.
Localizing the license plate.
Character segmentation, i.e., identifying and locating the individual symbol images [15].
Optical character recognition.
Recognizing the model for an image analysis procedure helps us understand how
images are represented, including optical, analog, and digital images. For image
segmentation, many image types are used, such as binary images, grayscale
images, and color and multi-spectral images.
The fast expansion of urban and national road networks in recent years has
necessitated effective road traffic monitoring [16] and management. The increased
usage of vehicles produces societal concerns such as accidents, traffic congestion,
and, as a result, traffic pollution. The process of detecting and recognizing the
vehicle number plate or license plate is known as number plate recognition [17].
We employ image processing techniques here to extract the car license plate from
digital photographs. Number plate recognition is made up of two parts: a camera
that captures vehicle number plate photographs, and software that extracts the
numbers from the license plate using a character recognition tool that converts
pixels into readable letters and numerical data. Vehicle tracking, traffic monitoring,
automatic payment of tolls on roads and bridges, surveillance systems, toll
collection stations, and parking management systems are just a few of the uses.
The algorithm under consideration is divided into four steps:
i. Vehicle image acquisition
ii. Number plate extraction
iii. Character segmentation and
iv. Character recognition.
We employ a few procedures to recognize a text number plate. The first stage is to
capture a picture of the car. Capturing an image is not an easy problem:
photographing a vehicle moving in real time in such a way that the number plate
is not missed is a difficult task. The fourth step's success is determined by how well
the second and third steps locate the vehicle number plate and separate each
character.
26.2 Related Work
The detection of a number plate region is the initial stage in this procedure. This is
accomplished by incorporating algorithms capable of detecting a rectangular region
of the number plate in an original image. There are four key phases in the detection
and recognition process.
i. Preprocessing
ii. Localization
iii. Segmentation
iv. Recognition.
The system acquires pictures of a variety of vehicles, which are then fed into the
computer code, which converts them to grayscale images. To extract the number
plate and its characters, the contrast, brightness, and gamma are adjusted to their
optimal values. The result is recorded in the software, which verifies on each loop
whether it has all the digits of the number plate. When the results fulfill the
specified requirements, the computer displays the number and ends the program
execution so that the next image may be analyzed. The following steps are
involved in developing the system.
The input image includes a lot of colors; therefore, it is preprocessed to improve
its quality and get it ready for the following steps. Because the image contains
varied colors, the system uses the NTSC standard to transform the RGB images to
grayscale images.
Gray = 0.3 Red + 0.58 Green + 0.11 Blue
In the following part, the gray picture is filtered with a median filter to reduce
noise while maintaining image clarity. We utilize a nonlinear filter that replaces
each pixel with the median of the pixel values in its neighbourhood. The image is
then categorized into groups by

Total number of groups = Height / Candidate region extraction
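The grayscale conversion and the median filtering step can be sketched together; the weights follow the chapter's equation (close to the NTSC luma weights 0.299/0.587/0.114), and the 3 × 3 window size is an illustrative choice:

```python
def to_gray(rgb):
    """Weighted RGB-to-gray conversion with the chapter's coefficients."""
    return [[0.3 * r + 0.58 * g + 0.11 * b for r, g, b in row] for row in rgb]

def median3(gray):
    """3x3 median filter: replace each interior pixel by the median of its
    neighbourhood, suppressing isolated noise while keeping edges."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            vals = sorted(gray[j][i] for j in (y - 1, y, y + 1)
                          for i in (x - 1, x, x + 1))
            out[y][x] = vals[4]  # median of 9 values
    return out
```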
26.2.1 Localization
It is used to recognize the plate region in an image. The main goal is to locate
the car plate region using pictures of the vehicle captured by the camera or video.
The image’s quality is a crucial element of this approach, and preparing the image
helps to improve it. In general, number plates appear to have a large amount of
free space within the image. Because the numerals and letters are in the same row,
there are frequent variations in intensity horizontally. Because the rows that will hold
the number plate are expected to display significant fluctuations, this allows for the
detection of changes in horizontal intensity. The difference between the letters and
the backdrop is the cause of this extreme fluctuation. The Hough transform is now
applied to the binary pictures that have been scaled. The pictures are subjected to
edge detection before being input into the Hough transforms program. Edges help
to describe boundaries and are hence a crucial element to consider while processing
a picture. Big O notation can be used to represent the complexity of a Canny edge
operation:
O_cN = O_N · O_M · O_C^(C+1)
270 B. Narendra Kumar Rao et al.
where O_N, O_M, and O_C denote the complexities of the first, second, and third
loops, respectively, inside the code. Edges in an image are the places where there
is a rapid change in intensity from one pixel to the next.
Detecting edges lowers the quantity of data in a picture, which aids in filtering out
unnecessary data while maintaining the image’s structural characteristics. The Hough
transformation is a common image analysis method that allows us to detect global
patterns in images by recognizing local patterns in a revised parameter space.
ρ = x cos θ + y sin θ
It is most beneficial when looking for patterns in regions that are sparsely
digitized and have "holes", and when the images are noisy, especially when a
straight line within the license plate is to be discovered. The main goal of this
method is to find curves that can be parameterized as straight lines within a
reasonable parameter range. The vehicle area is identified using the Hough
transform [18].
The following is a depiction of the license plate region:
C_region = { 0, if the pixel is a black pixel in the plate region (row × column hit); 255, otherwise }
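The Hough line parameterization ρ = x cos θ + y sin θ described above can be sketched as a minimal accumulator-voting procedure. This is an illustrative NumPy sketch, not the chapter's exact implementation; the resolution of 180 angle bins is an assumed choice.

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Vote in (rho, theta) space for every foreground pixel, using
    rho = x*cos(theta) + y*sin(theta). Peaks in the accumulator mark lines."""
    h, w = binary.shape
    diag = int(np.ceil(np.hypot(h, w)))          # maximum possible |rho|
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1  # shift rho to a non-negative index
    return acc, thetas, diag
```

A horizontal edge row produces a single strong peak at θ = π/2, which is exactly the kind of horizontal-intensity structure the plate localization step looks for.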
26.2.2 Segmentation
After identifying the number plate region in an image/video, this is the following
step to segment characters. It is one of the most essential procedures in automatic
number plate identification, when all phases are taken into account. If segmentation
fails, a character may be incorrectly united or separated into two halves. If only one-
row plates are assumed, segmentation can be accomplished by identifying character
boundaries.
The acquired segments are improved in the second part of the segmentation. The
segment phase of a plate comprises not just characters, but also unwanted elements
such as dots and superfluous space on the character’s borders. These elements must
be removed, leaving only the character. By identifying the spaces in its horizontal
projection, we can partition it. We regularly apply an adaptive threshold filter to
enhance the plate region before the segmentation phase. With non-uniform lighting,
the adaptive threshold helps distinguish dark foreground from light background.
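A minimal sketch of such an adaptive (local-mean) threshold, assuming a NumPy grayscale array. The block size and offset constant `c` are illustrative choices, not values from the chapter:

```python
import numpy as np

def adaptive_threshold(gray, block=15, c=5):
    """Binarize with a per-pixel threshold: a pixel becomes foreground (0) when it
    is darker than the mean of its local block minus c, else background (255)."""
    pad = block // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    out = np.empty_like(gray, dtype=np.uint8)
    h, w = gray.shape
    for i in range(h):
        for j in range(w):
            local_mean = padded[i:i + block, j:j + block].mean()
            out[i, j] = 0 if gray[i, j] < local_mean - c else 255
    return out
```

Because the threshold follows the local mean, a dark character stays foreground even when the background brightness drifts across the plate.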
26 Text Recognition from Images Using Deep Learning Techniques 271
26.2.3 Recognition
The output is compared and identified against databases with completely distinct
algorithmic rules. On the acquired image, grayscale segmentation is performed. We
had to conduct some image preprocessing before creating the model. Following that,
take the following steps:
a. Binarization.
b. Inversion of the character intensities.
Register the character's connected components, as well as the smallest bounding
rectangle comprising these connected components. The picture size is normalized to
15 × 15 pixels. For each character, the intensity values are stored using the algorithm
rule listed below. We then compute the segmented characters' matching score against
the stored character templates using the algorithmic rule that follows. We compare
the pixel values of the segmented character matrix with the template matrix, adding
1 to the matching score for each match and subtracting 1 for each mismatch, applied
to all 225 pixels. For each template, a match score is generated, and the best score
yields the recognized character. For recognition, character sets are employed.
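The ±1 matching-score rule over the 15 × 15 (225-pixel) normalized character can be sketched as follows; the template dictionary here is a hypothetical stand-in for the stored character templates:

```python
import numpy as np

def match_score(segment, template):
    """+1 for each matching pixel and -1 for each mismatch over all 225 pixels."""
    assert segment.shape == template.shape == (15, 15)
    matches = np.count_nonzero(segment == template)
    return matches - (segment.size - matches)

def recognize(segment, templates):
    """Return the character whose template gives the best matching score."""
    return max(templates, key=lambda ch: match_score(segment, templates[ch]))
```

A perfect match scores +225; a fully inverted character scores −225, so the best score directly identifies the recognized character.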
26.2.4 Datasets
The GTI dataset was created on roads in Madrid, Brussels, and Turin. The
data were gathered from video sequences captured by a camera mounted on the
vehicle’s front. On one hand, there are 3425 positive rear-view pictures, which contain
the vehicles’ number plates and are captured from a range of viewpoints [19]. On
the other hand, there are 3900 negative photos (taken from road sequences) that
contain no vehicles and hence no number plates. To bring the totals to 4000 positive
and 4000 negative images, a limited number of photographs from the Caltech and
TU Graz-02 databases were added.
The Markus Weber Cars dataset was taken at a parking lot of the California Institute
of Technology by Markus Weber. It is not a large dataset, as it contains only 126
pictures with a resolution of 896 × 592 pixels, all of which are stored in JPG format
[20]. This dataset contains only pictures taken from the back, and it includes only
saloon vehicles, not trucks or buses. All of this dataset’s pictures were captured under
the same conditions, namely sunny days. Images shot at night, in low illumination,
in rain, or in shadow are not included in the collection. In addition to all of these
limitations, the dataset has no tilt, rotation, or significant translation.
26.3 Proposed Work
Image acquisition, license plate extraction, character segmentation, and character
identification are the four processes of a typical ANPR system.
26.3.1 Image Acquisition
The first stage in this system is image acquisition, which involves taking a picture
using a digital camera connected to a computer. Because the pictures were recorded
in RGB format, they could be processed further for number plate extraction. The
database system stores the car owner’s personal information as well as a few plate
vehicle pictures, abbreviations, and acronyms.
26.3.2 Image Processing
The RGB picture is captured. Many factors impact the acquired image, such as optical
system distortion, system noise, underexposure, or excessive relative motion of the
camera or vehicle. The consequence is a degraded captured
vehicle image and an unfavorable influence on subsequent image processing. As a
result, prior to the primary image processing, the acquired image must be prepro-
cessed, which includes converting RGB to gray scale, noise removal, and border
enhancement for brightness.
26.3.3 Plate Localization
The primary purpose of vehicle number plate localization is to determine the plate
region. Number plates are roughly rectangular, and the region props function is
provided by the MATLAB toolbox. It computes a collection of attributes for each
labeled region in the matrix. In this case, we used the bounding box to estimate
the picture region’s attributes. After the labeling of the connected components, the
plate area is extracted from the input picture, and the localization of the number plate
is complete.
In an ANPR system [21], number plate segmentation is critical. After the region has
been grown, the most important thing to establish is the criteria for a good region.
The following procedure is used to determine which picture pixels fulfill the criteria:
at every location where such a pixel is found, its neighbors are evaluated, and
if any of them meet the criteria, both pixels are considered to be in the same
region. We extract individual characters and numbers from the picture using vertical
and horizontal scanning techniques.
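The vertical-scanning idea above can be sketched with a projection profile: columns of the binarized plate that contain no foreground pixels mark the gaps between characters. This is an illustrative NumPy sketch, assuming a binary image with foreground pixels set to 1:

```python
import numpy as np

def segment_characters(binary):
    """Split a binarized plate into character sub-images using the vertical
    projection: zero-sum columns are treated as gaps between characters."""
    col_sums = binary.sum(axis=0)
    chars, start = [], None
    for j, s in enumerate(col_sums):
        if s > 0 and start is None:
            start = j                      # character begins
        elif s == 0 and start is not None:
            chars.append(binary[:, start:j])  # character ends at a gap
            start = None
    if start is not None:                  # character touching the right edge
        chars.append(binary[:, start:])
    return chars
```

The same idea applied along rows (horizontal scanning) isolates the text line before the per-character split.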
26.3.4 Character Recognition
This is the most crucial and fundamental step of the ANPR system. It demonstrates the
procedures needed to sort and then interpret the individual characters. The retrieved
characteristics are used to classify the data. By using statistical, syntactic, or neural
methodologies, features are arranged. For recognition of letters and characters in
the plate, we use distinct strategies. The identification process is completed by
calculating the similarity of features. A second identification is made for similar
characters using a feature-point matching approach. In another approach, once the
line is extracted from the plate image, individual characters can be separated
column-wise using the line separation process; individual characters are then split
and kept in separate variables. The extracted characters from the number plate must
then be matched against the character database. The next phase is template matching. Template
matching is a powerful character recognition method. The image of the character is
compared to our database, and the best resemblance is chosen. Another approach for
character identification is optical character recognition (OCR), which compares each
individual character to the whole alphanumeric database. To match individual characters,
the OCR uses a correlation mechanism, and the number is eventually recognized
and saved in string format in a variable. After that, the string is compared against
the vehicle authorization database, and the resulting indicators are presented. Every
character, A–Z and 0–9, will have its own
template, as shown in the diagram (Fig. 26.2).
Fig. 26.2 Database of
templates
The suggested technique is depicted in the block diagram. In this technique, the
input picture is converted to gray scale, and morphological scanning is applied to
the grayscale image, which checks the image and identifies the number plate section
of the vehicle. The extracted number plate is then passed to split segmentation,
which divides every single character and compares each character to the current
dataset. The maximum correlation is identified, and the matched characters are
combined to form the final output.
26.4 Proposed Method
In the previous section, we discussed an overview of the model. In this section, we
present the complete architecture that we are using for the project. Here, we use a
convolutional neural network.
26.4.1 Convolution Neural Network
In deep learning, a convolutional neural network (CNN/ConvNet) is a type of deep
neural network used to analyze visual imagery. It is built around the convolution
technique. Convolution is a mathematical operation that takes two functions and
produces a third function that shows how the shape of one is modified by the other (Fig. 26.3).
Let’s go through the basics of what an image is and how it is portrayed before
we get into how CNN works. A grayscale image is nothing more than a single-plane
matrix of pixel values, whereas an RGB image is a three-plane matrix of pixel values.
To learn more, take a look at this diagram (Fig. 26.4).
Convolutional neural networks are made up of many layers of artificial neurons.
Here, the image is converted to gray scale before being processed by the convolutional
neural network (Fig. 26.5).
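The convolution operation described above can be illustrated with a direct (valid-mode) sliding-window implementation in NumPy; CNN frameworks use optimized versions of this same idea:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image with a kernel:
    slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Note that without padding, a k × k kernel shrinks each spatial dimension by k − 1.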
Fig. 26.3 Convolution neural network
Fig. 26.4 Multiple layers of
artificial neurons
Fig. 26.5 2D convolution
for pooling
26.4.2 Pooling
The pooling layer, like the convolutional layer, is responsible for shrinking the
convolved feature’s spatial size. By decreasing the size, the processing power required
to process the data is reduced. Average pooling and maximum pooling are the two
forms of pooling. So far, I have just worked with Max Pooling and haven’t run across
any issues.
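Max pooling over non-overlapping windows, as described above, can be sketched as follows (a 2 × 2 window, the common default, halves each spatial dimension):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block,
    shrinking each spatial dimension by the pooling factor."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]  # drop ragged edges
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

Average pooling is obtained by replacing `max` with `mean` in the final reduction.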
26.4.3 Padding
Padding is a term used in convolutional neural networks to refer to the number of
pixels added around an image during CNN kernel processing. With zero padding,
every added pixel has the value 0; for example, if the zero padding is set to one, the
image gets a one-pixel border with a pixel value of zero (Fig. 26.6).
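Zero padding of p pixels, as described, simply surrounds the image with a border of zeros:

```python
import numpy as np

def zero_pad(image, p=1):
    """Add a border of p pixels with value 0 around the image."""
    return np.pad(image, p, mode="constant", constant_values=0)
```

With padding p, kernel size k, and stride 1, the output width is n + 2p − k + 1, so choosing p = (k − 1) / 2 for odd k preserves the input size ("same" padding).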
Fig. 26.6 Padding = same
26.5 Implementation
26.5.1 Models of CNN
Training CNNs is used to evaluate picture-classification performance; several
alternative CNN-based classification models are generated. I will be using the Keras
framework to create our model. Some of the models used are as follows:
i. CNN with 1 convolutional layer
ii. CNN with 3 convolutional layers
iii. CNN with 4 convolutional layers.
To optimize the classifier, the original training data (60,000 images) are divided
into 80% training (48,000 images) and 20% validation (12,000 images), while the
test data (10,000 images) are kept aside to finally evaluate the model’s accuracy on
data it has never seen. This allows me to determine whether I am over-fitting on the
training data: whether I should drop the learning rate and train for more epochs if
validation accuracy is greater than training accuracy, or whether I should stop
training if validation accuracy falls below training accuracy.
26.5.2 OCR Using OpenCV and CNN
Our strategy will not be to recognize everything in an image in one go, but rather to
segment the image into characters, send these segmented characters to CNN to be
recognized, and then arrange the detected characters to duplicate the text shown in
the image (Fig. 26.7).
The method is illustrated in the diagram.
Fig. 26.7 OCR using OpenCV and CNN
26.6 Conclusion
We have successfully developed a fully working number plate recognition system
that uses a convolutional neural network in association with character recognition
and character segmentation, with license plate detection used to locate the number
plate. The majority of the number plate recognition system’s components have been
successfully deployed. Our proposed approach works in general situations where the
distance between the camera and the vehicle is unrestricted and weather conditions
are unfavorable. However, when the distance between the camera and the vehicle
remains constant, the performance of our system improves. To enhance the segmentation
stage, we collect successfully trained data. Other prominent methods, such
as artificial neural networks, can help enhance optical character recognition. I intend
to create an automatic number plate recognition system with its own database, user
interface, and authorization system based on number plate identification.
References
1. Cheng, C., Koschan, A., Chen, C. –H., Page, D. L., Abidi, M. A. (2012). Outdoor scene image
segmentation based on background recognition and perceptual organization. IEEE Transactions
on Image Processing, 21(3), 1007–1019
2. Mehmood, S., Cagnoni, S., Mordonini, M., & Khan, S. A. (2012). An embedded architecture
for real-time object detection in digital images based on niching particle swarm optimization.
Journal of Real-Time Image Processing, 1–15
3. Kurugollu, F., Sankur, B., & Harmanci, A. E. (2002). Image segmentation by relaxation using
constraint satisfaction neural network. Image and Vision Computing, 20(7), 483–497
4. Sarfraz, M. S., et al. (2011). Real-Time automatic license plate recognition for CCTV forensic
applications. Journal of Real-Time Image Processing.
5. Zhuang, D., & Zang, W. (2010) Content-Based image retrieval based on integrating region
segmentation and relevance feedback. In International Conference on Multimedia Technology
(ICMT)
6. Ong, S. H., Yeo, N. C., Lee, K. H., Venkatesh, Y. V., & Cao, D. M. (2002). Segmentation of
color images using a two stage self-organizing network. Image and Vision Computing, 20(4),
279–289.
7. Navon, E., Miller, O., & Averbuch, A. (2005). Color Image segmentation based on adaptive
local thresholds. Image and Vision Computing, 23(1), 69–85.
8. Zhuge, Y., Udupa, J. K., & Saha, P. K. (2006). Vector scale-based fuzzy-connected image
segmentation. Computer Vision and Image Understanding, 110(2), 177–193.
9. Crevier, D. (2008). Image segmentation algorithm development using ground truth image data
sets. Computer Vision and Image Understanding, 112(2), 143–159.
10. Lalimi, M. A., Ghofrani, S., & McLernon, D. (2012). A vehicle license plate detection method
using region and edge based methods. Computers & Electrical Engineering
11. Wang, S.-Z., & Lee, H.-J. (2009). A cascade framework for real-time statistical plate recognition
system. IEEE Transactions on Information Forensics Security, 2(2), 267–282. Kulkarni, P.,
Khatri, A., Banga, P., & Shah, K. (2009). Automatic number plate recognition (ANPR). In
Radioelektronika, 19th International Conference.
12. Naito, T., Tsukada, T., Kozuka, K., & Yamamoto, S. Robust license-plate recognition method
for passing vehicles under outside environment. IEEE Transactions on Vehicular Technology,
49,(6).
13. Janakiramaiah, B., Kalyani, G., & Jayalakshmi, A. (2021). Automatic alert generation in a
surveillance systems for smart city environment using deep learning algorithm. Evolutionary
Intelligence, 14, 635–642.
14. Lum, C., Hibdon, J., Cave, B., Koper, C. S., & Merola, L. (2011). License plate reader (LPR)
police patrols in crime hot spots: An experimental evaluation in two adjacent jurisdictions.
Journal of Experimental Criminology, 321–345.
15. Chen, J. -J., Su, C. -R., Grimson, W. E. L., Liu, J. L., & Shiue, D. -H. (2012). Object segmentation
of database images by dual multiscale morphological reconstructions and retrieval applications.
IEEE Transactions on Image Processing, 21(2), 828–843.
16. Suresh, K. V., Mahesh Kumar, G., Rajagopalan, A. N. (2007) Super resolution of license plates
in real traffic videos. IEEE Transactions on Intelligent Transportation System, 8(2), 321–331.
17. Sengur, A., & Guo, Y. (2011). Color texture image segmentation based on neutrosophic
set and wavelet transformation. Computer Vision and Image Understanding, 115(8),
1134–1144.
18. Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern
Recognition, 13(2), 111–122.
19. Peng, B., Zhang, L., & Zhang, D. (2011). Automatic image segmentation by dynamic region
merging. IEEE Transactions on Image Processing, 20(12), 3592–3605
20. Tian, Y., Yap, K. -H., & He, Y. (2012). Vehicle license plate super-resolution using soft learning
prior. Multimedia Tools and Applications, 519–535
21. Wu, H., & Li, B. (2011). License plate recognition system. In International Conference on
Multimedia Technology (ICMT) (pp. 5425–5427).
22. Chen, R., & Luo, Y. (2012). An improved license plate location method based on edge detection.
Physics Procedia, 24, 1350–1356.
Chapter 27
Early Detection and Diagnosis of Oral
Cancer Using Fusioned Deep Neural
Network
Sree T. Sucharitha, I. Kannan, and K. A. Varun Kumar
Abstract In recent days, oral cancer cases are increasing significantly due to rising
tobacco consumption, in combination with alcohol consumption, poor oral hygiene,
and human papilloma virus (HPV) infection. Early detection of this kind of cancer is
preventive; otherwise, it may lead to premature death. 50% of cases are detected in
advanced stages. For the above reasons, it is important to develop a new model to
detect oral cavity cancer at an early stage from digital data and image processing
techniques. Research in the detection of oral cancer has been highly active since the
twentieth century. In this paper, the detection of oral cancer with a fusion model of
CNN + RNN is proposed. The proposed model outperforms state-of-the-art techniques
in the detection of oral cancer with 82% accuracy. The obtained results are analyzed
with a systematic approach, and we aim to ensure reliable diagnosis of oral cancer in
the near future. The intention of the proposed method is to improve the detection
accuracy in the early diagnosis of oral cavity cancers.
27.1 Introduction
Oral cancers are part of a group of diseases usually referred to as head and neck
cancers, and of all head and neck cancers, they comprise around 85% of that
category. Brain cancer is a disease class unto itself and is excluded from the head
and neck cancer group. Historically, the death rate is connected with these cancers
because they are hard to detect at an early stage and remain complex to diagnose.
Even in 2021, cancer cell detection and diagnosis remain complex in the medical
sector. Oral cancer is more dangerous than others because detecting the origin of
the cancer is difficult; even when treated at an early stage, it may
S. T. Sucharitha ·I. Kannan
Department of Community Medicine, Tagore Medical College and Hospital, Chennai, India
K. A. Varun Kumar (B)
School of Computing, SRM Institute of Science and Technology, Kattankulathur, India
e-mail: varun.kumar300@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_27
281
282 S. T. Sucharitha et al.
significantly regrow even after the person has completed treatment. Oral cancer may
also affect the lungs and liver, because of its localization in the intra-oral area, in later
stages. Oral cancer is particularly dangerous because in its starting stages it may not
be noticed by the patient, as it can often thrive without producing pain or symptoms
the patient would readily see, and because it has a high risk of producing second
primary tumors. This means that patients who survive a first occurrence of the disease
have a substantially higher risk of developing a second malignancy. This elevated risk
can last for 5–10 years after the first occurrence. There are several types of oral
cancers, but around 90% are squamous cell carcinomas. The other, far rarer, oral
malignancies are the ACC and MEC cancers, which by comparison are mostly
uncommon yet particularly dangerous, as the depth of knowledge about them is not
nearly that of SCC. It is speculated that the aging of diagnosed patients may show a
time component in the biochemical or biophysical processes of aging cells that
permits malignant transformation, or perhaps immune system capability diminishes
with age. Notably, recent data lead us to think that the fastest-growing segment of the
oral cancer population comprises non-smokers under the age of fifty, which would
indicate a paradigm shift in the cause of the disease and in the areas of the oral
environment where it most frequently occurs. Cancers of the anterior of the mouth,
related to tobacco and alcohol, have declined along with a corresponding decrease in
smoking, while cancers of the posterior oral cavity sites associated with the HPV16
viral cause are increasing. So, while speaking in generalities to the public, many refer
to these two distinctly different diseases (oral and oropharyngeal) as “oral cancer,”
and while technically not precise, this is viewed as normal in general public messaging.
Prolonged exposure to sunlight is a causative agent in cancers of the lip, as well as
other skin cancers. Cancer of the lip is one oral cancer whose numbers have declined
over the last couple of decades. This is likely due to the increased awareness of the
damaging effects of prolonged exposure to sunlight and the use of sunscreens for
protection. Another physical factor is exposure to x-rays. Radiographs routinely
taken during examinations at the dental office are safe, but remember that radiation
exposure is cumulative over a lifetime. It has been implicated in several head and
neck tumors.
Various Stages of Oral Cancers
Stage-0:
Stage 0 is the beginning stage of the cancer, which is medically called “carcinoma
in situ.” It means abnormal tumor cells are lining the lips and oral cavity of the
person, which may lead to oral cancer.
Stage-1:
Stage 1 is the early stage of oral cancer; the tumor cells grow to less than 2
centimeters in the oral cavity or lips. In this stage, the cancer cells have not reached
regional clusters.
27 Early Detection and Diagnosis of Oral Cancer Using 283
Stage-2:
Stage 2 describes cancer cells growing to more than 2 cm but not greater than 4
cm. In this stage, the cancer has not reached the regional clusters.
Stage-3:
Stage 3 describes cancer cells growing to more than 4 cm and reaching the regional
clusters in the neck.
Stage-4:
Stage 4 describes the most advanced stage of oral cancer. The tumor may be of any
size, but it is in a spreading stage. The spread proceeds as follows:
1. To nearby tissue such as the jaw and other parts of the oral cavity.
2. To one large regional cluster growing on one side of the lips or oral cavity, which
may also affect the other side.
3. To distant parts of the body beyond the mouth, such as the lungs and liver.
Stage 3 and Stage 4 cancers can recur even after treatment at an earlier stage.
Convolutional Neural Network
CNN is a deep learning algorithm that extracts features from an image. The algorithm
uses the image as input and extracts features according to user-given parameters. From
those parameters, it extracts the features from the image and passes the data to further
steps. The CNN core architecture is like the neural system of the human body. The CNN
system takes the image input in matrix form to process the feature extraction
techniques. In this way, the system can be trained to understand the progression of
the image better. In previous systems, authors tested the system with few samples, which
may lead to inaccuracies in the results; to overcome this issue, we use a CNN model to
utilize the resources effectively. It also helps automate the processing of the dataset
in the system.
27.2 Literature Survey
Wang et al. [1] proposed a portable oral cancer detection system. The authors use a
micro-electromechanical system (MEMS) micromirror with a field-of-view technique
to identify oral cancer from confocal images over a large field of view. They tested with
multiple clinical samples, including neoplastic lesion tissues. Pandey and Gupta [2]
proposed a stage determination model to detect oral cancer. They use a neuro-fuzzy
inference system to determine the stage of oral cancer by if-then rules. With this
approach, they reduce the tolerance level of the detection system. Swetha et al. [3]
proposed a cancer detection model using a neural network. The authors use a CNN
model to detect oral cancer. The created system automatically monitors the attributes
of temperature, saliva pH value, and CO2 for diagnosing oral cancer. Aier et al. [4]
proposed an oral cancer detection model to identify the mutation of the cancer
cells. The authors explore the ELF4 transcription factor to detect oral/mouth
cancer with high-throughput sequencing data. They focus on the quality of oral
cancer detection, and also on the potential targets of oral cancer for throughput
investigations.
Sami et al. [5] proposed a model to detect oral borderline malignancy. The
authors use an image processing methodology to develop a computer-aided model
to detect oral cancer. This model uses clinical data for processing and detecting
the oral cancer of humans. Rekha et al. [6] proposed a model to identify the blood
plasma of oral cancer patients. The authors use Raman spectroscopic characterization
for versatile profiling of the blood plasma. This method can achieve results that help
cure cancer patients, identifying oral cancer cell blood plasma with a high accuracy
of recovery rate. Amulya and Jayakumar [7] presented a study on melanoma skin
cancer detection techniques. The authors analyze the various parameters of skin
cancer to diagnose patients and detect the cancer cells in the early stage. Nezhadian
and Rashidi [8] proposed a model to detect melanoma skin cancer using
color and texture features. The authors develop an algorithm to detect skin cancer from
the color and texture of the cancer cells. This algorithm extracts features from the
dataset and classifies them with an SVM to improve the detection accuracy.
Udrea and Mitra [9] proposed a model to detect skin cancer using clinical
images. The authors propose an adversarial neural network algorithm to detect the
pigmented and non-pigmented skin cancers. This algorithm uses clinical images
to extract the cancer cell features and automatically detects skin cancer with an accuracy
of 92%. Caorsi and Lenzi [10] proposed removal techniques for breast
cancer cells. The authors use an ANN algorithm to train and test the dataset to detect
the cancer cells. This model also uses real cleaning techniques to detect the cancer
with improved accuracy. Jiang et al. [11] proposed an oral cancer detection model
to detect cancer cells from fluorescent images. The authors use fluorescent
images as input and process the data with an image fusion algorithm to detect the
cancer-susceptible portion of the oral cavity. This improves the detection
rate and efficiency of the system. Shu-Fan [12] proposed an oral cancer diagnosis
model to identify cancer tissue with an improved accuracy rate. The authors use
probe images in the system to reconstruct the structure of the tissue inside the oral
cavity for diagnosis of scanned precancer cells.
Shalu and Kamboj [13] proposed a model to detect skin cancer using a
color-based method. In this model, the authors implement an algorithm to extract the
color features from digital images. From that classification, the system uses
the MED-NODE dataset for testing. This model uses machine learning algorithms
to improve the detection accuracy up to 82.35% [14]. Arik et al. [15] proposed a
deep learning-based skin cancer detection model. The authors use a deep learning
algorithm to develop a model that detects skin cancer in the early stage of the cancer
cells from clinical images. Demir et al. [16] proposed a deep learning architecture to
detect skin cancer in the early stage. They use the ResNet-101 and Inception-v3 deep
learning architectures to detect skin cancer. This model recognizes cancer
cells from CT images with an accuracy of 87.42%.
Khokhar et al. [17] proposed a model to detect skin cancer using
millimeter waves. The authors develop a model to detect skin cancer by passing
electromagnetic waves over a small part of the skin to detect the cancer cells. This
model improves accuracy in early skin cancer detection. Sujatha et al. [14]
proposed a model to identify therapeutic inhibitors of oral cancer. The
authors describe the inhibitor treatment technique for oral cancer and propose a
computational approach to validate the in vivo and in vitro studies to prove the efficiency
of the system and protect humans from oral cancer. Rajaguru and
Kumar Prabhakar [18] proposed a hybrid classification model for oral cancer. The
authors use a Bayesian LDA algorithm in combination with an artificial bee
colony optimization algorithm to classify the cancer image sets from the input
system. The tested environment gave a classification accuracy of 83.13%.
Mansutti et al. [19] proposed a millimeter-wave near-field tool to detect skin
cancer. The authors develop a promising tool based on the substrate-integrated
waveguide (SIW) to overcome the fabrication issues of previous cancer detection
techniques [11]. This tool is placed near the skin and passes an electric wave to
detect skin cancer.
Bumrungkun et al. [20] proposed a model to detect skin cancer using an
SVM and the snake algorithm. The authors analyze the various parameters of skin
cancer; features are extracted from the image, and the image is processed by image
segmentation to detect skin cancer. These segmentation parameters are taken as
input to the SVM to find the accuracy of the cancer detection parameters [4]. Zhao et al.
[21] proposed a model to detect skin cancer using Raman spectroscopy.
The authors use the Raman scattering effect and build its effects into the model to
detect skin cancer at a preliminary stage. This model takes the clinical results of the
skin cancer testing and analyzes the sensitivity of the skin to detect skin cancer.
Aydinalp et al. [22] proposed a model to detect skin cancer using
microwaves. The authors use rat skin tissue to test the model. This model collects
the dielectric properties of the skin tissue and passes the properties to a tool to detect
skin cancer [2]. Setiawan et al. [23] proposed a model to detect skin cancer in the
early stage. The authors use a CNN algorithm to process the image segmentation.
This model analyzes skin color changes in humans from the image. This model
detects only the early stages of skin cancer, to overcome medical complexity and
support early diagnosis to prevent skin cancer [1]. Hoshyar et al. [24]
presented a review of early skin cancer detection methods. The authors analyze the
various automatic skin cancer detection methods and provide an overview of those
methods. This review helps patients detect skin cancer in the early stages and get
diagnosed to reduce the affected ratio.
27.3 Methodology
The details of the proposed methodology are elaborated in this section. The input images, taken from migrants from the northern part of India, are preprocessed for quality enhancement. Feature extraction is then performed for classification. Figure 27.1 shows sample input images, and Fig. 27.2 demonstrates the proposed methodology with deep learning techniques.
286 S. T. Sucharitha et al.
Dataset and Preprocessing
The input images were collected from various dental clinics all over India. Of the 1600 images collected, 1224 were extracted from a histopathological repository. The repository contains two sets of images at two resolutions: the first set consists of more than 90 histopathological images and more than 400 OSCC images, while the second set consists of more than 200 images of normal epithelium and more than 450 OSCC images at good magnification. The images were collected from more than 500 patients with a high-quality HD camera under standard lighting conditions. Convolutional neural networks outperform state-of-the-art deep learning algorithms on certain specific problems. The layers extracted from the CNN consist of softmax, maxpooling, and fully connected layers; a CNN normally comprises these layers in a sequential stack, and an RNN is used to train the input model through transfer learning. The sample images after preprocessing are shown in Fig. 27.3, and the detection of edges in the sample input images is shown in Fig. 27.4.
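The edge-detection step illustrated in Fig. 27.4 can be sketched without any framework. The following is a minimal Sobel-operator example in plain Python on a tiny synthetic image; it is an illustrative stand-in, not the chapter's actual preprocessing code.

```python
# Minimal Sobel edge detector on a grayscale image given as a 2-D list.
# Illustrative sketch only; the chapter's real preprocessing pipeline
# is not specified at this level of detail.

def sobel_edges(img):
    """Return gradient magnitudes for the interior pixels of a 2-D grayscale image."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: left half dark (0), right half bright (255).
img = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_edges(img)
# The strongest responses sit on the dark/bright boundary columns.
print(edges[1][1], edges[1][2])
```

On the step edge above, both interior pixels adjacent to the boundary receive the same large gradient magnitude, while flat regions give zero response.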
Feature Selection
There are two types of feature selection methods: scalar and vector. The scalar method considers each feature individually and is simpler to apply than the vector method. The vector method, in contrast, selects features based on mutual subsets and the relationships between them, and is more complex than the scalar method. Here,
Fig. 27.1 Sample images
collection
27 Early Detection and Diagnosis of Oral Cancer Using 287
Fig. 27.2 Architecture of proposed model
Fig. 27.3 Preprocessing of input images
the different color intensities are considered as the features to select. In our model, the vector method is used to select the best features from the input images.
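The contrast between the two methods can be sketched on a toy example. The scoring functions below (`score_single`, `score_subset`) are illustrative assumptions, not the chapter's actual criteria; they show how features that look equally good in isolation (scalar view) can differ sharply when evaluated jointly as subsets (vector view).

```python
from itertools import combinations

def score_single(feature_values, labels):
    """Scalar view: score one feature by how often its sign agrees with the label."""
    return sum(1 for v, y in zip(feature_values, labels)
               if (v > 0) == (y == 1)) / len(labels)

def score_subset(features, subset, labels):
    """Vector view: score a subset by majority vote of its members (captures interaction)."""
    hits = 0
    for row, y in zip(zip(*[features[i] for i in subset]), labels):
        vote = sum(1 for v in row if v > 0) > len(row) / 2
        hits += int(vote == (y == 1))
    return hits / len(labels)

labels = [1, 1, 0, 0]
features = [
    [2, -3, -1, -2],  # feature 0: moderately informative alone
    [1, 1, -1, 1],    # feature 1: moderately informative alone
    [1, 1, 1, -1],    # feature 2: moderately informative alone
]

# Scalar method: rank features one by one (all tie here at 0.75).
scalar_rank = sorted(range(3), key=lambda i: -score_single(features[i], labels))

# Vector method: evaluate feature subsets jointly; features 1 and 2
# complement each other and classify the toy data perfectly.
best_subset = max((s for k in (1, 2, 3) for s in combinations(range(3), k)),
                  key=lambda s: score_subset(features, s, labels))
print(scalar_rank, best_subset)
```

The scalar scores cannot distinguish the three features, but the subset search finds that the pair (1, 2) is jointly perfect, which is exactly why the chapter prefers the vector method despite its cost.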
Fully Connected Layer
After feature extraction, the data are transferred to the final fully connected layer. The training of the neural parameters happens in this layer with the help of the pooling and softmax layers; finally, it produces the features to be trained for the prediction of cancerous regions.
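The classification head described above can be sketched as a dense layer followed by softmax. The weights below are illustrative placeholders, not trained values.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax turning logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

features = [0.5, -1.2, 2.0]                    # pooled features entering the head
weights = [[0.1, 0.4, 0.3], [0.2, -0.1, 0.5]]  # two output classes (normal / cancerous)
bias = [0.0, 0.1]
probs = softmax(dense(features, weights, bias))
print(probs)  # two probabilities summing to 1
```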
27.4 Result Analysis
The normal epithelium images used as input are shown in Fig. 27.5. These images are passed through the convolutional layers, whose outputs feed maxpooling, softmax, and dense layers, and the resulting data are given to a recurrent neural network (RNN) using transfer learning; the same process is repeated for the cancer-affected images. The network is thus trained to differentiate normal epithelial images from cancerous images, awaiting validation in the testing process. A sample cancer-affected histopathology image is shown in Fig. 27.6. The layer-by-layer convolution process is shown in Table 27.1: it starts with Conv_1, whose output shape carries 896 parameters, followed by Conv_2 with 9248 parameters, and continues through maxpooling to the dense layers with different parameter counts and output shapes (Fig. 27.7 and Table 27.2).
Fig. 27.4 Edge detection using input images
The accuracy of the classifiers is defined by the following formula:
Accuracy = Number of correctly classified samples / Total number of samples
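Applied to label lists, the formula reads as follows; the labels here are hypothetical (0 = normal epithelium, 1 = cancerous).

```python
def accuracy(y_true, y_pred):
    """Correctly classified samples divided by the total number of samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical ground truth and predictions for ten test images.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(accuracy(y_true, y_pred))
```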
The accuracy rate for the combined CNN + RNN offers an enhancement of 1.2% over plain CNN. The experimental results in Fig. 27.8 show that the FPR starts at a markedly worse value for both algorithms, whereas the ADR gains improvements ranging from 1 to 2.2%. Figure 27.9 shows that the proposed algorithm outperforms the current algorithms, as clearly depicted by the red line in the graph, which performs with a limitation score of 1.8 over a range from 4 to 2.
Fig. 27.5 Epithelium
normal images
Fig. 27.6 Cancerous affected images
27.5 Conclusion
More than 80,000 fresh oral cancer cases reported every year with a high mortality
rate owing to the delay in detection. Presently, the cancer identification relies majorly
on oral examinations using torchlight to detect early stage cancers of the oral cavity.
Mostly, oral cancer affects the people from the lower income category and people in
rural area, due to the consumption of tobacco (either smokeless or smoking), alcohol,
etc. Earlier detection of oral cancer offers the best chance for long-term survival and
Table 27.1 Layer conversions with parameters
Layer type | Output shape | Parameters
Conv_1 | (128, 128, 32) | 896
Conv_2 | (128, 128, 32) | 9,248
Maxpooling | (64, 64, 32) | Null
Dropout | (64, 64, 32) | Null
Conv_3 | (64, 64, 64) | 18,496
Conv_4 | (64, 64, 64) | 36,928
Maxpooling_2 | (32, 32, 64) | Null
Dropout2 | (32, 32, 64) | Null
Conv_5 | (32, 32, 64) | 36,928
Conv_6 | (32, 32, 64) | 36,928
Maxpooling3 | (16, 16, 64) | Null
Dropout3 | (16, 16, 64) | Null
Flatten1 | 16,384 | Null
Dense1 | 512 | 8,389,120
Dense2 | 7 | 3,591
Fig. 27.7 Representation of
results with confusion matrix
Table 27.2 Result obtained with fusion of CNN + RNN
Parameters | Precision | Recall | F1 score | Support
1 | 0.52 | 0.82 | 0.52 | 578
2 | 0.80 | 0.54 | 0.53 | 975
Micro-average | 0.62 | 0.47 | 0.60 | 1158
Macro-average | 0.67 | 0.53 | 0.60 | 1158
Weighted average | 0.58 | 0.61 | 0.60 | 1158
Fig. 27.8 RoC curve
achieved through obtained
result
Fig. 27.9 Comparison of
CNN and RNN
has the potential to improve treatment outcomes. The proposed method outperforms the current state-of-the-art techniques with a detection accuracy of 82%. A patient tracking system and a Nano-oral kit are to be developed and deployed in the near future to diagnose the early stages of cancer, giving support to patients in the low-income category.
References
1. Wang, Y., Raj, M., McGuff, H. S., Shen, T., & Zhang, X. (2011).Portable oral cancer detection
using miniature confocal imaging probe with large field of view. In 2011 16th International
Solid-State Sensors, Actuators and Microsystems Conference (pp. 1821–1824).https://doi.org/
10.1109/TRANSDUCERS.2011.5969765
2. Pandey & Gupta N. K. (2014). Stage determination of oral cancer using neurofuzzy inference
system. In 2014 IEEE Students’ Conference on Electrical, Electronics and Computer Science
(pp. 1–5). https://doi.org/10.1109/SCEECS.2014.6804517
3. Swetha, S., Kamali, P., Swathi, B., Vanithamani, R., & Karolinekersin, E. (2020). Oral disease
detection using neural network. In 2020 9th International Conference System Modeling and
Advancement in Research Trends (SMART) (pp. 339–342). https://doi.org/10.1109/SMART5
0582.2020.9337094
4. Aier, & Khan, S. M. (2018). Exploring the effect of wild type and mutant ELF4 transcriptional
factor on oral cancer using high-throughput sequencing data. In 2018 International Conference
on Bioinformatics and Systems Biology (BSB) (pp. 207–211). https://doi.org/10.1109/BSB.
2018.8770545
5. Sami, M. M., Saito, M., Kikuchi, H., & Saku, T. (2009). A computer-aided distinction of border-
line grades of oral cancer. In 2009 16th IEEE International Conference on Image Processing
(ICIP), pp. 4205–4208. https://doi.org/10.1109/ICIP.2009.5413534
6. Rekha, P., et al. (2013). Raman spectroscopic characterization of blood plasma of oral cancer.
In 2013 IEEE 4th International Conference on Photonics (ICP), pp. 135–137. https://doi.org/
10.1109/ICP.2013.6687092
7. Amulya, P. M., & Jayakumar, T. V. (2017). A study on melanoma skin cancer detection
techniques. In 2017 International Conference on Intelligent Sustainable Systems (ICISS)
(pp. 764–766). https://doi.org/10.1109/ISS1.2017.8389278
8. Nezhadian, F. K., & Rashidi, S. (2017). Breast cancer detection without removal pectoral
muscle by extraction turn counts feature. In 2017 Artificial Intelligence and Signal Processing
Conference (AISP) (pp. 6–10). https://doi.org/10.1109/AISP.2017.8324112
9. Udrea, A., & Mitra, G. D. (2017). Generative adversarial neural networks for pigmented and
non-pigmented skin lesions detection in clinical images. In 2017 21st International Conference
on Control Systems and Computer Science (CSCS) (pp. 364–368). https://doi.org/10.1109/
CSCS.2017.56
10. Caorsi, S., & Lenzi, C. (2015). Skin removal techniques for breast cancer radar detection based
on artificial neural networks. In 2015 IEEE 15th Mediterranean Microwave Symposium (MMS)
(pp. 1–4). https://doi.org/10.1109/MMS.2015.7375418
11. Jiang, C. F., Wang, C. Y., & Chiang, C. P. (2004). Oral cancer detection in fluorescent image
by color image fusion. In The 26th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society (pp. 1260–1262). https://doi.org/10.1109/IEMBS.2004.140
3399
12. Chen, S. -F., Lu, C. -W., Tsai, M. -T., Wang, Y. -M., Yang, C. C., & Chiang, C. -P. (2005). Oral
cancer diagnosis with optical coherence tomography. In 2005 IEEE Engineering in Medicine
and Biology 27th Annual Conference (pp. 7227–7229). https://doi.org/10.1109/IEMBS.2005.
1616178
13. Shalu, & Kamboj, A. (2018). A color-based approach for melanoma skin cancer detection.
In 2018 First International Conference on Secure Cyber Computing and Communication
(ICSCCC) (pp. 508–513). https://doi.org/10.1109/ICSCCC.2018.8703309
14. Sujatha, P. L. et al. (2016). Identification of therapeutic inhibitors for the treatment of oral cancer
using herbal bio-active principles by means of computational approach. In: 2016 2nd Inter-
national Conference on Advances in Electrical, Electronics, Information, Communication and
Bio-Informatics (AEEICB) (pp. 549–553). https://doi.org/10.1109/AEEICB.2016.7538351
15. Arik, A., Golcuk, M., & Karslıgil, E. M. (2017). Deep learning based skin cancer diagnosis. In
2017 25th Signal Processing and Communications Applications Conference (SIU) (pp. 1–4).
https://doi.org/10.1109/SIU.2017.7960452
16. Demir, İ., et al. (2019). SkelNetOn 2019: dataset and challenge on deep learning for
geometric shape understanding. In 2019 IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW) (pp. 1143–1151). https://doi.org/10.1109/CVPRW.
2019.00149
17. Khokhar, U., Naqvi, S. A. R., Al-Badri, N., Bialkowski, K., & Abbosh, A. (2017). Near-field
tapered waveguide probe operating at millimeter waves for skin cancer detection. In 2017
IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio
Science Meeting (pp. 795–796). https://doi.org/10.1109/APUSNCURSINRSM.2017.8072440
18. Rajaguru, H., & Kumar Prabhakar, S. (2017). Oral cancer classification from hybrid ABC-PSO
and Bayesian LDA. In 2017 2nd International Conference on Communication and Electronics
Systems (ICCES) (pp. 230–233). https://doi.org/10.1109/CESYS.2017.8321271
19. Mansutti, G., Mobashsher, A. T., & Abbosh, A. M. (2018). Millimeter-wave substrate integrated
waveguideprobe for near-field skin cancer detection. In 2018 Australian Microwave Symposium
(AMS) (pp. 81–82). https://doi.org/10.1109/AUSMS.2018.8346992
20. Bumrungkun, P., Chamnongthai, K., & Patchoo, W. (2018). Detection skin cancer using SVM
and snake model. In 2018 International Workshop on Advanced Image Technology (IWAIT)
(pp. 1–4).https://doi.org/10.1109/IWAIT.2018.8369708
21. Zhao, J., Lui, H., McLean, D. I., & Zeng, H. (2008). Real-time raman spectroscopy for non-
invasive skin cancer detection - preliminary results. In 2008 30th Annual International Confer-
ence of the IEEE Engineering in Medicine and Biology Society (pp. 3107–3109). https://doi.
org/10.1109/IEMBS.2008.4649861
22. Aydinalp, C., Joof, S., Yilmaz, T., Özsobaci, N. P., Alkan, F. A., & Akduman, I. (2019). In vitro
dielectric properties of rat skin tissue for microwaveskin cancer detection. In 2019 International
Applied Computational Electromagnetics Society Symposium (ACES) (pp. 1–2)
23. Setiawan, A. W. (2020). Effect of color enhancement on early detection of skin cancer using
convolutional neural network. In 2020 IEEE International Conference on Informatics, IoT, and
Enabling Technologies (ICIoT) (pp. 100–103). https://doi.org/10.1109/ICIoT48696.2020.908
9631
24. Hoshyar, A.N., Al-Jumaily, A., & Sulaiman, R. (2011). Review on automatic early skin cancer
detection. In 2011 International Conference on Computer Science and Service System (CSSS)
(pp. 4036–4039). https://doi.org/10.1109/CSSS.2011.5974581
Chapter 28
Fine-tuning for Transfer Learning
of ResNet152 for Disease Identification
in Tomato Leaves
Lakshmi Ramani Burra, Janakiramaiah Bonam, Praveen Tumuluru,
and B Narendra Kumar Rao
Abstract Plants provide a significant portion of the world’s food supply. Tomato
is the most popular plant which is cultivated worldwide. The tomato leaf disease is
the primary factor in productivity loss but can be avoided by monitoring regularly.
Detection of tomato leaf diseases using pre-trained deep learning models can help
to reduce the severity of the disease identification. However, instead of using a pre-
trained model directly, there is an optional step to fine-tune the model in transfer
learning, which improves the model performance. The examination of fine-tuning
the model with four various scenarios of transfer learning and the art of employing
pre-trained models were suggested in this work. Experiments were done using the
pre-trained model ResNet152 on tomato leaf disease identification.
28.1 Introduction
Agriculture is a significant measure of economic growth. Tomato is a global commer-
cial crop and the only vegetable consumed as a fruit, with a nutritional value
exceeding that of fruit. The production of tomato crops has made India the world’s
third-largest producer. Currently, disasters caused by tomato leaf diseases and plant
growth disorders are limiting agricultural output [1]. This condition significantly
impacts tomato quality and productivity and results in large economic losses. Pests
L. R. Burra (B)·J. Bonam
Prasad V. Potluri Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
e-mail: blramani@pvpsiddhartha.ac.in
J. Bonam
e-mail: janakiramaiah@pvpsiddhartha.ac.in
P. Tumuluru
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
e-mail: praveenluru@gmail.com
B. Narendra Kumar Rao
Sree Vidyanikethan Engineering College, Tirupati, India
e-mail: narendrakumarraob@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_28
and diseases that affect tomato crops can be identified in advance, and proper culti-
vation and pest management can be supplied to satisfy crop growth needs, reducing
economic losses.
Traditional methods for identifying tomato leaf diseases rely solely on the culti-
vator’s skill or expert assistance. This technique is not only slow, but it is also inef-
fective, inaccurate, costly, and time-consuming. A team of researchers has proposed
a methodology that accurately identifies the tomato crop’s diseases. Deep learning
models [2] have successfully provided new methods and ideas for identifying plant
diseases and insect pests. This is the underlying motivation for the identification of
leaf disease. This work suggested the analysis of fine-tuning the model with four
different scenarios of transfer learning.
The work is organized as follows: Sect. 28.2 “Literature Survey” surveys interre-
lated works and techniques for tomato leaf disease identification; Sect. 28.3 “Mate-
rials and Methods” discusses the fine-tune models; Sect. 28.4 “Experimental Results”
depicts the experiments and results; and finally, Sect. 28.5 “Conclusion” concludes
the work.
28.2 Literature Survey
Deep learning and image processing techniques have been widely used for the detec-
tion of plant leaf diseases. Several works related to plant leaf diseases are observed
in literature and introduced some baseline approaches on deep learning for object
detection. In this section, we describe a recent survey related to this field.
Liu and Wang [3] suggested a recognition method with low memory consumption, high recognition accuracy, and speed, offering a new solution for predicting tomato leaf spot early and a new concept for diagnosing it intelligently.
A deep convolutional neural network with an attention mechanism, which adapts to the detection of a number of tomato leaf diseases, was proposed by Zhao et al. [4]. Residual blocks and attention extraction modules make up the majority of the network structure. The model can accurately extract complex aspects of numerous diseases.
To characterize mango leaves affected by anthracnose and powdery mildew infections, Janakiramaiah et al. [5] presented a variation of CapsNet called Multilevel CapsNet. Infections in mango trees are caused by a variety of climatic and fungal factors, which has resulted in a decrease in mango quality and quantity.
Picon et al. [6] employed a deep residual neural network-based technique to
diagnose a variety of plant disorders under real-world acquisition situations, with a
few adaptations to detect disease early.
Using densely connected convolutional neural networks (CNNs), Subramanian et al. [7] used corn leaf images, obtained from Web sites, with three classes of diseases and one healthy class. To fine-tune and minimize the training period of the suggested models, Bayesian optimization is utilized to determine optimal hyperparameter values, and transfer learning is investigated.
Table 28.1 Dataset classes
Classes of tomato leaf | Train dataset | Test dataset
Healthy | 1000 | 100
Leaf mold | 1000 | 100
Bacterial spot | 1000 | 100
Mosaic virus | 1000 | 100
Early blight | 1000 | 100
Late blight | 1000 | 100
Spider mites two-spotted spider mite | 1000 | 100
Septoria leaf spot | 1000 | 100
Target spot | 1000 | 100
YL_Curl virus | 1000 | 100
Total | 10,000 | 1000
Fuentes et al. [8] suggested a system for detecting and locating plant anomalies that produces diagnostic results, displays anomalous areas, and describes symptoms in sentences as output. On the newly produced tomato plant anomaly dataset, it achieves an average accuracy of 92.5%.
28.3 Materials and Methods
28.3.1 Dataset
The dataset contains 10,000 images belonging to 10 classes of training data and
1000 images belonging to 10 classes of test data, as shown in Table 28.1. The dataset
classes are Healthy, Bacterial spot, Early blight, Late blight, Leaf Mold, Mosaic virus,
Spider mites Two-spotted spider mite, Septoria leaf spot, Target Spot, YL_Curl Virus
and are shown in Fig. 28.1.
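The class-balanced split in Table 28.1 can be summarized programmatically; a small sketch:

```python
# Sketch of the class-balanced split in Table 28.1: 10 classes,
# 1000 training and 100 test images per class.
classes = [
    "Healthy", "Leaf mold", "Bacterial spot", "Mosaic virus", "Early blight",
    "Late blight", "Spider mites two-spotted spider mite", "Septoria leaf spot",
    "Target spot", "YL_Curl virus",
]
split = {c: {"train": 1000, "test": 100} for c in classes}
total_train = sum(v["train"] for v in split.values())
total_test = sum(v["test"] for v in split.values())
print(total_train, total_test)  # 10000 and 1000, matching the table totals
```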
28.3.2 Transfer Learning-Based CNN Model
Transfer learning is commonly applied in computer vision through pre-trained models and deep convolutional neural networks (CNNs). CNNs' excellent performance and ease of training are the two key aspects that have driven their popularity over the years. The three significant ways to transform a pre-trained model in the transfer learning process are prediction, feature extraction, and fine-tuning.
Fig. 28.1 Dataset classes
1. Prediction: If the problem is within the scope of a pre-trained model, a common approach is to use the model directly to predict the labels for the images. This delivers accurate results only for data similar to that used to train the model.
2. Feature extraction: A pre-trained model can be employed as a feature extractor, and the complete model does not need to be (re)trained. By dropping the output layer (the one that gives the probabilities for being in each of the 1000 classes), we may use the entire network as a fixed feature extractor for the new dataset.
3. Fine-tune model: Unfreeze a few of the top layers of a frozen model base and train both the new classifier layers and the base model's final layers simultaneously. This allows "fine-tuning" the higher-order feature representations of the underlying model to make them more relevant for the task at hand, retraining on the new data with a very low learning rate.
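The freeze-then-unfreeze logic of fine-tuning can be illustrated framework-free. The layer names below mimic ResNet-style blocks and are illustrative, not the actual ResNet152 graph.

```python
# Framework-free illustration of fine-tuning: freeze the pre-trained base,
# attach a new classifier head, then unfreeze only the last few base layers.
# Layer names are illustrative stand-ins, not the real ResNet152 layers.

class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

base = [Layer(f"block{i}") for i in range(1, 6)]   # pre-trained backbone
head = [Layer("dense_new"), Layer("softmax_new")]  # new classifier layers

# Step 1: freeze the whole backbone and train only the new head.
for layer in base:
    layer.trainable = False

# Step 2 (fine-tuning): unfreeze the top-most backbone layers and retrain
# everything that is trainable with a very low learning rate.
for layer in base[-2:]:
    layer.trainable = True

trainable_names = [l.name for l in base + head if l.trainable]
print(trainable_names)
```

In a real framework, the same idea is expressed by toggling each layer's `trainable` flag before recompiling the model; the early layers keep their generic pre-trained filters while the top layers adapt to the new task.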
In general, for fine-tuning the model, transfer learning in CNNs can be applied in four scenarios.
Scenario 1—New dataset is small and similar to the original dataset—As the dataset is small, fine-tuning may lead to over-fitting the model. So, use the ConvNet as a fixed feature detector, add a linear classifier (the dense layers) on top, and train that classifier only.
Scenario 2—New dataset is large and similar to the original dataset—As the dataset is large, use the fine-tuning approach to save the time of training the network from scratch. When fine-tuning over the whole network, we can be more confident that the model will not over-fit.
Scenario 3—New dataset is small but very different from the original dataset—If the dataset is significantly different from the original, fine-tuning the ConvNet is recommended, but make sure not to go too deep into the network; only adjust the weights of a few layers.
Scenario 4—New dataset is large and very different from the original dataset—Because the dataset is large and different from the ImageNet data, one possibility is to train the ConvNet from scratch, but this takes huge training time. In practice, it is useful to begin with pre-trained model weights.
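The four scenarios can be encoded as a small decision helper; this is a sketch of the rules of thumb above, not a library API.

```python
# The four transfer-learning scenarios, encoded as a decision helper.

def transfer_strategy(dataset_is_large, similar_to_original):
    if not dataset_is_large and similar_to_original:        # Scenario 1
        return "freeze ConvNet, train a new linear classifier only"
    if dataset_is_large and similar_to_original:            # Scenario 2
        return "fine-tune the full network"
    if not dataset_is_large and not similar_to_original:    # Scenario 3
        return "fine-tune only a few top layers"
    return "train from scratch, initialized with pre-trained weights"  # Scenario 4

# The tomato leaf dataset is large and differs from ImageNet (Scenario 4).
print(transfer_strategy(dataset_is_large=True, similar_to_original=False))
```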
28.3.3 Performance Evaluation
The performance indicators recall, precision, accuracy, and F1-score were used to evaluate the training and testing datasets. The indicators are as follows:
Accuracy: the ratio of correctly classified images to the total number of images.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: the proportion of correctly predicted positive observations to the total number of positive predictions.
Precision = TP / (TP + FP)
Recall: the proportion of correctly predicted observations to all observations in that class.
Recall = TP / (TP + FN)
F1-Score: the harmonic mean of precision and recall.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
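The four indicators can be computed directly from confusion-matrix counts; the counts below are hypothetical, chosen only to exercise the formulas.

```python
# The four indicators above, computed from confusion-matrix counts.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one disease class.
acc, prec, rec, f1 = metrics(tp=80, tn=90, fp=20, fn=10)
print(acc, prec, rec, f1)
```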
28.4 Experimentation
In this work, the pre-trained ResNet152 model is experimented with on a tomato leaf disease identification dataset for analysis in different stages. The parameters used for this experimentation are an image size of 224 × 224, SGD as the optimizer, a learning rate of 0.0001, a batch size of 10, cross-entropy loss, 50 epochs, and 58.14M trainable parameters, as shown in Table 28.2.
Table 28.2 Summary of pre-trained ResNet152
Parameters | ResNet152 architecture
Image size | 224 × 224
Layers | 152
Learning rate | 1e-3
Batch size | 32
Optimizer | SGD
Loss | Cross-entropy
Epochs | 50
Trainable parameters | 58.14M
1. Prediction: The ResNet152 model is used for prediction on the tomato leaf disease identification test dataset. This results in wrong predictions because the test dataset images are different from the original ImageNet dataset. As a result, it is unsuitable for prediction.
2. Feature Extraction: Using the ResNet152 model, experimentation with the tomato leaf disease identification test dataset provides fixed features. It is suitable only for extracting the features.
3. Fine-tune Model: In this approach, fine-tuning is applied to a deep learning model that has been trained beforehand on a different dataset. The experimentation is done with four different paradigms using the ResNet152 model.
a. New dataset is small and similar to the original dataset: Fine-tune the ResNet152 model, pre-trained on the ImageNet dataset, on a small dataset; any linear model can then train the classifier. But here, the tomato leaf disease detection dataset is large, so this paradigm is not suitable.
b. New dataset is large and similar to the original dataset: Since the tomato leaf disease identification dataset has more data and no over-fitting occurs, the entire network would have to be fine-tuned. So, this approach is not preferable for this application.
c. New dataset is small but different from the original dataset: To fine-tune the ResNet152 model in this approach, only the weights of a few early layers would be adjusted; the tomato leaf disease detection dataset has more data, so this approach is not suitable for this application.
d. New dataset is large and different from the original dataset: Since the tomato leaf disease identification dataset is large and different from the original ImageNet dataset, it is suitable to fine-tune the ResNet152 model from scratch, and it is beneficial to initialize training with weights from a pre-trained model.
28.5 Results and Discussion
This work employs the ResNet152 pre-trained model on tomato leaf disease identification for analysis of transfer learning and the use of pre-trained models in deep learning. The classification report of the model on the test dataset is shown in Table 28.3. The average precision over the 10 classes of tomato leaf disease identification is 94%, recall is 94.2%, F1-score is 94%, and the accuracy is 94.2%.
Similarly, the classification report on the train dataset for the ResNet152 model is shown in Table 28.4. The average precision is 96%, recall is 96.75%, F1-score is 96%, and the achieved accuracy is 96.75%.
Table 28.3 ResNet152 classification report on test dataset
Classes Precision (%) Recall (%) F1-score (%) Support
Healthy 93 93 93 100
Leaf mold 95 97.93 96.44 100
Bacterial spot 95 95.96 95.47 100
Mosaic virus 94 91.26 92.6 100
Early blight 96 93.2 94.57 100
Late blight 95 95 95 100
Spider mites two-spotted spider mite 93 94.89 93.93 100
Septoria leaf spot 94 91.26 92.6 100
Target spot 93 97.89 95.38 100
YL_Curl virus 94 92.15 93.06 100
Accuracy 94.2 1000
Table 28.4 ResNet152 classification report on train dataset
Classes Precision (%) Recall (%) F1-score (%) Support
Healthy 95.3 96.94 96.11 1000
Leaf mold 97.5 96.91 97.2 1000
Bacterial spot 97.2 96.81 97 1000
Mosaic virus 96.1 96.58 96.33 1000
Early blight 97.7 96.54 97.11 1000
Late blight 97.2 96.90 97.04 1000
Spider mites two-spotted spider mite 96.6 96.40 96.49 1000
Septoria leaf spot 97.8 96.64 97.21 1000
Target spot 95.9 97.16 96.52 1000
YL_Curl virus 96.2 96.58 96.38 1000
Accuracy 96.75 10,000
28.6 Conclusion
In the realm of image recognition, deep learning architectures have gained a lot of popularity. To identify diseased plant leaves, various pre-trained models effectively analyze large collections of image data and identify them with minimal error. Fine-tuning allows the capabilities of state-of-the-art models to be extended to other domains. This work performs fine-tuning of the model and analyzes it under the four paradigms of transfer learning on tomato leaf disease identification. The ResNet152 architecture is employed, and training the entire network is suggested for this target dataset.
References
1. Wang, X., Liu, J., & Zhu, X. (2021). Early real-time detection algorithm of tomato diseases and
pests in the natural environment. Plant Methods, 17(1), 1–17.
2. Ramani, B. L., Poosapati, P., Tumuluru, P., Saibaba, C. H. M. H., Radha, M., & Prasuna,
K. (2019). Deep learning and fuzzy rule-based hybrid fusion model for data classification.
International Journal of Recent Technology and Engineering, 8(2), 3205–3213.
3. Liu, J., & Wang, X. (2020). Early recognition of tomato gray leaf spot disease based on
MobileNetv2-YOLOv3 model. Plant Methods, 16(1), 1–16. Barbedo
4. Zhao, S., Peng, Y., Liu, J., & Wu, S. (2021). Tomato leaf disease diagnosis based on improved
convolution neural network by attention module. Agriculture, 11(7), 651.
5. Janakiramaiah, B., Kalyani, G., Prasad, L. V., Karuna, A., & Krishna, M. (2021). Intelligent
system for leaf disease detection using capsule networks for horticulture. Journal of Intelligent &
Fuzzy Systems, (Preprint), 1–17.
6. Picon, A., Alvarez-Gila, A., Seitz, M., Ortiz-Barredo, A., Echazarra, J., & Johannes, A. (2019).
Deep convolutional neural networks for mobile capture device-based crop disease classification
in the wild. Computers and Electronics in Agriculture, 1(161), 280–290.
7. Subramanian, M., Lv, N. P., & VE, S. (2021). Hyperparameter optimization for transfer learning
of VGG16 for disease identification in corn leaves using Bayesian optimization. Big Data.
8. Fuentes, A. F., Yoon, S., & Park, D. S. (2019). Deep learning-based phenotyping system with
glocal description of plant anomalies and symptoms. Frontiers in Plant Science, 10, 1321.
Chapter 29
AI-Based Mental Fatigue Recognition
and Responsive Recommendation System
Korupalli V. Rajesh Kumar, B. Rupa Devi, M. Sudhakara,
Gabbireddy Keerthi, and K. Reddy Madhavi
Abstract Overall comfort is considered one of the foremost concerns in the present scenario. Organizations are giving more consideration to upgrading workplace comfort, and workers likewise pursue many approaches to enhance their ongoing comfort status. However, comfort is generally related to everyday activity and conduct, mainly at worksites, where it influences both stress levels and mood, resulting in mental fatigue. Specifically, a person's comfort is affected by their conduct at the worksite. We propose a mental fatigue identification system that embraces deep learning techniques to supply non-intrusive monitoring. The comfort level is classified based on two surveyed factors: pressure and mood. This preparatory study trains both generalized and personalized classification models. The personalized model is a step toward a personalized health-resolution support system that elevates users' awareness and motivates them to upgrade their conduct, eventually contributing to the best possible comfort. We attained an accuracy of 87% with the generic model and 94% with the personalized model.
K. V. R. Kumar (B)
School of Business, (AI & ML), Woxsen University, Hyderabad, Telangana, India
e-mail: rajesh.kumar@woxsen.edu.in
B. R. Devi
Annamacharya Institute of Technology and Sciences, Tirupati, AP, India
e-mail: rupadevi.aitt@annamacharyagroup.org
M. Sudhakara (B)
School of Computing and Information Technology, Reva University, Bangalore, India
e-mail: malla.sudhakara@reva.edu.in
G. Keerthi
CSE, Vignan’s Foundation for Science, Technology and Research, Guntur, A.P, India
K. R. Madhavi
Sree Vidyanikethan Engineering College, Tirupati, AP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_29
29.1 Introduction
Overall comfort is considered one of the foremost concerns. Research has shown that work-related comfort can affect basic comfort in the long term, including depressive symptoms. Other research proposed that low levels of health and comfort lead to major consequences for the worker and the organization; the outcomes can incorporate lawsuits against the worksite related to disease, the cost of lost efficiency, and worksite harassment. Research proposed that lessening stress upon the worker is one of the effective ways to upgrade the worker's comfort. Conversely, appreciation at work also has a powerful effect on work performance. One study pointed out that a positive mood dynamically promotes pro-social conduct [1]; this calls for a friendlier worksite to attain elevated work performance.
Regarding health, research has investigated the consequences of stress and mood, which can cause other symptoms and, indirectly, actions driven by extreme stress. One study found that sedentary working behavior leads to reduced productivity and a higher likelihood of anxiety and additional physical ailments [2]. The same research recommended movement during working hours, alternating between standing, sitting, and walking, to reduce the risk of associated ailments. Overall, mood and stress play a vital role in every individual's everyday comfort. This study presents preliminary research on an everyday comfort identification system based on mood status, using deep learning techniques in an office environment. We use a combination of behavioral data, namely hand and body movements, together with climate conditions as inputs to the system [3, 4]. The approach was designed to be as unobtrusive to the worker as possible while retaining the ability to capture the important details about the worker. Hence, we used a webcam to observe the subjects in this investigation. With this setup, the framework does not require workers to wear additional devices and remains non-intrusive to workers' daily routines. Because comfort involves numerous concepts, we applied fuzzy clustering to group the stress and mood attributes into comfort levels. We conducted a preliminary investigation to develop the comfort detection design and demonstrate the framework. The scheme also accounts for the fact that every individual behaves differently at the worksite; consequently, we developed a customized classification model. We also developed a general classification model based on the data from all available subjects. The main objective is to give real-time feedback to the user. In this preliminary research, we developed the classification model on a daily basis from the gathered information. The suggested system is planned as a decision-support system for raising awareness of a person's behavior and comfort. Figure 29.1 shows daily life scenarios: stress and mood conditions affecting energy levels and resulting in mental fatigue.
29 AI-Based Mental Fatigue Recognition
Fig. 29.1 Human life—daily routine—stress-related aspects
29.2 Related Literature on Responsive System
In this section, we discuss how technologies have contributed to resolving comfort problems. The section is divided into two parts: stress-based mental fatigue and personal mood-based mental fatigue. Physical fatigue is also one cause of mental fatigue.
29.2.1 Stress-Based Mental Fatigue—Responsive System
One study discussed a way to recognize stress for a sustainable life. It used smartphone sensors to extract phone-usage data, including the call log, Bluetooth interactions, and SMS log. Rather than using phone-usage data, other research suggested a less invasive solution: a smartphone was used to observe physical activity instead of mobile data, and the user's stress level was determined through a self-assessment process. Reviews of this work have examined stress detection systems in depth. Stress detection frameworks may use several kinds of data: behavioral responses, physiological signals, contextual events, and multimodal methods that combine several data types. These analyses point out key requirements for a stress detection framework: ubiquity, unobtrusiveness, and non-invasiveness. They propose that these factors play a vital role in users' satisfaction and acceptance [5, 6].
29.2.2 Personal Mood-Based Mental Fatigue
One application of mood detection targeted bipolar disorder, covering normal-state mood detection, movement detection, and upper-body posture. Another study estimated the effect of music on subjects' moods, with the moods assessed through an examination; the posture of every subject was analyzed, and the study attained 81.30% precision in recognizing happy and sad moods. Facial recognition is another way to recognize mood: one facial-detection framework determined mood with up to 90.43% accuracy, applying deep learning techniques to obtain a highly precise detection model. Surveys of mood recognition methods cover audio, physiological, and video attributes. These works propose numerous approaches for recognizing personalized mood and stress levels, and studies of both mood and stress relate them to the overall comfort of an individual. Generally, however, each study considers only one of the two: mood or stress. In this research, we suggest a way to consider both important attributes, applying fuzzy clustering to decide the person's comfort level [7–9].
29.3 Methods
29.3.1 Technological Approach—IoT Devices to Acquire Data
The crucial elements of the suggested approach are based on an M2M and IoT architecture. Figure 29.2 outlines the system and its components. In this research, we used web cameras as the sensory system, with one device for every server area attached to a Raspberry Pi board. The Raspberry Pi acts as a gateway to the system; Fig. 29.2 shows the IoT device, a Raspberry Pi unit integrated with a camera, monitoring the condition. The gateway is a crucial component for reducing the workload on the server, as it executes lightweight data-preprocessing tasks. The server holds the cloud-service components, which comprise data storage, everyday behavior extraction by deep learning techniques, and classification using the trained models. The classification outcomes are dispatched to user terminals. This feedback acts as part of the decision-support framework for the user, as it helps raise awareness of their working behavior and comfort. The ultimate aim of the concept is to issue real-time feedback to the user about their behavior. The system could issue feedback based on a summary of the tracking information; however, in this research we conducted a preliminary experiment using a daily summary of the information [10].
Fig. 29.2 IoT device integrated with camera to monitor condition—on worksite—proposed model
29.3.2 Research Procedure
Figure 29.3 shows the whole experimental procedure of this research. For data collection, we used a web camera with a 120° viewing angle at HD resolution. Before the experiment, the subjects were briefed on its purpose, and consent forms were provided and signed by all subjects. The surveys were then discussed, and the subjects were instructed on how to complete them. Monitoring ran for 25 days; we used the data only on days when the subjects were present and completed the survey at the workstation. Afterwards, we carried out behavior extraction using hand and body detectors trained with a deep learning algorithm. The procedure for training the image detectors is described below. Finally, we trained the classification models on the extracted behavior data, using the algorithms also described below.
Fig. 29.3 Survey modeling—data analysis using AI learning framework
Surveys
This research used data from surveys conducted in two forms to classify the everyday data into different comfort levels. The subjects' raw scores were not used directly for classification; instead, we applied fuzzy clustering to the survey scores to split them into two groups, each group representing either a lower or a higher level of comfort. The techniques are explained in detail below, along with the surveys used in this research.
29.3.3 Mood
The PNC (positive and negative conditions) is a two-dimensional mood survey. We use the popular PNC tool to assess mood, as it has proven reliable, especially for short-term assessment. The survey efficiently assesses the mood for "today" or "now"; the directions must state clearly that "now" refers to the feelings of the current moment and "today" to the feelings of the entire day. This research used the daily ("today") form of the survey. Each survey comprises 15 words, each tied to either a positive or a negative affect. A higher score on the positive items indicates positive affect (PA), and a higher score on the negative items indicates negative affect (NA) in the subject's mood.
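The PA/NA scoring described above can be sketched as follows. The item words below are illustrative placeholders, not the actual PNC instrument, and the 1–5 rating scale is an assumption; only the split of 15 items into positive and negative lists follows the text.

```python
# Hypothetical item lists: 8 positive-affect and 7 negative-affect words,
# 15 in total as in Sect. 29.3.3. These words are illustrative only.
PA_ITEMS = {"interested", "excited", "enthusiastic", "proud", "inspired",
            "determined", "attentive", "active"}
NA_ITEMS = {"distressed", "upset", "irritable", "nervous", "jittery",
            "afraid", "ashamed"}

def score_pnc(ratings):
    """ratings: dict mapping item word -> rating (assumed 1-5).
    Returns (PA, NA) sums: a higher PA sum indicates positive affect,
    a higher NA sum indicates negative affect."""
    pa = sum(v for k, v in ratings.items() if k in PA_ITEMS)
    na = sum(v for k, v in ratings.items() if k in NA_ITEMS)
    return pa, na

# A day where every positive word is rated 4 and every negative word 2.
ratings = {w: 4 for w in PA_ITEMS} | {w: 2 for w in NA_ITEMS}
pa, na = score_pnc(ratings)
```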
29.3.4 Stress
This research documents perceived stress levels on a 100 mm visual analog scale (VAS), where "0" represents "stress free" and "100" represents "fully stressed". This type of survey is used to create a daily record of mood and stress; a list of everyday stress assessments is presented to the subjects as instructions for documenting their stress level.
29.3.5 Methods—Machine Learning Models
In this research, we used supervised learning to build the image recognition models with a convolutional neural network (CNN), training the CNN to detect hands and bodies and thereby avoiding a potential processing bottleneck. We recorded 25 days of video, about 1 h per day outside the experiment duration, and used it as the dataset for training the model. We extracted 800 frames of random content from the video and labeled the region boundaries for the hands and body. The data was then split into train and test sets in a 75:25 proportion. The CNN model attained an accuracy of 0.92 for body detection and 0.76 for hand detection. Figure 29.5 shows a sample image from the hand and body detectors.
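The 75:25 split of the 800 labeled frames can be sketched as follows; the frame identifiers and seed are placeholders, and the chapter does not say how the split was randomized.

```python
import random

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle labeled items and split them into train and test sets,
    75:25 by default, as done for the 800 annotated frames."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

frames = list(range(800))               # 800 labeled frame IDs (placeholders)
train, test = train_test_split(frames)  # 600 training, 200 test frames
```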
Classification Algorithms
The following algorithms were used to train the comfort classification model:
Support vector machines (SVM)
Decision trees (DT)
K-nearest neighbors (KNN)
KNN is a widely used classification algorithm; in this research, KNN was run with K = 5. SVM is widely used for binary classification problems. Since the clustered survey outcomes form two classes for comfort classification, representing high and low comfort levels, SVM suits the situation. We trained SVM models with both Gaussian and linear kernel functions; the Gaussian kernel performed better, so we report its outcomes specifically. Decision trees were used as an algorithm that does not depend on distance, providing another approach to training the model [11, 12]. Related work includes ranking of program execution sequences using improved precision in regression testing [13] and implementation of electronic medical health records [14].
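Of the three algorithms, KNN with K = 5 is the simplest to show end to end. The sketch below is a minimal hand-rolled version on toy data, not the authors' implementation; the feature values and labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=5):
    """Minimal k-nearest-neighbors classifier (K = 5, as in the chapter).
    train: list of (feature_vector, label) pairs; query: feature vector.
    Returns the majority label among the k closest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy comfort data: two behavior features, labels "high"/"low" comfort.
train = [((0.10, 0.20), "low"),  ((0.20, 0.10), "low"),
         ((0.15, 0.25), "low"),  ((0.30, 0.20), "low"),
         ((0.90, 0.80), "high"), ((0.80, 0.90), "high"),
         ((0.85, 0.95), "high")]
label = knn_predict(train, (0.88, 0.85))  # 3 of 5 nearest neighbors are "high"
```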
29.4 Evaluation and Results
Data Collection
We experimented on two subjects, one female and one male, aged 25 and 26, respectively. Both subjects work at their own worksite, so no travel between locations was needed. The web camera captured the working behavior from a top view throughout the daytime. Each subject completed the survey at the end of every day, and the record was analyzed daily for mood and stress levels. The complete study ran for 25 days, and the data was used only on days when the subjects were present at the worksite. The system extracted the behavior from these days using the trained hand and body detectors.
29.4.1 Survey Results and Clustering
Table 29.1 presents simple descriptive statistics of the survey outcomes for both subjects, where "Overall" denotes the statistics of the two subjects' outcomes merged. Note that the two subjects show an observable difference in stress and PA scores. This might result from the overall workload, as subject 1 has longer working hours than subject 2, resulting in greater stress. All data was normalized before clustering. The extracted results are given in Table 29.1.

Table 29.1 Subjects—PA and NA—stress score (on scale 100)

                 Min   Max
Sub1 PA          21    45
Sub1 NA          34    53
Sub1 Stress      43    85
Sub2 PA          27    48
Sub2 NA          37    55
Sub2 Stress      46    87
Overall PA       23    47
Overall NA       35    54
Overall Stress   44    86
29.4.2 Clustering
We used the fuzzy C-means (FCM) clustering algorithm to categorize the survey outcomes into clusters representing lower and higher comfort levels. The benefit of FCM is that it uses membership probabilities during computation rather than a hard distance basis. In this scenario a survey score can be ambiguous: a score on a given day may lie close to the mean and would be assigned to an incorrect class by a hard method. For each subject and for the merged results, we obtained two clusters. Subject A's first cluster shows a higher level of comfort: its PA mean is greater than, and both its stress and NA means are lower than, subject A's overall means. Subject A's second cluster shows a lower level of comfort: a lower PA, with both stress and NA values greater than subject A's overall means. The same pattern also appears in subject B's outcomes and clusters; the mean of every attribute differs slightly between the datasets of subjects A and B. We used subject A's clustering outcomes to train the customized comfort identification model using only subject A's data (Fig. 29.4).
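A compact version of the FCM step can be sketched as follows. This is a generic textbook implementation, not the authors' code, and the (PA, NA, stress) values below are invented normalized scores for illustration.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy C-means: X is (n_samples, n_features). Returns
    (centers, U) where U[i, j] is the membership degree of sample i
    in cluster j, so each day gets a soft rather than hard assignment."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                         # avoid divide-by-zero
        U = 1.0 / d ** (2 / (m - 1))                  # membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy normalized (PA, NA, stress) scores: three "high comfort" days
# (high PA, low NA/stress) and three "low comfort" days.
X = np.array([[0.80, 0.20, 0.10], [0.90, 0.10, 0.20], [0.85, 0.15, 0.15],
              [0.20, 0.80, 0.90], [0.10, 0.90, 0.80], [0.15, 0.85, 0.85]])
centers, U = fuzzy_cmeans(X)
labels = U.argmax(axis=1)   # hard labels taken from the soft memberships
```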
29.4.3 Classification Models
Behavior Information Extraction
This concept relies mainly on image processing and computer-vision techniques. The focus is on detecting the hands and body in frames extracted from the video; in this research we analyzed 1 frame/s of every video. The detected hands and body are used to compute the following behavior attributes: mean body motion, mean movement of hand 1 and hand 2, total time present at the worksite, mean time per worksite activity, number of activity changes, and total time away from the worksite. Hand movements were observed using motion detection, as shown in Fig. 29.5.

Fig. 29.4 Clustering scores

Fig. 29.5 Daily behavior extraction—based on hand movements—stress conditions—mental fatigue
The window size for these computations equals the subject's total working time in a day. In all, seven attributes are obtained from the behavior extraction, and they were extracted on a daily basis, so every data point represents the behavior of a particular day. The data points are labeled according to the survey outcome score of that day; as mentioned, we used FCM to separate the outcomes into two classes.
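The seven daily attributes can be computed from per-frame detections roughly as follows. The frame dictionary keys and the demo data are hypothetical; the chapter does not specify the detector output format.

```python
from statistics import mean

def daily_behavior_features(frames):
    """Sketch of the seven daily behavior attributes. Each frame is one
    second of video, given as a dict with hypothetical keys: 'present'
    (body detected), 'body_motion', 'hand1_motion', 'hand2_motion',
    and 'activity' (a coarse activity label). Assumes at least one
    frame with the subject present."""
    present = [f for f in frames if f["present"]]
    activities = [f["activity"] for f in present]
    changes = sum(1 for a, b in zip(activities, activities[1:]) if a != b)
    segments = changes + 1  # contiguous activity runs while present
    return {
        "mean_body_motion": mean(f["body_motion"] for f in present),
        "mean_hand1_motion": mean(f["hand1_motion"] for f in present),
        "mean_hand2_motion": mean(f["hand2_motion"] for f in present),
        "time_present_s": len(present),
        "mean_time_per_activity_s": len(present) / segments,
        "activity_changes": changes,
        "time_absent_s": len(frames) - len(present),
    }

# A toy day: 60 s typing, 30 s away from the desk, 60 s writing.
frames = (
    [{"present": True, "body_motion": 1.0, "hand1_motion": 0.5,
      "hand2_motion": 0.2, "activity": "typing"}] * 60
    + [{"present": False, "body_motion": 0.0, "hand1_motion": 0.0,
        "hand2_motion": 0.0, "activity": None}] * 30
    + [{"present": True, "body_motion": 2.0, "hand1_motion": 1.0,
        "hand2_motion": 0.4, "activity": "writing"}] * 60
)
feats = daily_behavior_features(frames)
```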
29.4.4 Classification Model Training
Besides the everyday behavior data, we used the daily weather as a key attribute, since this sort of contextual data has been shown to affect a person's mood. In general, temperature has a major effect on a person's NA, although no single rule applies to everyone: preferences and personality must be taken into account, as low temperature might decrease NA in one person and increase it in another. We documented the highest and lowest temperatures every day and appended them to the daily behavior data, giving seven attributes from the behavior extraction and two from the weather data. We trained the models on the assembled dataset, with 75% used for training and 25% for testing, using the three algorithms mentioned earlier; the data was normalized before training. We trained models under two settings. First, a generic model was trained with both subjects' data, using 21 days of information from subjects A and B. Second, a customized model was trained with only subject A's data; this illustrates the possibility of training the model for an individual comfort framework. Note that we did not train a personalized model for subject B, as only a small dataset was available for that subject.
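The normalization step mentioned above can be sketched as follows. The chapter does not state which normalization was used, so z-score (standard) scaling is assumed here, and the feature names and values are placeholders.

```python
from statistics import mean, pstdev

def zscore_normalize(columns):
    """Normalize each feature column to zero mean and unit variance
    (one common choice; the chapter does not say which scheme was used).
    columns: dict mapping feature name -> list of raw daily values."""
    out = {}
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        # Constant columns get all zeros instead of dividing by zero.
        out[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out

# Two of the nine features (seven behavioral + daily min/max temperature).
raw = {"temp_max": [28.0, 30.0, 32.0, 34.0], "activity_changes": [3, 5, 7, 9]}
norm = zscore_normalize(raw)
```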
29.4.5 Results
For the general model, the SVM algorithm attained the highest precision at 87%, while the D-tree algorithm attained the lowest at 65%. For the personalized model based on subject A's data, the precisions of SVM, KNN, and D-tree were 94%, 79%, and 71%, respectively, as shown in Fig. 29.6. Overall, SVM performed best among the models, which may be because the task is a binary classification problem, while the D-tree (decision tree) was the least accurate predictor under both settings. Finally, the precision of the personalized model is higher than that of the general model, which suggests good potential for personalized development.
Fig. 29.6 Machine learning algorithms' response on recommendation system—mental fatigue (bar chart: precision 0–100 for SVM, KNN, and D-tree; series NA and PA)
29.5 Conclusion and Discussion
In this research, we proposed a comfort identification framework based on mood and stress status and presented preliminary outcomes using machine learning and deep learning techniques for daily information gathering and extraction in a closed environment. The research reports the precision of both the generic and the personalized models of this non-intrusive system. We applied fuzzy clustering to the three survey attributes, stress and mood related, and used the clustering outcomes to label the everyday behavior data; this approach supports comfort recognition because it considers numerous aspects. We trained the classification model with three algorithms, which achieved a satisfactory level of precision: both the personalized and the generic models reached 87% or more with SVM. The classification model in this research demonstrates the feasibility of developing a personalized comfort information system.
As future work on this preliminary research, incorporating factors outside the office, such as personal traits, physical activity, and diet, could help the system detect a person's comfort better while keeping the system more personalized. With additional techniques, longer data-collection periods, and more sensors, we anticipate achieving a more precise model for a real-time comfort system. On the whole, this research has proposed a comfort recognition system with both personalized and generic recognition models, using cutting-edge technology such as deep learning while considering numerous attributes of comfort.
References
1. Vaskari, R. G., & Sugumaran, V. B. (2020). Prevalence of stress among software professionals
in Hyderabad, Telangana State, India. Central African Journal of Public Health, 6(4), 207.
2. Matsumoto, T., Egawa, M., Kimura, T., & Hayashi, T. (2019). A potential relation between premenstrual symptoms and subjective perception of health and stress among college students: A cross-sectional study. BioPsychoSocial Medicine, 13(1), 1–9.
3. Umematsu, T., Sano, A., Taylor, S., & Picard, R. W. (2019). Improving students’ daily life
stress forecasting using LSTM neural networks. In 2019 IEEE EMBS international conference
on biomedical & health informatics (BHI) (pp. 1–4).
4. Kumar, K. V. R., & Elias, S. Use case to simulation: Muscular fatigue modeling and analysis
using opensim. Turkish Journal of Physiotherapy and Rehabilitation, 32(2).
5. Martin, K., Meeusen, R., Thompson, K. G., Keegan, R., & Rattray, B. (2018). Mental fatigue
impairs endurance performance: A physiological explanation. Sports Medicine, 48(9), 2041–
2051.
6. Pageaux, B., & Lepers, R. (2018). The effects of mental fatigue on sport-related performance.
Progress in Brain Research, 240, 291–315.
7. McCormick, M. P., Hsueh, J., Merrilees, C., Chou, P., & Mark, C. E. (2017). Moods, stressors,
and severity of marital conflict: A daily diary study of low-income families. Family Relations,
66(3), 425–440.
8. Sudarma, M., & Harsemadi, I. G. (2017). Design and analysis system of KNN and ID3 algorithm
for music classification based on mood feature extraction. International Journal of Electrical
and Computer Engineering, 7(1), 486.
9. Taylor, S., Jaques, N., Nosakhare, E., Sano, A., & Picard, R. (2017). Personalized multitask
learning for predicting tomorrow’s mood, stress, and health. IEEE Transactions on Affective
Computing, 11(2), 200–213.
10. Kumar, K. V. R., Kumar, K. D., Poluru, R. K., Basha, S. M., & Reddy, M. P. K. (2020). Internet
of things and fog computing applications in intelligent transportation systems. In Architecture
and security issues in fog computing applications (pp. 131–150). IGI Global.
11. Bhogaraju, S. D., & Korupalli, V. R. K. (2020). Design of smart roads—A vision on Indian smart
infrastructure development. In 2020 International conference on communication systems &
networks (COMSNETS) (pp. 773–778).
12. Bhogaraju, S. D., Kumar, K. V. R., Anjaiah, P., Shaik, J. H., & Reddy Madhavi. (2021).
Advanced predictive analytics for control of industrial automation process. In Innovations in
the industrial internet of things (IIoT) and smart factory (pp. 33–49). IGI Global.
13. Narendra Kumar Rao, B., & Bhaskar Kumar Rao, B. (2019). Clustering based test suite selection for ranking of program execution sequence using improved precision in regression testing. International Journal of Innovative Technology and Exploring Engineering, 8(7).
14. Narendra Kumar Rao, B., & Bhaskar Kumar Rao, B. (2019). Blockchain based implementation of electronic medical health record. International Journal of Innovative Technology and Exploring Engineering, 8(8).
Chapter 30
Multiple Slotted Triple-Band PIFA Antenna for Wearable Medical Applications at 2.5–9 GHz
T. V. S. Divakar and G. Anantha Rao
Abstract This paper discusses a small-sized planar inverted-F antenna (PIFA) for triple-band operation. The recommended antenna contains a stepped rectangular radiating component near the edge of the PIFA, together with a ground plane fully coated with copper. The dimensions of this antenna are 26.7 × 16 × 1 mm³, and the feeding technique is a microstrip line. The simulated S11 values, voltage standing wave ratio (VSWR), and gain agree fairly well with each other. The proposed antenna structure operates in the 2.5–8.9 GHz frequency range with a return loss of more than 15 dB. The antenna structure is suitable for wearable medical applications.
30.1 Introduction
In recent years, wearable antennas have attracted predominant interest in antenna design due to many applications in well-known areas such as health monitoring, physical training, and tracking. Since this type of antenna sits close to the human body and undergoes many bends, the antenna design must account for losses caused by the human body as well as bending losses, besides achieving optimum performance. Such antenna systems must therefore have a low radiation effect, which is measured by the specific absorption rate (SAR). The designed antenna should also be flexible enough to attach to the human body, to increase comfort while worn. Since fabric materials are light in weight and easy to wear, they can be used to build this type of antenna. Besides the advantages of PIFA antennas, such as size and cost, there are some disadvantages, such as "poor impedance matching, poor proficiency, and excitation of surface waves that could bring down the radiation effectiveness." A small-sized, low-profile fabric electromagnetic bandgap-based (EBG) antenna for wearable health applications which operates at
T. V. S. Divakar (B) · G. Anantha Rao
Department of Electronics and Communication Engineering, GMR Institute of Technology, Rajam, AP, India
e-mail: divakar.tvs@gmrit.edu.in
G. Anantha Rao
e-mail: anantharao.g@gmrit.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_30
2.1 GHz is explored in [1]; with the help of the EBG structure, a reduction in back radiation and an improvement in the front-to-back ratio (FBR) were achieved, along with a SAR of 0.0368 W/kg for dimensions of 46 × 46 × 2.4 mm³. Textile antennas with an artificial magnetic conductor are investigated in [2, 3]. A dual-resonant-mode wearable textile PIFA for 5 GHz WLAN applications is designed in [4]; in this design, using hollow copper rivets and wool felt, a SAR of 0.9307 was achieved. A small-sized dual-band fabric PIFA for the 432 MHz/2.4 GHz ISM bands was proposed, made of 6 mm thick felt and a 0.17 mm thick conductive textile sheet at the back of the PIFA [5–7]. A supple fractal EBG for mm-wave wearable antennas was projected in [8].
This paper presents a triple-band miniaturized PIFA antenna with edge slots at the end of the PIFA, viewed from the feed element, to obtain a good impedance match and triple-band operation at 2.5 and 8.7 GHz. The projected antenna is small compared with other antennas in the literature [9].
The organization of this manuscript is as follows: Sect. 30.2 describes the antenna design with all its dimensions, and Sect. 30.3 presents the results and discussion, followed by the conclusions.
30.2 Antenna Design
The complete size of the proposed design is 26.7 × 16 × 1 mm³, as shown in Fig. 30.1. It comprises a PIFA structure that can be divided into two vertical patches of 13.54 × 2.72 mm² and 14.079 × 2.72 mm², viewed from the microstrip feeding end. Two more rectangular patches of 11.57 × 1.38 mm² and 12.10 × 1.38 mm² were placed to complete the PIFA structure. One square slot of 1 × 1 mm² placed near the two patches decreases the return loss and provides good impedance matching for the structure. The ground plane is completely shielded with a copper coating. Simulations were carried out using HFSS software, and all dimensions were tuned by trial and error while keeping the primary values constant to obtain the preferred characteristics. The thickness, dielectric constant, and loss tangent of the FR4 substrate are 1.6 mm, 4.4, and 0.02, respectively. The antenna resonates at three frequencies between 2.5 and 8.9 GHz. The dimensions of the recommended antenna are given in Fig. 30.1.
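As background for these dimensions, the fundamental resonance of a PIFA is commonly estimated from the quarter-wavelength condition; this is a textbook approximation, not a formula given in the paper:

```latex
f_r \approx \frac{c}{4\,(L + W)\,\sqrt{\varepsilon_{\mathrm{eff}}}}
```

where $c$ is the speed of light, $L$ and $W$ are the length and width of the radiating element, and $\varepsilon_{\mathrm{eff}}$ is the effective permittivity of the substrate (between 1 and the FR4 value of 4.4 here). Slots lengthen the current path and shift or add resonances, which is how the edge slots above produce the additional bands.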
30.3 Results and Discussion
The reflection-coefficient characteristics are shown in Figs. 30.2 and 30.4; with the edge slot, the antenna resonates more strongly at the higher frequency than at the lower frequency, compared with the structure without the slot. Figures 30.3 and 30.5 show the corresponding VSWR characteristics, which agree well with the reflection coefficient at the desired frequencies. The recommended antenna works in a double band of frequencies at 2.24 and 8.8 GHz. Because of the edge slot, there is a notable change in the reflection coefficient, and it was observed that changing the dimensions of the air box and the operating frequency causes minor variations in the operating frequency.

Fig. 30.1 Top view of antenna

Figure 30.6 shows the reflection coefficient versus frequency with two slots on the structure; a return loss of 20 dB is observed at 2.5 GHz and 30 dB at 8.7 GHz. Figure 30.7 shows the VSWR versus frequency with two slots on the structure; a VSWR of 1.96 and 1.02 at the desired frequencies shows that the antenna resonates at the operating frequencies.
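Figures 30.6–30.15 report return loss and VSWR side by side; the textbook relations linking them can be sketched in a few lines (standard formulas, not code from the paper):

```python
def gamma_from_return_loss(rl_db):
    """Reflection coefficient magnitude |Γ| from return loss in dB,
    using RL = -20 * log10(|Γ|)."""
    return 10 ** (-rl_db / 20)

def vswr(gamma_mag):
    """VSWR = (1 + |Γ|) / (1 - |Γ|); always >= 1 for a passive load."""
    return (1 + gamma_mag) / (1 - gamma_mag)

# A 20 dB return loss corresponds to |Γ| = 0.1, i.e. a VSWR of about 1.22,
# which is why deep S11 dips coincide with VSWR minima in the figures.
g = gamma_from_return_loss(20.0)
ratio = vswr(g)
```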
Figure 30.8 shows the reflection coefficient versus frequency with three slots on the edge of the structure; a return loss of 21 dB is observed at 2.5 GHz and 29 dB at 8.7 GHz. Figure 30.9 shows the corresponding VSWR versus frequency; a VSWR of 1.66 and 0.67 at the desired frequencies shows that the antenna resonates at the operating frequencies and performs better than the two-slot structure.

Fig. 30.2 Antenna having edge and middle slots

Fig. 30.3 Antenna having three edge slots

Fig. 30.4 Antenna having one edge slot with three bands
Figure 30.10 shows the reflection coefficient versus frequency with a single slot on the edge of the structure; a return loss of 20 dB is observed at 2.5 GHz, 32.4 dB at 6.7 GHz, and 21.4 dB at 8.9 GHz. Figure 30.11 shows the corresponding VSWR versus frequency; a VSWR of 1.9, 0.45, and 1.51 at the desired frequencies shows that the antenna resonates at the operating frequencies and that triple-band operation was achieved.

Fig. 30.5 Antenna having multiple edge slots with three bands

Fig. 30.6 Reflection coefficient versus frequency with two slots

Figure 30.12 shows the directivity plot with three edge slots on the structure; the side lobes of the antenna are small. Figure 30.13 shows the reflection coefficient versus frequency with three slots on the edge of the structure; triple-band operation is also possible with three slots if the slot positions are changed.
Figure 30.14 shows the VSWR versus frequency with three slots on the edge of the structure; a VSWR of 1.01, 0.39, and 2.71 at the desired frequencies shows that the antenna resonates at the operating frequencies and that triple-band operation was achieved.
Figure 30.15 shows the directivity plot with three edge slots on the structure; the side lobes of the antenna are small.
Figure 30.16 shows the fabricated antenna with the edge slot. An SMA connector is used to feed power to the antenna.
Fig. 30.7 VSWR versus frequency with two slots
Fig. 30.8 Reflection coefficient versus frequency with three edge slots
30.4 Conclusion
A PIFA antenna with an edge slot, of overall size 26.7 × 16 × 1 mm³ and with a ground structure fully covered with copper, has been simulated and fabricated. A microstrip line feed is used to feed the structure. Because of its small size, it is useful for medical and wearable applications. This antenna meets the requirements of 2.5 and 8.9 GHz applications. The gain of the PIFA antenna is found to be 3.5 dB. The simulated reflection coefficient and VSWR properties are in good agreement.
Fig. 30.9 VSWR versus frequency with three slots
Fig. 30.10 Reflection coefficient versus frequency with single edge slot
Fig. 30.11 VSWR versus frequency with single edge slot
Fig. 30.12 Directivity plot
with single edge slot
Fig. 30.13 Reflection coefficient versus frequency with three edge slots
Fig. 30.14 VSWR versus frequency with three edge slots
Fig. 30.15 Directivity plot
with three edge slots
Fig. 30.16 Fabricated
antenna with single slot
Chapter 31
Fish Classification System Using
Customized Deep Residual Neural
Networks on Small-Scale Underwater
Images
M. Sudhakara, Y. Vijaya Shambhavi, R. Obulakonda Reddy, N. Badrinath,
and K. Reddy Madhavi
Abstract Recent improvements in marine science research have increased the
importance of underwater fish species identification. Using technology to automate
fish species identification would positively impact marine biology. Since the advent
of deep learning techniques, image classification problems have become increasingly
popular. Wild natural habitats make it harder to identify fish species because of
the complex backgrounds and noise in the raw images. Some of the most advanced
approaches for categorizing fish species in their natural habitats have been developed
in the previous decade. This paper demonstrates an automated approach for
classifying fish species based on deep residual networks. Existing transfer learning
models work well on large datasets but are less effective on smaller ones. A novel
RESNET model (SmallerRESNET) is developed to reduce the overfitting generated
by the standard pre-trained RESNET model. Convolutional and fully connected layers
are used in this more straightforward form of the RESNET model. We evaluated and
compared six different versions of the RESNET model. In addition to the number
of convolutional and fully connected layers, the number of iterations required to
achieve 80.56% accuracy on the training data, the batch size, and the dropout layer are
examined. Compared to the original RESNET model, the proposed modified RESNET
model with fewer layers obtained 90.26% testing accuracy with a validation loss of
0.0916 on an untrained benchmark fish dataset. The inclusion of a dropout layer
M. Sudhakara (B)
School of C & IT, Reva University, Bangalore, India
e-mail: malla.sudhakara@reva.edu.in
Y. Vijaya Shambhavi
E.E.E, Annamacharya Institute of Technology and Sciences, Tirupati, AP, India
R. Obulakonda Reddy
C.S.E, Institute of Aeronautical Engineering, Dundigal, Hyderabad, India
N. Badrinath
C.S.E, Annamacharya Institute of Technology and Sciences, Tirupati, AP, India
K. Reddy Madhavi
C.S.E, Sree Vidyanikethan Engineering College, Tirupati, AP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_31
327
enhanced our proposed model's overall performance. It is more efficient, with less
memory, fewer training photos, and lower computing complexity than its predecessor.
31.1 Introduction
A fish detection and recognition system that can recognize, localize, and categorize
fish and fish species in underwater photographs would be helpful in a variety of
marine applications. Monitoring duties could include, among other things, population
counting, identifying species present in a specific area, and surveying fish movement
patterns. This could happen in dams, streams, lakes, fish farms, and even the ocean. In
the context of fishing operations, the application could be used to parameterize a fish
school to estimate the distribution of quantities of various species more efficiently.
As a result, the expected bycatch and discard rates can be calculated. Commercial
fisheries can use this information to investigate an identified school of fish before
spending quotas fishing it.
Its significance in oceanography and marine research is that detecting fish is very
important in underwater object detection. Classification of fish kinds is helpful to
researchers, ocean scientists, and biologists [1]. It is also helpful to specify biomass
levels and the geological processes in the oceans. Because fish species recognition is
so important, various computer vision approaches for reliably categorizing different
fish species have been proposed. Fish species can be classified as follows:
1. Identifying fish species from dead fish carcasses.
2. Recognizing various fish species in a simulated habitat.
3. Identifying fish species existing in their local territory.
Finding underwater fish and their species is complex. Fish recognition research
has been conducted on both dead fish and fish that have been removed from the water
or kept in artificial tanks [2]. However, no serious research into underwater habitats
has been conducted. This is due primarily to the natural environment of the sea,
which contributes to its abundance. The underwater films are difficult to watch due
to their poor quality, complicated backgrounds, and low luminance. By visualizing a
fish species’ movements and activities, we could better evaluate the species’ overall
movement and activity. As a result of the introduction of deep learning algorithms,
image classification has become a rapidly growing research topic. This is primarily
because deep learning approaches such as CNNs require no separate feature
extraction before model training. Most existing object recognition frameworks rely
on photographs taken from the ground or satellites.
31.2 Related Works
The rich backgrounds and noise of underwater images make it difficult to identify
fish species. Many academics have proposed advanced approaches to categorizing
fish species in natural habitats. There have recently been many advances in ML
methods for underwater species categorization. Early algorithms used shape and
texture features to classify dead fish samples. General fish classification is complicated
by variable luminance, background noise between the reef and aquatic vegetation,
and water turbidity. Proper classification is difficult because many fish species have
similar shapes, textures, and colors. Shafait et al. [3] used abundance and biomass
to identify fish species in wild habitats. Hernández-Serna and Jiménez-Segura used
ANN on morphology, texture, and geometry features [4]. The authors in [3] used
hierarchical categorization to identify active fish in the open sea. Sun et al. identified
fish kinds using poor-resolution photos.
Hsiao et al. [5] classified 25 fish types with an accuracy of about 81.8% using S.R.C.
and P.C.A. On a dataset with 15 fish types and approximately 24,000 photos, Huang
et al. [6] used Gaussian mixture models and SVMs to achieve an identification rate of
74.8%. The neural network was introduced in the late twentieth century but was
unpopular because it required extensive supervised training and could not deal with
difficult or complex
situations. Convolution is a widely used operation in signal processing and computer
vision. Convolutional neural networks are now widely used in computer vision to
reduce noise and identify edges. Convolutional neural networks (CNNs) are like
artificial neural networks (ANNs), gaining popularity in AI and machine learning
applications. Face and object recognition are two applications of CNNs and their
modifications. The introduction of powerful GPUs has facilitated the training of
deep and complex neural networks [7].
Rathi et al. [8] recommended a deep learning and image processing system. Preprocessing
techniques included Otsu's thresholding, erosion, and dilation. A CNN uses
the resultant picture and the actual picture to categorize fish types. The Fish4Knowledge
dataset was used, with a 96.29% accuracy rate. Another approach proposed using a
high-resolution mono-image to super-resolve a low-resolution picture. A linear SVM
and two deep learning approaches, PCANet and NIN, classify the species. The dataset
used was FishCLEF2015, with PCANet achieving 77.27% precision and NIN achieving
69.84%. Qin et al. [9] are among those who have contributed to this work.
Salman et al. [10] suggested using a CNN in a hierarchical feature combination
system to analyze species-dependent properties. SVM and KNN are applied for
feature extraction and classification. The FISHCLEF 2014 and FISHCLEF 2015
datasets were used, and the framework was 97.41% accurate. Mohammed et al.
introduced an SVM architecture and strategies to extract features for the classification
of Nile Tilapia. The SIFT and SURF algorithms extracted image features, yielding
promising results. In these frameworks, machine learning was used to classify fish
species. Other frameworks employ more traditional methods. Moniruzzaman et al. [11]
provide an overview of classification strategies for underwater fish species. Deep
learning, for example, has achieved outstanding results in visual recognition and
detection. Previously,
researchers had difficulty obtaining satisfactory results. It was difficult for various
reasons, including an insufficient sampling of fish species and sparse datasets. The
recognition accuracy varies depending on the situation. Many scientists are working
on complex models to learn and extract complex features using popular deep learning
models such as ResNet [12]. This study proposes an efficient and instant classification
method for low-quality image datasets.
31.3 Deep Residual Network (DRN)
Deep networks have made significant advances in image classification. However,
there is a degradation problem when considering the convergence of deeper networks,
i.e., as the depth increases, the accuracy saturates and then degrades. Overfitting is
not the cause, since the error rises on both the training and test sets. The research
team examined numerous deep learning models and discovered
that RESNET outperformed other models in classification performance and that
increasing the depth improves accuracy. The main goal of the RESNET model is to
overcome neural network degradation risk, in which the training error rate increases
with depth [13]. As a solution, the team suggested a residual structure: the network
layers are reformulated to learn a residual function of their input. In mathematics,
the residual is the difference between the actual observation and the estimated value.
RESNET residual components are chosen based on project requirements, and Fig. 31.1
depicts a sample RESNET-20 residual component. Two convolution layers with a
3 × 3 kernel size are used in the residual component, so that its input and output
dimensions match and can be added immediately. When the step size is one, the
padding layer is the original input layer; when the step size is two, the RESNET input
performs the same function twice, followed by average pooling to generate the filler
layer. Eventually, the output of the filling layer plus the residual component equals
the input of the output layer.
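The identity shortcut described above can be sketched with a toy dense-layer residual block. This is purely illustrative (the actual RESNET component uses two 3 × 3 convolution layers, not dense layers):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy dense-layer residual component: compute F(x) with two weighted
    layers, then add the identity shortcut, i.e. y = ReLU(F(x) + x)."""
    out = relu(x @ w1)        # first layer with ReLU
    out = out @ w2            # second layer, no activation yet
    return relu(out + x)      # identity shortcut added before the final ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w1 = rng.standard_normal((8, 8)) * 0.01
w2 = rng.standard_normal((8, 8)) * 0.01
print(residual_block(x, w1, w2).shape)  # (1, 8): dimensions are preserved
```

Note that with zero weights the block reduces to ReLU(x), which is why residual layers are easy to optimize: the shortcut carries the input through unchanged.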
CNNs can act as feature extractors or as classifiers. A CNN is a set of convolutional
filters that extract feature vectors from images using a kernel. The pooling layer
subsamples the kernel's pixel values by taking their maximum (max pooling) or
average (average pooling). Fully connected layers map all incoming feature vectors
to the next layer. The framework for identifying fish species is depicted in Fig. 31.2.
Three convolution layers are used in the system, each followed by a max-pooling
layer. Conv 1, Conv 2, and Conv 3 denote the convolutional layers, and Max P
denotes the max-pooling operation. Furthermore,
Fig. 31.1 RESNET-20 residual component (two 3 × 3, 64-filter convolution layers with ReLU)
Fig. 31.2 Conventional structure of DRN (Conv 1, Conv 2 and Conv 3, each followed by a Max P layer)
two additional layers are used to classify fish species. The primary advantage of
max-pooling is that it retains prominent features such as edges. The identification of
fish species depends on determining the fish border, so image sharpening is applied
to the fish photos. Average pooling may be insufficient because it averages the kernel
pixels. The first convolution layer contains thirty-two 3 × 3 filters, the next contains
sixty-four 3 × 3 filters, and the third contains 128 3 × 3 filters. Each convolution
layer is followed by a 2 × 2 max-pooling layer. A dropout layer is placed before a
fully connected layer to prevent overfitting. The rectified linear unit (ReLU) is utilized
in all convolution layers, and softmax is utilized as the activation function of the final
layer. However, because the original models are trained on large benchmark datasets
with fixed parameters, their accuracy is degraded on this task. Further, CNNs give
better results on high-quality image datasets but perform poorly on small-scale and
low-quality datasets.
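Assuming stride-1, 'same'-padded convolutions and the 224 × 224 input used later in Sect. 31.5 (the chapter does not state the padding or stride, so these are assumptions), the feature-map sizes through the three conv/max-pool stages can be traced as follows:

```python
def trace_shapes(size=224, filters=(32, 64, 128), pool=2):
    """Trace the spatial size and channel depth through the conv/max-pool
    stack, assuming stride-1 'same'-padded 3x3 convolutions."""
    shapes = []
    for f in filters:
        # a 'same' convolution keeps the spatial size; 2x2 pooling halves it
        size = size // pool
        shapes.append((size, size, f))
    return shapes

print(trace_shapes())  # [(112, 112, 32), (56, 56, 64), (28, 28, 128)]
```

The final 28 × 28 × 128 feature map is what the dropout and fully connected layers then consume.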
31.4 Proposed Method
Preprocessing the sample images is an essential step in recognizing underwater fish
species. This is significant because the sample images are blurry due to the uncon-
trolled environment. As a result, the classifier may fail to learn species-specific
features accurately. Figure 31.3a depicts the proposed work’s customized architec-
ture, whereas Fig. 31.3b depicts the proposed work’s dropouts between the input
and output layers. In the architecture, three convolutional layers are followed by
two dense layers. Batch normalization and dropouts are used to train a more proba-
bilistic model, reducing overfitting caused by the underlying RESNET model. During
testing, real-time image classification is possible. The mini-batch mean and standard
deviation are subtracted and divided to normalize layer inputs in batch normalization.
A mini-batch, for example, contains only a subset of the total training data.
p̂ = (p − E[p]) / √(Var[p]) (31.1)
As a result of normalization, the input distribution to each neuron is the same,
removing the problem of internal covariate shift and enabling regularization. But, the
network’s representative strength has been substantially weakened. Normalization
Fig. 31.3 Proposed SmallerRESNET. a Architecture used to train small-scale underwater images
(input, stacked CONV-BN-MP-DROPOUT blocks, a DENSE layer with batch normalization and
dropout, and a final DENSE + softmax output). b Regularization using dropouts on the hidden layer
of each layer loses certain nonlinear correlations and weight changes produced by
the previous layer, which can result in suboptimal weighting. This is fixed by applying
trainable gamma and beta parameters to the normalized value.
z = γ · p̂ + β (31.2)
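A minimal numpy sketch of Eqs. (31.1) and (31.2), with a small epsilon added to the variance for numerical stability (a standard implementation detail not shown in the equations):

```python
import numpy as np

def batch_norm(p, gamma=1.0, beta=0.0, eps=1e-5):
    """Eq. (31.1): normalize each feature with the mini-batch mean and
    variance, then Eq. (31.2): rescale with the trainable gamma and beta."""
    p_hat = (p - p.mean(axis=0)) / np.sqrt(p.var(axis=0) + eps)
    return gamma * p_hat + beta

# A toy mini-batch of three samples with two features on very different scales
batch = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
z = batch_norm(batch)
print(z.mean(axis=0))  # approximately [0, 0]: each feature is now zero-mean
```

With the default gamma = 1 and beta = 0 the output is simply the normalized p̂; in training, both are learned per feature.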
Another feature of CNNs is the dropout layer. The inputs of some neurons are
dropped while others remain unaffected. A dropout layer can be applied to an input
vector to remove some of its features, or to a hidden layer to remove some of its hidden
neurons. When training CNNs, dropout layers prevent overfitting on the training
data. Without dropout, the first batch of training data would have a disproportionate
impact on learning, preventing traits from being learned from subsequent
samples or batches.
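A common way to realize this is inverted dropout, which rescales the surviving activations so that no change is needed at test time (one possible implementation, not necessarily the exact one used in the chapter):

```python
import numpy as np

def dropout(x, rate=0.5, rng=None):
    """Inverted dropout: zero a random subset of activations during
    training and rescale the survivors so the expected sum is unchanged."""
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= rate   # keep each unit with probability 1 - rate
    return x * mask / (1.0 - rate)

x = np.ones((2, 6))
print(dropout(x, rate=0.5))  # entries are either 0.0 (dropped) or 2.0 (rescaled)
```

At inference time the dropout layer is simply skipped, and the scaling above keeps the expected activations consistent between training and testing.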
31.5 Results and Discussion
For the experimental analysis, the study is carried out on an NVIDIA GPU with
4 GB of memory and a 9th-generation Core i5 CPU. The study's primary objective is
to test RESNET's transfer learning models on a challenging dataset of underwater
photos.
The Croatian dataset contains 794 images of 12 different species. Six hundred
twenty-six training images and 168 testing images were considered for the training
model [14]. Despite their low resolution, the images are rescaled to 224 × 224 to
obtain more generalized results at the expense of image quality. To accomplish this,
six different versions of RESNET models are investigated, with the models trained on
the Croatian dataset. For each trained model, the accuracy and loss during training
and the accuracy and loss during validation are calculated. The loss is calculated
using categorical cross-entropy with the Adam
optimizer. Basic augmentation techniques such as scaling, shearing, zooming, and
flipping are applied to expand the proportion of samples in each class.
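As a minimal illustration of such augmentation, horizontal flipping alone already doubles the sample count (scaling, shearing and zooming would be applied similarly; this sketch assumes channel-less arrays shaped (N, H, W)):

```python
import numpy as np

def augment_flips(images):
    """Double the sample count by appending a horizontally flipped copy
    of every image. Assumes arrays shaped (N, H, W)."""
    flipped = images[..., ::-1]                 # reverse the width axis
    return np.concatenate([images, flipped], axis=0)

batch = np.arange(2 * 4 * 4).reshape(2, 4, 4)   # two toy 4x4 "images"
print(augment_flips(batch).shape)  # (4, 4, 4)
```

In practice, a framework augmentation pipeline would apply such transforms randomly on the fly rather than materializing the enlarged dataset.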
The RESNET50, RESNET101, and RESNET152 models have lower training and
validation accuracy and higher training and validation loss than the other models.
Figure 31.4 depicts the version 1 models' training and testing accuracy, as well as
training and testing loss (RESNET50, RESNET101, and RESNET152). The second
version models (RESNET50V2, RESNET101V2, and RESNET152V2) outperform
the first version models in training and validation accuracy. When version2 models
are compared to their predecessors in terms of training loss, the version2 models have
a lower training loss but a significantly higher validation loss. The models appear to
be becoming overfit for the provided dataset. Figures 31.5 and 31.6 depict a statistical
comparison of the outcomes of the models under consideration. The original RESNET
model is modified into the proposed SmallerRESNET by including batch
normalization and dropout layers. Table 31.1 lists the six trained models along with
the proposed model. The results of the proposed architecture are
shown in Fig. 31.7. In addition to accuracy, the proposed framework considers the
metrics F1-score, precision, and recall.
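These metrics follow directly from per-class confusion counts; a small sketch with hypothetical counts (not taken from the chapter's experiments):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 (the harmonic mean of the two)
    from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion counts for a single fish class:
print(precision_recall_f1(tp=45, fp=5, fn=10))
```

For a multi-class problem like the 12-species Croatian dataset, these are computed per class and then averaged.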
31.6 Conclusion
In their natural habitat, fish are difficult to distinguish. For underwater fish classification,
we proposed a deep CNN-based technique that will give researchers a
better understanding of underwater animals' life cycles and habitats and will aid
fish shippers and fishery managers in saving fish. It is made up of three convolutional
layers and one fully connected layer. Our proposed method significantly outperforms
the existing RESNET model in classification. The original RESNET model
employed massive deep neural networks trained on millions of images, whereas our
model was trained on about 750 photos; including a dropout layer before the softmax classifier improves the model's
Fig. 31.4 RESNET50, RESNET101, RESNET152 models during training for 100 epochs.
Top: RESNET50, middle: RESNET101, bottom: RESNET152
performance. On average, this model outperforms RESNET. It surpasses the original
RESNET model with an accuracy of 90.26%. The model performs well with fewer
training images and consumes less memory. Recognition of undersea organisms can,
however, be hampered by turbid water and background noise.
Fig. 31.5 Comparison of training accuracy and validation accuracy of the six residual models
Fig. 31.6 Comparison of training loss and validation loss of the six residual models
Table 31.1 Metrics of six transfer learning models along with the proposed architecture
Architecture Training accuracy Training loss Test accuracy Test loss
RESNET50 41.10 2.1482 37.50 2.0473
RESNET101 39.41 2.0354 38.69 2.3148
RESNET152 40.68 2.0057 37.50 2.5220
RESNET50V2 99.58 0.0388 77.98 6.1305
RESNET101V2 98.94 0.0844 81.55 5.9097
RESNET152V2 99.79 0.0073 77.38 5.8286
SmallerRESNET (proposed) 80.56 0.5526 90.26 0.0916
Fig. 31.7 Visualized results of SmallerRESNET architecture during training for 100 epochs
References
1. Sudhakara, M., & Meena, M. J. (2021). Multi-scale fusion for underwater image enhancement
using multi-layer perceptron. IAES International Journal of Artificial Intelligence, 10(2), 389.
2. Shu, L., Ludwig, A., & Peng, Z. (2021). Environmental DNA metabarcoding primers for
freshwater fish detection and quantification: In silico and in tanks. Ecology and Evolution,
11(12), 8281–8294.
3. Shafait, F., Mian, A., Shortis, M., Ghanem, B., Culverhouse, P. F., Edgington, D., Cline, D.,
Ravanbakhsh, M., Seager, J., & Harvey, E. S. (2016). Fish identification from videos captured in
uncontrolled underwater environments. ICES Journal of Marine Science, 73(10), 2737–2746.
4. Hernández-Serna, A., & Jiménez-Segura, L. F. (2014). Automatic identification of species with
neural networks. PeerJ, 2, e563.
5. Hsiao, Y., Chen, C., Lin, S., & Lin, F. (2014). Real-world underwater fish recognition and
identification using sparse representation. Ecological Informatics, 23, 13–21.
6. Jin, L., & Liang, H. (2017). Deep learning for underwater image recognition in small sample
size situations. In OCEANS 2017-Aberdeen, 2017 (pp. 1–4).
7. Rudra Kumar, M., & Kumar Gunjan, V. (2020). Review of machine learning models for credit
scoring analysis. Revista Ingeniería Solidaria, 16(1).
8. Rathi, D., Jain, S., & Indu, S. (2017). Underwater fish species classification using convolu-
tional neural network and deep learning. In International Conference of Advances in Pattern
Recognition.
9. Qin, H., Li, X., Liang, J., Peng, Y., & Zhang, C. (2016). Deepfish: Accurate underwater live
fish recognition with a deep architecture. Neurocomputing, 187, 49–58.
10. Salman, A., Harvey, E., Jalal, A., Shafait, F., Mian, A., Shortis, M., & Seager, J. (2016).
Fish species classification in unconstrained underwater environments based on deep learning.
Limnology and Oceanography, Methods, 14, 570–585.
11. Moniruzzaman, M., Islam, S., Bennamoun, M., & Lavery, P. (2017). Deep learning on under-
water marine object detection: A survey. In International Conference on Advanced Concepts
for Intelligent Vision Systems (pp. 150–160).
12. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–
778). https://doi.org/10.1109/cvpr.2016.90
13. Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An extremely efficient convolutional
neural network for mobile devices. In Conference on Computer Vision and Pattern Recognition
(pp. 6848–6856).
14. Jäger, J., Simon, M., Denzler, J., Wolff, V., Fricke-Neuderth, K., & Kruschel, C. (2015). Croatian
fish dataset: Fine-grained classification of fish species in their natural habitat. In BMVC Workshops, Swansea.
Chapter 32
Multiple Face Recognition System Using
OpenFace
Janakiramaiah Bonam, Lakshmi Ramani Burra,
Roopasri Sai Varshitha Godavarthi, Divya Jagabattula, Sowmya Eda,
and Soumya Gogulamudi
Abstract The digitalization of human work has been an ever-evolving process.
Students' and employees' attendance systems have been automated by using fingerprint
biometrics. In particular, the COVID situation created the need for a touchless attendance
system. Many institutions have already implemented a face detection-based attendance
system. However, the major problem in designing face-recognising biometric
applications is scalability and the accuracy with which multiple
faces can be differentiated in time from a single clip or image. This paper used the OpenFace model for face recognition
and developed a multi-face recognition model. The Torch and Python deployment
module of deep neural network-based face recognition was used, and it
produced accurate predictions in time.
32.1 Introduction
Authentication is one of the most significant challenges in society, and human face
recognition is one of the most well-known technologies for authentication. The solution
to this problem has substantially improved since the FaceNet article was
published, and the majority of subsequent research builds on this foundation. The perspective,
dimensions and luminance of the face all impact the performance. The OpenFace
model is restricted to images with one face, for which it predicts the individual's name;
it is not suitable for a group image and fails to complete the
task shown in Fig. 32.1. The challenge is to make the OpenFace algorithm work for
all the faces in an image, hence bringing multi-face recognition into play; the
block diagram is shown in Fig. 32.2.
FaceNet produces significant face mapping from images using deep learning
models like ZF-Net and Inception. FaceNet obtained cutting-edge outcomes in
J. Bonam (B)·L. R. Burra ·R. S. V. Godavarthi ·D. Jagabattula ·S. Eda ·S. Gogulamudi
Prasad V. Potluri Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
e-mail: janakiramaiah@pvpsiddhartha.ac.in
L. R. Burra
e-mail: blramani@pvpsiddhartha.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_32
339
Fig. 32.1 A group photograph with multiple faces
Fig. 32.2 Block diagram of OpenFace algorithm
various standard face recognition datasets, including Labeled Faces in the Wild (LFW)
and the YouTube Faces database. It then trained this framework using a loss function
known as the triplet loss.
The idea behind the triplet loss in this architecture is to force the framework to impose
a margin between the faces of different identities. The primary objective of this loss
function is to minimise the squared distance between the embeddings of two images
of the same identity, irrespective of imaging conditions and pose, while maximising
the squared distance between the embeddings of two images of different identities.
Apart from accessible facial recognition, OpenFace's modelling approach concentrates
on real-time face recognition on portable devices, allowing a high-precision
model to be trained with less data.
This study presents a multi-face recognition model using the OpenFace algo-
rithm based on deep learning networks. The transfer learning is used to modify
the OpenFace algorithm so that it can be used for multiple faces in a single
image/photograph.
32.2 Related Work
There are various techniques for face recognition, beginning with one of the most
widely used, Eigenfaces [1], up to recent approaches utilising deep learning, e.g.
DeepFace [2]. Principal component analysis (PCA) is used in Eigenfaces to reduce
the dimensionality of a series of facial photographs. This approach was developed
for a face categorisation challenge and is largely viewed as the father of facial
recognition technology [3]. In this procedure, each picture is converted into a vector
in which each value represents the importance of a single image pixel, and the
eigenvectors are created from the covariance matrix. Other techniques, such as
linear discriminant analysis (LDA) [4] and support vector machines (SVM) [5–8],
attempt to compensate for the inadequacies of these earlier approaches.
FaceNet: The training images are resized, converted and firmly trimmed around
the facial area. FaceNet's loss function is also an essential characteristic: it employs
the triplet loss function. FaceNet differs from other methodologies in that it learns
the mapping from images and generates embeddings directly, instead of relying on
any bottleneck layer for identification or verification tasks. Once the embeddings
are produced, all other tasks such as verification, identification and so on can be
conducted using conventional methods in that domain, with the newly produced
embeddings serving as the feature vector. For instance, we can utilise k-NN to recognise
faces using the embeddings as the feature vector, and we can group faces together with
any clustering method by defining a threshold value for verification.
Triplet Loss Function in FaceNet: The idea behind the triplet loss function is that the
anchor image (an image of a particular individual A) should be nearer to the positive
images (all of person A's images) than to the negative images (all the other images),
i.e. we require the distances between the anchor image's embedding and the embeddings
of our positive images to be smaller than the distances between the anchor
image's embedding and the embeddings of our negative images (Fig. 32.3).
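This condition can be written as the standard triplet loss, max(0, ‖a − p‖² − ‖a − n‖² + margin); a small numpy sketch with an illustrative margin of 0.2 (the margin value here is for demonstration, not quoted from the chapter):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin): pull the positive
    embedding toward the anchor, push the negative at least `margin` away."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # same identity: close to the anchor
n = np.array([1.0, 1.0])   # different identity: far from the anchor
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

When the loss is zero the triplet is "easy"; training focuses on triplets where the negative is still closer than the positive plus the margin.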
Fig. 32.3 Original image sample considered in this study
Fig. 32.4 Original image
sample
Face Detection: Our input images should have properties similar to those used to
train the CNN model to achieve the best results. As a result, the project's first step
is to extract all of the faces present and apply proper transformations so that the
model input meets the following requirements: the image should be 96 × 96 pixels
in size; only one face is permitted in the image; and the face must be aligned so that
the line connecting the two eyes is horizontal and 20 pixels from the top of the image.
OpenFace AlignDlib Utility: There are some pre-trained libraries available; in
this particular instance, we are using the AlignDlib functionality from the OpenFace
project, which detects and aligns the face using pre-trained landmarks. We could also
use OpenCV's Haar cascade frontal-face detector, which has already been trained.
This alternative, nevertheless, does not include the alignment task. Figures 32.4,
32.5, 32.6 and 32.7 depict different phases of a sample image processed using the
AlignDlib utility. The final detected and aligned face is shown in Fig. 32.7.
32.3 Proposed Methodology
The aim of this work is to develop a model that can detect and recognise all people
whose faces are present in a picture or real-time captured video; the logical flow
of OpenFace is shown in Fig. 32.11.
Data Preprocessing: As discussed earlier, we have a better chance of training the
model reliably if we have more data. On the other hand, data is expensive, so we
must make the most of what we have. By changing the original image’s orientation,
brightness and contrast, we will be able to produce more data for the database.
When it comes to the actual job, this will also help reduce the effect of the photo’s
32 Multiple Face Recognition System Using OpenFace 343
Fig. 32.5 Face detection
performed
Fig. 32.6 Detected face
from the image
orientation and lighting. Preprocessing the data at hand will bring better outcomes. The applied preprocessing techniques are shown in Figs. 32.8, 32.9 and 32.10.
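A minimal augmentation sketch of the kind described, changing orientation, brightness, and contrast, might look as follows; the rotation choices and intensity parameters are illustrative defaults, not values from the paper:

```python
import numpy as np

def augment(image, rotations=(0, 1, 2, 3), brightness=30, contrast=1.2):
    """Generate extra training samples from one HxWxC uint8 image by
    rotating it and re-scaling its intensity (illustrative parameters)."""
    out = []
    for k in rotations:                        # 90-degree orientation changes
        rotated = np.rot90(image, k)
        # brightness shift: add a constant, clip back to the valid range
        bright = np.clip(rotated.astype(np.int16) + brightness,
                         0, 255).astype(np.uint8)
        # contrast stretch: scale distances from the mid-grey level
        contr = np.clip((rotated.astype(np.float32) - 128) * contrast + 128,
                        0, 255).astype(np.uint8)
        out.extend([rotated, bright, contr])
    return out
```

Each input image thus yields twelve variants, tripling the three intensity versions across four orientations.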
OpenFace CNN for Multi-face Detection: OpenFace is a lightweight model. The OpenFace model expects 96 × 96 RGB images as input and produces a 128-dimensional output embedding. It is developed on the Inception ResNet v1 framework. The model is more complex but has fewer parameters compared to other models. Similar to FaceNet and
Fig. 32.7 Detected face
using AlignDilib
Fig. 32.8 Image with
changed orientation
Fig. 32.9 Changing contrast
of detected
Fig. 32.10 Changing
gamma of detected face
Fig. 32.11 Logical flow of OpenFace
the VGGNet algorithms, it applies one-shot learning. The model is based on the condition that different photos of the same person must lie at a small distance from one another, whereas photos of different people should lie at a larger distance. The distance considered can be the Euclidean distance (Fig. 32.11).
OpenFace detects the facial region in a picture using dlib [2] and generates a box
over each face that may be in various positions. This might be a problem if utilised
as input to the network right away, thus it has to be preprocessed. OpenFace uses 2D
affine transformation as a preprocessing approach, which resizes and crops photos to
the boundaries of the landmarks created by the dlib face detector, bringing the nose
and eye corners closer to their mean locations. The outcome of this transformation is a normalised 96 × 96 pixel picture. Rather than training the neural network to identify individuals one by one, the primary concept is to teach it to determine whether two images belong to the same individual.
The training stage will use three input images to do this: an anchor image of the test subject [9], a positive image of the same person, and a negative image of a different person. The network then uses a triplet-loss function to try to reduce the distance between the anchor and positive embeddings while maximising the distance between the anchor and negative embeddings. Although the model structure is complex, to use it we only need to know that the input dimension is 3 (channels) × 96 × 96 (pixels). After numerous convolutional layers, one of which is an inception network, it provides an output with a dimension of 1 × 1 × 128, which is the image embedding. If the model is correct, images of the same person should have a short embedding distance, whereas images of different people should have a large embedding distance.
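The triplet objective described above can be sketched numerically as follows; the margin value 0.2 follows the FaceNet paper and is an assumption here, not necessarily the OpenFace training setting:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on 128-d embedding vectors.

    Encourages ||a - p||^2 + margin <= ||a - n||^2, i.e. the anchor sits
    closer to the positive than to the negative by at least the margin.
    """
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance, same person
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance, other person
    return max(d_pos - d_neg + margin, 0.0)    # zero once the gap is wide enough
```

When the negative is already farther than the positive by the margin, the loss is zero and the triplet contributes nothing to training.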
Extraction and Alignment Setup: In Figs. 32.12 and 32.13, the embedding distance from the input image to a positive image and to a negative image is the same, which is not expected. This happens because the model is contrasting the whole picture, i.e. the face together with the body and background. In order to compare only faces, the images must undergo extraction and alignment. Once this step is completed, we generate a database by storing the embedding values in a dictionary for the comparison task. Our proposed framework focuses on the above-discussed criteria, utilising them to train a multi-face detection model.
Multi-face Detection Model: Training is a vital task that feeds the input to the model. We train our model in such a way that it supports real-time
Fig. 32.12 Embedding distance between the input image and positive image
Fig. 32.13 Embedding distance between the input image and negative image
multi-face recognition. Real-time multi-face detection enables us to achieve a multi-face-recognising biometric attendance system. The following steps help us achieve a multiple face recognition model:
1. Detect, extract and align all faces present in the picture.
2. Pass all the output faces to the CNN, which returns the corresponding embedding values.
3. For each face, calculate its embedding distance to all the values stored in the database and return the identity of the person with the smallest distance.
4. Attach the label to the corresponding face in the original image.
The above steps give us a multi-face recognition system based on OpenFace. Since OpenFace is the core algorithm and we have only replicated its use across the multiple faces in a single image, the final accuracy and loss of the model are the same as those of the OpenFace model for single-face recognition.
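The identification step (step 3 above), matching each face embedding against the dictionary database by smallest Euclidean distance, can be sketched as below; the rejection threshold is a hypothetical tuning parameter, not a value from the paper:

```python
import numpy as np

def identify(face_embedding, database, threshold=1.0):
    """Return the identity whose stored embedding is closest to the query.

    `database` maps name -> 128-d embedding vector; `threshold` rejects
    unknown faces (an assumed cut-off, to be tuned on validation data).
    """
    best_name, best_dist = None, float("inf")
    for name, emb in database.items():
        dist = np.linalg.norm(face_embedding - emb)  # Euclidean embedding distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```

Running this per detected face, and attaching the returned name to the face's bounding box, yields the multi-face attendance behaviour described above.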
32.4 Results
We can adapt our model to work on real-time live feeds by simply adding a further implementation step that collects each frame, preprocesses it, and passes it to the trained model as a stream of single images. Figure 32.14 shows the model recognising multiple faces in a group photo, demonstrating scalability and timely accuracy in differentiating between multiple faces in a single clip/image.
Fig. 32.14 Multiple face recognition from a group photo by the model
32.5 Conclusion
This paper has elaborated a use case of the OpenFace facial recognition package. It shows that the embedding distance can be calculated for multiple faces in a single image by replicating the OpenFace algorithm for every face in the image, leading to a multi-face recognition model for biometric attendance systems that saves attendees' time. The paper thus demonstrates a broader utilisation and performance of the OpenFace algorithm for multi-face detection in a single image.
References
1. Santoso, K., & Kusuma, G. P. (2018). Face recognition using modified OpenFace. In 3rd International Conference on Computer Science and Computational Intelligence.
2. Zulfiqar, M., Syed, F., Khan, M. J., & Khurshid, K. (2019). Deep face recognition for biometric
authentication. In 2019 International Conference on Electrical, Communication, and Computer
Engineering (ICECCE).
3. Cao, Q., Shen, L., Xie, W., Parkhi, O. M., & Zisserman, A. (2017). VGGFace2: A dataset for
recognising faces across pose and age. arXiv preprint arXiv:1710.08092
4. Xu, M., Cheng, W., Zhao, Q., Ma, L., & Xu, F. (2015). Facial expression recognition based on transfer learning from deep convolutional networks (pp. 702–708). IEEE.
5. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face
recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (pp. 815–823).
6. Janakiramaiah, B., Kalyani, G., Karuna, A., et al. (2021). Military object detection in defense using multi-level capsule networks. Soft Computing. https://doi.org/10.1007/s00500-021-05912-0
7. Janakiramaiah, B., Kalyani, G., & Jayalakshmi, A. (2021). Automatic alert generation in a
surveillance systems for smart city environment using deep learning algorithm. Evolutionary
Intelligence, 14, 635–642. https://doi.org/10.1007/s12065-020-00353-4
8. Rao, B. N. K., & Rao, B. B. K. (2019). Defect detection in printed board circuit using image
processing. International Journal of Innovative Technology and Exploring Engineering, 9(2).
9. Wolf, L., Hassner, T., & Maoz, I. (2011). Face recognition in unconstrained videos with matched
background similarity. In 2011 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (pp. 529–534). IEEE.
Chapter 33
EDAARP-Efficient and Data-Aggregative
Authentic Routing Protocol for Wireless
Sensor Networks
Kurakula Arun Kumar and Karthikeyan Jayaraman
Abstract Wireless sensor networks (WSN) are quickly gaining a lot of research
attention, and several communications are required for data sensing in WSN. The
sensor nodes collect data from multiple locations and relay it to the central control
unit. However, few of the nodes have insufficient resources. Despite the fact that
existing routing algorithms support cluster-based routing, these techniques do not
consider all of the resource constraints associated with the nodes. The goal is to
discuss the design and implementation of a cluster-based network strategy that uses
Presumptive Data Gathering (PDG) and Selective Information Standards (SIS) algorithms. The strategy aims to improve energy efficiency by creating a fully connected dedicated node that connects all sub-clusters via the best processes of its associated controlled set; where nodes fall out of range, relay nodes are used. The Efficient Data-Aggregative Authentic Routing Protocol (EDAARP) is proposed in this work to preserve energy and ensure consistent transmission of sensed data.
33.1 Introduction
In WSN, the energy that is being used by the nodes is predominantly related to the
amount of data that is being transmitted among the nodes. The maximum amount
of energy provided for the communication is consumed for data transmission [1]. In
WSN, data transmission possesses a major significance as they are networks that are
data-dependent. In WSN, through multiple hops or single-hop transmission proto-
cols, the data is routed to the sink nodes. During this state, routing has a crucial role
in the process of data accumulation. In WSNs, the algorithm designed for routing
offers higher importance toward the energy utilization of each of the sensor nodes.
K. Arun Kumar (B)·K. Jayaraman
SITE, VIT, Vellore, Tamil Nadu, India
e-mail: arun.aitsr@gmail.com
K. Jayaraman
e-mail: karthikeyan.jk@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_33
352 K. Arun Kumar and K. Jayaraman
From [1], it can be determined that the flat network protocol [2] lessens the network throughput while increasing the network lifetime. In hierarchical protocols, the system's computational intricacy rises to a higher level. This can be mitigated by introducing geographical-information-based protocols, which need a further setup such as Global Positioning System (GPS) devices mounted on the sensor nodes.
Assuring the data delivery during the failure of the nodes and interference in the
communication is the major challenge associated with routing in sensor networks. In
WSN, the most commonly used design principles are clustering and data aggregation
[3]. Clustering in WSN is very common and the main aim of clustering is to lessen
the number of inter-cluster links and even minimize the network traffic. Further,
clustering in WSN helps in attaining network stability. Unlike the other traditional
protocols, the data aggregation protocols [1,3,4] concentrated on the communication
overhead and reacted to the network's response. Better data-gathering networks are attained with data aggregation protocols because their receiver nodes wait to receive data from the nearest node instead of forwarding it to other nodes as soon as it arrives. The data aggregation method eliminates redundant data from various sources. With this, the bandwidth and power consumption
are preserved. In [5, 6], the main innovation of the PLAPS protocols is the establishment of node-to-node connectivity, which enables successful data transmission. Utilizing the corresponding functional reputations, the required data is added in PLAPS, and credible data aggregation processes are applied as well. The sender node directs the packet toward its dominator node to
initiate the communication among the nodes that belong to various clusters. Once
the receiver node is discovered, the data is forwarded to its subsequent nodes by
the dominator node. Upon receiving the data, the packets are propagated by the
terminal nodes to neighboring cluster nodes within their range. Relay nodes operate
as boosters when surrounding cluster nodes are out of range of the terminal node.
With this, the neighboring cluster nodes are taken into the range of transmission.
The major assumptions in the projected model are as follows:
- Sensor nodes are distributed unevenly in predefined locations.
- The base station (BS) is located outside the network.
- Static clusters can be seen in the sensor field.
- Each sensor node and cluster head has its own identification.
- After deployment, sensors cannot be moved.
- All the nodes possess equal communication and computation capabilities; therefore, the network is homogeneous.
- Every sensor node can work as an information-serving node.
- All nodes have the capacity to become cluster heads.
When the receiver is not in range and a data packet is received, the subsequent routing of data packets from one node to another is shown in Fig. 33.1, and Fig. 33.2 shows how this procedure is utilized to avoid overuse of smart nodes for communication and to ensure network reliability.
33 EDAARP-Efficient and Data-Aggregative Authentic Routing 353
Fig. 33.1 Relay node
communication
Fig. 33.2 Maximum
number of independent sets
33.2 Efficient Data-Aggregative Authentic Routing
Protocol (EDAARP)
Clustering and aggregation techniques can be merged by using the suggested algorithm, which allows the creation of an ideal routing hierarchy with the maximum number of constructive paths (Misra and Thomasinous [3]) that associates all sensor nodes with the base station (BS) while maximizing the use of WSN resources. In wireless sensor networks, the suggested protocol is known as the Efficient Data-Aggregative Authentic Routing Protocol (EDAARP). A simple framework of relay nodes and healing is represented in Figs. 33.3, 33.4 and 33.5.
Fig. 33.3 Relay node for
MCDS
Fig. 33.4 Refined PLAPS
Fig. 33.5 Healing
The purpose of this scheme’s design is to provide continuous clustering with
precise data aggregation and suitable additional communication. This ensures that
information is kept private [7,8]. The EDAARP algorithm is divided into four distinct
parts or modules. In module 1, the deployed nodes use their energy levels and even
their neighborhood information to form diverse clusters. In module 1, for effective
data forwarding, a cluster structure is formed and this structure is formed using the
SIS algorithm. The main focus of module 2 is on data management [9]. However,
PDG scheme (Presumptive Data Gathering) [8,9] provides accurate data aggregated
results. This helps in maximizing throughput and reducing the consumption of energy.
However, module 3 concentrates on network connectivity. PLAPS [5] not only offers
the utmost network connectivity for efficient data routing, but it also ensures the
saving of the data packets. At last, module 4 helps in setting a reliable route among the
sensor nodes, the base station (BS) and the sink node. This ensures improved security features that involve public-key cryptosystems [2]. For better resource management in WSN, SIS is used, which is an efficient clustering approach. In SIS, every node associates with its adjacent node to form a network. The dominant node and the member nodes maintain the SIS and sub-clusters, respectively. Utilizing the SIS scheme helps in organizing the networks and ensuring communication among the nodes. Thus, clusters are established successfully and cluster node management is also ensured. Algorithm 2 describes the steps involved in the EDAARP path discovery method.
The main aim of the PDG protocol [4, 5] is to provide an effortless approach to routing as well as management [10, 11]. This further helps in reducing the routing table size [11]. In this protocol, very little bandwidth is utilized, as the mutual transfer of messages among the nodes is minimal. The sensor topology is considered to have low maintenance and ease of use. The credibility value of each sensor node is considered in the PDG protocol during aggregation [9] for identifying data redundancy. This helps in increasing the aggregation reliability [10–12]. Meaningful data is provided to the near end by the supporting nodes. Network integrity is an important feature of this protocol, and it is ensured by sending random information to the sink nodes or the nearest end nodes instead of the actual value. Within a cluster, a compromised or supporting node is identified using the presumptive data aggregation process: the values of non-supporting nodes deviate more than those of the supporting nodes. The presumptive value is derived by multiplying the aging factor by the old reliability value, together with the counts of supporting and non-supporting nodes. With the PDG implementation, the compromised nodes in the network can be easily detected, and the accuracy of data aggregation [8, 9] can be attained. Algorithms 1 and 2 present the steps involved in PDG and in forwarding the data packets of EDAARP, respectively.
Algorithm 1. Steps for Presumptive Data Gathering
Input: Data Gathering (Xi), presumptive value of sensor node (Vi),
Output: Aggregated Data
Step 1: The aggregator requests each member node Ni for its credibility value Vi.
Step 2: Each sensor node Ni reports its presumptive value Vi to the aggregator.
Step 3: The aggregator uses the received presumptive values to determine which nodes are supporting and which are not.
Step 4: The aggregator collects data from the supporting nodes and generates Dagg.
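A toy sketch of Algorithm 1 under stated assumptions: the credibility cut-off and the use of a plain mean for Dagg are illustrative choices made here, since the text does not fix either:

```python
def presumptive_aggregate(readings, credibility, min_credibility=0.5):
    """Sketch of Presumptive Data Gathering at the aggregator.

    `readings` maps node id -> sensed value; `credibility` maps node id -> its
    presumptive value V_i. Nodes below `min_credibility` (an assumed cut-off)
    are treated as non-supporting and excluded from the aggregate D_agg.
    """
    supporting = [n for n in readings
                  if credibility.get(n, 0.0) >= min_credibility]
    if not supporting:
        return None
    # D_agg: plain average over supporting nodes (illustrative choice)
    return sum(readings[n] for n in supporting) / len(supporting)
```

A node reporting a strongly deviating value would, over time, see its credibility drop below the cut-off and stop influencing Dagg, matching the compromised-node detection described above.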
Algorithm 2: EDAARP: A Step-By-Step Procedure
Input: Sensor nodes, neighbor-information-based clustering, relay node, dominator node, connector node, and each node's actions.
Output: Data is delivered to the base station after it has been processed.
Begin
  List the packets based on their length and type.
  if the packet is in good condition
    SIS is used to find a neighbor node.
    if the packet arrived at the aggregator node via a neighbor node
      the neighbor node performs the aggregation.
    else
      using multi-hop communication, try to reach the aggregator node via multiple neighbor nodes.
    end
    if a packet is relayed from one aggregator node to another aggregator node
      connect to the other aggregator node via a relay node.
    end
    if the connectivity is at its finest
      use the processor and aggregator nodes to send data to the base station.
    else
      establish a connection to forward data using the relay node, connector node, and dominator node.
    end
    Path is generated; packet sent.
  end
End
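The branching in Algorithm 2 can be mirrored by a small decision function; the boolean inputs and route labels below are illustrative abstractions of the network-state checks, not part of the protocol specification:

```python
def forward_packet(packet_ok, neighbor_is_aggregator, connectivity_fine):
    """Toy mirror of the EDAARP forwarding decisions in Algorithm 2.

    Returns the route chosen for a packet; each boolean abstracts one of the
    checks the pseudocode performs on the current network state.
    """
    if not packet_ok:
        return "drop"                      # malformed packets go no further
    if neighbor_is_aggregator:
        route = "direct-to-aggregator"     # neighbor performs the aggregation
    else:
        route = "multi-hop-to-aggregator"  # reach it via multiple neighbors
    if connectivity_fine:
        return route + "/processor-aggregator-to-BS"
    # degraded connectivity: fall back to relay, connector and dominator nodes
    return route + "/relay-connector-dominator-to-BS"
```

The fallback branch corresponds to the relay-node "healing" of Figs. 33.3-33.5, which restores a path when direct connectivity to the base station is unavailable.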
33.3 Simulation Study and Performance Analysis
The NS2 simulator [13] is used to simulate EDAARP. This analysis helps in evaluating the performance of the EDAARP approach; various aspects of EDAARP are compared with InFRA, DRINA, and SPT [4, 10–12]. The analysis also measured the packet delivery ratio, lifetime, and throughput. The simulations are carried out using various node topologies, and the results observed in this research are presented in Figs. 33.6, 33.7, 33.8 and 33.9.
33.4 Conclusion
WSNs operate in a difficult and sensitive environment. As a result, they are frequently prone to network connection damage: node disconnectivity in a certain region splits the network into separate segments. In this research, the relay
Fig. 33.6 Lifetime versus
network size
Fig. 33.7 Packets processed
versus loss
Fig. 33.8 Throughput
versus network size
Fig. 33.9 Packet delivery
versus loss probability
nodes and the MCDS ensured optimum network connectivity. EDAARP was also proposed and implemented in the NS2 simulator, where it effectively established a route between the source node and the BS. With the PDG methodology, an energy-efficient architecture is designed; utilizing the PLAPS mechanism and the relay nodes, maximum network connectivity is established. As part of the performance analysis, assessment factors such as the packet delivery ratio, throughput, and lifetime are studied, and these EDAARP factors are compared and contrasted with those of DRINA, InFRA, and SPT.
References
1. Ozdemir, S. (2007). Secure and reliable data aggregation for wireless sensor networks. In H.
Ichikawa et al. (Eds.), LNCS (Vol. 4836, pp. 102–109).
2. Ozdemir, S. (2008). Secure data aggregation in wireless sensor networks via homomorphic encryption. Journal of the Faculty of Engineering and Architecture of Gazi University, 23(2), 365–373. ISSN: 1304-4915.
3. Misra, S., & Thomasinous, P. D. (2010). A simple, least-time, and energy-efficient routing
protocol with one-level data aggregation for wireless sensor networks. The Journal of Systems
and Software, 83, 852–860.
4. Villas, L. A., Boukerche, A., Ramos, H. S., de Oliveira, H. A. B. F., de Araujo, R. B., &
Loureiro, A. A. F. (2013). DRINA: A lightweight and reliable routing approach for in-network
aggregation in wireless sensor networks. IEEE Transactions on Computers, 62(4).
5. Thai, M. T., Wang, F., Zhu, S., & Zhu, S. (2007). Connected dominating sets in wireless
networks with different transmission ranges. IEEE Transactions on Mobile Computing, 6(7).
6. Rai, M., Verma, S., & Tapaswi, S. (2009). A power-aware minimum connected dominating set
for wireless sensor networks. Journal of Networks, 4(6).
7. Ji, S., He, J. S., Pana, Y., & Li, Y. (2013). Continuous data aggregation capacity in probabilistic
wireless sensor networks. Journal of Parallel and Distributed Computing, 73, 729–745.
8. Mantri, D. S., Prasad, N. R., & Prasad, R. (2014). Bandwidth efficient cluster-based data
aggregation for wireless sensor network. Computers and Electrical Engineering.
9. Rout, R. R., & Ghosh, S. K. (2014). Adaptive data aggregation and energy efficiency using
network coding in a clustered wireless sensor network: An analytical approach. Computer
Communications, 40, 65–75.
10. Amgoth, T., & Jana, P. K. (2014). Energy-aware routing algorithm for wireless sensor networks.
Computers and Electrical Engineering.
11. Zin, S. M., Anuar, N. B., Kiah, M. L. M., & Pathan, A.-S. K. (2014). Routing protocol design for secure WSN: Review and open research issues. Journal of Network and Computer Applications, 41, 517–530.
12. Wu, X., Yan, X., Huang, W., Shen, H., & Li, M. (2013). An efficient compressive data gath-
ering routing scheme for large-scale wireless sensor networks. Computers and Electrical
Engineering, 39, 1935–1946.
13. The NS-2 simulator. http://www.isi.edu/nsman/ns2
Chapter 34
Mobile-Based Selfie Sign Language
Recognition System (SSLRS) Using
Statistical Features and ANN Classifier
G. Anantha Rao, K. Syamala, and T. V. S. Divakar
Abstract This work brings a mobile-based sign language recognition system into real time. Selfie sign videos are captured with the smartphone front camera. Morphological gradients along with Sobel edge operators are used to extract the hand contour from each sign video frame. The discrete cosine transform (DCT) of the hand contour is optimized by principal component analysis (PCA) to reduce the execution time. Four statistical features, namely the mean, skewness, standard deviation, and kurtosis, are calculated for the optimized hand contour DCT. The feature vector formed with these four statistical features is used for sign classification using an artificial neural network (ANN) classifier. The performance of the SSLRS is evaluated with the word matching score (WMS).
34.1 Introduction
International health organizations estimate that 5% of the total world population is hearing impaired. The hearing impaired cannot communicate with others using acoustic words; instead, they can use sign language. Signs are formed with hand movements and facial expressions and are either static or dynamic: a static sign is performed with a single movement of the hand, whereas a dynamic sign is performed with more than one movement of the hands [1].
In our previous papers, we proposed an SSLRS with the hand contour DCT optimized by PCA as the feature vector and different sign classifiers such as distance classifiers, the Adaboost classifier, and ANN. In this paper, the SSLRS is proposed with statistical features and an ANN classifier. The hand contours of the signer in each frame of the sign videos are obtained and represented with an energy-compact representation
G. A. Rao (B) · T. V. S. Divakar
Department of ECE, GMRIT, Rajam, India
e-mail: anantharao.g@gmrit.edu.in
K. Syamala
Department of ECE, Avanthi Institute of Engineering and Technology, Cherukupally, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_34
362 G. A. Rao et al.
using DCT. The hand contour DCT is treated with PCA, and four statistical features of the PCA-treated DCT are generated. The statistical feature vector (4 × 1) is used for sign classification using ANN.
This paper presents the latest literature relevant to SLR in Sect. 34.2, the mathematical models to extract the feature vector and sign classification with ANN in Sect. 34.3, the results and performance of the proposed SSLRS in Sect. 34.4, and the conclusion in Sect. 34.5.
34.2 Literature Review
Video segmentation methods using wavelets are proposed to detect hand and head
shape and positions [2]. Tanibata et al. [3] proposed gesture features with orienta-
tion, area, flatness of hand portion. Parul et al. [4] presented features with height,
centroid of hand portion, and distance of centroid from origin of the frame. Rao
et al. proposed SSLR with compact energy features and ANN classifiers [5] and also tested the performance of SSLR with shape energy features [6]. Better classification rates were achieved with compact energy features for Indian sign language
with linear discriminant analysis (LDA) for American sign language. The optimized
features using LDA were used for sign classification using KNN and SVM. The static
signs and finger spells were categorized as manual, while face expressions were classified as the non-manual category [8]. Lee et al. [9] proposed techniques for capturing the hand features using Kinect device sensors. SVM is used for training and classification of signs based on hand direction, position, and shape. The achieved results
were reasonably good in statistics. Holden et al. [10] presented hidden Markov model
(HMM) and SIFT features along with signs speed. Words and sentences were clas-
sified with accuracy 99% and 97%, respectively. Body and the hand position were
used for language recognition with an independent signer [11]. The smallest successive frames with no overlap are chosen to avoid overlapping of signs. HMM is used for sign classification. The performance of recurrent neural networks (partially and
fully connected) was evaluated for Arabic sign language recognition. Fully connected
networks showed better accuracy [12]. Discrete wavelet transform (DWT)-based features were extracted and fed to an ANN for sign classification; the authors used a database of 32 signs with 640 images. Khan et al. [13] proposed DWT features and a backpropagation ANN classifier. Global features containing region and boundary
information were used for sign recognition [14]. The seven Hu moments are used
for region information, while Fourier descriptors are used for boundary informa-
tion. Classification is done with SVM sign classifier. Kausar et al. [15] proposed
fuzzy-based sign recognition system for which the joint positions and finger tips are
extracted through the color gloves.
34 Mobile-Based Selfie Sign Language Recognition System 363
34.3 Proposed SSLR
The signs performed with one hand are captured using smart phone front camera by
holding the smart phone with selfie stick on the other hand. A sign video database
of 18 Indian words by 10 different signers is created and maintained for SSLRS.
34.3.1 Feature Extraction
Figure 34.1 elaborates the flowchart of the SSLR system. The capture noise is removed using zero-mean Gaussian filters with probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$

where $m$ is the mean and $\sigma$ is the standard deviation. Gaussian filters with different $\sigma$ values (0.01, 0.1, and 0.15) are used for noise removal.
The information regarding the data change in every frame, in the direction of maximum change, is obtained by applying the Sobel edge operator (a 2D gradient). For every frame, the gradients $g_x$ and $g_y$ are calculated in the $x$ and $y$ directions:

$$g_x = \sum_{m=1}^{N} f(x_m, y) * g(k), \qquad g_y = \sum_{m=1}^{N} f(x_m, y) * g^{T}(k)$$

where the gradient operator $g$ is given by $[+1, -1]$.
Fig. 34.1 SSLR flowchart
The block thresholding is used to generate the binary image, for which the 2D Sobel convolution masks are used. The block-thresholded binary image is given by

$$B_X = \frac{\sum_{x=1}^{M}\left[(S_x * f_x)^2 + (S_y * f_y)^2\right]}{\sum_{i=1}^{s}\sum_{x=1}^{M}\left[(S_x * f_x)^2 + (S_y * f_y)^2\right]}$$

In the above equation, $S_x$ and $S_y$ are the convolution masks given by

$$S_x = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}, \qquad S_y = S_x^{T}$$

and $s$ is the block size.
Background variations are reduced automatically with the block variational thresholding. Figures 34.2 and 34.3 show the block-threshold and global-threshold (with threshold 0.2) binary images.
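A sketch of the gradient computation with the Sobel masks above (using SciPy's convolution; border handling is an illustrative choice):

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_magnitude(frame):
    """Per-pixel gradient magnitude of a 2-D frame using the Sobel masks."""
    sx = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)   # responds to horizontal edges
    sy = sx.T                                    # responds to vertical edges
    gx = convolve(frame.astype(float), sx, mode="nearest")
    gy = convolve(frame.astype(float), sy, mode="nearest")
    return np.sqrt(gx ** 2 + gy ** 2)            # combined edge strength
```

Thresholding this magnitude image per block, rather than with one global value, yields the block-threshold binary image that is robust to background variation.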
The hand and head contours are generated from the binary image by morphological gradients with connected component analysis, as given below:

$$h_C(x) = \{ z \mid (M_{3H})_z \cap B_X \neq \varnothing \} \setminus \{ z \mid (M_{3H})_z \subseteq B_X \}$$
$$h_C(y) = \{ z \mid (M_{3V})_z \cap B_X \neq \varnothing \} \setminus \{ z \mid (M_{3V})_z \subseteq B_X \}$$
$$h_C(x, y) = h_C(x) \cup h_C(y)$$
Fig. 34.2 Block threshold
binary image
Fig. 34.3 Global threshold
binary image
where $M_{3V}$ and $M_{3H}$ are the vertical and horizontal line masks. The hand and head contours $h_C(x, y)_{hand}$ and $h_C(x, y)_{head}$ are separated using a four-neighborhood pixel operation, as shown in Figs. 34.4, 34.5, 34.6, 34.7 and 34.8.
The DCT of the hand contour $h_C(x, y)_{hand}$ is given by

$$F_{uv} = \frac{1}{4}\, C_u C_v \sum_{x=1}^{M} \sum_{y=1}^{M} h_C(x, y)_{hand} \cos\!\left(\frac{u\pi(2x+1)}{2L}\right) \cos\!\left(\frac{v\pi(2y+1)}{2L}\right)$$

where $C_u = C_v = \frac{1}{\sqrt{2}}$ for $(u, v) = 0$, and $1$ elsewhere.
Figure 34.9 shows the color-coded DCT of the hand contour. The maximum energy of a video frame is concentrated in the first 50 × 50 matrix because of the hand movement. In the proposed work, the hand portion is segmented, the contour of the hand shape is obtained, and its DCT is calculated.
Fig. 34.4 Frame no. 74
Fig. 34.5 Binary image
Fig. 34.6 Head and hand
contours
Fig. 34.7 Hand contour
Fig. 34.8 Head contour
The first 50 × 50 samples are considered for sign classification. The execution time for these 2500 samples is high; to minimize it, the 50 × 50 matrix is treated with PCA to obtain a 50 × 1 vector [16]. The four statistical features, mean, skewness, standard deviation, and kurtosis, are calculated for the 50 × 1 vector, generating a 4 × 1 feature vector which is fed to the ANN for sign classification.
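The DCT-then-reduce pipeline can be sketched as below. The paper does not specify how PCA produces the 50 × 1 vector, so projecting the 50 × 50 block onto its first principal component is an assumption made here for illustration:

```python
import numpy as np
from scipy.fft import dct

def contour_features(contour_img, keep=50):
    """2-D DCT of a hand-contour image, cropped to the energy-compact
    top-left keep x keep block, then reduced to a keep x 1 vector."""
    c = contour_img.astype(float)
    # separable 2-D DCT-II with orthonormal scaling
    coeffs = dct(dct(c, axis=0, norm="ortho"), axis=1, norm="ortho")
    block = coeffs[:keep, :keep]                  # low-frequency corner
    # PCA stand-in: project rows onto the dominant principal direction
    centered = block - block.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return block @ vt[0]                          # keep x 1 reduced vector
```

The resulting 50 × 1 vector is what the four statistical features below are computed from.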
Fig. 34.9 Color-coded hand contour energy (DCT) of different frames
34.3.1.1 Mean
The mean of the 50 × 1 vector is calculated as

$$\mu = \frac{1}{50}\sum_{k=1}^{50} v_k$$

where $v_k$ is the $k$th sample in the 50 × 1 vector.
34.3.1.2 Standard Deviation
The standard deviation of the 50 × 1 vector is calculated as

$$\sigma = \sqrt{\frac{1}{50}\sum_{k=1}^{50} (v_k - \mu)^2}$$

where $v_k$ is the $k$th sample in the 50 × 1 vector and $\mu$ is its mean.
34.3.1.3 Skewness
The skewness of the 50 × 1 vector is calculated as

$$S = \frac{1}{50}\sum_{k=1}^{50} (v_k - \mu)^3.$$
34.3.1.4 Kurtosis
The kurtosis of the 50 ×1 vector is calculated as given in the following
K=50
k=1(vkμ)4
50 .
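Taken together, the four statistics of Sects. 34.3.1.1–34.3.1.4 map the 50 × 1 vector to the 4 × 1 feature vector. A direct NumPy sketch (note that skewness and kurtosis follow the unnormalized central-moment forms given above, without division by powers of σ; the sample vector is illustrative):

```python
import numpy as np

def statistical_features(v):
    """4 x 1 feature vector [mean, std, skewness, kurtosis] of a 1-D vector,
    using the central-moment forms of Sects. 34.3.1.1-34.3.1.4."""
    n = v.size
    mu = v.sum() / n
    sigma = np.sqrt(((v - mu) ** 2).sum() / n)
    skew = ((v - mu) ** 3).sum() / n        # third central moment
    kurt = ((v - mu) ** 4).sum() / n        # fourth central moment
    return np.array([mu, sigma, skew, kurt])

features = statistical_features(np.array([1.0, 2.0, 3.0, 4.0]))
```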
34.3.2 Classification
In the proposed SSLR system, the single hidden layer and multi-hidden layer ANNs shown in Figs. 34.10 and 34.11 are used for sign classification. The extracted 4 × 1 feature vector for every frame of the sign video is applied as input to train and test the ANN with the activation function s(net) = 1/(1 + e^(−net)), i.e., the sigmoid. The performance of the SSLR system is verified with the sign recognition rates.
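A minimal NumPy sketch of the single-hidden-layer classifier of Fig. 34.10 with the sigmoid activation: the 78-neuron hidden layer and 18 output classes follow the chapter, while the random weights stand in for trained ones.

```python
import numpy as np

def sigmoid(net):
    """s(net) = 1 / (1 + e^(-net))"""
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W1, b1, W2, b2):
    """Single-hidden-layer forward pass as in Fig. 34.10."""
    h = sigmoid(W1 @ x + b1)       # hidden layer (78 neurons)
    return sigmoid(W2 @ h + b2)    # one sigmoid score per sign class

rng = np.random.default_rng(1)
x = rng.normal(size=4)                         # the 4 x 1 feature vector
W1, b1 = 0.1 * rng.normal(size=(78, 4)), np.zeros(78)
W2, b2 = 0.1 * rng.normal(size=(18, 78)), np.zeros(18)
scores = forward(x, W1, b1, W2, b2)            # 18 sign classes
```

The multi-hidden-layer variant of Fig. 34.11 simply chains additional sigmoid layers between `h` and the output.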
Fig. 34.10 Single hidden
layer ANN model for sign
classification
Fig. 34.11 Multiple hidden layer ANN model for sign classification
The signs performed with one hand are captured using the smartphone front camera by holding the smartphone with a selfie stick in the other hand. A sign video database of 18 Indian words by 10 different signers is created and maintained for the SSLRS.
34.4 Result Analysis
The database of selfie videos is created for “hai, good, morning, I, am, D, H, R, U, V,
A, nice, to, see, you, thank, you, bye." These eighteen sign words are kept in sequence for training and in a different order for testing. Figure 34.12 shows the sample frames, Fig. 34.13 shows the hand and head portions, and Fig. 34.14 shows the hand and head contours. The segmented hand portion and its contour are shown in Figs. 34.15 and 34.16.
In the next section, the performance of SSLR system is evaluated with WMS.
Fig. 34.12 RGB frames
Fig. 34.13 Hand and head portions
Fig. 34.14 Hand, head contours
Fig. 34.15 Hand segment
Fig. 34.16 Hand contour
34.4.1 Performance Evaluation: Word Matching Score
(WMS)
The DCT of the hand contour is generated, in which the first 50 ×50 samples
are considered for sign classification. These 2500 samples are treated with PCA to
generate a 50 ×1 sample vector. The statistical parameters mean, skewness, standard
deviation, and kurtosis are calculated to extract the feature vector of size 4 ×1. The
mean, skewness, standard deviation, and kurtosis of the 50 ×1 vector for the 60th
and 90th frames of 3rd sign (morning) are shown in Table 34.1.
The word matching score is defined as the ratio of the number of correct classifications to the total number of classifications:

%M = (Number of correct classifications / Total number of classifications) × 100.
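The score is straightforward to compute; a minimal sketch with illustrative labels:

```python
def word_matching_score(predicted, actual):
    """WMS as a percentage of correctly classified signs."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

wms = word_matching_score([1, 2, 3, 3], [1, 2, 3, 4])   # 3 of 4 correct
```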
Table 34.1 Statistical features of 60th and 90th frames of 3rd sign (morning)

60th frame
50 × 1 vector: [0.6135, 0.6135, 1.1269, 1.1269, 1.1114, 1.1146, 0.3621, 0.2183, 1.4986, 1.5882, 2.0728, 1.7498, 1.8040, 1.4834, 2.1938, 1.0954, 0.3230, 2.0883, 0.6633, 1.8864, 0.4939, 0.9714, 1.5619, 1.2418, 1.515, 1.7857, 0.5680, 0.9373, 1.5394, 1.4593, 1.9025, 0.1703, 0.8057, 0.5604, 1.2177, 0.7542, 0.2496, 1.4241, 1.6085, 0.1429, 1.8646, 1.6085, 0.9456, 1.0082, 0.1630, 0.7175, 1.3761, 1.7430, 0.2666, 0.2375]
Mean: 0.1195; Standard deviation: 1.2664; Skewness: 0.0553; Kurtosis: 1.7005

90th frame
50 × 1 vector: [1.8493, 1.8493, 1.3810, 1.3810, 1.2292, 1.3162, 1.7321, 2.5969, 2.6743, 2.1905, 2.1356, 2.8661, 2.4568, 0.8828, 2.1926, 1.5707, 0.7369, 0.9115, 1.2881, 0.03119, 0.9812, 2.64401, 0.08503, 0.8926, 0.8662, 0.5935, 1.8733, 1.2805, 2.0614, 1.7324, 2.4475, 2.1250, 0.5854, 1.0626, 0.9244, 1.8336, 0.0101, 0.4198, 0.5409, 1.1290, 1.1319, 0.2586, 0.1026, 1.8913, 1.2702, 0.6647, 0.5898, 1.1216, 0.4245, 0.7854]
Mean: 0.2043; Standard deviation: 1.5186; Skewness: 0.1510; Kurtosis: 1.9557
Table 34.2 WMS with single hidden layer (78 neurons) ANN trained with 10 sets each with 18 signs and tested with 6 sets each with 18 signs

Training: 10 sets, each with 18 signs. Testing: 6 sets, each with 18 signs. Network architecture and output confusion matrix: given as figures. WMS: 85.5%.
18 signs of 10 different signers (180 signs) are used to evaluate the performance of the SSLR system. The average WMS achieved is 85.5% when a single hidden layer ANN is trained with 10 sets, each with 18 signs, and tested with 6 sets, each with 18 signs, as shown in Table 34.2. The average WMS achieved is 91% when a multiple (4) hidden layer ANN is trained with 10 sets, each with 18 signs, and tested with 6 sets, each with 18 signs, as shown in Table 34.3. The average WMS achieved is 86.4% when a single hidden layer ANN is trained with 15 sets, each with 18 signs, and tested with 10 sets, each with 18 signs, as shown in Table 34.4. The average WMS achieved is 96.9% when a multiple (4) hidden layer ANN is trained with 15 sets, each with 18 signs, and tested with 10 sets, each with 18 signs, as shown in Table 34.5. It is observed that the recognition time with a single hidden layer ANN having 78 neurons is 0.272 s for a total of 72 epochs, and with a multi-hidden (4) layer ANN having 78 neurons per layer it is 0.872 s for a total of 42 epochs. It is also observed that the recognition time with a single hidden layer ANN having 150 neurons is 0.389 s for a total of 61 epochs, and with a multi-hidden (4) layer ANN having 150 neurons per layer it is 1.543 s for a total of 28 epochs.
34.5 Conclusion and Future Work
A video database of 18 Indian signs for 10 different signers is created to simulate and test the performance of the SLR system using a mobile phone. The hand contour of the signer is extracted from every frame of the sign video. The DCT of the hand contour is calculated and treated with PCA to obtain the optimized feature vector. The statistical features are calculated for the PCA-treated feature vector, which is input to the ANN for sign classification. The performance of the SSLR system is evaluated with WMS. The word matching score with a single hidden layer (78 neurons) is about 85%, which is improved with the multiple hidden
Table 34.3 WMS with multiple (4) hidden layer (each with 78 neurons) ANN trained with 10 sets each with 18 signs and tested with 6 sets each with 18 signs

Training: 10 sets, each with 18 signs. Testing: 6 sets, each with 18 signs. Network architecture and output confusion matrix: given as figures. WMS: 91%.
Table 34.4 WMS with single hidden layer (150 neurons) ANN trained with 15 sets each with 18 signs and tested with 10 sets each with 18 signs

Training: 15 sets, each with 18 signs. Testing: 10 sets, each with 18 signs. Network architecture and output confusion matrix: given as figures. WMS: 86.4%.
Table 34.5 WMS with multiple (4) hidden layer (each with 78 neurons) ANN trained with 15 sets each with 18 signs and tested with 10 sets each with 18 signs

Training: 15 sets, each with 18 signs. Testing: 10 sets, each with 18 signs. Network architecture and output confusion matrix: given as figures. WMS: 96.9%.
layers or with an increased number of neurons. The feature models and sign classifiers need to be improved in future work.
References
1. Anantha Rao, G., & Kishore, P. V. V. (2016, October). Sign Language recognition system
simulated for video captured with smart phone front camera. International Journal of Electrical
and Computer Engineering (IJECE),6(5), 2176–2187.
2. Kishore, P. V. V., & Rajesh Kumar, P. (2012). A video based Indian sign language recognition
system (INSLR) using wavelet transform and fuzzy logic. In IACSIT international journal of
engineering and technology (Vol. 4, No. 5, pp. 537–542).
3. Tanibata, N., Shimada, N., & Shirai, Y. (2002). Extraction of hand features for recognition of
sign language words. In Proceedings of the international conference on vision interface.
4. Parul, H. (2014). Neural network based static sign gesture recognition system. International
Journal of Innovative Research in Computer And Communication Engineering (Ijircce), 2(2),
3066–3072.
5. Rao, G. A., & Kishore, P. V. V. (2018). Selfie video based continuous Indian sign language
recognition system. Ain Shams Engineering Journal, 9(4), 1929–1939.
6. Rao G. A., Syamala, K., & Divakar, T. V. S. (2020). Selfie sign language recognition with
shape energy features and ANN classifier. International Journal of Science Technology and
Research, 9(04), 1936–1939.
7. Tharwat, A., Gaber, T., Hassanien, A. E., Shahin, M. K., & Refaat, B. (2015). SIFT-based
Arabic sign language recognition system. In Proceedings of the Afro-European conference for
industrial advancement (pp. 359–370). Springer Cham.
8. Sulman, D. N., & Zuberi, S. (2000). Pakistan sign language—a synopsis. Pakistan, Jun 2000.
[Online]. Available: www.academia.edu/2708088/
9. Lee, G. C., Yeh, F.-H., & Hsiao, Y.-H. (2016). Kinect-based Taiwanese sign- language
recognition system. Multimedia Tools and Applications, 75(1), 261–279.
10. Holden, E.-J., Lee, G., & Owens, R. (2005). Australian sign language recognition. Machine
Vision and Applications, 16(5), 312.
11. Zieren, J., & Kraiss, K.-F. (2005). Robust person-independent visual sign language recognition.
In Iberian conference on pattern recognition and image analysis 2005 (pp. 335–355).
12. Maraqa, M., & Abu-Zaiter, R. (2008). Recognition of Arabic sign language (ArSL) using recur-
rent neural networks. In Proceedings of the 1st international conference on the applications of
digital information and web technologies (ICADIWT), Aug 2008 (pp. 478–481).
13. Khan, N., Shahzada, A., Ata, S., Abid, A., Khan, Y., & Shoaib Farooq, M. (2014). A vision based approach for Pakistan sign language alphabets recognition. Pensee, 76(3), 274–285.
14. Ahmed, H., Gilani, S. O., Jamil, M., Ayaz, Y., & Shah, S. I. A. (2016). Monocular vision-based
signer-independent Pakistani sign language recognition system using supervised learning.
Indian Journal of Science and Technology, 9(25), 1–16.
15. Kausar, S., Javed, M. Y., & Sohail, S. (2008). Recognition of gestures in Pakistani sign language
using fuzzy classifier. In Proceedings of the 8th conference on signal processing,computational
geometry and artificial vision (pp. 101–105).
16. Anil Kumar, D., Kishore, P. V. V., Sastry, A. S. C. S., & Reddy Gurunatha Swamy, P. (2016).
Selfie continuous sign language recognition using neural network. In 2016 IEEE annual India
conference (INDICON).
Chapter 35
An Effective Model for Malware
Detection
V. Valli Kumari and Shaik Jani
Abstract Malware detection and identification are important to protect an organization's data and enable end-to-end monitoring of resources accessible by multiple users through the Internet. Malicious users and intruders usually try various methods to gain unauthorized access to data from remote locations. This paper proposes a model that helps in finding malware characteristics by extracting features from the data provided. The model is also tested on unknown malware files generated using various available tools. This paper discusses the steps used in building an effective model, the Model for Malware Detection (MMD), using the EMBER dataset and Keras. The results obtained, with a model accuracy of 97.2%, are presented.
35.1 Introduction
Malware can be defined as a set of malicious files or programs and may take many
forms like Root kit, Spyware, Botnet, Trojan, Ransomware and gain unauthorized or
unprivileged access to files in victim systems or servers. A malware can affect devices
like desktops, laptops, mobile phones, health care devices, enterprise servers, clients
and network devices. Malware detection means finding the presence of malware in
a given host or detecting the malicious behavior of a program. Email, chat clients,
phone conversations, SMS messages, and even postal mail are used to communicate
with other systems or software.
Unskilled users execute the malicious code of attackers, allowing them to penetrate the network. Many antivirus products rely solely on signature-based methods and techniques, which often fail to identify malware whose signatures are not yet available. High-volume malware is evolving at a fast pace, and it is becoming increasingly difficult to evaluate the massive amounts of data generated by network transactions. In general, malware files of this type are recognized and identified using datasets derived from traffic data collected from registered and enterprise-based networks. To evade identification, malware authors use the most
V. V. Kumari (B) · S. Jani
Andhra University College of Engineering (A), Visakhapatnam, AP, India
e-mail: vallikumari@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_35
378 V. V. Kumari and S. Jani
complex and reliable obfuscation techniques, like code amalgamation and reordering of subroutines, which make identification of malware hard. Several predefined malware detection methods identify malware with a high false positive rate.
The work in this paper uses the Elastic Malware Benchmark for Empowering Researchers (EMBER) dataset, which contains over 1.1 million Portable Executable (PE) files [1]. To analyze this data, the Model for Malware Detection (MMD) is proposed, which extracts features and then classifies the malware. The MMD model gives 97.2% accuracy and helps in the detection and prediction of malware. The work
in this paper contributes the following: (a) using the EMBER-2018 dataset to extract the features and class labels, which are used to detect malware; (b) the model can be used for decision making through multimodal approaches; and (c) finally, the model is tested by providing new data or malware generated from different websites or created using msfvenom, feeding it to the proposed model to find whether it is malicious or not.
The next sections of the paper are as follows. The relevant literature is summarized in Sect. 35.2. Section 35.3 discusses the features of the dataset used for experimentation. Section 35.4 covers the Model for Malware Detection (MMD) proposed for the detection and prediction of malware. Section 35.5 explains the results, and Sect. 35.6 summarizes the contributions and presents points for further work on this topic.
35.2 Literature Survey
Much of the recently published research work is focused on building automatic malware detection mechanisms that use statistical methods rather than deterministic rules. Many works have also proposed machine learning-based approaches for cyber security-related problems. Many of these operations can be automated to detect cyber-attacks in real time and prevent damage. It was found in the survey that most researchers are interested in the detection of malware of different types.
A model is proposed in [2] using three datasets for training, testing and scaling up. The paper discusses the use of a cascade one-sided perceptron first and a cascade kernelized one-sided perceptron later to identify malware files and reduce false positives. The work in [3] considers building a decision model with data processing, decision making and malware detection to classify and detect any suspicious malware. The malware analysis system is ML-based and uses the shared nearest neighbor technique.
In [4], information was collected from 2510 APK files, of which 1260 were malicious apps. The paper proposed a machine learning-based framework for detecting malicious apps using information from API calls and permissions. In [5], Microsoft Office files related to malicious macros were analyzed using machine learning methods. In [6], data is collected from packets instead of port numbers and protocols, and automated malware detection was proposed using convolutional neural networks
35 An Effective Model for Malware Detection 379
and other machine learning methods. In [7], malware files are executed in the Cuckoo Sandbox and malware process behavior is traced to determine generated and injected processes; a recurrent neural network (RNN) is used for feature extraction, and later a CNN is trained to classify. In [8], a model based on a deep auto-encoder and CNN was proposed; the data was collected from 23,000 apps and processed for use in deep learning models to identify malware in Android apps. In [9], data was collected from VMs by running various malware such as rootkits and Trojan horses, and both 2D and 3D CNNs were used to improve detection accuracy. In [10], a binary file was classified as malicious or benign; the model was tested on a dataset with 11,130 binaries. In [11], data augmentation was used to generate variants of images obtained from malware samples. In [12], a novel ensemble CNN-based architecture was used for detection of both packed and unpacked malware. In [13], static and dynamic analysis tools were used to extract features from 7000 malware and 3000 benign files, and a classification model was built. In [14], a classification model is built by extracting requested permissions, vulnerable API calls, and an application's key information, such as dynamic code, reflection code, native code, cryptographic code, and database, from applications.
35.3 Data Preprocessing
The Elastic Malware Benchmark for Empowering Researchers (EMBER) dataset [1], which includes over one million Portable Executable (PE) files, is used. The PE file format [15] is given in Fig. 35.1. The dataset provides features and a repository useful for training and testing machine learning and deep learning models; it includes SHA-256 hashes and covers many attacks. The EMBER dataset is built using JSON. The work in this paper initially starts with data analysis and preprocessing. The steps involved in populating the variables used to build the model include: (i) creating a vectorized set of features, (ii) loading the train and test vectorized features, and (iii) scaling the dataset to zero mean and unit variance. The data, split into training and testing sets, is given as input to the MMD model.
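Step (iii) can be sketched as below; steps (i) and (ii) are provided by the ember package's own helpers [1], so only the scaling is shown, with random matrices standing in for the real 2381-feature vectors.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale both splits to zero mean and unit variance, using statistics
    computed on the training split only to avoid test-set leakage."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(1000, 2381))  # 2381 EMBER features
X_test = rng.normal(loc=5.0, scale=3.0, size=(200, 2381))
X_train_s, X_test_s = standardize(X_train, X_test)
```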
35.4 Model for Malware Detection (MMD)
35.4.1 Overview
In this section, the overall architecture of the MMD model is discussed. The model is based on a deep neural network. Initially, the features are extracted from the EMBER dataset [1] and the data is pre-processed. A set of 1,000,000 samples is divided into training and testing datasets (see Fig. 35.2). There are 2381 features in total. The labels are 0 and 1 for benign and malicious, with counts of 182,524 and 17,476, respectively.
Fig. 35.1 PE file format [15], CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=51026079
Fig. 35.2 Malware labels distribution in the dataset
Later, the MMD model is built to classify malware. Using this model, undetected attacks can be easily spotted, and malware can be detected at an early stage.
35.4.2 MMD Architecture
The proposed architecture helps in extracting features from the datasets and training
the neural network to build a model. Figure 35.3 shows the proposed MMD model.
Inputs are first pre-processed using the ember packages provided [1]. The set of layers in Fig. 35.3 was arrived at after several combinations of layers and fine tuning. The model has dense and dropout layers; each dropout rate was set to 0.25, and the Adam optimizer was used. As the pre-processed dataset has only two labels, 0 and 1, either sigmoid or softmax can be used in the last layer. Sigmoid is used in the output layer to obtain binary classification; softmax can be used depending on how the data is labeled into different classes in the dataset.
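A minimal NumPy sketch of one Dense → Dropout(0.25) → Dense(sigmoid) forward pass of the kind the MMD model stacks; the hidden width of 64 is an assumption for illustration, since the exact layer widths appear only in Fig. 35.4, and the random weights stand in for trained ones.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, W, b, relu=True):
    z = x @ W + b
    return np.maximum(z, 0.0) if relu else z

def dropout(x, rate=0.25, train=True):
    """Inverted dropout with the chapter's rate of 0.25."""
    if not train:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: 2381 EMBER features -> Dense + Dropout -> sigmoid output
x = rng.normal(size=(8, 2381))                       # batch of 8 samples
W1 = 0.01 * rng.normal(size=(2381, 64))              # hidden width 64 (illustrative)
h = dropout(dense(x, W1, np.zeros(64)))
W2 = 0.01 * rng.normal(size=(64, 1))
p = sigmoid(dense(h, W2, np.zeros(1), relu=False))   # P(malicious) per sample
```

At inference time dropout is disabled (`train=False`), matching the usual Keras behavior.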
35.5 Experimental Results
Figure 35.4 depicts the model layers, and Fig. 35.5 presents the validation accuracy in malware detection. 1,000,000 samples from the EMBER dataset are considered: 800,000 are used for training the model and 200,000 for testing. The output label depicts whether a sample is malicious or not. The accuracy obtained is 97.2%, as shown in Fig. 35.5, for the layers shown in Fig. 35.4. The precision, recall and F1 scores indicated the model quality to be high.
Fig. 35.3 MMD model
architecture
Fig. 35.4 MMD model layers
Fig. 35.5 Epoch-wise training and validation: a accuracy, b loss
35.6 Testing MMD Model with Unknown Malware
This section discusses the generation of a customized malware sample and testing whether the MMD model classifies it as malware. The Kali Linux OS includes the Metasploit Framework, whose msfvenom tool is used to create the malware. The malware file is pre-processed with PEFeatureExtractor() defined in the ember modules [1], and the pre-processed data is submitted to the MMD model for prediction; it was detected as malicious.
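The flow above can be sketched as a small pipeline; note that `fake_extractor` and `fake_model` below are toy stand-ins for ember's PEFeatureExtractor and the trained MMD network, so the example runs without either being installed.

```python
import numpy as np

THRESHOLD = 0.5

def classify_pe(bytez, extractor, model):
    """Extract features from raw PE bytes, score them with the trained
    model, and threshold the score into a verdict."""
    features = np.asarray(extractor(bytez), dtype=np.float32)
    score = float(model(features))
    return "malicious" if score > THRESHOLD else "benign"

# Toy stand-ins so the pipeline is runnable end to end
fake_extractor = lambda b: [len(b) % 7, sum(b) % 11]
fake_model = lambda f: 0.9 if f[0] > 3 else 0.1
```

In the real pipeline, `extractor` would wrap the ember feature extraction and `model` the trained Keras network's predict call.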
35.7 Conclusion
Malware can be classified using deep learning techniques. This paper discusses a deep learning model, MMD, for malware detection. The paper has extensively surveyed several works published in the literature. The MMD model was trained and tested with the EMBER dataset and was found to work with 97.2% accuracy, which is a good improvement over [1]. Later, the model was also tested with unknown malware generated through msfvenom, and it was correctly classified. In future, the model will be fine-tuned and optimized for better accuracy. Detection in real time is another area to be explored.
References
1. Anderson, H. S., & Roth, P. (2018). EMBER: An open dataset for training static PE malware
machine learning models.
2. Gavrilu¸t, D., Cimpoe¸su, M., Anton, D., & Ciortuz, L. (2009, October). Malware detection
using machine learning. In 2009 IEEE multiconference on computer science and information
technology (pp. 735–741). IEEE.
3. Liu, L., Wang, B. S., Yu, B., & Zhong, Q. X. (2017). Automatic malware classification and new
malware detection using machine learning. Frontiers of Information Technology & Electronic
Engineering, 18(9), 1336–1347.
4. Peiravian, N., & Zhu, X. (2013, November). Machine learning for android malware detection
using permission and api calls. In 2013 IEEE 25th international conference on tools with
artificial intelligence (pp. 300–305). IEEE.
5. Bearden, R., & Lo, D. C. T. (2017, December). Automated Microsoft office macro malware
detection using machine learning. In 2017 IEEE international conference on big data (Big
Data) (pp. 4448–4452). IEEE.
6. Yeo, M., Koo, Y., Yoon, Y., Hwang, T., Ryu, J., Song, J., & Park, C. (2018, January). Flow-
based malware detection using convolutional neural network. In 2018 International conference
on information networking (pp. 910–913).
7. Tobiyama, S., Yamaguchi, Y., Shimada, H., Ikuse, T., & Yagi, T. (2016, June). Malware detec-
tion with deep neural network using process behavior. In 2016 IEEE 40th annual computer
software and applications conference (COMPSAC) (Vol. 2, pp. 577–582). IEEE.
8. Wang, W., Zhao, M., & Wang, J. (2019). Effective android malware detection with a hybrid
model based on deep auto encoder and convolutional neural network. Journal of Ambient
Intelligence and Humanized Computing, 10(8), 3035–3043.
9. Abdelsalam, M., Krishnan, R., Huang, Y., & Sandhu, R. (2018, July). Malware detection in
cloud infrastructures using convolutional neural networks. In 2018 IEEE 11th international
conference on cloud computing (CLOUD) (pp. 162–169). IEEE.
10. Sharma, A., Malacaria, P., & Khouzani, M. H. R. (2019, June). Malware detection using 1-
dimensional convolutional neural networks. In 2019 IEEE European symposium on security
and privacy workshops (EuroS&PW) (pp. 247–256). IEEE.
11. Catak, F. O., Ahmed, J., Sahinbas, K., & Khand, Z. H. (2021). Data augmentation-based
malware detection using convolutional neural networks. PeerJ Computer Science, 7, e346.
12. Vasan, D., Alazab, M., Wassan, S., Safaei, B., & Zheng, Q. (2020). Image-based malware
classification using ensemble of CNN architectures (IMCEC). Computers & Security.
13. Jerlin, M. A., & Marimuthu, K. (2018). A new malwaredetection system using machine learning
techniques for API call sequences. Journal of Applied Security Research, 13(1), 45–62.
14. Koli, J. D. (2018, March). RanDroid: Android malware detection using random machine
learning classifiers. In 2018 Technologies for smart-city energy security and power (ICSESP)
(pp. 1–6). IEEE.
15. Wikipedia. https://commons.wikimedia.org/wiki/File:Portable_Executable_32_bit_Structure_in_SVG_fixed.svg
Chapter 36
An Efficient Approach to Retrieve
Information for Desktop Search Engine
S. A. Karthik, G. Lalitha, Y. Md. Riyazuddin, and R. Venkataramana
Abstract The Internet may not be the only source of information. Despite not remembering the relevant terms, individuals understand the need to access documents. Entity disambiguation is a technique for deciphering the text in processed documents and queries. In this work, an outline of a desktop search engine with entity disambiguation at its foundation is presented. Terms/entities are disambiguated using a Naive Bayes probabilistic model created to comprehend a keyword depending on the set it is part of, inspired by the probabilistic taxonomy Probase. There are three parts to our implementation: text extraction, conceptualization of obtained items, and index updating/matching. Experimental results obtained by implementing the methodology demonstrate the truthfulness of the approach.
36.1 Introduction
Search and retrieval have been the core features of any system that stores data as fact files of information. Many techniques have been developed to make search efficient on a personal computer. The most popular, File Explorer, is the default file browser in the Microsoft OS [1]. It indexes file details such as creation and modification dates, location, file type, file size, and even file contents. N-gram tokens are used as part of the index. While this is already convenient enough, we often encounter situations
S. A. Karthik (B)
BMS Institute of Technology and Management, Bengaluru, India
e-mail: karthiksa1990@gmail.com
G. Lalitha ·Y. Md. Riyazuddin
Gitam Deemed to Be University, Hyderabad, India
e-mail: lggidalu@gitam.edu
Y. Md. Riyazuddin
e-mail: rymd@gitam.edu
R. Venkataramana
SV College of Engineering, Tirupati, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_36
388 S. A. Karthik et al.
where we have forgotten the file details and contents but still wish to retrieve them. A search engine is an answer to the above scenario.
Concept clusters are a good alternative to standard ontologies [2], for they are flexible to maintain and apply in text understanding. A term can be either an entity that holds an isA relationship to a concept cluster or an attribute that describes the characteristics of the concept cluster.
The machine can assign a similarity score to every content item while performing the search. Whenever a group of items has been recognized and graded, those items are prioritized and accessed by users via the interface based on their scores. Following the release of the result set to the user, certain systems allow users to further refine their search by reviewing the responses, marking the items in the result set that are regarded as significant, and resending them to the system via a relevance feedback process. A client creates a query, mostly in the form of a list of questions in simple language, as shown in the figure above. The IR system will then reply by gathering the appropriate records containing the essential content.
The software subsequently generates a new result and presents it to the viewer. An IR system's main job is to anticipate which items are useful and which are not, based on the client's specifications. It is now well acknowledged that IR plays a critical role in workstation and Internet applications [3–5]. Stop-word removal, stemming, and many other operations dependent on the software's characteristics, such as phrase generation, syllable canonicalization, and term elongation, are used to recognize and extract phrases from text files before they are evaluated.
The probabilistic taxonomy Probase opens opportunities for entity disambiguation, which tremendously contributes to natural text understanding of both file contents and queries. Probase is a taxonomy of isA relationships between entities that is easy to maintain and actively updated by the community. Many works compare Probase with other taxonomies, such as Freebase and WikiTaxonomy, and show that the entities of Probase enrich text understanding to greater levels.
A Naive Bayes probabilistic model is used to disambiguate text entities by retrieving the concept clusters to which the entities belong; the model is explained in the later sections of this paper. A simple inverted index is used in conjunction with the BM25F information retrieval model, including PageRank, to retrieve the desired documents. The proposed work is compared based on its performance on entity disambiguation on the TREC datasets, for which many cutting-edge methodologies exist.
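The Naive Bayes step can be illustrated as scoring each concept cluster c by P(c) · Π_t P(t | c) over the query terms, with add-one smoothing; the cluster statistics below are invented for illustration and are not drawn from Probase.

```python
import math

# Toy term counts per concept cluster (invented; real counts come from Probase)
CLUSTERS = {
    "fruit":   {"apple": 30, "mango": 25, "banana": 20},
    "company": {"apple": 40, "microsoft": 35, "google": 30},
}

def disambiguate(terms):
    """Pick argmax_c P(c) * prod_t P(t | c) with add-one smoothing,
    computed in log space for numerical stability."""
    total = sum(sum(c.values()) for c in CLUSTERS.values())
    best, best_score = None, -math.inf
    for name, counts in CLUSTERS.items():
        n = sum(counts.values())
        score = math.log(n / total)                       # prior P(c)
        for t in terms:
            score += math.log((counts.get(t, 0) + 1) / (n + len(counts)))
        if score > best_score:
            best, best_score = name, score
    return best
```

The surrounding terms thus decide the reading: "apple" next to "mango" resolves to the fruit cluster, while "apple" next to "microsoft" resolves to the company cluster.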
The remainder of the article is arranged as follows: recent work on search engine principles is detailed in Sect. 36.2. The suggested Probase-based technique is described in Sects. 36.3 and 36.4, accompanied by a summary of findings to demonstrate the feasibility of the proposed approach. Finally, an outlook and guidelines for future improvements are offered.
36 An Efficient Approach to Retrieve Information 389
36.2 Related Work
In this section, recent work on desktop search engines using concepts is presented. Most information retrieval work focuses on ontology-based retrieval. MS File Explorer is a file browser that indexes the details of a document, such as creation and modification dates, type of file, title, and size, along with its contents. However, there is no text understanding or intent mining that would have enriched the index for retrieval of documents. From the survey of the literature, it is clear that a reasonable quantity of research has been conducted on information retrieval.
Parsath et al. [1] describe entity extraction as a preparatory technique that uses rules of the language or n-grams to retrieve words or phrases [2, 3]. For feature extraction, algorithms based on language rules to flag parts of utterances are prevalent. These rules are handwritten and modified according to punctuation and grammar norms. Kuzi et al. [2] note that individuals and their related classes are key properties that impact retrieval models, whether directly annotated [2, 4] or categorized by an application. The ability to use word representations to include counterparts, hyponyms, and abbreviations enriches data samples by covering background while keeping the intent the system infers. Seok et al. [3] state that the objective of query conceptualization is to map occurrences in a query to concepts defined in a certain ontology or knowledge base; queries ordinarily do not follow the sentence structure of a written language, nor do they contain sufficient signals for statistical inference. Agrawal et al. [4] begin by mining an assortment of relations among terms from a huge web corpus and mapping them to related concepts employing a probabilistic knowledge base. Then, for a given query, the terms within the query are conceptualized employing a random walk based iterative calculation.
Andrzejewski and Buttler [5] substantiate the concept that the collection of maxima is well integrated and contrasts with the keywords in the recipient search. In order to compare the similarity across maps, a way to obtain an approximate subdivision graph is described by Ganguly et al. [6]; the use of vertical translation between both graphs ensures that the most subordinate graph is scored. Rich, intuitive, and immersive applications, such as e-readers for electronic books, have sprung up as a result of the fast spread of hand-held gadgets. Potthast et al. [7] describe how such applications spur retrieval frameworks that, by leveraging the context of the user's activities, can implicitly fulfill any data request of the reader. The queries delivered using the context are frequently complicated objects, which distinguish such retrieval frameworks from standard search. Zuccon et al. [8] explain that, as a consequence of the rapid expansion of palm devices, sophisticated, dynamic, and comprehensive solutions such as e-readers for digital journals have sprouted up. These implementations encourage fetching mechanisms that can effectively supply any lead to the reader by exploiting the context of the user's activity. The queries generated by the context-aware visual search are frequently complex items, which
390 S. A. Karthik et al.
Table 36.1 Summary of the techniques and their impact on retrieval

Sl. No. | Technique | Technique measure/score | Remarks
1 | Ontology-based topic classification of the recognized entities [10] | Keywords: 0.46; pattern matching: 0.53; NB: 0.65; baseline VSM: 0.47 | 14.145% improvement
2 | Concept graph-based query generation [11, 12] | Random: 0.1 | Using CG similarity is much more efficient in document retrieval
3 | Semantic network [13, 14] | Bayesian analysis: 0.84; LDA co-occurrence and Probase: 0.83 | Random walk seems to cover much more of the user's intent under this technique
4 | GOV2 model of word embeddings [14] | RI score of RM3: 0.392 | Including synonyms is exhaustive
5 | Generalized language model [14] | Recall score of LDA: 0.58 | Vector word embedding is exhaustive
separates them from traditional search engines. Wang et al. [9] give a theoretical
explanation of their approach and conduct a thorough experimental validation to
demonstrate its utility in enhancing electronic documents with high-quality videos
from the web. Table 36.1 depicts a summary of the work carried out in the area of
search engine optimization.
From the survey of prior work in the said area, it can be observed that little scope
has been given to the desktop IR domain, even though it is one of the popular
areas of research [11-13]. Domain-specific repositories are being developed with both
estimates and figures, which calls for effective search and retrieval of the archives.
Organizing the information on a personal computer becomes tougher and tougher due to
the variety of types of information that are hard to remember or prioritize. Our proposed
article aims to concentrate on the following objectives: first, to develop an efficient
search algorithm that helps to retrieve the desired documents within the desktop;
next, to implement word-phased inverted indexing and a suitable corpus to navigate
the desktop.
36.3 Proposed Methodology
Our proposed approach concentrates on clustering Probase, POS tagging and
segmentation, conceptualization, and inverted indexing of web documents, which
goes a long way toward extracting only the noun and verb phrases of the sentences. If
feasible, using the lemma forms of the extracted terms would improve the accuracy. In
this work, a Naïve Bayes probabilistic model [14] for conceptualizing the extracted
entities that need to be disambiguated has been proposed. Probase, a probabilistic
taxonomy, comes into the picture to provide the unique IsA relationships and their
36 An Efficient Approach to Retrieve Information 391
frequency of occurrence between the entity and the concepts. Documents and queries
are subjected to the same text extraction and understanding techniques. The entities
and their concepts of the documents are arranged into an inverted index that can
be called upon to perform a BM25F retrieval of documents upon matching with the
query's terms. Bayes' theorem gives the insight that P(C|E), the probability of C
being the concept given the occurrence of entity E, can be
deduced when the following are given: P(E|C), the probability of entity E belonging
to concept C; P(C), the probability of concept C occurring in the dataset; and P(E),
the probability of entity E occurring in the dataset. Then P(C|E) is given by Eq. (36.1):
P(C|E) = P(E|C) P(C) / P(E)    (36.1)
The above is the implementation of the Naïve Bayes probabilistic model, which gives
the probabilities from which the maximum-scoring concept can be chosen to represent a
set of terms. To deal with a set of terms, where E1, E2, E3, …, En are entities,
by the assumption of independence of the entities:
P(C|E1, E2, E3, …, En) = P(E1|C) P(E2|C) P(E3|C) … P(En|C) P(C) / P(E1, E2, E3, …, En)    (36.2)
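Equations (36.1) and (36.2) can be sketched in a few lines of code. The counts below are made-up stand-ins for Probase IsA frequencies, and Python is used here purely for illustration (the actual system runs on Node.js); the function names are our own.

```python
from collections import Counter

# Hypothetical IsA counts n(entity, concept): stand-ins for Probase frequencies.
pair_counts = Counter({
    ("python", "language"): 80, ("python", "animal"): 20,
    ("java", "language"): 90, ("java", "island"): 10,
})

def conceptualize(entities, concepts):
    """Pick the concept C maximizing P(E1..En|C) P(C) under naive independence.

    P(Ei|C) = n(Ei, C) / n(C) and P(C) = n(C) / total. The shared denominator
    P(E1..En) in Eq. (36.2) is constant across concepts, so it can be dropped.
    """
    total = sum(pair_counts.values())
    n_c = {c: sum(v for (e, cc), v in pair_counts.items() if cc == c)
           for c in concepts}
    best, best_score = None, 0.0
    for c in concepts:
        score = n_c[c] / total                                   # P(C)
        for e in entities:
            score *= pair_counts[(e, c)] / n_c[c] if n_c[c] else 0.0  # P(Ei|C)
        if score > best_score:
            best, best_score = c, score
    return best

print(conceptualize(["python", "java"], ["language", "animal", "island"]))
# "language": both entities co-occur strongly with it, so it scores highest.
```

Dropping the common denominator is the usual Naïve Bayes trick: it leaves the ranking of candidate concepts unchanged while avoiding an extra estimate.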
Figure 36.1 shows the architecture of the proposed system, in which the user
enters a query. The entered query undergoes entity disambiguation, after POS
tagging, with respect to Probase. The results are compared against the inverted index,
from which the list of documents is drawn up. The list is ranked by the most likely
document and presented as the result.
The content from the most recently updated files should be gathered and analyzed
in order to derive the entities and construct their concepts. All documents that have
been altered since the last processing time are initially gathered as a list. Probase is a
knowledge base comprised of IsA relationships between entities. It was created with the
help of a web crawler that analyzed over 1.2 billion online documents and extracted
entities into IsA relationships using inference rules, so each relationship's incidence
is also part of the knowledge base. Probase was acquired by Microsoft Research
teams as part of a deal; Microsoft Concept Graph is the official name for Probase.
36.3.1 Clustering Probase
Probase is a collection of more than 120 million IsA pairs of entity-concept relation-
ships. Processing this on a personal computer is a challenge in itself, for there will
always be a shortage of heap memory. The workaround is to draw a sample from
Probase that still best represents it.

Fig. 36.1 Overview of proposed methodology
In this work, manual clustering has been incorporated, based on how typical a
concept is given an entity as well as how typical an entity is given a concept.
Typicality is a score measured as follows, where n(e, c) is the frequency of the occurring
relationship, n(e) is the frequency of the term as an entity, n(c) is the frequency of the term
as a concept, T(c) is the typicality of a concept c given the entity e, and T(e) is the
typicality of an entity e given the concept c:
T(e) = n(e, c) / n(c)    (36.3)

T(c) = n(e, c) / n(e)    (36.4)
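Equations (36.3) and (36.4) translate directly into code. The toy counts below are illustrative, not real Probase statistics, and the function names are our own:

```python
# Toy frequencies: n(e, c) for entity-concept pairs, n(e) for entities, n(c) for concepts.
n_pair = {("apple", "fruit"): 60, ("apple", "company"): 40}
n_entity = {"apple": 100}
n_concept = {"fruit": 300, "company": 500}

def typicality_of_entity(e, c):
    """T(e) = n(e, c) / n(c): how typical entity e is of concept c (Eq. 36.3)."""
    return n_pair.get((e, c), 0) / n_concept[c]

def typicality_of_concept(e, c):
    """T(c) = n(e, c) / n(e): how typical concept c is for entity e (Eq. 36.4)."""
    return n_pair.get((e, c), 0) / n_entity[e]

print(typicality_of_concept("apple", "fruit"))   # 60/100 = 0.6
print(typicality_of_entity("apple", "fruit"))    # 60/300 = 0.2
```

Keeping only pairs whose typicality clears a threshold in both directions gives the representative sample of Probase that fits in heap memory.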
36.3.2 POS Tagging and Segmentation
Part-of-speech tagging is the process of identifying syntactically significant
sections of phrases. It is accomplished using machine learning algorithms that
have been trained to recognize grammatical links among words in tagged phrases.
Wink-pos-tagger is a tool for annotating parts of speech in English sentences. It
is based on Eric Brill's transformation-based learning (TBL) technique. On the
standard WSJ 22-24 test set, it POS-tags and lemmatizes more than 525,000 tokens per
second with a precision of 93.2%; this was measured using the tagRawTokens() API on a
2.2 GHz Intel Core i3 computer with 8 GB RAM.
The practice of splitting written material into recognizable components, such as
keywords, paragraphs, or themes, is known as text segmentation. The information
content of the documents is extracted; however, it must be divided into phrases to
determine the semantics of the entities in each phrase, rather than expressing the data
files with only a few notions.
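The segmentation step described above can be approximated with a simple sentence splitter. The real pipeline uses wink-pos-tagger on Node.js, so this Python regex version is only a rough illustration under assumed punctuation rules:

```python
import re

def segment(text):
    """Split raw document text into sentences on ., !, ? followed by whitespace.

    A crude approximation of the segmentation stage; a production pipeline
    would also handle abbreviations, decimals, and quoted speech.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

doc = "Entities live in phrases. Segmentation finds them! Does it work?"
print(segment(doc))
# ['Entities live in phrases.', 'Segmentation finds them!', 'Does it work?']
```

Each returned phrase would then be POS-tagged so that only its noun and verb phrases pass to conceptualization.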
36.3.3 Conceptualization
Individuals can correlate qualities or characteristics of an object with those of objects
they are already familiar with; this is known as conceptualization. It is essentially a
map of common knowledge. The references underpinning an entity can be extended
through an understanding of its concept.
36.3.4 Inverted Index
An inverted index is a data structure that stores a mapping from content, such
as words or numbers, to its locations in tables, documents, or series of articles. An inverted
index allows faster full-text searching at the cost of additional processing
when content is uploaded to the database. It is the most frequently used data
structure in information retrieval systems, such as search engines.
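A minimal inverted index over tokenized documents might look as follows. The documents are invented examples, and the field weighting and BM25F scoring used in the retrieval step are omitted for brevity:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

docs = {1: "desktop search engine", 2: "search the web", 3: "desktop files"}
idx = build_inverted_index(docs)
print(sorted(search(idx, "desktop search")))   # [1]
```

In the proposed system the posting lists would hold entities and their Probase concepts rather than raw tokens, so a query's disambiguated concepts match documents even when the surface words differ.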
36.4 Experimentation Results
In this section, an analysis of the results of the proposed work is presented to judge the
efficiency of the proposed method. The Node.js runtime has been used to create the
application, including its web servers. Official support for Node.js is available for
well-known operating systems, with prototype support for FreeBSD. Node.js enables
the creation of fast servers in JavaScript by bringing event-driven programming
to web servers. To judge whether the said methodology is efficient, a few parameters are
considered. Micro-averaging is used whenever an instance can belong to two or more
categories simultaneously. The parameters used in this work are micro-averaged recall
(miR), micro-averaged precision (miP), micro-averaged F1 score (miF1), and macro-averaged
F1 score (macF1), computed using the formulae below. TP, FP, TN, and FN denote true
positive, false positive, true negative, and false negative, respectively. The range of
all the parameters mentioned here is [0, 1].

Table 36.2 Performance comparison of the proposed versus existing methodology

Method                | miR    | miP    | miF1   | macF1  | Error
SVM                   | 0.812  | 0.9137 | 0.8599 | 0.5251 | 0.00365
KNN                   | 0.8339 | 0.8807 | 0.8567 | 0.5242 | 0.00385
LSF                   | 0.8507 | 0.8489 | 0.8498 | 0.5008 | 0.00414
NNet                  | 0.7842 | 0.8785 | 0.8287 | 0.3765 | 0.0047
NB                    | 0.7688 | 0.8245 | 0.7956 | 0.3886 | 0.00544
Probase NB (Proposed) | 0.8992 | 0.9263 | 0.9125 |        |
miR = Σ_{i=1}^{n} TP_i / Σ_{i=1}^{n} (TP_i + FN_i)    (36.5)

miP = Σ_{i=1}^{n} TP_i / Σ_{i=1}^{n} (TP_i + FP_i)    (36.6)

miF1 = (2 × miP × miR) / (miP + miR)    (36.7)

macF1 = (1/n) Σ_{i=1}^{n} F1_i    (36.8)
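Equations (36.5)-(36.8) can be checked with a short script. The per-class counts below are arbitrary examples, not the paper's data:

```python
def micro_macro(counts):
    """counts: list of (TP, FP, FN) per class. Returns (miR, miP, miF1, macF1)."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    mir = tp / (tp + fn)                        # Eq. (36.5)
    mip = tp / (tp + fp)                        # Eq. (36.6)
    mif1 = 2 * mip * mir / (mip + mir)          # Eq. (36.7)
    f1s = [2 * t / (2 * t + p + n) if (2 * t + p + n) else 0.0
           for t, p, n in counts]               # per-class F1
    macf1 = sum(f1s) / len(counts)              # Eq. (36.8)
    return mir, mip, mif1, macf1

# Two classes with arbitrary counts: (TP, FP, FN).
print(micro_macro([(8, 2, 1), (3, 1, 4)]))
```

Note that micro-averaging pools the raw counts before computing the ratio, so frequent classes dominate miR and miP, while macF1 weights every class equally; a large gap between miF1 and macF1 (as in Table 36.2) usually signals weak performance on rare classes.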
The performance comparison of the proposed versus existing methodology is given in
Table 36.2. The reason for the good result is the implementation of the inverted
index and the segmentation of the phrases.
From the plot, it is clear that Probase NB performs relatively better than the other work
presented in the literature under these metrics and is a step toward integrating
NLU into PCs (Fig. 36.2).
36.5 Conclusion
The largest challenge was the lack of heap memory needed to process
Probase completely. Entity disambiguation has been achieved and leads to a textual
understanding of the file contents. Different alternatives for document representation
were discussed, and the vector space model representation was chosen; every
distinct extracted phrase becomes a dimension in the vector space model notation.
Tags are deleted, phrases are stemmed using Porter's stemming algorithm, and
dimensionality is decreased using the eigenvalue technique for semantic similarity
indexing. Probase NB performs better than plain NB under these metrics and still is a step toward
Fig. 36.2 Line plot of obtained results (miR, miP, miF1, macF1, and error scores for SVM, KNN, LSF, NNet, NB, and the proposed Probase NB)
integrating NLU into PCs. The addition of unannotated data is easily handled by
Probase NB. The heap memory limitation remains an obstacle.
References
1. Prasath, R., Kumar, V., & Sarkar, S. (2015). Assisting web document retrieval with topic identification in tourism domain. In Web intelligence (Vol. 13, No. 1, pp. 31–41). IOS Press. https://doi.org/10.3233/web-150308
2. Kuzi, S., Shtok, A., & Kurland, O. (2016). Query expansion using word embeddings. In Proceedings of the 25th ACM international conference on information and knowledge management—CIKM ’16. https://doi.org/10.1145/2983323.2983876
3. Seok, M., Song, H.-J., Park, C., Kim, J.-D., & Kim, Y.-S. (2016). Named entity recognition using word embedding as a feature. International Journal of Software Engineering and Its Applications, 10.
4. Agrawal, R., Gollapudi, S., Kannan, A., & Kenthapadi, K. (2014). Similarity search using concept graphs. In Proceedings of the 23rd ACM international conference on information and knowledge management—CIKM ’14. https://doi.org/10.1145/2661829.2661995
5. Andrzejewski, D., & Buttler, D. (2011). Latent topic feedback for information retrieval. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’11. https://doi.org/10.1145/2020408.2020503
6. Ganguly, D., Roy, D., Mitra, M., & Jones, G. J. F. (2015). Word embedding based generalized language model for information retrieval. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval—SIGIR ’15. https://doi.org/10.1145/2766462.2767780
7. Potthast, M., Hagen, M., Stein, B., Graßegger, J., Michel, M., Tippmann, M., & Welsch, C. (2012). ChatNoir: A search engine for the ClueWeb09 corpus. In Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval—SIGIR ’12. https://doi.org/10.1145/2348283.2348429
8. Zuccon, G., Koopman, B., Bruza, P., & Azzopardi, L. (2015). Integrating and evaluating neural word embeddings in information retrieval. In Proceedings of the 20th Australasian document computing symposium—ADCS ’15. https://doi.org/10.1145/2838931.2838936
9. Wang, Z., Zhao, K., Meng, X., & Wen, J.-R. Query understanding through knowledge-based conceptualization. In Proceedings of the twenty-fourth international joint conference on artificial intelligence.
10. Ordonez-Salinas, S., & Gelbukh, A. (2010). Information retrieval with a simplified conceptual graph-like representation. In G. Sidorov, A. Hernandez Aguirre, & C. A. Reyes Garcia (Eds.), Advances in artificial intelligence, lecture notes in computer science (Vol. 6437). Springer.
11. Koopman, B., Zuccon, G., Bruza, P., Sitbon, L., & Lawley, M. (2015). Information retrieval as semantic inference: A graph inference model applied to medical search. Information Retrieval Journal, 19(1–2), 6–37. https://doi.org/10.1007/s10791-015-9268-9
12. Karthik, S. A., & Manjunath, S. S. (2020). Microarray spot partitioning by autonomously organizing maps through contour model. International Journal of Electrical and Computer Engineering (IJECE).
13. Karthik, S. A. (2019). A systematic examination of microarray segmentation algorithms. International Journal of Innovative Technology and Exploring Engineering (IJITEE).
14. Karthik, S. A. (2018). An enhanced approach for spot segmentation of microarray images. In International conference on computational intelligence and data science (ICCIDS 2018). Elsevier.
Chapter 37
Baggage Recognition and Collection
at Airports
Aviral Pulast and S. Asha
Abstract This paper proposes a solution to the baggage recognition and collection
process at airports. The security of passengers' luggage becomes a major question
at airports during deplaning and while standing near the belts to collect the luggage. As
nobody likes to stand and check each bag that is similar or identical to theirs, this
paper works on finding a solution to this problem. In this paper, the exact real-life
scenarios are examined, a comparison between the conventional and the proposed
methods is made, the advantages of the new method are outlined, and a proper
implementation is carried out with promising performance.
37.1 Introduction
The Internet of Things has changed the way we look at the world around us: from
manual operation of a process, to automated working, and then to wireless, seamless,
and effortless processing of information and data. These days, solutions can be found
to almost any problem we face by using IoT. For example, consider our houses: IoT
plays a major role here, making everything in our home automatic; we call this a smart
home automation system. We can see that various big multinational companies are
coming forward and showing their interest in developing new technologies to make our
lives easier and smoother. In our daily lives, we face many problems, be it in our city,
our house, or our neighborhood, which we would like to remove, or, if not remove,
then find an easy solution to.
A. Pulast ·S. Asha (B)
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127,
India
e-mail: asha.s@vit.ac.in
A. Pulast
e-mail: aviral.pulast2019@vitstudent.ac.in
S. Asha
Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai 600127, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_37
When we talk about the Internet of Things and consider communication, we talk
about various sensors, Bluetooth devices, and newer technologies such as Zigbee,
among many more. The one thing common to all of these is the use of the Internet,
and as an extension we have sensors that help us implement and enjoy IoT
services. We also require cloud computing to manage our data with proper
safety and security; there are many providers, such as Google, Microsoft, and Amazon,
with highly safe and secure databases to store our customers' information.

One such problem we generally face is at airports: recognizing our baggage
when there may be many other identical bags or belongings, which creates the
possibility of picking up the wrong ones. The Internet of Things and wireless
sensor networks make it easier to implement a solution to the above-mentioned
problem using various modules, sensors, the Internet, and calling services. This paper
is organized into sections explaining the need, the proposal, and the implementation for
this major problem. The advantages and disadvantages of the new and previous
methods, respectively, are also discussed in this paper.
37.2 Methodology
In this section, the procedure or workflow is explained with proper operational
steps and the reasons behind them. This section also compares the process followed
to date with the proposed process and all the new technologies implemented.
37.2.1 Conventional Method
At airports, we usually see that several steps must be followed before we actually
get our baggage whenever we deboard an aircraft. The process contains two to three
steps before our bags reach us. Figure 37.1 depicts the conventional procedure of
baggage collection at airports.

In the conventional method, there is minimal or no use of IoT devices, and even
where there is usage, it is either at the end or at the start. Thus, there is no provision
for crowd management or for the effortless functioning of the pickup belt. This method
also does not ensure the proper security and safety of your baggage: your bag might
get exchanged with another person's if the bags are identical, which is precisely why
we present a solution in this paper.
Fig. 37.1 Conventional method
37.2.2 Proposed Solution
The above-discussed method is lacking at many points, so to optimize it, a new
method is proposed in which IoT devices are used, making the process easier. The
process is shown in Fig. 37.2 and includes the following steps.

The above steps give an outline of the solution. It can be noticed from Fig. 37.2
that two extra steps are included: RFID scans, which check your baggage and display
the output on a screen to alert the passenger about their luggage.

It is not just about scanning RFID tags; a GSM module is also used, which
transfers the scanned information to the cloud, after which an intimation is sent
to the passenger's phone number as an SMS, making the process completely
smooth and easy. A detailed explanation of the inner processes is given in the
following sections.
Fig. 37.2 Proposed method
37.3 Implementation
This section provides the implementation of the idea in both schematic and circuit
diagrams, which helps in understanding the functioning and the procedure of
the newly proposed method.
37.3.1 Schematic Diagram
Figure 37.3 gives the detailed schematic diagram of the proposed method. The
detailed steps of the entire implementation procedure are explained in the following
sections.
Fig. 37.3 Detailed schematic diagram of the proposed solution
37.3.1.1 1st RFID Scan
The first RFID scan is done to divide the luggage into slots (10-20 bags each). This
step is essential, as there can be hundreds of passengers on a flight; after the luggage
is deplaned, it is divided into slots to maintain smooth functioning, since it is easier
to handle ten bags at a time rather than a hundred of them all together. After
scanning, if the scan result is true, the bags are passed further along the belt;
otherwise, they are held back and sent to the last slot or to a different category
slot (unidentified).
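The routing logic of this first scan can be sketched as follows. The slot size and the registry of known tag IDs are invented for illustration, and the real system runs on an Arduino rather than in Python:

```python
SLOT_SIZE = 10   # assumed bags per slot (the text suggests 10-20)

def route_bags(scanned_ids, known_ids):
    """Split valid bags into slots of SLOT_SIZE; unknown tags go to 'unidentified'."""
    valid = [t for t in scanned_ids if t in known_ids]
    slots = [valid[i:i + SLOT_SIZE] for i in range(0, len(valid), SLOT_SIZE)]
    unidentified = [t for t in scanned_ids if t not in known_ids]
    return slots, unidentified

# Hypothetical manifest of registered 12-character tag IDs.
known = {f"AB{n:09d}A" for n in range(25)}
scanned = [f"AB{n:09d}A" for n in range(12)] + ["CD123456789D"]
slots, rejected = route_bags(scanned, known)
print(len(slots), rejected)   # 2 slots (10 + 2 bags); the unknown tag is held back
```

Capping each slot keeps the number of bags on the belt, and the passengers called to it, small at any one time.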
37.3.1.2 GSM Module
In continuation of the previous step, if the RFID tags match and the luggage
proceeds to the belt, the data associated with the luggage is sent to the Internet,
and through that, an SMS notification is sent to the passenger's phone.
This alerts the passengers about their baggage, and all notified passengers may
come and stand near the belt to collect it.
37.3.1.3 2nd RFID Scan
When the baggage is on the belt and ready for pickup, another RFID scan is
done, ensuring that the scanned bag belongs to a single, unique passenger. This scan
eliminates the confusion caused by multiple identical bags: since each bag is given a
separate RFID tag, that tag carries unique information associated with the bag. As a
result, when the tag is scanned and matches with the RFID reader, the information is
displayed on the screen and the data is sent to the database, from where another SMS
is sent to the passenger asking them to come and collect their bag from the belt.
37.3.2 Circuit Implementation
Figure 37.4 shows the circuit diagram of the implementation. All of the implemen-
tation is done in Proteus 8 Professional software. The following sections discuss the
apparatus used in this proposed method along with their applications.
37.3.2.1 Arduino Uno
Fig. 37.4 Circuit implementation of the idea [1]

This is the microcontroller for this project, and all the functions are performed
by this controller only. The Arduino is backed by project-specific code, and the
whole project works on that. RFID works according to the character array associated
with each scanner tag: a 12-character array acts as the unique ID of an RFID tag.
This ID is read by the receiver; if the ID matches, the bag is passed further,
otherwise it is not.
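The matching step the Arduino performs can be mimicked in a few lines. The 12-character IDs follow the format shown in the results section, but the registry and the phone-number mapping are illustrative assumptions, and the real logic runs as Arduino C code:

```python
# Hypothetical registry: tag ID -> passenger phone number (both made up).
REGISTERED = {"AB123456789A": "+91-9000000000"}

def check_tag(tag_id):
    """Return (valid, message) for a scanned 12-character RFID tag ID."""
    if len(tag_id) != 12:
        return False, f"{tag_id} MALFORMED TAG"
    if tag_id in REGISTERED:
        # In the real circuit this lights the LED and triggers the GSM SMS.
        return True, f"{tag_id} VALID TAG"
    return False, f"{tag_id} INVALID TAG"

print(check_tag("AB123456789A"))   # (True, 'AB123456789A VALID TAG')
print(check_tag("CD123456789D"))   # (False, 'CD123456789D INVALID TAG')
```

The two messages mirror the simulated outputs shown later in Figs. 37.5 and 37.6.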
37.3.2.2 RFID Tags
RFID technology consists of two parts: one is the sender and the other is the receiver. In the
circuit diagram in Fig. 37.4, two devices are marked with circles. The device
marked with a green circle is the sender, i.e., the RFID tag which is placed on each
bag, while the red-marked RFID device is the receiver: whenever a sender carrying a
unique array is scanned, the receiver checks that ID and returns a Boolean value
(true or false). Further working of the proposed method is explained in the results
section.
37.3.2.3 LED
Here, the LED is lit when the RFID matches correctly, indicating the correct
bag and the correct ID. When the LED is lit, it means the data associated with the
bag has been sent to the database, from where the message will be sent to the passenger.
Since there is no provision to show the message transfer on mobile phones in the
simulation, this LED shows the acknowledgement. The simulated output for the given
circuit is presented in the following sections.
37.4 Result and Discussion
After proceeding with the implementation, we simulated the circuit and got the
desired outputs. We intentionally hard-coded the expected ID to show a
match and, on the other hand, gave a false ID to show a mismatch.

In Fig. 37.5 given below, the output screen shows two terminal-like windows,
where the upper window is the sender and the lower one is the output window. In the
upper window, we entered the correct ID "AB123456789A" (also given in the code);
hence, we get the output "AB123456789A VALID TAG". At the airport, this case
would mean that our bag has been identified and the passenger is getting an
intimation about their luggage. In Fig. 37.6, another output, "CD123456789D
INVALID TAG", is seen. This means that this time we entered the tag/ID
"CD123456789D", which is wrong because the code specifies "AB123456789A"
as the correct ID.
Fig. 37.5 Output (valid tag)
Fig. 37.6 Output (invalid tag)
Hence, we can see that this implementation works correctly in every possible
case: for a wrong ID, we get the expected (invalid) output, and for a correct ID,
we get a valid output.
37.5 Conclusion
In this era of the 4th industrial revolution (Industry 4.0), with the aid of the Internet of
Things (IoT), things have become a lot easier when it comes to remote data gathering
and processing with smart sensors and cheaper cloud data storage. The Internet of
Things is growing at an exponential rate: wherever we look and wherever we go, be it
malls or universities, we see IoT applications. In colleges and universities, we find IoT
applications in ID cards; for attendance and for entry and exit, students scan their
ID cards to register their attendance. Similarly, in smart city concepts, smart toll
booths and smart dustbins make things smoother for the government and for the
residents as well. Taking inspiration from various IoT usages and implementations,
we conducted a study to proceed with this project.

In this paper, we have studied and evaluated different modules and application
programming interfaces (APIs) to simulate and solve a real-world problem: ensuring
proper identification of baggage with proper security and eliminating the gathering
and cramming of passengers in long queues at the airport, so as to control the flow
of work. This can be further improved with the help of more resources, better data
queueing and analysis algorithms, and more cloud-based processing that can handle
larger amounts of data for industrial usage. We have discussed our approach to this
idea and its implementation above. Our idea follows a minimalistic and simplistic
approach, making it easier to implement in the real world and also cost-efficient,
because the approach keeps the use of sensors low, hence achieving efficient results
with low inputs. We are sure to see more future applications and future investments
in the field of IoT.
References
1. RFID security and privacy: A research survey. https://ieeexplore.ieee.org/abstract/document/1589116
2. An introduction to RFID technology. https://ieeexplore.ieee.org/abstract/document/1593568
3. RFID technologies: Supply-chain applications and implementation issues. https://www.tandfonline.com/doi/abs/10.1201/1078/44912.22.1.20051201/85739.7?journalCode=uism20
4. RFID systems and security and privacy implications. https://link.springer.com/chapter/10.1007/3-540-36400-5_33
5. Mapping and localization with RFID technology. https://ieeexplore.ieee.org/abstract/document/1307283
6. Smart sensors: Analysis of different types of IoT sensors. https://ieeexplore.ieee.org/abstract/document/8862778
7. Analysis of three IoT-based wireless sensors for environmental monitoring. https://ieeexplore.ieee.org/abstract/document/7887698
8. General application research on GSM module. https://ieeexplore.ieee.org/abstract/document/6063315
9. Anti-theft system based on GSM and GPS module. https://ieeexplore.ieee.org/abstract/document/6376520
10. Implementation of wireless sensor network (WSN) on garbage transport warning information system using GSM module. https://iopscience.iop.org/article/10.1088/1742-6596/1175/1/012054/meta
11. Attendance generating system using RFID and GSM. https://ieeexplore.ieee.org/abstract/document/7494157
12. Design of smart home controlling system based on GSM SMS. https://en.cnki.com.cn/Article_en/CJFDTotal-DZCL201306030.htm
13. Microcontroller based digital meter with alert system using GSM. https://ieeexplore.ieee.org/abstract/document/7856033
14. Microcontroller based attendance system using RFID and GSM. https://www.ijeter.everscience.org/Manuscripts/Volume-5/Issue-8/Vol-5-issue-8-M-21.pdf
Chapter 38
Computer Vision Technique to Detect
Accidents
A. Jafflet Trinishia and S. Asha
Abstract Detecting accidents in smart cities is a challenging task in day-to-day life.
It is hard for the traffic police to control, as the police cannot be available 24 × 7.
Due to this, many accidents pass by unnoticed, and many humans lose their lives due
to the lack of first-aid support from a hospital. It takes at least 5 min to pass the
accident information to the hospital; hence, to overcome this problem, we have used a
computer vision technique to identify an accident at a specific location, and messages
are passed automatically to the nearby hospital. When an accident is detected, the
local hospital and patrol are intimated by Gmail, or else by SMS through SMTP, to
take the necessary action. Using deep learning techniques, we are able to achieve a
promising solution to this problem.
38.1 Introduction
Traffic accidents can occasionally lead to unexpected damage to society, injuries to
humans and the environment, and even fatalities. According to NHTSA yearly reports
on traffic since 1988, probably more than 5,000,000 car crashes happen in the
United States each year, and around 30% of them result in death or injuries. After several
studies and multiple discussions, it is widely believed that reductions in accidents
can be achieved via high-quality accident detection methods and corresponding
response strategies. As we are dealing with traffic incidents and the accidents
corresponding to them, the traffic management result should be accurate; speedy
detection of traffic accidents is vital, and it should be done within a minimal amount
of response time. Traditional techniques normally provide the correct location and
time of an incident, but common detection methods relying solely on traffic data
nevertheless face certain challenges.
A. J. Trinishia
School of Computer Science and Engineering, VIT University, Chennai, India
e-mail: jafflettrinishia.a2020@vitstudent.ac.in
S. Asha (B)
Centre for Cyber Physical Systems, VIT University, Chennai, India
e-mail: asha.s@vit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_38
Initially, most researchers made use of field information to observe traffic accidents,
built on the implicit assumption that the records are reliable, despite the fact that
detector failures and communication errors are persistent difficulties in traffic
operations. The issue of failed sensors can cause even more difficulties in accident
detection over large districts. The uncertain nature of traffic patterns and non-recurrent
events can likewise undermine the usefulness of traffic measurements in explaining the
accidents. Besides auto-collisions, daily traffic operations may also suffer breakdowns
caused by unusual factors such as processions, road construction, running races, and
so forth. Hence, measurements such as traffic flow and occupancy inherently act as
indirect evidence for auto-collisions rather than direct proof. The existing research
has comparatively low recall and higher localization error compared with Faster
R-CNN. It also struggles to find close objects, because each grid cell can propose
only two bounding boxes, and the existing algorithms struggle to detect small objects.
This research addresses these difficulties: the Faster R-CNN procedure is utilized to
detect accidents through computer vision, which is a part of deep learning. This paper
is organized as follows: Sect. 38.1 gives an introduction to the automatic detection
of road accidents. Section 38.2 discusses the existing systems. Section 38.3 gives the
details of the proposed system. Section 38.4 shows the implementation and results.
Section 38.5 gives the conclusion of this research.
38.2 Existing System
An adaptive traffic light control algorithm that alters the sequence and length of traffic
lights in real time based on the detected traffic is discussed in [1]. The intended result
is obtained using an automatic car accident detection method based on cooperative
vehicle infrastructure systems (CVIS) and machine vision in [2]. It gives 90.02%
average precision (AP). In [3], a feedback control method is proposed to optimize
signal timing based on delays estimated through re-identification technology. Nellore
and Hancke [4] present a taxonomy of several traffic control techniques for preventing
congestion. Traffic optimization with an IoT platform, which allocates green time to
each traffic signal based on the number of vehicles on the road, is achieved in [5]. In
article [6], an image processing technique counts the number of cars using a camera
that captures the road traffic; if the number of cars exceeds a traffic threshold, a
warning signal is invoked automatically. In paper [7], by contrast, the traffic is
measured based on the density of the vehicles within a particular longitude and
latitude. IR sensors are used to evaluate the traffic density
in [8]. The road traffic density is dynamic. This is addressed in paper [9], with the
help of RFID for vehicle-to-vehicle communication to avoid congestion. Shaaban
et al. [10] proposed an evaluation strategy of the existing traffic control mechanisms.
The use of a clustering algorithm on VANETs, based on thresholds for selecting
green light timing at crossroads in a real-time environment, is experimented with in [11].
A survey on various sensors, their strengths, and their weaknesses is discussed in [12].
A vehicle-logo location recognition system is discussed in [13]. Deep learning models
are used for an object detection and tracking system (ODTS) in [14].
All the existing models have both advantages and disadvantages, but traffic accidents
are not addressed. The proposed method uses a computer vision technique to detect
road accidents and alert responders.
38.3 Proposed System
A traditional traffic monitoring device is designed only to observe or control traffic;
it does not offer any solution to reduce the number of fatal injuries that occur due
to lack of timely medical aid. Consider a situation where an accident takes place but
no one is there to report it: the victim is critical and every second counts, and any
delay can result in disability or death. We cannot eliminate accidents completely,
but we can improve by providing post-accident care just in time. There are many
sensor-based systems available in the market as well, but they require vehicle owners
to install sensors in their vehicles. These systems work by sensing an impact through
the installed sensors; the sensor signals then trigger a unit that alerts nearby medical
help or an emergency contact number. But what if the accident involves a vehicle
that is not equipped with such a sensor-based system? For example, consider Fig. 38.1,
which shows an accident scene on a highway. Such accidents cannot be addressed
immediately due to the lack of technology. We need an advanced artificial
intelligence-based surveillance system which can not only detect the occurrence of
an accident but also alert nearby hospitals, ambulances, or traffic police in real time.
The proposed architecture affords a robust procedure to achieve a high detection rate
and a low false alarm rate on common street traffic CCTV surveillance footage. This
framework was evaluated under various conditions such as bright sunlight, low
visibility, rain, hail, and snow using the proposed dataset. The framework was found
effective and contributes to the development of general-purpose real-time vehicular
accident detection algorithms. When an accident is detected, the local hospital and
patrol are notified via Gmail and SMS through SMTP. The severity of the accident is
also analyzed to determine whether it is major or minor, and the SMTP protocol plays
the role of alerting the hospital. The proposed method focuses on vehicle detection,
accident identification, accident classification into major and minor types, and
sending an alert message to the nearest control centre. The architecture of the
proposed model is given in Fig. 38.2. It begins by acquiring the input from the CCTV camera,
Fig. 38.1 Real-time road accident scene
processes the data, and sends an alert to the nearest hospital about the accident
within a few seconds.
(a) Vehicle Detection
Fig. 38.2 Architecture of the proposed model
Vehicle detection and tracking are necessary in self-driving technologies to drive
the vehicle safely. They can be carried out through the following tasks: perform
histogram of oriented gradients (HOG) feature extraction on a labeled training set
of images and train a linear SVM classifier; implement a sliding window approach
and use the trained classifier to search for vehicles in images; run the pipeline on
a video stream and create a heat map of recurring detections frame by frame to
reject outliers and follow detected vehicles; and estimate a bounding box for each
detected vehicle. Vehicle and non-vehicle images are loaded as NumPy arrays into
separate lists using a helper function. A variety of feature extraction strategies have
been used to train the classifier to detect the vehicles efficiently.
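The HOG-plus-linear-SVM pipeline described above can be sketched as follows. This is a minimal illustration using scikit-image and scikit-learn on synthetic 64 × 64 patches standing in for the labeled vehicle/non-vehicle training images, which are not available here; the function name and data shapes are assumptions, not the chapter's actual code.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_hog(image):
    # HOG feature extraction on a single-channel 64x64 patch
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Synthetic stand-ins for vehicle / non-vehicle patches
# (real data would be cropped CCTV frames)
vehicles = [rng.random((64, 64)) for _ in range(20)]
non_vehicles = [rng.random((64, 64)) for _ in range(20)]

X = np.array([extract_hog(img) for img in vehicles + non_vehicles])
y = np.array([1] * 20 + [0] * 20)          # 1 = vehicle, 0 = non-vehicle

clf = LinearSVC(C=1.0).fit(X, y)           # linear SVM on the HOG vectors
```

A sliding window would then call `extract_hog` on each candidate patch and score it with `clf.decision_function` to build the heat map.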
(b) Accident Identification
Recently, automated incident detection has attracted much interest in highway
control systems to minimize traffic delay and to strengthen road safety, capacity,
and real-time traffic control, because when freeway and arterial incidents occur they
cause congestion and mobility loss; if they are not cleared immediately, they can
cause secondary traffic accidents.
(c) Accident Classification Major or Minor
The main purpose of this stage is to analyze road accidents and determine the
severity of an accident by applying advanced machine learning techniques. Many
well-developed machine learning strategies exist for studying this domain. Traffic
accident analysis is carried out using four advanced and widely used supervised
learning techniques, chosen for their proven accuracy in this sector: decision tree,
K-nearest neighbors (KNN), Naïve Bayes, and adaptive boosting (AdaBoost).
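A minimal sketch of comparing the four severity classifiers with scikit-learn. The features and labels below are synthetic stand-ins (the Czech police accident dataset mentioned later in the chapter is not available here), so the numbers they produce are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical severity features (e.g. relative speed, impact type, occupancy);
# labels: 0 = minor, 1 = major
X = rng.random((200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)   # toy rule standing in for real labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=4),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
}
# Fit each model and record its held-out accuracy
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```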
(d) Alerting System
If a basic peer-to-peer email server were used to take on large bulk emailing jobs,
delivery rates would suffer; it would inevitably clog the bandwidth and probably
delay sending and receiving peer-to-peer emails. SMTP is therefore used to send an
alert mail to the mail id of the hospital nearest to the accident location.
(e) SMTP Gateway
Python provides the smtplib module, which defines an SMTP client session object
that can be used to send mail to any Internet machine running an SMTP or ESMTP
listener daemon; the host parameter specifies the machine running the SMTP server.
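A hedged sketch of the alert gateway using Python's standard smtplib and email modules. All addresses, the host, and the credentials below are placeholders; `send_alert` is defined but not invoked here, since it needs live SMTP credentials (for Gmail, an app password over STARTTLS).

```python
import smtplib
from email.message import EmailMessage

def build_alert(severity, location, hospital_addr):
    """Compose the accident alert mail (severity: 'major' or 'minor')."""
    msg = EmailMessage()
    msg["Subject"] = f"ACCIDENT ALERT ({severity}) at {location}"
    msg["From"] = "alerts@example.com"            # placeholder sender
    msg["To"] = hospital_addr
    msg.set_content(
        f"A {severity} accident was detected at {location}. "
        "Please dispatch an ambulance immediately."
    )
    return msg

def send_alert(msg, host="smtp.gmail.com", port=587, user=None, password=None):
    # Gmail's SMTP gateway requires STARTTLS and authentication
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)

alert = build_alert("major", "NH-48 km 212", "er@cityhospital.example")
```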
Figure 38.3 gives the block diagram of the proposed system. It decides whether
the accident is minor or major, and based on it, alert messages will be communicated.
Fig. 38.3 Block diagram
It uses faster R-CNN to achieve the expected result. The following sections give
the details of faster R-CNN and the RPN.
38.3.1 Faster R-CNN
Our object detection framework, referred to as faster R-CNN, is composed of two
modules. The first module is a deep fully convolutional network that proposes
regions, and the second module is the fast R-CNN detector that uses the proposed
regions. The entire framework is a single, unified network for object detection.
Using the now-popular terminology of neural networks with 'attention' mechanisms,
the RPN module tells the fast R-CNN module where to look. We present the design
and properties of the network for region proposal, and describe algorithms for
training both modules with shared features.
38.3.2 Region Proposal Networks
A region proposal network (RPN) takes an image of any size as input and outputs a
set of rectangular object proposals, each with an objectness score. This process is
modeled with a fully convolutional network, as described in this section. Because
our goal is to share computation with a fast R-CNN object detection network, we
assume both nets share a common set of convolutional layers. To generate region
proposals, we slide a small network over the convolutional feature map output by
the last shared convolutional layer. This small network takes as input an n × n
spatial window of the input convolutional feature map. Each sliding window is
mapped to a lower-dimensional feature (256-d for ZF and 512-d for VGG, with
ReLU following). This feature is fed into two sibling fully connected layers: a
box-regression layer (reg) and a box-classification layer (cls). We use n = 3 in this
study, noting that the effective receptive field on the input image is large (171 and
228 pixels for ZF and VGG, respectively). The mini-network operates in a
sliding-window fashion, with the fully connected layers shared across all spatial
locations. This architecture is naturally implemented with an n × n convolutional
layer followed by two sibling 1 × 1 convolutional layers for reg and cls,
respectively (Fig. 38.4).
Table 38.1 compares the fast R-CNN and faster R-CNN algorithms, their features,
and their limitations. Table 38.2 gives the efficiency in terms of time. Based on
these, it is understood that faster R-CNN is recommended and can be applied.
38.4 System Implementation and Results
Even after filtering by thresholding on the class scores, we still wind up with many
overlapping boxes, so non-maximum suppression (NMS) is applied as a second
filtering stage to get rid of them. NMS is a computer vision approach used in a
variety of tasks: a set of methods for selecting a single box per object. Figure 38.5
depicts how NMS is used to choose a single entity.
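Greedy NMS as described above can be implemented in a few lines of NumPy. This is an illustrative version, not the chapter's code: it repeatedly keeps the highest-scoring box and discards any remaining box whose IoU with it exceeds the threshold.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; boxes are (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]], float)
scores = np.array([0.9, 0.8, 0.7])
```

Here the second box overlaps the first heavily (IoU ≈ 0.82) and is suppressed, while the third, disjoint box survives.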
Fig. 38.4 Region proposal network. Source Chengjun Xie (2018) [15]
Table 38.1 Comparison of the fast R-CNN and faster R-CNN algorithms

Algorithm: Fast R-CNN
Features: The CNN receives each image only once, and feature maps are extracted. On these maps, selective search is employed to create predictions. All three R-CNN stages are combined in this model.
Prediction time/image: 2 s
Limitations: Because selective search is slow, computation time remains high.

Algorithm: Faster R-CNN
Features: Replaces the selective search method with a region proposal network, resulting in a substantially faster process.
Prediction time/image: 0.2 s
Limitations: Object proposal takes time, and because there are multiple systems working in parallel, each system's performance is influenced by the prior system's performance.
Table 38.2 Efficiency of faster R-CNN

Comparison                          R-CNN    Fast R-CNN
Time taken to test per image (s)    50       2
Speed-up                            1×       25×
mAP                                 66.0     66.9
Fig. 38.5 Non-maximum suppression. Source Jatin Prakash (June 2, 2021)
38.4.1 Anchors
At each sliding-window location, we simultaneously predict multiple region
proposals, where the maximum number of possible proposals for each location is
denoted as k. The reg layer therefore has 4k outputs encoding the coordinates of the
k boxes, and the cls layer outputs 2k scores that estimate the probability of object
or not object for each proposal. The k proposals are parameterized relative to k
reference boxes, which we call anchors. An anchor is centred at the sliding window
in question and is associated with a scale and an aspect ratio. By default, we use
three scales and three aspect ratios, yielding k = 9 anchors at each sliding position.
For a convolutional feature map of size W × H (typically about 2,400 positions),
there are W H k anchors in total.
38.4.1.1 Anchor Boxes
In faster R-CNN, anchor boxes are one of the most significant concepts. They
provide a predetermined collection of bounding boxes of various sizes and ratios
that are used as a reference when the RPN first predicts object positions. These
boxes are usually chosen based on object sizes in the training dataset, to capture
the scale and aspect ratio of the specific object classes to be detected, and are
placed at the centre of the sliding window.
The original approach employs three scales and three aspect ratios, resulting in
k = 9. If the final feature map from the feature extraction layer has width W and
height H, the total number of anchors generated will be W * H * k.
The major reason for using anchor boxes is to analyze all object predictions at the
same time. They aid the speed and efficiency of the detection portion of a deep
learning framework. Anchor boxes also make it easier to recognize many items,
objects of varying scales, and overlapping objects without having to scan an image
with a sliding window that makes a separate prediction for each possible position.
This allows for real-time object detection.
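The W * H * k anchor layout can be sketched in NumPy as follows, assuming the usual faster R-CNN defaults (feature stride 16, scales 128/256/512, ratios 0.5/1/2); the function name and exact parameterization are illustrative, not the paper's implementation.

```python
import numpy as np

def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """All W*H*k anchors as (x1, y1, x2, y2) for a feat_w x feat_h feature map."""
    base = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)              # width for this scale/aspect ratio
            h = s / np.sqrt(r)              # height keeps the area ~ scale^2
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                   # k = len(scales) * len(ratios) = 9

    # One anchor centre per feature-map cell, spaced by the network stride
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    centres = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centres + base).reshape(-1, 4)  # (W*H*k, 4)

anchors = generate_anchors(60, 40)          # 60 x 40 = 2,400 positions, 9 anchors each
```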
Object detection is a fascinating area of computer vision, and it moves to a whole
new level when we are dealing with video data: the complexity rises a notch, but so
do the rewards. We can perform highly valuable tasks such as surveillance, traffic
management, and crime fighting using object detection algorithms. In this vehicle
detection, candidate boxes that may contain a vehicle are generated by a sliding
window and scored by a support vector machine classifier to create a heat map.
The heat map records are then used to filter out false positives before vehicles are
identified by drawing a bounding box around them.
Two guiding principles were set for defining the classification parameters of an
accident that will truly describe its probability and severity: reduce the set of
parameters characterizing the accident, while keeping the valuable information
about accident severity. As input, the road accident dataset maintained by the
Czech road traffic police branch was used. The results, which are no longer covered
in this article, have yielded interesting information. One observation in particular:
a large amount of the records is extremely vague and only loosely describes the
technical background of an accident and its consequences. The accident data is
frequently based mainly on the legal penalties of an accident. However, every
vehicle equipped with a built-in safety device has information about its current
state, which cannot be stored in the dataset. A combination of the road accident
dataset assessment and a theoretical approach from the vehicle-capabilities
perspective produced the following parameters that can truly characterize the
probability of an accident and the extent of damage:
1. weight of the colliding objects;
2. contact cross-section of the colliding objects in the common direction of the collision;
3. number of subsequent collisions;
4. relative vehicle speed before the expected accident;
5. type of impact;
6. vehicle motion during the accident;
7. vehicle occupancy;
8. reaction space.
The above parameters can truly describe the probable severity of an accident. To
obtain information about the possibility of impact, the reaction-space parameter is
added. Figure 38.6 gives the details of the dataset used in this implementation.
This framework relies entirely on local aspects such as trajectory intersection, speed
estimation, and their anomalies. All the experiments carried out on this framework
validate the efficiency and effectiveness of the proposal and thereby confirm that
the system can deliver timely, valuable information to the concerned authorities.
The integration of multiple parameters to assess the likelihood of an accident
enhances the reliability of our framework. Since we focus on a precise region of
motion around the detected, faster vehicles, we can limit the accident events
considered.
Fig. 38.6 Accident image analysis dataset
38.5 Conclusion and Future Work
The proposed system is able to recognize accidents efficiently, with a 71% detection
rate and a 0.53% false alarm rate on accident footage captured under a range of
ambient conditions such as daylight, night, and snow. The experimental results are
encouraging and show the capability of the proposed system. Nevertheless, one of
the limitations of this work is its inapplicability to very high-density traffic, because
of errors in vehicle detection and tracking; this will be addressed in future work. In
addition, large obstacles blocking the field of view of the cameras may also affect
the tracking of vehicles and in turn the collision detection. Faster R-CNN is a
tweaked variant of fast R-CNN. Fast R-CNN uses selective search to locate the
regions of interest, which is a slow and time-consuming approach: detecting objects
takes about 2 s per image, which is substantially faster than R-CNN, but when
dealing with big real-world datasets even fast R-CNN appears to be slow. Faster
R-CNN replaces the selective search method with a region proposal network,
resulting in a substantially faster process.
References
1. Zhou, B., Cao, J., Zeng, X., & Wu, H. (2010, September). Adaptive traffic light control in
wireless sensor network-based intelligent transportation system. In 2010 IEEE 72nd Vehicular
Technology Conference-Fall (pp. 1–5). IEEE.
2. Tian, D., Zhang, C., Duan, X., & Wang, X. (2019). An automatic car accident detection method
based on cooperative vehicle infrastructure systems. IEEE Access, 7, 127453–127463.
3. Henry, X. L., Oh, J. S., Oh, S., Chau, L., & Recker, W. (2001). On-line traffic signal control
scheme with real-time delay estimation technology. In California partners for advanced transit
and highways (PATH). Working papers: Paper UCB-ITS-PWP-2001.
4. Nellore, K., & Hancke, G. P. (2016). A survey on urban traffic management system using
wireless sensor networks. Sensors, 16(2), 157.
5. Janahan, S. K., Veeramanickam, M. R. M., Arun, S., Narayanan, K., Anandan, R., & Parvez, S.
J. (2018). IoT based smart traffic signal monitoring system using vehicles counts. International
Journal of Engineering & Technology,7(2.21), 309–312.
6. Jadhav, P., Kelkar, P., Patil, K., & Thorat, S. (2016). Smart traffic control system using image
processing. International Research Journal of Engineering and Technology (IRJET), 3(3),
2395-0056.
7. Amaresh, A. M., Bhat, K. S., Ashwini, G., Bhagyashree, J., & Aishwarya, P. (2019, May).
Density based smart traffic control system for congregating traffic information. In 2019 Inter-
national Conference on Intelligent Computing and Control Systems (ICCS) (pp. 760–763).
IEEE.
8. Ghazal, B., ElKhatib, K., Chahine, K., & Kherfan, M. (2016, April). Smart traffic light
control system. In 2016 Third International Conference on Electrical, Electronics, Computer
Engineering and Their Applications (EECEA) (pp. 140–145). IEEE.
9. Atta, A., Abbas, S., Khan, M. A., Ahmed, G., & Farooq, U. (2020). An adaptive approach: Smart
traffic congestion control system. Journal of King Saud University-Computer and Information
Sciences, 32(9), 1012–1019.
10. Shaaban, K., Khan, M. A., Hamila, R., & Ghanim, M. (2019). A strategy for emergency
vehicle preemption and route selection. Arabian Journal for Science and Engineering, 44(10),
8905–8913.
11. Jindal, V., Bedi, P., Dhankani, H., & Garg, R. ATSOT: Adaptive traffic light system using mOtes
based on threshold.
12. Padmavathi, G., Shanmugapriya, D., & Kalaivani, M. (2010). A study on vehicle detection and
tracking using wireless sensor networks. Wireless Sensor Network, 2(02), 173.
13. Liu, Y., & Li, S. (2011, July). A vehicle-logo location approach based on edge detection and
projection. In Proceedings of 2011 IEEE International Conference on Vehicular Electronics
and Safety (pp. 165–168). IEEE.
14. Lee, K. B., & Shin, H. S. (2019, August). An application of a deep learning algorithm for auto-
matic detection of unexpected accidents under bad CCTV monitoring conditions in tunnels.
In 2019 International Conference on Deep Learning and Machine Learning in Emerging
Applications (Deep-ML) (pp. 7–11). IEEE.
15. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection
with region proposal networks. Advances in neural information processing systems, 28.
Chapter 39
An Efficient Machine Learning Approach
for Apple Leaf Disease Detection
K. R. Bhavya, S. Pravinth Raja, B. Sunil Kumar, S. A. Karthik,
and Subhash Chavadaki
Abstract Apples are one of the most popular agricultural products. Despite being
one of the most widely grown commodities, apple demand is on the rise. As a result,
this crop, which was formerly only grown in temperate climates, is now being grown
in tropical climates. Pest and disease infestations are a major issue that affects apple
output each year. In this paper, an approach has been made which combines machine
learning and image processing concepts to identify diseases from infected apple
leaves. This method effectively differentiates between diseased and non-diseased
apple leaves. Pre-processing of the image is done using grab cut segmentation which
is the primary stage in the disease identification process. The infected type from the
original leaf image is recognized by 96% using the segmentation of the diseased
portion, and multiclass SVM detects the infected type from 500 images using the
feature extraction.
39.1 Introduction
One of the most widely produced crops is the apple. According to the Food and
Agriculture Organization, the world produced 125,377 kilotons of fresh apples in
2019. Apples help protect against a wide range of ailments, including diseases such
as cancer and diabetes. They are high in vitamins and fibre, which are good for
health and help maintain the health of our bodies and brains, as well as the
development of a robust immune system. Apple cultivation makes a substantial
contribution to the country's economy.
K. R. Bhavya (B)·S. Pravinth Raja
Department of CSE, Presidency University, Bengaluru, India
e-mail: bhavyacs08@gmail.com
B. Sunil Kumar
Department of CSE, Annamacharya Institute of Technology and Science, Tirupati, India
S. A. Karthik
Department of C&IT, REVA University, Bengaluru, India
S. Chavadaki
Department of Mechanical Engineering, GITAM University, Visakhapatnam, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_39
Apple production is impacted by diseases such as cedar apple rust and black rot.
As a result, many farmers are facing significant losses, and most of them are having
financial problems because of their inability to repay their loans on time. Early
discovery of these diseases can therefore help limit the number of apples lost. Most
of the time, individuals try to diagnose ailments by looking at them. Because apple
fruits and leaves come in a variety of forms and colours, people may occasionally
make the wrong choice; plant diseases themselves come in a variety of forms. As a
result, inaccurate results may be obtained, slowing down performance, and one may
not receive the required result on time [1]. Consequently, increased production
rates, plant monitoring by farmers, and automation are required [2, 3].
Most leaf defect detection techniques entail the use of expensive, high-performance
devices, and it is also difficult to achieve excellent precision. Furthermore, additional
human resources are needed to diagnose these disorders from leaves with the naked
eye. As a result, this study provides an approach based on current technological
advancements. The disease can cause a variety of symptoms in leaves, which can
be identified by applying image processing techniques.
The entire article is organized as follows: Sect. 39.2 briefly highlights related work
carried out in the domain of apple leaf disease detection. Section 39.3 describes the
detection of apple leaf disease using the multiclass SVM approach. Results analysis
to determine the effectiveness of the proposed methodology is presented in
Sect. 39.4. Finally, a conclusion is drawn based on the results obtained by the said
approach.
39.2 Related Work
Detecting diseases from plant leaves has been performed using a variety of
techniques. Padol and Yadav [4] used support vector machine classifiers to
efficiently identify categories of grape leaf diseases. The authors used nine colour
and texture features for the SVM after segmenting with K-means clustering, and
achieved an accuracy of 88.89% across 137 images. Gavhale et al. [5] utilized the
same method to diagnose anthracnose and canker disease, achieving 95% accuracy
with 200 and 100 images. Hossain et al. [6] utilized a support vector machine to
identify three varieties of leaf diseases, obtaining an accuracy of 90%; their
approach could automatically choose the optimal cluster. Using superior
pre-processing techniques and the GLCM matrix, the suggested methodology has
increased efficiency. On 30,000 images, Sladojevic et al. [7, 8] utilized a
convolutional neural network classifier to recognize 13 categories of damaged
leaves; their technique has a 96.3% accuracy rate in detecting diseases. From
87,848 images, their proposed approach can identify 58 diseases in 25 plants with
99.53% accuracy, and they employed fine-tuning to significantly improve overall
accuracy. Using 54,306 images as the data set, Mohanty et al. [9] identified 26
types of diseases amongst 14 different crop varieties with 99.35% accuracy. The
time required to classify them is longer than that required by the proposed method,
and the system's accuracy drops when the test data is somewhat changed, for
example when several leaves appear in one image or the leaves are light in colour.
Sannakki et al. [10] also used neural networks to investigate grape leaf diseases.
They employed K-means clustering to segment the data, and thresholding was
applied to remove the green pixels. Because they used the backpropagation
approach, they achieved greater accuracy, although their data set was restricted.
All of these systems use fewer than 10 leaf image characteristics, compared to the
methods described in Sect. 39.3. Furthermore, using only NN classifiers, it is
challenging to tune the neurons on a limited sample.
Artificial intelligence has become more popular in analysing physical features,
particularly in agriculture, where physical state or quality is critical [11, 12]. Fuzzy
logic, for example, was employed to divide coconut fruits into three phases of
maturity: malauhog, malakanin, and malakatad [13]. There is currently no standard
procedure in place for determining the maturity of coconuts [14]. The VGGNet16
model was determined to be the most accurate of the three models tested, with
accuracies of 95.75%, 95.9%, and 98.75%, respectively. To determine whether a
tomato leaf is healthy or infected with phoma rot, leaf miner, or target spot, a deep
convolutional neural network was developed [15]. Plant leaf disease recognition
accuracy was 91.67%, and 95.75% for the model using transfer learning.
The major goal of our research is to develop a system that can detect the quality of
an apple (Malus domestica) leaf after colour and texture feature extraction; the
model must be able to distinguish between a healthy and an infected leaf [16–18].
If the leaf is found to be infected, the system will classify it according to the disease
kind. The system will have to distinguish between three forms of apple leaf disease:
Venturia inaequalis, Botryosphaeria obtusa, and Gymnosporangium juniperi-virginianae.
39.3 Proposed Method
Our proposed method comprises five steps, considering parameters such as colour
and texture, with the training procedure driven by a multiclass SVM classifier. The
proposed methodology is illustrated in Fig. 39.1.
Fig. 39.1 Disease detection model (pipeline: apple leaf images from Kaggle → pre-processing of apple leaves → segmentation → texture and colour extraction of the segmented leaf → disease detection)
39.3.1 Data Acquisition
The detection and classification models were trained and tested using 500 single-apple (Malus domestica) leaf images from Kaggle: 200 images are labelled as healthy leaves, while the other 300 are labelled as infected. For the detection model, each image data set of healthy and infected leaves is randomly split into two parts: 80% for training the system and 20% for testing its accuracy. The infected-leaf data set was further divided into three subgroups, each reflecting one of three types of apple disease: apple scab, black rot, and cedar apple rust. Some images were taken in the laboratory, while others were taken outdoors, and all of the images were drawn from eight distinct apple species. Each sub-data set likewise had an allocation of 80% for model training and 20% for accuracy testing. The following classes are identified by our model:
Black rot (Botryosphaeria obtusa).
Cedar apple (Gymnosporangium juniperi-virginianae).
Apple scab (Venturia inaequalis).
Healthy apple leaves.
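The 80/20 split described above can be sketched as follows (a minimal illustration; image loading and file handling are omitted, and the label lists stand in for the actual image files):

```python
import random

def split_dataset(items, train_frac=0.8, seed=42):
    """Randomly split a list of samples into train/test subsets."""
    rng = random.Random(seed)
    shuffled = items[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 200 healthy + 300 infected labels, as in the data set described above
labels = ["healthy"] * 200 + ["infected"] * 300
train, test = split_dataset(labels)
print(len(train), len(test))   # 400 train, 100 test
```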
39.3.2 Image Segmentation
Pixels in apple leaf were separated from the background pixels using graph cut
segmentation. CIE 1976 L*a*b* values were assigned to the raw RGB images. The ranges for L*a*b* are then configured as follows: 0–100 for L*, −86.1827 to 98.2343 for a*, and −107.8602 to 94.4780 for b*. As illustrated in Fig. 39.2,
scribbles were created in the foreground and background of the image, which was then transformed to the CIELab colour space using a procedure known as lazy snapping. After lazy snapping, the RGB vegetation region, where the apple leaf pixels are located, was created. An RGB vegetation surface is required for colour and texture extraction. In binary mode, the marked region now matches the apple leaf pixels. This method guarantees that the affected surfaces are extracted together with the healthy ones.
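In practice a library routine (e.g. scikit-image's `rgb2lab`) would perform this conversion; as a self-contained sketch, the standard sRGB → XYZ → CIELAB transform under a D65 white point is:

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] to CIE 1976 L*a*b* (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve -> linear RGB
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # Linear RGB -> XYZ (sRGB matrix, D65)
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T
    # Normalise by the D65 reference white
    xyz /= np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

print(rgb_to_lab([1.0, 1.0, 1.0]))  # white -> approx [100, 0, 0]
```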
39.3.3 Feature Extraction
The image data sets were subjected to colour and texture feature extraction [19–25]. Twelve data points were collected for the colour feature extraction: R, G, B, H, S, V, L*, a*, b*, Y, Cb, and Cr. Five data points were collected for the texture feature extraction: contrast, correlation, energy, homogeneity, and entropy, as given in Table 39.1.
39 An Efficient Machine Learning Approach for Apple Leaf 423
Fig. 39.2 Image segmentation: (Left) Lazysnapping, (Right) annotated vegetation region
Table 39.1 Feature extraction parameters

Features      Equations                                                  Description
Entropy       −Σ_{a,b=0}^{K−1} P_d(a,b) log P_d(a,b)                     It is a measure of texture in the fruit image
Energy        Σ_{a,b=0}^{K−1} P_d(a,b)²                                  It is a measure of matching pixel degree repetition
Contrast      Σ_{a,b=0}^{K−1} (a − b)² P_d(a,b)                          It is a measure of intensity difference between a pixel and its neighbour
Homogeneity   Σ_{a,b=0}^{K−1} P_d(a,b) / (1 + |a − b|)                   Closeness
Correlation   Σ_{a,b=0}^{K−1} P_d(a,b)(a − μ_a)(b − μ_b) / (σ_a σ_b)     Joint probability
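The GLCM statistics of Table 39.1 can be computed directly from a normalised co-occurrence matrix P_d; a NumPy sketch (the toy counts below are illustrative, not from the data set):

```python
import numpy as np

def glcm_features(P):
    """Texture features of Table 39.1 from a co-occurrence matrix P."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()                       # make P a joint probability, sum = 1
    K = P.shape[0]
    a, b = np.indices((K, K))             # grey-level index grids
    nz = P > 0                            # avoid log(0) in the entropy term
    mu_a, mu_b = (a * P).sum(), (b * P).sum()
    sd_a = np.sqrt(((a - mu_a) ** 2 * P).sum())
    sd_b = np.sqrt(((b - mu_b) ** 2 * P).sum())
    return {
        "entropy": -(P[nz] * np.log(P[nz])).sum(),
        "energy": (P ** 2).sum(),
        "contrast": ((a - b) ** 2 * P).sum(),
        "homogeneity": (P / (1 + np.abs(a - b))).sum(),
        "correlation": ((a - mu_a) * (b - mu_b) * P).sum() / (sd_a * sd_b),
    }

# Toy 4x4 co-occurrence counts (e.g. from a quantised leaf patch)
counts = np.array([[4, 2, 0, 0],
                   [2, 4, 1, 0],
                   [0, 1, 6, 1],
                   [0, 0, 1, 2]])
print(glcm_features(counts))
```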
Feature selection was applied to the 17 data points, or predictors, extracted from the colour and texture features. Feature selection is the process of limiting the input variables, or predictors, to enhance the performance of a statistical model. This study employed the neighbourhood component analysis (NCA) feature selection method. Only three predictors, R, V, and b*, were found to have a significant impact in detecting whether an apple leaf is infected or healthy, as shown in Fig. 39.3.
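The study uses MATLAB's NCA for feature selection, which has no one-line NumPy equivalent; as a hedged stand-in (not NCA itself), a simple Fisher-score filter illustrates the same idea of ranking predictors by class-discriminative power:

```python
import numpy as np

def fisher_scores(X, y):
    """Rank predictors by between-class over within-class variance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    overall = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)

# Toy data: 3 predictors, only the first separates the two classes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.random(100) > 0.5).astype(int)
X[:, 0] += 5 * y                      # make predictor 0 informative
scores = fisher_scores(X, y)
print(scores.argmax())                # predictor 0 ranks highest
```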
39.3.4 Disease Detection
To develop and compare the accuracy of the trained models, the Classification Learner app in MATLAB is utilized. The data set is split into many folds and the accuracy estimated with k-fold cross-validation; the number of folds is set to ten to prevent overfitting.
Fig. 39.3 Predictors 1, 6, and 9, which correspond to R, V, and b*, have important effects in defining apple health
The accuracies of the training and testing stages were used to compare six SVM models: linear, quadratic, cubic, fine Gaussian, medium Gaussian, and coarse Gaussian. The type of kernel function used is the only difference between these models; the fine, medium, and coarse Gaussian models, in turn, differ in the kernel scale chosen. Kernel scaling is critical for every SVM model to improve performance. The fine, medium, and coarse Gaussian SVM kernel scale values were 0.43, 1.7, and 6.9, respectively. Hyperparameter optimization was turned off, and the multiclass method was set to one-vs-one.
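Kernel scaling here follows MATLAB's convention K(x, y) = exp(−‖x − y‖²/s²), where s is the kernel scale; a small sketch shows how the fine (0.43), medium (1.7), and coarse (6.9) scales change the kernel value for two points one unit apart:

```python
import numpy as np

def rbf_kernel(X, Y, kernel_scale):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / s^2), s = kernel scale."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / kernel_scale ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0]])   # two points one unit apart
for name, s in [("fine", 0.43), ("medium", 1.7), ("coarse", 6.9)]:
    K = rbf_kernel(X, X, s)
    # smaller kernel scale -> similarity decays faster with distance
    print(f"{name:7s} scale={s:>4}: K(x1, x2) = {K[0, 1]:.4f}")
```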
39.4 Results and Discussion
In our proposed method, we included 600 images of apple leaves from Kaggle [26]. All of the images were 512 × 512 pixels in size, and a Windows 10 machine with 16 GB of RAM was used to check the efficacy of the said methodology. There are 271 healthy leaves and 329 unhealthy leaves in our sample. In the experimental study, 65% of the data set was utilized for training, i.e. 325 images were used for training and 175 for testing. Using this split, we obtain good results.
Three examples of leaf images, two infected and one healthy, are displayed in Fig. 39.4 as final outputs of our system. The performance of the entire system is reviewed below.
Performance Evaluation
We analysed the working performance using a confusion matrix and a receiver operating characteristic curve. We also calculated the F1-score to evaluate accuracy.
Fig. 39.4 Input and output of the proposed method
Table 39.2 gives the validation results for the test data, as may be seen in the confusion matrix, which covers Category 1, Category 2, and Category 3. For this project, we utilized 500 images. For Category 3, the model identifies 169 healthy leaf images out of 171. The results of the other two detections are likewise satisfactory.
The F1-score was calculated at a 65/35% split of training and validation data, and we attain 96% accuracy with our system. Table 39.3 lists the performance measures, and the ROC curve is shown in Fig. 39.5. Accuracy becomes more important when
Table 39.2 Confusion matrix of proposed model

                               Category 1   Category 2         Category 3
                               Black rot    Cedar apple rust   Non-diseased
Category 1 Black rot               92            8                  3
Category 2 Cedar apple rust         7          219                  0
Category 3 Non-diseased             0            2                169
Table 39.3 Performance measures

Category                      Precision (%)   Recall (%)   F1-score (%)
Category 1 Black rot               96             94            95
Category 2 Cedar apple rust        94             98            97
Category 3 Non-diseased            98             96            96
Average                            98             97            96
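Recomputing the metrics from Table 39.2 (assuming rows are actual classes and columns predicted classes) reproduces the 96% overall accuracy reported in the text; per-class figures may differ slightly from Table 39.3 depending on rounding and matrix orientation:

```python
import numpy as np

# Confusion matrix of Table 39.2 (assumed rows = actual, columns = predicted)
cm = np.array([[92,   8,   3],    # Category 1: black rot
               [7,  219,   0],    # Category 2: cedar apple rust
               [0,    2, 169]])   # Category 3: non-diseased

accuracy = np.trace(cm) / cm.sum()            # correct predictions / total
precision = np.diag(cm) / cm.sum(axis=0)      # per class, over predicted counts
recall = np.diag(cm) / cm.sum(axis=1)         # per class, over actual counts
f1 = 2 * precision * recall / (precision + recall)
print(f"overall accuracy = {accuracy:.2%}")   # 480 / 500 = 96.00%
```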
Fig. 39.5 Receiver operating characteristic curve
the curve moves closer to the top-left corner [26]. The figure also includes a micro-average ROC curve, showing the area under the curve for the several classes. The ROC curve reflects a 96% accuracy rate, which is an excellent result.
Indeed, the proposed technique performed very well on these data sets. It not only identifies diseases but also calculates the overall percentage of affected regions on each leaf. Our system takes no more than 3 s to return a result. In addition, the suggested system's performance is compared to relevant works in Table 39.4, which also shows the algorithm that supports each of the works. We can observe that the approach proposed in this study has a 96% accuracy rate, which is greater than that of any of the previous research.
Table 39.4 Comparison of different leaf infection detection methods

References            Method                                               Data set   Accuracy (%)
Hu et al. [24]        Hyperspectral imaging                                Potato     95
Prakash et al. [25]   Image processing and support vector machines         Citrus     90
Asfarian et al. [26]  Texture analysis and probabilistic neural network    Rice       83
Xu et al.             k-nearest neighbour                                  Tomato     82
Proposed work         Graph cut segmentation and support vector machine    Apple      96
39.5 Conclusion
Novel techniques of farming the crop are being developed and tested each year to meet the expanding demand for apples. The growing of apples in tropical climates is one of them. The spread of pests and diseases is a factor that has an impact on apple output each year. This work proposed a hybrid model for detecting, through its leaf, whether an apple tree is healthy or infected and for classifying the type of disease present. Colour and texture data sets are extracted from each apple leaf image. The R, V, and b* values were shown to have substantial implications for detection and classification in this investigation; hence, feature selection was performed. As R, V, and b* are helpful for disease detection and classification, future study can focus on extracting one colour space and then obtaining the required values from other colour spaces to save time.
The objective of this paper is to differentiate between two forms of apple disease, black rot and cedar apple rust, and to detect healthy apple leaves, using image segmentation as a pre-processing stage. Our data is trained and tested using a multiclass SVM. This small contribution will assist farmers in detecting these two severe diseases, and the method proves to be a significant step forward in increasing apple production.
References
1. Kellerhals, M., Tschopp, D., & Roth, M. Challenges in apple breeding (pp. 12–18).
2. Behera, S. K., Rath, A. K., & Sethy, P. K. (2021). Maturity status classification of papaya
fruits based on machine learning and transfer learning approach. Information Processing in
Agriculture, 8(2), 244–250.
3. Syazwani, R., & Nurazwin, W., et al. (2021). Automated image identification, detection and
fruit counting of top-view pineapple crown using machine learning. Alexandria Engineering
Journal.
4. Khan, N., et al. (2021). Oil palm and machine learning: Reviewing one decade of ideas,
innovations, applications, and gaps. Agriculture, 11(9), 832.
5. Patil, P. U., et al. (2021). Grading and sorting technique of dragon fruits using machine learning
algorithms. Journal of Agriculture and Food Research, 4, 100118.
6. Munera, S., et al. (2021). Discrimination of common defects in loquat fruit cv. Algerie’ using
hyperspectral imaging and machine learning techniques. Postharvest Biology and Technology,
171, 111356.
7. Tripathi, M. K., & Maktedar, D. D. (2021). Detection of various categories of fruits and vegeta-
bles through various descriptors using machine learning techniques. International Journal of
Computational Intelligence Studies, 10(1), 36–73.
8. Koyama, K., et al. (2021). Predicting sensory evaluation of spinach freshness using machine
learning model and digital images. PLoS ONE, 16(3), e0248769.
9. Brighty, S., Sahaya, P., Shri Harini, G., & Vishal, N. (2021). Detection of adulteration in fruits
using machine learning. In 2021 Sixth International Conference on Wireless Communications,
Signal Processing and Networking (WiSPNET). IEEE.
10. Rodrigues, B., et al. (2021). Ripe-unripe: Machine learning based ripeness classification. In
2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS).
IEEE.
11. Naz, F., Irshad, G., & Abbasi, N. A. (2018). Surveillance and characterization of
Botryosphaeria obtusa causing frogeye leaf spot of Apple in District Quetta Abstract (Vol.
16, pp. 111–115).
12. Strickland, D., Carroll, J., & Cox, K. (2020). Cedar apple rust.
13. Riffle, J. W., & Peterson, G. W. (1986). Diseases of trees in the great plains (General Technical
Reports—U.S. Department of Agriculture, Forest Service, no. RM-129). https://doi.org/10.
5962/bhl.title.99571
14. Rigor, D. B., Oryan, C., Ochasan, J. M., Boncato, T., Pedroche, N., & Amoy, M. (1997).
Evaluation of temperate zone fruits in the highlands of Northern Luzon, Philippines. Acta
Hortic, 441, 59–66. https://doi.org/10.17660/ActaHortic.1997.441.5
15. Concepcion, R. S., Loresco, P. J. M., Bedruz, R. A. R., Dadios, E. P., Lauguico, S. C., &
Sybingco, E. (2020). Trophic state assessment using hybrid classification tree-artificial neural
network. International Journal of Advances in Intelligent Informatics, 6(1), 46–59. https://doi.
org/10.26555/ijain.v6i1.408
16. Concepcion, R., Lauguico, S., Alejandrino, J., Dadios, E. P., & Sybingco, E. (2018). Lettuce
canopy area measurement using static supervised neural networks based on numerical
image textural feature analysis of Haralick and gray level co-occurrence matrixs. Journal
of Agricultural Science, 156(1), 1. https://doi.org/10.1017/S0021859618000163
17. Javel, I. M., Bandala, A. A., Salvador, R. C., Bedruz, R. A. R., Dadios, E. P., & Vicerra,
R. R. P. (2019). Coconut fruit maturity classification using fuzzy logic. In 2018 IEEE 10th
International Conference on Humanoid, Nanotechnology, Information Technology, Communi-
cation and Control, Environment and Management (HNICEM 2018).https://doi.org/10.1109/
HNICEM.2018.8666231
18. De Luna, R. G., Dadios, E. P., Bandala, A. A., & Vicerra, R. R. P. (2019). Tomato fruit image
dataset for deep transfer learning-based defect detection. In Proceedings of the IEEE 2019 9th
International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 356–361).
https://doi.org/10.1109/CIS-RAM47153.2019.9095778
19. De Luna, R. G., Dadios, E. P., & Bandala, A. A. (2019). Automated image capturing system
for deep learning-based tomato plant leaf disease detection and recognition. In IEEE Region 10
Annual International Conference, Proceedings/TENCON (Vol. 2018, pp. 1414–1419). https://
doi.org/10.1109/TENCON.2018.8650088
20. Chien, C.-L., & Tseng, D.-C. Color image enhancement with exact HSI color model. International Journal of Innovative Computing, Information and Control, 7(12), 6691–6710.
21. Yu, C., Dian-ren, C., Yang, L., & Lei, C. (2010). Otsu’s thresholding method based on gray level-
gradient two-dimensional histogram. In 2010 2nd International Asia Conference on Informatics
in Control, Automation and Robotics (CAR 2010), vol. 3. IEEE, 2010, pp. 282–285.
22. Ehsanirad, A., & Sharath Kumar, Y. H. (2010). Leaf recognition for plant classification using
GLCM and PCA methods. Oriental Journal of Computer Science and Technology, 3(1), 31–36.
23. Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: A
fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39(4), 561–577.
24. Hu, Y., Ping, X., Xu, M., Shan, W., & He, Y. (2016). Detection of late blight disease on potato
leaves using hyperspectral imaging technique. Guang Pu Xue Yu Guang Pu Fen Xi = Guang
Pu, 36(2), 515–519.
25. Prakash, R. M., Saraswathy, G., Ramalakshmi, G., Mangaleswari, K., & Kaviya, T. (2017).
Detection of leaf diseases and classification using digital image processing. In 2017 Inter-
national Conference on Innovations in Information, Embedded and Communication Systems
(ICIIECS) (pp. 1–4). IEEE.
26. Asfarian, A., Herdiyeni, Y., Rauf, A., & Mutaqin, K. H. (2013). Paddy diseases identification
with texture analysis using fractal descriptors based on Fourier spectrum. In 2013 International
Conference on Computer, Control, Informatics and Its Applications (IC3INA) (pp. 77–81).
IEEE.
Chapter 40
Precipitation Estimation Using Deep
Learning
Mohammad Gouse Galety, Fanar Fareed Hanna Rofoo, and Rebaz Maaroof
Abstract In the new computational era, CNNs rule all over; they have proved to be the best models for learning the characteristics and features of extensive-dimension data. The network architecture is defined meticulously to ease out the problem-specific data. Precipitation estimation is the objective of the CNN discussed in this paper. The problem statement is the critical element of estimating precipitation in a typical atmospheric model with all meteorological support of scalar data. Statistical downscaling in the numerical model of the atmosphere, with snapshot data collected on the geo-grid with all dynamical elements, is the input for the CNN used for precipitation estimation. A new model for numerical precipitation estimation, as a data-driven method within a CNN framework, is the proposal of this paper.
40.1 Introduction
Numerical modeling of the atmosphere is a fundamental task for mathematicians. Several primitive dynamical equations are composed in the numerical model of the atmosphere. There are primitive parameters, such as thermodynamics, kinematics, humidity, heat, radiation, diffusion, and surface water, which guide environmental properties. The notions characterized by the parameters are discretized and eased out for many kinds of analyses using computing technologies, of which precipitation analysis is the most elementary. The properties of particle coalescence, density imbalance, phase change, and the phenomenal distribution of clouds are simulated with thermal equilibrium and other physical properties during the study of precipitation estimation. A unique faculty of physical science called cloud physics contributes well to the prognosis of atmospheric conditions. Prognostic variables are the primary variables of the atmospheric system. Certain differential properties are
M. G. Galety (B) · F. F. H. Rofoo · R. Maaroof
Department of Information Technology, College of Computer Science and Information
Technology, Catholic University in Erbil, Erbil, Kurdistan 44001, Iraq
e-mail: m.galety@cue.edu.krd
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_40
432 M. G. Galety et al.
evolved from the atmospheric studies that are mapped to specific grid and sub-grid scales during the analysis of predictive variables. Consensus collected on the prognostics and variables is applied with various heuristics, and large averages are computed manually with low predictability. The properties are collected from the atmosphere, and heuristics are used during the discovery of the atmosphere's status. The atmospheric properties are deduced by performing downscaling operations on a sub-grid scale for a particular demographic zone. Scale resolution eliminates discrepancies between the simulated models and real-time applications during downscaling in the atmospheric experiments.
The data-driven model of statistical downscaling is applied to improve accuracy in atmospheric parameters. Statistical parameters of the atmosphere are dynamically down-scaled; a numerical model applies statistical downscaling of the atmospheric parameters, calibrating and customizing the benchmark specifications. The numerical model is ranked by its outperformance in undertaking various schemes. Many unresolved processes become resolved during the processes incurred in the numerical model. Therefore, the phenomenon is a practical implication that demands relevant parameterization techniques and enhances prediction probability.
While fostering the weather forecast requirements, the tasks of precipitation prediction and statistical downscaling to improve accuracy levels draw strong support from machine learning and deep learning, which cater sufficient frameworks to enrich the computational background for the numeric model.
40.2 Related Works
Most of the studies on atmospheric modeling are associated with statistical down-
scaling. Deep neural networks are discussed here to support a brief review of the
statistical downscaling.
40.2.1 Statistical Downscaling
Downscaling is the mechanism of providing higher-resolution predictions. Statistical downscaling in weather forecasting is applied to narrow climatic projections from global climate models down to local models. Statistical downscaling is often contrasted with dynamical downscaling, in which climate models are run at high resolution, also noted as high-resolution climate models (HRCMs). Three approaches are in vogue for applying statistical downscaling in atmospheric modeling: perfect prognosis, weather generators, and model output statistics. Perfect prognosis and model output statistics are approaches chosen depending on the objectives and requirements and on their deterministic nature. The principal weather components, such as precipitation moisture, wind field, corresponding pressure, and specific raw variables, contribute to the composition of predictors in building a linear regression model.
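As a rough illustration of such a regression-based downscaling step, the sketch below fits ordinary least squares from synthetic large-scale predictors to a synthetic precipitation predictand (all predictor names and data here are hypothetical, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 365                                   # one year of daily samples (synthetic)
# Hypothetical standardised predictors: moisture, wind field, pressure
X = rng.normal(size=(n, 3))
true_beta = np.array([1.2, 0.5, -0.8])
precip = X @ true_beta + 0.1 * rng.normal(size=n)   # synthetic predictand

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, precip, rcond=None)
print(np.round(beta[1:], 2))              # recovers coefficients close to true_beta
```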
40 Precipitation Estimation Using Deep Learning 433
The model predictors are instantaneously biased and numerical, so model output statistics will determine the model's prediction accuracy. The consistency of precipitation, and the prediction based on the consistency biases, are the antecedents for ascertaining the validity of precipitation prediction in the 'model output statistics'. Though the model is successful, the data specifications and sources are not stable and guaranteed.
40.2.2 Applications
Learning in DNNs is accomplished as in machine learning. The scope of DNN applications in statistical modeling and prediction is to catalyze the operations of feedback and feed-forward methodologies as computer-aided modeling and prediction. However, deep learning has a different application framework compared with classical machine learning.
Deep neural networks underpin deep learning; statistical downscaling employs deep neural networks to accomplish an operable end-to-end workflow model. The modeling process entreats the feature extraction processes, enabling the subject's pre-engineered features to learn and develop customized features. During learning, the objective of all deep neural networks is to understand the features, which are further customized into multiple levels based on the data representation. The higher the abstraction, the higher the level of learning. The latest features give only an outline of the data rather than details. In statistical downscaling, all resolution levels of the data have to be processed for learning in the deep neural networks. Particularly in precipitation prediction, a local-scale value forms a conglomerate of parameters and defines abstraction in learning.
CNNs are a particular category of deep neural networks that undertake a wholesome building of 'workflow-based frameworks'. Convolutional neural networks contrast with conventional neural networks in their essential capacity for dealing with complex computations through stratified computations on extensive-dimension data. Understanding the model comprises learning the abstract forms of the data; later, for conceptualizing the results and decision-making parameters, a detailed version of the data is incorporated in the process. Thus, the CNN first reduces structural redundancy to foster effective information representation.
Voluminous data is mounted into the numerical simulations of remote sensing observations, as geophysical data possesses intrinsic structure in its time and space collections. Such data poses reliable candidacy for experiments with deep neural networks. Distant weather conditions can be determined using CNNs at feasible rates. The most trending experiments with weather data concern nowcasting, which is possible with CNNs [1–3].
Long-standing deep learning research proposals include convolutional neural networks that suit work on super-resolution data. Parameters of atmospheric data for estimating precipitation using statistical downscaling methods possess and work on high-resolution data sets. Therefore, research has increased on weather nowcasting using convolutional neural networks on super-resolution weather data. A brief comprehension of the atmospheric data is fetched from the experiments on weather data using convolutional neural networks, with a detailed exploration into parameterizing uncertain data, modeling geo-fluids, and dynamic modeling of atmospheric data. Deep neural networks have answered the questions of comprehensive applications on extensive-dimension data, empirically notified observations, and numerical simulations in the improvised precipitation estimation framework.
40.3 Methodology
Many real-time scenarios are based on the formulation of the precipitation estimation problem. During the formulation, we analogize the established models that describe the situations of the circulation-precipitation connection. A windward slope, meridional temperature gradients during winters, and particularly storms define winter precipitation as an extra-tropical process or primarily as a cyclone. Similar situations in mountainous regions are defined as an orographic effect. Using conventional mechanisms, researchers and meteorological scientists lost critical observations for some geographic areas where precipitation penetrates or travels across several landmasses while determining the accurate precipitation estimation. A super-resolution case needs to be employed when the grid- and sub-grid-level precipitation estimations can be prepared from the data.
Energy plays a vital role in the atmosphere: atmospheric energy converts from potential energy to kinetic energy, encompassing baroclinic disturbances, notably as baroclinic waves. The parameters that support the build-up of super-resolution data seem unstable for drawing cyclonic effects, though they provide grid and sub-grid levels of data. Moisture convection and precipitation formation can be inferred from the disturbances dealing with the distinct precipitation distribution pattern.
To understand the characteristics of the atmospheric disturbances, an empirical
precipitation estimation is evolved during the numeric weather prediction.
The circulation design for precipitation estimation is represented with an empirical mapping as follows:

E(P | X, C) = f_c(X; β) (40.1)

In the cited formula, E, P, X, and C represent the expectation operator, the precipitation-based estimate, the predictors, and the local climate conditions, respectively. The empirical mapping for the climate-based condition C is denoted f_c, and the major parameters defining f_c are represented by β.

The precipitation observed daily is referred to by P, representing drawings significant to the target grid. Predicting P from the above formulation requires an essential clarification of f_c, which is obtained through the corresponding training and validation at the chosen spatio-temporal resolutions.
Ultimately, to derive a deterministic precipitation prediction from the cited formula, X is computed with significant clarifications (from realistic values), exhibiting its utilization in the iterations of analysis of the products yielded into observations.
The essential capacities for complex computation on extensive-dimension data are wholly vested in the construction of neural networks; however, since conventional neural networks impose constraints, a deep learning network using convolutions is suitable, as it reduces the structural redundancy of the model while fostering the parameters required for effective information extraction.
The statistical downscaling model projects a geo-grid with a minimum characteristic scale. A stipulated horizontal velocity of the atmosphere is set up for the model's profile. Essential information on the circulation context and the dynamical downscaling on the geo-grid is assumed for the deliverables to suit the atmosphere's dynamics. Particularly in the application of precipitation estimation, a feature vector is introduced to estimate the transformation as the initial activity for a sample four-dimensional physical field, which also represents the circulation and moisture profile. It is also implicit that the relationship between predictors and predictands via the feature vector comprises the predictors' characteristics.
Intervening precipitation estimates are composed of the circulation field at a significant climate scale with respect to the local climate conditions, enabling the features adopted for statistical downscaling in precipitation prediction. A cyclogenesis instance is considered, where precipitation is directly proportional to the local climate conditions that impose cyclones. Where the parameter sets differ globally from the local climate, comprehensive information for the precipitation estimation is not available.
Encoded local values extract the contingencies of circulation and precipitation estimation. Parameters related to circulation and moisture from the extensive-dimension data are fed into a typical CNN architecture, with the support of sensible features, to comprehensively estimate the climate with precipitation at the candidate geo-grid. As CNNs work with pooling, pooling operations direct the parameters with convolutions to generate the expected patterns of parameters in the atmosphere. The higher layers in the network promote the computation of the precipitation variables drawn from the extensive-dimension data, which are classified and filtered with geo-potential height to overcome the circulation constraint.
The moisture constraint related to the expected precipitation is defined as precipitable water. It is known spatially as a coverage constant for the dynamic fields of the geo-grid, whose dimensions are represented as x and y.
Now, the predictors are computed with the circulation and moisture constraints, where c is determined as the predictor's category; the convolutional layer applies a tensor of shape c × c × m × n, with a minimum predefined stride of 3 × 3, as input. The dot product is drawn element-wise from the tensor and the various patches in the input, and is further noted as a C × x × y array. Convolution is performed as the computation of this dot product, while a relative tensor c × c × m × n defines the kernel for the convolution-based layer. In the kernel tensor, c and m × n are the channel numbers and the receptive field of the output.
The results obtained in the computation are further treated with a nonlinear transformation f, imposed as a ReLU function, f(x) = max(0, x). The convolution-based operation c utilizes filters to convert the input toward the essential objectives of learning. The geo-grid's critical local dynamic design is the resource for learning at the activated filters.
Hence, local features contribute to learning at an abstract level on the geo-grid's receptive field at the higher layers of the convolution framework. The maximum of a local patch is computed from a feature map of the convolutional neural network in the pooling layer. The nature of the convolutional neural network is complex, but non-linearity at the pooling stages suffices for the stacked parameter problems with the features. Subsequent incidence of the input on the dense base layers will succeed in extracting all known attributes for the benefit of the target being learned.
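The convolution, ReLU, and max-pooling operations described above can be sketched in plain NumPy (a toy single-channel illustration, not the authors' actual architecture; the 8 × 8 field and averaging kernel are made up):

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid 2-D convolution: dot product of the kernel with each patch."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * k).sum()
    return out

def relu(x):
    """f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Maximum over non-overlapping size x size patches."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return x[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

field = np.arange(64.0).reshape(8, 8)     # toy 8x8 dynamical-field snapshot
kernel = np.ones((3, 3)) / 9.0            # 3x3 averaging filter
feat = max_pool(relu(conv2d(field, kernel)))
print(feat.shape)                          # (3, 3)
```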
A temporal collection of snapshots of the dynamical field on the selected geo-grid is recorded and averaged to enable the convolution to provide the total daily precipitation and, finally, to figure out the patterns as a unique picture of precipitation estimation. A CNN model can thus map the dynamic fields of the geo-grid, as a temporally collected snapshot, to a daily precipitation estimate.
Compared to conventional neural networks, more layers define the complex computations on a geo-grid's complex-structured, large-dimension dynamical data. Traditional machine learning and probabilistic networks could only project the latent values. Hence, the training and validation processes in the CNN model thrust on predicting the data exceptionally well. Regularization is the process of avoiding overfitting: dropout and batch normalization can be accomplished as modules after employing the CNN, which ensures the improvement of the model. Limitations in the training and validation process adapt continuous requirements to the input layers while refining the learning process.
Further, backpropagation shall also be employed in the network models to improve on the extensive-dimension data with dynamic climate parameters drawn for a geo-grid. The tuning or adjustment of parameters is decided by the stride size, which shall lead to improvement of the learning rate, using the gradient descent method. For every known loss, a function is computed in the gradient, which attributes to the temporal circulation snapshot of the geo-grid's dynamical field, where the outputs are summed up.
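The gradient-descent update described above can be sketched as follows (a generic illustration on a toy quadratic loss, not the chapter's actual training loop):

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly apply w <- w - lr * grad(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Quadratic loss L(w) = ||w - target||^2 with gradient 2 * (w - target)
target = np.array([3.0, -1.0])
w = gradient_descent(lambda w: 2 * (w - target), w0=[0.0, 0.0])
print(np.round(w, 4))                 # converges to ~[3, -1]
```

A learning rate that is too large makes the iterates diverge, while one that is too small slows convergence; this is the tuning trade-off the text alludes to.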
40.4 Discussion
Among the computational theories, the methods converge and connect entirely to the hypothesis of the problems and to the experimentation with the highest propensity of proving the methods. CNNs can deal with extensive-dimension data with coarse, uncertain, and formalizable properties. The CNN can relentlessly perform computations, yielding local and global climate modeling decision-making patterns from relevant temporally collected snapshots. In precipitation estimation using a numerical model,
40 Precipitation Estimation Using Deep Learning 437
the CNN is prudent with strengths of receptive meadow and super-resolution dynam-
ical data of the geo-grid. A single geo-grid may be input for an elementary test to
achieve the best descriptive results. As more features are vectorized, more learning
parameters must be deduced to work with the CNN; the overall complexity of the
CNN is the multiply-and-stride work of the deciding kernels, which is almost
negligible compared with the tasks of the numerical model in statistical
downscaling for precipitation estimation.
40.5 Conclusion
Prediction of precipitation alleviates catastrophic problems and aids the meteoro-
logical balance of the atmosphere. Thrifty usage and management of natural resources
can maintain the atmospheric balance; computations of a numerical atmosphere model
are needed in times of untoward imbalance. The CNN introduces models that overcome the
dimensionality of the data and improve performance compared with conventional
precipitation estimation by numerical weather forecasting. The statistical down-
scaling model with convolutional neural networks is implemented to estimate the
atmosphere's precipitation abundance using temporally collected snapshots.
References
1. Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for
water resources scientists. Water Resources Research, 54(11), 8558–8593.
2. Miikkulainen, R., et al. (2019). Evolving deep neural networks. In Artificial intelligence in the
age of neural networks and brain computing (pp. 293–312). Academic Press.
3. Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey.
Computers and Electronics in Agriculture, 147, 70–90.
4. Pinardi, N., et al. (2017). From weather to ocean predictions: a historical viewpoint. Journal
of Marine Research, 75(3), 103–159.
5. Kalnay, E. (2019). Historical perspective: Earlier ensembles and forecasting forecast skill.
Quarterly Journal of the Royal Meteorological Society, 145, 25–34.
6. Zarekarizi, M., Rana, A., & Moradkhani, H. (2018). Precipitation extremes and their relation
to climatic indices in the Pacific Northwest USA. Climate Dynamics, 50(11–12), 4519–4537.
7. Akatsuka, S., Susaki, J., & Takagi, M. (2018). Estimation of precipitable water using numerical
prediction data. Engineering Journal, 22(3), 257–268.
8. Galety, M., Al Mukthar, F. H., Maaroof, R. J., & Rofoo, F. (2021). Deep neural network concepts
for classification using convolutional neural network: A systematic review and evaluation.
Technium: Romanian Journal of Applied Sciences and Technology,3(8), 58–70. https://doi.
org/10.47577/technium.v3i8.4554
9. Gouse, G. M., Haji, C. M., & Saravanan. (2018). Improved reconfigurable based lightweight crypto
algorithms for IoT based applications. Journal of Advanced Research in Dynamical & Control
Systems, 10(12), 186–193.
10. Reshma, G., Al-Atroshi, C., Nassa, V. K., Geetha, B., Sunitha, G., Galety, M. G., &
Neelakandan, S. (2022). Deep learning-based skin lesion diagnosis model using dermoscopic
images. Intelligent Automation and Soft Computing, 31, 621–634.
Chapter 41
The Adaptive Strategies Improving
Digital Twin Using the Internet of Things
N. Venkateswarulu, P. Sunil Kumar Reddy, O. Obulesu, and K. Suresh
Abstract Digital twins for factories and processes are becoming more prevalent
and more valuable as a result of recent technological breakthroughs and the rise of
smart manufacturing. There is also more potential for closed-loop analytics with
digital twins, given the rise of connectivity, data storage, and the Industrial
Internet of Things (IIoT). Some factories have employed discrete event simulations
(DES) to construct digital twins that are connected to the manufacturing floor and
can be monitored in real time. However, it is difficult to quantify the advantages
of a digital twin that is linked to the real world. With the emergence of the new
generation of mobile network (5G), Tactile Internet, as well as the deployment of
Industry 4.0 and Health 4.0, multimedia systems are moving towards immersed
haptic-enabled human–machine interaction systems such as the digital twin (DT).
Specifically, Industry 4.0 will be using DT and robots on a large scale, increasing
human–machine interaction to a great extent. Multimodal communications, especially
haptics, will be used to interact with digital twins and robots.
Hence, the Tactile Internet will replace today's conventional Internet. In fact, a DT
system can also be extended to the Health 4.0 domain to act as a COVID-19
early warning system. When a person's temperature and other symptom data are
tracked in real time, it may be determined whether or not it is time to see a doctor or
undergo a COVID examination. In conjunction with a COVID tracing programme,
the digital twin may be able to provide further information about the virus in relation
to the individual. Since there are currently no well-recognized models to evaluate the
performance of these systems, to address this research lacuna we propose a Quality
of Experience (QoE) model for DT systems containing multiple levels of subjec-
tive, objective, and physiopsychological influencing factors. The model is itemized
through a fully detailed taxonomy that deduces the perceived user's emotional and
physical states during and after consuming spatial, temporal, proximal, and abstracted
multi-modality media between humans and machines.
N. Venkateswarulu · O. Obulesu (B)
Department of CSE, GNITS, Hyderabad 500104, India
e-mail: Obulesh196@gmail.com
P. Sunil Kumar Reddy
Tibco Software India Ltd, Hyderabad, India
K. Suresh
Department of CSE, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_41
41.1 Introduction
In recent years, research into the concept of “Industry 4.0” has exploded; robotics,
visual systems, autonomous control, and closed-loop feedback are fundamentally
changing current production practices in the process [1]. New production technolo-
gies give rise to Industry 4.0, and there are plenty of options for research and
development on increasing quality control through it [2]. This change has occurred
as a result of a wide range of factors, including the tremendous amount of data
now available to owners and operators and the recent advances in technology that
have been integrated into factories.
“Internet of Things” (IoT) is an umbrella term that refers to the expansion of the
Internet into the physical world through the widespread deployment of devices that
have embedded identification and sensing capabilities. According to ITU-T Y.2060,
“The Internet of Things (IoT) is an infrastructure that enables improved services
by networking both physical and virtual things, based on existing and evolving
interoperable information and communication technologies”. The fourth industrial
revolution originates from the advent of the IoT. Through the advancement of information capture and
sharing, processing and communication, systems may interact to deliver services not
formerly attainable. Tangible objects, the things, exist in the physical world and are
capable of being sensed and attached to others in their network. These things often
have a virtual controller that enables interaction and data exchange
with sensory things in the environment. This infrastructure manifests itself in the
domain of Industry 4.0 as a technology known as cyber-physical systems (CPS).
Khaitan and McCalley [3] define CPS as “systems in which physical and software
components are closely coupled, each working on a different spatial and temporal
scale, exhibiting diverse and distinct behaviours, and interacting with one another in a
multitude of ways that change with context”. Cybernetic capabilities in every compo-
nent, high levels of automation, different scales of networking, and multiple levels of
integration across time and space are all necessary CPS aspects. A modular dynamic
that allows for reorganization and reconfiguration is another essential CPS element.
Next-generation innovation requires haptic contact with audiovisual feedback, as
well as technology solutions that allow not only visual engagement but also robotic
systems to be controlled and guided with an imperceptible time delay. The Tactile
Internet (TI) will be created as a result of this new wave of innovation. According
to the perspective of El Saddik [1], a digital twin
is an exact digital reproduction of a live or nonliving physical thing. Data can be sent
between the physical and virtual worlds without interruption, allowing the virtual
and physical worlds to coexist in real time. In order to monitor, comprehend, and
optimize the physical entity, a digital twin is an integrated multimedia system that
enriches the pathways and gives continuous input to improve quality of life and
well-being. Artificial intelligence (AI), mixed reality (MR), haptics, IoT, cyber-
security, the Tactile Internet (TI), industry, and health are some of the technologies
that make up a digital twin, in addition to the quality of experience in particular
(Fig. 41.1).
Fig. 41.1 Digital twin technology
A key component of the digital twin vision is the optimization of multi-modal
communication/interaction from the user's perspective, which has so far received
little attention from industrial and academic stakeholders. As a result, the demand
for extensive research and development in this domain is still growing and remains
at an infancy stage.
41.2 Relevant Work
Due to the COVID-19 outbreak, most daily activities such as work, research,
and education take place online rather than in person. According to Feldmann
et al. [4], since March 2020 traffic to applications for teleworking and online
education, such as VPN and video conferencing, has increased by more than
200 per cent on an annual basis. Because of this, there has been a significant increase
in Internet traffic. As a matter of fact, according to a recent CNN news piece, popular
video providers such as Netflix and YouTube are slowing down their operations in
North America and Europe in order to prevent the Internet from crashing completely.
The COVID-19 pandemic has altered how we live, breathe, and do business. It has
brought the global economy to a halt, posing insurmountable continuity problems
to every industry. With that said, there is an increasing demand to enable
digital twin technology, ultimately over the Tactile Internet.
When stepping into the era of 5G, the emergence of the Tactile Internet (TI) can
help haptic communication enter the new world for digital twins’ applications. The
most directly related is people’s health and well-being in life. Up to now, people
442 N. Venkateswarulu et al.
Fig. 41.2 Industry 4.0 using digital twin (DT) architecture over the Tactile Internet
who live in some suburban areas still suffer from not getting high-quality and timely
treatment. With the help of superb network conditions and the support from the haptic
technical system, the patients whose physical conditions are not allowed to travel
in long distance to seek medical treatment can be diagnosed and treated at home or
local hospital by some experienced doctors at remote places all over the world. One
of its well-known applications is telesurgery, as illustrated in Fig. 41.2.
41.3 Proposed Work
The spectrum follows the predicted decay pattern, and the same pattern was observed
in all of the participants studied. Looking at Fig. 41.2, it can clearly be seen that the
median frequency patterns of muscles such as the deltoid diminish over time as a
sign of muscle exhaustion. Because the EMG MDF signal fits the second-order
polynomial regression model, we could utilize its slope as a good predictor of muscle
fatigue in that respondent.
Algorithm 1 MDF fatigue measurement
1: procedure SUBJECT'S MUSCLE
2:   for each subject do
3:     for each muscle do
4:       GET EMG signal
5:       GET Accy signal
6:       do FFT of sEMG(ω)
7:       Calculate MDF
8:       V = second-order polynomial regression fit of MDF
9:     end for
10:    FI_MDF = d²V/dt²
11:  end for
12: end procedure
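Algorithm 1 can be sketched in Python roughly as follows (an illustrative sketch: the sampling rate, window length, and signal names are assumptions, not values given in the chapter):

```python
import numpy as np

def median_frequency(emg_window, fs):
    """Median frequency (MDF) of one EMG window: the frequency that
    splits the power spectrum into two halves of equal power."""
    spectrum = np.abs(np.fft.rfft(emg_window)) ** 2
    freqs = np.fft.rfftfreq(len(emg_window), d=1.0 / fs)
    cumulative = np.cumsum(spectrum)
    idx = np.searchsorted(cumulative, cumulative[-1] / 2.0)
    return freqs[idx]

def fatigue_index(emg, fs, window=256):
    """Fit MDF-over-time with a second-order polynomial V(t) and return
    the curvature d2V/dt2 = 2*a as the fatigue index FI_MDF."""
    n_windows = len(emg) // window
    mdf = [median_frequency(emg[i*window:(i+1)*window], fs)
           for i in range(n_windows)]
    t = np.arange(n_windows, dtype=float)
    a, b, c = np.polyfit(t, mdf, 2)   # V(t) = a t^2 + b t + c
    return 2.0 * a                    # second derivative of V
```

A healthy (non-fatiguing) signal with a stable spectrum yields a fatigue index near zero; a decaying MDF trend yields a nonzero curvature.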
Algorithm 1 is based solely on the high-level reported subjective parameters of
the taxonomy. Both models outperform the others, with a Pearson correlation of
87.4% for the SVM and 86.9% for the normal and generalized regression, in terms of
how the reported QoE varies from its model-predicted level. The error rate for the
SVM was the minimum, with er = 6.5%. This means that the SVM can predict QoE for
the Mukers HVR game with results that deviate only 6.5 points on a scale of 100 from
the reported subjective QoE values. The linear regression model also performs well
and was capable of quantifying the predicted QoE with Eq. 41.1:
QoE_P = α1·CQ + α2·HWQ + α3·NWQ + α4·UX + α5·US + β    (41.1)
where αi represents the weighting factor for each influencing quality and β is the
intercept. With a coefficient of determination of 91.0%, Eq. 41.1 can be realized with a
95% confidence interval as Eq. 41.2:
QoE_P = 5.751·CQ + 0.1848·HWQ + 0.3766·NWQ + 0.1723·UX + 0.0372·US − 11.29    (41.2)
As can be noticed from Eqs. 41.1 and 41.2, CQ carries the most significant factor
weight. This is due to the fact that the users were not aware of the simulated QoS
disturbance in the controlled scenarios, and most of them reported a noticeable
latency that impacted the responsiveness of the game; the physics and the force
feedback from the haptic devices were also interrupted during the HVR interaction. Keep in
mind that Mukers is a dynamic telehaptic VR environment that demands more
communication with the server in order to update the objects' 3D models and their haptic
and locomotion parameters.
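Eq. 41.2 can be evaluated directly as a weighted sum. The sketch below assumes the printed constant 11.29 is a negative intercept, and the factor values passed in are hypothetical inputs on their respective rating scales:

```python
# Coefficients from Eq. 41.2; the intercept sign is assumed negative.
COEFFS = {"CQ": 5.751, "HWQ": 0.1848, "NWQ": 0.3766, "UX": 0.1723, "US": 0.0372}
INTERCEPT = -11.29

def predict_qoe(factors):
    """Linear QoE prediction: weighted sum of influencing factors
    (content, hardware, network, UX, usability) plus the intercept."""
    return sum(COEFFS[name] * value for name, value in factors.items()) + INTERCEPT
```
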
41.4 Results and Discussion
We first consider QoE prediction based on biosignals only. As can be seen in Fig. 41.3, the
SVM again outperforms the other ML algorithms. It should be noted that even with
only biosignals as the input features for the ML algorithms, we achieved an
acceptable prediction level, with the largest relative error, for the Chi-squared Automatic
Interaction Detection (CHAID) algorithm, equal to 16.64%. It is obvious that incor-
porating both subjective and physiopsychological inputs improves the performance
of all six ML algorithms, reaching almost a 92% correlation rate in the case of the SVM.
More statistics about the performance of the six ML algorithms are depicted in Fig. 41.3.
Lastly, we also feed all the attributes to the ML regression pipeline.
Fig. 41.3 Evaluation performance for ML QoE based on biometrics only (red) and subjective +
biometrics; each column is one of the six ML algorithms (the SVM column first; CHAID, with the largest error, last):
Correlation, biometric only: 0.834, 0.833, 0.828, 0.827, 0.833, 0.793
Correlation, subjective + biometric: 0.919, 0.917, 0.915, 0.9145, 0.899, 0.866
Relative error, biometric only: 0.077, 0.094, 0.0968, 0.1017, 0.1029, 0.1664
Relative error, subjective + biometric: 0.054, 0.0569, 0.0601, 0.0637, 0.0974, 0.1147
41.5 Conclusion
We have presented the technology employed in the experiment as well as the research
methodologies used to realize our DT QoE taxonomy, elaborating the reasoning
and justifications that drove the implementation of this experiment. We have also
commented on the variables that were important for this QoE modelling and why
they were selected, and we built multiple QoE prediction models based on machine learning.
These models can automatically estimate the QoE based on explicit users’ subjective
and implicit biomedical features. The models were benchmarked and assessed using
multiple metrics. In future work, we will emphasize the QoE of DT tele-
operation on a large scale, and we will model an important physiological
attribute, i.e., fatigue, when designing a DT system. We compared our technique with
the existing technique in terms of precision and response time.
References
1. El Saddik, A. (2018). Digital twins: The convergence of multimedia technologies. IEEE
Multimedia, 25(2), 87–92.
2. Steinbach, E., Strese, M., Eid, M., Liu, X., Bhardwaj, A., Liu, Q., Al-Ja’afreh, M., Mahmoodi,
T., Hassen, R., El Saddik, A., et al. (2018). Haptic codecs for the tactile internet. Proceedings
of the IEEE, 107(2), 447–470.
3. Khaitan, S. K., & McCalley, J. D. (2014). Design techniques and applications of cyberphysical
systems: A survey. IEEE Systems Journal, 9(2), 350–365.
4. Feldmann, A., Gasser, O., Lichtblau, F., Pujol, E., Poese, I., Dietzel, C., Wagner, D., Wichtl-
huber, M., Tapiador, J., Vallina-Rodriguez, N., et al. (2020). The lockdown effect: Implications
of the covid-19 pandemic on internet traffic. In Proceedings of the ACM Internet Measurement
Conference (pp. 1–18).
5. Arima, R., Sithu, M., Ishibashi, Y., et al. (2017). QoE assessment of fairness between players in
networked virtual 3D objects identification game using haptic, olfactory, and auditory senses.
International Journal of Communications, Network and System Sciences, 10(07), 129.
6. Narumi, T., Kajinami, T., Nishizaka, S., Tanikawa, T., & Hirose, M. (2011). Pseudo-gustatory
display system based on cross-modal integration of vision, olfaction and gustation. In 2011
IEEE Virtual Reality Conference (VR) (pp. 127–130). IEEE.
7. Hassen, R., & Steinbach, E. (2018). Hssim: An objective haptic quality assessment measure
for force-feedback signals. In 2018 Tenth International Conference on Quality of Multimedia
Experience (QoMEX) (pp. 1–6). IEEE.
8. 3D Systems. (2021). Touch haptic device. Online; accessed May 6, 2021.
9. Hoda, M., Hafidh, B., & El Saddik, A. (2015). Haptic glove for finger rehabilitation. In 2015
IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (pp. 1–6). IEEE.
10. Holland, O., Steinbach, E., Venkatesha Prasad, R., Liu, Q., Dawy, Z., Aijaz, A., Pappas, N.,
Chandra, K., Rao, V.S., Oteafy, S., et al. (2019). The IEEE 1918.1 “tactile internet” standards
working group and its standards. Proceedings of the IEEE, 107(2), 256–279.
11. ITU-T Study Group 13. (2012). Recommendation ITU-T Y.2060: Overview of the Internet of
Things.
12. Fettweis, G. P. (2014). The tactile internet: Applications and challenges. IEEE Vehicular
Technology Magazine, 9(1), 64–70.
13. International Telecommunication Union ITU-T. (2014). The tactile internet. ITU-T Technology
Watch Report.
14. Rank, M., Shi, Z., Müller, H. J., & Hirche, S. (2010). Perception of delay in haptic telepresence
systems. Presence: Teleoperators and Virtual Environments, 19(5), 389–399.
15. Ache, B. W., & Young, J. M. (2005). Olfaction: Diverse species, conserved principles. Neuron,
48(3), 417–430.
16. El Saddik, A., Orozco, M., Eid, M., & Cha, J. (2011). Haptics technologies: Bringing touch to
multimedia. Springer Series on Touch and Haptic Systems. Springer.
17. Yuan, Z., Ghinea, G., & Muntean, G.-M. (2014). Quality of experience study for multiple senso-
rial media delivery. In 2014 International Wireless Communications and Mobile Computing
Conference (IWCMC) (pp. 1142–1146). IEEE.
18. Yuan, Z., Chen, S., Ghinea, G., & Muntean, G.-M. (2014). User quality of experience of
mulsemedia applications. ACM Transactions on Multimedia Computing, Communications,
and Applications, 11(1s), 15:1–15:19.
19. Eid, M., & El Saddik, A. (2012). Admux communication protocol for real-time multi-
modal interaction. In Proceedings of the 2012 IEEE/ACM 16th International Symposium on
Distributed Simulation and Real Time Applications, DS-RT ’12 (pp. 118–123), Washington,
DC, USA. IEEE Computer Society.
20. Robles-De-La-Torre, G. (2006). The importance of the sense of touch in virtual and real
environments. IEEE Multimedia, 13(3), 24–30.
Chapter 42
Deep Learning for Breast Cancer
Diagnosis Using Histopathological
Images
Mohammad Gouse Galety , Firas Husham Almukhtar,
Rebaz Jamal Maaroof, and Fanar Fareed Hanna Rofoo
Abstract Deep learning with convolutional neural networks has broad capabilities
for achieving strong learning results across all kinds of medical imaging methods.
The algorithmic mechanism applied to datasets is prudent enough to qualify their efficiency.
Many random textures and structures are found in the histopathological images of
breast cancer, which deal with multi-color and multi-structure components. Most of
the experiments performed in wet labs derive results conventionally, but when
assisted with computational learning models, the accuracy, reliability, and speci-
ficity of the results are boosted empirically. Employing computational methods
using convolutional neural networks in parallel with the conventional classification
experiments for diagnosing malignant breast cancer images attains
satisfactory results for effective decision-support.
42.1 Introduction
Despite the alarming rise in breast cancer counts, many developed countries have
built preventive healthcare systems to reduce the death rate in recent years. A
boon of technologies and awareness across countries are the significant landmarks
that prevent falling prey to this cancerous epidemic. Medical imaging technologies
are deployed in almost all corners of the world for early detection and screening for
effective prevention. However, medical and health care still work with a varied set of
diagnostic tests and mechanisms in parallel with technologies of high sensitivity
and specificity converging on the classification as ‘cancer’ or ‘no cancer.’
Research and literary consensus demonstrate work at multiple levels of resolution
on the histopathological images, and many time-hard analyses in this research feed
into valuable academic theses. Curated datasets are the golden collection of
benchmarks from which experts, with minimal diagnostic risks in computer-
aided diagnosis (CAD), have derived many sophisticated decision-making methods.
M. G. Galety (B) · F. H. Almukhtar · R. J. Maaroof · F. F. H. Rofoo
Department of Information Technology, College of Computer Science and Information
Technology, Catholic University in Erbil, Erbil, Kurdistan Region 44001, Iraq
e-mail: m.galety@cue.edu.krd
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_42
Experts and the research community annotate data for prospective decision-making,
building automated systems with the promising features of reducing risk and
improving efficiency in diagnosis.
42.2 Literature Review
Classification is the process of categorizing and identifying the types of breast cancer
from high-resolution histopathological images, here performed with deep learning
using convolutional neural networks. Classifying benign and malignant tissues is the
first order of work in the prognostic evaluation [1]. Carcinoma, hostility in the tissues,
is studied with many morphological properties to determine the various levels of
severity [2]. Hematoxylin and Eosin (H&E) staining is the mechanism conventionally
used to trace the primary stains on the specimens. From rule-based machine learning
methods to the convolutional neural networks of deep learning, many digital compositions
of histopathological analysis are employed to achieve an automated end-to-end process [3, 4].
Gradient descent and gradient boosting algorithms were prioritized during the
iterations of a CNN model pre-trained on ImageNet. Collections of H&E-stained
histopathological breast cancer images are processed with the pre-trained ImageNet
model. A fully automated computer-aided diagnostic system effectively uses DL and ML to
diagnose breast cancer [5]. Detecting and extracting patches containing masses, and
classifying them, form the core of the CNN frameworks. External interventions in early
screening, detection, and preprocessing, such as removal of the pectoral muscle and
mass segments, need not be encouraged if CAD can perform them effectively.
Many inherent features and relationships in the histopathological images are
discovered using machine learning. The performance of machine learning is further
extended with convolutional neural networks, which stand as a hallmark of intelligent
decision-making. Learning-based critical problems are solved meticulously using
convolutional neural networks, particularly on temporal and spatial datasets [6].
Inter-pathologist variability is an essential consideration in traditional
systems while conducting diagnostic experiments, which are prone to errors and
biases. Using quantitative analyses in the clinical setup, such variations are well miti-
gated [7]. Still, H&E introduces variability in the stained areas, which opens
avenues of research for new contributions toward proving accuracy in image analysis.
Resolution, format, and structure of the images influence the development of
imaging devices. Large images occupy much storage space, and the format and
the structure of the image play a significant role. Whole-slide digital pathology
images (WSI) are examples of the detailed imagery of histopathological breast cancer.
These images are difficult to slice into the hundreds of smaller tiles needed during
the preprocessing and selection stage; hence, the lowest detailed magnification
of the imagery is considered for classification and segmentation tasks. Metadata
is essential for mapping sliced tiles to their WSI positions, and must be integrated
with the image information. The WSI experiments break the barriers of handling errors during
the mapping of sliced images to the whole-image format, subsequently providing
input to the convolutional neural networks to alleviate the challenges of uncertainty,
size, and format. As problems of tumor identification are well dealt with by machine
learning frameworks, AI is the carrier for mitigating the imaging problems and their
sizes without compromising the quality of the identification and description
of the detected tumor, as quoted by Robertson et al. [2, 8].
42.3 Proposed Method
Computer-aided diagnosis (CAD), software that supports medical diagnosis, shall use
datasets, preferably benchmark datasets that are proven for the experimentation
of deep learning and conventional models. Benchmark datasets like BreakHis provide
a clinically relevant public breast cancer histopathology dataset, with different kinds of
trade-offs for practitioners, and constitute a significant study to date; internal clinical
data, however, is worth comparing against the benchmark data.
Pathological specimens from the experiments are thoroughly expedited during
automated examination, a just-in-time process that adopts economy in analyses [9–11]
(Fig. 42.1).
Various resolutions of images and WSI have been considered for the experiments;
based on the degree of magnification of the image, the images are collected
into four major categories under the benign and malignant classes. The preprocessing stage of
breast cancer diagnosis using convolutional neural networks begins with differenti-
ating the images as benign or malignant; they are then sub-categorized as shown
above. For effective classification and sub-classification of the images, a binary classifier
is employed to decide benign versus malignant, overcoming the problem
of iterative re-work.
The convolutional neural network employed on the datasets depends on the
classification requirements. Binary classification can ideally work
under magnification-specific (MS) training scenarios, while multi-category classifica-
tion works on magnification-independent (MI) training strategies. MS
employs various models with different sets of specifically magnified image
datasets, per the WHO identification of benign and malignant cancers. Data abstraction
is considered during the MI implementation, as all magnification levels are
considered, unlike in the MS method.
Fig. 42.1 WHO-recommended classification and pathological categorization of breast cancer
Feature Extraction
Recent research methods are proposed to extract image features, which comprise
constituent blocks such as edges, corners, blobs, clouds, and ridges. Any of these
feature properties can be considered for the analyses of the breast cancer images.
These properties are reflected as pixels representing the color attributes and the
pathological attributes viewed in the physical image. Matrix-based methods,
binary-pattern methods, and color-histogram methods can be used in deep learning
to identify and extract features from the images systematically. These methods, with
complex computations run in parallel with pathological studies, will undoubtedly
compete with wet-lab experimentation.
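As an illustration of the color-histogram family of methods mentioned above (a minimal sketch, not the chapter's implementation; the bin count is an assumption):

```python
import numpy as np

def color_histogram_features(image, bins=8):
    """Concatenated per-channel color histogram of an RGB image
    (H x W x 3, values 0-255), normalized to sum to 1 per channel.
    A simple global feature vector for histopathology patches."""
    feats = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)
```

The resulting fixed-length vector can feed a conventional classifier or serve as a baseline against learned CNN features.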
A floating image containing the tumor/cancer is correlated with a healthy reference
image, and selected areas of the images are examined using similarity computations
and statistics. Statistical methods such as cross-correlation, phase correlation,
image ratio uniformity, and difference of squares are used; other methods based on
mutual information over the selected areas of the populated features involve
complex computations. The similarity process examines the mapping of feature points
of the healthy image to those of the floating image. Specific parameters identified
from the healthy-to-tumor mapping are treated as conversion or transformation
parameters, with bifurcation and cross-over points, which are studied algorithmically
(Fig. 42.2).
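The cross-correlation and difference-of-squares similarity computations mentioned above can be sketched as follows (a minimal illustration over equally sized patches, not the chapter's implementation):

```python
import numpy as np

def normalized_cross_correlation(ref, flt):
    """Pearson-style normalized cross-correlation between two equally
    sized patches; 1.0 means identical up to brightness/contrast,
    values near 0 mean unrelated."""
    ref = ref.astype(float) - ref.mean()
    flt = flt.astype(float) - flt.mean()
    denom = np.sqrt((ref ** 2).sum() * (flt ** 2).sum())
    if denom == 0:
        return 0.0
    return float((ref * flt).sum() / denom)

def sum_squared_difference(ref, flt):
    """Difference-of-squares similarity: lower is more similar."""
    return float(((ref.astype(float) - flt.astype(float)) ** 2).sum())
```

Note that normalized cross-correlation is invariant to linear brightness changes, while the difference of squares is not; this is why several similarity measures are typically combined.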
When identifying tumors or cancer, specific essential components are not visible
in the raw images, so grayscale aberrations must be eliminated. Tissue-staining
methods like H&E are applied before visualization. The staining process highlights
areas of the components and their morphological features, which can then be seen
with a high-definition microscope.
Fig. 42.2 Basic image analysis framework
Fig. 42.3 Samples of mammography mass lesions: a benign; b malignant
Deep Learning Methodology
From the image corpora, the candidate dataset for the classification process is selected
by scaling the original images to different sizes. The scaling and phasing of the
images influence the learning time, keeping it as low as possible, and irrelevant
portions of the images can be eliminated from the learning process. Grayscale
images are useful but can introduce aberrations and conflicting brightness, which
intrude on the process of identifying and detecting the tumor parts, where the shape
of the tumor is the main indication of its benign or malignant nature. Further, the
convolutional neural network framework is applied to the collected benign and
malignant images (Fig. 42.3).
42.4 Experimental Results
Many essential aspects of malignant nature, proved pathologically from the breast cancer
images, are revealed using the convolutional neural network. After phasing and
scaling, the normalized breast cancer images, sized 220 × 220, yield the primary
tensors of size 192 × 192. The first convolution in the configured CNN is
initiated with a 3 × 3 × 2 kernel filter with a stride of 1 × 1, and 24 filters
are applied. A max-pool with a 2 × 2 pooling layer is produced from the first
convolution, reducing the size to 96 × 96. A ReLU is applied to the resulting output of
the first convolution, which is sent with nonlinearity through the subsequent convolution
and into the successive layers. For the second convolution operation, kernel filters
of size 3 × 3 × 24 with 48 filtrations are applied, and the input size is reduced to
48 × 48 after max-pooling with a stride of 2 × 2; the output is further scaled. Once
again, nonlinearity is added to the output of the previous convolution layer. For
the third convolution operation, kernel filters of size 3 × 3 × 96 with 96 filtrations
are applied, and the input size is reduced to 24 × 24 after max-pooling with a stride
of 2 × 2; the output is further scaled. Nonlinearity is imbibed in the activation stage
using the ReLU function, leading to the fourth convolution with 192 filters of
3 × 3 × 192 as the kernel size.
To clear the anomaly of activations by filling the space during reduction of the
output, the image is max-pooled to the size 12 × 12. Further convolutions process
the results of all the pre-configured layers of the network, including ReLU and 240
filtrations. The 6 × 6 × 240 tensor resulting from the convolutions is then linearized
and flattened to shape the feature vector. The values of the features within the
neurons reflect the symptoms of the malign tissues.
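As an illustrative sketch of the layer stack described above (assuming 'same'-padded convolutions, since the text reports only the pooled sizes, and using a hypothetical function name), the tensor sizes can be traced in plain Python:

```python
def trace_shapes(size, filter_counts):
    """Trace (height, width, channels) through the described stages,
    assuming 'same'-padded 3x3 convolutions, so only each 2x2 max-pool
    halves the spatial dimensions."""
    shapes = []
    for filters in filter_counts:
        size //= 2                     # 2x2 max-pool halves the spatial size
        shapes.append((size, size, filters))
    return shapes

# filter counts from the text: 24, 48, 96, 192 and 240 on a 192 x 192 input
print(trace_shapes(192, [24, 48, 96, 192, 240]))
# the last stage, (6, 6, 240), is the tensor flattened into the feature vector
```

This reproduces the sequence 96, 48, 24, 12 and 6 stated in the text.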
Convolutional neural networks face the problems of underfitting and overfitting. To
overcome overfitting, specific dropout values are used in the dropout layer, where the
feature can be defined in a realistic format. Fewer neurons are used to determine the
class of the datasets to minimize the ambiguity from the fully connected layers. The
fully connected layer at the final stage of convolution brings out a tensor with a
limited number of neurons; in the experiment, it is observed that 48 neurons are
converted into classes under malign and benign. A significant reduction in loss,
with improving accuracy, occurred during the experiment's training and validation,
as depicted in the following graphs (Figs. 42.4 and 42.5).
Fig. 42.4 Misclassified histopathological breast cancer images and the loss incurred during training
and validation toward the accuracy of benign nature of images
Fig. 42.5 Misclassified histopathological breast cancer images and the loss incurred during training
and validation toward the accuracy of malign nature of images
Fig. 42.6 ROC curve demonstrating the AUC for benign and malign breast cancer images
The generalization of the proposed model is based on the selection of the images,
where grading inaccuracies will affect the interest of the deep learning methodology.
From the observations of the experiments conducted, the classification of breast
cancer images into malign categories with their sub-categories, based on the selection
of the datasets and the propensity of the proposed method, is referenced by the AUC
of the ROC, drawn for the malign and benign classes. For the benign class, the ROC
factors obtained are Accuracy (ACC) = 0.7866, Sensitivity (TPR) = 0.7921,
Specificity (TNR) = 0.7837, False Positive Ratio (FPR) = 0.2163, Positive Predictive
Value (PPV) = 0.6597, and Negative Predictive Value (NPV) = 0.8769. For the
malign class, they are Accuracy (ACC) = 0.7849, Sensitivity (TPR) = 0.788,
Specificity (TNR) = 0.7832, False Positive Ratio (FPR) = 0.2168, Positive Predictive
Value (PPV) = 0.673, and Negative Predictive Value (NPV) = 0.8671 (Fig. 42.6).
Therefore, the proposed CNN learning model has achieved measurable results on the
scaled, phased, and normalized histopathological images, with different dimensions
and resolutions, classifying the images as malign. Different kinds of CNN
architectures may be proposed, such as AlexNet and other ImageNet-style networks,
where the malign images of 240 × 240 are given as input to the CNN's convolution,
pooling, and ReLU layers. The proposed model is prudent and flexible in deriving
the desired results. The implementation uses Python with TensorFlow Keras in
Anaconda Navigator, with snippets of code in Jupyter Notebook; specific
activation-function experiments were performed in Google Colab.
42.5 Conclusion
A CNN model for the classification of breast cancer images has been proposed in
this paper, which proves that a simple CNN with a sequential model can be
implemented for image classification. All possible structures and textures of the features of breast
cancer can be thoroughly examined by overcoming various kinds of color-scale
aberrations. The proposed model has achieved the best classification of breast cancer
images as malign. The proposed model can compete with wet-lab experiments
and has promising quantitative and qualitative analysis features. More qualitative,
higher-resolution images can be applied to obtain better image classification
results.
References
1. Elston, C. W., & Ellis, I. O. (1991). Pathological prognostic factors in breast cancer. I. The
value of histological grade in breast cancer: Experience from a large study with long-term
follow-up. Histopathology, 19(5), 403–410.
2. Robertson, S., et al. (2018). Digital image analysis in breast pathology—From image processing
techniques to artificial intelligence. Translational Research, 194, 19–35.
3. Rakhlin, et al. (2018). Deep convolutional neural networks for breast cancer histology image
analysis. In International Conference Image Analysis and Recognition (pp. 737–744). Springer.
4. Spanhol, et al. (2015). A dataset for breast cancer histopathological image classification. IEEE
Transactions on Biomedical Engineering, 63(7), 1455–1462.
5. Shayma’a, et al. (2020). Breast cancer masses classification using deep convolutional neural
networks and transfer learning. Multimedia Tools and Applications, 79(41), 30735–30768.
6. Khan, et al. (2020). A survey of the recent architectures of deep convolutional neural networks.
Artificial Intelligence Review, 53(8), 5455–5516.
7. Srinidhi, et al. (2020). Deep neural network models for computational histopathology: A survey.
Medical Image Analysis, 101813.
8. Reshma, G., Al-Atroshi, C., Nassa, V. K., Geetha, B., Sunitha, G., Galety, M.G., &
Neelakandan, S. (2022). Deep learning-based skin lesion diagnosis model using dermoscopic
images. Intelligent Automation and Soft Computing, 31, 621–634.
9. Galety, M., Al Mukthar, F. H., Maaroof, R. J., & Rofoo, F. (2021). Deep neural network concepts
for classification using convolutional neural network: A systematic review and evaluation.
Technium: Romanian Journal of Applied Sciences and Technology, 3(8), 58–70. http://doi.org/
10.47577/technium.v3i8.4554
10. Sahu, B., Gouse, M., Pattnaik, C. R., & Mohanty, S. N. (2021). MMFA-SVM: New bio-marker
gene discovery algorithms for cancer gene expression. Materials Today: Proceedings. ISSN
2214-7853. http://doi.org/10.1016/j.matpr.2020.11.617
11. Gouse, D. G. M., Haji, C. M., & Saravanan, D. (2018). Improved reconfigurable based
lightweight crypto algorithms for IoT based applications. Journal of Advanced Research in
Dynamical & Control Systems, 10(12), 186–193.
Chapter 43
Implementation of 12 Band Integer
Filter-Bank for Digital Hearing Aid
K. Ayyappa Swamy and Zachariah C. Alex
Abstract To compensate for sensorineural hearing loss, audio signals should be
amplified selectively and very sensitively. For this, the signal can be passed through
a filter-bank structure followed by a gain adjustment block. In this research, we
propose the design of a 12 band integer filter-bank that can produce filtered and
amplified audio signals without the need for additional amplification. The integer
filter-bank also reduces latency compared to a fractional filter-bank. The proposed
method is examined with four audiograms having different kinds of hearing losses.
The proposed 12 band integer filter-bank provides minimum matching error with
acceptable latency.
43.1 Introduction
A digital hearing aid is a device that can compensate for hearing losses of different
kinds. Conductive hearing loss can be compensated using a simple amplifier with
constant gain. In mixed and sensorineural hearing loss cases, gain values may be
different at each frequency. To compensate for mixed and sensorineural hearing
loss, the audio signal is passed through a set of filters known as a filter-bank. The
output of each filter is fed to an amplifier whose gain depends on the level of hearing
loss in that band of frequencies. The block diagram representation of a digital
hearing aid with the filter-bank technique is shown in Fig. 43.1 [1].
K. Ayyappa Swamy ·Z. C. Alex (B)
School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014,
India
e-mail: zachariahcalex@vit.ac.in
K. Ayyappa Swamy
Department of ECE, Aditya Engineering College, Surampalem, Andhra Pradesh 533437, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_43
Fig. 43.1 Block diagram for digital hearing aid with filter-bank structure [1]
For the past three to four decades, researchers have been working on developing
better algorithms for hearing aids using filter-bank structures. Current research
focuses on reducing complexity as well as latency. The most commonly used
filter-bank is the uniform filter-bank [2], in which the entire range of the audio
signal is divided into equal bands.
In this paper, we have implemented an efficient 12 band non-uniform integer
filter-bank to compensate for hearing loss. We introduce a new integer filter-bank
designed by converting the FIR filter coefficients in the filter-bank to integer
values. The integer filter-bank performs better in terms of speed when compared
with a fractional filter-bank [3]. The proposed integer filter-bank performs gain
adjustment internally, without the need for any additional gain adjustment block.
The performance of the proposed algorithm was tested on two different types of
audiograms.
The following is a breakdown of the paper’s structure. Implementation of integer
filter-bank and how it amplifies without multipliers are discussed in Sect. 43.2.
Design examples are given in Sect. 43.3. In Sect. 43.4, the test results are examined.
Section 43.5 draws the conclusion.
43.2 Integer Filter-Bank with Internal Gain Adjustment
43.2.1 Integer Filter-Bank
The proposed filter-bank, shown in Fig. 43.2, consists of 12 non-uniform FIR filters;
it is a combination of low-pass, high-pass and band-pass filters. In this paper, the
integer filter-bank is implemented by converting fractional filter coefficients into
integer values: the fractional filter coefficients are multiplied by 32,768 (2^15) and
rounded to the nearest integer. Fractional input samples are also converted into
integers by multiplying by 32,768 (2^15), so that integer multiplications and additions
can be performed, which reduces the delay when compared to fractional operations.
After filtering,
Fig. 43.2 Magnitude spectrum for FIR filter-bank designed using a Taylor window
the filtered outputs, which are in the form of integers, are converted back to
fractional values by dividing by 2^30. To reduce the computational complexity, all
multiplications and divisions required in this process are performed by shift
operations, as the factors are powers of 2.
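The conversion scheme above can be sketched in plain Python (function names are illustrative, and a direct-form convolution stands in for the actual filter-bank implementation):

```python
def to_integer_coeffs(frac_coeffs, scale_bits=15):
    """Scale fractional FIR coefficients by 2**15 and round to integers."""
    return [round(c * (1 << scale_bits)) for c in frac_coeffs]

def integer_filter(samples, frac_coeffs, scale_bits=15):
    """FIR filtering carried out entirely in integer arithmetic.

    Inputs and coefficients each carry a 2**15 scale factor, so every
    product is weighted by 2**30; the final division by 2**30 (a right
    shift in a real implementation) restores fractional values."""
    coeffs = to_integer_coeffs(frac_coeffs, scale_bits)
    int_samples = [round(s * (1 << scale_bits)) for s in samples]
    out = []
    for n in range(len(int_samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * int_samples[n - k]  # pure integer multiply-accumulate
        out.append(acc / (1 << 2 * scale_bits))  # back to a fractional value
    return out
```

The result agrees with ordinary floating-point filtering to within the quantization error introduced by the 2^15 rounding.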
In the filter-bank, each FIR filter was designed using a Taylor window [4] with
Nbar = 4, a side-lobe level of 60 dB, order 70, and a 16,000 Hz sampling frequency.
It is observed that the Taylor window gives better stop-band attenuation for minimum
filter order [5]. For hearing aid applications, filters should have at least 60 dB
attenuation, which is helpful to amplify up to 60 dB [6, 7] in cases of moderately
severe hearing loss.
In this research, we considered a 12 band non-uniform FIR filter-bank. The 12 band
filter-bank is implemented by combining a few filters of the 16 band non-uniform
filter-bank proposed in [1]. Figure 43.2 shows the magnitude spectrum of the 12 band
filter-bank, with each FIR filter designed using a Taylor window with the
specifications mentioned above. As shown in the figure, the designed filter-bank
provides 60 dB stop-band attenuation, which is best in comparison with other
filter-bank types. The filter-banks proposed in [8, 9] provide only up to 50 dB
stop-band attenuation.
43.2.2 Gain Adjustment
In sensorineural or mixed hearing loss, gain values vary with frequency; different
ranges of frequencies may have different gain values. So, in a hearing aid algorithm,
gain adjustment should be performed selectively and very sensitively. The output of
each filter needs to be multiplied by a predefined gain value given by the audiogram
to compensate for the hearing loss in the particular frequency range. The gain value is
calculated based on the hearing threshold at that frequency range which was recorded
using pure tone audiometry.
In the proposed algorithm, there is no need for an additional multiplier in the gain
adjustment block; this is performed internally in the filtering process. Gain values
are converted into a sum of powers of two, 2^m, and each m is subtracted from the
30 used to recover the fractional value in the integer filter-bank explained in the
section above. For example, if the gain value is 20, it is converted into the sum
2^4 + 2^2 = 20. That means that instead of shifting right by 30, the accumulator is
shifted right by 26 (30 − 4) and by 28 (30 − 2), and both results are added to obtain
the amplified output with a gain of 20.
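This shift-and-add gain scheme can be sketched as follows (hypothetical helper names; Python integers stand in for the fixed-point accumulator):

```python
def power_of_two_terms(gain):
    """Decompose an integer gain into its powers of two: 20 -> [2, 4] (2**2 + 2**4)."""
    return [m for m in range(gain.bit_length()) if (gain >> m) & 1]

def apply_gain(acc, gain, frac_bits=30):
    """Scale a 2**30-weighted accumulator by `gain` using only shifts and adds.

    Instead of one right shift by 30, shift right by (30 - m) for every
    power 2**m in the gain and sum the partial results."""
    return sum(acc >> (frac_bits - m) for m in power_of_two_terms(gain))

acc = 7 << 30               # accumulator holding the value 7 in Q30 format
print(apply_gain(acc, 20))  # 140 == 7 * 20, with no multiplier used
```

For a gain of 20, the shifts used are 30 − 4 = 26 and 30 − 2 = 28, exactly as in the example above.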
43.3 Design Example and Performance Evaluation
The proposed 12 band FIR filter-bank is implemented using the audio signal
processing toolbox in MATLAB. Audiograms with various kinds of hearing
impairments are used to assess the efficiency of the proposed non-uniform integer
filter-bank structure. The algorithm proposed in this paper is evaluated with
audiograms having mild conductive hearing loss and moderate loss at mid
frequencies.
Example 1: Mild conductive hearing loss
The result of audiometry for such a case appears in Fig. 43.3. The hearing
thresholds of the right ear, symbolized with 'O', were taken into consideration for
gain adjustment. As per the audiogram, the gain values 25, 25, 25, 35, 25 and 30 are
given to the proposed filter-bank.
Example 2: Middle frequency moderate hearing loss
An audiogram with moderate hearing loss at mid frequencies is considered for
assessing the proposed model in this example. As per the audiogram, the hearing
loss thresholds are 10, 20, 40, 50, 20 and 10, respectively. From the outcomes, it is
clear that the proposed filter-bank gives a 4.25 dB maximum matching error (MME)
at 4000 Hz.
43.4 Results and Discussions
Figure 43.4 depicts the filter-bank's response after gain adjustment. Figure 43.5
shows the matching error. Tables 43.1 and 43.2 make clear that the latency of the
designed hearing aid algorithm is lower compared with other hearing aid algorithms.
Table 43.1 is for the mild conductive hearing loss discussed in Example 1, and Table
43.2 is for the moderate hearing loss at mid frequencies discussed in Example 2 of
the design examples section.
Fig. 43.3 Audiogram for moderately severe conductive hearing loss
Fig. 43.4 Magnitude response of 12 band filter-bank with gain adjustment
Fig. 43.5 Matching curve and matching error
Table 43.1 Proposed filter-bank results comparison with other filter-banks for example 1
Type of filter-bank MME (dB) Delay of the filter-bank (ms)
Fixed uniform (direct design) 6.39 4.3
Fixed non-uniform [10]  9.61  15.7
Reconfigurable [8]  4.82  29
Reconfigurable [9]  5.63  12.1
Reconfigurable [7]  3.33  5.55
Proposed algorithm 4.25 6.56
Table 43.2 Proposed filter-bank results comparison with other filter-banks for example 2
Type of filter-bank MME (dB) Delay of the filter-bank (ms)
Fixed uniform (direct design) 5.86 4.3
Fixed non-uniform [10]  3.67  15.7
Reconfigurable [8]  2.67  25
Reconfigurable [9]  1.84  12.1
Reconfigurable [7]  4.1  5.55
Proposed algorithm 2.7 6.56
43.5 Conclusion
In this paper, a new non-uniform filter-bank structure with integer filter coefficients
is presented. The speed of the filter-bank has been improved in three ways: (1) by
replacing fractional filters with integer filters, (2) by replacing multiplications with
shifting operations in gain adjustment and (3) by using symmetrical filters in the
filter-bank. According to the design examples, the suggested structure outperforms
the existing algorithms in terms of MME and latency (Figs. 43.6, 43.7 and 43.8).
Fig. 43.6 Audiogram for moderate hearing loss at mid frequencies
Fig. 43.7 Magnitude response of 12 band filter-bank with gain adjustment
Fig. 43.8 Matching curve and matching error
Acknowledgements Work reported in this publication was supported by TIDE, DST, Government
of India, under grant reference No. #SEED/TIDE/015/2017/G.
References
1. Wei, Y., & Lian, Y. (2006). A 16-band non-uniform FIR digital filter bank for hearing aid. In
IEEE Biomedical Circuits and Systems Conference (pp. 186–189).
2. Brennan, R., & Schneider, T. (2001). An ultra low-power DSP system with a flexible filterbank.
In Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers
(Vol. 1, pp. 809–813).
3. Chrysafis, A. P., & Lansdowne, S. (1988). Fractional and integer arithmetic using the DSP56000
family of general-purpose digital signal processors. Motorola.
4. Prabhu, K. M. (2014). Window functions and their applications in signal processing. Taylor &
Francis.
5. Gabr, R. H., & Kadah, Y. M. (2008). Digital color Doppler signal processing. In 2008 Cairo
International Biomedical Engineering Conference (pp. 1–5).
6. Tiwari, N., & Pandey, P. C. (2019). Sliding-band dynamic range compression for use in hearing
aids. International Journal of Speech Technology, 22(4), 911–926.
7. Swamy, K. A., & Alex, Z. C. (2021). Efficient low delay reconfigurable filter bank using parallel
structure for hearing aid applications with IoT. Personal and Ubiquitous Computing, 1–14.
8. Wei, Y., & Liu, D. (2013). A reconfigurable digital filterbank for hearing-aid systems with a
variety of sound wave decomposition plans. IEEE Transactions on Biomedical Engineering,
60(6).
9. Wei, Y., & Wang, Y. (2015). Design of low complexity adjustable filter bank for personalized
hearing aid solutions. IEEE/ACM Transactions on Audio, Speech, and Language Processing,
23(5), 923–931.
10. Lian, Y., & Wei, Y. (2005). A computationally efficient nonuniform FIR digital filter bank for
hearing aids. IEEE Transactions on Circuits and Systems I: Regular Papers, 52(12), 2754–2762.
Chapter 44
Comparative Analysis on Heart Disease
Prediction Using Convolutional Neural
Network with Adapted Backpropagation
K. Suneetha, Kamala Challa, J. Avanija, Yaswanth Raparthi,
and Suresh Kallam
Abstract According to global medical records, the deadliest disease in the world is
heart disease. In the current generation, most of the population suffers from heart
disease due to a lack of awareness regarding healthy food habits and fitness
maintenance. The death rate has increased, so early diagnosis is essential, and
regular health checkups are necessary. The problems arise due to insufficient blood
flow to the heart, shortness of breath, tiredness, or fatigue. Traditional methods are
not adequate to diagnose heart disease. There is a critical need for a medical system
that detects and predicts heart disease early and provides a more accurate analysis.
Previously, the health industry generated massive unstructured data; with the change
in the modern world, the health industry now generates structured,
machine-understandable data. Nowadays, artificial intelligence methodologies are
becoming more popular. This study proposes a convolutional neural network (CNN)
method to detect heart disease early. It takes 13 medical features as input, and the
CNN is trained by modified backpropagation. The proposed model is compared with
existing models such as the gradient boosted tree, voting, Naïve Bayes, and the
hybrid random forest linear model. The results show that the CNN gives a 96%
accurate outcome in predicting heart disease during testing.
K. Suneetha (B)
CS and IT, Jain (Deemed-to-be University), Bangalore, India
e-mail: keerthisuni.k@gmail.com
K. Challa
Department of Information Technology, VNR VJIET, Hyderabad, Telangana, India
J. Avanija ·S. Kallam
Department of CSE, Sree Vidyanikethan Engineering College, Tirupati, India
Y. Raparthi
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil
Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_44
44.1 Introduction
World Health Organization (WHO) statistics show that heart disease is the leading
cause of death around the world. The incidence of heart disease has the highest rate
in many countries, including India, where 2.6 million people suffer from heart
disease. After the first or second treatment, half of them lose their lives. The death
rates are increasing because the disease is not recognized at the starting stage. The
physician could give effective medicines and treatments to the patient if the disease
were predicted early, saving many lives. Developing a system that offers low
operational cost and high accuracy is the primary point. In medical analysis,
artificial intelligence works to find the hidden patterns of the data. To identify heart
disease early and provide solutions, risk factors such as hypertension, diabetes,
gender, and age are considered. Here, data collection is the foremost task, and the
medical domain contains vast databases that store all types of patient records.
Patients' data is stored in a raw form that machines cannot interpret, as shown in
Fig. 44.1. Data wrangling is the process of transforming raw data into machine-
readable information. The physician selects and filters the raw data in order to
choose the information necessary for analysis. The training algorithm discovers the
hidden patterns and rules in the filtered data, and the test algorithm evaluates the
model's performance by determining its accuracy. If the model's accuracy is
adequate after training and testing, the data will be put to use. Deployment is the
result of integrating optimization with operational activities. For heart disease
prediction in automated health diagnosis, this study proposes a convolutional neural
network model. The convolutional neural network model is used to predict heart
disease early and is trained using a modified backpropagation method. The
prediction is done by performing the analysis on a heart disease dataset, and the
results were compared with the gradient boosted tree, voting, Naïve Bayes, and
hybrid random forest linear model.
The paper is organized as follows: Sect. 44.2 covers related works; Sect. 44.3
describes the data collection in detail; Sect. 44.4 gives a detailed explanation of the
convolutional neural network; Sect. 44.5 describes the proposed CNN with adapted
backpropagation; Sect. 44.6 presents the performance metrics for the proposed
model on heart disease prediction and the experimental results; and finally,
Sect. 44.7 discusses the conclusion and future work.
44.2 Literature Survey
Many researchers have rigorously pursued research work on heart disease
prediction. Waris and Koteeswaran [1] used an improved k-means neighbor classifier
technique to predict heart disease and illustrated it with an ER diagram and
architecture. Sinha [2] implemented MRI techniques that include conventional lymphangiograms
Fig. 44.1 Architecture of the machine learning technique for data analytics
and dynamic contrast-enhanced magnetic resonance imaging to predict disease in
the heart.
Doppala [3] worked on a genetic algorithm using feature selection with a radial
basis function to detect coronary illness; the proposed study gave better accuracy
after the features were reduced to nine during data preprocessing. Rani [4] proposed
a hybrid decision system and genetic algorithm to predict heart disease. Bazoukis
and Stavrakis [5] explained the conventional clinical methods of heart disease. The
study illustrates the search strategy, data extraction, and statistical analysis for
predicting the output of the left ventricular assist device.
Patel et al. [6] implemented an IoT and MQTT-based machine learning framework,
which estimates precision, disposition, and affectability for cardiovascular disease.
Al-Yarimi et al. [7] worked on n-gram attribute optimization by distinct attribute
correlation weight, and the experiments gave significant results. Machine learning
techniques including Naive Bayes, support vector machine, decision tree, random
forest, and K-nearest neighbor were used by El Hamdaoui [8] to predict cardiac
disease. With an accuracy rate of 84.17% in training and 84.28% in testing, the
Naive Bayes algorithm was the most accurate method.
Kamalapurkar and Samyama Gunjal [9] worked on a web-based system using
machine learning. It provides more accurate results compared with two artificial
intelligence techniques, the support vector machine and random forest. Spencer [10]
predicted heart disease using four different datasets (the Cleveland, Long-Beach-VA,
Hungarian, and Switzerland datasets). Filter and data extraction methods such as PC
features, Chi-squared, Relief, and SU were used, together with classification methods
such as logistic regression and AdaBoost.
Mohan [11] used cybersecurity for the information-centric Internet of Things, with
machine learning algorithms applied to different features to predict heart disease.
Einarson et al. [12] estimated cardiovascular disease in type 2 diabetes and reviewed
papers published over ten years (2007–2017). Abiko [13] measured the serum
concentration of hs-cTnT to predict heart disease. Shanmugasundaram [14] used
machine learning approaches such as Naïve Bayes, decision tree, and K-nearest
neighbor to predict heart disease. Related work includes brain tumor classification
of MRI images using a deep convolutional neural network [15] and cardiac
arrhythmia detection using the dual-tree wavelet transform and a convolutional
neural network [16].
Based on the above literature survey, many machine learning techniques have been
developed in the medical domain to analyze the patient data generated from medical
databases. The time to analyze medical data is a significant concern because patient
data are complex and hard to diagnose, so it is difficult to predict disease based on
the patient data stored in the medical database. Due to unpredictability in massive
datasets, the false positive rate increases and precision is reduced in diagnosis.
Therefore, an efficient learning algorithm must lower computation time and improve
diagnostic accuracy using medical data analytics.
44.3 Methodology
44.3.1 Data Collecting
The Heart Disease dataset is available in the UCI Machine Learning Repository and
is used extensively by researchers for evaluation and analysis. The heart disease
dataset contains 303 instances and 13 features, shown in Table 44.1. In the dataset,
the age value ranges from 34 to 77 years; 0 represents female patients and 1
represents male patients. Chest pain is categorized into four types: (i) atypical
angina, where the heart does not receive the required amount of blood, resulting in
narrowing of the coronary arteries; (ii) typical angina, where the heart likewise does
not receive the necessary amount of blood, resulting in narrowing of the coronary
arteries, the difference being the mental stress and emotional feeling accompanying
the chest pain; (iii) asymptomatic, chest pains that arise from various other reasons
and are not considered heart disease; and (iv) non-anginal pain, symptoms which do
not reflect heart disease. The trestbps feature signifies the resting blood pressure,
and the level of cholesterol is labeled chol. Exercise-induced angina, with feature
code exang, is 1 when the chest pain is caused by a lack of oxygen supply to the
heart and 0 when there is no chest pain. FBS means fasting blood sugar. The
maximum heart rate achieved is thalach. Ca is the number of major vessels colored
by fluoroscopy. ST-depression induced by exercise relative to rest gives oldpeak. A
normal resting electrocardiographic result is indicated with 0, an ST-T wave
abnormality in restecg is indicated with 1, and left ventricular hypertrophy with 2.
The slope of the peak reflects the increment in heart rate, where 1
represents up sloping, 2 represents flat, and 3 represents down sloping. A thallium
scan checks how much blood reaches all body parts using a radioactive tracer.
There are missing values in the dataset; they are replaced with interpolated values.
Next, the dataset is split into 85% training data with 258 instances and 13 attributes,
and 15% testing data with 45 instances and 13 attributes. Another attribute is
considered as the label, which is the output parameter for the neural network, where
1 represents the presence of heart disease and 0 the absence of heart disease.
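The split and the missing-value handling described above can be sketched as follows (illustrative helpers, not the authors' code; the interpolation here is a simple nearest-neighbour average standing in for whatever interpolation the study used):

```python
def split_dataset(n_instances, train_frac=0.85):
    """Split instance counts 85% / 15%, as described for the 303-instance set."""
    n_train = round(n_instances * train_frac)
    return n_train, n_instances - n_train

def interpolate_missing(values):
    """Replace missing entries (None) with the average of the nearest
    non-missing neighbours -- a simple stand-in for the interpolation
    mentioned in the text."""
    out = list(values)
    for i, v in enumerate(out):
        if v is None:
            prev = next((out[j] for j in range(i - 1, -1, -1) if out[j] is not None), None)
            nxt = next((out[j] for j in range(i + 1, len(out)) if out[j] is not None), None)
            if prev is not None and nxt is not None:
                out[i] = (prev + nxt) / 2
            else:
                out[i] = prev if prev is not None else nxt
    return out

print(split_dataset(303))                     # (258, 45), matching the text
print(interpolate_missing([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```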
Table 44.1 Details of the dataset

Features | Codes | Description
Age | Age | Age in years
Chest pain type | Cp | 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
Exercise-induced angina | Exang | 0 = no; 1 = yes
Fasting blood sugar | Fbs | 0 = false; 1 = true
Maximum heart rate achieved | Thalach | Displays maximum heart rate
Number of major vessels | Ca | Displays values as integers or floats
Oldpeak | Oldpeak | Displays values as integers or floats
Resting blood pressure | Trestbps | Displays values in mmHg
Resting electrocardiographic results | Restecg | 0 = normal; 1 = ST-T wave abnormality; 2 = left ventricular hypertrophy
Serum cholesterol | Chol | Displays values in mg/dl
Sex | Sex | 0 = female; 1 = male
Slope of the peak | Slope | 1 = up sloping; 2 = flat; 3 = down sloping
Thallium scan | Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect
44.3.2 Classification Modeling
44.3.2.1 Gradient Boosted Tree
Gradient boosted trees are constructed based on the entropy of the training samples
of the data, following a top-down, recursive, divide-and-conquer approach. Pruning
of the gradient boosted tree is performed to optimize the error over the dataset.
$\text{Entropy} = -\sum_{j=1}^{m} x_{ij} \log_2 x_{ij}$  (44.1)
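Eq. (44.1) can be evaluated with a small helper (illustrative; the inputs are assumed to be the class proportions at a tree node):

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits, following Eq. (44.1): -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit for a perfectly balanced binary split
```

A pure node (all instances in one class) gives an entropy of zero, so splits that reduce entropy are preferred during tree construction.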
44.3.2.2 Voting
Voting is an aggregation technique combining multiple classifiers. The training
dataset is divided into smaller, equal subsets, and a classifier is constructed for each
subset of the data. The final decision is based on the maximum number of votes a
class achieves, obtained by summing all the votes and choosing the class with the
highest aggregate value.
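A minimal sketch of this majority-vote aggregation (hypothetical function name):

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate the per-subset classifiers' outputs: the class with the most votes wins."""
    return Counter(predictions).most_common(1)[0][0]

# three classifiers, each trained on its own subset, vote on one patient
print(majority_vote([1, 0, 1]))   # 1 -> heart disease predicted by majority
```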
44.3.2.3 Naïve Bayes
The Naïve Bayes model applies the Bayes rule under the assumption of independent
attributes. An instance of data is allocated to the class with the highest posterior
probability. The model is trained through the Gaussian likelihood function with a
prior probability for each class:

$P(x_{f_1}, x_{f_2}, \ldots, x_{f_m} \mid c) = \prod_{i=1}^{m} P(x_{f_i} \mid c)$  (44.2)

$P(x_f \mid c_i) = \frac{P(c_i \mid x_f)\, P(x_f)}{P(c_i)}$  (44.3)
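A sketch of the Gaussian Naïve Bayes scoring in Eqs. (44.2)–(44.3) (illustrative names; in practice the class priors, means, and variances would be estimated from the training data):

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian likelihood P(x_fi | c) for a single feature."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_score(x, priors, means, variances):
    """Pick the class with the highest unnormalised posterior:
    prior * product of per-feature Gaussian likelihoods (Eq. 44.2)."""
    scores = {}
    for c in priors:
        likelihood = 1.0
        for i, xi in enumerate(x):
            likelihood *= gaussian_pdf(xi, means[c][i], variances[c][i])
        scores[c] = priors[c] * likelihood
    return max(scores, key=scores.get)

# one feature, two classes: an instance near the class-1 mean is assigned class 1
print(naive_bayes_score([4.5], {0: 0.5, 1: 0.5},
                        {0: [0.0], 1: [5.0]}, {0: [1.0], 1: [1.0]}))
```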
44.3.2.4 HRFLM
HRFLM is the combination of random forest and linear model. In this method, prepro-
cessing data followed by feature selection based on the classification performance
evaluation, entropy, and outcome accuracy is achieved. The feature selection model
repeats for various combinations of attributes. The HRFLM model’s performance
44 Comparative Analysis on Heart Disease Prediction Using 471
based on the thirteen attributes and the machine learning technique used for repe-
tition and performance is recorded. A linear model was developed in statistics and
studied as a model for understanding the relationship between input and output vari-
ables. The ensemble classifier constructs several random decision trees and combines
them to get the best outcome. It mainly applies bagging or bootstrap aggregating. In
the preprocessing data stage, the missing values removed from the data.
For the given data, x=x1,x2,....,xnwith y=x1,x2,....,xnwhich repeats
bagging from n=1toN. The average prediction for an individual tree is:
f̂(x) = (1/N) Σ_{b=1}^{N} f_b(x)    (44.4)
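A hedged sketch of an HRFLM-style hybrid: the chapter combines a random forest with a linear model, and one simple reading, used here, averages their predicted probabilities. The published method's exact combination rule may differ; data is synthetic stand-in data.

```python
# Hybrid of bagged trees (random forest) and a linear model: blend the
# two probability estimates, then threshold at 0.5. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=13, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

rf = RandomForestClassifier(random_state=2).fit(X_tr, y_tr)  # bagged trees
lm = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)       # linear model

p = (rf.predict_proba(X_te)[:, 1] + lm.predict_proba(X_te)[:, 1]) / 2
pred = (p >= 0.5).astype(int)
print("hybrid test accuracy:", np.mean(pred == y_te))
```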
44.4 Convolutional Neural Network
Among deep neural networks, convolutional neural networks are advanced and practical models for prediction. They have many layers with which to learn both low- and high-level features. A convolutional neural network contains convolution layers, pooling layers, fully connected layers, and a softmax layer. The convolution layer learns pixel-level features by splitting the image into small boxes of pixels; in this layer, the CNN performs kernel and filtering operations on the data, where the input is the output of the previous layer. The pooling layers drop unused parameters and thereby reduce the dimensions of the feature maps: (i) the max-pooling layer takes the maximum element of the input data in each feature-map region; (ii) average pooling calculates the average of the input data over the feature-map window; (iii) global pooling reduces each feature map to a single value. The fully connected layers take the transformed vector: the feature map is flattened into a one-dimensional feature vector and fed into the neural network, with each layer connected to an activation unit, and a softmax activation function performs the classification, as shown in Fig. 44.2.
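The layer operations described above (convolution, max pooling, flatten, softmax) can be sketched minimally in NumPy. Shapes and values are illustrative, not the chapter's actual network.

```python
# NumPy sketch of one conv + ReLU + max-pool + softmax pass.
import numpy as np

def conv2d(img, kernel):
    # Valid-mode 2-D cross-correlation (the "kernel and filtering" step)
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Keep the maximum element in each size x size region
    h, w = fmap.shape
    return fmap[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

img = np.random.default_rng(0).random((8, 8))
fmap = np.maximum(conv2d(img, np.ones((3, 3)) / 9), 0)  # conv + ReLU
pooled = max_pool(fmap)                                  # 6x6 -> 3x3
logits = pooled.flatten() @ np.random.default_rng(1).random((9, 2))
print(softmax(logits))  # two-class probabilities summing to 1
```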
44.5 CNN Modified Backpropagation
A convolutional neural network is a set of connected input and output elements, each with an associated weight. In the learning stage, the network adjusts the weights to predict the correct class label of the input tuple. Backpropagation is a fast, matrix-based algorithm for computing the output of a network; it fine-tunes the weights based on the error rate of the previous epoch and calculates the gradient of the loss function with respect to all the network weights. The CNN-based prediction model here was trained using a modified backpropagation method. The equations below compute the input and output deviations. A low deviation indicates values close to the mean (expected value), while a high deviation indicates values spread over a wide range. Weights and biases are essential elements of a neural network: during transmission between neurons, the weight is applied to the inputs, which are passed into the activation function to find patterns that predict heart disease.
Fig. 44.2 Convolutional neural network architecture
Input deviation of the nth neuron:

dI^0_n = (y_n − t_n) φ′(ϑ_n) = φ′(ϑ_n) dO^0_n    (44.5)

Outcome deviation of the nth neuron:

dO^0_n = (y_n − t_n)    (44.6)
Bias and weight variation of the nth neuron:

ΔBias^0_n = dI^0_n    (44.7)

ΔWeight^0_{n,p} = dI^0_n · y_{n,p}    (44.8)
Input deviation of the nth neuron in hidden layer L:

dI^L_n = φ′(ϑ_n) dO^L_n    (44.9)
Outcome deviation of the nth neuron in hidden layer L:

dO^L_n = Σ_i dI^0_i W_{i,n}    (44.10)
Bias and weight difference in row p, column q of the kth feature pattern, for the layer in front of the n neurons in hidden layer L:

ΔWeight^{L,n}_{k,p,q} = dI^L_n · y^k_{p,q}    (44.11)

ΔBias^L_n = dI^L_n    (44.12)
Input deviation of row p, column q in the subsampling layer s and kth feature pattern:

dI^{s,k}_{p,q} = φ′(ϑ_n) dO^{s,k}_{p,q}    (44.13)
Output deviation of row p, column q in the subsampling layer s and kth feature pattern:

dO^{s,k}_{p,q} = Σ_n dI^L_n W^{L,n}_{k,p,q}    (44.14)
Bias and weight difference in row p, column q in the subsampling layer s and kth feature pattern:

ΔWeight^{s,k} = Σ_{p=0}^{f_h} Σ_{q=0}^{f_w} dI^{s,k}_{⌊p/2⌋,⌊q/2⌋} O^{c,k}_{p,q}    (44.15)
where c denotes a convolution layer.

ΔBias^{s,k} = Σ_{p=0}^{f_h} Σ_{q=0}^{f_w} dO^{c,k}_{p,q}    (44.16)
Input deviation of row p and column q in the nth feature pattern and convolutional layer c:

dI^{c,n}_{p,q} = φ′(ϑ_n) dO^{c,n}_{p,q}    (44.17)
Output deviation of row p and column q in the kth feature pattern and convolutional layer c:

dO^{c,k}_{p,q} = dI^{s,k}_{⌊p/2⌋,⌊q/2⌋} W^n    (44.18)
Fig. 44.3 CNN flow chart
Weight variation of row r, column c of the kth core, corresponding to the nth feature pattern in the convolutional layer:

ΔWeight^{n,k}_{r,c} = Σ_{p=0}^{f_h} Σ_{q=0}^{f_w} dI^{c,n}_{p,q} O^{i−1,k}_{p+r,q+c}    (44.19)
The complete bias variation of the convolutional layer core:

ΔBias^{c,n} = Σ_{p=0}^{f_h} Σ_{q=0}^{f_w} dI^{c,n}_{p,q}    (44.20)
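As a concrete numeric check of the idea behind Eqs. (44.5)-(44.8) for a single sigmoid output neuron: the output deviation (y − t) is scaled by the activation derivative to get the input deviation, which directly yields the bias and weight gradients. Inputs, weights, and the target below are arbitrary illustrative values.

```python
# Verify dI = (y - t) * phi'(v) against a finite-difference gradient of
# the squared-error/2 loss.
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

x, t = np.array([0.5, -1.0]), 1.0       # input and target (illustrative)
w, b = np.array([0.3, 0.7]), 0.1        # weight and bias

v = w @ x + b
y = sigmoid(v)
dO = y - t                               # Eq. (44.6)
dI = dO * y * (1 - y)                    # Eq. (44.5); phi'(v) for sigmoid
grad_b, grad_w = dI, dI * x              # Eqs. (44.7)-(44.8)

eps = 1e-6
num = (0.5 * (sigmoid(w @ x + b + eps) - t) ** 2
       - 0.5 * (sigmoid(w @ x + b - eps) - t) ** 2) / (2 * eps)
print(np.isclose(grad_b, num))  # True: analytic and numeric match
```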
The convolutional neural network is beneficial because of its high discriminative power. Figure 44.3 represents the flow chart of the proposed method:
(a) In the data collection stage, the dataset is given as input to the model.
(b) The data preprocessing stage focuses on noise reduction and feature selection.
(c) In the data mining stage, the convolutional neural network is applied to the dataset for processing.
(d) In the pattern evaluation stage, performance analysis is executed to calculate the outcomes.
(e) In the knowledge discovery stage, the level of heart disease is discovered.
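The five stages above can be sketched as a small scikit-learn pipeline on stand-in data. The chapter's CNN is replaced here by a generic classifier purely to show the flow; all names and parameters are illustrative.

```python
# Pipeline mirroring stages (a)-(e): data, preprocessing + feature
# selection, mining (classification), and evaluation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=13, random_state=3)  # (a)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

pipe = Pipeline([
    ("scale", StandardScaler()),                  # (b) preprocessing
    ("select", SelectKBest(k=8)),                 # (b) feature selection
    ("clf", LogisticRegression(max_iter=1000)),   # (c) mining stage
])
pipe.fit(X_tr, y_tr)
pred = pipe.predict(X_te)                         # (e) predicted status
print("accuracy:", accuracy_score(y_te, pred))    # (d) pattern evaluation
```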
44.6 Results and Discussion
The experimental results of the CNN and the existing models are used to evaluate model performance. Several standard metrics, namely accuracy, precision, recall, and F1-score, have been measured and compared graphically (Fig. 44.4). Accuracy is the fraction of true positives and true negatives over the total number of classifications. Precision shows the closeness between the actual values and the measurements. The F1-score is the mean of precision and recall. Recall shows the model's performance across the classification models and is the fraction of relevant instances retrieved. The metrics are built from true positives (TP, correctly predicted occurrences), true negatives (TN, correctly predicted non-occurrences), false positives (FP, incorrectly predicted occurrences), and false negatives (FN, incorrectly missed occurrences). These are needed to measure the accuracy outcomes; with this metric evaluation, heart disease prediction can be achieved accurately.
Accuracy = (TP + TN) / (TP + FP + TN + FN)    (44.21)

Precision = TP / (TP + FP)    (44.22)

Recall = TP / (TP + FN)    (44.23)

F1-score = 2 · Precision · Recall / (Precision + Recall)    (44.24)
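Eqs. (44.21)-(44.24) are direct arithmetic over the confusion-matrix counts; the counts below are hypothetical, chosen only to show the computation.

```python
# Evaluation metrics from a hypothetical confusion matrix.
TP, TN, FP, FN = 90, 85, 10, 15

accuracy = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3))  # 0.875 0.9 0.857 0.878
```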
[Bar chart comparing accuracy, precision, recall, and F1-score for the Gradient Boosted Tree, Voting, Naïve Bayes, HRFLM, and CNN models]
Fig. 44.4 Result and analysis
Machine learning builds a mathematical model based on sample data. It discovers new knowledge from the dataset and develops a system that automatically adapts and customizes itself to individual users. Finding the best-performing model in comparison with existing models is the primary emphasis of machine learning approaches. The convolutional neural network was offered as a method for predicting heart disease because of its high accuracy and low classification error.
Figure 44.4 compares all the experimental results with the existing models; CNN achieved the highest accuracy. The convolutional neural network is trained with four fully connected layers and 91 feature maps; the fully connected layers contain hidden layers of 1024 units each. System performance is calculated by changing the number of convolutional layers: the convolutional layers are increased from 2 to 5 with similar feature maps to note their impact, with no change made to the fully connected layers. The best result, 96% accuracy, is accomplished with four convolutional layers.
44.7 Conclusion
In this study, a convolutional neural network algorithm is used to predict the presence of heart disease. With this neural network, heart disease can be detected at minimal cost and with low time consumption. The model is evaluated on a heart disease dataset containing thirteen features and one class label. A modified CNN backpropagation algorithm is used, which outputs the patient's health condition (heart disease present or not). Although most prior research used various machine learning algorithms, the convolutional neural network gave the best results compared with the Gradient Boosted Tree, Voting, Naïve Bayes, and hybrid random forest linear model. Future research may combine different machine learning techniques and new feature selection methods, and add more diseases, to predict the risk of patients suffering from a heart attack.
References
1. Waris, S. F., & Koteeswaran, S. (2021). Heart disease early prediction using a novel machine
learning method called improved K-means neighbor classifier in python. Materials Today
Proceedings
2. Sinha, S., Lee, E. W., Dori, Y., & Katsuhide, M. (2021). Advances in lymphatic imaging and
interventions in patients with congenital heart disease. Progress in Pediatric Cardiology.
3. Doppala, B. P., Bhattacharyya, D., Chakkravarthy, M., & Kim, T. H. (2021). Hybrid machine
learning approach to identify coronary diseases using feature selection mechanism on heart
disease dataset. Distributed and Parallel Databases.
4. Rani, P., Kumar, R., Ahmed, N. M. O., & Jain, A. (2021). A decision support system for heart
disease prediction based upon machine learning. Journal of Reliable Intelligent Environments.
5. Bazoukis, G., Stavrakis, S., Zhou, J., et al. (2020). Machine learning versus conventional clinical
methods in guiding management of heart failure patients—A systematic review. International
Journal of Recent Advances in Multidisciplinary Topics.
6. Patel, W. D., Vala, B., & Parekh, H. (2021). An advanced cognitive approach for heart disease
prediction based on machine learning and internet of medical things (IoMT). In Proceedings of
the Second International Conference on Information Management and Machine Intelligence.
7. Al-Yarimi, F. A. M., Munassar, N. M. A., et al. (2020). Feature optimization by discrete weights
for heart disease prediction using supervised learning. Methodologies and Application.
8. El Hamdaoui, H., Boujraf, S., El Houda Chaoui, N., & Maaroufi, M. (2020). A clinical
support system for prediction of heart disease using machine learning techniques. In 2020
5th International Conference on Advanced Technologies for Signal and Image Processing
(ATSIP).
9. Kamalapurkar, S., & Samyama Gunjal, G. H. (2020). Online portal for prediction of heart
disease using machine learning ensemble method (PrHD-ML). In 2020 IEEE Bangalore
Humanitarian Technology Conference (B-HTC).
10. Spencer, R., Thabtah, F., Abdelhamid, N., & Thompson, M. (2020). Exploring feature selection
and classification methods for predicting heart disease.
11. Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using
hybrid machine learning techniques. In Smart Caching, Communications, Computing and
Cybersecurity for Information-Centric Internet of Things.
12. Einarson, T. R., Acs, A., Ludwig, C., & Panton, U. H. (2018). Prevalence of cardiovascular
disease in type 2 diabetes: A systematic literature review of scientific evidence from across the
world in 2007–2017. Cardiovascular Diabetology, 17 (1), 1–19.
13. Abiko, M., Inai, K., et al. (2018). The prognostic value of high sensitivity cardiac troponin T
in patients with congenital heart disease. Journal of Cardiology.
14. Shanmugasundaram, G., Malar Selvam, V., Saravanan, R., & Balaji, S. (2018). An investigation
of heart disease prediction techniques. In 2018 IEEE International Conference on System,
Computation, Automation and Networking (ICSCA).
15. Kuraparthi, S., & Reddy Madhavi, K. (2021). Brain tumor classification of MRI images using
deep convolutional neural network. Traitement du Signal, 38(4), 1171–1179. https://doi.org/
10.18280/ts.380428
16. Reddy Madhavi, K., et al. (2021). Cardiac arrhythmia detection using dual-tree wavelet
transform and convolutional neural network. Soft Computing.
Chapter 45
Applying Machine Learning to Enhance
COVID-19 Prediction and Diagnosis
of COVID-19 Treatment Using
Convalescent Plasma
Lavanya Kongala, Thoutireddy Shilpa, K. Reddy Madhavi,
Pradeep Ghantasala, and Suresh Kallam
Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the source of coronavirus disease (COVID-19), has sparked a universal health catastrophe. So far, there are no approved prophylaxis solutions for those who have been exposed to SARS-CoV-2, nor therapies for individuals who have acquired COVID-19. Immune, i.e., "convalescent," plasma refers to plasma obtained from persons after infection resolution and antibody production. Passive administration of antibodies through convalescent plasma transfusion can offer a short-term strategy for conferring instantaneous resistance on liable participants. Convalescent plasma was also used in the COVID-19 epidemic; limited records from China suggest clinical benefit, together with radiological improvement, viral load diminution, and enhanced survival. Internationally, blood centers have a robust infrastructure for storing and building a catalog of convalescent plasma to meet increasing demand.
45.1 Introduction
Viruses of the Coronaviridae family [1] have a positive-sense, single-stranded RNA genome of 26-32 kilobases. Coronaviruses have been identified in countless hosts, including mammals such as mice, camels, dogs, cats, and bats, and, much more recently, pangolins.
L. Kongala (B)
Department of Computer Science, Vignan Nirula Institute of Technology and Science for Women,
Guntur, Andhra Pradesh, India
e-mail: sailavanya45@gmail.com
T. Sh i l pa
Department of CSE, B V Raju Institute of Technology, Narsapur, Medak, Telangana, India
K. Reddy Madhavi ·S. Kallam
Department of CSE, Sree Vidyanikethan Engineering College, Tirupati, India
P. Ghantasala
Chitkara University Institute of Engineering and Technology, Chandigarh, Punjab, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_45
479
Some coronaviruses are pathogenic to humans but cause mild or asymptomatic symptoms. On the other hand, two lethal diseases have emerged within this family in the last two decades: severe acute respiratory syndrome (SARS) coronavirus [2], and Middle East respiratory syndrome (MERS) coronavirus. They present extreme fever (89%), unproductive cough (75%), myalgia (55%), and dyspnea (60%), with a high rate of admission to an intensive care unit. A novel member of the Coronaviridae family linked to severe pneumonia was found in December 2019 in Wuhan, China. Patients had clinical results similar to SARS-CoV and MERS-CoV, with high fever, dyspnea, and chest X-rays showing insidious multilobed lesions. Originally named the 2019 novel coronavirus (2019-nCoV), the virus is currently known as SARS-CoV-2, the cause of coronavirus disease 2019 [3].
COVID-19 mortality rates [4] are higher than seasonal influenza mortality rates, which are generally below 0.1%. Patients with any comorbidity or of older age have poorer medical outcomes than younger or non-comorbid patients, and as the number of comorbidities increases, the course of the disease worsens. Smoking, chronic obstructive pulmonary disease, hypertension, diabetes, cardiovascular disease, and malignancy are stated to be risk factors for extreme COVID-19 and increased mortality. While medical trials have been performed concerning therapeutics and vaccine production, there are currently no approved COVID-19 vaccines or treatments. Finding a cure within a limited period is significant, given the elevated mortality rate and the lofty number of severe cases.
Passive antibody treatment entails administering antibodies against a specific agent to a susceptible person to prevent or treat an infectious disorder related to this agent. Active vaccination, on the contrary, necessitates stimulation of an immune reaction that takes time to develop. Therefore, passive administration of antibodies is the only means of conferring immediate immunity on susceptible individuals. Passive antibody treatment has a tradition dating to the 1890s and, until antimicrobial therapy evolved around the 1940s, was the solitary means of treating several infectious diseases.
Today, disease management is difficult, and the lack of clinical records with antiviral agents is the norm. Lopinavir/ritonavir [5] treatment approaches have failed to show a decrease in overall mortality. The latest randomized medical study of hydroxychloroquine demonstrated a decline in body temperature and cough remission relative to controls in the intervention community. Nevertheless, the limited sample size and the brief follow-up duration prevent conclusions about its efficiency. Other work indicates azithromycin plus hydroxychloroquine might decrease epidemiologic load, but the medical response linked to this method has not been calculated and remains to be established. Recently this mixture was related to inferior results when administering hydroxychloroquine at high doses. There is thus no efficient or safe medication for COVID-19 management.
Recognizing the deficiency in support for the classic and historical diagnosis of COVID-19, and with vaccines still in development, passive methods have been considered as possibilities for disease prevention. That is the case with convalescent plasma, a passive inoculation procedure used since the early twentieth century in the prevention and management of contagious diseases. Convalescent plasma (CP) [6] is retrieved through apheresis from survivors of previous pathogen-caused infections, in whom antibodies to the disease's causative agent have been produced.
The critical goal of CP therapy is to neutralize the pathogen [7]. Because of its speedy acquisition, CP has been considered a disaster response in many pandemics in recent years, including SARS-CoV, Spanish flu, Ebola, and West Nile virus [8]. CP administered early after the onset of symptoms showed a decrease in the death toll, in contrast to placebo or no treatment, for severe acute viral respiratory infections such as influenza and SARS-CoV. Still, a comparable response was not observed in Ebola virus disease.
Specific proteins, for example anti-inflammatory cytokines, clotting factors, natural defensins, antibodies, and other as-yet-unidentified proteins, are acquired from donors during apheresis [9], in addition to neutralizing antibodies. In that respect, CP passage to infected patients can afford additional advantages, including immune modulation through attenuation of an extreme inflammatory response. The latter may be the case with COVID-19 [10], where a systemic "cytokine storm," or hyperinflammation induced by IL-1β, IL-2, IL-6, IL-8, IL-17, CCL2, and TNF-α, results in over-activation of the immune system. This inflammatory reaction can exacerbate fibrosis and damage pulmonary capacity. Here we present the possible profitable pathways for administration of CP to COVID-19 patients and review the evidence for that strategy in today's pandemic.
Secretory IgA, the dominant immunoglobulin isotype on mucosal surfaces, is a crucial player in controlling respiratory viral infections. It is made up of two IgA molecules, a joining protein (J chain), and a secretory component. IgM and IgA are actively transported across the epithelium via the polymeric Ig receptor or the neonatal Fc receptor, although IgG can passively migrate into alveolar fluids. The lung needs specific antiviral IgG2a for defense in bronchial fluids and terminal alveoli. Because of the COVID-19 pandemic emergency [11], this study summarizes historical application situations and analyzes existing know-how for the collection, manufacture, pathogen inactivation, and funding of convalescent blood products, with a particular emphasis on potential COVID-19 applications.
45.2 Convalescent Plasma as a Potential COVID-19 Therapy
Convalescent plasma (CP) therapy encompasses administering the immunoglobulin-containing plasma of a freshly recovered patient from a particular infection, such as SARS-CoV-2, to a person prone to or infected with a severe disease like COVID-19, for prophylaxis and treatment purposes. Immune plasma works by binding directly to a specific pathogen such as SARS-CoV-2 and inducing its neutralization, ultimately eliminating it from the peripheral bloodstream, whereas further antibody-mediated pathways, like the complement mechanism, antibody-dependent phagocytosis, and cytotoxicity, may also contribute to the therapeutic effects attained (Fig. 45.1).
Fig. 45.1 The antiviral reactions
In the absence of any approved medication or treatment, convalescent plasma has historically been used in Machupo virus eruptions, Lassa fever, Junín virus, and a few others. Previously, convalescent plasma treatment was used successfully in healing outbreaks of the SARS, MERS [12], and Ebola viruses. Several readings indicate convalescent plasma therapy has been operative in the treatment of influenza H5N1, avian flu, and H1N1. The usage of pooled plasma or immune globulins derived from convalesced West Nile encephalitis patients has confirmed a protective result in infected mice as well as a clinical advantage in patients.
In a further meta-analysis reading of ten patients, seven reported cases of COVID-19 and three patients with viral load not observable but with symptoms were treated with transfusion of about 200 mL of immune plasma with 1:640 neutralizing antibody titers, besides antiviral products and methylprednisolone; post-transfusion tests revealed full symptom resolution, with titers reaching 1:700.
Of these, nine patients were treated with Arbidol monotherapy or Remdesivir, Peramivir, or Ribavirin combination therapy, while one patient received Ribavirin monotherapy. Six of these also received intravenous methylprednisolone. In the same analysis, comparative results were reported for ten further patients who did not undergo plasma therapy in conjunction with the corticosteroid methylprednisolone and antiviral medicines, in which it was found that three patients passed away. In contrast, six others remained in stable condition, and one case in the control group showed symptom persistence, resulting in a higher mortality rate of around 30%.
45.3 Convalescent Plasma in the Healing of COVID-19
In the latest global outbreak, CP was used to treat patients with COVID-19 in China. In studies carried out by Shen et al., five critically ill COVID-19 patients who were refractory to antiviral and hormone therapy received 400 mL of CP from five separate donors. All donors had an antibody titer of about 1:1000 by SARS-CoV-2-specific ELISA and a neutralizing antibody titer over 40. In four (80%) of the five patients, after CP transfusion, body temperature stabilized within three days; the Sequential Organ Failure Assessment score decreased; PaO2/FiO2 increased within 12 days (ranges 172-276 and 284-366); viral load decreased and turned negative within 12 days; and ELISA and neutralizing antibody titers increased.
After 13 days, severe acute respiratory distress syndrome improved in five individuals (85%); two weeks later, four patients were extubated; three of the five patients (60%) were discharged from the hospital, and the other two patients remained stable 37 days later. In another report, researchers treated ten critically ill COVID-19 patients on antiviral and steroid treatment with a 200 mL dose of CP, and investigators prospectively compared signs and laboratory results four days after infusion with CP. All patients tolerated CP well. It considerably increased neutralizing antibodies [13]; viremia vanished in 7 days; clinical symptoms rapidly resolved within three days. There was an increase in the number of lymphocytes and in SaO2; on radiological inspection, lung lesions were confirmed to have improved considerably within days. Although these experiments include a limited number of patients, available evidence indicates that CP administration is safe and diminishes the viral load (Fig. 45.2).
Fig. 45.2 Chest images of various patients recovering from COVID-19
Fig. 45.3 Immunomodulatory effects
45.3.1 Convalescent Plasma: The Principle of Action
The exact CP mechanism in COVID-19 has not yet been established. Past research, however, showed that in other viral infections, like Ebola virus disease and respiratory syncytial virus [14], CP's action mechanisms are predominantly virus-neutralizing. Other pathways are antibody-induced cellular cytotoxicity, complement activation, and phagocytosis. The neutralizing antibodies supplied with CP provide viral load regulation; additionally, non-neutralizing antibodies help prophylaxis [15] and improve recovery (Fig. 45.3).
45.3.2 Collection Steps of Convalescent Plasma
It is imperative to control each particular phase of CP collection. From donor consideration to patient administration of CP, all measures should be cautiously coordinated and carried out by professional health employees.
(a) Eligibility for a Donor
(b) Donor Pre-donation Appraisal
(c) Recruiting donors
(d) Convalescent plasma processing at aphaeresis centers.
(a) Eligibility for a Donor: The eligibility requirements for CP donors can differ across countries. According to the FDA, individuals who meet the following criteria may be CP donors:
(i) Passed blood donor checks, recovered from COVID-19, and is appropriate for donation;
(ii) COVID-19 confirmed by a laboratory inspection via a diagnostic examination, for instance a nasopharyngeal swab while sick, or, after recovery, a positive serological check for SARS-CoV-2 antibodies if diagnostic testing was not carried out at the time COVID-19 was suspected;
(iii) Full symptom resolution no later than 14 days before donation;
(iv) A male donor, or a female donor who has never been pregnant, or a female donor who has been screened for HLA antibodies since her latest pregnancy with negative results;
(v) Neutralizing antibody titers of at least 1:190 are recommended when measurement of neutralizing antibody titers is available. A 1:90 titer can be considered appropriate if no alternative matched unit is available.
(b) Donor Pre-donation Appraisal: Real-time reverse transcriptase PCR [17] is currently the preferred examination for coronavirus detection. RNA detectability, however, was imperfect in samples collected before day 6 and on days 15-40. For this cause, the pre-donation investigation for coronavirus must be accompanied by antibody detection checking. Any donor must meet the eligibility requirements, and all the tests needed for regular blood donation must be assessed. Women donors who have ever been pregnant should be monitored for HLA antibodies to reduce the danger of severe transfusion-related lung damage.
(c) Recruiting Donors: Blood centers, in conjunction with local hospitals, will play a role in attracting contributors. In Turkey, therapeutic apheresis centers approved by the Ministry of Health and the Turkish Red Crescent are carrying out donor actions to receive CP.
(d) Convalescent Plasma Processing by Apheresis Centers: Providers who complete the pre-donation evaluation successfully are admitted to the apheresis centers. To collect larger volumes in short periods, CP should be acquired through apheresis. Depending on the donor's total blood volume, circa 200-600 mL of plasma can be achieved through apheresis appliances; in each procedure, the collected plasma volume excluding anticoagulant solution does not exceed 750 mL. With the donor's endorsement, a further plasma donation session might be scheduled; the time between contributions will fluctuate between nations. CP can be processed for freezing, or applied without freezing within 6 h; a freeze will begin within the first 6 h after completion of the apheresis cycle. For traceability, plasma ingredients should be marked with the ISBT 128 encoding scheme. The gathered products should be tagged as individually separated constituents of about 200 mL, each specified as 1 unit. Barcoded goods should be kept in a separate storage cabinet at or below −18/−25 °C.
45.4 The Convalescent Plasma Dose
CP dosing is highly variable in the preceding studies. One plasma unit (200 mL) was designed for use in prophylaxis in clinical trials, and one or two units were prepared for treatment. The period of effectiveness of the antibodies is uncertain, but it is predicted to last from weeks to several months. In prior use of CP therapy in SARS, 5 mL per kg of plasma at a 1:160 titer was used. A part or parts of the therapeutic dose was used in earlier trials for prophylactic purposes.
By normal equivalence, 3250 mL/kg of plasma employing a titer of > 1:65 would provide an immune globulin level equivalent to a quarter of 5 mL/kg of plasma at a 1:160 titer. Dosing by body weight should be used in pediatric transfusions [18]. In the pediatric age group, COVID-19 is seldom symptomatic; therefore, each method in this age group should be carried out in conjunction with state and global health authorities within the framework of clinical research (Fig. 45.4).
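If one assumes, as a simplification, that the delivered antibody amount scales with volume × titer, the dose-equivalence arithmetic behind these figures can be sketched as follows. The numbers are illustrative, not a clinical recommendation.

```python
# Hedged sketch of titer-based dose equivalence: a higher-titer unit
# supplies the same antibody amount at a smaller volume per kg.
def equivalent_volume(ref_ml_per_kg, ref_titer, new_titer):
    """Volume per kg of the new unit giving the same antibody amount."""
    return ref_ml_per_kg * ref_titer / new_titer

# 5 mL/kg at a 1:160 titer matches 1.25 mL/kg at a 1:640 titer
print(equivalent_volume(5, 160, 640))  # 1.25
```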
Patient Selection: Numerous current medical examinations have very disparate admissibility requirements, spanning the severely affected and individuals post exposure. The selection of participants will diverge across nations. The FDA [19] permitted using CP in patients meeting the requirements below:
Laboratory confirmation of COVID-19 [6].
Severe or immediately life-threatening COVID-19.
Serious disease is defined as one or more of the following:
Dyspnea,
Tachypnea ≥ 30/min,
Blood oxygen saturation ≤ 93%,
PaO2/FiO2 < 300,
Pulmonary infiltrates > 50% within 24-48 h.
Fig. 45.4 Convalescent plasma
Life-threatening illness is defined as one or more of the following:
Respiratory failure,
Septic shock,
Dysfunction of multiple organs (Fig. 45.5).
Fig. 45.5 Work flow of potential regenerative blood products (CBP)
(a) Possible Risks
Transfusion of human plasma is a normal, routine activity in modern hospitals. Anti-SARS-CoV-2 human plasma differs from normal plasma only in that antibodies to SARS-CoV-2 are present. Donors must meet all blood donation criteria based on the eligibility of voluntary contributors under federal and state regulations, and plasma will be obtained at FDA-approved blood centers.
The transfusion threats to beneficiaries will therefore be no different from those to typical plasma recipients. The menace of transfusion-transmitted infection is very low in the USA and other high-income nations; approximations usually alluded to are fewer than one infection with HIV, hepatitis B, or hepatitis C per two million donations. Non-infectious transfusion hazards are also present, for example allergic transfusion reactions, transfusion-associated circulatory overload (TACO), and transfusion-related acute lung injury (TRALI). While the probability of TRALI is typically less than one for every 5000 transfused units, TRALI [20] is of scrupulous concern in extreme COVID-19 because of possible pulmonary endothelium priming.
Nonetheless, regular donor screening involves HLA antibody screening of female donors with a pregnancy background to reduce the menace of TRALI. Remember that TACO risk factors, e.g., cardiovascular [21] illness, old age, and kidney failure, are shared with those at risk of severe COVID-19, illustrating the requirement for close control of the amount of fluid.
(b) Plasma Preparation Process and Quality Control
Apheresis [22] was performed using a Baxter CS-300 cell separator. Convalescent
[23] plasma was harvested from 40 donors for the study. The median age
was 42.0 years (IQR 32.5–49 years). Based on age and body weight, a 250–450-mL
ABO-compatible plasma sample was collected from each donor, and every sample was
split and stored at 4 °C as 200-mL aliquots without detergent or heat treatment.
The CP was then treated in a therapeutic plasma pathogen [24] inactivation cabinet for
30 min with methylene blue and light treatment.
45.5 Conclusion
The chances of infection with COVID-19 are significant. Plasma from recovered,
screened COVID-19 individuals is expected to be both a safe and a possibly
useful remedy for treatment, and for prophylaxis upon exposure. Considerable
proof of benefit from previous use against viral illness gives such an approach
a clear precedent. Nonetheless, well-controlled clinical trials are critically
essential to validate the effectiveness and thus inform evidence-based guidance.
Although plasma transfusions improve the clinical condition of critically ill patients
and decrease mortality rates, auxiliary research and controlled clinical trials
are still needed to evaluate their effectiveness and precise role in the treatment
of the novel coronavirus.
References
1. Lu, H. (2020). Drug treatment options for the 2019-new corona virus (2019-nCoV). Bioscience
Trends, 14, 69–71.
2. Cheng, Y., et al. (2005). Use of convalescent plasma therapy in SARS patients in Hong Kong.
European Journal of Clinical Microbiology and Infectious Diseases, 24, 44–46.
3. Ko, J. H., et al. (2018). Challenges of convalescent plasma infusion therapy in Middle East
respiratory corona virus infection: A single centre experience. Antiviral Therapy, 23, 617–622.
4. World Health Organization. (2019). Novel-corona virus.
5. Wang, M., Cao, R., Zhang, L., Yang, X., Liu, J., et al. (2020). Remdesivir and chloroquine effec-
tively inhibit the recently emerged novel corona virus (2019-nCoV) in-vitro. Cell Research,
30, 269–271.
6. Shen, C., Wang, Z., Zhao, F., Yang, Y., et al. (2020). Treatment of 5 critically ill patients with
COVID-19 with convalescent plasma. JAMA, 323(16), 1582–1589.
7. Hoenen, T., Groseth, A., & Feldmann, H. (2019). Therapeutic strategies to target the Ebola
virus life cycle. Nature Reviews Microbiology, 17, 593–606.
8. Luke, T. C., Kilbane, E. M., Jackson, J. L., & Hoffman, S. L. (2006). Meta-analysis: Conva-
lescent blood products for Spanish influenza pneumonia: A future H5N1 treatment? Annals of
Internal Medicine, 145, 599–609.
9. Wong, H. K., Lee, C. K., Hung, I. F., Leung, J. N., Hong, J., Yuen, K. Y., & Lin, C. K.
(2010). Practical limitations of convalescent plasma collection: A case scenario in pandemic
preparation for influenza A (H1N1) infection. Transfusion, 50, 1967–1971.
10. Su, S., Wong, G., Shi, W., Liu, J., Lai, A. C. K., Zhou, J., et al. (2016). Epidemiology, genetic
recombination, and pathogenesis of corona viruses. Trends in Microbiology, 24, 490–502.
https://doi.org/10.1016/j.tim.2016.03.00
11. Cavanagh, D. (2007). Corona virus avian infectious bronchitis virus. Veterinary Research, 38,
281–297. https://doi.org/10.1051/vetres:2006055
12. Sherer, Y., Levy, Y., & Shoenfeld, Y. (2002). IVIG in autoimmunity and cancer–efficacy versus
safety. Expert Opinion on Drug Safety, 1, 153–158. https://doi.org/10.1517/14740338.1
13. Katz, U., Achiron, A., Sherer, Y., & Shoenfeld, Y. (2007). Safety of intravenous immunoglob-
ulin (IVIG) therapy. Autoimmunity Reviews, 6, 257–259. https://doi.org/10.1016/j.autrev.2006.
08.011
14. Bloch, E. M., Shoham, S., Casadevall, A., Sachais, B. S., Shaz, B., Winters, J. L., et al. (2020).
Deployment of convalescent plasma for the prevention and treatment of COVID-19. The Journal
of Clinical Investigation. https://doi.org/10.1172/JCI138745
15. Satya Sree, K. P. N. V., Bikku, T., Mounika, S., Ravinder, N., Kumar, M. L., & Prasad, C. (2021).
EMG Controlled Bionic Robotic Arm using Artificial Intelligence and Machine Learning. In
2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)
(I-SMAC) (pp. 548–554). https://doi.org/10.1109/I-SMAC52330.2021.9640623
16. Johns Hopkins Bloomberg School of Public Health, Department of Molecular Microbiology
and Immunology, Baltimore, MD.
17. Holshue, M. L., et al. (2020). Washington state 2019-nCoV case investigation team, first case of
2019 novel corona virus in the United States. New England Journal of Medicine, 382, 929–936.
18. Toomula, S., Paulraj, D., Bose, J., Bikku, T., & Sivabalaselvamani, D. (2022). IoT and wearables
for detection of COVID-19 diagnosis using fusion-based feature extraction with multikernel
extreme learning machine. In Wearable Telemedicine Technology for the Healthcare Industry
(pp. 137–152). Academic Press.
19. Chen, L., Xiong, J., Bao, L., & Shi, Y. (2020). Convalescent plasma as a potential therapy for
COVID-19. The Lancet Infectious Diseases, 20, 398–400.
20. Bikku, T., Satya Sree, K. P. N. V., Jarugula, J., & Sunkara, M. (2022). A Novel Integrated IoT
Framework with Classification Approach for Medical Data Analysis. In 2022 9th International
Conference on Computing for Sustainable Global Development (INDIACom) (pp. 710–715).
https://doi.org/10.23919/INDIACom54597.2022.9763297
21. Duan, K., Liu, B., Li, C., Zhang, H., Yu, T., et al. (2020). The feasibility of convalescent plasma
therapy in severe COVID-19 patients: A pilot study. BMJ Yale.
22. Cao, W. C., Liu, W., Zhang, P. H., Zhang, F., & Richards, J. H. (2007). Disappearance of
antibodies to SARS-associated corona virus after recovery. New England Journal of Medicine,
357, 1162–1163.
23. Kong, L. K., & Zhou, B. P. (2006). Successful treatment of avian influenza with convalescent
plasma. Hong Kong Medical Journal, 12, 489.
24. Wong, S. S. Y., & Yuen, K.-Y. (2008). The management of corona virus infections with partic-
ular reference to SARS. Journal of Antimicrobial Chemotherapy, 62, 437–441. https://doi.org/
10.1093/jac/dkn243
Chapter 46
Analysis of Disaster Tweets Using
Natural Language Processing
Thulasi Bikku, Pathakamuri Chandrika, Anuhya Kanyadari,
Vuyyuru Prathima, and Borra Bhavana Sai
Abstract Nowadays, social media has become a crucial part of life. Twitter is a
social networking site on which people post and interact with messages known
as tweets. Officially registered users can tweet, like, and re-tweet messages. During
the emergence of a disaster or crisis, social media becomes a significant means of
communication. The widespread use of mobile phones and other forms of commu-
nication allows individuals to report and alert others about real-life disasters. Such
knowledge relating to disasters, spread over the media, could save thousands of indi-
viduals by warning others and allowing them to take the required actions. Many firms
are working on analyzing tweets and identifying tweets relating to disasters
and emergencies programmatically. Such efforts may be useful to the many people
using the internet. However, this effort faces problems such as detecting and
distinguishing disaster tweets from non-disaster tweets. The data available
on Twitter is often not structured, so processing must be done to classify the data as
'disaster' or 'non-disaster'. This paper deals with developing a model that can tell whether
a user is sharing data about a disaster. The data set used includes 10,000 tweets along
with class labels. The proposed Optimized SVM model pre-processes the data using Natural
Language Processing (NLP) and then builds the classifier model that gives maximum
accuracy.
46.1 Introduction
Twitter came into existence on July 15, 2006; in the beginning it was a messaging
service used among a small group. It expanded into a micro-blogging,
social networking service that enabled users to post writings, images, and messages
known as 'tweets'. Registered members can post, like, and re-tweet messages,
whereas unregistered users can only access and view tweets. Twitter has attained a lot
of fame since its beginning in 2006. According to data produced in 2019, on
T. Bikku (B) · P. Chandrika · A. Kanyadari · V. Prathima · B. B. Sai
Department of Computer Science and Engineering, Vignan’s Nirula Institute of Technology and
Science for Women, Guntur, India
e-mail: thulasi.bikku@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. N. K. Rao et al. (eds.), Intelligent Computing and Applications, Smart Innovation,
Systems and Technologies 315, https://doi.org/10.1007/978-981-19-4162-7_46
492 T. Bikku et al.
average, around 6,000 tweets are generated every second internationally. This
amounts to huge data every year; world-wide the total is estimated to be approximately
200 billion tweets.
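A quick arithmetic check connects the two figures: 6,000 tweets per second, sustained over a year, lands in the same ballpark as the 200 billion estimate.

```python
tweets_per_second = 6_000
seconds_per_year = 60 * 60 * 24 * 365      # 31,536,000 seconds

tweets_per_year = tweets_per_second * seconds_per_year
print(f"{tweets_per_year:,}")              # 189,216,000,000, roughly 200 billion
```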
The data generated is unstructured, as it is not in any standardized format. Therein
lies the exact challenge in analyzing the data available through Twitter, so we
make use of Natural Language Processing (NLP). NLP is used to extract keywords
[1] that imply that a tweet describes a disaster. Before applying NLP, the data must be processed:
tweets are divided into words called tokens using tokenization, and analysis
is then done on the tokenized data [2, 3].
Several methods are used to pre-process the data in order to clean
it: removal of punctuation, tokenization, removal of stop-words,
and stemming. Removal of punctuation strips punctuation marks from the data.
Tokenization splits the text into units such as characters,
words, or sentences. Stop-words are words that carry little significance, and they
are removed. Removal of prefix and suffix variants is accomplished by stemming. The
objective is to classify each tweet as belonging either to the disaster class or to the
non-disaster class. The question is a yes-or-no question: does the tweet belong
to the disaster class or not? We later check the accuracy of the system
using a confusion matrix.
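The four cleaning steps above can be sketched as a minimal pipeline. This version uses only the standard library; the tiny stop-word list and the crude suffix-stripper are deliberate simplifications standing in for the full NLTK stop-word list and Porter stemmer a real implementation would use.

```python
import re
import string

# Small illustrative stop-word list; NLTK's full English list would normally be used.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "and", "to", "at"}

def remove_punctuation(text: str) -> str:
    """Removal of punctuation: strip every punctuation mark from the tweet."""
    return text.translate(str.maketrans("", "", string.punctuation))

def tokenize(text: str) -> list:
    """Tokenization: split the text into lower-cased word-level tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens: list) -> list:
    """Stop-word removal: drop tokens that carry little significance."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token: str) -> str:
    """Crude suffix stripping standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(tweet: str) -> list:
    tokens = tokenize(remove_punctuation(tweet))
    return [stem(t) for t in remove_stop_words(tokens)]

print(preprocess("Forest fire is spreading near the village, evacuations ongoing!"))
# ['forest', 'fire', 'spread', 'near', 'village', 'evacuation', 'ongo']
```

The last token shows why a crude stemmer is only a placeholder: "ongoing" is reduced to the non-word "ongo", which a proper Porter stemmer handles more gracefully.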
46.2 Literature Survey
Natural language processing is the most crucial part of text mining, and there are
many remarkable and exemplary research works in this field. The proposed
work on natural language processing highlights the process involved in preparing
tweets for analysis [4]. Sentiment analysis of Twitter is
another well-known field that has contributed to disaster-tweet models,
and there are many works relating to it. One such paper
developed a predictive model to analyze the sentiment of tweets [5].
Lee [6] gives an overview of performing classification using the K-Neighbors classifier.
Hasan et al. [7] classified tweets about politics, movies, fake
news, fashion, humanity, and justice into three main categories (positive, negative,
and neutral); they collected tweets in Pakistan and analyzed them using classification
algorithms such as Naïve Bayes and SVM. Algorithms such as KNN, Naïve Bayes, and
Modified K-Means have also been used to perform sentiment analysis on Twitter data, classifying
it into positive, negative, and neutral classes [8], and the accuracy
results of all these algorithms were compared. Feature extraction
involves cleaning the data using tokenization, stop-word removal, and stemming;
feature selection [9-11] is then performed, followed by a comparative study of
SVM, KNN, and Naïve Bayes [1]. Text mining, feature selection, and
feature transformation for analyzing text data are performed in our model
[12, 13].
46 Analysis of Disaster Tweets Using Natural Language Processing 493
46.3 Proposed Model
Classification algorithms are used to classify the tweets into either a disaster or a
non-disaster. After running various classification algorithms [14] like MPP, KNN,
Random Forest, BPNN, SVM, K-means, SVM gave the most accurate results. Clas-
sification using optimized SVM algorithm is implemented in our proposed model as
shown in Fig. 46.1.
We utilized the SVM model in the scikit-learn package to develop the Support Vector
Machine (SVM) technique. The svm.py file in the project's repository contains the
implementation of the SVM algorithm. Simply run svm.py with Python 3 and no
command-line arguments to execute the file. There are two methods in this file: main()
and SVM(). When the file is executed, the main() method pulls in the training and testing data,
pre-processes the data, and invokes the SVM() function to run the model on the project's data.
The SVM() function is a standalone implementation of the SVM
model that accepts training and testing data as input parameters. This function also
produces a confusion matrix with a heat map, as well as the performance metrics
precision, recall, F1-score, and overall accuracy. The SVM model allows
many settings to be changed; however, we found that keeping everything at the defaults
yielded the best results in our tests. The performance metrics, and how they compare
to other implementations, can be found in the results section.
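The svm.py flow described above can be sketched as follows. The main()/SVM() split mirrors the text, but the toy data, variable names, and exact calls are assumptions rather than the project's actual code, and the heat-map plotting step is omitted to keep the sketch self-contained.

```python
# Sketch of the described svm.py flow, not the project's actual file.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def SVM(X_train, X_test, y_train, y_test):
    """Fit an SVC with default settings (the configuration that tested best)
    and report a confusion matrix plus precision/recall/F1/accuracy."""
    model = SVC()                      # defaults: kernel='rbf', C=1.0
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return confusion_matrix(y_test, y_pred), classification_report(y_test, y_pred)

def main():
    # Stand-in for loading and pre-processing the 10,000-tweet data set.
    tweets = ["forest fire near the village", "earthquake damages bridge",
              "i love this sunny day", "new song on fire, great album",
              "flood warning issued downtown", "pizza night with friends"] * 10
    labels = [1, 1, 0, 0, 1, 0] * 10   # 1 = disaster, 0 = non-disaster
    X = TfidfVectorizer().fit_transform(tweets)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                              random_state=42)
    cm, report = SVM(X_tr, X_te, y_tr, y_te)
    print(cm)
    print(report)

if __name__ == "__main__":
    main()
```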
Figure 46.1 explains the architecture of the proposed system. Here we initially collect
the data, and after collecting the data we analyze it. During the data analysis
phase we inspect, clean, transform, and model the data; we then pre-process the data,
turning raw data into an understandable format. After pre-processing the data, we
structure the model. After structuring, testing is done on the model. After testing, if
Fig. 46.1 Architecture of proposed system
we attain an efficient score, we finally implement it to produce the output; if we attain
a poor score, we pre-process the data again until we get an efficient score.
46.4 Algorithm
Input:
S_in (total number of input vectors)
S_sv (number of support vectors)
S_ft (total number of attributes in a support vector)
SV[S_sv] (support vector array)
IN[S_in] (input vector array), b*
Output:
H (output of the decision function)
for f = 1 to S_in, by 1 do
    H = 0
    for q = 1 to S_sv, by 1 do
        distance = 0
        for s = 1 to S_ft, by 1 do
            distance += (SV[q].feature[s] - IN[f].feature[s])^2
        end
        v = exp(-γ × distance)
        H += SV[q].α* × v
    end
    H = H + b*
end
The Support Vector Machine (SVM) is a supervised machine learning algorithm that
may be used for both classification and regression. However, it is best suited to
classification problems. The purpose of SVM
is to find an N-dimensional hyperplane that separates the data points
distinctly. The number of features determines the hyperplane's dimension.
The hyperplane is a line when the number of input features is
two. The hyperplane becomes a two-dimensional plane when the number of
input features is three; it becomes harder to visualize when the number
of features exceeds three.
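The decision function in the pseudocode above can be sketched in plain Python: for each input, sum the RBF kernel values against every support vector, weighted by the signed dual coefficients, and add the bias. The support vectors, coefficients, and γ below are made-up illustrations, not parameters from the paper's model.

```python
import math

def rbf_decision(x, support_vectors, alphas, b, gamma):
    """RBF-kernel SVM decision function; the nested loops mirror the
    pseudocode: squared distance over features, kernel value, weighted sum."""
    h = 0.0
    for sv, alpha in zip(support_vectors, alphas):
        distance = sum((s - f) ** 2 for s, f in zip(sv, x))  # ||sv - x||^2
        h += alpha * math.exp(-gamma * distance)  # alpha folds in the label y_i
    return h + b

# Hypothetical two-feature model with two support vectors.
svs = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [1.0, -1.0]                  # signed dual coefficients
print(rbf_decision((0.9, 1.1), svs, alphas, b=0.0, gamma=0.5))   # positive side
print(rbf_decision((-0.9, -1.1), svs, alphas, b=0.0, gamma=0.5))  # negative side
```

The sign of the returned value determines the predicted class, which is exactly how the hyperplane separation described above is evaluated at a point.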
46.5 Results
As shown in Fig. 46.2, a receiver operating characteristic curve (ROC curve) is a
graph that illustrates how well a classification model performs across all classification
thresholds. The true positive rate is plotted against the false positive rate on this curve.
This figure plots the characteristic curve for the SVM algorithm. AUC is the area under the
ROC curve; for this algorithm, it is 0.86. AUC is a composite measure of
performance that takes all potential classification thresholds into account.
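The ROC curve and AUC reported in Fig. 46.2 are typically computed with scikit-learn as below. The labels and scores here are invented for illustration; in the paper's setting, the scores would come from the fitted SVM's decision function on the test tweets.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground-truth labels and classifier scores (hypothetical values).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # 0.75 for this toy data
```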
As shown in Fig. 46.3, a receiver operating characteristic curve, often known
as a ROC curve, is a graph that depicts how a binary classifier's diagnostic
performance changes as its discrimination threshold is varied. This figure plots the
characteristic curve for the BPNN algorithm, which gives 78.4% accuracy on the
overall data, less than the SVM algorithm. The AUC of the BPNN algorithm is 0.85.
Figures 46.4 and 46.5 show the ROC curves of the Random Forest and KNN algo-
rithms; the AUC of the Random Forest classifier is 0.85, whereas the AUC of the
K-Neighbors classifier is 0.84.
Figures 46.6, 46.7, 46.8 and 46.9 show the confusion matrices of the various algorithms
used in our model. A confusion matrix is an N × N matrix used to evaluate the perfor-
mance of a classification model, where N is the number of target classes. The matrix
compares the actual target values with the machine learning model's predictions. This
provides us with a comprehensive picture of how well our classification model is
working and the types of errors it makes. For a binary classification task, we use a
2 × 2 matrix with four values, as seen in the figures. There are two possible values for
the target variable: positive or negative. The target variable's actual values are shown
in the columns, and the rows indicate the predicted values.
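A small sketch of this 2 × 2 layout with scikit-learn, using invented labels. Note that `confusion_matrix` itself returns rows as actual values and columns as predictions, so a transpose yields the orientation described above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels for a binary task.
y_actual =    [1, 1, 1, 1, 0, 0, 0, 0]
y_predicted = [1, 1, 1, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_actual, y_predicted)  # rows = actual, columns = predicted
print(cm)     # [[TN, FP], [FN, TP]] = [[2, 2], [1, 3]]
print(cm.T)   # transposed: columns = actual, rows = predicted, as in the figures
```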
The true positive value for the MPP algorithm, as derived from the confusion matrix
of Fig. 46.6, is 1538, and the true negative value is 985. The false positive and
false negative values are 323 and 417, respectively. Figure 46.7 is the confusion matrix of the KNN
Fig. 46.2 ROC curve of SVM
Fig. 46.3 ROC curve of BPNN
Fig. 46.4 ROC curve of random forest
algorithm, the true positive and true negative values are 1681 and 859, respectively,
and the false positive and false negative values are 180 and 543, respectively.
The true positive value for the BPNN algorithm, as derived from the confusion matrix
of Fig. 46.8, is 1525, and the true negative value is 1018. The false positive and
false negative values are 336 and 384, respectively.
Figure 46.9 is the confusion matrix of Random Forest algorithm, the true positive
and true negative values are 1663 and 929, respectively, and the false positive and
false negative values are 198 and 473, respectively.
Figure 46.10 shows the confusion matrix derived from the SVM algorithm. The true
positive value is 1699 and the true negative value is 918. The false positive and false
negative values are 162 and 484, respectively. Figure 46.11 shows the overall accuracy
Fig. 46.5 ROC curve of KNN
Fig. 46.6 Confusion matrix for MPP
Fig. 46.7 Confusion matrix for KNN
Fig. 46.8 Confusion matrix for BPNN
Fig. 46.9 Confusion matrix for random forest
graph, which is plotted from the accuracies of the various algorithms used. The
graph reaches its maximum at 80.2%.
Table 46.1 shows the performance metrics (precision, recall, F1 score, and the
overall accuracy) of the various algorithms used to develop the model. Precision is
one measure of a machine learning model's performance: the accuracy of its
positive predictions. Precision is the number of true positives divided by the total number of
positive predictions (i.e., the number of true positives plus the
number of false positives). The precision values of the algorithms used are listed in
the table, and the highest precision was shown by the optimized SVM algorithm (85.0%),
followed by KNN with 82.7%.
Recall is a statistic that measures how many of all actual positives were
correctly predicted. All the recall values of the algorithms
are listed in Table 46.1. The highest recall value was achieved by the BPNN algorithm. The F1 score
Fig. 46.10 Confusion matrix for SVM
Fig. 46.11 Overall accuracy graph
Table 46.1 Performance metrics of the algorithms used
Algorithm used    Precision   Recall   F1 score   Overall accuracy
MPP               75.3        70.3     72.7       77.3
KNN               82.7        61.3     70.4       77.8
Random Forest     81.4        66.4     73.1       79
Optimized SVM     85.0        65.5     74.0       80.2
K-means           30.4        4.2      7.4        54.7
BPNN              76.7        71.7     74.1       78.5
is calculated from precision and recall. The highest F1 score is for the BPNN algorithm,
which means that algorithm has the best balance between precision and recall.
The overall accuracy of all the algorithms can be seen in Table 46.1. The highest
accuracy was achieved by the optimized SVM algorithm with 80.2%.
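As a check on Table 46.1, the optimized SVM row can be recomputed from the counts reported with Fig. 46.10, provided the 918-count cell is treated as true positives and the 162 and 484 cells as false positives and false negatives, respectively (i.e., the table's metrics are reported with respect to that class):

```python
# Counts from Fig. 46.10, with the 918-count class taken as positive.
tp, fn = 918, 484
fp, tn = 162, 1699

precision = tp / (tp + fp)                    # TP / all positive predictions
recall = tp / (tp + fn)                       # TP / all actual positives
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(100 * precision, 1))   # 85.0
print(round(100 * recall, 1))      # 65.5
print(round(100 * f1, 1))          # 74.0
print(round(100 * accuracy, 1))    # 80.2
```

All four values reproduce the Optimized SVM row of Table 46.1.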
46.6 Conclusion
This research paper performed analysis and text mining on Twitter datasets. It
classified the Twitter data into two categories: disaster or non-disaster. From
the experimental findings of this paper, it can be concluded that tweets can be
classified safely with good accuracy. The highest accuracy was achieved using the
optimized SVM on the reduced data set, where the accuracy was 80.2%.
There are certain drawbacks: the model's performance was slower than that of models
with lower accuracy. So there is room for improvement; the accuracy
can be improved by improving the fusion testing, and more tweaking is also necessary.
There is a good chance of future improvement in this particular field. This paper
used SVM as the classification algorithm, as it gave the best results. Other algorithms,
such as decision trees, K-Nearest Neighbors, random forest, and fusion approaches,
could be used instead to increase accuracy; these aspects could be studied in
the future.
References
1. Khan, A., Baharudin, B., Lee, L. H., & Khan, K. (2010). A review of machine learning algo-
rithms for text-documents classification. Journal of Advances in Information Technology, 1(1),
4–20.
2. Chowdhury, G. G. (2003). Natural language processing. Annual Review of Information Science
and Technology, 37(1), 51–89.
3. Indurkhya, N., & Damerau, F. J. (Eds.). (2010). Handbook of natural language processing
(Vol. 2). CRC Press.
4. Shetty, B. (2018). Natural language processing (NLP) for machine learning. Retrieved
November 24, 2018.
5. Bagheri, H., & Islam, M. J. (2017). Sentiment analysis of twitter data. arXiv preprint arXiv:
1711.10377
6. Lee, E. (2019). Using k-nearest-neighbors (knn) machine learning technique to classify
archived helicopter wear debris data. In AIAC18: 18th Australian International Aerospace
Congress (2019): HUMS-11th Defence Science and Technology (DST) International Confer-
ence on Health and Usage Monitoring (HUMS 2019): ISSFD-27th International Symposium
on Space Flight Dynamics (ISSFD) (p. 816). Engineers Australia, Royal Aeronautical Society.
7. Hasan, A., Moin, S., Karim, A., & Shamshirband, S. (2018). Machine learning-based sentiment
analysis for Twitter accounts. Mathematical and Computational Applications, 23(1), 11.
8. Bharti, O., & Malhotra, M. (2016). Sentiment analysis on Twitter data. International Journal
of Computer Science and Mobile Computing, 5(6), 601–609.
9. Bikku, T., & Priya, A. P. A novel algorithm for clustering and feature selection of high
dimensional datasets.
10. Deepa Lakshmi, S., & Velmurugan, T. (2016). Empirical study of feature selection methods
for high dimensional data. Indian Journal of Science and Technology, 9(39), 1–6.
11. Dayanika, J., Archana, G., SivaKumar, K., & Pavani, N. (2020). Early detection of cyber attacks
based on feature selection algorithm. Journal of Computational and Theoretical Nanoscience,
17(9–10), 4648–4653.
12. Goswami, S., & Raychaudhuri, D. (2020). Identification of disaster-related tweets using natural
language processing. In International Conference on Recent Trends in Artificial Intelligence,
IOT, Smart Cities & Applications (ICAISC-2020), May 26, 2020.
13. Bikku, T., Nandam, S. R., & Akepogu, A. R. (2018). A contemporary feature selection and
classification framework for imbalanced biomedical datasets. Egyptian Informatics Journal,
19(3), 191–198.
14. Sen, P. C., Hajra, M., & Ghosh, M. (2020). Supervised classification algorithms in machine
learning: A survey and review. In Emerging technology in modelling and graphics (pp. 99–111).
Springer.
Author Index
A
Abhishek, B., 135
Aishwarya Govindkar, 169
Almukhtar, Firas Husham, 447
Anantha Rao, G., 315,361
Angadi Lakshmi, 203
Anil, K., 149
Anuhya Kanyadari, 491
Asha, S., 397,407
Avanija, J., 465
Aviral Pulast, 397
Ayyappa Swamy, K., 455
B
Badrinath, N., 327
Badugu Samatha, 179
Balaji Bhanu, B., 127
Balasundaram, A., 135,189
Bhaskar Kumar Rao, 127
Bhaskar, T., 49
Bhavya, K. R., 419
Bibhuti Bhusan Dash, 109
Biksham, V., 49
Borra Bhavana Sai, 491
Brahmananda Reddy, A., 69
D
Damodaram, A. K., 81
Deepak, V., 179
Divakar, T. V. S., 315,361
Divya Jagabattula, 339
G
Gabbireddy Keerthi, 303
Gadhiraju Tej Varma, 217
Galety, Mohammad Gouse, 431,447
Ganesh Davanam, 95
Gavini Sreelatha, 169
Gayathri, N., 235
Govinda, K., 39
H
Harini, M., 1
Hussian, Md. Asdaque, 249
I
Indraneel, S., 159
J
Jafflet Trinishia, A., 407
Jagadeesh Kannan Raju, 135,189
Janakiramaiah Bonam, 295,339
Jasthi Siva Sai, 203
Jayasree, K., 259
K
Kamala Challa, 465
Kamal Hajari, 29
Kannan, I., 281
Karthikeyan Jayaraman, 351
Karthik, S. A., 387,419
Kerenalli Sudarshana, 9
Khanjan Shah, 135
Kiran, K. V. D., 235
Kondra Pranitha, 265
Korupalli V. Rajesh Kumar, 303
Kotte Vinaykumar, 49
Krishnaveni, C. V., 265
Kurakula Arun Kumar, 351
L
Lakshmi Ramani Burra, 295,339
Lalitha, G., 387
Lavanya Kongala, 479
M
Maaroof, Rebaz Jamal, 431,447
Maganti Venkatesh, 109
Malathy, C., 59
Marni Srinu, 109
Mekala Narendar, 95
Midhun Chakkaravarthy, 265
Mohan, A., 127
Mukesh Chinta, 203
Mukkamala Namitha, 203
Mulugu Suma Anusha, 203
Muniraju Naidu Vadlamudi, 249
MylaraReddy, C., 9
N
Naga Badra Kali, M., 119
Nagarjuna Karyemsetty, 179
Nagendra Panini Challa, 119,127
Narendra Kumar Rao, B, 127,265,295
Naveen Kumar Polisetty, S., 159
Nuthalapati Sudha, 69
O
Obulakonda Reddy, R., 327
Obulesu, O., 439
P
Padyala Venkata Vara Prasad, 235
Pathakamuri Chandrika, 491
Pattan Afrid Ahmed, 189
Pavan Kumar Vadrevu, 227
Pavan Kumar, T., 95
Prabhu Gantayat, 189
Pradeep Ghantasala, 479
Praveen Tumuluru, 295
Pravinth Raja, S., 419
Priyanka Gaba, 19
R
Rabinarayan Satpathy, 109
Raghav Srinivaas, M., 135
Rajasekhar Kommaraju, 235
Ram Shringar Raw, 19
Ranjana, 265
Reddy Madhavi, K., 81,303,327,479
Riyazuddin, Y. Md., 387
Rofoo, Fanar Fareed Hanna, 431,447
Roopasri Sai Varshitha Godavarthi, 339
Routhu Ramya Dedeepya, 203
Rupa Devi, B., 303
S
Sampath Korra, 49
Sarika Jay, 189
Sarukolla Ushaswini, 169
Sasi Kumar Bunga, 227
Shaik Jani, 377
Shyam Mohan, J. S., 119,127
Sivakumar, B., 1
Sivaprakasam, S., 149
Sivaprakasam, T., 159
Soumya Gogulamudi, 339
Sowmya Eda, 339
Sreenivasa Chakravarthi, S., 81
Sree T. Sucharitha, 281
Sridhar, P., 149
Sri Krishna Adusumalli, 217,227
Subbarao Gogulamudi, 179
Subhash Chavadaki, 419
Sudhakara, M., 303,327
Sumam Mary Idicula, 259
Suneetha, K., 465
Sunil Kumar Reddy, P., 439
Sunil Kumar, B., 419
Sunil Kumar, M., 95
Suresh, K., 439
Suresh Kallam, 465,479
Syamala, K., 361
T
Thalakola Syamsundararao, 179
Thoutireddy Shilpa, 479
Thulasi Bikku, 491
U
Ujwalla Gawande, 29
V
Vaidhehi, M., 59
Valli Kumari, V., 377
Varun Kumar, K. A., 281
Veeramanickam, M. R. M., 227
Venkataramana, R., 387
Venkata Rama Raju, P., 119
Venkata Sai Satvik, 189
Venkata Subbaiah, C., 39
Venkateswara Reddy, L., 81,95
Venkateswarulu, N., 439
Vijaya Shambhavi, Y., 327
Vijaya Kumar Gudivada, 109
Vuyyuru Prathima, 491
Y
Yaswanth Raparthi, 465
Yogesh Golhar, 29
Z
Zachariah C. Alex, 455
Zameer Ahmed Adhoni, 9
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
span id="docs-internal-guid-54b35aa6-7fff-0992-ed4c-aca4d05cfcfa"> Underwater image enhancement (UIE) is an imperative computer vision activity with many applications and different strategies proposed in recent years. Underwater images are firmly low in quality by a mixture of noise, wavelength dependency, and light attenuation. This paper depicts an effective strategy to improve the quality of degraded underwater images. Existing methods for dehazing in the literature considering dark channel prior utilize two separate phases for evaluating the transmission map (i.e., transmission estimation and transmission refinement). Accurate restoration is not possible with these methods and takes more computational time. A proposed three-step method is an imaging approach that does not need particular hardware or underwater conditions. First, we utilize the multi-layer perceptron (MLP) to comprehensively evaluate transmission maps by base channel, followed by contrast enhancement. Furthermore, a gamma-adjusted version of the MLP recovered image is derived. Finally, the multi-scale fusion method was applied to two attained images. The standardized weight is computed for the two images with three different weights in the fusion process. The quantitative results show that significantly our approach gives the better result with the difference of 0.536, 2.185, and 1.272 for PCQI, UCIQE, and UIQM metrics, respectively, on a single underwater image benchmark dataset. The qualitative results also give better results compared with the state-of-the-art techniques. </span
Presentation
Full-text available
The Internet of things (IoT) describes physical objects (or groups of such objects) with sensors, processing ability, software and other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks. Internet of things has been considered a misnomer because devices do not need to be connected to the public internet, they only need to be connected to a network and be individually addressable.
Article
Full-text available
Electronic medical records (EMRs) square measure vital, sensitive personal data in aid, and wish to be often shared between peers. Blockchain Technologyfacilitates a shared, im-mutable and history of all the transactions creatingsoftwareof trust, responsibility and transparency. This provides a novel chance to implement a secure and reliable EMR knowledge management and sharing, system victimization. In this paper, we gift our views on blockchain primarily based aid knowledge management, specially, for EMR knowledge sharing between aid suppliers and for analysis studies. we have a tendency to propose a framework for managing EMR knowledge for cancer patient care. together with an Hospital, we have a tendency to enforced our framework in an exceedingly image that ensures privacy, security, convenience, and fine-grained access management over EMR knowledge. The planned paper will considerably scale back the turnaround for EMR sharing, improve higher cognitive process for medical aid, and scale back the value. Confidentiality in health industry refers to the "obligation of professionals" , World Health Organization canhave access to patient records or exchange information to carry that data in confidence. Managing electronic health data presents distinctive challenges for restrictive compliance, for moral concerns and ultimately for quality of care. As the meaningful use of Electronic Health record system expands from the health devices, its aiding organizations grow. All World Health Organization work with health data-health information processing and management professionals, doctors, researchers, business directors have responsibility to accept that data. And as patients, we've privacy rights with rele-vancy our own health data Associate in Nursing an expectation that our data be control in confidence and guarded. Confidentiality of patient medical records is of utmost importance. 
Access to patient medical records in hospital software should be restricted to the treating/admitting practitioner and their team; it must not be granted to everyone on the hospital network. One way to address this confidentiality issue is blockchain technology. Using digital signatures on blockchain-based data allows access for multiple people while regulating availability and maintaining the security of health records. Additionally, a community of individuals, including stakeholders of the healthcare industry, could join the blockchain, which can reduce fraud in payments.
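The access-control idea sketched in this abstract can be illustrated with a toy hash-chained ledger. The sketch below is an assumption for illustration, not the authors' implementation: HMAC stands in for the digital signatures mentioned above (a real deployment would use asymmetric signatures such as ECDSA), and each block links to the previous one by hash so that tampering with a recorded access grant is detectable.

```python
import hashlib
import hmac
import json

SECRET = b"clinician-key"  # assumed shared key for this demo only

def sign(record: dict) -> str:
    # Deterministic serialization so the signature is reproducible
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def make_block(record: dict, prev_hash: str) -> dict:
    block = {"record": record, "prev": prev_hash, "sig": sign(record)}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

def chain_valid(chain) -> bool:
    prev = "0" * 64
    for block in chain:
        # Each block must link to its predecessor and carry a valid signature
        if block["prev"] != prev or block["sig"] != sign(block["record"]):
            return False
        prev = block["hash"]
    return True

chain = []
prev = "0" * 64
for rec in ({"patient": "P1", "grant": "Dr.A"},
            {"patient": "P1", "grant": "oncology-team"}):
    block = make_block(rec, prev)
    chain.append(block)
    prev = block["hash"]

print(chain_valid(chain))            # True
chain[0]["record"]["grant"] = "Eve"  # tampering breaks the signature
print(chain_valid(chain))            # False
```

The append-only linkage is what lets multiple parties share the ledger while any retroactive change to an access grant is immediately visible.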
Article
Full-text available
The non-stationary ECG signal is a key tool in screening for coronary diseases. An ECG recording aggregates the activity of millions of cardiac cells, with depolarization and repolarization conducted in a synchronized manner: the P wave occurs first, followed by the QRS complex and the T wave, repeating in each beat. The signal is altered within a cardiac beat period under different heart conditions, and this change can be observed to diagnose the patient's heart status. Simple naked-eye diagnosis can mislead detection; at that point, computer-assisted diagnosis (CAD) is required. In this paper, the dual-tree wavelet transform is used as a feature extraction technique along with a deep learning (DL)-based convolutional neural network (CNN) to detect abnormal heartbeats. The findings of this research and associated studies do not depend on any cumbersome artificial environments. This work investigates the viability of using deep learning-based architectures for heartbeat classification, and the results suggest that it is feasible to train a deep learning architecture on 2D images for this task. The CNN produced the highest overall accuracy of around 99%. The proposed CAD method has high generalizability; it can help doctors efficiently identify diseases and decrease misdiagnosis.
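The CNN this abstract relies on is built from a few standard operations. The following is a minimal sketch of those building blocks (2D convolution, ReLU, and 2x2 max pooling) in pure Python; the toy "beat image" and the edge kernel are illustrative assumptions, not the paper's architecture.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D cross-correlation over nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    # Element-wise rectification: negatives are clipped to zero
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    # Non-overlapping 2x2 max pooling
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Toy 4x4 "beat image" with a vertical edge, and a matching edge kernel
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
kernel = [[-1, 1],
          [-1, 1]]
features = maxpool2(relu(conv2d(image, kernel)))
print(features)  # [[18]]: the edge produces a strong pooled response
```

Stacking many such convolution/pooling stages, followed by fully connected layers, is what lets a CNN turn a 2D beat image into a class score.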
Article
Full-text available
Data mining is a promising field that has attracted industries managing huge volumes of data, and one of its most effective and challenging techniques is data classification. The main intention of this research is to design and develop a data classification strategy based on a hybrid fusion model using a deep learning approach, the Adaptive Lion Fuzzy System (ALFS), and the Robust Grey Wolf based Sine Cosine Algorithm based Fuzzy System (RGSCA-FS). The hybrid model consists of three phases. In the first phase, the data is classified using ALFS, and the rule base of the fuzzy system is updated by optimally generating rules from the training data using adaptive lion optimization (ALA). The second step is the fuzzification process, which converts the scalar values in the training data into fuzzy values with the help of a membership function based on the Adaptive Genetic Fuzzy System (AGFS). Finally, the classified score of data instances is determined using a defuzzification process, which converts the linguistic variables into a fuzzy score. In the second phase, the data is classified using RGSCA-FS, which selects the optimal fuzzy rules. In the third phase, the data is classified using deep learning networks. The outputs from the three phases are fused together using the hybrid fusion model, for which weighted fusion is employed. The performance of the system is validated using three different datasets from the UCI machine learning repository. The proposed hybrid model outperforms existing methods with a sensitivity of 0.99, a specificity of 0.9350, and an accuracy of 0.9411.
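The fuzzification and defuzzification steps mentioned in this abstract can be sketched concretely. The example below is a hedged illustration, not the ALFS/RGSCA-FS rule base of the paper: triangular membership functions and weighted-peak (centroid-style) defuzzification are common textbook choices assumed here.

```python
def tri(x, a, b, c):
    """Triangular membership function: rises on [a, b], falls on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzification: map a scalar feature to linguistic terms
sets = {"low": (0.0, 0.25, 0.5),
        "medium": (0.25, 0.5, 0.75),
        "high": (0.5, 0.75, 1.0)}
x = 0.6
memberships = {name: tri(x, *params) for name, params in sets.items()}
# x = 0.6 is partly "medium" (0.6) and partly "high" (0.4)

# Defuzzification: centroid over the set peaks, weighted by membership,
# yields the crisp classification score
num = sum(mu * sets[name][1] for name, mu in memberships.items())
den = sum(memberships.values())
score = num / den
print(round(score, 3))  # 0.6
```

In the paper's pipeline, scores like this one from each of the three phases would then be combined by the weighted fusion step.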
Chapter
Full-text available
Presently, wearables are a vital part of the healthcare sector, able to offer exclusive insights into a person's health condition. In contrast to traditional diagnosis in a hospital environment, wearables can give unrestricted access to real-time physiological data. The COVID-19 epidemic is spreading at a fast rate while test kits remain limited; hence, it becomes essential to develop a novel COVID-19 diagnostic model. Numerous studies have applied artificial intelligence techniques to radiological images to precisely identify the disease. This chapter presents an efficient fusion-based feature extraction with multikernel extreme learning machine (FFE-MKELM) for COVID-19 diagnosis using the Internet of Things (IoT) and wearables. Primarily, the wearables and IoT are used to capture the radiological images of the patient. The presented FFE-MKELM model incorporates Gaussian filtering based preprocessing for removing the noise that exists in the radiological image. Besides, directional local extreme patterns with deep features based on the Inception v4 model are applied for the FFE process. In addition, the MKELM model is utilized as a classifier to determine the appropriate class label of the input radiological images. Moreover, the monarch butterfly optimization algorithm is applied to fine-tune the parameters involved in the MKELM model. Experimental validation of the FFE-MKELM model is performed against a benchmark dataset, and the outcomes are inspected under different measures. The resultant simulation outcome confirmed the superiority of the FFE-MKELM method by demonstrating an increased sensitivity of 97.34%, specificity of 97.26%, accuracy of 97.14%, and F-measure of 97.01%.
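The Gaussian-filtering preprocessing step this chapter applies before feature extraction can be sketched as follows. The kernel size, sigma, and the tiny test image are assumptions for illustration, written in pure Python rather than an image library.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2D Gaussian kernel as nested lists."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_blur(image, kernel):
    """Convolve with replicate padding at the borders."""
    half = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(-half, half + 1):
                for b in range(-half, half + 1):
                    # Clamp indices so border pixels reuse edge values
                    ii = min(max(i + a, 0), h - 1)
                    jj = min(max(j + b, 0), w - 1)
                    acc += image[ii][jj] * kernel[a + half][b + half]
            out[i][j] = acc
    return out

# A single noisy spike gets spread toward its neighbours
image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = gaussian_blur(image, gaussian_kernel())
print(smoothed[1][1] < 9)  # True: the spike is attenuated
```

Smoothing speckle-like noise this way is a common first step before texture descriptors (such as the directional local extreme patterns above) are computed.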
Chapter
Natural language processing (NLP) is a growing field of artificial intelligence (AI) that combines machine learning and linguistics to enable computers to understand and generate human language. Applications of NLP range from voice assistants like Apple’s Siri and Amazon’s Alexa to text summarization, machine translation, and spam filtering. NLP is particularly challenging given the complexity and hierarchical nature of human language; at the most basic level, individual words can take on subtle meanings. Fortunately, rapidly improving computing power, new tools and avenues of mass data collection, and recent improvements in NLP algorithms (large language models) have all made it possible to train computers to understand human language more efficiently and more accurately.