Enhancing QoS of Network Traffic Based on 5G Wireless Networking Using Machine Learning Approaches

December 2022

DOI:10.1007/978-3-031-22915-2_1

In book: Computational Intelligence and Smart Communication (pp.3-15)

Authors:

Pankaj Pratap Singh

IMS Engineering College

Arif Ali

DEV BHOOMI UTTARAKHAND UNIVERSITY DEHRADUN

Ritika Mehra

Phayung Meesad

Sateesh K. Peddoju

Dhajvir S. Rai (Eds.)

First International Conference, ICCISC 2022

Dehradun, India, June 10–11, 2022

Revised Selected Papers

Computational Intelligence and Smart Communication

Communications in Computer and Information Science 1672

Editorial Board Members

Joaquim Filipe

Polytechnic Institute of Setúbal, Setúbal, Portugal

Ashish Ghosh

Indian Statistical Institute, Kolkata, India

Raquel Oliveira Prates

Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil

Lizhu Zhou

Tsinghua University, Beijing, China

More information about this series at https://link.springer.com/bookseries/7899

Editors

Ritika Mehra

Dev Bhoomi Uttarakhand University

Dehradun, India

Sateesh K. Peddoju

Indian Institute of Technology Roorkee

Roorkee, India

Phayung Meesad

King Mongkut University of Technology

Bangkok, Thailand

Dhajvir S. Rai

College of Engineering Roorkee

Roorkee, India

ISSN 1865-0929 ISSN 1865-0937 (electronic)

Communications in Computer and Information Science

ISBN 978-3-031-22914-5 ISBN 978-3-031-22915-2 (eBook)

https://doi.org/10.1007/978-3-031-22915-2

© Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the

material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation,

broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information

storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now

known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication

does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant

protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are

believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors

give a warranty, expressed or implied, with respect to the material contained herein or for any errors or

omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in

published maps and institutional afﬁliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

A two-day International Conference on Computational Intelligence and Smart Commu-

nication (ICCISC 2022) was organized by the School of Computer Science and Engi-

neering at Dev Bhoomi Uttarakhand University (DBUU), Dehradun, during June 10–

11, 2022. This conference was organized in association with Springer. Our professional

partners for this conference were the ACM Jaipur Chapter and the Dev Bhoomi Uttarak-

hand University CSI Student Chapter, and it was sponsored by the Uttarakhand State

Council for Science & Technology and the Uttarakhand Science Education & Research

Centre, Dehradun, India.

The aim of the conference was to provide a platform for researchers and practitioners

from both academia and industry to meet and share cutting-edge developments in the ﬁeld

of computational intelligence and smart communication. It also focused on all aspects

of computational intelligence and data sciences with modern and emerging computational

topics.

ICCISC 2022 provided an excellent international forum to share knowledge as well

as their ﬁndings in theory, methodology, and/or applications relevant to the confer-

ence themes. The conference featured paper presentations in addition to the keynote

addresses from prominent speakers on related state-of-the-art technologies. The con-

ference beneﬁted the delegates by helping them to add to or improve their skills and

knowledge. The networking during the conference also laid foundations for possible

future collaborations. ICCISC 2022 provided an invaluable platform to raise awareness

about forthcoming innovations in diverse ﬁelds of computational intelligence and smart

communication.

The conference brought together a community of international researchers, industrial

experts, and academicians. It was not restricted to paper presentations but also paved

the way for subsequent discussions on the latest trends in research and development

linked to the conference themes and allied areas.

The conference featured keynote addresses by prominent people, including experts

from the Cloud Lab at the University of Melbourne, Australia, the Artiﬁcial Intelligence

Research Institute (IIIA-CSIC), Spain, the Multimedia Data Analytics and Processing

Research Unit at Chulalongkorn University, Thailand, and the AI and Cybersecurity

Research Centre at Staffordshire University, UK.

The main conference themes were as follows:

•Track 1: Wireless Sensor Networks and Computing Technologies

•Track 2: Networks, Security and Privacy

•Track 3: Smart Communication and Technology

•Track 4: Emerging Computing

Topics of interest included the following subthemes: block chain, deep learning,

pattern recognition, modeling and simulation, natural language processing, internet of

things, soft computing, artiﬁcial intelligence, quantum computing, cloud computing, fog

computing, cyber security, sentiment analysis, wireless sensor networks, signal process-

ing, intelligent communications and networking, software deﬁned networks, 5G net-

works, mobile and optical broadband, e-health, real time networks, satellite and space

communication, radar and microwaves, secure and energy efﬁcient networks, cognitive

radio and cognitive networks, multimedia communication, intelligent control, robotics,

and smart embedded systems.

The review process is one of the major components that governs the quality of

research being shared and the success of the event as well, thus making it a very critical

part of a conference. To maintain transparency and to ensure that high standards and

ethics of research are followed, the submission of papers was performed through the

EasyChair conference management system. This platform has an included feature for

plagiarism checks, which are performed using the Turnitin plagiarism software tool.

Papers with a plagiarism coefficient of more than 30% were rejected automatically. Papers with coefficients below 30% but above 12% were returned to the authors for revision. Papers with plagiarism coefficients below 12% in the literature review section (if any) were considered for presentation at the conference, provided they met the other norms related to plagiarism.

A single-blind review process was followed, with each paper assigned to three inde-

pendent reviewers. Most of the reviewers were external to ensure the quality standards

of the conference. The process mandatorily required three reviews to be completed—

no paper was considered for presentation unless all three reviews were received—and

efforts were made to reassign papers in cases where a reviewer declined or expressed

unavailability for the process. After receiving the reviews for a paper, it was judged on

the basis of positive reviews and comments, and wherever applicable minor or major

modiﬁcations as suggested by the reviewers were communicated to authors and the

paper(s) revised as necessary.

Only if two or more reviews were found satisfactory was the paper sent to the

General Chair for further veriﬁcation, and the acceptance or rejection of the paper was

at the sole discretion of the General Chair, whilst keeping the reviewers’ comments in

mind. The conference received 106 papers from authors for consideration and, after the

stringent review process, only 56 were shortlisted for presentation. Further, only nine

research articles were considered for publication in Springer’s CCIS series. Out of these

nine, eight are full length papers and one is a short paper. We hope that you enjoy reading

the selected papers.

October 2022 Ritika Mehra

Phayung Meesad

Sateesh K. Peddoju

Dhajvir S. Rai

Organization

General Chairs

Sanjay Bansal Dev Bhoomi Uttarakhand University, India

Preety Kothiyal Dev Bhoomi Uttarakhand University, India

Raj Kishore Tripathi Dev Bhoomi Uttarakhand University, India

Program Committee Chairs

Ritika Mehra Dev Bhoomi Uttarakhand University, India

Dhajvir Singh Rai Dev Bhoomi Uttarakhand University, India

Steering Committee

Jean-Paul Van Belle University of Cape Town, South Africa

Bhuvanesh Unhelkar University of South Florida, USA

Ankit Agarwal Northwestern University, USA

Phayung Meesad King Mongkut’s University of Technology,

Thailand

Sateesh K. Peddoju Indian Institute of Technology Roorkee, India

Waralak V. Siricharoen Silpakorn University, Thailand

S. Gomathi UK International Qualiﬁcations Limited, UK

R. C. Bansal University of Sharjah, UAE

Michael Pecht Maryland University, USA

Sachin R. Jain Oklahoma State University, USA

Ahmed Elngar Beni-Suef University, Egypt

Sanjeevi Padmanaban Aalborg University, Esbjerg, Denmark

Ahmed J. Obaid University of Kufa, Iraq

Balachandra Pattanaik Wollega University, Ethiopia

Ali Musrrat King Faisal University, Saudi Arabia

Mohammad Shoab Shaqra University, Saudi Arabia

Rhonnel S. Paculanan University of Makati, Philippines

Kourosh Ahmadi Auckland Institute of Studies, New Zealand

Rocha Alvaro University of Lisbon, Portugal

Pao Ann Hsuing National Chung Cheng University, Taiwan

Durga Toshniwal Indian Institute of Technology Roorkee, India

Kunwar Vaisla Bipin Tripathi Kumauni Institute of Technology,

India

Vishal Jain Sharda University, India

Manoj Kumar Shukla Harcourt Butler Technical University, India

Aman Jatin Amity University, Gurgaun, India

Sandeep Vijay Tula Institute, India

R. K. Bharti Bipin Tripathi Kumaon Institute of Technology,

India

Ajit Singh Bipin Tripathi Kumaon Institute of Technology,

India

Mayank Aggarwal Gurukul Kangri Vishwavidyalaya, India

Amit Aggarwal Abdul Kalam Institute of Technology, Tanakpur,

India

Pramod Kumar Krishna Engineering College, India

Vishal Kumar Bipin Tripathi Kumauni Institute of Technology,

India

S. C. Sharma Indian Institute of Technology Roorkee, India

T. S. Arora National Institute of Technology, India

Vibhash Yadav REC Banda, India

Umesh Chandra Banda University of Agriculture and Technology,

India

Anish Gupta Academy of Business and Engineering Science,

India

Vaisla Kunwar Bipin Tripathi Kumauni Institute of Technology,

India

Sudhakar Chauhan National Institute of Technology Kurukshetra,

India

Kapil Gupta National Institute of Technology Kurukshetra,

India

Ved Prakash Amity University, Haryana, India

Nilam Choudhary Jaipur Engineering College and Research Centre,

India

Baldev Singh Vivekananda Global University, India

Suresh Kumar Manav Rachna Institute of Research and Studies,

India

Das Nripendra Manipal University Jaipur, India

Vijay Bhaskar Semwal Maulana Azad National Institute of Technology,

India

Prakash S. Bharath Institute of Higher Education and

Research, India

Gaurav Verma Jaypee Institute of Information Technology, India

Surya Prakash Thapar University, India

R. Dhanasekaran Syed Ammal Engineering College, India

M. K. Sharma Amarpali Group of Institutions, India

Dinesh Goel Poornima University, India

Parma Nand Sharda University, India

Sachin Sharma Graphic Era University, India

Program Committee

Diago Galar Lulea University of Technology, Sweden

Omar H. Alhazmi Taibah University, Saudi Arabia

Anand Nayyar Duy Tan University, Vietnam

Pham Quoc Cuong HCMUT-VNUHCM, Vietnam

Felix J. Garcia Clemente University of Murcia, Spain

G. E. Alexender Cristina Universidad Technica Particular de Loja, Ecuador

Sameeka Saini Dev Bhoomi Uttarakhand University, India

Manik Sharma DAV University, India

Jeetendra Pande Uttarakhand Open University, India

Bagwari Ashish WIT Dehradun, India

Gunjan Bhatnagar Dev Bhoomi Uttarakhand University, India

Vivek Arya Gurukul Kangri University, India

Vipul Sharma Gurukul Kangri University, India

Bhawna Parihar Bipin Tripathi Kumaon Institute of Technology,

India

Poonam Chhimwal Bipin Tripathi Kumaon Institute of Technology,

India

Saurav Mishra Dehradun Institute of Technology, India

Banit Negi GBPIET, India

Alka Dikshit Himachal Pradesh University, India

Ashish Nayar IITM, India

Deepak Sharma Jagan Institute of Management Studies Delhi,

India

Jitendra Rauthan GBPIET, India

Ekta Upadhayay Dev Bhoomi Uttarakhand University, India

Varun Uniyal GBPIET, India

Sunil Mankotiya Himachal Pradesh University Shimla, India

Nishta Kapoor Rajeev Gandhi Government Degree College

Shimla, India

Anurag Jain University of Petroleum and Energy Studies, India

Shamik Tiwari University of Petroleum and Energy Studies, India

Pooja Munjal Delhi University, India

Deepesh Rawat SRHU Dehradun, India

K. C. Mishra WIT Dehradun, India

Ajit Rathor Ajay Kumar Garg Engineering College, India

Anuj Sharma Gurukul Kangri University, India

Anupama Mishra Swami Rama Himalayan University, India

D. C. Pandey Graphic Era University, India

Vivek Kumar Gupta Dehradun Institute of Technology, India

Vijay Shankar Sharma Manipal University, India

Nemi Chand Barwar Mugneeram Bangur Memorial Engineering

College, India

Additional Reviewers

Mukesh Joshi

Anvesha Katti

Purnendu Bikash Acharjee

Rajeev Kumar

Akhilesh Kumar Sharma

Gaurav Verma

Sandeep Budhani

Kapil Joshi

Gaurav Aggarwal

Anuj Kumar

Shilpa Srivastava

Aruna Pavate

Kanchan Dabre

Vaibhav Ranjan

Anupama Chadha

Ram Narayan

Sunil Kumar

Wiqas Ghai

Amit Kishor

Samya Muhuri

Atul Garg

Abhineet Anand

Mandeep Kaur

Nishant Mathur

Ahmed A. Thabit

Sanjeev Pippal

Madhulika Mittal

Vivek Arya

Jitendra Saturwar

Avdhesh Kumar Tiwari

Pooja Gupta

Gesu Thakur

Ashish Gupta

Sapana Singh

Bharti Sharma

Sridhar Iyer

Abhilasha Chauhan

Deepesh Rawat

Sateesh Kumar

Apurva Sharma

Anirudh Mangore

Pratap Singh

Ganesh Yadav

Shalini Puri

Bhagyashree Shendkar

Vineet Kumar Salar

Michael Albino

Sohit Agarwal

Naveen Tewari

Gaurav Goel

Ajesh F.

Vivek Sharma

Sonika Singh

Rajiv Kumar

Pronab Adhikari

Jefferson Costales

Rajat Goel

Rakesh Saini

Layla H. Abood

Anupama Chadha

Nipur Singh

Prashant Kumar

Kuntal Chowdhury

Yogesh Chauhan

Piyush Anand

Michael Albino

Sunil Pathak

Abid Hussain

Sanjeev Kumar

Srikanta Kumar Mohapatra

Gajanand Sharma

Gourav Bathla

Hussein Jabar Khadim

Contents

Wireless Sensor Networks and Computing Technologies

Enhancing QoS of Network Traffic Based on 5G Wireless Networking Using Machine Learning Approaches
Shivani Saini, Sharvan Kumar Garg, Pankaj Pratap Singh, Arif Ali, and Akhilesh Pandey

Soil Classification and Crop Prediction Using Machine Learning
Yuvraj Jangir, Tushar Goyal, Sumit Kandari, and Arshad Husain

Analysis of the Performance of Data Mining Classification Algorithm for Diabetes Prediction
Vijaylakshmi Sajwan, Monisha Awasthi, Prakhar Awasthi, Ankur Goel, Manisha Khanduja, and Anuj Kumar

Networks, Security and Privacy Parallel and Distributed Networks

Prediction of DDoS Attacks Using Machine Learning Algorithms Based on Classification Technique
Anupama Mishra and Deepesh Rawat

Role of Internet of Things and Cloud Computing in Education System: A Review
Ajay Krishan Gairola and Vidit Kumar

Smart Communication and Technology

An Effective Image Augmentation Approach for Maize Crop Disease Recognition and Classification
M. Nagaraju, Priyanka Chawla, and Rajeev Tiwari

Implementation of Artificial Intelligence (AI) in Smart Manufacturing: A Status Review
Akash Sur Choudhury, Tamesh Halder, Arindam Basak, and Debashish Chakravarty

Emerging Computing Computational Intelligence

Flight Fare Prediction Using Machine Learning
K. P. Arjun, Tushar Rawat, Rohan Singh, and N. M. Sreenarayanan

Impact of Work from Home During Covid-19 on the Socio-economic Status of India
Poonam Ojha, Sudhanshu Maurya, and Manish Kumar Ojha

Author Index

Wireless Sensor Networks

and Computing Technologies

Enhancing QoS of Network Traffic Based on 5G Wireless Networking Using Machine Learning Approaches

Shivani Saini1(B), Sharvan Kumar Garg1, Pankaj Pratap Singh2, Arif Ali, and Akhilesh Pandey3

1Subharti Institute of Technology and Engineering, Swami Vivekanand Subharti University,

Meerut, India

Shivanisaini792@gmail.com

2School of Computer Science and Engineering, Dev Bhoomi Uttarakhand University,

Dehradun, India

3Uttaranchal School of Computing Science, Uttaranchal University, Dehradun, India

Abstract. 5G wireless networks are based on heterogeneous networks. Heterogeneous networks offer a higher quality of service (QoS) and make better use of network resources. Traffic control becomes complicated when multiple heterogeneous networks are present: when different protocols and data transmission rates are used, heterogeneous networks face the problem of managing network traffic appropriately. Our objective in this paper is to reduce network traffic and improve QoS for 5G wireless networks, and to that end we discuss several supervised and unsupervised machine learning algorithms. We implement the K-means algorithm with the aim of reducing traffic and improving the efficiency of the 5G wireless network. K-means is an iterative clustering technique that moves data objects between clusters until the desired set of clusters is reached. The K-means algorithm divides the traffic dataset into two classes, and a weighted mean is then calculated for each cluster repeatedly until the resulting weighted means are identical from one iteration to the next. Once the two clusters keep identical weighted means, the assignment of data objects to clusters no longer changes.

Keywords: Heterogeneous network · Traffic classification · Machine learning · Supervised learning · Unsupervised learning

1 Introduction

Nowadays, 5G wireless networks run applications that demand high data rates. Heterogeneous networks (HetNets) can use different transmission power levels to carry the data traffic. Because they utilize multiple types of access nodes, they offer low power consumption, spectrum efficiency, energy efficiency, and quality of service, and they reduce greenhouse gas emissions. In addition to the conventional high-power antenna (HPN), HetNets introduce the low-power antenna. A high-power antenna can serve a large geographical area, whereas a low-power antenna serves a comparatively small one. Various layers of cells (for example macro, micro, pico, and femto cells, and relays), together with different user devices and applications, interconnect through the heterogeneous network.

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 3–15, 2022.
https://doi.org/10.1007/978-3-031-22915-2_1

In wireless telecommunications, the expression "heterogeneous network" can have a variety of meanings. It can, for example, refer to a paradigm of seamless and always-available interoperability among a variety of multi-coverage protocols (HetNet). Alternatively, the term can describe the non-uniform spatial distribution of wireless nodes or users (also known as spatial inhomogeneity) [1] (see Fig. 1).

Using the term "heterogeneous network" without providing that context can therefore cause confusion in technical writing and peer-reviewed publications. Further ambiguity may arise because the "HetNet" paradigm can also be studied from a "geometric" perspective [2].

Fig. 1. Heterogeneous network

2 Traffic Classifications

(See Fig. 2.) What does traffic classification mean, and why should we care? In a word, it is all about performance. We believe traffic classification is critical to improving the performance of your Ethernet and IP network and the user experience. That is because there is only so much bandwidth available on your network, and it has to carry a mix of traffic. Some traffic, such as voice and financial transaction applications, is critical and needs to get through as quickly as possible. Other traffic, such as Internet browsing and video streaming, may be less sensitive to latency. Then there is all the rest, which still needs to get there but can probably wait a bit. By classifying traffic before putting it on the network, you make the best use of the bandwidth you have available. For illustration, compare the unclassified case to an airplane that is only one quarter full. What is worse, you cannot fit all your passengers into first class on that airplane, so you bring in another one, fill up first class again, fly another airplane one quarter full, and so on.

Networking is the same with available bandwidth and network traffic. If you do not classify the traffic before it gets onto the network, then all of it gets treated as top priority, which does not make sense. Instead, you take certain traffic and say: this is my first-class traffic, which must get through rapidly. Next comes my business-class traffic, which should get through rapidly but can wait a little, because first class needs to get through first. What remains is my best-effort traffic, which can wait.

Fig. 2. Traffic classifications
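The three-level analogy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the application names, and the mapping between them are hypothetical and are not taken from the paper.

```python
import heapq
from itertools import count

# Hypothetical three-level scheme mirroring the first-class / business-class /
# best-effort analogy. The application-to-class mapping is an assumption.
PRIORITY = {"first": 0, "business": 1, "best_effort": 2}
APP_CLASS = {
    "voice": "first",
    "financial_transaction": "first",
    "video_streaming": "business",
    "web_browsing": "business",
}

def classify(app: str) -> str:
    """Return the traffic class of an application, defaulting to best effort."""
    return APP_CLASS.get(app, "best_effort")

def schedule(packets):
    """Order packets by class priority, keeping arrival order within a class."""
    arrival = count()
    heap = [(PRIORITY[classify(app)], next(arrival), app) for app in packets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Calling `schedule` on a mixed list of applications returns first-class traffic first, then business-class traffic, then everything else, which is the behavior the airplane analogy describes.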

3 Related Works

Internet traffic classification is divided into three approaches (see Fig. 3), as follows.

3.1 Port Based Approach

In a node, many processes run at the same time, and data that are sent or received must reach the right process. Every process in a node is uniquely identified by a port number. Suppose five processes are running on a computer and one of them sends a request to another computer. The reply must reach the process that sent the request, and this is done with the help of the port address [3]. The port number, or simply the port, is therefore called the communication endpoint. In real scenarios there are two categories of port number: fixed port numbers (for example 25 and 80) and dynamic port numbers, all drawn from the range 0–65535. UDP also uses port numbers, even though it is a connectionless service [4].
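A minimal sketch of port-based classification follows. The lookup table is a small illustrative subset of well-known port assignments, and the function name is hypothetical. It is not the classifier used in the paper.

```python
# Small illustrative subset of well-known port assignments (not exhaustive).
WELL_KNOWN_PORTS = {25: "SMTP", 53: "DNS", 80: "HTTP", 443: "HTTPS"}

def classify_by_port(src_port: int, dst_port: int) -> str:
    """Guess the application from the fixed port used by either endpoint."""
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:  # port numbers are 16-bit values
            raise ValueError(f"invalid port number: {port}")
    # The server side normally holds the fixed port, so check both ends.
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"
```

A flow whose two ports are both dynamic falls through to "unknown", which is the main limitation of this approach.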

Fig. 3. Traffic classifications

3.2 Payload Based Approach

This method identifies the application by parsing the packet content below the headers. The packet payload is parsed bit by bit to find a stream of bits that matches a known signature. When such a bit stream is found, the packet can be labeled precisely. This mechanism is regularly used to detect P2P traffic and to identify system outages [5]. The real downside of this approach is that privacy and security regulations may prevent administrators from inspecting the payload. It also places considerable complexity and processing load on the traffic identification device: because it scans the entire payload, it requires significant processing power and storage capacity [6].
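Signature matching of this kind can be sketched as below. The two signatures are illustrative examples only (a BitTorrent handshake begins with the byte 0x13 followed by the ASCII string "BitTorrent protocol", and an HTTP GET request begins with "GET "). The table and function name are hypothetical, and a real classifier would use a far larger signature set.

```python
# Illustrative signature table. The entries are examples, not the paper's set.
SIGNATURES = {
    "BitTorrent": b"\x13BitTorrent protocol",
    "HTTP": b"GET ",
}

def classify_by_payload(payload: bytes) -> str:
    """Scan the whole payload for a known byte signature."""
    for label, signature in SIGNATURES.items():
        if signature in payload:
            return label
    return "unknown"
```

Because every payload is searched in full, the cost grows with both the traffic volume and the number of signatures, which matches the processing burden described above.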

3.3 Statistical Based Classification

A pattern recognition system has two major modes of operation: training and classification. The role of the preprocessing module is to segment the required pattern from the given background, remove noise, normalize it, and perform any other operation that yields a representation of the pattern suitable for further processing. In this approach, the classification of network traffic is based entirely on connection-level patterns and network protocol behavior. The technique relies on discovering and verifying host behavior patterns at the transport layer. The advantage of this classification is that it does not require access to the packet payload [7, 8].
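As a rough illustration of payload-free classification, the sketch below derives a few per-flow statistics from packet timestamps and lengths alone. The chosen statistics and the function name are assumptions made for illustration. The paper does not specify this exact feature set.

```python
from statistics import mean

def flow_features(packets):
    """Summarize one flow given (timestamp_seconds, length_bytes) tuples."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    packets = sorted(packets)  # order by timestamp
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(packets),
        "mean_length": mean(lengths),
        "duration": times[-1] - times[0],
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }
```

Feature vectors of this kind can then be passed to any of the learning algorithms discussed later, without the payload ever being read.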

4 Work Flow of Traffic Classification in Networks

Network traffic can be classified into different traffic classes, based on any chosen parameter, as follows [9]. First, the network traffic is captured and the features of the selected data are extracted. The system is then trained on sampled data, and finally the algorithm is run and the outcome is computed (see Fig. 4).

Fig. 4. Traffic classification approaches

STEP: 1 Network Traffic Capture

Data collection is one of the most important and critical steps. This step captures network traffic in real time. There are many tools for capturing network traffic, such as Wireshark.

STEP: 2 Feature Extraction and Selection

The second step in network traffic analysis is feature extraction and selection. Features such as packet length, packet duration, inter-packet arrival time, and protocol are extracted from the data collected in the first step. The extracted features are then used to train a machine learning classifier. After the model has been trained and receives new data, it validates the data and outputs the results accordingly. The machine learning classifiers are trained and tested on the training and test data, respectively.

STEP: 3 Training Process Sampling

The third step in network traffic analysis is sampling for the training process. It involves the data sets selected for supervised learning. In supervised learning, the data are first labeled so that the trained model can classify unfamiliar network applications [10].

STEP: 4 Algorithm Implementation

The fourth step in network traffic analysis is the implementation of machine learning algorithms, that is, applying a machine learning algorithm or classifier to the instances. Supervised, unsupervised, and semi-supervised learning algorithms can be used. This article implements the supervised algorithms SVM, naive Bayes, and K-Nearest Neighbor, and the unsupervised algorithms K-Means and DBSCAN.

STEP: 5 Results and Observation

The fifth step of network traffic analysis is results and observation. Applying the machine learning algorithm yields the classifier result.
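To make the unsupervised branch of STEP 4 concrete, here is a minimal K-means sketch with two clusters that stops once the cluster means are identical between iterations. It is a sketch under stated assumptions: it works on a single numeric feature (for instance packet length), uses a plain rather than weighted mean, and picks the minimum and maximum values as starting means. None of these choices is confirmed by the paper.

```python
def kmeans_two_clusters(values, max_iter=100):
    """Split 1-D values into two clusters; stop when the means stop changing."""
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    # Deterministic initialization (an assumption, not the paper's method).
    means = [min(values), max(values)]
    for _ in range(max_iter):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - means[0]) <= abs(v - means[1]) else 1
            clusters[nearest].append(v)
        new_means = [sum(c) / len(c) for c in clusters]
        if new_means == means:  # identical means: assignments are stable
            break
        means = new_means
    return means, clusters
```

With packet lengths as input, small control-like packets and large data packets end up in separate clusters, and the loop ends as soon as a pass leaves both means unchanged.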

5 Machine Learning in 5G Network

ML algorithms use statistical techniques that allow a machine to improve with experience. The new scenarios and features of 5G network traffic described above pose many challenges to existing traffic control strategies [6]. To address these problems, machine learning can build a model directly from the data, learning without following a predefined rule set [11]. We discuss 5G traffic management from the point of view of two classes of ML algorithm: supervised learning and unsupervised learning (see Fig. 5).

Fig. 5. Machine learning workflow

6 Supervised Learning

Supervised learning is a method in which we provide the machine with a labeled data set on which it is trained to perform a future task. Supervised learning creates a global model that maps input features to desired outputs. In some cases, the mapping is implemented as a set of local models, as in case-based reasoning or nearest-neighbor algorithms.

Supervised learning has been applied in network management to tasks such as routing path selection and traffic volume prediction [12]. To solve the 5G network traffic problem with supervised learning, the following steps should be considered:

Step 1: Decide on the type of training examples.

Step 2: Assemble the training set.

Step 3: Determine the input feature representation of the learned function.

Step 4: Determine the structure of the learned function and the algorithm for learning it.

6.1 Supervised Learning Algorithm

The supervised machine learning algorithms are the following:

a) Naïve Bayes

b) Support Vector Machine

c) K-Nearest Neighbor

6.2 Naïve Bayes

Naive Bayes is one of the most common Bayesian network models used for machine learning. It applies Bayes' theorem to classify network traffic, using flow features provided as training data for the model. Naive Bayes learning copes well with noisy data and can make accurate predictions (see Fig. 6).

Fig. 6. Naïve Bayes workflow

Step 1: Read the dataset (a collection of IP addresses).

Step 2: Correlation-based feature selection (correlation-based feature subset selection looks for subsets of features that are highly correlated with the class and weakly correlated with each other).

Step 3: Naïve Bayes learner (learn the Naïve Bayes model).

Step 4: Naïve Bayes predictor (use the Naïve Bayes model to predict classes).

Step 5: Classification result.
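The learner and predictor steps above can be sketched as a small Gaussian Naive Bayes classifier. This is a minimal sketch, assuming numeric flow features that are modeled with one normal distribution per feature and class. It skips the feature selection of Step 2, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def train_naive_bayes(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((x - mu) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[label] = (len(members) / len(rows), stats)
    return model

def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior, assuming independent features."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.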

6.3 K-Nearest Neighbor

The K-nearest neighbor rule is an extension of the nearest neighbor rule. The majority class among the K nearest neighbors is the class label assigned to the new sample. The value chosen for K is significant: if K is chosen well, the classification accuracy is better than with the plain nearest-neighbor algorithm.

Networks can assign cluster values and use K-nearest neighbors to classify traffic. In the K-nearest neighbor method, K can be any integer greater than one. For each new data point to be classified, the group of its K nearest neighbors is computed.

The K-nearest neighbor method has the following steps:

Step 1: Get the data.

Step 2: Define K, the number of neighbors.

Step 3: Calculate the distance to the neighbors.

Step 4: Assign the new instance to the majority class of its neighbors.
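The four steps above can be sketched in a few lines. The example assumes Euclidean distance and uses hypothetical two-feature traffic samples with illustrative labels ("video", "web"); none of these values come from the chapter.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    # Step 3: sort training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Step 4: the most frequent label among the k neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Step 1: hypothetical labeled traffic samples.
train = [
    ((1.0, 1.0), "video"), ((1.2, 0.8), "video"), ((0.9, 1.1), "video"),
    ((5.0, 5.0), "web"), ((5.2, 4.9), "web"), ((4.8, 5.1), "web"),
]
# Step 2: K is chosen as 3 here.
print(knn_predict(train, (1.1, 1.0), k=3))
```

An odd K is used so that a two-class vote cannot end in a tie.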

6.4 Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm generally used to partition a numerical data set into different classes based on mathematical properties and characteristics. Classification aims to find a decision boundary between the classes (or, equivalently, to minimize classification errors) by maximizing the margin, i.e., the distance from the nearest samples to that boundary [13].

Classification is then performed using the hyperplane that separates the two classes. For example, if a model is needed that can decide whether an unfamiliar animal is a cat or a dog, even when a dog also has some cat-like characteristics, the SVM algorithm can be used to create such a model. The stages involved in building the SVM classifier are as follows:

Step 1: Past labeled data.

Step 2: Model training.

Step 3: Prediction.

Step 4: Output.

In a network, the model is first trained on past labeled data so that it learns the different characteristics of the data. It is then tested on new data to see how the algorithm predicts and classifies newly received samples. The classifier must be trained first and then cross-validated on validation data. To get correct predictions with the SVM classifier, an appropriate SVM kernel function must be used and its parameters tuned.
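The training and prediction stages can be illustrated with a small linear SVM. This is only a sketch under stated assumptions: it uses a linear kernel, per-sample subgradient descent on the regularized hinge loss, and hypothetical values for the learning rate, regularization strength, and training points; a kernelized SVM with tuned parameters, as described above, would normally come from a dedicated library.

```python
def train_linear_svm(samples, labels, epochs=100, lr=0.1, lam=0.01):
    """Fit weights w and bias b; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified:
                # the hinge term is active and pulls w toward y * x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is correct with enough margin:
                # only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable past labeled data.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print(svm_predict(w, b, (2.5, 2.5)), svm_predict(w, b, (-2, -2.5)))
```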

7 Unsupervised Learning Algorithm

Unsupervised learning trains a system to respond to groups of patterns in its input. It is used in self-organizing neural networks, and no teacher is needed for this training. In this learning method, similar input vectors are grouped together without using training data to indicate what a typical member of each group looks like or which group a member belongs to. During training, the neural network receives input patterns and classifies them. When a new input pattern is applied, the neural network provides an output response indicating which class the input pattern belongs to. If no class exists for the input pattern, a new class is created.

The study of network traffic management in 5G networks allows the use of traffic patterns and probabilistic modeling of traffic conditions. Applications include network planning and configuration and network traffic forecasting. Unsupervised algorithms used in networks include K-Mean and DBSCAN.

7.1 K-Mean

The K-mean is an iterative grouping technique that moves data objects between cluster sets until the desired set is reached. A high degree of similarity is achieved among the elements of a cluster, while a high degree of dissimilarity is achieved between the elements of different clusters.

a) Algorithm

The K-mean partitioning algorithm represents the center of each cluster as the mean of the objects in the cluster.

Input: K = number of clusters; D = {t1, t2, ..., tn}, a data set containing n objects.

Output: A set of K clusters.

b) Method

1) Arbitrarily select K objects from D as the initial cluster centroids.

2) Repeat:

3) (Re)assign each object ti to the cluster to which it is most similar, based on the mean value of the objects in the cluster, then update the cluster means, i.e., recompute the mean of the objects in each cluster.

4) Until no change occurs.

c) Proposed Model for Reduced Network Trafﬁc

Ts =Trafﬁc state.

K=Number of clusters.

M=weighted Mean.

i. Suppose that we are given the following traffic states to cluster: (2Ts1, 4Ts2, 10Ts3, 12Ts4, 3Ts5, 20Ts6, 30Ts7, 11Ts8, 25Ts9) and K = 2.

ii. We initially assign the means to the first two values: M1 = 2 and M2 = 4.

iii. Using Euclidean distance, initially K1 = {2Ts1, 3Ts5} and K2 = {4Ts2, 10Ts3, 12Ts4, 20Ts6, 30Ts7, 11Ts8, 25Ts9}.

iv. The value 3 is equidistant from both means, so K1 is arbitrarily chosen.

v. Now the means are recalculated to get M1 = 2.5 and M2 = 16.

vi. Objects are assigned again to the new clusters, giving K1 = {2Ts1, 3Ts5, 4Ts2} and K2 = {10Ts3, 12Ts4, 20Ts6, 30Ts7, 11Ts8, 25Ts9}. Continuing in this way, we obtain the results in Table 1.

vii. The clusters in the last two steps are identical.

viii. This yields identical means, so the clusters no longer change (Figs. 7, 8, 9 and 10) (Table 1).

Table 1. Variation in clusters

M1      M2      K1                                          K2
3       18      {2Ts1, 3Ts5, 4Ts2, 10Ts3}                   {12Ts4, 20Ts6, 30Ts7, 11Ts8, 25Ts9}
4.75    19.6    {2Ts1, 3Ts5, 4Ts2, 10Ts3, 11Ts8, 12Ts4}     {20Ts6, 30Ts7, 25Ts9}
7       25      {2Ts1, 3Ts5, 4Ts2, 10Ts3, 11Ts8, 12Ts4}     {20Ts6, 30Ts7, 25Ts9}
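The worked example can be replayed with a short script. The sketch below treats each traffic state as a single number, starts from the first two values as means, and sends a value that is equidistant from both means to the first cluster, as in step iv; the function name and the history structure are illustrative choices, not part of the proposed model.

```python
def kmeans_1d(values, means, max_iter=100):
    """Plain K-means on scalar values; ties go to the lower-indexed cluster."""
    history = []      # (means used, clusters obtained) for each pass
    clusters = None
    for _ in range(max_iter):
        new_clusters = [[] for _ in means]
        for v in values:
            distances = [abs(v - m) for m in means]
            new_clusters[distances.index(min(distances))].append(v)
        if new_clusters == clusters:
            break     # assignments unchanged, so the means are unchanged too
        clusters = new_clusters
        history.append((list(means), clusters))
        # Recompute each mean; an empty cluster keeps its previous mean.
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means, clusters, history


traffic = [2, 4, 10, 12, 3, 20, 30, 11, 25]
final_means, final_clusters, passes = kmeans_1d(traffic, [2, 4])
for used_means, found in passes:
    print(used_means, found)
print("final means:", final_means)
```

Each printed line shows the means used for one assignment pass and the clusters that result, which is the same progression as steps ii to viii: the means move from (2, 4) to (2.5, 16), (3, 18), (4.75, 19.6) and finally settle at (7, 25).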

7.2 DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm that finds dense regions of the feature space. The concepts used in the DBSCAN algorithm are Eps, core points, and MinPts: DBSCAN clusters are formed from core points through direct density-reachability and density-reachability. The workflow is to collect data with online tools, select features from the packets, and fit the model using training and testing data, followed by prediction and evaluation.

Let X = {x1, x2, x3, …, xn} be the set of data points. DBSCAN requires two parameters: ε (eps) and the smallest number of points needed to form a cluster (MinPts).

1) The algorithm proceeds by randomly selecting a point from the data set (until all points have been visited).

2) If there are at least MinPts points within radius ε of this point, all these points are considered part of the same cluster.

3) The clusters are then expanded by recursively repeating the neighborhood calculation for each neighboring point.
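The three steps above can be sketched as follows. The example is a minimal illustration that assumes Euclidean distance and visits points in index order rather than at random; the sample points and the parameter values (eps = 1.5, MinPts = 3) are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)      # None means not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # points[i] is a core point: new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also a core point: expand
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike K-mean, the number of clusters is not given in advance, and isolated points are reported as noise instead of being forced into a cluster.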

Fig. 7. Initial data (K = 2)

Fig. 8. Phase-2 (Finding the Neighbors and voting for label)

Fig. 9. Phase-3 (Finding the Neighbors and voting for label)

Fig. 10. Phase-4 (Finding the Neighbors and voting for label)

8 Conclusions

Heterogeneous networks are the foundation of 5G networks, in which network traffic plays a major role in degrading network performance. This article described ML for 5G traffic control, including supervised and unsupervised learning. Among the supervised learning algorithms, Naive Bayes provides lower accuracy because, if a value appears in the test data set that is not in the training data set, the Naive Bayes model assigns it a probability of zero and cannot make a prediction for it. The Support Vector Machine is not appropriate for large data sets because of its long training time, and its performance is poor when classes overlap. Among the unsupervised learning algorithms, we used the K-mean algorithm to reduce network traffic through clustering, which gives better accuracy.

References

1. Soldani, D., Manzalini, A.: Horizon 2020 and beyond: on the 5G operating system for a true digital society. IEEE Veh. Technol. Mag. 10(1), 32–42 (2015)

2. Liu, Y., Zhang, Y., Yu, R., Xie, S.: Integrated energy and spectrum harvesting for 5G wireless

communications. IEEE Network 29(3), 75–81 (2015)

3. Shaﬁ, M., et al.: 5G: A tutorial overview of standards, trials, challenges, deployment, and

practice. IEEE JSAC 35(6), 1201–1221 (2017)

4. Dhote, Y., Agrawal, S., Deen, A.J.: A survey on feature selection techniques for internet

trafﬁc classiﬁcation. In: 2015 International Conference on Computational Intelligence and

Communication Networks (CICN). IEEE (2015)

5. Dzulkiﬂy, S., Giupponi, L., Sai, F., Dohler, M.: Decentralized Q learning for uplink power

control. In: IEEE International Workshop on Computer Aided Modelling and Design of

Communication Links and Networks, pp. 54–58. IEEE (2015)

6. Li, R., et al.: Intelligent 5G: when cellular networks meet artiﬁcial intelligence. IEEE Wirel.

Commun. 24, 175–183 (2017)

7. Alnwaimi, G., Vahid, S., Moessner, K.: Dynamic heterogeneous learning games for oppor-

tunistic access in LTE-based macro/femtocell deployments. IEEE Trans. Wirel. Commun.

14(4), 2294–2308 (2015)

8. Challita, U., Dong, L., Saad, W.: Deep learning for proactive resource allocation in LTE-U

networks. In: European Wireless 2017- 23rd European Wireless Conference (2017)

9. Shaﬁq, M., et al.: Network trafﬁc classiﬁcation techniques and comparative analysis using

machine learning algorithms. In: 2016 2nd IEEE International Conference on Computer and

Communications (ICCC). IEEE (2016)

10. Mole, P.V.: Towards 5G Enabled Trafﬁc Management Systems: A Literature Review

11. Chih-Lin, I., Han, S., Xu, Z., Wang, S., Sun, Q., Chen, Y.: New paradigm of 5G wireless

internet. IEEE J. Sel. Areas in Commun. 34(3), 474–482 (2016)

12. Fu, Y., et al.: Artiﬁcial intelligence to manage network trafﬁc of 5G wireless networks. IEEE

Network 32(6), 58–64 (2018)

13. Chiti, F., Fantacci, R., Giuli, D., Paganelli, F., Rigazzi, G.: Communications protocol design for 5G vehicular networks. In: Xiang, W., Zheng, K., Shen, X. (eds.) 5G Mobile Communications, pp. 625–649. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-34208-5_23

Soil Classiﬁcation and Crop Prediction Using

Machine Learning

Yuvraj Jangir(B), Tushar Goyal, Sumit Kandari, and Arshad Husain

Department of Computer Science, DIT University, Dehradun, Uttarakhand, India

Yuv.rraj786@gmail.com

Abstract. Soil classiﬁcation is the process in which soil is segregated according

to its physical and chemical properties. This process can be achieved manually

or using a machine learning algorithm. The use of machine learning algorithms

has been on the rise in recent years due to their accuracy. They can classify soils

with more precision than humans can manually, by considering many factors such

as pH, organic matter content, and particle size distribution. Here we propose a model to classify soil and to predict the most suitable crops using various machine learning algorithms such as Convolutional Neural Networks (CNN), Decision Trees, and Naive Bayes. Soil and crop datasets are used; they comprise different geographical and physical parameters. The model is tested on manually created datasets and results are obtained.

Keywords: Soil types ·Machine learning ·Convolutional neural network

(CNN) ·Decision tree classiﬁer

1 Introduction

1.1 Objective of Study

To alleviate the current agricultural crisis, better recommendation systems are needed to help farmers make informed decisions before planting begins. The main objective of this experiment was to classify soils and predict the suitability of different soil types for various crops based on predictor variables such as temperature, rainfall, and location. Predictions were obtained using the decision tree algorithm. It is important to note that this algorithm works only when a fixed set of input values is provided in the form of training samples; it yields a predicted result for each sample according to its input values, which can then be compared against ground truth for other input values (a rule may not be applicable to some samples). Each predictor variable was therefore treated as a feature that defines the suitability of a given map. A decision tree was then constructed to mimic this process and so predict the result for an input location.

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 16–21, 2022.

https://doi.org/10.1007/978-3-031-22915-2_2

1.2 Related Works

To predict soil type and, based on the prediction, suggest suitable crops, a model has been proposed in which several machine learning algorithms are used for soil classification. Experimental results show that the proposed SVM (support vector machine) based method performs better than many existing methods [1]. A soil dataset and a crop dataset were used to classify the soil. The soil dataset contains class-labelled chemical features of soil, and the crop suggestion dataset contains class-labelled crop suggestion attributes [2]. Soil images have been analyzed using various image preprocessing steps. Soil color is identified using statistical properties such as the mean of the red, green, and blue (RGB) values of the image pixels. Soil is classified from extracted features of soil color, soil pH value, and texture using a Support Vector Machine classifier; the one-against-all SVM method is used for classification. Crops suitable for the tested soil image were then recommended based on the features extracted by image processing [3]. Digital image processing and image analysis technology have been used to suggest crops to be grown in a given soil type. An image of the soil or land is taken with a phone camera and submitted. An image is a two-dimensional signal, and image processing is a method of performing operations on an image to obtain an enhanced image or to extract useful information [4]. A model has been proposed for predicting the soil type and suggesting a suitable crop that can be cultivated in that soil. The model predicts soil fertility and 8 other properties of agricultural land [5]. A comprehensive system has been described for soil classification in which different images of soil samples are captured. The features of each type of soil are collected and stored in a separate database, which is later used in the final stage for soil classification [6]. Another approach efficiently classifies the soil instances and maps the soil type to the crop data to obtain better predictions with higher accuracy. Soil prediction involves types of crop classification and geographical attributes. It also aims at creating a system that processes real-time soil data to predict crops with higher accuracy [7].

2 Proposed Framework

In the present work, we propose a novel methodology for soil classification using a Convolutional Neural Network (CNN) (Fig. 1). The generated classifiers were validated with an accuracy of more than 90%. Secondly, we discuss crop prediction using a decision tree-based classifier.

In this paper, we describe a proposed architecture for soil classification using image samples. Soil classification is done by analyzing image samples of different soil types. Enough training samples are needed to classify soil samples, and they should be selected carefully. After the training dataset is selected, all the images are passed through image processing, which includes resizing and augmentation. The resulting images are then passed through the Convolutional Neural Network (CNN) for classification.

Fig. 1. Architecture of the system

Convolutional Neural Network (CNN) is a powerful algorithm for image processing. These algorithms are currently among the best available for the automated processing of images. A CNN is a directed acyclic graph with four main layers: input layer, filter (convolutional) layer, pooling layer, and output layer, which together provide better classification accuracy. CNN is used to classify the soil samples by extracting various features from the image (Fig. 2). It is a type of machine learning algorithm that supports classification and prediction tasks: it classifies inputs into several categories and predicts an output based on given inputs. The dataset used in the paper contains nearly 200 cropped images of different soil types. Images contain data as RGB combinations. The computer does not see an image; all it sees is an array of numbers. Color images are stored in 3-dimensional arrays. The first two dimensions correspond to the height and width of the image (the number of pixels). The last dimension corresponds to the red, green, and blue colors present in each pixel.
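The 3-dimensional layout described above can be made concrete with a tiny example. The sketch below builds a hypothetical 2 x 2 color image as nested lists (height x width x 3) and computes its shape and the mean of each color channel; the pixel values are invented, and a real pipeline would load images with an image library instead.

```python
def image_shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))


def mean_rgb(image):
    """Average the red, green, and blue values over all pixels."""
    height, width, _ = image_shape(image)
    totals = [0, 0, 0]
    for row in image:
        for pixel in row:
            for channel in range(3):
                totals[channel] += pixel[channel]
    count = height * width
    return tuple(total / count for total in totals)


# Hypothetical 2 x 2 soil image; each pixel is [red, green, blue] in 0..255.
soil_image = [
    [[120, 80, 40], [130, 90, 50]],
    [[110, 70, 30], [140, 100, 60]],
]
print(image_shape(soil_image))   # (2, 2, 3)
print(mean_rgb(soil_image))      # (125.0, 85.0, 45.0)
```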

Fig. 2. Working of CNN

The output of the CNN is compared with a threshold value: if it is greater than the threshold, the input is classified into class1; otherwise it is classified into class2. The classification is implemented using Python 3.10.4 and the Keras 2.8.0 framework. The evaluated accuracy of this method is greater than 90%, which suggests that the method is reliable. The trained model is then stored in HDF5 (.h5) format and used to predict the soil type from the image provided as input.
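The thresholding rule can be written as a one-line decision. The threshold value of 0.5 and the example scores below are assumptions for illustration only; the text does not state which threshold was used.

```python
def classify(score, threshold=0.5):
    """Map a CNN output score to a class label using a fixed threshold."""
    return "class1" if score > threshold else "class2"


# Hypothetical CNN output scores for three soil images.
for score in (0.73, 0.50, 0.20):
    print(score, classify(score))
```

A score exactly equal to the threshold falls into class2, because the rule requires the output to be strictly greater than the threshold for class1.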

In the crop prediction model, we first created a dataset with the following parameters: state, rainfall, ground water, temperature, soil type, season, and crop, and stored it in a CSV file. We then split the data into a training set and a testing set, and used the decision tree classifier to train the crop prediction model. The performance of the model was measured by its accuracy.

A decision tree is a supervised prediction method that is widely used because it handles input data easily and is suitable for classifying data into a wide range of categories. Decision trees can be used as classifiers or regression models that use binary trees to predict outcomes. Their key advantage is that they are easy to implement and interpret. Decision tree models have been shown to be effective in many real-world applications and are used in many fields, such as pattern recognition, data mining, machine learning, and bioinformatics. Decision trees are used to predict categorical outcomes, and many implementations also support regression.
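How a decision tree chooses a split can be illustrated with Gini impurity, one commonly used criterion. The sketch below searches for the single best equality test on categorical features; the criterion, the feature values (soil type, season), and the crop labels are assumptions for illustration and are not taken from the dataset or the classifier settings used in this work.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, value, weighted_gini) of the best equality test."""
    best = None
    for f in range(len(rows[0])):
        # Sorted so that ties are always resolved in the same order.
        for value in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] == value]
            right = [y for row, y in zip(rows, labels) if row[f] != value]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, value, score)
    return best


# Hypothetical samples: (soil type, season) -> crop.
rows = [("clay", "kharif"), ("clay", "rabi"), ("sandy", "kharif"), ("sandy", "rabi")]
crops = ["rice", "rice", "millet", "millet"]
print(best_split(rows, crops))
```

In this toy data the soil type alone separates the crops, so the root node tests the soil type; a full tree repeats the same search on each resulting subset until the groups are pure or a depth limit is reached.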

In this study, we used a real-world dataset that contains the names of various crops. We also determined two environmental inputs, rainfall and ground water level, which were necessary to build a crop prediction model using decision trees. After training, we stored the model in .sav format; it is then used to predict the crop from the parameters provided as input and the soil type predicted by the soil classification model.

3 Result and Discussion

3.1 Dataset Collection

Multiple datasets are used to train the models and obtain relevant results. All the datasets used are custom datasets, built and structured according to the requirements of the algorithm and the proposed test cases. The datasets used are listed below:

1. Soils.zip (Soil Image Dataset) - Contains about 150–200 images of different types of soil that are used for agriculture and found in the Indian subcontinent.

2. Cat_crops.csv - Contains data on the various parameters considered when training the machine learning model for the crop recommendation system.

Fig. 3. Result obtained

3.2 Results

Fig. 3 shows the accuracy of the soil classification model, which was built using a CNN (Convolutional Neural Network).

After several changes and observations, we noticed that improving the dataset also improved the accuracy of the model.

Fig. 4. Accuracy of different algorithm

In the crop prediction model, we provided a CSV file consisting of the following parameters: States, Rainfall, Ground Water, Temperature, Soil Type, Season, and Crop. We compared the results with previous prediction methods (see Fig. 4). We used 4 different classifiers to train the crop prediction model, of which the decision tree classifier gave the best accuracy. The accuracies of all the different classifiers used were:

4 Conclusions and Future Scope

The proposed system is based on an image processing technique in which digital images of soil samples are processed using a convolutional neural network (CNN). In this study, a decision tree was used to determine the crop suitability of the soil sample. The results showed that the system recommended which crops were suitable for the tested soil sample images. The proposed method can therefore help farmers increase yield by identifying suitable crops for their soil samples.

As future work, a point-location-based rainfall prediction and ground water detection module can be integrated with the other parameters. This would improve the overall prediction of suitable crops. Also, since the dataset for these experiments consisted of nearly 200 cropped images of different soil types, a larger dataset is needed. This would increase the accuracy of soil classification and crop prediction.

References

1. Rahman, S.A.Z., Mitra, K.C., Islam, S.M.M.: Soil classiﬁcation using machine learning meth-

ods and crop suggestion based on soil series. In: International Conference of Science and

Technology Computer (2018)

2. Reddy, K.M.A., Chithra, S., Hemashree, H.M., Kurian, T.: Soil classification and crop suggestion. Int. J. Res. Appl. Sci. Eng. Technol. (2020)

3. Yadav, P., Ahire, P.: Soil health analysis for crop suggestions using machine learning.

AEGAEUM J. (2020)

4. Aishwarya, M., Revathy, R., Periasamy, J.K., Srujana, T.: Soil classification and crop suggestion using machine learning techniques. J. Gujrat Res. Soc. (2019)

5. Saranya, N., Mythili, A.: Classiﬁcation of soil and crop suggestion using machine learning

techniques. Int. J. Eng. Res. Technol. 9, 671–673 (2020)

6. Chandan, R.T.: An intelligent model for Indian soil classification using various machine learning techniques. Int. J. Comput. Eng. Res. (IJCER) 33, 3005 (2018)

7. Shravani, V., Uday Kiran, S., Yashaswini, J.S., Priyanka, D.: Soil classiﬁcation and crop

suggestion using machine learning. Int. Res. J. Eng. Technol. (IRJET) (2020)

Analysis of the Performance of Data Mining

Classiﬁcation Algorithm for Diabetes Prediction

Vijaylakshmi Sajwan1, Monisha Awasthi1(B), Prakhar Awasthi2, Ankur Goel3,

Manisha Khanduja1, and Anuj Kumar4

1Uttaranchal School of Computing Sciences, Uttaranchal University, Dehradun, India

uumonishaawasthi@gmail.com

2Department of Computer Science and Engineering, RIT, Bangaluru, Karnataka, India

3Department of Business Administration, MIET Group, MIT, Meerut, U.P, India

4Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India

Abstract. The purpose of this paper is to identify solutions for the diagnosis of

diabetes disease by analyzing the patterns found in the data using classiﬁcation

algorithms such as Decision Tree, SVM, KNN, Naive Bayes, Random Forest,

Neural Network, and Logistic Regression. According to a WHO report, about 422 million people (42.2 crore) worldwide have diabetes, most of them living in low- and middle-income countries, and diabetes causes around 1.5 million (0.15 crore) deaths each year globally [1]. To evaluate and discuss the performance of the above-mentioned algorithms, Orange was used as the data mining tool. Furthermore, the data set used in this research is the “Pima Indian Diabetic Dataset,”

which is obtained from the University of California, Irvine (UCI) Repository of

Machine Learning datasets. As this study utilized several classiﬁers to simulate

actual diabetes diagnosis for local and systemic therapy, the results indicated that

Logistic Regression outperforms all other classiﬁers. The experimental data also

demonstrated the signiﬁcance of the suggested model in the study. The disease has

been ranked as the ﬁfth-deadliest in the United States, and there is currently no

cure in sight. With the advancement of information technology and its continued

penetration into the medical and healthcare sectors, diabetes cases and symptoms

have become well documented and discussed. The research is original and adds

value to the current studies in the same domain as researchers develop a more

rapid and efﬁcient method of diagnosing the disease, allowing for more timely

treatment of patients.

Keywords: Accuracy ·Diabetes ·KNN ·Logistic regression ·Naive bayes ·

Neural network ·Random forest ·Support vector machine

1 Introduction

Databases are densely packed with hidden data and are designed to aid in intellectual

decision making. Different types of data analysis, such as classiﬁcation and prediction,

are used to make predictions about future data and to describe the data classes. The

classiﬁcation is a process that predicts the labels for categorical classes. The labels for

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 22–36, 2022.

https://doi.org/10.1007/978-3-031-22915-2_3

this class may be discrete or nominal in nature. Classiﬁcation techniques classify data

using a training set and class labels [2]. With the rising prevalence of implementations

of various classiﬁcation and prediction algorithms, there is a need for a central hub

that could evaluate the performance of all classiﬁcation algorithms as well as provide

information on which classiﬁer is the best [3].

The objective here is to examine various machine learning algorithms for classification using the diabetes data set, with ORANGE as the tool. The purpose of this paper is to compare ORANGE classifiers on a diabetes dataset; the techniques are compared using the results computed in ORANGE. We have used the diabetes dataset because diabetes is a chronic disease and one of the most rapidly increasing metabolic diseases in the world. Diabetes mellitus, more generally referred to as diabetes, is a group of metabolic illnesses characterized by persistently elevated blood sugar levels (beyond a certain limit), caused by reduced insulin secretion, impaired biological action of insulin, or both. It is a disorder in which the body is not able to metabolize food adequately. Diabetes can wreak havoc on a variety of tissues, most notably the eyes, kidneys, heart, blood vessels, and nerves, resulting in chronic damage and dysfunction. Diabetes is primarily classified into two types: Type 1 diabetes (T1D) and Type 2 diabetes (T2D). Type 1 diabetes typically develops in young people (below 30 years of age); its general symptoms include increased thirst and frequent urination, as well as elevated blood sugar levels. It can only be treated with insulin, as oral drugs are not effective. Type 2 diabetes is more prevalent in the middle-aged and senior population, and is frequently associated with obesity, hypertension, dyslipidemia, arteriosclerosis, and other disorders [4]. Numerous data mining classification methods have been developed with the goal of classifying, forecasting, and diagnosing diabetes. However, no meaningful comparative evaluation of the performance of such algorithms has been conducted, and no research has determined which of the existing classifier models provides the best prediction for diabetes. The decision tree, Naive Bayes, Random Forest, KNN (K-nearest neighbours), and Support Vector Machine (SVM) classification methods were used in this work to develop classifier models [5].

2 Related Work

According to Aljumah [6], diabetes is a chronic condition that arises when the body uses insulin ineffectively or when the pancreas produces insufficient insulin. Insulin is a key hormone that regulates blood sugar levels. Unregulated diabetes results in a rise in blood sugar, which over time leads to serious damage to various body parts and systems, such as the blood vessels and nerves. Health informatics is the study of how best to collect, retrieve, communicate, store, and use health-related data, knowledge, and information. Barakat et al. [7] described how healthcare providers should handle patient information and how citizens should participate in their own health care. It is now widely recognized as a necessary and widespread component of long-term health-care delivery. Machine Learning (ML) is one of the fastest-growing areas in computer science today. When using machine learning on diabetes-related data for prediction, it is important to remember that this data isn't being

collected to address speciﬁc research questions; instead, learning algorithms are being

utilized to analyze biomedical data automatically. Song et al. [8] analyzed multiple

categorization algorithms utilizing characteristics such as thickness of skin, pedigree

of diabetes, glucose level, Body Mass Index, patient age, insulin and blood pressure.

Pradeep and Dr. Naveen compared the machine learning algorithms’ performances in

[9] and measured the accuracy of each algorithm. There were accuracy variations in

terms of techniques utilized, pre-processing and after processing of data. It was noticed

that ‘Pre-processing of data’ had better accuracy and overall performance for prediction

of diabetes. In this study, before preprocessing for prediction of diabetes, the Decision

tree algorithm provided better accuracy as compared to other techniques like Random

forest and Support vector machine. According to Loannis et al. [10], Machine learning

techniques, such as the diabetic disorders dataset, have become a signiﬁcant tool for

predicting diabetes using diverse medical data sets (DD). In this work, SVM, Logis-

tic Regression, and Nave Bayes were used. They used 10-fold cross validation for the

diabetes dataset (DD). The SVM (Support Vector Machine) strategy outperformed the

others in terms of precision and processing, according to the study. For diabetes predic-

tion, Nilashi et al. [11] suggested a CART (classiﬁcation and Regression Tree) model.

Expectation Maximization (EM) and PCA (Principal Component Analysis) were applied

to pre-process the data and remove noise before applying the rule. The goal of this study

is to design a diabetes decision assistance system. The effect of CART with removal

of noise provided efﬁciency and enhanced prediction, allowing human life to be saved

from premature demise. A categorization model was suggested by Kamadi et al. in [12].

One of the most typical problems in categorization, they claim, is reduction of data. PCA

(principal component Analysis) was employed in this work for pre-processing of data,

as well as for reduction of data to enhance accuracy. The study employed a modiﬁed

DT (Decision tree) and a fuzzy rule to make predictions. They discovered that reducing

the dataset improves the results. Sajida et al. [13] employed the Canadian primary care

sentinel surveillance Network(CPCSSN) dataset and three machine learning models to

detect diabetes at a primary stage in order to save human lives. To predict diabetes,

decision tree (J48), Adaboost, and Bagging were used in this study. Rathore et al. [14]

Diabetic disorder can be detected and predicted. The performance measurements were

examined using R Studio and the Pima Indians diabetes dataset. SVM and Decision Tree

are two machine learning techniques employed. The SVM has an accuracy of 82%.

In [15], S. M. Hasan Mahmud et al. forecast diabetes, using 10-fold cross validation to measure the performance of the classification algorithms. The study found that Naive Bayes outperformed the other classifiers, with an F1 score of 0.74. On the PIMA dataset, Ahuja et al. [16] compared various machine learning techniques, including NB, DT, and MLP, for diabetes classification and found MLP to be superior to the other classifiers. According to the authors, fine-tuning and efficient feature engineering can improve MLP’s performance. García-Ordás et al. [17] employ min-max normalization, a variational autoencoder and a sparse autoencoder to address data standardization, feature augmentation and class imbalance. MLP was then used for classification, with an accuracy of 92.31%. Without preprocessing, Bukhari et al. [18] state that their ABP-SCGNN (Artificial Back Propagation Scaled Conjugate Gradient Neural Network) obtained 93% accuracy.

Analysis of the Performance of Data Mining Classification Algorithm 25

Roy et al. [19] is another example of good performance utilizing NN-based models. They looked at median value imputation (MVI), KNN and an iterative imputer for imputation of missing values. MLP was then employed for classification, attaining an F1-score of 98%. Khanam and Foo [20] employed MVI for missing value imputation and Pearson correlation for feature selection. Interquartile ranges were used to further standardize the data and eliminate outliers. Their DNN-based classification model achieved an accuracy of 88.6% using several hidden layers. Overall, missing value imputation and feature selection were seen to be highly beneficial data preprocessing techniques for diabetes classification performance. The majority of data preparation approaches, on the other hand, have been found to perform well when the data is normally distributed. If the data does not conform to normality assumptions, nonlinear approaches will be better adapted to the problem and are likely to add significantly to a classifier’s performance. As a result, this study looks at nonlinear preprocessing approaches and classifiers.

3 Methodology

This section describes the classification model’s approach as well as its efficacy in DM classification. Figure 1 summarises the process.

Fig. 1. Methodology of proposed work

26 V. Sajwan et al.

For all of the algorithms (Naive Bayes, KNN, ANN, Logistic Regression, Decision Tree, Random Forest, SVM), the ‘Confusion Matrix (CM)’ encapsulates the various steps from raw data to grading, data reduction, pre-processing, scoring, and testing. These steps are described in greater detail in the following subsections:

A- It describes the data mining toolkit.

B- It describes the database and its attributes.

C- It provides insights into the pre-processing steps.

D- It discusses the process of classification using the seven classification algorithms.

3.1 Data Mining Toolkit

To simulate the classification techniques, the Orange Data Mining suite of tools [21] is used. Orange was developed at the University of Ljubljana’s Bioinformatics Lab as an Open Source Machine Learning (OSML) framework with built-in data visualization and analytic capabilities. Orange provides an environment for data preprocessing, classification, regression, clustering, association rules, visualization and assessment.

3.2 Collection of Database

The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDKD) collected the Pima Indians Diabetes dataset (PIDD) of patients. We would like to express our gratitude to Vincent Sigillito for supplying the data. A short summary is provided in Table 1, which shows the class distribution in PIDD.

Table 1. Distribution of classes in the Pima Indians diabetes dataset

Class value   Number of instances   Relabeled value
0             500                   Tested_negative
1             268                   Tested_positive

The NIDDKD owns the PIDD, which was downloaded from Kaggle [22]. This dataset can be used to identify diabetes mellitus. It has a total of 768 records (500 tested negative and 268 tested positive, see Table 1), each with eight characteristics and the class label (outcome). The data set’s description, including its properties, statistical analysis, and values, is included in Table 2.

These eight characteristics are symptoms that people may or may not have that

indicate their likelihood of having diabetes mellitus.

3.3 Data Preprocessing

Pre-processing is essential to improving model prediction performance. The Orange toolbox supports a variety of pre-processing techniques [23]. Three types of pre-processing are used in this article to increase the dataset’s quality and, eventually, the classification models’ performance.


Table 2. Data set description, properties, statistical analysis and values of data

SNo  Attribute name  Attribute description             Data type of attribute  Range of attribute
1    Preg            Pregnancy frequency               N                       0 to 17
2    Plas            Concentration of plasma glucose   N                       0 to 199
3    Pres            BP (blood pressure) (mm Hg)       N                       0 to 122
4    Skin            Thickness of skin fold            N                       0 to 99
5    Insulin         2 h serum insulin (mu U/ml)       N                       0 to 846
6    Mass            BMI (body mass index)             N                       0 to 67.1
7    Pedi            Diabetes pedigree function        N                       0.078 to 2.42
8    Age             Age of person (in yrs.)           N                       21 to 81
9    Outcome         Class variable                    ………….                   Tested positive, tested negative

N: Numeric

•Removal of values which are missing

The dataset contained some missing values. The Orange toolkit offers three methods for handling missing values: eliminate such records, replace them with random values, or replace them with the mean of the other available values [24]. The last of these strategies was selected to eliminate missing values from the applied dataset.
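The mean-replacement option listed above can be sketched in a few lines of Python. This is an illustrative sketch, not Orange’s own implementation: the function name `impute_mean`, the use of `None` to mark a missing value, and the sample readings are all assumptions made for the example.

```python
from statistics import mean

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in column]

# Hypothetical insulin readings with two missing entries (not taken from the PIDD).
insulin = [94.0, None, 168.0, 88.0, None]
filled = impute_mean(insulin)
print(filled)
```

Observed values are left unchanged; only the gaps are filled, so the column mean is preserved.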

•Selection of Relevant feature

It is critical to choose the most relevant features. This stage assigns a score to each characteristic based on its association with the designated diabetes class. The dataset contains eight characteristics, which were scored using ANOVA, a statistical technique [25]. The ANOVA scores showed that thickness of skin and BP are the least important characteristics and would play little role in the process of classification; hence, they were deleted from the feature vector, resulting in six rather than eight features. Figure 2 summarizes the results of the ANOVA test on the characteristics.
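To make the scoring step concrete, the following minimal sketch computes a one-way ANOVA F statistic for a single feature split by class. It assumes that a larger F value means a stronger association with the class; the function name `anova_f` and the sample values are hypothetical and are not taken from the PIDD or from Orange.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one feature, given its values per class."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the values around their own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical values per class: [tested negative, tested positive].
glucose = [[85, 89, 90, 92], [140, 150, 155, 160]]
skin = [[20, 30, 25, 35], [22, 28, 27, 35]]
scores = {"glucose": anova_f(glucose), "skin": anova_f(skin)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Features at the bottom of such a ranking are the candidates for removal.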


Fig. 2. Result of ANOVA test on the characteristics

•Normalize the Data

Normalization of data can simplify operations and increase computation performance. As a result, the data were normalized to a common scale in the range of zero to one [26]. Scaling by standard deviation (SD) is another of the methods provided in the Orange toolbox.
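Rescaling a column to the range zero to one can be sketched as follows. This is a plain min-max rescaling written for illustration; the function name `min_max_scale` and the sample ages are assumptions, and the sketch does not reproduce Orange’s preprocessing widget.

```python
def min_max_scale(column):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical ages within the 21 to 81 range listed in Table 2.
ages = [21, 33, 51, 81]
scaled = min_max_scale(ages)
print(scaled)
```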

3.4 Data Classiﬁcation

During this step, the diabetes dataset was classified using seven different algorithms. The investigated classifiers were Naive Bayes, KNN, ANN, SVM, Random Forest, Decision Tree and Logistic Regression. The feature data set is separated into two parts, 70% for training and 30% for testing, to help ensure that the classification models are fitted properly.

•Naive Bayes

Naive Bayes is a statistical learning technique that uses a simplified version of the Bayes rule to determine the posterior distribution of a category given the input attribute values of an example case. The prior probabilities of the categories, and the probabilities of attribute values conditional on the categories, are calculated from frequency counts in the training data. Naive Bayes is a straightforward and fast learning technique that frequently beats more advanced methods. Bayesian classification is both a supervised learning technique and a statistical classification technique. It is capable of resolving diagnostic and predictive issues [27].
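The frequency-count idea can be illustrated with a small categorical Naive Bayes sketch. Everything here is hypothetical: the function names, the binarised attributes and the `n_values` parameter (the assumed number of distinct values per attribute). Add-one smoothing is also an assumption of this sketch; the text does not say whether smoothing was used.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Collect class counts and per-attribute value counts from the training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_nb(model, row, n_values=2):
    """Return the class with the highest unnormalised posterior score."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (assumption of this sketch).
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical binarised attributes: (high glucose, high BMI).
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["pos", "pos", "neg", "neg", "neg"]
model = train_nb(rows, labels)
print(predict_nb(model, ("yes", "yes")))
```

The "naive" part is the product over attributes, which treats them as independent given the class.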


•KNN

The KNN algorithm [28] is a simple classification approach. It classifies an object by finding its K nearest neighbours among the training examples. The distance between objects is calculated using a similarity measure, and K is the number of closest neighbours considered.
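A minimal sketch of nearest-neighbour classification is given below, assuming Euclidean distance and an unweighted majority vote. The function name `knn_predict`, the toy points and the choice of `k=3` are illustrative assumptions only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalised two-feature points.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(knn_predict(points, labels, (0.75, 0.85)))
```

Because the vote depends on raw distances, features on very different scales should be normalized first.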

•ANN

ANN is a supervised learning method [29] that uses a network of layers: an input layer that represents the input data, one or more non-linear layers called hidden layers, and finally an output layer that represents the classification category.

•Random Forest

This classifier creates a collection of decision trees [30], each built from a random subset of the training data. The test object’s final class is chosen by aggregating the votes of the various decision trees.

•SVM

SVM models are a type of supervised learning method that may be used for both classification and regression, but are most frequently used for classification problems. The classifier separates the classes with a maximum-margin hyperplane [31].

•Decision Tree

A decision tree is a tree structure that resembles a flowchart. It is a method for classification and prediction that uses root, internal and leaf nodes to describe the data. The root and internal nodes are tests that are used to distinguish instances with varying characteristics; internal nodes are generated as a result of attribute testing. The class variable is denoted by the leaf nodes [32].

•Logistic Regression

Logistic regression is a technique for binary classification. The input variables are expected to be numeric and to have a Gaussian distribution, although the latter is not strictly required: the method is capable of producing acceptable results even when the data is not Gaussian. Each input value is assigned a coefficient; the weighted inputs are combined linearly into a regression function, which is then transformed using a logistic function [33].
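The two steps described above, a linear combination followed by a logistic transformation, can be sketched as follows. The coefficients, intercept and inputs are made-up values for illustration, and the function name `predict_proba` is an assumption; fitting the coefficients is not shown.

```python
import math

def predict_proba(coefficients, intercept, inputs):
    """Combine inputs linearly, then squash the result into (0, 1) with the logistic function."""
    z = intercept + sum(c * x for c, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two normalised features.
coefs = [2.0, 1.0]
intercept = -1.5
p = predict_proba(coefs, intercept, [0.9, 0.4])
label = 1 if p >= 0.5 else 0  # threshold of 0.5 is an assumption
print(round(p, 3), label)
```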

4 Evaluation & Result

In this part, the results of the implemented performance measurements are shown using the Orange toolkit’s graphical interface.


4.1 Setup of Experiments with Results

This subpart explains the procedure of sampling used, the parameters of the classiﬁcation

model, and the CM for every algorithm.

•Method of Sampling

The developed models’ performance is evaluated using a K-fold cross-validation sampling approach [27]. The whole dataset (768 records) is cross-validated tenfold in this article. The data were divided into ten folds. The classification model is trained on seven folds, with the remaining three folds serving as a testing set. As a result, 70% of the data records were used for training the model and 30% for testing it.
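The fold construction can be sketched as below, using the 768 instances listed in Table 1 (500 + 268). The sketch follows the proportions stated above, seven folds for training and the rest for testing; the function name `make_folds`, the fixed seed and the choice of which folds are held out are assumptions, and this is not Orange’s sampling code.

```python
import random

def make_folds(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

folds = make_folds(768)
# Hold out three folds for testing and train on the other seven (assumed choice of folds).
test_idx = [i for fold in folds[:3] for i in fold]
train_idx = [i for fold in folds[3:] for i in fold]
print(len(train_idx), len(test_idx))
```

With 768 records the folds cannot all be the same size, so the split is close to, but not exactly, 70/30.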

•Decision Tree

The CM of the Tree classifier is shown in Fig. 3. Out of 500 data points labeled ‘0’, 402 records are classified correctly. Out of 268 data points labeled ‘1’, 142 records are classified correctly.

The confusion matrix illustrates four critical metrics for evaluating the Decision Tree classifier model: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Taking class ‘1’ as the positive class, TP = 142, TN = 402, FP = 98 and FN = 126.

•SVM

To learn the model, the attribute space is transformed into a new feature space using a Radial Basis Function (RBF) kernel. The maximum number of iterations allowed was 100. Figure 4 depicts the SVM classifier’s confusion matrix.

Out of 500 data points labeled ‘0’, 401 records are classified correctly, and out of 268 data points labeled ‘1’, 152 records are classified correctly. Taking class ‘1’ as the positive class, the values of the four critical metrics are TP = 152, TN = 401, FP = 99, and FN = 116.

•KNN

Figure 5 illustrates the KNN classifier’s confusion matrix. The number of nearest neighbours was set to five in the KNN model, and Euclidean distance was used to calculate the distance between two points, with points weighted according to their distance from the query point.

As Fig. 5 shows, the CM summarizes the four critical metrics for evaluating the KNN model. With 413 of the 500 class ‘0’ records and 156 of the 268 class ‘1’ records classified correctly, TP = 156, TN = 413, FP = 87 and FN = 112.

•Random Forest

A forest of 10 decision trees was used here. The model’s confusion matrix is depicted in Fig. 6. The CM illustrates the four critical metrics for evaluating the Random Forest classifier model. With 425 of the 500 class ‘0’ records and 161 of the 268 class ‘1’ records classified correctly, TP = 161, TN = 425, FP = 75, and FN = 107.


•Naive Bayes

Out of 500 data points labeled ‘0’, 403 records are classified correctly, and out of 268 data points labeled ‘1’, 182 records are classified correctly (Fig. 7).

The CM illustrates the four critical metrics for evaluating the Naive Bayes classifier model. Taking class ‘1’ as the positive class, TP = 182, TN = 403, FP = 97 and FN = 86.

•Artiﬁcial Neural Network

In this model, back-propagation was applied with a multi-layer perceptron (MLP) approach. Each hidden layer had 200 neurons with a Rectified Linear Unit (ReLU) activation function. The Adam technique was then employed to optimise the weights efficiently. The confusion matrix for the neural network model is shown in Fig. 8. Out of 500 data points labeled ‘0’, 431 records are classified correctly, and out of 268 data points labeled ‘1’, 157 records are classified correctly. Taking class ‘1’ as the positive class, the confusion matrix summarizes the four critical metrics for evaluating the ANN classifier model as TP = 157, TN = 431, FP = 69, and FN = 111.

•Logistic Regression

This model’s regularization is set to ridge (L2), and the cost strength is set to its default value of one (C = 1). The model’s CM is depicted in Fig. 9.

From 500 data points labeled 0, 442 records were correctly classified, while from 268 data points labeled 1, 151 records were correctly classified.

True positive (TP), true negative (TN), false positive (FP), and false negative (FN) are the four significant metrics used to assess the Logistic Regression classifier model. Taking class 1 as the positive class, TP = 151, TN = 442, FP = 58 and FN = 117.

Fig. 3. CM of tree classiﬁer

•Comparison of Performance

The performance of the classification methods on the diabetes dataset is examined and compared. The following sections contain details on the performance measurements and comparisons.


Fig. 4. CM of SVM

Fig. 5. CM of KNN

Fig. 6. CM of random forest

Fig. 7. CM of Naïve Bayes


Fig. 8. CM of artiﬁcial neural network

Fig. 9. CM of logistic regression

•Evaluation Measures of Performance

As mentioned before, the CM illustrates four critical metrics for evaluating classification models: true negative (TN), true positive (TP), false negative (FN) and false positive (FP). These metrics are used to calculate the following measures of performance, which this study uses to examine and evaluate the classification models: a) accuracy, b) precision, c) recall and d) F1-measure.

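\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{F1-measure} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{4} \]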
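As a worked example, the measures can be computed for the Decision Tree counts quoted earlier (402 of 500 class ‘0’ records and 142 of 268 class ‘1’ records classified correctly). The sketch treats class ‘1’ as the positive class and, as an assumption, also averages the per-class precision and recall weighted by class size; the variable and function names are hypothetical. Under that assumption the averages come out at about 70.2% and 70.8%, which is consistent with the Tree row of Table 3, although the text does not state how Orange averaged the scores.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, following Eqs. (2) to (4)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Decision Tree counts reported in the text.
neg_total, neg_correct = 500, 402   # class '0'
pos_total, pos_correct = 268, 142   # class '1'

tp, tn = pos_correct, neg_correct
fp = neg_total - neg_correct        # negatives predicted as positive
fn = pos_total - pos_correct        # positives predicted as negative

accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
pos_p, pos_r, pos_f1 = class_metrics(tp, fp, fn)
# Treating class '0' as the class of interest swaps the roles of the counts.
neg_p, neg_r, neg_f1 = class_metrics(tn, fn, fp)

# Class-size-weighted averages (an assumption about how Table 3 was produced).
n = neg_total + pos_total
weighted_precision = (pos_total * pos_p + neg_total * neg_p) / n
weighted_recall = (pos_total * pos_r + neg_total * neg_r) / n
print(f"accuracy={accuracy:.3f} precision={weighted_precision:.3f} recall={weighted_recall:.3f}")
```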

•Classiﬁcation Model Comparison

The performance of the implemented classifiers is assessed in this subsection using the aforementioned metrics. Table 3 summarizes the performance metrics for the classifiers used.


Table 3. Measures of performance of applied classifiers

Method               AUC    CA     F1      Precision  Recall
Tree                 67.9%  70.8%  70.04%  70.2%      70.8%
SVM                  75.9%  72%    71.8%   71.6%      72.0%
KNN                  78.8%  74.1%  73.8%   73.6%      74.1%
Naïve Bayes          82.9%  76.2%  76.3%   76.4%      76.2%
Random Forest        81.1%  76.3%  75.9%   75.8%      76.3%
Neural Network       82.6%  76.6%  76%     76.0%      76.6%
Logistic Regression  82.9%  77.2%  76.4%   76.7%      77.2%

Table 3 also allows the accuracy of all applied models to be compared. Logistic Regression surpasses the other classifiers with 77.2% accuracy. It is followed by the Artificial Neural Network model in second place with an accuracy of 76.6%, Random Forest in third place with 76.3%, and Naive Bayes in fourth place with 76.2%. The KNN model is fifth with an accuracy of 74.1% and SVM sixth with 72%. Decision Tree, with an accuracy of 70.8%, performs worst. Logistic Regression also attains the best F1-score, precision and recall, and ties with Naive Bayes for the highest AUC (82.9%), as shown in Table 3.

5 Conclusion

Automatic diabetes detection is a significant real-world medical issue. Early detection and management of diabetes are critical. This article demonstrates the use of several classifiers, including Decision Trees, SVM, KNN, Naive Bayes, Random Forest, Neural Network, and Logistic Regression, to simulate actual diabetes diagnosis for local and systemic therapy, and presents relevant work in the field. The outcome indicates that Logistic Regression outperforms all other classifiers. The performance of the strategies was evaluated in relation to the problem of diabetes diagnosis, and the experimental results demonstrate the usefulness of the suggested model. In the future, it is planned to compile data from several locations across India and develop a more precise and general predictive model for diabetes diagnosis. Future research will similarly focus on accumulating data from a later time period and identifying additional possible prognostic factors to integrate. The technique might be expanded and refined to automate the analysis of diabetes.

References

1. https://www.who.int/news-room/fact-sheets/detail/diabetes

2. Amin, D.M., Garg, A.: Performance analysis of data mining algorithms. J. Comput. Theor.

Nanosci. 16(9), 3849–3853 (2019). https://doi.org/10.1166/jctn.2019.8260


3. Saichanma, S., Chulsomlee, S., Thangrua, N., Pongsuchart, P., Sanmun, D.: The observation

report of red blood cell morphology in Thailand teenager by using data mining technique.

Adv. Hematol. 2014, 1–5 (2014). https://doi.org/10.1155/2014/493706

4. Canlas, R.D. (2009). Data Mining in Healthcare: Current applications & Issues, Unpublished

Master Thesis, 1–10

5. Iyer, A., Jeyalatha, S., Sumbaly, R.: Diagnosis of diabetes using classification mining techniques. Int. J. Data Min. Knowl. Manag. Process 5(1), 01–14 (2015). https://doi.org/10.5121/ijdkp.2015.5101

6. Aljumah, A.A., Ahamad, M.G., Siddiqui, M.K.: Application of data mining: diabetes health

care in young and old patients. Journal of King Saud University - Computer and Information

Sciences 25(2), 127–136 (2013). https://doi.org/10.1016/j.jksuci.2012.10.003

7. Barakat, N., Bradley, A.P., Barakat, M.N.H.: Intelligible support vector machines for diagnosis of diabetes mellitus. IEEE Trans. Inf. Technol. Biomed. 14(4), 1114–1120 (2010). https://doi.org/10.1109/titb.2009.2039485

8. Komi, M., Li, J., Zhai, Y., Zhang, X.: Application of data mining methods in diabetes

prediction. In: 2nd International Conference on Image, Vision and Computing (ICIVC),

pp. 1006–1010 (2017)

9. Pradeep, K.R., Naveen, N.C.: Predictive analysis of diabetes using J48 algorithm of classification techniques. In: 2nd International Conference on Contemporary Computing and Informatics (IC3I), pp. 347–352 (2016)

10. Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., Chouvarda, I.: Machine

learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 15,

104–116 (2017). https://doi.org/10.1016/j.csbj.2016.12.005

11. Nilashi, M., Ibrahim, O.B., Ahmadi, H., Shahmoradi, L.: An analytical method for diseases

prediction using machine learning techniques. Comput. Chem. Eng. 106, 212–223 (2017).

https://doi.org/10.1016/j.compchemeng.2017.06.011

12. Kamadi, V.V., Allam, A.R., Thummala, S.M.: A computational intelligence technique for the effective diagnosis of diabetic patients using principal component analysis (PCA) and modified fuzzy SLIQ decision tree approach. Appl. Soft Comput. 49, 137–145 (2016). https://doi.org/10.1016/j.asoc.2016.05.010

13. Perveen, S., Shahbaz, M., Guergachi, A., Keshavjee, K.: Performance analysis of data mining

classiﬁcation techniques to predict diabetes. Procedia Comput. Sci. 82, 115–121 (2016).

https://doi.org/10.1016/j.procs.2016.04.016

14. Rathore, A., Chauhan, S., Gujral, S.: Detecting and predicting diabetes using supervised

learning: an approach towards better healthcare for women. Int. J. Adv. Res. Comput. Sci.

8(5), 1192–1195 (2017)

15. Mahmud, S.M.H., et al.: Machine Learning Based Unified Framework for Diabetes Prediction. Association for Computing Machinery. China (2018). https://doi.org/10.1145/3297730.3297737

16. Ahuja, R., Sharma, S.C., Ali, M.: A diabetic disease prediction model based on classification algorithms. Annals of Emerging Technologies in Computing 3(3), 44–52 (2019). https://doi.org/10.33166/aetic.2019.03.005

17. García-Ordás, M.T., Benavides, C., Benítez-Andrades, J.A., Alaiz-Moretón, H., García-Rodríguez, I.: Diabetes detection using deep learning techniques with oversampling and feature augmentation. Comput. Methods Programs Biomed. 202, 105968 (2021). https://doi.org/10.1016/j.cmpb.2021.105968

18. Bukhari, M.M., Alkhamees, B.F., Hussain, S., Gumaei, A., Assiri, A., Ullah, S.S.: An

improved artiﬁcial neural network model for effective diabetes prediction. Complexity 2021,

1–10 (2021). https://doi.org/10.1155/2021/5525271


19. Roy, K., et al.: An enhanced machine learning framework for type 2 diabetes classification using imbalanced data with missing values. Complexity 2021, 1–21 (2021). https://doi.org/10.1155/2021/9953314

20. Khanam, J.J., Foo, S.Y.: A comparison of machine learning algorithms for diabetes prediction.

ICT Express 7(4), 432–439 (2021). https://doi.org/10.1016/j.icte.2021.02.004

21. Orange – Data Mining Fruitful & Fun. https://orange.biolab.si/

22. Diabetes –dataset. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database/.

Accessed 01 April 2022

23. Pattnaik, P.K., Rautaray, S.S., Das, H., Nayak, J.: Progress in computing, analytics and

networking. In: Proceedings of ICCAN 2017 (2018)

24. García, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining. Springer (2015). https://doi.org/10.1007/978-3-319-10247-4

25. Alsalamah, M., Amin, S., Palade, V.: Clinical practice for diagnostic causes for obstructive

sleep apnea using artiﬁcial intelligent neural networks. In: Miraz, M.H., Excell, P., Ware, A.,

Soomro, S., Ali, M. (eds.) iCETiC 2018. LNICSSITE, vol. 200, pp. 259–272. Springer, Cham

(2018). https://doi.org/10.1007/978-3-319-95450-9_22

26. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools

and Techniques. Morgan Kaufmann (2016)

27. Rennie, J.D., et al.: Tackling the poor assumptions of naive bayes text classifiers. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003), pp. 616–623 (2003)

28. Chen, G.H., Shah, D.: Explaining the success of nearest neighbor methods in prediction.

Found. Trends® Mach. Learn. 10(5–6), 337–588 (2018). https://doi.org/10.1561/2200000064

29. van Gerven, M., Bohte, S.: Artificial neural networks as models of neural information processing. Front. Comput. Neurosci. 11, 114 (2017). https://doi.org/10.3389/fncom.2017.00114

30. Davies, A., Ghahramani, Z.: The random forest kernel and other kernels for big data from random partitions (2014). arXiv:1402.4293

31. Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14(3),

199–222 (2004). https://doi.org/10.1023/b:stco.0000035301.49549.88

32. Rokach, L.: Data Mining with Decision Trees: Theory and Application, vol. 81. World

Scientiﬁc (2014)

33. Weisberg, S.: Applied Linear Regression, 4th ed. Wiley (2013)

Networks, Security and Privacy Parallel

and Distributed Networks

Prediction of DDoS Attacks Using Machine

Learning Algorithms Based on Classiﬁcation

Technique

Anupama Mishra1(B)and Deepesh Rawat2

1Computer Science and Engineering, Himalayan School of Science and Technology, Swami

Rama Himalayan University, Dehradun, India

anupamamishra@srhu.edu.in

2Electronics & Communication Engineering, Himalayan School of Science and Technology,

Swami Rama Himalayan University, Dehradun, India

Abstract. Distributed denial of service attacks, often known as a network threat, are a severe type of cyber-attack directed at a particular system or network in an effort to make that system or network unreachable and unusable for a period of time. Improved detection of a wide variety of distributed denial-of-service (DDoS) cyber threats, using advanced algorithms that achieve a higher level of accuracy while maintaining a manageable level of computational cost, has consequently emerged as an essential part of detecting DDoS in today’s world. A DDoS attack launched against a targeted network or system must be identified in order to defend the machines in that network. In this paper, a number of ensemble classification techniques are discussed, which combine the performance of various algorithms to improve overall performance. Using performance metrics such as receiver operating characteristic (ROC) curves, precision, accuracy, recall and F1 scores, we present and analyze the performance of the algorithms used in our proposed approach.

Keywords: Distributed denial of service attack ·Machine learning ·Random

forest ·Naïve Bayes ·Decision tree

1 Introduction

A denial-of-service attack, according to the World Wide Web Security FAQ, is one that is “designed to prevent a computer or network from providing normal services” [1]. The rapid expansion of the Internet has led to the development of a specific type of DoS attack that has proven to be extremely effective and difficult to defend against: the distributed DoS attack. These attacks do not originate from a single source, but rather from a number of spoofed sources that use a variety of attack types in a coordinated effort. Fake Internet Protocol (IP) addresses, as opposed to real ones, are used to identify computers that are either unwitting accomplices or under the attacker’s control. Attackers are able to coordinate deadly attacks on multiple targets at the same time using their own resources

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 39–50, 2022.

https://doi.org/10.1007/978-3-031-22915-2_4

40 A. Mishra and D. Rawat

and the resources of their “zombies,” resulting in greater damage in a shorter amount of

time than they could have done otherwise [2].

A distributed denial of service attack is a cyberattack that can bring websites, servers, and other online services to a crawl. The perpetrator uses multiple computers and devices to send fraudulent requests to a server, making it appear that the server is being attacked by a large number of people. The term is sometimes used interchangeably with “denial of service attack”, but “DDoS” refers specifically to an attack that uses multiple sources to flood a target with requests. Some DDoS attacks involve the use of botnets, which are networks of compromised computers and devices that have been infected with malware without the users’ knowledge [3].

DDoS is a form of cyberattack in which a network is flooded with heavy traffic that makes it difficult for users to access a website or service. This type of attack is often used as a tactic to make a target site or service appear overwhelmed or unreliable to users. DDoS attacks can also be used to force a website or service to send a user to a specific location that is under the control of the attacker. DDoS attacks are often used for online harassment, for example, when a website is under constant attack and cannot improve its performance or stay online [4]. DDoS is a method of disrupting a system in a network by sending a large amount of data to a server or system from multiple different sources. The result is that the system or network is not able to handle the load and ultimately crashes, making it unavailable for its intended use [5].

DDoS is a form of cyberattack where multiple hosts are used to bombard a web server or other network target with data, often using a botnet or other network of malware-infected computers, until the target’s resources are consumed and it is rendered inaccessible to legitimate users. Unlike a traditional DoS attack, where a single host is used to flood a target’s resources and crash its services, a DDoS attack is much more complicated and powerful. DDoS attacks can be extremely difficult to stop due to their decentralized nature. DDoS attacks can be carried out with a handful of hosts or even a single host, making them much more difficult to detect and investigate [6]. A denial of service refers to a situation in which a service or resource is utilized to the point of being rendered unusable or inaccessible to other users. These are often seen in the context of online gaming servers, but can also affect online banking or shopping services.

A distributed denial of service occurs when a single device or user is able to bring

down a server or network resource. This can be accomplished by distributing a request

across a network, such as when a single computer is used to send an entire ﬁle to a website.

Figure 1 depicts how this attack [3] had a significant impact on the telecommunications

industry [7]. Current and former employees of several tech companies are accusing

Amazon, Facebook, Google and Microsoft of failing to protect their employees from the

COVID-19 pandemic, with some accusing the companies of endangering their workers’

safety. We’ve seen this story before. In the wake of a string of high-proﬁle layoffs,

employees have accused the companies of not doing enough to protect their workers’

health. But for companies as large as Amazon and Facebook, the risks of a pandemic

are remote. Therefore, the developed technology helps in this ﬁeld to defend our system

and network from the threat. Machine learning is one of the technologies which is being

used for defensive mechanisms in many applications.

Prediction of DDoS Attacks Using Machine Learning Algorithms 41

The remainder of the paper is organized as follows: Sect. 2 presents the related work based on existing defensive mechanisms, our proposed work is discussed in Sect. 3, Sect. 4 evaluates the research work, and Sect. 5 brings the research to its conclusion.

Fig. 1. Impact of distributed denial of service attacks (pie chart of pct_a_bytes by industry: Telecommunications 51%, Gaming/Gambling 23%, Information Technology and Services 21%; remaining segments: Computer Software, Business Services, Computer Hardware, Cryptocurrency, Retail)

2 Related Work

Machine learning makes it possible to compare models with different features, which can help in choosing the one that is best for a given application. Specifically, we can use a machine learning model to identify which types of data are most predictive of the outcome of interest, such as a cyberattack.

A low-cost defensive approach was proposed by the authors in [6, 7], focused on calculating the entropy between benign traffic and DDoS attacks. The authors additionally suggest an intensity reduction strategy for dealing with attacks. The methodology has three advantages over other current methods: a high level of detection, a reduced false alarm rate, and the capability of detecting small changes in the environment at a rapid pace, along with the mitigation approach.

The authors of [7] developed a solution to the authentication and security challenges of smart vessels in sea transport. An identity-based approach is used to authenticate access for smart vessels and devices, but the method is limited to maritime transportation.

An auction with many attributes was proposed in [8] to mitigate DDoS attacks on a network. A reputation-based detection approach was proposed, in which the minimal utility defines the user's reputation. A payment plan for normal users and

42 A. Mishra and D. Rawat

another for fraudulent users, as well as an identification mechanism, are also proposed. In this method, a greedy resource allocation strategy is used to ensure that resources are distributed appropriately among legitimate users. Differential payment schemes are designed to penalize malicious users who manipulate their offers in order to obtain the largest possible share of limited resources.

The authors of [9] describe an approach for detecting distributed denial of service attacks that makes use of Bayesian game theory. It is assumed that the service provider and the legitimate users both monitor the network in order to collect probabilistic information about whether another user is acting maliciously. With this probabilistic knowledge, both the service provider and authorized users can alter their actions and responses in reaction to harmful activity on the network.

The authors propose a Bayesian pricing [10] and auction approach for obtaining Bayesian

Nash Equilibrium points in a variety of settings in which genuine consumers and service

providers beneﬁt from probabilistic knowledge. This is accomplished by taking into

consideration the aforementioned assumptions and facts. In addition to this, a reputation

evaluation and updating system is offered to determine a user’s dependability based on

factors such as the user’s payment history and the amount of time spent participating in

the platform (Table 1).

Table 1. Comparative table of existing work.

Reference  Technique                                                Merits                      Limitation
[10]       Used SDN (Software Defined Network)                      Detection rate is high      Only works on volume-based DDoS
[11]       Worked on IBE (Identity Based Encryption)                Detection rate is high      Overhead is high
[12]       Worked on IBS (Identity Based Signature) along with IBE  Detection rate is moderate  Overhead is high
[13]       Based on boosting algorithms                             Detection rate is moderate  False alarms

3 Proposed Work

3.1 Approach

In this paper, we are primarily concerned with data preprocessing, selection of significant features [13], modelling with a classifier, and finally prediction on the testing dataset. The research work is concluded after a performance evaluation of the results. The approach includes the following activities [14–19]:

Data preparation: This phase comprises the tasks that turn the raw data into a clean dataset.

If the raw data is not in a usable state, the type and order of these activities may change, and some unrelated features may be removed. Examples of the tasks involved are data cleansing, feature selection, and data transformation. In our work, Fig. 2 depicts the best 15 features, selected using an extra trees classifier. The selected features are: ACK Flag Count, Inbound, URG Flag Count, Destination IP, Source IP, Init_Win_bytes_forward, Timestamp, Flow ID, Protocol, Min Packet Length, min_seg_size_forward, Destination Port, Max Packet Length, Average Packet Size, and Packet Length Std.
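The feature-selection step can be illustrated with a minimal sketch. It assumes scikit-learn [31] is available and uses synthetic data with hypothetical feature names, since the original preprocessing code and dataset loading are not given in the paper; it shows one common way to rank features by extra-trees importance and keep the top 15, not necessarily the exact procedure used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def top_k_features(X, y, names, k=15, seed=0):
    """Rank features by extra-trees importance and return the k best."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    # Indices of the k largest importances, most important first.
    order = np.argsort(model.feature_importances_)[::-1][:k]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


# Synthetic stand-in for the traffic dataset (assumption: 40 numeric features).
X, y = make_classification(n_samples=500, n_features=40,
                           n_informative=10, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

selected = top_k_features(X, y, names, k=15)
for name, score in selected:
    print(f"{name}: {score:.4f}")
```

With a real dataset, non-numeric columns such as IP addresses or flow identifiers would first need to be encoded as numbers before being passed to the classifier.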

Modelling: In the modelling phase, modelling techniques are applied to the data. This

is done in order to achieve the best possible performance by adjusting the parameters of

the models in question.

As previously indicated, this step is closely tied to data preparation because modelling

might disclose previously unknown data errors. Depending on the situation, the data

preparation method can result in the employment of several models.

Fig. 2. 15 Best Features are selected by using extra trees classiﬁers

3.2 Modelling

To compare performance on the dataset, three supervised learning classifiers are selected based on a number of criteria [15–17], including whether the model is parametric or nonparametric and how the algorithms have been applied in previous work.

3.2.1 Random Forest

Random forest is a machine learning algorithm used to classify the examples in a dataset into specific categories. It is a combination of many decision trees, each trained on a random sample of the data, hence the name "forest". The results of the decision trees are combined in a way that helps reduce the error rate of the classification [20–22].

Random forest is therefore often described as making "many small decisions," as opposed to the "one big decision" made by a single model. Because the errors of the individual trees tend to cancel out, a random forest will, on average, get more things right than a single decision tree. However, it is harder to interpret and more expensive to train than a single tree.

Like other machine learning techniques, random forest finds patterns in large numbers of variables. For example, in a medical diagnosis problem, instead of just looking at a patient's symptoms and lab results, a machine learning technique might look at millions of different combinations of symptoms and lab results to find patterns that help make a better diagnosis. In a financial prediction problem, instead of just looking at a stock's past performance, a machine learning technique might look at millions of past stock transactions to find patterns that help predict whether a stock will rise or fall. The same is true for any other problem: finding the right combination of input variables is critical for making accurate predictions [23].

The random forest technique is often more effective than a single decision tree, because combining many trees reduces overfitting while still capturing non-linear relationships in data.
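As an illustration of how many trees are combined, the following sketch trains a random forest with scikit-learn [31]. The data is synthetic and the hyperparameters (100 trees, a 70/30 split) are assumptions made for the example, not the settings used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for benign vs. DDoS traffic.
X, y = make_classification(n_samples=1000, n_features=15,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each fitted tree makes its own "small decision" about the first test sample.
# (scikit-learn combines trees by averaging their class probabilities.)
votes_for_1 = sum(int(tree.predict(X_test[:1])[0])
                  for tree in forest.estimators_)
accuracy = forest.score(X_test, y_test)

print(f"{votes_for_1} of {len(forest.estimators_)} trees predict class 1")
print(f"forest accuracy on held-out data: {accuracy:.3f}")
```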

3.2.2 Decision Tree

Decision trees are a machine learning technique that finds patterns in large amounts of data. A decision tree splits the data recursively by asking a question about one input variable at each node. In a two-class classification problem, every example ends up in a leaf that answers the question being asked with "yes" or "no"; for example, the question might be "does this dog have fleas?" [24,25].

Decision trees are also the building block of random forest, in which each decision tree is built on a random subset of the original data. The random forest technique is often more effective than a single decision tree, because combining many trees reduces overfitting.

In a random forest, the trees are trained independently of one another and their outcomes are combined, which helps reduce the error rate of the classification [26].

Decision trees are one of the most basic machine learning techniques. To build a decision tree, the algorithm starts with the whole training set at the root node and then divides the data into subsets, or nodes, based on some criterion on the input variables, such as a variable's type or the range in which its value falls. The split is usually chosen to make the resulting subsets as pure as possible with respect to the class label.
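How a single split is chosen can be sketched in a few lines. The example below uses Gini impurity, one common purity criterion (the paper does not state which criterion was used), and a tiny hypothetical list of packet lengths and labels invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Try every candidate threshold and return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: packet length in bytes, label 1 = DDoS, 0 = benign.
packet_len = [40, 44, 48, 52, 900, 1200, 1400, 1500]
is_ddos = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_threshold(packet_len, is_ddos)
print(f"split at packet_len <= {threshold}, weighted Gini = {impurity}")
```

A full tree repeats this search on each resulting subset, over all input variables, until the subsets are pure or a depth limit is reached.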

3.2.3 Naïve Bayes

Naïve Bayes is a machine learning technique that uses Bayes' theorem to make predictions. It assumes that each input variable is independent of the others given the class. For example, if the task is to predict whether a person has a certain disease, Naïve Bayes assumes that, among people with the same diagnosis, the presence or absence of one symptom says nothing about the presence or absence of another. Although this assumption rarely holds exactly, the resulting predictions turn out to be surprisingly accurate in many cases [27,28].

Naïve Bayes is often used in text classification problems, where it assigns text to categories based on how probable each word is within each category. Naïve Bayes does not use decision trees: it is a probabilistic model, and unlike decision trees it does not capture interactions between input variables.

Naïve Bayes [18,19] is often one of the first machine learning techniques that people learn because it is easy to understand. It is also often used as a simple baseline against which to compare the accuracy of other machine learning techniques. For example, if a technique is markedly more accurate than Naïve Bayes on the same data, then it is likely that the technique is a good one.

Naïve Bayes is a classification technique. For example, if the question being asked is "Will it rain today?", Naïve Bayes might be used to predict the answer. It applies Bayes' theorem, combining the prior probability of each output with the probability of the observed inputs given that output, and predicts the output with the highest resulting probability.
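The calculation behind the rain example can be worked through by hand. In the sketch below, all probabilities are invented for illustration and the feature names `cloudy` and `humid` are hypothetical; it only shows how priors and per-feature likelihoods are multiplied and then normalized under the independence assumption.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(class | observed) under the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            # Multiply in P(feature = value | class) for each observed feature.
            score *= likelihoods[cls][feature][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Invented probabilities, for illustration only.
priors = {"rain": 0.3, "dry": 0.7}
likelihoods = {
    "rain": {"cloudy": {True: 0.9, False: 0.1}, "humid": {True: 0.8, False: 0.2}},
    "dry": {"cloudy": {True: 0.4, False: 0.6}, "humid": {True: 0.3, False: 0.7}},
}

posterior = naive_bayes_posterior(priors, likelihoods,
                                  {"cloudy": True, "humid": True})
prediction = max(posterior, key=posterior.get)
print(posterior, "->", prediction)
```

Here the unnormalized scores are 0.3 × 0.9 × 0.8 = 0.216 for rain and 0.7 × 0.4 × 0.3 = 0.084 for dry, so the posterior probability of rain is 0.216 / 0.300 = 0.72.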

4 Result Analysis

We have the following metrics for evaluating the performance of classiﬁcation machine

learning [29–31]. Performance metrics for machine learning such as precision, recall,

and F1-score are used to evaluate the quality of the model. These metrics can be used to compare the performance of different models and to evaluate the impact of different training regimes on model performance. Precision is the fraction of the examples assigned to a class by the model that actually belong to that class. Recall is the fraction of the examples that actually belong to a class that the model assigns to it. The F1-score is the harmonic mean of precision and recall, and accuracy is the ratio of the number of examples correctly classified by the model to the total number of examples in the test set. Table 2 and Figs. 3 and 4 show the results as precision, recall, F1-score and accuracy.
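For reference, the standard definitions of these metrics can be written directly in terms of confusion-matrix counts. The counts in the sketch below are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy for one class from confusion counts."""
    precision = tp / (tp + fp)          # of predicted positives, how many are right
    recall = tp / (tp + fn)             # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts: 90 true positives, 10 false positives,
# 30 false negatives, 870 true negatives.
precision, recall, f1, accuracy = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```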

Table 2. Classification report of applied algorithms.

        Decision Tree                Random Forest                Naïve Bayes
Class   Precision  Recall  F1 Score  Precision  Recall  F1 Score  Precision  Recall  F1 Score
Benign  94         98      96        94         100     97        84         100     94
DDoS    100        100     100       100        100     100       100        99      100

[Figure 3: bar chart titled "Performance Metrics of Applied Classifiers", showing precision, recall and F1 score for Decision Tree, Random Forest and Naïve Bayes; series: Benign, DDoS.]

Fig. 3. Performance metrics of applied classifiers

[Figure 4: bar chart of accuracy for Random Forest, Decision Tree and Naïve Bayes; vertical axis from 99.2 to 99.9.]

Fig. 4. Accuracy of applied classifiers

The RoC curves of the three classifiers are shown in Figs. 5, 6 and 7. All three classifiers performed well at classifying DDoS traffic, and the RoC curves confirm the results of the performance metric evaluations.
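A RoC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the area under it summarizes the curve in one number. The following minimal sketch uses scikit-learn [31] on four hypothetical scored examples; it is not the evaluation code used for Figs. 5 to 7.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels (1 = DDoS) and classifier scores for four flows.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (false positive rate, true positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
print(f"area under the curve: {auc:.2f}")
```

An area of 1.0 corresponds to a classifier that ranks every attack above every benign flow, while 0.5 corresponds to random guessing.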

Fig. 5. RoC curve of Random Forest

Fig. 6. RoC curve of Naïve Bayes

Fig. 7. RoC curve of Decision Tree

5 Conclusion

In this study, the dataset was classified into two classes using machine learning classifiers, and each class was detected and validated. A comparative analysis of multiple machine learning algorithms was carried out for the purpose of identifying DDoS cyber threats. Random Forest achieved the highest accuracy, 99.80 percent; the decision tree achieved 99.75 percent and Naïve Bayes achieved 99.42 percent. In future work, individual types of DDoS attack can be targeted for classification and prediction.

References

1. Badve, O.P., et al.: Taxonomy of DoS and DDoS attacks and desirable defense mechanism in

a cloud computing environment. Neural Comput. Appl. 28(12), 3655–3682 (2017)

2. Gupta, B.B., et al.: A comprehensive survey on DDoS attacks and recent defense mechanisms.

In: Handbook of Research on Intrusion Detection Systems, pp. 186–218. IGI Global (2020)

3. https://radar.cloudflare.com/notebooks/ddos-2022-q1. Accessed 2 Apr 2022

4. Mishra, A., et al.: Security threats and recent countermeasures in cloud computing. Modern

Principles, Practices, and Algorithms for Cloud Security, pp. 145–161. IGI Global (2020)

5. Mishra, A., Gupta, N.: Analysis of cloud computing vulnerability against DDoS. In: International Conference on Innovative Sustainable Computational Technologies (CISCT), pp. 1–6. IEEE (2019)

6. Mishra, A., Gupta, N., Gupta, B.B.: Defense mechanisms against DDoS attack based on entropy in SDN-cloud using POX controller. Telecommun. Syst. 77(1), 47–62 (2021). https://doi.org/10.1007/s11235-020-00747-w

7. Gaurav, A., et al.: Identity-based authentication mechanism for secure information sharing in the maritime transport system. IEEE Trans. Intell. Transp. Syst. (2021)

8. Nguyen, G.N., et al.: Secure blockchain enabled cyber–physical systems in healthcare using

deep belief network with ResNet model. J. Parallel Distrib. Comput. 153, 150–160 (2021)

9. Zhou, Z., et al.: A ﬁne-grained access control and security approach for intelligent vehicular

transport in 6g communication system. IEEE Trans. Intell. Transp. Syst. (2021)

10. Dahiya, A., Gupta, B.B.: Multi attribute auction based incentivized solution against DDoS

attacks. Comput. Secur. 92, 101763 (2020)

11. Cvitić, I., et al.: Boosting-based DDoS detection in internet of things systems. IEEE Internet of Things J. 9, 2109–2123 (2021)

12. Dahiya, A., et al.: A reputation score policy and Bayesian game theory based incentivised

mechanism for DDoS attacks mitigation and cyber defense. Future Generation Computer

Systems (2020)

13. Han, J., et al.: Data Mining: Concepts and Techniques. Elsevier (2011)

14. DDoS 2019 | Datasets | Research | Canadian Institute for Cybersecurity | UNB. Accessed 28

Apr 2022

15. Alzahrani, R.J., et al.: Security analysis of DDoS attacks using machine learning algorithms

in networks trafﬁc. Electronics 10(23), 2919 (2021)

16. He, Z., Zhang, T., Lee, R.B.: Machine learning based DDoS attack detection from source side

in cloud. In: Proceedings of the 2017 IEEE 4th International Conference on Cyber Security

and Cloud Computing (CSCloud), New York, NY, USA, 26–28 June 2017, pp. 114–120

(2017)

17. Aamir, M., et al.: DDoS attack detection with feature engineering and machine learning: the

framework and performance evaluation. Int. J. Inf. Secur. 18, 761–785 (2019)

18. Liu, Z., et al.: The prediction of DDoS attack by machine learning. In: Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021), vol. 12167, pp. 681–686. SPIE (2022)

19. Zewdie, T.G., Girma, A.: An evaluation framework for machine learning methods in detection

of DoS and DDoS intrusion. In: 2022 International Conference on Artiﬁcial Intelligence in

Information and Communication (ICAIIC), pp. 115–121 (2022)

20. Sahoo, S., et al.: Multiple features based approach for automatic fake news detection on social networks using deep learning. Appl. Soft Comput. 100, 106983 (2021)

21. Cvitić, I., Peraković, D., Periša, M., Gupta, B.: Ensemble machine learning approach for classification of IoT devices in smart home. Int. J. Mach. Learn. Cybern. 12(11), 3179–3202 (2021). https://doi.org/10.1007/s13042-020-01241-0

22. Gupta, B.B., et al.: Machine learning and smart card based two-factor authentication scheme for preserving anonymity in telecare medical information system (TMIS). Neural Comput. Appl., 1–26 (2021). https://doi.org/10.1007/s00521-021-06152-x

23. Yamaguchi, S., Gupta, B.: Malware threat in Internet of Things and its mitigation analysis.

In: Research Anthology on Combating Denial-of-Service Attacks, pp. 371–387. IGI Global

(2021)

24. Peraković, D., et al.: A big data and deep learning based approach for DDoS detection in cloud computing environment. In: 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), pp. 287–290. IEEE (2021)

25. Dahiya, A., et al.: A PBNM and economic incentive-based defensive mechanism against DDoS attacks. Enterp. Inf. Syst. 16(3), 406–426 (2022)

26. Dahiya, A., et al.: A reputation score policy and Bayesian game theory based incentivized

mechanism for DDoS attacks mitigation and cyber defense. Futur. Gener. Comput. Syst. 117,

193–204 (2021)

27. Chartuni, A., et al.: Multi-classiﬁer of DDoS attacks in computer networks built on neural

networks. Appl. Sci. 11(22), 10609 (2021)

28. Zhu, X., et al.: Prediction of rockhead using a hybrid N-XGBoost machine learning

framework. J. Rock Mech. Geotech. Eng. 13(6), 1231–1245 (2021)

29. Teles, G., Rodrigues, J.J., Rabêlo, R.A., Kozlov, S.A.: Comparative study of support vector

machines and random forests machine learning algorithms on credit operation. Software:

Practice and Experience 51(12), 2492–2500 (2021)

30. Gaurav, A., et al.: A comprehensive survey on machine learning approaches for malware

detection in IoT-based enterprise information system. Enterprise Information Systems, 1–25

(2022)

31. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–

2830 (2011)

Role of Internet of Things and Cloud Computing

in Education System: A Review

Ajay Krishan Gairola1,2(B) and Vidit Kumar1

1 Graphic Era Deemed to Be University, Dehradun, India
ajaykrishangairola@gmail.com
2 Graphic Era Hill University, Dehradun, India

Abstract. The current outbreak of the coronavirus (COVID-19) pandemic has affected education across the world. To meet the challenges posed by COVID-19, educational institutions (schools, colleges and universities) need to be more efficient in providing quality educational services virtually. Cloud computing and the Internet of Things (IoT) are technologies that make this possible. In this work, we review recent work on cloud technology and IoT in the education system and explore their various benefits and challenges. Furthermore, this article examines recent work on the potential scope of IoT in the education sector.

Keywords: Cloud computing · Virtual learning · Internet of Things · Online teaching · Virtual classes · Smart device

1 Introduction

The international quarantine imposed since December 2019 due to the coronavirus pandemic has put pressure on us to reconsider the idea of e-learning [1]. Cloud computing is a must-have within the academic process, and it is widely employed in enterprises. For everyone involved in the learning process, cloud technology makes education a genuinely enjoyable experience. With the help of smart devices, students can now communicate with each other or with teachers and experience the flexibility of learning. Cloud computing services have improved institutions' academic outcomes and have given professors and students access to this modern technology along with additional benefits. Over time the educational system has changed and is no longer limited to blackboard classrooms and textbooks. Education in the cutting-edge landscape of cloud computing has emerged as an advantage for the sector, while everyone is trying their hardest to combat COVID-19 [18]. From preserving scholarly data to storing information, from online training systems to advanced study analysis, it has completely revolutionized teaching and learning. Students, professors, and instructors can now enjoy the accessibility and convenience of cloud-based education. The Internet of Things connects processes, people, data, and devices, making it easier for education stakeholders to convert data acquired from portable devices and sensors into useful information

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 51–60, 2022.

https://doi.org/10.1007/978-3-031-22915-2_5

and to take meaningful steps in response to that information [2]. It is crucial to consider the influence of IoT adoption in order to understand the problems and benefits of the Internet of Things in education, especially as IoT is still in its early stages in the education system. The Internet of Things provides numerous advantages, including the development of intelligent interactive classes, the ability to customize dynamic models in which students are active participants in the learning process, the encouragement of imagination, and real-time monitoring of students' cognitive processing. The COVID-19 epidemic has put pressure on both research and the use of new technology in education. Growing interest in this research topic is indicated by an increase in the number of publications on the use of IoT in education, and contemporary educational practices are factual evidence of such interest. Because the possibilities of cloud packages in the educational process are virtually limitless, this paper focuses on reviewing recent progress related to cloud technology and IoT in the education system, along with its benefits and challenges.

1.1 Cloud Computing and Education

Cloud computing is a method of delivering a variety of services via virtual machines that are placed on top of a big pool of physical equipment in the cloud [3]. Services are hosted in highly scalable data centers in the cloud and accessed via the internet from any connected device. The distributed cloud environment offers a large amount of compute power and storage capacity. Some applications of cloud technologies in education are listed in Table 1.

Table 1. Examples of the application of cloud technologies in education

Google Classroom [4]: Google Classroom is a cloud-based learning management system that is part of the Google Apps for Education suite of products. Students can use Google Classroom on PCs, tablets, and cell phones.

Blackboard [5]: Blackboard provides education, mobility, communication, and commerce software, along with related services, to customers including educational institutions, enterprises, and government agencies. In January 2014, around 17,000 schools and organizations in 100 countries were using its software and services.

Knowledge Matters [6]: Knowledge Matters is a major virtualized online firm that teaches important business principles to college and high school students through interactive, web-based, game-like business simulations.

Coursera [7]: One of the best-known educational sites. Anyone can study a wide range of subjects on Coursera, and it is widely known among students in countries such as the United States, Canada, Thailand, Russia, and Ukraine as an opportunity to gain valuable knowledge.

Microsoft Education Centre [8]: The Microsoft Education Centre was intended to allow students to continue learning regardless of their circumstances. It makes online learning viable and aims to offer quality education to every student.

Classflow [9]: Classflow is an interactive, screen-based course delivery program that runs in the cloud. It offers users unlimited access to lessons and learning tools without a subscription.

1.2 IOT Enabled Education Environment

The Internet of Things is changing the way we live by turning every product into an intelligent entity. The same holds in educational institutions, where intelligence is carried through a full cycle from Smart University, Smart Classroom, Smart Learning and Smart Teaching to Smart Analysis (Table 2).

Table 2. Smart education

Smart Education: The purpose of smart education is to equip students with the skills and expertise they need to succeed in today's market. Smart education's success depends on sensing devices, an IoT infrastructure, communication links, and user apps. Integrating IoT into educational institutions would result in higher educational quality, since students will learn more quickly and teachers will be able to carry out their educational duties more efficiently [10].

Smart University: A smart university combines cutting-edge hardware and software, modern concepts, up-to-date teaching and learning techniques, and smart teaching in smart classrooms equipped with the latest technology [11]. A smart university has access to a diverse range of worldwide materials, an interactive teaching environment that can be accessed over the network, and learning that adapts to the data acquired. Many institutions have IoT devices such as temperature control devices, security cameras, electricity and heating systems, and building access devices.

Smart Classroom: A smart classroom is a location where students can access educational activities using electronic equipment such as internet-connected gadgets, digital screens and video projectors [12]. Since 2012, smart classes have been built on automated communication devices, mobile learning and mobile technologies, which use cameras, facial recognition algorithms, video projectors, sensors, and other modules to keep track of many characteristics of the physical environment. When these machines are linked to the Internet of Things, they form a smart class that allows access to knowledge from anywhere and at any time. A smart class offers numerous advantages, including better communication of information, flexibility, interactive learning, exchange of educational content, and improved thinking skills.

Smart Teaching: The manner in which information is transmitted via electronic devices can differ greatly from traditional teaching approaches. The material is always accessible, and learning is adaptable, allowing students to stay up to date on the most recent advancements. The Internet of Things may provide access to the real world, which can make teaching more demanding because teaching methods must be adjusted and adapted to meet the needs of all students, including students with disabilities.

Smart Learning: Smart learning is a learning approach that makes use of electronic gadgets. According to [13], smart learning is a process that assists students in learning by focusing on the subject as well as the students themselves. This technology's intelligence, adaptability, and efficacy depend on the ICT infrastructure. The use of IoT e-learning apps is critical for establishing a virtual classroom and a competitive learning process, both worldwide and locally. Because students can access libraries and labs around the world to collect data, conduct experiments, submit assignments, and carry out self-evaluation, the Internet of Things fosters online self-teaching.

Smart Assessment: Smart assessment [14] goes beyond traditional methods by adding other types of evaluation, such as interviews and focus groups. To make an accurate assessment, we must consider the effects of modern technology on how we work. The evaluation process then evolves as we engage with an ever-expanding IoT ecosystem. To capture student behavior in online learning assessments, new learning systems must have the appropriate technologies. IoT instruments can be used to assess a student's concentration, which is critical in assessing their education. It is feasible to design adaptive exams that adjust to the student's responses to questions and are presented in the student's preferred learning style. This type of examination makes it possible to explore the students' knowledge, how they understand and apply it, and their learning styles and skills. The use of simulations during educational activities is an important component of smart assessment and can also be used as a learning approach.

2 Literature Review

2.1 Cloud Computing in Education

In recent years, cloud computing implementations have attracted attention in several areas, including higher education in emerging markets. In this section we present a review of the adoption of cloud computing in education. Moodle was investigated in [15] as a case of mobile cloud learning systems in higher education. An overview of the use of mobile cloud computing (MCC) in the educational field was published in [16]; it summarized the challenges and issues that MCC poses for learning systems, including privacy, interoperability, the cognitive load on students caused by different operating platforms, data integrity, network availability and speed, and large learning materials and courses. Cloud computing (CC) in education was evaluated from the perspective of teaching staff and IT professionals in Saudi Arabia [17]. The authors of [19] studied the characteristics influencing adoption by collecting data from Blackboard users in a mobile cloud learning environment at the University of Leeds in the United Kingdom

using a structured questionnaire. The authors of [20] examined their country's potential to transition to remote learning and reviewed platforms that have been used in education with government support, as well as modern online communication platforms, including those built on Microsoft Teams. The authors of [21] reported China's experience of continuing learning during the COVID-19 lockdown, describing technical support for instructors and learning support for students. The authors of [22] examined the risks of using an automated system and provided a web-based multi-factor authentication test method. The authors of [23] summarized the challenges of achieving quality distance learning and e-examination from the viewpoint of professors and students in Arab universities. Universities, schools and other educational institutions are important for the overall development of a country.

The authors of [24] used fuzzy AHP to examine the determinants of CC adoption in Indian
higher educational institutions. The most important factors were found to be relative
advantage, IT demand and security. Using an SEM approach, the authors of [25] evaluated
cloud computing-based education in Saudi Arabia and identified influencing characteristics
such as reliability, social influence, information quality and ease of use. Tuan suggested
that teacher research productivity would be assessed more accurately with an integrated
multi-criteria decision-making (MCDM) approach; to do this, the researchers used a hybrid
AHP-TOPSIS technique. Due to the suspension of on-campus classes, a large jump in student
numbers, and the content and material offered, e-learning systems have grown at an
exponential rate [26,27]. Cloud technology is now being used by many educational
institutions, and it is clear [28] that it has a bright future.

In addition, because there is a single database for all users in the cloud, cyber security
changes can be analyzed and made quickly [29]. The cloud was designed to allow users to
collaborate from anywhere at any time [30]. It can reach more students outside the
conventional classroom and meet their needs. Because of growing demands to keep education
running, institutions are paying more attention to the combination of cloud technology
and e-learning. Almost all educational institutions saw it as a viable and suitable
e-learning option. Nevertheless, there is a lack of studies that provide a theoretical
framework on which to build the technology. On the other hand, the potential inherent in
the cloud approach can be highlighted as a major advantage in the development of an
analytical framework and one-hit training techniques [31]. In the literature, common
features of the cloud are associated with social participation and collaborative
learning activities [32].

2.2 IoT in Education

The term “Internet of Things” (IoT) refers to state-of-the-art technology that connects all

intelligent objects in a network without the need for human intervention. This is indeed

a new research focus that has recently established an important and compelling
research base in a wide range of academic and industrial disciplines [33]. According to
Walcott [34], many governments are implementing the latest digital defense strategies in
the fight against COVID-19. During this time, digital technology and innovation gradually
became the focus of mankind. The economic demands of COVID-19 strongly drive the

deployment and creation of new digital technologies at a particular pace and scale.

Population surveillance, response assessment, incident detection and contact tracing are
among the digital tools used to support the international public health response to
COVID-19, with a focus on public participation and data mobility. According to Islam [35],
the integration of IoT with advanced technology could be a major step forward in efforts
to combat new epidemics. The potential of the Internet of Things will have a significant
impact on the ability of developing countries to achieve the SDGs (Sustainable Development
Goals). In environments where Internet-of-Things-enabled devices and applications are
used, it is essential to implement specified protocols, patient monitoring and primary
identification procedures to reduce the chances of spreading the coronavirus.

According to Javaid [36], the Internet of Things (IoT) can send and receive both
information and physical goods. Intelligent hospital equipment and concepts are controlled
via wireless and wired Internet. Various medical diagnostics, instruments, advanced
imaging equipment, artificial intelligence and sensors are essential for the
implementation of IoT in the medical field. Intelligent technologies can collect and share
data to carry out essential tasks in our daily lives. The application will pave the way
for entertainment systems, automobiles, connected healthcare and smart cities. These
advances have increased both the quality of life and the efficiency of industries and
societies, both new and old. This technology is flourishing in health surveillance during
the COVID-19 pandemic. According to Nasajpour et al. [37], innovation has kept most of the
information about COVID-19 patients inside the data center to ensure adequate attention.
The Internet of Things combines digital, computing and mechanical technologies to transmit
data over the web without human intervention. In this dire scenario, many people die
because of late and incorrect medical information. The Internet of Things is taking over
everyday human activities and changing how health problems are handled. Sensors are used
to quickly notify the system of health issues [38]. The successful operation of medical
institutions requires proper equipment. During the COVID-19 pandemic, the use of the
Internet of Things improves patient care. Smart medical devices are connected via smart
connectors to deliver important medical data to doctors. These devices use the Internet
of Things to track real-time data, saving lives from a variety of health problems. The
Internet of Things has great potential to analyze and leverage impactful activities,
including after-services [36]. De Roure and Radanliev [39,40] focused on ethical IoT
design updates and IoT design, but did not discuss the implications of the coupled and
multiple risks of IoT systems. They concluded that before new ICT systems are
incorporated, production facilities should be coupled with an ethical assessment of cyber
threats. In [41] they build a framework to provide guidance to governments, health experts
and medical organizations, as the introduction of IoT into the vaccination supply chain
increases risk. Applications of the Internet of Things [42] include contact tracing
devices, wearable health monitors, thermal cameras, temperature sensors and package
tracking, which help fight disease by providing critical data needed for the safe delivery
of COVID-19 vaccines. During the COVID-19 situation, IoT has helped to make automated
activities in warehouses and supply chains more resilient, to encourage social distancing
and to secure remote access to industrial machines.

By studying the potential of IoT in the socio-economic development areas of

Bangladesh, Parvez et al. [43] created a conceptual framework model, and the model

showed that Bangladesh needed to develop a set of policies for IoT deployment to

implement a national strategy on the Internet of Things. Miazi et al. [44] reported that
IoT, as introduced in Bangladesh, reveals technical challenges, financial challenges,
security and privacy issues and device reliability concerns, along with opportunities such
as occupational safety, mHealth, traffic safety, service management and environmental
monitoring. Sarker et al. [45] highlighted the future prospects and problems of some of
the most promising IoT applications. According to the literature review on IoT
applications in Bangladesh, no in-depth work has been found on the current scenario of
employing IoT in various industries during COVID-19. As a result, a conceptual model of
the impact of Internet of Things applications across multiple industries was created
during the pandemic. This study looks at the barriers and benefits of adopting IoT
services across multiple sectors during the pandemic, and the findings will help
organizations respond and adapt to IoT services more quickly, giving them a competitive
advantage.

3 Discussion and Conclusion

In India, cloud computing adoption in higher education is an under-studied area and
the literature does not document systematic studies. In this article we studied how cloud
computing can be used in educational contexts. The COVID-19 pandemic has prompted many
schools, colleges and institutions to provide online training, which has made this
mandatory. According to the overview of the analysis, using cloud services in e-learning
is a good option since it allows teachers to make use of cloud adaptability, flexibility,
and security to deliver the primary framework of e-learning education, accessible from
anywhere, at any time, and on any device. We can fully use the prospects presented by an
efficient learning environment with specialized information that is easily adaptable to
today's educational paradigm. Integrating an e-learning system into the cloud has several
advantages, including increased storage, computing and network connectivity, as well as
software and hardware cost savings. On the other hand, it offers a more

diverse range of educational programs at a lower license cost. The replacement rate for

student computers is lower due to the extended machine life. These savings add up to a

reduction in IT personnel costs associated with computer lab maintenance and software

updates. Today’s e-learning services and systems are all about personalizing learning
for each user. As a result of this technology, students receive generic e-learning

that is not tailored to their speciﬁc needs. In most modern systems, interaction between

professors and students is essential for improving the quality of each student’s learning

experience. When evaluating the scale of a problem, there are many things to consider.

In response to customer concerns about security and privacy, cloud service providers

have made major investments in cloud infrastructure and platforms. Furthermore, country
restrictions must be considered, since some countries require data to be maintained within
their borders, making data storage remotely or outside of the country illegal. According
to current research, academics have a wealth of data at their disposal to aid in building
cloud-based e-learning frameworks and implementations. A quantitative assessment of the
impact of switching to a cloud e-learning environment on several factors such as access
speed, educational quality, and return will be conducted in the future. The adoption of
IoT in universities may be influenced by education policy in terms of administrative
support

and a change of mindset. There is a need to examine the advantages and disadvantages of

Internet of Things in depth. Information and communication technology (ICT), a soci-

ety that places a high value on acquiring knowledge, and the current pandemic have all

contributed to an increase in the amount of pressure on the education system to adopt

ICT and make education more intelligent. There is also a need to explore machine learn-

ing algorithms [46–50] in cloud based analysis of education systems for tasks such as

student monitoring, student lecture engagement, etc.

References

1. Dias, S.B., Hadjileontiadou, S.J., Diniz, J., Hadjileontiadis, L.J.: DeepLMS: a deep learning

predictive model for supporting online learning in the Covid-19 era. Sci. Rep. 10(1), 1–17

(2020)

2. Bagheri, M., Movahed, S.H.: The Effect of the Internet of Things (IoT) on Education Business

Model, in Proc, pp. 435–441. SITIS, Naples, Italy (2016)

3. Gong, C., Liu, J., Zhang, Q., Chen, H., Gong, Z.: The characteristics of cloud computing. In:

2010 39th International Conference on Parallel Processing Workshops, pp. 275–279. IEEE

(2010)

4. https://classroom.google.com/. Accessed 26 May 2022

5. https://www.blackboard.com/en-apac. Accessed 26 May 2022

6. https://knowledgematters.com/. Accessed 26 May 2022

7. https://www.coursera.org/. Accessed 26 May 2022

8. https://education.microsoft.com/en-us. Accessed 26 May 2022

9. https://classflow.com/. Accessed 26 May 2022

10. Mohanty, D.: Smart learning using IoT. Int. Res. J. Eng. Tech. 6(6), 1032–1037 (2019)

11. Uskov, V.L., Bakken, J.P., Howlett, R.J., Jain, L.C. (eds.): SEEL 2017. SIST, vol. 70. Springer,

Cham (2018). https://doi.org/10.1007/978-3-319-59454-5

12. Pai, S.S., et al.: IOT application in education. Int. J. Adv. Res. Ideas Innovations Technol.

2(6), 20–24 (2017)

13. Gwak, D.: The meaning and predict of smart learning. In: Proceedings of the Smart Learning

Korea (2010)

14. Aljohany, D.A., Mohamed, R., Saleh, M.: ASSA: adaptive E-learning smart students

assessment model. Int. J. Adv. Comput. Sci. Appl. 9(7), 128–136 (2018)

15. Wang, M., Chen, Y., Khan, M.J.: Mobile cloud learning for higher education: a case study of

moodle in the cloud. Int. Rev. Res. Open Distrib. Learn. 15(2), 254–267 (2014)

16. Sarode, N., Bakal, J.W.: A review on use of mobile cloud system in educational sector. In:

2020 6th International Conference on Advanced Computing and Communication Systems

(ICACCS), pp. 715–720. IEEE (2020)

17. Almutairi, M.M.: A review of cloud computing in education in Saudi Arabia. Int. J. Inform.

Technol. 12(4), 1385–1391 (2020). https://doi.org/10.1007/s41870-020-00452-6

18. Kumar, V.: A review on deep learning based diagnosis of COVID-19 from X-ray and CT

images. In: 2022 International Mobile and Embedded Technology Conference (MECON),

pp. 547–552. IEEE (2022)

19. Sultana, J.: Determining the factors that affect the uses of mobile cloud learning (MCL)

platform blackboard-a modiﬁcation of the UTAUT model. Educ. Inform. Technol. 25(1),

223–238 (2020). https://doi.org/10.1007/s10639-019-09969-1

20. Basilaia, G., Kvavadze, D.: Transition to online education in schools during a SARS-CoV-2

coronavirus (COVID-19) pandemic in Georgia. Pedagogical Research 5, 4 (2020)

21. Huang, R.H., Liu, D.J., Tlili, A., Yang, J.F., Wang, H.H.: Handbook on Facilitating Flexible

Learning During Educational Disruption: The Chinese Experience in Maintaining Undis-

rupted Learning in COVID-19 Outbreak, pp. 1–54. Smart Learning Institute of Beijing Normal

University, Beijing (2020)

22. Mallik, S., Halder, S., Saha, P., Mukherjee, S.: Multi-factor authentication-based E-exam
management system (EEMS). In: Bhattacharjee, D., Kole, D.K., Dey, N., Basu, S., Plewczynski,
D. (eds.) Proceedings of International Conference on Frontiers in Computing and Systems.
AISC, vol. 1255, pp. 711–720. Springer, Singapore (2021).
https://doi.org/10.1007/978-981-15-7834-2_66

23. Bashitialshaaer, R., Alhendawi, M., Lassoued, Z.: Obstacle comparisons to achieving distance

learning and applying electronic exams during COVID19 pandemic. Symmetry 13(1), 99

(2021)

24. Sharma, M., Gupta, R., Acharya, P.: Factors influencing cloud computing adoption for higher

educational institutes in India: a fuzzy AHP approach. Int. J. Inf. Technol. Manage. 19(2–3),

126–150 (2020)

25. Naveed, Q.N., Alam, M.M., Qahmash, A.I., Quadri, K.M.: Exploring the determinants of

service quality of cloud E-learning system for active system usage. Appl. Sci. 11(9), 4176

(2021)

26. Khan, R.M.I., Radzuan, N., Farooqi, S., Shahbaz, M., Khan, M.: Learners’ perceptions on

whatsapp integration as a learning tool to develop EFL spoken vocabulary. Int. J. Lang. Educ.

5(2), 1–14 (2021)

27. Khan, R.M.I., Shahbaz, M., Kumar, T., Khan, I.: Investigating reading challenges faced by

EFL learners at elementary level. Register J. 13(2), 277–292 (2020)

28. Khan, I., Ibrahim, A.H., Kassim, A., Khan, R.M.I.: Exploring the EFL learners’ attitudes
towards the integration of active reading software in learning reading comprehension at tertiary
level. MIER J. Educ. Stud. Trends Pract., 248–266 (2020)

29. Bhardwaj, A., Goundar, S.: A framework to deﬁne the relationship between cyber security

and cloud performance. Comput. Fraud & Secur. 2019(2), 12–19 (2019)

30. Kaisara, G., Bwalya, K.J.: Investigating the e-learning challenges faced by students during

COVID-19 in Namibia. Int. J. High. Educ. 10(1), 308–318 (2021)

31. Park, J.H., Park, J.H.: Blockchain security in cloud computing: use cases, challenges, and

solutions. Symmetry 9(8), 164 (2017)

32. Marinescu, D.C.: Cloud Computing: Theory and Practice. Morgan Kaufmann (2017)

33. Mohammed, T., Jean-Yves, C., Peter, B., Christophe, R.: Petrogenesis of the post-collisional

Bled M’Dena volcanic ring complex in Reguibat Rise (western Eglab shield, Algeria). J. Afr.

Earth Sci. 166, 102250 (2020)

34. Walcott, D.A.: How the fourth industrial revolution can help us beat COVID-19. In: World

Economic Forum (2020). https://www.weforum.org/agenda/2020/05/how-the-fourth-indust

rialrevolution-can-help-us-handle-the-threat-of-covid-19

35. Islam, A., Anum, K., Dwidienawati, D., Wahab, S., Abdul, L.A.: Building a post COVID-19

conﬁguration between Internet of Things (IoT) and sustainable development goals (SDGs)

for developing countries. J. Arts Soc. Sci. 4(1), 45–58 (2020)

36. Javaid, M., Khan, I.H.: Internet of Things (IoT) enabled healthcare helps to take the challenges

of COVID-19 pandemic. J. Oral Biol. Craniofac. Res. 11(2), 209–214 (2021)

37. Nasajpour, M., Pouriyeh, S., Parizi, R.M., Dorodchi, M., Valero, M., Arabnia, H.R.: Internet

of Things for current COVID-19 and future pandemics: an exploratory study. J Healthcare

Inf Res. 1, 40 (2020)

38. Fahrni, S., Jansen, C., John, M., Kasah, T., Körber, B., Mohr, N.: Coronavirus: Industrial IoT

in Challenging Times. McKinsey & Company, New York (2020)

39. Radanliev, P., De Roure, D.: Alternative mental health therapies in prolonged lockdowns:

narratives from Covid-19. Heal. Technol. 11(5), 1101–1107 (2021). https://doi.org/10.1007/

s12553-021-00581-3

40. Radanliev, P., De Roure, D.: Epistemological and bibliometric analysis of ethics and shared

responsibility—health policy and IoT systems. Sustainability. 13(15), 8355 (2021)

41. Radanliev, P., De Roure, D., Ani, U., Carvalho, G.: The ethics of shared Covid-19 risks: an

epistemological framework for ethical health technology assessment of risk in vaccine supply

chain infrastructures. Heal. Technol. 11(5), 1083–1091 (2021). https://doi.org/10.1007/s12

553-021-00565-3

42. Forum, W.E.: State of the Connected World (2020). http://www3.weforum.org/docs/WEF_

The_State_of_the_Connected_World_2020.pdf

43. Parvez, N., Chowdhury, T.H., Urmi, S.S., Taher, K.A.: Prospects of Internet of Things

for Bangladesh. In: 2021 International Conference on Information and Communication

Technology for Sustainable Development (ICICT4SD), pp. 481–485 (2021)

44. Miazi, M.N.S., Erasmus, Z., Razzaque, M.A., Zennaro, M., Bagula, A.: Enabling the Internet

of Things in developing countries: opportunities and challenges. In: 2016 5th International

Conference on Informatics, Electronics and Vision (ICIEV), pp. 564–569. IEEE (2016)

45. Sarker, S., Roy, K., Afroz, F., Pathan, A.-S.: On the opportunities, applications, and chal-

lenges of internet of things. In: Khan, M.A., Quasim, M.T., Algarni, F., Alharthi, A. (eds.)

Decentralised Internet of Things. SBD, vol. 71, pp. 231–254. Springer, Cham (2020). https://

doi.org/10.1007/978-3-030-38677-1_11

46. Kumar, V., et al.: Hybrid spatiotemporal contrastive representation learning for content-based

surgical video retrieval. Electronics 11, 1353 (2022)

47. Kumar, V., Tripathi, V., Pant, B.: Learning unsupervised visual representations using 3d

convolutional autoencoder with temporal contrastive modeling for video retrieval. Int. J.

Math. Eng. Manag. Sci. 7(2), 272–287 (2022)

48. Kumar, V., Tripathi, V., Pant, B.: Enhancing unsupervised video representation learning by

temporal contrastive modelling using 2d CNN. In: 5th IAPR International Conference on

Computer Vision & Image Processing (CVIP 2021)

49. Kumar, V., Tripathi, V., Pant, B.: Unsupervised learning of visual representations via rotation

and future frame prediction for video retrieval. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser,

J., Ören, T., Sonawane, V.R. (eds.) ICACDS 2021. CCIS, vol. 1440, pp. 701–710. Springer,

Cham (2021). https://doi.org/10.1007/978-3-030-81462-5_61

50. Kumar, V., Tripathi, V., Pant, B.: Exploring the strengths of neural codes for video retrieval. In:

Tomar, A., Malik, H., Kumar, P., Iqbal, A. (eds.) Machine Learning, Advances in Computing,

Renewable Energy and Communication. LNEE, vol. 768, pp. 519–531. Springer, Singapore

(2022). https://doi.org/10.1007/978-981-16-2354-7_46

Smart Communication and Technology

An Effective Image Augmentation Approach

for Maize Crop Disease Recognition

and Classiﬁcation

M. Nagaraju1, Priyanka Chawla2(B), and Rajeev Tiwari3

1School of Computer Science and Engineering, Lovely Professional University, Phagwara,

Punjab, India

2Department of Computer Science & Engineering, National Institute of Technology Warangal,

Hanamkonda, Telangana, India

priyankac@nitw.ac.in

3Systemic, School of Computer Science, University of Petroleum and Energy Studies,

Dehradun, Uttarakhand, India

Abstract. Deep learning techniques have been applied successfully to computer vision
applications such as image recognition and classification. In particular, convolutional
neural networks preserve the characteristics of an object in an image using kernels and
perform recognition very efficiently. However, the performance of these networks depends
on large datasets, which is a big challenge for researchers in the agriculture field.
Image augmentation can be a better solution that helps the neural network model perform
the classification task efficiently with more input images. In this paper, several image
augmentation techniques were applied to generate varieties of new images from the original
image. The paper proposes a new CNN-based model for the classification of six diseased and
one healthy maize crop image classes. The proposed model is trained twice independently,
with the original dataset of 4652 images and with the augmented dataset of 10640 images.
Finally, the outcomes are analyzed separately with respect to accuracy and loss. Before
the augmentation approach was applied, the proposed model achieved 99.61% training and
77.44% classification accuracy and did not control overfitting. After applying the
augmentation techniques, the model obtained 95.96% training accuracy and 93.61%
classification accuracy and controlled overfitting. The results therefore indicate that
the image augmentation approach and the proposed convolutional neural network model
provide a better solution for classifying maize crop diseases with a higher level of
accuracy.

Keywords: Computer vision · Convolutional neural networks · Deep learning ·
Image augmentations · Image classification · Image preprocessing techniques

1 Introduction

Having too few images is a typical practical problem in computer vision. Augmenting
the training images artificially can produce new images and may reduce overfitting [8].

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 63–72, 2022.

https://doi.org/10.1007/978-3-031-22915-2_6

Imbalanced datasets can also be addressed by applying augmentation techniques that
transform the original shape of the plant to generate additional images [9]. Image
augmentation gives a resultant dataset that is six times larger than the original set.
The LeafGAN model boosted the accuracy by 7.4% [2]. Model generalization can be improved
by performing image preprocessing and augmentation. Experiments were conducted to evaluate
the efficiency of the proposed model in classifying the DiaMOS plant dataset [3].

An advanced machine learning (ML) model is proposed to classify the major diseases
in banana crops using aerial images. The results obtained by the trained models indicated
that image augmentation has a positive effect on disease classification. The model
controls the training rate without any overfitting [4]. An image augmentation strategy
has been followed to amplify the original images, and the classification performance was
tested on apple. The results have shown that the proposed model achieved 6.3% higher
recognition accuracy [3]. An enhanced classification model is proposed to increase the
classification accuracy and address the overfitting problem. The model has achieved 24.4%
higher overall classification accuracy using conventional image augmentations [1].
Synthetic image generation is the most usual method of data augmentation. The proposed
model achieved 7% higher accuracy than the existing models [5]. A transfer learning
concept has been implemented by modifying VGG-16 to classify the images obtained from
different mango farms. An image augmentation process was adopted and achieved 73% accuracy
on the training dataset and 73% on the testing set. Data augmentation leads to a 13.43%
improvement on the testing data [7]. A conditional deep neural network is proposed for
vigor rating of plants; after data augmentation the model improved the classification
accuracy and obtained a 23% increase in F1-score. The proposed approach has resolved the
problem of insufficient data size in plant disease tasks [11–14,12].

In this article, we have identified the need to expand the datasets of all seven classes
with data augmentation techniques. Image augmentation generates new images that allow all
the disease classes to be balanced with an equal number of images. First, an RGB image of
a diseased leaf is augmented into different variations for each transformation type.
Seven types of image transformation are applied to an image of a diseased leaf: random
rotation, horizontal shift, vertical shift, horizontal flip, vertical flip, random
zooming, and random brightness. These techniques produce a set of newly generated images
that are then used to train a convolutional neural network (CNN). The novelty of the
proposed work is to generate the new image dataset, design a CNN model, and perform
disease classification with the original and augmented datasets.
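The transformations listed above can be illustrated on a tiny image. The following is a minimal sketch under stated assumptions, not the authors' pipeline: a grayscale image is stood in for by a list of rows of integers, only the deterministic flips, shifts and a brightness scaling are shown (rotation and zooming need interpolation and are omitted), and every function name is hypothetical.

```python
# Illustrative sketch only: an "image" is a list of rows of ints in 0..255.
# All function names are hypothetical and do not come from the paper.

def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]

def horizontal_shift(img, dx, fill=0):
    """Shift pixels dx columns to the right (negative dx shifts left)."""
    width = len(img[0])
    shifted = []
    for row in img:
        if dx >= 0:
            shifted.append([fill] * min(dx, width) + row[:max(width - dx, 0)])
        else:
            shifted.append(row[-dx:] + [fill] * min(-dx, width))
    return shifted

def vertical_shift(img, dy, fill=0):
    """Shift pixels dy rows down (negative dy shifts up)."""
    height, width = len(img), len(img[0])
    if dy >= 0:
        k = min(dy, height)
        return [[fill] * width for _ in range(k)] + [row[:] for row in img[:height - k]]
    k = min(-dy, height)
    return [row[:] for row in img[k:]] + [[fill] * width for _ in range(k)]

def adjust_brightness(img, factor):
    """Scale every pixel by factor and clamp the result to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def augment(img):
    """Return one variant per transformation, keyed by a descriptive name."""
    return {
        "horizontal_flip": horizontal_flip(img),
        "vertical_flip": vertical_flip(img),
        "horizontal_shift": horizontal_shift(img, 1),
        "vertical_shift": vertical_shift(img, 1),
        "brightness": adjust_brightness(img, 1.5),
    }
```

Each call to `augment` turns one input image into several labeled variants, which is the mechanism by which a small class can be expanded to the same size as a large one. In practice the shift amounts, rotation angles and brightness factors would be drawn at random within chosen ranges.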

The other sections are organized as follows: Sect. 2 discusses the approach followed to
apply the image augmentation techniques. Section 3 describes the experimental results
achieved from the image augmentation approach, followed by the conclusion in Sect. 4.

2 Image Augmentation Approach

Image augmentation, or simply IA, is an essential approach that replicates the given

image with few transformations. The augmentation technique increases the diversity of

images by changing each image in different ways like rotating, shifting, zooming, and

ﬂipping.

In this paper, a supervised learning approach named RGB Image Augmentation Approach
(IAA) is followed to generate new images. The approach aims to improve on traditional
augmentation by taking into account the requirement for new augmentation techniques.
Figure 1 shows some original images and the images generated after applying the
augmentations. The study employs the IAA as a preprocessing procedure to identify the
converted images automatically. The newly generated images are then fed to the
convolutional neural network as input images for training.

Fig. 1. Sample Images (a) original (b) augmented.

3 Experimental Results and Observations

A CNN model is developed to implement the IA approach and perform the image

classiﬁcation in this section.

3.1 RGB Image Augmentation

A new CNN model is developed with four convolution and four max-pooling layers, followed
by one flatten and two dense layers. Figure 2 shows the summary of the proposed model and
the related hyperparameters are shown in Fig. 3.

3.2 CNN Evaluation

The classification performance of the proposed CNN model is evaluated by conducting
experiments with 4652 leaf images collected from the Kaggle repository (Kaggle Dataset
n.d.). These images belong to seven different classes of maize crop. The image dataset is
split into two subsets, a training set and a testing set. IAA is applied to the training
images to generate new images with different variants. After applying IAA, the size
of the training dataset is increased to 10640 images. The testing dataset is prepared with

Fig. 2. Proposed model summary

Fig. 3. The proposed 11-layer CNN model

2660 images that are randomly selected from the first dataset of 4652 images. The list
of diseases and the number of images considered for each disease are given in Table 1.
The model is evaluated with the original and augmented datasets by training and testing
on each individually. The results obtained are then used to validate and compare the
predictions with respect to accuracy, loss and confusion matrix.

3.2.1 CNN Model Implementation

The present sequential model is designed with four convolution layers (Conv2D_1, 2,
3 and 4) with kernel size 3 × 3, followed by four max-pooling layers with pool size 2
and stride 2. The model has one flatten layer and two dense layers, with 128 units in
the first and 7 units in the last. First, the proposed model is fed with the original
dataset (without augmentation) and performs the classification. Figure 4 shows the
training and testing accuracy and loss curves before implementing IAA. The accuracy and
loss curves are plotted individually for better understanding and comparison of results.
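The layer description fixes the kernel and pooling sizes but not the input resolution, the padding, or the number of filters. As a rough sketch of how the feature map shrinks through such a stack, the code below assumes that each convolution is directly followed by a pooling layer, that convolutions use stride 1 with no padding, and that the input is 128 × 128 pixels with 64 filters in the last convolution; these four values are assumptions chosen for illustration, not figures from the paper.

```python
# Sketch under stated assumptions: 'valid' 3x3 convolutions with stride 1,
# each followed by 2x2 max pooling with stride 2. Input size and filter
# count below are hypothetical; the dense sizes (128, 7) are from the text.

def conv_out(size, kernel=3, stride=1, padding=0):
    """Output side length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    """Output side length of a max-pooling layer along one spatial axis."""
    return (size - pool) // stride + 1

def feature_map_side(input_side, blocks=4):
    """Side length after `blocks` repetitions of convolution then pooling."""
    side = input_side
    for _ in range(blocks):
        side = pool_out(conv_out(side))
    return side

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

LAYER_COUNTS = {"conv": 4, "pool": 4, "flatten": 1, "dense": 2}
total_layers = sum(LAYER_COUNTS.values())      # 11 layers in total

ASSUMED_INPUT_SIDE = 128                       # hypothetical input resolution
ASSUMED_LAST_FILTERS = 64                      # hypothetical filter count

side = feature_map_side(ASSUMED_INPUT_SIDE)    # 128 -> 63 -> 30 -> 14 -> 6
flattened = side * side * ASSUMED_LAST_FILTERS
hidden_params = dense_params(flattened, 128)   # first dense layer, 128 units
output_params = dense_params(128, 7)           # last dense layer, 7 classes
```

The layer count of 11 agrees with the caption of Fig. 3. The output layer size depends only on values given in the text (128 inputs, 7 classes), whereas the first dense layer dominates the parameter budget and scales directly with the assumed input resolution.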

Table 1. Details of training and testing datasets

Disease class                        1      2      3      4   Total images
Anthracnose Leaf Blight – ALB      435    380   1520    380   2280
Anthracnose Stalk Rot – ASR        512    380   1520    380   2280
Eye Spot – ES                      352    380   1520    380   2280
Gibberella Stalk Rot – GSR         452    380   1520    380   2280
Healthy – H                       1320    380   1520    380   2280
Northern Corn Leaf Spot – NCLS    1170    380   1520    380   2280
Southern Rust – SR                 411    380   1520    380   2280
Total                             4652   2660  10640   2660  13300

1-Number of training images before IAA, 2-Number of testing images before IAA, 3-Number of
training images after IAA, 4-Number of testing images after IAA.
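The column totals in Table 1 and the degree to which each class is expanded can be checked with a few lines of arithmetic. This is only an illustrative sketch: the counts are copied from the table, while the variable names and the notion of an "expansion factor" (training images after IAA divided by training images before IAA) are introduced here for convenience.

```python
# Counts per class before IAA (column 1 of Table 1).
train_before = {
    "ALB": 435, "ASR": 512, "ES": 352, "GSR": 452,
    "H": 1320, "NCLS": 1170, "SR": 411,
}
TRAIN_PER_CLASS_AFTER = 1520   # column 3 of Table 1
TEST_PER_CLASS = 380           # columns 2 and 4 of Table 1

total_before = sum(train_before.values())                  # 4652
total_after = TRAIN_PER_CLASS_AFTER * len(train_before)    # 10640
total_test = TEST_PER_CLASS * len(train_before)            # 2660

# Hypothetical helper quantity: how strongly each class is expanded.
expansion = {
    name: TRAIN_PER_CLASS_AFTER / count
    for name, count in train_before.items()
}
most_expanded = max(expansion, key=expansion.get)
least_expanded = min(expansion, key=expansion.get)
```

The smallest class (ES) is expanded by a factor of about 4.3 and the largest (H) by only about 1.15, which is how the augmented training set ends up balanced at 1520 images per class.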

Fig. 4. Performance evaluation before applying the IAA technique

The plots in Fig. 4(a) show the variation in learning of the proposed model using
training and testing accuracies. The plots in Fig. 4(b) show the variation in learning of
the proposed model using training and testing losses. ‘acc’ and ‘val_acc’ indicate the
training and testing accuracy curves, whereas ‘loss’ and ‘val_loss’ indicate the training
and testing loss curves. The large gap between the accuracy and loss curves, with a
difference of 22.17% between the training and testing accuracies, shows that the model
overfits the training dataset and does not classify the testing set well. Before applying
IAA and after running the model for 100 epochs, a training accuracy of 99.61% and a
testing accuracy of 77.44% are obtained.
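The overfitting argument rests on the difference between training and testing accuracy. A small sketch of that calculation follows, using the accuracies reported in the abstract for both runs; the function name is hypothetical.

```python
def generalization_gap(train_acc, test_acc):
    """Difference between training and testing accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)

gap_before_iaa = generalization_gap(99.61, 77.44)  # original dataset
gap_after_iaa = generalization_gap(95.96, 93.61)   # augmented dataset
```

The gap falls from 22.17 to 2.35 percentage points, so the lower training accuracy after augmentation is accompanied by a much higher testing accuracy.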

In addition to the confusion matrix, other performance metrics such as precision, recall,
and F1-score have also been computed, and the values are presented in Table 2. The
confusion matrix of the proposed model shown in Fig. 5(a) and 5(b) reveals that the
classification exceeded 95% accuracy in only two classes, ES (95.7%) and H (97.8%).
The precision values for the ES and H classes are 93% and 91%, respectively. The recall
values for the ES and H classes are 96% and 98%, respectively. The F1-score is reported
as 96% for the H class.
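Precision, recall and F1-score for each class follow directly from a confusion matrix. The sketch below shows the standard definitions on a made-up 3 × 3 matrix (rows are true classes, columns are predicted classes); the numbers are invented for illustration and are not the paper's confusion matrix.

```python
def per_class_metrics(cm):
    """Return a (precision, recall, f1) tuple for every class of a square
    confusion matrix whose rows are true labels and columns are predictions."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, truly other
        fn = sum(cm[k]) - tp                        # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics.append((precision, recall, f1))
    return metrics

# Hypothetical toy matrix with 10 samples per class.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
toy_metrics = per_class_metrics(toy_cm)
```

In the toy matrix the third class is never predicted wrongly for another class, so its precision is 1.0, while its recall is 0.9 because one of its ten samples is misclassified.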

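Table 2. Classification performance before and after applying IAA

| Class # | Precision (1) | Precision (2) | Recall (1) | Recall (2) | F1-score (1) | F1-score (2) | Support |
|---|---|---|---|---|---|---|---|
| 0 | 0.61 | 0.89 | 0.71 | 0.95 | 0.65 | 0.92 | 380 |
| 1 | 0.87 | 0.97 | 0.89 | 0.90 | 0.88 | 0.93 | 380 |
| 2 | 0.78 | 0.94 | 0.96 | 0.94 | 0.86 | 0.94 | 380 |
| 3 | 0.93 | 0.88 | 0.98 | 1.00 | 0.96 | 0.93 | 380 |
| 4 | 0.91 | 0.95 | 0.71 | 0.94 | 0.80 | 0.95 | 380 |
| 5 | 0.59 | 0.95 | 0.58 | 0.87 | 0.59 | 0.91 | 380 |
| 6 | 0.77 | 0.98 | 0.59 | 0.95 | 0.67 | 0.97 | 380 |
| Accuracy | | | | | 0.77 | 0.94 | 2660 |
| Macro Avg | 0.78 | 0.94 | 0.77 | 0.94 | 0.77 | 0.94 | 2660 |
| Weighted Avg | 0.78 | 0.94 | 0.59 | 0.94 | 0.67 | 0.94 | 2660 |

1-Before IAA and 2-After IAA.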
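The averaged rows of Table 2 can be re-derived from the per-class rows. The sketch below does this with values transcribed from the table; the helper names `macro_avg` and `weighted_avg` are hypothetical. Because every class has the same support (380), a support-weighted average reduces to the plain macro average, which is worth keeping in mind when reading the 0.59 and 0.67 entries of the Weighted Avg row.

```python
from typing import List

# Per-class values transcribed from Table 2 (classes 0 to 6).
precision = {"before": [0.61, 0.87, 0.78, 0.93, 0.91, 0.59, 0.77],
             "after":  [0.89, 0.97, 0.94, 0.88, 0.95, 0.95, 0.98]}
recall = {"before": [0.71, 0.89, 0.96, 0.98, 0.71, 0.58, 0.59],
          "after":  [0.95, 0.90, 0.94, 1.00, 0.94, 0.87, 0.95]}
f1 = {"before": [0.65, 0.88, 0.86, 0.96, 0.80, 0.59, 0.67],
      "after":  [0.92, 0.93, 0.94, 0.93, 0.95, 0.91, 0.97]}
support = [380] * 7  # equal support for every class


def macro_avg(values: List[float]) -> float:
    """Unweighted mean over classes, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


def weighted_avg(values: List[float], weights: List[int]) -> float:
    """Support-weighted mean over classes, rounded to two decimals."""
    total = sum(weights)
    return round(sum(v * w for v, w in zip(values, weights)) / total, 2)


for name, metric in (("precision", precision), ("recall", recall), ("f1", f1)):
    for phase in ("before", "after"):
        m = macro_avg(metric[phase])
        w = weighted_avg(metric[phase], support)
        print(f"{name:9s} {phase:6s} macro={m:.2f} weighted={w:.2f}")
```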

The metric values reveal that the proposed model classified only two classes, ES and H, efficiently. The model is less efficient in classifying the other five diseases: ALB, ASR, GSR, NCLS, and SR. Second, the proposed model is fed with the augmented dataset and the classification is repeated. Figure 6 shows the training and testing accuracy and loss curves after applying IAA. The accuracy and loss curves are plotted separately for better understanding and comparison of the results.

Figure 6(a) shows how the proposed model learns in terms of training and testing accuracy, and Fig. 6(b) shows the same in terms of training and testing loss. ‘Acc’ and ‘val_acc’ indicate the training and testing accuracy curves, whereas ‘loss’ and ‘val_loss’ indicate the training and testing loss curves. After training and testing with the images generated by IAA, the proposed model obtained 95.96% training and 93.61% testing accuracy. The difference of only 2.35 percentage points between the training and testing accuracies shows that the model performs better than it did before applying IAA. The accuracy and loss curves indicate that the model no longer overfits the training images and classifies the testing dataset well. The results in Fig. 6 show that the proposed model is more effective when it is trained with a greater number of input images. In this regard, image augmentation techniques have contributed substantially by increasing the number of training images and thereby improving classification performance.
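To illustrate how augmentation multiplies the number of training images, the following sketch applies three generic geometric transformations to a tiny grey-scale image stored as nested lists. This is not the authors' IAA procedure, whose specific transformations are not restated in this section; the function names and the choice of flips and a 90° rotation are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grey-scale image as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]


toy_image = [[1, 2, 3],
             [4, 5, 6]]
augmented = augment(toy_image)
print(f"{len(augmented)} images generated from 1 original")
```

Each original image yields several label-preserving variants, which is the mechanism by which an augmented training set grows without collecting new samples.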

In addition to the confusion matrix, performance metrics such as precision, recall, and F1-score have also been computed, and the values are given in Table 2. The confusion matrix of the proposed model, depicted in Fig. 7(a) and 7(b), reveals that the classification reached 100% accuracy for H. Accuracy of 95% or more is obtained in three classes: ALB (95.2%), H (100%), and SR (95%). Accuracy of 90% or more is obtained in two further disease classes, ES (93.6%) and GSR (94.2%). The precision values of the ASR, ES, GSR, NCLS, and SR classes are 97%, 94%, 95%, 95%, and 98%, respectively. The recall values of the ALB, ASR, ES, H, GSR, and SR classes are 95%, 90%, 94%, 100%, 94%, and 95%, respectively. The F1-score is reported as 100% for the H class. The results reveal that the proposed model classifies six classes (ALB, ASR, ES, H, GSR, and SR) efficiently and is less efficient for only one disease class, NCLS. Figure 8 depicts the classification performance of the proposed model before and after applying IAA.

Fig. 5. Confusion Matrix before Implementing IAA Technique (a) without normalization (b) with normalization (c) color scale reflections

Fig. 6. Performance evaluation after applying the IAA technique

Fig. 7. Confusion Matrix after implementing IAA technique (a) without normalization (b) with

normalization (c) color scale reﬂections

Fig. 8. Performance measure before/after IAA technique

4 Conclusion

This paper presented IAA, an adaptive supervised learning approach that performs image augmentations to generate multiple images automatically. The enhanced dataset can be used to train any CNN model. The paper also proposed a new CNN-based model with 11 layers to perform image classification of seven maize crop disease classes. The model was trained and tested with two datasets. First, it was trained with only 4652 images collected from the Kaggle repository. Next, the same model was trained with 10640 augmented images. Finally, the accuracy and loss values were collected separately for the two experiments and compared to identify the difference in classification performance. In the first experiment, before applying augmentation, the proposed CNN model obtained 99.61% training accuracy but only 77.44% testing accuracy. In the second experiment, after applying the augmentation approach, the model obtained a training accuracy of 95.96% and a testing accuracy of 93.61%. These observations show that the proposed CNN model is effective at classifying diseased and healthy maize crop images when it is trained on the augmented dataset. The findings draw attention to more advanced transformations for increasing the number of images in the training set and helping to prevent model overfitting.

References

1. Bi, L., Hu, G.: Improving image-based plant disease classification with generative adversarial network under limited training set. Front. Plant Sci. 11 (2020). https://doi.org/10.3389/fpls.2020.583438

2. Cap, Q.H., Uga, H., Kagiwada, S., Iyatomi, H.: LeafGAN: an effective data augmentation

method for practical plant disease diagnosis (2020). http://arxiv.org/abs/2002.10100

3. Fenu, G., Malloci, F.M.: Using multioutput learning to diagnose plant disease and stress

severity. Complexity (2021). https://doi.org/10.1155/2021/6663442

4. Gomez Selvaraj, M., et al.: Detection of banana plants and their major diseases through aerial images and machine learning methods: a case study in DR Congo and Republic of Benin. ISPRS J. Photogramm. Remote Sens. 169, 110–124 (2020). https://doi.org/10.1016/j.isprsjprs.2020.08.025

5. Hu, G., Peng, X., Yang, Y., Hospedales, T., Verbeek, J.: Frankenstein: learning deep face

representations using small data (2016). http://arxiv.org/abs/1603.06470

6. Kaggle Dataset: Corn or Maize Leaf Disease Dataset. Kaggle (n.d.)

7. Kusrini, K., et al.: Data augmentation for automated pest classification in Mango farms. Comput. Electron. Agric. 179 (2020). https://doi.org/10.1016/j.compag.2020.105842

8. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J.

Big Data 6(1), 1–48 (2019). https://doi.org/10.1186/s40537-019-0197-0

9. Toda, Y., Okura, F.: How convolutional neural networks diagnose plant disease. Plant

Phenomics (2019). https://doi.org/10.34133/2019/9237136

10. Yan, Q., et al.: Apple leaf diseases recognition based on an improved convolutional neural

network. Sensors 20, 3535 (2020). https://doi.org/10.3390/s20123535

11. Zhu, F., He, M., Zheng, Z.: Data augmentation using improved cDCGAN for plant vigor

rating. Comput. Electron. Agric. 175 (2020). https://doi.org/10.1016/j.compag.2020.105603

12. Mishra, A.M., Harnal, S., Gautam, V., Tiwari, R., Upadhyay, S.: Weed density estimation in

soya bean crop using deep convolutional neural networks in smart agriculture. J. Plant Dis.

Prot., 1–12 (2022)

13. Kaur, P., et al.: Recognition of leaf disease using hybrid convolutional neural network by

applying feature reduction. Sensors 22(2), 575 (2022)

14. Kaur, P., et al.: A hybrid convolutional neural network model for diagnosis of COVID-19

using chest X-ray images. Int. J. Environ. Res. Public Health 18(22), 12191 (2021)

Implementation of Artiﬁcial Intelligence (AI)

in Smart Manufacturing: A Status Review

Akash Sur Choudhury1, Tamesh Halder2, Arindam Basak1(B),

and Debashish Chakravarty2

1School of Electronics Engineering, KIIT, Bhubaneswar, Odisha, India

arindambasak2007@gmail.com

2Department of Mining Engineering, Indian Institute of Technology, Kharagpur, West Bengal,

India

Abstract. In today's world, artificial intelligence (AI) is widely considered one of the most innovative technologies. AI has been adopted in nearly all sectors, such as manufacturing, R&D, education, smart cities, and agriculture. The new era of the Internet plus AI has resulted in the high-speed evolution of the central technologies, which are analyzed here based on research on recent AI applications in smart manufacturing. It is necessary to set up an industry that is flexible under turbulent changes and that adequately manages highly skilled employees and workers, in order to design a more suitable working atmosphere for both humans and technology. Google Scholar was used to explore several keywords and their combinations and to search and examine the relevant articles, papers, journals, and study data for preparing this manuscript. Recent progress in intelligent manufacturing is discussed by observing the outlook of intelligent manufacturing technology and its applications. Lastly, the study discusses the scope of AI and how it is implemented in India's smart manufacturing sector today, focusing on its present status, limitations, and suggestions for overcoming problems.

Keywords: Artificial intelligence · Smart manufacturing · Industry 4.0 · IIoT · CPS · Machine learning · Deep learning · RUL · ICT

1 Introduction

AI refers to technology with perceptive and cognitive abilities. It enables high-level reasoning processes such as thinking, learning, perceiving, decision-making, problem-solving, and data collection, segregation, and analysis to supplement human brainpower. In 1956, computer scientists Allen Newell, Marvin Minsky, John McCarthy, Arthur Samuel, and Herbert Simon developed artificial intelligence theory. Since the late 1990s and the early twenty-first century, AI usage has been rapidly transforming the globe, driven by the increasing significance of analytics and the enormous growth of computing ability [1]. Figure 1 shows the applications of AI/ML algorithms in different processes such as fault prediction and security. Figure 2 presents various machine learning classifications and their characteristics. According to the training system and the input data type, there are three types of machine learning algorithm classifications: supervised learning, unsupervised learning, and reinforcement learning [2, 5].

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 73–85, 2022.

https://doi.org/10.1007/978-3-031-22915-2_7

Fig. 1. Applications of AI/ML algorithms [5].

Fig. 2. Machine learning classiﬁcations and their signiﬁcant characteristics [5].

Semi-supervised learning: this method uses a small set of labeled data while the remaining data is unlabelled, hence the name semi-supervised.

Smart manufacturing refers to manufacturing technology that aims to provide an industrial setting for intelligent, real-time, autonomous, and interoperable production environments.

It integrates recent and innovative information and communication technologies (including 5G networks and Wi-Fi), such as the Internet of Things (IoT), cloud computing (CC), and cyber-physical systems (CPS), powered by AI/ML decision-making technologies, and results in accurate fault detection and real-time recognition of defective products [3]. This paper describes the cycle of Industry 4.0, which includes data acquisition, monitoring, connectivity, big data, smart assembling, control, and scheduling [47]. AI in intelligent manufacturing is utilized in various applications such as quality inspection, energy conservation, supply chain, and predictive maintenance [48]. The lifecycle of Industry 4.0 in smart manufacturing is shown in Fig. 3, while Fig. 4 describes the model of an intelligent manufacturing system and its applications [4].

Fig. 3. Lifecycle of Industry 4.0 in smart manufacturing [47].

Fig. 4. Intelligent manufacturing model design [4].

Fig. 5. Smart hybrid manufacturing system [2].

Autonomous sensing, learning, analysis, interconnection, cognition, decision-making, control, and information execution are included in the above figure, which integrates and optimizes the many functions of a manufacturing enterprise. AI-based smart manufacturing improves worker safety, product quality, energy use, production efficiency, and fault prediction, leading to a more productive and secure workspace in which smart machines carry out large tasks and relieve the human labor force of routine procedures [5]. Recently, the rise of data-driven approaches has enabled monitoring and diagnosis through CPS observation and analysis, which collect and communicate immense amounts of data through standardized interfaces and give rise to the Internet of Things [6]. Figure 5 shows an intelligent hybrid manufacturing system consisting of sensors and new technology such as big data.

Machinery industries around the world use smart machines and automated assembly lines to send the production data of these machines to a monitoring platform in real time. Examples include inspection of ball-bearing malfunction, industrial data-driven monitoring, ball-bearing vibration data, remote wind turbine condition monitoring, vibration monitoring for smart maintenance, and analysis of vibration time [7]. Predictive maintenance yields significant cost reductions by prolonging the functional life of manufacturing machines and increasing overall equipment effectiveness [8]. In Industry 4.0 manufacturing, condition monitoring has been a valuable tool for improving safety, health, and equipment performance in smart and sophisticated industrial equipment [9]. The knowledge-based intelligent supervisory system proposes a pattern recognition strategy and learning process to inspect rare quality events [10]. This article discusses different types of ML and their applications in Industry 4.0, as well as how AI has been adopted in Indian manufacturing, its scope, possibilities, and suggestions. The paper is intended to help future researchers carry out further work on artificial intelligence and its algorithms in smart manufacturing. It also depicts the Indian scenario in Industry 4.0, so that the Indian Government, Indian scientists, and engineers become aware of the actual condition of Indian manufacturing and are encouraged to work with this new-age technology. At the same time, the article aims to encourage Indian industrialists to invest capital in India's AI-based manufacturing. Last but not least, the manuscript discusses the consequences of implementing AI technology in less developed countries, such as unemployment due to lack of skill, to make factory workers and skilled professionals aware of the reality of its implementation so that they are ready to become accustomed to this new AI-based manufacturing technology.

2 Literature Review

AI and its powerful technologies, such as machine learning (ML) and deep learning (DL), are widespread in manufacturing. Applying these technologies in real life requires enormous capital and efficient human resources capable of cooperative effort [1]. The rapid advancement of machine learning has led to a massive revolution in the artificial intelligence field, through which machines are allowed to learn, improve, and optimize specific tasks without being programmed directly. Machine learning can be used widely in smart machining (consisting of CPSs) [2]. The Industrial Internet of Things (IIoT) provides real-time production data collection with enhanced wireless connectivity, leading to Industry 4.0 powered by AI [5]. Remarkable progress has been made in recent years in database technologies, computer power, machine learning (ML), big data, and optimization methods to attain fault-free (defect-free) processes with the help of Intelligent Supervisory Control Systems (ISCS) [10]. Predictive model-based quality inspection is an innovative solution developed for industrial manufacturing applications using edge cloud computing technology, machine learning techniques, and IoT architecture. The quality inspection processes based on the predictive model are shown in Fig. 6 [11].

Fig. 6. Predictive model-based quality inspection framework arrangement [11].

Automation results in an affordable, highly reliable, and durable quality inspection process in smart manufacturing industries, which helps to optimize productivity, reliability, and repeatability. The automation process in manufacturing is run and regulated by a Programmable Logic Controller (PLC) [12]. Applications of data science technologies or big data analytics in industry include process adjustment, monitoring, and optimization [13]. DL-based fault diagnosis of rotating machinery overcomes the drawbacks of traditional fault diagnosis methods [14]. Artificial neural networks (ANNs) have a long history of detecting equipment health conditions and predicting RUL in smart machines because of their effectiveness, adaptability, and many other factors [15]. If engineers accurately implement inactive state detection in smart appliances in manufacturing, it will be beneficial for maintenance work, error reduction, and catastrophic failure detection [16]. The advancement of the IIoT has led to the rapid development and installation of sensors that monitor the machine condition and check whether the machine is operating correctly [17]. Incorrect readings of malfunctioning sensors can be estimated by accurately performing predictive analysis of big data, which can also be used in decision-making, including operation and maintenance planning [18]. Artificial intelligence, data mining, and other applications all use neural networks. A Deep Neural Network (DNN) is mainly proposed for non-linear high-dimensional regression problems, where the process is ambiguous due to complexity [19]. Extreme learning methods are generally applied to overcome the difficulties of a single hidden layer feedforward network and to enhance generalization performance and learning capability [20]. One of the essential bearing types is the rolling element bearing, which is commonly used in the mechatronics field. Failures of rolling element bearings affect industrial equipment through productivity reduction, increased safety risks, and accuracy loss in severe and harsh working environments. RUL (Remaining Useful Life) prediction is therefore helpful in industrial manufacturing and production optimization [21]. Time-domain vibration signal features are extracted from rotating machinery with normal and defective bearings for fault diagnosis, which can be carried out with an ANN having input, hidden, and output layers [24]. Figure 7 shows various time-domain vibration signals.

Fig. 7. Time-domain vibration signal: (a) acquired (normal), (b) band-pass filtered (normal), (c) wavelet transformed D2 (normal), (d) acquired (defective), (e) band-pass filtered (defective), (f) wavelet transformed D2 (defective) [24].

Machine learning techniques like neural networks help to manage considerable data complexity [23]. In Cyber-Physical Production Systems (CPPS), reducing the amount of data also helps to enhance machine efficiency, leading to more cost-effective output in industrial plants [22]. Data mining and machine learning (ML) algorithms can be applied to the data present in the SAP application to build classification models for predicting the reliability of industrial machines [25]. Bearings are a crucial part of rotating machinery operation in most manufacturing systems. RUL and health analysis of bearings are performed to predict reliability and safety in manufacturing, using increasingly powerful methods and processes that enable smart prognosis and bearing health management [26]. Defects in rolling bearings may result in machine failure, so early detection of faults is essential to avoid malfunctioning [27]. In comparison to other rotating machinery defects, rotor faults (mainly bearing and gear faults) have attracted more attention from the AI research community regarding the use of fault-specific traits in feature engineering [28]. The vibration analysis technique therefore gives better results in rolling bearing condition monitoring and fault diagnosis [46]. Oil and gas industry projects require colossal capital, including equipment acquisition and installation. The current drop in petroleum prices has limited spending, highlighting the necessity of proper maintenance management in the oil and gas business. Rotating mechanical equipment such as compressors, pumps, and induction motors are essential elements widely used in manufacturing procedures [28]. Image analysis powered by artificial intelligence enables accurate material characterization and measurement, revealing the quality of composite materials [30]. For high-resolution images, such as those in the LCD panel cutting wheel degradation dataset, computational efficiency is improved by first extracting the regions of interest, of 1400 × 80 pixels, from the raw image data. These regions are then transformed into grey-scale images for further processing and for unsupervised pretraining based on the dataset information [45]. Machine learning and artificial intelligence will better identify failures, ensure quality, and improve preventative maintenance in real-world applications [31].
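The image preprocessing described above (region-of-interest extraction followed by grey-scale conversion) can be sketched as follows. This is a minimal illustration on a tiny synthetic RGB image, not the pipeline of [45]; the function names, the ROI coordinates, and the ITU-R BT.601 luma weights used for the grey-scale conversion are all assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
RGBImage = List[List[Pixel]]
GrayImage = List[List[int]]


def crop_roi(img: RGBImage, top: int, left: int, height: int, width: int) -> RGBImage:
    """Extract a rectangular region of interest from the image."""
    return [row[left:left + width] for row in img[top:top + height]]


def to_grayscale(img: RGBImage) -> GrayImage:
    """Convert RGB pixels to grey levels using BT.601 luma weights (assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in img]


# Synthetic 4 x 6 RGB image, for illustration only.
toy_image = [[(x * 10, y * 10, 100) for x in range(6)] for y in range(4)]

roi = crop_roi(toy_image, top=1, left=2, height=2, width=3)
gray_roi = to_grayscale(roi)
print(f"ROI size: {len(gray_roi)} x {len(gray_roi[0])}")
```

Cropping before conversion keeps the per-pixel work proportional to the ROI size rather than to the full high-resolution frame, which is the efficiency argument made in the text.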

3 Methodology

We gathered, inspected, and clustered data from numerous websites and research papers as per the research requirements. The data was collected from multiple websites, together with a brief introduction to AI technology and Industry 4.0, and the information was then condensed. Different AI and ML algorithms are used in smart manufacturing [1].

Table 1. Different models of prediction [11].

| Model | Accuracy (%) | Standard deviation (%) | Recall (%) | Precision (%) | Training time (1000 rows) in ms | Scoring time (1000 rows) in ms |
|---|---|---|---|---|---|---|
| Naïve Bayes | 83.5 | ±2.7 | 94.7 | 75.5 | 3 | 9 |
| Decision Tree | 88.2 | ±1.5 | 91.9 | 84.0 | 39 | 6 |
| LR | 71.9 | ±1.3 | 77.0 | 66.8 | 49 | 27 |
| SVM | 92.9 | ±1.3 | 96.4 | 89.3 | 300 | 360 |
| GBT | 92.6 | ±1.0 | 89.9 | 93.1 | 240 | |
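Table 1 reports recall and precision separately. One common way to compare the models with a single number is the F1-score, the harmonic mean of the two. The sketch below derives it from the table values; F1 is not reported in Table 1, so these are derived figures, and the calculation assumes that precision and recall in each row refer to the same positive class.

```python
# Recall and precision (percent) transcribed from Table 1.
table1 = {
    "Naive Bayes":   {"recall": 94.7, "precision": 75.5},
    "Decision Tree": {"recall": 91.9, "precision": 84.0},
    "LR":            {"recall": 77.0, "precision": 66.8},
    "SVM":           {"recall": 96.4, "precision": 89.3},
    "GBT":           {"recall": 89.9, "precision": 93.1},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


f1_by_model = {name: f1_score(row["precision"], row["recall"])
               for name, row in table1.items()}

# Rank models from highest to lowest derived F1.
ranking = sorted(f1_by_model, key=f1_by_model.get, reverse=True)
for name in ranking:
    print(f"{name}: F1 = {f1_by_model[name]:.1f}%")
```

A ranking by F1 ignores the training and scoring times in the table, which may matter more than a small accuracy difference in an online inspection setting.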

Methods such as CNN and ELM are applied to gearbox and motor-bearing datasets. The Continuous Wavelet Transform (CWT) is first applied to obtain pre-processed representations of raw vibration signals. After that, a CNN is used to extract high-level features, and ELM is further used to enhance the classification performance [32]. While ANN is used to classify the machine status into normal or faulty bearings, R-ELM is used to extract stator current vibration signals, detect bearing faults, and achieve reliable classification, satisfying the need for online bearing fault detection [33, 24]. The performance of different prediction models is shown in Table 1. Various signal processing techniques, such as STFT, WPT, and FFT, are proposed to overcome challenges such as removing background noise from vibration signals in order to extract the fault features with high resolution [34]. Deep learning algorithms are mainly used for regression of rotorcraft vibrational spectra [35]. Table 2 shows the effect of the input signals.

Generative Adversarial Networks (GAN) effectively solve current problems encountered in defect examination of industrial datasets and identify unrevealed defects in future processing events, which has led to their increased usage in industrial anomaly detection [36]. In AI diagnostic techniques, spectral envelope analysis of the current remnant eliminates noise, revealing the characteristic bearing faults [37]. Integration of RNN with LSTM can mitigate risk in rotating equipment predictive maintenance, leading to cost reduction in oil and gas operations [38]. The GDAU neural network describes the tendency of rolling bearing degradation and has stronger short-term and long-term prediction ability, so it is better suited for RUL prediction of bearings [21]. After the regions of interest are extracted from the raw image data and converted to grey-scale images, unsupervised pretraining and ML-based RUL prediction algorithms such as DCNN, DCNN-M, LSTM, NoAtt, and Nosupatt are applied to the LCD panel cutting wheel degradation dataset, which contains high-resolution images of multiple wheels. These RUL prediction methods provide a practical approach to prognostic problems and partial observations [45]. Thus, there has recently been a rise in AI-based predictive maintenance and fault diagnosis in smart manufacturing, mechanical processes, and machinery.

Table 2. Effects of input signals on identifying machine conditions with five features (RMS, s2, g3, g4, g6) [24].

| Case no | Input signals | Training success | Test success | Epochs |
|---|---|---|---|---|
| 1 | 1 | 24/24 (100%) | 13/16 (81.25%) | 28 |
| 2 | 2 | 24/24 (100%) | 14/16 (87.50%) | 17 |
| 3 | 3 | 24/24 (100%) | 12/16 (75.00%) | 33 |
| 4 | 4 | 24/24 (100%) | 15/16 (93.75%) | 24 |
| 5 | 5 | 24/24 (100%) | 15/16 (93.75%) | 19 |
| 6 | 2, 3 | 48/48 (100%) | 32/32 (100%) | 12 |
| 7 | 2, 3, 4 | 72/72 (100%) | 48/48 (100%) | 22 |
| 8 | 1, 2, 3, 4 | 96/96 (100%) | 63/64 (98.44%) | 23 |
| 9 | 1, 2, 3, 4, 5 | 120/120 (100%) | 79/80 (98.75%) | 32 |
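The five time-domain features named in the caption of Table 2 can be computed directly from a sampled vibration signal. The sketch below assumes that s2 denotes the variance and that g3, g4, and g6 denote the third, fourth, and sixth central moments normalized by the corresponding power of the standard deviation; this reading of the symbols, the function name, and the synthetic sine-wave input are assumptions, not details confirmed by the text.

```python
import math
from typing import Dict, List


def time_domain_features(signal: List[float]) -> Dict[str, float]:
    """Compute RMS, variance, and normalized 3rd, 4th, 6th central moments."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]

    def central_moment(k: int) -> float:
        return sum(c ** k for c in centered) / n

    variance = central_moment(2)
    if variance == 0:
        raise ValueError("constant signal: normalized moments are undefined")
    sigma = math.sqrt(variance)
    return {
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "variance": variance,                          # s2 (assumed meaning)
        "skewness": central_moment(3) / sigma ** 3,    # g3 (assumed meaning)
        "kurtosis": central_moment(4) / sigma ** 4,    # g4 (assumed meaning)
        "sixth_moment": central_moment(6) / sigma ** 6,  # g6 (assumed meaning)
    }


# Synthetic input: one full period of a unit-amplitude sine wave.
sine = [math.sin(2 * math.pi * k / 64) for k in range(64)]
features = time_domain_features(sine)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

In a setup like the one summarized in Table 2, a feature vector of this kind would be computed per input signal and passed to the ANN classifier; impulsive bearing defects typically raise the higher-order moments relative to a smooth signal.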

4 Findings and Discussion

AI is a technology with perceptive and cognitive abilities and with productive research fields such as image processing, natural language processing, and machine learning, which are currently used in Industry 4.0 manufacturing systems. Different manufacturing capabilities such as Computer Numerical Control (CNC), automated guided vehicles (AGV), Direct Numerical Control (DNC), and robotics are being used in smart manufacturing. Recently, the Internet of Things has taken manufacturing to a new level. The disadvantage is that in many developing and underdeveloped countries, such as India, there is a lack of resources to set up the basic infrastructure. As most businesses operate in villages, the high cost of smart infrastructure, the skills and training deficit among people in these technologies, and the need for adequate, profitable investment form a barrier to implementing AI-based smart manufacturing technology. In less developed countries, unemployment is the central issue that leads to numerous constraints on the full implementation of artificial intelligence. Beyond that, according to experts, AI and new-age technologies become a crisis only for people who cannot adapt, readjust to the market's needs, or become accustomed to new technology and skills, which leads to joblessness. The perceived probability of jobs in various fields due to artificial intelligence is shown in Fig. 8.

Fig. 8. The perceived probability of jobs due to Artiﬁcial Intelligence [45].

5 Research Limitation and Future Scope

Artificial intelligence and machine learning are extensively applied today in different fields and for different purposes. Smart manufacturing is one of the fields where the implementation of AI technology is at its peak. There are many ways to reap the benefits of artificial intelligence, such as smart maintenance, better product development, quality improvement, and market adaptation. Smart maintenance means maintaining manufacturing machines and systems more intelligently, i.e., reducing the maintenance cost of appliances and equipment. As equipment maintenance is one of the most significant expenses in manufacturing, it is necessary to implement smart maintenance such as predictive maintenance (powered by AI algorithms such as neural networks and machine learning), which will help to save enormous amounts of money and extend the RUL of machinery. For better product development, one can assess and examine the different parameters in production, such as available production resources, budget, and time, with the help of deep learning models and algorithms. To meet the highest standard of product quality, machine learning and machine vision can be used to identify, detect, and eliminate faults in products and to raise alerts about problems at the production line that may affect overall production, leading toward production quality improvement. AI and ML techniques will help the smart manufacturing industries improve supply chains and strategic vision and respond to changes in the market by generating estimates relating to several factors such as the political situation, weather, consumer behavior, and the status of the economy. The utilization of AI, robots, and CPS will probably revolutionize mass production. Robots and CPS can perform laborious tasks at high speed in smart factory units, eradicating human error and delivering superior levels of quality assurance. Unlike humans, AI and industrial automation can easily carry out tasks in hazardous places. Overall, AI-run smart machines can give skilled workers, engineers, and scientists opportunities to focus on complex and innovative work in science, engineering, and technology rather than on tedious and ordinary tasks. However, the lack of necessary skills among workers regarding AI technology, especially in developing countries like India, may hinder the progress of AI in Industry 4.0 manufacturing, which can only be solved by educating workers and equipping them with these AI-based technologies.

6 Indian Scenario

In India, Industry 4.0 smart manufacturing induces the industrial stalwarts to lay the

groundwork for smart factories and adopt modern and innovative technologies. The

Indian Government has recently initiated the Smart Advanced Manufacturing and Rapid

Transformation Hub (SAMARTH). To improve the application of AI-based smart manu-

facturing in the current context, the Indian Government is developing a National Policy on

Advanced Manufacturing [1]. Our country has achievedtechnological excellence by inte-

grating Cyber-Physical Systems (CPS) and Information and Communication Technolo-

gies (ICT) into Advanced Manufacturing Technologies (AMTs). Increased automation

in additive manufacturing, Advanced Manufacturing Systems, manufacturing robotics,

advanced analytics, and Big Data are all worth mentioning in the design of smart manu-

facturing for Industry 4.0. They will help Indian MSMEs become more internationally

competitive and contribute to global value creation [39]. Though adoption of artiﬁ-

cial intelligence is less in India, there has been a remarkable transformation in all the

Indian industrial sectors where companies are adopting, developing, and integrating AI

technologies in their products and industrial processes, such as electronics, heavy elec-

tricals, automobiles, ﬁntech, software/IT, agriculture, agrobased industries, etc. [40]. In

terms of government funding, the Union Cabinet approved the launch of the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) in 2019, which the Department of Science and Technology (DST) will execute with a budget of INR 3660 Cr (USD 494 Mn) over five years (FY 2019–20 to 2023–24) to make India a leader in Cyber-Physical Systems (which include AI, ML, and IoT). The mission’s

goal is to build a strong and stable ecosystem for CPS technologies in India, which

would help the country’s Industry 4.0 manufacturing sector thrive [41]. SMEs have sig-

niﬁcant advantages in terms of innovation in general, but they face a variety of obstacles

in India [42]. Though several countries have decided on their strategy for AI, India has

not yet formulated its strategy for Industry 4.0 [43]. Another disadvantage is the lack of workers skilled in AI technology in our country, so unemployment is likely to rise in India. On the one hand, cities will be equipped with modern facilities and become smart; on the other hand, jobs will be lost to this transformation. Low- and middle-skill jobs will shrink, but high-skilled jobs that involve critical decisions will remain, as machines cannot match human intelligence in making such decisions. This transformation will add new development aspects to India’s infrastructure and enhance its economic status in the coming years. However, some jobs in a few sectors will disappear because of AI-driven transformation in the next 5 to 10 years [44].

7 Conclusion

Although AI is still at a nascent stage in Industry 4.0 manufacturing, one can say with some hope that technological transformations are occurring. 5G communication technologies can improve overall efficiency and productivity, as they offer high network

Implementation of Artiﬁcial Intelligence (AI) in Smart Manufacturing 83

reliability and support IoT and CPS devices according to the industry requirements.

Other advancements include lights-out manufacturing, which can create and regulate

production with minimal human interaction, and smart and dynamic technology, which

can be effective in areas with high production rates and low human error rates. Setting

up an AI infrastructure platform may be costly due to advanced machines and equipment, but it reduces the labor required to finish the final product. In the Indian scenario, the benefits of AI-based applications will be realized most fully in the SME sector, where most people in India are employed; this can be achieved by providing incentives and encouragement to SMEs. Similarly, the Indian educational system needs to be enhanced to realize the potential benefits of these technologies.

References

1. Rizvi, A.T., Haleem, A., Bahl, S., Javaid, M.: Artiﬁcial intelligence (AI) and its applications

in indian manufacturing: a review. In: Acharya, S.K., Mishra, D.P. (eds.) Current Advances in

Mechanical Engineering. LNME, pp. 825–835. Springer, Singapore (2021). https://doi.org/

10.1007/978-981-33-4795-3_76

2. Kim, D.-H., et al.: Smart machining process using machine learning: a review and perspective

on machining industry. Int. J. Precis. Eng. Manuf. Green Technol. 5(4), 555–568 (2018).

https://doi.org/10.1007/s40684-018-0057-y

3. Trakadas, P., et al.: An Artiﬁcial intelligence-based collaboration approach in industrial IoT

manufacturing: key concepts, architectural extensions and potential applications. Sensors

20(19), 5480 (2020). https://doi.org/10.3390/s20195480

4. Li, B., Hou, B., Yu, W., Lu, X., Yang, C.: Applications of artiﬁcial intelligence in intelligent

manufacturing: a review. Front. Inf. Technol. Electron. Eng. 18(1), 86–96 (2017). https://doi.

org/10.1631/FITEE.1601885

5. Angelopoulos, A., et al.: Tackling faults in the industry 4.0 era—a survey of machine-learning

solutions and key aspects. Sensors 20(1), 109 (2020). https://doi.org/10.3390/s20010109

6. Kumar, M., Aggarwal, A., Rawat, T.K.: Bat algorithm: application to adaptive inﬁnite impulse

response system identiﬁcation. Arab. J. Sci. Eng. 41(9), 3587–3604 (2016)

7. Tsai, M.-F., Chu, Y.-C., Li, M.-H., Chen, L.-W.: Smart machinery monitoring system with

reduced information transmission and fault prediction methods using industrial Internet of

Things. Mathematics 9(1), 3 (2021). https://doi.org/10.3390/math9010003

8. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.

Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259

9. Hotait, H., Chiementin, X., Rasolofondraibe, L.: Intelligent online monitoring of rolling

bearing: diagnosis and prognosis. Entropy 23(7), 791 (2021)

10. Escobar, C.A., Morales-Menendez, R.: Machine learning techniques for quality control in

high conformance manufacturing environment. Adv. Mech. Eng. 10(2), 1–16 (2018). https://

doi.org/10.1177/1687814018755519

11. Pai, P.F., Hong, W.C.: Forecasting regional electricity load based on recurrent support vector

machines with genetic algorithms. Electric Power Syst. Res. 74(3), 417–425 (2005)

12. Ashwini, K., Rudraswamy, S.B.: Automated inspection system for automobile bearing seals.

Mater. Today Proc. 46(10), 4709–5471 (2020). https://doi.org/10.1016/j.matpr.2020.10.301

13. Butte, S., Prashanth, A.R., Patil, S.: Machine learning based predictive maintenance strat-

egy: a super learning approach with deep neural networks. In: 2018 IEEE Workshop on

Microelectronics and Electron Devices (WMED), pp. 1–5 (2018)


14. Tang, S., Yuan, S., Zhu, Y.: Deep learning-based intelligent fault diagnosis methods toward

rotating machinery. IEEE Access 8, 9335–9346 (2020)

15. Tian, Z.: An artiﬁcial neural network method for remaining useful life prediction of equipment

subject to condition monitoring. J. Intell. Manuf. 23(1), 227–237 (2012). https://doi.org/10.

1007/s10845-009-0356-9

16. Borith, T., Bakhit, S., Nasridinov, A., Yoo, K.-H.: Prediction of machine inactivation status

using statistical feature extraction and machine learning. Appl. Sci. 10(21), 7413 (2020).

https://doi.org/10.3390/app10217413

17. Ertu˘grul, Ö.F.: A novel approach for extracting ideal exemplars by clustering for massive

time-ordered datasets. Turk. J. Electr. Eng. Comput. Sci. 25(4), 2614–2634 (2017). https://

doi.org/10.3906/elk-1602-341

18. Miorandi, D., Sicari, S., De Pellegrini, F.: Internet of things: vision, applications and research

challenges. Ad Hoc Netw. 10(7), 1497–1516 (2012). https://doi.org/10.1016/j.adhoc.2012.

02.016

19. Beyerer, J., Usländer, T.: Industrial internet of things supporting factory automation.

at-Automatisierungstechnik 64(9), 697–698 (2016). https://doi.org/10.1515/auto-2016-0104

20. Ding, S., Zhao, H., Zhang, Y., Xu, X., Nie, R.: Extreme learning machine: algorithm, theory

and applications. Artif. Intell. Rev. 44(1), 103–115 (2013). https://doi.org/10.1007/s10462-

013-9405-z

21. Qin, Y., Chen, D., Xiang, S., Zhu, C.: Gated dual attention unit neural networks for remaining

useful life prediction of rolling bearings. IEEE Trans. Ind. Inf. 17(9), 6438–6447 (2021).

https://doi.org/10.1109/TII.2020.2999442

22. Kroll, B., Schaffranek, D., Schriegel, S., Niggemann, O.: System modeling based on machine

learning for anomaly detection and predictive maintenance in industrial plants. In: Proceedings

of the 2014 IEEE ETFA, pp. 1–7 (2014). https://doi.org/10.1109/ETFA.2014.7005202

23. Dubois, D., Prade, H.: Possibility theory is not fully compositional! A comment on a short

note by H.J. Greenberg. Fuzzy Sets Syst. 95(1), 131–134 (1998)

24. Krishnasamy, L., Khan, F., Haddara, M.: Development of a risk-based maintenance (RBM)

strategy for a power-generating plant. J. Loss Prev. Process Ind. 18(2), 69–81 (2005). https://

doi.org/10.1016/j.jlp.2005.01.002

25. Shilaskar, S., Ghatol, A., Chatur, P.: Medical decision support system for extremely

imbalanced datasets. Inf. Sci. 384, 205–19 (2017). https://doi.org/10.1016/j.ins.2016.08.077

26. Ren, L., Sun, Y., Cui, J., Zhang, L.: Bearing remaining useful life prediction based on deep

autoencoder and deep neural networks. J. Manuf. Syst. 48(C), 71–77 (2018). https://doi.org/

10.1016/j.jmsy.2018.04.008

27. Gupta, P., Pradhan, M.K.: Fault detection analysis in rolling element bearing: a review. Mater.

Today Proc. 4(2), 2085–2094 (2017)

28. Nath, A.G., Udmale, S.S., Singh, S.K.: Role of artiﬁcial intelligence in rotor fault diagnosis:

a comprehensive review. Artif. Intell. Rev. 54(4), 2609–2668 (2020). https://doi.org/10.1007/

s10462-020-09910-w

29. Golub, T.R., et al.: Molecular classiﬁcation of cancer: class discovery and class prediction

by gene expression monitoring. Science 286(5439), 531–537 (1999). https://doi.org/10.1126/science.286.5439.531

30. Aggour, K.S., et al.: Artiﬁcial intelligence/machine learning in manufacturing and inspection:

a GE perspective. MRS Bull. 44(7), 545–558 (2019). https://doi.org/10.1557/mrs.2019.157

31. Mohapatra, P., Chakravarty, S., Dash, P.K.: Microarray medical data classiﬁcation using kernel

ridge regression and modiﬁed cat swarm optimization based gene selection system. Swarm

Evolut. Comput. 28, 144–60 (2016). https://doi.org/10.1016/j.swevo.2016.02.002

32. Chen, Z., Gryllias, K., Li, W.: Mechanical fault diagnosis using convolutional neural networks

and extreme learning machine. Mech. Syst. Signal Process. 133(1), 106272 (2019). https://

doi.org/10.1016/j.ymssp.2019.106272


33. Zhang, H.-G., Zhang, S., Yin, Y.-X.: A novel improved ELM algorithm for a real industrial

application. Math. Probl. Eng. 2, 1–7 (2014). https://doi.org/10.1155/2014/824765

34. García-Nieto, J., Alba, E.: Parallel multi-swarm optimizer for gene selection in DNA

microarrays. Appl. Intell. 37(2), 255–266 (2012). https://doi.org/10.1007/s10489-011-0325-9

35. Martinez, D., Brewer, W., Behm, G., Strelzoff, A., Wilson, A., Wade, D.: Deep learning

evolutionary optimization for regression of rotorcraft vibrational spectra. In: 2018 IEEE/ACM

Machine Learning in HPC Environments (MLHPC), pp. 57–66 (2018). https://doi.org/10.

1109/MLHPC.2018.8638645

36. Wang, A., An, N., Chen, G., Yang, J., Li, L., et al.: Incremental wrapper based gene selec-

tion with Markov blanket. In: 2014 IEEE International Conference on Bioinformatics and

Biomedicine (BIBM), pp. 74–79. IEEE (2014). https://doi.org/10.1109/BIBM.2014.6999251

37. Bolón-Canedo, V., Sánchez-Maroño, N., Alonso-Betanzos, A.: Distributed feature selection:

an application to microarray data classiﬁcation. Appl. Soft Comput. 30, 136–150 (2015).

https://doi.org/10.1016/j.asoc.2015.01.035

38. Chazhoor, A., Mounika, Y., Vergin Raja Sarobin, M., Sanjana, M.V., Yasashvini, R.: Predictive maintenance using machine learning based classification models. IOP Conf. Ser. Mater. Sci. Eng. 954(1) (2020). https://doi.org/10.1088/1757-899X/954/1/012001

39. Sankararaju, M., Dharmar, S.: Design of low power CMOS LC VCO for direct conversion

transceiver. Turk. J. Electr. Eng. Comput. Sci. 24(4), 3263–3273 (2016)

40. Hossain, M., Muhammad, G., Guizani, N.: Explainable AI and mass surveillance system-

based healthcare framework to combat COVID-19 like pandemics. IEEE Network 34(4),

126–132 (2020). https://doi.org/10.1109/MNET.011.2000458

41. Acar, E., Yilmaz, I.: COVID-19 detection on IBM quantum computer with classical-quantum

transfer learning. Turk. J. Electr. Eng. Comput. Sci. 29(1), 46–61 (2021). https://doi.org/10.

3906/elk-2006-94

42. Krishnaswamy, K.N., Bala Subrahmanya, M.H., Mathirajan, M.: Technological innovation

induced growth of engineering industry SMEs: case studies in Bangalore. Asian J. Innov.

Policy 4(2), 217–241 (2015). https://doi.org/10.7545/AJIP.2015.4.2.217

43. Conti, M., Dehghantanha, A., Franke, K., Watson, S.: Internet of things security and forensics:

challenges and opportunities. Future Gener. Comput. Syst. 78(2), 544–546 (2018). https://

doi.org/10.1016/j.future.2017.07.060

44. Mendoza, C.V., Kleinschmidt, J.H.: Mitigating on-off attacks in the Internet of Things using

a distributed trust management scheme. Int. J. Distrib. Sens. Netw. 11(11), 859731 (2015)

45. Chen, R., Guo, J., Bao, F.: Trust management for SOA-based IoT and its application to service

composition. IEEE Trans. Serv. Comput. 9(3), 482–95 (2014)

46. Abderrahim, O.B., Elhedhili, M.H., Saidane, L.: DTMS-IoT: a Dirichlet-based trust man-

agement system mitigating OnOff attacks and dishonest recommendations for the Internet of

Things. In: IEEE/ACS 13th International Conference of Computer Systems and Applications

(AICCSA), Agadir, Morocco, pp. 1–8 (2016)

47. Zheng, P., et al.: Smart manufacturing systems for Industry 4.0: conceptual framework, sce-

narios, and future perspectives. Front. Mech. Eng. 13(2), 137–150 (2018). https://doi.org/10.

1007/s11465-018-0499-5

48. Ding, H., Gao, R.X., Isaksson, A.J., Landers, R.G., Parisini, T., Yuan, Y.: State of AI-based

monitoring in smart manufacturing and introduction to focused section. IEEE/ASME Trans.

Mechatron. 25(5), 2143–2154 (2020). https://doi.org/10.1109/TMECH.2020.3022983

Emerging Computing Computational

Intelligence

Flight Fare Prediction Using Machine Learning

K. P. Arjun1(B), Tushar Rawat2, Rohan Singh2, and N. M. Sreenarayanan1

1Department of Computer Science and Engineering, GITAM University, Bengaluru, Karnataka,

India

arjunkppc@gmail.com, sree.narayanan1@gmail.com

2School of Computer Science and Engineering, Galgotias University, Greater Noida, Uttar

Pradesh, India

tusharrawat517@gmail.com, rohansingh7217@gmail.com

Abstract. Airline ticket prices fluctuate continually, even for the same flight and for adjacent seats in the same cabin. Customers expect to pay less, whereas airlines work to maintain or increase their overall revenue and improve their profitability. To maximise revenue, airlines employ a variety of computational methods, such as prediction and classification. Industry professionals recommend two kinds of customer-side models to save customers money: models that estimate the best time to buy and models that predict the minimum ticket price. Both fall under the category of customer-side predictive models. According to our research, models on both sides depend on a limited set of features, such as historical ticket price data, the ticket purchase date, and the departure date. Many people fly regularly and are therefore familiar with the times of year that offer the best deals on airline tickets. Nevertheless, many people who are new to buying tickets fall into pricing traps set by companies and end up spending more money than they should.

Keywords: Airline price ·Machine learning ·Flutter ·Flask ·Random Forest ·

Flight ticket

1 Introduction

Prediction algorithms are essential to matching customers with suitable products and typically rely on historical customer data. A related task, recommendation, does not strictly predict the next item but suggests an item that does not occur in the historical data and that the customer might appreciate. In most cases, these recommendation methods focus on the next item in a sequence [1].

Despite this, there are situations in which it is necessary both to predict and to recommend. A good illustration is the process of booking flights, in which the destination

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 89–99, 2022.

https://doi.org/10.1007/978-3-031-22915-2_8


patterns of a given customer may either be constant or change depending on a large number of complex factors [2].

This challenging flight booking domain is the focus of this study, with a specific interest in predicting customer behaviour in this mixed prediction and recommendation setting. The purpose of our study is to investigate how effective algorithmic approaches are at recommending the next destination booking for an airline customer. The vast majority of previous work in both the prediction and the recommendation areas has been developed and evaluated solely on historical datasets. Only a limited number of earlier studies have evaluated the generated models with actual customers. In this study, we conduct evaluations using both historical data and actual customers [3].

The purpose of this work is to develop an application that uses a machine learning model to predict flight fares for a variety of airlines. The customer receives the predicted fare and can use it as a reference when deciding when to purchase tickets. At the same time, airlines are attempting to keep a tighter rein on ticket prices in order to boost their earnings [4]. Many people have made flying their primary mode of transportation, and as a result they are constantly looking for ways to cut costs when making reservations. However, many people are not used to purchasing tickets, and they frequently fall into pricing traps set by companies and pay more than they should. The proposed system has the potential to help customers save a significant amount of rupees by showing them booking information at the most advantageous time [5]. We also built a single decision tree model; however, it is not well suited to estimating flight fares because many different factors determine the fare [6].

To estimate future flight fares, our team chose the Random Forest regression algorithm because it combines the predictions of many decision trees, an approach that supports both regression and classification and tends to give a more precise outcome than a single tree. The research that we carried out lends credence to this choice [7].

Airline ticket prices are hard to predict: the price we see today for a flight may differ from the price of the same flight tomorrow. Travelers often say that the cost of airline tickets is unpredictable [8]. Air travel has become an important means of travelling long distances. To maximize profits, airlines use an integrated pricing approach called yield management to calculate the fare for each trip, taking into account factors such as competition, with the ultimate goal of earning extra profit on each flight. Since travelers generally believe that fares increase as the purchase date approaches the departure date, they usually buy tickets as far ahead of the departure date as possible [9]. However, buying this way is not always right, and the passenger may end up spending more money than necessary.

After modeling the prediction system, it was important to build a visual interface that is easy to use and runs on any device and operating system. Machine learning (ML) is the study of algorithms that improve through experience [10]. Formally, a program learns a task, as judged by a performance measure, through experience. ML is a topic within Artificial Intelligence (AI). While AI covers the complex functions performed by a non-human agent, ML covers data-driven decisions. AI is a very large field in computer science. ML can be supervised or, alternatively, unsupervised. Problems in ML include classification (given some data points, we need to assign each of them to a group) and regression (given some data, can we predict a continuous outcome?). The data used in ML can be of three types: categorical nominal, categorical ordinal, and continuous. We work with regression models, which we expect to provide more accurate results [11].
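The three data types named above are usually encoded differently before a model is trained. The following is a minimal sketch, not the encoding used in this work: the airline names, the stop levels, and the helper names are hypothetical and chosen only for illustration.

```python
def one_hot(value, categories):
    """Nominal category -> indicator vector with no implied order."""
    return [1 if value == c else 0 for c in categories]

def ordinal_code(value, ordered_levels):
    """Ordinal category -> its rank in a known ordering."""
    return ordered_levels.index(value)

AIRLINES = ["AirA", "AirB", "AirC"]      # hypothetical nominal values
STOPS = ["zero", "one", "two_or_more"]   # hypothetical ordinal levels

def encode(airline, stops, duration_minutes):
    # Continuous values are passed through as floats.
    return one_hot(airline, AIRLINES) + [ordinal_code(stops, STOPS), float(duration_minutes)]

row = encode("AirB", "one", 155)
```

A nominal value such as the airline gets one indicator per category so that no artificial ordering is introduced, whereas an ordinal value such as the number of stops keeps its rank.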

In fact, it is a challenge for travelers to foresee the best time to buy flight tickets, for the following reasons:

• Incomplete Information: Travelers can access only part of the airlines' data. They do not have access to important information, such as the number of remaining tickets and agreements between airline companies [12].
• Diverse Information: The data that travelers can obtain is of many different kinds. For example, it is a challenge for the average traveler to find a relationship between flight fares and flight attributes, such as travel distance, departure time, and so on.
• Irregular Changes: Travelers cannot obtain a guaranteed fare, and fares do not change smoothly. Not all changes are predictable, so travelers can anticipate future fares from recorded prices only with great effort.

2 Literature Review

K. Tziridis et al. [13] proposed airfare price prediction using machine learning techniques. A dataset of 1814 Aegean Airlines flights was gathered and used to train the machine learning models. To demonstrate how feature selection can affect the accuracy of a model, each model was trained with a different number of features.

The next study is a proposal by William Groves that focuses on optimizing purchase timing for customers. A partial least squares regression technique is applied to construct the model.

A study by Supriya Rajankar [14] on airfare prediction using machine learning algorithms uses a small dataset of flights between Delhi and Bombay. The algorithms applied include K-nearest neighbours (KNN), linear regression, and support vector machine (SVM), among others.

Research carried out by Santos [15] investigates the cost of flying from Madrid to London, Frankfurt, New York, and Paris over a few months. The model provides the number of days before departure at which it is advisable to book an airline ticket. Tianyi Wang [16] developed a framework in which two databases are integrated with macroeconomic data and machine learning algorithms, such as support vector machine, are applied.

Each of the aforementioned approaches has its own drawbacks, such as insufficient data to make an accurate prediction. The accuracy of the system changes whenever the algorithm is changed, which can be confusing, although the accuracy changes significantly only when essential features are removed. These studies are now considered outdated as a result of the proliferation of new airlines, significant shifts in the cost of oil, and rising prices for a variety of other goods and services [17–20].

3 Proposed Methodology

To predict the price of a flight, it was necessary to consider all relevant parameters and how each affects the fare, in order to improve the Random Forest model, which provides nearly the most accurate result on the data provided. Table 1 lists the column names and their descriptions.

Table 1. Parameters for price prediction.

Name            | Description
Origin          | The place where the flight will start from
Destination     | Place where the flight has to reach
Departure date  | The departure date of the flight
Arrival date    | The arrival date of the flight
Departure time  | The departure time of the flight, "HH:MM"
Arrival time    | The arrival time of the flight, "HH:MM"
Airline company | The airline company whose flight we are using
Duration        | Total time taken by the flight to complete the journey
Stops           | Number of stops between origin and destination

The goal is to build an airfare prediction model from a dataset of past airline ticket sales in order to improve sales for Indian domestic airlines. Our main motivation is to provide the user with a prediction system that supports an optimal decision on whether to increase or decrease the airfare, so that flights do not go unfilled and no money is lost because of an unexpected rise in crude oil prices [8]. The specific objectives are:

a) To perform data analysis on customers' ticket booking data over a short period of time.
b) To clean the data, for example by removing duplicate records and ambiguities.
c) To perform feature engineering to extract significant features from the dataset for prediction.
d) To brainstorm the features, i.e. to decide how to use them.
e) To create features, i.e. to derive new features from the useful ones.

The proposed framework consists of four stages [13]:

1. Dataset Selection

2. Data Cleaning

3. Feature Extraction

4. Machine Learning Model Selection

3.1 Information Input

Input data is given to the system in the form of a .csv file. The dataset, chosen from Kaggle, serves as both the training dataset and the testing dataset. The data pertain only to domestic flights. In total, our dataset comprises 11 columns [https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction]. Figure 1 shows the column headers.

Fig. 1. Columns in training dataset


3.2 Data Cleaning

The process of cleaning data involves removing any instances of null values from the

dataset and replacing them with more appropriate values. These values are typically the

mean, median, or mode of the other data in the column. The presence of null values in

the dataset may have an impact on the accuracy of the model. The data cleaning steps are shown in Fig. 2.

Fig. 2. Cleaning of data
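The null-value replacement described in this section can be sketched as follows. This is a minimal, dependency-free illustration and not the code shown in Fig. 2; the function name `impute_column` and the toy values, including the airline names, are assumptions made for the example.

```python
from statistics import mean, median, mode

def impute_column(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # no known value to derive a fill from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](known)
    return [fill if v is None else v for v in values]

# Hypothetical toy columns, not taken from the Kaggle dataset.
prices = [4200, None, 5100, 3900]
airlines = ["AirA", "AirB", None, "AirA"]

clean_prices = impute_column(prices, "mean")      # numeric column: mean
clean_airlines = impute_column(airlines, "mode")  # categorical column: mode
```

The mean or median suits numeric columns such as the price, while the mode is the usual choice for categorical columns such as the airline.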

3.3 Feature Extraction

In this phase we extract new features from the dataset, which helps train a more accurate model and makes prediction easier, as shown in Fig. 3. The new features added to the dataset act as discriminating factors for the flight price and explain its variation. Figure 4 shows the correlation between the different features in the dataset.

Features that can be considered deciding factors of the flight fare are:

• Feature 1: Date and time of arrival
• Feature 2: Date and time of departure
• Feature 3: How early the ticket is booked
• Feature 4: Type of passenger (Adult/Child)
• Feature 5: Class of the flight booked (Economy/Business)
• Feature 6: Departure location
• Feature 7: Destination location
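Several of the features listed above can be derived from raw date and time strings. The sketch below shows one possible derivation using only the Python standard library; the function name, the feature names, and the sample dates are hypothetical and are not taken from the dataset or from Fig. 3.

```python
from datetime import datetime

def extract_features(booking_date, departure, arrival):
    """Derive numeric features from ISO-formatted date/time strings."""
    booked = datetime.fromisoformat(booking_date)
    dep = datetime.fromisoformat(departure)
    arr = datetime.fromisoformat(arrival)
    return {
        # How early the ticket is booked, in whole days.
        "days_before_departure": (dep.date() - booked.date()).days,
        "departure_hour": dep.hour,
        "departure_weekday": dep.weekday(),  # Monday == 0
        "duration_minutes": int((arr - dep).total_seconds() // 60),
    }

features = extract_features("2022-03-01", "2022-03-15 18:30", "2022-03-15 21:05")
```

Turning timestamps into numbers such as the booking lead time or the departure hour gives a tree-based model quantities it can split on directly.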


Fig. 3. Feature extraction in model.

Fig. 4. Correlation between attributes

3.4 Machine Learning Model Selection

There are many machine learning algorithms to choose from, each with its own pros and cons. Linear regression is easy to train and simple to test, but because of its lower accuracy we decided not to move forward with it. Decision trees are essentially of two kinds, classification trees and regression trees: classification is used for categorical values and regression for continuous values. A decision tree picks independent variables from the dataset as decision nodes for decision making.

Random forest uses a group of decision trees as an ensemble of models. Random subsets of the data are passed to the decision trees, and each tree predicts values according to the data given to it. The average of the values predicted by the decision trees is taken as the output of the random forest model. Since the method supports both regression and classification, we found it to be the best fit for our system.
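The idea of averaging many trees trained on random subsets of the data can be illustrated with a deliberately simplified sketch. It is not the model used in this work: each "tree" here is a single-split stump on one feature, the ensemble is small, and the names `fit_stump`, `fit_forest`, `n_trees`, and `seed` as well as the toy fares are assumptions made for the example. A full random forest grows deeper trees and also samples features at each split.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one numeric feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x in the sample: predict the mean
        overall = mean(ys)
        return lambda x: overall
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_forest(xs, ys, n_trees=25, seed=42):
    """Train n_trees stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)

# Hypothetical toy data: fares fall as the booking lead time grows.
days_before = [1, 3, 7, 14, 21, 30, 45, 60]
fares = [9000, 8500, 7000, 6000, 5500, 5000, 4800, 4700]
predict = fit_forest(days_before, fares)
late_fare, early_fare = predict(2), predict(50)
```

Because every stump sees a different bootstrap sample, the individual split points differ, and averaging them smooths the step-shaped prediction of any single tree.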

4 Result and Discussions

In the proposed work, we built models with several algorithms, namely Linear Regression, Decision Tree Regression, and Random Forest Regression, and compared the accuracy of their results on our experimental data. Comparing the accuracy levels, we find that Random Forest Regression provides the highest accuracy, at 81%, as shown in Table 2. We therefore selected Random Forest Regression and built the user interface on it.

Table 2. Accuracy of different algorithms

Algorithm                | Accuracy
Linear Regression        | 0.61
Decision Tree Regression | 0.64
Random Forest Regression | 0.85

4.1 Performance Metrics

Performance metrics are statistical measures used to determine the accuracy of machine learning models built with different algorithms. The sklearn.metrics module is used to compute the errors of each model using regression metrics. The following metrics are used to assess the error level of each model.

4.1.1 MAE (Mean Absolute Error)

A basic measure of numerical accuracy is the Mean Absolute Error (MAE), given in Eq. 1. As the name suggests, MAE is the mean of the absolute errors. The absolute error is the absolute value of the difference between the predicted value and the actual value. MAE therefore measures the accuracy of predictions for a continuous variable.

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - \hat{x}_i \rvert \tag{1} \]

n = the number of errors,
\Sigma = summation symbol (which means "add them all up"),
\lvert x_i - \hat{x}_i \rvert = the absolute errors.

The lower the value of MAE, the better the performance of the model.


4.1.2 MSE (Mean Square Error)

Mean Square Error squares the difference between the actual and predicted values before summing them, rather than using the absolute value, as shown in Eq. 2.

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{Actual}_i - \text{Forecast}_i)^2 \tag{2} \]

n = number of items,
\Sigma = summation notation,
Actual = original or observed y-value,
Forecast = y-value from regression.

4.1.3 RMSE

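The Root Mean Square Error (RMSE) is the square root of the MSE, as shown in Eq. 3. It is never smaller than MAE and penalizes large errors more heavily; among several models, a lower RMSE indicates better performance. R2 (coefficient of determination) helps you see how much of the variation in the dependent variable is explained by the independent variables of your model.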

RMSE =(Forecast −Actual)2

n(3)
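The metrics of this section can be computed directly from their definitions. The sketch below is a dependency-free illustration and not the evaluation code used in this work, which relies on the sklearn.metrics module; the function names and the sample fares are hypothetical, and the R2 formula used is the standard one minus the ratio of residual to total sum of squares.

```python
import math

def mae(actual, forecast):
    """Mean of the absolute errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean of the squared errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Square root of the MSE, in the same units as the fare."""
    return math.sqrt(mse(actual, forecast))

def r2(actual, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted fares.
actual = [5000, 6000, 7000, 8000]
forecast = [5200, 5900, 7400, 7700]

scores = {
    "MAE": mae(actual, forecast),
    "MSE": mse(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "R2": r2(actual, forecast),
}
```

For these sample values the absolute errors are 200, 100, 400, and 300, so the MAE is 250 and the MSE is 75000; the RMSE is larger than the MAE because the 400 error is weighted more heavily once squared.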

For the random forest regression, we used 1000 estimators (trees) and a random state of 42. This method is well suited to data in which the dependence between factors is difficult to identify. Figure 5 shows the performance metrics of the proposed random forest method.

Fig. 5. Metrics of Random Forest algorithm

5 Conclusion and Future Scope

At present, there are many domains in which services are built on predictions. Examples include stock trading tools that forecast share prices and Zestimate, which gives an estimate of real estate prices. A need for a similar service that can assist customers in booking tickets has arisen in the airline industry.


A significant amount of research has been conducted on this topic using a variety of methods, and further work is expected to improve predictions by drawing on a wider variety of data. More accurate data with better features can likewise be used to obtain more accurate results.

In future, our research could be expanded to include air ticket transaction data, which could provide additional insight into a specific itinerary, such as time and date of departure, arrival, coverage, etc. Such data could support a model that forecasts fares on a daily or hourly basis. In addition, the cost of a flight in a market segment may be affected by an unpredictable influx of large numbers of travelers brought about by different events.

Impact of Work from Home During Covid-19 on the Socio-economic Status of India

Poonam Ojha1, Sudhanshu Maurya2(B), and Manish Kumar Ojha3

1School of Management, Graphic Era Hill University Bhimtal Campus, Nainital 263132,

Uttarakhand, India

2School of Computing, Graphic Era Hill University Bhimtal Campus, Nainital,

Uttarakhand 263132, India

dr.sm0302@gmail.com

3Amity University Noida, Noida, Uttar Pradesh, India

Abstract. Socioeconomic status (SES) is a measure of the economic and social standing of an individual or an economy relative to others, although it is more commonly used to represent economic differences within a society. During Covid-19, work from home (WFH) has contributed to the nation's socio-economic activities. This paper examines the impact of WFH on the socio-economic status of India at a time when many people became unemployed, the income of society decreased, and the education system was badly affected. The pandemic gave great importance to WFH, which offered many employees the opportunity both to carry on working and to stay safe from the risk of exposure to the virus. As the duration of the pandemic is uncertain, working from home is increasingly accepted as the new normal way of working. To find the impact of WFH on socioeconomic status, we took three variables: education, employment, and income & wealth.

Keywords: Work from home ·Socio-economic status ·Education ·Income &

wealth ·Employment

1 Introduction

Socioeconomic study refers to the interaction between the social and economic behavior of a group of people, linking financial and social issues together. SES is a prominent indicator of any nation's economic and social position in the world. This index reflects how social and economic activities combine. “Pandemics are not a new experience for the communities as they were recorded since prehistoric times. During each pandemic, major changes were noticed in the areas of economics, local and national policies, social behavior, and citizens’ mentalities as well. In contrast to these changes, it was observed that mentalities and social behavior were the least affected, as the institutionalized modifications [1], through public policies, were not adequately attached and synthesized with the psychosocial changes [2].” During the Covid-19 pandemic, SES has been severely affected by the disruption of

R. Mehra et al. (Eds.): ICCISC 2022, CCIS 1672, pp. 100–113, 2022.

https://doi.org/10.1007/978-3-031-22915-2_9

social and economic activities. The COVID-19 pandemic is intensifying and will have long-term effects worldwide, most probably resulting in structural effects on the socio-economic status of India and other affected countries. “Like any other epidemics, COVID-19 has caused noteworthy changes on all levels of modern-day society [3–8].” The countrywide lockdown resulted in financial losses and affected all segments of society, including health, healthcare, and nutrition [15]. “Population density [9–11], high degree of mobility of humans, and mass socialization, as well as cultural, social, and tourism events [12–14] have been the basic reasons for COVID-19.” In brief, the main aim of this paper is to discuss the effect of work from home, adopted in response to the pandemic, on education, income & wealth, and employment in India.

1.1 Education

The education system has been affected from preschool to tertiary education, and policies calling for the complete shutdown of educational institutions have been introduced worldwide. UNESCO estimated that the shutdown of educational facilities has affected almost 900 million learners. While the objective of these shutdowns is to prevent the spread of the virus and avoid transmission to vulnerable individuals in the institutions, they have had widespread socioeconomic implications.

In the absence of a proper support system, office work and household work, as well as home time and school time, became inseparable during the lockdown, and playtime for children fell to zero [16]. “Every house became a school and each parent a teacher, during lockdown when schools and colleges were closed across India. There was no boundary between the playtime and my time for millions of children in the country. Further, the paucity of a structured learning environment at home had a worse impact on overall learning and consequently affected the overall education outcome [16].” The relationship between education and SES is depicted in Fig. 1.

“As almost 70% of the 1.4 million schools and 51,000 colleges with nearly 300

million children are run by government bodies in India, the rural schools and the parents

now face a bleak education system and emptiness even as government’s advisories ask

schools to go online, and the government is looking at ways in which course can be

designed so students do not suffer.” The impact of a long-term school shutdown is yet

to be seen.

1.2 Employment

Many IT sector companies prefer WFH on a wide scale to enhance workplace flexibility [17] and to reduce the worst impacts on society. The sudden importance and growth of WFH have prompted further investigation of the phenomenon, especially to identify the number of jobs that can be done remotely [18–22]. In general, the literature overlooks the possible effects of WFH on the unequal distribution of wages and income. The causes of inequality are multiple and distinct and have been gaining prominence among policymakers. The relationship between employment and SES is depicted in Fig. 2.

According to Pouliakas and Branka (2020) and Fana et al. (2020), “the most defenseless groups, such as women, non-natives, those with non-standard contracts (self-employed and temporary workers), the lower educated, those employed in micro-sized workplaces, and low-wage workers have been impacted by the COVID-19 pandemic.” Palomino et al. (2020) find that the crisis has increased the levels of inequality and poverty [23]. Beland et al. (2020) examined “the short-term consequences of COVID-19 on employment and wages; their findings suggest that the unemployment rate increased due to COVID-19, working hours and labor force participation decreased, and there were no significant impacts on wages [23].” This crisis has also increased labor market inequalities. “According to the World Economic Forum, the current pandemic compelled migrants to be trapped abroad and compromise to the unfavorable circumstances, by taking up low-wage jobs, living in poor working conditions, restricting spending, and thus, risk exposure to infections like the coronavirus [24].”

1.3 Income and Wealth

To the best of our knowledge, this study is the first to show how an increase in WFH would affect income and wealth, as shown in Fig. 3. The lower socio-economic stratum has been greatly affected by the economic downturn during the current pandemic [15]. “The three main areas of economic impact of Covid-19 are given below:

• Increase in poverty, i.e., pushing more people below the poverty line [25]

• Aggravation of socio-economic disparities [26, 27], and

• Compromise in health-related precautions (use of masks, social distancing, looking for medical guidance in case of cough and fever, etc.).”

Fig. 1. Education and SES (diagram labels: PWFH, OEO, Education, SES)

In the current situation, inequalities of income and wealth hit younger households and middle-aged households, respectively. One of the disruptions caused by this pandemic is its major impact on the remittance flows sent by migrant Indian

Fig. 2. Employment and SES (diagram labels: CWFH, SE, EBS, Employment, SES)

workers, which serve as a means of poverty reduction, economic development, and GDP growth. In India, remittances were projected to fall by about 23% in 2020, in sharp contrast to growth of 5.5% in 2019 [28]. The WFH system emerged with Covid-19, under which people were advised to work, study, and worship from home.

Fig. 3. Income and SES (diagram labels: TWFH, BOBR, WG, Income, SES)

Educators were also asked to adopt the work-from-home system using technology, as per the orders of the Ministry of Education. WFH has both advantages and disadvantages for the performance of teachers. Work from home can also

be carried out successfully if both the educators and the educational institution follow it diligently [29]. One disadvantage of WFH is that teachers may lose motivation to work because of constraints such as salary cuts and layoffs, which reduce their income and consequently their enthusiasm. Nevertheless, WFH is considered the most effective way of performing activities: it helps to contain the pandemic crisis and keeps economic activities running so that income can be earned.

Fig. 4. SES model (diagram labels: Education, Employment, Income, SES)

In India, this pandemic has had a great impact on socioeconomic status (SES), especially during lockdown and post-lockdown. The SES model shown in Fig. 4 represents the relationship between three variables, education, employment, and income & wealth, which are analyzed based on the following independent variables:

i. preference for work from home (PWFH),

ii. comfortable with work from home (CWFH),

iii. time to work from home (TWFH)

and the following dependent variables:

i. online classes more effective than offline (OEO),

ii. self-employed (SE),

iii. economically beneficial for society (EBS),

iv. boost in online business revenue (BOBR), and

v. larger wealth gap (WG).

The most notable feature of this model is that the socio-economic status of India is framed by OEO, SE, EBS, BOBR, and WG. During Covid-19, online classes were more

effective than offline classes because they kept children safe, and online business revenue increased because many activities could be performed only online. People got through lockdown with the help of online games and other entertainment options, so we can say that lockdown increased the use of online platforms. The pandemic made people realize the need for technical knowledge, which in turn encouraged cognitive behavior. Revenue from online businesses encouraged online employment in the form of self-employment, which could decrease unemployment. WFH also has positive spillover effects on workers, as it benefits them through increased income and reduced infection risks [18].

“As in the US economy [23], where Beland et al. (2020) found that Covid-19 led to an increase in the unemployment rate while working hours and labor force participation decreased, India faced the same issues, due to which income and employment levels went down.” These developments gave rise to other problems, such as a larger wealth gap with increased income inequality and poverty. They also widened the scope for self-employment during the post-lockdown period, which bodes well for the SES of India. For the growth of an economy like India's, SE and BOBR play a vital role in shaping a strong SES. With this reference model of SES, we examined the performance of OEO, SE, BOBR, and the wealth gap in the landscape of education, employment, and income & wealth to encourage the growth of SES in India. The pandemic is responsible for closing off certain employment opportunities, reducing income sources, and harming education, yet we also found certain developments in these fields. Innovation is likewise positively related to adverse conditions: it is said that when no other options remain, the human mind comes up with new ideas, which leads to innovation. With these arguments, we propose that SES is the outcome of the changes in OEO, SE, EBS, BOBR, and WG, as these were heightened during this crisis.

2 Theoretical Background and Hypothesis Development

In this paper, we surveyed 257 respondents from schools and universities, together with professionals, industry persons, and academicians, from across India via a social media platform (WhatsApp) to understand the effect of WFH on education, employment, and income in India. The data is limited to a few states and territories: Uttar Pradesh, New Delhi, Uttarakhand, Maharashtra, Gujarat, Punjab, Assam, Bihar, and West Bengal. Drawing on our arguments and past research, we developed three hypotheses. Because of Covid-19 restrictions, the study was conducted online using primary data (n = 257). We selected university students, teachers, and other participants on a convenience sampling basis to ensure feasibility. Researchers have observed many advantages and disadvantages of WFH; for example, WFH is more flexible than on-site work [29]. In education as well as in other professions such as the IT sector, stress levels have decreased because workers avoid traffic jams and have more free time for family. This helps employees strengthen their abilities.

Various studies have confirmed that WFH is beneficial for the social and economic health of a country [30]; hence we decided to analyze the impact of

WFH on the SES of a nation such as India. For this, we gathered information on the three elements of SES (education, employment, and income) that define the health of a nation.

We administered a set of questions anonymously through an online survey, using the non-probability snowball sampling technique. The questionnaire first covered demographics such as age and gender; the remaining questions were divided into three parts:

i. Education

ii. Employment

iii. Income & wealth.

In the first part of the questionnaire, we asked teachers and professors whether they feel comfortable with online classes and whether online classes are more effective than offline ones. The second part consists of two questions put to private employees (teachers and professors, low- and middle-class workers, and industry persons): whether they think work from home is economically beneficial for society, and whether self-employment is an outcome of ‘work from home’ during this pandemic. Finally, we asked the same respondents, together with estate dealers and purchasers, three questions: whether they think WFH is appropriate for the health of a nation, whether online activity boosts revenue from online businesses, and whether wealth gaps (such as income inequality) became larger during this pandemic.

H1: Effect of WFH on education, with independent variable PWFH and dependent variable OEO.

H2: Impact of WFH on employment, with independent variable CWFH and dependent variables SE and EBS.

H3: Impact of WFH on income & wealth, with independent variable TWFH and dependent variables BOBR and WG.

H4: SES depends on PWFH, CWFH, and TWFH, with special reference to education, employment, and income & wealth.
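To make the structure of H1 to H3 explicit, the sketch below encodes each hypothesis as a small record and lists every predictor and outcome pair that would need its own simple regression. The class and function names are hypothetical conveniences; H4 is left out because it combines the other three rather than naming a single outcome variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    label: str
    ses_component: str
    independent: tuple   # predictor variable codes
    dependent: tuple     # outcome variable codes


HYPOTHESES = (
    Hypothesis("H1", "Education", ("PWFH",), ("OEO",)),
    Hypothesis("H2", "Employment", ("CWFH",), ("SE", "EBS")),
    Hypothesis("H3", "Income & wealth", ("TWFH",), ("BOBR", "WG")),
)


def regression_pairs(hypotheses):
    """List every (predictor, outcome) pair implied by the hypotheses."""
    return [(x, y) for h in hypotheses for x in h.independent for y in h.dependent]


pairs = regression_pairs(HYPOTHESES)
```

The five resulting pairs correspond to the five dependent variables analyzed in Studies 1 to 4.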

The collected information was then analyzed using simple linear regression in SPSS, and we examined the relationships between the variables in the study. Descriptive statistics for all respondents (n = 257) gave range = 1, mean = 1.44, and S.D. = 0.499.
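The analysis itself was run in SPSS. As an illustrative sketch only, the ordinary least-squares computation behind a simple linear regression can be written in a few lines of Python. The function name and the five sample responses below are hypothetical; they are not survey data.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_total == 0:
        raise ValueError("predictor and outcome must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual
    # F has 1 and n - 2 degrees of freedom in a one-predictor model.
    f_stat = ss_regression / (ss_residual / (n - 2)) if ss_residual > 0 else float("inf")
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ss_regression / ss_total,
        "f": f_stat,
    }


# Hypothetical Likert-style responses, for illustration only.
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The slope, intercept, R², and F returned here play the same roles as the B, (Constant), R square, and F entries in the SPSS tables that follow.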

A. Study 1: We took 100 students and teachers out of the 257 respondents and found that online study is more effective than offline study, as it reduces infection risk and enhances the technical knowledge of both groups. The results (p = 0.002, R² = 0.131, F = 10.266) indicate that the overall regression model was significant. This suggests that students and teachers prefer online classes, and hence WFH, thereby contributing to the growth of SES, as shown in Tables 1, 2, and 3.

Table 1. Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | .360ᵃ | .131 | .118 | .939 |

ᵃ Predictors: (Constant), PWFH.

R² = 0.131; taken as a set, the predictors (independent variables) account for 13.1% of the variance in the dependent variable.

Table 2. ANOVAᵃ (test using alpha = 0.05)

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 9.052 | 1 | 9.052 | 10.266 | .002ᵇ |
| Residual | 60.836 | 69 | .882 | | |
| Total | 69.887 | 70 | | | |

ᵃ Dependent Variable: OEO.

ᵇ Predictors: (Constant), PWFH.

The overall regression model was significant, F(1, 69) = 10.266, p = .002.

Table 3. Coefficientsᵃ (test each predictor at alpha = 0.05)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 2.861 | .274 | | 10.457 | .000 |
| PWFH | .376 | .118 | .360 | 3.204 | .002 |

ᵃ Dependent Variable: OEO.
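The quantities reported in Tables 1 and 2 are linked by standard identities: F is the ratio of the regression and residual mean squares, R² is the regression share of the total sum of squares, and the standard error of the estimate is the square root of the residual mean square. The sketch below recomputes them from the sums of squares in Table 2, assuming n = 71 observations (total df plus one) and one predictor; the helper name is hypothetical. Because the published inputs are rounded, the recomputed values agree with the tables only approximately.

```python
import math


def anova_summary(ss_regression, ss_residual, n_obs, n_predictors=1):
    """Derive F, R^2, adjusted R^2 and the std. error from sums of squares."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    adjusted = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_residual
    return {
        "f": ms_regression / ms_residual,
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "std_error": math.sqrt(ms_residual),
    }


# Sums of squares as printed in Table 2; n_obs = 71 is inferred from total df = 70.
study1 = anova_summary(9.052, 60.836, n_obs=71)
```

The same helper can be applied to the ANOVA tables of the later studies to cross-check their reported F and R² values.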

B. Study 2: This study deals with the second hypothesis. We found that EBS is insignificant (p = 0.221, R² = 0.023), whereas SE is significant (p = 0.001, R² = 0.157). From this we can state that self-employment was encouraged during Covid-19, whereas WFH was not economically beneficial for society because of a lack of motivation and competition and because it hampered industrial work (fieldwork), as shown in Tables 4 to 9.

During this pandemic, self-employment was encouraged by the shortage of jobs in the economy and by salary cuts, which discouraged employees from remaining in their jobs. Although businesses also faced many constraints during this period, people were still ready to engage in business activities.

C. Study 3: The preliminary findings below suggest that the first dependent variable for TWFH, ‘boost in online business revenue’, has no significant effect on SES. From the results, we found p = 0.725 and R² = 0.002, which means TWFH explains only 0.2% of the variance

Table 4. Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | .147ᵃ | .023 | .008 | 1.071 |

ᵃ Predictors: (Constant), CWFH.

R² = 0.023; taken as a set, the predictors (independent variables) account for 2.3% of the variance in the dependent variable.

Table 5. ANOVAᵃ (test using alpha = 0.05)

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 1.755 | 1 | 1.755 | 1.528 | .221ᵇ |
| Residual | 79.203 | 69 | 1.148 | | |
| Total | 80.958 | 70 | | | |

ᵃ Dependent Variable: EBS.

ᵇ Predictors: (Constant), CWFH.

The overall regression model was not significant, F(1, 69) = 1.528, p = .221.

Table 6. Coefficientsᵃ (test each predictor at alpha = 0.05)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 2.090 | .277 | | 7.548 | .000 |
| CWFH | .132 | .107 | .147 | 1.236 | .221 |

ᵃ Dependent Variable: EBS.

in the dependent variable. During lockdown, however, people at home preferred to play online, and a predilection for online entertainment was observed, as shown in Tables 10, 11, and 12.

Table 7. Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | .396ᵃ | .157 | .145 | 1.071 |

ᵃ Predictors: (Constant), CWFH.

R² = 0.157; taken as a set, the predictors (independent variables) account for 15.7% of the variance in the dependent variable.

Table 8. ANOVAᵃ (test using alpha = 0.05)

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 14.740 | 1 | 14.740 | 12.842 | .001ᵇ |
| Residual | 79.203 | 69 | 1.148 | | |
| Total | 93.944 | 70 | | | |

ᵃ Dependent Variable: SE.

ᵇ Predictors: (Constant), CWFH.

Table 9. Coefficientsᵃ (test each predictor at alpha = 0.05)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1.090 | .277 | | 3.937 | .000 |
| CWFH | .382 | .107 | .396 | 3.584 | .001 |

ᵃ Dependent Variable: SE.

The overall regression model was significant, F(1, 69) = 12.842, p = .001.
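In a regression with a single predictor, the t statistic of the slope is the coefficient divided by its standard error, and its square equals the model F statistic. The short check below applies both relations to the CWFH row of Table 9; the function name is hypothetical, and small differences from the printed values are expected because the table entries are rounded to three decimals.

```python
def t_statistic(coefficient, std_error):
    """t = B / SE(B) for a single regression coefficient."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / std_error


# B and Std. error for CWFH as printed in Table 9 (rounded values).
t_from_rounded_inputs = t_statistic(0.382, 0.107)

# Squaring the reported t should reproduce the reported F of about 12.842.
f_from_reported_t = 3.584 ** 2
```

The recomputed t is close to the reported 3.584, and the squared t is close to the F in Table 8, which is a quick consistency check on the SPSS output.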

D. Study 4: In this study, we examined the relationship between TWFH and WG to determine whether these variables are related. Although a close relationship would be expected, income decreased at a remarkable rate during the pandemic, and for this reason our analysis showed insignificant results and a low percentage of variance explained. People need more time to adapt to WFH before income increases; if IT companies and others adopt WFH permanently, as is predicted, income will increase and so will SES. R² = 0.005; taken as a set, the predictors (independent variables) account for 0.5% of the variance in the dependent variable, as shown in Tables 10, 11, 12, 13, 14, and 15.

Table 10. Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | .043ᵃ | .002 | −.013 | .855 |

ᵃ Predictors: (Constant), TWFH.

R² = 0.002; taken as a set, the predictors (independent variables) account for 0.2% of the variance in the dependent variable.

The above analysis revealed that WG and BOBR have insignificant relations with TWFH, but both have a positive relationship with SES. Evidence from Tables 4 and 5 helps explain why WFH was one of the instruments in reducing infection rates during the early days of the pandemic.

Table 11. ANOVAᵃ (test using alpha = 0.05)

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression | .091 | 1 | .091 | .125 | .725ᵇ |
| Residual | 50.387 | 69 | .730 | | |
| Total | 50.479 | 70 | | | |

ᵃ Dependent Variable: BOBR.

ᵇ Predictors: (Constant), TWFH.

The overall regression model was not significant, F(1, 69) = 0.125, p = .725.

Table 12. Coefficientsᵃ (test each predictor at alpha = 0.05)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1.541 | .281 | | 5.478 | .000 |
| TWFH | .034 | .097 | .043 | .354 | .725 |

ᵃ Dependent Variable: BOBR.

Table 13. Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | .071ᵃ | .007 | −.010 | 1.236 |

ᵃ Predictors: (Constant), TWFH.

Table 14. ANOVAᵃ (test using alpha = 0.05)

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression | .515 | 1 | .515 | .337 | .563ᵇ |
| Residual | 105.401 | 69 | 1.528 | | |
| Total | 105.915 | 70 | | | |

ᵃ Dependent Variable: WG.

ᵇ Predictors: (Constant), TWFH.

The overall regression model was not significant, F(1, 69) = 0.337, p = .563.

3 Conclusion

During the pandemic, we all live under the threat of being caught in its trap, with no clue how to escape the situation; we are worried for our families and for ourselves, knowing the adverse impact of

Table 15. Coefficientsᵃ (test each predictor at alpha = 0.05)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 2.047 | .407 | | 5.032 | .000 |
| TWFH | .081 | .140 | .070 | .581 | .563 |

ᵃ Dependent Variable: WG.

Covid-19. Vaccination is now available, but every authority still questions whether it can resolve the issue to the level the public desires. Many efforts have been made to fight Covid-19, but no final solution has been found. In the meantime, every nation has tried its best to overcome the problem. India brought out the best of its socio-economic strengths by balancing the situation through WFH, which was of the utmost importance during the pandemic. This paper argued that work from home is very effective, as it saves lives as well as the economy. All else being equal, the education, employment, and income levels of the economy were adversely affected by the pandemic, while WFH reduces infection risk and maintains both economic and social activities. In this paper, we took three components of SES (education, employment, and income) as indicators and related them to three independent variables, preference for work from home (PWFH), comfortable with work from home (CWFH), and time to work from home (TWFH), and to five dependent variables, (i) online classes more effective than offline (OEO), (ii) self-employed (SE), (iii) economically beneficial for society (EBS), (iv) boost in online business revenue (BOBR), and (v) larger wealth gap (WG), to examine the relationships. The results were surprising across the dependent variables: we found significant relations to SES for all except one variable, WG, which gave insignificant results during the first phase of Covid-19. We found that WFH benefited the socio-economic condition of the nation with few negative impacts, which implies that WFH should be encouraged as long as a noteworthy virus risk remains.

References

1. Sonia, S.: Pandemic: Tracking Contagions, From Cholera to Ebola and Beyond. Sarah

Crichton Books, New York (2016). ISBN 978-0-374-12288-1

2. Saini, S.: COVID-19 may double poverty in India. Financial Express (2020). https://www.financialexpress.com/opinion/covid-19-may-double-poverty-in-india/1943736/. Accessed 22 May 2020

3. Purwanto, A., et al.: Impact of work from home (WFH) on Indonesian teachers performance

during the Covid-19 pandemic: an exploratory study. Int. J. Adv. Sci. Technol. 29(5), 6235–

6244 (2020)

4. Bick, A., Blandin, A., Mertens, K.: Work from home after the Covid-19 outbreak. CEPR

Discussion Paper No. DP15000

5. Alon, T., Doepke, M., Rumsey, J.-O., Tertilt, M.: The impact of COVID-19 on gender equality. NBER Working Papers 26947. National Bureau of Economic Research, Inc (2020)

6. Anser, M.K., Yousaf, Z., Khan, M.A., Nassani, A.A., Alotaibi, S.M., Qazi Abro, M.M., et al.:

Does communicable diseases (including COVID-19) may increase global poverty risk? A

cloud on the horizon. Environ. Res. 15(187), 109668 (2020)

7. Atkeson, A.: What will be the Economic Impact of COVID-19 in the US? Rough Estimates of

Disease Scenarios. National Bureau of Economic Research, Cambridge (2020). http://www.

nber.org/Papers/w26867. Accessed 21 June 2020

8. Baker, S., Bloom, N., Davis, S., Kost, K., Sammon, M., Viratyosin, T.: The Unprecedented Stock Market Impact of COVID-19. https://www.nber.org/papers/w26945. Accessed 21 May 2020

9. Bartik, A., Bertrand, M., Cullen, Z., Glaeser, E., Luca, M., Stanton, C.: How Are Small Busi-

nesses Adjusting to COVID19? Early Evidence from a Survey. National Bureau of Economic

Research, Cambridge (2020). https://www.nber.org/papers/w26989. Accessed 21 June 2020

10. Béland, L.-P., Brodeur, A., Wright, T.: The short-term economic consequences of COVID-

19: exposure to disease, remote work and government response. IZA Discussion Paper Series

(13159) (2020)

11. Di Gennaro, F., et al.: Coronavirus diseases (COVID-19) current status and future perspectives:

a narrative review. Int. J. Environ. Res. Public Health 17, 2690 (2020).

12. Dingel, J., Neiman, B.: How many jobs can be done at home? National Bureau of Economic

Research No. 26948 (2020)

13. Guerrieri, V., Lorenzoni, G., Straub, L., Werning, I.: Macroeconomic Implications of COVID-

19: Can Negative Supply Shocks Cause Demand Shortages? National Bureau of Economic

Research, Cambridge (2020). http://www.nber.org/papers/w26918.pdf. Accessed 21 June

2020

14. Guermond, V., Datta, K.: How coronavirus could hit the billions migrant workers send home. World Economic Forum (2020). https://www.weforum.org/agenda/2020/04/how-coronavirus-could-hitthe-Billions-migrant-workers-send-home/. Accessed 23 Apr 2020

15. Mccloskey, B., et al.: Mass gathering events and reducing further global spread of COVID-19:

a political and publicHealth dilemma. Lancet 395, 1096–1099 (2020)

16. Bonacini, L., Gallo, G., Scicchitano, S.: Working from home and income inequality: risks of

a ‘new normal’ with COVID-19. J. Popul. Econ. 34, 303–360 (2021)

17. Mahendradev, S.: Addressing COVID-19 impacts on agriculture, food security,

and livelihoods in India | IFPRI: international food policy research institute.

IFPRI. https://www.ifpri.org/Blog/addressing-covid-19-impacts-agriculture-food-security-

and-livelihoods-India. Accessed 22 May 2020

18. Alipour, J.V., Fadinger, H., Schymik, J.: My home is my castle – the beneﬁts of working from

home during a pandemic crisis. J. Public Econ. 196, 104373 (2021)

19. Kang, D., Choi, H., Kim, J.H., Choi, J.: Spatial epidemic dynamics of the COVID-19 outbreak

in China. Int. J. Infect. Dis. 94, 96–102 (2020)

20. Koren, M., Peto, R.: Business disruptions from social distancing. In: Covid Economics (2),

13–31. Press, CEPRIZA Discussion Paper No. 1328133 Pages Posted, 23 May 2020 Kon-

stantinos Pouliakas Cedefop, University of Aberdeen - Business School, IZA Institute of

Labor Economics Jiri Branka (2020)

21. Leibovici, F., Santacrue, A.M., Famiglietti, M.: Social distancing and contact-intensive

occupations. St. Louis Federal Reserve Bank - On the Economy Blog, March (2020)

22. Lemay, M.C.: Global Pandemic Threats: A Reference Handbook. ABC-CLIO, Santa Barbara

(2020). ISBN 978-1-4408-4282-5

23. Honigsbaum, M.: The Pandemic Century: One Hundred Years of Panic, Hysteria, and Hubris.

W. W. Norton & Company, New York (2019). ISBN 978-0393254754

24. Ito, H., Hanaoka, S., Kawasaki, T.: The cruise industry and the COVID-19 outbreak. Transp.

Res. Interdiscip. Perspect. 5, 100136 (2020)

Impact of Work from Home During Covid-19 113

25. Mongey, S., Pilossoph, L., Weinberg. A.: Which workers bear the burden of social distancing

policies? NBER Working Paper No. 27085 (2020)

26. Fana, M., Torrejon Prez, S., Fernandez-Macias, E.: Employment impact of Covid-19: from

short term effects to long terms prospects. J. Ind. Bus. Econ. 47, 391–410 (2020)

27. Pouliakas, K., Branka, J.: EU Jobs at highest risk of Covid-19 social distancing: will the

pandemic exacerbate labour market divide? IZA discussion paper no. 13281. https://ssrn.

com/abstract=3608530

28. Praharaj, S., Vaidya, H.: The Urban Dimension of COVID-19 in India: COVID Outbreak

and Lessons for Future Cities. https://www.researchgate.net/publication/341616744_The_

urban_dimension_of_COVID19_in_India_COVID_Outbreak_and_Lessons_for_Future_Cit

ies?Channel=doi&Linkid=5ecb837492851c11a8880043&showfulltext=true. Accessed 21

May 2020

29. Prashant, K.N., Khanna, P.: Every house a school, every parent a teacher as Covid-19 impacts

education of 300mn students. https://www.livemint.com/news/india/every-house-a-school-

every-parent-a-teacher-as-covid-19-impacts-education-11585140662556.html

30. Adams-Prassl, A., Boneva, T., Golin, M., Rauh, C.: Inequality in the impact of the coronavirus

shock: evidence from real-time surveys. IZA Discussion Paper No. 13183 (2020)

Author Index

Ali, Arif 3

Arjun, K. P. 89

Awasthi, Monisha 22

Awasthi, Prakhar 22

Basak, Arindam 73

Chakravarty, Debashish 73

Chawla, Priyanka 63

Choudhury, Akash Sur 73

Gairola, Ajay Krishan 51

Garg, Sharvan Kumar 3

Goel, Ankur 22

Goyal, Tushar 16

Halder, Tamesh 73

Husain, Arshad 16

Jangir, Yuvraj 16

Kandari, Sumit 16

Khanduja, Manisha 22

Kumar, Anuj 22

Kumar, Vidit 51

Maurya, Sudhanshu 100

Mishra, Anupama 39

Nagaraju, M. 63

Ojha, Manish Kumar 100

Ojha, Poonam 100

Pandey, Akhilesh 3

Rawat, Deepesh 39

Rawat, Tushar 89

Saini, Shivani 3

Sajwan, Vijaylakshmi 22

Singh, Pankaj Pratap 3

Singh, Rohan 89

Sreenarayanan, N. M. 89

Tiwari, Rajeev 63
