HCTDDA: Hybrid Classification Technique for Detection of DDoS Attacks

Vimal Gaur
CSE Dept. MMEC,
Maharishi Markandeshwar (Deemed to be University),
Mullana, Ambala-133207, India and
CSE Dept, Maharaja Surajmal Institute of Technology,
New Delhi-110058, India
vimalgaur@msit.in,
ORCID ID 0000-0003-4097-1859
Rajneesh Kumar
CSE Dept. MMEC, Maharishi
Markandeshwar (Deemed to be
University),
Mullana, Ambala-133207, India
drrajneeshgujral@mmumullana.org,
ORCID ID 0000-0002-8139-3533
Abstract: Today, attackers are turning to mobile and Internet of Things (IoT) technologies to diversify and strengthen DDoS attacks. Many methods exist for detecting different types of DDoS attacks. In this paper, the authors propose HCTDDA (Hybrid Classification Technique for Detection of DDoS Attacks), which detects DDoS attacks by using machine learning and deep learning methods (Random Forest, Decision Tree, SNN, DNN and XGBoost) in combination with various feature selection methods. The goal of HCTDDA is early and accurate detection of DDoS attacks. To achieve this goal, we apply feature selection methods (Chi-Square, Extra Tree, ANOVA and Mutual Information) to determine the attributes that contribute most to the prediction model. The results show that Mutual Information feature selection with 45 features gives the highest accuracy, 96.77%, with a feature reduction rate of 43.04%, for early detection of DDoS attacks.
Keywords: Extra Tree, Chi-Square, ANOVA, Mutual Information
I. INTRODUCTION
The birth of computing gave rise to threats, malware and other security issues. In the past few years, the growth of the internet has raised awareness of security. A major concern is attacks on systems by hackers. One such attack is the Distributed Denial of Service (DDoS) attack, which is among the most severe because of its impact [1]. Its impact on IoT devices cannot be ignored as innovations such as Cloud Computing and Fog Computing spread. In cybersecurity, DDoS attacks are among the most prominent threats, since they prevent legitimate internet traffic from reaching the server. In this paper, the developments and gaps in the groundwork for security-efficient algorithms are discussed [2] [3]. DDoS attacks in a wide range of network environments are also examined. Many intrusion detection tools have been developed to detect these types of attacks [4] [5]. The first IDS (Intrusion Detection System) appeared in 1980, and many other systems followed [6] [7]. Their main aim is to generate alerts for threatening situations, but in doing so they also generate false alarms. Much research has been devoted to reducing false alarms and raising detection rates, which has led to new fields of study. These fields have become subjects of extensive research aimed at proposing new systems for tackling zero-day attacks [4] [5]. The primary goal of this paper is to assess DDoS datasets and analyze detection performance on them under varying network traffic.
The important contributions of our paper are as follows:
(1) Analyze the current state of existing intrusion detection datasets, including their characteristics and shortcomings.
(2) Collect and process open DDoS datasets from reliable sources and review them.
(3) Select the most suitable machine learning algorithms to assess the dataset and build appropriate training models by labeling training instances according to the type of network traffic, malicious or benign.
(4) Train, validate and test each dataset against the machine learning algorithms and generate results for each.
(5) Evaluate the results of the supervised learning models using a set of performance metrics.
(6) Analyze the intrusion detection performance of machine learning classifiers based on the achieved results.
The remainder of the paper is structured as follows. Section 2 presents the literature survey. The dataset and its preprocessing are discussed in Section 3. Results are discussed in Section 4. Section 5 presents the conclusion and future scope.
II. LITERATURE SURVEY
In this section, the related work is presented. The methodologies
used in prior work are summarized in Table I.
TABLE I. RELATED WORK
S.No | Author | Year | Methodology
1 | M. Barati et al. [8] | Aug 2014 | WEKA is used to apply GA and ANN to the CAIDA UCSD 2007 dataset. A precision value of 1 is obtained for both attack and non-attack data. The FP rate is 0 for attack data and 0.002 for non-attack data.
2 | A. Saied et al. [9] | April 2015 | Used ANN to detect DDoS attacks (TCP, UDP, and ICMP) and produced 98% accuracy. It detects known and unknown attacks.
3 | T. Zhao et al. [10] | Aug 2015 | A multi-factor detection approach based on a neural network has been implemented in Apache Hadoop and HBase for detecting DDoS attacks. The dataset gathered from network logs is highly unstructured.
4 | M. Alkasassbeh et al. [11] | Jan 2016 | A new dataset has been generated that includes network layer and application layer attacks (Smurf, UDP-Flood, HTTP-Flood, SIDDoS). On applying MLP, RF and NB, MLP gives the maximum accuracy, 98.63%.
5 | C. J. Hsieh and T. Y. Chan [12] | May 2016 | Apache Spark has been used for handling the large amounts of data in the 2000 DARPA DDoS dataset; ANN then produced 94% accuracy.
6 | T. J. Su et al. [13] | Feb 2017 | Chart analysis of DDoS flooding attacks, reflector attacks and amplification attacks has been done, as it gives a detailed analysis of traffic characteristics.
7 | T. Khalil [14] | June 2017 | Detection and prevention of DDoS attacks using an ANN machine learning algorithm on the proposed framework have been done.
8 | J. Li et al. [15] | Dec 2017 | Feature selection algorithms are classified according to the type of data: conventional, structured, heterogeneous, and streaming.
9 | A. Lohachab and B. Karambir [16] | Sep 2018 | Types of DDoS attacks and their impact on the IoT ecosystem have been discussed. Some automatic and manual tools have also been discussed.
10 | L. Huraj et al. [17] | Sep 2018 | Design of a real-world DDoS attack (UDP-based distributed reflective DoS) is done.
11 | K. Wehbi [18] | April 2019 | Three approaches were used. In the first, SVM, KNN, RF, DT and NN were used, and in the second, QDA, LDA, SVM, NB, KNN, RF and DT. In the third approach, ANN was used and considered an efficient means of DDoS detection.
12 | Prasad et al. [19] | April 2019 | NB, DT, RF, KNN and SGB have been used on three datasets (CSE-CIC-IDS2018-AWS, CICIDS2017, CICDoS2016) for anomaly detection. SGB gives a perfect value of the evaluation parameters. The dataset is a multiclass labeled file. Hyperparameters have been tuned using the Grid Search technique.
13 | S. J. Mary and C. Nalini [20] | Sep 2019 | Anomaly detection on DDoS attacks using SVM, neural networks and DT.
14 | L. R. Brasilino and M. Swamy [21] | Oct 2019 | Used CoAP Accelerometer for DDoS flooding attacks against different services (Software Server and Accelerated Service) with some payload.
15 | A. T. Vasques and J. J. Gondim [22] | Oct 2019 | The saturation point of different IoT devices against varying packet rates has been measured for amplified reflection DDoS attacks.
16 | M. Wang et al. [23] | Jan 2020 | MLP is a type of feedforward ANN and uses a supervised learning technique. Proposed dynamic methods (SBS-MLP, SFS-MLP, CTSBS) for DDoS attack detection on the NSL-KDD dataset; SBS-MLP turns out to be the best.
2021 5th International Conference on Information Systems and Computer Networks (ISCON)
GLA University, Mathura, India. Oct 22-23, 2021
978-1-6654-0341-2/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISCON52037.2021.9702399
III. DATASET PREPROCESSING
In this section, we will discuss the dataset to be used and
how data is prepared.
A. About Dataset
The CICDDoS2019 dataset used in our work has been taken from https://www.unb.ca/cic/datasets/ddos-2019.html. The data were gathered on two days, a training day and a testing day, and have 79 features. Each day's data consists of different types of attacks. As the data contain missing, infinite and undefined values, they have been preprocessed before use. The system has been implemented in Google Colab, a cloud-based Jupyter notebook environment, on CPU and GPU. Tests have been applied to verify the accuracy of the system in detecting DDoS attacks.
B. Data Preparation
This data contains missing values, infinite values and
undefined values. The following tasks have been
performed to preprocess this data.
Missing data: Handling missing data is vital in
machine learning, as it could lead to incorrect predictions
for any model. Accordingly, null values are eliminated by
propagating the last valid observation forward along the
column axis.
Undefined Data: The elimination of null values can
result in undefined data. A null field with no cells on its
left becomes NaN after propagation since there are no
cells to provide a value.
Transformation: The format of the collected data might
not be suitable for modeling. In such cases, data and data
types need to be transformed so that the data can then be
fed into the models, as described by the CRISP-DM
method.
Class Labels: Each dataset instance represents a
snapshot of the network traffic at a given point in time.
These instances are labeled according to the nature of the
traffic, that is, whether the traffic is benign or malicious.
A key characteristic of a good learning model is its
ability to generalize to new, or unseen, data. A model
which is too close to a particular set of data is described as
overfitting, and therefore, will not perform well with
unseen data. A generalized model requires exposure to
multiple variations of input samples. Primarily, models
require two sets of data, one to train and another to test.
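The cleaning steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `is_missing`, `forward_fill_row` and `binary_label`, the sample rows, and the label string `BENIGN` are hypothetical. Infinite values are first treated as missing, each row is then forward-filled from left to right, and a cell with no valid value on its left stays undefined. Dropping rows that remain incomplete is one possible follow-up and is likewise an assumption.

```python
import math

def is_missing(value):
    """Treat None, NaN and +/-inf as missing values."""
    return value is None or (isinstance(value, float) and not math.isfinite(value))

def forward_fill_row(row):
    """Propagate the last valid value in a row to the missing cells on its right."""
    filled, last_valid = [], None
    for value in row:
        if is_missing(value):
            filled.append(last_valid)  # stays None if nothing valid precedes it
        else:
            last_valid = value
            filled.append(value)
    return filled

def binary_label(label, benign_name="BENIGN"):
    """Map a traffic label to 0 (benign) or 1 (malicious); the label name is assumed."""
    return 0 if label.strip().upper() == benign_name else 1

# Hypothetical sample rows containing NaN, inf and None cells.
rows = [
    [3.0, float("nan"), 7.0, float("inf")],
    [None, 2.0, None, 5.0],
]
cleaned = [forward_fill_row(r) for r in rows]

# Assumption: rows that still hold an undefined cell after propagation are dropped.
complete = [r for r in cleaned if None not in r]
```

In the second sample row the first cell has nothing on its left, so it remains undefined after propagation, which is the situation described under "Undefined Data".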
C. Data Modeling
The classification phase consists of two aspects: (1)
the construction of the learning model, and (2) the
generation of the predicted labels.
Training
During the training process, the selected algorithms
are provided with training data to learn from to eventually
create machine learning models. At this point in the
process, the input data source needs to be provided and
should contain the target attribute (class label). The
training process involves finding patterns in the training
set that map the input features with the target attribute.
Based on the observed patterns, a model is produced.
Testing
Testing is conducted to assess how well a model represents the data and how well it will perform in the future. The testing data is used only once. The models are also tested with unseen data. Various performance metrics, such as accuracy, precision, recall, and F-measure, were generated to analyze the performance of the models on the DDoS datasets.
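The four metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes them for a binary problem in plain Python; the function names and the toy label vectors are assumptions made for illustration, with 1 standing for malicious and 0 for benign traffic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F-measure as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical labels: 4 malicious and 6 benign instances.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
report = evaluate(y_true, y_pred)
```

With these toy vectors there are three true positives, one false negative, one false positive and five true negatives.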
D. Methodology
The methodology used in this paper is shown in Fig. 1. The proposed hybrid classification technique is a smart detection approach to counter DDoS attacks on IoT devices. In this methodology, data is first collected and then preprocessed to clean it. The class labels are highly imbalanced, and the Standard Scaler has been used to normalize the feature values. Features are then selected using different feature selection algorithms. Once features are selected, the data is classified using machine learning and deep learning algorithms. Traffic is thereby labeled as malicious or benign, and hence the attacks are detected.
Fig. 1. Hybrid Classification Technique for DDoS Attack Detection
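One way to express the scale, select, classify sequence of Fig. 1 is a scikit-learn `Pipeline`. The sketch below is an assumption about tooling, since the paper does not name its libraries: it uses a synthetic, imbalanced 79-feature dataset in place of CICDDoS2019, a random stratified split instead of separate training and testing days, and default-style hyperparameters chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 79-feature traffic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=79, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                                     # normalize features
    ("select", SelectKBest(score_func=mutual_info_classif, k=45)),   # keep 45 features
    ("classify", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
n_selected = int(pipeline.named_steps["select"].get_support().sum())
```

Fitting the scaler and the selector inside the pipeline means both are learned from the training portion only, so no information from the test portion leaks into feature selection.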
IV. RESULTS AND DISCUSSIONS
Table II illustrates the performance evaluation of the machine learning classifiers. The performance has been analyzed using the following statistical metrics: Accuracy, Precision, Recall, F1-Score, True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, and Training Time. As seen from the results, there are only slight variations in accuracy, but the changes in the false positive rate and true positive rate are worth focusing on. It is also clear from the table that each classifier reaches its best accuracy with a different number of features. Our work proceeds in two phases. In the first phase, we performed a series of iterations in which each feature selection method was applied to each classification algorithm, increasing the number of selected features in steps of 5. For experimentation purposes, we use Chi-Square, Extra Tree, ANOVA and Mutual Information as feature selection methods. These methods are applied to the classification algorithms (Random Forest, Decision Tree, Deep Neural Network, Shallow Neural Network and XGBoost). Random Forest with Mutual Information produces the best results, giving the highest accuracy of 96.77% with 45 features. A low false positive rate and a high true positive rate are prime concerns in the detection of DDoS attacks. Since the best results were achieved with the Mutual Information feature selection method, in the second phase it has been applied to all classifiers with the same number of features, i.e., 45, and the performance is analyzed. As can be seen from Table III, Random Forest again gives superior performance on all parameters except training time. The advantage of this approach is that it makes detection of DDoS attacks easier and more efficient, with a False Positive Rate of 0 and a True Negative Rate of 0.99 for Random Forest (Table III). Mutual Information is a widely applied feature selection criterion in data mining applications. We have also used deep learning algorithms, as their performance tends to increase with the size of the dataset. In conclusion, a blend of feature selection algorithms and machine learning classifiers provides good results.
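The first phase, in which each feature selection method is tried at feature counts that grow in steps of 5, can be sketched as a simple sweep. The code below is an illustrative assumption rather than the authors' implementation: it uses scikit-learn, a small synthetic dataset with 20 features, and a Decision Tree as the only classifier so that it runs quickly. Features are rescaled to [0, 1] because the chi-square score requires non-negative inputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset (an assumption; the paper uses CICDDoS2019).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Chi-square needs non-negative inputs, so features are rescaled to [0, 1].
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

def extra_tree_scores(features, labels):
    """Score features by the importances of an Extra Trees ensemble."""
    model = ExtraTreesClassifier(n_estimators=50, random_state=0)
    return model.fit(features, labels).feature_importances_

scorers = {
    "Chi-Square": lambda f, l: chi2(f, l)[0],
    "Extra Tree": extra_tree_scores,
    "ANOVA": lambda f, l: f_classif(f, l)[0],
    "Mutual Information": lambda f, l: mutual_info_classif(f, l, random_state=0),
}

best = {}  # method name -> (number of features, test accuracy)
for name, scorer in scorers.items():
    ranking = np.argsort(scorer(X_train, y_train))[::-1]   # best feature first
    for k in range(5, X_train.shape[1] + 1, 5):            # 5, 10, 15, 20
        columns = ranking[:k]
        model = DecisionTreeClassifier(random_state=0)
        model.fit(X_train[:, columns], y_train)
        accuracy = model.score(X_test[:, columns], y_test)
        if name not in best or accuracy > best[name][1]:
            best[name] = (k, accuracy)
```

After the loop, `best` holds, for each selection method, the feature count that gave the highest test accuracy, which mirrors how one row per method is reported for each classifier.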
TABLE II. MACHINE LEARNING CLASSIFIER WITH DIFFERENT FEATURE SELECTION METHODS
(Evaluation criteria: number of features, feature reduction ratio, training time)
Classifier | Feature Selection Method | Number of Features | Feature Reduction Ratio (%) | Training Time (in seconds)
RF | Chi-Square | 55 | 30.38 | 756.06
RF | Extra Tree | 35 | 56.70 | 753.35
RF | ANOVA | 65 | 17.73 | 785.58
RF | Mutual Information | 45 | 43.04 | 750.94
XGBoost | Chi-Square | 40 | 49.30 | 1236
XGBoost | Extra Tree | 65 | 17.73 | 2126
XGBoost | ANOVA | 60 | 24.05 | 2023
XGBoost | Mutual Information | 65 | 17.73 | 1516
DT | Chi-Square | 40 | 49.30 | 52
DT | Extra Tree | 10 | 87.34 | 15
DT | ANOVA | 10 | 87.34 | 15
DT | Mutual Information | 15 | 81.01 | 67
SNN | Chi-Square | 40 | 49.30 | 450
SNN | Extra Tree | 5 | 93.67 | 310
SNN | ANOVA | 45 | 43.03 | 512
SNN | Mutual Information | 55 | 30.38 | 487
DNN | Chi-Square | 40 | 49.30 | 746
DNN | Extra Tree | 50 | 36.70 | 685
DNN | ANOVA | 55 | 30.38 | 693
DNN | Mutual Information | 60 | 24.05 | 673
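The paper does not state how the feature reduction ratio is computed. The reported value of 43.04% for 45 features is consistent with the share of the 79 original features that are removed, so the sketch below assumes that definition; the function name is hypothetical.

```python
def feature_reduction_ratio(selected, total=79):
    """Percentage of the original features removed by feature selection.

    Assumed definition: 100 * (total - selected) / total, with 79 features
    as reported for the dataset.
    """
    if not 0 <= selected <= total:
        raise ValueError("selected must lie between 0 and total")
    return 100.0 * (total - selected) / total

for k in (45, 10, 5):
    print(k, round(feature_reduction_ratio(k), 2))
# prints:
# 45 43.04
# 10 87.34
# 5 93.67
```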
TABLE III. MUTUAL INFORMATION FEATURE SELECTION WITH 45 FEATURES
Classification Algorithm | Accuracy (%) | Precision | Recall | F1-Score | TPR | TNR | FPR | FNR | Training Time (in seconds)
RF | 96.77 | 0.97 | 0.97 | 0.96 | 0.86 | 0.99 | 0 | 0.13 | 750.94
XGBoost | 96.67 | 0.96 | 0.96 | 0.96 | 0.85 | 0.99 | 0 | 0.14 | 1166.00
DNN | 96.17 | 0.91 | 0.82 | 0.82 | 0.58 | 0.97 | 0.03 | 0.41 | 590.68
SNN | 95.14 | 0.94 | 0.79 | 0.80 | 0.56 | 0.96 | 0.02 | 0.43 | 438.49
DT | 91.29 | 0.93 | 0.91 | 0.92 | 0.84 | 0.98 | 0.01 | 0.15 | 64.29
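The four rates in Table III are ratios over the cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_rates(tp, fp, tn, fn):
    """Return TPR, TNR, FPR and FNR from binary confusion-matrix counts."""
    positives = tp + fn   # instances that are actually malicious
    negatives = tn + fp   # instances that are actually benign
    return {
        "TPR": tp / positives,   # attacks correctly flagged
        "FNR": fn / positives,   # attacks missed
        "TNR": tn / negatives,   # benign traffic correctly passed
        "FPR": fp / negatives,   # benign traffic wrongly flagged
    }

# Hypothetical counts: 100 attack instances and 1000 benign instances.
rates = detection_rates(tp=86, fp=10, tn=990, fn=14)
accuracy = (86 + 990) / (86 + 10 + 990 + 14)
```

By construction TPR and FNR sum to 1, as do TNR and FPR. With these hypothetical counts the accuracy is about 0.98 while the TPR is only 0.86, which shows why the rates are worth reading alongside accuracy when one class dominates.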
Fig. 2. Accuracy of Classification Algorithms
V. CONCLUSION AND FUTURE SCOPE
In this work, we presented a hybrid classification
technique for the detection of DDoS attacks. The
classification technique used is a combination of supervised
machine learning algorithms and neural networks. Feature
selection algorithms have been applied before classification
is done.
To test the detection accuracy, we compared the performance of the machine learning algorithms after applying the feature selection algorithms. According to the results presented, the combination of Random Forest and Mutual Information performed better than the others, reaching a high true negative rate and a low false positive rate. In the future, we plan to consider hyperparameter tuning to increase detection accuracy.
REFERENCES
[1] V. Gaur and R. Kumar, "Analysis of Machine Learning Classifiers for Early Detection of DDoS attacks on IoT Devices," Arab J Sci Eng, https://doi.org/10.1007/s13369-021-05947-3.
[2] V. Gaur and R. Kumar, "GDH Key Exchange Protocol for Group Security among Hypercube deployed IoT devices," in Mobile Radio Communications and 5G Networks, Lecture Notes in Networks and Systems 140, https://doi.org/10.1007/978-981-15-7130-5_50.
[3] P. Rani, R. K. Gujral, N. S. Ahmed and A. Jain, "A decision support system for heart disease prediction based upon machine learning," J. Reliab. Intell. Environ., DOI: 10.1007/s40860-021-00133-6.
[4] P. Rani, R. Kumar, A. Jain and S. K. Chawla, "A Hybrid Approach for Feature Selection Based on Genetic Algorithm and Recursive Feature Elimination," Int. J. Inf. Syst. Model. Des., IGI Global, vol. 12, no. 2, pp. 17-38, April 2021.
[5] R. Lamba, T. Gulati, H. F. Alharbi and A. Jain, "A hybrid system for Parkinson's disease diagnosis using machine learning techniques," Int. J. Speech Technol., https://doi.org/10.1007/s10772-021-09837-9.
[6] K. Wehbi, L. Hong, T. Al-salah and A. A. Bhutta, "A Survey on Machine Learning-Based Detection on DDoS Attacks for IoT Systems," 2019 SoutheastCon, 2019, pp. 1-6, doi: 10.1109/SoutheastCon42311.2019.9020468.
[7] T. H. Hai, N. T. Khiem and N. H. Phuc, "Towards an online DoS/DDoS Classification: Empirical Study for Network Intrusion Detection Systems," J. Comput. Sci., 2021, pp. 304-318, DOI: 10.3844/JCSSP.2021.304.318.
[8] M. Barati, A. Abdullah, N. I. Udzir, R. Mahmod and N. Mustapha, "Distributed Denial of Service detection using hybrid machine learning technique," in International Symposium on Biometrics and Security Technologies, Kuala Lumpur, Malaysia, Aug 2014.
[9] A. Saied, R. E. Overill and T. Radzik, "Detection of known and unknown DDoS attacks using Artificial Neural Networks," NeuroComputing, vol. 172, pp. 385-393, Jan 2016, http://dx.doi.org/10.1016/j.neucom.2015.04.101.
[10] T. Zhao, D. C. Tien and K. Qian, "A Neural-Network Based DDoS Detection System Using Hadoop and HBase," in IEEE 12th International Conference on Embedded Software and Systems, New York, Aug 2015, doi: 10.1109/HPCC-CSS-ICESS.2015.38.
[11] M. Alkasassbeh, A. B. Hassanat, G. A. Naymat and M. Almseidin, "Detecting distributed denial of service attacks using data mining techniques," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 1, pp. 436-445, Jan 2016.
[12] C. J. Hsieh and T. Y. Chan, "Detection DDoS attacks based on neural network using Apache Spark," in International Conference on Applied System Innovation (ICASI), Okinawa, Japan, 26-30 May 2016.
[13] T. J. Su, S. M. Wang, Y. F. Chen and C. L. Liu, "Attack Detection of Distributed Denial of Service Based on Splunk," in Proceedings of the IEEE International Conference on Advanced Materials for Science and Engineering, Tainan, Feb 2017.
[14] T. Khalil, "IoT Security against DDoS attacks using Machine Learning algorithms," International Journal of Scientific and Research Publications, vol. 7, no. 6, pp. 739-741, ISSN 2250-3153, June 2017.
[15] J. Li, K. Cheng, S. Wang, F. Morstatter, J. Tang and H. Liu, "Feature Selection: A Data Perspective," ACM Computing Surveys, vol. 50, no. 6, Article No. 94, Dec 2017, https://doi.org/10.1145/3136625.
[16] A. Lohachab and B. Karambir, "Critical Analysis of DDoS- An Emerging Security Threat over IoT Networks," Journal of Communications and Information Networks, vol. 3, no. 3, pp. 57-78, Sep. 2018.
[17] L. Huraj, M. Šimon and T. Horak, "IoT measuring of UDP-based distributed reflective DoS attack," in IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia.
[18] K. Wehbi, L. Hong, T. A. Salah and A. A. Bhutta, "A Survey on Machine Learning-Based Detection on DDoS Attacks for IoT Systems," in 2019 Southeast Conference, Huntsville, AL, USA, April 2019.
[19] M. D. Prasad, P. B. V and C. Amarnath, "Machine Learning DDoS Detection Using Stochastic Gradient Boosting," International Journal of Computer Sciences and Engineering, vol. 7, no. 4, pp. 157-166, April 2019.
[20] S. J. Mary and C. Nalini, "Improving DDoS Attack Predection Performance using Ensambling Techniqes," International Journal of Recent Technology and Engineering (IJRTE), vol. 8, no. 3, pp. 4760-4763, Sep 2019, ISSN: 2277-3878.
[21] L. R. Brasilino and M. Swamy, "Mitigating DDoS Flooding Attacks against IoT using Custom Hardware Modules," in Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), Granada, Spain, 22-25 Oct. 2019.
[22] A. T. Vasques and J. J. Gondim, "Amplified Reflection DDoS Attacks over IoT Mirrors: A Saturation Analysis," in Workshop on Communication Networks and Power Systems (WCNPS), Brasilia, Brazil, 2019.
[23] M. Wang, Y. Lu and J. Qin, "A dynamic MLP based DDoS attack detection method using feature selection and feedback," Computers & Security, vol. 88, Jan 2020, DOI: 10.1016/j.cose.2019.101645.