Received: 26 May 2020 Revised: 13 August 2020 Accepted: 18 August 2020
DOI: 10.1002/itl2.232
SPECIAL ISSUE ARTICLE
Intrusion detection systems using classical machine
learning techniques vs integrated unsupervised feature
learning and deep neural network
Shisrut Rawat1, Aishwarya Srinivasan2, Vinayakumar Ravi3, Uttam Ghosh4
1 Vellore Institute of Technology, Vellore, India
2 IBM, New York, New York
3 Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio
4 Vanderbilt University, Nashville, Tennessee
Correspondence
Aishwarya Srinivasan, IBM, New York,
10504, NY.
Email: aishgrt@gmail.com
Security analysts and administrators face many challenges in detecting and preventing network intrusions in
their organizations, and detecting a breach in time is crucial to preventing it. Detecting unforeseen attacks
is particularly challenging. This work compares the performance of classical machine learning approaches, which
require extensive feature engineering, with integrated unsupervised feature learning and deep neural networks
(DNNs) on the NSL-KDD dataset. Various experimental trials were run to identify suitable hyperparameters and
network configurations for the machine learning models. The DNN using 15 features extracted with Principal
Component Analysis (PCA) was the most effective modeling method. Further analysis using only the Software
Defined Networking (SDN) features also showed good accuracy with the deep neural network.
KEYWORDS
deep learning, dimensionality reduction, intrusion detection, machine learning, software defined
networking, unsupervised learning
1 INTRODUCTION
With the era of digitization and the Internet of Everything, where all devices are connected into a single
communication network, network attacks and endpoint attacks have surged.1 Cybersecurity involves techniques
and technologies to protect a device’s software, network, and data from unauthorized and unauthenticated
access, malware attacks, and network attacks.2 Multiple systems have been designed around each of these areas,
each targeting a specific detection and prevention methodology.3 This paper covers network intrusion attacks,
classical and rule-based methods, recent advancements using machine learning, and a proposal for a two-level
model integrating unsupervised feature learning and deep neural networks. A network intrusion detection system
is effective when, apart from identifying known attacks, it can also detect inherited and new attacks. Network
intrusion detection systems (NIDS) are broadly classified into misuse-based or signature-based (SNIDS),
anomaly-based (ANIDS), and ensemble methods.4 In signature-based NIDS, attack signatures are hardcoded and
incoming traffic is matched against these patterns to catch abnormal traffic in the network. In anomaly-based
NIDS, traffic that deviates from normal behavior is flagged, which makes the approach well suited to
recognizing new patterns of abnormal traffic. It is among the most effective approaches for detecting zero-day
attacks, which SNIDS does not handle well. However, the false-positive rate of ANIDS is very high.2 The two
systems can be integrated to leverage the strengths of both SNIDS and ANIDS.
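The contrast between the two detection styles can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the signature strings, the single bytes-per-connection feature, the baseline values, and the 3-standard-deviation threshold are assumptions, not part of any system described in this paper.

```python
import statistics
from typing import Sequence

# Hypothetical hardcoded attack signatures (illustrative only).
SIGNATURES = ("/etc/passwd", "' OR 1=1")


def signature_alert(payload: str) -> bool:
    """Signature-based check: flag traffic that contains a known pattern."""
    return any(sig in payload for sig in SIGNATURES)


def anomaly_alert(value: float, baseline: Sequence[float], k: float = 3.0) -> bool:
    """Anomaly-based check: flag values far from the normal-traffic baseline.

    A value is flagged when it lies more than k standard deviations
    from the baseline mean (k = 3 is an assumed threshold).
    """
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return abs(value - mean) > k * std


# Assumed bytes-per-connection values observed during normal operation.
normal_bytes = [480.0, 510.0, 495.0, 505.0, 520.0, 490.0]

known_attack = "GET /index.php?file=/etc/passwd"
novel_attack = "GET /index.php?file=%2Fetc%2Fshadow"  # matches no stored signature

print(signature_alert(known_attack))         # known pattern is caught
print(signature_alert(novel_attack))         # unseen pattern is missed
print(anomaly_alert(25000.0, normal_bytes))  # unusual volume is flagged
print(anomaly_alert(500.0, normal_bytes))    # typical volume passes
```

The signature check misses the unseen pattern, while the anomaly check reacts to any large deviation, which is also why it tends to raise more false alarms on benign but unusual traffic.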
Internet Technology Letters. 2020;e232. wileyonlinelibrary.com/journal/itl2 © 2020 John Wiley & Sons, Ltd.
https://doi.org/10.1002/itl2.232
Machine learning has shown its capabilities in many domains, and particularly in detecting zero-day attacks.4,5
The use of deep learning algorithms in various cybersecurity applications such as malware analysis, intrusion
detection, and botnet detection has improved results significantly.6 In this paper, machine learning (ML) and
deep learning (DL) models are trained on the NSL-KDD dataset and various performance metrics are compared.
Additionally, a NIDS is designed and tested based exclusively on software-defined networking features. The
major contributions of the proposed work are given below.
• This work proposes a NIDS framework that integrates unsupervised feature learning with deep learning.
• A detailed investigation and analysis are presented on the NSL-KDD dataset.
• The advantage of the dimensionality reduction technique for attaining the best performance in detecting
network intrusions is discussed.
The paper starts with related work on intrusion detection using machine learning techniques, and the
advancements in these methods, in Section 2. Section 3 describes the dataset used for the analysis and the
models built for intrusion detection. Section 4 presents a comparative analysis of multiple classical machine
learning models vs deep learning models. The conclusion and future work are given in Section 5.
2 RELATED WORK
A self-taught learning-based NIDS is proposed in,7 where a sparse autoencoder and softmax regression are used.
The proposed model is trained on the NSL-KDD dataset and achieves an accuracy of around 79.10% for 5-class
classification, which is very close to the performance of other state-of-the-art models. The 23-class and
2-class classification settings also achieve good performance. In,8 the performance of an RNN-based NIDS is
studied. The model is trained on the NSL-KDD dataset, and both binary and multi-class classification are
performed. The RNN-based IDS is reported to be far superior to other traditional approaches in both
classification tasks, and the authors claim that it has a strong modeling capability for intrusion detection.
Unlike the above works,9 proposes an IDS for the SDN environment. A DNN-based model is trained on only six
basic features taken from the NSL-KDD dataset with different learning rates, and it achieves a maximum accuracy
of 75.75%. In,10 a new stacked nonsymmetric deep autoencoder (NDAE) based NIDS is proposed. The model is trained
on both the KDD Cup 99 and NSL-KDD benchmark datasets and its performance is compared with a DBN-based model.
The experimental analysis shows that the NDAE-based approach improves accuracy by up to 5% with a 98.8%
reduction in training time when compared to the DBN-based approach. In,11 the author claims that modeling
network traffic data as a time series improves the performance of IDS. The claim is substantiated by training
LSTM models on the KDD Cup dataset with full and minimal feature sets for 1000 epochs, obtaining a maximum
accuracy of 93.82%. In,12 the effectiveness of CNN and CNN-RNN based models is studied. Models such as CNN,
CNN-LSTM, CNN-GRU, and CNN-RNN are trained on the KDD Cup dataset, and the CNN-based model is observed to
outperform the hybrid CNN-RNN models. Unlike the previously mentioned works,2 analyses several ML-based
approaches for intrusion detection to identify various issues. Issues related to the detection of low-frequency
attacks are discussed along with a possible solution to improve performance further. In,13 a highly scalable
deep learning framework is proposed for intrusion detection at both the network and host level. Various ML and
DNN models are trained on datasets such as KDD Cup, NSL-KDD, WSN-DS, UNSW-NB15, CICIDS 2017, ADFA-LD and
ADFA-WD, and their performance is compared.
3 METHODOLOGY
3.1 Description of dataset
Network security datasets can be obtained in two ways. The first is to capture traffic with packet monitoring
software such as Wireshark, Tcpdump, or WinDump. These data are unlabeled, and labeling them takes a lot of
time, so they may not be suitable for modeling purposes, but they can serve as an out-of-time validation
dataset that helps ensure the robustness of the ML/DL model. The second is to use open-source network security
datasets available for free download, which saves data acquisition time and makes research more efficient
because they require very little cleaning and are already in a condition suitable for modeling. Examples
include the DARPA intrusion detection dataset, the KDD Cup 99 dataset, the ADFA dataset, and the NSL-KDD
dataset.2 Our research used the NSL-KDD dataset,14 an improved version of the KDD Cup 99 dataset. One of the
major drawbacks of the KDD Cup 99 dataset is the large number of duplicate observations in the train and test
sets. The NSL-KDD dataset overcomes this limitation and hence suits our purpose of building robust predictive
models.
For each observation in the NSL-KDD dataset, there are 41 features: 3 are nominal, 4 are binary, and the
remaining 34 are continuous variables. It has 23 traffic classes in the training dataset and 30 in the test
dataset. The attacks can be grouped into four main categories: DoS, probing, U2R and R2L. The features are
classified into three broad types: (a) basic features, (b) content-based features and (c) traffic-based
features. The attack information of the NSL-KDD dataset is listed in Tables 1 and 2.
TABLE 1 Dataset network intrusion details
Traffic    Train    Test
Normal     67343    9711
DoS        45927    7458
U2R           52      67
R2L          995    2887
Probe      11656    2421
TABLE 2 Subcategories of intrusions under each broader class of intrusion (the highlighted attacks are only
present in the test dataset)
Category   Attacks
DoS        back, land, neptune, pod, smurf, teardrop, mailbomb, processtable, udpstorm, apache2, worm
R2L        ftp-write, guess-passwd, imap, multihop, phf, spy, warezmaster, xlock, xsnoop, snmpguess,
           snmpgetattack, httptunnel, sendmail, named
U2R        buffer-overflow, loadmodule, perl, rootkit, sqlattack, xterm, ps
Probe      ipsweep, nmap, portsweep, satan, mscan, saint
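The grouping in Table 2 can be expressed as a lookup table. The following is a minimal sketch: the attack names are copied from Table 2 (the raw data files may spell some of them differently, for example with underscores), while the function names, the use of the string "normal" for benign traffic, and the 0/1 encoding helper are assumptions made for illustration.

```python
# Attack names per category, copied from Table 2.
CATEGORY_ATTACKS = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop", "mailbomb",
            "processtable", "udpstorm", "apache2", "worm"],
    "R2L": ["ftp-write", "guess-passwd", "imap", "multihop", "phf", "spy",
            "warezmaster", "xlock", "xsnoop", "snmpguess", "snmpgetattack",
            "httptunnel", "sendmail", "named"],
    "U2R": ["buffer-overflow", "loadmodule", "perl", "rootkit", "sqlattack",
            "xterm", "ps"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan", "mscan", "saint"],
}

# Invert the table so that each attack name points to its category.
ATTACK_TO_CATEGORY = {
    attack: category
    for category, attacks in CATEGORY_ATTACKS.items()
    for attack in attacks
}


def to_category(label: str) -> str:
    """Map a raw traffic label to Normal or one of the four attack categories."""
    if label == "normal":
        return "Normal"
    return ATTACK_TO_CATEGORY[label]


def to_binary(label: str) -> int:
    """Binary target used for normal-vs-attack classification: 0 normal, 1 attack."""
    return 0 if label == "normal" else 1


labels = ["normal", "neptune", "saint", "rootkit", "guess-passwd"]
print([to_category(label) for label in labels])
print([to_binary(label) for label in labels])
```

Collapsing the labels this way yields the five traffic classes counted in Table 1, or the two classes used for binary classification.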
3.2 Model architecture
The proposal includes unsupervised feature learning combined with a deep neural network, and a deep neural
network without unsupervised feature learning. Following the hyperparameter selection study, a 5-layer deep
neural network was created. The deep neural network (DNN) is an advanced model of the classical feed-forward
network (FFN). As the name indicates, the DNN contains many hidden layers along with the input and output
layers. Increasing the number of layers in an FFN causes the vanishing and exploding gradient issue. To
mitigate this issue, the ReLU non-linear activation was introduced. ReLU helps to protect the weight updates
from vanishing as the error gradient is propagated backward. Compared to saturating non-linear functions, the
first-order derivative of ReLU does not approach zero for large positive input values. The proposed DNN
architecture contains an input layer, five hidden layers, and an output layer. The output layer of the DNN
contains a single unit with a sigmoid activation function, whose output is mapped to either 0 or 1. The value 0
indicates normal and 1 indicates an attack. The DNN model uses binary cross-entropy as the loss function, which
is defined as follows
$$\mathrm{loss}(p, e) = -\frac{1}{N} \sum_{i=1}^{N} \left[ e_i \log(p_i) + (1 - e_i) \log(1 - p_i) \right] \tag{1}$$
where p is the vector of predicted labels, e is the vector of true (expected) labels, and N is the number of samples.
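Equation (1) and the sigmoid output unit can be checked on a toy example. This is a minimal sketch rather than the authors' implementation: the raw output values, the 0.5 decision threshold, and the small clipping constant `eps` (added to avoid taking the logarithm of zero) are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation of the single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, e, eps=1e-12):
    """Equation (1): mean binary cross-entropy of predictions p against labels e."""
    total = 0.0
    for p_i, e_i in zip(p, e):
        p_i = min(max(p_i, eps), 1.0 - eps)  # clip so that log() stays finite
        total += e_i * math.log(p_i) + (1 - e_i) * math.log(1.0 - p_i)
    return -total / len(p)


# Hypothetical raw outputs of the output unit for four connection records.
raw_outputs = [2.0, -1.5, 0.3, -3.0]
expected = [1, 0, 0, 0]  # 1 = attack, 0 = normal

probabilities = [sigmoid(z) for z in raw_outputs]
predicted = [1 if q >= 0.5 else 0 for q in probabilities]  # assumed 0.5 threshold

print(predicted)
print(binary_cross_entropy(probabilities, expected))
```

The third record is misclassified as an attack, so it contributes the largest term to the loss. Confident correct predictions contribute terms close to zero.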
4 EVALUATION & RESULTS
Deep neural networks (DNNs) were trained using the Keras† framework with GPU-enabled TensorFlow* as the
backend. The learning rate of the proposed DNN model is set to 0.01, the optimizer to Adam, and the batch size
to 64. To compare the performance of various models using the NSL-KDD dataset, the following scenarios were
taken into consideration.
1. Classification of the network connection records as normal or attack considering all features present in the NSL-KDD
dataset.
2. Classification of the network connection records as normal or attack considering the minimal feature set9
present in the NSL-KDD dataset.
TABLE 3 Model performance with all features
Algorithm                    Train Accuracy    Validation Accuracy    Test Accuracy
Decision Tree                1.0               0.9978                 0.778
Extra Tree                   1.0               0.9973                 0.767
Ensemble Extra Tree          1.0               0.999                  0.769
Light GBM                    0.996             0.989                  0.776
Deep Neural Network          0.949             0.972                  0.772
PCA + Deep Neural Network    0.967             0.982                  0.793
TABLE 4 Model performance with 6 SDN features
Algorithm                    Train Accuracy    Validation Accuracy    Test Accuracy
Decision Tree                0.978             0.975                  0.712
Extra Tree                   0.978             0.973                  0.744
Ensemble Extra Tree          0.978             0.974                  0.736
Light GBM                    0.976             0.966                  0.742
Deep Neural Network          0.948             0.955                  0.759
In the case of binary classification, each network connection record in the dataset is either Normal or Attack.
Taking Attack as the positive class, the outcomes are defined as follows.
1. True Positive (TP) - Attack connections correctly classified as Attack.
2. True Negative (TN) - Normal connections correctly classified as Normal.
3. False Positive (FP) - Normal connections incorrectly classified as Attack.
4. False Negative (FN) - Attack connections incorrectly classified as Normal.
Accuracy: It is the ratio of the correctly classified network connections to the total number of connections
in the test dataset. The larger the accuracy, the better the classification model; the accuracy score ranges
between 0 and 1. The accuracy score is defined as follows
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
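Equation (2) can be checked on a small example. The sketch below is illustrative only: the helper names and the toy label vectors are assumptions, and Attack (label 1, following the DNN output convention of 0 for normal and 1 for attack) is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for binary labels, with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred != positive and truth != positive:
            tn += 1
        elif pred == positive:
            fp += 1  # predicted attack, actually normal
        else:
            fn += 1  # predicted normal, actually attack
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Equation (2): share of correctly classified connections."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy ground truth and predictions (1 = attack, 0 = normal).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```

Accuracy is symmetric in TP and TN, so swapping which class is called positive changes the individual counts but not the score.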
The models built for the study include Decision Tree, Extra Tree, Ensemble Extra Tree, Light GBM, and DNN. In
addition, instead of using all features as the input to the DNN, PCA15 was applied to the 41 features to
extract 15 reduced features, which were then fed into the DNN. The hyperparameters were tuned for all the
aforementioned models; the details are not reported in this paper. All the models were trained on the NSL-KDD
training data with stratified cross-validation and later tested on the NSL-KDD test data. As listed in the
scenarios above, the models were trained and tested on 41 features and 6 features separately. According to
Tang et al.,9 six features of the intrusion dataset represent the Software Defined Networking (SDN) features,
namely duration, protocol type, source bytes, destination bytes, same host connections, and same service
connections. To compare the performance of the predictive models using all intrusion features vs the SDN
features, the models were also built using just these six features. The results of the models on the train,
validation and test sets are presented in Tables 3 and 4 for the NSL-KDD dataset with 41 features and with the
minimal feature set, respectively. With all 41 features, the test accuracy of the plain DNN (0.772) was
comparable to that of the classical models (0.767 to 0.778), with Decision Tree and Light GBM slightly ahead.
However, the DNN model performs better than the classical models with the minimal feature set. Also, the
performance attained by all the models with the minimal feature set is close to that obtained with all 41
features of the NSL-KDD dataset. This suggests that not all 41 features are significant. Most importantly, the
DNN model performed best on the PCA-reduced dataset (test accuracy 0.793), which indicates that PCA is an
important approach that helps to reduce the noisy features in the dataset.
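The dimensionality reduction step can be sketched with a plain NumPy implementation of PCA. This is a minimal sketch under stated assumptions, not the authors' pipeline: the random matrices stand in for numerically encoded and scaled NSL-KDD features, the helper names `fit_pca` and `apply_pca` are hypothetical, and fitting on the training split only is an assumed (though common) practice that the paper does not spell out.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on X (n_samples, n_features); return the feature means and top components."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X, mean, components):
    """Project X onto the fitted principal components."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 41))  # synthetic stand-in for 41 encoded features
X_test = rng.normal(size=(50, 41))

# Fit on the training split only, then reuse the same projection for the test split.
mean, components = fit_pca(X_train, n_components=15)
Z_train = apply_pca(X_train, mean, components)
Z_test = apply_pca(X_test, mean, components)

print(Z_train.shape, Z_test.shape)
```

The 15-column outputs would then replace the 41-column inputs of the DNN.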
5 CONCLUSION AND FUTURE WORK
In this paper, a deep learning algorithm for intrusion detection in networks was implemented and evaluated. The
test dataset contains multiple new intrusions within each broader category. When the model was trained and
evaluated on the train-validation split, the model performance was considerably higher than the test set
accuracy, where new intrusions are seen. Compared to all other classifiers, the deep neural network with
PCA-based features presents a much better model fit and the best accuracy on the test set, 0.793. The other
models seem to overfit the training data while performing less effectively at recognizing the intrusion
patterns in the test data. Another implementation focuses on the Software Defined Networking variables for
model training and evaluation. With just six of the 41 features, the deep learning model gives an accuracy of
0.759 on the test set with unseen intrusions. In the future, we plan to implement continuous real-time model
training to achieve better performance than model training on static data. In addition, the proposed model can
be evaluated on recently released benchmark NIDS datasets along with NSL-KDD to assess whether it generalizes
and can detect new types of attacks. This is considered one of the significant directions for future work.
ENDNOTES
* https://www.tensorflow.org/
† https://keras.io/
ORCID
Vinayakumar Ravi https://orcid.org/0000-0001-6873-6469
REFERENCES
1. Vinayakumar R, Alazab M, Srinivasan S, Pham QV, Padannayil SK, Simran K. A visualized botnet detection system based deep learning
for the internet of things networks of smart cities. IEEE Trans Indus Appl. 2020;56:4436-4456.
2. Mishra P, Varadharajan V, Tupakula U, Pilli ES. A detailed investigation and analysis of using machine learning techniques for intrusion
detection. IEEE Commun Surv Tut. 2018;21(1):686-728.
3. Vinayakumar R, Soman KP, Poornachandran P. Evaluating effectiveness of shallow and deep networks to intrusion
detection system. In: 2017 International Conference on Advances in Computing, Communications and Informatics
(ICACCI); September 2017:1282-1289. IEEE.
4. Vinayakumar R, Soman KP, Poornachandran P. A comparative analysis of deep learning approaches for network intrusion detection
systems (N-IDSs): deep learning for N-IDSs. Int J Dig Crime Foren. 2019;11(3):65-89.
5. Vinayakumar R, Soman KP, Poornachandran P. Evaluation of recurrent neural network and its variants for intrusion detection system
(IDS). Int J Inform Syst Model Des. 2017;8(3):43-63.
6. Singla A, Bertino E. How deep learning is making information security more intelligent. IEEE Secur Privacy. 2019;17(3):56-65.
7. Javaid A, Niyaz Q, Sun W, Alam M. A deep learning approach for network intrusion detection system. In:
Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies
(formerly BIONETICS); May 2016:21-26.
8. Yin C, Zhu Y, Fei J, He X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access.
2017;5:21954-21961.
9. Tang TA, Mhamdi L, McLernon D, Zaidi SAR, Ghogho M. Deep learning approach for network intrusion detection in
software defined networking. In: 2016 International Conference on Wireless Networks and Mobile Communications
(WINCOM); October 2016:258-263. IEEE.
10. Shone N, Ngoc TN, Phai VD, Shi Q. A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intel.
2018;2(1):41-50.
11. Staudemeyer RC. Applying long short-term memory recurrent neural networks to intrusion detection. South Afr Comput J.
2015;56(1):136-154.
12. Vinayakumar R, Soman KP, Poornachandran P. Applying convolutional neural network for network intrusion
detection. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI);
September 2017:1222-1228. IEEE.
13. Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Al-Nemrat A, Venkatraman S. Deep learning approach for intelligent intrusion
detection system. IEEE Access. 2019;7:41525-41550.
14. Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE
Symposium on Computational Intelligence for Security and Defense Applications; July 2009:1-6. IEEE.
15. RM SP, Maddikunta PKR, Parimala M, et al. An effective feature engineering for DNN using hybrid PCA-GWO for intrusion detection in
IoMT architecture. Comput Commun. 2020.
How to cite this article: Rawat S, Srinivasan A, Ravi V, Ghosh U. Intrusion detection systems using classical
machine learning techniques vs integrated unsupervised feature learning and deep neural network. Internet
Technology Letters. 2020;e232. https://doi.org/10.1002/itl2.232