Network Intrusion Detection System Using J48
Decision Tree
Shailendra Sahu
School of Computer and Information Science
University of Hyderabad
CIAM Lab
IDRBT
Hyderabad, India
shailendrasahu668@gmail.com
B M Mehtre
CIAM Lab
IDRBT
Hyderabad, India
bmmehtre@idrbt.ac.in
Abstract—As the number of cyber attacks has increased, detecting intrusions in networks has become a very difficult job. Many data mining and machine learning techniques are used for network intrusion detection systems (NIDS). However, for evaluation, most researchers have used the KDD Cup 99 data set, which has been widely criticized for not reflecting the current network situation. In this paper we use a newer labelled network data set, called the Kyoto 2006+ data set, in which every instance is labelled as normal (no attack), attack (known attack) or unknown attack. We use the decision tree (J48) algorithm to classify network connections, which can be used for NIDS. For training and testing we used 134665 network instances. The generated rules classify a connection, i.e., as no attack, known attack or unknown attack, with 97.2% correctness.
Keywords—Data Mining, Decision tree, Intrusion Detection
System, Kyoto data set, J48 algorithm.
I. INTRODUCTION
Due to the large number of cyber crimes and the large volume of data in the cyber world, data mining techniques are a good option for addressing cyber security challenges. Data mining is the extraction of knowledge from a large amount of data [15]. Data mining uses statistical techniques, mathematical algorithms and machine learning methods to discover hidden, valid patterns and relationships among the attributes of a large data set, which are useful for finding malicious actions. For detecting cyber attacks, intrusion detection is one of the most popular techniques.
In this paper we discuss Network Intrusion Detection Systems (NIDS). NIDS are categorised on the basis of detection technique: one is anomaly based and the other is signature based [1]. An anomaly based NIDS generates an alert when the system deviates from its normal behaviour. A signature based NIDS generates an alert when the analyzed data matches a known attack pattern (signature).
Most of the available NIDSs are signature based. [2] states that “anomaly-based NIDS have one great advantage over signature-based ones: they can detect threats for which there exists no signature yet, including zero-day and targeted attacks”. So signature based NIDS are generally unable to detect unknown attacks.
In this paper, we apply a decision tree, the J48 algorithm, to the Kyoto 2006+ data set for intrusion detection. As the well-known KDD Cup data set has the fatal problem that it cannot reflect current network situations and the latest attack trends [13], we move to the Kyoto 2006+ data set.
II. LITERATURE SURVEY
Many data mining techniques have been used for intrusion detection. In 1980, James P. Anderson [3] classified the threats and introduced a system that can detect anomalies in user behaviour. Later, many researchers used different techniques, e.g., Support Vector Machines (SVM), Rough Set Theory (RST), Principal Component Analysis (PCA), genetic network programming (GNP) and Levenberg-Marquardt (LM) learning, to build efficient intrusion detection systems. In 2006, Shai Rubin, Somesh Jha and Barton P. Miller introduced a technique called protomatching that combines protocol analysis, normalization and pattern matching into a single phase [4]. In 2009, Meng Jianliang and Shang Haikun [5] used the K-means clustering algorithm for intrusion detection. Later, in 2010, Mohammadreza Ektefa, Sara Memar, Fatimah Sidi and Lilly Suriani Affendey [6] used two techniques, C4.5 and SVM, for detecting network intrusions and found that the C4.5 algorithm performs better than SVM. Zubair A. Baig [7], in his AODE-based NIDS, suggested that Naive Bayes does not accurately detect network intrusions. In 2012, Yogendra Kumar Jain [8] compared four machine learning algorithms, J48, BayesNet, OneR and NB, for intrusion detection; the results show that the J48 decision tree gives higher accuracy than the other three algorithms. In the same year, R. Rangadurai Karthick [9] introduced an adaptive NIDS using a hybrid, two-stage approach: in the first stage a probabilistic classifier is used, whereas in the second stage an HMM-based traffic model is used. V. Jaiganesh [10] used a kernelized extreme learning machine with Levenberg-Marquardt learning for intrusion detection. Gholam Reza Zargar [11] introduced a category-based IDS using PCA. Christopher Symons and Justin Beaver [16] described the application of a carefully selected nonparametric, semi-supervised learning algorithm to the network intrusion problem; in their study they compared the performance of different model types using feature-based data derived from an operational network. In [17], Chitrakar et al. proposed a hybrid approach combining k-means clustering with Naive Bayes classification. The Kyoto 2006+ data set is used for simulation in [16] and [17].
978-1-4799-8792-4/15/$31.00 © 2015 IEEE
III. DECISION TREE
Decision tree is a classification technique. It is based on
divide and conquer strategy. A decision tree consists decision
nodes and leaf nodes, where decision node specifies a test
over one of the attributes and a leaf node represents the class
value [12]. Every path from the root node to leaf node is
rule. Classification error is the performance major factor for
decision tree. Classification error is defined as the percentage
of misclassified cases [12]. In practice, the training data sets
are usually large, which results in more number of branches
and layers in the generated decision tree. In decision tree
when the class categories are more, classification accuracy
is significantly reduced. There are different algorithms for
generating decision tree such as ID3, J48, FT, BFTree, LMT
and many more. For our study we use J48 algorithm as it has
more accuracy rate [8]. J48 algorithm is proposed by Quinalan
in 1993.
A. Algorithm
Algorithm 1 Pseudo code for the C4.5 (J48) algorithm
1: Create a root node N;
2: if (all cases in T belong to the same class C)
     { mark N as a leaf node of class C; return N; }
3: for i = 1 to n
     { calculate InformationGain(A_i); }
4: ta = testing attribute;
5: N.ta = the attribute with the highest information gain;
6: if (N.ta is continuous)
     { find the threshold; }
7: for (each subset T' in the splitting of T)
8:   if (T' is empty)
       { the child of N is a leaf node; }
9:   else
       { the child of N = dtree(T'); }
10: calculate the classification error rate of node N;
11: return N;
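The recursion above can be made concrete with a short, runnable sketch. The following Python code is not the authors' implementation: it handles only categorical attributes and omits C4.5's continuous-threshold handling and error-rate bookkeeping (steps 6 and 10), so it is a minimal illustration of steps 2-9 only.

```python
import math
from collections import Counter

def entropy(labels):
    """info(T): entropy of the class distribution of the case set."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain of splitting the cases on a categorical attribute."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in subsets.values())

def dtree(rows, labels, attrs):
    """Recursive skeleton of Algorithm 1 (no pruning, no thresholds)."""
    if len(set(labels)) == 1:              # step 2: one class -> leaf
        return labels[0]
    if not attrs:                          # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # steps 3-5
    node = {"attr": best, "children": {}}
    parts = {}
    for row, y in zip(rows, labels):       # split T by the chosen attribute
        parts.setdefault(row[best], ([], []))
        parts[row[best]][0].append(row)
        parts[row[best]][1].append(y)
    rest = [a for a in attrs if a != best]
    for value, (r, ys) in parts.items():   # steps 7-9: recurse on each T'
        node["children"][value] = dtree(r, ys, rest)
    return node
```

On a toy three-connection sample with hypothetical `service` and `flag` attributes, the sketch produces a nested dict whose root test plays the role of a decision node and whose string leaves are class values.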
B. Information Gain
The information gain of an attribute A is calculated as follows:

    gain(A) = info(T) − Σ_{i=1..s} (|T_i| / |T|) × info(T_i)

where T is the set of cases and T_i (i = 1 to s) are the subsets of T, each consisting of the cases with one distinct value of attribute A. info(T) is the entropy function, defined as follows:

    info(T) = − Σ_{j=1..Nclass} (freq(C_j, T) / |T|) × log2(freq(C_j, T) / |T|)

In practice, the generated decision tree may be large, which makes it unreadable. In C4.5 the decision tree can be simplified by adjusting the confidence level.
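As a worked check of these formulas, the short Python sketch below computes info(T) and the gain of one split. The 9/5 class mix and the 8/6 partition are the classic "windy" attribute from Quinlan's weather example, not from the Kyoto data; the class names are illustrative.

```python
import math
from collections import Counter

def info(labels):
    """info(T) = -sum_j (freq(C_j,T)/|T|) * log2(freq(C_j,T)/|T|)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, subsets):
    """gain = info(T) - sum_i (|T_i|/|T|) * info(T_i)."""
    n = len(labels)
    return info(labels) - sum(len(t) / n * info(t) for t in subsets)

T = ["attack"] * 9 + ["normal"] * 5          # info(T) is about 0.940 bits
split = [["attack"] * 6 + ["normal"] * 2,    # T_1: 8 cases
         ["attack"] * 3 + ["normal"] * 3]    # T_2: 6 cases
# gain = 0.940 - (8/14)*0.811 - (6/14)*1.000, about 0.048 bits
```

J48 evaluates this quantity for every candidate attribute and splits on the one with the highest gain, as in steps 3-5 of Algorithm 1.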
TABLE I. CONFUSION MATRIX
                  Predicted Class +ve   Predicted Class -ve
Actual Class +ve  TP                    FN
Actual Class -ve  FP                    TN
C. Confusion Matrix
A confusion matrix is a table for visualizing the performance of an algorithm. Table I shows the confusion matrix.
A confusion matrix has four measurement factors: true positive (TP), true negative (TN), false positive (FP) and false negative (FN).
1) True Positive: TP is the number of positive instances correctly predicted as positive.
2) True Negative: TN is the number of negative instances correctly predicted as negative.
3) False Positive: FP is the number of negative instances incorrectly predicted as positive.
4) False Negative: FN is the number of positive instances incorrectly predicted as negative.
D. Accuracy
For a decision tree, accuracy means the percentage of correctly classified instances. Accuracy is calculated from the confusion matrix as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
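In code this is a one-line computation; the helper below is a generic illustration, not part of the paper's toolchain, and the four counts in the comment are made up for the example.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified instances (Section III-D)."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 90 TP, 50 TN, 5 FP, 5 FN -> 140 correct out of 150 instances
```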
IV. DATA SET
The KDD Cup 99 data set has been used for a long time for evaluating network intrusion detection systems. However, it has a major problem: it cannot reflect the current network situation and the latest attack trends. In this study we use a newer data set, called Kyoto 2006+, which is built on three years of real traffic data obtained from diverse types of honeypots [13].
Fig. 1. Instances of Kyoto 2006+ Data set (24 Features)
The Kyoto 2006+ data set consists of twenty-four statistical features: fourteen features extracted based on the KDD Cup 99 data set and ten additional features. All features are listed in Table II. Among the ten additional features, three are particularly important:
2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
1) IDS Detection: This feature shows the alert generated by an IDS, where '0' indicates that no alert was triggered. The Symantec IDS is used for this feature.
2) Malware Detection: This feature indicates whether malicious software was observed in the connection; the ClamAV software is used for it. '0' indicates that no malware was observed, and a string indicates the corresponding malware observed in the connection.
3) Ashula Detection: This feature shows whether any exploit or shell code was used in the connection.
All instances in the data set are labelled as normal (1), attack (-1) or unknown attack (-2). The data set is freely available, with a description, at http://www.takakura.com/Kyoto_data/ [14]. Fig 1 shows instances of the Kyoto 2006+ data set. In our study we selected 15 features: the first fourteen conventional features, plus the label from the additional features. We used the Perl language for extracting the features from the existing Kyoto 2006+ data set; the Perl code is shown in Fig 2.
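The authors' Perl script appears in Fig 2; the Python sketch below is an equivalent illustration, not that script. It assumes each Kyoto 2006+ record is one tab-separated line with the 24 fields in the Table II order; the actual delimiter and field order should be verified against the downloaded files.

```python
def extract(line):
    """Select the 14 conventional features plus the label (feature 18 in
    Table II) from one Kyoto 2006+ record.  Assumes tab-separated fields
    in the Table II order."""
    fields = line.rstrip("\n").split("\t")
    features = fields[:14]                      # features 1-14
    label = {"1": "normal",                     # feature 18: the label
             "-1": "known attack",
             "-2": "unknown attack"}[fields[17]]
    return features, label
```

Streaming every line of a session file through `extract()` yields the 15-attribute instances used for training and testing.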
TABLE II. FEATURES OF KYOTO 2006+ DATA SET
No. Feature No. Feature
1 Duration 13 Dst host srv serror rate
2 Service 14 Flag
3 Source bytes 15 IDS detection
4 Destination bytes 16 Malware detection
5 Count 17 Ashula detection
6 Same srv rate 18 Label
7 Serror rate 19 Source IP Address
8 Srv serror rate 20 Source Port Number
9 Dst host count 21 Destination IP Address
10 Dst host srv count 22 Destination Port Number
11 Dst host same src port rate 23 Start time
12 Dst host serror rate 24 Duration
Fig. 2. Perl Code For ’feature selection’ and labelling (normal, known attack
& unknown attack)
V. EXPERIMENT & RESULT
For the experiment we use the popular data mining tool WEKA 3.6.10. Fig 3 shows the output generated by the WEKA classifier.
Fig. 3. Output generated by WEKA J48 Classifier
The experiment is performed on an Intel Core i5 system with 4 GB RAM, running Ubuntu. We use the J48 decision tree for our experiment. The sample training data consist of 134665 instances: 44257 normal, 86649 attack and 3759 unknown attack. For testing and training we used 10-fold cross validation. For this experiment we preprocessed the data, i.e., we discretized the instances, as shown in Fig 4.
Fig. 4. Attribute visualization of the transformed Kyoto 2006+ data set after discretization, where the 3 colors indicate the 3 different classes: BLUE = No Attack, RED = Known Attack & LIGHT BLUE = Unknown Attack
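The discretization step can be illustrated with simple equal-width binning. WEKA's unsupervised Discretize filter uses equal-width binning by default; the helper below is an illustrative stand-in for that filter, and the bin count is an assumption, not taken from the paper.

```python
def discretize(values, bins=10):
    """Map each numeric value to an equal-width bin index in [0, bins-1]."""
    lo, hi = min(values), max(values)
    if lo == hi:                       # constant attribute: single bin
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

Applying such a transform to each numeric attribute (e.g. duration, source bytes) turns continuous columns into the categorical ones visualized in Fig 4.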
As output we obtained a decision tree. Fig 5 shows the tree visualization generated by WEKA.
Fig. 5. Decision Tree Generated by the WEKA Tool, where the root node
is SERVICE
The generated tree has 652 leaves, and the size of the tree is 689.
The built model correctly classified 130931 (97.23%) instances and misclassified 3734 (2.77%) instances. Table III shows the detailed summary. Table IV and Table V show the confusion matrix and the detailed accuracy by class, respectively.
TABLE III. DETAILED SUMMARY GENERATED BY WEKA
1 Correctly classified instances 130931
2 Incorrectly classified instance 3734
3 Kappa statistic 0.9401
4 Mean absolute error 0.0272
5 Root mean squared error 0.1176
6 Relative absolute error 8.5413%
7 Root relative squared error 29.4943%
TABLE IV. CONFUSION MATRIX (ROWS: ACTUAL CLASS; COLUMNS: PREDICTED CLASS)
                  normal   attack   unknown attack
normal             44140      117                0
attack                49    86387              213
unknown attack         0     3355              404
TABLE V. DETAILED ACCURACY BY CLASS
Class           TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
normal          0.997    0.001    0.999      0.997   0.998      0.999
attack          0.997    0.072    0.961      0.997   0.979      0.994
unknown attack  0.107    0.002    0.655      0.107   0.185      0.964
Weighted Avg.   0.972    0.047    0.965      0.972   0.963      0.995
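The headline numbers in Tables III and V can be reproduced directly from the confusion matrix in Table IV; the small Python sketch below does so (the helper names are ours, not WEKA's).

```python
# Confusion matrix from Table IV; rows are the actual class,
# columns the predicted class: normal, attack, unknown attack.
cm = [
    [44140,   117,    0],
    [   49, 86387,  213],
    [    0,  3355,  404],
]

def tp_rate(cm, i):
    """Recall of class i: correct predictions over the actual-class total."""
    return cm[i][i] / sum(cm[i])

def overall_accuracy(cm):
    """Correctly classified instances over all instances (trace / total)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
```

`tp_rate` yields 0.997, 0.997 and 0.107 for the three classes, matching Table V, and `overall_accuracy` yields 0.9723, matching the 97.23% reported above; the low unknown-attack recall (404 of 3759) is visible directly in the bottom row of the matrix.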
VI. CONCLUSION
In this paper, we used the Kyoto 2006+ data set, which is built on three years of real traffic data (Nov. 2006 to Aug. 2009). We used the J48 decision tree for network intrusion detection and obtained an accuracy of approximately 97.23%. Using the WEKA 3.6.10 tool, we built the decision tree for detecting intrusions in the Kyoto 2006+ data set and obtained a high true positive rate (99.7%) for normal and attack packets. As a result, the generated tree classified 130931 out of 134665 instances correctly, which is 97.23%. Our simulation results show that the model is able to detect unknown attacks too. From this study we can state that the Kyoto 2006+ data set can be used for network intrusion detection.
REFERENCES
[1] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Computers and Security, vol. 28, no. 1-2, pp. 18-28, 2009.
[2] D. Hadiosmanovic, L. Simionato, D. Bolzoni, E. Zambon, and S. Etalle, “N-Gram against the machine: on the feasibility of the n-gram network analysis for binary protocols,” in Research in Attacks, Intrusions, and Defenses, 2012, pp. 354-373.
[3] James P. Anderson, “Computer Security Threat Monitoring and Surveil-
lance,” Technical report, James P. Anderson Co., Fort Washington, Feb
1980.
[4] Shai Rubin, Somesh Jha, and Barton P. Miller, “Protomatching Network
Traffic for High Throughput Network Intrusion Detection,” In the Pro-
ceedings of the 13th ACM conference on Computer and Communications
Security, pages 47-58. ACM, 2006.
[5] Meng Jianliang, and Shang Haikun, “The application on intrusion de-
tection based on K-Means cluster algorithm,” International Forum on
Information Technology and Application, 2009.
[6] Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, and Lilly Suriani
Affendey, “Intrusion Detection Using Data Mining Techniques,” In the
proceedings of IEEE International Conference on Information Retrieval
& Knowledge Management, Exploring Invisible World, CAMP10, 2010,
pp. 200-203.
[7] Zubair A. Baig, Abdulrhman S. Shaheen, and Radwan AbdelAal, “An
AODE-based Intrusion Detection System for Computer Networks,” World
Congress on Internet Security (WorldCIS), pp. 28-35, IEEE 2011.
[8] Yogendra Kumar Jain and Upendra, “An Efficient Intrusion Detection
Based on Decision Tree Classifier Using Feature Reduction,” Interna-
tional Journal of Scientific and Research Publications, vol. 2, issue 1,
ISSN 2250-3153, Jan. 2012
[9] Rangadurai Karthick R., Hattiwale V.P., and Ravindran B., “Adaptive
network intrusion detection system using a hybrid approach,” 4th Inter-
national Conference on Communication Systems and Networks (COM-
SNETS), vol.1, no. 7, pp. 3-7, Jan. 2012
[10] V Jaiganesh and P Sumathi, “Kernelized Extreme Learning Machine
with Levenberg-Marquardt Learning Approach towards Intrusion Detec-
tion,” International Journal of Computer Applications, vol. 54, pp. 38-44,
September 2012.
[11] Gholam Reza Zargar, and Tania Baghaie, “Category-Based Intrusion
Detection Using PCA,” Journal of Information Security, vol. 3, no.4,
2012.
[12] Ruggieri S., “Efficient C4.5 [classification algorithm],” IEEE transac-
tion on Knowledge and Data Engineering, vol. 14, no.2, pp. 438-444,
Mar/Apr 2002.
[13] Jungsuk Song, Hiroki Takakura, Yasuo Okabe, Masashi Eto, Daisuke
Inoue, and Koji Nakao, “Statistical Analysis of Honeypot Data and
Building of Kyoto 2006+ Dataset for NIDS Evaluation,” In the proceed-
ings of the 1st Workshop on Building Analysis Datasets and Gathering
Experience Returns for Security, pp. 29-36, 2011.
[14] Kyoto 2006+ data set. Available at: http://www.takakura.com/Kyoto_data/
[15] Sumeet Dua, and Xian Du, “Data Mining and Machine Learning in
Cybersecurity”, Auerbach Publications, 2011.
[16] Christopher T. Symons, and Justin M. Beaver, “Nonparametric Semi-Supervised Learning for Network Intrusion Detection: Combining Performance Improvements with Realistic In-Situ Training,” in Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence (AISec'12), 2012.
[17] Chitrakar R., and Huang Chuanhe, “Anomaly detection using Support Vector Machine classification with k-Medoids clustering,” 3rd Asian Himalayas International Conference on Internet (AH-ICI), pp. 23-25, Nov 2012.