Network Intrusion Detection System Using J48
Decision Tree
Shailendra Sahu
School of Computer and Information Science
University of Hyderabad
CIAM Lab
IDRBT
Hyderabad, India
shailendrasahu668@gmail.com
B M Mehtre
CIAM Lab
IDRBT
Hyderabad, India
bmmehtre@idrbt.ac.in
Abstract—As the number of cyber attacks has increased, detecting intrusions in networks has become a very difficult job. Many data mining and machine learning techniques are used for network intrusion detection systems (NIDS). However, for evaluation, most researchers have used the KDD Cup 99 data set, which has been widely criticized for not reflecting the current network situation. In this paper we use a newer labelled network data set, called the Kyoto 2006+ data set, in which every instance is labelled as normal (no attack), attack (known attack) or unknown attack. We use the decision tree (J48) algorithm to classify network connections, which can be used for NIDS. For training and testing we used 134665 network instances. The generated rules classify a connection, i.e., as no attack, known attack or unknown attack, with 97.2% correctness.
Keywords—Data Mining, Decision tree, Intrusion Detection
System, Kyoto data set, J48 algorithm.
I. INTRODUCTION
Due to the large number of cyber crimes and the large volume of data in the cyber world, data mining techniques are a good option for addressing cyber security challenges. Data mining is the extraction of knowledge from a large amount of data [15]. Data mining uses statistical techniques, mathematical algorithms and machine learning methods to discover hidden, valid patterns and relationships among the attributes of a large data set, which are useful for finding malicious actions. For detecting cyber attacks, intrusion detection is one of the most popular techniques.
In this paper we discuss Network Intrusion Detection Systems (NIDS). NIDS are categorised on the basis of detection technique: one is anomaly based and the other is signature based [1]. An anomaly based NIDS generates an alert when the system deviates from its normal behaviour. A signature based NIDS generates an alert when the analyzed data matches a known attack pattern (signature).
Most of the available NIDSs are signature based. [2] states that “anomaly-based NIDS have one great advantage over signature-based ones: they can detect threats for which there exists no signature yet, including zero-day and targeted attacks”. So signature based NIDS are generally unable to detect unknown attacks.
In this paper, we apply a decision tree, the J48 algorithm, to the Kyoto 2006+ data set for intrusion detection. As the well-known KDD Cup data set has the fatal problem that it cannot reflect current network situations and the latest attack trends [13], we move to the Kyoto 2006+ data set.
II. LITERATURE SURVEY
Many data mining techniques have been used for intrusion detection. In 1980, James P. Anderson [3] classified the threats and introduced a system that can detect anomalies in user behaviour. Later, many researchers used different techniques, e.g., Support Vector Machines (SVM), Rough Set Theory (RST), Principal Component Analysis (PCA), genetic network programming (GNP) and Levenberg-Marquardt (LM) learning, to build efficient intrusion detection systems. In 2006, Shai Rubin, Somesh Jha and Barton P. Miller introduced a technique called protomatching that combines protocol analysis, normalization and pattern matching into a single phase [4]. In 2009, Meng Jianliang and Shang Haikun [5] used the K-means clustering algorithm for intrusion detection. Later, in 2010, Mohammadreza Ektefa, Sara Memar, Fatimah Sidi and Lilly Suriani Affendey [6] used two techniques, C4.5 and SVM, for detecting network intrusions and found that the C4.5 algorithm performs better than SVM. Zubair A. Baig [7], in his AODE-based NIDS, suggested that Naive Bayes does not accurately detect network intrusions. In 2012, Yogendra Kumar Jain [8] compared four machine learning algorithms, J48, BayesNet, OneR and NB, for intrusion detection; the results show that the J48 decision tree gives higher accuracy than the other three algorithms. In the same year, R. Rangadurai Karthick [9] introduced an adaptive NIDS using a hybrid, two-stage approach: in the first stage a probabilistic classifier is used, whereas in the second stage an HMM-based traffic model is used. V. Jaiganesh [10] used a kernelized extreme learning machine with Levenberg-Marquardt learning for intrusion detection. Gholam Reza Zargar [11] introduced a category-based IDS using PCA. Christopher Symons and Justin Beaver [16] described the application of a carefully selected nonparametric, semi-supervised learning algorithm to the network intrusion problem; in their study they compared the performance of different model types using feature-based data derived from an operational network. In [17], Chitrakar et al. proposed a hybrid approach combining k-means clustering with Naive Bayes classification. The Kyoto 2006+ data set is used for simulation in [16] and [17].
978-1-4799-8792-4/15/$31.00 © 2015 IEEE
III. DECISION TREE
Decision tree is a classification technique. It is based on
divide and conquer strategy. A decision tree consists decision
nodes and leaf nodes, where decision node specifies a test
over one of the attributes and a leaf node represents the class
value [12]. Every path from the root node to leaf node is
rule. Classification error is the performance major factor for
decision tree. Classification error is defined as the percentage
of misclassified cases [12]. In practice, the training data sets
are usually large, which results in more number of branches
and layers in the generated decision tree. In decision tree
when the class categories are more, classification accuracy
is significantly reduced. There are different algorithms for
generating decision tree such as ID3, J48, FT, BFTree, LMT
and many more. For our study we use J48 algorithm as it has
more accuracy rate [8]. J48 algorithm is proposed by Quinalan
in 1993.
A. Algorithm
Algorithm 1 Pseudo code for the C4.5 (J48) algorithm
1: Create a root node N;
2: if (all cases in T belong to the same class C)
     { mark N as a leaf node of class C; return N; }
3: for i = 1 to n
     { calculate InformationGain(A_i); }
4: ta = testing attribute;
5: N.ta = the attribute with the highest information gain;
6: if (N.ta is continuous)
     { find the threshold; }
7: for (each subset T' in the splitting of T)
8:   if (T' is empty)
       { the child of N is a leaf node; }
9:   else
       { the child of N = dtree(T'); }
10: calculate the classification error rate of node N;
11: return N;
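The recursion above can be made concrete with a short, runnable sketch. The following Python code is not the authors' implementation: it handles only categorical attributes and omits C4.5's continuous-threshold handling and error-rate bookkeeping (steps 6 and 10), so it is a minimal illustration of steps 2-9 only.

```python
import math
from collections import Counter

def entropy(labels):
    """info(T): entropy of the class distribution of the case set."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain of splitting the cases on a categorical attribute."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in subsets.values())

def dtree(rows, labels, attrs):
    """Recursive skeleton of Algorithm 1 (no pruning, no thresholds)."""
    if len(set(labels)) == 1:              # step 2: one class -> leaf
        return labels[0]
    if not attrs:                          # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # steps 3-5
    node = {"attr": best, "children": {}}
    parts = {}
    for row, y in zip(rows, labels):       # split T by the chosen attribute
        parts.setdefault(row[best], ([], []))
        parts[row[best]][0].append(row)
        parts[row[best]][1].append(y)
    rest = [a for a in attrs if a != best]
    for value, (r, ys) in parts.items():   # steps 7-9: recurse on each T'
        node["children"][value] = dtree(r, ys, rest)
    return node
```

On a toy three-connection sample with hypothetical `service` and `flag` attributes, the sketch produces a nested dict whose root test plays the role of a decision node and whose string leaves are class values.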
B. Information Gain
The information gain of an attribute A is calculated as follows:

    gain(A) = info(T) − Σ_{i=1..s} (|T_i| / |T|) × info(T_i)

where T is the set of cases and T_i (i = 1 to s) are the subsets of T, each consisting of the cases with one distinct value of attribute A. info(T) is the entropy function, defined as follows:

    info(T) = − Σ_{j=1..Nclass} (freq(C_j, T) / |T|) × log2(freq(C_j, T) / |T|)

In practice, the generated decision tree may be large, which makes it unreadable. In C4.5 the decision tree can be simplified by adjusting the confidence level.
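As a worked check of these formulas, the short Python sketch below computes info(T) and the gain of one split. The 9/5 class mix and the 8/6 partition are the classic "windy" attribute from Quinlan's weather example, not from the Kyoto data; the class names are illustrative.

```python
import math
from collections import Counter

def info(labels):
    """info(T) = -sum_j (freq(C_j,T)/|T|) * log2(freq(C_j,T)/|T|)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, subsets):
    """gain = info(T) - sum_i (|T_i|/|T|) * info(T_i)."""
    n = len(labels)
    return info(labels) - sum(len(t) / n * info(t) for t in subsets)

T = ["attack"] * 9 + ["normal"] * 5          # info(T) is about 0.940 bits
split = [["attack"] * 6 + ["normal"] * 2,    # T_1: 8 cases
         ["attack"] * 3 + ["normal"] * 3]    # T_2: 6 cases
# gain = 0.940 - (8/14)*0.811 - (6/14)*1.000, about 0.048 bits
```

J48 evaluates this quantity for every candidate attribute and splits on the one with the highest gain, as in steps 3-5 of Algorithm 1.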
TABLE I. CONFUSION MATRIX
                  Predicted Class +ve   Predicted Class -ve
Actual Class +ve  TP                    FN
Actual Class -ve  FP                    TN
C. Confusion Matrix
A confusion matrix is a table for visualizing the performance of an algorithm. Table I shows the confusion matrix.
A confusion matrix has four measurement factors: true positive (TP), true negative (TN), false positive (FP) and false negative (FN).
1) True Positive: TP is the number of positive instances correctly predicted as positive.
2) True Negative: TN is the number of negative instances correctly predicted as negative.
3) False Positive: FP is the number of negative instances incorrectly predicted as positive.
4) False Negative: FN is the number of positive instances incorrectly predicted as negative.
D. Accuracy
For a decision tree, accuracy means the percentage of correctly classified instances. Accuracy is calculated from the confusion matrix as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
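In code this is a one-line computation; the helper below is a generic illustration, not part of the paper's toolchain, and the four counts in the comment are made up for the example.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified instances (Section III-D)."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 90 TP, 50 TN, 5 FP, 5 FN -> 140 correct out of 150 instances
```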
IV. DATA SET
The KDD Cup 99 data set has been used for a long time for evaluating network intrusion detection systems. However, it has a major problem: it cannot reflect the current network situation and the latest attack trends. In this study we use a newer data set, called Kyoto 2006+, which is built on three years of real traffic data obtained from diverse types of honeypots [13].
Fig. 1. Instances of Kyoto 2006+ Data set (24 Features)
The Kyoto 2006+ data set consists of twenty-four statistical features: fourteen features extracted based on the KDD Cup 99 data set and ten additional features. All features are listed in Table II. Among the ten additional features, three are particularly important:
2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
1) IDS Detection: This feature shows the alert generated by an IDS, where '0' indicates that no alert was triggered. The Symantec IDS is used for this feature.
2) Malware Detection: This feature indicates whether malicious software was observed in the connection; the ClamAV software is used for it. '0' indicates that no malware was observed, and a string indicates the corresponding malware observed in the connection.
3) Ashula Detection: This feature shows whether any exploit or shell code was used in the connection.
All instances in the data set are labelled as normal (1), attack (-1) or unknown attack (-2). The data set is freely available, with a description, at http://www.takakura.com/Kyoto_data/ [14]. Fig 1 shows instances of the Kyoto 2006+ data set. In our study we selected 15 features: the first fourteen conventional features, plus the label from the additional features. We used the Perl language for extracting the features from the existing Kyoto 2006+ data set; the Perl code is shown in Fig 2.
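The authors' Perl script appears in Fig 2; the Python sketch below is an equivalent illustration, not that script. It assumes each Kyoto 2006+ record is one tab-separated line with the 24 fields in the Table II order; the actual delimiter and field order should be verified against the downloaded files.

```python
def extract(line):
    """Select the 14 conventional features plus the label (feature 18 in
    Table II) from one Kyoto 2006+ record.  Assumes tab-separated fields
    in the Table II order."""
    fields = line.rstrip("\n").split("\t")
    features = fields[:14]                      # features 1-14
    label = {"1": "normal",                     # feature 18: the label
             "-1": "known attack",
             "-2": "unknown attack"}[fields[17]]
    return features, label
```

Streaming every line of a session file through `extract()` yields the 15-attribute instances used for training and testing.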
TABLE II. FEATURES OF KYOTO 2006+ DATA SET
No. Feature No. Feature
1 Duration 13 Dst host srv serror rate
2 Service 14 Flag
3 Source bytes 15 IDS detection
4 Destination bytes 16 Malware detection
5 Count 17 Ashula detection
6 Same srv rate 18 Label
7 Serror rate 19 Source IP Address
8 Srv serror rate 20 Source Port Number
9 Dst host count 21 Destination IP Address
10 Dst host srv count 22 Destination Port Number
11 Dst host same src port rate 23 Start time
12 Dst host serror rate 24 Duration
Fig. 2. Perl Code For ’feature selection’ and labelling (normal, known attack
& unknown attack)
V. EXPERIMENT & RESULT
For the experiment we use the popular data mining tool WEKA 3.6.10. Fig 3 shows the output generated by the WEKA classifier.
Fig. 3. Output generated by WEKA J48 Classifier
The experiment is performed on an Intel Core i5 system with 4 GB RAM, running Ubuntu. We use the J48 decision tree for our experiment. The sample training data consist of 134665 instances: 44257 normal, 86649 attack and 3759 unknown attack. For testing and training we used 10-fold cross validation. For this experiment we preprocessed the data, i.e., we discretized the instances, as shown in Fig 4.
Fig. 4. Attribute visualization of the transformed Kyoto 2006+ data set after discretization, where the 3 colors indicate the 3 different classes: BLUE = No Attack, RED = Known Attack & LIGHT BLUE = Unknown Attack
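The discretization step can be illustrated with simple equal-width binning. WEKA's unsupervised Discretize filter uses equal-width binning by default; the helper below is an illustrative stand-in for that filter, and the bin count is an assumption, not taken from the paper.

```python
def discretize(values, bins=10):
    """Map each numeric value to an equal-width bin index in [0, bins-1]."""
    lo, hi = min(values), max(values)
    if lo == hi:                       # constant attribute: single bin
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

Applying such a transform to each numeric attribute (e.g. duration, source bytes) turns continuous columns into the categorical ones visualized in Fig 4.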
As output we obtained a decision tree. Fig 5 shows the tree visualization generated by WEKA.
Fig. 5. Decision Tree Generated by the WEKA Tool, where the root node
is SERVICE
The generated tree has 652 leaves, and the size of the tree is 689.
The built model correctly classified 130931 (97.23%) instances and misclassified 3734 (2.77%) instances. Table III shows the detailed summary. Table IV and Table V show the confusion matrix and the detailed accuracy by class, respectively.
TABLE III. DETAILED SUMMARY GENERATED BY WEKA
1 Correctly classified instances 130931
2 Incorrectly classified instance 3734
3 Kappa statistic 0.9401
4 Mean absolute error 0.0272
5 Root mean squared error 0.1176
6 Relative absolute error 8.5413%
7 Root relative squared error 29.4943%
TABLE IV. CONFUSION MATRIX (ROWS: ACTUAL CLASS; COLUMNS: PREDICTED CLASS)
                  normal   attack   unknown attack
normal             44140      117                0
attack                49    86387              213
unknown attack         0     3355              404
TABLE V. DETAILED ACCURACY BY CLASS
Class           TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
normal          0.997    0.001    0.999      0.997   0.998      0.999
attack          0.997    0.072    0.961      0.997   0.979      0.994
unknown attack  0.107    0.002    0.655      0.107   0.185      0.964
Weighted Avg.   0.972    0.047    0.965      0.972   0.963      0.995
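The headline numbers in Tables III and V can be reproduced directly from the confusion matrix in Table IV; the small Python sketch below does so (the helper names are ours, not WEKA's).

```python
# Confusion matrix from Table IV; rows are the actual class,
# columns the predicted class: normal, attack, unknown attack.
cm = [
    [44140,   117,    0],
    [   49, 86387,  213],
    [    0,  3355,  404],
]

def tp_rate(cm, i):
    """Recall of class i: correct predictions over the actual-class total."""
    return cm[i][i] / sum(cm[i])

def overall_accuracy(cm):
    """Correctly classified instances over all instances (trace / total)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
```

`tp_rate` yields 0.997, 0.997 and 0.107 for the three classes, matching Table V, and `overall_accuracy` yields 0.9723, matching the 97.23% reported above; the low unknown-attack recall (404 of 3759) is visible directly in the bottom row of the matrix.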
VI. CONCLUSION
In this paper, we used the Kyoto 2006+ data set, which is built on three years of real traffic data (Nov. 2006 to Aug. 2009). We used the J48 decision tree for network intrusion detection and obtained an accuracy of approximately 97.23%. Using the WEKA 3.6.10 tool, we built the decision tree for detecting intrusions in the Kyoto 2006+ data set and obtained a high true positive rate (99.7%) for normal and attack packets. As a result, the generated tree classified 130931 out of 134665 instances correctly, which is 97.23%. Our simulation results show that the model is able to detect unknown attacks too. From this study we can state that the Kyoto 2006+ data set can be used for network intrusion detection.
REFERENCES
[1] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Computers and Security, vol. 28, no. 1-2, pp. 18-28, 2009.
[2] D. Hadiosmanovic, L. Simionato, D. Bolzoni, E. Zambon, and S. Etalle, “N-Gram against the machine: on the feasibility of the n-gram network analysis for binary protocols,” in Research in Attacks, Intrusions, and Defenses, 2012, pp. 354-373.
[3] James P. Anderson, “Computer Security Threat Monitoring and Surveil-
lance,” Technical report, James P. Anderson Co., Fort Washington, Feb
1980.
[4] Shai Rubin, Somesh Jha, and Barton P. Miller, “Protomatching Network
Traffic for High Throughput Network Intrusion Detection,” In the Pro-
ceedings of the 13th ACM conference on Computer and Communications
Security, pages 47-58. ACM, 2006.
[5] Meng Jianliang, and Shang Haikun, “The application on intrusion de-
tection based on K-Means cluster algorithm,” International Forum on
Information Technology and Application, 2009.
[6] Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, and Lilly Suriani
Affendey, “Intrusion Detection Using Data Mining Techniques,” In the
proceedings of IEEE International Conference on Information Retrieval
& Knowledge Management, Exploring Invisible World, CAMP10, 2010,
pp. 200-203.
[7] Zubair A. Baig, Abdulrhman S. Shaheen, and Radwan AbdelAal, “An
AODE-based Intrusion Detection System for Computer Networks,” World
Congress on Internet Security (WorldCIS), pp. 28-35, IEEE 2011.
[8] Yogendra Kumar Jain and Upendra, “An Efficient Intrusion Detection
Based on Decision Tree Classifier Using Feature Reduction,” Interna-
tional Journal of Scientific and Research Publications, vol. 2, issue 1,
ISSN 2250-3153, Jan. 2012
[9] Rangadurai Karthick R., Hattiwale V.P., and Ravindran B., “Adaptive
network intrusion detection system using a hybrid approach,” 4th Inter-
national Conference on Communication Systems and Networks (COM-
SNETS), vol.1, no. 7, pp. 3-7, Jan. 2012
[10] V Jaiganesh and P Sumathi, “Kernelized Extreme Learning Machine
with Levenberg-Marquardt Learning Approach towards Intrusion Detec-
tion,” International Journal of Computer Applications, vol. 54, pp. 38-44,
September 2012.
[11] Gholam Reza Zargar, and Tania Baghaie, “Category-Based Intrusion
Detection Using PCA,” Journal of Information Security, vol. 3, no.4,
2012.
[12] Ruggieri S., “Efficient C4.5 [classification algorithm],” IEEE transac-
tion on Knowledge and Data Engineering, vol. 14, no.2, pp. 438-444,
Mar/Apr 2002.
[13] Jungsuk Song, Hiroki Takakura, Yasuo Okabe, Masashi Eto, Daisuke
Inoue, and Koji Nakao, “Statistical Analysis of Honeypot Data and
Building of Kyoto 2006+ Dataset for NIDS Evaluation,” In the proceed-
ings of the 1st Workshop on Building Analysis Datasets and Gathering
Experience Returns for Security, pp. 29-36, 2011.
[14] Kyoto 2006+ data set. Available at: http://www.takakura.com/Kyoto_data/
[15] Sumeet Dua, and Xian Du, “Data Mining and Machine Learning in
Cybersecurity”, Auerbach Publications, 2011.
[16] Christopher T. Symons, and Justin M. Beaver, “Nonparametric Semi-Supervised Learning for Network Intrusion Detection: Combining Performance Improvements with Realistic In-Situ Training,” in Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence (AISec'12), 2012.
[17] Chitrakar R., and Huang Chuanhe, “Anomaly detection using Support Vector Machine classification with k-Medoids clustering,” 3rd Asian Himalayas International Conference on Internet (AH-ICI), pp. 23-25, Nov 2012.