
Malicious Packet Classification Based on Neural Network Using Kitsune Features

March 2022


DOI:10.1007/978-3-031-08277-1_25

Conference: International Conference on Intelligent Systems and Pattern Recognition

Authors:

Chansu Han

National Institute of Information and Communications Technology




Malicious Packet Classification Based on Neural Network Using Kitsune Features

Kohei Miyamoto¹, Hiroki Goto¹, Ryosuke Ishibashi¹, Han Chansu², Tao Ban², Takeshi Takahashi², and Jun'ichi Takeuchi¹

¹ Kyushu University, Fukuoka, Japan

² National Institute of Information and Communications Technology, Tokyo, Japan

Abstract. Network Intrusion Detection Systems (NIDSes) play an important role in security operations to detect and defend against cyberattacks. As artificial intelligence (AI)-powered NIDSes can adapt to various kinds of attacks by exploiting the knowledge present in the data, they are in high demand for handling today's cyberattacks, which are increasing in diversity and intensity. In this paper, we present a feasibility study on neural network (NN)-based NIDSes aiming to solve the packet classification problem: distinguishing malicious packets from benign packets while specifying the class of anomaly to which a malicious packet belongs. We employ the features defined by Kitsune, a lightweight NN-based packet anomaly detector, as inputs to our classifier. A Kitsune feature vector is composed of statistics calculated from a single packet and its predecessors using a successive algorithm. We evaluate the proposed packet classification scheme using the CSE-CIC-IDS2018 open dataset. The experimental results show that our method can achieve good performance for particular attack types, so that it can meet the requirements of a practical NIDS.

Keywords: Network intrusion detection system · packet classification · neural networks

1 Introduction

The number and variety of devices connected to the Internet have been growing exponentially in recent years, and so have the cyberattacks targeting these devices. In defending against these cyberattacks, network intrusion detection systems (NIDSes) help security operators by monitoring network traffic, detecting suspicious behaviors therein, and issuing alerts based on the detection results. Many kinds of NIDSes have been proposed so far. Depending on their detection mechanisms and implementations, these NIDSes have their own pros and cons. Generally, a proper aggregation of the outputs from multiple NIDSes is expected to provide better security protection than a single appliance.

In this paper, we discuss effective ways to develop an AI-powered packet classifier that can predict the class name of the cyberattack for each attack packet. AI-powered NIDSes can be roughly divided into two categories: anomaly detectors and multi-class classifiers. An anomaly detector outputs values called anomaly scores that measure whether captured packets are benign or not. In contrast, a classifier outputs the class labels of the anomalies that the packets belong to. Both types use feature vectors extracted from monitored traffic as their input. However, anomaly detectors can be trained in an unsupervised way, so their training data need not be labeled, whereas classifiers have to be trained using labeled data.

We use Kitsune [5], a well-known AI-based packet anomaly detector, as the basis for developing our AI-powered NIDS. The first step of our AI-powered NIDS is to extract the input features from monitored traffic. For that process, we utilize the feature extractor of Kitsune, which employs a successive algorithm to extract statistical features that characterize the packet and the communication sessions the packet belongs to. Example features include the length and protocol of the packet and the frequency of packet communication between two hosts. Based on these features, Kitsune performs packet-level anomaly detection using the reconstruction error of auto-encoders.

The system framework proposed in Kitsune has proved to be effective and efficient as a packet anomaly detector. In this paper, we seek to extend its application to the multi-class packet classification problem. To do so, we design a new NN-based packet classifier that can exploit the knowledge in Kitsune features to predict the attack type associated with a packet. We evaluate the proposed scheme using the CSE-CIC-IDS2018 open dataset [2, 7]. The results of our experiment show that our classifier performs well for many classes in the dataset.

In the rest of this paper, we explain the feature extraction in Kitsune and our scheme to classify packets using the Kitsune features. Finally, we describe the experiments used to evaluate our scheme, show their results, and discuss them.

2 Related Work

Ishibashi et al. [3] proposed a method to generate labeled datasets using alerts from existing NIDSes; such datasets can be used to train classifiers like ours. Hwang et al. [1] proposed another packet classification method using features based on word embedding techniques. Takahashi et al. [8] proposed the integration of various methods for analysing cyberattacks. Trainable NIDSes like ours will be useful as components of such integrated systems.

3 Preliminaries

In this section, we first introduce the problem setting of packet classification. Then, we provide a brief introduction to Kitsune and its feature extraction.

3.1 Problem Setting

When an NIDS is working, it monitors traffic in a specified network. The traffic can be represented as a sequence of packets. Let $P$ be the set of all possible packets and $p_1, p_2, \ldots$ be a sequence of packets captured by the NIDS. We assume each packet is timestamped when it is captured. Given a finite set of classes $C$, a packet classifier can be regarded as a mapping from $P$ to $C$. A NN for packet classification can likewise be regarded as a mapping from some feature space to the set of probability vectors over $C$. In this paper, we consider only simple feed-forward NNs. Therefore, the feature space is the real vector space $\mathbb{R}^D$ of fixed dimension $D$. Hence, a feature extraction method can be regarded as a mapping from $P$ to $\mathbb{R}^D$. When a packet is captured, the NIDS extracts a feature vector from it. Then, the NIDS inputs the feature vector to the NN and obtains a probability vector over $C$ as its output. As a classifier, the NIDS outputs the class that has the maximum probability in the distribution.

Labeled data for training such classifiers are pairs of a packet and a class, i.e., members of $P \times C$. Each class in $C$ is one-hot encoded before it is given to the NN. A NN for classification is usually trained by minimizing the categorical cross-entropy between output vectors and true labels. The categorical cross-entropy is a loss function commonly used in multi-class classification. When a given label is the $i$-th class in the set of classes $C$, the label is encoded into a one-hot vector $y = (y_1, \ldots, y_{|C|})$ where $y_i = 1$ and all other elements are 0. For an output probability vector $\hat{y}$ of the NN and an encoded label $y$, the categorical cross-entropy loss is defined as $L(y, \hat{y}) = -\sum_{j=1}^{|C|} y_j \log \hat{y}_j$. Minimizing the categorical cross-entropy loss drives the NN output toward an approximation of the conditional probability of the classes given the input features. The data is usually split into two subsets, a train set and a test set. During the training phase, we optimize the weights of the NN using the train set. During the testing phase, we evaluate the performance of the trained NN using the test set. Several measures of classifier performance exist; precision, recall and F-measure are commonly used.
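As an illustration of the one-hot encoding, the categorical cross-entropy and the argmax decision described in this section, the following is a minimal NumPy sketch. The function names, the clipping constant `eps` and the three-class example are assumptions made for illustration and are not taken from our implementation.

```python
import numpy as np

def one_hot(class_index, num_classes):
    """Encode the i-th class as a vector with a single 1 at position i."""
    y = np.zeros(num_classes)
    y[class_index] = 1.0
    return y

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """L(y, y_hat) = -sum_j y_j * log(y_hat_j).

    eps is an assumed numerical guard against log(0).
    """
    y_hat = np.clip(y_hat, eps, 1.0)
    return float(-np.sum(y * np.log(y_hat)))

def predict_class(y_hat):
    """Return the index of the class with the maximum probability."""
    return int(np.argmax(y_hat))

# Hypothetical example with |C| = 3 and true class index 1.
y = one_hot(1, 3)
y_hat = np.array([0.2, 0.7, 0.1])
loss = categorical_cross_entropy(y, y_hat)  # equals -log(0.7)
pred = predict_class(y_hat)
```

Because $y$ is one-hot, only the term of the true class contributes to the sum, so the loss reduces to the negative log-probability assigned to that class.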

3.2 Feature Extraction of Kitsune

Kitsune [5] is an NIDS built on a NN-based anomaly detector. A reference implementation of Kitsune is provided at [9]. The anomaly detector of Kitsune has a unique structure consisting of an ensemble of auto-encoders and a preprocessing method called the feature mapper. However, in this paper, we use only the feature extractor of Kitsune. Therefore, we do not explain the details of its anomaly detector.

The feature extractor of Kitsune is designed to process arriving packets successively without large memory consumption. Each captured packet provides a timestamp, a packet size, MAC addresses, IP addresses and TCP/UDP ports. The feature extractor uses this information from a given packet to calculate a feature vector and to update the states used for the calculation.

The feature extractor maintains statistics called damped incremental statistics. For a parameter $\lambda > 0$, an incremental statistic is a 3-tuple of real values denoted as $IS_\lambda = (w, LS, SS)$. Each incremental statistic $IS_\lambda$ is associated with a data stream determined by MAC addresses, IP addresses and TCP/UDP ports. Each packet is likewise associated with some data streams. Data streams are divided into the following 4 types.

– srcIP: the source IP address of a packet

– srcMAC-IP: (srcMAC, srcIP), the pair of source MAC address and source IP address of a packet

– Channel: (srcIP, dstIP), the pair of source and destination IP addresses of a packet

– Socket: (srcIP, srcPort, dstIP, dstPort), the 4-tuple of IP addresses and TCP/UDP ports used by a packet

Therefore, for each packet, the feature extractor updates 4 incremental statistics. Each incremental statistic is initialized with zero values. Let $x$ be the packet size of a given packet for the 3 types of incremental statistics other than the Channel type, and let $x$ be a jitter value for the Channel type, where the jitter value is defined as the difference between the packet's timestamp and the timestamp of the last packet observed between the same IP addresses. For a packet with a timestamp $t$, each incremental statistic is updated as follows.

$$\gamma = 2^{-\lambda (t - t_{\mathrm{last}})}, \tag{1}$$

$$(w, LS, SS) \leftarrow (\gamma w + 1,\ \gamma LS + x,\ \gamma SS + x^2), \tag{2}$$

where $t_{\mathrm{last}}$ is the timestamp of the last packet associated with the same stream and $\leftarrow$ denotes assignment to the variables on the left side. These updates can be done successively without keeping information about packets processed in the past, except for the timestamp $t_{\mathrm{last}}$. The parameter $\lambda$ determines the intensity of the time decay applied by multiplying by $\gamma$. The feature extractor uses multiple values of $\lambda$. It extracts features based on each of them, concatenates these features, and outputs the concatenated feature vector.
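The update rule in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a simplified illustration for a single stream and a single $\lambda$, not the reference implementation [9]; the class name `IncStat`, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IncStat:
    """Damped incremental statistic (w, LS, SS) for one stream and one lambda."""
    lam: float
    w: float = 0.0
    ls: float = 0.0
    ss: float = 0.0
    t_last: Optional[float] = None

    def update(self, x: float, t: float) -> None:
        # Eq. (1): decay factor. For the first packet the state is all zeros,
        # so the value of gamma does not matter; 1.0 is used here.
        if self.t_last is None:
            gamma = 1.0
        else:
            gamma = 2.0 ** (-self.lam * (t - self.t_last))
        # Eq. (2): decay the old state, then add the new observation.
        self.w = gamma * self.w + 1.0
        self.ls = gamma * self.ls + x
        self.ss = gamma * self.ss + x * x
        self.t_last = t

# Hypothetical example: two packets of size 100 and 200, one second apart.
stat = IncStat(lam=1.0)
stat.update(x=100.0, t=0.0)
stat.update(x=200.0, t=1.0)  # gamma = 2**-1 = 0.5
```

Only the three accumulated values and the last timestamp are stored per stream, which is what allows the extractor to run without retaining past packets.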

From an incremental statistic, we obtain the statistics $\mu = LS/w$ and $\sigma = \sqrt{|SS/w - (LS/w)^2|}$. They approximate the mean and the standard deviation of $x$ observed over some recent period, respectively. Since each of them depends on a single data stream, these features are called 1D statistics in [5]. For Channel and Socket type streams, 4 other kinds of statistics, called 2D statistics, are defined. They depend on 2 data streams, for example 2 streams associated with different source IP addresses. They reflect characteristics such as the covariance and correlation between the 2 streams.
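As a small worked example of the 1D statistics, the sketch below computes $\mu$ and $\sigma$ from a tuple $(w, LS, SS)$. The function name and the input values are hypothetical; the inputs correspond to two undamped observations $x = 100$ and $x = 200$, for which the ordinary mean and standard deviation are recovered.

```python
import math

def damped_mean_std(w, ls, ss):
    """Return (mu, sigma) with mu = LS/w and sigma = sqrt(|SS/w - (LS/w)^2|).

    Assumes w > 0, which holds after at least one update of the statistic.
    """
    mu = ls / w
    sigma = math.sqrt(abs(ss / w - mu * mu))
    return mu, sigma

# w = 2, LS = 100 + 200, SS = 100**2 + 200**2 (no decay applied).
mu, sigma = damped_mean_std(w=2.0, ls=300.0, ss=50000.0)
```

The absolute value inside the square root guards against small negative values that can arise from floating-point rounding.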

The feature extractor extracts 20 statistics from a packet for each $\lambda$. The extracted statistics consist of $4 \times 3 = 12$ 1D statistics and $2 \times 4 = 8$ 2D statistics. In [5], $\lambda = 5, 3, 1, 0.1, 0.01$ are employed. The extracted feature vectors used for anomaly detection are $5 \times 20 = 100$ dimensional vectors consisting of 60 1D statistics and 40 2D statistics.

4 Methodology

In this section, we propose a new packet classification method based on NNs and Kitsune features.


Suppose we have a dataset consisting of packets and labels that indicate which class each packet belongs to. Using the relationship between packets and labels in the dataset, we can construct a packet classifier by supervised learning. A NN is a powerful model capable of learning such relationships between inputs and outputs. NN-based packet classifiers can adapt to various characteristics of attacks and traffic by learning from appropriate data.

We propose using Kitsune features as inputs for a NN-based packet classifier. Kitsune uses the features produced by its feature extractor for anomaly detection. However, we can also use them as inputs for NN-based packet classification. Although Kitsune features are originally 100-dimensional vectors, we use only the 60-dimensional subset consisting of 1D statistics. The reason is that extracting the 2D statistics takes too much computation time on large datasets.

In this paper, we use a simple feed-forward NN as the classifier. Our classifier has a 60-dimensional input layer and a softmax layer as its output layer. The output for each input is regarded as a probability vector over the set of classes to predict. Our classifier takes the argmax of the output probability vector and outputs the corresponding class as the prediction.

Using Kitsune features as inputs to a NN has the following benefit. After training, we can use the NN for online processing, because the feature extraction is done in an online manner.

5 Experiment

In this section, we show the experimental results using the open dataset CSE-CIC-IDS2018 [2]. First, we provide an introduction to the dataset. Then, we describe the setting of the experiments and show their results.

5.1 CSE-CIC-IDS2018 Dataset

The CSE-CIC-IDS2018 dataset [2, 6] is an open dataset of cyberattack traffic. This dataset is provided by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC) and distributed at [7].

The CSE-CIC-IDS2018 dataset was generated by simulating a network with benign traffic and running tools to attack the simulated network. This dataset contains the raw captured traffic per day and information about the attacks. The attack information includes the kind of each attack, the period of each attack, and the IP addresses of the attackers and targets of each attack. Therefore, we can label packets in the captured traffic using the attack information.

This dataset consists of 10 days of data. The traffic was captured per day and per machine in the network. Each day includes traffic from 1 to 3 kinds of attacks, and all days include benign traffic. The periods of the different kinds of attacks do not overlap. This dataset contains 14 kinds of attacks in total. Therefore, including the benign class, we use 15 classes in our experiments.


5.2 Labeling and Feature Extraction

We label the data by the following procedure. For each packet, we look at its capture timestamp and IP addresses. If the timestamp falls within the period of a kind of attack and the packet was transmitted from the attackers to the targets, we label the packet with the attack's name. If no attack period contains the timestamp of the packet, or the packet is not from attackers to targets, we label the packet "BENIGN".
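The labeling rule above can be sketched as follows. This is a minimal illustration, assuming the attack information has been loaded into simple records; the `AttackWindow` type, the attack name `ExampleAttack`, the timestamps and the IP addresses are all hypothetical placeholders, not values from the dataset.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class AttackWindow:
    """One attack period with its attacker and target IP addresses."""
    name: str
    start: float  # period start (epoch seconds)
    end: float    # period end (epoch seconds)
    attackers: FrozenSet[str]
    targets: FrozenSet[str]

def label_packet(timestamp: float, src_ip: str, dst_ip: str,
                 windows: List[AttackWindow]) -> str:
    """Return the attack name if the packet goes from an attacker to a
    target within the attack period, otherwise "BENIGN"."""
    for win in windows:
        in_period = win.start <= timestamp <= win.end
        attacker_to_target = src_ip in win.attackers and dst_ip in win.targets
        if in_period and attacker_to_target:
            return win.name
    return "BENIGN"

windows = [AttackWindow("ExampleAttack", 100.0, 200.0,
                        frozenset({"10.0.0.5"}), frozenset({"192.168.1.10"}))]
labels = [
    label_packet(150.0, "10.0.0.5", "192.168.1.10", windows),  # attacker to target, in period
    label_packet(150.0, "192.168.1.10", "10.0.0.5", windows),  # reply direction
    label_packet(250.0, "10.0.0.5", "192.168.1.10", windows),  # outside the period
]
```

Since the attack periods do not overlap, at most one window can match a given packet, so returning the first match is sufficient.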

Since the number of benign packets is usually much larger than the number of anomalous packets, for each day we use only the packets captured at the target IP addresses of the attacks, to ease the label imbalance.

We use the reference implementation of the feature extractor in [9]. The extraction was done in the temporal order of capture.

5.3 Experiments and Results

We conducted two types of experiments. The first type uses the data of each day separately. Since each day's data contains at most four classes, we performed classification with at most four classes in this type of experiment. We held out the last 20% of the packets in each attack duration as the test data. We used the rest of the packets in each attack duration, together with the packets captured starting 30 minutes before the attack duration, as the training data. As an exception, for the data from 2018/02/21, we used only a 25% sub-sample of such training data, because the total number of packets on this date is too large to use in our experiments. The sub-sampling was stratified.
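The temporal hold-out described above can be sketched as below. The sketch assumes that the 20% is counted by number of packets and that the packets of one attack duration are already sorted by capture time; the function name and the toy input are hypothetical.

```python
def split_attack_duration(packets, test_fraction=0.2):
    """Split time-ordered packets of one attack duration into
    (earlier part for training, last part for testing)."""
    n_test = round(len(packets) * test_fraction)
    split = len(packets) - n_test
    return packets[:split], packets[split:]

# Toy example: 10 packets identified by their position in time.
ordered_packets = list(range(10))
train_part, test_part = split_attack_duration(ordered_packets)
```

Holding out the tail of each attack, rather than a random subset, keeps every test packet later in time than the training packets of the same attack.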

The second type is an experiment using the data from all the days. We drew a stratified sample of 20000 packets from each day's training data used in the previous experiments and merged them into a sub-dataset, which we call the mixture data. We performed 15-class classification with this mixture data. In the test phase, we used the same test data as in the first type of experiments.

In all experiments, we used NNs consisting of an input layer, 3 hidden layers and a softmax layer. Each hidden layer has 16 units. We used hyperbolic tangent activation functions in the hidden layers. We implemented the NNs using TensorFlow and used Adam [4] as the optimizer. The initial learning rate was 0.001. The training batch size was 1024. We stopped training early when the validation loss did not improve on its minimum for 5 epochs.
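To make the shape of the network concrete, the following sketch reproduces only its forward pass in plain NumPy: a 60-dimensional input, 3 hidden layers of 16 tanh units, and a softmax output over 15 classes. It is not our TensorFlow implementation; the weights are random placeholders, training with Adam and early stopping is omitted, and all function names are hypothetical.

```python
import numpy as np

def init_params(n_in=60, hidden=(16, 16, 16), n_out=15, seed=0):
    """Create random (weights, bias) pairs for each layer (placeholders)."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_out)
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, x):
    """tanh hidden layers followed by a softmax output layer."""
    h = x
    for weights, bias in params[:-1]:
        h = np.tanh(h @ weights + bias)
    weights, bias = params[-1]
    return softmax(h @ weights + bias)

params = init_params()
batch = np.random.default_rng(1).normal(size=(4, 60))  # 4 fake feature vectors
probs = forward(params, batch)   # one probability vector per packet
pred = probs.argmax(axis=1)      # predicted class index per packet
```

With these sizes the model has 60·16 + 16 + 2·(16·16 + 16) + 16·15 + 15 = 1775 trainable parameters, which keeps per-packet inference cheap.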

In each experiment, we repeated the following procedure 5 times and calculated the mean values of the evaluation metrics obtained. First, we initialize a model. Then, we randomly split the training data into train/validation sets with a ratio of 3:1. Finally, we train the model and evaluate it with the test data. All features are standardized based on the train set in both training and evaluation.
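The 3:1 split and the train-set-based standardization can be sketched as follows. Labels are omitted for brevity, the random data stand in for real feature vectors, and the handling of zero-variance features is an assumption; the function name is hypothetical.

```python
import numpy as np

def split_and_standardize(x_train_all, x_test, seed=0):
    """Randomly split into train/validation (3:1), then scale all sets
    with the mean and standard deviation of the train set only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x_train_all))
    n_train = (3 * len(x_train_all)) // 4
    x_train = x_train_all[order[:n_train]]
    x_val = x_train_all[order[n_train:]]

    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0.0] = 1.0  # assumed guard for constant features

    def scale(x):
        return (x - mean) / std

    return scale(x_train), scale(x_val), scale(x_test)

rng = np.random.default_rng(42)
features = rng.normal(5.0, 2.0, size=(400, 60))       # placeholder training data
test_features = rng.normal(5.0, 2.0, size=(100, 60))  # placeholder test data
x_tr, x_va, x_te = split_and_standardize(features, test_features)
```

Computing the statistics on the train set alone avoids leaking information from the validation and test sets into the model inputs.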

The metrics we employed to evaluate a classifier were precision, recall and F-measure. Let $T$ be a test set of pairs of a feature vector and a label. Let $p(x)$ be the class predicted by the classifier for the input feature vector $x$. The metrics are defined for each class $c$ as $\mathrm{precision}(c) = TP(c)/(TP(c)+FP(c))$, $\mathrm{recall}(c) = TP(c)/(TP(c)+FN(c))$ and $F(c) = 2\,\mathrm{precision}(c) \cdot \mathrm{recall}(c)/(\mathrm{precision}(c)+\mathrm{recall}(c))$, where $TP(c) = \sum_{(x,y) \in T} I(p(x) = c, y = c)$, $TN(c) = \sum_{(x,y) \in T} I(p(x) \neq c, y \neq c)$, $FP(c) = \sum_{(x,y) \in T} I(p(x) = c, y \neq c)$, $FN(c) = \sum_{(x,y) \in T} I(p(x) \neq c, y = c)$ and $I(\cdot)$ is the indicator function. All of these metrics take values in $[0, 1]$, and larger values mean better performance.

Fig. 1: Experimental results per day

Fig. 2: Experimental results for the mixture data
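The per-class metrics defined in this section can be computed directly from lists of true and predicted labels, as in the following sketch. The function name and the toy labels are hypothetical, and returning 0 when a denominator is zero is an assumed convention that the definitions above do not specify.

```python
def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, F-measure) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == cls and t == cls)
    fp = sum(1 for t, p in pairs if p == cls and t != cls)
    fn = sum(1 for t, p in pairs if p != cls and t == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure

# Toy example: class "A" has TP = 2, FP = 0, FN = 1.
y_true = ["A", "A", "B", "BENIGN", "B", "A"]
y_pred = ["A", "B", "B", "BENIGN", "B", "A"]
precision_a, recall_a, f_a = per_class_metrics(y_true, y_pred, "A")
```

In this toy example the precision is 1, the recall is 2/3 and the F-measure is 0.8. The true negatives $TN(c)$ do not enter any of the three metrics.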

We show the results of the experiments using the data per day in Fig. 1. The horizontal axis shows the class names and the dates. The vertical axis shows the mean value of each metric. We also show the results of the experiment using the mixture data in Fig. 2.

6 Discussion

Fig. 1 shows that our classifiers for each day perform well except for the classes "Infiltration" and "SQL-Injection". However, Fig. 2 shows that our classifier for the mixture data performs worse on some classes than the classifiers for each day. In particular, the performance for the DDoS-LOIC-HTTP and DoS-SlowHTTPTest classes decreases significantly. This implies that our classifier may fail to classify these classes in practical situations. Since Kitsune features were originally designed for anomaly detection, they may not contain sufficient information to discriminate between some attacks.

7 Conclusion

We propose a new NN-based packet classifier that uses Kitsune features as inputs. We evaluate the proposed classifier by experiments using the CSE-CIC-IDS2018 open dataset. Our experiments show that Kitsune 1D features can be used for packet classification with good performance for many kinds of attacks when each day's data is classified separately. However, they also show that the performance degrades when a large number of attack classes must be discriminated.

Acknowledgments

This research was conducted under a contract of "MITIGATE" among "Research and Development for Expansion of Radio Wave Resources (JPJ000254)," which was supported by the Ministry of Internal Affairs and Communications, Japan.

References

1. Hwang, R.H., Peng, M.C., Nguyen, V.L., Chang, Y.L.: An LSTM-based deep learning approach for classifying malicious traffic at the packet level. Applied Sciences 9(16) (2019)

2. Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: 4th International Conference on Information Systems Security and Privacy (ICISSP) (Jan 2018)

3. Ishibashi, R., Goto, H., Han, C., Ban, T., Takahashi, T., Takeuchi, J.: Which packet did they catch? Associating NIDS alerts with their communication sessions. In: The 16th Asia Joint Conference on Information Security (Aug 2021)

4. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (Dec 2014)

5. Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: An ensemble of autoencoders for online network intrusion detection. In: Network and Distributed System Security Symposium 2018 (Feb 2018)

6. Online: CSE-CIC-IDS2018 on AWS. https://www.unb.ca/cic/datasets/ids-2018.html (last visited on 2021-12-31)

7. Online: A realistic cyber defense dataset (CSE-CIC-IDS2018). https://registry.opendata.aws/cse-cic-ids2018/ (last visited on 2021-12-31)

8. Takahashi, T., Umemura, Y., Han, C., Ban, T., Furumoto, K., Nakamura, O., Yoshioka, K., Takeuchi, J., Murata, N., Shiraishi, Y.: Designing comprehensive cyber threat analysis platform: Can we orchestrate analysis engines? In: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) (2021)

9. ymirsky: Kitsune-py. https://github.com/ymirsky/Kitsune-py (last visited on 2021-12-31)

Consolidating Packet-Level Features for Effective Network Intrusion Detection: A Novel Session-Level Approach

Article

Full-text available

Jan 2023

Network Intrusion Detection Systems (NIDSs) are crucial tools for ensuring cyber security. Recently, machine learning-based NIDSs have gained popularity due to their ability to adapt to various anomalies. To enable machine learning techniques, packet-level features have been proposed for packet-level classification, but this approach may generate an excessive number of security alerts and reduce performance due to irrelevant packets. To address these limitations, this paper proposes a session-level classification approach that consolidates packet-level classification outputs to identify anomalous sessions. The effectiveness of the proposed approach is demonstrated by a prototype system. Experiments on a publicly available benchmark dataset demonstrate the high performance of proposed approach achieving F1-measure exceeding 98%. It also shows that even when we used only a few packets in head parts of each session to obtain session-level predictions, the high F1-measure still could be achieved. This result implies that the proposed approach is also efficient in terms of the number of packets to be processed. These results highlight the promising potential of the proposed approach for adaptive network intrusion detection.

Mitigate: Toward Comprehensive Research and Development for Analyzing and Combating IoT Malware

Article

Sep 2023

In this paper, we developed the latest IoT honeypots to capture IoT malware currently on the loose, analyzed IoT malware with new features such as persistent infection, developed malware removal methods to be provided to IoT device users. Furthermore, as attack behaviors using IoT devices become more diverse and sophisticated every year, we conducted research related to various factors involved in understanding the overall picture of attack behaviors from the perspective of incident responders. As the final stage of countermeasures, we also conducted research and development of IoT malware disabling technology to stop only IoT malware activities in IoT devices and IoT system disabling technology to remotely control (including stopping) IoT devices themselves.

Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems

Article

Full-text available

Jan 2022

It is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect enterprise networks from cyberattacks, which have recently become more diverse and sophisticated. High-quality labeled training datasets are required to train AI-powered NIDSes; such datasets are globally scarce, and generating new training datasets is considered cumbersome. In this study, we investigate the possibility of an approach that integrates the strengths of existing security appliances to generate labeled training datasets that can be leveraged to develop brand-new AI-powered cybersecurity solutions. We begin by locating communication flows that the deployed NIDSes detect as suspicious, investigating their causal factors, and assigning appropriate labels in a universal format. Then, we output the packet data in the identified communication flows and the corresponding alert-type labels as labeled data. We demonstrate the effectiveness of the labeling scheme by evaluating classification models trained with the labeled dataset we generated. Furthermore, we provide case studies to examine the performance of several commonly used NIDSes and on practical approaches to automating the security triage process. Labeled datasets in this study are generated using public datasets and open-source NIDSes to ensure the reproducibility of the results. The datasets and the software tools are made publicly accessible for research use.

Packet-Level Intrusion Detection Using LSTM Focusing on Personal Information and Payloads

Conference Paper

Aug 2023

IoT Traffic Fractal Dimension Statistical Characteristics on the Kitsune Dataset Example

Article

Nov 2023

The paper considers a method for estimating the fractal properties of traffic, and also evaluates the statistical parameters of the fractal dimension of IoT traffic. An analysis of real traffic with attacks from the Kitsune dump and an analysis of the fractal properties of traffic in normal mode and under the influence of attacks such as SSDP Flood, Mirai, OS Scan showed that jumps in the fractal dimension of traffic when attacks occur can be used to create algorithms for detecting computer attacks in IoT networks. Studies have shown that in the case of online analysis of network traffic, when assessing the RF, preference should be given to the modified algorithm for estimating the Hurst exponent in a sliding analysis window.

Multivalued Classification of Computer Attacks Using Artificial Neural Networks with Multiple Outputs

Article

Full-text available

Sep 2023

Modern computer networks (CN), having a complex and often heterogeneous structure, generate large volumes of multi-dimensional multi-label data. Accounting for information about multi-label experimental data (ED) can improve the efficiency of solving a number of information security problems: from CN profiling to detecting and preventing computer attacks on CN. The aim of the work is to develop a multi-label artificial neural network (ANN) architecture for detecting and classifying computer attacks in multi-label ED, and its comparative analysis with known analogues in terms of binary metrics for assessing the quality of classification. A formalization of ANN in terms of matrix algebra is proposed, which allows taking into account the case of multi-label classification and the new architecture of ANN with multiple output using the proposed formalization. The advantage of the proposed formalization is the conciseness of a number of entries associated with the ANN operating mode and learning mode. Proposed architecture allows solving the problems of detecting and classifying multi-label computer attacks, on average, 5% more efficiently than known analogues. The observed gain is due to taking into account multi-label patterns between class labels at the training stage through the use of a common first layer. The advantages of the proposed ANN architecture are scalability to any number of class labels and fast convergence.

Which Packet Did They Catch? Associating NIDS Alerts with Their Communication Sessions

Conference Paper

Full-text available

Aug 2021

An LSTM-Based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level

Article

Full-text available

Aug 2019

Recently, deep learning has been successfully applied to network security assessments and intrusion detection systems (IDSs) with various breakthroughs such as using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) to classify malicious traffic. However, these state-of-the-art systems also face tremendous challenges to satisfy real-time analysis requirements due to the major delay of the flow-based data preprocessing, i.e., requiring time for accumulating the packets into particular flows and then extracting features. If detecting malicious traffic can be done at the packet level, detecting time will be significantly reduced, which makes the online real-time malicious traffic detection based on deep learning technologies become very promising. With the goal of accelerating the whole detection process by considering a packet level classification, which has not been studied in the literature, in this research, we propose a novel approach in building the malicious classification system with the primary support of word embedding and the LSTM model. Specifically, we propose a novel word embedding mechanism to extract packet semantic meanings and adopt LSTM to learn the temporal relation among fields in the packet header and for further classifying whether an incoming packet is normal or a part of malicious traffic. The evaluation results on ISCX2012, USTC-TFC2016, IoT dataset from Robert Gordon University and IoT dataset collected on our Mirai Botnet show that our approach is competitive to the prior literature which detects malicious traffic at the flow level. While the network traffic is booming year by year, our first attempt can inspire the research community to exploit the advantages of deep learning to build effective IDSs without suffering significant detection delay.

Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection

Article

Full-text available

Feb 2018

Neural networks have become an increasingly popular solution for network intrusion detection systems (NIDS). Their capability of learning complex patterns and behaviors make them a suitable solution for differentiating between normal traffic and network attacks. However, a drawback of neural networks is the amount of resources needed to train them. Many network gateways and routers devices, which could potentially host an NIDS, simply do not have the memory or processing power to train and sometimes even execute such models. More importantly, the existing neural network solutions are trained in a supervised manner. Meaning that an expert must label the network traffic and update the model manually from time to time. In this paper, we present Kitsune: a plug and play NIDS which can learn to detect attacks on the local network, without supervision, and in an efficient online manner. Kitsune's core algorithm (KitNET) uses an ensemble of neural networks called autoencoders to collectively differentiate between normal and abnormal traffic patterns. KitNET is supported by a feature extraction framework which efficiently tracks the patterns of every network channel. Our evaluations show that Kitsune can detect various attacks with a performance comparable to offline anomaly detectors, even on a Raspberry PI. This demonstrates that Kitsune can be a practical and economic NIDS.

Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection

Conference Paper

Jan 2018

Designing Comprehensive Cyber Threat Analysis Platform: Can We Orchestrate Analysis Engines?

Conference Paper

Mar 2021

Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

Conference Paper

Jan 2018

With exponential growth in the size of computer networks and developed applications, the significant increase in the potential damage that attacks can cause is becoming obvious. Meanwhile, Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are among the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of adequate datasets, anomaly-based approaches in intrusion detection systems are difficult to deploy, analyze, and evaluate accurately. A number of datasets, such as DARPA98, KDD99, ISC2012, and ADFA13, have been used by researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study of eleven datasets available since 1998, many such datasets are out of date and unreliable. Some of these datasets lack traffic diversity and volume, some do not cover a variety of attacks, while others anonymize packet information and payload and so cannot reflect current trends, or they lack a feature set and metadata. This paper produces a reliable dataset that contains benign traffic and seven common attack network flows, meets real-world criteria, and is publicly available. The paper then evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to identify the best set of features for detecting certain attack categories.

Adam: A Method for Stochastic Optimization

Article

Dec 2014

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has low memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
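The abstract above mentions "adaptive estimates of lower-order moments of the gradients" without showing the update. As a rough illustration, the following pure-Python sketch applies the standard Adam update rule to a toy quadratic. The names `adam_minimize` and `quad_grad`, the toy objective, and the step size `lr=0.01` are assumptions made for this example only; the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the defaults commonly quoted for Adam. It is not the training code used in the paper on this page.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam sketch: grad(x) returns the gradient as a list."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate (running mean of gradients)
    v = [0.0] * len(x)  # second-moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-coordinate step, scaled by the root of the second moment.
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def quad_grad(x):
    # Gradient of the toy objective f(x) = (x[0] - 3)^2 + 10 * (x[1] + 1)^2,
    # whose minimum is at (3, -1). Hypothetical example function.
    return [2 * (x[0] - 3), 20 * (x[1] + 1)]

result = adam_minimize(quad_grad, [0.0, 0.0])
print(result)
```

Because each coordinate's step is divided by the square root of its own second-moment estimate, the two coordinates move at similar rates even though their gradients differ in scale by a factor of ten, which is the "invariance to diagonal rescaling" the abstract refers to.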
