Conference Paper

Deep Learning Based Latent Feature Extraction for Intrusion Detection

... A Softmax classifier was then used to identify the attacks. Very similarly, (Mighan and Kahani, 2018) use an SDAE to obtain a reduced representation of the data and then a support vector machine to classify attacks in network traffic, such as the PU-IDS Dataset (Singh et al., 2015). The authors of (Farahnakian and Heikkonen, 2018) use a deep AE trained greedily by ... [Revista Cubana de Ciencias Informáticas Vol. ...] ...
... As briefly described in the previous section, numerous works in the literature have provided evidence for the use of different types of deep neural networks for anomaly detection (Javaid et al., 2016; Yu et al., 2017b; Seng and Wong, 2017; Pumsirirat and Yan, 2018; Zheng et al., 2018b,a; Fu et al., 2016; Luo et al., 2017; Yu et al., 2017a; Farahnakian and Heikkonen, 2018; Mighan and Kahani, 2018; Roy et al., 2017; Kim et al., 2017b; Zhao et al., 2017; Qu et al., 2017; Vinayakumar et al., 2017; Loukas et al., 2018; Wang and Xu, 2018; Yin et al., 2017; Tang et al., 2018; Kim and Kim, 2015; Kim et al., 2017a). These works use deep neural networks for tasks such as dimensionality reduction and/or data representation, and also as classifiers for anomaly detection, whether fraud in banking networks or intrusion attacks in data traffic networks. ...
... [La Habana, Cuba, rcci@uci.cu] ... 2017; Farahnakian and Heikkonen, 2018; Mighan and Kahani, 2018; Roy et al., 2017; Kim et al., 2017b; Zhao et al., 2017; Qu et al., 2017; Vinayakumar et al., 2017; Loukas et al., 2018; Yin et al., 2017; Tang et al., 2018; Kim and Kim, 2015; Kim et al., 2017a) only propose new, empirically obtained configurations for optimizing the networks on the types of data analyzed. Conclusions: This report presents a review of the reported deep-learning-based methods for anomaly detection, specifically fraud and intrusion detection, and emphasizes the scientific contribution these methods make to the anomaly detection process. The reviewed methods were categorized according to the type of DNN used. ...
Article
Anomaly detection is a data mining technique for recognizing new patterns with unusual behavior, which may correspond to invalid actions or anomalies in the data. Anomaly detection has enabled the identification and prevention of malicious activities such as fraud and intrusions, among others. Traditional anomaly detection techniques have reported very good results; however, in recent years, more relevant results have been reported using deep learning techniques. The aim of this report is to review the principal and most recent state-of-the-art deep learning methods for anomaly (fraud and intrusion) detection, which we categorize according to the type of deep neural network used.
... In contrast, the main aim of an AE is to learn a compressed, distributed representation of the given data [25]. In other words, an AE takes data as input and is trained to output that same data. ...
... Unlike traditional feature selection techniques, whose output is a subset of the original features, the features selected by an AE take a different form. Similar to principal component analysis (PCA), an AE summarizes the features in a lower dimension using numerical values [25]. These values are represented by the hidden nodes of the AE, which summarize or generalize the feature space. ...
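The "reproduce your own input" idea in these excerpts can be sketched in a few lines of NumPy. This is a minimal illustration, not the architecture of any cited work: the tanh encoder, linear decoder, mean-squared-error loss, plain full-batch gradient descent, the sizes, and the helper name `train_autoencoder` are all assumptions chosen for brevity. The hidden activations returned by `encode` play the role of the lower-dimensional summary features that the text compares with PCA.

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer autoencoder: tanh encoder, linear decoder, mean squared error,
    full-batch gradient descent. Returns an encoder function and the loss history."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # compressed code (the latent features)
        X_hat = H @ W2 + b2           # attempt to reproduce the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the reconstruction error through decoder and encoder
        g_out = 2.0 * err / err.size
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses

# Toy data: 8 observed features generated from 3 hidden factors plus a little noise
rng = np.random.default_rng(1)
factors = rng.normal(size=(200, 3))
X = factors @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(200, 8))

encode, losses = train_autoencoder(X, n_hidden=3)
latent = encode(X)   # 200 samples x 3 latent features
```

With a linear activation in place of tanh and a squared-error loss, such an autoencoder learns the same subspace as PCA; the nonlinearity is what lets an AE go beyond a linear projection.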
Article
The evolution of devices connected to the internet, such as sensors, cameras, and smartphones, has led to the emergence of the Internet of Things (IoT). As with any network, security is the main issue facing IoT. Several studies have addressed intrusion detection in IoT, most of them using statistical and bio-inspired feature selection techniques. Deep learning is a family of techniques that has demonstrated remarkable performance in classification, and its emergence has led to new neural network architectures designed for feature selection. This study proposes a deep learning architecture known as the auto-encoder (AE) for feature selection in IoT intrusion detection. A benchmark dataset of IoT intrusions was used in the experiments. The proposed AE was used for feature selection, together with a simple neural network (NN) architecture for classification. Experimental results showed that the proposed AE achieved an accuracy of 99.97% with a false alarm rate (FAR) of 1.0. The comparison against the state of the art demonstrates the efficacy of the AE. © 2022 Institute of Advanced Engineering and Science. All rights reserved.
... In [14], the authors proposed a hybrid NIDS framework in which a SAE network and a support vector machine (SVM) are combined to obtain high precision, high accuracy, and a low false alarm rate. Here, the SAE reduces the dimensionality of the features to obtain latent features, and the SVM recognizes intrusions through binary classification. ...
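To illustrate the classification stage of such a hybrid, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on two-dimensional vectors that stand in for SAE latent features. It is only a sketch under stated assumptions: the toy clusters, the label convention (+1 attack, -1 normal), the hyperparameters, and the function names are hypothetical, and the cited work may well use a kernel SVM (for example RBF) rather than a linear one.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularised hinge loss.
    X: list of feature vectors (standing in for SAE latent features); y: labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the bias is learned as an extra weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                            # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]         # regularisation (shrink) step
            if margin < 1.0:                                 # inside the margin: hinge sub-gradient step
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1   # +1 = attack, -1 = normal (assumed convention)

# Two toy clusters standing in for latent features of attack (+1) and normal (-1) traffic
offsets = [-0.5, 0.0, 0.5]
attacks = [(2 + dx, 2 + dy) for dx in offsets for dy in offsets]
normals = [(-2 + dx, -2 + dy) for dx in offsets for dy in offsets]
X = attacks + normals
y = [1] * len(attacks) + [-1] * len(normals)

w = train_linear_svm(X, y)
predictions = [predict(w, x) for x in X]
```

Keeping the SVM separate from the encoder mirrors the two-stage design described above: the SAE only has to produce compact features, and any classifier can be swapped in afterwards.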
... Rosay et al.: ... Limitation: different types of attacks are not identified.
Choi et al. [10]: (1) AE classifier; (2) detection of abnormal data with high accuracy. Limitation: (1) multiple datasets are not analyzed.
Khan et al. [12]: (1) SAE with a softmax classifier; (2) low power consumption and low false alarm rate. Limitation: (1) the values of different parameters are not chosen using any optimization technique.
Mighan et al. [14]: (1) SAE and SVM classifier; (2) high precision and accuracy; (3) low false alarm rate. Limitation: (1) attacks belonging to more than one class are not classified. ...
Conference Paper
The evolution and proliferation of state-of-the-art technologies enable a life marked by convenience and ease of access. This technological advancement also creates difficulties across various segments, and security issues in networking are no exception: networks experience many problems caused by different types of complex intrusions. As a countermeasure, a sophisticated and well-organized system, the network intrusion detection system (NIDS), has been introduced. It aims not only to improve detection accuracy but also to correctly identify unspecified attacks. Over the last few decades, several technologies have been introduced for developing NIDSs to ensure a high degree of privacy and reliability. With the progress of modern deep learning technologies, NIDSs have achieved remarkable results in information security. In this paper, a number of deep learning based NIDSs are reviewed across a wide range of network-oriented infrastructures, viz., traditional and ad hoc networks. The paper mainly discusses the applications, limitations, and methodologies of various NIDSs to give readers a complete and transparent overview of network security.
... We call our framework "SAE-SVM in big data". The first two phases of our method are the same as those presented in our earlier paper [33], which addressed binary classification. Definition 1 (Flow of packets): each flow, which consists of a group of packets, has several characteristics. ...
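Definition 1 leaves the exact flow characteristics unspecified. A common convention, assumed here rather than taken from the cited paper, is to group packets that share the same 5-tuple (source IP, destination IP, source port, destination port, protocol) and then compute per-flow statistics. The `Packet` record, its fields, and the four features below are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Packet:
    ts: float      # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int      # bytes on the wire

def aggregate_flows(packets):
    """Group packets into flows keyed by the 5-tuple and compute simple per-flow features."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)
    flows = {}
    for key, pkts in groups.items():
        times = [p.ts for p in pkts]
        total = sum(p.size for p in pkts)
        flows[key] = {
            "packets": len(pkts),
            "bytes": total,
            "duration": max(times) - min(times),
            "mean_size": total / len(pkts),
        }
    return flows

packets = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 60),
    Packet(0.05, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 1500),
    Packet(0.20, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 40),
    Packet(0.10, "10.0.0.3", "10.0.0.2", 6000, 53, "UDP", 80),
]
flows = aggregate_flows(packets)
```

Many flow exporters also merge both directions of a conversation into one bidirectional flow and split long flows on timeouts; both refinements are left out here.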
... The network is trained for 1000 epochs with a batch size of 256. This experiment was reported in [33]. ...
Article
This paper successfully tackles the problem of processing a vast amount of security-related data for network intrusion detection. It employs Apache Spark, a big data processing tool, to process a large volume of network traffic data. We also propose a hybrid scheme that combines the advantages of deep networks and machine learning methods. First, a stacked autoencoder network is used for latent feature extraction; this is followed by several classification-based intrusion detection methods, such as support vector machine, random forest, decision trees, and naive Bayes, for fast and efficient detection of intrusions in massive network traffic data. The real-time UNB ISCX 2012 dataset is used to validate the proposed method, and performance is evaluated in terms of accuracy, f-measure, sensitivity, precision, and time.
... Stateful protocol detection approaches compare observed actions against the expected behavior of a protocol and recognize deviations in the protocol state; they take advantage of both signature-based and anomaly-based attack detection approaches. In general, IDSs are categorized by architecture into three types: host-based intrusion detection systems (HIDS), network intrusion detection systems (NIDS), and hybrid approaches [5,6]. ...
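Stateful protocol detection can be pictured as replaying a session through a state machine of expected protocol behavior and flagging every event that is not allowed in the current state. The sketch below uses a deliberately simplified, hypothetical TCP-like model; its state names and transitions are assumptions, not the full TCP specification.

```python
# Simplified, hypothetical model of a TCP-like session as seen by a monitor:
# (current state, observed event) -> next state
TRANSITIONS = {
    ("NEW", "SYN"): "SYN_SEEN",
    ("SYN_SEEN", "SYN-ACK"): "SYNACK_SEEN",
    ("SYNACK_SEEN", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "DATA"): "ESTABLISHED",
    ("ESTABLISHED", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "FIN_SEEN",
    ("FIN_SEEN", "ACK"): "CLOSED",
}

def protocol_violations(events):
    """Replay a session's events and report each event that is not allowed in the
    current state, i.e. a deviation from the expected protocol behavior."""
    state = "NEW"
    violations = []
    for pos, event in enumerate(events):
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            violations.append((pos, state, event))   # state is left unchanged
        else:
            state = nxt
    return violations

normal = ["SYN", "SYN-ACK", "ACK", "DATA", "DATA", "FIN", "ACK"]
odd = ["DATA", "SYN", "SYN-ACK", "DATA"]   # data before any handshake, then before the final ACK
```

Unlike a pure signature match, the same `DATA` event is legal in one state and suspicious in another, which is what makes this analysis "stateful".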
... A host-based intrusion detection system (HIDS) is a type of IDS in which software installed on the host computer monitors and evaluates system behavior. In a HIDS, event log files play a key role in intrusion detection [5,7]. Unlike a HIDS, which evaluates each host individually, a NIDS evaluates the flow of packets across the network. ...
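Since event logs are central to a HIDS, a toy host-based check might count failed logins per source address and flag sources at or above a threshold. The log-line format, the regular expression, and the threshold of 3 are hypothetical; real log formats differ between systems and services.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <service> FAILED|OK login user=<name> src=<ip>"
FAILED_LOGIN = re.compile(r"FAILED login user=(\S+) src=(\S+)")

def suspicious_sources(log_lines, threshold=3):
    """Count failed logins per source address and return those at or above the threshold."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1
    return {src: n for src, n in failures.items() if n >= threshold}

log_lines = [
    "2024-01-01T10:00:01 sshd FAILED login user=root src=203.0.113.5",
    "2024-01-01T10:00:02 sshd FAILED login user=admin src=203.0.113.5",
    "2024-01-01T10:00:03 sshd FAILED login user=test src=203.0.113.5",
    "2024-01-01T10:00:04 sshd FAILED login user=root src=203.0.113.5",
    "2024-01-01T10:05:00 sshd OK login user=alice src=198.51.100.7",
    "2024-01-01T10:06:00 sshd FAILED login user=bob src=198.51.100.7",
]
```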
Article
With the rapid advancement of ubiquitous information and communication technologies, a large number of trustworthy online systems and services have been deployed. However, cybersecurity threats are still mounting. An intrusion detection (ID) system can play a significant role in detecting such security threats. Thus, developing an intelligent and accurate ID system is a non-trivial research problem. Existing ID systems used in traditional network intrusion detection often fail to detect many known and new security threats, largely because they are based on classical machine learning methods that pay less attention to accurate feature selection and classification. Consequently, many known signatures in the attack traffic remain unidentified and become latent. Furthermore, since a massive network infrastructure can produce large-scale data, these approaches often cannot handle such data flexibly and hence are not scalable. To address these issues and improve accuracy and scalability, we propose a scalable, hybrid IDS based on Spark ML and the convolutional-LSTM (Conv-LSTM) network. This is a two-stage ID system: the first stage is an anomaly detection module based on Spark ML, and the second stage is a misuse detection module based on the Conv-LSTM network, so that both global and local latent threat signatures can be addressed. Evaluations against several baseline models on the ISCX-UNB dataset show that our hybrid IDS identifies network misuses accurately in 97.29% of cases and outperforms state-of-the-art approaches in 10-fold cross-validation tests.
... An MLP is used for misuse identification, and a reinforcement learning algorithm is used to discover anomalies [25]. ...
Article
The Internet of Things (IoT) has gained more attention in recent years because of its ubiquitous operation, connectivity, methods of communication, and intelligent decisions that trigger activities across various devices. Consequently, artificial intelligence techniques have been integrated into all aspects of the IoT, making life more comfortable in various ways. A novel deep learning model named Device-based Intrusion Detection System (DIDS) was proposed in the second phase. This DIDS learning model incorporates the prediction of unknown attacks to handle the computational overhead in large networks and to increase throughput with a low false alarm rate. The proposed algorithm was evaluated against standard algorithms, and the results show that it detects attacks earlier than they do. Computational time was also reduced, and 99% accuracy was achieved in detecting attacks.
... There are many ways to use deep learning for feature extraction, such as the Autoencoder (AE) [9][10][11][12], improved Autoencoders [1,13,14,15], the Long Short-Term Memory (LSTM) neural network [16], and the Stacked Nonsymmetric Deep Autoencoder (SNDAE) [17]. Wang and Wang [9] used an AE for dimensionality reduction and feature extraction on the original data, and then clustered the processed data with an improved K-means algorithm. ...
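The second stage of the approach attributed to Wang and Wang [9], clustering AE-compressed features, can be sketched with standard Lloyd's k-means. The excerpt does not say what the "improved" K-means changes, so this is the plain algorithm with the simplest possible initialization (the first k points), applied to hypothetical two-dimensional latent features.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Standard Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster ends up empty
            new_centroids.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D latent features produced by an autoencoder
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
```

Clustering in the AE's latent space rather than on raw features is the point of the combination: distances are computed on a few informative coordinates instead of many noisy ones.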
Article
To address the problems of high reconstruction error and long training time when using Stacked Nonsymmetric Deep Autoencoder (SNDAE) feature extraction for intrusion detection, the Adam Nonsymmetric Deep Autoencoder (ANDAE) is proposed, based on SNDAE. The Adam optimization algorithm is used to update network parameters during training so that the loss function converges quickly to the ideal value. Without affecting the quality of feature extraction, the network structure is simplified and its training time reduced, enabling efficient extraction of features from rapidly growing, high-dimensional, nonlinear network traffic. The low-dimensional prominent features extracted by ANDAE are classified with Random Forest to detect intrusions, yielding a network intrusion detection model based on ANDAE feature extraction. Experimental results on the NSL-KDD and CIC-IDS2017 datasets show that, compared with the SNDAE-based intrusion detection model, the ANDAE model achieves average increases of 6.78% in accuracy, 13.06% in recall, and 14.9% in F1 score. Feature extraction time is reduced by 23.1% on average. Thus, the ANDAE model is an intrusion detection solution that can simultaneously improve detection accuracy and time efficiency.
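For reference, the Adam update used by ANDAE keeps running estimates of the mean and uncentered variance of each gradient component, corrects their start-up bias, and scales the step per parameter. The sketch applies the standard update rule (with the commonly used beta1 = 0.9, beta2 = 0.999, eps = 1e-8) to a toy quadratic that stands in for an autoencoder's reconstruction loss; the learning rate of 0.1, the step count, and the loss itself are assumptions for this toy problem.

```python
import math

def adam(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a function with Adam, given a function returning its gradient."""
    x = list(x0)
    m = [0.0] * len(x)   # first-moment (mean) estimates
    v = [0.0] * len(x)   # second-moment (uncentered variance) estimates
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)          # bias correction for the zero start
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy loss (x - 3)^2 + 10 (y + 1)^2 with its minimum at (3, -1); only its gradient is needed
def grad(p):
    return [2 * (p[0] - 3), 20 * (p[1] + 1)]

x_opt = adam(grad, [0.0, 0.0])
```

Because each coordinate is divided by its own gradient scale, the steep `y` direction and the shallow `x` direction move at comparable speeds, one reason Adam tends to converge faster than plain gradient descent on badly scaled losses.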
... The same authors in [21] proposed a hybrid SAE-SVM model for a fast and efficient intrusion detection system. Based on their experimental results, they concluded that the model improves both accuracy and prediction speed compared with existing state-of-the-art models. ...
Article
The present era faces many security, privacy, and integrity issues because of tremendous developments in communication technology, data storage devices, and computing, which lead to unavoidable losses. As a result of these ongoing technological revolutions, many organizations and institutions have started migrating to cloud environments. Consequently, security issues have increased, along with new ways of penetrating networks. Unauthorized users and many professionals with malicious intent have started exploiting legitimate users through cybercrime. There is therefore a need to implement a proper Intrusion Detection System with optimization procedures. This paper proposes a hybrid intrusion detection approach that combines a Constraints Optimized Stacked Autoencoder (COSAE) for dimension reduction with a grid-search-based SVM-RBF classifier (GSVM-RBF). The COSAE+GSVM-RBF model improves performance in two ways: i) the SAE is optimized through regularization techniques using weight and dropout constraints, and ii) the performance of the SVM classifier with an RBF kernel is enhanced by tuning its hyperparameters with grid search. Various experiments are conducted to validate this model with four activation functions for dimension reduction with COSAE: Scaled Exponential Linear Unit (SELU), Rectified Linear Unit (ReLU), softplus, and Exponential Linear Unit (ELU). The improvements in this paper avoid exploding and vanishing gradients and overfitting on large datasets, and yield gains in intrusion detection rate and computational time, as well as a 100% F-measure in classifying minor class labels. The proposed approach is validated on the CICIDS2017 dataset, and a comparative analysis with state-of-the-art approaches is conducted. The experimental results show that the proposed approach outperforms the prevailing approaches.
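The four activation functions compared for COSAE, together with one common form of weight constraint (max-norm) and inverted dropout, can be written as below. The SELU constants are the standard published values; which weight constraint and dropout rate COSAE actually uses is not stated in the abstract, so `max_norm`, its default `c=3.0`, and `dropout` are assumptions.

```python
import math
import random

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(x):
    return max(0.0, x)

def softplus(x):
    # numerically stable form of log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def max_norm(weights, c=3.0):
    """Weight constraint: rescale the vector so its L2 norm does not exceed c."""
    norm = math.sqrt(sum(w * w for w in weights))
    return list(weights) if norm <= c else [w * c / norm for w in weights]

def dropout(values, rate, rng):
    """Inverted dropout (training time): zero each unit with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

ELU and SELU keep a non-zero gradient for negative inputs, unlike ReLU, which is one reason they are often tried against vanishing-gradient problems; the constraint and dropout act as regularizers against overfitting.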
... However, this can be further evaluated using GPU acceleration and parallel platforms. Also, in [85], the authors introduced an IDS approach that uses SVM and deep learning to improve intrusion detection performance. They used a stacked autoencoder to reduce the number of features and applied an SVM classifier to classify events as normal or attack. ...
Article
Nowadays, the ever-increasing complexity and severity of security attacks on computer networks have inspired security researchers to incorporate different machine learning methods to protect organizations' data and reputation. Deep learning is one of the exciting techniques that has recently been widely employed by intrusion detection systems (IDSs) to increase their performance in securing computer networks and hosts. This survey article focuses on deep learning-based intrusion detection schemes and puts forward an in-depth survey and classification of these schemes. It first presents the primary background concepts of IDS architecture and various deep learning techniques. It then classifies these schemes according to the type of deep learning method used in each. It describes how deep learning networks are used in the intrusion detection process to recognize intrusions accurately. Finally, a complete analysis of the investigated IDS frameworks is provided, and concluding remarks and future directions are highlighted.
... This part of the paper discusses the auto-encoder-based IDS schemes (... Louati and Ktata, 2020; Mighan and Kahani, 2018; Ieracitano et al., 2020) introduced for different environments. For instance, in (Sadaf and Sultana, 2020), Sadaf and Sultana introduced an IDS approach, denoted Auto-IF, for real-time intrusion detection in fog computing environments using an isolation forest and an auto-encoder. ...
Article
Providing a high-performance Intrusion Detection System (IDS) can be very effective in controlling malicious behaviors and cyber-attacks. Given the ever-growing negative impact of security attacks on computer systems and networks, various Artificial Intelligence (AI)-based techniques have been used to introduce versatile IDS approaches. Deep learning is a branch of AI techniques, mainly based on multi-layer artificial neural networks. Recently, deep learning techniques have gained momentum in the intrusion detection domain, and several IDS approaches using various deep neural networks have been proposed in the literature to deal with privacy concerns and security threats. This article therefore focuses on deep IDS approaches and investigates how deep learning networks are employed by different approaches in different steps of the intrusion detection process to achieve better results. It classifies the studied IDS schemes by the deep learning networks they use and describes their main contributions and capabilities. In addition, within each category, their main features, such as evaluated metrics, datasets, simulators, and environments, are compared. A comparison of the main properties of the deep IDS approaches is also provided to illuminate the main techniques applied in them, as well as the areas less studied in the literature. Finally, concluding remarks on deep IDS are provided, and possible directions for subsequent studies are listed.
... In [54], the authors introduced a hybrid IDS scheme that combines deep learning and SVM to improve intrusion detection performance. They utilized a stacked auto-encoder to decrease the features and applied the SVM classifier for events classification into normal or attacks. ...
Preprint
The ever-increasing complexity and severity of security attacks on computer networks have inspired security researchers to apply various machine learning methods to protect organizations' data and reputation. Deep learning is one of the exciting techniques that has recently been widely used by intrusion detection systems (IDS) to improve their performance in securing computer networks and hosts. This survey article focuses on signature-based IDS using deep learning techniques and puts forward an in-depth survey and classification of these schemes. For this purpose, it first presents the essential background concepts of IDS architecture and various deep learning techniques. It then classifies these schemes according to the type of deep learning method applied in each. It describes how deep learning networks are used in the misuse detection process to recognize intrusions accurately. Finally, a complete analysis of the investigated IDS frameworks is provided, and concluding remarks and future directions are highlighted.
... Zaman et al. [20] proposed an improved ID algorithm known as the enhanced support vector decision function (ESVDF) and evaluated their IDS on the DARPA dataset, where it proved superior to other conventional ID approaches.
Reported results on ISCX-2012 (work: method, score in %):
... [27]: MCA + EMD, 87.29
Thi-Thu et al. [28]: FS + DT, 95.33
Mighan et al. [29]: SAE + SVM, 90.3
Wang et al. [30]: HAST-IDS, 96.6 ...
Article
Machine learning (ML) algorithms are often used to design effective intrusion detection (ID) systems for appropriate mitigation and effective detection of malicious cyber threats at the host and network levels. However, cybersecurity attacks are still increasing, and an ID system can play a vital role in detecting such threats. Existing ID systems are unable to detect malicious threats, primarily because they adopt approaches based on traditional ML techniques, which are less concerned with accurate classification and feature selection. Thus, developing an accurate and intelligent ID system is a priority. The main objective of this study was to develop a hybrid intelligent intrusion detection system (HIIDS) that learns crucial feature representations efficiently and automatically from massive unlabeled raw network traffic data. Many ID datasets are publicly available to the cybersecurity research community. We used robust classifiers based on Spark MLlib (machine learning library), such as logistic regression (LR) and extreme gradient boosting (XGB), for anomaly detection, and a state-of-the-art DL model, a long short-term memory autoencoder (LSTMAE), for misuse attacks, to develop an efficient HIIDS that detects and classifies unpredictable attacks. Our approach uses the LSTM to detect temporal features and the AE to detect global features more efficiently. To evaluate the efficacy of the proposed approach, experiments were conducted on a publicly available dataset, the contemporary real-life ISCX-UNB dataset. The simulation results demonstrate that the proposed Spark MLlib and LSTMAE-based HIIDS significantly outperformed existing ID approaches, achieving an accuracy of up to 97.52% on the ISCX-UNB dataset in 10-fold cross-validation tests. The proposed HIIDS is quite promising for large-scale, real-world use.
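Several of these studies report results from 10-fold cross-validation. A minimal index splitter, assuming a single shuffle with a fixed seed and no class stratification (the cited works may stratify), looks like this; a model would be trained on each `train` list, evaluated on the matching `test` list, and the 10 scores averaged.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Shuffle the sample indices once, cut them into k folds whose sizes differ by at most one,
    and return k (train, test) index pairs; every sample is used for testing exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits

splits = k_fold_splits(25, k=10)
```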
... The evaluation of the proposed hybrid method will be based on the common information retrieval metrics precision, recall and f-measure [28][29][30]. Such measures can be computed using the contingency table as shown in Table 2. ...
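Table 2 is not reproduced here, but the usual 2x2 contingency table gives these metrics directly. The sketch assumes "intrusion" is the positive class, and the counts in the usage line are made up for illustration. It also computes accuracy and the false alarm rate, since other works in this list report them.

```python
def detection_metrics(tp, fp, fn, tn):
    """Metrics from a 2x2 contingency table, treating 'intrusion' as the positive class.
    tp: intrusions flagged, fp: normal traffic flagged, fn: intrusions missed, tn: normal passed."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # also called sensitivity or detection rate
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure,
            "accuracy": accuracy, "false_alarm_rate": false_alarm_rate}

m = detection_metrics(tp=90, fp=10, fn=30, tn=870)
# precision 0.9, recall 0.75, f_measure about 0.818, accuracy 0.96
```

The example also shows why accuracy alone can mislead on imbalanced traffic: 96% accuracy coexists with a quarter of the intrusions being missed.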
Article
With the development of web applications, intrusions have become a crucial aspect of security policy violations. An intrusion can be defined as a specific change in the normal behavior of network operations that is intended to violate the security policies of a particular network and affect its performance. Recently, several researchers have examined the capabilities of machine learning techniques for detecting intrusions. One important issue in using machine learning techniques is employing a proper set of features. Since the literature shows a diversity of feature types, there is a vital need for a feature selection approach that identifies the most appropriate features for intrusion detection. This study proposes a hybrid method of a genetic algorithm (GA) and a support vector machine (SVM). The GA is used for feature selection to choose the best features, while the SVM is used as the classification method to categorize behavior as normal or intrusion based on the features selected by the GA. A benchmark intrusion dataset (NSL-KDD) was used in the experiment. In addition, the proposed method was compared with the traditional SVM. Results showed that the GA significantly improved SVM classification, achieving an f-measure of 0.927.
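The GA feature-selection stage can be sketched as evolving bit masks over the feature set (1 = keep the feature). In the described method the fitness would come from training and scoring an SVM on the selected features; here a cheap, hypothetical `toy_fitness` stands in for that, and the population size, mutation rate, one-point crossover, and elitism are illustrative choices, not the paper's settings.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Evolve 0/1 feature masks; returns the best mask and the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = ranked[:2]                                   # elitism: carry the two best masks over
        while len(next_pop) < pop_size:
            a, b = rng.sample(ranked[:pop_size // 2], 2)        # parents from the fitter half
            cut = rng.randrange(1, n_features)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

USEFUL = {0, 3, 5}   # hypothetical indices of informative features

def toy_fitness(mask):
    # stand-in for e.g. the F-measure of an SVM trained on the selected features,
    # with a small penalty per selected feature
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

best, history = ga_select(8, toy_fitness)
```

Elitism guarantees the best fitness never drops from one generation to the next, which matters when each fitness evaluation (an SVM training run) is expensive.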
... These ID techniques, ranging from simple statistical algorithms to advanced ML approaches, have been useful in extracting features from network traffic so that abnormal traffic can be distinguished from normal traffic. In previous research, Naseer et al. [20], Bandyopadhyay et al. [21], Tama et al. [22], Albahar et al. [23], Tang et al. [24], Qatf et al. [25], Farahnakian et al. [26], Thi-Thu et al. [27], Pektas et al. [28], Mighan et al. [29], Meira et al. [30], and Wang et al. [31] introduced various models, methods, and techniques based on conventional supervised and unsupervised ML approaches for ID problems to increase the performance of the ID framework. ML approaches such as k-NN [32], SVM [33], ANN [34], RF [35,36], and many others have been used extensively for ID. ...
Article
Recently, owing to the rapid development and remarkable results of deep learning (DL) and machine learning (ML) approaches in various domains for several long-standing artificial intelligence (AI) tasks, there has been great interest in applying them to network security as well. In the information and communication technology (ICT) era, the intrusion detection (ID) system has great potential to be the frontier of security against cyberattacks and plays a vital role in securing network infrastructure and resources. Conventional ID systems are not strong enough to detect advanced malicious threats. Heterogeneity is one of the important features of big data; thus, designing an efficient ID system using a heterogeneous dataset is a major research problem. Several ID datasets are openly available for further research by the cybersecurity research community. However, no existing research has shown a detailed performance evaluation of several ML methods on various publicly available ID datasets. Because malicious attacks are dynamic and attack detection methods change continuously, ID datasets are made publicly available and updated systematically. In this research, robust classical ML classifiers based on Spark MLlib (machine learning library) for anomaly detection and a state-of-the-art DL model, the convolutional autoencoder (Conv-AE), for misuse attacks are used to develop an efficient and intelligent ID system that detects and classifies unpredictable malicious attacks. To measure the effectiveness of the proposed ID system, several important performance metrics, such as FAR, DR, and accuracy, are used, with experiments conducted on a publicly available dataset, specifically the contemporary heterogeneous CSE-CIC-IDS2018 dataset.
... Because of the diversity and different characteristics of DoS attacks, detecting them still faces obstacles [13][14][15][16]. Şimşek & Şentürk [17] proposed a method that uses the pre-congestion period to analyze the flow of data during that period. ...
Article
With the dramatic evolution of networks, the challenges of implementing and deploying them have grown equally. One serious challenge is security, since a wide range of attacks threaten these networks. Denial-of-Service (DoS) is a common attack that targets several types of networks by flooding a specific server with a huge amount of information in order to take the server down. Many research studies have simulated networks in order to observe the behavior of DoS. However, the variety of DoS types hinders the configuration of DoS attacks in simulation. In particular, Distributed DoS (DDoS) is considered the most challenging threat to various networks. Hence, this paper presents a comprehensive simulation to characterize and detect DDoS attacks. Using the well-known NS-2 simulator, different types of DDoS attacks were characterized, examined, and detected, which indicates the efficacy of the comprehensive simulation proposed in this study.
... The proposed model achieves 99.97% accuracy and a very impressive FAR of 0.02%, both better than other state-of-the-art approaches. In [500], a hybrid SAE- and SVM-based IDS is proposed. The SAE model is used for dimensionality reduction, extracting 10 latent features, and the SVM model is then used as the classifier. ...
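Stacking autoencoders down to a small latent size (10 features in the excerpt) is often done by greedy layer-wise pre-training: each layer is trained as its own autoencoder on the codes produced by the layer below. The NumPy sketch assumes tanh layers, full-batch gradient descent, a hypothetical 41-dimensional input, and an intermediate 24-unit layer; it omits the supervised fine-tuning some works add, and the resulting 10-dimensional codes would then be handed to a classifier such as an SVM.

```python
import numpy as np

def train_ae_layer(X, n_hidden, lr=0.1, epochs=300, seed=0):
    """Train one tanh autoencoder layer on X (MSE loss, full-batch gradient descent)
    and return only its encoder weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - X                     # reconstruction error
        g = 2.0 * err / err.size
        gH = (g @ W2.T) * (1.0 - H ** 2)          # computed before W2 is updated
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum(axis=0)
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1

def greedy_stacked_encoder(X, layer_sizes):
    """Greedy layer-wise pre-training: each layer learns to reconstruct the previous layer's codes."""
    params, inp = [], X
    for i, size in enumerate(layer_sizes):
        W, b = train_ae_layer(inp, size, seed=i)
        params.append((W, b))
        inp = np.tanh(inp @ W + b)                # these codes train the next layer
    return params

def encode(X, params):
    for W, b in params:
        X = np.tanh(X @ W + b)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 41))                    # toy stand-in for preprocessed traffic features
params = greedy_stacked_encoder(X, [24, 10])
latent = encode(X, params)                        # 300 x 10 latent features for the classifier
```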
Preprint
This work aims to review the state-of-the-art deep learning architectures in Cyber Security applications by highlighting the contributions and challenges from various recent research papers.
... The unsupervised phase of the approach reduces the irrelevant normal traffic data for DDoS, which lowers false-positive rates and increases accuracy. Experiments were performed on the NSL-KDD, UNB ISCX 12, and UNSW-NB15 datasets. The authors in [23] applied a hybrid scheme that combines deep learning and a support vector machine to improve accuracy on the ISCX IDS UNB dataset classes. The results indicated that the combined model outperforms SVM alone in terms of both accuracy and run-time efficiency. ...
Chapter
Network intrusion detection is important for protecting computer networks from malicious attacks. However, class imbalance in network traffic data can make it difficult to detect intrusions accurately. To address this challenge, we propose a novel deep learning model called the deep regularizer learning model (DRLM). DRLM uses sample similarity across categories to enhance its ability to learn from imbalanced data. To enhance the representation ability of the neural network, DRLM also employs a feature extraction encoder consisting of LayerNorm and skip-connection units. DRLM uses an adaptive contrastive loss function to optimize the model during training. The model is validated on the IoT-23 dataset, real-time traffic data from numerous smart home IoT devices. Experimental results showed that DRLM outperformed existing methods, demonstrating its superior generalization ability and its ability to handle class imbalance without additional pre-training or fine-tuning.
Article
The rapid increase in network traffic has recently raised the importance of flow-based intrusion detection systems, which process a smaller amount of traffic data. Furthermore, anomaly-based methods, which can identify unknown attacks, are also integrated into these systems. This study focuses on detecting anomalous network traffic (or intrusions) from flow-based data using unsupervised deep learning methods with a semi-supervised learning approach. More specifically, Autoencoder and Variational Autoencoder methods were employed to identify unknown attacks using flow features. The experiments used flow-based features extracted from network traffic data containing typical traffic and different types of attacks. The Receiver Operating Characteristic (ROC) curves and the areas under them resulting from these methods were calculated and compared with a One-Class Support Vector Machine. The ROC curves were examined in detail to analyze the performance of the methods at various threshold values. The experimental results show that the Variational Autoencoder performs, for the most part, better than the Autoencoder and the One-Class Support Vector Machine.
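In this semi-supervised setup the (V)AE is trained on normal flows only, and its reconstruction error serves as an anomaly score; the ROC curve comes from sweeping a threshold over those scores. The sketch assumes the scores are already computed (the six values are made up, with label 1 = attack) and computes the ROC points and the trapezoidal area under the curve.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low over anomaly scores (e.g. reconstruction errors);
    a flow is flagged as an attack when its score >= threshold. Returns (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a ROC curve whose points are sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical reconstruction errors: three normal flows (0) and three attacks (1)
scores = [0.1, 0.2, 0.45, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
area = auc(roc_points(scores, labels))
```

For these scores the AUC is 8/9: of the nine attack/normal pairs, eight rank the attack above the normal flow. The AUC summarizes all thresholds at once, which is why it suits comparing AE, VAE, and One-Class SVM scores that live on different scales.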
Article
Efficiently detecting network intrusions requires gathering sensitive information. This means collecting large amounts of network transactions, including highly detailed records of recent transactions. Meta-heuristic anomaly assessments are important in the exploratory analysis of intrusion-related network transaction data. These assessments are needed to predict the likelihood of intrusion from the attribute details available for each network transaction. We used the NSL-KDD data set for both the binary and multiclass problems, with a 20% testing dataset. This paper develops a new hybrid model that estimates the intrusion scope threshold degree from the optimal features of the network transaction data made available for training. The experimental results revealed that the hybrid approach significantly reduced the computational and time complexity involved in determining the feature association impact scale. The accuracy of the proposed model was 99.81% and 98.56% for the binary and multiclass NSL-KDD data sets, respectively. However, there are issues with obtaining high false and low false negative rates. A hybrid approach with two main parts is proposed to address these issues. First, the data are filtered using the Vote algorithm with Information Gain, which combines the probability distributions of the base learners to select the important features that positively affect the accuracy of the proposed model. Second, the hybrid algorithm consists of the following classifiers: J48, Meta Bagging, RandomTree, REPTree, AdaBoostM1, DecisionStump, and NaiveBayes. Based on the results obtained using the proposed model, we observe improved accuracy, high false negative rate, and low false positive rate.
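The abstract ranks features with Information Gain before combining base learners with the Vote algorithm. A minimal sketch of the information-gain ranking step alone, assuming the features have already been discretized (continuous NSL-KDD attributes would need binning first). The Vote combination and the seven-classifier ensemble are not reproduced, and the function and column names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Usage with hypothetical discretized columns.
columns = {
    "protocol": ["tcp", "tcp", "udp", "udp"],
    "flag": ["SF", "SF", "SF", "SF"],
}
print(rank_features(columns, ["normal", "normal", "attack", "attack"], 1))  # ['protocol']
```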
Conference Paper
Owing to advances in information and communication technologies, online information sharing has increased, creating new added value. As a result, various online services have been created. However, as the number of connection points to the Internet increases, cyber security threats have also been increasing. Intrusion detection systems (IDSs) are an important security issue today. In this paper, we construct an IDS model using a deep learning approach. We apply the Long Short-Term Memory (LSTM) architecture to a Recurrent Neural Network (RNN) and train the IDS model on the KDD Cup 1999 dataset. Performance tests confirm that the deep learning approach is effective for IDS.
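For reference, one common formulation of the LSTM cell is shown below; the abstract does not state which variant the paper uses (for example, whether it has peephole connections). Here x_t is the input record at step t, h_{t-1} and c_{t-1} are the previous hidden and cell states, σ is the logistic sigmoid, and ⊙ is element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate f_t controls how much of the previous cell state is kept, which is what lets the network carry information across long sequences of connection records.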
Conference Paper
For several years, Intrusion Detection Systems (IDSs) have been an effective way to achieve higher security by detecting malicious activities. Anomaly detection is one type of intrusion detection. Current anomaly detection is often associated with a high false alarm rate and only moderate accuracy and detection rates, because it is unable to detect all types of attacks correctly. To overcome this problem, we propose a hybrid learning approach that combines K-Means clustering and Naïve Bayes classification. The proposed approach first clusters all data into the corresponding groups and then applies a classifier for classification. An experiment is carried out to evaluate the performance of the proposed approach using the KDD Cup '99 dataset. Results show that the proposed approach performs better in terms of accuracy and detection rate, with a reasonable false alarm rate.
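The first stage of the hybrid approach groups records with K-Means before classification. A minimal sketch of that clustering stage in plain Python, using Lloyd's algorithm with randomly chosen initial centers (an assumption, since the abstract does not say how centers are initialized). The Naïve Bayes stage is not shown, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means. Returns (centers, assignment); assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct records as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Usage: two well-separated groups of 2-D records.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, 2)
```

In a hybrid scheme like the one described, a separate classifier would then be trained on, or applied to, the records of each cluster.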
Article
This paper introduces a hybrid scheme that combines the advantages of deep belief networks and support vector machines. Intrusion detection was chosen as the application, and the hybrid scheme was applied to assess its ability and accuracy in classifying traffic into two outcomes, normal or attack, where attacks fall into four classes: R2L, DoS, U2R, and Probing. First, we use a deep belief network to reduce the dimensionality of the feature sets. A support vector machine then classifies each record into one of five outcomes: Normal, R2L, DoS, U2R, and Probing. To evaluate the performance of our approach, we present tests on the NSL-KDD dataset and show that the approach achieves high overall accuracy.
Conference Paper
A few exploratory works have studied Restricted Boltzmann Machines (RBMs) as an approach to network intrusion detection, but did so in a rather empirical way. It is possible to go one step further by taking advantage of already mature theoretical work in the area. In this paper, we use RBMs for network intrusion detection, showing that they are capable of learning complex datasets. We also illustrate an integrated and systematic way of learning. We analyze learning procedures and applications of RBMs and show experimental results for training RBMs on a standard network intrusion detection dataset.
Conference Paper
Recently, deep learning has gained prominence due to the potential it portends for machine learning. For this reason, deep learning techniques have been applied in many fields, such as pattern recognition and classification. Intrusion detection analyzes data obtained from monitoring security events to assess the situation of a network. Many traditional machine learning methods have been proposed for intrusion detection, but their detection performance and accuracy still need improvement. This paper discusses different methods used to classify network traffic. We applied these methods to an open data set and experimented with them to find the best approach to intrusion detection.
Conference Paper
With the advent of digital technology, security threats to computer networks have increased dramatically over the last decade, becoming much bolder and more brazen. There is a great need for an effective Intrusion Detection System (IDS), an intelligent specialized system designed to interpret intrusion attempts in incoming network traffic. Deep belief networks (DBNs) have proved to be among the most influential deep neural nets; they are generative neural networks that stack Restricted Boltzmann Machines. In this paper, we explore the capabilities of DBNs for intrusion detection through a series of experiments after training on the NSL-KDD dataset. The trained DBN identifies any kind of unknown attack in the dataset supplied to it and, to the best of our knowledge, this is the first comprehensive paper performing intrusion detection using deep belief nets. The proposed system not only detects attacks but also classifies them into five groups, identifying and classifying network activity accurately from limited, incomplete, and nonlinear data sources. The proposed system achieved a detection accuracy of about 97.5% after only fifty iterations, which is state-of-the-art performance compared with existing intrusion detection systems.
Article
Malicious JavaScript code in webpages on the Internet is an emerging security issue because of its universality and potentially severe impact. Because such code is often obfuscated and complex, detecting it is costly. Over the last few years, several machine learning-based detection approaches have been proposed; most of them use shallow discriminative models with features constructed from hand-crafted rules. However, with the advent of the big data era in information transmission, these existing methods can no longer satisfy practical needs. In this paper, we present a new deep learning framework for detecting malicious JavaScript code, which obtained the highest detection accuracy compared with the control group. The architecture consists of sparse random projection, a deep learning model, and logistic regression. Stacked denoising auto-encoders were used to extract high-level features from JavaScript code, and logistic regression was used as a classifier to distinguish between malicious and benign JavaScript code. Experimental results indicated that our architecture, with over 27 000 labeled samples, can achieve an accuracy of up to 95%, with a false positive rate below 4.2% in the best case.
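The first component of the architecture reduces the dimensionality of the JavaScript feature vectors with a sparse random projection. A minimal sketch, assuming Achlioptas's sparse distribution (each matrix entry is +sqrt(3/k), 0, or -sqrt(3/k) with probabilities 1/6, 2/3, 1/6); the abstract does not say which sparse scheme the authors used, and the function names are hypothetical. The denoising auto-encoder and logistic regression stages are not shown.

```python
import math
import random


def sparse_projection_matrix(d, k, seed=0):
    """d x k matrix with entries +s, 0, -s (probabilities 1/6, 2/3, 1/6), s = sqrt(3/k).

    Each entry then has variance 1/k, so squared norms are preserved in expectation.
    """
    rng = random.Random(seed)
    s = math.sqrt(3.0 / k)

    def entry():
        u = rng.random()
        if u < 1 / 6:
            return s
        if u < 2 / 6:
            return -s
        return 0.0

    return [[entry() for _ in range(k)] for _ in range(d)]


def project(x, R):
    """Map a d-dimensional vector x to k dimensions: y_j = sum_i x_i * R[i][j]."""
    k = len(R[0])
    return [sum(xi * row[j] for xi, row in zip(x, R)) for j in range(k)]


R = sparse_projection_matrix(d=1000, k=50)
y = project([1.0] * 1000, R)
print(len(y))  # 50
```

Because about two thirds of the entries are zero, the matrix is cheap to store and apply, which matters when the raw feature space is large.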
Conference Paper
In this paper, we present a Genetic Algorithm (GA) approach with an improved initial population and selection operator to efficiently detect various types of network intrusions. GA is used to optimize the search for attack scenarios in audit files, thanks to its good balance between exploration and exploitation; it provides the subset of potential attacks present in the audit file in a reasonable processing time. In the testing phase, the Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD99) benchmark dataset was used to detect misuse activities. Combining the IDS with the genetic algorithm increases the detection rate of the Network Intrusion Detection Model and reduces the false positive rate.
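A minimal sketch of a binary genetic algorithm with tournament selection, one-point crossover, bit-flip mutation, and elitism. It assumes each individual is a bit string marking which candidate attacks are hypothesized to be present in the audit file, and that the caller supplies a `fitness` function scoring that hypothesis; both are assumptions, and the paper's improved initial population and selection operator are not reproduced. Names and parameter values are hypothetical.

```python
import random


def genetic_search(fitness, n_bits, pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Maximize fitness(bits) over bit strings of length n_bits (requires n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals is selected.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: the best individual survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Usage with a hypothetical fitness: agreement with a known attack pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
best = genetic_search(lambda bits: sum(b == t for b, t in zip(bits, target)), len(target))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is a simple safeguard when the fitness evaluation is expensive.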
Article
Today, networks are a very important part of life, and many essential activities are performed over them. Network security therefore plays a critical role in real-life situations. This paper presents a data mining method involving various preprocessing methods, such as normalization, discretization, and feature selection. These methods are used to preprocess the data and select the required features. A Naïve Bayes classifier is then used as a supervised learning method to classify various network events in the KDD Cup '99 dataset, the most commonly used dataset for intrusion detection.
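A minimal sketch of two steps the abstract names: equal-width discretization of a numeric attribute and a categorical Naïve Bayes classifier with Laplace (add-one) smoothing. Equal-width binning and add-one smoothing are assumptions, since the abstract does not specify the discretization or smoothing scheme; the class, function, and column names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (bin indices 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.seen_values = defaultdict(set)       # feature index -> distinct values
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[(y, j)][v] += 1
                self.seen_values[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(cls):
            # log P(class) + sum_j log P(x_j | class); logs avoid numeric underflow.
            lp = math.log(self.class_counts[cls] / self.n)
            for j, v in enumerate(row):
                num = self.value_counts[(cls, j)][v] + 1
                den = self.class_counts[cls] + len(self.seen_values[j])
                lp += math.log(num / den)
            return lp
        return max(self.class_counts, key=log_posterior)


# Usage: protocol plus a discretized byte count (hypothetical records).
bins = equal_width_bins([120, 90, 5000, 4800], 2)  # [0, 0, 1, 1]
rows = list(zip(["tcp", "tcp", "udp", "udp"], bins))
model = CategoricalNaiveBayes().fit(rows, ["normal", "normal", "attack", "attack"])
print(model.predict(("udp", 1)))  # attack
```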
Conference Paper
Intrusion detection systems (IDSs) play a major role in detecting attacks on computers and networks. Anomaly intrusion detection models detect new attacks by observing deviations from a profile of normal behavior. However, traditional IDSs have many problems, such as a high false alarm rate, low detection capability against new network attacks, and insufficient analysis capacity. Using machine learning in intrusion models allows performance to improve automatically with experience. This paper proposes a novel method that integrates principal component analysis (PCA) and a support vector machine (SVM), optimizing the kernel parameters with an automatic parameter selection technique. This technique reduces the training and testing time needed to identify intrusions while improving accuracy. The proposed method was tested on the KDD data set. The data were carefully divided into training and testing sets so that minority attacks such as U2R and R2L were present in the testing set, in order to test the detection of unknown attacks. The results indicate that the proposed method is successful in identifying intrusions. The experimental results show that the classification accuracy of the proposed method outperforms other classification techniques that use SVM as the classifier with other dimensionality reduction or feature selection techniques. Resource consumption is minimal because the classifier takes a reduced feature set as input, which minimizes training and testing overhead time.
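The abstract reduces dimensionality with PCA before the SVM. A minimal sketch of how the leading principal component can be found with power iteration on the sample covariance matrix, in plain Python. The names are hypothetical; a full PCA would extract several components (for example by deflation or an eigen-solver), and the SVM with its automatic kernel-parameter selection is not shown.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (d x d) of a list of d-dimensional rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_component(rows, iters=200):
    """Unit vector along the direction of maximum variance, via power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # assumes the start vector is not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no dominant direction
        v = [x / norm for x in w]
    return v


# Usage: points spread mostly along the x-axis.
rows = [(-2, 0.1), (-1, -0.1), (0, 0.0), (1, 0.1), (2, -0.1)]
v = leading_component(rows)  # close to (1, 0), up to sign
```

Projecting each centered record onto the first few such components gives the reduced feature set that a classifier would then consume.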
Article
With the rapid growth and increasing complexity of network infrastructures and the evolution of attacks, identifying and preventing network abuse is becoming increasingly strategic for ensuring adequate protection from both external and internal threats. In this scenario, many techniques are emerging for inspecting network traffic and discriminating between anomalous and normal behaviors to detect undesired or suspicious activities. Unfortunately, the concept of normal or abnormal network behavior depends on several factors, and recognizing it requires a model that characterizes current behavior based on a statistical idealization of past events. There are two main challenges when generating the training data needed for effective modeling. First, network traffic is very complex and unpredictable; second, the model is subject to change over time, since anomalies are continuously evolving. As attack techniques and patterns change, previously gained information about how to tell them apart from normal traffic may no longer be valid. Thus, a desirable characteristic of an effective model for network anomaly detection is its ability to adapt to change and to generalize its behavior to multiple different network environments. In other words, a self-learning system is needed. This suggests adopting machine learning techniques to implement semi-supervised anomaly detection systems in which the classifier is trained with "normal" traffic data only, so that knowledge about anomalous behaviors can be constructed and evolve dynamically. For this purpose, we explored the effectiveness of a machine learning detection approach based on the Discriminative Restricted Boltzmann Machine, which combines the expressive power of generative models with good classification accuracy and can infer part of its knowledge from incomplete training data.
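For context, the Discriminative RBM (as introduced by Larochelle and Bengio) computes class probabilities in closed form: p(y | x) is proportional to exp(d_y + sum_j softplus(c_j + U[j][y] + sum_i W[j][i] x_i)), where softplus(z) = log(1 + exp(z)). A minimal sketch of that inference step in plain Python, with hypothetical parameter names and untrained toy weights; how the model is trained on normal-only traffic and how an anomaly threshold is chosen are not shown.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def drbm_class_probs(x, W, U, c, d):
    """p(y | x) for a Discriminative RBM with binary hidden units.

    W[j][i]: weight between hidden unit j and input i
    U[j][y]: weight between hidden unit j and class y
    c[j]: hidden bias, d[y]: class bias
    """
    scores = []
    for y in range(len(d)):
        s = d[y]
        for j in range(len(c)):
            s += softplus(c[j] + U[j][y] + sum(wji * xi for wji, xi in zip(W[j], x)))
        scores.append(s)
    # Softmax over classes (shift by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage with toy weights: 2 inputs, 1 hidden unit, classes 0 = normal, 1 = attack.
probs = drbm_class_probs(x=[1, 0], W=[[0.5, -0.2]], U=[[0.0, 2.0]], c=[0.0], d=[0.0, 0.0])
```

Because the hidden units are summed out analytically, classification needs no sampling, which keeps inference cheap at detection time.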
Article
In network intrusion detection, anomaly-based approaches in particular are difficult to evaluate, compare, and deploy accurately, a problem that originates from the scarcity of adequate datasets. Many such datasets are internal and cannot be shared due to privacy issues; others are heavily anonymized and do not reflect current trends, or they lack certain statistical characteristics. These deficiencies are the primary reasons why a perfect dataset does not yet exist. Thus, researchers must resort to datasets that are often suboptimal. As network behaviors and patterns change and intrusions evolve, it has become necessary to move away from static, one-time datasets toward dynamically generated datasets that not only reflect the traffic compositions and intrusions of their time but are also modifiable, extensible, and reproducible. In this paper, a systematic approach to generating the required datasets is introduced to address this need. The underlying notion is based on the concept of profiles, which contain detailed descriptions of intrusions and abstract distribution models for applications, protocols, or lower-level network entities. Real traces are analyzed to create profiles for agents that generate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. In this regard, a set of guidelines is established to outline valid datasets, which sets the basis for generating profiles. These guidelines are vital for the effectiveness of the dataset in terms of realism, evaluation capabilities, total capture, completeness, and malicious activity. The profiles are then employed in an experiment to generate the desired dataset in a testbed environment. Various multi-stage attack scenarios were subsequently carried out to supply the anomalous portion of the dataset. The intent of this dataset is to help researchers acquire datasets of this kind for testing, evaluation, and comparison purposes by sharing the generated datasets and profiles.
Article
The K-means clustering algorithm has been shown to be an effective method for intrusion detection systems. The particle swarm optimization (PSO) algorithm, an evolutionary computation technique based on swarm intelligence, has good global search ability. To address the weak global search ability of K-means clustering, we propose a K-means clustering algorithm based on particle swarm optimization (PSO-KM). The proposed algorithm avoids falling into local minima and has relatively good overall convergence. Experiments on the KDD CUP 99 data set show the effectiveness of the proposed method, including a higher detection rate and a lower false detection rate.
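A minimal sketch of the PSO-KM idea, assuming each particle encodes all k cluster centers as one flat vector and its fitness is the within-cluster sum of squared distances. This is a generic global-best PSO with an inertia weight; the coefficients (w = 0.7, c1 = c2 = 1.5) and the search bounds are illustrative assumptions rather than the paper's settings, and the function names are hypothetical.

```python
import random


def clustering_sse(flat_centers, points, k):
    """Within-cluster sum of squared distances for k centers packed into one flat vector."""
    d = len(points[0])
    centers = [flat_centers[c * d:(c + 1) * d] for c in range(k)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, ctr)) for ctr in centers)
               for p in points)


def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-10.0, 10.0), seed=0):
    """Global-best PSO. Returns (best_position, best_value). Positions are not clamped."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Usage: k = 2 centers in 2-D, so each particle has 4 coordinates.
pts = [(0, 0), (0, 2), (4, 4), (4, 6)]
centers, sse = pso_minimize(lambda v: clustering_sse(v, pts, 2), dim=4)
```

A common refinement, not shown here, is to hand the best particle's centers to ordinary K-means for a few final iterations.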
Article
Intrusion detection is a necessary step in identifying unusual access or attacks in order to secure internal networks. In general, intrusion detection can be approached with machine learning techniques. In the literature, advanced techniques based on hybrid learning or ensemble methods have been considered, and related work has shown that they are superior to models using single machine learning techniques. This paper proposes a hybrid learning model based on triangle area based nearest neighbors (TANN) to detect attacks more effectively. In TANN, k-means clustering is first used to obtain cluster centers corresponding to the attack classes. Then, the area of the triangle formed by two cluster centers and one data point from the given dataset is calculated, and these areas form a new feature signature of the data. Finally, the k-NN classifier is used to classify similar attacks based on the new features represented by the triangle areas. Using KDD-Cup ’99 as the simulation dataset, the experimental results show that TANN can effectively detect intrusion attacks and provides higher accuracy and detection rates, and a lower false alarm rate, than three baseline models based on support vector machines, k-NN, and a hybrid centroid-based classification model that combines k-means and k-NN.
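A minimal sketch of the TANN feature transform. The abstract does not say how the triangle areas are computed; Heron's formula is used here because it needs only pairwise Euclidean distances and so works in any number of dimensions. With five cluster centers (for example, normal traffic plus four attack categories), each record becomes 10 triangle-area features, one per pair of centers. Names are hypothetical, and the k-means and k-NN stages are not shown.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triangle_area(p, q, r):
    """Area of triangle pqr in any dimension, via Heron's formula."""
    a, b, c = euclidean(p, q), euclidean(q, r), euclidean(r, p)
    s = (a + b + c) / 2
    # Clamp at 0: rounding can make the product slightly negative for collinear points.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def tann_features(x, centers):
    """One triangle area per pair of cluster centers, each triangle closed by record x."""
    return [triangle_area(x, ci, cj) for ci, cj in combinations(centers, 2)]


# Usage: five hypothetical 3-D cluster centers give 10 features per record.
centers = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
features = tann_features((0.5, 0.5, 0.5), centers)
```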
Article
The popularity of the Internet brings with it the risk of network attacks. Intrusion detection is a major research problem in network security; its aim is to identify unusual access or attacks in order to secure internal networks. In the literature, intrusion detection systems have been approached with various machine learning techniques. However, no review paper has yet examined the current status of using machine learning techniques to solve intrusion detection problems. This chapter reviews 55 related studies published between 2000 and 2007 that focus on developing single, hybrid, and ensemble classifiers. The studies are compared by classifier design, datasets used, and other experimental setups. Current achievements and limitations in developing intrusion detection systems with machine learning are presented and discussed, and a number of future research directions are provided.
Conference Paper
An intrusion detection system (IDS) is an active defense technology. This paper focuses on intrusion detection based on clustering analysis, with the aim of improving the detection rate and decreasing the false alarm rate. A modified dynamic K-means algorithm, called MDKM, is proposed to detect anomalous activities, and corresponding simulation experiments are presented. First, the MDKM algorithm filters noise and isolated points from the data set. Second, by calculating the distances between all sample data points, it obtains high-density parameters and cluster-partition parameters, and a dynamic iterative process then determines the k cluster centers accurately; an anomaly detection model is built on this basis. The KDD CUP 1999 data set is used to test the performance of the model. The results show that the system has a higher detection rate and a lower false alarm rate, achieving the expected aim.
Article
From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new-generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequence analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.
Article
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
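A minimal sketch of one contrastive divergence (CD-1) update for a restricted Boltzmann machine with binary visible and hidden units, the kind of expert most often used in RBM-based intrusion detection. The update approximates the log-likelihood gradient by the difference between statistics on the data and statistics after a single Gibbs step. Using probabilities rather than binary samples for the reconstruction is a common practical choice assumed here; the function names and learning rate are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cd1_step(v0, W, b, c, lr=0.1, rng=random):
    """One CD-1 update, in place. W[i][j] links visible i to hidden j;
    b: visible biases, c: hidden biases. Returns the squared reconstruction error."""
    nv, nh = len(b), len(c)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = [sigmoid(b[i] + sum(h0[j] * W[i][j] for j in range(nh))) for i in range(nv)]
    ph1 = [sigmoid(c[j] + sum(pv1[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # Updates: <v h> under the data minus <v h> under the one-step reconstruction.
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(nh):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((v - p) ** 2 for v, p in zip(v0, pv1))


# Usage: fit a tiny RBM (4 visible, 3 hidden units) to one repeated binary pattern.
rng = random.Random(0)
W = [[rng.gauss(0, 0.01) for _ in range(3)] for _ in range(4)]
b, c = [0.0] * 4, [0.0] * 3
for _ in range(200):
    err = cd1_step([1, 1, 0, 0], W, b, c, rng=rng)
```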
A hybrid malicious code detection method based on deep learning
  • Yuancheng Li
  • Rong Ma
  • Runhai Jiao
The Applications of Deep Learning on Traffic Identification
  • Wang
A survey on secure network: intrusion detection & prevention approaches
  • Bijone