Description of UNSW-NB15 dataset

Source publication
Article
Full-text available
An intrusion detection system (IDS) plays an important role in network security by discovering and preventing malicious activities. Due to the complex and time-varying network environment, the network intrusion samples are submerged in a large number of normal samples, which leads to insufficient samples for model training and detection results with...

Context in source publication

Context 1
... partitioned dataset consists of 42 features with their corresponding class labels, which are normal and nine different attack types. The simulated attack categories and their detailed statistics are described in Table 3. ...

Similar publications

Article
Full-text available
Thanks to its ability to face unknown attacks, Anomaly-based Intrusion Detection is a key research topic in network security, and different statistical methods, fed by suitable traffic features, have been proposed in the literature. The choice of a proper dataset is a critical element not only for performance comparison, but also for the correct id...
Article
Full-text available
In recent years, the security industry has seen an exponential increase in cyber-attacks. These attacks have been effective in accomplishing their despicable goals. A secure network needs a robust intrusion detection scheme. Traditional machine learning approaches seem to be inefficient in the face of dynamic communication networks and various intr...
Article
Full-text available
The notorious attacks of the last few years have propelled cyber security to the top of the boardroom agenda, and raised the level of criticality to new heights. Therefore, building a secure system has become an important issue that cannot be delayed. In this paper, we propose an intrusion detection approach based on incremental long short-term mem...
Article
Full-text available
Intrusion Detection Systems (IDS) are critical components of cyber security because they recognize anomalous network traffic. Severe security concerns are currently posed to the Internet and computer networks. These threats always evolve and will eventually mutate into new and undiscovered varieties. We propose putting it into action for deep learn...
Article
Full-text available
The fast improvement of Machine-Learning (ML) methods gives rise to new attacks in Information System (IS). Simultaneously, ML also creates new opportunities for network intrusion detection. Early network intrusion detection is a valuable asset for IS security, as it fosters early deployment of countermeasures and reduces the impact of attacks on s...

Citations

... First, we lower the noise samples in the majority category using one-side selection (OSS). Then, we utilize the Synthetic Minority Over-sampling Technique (SMOTE) [59] to augment the minority samples. ...
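The SMOTE step mentioned in this snippet can be illustrated with a minimal sketch. It is a simplified, standard-library-only version of the interpolation idea, not the cited authors' implementation, and it does not cover the one-side selection (OSS) step. The function name `smote_like_oversample`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class grows without duplicating records exactly.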
Article
Full-text available
Computer networks face vulnerability to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process is based on the following steps: (1) training the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and greatly differs from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
... In Jiang et al. [19], a feature reduction approach based on the RF algorithm was implemented to create Feature Importance (FI) scores for every attribute in the UNSW-NB15 dataset. Based on their significance in the classification process w.r.t. ...
Article
Full-text available
Computer networks rely on Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) to ensure the security, reliability, and availability of an organization. In recent years, various approaches have been developed and implemented to create effective IDSs and IPSs. This paper specifically focuses on IDSs that utilize Machine Learning (ML) techniques for improved accuracy. ML-based IDSs have proven successful in discovering network attacks. However, their performance tends to decline when dealing with high-dimensional data spaces. To address this issue, it is essential to develop a suitable feature extraction strategy that can identify and remove irrelevant features that do not significantly affect the classification process. Additionally, many ML-based IDSs exhibit high false positive rates and poor detection accuracy when trained on unbalanced datasets. In this study, we analyze the UNSW-NB15 IDS dataset, which serves as the training and testing data for our models. In order to reduce the feature space and improve the efficiency of our analysis, we leverage a filter-based feature reduction method utilizing the Pearson correlation coefficient algorithm. By identifying and selecting only the most relevant features, we are able to streamline our dataset and focus on the variables that have the highest impact on our analysis. This approach not only reduces computational complexity but also improves the interpretability of our results by eliminating unnecessary noise from the data. After applying the feature reduction technique, we proceed to implement a range of machine learning methods to perform our classification task. These include well-known algorithms such as Stacking, Extra Trees, Multi-Layer Perceptron, XGBoost, K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, and Decision Tree.
By employing a diverse set of algorithms, we are able to explore different modeling approaches and evaluate their effectiveness in accurately classifying the various types of attacks. In order to assess the performance of our classification models, we utilize a range of evaluation metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R2-Score, Mean Squared Error (MSE), Precision, F1-Score, Recall, and Accuracy. These metrics provide us with a comprehensive understanding of how well our models are performing across different dimensions, including the accuracy of predictions, the level of precision in classifying different attack types, and the overall goodness-of-fit of our models. By considering multiple evaluation metrics, we are able to gain a more nuanced understanding of the strengths and weaknesses of each algorithm and make informed decisions about their suitability for our classification task. These metrics deliver a complete evaluation of the classifiers’ effectiveness in detecting network intrusions.
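A filter-based reduction of the kind described in this abstract can be sketched as follows. The abstract does not say whether the correlation is measured between features or against the class label, so this sketch assumes the feature-to-label variant; the threshold of 0.5, the helper names, and the toy columns are likewise illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def select_features(columns, labels, threshold=0.5):
    """Keep features whose absolute correlation with the label meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]


# Toy data: feat_a tracks the label, feat_b is constant, feat_c is weakly related.
columns = {
    "feat_a": [0.1, 0.2, 0.8, 0.9, 0.15, 0.85],
    "feat_b": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "feat_c": [0.3, 0.9, 0.4, 0.8, 0.7, 0.2],
}
labels = [0, 0, 1, 1, 0, 1]
selected = select_features(columns, labels)
```

On this toy input only `feat_a` survives the filter, since the constant and weakly correlated columns fall below the threshold.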
... The study on the use of machine learning algorithms for Intrusion Detection Systems (IDS) reveals a significant focus on evaluating the performance of these algorithms using datasets like UNSW-NB15. Studies such as Jiang et al. [2] and Vitorino et al. [3] have specifically utilized the UNSW-NB15 dataset to test the effectiveness of various classification algorithms for network intrusion detection. Jiang et al. [2] verified their network intrusion detection algorithm using the UNSW-NB15 dataset, achieving classification accuracies of 83.58% and 77.16%. ...
... Studies such as Jiang et al. [2] and Vitorino et al. [3] have specifically utilized the UNSW-NB15 dataset to test the effectiveness of various classification algorithms for network intrusion detection. Jiang et al. [2] verified their network intrusion detection algorithm using the UNSW-NB15 dataset, achieving classification accuracies of 83.58% and 77.16%. Vitorino et al. [3] developed models based on SVM, XGBoost, LightGBM, iForest, LOF, and a DRL model for IoT intrusion detection, showcasing the dataset's versatility in different contexts. ...
Article
Full-text available
Network intrusion detection is a critical aspect of cybersecurity, aiming to distinguish between normal and malicious network activities. This study evaluates the performance of various machine learning algorithms on the UNSW-NB15 dataset for binary classification of network traffic into normal and attack categories. We employed several preprocessing steps, including handling missing values, encoding categorical features, and addressing class imbalance using a mix of Synthetic Minority Over-sampling Technique (SMOTE) and undersampling. The models evaluated include k-Nearest Neighbors (k-NN), Naive Bayes, Logistic Regression, Support Vector Machines (SVM), and Neural Networks. Our experimental results show that complex models like Neural Networks and SVMs significantly outperform simpler models. The Neural Network model achieved the highest accuracy of 92%, with a precision of 91%, recall of 93%, and an F1-score of 92%. SVM also performed robustly with an accuracy of 90%. Simpler models, while less effective, still achieved respectable performance, with Logistic Regression and k-NN reaching accuracies of 88% and 85%, respectively. The study highlights the importance of comprehensive preprocessing and the implementation of advanced machine learning techniques for effective network intrusion detection. The results suggest that while complex models offer superior detection capabilities, simpler models can still be valuable in resource-constrained environments. Future research should focus on applying these models to real-world data, exploring more advanced neural network architectures, and implementing cost-sensitive learning techniques to further enhance detection performance and efficiency.
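As a quick consistency check on the figures in this abstract, the F1-score can be recomputed from the reported precision and recall. This sketch uses the standard harmonic-mean definition; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the Neural Network model.
f1 = f1_score(precision=0.91, recall=0.93)
print(round(f1, 2))  # 0.92
```

The result agrees with the 92% F1-score stated for the Neural Network model.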
... Wang et al. [27] developed a prediction model for network flow, combining deep learning with ensemble learning strategies. Various deep learning architectures, such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), have been employed to boost the effectiveness of IDS systems [28], [29]. Li et al. [30] proposed a method that combines LSTM to analyze temporal characteristics and CNN to assess spatial features for detecting HTTP malicious traffic in mobile networks. ...
Preprint
Full-text available
Distributed Denial of Service (DDoS) attacks pose an increasingly substantial cybersecurity threat to organizations across the globe. In this paper, we introduce a new deep learning-based technique for detecting DDoS attacks, a paramount cybersecurity challenge with evolving complexity and scale. Specifically, we propose a new dual-space prototypical network that leverages a unique dual-space loss function to enhance detection accuracy for various attack patterns through geometric and angular similarity measures. This approach capitalizes on the strengths of representation learning within the latent space (a lower-dimensional representation of data that captures complex patterns for machine learning analysis), improving the model's adaptability and sensitivity towards varying DDoS attack vectors. Our comprehensive evaluation spans multiple training environments, including offline training, simulated online training, and prototypical network scenarios, to validate the model's robustness under diverse data abundance and scarcity conditions. The Multilayer Perceptron (MLP) with Attention, trained with our dual-space prototypical design over a reduced training set, achieves an average accuracy of 94.85% and an F1-Score of 94.71% across our tests, showcasing its effectiveness in dynamic and constrained real-world scenarios.
... Next, we compare the performance of KOMIG to state-of-the-art IDSs such as CNN-BiLSTM IDS [78], RSHPO-DMN IDS [54], and the memory-augmented deep auto-encoder IDS, MemAE [79]. A comparison of the detection accuracies of the four models is illustrated in Figure 8. ...
Article
Full-text available
The security of communication networks can be compromised through both known and novel attack methods. Protection against such attacks may be achieved through the use of an intrusion detection system (IDS), which can be designed by training machine learning models to detect cyberattacks. In this paper, the KOMIG (knapsack optimization and mutual information gain) IDS was developed to detect network intrusions. The KOMIG IDS combined the strengths of optimization and machine learning together to achieve a high intrusion detection performance. Specifically, KOMIG IDS comprises a 2-stage feature selection procedure; the first was accomplished with a knapsack optimization algorithm and the second with a mutual information gain filter. In particular, we developed an optimization model for the selection of the most important features from a network intrusion dataset. Then, a new set of features was synthesized from the selected features and combined with the selected features to form a candidate features set. Next, we applied an information gain filter to the candidate features set to prune out redundant features, leaving only the features that possess the maximum information gain, which were used to train machine learning models. The proposed KOMIG IDS was applied to the UNSW-NB15 dataset, which is a well-known network intrusion evaluation dataset, and the resulting data, after optimization operation, were used to train four machine learning models, namely, logistic regression (LR), random forest (RF), decision tree (DT), and K-nearest neighbors (KNN). Simulation experiments were conducted, and the results revealed that our proposed KNN-based KOMIG IDS outperformed comparative schemes by achieving an accuracy score of 97.14%, a recall score of 99.46%, a precision score of 95.53%, and an F1 score of 97.46%.
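The second selection stage described here, a mutual information gain filter, can be illustrated with a small sketch for discrete features. It does not reproduce the KOMIG procedure (the knapsack stage and the feature synthesis step are omitted), and the function names and toy columns are assumptions.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(feature)
    px = Counter(feature)
    py = Counter(labels)
    pxy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, labels, k):
    """Rank features by mutual information with the label and keep the best k."""
    scores = {name: mutual_information(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: "proto" determines the label, "flag" is independent of it.
columns = {
    "proto": ["tcp", "tcp", "udp", "udp"],
    "flag": ["a", "b", "a", "b"],
}
labels = [1, 1, 0, 0]
best = top_k_features(columns, labels, k=1)
```

Here `proto` carries one full bit of information about the label and `flag` carries none, so the filter keeps `proto`.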
... Flow features, also called network-based features, are the major features in NIDS, especially in ML-based detection systems [58,59]. In order to utilize host-based features, two-dimensional feature data is flattened into a one-dimensional vector. ...
... In addition, neural networks have seen substantial advancements due to their improved predictive abilities [16] and deep neural network approaches for identifying malware. The authors employed a CNN (Convolutional Neural Network) [17] to obtain a hierarchy of feature descriptions using network call traces, and they claimed that the deep neural network techniques performed well for classifying malware [18]. The two primary stages of a system that detects intrusions are pre-processing the data and detecting the attacks. ...
Article
Full-text available
The detection of intrusions has a significant impact on providing information security, and it is an essential technology to recognize diverse network threats effectively. This work proposes a machine learning technique to perform intrusion detection and classification using multiple feature extraction and testing using an Extreme Learning Machine (ELM). The model is evaluated on two network intrusion datasets (NSL-KDD and UNSW-NB15), which consist of real-time network traffic. The arithmetic, gradient, and statistical features were extracted and evaluated with the proposed model. The method’s efficacy is assessed using accuracy, sensitivity, specificity, precision, and F1-score. The proposed method achieves 94.5%, 97.61%, 96.91%, 96.51%, and 97.05% accuracy, sensitivity, specificity, precision, and F1-score for NSL-KDD and 94.3%, 98.36%, 99.31%, 99.67%, and 99.01% accuracy, sensitivity, specificity, precision, and F1-score for the UNSW-NB15 dataset, respectively, outperforming other existing works.
... The specific problems in existing approaches in the detection of intrusions are specified below: Both hybrid and individual (signature-based and anomaly-based intrusion detection systems) still have major issues due to a lack of features and security. Several deep learning approaches have been proposed to precisely detect anomalies with the special characteristics of neural networks [37,38]. An intrusion detection system based on hybrid sampling with a deep hierarchical network was proposed in [37]. ...
... Several deep learning approaches have been proposed to precisely detect anomalies with the special characteristics of neural networks [37,38]. An intrusion detection system based on hybrid sampling with a deep hierarchical network was proposed in [37]. Initially, the balancing of data was executed by incorporating OSS and SMOTE-based hybrid sampling. ...
Article
Full-text available
With the increasing adoption of Internet of Things (IoT) networks, ensuring their security has become a critical concern due to resource limitations and the growing complexity of malicious attacks. Intrusion Detection and Prevention Systems play a pivotal role in safeguarding network performance, but traditional methods often struggle with attack severity and classifying unknown packets. In this research, we introduce the Attention-IDS model, a comprehensive solution comprising five stages: two-fold authentication, local density-based clustering, flow-based feature extraction, intrusion detection system (IDS), and intrusion severity detection. Leveraging IoT devices and user-based authentication, our model effectively detects and prevents unauthorized access attempts, while ensuring enhanced security through the utilization of the Combine Counter Mode algorithm on the blockchain. The IDS stage, powered by the Isolation Forest algorithm, accurately classifies features as normal, malicious, or unknown. Leveraging the proposed Attention-based ResNet model, our approach intelligently classifies unknown packets into normal and malicious categories, employing feature extraction, selection, and classification. Additionally, the Extended Kalman Filter determines intrusion severity, enabling network-wide notification alarms for frequent intrusions and targeted responses for rare intrusions. Extensive simulations using the NS3.26 network simulator demonstrate the superior performance of Attention-IDS compared to existing methods.
... Firstly, a single CNN or LSTM model may not fully exploit the information in complex network traffic data, resulting in low detection accuracy or ineffectiveness against specific types of attacks [26]. Secondly, due to the complexity and diversity of SDN environments, traditional single deep learning models may not adapt well to different network topologies and traffic patterns [27]. ...
... 3 Proposed methodology SDN-based networks utilize a centralized control plane, making them vulnerable to DoS/DDoS attacks and susceptible to single points of failure [27]. To mitigate the sophisticated cyber threats in networks, the development of a reliable IDS framework is required. ...
Article
Full-text available
Software-defined networking (SDN) is a new network paradigm, which is highly decoupled compared to traditional networks, and makes it easier to operate by separating the data and control planes of the network, promoting logical centralization of network control, and introducing the ability to program the network. Due to the feature of logically centralized control, an attack on the controller will lead to the paralysis of the entire network, so intrusion detection is particularly important for SDN. With the rise of artificial intelligence, machine learning and deep learning technologies have been applied in all aspects of life. Due to the advantages of high accuracy, light weight, and fast response speed, deep learning is beneficial for intrusion detection. However, the methods proposed at this stage are mainly concentrated on traditional networks, and they are often used to detect DDoS (Distributed Denial of Service) attacks only, so they cannot be applied to SDN directly to classify different attack types specific to this new network paradigm. In this work, we propose a hybrid Long Short Term Memory (LSTM)-based multi-class intrusion detection method, i.e., Convolutional Neural Network with Attention (CNNA)-BiLSTM, to detect 8 common intrusion types on the InSDN dataset. Firstly, a feature selection method is proposed for the high-dimensional data of SDN network data traffic to extract the positive features that are effective for model decision-making, reduce the misleading of the model by unfavorable and negative features, and decrease the computational cost. Secondly, a multi-class intrusion detection model based on multi-output nodes and hybrid BiLSTM with attention is proposed to improve the accuracy of the model for emerging detection. The proposed deep learning model provides a better classification result in two-class and multi-class problems compared with other methods.
It achieves an accuracy of 99.86% and 99.31% on two-class and multi-class scenarios, respectively. Moreover, our proposed model can accurately detect each category in multi-classification detection, while other standard models cannot detect Botnet, Web, and U2R attacks accurately because of their small sample scales.
... The VMIDS is identical to any of the three IDSs listed above, except it is run on a remote virtual machine instead of locally [8]. There is a low probability of false alarms with these sorts of systems, but the systems are not able to identify attacks that have not been previously recognized and for which no historical information exists [9][10]. To address this concern, a new machine learning-based framework is developed in this paper. ...
... Here the encoding function is a tangent activation function, applied with the weight matrix of the encoder and a bias vector. Further, the output is obtained by decoding the original information, which is mathematically represented in Eq. (10). Then, the objective is minimized for training the autoencoder, as defined in Eq. (11). ...
Research
Full-text available
Cloud computing and its applications are currently a popular area of research, in which intrusion detection has become an imperative mechanism for detecting security breaches. The main motivation of this research is to develop an automated intrusion detection system (IDS) for the detection of intrusions in cloud and Internet of Things (IoT) systems. After the acquisition of intrusion data from the NSL-KDD, CICIDS2017, Kyoto 2006+, and UNSW-NB15 databases, the Min-Max normalization approach is employed for data rescaling. Then, the relevant attributes/features are selected by the proposed Pareto optimality based grasshopper optimization algorithm (POGOA), where the selection of relevant features efficiently reduces the system's complexity and computational time. In the POGOA, the relevant features are selected based on Pareto optimal solutions, which help address the premature convergence and improve the distribution rate of the traditional GOA. Further, the selected features are given to the stacked autoencoder model for classifying the normal and attack classes. Experiments with the proposed POGOA with stacked autoencoder model are conducted in the MATLAB environment. The proposed model shows better performance in terms of precision, F1-score, accuracy, and recall compared with the comparative models. The proposed POGOA with stacked autoencoder model obtained detection accuracies of 99.32%, 99.84%, 99.56%, and 97.24% on the CICIDS2017, NSL-KDD, Kyoto 2006+, and UNSW-NB15 databases.
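The Min-Max normalization step mentioned in this abstract rescales each feature column to a fixed range, commonly [0, 1]. A minimal sketch, assuming a single numeric column and a hypothetical helper name (the abstract's own experiments were run in Matlab, so this is only an illustration of the formula):

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant column: avoid division by zero
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


# Toy feature column (made-up values).
durations = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(durations)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Rescaling in this way keeps features with large numeric ranges from dominating distance-based or gradient-based models.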