Intrusion-Based Attack Detection Using Machine Learning Techniques for Connected Autonomous Vehicle

Mansi Bhavsar1, Kaushik Roy2, Zhipeng Liu3, John Kelly4, Balakrishna Gokaraju5
1 North Carolina A&T State University, Greensboro, NC, USA
mhbhavsar@aggies.ncat.edu
2 North Carolina A&T State University, Greensboro, NC, USA
kroy@ncat.edu
3 North Carolina A&T State University, Greensboro, NC, USA
zilu2@aggies.ncat.edu
4 North Carolina A&T State University, Greensboro, NC, USA
jck@ncat.edu
5 North Carolina A&T State University, Greensboro, NC, USA
bgokalraju@ncat.edu
Abstract. With advancements in technology, ensuring the security of self-driving cars has become
an important issue. Hackers have been developing increasingly complex and harmful cyberattacks
that are difficult to detect. Furthermore, due to the diversity of the data exchanged amongst
these vehicles, traditional algorithms have difficulty detecting such threats. A network
intrusion detection system is therefore essential in a connected autonomous vehicle's
communication infrastructure. The intrusion detection system (IDS) aims to secure the network by
identifying malicious and abnormal traffic in real time. This paper focuses on the data
preprocessing, feature extraction, and attack detection for such a system.
It also compares the performance of the proposed IDS across different machine learning models.
We apply Logistic Regression (LR), Linear Discriminant Analysis (LDA), K Nearest Neighbors (KNN),
Classification and Regression Tree (CART), and Support Vector Machine (SVM) to classify the
NSL-KDD dataset. Binary and multiclass classification were applied to both the train and test
files. The KNN and CART algorithms reached about 98% and 94% accuracy on the train and test
files, respectively.
Keywords: Machine learning, autonomous vehicle, cyberattacks, intrusion, data preprocessing,
feature engineering, ML model, accuracy.
1 Introduction
Advances in machine learning (ML) and deep neural networks (DNN) bring colossal potential to the
research and development needed to make self-driving cars a reality. Technology advancements on
both the software and hardware sides open new doors for applications in many domains. Many
organizations, such as Ford, Toyota, NVIDIA, and NCA&T, are in the race to develop safe and
secure autonomous cars. Because these vehicles rely on a communication system, they expose a
larger threat surface that malicious hackers can use to exploit system vulnerabilities.
Connected autonomous vehicles (CAV) are a transformative technology with growing potential as a
research area. They help reduce traffic congestion and accidents, improve efficiency, and
improve the quality of vehicular systems. Moreover, technologies such as ML, big data, IoT, and
the sharing economy extensively benefit intelligent cities [1].
Autonomous cars are vulnerable to different security threats. Network security is an important
topic, and an intrusion detection system (IDS) can help mitigate network threats without
disrupting the safety and security of the host and the network. IDSs are increasingly
implemented with ML techniques. They can be grouped into host-based and network-based IDSs,
according to their placement in the network [2]. There are two types of detection: misuse
detection and anomaly detection. Misuse detection identifies known attacks, whereas anomaly
detection identifies abnormal behavior. ML models are used to build anomaly-based detection
systems, and they also assist in feature engineering. This paper uses an existing labeled
dataset (NSL-KDD) [3] to evaluate an anomaly-based IDS that mitigates threats and attacks. The
dataset allows researchers to compare different IDS methods and to build either a host-based or
a network-based IDS. We apply different ML algorithms and present a comparative analysis.
The rest of the paper is organized as follows. Section 2 covers the related work. Section 3
describes the methodology, including data description, data preprocessing, and feature
engineering. Section 4 presents results and performance metrics. Section 5 gives the
conclusion, and Section 6 outlines the future scope.
2 Literature Review
Cyber threats have become a critical issue with the emergence of self-driving vehicles, and
systems are required to keep connected vehicles safe and secure. The NSL-KDD dataset [3] is a
refined version of the KDDcup99 data set [4]. Many analyses have applied different techniques
and tools to develop an effective intrusion detection system. Various machine learning
techniques are implemented in detail with the WEKA tool in [5]. A detailed description of the
dataset is given in [6].
Redundant records bias the learning process, which might be one reason why a specific
classifier shows an accuracy above 95% [7,8]. In [9], results show that machine learning
algorithms do not produce good results for misuse detection. In [10], the authors compared
supervised ML classifiers for intrusion detection in a network. The most efficient algorithm was
selected using performance metrics, and the authors concluded that Random Forest performs
better than the other classifiers. The authors in [11] proposed a lightweight IDS method that
focuses on data preprocessing so that only essential features are used. They removed the
redundant data from the dataset, which helped them get unbiased results with machine learning
models. Different feature selection techniques, such as wrapper or embedded feature selection,
have been used to improve the results [12]. Correlation-based feature selection filter methods
have also been used; they verify the model's effectiveness in terms of the detection rate while
keeping a low false-positive rate on a full attack scenario [13]. The IDS in [14] uses
supervised machine learning models to distinguish normal from abnormal traffic. The proposed
method only classifies Denial of Service (DoS) and probe attacks; the remaining attack types
are not considered.
The authors of [15] proposed anomaly intrusion detection using an improved self-adaptive
Bayesian algorithm to process a large amount of data. The authors of [16] proposed an intrusion
detection method using a support vector machine and used a feature removal method to improve
the efficiency of the algorithm.
3 Methodology
3.1 Dataset Used
This project uses the NSL-KDD dataset [3], which has 42 attributes. It is an improved version of
the KDD99 dataset [17], a standard dataset for intrusion detection. Several versions of the
dataset are available; we use KDDTrain+ and KDDTest+ as the training and testing data, which
contain 125973 and 22544 instances, respectively.
The dataset contains network attacks relevant to the autonomous vehicle, with 24 attack types
in the training file and an additional 14 attack types that appear only in the test file. As a
result, the two files, KDD_train.csv and KDD_test.csv, are not drawn from the same probability
distribution, which makes the task more realistic. Moreover, some intrusion experts believe that
most novel attacks are variants of known attacks, so knowledge of known attacks can be
sufficient to catch the novel variants.
This is a classification problem. The dataset description is given in Figure 1, which provides
the total instances of both files. The number of instances per attack class in each file is
shown in Table 1.
Table 1 Detailed instances in the dataset
NSL-KDD     Total instances   Normal   DoS     Probe   R2L    U2R
KDD_train   125973            67343    45927   11656   995    52
KDD_test    22544             9711     7460    2885    2421   67
The Normal class contains traffic with no recorded attack, and the other four classes contain
the documented attack traffic. A Distributed Denial of Service (DDoS) attack is a malicious
attempt to disrupt the traffic of a targeted server [18]. The four attack classes are described
below:
- DoS (denial of service): the attacker overloads the server with more requests than it can
  handle.
- Probe: the attacker scans the network to find and misuse a known vulnerability.
- R2L (remote-to-local): the attacker tries to gain local access to unauthorized information by
  sending packets to the victim machine.
- U2R (user-to-root): the attacker starts from a standard user account and exploits system
  vulnerabilities to gain root access to the system.
3.2 Data Preprocessing
Preprocessing is vital before applying ML techniques to any given dataset. The less essential
attributes in the dataset do not affect the accuracy of the classifier we want to use. This
section provides the complete preprocessing steps for the two files of the NSL-KDD dataset [3].
Preprocessing was carried out in Python, in a Jupyter notebook launched from Anaconda
Navigator. The same methods as in [19] were used to preprocess both files.
Preprocessing contains:
- Load the dataset and analyze its statistics.
- Change the sub-attack labels to their respective classes.
- Check for missing values in the dataset.
- Check for outliers.
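The first three steps can be sketched as follows. This is a minimal illustration, not the authors' notebook: the small in-memory frame stands in for the KDD_train file, and the column names (`duration`, `src_bytes`, `label`) and the partial `ATTACK_CLASS` mapping are assumptions made for the example.

```python
import pandas as pd

# Hypothetical, partial mapping from NSL-KDD sub-attack labels to the
# five classes used in the paper; a full mapping would list every label.
ATTACK_CLASS = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each sub-attack label mapped to its class."""
    out = df.copy()
    out["attack_class"] = out["label"].map(ATTACK_CLASS)
    return out


# Toy stand-in for the real file (assumed column names).
df = pd.DataFrame({
    "duration": [0, 0, 2, 0, 5],
    "src_bytes": [181, 0, 12, 0, 300],
    "label": ["normal", "neptune", "ipsweep", "guess_passwd", "rootkit"],
})

clean = preprocess(df)
print(clean.describe())                      # summary statistics
print(clean.isnull().sum())                  # missing values per column
print(clean["attack_class"].value_counts())  # instances per class
```

A label missing from the mapping would show up as a missing value in `attack_class`, so the missing-value check also catches an incomplete mapping.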
After performing the above steps, we found that the dataset contains no missing values but does
contain outliers, as the paired box plots in Figure 1 show for different ranges (a and b). The
IQR (interquartile range) method was therefore used to remove the outliers, and the
box-and-whisker plot in Figure 2 shows the attributes after outlier removal. However, the
cleaned data has shape (40118, 42), which means the dataset loses more than half of its
information. This is not acceptable, because the model overfits if we keep the cleaned dataset
(we checked this by applying the spot-check algorithms, which gave an accuracy of 0.99).
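As an illustration of the IQR method, the sketch below drops every row that has a value outside the usual fences Q1 - 1.5*IQR and Q3 + 1.5*IQR. The multiplier 1.5, the function name `remove_outliers_iqr`, and the toy columns are assumptions; the paper does not state the exact settings it used.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR]
    for every listed column."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons align the per-column bounds with the frame's columns.
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df[inside]


# Toy data: one extreme duration value among otherwise small values.
toy = pd.DataFrame({
    "duration": [0, 1, 2, 1, 0, 500],
    "src_bytes": [100, 120, 110, 105, 115, 108],
})

cleaned = remove_outliers_iqr(toy, ["duration", "src_bytes"])
print(toy.shape, "->", cleaned.shape)
```

When a feature is zero for most rows, its IQR is 0 and every nonzero value is flagged, which is one way a filter like this can discard a large share of a dataset.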
Figure 1 Paired Box Plot, panel (a): range 0-25
Figure 2 Box and Whisker plot
The histogram (Figure 3) helped in examining the data itself. Many attributes (such as duration,
host_srv_rate, serror_rate, …) have most of their values concentrated in one class, whereas the
difference between the two classes is not distinct. (Where the difference is very large, the
IQR method treats the highest values as outliers relative to that one class, which means it
drops valuable data points from the dataset.) It is therefore not worth removing the outliers
from the dataset, and we keep the original data for our use case.
Preprocessing steps (continued):
- The sub-attack labels are converted into five classes for multiclass classification, for both
  training and testing.
- For binary classification, the attack labels are changed into two categories, normal and
  abnormal (attack), with the help of a label encoder.
- For multiclass classification, the attack labels are changed into five categories (Normal,
  R2L, Probe, U2R, and DoS) with the help of a label encoder.
Figure 3 Histogram plot
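The two encodings can be sketched with scikit-learn's `LabelEncoder`. The class list below is toy data, and the string labels `"normal"` and `"abnormal"` are assumptions made for the example.

```python
from sklearn.preprocessing import LabelEncoder

# Toy class labels, already mapped from sub-attack names to five classes.
classes = ["Normal", "DoS", "Probe", "R2L", "U2R", "Normal", "DoS"]

# Binary target: everything that is not Normal counts as abnormal.
binary_labels = ["normal" if c == "Normal" else "abnormal" for c in classes]
binary_encoder = LabelEncoder()
y_binary = binary_encoder.fit_transform(binary_labels)

# Multiclass target: one integer code per class.
multi_encoder = LabelEncoder()
y_multi = multi_encoder.fit_transform(classes)

print(list(binary_encoder.classes_), list(y_binary))
print(list(multi_encoder.classes_), list(y_multi))
```

`LabelEncoder` assigns integer codes in sorted label order, so the code given to a class depends on how the labels are spelled and need not match the class numbering used in the paper.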
3.3 Feature Engineering
Feature engineering is a crucial step to improve the performance of ML techniques.
Feature selection is performed with the Pearson correlation method [19], a
standard filter method for selecting the essential features of a dataset. According
to [20], a correlation coefficient below 0.2 is considered negligible.
We selected a threshold of 0.5 to extract the features with moderate to high
correlation. Using the correlation matrix, the attributes whose correlation with the
encoded attack label exceeds 0.5 are selected for binary and multiclass
classification.
Figure 4 shows the features selected in this way, that is, those with a correlation greater
than 0.5. The same procedure was followed for both the train and test files.
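A correlation filter of this kind can be sketched as below. The function name `select_by_correlation`, the toy feature names, and the use of the absolute correlation value (so that strongly negative features are also kept) are assumptions; the paper only states the 0.5 threshold.

```python
import pandas as pd


def select_by_correlation(df: pd.DataFrame, target: str, threshold: float = 0.5):
    """Return the features whose absolute Pearson correlation with the
    encoded target column exceeds the threshold."""
    corr = df.corr(method="pearson")[target].drop(target)
    return corr[corr.abs() > threshold].index.tolist()


# Toy data: two features track the label closely, one does not.
toy = pd.DataFrame({
    "feat_strong":   [1, 2, 1, 2, 8, 9, 8, 9],
    "feat_weak":     [1, 2, 1, 2, 1, 2, 1, 2],
    "feat_negative": [9, 8, 9, 8, 2, 1, 2, 1],
    "label":         [0, 0, 0, 0, 1, 1, 1, 1],
})

selected = select_by_correlation(toy, target="label", threshold=0.5)
print(selected)
```

Running the same function once with the binary label and once with the multiclass label as `target` would give one feature set per task.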
Figure 4 Feature selection with the correlation method for train and test files
4 Results and Discussion
We applied logistic regression (LR), support vector machine (SVM), K-nearest neighbor (KNN),
classification and regression tree (CART), and linear discriminant analysis (LDA) to the
modified version of the NSL-KDD dataset. These are standard machine learning models, each with
its own advantages and disadvantages, and how well each one works depends on the use case and
the data type. The accuracy of the models is shown in Tables 2 and 3, and the model comparison
in Figures 5 and 6, for the train and test files, respectively.
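A spot-check of the five classifiers with the 10-fold cross-validation mentioned in the conclusion could look like the sketch below. It runs on synthetic data from `make_classification` rather than NSL-KDD, and the default hyperparameters, the stratified and shuffled folds, and the random seed are assumptions, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected NSL-KDD features and binary label.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=7)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(),
}

# Each of the 10 folds is held out once while the other nine train the model.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f} "
          f"(std {fold_scores.std():.3f})")
```

The ten per-fold accuracies collected for each model are the kind of data a box plot comparison of models is drawn from.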
Table 2 Train accuracy (%) for binary and multiclass classification
Model   Binary   Multiclass
LR      96.95    94.43
LDA     96.68    93.14
KNN     98.52    98.25
CART    98.45    98.25
SVM     97.24    95.91
Figure 5 Train data box plot for model comparison of binary and multiclass classification
Figure 6 Test data box plot for model accuracy of binary and multiclass classification
Table 2 gives the training results for binary and multiclass classification. On this
classification problem, the CART and KNN algorithms give higher accuracy than LR and LDA. The
box plots in Figures 5 and 6 compare the accuracy of the algorithms for binary and multiclass
classification. For multiclass classification on the test file, the accuracies of the LR and
SVM algorithms were increased by tuning their parameters. Different tolerance values and
solvers from the scikit-learn documentation were tried for LDA, but they did not improve the
accuracy.
Table 3 Test accuracy (%) for binary and multiclass classification
Model   Binary   Multiclass
LR      92.45    90.16
LDA     92.45    87.76
KNN     94.98    94.84
CART    94.47    95.78
SVM     92.00    91.02
Table 3 gives the test results for binary and multiclass classification. As on the training
data, the CART and KNN algorithms give higher accuracy than LR and LDA. The model therefore
predicts the correct attack classes with about 94% overall accuracy when the CART and KNN
algorithms are used.
Performance metrics check the model's performance and behavior. They describe whether the model
performs well, that is, whether it predicts correctly on the test data. Several performance
metrics were checked, and the results on the test and train data files are given in Tables 4
and 5 for binary and multiclass classification, respectively.
Table 4 Performance metrics for binary classification
Metric             KDD_test   KDD_train
Precision          0.875      0.967
Recall             0.942      0.973
Accuracy           0.917      0.967
False alarm rate   0.05803    0.0270
Table 5 Performance metrics for multiclass classification
Tables 4 and 5 show that, for binary classification, our model reaches an accuracy of 91.7% on
the test file and 96.7% on the train file, with a recall of 94.2% on the test file and 97.3% on
the train file. CART, KNN, and SVM perform better, as they are commonly preferred classifiers
for classification problems. Each of them achieves high precision, which is consistent with the
relatively low false alarm rate. The results follow a similar pattern for multiclass
classification.
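The four reported metrics can be computed from confusion-matrix counts as sketched below. The paper does not give its formulas, so the usual definitions are assumed here: attack traffic is treated as the positive class, and the false alarm rate is taken to be the false-positive rate FP / (FP + TN). The counts are made-up numbers, not results from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, accuracy, and false alarm rate from
    confusion-matrix counts, with attack traffic as the positive class."""
    return {
        "precision": tp / (tp + fp),            # flagged traffic that is attack
        "recall": tp / (tp + fn),               # attacks that were flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_alarm_rate": fp / (fp + tn),     # normal traffic wrongly flagged
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For the multiclass case, one common choice is to compute these values per class (one class against the rest) and average them; whether the paper did so is not stated.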
Performance metrics for multiclass classification
Metric             KDD_test   KDD_train
Precision          0.898      0.952
Recall             0.900      0.961
Accuracy           0.900      0.961
False alarm rate   0.1572     0.0933
5 Conclusion
This paper explains step by step how ML algorithms are applied to a dataset relevant to CAVs.
The procedure consists of data preprocessing, feature extraction, and training machine learning
classifiers. The trained models are then used to predict intrusion-based attacks on a
self-driving car. Finally, the results were studied to predict the DDoS attack with binary and
multiclass classification. The accuracy of the models is comparable for multiclass and binary
classification (approximately 94 to 98%).
Furthermore, feature selection helps to reduce the training and testing times. Testing is
performed with 10-fold cross-validation, where each fold is used once for testing and nine
times for training. The results show that the proposed preprocessing and feature selection
method delivers excellent model accuracy.
6 Future Scope
The model results are reasonable for both cases. However, there is a gap in the number of
instances per attack class. Looking back at the attack label counts per class, the dataset is
imbalanced for the U2R attack (class 3, with 52 instances), which has far fewer instances than
the other classes. This may be one of the reasons why the accuracy varies, or why it is lower
for multiclass classification than for binary classification. Furthermore, removing the third
class (U2R), which has the fewest instances, may show only a limited difference. In addition,
because the classes do not differ strongly from one another, removing the third class alone
may not improve the results by a meaningful amount. Moreover, given the mechanisms of attacks
on IoT systems, external features are being evaluated and need to be considered. A simple
anomaly technique can pick up a data spike when future attacks occur. Thus, we believe that our
model is capable of detecting such intrusions. In the future, we will explore dataset
improvement techniques (such as interpolation) to check whether they help the prediction model
or reduce data redundancy and vulnerability to novel attacks. We will also consider various
outlier handling techniques.
Acknowledgment
This research is supported by Palo Alto Networks. Any opinions, findings, conclusions,
or recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of Palo Alto Networks.
References
1. Bimbraw K: Autonomous Cars: Past, Present, and Future. A Review of the Developments in the
Last Century, the Present Scenario, and the Expected Future of Autonomous Vehicle Technology,
In: Proceedings of the 12th International Conference on Informatics in Control, Automation and
Robotics, ICINCO, pp. 191-198, (2015).
2. Ieracitano C, Adeel A, Morabito F C, Hussain A, A novel statistical analysis and autoencoder
driven intelligent intrusion detection approach, Neurocomputing, 387, pp. 51-62, doi:
https://doi.org/10.1016/j.neucom.2019.11.016, (2020).
3. M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, A Detailed Analysis of the KDD CUP 99 Data
Set, Submitted to Second IEEE Symposium on Computational Intelligence for Security and Defense
Applications, CISDA, (2009).
4. Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, A Detailed Analysis of the
KDD CUP 99 Data Set, In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence
in Security and Defense Applications, CISDA, (2009).
5. S. Revathi, Dr. A. Malathi, A Detailed Analysis on NSL-KDD Dataset Using Various Machine
Learning Techniques for Intrusion Detection, International Journal of Engineering Research &
Technology (IJERT), ISSN: 2278-0181, Vol. 2, Issue 12, (December 2013).
6. L. Dhanabal, Dr. S.P. Shantharajah, A Study on NSL-KDD Dataset for Intrusion Detection
System Based on Classification Algorithms, International Journal of Advanced Research in
Computer and Communication Engineering, Vol. 4, Issue 6, (June 2015).
7. S. Revathi, Dr. A. Malathi, A detailed analysis of KDD cup99 Dataset for IDS, International
Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 12, (December 2013).
8. R. P. Lippmann, D. J. Fried, and I. Graf, Evaluating intrusion detection systems: The 1998
DARPA off-line intrusion detection evaluation, In: Proceedings of the 2000 DARPA Information
Survivability Conference and Exposition, DISCEX’00, (2000).
9. Maheshkumar Sabhnani, Gursel Serpen, Why Machine Learning Algorithms Fail in Misuse Detection
on KDD Intrusion Detection Dataset, ACM Transactions on Intelligent Data Analysis, pp. 403-415,
(2004).
10. Y. Hamid, M. Sugumaran, and V. R. Balasaraswathi, IDS Using Machine Learning - Current State
of Art and Future Directions, Current Journal of Applied Science and Technology, 15, 3,
pp. 1-22, doi:10.9734/BJAST/2016/23668, (March 2016).
11. J. Manjula C. Belavagi and Balachandra Muniyal, Performance Evaluation of Supervised Machine
Learning Algorithms for Intrusion Detection, 25th International Multi-Conference on Information
Processing, (2016).
12. Yasmen Wahba, Ehab Elsalamouny, Ghada Eltaweel, Improving the performance of multiclass
intrusion detection systems using feature reduction, Research gate, article, (June 2015).
13. Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani, Toward Generating a New
Intrusion Detection Dataset and Intrusion Traffic Characterization, 4th International
Conference on Information Systems Security and Privacy, ICISSP, Portugal, (January 2018).
14. P. Sangkatsanee, N. Wattanapongsakorn, C. Charnsripinyo, Practical Real-Time Intrusion
Detection Using Machine Learning Approaches, Computer Communications, vol. 34, no. 18,
pp. 2227-2235, (2011).
15. D. M. Farid and M. Z. Rahman, Anomaly network intrusion detection based on improved
self-adaptive Bayesian algorithm, Journal of Computers, vol. 5, no. 1, pp. 23-31, (2010).
16. Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, An Efficient Intrusion Detection System
Based on Support Vector Machines and Gradually Feature Removal Method, Expert Systems with
Applications, vol. 39, no. 1, pp. 424-430, (2012).
17. The NSL-KDD dataset from the Canadian Institute for Cybersecurity (an updated version of the
original KDD Cup 1999 Data (KDD99)), https://www.unb.ca/cic/datasets/nsl.html
18. Iman Sharafaldin, Arash Habibi Lashkari, Saqib Hakak, Ali A. Ghorbani, Developing Realistic
Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy, IEEE 53rd International
Carnahan Conference on Security Technology, Chennai, India, (2019).
19. https://github.com/abhinav-bhardwaj/Network-Intrusion-Detection-Using-Machine-Learning/blob/master/README.md
20. M. M. Mukaka, Statistics Corner: A guide to the appropriate use of Correlation coefficient
in medical research, Malawi Med. J., vol. 24, September, pp. 69-71, (2012).
... It has been recommended that NSL-KDD (Bhavsar et al., 2022;Louk and Tama, 2023) be used to address some of the inherent issues with the KDD'99 data set. Due to the dearth of publicly available data sets for network-based IDS (Bhavsar et al., 2022), even though this updated version of the KDD data set still has some of the issues raised by McHugh and may not be a perfect representation of real-world networks, we still think it can be used as a useful benchmark data set to aid researchers in comparing various IDS approaches. ...
... It has been recommended that NSL-KDD (Bhavsar et al., 2022;Louk and Tama, 2023) be used to address some of the inherent issues with the KDD'99 data set. Due to the dearth of publicly available data sets for network-based IDS (Bhavsar et al., 2022), even though this updated version of the KDD data set still has some of the issues raised by McHugh and may not be a perfect representation of real-world networks, we still think it can be used as a useful benchmark data set to aid researchers in comparing various IDS approaches. Similar to KDDcup99, NSL-training KDD's set consists of about 1,074,992 single linkage vectors, each of which has 41 characteristics. ...
... The creation of a new data set known as the NSL-KDD data set was prompted by the inherent issues with the KDD (Bhavsar et al., 2022) data set. Many issues, such as duplicate instances, have been resolved with this fresh data collection. ...
Article
Full-text available
IoT devices generate enormous amounts of data, which deep learning algorithms can learn from more effectively than shallow learning algorithms. The approach for threat detection may ultimately benefit fog computing or fog networking (fogging). The authors present a cutting-edge distributed DL method for detecting cyberattacks and vulnerability injection (CAVID) in this paper. In terms of the evaluation metrics tested in the tests, the DL model performs better than the SL models. They demonstrated a distributed DL-driven fog computing CAVID approach using the open-source NSL-KDD dataset. A pre-trained SAE was utilised for feature engineering, whereas Softmax was employed for categorization. They used parametric evaluation for system assessment to evaluate the model in comparison to SL techniques. For scalability, accuracy across several worker nodes was taken into consideration. In addition to the robustness, effectiveness, and optimization of distributed parallel learning among fog nodes for enhancing accuracy, the findings demonstrate DL models exceeding classic ML architectures.
... Thakkar et al. [39] conducted a comprehensive survey focusing on machine learning and deep learning methods employed in intrusion detection systems for the Internet of Things (IoT). In one of our previous papers [40], we focused on the importance of IDS in CAVs. Using benchmarking datasets, we build the IDS using 5 different Machine learning (ML) techniques. ...
Article
Full-text available
A federated learning-based intrusion detection system (FL-IDS) is introduced in this paper to enhance the security of vehicular networks in the context of IoT edge device implementations. The FL-IDS system protects data privacy by using local learning, where devices share only model updates with an aggregation server. This server then generates an enhanced detection model. The FL-IDS system also incorporates machine learning (ML) and deep learning (DL) classifiers, namely logistic regression (LR) and convolutional neural networks (CNN), to prevent attacks in transportation IoT environments. The performance of the proposed IDS was evaluated using two different datasets, NSL-KDD and Car-Hacking. The model evaluation has been evaluated based on the accuracy and loss parameters. The results showthat the FL-IDS system outperforms traditional centralized machine learning and deep learning approaches regarding accuracy and privacy protection.
... The study by [13] presents an approach to fortify selfdriving cars against complex cyber threats. Their focus involves preprocessing data, feature extraction, and attack detection using various machine learning models such as logistic regression (LR), linear discriminant analysis(LDA), K-nearest neighbors(KNN), classification and regression trees (CART), SVM on the NSL-KDD dataset, attaining notable accuracies of 94% and 98% for training and testing datasets, respectively, stands out, particularly when employing KNN and CART algorithms. ...
Conference Paper
Cybersecurity poses a growing threat to technology infrastructure, especially raising concerns for automobile technology. Modern vehicles, reliant on connectivity, face a critical challenge in safeguarding their in-vehicle networks from cyber-attacks. Although the Controller Area Network is a standard for in-vehicle networks, its lack of security features exposes vehicles to vulnerabilities. Especially, the use of AI for offensive purposes further intensifies the threat to automotive technology infrastructure to protect it from cyber-attacks. This study proposes an approach to enhance in-vehicle network security, employing machine learning algorithms to protect against both AI-based generated attacks and traditional attacks. The Conditional Tab-ular Generative Adversarial Network (CTGAN) is utilized to generate in-vehicle network traffic, which is then combined with benchmark in-vehicle network traffic to create a complex and diverse scenario. The Random Forest model achieved a significant accuracy score for both benchmark in-vehicle network traffic and AI-based traffic, with scores of 0.93 and 0.89, respectively.
... This paper aims to bridge this gap by proposing a Federated-LSTM approach tailored for network security in Intelligent Connected Vehicles. Bhavsar, M et al [24] delves into utilizing machine learning techniques for detecting intrusion-based attacks in CAVs. The specific focus on intrusion-based attacks targeting CAVs remains a relatively less-explored area in the literature. ...
Article
Full-text available
Today’s Internet of Vehicles (IoV) has soared by leveraging data gathered from transportation systems, yet it grapples with security concerns stemming from network vulnerabilities, exposing it to cyber threats. This study proposes an innovative method to anticipate anomalies and exploit IoV services related to road traffic. Using the Unceasement Conditional Random Field Dynamic Bayesian Network Model (U-CRF-DDBN), this approach predicts the impact of network attacks, strategically managing vulnerable nodes and attackers. Through experimentation and comparisons with existing methods, our model demonstrates its effectiveness in mitigating IoV vulnerabilities. The U-CRF-DDBN strikes a superior balance, outperforming other approaches in intrusion detection for Internet of Vehicles systems. Evaluating its performance on the NSL-KDD dataset reveals a promising average Detection Rate of 93.512% and a low False Acceptance Rate of 0.125% for known attacks, highlighting its robustness. However, with unknown attacks, while the Detection Rate remains at 74.157%, there is an increased FAR of 16.47%, resulting in a slightly lower F1-score of 0.822.
Chapter
The rapid development of urbanization in China has brought many serious “urban diseases”, especially in the field of transportation. The situation and contradictions are becoming increasingly severe. For this reason, many intelligent transportation systems (ITS) based on various sensors (such as radio frequency, electromagnetic induction, infrared, and video processing) have been born in combination with the current research of various high and new technologies, especially the sensor technology of the Internet of things. Double car merging is a common problem in traffic. To solve this problem, intelligent vehicles should be able to detect other vehicles and merge with them at the right time. Electromagnetic induction (EMI) has been used as a means of detecting the presence of another vehicle and its distance from the current position of the vehicle. However, there are some problems in the use of electromagnetic interference. This paper presents an intelligent system based on image processing technology. The system can overcome these problems and provide accurate information about the position of other vehicles and the distance between them and the current position of the first car or truck.KeywordsImage processingIntelligent vehicleElectromagnetic inductionDouble car merging
Article
Full-text available
Intrusion detection system plays an important role in network security. Intrusion detection model is a predictive model used to predict the network data traffic as normal or intrusion. Machine Learning algorithms are used to build accurate models for clustering, classification and prediction. In this paper classification and predictive models for intrusion detection are built by using machine learning classification algorithms namely Logistic Regression, Gaussian Naive Bayes, Support Vector Machine and Random Forest. These algorithms are tested with NSL-KDD data set. Experimental results shows that Random Forest Classifier out performs the other methods in identifying whether the data traffic is normal or an attack.
Article
The worldwide growth of technology has caused security concerns to increase rapidly. The enormous use of internetworking has raised the need to protect systems as well as networks from unauthorized access or intrusion. An intrusion is an activity of breaking into a system by compromising its security policies, and the process of analyzing network data for possible intrusions is intrusion detection. For the last two decades, automatic intrusion detection has been an important research topic. To date, researchers have developed Intrusion Detection Systems (IDS) capable of detecting attacks in several available environments. A vast number of methods for misuse detection as well as anomaly detection have been applied, the most popular of which use machine learning techniques. In this work, a survey of various research efforts toward the development of intrusion detection systems based on machine learning techniques is given. The surveyed works are presented in easy-to-understand tabular form, and for each work the technique employed, the dataset used, and the parameters evaluated are mentioned. Current achievements and limitations in developing intrusion detection systems with machine learning, and future directions for research, are also given.
Article
The field of autonomous automation is of interest to researchers, and much has been accomplished in this area, of which this paper presents a detailed chronology. This paper can help one understand the trends in autonomous vehicle technology for the past, present, and future. Autonomous vehicle technology has changed drastically since the 1920s, when the first radio-controlled vehicles were designed. The subsequent decades saw fairly autonomous electric cars powered by circuits embedded in the roads. By the 1960s, autonomous cars with similar electronic guide systems came into the picture. The 1980s saw vision-guided autonomous vehicles, which was a major milestone in technology, and to date we use similar or modified forms of vision-guided and radio-guided technologies. Various semi-autonomous features introduced in modern cars, such as lane keeping, automatic braking, and adaptive cruise control, are based on such systems. Extensive network-guided systems in conjunction with vision-guided features are the future of autonomous vehicles. It is predicted that most companies will launch fully autonomous vehicles by the advent of the next decade. The future of autonomous vehicles is an ambitious era of safe and comfortable transportation.
Article
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improving intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant and removing them improves both the accuracy and the learning time. In this paper we propose a hybrid feature selection method using Correlation-based Feature Selection and Information Gain. In our work we apply adaptive boosting using naïve Bayes as the weak (base) classifier. The key point in our research is that we are able to improve the detection accuracy with a reduced number of features while precisely determining the attack. Experimental results showed that our proposed method achieved high accuracy compared to methods using only the 5-class problem. Correlation is done using a greedy search strategy and naïve Bayes as the classifier on the reduced NSL-KDD dataset.
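The information gain criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the information gain ranking (not the correlation-based selection or the boosting step), and the toy records, the feature names `protocol_type` and `flag`, and the helper names `entropy` and `information_gain` are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (not taken from NSL-KDD).
protocol_type = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
flag = ["SF", "S0", "SF", "S0", "SF", "S0"]
label = ["normal", "attack", "normal", "attack", "normal", "attack"]

# Rank features from most to least informative about the label.
scores = {
    "protocol_type": information_gain(protocol_type, label),
    "flag": information_gain(flag, label),
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy data `flag` separates the two classes perfectly while `protocol_type` carries no information about the label, so a ranking-based selector would keep `flag` first. Continuous features would need to be discretized before this calculation applies.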
Article
In the current digital era, one of the most critical and challenging issues is ensuring cybersecurity in information technology (IT) infrastructures. Indeed, with the significant improvement of technology, hackers have been developing ever more complex and dangerous malware attacks that make intrusion recognition a very difficult task. In this context, the existing traditional analytic tools face severe challenges in detecting and mitigating these threats. In this work, we introduce an intelligent intrusion detection system (IDS) driven by statistical analysis and an autoencoder (AE). Specifically, the proposed IDS combines data analytics and statistical techniques with recent advances in machine learning theory to extract optimized and more correlated features. The validity of the proposed IDS is tested using the benchmark NSL-KDD database. Experimental results show that the designed IDS achieves better classification performance than deep and conventional shallow machine learning as well as recently proposed state-of-the-art techniques.
Conference Paper
With exponential growth in the size of computer networks and developed applications, the significant increase in the potential damage that can be caused by launching attacks is becoming obvious. Meanwhile, Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are among the most important defense tools against sophisticated and ever-growing network attacks. Due to the lack of adequate datasets, anomaly-based approaches in intrusion detection systems struggle with accurate deployment, analysis, and evaluation. There exist a number of such datasets, such as DARPA98, KDD99, ISC2012, and ADFA13, that have been used by researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study of eleven datasets available since 1998, many such datasets are out of date and unreliable to use. Some of these datasets suffer from a lack of traffic diversity and volume, some do not cover the variety of attacks, while others anonymize packet information and payload, which cannot reflect current trends, or they lack a feature set and metadata. This paper produces a reliable dataset that contains benign traffic and seven common attack network flows, which meets real-world criteria and is publicly available. Consequently, the paper evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to indicate the best set of features for detecting certain attack categories.
Conference Paper
The advancement of network technology has increased our dependency on the Internet. Hence, the security of the network plays a very important role. Network intrusions can be identified using an Intrusion Detection System (IDS). Machine learning algorithms are used to predict network behavior as intrusion or normal. This paper discusses the prediction analysis of different supervised machine learning algorithms, namely Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, and Random Forest, on the NSL-KDD dataset. These machine learning classification techniques are used to predict four different types of attacks, namely Denial of Service (DoS), Remote to Local (R2L), Probe, and User to Root (U2R) attacks, using a multi-class classification technique.
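The four attack categories named in this abstract are usually derived by grouping the individual attack labels in the NSL-KDD records. The sketch below shows one way that grouping might be coded, for both the multi-class and the binary setting. It is an illustration under assumptions, not the authors' code: the helper names `to_category` and `to_binary` are hypothetical, and the mapping lists only the attack names commonly documented for the training split, so any other label falls back to "Unknown".

```python
# Commonly documented NSL-KDD training-set attack names, grouped by category.
# Assumption: labels outside this table (for example, test-only attacks) are
# reported as "Unknown" rather than guessed.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probe
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(label):
    """Map a raw record label to Normal, DoS, Probe, R2L, U2R, or Unknown."""
    name = label.strip().lower()
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(name, "Unknown")


def to_binary(label):
    """Collapse a raw record label to 'normal' or 'attack' for binary classification."""
    return "normal" if to_category(label) == "Normal" else "attack"


raw_labels = ["normal", "neptune", "satan", "guess_passwd", "rootkit"]
multi_class = [to_category(x) for x in raw_labels]
binary = [to_binary(x) for x in raw_labels]
```

Keeping the mapping in a single table makes it easy to audit which raw labels feed each class before training any classifier.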