Performance Evaluation of Intrusion Detection System Using Machine Learning and Deep Learning Algorithms

Authors:
Md. Sabbir Hossain, Dipayan Ghose, All Masror Partho, Minhaz Ahmed, Md. Tanvir Chowdhury, Mahamudul Hasan,
Md Sawkat Ali, Taskeed Jabid, Maheen Islam,
Department of Computer Science and Engineering,
East West University, Dhaka, Bangladesh
Email: { sabbirhossain1338, ghosedipayen, Masror.partho, minhazahmed39, mdtanvirchowdhury015, munna09bd}@gmail.com,
{alim, taskeed, maheen}@ewubd.edu
Abstract- With Internet access now so widely available, our society relies on a growing number of networked devices, and data flows between them as part of everyday activity. Exploiting server weaknesses, attackers may gain access to a system through network breaches that are difficult to identify. One of the best-known defense mechanisms against such attacks on networked devices is the Intrusion Detection System (IDS), which is built into the system. IDSs have traditionally been trained to classify threats using classical machine learning models and pre-assembled datasets. In this research, we present two deep learning-based models, the Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), along with five machine learning-based models: Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM). On the NSL-KDD dataset, the best-performing machine learning model achieves 89.6% accuracy with normalization and 89.2% without normalization, while the LSTM and MLP models reach 97.77% and 96.89%, respectively. Each record in the dataset has 43 attributes: 41 features describing the traffic input and two labels.
Index Terms- Intrusion Detection System (IDS),
KNN, LSTM, MLP, Classification, Accuracy
I. INTRODUCTION
Researchers looked into how machine learning (ML)
and deep learning (DL) techniques could be used to create
an intrusion detection system (IDS) that could satisfy
contemporary network security needs. The advancement of
technology and the shift towards online transactions have
led to an increase in network and endpoint attacks, posing
risks to data integrity, confidentiality, and availability. The
researchers emphasize that while traditional security
measures such as access control, password protection, and
firewalls are important, they are not sufficient to protect
against sophisticated intrusions. Hence, IDS is employed as
a real-time monitoring system that can identify suspicious
activities and send warnings when unauthorized access or
malicious attacks occur.
Machine learning and deep learning are subfields of
artificial intelligence (AI) that are well suited for analyzing
massive volumes of data and extracting meaningful
information. These techniques enable the IDS to accurately
predict both typical and deviant actions based on learned
patterns from network traffic. The study covers various
aspects of network intrusion risks, including conventional
and rule-based procedures, as well as innovative machine
learning and deep learning techniques. By leveraging these
advanced approaches, the IDS can enhance the detection
and prevention capabilities, thereby strengthening the
overall cybersecurity posture. In conclusion, the researchers
suggest using deep learning and machine learning
approaches to create an efficient intrusion detection system.
This approach leverages the power of AI to analyze network
traffic, detect anomalies, and provide timely warnings,
thereby improving the security of online transactions and
protecting against unauthorized access and attacks.
Fig.1. Intrusion Detection System
II. BACKGROUND AND RELATED WORK
A. Machine Learning Algorithm
Since the beginning of civilization, people have employed a variety of tools to complete a range of tasks in ways that are more practical and less complicated. Thanks to the human mind's capacity for creation, a wide range of tools and devices have been made, easing human life by addressing needs related to travel, industry, business, and computing. Among these, machine learning stands out as the most notable today [3]. According to Arthur Samuel, machine learning is the field of computer science that aims to make it feasible for computers to learn without being explicitly programmed [4]. Machine learning techniques focus on function approximation problems, where the target is represented by a function and the learning problem is to improve the accuracy of that function using experience from a sample of known input-output pairs [5]. Consequently, the primary challenge of supervised learning is a lack of sufficient labeled data. Unsupervised learning, on the other hand, extracts relevant feature information from unlabeled data, greatly increasing the availability of training material. In terms of detection performance, however, supervised learning methods frequently outperform unsupervised methods [10].
Supervised Learning Algorithm:
In supervised learning, a collection of labeled data is used as input, and a machine learning model learns the relationship between the inputs and the outcome. The two categories of supervised tasks are regression and classification; a short illustrative sketch of the supervised classifiers described below follows the list.
1. Support Vector Machine (SVM): In SVMs, the
objective is to locate a max-margin separation
hyperplane in an n-dimensional feature space.
SVMs can produce good results even with limited
training sets because the separation hyperplane
only requires a small number of support vectors to
be set. SVMs, however, are sensitive to noise close to the hyperplane. SVMs excel at linearly separable problems; for data that is not linearly separable, kernel functions are usually used [10]. The kernel technique implicitly maps inputs into a high-dimensional feature space in which the boundaries between the classes can be defined. The margins are chosen to maximize the distance between the separating hyperplane and the classes, hence decreasing classification error. For this reason, kernel techniques are frequently used in SVMs and other machine learning methods [3].
2. Naïve Bayes (NB): The Naive Bayes method is a classification algorithm based on attribute independence and conditional probability; each example is assigned to the outcome class with the highest probability [10]. It is referred to as "naive" because it makes simplifying assumptions about the attributes. Naive Bayes is a probabilistic algorithm: following Bayes' theorem, the predicted class is the one that maximizes P(y) ∏ P(x_i | y), under the assumption that the attributes x_i are conditionally independent given the class y.
3. Decision tree (DT): One of the main methods for
supervised machine learning, DT applies a set of
decisions (rules) to classify and predict data using
both regression and classification. A typical tree
structure with nodes, branches, and leaves is
included in the model [6]. The decision tree is one of the simplest classifiers; more advanced techniques built from several decision trees with parent and root nodes include extreme gradient boosting (XGBoost) and random forest [12].
4. Random Forest (RF): Random Forest is a
supervised learning technique that generates
training sample sets using the Bagging (Bootstrap
aggregation) technique and creates multiple
decision trees when a new set of data is input.
When this new set of samples is supplied, each
decision tree in the forest makes a prediction on it
separately, and then the predictions of all the trees
are combined to produce the final result [12]. Most of the time, acceptable results can be obtained even without hyperparameter tuning. It is one of the most favored techniques due to the speed and accuracy with which it produces results, even on mixed, incomplete, and noisy datasets [13].
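To make the four supervised classifiers above concrete, the following minimal scikit-learn sketch fits each of them on the same data. The synthetic feature matrix, hyperparameters, and split ratio are illustrative assumptions only and do not reproduce the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Illustrative stand-in for a preprocessed, numeric NSL-KDD feature matrix.
X, y = make_classification(n_samples=2000, n_features=41, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classifiers = {
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),  # kernel trick for non-linear data
    "Naive Bayes": GaussianNB(),                                  # conditional-independence assumption
    "Decision Tree": DecisionTreeClassifier(random_state=42),     # rule-based splits
    "Random Forest": RandomForestClassifier(n_estimators=100,     # bagged ensemble of trees,
                                            random_state=42),     # combined by majority vote
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name:18s} test accuracy: {clf.score(X_test, y_test):.3f}")
```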
Unsupervised Learning Algorithm:
Unsupervised learning models identify hidden patterns in unlabeled data without explicit supervision, although human involvement may still be needed to interpret the discovered patterns.
1. K-Nearest Neighbor (KNN): The manifold assumption serves as the foundation for KNN's main idea: if most of a sample's neighbors fall into a class, there is a substantial likelihood that the sample belongs to that class as well. As a result, only the k closest neighbors determine the classification outcome. (Since KNN relies on labeled neighbors, it is usually considered a supervised, instance-based method, although it requires no explicit training phase.) The parameter k has a big impact on how well KNN models perform: as k decreases, the model becomes more complex and the likelihood of overfitting increases; as k grows, the model becomes simpler and loses its capacity to fit the data (a short sketch follows this item).
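To make the effect of k concrete, the small scikit-learn sketch below evaluates KNN for several values of k; the synthetic data and the particular k values are chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Small k -> flexible boundary (overfitting risk); large k -> smoother, simpler model.
for k in (1, 5, 15, 51):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k:>2}  test accuracy: {knn.score(X_test, y_test):.3f}")
```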
B. Deep Learning
Artificial neural networks are used in deep learning, a branch
of machine learning, to simulate human thought and
learning. The analysis of Big Data, picture classification,
language translation, and speech recognition all use it
nowadays. Data scientists who gather, examine, and
understand vast volumes of data can also benefit from it. A
deep learning artificial neural network applies signals to
nodes using weights to produce outputs. It can identify data
using binary true or false queries, but it also needs robust
hardware and large data sets. Due to its capacity to grow and learn over time, it has recently become increasingly relevant; a facial recognition program, for example, recognizes faces more successfully the longer it learns [15]. While first
proposed in the 1980s, deep learning has only recently
gained popularity for two reasons.
1. Deep learning needs lots of data that has been classified.
For instance, millions of images and many hours of video are
needed to create self-driving cars.
2. For deep learning, a lot of processing power is required.
High-performance GPUs are ideal for deep learning,
allowing systems to group data and provide precise
predictions by taking cues from the human brain. Deep
learning algorithms conduct logical analyses of data to draw
conclusions that are comparable to those of humans [16].
C. Deep Neural Network
An artificial neural network is the foundation of the
advanced machine learning technology known as deep
learning. It requires a large amount of labeled data for training. Only once a deep
learning model has been trained and achieved an acceptable
level of accuracy can it interpret unstructured data. Neural
networks are made up of interconnected nodes, called
neurons, which are based on our brain's organic neurons.
Weights are used to create connections between neurons,
and each node has a weight and threshold associated with it.
Artificial neural networks can be used to swiftly classify and
cluster data, and training data is utilized to learn and
improve accuracy. One of the most well-known neural
networks is the one that powers Google's search engine.
1. Perceptron: Perceptron is a single neuron that
processes input values and transfers them to an
activation function to generate binary output.
2. Feedforward neural networks (FF): Feed Forward (FF) neural networks are made up of neurons and hidden layers and move data in the forward direction only, without feedback connections. Flow begins at the input layer and proceeds to the output layer, and the weights can be adjusted during training to improve learning. FF neural networks are used in classification, speech recognition, face recognition, and pattern recognition.
3. Multi-layer perceptrons (MLPs): Multi-layer perceptrons, which can be used for binary and multi-class classification, are bi-directional in the sense that inputs propagate forward while weight updates propagate backward (a sketch of an MLP follows this list).
4. Recurrent neural networks (RNN): Recurrent neural networks (RNNs) are deep learning techniques used in popular applications such as Siri, voice search, and Google Translate. RNNs use a hidden layer to remember information about a sequence, and they reuse the same parameters for each input, reducing the number of parameters. The primary drawback of RNNs is the vanishing gradient problem, which makes it difficult to update the weights of earlier layers.
5. Long Short-Term Memory Networks (LSTM):
Long Short-Term Memory Networks (LSTMs) are
a variation of recurrent neural networks that can be
used to tackle the Vanishing Gradient problem.
LSTMs can identify long-term dependencies and
use gates to decide which outputs should be used
and which should be ignored. The input gate determines which data should be kept in memory, while the output gate regulates the data transferred to the following layer. LSTMs are used in applications such as gesture recognition, speech recognition, and text prediction (a sketch of an LSTM follows this list).
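The exact network architectures used in this study are not detailed in the text; as an illustration only, the Keras sketch below builds a small feed-forward MLP and an LSTM for binary (normal vs. attack) classification of 41-feature records. The layer sizes, optimizer, and the reshaping of each record into a 41-step sequence for the LSTM are assumptions made for the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative stand-in for 41 preprocessed NSL-KDD features with binary labels.
X = np.random.rand(1000, 41).astype("float32")
y = np.random.randint(0, 2, size=(1000,)).astype("float32")

# Feed-forward MLP: inputs propagate forward, weight updates propagate backward.
mlp = keras.Sequential([
    layers.Input(shape=(41,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # normal vs. attack
])

# LSTM: each record is fed as a sequence of 41 one-dimensional steps;
# gated memory cells mitigate the vanishing-gradient problem of plain RNNs.
lstm = keras.Sequential([
    layers.Input(shape=(41, 1)),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])

for name, model in [("MLP", mlp), ("LSTM", lstm)]:
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    data = X if name == "MLP" else X.reshape(-1, 41, 1)
    history = model.fit(data, y, epochs=3, batch_size=64, validation_split=0.2, verbose=0)
    print(name, "final validation accuracy:", round(history.history["val_accuracy"][-1], 3))
```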
III. METHODOLOGIES
A. Research Analysis
This section of the proposed approach discusses the chosen
approaches for supporting or analyzing a set of data or a
desired case. In order to predict results, the effective model
collects a dataset, analyzes it, and then applies machine
learning and deep learning algorithms to it. In the proposed approach, we investigate the NSL-KDD dataset using some of the most popular machine learning methods as well as proven deep learning techniques. The algorithms used are Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbor, MLP, and LSTM.
The suggested model's schematic process diagram is shown
in Fig. 2. Following the preprocessing of the acquired
dataset, observations were produced, paving the way for the
feature selection technique's identification of key features.
The dataset is divided into train and test sets after the class imbalance issue has been addressed. The classifiers are then fed the training dataset to train the models, and predictions on the test instances are used to evaluate the trained models' performance (an illustrative sketch of these steps follows Fig. 2).
Fig.2. Workflow Diagram
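The specific feature-selection and class-balancing techniques applied in this work are not named in the text; the sketch below therefore walks through the Fig. 2 workflow with placeholder choices (univariate SelectKBest scoring, a stratified split, and class weighting), assuming an already encoded, numeric feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Illustrative stand-in for the encoded dataset (imbalanced binary labels).
X, y = make_classification(n_samples=3000, n_features=41, weights=[0.8, 0.2], random_state=0)

# Feature selection: keep the k features most associated with the label.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)

# A stratified split preserves the class ratio in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.2, stratify=y, random_state=0)

# Class weighting is one simple way to address the remaining imbalance.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```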
B. Data Overview
The NSL-KDD dataset enhances KDD'99 for testing intrusion detection methods. It provides training and testing sets (KDDTrain+, KDDTest+, and KDDTest-21) that include attack-type labels in CSV format. The records cover DoS, U2R, R2L, probing, and normal instances; these attacks fall into surveillance (probing), denial-of-service, and unauthorized-access categories. U2R denotes unauthorized local superuser access, and R2L denotes unauthorized access from a remote machine. Our study evaluated the methods using DoS attacks. NSL-KDD is recommended over KDD Cup'99 because it addresses the redundancy and imbalance issues of the original dataset, and it is widely used for building intrusion detection systems and studying cybersecurity. Industry also uses datasets such as ADFA-ID and ISCX-UNB. These improvements over KDD'99 motivate our choice of NSL-KDD.
Data Preprocessing
Because noisy and contradictory data can result in serious errors [27], the data goes through preprocessing using the methods outlined in Fig. 3. Due to extraction or input problems, a portion of the dataset contains noisy data, duplicate values, missing values, infinite values, and so on. As a result, we begin by preprocessing the data; the basic steps are shown in Fig. 3, and an illustrative sketch follows the figure.
Fig.3. Different preprocessing phases
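A minimal sketch of these preprocessing phases is given below, assuming the raw CSV layout of the public NSL-KDD distribution (41 features followed by an attack label and a difficulty score); the exact cleaning and encoding steps used in this study are not reproduced here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Raw NSL-KDD file: 41 features + attack label + difficulty score, no header row.
df = pd.read_csv("KDDTrain+.txt", header=None)

# Cleaning: drop duplicate and incomplete records.
df = df.drop_duplicates().dropna()

# Encoding: one-hot the categorical columns (protocol_type, service, flag sit
# at positions 1-3 in the raw layout); binarize the attack label.
X = pd.get_dummies(df.iloc[:, :41], columns=[1, 2, 3])
y = (df.iloc[:, 41] != "normal").astype(int)   # 1 = attack, 0 = normal

# Normalization: min-max rescaling of every feature to the [0, 1] range.
X = MinMaxScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```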
IV. EVALUATION OF MODELS
A. Evaluation Metrics
To assess our models' performance, we utilize accuracy, and we also discuss the false positive rate and the detection rate. True Positive (TP) refers to attack records that are correctly flagged as anomalies, while True Negative (TN) refers to normal records that are correctly identified as normal; False Positive (FP) and False Negative (FN) denote the corresponding misclassifications. The following measures, written out after this list, are used to assess the effectiveness of the trained models:
1) Accuracy: It is defined as the proportion of properly
categorized records among all records.
2) Precision: The proportion of records with true
positives (TP) to all records with true positives and false
positives (FP) is calculated.
3) Recall: The ratio of true positive records to all true
positives and false negative (FN) category records,
expressed as a percentage.
4) F-measure: Precision and recall's harmonic mean is
characterized as a balance between the two.
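Written out in terms of TP, TN, FP, and FN, these standard definitions are:

```latex
\mathrm{Accuracy}  = \frac{TP+TN}{TP+TN+FP+FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP+FN}, \qquad
F\text{-measure}   = \frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
```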
B. Machine Learning Model Performance without
Normalization
Five different machine learning algorithms have been used in this study: Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Machine (SVM). We have determined the accuracy, precision, recall, and F1-score of each algorithm. The machine learning algorithms' results are shown in Table I:
TABLE I. Results Without Normalization
Fig.4. Plotting of Machine Learning Algorithms Results (Without Normalization)
Across all the machine learning algorithms, the Random Forest (RF) method gave us the highest accuracy, 89.2%, and the KNN algorithm gave the lowest, 73.5%. Looking at the F1-score, the DT method has the best score (92%), and the Naive Bayes (NB) algorithm has the lowest (72%). From the table and charts, we can also observe the precision and recall values for the machine learning methods.
C. Machine Learning Model Performance with
Normalization
TABLE II. Results With Normalization
Fig.5. Plotting of Machine Learning Algorithms Results
(With Normalization)
With normalization, the Random Forest (RF) method again gave us the best accuracy, 89.6%, while the NB algorithm gave the lowest, 75.9%. Considering the F1-score, the Naive Bayes (NB) algorithm gave the lowest score, 69%, while both the DT and RF algorithms gave the highest, 92%. From the table and charts, we can also observe the precision and recall values for the machine learning methods.
D. Deep Learning Model Performance
We have employed the Long Short-Term Memory (LSTM) network and the Feed Forward Neural Network (MLP) as two separate deep learning techniques. The training accuracy, testing accuracy, precision, recall, and F1-score of both algorithms have been calculated. The deep learning algorithms' results are shown in Table III:
TABLE III. Deep Learning Algorithms Results
Fig.6. Plotting of Deep Learning Algorithms Results
Considering the deep learning techniques, we find that the LSTM algorithm's training accuracy was 97.77%, while the MLP algorithm's training accuracy was 96.89%. Additionally, we achieved the top F1-score of 97.23% using the LSTM algorithm. From the table and charts, we can also examine the precision and recall values for the deep learning algorithms.
E. Comparison Between ML (Normalization) & DL
Algorithms Result
Fig.7. ML & DL Algorithms Classification
The plot above compares two deep learning algorithms and five machine learning algorithms. The deep learning algorithms are LSTM and MLP, whereas the machine learning methods are RF, DT, SVM, NB, and KNN. Among the machine learning methods, RF gave us the best accuracy, although the deep learning algorithms produced better results overall. KNN, a machine learning algorithm, yielded the lowest accuracy.
F. Deep Learning Model Accuracy and Loss Curve
(LSTM)
We can now observe the accuracy vs. epoch curves for the deep learning algorithms as well as the loss vs. epoch curves for the train and test datasets. The accuracy of the LSTM algorithm on the train and test datasets is plotted as a function of epoch in the chart below.
Fig.8. Plot of LSTM Algorithm of Accuracy Vs Epoch for
Train and Test Dataset
Fig.9. Plot of LSTM Algorithm of Loss Vs Epoch for Train
and Test Dataset
G. Deep Learning Model Accuracy and Loss Curve
(MLP)
Now, for deep learning algorithms, we shall examine the
accuracy vs. epoch plotting curve for the train and test
datasets as well as the loss vs. epoch plotting curve. The
accuracy vs. epoch plotting curve for the MLP algorithm for
the train and test dataset is shown below:
Fig.10. Plot of MLP Algorithm of Accuracy Vs Epochs for
Train and Test Dataset
Fig.11. Plot of MLP Algorithm of Loss Vs Epochs for Train
and Test Dataset
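Curves such as those in Figs. 8-11 can be reproduced from a Keras training history; the sketch below assumes the models were trained with model.fit(..., validation_split=...) as in the earlier sketch and that the returned history object was kept (the plotting code actually used in this study is not shown in the text).

```python
import matplotlib.pyplot as plt

def plot_curves(history, title):
    # 'history' is the object returned by model.fit; key names follow Keras defaults.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["accuracy"], label="train")
    ax1.plot(history.history["val_accuracy"], label="test")
    ax1.set(title=f"{title}: accuracy vs. epoch", xlabel="epoch", ylabel="accuracy")
    ax2.plot(history.history["loss"], label="train")
    ax2.plot(history.history["val_loss"], label="test")
    ax2.set(title=f"{title}: loss vs. epoch", xlabel="epoch", ylabel="loss")
    for ax in (ax1, ax2):
        ax.legend()
    plt.tight_layout()
    plt.show()
```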
V. CONCLUSION
The intrusion detection system is evaluated in this paper using machine learning and deep learning techniques. The results demonstrate that the models using neural networks achieve greater accuracy than typical machine learning models, with Random Forest and Decision Tree as the strongest of the conventional classifiers. The model may enhance both the capability to identify the type of intrusion and the accuracy of intrusion detection. In future work, it is recommended to improve the average accuracy and increase system efficiency by decreasing the imbalance ratio. The NSL-KDD dataset contains no duplicate records, which enables us to identify the models' maximum accuracy.
REFERENCES
[1] The History and Evolution of Intrusion Detection
from Global Information Assurance Certification
Paper -
https://www.giac.org/paper/gsec/1294/history-evolution-intrusion-detection/10057
[2] Dataset History- A Deeper Dive into the NSL-KDD
Data Set. https://towardsdatascience.com/a-deeper-dive-into-the-nsl-kdd-data-set-15c753364657
[3] Mahesh, Batta. "Machine learning algorithms-a
review." International Journal of Science and
Research (IJSR).[Internet] 9 (2020): 381-386.
[4] Qifang Bi, Katherine E Goodman, Joshua
Kaminsky, Justin Lessler, What is Machine
Learning? A Primer for the Epidemiologist,
American Journal of Epidemiology, Volume 188, Issue 12, December 2019, Pages 2222-2239, https://doi.org/10.1093/aje/kwz189
[5] Jordan, Michael I., and Tom M. Mitchell.
"Machine learning:Trends,perspectives, and
prospects." Science 349.6245 (2015): 255-260.
[6] H. Wang, C. Ma and L. Zhou, "A Brief Review of
Machine Learning and Its Application," 2009
International Conference on Information
Engineering and Computer Science, 2009, pp. 1-4,
doi: 10.1109/ICIECS.2009.5362936.
[7] O. M. Surakhi, A. M. García, M. Jamoos and M. Y.
Alkhanafseh, "A Comprehensive Survey for
Machine Learning and Deep Learning Applications
for Detecting Intrusion Detection," 2021 22nd
International Arab Conference on Information
Technology (ACIT), 2021, pp. 1-13, doi:
10.1109/ACIT53391.2021.9677375.
[8] Ma, Y.; Liu, K.; Guan, Z.; Xu, X.; Qian, X.; Bao,
H. Background Augmentation Generative
Adversarial Networks (BAGANs): Effective Data
Generation Based on GAN-Augmented 3D
Synthesizing. SYMMETRY 2018, 10, 734.
https://doi.org/10.3390/sym10120734]
[9] Regression for Machine Learning by Jason
Brownlee
https://machinelearningmastery.com/logistic-regression-for-machine-learning/
[10] A Machine Learning Approach to Network
Intrusion Detection System Using K Nearest
Neighbor and Random Forest
https://yourpastquestions.com/product/a-machine-learning-approach-to-network-intrusion-detection-system/
[11] Intrusion Detection Systems using Machine
Learning and Deep Learning Techniques
https://rke.abertay.ac.uk/en/studentTheses/intrusion-detection-systems-using-machine-learning-and-deep-learn
[12] Intrusion detection using machine learning
algorithms
https://thescholarship.ecu.edu/handle/10342/7650
[13] INTRUSION DETECTION SYSTEM USING
MACHINE LEARNING TECHNIQUES IN
CLOUD COMPUTING.
https://s3-ap-southeast-1.amazonaws.com/gtusitecirculars/uploads/Synopsis-Patel%20Pinal-129990907010_446069.pdf
[14] Neural Networks| IBM Cloud Education.
https://www.ibm.com/cloud/learn/neural-networks
[15] Facial recognition.
https://link.springer.com/article/10.1007/s13198-
022-01844-6
[16] Deep Learning: A Comprehensive Overview on
Techniques, Taxonomy, Applications and Research
Directions
https://link.springer.com/article/10.1007/s42979-
021-00815-1
[17] Anomaly-Based Network Intrusion Detection Using
Machine Learning
https://ieeexplore.ieee.org/document/8456129
Vinayakumar, R., Soman, K. and Prabaharan, P. (2020) Evaluation of Recurrent Neural Network and Its Variants for Intrusion Detection System (IDS). https://doi.org/10.4018/978-1-7998-0414-7.ch018
Amarasinghe, K. and Manic, M. (2018) Improving User Trust on Deep Neural Networks Based Intrusion Detection Systems. IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, Washington DC, 21-23 October 2018, pp. 3262-3268. https://doi.org/10.1109/IECON.2018.8591322
[18] Zarai, R. , Kachout, M. , Hazber, M. and Mahdi, M.
(2020) Recurrent Neural Networks & Deep Neural
Networks Based on Intrusion Detection System.
Open Access Library Journal, 7, 1- 11. doi:
10.4236/oalib.1106151.
[19] How to choose cross-entropy loss function in
Keras? https://androidkt.com/choose-cross-entropy-loss-function-in-keras/
[20] Activation functions and its types by Vinodkumar
Baskaran.
https://medium.com/@vinodhb95/activation-
functions-and-its-types-8750f1287464
[21] IDS 2018 Intrusion CSVs (CSE-CIC-IDS2018)
Dataset. IDS 2018 Intrusion CSVs (CSE CIC-
IDS2018) | Kaggle
[22] García, S., Luengo, J., Herrera, F.: Data
Preprocessing in Data Mining, vol. 72. Springer
(2015)
[23] M. M. E. M. S. a. A. M. H. Nasr, "Building
Sentiment analysis Model using Graphlab," IJSER,
2017
[24] Srimani PK, Patil MM (2016) Mining data streams
with concept drift in massive online analysis frame
work. Wseas Trans Comput 15
[25] Mishra P, Varadharajan V, Tupakula U, Pilli ES
(2018) A detailed investigation and analysis of
using machine learning technique for intrusion
detection, IEEE Commun Surv Tutorials
[26] What Is Deep Learning?
https://www.mathworks.com/discovery/deep-
learning.html
[27] Intrusion detection model using machine learning
algorithm on Big Data environment
https://journalofbigdata.springeropen.com/articles/1
0.1186/s40537-018-0145-4