
A genomic rule-based KNN model for fast flux botnet detection

July 2023
Egyptian Informatics Journal 24(2):313-325


DOI:10.1016/j.eij.2023.05.002

License
CC BY-NC-ND 4.0

Authors:

Femi Ayo

Olabisi Onabanjo University

Awotunde J. Bamidele

University of Ilorin

Sakinat Oluwabukonla Folorunso

Olabisi Onabanjo University

Matthew Adigun

University of Zululand


A genomic rule-based KNN model for fast flux botnet detection

Femi Emmanuel Ayo, Joseph Bamidele Awotunde ⇑, Sakinat Oluwabukonla Folorunso, Matthew O. Adigun, Sunday Adeola Ajagbe

Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye 120107, Ogun State, Nigeria

Department of Computer Science, Faculty of Information and Communication Sciences, University of Ilorin, Ilorin 240003, Kwara State, Nigeria

Department of Information Technology, Cape University of Technology, Cape Town, South Africa

Computer & Industrial Production Engineering, First Technical University, Ibadan, 200255, Oyo State, Nigeria

article info

Article history:

Received 30 March 2023

Accepted 5 May 2023

Keywords:

Botnet Detection

Fast Flux Botnet

K-Nearest Neighbor

Genetic Algorithm

Fuzzy Logic

abstract

Fast Flux Botnet (FFB) is an advanced method developed by cyber criminals to perpetrate distributed malicious attacks. The major problems of existing FFB detection systems are vulnerability to evasion mechanisms, long detection time, and high dimensionality of the feature set. In this study, an improved FFB detection architecture called Bot-FFX was developed to address some of these problems. Bot-FFX consists of four modules: extractor, filter, resolver, and detector. The extractor module is responsible for Domain Name System (DNS) queries on domains. The filter module classifies incoming domains as either blacklisted or whitelisted and sends the unclassified domains to the resolver. The resolver extracts all IP addresses associated with the domain at its Time-To-Live (TTL) within a time frame of 10 min. The detector module uses a rule-based Genetic Algorithm (GA) and K-Nearest Neighbor (KNN) for botnet detection. The detector computes the Standard Deviation of Round Trip Time (SDRTT), Average Google Hits (AGH), and Genetic Threshold Value (GTV) for all IP addresses associated with the domains. The detector, built on decision tree rules and the K-Dimensional (KD) tree KNN algorithm, classifies the domains using the set of IP addresses, SDRTT, AGH, and GTV. Bot-FFX was implemented on a dataset of 2,000 benign domains and 1,630 botnet domains. The dataset was split into 50% training and 50% testing sets. The evaluation results on this dataset showed that Bot-FFX is an effective FFB detection system, with accuracy, false positive rate, and false negative rate of 99.178%, 0.8%, and 0.8%, respectively.

© 2023 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

A botnet is an organized network of distributed, infected computers (zombies) executing malicious code called bots, under the remote command of a human originator called the botnetmaster [1]. The evolution of the Internet has played host to networks of distributed and compromised computers. Hence, the use of the Internet attracts certain risks, which makes botnets one of the key issues in Internet security.

The network of anonymous Internet users who are unaware of the importance of security provides many opportunities for malicious users to exploit [2,3]. Malicious users are constantly developing more advanced methods to profit from cybercrime activities [4,5]. This has led to the design and implementation of distributed architectures of remotely controlled networks of infected hosts, called botnets, for performing malicious activities. With a single command from a Command and Control (C&C) server, a botnetmaster can control networks of vulnerable hosts [6,7,8]. Botnetmasters usually maintain and update their C&C set-up on a Fast-Flux Service Network (FFSN) to make the detection of bots difficult.

In an FFSN, the Internet Protocol (IP) addresses of several zombies are advertised as phishing web servers in the Domain Name System (DNS). These phishing web servers act as masqueraders by redirecting requests to the C&C server to execute the intended malicious services. The botnetmaster frequently changes these phishing web servers to the IP addresses of other bots, exploiting the weakness of the Hypertext Transfer Protocol (HTTP) in detecting the C&C server [9,7,10,11]. This phenomenon may limit researchers to the option of detecting Fast Flux Botnets (FFB) through the domain names masqueraded in spam mails and other public forums.

⇑ Corresponding author.

E-mail addresses: ayo.femi@oouagoiwoye.edu.ng (F.E. Ayo), awotunde.jb@unilorin.edu.ng (J.B. Awotunde), sakinat.folorunso@oouagoiwoye.edu.ng (S.O. Folorunso), profmatthewo@gmail.com (M.O. Adigun), sunday.ajagbe@tech-u.edu.ng (S.A. Ajagbe).

FFB detection is a traditional machine learning task in which the features of an instance are fed to a classifier and the classifier attempts to detect the class membership of that instance. However, unlike common classification tasks where the feature set is fixed, FFB detection demands that the researcher learn new features or adopt known reliable subsets from the literature [12,13,14]. A number of Botnet Detection Systems (BDS) have been developed, but the identification, adoption, and merging of reliable features for detection remains a problem. First, most botnet detection systems are limited in accuracy due to the unreliable nature of the existing features [15,16,3,17]. Second, a BDS is often unable to deal with the evasion mechanisms that botnetmasters constantly use to masquerade as legitimate Internet devices [18,17]. Evasion mechanisms are the different techniques adopted by botnetmasters to make the detection of their bots difficult. These techniques include advertising the IP addresses of several zombies as phishing web servers and performing update operations on the C&C server.

The motivation of this study is to adopt reliable and effective features to deal with the current evasion schemes used by botnetmasters. This study developed an improved Fast Flux Botnet detection system (Bot-FFX) that adopts a K-Nearest Neighbor (KNN) classifier rooted in a rule-based Genetic Algorithm (GA) and built on three features: the standard deviation of Round-Trip Time, Average Google Hits, and the number of IP addresses over a time window. The main motivation for adopting KNN is to benefit from the algorithm's high detection accuracy. The listed features are adopted to tackle the problem of evasion, as they behave differently for FFB and legitimate domains. Additionally, a rule-based GA technique is introduced to reduce the time taken to differentiate between legitimate and botnet domains advertising the same set of IP addresses over the time window.

The rest of this paper is structured as follows: Section 2 presents related work. Section 3 presents the methodology. The implementation, evaluation, and results are presented in Section 4. Section 5 concludes the paper and outlines future work.

2. Related work

A number of solutions have been developed by researchers in recent years for BDS. These solutions can be broadly classified into honeynet-based, intrusion-based, and heuristic-based detection [19]. For instance, the studies in [20,21,22] developed different honeynet-based detection techniques. However, honeynet-based detection may not necessarily detect botnet attacks, although it is useful for understanding botnet architecture and features. On the other hand, intrusion-based detection solutions have been useful for botnet attack detection, while heuristic-based detection relies on adjustable thresholds. The classes of botnet detection solutions are summarized below.

2.1. Honeynet-based detection

A honeynet is a collection of simulated servers, called honeypots, on a physical server. Honeypots are loopholes that are intentionally introduced to entice attackers to attack the system. The main purpose is to gather bot signatures and the mechanisms of the C&C server [23]. Honeynet-based detection usually generates a report on the detected bot signatures to better understand the penetration mechanisms of the botnet [24]. However, the damage caused by the botnet is not always detected. Honeypots can be classified as low-interaction or high-interaction based on their simulation capability. Low-interaction honeypots allow attackers only partial penetration by simulating a few features of the real system [25]. In other words, low-interaction honeypots limit the attackers' access to the real system through controlled features. For example, Provos [20] presented a low-interaction honeypot framework called Honeyd. The framework simulates virtual honeypots with thousands of IP addresses at the network level. Honeyd showed high security capability in botnet detection and prevention. On the other hand, high-interaction honeypots allow attackers full penetration by simulating all features of the real system. In other words, high-interaction honeypots give attackers full access to the real system through uncontrolled features. For example, Vrable et al. [26] developed a honeypot architecture that can simulate the full features of a real system and scale to potentially hundreds of live virtual machines. The architecture was able to detect attacker behavior at a faster rate than related honeypot architectures. In another study, Bajtoš et al. [27] proposed a network of high-interaction honeypots for botnet detection. The proposed method was able to analyze botnets in the infection phase and detect botnets based on known signatures from the collected datasets. The method used Pearson's correlation coefficients to show dependencies between commands, and between commands and directories, used for botnet infection. The results showed that the method can detect various types of botnet attacks.

2.2. Intrusion-based detection

The intrusion-based detection is a more effective method of

botnet detection that collects bot signatures and trains classiﬁers

to identify any abnormal activities. Intrusion-based detection can

be classiﬁed into four methods, namely, signature-based detection,

anomaly-based detection [28,29], DNS-based detection, and

mining-based detection. Signature-based detection uses known signatures of existing botnets for botnet detection. Consequently, signature-based methods can detect only known botnets and cannot be used to detect unknown ones. For example, Gu et al. [30] presented

a real-time botnet detector consisting of both rule-based and

anomaly detector engines for attack signature gathering. The gath-

ered signatures are parsed by a correlator engine for detection and

collection of botnet infection trails. The authors evaluated the

developed system in both virtual and live network honeynet envi-

ronments. The results of the evaluation showed that the developed

system is accurate and scalable for botnet detection. In a similar

study, Xie et al. [31] developed a spam signature generation archi-

tecture called AutoRE to detect spamming botnets. The developed

AutoRE was able to detect botnet spam with a low false alarm rate

without any preclassified training data. Behal et al. [32] developed a signature-based botnet detection and prevention architecture. The architecture first extracts network traffic to gather information on attack types and stores the output in a database. Alerts are then generated based on the attack types identified in the network. New rules/signatures are derived from the alerts and added to both a detection database and a prevention database. The prevention database contains rules for filtering network traffic that the rule-based detection database has flagged as containing botnet attacks. The system showed that it can dynamically develop new rules for new attacks and drop the traffic in real time. Anomaly-based detection attempts to detect botnets using anomalous system behaviors such as high network latency and high traffic volume. These anomalous behaviors indicate the presence of a bot attack in the network. For example,

Chen et al. [33] proposed an ensemble anomaly-based method that


employed normal trafﬁc for training. The method consists of two

detectors which proﬁle and analyze the anomaly behavior with

increased accuracy and reduced false alarms compared to tradi-

tional anomaly detectors. Martinez-Bea et al. [34] proposed a

hybrid real-time fast-ﬂux detection model by building a linear

SVM classiﬁer that merged the feature sets of both McGrath et al.

[35] and Hsu et al. [10] in a bid to utilize their collective strengths.

The authors argued that using the feature set of McGrath et al. [35]

and Hsu et al. [10] in isolation can lead to an unacceptable rate of

false negatives and false positives, respectively. The authors trained and validated their classifier with k-fold cross-validation using a dataset extracted from domain names advertised on various malware reporting forums. The scheme provided a lower false positive rate than the other works examined by the authors.

Zhao & Traore [36] designed a detection system that uses flow metrics in a decision tree-based approach to detect botnets. Six features were proposed for botnet detection: mean reported TTL (MTTL) upon performing DNS queries; actual mean TTL (ATTL) as observed over time by the detection system; Total Unique A records (ARCRD), similar to the number of A records; A Record Change Variance (ARCRDV), similar to the frequency of A record change; A Record IP stability (ARCRDS), similar to the number of subnets observed; and Domain name confidence (DCONF), similar to the domain age feature. Initially, a fast rule-based detector observes these attributes for each domain over a period of one week and schedules domains that suggest malicious fast flux behavior for extended monitoring. The DNS records of suspected domains are polled continuously at intervals of half the TTL value of the A records, and the responses are captured. The captured data are then transformed into a set of attributes that are fed into a decision tree. The results showed fewer false positives than other schemes, and the approach was not prone to disguise attacks. Celik & Oktug [14] proposed a detection system

purely based on DNS requests and the corresponding response packets collected from a recursive DNS server. Specifically, the authors constructed a 19-dimensional feature vector broadly categorized into five groups: DNS answer-based, domain name-based, spatial-based, network-based, and timing-based. Their feature vector is a collection of features extracted from various existing schemes. A dataset extracted from the responses obtained during DNS queries was used to train a C4.5 decision tree classifier. To find the subset of features that most accurately detects a fast-flux botnet, 10-fold cross-validation was employed on the dataset. The authors initially trained and validated the classifier with each feature subset separately and observed the corresponding accuracy. Finally, all features were merged and the resulting accuracy was evaluated. The results showed robust detection features that are not prone to disguise attacks.

Vranken & Alizadeh [37] proposed the detection of domain names produced by domain name generation algorithms (DGAs) using Term Frequency-Inverse Document Frequency (TF-IDF). The authors used TF-IDF to measure the rates of the most frequently occurring terms in domain names and used these as features for their learning algorithms. Their comparison showed that a deep learning model using TF-IDF features yielded the best results, achieving high classification accuracy. Cucchiarelli et al. [38] proposed the detection of algorithmically generated malicious domain names based on n-gram features. The proposed scheme represented domain names through a set of features using the 2-grams and 3-grams of a single unclassified domain name and of classified domain names. The authors used the Kullback-Leibler divergence and the Jaccard index to evaluate similarity and deployed state-of-the-art machine learning algorithms to classify each domain. The results showed that the proposed scheme yielded a good level of accuracy and was able to classify previously unseen domains.

Muhammad et al. [39] proposed a machine learning approach for early-stage botnet detection. The approach first selects the optimal features using Principal Component Analysis (PCA) and Information Gain (IG) feature selection techniques. The selected features were then fed into machine learning classifiers for botnet detection. The results revealed that the approach is accurate, with low false alarms, for early-stage botnet detection. Haq & Singh [40] developed a

machine learning scheme for botnet detection. The approach divided the adapted dataset into two subsets and then applied k-means clustering to one subset and J48 classification to the other. The mean of the accuracies of k-means clustering and J48 classification (the hybrid approach) was calculated for botnet detection. The hybrid approach was compared with the individual clustering and classification approaches. The results showed that the hybrid approach is balanced for classification and clustering of botnet attacks. Randhawa et al. [41] proposed security hardening of botnet detectors using generative adversarial networks (GANs). The authors used a GAN to generate samples that extend the original training set in order to mitigate adversarial evasion attacks in botnet detection. The results showed that GANs can provide quality botnet detection samples compared to traditional traffic generation methods. However, the generated samples are not valid as real-life traffic, and the scheme is vulnerable to evasion mechanisms. Stiawan et al. [42] proposed a dimensionality reduction approach for machine learning-based botnet detection in the Internet of Things (IoT). The authors used the random projection method for dimensionality reduction to enhance state-of-the-art machine learning methods for detecting botnets in IoT. The experimental results showed that the random projection method combined with a decision tree was able to detect IoT botnets quickly and with high accuracy. Hosseini

et al. [43] proposed a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) model for botnet detection. The objective of the study was to detect botnets based on a neural network and the Negative Selection Algorithm (NSA). The authors applied data wrangling to the adapted dataset for data normalization. The normalized data were then fed into the NSA phase to reduce dimensionality. The authors then scaled the data using the z-score algorithm and used the scaled data with the CNN-LSTM algorithm for botnet detection. The results showed shorter training time and high detection accuracy. Lefoane et al. [44] proposed optimized feature selection combined with machine learning approaches (decision tree, logistic regression, and support vector machine) for botnet detection. The first part of the study uses a feature selection approach to remove less important features for botnet attack detection. The feature selection is based on the frequency of occurrence of the counted values relative to the total instances in each feature. The second part uses the selected features to build machine learning classifiers for botnet detection. The approach was tested on a standard IoT dataset, and the results revealed that the proposed feature selection enhanced the detection accuracy of the machine learning classifiers with a low false alarm rate. Kolpe & Kshirsagar [45] presented a botnet detection approach using a Bayes classifier. The authors used different filter-based feature selection schemes to select the most important features for botnet detection and used the selected features as input to a naïve Bayes classifier. The results of the study showed that the naïve Bayes classifier achieved the best accuracy for botnet detection using the CICIDS-2017 DoS dataset. For further studies on anomaly-based detection, works such as [46,47,48,49,50,51] are recommended.

2.3. DNS-based detection

The botnetmaster frequently masquerades phishing web servers as legitimate DNS entries and then redirects users to the C&C server for malicious attacks. Therefore, DNS-based detection relies on monitoring and detecting the DNS traffic anomalies generated by a botnetmaster. The same anomaly-based detection algorithms can be used for DNS-based detection. For example, Alieyan et al. [52] proposed a DNS rule-based method for botnet detection. The method applied DNS query and response rules to detect anomalous DNS query and response activities. The proposed method showed improved accuracy and a low false alarm rate for botnet detection. For further studies on DNS-based detection, works such as [53,54,55,56] are recommended.

2.4. Mining-based detection

Mining-based detection uses data mining and machine learning algorithms to detect botnet C&C traffic. For example, Ibrahim et al. [57] proposed a multilayer framework that consists of filtering and detection modules for botnet detection. The filtering module filters and reduces the number of network features and groups the network traffic into minimum time intervals. The detection module then uses the reduced network features for botnet detection based on the multilayer framework. The results showed that the proposed method can detect botnets with good accuracy. For further studies on mining-based detection, works such as [58,59,60,61] are recommended.

2.5. Heuristic-based detection

A heuristic-based detection system (HBDS) employs a dynamic threshold score, calculated from rules or statistical analysis of the network traffic, for attack classification. An HBDS uses an adjustable threshold score to adapt to patterns in network traffic and reduce the false alarm rate [62]. The limitation of an HBDS is that its ability to correctly classify attacks depends on the optimization of its threshold decision. For example, Ramachandran et al. [63] proposed a set of heuristic methods to detect the DNS-based Black-hole List (DNSBL) lookup queries that a botmaster executes to learn whether their bots have been blacklisted. The proposed heuristics provide counter-intelligence measures against the methods used by botmasters to determine blacklisted bots. The goal of the heuristic model is to distinguish, in real time, DNSBL queries executed by botmasters from legitimate DNSBL queries. However, the model cannot handle distributed DNSBL queries by botmasters.
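The idea of an adjustable threshold can be illustrated with a minimal sketch. It is not the method of any of the cited works: the class name `AdaptiveThreshold`, the rolling window, the warm-up length, and the "mean plus k standard deviations" rule are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Flag values that exceed a threshold recomputed from recent traffic."""

    def __init__(self, window=20, k=3.0, warmup=5):
        self.history = deque(maxlen=window)  # recent values judged normal
        self.k = k                           # width of the tolerance band
        self.warmup = warmup                 # observations needed before flagging

    def is_anomalous(self, value):
        if len(self.history) < self.warmup:
            self.history.append(value)       # still learning the baseline
            return False
        # Hypothetical rule: threshold = mean + k * standard deviation.
        threshold = mean(self.history) + self.k * pstdev(self.history)
        flagged = value > threshold
        if not flagged:
            self.history.append(value)       # only normal values move the baseline
        return flagged


detector = AdaptiveThreshold()
# Hypothetical per-minute query counts with one burst.
flags = [detector.is_anomalous(v) for v in [10, 12, 11, 9, 10, 11, 10, 60, 11]]
```

Because the threshold is derived from the recent window, it follows gradual changes in traffic while still flagging sudden bursts; the choice of `k` is the optimization problem the paragraph above describes.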

Table 1 summarizes the surveyed literature, with the strengths and limitations of each work.

2.6. Motivation of the work

To address some of the limitations highlighted in the summary of related works, this study adopted reliable and effective features to deal with the current evasion schemes used by botnetmasters. Specifically, this study developed an improved Bot-FFX that adopts a KNN classifier rooted in a rule-based GA and built on three features: the standard deviation of Round-Trip Time, Average Google Hits, and the number of IP addresses over a time window. The main motivation for adopting KNN is to benefit from the algorithm's high detection accuracy. The listed features are adopted to tackle the problem of evasion, as they behave differently for FFB and legitimate domains. Additionally, a rule-based GA technique is introduced to reduce the time taken to differentiate between legitimate and botnet domains advertising the same set of IP addresses over the time window.

2.7. Genetic algorithm

Genetic algorithms (GAs) use the computer to simulate the process of natural selection and evolution [64]. The notion originates from "adaptive survival in natural organisms". GAs were first proposed by Goldberg & Holland [65] and have been successfully applied in the field of machine learning [66,67]. The algorithm begins with a randomly generated population of individual programs. How good an individual is within a population is determined by evaluating its performance with various types of fitness measures. Then, at every iteration, computerized genetic recombination and mixing are performed on the current population to replace a less fit individual program with a better-performing one. That is, a program with a low fitness value is removed and replaced by a program with a high fitness value for the next iteration.
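The loop described above can be sketched in a few lines. This is a generic steady-state GA over bit strings, not the rule-based GA used in Bot-FFX; the function name `evolve`, the tournament selection, the one-point crossover, the mutation rate, and the toy fitness (the number of 1-bits) are all assumptions made for illustration.

```python
import random


def evolve(fitness, n_bits=16, pop_size=20, iterations=300,
           mutation_rate=0.05, seed=0):
    """Steady-state GA over bit strings; returns the fittest individual found."""
    rng = random.Random(seed)
    # Randomly generated initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def pick_parent():
        # Tournament selection: the fittest of three random individuals.
        return max(rng.sample(population, 3), key=fitness)

    for _ in range(iterations):
        mother, father = pick_parent(), pick_parent()
        cut = rng.randrange(1, n_bits)              # one-point crossover
        child = mother[:cut] + father[cut:]
        child = [1 - bit if rng.random() < mutation_rate else bit
                 for bit in child]                  # bit-flip mutation
        # Replace the least fit individual if the child performs better.
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)


best = evolve(fitness=sum)   # toy fitness: number of 1-bits
```

Since an individual is only ever replaced by a fitter one, the best fitness in the population never decreases from one iteration to the next.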

2.8. K-Nearest Neighbor

K-nearest neighbor (k-NN) is one of the simplest machine learning techniques. It is regarded as the traditional nonparametric technique in pattern recognition for the categorization of data [68,69]. It classifies an object by assigning it to the modal class of its predefined number of nearest neighbors. K represents the number of nearest neighbors considered for an object and is a vital factor in the performance of the classifier. Different k-values produce different classifier performance, so a reasonably small positive integer is needed for k. A large, even k-value can adversely affect the classification time and the prediction accuracy, while a small, odd k-value can increase the prediction accuracy [70]. k-NN is termed instance-based learning because of how it differs from inductive learning methods [71]. As an instance-based learner, k-NN does not include a model training phase; instead, it stores the training instances and classifies a new instance based on its k nearest neighbors.
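A minimal sketch of the modal-class vote follows. It uses a brute-force neighbor search with Euclidean distance rather than the KD-tree variant named in the abstract, and the two-feature training points (a hypothetical SDRTT value and IP count per domain) are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the modal class among the k training points nearest to `query`."""
    # Brute-force search: sort all training points by Euclidean distance.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (sdrtt, number_of_ips) feature vectors with class labels.
TRAIN = [
    ((1.0, 2.0), "benign"), ((1.2, 1.0), "benign"), ((0.8, 3.0), "benign"),
    ((9.0, 20.0), "botnet"), ((8.5, 25.0), "botnet"), ((10.0, 18.0), "botnet"),
]

print(knn_predict(TRAIN, (9.2, 21.0)))
print(knn_predict(TRAIN, (1.1, 2.0)))
```

An odd `k` avoids tied votes in a two-class problem, which is one practical reason for the preference for small, odd k-values noted above. In practice the features would usually be scaled first so that no single feature dominates the distance.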

2.9. Decision tree algorithm

A decision tree is a non-parametric supervised learning method commonly used for classification [72,73]. In other words, it does not require any prior assumptions regarding the type of probability distributions satisfied by the class or other attributes. The goal of a decision tree is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features [74,75]. In a decision tree, each leaf node is assigned a class label. The non-terminal nodes, which include the internal nodes and the root node, contain attribute test conditions that separate records with different characteristics.
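The structure just described, with attribute tests in non-terminal nodes and class labels in leaves, can be sketched as follows. The tree is hand-built rather than learned, and the feature names and thresholds are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str                      # class label assigned to this leaf


@dataclass
class Node:
    feature: str                    # attribute tested at this node
    threshold: float
    low: Union["Node", Leaf]        # branch taken when value <= threshold
    high: Union["Node", Leaf]       # branch taken when value > threshold


def classify(tree, record):
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(tree, Node):
        tree = tree.low if record[tree.feature] <= tree.threshold else tree.high
    return tree.label


# Hypothetical tree: thresholds are invented for illustration only.
TREE = Node("n_ips", 5,
            low=Leaf("benign"),
            high=Node("sdrtt", 50.0, low=Leaf("benign"), high=Leaf("botnet")))

print(classify(TREE, {"n_ips": 12, "sdrtt": 120.0}))
```

Each root-to-leaf path reads as one decision rule, for example "if n_ips > 5 and sdrtt > 50 then botnet", which is how a tree can be turned into a rule set.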

3. Methodology

In this study, Bot-FFX was developed to accurately differentiate between legitimate and botnet domains. The system uses a rule-based GA technique and the k-NN algorithm for fast flux botnet detection. The architecture of Bot-FFX is described in Fig. 1.

Bot-FFX is divided into four modules: the extractor, the filter, the resolver, and the detector.

3.1. The extractor

The extractor is responsible for DNS queries on domains to:

- extract domains and IP addresses
- extract the Round Trip Times (RTT) of the associated IP addresses
- extract the Google Hits of each IP address.
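According to the abstract, the raw measurements gathered here are later condensed into the Standard Deviation of Round Trip Time (SDRTT) and the Average Google Hits (AGH). A minimal sketch of that summary step is given below. The exact formulas are not stated in this part of the paper, so the use of the population standard deviation, the function name `summarise_domain`, and the input layout are assumptions; the IP addresses are documentation addresses and the measurements are invented.

```python
from statistics import mean, pstdev


def summarise_domain(observations):
    """Summarise per-IP measurements for one domain.

    `observations` maps each IP address to a (rtt_ms, google_hits) pair.
    Returns (SDRTT, AGH, number of IP addresses).
    """
    if not observations:
        raise ValueError("at least one IP address is required")
    rtts = [rtt for rtt, _ in observations.values()]
    hits = [h for _, h in observations.values()]
    # Assumption: population standard deviation over all observed IPs.
    return pstdev(rtts), mean(hits), len(observations)


# Hypothetical measurements for one domain.
sdrtt, agh, n_ips = summarise_domain({
    "192.0.2.10": (10.0, 0),
    "192.0.2.20": (30.0, 4),
})
```

Hosts scattered across unrelated networks tend to give widely varying RTTs, which is the intuition behind using the spread of RTT rather than its average.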

3.2. The ﬁlter

The filter mechanism of Bot-FFX uses the blacklist and whitelist filtering concepts to classify the incoming domains. The actions executed by the filter mechanism are defined as follows:

Deny access to domain, if $D \in B$ // Botnet

Grant access to domain, if $D \in W$ // Benign

Send domain to resolver, if $D \notin \{B \cup W\}$

where $D$ = domain name, $B$ = blacklist of known botnet domains, and $W$ = whitelist of known benign domains.
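The three filter rules translate directly into set-membership tests. The sketch below is one possible rendering, not the authors' implementation: the names `Action` and `filter_domain`, the normalisation of the domain name, the example lists, and the choice to check the blacklist first are assumptions.

```python
from enum import Enum


class Action(Enum):
    DENY = "deny"          # D is in the blacklist B (botnet)
    GRANT = "grant"        # D is in the whitelist W (benign)
    RESOLVE = "resolve"    # D is in neither list; forward to the resolver


def filter_domain(domain: str, blacklist: set[str], whitelist: set[str]) -> Action:
    """Apply the three filter rules to a single domain name."""
    name = domain.strip().lower().rstrip(".")  # normalise case and trailing dot
    if name in blacklist:                      # assumption: blacklist wins on conflict
        return Action.DENY
    if name in whitelist:
        return Action.GRANT
    return Action.RESOLVE


# Hypothetical example lists; real lists would come from curated sources.
BLACKLIST = {"bad-flux.example"}
WHITELIST = {"example.org"}

print(filter_domain("unknown.example", BLACKLIST, WHITELIST))
```

Using sets keeps each lookup at constant expected time, so only the domains that fall through to `RESOLVE` incur the cost of the later modules.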

3.3. The resolver

The ﬁlter mechanism can send the unclassiﬁed domains to the

resolver. The resolver extracts all IP addresses associated with

the domain at its Time-To-Live (TTL) within a time frame of

10 min. The operations of the resolver are deﬁned in three phases:

Name-server resolution: The set of authoritative name servers

of the domain is extracted. The output of the operation results in a

Table 1

Summary of related works.

Author(s) and year Method Strength Limitation

Provos [20] Honeyd -Security

-Spam prevention

-Vulnerability to evasion mechanisms

-Long detection time

Vrable et al. [26] | Potemkin | Strengths: Scalable; Security | Weaknesses: Vulnerability to attacks; Not completely scalable to denial-of-service
Bajtoš et al. [27] | Honeynet | Strengths: Botnet attacks detection; Botnet analysis | Weaknesses: High dimensionality of feature set; Vulnerability to evasion mechanisms
Gu et al. [30] | IDS-Driven Dialog Correlation | Strengths: Accurate for botnet detection; Scalable | Weaknesses: Vulnerability to evasion mechanisms; High dimensionality of feature set
Xie et al. [31] | Spamming Botnets | Strengths: High detection accuracy; Low false alarm rate; Ability to detect frequent domain modifications | Weaknesses: High dimensionality of feature set; Vulnerability to evasion mechanisms
Behal et al. [32] | Signature-based botnet detection | Strengths: Dynamic rule generation for botnet detection; Real-time monitoring and detection | Weaknesses: It requires access to a current database of attack signatures; Vulnerability to evasion mechanisms
Chen et al. [33] | Ensemble anomaly-based method | Strengths: Increased accuracy; Reduced false alarms | Weaknesses: High dimensionality of feature set; Vulnerability to evasion mechanisms
Alieyan et al. [52] | DNS rule-based method | Strengths: High detection accuracy; Low false alarm rate | Weaknesses: It cannot detect Peer-to-Peer botnets; Vulnerability to evasion mechanisms
Ibrahim et al. [57] | Multilayer framework | Strengths: It can detect botnet with good accuracy; Low false-negative rate | Weaknesses: Long processing and detecting time; Reduced performance while clustering decentralized botnets
Ramachandran et al. [63] | Heuristic method | Strengths: It can detect DNS-based Black-hole List (DNSBL) | Weaknesses: It cannot handle distributed DNSBL queries by botmaster; High dimensionality of feature set
Martinez-Bea et al. [34] | SVM classifier | Strengths: Resilience of the scheme to evasion techniques | Weaknesses: False positives; False negatives
Zhao & Traore [36] | Decision tree | Strengths: Reduced false positives; Not prone to disguise attacks | Weaknesses: Does not provide real time detection
Celik & Oktug [14] | DCA-based on n-grams features | Strengths: Robust detection features; Not prone to disguise attacks | Weaknesses: Does not provide real time detection; Computationally expensive scheme to implement
Vranken & Alizadeh [37] | DCA-based on TF-IDF features | Strengths: High classification accuracy; Usage of TF-IDF for feature selection | Weaknesses: Lack of comparative analysis with related works; Vulnerability to evasion mechanisms; High dimensionality of feature set
Cucchiarelli et al. [38] | DCA-based on n-grams features | Strengths: Effective classification of previously unseen domains; High classification accuracy | Weaknesses: Long processing and detecting time; High dimensionality of feature set
Muhammad et al. [39] | Machine learning approach | Strengths: Optimal features selection; High detection accuracy | Weaknesses: It cannot detect decentralized P2P based botnets; Vulnerability to evasion mechanisms
Haq & Singh [40] | Hybrid machine learning approach | Strengths: High clustering accuracy | Weaknesses: Vulnerability to evasion mechanisms
Randhawa et al. [41] | Generative adversarial network | Strengths: Generate quality botnet detection samples; Robust to data imbalance; Decrease false positives | Weaknesses: Generated samples not valid as real-life traffic; Vulnerability to evasion mechanisms
Stiawan et al. [42] | Random projection method based on machine learning method | Strengths: High detection accuracy; Low false positive rate; Fast detection time | Weaknesses: Vulnerability to evasion mechanisms
Hosseini et al. [43] | CNN-LSTM | Strengths: Shorter training time; High accuracy | Weaknesses: Vulnerability to evasion mechanisms
Lefoane et al. [44] | Feature selection-based machine learning approach | Strengths: Good accuracy; Low false alarm | Weaknesses: Vulnerability to evasion mechanisms
Kolpe & Kshirsagar [45] | Bayes classifier | Strengths: Optimal feature selection; High detection accuracy | Weaknesses: Vulnerability to evasion mechanisms


set of name servers $X = \{x_1, \ldots, x_n\}$ such that $|X| \geq 2$. The query syntax used by the resolver for name-server resolution is:

dig +short NS domain (1)

where +short = the option that restricts output to only name-server records, NS = the record type that specifies the request for name server records, and domain = the domain whose name server record is needed.

Time-To-Live resolution: This involves the extraction of the TTL of the domain to define the moving window for IP address resolution. The query syntax used by the resolver for TTL extraction is as follows:

dig @domain +trace ttlid (2)

where @domain = the domain name whose TTL value is needed, +trace = the option indicating the downward traversal from the root name server to the authoritative name server of the domain name, and ttlid = the indicator specifying the request for the TTL value.

IP Address resolution: For each TTL window, the set of IP addresses mapped to the domain is extracted over a specified time frame of 10 min. The query syntax for IP address extraction is as follows:

dig @NS domain (3)

where @NS = a name server $x_i$ such that $x_i \in X$, and domain = the domain whose IP addresses are needed. This query returns a set of IP addresses $Y = \{y_1, \ldots, y_m\}$ such that $|Y| \geq 1$. The specified time frame of 10 min is required to accumulate several IP addresses of the domain. Knysz et al. [18] and Hsu et al. [17] revealed that the IP addresses of botnet domains increase rapidly after the first DNS query. Botnetmasters exploit this phenomenon to evade real-time detection solutions that rely on a single DNS query to accumulate domain IP addresses.
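The windowed resolution described in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' Java implementation: `accumulate_ips`, its parameters, and the injected `resolve` callable (standing in for query (3)) are hypothetical names introduced here, and the default TTL of 60 s is a placeholder for the value that query (2) would return. The clock and sleep functions are injectable so that the loop can be exercised without real DNS traffic.

```python
import time
from typing import Callable, Iterable, Set


def accumulate_ips(
    resolve: Callable[[str], Iterable[str]],
    domain: str,
    window_seconds: float = 600.0,   # the 10 min time frame from the text
    ttl_seconds: float = 60.0,       # assumed TTL; in practice taken from query (2)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Collect the distinct IP addresses a domain advertises during one window."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    seen: Set[str] = set()
    deadline = clock() + window_seconds
    while True:
        seen.update(resolve(domain))          # one resolution per TTL window
        if clock() + ttl_seconds > deadline:  # the next query would fall outside the window
            break
        sleep(ttl_seconds)                    # wait for the cached records to expire
    return seen
```

Polling once per TTL window, rather than once overall, is what lets the set grow for domains that rotate their addresses after the first query.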

3.4. The detection

This module is responsible for classification in cases where the filter mechanism cannot determine the benign nature of the domain. It comprises three components, namely SDRTT, AGH, and detector.

Standard Deviation of Round-Trip-Times (SDRTT): This part of the detection computes the standard deviation of Round-Trip-Time (SDRTT) on the set of IP addresses of the domain. The distances between the system and the bots are expected to be quite large, so the SDRTT will be a relatively large value. SDRTT is computed as:

$SD_{RTT} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - f\right)^{2}}$ (4)

where $t_i$ = the RTT between the system and IP address $i$, $n$ = the total number of IP addresses, and $f$ = the mean of the total RTT.
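A short Python sketch of the SDRTT computation is given here for illustration. It assumes the population form of the standard deviation (division by n); the function name `sdrtt` and the millisecond unit are choices made for this example, not details taken from the authors' implementation.

```python
import math
from typing import Sequence


def sdrtt(rtts_ms: Sequence[float]) -> float:
    """Population standard deviation of round-trip times, following equation (4)."""
    n = len(rtts_ms)
    if n == 0:
        raise ValueError("at least one RTT sample is required")
    mean_rtt = sum(rtts_ms) / n                           # f in equation (4)
    squared_dev = sum((t - mean_rtt) ** 2 for t in rtts_ms)
    return math.sqrt(squared_dev / n)
```

For example, two addresses with RTTs of 100 ms and 300 ms have a mean of 200 ms and an SDRTT of 100 ms, whereas identical RTTs give an SDRTT of 0.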

Average Google Hits (AGH): This part of the detection computes the average number of hits returned by querying a search engine using all IP addresses associated with the domain. The average Google hit for each domain is computed as:

$AGH = \frac{\sum_{i=1}^{n} GoogleHit(i)}{n}$ (5)

where $i$ = the $i$th IP address of the domain and $n$ = the total number of IP addresses associated with the domain.
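To make the feature set concrete, the sketch below assembles the three features used later in the paper (number of IP addresses, SDRTT, and AGH) from per-IP measurements. The `DomainFeatures` type, the `extract_features` function, and the dictionary inputs are hypothetical and introduced only for this example; the attribute names follow the att0 to att2 labels used in the rule listing of Section 4.2.

```python
from statistics import mean, pstdev
from typing import Dict, NamedTuple


class DomainFeatures(NamedTuple):
    att0: int     # number of IP addresses seen in the time window
    att1: float   # SDRTT over those addresses (ms)
    att2: float   # average Google hits (AGH), equation (5)


def extract_features(rtt_ms: Dict[str, float], hits: Dict[str, int]) -> DomainFeatures:
    """Build the three-feature vector from per-IP RTTs and search-engine hit counts."""
    if not rtt_ms or rtt_ms.keys() != hits.keys():
        raise ValueError("rtt_ms and hits must cover the same non-empty set of IPs")
    agh = mean([hits[ip] for ip in rtt_ms])          # sum of GoogleHit(i) divided by n
    return DomainFeatures(len(rtt_ms), pstdev(list(rtt_ms.values())), agh)
```

How the hit counts are obtained (the paper mentions the Jsoup API for Google search queries) is outside the scope of this sketch.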

Detector: The C4.5 decision tree was built to automatically generate the detection rules for the k-NN. The detector computes the Genetic Threshold Value (GTV) for IP addresses using the GA-KD-k-NN algorithm. It performs detection based on the decision tree rules and the GTV, using the k-NN distance measure in (8). The k-NN used a KD tree for the nearest-neighbor search. The Manhattan distance was adopted because of its suitability for high-dimensional data. The k-NN adds an IP address to the whitelist if the IP address is less than the k-NN distance value ($D_{knn}$) computed in equation (8). Otherwise, the IP address is added to the blacklist. Once an IP address is added to the blacklist, any further request from it is automatically rejected by the firewall filtering unit of the detector.

Fig. 1. Bot-FFX architecture.

The fitness can be computed as:

$fitness = \sum_{i=0}^{n} match_i \times weight_i$ (6)

where $n$ is the number of genes present in each chromosome. In this case, a gene is a property to be checked for each network domain, and each network domain corresponds to a chromosome. The properties that might be considered as genes for a network domain include: source IP address, destination IP address, source port number, destination port number, size of packet, number of hops between the source and destination, TTL, packet type, payload, checksum, sequence number, etc.

$GTV = \frac{\text{best fitness} - \text{worst fitness}}{\text{total number of fitness}}$ (7)

$D_{knn} = \sum_{i=1}^{k} GTV \,\left| x.value(i) - y.value(i) \right|$ (8)

where x.value = the input query, y.value = the known data point closest to the input query, and $k$ = the k-value. Table 2 shows the overall GA-KD-KNN algorithm.
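The three quantities in equations (6) to (8) can be sketched in Python as shown here. The sketch reads (6) as a sum of match-times-weight products, (7) as the fitness range divided by the number of fitness values, and (8) as a GTV-scaled Manhattan distance; these readings, along with all function names, are assumptions made for illustration.

```python
from typing import Sequence


def fitness(matches: Sequence[int], weights: Sequence[float]) -> float:
    """Equation (6): weighted count of matched genes in one chromosome."""
    if len(matches) != len(weights):
        raise ValueError("matches and weights must have the same length")
    return sum(m * w for m, w in zip(matches, weights))


def genetic_threshold_value(fitness_values: Sequence[float]) -> float:
    """Equation (7): (best fitness - worst fitness) / number of fitness values."""
    if not fitness_values:
        raise ValueError("fitness_values must not be empty")
    return (max(fitness_values) - min(fitness_values)) / len(fitness_values)


def knn_distance(gtv: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Equation (8): GTV-weighted Manhattan distance between query x and point y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(gtv * abs(a - b) for a, b in zip(x, y))
```

Because the GTV multiplies every term, it acts as a single scale factor on the Manhattan distance rather than as a per-feature weight.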

4. Implementation, results and discussion

The implementation was carried out on an Intel(R) Pentium(R) CPU N3710 @ 1.60 GHz with 4 GB RAM running the Windows 10 operating system. The developed Bot-FFX was implemented with Java, NetBeans 8, MySQL, JavaML, JGAP, Weka J48, JKDTreeKNN and the LibSVM API, the Jsoup Java API for Google search queries, and the JFreeChart Java API and Microsoft Excel for chart development.

4.1. Description of dataset

The dataset adopted for the evaluation of the developed Bot-FFX consists of 2,000 benign domains collected from the Alexa website and 1,630 malicious domains obtained from various malware reporting sources such as the Domain Name System Black List (DNSBL), the Zeus tracker monitor, and the DNS Black Hole (DNSBH) project. For each domain, several queries were performed to extract the specified features (standard deviation of round-trip time, average Google hits, and number of IP addresses over a time window). The collected datasets were also used by other authors in the literature for performance evaluation. Table 3 shows the summary of the adopted datasets.

Table 2
The GA-KD-KNN algorithm.

Step 1. Generate the initial population, n
Step 2. Fitness value estimation
    for i := 1 to n
        Apply elitism by copying the best GA individual to the next generation
        Apply tournament selection
        Apply uniform crossover
        Apply mutation operator
        Calculate the fitness function for each individual using equation (6)
        Compute GTV using equation (7)
    end
Step 3. Select the best individual that has the highest fitness value
Step 4. Apply k-NN based on GTV in (7)
    for i := 1 to n
        Compute $D_{knn} = \sum_{i=1}^{k} GTV \,\left| x.value(i) - y.value(i) \right|$
        Compute the group of nearest neighbors
    end
Step 5. // k-d tree nearest neighbor search
    T ← T1 + T2
    T1 ← first side of the splitting plane
    T2 ← other side of the splitting plane
    while (current node ≠ root node.T1)
        if search point(current node) ≤ current node value ($D_{knn}$) then
            search first := left
        else
            search first := right
        end
        if leaf node is reached then
            current best := node point
        end
    Repeat Step 5 for T2
    if (current best exists in T2)
        current best := node point
    end
    Return current best
Step 6. // return the majority class label for unknown domain
    if IP address < $D_{knn}$ value then
        add IP address to whitelist
    else
        add IP address to blacklist
    end
Step 7. Block further requests from the blacklisted IP addresses
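The genetic operators named in Step 2 of Table 2 (elitism, tournament selection, uniform crossover, and mutation) can be illustrated with the following Python sketch. It is not the authors' JGAP-based implementation: the binary gene encoding, the population size, the tournament size, the mutation rate, and every function name are assumptions chosen only to keep the example small and runnable.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def tournament_select(pop, scores, rng, size=3) -> Chromosome:
    """Pick the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: scores[i])]


def uniform_crossover(a: Chromosome, b: Chromosome, rng) -> Chromosome:
    """Take each gene from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(c: Chromosome, rng, rate=0.05) -> Chromosome:
    """Flip each binary gene with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in c]


def evolve(fitness_fn: Callable[[Chromosome], float], n_genes: int,
           pop_size: int = 20, generations: int = 30,
           seed: int = 0) -> Tuple[Chromosome, float]:
    """Run the GA and return (best individual, GTV of the final population)."""
    if pop_size < 3:
        raise ValueError("pop_size must be at least the tournament size (3)")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness_fn(c) for c in pop]
        best = pop[max(range(pop_size), key=lambda i: scores[i])]
        nxt = [best[:]]                                  # elitism: keep the best as-is
        while len(nxt) < pop_size:
            p1 = tournament_select(pop, scores, rng)
            p2 = tournament_select(pop, scores, rng)
            nxt.append(mutate(uniform_crossover(p1, p2, rng), rng))
        pop = nxt
    scores = [fitness_fn(c) for c in pop]
    gtv = (max(scores) - min(scores)) / len(scores)      # equation (7)
    best = pop[max(range(pop_size), key=lambda i: scores[i])]
    return best, gtv
```

A caller would pass a fitness function in the spirit of equation (6), for instance `lambda c: sum(g * w for g, w in zip(c, weights))` with hypothetical gene weights. Because the elite individual is copied unchanged, the best fitness never decreases from one generation to the next.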

4.2. Experimentation

In the development of Bot-FFX, a number of experiments were conducted. The experiments are described and analyzed in this section. When tested and used for the experiment, the Bot-FFX was observed to successfully perform the following tasks:

(i) Executing DNS queries on domains to extract IP addresses

(ii) Extracting the Round Trip Times (RTT) of the associated IP addresses

(iii) Extracting the Google Hits of each IP address

(iv) Calculating the GTV for each IP address

(v) Building the C4.5 decision tree

(vi) Building the k-NN detector

(vii) Updating the blacklist based on the detection

The C4.5 decision tree was used to generate the rules for the k-NN detector. The result of this decision tree can be described as a set of rules, encapsulating the adopted feature set, that are used during detection. The rules generated by the decision tree (e.g. rules 001 to 004) are then extended by the distance value based on the genetic threshold value (e.g. rules 005 to 006) for botnet detection. The rule representation of the C4.5 decision tree in Fig. 2 is as follows:

001: IF att1129.51 THEN domain is benign.

002: IF att1>129.51 AND att08 THEN domain is benign.

003: IF att1129.51 AND att0>8 AND att2866 THEN

domain is benign.

004: IF att1129.51 AND att0>8 AND att2>866 THEN

domain is botnet.

005: 003: IF att1129.51 AND att0>8 AND att2866 AND

att3 > 8 THEN domain is benign.

006: 003: IF att1129.51 AND att0>8 AND att2866 AND

att3 less than 8 THEN domain is botnet.

Where, att0¼number of IP address associated with the

domain, att1¼standard deviation of round-trip time for the

domains’ set of IP addresses, att2¼average google hits for the

domains’ set of IP addresses and att3¼distance value based on

the GTV.
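The rule set can be read as a small decision procedure, sketched below in Python. The comparison directions on the benign branches (att1 at or below 129.51, att0 at or below 8, att2 at or below 866) are assumptions where the extracted text is ambiguous, and two further choices are made only for this example: rule 001 takes precedence when att1 equals 129.51, and a missing att3 or an att3 of exactly 8 falls back to rule 003. The function name `classify_domain` is hypothetical.

```python
from typing import Optional


def classify_domain(att0: int, att1: float, att2: float,
                    att3: Optional[float] = None) -> str:
    """Apply rules 001-006 in order and return "benign" or "botnet"."""
    if att1 <= 129.51:                    # rule 001: low SDRTT
        return "benign"
    if att0 <= 8:                         # rule 002: few IP addresses
        return "benign"
    if att2 > 866:                        # rule 004: high average Google hits
        return "botnet"
    # att2 <= 866 here: rule 003, refined by rules 005/006 when att3 is known
    if att3 is not None and att3 < 8:     # rule 006: small GTV-based distance
        return "botnet"
    return "benign"                       # rules 003 and 005
```

For instance, a domain with 12 IP addresses, an SDRTT of 200 ms, and an AGH of 900 is labeled botnet by rule 004, while the same domain with an AGH of 500 and a distance value of 9.5 is labeled benign by rule 005.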

4.3. Result analysis and evaluation

The datasets obtained after extracting the speciﬁed features

were adopted for performance evaluation using benchmarked per-

formance metrics.

4.3.1. Performance metrics

The performance of the developed Bot-FFX was measured using standard metrics: False Positive Rate (FPR), False Negative Rate (FNR), True Positive Rate (TPR), True Negative Rate (TNR) and Overall Accuracy (OA). To evaluate the Bot-FFX, the dataset was split into 50% training and 50% testing data, a split also adopted by Lin et al. [3] and Hsu et al. [17]. The training and testing sets each contained 1000 benign and 815 botnet domains. The results obtained in this study were compared with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim et al. [57], Logistic regression in Palaniappan et al. [76], Random forest in Sivaguru et al. [77], and Random forest in Patsakis & Casino [78].
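For reference, the five metrics can be computed from confusion-matrix counts as in the Python sketch below. It uses the standard textbook definitions, treating botnet as the positive class; the function name and the example counts in the usage note are hypothetical and are not taken from the paper's tables.

```python
from typing import Dict


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard rates from true/false positive and negative counts."""
    positives = tp + fn          # actual botnet domains
    negatives = tn + fp          # actual benign domains
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in the test data")
    return {
        "TPR": tp / positives,                          # detected botnets
        "FNR": fn / positives,                          # missed botnets
        "TNR": tn / negatives,                          # benign passed as benign
        "FPR": fp / negatives,                          # benign flagged as botnet
        "OA": (tp + tn) / (positives + negatives),      # overall accuracy
    }
```

With illustrative counts of 90 true positives, 95 true negatives, 5 false positives, and 10 false negatives, the TPR is 0.9, the FPR is 0.05, and the OA is 0.925. By construction, TPR + FNR = 1 and TNR + FPR = 1.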

4.3.2. Bot-FFX testing results

The testing dataset was evaluated with three machine learning algorithms, namely Genetic Algorithm with K-Nearest Neighbors (GA-k-NN), k-NN, and Support Vector Machines (SVM). The aim of this evaluation is to determine the learning algorithm that best suits the detector. The SVM algorithm was implemented with the aid of LibSVM [79]. The performance of these algorithms was evaluated in terms of False Positive Rate (FPR), False Negative Rate (FNR), True Positive Rate (TPR), True Negative Rate (TNR) and Accuracy.

Table 4 shows the testing results. The results in Table 4 reveal that the GA-k-NN, k-NN and SVM algorithms provided an overall accuracy of 99.178%, 96.362% and 98.741%, respectively. These results informed the decision to adopt GA-k-NN as the most suitable learning algorithm for the detector module. Table 5 and Table 6 show the performance of the developed GA-k-NN and the traditional k-NN, respectively, on benign and botnet domains. Table 5 reveals that GA-k-NN provided a best OA of 96.858% on benign domains and 99.178% on botnet domains. Similarly, Table 6 reveals that k-NN provided a best OA of 98.706% on benign domains and 96.362% on botnet domains. Table 7 reveals that SVM provided a best OA of 95.747% on benign domains and 98.741% on botnet domains. These results show that the developed GA-k-NN is better for botnet detection than the traditional SVM and k-NN algorithms.

4.3.3. Analysis of feature set

The high performance of Bot-FFX can be attributed to the efficacy of the adopted attributes for differentiating between botnet and benign domains. To support this statement, the IP address utilization, Standard Deviation of Round-Trip Time (SDRTT) and Average Google Hits (AGH) for each domain category were plotted on the adopted dataset (Figs. 3 to 5). In Fig. 3, it is apparent that the distribution of IP address utilization varies for each domain category. The bar chart shows that about 93.2% of benign domains advertised fewer than 9 IP addresses, while botnet domains advertised a minimum of 9 IP addresses during a total of 10 min of IP resolution. The 10 min time frame was chosen because many legitimate domains use short TTL values.

A critical look at Fig. 4 reveals that the SDRTT of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited SDRTT values lower than 200 ms. In contrast, only a few botnet domains exhibit such behavior, which is due to the geographical dispersion of the set of IP addresses adopted by botnets. Similarly, Fig. 5 reveals that the Google footprint of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited AGH values less than 10,000. In contrast, most botnet domains exhibited AGH values above 10,000.

The contributions of this study to science are:

(i) The development of a genomic k-NN Fast-Flux Botnet detection system for attack classification.

(ii) The use of reliable features to enhance the detection of fast flux botnets.

(iii) The introduction of a rule representation method for whitelisting domains.

Table 3
Summary of Datasets.

Datasets Instances Category
Alexa 2000 Benign
DNSBL project 110 FFSN
ZeusTracker monitor 20 FFSN
DNSBH 1500 FFSN

Alexa: https://www.alexa.com
DNSBL project: http://dnsbl.abuse.ch/fastfluxtracker.php
ZeusTracker monitor: https://zeus.abuse.ch/monitor.php?filter=level5
DNSBH: https://www.malwaredomains.com

Fig. 2. C4.5 decision tree based on the adopted feature set.

Table 4
Performance Comparison of GA-k-NN, k-NN and SVM.

Algorithm TPR TNR FPR FNR OA (%)
GA-k-NN 808 793 7 7 99.178
k-NN 784 769 31 31 96.362
SVM 813 798 2 2 98.741

Table 5
Performance Comparison of GA-k-NN on Benign and Botnet domains.

ItrNo BENIGN DOMAINS BOTNET DOMAINS
NoTI TPR FPR OA (%) NoTI TPR FPR OA (%)
1 1000 982 18 96.673 815 795 20 97.653
2 1000 962 38 92.976 815 808 7 99.178
3 1000 983 17 96.858 815 800 15 98.239
4 1000 978 22 95.934 815 807 8 99.061
5 1000 983 17 96.858 815 799 16 98.122
6 1000 981 19 96.488 815 805 10 98.826
7 1000 982 18 96.673 815 802 13 98.474
8 1000 980 20 96.303 815 804 11 98.709
9 1000 983 17 96.858 815 801 14 98.357
10 1000 981 19 96.488 815 803 12 98.592

ItrNo: Iteration Number; NoTI: Number of Test Instances.

4.4. Discussion

4.4.1. Summary of test results

Table 4 shows the performance comparison of GA-k-NN, k-NN, and SVM in terms of TPR, TNR, FPR, FNR, and OA. The OA column shows that GA-k-NN has the highest accuracy of 99.178%, compared to the well-known k-NN and SVM algorithms with OA of 96.362% and 98.741%, respectively. The increase in accuracy for GA-k-NN is due to the use of the GA as an optimization method and a rule representation method that guides the selection of the best solutions. The increase in accuracy of the GA-k-NN also depends slightly on the use of the standard deviation of round-trip time, average Google hits, and number of IP addresses as the adopted features to enhance the detection of fast flux botnets.

Table 6

Performance Comparison of k-NN on Benign and Botnet domains.

ItrNo BENIGN DOMAINS BOTNET DOMAINS

NoTI TPR FPR OA (%) NoTI TPR FPR OA (%)

1 1000 968 32 94.085 815 772 43 94.953

2 1000 949 51 90.573 815 784 31 96.362

3 1000 980 20 96.303 815 769 46 94.601

4 1000 977 23 95.749 815 775 40 95.305

5 1000 985 15 97.227 815 767 48 94.366

6 1000 983 17 96.858 815 772 43 94.953

7 1000 993 7 98.706 815 760 55 93.545

8 1000 988 12 97.782 815 765 50 94.132

9 1000 992 8 98.521 815 755 60 92.958

10 1000 990 10 98.152 815 762 53 93.779

ItrNo – Iteration Number, NoTI –Number of Test Instance.

Table 7

Performance Comparison of SVM on Benign and Botnet domains.

ItrNo BENIGN DOMAINS BOTNET DOMAINS

NoTI TPR FPR OA (%) NoTI TPR FPR OA (%)

1 1000 980 20 95.562 815 793 22 96.542

2 1000 959 41 91.754 815 813 2 98.741

3 1000 981 19 95.635 815 803 12 97.128

4 1000 976 24 94.723 815 810 5 98.152

5 1000 981 19 95.747 815 797 18 97.234

6 1000 979 21 95.379 815 802 13 97.715

7 1000 980 20 95.562 815 800 15 97.363

8 1000 978 22 95.212 815 802 13 97.617

9 1000 981 19 95.747 815 799 16 97.246

10 1000 979 21 95.379 815 801 14 97.481

ItrNo – Iteration Number, NoTI –Number of Test Instance.

Fig. 3. IP Address utilization for benign and botnet domains. [Bar chart; x-axis: IP address range (1-4, 5-8, 9-12, 13-16, 17-20); y-axis: Domains (%); series: Benign, Botnet.]

Fig. 4. SDRTT for benign and botnet domains in the dataset.


In order to improve predictions and remove the problem of

unbalanced data in classiﬁcation, k-fold (k = 5) cross-validation

was used. This study randomly divided the training data into 5

equal sized subsets. A single subset was applied to test the devel-

oped method and the remaining 4 subsets were used as the train-

ing data. The comparative test results were obtained on the same

computational platform.
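The 5-fold partitioning described here can be sketched in Python as follows. This is a generic illustration of k-fold cross-validation in which each subset serves once as the held-out test fold; the function name, the seeded shuffle, and the interleaved fold assignment are assumptions of this example rather than details reported by the authors.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random division of the data
    folds = [indices[i::k] for i in range(k)]      # fold sizes differ by at most one
    for held_out in range(k):
        test = folds[held_out]                     # a single subset is used for testing
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test                          # the remaining k-1 subsets train
```

Averaging a metric over the k held-out folds gives an estimate that does not depend on one particular split of the data.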

4.4.2. Comparison of GA-k-NN on benign and botnet domains

Table 5 shows the test experiments of GA-k-NN conducted on

both the benign and botnet domains. The results of the test showed

that GA-k-NN provided OA of 96.858% on the benign domain and

OA of 99.178% on the botnet domain. The high performance of

GA-k-NN can be attributed to the efﬁcacy of the adopted attributes

for differentiating between botnet and benign domains.

4.4.3. Comparison of k-NN and SVM on benign and botnet domains

Table 6 shows the test experiments of k-NN conducted on both the benign and botnet domains. The results of the test showed that k-NN provided a best OA of 98.706% on the benign domains and 96.362% on the botnet domains. The results of the k-NN can be attributed to the algorithm's high detection accuracy. Table 7 shows the test experiment of SVM conducted on both the benign and botnet domains. The results revealed that SVM provided a best OA of 95.747% on benign domains and 98.741% on botnet domains. The high performance of GA-k-NN compared to k-NN and SVM can be attributed to the adopted features, which tackle the problem of botnet master evasion because they exhibit different behaviours for botnet and benign domains. Additionally, GA-k-NN was better than k-NN and SVM due to the introduction of the GA rooted in a rule representation method, which reduces the time taken to differentiate between legitimate and botnet domains advertising the same set of IP addresses over the time window.

4.4.4. Overall performance of Bot-FFX

The developed Bot-FFX showed overall performance with OA of

99.178%, FPR of 0.8%, and FNR of 0.8%. This result shows the posi-

tive contribution of Bot-FFX for botnet attack classiﬁcation with

reduced false alarm rate. The reduced false alarm rate was due to

the ability of the Bot-FFX to clearly differentiate between botnet

and benign domains using the adopted attributes.

4.4.5. IP address utilization for benign and botnet domains

Fig. 3 shows the plot of IP address utilization for benign and botnet domains. Previous results have established the high performance of Bot-FFX. To further justify this performance, the IP address utilization for each domain category was plotted on the adopted dataset. In Fig. 3, it is apparent that the distribution of IP address utilization varies for each domain category. The bar chart shows that about 93.2% of benign domains advertised fewer than 9 IP addresses, while botnet domains advertised a minimum of 9 IP addresses during a total of 10 min of IP resolution. The 10 min time frame was chosen because many legitimate domains use short TTL values.

4.4.6. SDRTT for benign and botnet domains in the dataset

Fig. 4 shows the plot of SDRTT for benign and botnet domains in the dataset. To support the results on the high performance of the Bot-FFX, the Standard Deviation of Round-Trip Time (SDRTT) for each domain category was plotted on the adopted dataset. A critical look at Fig. 4 reveals that the SDRTT of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited SDRTT values lower than 200 ms (Fig. 4a). In contrast, only a few botnet domains exhibit such behavior (Fig. 4b), which is due to the geographical dispersion of the set of IP addresses adopted by botnets.

4.4.7. AGH for benign and botnet domains in the dataset

Fig. 5 shows the plot of AGH for benign and botnet domains in the dataset. Fig. 5 reveals that the Google footprint of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited AGH values less than 10,000 (Fig. 5a). In contrast, most botnet domains exhibited AGH values above 10,000 (Fig. 5b). This is because malicious domains are among the key domains that attackers use to perpetrate malicious actions over the Internet. Hence, botnet domains exhibit AGH values above 10,000, unlike their benign counterparts.

4.4.8. Benchmarking Bot-FFX with related works

Table 8 compares the performance of the developed Bot-FFX with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim et al. [57], Logistic regression in Palaniappan et al. [76], Random forest in Sivaguru et al. [77], and Random forest in Patsakis & Casino [78]. This study implemented and tested the methods under comparison on the same datasets and computational platform. The evaluation results indicate that the developed Bot-FFX achieves the highest overall accuracy (99.2%) and the lowest false negative rate (0.8%) among Lin et al. [3], Hsu et al. [17], Ibrahim et al. [57], Palaniappan et al. [76], Sivaguru et al. [77], and Patsakis & Casino [78]. The main advantage of Bot-FFX over the other implemented systems is that it requires a set of only three (3) features, depending on the filter module decision. This requirement reduces the time needed to train the GA-k-NN classifier to a few minutes. Besides, the practical deployment of Bot-FFX will result in the detection of a botnet domain within 20 min of deployment. The developed Bot-FFX is also robust to dynamic environments, since the genetic algorithm can vary in accordance with the current situation. The results of the developed Bot-FFX also reflect an optimized solution, since the genetic algorithm selects the individual with the highest fitness value.

Fig. 5. AGH for benign and botnet domains in the dataset.

Table 8
Comparison of Bot-FFX with related works.

Detection Approaches OA (%) FN rate (%) FP rate (%)
Lin et al. [3] 96.5 1.6 1.9
Hsu et al. [17] 93.4 1.5 0.7
Ibrahim et al. [57] 98.7 0.9 0.8
Palaniappan et al. [76] 91.5 1.6 1.8
Sivaguru et al. [77] 98.4 1.5 1.7
Patsakis & Casino [78] 98.5 1.4 1.5
Bot-FFX 99.2 0.8 0.8

5. Conclusion and future work

The evolution of the Internet and the network of anonymous users unaware of the need for Internet security have led to Fast-Flux Botnets as a means of exploitation by money-driven cybercriminals. The Fast-Flux Botnet is a prevalent security challenge, as it gives botnetmasters the opportunity to remotely control the network of infected hosts. In the literature, a number of solutions have been developed to reduce this menace. However, these solutions are still limited in detection accuracy due to the ineffectiveness of the adopted feature set. In this study, Bot-FFX was developed with this limitation in mind, which resulted in the utilization of a rule-based GA scheme and three effective features that are fed into a k-NN built on the decision tree and KD tree algorithms. Bot-FFX was tested on a public dataset and benchmarked with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim et al. [57], Logistic regression in Palaniappan et al. [76], Random forest in Sivaguru et al. [77], and Random forest in Patsakis & Casino [78]. The evaluation results showed that the developed Bot-FFX is better in detection accuracy and achieved the best false negative rate of 0.8% compared to the other implemented methods. In the future, a machine learning classifier in combination with a genetic algorithm will be deployed in the extractor module to produce the GTV, and another machine learning classifier based on the GTV will be deployed in the detector module.

CRediT authorship contribution statement

Femi Emmanuel Ayo: Conceptualization, Methodology. Joseph

Bamidele Awotunde: Data curation. Sakinat Oluwabukonla

Folorunso: Visualization, Investigation. Matthew O. Adigun:

Supervision. Sunday Adeola Ajagbe: Validation.

Declaration of Competing Interest

The authors declare that they have no known competing ﬁnan-

cial interests or personal relationships that could have appeared

to inﬂuence the work reported in this paper.

References

[1] Zhang, L., Shui, Y., Di, W. & Paul, W. 2011. A Survey on Latest Botnet Attack and

Defense. In: Proceedings of International Joint Conference of IEEE Trustcom-

11/IEEE ICESS-11/FCST-11. Changsha China pp.53-60.

[2] Butt UJ, Richardson W, Nouman A, Agbo HM, Eghan C, Hashmi F. Cloud and Its

Security Impacts on Managing a Workforce Remotely: A Reﬂection to Cover

Remote Working Challenges. In: Cybersecurity, Privacy and Freedom

Protection in the Connected World. Cham: Springer; 2021. p. 285–311.

[3] Lin H-T, Lin Y-Y, Chiang J-W. Genetic-based Real-time Fast-Flux Service

Networks Detection. J. Comput. Networks: Elsevier 2013;57(2):501–13.

[4] Holz, T., Gorecki, C., Rieck, K. & Freiling F.C. 2008. Detection and mitigation of

fast-ﬂux service networks. In: Proceedings of the 15th Network and

Distributed System Security Symposium. San Diego USA.

[5] Lallie HS, Shepherd LA, Nurse JR, Erola A, Epiphaniou G, Maple C, et al. Cyber

security in the age of covid-19: A timeline and analysis of cyber-crime and

cyber-attacks during the pandemic. Comput Secur 2021;105:102248.

[6] Stalmans, E. & Irwin, B. 2011. A framework for DNS based detection and

mitigation of malware infections on a network. In: Proceedings of the 10th

IEEE International Conference on Information Security. Johannesburg South

Africa pp.1-8.

[7] Khari M, Dalal R, Rohilla P. Extended paradigms for botnets with WoT

applications: a review. Smart Innovation of Web of Things 2020:105–22.

[8] Aruna J, Shyry SP. Survey on Artificial Intelligence Based Resilient Recovery of Botnet Attack. In: 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE; 2021. p. 1–8.

[9] Firat I. Inevitable Battle Against Botnets. In: Management Association IR, editor. Research Anthology on Combating Denial-of-Service Attacks. IGI Global; 2021. p. 1–19.

[10] Hsu, C-H., Huang, C-Y. & Chen, K-T. 2010. Fast-ﬂux bot detection in real time.

In: Proceedings of the 13th International Conference on Recent Advances in

Intrusion Detection (RAID). Springer Berlin Heidelberg pp.464–483.

[11] Passerini E, Roberto P, Lorenzo M, Danilo B. FluXOR: Detecting and Monitoring

Fast-Flux Service Networks. Berlin: Detection of Intrusions and Malware, and

Vulnerability Assessment, Springer; 2008. p. 186–206.

[12] Ahmad R, Alsmadi I. Machine learning approaches to IoT security: A

systematic literature review. Internet of Things 2021;100365.

[13] Kumar P, Gupta GP, Tripathi R. Toward design of an intelligent cyber attack

detection system using hybrid feature reduced approach for iot networks. Arab

J Sci Eng 2021;46(4):3749–78.

[14] Celik, Z.B. & Oktug, S. 2013. Detection of Fast-Flux Networks Using Various

DNS Feature Sets. In: Proceedings of IEEE Symposium on Computers and

Communications (ISCC). Split Croatia pp.000868 – 000873.

[15] Ashraf J, Keshk M, Moustafa N, Abdel-Basset M, Khurshid H, Bakhshi AD, et al.

IoTBoT-IDS: A Novel Statistical Learning-enabled Botnet Detection Framework

for Protecting Networks of Smart Cities. Sustain Cities Soc 2021;103041.

[16] Zhang J, Ling Y, Fu X, Yang X, Xiong G, Zhang R. Model of the intrusion

detection system based on the integration of spatial-temporal features.

Comput Secur 2020;89:101681.

[17] Hsu F-H, Wang C-SfC-H, Tso C-K, Chen L-H, Lin S-H. Detect Fast-Flux Domains

Through Response Time Differences. IEEE J Sel Areas Commun 2014;32

(10):1947–56.

[18] Knysz, M., Hu, X. & Shin, K. 2011. Good guys vs. bot guise: Disguise attacks

against fast-ﬂux detection systems. In: Proceedings of 2011 IEEE INFOCOM.

Shanghai China pp.1844-1852.

[19] Zhu Z, Lu G, Chen Y, Fu ZJ, Roberts P, Han K. Botnet research survey. In: 2008 32nd Annual IEEE International Computer Software and Applications Conference. IEEE; 2008. p. 967–72.

[20] Provos, N. 2004. A Virtual Honeypot Framework. In USENIX Security

Symposium (Vol. 173, No. 2004, pp. 1-14).

[21] Choo KKR. Zombies and botnets. Trends Issues Crime Crim Justice

2007;333:1–6.

[22] Dagon, D., Zou, C. C., & Lee, W. 2006. Modeling Botnet Propagation Using Time

Zones. In NDSS (Vol. 6, pp. 2-13).

[23] Zeidanloo HR, Shooshtari MJZ, Amoli PV, Safari M, Zamani M. A taxonomy of

botnet detection techniques. In 2010 3rd International Conference on Computer

Science and Information Technology, Vol. 2. IEEE; 2010. p. 158–62.

[24] Wang TZ, Wang HM, LIU B, Shi PC. Some critical problems of botnets. Chinese J

Comput 2012;35(6):1192–208.

[25] Alparslan E, Karahoca A, Karahoca D. BotNet detection: Enhancing analysis by

using data mining techniques. Advances in Data Mining Knowledge Discovery

and Applications 2012;Vol. 349.

[26] Vrable M, Ma J, Chen J, Moore D, Vandekieft E, Snoeren AC, et al. Scalability,

ﬁdelity, and containment in the potemkin virtual honeyfarm. SIGOPS Oper

Syst Rev 2005;39(5):148–62.

[27] Bajtoš, T., Sokol, P., & Mézešová, T. 2018. Virtual honeypots and detection of

telnet botnets. In Proceedings of the Central European Cybersecurity Conference

2018 (pp. 1-6).

[28] Kumar P, Gupta GP, Tripathi R. Design of anomaly-based intrusion detection

system using fog computing for IoT network. Autom Control Comput Sci

2021;55(2):137–47.

[29] Kumar, P., Tripathi, R., & P. Gupta, G. 2021d. P2IDF: a privacy-preserving based

intrusion detection framework for software deﬁned Internet of Things-fog

(SDIoT-Fog). In Adjunct Proceedings of the 2021 International Conference on

Distributed Computing and Networking (pp. 37-42).

[30] Gu, G., Porras, P. A., Yegneswaran, V., Fong, M. W., & Lee, W. 2007. Bothunter:

Detecting malware infection through ids-driven dialog correlation. In USENIX

Security Symposium (Vol. 7, pp. 1-16).

[31] Xie Y, Yu F, Achan K, Panigrahy R, Hulten G, Osipkov I. Spamming botnets:

signatures and characteristics. ACM SIGCOMM Computer Communication

Review 2008;38(4):171–82.

[32] Behal S, Brar AS, Kumar K. Signature-based botnet detection and prevention.

In: In Proceedings of International Symposium on Computer Engineering and

Technology. p. 127–32.

[33] Chen T, Zhou G, Liu Z, Jing T. A novel ensemble anomaly based approach for

command and control channel detection. In: In Proceedings of the 2020 4th

International Conference on Cryptography, Security and Privacy. p. 74–8.

[34] Martinez-Bea, S., Castillo-Perez, S., & Garcia-Alfaro, J. 2013. Real-time

malicious fast-ﬂux detection using DNS and bot related features. In 2013

Eleventh Annual Conference on Privacy, Security and Trust (pp. 369-372). IEEE.
