A genomic rule-based KNN model for fast flux botnet detection
Femi Emmanuel Ayo (a), Joseph Bamidele Awotunde (b), Sakinat Oluwabukonla Folorunso (a), Matthew O. Adigun (c), Sunday Adeola Ajagbe (d)
(a) Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye 120107, Ogun State, Nigeria
(b) Department of Computer Science, Faculty of Information and Communication Sciences, University of Ilorin, Ilorin 240003, Kwara State, Nigeria
(c) Department of Information Technology, Cape University of Technology, Cape Town, South Africa
(d) Computer & Industrial Production Engineering, First Technical University, Ibadan, 200255, Oyo State, Nigeria
article info
Article history:
Received 30 March 2023
Accepted 5 May 2023
Keywords:
Botnet Detection
Fast Flux Botnet
K-Nearest Neighbor
Genetic Algorithm
Fuzzy Logic
abstract
Fast Flux Botnet (FFB) is an advanced method developed by cyber criminals to perpetrate distributed malicious attacks. The major problems of existing FFB detection systems are the vulnerability to evasion
mechanisms, long detection time, and high dimensionality of the feature set. In this study, an improved
FFB detection architecture called Bot-FFX was developed to address some of these problems. The devel-
oped Bot-FFX consists of four modules: extractor, filter, resolver, and detector. The extractor module is
responsible for Domain Name System (DNS) queries on domains. The filter module can classify the
incoming domains as either blacklist or whitelist and sends the unclassified domains to the resolver.
The resolver extracts all IP addresses associated with the domain at its Time-To-Live (TTL) within a time
frame of 10 min. The detector module uses a rule-based Genetic Algorithm (GA) and K-Nearest Neighbor
(KNN) for botnet detection. The detector computed the Standard Deviation of Round Trip Time (SDRTT),
Average Google Hits (AGH) and Genetic Threshold Value (GTV) for all IP addresses associated with the
domains. The detector, built on decision tree rules and the K-Dimensional (KD) tree KNN algorithm,
classified the domains using the set of IP addresses, SDRTT, AGH, and GTV. The Bot-FFX was implemented
on a dataset of 2,000 benign domains and 1,630 botnet domains. The dataset was split into 50% training
and 50% testing sets. The evaluation results on the same dataset showed that Bot-FFX is an effective FFB
detection system, with accuracy, false positive rate, and false negative rate of 99.178%, 0.8%, and 0.8%, respectively.
© 2023 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Botnet is an organized network of distributed and infected com-
puters (zombies) executing malicious codes called bots, under the
remote command of a human originator called botnetmaster [1].
The evolution of the Internet has played host to a network of dis-
tributed and compromised computers. Hence, the use of the Inter-
net attracts certain risks which make Botnet one of the key issues
in Internet security.
The network of anonymous users of the Internet unaware of the
importance of security provides many opportunities for malicious
users to exploit [2,3]. Increasingly, malicious users are constantly
developing more advanced methods to profit from cybercrime
activities [4,5]. This occurrence has led to the design and imple-
mentation of a distributed architecture of remotely controlled net-
works of infected hosts, called botnets, for performing malicious
activities. With a single command from a Command and Control
(C&C) server, botnetmaster can control networks of vulnerable
hosts [6,7,8]. Botnetmaster usually performs maintenance and
update of their C&C set-up on Fast-Flux Service Network (FFSN)
to make the detection of bots difficult.
In FFSN, the Internet Protocol (IP) addresses of several zombies are advertized as phishing web servers in the Domain Name System (DNS). These phishing web servers act as masqueraders by redirecting requests to the C&C server to execute the intended malicious services. The botnetmaster frequently changes these phishing web servers to the IP addresses of other bots to exploit the weakness of the Hypertext Transfer Protocol (HTTP) in detecting the C&C server [9,7,10,11]. This phenomenon may limit researchers to the option of detecting Fast Flux Botnets (FFB) through the domain name masqueraded in spam mails and other public forums.
https://doi.org/10.1016/j.eij.2023.05.002
1110-8665/© 2023 THE AUTHORS.
Corresponding author.
E-mail addresses: ayo.femi@oouagoiwoye.edu.ng (F.E. Ayo), awotunde.jb@unilorin.edu.ng (J.B. Awotunde), sakinat.folorunso@oouagoiwoye.edu.ng (S.O. Folorunso), profmatthewo@gmail.com (M.O. Adigun), sunday.ajagbe@tech-u.edu.ng (S.A. Ajagbe).
Egyptian Informatics Journal 24 (2023) 313–325
FFB detection is a traditional machine learning task where the features of an instance are fed to a classifier and the classifier attempts to detect the class membership of that instance. However, unlike common classification tasks where the feature set is fixed, FFB detection demands that the researcher learn new features or adopt known reliable subsets from the literature [12,13,14]. A number of Botnet Detection Systems (BDS) have been developed, but the identification, adoption, and merger of reliable features for detection remains a problem. First, most botnet detection systems are limited in accuracy due to the unreliable nature of the existing features [15,16,3,17]. Second, a BDS is often unable to deal with the constant evasion mechanisms that botnetmasters use to masquerade as the operations of legitimate Internet devices [18,17]. Evasion mechanisms are the different techniques adopted by botnetmasters to make the detection of their bots difficult. These techniques include advertizing the IP addresses of several zombies as phishing web servers and performing update operations on the C&C server.
The motivation of this study is to adapt a reliable and effective feature set to deal with current evasion schemes adopted by botnetmasters. This study developed an improved Fast Flux Botnet detection system (Bot-FFX) that adopts a K-Nearest Neighbor (KNN) classifier rooted in a rule-based Genetic Algorithm (GA) and uses three features: Standard Deviation of Round Trip Time, Average Google Hits, and the number of IP addresses over a time window. The main motivation for adopting KNN is to benefit from the algorithm's high detection accuracy. The listed features were adopted to tackle the problem of evasion, as they exhibit different behaviours for FFB and legitimate domains. Additionally, a rule-based GA technique is introduced to reduce the time taken to differentiate between legitimate and botnet domains advertizing the same set of IP addresses over the time window.
The rest of this paper is structured as follows: Section 2 pre-
sents related work. Methodology is presented in Section 3. The
implementation, evaluation, and results are presented in Section 4.
Section 5 concludes the work with future work.
2. Related work
A number of solutions have been developed by researchers in recent years for BDS. These solutions can be broadly classified into honeynet-based, intrusion-based, and heuristic-based detection [19]. For instance, the works in [20,21,22] developed different honeynet-based detection techniques. However, honeynet-based detection may not necessarily detect botnet attacks, but it is useful for understanding botnet architecture and features. On the other hand, intrusion-based detection solutions have been useful for botnet attack detection, while heuristic-based detection relies on adjustable thresholds. The classes of botnet detection solutions are summarized below.
2.1. Honeynets based detection
Honeynet is a collection of simulated servers called honeypots
on a physical server. The honeypots are loopholes that are inten-
tionally introduced to motivate attackers to attack the system.
The main purpose is to gather bot signatures and mechanisms of
the C&C server [23]. The honeynet-based detection usually gener-
ates a report regarding the detected bot signatures to better under-
stand the penetration mechanisms of the botnet [24]. However, the
damages caused by the botnet are not always detected. Honeypots
can be classified as low-interaction and high-interaction honeypots
based on their simulation capability. The low-interaction honey-
pots allow partial penetration to the attackers through the simula-
tion of a few features that define the real system [25]. In other
words, low-interaction honeypots limit the accessibility of the
attackers to the real system through controlled features. For exam-
ple, Provos [20] presented a low interaction honeypot framework
called Honeyd. The framework simulates virtual honeypots with
thousands of IP addresses at the network level. The developed
Honeyd showed high security capability in botnet detection and
prevention. On the other hand, the high-interaction honeypots
allow full penetration to the attackers through the simulation of
all features of the real system. In other words, high-interaction
honeypots allow full accessibility of the attackers to the real sys-
tem through uncontrolled features. For example, Vrable et al.
[26] developed a honeypot architecture that can simulate the full
features of a real system with high scalability to support potential
hundreds of live virtual machines. The developed architecture was
able to detect attacker behavior at a faster rate than related honey-
pots architectures. In another study, Bajtoš et al. [27] proposed a
network of high-interaction honeypots for botnet detection. The
proposed method was able to analyze botnets in the infection
phase and detect botnet based on known signatures from their col-
lected datasets. The developed method used Pearson’s correlation
coefficients to show dependencies between commands, and
between commands and directories used for botnet infection.
The developed method showed that it can detect various types of
botnet attacks.
2.2. Intrusion based detection
The intrusion-based detection is a more effective method of
botnet detection that collects bot signatures and trains classifiers
to identify any abnormal activities. Intrusion-based detection can
be classified into four methods, namely, signature-based detection,
anomaly-based detection [28,29], DNS-based detection, and
mining-based detection. Signature-based detection used known
signatures of existing botnets for botnet detection. However,
signature-based detection methods can only be used for known
botnet detection. In other words, the method cannot be used for
unknown botnet detection. For example, Gu et al. [30] presented
a real-time botnet detector consisting of both rule-based and
anomaly detector engines for attack signature gathering. The gath-
ered signatures are parsed by a correlator engine for detection and
collection of botnet infection trails. The authors evaluated the
developed system in both virtual and live network honeynet envi-
ronments. The results of the evaluation showed that the developed
system is accurate and scalable for botnet detection. In a similar
study, Xie et al. [31] developed a spam signature generation archi-
tecture called AutoRE to detect spamming botnets. The developed
AutoRE was able to detect botnet spam with a low false alarm rate
without any preclassified training data. Behal et al. [32] developed a signature-based botnet detection and prevention architecture. The architecture first extracts network traffic to gather information on attack types and stores the output in a database. Alerts are then generated based on the attack types identified in the network. New rules/signatures are then derived from the alerts and added to both a detection database and a prevention database. The prevention database contains rules for filtering network traffic that the rule-based detection database has flagged as containing botnet attacks. The developed system showed that it can dynamically develop new rules for new attacks and drop the traffic in real time. Anomaly-based detection attempts to detect botnet
using some anomaly system behaviors such as high network
latency and high traffic volume. These anomaly system behaviors
indicate the presence of bot attack in the network. For example,
Chen et al. [33] proposed an ensemble anomaly-based method that
employed normal traffic for training. The method consists of two
detectors which profile and analyze the anomaly behavior with
increased accuracy and reduced false alarms compared to tradi-
tional anomaly detectors. Martinez-Bea et al. [34] proposed a
hybrid real-time fast-flux detection model by building a linear
SVM classifier that merged the feature sets of both McGrath et al.
[35] and Hsu et al. [10] in a bid to utilize their collective strengths.
The authors argued that using the feature set of McGrath et al. [35]
and Hsu et al. [10] in isolation can lead to an unacceptable rate of
false negatives and false positives respectively. The authors trained
and validated their classifier with a k-fold cross validation method
using dataset extracted from domain names advertised on various
malware reporting forums. The scheme provided a lower false pos-
itive rate compared to reviewed works examined by the authors.
Zhao & Traore [36] designed a detection system using flow metrics
to create a decision tree based approach in detecting botnets. Six
(6) features were proposed for botnet detection– mean reported
TTL (MTTL) upon performing DNS queries, actual mean TTL (ATTL)
as observed over time by the detection system, Total Unique A
records (ARCRD) similar to Number of A records, A Record Change
Variance (ARCRDV) similar to the frequency of A record change, A
Record IP stability (ARCRDS) similar to the number of subnets
observed, and Domain name confidence (DCONF) similar to the
domain age feature. Initially, a fast rule based detector observes
the highlighted attributes for each domain over a period of one
(1) week, and schedules domains that suggest malicious fast flux
behavior for extended monitoring. DNS records of suspected
domains are polled continuously at a rate of half its TTL value on
the A records, and the responses are captured. The data captured
through this process are then transformed into a set of attributes
which are then fed into a decision tree. The results showed reduced false positives compared to other schemes, and the scheme is not prone to disguise attacks. Celik & Oktug [14] proposed a detection system
purely based on the DNS request and the corresponding response
packets collected from recursive DNS server. Specifically, the
authors constructed a 19-dimensional feature vector broadly cate-
gorized into five (5) groups-DNS Answer based, Domain name
based, spatial based, Network based, and Timing-based. Their fea-
ture vector is a collection of features extracted from various exist-
ing schemes. Dataset extracted from the responses obtained during
DNS queries were used to train a C4.5 decision tree classifier. In
order to find the best subset of feature that accurately detects a
fast-flux botnet, 10-fold cross validation approach was employed
on their dataset. The authors initially trained and validated the
classifier with each feature subset separately and observed their
corresponding accuracy. Finally, all features were merged in order
to evaluate the corresponding accuracy. The results showed a robust set of detection features that is not prone to disguise attacks.
Vranken & Alizadeh [37] proposed a scheme for detecting domain names produced by domain generation algorithms (DGAs) using Term Frequency-Inverse Document Frequency (TF-IDF). The authors used TF-IDF to measure the rates of the most frequently occurring terms in domain names, and used these as features for their learning algorithms. The results of their comparison showed that a deep learning model using TF-IDF features yielded the best results, achieving high classification accuracy. Cucchiarelli et al. [38] proposed a scheme for detecting algorithmically generated malicious domain names based on n-gram features. The proposed scheme represented each domain name through a set of features built from 2-grams and 3-grams of classified and unclassified domain names. The authors used the Kullback-Leibler divergence and the Jaccard index to evaluate similarity, and deployed state-of-the-art machine learning algorithms to classify each domain. The results showed that the proposed scheme yielded a good level of accuracy and was able to classify previously unseen domains.
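To make the n-gram similarity idea concrete, the following is a minimal sketch of a Jaccard index computed over character 2-grams and 3-grams of two domain labels. It is not the implementation from [38]; the function names and the choice to pool 2-grams and 3-grams into one set are assumptions made for illustration only.

```python
from __future__ import annotations


def char_ngrams(name: str, n: int) -> set[str]:
    """Return the set of character n-grams of a domain label."""
    return {name[i:i + n] for i in range(len(name) - n + 1)}


def jaccard_index(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ngram_similarity(d1: str, d2: str) -> float:
    """Jaccard index over the pooled 2-gram and 3-gram sets (illustrative choice)."""
    f1 = char_ngrams(d1, 2) | char_ngrams(d1, 3)
    f2 = char_ngrams(d2, 2) | char_ngrams(d2, 3)
    return jaccard_index(f1, f2)
```

A value close to 1 means the two labels share most of their n-grams, while a value of 0 means they share none.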
Muhammad et al. [39] proposed a machine learning approach for early-stage botnet detection. The proposed approach first selects the optimal features using Principal Component Analysis (PCA) and Information Gain (IG) feature selection
techniques. The selected features were then fed into machine
learning classifiers for botnet detection. The results revealed that
the proposed approach is accurate with low false alarms for an
early stage botnet detection. Haq & Singh [40] developed a
machine learning scheme for botnet detection. The approach
divided the adapted dataset into two subsets and then applied k-
means clustering on one set and j48 classification on the other
set. The mean of the accuracy of k-means clustering and j48 classi-
fication approach (hybrid approach) was calculated for botnet
detection. The hybrid approach was compared with the clustering
and classification approach. The results showed that the hybrid
approach is balanced for classification and clustering of botnet
attacks. Randhawa et al. [41] proposed security hardening of botnet detectors using generative adversarial networks (GANs). The authors used a GAN to generate an extension of the original training set to mitigate adversarial evasion attacks in botnet detection. The results showed that GANs can provide quality botnet detection samples compared to traditional traffic generation methods. However, the generated samples are not valid as real-life traffic and the scheme is vulnerable to evasion mechanisms. Stiawan et al. [42] proposed a dimensionality reduction approach for machine learning based botnet detection in the Internet of Things (IoT). The authors used the random projection method for dimensionality reduction to enhance state-of-the-art machine learning methods for detecting botnets in IoT. The experimental results showed that random projection combined with a decision tree was able to detect IoT botnets quickly and with high accuracy. Hosseini
et al. [43] proposed a Convolutional Neural Network and Long
Short Term Memory (CNN-LSTM) for botnet detection. The objec-
tive of the study is to detect botnets based on neural network
and the Negative Selection Algorithm (NSA). The authors used data
wrangling method on the adapted dataset for data normalization.
The normalized data is then fed into the NSA phase to reduce
dimension. The authors then used a data scaling method based
on the z-score algorithm and the scaled data used on CNN-LSTM
algorithm for botnet detection. The results showed shorter training
time and high detection accuracy. Lefoane et al. [44] proposed an optimized feature selection scheme based on machine learning approaches (decision tree, logistic regression, and support vector machine)
for botnet detection. The first part of the study is the use of a fea-
ture selection approach to remove less important features for bot-
net attack detection. The feature selection is based on the
frequency of occurrence of the counted values to total instances
in each of the features. The second part used the selected features
to build machine learning classifiers for botnet detection. The pro-
posed approach was tested on a standard IoT dataset and the
results revealed that the proposed feature selection approach has
enhanced the detection accuracy of the machine learning classifiers with a low false alarm rate. Kolpe & Kshirsagar [45] presented a botnet detection approach using a Bayes classifier. The authors used different filter-based feature selection schemes to select the most important features for botnet detection, and the selected features were used as input to a naïve Bayes classifier. The results of the study showed that the naïve Bayes classifier achieved the best accuracy for botnet detection on the CICIDS-2017 DoS dataset. For further studies on anomaly-based detection, works such as [46,47,48,49,50,51] are recommended.
2.3. DNS-based detection
The botnetmaster frequently masquerades phishing web servers as legitimate DNS and then redirects users to the C&C server for malicious attacks. Therefore, DNS-based detection is based on
the monitoring and detection of DNS traffic anomalies generated
by a botnetmaster. The same anomaly-based detection algorithms
can be utilized for DNS-based detection. For example, Alieyan et al.
[52] proposed a DNS rule-based method for botnet detection. The
method applied DNS query and response rules to detect any anom-
aly DNS query and response activities. The proposed method
showed an improved accuracy and low false alarm rate for botnet
detection. For further studies on DNS-based detection, works such as [53,54,55,56] are recommended.
2.4. Mining-based detection
Mining-based detection uses several data mining and machine
learning algorithms to detect botnet C&C traffic. For example, Ibra-
him et al. [57] proposed a multilayer framework that consists of fil-
tering and detection modules for botnet detection. The filtering
module was to filter and reduce the number of network features
and group the network traffic in the minimum time interval. The
detection module then used the reduced network features for bot-
net detection based on a multilayer framework. The result showed
that the proposed method can detect botnet with good accuracy.
For further studies on mining-based detection, works such as [58,59,60,61] are recommended.
2.5. Heuristic-based detection
A heuristic-based detection system (HBDS) employs a dynamic threshold score calculation based on rules or statistical analysis of the network traffic for attack classification. An HBDS uses an adjustable threshold score to adapt to patterns in network traffic and reduce the false alarm rate [62]. The limitation of an HBDS is that its ability to correctly classify attacks depends on the optimization of its threshold decision. For example, Ramachandran et al. [63] proposed a set of heuristic methods to detect DNS-based Black-hole List (DNSBL) lookup queries executed by a botmaster to learn whether their bots have been blacklisted. The proposed heuristic method was able to provide counter-intelligence measures against the methods used by botmasters to determine blacklisted bots. The goal of the proposed heuristic model is to distinguish, in real time, DNSBL queries executed by botmasters from legitimate DNSBL queries. However, the proposed heuristic model cannot handle distributed DNSBL queries by botmasters.
Table 1 shows the detailed literature surveys and their
limitations.
2.6. Motivation of the work
To address some of the limitations highlighted in the summary of related works, this study adapted a reliable and effective feature set to deal with current evasion schemes adopted by botnetmasters. To overcome the identified problems, this study developed an improved Bot-FFX that adopts a KNN classifier rooted in a rule-based GA and uses three features: Standard Deviation of Round Trip Time, Average Google Hits, and the number of IP addresses over a time window. The main motivation for adopting KNN is to benefit from the algorithm's high detection accuracy. The listed features were adopted to tackle the problem of evasion, as they exhibit different behaviours for FFB and legitimate domains. Additionally, a rule-based GA technique is introduced to reduce the time taken to differentiate between legitimate and botnet domains advertizing the same set of IP addresses over the time window.
2.7. Genetic algorithm
Genetic algorithms (GAs) use a computer to simulate the process of natural selection and evolution [64]. This notion originates from the “adaptive survival in natural organisms”. GAs were first proposed by Goldberg & Holland [65] and have been successfully applied to the field of machine learning [66,67]. The algorithm begins with a randomly generated population of individual programs. How good an individual is within a population is determined by evaluating its performance against one or more fitness measures. Then, at every iteration, computerized genetic recombination and mixing is performed on the current population to replace a less fit individual program with a higher-performing one. That is, a program with a low fitness value is removed and replaced by a program with a high fitness value for the next iteration.
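The loop described above can be sketched in a few lines. This is a generic, minimal GA on bit strings and not the rule-based GA used in Bot-FFX; the toy fitness function (count of 1-bits), tournament selection, single-point crossover, and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import random


def fitness(individual):
    """Toy fitness measure (assumption): the number of 1-bits."""
    return sum(individual)


def evolve(pop_size=20, length=16, iterations=200, mutation_rate=0.05, seed=0):
    """Return the fittest individual after a fixed number of iterations."""
    rng = random.Random(seed)
    # Start from a randomly generated population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(iterations):
        # Tournament selection: the fitter of two random individuals is a parent.
        parent1 = max(rng.sample(population, 2), key=fitness)
        parent2 = max(rng.sample(population, 2), key=fitness)
        # Recombination: single-point crossover of the two parents.
        cut = rng.randint(1, length - 1)
        child = parent1[:cut] + parent2[cut:]
        # Mixing: flip each bit with a small probability.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        # Replace the least fit individual when the child is at least as fit.
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) >= fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)
```

Because only the least fit individual is ever replaced, the best fitness in the population cannot decrease from one iteration to the next in this sketch.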
2.8. K-Nearest Neighbor
K-nearest neighbor (k-NN) is one of the simplest machine learning techniques. It is regarded as the traditional nonparametric technique in pattern recognition for the categorization of data [68,69]. It classifies an object by assigning it to the modal class of its k nearest neighbors. Here k is the predefined number of nearest neighbors considered for an object, and it is a vital factor that determines the performance of the classifier. Different k-values will trigger different performances in the classifier, and thus a considerably small positive integer is needed for the k-value. A large, even k-value can adversely affect the classification time and the prediction accuracy, while a small, odd k-value can increase the prediction accuracy [70]. k-NN is termed instance-based learning because of its peculiarity compared to inductive learning methods [71]. Thus, k-NN, as an instance-based learner, does not include a model training phase; instead, it retains the instances of the input attributes and classifies a new instance based on the k nearest neighbors determined for it.
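A minimal brute-force sketch of this voting rule is given below. It assumes Euclidean distance and numeric feature vectors, and it searches all stored instances linearly rather than through a KD tree; the function name and the toy labels are illustrative only.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign `query` to the modal class of its k nearest training instances.

    `train` is a list of (feature_vector, label) pairs; there is no training
    phase, the stored instances are simply searched at prediction time.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With an odd k and two classes, the vote cannot end in a tie, which is one practical reason for preferring small odd values of k.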
2.9. Decision tree algorithm
Decision tree is a non-parametric supervised learning method
commonly used for classification [72,73]. In other words, it does
not require any prior assumptions regarding the type of probability
distributions satisfied by the class or other attributes. The goal of a
decision tree is to create a model that predicts the value of a target
variable by learning simple decision rules inferred from the data
features [74,75]. In a decision tree, each leaf node is assigned a
class label. The non-terminal nodes, which include the internal
nodes and root node contain attribute test conditions to separate
records that have different characteristics.
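The node structure just described can be illustrated with a small hand-built tree. The feature names `x` and `y`, the thresholds, and the class labels below are hypothetical and are not the rules learned in this study; the sketch only shows how attribute test conditions at non-terminal nodes route a record to a leaf carrying a class label.

```python
# Hypothetical tree: non-terminal nodes hold a test, leaf nodes hold a label.
TOY_TREE = {
    "feature": "x", "threshold": 0.5,
    "left": {"label": "A"},
    "right": {
        "feature": "y", "threshold": 2.0,
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}


def predict(node, record):
    """Follow the test conditions from the root until a leaf is reached."""
    while "label" not in node:
        goes_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]
```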
3. Methodology
In this study, Bot-FFX was developed to accurately differentiate between legitimate and botnet domains. The developed system uses a rule-based GA technique and the k-NN algorithm for fast flux botnet detection. The architecture of Bot-FFX is described in Fig. 1.
Bot-FFX is divided into the modules described below.
3.1. The extractor
The extractor is responsible for DNS queries on domains to:
- extract domains and IP addresses
- extract Round Trip Times (RTT) of associated IP addresses
- extract the Google Hits of each IP address.
3.2. The filter
The filter mechanism of the Bot-FFX uses the blacklist and
whitelist filtering concepts to classify the incoming domains. The
actions executed by the filter mechanism are defined as follows:
Deny access to domain, if $D \in B$ // Botnet
Grant access to domain, if $D \in W$ // Benign
Send domain to resolver, if $D \notin B \cup W$
where $D$ = domain name, $B$ = blacklist of known botnet domains, and $W$ = whitelist of known benign domains.
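The three filter actions can be sketched as a simple set-membership check. The function name, the returned action strings, and the normalisation of the domain (lower-casing and stripping a trailing dot) are assumptions for illustration; checking the blacklist before the whitelist is also an assumption, since the source does not say how a domain present in both lists is handled.

```python
def filter_domain(domain, blacklist, whitelist):
    """Return 'deny', 'grant', or 'resolve' for an incoming domain name."""
    d = domain.lower().rstrip(".")   # normalisation is an assumption
    if d in blacklist:               # D in B: known botnet domain
        return "deny"
    if d in whitelist:               # D in W: known benign domain
        return "grant"
    return "resolve"                 # D not in (B union W): send to resolver
```

Storing both lists as Python sets keeps each lookup at constant average cost, so the filter adds little delay before unclassified domains reach the resolver.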
3.3. The resolver
The filter mechanism can send the unclassified domains to the
resolver. The resolver extracts all IP addresses associated with
the domain at its Time-To-Live (TTL) within a time frame of
10 min. The operations of the resolver are defined in three phases:
Name-server resolution: The set of authoritative name servers
of the domain is extracted. The output of the operation results in a
Table 1
Summary of related works.
Author(s) and year Method Strength Limitation
Provos [20] Honeyd -Security
-Spam prevention
-Vulnerability to evasion mechanisms
-Long detection time
Vrable et al. [26] Potemkin -Scalable
-Security
-Vulnerability to attacks
-Not completely scalable to denial-of-Service
Bajtoš et al. [27] Honeynet -Botnet attacks detection
-Botnet analysis
-High dimensionality of feature set
-Vulnerability to evasion mechanisms
Gu et al. [30] IDS-Driven Dialog
Correlation
-Accurate for botnet detection
-Scalable
-Vulnerability to evasion mechanisms
-High dimensionality of feature set
Xie et al. [31] Spamming Botnets -High detection accuracy
-Low false alarm rate
-Ability to detect frequent domain
modifications
-High dimensionality of feature set
-Vulnerability to evasion mechanisms
Behal et al. [32] Signature-based botnet
detection
-Dynamic rule generation for botnet detection
-Real-time monitoring and detection
-It requires access to a current database of attack
signatures
-Vulnerability to evasion mechanisms
Chen et al. [33] | Ensemble anomaly-based method | Strengths: Increased accuracy; Reduced false alarms | Weaknesses: High dimensionality of feature set; Vulnerability to evasion mechanisms
Alieyan et al. [52] | DNS rule-based method | Strengths: High detection accuracy; Low false alarm rate | Weaknesses: It cannot detect Peer-to-Peer botnets; Vulnerability to evasion mechanisms
Ibrahim et al. [57] | Multilayer framework | Strengths: It can detect botnet with good accuracy; Low false-negative rate | Weaknesses: Long processing and detecting time; Reduced performance while clustering decentralized botnets
Ramachandran et al. [63] | Heuristic method | Strengths: It can detect DNS-based Black-hole List (DNSBL) | Weaknesses: It cannot handle distributed DNSBL queries by botmaster; High dimensionality of feature set
Martinez-Bea et al. [34] | SVM classifier | Strengths: Resilience of the scheme to evasion techniques | Weaknesses: False positives; False negatives
Zhao & Traore [36] | Decision tree | Strengths: Reduced false positives; Not prone to disguise attacks | Weaknesses: Does not provide real time detection
Celik & Oktug [14] | DCA-based on n-grams features | Strengths: Robust detection features; Not prone to disguise attacks | Weaknesses: Does not provide real time detection; Computationally expensive scheme to implement
Vranken & Alizadeh [37] | DCA-based on TF-IDF features | Strengths: High classification accuracy; Usage of TF-IDF for feature selection | Weaknesses: Lack of comparative analysis with related works; Vulnerability to evasion mechanisms; High dimensionality of feature set
Cucchiarelli et al. [38] | DCA-based on n-grams features | Strengths: Effective classification of previously unseen domains; High classification accuracy | Weaknesses: Long processing and detecting time; High dimensionality of feature set
Muhammad et al. [39] | Machine learning approach | Strengths: Optimal features selection; High detection accuracy | Weaknesses: It cannot detect decentralized P2P based botnets; Vulnerability to evasion mechanisms
Haq & Singh [40] | Hybrid machine learning approach | Strengths: High clustering accuracy | Weaknesses: Vulnerability to evasion mechanisms
Randhawa et al. [41] | Generative adversarial network | Strengths: Generate quality botnet detection samples; Robust to data imbalance; Decrease false positives | Weaknesses: Generated samples not valid as real-life traffic; Vulnerability to evasion mechanisms
Stiawan et al. [42] | Random projection method based on machine learning method | Strengths: High detection accuracy; Low false positive rate; Fast detection time | Weaknesses: Vulnerability to evasion mechanisms
Hosseini et al. [43] | CNN-LSTM | Strengths: Shorter training time; High accuracy | Weaknesses: Vulnerability to evasion mechanisms
Lefoane et al. [44] | Feature selection-based machine learning approach | Strengths: Good accuracy; Low false alarm | Weaknesses: Vulnerability to evasion mechanisms
Kolpe & Kshirsagar [45] | Bayes classifier | Strengths: Optimal feature selection; High detection accuracy | Weaknesses: Vulnerability to evasion mechanisms
F.E. Ayo, J.B. Awotunde, S.O. Folorunso et al. Egyptian Informatics Journal 24 (2023) 313–325
set of name servers $X = \{x_1, x_2, \ldots, x_n\}$ such that $|X| \geq 2$. The query syntax used by the resolver for name-server resolution is:
dig +short NS domain (1)
where +short = option that restricts the output to only name-server records, NS = the record type that specifies the request for name server records, and domain = the domain whose name server record is needed.
Time-To-Live resolution: This involves the extraction of the TTL of the domain to define the moving window for IP address resolution. The query syntax used by the resolver for TTL extraction is as follows:
dig @domain +trace ttlid (2)
where @domain = the domain name whose TTL value is needed, +trace = indicates the downward traversal from the root name server to the authoritative name server of the domain name, and ttlid = indicator specifying the request for the TTL value.
IP Address resolution: For each TTL window, the set of IP addresses mapped to the domain is extracted for a specified time frame of 10 min. The query syntax for IP address extraction is as follows:
dig @NS domain (3)
where @NS = a name server $x_i$ such that $x_i \in X$, and domain = the domain whose IP addresses are needed. This query returns a set of IP addresses $Y = \{y_1, y_2, \ldots, y_n\}$ such that $|Y| \geq 1$. The specified time frame of 10 min is required to allow several IP addresses of the domain to accumulate. Researchers such as Knysz et al. [18] and Hsu et al. [17] revealed that the IP addresses of botnet domains increase rapidly after the first DNS query. Botnet masters exploit this behaviour to evade real-time detection solutions that rely on a single DNS query to accumulate domain IP addresses.
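The accumulation of IP addresses over the 10 min window can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the helper names (`dig_ns_command`, `dig_a_command`, `accumulate_ips`) are hypothetical, and the resolver is passed in as a callable so that the example runs without issuing real DNS queries. The two command builders only assemble the argument lists of queries (1) and (3).

```python
import time
from typing import Callable, Iterable, List, Set


def dig_ns_command(domain: str) -> List[str]:
    """Argument list for query (1): list the name servers of a domain."""
    return ["dig", "+short", "NS", domain]


def dig_a_command(name_server: str, domain: str) -> List[str]:
    """Argument list for query (3): ask one name server for the domain's IPs."""
    return ["dig", f"@{name_server}", domain]


def accumulate_ips(
    resolve: Callable[[], Iterable[str]],
    ttl_seconds: float,
    window_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Re-resolve once per TTL until the window elapses; return distinct IPs.

    `resolve` is a hypothetical callable that performs one resolution and
    returns the IP addresses it saw (for example by running dig_a_command).
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    seen: Set[str] = set()
    elapsed = 0.0
    while elapsed < window_seconds:
        seen.update(resolve())   # one DNS resolution per TTL window
        sleep(ttl_seconds)       # wait for the cached answer to expire
        elapsed += ttl_seconds
    return seen
```

Injecting `resolve` and `sleep` keeps the loop testable: a stub resolver that rotates its answers shows how a fast-flux domain exposes more addresses with every TTL window.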
3.4. The detection
This module is responsible for classification in cases where the filter mechanism cannot determine the benign nature of the domain. It comprises three modules, namely SDRTT, AGH, and detector.
Standard Deviation of Round-Trip Time (SDRTT): This part of the detection computes the standard deviation of the round-trip time (SDRTT) over the set of IP addresses of the domain. The distances between the system and the bots are expected to be quite large, so the SDRTT will be a relatively large value. SDRTT is computed as:
$$\sigma_{RTT} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2} \qquad (4)$$
where $t_i$ = the RTT between the system and IP address $i$, $n$ = the total number of IP addresses, and $\bar{t}$ = the mean of the RTTs.
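As a small illustration of equation (4), the following Python sketch computes the population standard deviation (division by n, not n - 1) of a list of round-trip times. The function name `sdrtt` and the use of milliseconds are assumptions made for the example.

```python
import math
from typing import Sequence


def sdrtt(rtts_ms: Sequence[float]) -> float:
    """Population standard deviation of the RTTs of a domain's IP addresses."""
    n = len(rtts_ms)
    if n == 0:
        raise ValueError("at least one RTT is required")
    mean = sum(rtts_ms) / n                                   # mean RTT
    return math.sqrt(sum((t - mean) ** 2 for t in rtts_ms) / n)
```

A domain served from one data centre yields nearly equal RTTs and an SDRTT close to zero, whereas geographically scattered bots spread the RTTs and raise the value.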
Average Google Hits (AGH): This part of the detection computes the average number of hits returned by querying a search engine using all IP addresses associated with the domain. The average Google hit for each domain is computed as
$$AGH = \frac{\sum_{i=1}^{n} GoogleHit(i)}{n} \qquad (5)$$
where $i$ = the $i$th IP address of the domain and $n$ = the total number of IP addresses associated with the domain.
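Equation (5) is a plain arithmetic mean, sketched below in Python. The search-engine lookup is represented by a hypothetical callable `hit_count`, since the way hit counts are fetched is not part of the formula; the addresses in the usage example are documentation addresses, not data from the study.

```python
from typing import Callable, Iterable


def average_google_hits(ips: Iterable[str], hit_count: Callable[[str], int]) -> float:
    """Mean number of search hits over all IP addresses of one domain."""
    ip_list = list(ips)
    if not ip_list:
        raise ValueError("at least one IP address is required")
    return sum(hit_count(ip) for ip in ip_list) / len(ip_list)


# Usage with made-up hit counts held in a dictionary.
example_hits = {"192.0.2.1": 12000, "192.0.2.2": 8000, "192.0.2.3": 10000}
example_agh = average_google_hits(example_hits, example_hits.__getitem__)
```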
Fig. 1. Bot-FFX architecture.
Detector: The C4.5 decision tree was built to automatically generate the detection rules for the k-NN. The detector computes the Genetic Threshold Value (GTV) for IP addresses using the GA-KD-k-NN algorithm. It performs detection based on the decision tree rules and the GTV, using the k-NN distance measure in (8). The k-NN uses a KD tree to search for the k nearest neighbors. The Manhattan distance was adopted because of its suitability for high-dimensional data. The k-NN adds an IP address to the whitelist if the value for that IP address is less than the k-NN distance value ($D_{knn}$) computed in equation (8). Otherwise, the IP address is added to the blacklist. Once an IP address is added to the blacklist, any further request from it is automatically rejected by the firewall filtering unit of the detector.
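The KD-tree search with the Manhattan distance can be sketched as follows. This is a generic textbook k-d tree in Python, not the JKDTreeKNN library used in the study; the names `build`, `k_nearest` and `majority_label` are chosen for the example. A far branch is pruned when the distance to the splitting plane along one axis already exceeds the current k-th best distance, which is valid because the Manhattan distance is never smaller than a single coordinate difference.

```python
import heapq
from collections import Counter
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node(NamedTuple):
    point: Point
    label: str
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def k_nearest(root: Optional[Node], query: Point, k: int) -> List[Tuple[float, str]]:
    """Return the k nearest (distance, label) pairs, closest first."""
    heap: List[Tuple[float, int, str]] = []  # (-distance, tie-breaker, label)
    visited = 0

    def visit(node: Optional[Node]) -> None:
        nonlocal visited
        if node is None:
            return
        dist = manhattan(query, node.point)
        visited += 1
        if len(heap) < k:
            heapq.heappush(heap, (-dist, visited, node.label))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, visited, node.label))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than
        # the current k-th best neighbour.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg, label) for neg, _, label in heap)


def majority_label(neighbours: List[Tuple[float, str]]) -> str:
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

The label returned by `majority_label` plays the role of the class assigned to an unknown domain; the GTV weighting of equation (8) is not included in this sketch.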
The fitness can be computed as
$$\mathrm{fitness} = \sum_{i=0}^{n} \mathrm{match} \times \mathrm{weight}_i \qquad (6)$$
where $n$ is the number of genes present in each chromosome. In this case, a gene is a property to be checked for each network domain, and each network domain corresponds to a chromosome. The properties that might be considered as genes for a network domain include: source IP address, destination IP address, source port number, destination port number, size of packet, number of hops between the source and destination, TTL, packet type, payload, checksum, sequence number, etc.
$$GTV = \frac{\text{best fitness} - \text{worst fitness}}{\text{total number of fitness}} \qquad (7)$$
$$D_{knn} = \sum_{i=1}^{k} GTV \times \left| x.value(i) - y.value(i) \right| \qquad (8)$$
where x.value = input query, y.value = known data point closest to the input query, and $k$ = the k-value. Table 2 shows the overall GA-KD-KNN algorithm.
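A worked sketch of equations (6) to (8) in Python is given below. The operators lost in the printed formulas are assumed here: a product of match and weight in (6), a difference between best and worst fitness in (7), and a GTV-weighted absolute difference in (8). The function names and the numbers in the usage lines are illustrative only.

```python
from typing import Sequence


def fitness(matches: Sequence[int], weights: Sequence[float]) -> float:
    """Eq. (6), assumed form: sum of match (0 or 1) times the gene weight."""
    return sum(m * w for m, w in zip(matches, weights))


def genetic_threshold_value(fitnesses: Sequence[float]) -> float:
    """Eq. (7), assumed form: (best - worst) / number of fitness values."""
    return (max(fitnesses) - min(fitnesses)) / len(fitnesses)


def knn_distance(gtv: float, x_values: Sequence[float], y_values: Sequence[float]) -> float:
    """Eq. (8), assumed form: GTV-weighted Manhattan distance over k terms."""
    return sum(gtv * abs(x - y) for x, y in zip(x_values, y_values))


# Illustrative values: three genes, four individuals, three neighbours.
f = fitness([1, 0, 1], [2, 3, 5])                 # 2 + 0 + 5
gtv = genetic_threshold_value([7, 3, 5, 1])       # (7 - 1) / 4
d_knn = knn_distance(gtv, [4, 4, 4], [3, 6, 4])   # 1.5 * (1 + 2 + 0)
```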
4. Implementation, results and discussion
The implementation was carried out on an Intel(R) Pentium(R) CPU N3710 @ 1.60 GHz with 4 GB RAM running the Windows 10 operating system. The developed Bot-FFX was implemented with Java, NetBeans 8, MySQL, JavaML, Jgap, Weka J48, JKDTreeKNN, the LibSVM API, the Jsoup Java API for Google search queries, and the JFreeChart Java API and Microsoft Excel for chart development.
4.1. Description of dataset
The dataset adopted for the evaluation of the developed Bot-FFX consists of 2,000 benign domains collected from the Alexa website and 1,630 malicious domains obtained from various malware reporting media such as the Domain Name System Black List (DNSBL), the Zeus tracker monitor, and the DNS Black Hole (DNSBH) project. For each domain, several queries were performed to extract the specified features (standard deviation of round-trip time, average Google hits, and number of IP addresses over a time window). The collected datasets were also used by other authors in the literature for performance evaluation. Table 3 shows the summary of the adopted datasets.
Table 2
The GA-KD-KNN algorithm.
Step 1. Generate the initial population, n
Step 2. Fitness value estimation
    for i := 1 to n
        Apply elitism by copying the best GA individual to the next generation
        Apply tournament selection
        Apply uniform crossover
        Apply mutation operator
        Calculate the fitness function for each individual using equation (6)
        Compute GTV using equation (7)
    end
Step 3. Select the best individual that has the highest fitness value
Step 4. Apply k-NN based on GTV in (7)
    for i := 1 to n
        Compute $D_{knn} = \sum_{i=1}^{k} GTV \times |x.value(i) - y.value(i)|$
        Compute the group of nearest neighbors
    end
Step 5. // k-d tree nearest neighbor search
    T ← T1 + T2
    T1 ← first side of the splitting plane
    T2 ← other side of the splitting plane
    while (current_node ← root_node.T1) do
        if search_point(current_node) ≤ current_node.value ($D_{knn}$) then
            search_first := left
        else
            search_first := right
        end
        if leaf node is reached then
            current_best := node_point
        end
    end
    Repeat Step 5 for T2
    if (current_best exists in T2) then
        current_best := node_point
    end
    Return current_best
Step 6. // return the majority class label for unknown domain
    if IP address < $D_{knn}$ value then
        add IP address to whitelist
    else
        add IP address to blacklist
    end
Step 7. Block further requests from the blacklisted IP addresses
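The genetic operators named in Step 2 of Table 2 (elitism, tournament selection, uniform crossover and mutation) can be sketched in Python as one generation step. This is a generic illustration rather than the Jgap configuration used in the study: the binary chromosome encoding, the tournament size of 3, the mutation rate of 0.05 and the function names are all assumptions.

```python
import random
from typing import Callable, List

Chromosome = List[int]
Fitness = Callable[[Chromosome], float]


def tournament(pop: List[Chromosome], fit: Fitness, rng: random.Random, size: int = 3) -> Chromosome:
    """Pick `size` random individuals and keep the fittest one."""
    return max(rng.sample(pop, size), key=fit)


def uniform_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Each gene is inherited from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(c: Chromosome, rng: random.Random, rate: float = 0.05) -> Chromosome:
    """Flip each binary gene with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in c]


def next_generation(pop: List[Chromosome], fit: Fitness, rng: random.Random) -> List[Chromosome]:
    """One GA generation: the elite is copied unchanged, the rest are bred."""
    elite = max(pop, key=fit)
    children = [list(elite)]                       # elitism
    while len(children) < len(pop):
        p1 = tournament(pop, fit, rng)
        p2 = tournament(pop, fit, rng)
        children.append(mutate(uniform_crossover(p1, p2, rng), rng))
    return children
```

Because the elite individual is copied unchanged, the best fitness in the population cannot decrease from one generation to the next.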
4.2. Experimentation
In the development of Bot-FFX, a number of experiments were
conducted. The experiments are described and analyzed in this
section. When tested and used for the experiment, the Bot-FFX
was observed to successfully perform the following tasks:
(i) Executing DNS queries on domains to extract IP addresses
(ii) Extracting the Round Trip Times (RTT) of the associated IP addresses
(iii) Extracting the Google Hits of each IP address
(iv) Calculating the GTV for each IP address
(v) Building the C4.5 decision tree
(vi) Building the k-NN detector
(vii) Updating the blacklist based on the detection
To generate the rules for the k-NN detector, the C4.5 decision tree was used. The result of this decision tree can be described as a set of rules, encapsulating the adopted feature set, that are used during detection. The rules generated by the decision tree (e.g. rules 001 to 004) are then extended by the distance value based on the genetic threshold value (e.g. rules 005 to 006) for botnet detection. The rule representation of the C4.5 decision tree in Fig. 2 is as follows:
001: IF att1 ≤ 129.51 THEN domain is benign.
002: IF att1 > 129.51 AND att0 ≤ 8 THEN domain is benign.
003: IF att1 > 129.51 AND att0 > 8 AND att2 ≤ 866 THEN domain is benign.
004: IF att1 > 129.51 AND att0 > 8 AND att2 > 866 THEN domain is botnet.
005 (extends 003): IF att1 > 129.51 AND att0 > 8 AND att2 ≤ 866 AND att3 > 8 THEN domain is benign.
006 (extends 003): IF att1 > 129.51 AND att0 > 8 AND att2 ≤ 866 AND att3 < 8 THEN domain is botnet.
Where att0 = number of IP addresses associated with the domain, att1 = standard deviation of round-trip time for the domain's set of IP addresses, att2 = average Google hits for the domain's set of IP addresses, and att3 = distance value based on the GTV.
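The rule set can be read as a small decision function, sketched below in Python. The comparison operators are assumptions wherever the printed rules are ambiguous: the tree is taken to split on att1 ≤ 129.51, att0 ≤ 8 and att2 ≤ 866, and the case att3 = 8, which the rules leave open, is treated as botnet. The function name `classify_domain` is hypothetical.

```python
from typing import Optional


def classify_domain(att0: int, att1: float, att2: float,
                    att3: Optional[float] = None) -> str:
    """Apply rules 001-006; att3 (GTV-based distance) is optional.

    att0: number of IP addresses, att1: SDRTT, att2: AGH.
    """
    if att1 <= 129.51:
        return "benign"                      # rule 001
    if att0 <= 8:
        return "benign"                      # rule 002
    if att2 > 866:
        return "botnet"                      # rule 004
    if att3 is None:
        return "benign"                      # rule 003 (no distance available)
    # Rules 005 and 006 refine rule 003; att3 == 8 is assumed to be botnet.
    return "benign" if att3 > 8 else "botnet"
```

Rules 001 to 004 are the tree alone, while rules 005 and 006 only change the outcome of the branch that rule 003 would otherwise label benign.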
4.3. Result analysis and evaluation
The datasets obtained after extracting the specified features
were adopted for performance evaluation using benchmarked per-
formance metrics.
4.3.1. Performance metrics
The performance of the developed Bot-FFX was measured based
on standard metrics which are False Positive Rate (FPR), False
Negative Rate (FNR), True Positive Rate (TPR), True Negative Rate
(TNR) and Overall Accuracy (OA). To evaluate the Bot-FFX, the dataset was split into 50% training and 50% testing data, a split also adopted by Lin et al. [3] and Hsu et al. [17]. The training and testing data each contained 1,000 benign and 815 botnet domains. The results obtained in this study were compared with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim et al. [57], logistic regression in Palaniappan et al. [76], random forest in Sivaguru et al. [77], and random forest in Patsakis & Casino [78].
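For reference, the five metrics have the following textbook definitions in terms of confusion-matrix counts, sketched here in Python. The class name `Confusion` and the counts in the usage line are made up for illustration; since this section does not spell out its own formulas, the sketch is not claimed to reproduce the percentages reported in the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # botnet domains flagged as botnet
    tn: int  # benign domains passed as benign
    fp: int  # benign domains flagged as botnet
    fn: int  # botnet domains passed as benign

    def tpr(self) -> float:
        return self.tp / (self.tp + self.fn)

    def fnr(self) -> float:
        return self.fn / (self.tp + self.fn)

    def tnr(self) -> float:
        return self.tn / (self.tn + self.fp)

    def fpr(self) -> float:
        return self.fp / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)


# Hypothetical counts, not taken from the study.
example = Confusion(tp=90, tn=95, fp=5, fn=10)
```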
4.3.2. Bot-FFX testing results
The testing dataset was evaluated with three machine learning algorithms, namely the Genetic Algorithm with K-Nearest Neighbors (GA-k-NN), k-NN, and Support Vector Machines (SVM). The purpose of this evaluation is to determine the learning algorithm that best suits the detector. The SVM algorithm was implemented with the aid of LibSVM [79]. The performance of these algorithms was evaluated in terms of False Positive Rate (FPR), False Negative Rate (FNR), True Positive Rate (TPR), True Negative Rate (TNR) and Accuracy.
Table 4 shows the testing results. The GA-k-NN, k-NN and SVM algorithms provided an overall accuracy of 99.178%, 96.362% and 98.741%, respectively. These results informed the decision to adopt GA-k-NN as the most suitable learning algorithm for the detector module. Tables 5, 6 and 7 show the per-iteration performance of GA-k-NN, k-NN and SVM on benign and botnet domains. Table 5 revealed that GA-k-NN provided a peak OA of 96.858% on benign domains and 99.178% on botnet domains. Similarly, Table 6 revealed that k-NN provided a peak OA of 98.706% on benign domains and 96.362% on botnet domains. Table 7 revealed that SVM provided a peak OA of 95.747% on benign domains and 98.741% on botnet domains. These results showed that the developed GA-k-NN is better for botnet detection than the traditional SVM and k-NN algorithms.
4.3.3. Analysis of feature set
The high performance of Bot-FFX can be attributed to the efficacy of the adopted attributes for differentiating between botnet and benign domains. To support this statement, the IP address utilization, Standard Deviation of Round-Trip Time (SDRTT) and Average Google Hits (AGH) for each domain category were plotted for the adopted dataset (Figs. 3 to 5). In Fig. 3, it is apparent that the distribution of IP address utilization varies for each domain category. The bar chart shows that about 93.2% of benign domains advertised fewer than 9 IP addresses, while botnet domains advertised a minimum of 9 IP addresses during a total of 10 min of IP resolution. The 10 min time frame was chosen because many legitimate domains use short TTL values.
A critical look at Fig. 4 revealed that the SDRTT of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited SDRTT values lower than 200 ms. In contrast, only a few botnet domains exhibit such behavior, owing to the geographical dispersion of the set of IP addresses adopted by botnets. Similarly, Fig. 5 revealed that the Google footprint of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited AGH values less than 10,000. In contrast, most botnet domains exhibited AGH values above 10,000.
The contributions of this study to science are:
Table 3
Summary of Datasets.
Datasets | Instances | Category
Alexa¹ | 2000 | Benign
DNSBL project² | 110 | FFSN
ZeusTracker monitor³ | 20 | FFSN
DNSBH⁴ | 1500 | FFSN
¹ https://www.alexa.com
² http://dnsbl.abuse.ch/fastfluxtracker.php
³ https://zeus.abuse.ch/monitor.php?filter=level5
⁴ https://www.malwaredomains.com
Fig. 2. C4.5 decision tree based on the adopted feature set.
Table 4
Performance Comparison of GA-k-NN, k-NN and SVM.
Algorithm | TPR | TNR | FPR | FNR | OA (%)
GA-k-NN | 808 | 793 | 7 | 7 | 99.178
k-NN | 784 | 769 | 31 | 31 | 96.362
SVM | 813 | 798 | 2 | 2 | 98.741
Table 5
Performance Comparison of GA-k-NN on Benign and Botnet domains.
ItrNo | Benign: NoTI | TPR | FPR | OA (%) | Botnet: NoTI | TPR | FPR | OA (%)
1 | 1000 | 982 | 18 | 96.673 | 815 | 795 | 20 | 97.653
2 | 1000 | 962 | 38 | 92.976 | 815 | 808 | 7 | 99.178
3 | 1000 | 983 | 17 | 96.858 | 815 | 800 | 15 | 98.239
4 | 1000 | 978 | 22 | 95.934 | 815 | 807 | 8 | 99.061
5 | 1000 | 983 | 17 | 96.858 | 815 | 799 | 16 | 98.122
6 | 1000 | 981 | 19 | 96.488 | 815 | 805 | 10 | 98.826
7 | 1000 | 982 | 18 | 96.673 | 815 | 802 | 13 | 98.474
8 | 1000 | 980 | 20 | 96.303 | 815 | 804 | 11 | 98.709
9 | 1000 | 983 | 17 | 96.858 | 815 | 801 | 14 | 98.357
10 | 1000 | 981 | 19 | 96.488 | 815 | 803 | 12 | 98.592
ItrNo = Iteration Number; NoTI = Number of Test Instances.
(i) The development of a genomic k-NN Fast-Flux Botnet detection system for attack classification.
(ii) The use of reliable features to enhance the detection of fast flux botnets.
(iii) The introduction of a rule representation method for whitelisting domains.
4.4. Discussion
4.4.1. Summary of test results
Table 4 shows the performance comparison of GA-k-NN, k-NN, and SVM in terms of TPR, TNR, FPR, FNR, and OA. The OA column shows that GA-k-NN has the highest accuracy of 99.178%, compared with 96.362% for k-NN and 98.741% for SVM. The increase in accuracy for GA-k-NN was due to the use of GA optimization and a rule representation method that guides the selection of the best solutions. The increase in accuracy of GA-k-NN also depends partly on the use of the standard deviation of round-trip time, average Google hits, and number of IP addresses as the adopted features to enhance the detection of fast flux botnets.
Table 6
Performance Comparison of k-NN on Benign and Botnet domains.
ItrNo | Benign: NoTI | TPR | FPR | OA (%) | Botnet: NoTI | TPR | FPR | OA (%)
1 | 1000 | 968 | 32 | 94.085 | 815 | 772 | 43 | 94.953
2 | 1000 | 949 | 51 | 90.573 | 815 | 784 | 31 | 96.362
3 | 1000 | 980 | 20 | 96.303 | 815 | 769 | 46 | 94.601
4 | 1000 | 977 | 23 | 95.749 | 815 | 775 | 40 | 95.305
5 | 1000 | 985 | 15 | 97.227 | 815 | 767 | 48 | 94.366
6 | 1000 | 983 | 17 | 96.858 | 815 | 772 | 43 | 94.953
7 | 1000 | 993 | 7 | 98.706 | 815 | 760 | 55 | 93.545
8 | 1000 | 988 | 12 | 97.782 | 815 | 765 | 50 | 94.132
9 | 1000 | 992 | 8 | 98.521 | 815 | 755 | 60 | 92.958
10 | 1000 | 990 | 10 | 98.152 | 815 | 762 | 53 | 93.779
ItrNo = Iteration Number; NoTI = Number of Test Instances.
Table 7
Performance Comparison of SVM on Benign and Botnet domains.
ItrNo | Benign: NoTI | TPR | FPR | OA (%) | Botnet: NoTI | TPR | FPR | OA (%)
1 | 1000 | 980 | 20 | 95.562 | 815 | 793 | 22 | 96.542
2 | 1000 | 959 | 41 | 91.754 | 815 | 813 | 2 | 98.741
3 | 1000 | 981 | 19 | 95.635 | 815 | 803 | 12 | 97.128
4 | 1000 | 976 | 24 | 94.723 | 815 | 810 | 5 | 98.152
5 | 1000 | 981 | 19 | 95.747 | 815 | 797 | 18 | 97.234
6 | 1000 | 979 | 21 | 95.379 | 815 | 802 | 13 | 97.715
7 | 1000 | 980 | 20 | 95.562 | 815 | 800 | 15 | 97.363
8 | 1000 | 978 | 22 | 95.212 | 815 | 802 | 13 | 97.617
9 | 1000 | 981 | 19 | 95.747 | 815 | 799 | 16 | 97.246
10 | 1000 | 979 | 21 | 95.379 | 815 | 801 | 14 | 97.481
ItrNo = Iteration Number; NoTI = Number of Test Instances.
[Bar chart. x-axis: IP address range (1-4, 5-8, 9-12, 13-16, 17-20); y-axis: Domains (%); series: Benign, Botnet.]
Fig. 3. IP Address utilization for benign and botnet domains.
Fig. 4. SDRTT for benign and botnet domains in the dataset.
To improve the reliability of the predictions and reduce the effect of unbalanced data in classification, k-fold (k = 5) cross-validation was used. This study randomly divided the training data into 5 equal-sized subsets. In each round, a single subset was used to test the developed method and the remaining 4 subsets were used as the training data. The comparative test results were obtained on the same computational platform.
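The 5-fold procedure can be sketched as an index split in Python. The function name `k_fold_indices`, the fixed seed and the use of plain indices instead of feature vectors are assumptions made to keep the example self-contained; 1,815 is the size of the training half described earlier (1,000 benign plus 815 botnet domains).

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)            # random division
    folds = [indices[i::k] for i in range(k)]       # k near-equal subsets
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(1815, k=5))
```

Each of the five rounds trains on four subsets and tests on the remaining one, so every training instance contributes to exactly one test result.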
4.4.2. Comparison of GA-k-NN on benign and botnet domains
Table 5 shows the results of the test experiments of GA-k-NN conducted on both the benign and botnet domains. The results showed that GA-k-NN provided a peak OA of 96.858% on benign domains and 99.178% on botnet domains. The high performance of GA-k-NN can be attributed to the efficacy of the adopted attributes for differentiating between botnet and benign domains.
4.4.3. Comparison of k-NN and SVM on benign and botnet domains
Table 6 shows the results of the test experiments of k-NN conducted on both the benign and botnet domains. The results showed that k-NN provided a peak OA of 98.706% on benign domains and 96.362% on botnet domains. The results of the k-NN can be attributed to the algorithm's high detection accuracy. Table 7 shows the results of the test experiments of SVM conducted on both the benign and botnet domains. The results revealed that SVM provided a peak OA of 95.747% on benign domains and 98.741% on botnet domains. The high performance of GA-k-NN compared to k-NN and SVM can be attributed to the adopted features, which tackle the problem of botnet master evasion because they behave differently for botnet and benign domains. Additionally, GA-k-NN was better than k-NN and SVM due to the introduction of the GA rooted in a rule representation method, which reduces the time taken to differentiate between legitimate and botnet domains advertising the same set of IP addresses over the time window.
4.4.4. Overall performance of Bot-FFX
The developed Bot-FFX showed overall performance with OA of
99.178%, FPR of 0.8%, and FNR of 0.8%. This result shows the posi-
tive contribution of Bot-FFX for botnet attack classification with
reduced false alarm rate. The reduced false alarm rate was due to
the ability of the Bot-FFX to clearly differentiate between botnet
and benign domains using the adopted attributes.
4.4.5. IP address utilization for benign and botnet domains
Fig. 3 shows the plot of IP address utilization for benign and botnet domains. Previous results have established the high performance of Bot-FFX. To further justify this performance, the IP address utilization for each domain category was plotted for the adopted dataset. In Fig. 3, it is apparent that the distribution of IP address utilization varies for each domain category. The bar chart shows that about 93.2% of benign domains advertised fewer than 9 IP addresses, while botnet domains advertised a minimum of 9 IP addresses during a total of 10 min of IP resolution. The 10 min time frame was chosen because many legitimate domains use short TTL values.
4.4.6. SDRTT for benign and botnet domains in the dataset
Fig. 4 shows the plot of SDRTT for benign and botnet domains in the dataset. To support the results for the high performance of Bot-FFX, the Standard Deviation of Round-Trip Time (SDRTT) for each domain category was plotted for the adopted dataset. A critical look at Fig. 4 revealed that the SDRTT of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited SDRTT values lower than 200 ms (Fig. 4a). In contrast, only a few botnet domains exhibit such behavior (Fig. 4b), owing to the geographical dispersion of the set of IP addresses adopted by botnets.
4.4.7. AGH for benign and botnet domains in the dataset
Fig. 5 shows the plot of AGH for benign and botnet domains in the dataset. It revealed that the Google footprint of botnet domains is much higher than that of benign domains. A large number of benign domains exhibited AGH values less than 10,000 (Fig. 5a). In contrast, most botnet domains exhibited AGH values above 10,000 (Fig. 5b). This is because malicious domains are among the key domains that attackers use to perpetrate malicious actions over the Internet. Hence, botnet domains exhibit AGH values above 10,000, unlike their benign counterparts.
4.4.8. Benchmarking Bot-FFX with related works
Table 8 compares the performance of the developed Bot-FFX with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim et al. [57], logistic regression in Palaniappan et al. [76], random forest in Sivaguru et al. [77], and random forest in Patsakis & Casino [78]. This study implemented and tested the methods under comparison on the same datasets and computational platform. The evaluation results indicated that the developed Bot-FFX achieved the highest OA and the lowest FN rate among the compared methods, while its FP rate of 0.8% equals that of Ibrahim et al. [57] and is slightly above the 0.7% of Hsu et al. [17]. The main advantage of Bot-FFX over the other implemented systems is that it requires a set of only three (3) features, depending on the filter module decision. This reduces the time needed to train the GA-k-NN classifier to a few minutes. Besides, in practical deployment Bot-FFX would detect a botnet domain within 20 min. The developed Bot-FFX is also robust to dynamic environments, since the genetic algorithm can adapt to the current situation. The results also reflect an optimized solution, because the genetic algorithm retains the fittest individual across generations.
Fig. 5. AGH for benign and botnet domains in the dataset.
Table 8
Comparison of Bot-FFX with related works.
Detection Approaches | OA (%) | FN rate (%) | FP rate (%)
Lin et al. [3] | 96.5 | 1.6 | 1.9
Hsu et al. [17] | 93.4 | 1.5 | 0.7
Ibrahim et al. [57] | 98.7 | 0.9 | 0.8
Palaniappan et al. [76] | 91.5 | 1.6 | 1.8
Sivaguru et al. [77] | 98.4 | 1.5 | 1.7
Patsakis & Casino [78] | 98.5 | 1.4 | 1.5
Bot-FFX | 99.2 | 0.8 | 0.8
5. Conclusion and future work
The evolution of the Internet and the growing number of anonymous users unaware of the need for Internet security have led to Fast-Flux Botnets as a means of exploitation by money-driven cybercriminals. Fast-Flux Botnets are a prevalent security challenge because they give botnet masters the opportunity to remotely control the network of infected hosts. In the literature, a number of solutions
have been developed to reduce this menace. However, these solu-
tions are still limited in detection accuracy due to the ineffective-
ness of the adopted feature set. In this study, Bot-FFX was
developed with this limitation in mind, and this resulted in the uti-
lization of a rule-based GA scheme and three effective features that
are fed into a k-NN built on the decision tree and KD tree algo-
rithms. Bot-FFX was tested on a public dataset and benchmarked
with GRADE in Lin et al. [3], FFD in Hsu et al. [17], MLP in Ibrahim
et al. [57], Logistic regression in Palaniappan et al. [76], Random
forest in Sivaguru et al. [77], and Random forest in Patsakis &
Casino [78]. The evaluation results showed that the developed
Bot-FFX is better in detection accuracy and achieved the best false
negative rate of 0.8% compared to other related implemented
methods. In the future, a machine learning classifier in combina-
tion with genetic algorithm will be deployed in the extractor mod-
ule to produce the GTV and another machine learning classifier
based on the GTV will be deployed in the detector module.
CRediT authorship contribution statement
Femi Emmanuel Ayo: Conceptualization, Methodology. Joseph
Bamidele Awotunde: Data curation. Sakinat Oluwabukonla
Folorunso: Visualization, Investigation. Matthew O. Adigun:
Supervision. Sunday Adeola Ajagbe: Validation.
Declaration of Competing Interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
References
[1] Zhang, L., Shui, Y., Di, W. & Paul, W. 2011. A Survey on Latest Botnet Attack and
Defense. In: Proceedings of International Joint Conference of IEEE Trustcom-
11/IEEE ICESS-11/FCST-11. Changsha China pp.53-60.
[2] Butt UJ, Richardson W, Nouman A, Agbo HM, Eghan C, Hashmi F. Cloud and Its
Security Impacts on Managing a Workforce Remotely: A Reflection to Cover
Remote Working Challenges. In: Cybersecurity, Privacy and Freedom
Protection in the Connected World. Cham: Springer; 2021. p. 285–311.
[3] Lin H-T, Lin Y-Y, Chiang J-W. Genetic-based Real-time Fast-Flux Service
Networks Detection. J. Comput. Networks: Elsevier 2013;57(2):501–13.
[4] Holz, T., Gorecki, C., Rieck, K. & Freiling F.C. 2008. Detection and mitigation of
fast-flux service networks. In: Proceedings of the 15th Network and
Distributed System Security Symposium. San Diego USA.
[5] Lallie HS, Shepherd LA, Nurse JR, Erola A, Epiphaniou G, Maple C, et al. Cyber
security in the age of covid-19: A timeline and analysis of cyber-crime and
cyber-attacks during the pandemic. Comput Secur 2021;105:102248.
[6] Stalmans, E. & Irwin, B. 2011. A framework for DNS based detection and
mitigation of malware infections on a network. In: Proceedings of the 10th
IEEE International Conference on Information Security. Johannesburg South
Africa pp.1-8.
[7] Khari M, Dalal R, Rohilla P. Extended paradigms for botnets with WoT
applications: a review. Smart Innovation of Web of Things 2020:105–22.
[8] Aruna J, Shyry SP. Survey on Artificial Intelligence Based Resilient Recovery of
Botnet Attack. In: 2021 5th International Conference on Trends in Electronics
and Informatics (ICOEI). IEEE; 2021. p. 1–8.
[9] Firat I. Inevitable Battle Against Botnets. In: Management Association IR,
editor. Research Anthology on Combating Denial-of-Service Attacks. IGI Global;
2021. p. 1–19.
[10] Hsu, C-H., Huang, C-Y. & Chen, K-T. 2010. Fast-flux bot detection in real time.
In: Proceedings of the 13th International Conference on Recent Advances in
Intrusion Detection (RAID). Springer Berlin Heidelberg pp.464–483.
[11] Passerini E, Roberto P, Lorenzo M, Danilo B. FluXOR: Detecting and Monitoring
Fast-Flux Service Networks. Berlin: Detection of Intrusions and Malware, and
Vulnerability Assessment, Springer; 2008. p. 186–206.
[12] Ahmad R, Alsmadi I. Machine learning approaches to IoT security: A
systematic literature review. Internet of Things 2021;100365.
[13] Kumar P, Gupta GP, Tripathi R. Toward design of an intelligent cyber attack
detection system using hybrid feature reduced approach for iot networks. Arab
J Sci Eng 2021;46(4):3749–78.
[14] Celik, Z.B. & Oktug, S. 2013. Detection of Fast-Flux Networks Using Various
DNS Feature Sets. In: Proceedings of IEEE Symposium on Computers and
Communications (ISCC). Split, Croatia, pp. 000868–000873.
[15] Ashraf J, Keshk M, Moustafa N, Abdel-Basset M, Khurshid H, Bakhshi AD, et al.
IoTBoT-IDS: A Novel Statistical Learning-enabled Botnet Detection Framework
for Protecting Networks of Smart Cities. Sustain Cities Soc 2021;103041.
[16] Zhang J, Ling Y, Fu X, Yang X, Xiong G, Zhang R. Model of the intrusion
detection system based on the integration of spatial-temporal features.
Comput Secur 2020;89:101681.
[17] Hsu F-H, Wang C-SfC-H, Tso C-K, Chen L-H, Lin S-H. Detect Fast-Flux Domains
Through Response Time Differences. IEEE J Sel Areas Commun 2014;32
(10):1947–56.
[18] Knysz, M., Hu, X. & Shin, K. 2011. Good guys vs. bot guise: Disguise attacks
against fast-flux detection systems. In: Proceedings of 2011 IEEE INFOCOM.
Shanghai China pp.1844-1852.
[19] Zhu Z, Lu G, Chen Y, Fu ZJ, Roberts P, Han K. Botnet research survey. In: 2008
32nd Annual IEEE International Computer Software and Applications
Conference. IEEE; 2008. p. 967–72.
[20] Provos, N. 2004. A Virtual Honeypot Framework. In USENIX Security
Symposium (Vol. 173, No. 2004, pp. 1-14).
[21] Choo KKR. Zombies and botnets. Trends Issues Crime Crim Justice
2007;333:1–6.
[22] Dagon, D., Zou, C. C., & Lee, W. 2006. Modeling Botnet Propagation Using Time
Zones. In NDSS (Vol. 6, pp. 2-13).
[23] Zeidanloo HR, Shooshtari MJZ, Amoli PV, Safari M, Zamani M. A taxonomy of
botnet detection techniques. In 2010 3rd International Conference on Computer
Science and Information Technology, Vol. 2. IEEE; 2010. p. 158–62.
[24] Wang TZ, Wang HM, Liu B, Shi PC. Some critical problems of botnets. Chinese J
Comput 2012;35(6):1192–208.
[25] Alparslan E, Karahoca A, Karahoca D. BotNet detection: Enhancing analysis by
using data mining techniques. Advances in Data Mining Knowledge Discovery
and Applications 2012;Vol. 349.
[26] Vrable M, Ma J, Chen J, Moore D, Vandekieft E, Snoeren AC, et al. Scalability,
fidelity, and containment in the potemkin virtual honeyfarm. SIGOPS Oper
Syst Rev 2005;39(5):148–62.
[27] Bajtoš, T., Sokol, P., & Mézešová, T. 2018. Virtual honeypots and detection of
telnet botnets. In Proceedings of the Central European Cybersecurity Conference
2018 (pp. 1-6).
[28] Kumar P, Gupta GP, Tripathi R. Design of anomaly-based intrusion detection
system using fog computing for IoT network. Autom Control Comput Sci
2021;55(2):137–47.
[29] Kumar, P., Tripathi, R., & P. Gupta, G. 2021d. P2IDF: a privacy-preserving based
intrusion detection framework for software defined Internet of Things-fog
(SDIoT-Fog). In Adjunct Proceedings of the 2021 International Conference on
Distributed Computing and Networking (pp. 37-42).
[30] Gu, G., Porras, P. A., Yegneswaran, V., Fong, M. W., & Lee, W. 2007. Bothunter:
Detecting malware infection through ids-driven dialog correlation. In USENIX
Security Symposium (Vol. 7, pp. 1-16).
[31] Xie Y, Yu F, Achan K, Panigrahy R, Hulten G, Osipkov I. Spamming botnets:
signatures and characteristics. ACM SIGCOMM Computer Communication
Review 2008;38(4):171–82.
[32] Behal S, Brar AS, Kumar K. Signature-based botnet detection and prevention.
In: Proceedings of International Symposium on Computer Engineering and
Technology. p. 127–32.
[33] Chen T, Zhou G, Liu Z, Jing T. A novel ensemble anomaly based approach for
command and control channel detection. In: Proceedings of the 2020 4th
International Conference on Cryptography, Security and Privacy. p. 74–8.
[34] Martinez-Bea, S., Castillo-Perez, S., & Garcia-Alfaro, J. 2013. Real-time
malicious fast-flux detection using DNS and bot related features. In 2013
Eleventh Annual Conference on Privacy, Security and Trust (pp. 369-372). IEEE.
[35] McGrath DK, Kalafut A, Gupta M. Phishing infrastructure fluxes all the way.
IEEE Secur Priv 2009;7(5):21–8.
[36] Zhao D, Traore I. P2P botnet detection through malicious fast flux
network identification. In: 2012 Seventh International Conference on P2P,
Parallel, Grid, Cloud and Internet Computing. IEEE; 2012. p. 170–5.
[37] Vranken H, Alizadeh H. Detection of DGA-Generated Domain Names with TF-
IDF. Electronics 2022;11(3):414.
[38] Cucchiarelli A, Morbidoni C, Spalazzi L, Baldi M. Algorithmically generated
malicious domain names detection based on n-grams features. Expert Syst
Appl 2021;170:114551.
[39] Muhammad A, Asad M, Javed AR. Robust early stage botnet
detection using machine learning. In: 2020 International Conference on Cyber
Warfare and Security (ICCWS). IEEE; 2020. p. 1–6.
[40] Haq S, Singh Y. Botnet detection using machine learning. In: 2018
Fifth International Conference on Parallel, Distributed and Grid Computing
(PDGC). IEEE; 2018. p. 240–5.
[41] Randhawa RH, Aslam N, Alauthman M, Rafiq H, Comeau F. Security hardening
of botnet detectors using generative adversarial networks. IEEE Access
2021;9:78276–92.
[42] Stiawan D, Arifin MAS, Rejito J, Idris MY, Budiarto R. A
Dimensionality Reduction Approach for Machine Learning Based IoT Botnet
Detection. In: 2021 8th International Conference on Electrical Engineering,
Computer Science and Informatics (EECSI). IEEE; 2021. p. 26–30.
[43] Hosseini S, Nezhad AE, Seilani H. Botnet detection using negative selection
algorithm, convolution neural network and classification methods. Evol Syst
2022;13(1):101–15.
[44] Lefoane M, Ghafir I, Kabir S, Awan IU. Machine Learning for Botnet Detection:
An Optimized Feature Selection Approach. In: The 5th International
Conference on Future Networks & Distributed Systems. p. 195–200.
[45] Kolpe P, Kshirsagar D. Botnet Detection Using Bayes Classifier. In: Applied
Information Processing Systems. Singapore: Springer; 2022. p. 321–30.
[46] Hoang XD, Nguyen QC. Botnet detection based on machine learning
techniques using DNS query data. Future Internet 2018;10(5):43.
[47] Nõmm S, Bahşi H. Unsupervised anomaly based botnet detection in
IoT networks. In: 2018 17th IEEE international conference on machine learning
and applications (ICMLA). IEEE; 2018. p. 1048–53.
[48] Shang Y, Yang S, Wang W. Botnet detection with hybrid analysis on
flow based and graph based features of network traffic. In: International
Conference on Cloud Computing and Security. Cham: Springer; 2018. p. 612–21.
[49] Maeda S, Kanai A, Tanimoto S, Hatashima T, Ohkubo K. A botnet
detection method on SDN using deep learning. In: 2019 IEEE International
Conference on Consumer Electronics (ICCE). IEEE; 2019. p. 1–6.
[50] Ayo FE, Folorunso SO, Abayomi-Alli AA, Adekunle AO, Awotunde JB. Network
intrusion detection based on deep learning model optimized with rule-based
hybrid feature selection. Informat Secur J Global Perspect 2020;29(6):267–83.
[51] Kumar P, Gupta GP, Tripathi R. PEFL: Deep Privacy-Encoding-Based Federated
Learning Framework for Smart Agriculture. IEEE Micro 2021;42(1):33–40.
[52] Alieyan K, Almomani A, Anbar M, Alauthman M, Abdullah R, Gupta BB. DNS
rule-based schema to botnet detection. Enterprise Informat Syst 2021;15
(4):545–64.
[53] Kwon J, Lee J, Lee H, Perrig A. PsyBoG: A scalable botnet detection method for
large-scale DNS traffic. Comput Netw 2016;97:48–73.
[54] Pomorova O, Savenko O, Lysenko S, Kryshchuk A, Bobrovnikova K.
Anti-evasion technique for the botnets detection based on the passive DNS
monitoring and active DNS probing. In: International Conference on Computer
Networks. Cham: Springer; 2016. p. 83–95.
[55] Wang TS, Lin HT, Cheng WT, Chen CY. DBod: Clustering and detecting DGA-
based botnets using DNS traffic analysis. Comput Secur 2017;64:1–15.
[56] Dwyer OP, Marnerides AK, Giotsas V, Mursch T. Profiling IoT-based
Botnet Traffic using DNS. In: 2019 IEEE Global Communications
Conference (GLOBECOM). IEEE; 2019. p. 1–6.
[57] Ibrahim WNH, Anuar S, Selamat A, Krejcar O, Crespo RG, Herrera-Viedma E,
et al. Multilayer framework for botnet detection using machine learning
algorithms. IEEE Access 2021;9:48753–68.
[58] Masud MM, Al-Khateeb T, Khan L, Thuraisingham B, Hamlen KW.
Flow-based identification of botnet traffic by mining multiple log files.
In: 2008 first international conference on distributed framework and
applications. IEEE; 2008. p. 200–6.
[59] Shahrestani A, Feily M, Ahmad R, Ramadass S. Architecture for
applying data mining and visualization on network flow for botnet traffic
detection. In: 2009 International Conference on Computer Technology and
Development, Vol. 1. IEEE; 2009. p. 33–7.
[60] Liao WH, Chang CC. Peer to peer botnet detection using data
mining scheme. In: 2010 international conference on internet technology and
applications. IEEE; 2010. p. 1–4.
[61] Folorunso O, Ayo FE, Babalola YE. Ca-NIDS: A network intrusion detection
system using combinatorial algorithm approach. J Informat Priv Secur 2016;12
(4):181–96.
[62] Dora V, Lakshmi VN. Optimal feature selection with CNN-feature learning for
DDoS attack detection using meta-heuristic-based LSTM. Int J Intellig Robot
Appl 2022:1–27.
[63] Ramachandran A, Feamster N, Dagon D. Revealing botnet membership using
dnsbl counter-intelligence. Sruti 2006;6:49–54.
[64] Koza JR. Genetic programming: On the programming of computers by means
of natural selection. Massachusetts: MIT; 1992.
[65] Goldberg DE, Holland JH. Genetic algorithms and machine
learning. Machine Learning 1988;3(2):95–9.
[66] Alcalá R, Gacto MJ, Herrera F, Alcalá-Fdez J. A multi-objective genetic
algorithm for tuning and rule selection to obtain accurate and compact
linguistic fuzzy rule-based systems. Int J Uncertainty, Fuzzin Knowledge-
Based Syst, World Scientific: Singapore 2007;15(05):539–57.
[67] Fernández A, López V, del Jesus MJ, Herrera F. Revisiting Evolutionary Fuzzy
Systems: Taxonomy, applications, new trends and challenges. Knowl-Based
Syst 2015;80:109–21.
[68] Bishop CM. Neural networks for pattern recognition. England: Oxford
University; 1995.
[69] Manocha S, Girolami MA. An empirical analysis of the probabilistic K-nearest
neighbour classifier. Pattern Recogn Lett 2007;28:1818–24.
[70] Chaudhari P, Agarwal H, Bhateja V. Data augmentation for cancer classification
in oncogenomics: an improved KNN based approach. Evol Intel 2021;14
(2):489–98.
[71] Mitchell T. Machine learning. New York: McGraw Hill; 1997.
[72] Navada A, Ansari AN, Patil S, Sonkamble BA. Overview of use of decision tree
algorithms in machine learning. In: 2011 IEEE control and system graduate
research colloquium. IEEE; 2011. p. 37–42.
[73] Yan X, He J, Zhang C, Liu Z, Qiao B, Zhang H. Single-vehicle crash severity
outcome prediction and determinant extraction using tree-based and other
non-parametric models. Accid Anal Prev 2021;153:106034.
[74] Rathore SS, Kumar S. A decision tree logic based recommendation system to
select software fault prediction techniques. Computing 2017;99(3):255–85.
[75] Muñoz V, Vallejo M, Aedo JE. Machine learning models for predicting crime
hotspots in Medellin city. In: 2021 2nd Sustainable Cities Latin America
Conference (SCLA). IEEE; 2021. p. 1–6.
[76] Palaniappan G, Sangeetha S, Rajendran B, Goyal S, Bindhumadhava BS.
Malicious domain detection using machine learning on domain name
features, host-based features and web-based features. Procedia Comput Sci
2020;171:654–61.
[77] Sivaguru R, Peck J, Olumofin F, Nascimento A, De Cock M. Inline detection of
DGA domains using side information. IEEE Access 2020;8:141910–22.
[78] Patsakis C, Casino F. Exploiting statistical and structural features for the
detection of Domain Generation Algorithms. J Informat Secur Appl
2021;58:102725.
[79] Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans
Intell Syst Technol 2011;2(3):27.