Internet-Wide Scanner Fingerprint Identifier Based
on TCP/IP Header
Akira Tanaka∗† , Chansu Han∗, Takeshi Takahashi∗, and Katsuki Fujisawa†
∗National Institute of Information and Communications Technology, Tokyo, Japan.
{tanaka.akira, han, takeshi takahashi}@nict.go.jp
†Kyushu University, Fukuoka, Japan. akirat.tanaka@kyudai.jp and fujisawa@imi.kyushu-u.ac.jp
Abstract—Identifying individual scan activities is a crucial and
challenging task for mitigating emerging cyber threats or
gaining insights into security scans. Sophisticated adversaries
distribute their scans over multiple hosts and operate with
stealth; therefore, low-rate scans hide beneath other benign
traffic. Although previous studies attempted to discover such
stealth scans by observing the distribution of ports and hosts,
well-organized scans are difficult to find. However, a scanner
can embed a fingerprint into the packet fields to distinguish
between the scan and other traffic. In this study, we propose a
new algorithm, inspired by the genetic algorithm, that identifies
such fingerprints without assuming a fixed form. To the best of
our knowledge, this is the first such attempt. Through numerical
experiments on darknet traffic, we identified previously unknown
fingerprints in addition to known ones. We analyzed the packets
and discovered distinctive scan activities. Further, we collated the
results with both cyber threat intelligence and investigation/large-
scale scanner lists to ascertain the reliability of our model.
Index Terms—Internet-wide scan, genetic algorithm, finger-
print, darknet traffic
I. INTRODUCTION
Port scanning serves multiple purposes by probing a server
or host for open ports. Shodan and Censys¹, known as search
engines for Internet-connected devices, build a searchable
database to discover vulnerabilities, their impact, and affected
IoT devices. Network administrators conduct port scans to
determine open ports and services in a penetration test to
verify network security policies. However, an attacker utilizes
a port scan to identify active network services and exploit their
vulnerabilities.
Intrusion detection systems (IDS) or firewalls detect such
adversaries by monitoring the number of packets sent by a
source IP address or a network class. However, sophisticated
adversaries slip through these defense systems by distributing
the scans over multiple hosts, thereby reducing the scan rate.
Durumeric et al. [1] investigated widespread port scanning and
distributed botnet scans. These distributed scans disappear in
the background noise of other activity, such as
benign traffic, benign port scans for investigation purposes,
and Internet backscatter [2].
Robertson et al. [3] developed a detection module for stealth
scans, which assumes that cooperating hosts are located in
the same subnet. Further, by inspecting destination ports and
destination IP addresses, Yegneswaran et al. [4] uncovered the
trend of intrusion attempts, wherein a few correlated sources
accounted for a significantly large fraction of intrusions.
Blaise et al. [5] developed a statistical approach that utilizes
a modified Z-score measure for port-level change detection, which
contributes to the early detection of emerging botnets.
¹ https://www.shodan.io/, https://censys.io/
Moreover, existing studies report that scanners embed their
fingerprints into packet fields to distinguish between their
scan and backscatter traffic [6]. The adversary probably reuses
the same fingerprint during a scanning campaign because
creating a fingerprint for each source is inefficient. Snort,
Suricata², and most IDS tools rely on a fingerprint database
to identify attacks that match the given fingerprints. Although
a fingerprint is expressive and easy for network administrators
to understand, problems still exist. First, emerging
unknown threats may be overlooked. Second, it is expensive
to manually generate a fingerprint. Griffioen et al. [6] proposed
a method for finding a fingerprint in TCP and IP header fields
to address this problem. Their numerical experiments verified
its high detection performance for various scenarios, such as
distributing source addresses and limiting packets per host.
However, existing studies assume a fixed-form fingerprint,
thereby overlooking low-profile malware scans with diverse
fingerprints. Therefore, this study proposes an algorithm to au-
tomatically generate fingerprint candidates and select promis-
ing fingerprints. To the best of our knowledge, this is the first
study to automatically generate flexible fingerprints; thus, our
model cannot be compared with other existing models. We
analyzed the packets that have a fingerprint and collated their
source IP addresses with investigation/large-scale scanner lists
to ascertain the reliability of our model.
In summary, this study offers the following contributions:
1) We consider a genetic algorithm (GA) and propose a
new algorithm that automatically identifies the finger-
print embedded in IPv4 and TCP header fields.
2) We demonstrate the feasibility of our approach using
darknet traffic and discover several new fingerprints
previously unknown in the literature.
3) We collate our results with both cyber threat intelligence
and investigation/large-scale scanner lists to ensure the
reliability of our model.
² https://www.snort.org/, https://suricata.io/
TABLE I: Fingerprints of major open-source scanners (Masscan
and ZMap) and malware (Mirai and Hajime) [1], [7]. $f_{FB}(\cdot)$
and $f_{LB}(\cdot)$ output the first and last bytes of the input,
respectively. Similarly, $f_{L2B}(\cdot)$ outputs the last two bytes
of the input. $\oplus$ denotes the bitwise XOR, which performs the
logical exclusive OR operation on each pair of corresponding bits.

Name | Fingerprint
Hajime [8] | $x_{p,(\text{tcp.window},\,14600)} \land \left( x_{p,(f_{FB}(\text{tcp.seq}),\,0)} \lor x_{p,(f_{LB}(\text{tcp.seq}),\,0)} \right)$
Masscan [9] | $x_{p,(\text{ip.id} \,\oplus\, f_{L2B}(\text{ip.dst}) \,\oplus\, \text{tcp.dport} \,\oplus\, f_{L2B}(\text{tcp.seq}),\,0)}$
ZMap [10] | $x_{p,(\text{ip.id},\,54321)}$
Mirai [11] | $x_{p,(\text{tcp.seq} \,\oplus\, \text{ip.dst},\,0)}$
II. PRELIMINARIES
This section provides the preliminaries to our proposed
method and experiments.
A. Header Fields Used for Producing Fingerprints
As shown in Fig. 1, the header fields of the IPv4 and
TCP protocols are categorized into: (a) modifiable fields
(yellow) and (b) unchangeable fields (white). Modifications
of unchangeable fields prevent IPv4 or TCP from working
correctly. For example, a modification of the IPv4 version field
prevents the destination from parsing the packet. Therefore,
fingerprints embedded in a packet are probably made of only
modifiable header fields.
B. Well-known Fingerprints
We define a TCP function $f$ as a function that maps a TCP packet
to a binary number, $\mathcal{F}$ as the set of TCP functions,
and $\mathcal{B}$ as the set of binary numbers. For example, the
function $f$ that accepts a TCP packet $p$ and returns its source
address is a TCP function and is expressed as $f(p) = \text{ip.src}$.
A sign $(f, b)$ is defined as an ordered pair of a TCP function $f$
and a binary number $b$. A TCP packet $p$ is said to have a sign
$(f, b)$ if $f(p) = b$, where $f(p)$ is the output of the TCP
function $f$.
The fingerprint of a scanner or malware is expressed as a logical
formula over the following indicator variables:

$$x_{p,(f,b)} = \begin{cases} 1 & \text{if a packet } p \text{ has a sign } (f, b) \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

According to a previous study [7], a TCP SYN packet $p$ has the
Hajime [8] signature if it possesses two features: (1) a window
size of 14,600 and (2) the first or last byte of the sequence
number is zero. These two conditions are satisfied if and only
if

$$x_{p,(\text{tcp.window},\,14600)} \land \left( x_{p,(f_{FB}(\text{tcp.seq}),\,0)} \lor x_{p,(f_{LB}(\text{tcp.seq}),\,0)} \right) = 1 \tag{2}$$

where $f_{FB}$ and $f_{LB}$ output the first and last bytes of the
input, respectively. The fingerprints of other scanners and malware
are summarized in Table I.
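The definitions above map directly onto small predicates. The following Python sketch is illustrative only and is not the authors' implementation. It assumes a packet is a dictionary of unsigned integer header fields keyed by names such as `tcp.seq` (a hypothetical layout), that the "first byte" is the most significant byte of the 32-bit sequence number, and that the Mirai check compares the TCP sequence number with the destination address.

```python
# Hypothetical packet layout: a dict of unsigned integers, e.g.
# {"ip.id": ..., "ip.dst": ..., "tcp.seq": ..., "tcp.dport": ..., "tcp.window": ...}

def f_fb(value, width_bytes):
    """First (most significant) byte of a field that is width_bytes wide."""
    return (value >> (8 * (width_bytes - 1))) & 0xFF

def f_lb(value):
    """Last (least significant) byte."""
    return value & 0xFF

def f_l2b(value):
    """Last two bytes."""
    return value & 0xFFFF

def has_sign(packet, f, b):
    """Indicator x_{p,(f,b)}: 1 if f(p) == b, else 0 (Eq. 1)."""
    return int(f(packet) == b)

def is_hajime(p):
    """Eq. 2: window size 14600 AND (first OR last byte of tcp.seq is zero)."""
    window = has_sign(p, lambda q: q["tcp.window"], 14600)
    first = has_sign(p, lambda q: f_fb(q["tcp.seq"], 4), 0)
    last = has_sign(p, lambda q: f_lb(q["tcp.seq"]), 0)
    return bool(window and (first or last))

def is_zmap(p):
    return p["ip.id"] == 54321

def is_mirai(p):
    # Sequence number equals the destination address, so their XOR is zero.
    return (p["tcp.seq"] ^ p["ip.dst"]) == 0

def is_masscan(p):
    return (p["ip.id"] ^ f_l2b(p["ip.dst"]) ^ p["tcp.dport"]
            ^ f_l2b(p["tcp.seq"])) == 0
```

Each predicate evaluates one row of Table I for a single packet; counting the packets for which a predicate holds gives the share of traffic attributable to that fingerprint.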
C. Internet-wide Scanner
We prepared two Internet-wide scanner lists. In Section IV-C,
these lists are collated with the source IP addresses that send
packets carrying an embedded fingerprint. The details of the lists
are as follows:
•Investigation scanner list
By querying domain names, we obtained 674 IP addresses
used for investigation purposes. This list includes eighteen
organizations such as Shodan, Censys, and BinaryEdge.
•Large-scale scanner list
Following previous research [12], an IP address is called a
large-scale scanner if it satisfies both of the following
conditions within a day: (1) the number of unique destination
ports is greater than or equal to 30, and (2) the number of
packets exceeds the total IP address space (we used the number
estimated from the darknet IP addresses).
Three hundred and twenty-five IP addresses satisfied the
criteria on at least one day in October 2018, and the list
includes all of them.
The above two lists have only 13 overlapping IP addresses,
all of which belong to Shodan.
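The two criteria for the large-scale scanner list can be sketched as a per-day aggregation. This is a minimal illustration under stated assumptions: the record layout `(day, src_ip, dst_port)` and the name `packet_threshold` are hypothetical, and the threshold value must be supplied by the caller because the exact estimate used in the study is not given here.

```python
from collections import defaultdict

def large_scale_scanners(records, packet_threshold, min_ports=30):
    """Return source IPs that meet both criteria on at least one day.

    records: iterable of (day, src_ip, dst_port), one tuple per packet
             (hypothetical layout).
    packet_threshold: per-day packet count that must be exceeded
             (assumed to be supplied by the caller).
    """
    ports = defaultdict(set)    # (day, src) -> unique destination ports
    packets = defaultdict(int)  # (day, src) -> number of packets
    for day, src, dport in records:
        ports[(day, src)].add(dport)
        packets[(day, src)] += 1
    return {
        src
        for (day, src), count in packets.items()
        if len(ports[(day, src)]) >= min_ports and count > packet_threshold
    }
```

Both conditions are evaluated on the same day, so a source that spreads its ports over several days does not qualify.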
III. METHODOLOGY
Our approach identifies a fingerprint that is expressed as a
combination of bitwise operations between the header fields
of the IPv4 and TCP protocols. We provide an overview
(Section III-A) and the details of our method in the succeeding
sections.
A. Scanner’s Fingerprint Identifier
Algorithm 1 is the pseudocode for identifying fingerprints,
and the overview is summarized below:
1) Generate TCP functions
Considering the GA, we generate a new TCP function
(line 5) from the current TCP functions $n$ times.
Section III-B explains the initial TCP functions $F_1$.
Generating a new TCP function is described in
Section III-C, and we present an analogy between the GA
and generating TCP functions in Section III-D.
2) Find effective signs
If many packets have a sign $(f, b)$, then $x_{p,(f,b)}$ is
probably a component of some fingerprint. Such a sign
$(f, b)$ is called an effective sign, and Section III-E
provides a method for identifying them.
3) Consolidate effective signs and identify fingerprints
We calculate co-occurrences between effective signs,
that is, how often two effective signs hold for a packet
simultaneously. We unify effective signs with
high co-occurrences and create a new fingerprint. For
instance, almost all the packets that have the ZMap
fingerprint (ip.id = 54321) satisfy tcp.window = 65535 and
tcp.ack = 0; hence, the ZMap fingerprint is updated to
ip.id = 54321 ∧ tcp.window = 65535 ∧ tcp.ack = 0.
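Step 3 can be illustrated with a simple co-occurrence measure. The sketch below is an assumption rather than the measure used in the study: it defines co-occurrence as the fraction of packets with sign `a` that also have sign `b`, and it represents a sign as a Python predicate over a hypothetical packet dictionary.

```python
def cooccurrence_ratio(packets, sign_a, sign_b):
    """Fraction of the packets having sign_a that also have sign_b.

    sign_a, sign_b: predicates packet -> bool (hypothetical representation).
    Returns 0.0 when no packet has sign_a.
    """
    with_a = [p for p in packets if sign_a(p)]
    if not with_a:
        return 0.0
    return sum(1 for p in with_a if sign_b(p)) / len(with_a)


# Toy traffic: three ZMap-like packets and two unrelated ones.
packets = [
    {"ip.id": 54321, "tcp.window": 65535},
    {"ip.id": 54321, "tcp.window": 65535},
    {"ip.id": 54321, "tcp.window": 65535},
    {"ip.id": 7, "tcp.window": 65535},
    {"ip.id": 1, "tcp.window": 1024},
]
zmap_id = lambda p: p["ip.id"] == 54321
full_window = lambda p: p["tcp.window"] == 65535
```

In this toy data, every packet with the ZMap identifier also has the full window, whereas only three of the four full-window packets carry the identifier. An asymmetry of this kind is one reason to evaluate the ratio in both directions before unifying two signs.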
Fig. 1: Header fields of the IPv4 and TCP protocols. Modifiable fields are in yellow, and unchangeable fields are in white.
Algorithm 1: Scanner’s Fingerprint Identifier
Input : $n$: number of generated TCP functions
        $F_1 \subseteq \mathcal{F}$: initial TCP functions
Output: $S$: set of fingerprints
 1
 2  Function fgpt_identifier(n, F_1):
        // (1) Generate TCP functions
 3      F ← F_1
 4      for i ← 1 to n do
            // see Algorithm 2
 5          f ← generate_TCPfunction(F)
 6          F.add(f)
        // (2) Find effective signs
 7      E ← ∅
 8      for f ∈ F do
            // see Algorithm 3
 9          E' ← find_effective_sign(f)
10          E ← E ∪ E'
        // (3) Consolidate effective signs and identify fingerprints
11      S ← consolidate and arrange the effective signs E, then identify fingerprints
12      return S
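The control flow of Algorithm 1 can be written as a short Python driver. This is a sketch under stated assumptions: the three steps are passed in as callables (`generate`, `find_effective_signs`, and `consolidate` are hypothetical parameter names), since their concrete implementations are described in the following sections.

```python
def fgpt_identifier(n, initial_functions, generate, find_effective_signs, consolidate):
    """Driver mirroring Algorithm 1; the three callables are supplied by the caller."""
    # (1) Generate n new TCP functions, each derived from the current pool.
    functions = list(initial_functions)
    for _ in range(n):
        functions.append(generate(functions))

    # (2) Collect the effective signs of every TCP function.
    effective = set()
    for f in functions:
        effective |= set(find_effective_signs(f))

    # (3) Consolidate the effective signs into fingerprints.
    return consolidate(effective)
```

Because each generated function is appended to the pool before the next one is drawn, later functions can be built on earlier ones, which is how nested expressions arise.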
B. Initialization of TCP functions
We define the initial TCP functions $F_1 \subseteq \mathcal{F}$ as
the set of TCP functions that return modifiable header fields
(colored yellow in Fig. 1). However, some header fields have the
same binary value for almost all packets. We excluded these fields
from $F_1$ because a field that is nearly constant across packets
is not appropriate as a component of the initial TCP functions.
C. Generate TCP function
Given a subset of TCP functions $F \subseteq \mathcal{F}$, our
algorithm generates a new TCP function $f$ by applying either (1)
feature extraction or (2) a binary operation, with a probability
of $r$ or $1-r$, respectively. $r \in [0,1]$ is a hyperparameter
that determines the priority of the two operations. In feature
extraction, we first select a TCP function $f$ from $F$ and
generate a new TCP function $k \circ f$, where
$k: \mathcal{B} \to \mathcal{B}$ is selected from the predefined
binary-valued functions $K$ with a discrete uniform distribution.
For example, $k \circ f = f_{L2B} \circ \text{ip.dst}$ transforms
a packet into the last two bytes of its destination address.
Conversely, the binary operation chooses two TCP functions $f, g$
and produces a new TCP function $\psi(f, g)$, where $\psi$ is
chosen from the predefined binary operations $\Psi$ with a discrete
uniform distribution. If $f = \text{tcp.seq}$, $g = \text{ip.dst}$,
and $\psi$ is the bitwise XOR, the TCP function $\psi(f, g)$
transforms a packet into the XOR of its sequence number and
destination address, denoted by $\text{tcp.seq} \oplus \text{ip.dst}$.
A TCP function $f$ is built through a number of function
compositions; the minimum number of compositions plus one is
denoted by $\tau_{count}(f)$:

$$\tau_{count}(f) := \min\{\, i \mid f \in F_i \,\}, \quad \text{where} \tag{3}$$

$$F_i := \{\, k \circ f \mid k \in K,\ f \in F_{i'}\ (i' < i) \,\} \cup \{\, \psi(f, g) \mid \psi \in \Psi,\ f \in F_{i_1},\ g \in F_{i_2}\ (i_1, i_2 < i) \,\} \tag{4}$$

for $i = 2, 3, \cdots$, and $F_1$ is the set of initial TCP
functions. For instance,

$$\tau_{count}\big(f_{L2B}(\text{tcp.seq}) \oplus f_{L2B}(\text{ip.dst})\big) \tag{5}$$

$$= \tau_{count}\big(f_{L2B}(\psi(\text{tcp.seq}, \text{ip.dst}))\big) \tag{6}$$

$$= 3 \tag{7}$$

where $\psi$ denotes the bitwise XOR.
The pseudocode for generating a new TCP function is
written as generate_TCPfunction in Algorithm 2.
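The worked example can be checked mechanically. The sketch below computes the construction depth of one given expression, which is an upper bound on $\tau_{count}$ because $\tau_{count}$ minimizes over all equivalent constructions. The tuple encoding of expressions is a hypothetical choice made for this illustration.

```python
# Hypothetical encoding of a TCP function as an expression tree:
#   "tcp.seq"                 -> an initial TCP function (a header field)
#   ("k", "fL2B", expr)       -> feature extraction k applied to expr
#   ("xor", expr_a, expr_b)   -> binary operation applied to two expressions

def construction_depth(expr):
    """Smallest i such that this particular expression lies in F_i."""
    if isinstance(expr, str):
        return 1  # initial TCP functions form F_1
    if expr[0] == "k":
        return 1 + construction_depth(expr[2])
    # Binary operation: both operands must come from an earlier level.
    return 1 + max(construction_depth(expr[1]), construction_depth(expr[2]))


# The two equivalent forms from the example above.
xor_of_extracts = ("xor", ("k", "fL2B", "tcp.seq"), ("k", "fL2B", "ip.dst"))
extract_of_xor = ("k", "fL2B", ("xor", "tcp.seq", "ip.dst"))
```

Both forms evaluate to depth 3, which agrees with the value derived for the example.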
D. Analogy between Genetic Algorithm and Generating TCP
functions
The GA pursues high-quality solutions for optimization
problems in broad research fields [13]. In GA, a population
of candidate solutions (individuals) iteratively evolves toward
better solutions via biologically inspired operators, such as
mutation, crossover, and selection. In each iteration, the fitness
of every individual is evaluated, and the fit individuals are
stochastically selected from the current population. These
individuals are then modified via recombination
or random mutation.
Although we cannot directly apply the GA in identifying
fingerprints, we consider this idea and build a new algorithm
to make TCP functions. In Algorithm 2, a TCP function
represents an individual, and the fitness of each TCP function
is assessed via $\tau_{count}$. Specifically, a TCP function $f$
with a smaller $\tau_{count}(f)$ is more likely to be chosen, which
implies that we favor simpler TCP functions. (1) Feature extraction
and (2) the binary operation correspond to mutation and
crossover, respectively. The outputs of these operations inherit
features of their inputs, as in mutation and
crossover.
Algorithm 2: Generate TCP function
Input : $F \subseteq \mathcal{F}$: TCP functions
        $K$: binary-valued functions ($k \in K$ implies $k: \mathcal{B} \to \mathcal{B}$)
        $\Psi$: binary operations on $\mathcal{F}$ ($\psi \in \Psi$ implies $\psi: \mathcal{F} \times \mathcal{F} \to \mathcal{F}$)
        $r \in [0,1]$: probability of feature extraction
Output: $f$: a new TCP function

    Function select_TCP_function(F):
        f ← select f ∈ F with the probability
            $\dfrac{(1/\tau_{count}(f))^2}{\sum_{f' \in F} (1/\tau_{count}(f'))^2}$
        return f

    Function feature_extraction(F, K):
        f ← select_TCP_function(F)
        k ← select k ∈ K with a discrete uniform distribution
        return k ∘ f

    Function binary_operation(F, Ψ):
        f ← select_TCP_function(F)
        g ← select_TCP_function(F)
        ψ ← select ψ ∈ Ψ with a discrete uniform distribution
        return ψ(f, g)

    Function generate_TCPfunction(F):
        x ∼ Uniform(0, 1)    // random number from [0, 1]
        if x ≤ r then
            f ← feature_extraction(F, K)
        else
            f ← binary_operation(F, Ψ)
        return f
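Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the `TcpFunction` record is hypothetical, its `tau` field stores the construction depth of the generated expression (an upper bound on $\tau_{count}$), the only binary operation is the bitwise XOR, and packets are dictionaries of integer header fields.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TcpFunction:
    name: str                     # human-readable expression
    apply: Callable[[dict], int]  # maps a packet to an integer
    tau: int                      # construction depth (stands in for tau_count)

def select_tcp_function(functions, rng):
    # Weight (1/tau)^2 favours simple functions; choices() normalises the weights.
    weights = [(1 / f.tau) ** 2 for f in functions]
    return rng.choices(functions, weights=weights, k=1)[0]

def feature_extraction(functions, extractors, rng):
    f = select_tcp_function(functions, rng)
    k_name, k = rng.choice(extractors)  # uniform choice over K
    return TcpFunction(f"{k_name}({f.name})",
                       lambda p, f=f, k=k: k(f.apply(p)),
                       f.tau + 1)

def binary_operation(functions, rng):
    f = select_tcp_function(functions, rng)
    g = select_tcp_function(functions, rng)
    # Psi is assumed to contain only the bitwise XOR.
    return TcpFunction(f"({f.name} ^ {g.name})",
                       lambda p, f=f, g=g: f.apply(p) ^ g.apply(p),
                       max(f.tau, g.tau) + 1)

def generate_tcp_function(functions, extractors, r, rng):
    if rng.random() <= r:
        return feature_extraction(functions, extractors, rng)
    return binary_operation(functions, rng)


# Example pool: two initial TCP functions and one extractor (last two bytes).
base = [TcpFunction("tcp.seq", lambda p: p["tcp.seq"], 1),
        TcpFunction("ip.dst", lambda p: p["ip.dst"], 1)]
extractors = [("fL2B", lambda v: v & 0xFFFF)]
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing the fingerprints found under different values of `r`.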
E. Find Effective Signs
We describe a method for identifying the effective signs
$(f, b)$, given a TCP function $f$. We examine the appearance
ratio of each binary number in the image of $f$.
Let $f$ be a TCP function, and let $P = \{p\}_p$ be the packets
collected from network traffic. For each binary number
$b \in \mathcal{B}$, we define the appearance ratio of $b$ as

$$r_a(b) := \frac{\#\{\, p \in P \mid f(p) = b \,\}}{\#P} \tag{8}$$

where $\#A$ denotes the number of elements in set $A$. The
image of the packets $P = \{p\}_p$ under $f$ is defined by

$$B := f(P) = \{\, f(p) \mid p \in P \,\} \subseteq \mathcal{B}. \tag{9}$$

$R := (r_a(f(p)))_{p \in P}$ denotes a multiset (not a set) whose
underlying set is $\{\, r_a(b) \mid b \in B \,\}$. For every real
number $\alpha$, we define $R_{<\alpha} \subseteq R$
($R_{\le\alpha} \subseteq R$) as the multiset composed of every
element $r \in R$ such that $r < \alpha$ ($r \le \alpha$). The
population variance of a multiset $A$ is denoted by $\sigma^2_A$.
We define the effective indicator of $b \in B$ for
$f \in \mathcal{F}$ as

$$e_f(b) := \begin{cases} \sigma^2_{R_{\le r_a(b)}} \,/\, \sigma^2_{R_{< r_a(b)}} & (\text{if } \sigma^2_{R_{< r_a(b)}} > 0) \\ \text{NULL} & (\text{otherwise}) \end{cases} \tag{10}$$

where NULL implies that $e_f(b)$ cannot be defined. The effective
indicator evaluates the extent to which the binary number $b \in B$
influences the variance of the appearance ratio, and a larger value
of $e_f(b)$ implies that $(f, b)$ is an effective sign. Algorithm 3
describes the pseudocode of finding effective signs.

Algorithm 3: Find Effective Sign
Input : $f$: a TCP function
        $P = \{p\}_p$: packets
        max_sign: max number of effective signs per TCP function
        sign_thres: threshold of effective signs
Output: $E = \{(f, b)\}$: effective signs

    Function find_effective_sign(f):
        B ← {f(p) | p ∈ P}
        sorted_B ← arrange B in descending order of the appearance ratio r_a(b)
        max_idx ← NULL
        for i ← 0 to min(max_sign, #B) − 1 do
            b ← sorted_B[i]
            if e_f(b) ≠ NULL and e_f(b) > sign_thres then
                max_idx ← i
        E ← ∅    // initializes effective signs
        if max_idx ≠ NULL then
            for i ← 0 to max_idx do
                E.add((f, sorted_B[i]))
        return E
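The effective indicator and Algorithm 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the input is the list of values $f(p)$ for all packets, exact rational arithmetic is used so that the zero-variance test in Eq. (10) is not disturbed by rounding, and the function names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction

def population_variance(weighted):
    """Population variance of a multiset given as (value, multiplicity) pairs."""
    total = sum(m for _, m in weighted)
    if total == 0:
        return None  # undefined for an empty multiset
    mean = sum(v * m for v, m in weighted) / total
    return sum(m * (v - mean) ** 2 for v, m in weighted) / total

def effective_indicator(counts, b):
    """e_f(b) from Eq. (10); returns None where the paper writes NULL."""
    n = sum(counts.values())
    alpha = Fraction(counts[b], n)  # appearance ratio r_a(b)
    # A value seen c times contributes c copies of c/n to the multiset R.
    multiset = [(Fraction(c, n), c) for c in counts.values()]
    var_below = population_variance([(v, m) for v, m in multiset if v < alpha])
    if not var_below:  # empty multiset or zero variance
        return None
    var_at_most = population_variance([(v, m) for v, m in multiset if v <= alpha])
    return float(var_at_most / var_below)

def find_effective_signs(values, max_sign, sign_thres):
    """Algorithm 3 for one TCP function; values holds f(p) for every packet p."""
    counts = Counter(values)
    sorted_b = [b for b, _ in counts.most_common()]  # descending appearance ratio
    max_idx = None
    for i, b in enumerate(sorted_b[:max_sign]):
        e = effective_indicator(counts, b)
        if e is not None and e > sign_thres:
            max_idx = i
    return [] if max_idx is None else sorted_b[: max_idx + 1]
```

As a usage example, if half of the observed `ip.id` values equal 54321 and the rest are spread thinly over many values, only 54321 exceeds a moderate threshold and is returned as an effective sign.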
IV. EXPERIMENTS
We applied our model to darknet traffic and demonstrated
its feasibility. Section IV-A explains the dataset used in our
experiments, and the parameters of our model are summarized
in Section IV-B. We analyze the packets that have a fingerprint
in Section IV-C.
A. Dataset
Our dataset was collected from a darknet operated by
NICTER³. The darknet, also known as a network telescope,
passively monitors network traffic with an unreachable dark IP
address block. The darknet is an effective system for observing
indiscriminate Internet-wide scans because it does not receive
benign and regular network traffic. Because TCP SYN packets
are used to survey active hosts and open ports [6], our
experiment used only TCP SYN packets. Table II summarizes
the dataset and the computation time of our algorithm.
B. Parameter Setting
The following parameters were selected manually based on
empirical investigation. In Algorithm 1, we set $n = 2{,}000$ and

$$F_1 = \{\text{ip.id}, \text{ip.checksum}, \text{ip.src}, \text{ip.dst}\} \cup \{\text{tcp.sport}, \text{tcp.dport}, \text{tcp.seq}, \text{tcp.window}\}. \tag{11}$$

Algorithm 2 adopts $r = 0.1$, $K = \{f_{F2B}, f_{L2B}\}$, and
$\Psi = \{\psi\}$, where
$\psi: \mathcal{F} \times \mathcal{F} \to \mathcal{F}$ satisfies
$\psi(f, g)(p) = f(p) \oplus g(p)$ for every packet $p$.
max_sign = 10 is used in Algorithm 3, and sign_thres is determined
such that the number of effective signs ranges from 20 to 50. The
computation time of Algorithm 1 is proportional to both $n$ and
the number of packets $\#P$. $F_1$, $K$, and $\Psi$ specify the
search space of the TCP functions. max_sign and sign_thres
determine the threshold for effective signs.
³ Network Incident Analysis Center for Tactical Emergency Response: https://www.nicter.jp/en

TABLE II: Summary of our dataset and the computation time
of Algorithm 1

Situation | #IPs§ | Period* | #Packets | Time
Implement Algorithm 1 | 4,096 | 10/22 – 10/24 | 572,289† | 6.5 h‡
Analyze packets that have a fingerprint | 4,096 | 10/22 – 10/28 | 117.5×10^6 | 18 h

* The year was 2018, and the period was selected owing to active
malicious activities [7], [14].
§ The number of IP addresses of our darknet.
‡ We implemented Algorithm 1 using Python and one core on a server
with an AMD EPYC 7H12 (64 CPUs and 56 GB RAM). Elasticsearch was
utilized for the database of packets. The computation time only
includes (1) generating TCP functions and (2) finding effective
signs (lines 2–10).
† We used 1 h worth of packets because of the computation cost (the
total number of packets is 41.2×10^6).
C. Analyzing Packets that have a Fingerprint
For the TCP SYN packets, we applied Algorithm 1 five
times, each time excluding the packets that matched a previously
identified fingerprint. Repeated application makes it possible to
identify a fingerprint carried by a small number of packets. We
identified three known and six unknown
fingerprints, as summarized in Table III. Some IP addresses
sent numerous packets that had a fingerprint but differed in other
characteristics from the remaining packets with the same
fingerprint. Therefore, we eliminated the packets sent by these
IP addresses. Although each identified unknown fingerprint
accounted for less than 0.5% of the total packets, our model could
detect them. We also identified the well-known fingerprints
summarized in Table I, except for Hajime, which we regard as
a trivial attack because of its small number of packets (0.02%
of the total packets). We categorized the identified fingerprints
into attack or investigation purposes, and the major features
are summarized below.
•Attack purpose (Mirai botnet, Botnet1, and Attack2–6)
–Destination ports are related to specific vulnerabilities.
–All source IP addresses of Mirai botnet, Botnet1,
and Attack2–5 are not included in the Investigation
or large-scale scanner lists, which are defined in
Section II-C.
•Investigation purpose (ZMap and Masscan)
–Destination ports cover a wide range of ports.
–Source IP addresses overlap investigation and large-
scale scanner lists.
1) Attack Purpose Scan: As summarized in Table III, the
incessant packets with the Mirai fingerprint occupy 14.31% of
the total TCP SYN packets. Mirai targets 23/TCP (53.8%) and
2323/TCP (6.0%) as destination ports for the brute-force login
phase, where each percentage indicates the ratio of the packets
with the Mirai fingerprint. Destination port 4444/TCP (6.0%)
was used for an infection campaign for Android devices.
Botnet1 has three remarkable characteristics: (1) the
destination port is 5431, (2) the source port is 6, and (3)
the window size is 65535. Moreover, Botnet1 excludes
packets with ip.id = 54321, which is the ZMap fingerprint.
The destination port 5431/TCP and the 154 K unique source
IP addresses indicate that Botnet1 aims to infect router
equipment with the Broadcom UPnP feature enabled⁴.
Both Attack2 and Attack3 satisfy ip.id = 256, tcp.window = 16384,
and $f_{L2B}(\text{tcp.seq}) = 0$. Attack2 uses the fixed source
port 6000/TCP, whereas Attack3 uses any port except 6000/TCP. Among
the destination ports of Attack2 and Attack3, 3306/TCP, 1433/TCP,
and 60001/TCP account for more than 14%, 14%, and 15% of the
packets, respectively. The adversary probes SQL vulnerabilities
through 3306/TCP and 1433/TCP [15], [16] and the vulnerability of
the Jaws Web Server (EDB-ID:41471⁵) on 60001/TCP.
All the destination ports of Attack4 and 5 are dynamic ports
(in the range of 49152 to 65535).
Attack6 targets 80/TCP (31.0%), 8080/TCP (26.9%),
85/TCP (26.8%), and 443/TCP (14.9%). According to the NICTER
observation report 2018⁶, scans of ports 80, 443, and 8080
originated from attacks on GPON home routers (CVE-2018-10561 and
CVE-2018-10562⁷).
2) Security Scan: Because almost all the packets with the
original ZMap fingerprint (ip.id = 54321) satisfy tcp.ack = 0 and
tcp.window = 65535, the ZMap fingerprint is updated, as
summarized in Table III. Collating the source IP addresses with the
investigation scanner list shows that 64 of them belong to Censys,
denoted by Censys(64); the others are Binaryedge(12),
Security.ipip.net(21), and Shadowserver(166). Overall, the
investigation and large-scale scanner lists account for 9.42% and
22.16%, respectively. The packets arrived at a roughly constant
rate, apart from three spikes caused by a few IP addresses.
Further, ZMap uses a wide variety of source ports.
Masscan accounts for 67.14% of the total TCP SYN
packets, and its source IP addresses include Binaryedge(1),
Censys(2), Onyphe(1), Security.ipip.net(3), Shadowserver(11),
and Shodan(6). Moreover, 77.38% of Masscan’s packets are from
the IP addresses in the large-scale scanner list.
⁴ https://blog.netlab.360.com/bcmpupnp_hunter-a-100k-botnet-turns-home-routers-to-email-spammers-en/
⁵ EDB-ID represents the identification number of the exploit database.
⁶ https://www.nicter.jp/en/report
⁷ https://nvd.nist.gov/

TABLE III: Fingerprints identified by Algorithm 1. $\lnot x$ denotes the negation of $x$, and K denotes $10^3$.

Name | Packets (%) | #source IP addresses (%) | Fingerprint
Mirai Botnet | 16,812 K (14.31%) | 366,358 (25.851%) | $x_{p,(\text{tcp.seq} \oplus \text{ip.dst},\,0)} \land x_{p,(\text{tcp.ack},\,0)}$
Botnet1 | 439 K (0.37%) | 154,136 (10.876%) | $x_{p,(\text{tcp.dport},\,5431)} \land x_{p,(\text{tcp.sport},\,6)} \land x_{p,(\text{tcp.window},\,65535)} \land x_{p,(\text{tcp.seq},\,0)} \land x_{p,(\text{tcp.ack},\,0)} \land \lnot x_{p,(\text{ip.id},\,54321)}$
Attack2 | 302 K (0.26%) | 61 (0.004%) | $x_{p,(\text{ip.id},\,256)} \land x_{p,(\text{tcp.window},\,16384)} \land x_{p,(f_{L2B}(\text{tcp.seq}),\,0)} \land x_{p,(\text{tcp.ack},\,0)} \land x_{p,(\text{tcp.sport},\,6000)}$
Attack3 | 477 K (0.41%) | 71 (0.001%) | $x_{p,(\text{ip.id},\,256)} \land x_{p,(\text{tcp.window},\,16384)} \land x_{p,(f_{L2B}(\text{tcp.seq}),\,0)} \land x_{p,(\text{tcp.ack},\,0)} \land \lnot x_{p,(\text{tcp.sport},\,6000)}$
Attack4 | 423 K (0.36%) | 18 (0.001%) | $x_{p,(f_{L2B}(\text{ip.dst}) \oplus \text{tcp.dport},\,0)} \land x_{p,(\text{tcp.ack},\,1)} \land x_{p,(\text{ip.id},\,0)} \land x_{p,(\text{tcp.window},\,17520)} \land x_{p,(\text{tcp.sport},\,80)}$
Attack5 | 268 K (0.23%) | 98 (0.007%) | $x_{p,(f_{L2B}(\text{ip.dst}) \oplus \text{tcp.dport},\,0)} \land x_{p,(\text{tcp.ack},\,1)} \land x_{p,(\text{ip.id},\,38993)}$
Attack6 | 88 K (0.07%) | 56 (0.004%) | $x_{p,(\text{tcp.window},\,1300)} \land x_{p,(\text{tcp.ack},\,0)}$
ZMap | 9,441 K (8.04%) | 4,117 (0.291%) | $x_{p,(\text{ip.id},\,54321)} \land x_{p,(\text{tcp.ack},\,0)} \land x_{p,(\text{tcp.window},\,65535)}$
Masscan | 78,915 K (67.17%) | 1,650 (0.116%) | $x_{p,(\text{ip.id} \oplus f_{L2B}(\text{ip.dst}) \oplus \text{tcp.dport} \oplus f_{L2B}(\text{tcp.seq}),\,0)} \land x_{p,(\text{tcp.ack},\,0)}$

V. CONCLUSION
This study considers a genetic algorithm and proposes a
new algorithm that automatically identifies the fingerprints
embedded in packet header fields. Numerical experiments
using darknet traffic demonstrated the feasibility of our model
by identifying previously unknown fingerprints in addition to
known ones. We analyzed these packets and revealed
characteristic scan activities that each accounted for less than
0.5% of the total packets. These results were collated with both
cyber threat intelligence and investigation/large-scale scanner
lists to ascertain the reliability of the fingerprints. In the next
step, we will parallelize the computation for speedup to
integrate our model into a real-time system. Furthermore, we
aim to identify more reliable fingerprints by applying our
method to packets from dynamic malware analysis. Finally,
we will build a system that automatically associates identified
fingerprints with open-source threat intelligence [17].
ACKNOWLEDGMENT
The study was partly conducted under a contract of “MIT-
IGATE” of the Research and Development for Expansion of
Radio Wave Resources (JPJ000254), which was supported by
the Ministry of Internal Affairs and Communications, Japan.
REFERENCES
[1] Z. Durumeric, M. Bailey, and J. A. Halderman, “An internet-wide view
of internet-wide scanning,” Proc. 23rd USENIX Secur. Symp., pp. 65–78,
2014.
[2] N. Blenn, V. Ghiëtte, and C. Doerr, “Quantifying the Spectrum of
Denial-of-Service Attacks through Internet Backscatter,” in Proc. 12th
Int. Conf. Availability, Reliab. Secur., ser. ARES ’17, 2017.
[3] S. Robertson, E. V. Siegel, M. Miller, and S. J. Stolfo, “Surveillance
detection in high bandwidth environments,” Proc. DARPA Inf. Surviv.
Conf. Expo. DISCEX, vol. 1, pp. 130–138, 2003.
[4] V. Yegneswaran, P. Barford, and J. Ullrich, “Internet intrusions: Global
characteristics and prevalence,” Perform. Eval. Rev., vol. 31, no. 1, pp.
138–147, 2003.
[5] A. Blaise, M. Bouet, V. Conan, and S. Secci, “Detection of zero-day
attacks: An unsupervised port-based approach,” Comput. Networks, vol.
180, no. January, 2020.
[6] H. Griffioen and C. Doerr, “Discovering Collaboration: Unveiling Slow,
Distributed Scanners based on Common Header Field Patterns,” in
IEEE/IFIP Netw. Oper. Manag. Symp., 2020, pp. 1–9.
[7] C. Han, J. Shimamura, T. Takahashi, et al., “Real-Time Detection
of Global Cyberthreat Based on Darknet by Estimating Anomalous
Synchronization Using Graphical Lasso,” IEICE Trans. Inf. Syst., vol.
103, no. 10, pp. 2113–2124, 2020.
[8] S. Herwig, K. Harvey, G. Hughey, et al., “Measurement and analysis
of Hajime, a peer-to-peer IoT botnet,” in Netw. Distrib. Syst. Secur.
Symp., 2019.
[9] R. D. Graham, “MASSCAN: Mass IP port scanner.” [Online].
Available: https://github.com/robertdavidgraham/masscan
[10] Z. Durumeric, E. Wustrow, and J. A. Halderman, “ZMap: Fast Internet-
wide Scanning and Its Security Applications,” in Proc. USENIX Secur.
Symp., 2013, pp. 605–620.
[11] M. Antonakakis, T. April, M. Bailey, et al., “Understanding the
Mirai Botnet,” in 26th USENIX Secur. Symp., 2017, pp. 1093–1110.
[12] Y. Endo, Y. Mori, J. Shimamura, and M. Kubo, “Proposing Criteria for
Detecting Internet-Wide Scanners for Darknet Monitoring,” in IEICE
Tech. Rep., ser. ICSS2019-80, vol. 119, no. 437, Okinawa, mar 2020,
pp. 73–78, (in Japanese).
[13] Z. Z. Wang and A. Sobey, “A comparative review between Genetic
Algorithm use in composite optimisation and the state-of-the-art in
evolutionary computation,” Compos. Struct., vol. 233, 2020.
[14] C. Han, J. Takeuchi, T. Takahashi, and D. Inoue, “Automated detection of
malware activities using nonnegative matrix factorization,” in 20th IEEE
International Conference On Trust, Security And Privacy In Computing
And Communications (TrustCom), 2021.
[15] K. Goseva-Popstojanova, B. Miller, R. Pantev, and A. Dimitrijevikj,
“Empirical analysis of attackers activity on multi-tier web systems,”
Proc. Int. Conf. Adv. Inf. Netw. Appl. AINA, pp. 781–788, 2010.
[16] T. Battle, “GIAC Certified Incident Handler Practical,” Methods, 2003.
[17] T. Takahashi, Y. Umemura, C. Han, T. Ban, K. Furumoto, O. Nakamura,
K. Yoshioka, J. Takeuchi, N. Murata, and Y. Shiraishi, “Designing
comprehensive cyber threat analysis platform: Can we orchestrate anal-
ysis engines?” in 2021 IEEE International Conference on Pervasive
Computing and Communications Workshops and other Affiliated Events
(PerCom Workshops). IEEE, Mar. 2021.