
Spying in the Dark: TCP and Tor Traffic Analysis


We show how to exploit side-channels to identify clients without eavesdropping on the communication to the server, and without relying on known, distinguishable traffic patterns. We present different attacks, utilizing different side-channels, for two scenarios: a fully off-path attack detecting TCP connections, and an attack detecting Tor connections by eavesdropping only on the clients. Our attacks exploit three types of side channels: globally-incrementing IP identifiers, used by some operating systems, e.g., in Windows; packet processing delays, which depend on TCP state; and bogus-congestion events, causing impact on TCP’s throughput (via TCP’s congestion control mechanism). Our attacks can (optionally) also benefit from sequential port allocation, e.g., deployed in Windows and Linux. The attacks are practical - we present results of experiments for all attacks in different network environments and scenarios. We also present countermeasures for these attacks.
Yossi Gilad and Amir Herzberg
Abstract—We show how to exploit side-channels to identify
clients without eavesdropping on their communication with the
server, and without relying on known, distinguishable traffic
patterns. We present different attacks, utilizing different side-
channels, for two scenarios: an off-path attacker that detects
TCP connections, and an eavesdropper attacker that detects
connections relayed via the Tor anonymity network.
Our attacks exploit three types of side channels: globally-
incrementing IP identifiers, used by some operating systems,
e.g., in Windows; the port selection algorithm used by Linux
and Android, which we show to leak when a user connects
to a website (this algorithm is standardized, and recommended
in RFC 6056); and bogus-congestion events, causing impact on
TCP’s throughput (via TCP’s congestion control mechanism).
The attacks are practical - we present results of experiments for
all attacks in different network environments and scenarios. We
conclude this work with practical client and server end defenses
against our attacks.
I. INTRODUCTION
Internet communication is often sensitive, and security
measures must be taken to protect privacy against attackers.
The exact measures depend on the exact threat; in particular,
encryption protocols such as TLS [1] are necessary to protect
content from an eavesdropping attacker.
However, encryption is insufficient to prevent traffic analysis,
and in particular, to prevent exposure of the identities
of the communicating peers. Users concerned about traffic
analysis by eavesdropping attackers use anonymizing services
such as Tor. Other users may simply assume that the adversary
is off-path (non-eavesdropping), and expect privacy against
such (weaker) attackers.
We present three traffic-analysis attacks against these scenarios.
Two attacks identify clients that communicate with a specific
server directly over TCP (without anonymizing intermediaries
such as Tor). Such attacks do not require eavesdropping
at all, and may be launched by weak, off-path attackers, even
for commercial motivations. In fact, since the attacks do not
involve eavesdropping, they may even be deemed to be legal
(not wiretapping). We believe technical measures should (and
can) prevent such off-path traffic analysis.
Our third traffic analysis attack is against Tor users. It requires
eavesdropping abilities only on the client side, and only
spoofing abilities on the server side. We believe that this is an
important scenario, since Tor clients are often concerned about
attackers that can eavesdrop on their connection to the Tor
relay: the client-relay connection may be insecure (e.g., an
Internet cafe) or controlled by a potentially snooping organization
(employer, government, etc.). Our evaluation of this attack is
still incomplete; however, the preliminary results that we
present in this paper provide a warning about a new attack
vector on the Tor anonymity network.
Department of Computer Science, Bar Ilan University, Israel.
[Figure 1: message sequence between the Attacker, the client C (behind a FW/NAT) and the server S. Pre-probe phase: 1a. pre-probe query, 1b. pre-probe response. Probe phase: 2a-2c. probe. Post-probe phase: 3a. post-probe query, 3b. post-probe response.]
Fig. 1. Query-Probe-Query off-path attack pattern. Attacker uses spoofed
source address for probe.
Our attacks exploit different side-channels, providing useful
information on a TCP/Tor connection to an off-path attacker
(for Tor, attacker can eavesdrop, but only on the client). Side
channels have been extensively used in attacks on crypto-
graphic systems, e.g., [2], but also in attacks on privacy, e.g.,
[3], and more recently also applied to traffic analysis [4], [5],
[6].
Our attacks on direct (i.e., non-anonymous) TCP connec-
tions can be viewed as instances of an attack pattern which
we call Query-Probe-Query, illustrated in Figure 1; this is
a generalization of the well-known idle (stealth) scan attack
[7], [8], and of the measurement method used in [9]. In the
Query-Probe-Query attack pattern illustrated in Figure 1, the
first query measures some ‘pre-probe state’; the probe may
cause some change on the state, where the change depends
on the property measured, e.g., whether a specific client port
is open; and the final query measures the ‘post-probe state’.
This attack pattern can be applied with different queries and
probes, to measure different values.
In our implementations, each probe is a packet, or a few
packets, sent to the client C. The probes test for some event
e, such that if e holds then C will send some packet(s), and
otherwise C will not send a packet (or will send fewer packets).
We use the pre/post probe responses to infer e: we measure
the increase in the IP-ID counter, or measure the time between
receipt of the two response packets (1b and 3b in Figure 1).
Table I summarizes our results for traffic analysis on direct
TCP connections.
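The Query-Probe-Query pattern described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: no packets are sent, and the class `SimulatedClient`, its methods and the function `query_probe_query` are hypothetical names introduced here, not part of the authors' implementation. The client is modeled as a host with one globally incrementing 16-bit IP-ID counter that replies to a probe only if the event e holds (here: the probed port has a connection).

```python
class SimulatedClient:
    """Toy model of a host with one globally incrementing IP-ID counter."""

    def __init__(self, connection_ports, start_id=1000):
        self.connection_ports = set(connection_ports)  # ports for which event e holds
        self.ip_id = start_id

    def _send_packet(self):
        # Every outgoing packet consumes one identifier (16-bit wrap-around).
        self.ip_id = (self.ip_id + 1) % 65536
        return self.ip_id

    def answer_query(self):
        """Response to a query; the observer sees the IP-ID of the response."""
        return self._send_packet()

    def receive_probe(self, port):
        """A probe triggers an extra packet (not seen by the observer) only if e holds."""
        if port in self.connection_ports:
            self._send_packet()


def query_probe_query(client, port):
    """Return True if the IP-ID advanced by more than the one expected response."""
    pre = client.answer_query()    # steps 1a/1b: pre-probe state
    client.receive_probe(port)     # step 2: probe
    post = client.answer_query()   # steps 3a/3b: post-probe state
    delta = (post - pre) % 65536   # modular difference handles counter wrap-around
    return delta > 1


client = SimulatedClient(connection_ports={50000})
print(query_probe_query(client, 50000))
print(query_probe_query(client, 50001))
```

In this idealized model the client sends no unrelated traffic, so a single test suffices; the paper's repeated rounds and scoring exist precisely because real clients also send independent packets.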
In recent years, the use of anonymity mechanisms such as Tor [10]
has grown. Tor is a low-latency, circuit-based anonymity network
that is widely used and increasing in popularity (according to
[11], an increase of about 70% in the recent year). Tor is designed
to ensure traffic privacy, even when the adversary is able to
eavesdrop on either C or S. However, due to its low latency, Tor
cannot ensure traffic privacy against attackers eavesdropping at
both ends (C and S); see the discussion of related work below.

TABLE I
RESULTS FOR PROBING DIRECT (TCP) CONNECTIONS. ATTACKER LOCATION IS RELATIVE TO THE CLIENT (VICTIM). SUCCESS RATES CAN BE IMPROVED
BY REPEATING THE ATTACK.

Side Channel                 Adversary Location
                             Local       Remote
Success Rate
  IP-ID (Section III)        1           0.92
  Timing (Section IV)        0.74        0.58
Attack Duration (seconds) / Data Sent (MB)
  IP-ID (Section III)        14/0.6      38/2
  Timing (Section IV)        70/5        50/4
We show how an adversary capable of eavesdropping on
the client, C, but not on the server, S, is able to detect that
C is communicating with S. This attack uses a side channel
as well, but does not follow the query-probe-query pattern;
here the attacker causes a reduction in the rate of packets that
would reach C if C is communicating with S, and
then tests whether the reduction occurred.
Our attacks on Tor are active, i.e., they involve sending packets
to the Tor exit relay. When there is a known, distinct traffic pattern
to the communication of a specific server (website fingerprint),
then alternative passive attacks may be applicable as well, e.g.,
[12], [13], [14]. It may be possible to extend our techniques
to also take advantage of such a site fingerprint, when available.
A. Related works
IP-ID side-channel and off-path traffic analysis: We show
how the use of a globally-incrementing IP-ID field in IP headers
provides a side-channel allowing effective off-path traffic
analysis. The use of a globally-incrementing IP-ID is recognized, in
[15], as a common practice with known security implications;
e.g., both globally-incrementing and per-destination incrementing
IP-IDs allow interception, injection and discarding of
fragmented traffic [16]. A globally-incrementing IP-ID can allow
estimation of the number of packets sent [17], stealth scans for
open ports (idle scan) [18] and counting hosts behind a NAT
[9].
The technique that we present for the case of a client that
uses a globally-incrementing IP ID and is not connected via a
firewall/NAT (see Figure 1) is a variant of idle-scan [7], [8].
The difference is that while idle-scan probes a server for an
open (i.e., ‘listening’) port, our attack probes a client for a
connection with a server.
The only other previous work we found that performs traffic
analysis by off-path attacker, using a side-channel, is [13].
Their attack is based on detecting changes in the round trip
delays from the attacker to the DSL router; this is a rather
crude side channel, much less efficient than both the IP-ID
and the timing side channels we use. Indeed, they only present
detection of whether a client is browsing or playing a video,
based on the significant difference in bandwidth, and assuming
no other traffic. Our results dramatically improve the impact
of detection compared to theirs, provided that the attacker can
communicate with the clients.
Tor traffic analysis: Low-latency anonymity networks are
known to be vulnerable to traffic correlation attacks by an
attacker that eavesdrops on both ends; this problem, and
possible countermeasures, was studied in several works, e.g.,
[19], [20], [21]; efficiency and accuracy can significantly
improve if attacker can also manipulate traffic at the exit relay,
see [22]. Indeed, Tor designers are well aware of its inability
to properly protect against an attacker (eavesdropping) at both
ends of the circuit.
Other attacks manipulate the traffic at the server or the last
(exit) relay in the circuit, and use different techniques to detect
the relay along the path based on delays [4], [23], [24]. These
works assume that the attacker controls the server or the exit
relay, but do not require client-side eavesdropping. In contrast,
our attack on Tor requires client-side eavesdropping, but does
not require control over the server or exit relay. An obvious
challenge is to combine the results, and identify clients without
controlling server or exit relay, and without eavesdropping at
all. Our traffic detection attacks on TCP may be applicable.
B. Our contributions
The main contribution of this paper is identification and
analysis of side channels in the TCP/IP suite and their practical
implications on privacy, as we verify in experiments. We
provide practical countermeasures to the problems that we
identify; these allow quick patching at the firewall level and
require no changes to hosts or core operating system services.
This work motivates use of cryptography in lower network
layers and in particular IPsec [25] as we show that higher
network layer solutions such as SSL/TLS do not prevent blind
traffic analysis.
C. Paper Organization
In Section II we present our attacker models and the
scenarios that we consider; we also present the criteria we use
to measure the effectiveness of the attacks. Sections III and IV
present the global-ID and timing side channels; both sections
provide results of empirical experiments. Section V presents
our attack on Tor and corresponding experiments. Section
VI presents practical defenses. Finally, Section VII presents
our conclusions from this work, as well as future research
directions.
II. MODEL
Let C and S be a communicating TCP client and server
(respectively). We consider two types of adversaries, depending
on how C and S are connected. In Sections III and IV, we
consider the case that C and S have a direct TCP connection.
In Section V, C connects to S through the onion routing
anonymity network, Tor [10]; i.e., C communicates with S
via a circuit of relays (proxies). The goal of the attacker is
to identify clients who connect to a server S. We identify S
using its IP address and port.
[Figure 2: the client C is connected through the network to www.mallory.com; whether C also has a connection to www.s.com is unknown ("connection?").]
Fig. 2. C is surfing both Mallory's and S's sites; Mallory tries to detect
whether there is a connection between C and S.
We consider two types of attackers: Mallory, an off-path
adversary, and Eve, an eavesdropping adversary. The at-
tackers can send spoofed packets, i.e., packets with fake
(spoofed) sender IP address. Due to ingress filtering [26],
[27], [28] and other anti-spoofing measures, IP spoofing is
less commonly available than before, but still feasible, see
[29], [30]. Apparently, there is still a significant number of
ISPs that do not perform ingress filtering for their clients
(especially to multihomed customers). Furthermore, with the
growing concern of cyberwarfare and cybercrime, some ISPs
may intentionally support spoofing. Hence, it is still reasonable
to assume spoofing ability.
We describe both adversary models in Sections II-A and
II-B below. Section II-C presents the criteria we use to evaluate
the attacks we present.
A. Mallory - Off-path Adversary
We assume that C visits a website that Mallory controls,
denoted www.mallory.com. Mallory uses this (legitimate)
connection to probe whether C has any connections to S; see
Figure 2.
We consider three variants of Mallory, as illustrated in
Figure 3: with-C, near-C and remote. These differ with respect
to Mallory's ability to communicate with C; the greater the
distance, the more likely it is that packet loss or reordering
occurs, decreasing the quality of the side channels.
The with-C and near-C attackers are located near the client
(C); the difference between them is that the with-C adversary
directly communicates with the client, allowing Mallory to take
advantage of Windows' globally incrementing port allocation
(if C runs Windows). When the adversary and C communicate
via a NAT (near-C or remote), we assume that the NAT
uses per-destination incremental assignment of external ports
(e.g., as in the widely-used IP-tables NAT/Firewall provided
in Linux). Section III shows how we exploit different client
port allocation techniques. Finally, the remote Mallory attacker
simulates an adversary that communicates with the clients
from a remote location, i.e., via a high latency, jittery and
lossy channel.
[Figure 3: clients C_1, ..., C_m and the server S attached to the network; the with-C attacker sits with the clients, the near-C attacker sits just beyond the clients' NAT, and the remote attacker reaches the clients across the network.]
Fig. 3. Three variants of the Mallory adversary.
[Figure 4: clients C_1, ..., C_m connect through a Tor exit relay to www.restricted.com and www.other.com; the question is which client accessed the restricted site.]
Fig. 4. Eve identifies that some of the clients she eavesdrops on are
using Tor and wants to detect which of them is communicating with
www.restricted.com. C (C_m) connects to www.restricted.com via a circuit
of 3 relays.
B. Eve - Adversary for Anonymized Connections
In the attacks on Tor, we consider the adversary Eve, who
is able to eavesdrop on many clients that use Tor; however,
Eve cannot eavesdrop on the servers (see Figure 4). Such an
adversary may be a government or an employer, spying
on citizens or employees. Eve's goal is to detect which of
the clients is communicating (using Tor) with a particular
watched/restricted site, S.
C. Attack Evaluation Criteria
In addition to measuring the success, false positive and false
negative rates, we consider two additional measures. The first
measure is the time that an adversary (with some reasonable
constant bandwidth) needs to run the attack in order to reach
a particular success probability for detecting a connection.
This value also provides the minimal detectable connection
time. The second measure is the average amount of data per
victim that the attacker is required to send to reach a particular
success rate.
III. GLOBALLY-INCREMENTING IDENTIFIER BASED
TRAFFIC ANALYSIS
This section presents a probing technique that allows an
off-path (blind) adversary, Mallory, to identify a connection
between a client C and a server S when C uses a globally
incrementing IP identifier (IP-ID; see footnote 1). This side
channel is only applicable when the TCP connection is over IPv4,
since in IPv6 [31] the IP-ID field is only specified in fragmented
traffic and TCP packets are rarely fragmented. In the following
section we introduce a general technique that does not rely on
IP-ID and also applies to IPv6.
A globally-incrementing identifier is not really hidden from
Mallory, who can usually learn its value simply by receiving
some packet from the victim. A globally incrementing IP
identifier is used in all Windows versions we tested (including
XP, Vista and 7) and is also the default configuration in
FreeBSD; clients running these systems are vulnerable to the
attack below. The vast deployment of Windows on client
machines (more than 70% according to browser user-agent
based surveys, see [32]) makes the IP-ID attack vector very
practical.
Section III-A defines a port test that uses the leakage in
the IP-ID field to detect whether C is communicating with
S through a tested port. The test depends on whether C is
connected to the network through a NAT or a stateful firewall
that keeps track of existing connections; when C
is connected through a NAT/firewall device, the test is a bit
simpler. We believe that this is the more common scenario,
since recent versions of Windows (XP SP2 and later) ship
with a built in (stateful) firewall that is enabled by default, and
furthermore, use of NAT devices in small local area networks
connecting clients to the Internet is common. Due to space
limitations we describe only this test and include the test for
the complementary scenario (no firewall/NAT) in an online
technical report [33].
In Section III-B we describe how Mallory can identify a
relatively small set of client ports to test for a connection
with S; Mallory performs the port-test for all of them. Section
III-C presents our experimental setup and empirical results.
A. Port-Test for a Client Behind a Firewall/NAT
According to the TCP specification [34] (Section 3.9, bottom
of page 69), the first check that a recipient conducts
on an incoming packet, in case it belongs to an established
connection, validates that the sequence number is within the
receive window. If this check finds the packet invalid,
then the recipient discards the packet and sends a duplicate
Ack as feedback. A stateful firewall or NAT device connecting
C to the network keeps track of existing connections and
processes all incoming packets before they reach C. We use the
following observation: incoming packets that do not belong to
an established connection will be discarded before reaching C
(by the firewall/NAT), whereas packets that belong to an existing
connection, but specify arbitrary (probably invalid) sequence
numbers, will reach C, who replies with a duplicate Ack.
The port test for a client behind a firewall/NAT follows
the general query-probe-query pattern. The probe
specifies S's address and port as source (i.e., the probe is spoofed)
and C's address as destination; Mallory specifies a different
destination port in each test. Figure 5 illustrates two iterations
of the port test: in the first iteration, the firewall/NAT blocks
the probe packet (i.e., there is no connection through the tested port).
In the second iteration, the probe specifies existing connection
parameters (IP addresses and ports) and therefore reaches C,
who processes the probe and sends a duplicate Ack to S.
Footnote 1: S's IP-ID implementation does not influence the probing technique.
[Figure 5: message sequence between Mal, the Firewall/NAT, C and S. Iteration 1: 1.1 pre-probe query, response with id = i_1 (received at t_{1,1}); 1.2 probe, blocked at the firewall/NAT; 1.3 post-probe query, response with id = i_1 + 1 (received at t_{1,2}). Iteration 2: 2.1 pre-probe query, response with id = i_2 (received at t_{2,1}); 2.2 probe, which reaches C and triggers a Dup Ack with id = i_2 + 1; 2.3 post-probe query, response with id = i_2 + 2 (received at t_{2,2}). The diagram also marks packet processing periods at C and an "RST (NAT only)" message.]
Fig. 5. Two iterations of port test.
Notice that since the probe packet appears to be from S
(in case it specifies a valid 4-tuple), it is difficult to block the
probe in firewalls without blocking the legitimate connection
that C has with S.
When C uses a global identifier, the difference in the IP-ID
field in C's responses to Mallory indicates whether C sent
a packet in response to the probe (a duplicate Ack). If Mallory
identifies that C sent a packet, then it is likely that C
is communicating with S via the tested port; however, the
identifier may also have increased because C sent an independent
packet to some other peer. Repeating this test several times
allows Mallory to efficiently detect whether C is connected to
S and reveal C's port; see the empirical evaluation below.
We keep a ‘score’ for each possible port, and increment
a specific port’s score by 1 point for every test that seems
to indicate that there is a connection through that port. We
conduct r > 1 rounds of the attack, where each port is tested.
Finally, we decide that there is a connection if there is a port
with a score higher than a threshold, TH.
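The scoring scheme above can be sketched as a simulation. The sketch makes several assumptions that are not from the paper: a port test is modeled as a coin flip that reports a positive either because the port is the connection port or because C happened to send an unrelated packet (probability `noise_prob`), and all function and parameter names are hypothetical.

```python
import random


def run_port_test(is_connection_port, noise_prob, rng):
    """One simulated test: True if the IP-ID grew by more than expected."""
    background_packet = rng.random() < noise_prob  # C sent an unrelated packet
    return is_connection_port or background_packet


def score_ports(candidate_ports, connection_port, rounds, noise_prob, seed=0):
    """Run `rounds` rounds, testing every candidate port once per round."""
    rng = random.Random(seed)
    scores = {port: 0 for port in candidate_ports}
    for _ in range(rounds):
        for port in candidate_ports:
            if run_port_test(port == connection_port, noise_prob, rng):
                scores[port] += 1  # one point per test that indicates a connection
    return scores


def connection_detected(scores, threshold):
    """Decide that a connection exists if some port scores above TH."""
    return any(score > threshold for score in scores.values())


ports = list(range(50000, 50020))
scores = score_ports(ports, connection_port=50007, rounds=10, noise_prob=0.2)
print(scores[50007], connection_detected(scores, threshold=7))
```

In this model the connection port gains a point in every round, while a non-connection port gains a point only when background traffic coincides with its test, which is why repeating the rounds separates the two.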
Some firewalls have an option to randomize the IP-ID; our
tests would, of course, fail if the packets pass through such
randomizing firewall. The attack we describe in the following
section applies even in this scenario (but is less accurate).
1) Implementing Test Queries/Responses: Our attacks use
packets that Mallory receives from C to learn the effect of
the (spoofed) probe packet. Mallory can cause C to send her
such packets by using the legitimate TCP connection that she
has with C: a query is some short data packet that Mallory
sends to C, and the response is C's acknowledgment sent back
to Mallory. This allows Mallory to bypass typical firewall
defenses (e.g., Windows), since all packets in the test appear
to belong to legitimate connections (queries to the C-Mallory
connection, probes to the C-S connection). See further details in
the technical report [33].
B. Improving the Search: Client Port Allocation Algorithms
The port test that we presented allows Mallory to test
whether the client has a connection to some server via a
specific port. There are $2^{16}$ possible ports that C might use to
communicate with S. However, common client port allocation
paradigms allow more efficient attacks.
Below we present two common paradigms and methods to
reduce the number of tests for each of them.
1) Globally Incrementing: The client port is incremented
for every new connection (and initialized to a random value);
Algorithm 1 in [35] describes the implementation. This approach is
used in Windows and FreeBSD. If C uses this port allocation
paradigm, then recent connections that the client forms are
likely to use ports close to the one that C uses in the connection
with Mallory. Hence, Mallory can test only these ports.
2) Per-Server Incrementing: The client port is incremented
for every new connection with the server. Connections to
different servers use different counters. This approach is used
in Linux; Algorithm 3 in [35] describes the implementation.
The previous ‘trick’ does not work in this case,
since the port that C uses for the connection with Mallory does
not correlate with the port that C uses to communicate with S.
However, we can still use the counter property of this paradigm:
Mallory causes C to create x ‘dummy’ connections to S (we
explain how below); since these connections all share the same
counter, their ports are sequential. Hence, Mallory can test every
port $y \equiv 0 \pmod{x}$ and identify p, a port that C uses to
communicate with S. Next, Mallory tests all ports in the
interval $[p - x, p + x]$ and checks whether there are at least
$x + 1$ connection ports. If so, then C has an ‘independent’
connection with S. In this method, the attacker would test
roughly $2^{16}/x$ different ports.
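The search over per-server incrementing ports can be sketched as follows. This is an illustrative simulation, not the authors' code: `port_is_open` is a hypothetical stand-in for the port test, the scan covers the whole 16-bit port space and stops at the first hit, and the independent connection is assumed to lie next to the x dummy connections, as in the description above.

```python
def find_independent_connection(port_is_open, x, port_space=2**16):
    """Return (p, has_independent), or (None, False) if no port is found.

    port_is_open(port) -> bool is a hypothetical stand-in for the port test.
    """
    # Step 1: any run of x consecutive ports contains exactly one multiple
    # of x, so testing every port y = 0 (mod x) must hit a dummy connection.
    p = None
    for y in range(0, port_space, x):
        if port_is_open(y):
            p = y
            break
    if p is None:
        return None, False

    # Step 2: test the interval [p - x, p + x], clamped to valid ports,
    # and count how many connection ports it contains.
    low = max(0, p - x)
    high = min(port_space - 1, p + x)
    open_count = sum(1 for port in range(low, high + 1) if port_is_open(port))

    # x dummy connections account for x ports; at least x + 1 implies an
    # additional, independent connection between C and S.
    return p, open_count >= x + 1


dummies = set(range(49996, 50006))   # x = 10 sequential dummy ports
with_independent = dummies | {49995}
print(find_independent_connection(lambda port: port in with_independent, 10))
print(find_independent_connection(lambda port: port in dummies, 10))
```

With x = 10 the first step needs at most 6554 tests and the second at most 21, which matches the rough $2^{16}/x$ estimate in the text.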
It is left to describe how Mallory causes C to establish
multiple connections with S. Since C is visiting Mallory's site, she
can run a script (in the browser sandbox) on C. This script,
while very limited, can open connections with other servers
to dynamically embed remote objects. We use it to open
connections to www1.mallory.com, ..., wwwx.mallory.com, which
are domains that Mallory controls. Since Mallory controls the
DNS records for these domains, she sets each of these records
to point to the same IP address, that of S. Browsers open a new
connection for each domain (regardless of its IP address);
hence, this technique, which we verified on Internet Explorer,
Firefox and Chrome, opens x new connections to S.
The typical limitation of x is the number of connections that
a browser can have simultaneously; this limitation is typically
one or few dozens; e.g., 16 in Firefox. In our experiments
below and in the following section, we use x = 10.
C. Empirical Evaluation
1) Setup: In our empirical evaluation, the client network
is a class C subnet that has 5 clients running Windows 7;
each of them sends on average 64 packets per second to other
peers in the subnet (these packets are short, to simulate clients
that usually send Ack packets or short requests). Mallory
probes one of the clients in the network, C, who connects
to her (malicious) website. Mallory's bandwidth is limited to
10 Mbps. We used the network topology illustrated in Figure
3; network nodes are connected through switch devices. The
NAT device in the network topology is a Linux machine
(kernel version 2.6.35) running IP-tables (version 1.4.4). The
server machine runs Linux (kernel version 2.6.35) and uses
an Apache web-server (version 2.2.14). When we evaluate
the attack for the ‘Remote Attacker’ scenario, the adversary
communicates with the clients via a traffic shaper that induces
high latency (200ms), significant loss probability (0.5%) and
jitter (1-10 milliseconds).
[Figure 6: plot of score (0 to 10) against number of rounds (0 to 10), with four series: Connection Port (near-C), Best Non-Connection Port (near-C), Connection Port (remote), Best Non-Connection Port (remote).]
Fig. 6. Global-ID attack. Comparison of a connection port to that of the
highest scoring non connection port as a function of round number. Each
measurement is an average of 10 runs, error-bars mark the standard deviation
values.
2) Evaluation: We first evaluated the attack in the case that C
is communicating with S. We compared between the score of
the ‘connection port’ (i.e., port that C uses for the connection)
to that of the best appearing non-connection port (i.e., port
with the highest score that is not the connection port) in each
round (repetition of the attack, see discussion above); note that
the highest scoring non-connection port may change between
rounds.
Figure 6 shows results for near-C and remote attackers. In
both environments, the score of the connection port was well
above 50% of the maximal score, certainly after five or more
rounds; hence, for efficiency, we can continue testing only
‘high scoring’ ports in advanced rounds. Namely, a port is
tested in the next round only if its current score is above 50%
of the maximal possible score.
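The pruning rule can be written down in a few lines. This is a minimal sketch assuming one point per round as the maximal gain (as in the IP-ID test); the function name `ports_for_next_round` and its parameters are hypothetical.

```python
def ports_for_next_round(scores, rounds_done, keep_fraction=0.5, points_per_round=1):
    """Keep only ports whose score is above keep_fraction of the maximal possible score."""
    max_possible = rounds_done * points_per_round
    cutoff = keep_fraction * max_possible
    return [port for port, score in scores.items() if score > cutoff]


# After 5 rounds the maximal possible score is 5, so the cutoff is 2.5.
print(ports_for_next_round({50001: 5, 50002: 2, 50003: 3}, rounds_done=5))
```

Dropping low-scoring ports early reduces the number of probes in later rounds, at the cost of discarding a true connection port that was unlucky in the first rounds.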
We implemented an adversarial website that presents its
clients a request to arbitrarily decide whether to connect
(‘surf’) to a third-party website, S; our website attempted to
detect the clients’ choice. We used an automatic client, C, that
chooses to connect to S with probability 1/2 and implemented
the port-test above.
The choice of whether there is a connection between C and
S is according to a threshold over the final score of the ports.
Namely, if there exists a port with a score over this threshold,
then we identify that there is a connection. Figure 6 shows
that a choice of 70% of the maximal possible score as a
threshold provides a good separation between the connection
port (in case it exists) and other ports. Figures 7 - 9 show
the success rate in detecting whether C communicates with S
for different adversary locations as a function of the duration
of the attack. Figure 10 compares the average amount of data
that Mallory sends (per victim) to reach different success rates
and for different locations in the network.
In Figures 7 - 10, the measurements are the average of
50 runs; error-bars mark the standard deviation values (for
readability, not all measurements specify the error bars). Note
that the thresholds that we have used in our evaluation may
not work as well in other scenarios; e.g., when the client sends
much more than 64 packets per second, the thresholds should
be higher.
IV. TIME-BASED TRAFFIC ANALYSIS
The globally incrementing IP-ID side channel, presented in
Section III, exploits an operating system flaw. In this section
we explore a more generic, timing based, side channel that is
applicable when C is behind a firewall or a NAT. We define
below a new port test which resembles the IP-ID based port
test and is illustrated in Figure 5 as well.
The timing attack is based on the following observation: if C
is protected by a firewall or connects through a NAT device,
then in case that Mallory tests the correct port, C sends an
additional packet to S (response to the probe); this delays
processing of following packets, and in particular the post-
probe query; see illustration in Figure 5. We use this delay to
identify the connection.
A. Timing-Based Port Test
A significant challenge is the jitter in the network, i.e., laten-
cies may vary while testing different ports. Thus, identifying
the longest time difference between two responses and testing
whether it is over a threshold is likely to produce an incorrect
result. We cope with this challenge by relatively comparing
ports: we assign each port to a small group of s arbitrary
ports.
Ports in each group are tested one after the other; we assume
that jitter does not vary much during the short time interval of
testing a specific (small) group. After testing a group, each port
is assigned with a relative rank according to the time difference
between responses in the corresponding port-test; the lower the
(group-relative) rank, the greater the time difference and the
more likely is a connection through that port. We conduct
several rounds of this attack (to reduce the probability of
errors).
Similarly to the attack presented in the previous section,
we keep a score for each port and after each round increase
a port’s score according to its rank: denote by $\sigma_i$ the number
of points that a port gains if it has rank $i$ within the group;
these weights are normalized, i.e., $\sum_{i=1}^{s} \sigma_i = 1$, and for
every $i < j$, $\sigma_i \geq \sigma_j$. The values of $s$ and the vector
$\sigma = (\sigma_1, \cdots, \sigma_s)$ depend on the channel between Mallory and
C. We employ a machine learning approach (genetic algorithm)
to learn an appropriate value for the vector $\sigma$; see details of the
algorithm in [33]. Let $\mu$, $\mu'$ denote the expected scores of
connection and non connection ports respectively. The target
function of the learning algorithm is to maximize $\mu - \mu'$. In our
empirical evaluation below we explain how Mallory obtains
measurements of $\mu$, $\mu'$ for different values of $s$, $\sigma$.
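The rank-based scoring step can be sketched as below. This is an illustration only: the function name `update_scores`, the example weights and the example timings are invented for this sketch, and the genetic algorithm that learns the weights (described in [33]) is not reproduced.

```python
def update_scores(scores, group, time_diffs, sigma):
    """Add rank-based points to each port in one tested group.

    group      -- list of s ports that were tested one after the other
    time_diffs -- dict: port -> time between pre- and post-probe responses
    sigma      -- weights (sigma_1, ..., sigma_s): normalized and non-increasing
    """
    if len(sigma) != len(group):
        raise ValueError("need one weight per rank in the group")
    if abs(sum(sigma) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(sigma[i] < sigma[i + 1] for i in range(len(sigma) - 1)):
        raise ValueError("weights must be non-increasing in rank")

    # Rank 1 (index 0) is the port with the greatest time difference.
    ranked = sorted(group, key=lambda port: time_diffs[port], reverse=True)
    for index, port in enumerate(ranked):
        scores[port] = scores.get(port, 0.0) + sigma[index]
    return scores


scores = {}
diffs = {50001: 0.010, 50002: 0.030, 50003: 0.012, 50004: 0.011}  # seconds, made up
update_scores(scores, [50001, 50002, 50003, 50004], diffs, [0.6, 0.3, 0.1, 0.0])
print(scores)
```

Because only the order within a group matters, a latency shift that affects all ports of the group equally leaves the points unchanged, which is the point of comparing ports relatively rather than against a fixed delay threshold.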
B. Empirical Evaluation
The environment we used to evaluate the timing attack is
as described in Section III-C, except that the client machines
run Linux (kernel version 2.6.35) instead of Windows; hence,
the attacker cannot employ the global IP-ID based attack. All
Linux distributions ship with the IP-tables firewall, but its rule-set is
empty by default; we therefore evaluated only the scenarios
where Mallory is near-C or remote (see Figure 3), i.e., Mallory
communicates with C and S via a NAT device.
[Figure 11: plot of score (0 to 14) against number of rounds (0 to 20), with four series: Connection Port (near-C), Best Non-Connection Port (near-C), Connection Port (remote), Best Non-Connection Port (remote).]
Fig. 11. Timing Attack. Comparison of a connection port to that of the highest
scoring non connection port as a function of round number. Average of 10
runs, error-bars mark the standard deviation values.
The first task is to obtain a good estimation of the optimal
values of s, σ for the channel between C and Mallory (this
depends on Mallory's location relative to C). The machine
learning algorithm we employ uses the connection that Mallory
has with C (see Figure 2): since for this connection Mallory
knows the client’s port, she is able to obtain measurements
for different group sizes (s) and weights (the vector σ); see
more details in [33]. We found that these values significantly
differ between the two attacker locations; e.g., in our setup we
found s = 31 to be suitable for a near-C attacker while s = 4
appeared optimal for the remote attacker. Figure 11 compares
the connection-port score to that of the highest scoring
non-connection port as a function of the number of rounds.
Next, we derive two thresholds for promoting ports to
following rounds according to their current score; this is
similar to the experiments in Section III-C. According to the
training set results, a choice of 60% of the maximal score
for the near-C attacker scenario and 40% for the remote (Internet)
attacker scenario appears to be reasonable. As in Section III-C,
these thresholds require further research for other scenarios;
e.g., thresholds are affected by the victim's transmission rate.
We implemented the timing attack and conducted an experiment
similar to that presented in Section III-C. We set the
threshold for deciding whether a connection exists between C
and S according to the difference between the expected scores
of a connection port ($\mu$) and a non-connection port ($\mu'$), as
derived from our training measurements. See analysis in [33];
in this experiment we set the threshold to $0.2\mu' + 0.8\mu$. We
measured our success rate in probing whether C is communicating
with a (third-party) website, S. Results are in Figures
12-14.
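As a minimal sketch of this decision rule: the threshold is a convex combination of the two expected scores, placed closer to the connection-port score. The function names are hypothetical, and comparing the best port's score with "greater than or equal" is an assumption; the text only states the threshold value.

```python
def decision_threshold(mu: float, mu_prime: float, weight: float = 0.8) -> float:
    """Return (1 - weight) * mu' + weight * mu; weight=0.8 gives 0.2*mu' + 0.8*mu."""
    return (1.0 - weight) * mu_prime + weight * mu


def connection_detected(best_port_score: float, mu: float, mu_prime: float) -> bool:
    """Assumed rule: report a connection if the best score reaches the threshold."""
    return best_port_score >= decision_threshold(mu, mu_prime)
```

For example, with illustrative training values of 10 for the connection-port score and 5 for the non-connection-port score, the threshold is 9: a best-port score of 9.5 is reported as a connection and a score of 7 is not.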
Fig. 7. Global-ID attack, with-C attacker. (Plot: success, false-positive and
false-negative rates vs. attack duration in seconds.)
Fig. 8. Global-ID attack, near-C attacker. (Same axes as Fig. 7.)
Fig. 9. Global-ID attack, remote attacker. (Same axes as Fig. 7.)
Fig. 10. Amount of data Mallory sends as a function of her success rate.
(Plot: data sent in MB vs. success rate, for the with-C, near-C and remote
attackers.)

Comparing these results to those of the ID-based attack,
more time is required to obtain similar success rates, and the
maximal success rates reached are lower. However, the results
show that the timing attack does provide information on the
connection (since the success ratio is greater than 0.5), but its
hint is often misleading (since the success ratio is significantly
less than 1). The attacker can repeat the attack several times and
decide by majority.
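The benefit of repeating the attack and deciding by majority can be estimated with a short calculation. The sketch below assumes that the repetitions are independent and that each succeeds with the same probability p; neither assumption is established in the text, and correlated errors would reduce the gain. The function name is hypothetical.

```python
from math import comb


def majority_success(p: float, n: int) -> float:
    """Probability that more than half of n independent runs are correct."""
    if n % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

Under these assumptions a single-run success rate of 0.7 rises to 0.784 with three runs, whereas a rate of exactly 0.5 gains nothing from repetition.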
Figure 15 illustrates the average amount of data that Mallory
needs to send in order to reach a particular success rate for
different locations in the network and number of probes in
each test.
In Figures 12 - 15, the measurements are the average of 50
runs; error-bars mark the standard deviation values.
V. TRAFFIC ANALYSIS FOR TOR CLIENTS
In this section we consider the second scenario presented
in Section II, where C uses an onion routing infrastructure to
connect to S. We focus on the popular Tor network, but similar
attacks may apply to other low latency anonymity networks.
In this section we assume that the attacker, Eve, is able to
eavesdrop on C (but not on S).
Here, Eve actively interferes in the possible connection
between C and S and then tests whether a change in the rate of
packets that C receives occurred. If the result is positive, then
it is an indication that C communicates with S. As of writing
this version of the paper, we only did preliminary testing of
this attack; more work is required to evaluate the practicality
of this attack.
A Tor client connects to a remote server via a chain of
relays (proxies). The last relay in the chain, i.e., the exit relay,
has a direct TCP connection with the server. The number of
possible Tor exit relays is important for our attacks (since
a direct connection exists between the exit relay and the
server); the Tor network comprises a few thousand relays,
about one thousand of which can act as exit relays (see
[11]). However, the number of different exit relays that a client
is likely to use is significantly lower: first, a client can only
use online relays; second, Tor clients typically choose the exit
relays according to various parameters such as stability and
bandwidth. We formed Tor circuits from two clients in
different geographic locations and kept track of the exit relay
that was used. The measurements show that 20% of the 2000
circuits which we created (using Tor client version 0.2.2.35)
used one of 7 specific exit relays. Figure 16 illustrates our
measurements.

Fig. 16. The portion of 2000 circuits we created using the Tor client as a
function of the number of different exit relays used. (Axes: portion of
connections vs. number of different exit nodes.)
Fig. 12. Timing attack, near-C attacker. Mallory sends 2 probes in each test.
(Plot: success, false-positive and false-negative rates vs. attack duration
in seconds.)
Fig. 13. Timing attack, near-C attacker. Mallory sends 5 probes in each test.
(Same axes as Fig. 12.)
Fig. 14. Timing attack, remote attacker. Mallory sends 5 probes in each test.
(Same axes as Fig. 12.)
Fig. 15. Amount of data Mallory sends as a function of her success rate.
(Plot: data sent in MB vs. success rate, for the near-C attacker with 2 and
5 probes and the remote attacker with 5 probes.)
A. The Indirect Rate Reduction Attack
In this section we present an attack that uses the following
observation: if Eve influences the rate of communication
between S and the exit relay, then this, in turn, will change
the rate of the connection between C and the first (entrance)
relay. Eve sees the latter connection and is able to detect the
change.
Since Eve can only observe the aggregated rate of data that
C receives from the entrance relay (since communication is
encapsulated), this attack vector weakens when C communicates
with several other servers via one Tor circuit and C's
connection rate with S is small relative to that of the other
servers.
The following attack uses TCP congestion control mechanisms
to fake congestion events, thereby reducing the communication
rate. This attack is based on the insight previously
noted in Section III-A: by sending a (spoofed) packet to an
exit relay, Eve causes that relay to immediately send a
duplicate acknowledgment (Ack) to S, as long as Eve's packet
appears to belong to an existing connection between the
exit relay and S. The duplicate Ack that the exit relay sends
to S has a valid sequence number and S will
accept it. A sequence of three duplicate Acks is interpreted
by TCP as a congestion event, see [36]; when it occurs, S's
congestion window shrinks. The exact effect depends on the
TCP implementation that the server runs. Until recently, the TCP
Reno variant was the default in Linux (the common operating
system of server machines); for this variant each congestion
event halves the size of the congestion window. Recent Linux
kernels use the TCP Cubic variant, where the TCP window
size is multiplied by a constant of 0.8 for each congestion
event.

Fig. 17. Eve causes the exit relay to send 3 duplicate Acks to S. S's congestion
window is halved as a result. (Sequence diagram between Eve, the exit relay
and S; labels include Eve's spoofed packets with source S and an invalid
sequence number, the three duplicate Acks, and cwnd=4 and cwnd=2.)

The congestion window size directly affects the sender's (S)
transmission rate: S only sends as much as the congestion
window allows. Thus, by causing the exit relay to send a
sequence of 3 duplicate Acks to the server, Eve causes the
latter to significantly reduce its sending rate. This attack
is illustrated in Figure 17, which shows the effect when Eve
sends the spoofed packets to an exit relay and port through
which there is a connection with S.
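The multiplicative decrease can be illustrated with a short sketch. It is a simplified model under stated assumptions: the window is multiplied by a fixed factor per faked congestion event (0.5 for Reno and 0.8 for Cubic, the values quoted in the text), and window growth between events, slow start and fast recovery are ignored. The function name is hypothetical.

```python
# Multiplicative-decrease factors as quoted in the text.
DECREASE_FACTOR = {"reno": 0.5, "cubic": 0.8}


def cwnd_after_events(cwnd: float, events: int, variant: str = "cubic") -> float:
    """Congestion window after a number of back-to-back faked congestion events."""
    return cwnd * DECREASE_FACTOR[variant] ** events
```

In this model a Reno window of 4 segments drops to 2 after one event, while a Cubic window needs three events to fall to roughly half its size.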
1) Attack Process: We use the asymmetry in the distribution
of the client's choice of exit relays to reduce the number of
packets that the attacker needs to send to perform indirect rate
reduction. Namely, while there are many exit relays available,
there are only a few ‘likely’ exit relays that a client might
use (see discussion above and Figure 16). For every server
IP address s and likely exit relay x, Eve can optionally
employ one of the attacks in Sections III and IV to
identify those exit relays that communicate with the server.
This optional step reduces the effort in the following
steps of the attack. The techniques in Sections III and IV
not only identify the existence of a connection between
two peers, but also identify the client port: if a connection
exists, then this is the port with the highest score; see details
on how Eve employs these techniques in [33]. Next, for each
of the ‘suspected’ connections, she performs the indirect rate
reduction attack described above and checks which of the
clients experienced ‘rate reduction’. This process repeats
several times for statistical confidence; after each iteration the
attack is suspended to allow S's congestion window to recover.
An important property of this attack is that the spoofed
packets that Eve sends to the exit relay in order to reduce the
server’s rate, are not client specific. Hence, in case that Eve
eavesdrops on multiple clients (e.g., a government spying on
its citizens) this attack would simultaneously check which of
these possible clients has a connection with S.
2) Characteristics of vulnerable connections: Since the
attack repeats for several iterations with intermediate suspensions,
it requires connections lasting several minutes
(see evaluation below). Furthermore, the connection must be
‘active’, i.e., the server should send data to the client while
the attack takes place; this allows Eve to detect rate reductions
and allows the congestion window to recover when the attack
is suspended. These types of connections include, for example,
file transfers (over FTP or HTTP).
B. Analysis
Our analysis in this subsection assumes that Eve does not
try to detect a direct connection between the exit relays and the
server S (the optional step). Instead, she performs the indirect
rate reduction attack on every likely exit relay and all possible
ports.
When using Tor, clients connect to S via proxies; therefore,
a client's geographic location gives Eve no hint as to which server
IP address it will connect to (in case S has multiple
physical servers, e.g., for load balancing). As a result, Eve
must enumerate all the IP addresses of S during the attack.
For each of the $n_s$ server addresses and for every exit relay
that Eve tries, she performs $2^{16}$ iterations, trying a different
port in each iteration; for each port she sends three packets
that would cause the exit relay to send three duplicate Acks
to the server, if a connection exists through that port. These
packets can be short, with only one byte of data, i.e., 41 bytes
long. Hence, the overall data that Eve sends to a particular exit
relay, using a particular source IP of S in a single attack, is
$2^{16} \cdot 3 \cdot 41 < 7.7$ MB. As shown in Figure 16, a small set of exit
relays allows a good ‘hit’ rate. If Eve enumerates all $n_s$
possible server addresses and the seven most likely exit relays,
then by our measurements the attack results in a ‘hit’ rate of
about $\frac{1}{5}$ (see Figure 16); in this attack, she sends $53 \cdot n_s$ MB
in each round. As noted at the end of the previous subsection,
Eve's effort is divided among the clients (victims) that
she probes.
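The arithmetic behind these figures can be reproduced with a short sketch. The constant and function names are hypothetical. The "< 7.7 MB" bound matches the byte count when a megabyte is read as 2^20 bytes, which is an assumption about the unit; the 41-byte packet corresponds to a 20-byte IP header, a 20-byte TCP header without options, and one byte of data.

```python
PORTS = 2 ** 16          # candidate ports on the exit relay
PACKETS_PER_PORT = 3     # three spoofed packets trigger three duplicate Acks
PACKET_BYTES = 41        # 20-byte IP header + 20-byte TCP header + 1 data byte


def bytes_per_relay_and_address() -> int:
    """Bytes sent to one exit relay for one server address in a single attack."""
    return PORTS * PACKETS_PER_PORT * PACKET_BYTES


def round_megabytes(num_relays: int, num_server_addrs: int) -> float:
    """Data per round in units of 2**20 bytes (assumed meaning of 'MB')."""
    total = num_relays * num_server_addrs * bytes_per_relay_and_address()
    return total / 2 ** 20
```

With seven likely exit relays and a single server address, this yields just under 54 such units per round, in line with the roughly 53 · n_s MB quoted in the text.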
C. Empirical Evaluation
1) Setup: We used the Tor network to evaluate the indirect
rate reduction attack. To simplify the experiment and limit
the effect on other Tor users, we took the following
measures: the restricted web-site server, a Linux machine
(kernel version 2.6.35) which runs an Apache web-server
(version 2.2.14), had only one IP address; furthermore, when
running the attack, Eve was aware of the exit relay that is
used and the port it uses for the connection. Given these three
parameters, Eve only sends 3 packets of 41 bytes, i.e., 123
bytes, to carry out a single rate reduction iteration. Below, we
describe the frequency of iterations and show that we send
about 0.5 KB per second; we believe that this did not load the
exit relay or cause harm to other Tor users. The client
machine in our experiments runs Windows 7 and uses Tor
(version 0.2.1.30) to connect to web-servers. While running
our evaluation, we created Tor circuits using 12 different exit
relays.
2) Evaluation: First, we observed the effect of the rate
reduction attack (the three duplicate Acks technique). To measure
this effect, C connects to one of our servers through Tor;
our server reports to Eve the IP address and port of the exit
relay. Eve sends her packets only to the reported exit relay
and only to the specific port used in the connection with our
server. Eve performs three iterations of rate reduction every
second, aiming to fake three congestion events and decrease
the congestion window to about half of its size (in the case of
the Cubic variant). This implies that every second, Eve sends
369 bytes to the exit relay. In Figure 18 we compare
the rate of packets that the client receives (as observed by Eve)
under normal conditions and when Eve attacks the exit relay; our
attack reduces the average rate.
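The two numbers in this paragraph follow from a short calculation, shown here as a worked check. It assumes the Cubic factor of 0.8 per congestion event quoted earlier and no window growth within the one-second interval.

```latex
\begin{align*}
  \text{bytes per second} &= 3 \text{ iterations} \times 3 \text{ packets} \times 41 \text{ bytes} = 369 \text{ bytes},\\
  \text{window after three events} &= 0.8^{3} \cdot \mathit{cwnd} = 0.512 \cdot \mathit{cwnd} \approx \tfrac{1}{2}\,\mathit{cwnd}.
\end{align*}
```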
We next tested the scenario considered in Section II, i.e., of
a client that connects through Tor to one of two sites. Eve uses
rate reduction to test whether the client is communicating with
the restricted site. We conducted the experiment as follows:
the victim C connects to one of two servers each time;
each server is chosen with probability $\frac{1}{2}$. Regardless of the
choice that the client makes, the ‘restricted’ server sends Eve
an IP address and port, allegedly describing the exit relay
connected to it. In case the client does not connect to
the restricted server, these values specify an arbitrary exit relay
and port. Eve then employs the attack above, performing three
rate reductions per second and sending a total of 369 bytes per
second to the specified exit relay.
Fig. 18. Comparison between the rate of a TCP connection (via Tor) in normal
conditions and when under rate reduction attack. Panels: (a) Normal
Conditions, (b) Under Attack. (Axes: incoming packets in the last 100 ms vs.
time in seconds.)
Fig. 19. Eve's success rate in detecting client access to a restricted site via
Tor. Each measurement is the average of 20 runs. Server runs TCP Cubic
variant. (Plot: success rate, false positives and false negatives vs. attack
duration in seconds.)

If the client's rate decreased by at least 20% during the last
30 seconds, then the client's score is incremented. The 20%
threshold is motivated by the results in Figure 18, but may
change in other scenarios, e.g., for a different server. This
process is repeated; between iterations there is a 30-second
suspension that allows the TCP connection between the server
and exit relay to recover to its normal rate (in case the
connection exists) and allows Eve to obtain a recent measurement
of the average rate in C's connection. Eventually, Eve
decides that C is communicating with the restricted site if C
has more than half of the possible points. Figure 19 shows
Eve's success rate as a function of the duration of the attack.
In these experiments, the servers run the TCP Cubic variant; an
improvement in success rate is observed when the server runs TCP
Reno.
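The scoring and decision rule for the Tor attack can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and each iteration is represented by a pair of average rates (the rate measured during the preceding suspension and the rate measured during the attack), which is one plausible reading of how the 20% decrease is computed.

```python
from typing import Sequence, Tuple


def rate_dropped(baseline_rate: float, attack_rate: float,
                 min_drop: float = 0.20) -> bool:
    """True if the rate under attack is at least min_drop below the baseline."""
    if baseline_rate <= 0:
        return False
    return (baseline_rate - attack_rate) / baseline_rate >= min_drop


def client_talks_to_restricted_site(
        iterations: Sequence[Tuple[float, float]]) -> bool:
    """Decide by score: one point per iteration in which the rate dropped.

    iterations holds (baseline_rate, attack_rate) pairs, one per iteration.
    The client is flagged if it has more than half of the possible points.
    """
    points = sum(rate_dropped(b, a) for b, a in iterations)
    return points > len(iterations) / 2
```

Eve would call `client_talks_to_restricted_site` once per observed client, using the same spoofed packets for all of them.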
VI. DEFENSE MECHANISMS
The countermeasures that we propose in this section do not
completely eliminate the related side-channel threats; however,
they make them more difficult to exploit. To ease deployment,
these defenses are designed to be implemented on firewalls.
The globally incrementing IP identifier side channel, as
mentioned in Section III, is only relevant while still using IPv4.
One way to avoid it is to use random IP-ID values; however,
this can result in collisions and loss for fragmented traffic.
The attack in Section III can be prevented by simply moving
from globally-incrementing IP-ID to per-destination IP-ID;
this would preferably be done by hosts, but until hosts do so
(see footnote 2), a firewall can implement this by adding a
(pseudo)random per-destination offset to the IP-ID. See analysis
and better ways to fix the IP-ID in [16], [15].

Footnote 2: We informed Microsoft of the IP-ID issues, but we are not aware
of any intention to fix the IP-ID in Windows.
It is more challenging to block or reduce the timing side
channels and cope with the rate reduction attack presented in
Section V. The flaw that we identify is that a blind adversary
is able to trigger an involuntary reaction from a TCP recipient by
sending arbitrary (spoofed) packets. We propose keeping a
small window of acceptable sequence numbers that may be
processed. This window resembles the receiver's congestion
window, but is more aggressive: while packets outside the
congestion window cause a duplicate acknowledgment (which
we use in the attacks described in Sections III-V), packets
that specify sequence numbers outside the acceptable-window
are silently discarded. The acceptable-window is larger than
the host's congestion window and includes it. A congestion
window is usually up to $2^{16}$ bytes; an acceptable-window that
is twice as large, i.e., $2^{17}$ bytes, will significantly degrade the
attacker's ability to conduct all the attacks in this paper. Since
the sequence number is 32 bits long, the attacker is required to
send $\frac{2^{32}}{2^{17}} = 2^{15}$ times the number of packets to conduct similar
attacks. However, this technique requires that the firewall
inspect the sequence numbers in incoming TCP packets, which
increases the packet processing overhead.
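A firewall-side check of this kind might look like the sketch below. It is only an illustration under assumptions: the function names are hypothetical, the firewall is assumed to track a per-connection window start (how that value is derived from the connection state is not specified in the text), and the comparison is done modulo 2^32 so that the window may wrap around the sequence space.

```python
SEQ_SPACE = 2 ** 32          # TCP sequence numbers are 32 bits long
ACCEPTABLE_WINDOW = 2 ** 17  # window size proposed in the text


def in_acceptable_window(seq: int, window_start: int,
                         size: int = ACCEPTABLE_WINDOW) -> bool:
    """Is seq within [window_start, window_start + size) modulo 2**32?"""
    return (seq - window_start) % SEQ_SPACE < size


def firewall_action(seq: int, window_start: int) -> str:
    """Silently drop packets outside the acceptable-window; forward the rest."""
    return "forward" if in_acceptable_window(seq, window_start) else "drop"


def blind_guess_factor(size: int = ACCEPTABLE_WINDOW) -> int:
    """How many times more packets a blind attacker needs to hit the window."""
    return SEQ_SPACE // size
```

Because a dropped packet elicits no duplicate Ack, a blind attacker must cover the sequence space in steps of the window size, which gives the factor returned by `blind_guess_factor`.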
VII. CONCLUSIONS AND FUTURE WORK
Our primary conclusion is that TCP implementations leak
information that allows attackers to study the existence of
connections via side channels as we demonstrated in three
attacks.
We leave several research directions for future work. Specif-
ically, a more extensive empirical study is required to complete
the evaluation of the Indirect Rate Reduction attack on the
Tor network. Furthermore, it would be desirable to provide an
analytical treatment of the attacks presented in this paper.
An important question is whether we can perform a more efficient
and more accurate attack on Tor anonymity by combining the
indirect rate reduction attack presented in this paper with other
existing attacks on Tor anonymity which exploit other attack
vectors, e.g., [23], [12], [13], [14].
Acknowledgements
Thanks to Moti Geva, Amit Klein, Roger Dingledine and
the anonymous referees for their comments and suggestions.
REFERENCES
[1] T. Dierks and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2,” RFC 5246 (Proposed Standard), Internet Engineering Task Force, Aug. 2008, updated by RFCs 5746, 5878, 6176. [Online]. Available: http://www.ietf.org/rfc/rfc5246.txt
[2] P. C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” in CRYPTO’96, ser. LNCS, N. Koblitz, Ed., vol. 1109, IACR. Springer-Verlag, Germany, 1996, pp. 104–113. [Online]. Available: http://www.cryptography.com/timingattack/paper.html, http://www.cryptography.com/timingattack/timing.pdf
[3] E. W. Felten and M. A. Schneider, “Timing Attacks on Web Privacy,” in Proceedings of the 7th ACM Conference on Computer and Communications Security, S. Jajodia, Ed. Greece: ACM Press, Nov. 2000, pp. 25–32. [Online]. Available: http://www.acm.org/pubs/articles/proceedings/commsec/352600/p25-felten/p25-felten.pdf
[4] S. Chakravarty, A. Stavrou, and A. D. Keromytis, “Traffic Analysis against Low-Latency Anonymity Networks Using Available Bandwidth Estimation,” in ESORICS, D. Gritzalis, B. Preneel, and M. Theoharidou, Eds., vol. 6345. Springer, 2010, pp. 249–267. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15497-3
[5] P. Mittal, A. Khurshid, J. Juen, M. Caesar, and N. Borisov, “Stealthy Traffic Analysis of Low-Latency Anonymous Communication Using Throughput Fingerprinting,” in ACM Conference on Computer and Communications Security. ACM, 2011, pp. 215–226. [Online]. Available: http://doi.acm.org/10.1145/2046707.2046732
[6] S. Zander and S. J. Murdoch, “An Improved Clock-Skew Measurement Technique for Revealing Hidden Services,” in USENIX Security Symposium, P. C. van Oorschot, Ed. USENIX Association, 2008, pp. 211–226. [Online]. Available: http://www.usenix.org/events/sec08/tech/full_papers/zander/zander.pdf
[7] G. Lyon, Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. http://nmap.org/book/, 2009.
[8] M. Zalewski, Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks. No Starch Press, 2005.
[9] S. M. Bellovin, “A Technique for Counting Natted Hosts,” in Internet Measurement Workshop. ACM, 2002, pp. 267–272. [Online]. Available: http://doi.acm.org/10.1145/637201.637243
[10] R. Dingledine, N. Mathewson, and P. F. Syverson, “Tor: The Second-Generation Onion Router,” in USENIX Security Symposium. USENIX, 2004, pp. 303–320. [Online]. Available: http://www.usenix.org/publications/library/proceedings/sec04/tech/dingledine.html
[11] “Tor Metrics Portal. Network and Usage Graphs,” http://metrics.torproject.org/graphs.html, Nov. 2011.
[12] A. Hintz, “Fingerprinting websites using traffic analysis,” in Privacy Enhancing Technologies, ser. Lecture Notes in Computer Science, R. Dingledine and P. F. Syverson, Eds., vol. 2482. Springer, 2002, pp. 171–178. [Online]. Available: http://dx.doi.org/10.1007/3-540-36467-6_13
[13] S. Kadloor, X. Gong, N. Kiyavash, T. Tezcan, and N. Borisov, “Low-Cost Side Channel Remote Traffic Analysis Attack in Packet Networks,” in ICC. IEEE, 2010, pp. 1–5. [Online]. Available: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5497983
[14] A. Panchenko, L. Niessen, A. Zinnen, and T. Engel, “Website Fingerprinting in Onion Routing Based Anonymization Networks,” in Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, ser. WPES ’11. New York, NY, USA: ACM, 2011, pp. 103–114. [Online]. Available: http://doi.acm.org/10.1145/2046556.2046570
[15] F. Gont, “Security Assessment of the Internet Protocol Version 4,” RFC 6274 (Informational), Internet Engineering Task Force, Jul. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6274.txt
[16] Y. Gilad and A. Herzberg, “Fragmentation Considered Vulnerable: Blindly Intercepting and Discarding Fragments,” in Proc. USENIX Workshop on Offensive Technologies, Aug 2011. [Online]. Available: http://www.usenix.org/events/woot11/tech/final/files/Gilad.pdf
[17] S. Sanfilippo, “About the IP Header ID,” http://www.kyuzz.org/antirez/papers/ipid.html, Dec 1998.
[18] S. Sanfilippo, “A New TCP Scan Method,” http://seclists.org/bugtraq/1998/Dec/79, 1998.
[19] G. Danezis, “The Traffic Analysis of Continuous-Time Mixes,” in Proceedings of Privacy Enhancing Technologies workshop (PET), 2004, pp. 35–50.
[20] B. N. Levine, M. K. Reiter, C. Wang, and M. K. Wright, “Timing Attacks in Low-Latency Mix-Based Systems,” in Proc. Financial Cryptography, A. Juels, Ed. Springer-Verlag, LNCS 3110, Feb. 2004, pp. 251–265.
[21] Zhu, Fu, Graham, Bettati, and Zhao, “On Flow Correlation Attacks and Countermeasures in Mix Networks,” in International Workshop on Privacy Enhancing Technologies (PET), LNCS, vol. 4, 2004.
[22] R. Pries, W. Yu, X. Fu, and W. Zhao, “A New Replay Attack Against Anonymous Communication Networks,” in IEEE International Conference on Communications (ICC), 2008, pp. 1578–1582. [Online]. Available: http://dx.doi.org/10.1109/ICC.2008.305
[23] N. S. Evans, R. Dingledine, and C. Grothoff, “A Practical Congestion Attack on Tor Using Long Paths,” in USENIX Security Symposium. USENIX Association, 2009, pp. 33–50. [Online]. Available: http://www.usenix.org/events/sec09/tech/full_papers/evans.pdf
[24] S. J. Murdoch and G. Danezis, “Low-Cost Traffic Analysis of Tor,” in IEEE Symposium on Security and Privacy. IEEE Computer Society, 2005, pp. 183–195. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/SP.2005.12
[25] S. Kent and K. Seo, “Security Architecture for the Internet Protocol,” RFC 4301 (Proposed Standard), Internet Engineering Task Force, Dec. 2005, updated by RFC 6040. [Online]. Available: http://www.ietf.org/rfc/rfc4301.txt
[26] F. Baker and P. Savola, “Ingress Filtering for Multihomed Networks,” RFC 3704 (Best Current Practice), Internet Engineering Task Force, Mar. 2004. [Online]. Available: http://www.ietf.org/rfc/rfc3704.txt
[27] P. Ferguson and D. Senie, “Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing,” RFC 2827 (Best Current Practice), Internet Engineering Task Force, May 2000, updated by RFC 3704. [Online]. Available: http://www.ietf.org/rfc/rfc2827.txt
[28] T. Killalea, “Recommended Internet Service Provider Security Services and Procedures,” RFC 3013 (Best Current Practice), Internet Engineering Task Force, Nov. 2000. [Online]. Available: http://www.ietf.org/rfc/rfc3013.txt
[29] Advanced Network Architecture Group, “Spoofer Project,” http://spoofer.csail.mit.edu/summary.php, May 2013.
[30] T. Ehrenkranz and J. Li, “On the State of IP Spoofing Defense,” ACM Transactions on Internet Technology, vol. 9, no. 2, pp. 6:1–6:29, 2009. [Online]. Available: http://doi.acm.org/10.1145/1516539.1516541
[31] S. Deering and R. Hinden, “Internet Protocol, Version 6 (IPv6) Specification,” RFC 2460 (Draft Standard), Internet Engineering Task Force, Dec. 1998, updated by RFCs 5095, 5722, 5871, 6437, 6564, 6935. [Online]. Available: http://www.ietf.org/rfc/rfc2460.txt
[32] Wikipedia, “Usage Share of Operating Systems,” http://en.wikipedia.org/wiki/Usage_share_of_operating_systems, December 2011.
[33] Y. Gilad and A. Herzberg, “Spying in the Dark: TCP and Tor Traffic Analysis,” http://u.cs.biu.ac.il/~herzbea/security/TR/TR12_02, Bar Ilan University, Tech. Rep. 2, April 2012.
[34] J. Postel, “Transmission Control Protocol,” RFC 793 (INTERNET STANDARD), Internet Engineering Task Force, Sep. 1981, updated by RFCs 1122, 3168, 6093, 6528. [Online]. Available: http://www.ietf.org/rfc/rfc793.txt
[35] M. Larsen and F. Gont, “Recommendations for Transport-Protocol Port Randomization,” RFC 6056 (Best Current Practice), Internet Engineering Task Force, Jan. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6056.txt
[36] M. Allman, V. Paxson, and E. Blanton, “TCP Congestion Control,” RFC 5681 (Draft Standard), Internet Engineering Task Force, Sep. 2009. [Online]. Available: http://www.ietf.org/rfc/rfc5681.txt
... The most intuitive feature used in these attacks is the inter-packet arrival time [28]. Packet rate, used by Gilad et al. [57], and latency are some other features used in the timing attacks. ...
... 2) Active Attacks: Most side-channel attacks available in the surveyed literature are passive. However, Gilad et al. [57] describe an active attack where the attacker influences the rate of communication between the exit node and the server and is, therefore, able to observe the traffic between the client and the entry guard. Firstly, the attacker sends spoofed packets (with the server's address and port as a source) from the probe circuit to the exit node. ...
... The attacker observes this at the client end, thus de-anonymising the communications. However, it should be noted that Gilad et al.'s paper [57] was published with only preliminary results for this type of de-anonymisation attack. In 2014, He et al. [72] proposed an active WF attack. ...
Article
Full-text available
Anonymity networks are becoming increasingly popular in today’s online world as more users attempt to safeguard their online privacy. Tor is currently the most popular anonymity network in use and provides anonymity to both users and services (hidden services). However, the anonymity provided by Tor is also being misused in various ways. Hosting illegal sites for selling drugs, hosting command and control servers for botnets, and distributing censored content are but a few such examples. As a result, various parties, including governments and law enforcement agencies, are interested in attacks that assist in de-anonymising the Tor network, disrupting its operations, and bypassing its censorship circumvention mechanisms. In this survey paper, we review known Tor attacks and identify current techniques for the de-anonymisation of Tor users and hidden services. We discuss these techniques and analyse the practicality of their execution method. We conclude by discussing improvements to the Tor framework that help prevent the surveyed de-anonymisation attacks.
... We can circumvent blocked Tor access in different ways [17], but users never know if someone analyzes their traffic [18], [30], [36]. Low-cost countermeasures do not sufficiently protect metadata [13] and obfuscating traffic against correlation leads to per-packet delays [28]. ...
... Throughout this work, we follow attacker models proposed in the literature and assume an AS-level adversary who can cover between 40 % (single malicious AS) and 85 % (state level adversary, colluding ASes) [40] of nodes in an attack. The attacker is capable of performing routing attacks, e. g., BGP hijacks [44] for redirecting user traffic, and trafficanalysis attacks [18], [21], [36] with the goal of learning sensitive information about Tor users. This may be achieved by having access to relays in the Tor network (by contributing as a volunteer relay operator), to layer three or four switches (network nodes that forward IP or TCP/UDP traffic), or by monitoring Internet exchange points (IXP). ...
... In a well-known example of this principle, the sender of a 2013 Harvard bomb threat was identified despite their use of Tor because they were the only client connecting to Tor from Harvard's campus at the time [21]. 4) Traffic Analysis: Traffic analysis attacks are a type of anonymity-compromising attack against Tor that identify features of encrypted traffic stream, such as packet interarrival times [57], to either: 1) recognize a previously observed stream [38,47,50,62], linking it across Tor or, 2) observe a pattern corresponding to a website fingerprint and infer the destination of traffic [20,45,72,80,88]. Both styles give the adversary an advantage in linking a client to their destination, compromising Tor's anonymity by making clients, servers, or client-server pairs more identifiable. ...
Preprint
Full-text available
We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection performed by Tor clients. As such, ShorTor reduces latency while preserving Tor's existing security properties. Specifically, the routes taken in ShorTor are in no way correlated to either the Tor user or their destination, including the geographic location of either party. We analyze the security of ShorTor using the AnoA framework, showing that ShorTor maintains all of Tor's anonymity guarantees. We augment our theoretical claims with an empirical analysis. To evaluate ShorTor's performance, we collect a real-world dataset of over 400,000 latency measurements between the 1,000 most popular Tor relays, which collectively see the vast majority of Tor traffic. With this data, we identify pairs of relays that could benefit from ShorTor: that is, two relays where introducing an additional intermediate network hop results in lower latency than the direct route between them. We use our measurement dataset to simulate the impact on end users by applying ShorTor to two million Tor circuits chosen according to Tor's specification. ShorTor reduces the latency for the 99th percentile of relay pairs in Tor by 148 ms. Similarly, ShorTor reduces the latency of Tor circuits by 122 ms at the 99th percentile. In practice, this translates to ShorTor truncating tail latencies for Tor which has a direct impact on page load times and, consequently, user experience on the Tor browser.
... The most famous class of attacks is represented by the traffic analysis attacks [14][15][16] in which the adversary analyzes the traffic to find correlations. Among the traffic analysis attacks there are the timing attacks [17][18][19], in which the adversary observes the timing of the messages arriving at and leaving from the nodes to find correlations. Other interesting subclasses of traffic analysis attacks are traffic confirmation attacks [20], in which the adversary controls and observes two possible end-relays of a Tor circuit to conclude that they really belong to the same circuit, and watermarking attacks [21], in which the adversary manipulates the traffic stream by introducing an identifiable pattern. Another category of attacks targets the router selection used to build the Tor circuit. ...
Article
Tor is the de facto standard used for anonymous communication over the Internet. Despite its wide usage, Tor does not guarantee sender anonymity, even in a threat model in which the attacker passively observes the traffic at the first Tor router. In a more severe threat model, in which the adversary can perform traffic analysis on the first and last Tor routers, relationship anonymity is also broken. In this paper, we propose a new protocol extending Tor to achieve sender anonymity (and then relationship anonymity) in the most severe threat model, allowing a global passive adversary to monitor all of the traffic in the network. We compare our proposal with Tor through the lens of security in an incremental threat model. The experimental validation shows that the price we have to pay in terms of network performance is tolerable.
... The Tshark module sets the IP and ORPort of the entry node in CaptureFilter, as in Fig. 3, as a filtering option before collecting packets, because the Tshark module performs packet collection and refinement simultaneously [14]. CaptureFilter filters packets according to the filtering option set at collection time and sends them to dumpcap, which then stores the received packets in the virtual machine's storage space in libpcap file format. ...
... Specifically, to determine if two participants are in communication, GPAs may conduct prefix hijacking [47] to intercept network traffic and then use off-path statistical analysis [48] to sort messages. For example, in a typical passive traffic analysis attack, GPAs inspect every message of the network and keep observing the load of each participant. ...
Article
Traditional anonymous networks (e.g., Tor) are vulnerable to traffic analysis attacks that monitor the whole network traffic to determine which users are communicating. To preserve user anonymity against traffic analysis attacks, the emerging mix networks mess up the order of packets through a set of centralized and explicit shuffling nodes. However, this centralized design of mix networks is insecure against targeted DoS attacks that can completely block these shuffling nodes. In this paper, we present DAENet, an efficient mix network that resists both targeted DoS attacks and traffic analysis attacks with a new abstraction called Stealthy Peer-to-Peer (P2P) Network. The stealthy P2P network effectively hides the shuffling nodes used in a routing path into the whole network, such that adversaries cannot distinguish specific shuffling nodes and conduct targeted DoS attacks to block these nodes. In addition, to handle traffic analysis attacks, we leverage the confidentiality and integrity protection of Intel SGX to ensure trustworthy packet shuffles at each distributed host, and use multiple routing paths to prevent adversaries from tracking and revealing user identities. We show that our system is scalable with moderate latency (2.2s) when running in a cluster of 10,000 participants and is robust in the case of machine failures, making it an attractive new design for decentralized anonymous communication. DAENet's code is released on http://github.com/tdsc0652/dae-net.
... Malicious CAPTCHA [14] has also been used to trick the user into disclosing private information. In another attack, a cross-site side channel was used to detect whether a user is contacting a given website, even when the user uses the Tor Browser [16]. Fig. 1: Cross-site side-channel attacks are a combination of two attack types: cross-site and side-channel. ...
Conference Paper
Cross-site search attacks allow a rogue website to expose private, sensitive user information from web applications. The attacker exploits timing and other side channels to extract the information, using cleverly designed cross-site queries. In this work, we present a systematic approach to the study of cross-site search attacks. We begin with a comprehensive taxonomy, clarifying the relationships between different types of cross-site search attacks, as well as relationships to other attacks. We then present, analyze, and compare cross-site search attacks; we present new attacks that have improved efficiency and can circumvent browser defenses, and compare them to already-published attacks. We develop and present a reproducibility framework, which allows study and evaluation of different cross-site attacks and defenses. We also discuss defenses against cross-site search attacks, for both browsers and servers. We argue that server-based defenses are essential, including restricting cross-site search requests.
Article
Research in marine sensors and the Internet of Things (IoT) has grown exponentially with the ample warehouse of natural materials in the sea. The growing activities in the marine sensor environment have increased the threat of anomalies and cyber-attacks. Many Intrusion Detection Systems (IDS) and classical machine learning-based models have been proposed to secure the sensor-based IoT infrastructure. Still, these mechanisms have failed to achieve significant results for securing the marine sensor environment due to the discriminant requirements of the IoT appliances in deep oceans, such as distribution, information complexity, scalability, higher network bandwidth requirements, and low computational capacity. Hence, we propose a lightweight and robust ensemble model to secure the marine IoT environment from cyber-attacks and malicious activities. This paper establishes an optimized Light Gradient Boosting Machine (Light-GBM) algorithm for ocean IoT attack detection. The experiments were conducted on the Distributed Smart Space Orchestration System (DS2OS) dataset. The proposed methodology includes a label encoding technique for best feature selection, hyper-parameter tuning, an ensemble function, and a novel algorithm to develop an ocean IoT attack detection model. As an extension of traditional methods, the optimized Light-GBM model can handle distributed IoT attacks in deeper marine environments with low computational cost and with 98.52% detection accuracy. The comparative analysis confirms the effectiveness of the proposed model for marine sensor safety. In conclusion, the proposed model mitigates the threat of cyber-attacks in the marine sensor environment and presents a promising future for real-time ocean-based IoT applications.
Article
In this paper, we uncover a new off-path TCP hijacking attack that can be used to terminate victim TCP connections or inject forged data into victim TCP connections by manipulating the new mixed IPID assignment method, which is widely used in Linux kernel version 4.18 and beyond. Our attack has three steps. First, an off-path attacker can downgrade the IPID assignment for TCP packets from the more secure per-socket-based policy to the less secure hash-based policy, thus building a shared IPID counter that forms a side channel in the victim. Second, the attacker detects the presence of TCP connections by observing the side channel of the shared IPID counter. Third, the attacker infers sequence and acknowledgment numbers of the detected connection by observing the side channel. Consequently, the attacker can completely hijack the connection, e.g., resetting the connection or poisoning the data stream. We evaluate the impact of our attack in the real world, and we uncover that more than 20% of Alexa top 100k websites are vulnerable to our attack. Our case studies of SSH DoS, manipulating web traffic, and poisoning BGP routing tables show its threat to a wide range of applications. Moreover, we demonstrate that our attack can be further extended to exploit IPv4/IPv6 dual-stack networks to increase hash collisions and enlarge the vulnerable population. Finally, we analyze the root cause and develop a new IPID assignment method to defeat this attack. We prototype our defense in Linux 4.18 and confirm its effectiveness in the real world.
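The shared-counter side channel this abstract relies on can be shown with a toy simulation. It is a sketch of the inference idea only, not the attack from the paper: there is no real network traffic, no IPID downgrade step, and no hash-based counter selection. The class `SharedCounterHost` and the function `packets_sent_between_probes` are hypothetical names, and the only modeled behavior is a single 16-bit counter shared by every outgoing packet.

```python
IPID_MODULUS = 1 << 16  # the IPv4 Identification field is 16 bits wide


class SharedCounterHost:
    """Toy host whose outgoing packets all draw IPIDs from one shared counter."""

    def __init__(self, start=0):
        self.ipid = start % IPID_MODULUS

    def send_packet(self):
        # Every packet, to any destination, advances the same counter.
        self.ipid = (self.ipid + 1) % IPID_MODULUS
        return self.ipid


def packets_sent_between_probes(host, hidden_packets):
    """Infer how many packets the host sent between two attacker probes.

    `hidden_packets` stands in for traffic the attacker cannot observe.
    The attacker sees only the IPIDs in the two probe replies.
    """
    first = host.send_packet()        # reply to probe 1
    for _ in range(hidden_packets):   # unobserved traffic to other peers
        host.send_packet()
    second = host.send_packet()       # reply to probe 2
    # Modular difference handles wrap-around of the 16-bit counter.
    return (second - first - 1) % IPID_MODULUS


# The counter starts near its maximum to exercise wrap-around.
inferred = packets_sent_between_probes(SharedCounterHost(start=65534), 3)
```

A gap larger than zero between the two probe replies tells the observer that the host sent other packets in the meantime, which is the kind of signal an off-path attacker can test against a guessed connection.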