
Spying in the Dark: TCP and Tor Traffic Analysis

July 2012

7384:100-119

DOI:10.1007/978-3-642-31680-7

Conference: Privacy Enhancing Technologies Symposium

Authors:

Yossi Gilad

Hebrew University of Jerusalem

Amir Herzberg

University of Connecticut

We show how to exploit side-channels to identify clients without eavesdropping on the communication to the server, and without relying on known, distinguishable traffic patterns. We present different attacks, utilizing different side-channels, for two scenarios: a fully off-path attack detecting TCP connections, and an attack detecting Tor connections by eavesdropping only on the clients. Our attacks exploit three types of side channels: globally-incrementing IP identifiers, used by some operating systems, e.g., in Windows; packet processing delays, which depend on TCP state; and bogus-congestion events, causing impact on TCP’s throughput (via TCP’s congestion control mechanism). Our attacks can (optionally) also benefit from sequential port allocation, e.g., deployed in Windows and Linux. The attacks are practical - we present results of experiments for all attacks in different network environments and scenarios. We also present countermeasures for these attacks.


Spying in the Dark:

TCP and Tor Trafﬁc Analysis

Yossi Gilad and Amir Herzberg

Abstract—We show how to exploit side-channels to identify

clients without eavesdropping on their communication with the

server, and without relying on known, distinguishable trafﬁc

patterns. We present different attacks, utilizing different side-

channels, for two scenarios: an off-path attacker that detects

TCP connections, and an eavesdropper attacker that detects

connections relayed via the Tor anonymity network.

Our attacks exploit three types of side channels: globally-

incrementing IP identiﬁers, used by some operating systems,

e.g., in Windows; the port selection algorithm used by Linux

and Android, which we show to leak when a user connects

to a website (this algorithm is standardized, and recommended

in RFC 6056); and bogus-congestion events, causing impact on

TCP’s throughput (via TCP’s congestion control mechanism).

The attacks are practical - we present results of experiments for

all attacks in different network environments and scenarios. We

conclude this work with practical client and server end defenses

against our attacks.

I. INTRODUCTION

Internet communication is often sensitive, and security

measures must be taken to protect privacy against attackers.

The exact measures depend on the exact threat; in particular,

encryption protocols such as TLS [1] are necessary to protect

content from an eavesdropping attacker.

However, encryption is insufficient to prevent traffic analysis, and in particular, to prevent exposure of the identities of the communicating peers. Users concerned about traffic analysis by eavesdropping attackers use anonymizing services such as Tor. Other users may simply assume that the adversary is off-path (non-eavesdropping), and expect privacy against such (weaker) attackers.

We present three trafﬁc-analysis attacks against these sce-

narios. Two attacks identify clients that communicate to a spe-

ciﬁc server directly over TCP (without anonymizing interme-

diaries such as Tor). Such attacks do not require eavesdropping

at all, and may be launched by weak, off-path attackers, even

for commercial motivations. In fact, since the attacks do not

involve eavesdropping, they may even be deemed to be legal

(not wiretapping). We believe technical measures should (and

can) prevent such off-path trafﬁc analysis.

Our third traffic analysis attack is against Tor users. It requires eavesdropping abilities only on the client side, and only spoofing abilities on the server side. We believe that this is an important scenario, since Tor clients are often concerned about attackers that can eavesdrop on their connection to the Tor relay; the client-relay connection may be insecure (e.g., in an Internet cafe) or controlled by a potentially snooping organization (employer, government, etc.). Our evaluation of this attack is not yet complete; however, the preliminary results that we present in this paper provide a warning about a new attack vector on the Tor anonymity network.

Fig. 1. Query-Probe-Query off-path attack pattern. Attacker uses spoofed source address for probe. (Diagram: Attacker, FW/NAT, C and S. Pre-Probe Phase: 1a. Pre-Probe Query, 1b. Pre-Probe Response. Probe Phase: 2a, 2b, 2c. Probe. Post-Probe Phase: 3a. Post-Probe Query, 3b. Post-Probe Response.)

Author affiliation (footnote): Department of Computer Science, Bar Ilan University, Israel.

Our attacks exploit different side-channels, providing useful

information on a TCP/Tor connection to an off-path attacker

(for Tor, attacker can eavesdrop, but only on the client). Side

channels have been extensively used in attacks on crypto-

graphic systems, e.g., [2], but also in attacks on privacy, e.g.,

[3], and more recently also applied to trafﬁc analysis [4], [5],

[6].

Our attacks on direct (i.e., non-anonymous) TCP connec-

tions can be viewed as instances of an attack pattern which

we call Query-Probe-Query, illustrated in Figure 1; this is

a generalization of the well-known idle (stealth) scan attack

[7], [8], and of the measurement method used in [9]. In the

Query-Probe-Query attack pattern illustrated in Figure 1, the

ﬁrst query measures some ‘pre-probe state’; the probe may

cause some change on the state, where the change depends

on the property measured, e.g., whether a speciﬁc client port

is open; and the ﬁnal query measures the ‘post-probe state’.

This attack pattern can be applied with different queries and

probes, to measure different values.

In our implementations, each probe is a packet, or a few packets, sent to the client C. The probes test for some event e, such that if e holds then C will send some packet(s), and otherwise he will not send a packet (or will send fewer packets). We use the pre/post-probe responses to infer whether e holds: we measure the increase in the IP-ID counter, or measure the time between receipt of the two response packets (1b and 3b in Figure 1). Table I summarizes our results for traffic analysis on direct TCP connections.

In recent years, the use of anonymity mechanisms such as Tor [10] has grown. Tor is a low-latency, circuit-based anonymity network that is widely used and increasing in popularity (according to [11], an increase of about 70% over the past year). Tor is designed to ensure traffic privacy, even when the adversary is able to eavesdrop on either C or S. However, due to its low latency, Tor cannot ensure traffic privacy against attackers eavesdropping at both ends (C and S); see the discussion of related works below.

TABLE I. Results for probing direct (TCP) connections. Attacker location is relative to the client (victim). Success rates can be improved by repeating the attack.

Side channel              Adversary location
                          Local       Remote
Success rate
  IP-ID (Section III)     1           0.92
  Timing (Section IV)     0.74        0.58
Attack duration (seconds) / data sent (MB)
  IP-ID (Section III)     14/0.6      38/2
  Timing (Section IV)     70/5        50/4

We show how an adversary capable of eavesdropping on

the client, C, but not on the server, S, is able to detect that

C is communicating with S. This attack uses a side channel

as well, but does not follow the query-probe-query pattern;

here the attacker causes a reduction in the rate of packets that

would reach C in case that he is communicating with S, and

then tests whether reduction had occurred.

Our attacks on Tor are active, i.e., they involve sending packets to the Tor exit relay. When there is a known, distinct traffic pattern to the communication of a specific server (website fingerprint), then alternative passive attacks may be applicable as well, e.g., [12], [13], [14]. It may be possible to extend our techniques to also take advantage of such a site fingerprint, when available.

A. Related works

IP-ID side-channel and off-path traffic analysis: We show

how the use of globally-incrementing IP-ID ﬁeld in IP headers,

provides side-channel allowing effective off-path trafﬁc anal-

ysis. The use of globally-incrementing IP-ID is recognized, in

[15], as a common practice with known security implications;

e.g., both globally-incrementing and per-destination incre-

menting IP-ID allow interception, injection and discarding of

fragmented trafﬁc [16]. Globally-incrementing IP-ID can allow

estimation of the number of packets sent [17], stealth-scan for

open ports (idle scan) [18] and counting hosts behind NAT

[9].

The technique that we present for the case of a client that

uses a globally-incrementing IP ID and is not connected via a

ﬁrewall/NAT (see Figure 1) is a variant of idle-scan [7], [8].

The difference is that while idle-scan probes a server for an

open (i.e., ‘listening’) port, our attack probes a client for a

connection with a server.

The only other previous work we found that performs trafﬁc

analysis by off-path attacker, using a side-channel, is [13].

Their attack is based on detecting changes in the round trip

delays from the attacker to the DSL router; this is a rather

crude side channel, much less efﬁcient than both the IP-ID

and the timing side channels we use. Indeed, they only present

detection of whether a client is browsing or playing a video,

based on the signiﬁcant difference in bandwidth, and assuming

no other trafﬁc. Our results dramatically improve the impact

of detection compared to theirs, provided that the attacker can

communicate with the clients.

Tor traffic analysis: Low-latency anonymity networks are

known to be vulnerable to trafﬁc correlation attacks by an

attacker that eavesdrops on both ends; this problem, and

possible countermeasures, was studied in several works, e.g.,

[19], [20], [21]; efﬁciency and accuracy can signiﬁcantly

improve if attacker can also manipulate trafﬁc at the exit relay,

see [22]. Indeed, Tor designers are well aware of its inability

to properly protect against an attacker (eavesdropping) at both

ends of the circuit.

Other attacks manipulate the trafﬁc at the server or the last

(exit) relay in the circuit, and use different techniques to detect

the relay along the path based on delays [4], [23], [24]. These

works assume that the attacker controls the server or the exit

relay, but do not require client-side eavesdropping. In contrast,

our attack on Tor requires client-side eavesdropping, but does

not require control over the server or exit relay. An obvious

challenge is to combine the results, and identify clients without

controlling server or exit relay, and without eavesdropping at

all. Our trafﬁc detection attacks on TCP may be applicable.

B. Our contributions

The main contribution of this paper is identiﬁcation and

analysis of side channels in the TCP/IP suite and their practical

implications on privacy, as we verify in experiments. We provide practical countermeasures to the problems that we identify; these allow quick patching at the firewall level and require no changes to hosts or core operating system services.

This work motivates use of cryptography in lower network

layers and in particular IPsec [25] as we show that higher

network layer solutions such as SSL/TLS do not prevent blind

trafﬁc analysis.

C. Paper Organization

In Section II we present our attacker models and the scenarios that we consider; we also present the criteria we use to measure the effectiveness of the attacks. Sections III and IV

present the global-ID and timing side channels; both sections

provide results of empirical experiments. Section V presents

our attack on Tor and corresponding experiments. Section

VI presents practical defenses. Finally, Section VII presents

our conclusions from this work, as well as future research

directions.

II. MODEL

Let C and S be communicating TCP client and server (respectively). We consider two types of adversaries, depending on how C and S are connected. In Sections III and IV, we consider the case that C and S have a direct TCP connection. In Section V, C connects to S through the onion routing anonymity network, Tor [10]; i.e., C communicates with S via a circuit of relays (proxies). The goal of the attacker is to identify clients who connect to a server S. We identify S using its IP address and port.

Fig. 2. C is surfing in both Mallory and S’s sites, Mallory tries to detect whether there is a connection between C and S. (Diagram labels: www.s.com, www.mallory.com, connection, Network.)

We consider two types of attackers: Mallory, an off-path

adversary, and Eve, an eavesdropping adversary. The at-

tackers can send spoofed packets, i.e., packets with fake

(spoofed) sender IP address. Due to ingress ﬁltering [26],

[27], [28] and other anti-spooﬁng measures, IP spooﬁng is

less commonly available than before, but still feasible, see

[29], [30]. Apparently, there is still a signiﬁcant number of

ISPs that do not perform ingress ﬁltering for their clients

(especially to multihomed customers). Furthermore, with the

growing concern of cyberwarfare and cybercrime, some ISPs

may intentionally support spooﬁng. Hence, it is still reasonable

to assume spooﬁng ability.

We describe both adversary models in Sections II-A and

II-B below. Section II-C presents the criteria we use to evaluate

the attacks we present.

A. Mallory - Off-path Adversary

We assume that C visits a website that Mallory controls, denoted www.mallory.com. Mallory uses this (legitimate) connection to probe whether C has any connections to S; see Figure 2.

We consider three variants of Mallory, as illustrated in

Figure 3: with-C, near-C and remote. These differ with respect

to Mallory’s abilities to communicate with C; the greater the

distance, the more likely it is that packet loss or reordering

occurs, decreasing the quality of the side channels.

The with-C and near-C attackers are located near the client (C); the difference between them is that the with-C adversary directly communicates with the client, allowing Mallory to take advantage of Windows' globally incrementing port allocation (if C runs Windows). When the adversary and C communicate via a NAT (near-C or remote), we assume that the NAT uses per-destination incremental assignment of external ports (e.g., as in the widely-used IP-tables NAT/Firewall provided in Linux). Section III shows how we exploit different client port allocation techniques. Finally, the remote Mallory attacker simulates an adversary that communicates with the clients from a remote location, i.e., via a high-latency, jittery and lossy channel.

Fig. 3. Three variants of the Mallory adversary. (Diagram labels: Clients, Network, NAT, With-C Attacker, Near-C Attacker, Remote Attacker.)

Fig. 4. Eve identifies that some of the clients she eavesdrops on are using Tor and wants to detect which of them is communicating with www.restricted.com. C connects to www.restricted.com via a circuit of 3 relays. (Diagram labels: www.restricted.com, www.other.com, Tor, exit relay, "accessed a restricted site".)

B. Eve - Adversary for Anonymized Connections

In the attacks on Tor, we consider the adversary Eve who

is able to eavesdrop on many clients that use Tor, however,

Eve cannot eavesdrop on the servers (see Figure 4). Such an

adversary may include a government or an employer, spying

on citizens or employees. Eve’s goal is to detect which of

the clients is communicating (using Tor) with a particular

watched/restricted site, S.

C. Attack Evaluation Criteria

In addition to measuring the success, false positive and false

negative rates, we consider two additional measures. The ﬁrst

measure is the time that an adversary (with some reasonable

constant bandwidth) needs to run the attack in order to reach

a particular success probability for detecting a connection.

This value also provides the minimal detectable connection

time. The second measure is the average amount of data per

victim that the attacker is required to send to reach a particular

success rate.
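The criteria above can be made concrete with a small helper. The following is a minimal sketch, not the authors' evaluation code: the `Trial` record, the normalization of the false positive and false negative rates, and the `data_sent_mb` helper are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    connected: bool  # ground truth: did C really have a connection to S?
    detected: bool   # the attacker's verdict for this run


def evaluate(trials):
    """Return (success_rate, false_positive_rate, false_negative_rate).

    Assumption: false positives are normalized by the number of runs
    without a connection, false negatives by the number of runs with one.
    """
    total = len(trials)
    positives = sum(t.connected for t in trials)
    negatives = total - positives
    correct = sum(t.connected == t.detected for t in trials)
    false_pos = sum((not t.connected) and t.detected for t in trials)
    false_neg = sum(t.connected and (not t.detected) for t in trials)
    return (
        correct / total if total else 0.0,
        false_pos / negatives if negatives else 0.0,
        false_neg / positives if positives else 0.0,
    )


def data_sent_mb(packets_sent, packet_size_bytes):
    """Amount of data the attacker sent for one victim, in megabytes."""
    return packets_sent * packet_size_bytes / 1e6
```

The attack duration can be measured separately as the wall-clock time until the verdict is produced.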

III. GLOBALLY-INCREMENTING IDENTIFIER BASED

TRAFFIC ANALYSIS

This section presents a probing technique that allows an

off-path (blind) adversary, Mallory, to identify a connection

between a client C and a server S when C uses a globally incrementing IP identifier (IP-ID). This side channel is only

applicable when the TCP connection is over IPv4, since in

IPv6 [31] the IP-ID ﬁeld is only speciﬁed in fragmented

trafﬁc and TCP packets are rarely fragmented. In the following

section we introduce a general technique that does not rely on

IP-ID and also applies to IPv6.

A globally-incrementing identiﬁer is not really hidden from

Mallory, who can usually learn its value simply by receiving

some packet from the victim. A globally incrementing IP

identiﬁer is used in all Windows versions we tested (including

XP, Vista and 7) and is also the default conﬁguration in

FreeBSD; clients running these systems are vulnerable to the

attack below. The vast deployment of Windows on client machines (more than 70% according to browser user-agent based surveys, see [32]) makes the IP-ID attack vector very practical.

Section III-A deﬁnes a port test that uses the leakage in

the IP-ID ﬁeld to detect whether C is communicating with

S through a tested port. The test depends on whether C is connected to the network through a NAT or a stateful firewall that keeps track of existing connections; when C is connected through a NAT/firewall device, the attack is a bit simpler. We believe that this is the more common scenario,

since recent versions of Windows (XP SP2 and later) ship

with a built in (stateful) ﬁrewall that is enabled by default, and

furthermore, use of NAT devices in small local area networks

connecting clients to the Internet is common. Due to space

limitations we describe only this test and include the test for

the complementary scenario (no ﬁrewall/NAT) in an online

technical report [33].

In Section III-B we describe how Mallory can identify a

relatively small set of client ports to test for a connection

with S; Mallory performs the port-test for all of them. Section

III-C presents our experimental setup and empirical results.

A. Port-Test for a Client Behind a Firewall/NAT

According to the TCP specification [34] (Section 3.9, bottom of page 69), the first check that a recipient conducts on an incoming packet, in case it belongs to an established connection, validates that the sequence number is within the receive window. If this check finds the packet invalid, then the recipient discards the packet and sends a duplicate Ack in response. A stateful firewall or NAT device connecting C to the network keeps track of existing connections and processes all incoming packets before they reach C. We use the following observation: incoming packets that do not belong to an established connection will be discarded before reaching C (by the firewall/NAT), whereas packets that belong to an existing connection, but specify arbitrary (probably invalid) sequence numbers, will reach C, who replies with a duplicate Ack.
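The observation above can be illustrated with a toy model of the path a probe takes. This is a simplified sketch under stated assumptions: the `tracked` table (mapping a 4-tuple to the receiver's next expected sequence number and window size) and the outcome labels are hypothetical, and a real firewall or TCP stack performs many more checks.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32-bit values


def in_receive_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def probe_outcome(four_tuple, seq, tracked):
    """Classify what happens to a spoofed probe sent towards C.

    four_tuple: (src_ip, src_port, dst_ip, dst_port) claimed by the probe.
    tracked:    hypothetical connection table, 4-tuple -> (rcv_nxt, rcv_wnd).
    """
    state = tracked.get(four_tuple)
    if state is None:
        # No such connection: the firewall/NAT discards the probe.
        return "dropped"
    rcv_nxt, rcv_wnd = state
    if not in_receive_window(seq, rcv_nxt, rcv_wnd):
        # Existing connection, invalid sequence number: C sends a duplicate Ack.
        return "dup_ack"
    return "accepted"
```

Since the attacker picks the sequence number blindly, the `"accepted"` outcome is unlikely in this model; the two outcomes that matter for the port test are `"dropped"` and `"dup_ack"`.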

The port test for a client behind a firewall/NAT follows the general query-probe-query pattern. The probe specifies S’s address and port as source (i.e., the probe is spoofed) and C’s address as destination; Mallory specifies a different destination port in each test. Figure 5 illustrates two iterations of the port test: in the first iteration, the firewall/NAT blocks the probe packet (i.e., there is no connection through the tested port). In the second iteration, the probe specifies existing connection parameters (IP addresses and ports) and therefore reaches C, who processes the probe and sends a duplicate Ack to S.

Fig. 5. Two iterations of port test. (Diagram: Mal, Firewall/NAT and C. Each iteration consists of a pre-probe query, a probe and a post-probe query; each query response carries an IP-ID value. In the second iteration the probe elicits a Dup Ack from C, and an RST (NAT only).)

Footnote: S’s IP-ID implementation does not influence the probing technique.

Notice that since the probe packet appears to be from S

(in case it speciﬁes a valid 4-tuple), it is difﬁcult to block the

probe in ﬁrewalls without blocking the legitimate connection

that C has with S.

When C uses a global identifier, the difference in the IP-ID field in C’s responses to Mallory indicates whether C sent a packet in response to the probe (a duplicate Ack). If Mallory identifies that C sent a packet, then it is likely that C is communicating with S via the tested port; however, the identifier may also have increased because C sent an independent packet to some other peer. Repeating this test several times allows Mallory to efficiently detect whether C is connected to S and reveal C’s port; see the empirical evaluation below.

We keep a ‘score’ for each possible port, and increment

a speciﬁc port’s score by 1 point for every test that seems

to indicate that there is a connection through that port. We

conduct r > 1 rounds of the attack, where each port is tested.

Finally, we decide that there is a connection if there is a port

with a score higher than a threshold, TH.
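The scoring procedure can be sketched as a small simulation. This is an illustrative model rather than the experimental code: `noise_prob` (the chance that C sends an unrelated packet during one test) and expressing TH as a fraction of the number of rounds are assumptions made for the example.

```python
import random


def run_rounds(ports, connection_port, rounds, noise_prob, rng):
    """Simulate `rounds` rounds of the port test and return a score per port."""
    scores = {p: 0 for p in ports}
    for _ in range(rounds):
        for p in ports:
            # The probe elicits a duplicate Ack only on the connection port.
            extra = 1 if p == connection_port else 0
            # Unrelated traffic from C also advances the global IP-ID counter.
            if rng.random() < noise_prob:
                extra += 1
            if extra >= 1:
                scores[p] += 1
    return scores


def decide(scores, rounds, threshold_fraction):
    """Return the best port scoring above TH = threshold_fraction * rounds, or None."""
    th = threshold_fraction * rounds
    hits = [p for p, s in scores.items() if s > th]
    return max(hits, key=scores.get) if hits else None
```

For example, `decide(run_rounds(range(40000, 40010), 40003, 8, 0.1, random.Random(0)), 8, 0.7)` runs eight rounds over ten candidate ports with a 10% chance of background noise per test.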

Some ﬁrewalls have an option to randomize the IP-ID; our

tests would, of course, fail if the packets pass through such

randomizing ﬁrewall. The attack we describe in the following

section applies even in this scenario (but is less accurate).

1) Implementing Test Queries/Responses: Our attacks use packets that Mallory receives from C to learn the effect of the (spoofed) probe packet. Mallory can cause C to send her such packets by using the legitimate TCP connection that she has with C: a query is some short data packet that Mallory sends to C, and the response is C’s acknowledgment sent back to Mallory. This allows Mallory to bypass typical firewall defenses (e.g., Windows), since all packets in the test appear to belong to legitimate connections (requests to the C-Mallory connection, probe to the C-S connection). See further details in the technical report [33].

B. Improving the Search: Client Port Allocation Algorithms

The port test that we presented allows Mallory to test

whether the client has a connection to some server via a

specific port. There are $2^{16}$ possible ports that C might use to communicate with S. However, common client port allocation

paradigms allow more efﬁcient attacks.

Below we present two common paradigms and methods to

reduce the number of tests for each of them.

1) Globally Incrementing: the client port is initialized to a random value and incremented for every new connection; Algorithm 1 in [35] describes the implementation. This approach is used in Windows and FreeBSD. If C uses this port allocation paradigm, then recent connections that the client forms are likely to use ports ‘close’ to the one that C uses in the connection with Mallory. Hence, Mallory can test only these ports.
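A minimal sketch of how the candidate set could be built for this paradigm follows. The `window` size and the `low`/`high` bounds of the port range are hypothetical parameters, and wrap-around of the port counter is ignored for simplicity.

```python
def candidate_ports(mallory_port, window, low=1024, high=65535):
    """Ports close to the one C uses with Mallory, nearest first.

    Ports below mallory_port correspond to connections opened shortly
    before C connected to Mallory, ports above it to later connections.
    """
    candidates = []
    for distance in range(1, window + 1):
        for port in (mallory_port - distance, mallory_port + distance):
            if low <= port <= high:
                candidates.append(port)
    return candidates
```

Ordering the candidates by distance lets the attacker test the most likely ports first.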

2) Per-Server Incrementing: the client port is incremented for every new connection with the server. Connections to different servers use different counters. This approach is used in Linux; Algorithm 3 in [35] describes the implementation. The previous ‘trick’ we presented does not work in this case, since the port that C uses for the connection with Mallory does not correlate with the port that C uses to communicate with S. However, we can still use the counter property of this paradigm: Mallory causes C to create x ‘dummy’ connections to S (we explain how below); since these connections all share the same counter, their ports are sequential. Hence, Mallory can test every port $y \equiv 0 \pmod{x}$ and identify p, a port that C uses to communicate with S. Next, Mallory tests all ports in the interval [p − x, p + x] and checks whether there are at least x + 1 connection ports. If yes, then C has an ‘independent’ connection with S. In this method, the attacker would test roughly $2^{16}/x + 2x$ different ports.
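The two-stage search can be sketched as follows. Here `is_open` is a hypothetical oracle standing in for the (repeated) port test, and the sketch assumes that the independent connection's port was allocated from the same per-server counter shortly before the dummy connections, so that it lies next to them.

```python
PORT_SPACE = 1 << 16  # number of possible TCP ports


def detect_independent_connection(is_open, x):
    """Return (verdict, number_of_ports_tested) for x dummy connections."""
    tests = 0
    hit = None
    # Stage 1: x sequential dummy ports always contain a multiple of x,
    # so it suffices to test every port y with y = 0 (mod x).
    for y in range(0, PORT_SPACE, x):
        tests += 1
        if is_open(y):
            hit = y
            break
    if hit is None:
        return False, tests
    # Stage 2: count the connection ports in the interval [hit - x, hit + x].
    lo, hi = max(0, hit - x), min(PORT_SPACE - 1, hit + x)
    open_count = 0
    for port in range(lo, hi + 1):
        tests += 1
        if is_open(port):
            open_count += 1
    # More than x connection ports means one of them was not opened by Mallory.
    return open_count >= x + 1, tests
```

With x = 10 the first stage tests at most 6554 ports and the second stage at most 21, compared with 65536 tests for an exhaustive scan.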

It remains to describe how Mallory causes C to establish

multiple connections with S. Since C is on Mallory’s site, she

can run a script (in the browser sandbox) on C. This script,

while very limited, can open connections with other servers

to dynamically embed remote objects. We use it to open con-

nections to www1.mallory.com,. . . ,wwwx.mallory.com which

are domains that Mallory controls. Since Mallory controls the

DNS records for these domains, she sets each of these records

to point to the same IP, that of S. Browsers open a new

connection for each domain (regardless of its IP address);

hence, this technique, which we veriﬁed on Internet Explorer,

Firefox and Chrome, opens x new connections to S.

The typical limit on x is the number of connections that a browser can have simultaneously; this limit is typically one or a few dozen, e.g., 16 in Firefox. In our experiments

below and in the following section, we use x = 10.

C. Empirical Evaluation

1) Setup: In our empirical evaluation, the client network is a class C subnet that has 5 clients running Windows 7; each of them sends on average 64 packets per second to other peers in the subnet (these packets are short, to simulate clients that usually send Ack packets or short requests). Mallory probes one of the clients in the network, C, who connects to her (malicious) website. Mallory’s bandwidth is limited to 10 Mbps. We used the network topology illustrated in Figure 3; network nodes are connected through switch devices. The NAT device in the network topology is a Linux machine (kernel version 2.6.35) running IP-tables (version 1.4.4). The server machine runs Linux (kernel version 2.6.35) and uses an Apache web-server (version 2.2.14). When we evaluate the attack for the ‘Remote Attacker’ scenario, the adversary communicates with the clients via a traffic shaper that induces high latency (200 ms), significant loss probability (0.5%) and jitter (1-10 milliseconds).

Fig. 6. Global-ID attack. Comparison of the score of the connection port to that of the highest scoring non-connection port as a function of round number. Each measurement is an average of 10 runs; error-bars mark the standard deviation values. (Plot: score versus number of rounds, 0 to 10; series: Connection Port and Best Non-Connection Port, each for near-C and remote.)

2) Evaluation: We first evaluated the attack in the case that C is communicating with S. We compared the score of the ‘connection port’ (i.e., the port that C uses for the connection) to that of the best non-connection port (i.e., the port with the highest score that is not the connection port) in each round (repetition of the attack, see discussion above); note that the highest scoring non-connection port may change between rounds.

Figure 6 shows results for near-C and remote attackers. In both environments, the score of the connection port was well above 50% of the maximal score, certainly after five or more rounds; hence, for efficiency, we can continue testing only ‘high scoring’ ports in later rounds. Namely, a port is tested in the next round only if its current score is above 50% of the maximal possible score.
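The pruning rule amounts to a one-line filter. The sketch below assumes at most one point per round, so that the maximal possible score after `rounds_done` rounds equals `rounds_done`; `keep_fraction` is a hypothetical parameter name for the 50% cut-off.

```python
def surviving_ports(scores, rounds_done, keep_fraction=0.5):
    """Ports to test in the next round: those scoring above the cut-off."""
    max_possible = rounds_done  # one point per round at most
    cutoff = keep_fraction * max_possible
    return [port for port, score in scores.items() if score > cutoff]
```

Each later round then iterates only over the returned list, which reduces both the duration of the attack and the amount of data sent.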

We implemented an adversarial website that presents its

clients a request to arbitrarily decide whether to connect

(‘surf’) to a third-party website, S; our website attempted to

detect the clients’ choice. We used an automatic client, C, that

chooses to connect to S with probability

and implemented

the port-test above.

The choice of whether there is a connection between C and

S is according to a threshold over the ﬁnal score of the ports.

Namely, if there exists a port with a score over this threshold,

then we identify that there is a connection. Figure 6 shows

that a choice of 70% of the maximal possible score as a

threshold provides a good separation between the connection

port (in case it exists) and other ports. Figures 7 - 9 show

the success rate in detecting whether C communicates with S

for different adversary locations as a function of the duration

of the attack. Figure 10 compares the average amount of data

that Mallory sends (per victim) to reach different success rates

and for different locations in the network.

In Figures 7 - 10, the measurements are the average of

50 runs; error-bars mark the standard deviation values (for

readability, not all measurements specify the error bars). Note

that the thresholds that we have used in our evaluation may

not work as well in other scenarios; e.g., when the client sends much more than 64 packets per second, the thresholds should be higher.

IV. TIME-BASED TRAFFIC ANALYSIS

The globally incrementing IP-ID side channel, presented in

Section III, exploits an operating system ﬂaw. In this section

we explore a more generic, timing based, side channel that is

applicable when C is behind a ﬁrewall or a NAT. We deﬁne

below a new port test which resembles the IP-ID based port

test and is illustrated in Figure 5 as well.

The timing attack is based on the following observation: if C

is protected by a ﬁrewall or connects through a NAT device,

then in case that Mallory tests the correct port, C sends an

additional packet to S (response to the probe); this delays

processing of following packets, and in particular the post-

probe query; see illustration in Figure 5. We use this delay to

identify the connection.

A. Timing-Based Port Test

A signiﬁcant challenge is the jitter in the network, i.e., laten-

cies may vary while testing different ports. Thus, identifying

the longest time difference between two responses and testing

whether it is over a threshold is likely to produce an incorrect

result. We cope with this challenge by relatively comparing

ports: we assign each port to a small group of s arbitrary

ports.

Ports in each group are tested one after the other; we assume

that jitter does not vary much during the short time interval of

testing a speciﬁc (small) group. After testing a group, each port

is assigned with a relative rank according to the time difference

between responses in the corresponding port-test; the lower the

(group-relative) rank, the greater the time difference and the

more likely is a connection through that port. We conduct

several rounds of this attack (to reduce the probability of

errors).

Similarly to the attack presented in the previous section, we keep a score for each port and after each round increase a port’s score according to its rank. Denote by $\sigma_i$ the number of points that a port gains if it has rank $i$ within the group. These weights are normalized, i.e., $\sum_{i=1}^{s} \sigma_i = 1$, and for every $i < j$, $\sigma_i \geq \sigma_j$. The values of $s$ and the vector $\sigma = (\sigma_1, \cdots, \sigma_s)$ depend on the channel between Mallory and C. We employ a machine learning approach (genetic algorithm) to learn an appropriate value for the vector $\sigma$; see details of the algorithm in [33]. Let $\mu, \mu'$ denote the expected scores of connection and non-connection ports respectively. The target function of the learning algorithm is to maximize $\mu - \mu'$. In our empirical evaluation below we explain how Mallory obtains measurements of $\mu, \mu'$ for different values of $s, \sigma$.
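The rank-based scoring described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function name `score_group`, the dictionary layout and the example weights are assumptions made for illustration (the actual weights are learned, see [33]).

```python
from collections import defaultdict

def score_group(delays, sigma, scores):
    """Rank the ports of one group by measured delay and add rank weights.

    delays: dict mapping port -> time difference between the two query
            responses observed in that port's test (hypothetical layout).
    sigma:  weights (sigma_1, ..., sigma_s), non-increasing, summing to 1.
    scores: dict mapping port -> accumulated score, updated in place.
    """
    if len(delays) != len(sigma):
        raise ValueError("group size must equal the number of weights")
    # Rank 1 goes to the port with the largest time difference.
    ranked = sorted(delays, key=delays.get, reverse=True)
    for rank, port in enumerate(ranked):
        scores[port] += sigma[rank]
    return scores

# Illustrative values only: a group of s = 4 ports, two rounds, made-up weights.
sigma = [0.5, 0.3, 0.15, 0.05]
scores = defaultdict(float)
score_group({5001: 0.8, 5002: 2.4, 5003: 0.9, 5004: 0.7}, sigma, scores)
score_group({5001: 1.1, 5002: 2.0, 5003: 0.6, 5004: 0.9}, sigma, scores)
best_port = max(scores, key=scores.get)
```

Because only the order of the delays within a group matters, a shift in latency that affects the whole group equally does not change the scores.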

B. Empirical Evaluation

The environment we used to evaluate the timing attack is as described in Section III-C, except that the client machines run Linux (kernel version 2.6.35) instead of Windows; hence, the attacker cannot employ the global IP-ID based attack. All Linux distributions ship with the iptables firewall, whose rule-set is empty by default; we therefore evaluated only the scenarios where Mallory is near-C or remote (see Figure 3), i.e., Mallory communicates with C and S via a NAT device.

[Figure 11: plot of Score versus Number of Rounds, with curves for Connection Port (near-C), Best Non-Connection Port (near-C), Connection Port (remote), and Best Non-Connection Port (remote).]

Fig. 11. Timing Attack. Comparison of the score of a connection port to that of the highest scoring non-connection port as a function of round number. Average of 10 runs; error bars mark the standard deviation values.

The first task is to obtain a good estimate of the optimal values of s, σ for the channel between C and Mallory (this depends on Mallory’s location relative to C). The machine learning algorithm we employ uses the connection that Mallory has with C (see Figure 2): since Mallory knows the client’s port for this connection, she is able to obtain measurements for different group sizes (s) and weights (the vector σ); see more details in [33]. We found that these values differ significantly between the two attacker locations; e.g., in our setup we found s = 31 to be suitable for a near-C attacker, while s = 4 appeared optimal for the remote attacker. Figure 11 compares the connection-port score to that of the highest scoring non-connection port as a function of the number of rounds.

Next, we derive two thresholds for promoting ports to subsequent rounds according to their current score, as in the experiments in Section III-C. According to the training set results, a choice of 60% of the maximal score for the near-C attacker scenario and 40% for the remote (Internet) attacker scenario appears reasonable. As in Section III-C, these thresholds require further research for other scenarios; e.g., thresholds are affected by the victim’s transmission rate.

We implemented the timing attack and conducted an experiment similar to that presented in Section III-C. We set the threshold for deciding whether a connection exists between C and S according to the difference between the expected scores of a connection port ($\mu$) and a non-connection port ($\mu'$), as derived from our training measurements. See analysis in [33]; in this experiment we set the threshold to $0.2\mu' + 0.8\mu$. We measured our success rate in probing whether C is communicating with a (third-party) website, S. Results are in Figures 12-14.
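As a small illustration of the decision rule above, the following Python sketch computes the threshold as a weighted point between the two expected scores. The function names are hypothetical, and comparing the highest port score against the threshold with `>=` is an assumption; the analysis in [33] defines the exact procedure.

```python
def decision_threshold(mu_conn, mu_non, weight_conn=0.8):
    """Weighted point between the expected non-connection score (weight 0.2)
    and the expected connection score (weight 0.8)."""
    return (1.0 - weight_conn) * mu_non + weight_conn * mu_conn

def connection_exists(best_score, mu_conn, mu_non):
    """Assumed rule: report a connection if the best port score reaches the threshold."""
    return best_score >= decision_threshold(mu_conn, mu_non)

# Made-up expected scores, for illustration only.
threshold = decision_threshold(mu_conn=10.0, mu_non=4.0)
```

Placing the threshold closer to the connection-port score trades some false negatives for fewer false positives.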

[Figures 7-9: plots of Rate versus Attack Duration (seconds), with curves for Success, False Positives, and False Negatives.]

Fig. 7. Global-ID attack, with-C attacker.

Fig. 8. Global-ID attack, near-C attacker.

Fig. 9. Global-ID attack, remote attacker.

[Figure 10: plot of Data Sent (MB) versus Success Rate for the with-C, near-C, and remote attackers.]

Fig. 10. Amount of data Mallory sends as a function of her success rate.

Comparing these results to those of the ID based attack, more time is required to obtain similar success rates, and the maximal success rates reached are lower. However, the results show that the timing attack does provide information on the connection (since the success ratio is greater than 0.5), but its hint is often misleading (since the success ratio is significantly less than 1). The attacker can repeat the attack several times and decide by majority vote.
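To see why repetition helps, the following sketch computes the probability that a majority of repeated runs is correct. It assumes the runs are independent and share the same per-run success probability, which is an idealization not claimed by the experiments; the function name is hypothetical.

```python
from math import comb

def majority_success(p, n):
    """Probability that a strict majority of n independent runs is correct,
    given that each run is correct with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

# Example: a per-run success ratio of 0.7 repeated 5 times.
amplified = majority_success(0.7, 5)
```

Under these assumptions, any per-run success ratio above 0.5 improves with more repetitions, at the cost of a proportionally longer attack.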

Figure 15 illustrates the average amount of data that Mallory

needs to send in order to reach a particular success rate for

different locations in the network and number of probes in

each test.

In Figures 12 - 15, the measurements are the average of 50

runs; error-bars mark the standard deviation values.

V. TRAFFIC ANALYSIS FOR TOR CLIENTS

In this section we consider the second scenario presented

in Section II, where C uses an onion routing infrastructure to

connect to S. We focus on the popular Tor network, but similar

attacks may apply to other low latency anonymity networks.

In this section we assume that the attacker, Eve, is able to

eavesdrop on C (but not on S).

Here, Eve actively interferes with the possible connection between C and S and then tests whether the rate of packets that C receives has changed. If it has, this indicates that C communicates with S. At the time of writing, we have performed only preliminary testing of this attack; more work is required to evaluate its practicality.

[Figure 16: plot of Portion of Connections versus Number of Different Exit Nodes.]

Fig. 16. The portion of the 2000 circuits we created using the Tor client as a function of the number of different exit relays used.

A Tor client connects to a remote server via a chain of relays (proxies). The last relay in the chain, i.e., the exit relay, has a direct TCP connection with the server. The number of possible Tor exit relays is important for our attacks (since a direct connection exists between the exit relay and the server); the Tor network comprises a few thousand relays, about one thousand of which can serve as exit relays (see [11]). However, the number of different exit relays that a client is likely to use is significantly lower: first, a client can only use online relays; second, Tor clients typically choose the exit relays according to various parameters such as stability and bandwidth. We formed Tor circuits from two clients in different geographic locations and kept track of the exit relay that was used. The measurements show that 20% of the 2000 circuits that we created (using Tor client version 0.2.2.35) had one of 7 specific exit relays. Figure 16 illustrates our measurements.
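The concentration of exit-relay choices can be measured from a log of observed circuits. The following Python sketch shows one way to compute the portion of circuits covered by the k most frequent exit relays; the log format and function name are assumptions, and the example data is synthetic rather than taken from our measurements.

```python
from collections import Counter

def top_k_coverage(exit_relays, k):
    """Fraction of circuits whose exit relay is among the k most frequent ones.

    exit_relays: one exit-relay identifier per observed circuit (hypothetical format).
    """
    if not exit_relays:
        raise ValueError("no circuits observed")
    counts = Counter(exit_relays)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(exit_relays)

# Synthetic log: ten circuits over four exit relays.
log = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
coverage = top_k_coverage(log, 2)
```

Plotting this value for increasing k gives a curve of the kind summarized in Figure 16.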

[Figures 12-14: plots of Rate versus Attack Duration (seconds), with curves for Success, False Positives, and False Negatives.]

Fig. 12. Timing attack, near-C attacker. Mallory sends 2 probes in each test.

Fig. 13. Timing attack, near-C attacker. Mallory sends 5 probes in each test.

Fig. 14. Timing attack, remote attacker. Mallory sends 5 probes in each test.

[Figure 15: plot of Data Sent (MB) versus Success Rate for the near-C attacker with 2 probes, the near-C attacker with 5 probes, and the remote attacker with 5 probes.]

Fig. 15. Amount of data Mallory sends as a function of her success rate.

A. The Indirect Rate Reduction Attack

In this section we present an attack that uses the following

observation: if Eve inﬂuences the rate of communication

between S and the exit relay, then this, in turn, will change

the rate of the connection between C and the ﬁrst (entrance)

relay. Eve sees the latter connection and is able to detect the

change.

Since communication is encapsulated, Eve can only observe the aggregate rate of data that C receives from the entrance relay. This attack vector therefore weakens when C communicates with several other servers via one Tor circuit and the rate of C’s connection with S is small relative to that of the other servers.

The following attack uses TCP congestion control mechanisms to fake congestion events, thereby reducing the communication rate. This attack is based on the insight previously noted in Section III-A: by sending a (spoofed) packet to an exit relay, Eve causes that relay to immediately send a duplicate acknowledgment (Ack) to S, as long as Eve’s packet appears to belong to an existing connection between the exit relay and S. The duplicate Ack that the exit relay sends to S in response has a valid sequence number and S will accept it. A sequence of three duplicate Acks is interpreted by TCP as a congestion event, see [36]; when it occurs, S’s congestion window shrinks. The exact effect depends on the TCP implementation that the server runs. Until recently, the TCP Reno variant was the default in Linux (the common operating system of server machines); for this variant each congestion event halves the size of the congestion window. Recent Linux kernels use the TCP Cubic variant, where the TCP window size is multiplied by a constant of 0.8 for each congestion event.

[Figure 17: diagram with the labels Eve, Exit Relay, cwnd=4, and cwnd=2.]

Fig. 17. Eve causes the exit relay to send 3 duplicate Acks to S. S’s congestion window is halved as a result.

The congestion window size directly affects the transmission rate of the sender (S): S only sends as much as the congestion window allows. Thus, by causing the exit relay to send a sequence of 3 duplicate Acks to the server, Eve causes the latter to significantly reduce its sending rate. This attack is illustrated in Figure 17, which shows the effect when Eve sends the spoofed packets to an exit relay and port through which there is a connection with S.
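The effect of repeated fake congestion events on the congestion window can be estimated with a short calculation. The sketch below uses the multiplicative factors stated above (0.5 for Reno, 0.8 for Cubic) and assumes, as a simplification, that the window does not grow between events; the names are hypothetical.

```python
RENO_FACTOR = 0.5   # each congestion event halves the window
CUBIC_FACTOR = 0.8  # factor stated in the text for the Cubic variant

def window_after_events(cwnd, events, factor):
    """Congestion window after `events` back-to-back congestion events,
    ignoring any window growth between the events (simplifying assumption)."""
    return cwnd * factor ** events

reno_one_event = window_after_events(4, 1, RENO_FACTOR)       # the case drawn in Fig. 17
cubic_three_events = window_after_events(1.0, 3, CUBIC_FACTOR)
```

With the Cubic factor, three events leave roughly half of the original window, whereas with Reno a single event already achieves this.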

1) Attack Process: We use the asymmetry in the distribution of clients’ choice of exit relays to reduce the number of packets that the attacker needs to send to perform indirect rate reduction. Namely, while there are many exit relays available, there are only a few ‘likely’ exit relays that a client might use (see discussion above and Figure 16). For every server IP address s and likely exit relay x, Eve can optionally employ one of the attacks in Sections III and IV to identify those exit relays that communicate with the server. This optional step reduces the effort in the following steps of the attack. The techniques in Sections III and IV not only identify the existence of a connection between two peers, but also identify the client port: if a connection exists, then this is the port with the highest score; see details on how Eve employs these techniques in [33]. Next, for each of the ‘suspected’ connections, she performs the indirect rate reduction attack described above and checks which of the clients experienced a rate reduction. This process repeats several times for statistical confidence; after each iteration the attack is suspended to allow S’s congestion window to recover.

An important property of this attack is that the spoofed

packets that Eve sends to the exit relay in order to reduce the

server’s rate, are not client speciﬁc. Hence, in case that Eve

eavesdrops on multiple clients (e.g., a government spying on

its citizens) this attack would simultaneously check which of

these possible clients has a connection with S.

2) Characteristics of vulnerable connections: Since the attack repeats for several iterations with intermediate suspensions, it requires connections lasting several minutes (see evaluation below). Furthermore, the connection must be ‘active’, i.e., the server should send data to the client while the attack takes place; this allows Eve to detect rate reductions and allows the congestion window to recover when the attack is suspended. These types of connections include, for example, file transfers (over FTP or HTTP).

B. Analysis

Our analysis in this subsection assumes that Eve does not

try to detect a direct connection between the exit relays and the

server S (the optional step). Instead, she performs the indirect

rate reduction attack on every likely exit relay and all possible

ports.

When using Tor, clients connect to S via proxies; therefore, a client’s geographic location gives Eve no hint about the server IP address that the client will connect to (in case S has multiple physical servers, e.g., for load balancing). As a result, Eve must enumerate all the IP addresses of S during the attack.

For each of the $n$ server addresses and for every exit relay that Eve tries, she performs $2^{16}$ iterations, trying a different port in each iteration; for each port she sends three packets that would cause the exit relay to send three duplicate Acks to the server, if a connection exists through that port. These packets can be short, with only one byte of data, i.e., 41 bytes long. Hence, the overall data that Eve sends to a particular exit relay, using a particular source IP of S, in a single attack is $2^{16} \cdot 3 \cdot 41$ bytes $< 7.7$ MB (taking 1 MB as $2^{20}$ bytes). As shown in Figure 16, a small set of exit relays allows a good ‘hit’ rate. If Eve enumerates all $n$ possible server addresses and the seven most likely exit relays, then by our measurements the attack results in a ‘hit’ rate of about $1/5$ (see Figure 16); in this attack, she sends about $53 \cdot n$ MB in each round. As noted at the end of the previous subsection, Eve’s effort is amortized over the number of clients (victims) that she probes.
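The traffic volume in this analysis can be checked with a short calculation. The sketch below assumes 41-byte packets made of a 20-byte IPv4 header, a 20-byte TCP header without options and one data byte, and reports volumes in units of 2^20 bytes; the function names are hypothetical.

```python
PORTS = 2 ** 16             # every possible port of the exit relay
PACKETS_PER_PORT = 3        # three packets trigger three duplicate Acks
PACKET_BYTES = 20 + 20 + 1  # IPv4 header + TCP header (no options) + 1 data byte

def bytes_per_relay_and_address():
    """Bytes sent to one exit relay while spoofing one server address."""
    return PORTS * PACKETS_PER_PORT * PACKET_BYTES

def round_volume_mb(n_addresses, n_relays=7):
    """Volume of one round over all server addresses and likely exit relays,
    in units of 2**20 bytes."""
    return n_addresses * n_relays * bytes_per_relay_and_address() / 2 ** 20

single = bytes_per_relay_and_address()
one_address_round = round_volume_mb(1)
```

For a single server address and seven exit relays this gives a little under 54 units of 2^20 bytes per round, and the volume scales linearly with the number of server addresses.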

C. Empirical Evaluation

1) Setup: We used the Tor network to evaluate the indirect rate reduction attack. To simplify the experiment and limit the effect on other Tor users, we took the following measures: the restricted web-site server, a Linux machine (kernel version 2.6.35) running an Apache web-server (version 2.2.14), had only one IP address; furthermore, when running the attack, Eve was aware of the exit relay that was used and of the port it used for the connection. Given these three parameters, Eve only sends 3 packets of 41 bytes, i.e., 123 bytes, to carry out a single rate reduction iteration. Below, we describe the frequency of iterations and show that we send about 0.5 KB per second; we believe that this did not load the exit relay or harm other Tor users. The client machine in our experiments runs Windows 7 and uses Tor (version 0.2.1.30) to connect to web-servers. While running our evaluation, we created Tor circuits using 12 different exit relays.

2) Evaluation: First, we observed the effect of the rate reduction attack (the three duplicate Ack technique). To measure this effect, C connects to one of our servers through Tor; our server reports to Eve the IP address and port of the exit relay. Eve sends her packets only to the reported exit relay and only to the specific port used in the connection with our server. Eve performs three iterations of rate reduction every second, aiming to fake three congestion events and decrease the congestion window to about half of its size (in the case of the Cubic variant, since $0.8^3 \approx 0.51$). This implies that every second, Eve sends 369 bytes to the exit relay. In Figure 18 we compare the rate of packets that the client receives (as observed by Eve) under normal conditions and when Eve attacks the exit relay; our attack reduces the average rate.

We next tested the scenario considered in Section II, i.e., of a client that connects through Tor to one of two sites. Eve uses rate reduction to test whether the client is communicating with the restricted site. We conducted the experiment as follows: the victim C connects to one of two servers each time, each server being chosen with probability $1/2$. Regardless of the choice that the client makes, the ‘restricted’ server sends Eve an IP address and port, allegedly describing the exit relay connected to it. In case the client does not connect to the restricted server, these values specify an arbitrary exit relay and port. Eve then employs the attack above, performing three rate reductions per second and sending a total of 369 bytes per second to the specified exit relay.

[Figure 18: two plots of Incoming Packets In Last 100ms versus Time (seconds): (a) Normal Conditions, (b) Under Attack.]

Fig. 18. Comparison between the rate of a TCP connection (via Tor) in normal conditions and when under rate reduction attack.

[Figure 19: plot of Portion versus Attack Duration (seconds), with curves for Success Rate, False Positives, and False Negatives.]

Fig. 19. Eve’s success rate in detecting client access to a restricted site via Tor. Each measurement is the average of 20 runs. Server runs TCP cubic variant.

If the client’s rate decreased by at least 20% during the last 30 seconds, then the client’s score is incremented. The 20% threshold is motivated by the results in Figure 18, but may change in other scenarios, e.g., for a different server. This process is repeated; between iterations there is a 30-second suspension that allows the TCP connection between the server and exit relay to recover to its normal rate (in case the connection exists) and allows Eve to obtain a recent measurement of the average rate in C’s connection. Eventually, Eve decides that C is communicating with the restricted site if C has more than half of the possible points. Figure 19 shows Eve’s success rate as a function of the duration of the attack. In these experiments, the servers run the TCP Cubic variant; an improvement in success rate is observed when the server runs TCP Reno.
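The scoring procedure above can be sketched as follows. How the rates are measured is an assumption here: each iteration is represented by a pair consisting of the average rate observed during the preceding suspension and the average rate observed while the attack runs. The function names are hypothetical.

```python
def rate_dropped(baseline_rate, attack_rate, threshold=0.20):
    """True if the rate under attack is at least `threshold` below the baseline."""
    if baseline_rate <= 0:
        return False
    return (baseline_rate - attack_rate) / baseline_rate >= threshold

def client_uses_restricted_site(iterations, threshold=0.20):
    """iterations: list of (baseline_rate, attack_rate) pairs, one per iteration.
    The client gains one point per detected rate drop; the decision is positive
    if the client holds more than half of the possible points."""
    points = sum(rate_dropped(b, a, threshold) for b, a in iterations)
    return points > len(iterations) / 2

# Made-up packet rates (packets per 100 ms), for illustration only.
positive = client_uses_restricted_site([(100, 70), (100, 95), (100, 60)])
negative = client_uses_restricted_site([(100, 95), (100, 90), (100, 70)])
```

Requiring more than half of the points makes the decision robust to an occasional rate fluctuation that is unrelated to the attack.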

VI. DEFENSE MECHANISMS

The countermeasures that we propose in this section do not completely eliminate the related side channel threat; however, they make it more difficult to exploit. These defenses can be implemented on firewalls, which eases deployment.

The globally incrementing IP identifier side channel, as mentioned in Section III, is only relevant while still using IPv4. One way to avoid it is to use random IP-ID values; however, this can result in collisions and loss for fragmented traffic. The attack in Section III can be prevented by simply moving from globally-incrementing IP-ID to per-destination IP-ID; this would preferably be done by hosts, but until hosts do so, a firewall can implement this by adding a (pseudo)random per-destination offset to the IP-ID. See analysis and better ways to fix the IP-ID in [16], [15].

Footnote: We informed Microsoft of the IP-ID issues, but we are not aware of any intention to fix the IP-ID in Windows.
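A per-destination offset of the kind mentioned above could be derived with a keyed hash. The following Python sketch is one possible construction, not a mechanism specified in this paper: the use of HMAC-SHA256, the secret key held by the firewall, and the function name are all assumptions.

```python
import hashlib
import hmac

def per_destination_ipid(global_id, dst_ip, key):
    """Rewrite a globally incrementing 16-bit IP-ID with a destination-specific offset.

    global_id: the IP-ID chosen by the host (0..65535).
    dst_ip:    destination address as text, e.g. "192.0.2.7".
    key:       secret bytes known only to the firewall (hypothetical).
    """
    digest = hmac.new(key, dst_ip.encode("ascii"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:2], "big")  # pseudorandom 16-bit offset
    return (global_id + offset) % 2 ** 16

key = b"example-firewall-secret"
first = per_destination_ipid(1000, "192.0.2.7", key)
second = per_destination_ipid(1001, "192.0.2.7", key)
```

Each destination still sees identifiers that increase by one per packet sent by the host, which keeps fragment reassembly working, but an observer who does not know the key cannot directly compare counter values seen at different destinations.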

It is more challenging to block or reduce the timing side channels and cope with the rate reduction attack presented in Section V. The flaw that we identify is that a blind adversary is able to trigger an involuntary reaction from a TCP recipient by sending arbitrary (spoofed) packets. We propose keeping a small window of acceptable sequence numbers that may be processed. This window resembles the receiver’s receive window, but is more aggressive: while packets outside the receive window cause a duplicate acknowledgment (which we use in the attacks described in Sections III-V), packets that specify sequence numbers outside the acceptable-window are silently discarded. The acceptable-window is larger than the host’s receive window and includes it. A receive window is usually up to $2^{16}$ bytes; an acceptable-window that is twice as large, i.e., $2^{17}$ bytes, will significantly degrade the attacker’s ability to conduct all the attacks in this paper. Since the sequence number is 32 bits long, the attacker is required to send $2^{32}/2^{17} = 2^{15}$ times the number of packets to conduct similar attacks. However, this technique requires that the firewall inspect the sequence numbers in incoming TCP packets, which increases the packet processing overhead.
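The acceptable-window check can be sketched with modular arithmetic on 32-bit sequence numbers. How a firewall tracks the start of the window for each connection is outside the scope of this sketch and is left as an assumption; the function names are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits long

def in_acceptable_window(seq, window_start, window_size=2 ** 17):
    """True if seq lies in [window_start, window_start + window_size) modulo 2**32.
    Packets for which this returns False would be silently discarded."""
    return (seq - window_start) % SEQ_SPACE < window_size

def blind_guess_factor(window_size=2 ** 17):
    """Number of equally sized windows in the sequence space, i.e. the factor
    by which a blind attacker must multiply the number of packets sent."""
    return SEQ_SPACE // window_size

wraps_around = in_acceptable_window(5, 2 ** 32 - 10)
factor = blind_guess_factor()
```

The modular comparison keeps the check correct when the window wraps around the end of the sequence space.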

VII. CONCLUSIONS AND FUTURE WORK

Our primary conclusion is that TCP implementations leak information that allows attackers to learn of the existence of connections via side channels, as we demonstrated in three attacks.

We leave several research directions for future work. Specifically, a more extensive empirical study is required to complete the evaluation of the Indirect Rate Reduction attack on the Tor network. Furthermore, it would be desirable to provide an analytical treatment of the attacks presented in this paper.

An important question is whether a more efficient and more accurate attack on Tor anonymity can be obtained by combining the indirect rate reduction attack presented in this paper with other existing attacks on Tor anonymity that exploit other attack vectors, e.g., [23], [12], [13], [14].

Acknowledgements

Thanks to Moti Geva, Amit Klein, Roger Dingledine and

the anonymous referees for their comments and suggestions.

REFERENCES

[1] T. Dierks and E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2,” RFC 5246 (Proposed Standard), Internet Engineering Task Force, Aug. 2008, updated by RFCs 5746, 5878, 6176. [Online]. Available: http://www.ietf.org/rfc/rfc5246.txt

[2] P. C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” in CRYPTO’96, ser. LNCS, N. Koblitz, Ed., vol. 1109, IACR. Springer-Verlag, Germany, 1996, pp. 104–113. [Online]. Available: http://www.cryptography.com/timingattack/paper.html, http://www.cryptography.com/timingattack/timing.pdf

[3] E. W. Felten and M. A. Schneider, “Timing Attacks on Web Privacy,” in Proceedings of the 7th ACM Conference on Computer and Communications Security, S. Jajodia, Ed. Greece: ACM Press, Nov. 2000, pp. 25–32. [Online]. Available: http://www.acm.org/pubs/articles/proceedings/commsec/352600/p25-felten/p25-felten.pdf

[4] S. Chakravarty, A. Stavrou, and A. D. Keromytis, “Traffic Analysis against Low-Latency Anonymity Networks Using Available Bandwidth Estimation,” in ESORICS, D. Gritzalis, B. Preneel, and M. Theoharidou, Eds., vol. 6345. Springer, 2010, pp. 249–267. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-15497-3

[5] P. Mittal, A. Khurshid, J. Juen, M. Caesar, and N. Borisov, “Stealthy Traffic Analysis of Low-Latency Anonymous Communication Using Throughput Fingerprinting,” in ACM Conference on Computer and Communications Security. ACM, 2011, pp. 215–226. [Online]. Available: http://doi.acm.org/10.1145/2046707.2046732

[6] S. Zander and S. J. Murdoch, “An Improved Clock-Skew Measurement Technique for Revealing Hidden Services,” in USENIX Security Symposium, P. C. van Oorschot, Ed. USENIX Association, 2008, pp. 211–226. [Online]. Available: http://www.usenix.org/events/sec08/tech/full_papers/zander/zander.pdf

[7] G. Lyon, Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. http://nmap.org/book/, 2009.

[8] M. Zalewski, Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks. No Starch Press, 2005.

[9] S. M. Bellovin, “A Technique for Counting Natted Hosts,” in Internet Measurement Workshop. ACM, 2002, pp. 267–272. [Online]. Available: http://doi.acm.org/10.1145/637201.637243

[10] R. Dingledine, N. Mathewson, and P. F. Syverson, “Tor: The Second-Generation Onion Router,” in USENIX Security Symposium. USENIX, 2004, pp. 303–320. [Online]. Available: http://www.usenix.org/publications/library/proceedings/sec04/tech/dingledine.html

[11] “Tor Metrics Portal. Network and Usage Graphs,” http://metrics.torproject.org/graphs.html, Nov. 2011.

[12] A. Hintz, “Fingerprinting websites using traffic analysis,” in Privacy Enhancing Technologies, ser. Lecture Notes in Computer Science, R. Dingledine and P. F. Syverson, Eds., vol. 2482. Springer, 2002, pp. 171–178. [Online]. Available: http://dx.doi.org/10.1007/3-540-36467-6

[13] S. Kadloor, X. Gong, N. Kiyavash, T. Tezcan, and N. Borisov, “Low-Cost Side Channel Remote Traffic Analysis Attack in Packet Networks,” in ICC. IEEE, 2010, pp. 1–5. [Online]. Available: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5497983

[14] A. Panchenko, L. Niessen, A. Zinnen, and T. Engel, “Website Fingerprinting in Onion Routing Based Anonymization Networks,” in Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, ser. WPES ’11. New York, NY, USA: ACM, 2011, pp. 103–114. [Online]. Available: http://doi.acm.org/10.1145/2046556.2046570

[15] F. Gont, “Security Assessment of the Internet Protocol Version 4,” RFC 6274 (Informational), Internet Engineering Task Force, Jul. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6274.txt

[16] Y. Gilad and A. Herzberg, “Fragmentation Considered Vulnerable: Blindly Intercepting and Discarding Fragments,” in Proc. USENIX Workshop on Offensive Technologies, Aug 2011. [Online]. Available: http://www.usenix.org/events/woot11/tech/final/files/Gilad.pdf

[17] S. Sanfilippo, “About the IP Header ID,” http://www.kyuzz.org/antirez/papers/ipid.html, Dec 1998.

[18] S. Sanfilippo, “A New TCP Scan Method,” http://seclists.org/bugtraq/1998/Dec/79, 1998.

[19] G. Danezis, “The Traffic Analysis of Continuous-Time Mixes,” in Proceedings of Privacy Enhancing Technologies workshop (PET), 2004, pp. 35–50.

[20] B. N. Levine, M. K. Reiter, C. Wang, and M. K. Wright, “Timing Attacks in Low-Latency Mix-Based Systems,” in Proc. Financial Cryptography, A. Juels, Ed. Springer-Verlag, LNCS 3110, Feb. 2004, pp. 251–265.

[21] Zhu, Fu, Graham, Bettati, and Zhao, “On Flow Correlation Attacks and Countermeasures in Mix Networks,” in International Workshop on Privacy Enhancing Technologies (PET), LNCS, vol. 4, 2004.

[22] R. Pries, W. Yu, X. Fu, and W. Zhao, “A New Replay Attack Against Anonymous Communication Networks,” in IEEE International Conference on Communications (ICC), 2008, pp. 1578–1582. [Online]. Available: http://dx.doi.org/10.1109/ICC.2008.305

[23] N. S. Evans, R. Dingledine, and C. Grothoff, “A Practical Congestion Attack on Tor Using Long Paths,” in USENIX Security Symposium. USENIX Association, 2009, pp. 33–50. [Online]. Available: http://www.usenix.org/events/sec09/tech/full_papers/evans.pdf

[24] S. J. Murdoch and G. Danezis, “Low-Cost Traffic Analysis of Tor,” in IEEE Symposium on Security and Privacy. IEEE Computer Society, 2005, pp. 183–195. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/SP.2005.12

[25] S. Kent and K. Seo, “Security Architecture for the Internet Protocol,” RFC 4301 (Proposed Standard), Internet Engineering Task Force, Dec. 2005, updated by RFC 6040. [Online]. Available: http://www.ietf.org/rfc/rfc4301.txt

[26] F. Baker and P. Savola, “Ingress Filtering for Multihomed Networks,” RFC 3704 (Best Current Practice), Internet Engineering Task Force, Mar. 2004. [Online]. Available: http://www.ietf.org/rfc/rfc3704.txt

[27] P. Ferguson and D. Senie, “Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing,” RFC 2827 (Best Current Practice), Internet Engineering Task Force, May 2000, updated by RFC 3704. [Online]. Available: http://www.ietf.org/rfc/rfc2827.txt

[28] T. Killalea, “Recommended Internet Service Provider Security Services and Procedures,” RFC 3013 (Best Current Practice), Internet Engineering Task Force, Nov. 2000. [Online]. Available: http://www.ietf.org/rfc/rfc3013.txt

[29] Advanced Network Architecture Group, “Spoofer Project,” http://spoofer.csail.mit.edu/summary.php, May 2013.

[30] T. Ehrenkranz and J. Li, “On the State of IP Spoofing Defense,” ACM Transactions on Internet Technology, vol. 9, no. 2, pp. 6:1–6:29, 2009. [Online]. Available: http://doi.acm.org/10.1145/1516539.1516541

[31] S. Deering and R. Hinden, “Internet Protocol, Version 6 (IPv6) Specification,” RFC 2460 (Draft Standard), Internet Engineering Task Force, Dec. 1998, updated by RFCs 5095, 5722, 5871, 6437, 6564, 6935. [Online]. Available: http://www.ietf.org/rfc/rfc2460.txt

[32] Wikipedia, “Usage Share of Operating Systems,” http://en.wikipedia.org/wiki/Usage_share_of_operating_systems, December 2011.

[33] Y. Gilad and A. Herzberg, “Spying in the Dark: TCP and Tor Traffic Analysis,” http://u.cs.biu.ac.il/~herzbea/security/TR/TR12_02, Bar Ilan University, Tech. Rep. 2, April 2012.

[34] J. Postel, “Transmission Control Protocol,” RFC 793 (INTERNET STANDARD), Internet Engineering Task Force, Sep. 1981, updated by RFCs 1122, 3168, 6093, 6528. [Online]. Available: http://www.ietf.org/rfc/rfc793.txt

[35] M. Larsen and F. Gont, “Recommendations for Transport-Protocol Port Randomization,” RFC 6056 (Best Current Practice), Internet Engineering Task Force, Jan. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6056.txt

[36] M. Allman, V. Paxson, and E. Blanton, “TCP Congestion Control,” RFC 5681 (Draft Standard), Internet Engineering Task Force, Sep. 2009. [Online]. Available: http://www.ietf.org/rfc/rfc5681.txt

De-Anonymisation Attacks on Tor: A Survey

Article

Full-text available

Jul 2021

Anonymity networks are becoming increasingly popular in today’s online world as more users attempt to safeguard their online privacy. Tor is currently the most popular anonymity network in use and provides anonymity to both users and services (hidden services). However, the anonymity provided by Tor is also being misused in various ways. Hosting illegal sites for selling drugs, hosting command and control servers for botnets, and distributing censored content are but a few such examples. As a result, various parties, including governments and law enforcement agencies, are interested in attacks that assist in de-anonymising the Tor network, disrupting its operations, and bypassing its censorship circumvention mechanisms. In this survey paper, we review known Tor attacks and identify current techniques for the de-anonymisation of Tor users and hidden services. We discuss these techniques and analyse the practicality of their execution method. We conclude by discussing improvements to the Tor framework that help prevent the surveyed de-anonymisation attacks.

On the Challenges of Geographical Avoidance for Tor

Conference Paper

Jan 2019

ShorTor: Improving Tor Network Latency via Multi-hop Overlay Routing

Preprint

Full-text available

Apr 2022

We present ShorTor, a protocol for reducing latency on the Tor network. ShorTor uses multi-hop overlay routing, a technique typically employed by content delivery networks, to influence the route Tor traffic takes across the internet. ShorTor functions as an overlay on top of onion routing-Tor's existing routing protocol and is run by Tor relays, making it independent of the path selection performed by Tor clients. As such, ShorTor reduces latency while preserving Tor's existing security properties. Specifically, the routes taken in ShorTor are in no way correlated to either the Tor user or their destination, including the geographic location of either party. We analyze the security of ShorTor using the AnoA framework, showing that ShorTor maintains all of Tor's anonymity guarantees. We augment our theoretical claims with an empirical analysis. To evaluate ShorTor's performance, we collect a real-world dataset of over 400,000 latency measurements between the 1,000 most popular Tor relays, which collectively see the vast majority of Tor traffic. With this data, we identify pairs of relays that could benefit from ShorTor: that is, two relays where introducing an additional intermediate network hop results in lower latency than the direct route between them. We use our measurement dataset to simulate the impact on end users by applying ShorTor to two million Tor circuits chosen according to Tor's specification. ShorTor reduces the latency for the 99th percentile of relay pairs in Tor by 148 ms. Similarly, ShorTor reduces the latency of Tor circuits by 122 ms at the 99th percentile. In practice, this translates to ShorTor truncating tail latencies for Tor which has a direct impact on page load times and, consequently, user experience on the Tor browser.

Achieving Sender Anonymity in Tor against the Global Passive Adversary

Article

Full-text available

Dec 2021

Tor is the de facto standard used for anonymous communication over the Internet. Despite its wide usage, Tor does not guarantee sender anonymity, even in a threat model in which the attacker passively observes the traffic at the first Tor router. In a more severe threat model, in which the adversary can perform traffic analysis on the first and last Tor routers, relationship anonymity is also broken. In this paper, we propose a new protocol extending Tor to achieve sender anonymity (and then relationship anonymity) in the most severe threat model, allowing a global passive adversary to monitor all of the traffic in the network. We compare our proposal with Tor through the lens of security in an incremental threat model. The experimental validation shows that the price we have to pay in terms of network performance is tolerable.

Design and Implementation of Tor Traffic Collection System Using Multiple Virtual Machines

Article

Jan 2019

Hyun-Jae Choi

DAENet: Making Strong Anonymity Scale in a Fully Decentralized Network

Article

Jan 2021

Traditional anonymous networks (e.g., Tor) are vulnerable to traffic analysis attacks that monitor the whole network traffic to determine which users are communicating. To preserve user anonymity against traffic analysis attacks, the emerging mix networks shuffle the order of packets through a set of centralized and explicit shuffling nodes. However, this centralized design of mix networks is insecure against targeted DoS attacks that can completely block these shuffling nodes. In this paper, we present DAENet, an efficient mix network that resists both targeted DoS attacks and traffic analysis attacks with a new abstraction called the Stealthy Peer-to-Peer (P2P) Network. The stealthy P2P network effectively hides the shuffling nodes used in a routing path within the whole network, such that adversaries cannot distinguish specific shuffling nodes and conduct targeted DoS attacks to block these nodes. In addition, to handle traffic analysis attacks, we leverage the confidentiality and integrity protection of Intel SGX to ensure trustworthy packet shuffles at each distributed host, and use multiple routing paths to prevent adversaries from tracking and revealing user identities. We show that our system is scalable with moderate latency (2.2s) when running in a cluster of 10,000 participants and is robust in the case of machine failures, making it an attractive new design for decentralized anonymous communication. DAENet's code is released on http://github.com/tdsc0652/dae-net.

Cross-Site Search Attacks: Unauthorized Queries over Private Data

Conference Paper

Oct 2020

Cross-site search attacks allow a rogue website to expose private, sensitive user information from web applications. The attacker exploits timing and other side channels to extract the information, using cleverly-designed cross-site queries. In this work, we present a systematic approach to the study of cross-site search attacks. We begin with a comprehensive taxonomy, clarifying the relationships between different types of cross-site search attacks, as well as relationships to other attacks. We then present, analyze, and compare cross-site search attacks; we present new attacks that have improved efficiency and can circumvent browser defenses, and compare them to already-published attacks. We develop and present a reproducibility framework, which allows study and evaluation of different cross-site attacks and defenses. We also discuss defenses against cross-site search attacks, for both browsers and servers. We argue that server-based defenses are essential, including restricting cross-site search requests.

ShorTor: Improving Tor Network Latency via Multi-hop Overlay Routing

Conference Paper

May 2022

An enhanced intelligent model: To protect marine IoT sensor environment using ensemble machine learning approach

Article

Dec 2021
OCEAN ENG

Research in marine sensors and the Internet of Things (IoT) has grown exponentially with the ample warehouse of natural materials in the sea. The growing activities in the marine sensor environment have increased the threat of anomalies and cyber-attacks. Many Intrusion Detection Systems (IDS) and classical machine learning-based models have been proposed to secure the sensor-based IoT infrastructure. Still, these mechanisms have failed to achieve significant results for securing the marine sensor environment due to the discriminant requirements of the IoT appliances in deep oceans, such as distribution, information complexity, scalability, higher network bandwidth requirements, and low computational capacity. Hence, we propose a lightweight and robust ensemble model to secure the marine IoT environment from cyber-attacks and malicious activities. This paper establishes an optimized Light Gradient Boosting Machine (Light-GBM) algorithm for ocean IoT attack detection. The experiments were conducted on the Distributed Smart Space Orchestration System (DS2OS) dataset. The proposed methodology includes a label encoding technique for best feature selection, hyper-parameter tuning, an ensemble function, and a novel algorithm to develop an ocean IoT attack detection model. As an extension of traditional methods, the optimized Light-GBM model can handle distributed IoT attacks in deeper marine environments with low computational cost and with 98.52% detection accuracy. The comparative analysis confirms the effectiveness of the proposed model for marine sensor safety. In conclusion, the proposed model mitigates the threat of cyber-attacks in the marine sensor environment and presents a promising future for real-time ocean-based IoT applications.

Off-Path TCP Hijacking Attacks via the Side Channel of Downgraded IPID

Article

Oct 2021

In this paper, we uncover a new off-path TCP hijacking attack that can be used to terminate victim TCP connections or inject forged data into victim TCP connections by manipulating the new mixed IPID assignment method, which is widely used in Linux kernel versions 4.18 and beyond. Our attack has three steps. First, an off-path attacker can downgrade the IPID assignment for TCP packets from the more secure per-socket-based policy to the less secure hash-based policy, thus building a shared IPID counter that forms a side channel in the victim. Second, the attacker detects the presence of TCP connections by observing the side channel of the shared IPID counter. Third, the attacker infers sequence and acknowledgment numbers of the detected connection by observing the side channel. Consequently, the attacker can completely hijack the connection, e.g., resetting the connection or poisoning the data stream. We evaluate the impacts of our attack in the real world, and we uncover that more than 20% of Alexa top 100k websites are vulnerable to our attack. Our case studies of SSH DoS, manipulating web traffic, and poisoning BGP routing tables show its threat to a wide range of applications. Moreover, we demonstrate that our attack can be further extended to exploit IPv4/IPv6 dual-stack networks, increasing hash collisions and enlarging the vulnerable population. Finally, we analyze the root cause and develop a new IPID assignment method to defeat this attack. We prototype our defense in Linux 4.18 and confirm its effectiveness in the real world.