DESKS: Direction-Aware Spatial Keyword Search
Guoliang Li, Jianhua Feng, Jing Xu
Department of Computer Science, Tsinghua University, Beijing 100084, China
liguoliang@tsinghua.edu.cn; fengjh@tsinghua.edu.cn; xmandbq@gmail.com
Abstract— Location-based services (LBS) have been widely
accepted by mobile users. Many LBS users have direction-aware
search requirements: the answers must lie in a given search direction.
However, to the best of our knowledge, no existing research
investigates direction-aware search. A straightforward method
first finds candidates without considering the direction constraint,
and then generates the answers by pruning the candidates that
violate the direction constraint. However, this method is rather
expensive, as it involves a lot of useless computation on many
unnecessary directions. To address this problem, we propose a
direction-aware spatial keyword search method which inherently
supports direction-aware search. We devise novel direction-aware
indexing structures to prune unnecessary directions. We develop
effective pruning techniques and search algorithms to efficiently
answer a direction-aware query. As users may dynamically change
their search directions, we propose to answer a query incrementally.
Experimental results on real datasets show that our method achieves
high performance and outperforms existing methods significantly.
I. INTRODUCTION
Location-based services (LBS) have been widely accepted
by mobile users. Many online location-based services are
available, such as AT&T (http://www.wireless.att.com/lbs) and
go2 (http://www.go2.com/). Recently, many LBS users have
direction-aware search requirements: the answers must lie in a
given search direction. For example, a user on the highway wants
to find the nearest gas stations or restaurants. She requires that
the answers be ahead of her and on the right side of her driving
direction, if she is in a right-hand traffic country (e.g., US
and China). As another example, consider a user walking
to a supermarket. She wants to find an ATM along her walking
direction so as to avoid a long walk. In this case she also has
a direction-aware search requirement. There are many other
direction-aware search requirements in LBS, e.g., multiple
destination routing and virtual reality (to show local 3D
streetscape). More importantly, many modern mobile phones
(e.g., iPhone 4 and HTC) have GPS and a compass. We can
easily get a user's location via the GPS and her direction via the
compass. Thus we can utilize the user's location and search
direction to improve the search experience in LBS.
However, to the best of our knowledge, no existing research
investigates direction-aware search.
A straightforward method to support direction-aware search
first finds the candidates without considering the direction
constraint (e.g., [6] and [5]) and then generates the answers
by pruning the candidates that violate the direction constraint.
However, this method is rather expensive, as it involves
a lot of useless computation on many unnecessary directions.
To address this problem, we propose a direction-aware
spatial keyword search method, called DESKS, which inherently
supports direction-aware search. We first formulate the
problem of direction-aware spatial keyword search as follows.
Consider a set of Points of Interest (POIs) where each POI
is associated with spatial information and a textual description.
Given a direction-aware spatial keyword query with a location,
a direction, and a set of keywords, the direction-aware search
finds the k nearest neighbors of the query which are in the search
direction and contain all input keywords.
To support direction-aware spatial keyword queries, we
devise novel direction-aware index structures to prune
unnecessary directions. We first group the POIs based on their
distances to the bottom-left point of the Minimum Bounding
Rectangle (MBR) that contains all POIs. Then we sort the POIs
in each group based on their directions to the bottom-left
point. Given a query, we can deduce a direction range with
a lower direction bound and an upper direction bound. We
can prove that if the direction of a POI to the bottom-left
point is not in the direction range of the query, the POI cannot
be an answer, and we can prune it. Similarly, we can also
prune a group of POIs based on the direction range. Motivated
by this observation, we develop novel direction-aware index
structures, effective pruning techniques, and efficient search
algorithms to facilitate direction-aware spatial keyword search.
To summarize, we make the following contributions.
• We formulate the problem of direction-aware spatial
keyword search and propose an efficient direction-aware
search method to address this problem.
• We devise a novel direction-aware index structure which
groups the POIs based on their distances and directions.
The index structure can be used to effectively prune
many unnecessary POIs.
• We develop effective pruning techniques and search
algorithms to answer direction-aware spatial keyword queries.
As mobile phone users may dynamically change search
directions, we propose to incrementally answer a query
based on the cached results of previously issued queries.
• We have implemented our method, and the experimental
results show that our method achieves high performance
and outperforms existing methods significantly.
The rest of this paper is organized as follows. We first
formulate the problem of direction-aware spatial keyword
search and devise a novel indexing structure in Section II. We
develop effective pruning techniques in Section III. Section IV
gives efficient algorithms to answer a direction-aware query.
We discuss how to incrementally answer a query in Section V.
Experimental results are provided in Section VI. We review
related work in Section VII and conclude in Section VIII.
II. DIRECTION-AWARE SPATIAL KEYWORD SEARCH
A. Problem Formulation
Data: Consider a set of POIs, $\mathcal{P} = \{p_1, p_2, \cdots, p_{|\mathcal{P}|}\}$. Each
POI $p_i$ has a location $(p_i.x, p_i.y)$, where $p_i.x$ is the x-coordinate
and $p_i.y$ is the y-coordinate of the POI. $p_i$ is also
associated with a set of keywords, denoted by $p_i.d$. Thus a
POI is denoted by $p = \langle (p.x, p.y);\ p.d \rangle$.
Query: A query $q$ contains a location $(q.x, q.y)$ with an x-coordinate
$q.x$ and a y-coordinate $q.y$. Query $q$ has a direction
constraint $[\alpha, \beta]$, which denotes that the user is only interested
in the POIs whose directions to $q$ are in $[\alpha, \beta]$. Query $q$ contains
a set of user-input keywords $\mathcal{K} = \{k_1, k_2, \cdots, k_{|\mathcal{K}|}\}$. Users
can specify an integer $k$ to find the top-$k$ relevant answers. Thus
query $q$ is denoted by $q = \langle (q.x, q.y);\ [\alpha, \beta];\ \mathcal{K};\ k \rangle$.
Answer: Let $\mathcal{R}$ denote the Minimum Bounding Rectangle
(MBR) that contains all POIs in $\mathcal{P}$. Given a query $q$ with
direction $[\alpha, \beta]$, let $S_q$ denote the sector centered at $q$ with
radius $r$ and an angle from $\alpha$ to $\beta$, where $r$ is the maximal
distance from $q$ to the boundary of region $\mathcal{R}$. Let $\mathcal{R}_q$ denote
the intersection of $S_q$ and $\mathcal{R}$, which is the search region
satisfying the direction constraint. A POI $p$ is an answer of
query $q$ if $p$ is in $\mathcal{R}_q$ and $p.d$ contains all keywords in $\mathcal{K}$.
Let $\mathcal{P}_q$ denote the set of all answers of $q$. We find the $k$ nearest
neighbors of $q$ from $\mathcal{P}_q$. Next we formulate our problem.
Definition 1 (DIRECTION-AWARE SPATIAL KEYWORD
SEARCH) Given a set of POIs $\mathcal{P}$ and a query
$q = \langle (q.x, q.y);\ [\alpha, \beta];\ \mathcal{K};\ k \rangle$, let $\mathcal{P}_q$ denote the set of POIs in $\mathcal{R}_q$
that contain all keywords in $\mathcal{K}$. DESKS finds a subset $\mathcal{P}_q^k$ of
$\mathcal{P}_q$ with $k$ POIs such that $\forall p \in \mathcal{P}_q^k$ and $\forall p' \in \mathcal{P}_q - \mathcal{P}_q^k$,
$dist(p, q) \le dist(p', q)$, where $dist(\cdot)$ is a distance function
and in this paper we use Euclidean distance∗.
Consider an example in Figure 1. There are 24 POIs. Given
a query $q$ with keywords “chinese food”, the ten highlighted
POIs $p_3, p_4, p_5, p_6, p_9, p_{12}, p_{15}, p_{21}, p_{22}, p_{23}$ contain the two
keywords. If we have no direction constraint, $p_3$ and $p_4$ are the two
nearest neighbors. If we have the direction constraint shown in
Figure 1, $p_{12}$ and $p_{22}$ are the two nearest neighbors.
We can extend existing spatial keyword search methods (e.g.,
[6] and [5]) to support our problem. The method contains two
steps. (1) The filter step: It ignores the direction constraint
and finds the $k$ nearest neighbors of query $q$ which contain all
keywords. (2) The verification step: For each POI found in the
first step, it checks whether the POI is in the search direction.
If yes, the POI is an answer of $q$. As most of the $k$ nearest
neighbors of $q$ may violate the direction constraint, the method needs
to repeatedly execute the two steps until it finds $k$ answers.
Although we can incorporate the verification step into the filter
step, this method still needs to visit many unnecessary POIs.
To address this problem, we propose a direction-aware spatial
keyword search method to achieve high performance.
B. Direction-aware Indexing Structures
Given a set of POIs, we first generate the MBR $\mathcal{R}$ that
contains all POIs. Let $O_{bl}$, $O_{br}$, $O_{tr}$, $O_{tl}$ respectively denote
the bottom-left point, the bottom-right point, the top-right
point, and the top-left point of $\mathcal{R}$, as shown in Figure 1.

∗We suppose $q \in \mathcal{R}$; our method can be extended to support $q \notin \mathcal{R}$.

[Figure 1: the 24 POIs in the MBR, the query and its search direction, the sub-regions, and the region (R) and POI (P) inverted lists of the two query keywords.]
$\mathcal{R}_{11} = \{p_1, p_2\}$, $\mathcal{R}_{12} = \{p_3, p_4\}$, $\mathcal{R}_{13} = \{p_5, p_6\}$, $\mathcal{R}_{14} = \{p_7, p_8\}$
$\mathcal{R}_{21} = \{p_9, p_{10}\}$, $\mathcal{R}_{22} = \{p_{11}, p_{12}\}$, $\mathcal{R}_{23} = \{p_{13}, p_{14}\}$, $\mathcal{R}_{24} = \{p_{15}, p_{16}\}$
$\mathcal{R}_{31} = \{p_{17}, p_{18}\}$, $\mathcal{R}_{32} = \{p_{19}, p_{20}\}$, $\mathcal{R}_{33} = \{p_{21}, p_{22}\}$, $\mathcal{R}_{34} = \{p_{23}, p_{24}\}$
Fig. 1. A running example
We sort the POIs based on their distances to the bottom-left
point $O_{bl}$. Without loss of generality, assume the sorted POIs
are $p_1, p_2, \cdots, p_{|\mathcal{P}|}$, where $dist(p_i, O_{bl}) \le dist(p_j, O_{bl})$ for
$i < j$. Then we evenly partition them into $N$ disjoint buckets,
$B_1, B_2, \cdots, B_N$. If every POI has a distinct distance to $O_{bl}$,
we have $B_i = \{p_{(i-1)\times\lambda+1}, \cdots, p_{i\times\lambda}\}$ for $1 \le i \le N-1$ and
$B_N = \{p_{(N-1)\times\lambda+1}, \cdots, p_{|\mathcal{P}|}\}$, where $\lambda = \lceil |\mathcal{P}| / N \rceil$. If multiple
POIs have the same distance to $O_{bl}$, we partition the POIs
into different buckets as follows. We first put the first $\lambda$ POIs
into the first bucket $B_1$. If $dist(p_{\lambda+1}, O_{bl}) = dist(p_\lambda, O_{bl})$,
we add $p_{\lambda+1}$ into $B_1$; otherwise, we add $\lambda$ POIs starting
with $p_{\lambda+1}$ into $B_2$. Iteratively we can put each POI into a
bucket. Let $r_{i-1}$ denote the smallest distance of POIs in $B_i$
for $1 \le i \le N$. We draw $N-1$ arcs centered at $O_{bl}$ with
radii $r_1, r_2, \cdots, r_{N-1}$. The $N-1$ arcs partition $\mathcal{R}$ into $N$
regions (quarter concentric rings) $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_N$, where $\mathcal{R}_1$
is within $r_1$, $\mathcal{R}_N$ is outside $r_{N-1}$, and $\mathcal{R}_i$ is between $r_{i-1}$
and $r_i$ for $1 < i < N$. The POIs in $B_i$ fall in
$\mathcal{R}_i$; in particular, a POI on the $i$-th arc belongs to region $\mathcal{R}_{i+1}$.
Hence the distance of any POI in $\mathcal{R}_i$ to $O_{bl}$ is in $[r_{i-1}, r_i)$
for $1 \le i < N$ ($r_0 = dist(p_1, O_{bl})$). For example, in Figure 1,
we partition the POIs into three regions $\mathcal{R}_1 = \{p_1, \cdots, p_8\}$,
$\mathcal{R}_2 = \{p_9, \cdots, p_{16}\}$, and $\mathcal{R}_3 = \{p_{17}, \cdots, p_{24}\}$.
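The bucket construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `partition_by_distance` and `inner_radii` and the tuple representation of POIs are assumptions made for the example, not part of the paper.

```python
import math

def partition_by_distance(pois, origin, n_buckets):
    """Split POIs (x, y) into buckets by distance to `origin` (O_bl).

    Each bucket receives lambda = ceil(|P| / N) POIs, and POIs that tie
    with the last POI of a bucket are added to that same bucket.
    """
    lam = math.ceil(len(pois) / n_buckets)
    ordered = sorted(pois, key=lambda p: math.dist(p, origin))
    buckets, i = [], 0
    while i < len(ordered):
        j = min(i + lam, len(ordered))
        # Extend the bucket while the next POI has the same distance.
        while j < len(ordered) and math.isclose(
                math.dist(ordered[j], origin), math.dist(ordered[j - 1], origin)):
            j += 1
        buckets.append(ordered[i:j])
        i = j
    return buckets

def inner_radii(buckets, origin):
    """r_{i-1}: the smallest distance to O_bl among the POIs of bucket B_i."""
    return [math.dist(b[0], origin) for b in buckets]
```

Note that with ties the sketch may produce fewer than $N$ buckets, since a tie enlarges the current bucket instead of opening a new one.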
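Each POI $p$ in region $\mathcal{R}_i$ has a direction to the bottom-left
point $O_{bl}$, denoted by $p_\theta = \arctan \frac{p.y - O_{bl}.y}{p.x - O_{bl}.x}$. For ease of
presentation, suppose $O_{bl} = (0, 0)$. Thus $p_\theta = \arctan \frac{p.y}{p.x}$. We
sort the POIs in $\mathcal{R}_i$ based on their directions in ascending order.
Similarly, we evenly partition the POIs in $\mathcal{R}_i$ into $M$ buckets
$B_{i1}, B_{i2}, \cdots, B_{iM}$. Each bucket contains about $\frac{|\mathcal{P}|}{M \times N}$ POIs.
Suppose the minimal direction of POIs in bucket $B_{ij}$ is
$\theta_{i,j-1}$ for $1 \le j \le M$. We use $M-1$ lines from $O_{bl}$
with directions $\theta_{i1}, \theta_{i2}, \cdots, \theta_{i,M-1}$ to partition $\mathcal{R}_i$ into $M$
sub-regions (parts of concentric rings) $\mathcal{R}_{i1}, \mathcal{R}_{i2}, \cdots, \mathcal{R}_{iM}$.
The direction of any POI in $\mathcal{R}_{ij}$ is thus in $[\theta_{i,j-1}, \theta_{ij})$.
For example, in Figure 1, we partition each $\mathcal{R}_i$ into four
sub-regions. For instance, we partition $\mathcal{R}_2$ into $\mathcal{R}_{21} = \{p_9, p_{10}\}$,
$\mathcal{R}_{22} = \{p_{11}, p_{12}\}$, $\mathcal{R}_{23} = \{p_{13}, p_{14}\}$, and $\mathcal{R}_{24} = \{p_{15}, p_{16}\}$.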
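As a rough illustration of this second partitioning level, the sketch below sorts the POIs of one ring by direction and cuts them into $M$ sub-regions. The names `direction` and `partition_by_direction` are hypothetical, ties between directions are ignored for brevity, and `math.atan2` is used in place of $\arctan(p.y / p.x)$ only to avoid a division by zero when $p.x = O_{bl}.x$.

```python
import math

def direction(p, origin=(0.0, 0.0)):
    """p_theta: direction of POI p to the bottom-left point O_bl."""
    return math.atan2(p[1] - origin[1], p[0] - origin[0])

def partition_by_direction(ring_pois, m_buckets, origin=(0.0, 0.0)):
    """Cut the POIs of one ring R_i into (at most) M direction-sorted sub-regions.

    Returns the sub-regions and their lower direction bounds theta_{i,j-1},
    i.e. the minimal direction of the POIs in each sub-region R_ij.
    """
    if not ring_pois:
        return [], []
    ordered = sorted(ring_pois, key=lambda p: direction(p, origin))
    size = math.ceil(len(ordered) / m_buckets)
    subregions = [ordered[s:s + size] for s in range(0, len(ordered), size)]
    lower_bounds = [direction(sub[0], origin) for sub in subregions]
    return subregions, lower_bounds
```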
[Figure 2: the region structure, with each region $\mathcal{R}_i$ split into sub-regions $\mathcal{R}_{i1}, \cdots, \mathcal{R}_{ij}, \cdots, \mathcal{R}_{iM}$, and the inverted lists for the large-memory case and for the small-memory case (region lists R and POI lists P).]
Fig. 2. Indexing structure
Our region structure is illustrated in Figure 2 and has
two salient features. First, given two sub-regions $\mathcal{R}_{is}$ and
$\mathcal{R}_{jt}$, for any POI $p \in \mathcal{R}_{is}$ and $p' \in \mathcal{R}_{jt}$, if $i < j$, we
have $dist(p, O_{bl}) < dist(p', O_{bl})$. Second, given two
sub-regions $\mathcal{R}_{is}$ and $\mathcal{R}_{it}$, for any POI $p \in \mathcal{R}_{is}$ and $p' \in \mathcal{R}_{it}$, if
$s < t$, we have $p_\theta < p'_\theta$. We will use these two features for
efficient pruning. Notice that traditional MBRs have no such
features; thus we propose the new index structure to facilitate
direction-aware search.
Although we can use the region structure to do spatial
pruning, we cannot use it to do textual pruning. To address this
issue, we build an inverted list for the keywords in each sub-region
$\mathcal{R}_{ij}$. We now give the space complexity of our index structure. For
the region structure, the space complexity is $O(M \times N)$. As
$M \times N$ is not large ($N = 1000$, $M = 600$ for 16 million POIs,
see Section VI), we can keep the region structure in memory.
For the inverted lists, suppose each POI contains $W$ distinct
keywords on average. The total inverted-list size is $O(|\mathcal{P}| \times W)$.
If the inverted-list size is very large, we use a disk-based
structure. For each keyword $k_x$, we maintain two inverted lists:
(1) The region list $\mathcal{L}^R_{k_x}$, which keeps the sorted IDs of the sub-regions
that contain $k_x$. The sub-regions are sorted as follows: $\mathcal{R}_{is} < \mathcal{R}_{jt}$
if $i < j$, and $\mathcal{R}_{is} < \mathcal{R}_{it}$ if $s < t$. (2) The POI list
$\mathcal{L}^P_{k_x}$, which keeps the sorted IDs of the POIs that contain $k_x$. The
POIs in different sub-regions are sorted by sub-region order,
and the POIs in the same sub-region are sorted by direction.
In $\mathcal{L}^R_{k_x}$, for each $\mathcal{R}_{ij} \in \mathcal{L}^R_{k_x}$, we also maintain a pointer into the
POI list $\mathcal{L}^P_{k_x}$ that keeps the position of the smallest POI ID in
$\mathcal{R}_{ij} \cap \mathcal{L}^P_{k_x}$. Based on the sorted property, suppose the pointer of $\mathcal{R}_{ij}$
is $l_{ij}$ and the pointer of its next sub-region is $l_{ij+1}$. We can
efficiently find the POIs in $\mathcal{R}_{ij}$ that contain keyword $k_x$ from $\mathcal{L}^P_{k_x}$,
i.e., the POIs in $\mathcal{L}^P_{k_x}[l_{ij}, l_{ij+1})$. Suppose each sub-region $\mathcal{R}_{ij}$
contains $L$ distinct keywords on average. The space complexity
of the disk-based inverted lists is $O(|\mathcal{P}| \times W + L \times M \times N)$.
The overall index structure is shown in Figure 2. Note that to
efficiently answer a query, besides building an index structure
for $O_{bl}$, we also maintain index structures for $O_{br}$, $O_{tr}$, and $O_{tl}$.
Thus the total index size is four times that for $O_{bl}$.
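To make the two-list layout concrete, the following sketch stores a region list with pointers and a POI list for one keyword and retrieves the POIs of a sub-region as the slice $[l_{ij}, l_{ij+1})$. The data are an assumption reconstructed from the "chinese" lists of the running example in Figure 1, and the function name `pois_in_subregion` is hypothetical.

```python
from bisect import bisect_left

# Region list for one keyword: ((i, j), pointer into the POI list), sorted by
# sub-region order. Tuples compare lexicographically, which matches the order
# R_is < R_jt if i < j and R_is < R_it if s < t.
region_list = [((1, 2), 0), ((1, 3), 2), ((2, 1), 4), ((2, 2), 5),
               ((2, 4), 7), ((3, 3), 8), ((3, 4), 10)]
# POI list for the same keyword, sorted by sub-region and then by direction.
poi_list = ["p3", "p4", "p5", "p6", "p9", "p11", "p12", "p15", "p21", "p22", "p23"]

def pois_in_subregion(region_list, poi_list, subregion):
    """Return the POIs of `subregion` that contain the keyword."""
    ids = [rid for rid, _ in region_list]
    pos = bisect_left(ids, subregion)
    if pos == len(ids) or ids[pos] != subregion:
        return []  # the sub-region does not contain the keyword
    start = region_list[pos][1]
    # The end position is the pointer of the next sub-region in the list.
    end = region_list[pos + 1][1] if pos + 1 < len(region_list) else len(poi_list)
    return poi_list[start:end]
```

For instance, `pois_in_subregion(region_list, poi_list, (1, 3))` reads the slice `[2, 4)` of the POI list.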
For example, in Figure 1, there are 24 POIs. Suppose
$N = 3$ and $M = 4$. We generate 12 sub-regions, $\mathcal{R}_{11}, \cdots, \mathcal{R}_{14}$,
$\mathcal{R}_{21}, \cdots, \mathcal{R}_{24}$, $\mathcal{R}_{31}, \cdots, \mathcal{R}_{34}$. Each sub-region has two POIs.
For example, in $\mathcal{R}_{22}$, there are two POIs $p_{11}$ and $p_{12}$.
For keyword “chinese”, we maintain a region inverted list
which has seven sub-regions and a POI inverted list which has
eleven POIs, as shown in Figure 1. The pointer of $\mathcal{R}_{13}$ is 2, with
$\mathcal{L}^P_{chinese}[2] = p_5$; that is, $p_5$ is the smallest POI in $\mathcal{R}_{13}$ that
contains “chinese”. Thus we can easily get the POIs in $\mathcal{R}_{13}$
that contain “chinese” using its pointer as the start position
($\mathcal{L}^P_{chinese}[2]$) and the pointer of its next sub-region as
the end position ($\mathcal{L}^P_{chinese}[4]$), i.e., $\mathcal{L}^P_{chinese}[2, 4) = \{p_5, p_6\}$.
In this paper we study how to use our index structures to
answer a direction-aware spatial keyword query and leave data
updates as future work.

[Figure 3: a query $q$ with direction $[\alpha, \beta]$, a sub-region $\mathcal{R}_{ij}$ with its four corner points, and the intersection points defined in Section II-C.]
Fig. 3. Notations
C. Notations
For ease of presentation, we introduce some notations, as
shown in Figure 3. Let $q_\theta = \arctan \frac{q.y}{q.x}$ denote the direction
of $q$ to $O_{bl}$ and $q_d = dist(q, O_{bl})$ denote the distance of $q$ to
$O_{bl}$. Given a region $\mathcal{R}_i$, let $r_{i-1}$ and $r_i$ respectively denote the
radius of its inner arc and of its outer arc. Given a sub-region $\mathcal{R}_{ij}$,
we use a quadruple to denote the region, $\langle r_{i-1}, r_i, \theta_{i,j-1}, \theta_{ij} \rangle$,
where $\theta_{i,j-1}$ is the minimal direction and $\theta_{ij}$ is the maximal
direction of POIs in $\mathcal{R}_{ij}$ to $O_{bl}$. Let $p_{i-1,j}$, $p_{i-1,j-1}$, $p_{i,j}$, $p_{i,j-1}$
respectively denote the bottom-left point, bottom-right point,
top-left point, and top-right point of $\mathcal{R}_{ij}$ (Figure 3).
Let $q^{r_{i-1}}_\alpha$ ($q^{r_{i-1}}_\beta$) denote the intersection of the line from $q$
with $\alpha$ ($\beta$) direction and the inner arc of $\mathcal{R}_i$ (with radius $r_{i-1}$).
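[Figure 4: a query $q$ located in $\mathcal{R}_i$; regions $\mathcal{R}_1, \cdots, \mathcal{R}_{i-1}$ lie outside the search direction.]
Fig. 4. Pruning $\mathcal{R}_1, \cdots, \mathcal{R}_{i-1}$

[Figure 5: three panels showing MINDIST$(q, \mathcal{R}_i)$ for a query inside the inner arc of $\mathcal{R}_i$: (a) $\alpha \le q_\theta \le \beta$, (b) $q_\theta < \alpha$, (c) $q_\theta > \beta$.]
Fig. 5. MINDIST$(q, \mathcal{R}_i)$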
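As $q^{r_{i-1}}_\alpha = (q^{r_{i-1}}_\alpha.x,\ q^{r_{i-1}}_\alpha.y)$ is on the arc with radius $r_{i-1}$, we
have $(q^{r_{i-1}}_\alpha.x)^2 + (q^{r_{i-1}}_\alpha.y)^2 = r_{i-1}^2$. In addition, as the point
is on the line with direction $\alpha$ to $q$, $(q^{r_{i-1}}_\alpha.y - q.y)/(q^{r_{i-1}}_\alpha.x - q.x) = \tan\alpha$†.
Thus we can compute the x-coordinate and
y-coordinate of $q^{r_{i-1}}_\alpha$ using the following equations:
\[
\begin{cases}
(q^{r_{i-1}}_\alpha.y - q.y)/(q^{r_{i-1}}_\alpha.x - q.x) = \tan\alpha \\
(q^{r_{i-1}}_\alpha.x)^2 + (q^{r_{i-1}}_\alpha.y)^2 = r_{i-1}^2
\end{cases}
\tag{1}
\]
Similarly, we can compute the point $q^{r_{i-1}}_\beta$.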
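One possible way to solve Equation 1 numerically is sketched below. It assumes $O_{bl} = (0, 0)$ and a query inside the arc ($q_d \le r$), and it uses a parametric form of the ray rather than $\tan\alpha$, so that $\alpha = \pi/2$ needs no special case. The function name `ray_arc_intersection` is hypothetical.

```python
import math

def ray_arc_intersection(q, alpha, r):
    """Point where the ray from q with direction alpha meets the arc of
    radius r centered at O_bl = (0, 0). Assumes q lies inside the arc.

    Points on the ray are q + t * (cos(alpha), sin(alpha)) with t >= 0;
    substituting into x^2 + y^2 = r^2 gives t^2 + 2*b*t + c = 0.
    """
    ux, uy = math.cos(alpha), math.sin(alpha)
    b = q[0] * ux + q[1] * uy
    c = q[0] ** 2 + q[1] ** 2 - r ** 2   # c <= 0 when q is inside the arc
    t = -b + math.sqrt(b * b - c)        # the non-negative root
    return (q[0] + t * ux, q[1] + t * uy)
```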
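Let $q^{\theta_{i,j-1}}_\alpha$ ($q^{\theta_{ij}}_\alpha$) denote the intersection of the line from
$q$ with $\alpha$ direction and the line from $O_{bl}$ with $\theta_{i,j-1}$ ($\theta_{ij}$)
direction. Similarly we can define $q^{\theta_{i,j-1}}_\beta$ and $q^{\theta_{ij}}_\beta$. As
$q^{\theta_{i,j-1}}_\alpha = (q^{\theta_{i,j-1}}_\alpha.x,\ q^{\theta_{i,j-1}}_\alpha.y)$ is on the line with direction $\theta_{i,j-1}$
to $O_{bl}$, $(q^{\theta_{i,j-1}}_\alpha.y)/(q^{\theta_{i,j-1}}_\alpha.x) = \tan\theta_{i,j-1}$. As the point is on
the line with direction $\alpha$ to $q$, $(q^{\theta_{i,j-1}}_\alpha.y - q.y)/(q^{\theta_{i,j-1}}_\alpha.x - q.x) = \tan\alpha$.
Thus we can compute the x-coordinate and
y-coordinate of $q^{\theta_{i,j-1}}_\alpha$ using the following equations:
\[
\begin{cases}
(q^{\theta_{i,j-1}}_\alpha.y - q.y)/(q^{\theta_{i,j-1}}_\alpha.x - q.x) = \tan\alpha \\
(q^{\theta_{i,j-1}}_\alpha.y)/(q^{\theta_{i,j-1}}_\alpha.x) = \tan\theta_{i,j-1}
\end{cases}
\tag{2}
\]
Similarly, we can compute the points $q^{\theta_{ij}}_\alpha$, $q^{\theta_{i,j-1}}_\beta$, and $q^{\theta_{ij}}_\beta$.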
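Equation 2 is the intersection of two rays, one from $q$ and one from $O_{bl}$. The sketch below solves it with 2D cross products instead of tangents; this is an implementation choice made for the example (it avoids infinite slopes), and the name `ray_line_intersection` and the `None` return for a missing intersection are assumptions.

```python
import math

def ray_line_intersection(q, alpha, theta):
    """Intersection of the ray from q with direction alpha and the ray from
    O_bl = (0, 0) with direction theta, or None if they do not meet."""
    ux, uy = math.cos(alpha), math.sin(alpha)   # direction of the ray from q
    vx, vy = math.cos(theta), math.sin(theta)   # direction of the ray from O_bl
    denom = ux * vy - uy * vx                   # cross(u, v) = sin(theta - alpha)
    if abs(denom) < 1e-12:
        return None                             # parallel rays
    # Solve cross(q + t*u, v) = 0 for t.
    t = -(q[0] * vy - q[1] * vx) / denom
    if t < 0:
        return None                             # intersection lies behind q
    x, y = q[0] + t * ux, q[1] + t * uy
    if x * vx + y * vy < 0:
        return None                             # intersection lies behind O_bl
    return (x, y)
```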
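Suppose the intersection of the line from $q$ with $\alpha$ ($\beta$)
direction and the boundary of $\mathcal{R}$ is $q^{\mathcal{R}}_\alpha$ ($q^{\mathcal{R}}_\beta$), as shown in
Figure 3. Next we discuss how to compute $q^{\mathcal{R}}_\alpha$. Let $q'_\theta$
denote the direction from $q$ to the top-right point $O_{tr}$. If
$\alpha > q'_\theta$, $q^{\mathcal{R}}_\alpha$ falls on the top line from $O_{tl}$ to $O_{tr}$. In this
case the y-coordinate is $q^{\mathcal{R}}_\alpha.y = H$ and the x-coordinate is
$q^{\mathcal{R}}_\alpha.x = q.x + (H - q.y)/\tan\alpha$, where $H$ is the
height of the MBR $\mathcal{R}$. If $\alpha = q'_\theta$, $q^{\mathcal{R}}_\alpha$ is exactly $O_{tr}$. If
$\alpha < q'_\theta$, $q^{\mathcal{R}}_\alpha$ falls on the right line from $O_{br}$ to $O_{tr}$. In
this case the x-coordinate is $q^{\mathcal{R}}_\alpha.x = L$ and the y-coordinate is
$q^{\mathcal{R}}_\alpha.y = q.y + (L - q.x) \times \tan\alpha$, where $L$ is the length of the
MBR $\mathcal{R}$. Thus we can compute the point $q^{\mathcal{R}}_\alpha$ as follows.
\[
(q^{\mathcal{R}}_\alpha.x,\ q^{\mathcal{R}}_\alpha.y) =
\begin{cases}
\left(q.x + \frac{H - q.y}{\tan\alpha},\ H\right) & \alpha > q'_\theta \\
(L,\ H) & \alpha = q'_\theta \\
\left(L,\ q.y + (L - q.x) \times \tan\alpha\right) & \alpha < q'_\theta
\end{cases}
\tag{3}
\]
Similarly we can compute $q^{\mathcal{R}}_\beta$. We will use the
above-mentioned points to do pruning in the following sections.
†In this section, we suppose $0 \le \alpha \le \beta \le \frac{\pi}{2}$; our technique can be
easily extended to support other directions (Section IV).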
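Equation 3 translates almost directly into code. The sketch below assumes $O_{bl} = (0, 0)$, an MBR $[0, L] \times [0, H]$, a query strictly inside it, and $0 \le \alpha \le \frac{\pi}{2}$; the name `boundary_intersection` is hypothetical.

```python
import math

def boundary_intersection(q, alpha, length, height):
    """q^R_alpha: where the ray from q with direction alpha leaves the MBR
    [0, length] x [0, height], for 0 <= alpha <= pi/2 (Equation 3)."""
    # q'_theta: direction from q to the top-right corner O_tr.
    theta_tr = math.atan2(height - q[1], length - q[0])
    if math.isclose(alpha, theta_tr):
        return (length, height)                  # exactly O_tr
    if alpha > theta_tr:                         # ray hits the top line
        return (q[0] + (height - q[1]) / math.tan(alpha), height)
    # Otherwise the ray hits the right line.
    return (length, q[1] + (length - q[0]) * math.tan(alpha))
```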
III. PRUNING UNNECESSARY REGIONS
In this section, we propose effective pruning techniques
to prune unnecessary regions $\mathcal{R}_i$ (Section III-A) and $\mathcal{R}_{ij}$
(Section III-B). We first consider directions with $0 \le \alpha \le \beta \le \frac{\pi}{2}$
and discuss how to support any direction in Section IV.
A. Pruning Region $\mathcal{R}_i$
Consider regions $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_N$ whose outer arcs have
radii $r_1, r_2, \cdots, r_N$, respectively. Given a query $q$,
we first locate the region in which $q$ appears. To this end, we
first compute its distance to $O_{bl}$, $q_d$. Then we use a binary
search on $r_1, r_2, \cdots, r_N$ to find the first radius which is larger
than $q_d$. Suppose we find $r_i$ such that $r_{i-1} \le q_d < r_i$, as
shown in Figure 4. We can prove that no POI in regions
$\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_{i-1}$ can be an answer of query $q$, as these POIs
are not in the search direction, as formalized in Lemma 1.
Lemma 1 Given a query point $q$ with $0 \le \alpha \le \beta \le \frac{\pi}{2}$,
suppose $r_{i-1} \le q_d < r_i$. Any POI in $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_{i-1}$
cannot be an answer of $q$‡.
Lemma 1 holds for any query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$.
For example, in Figure 1, we can directly prune region $\mathcal{R}_1$,
and the POIs in $\mathcal{R}_1$ do not need to be accessed. Note that the lemma may
not hold if $\beta > \frac{\pi}{2}$. Consider a counter-example where a query
$q$ is on the bottom line from $O_{bl}$ to $O_{br}$. If $\beta$ is larger than $\frac{\pi}{2}$,
the search direction may overlap with $\mathcal{R}_{i-1}$. Similarly,
$\alpha$ should be no smaller than 0; the counter-example is a
query on the left line from $O_{bl}$ to $O_{tl}$.
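The region lookup is a standard binary search. A minimal sketch, assuming $O_{bl} = (0, 0)$ and the outer radii given as an ascending Python list (the name `locate_region` is hypothetical), could look as follows; by Lemma 1, regions $1, \cdots, i-1$ can then be skipped.

```python
import math
from bisect import bisect_right

def locate_region(outer_radii, q, origin=(0.0, 0.0)):
    """Return the 1-based index i with r_{i-1} <= q_d < r_i.

    outer_radii = [r_1, ..., r_N] in ascending order. bisect_right gives the
    0-based position of the first radius strictly larger than q_d.
    """
    q_d = math.dist(q, origin)
    i = bisect_right(outer_radii, q_d) + 1
    return min(i, len(outer_radii))  # clamp queries on or beyond the last arc
```

With `i = locate_region(radii, q)`, only regions `i, i + 1, ..., N` need to be examined for a query with $0 \le \alpha \le \beta \le \frac{\pi}{2}$.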
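MINDIST function for $\mathcal{R}_i$: To facilitate nearest neighbor
search, traditional methods use the function MINDIST to estimate
the distance between a query and an MBR [10]. Formally,
given a query $q$ and an MBR $mbr$, function MINDIST$(q, mbr)$
returns the minimal distance of $q$ to $mbr$. As $\mathcal{R}_i$ in our method
is not an MBR and our query has a direction constraint, we
extend the function to support our problem as follows.
If $q$ is outside the outer arc of $\mathcal{R}_i$ ($q_d \ge r_i$), we have
MINDIST$(q, \mathcal{R}_i) = \infty$ based on Lemma 1. If $q$ is in $\mathcal{R}_i$ ($r_{i-1} \le q_d < r_i$),
we have MINDIST$(q, \mathcal{R}_i) = 0$. If $q$ is inside the
inner arc of $\mathcal{R}_i$ ($q_d < r_{i-1}$), we give the function as follows.
Consider the direction of $q$ to $O_{bl}$, $q_\theta$. If $\alpha \le q_\theta \le \beta$, the nearest
neighbor of $q$ in $\mathcal{R}_i$ is the intersection of the line with $q_\theta$
direction and the inner arc of $\mathcal{R}_i$ with radius $r_{i-1}$ (Figure 5(a)).
Thus MINDIST$(q, \mathcal{R}_i) = r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest
neighbor of $q$ in $\mathcal{R}_i$ is $q^{r_{i-1}}_\alpha$, which is the intersection of the line
from $q$ with $\alpha$ direction and the inner arc of $\mathcal{R}_i$ (Figure 5(b)).
Thus MINDIST$(q, \mathcal{R}_i) = dist(q, q^{r_{i-1}}_\alpha)$. Similarly, if $q_\theta > \beta$,
MINDIST$(q, \mathcal{R}_i) = dist(q, q^{r_{i-1}}_\beta)$ (Figure 5(c)).
Thus we give the MINDIST function as follows.
\[
\text{MINDIST}(q, \mathcal{R}_i) =
\begin{cases}
\infty & q_d \ge r_i \\
0 & r_{i-1} \le q_d < r_i \\
r_{i-1} - q_d & q_d < r_{i-1} \ \&\ \alpha \le q_\theta \le \beta \\
dist(q, q^{r_{i-1}}_\alpha) & q_d < r_{i-1} \ \&\ q_\theta < \alpha \\
dist(q, q^{r_{i-1}}_\beta) & q_d < r_{i-1} \ \&\ q_\theta > \beta
\end{cases}
\tag{4}
\]
where $q^{r_{i-1}}_\alpha$ and $q^{r_{i-1}}_\beta$ can be computed using Equation 1.

‡In this paper, we omit the proofs of the lemmas due to space constraints.

[Figure 6: three panels showing the direction range $[\tau^{\mathcal{R}}_l, \tau^{\mathcal{R}}_u]$ of a query: (a) $\tau^{\mathcal{R}}_l = \theta_{q^{\mathcal{R}}_\alpha}$ and $\tau^{\mathcal{R}}_u = \theta_{q^{\mathcal{R}}_\beta}$, (b) $\tau^{\mathcal{R}}_l = q_\theta$ and $\tau^{\mathcal{R}}_u = \theta_{q^{\mathcal{R}}_\beta}$, (c) $\tau^{\mathcal{R}}_l = \theta_{q^{\mathcal{R}}_\alpha}$ and $\tau^{\mathcal{R}}_u = q_\theta$.]
Fig. 6. Direction-based pruning for regions $\mathcal{R}_i, \cdots, \mathcal{R}_N$

[Figure 7: the tighter direction bounds of region $\mathcal{R}_i$ compared with the bounds $\tau^{\mathcal{R}}_l$ and $\tau^{\mathcal{R}}_u$.]
Fig. 7. Direction-based pruning for $\mathcal{R}_i$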
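Equation 4 can be sketched as follows, assuming $O_{bl} = (0, 0)$ and $0 \le \alpha \le \beta \le \frac{\pi}{2}$. The helper `_ray_arc` solves Equation 1 in parametric form; both function names are hypothetical.

```python
import math

def _ray_arc(q, ang, r):
    """Intersection of the ray from q with direction ang and the arc of
    radius r centered at (0, 0); assumes q is inside the arc (Equation 1)."""
    ux, uy = math.cos(ang), math.sin(ang)
    b = q[0] * ux + q[1] * uy
    c = q[0] ** 2 + q[1] ** 2 - r ** 2
    t = -b + math.sqrt(b * b - c)
    return (q[0] + t * ux, q[1] + t * uy)

def min_dist_ring(q, alpha, beta, r_inner, r_outer):
    """MINDIST(q, R_i) for a ring with inner radius r_{i-1} and outer radius r_i."""
    q_d = math.hypot(q[0], q[1])
    if q_d >= r_outer:
        return math.inf            # q is outside the outer arc (Lemma 1)
    if q_d >= r_inner:
        return 0.0                 # q is inside R_i
    q_theta = math.atan2(q[1], q[0])
    if alpha <= q_theta <= beta:
        return r_inner - q_d       # move straight away from O_bl
    bound = alpha if q_theta < alpha else beta
    return math.dist(q, _ray_arc(q, bound, r_inner))
```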
Given a query $q$, we first find the region $\mathcal{R}_i$ in which it is located
and access the POIs in $\mathcal{R}_i$. Then we verify whether the
POIs satisfy the direction constraint and contain all keywords.
Suppose the $k$-th smallest distance of the candidates
that have been computed is $d_k$. Then for the next
region $\mathcal{R}_{i+1}$, if MINDIST$(q, \mathcal{R}_{i+1}) \ge d_k$, we terminate and
prune $\mathcal{R}_{i+1}, \cdots, \mathcal{R}_N$; otherwise we access the POIs in $\mathcal{R}_{i+1}$.
Iteratively we can find all answers. As we use the best-first
search method, we only utilize the MINDIST function and
do not use the MINMAXDIST function [10]. For example, in
Figure 1, suppose $k = 1$. In $\mathcal{R}_2$, we find an answer $p_{12}$.
As MINDIST$(q, \mathcal{R}_3) > dist(q, p_{12})$, we terminate and prune the
POIs in $\mathcal{R}_3$.
However, this method neglects the fact that some
sub-regions $\mathcal{R}_{ij}$ in $\mathcal{R}_i$ may not satisfy the direction constraint.
For example, in Figure 1, although $\mathcal{R}_{21}$ has a POI $p_9$ which
contains all keywords, we can prune the region as it is not in
the search direction. Similarly we can prune $\mathcal{R}_{24}$. To this end,
we discuss how to effectively prune $\mathcal{R}_{ij}$ in $\mathcal{R}_i$.
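B. Pruning Regions $\mathcal{R}_{ij}$
In this section, we first introduce how to prune
unnecessary sub-regions $\mathcal{R}_{ij}$ which have no overlap with the
search direction, and then give the function MINDIST$(q, \mathcal{R}_{ij})$.
In the rest of this paper, if the context is clear, the terms
“region” and “sub-region” are used interchangeably for $\mathcal{R}_{ij}$.
Our indexing structure has a salient feature: If a POI $p$ is an
answer of $q$, its direction to $O_{bl}$ ($p_\theta = \arctan \frac{p.y}{p.x}$) must be in
a range $[\tau^{\mathcal{R}}_l, \tau^{\mathcal{R}}_u]$. In other words, we can prune the POIs with
direction smaller than $\tau^{\mathcal{R}}_l$ or larger than $\tau^{\mathcal{R}}_u$. Next we discuss
how to deduce the lower bound $\tau^{\mathcal{R}}_l$ and the upper bound $\tau^{\mathcal{R}}_u$.
Given a query $q$ with direction $[\alpha, \beta]$, consider the intersection
$q^{\mathcal{R}}_\alpha$ ($q^{\mathcal{R}}_\beta$) of the line from $q$ with $\alpha$ ($\beta$) direction and the
boundary of region $\mathcal{R}$, as shown in Figure 6. Let $\theta_{q^{\mathcal{R}}_\alpha}$ and
$\theta_{q^{\mathcal{R}}_\beta}$ respectively denote the directions of points $q^{\mathcal{R}}_\alpha$ and $q^{\mathcal{R}}_\beta$
to $O_{bl}$. As $\alpha \le \beta$, $\theta_{q^{\mathcal{R}}_\alpha} \le \theta_{q^{\mathcal{R}}_\beta}$. Let $\tau^{\mathcal{R}}_l = \min(\theta_{q^{\mathcal{R}}_\alpha}, q_\theta)$ and
$\tau^{\mathcal{R}}_u = \max(\theta_{q^{\mathcal{R}}_\beta}, q_\theta)$. For any point $p$, if $p_\theta > \tau^{\mathcal{R}}_u$, its direction
to $q$ must be larger than $\beta$, thus $p$ cannot be an answer of $q$
(Figure 6(b)). Similarly, if $p_\theta < \tau^{\mathcal{R}}_l$, its direction to $q$ must be
smaller than $\alpha$, thus $p$ cannot be an answer of $q$ (Figure 6(c)).
The correctness is formalized in Lemma 2.
Lemma 2 Given a query $q$ with direction $[\alpha, \beta]$, let
$\tau^{\mathcal{R}}_l = \min(\theta_{q^{\mathcal{R}}_\alpha}, q_\theta)$ and $\tau^{\mathcal{R}}_u = \max(\theta_{q^{\mathcal{R}}_\beta}, q_\theta)$. For any POI $p$, if
$p_\theta > \tau^{\mathcal{R}}_u$ or $p_\theta < \tau^{\mathcal{R}}_l$, $p$ cannot be an answer of $q$.
Based on Lemma 2, we only need to access the POIs with
directions between $\tau^{\mathcal{R}}_l$ and $\tau^{\mathcal{R}}_u$. Moreover, a region $\mathcal{R}_{ij}$ has
a lower direction bound $\theta_{i,j-1}$ and an upper direction bound
$\theta_{ij}$, which respectively denote the minimal direction and the
maximal direction of POIs in $\mathcal{R}_{ij}$. In other words, for any POI
$p \in \mathcal{R}_{ij}$ we have $\theta_{i,j-1} \le p_\theta < \theta_{ij}$. Based on Lemma 2, for
region $\mathcal{R}_{ij}$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^{\mathcal{R}}_l$ or $\theta_{i,j-1} > \tau^{\mathcal{R}}_u$,
we can prune the region $\mathcal{R}_{ij}$, as formalized in Lemma 3.
Lemma 3 Given a query $q$ with direction $[\alpha, \beta]$, let
$\tau^{\mathcal{R}}_l = \min(\theta_{q^{\mathcal{R}}_\alpha}, q_\theta)$ and $\tau^{\mathcal{R}}_u = \max(\theta_{q^{\mathcal{R}}_\beta}, q_\theta)$. For any region $\mathcal{R}_{ij}$
with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^{\mathcal{R}}_l$ or $\theta_{i,j-1} > \tau^{\mathcal{R}}_u$, any
POI in $\mathcal{R}_{ij}$ cannot be an answer of $q$.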
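The bounds of Lemma 2 and the test of Lemma 3 can be sketched as below, again assuming $O_{bl} = (0, 0)$, an MBR $[0, L] \times [0, H]$, and $0 \le \alpha \le \beta \le \frac{\pi}{2}$. The helper `_boundary_point` follows Equation 3; all function names are hypothetical.

```python
import math

def _boundary_point(q, ang, length, height):
    """q^R_ang: exit point of the ray from q on the MBR boundary (Equation 3)."""
    theta_tr = math.atan2(height - q[1], length - q[0])
    if math.isclose(ang, theta_tr):
        return (length, height)
    if ang > theta_tr:
        return (q[0] + (height - q[1]) / math.tan(ang), height)
    return (length, q[1] + (length - q[0]) * math.tan(ang))

def direction_range(q, alpha, beta, length, height):
    """[tau_l, tau_u] of Lemma 2: directions to O_bl that an answer can have."""
    q_theta = math.atan2(q[1], q[0])
    a = _boundary_point(q, alpha, length, height)
    b = _boundary_point(q, beta, length, height)
    tau_l = min(math.atan2(a[1], a[0]), q_theta)
    tau_u = max(math.atan2(b[1], b[0]), q_theta)
    return tau_l, tau_u

def can_prune(theta_low, theta_high, tau_l, tau_u):
    """Lemma 3: a sub-region with directions [theta_low, theta_high) is
    pruned if it lies entirely below tau_l or entirely above tau_u."""
    return theta_high <= tau_l or theta_low > tau_u
```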
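For example, in Figure 1, although $\mathcal{R}_{21}$ and $\mathcal{R}_{24}$ have POIs
that contain all keywords, we can prune them as they are
not in the search direction, based on the direction-based pruning
technique in Lemma 3. Notice that this pruning technique is
valid for all regions. Next we devise tighter direction bounds
for region $\mathcal{R}_i$. Let $\tau^{\mathcal{R}_i}_l$ denote the tighter lower bound and
$\tau^{\mathcal{R}_i}_u$ denote the tighter upper bound for $\mathcal{R}_i$. For any POI $p$ in
$\mathcal{R}_i$, if $p_\theta < \tau^{\mathcal{R}_i}_l$ or $p_\theta > \tau^{\mathcal{R}_i}_u$, we can prune the POI. Next
we discuss how to deduce the two tighter bounds.
Consider the intersection of the line from $q$ with $\alpha$ ($\beta$)
direction and the outer arc of $\mathcal{R}_i$, denoted by $q^{r_i}_\alpha$ ($q^{r_i}_\beta$). The
two points can be computed by Equation 1. Let $\theta_{q^{r_i}_\alpha}$ and $\theta_{q^{r_i}_\beta}$
respectively denote the directions of points $q^{r_i}_\alpha$ and $q^{r_i}_\beta$ to $O_{bl}$.
It is easy to see that if $q^{r_i}_\alpha$ is in region $\mathcal{R}$ (denoted
by $q^{r_i}_\alpha \in \mathcal{R}$), $\theta_{q^{r_i}_\alpha} \ge \theta_{q^{\mathcal{R}}_\alpha}$; otherwise $\theta_{q^{r_i}_\alpha} < \theta_{q^{\mathcal{R}}_\alpha}$ (Figure 7).
Similarly, if $q^{r_i}_\beta \in \mathcal{R}$, $\theta_{q^{r_i}_\beta} \le \theta_{q^{\mathcal{R}}_\beta}$; otherwise $\theta_{q^{r_i}_\beta} > \theta_{q^{\mathcal{R}}_\beta}$. Based
on this observation, we give the tighter bounds $\tau^{\mathcal{R}_i}_l$ and $\tau^{\mathcal{R}_i}_u$.
\[
\tau^{\mathcal{R}_i}_l =
\begin{cases}
q_\theta & q_\theta \le \alpha \\
\theta_{q^{r_i}_\alpha} & q_\theta > \alpha \ \&\ q^{r_i}_\alpha \in \mathcal{R} \\
\theta_{q^{\mathcal{R}}_\alpha} & q_\theta > \alpha \ \&\ q^{r_i}_\alpha \notin \mathcal{R}
\end{cases}
\tag{5}
\]
\[
\tau^{\mathcal{R}_i}_u =
\begin{cases}
q_\theta & q_\theta \ge \beta \\
\theta_{q^{r_i}_\beta} & q_\theta < \beta \ \&\ q^{r_i}_\beta \in \mathcal{R} \\
\theta_{q^{\mathcal{R}}_\beta} & q_\theta < \beta \ \&\ q^{r_i}_\beta \notin \mathcal{R}
\end{cases}
\tag{6}
\]

[Figure 8: six panels showing MINDIST$(q, \mathcal{R}_{ij})$ for a query located in (a) $\mathcal{R}^<_i[0, \theta_{i,j-1})$, (b) $\mathcal{R}^<_i[\theta_{i,j-1}, \theta_{ij})$, (c) $\mathcal{R}^<_i[\theta_{ij}, \frac{\pi}{2}]$, (d) $\mathcal{R}_i[0, \theta_{i,j-1})$, (e) $\mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$, (f) $\mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$.]
Fig. 8. MINDIST$(q, \mathcal{R}_{ij})$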
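Then consider region $\mathcal{R}_{ij}$ with the minimal direction $\theta_{i,j-1}$
and the maximal direction $\theta_{ij}$. If $\theta_{ij} \le \tau^{\mathcal{R}_i}_l$ or $\theta_{i,j-1} > \tau^{\mathcal{R}_i}_u$,
region $\mathcal{R}_{ij}$ has no overlap with the search direction, thus
we can prune $\mathcal{R}_{ij}$. In other words, for $\mathcal{R}_i$, we only need to
access the regions $\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}$ such that $\theta_{i,l-1} \le \tau^{\mathcal{R}_i}_l < \theta_{il}$ and
$\theta_{i,u-1} \le \tau^{\mathcal{R}_i}_u < \theta_{iu}$. To efficiently identify such regions, we use
$\tau^{\mathcal{R}_i}_l$ to do a binary search on the directions of the regions in $\mathcal{R}_i$,
$\{\theta_{i1}, \cdots, \theta_{iM}\}$, and find the smallest one which is larger than
$\tau^{\mathcal{R}_i}_l$, i.e., $\theta_{il}$, which identifies $\mathcal{R}_{il}$. Then we use $\tau^{\mathcal{R}_i}_u$ to do a binary search on the
directions from $\theta_{i,l-1}$ onwards, and find the largest one which
is no larger than $\tau^{\mathcal{R}_i}_u$, i.e., $\theta_{i,u-1}$, which is the lower direction bound of $\mathcal{R}_{iu}$. Thus we only need to access
$\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}$. Lemma 4 formalizes the pruning technique.
Lemma 4 Given a query $q$ with direction $[\alpha, \beta]$ and a region
$\mathcal{R}_i$, let $\tau^{\mathcal{R}_i}_l = \min(\theta_{q^{r_i}_\alpha}, q_\theta)$ and $\tau^{\mathcal{R}_i}_u = \max(\theta_{q^{r_i}_\beta}, q_\theta)$. For any
POI $p \in \mathcal{R}_i$, if $p_\theta > \tau^{\mathcal{R}_i}_u$ or $p_\theta < \tau^{\mathcal{R}_i}_l$, $p$ cannot be an answer of $q$.
For any region $\mathcal{R}_{ij} \in \mathcal{R}_i$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^{\mathcal{R}_i}_l$
or $\theta_{i,j-1} > \tau^{\mathcal{R}_i}_u$, any POI in $\mathcal{R}_{ij}$ cannot be an answer of $q$.
Consider the example in Figure 1. We can prune regions
$\mathcal{R}_{21}$ and $\mathcal{R}_{24}$ in $\mathcal{R}_2$, and regions $\mathcal{R}_{31}$ and $\mathcal{R}_{34}$ in $\mathcal{R}_3$.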
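The two binary searches can be sketched with Python's `bisect` module. The sketch assumes the direction boundaries of one ring are stored as an ascending list `[theta_{i,0}, ..., theta_{i,M}]`, so that sub-region $j$ covers `[bounds[j-1], bounds[j])`; the function name `candidate_subregions` and the sample values are hypothetical.

```python
from bisect import bisect_right

def candidate_subregions(bounds, tau_l, tau_u):
    """Return the 1-based range (l, u) of sub-regions that overlap
    [tau_l, tau_u], or None if every sub-region is pruned (Lemma 4).

    bounds = [theta_{i,0}, theta_{i,1}, ..., theta_{i,M}] in ascending order.
    """
    m = len(bounds) - 1
    # Smallest j with theta_{i,j} > tau_l (regions with theta_{i,j} <= tau_l are pruned).
    l = max(bisect_right(bounds, tau_l), 1)
    # Largest j with theta_{i,j-1} <= tau_u (regions starting above tau_u are pruned).
    u = min(bisect_right(bounds, tau_u), m)
    return (l, u) if l <= u else None
```

With four sub-regions bounded by `[0, 0.4, 0.8, 1.2, 1.6]` and a direction range of about `[0.46, 1.11]`, the first and the fourth sub-region are pruned, which mirrors the pruning of $\mathcal{R}_{21}$ and $\mathcal{R}_{24}$ in the running example.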
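MINDIST for $\mathcal{R}_{ij}$: For each region $\mathcal{R}_{ij}$ in $\{\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}\}$,
we use the MINDIST function to estimate the distance between
$q$ and $\mathcal{R}_{ij}$, i.e., MINDIST$(q, \mathcal{R}_{ij})$. To this end, we partition
$\mathcal{R}$ into three regions by the inner arc ($r_{i-1}$) and the outer arc
($r_i$), i.e., the region inside the inner arc $\mathcal{R}^<_i$, the region $\mathcal{R}_i$,
and the region outside the outer arc $\mathcal{R}^>_i$. If $q \in \mathcal{R}^>_i$, no POI
in $\mathcal{R}_{ij}$ can be an answer of $q$ based on Lemma 1, thus
MINDIST$(q, \mathcal{R}_{ij}) = \infty$. For $\mathcal{R}^<_i$ and $\mathcal{R}_i$, we respectively
partition them into three regions based on the two directions $\theta_{i,j-1}$
and $\theta_{ij}$, denoted by $\mathcal{R}^<_i[0, \theta_{i,j-1})$, $\mathcal{R}^<_i[\theta_{i,j-1}, \theta_{ij})$, $\mathcal{R}^<_i[\theta_{ij}, \frac{\pi}{2}]$,
and $\mathcal{R}_i[0, \theta_{i,j-1})$, $\mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$, $\mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8).
(1) $q \in \mathcal{R}^<_i[0, \theta_{i,j-1})$ (Figure 8(a)). If we have no direction
constraint, the nearest neighbor of $q$ is the bottom-right point
$p_{i-1,j-1}$. Next, we consider the case with direction $[\alpha, \beta]$. Let
$\theta(q, p_{i-1,j-1})$ denote the direction from $q$ to $p_{i-1,j-1}$. If
$\alpha \le \theta(q, p_{i-1,j-1}) \le \beta$, the nearest neighbor of $q$ is still $p_{i-1,j-1}$.
If $\theta(q, p_{i-1,j-1}) < \alpha$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\alpha$, which
is the intersection of the line from $q$ with $\alpha$ direction and the
arc with radius $r_{i-1}$ (computed by Equation 1). Similarly, if
$\theta(q, p_{i-1,j-1}) > \beta$, the nearest neighbor of $q$ is $q^{\theta_{i,j-1}}_\beta$, which
is the intersection of the line from $q$ with $\beta$ direction and the
line from $O_{bl}$ with $\theta_{i,j-1}$ direction (computed by Equation 2).
(2) $q \in \mathcal{R}^<_i[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(b)). If $\alpha \le q_\theta \le \beta$, the nearest
neighbor of $q$ is $q^{r_{i-1}}_\theta$, which is the intersection of the line from
$q$ with $q_\theta$ direction and the arc with radius $r_{i-1}$. The distance
is $r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\alpha$. If
$q_\theta > \beta$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\beta$.
(3) $q \in \mathcal{R}^<_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(c)). Similar to case (1), consider
the bottom-left point $p_{i-1,j}$. Let $\theta(q, p_{i-1,j})$ denote the direction
from $q$ to $p_{i-1,j}$. If $\alpha \le \theta(q, p_{i-1,j}) \le \beta$, the nearest neighbor
of $q$ is $p_{i-1,j}$. If $\theta(q, p_{i-1,j}) < \alpha$, the nearest neighbor of $q$ is
$q^{\theta_{ij}}_\alpha$. If $\theta(q, p_{i-1,j}) > \beta$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\beta$.
(4) $q \in \mathcal{R}_i[0, \theta_{i,j-1})$ (Figure 8(d)). As $\beta \le \frac{\pi}{2}$, the nearest
neighbor of $q$ must be $q^{\theta_{i,j-1}}_\beta$ (computed by Equation 2).
(5) $q \in \mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(e)). As $q$ is in $\mathcal{R}_{ij}$,
MINDIST$(q, \mathcal{R}_{ij}) = 0$.
(6) $q \in \mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(f)). As $\alpha \ge 0$, the nearest neighbor
of $q$ must be $q^{\theta_{ij}}_\alpha$ (computed by Equation 2).
To summarize, we give function MINDIST$(q, \mathcal{R}_{ij})$ in Table I.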
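TABLE I
MINDIST$(q, \mathcal{R}_{ij})$

Region containing $q$                          MINDIST$(q, \mathcal{R}_{ij})$           Condition
---------------------------------------------------------------------------------------------------------
$\mathcal{R}^>_i$                              $\infty$
$\mathcal{R}^<_i[0, \theta_{i,j-1})$           $dist(q, q^{r_{i-1}}_\alpha)$            $\theta(q, p_{i-1,j-1}) < \alpha$
                                               $dist(q, p_{i-1,j-1})$                   $\alpha \le \theta(q, p_{i-1,j-1}) \le \beta$
                                               $dist(q, q^{\theta_{i,j-1}}_\beta)$      $\theta(q, p_{i-1,j-1}) > \beta$
$\mathcal{R}^<_i[\theta_{i,j-1}, \theta_{ij})$ $dist(q, q^{r_{i-1}}_\alpha)$            $q_\theta < \alpha$
                                               $r_{i-1} - q_d$                          $\alpha \le q_\theta \le \beta$
                                               $dist(q, q^{r_{i-1}}_\beta)$             $q_\theta > \beta$
$\mathcal{R}^<_i[\theta_{ij}, \frac{\pi}{2}]$  $dist(q, q^{\theta_{ij}}_\alpha)$        $\theta(q, p_{i-1,j}) < \alpha$
                                               $dist(q, p_{i-1,j})$                     $\alpha \le \theta(q, p_{i-1,j}) \le \beta$
                                               $dist(q, q^{r_{i-1}}_\beta)$             $\theta(q, p_{i-1,j}) > \beta$
$\mathcal{R}_i[0, \theta_{i,j-1})$             $dist(q, q^{\theta_{i,j-1}}_\beta)$
$\mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$   $0$
$\mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$    $dist(q, q^{\theta_{ij}}_\alpha)$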
IV. SEARCH ALGORITHMS
In this section, we first give an algorithm to answer a query
with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$ (Section IV-A), and then discuss
how to answer a query with any direction (Section IV-B).
A. Answering Queries with $0 \le \alpha \le \beta \le \frac{\pi}{2}$
We combine our pruning techniques and MINDIST functions
to answer a query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$.
Figure 9 gives the pseudo-code of our algorithm. To efficiently
find the $k$ nearest neighbors of $q$, we maintain a priority queue $Q$
(line 2) and keep the $k$-th smallest distance to $q$ of the POIs in $Q$
that have already been computed, $d_k$ (line 3). Given
a query $q$, we first locate the region in which $q$ appears
using a binary search on the radii $r_1, r_2, \cdots, r_N$
(line 4). Suppose we find $\mathcal{R}_i$ such that $r_{i-1} \le q_d < r_i$. If
MINDIST$(q, \mathcal{R}_i) \ge d_k$, we terminate as there is no answer
in $\mathcal{R}_i, \cdots, \mathcal{R}_N$ (line 6); otherwise, for region $\mathcal{R}_i$, we find
the “candidate regions” which overlap with the search
direction and contain all keywords in $\mathcal{K}$, by calling function
FINDCANDREGIONS (line 7). Next, for each candidate region
$\mathcal{R}_{ij} \in \mathcal{C}_{\mathcal{R}_i}$, if MINDIST$(q, \mathcal{R}_{ij}) \ge d_k$, we break: as the candidate
regions are sorted by MINDIST, neither $\mathcal{R}_{ij}$ nor any remaining
candidate region can contain an answer (line 9); otherwise we find the “candidate
POIs” in $\mathcal{R}_{ij}$ which are in the search direction and contain
all keywords, by calling function FINDCANDPOIS (line 10).
Finally, we access region $\mathcal{R}_{i+1}$ if necessary (line 11).
Iteratively we can find the $k$ nearest neighbors of query $q$.
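The bookkeeping of $Q$ and $d_k$ can be illustrated with a bounded max-heap. The sketch below covers only the final verification step (the direction test and the distance test against $d_k$); it assumes the candidate POIs passed in already contain all query keywords, and the function name `find_cand_pois` and the `(id, (x, y))` tuple layout are choices made for this example rather than details given in the paper.

```python
import heapq
import math

def find_cand_pois(q, alpha, beta, k, candidates, heap):
    """Verify candidate POIs and keep the k nearest ones found so far.

    candidates: iterable of (poi_id, (x, y)) that contain all keywords.
    heap: list used as a max-heap through negated distances, at most k entries.
    Returns d_k, the k-th smallest distance so far (infinity if fewer than k).
    """
    for pid, (x, y) in candidates:
        d = math.dist(q, (x, y))
        theta = math.atan2(y - q[1], x - q[0])      # direction from q to the POI
        d_k = -heap[0][0] if len(heap) == k else math.inf
        if alpha <= theta <= beta and d < d_k:
            heapq.heappush(heap, (-d, pid))
            if len(heap) > k:
                heapq.heappop(heap)                 # drop the farthest POI
    return -heap[0][0] if len(heap) == k else math.inf
```

The returned $d_k$ is what the outer loops compare against MINDIST$(q, \mathcal{R}_i)$ and MINDIST$(q, \mathcal{R}_{ij})$ to decide whether to stop.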
Then we discuss how to compute the candidate regions in R_i. Function FINDCANDREGIONS gives the pseudo-code (Figure 9). We first compute the lower direction bound τ_l^{R_i} and the upper direction bound τ_u^{R_i} (line 2). Next we find the regions satisfying the direction constraint, R_i[α, β] = {R_il, ..., R_iu} (those in [τ_l^{R_i}, τ_u^{R_i}]), using a binary search on the directions θ_i1, ..., θ_iM (line 3). If the inverted lists are in memory, we check whether each region in R_i[α, β] contains all keywords and add such regions into the candidate-region set CR_i. If we use a disk-based method, we load the region inverted list L^R_{k_i} for each keyword k_i (line 4), compute their intersection L^R_K, which satisfies the keyword constraint (line 5), and intersect L^R_K with the regions satisfying the direction constraint, R_i[α, β], to get R_i^K[α, β] (line 6). For each region R_ij ∈ R_i^K[α, β], if MINDIST(q, R_ij) < d_k, we add R_ij into the candidate-region set CR_i (line 8). Finally we sort the regions in CR_i in ascending order of their MINDIST values (line 9).

Algorithm 1: DESKS-BASIC (P, q)
Input:  P: A collection of POIs
        q = ⟨(q.x, q.y); [α, β]; K, k⟩: A query
Output: P_q^k = {p | p ∈ P_q and p is a kNN of q}, where P_q is the set of POIs
        in the search direction that contain all the keywords in K.
 1  begin
 2      Initialize an empty priority queue Q;
 3      Let d_k denote the k-th smallest distance in Q;
 4      Locate the region R_i where q appears using a binary search on r_1, ..., r_N;
 5      while i ≤ N do
 6          if MINDIST(q, R_i) ≥ d_k then return;
 7          else CR_i = FINDCANDREGIONS(q, R_i, d_k);
 8          for R_ij ∈ CR_i (CR_i is sorted) do
 9              if MINDIST(q, R_ij) ≥ d_k then break;
10              else FINDCANDPOIS(q, R_ij, d_k, Q);
11          i = i + 1;
12  end

Function FINDCANDREGIONS(q, R_i, d_k)
Input:  q = ⟨(q.x, q.y); [α, β]; K, k⟩: A query
        d_k: The k-th smallest distance in Q
        R_i: Region R_i
Output: CR_i: A sorted candidate-region set
 1  begin
 2      Compute direction bounds τ_l^{R_i} and τ_u^{R_i};
 3      Find regions R_i[α, β] = {R_il, ..., R_iu} in [τ_l^{R_i}, τ_u^{R_i}] using a binary search on θ_i1, ..., θ_iM;
 4      Load region inverted lists L^R_{k_i} for k_i ∈ K;
 5      Compute L^R_K = ∩_{k_i ∈ K} L^R_{k_i};
 6      Compute R_i^K[α, β] = R_i[α, β] ∩ L^R_K;
 7      for R_ij ∈ R_i^K[α, β] do
 8          if MINDIST(q, R_ij) < d_k then add R_ij into CR_i;
 9      Sort CR_i based on the MINDIST function;
10  end

Function FINDCANDPOIS(q, R_ij, d_k, Q)
Input:  q = ⟨(q.x, q.y); [α, β]; K, k⟩: A query
        d_k: The k-th smallest distance in Q
        R_ij: Region R_ij; Q: Queue
 1  begin
 2      Load POI inverted lists L^P_{k_i}(R_ij) for k_i ∈ K;
 3      Compute intersection L^P_K = ∩_{k_i ∈ K} L^P_{k_i}(R_ij);
 4      for p ∈ L^P_K do
 5          if α ≤ θ(q, p) ≤ β and dist(q, p) < d_k then
 6              add p into Q, and update Q and d_k;
 7  end
Fig. 9. DESKS-BASIC algorithm (using disk-based inverted lists)
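The candidate-region step can be sketched for the in-memory case as below. This is a hedged illustration under assumptions that are not in the paper: sub-region j is assumed to cover the half-open direction range [bounds[j], bounds[j+1]), the region inverted lists are modelled as a dict from keyword to a set of sub-region indices, and the MINDIST values are supplied precomputed in min_dist (the MINDIST formulas themselves are not reproduced here).

```python
from bisect import bisect_right


def find_cand_regions(bounds, tau_l, tau_u, region_lists, keywords, min_dist, d_k):
    """Return sub-region indices that pass direction, keyword and distance pruning.

    bounds       -- sorted direction boundaries; sub-region j spans
                    [bounds[j], bounds[j + 1]) (assumed layout)
    tau_l, tau_u -- lower and upper direction bounds for this region
    region_lists -- keyword -> set of sub-region indices containing it (assumed)
    keywords     -- non-empty collection of query keywords
    min_dist     -- sub-region index -> precomputed MINDIST lower bound (assumed)
    d_k          -- current k-th smallest distance (math.inf if unknown)
    """
    m = len(bounds) - 1
    # Two binary searches give the first and last sub-region overlapping [tau_l, tau_u].
    lo = max(bisect_right(bounds, tau_l) - 1, 0)
    hi = min(bisect_right(bounds, tau_u) - 1, m - 1)
    in_direction = set(range(lo, hi + 1))
    # Keyword constraint: a sub-region must appear in every keyword's list.
    with_keywords = set.intersection(*(region_lists.get(w, set()) for w in keywords))
    # Distance pruning, then ascending MINDIST order.
    survivors = [j for j in in_direction & with_keywords if min_dist[j] < d_k]
    return sorted(survivors, key=lambda j: min_dist[j])
```

Sorting by the lower bound is what lets the caller stop at the first sub-region whose bound reaches d_k.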
Next we discuss how to compute the candidate POIs in R_ij. Function FINDCANDPOIS gives the pseudo-code (Figure 9). If the POI inverted lists are in memory, we directly compute their intersection. If the POI inverted lists are on disk, we load the POI inverted list for each keyword. Note that for keyword k_i we only load the POIs that are in R_ij, denoted L^P_{k_i}(R_ij), based on the pointers in the region lists as shown in Figure 1 (line 2). Then we compute the intersection of the POI lists, L^P_K = ∩_{k_i ∈ K} L^P_{k_i}(R_ij) (line 3). For each p ∈ L^P_K, if α ≤ θ(q, p) ≤ β and dist(q, p) < d_k, p is a candidate: we add p into the priority queue and update d_k (line 6).

Fig. 10. Pruning for [π/2 ≤ α < β < π]
Fig. 11. Pruning for [π ≤ α < β < 3π/2]
Fig. 12. Pruning for [3π/2 ≤ α < β < 2π]
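The candidate-POI step can be sketched as follows. This is an illustrative sketch, not the authors' code: the POI inverted lists are modelled as a dict from keyword to a set of POI ids, coords maps a POI id to its location, and the priority queue is a bounded max-heap of (negated distance, id) pairs; all of these names and layouts are assumptions.

```python
import heapq
import math


def direction(q, p):
    """Angle of p as seen from q, normalised to [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)


def find_cand_pois(q, alpha, beta, k, keywords, poi_lists, coords, heap):
    """Scan one sub-region and keep the k nearest qualifying POIs in heap.

    heap holds at most k entries (-dist, poi_id), so -heap[0][0] is d_k
    once k answers have been collected.
    """
    # Keyword constraint: intersect the per-keyword POI lists.
    candidates = set.intersection(*(poi_lists.get(w, set()) for w in keywords))
    for pid in sorted(candidates):
        p = coords[pid]
        if not alpha <= direction(q, p) <= beta:
            continue  # outside the search direction
        dist = math.dist(q, p)
        d_k = -heap[0][0] if len(heap) == k else math.inf
        if dist < d_k:
            if len(heap) < k:
                heapq.heappush(heap, (-dist, pid))
            else:
                heapq.heappushpop(heap, (-dist, pid))  # evict the current k-th
    return heap
```

Keeping the heap bounded at k entries makes reading d_k a constant-time peek at the root.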
B. Answering Queries with Any Direction
In this section, we discuss how to answer a query with an arbitrary direction. We first classify queries into basic queries and complex queries as follows.
• Case 1 – Basic Queries:
  – 0 ≤ α ≤ β ≤ π/2. We answer it using the index structures on O_bl as discussed in the above sections.
  – π/2 ≤ α ≤ β ≤ π. We answer it using the index structures on O_br, in the same way as a query with 0 ≤ α ≤ β ≤ π/2, as shown in Figure 10.
  – π ≤ α ≤ β ≤ 3π/2. We answer it using the index structures on O_tr, in the same way as a query with 0 ≤ α ≤ β ≤ π/2, as shown in Figure 11.
  – 3π/2 ≤ α ≤ β ≤ 2π. We answer it using the index structures on O_tl, in the same way as a query with 0 ≤ α ≤ β ≤ π/2, as shown in Figure 12.
• Case 2 – Complex Queries: All other queries are called complex queries. For a complex query q with direction [α, β], we decompose q into at most four basic queries: (1) q_1 with direction [0, π/2) ∩ [α, β]; (2) q_2 with direction [π/2, π) ∩ [α, β]; (3) q_3 with direction [π, 3π/2) ∩ [α, β]; and (4) q_4 with direction [3π/2, 2π) ∩ [α, β]. Thus we can first answer the sub-queries and then combine the results to generate the final answers of query q.§
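The decomposition of a complex query into basic queries can be sketched as below. This is a minimal sketch under stated assumptions: the function name decompose and the (quadrant, start, end) tuple layout are hypothetical, directions with β > 2π are first split at 2π, and zero-width pieces are simply dropped.

```python
import math


def decompose(alpha, beta):
    """Split direction [alpha, beta] into per-quadrant pieces.

    Assumes 0 <= alpha < 2*pi and alpha <= beta <= alpha + 2*pi.
    Returns a list of (quadrant, start, end) with quadrant in 0..3.
    """
    two_pi = 2 * math.pi
    half_pi = math.pi / 2
    # A direction that wraps past 2*pi is split into [alpha, 2*pi] and [0, beta - 2*pi].
    if beta <= two_pi:
        spans = [(alpha, beta)]
    else:
        spans = [(alpha, two_pi), (0.0, beta - two_pi)]
    pieces = []
    for lo, hi in spans:
        for quadrant in range(4):
            start = max(lo, quadrant * half_pi)
            end = min(hi, (quadrant + 1) * half_pi)
            if start < end:  # keep only non-empty intersections
                pieces.append((quadrant, start, end))
    return pieces
```

For example, a direction from π/4 to 3π/4 yields one piece in the first quadrant and one in the second, while a full turn starting at π/4 yields five pieces.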
A straightforward method to answer a complex query first decomposes it into basic sub-queries and then computes the k nearest neighbors for each basic query. Finally it finds the real k nearest neighbors by combining the results of the basic queries. However, this method is very expensive, as some sub-queries may contribute no real answers and need not be answered. To this end, we propose an efficient algorithm that prunes many unnecessary POIs. For each basic query, we first compute its candidate regions. Then we sort the candidate regions based on their MINDIST values. Next we access the candidate regions in order and prune unnecessary regions. The pseudo-code of the algorithm is shown in Figure 13.

§ We use α ∈ [0, 2π) and β ≤ α + 2π to denote any direction. If β > 2π, we decompose the direction into [α, 2π) and [2π, β] = [0, β − 2π]. Then we decompose them into basic queries and generate at most five sub-queries.

Algorithm 2: DESKS (P, q)
Input:  P: A collection of POIs
        q = ⟨(q.x, q.y); [α, β]; K, k⟩: A query
Output: P_q^k = {p | p ∈ P_q and p is a kNN of q}, where P_q is the set of POIs
        in the search direction that contain all the keywords in K.
 1  begin
 2      Initialize an empty priority queue Q_P for POIs;
 3      Let d_k denote the k-th smallest distance in Q_P;
 4      Initialize an empty priority queue Q_R for regions;
 5      Decompose q into q_1, q_2, ..., q_4; /* some may be empty */
 6      for 1 ≤ s ≤ 4 do
 7          Locate region R_{i_s} for q_s where q_s appears;
 8          add R_{i_s} into Q_R;
 9      while Q_R ≠ ∅ do
10          Get region R_{i_m} with minimal MINDIST(q, R_{i_m});
11          if MINDIST(q, R_{i_m}) ≥ d_k then return;
12          else CR_{i_m} = FINDCANDREGIONS(q, R_{i_m}, d_k);
13          for R_{i_m j} ∈ CR_{i_m} do
14              if MINDIST(q, R_{i_m j}) ≥ d_k then break;
15              else FINDCANDPOIS(q, R_{i_m j}, d_k, Q_P);
16          Pop R_{i_m} from Q_R;
17          if MINDIST(q, R_{i_m+1}) < d_k then
18              add R_{i_m+1} into Q_R;
19  end
Fig. 13. DESKS algorithm (using disk-based inverted lists)
We maintain two priority queues: Q_P for candidate POIs (line 2) and Q_R for regions (line 4). We first decompose the query into at most four sub-queries (line 5). Then for each sub-query q_s, we locate the region in which q_s appears (line 7) and add this region R_{i_s} into the region queue Q_R (line 8). Then we find the region R_{i_m} with the minimal MINDIST value in Q_R (line 10). If MINDIST(q, R_{i_m}) ≥ d_k, we terminate as we have found the k nearest neighbors (line 11); otherwise we find the candidate regions in R_{i_m}, denoted CR_{i_m} (line 12). For each candidate region R_{i_m j} ∈ CR_{i_m}, if MINDIST(q, R_{i_m j}) ≥ d_k, we break as there is no answer in R_{i_m j}, ..., R_{i_m M} (line 14); otherwise, we compute the candidate POIs in R_{i_m j} (line 15). Next we pop R_{i_m} from Q_R (line 16). For the next region R_{i_m+1} after R_{i_m}, if MINDIST(q, R_{i_m+1}) < d_k, we add it into the region queue Q_R (line 18). Iterating in this way, we find all answers of query q.
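The interplay of the two queues can be sketched on an abstract model. This is a simplified illustration, not the authors' implementation: each sub-query is represented by a hypothetical "stream" of regions given as (min_dist, [POI distances]) pairs, where min_dist is assumed to be a valid lower bound for the distances in that region and to be non-decreasing along the stream; geometry, keywords and sub-regions are abstracted away.

```python
import heapq
import math


def desks_search(streams, k):
    """Best-first search over several region streams; returns the k smallest distances.

    streams -- one list per sub-query; each entry is (min_dist, distances)
               with min_dist <= every distance in that region and
               non-decreasing along the list (assumptions of this sketch)
    """
    top = []  # Q_P: max-heap of negated distances, at most k entries

    def d_k():
        return -top[0] if len(top) == k else math.inf

    regions = []  # Q_R: min-heap of (min_dist, stream id, position)
    for s, stream in enumerate(streams):
        if stream:
            heapq.heappush(regions, (stream[0][0], s, 0))

    while regions:
        min_dist, s, pos = heapq.heappop(regions)
        if min_dist >= d_k():
            break  # no remaining region can improve the answer
        for dist in streams[s][pos][1]:
            if dist < d_k():
                if len(top) < k:
                    heapq.heappush(top, -dist)
                else:
                    heapq.heappushpop(top, -dist)
        nxt = pos + 1
        # Enqueue the next region of the same sub-query only if it may still help.
        if nxt < len(streams[s]) and streams[s][nxt][0] < d_k():
            heapq.heappush(regions, (streams[s][nxt][0], s, nxt))
    return sorted(-d for d in top)
```

Because regions from all sub-queries share one queue, a sub-query whose regions are all farther than the current d_k is never expanded, which is the saving over answering each sub-query separately.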
V. INCREMENTAL SEARCH ALGORITHMS
Mobile phone users will dynamically change directions if they cannot find the expected answers in the current direction. A naive method is to answer each new query from scratch. However, this method is very expensive. To address this issue, we propose to incrementally answer a query based on the cached results of previously issued queries. To avoid consuming a large amount of space, we only cache the k nearest neighbors for a query. We consider the following two cases of updating a direction¶.
Case 1: The user increases a direction from [α, β] to [α′, β′] with α′ < α and β′ > β. This corresponds to the case that the user widens the direction using two fingers on the mobile phone screen. Section V-A discusses how to answer such a query efficiently.
Case 2: The user moves the direction from [α, β] to [α + δθ, β + δθ]. This corresponds to the case that the user changes the direction by turning the mobile phone. Section V-B discusses how to answer such a query efficiently.
Note that our method can support any direction-update query using these two operations.
A. Increasing The Direction
Suppose a user has issued a query q with direction [α, β] and then issues a new query q′ by increasing the direction to [α′, β′] with α′ < α and β′ > β. We use the cached results of q to answer the new query q′ as follows. Every answer of q lies in the direction of q′ and is therefore a candidate answer of q′. Let d′_k (d_k) denote the k-th smallest distance of the nearest neighbors to query q′ (q). We have d′_k ≤ d_k. Thus we can use d_k as an upper bound.
We insert the k nearest neighbors of q into the priority queue of q′. Then we decompose q′ into three queries: q_1 with direction [α′, α], q_2 with direction [β, β′], and q with direction [α, β]. We only need to answer q_1 and q_2 with bound d_k. We answer the two queries simultaneously, in the same way as the sub-queries in Section IV-B. Note that in the two new directions, if a POI p (or region R_ij) has a distance to q larger than d_k, we prune p (or region R_ij); otherwise we insert it into the priority queue (or access the region). Thus we can incrementally and efficiently answer query q′.
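The seeding idea can be sketched as follows. This is an illustration under simplifying assumptions that are not in the paper: keywords are ignored, the POIs are scanned linearly rather than through the index, no direction wraps past 2π, and the names expand, brute_knn and direction are hypothetical.

```python
import heapq
import math


def direction(q, p):
    """Angle of p as seen from q, normalised to [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)


def brute_knn(pois, q, lo, hi, k):
    """Reference answer: k nearest POIs with direction in [lo, hi]."""
    hits = sorted((math.dist(q, p), pid) for pid, p in pois.items()
                  if lo <= direction(q, p) <= hi)
    return hits[:k]


def expand(pois, q, cached, old, new, k):
    """Answer the widened query by scanning only the two new wedges.

    cached -- list of (dist, poi_id): the k nearest neighbours for direction old
    old, new -- (alpha, beta) and (alpha2, beta2) with alpha2 <= alpha, beta2 >= beta
    """
    (alpha, beta), (alpha2, beta2) = old, new
    heap = [(-d, pid) for d, pid in cached]  # seed with the cached answers
    heapq.heapify(heap)
    for pid, p in pois.items():
        theta = direction(q, p)
        if not (alpha2 <= theta < alpha or beta < theta <= beta2):
            continue  # old direction is already covered by the cache
        dist = math.dist(q, p)
        d_k = -heap[0][0] if len(heap) == k else math.inf
        if dist < d_k:
            if len(heap) < k:
                heapq.heappush(heap, (-dist, pid))
            else:
                heapq.heappushpop(heap, (-dist, pid))
    return sorted((-nd, pid) for nd, pid in heap)
```

Since the heap starts full whenever the old query returned k answers, the bound d_k is tight from the first POI examined in the new wedges.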
B. Moving The Direction
Suppose a user has issued a query q with direction [α, β] and then issues a new query q′ by moving the direction to [α + δθ, β + δθ]. First consider δθ > 0. If α + δθ > β, q and q′ have no overlapping direction and we answer the new query from scratch. Otherwise, q and q′ share the overlapping direction [α + δθ, β]. We examine each of the k nearest neighbors of q, and if it is in [α + δθ, β], we insert it into the priority queue of query q′ and update the k-th smallest distance threshold d′_k. Then we answer the new query with direction [β, β + δθ] using threshold d′_k. If we find k answers in the priority queue or in direction [β, β + δθ] within distance d_k, we do not need to access regions in direction [α + δθ, β]; otherwise, we need to access those regions in direction [α + δθ, β] with MINDIST values no smaller than d_k (the cached answers already include every matching POI in that direction that is closer than d_k). Thus we can use the bound d′_k to do effective pruning. Similarly, if δθ < 0 and β + δθ > α, we can use the above method to answer query q′, with [α + δθ, α] as the new direction. Thus we can incrementally and efficiently answer query q′.

¶ In this paper we do not consider moving queries (changing locations).
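The case analysis above can be sketched as a small helper that splits the moved direction into the part shared with the old query and the part that is new. The function name split_moved_direction and its return layout are assumptions made for illustration only.

```python
def split_moved_direction(alpha, beta, delta):
    """Split the moved direction [alpha + delta, beta + delta].

    Returns (overlap, fresh): overlap is the interval shared with the old
    direction [alpha, beta], or None if there is none (answer from scratch);
    fresh is the interval that must be searched with the index.
    """
    new_lo, new_hi = alpha + delta, beta + delta
    if delta >= 0:
        if new_lo > beta:
            return None, (new_lo, new_hi)
        return (new_lo, beta), (beta, new_hi)
    if new_hi < alpha:
        return None, (new_lo, new_hi)
    return (alpha, new_hi), (new_lo, alpha)
```

Cached neighbors falling inside overlap seed the priority queue of the new query, and only fresh needs a first pass over the index.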
VI. EXPERIMENTAL STUDY
We have implemented our proposed methods. We compared them with two state-of-the-art methods, MIR²-tree [6] and LkT [5]‖. We extended these methods to support direction-aware search by examining whether each accessed MBR (or POI) is in the search direction. For LkT, we obtained the code from the authors [5], which was implemented in Java. For MIR²-tree, we implemented it in C++. Our algorithms were also implemented in C++. All the C++ code was compiled using GCC 4.2.3 with the -O3 flag. As the baseline algorithms used disk-based indexes, we also used disk-based index structures. All the experiments were run on an Ubuntu machine with an Intel Core E5450 3.0GHz CPU and 4 GB memory.
We used three real datasets: POIs in California (CA), POIs in Virginia (VA), and POIs in China (CN). The statistics of the datasets are summarized in Table II. We generated five query sets with keyword numbers from 1 to 5, and each query set had 1000 queries.

TABLE II
DATASETS.
                                            CA      VA      CN
Total number of POIs (millions)             0.91    0.96    16.5
Total number of terms (millions)            9.7     4.6     63.6
Total number of unique terms (thousands)    35      26      753
Average number of unique terms per POI      8.57    4.5     3.85
A. Varying M and N
In this section, we evaluate the effect of varying the region number N and the sub-region number M. Figure 14 shows the results. We see that different values of N and M had no significant effect on the performance for M > 50. On the VA dataset, the running time was about 2.3-2.7 ms for every combination of M and N, and we got the highest performance at N=100 and M=150. On the CA dataset, the running time was 11-15 ms for different M and N values, and we got the highest performance at N=100 and M=150. On the CN dataset, the time was about 9-16 ms. The highest performance was achieved at N=1000 and M=600. Based on the results, we concluded that each region R_i should contain roughly 10,000 POIs and each sub-region R_ij roughly 100 POIs. In the remaining experiments, we used N=100 and M=150 on the CA and VA datasets, and N=1000 and M=600 on the CN dataset.
B. Evaluation on Pruning Techniques
In this section, we evaluate our pruning techniques. We implemented three methods. (1) DESKS+R: We used the region-pruning techniques and function MINDIST(q, R_i) to prune R_i. (2) DESKS+D: We used the direction-pruning techniques and function MINDIST(q, R_ij) to prune R_ij. (3) DESKS+RD: We used both region pruning and direction pruning.
Varying k: We first evaluated the pruning techniques by varying k on the 5000 queries with α = 0 and β = π/3. Figure 15 shows the results. We can see that DESKS+D and DESKS+RD significantly outperformed DESKS+R. This is because DESKS+R needed to access many unnecessary regions, whereas the direction-based pruning can prune large numbers of unnecessary regions. DESKS+RD was also better than DESKS+D, especially on the CN dataset. This is because DESKS+RD can prune many regions R_i. For example, on the CN dataset, for k=100, DESKS+R took 55 ms, DESKS+D improved it to 32 ms, and DESKS+RD further improved it to 16 ms. There are two reasons that the improvement of DESKS+RD over DESKS+D was not significant on the CA and VA datasets. Firstly, there were small numbers of POIs that contain all keywords, so both DESKS+D and DESKS+RD needed to access many regions. Secondly, there were small numbers of regions (R_i). As N=100, DESKS+RD cannot prune large numbers of regions.

Fig. 14. Average search performance: Varying M and N (5000 queries, k = 10, α = 0, β = π/3). Panels: (a) VA, (b) CA, (c) CN. x-axis: N; y-axis: elapsed time (ms); one curve per value of M.
Fig. 15. Average search performance: Varying k (5000 queries, α = 0, β = π/3). Panels: (a) VA, (b) CA, (c) CN. x-axis: top-k; y-axis: elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.
Fig. 16. Average search performance: Varying directions β − α from π/6 to 2π (5000 queries, k = 10). Panels: (a) VA, (b) CA, (c) CN. x-axis: direction (×π/6); y-axis: elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.

‖ As MIR²-tree generally achieves much higher performance than IR²-tree, we do not report results for IR²-tree.
Varying directions: We evaluated the pruning techniques by varying directions on 5000 queries with k = 10. Figure 16 shows the results. Similarly, DESKS+D and DESKS+RD significantly outperformed DESKS+R. On the VA dataset, DESKS+R took more than 20 ms to answer a query, while DESKS+D and DESKS+RD took only about 2 ms. This is because DESKS+R needed to enumerate many regions, while DESKS+D and DESKS+RD can prune large numbers of regions based on the direction-aware indexes.
C. Comparison with Existing Methods
We compared our algorithm DESKS (DESKS+RD) with the state-of-the-art methods MIR²-tree and LkT. We first compared the index sizes and indexing time, as shown in Table III. LkT was very expensive to build indexes for, as it needed to cluster keywords in POIs. On the CN dataset, it took more than 2 days to index 1 million POIs, and it would take 1 month to index 16 million POIs. Thus we do not show LkT results on the CN dataset. MIR²-tree used R-tree and keyword signatures to build indexes. Although DESKS had larger index sizes than MIR²-tree (as DESKS built indexes for O_bl, O_br, O_tr, O_tl), DESKS still had acceptable index sizes. LkT had much larger index sizes as it built inverted lists for each R-tree node.

TABLE III
INDEXING TIME AND SIZES.
Data   Size (MB)   Index size (MB)                 Index time (minutes)
                   MIR²-tree   LkT     DESKS       MIR²-tree   LkT    DESKS
CA     72.2        72          1430    265         1.3         780    1.8
VA     54.8        76          920     149         0.8         690    1.2
CN     805         1304        –       3552        25          –      33
Varying directions: We first compared different methods by varying directions on 5000 queries with k = 10. Figure 17 shows the results. Although LkT and MIR²-tree achieved high performance for large directions, they were very slow for small directions. This is because they needed to enumerate many MBRs and POIs, which was very expensive. For example, on the CA dataset, they took 200 ms for direction 2π, but took more than 5 seconds for direction π/3. DESKS only took 20 ms for any direction, since DESKS can use the index to do effective direction pruning. Even for direction 2π, DESKS still outperformed existing methods. There are three reasons. Firstly, our region structure is very effective and can be kept in memory. Secondly, our region inverted lists can prune many unnecessary POIs. Thirdly, existing methods usually achieved high performance for POIs with many keywords (documents) [5]. However, real POIs do not have many keywords.

Fig. 17. Performance comparison: Varying directions β − α from π/6 to 2π (5000 queries, k = 10). Panels: (a) VA, (b) CA, (c) CN. x-axis: direction (×π/6); y-axis: elapsed time (ms, log scale); curves: LkT (VA and CA only), MIR²-tree, DESKS.
Fig. 18. Performance comparison: Varying k (5000 queries, α = 0, β = π/3). Panels: (a) VA, (b) CA, (c) CN. x-axis: top-k; y-axis: elapsed time (ms, log scale); curves: LkT (VA and CA only), MIR²-tree, DESKS.
Fig. 19. Performance comparison: Varying numbers of keywords (1000 queries in each query set, k = 10, α = 0, β = π/3). Panels: (a) VA, (b) CA, (c) CN. x-axis: number of keywords; y-axis: elapsed time (ms, log scale); curves: LkT (VA and CA only), MIR²-tree, DESKS.
Varying k: Then we compared different methods by varying k on 5000 queries with α = 0 and β = π/3. Figure 18 shows the results. We can see that DESKS significantly outperformed MIR²-tree and LkT, by as much as 2-3 orders of magnitude. On the VA dataset, MIR²-tree and LkT took about 500 ms, and DESKS improved the time to 2-5 ms. The main reason is that existing methods cannot use the index to do effective direction pruning. DESKS used the novel direction-aware index, which can prune large numbers of unnecessary regions and POIs.
Varying the number of keywords: Next we compared different methods by varying keyword numbers and setting k = 10, α = 0, and β = π/3. Figure 19 shows the results. We can see that for different numbers of keywords, DESKS was still much better than MIR²-tree and LkT. For different numbers of keywords, DESKS only took about 10-20 ms.
D. Evaluation on Incremental Search
In this section, we test our incremental search method. We first initialized queries with β − α = π/3 and then increased the directions by π/36, ..., 12π/36. Figure 20(a) shows the results. We can see that our incremental method DESKS-INCRE outperformed DESKS. This is because DESKS-INCRE can incrementally answer a query using the previously issued queries. We also evaluated DESKS-INCRE by moving directions. Figure 20(b) shows the results. We again initialized queries with β − α = π/3 and then moved the directions by −6π/36, ..., 6π/36. We can see that for a small move, DESKS-INCRE was much better than DESKS, as DESKS-INCRE can use a tighter bound to answer new queries. For a large move, the improvement was not high, as DESKS-INCRE needed to answer queries from scratch.
E. Scalability
In this section, we evaluate the scalability on the CN dataset by varying the number of POIs. Figure 21 shows the results with different k values and directions. We can see that our method scaled very well. This is attributed to our effective direction-aware index structures and effective pruning techniques.
VII. RELATED WORK
Many studies on spatial keyword search have been proposed recently [25], [3], [9], [6], [23], [5], [24], [22], [1], [21], [2], [19], [13]. The work most related to our problem is the study by Felipe et al. [6], which proposed index structures that integrate signature files and R-tree to enable top-k spatial keyword queries. Another similar study is by Cong et al. [5], which combined inverted files and R-tree to answer the location-aware top-k text retrieval (LkT) query. Our direction-aware spatial keyword query is different from theirs as we have a direction constraint.
Fig. 20. Incremental search on the CN dataset (k = 10). Panels: (a) increasing directions, (b) moving directions. x-axis: direction (×π/36); y-axis: elapsed time (ms); curves: DESKS, DESKS-INCRE.
Fig. 21. Scalability on the CN dataset. Panels: (a) varying k (β − α = π/3), curves for k = 1, 10, 20, 50, 100; (b) varying directions (k = 10), curves for π/3, 2π/3, π, 4π/3, 5π/3, 2π. x-axis: number of POIs (millions); y-axis: elapsed time (ms).

Zhou et al. [25] proposed to find web documents relevant to user input keywords within a pre-specified region. They developed several methods by combining R-tree and inverted indexes. Chen et al. [3] extended this problem by supporting large numbers of "footprint representations." Hariharan et al. [9] focused on finding objects containing a set of keywords within a specific region. They proposed hybrid index structures that integrate R-tree and inverted lists. Zhang et al. [23], [24] introduced the m-closest keyword query (mCK query), which aims at finding the closest objects that match keywords. Cao et al. [1] studied how to find top-k prestige-based relevant spatial web objects. Yao et al. [22] tackled the problem of answering approximate string match queries in spatial databases. Wu et al. [21] studied spatial keyword search for moving objects. Lu et al. [13] extended reverse kNN techniques to support reverse spatial and textual k nearest neighbor search. Roy and Chakrabarti [19] studied type-ahead search in spatial databases using materialization techniques. Cao et al. [2] studied collective keyword search by considering multiple points. Leung et al. [12] proposed to use locations for personalized search. The above queries differ substantially from our direction-aware spatial keyword query.
There are many studies on kNN search [18], [16], [10], [11], [20], [17]. Ferhatosmanoglu et al. [7] studied constrained nearest neighbor search using a polygon as a constraint. Cheng et al. [4] studied constrained kNN queries over uncertain data. Gao et al. [8] and Nutanong et al. [14] proposed to answer visible kNN queries. Patroumpas et al. [15] studied the problem of monitoring object orientations. However, their methods cannot solve our problem, as we support keyword-based search, and the direction constraint we consider is different from their constraints.
Although we can build two separate indexes, one for keywords and another for locations, this method is expensive, as it cannot simultaneously apply textual and spatial pruning.
VIII. CONCLUSION
In this paper we have studied the problem of direction-aware spatial keyword search. We find the k nearest neighbors to the query that contain all input keywords and satisfy the direction constraint. To efficiently answer a direction-aware spatial keyword query, we proposed novel indexing structures, which can prune large numbers of unnecessary POIs. We developed effective region-based pruning and direction-based pruning techniques to increase the search performance. We devised efficient algorithms to answer direction-aware spatial keyword queries. We also studied how to incrementally answer a query. We have implemented our algorithms, and experimental results show that our method achieves high performance and outperforms existing methods significantly.
IX. ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partly supported by the National Natural Science Foundation of China under Grant No. 61003004 and 60873065, National Grand Fundamental Research 973 Program of China under Grant No. 2011CB302206, National S&T Major Project of China under Grant No. 2011ZX01042-001-002, and "NExT Research Center" funded by MDA, Singapore, under Grant No. WBS:R-252-300-001-490.
REFERENCES
[1] X. Cao, G. Cong, and C. S. Jensen. Retrieving top-k prestige-based
relevant spatial web objects. PVLDB, 3(1):373–384, 2010.
[2] X. Cao, G. Cong, C. S. Jensen, and B. C. Ooi. Collective spatial keyword
querying. In SIGMOD Conference, pages 373–384, 2011.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in
geographic web search engines. In SIGMOD Conference, pages 277–
288, 2006.
[4] R. Cheng, J. Chen, M. F. Mokbel, and C.-Y. Chow. Probabilistic
verifiers: Evaluating constrained nearest-neighbor queries over uncertain
data. In ICDE, pages 973–982, 2008.
[5] G. Cong, C. S. Jensen, and D. Wu. Efficient retrieval of the top-k most
relevant spatial web objects. PVLDB, 2009.
[6] I. D. Felipe, V. Hristidis, and N. Rishe. Keyword search on spatial
databases. In ICDE, 2008.
[7] H. Ferhatosmanoglu, I. Stanoi, D. Agrawal, and A. E. Abbadi. Con-
strained nearest neighbor queries. In SSTD, pages 257–278, 2001.
[8] Y. Gao, B. Zheng, W.-C. Lee, and G. Chen. Continuous visible nearest
neighbor queries. In EDBT, pages 144–155, 2009.
[9] R. Hariharan, B. Hore, C. Li, and S. Mehrotra. Processing spatial-
keyword (SK) queries in geographic information retrieval (GIR) systems.
In SSDBM, 2007.
[10] G. R. Hjaltason and H. Samet. Distance browsing in spatial databases.
ACM Trans. Database Syst., 1999.
[11] M. R. Kolahdouzan and C. Shahabi. Voronoi-based k nearest neighbor
search for spatial network databases. In VLDB, pages 840–851, 2004.
[12] K. W.-T. Leung, D. L. Lee, and W.-C. Lee. Personalized web search
with location preferences. In ICDE, pages 701–712, 2010.
[13] J. Lu, Y. Lu, and G. Cong. Reverse spatial and textual k nearest neighbor
search. In SIGMOD Conference, pages 349–360, 2011.
[14] S. Nutanong, E. Tanin, and R. Zhang. Visible nearest neighbor queries.
In DASFAA, pages 876–883, 2007.
[15] K. Patroumpas and T. K. Sellis. Monitoring orientation of moving
objects around focal points. In SSTD, pages 228–246, 2009.
[16] S. Pramanik and J. Li. Fast approximate search algorithm for nearest
neighbor queries in high dimensions. In ICDE, page 251, 1999.
[17] J. B. Rocha-Junior, A. Vlachou, C. Doulkeridis, and K. Nørvåg. Efficient
processing of top-k spatial preference queries. PVLDB, 4(2):93–104,
2010.
[18] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries.
In SIGMOD Conference, 1995.
[19] S. B. Roy and K. Chakrabarti. Location-aware type ahead search on
spatial databases: semantics and efficiency. In SIGMOD Conference,
pages 361–372, 2011.
[20] Y. Tao, D. Papadias, and Q. Shen. Continuous nearest neighbor search.
In VLDB, pages 287–298, 2002.
[21] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong. Efficient continuously
moving top-k spatial keyword query processing. In ICDE, pages 541–
552, 2011.
[22] B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string
search in spatial databases. In ICDE, 2010.
[23] D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa.
Keyword search in spatial databases: Towards searching by document.
In ICDE, 2009.
[24] D. Zhang, B. C. Ooi, and A. K. H. Tung. Locating mapped resources
in web 2.0. In ICDE, pages 521–532, 2010.
[25] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W.-Y. Ma. Hybrid index
structures for location-based web search. In CIKM, 2005.