
DESKS: Direction-aware spatial keyword search

Authors: Guoliang Li, Jianhua Feng, Jing Xu
Department of Computer Science, Tsinghua University, Beijing 100084, China
liguoliang@tsinghua.edu.cn; fengjh@tsinghua.edu.cn; xmandbq@gmail.com

Abstract

Location-based services (LBS) have been widely accepted by mobile users. Many LBS users have a direction-aware search requirement: answers must lie in a given search direction. However, to the best of our knowledge, no existing research has investigated direction-aware search. A straightforward method first finds candidates without considering the direction constraint, and then generates the answers by pruning those candidates that violate the direction constraint. However, this method is rather expensive, as it involves a lot of useless computation on many unnecessary directions. To address this problem, we propose a direction-aware spatial keyword search method that inherently supports direction-aware search. We devise novel direction-aware indexing structures to prune unnecessary directions. We develop effective pruning techniques and search algorithms to efficiently answer a direction-aware query. As users may dynamically change their search directions, we propose to answer queries incrementally. Experimental results on real datasets show that our method achieves high performance and outperforms existing methods significantly.
I. INTRODUCTION
Location-based services (LBS) have been widely accepted
by mobile users. Many online location-based services are
available, such as AT&T (http://www.wireless.att.com/lbs) and
go2 (http://www.go2.com/). Recently, many LBS users have a
direction-aware search requirement: answers must lie in a given
search direction. For example, a user on the highway wants
to find the nearest gas stations or restaurants. She requires
the answers to be ahead of her and to the right of her driving
direction if she is in a right-hand traffic country (e.g., the US
and China). As another example, consider a user walking to a
supermarket who wants to find an ATM along her walking
direction so as to avoid a long walk. She, too, has a
direction-aware search requirement. There are many other
direction-aware search requirements in LBS, e.g., multiple-
destination routing and virtual reality (showing a local 3D
streetscape). More importantly, many modern mobile phones
(e.g., the iPhone 4 and HTC phones) have a GPS and a compass.
We can easily get the user's location via the GPS and the
direction via the compass. Thus we can use the user's location
and search direction to improve the search experience in LBS.
However, to the best of our knowledge, no existing research
has investigated direction-aware search. A straightforward
method to support direction-aware search first finds the
candidates without considering the direction constraint (e.g.,
[6] and [5]) and then generates the answers by pruning those
candidates that violate the direction constraint. However, this
method is rather expensive, as it involves a lot of useless
computation on many unnecessary directions.
To address this problem, we propose a direction-aware
spatial keyword search method, called DESKS, which inherently
supports direction-aware search. We first formulate the
problem of direction-aware spatial keyword search as follows.
Consider a set of Points of Interest (POIs), where each POI
is associated with spatial information and a textual description.
Given a direction-aware spatial keyword query with a location,
a direction, and a set of keywords, direction-aware search
finds the k nearest neighbors of the query that are in the search
direction and contain all input keywords.
To support direction-aware spatial keyword queries, we
devise novel direction-aware index structures to prune
unnecessary directions. We first group the POIs based on their
distances to the bottom-left point of the Minimum Bounding
Rectangle (MBR) that contains all POIs. Then, within each
group, we sort the POIs based on their directions to the
bottom-left point. Given a query, we can deduce a direction
range with a lower direction bound and an upper direction
bound. We can prove that if the direction of a POI to the
bottom-left point is not in the direction range of the query,
the POI cannot be an answer, and we can prune it. Similarly,
we can prune a whole group of POIs based on the direction
range. Motivated by this observation, we develop novel
direction-aware index structures, effective pruning techniques,
and efficient search algorithms to facilitate direction-aware
spatial keyword search.
To summarize, we make the following contributions.
• We formulate the problem of direction-aware spatial
keyword search and propose an efficient direction-aware
search method to address this problem.
• We devise a novel direction-aware index structure that
groups the POIs based on their distances and directions.
The index structure can be used to effectively prune
many unnecessary POIs.
• We develop effective pruning techniques and search
algorithms to answer direction-aware spatial keyword queries.
• As mobile phone users may dynamically change search
directions, we propose to incrementally answer a query
based on the cached results of previously issued queries.
• We have implemented our method, and the experimental
results show that our method achieves high performance
and outperforms existing methods significantly.
The rest of this paper is organized as follows. We first
formulate the problem of direction-aware spatial keyword
search and devise a novel indexing structure in Section II. We
develop effective pruning techniques in Section III. Section IV
gives efficient algorithms to answer a direction-aware query.
We discuss how to incrementally answer a query in Section V.
Experimental results are provided in Section VI. We review
related work in Section VII and conclude in Section VIII.
II. DIRECTION-AWARE SPATIAL KEYWORD SEARCH
A. Problem Formulation
Data: Consider a set of POIs, $P = \{p_1, p_2, \cdots, p_{|P|}\}$. Each POI $p_i$ has a location $(p_i.x, p_i.y)$, where $p_i.x$ is the x-coordinate and $p_i.y$ is the y-coordinate of the POI. $p_i$ is also associated with a set of keywords, denoted by $p_i.d$. Thus a POI is denoted by $p = \langle (p.x, p.y); p.d \rangle$.
Query: A query $q$ contains a location $(q.x, q.y)$ with x-coordinate $q.x$ and y-coordinate $q.y$. Query $q$ has a direction constraint $[\alpha, \beta]$, which denotes that the user is only interested in the POIs whose directions to $q$ are in $[\alpha, \beta]$. Query $q$ contains a set of user-input keywords $K = \{k_1, k_2, \cdots, k_{|K|}\}$. Users can specify an integer $k$ to find the top-$k$ relevant answers. Thus query $q$ is denoted by $q = \langle (q.x, q.y); [\alpha, \beta]; K; k \rangle$.
Answer: Let $R$ denote the Minimum Bounding Rectangle (MBR) that contains all POIs in $P$. Given a query $q$ with direction $[\alpha, \beta]$, let $S_q$ denote the sector centered at $q$ with radius $r$ and an angle from $\alpha$ to $\beta$, where $r$ is the maximal distance from $q$ to the boundary of region $R$. Let $R_q$ denote the intersection of $S_q$ and $R$, which is the search region satisfying the direction constraint. A POI $p$ is an answer of query $q$ if $p$ is in $R_q$ and $p.d$ contains all keywords in $K$. Let $P_q$ denote the set of all answers of $q$. We find the $k$ nearest neighbors of $q$ from $P_q$. Next we formulate our problem.
Definition 1 (DIRECTION-AWARE SPATIAL KEYWORD SEARCH): Given a set of POIs $P$ and a query $q = \langle (q.x, q.y); [\alpha, \beta]; K; k \rangle$, let $P_q$ denote the set of POIs in $R_q$ that contain all keywords in $K$. DESKS finds a subset $P^k_q$ of $P_q$ with $k$ POIs such that $\forall p \in P^k_q$ and $\forall p' \in P_q - P^k_q$, $dist(p, q) \le dist(p', q)$, where $dist(\cdot)$ is a distance function; in this paper we use Euclidean distance.
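For reference, Definition 1 can be checked by exhaustive search. The sketch below is not part of DESKS; it assumes, as in the formulation, that every POI lies inside $R$, so that $p \in R_q$ reduces to the direction from $q$ to $p$ lying in $[\alpha, \beta]$. The `POI` tuple and the function name `desks_bruteforce` are hypothetical.

```python
import heapq
import math
from typing import NamedTuple


class POI(NamedTuple):
    name: str
    x: float
    y: float
    keywords: frozenset


def desks_bruteforce(pois, qx, qy, alpha, beta, keywords, k):
    """Exhaustive answer to Definition 1 (a reference check, not an index-based method)."""
    required = set(keywords)
    candidates = []
    for p in pois:
        # Direction from q to p, counter-clockwise from the x-axis, in [0, 2*pi).
        # A POI located exactly at q gets direction 0 (atan2(0, 0) == 0).
        theta = math.atan2(p.y - qy, p.x - qx)
        if theta < 0:
            theta += 2 * math.pi
        if alpha <= theta <= beta and required <= p.keywords:
            candidates.append((math.hypot(p.x - qx, p.y - qy), p.name))
    # The k smallest Euclidean distances form P_q^k, returned as (distance, name) pairs.
    return heapq.nsmallest(k, candidates)
```

With $q = (0, 0)$ and $[\alpha, \beta] = [0, \frac{\pi}{2}]$, a POI due west of $q$ is rejected even when it is the closest keyword match, which is exactly the case the direction constraint is meant to exclude.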
Consider the example in Figure 1. There are 24 POIs. Given a query $q$ with keywords “chinese food”, the ten highlighted POIs $p_3, p_4, p_5, p_6, p_9, p_{12}, p_{15}, p_{21}, p_{22}, p_{23}$ contain both keywords. Without the direction constraint, $p_3$ and $p_4$ are the two nearest neighbors. With the direction constraint shown in Figure 1, $p_{12}$ and $p_{22}$ are the two nearest neighbors.
We can extend existing spatial keyword search methods (e.g., [6] and [5]) to support our problem. The method contains two steps. (1) The filter step ignores the direction constraint and finds the $k$ nearest neighbors of query $q$ that contain all keywords. (2) The verification step checks, for each POI found in the first step, whether the POI is in the search direction; if so, it is one of the $k$ nearest neighbors of $q$. As many of the $k$ nearest neighbors found by the filter step may violate the direction constraint, the method needs to repeatedly execute the two steps until it finds $k$ answers. Although we can incorporate the verification step into the filter step, this method still needs to visit many unnecessary POIs. To address this problem, we propose a direction-aware spatial keyword search method to achieve high performance.
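The two-step baseline above can be sketched as follows. This is an illustration under stated assumptions, not the method of [6] or [5]: the filter step is simulated by sorting all keyword-matching POIs by distance (a real system would use a spatial keyword index), and doubling the number of fetched neighbors on each round is a hypothetical growth policy. The function name and the `(x, y, keywords)` tuple layout are also illustrative. The second return value counts the POIs the filter step had to visit.

```python
import math


def filter_and_verify(pois, q, alpha, beta, keywords, k):
    """Baseline: kNN ignoring direction, then verify; enlarge the fetch until k answers."""
    qx, qy = q
    required = set(keywords)
    # Stand-in for an index-based kNN: keyword matches ordered by distance to q.
    by_dist = sorted((p for p in pois if required <= p[2]),
                     key=lambda p: math.hypot(p[0] - qx, p[1] - qy))
    fetch = k
    while True:
        batch = by_dist[:fetch]          # filter step: nearest neighbours, no direction
        answers = []
        for p in batch:                  # verification step: direction check
            theta = math.atan2(p[1] - qy, p[0] - qx) % (2 * math.pi)
            if alpha <= theta <= beta:
                answers.append(p)
        if len(answers) >= k or fetch >= len(by_dist):
            return answers[:k], len(batch)
        fetch *= 2                       # hypothetical policy: double and repeat
```

When most nearby matches lie behind the user, the loop keeps re-fetching and re-verifying POIs that can never qualify, which is the wasted work the text describes.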
B. Direction-aware Indexing Structures
Given a set of POIs, we first generate the MBR $R$ that contains all POIs. Let $O_{bl}$, $O_{br}$, $O_{tr}$, $O_{tl}$ respectively denote the bottom-left point, the bottom-right point, the top-right point, and the top-left point of $R$, as shown in Figure 1.

Footnote: We suppose $q \in R$; our method can be extended to support $q \notin R$.

[Fig. 1. A running example: 24 POIs partitioned into regions $R_1$, $R_2$, $R_3$ and sub-regions $R_{11}=\{p_1, p_2\}$, $R_{12}=\{p_3, p_4\}$, $R_{13}=\{p_5, p_6\}$, $R_{14}=\{p_7, p_8\}$; $R_{21}=\{p_9, p_{10}\}$, $R_{22}=\{p_{11}, p_{12}\}$, $R_{23}=\{p_{13}, p_{14}\}$, $R_{24}=\{p_{15}, p_{16}\}$; $R_{31}=\{p_{17}, p_{18}\}$, $R_{32}=\{p_{19}, p_{20}\}$, $R_{33}=\{p_{21}, p_{22}\}$, $R_{34}=\{p_{23}, p_{24}\}$]
We sort the POIs based on their distances to the bottom-left point $O_{bl}$. Without loss of generality, assume the sorted POIs are $p_1, p_2, \cdots, p_{|P|}$, where $dist(p_i, O_{bl}) \le dist(p_j, O_{bl})$ for $i < j$. Then we evenly partition them into $N$ disjoint buckets $B_1, B_2, \cdots, B_N$. If every POI has a distinct distance to $O_{bl}$, we have $B_i = \{p_{(i-1)\times\lambda+1}, \cdots, p_{i\times\lambda}\}$ for $1 \le i \le N-1$ and $B_N = \{p_{(N-1)\times\lambda+1}, \cdots, p_{|P|}\}$, where $\lambda = \frac{|P|}{N}$. If multiple POIs have the same distance to $O_{bl}$, we partition the POIs into buckets as follows. We first put the first $\lambda$ POIs into the first bucket $B_1$. If $dist(p_{\lambda+1}, O_{bl}) = dist(p_\lambda, O_{bl})$, we add $p_{\lambda+1}$ into $B_1$; otherwise, we add the $\lambda$ POIs starting with $p_{\lambda+1}$ into $B_2$. Iteratively we can put each POI into a bucket. Let $r_{i-1}$ denote the smallest distance of POIs in $B_i$ for $1 \le i \le N$. We draw $N-1$ arcs centered at $O_{bl}$ with radii $r_1, r_2, \cdots, r_{N-1}$. The $N-1$ arcs partition $R$ into $N$ regions (quarter concentric rings) $R_1, R_2, \cdots, R_N$, where $R_1$ is within $r_1$, $R_N$ is outside $r_{N-1}$, and $R_i$ is between $r_{i-1}$ and $r_i$ for $1 < i < N$. Obviously the POIs in $B_i$ fall in $R_i$; in particular, a POI on the $i$-th arc belongs to region $R_{i+1}$. Hence the distance of any POI in $R_i$ to $O_{bl}$ is in $[r_{i-1}, r_i)$ for $1 \le i < N$ ($r_0 = dist(p_1, O_{bl})$). For example, in Figure 1, we partition the POIs into three regions $R_1 = \{p_1, \cdots, p_8\}$, $R_2 = \{p_9, \cdots, p_{16}\}$, and $R_3 = \{p_{17}, \cdots, p_{24}\}$.
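A minimal sketch of this distance-based bucketing, including the tie rule, assuming $O_{bl} = (0, 0)$ and $\lambda = \lfloor |P|/N \rfloor$ (the rounding is an assumption; the text only gives $|P|/N$). The function name `partition_by_distance` is hypothetical. It returns the buckets and the radii $r_0, r_1, \cdots$, where the $(i-1)$-th entry is the smallest distance in $B_i$.

```python
import math


def partition_by_distance(points, n):
    """Split points into at most n distance buckets B_1..B_N around O_bl = (0, 0)."""
    def dist(p):
        return math.hypot(p[0], p[1])

    pts = sorted(points, key=dist)
    lam = max(1, len(pts) // n)          # bucket size lambda (floor is an assumption)
    buckets, i = [], 0
    while i < len(pts):
        if len(buckets) == n - 1:        # the last bucket B_N takes the remaining POIs
            buckets.append(pts[i:])
            break
        end = min(i + lam, len(pts))
        # Tie rule: a POI at the same distance as the bucket's last POI joins that bucket.
        while end < len(pts) and dist(pts[end]) == dist(pts[end - 1]):
            end += 1
        buckets.append(pts[i:end])
        i = end
    # radii[i - 1] = r_{i-1}, the smallest distance in B_i (so radii[0] = r_0).
    radii = [dist(b[0]) for b in buckets]
    return buckets, radii
```

Because every POI at distance exactly $r_i$ is placed in $B_{i+1}$, the half-open interval $[r_{i-1}, r_i)$ holds for each ring, matching the rule that a POI on the $i$-th arc belongs to $R_{i+1}$.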
Each POI $p$ in region $R_i$ has a direction to the bottom-left point $O_{bl}$, denoted by $p_\theta = \arctan\frac{p.y - O_{bl}.y}{p.x - O_{bl}.x}$. For ease of presentation, suppose $O_{bl} = (0, 0)$; thus $p_\theta = \arctan\frac{p.y}{p.x}$. We sort the POIs in $R_i$ based on their directions in ascending order. Similarly, we evenly partition the POIs in $R_i$ into $M$ buckets $B_{i1}, B_{i2}, \cdots, B_{iM}$. Each bucket contains about $\frac{|P|}{M \times N}$ POIs.

Suppose the minimal direction of POIs in bucket $B_{ij}$ is $\theta_{i,j-1}$ for $1 \le j \le M$. We use $M-1$ lines from $O_{bl}$ with directions $\theta_{i1}, \theta_{i2}, \cdots, \theta_{i,M-1}$ to partition $R_i$ into $M$ sub-regions (parts of concentric rings) $R_{i1}, R_{i2}, \cdots, R_{iM}$. Obviously the direction of any POI in $R_{ij}$ is in $[\theta_{i,j-1}, \theta_{ij})$. For example, in Figure 1, we partition each $R_i$ into four sub-regions. For instance, we partition $R_2$ into $R_{21}=\{p_9, p_{10}\}$, $R_{22}=\{p_{11}, p_{12}\}$, $R_{23}=\{p_{13}, p_{14}\}$, and $R_{24}=\{p_{15}, p_{16}\}$.
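The direction-based split of one ring can be sketched the same way. This assumes $O_{bl} = (0, 0)$ and uses sub-regions of $\lceil |R_i|/M \rceil$ POIs so that at most $M$ are produced; ties in direction are not handled specially, and the function name `partition_by_direction` is hypothetical.

```python
import math


def partition_by_direction(points, m):
    """Split the POIs of one ring R_i into at most m direction sub-regions R_i1..R_iM."""
    def theta(p):
        return math.atan2(p[1], p[0])    # direction p_theta to O_bl = (0, 0)

    pts = sorted(points, key=theta)
    size = max(1, math.ceil(len(pts) / m))
    subregions = [pts[s:s + size] for s in range(0, len(pts), size)]
    # lower[j - 1] = theta_{i,j-1}: the minimal direction of the POIs in R_ij.
    lower = [theta(sr[0]) for sr in subregions]
    return subregions, lower
```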
[Fig. 2. Indexing structure: regions $R_1, \cdots, R_N$ around $O_{bl}$, each split into sub-regions $R_{i1}, \cdots, R_{iM}$ with per-sub-region inverted lists, kept in memory (if memory is large) or on disk (if memory is small)]
Our region structure is illustrated in Figure 2. It has two salient features. First, given two sub-regions $R_{is}$ and $R_{jt}$, for any POIs $p \in R_{is}$ and $p' \in R_{jt}$, if $i < j$, we have $dist(p, O_{bl}) < dist(p', O_{bl})$. Second, given two sub-regions $R_{is}$ and $R_{it}$, for any POIs $p \in R_{is}$ and $p' \in R_{it}$, if $s < t$, we have $p_\theta < p'_\theta$. We will use these two features for efficient pruning. Notice that traditional MBRs have no such features; thus we propose the new index structure to facilitate direction-aware search.
Although we can use the region structure for spatial pruning, we cannot use it for textual pruning. To address this issue, we build an inverted list for the keywords in each sub-region $R_{ij}$. We now give the space complexity of our index structure. The region structure takes $O(M \times N)$ space. As $M \times N$ is not large ($N = 1000$, $M = 600$ for 16 million POIs; see Section VI), we can keep the region structure in memory. For the inverted lists, suppose each POI contains $W$ distinct keywords on average. The total inverted-list size is $O(|P| \times W)$.

If the inverted lists are very large, we use a disk-based structure. For each keyword $k_x$, we maintain two inverted lists: (1) the region list $L^R_{k_x}$, which keeps the sorted IDs of the sub-regions that contain $k_x$, where $R_{is} < R_{jt}$ if $i < j$, and $R_{is} < R_{it}$ if $s < t$; (2) the POI list $L^P_{k_x}$, which keeps the sorted IDs of the POIs that contain $k_x$: POIs in different sub-regions are sorted by sub-region order, and POIs in the same sub-region are sorted by direction. In $L^R_{k_x}$, for each $R_{ij} \in L^R_{k_x}$, we also maintain a pointer into the POI list $L^P_{k_x}$ that keeps the position of the smallest POI ID in $R_{ij} \cap L^P_{k_x}$. By the sorted property, if $R_{ij}$'s pointer is $l_{ij}$ and the pointer of its next sub-region is $l_{ij+1}$, we can efficiently find the POIs in $R_{ij}$ that contain keyword $k_x$ in $L^P_{k_x}$, namely the POIs in $L^P_{k_x}[l_{ij}, l_{ij+1})$. Suppose each sub-region $R_{ij}$ contains $L$ distinct keywords on average. The space complexity of the disk-based inverted lists is $O(|P| \times W + L \times M \times N)$.

The overall index structure is shown in Figure 2. Note that to efficiently answer a query, besides building an index structure for $O_{bl}$, we also maintain index structures for $O_{br}$, $O_{tr}$, and $O_{tl}$. Thus the total index size is four times that for $O_{bl}$.
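The two disk-based lists and the pointer lookup $L^P_{k_x}[l_{ij}, l_{ij+1})$ can be sketched with in-memory Python lists standing in for disk pages. The input layout and the function names `build_inverted_lists` and `pois_in_subregion` are hypothetical; the region list is scanned linearly here, although its sorted order would also permit a binary search.

```python
from collections import defaultdict


def build_inverted_lists(subregions):
    """subregions: [(region_id, [(poi_id, keywords), ...]), ...] in sub-region order,
    with the POIs inside each sub-region already sorted by direction."""
    region_list = defaultdict(list)  # keyword -> [(region_id, pointer l_ij into poi_list)]
    poi_list = defaultdict(list)     # keyword -> [poi_id, ...]
    for rid, pois in subregions:
        for pid, keywords in pois:
            for kw in keywords:
                if not region_list[kw] or region_list[kw][-1][0] != rid:
                    # First POI of this sub-region containing kw: record its position.
                    region_list[kw].append((rid, len(poi_list[kw])))
                poi_list[kw].append(pid)
    return region_list, poi_list


def pois_in_subregion(kw, rid, region_list, poi_list):
    """Return L^P_kw[l_ij, l_ij+1): the POIs of sub-region rid that contain kw."""
    entries = region_list.get(kw, [])
    for idx, (r, start) in enumerate(entries):
        if r == rid:
            end = entries[idx + 1][1] if idx + 1 < len(entries) else len(poi_list[kw])
            return poi_list[kw][start:end]
    return []
```

The end of each slice is simply the next sub-region's pointer, so no per-POI region tag needs to be stored in the POI list.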
For example, in Figure 1, there are 24 POIs. Suppose $N = 3$ and $M = 4$. We generate 12 sub-regions $R_{11}, \cdots, R_{14}$, $R_{21}, \cdots, R_{24}$, $R_{31}, \cdots, R_{34}$. Each sub-region has two POIs; for example, $R_{22}$ contains $p_{11}$ and $p_{12}$. For keyword “chinese”, we maintain a region inverted list that has seven sub-regions and a POI inverted list that has eleven POIs, as shown in Figure 1. The pointer of $R_{13}$ is $L^P_{chinese}[2] = p_5$, that is, $p_5$ is the smallest POI in $R_{13}$ that contains “chinese”. Thus we can easily get the POIs in $R_{13}$ that contain “chinese” using its pointer as the start position ($L^P_{chinese}[2]$) and the pointer of its next sub-region as the end position ($L^P_{chinese}[4]$), i.e., $L^P_{chinese}[2, 4) = \{p_5, p_6\}$.

[Fig. 3. Notations: sub-region $R_{ij}$ between arcs $r_{i-1}$ and $r_i$ and directions $\theta_{i,j-1}$ and $\theta_{ij}$, with corner points $p_{i-1,j-1}$, $p_{i-1,j}$, $p_{ij}$, $p_{i,j-1}$, and the query $q$]
In this paper we study how to use our index structures to
answer a direction-aware spatial keyword query, and leave data
updates as future work.
C. Notations
For ease of presentation, we introduce some notations as shown in Figure 3. Let $q_\theta = \arctan\frac{q.y}{q.x}$ denote the direction of $q$ to $O_{bl}$, and $q_d = dist(q, O_{bl})$ denote the distance of $q$ to $O_{bl}$. Given a region $R_i$, let $r_{i-1}$ and $r_i$ respectively denote the radii of its inner arc and its outer arc. Given a sub-region $R_{ij}$, we use a quadruple $\langle r_{i-1}, r_i, \theta_{i,j-1}, \theta_{ij} \rangle$ to denote the region, where $\theta_{i,j-1}$ is the minimal direction and $\theta_{ij}$ is the maximal direction of POIs in $R_{ij}$ to $O_{bl}$. Let $p_{i-1,j}$, $p_{i-1,j-1}$, $p_{ij}$, $p_{i,j-1}$ respectively denote the bottom-left point, bottom-right point, top-left point, and top-right point of $R_{ij}$ (Figure 3).

Let $q^{r_{i-1}}_\alpha$ ($q^{r_{i-1}}_\beta$) denote the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the inner arc of $R_i$ (with radius $r_{i-1}$).
[Fig. 4. Pruning $R_1, \cdots, R_{i-1}$]

[Fig. 5. MINDIST$(q, R_i)$: (a) $\alpha \le q_\theta \le \beta$; (b) $q_\theta < \alpha$; (c) $q_\theta > \beta$]
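As $q^{r_{i-1}}_\alpha = (q^{r_{i-1}}_\alpha.x, q^{r_{i-1}}_\alpha.y)$ is on the arc with radius $r_{i-1}$, we have $(q^{r_{i-1}}_\alpha.x)^2 + (q^{r_{i-1}}_\alpha.y)^2 = r_{i-1}^2$. In addition, as the point is on the line with direction $\alpha$ to $q$, $(q^{r_{i-1}}_\alpha.y - q.y)/(q^{r_{i-1}}_\alpha.x - q.x) = \tan\alpha$. Thus we can compute the x- and y-coordinates of $q^{r_{i-1}}_\alpha$ using the following equations (of the two solutions, we take the one that lies in direction $\alpha$ from $q$):

$$
\begin{cases}
(q^{r_{i-1}}_\alpha.y - q.y)/(q^{r_{i-1}}_\alpha.x - q.x) = \tan\alpha \\
(q^{r_{i-1}}_\alpha.x)^2 + (q^{r_{i-1}}_\alpha.y)^2 = r_{i-1}^2
\end{cases}
\tag{1}
$$

Similarly, we can compute the point $q^{r_{i-1}}_\beta$.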
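Equation (1) describes the whole line through $q$, which meets the circle at two points, and $\tan\alpha$ is undefined at $\alpha = \frac{\pi}{2}$. The sketch below swaps in an equivalent parametric form of the ray, $q + t(\cos\alpha, \sin\alpha)$ with $t \ge 0$, and returns the intersection ahead of $q$. It assumes $O_{bl} = (0, 0)$ and that $q$ lies strictly inside the circle, which holds where the text applies Equation (1) (for the inner arc with $q_d < r_{i-1}$, and for the outer arc of the ring that contains $q$). The function name is hypothetical.

```python
import math


def ray_circle_intersection(q, angle, r):
    """Point where the ray from q with direction `angle` meets the circle of radius r
    centred at O_bl = (0, 0); assumes q lies strictly inside the circle."""
    ux, uy = math.cos(angle), math.sin(angle)
    # Substitute (x, y) = q + t*(ux, uy) into x^2 + y^2 = r^2:
    #   t^2 + 2*b*t + c = 0, with b = q . u and c = |q|^2 - r^2 (negative inside the circle).
    b = q[0] * ux + q[1] * uy
    c = q[0] ** 2 + q[1] ** 2 - r * r
    t = -b + math.sqrt(b * b - c)  # the positive root: the intersection ahead of q
    return (q[0] + t * ux, q[1] + t * uy)
```

Since $c < 0$, the discriminant is always positive and exactly one root is positive, so no case analysis on the sign of $\tan\alpha$ is needed.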
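Let $q^{\theta_{i,j-1}}_\alpha$ ($q^{\theta_{ij}}_\alpha$) denote the intersection of the line from $q$ with direction $\alpha$ and the line from $O_{bl}$ with direction $\theta_{i,j-1}$ ($\theta_{ij}$). Similarly we can define $q^{\theta_{i,j-1}}_\beta$ and $q^{\theta_{ij}}_\beta$. As $q^{\theta_{i,j-1}}_\alpha = (q^{\theta_{i,j-1}}_\alpha.x, q^{\theta_{i,j-1}}_\alpha.y)$ is on the line with direction $\theta_{i,j-1}$ to $O_{bl}$, $q^{\theta_{i,j-1}}_\alpha.y / q^{\theta_{i,j-1}}_\alpha.x = \tan\theta_{i,j-1}$. As the point is also on the line with direction $\alpha$ to $q$, $(q^{\theta_{i,j-1}}_\alpha.y - q.y)/(q^{\theta_{i,j-1}}_\alpha.x - q.x) = \tan\alpha$. Thus we can compute the x- and y-coordinates of $q^{\theta_{i,j-1}}_\alpha$ using the following equations:

$$
\begin{cases}
(q^{\theta_{i,j-1}}_\alpha.y - q.y)/(q^{\theta_{i,j-1}}_\alpha.x - q.x) = \tan\alpha \\
q^{\theta_{i,j-1}}_\alpha.y / q^{\theta_{i,j-1}}_\alpha.x = \tan\theta_{i,j-1}
\end{cases}
\tag{2}
$$

Similarly, we can compute the points $q^{\theta_{ij}}_\alpha$, $q^{\theta_{i,j-1}}_\beta$, and $q^{\theta_{ij}}_\beta$.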
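Equation (2) is a 2×2 linear system. The sketch below solves it with cross products instead of tangents, which avoids dividing by $\tan$ near $\frac{\pi}{2}$; it assumes $O_{bl} = (0, 0)$ and returns `None` for parallel lines. The function name is hypothetical.

```python
import math


def ray_line_intersection(q, angle, theta):
    """Intersection of the line from q with direction `angle` and the line from
    O_bl = (0, 0) with direction `theta`; returns None if the lines are parallel."""
    ax, ay = math.cos(angle), math.sin(angle)
    bx, by = math.cos(theta), math.sin(theta)
    denom = ax * by - ay * bx            # cross(a, b)
    if abs(denom) < 1e-12:
        return None
    # Solve q + t*a = s*b: taking the cross product of both sides with b eliminates s.
    t = (bx * q[1] - by * q[0]) / denom  # cross(b, q) / cross(a, b)
    return (q[0] + t * ax, q[1] + t * ay)
```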
Suppose the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the boundary of $R$ is $q^R_\alpha$ ($q^R_\beta$), as shown in Figure 3. Next we discuss how to compute $q^R_\alpha$. Let $q'_\theta$ denote the direction from $q$ to the top-right point $O_{tr}$. If $\alpha > q'_\theta$, $q^R_\alpha$ falls on the top line from $O_{tl}$ to $O_{tr}$; in this case $q^R_\alpha.y = H$ and $q^R_\alpha.x = q.x + (H - q.y)/\tan\alpha$, where $H$ is the height of the MBR $R$. If $\alpha = q'_\theta$, $q^R_\alpha$ is exactly $O_{tr}$. If $\alpha < q'_\theta$, $q^R_\alpha$ falls on the right line from $O_{br}$ to $O_{tr}$; in this case $q^R_\alpha.x = L$ and $q^R_\alpha.y = q.y + (L - q.x) \times \tan\alpha$, where $L$ is the length of the MBR $R$. Thus we can compute the point $q^R_\alpha$ as follows:

$$
(q^R_\alpha.x, q^R_\alpha.y) =
\begin{cases}
\left(q.x + \frac{H - q.y}{\tan\alpha},\ H\right) & \alpha > q'_\theta \\
(L, H) & \alpha = q'_\theta \\
\left(L,\ q.y + (L - q.x) \times \tan\alpha\right) & \alpha < q'_\theta
\end{cases}
\tag{3}
$$

Similarly we can compute $q^R_\beta$. We will use the above-mentioned points for pruning in the following sections.

In this section, we suppose $0 \le \alpha \le \beta \le \frac{\pi}{2}$; our technique can be easily extended to support other directions (Section IV).
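Equation (3) translates directly into code. This sketch assumes $O_{bl} = (0, 0)$, so that $R = [0, L] \times [0, H]$, and $0 \le \alpha \le \frac{\pi}{2}$; the function name is hypothetical.

```python
import math


def ray_box_intersection(q, alpha, L, H):
    """Equation (3): the point q^R_alpha where the ray from q with direction alpha
    (0 <= alpha <= pi/2) leaves the MBR R = [0, L] x [0, H]."""
    qx, qy = q
    corner = math.atan2(H - qy, L - qx)   # q'_theta: direction from q to O_tr
    if alpha > corner:                    # exits through the top edge
        return (qx + (H - qy) / math.tan(alpha), H)
    if alpha == corner:                   # exits exactly at O_tr
        return (L, H)
    return (L, qy + (L - qx) * math.tan(alpha))  # exits through the right edge
```

At $\alpha = \frac{\pi}{2}$ the top-edge branch divides by a very large `math.tan` value and returns $x \approx q.x$, as expected for a vertical ray.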
III. PRUNING UNNECESSARY REGIONS
In this section, we propose effective pruning techniques to prune unnecessary regions $R_i$ (Section III-A) and $R_{ij}$ (Section III-B). We first consider directions with $0 \le \alpha \le \beta \le \frac{\pi}{2}$ and discuss how to support any direction in Section IV.
A. Pruning Region Ri
Consider regions $R_1, R_2, \cdots, R_N$ with outer-circle radii $r_1, r_2, \cdots, r_N$, respectively. Given a query $q$, we first locate the region in which $q$ appears. To this end, we first compute its distance to $O_{bl}$, $q_d$. Then we use a binary search on $r_1, r_2, \cdots, r_N$ to find the first radius that is larger than $q_d$. Suppose we find $r_i$ such that $r_{i-1} \le q_d < r_i$, as shown in Figure 4. We can prove that no POI in regions $R_1, R_2, \cdots, R_{i-1}$ can be an answer of query $q$, as these POIs are not in the search direction, as formalized in Lemma 1.
Lemma 1: Given a query point $q$ with $0 \le \alpha \le \beta \le \frac{\pi}{2}$, suppose $r_{i-1} \le q_d < r_i$. Any POI in $R_1, R_2, \cdots, R_{i-1}$ cannot be an answer of $q$.
Lemma 1 holds for any query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$. For example, in Figure 1, we can directly prune region $R_1$, and none of the POIs in $R_1$ need to be accessed. Note that the lemma may not hold if $\beta > \frac{\pi}{2}$. Consider a counter-example where a query $q$ is on the bottom line from $O_{bl}$ to $O_{br}$: if $\beta$ is larger than $\frac{\pi}{2}$, the search direction may overlap with $R_{i-1}$. Similarly, $\alpha$ should be no smaller than 0; the counter-example is a query on the left line from $O_{bl}$ to $O_{tl}$.
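The binary search that locates $q$ can be written with Python's `bisect` module: `bisect_right` returns the position of the first radius strictly larger than $q_d$, which matches the rule that a point on an arc belongs to the outer region. The function name and the list layout are assumptions of this sketch; by Lemma 1, every region before the returned index can then be skipped.

```python
import bisect


def locate_region(q_d, outer_radii):
    """Return the 1-based index i with r_{i-1} <= q_d < r_i, or None if q_d >= r_N.

    outer_radii = [r_1, ..., r_N] in ascending order."""
    idx = bisect.bisect_right(outer_radii, q_d)  # first radius strictly larger than q_d
    return idx + 1 if idx < len(outer_radii) else None
```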
MINDIST function for $R_i$: To facilitate nearest-neighbor search, traditional methods use the function MINDIST to estimate the distance between a query and an MBR [10]. Formally, given a query $q$ and an MBR $mbr$, MINDIST$(q, mbr)$ returns the minimal distance of $q$ to $mbr$. As $R_i$ in our method is not an MBR and our query has a direction constraint, we extend the function to support our problem as follows.
If $q$ is outside the outer arc of $R_i$ ($q_d \ge r_i$), we have MINDIST$(q, R_i) = \infty$ based on Lemma 1. If $q$ is in $R_i$ ($r_{i-1} \le q_d < r_i$), we have MINDIST$(q, R_i) = 0$. If $q$ is inside the inner arc of $R_i$ ($q_d < r_{i-1}$), we consider the direction of $q$ to $O_{bl}$, $q_\theta$. If $\alpha \le q_\theta \le \beta$, the nearest neighbor of $q$ in $R_i$ is the intersection of the line with direction $q_\theta$ and the inner arc of $R_i$ with radius $r_{i-1}$ (Figure 5(a)); thus MINDIST$(q, R_i) = r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest neighbor of $q$ in $R_i$ is $q^{r_{i-1}}_\alpha$, which is the intersection of the line from $q$ with direction $\alpha$ and the inner arc of $R_i$ (Figure 5(b)); thus MINDIST$(q, R_i) = dist(q, q^{r_{i-1}}_\alpha)$. Similarly, if $q_\theta > \beta$, MINDIST$(q, R_i) = dist(q, q^{r_{i-1}}_\beta)$ (Figure 5(c)).

Footnote: In this paper, we omit the proofs of the lemmas due to space constraints.

[Fig. 6. Direction-based pruning for regions $R_i, \cdots, R_N$: (a) $\tau^R_l = \theta_{q^R_\alpha}$ and $\tau^R_u = \theta_{q^R_\beta}$; (b) $\tau^R_l = q_\theta$ and $\tau^R_u = \theta_{q^R_\beta}$; (c) $\tau^R_l = \theta_{q^R_\alpha}$ and $\tau^R_u = q_\theta$]

[Fig. 7. Direction-based pruning for $R_i$]
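Thus we give the MINDIST function as follows:

$$
\mathrm{MINDIST}(q, R_i) =
\begin{cases}
\infty & q_d \ge r_i \\
0 & r_{i-1} \le q_d < r_i \\
r_{i-1} - q_d & q_d < r_{i-1} \wedge \alpha \le q_\theta \le \beta \\
dist(q, q^{r_{i-1}}_\alpha) & q_d < r_{i-1} \wedge q_\theta < \alpha \\
dist(q, q^{r_{i-1}}_\beta) & q_d < r_{i-1} \wedge q_\theta > \beta
\end{cases}
\tag{4}
$$

where $q^{r_{i-1}}_\alpha$ and $q^{r_{i-1}}_\beta$ can be computed using Equation 1.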
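A minimal sketch of Equation (4), assuming $O_{bl} = (0, 0)$ and $0 \le \alpha \le \beta \le \frac{\pi}{2}$. The helper `ray_circle` solves Equation (1) in parametric ray form rather than with $\tan\alpha$; both function names are hypothetical.

```python
import math


def ray_circle(q, angle, r):
    """Equation (1) in parametric form: where the ray from q (inside the circle) meets radius r."""
    ux, uy = math.cos(angle), math.sin(angle)
    b = q[0] * ux + q[1] * uy
    c = q[0] ** 2 + q[1] ** 2 - r * r
    t = -b + math.sqrt(b * b - c)
    return (q[0] + t * ux, q[1] + t * uy)


def mindist_region(q, alpha, beta, r_inner, r_outer):
    """Equation (4): MINDIST(q, R_i) for the ring r_inner <= dist < r_outer around O_bl."""
    q_d = math.hypot(q[0], q[1])
    if q_d >= r_outer:            # q is outside the outer arc: pruned by Lemma 1
        return math.inf
    if q_d >= r_inner:            # q lies inside R_i
        return 0.0
    q_theta = math.atan2(q[1], q[0])
    if alpha <= q_theta <= beta:
        return r_inner - q_d      # straight out along q_theta to the inner arc
    # Otherwise the closest reachable point is where the boundary ray meets the inner arc.
    edge = alpha if q_theta < alpha else beta
    return math.dist(q, ray_circle(q, edge, r_inner))
```

In the last case the returned distance is at least $r_{i-1} - q_d$, so the direction constraint can only make the bound tighter than the unconstrained one.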
Given a query $q$, we first find the region $R_i$ in which $q$ is located and access the POIs in $R_i$. Then we verify whether the POIs satisfy the direction constraint and contain all keywords. Suppose the $k$-th smallest distance among the candidates computed so far is $d_k$. Then, for the next region $R_{i+1}$, if MINDIST$(q, R_{i+1}) \ge d_k$, we terminate and prune $R_{i+1}, \cdots, R_N$; otherwise we access the POIs in $R_{i+1}$. Iteratively we can find all answers. As we use the best-first search method, we only use the MINDIST function and do not need the MINMAXDIST function [10]. For example, in Figure 1, suppose $k = 1$. In $R_2$, we find an answer $p_{12}$. As MINDIST$(q, R_3) > dist(q, p_{12})$, we terminate and prune the POIs in $R_3$.
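The region-level best-first loop described above can be sketched as follows. The region contents, the lower bound `mindist`, and the predicate `is_answer` are passed in as hypothetical inputs, so the sketch shows only the control flow: it keeps the $k$ best candidates in a max-heap (negated distances in Python's min-heap `heapq`) and stops at the first region whose MINDIST is at least $d_k$. It relies on MINDIST not decreasing from one region to the next, as the text assumes.

```python
import heapq
import math


def region_level_knn(q, k, regions, mindist, is_answer):
    """regions[0] is the region R_i containing q, followed by R_{i+1}, ..., R_N;
    mindist(idx) is a lower bound such as Equation (4); is_answer(p) checks the
    direction and keyword constraints."""
    best = []  # max-heap of (negated distance, poi): the k best candidates so far
    for idx, pois in enumerate(regions):
        d_k = -best[0][0] if len(best) == k else math.inf
        if mindist(idx) >= d_k:
            break  # prune this region and every farther one
        for p in pois:
            if is_answer(p):
                d = math.dist(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best)
```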
However, this method neglects the fact that some sub-regions $R_{ij}$ in $R_i$ may not satisfy the direction constraint. For example, in Figure 1, although $R_{21}$ has a POI $p_9$ that contains all keywords, we can prune the region as it is not in the search direction. Similarly, we can prune $R_{24}$. To this end, we next discuss how to effectively prune $R_{ij}$ in $R_i$.
B. Pruning Regions Rij
In this section, we first introduce how to prune unnecessary sub-regions $R_{ij}$ that have no overlap with the search direction, and then give the function MINDIST$(q, R_{ij})$. In the rest of this paper, if the context is clear, the terms “region” and “sub-region” are used interchangeably for $R_{ij}$.
Our indexing structure has a salient feature: if a POI $p$ is an answer of $q$, its direction to $O_{bl}$, $p_\theta = \arctan\frac{p.y}{p.x}$, must be in a range $[\tau^R_l, \tau^R_u]$. In other words, we can prune the POIs with direction smaller than $\tau^R_l$ or larger than $\tau^R_u$. Next we discuss how to deduce the lower bound $\tau^R_l$ and the upper bound $\tau^R_u$.
Given a query $q$ with direction $[\alpha, \beta]$, consider the intersection $q^R_\alpha$ ($q^R_\beta$) of the line from $q$ with direction $\alpha$ ($\beta$) and the boundary of region $R$, as shown in Figure 6. Let $\theta_{q^R_\alpha}$ and $\theta_{q^R_\beta}$ respectively denote the directions of points $q^R_\alpha$ and $q^R_\beta$ to $O_{bl}$. As $\alpha \le \beta$, $\theta_{q^R_\alpha} \le \theta_{q^R_\beta}$. Let $\tau^R_l = \min(\theta_{q^R_\alpha}, q_\theta)$ and $\tau^R_u = \max(\theta_{q^R_\beta}, q_\theta)$. For any point $p$, if $p_\theta > \tau^R_u$, its direction to $q$ must be larger than $\beta$; thus $p$ cannot be an answer of $q$ (Figure 6(b)). Similarly, if $p_\theta < \tau^R_l$, its direction to $q$ must be smaller than $\alpha$; thus $p$ cannot be an answer of $q$ (Figure 6(c)). The correctness is formalized in Lemma 2.
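Lemma 2: Given a query $q$ with direction $[\alpha, \beta]$, let $\tau^R_l = \min(\theta_{q^R_\alpha}, q_\theta)$ and $\tau^R_u = \max(\theta_{q^R_\beta}, q_\theta)$. For any POI $p$, if $p_\theta > \tau^R_u$ or $p_\theta < \tau^R_l$, $p$ cannot be an answer of $q$.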
Based on Lemma 2, we only need to access the POIs with directions between $\tau^R_l$ and $\tau^R_u$. Moreover, a region $R_{ij}$ has a lower direction bound $\theta_{i,j-1}$ and an upper direction bound $\theta_{ij}$, which respectively denote the minimal direction and the maximal direction of POIs in $R_{ij}$. In other words, for any POI $p \in R_{ij}$ we have $\theta_{i,j-1} \le p_\theta < \theta_{ij}$. Based on Lemma 2, for region $R_{ij}$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^R_l$ or $\theta_{i,j-1} > \tau^R_u$, we can prune the region $R_{ij}$, as formalized in Lemma 3.
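Lemma 3: Given a query $q$ with direction $[\alpha, \beta]$, let $\tau^R_l = \min(\theta_{q^R_\alpha}, q_\theta)$ and $\tau^R_u = \max(\theta_{q^R_\beta}, q_\theta)$. For any region $R_{ij}$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^R_l$ or $\theta_{i,j-1} > \tau^R_u$, no POI in $R_{ij}$ can be an answer of $q$.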
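Lemmas 2 and 3 can be sketched by computing $\tau^R_l$ and $\tau^R_u$ from the two boundary exits given by Equation (3) and then testing a sub-region's direction interval. The sketch assumes $O_{bl} = (0, 0)$, $R = [0, L] \times [0, H]$, and $0 \le \alpha \le \beta \le \frac{\pi}{2}$; all function names are hypothetical.

```python
import math


def ray_box(q, ang, L, H):
    """Equation (3): where the ray from q with direction ang leaves R = [0, L] x [0, H]."""
    corner = math.atan2(H - q[1], L - q[0])
    if ang > corner:
        return (q[0] + (H - q[1]) / math.tan(ang), H)
    if ang == corner:
        return (L, H)
    return (L, q[1] + (L - q[0]) * math.tan(ang))


def direction_bounds(q, alpha, beta, L, H):
    """tau^R_l = min(theta of q^R_alpha, q_theta); tau^R_u = max(theta of q^R_beta, q_theta)."""
    q_theta = math.atan2(q[1], q[0])
    qa, qb = ray_box(q, alpha, L, H), ray_box(q, beta, L, H)
    tau_l = min(math.atan2(qa[1], qa[0]), q_theta)
    tau_u = max(math.atan2(qb[1], qb[0]), q_theta)
    return tau_l, tau_u


def prune_subregion(theta_lo, theta_hi, tau_l, tau_u):
    """Lemma 3: True if R_ij with directions [theta_lo, theta_hi) holds no answer."""
    return theta_hi <= tau_l or theta_lo > tau_u
```

The test on a sub-region costs two comparisons, so the check is cheap enough to apply to every $R_{ij}$ before touching its inverted lists.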
For example, in Figure 1, although $R_{21}$ and $R_{24}$ have POIs that contain all keywords, we can prune them, as they are not in the search direction, using the direction-based pruning technique in Lemma 3. Notice that this pruning technique is valid for all regions. Next we devise tighter direction bounds for region $R_i$. Let $\tau^{R_i}_l$ denote the tighter lower bound and $\tau^{R_i}_u$ the tighter upper bound for $R_i$. For any POI $p$ in $R_i$, if $p_\theta < \tau^{R_i}_l$ or $p_\theta > \tau^{R_i}_u$, we can prune the POI. Next we discuss how to deduce the two tighter bounds.
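Consider the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the outer arc of $R_i$, denoted by $q^{r_i}_\alpha$ ($q^{r_i}_\beta$). The two points can be computed by Equation 1. Let $\theta_{q^{r_i}_\alpha}$ and $\theta_{q^{r_i}_\beta}$ respectively denote the directions of points $q^{r_i}_\alpha$ and $q^{r_i}_\beta$ to $O_{bl}$. It is easy to see that if $q^{r_i}_\alpha$ is in region $R$ (denoted by $q^{r_i}_\alpha \in R$), $\theta_{q^{r_i}_\alpha} \ge \theta_{q^R_\alpha}$; otherwise $\theta_{q^{r_i}_\alpha} < \theta_{q^R_\alpha}$ (Figure 7). Similarly, if $q^{r_i}_\beta \in R$, $\theta_{q^{r_i}_\beta} \le \theta_{q^R_\beta}$; otherwise $\theta_{q^{r_i}_\beta} > \theta_{q^R_\beta}$. Based on this observation, we give the tighter bounds $\tau^{R_i}_l$ and $\tau^{R_i}_u$: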
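$$
\tau^{R_i}_l =
\begin{cases}
q_\theta & q_\theta \le \alpha \\
\theta_{q^{r_i}_\alpha} & q_\theta > \alpha \wedge q^{r_i}_\alpha \in R \\
\theta_{q^R_\alpha} & q_\theta > \alpha \wedge q^{r_i}_\alpha \notin R
\end{cases}
\tag{5}
$$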
[Fig. 8. MINDIST$(q, R_{ij})$: (a) $R^<_i[0, \theta_{i,j-1})$; (b) $R^<_i[\theta_{i,j-1}, \theta_{ij})$; (c) $R^<_i[\theta_{ij}, \frac{\pi}{2}]$; (d) $R_i[0, \theta_{i,j-1})$; (e) $R_i[\theta_{i,j-1}, \theta_{ij})$; (f) $R_i[\theta_{ij}, \frac{\pi}{2}]$]
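$$
\tau^{R_i}_u =
\begin{cases}
q_\theta & q_\theta \ge \beta \\
\theta_{q^{r_i}_\beta} & q_\theta < \beta \wedge q^{r_i}_\beta \in R \\
\theta_{q^R_\beta} & q_\theta < \beta \wedge q^{r_i}_\beta \notin R
\end{cases}
\tag{6}
$$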
Then consider region $R_{ij}$ with the minimal direction $\theta_{i,j-1}$ and the maximal direction $\theta_{ij}$. If $\theta_{ij} \le \tau^{R_i}_l$ or $\theta_{i,j-1} > \tau^{R_i}_u$, region $R_{ij}$ has no overlap with the search direction, and thus we can prune $R_{ij}$. In other words, for $R_i$, we only need to access the regions $R_{il}, \cdots, R_{iu}$ such that $\theta_{i,l-1} \le \tau^{R_i}_l < \theta_{il}$ and $\theta_{i,u-1} \le \tau^{R_i}_u < \theta_{iu}$. To efficiently identify such regions, we use $\tau^{R_i}_l$ to do a binary search on the directions of the regions in $R_i$, $\{\theta_{i1}, \cdots, \theta_{iM}\}$, and find the smallest one that is larger than $\tau^{R_i}_l$, i.e., $R_{il}$. Then we use $\tau^{R_i}_u$ to do a binary search on the directions in $\{\theta_{i,l+1}, \cdots, \theta_{iM}\}$ and find the largest one that is smaller than $\tau^{R_i}_u$, i.e., $R_{iu}$. Thus we only need to access $R_{il}, \cdots, R_{iu}$. Lemma 4 formalizes the pruning technique.
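The two binary searches can be sketched with `bisect`, following the pruning condition directly: sub-region $R_{ij}$ is kept unless $\theta_{ij} \le \tau^{R_i}_l$ or $\theta_{i,j-1} > \tau^{R_i}_u$. The sketch assumes the boundaries of ring $R_i$ are given as one ascending list $[\theta_{i0}, \theta_{i1}, \cdots, \theta_{iM}]$ whose last entry is the upper boundary of $R_{iM}$ (for example $\frac{\pi}{2}$); the function name is hypothetical.

```python
import bisect


def candidate_subregions(bounds, tau_l, tau_u):
    """bounds = [theta_i0, theta_i1, ..., theta_iM] ascending, so sub-region R_ij
    covers directions [bounds[j-1], bounds[j]).  Returns the 1-based range (l, u)
    of sub-regions that survive direction pruning, or None if all are pruned."""
    m = len(bounds) - 1
    # R_ij is pruned when theta_ij <= tau_l, so the first kept j has bounds[j] > tau_l.
    l = max(1, bisect.bisect_right(bounds, tau_l))
    # R_ij is pruned when theta_{i,j-1} > tau_u, so the last kept j has bounds[j-1] <= tau_u.
    u = min(m, bisect.bisect_right(bounds, tau_u))
    return (l, u) if l <= u else None
```

Both searches take $O(\log M)$ time, so the cost of direction pruning per ring is independent of how many POIs the ring holds.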
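Lemma 4: Given a query $q$ with direction $[\alpha, \beta]$ and a region $R_i$, let $\tau^{R_i}_l = \min(\theta_{q^{r_i}_\alpha}, q_\theta)$ and $\tau^{R_i}_u = \max(\theta_{q^{r_i}_\beta}, q_\theta)$. For any POI $p \in R_i$, if $p_\theta > \tau^{R_i}_u$ or $p_\theta < \tau^{R_i}_l$, $p$ cannot be an answer of $q$. For any region $R_{ij} \in R_i$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau^{R_i}_l$ or $\theta_{i,j-1} > \tau^{R_i}_u$, no POI in $R_{ij}$ can be an answer of $q$.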
Consider the example in Figure 1. We can prune regions
R21and R24in R2, and regions R31and R34in R3.
MINDIST for $R_{ij}$: For each region $R_{ij}$ in $\{R_{il}, \cdots, R_{iu}\}$, we use the MINDIST function to estimate the distance between $q$ and $R_{ij}$, i.e., MINDIST$(q, R_{ij})$. To this end, we partition $R$ into three regions by the inner arc ($r_{i-1}$) and the outer arc ($r_i$): the region inside the inner arc, $R^<_i$; the region $R_i$; and the region outside, $R^>_i$. Obviously, if $q \in R^>_i$, no POI in $R_{ij}$ can be an answer of $q$ based on Lemma 1; thus MINDIST$(q, R_{ij}) = \infty$. We partition each of $R^<_i$ and $R_i$ into three regions based on the two directions $\theta_{i,j-1}$ and $\theta_{ij}$, denoted by $R^<_i[0, \theta_{i,j-1})$, $R^<_i[\theta_{i,j-1}, \theta_{ij})$, $R^<_i[\theta_{ij}, \frac{\pi}{2}]$, and $R_i[0, \theta_{i,j-1})$, $R_i[\theta_{i,j-1}, \theta_{ij})$, $R_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8).
(1) $q \in R^<_i[0, \theta_{i,j-1})$ (Figure 8(a)). Without a direction constraint, the nearest neighbor of $q$ is the bottom-right point $p_{i-1,j-1}$. Next, we consider the case with direction $[\alpha, \beta]$. Let $\theta(q, p_{i-1,j-1})$ denote the direction from $q$ to $p_{i-1,j-1}$. If $\alpha \le \theta(q, p_{i-1,j-1}) \le \beta$, the nearest neighbor of $q$ is still $p_{i-1,j-1}$. If $\theta(q, p_{i-1,j-1}) < \alpha$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\alpha$, which is the intersection of the line from $q$ with direction $\alpha$ and the arc with radius $r_{i-1}$ (computed by Equation 1). Similarly, if $\theta(q, p_{i-1,j-1}) > \beta$, the nearest neighbor of $q$ is $q^{\theta_{i,j-1}}_\beta$, which is the intersection of the line from $q$ with direction $\beta$ and the line from $O_{bl}$ with direction $\theta_{i,j-1}$ (computed by Equation 2).
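(2) $q \in R^<_i[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(b)). If $\alpha \le q_\theta \le \beta$, the nearest neighbor of $q$ is $q^{r_{i-1}}_{q_\theta}$, which is the intersection of the line from $q$ with direction $q_\theta$ and the arc with radius $r_{i-1}$; the distance is $r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\alpha$. If $q_\theta > \beta$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\beta$.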
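(3) $q \in R^<_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(c)). Similar to case (1), consider the bottom-left point $p_{i-1,j}$. Let $\theta(q, p_{i-1,j})$ denote the direction from $q$ to $p_{i-1,j}$. If $\alpha \le \theta(q, p_{i-1,j}) \le \beta$, the nearest neighbor of $q$ is $p_{i-1,j}$. If $\theta(q, p_{i-1,j}) < \alpha$, the nearest neighbor of $q$ is $q^{\theta_{ij}}_\alpha$. If $\theta(q, p_{i-1,j}) > \beta$, the nearest neighbor of $q$ is $q^{r_{i-1}}_\beta$.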
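(4) $q \in R_i[0, \theta_{i,j-1})$ (Figure 8(d)). As $\beta \le \frac{\pi}{2}$, the nearest neighbor of $q$ must be $q^{\theta_{i,j-1}}_\beta$ (computed by Equation 2).
(5) $q \in R_i[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(e)). As $q$ is in $R_{ij}$, MINDIST$(q, R_{ij}) = 0$.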
(6) $q \in R_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(f)). As $\alpha \ge 0$, the nearest neighbor of $q$ must be $q^{\theta_{ij}}_\alpha$ (computed by Equation 2).
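To summarize, we give the function MINDIST$(q, R_{ij})$ in Table I.

TABLE I
MINDIST$(q, R_{ij})$

$$
\begin{array}{lll}
\textbf{Region of } q & \mathrm{MINDIST}(q, R_{ij}) & \textbf{Condition} \\ \hline
R^{>}_i & \infty & \\
R^{<}_i[0, \theta_{i,j-1}) & dist(q, q^{r_{i-1}}_\alpha) & \theta(q, p_{i-1,j-1}) < \alpha \\
 & dist(q, p_{i-1,j-1}) & \alpha \le \theta(q, p_{i-1,j-1}) \le \beta \\
 & dist(q, q^{\theta_{i,j-1}}_\beta) & \theta(q, p_{i-1,j-1}) > \beta \\
R^{<}_i[\theta_{i,j-1}, \theta_{ij}) & dist(q, q^{r_{i-1}}_\alpha) & q_\theta < \alpha \\
 & r_{i-1} - q_d & \alpha \le q_\theta \le \beta \\
 & dist(q, q^{r_{i-1}}_\beta) & q_\theta > \beta \\
R^{<}_i[\theta_{ij}, \frac{\pi}{2}] & dist(q, q^{\theta_{ij}}_\alpha) & \theta(q, p_{i-1,j}) < \alpha \\
 & dist(q, p_{i-1,j}) & \alpha \le \theta(q, p_{i-1,j}) \le \beta \\
 & dist(q, q^{r_{i-1}}_\beta) & \theta(q, p_{i-1,j}) > \beta \\
R_i[0, \theta_{i,j-1}) & dist(q, q^{\theta_{i,j-1}}_\beta) & \\
R_i[\theta_{i,j-1}, \theta_{ij}) & 0 & \\
R_i[\theta_{ij}, \frac{\pi}{2}] & dist(q, q^{\theta_{ij}}_\alpha) &
\end{array}
$$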
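Table I can be sketched as a single function over the quadruple $\langle r_{i-1}, r_i, \theta_{i,j-1}, \theta_{ij} \rangle$, assuming $O_{bl} = (0, 0)$ and $0 \le \alpha \le \beta \le \frac{\pi}{2}$. The helpers solve Equations (1) and (2) in parametric form. One guard is an assumption added in this sketch rather than part of Table I: if the ray from $q$ would reach the $\theta$ line only behind $q$, the distance is reported as $\infty$. All function names are hypothetical.

```python
import math


def _ray_circle(q, ang, r):
    # Equation (1): the ray from q (inside the circle) meets the arc of radius r.
    ux, uy = math.cos(ang), math.sin(ang)
    b = q[0] * ux + q[1] * uy
    c = q[0] ** 2 + q[1] ** 2 - r * r
    t = -b + math.sqrt(b * b - c)
    return (q[0] + t * ux, q[1] + t * uy)


def _ray_line(q, ang, theta):
    # Equation (2): the ray from q meets the line from O_bl with direction theta.
    ax, ay = math.cos(ang), math.sin(ang)
    bx, by = math.cos(theta), math.sin(theta)
    denom = ax * by - ay * bx
    if abs(denom) < 1e-12:
        return None
    t = (bx * q[1] - by * q[0]) / denom
    return None if t < 0 else (q[0] + t * ax, q[1] + t * ay)  # guard: sketch assumption


def _dist_or_inf(q, p):
    return math.inf if p is None else math.dist(q, p)


def mindist_subregion(q, alpha, beta, r_in, r_out, th_lo, th_hi):
    """Table I: MINDIST(q, R_ij) for R_ij = <r_in, r_out, th_lo, th_hi>."""
    q_d, q_t = math.hypot(q[0], q[1]), math.atan2(q[1], q[0])
    if q_d >= r_out:                                  # q in R^>_i
        return math.inf
    if q_d >= r_in:                                   # q in R_i: cases (4) to (6)
        if q_t < th_lo:
            return _dist_or_inf(q, _ray_line(q, beta, th_lo))
        if q_t < th_hi:
            return 0.0
        return _dist_or_inf(q, _ray_line(q, alpha, th_hi))
    if th_lo <= q_t < th_hi:                          # case (2)
        if q_t < alpha:
            return math.dist(q, _ray_circle(q, alpha, r_in))
        if q_t > beta:
            return math.dist(q, _ray_circle(q, beta, r_in))
        return r_in - q_d
    # Cases (1) and (3): nearest corner p_{i-1,j-1} (q below) or p_{i-1,j} (q above).
    corner_dir = th_lo if q_t < th_lo else th_hi
    corner = (r_in * math.cos(corner_dir), r_in * math.sin(corner_dir))
    phi = math.atan2(corner[1] - q[1], corner[0] - q[0])  # theta(q, corner)
    if alpha <= phi <= beta:
        return math.dist(q, corner)
    if q_t < th_lo:                                   # case (1)
        if phi < alpha:
            return math.dist(q, _ray_circle(q, alpha, r_in))
        return _dist_or_inf(q, _ray_line(q, beta, th_lo))
    if phi < alpha:                                   # case (3)
        return _dist_or_inf(q, _ray_line(q, alpha, th_hi))
    return math.dist(q, _ray_circle(q, beta, r_in))
```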
IV. SEARCH ALGORITHMS
In this section, we first give an algorithm to answer a query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$ (Section IV-A), and then discuss how to answer a query with any direction (Section IV-B).
A. Answering Queries with $0 \le \alpha \le \beta \le \frac{\pi}{2}$
We combine our pruning techniques and MINDIST functions to answer a query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$. Figure 9 gives the pseudo-code of our algorithm. To efficiently find the k nearest neighbors of q, we maintain a priority queue Q (line 2) and keep $d_k$, the k-th smallest distance to q among the POIs already in Q (line 3). Given a query q, we first locate the region in which q appears using a binary search on the radii $r_1, r_2, \cdots, r_N$ (line 4). Suppose we find $R_i$ such that $r_{i-1} \le q_d < r_i$. If MINDIST$(q, R_i) \ge d_k$, we terminate, as there is no answer in $R_i, \cdots, R_N$ (line 6); otherwise we find the candidate regions in $R_i$ that overlap with the search direction and contain all keywords in K, by calling function FINDCANDREGIONS (line 7). Next, for each candidate region $R_{ij} \in CR_i$ (in ascending order of MINDIST), if MINDIST$(q, R_{ij}) \ge d_k$, we break, as there is no answer in $R_{ij}$ or in the remaining candidate regions (line 9); otherwise we find the candidate POIs in $R_{ij}$ that are in the search direction and contain all keywords, by calling function FINDCANDPOIS (line 10). Finally we move on to region $R_{i+1}$ if necessary (line 11). Iterating in this way, we find the k nearest neighbors of query q.
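The loop of Algorithm 1 can be exercised on toy data with the sketch below. It is a simplified illustration, not the authors' implementation: `ring_mindist` and `sub_regions` are hypothetical callables standing in for MINDIST$(q, R_i)$ and for FINDCANDREGIONS followed by FINDCANDPOIS, rings are numbered from 0, and only distances (not POI identifiers) are kept. A bounded max-heap (negated distances in Python's min-heap `heapq`) plays the role of Q, so $d_k$ is its largest element once it holds k entries and infinity before that.

```python
import bisect
import heapq
import math

def desks_basic(radii, qd, ring_mindist, sub_regions, k):
    """Toy version of the DESKS-BASIC loop (Algorithm 1).

    radii        -- [r_1, ..., r_N]; ring i (0-based) covers [r_{i-1}, r_i)
    qd           -- distance from the index origin to q
    ring_mindist -- hypothetical callable: ring index -> MINDIST(q, R_i)
    sub_regions  -- hypothetical callable: (ring index, d_k) -> list of
                    (MINDIST(q, R_ij), [distances of qualifying POIs]),
                    sorted by MINDIST
    """
    heap = []                                    # Q: negated distances, size <= k

    def d_k():                                   # current k-th smallest distance
        return -heap[0] if len(heap) == k else math.inf

    i = bisect.bisect_right(radii, qd)           # line 4: binary search on radii
    while i < len(radii):                        # line 5
        if ring_mindist(i) >= d_k():             # line 6: prune R_i ... R_N
            break
        for md, dists in sub_regions(i, d_k()):  # lines 7-8
            if md >= d_k():                      # line 9: the rest are farther
                break
            for d in dists:                      # line 10: candidate POIs
                if d < d_k():
                    if len(heap) == k:
                        heapq.heapreplace(heap, -d)   # evict current k-th
                    else:
                        heapq.heappush(heap, -d)
        i += 1                                   # line 11
    return sorted(-x for x in heap)
```

Because both pruning tests compare a lower bound with the current $d_k$, the loop stops as soon as no unvisited ring or sub-region can contain a closer POI.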
Then we discuss how to compute the candidate regions in $R_i$. Function FINDCANDREGIONS gives the pseudo-code (Figure 9). We first compute the lower direction bound $\tau^{R_i}_l$ and the upper direction bound $\tau^{R_i}_u$ (line 2). Next we find the regions satisfying the direction constraint, $R_i[\alpha, \beta] = \{R_{il}, \cdots, R_{iu}\}$ (in $[\tau^{R_i}_l, \tau^{R_i}_u]$), using a binary search on the directions $\theta_{i1}, \cdots, \theta_{iM}$ (line 3). Then, if the inverted lists are in memory, we check whether each region in $R_i[\alpha, \beta]$ contains all keywords and add such regions into the candidate-region set $CR_i$. If we use a disk-based method, we load the region inverted list $L^R_{k_i}$ for each keyword $k_i$ (line 4), compute their intersection $L^R_K$, i.e., the regions that satisfy the keyword constraint (line 5), and intersect $L^R_K$ with the regions satisfying the direction constraint, $R_i[\alpha, \beta]$, to get $R^K_i[\alpha, \beta]$ (line 6). For each region $R_{ij} \in R^K_i[\alpha, \beta]$, if
Algorithm 1: DESKS-BASIC(P, q)
Input: P: A collection of POIs
       q = ⟨(q.x, q.y); [α, β]; K⟩: A query
Output: P^k_q = {p | p ∈ P_q and p is a knn of q}, where P_q is the set of POIs in the search direction that contain all the keywords in K.
1  begin
2    Initialize an empty priority queue Q;
3    Let d_k denote the k-th smallest distance in Q;
4    Locate the region R_i where q appears using a binary search on r_1, ..., r_N;
5    while i ≤ N do
6      if MINDIST(q, R_i) ≥ d_k then return;
7      else CR_i = FINDCANDREGIONS(q, R_i, d_k);
8      for R_ij ∈ CR_i (CR_i is sorted) do
9        if MINDIST(q, R_ij) ≥ d_k then break;
10       else FINDCANDPOIS(q, R_ij, d_k, Q);
11     i = i + 1;
12 end

Function FINDCANDREGIONS(q, R_i, d_k)
Input: q = ⟨(q.x, q.y); [α, β]; K⟩: A query
       d_k: The k-th smallest distance in Q
       R_i: Region R_i
Output: CR_i: A sorted candidate-region set
1  begin
2    Compute direction bounds τ^{R_i}_l and τ^{R_i}_u;
3    Find regions R_i[α, β] = {R_il, ..., R_iu} in [τ^{R_i}_l, τ^{R_i}_u] using a binary search on θ_i1, ..., θ_iM;
4    Load region inverted lists L^R_{k_i} for k_i ∈ K;
5    Compute L^R_K = ∩_{k_i ∈ K} L^R_{k_i};
6    Compute R^K_i[α, β] = R_i[α, β] ∩ L^R_K;
7    for R_ij ∈ R^K_i[α, β] do
8      if MINDIST(q, R_ij) < d_k then add R_ij into CR_i;
9    Sort CR_i based on the MINDIST function;
10 end

Function FINDCANDPOIS(q, R_ij, d_k, Q)
Input: q = ⟨(q.x, q.y); [α, β]; K⟩: A query
       d_k: The k-th smallest distance in Q
       R_ij: Region R_ij; Q: Queue
1  begin
2    Load POI inverted lists L^P_{k_i}(R_ij) for k_i ∈ K;
3    Compute intersection L^P_K = ∩_{k_i ∈ K} L^P_{k_i}(R_ij);
4    for p ∈ L^P_K do
5      if α ≤ θ(q, p) ≤ β and dist(q, p) < d_k then
6        add p into Q, and update Q and d_k;
7  end

Fig. 9. DESKS-BASIC algorithm (using disk-based inverted lists)
MINDIST$(q, R_{ij}) < d_k$, we add $R_{ij}$ into the candidate-region set $CR_i$ (line 8). Finally we sort the regions in $CR_i$ in ascending order of MINDIST (line 9).
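When the region inverted lists are held in memory, lines 3 to 9 of FINDCANDREGIONS reduce to a binary search plus a set intersection. The sketch below is illustrative only: it assumes the direction bounds $\tau^{R_i}_l$ and $\tau^{R_i}_u$ have already been computed (their derivation is not repeated here), represents sub-regions by 0-based indexes, and uses the hypothetical inputs `region_postings` (keyword to set of sub-region indexes) and `mindist` (a callable returning MINDIST$(q, R_{ij})$).

```python
import bisect

def find_cand_regions(bounds, tau_l, tau_u, keywords, region_postings, mindist, d_k):
    """Toy FINDCANDREGIONS for one ring R_i with in-memory region lists.

    bounds -- [theta_i1, ..., theta_iM]; 0-based sub-region j covers
              [theta_{i,j}, theta_{i,j+1}) with theta_{i,0} = 0 and
              theta_{i,M} = pi/2, so bounds[j] is its upper direction
    """
    last = len(bounds) - 1
    lo = min(bisect.bisect_right(bounds, tau_l), last)    # line 3: R_il
    hi = min(bisect.bisect_right(bounds, tau_u), last)    #         R_iu
    in_direction = set(range(lo, hi + 1))                 # R_i[alpha, beta]
    lists = [region_postings.get(kw, set()) for kw in keywords]
    with_keywords = set.intersection(*lists) if lists else set()   # L^R_K
    cands = [j for j in in_direction & with_keywords if mindist(j) < d_k]  # lines 6-8
    return sorted(cands, key=mindist)                     # line 9
```

Sorting by MINDIST at the end is what allows the caller to stop at the first candidate whose MINDIST reaches $d_k$.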
Fig. 10. Pruning for $\frac{\pi}{2} \le \alpha < \beta < \pi$
Fig. 11. Pruning for $\pi \le \alpha < \beta < \frac{3\pi}{2}$
Fig. 12. Pruning for $\frac{3\pi}{2} \le \alpha < \beta < 2\pi$

Next we discuss how to compute the candidate POIs in $R_{ij}$. Function FINDCANDPOIS gives the pseudo-code (Figure 9). If the POI inverted lists are in memory, we directly compute their intersection. If the POI inverted lists are on disk, we load the POI inverted list for each keyword. Note that for $k_i$ we only load the POIs that are in $R_{ij}$, $L^P_{k_i}(R_{ij})$, based on the pointers in the region lists as shown in Figure 1 (line 2). Then we compute the intersection of the POI lists, $L^P_K = \bigcap_{k_i \in K} L^P_{k_i}(R_{ij})$ (line 3). For $p \in L^P_K$, if $\alpha \le \theta(q, p) \le \beta$ and $dist(q, p) < d_k$, p is a candidate. We add p into the priority queue and update $d_k$ (line 6).
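FINDCANDPOIS and the bookkeeping of Q and $d_k$ can be sketched as follows, assuming in-memory POI inverted lists that are already restricted to one sub-region. `TopK`, `postings` and `coords` are hypothetical names; directions are measured counterclockwise from the positive x-axis in radians, and the check $\alpha \le \theta(q, p) \le \beta$ is applied before a POI is offered to the queue.

```python
import heapq
import math

class TopK:
    """The priority queue Q: keeps the k nearest POIs seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []                          # max-heap via (-dist, poi_id)

    @property
    def d_k(self):
        """Current k-th smallest distance (infinity until k POIs are held)."""
        return -self._heap[0][0] if len(self._heap) == self.k else math.inf

    def offer(self, dist, poi_id):
        if dist >= self.d_k:
            return
        if len(self._heap) == self.k:
            heapq.heapreplace(self._heap, (-dist, poi_id))   # evict current k-th
        else:
            heapq.heappush(self._heap, (-dist, poi_id))

    def results(self):
        return sorted((-d, p) for d, p in self._heap)


def find_cand_pois(q, alpha, beta, keywords, postings, coords, topk):
    """Toy FINDCANDPOIS for one sub-region R_ij.

    postings -- hypothetical dict: keyword -> set of POI ids inside R_ij
    coords   -- hypothetical dict: POI id -> (x, y)
    """
    lists = [postings.get(kw, set()) for kw in keywords]
    for pid in sorted(set.intersection(*lists) if lists else set()):  # L^P_K
        px, py = coords[pid]
        theta = math.atan2(py - q[1], px - q[0]) % (2 * math.pi)        # theta(q, p)
        d = math.hypot(px - q[0], py - q[1])
        if alpha <= theta <= beta and d < topk.d_k:                     # line 5
            topk.offer(d, pid)                                          # line 6
```

Keeping Q as a size-k max-heap makes reading $d_k$ constant-time and each update logarithmic in k.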
B. Answering Queries with Any Direction
In this section, we discuss how to answer a query with an arbitrary direction. We first classify queries into basic queries and complex queries as follows.
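Case 1 (Basic Queries):
- $0 \le \alpha \le \beta \le \frac{\pi}{2}$. We answer it using the index structures on $O_{bl}$, as discussed in the above sections.
- $\frac{\pi}{2} \le \alpha \le \beta \le \pi$. We answer it using the index structures on $O_{br}$, in the same way as a query with $0 \le \alpha \le \beta \le \frac{\pi}{2}$, as shown in Figure 10.
- $\pi \le \alpha \le \beta \le \frac{3\pi}{2}$. We answer it using the index structures on $O_{tr}$, in the same way as a query with $0 \le \alpha \le \beta \le \frac{\pi}{2}$, as shown in Figure 11.
- $\frac{3\pi}{2} \le \alpha \le \beta \le 2\pi$. We answer it using the index structures on $O_{tl}$, in the same way as a query with $0 \le \alpha \le \beta \le \frac{\pi}{2}$, as shown in Figure 12.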
Case 2 (Complex Queries): All other queries are called complex queries. For a complex query q with direction $[\alpha, \beta]$, we decompose q into at most four basic queries: (1) $q_1$ with direction $[0, \frac{\pi}{2}) \cap [\alpha, \beta]$; (2) $q_2$ with direction $[\frac{\pi}{2}, \pi) \cap [\alpha, \beta]$; (3) $q_3$ with direction $[\pi, \frac{3\pi}{2}) \cap [\alpha, \beta]$; and (4) $q_4$ with direction $[\frac{3\pi}{2}, 2\pi) \cap [\alpha, \beta]$. Thus we can first answer the sub-queries and then combine their results to generate the final answers of query q.§
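The decomposition above, including the wrap-around rule for $\beta > 2\pi$ given in the footnote, can be sketched as clipping the query direction against the four quadrants. This is an illustrative helper under assumed conventions: directions are in radians, quadrants 0 to 3 map to the index structures on $O_{bl}$, $O_{br}$, $O_{tr}$ and $O_{tl}$ respectively (as in Case 1), and each piece is returned as a hypothetical (quadrant, lo, hi) triple.

```python
import math

QUARTER = math.pi / 2

def decompose(alpha, beta):
    """Split a query direction [alpha, beta] into basic sub-queries.

    Assumes alpha in [0, 2*pi) and alpha <= beta <= alpha + 2*pi.
    Returns (quadrant, lo, hi) triples, quadrant 0..3 = O_bl, O_br, O_tr, O_tl.
    """
    if alpha == beta:                            # a single ray: one basic query
        return [(min(int(alpha // QUARTER), 3), alpha, beta)]
    if beta > 2 * math.pi:                       # wrap: [alpha, 2pi) and [0, beta - 2pi]
        spans = [(alpha, 2 * math.pi), (0.0, beta - 2 * math.pi)]
    else:
        spans = [(alpha, beta)]
    pieces = []
    for lo, hi in spans:
        for quad in range(4):                    # clip against [quad*pi/2, (quad+1)*pi/2)
            s = max(lo, quad * QUARTER)
            e = min(hi, (quad + 1) * QUARTER)
            if s < e:                            # drop empty or zero-width pieces
                pieces.append((quad, s, e))
    return pieces
```

For a wrapping direction the helper yields at most five pieces, matching the bound stated in the footnote; a complex query that does not wrap yields at most four.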
A straightforward method to answer a complex query first decomposes it into basic sub-queries and then computes the k nearest neighbors for each basic query. Finally it finds the real k nearest neighbors by combining the results of the basic queries. However, this method is very expensive, as some sub-queries may contribute no real answers and we do not need to answer such sub-queries. To this end, we propose an efficient algorithm that prunes many unnecessary POIs. For each basic query, we first compute its candidate regions. Then we sort the candidate regions based on their MINDIST values. Next we access the
§We use $\alpha \in [0, 2\pi)$ and $\beta \le \alpha + 2\pi$ to denote any direction. If $\beta > 2\pi$, we decompose the direction into $[\alpha, 2\pi)$ and $[2\pi, \beta] = [0, \beta - 2\pi]$. Then we decompose them into basic queries and generate at most five sub-queries.
Algorithm 2: DESKS(P, q)
Input: P: A collection of POIs
       q = ⟨(q.x, q.y); [α, β]; K⟩: A query
Output: P^k_q = {p | p ∈ P_q and p is a knn of q}, where P_q is the set of POIs in the search direction that contain all the keywords in K.
1  begin
2    Initialize an empty priority queue Q_P for POIs;
3    Let d_k denote the k-th smallest distance in Q_P;
4    Initialize an empty priority queue Q_R for regions;
5    Decompose q into q_1, q_2, ..., q_4; /* some may be empty */
6    for 1 ≤ s ≤ 4 do
7      Locate the region R_{i_s} in which q_s appears;
8      add R_{i_s} into Q_R;
9    while Q_R ≠ ∅ do
10     Get region R_{i_m} with minimal MINDIST(q, R_{i_m});
11     if MINDIST(q, R_{i_m}) ≥ d_k then return;
12     else CR_{i_m} = FINDCANDREGIONS(q, R_{i_m}, d_k);
13     for R_{i_m j} ∈ CR_{i_m} do
14       if MINDIST(q, R_{i_m j}) ≥ d_k then break;
15       else FINDCANDPOIS(q, R_{i_m j}, d_k, Q_P);
16     Pop R_{i_m} from Q_R;
17     if MINDIST(q, R_{i_m+1}) < d_k then
18       add R_{i_m+1} into Q_R;
19 end

Fig. 13. DESKS algorithm (using disk-based inverted lists)
candidate regions in order and prune unnecessary regions. The
pseudo-code of the algorithm is shown in Figure 13.
We maintain two priority queues: $Q_P$ for candidate POIs (line 2) and $Q_R$ for regions (line 4). We first decompose the query into at most four sub-queries (line 5). Then, for each sub-query $q_s$, we locate the region in which $q_s$ appears (line 7) and add this region $R_{i_s}$ into the region queue $Q_R$ (line 8). Next we find the region $R_{i_m}$ with the minimal MINDIST value in $Q_R$ (line 10). If MINDIST$(q, R_{i_m}) \ge d_k$, we terminate, as we have found the k nearest neighbors (line 11); otherwise we find the candidate regions of $R_{i_m}$, $CR_{i_m}$ (line 12). For each candidate region $R_{i_m j} \in CR_{i_m}$, if MINDIST$(q, R_{i_m j}) \ge d_k$, we break, as there is no answer in $R_{i_m j}$ or in the remaining candidate regions (line 14); otherwise, we compute the candidate POIs in $R_{i_m j}$ (line 15). Then we pop $R_{i_m}$ from $Q_R$ (line 16). For the region $R_{i_m+1}$ that follows $R_{i_m}$, if MINDIST$(q, R_{i_m+1}) < d_k$, we add it into the region queue $Q_R$ (line 18). Iterating in this way, we find all answers of query q.
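The role of the region queue $Q_R$ can be illustrated with a toy best-first loop. In this simplified sketch each ring of a sub-query is collapsed into a single list of already-qualifying POIs, so FINDCANDREGIONS and FINDCANDPOIS are not modeled; the input layout is hypothetical; and MINDIST is assumed not to decrease from one ring of a sub-query to the next, which is what makes it safe to enqueue only the following ring $R_{i_m+1}$.

```python
import heapq
import math

def desks_best_first(subqueries, k):
    """Toy version of the Q_R / Q_P interplay in Algorithm 2 (DESKS).

    subqueries -- hypothetical input: one list per basic sub-query q_s; entry m
                  is (MINDIST of ring R_{i_s + m}, [(poi_id, dist), ...]) with
                  the ring's POIs already filtered by direction and keywords
    """
    top = []                                     # Q_P: (-dist, poi_id), size <= k

    def d_k():
        return -top[0][0] if len(top) == k else math.inf

    q_r = [(rings[0][0], s, 0) for s, rings in enumerate(subqueries) if rings]
    heapq.heapify(q_r)                           # lines 6-8
    while q_r:                                   # line 9
        md, s, m = heapq.heappop(q_r)            # lines 10 and 16
        if md >= d_k():                          # line 11: nothing closer remains
            break
        for pid, dist in subqueries[s][m][1]:    # lines 12-15, collapsed
            if dist < d_k():
                if len(top) == k:
                    heapq.heapreplace(top, (-dist, pid))
                else:
                    heapq.heappush(top, (-dist, pid))
        rings = subqueries[s]
        if m + 1 < len(rings) and rings[m + 1][0] < d_k():   # lines 17-18
            heapq.heappush(q_r, (rings[m + 1][0], s, m + 1))
    return sorted((-d, p) for d, p in top)
```

Sharing one $d_k$ across all sub-queries is the point of the design: a sub-query whose nearest ring is already farther than the current k-th answer is never expanded.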
V. INCREMENTAL SEARCH ALGORITHMS
Mobile phone users may dynamically change directions if they cannot find the expected answers in the current direction. A naive method answers each new query from scratch. However, this method is very expensive. To address this issue, we propose to incrementally answer a query based on the cached results of previously issued queries. To avoid excessive space overhead, we only cache the k nearest neighbors of a query. We consider the following two cases of updating a direction.
Case 1: The user widens the direction from $[\alpha, \beta]$ to $[\alpha', \beta']$, where $\alpha' < \alpha$ and $\beta' > \beta$. This corresponds to the user enlarging the direction with two fingers on the mobile phone screen. Section V-A discusses how to answer such a query efficiently.
Case 2: The user moves the direction from $[\alpha, \beta]$ to $[\alpha + \delta_\theta, \beta + \delta_\theta]$. This corresponds to the user changing the direction by turning the mobile phone. Section V-B discusses how to answer such a query efficiently.
Note that our method can support any direction-update query by combining these two operations.
A. Increasing The Direction
Suppose a user has issued a query q with direction $[\alpha, \beta]$ and then issues a new query q' by widening the direction to $[\alpha', \beta']$, where $\alpha' < \alpha$ and $\beta' > \beta$. We use the cached results of q to answer the new query q' as follows. Obviously, every POI that satisfies the constraints of q also satisfies those of q', so the answers of q are valid candidates for q'. Let $d'_k$ ($d_k$) denote the k-th smallest distance of the nearest neighbors to query q' (q). We have $d'_k \le d_k$. Thus we can use $d_k$ as an upper bound. We insert the k nearest neighbors of q into the priority queue of q'. Then we decompose q' into three queries: $q'_1$ with direction $[\alpha', \alpha]$, $q'_2$ with direction $[\beta, \beta']$, and q with direction $[\alpha, \beta]$. We only need to answer $q'_1$ and $q'_2$, with bound $d_k$. We answer the two queries simultaneously, in the same way as we answer sub-queries in Section IV-B. Note that in the two new directions, if a POI p (or region $R_{ij}$) has a distance to q' larger than $d_k$, we prune p (or $R_{ij}$); otherwise we insert it into the priority queue (or access the region). Thus we can incrementally and efficiently answer query q'.
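The widening step can be sketched as follows. This is a toy illustration with hypothetical names: `cached` holds the k answers of q as (distance, id) pairs, `pois` maps the ids of keyword-matching POIs to coordinates, and a linear scan of the two new slivers $[\alpha', \alpha)$ and $(\beta, \beta']$ stands in for the index-based sub-queries $q'_1$ and $q'_2$. The point it shows is that the scan starts from the bound $d_k$ of q and only ever tightens it.

```python
import math

def widen_query(q, cached, pois, alpha, beta, new_alpha, new_beta, k):
    """Toy answer of q' = [new_alpha, new_beta] reusing q = [alpha, beta].

    cached -- the k nearest (dist, poi_id) answers of the old query q
    pois   -- hypothetical dict: poi_id -> (x, y) of keyword-matching POIs
    Assumes new_alpha <= alpha <= beta <= new_beta within [0, 2*pi).
    """
    best = sorted(cached)[:k]                    # seed q' with q's answers

    def bound():                                 # starts at d_k of q, then shrinks
        return best[-1][0] if len(best) == k else math.inf

    for pid, (x, y) in pois.items():
        theta = math.atan2(y - q[1], x - q[0]) % (2 * math.pi)
        in_new = new_alpha <= theta < alpha or beta < theta <= new_beta
        d = math.hypot(x - q[0], y - q[1])
        if in_new and d < bound():               # only q'_1 and q'_2 are searched
            best = sorted(best + [(d, pid)])[:k]
    return best
```

POIs inside the old direction $[\alpha, \beta]$ are never re-examined, since any of them closer than $d_k$ is already among the cached answers.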
B. Moving The Direction
Suppose a user has issued a query q with direction $[\alpha, \beta]$ and then issues a new query q' by moving the direction to $[\alpha + \delta_\theta, \beta + \delta_\theta]$. First consider $\delta_\theta > 0$. If $\alpha + \delta_\theta > \beta$, q and q' have no overlapping direction, and we answer the new query from scratch. Otherwise, q and q' have the overlapping direction $[\alpha + \delta_\theta, \beta]$. We examine each of the k nearest neighbors of q, and if it is in $[\alpha + \delta_\theta, \beta]$, we insert it into the priority queue of query q' and update the k-th smallest threshold $d'_k$. Then we answer the new query with direction $[\beta, \beta + \delta_\theta]$ using threshold $d'_k$. If we find k answers in the priority queue or in direction $[\beta, \beta + \delta_\theta]$ within distance $d_k$, we do not need to access the regions in direction $[\alpha + \delta_\theta, \beta]$; otherwise, we only need to access those regions in direction $[\alpha + \delta_\theta, \beta]$ with MINDIST values no smaller than $d_k$. Thus we can use the bound $d'_k$ to do effective pruning. Similarly, if $\delta_\theta < 0$ and $\beta + \delta_\theta \ge \alpha$, we can use the above method to answer query q' with direction $[\alpha + \delta_\theta, \alpha]$. Thus we can incrementally and efficiently answer query q'.
In this paper we do not consider moving queries (i.e., changing locations).
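For a moved direction, the work splits into reusing the cached answers that fall in the overlap and searching the newly covered sliver. The two helpers below sketch only that planning step, under assumed conventions: directions are in radians, cached answers are stored as hypothetical (distance, id, direction) triples, and the function names are illustrative. The index search of the sliver itself is not shown.

```python
import math

def plan_move(alpha, beta, delta):
    """Split the moved direction [alpha + delta, beta + delta] into the part
    shared with [alpha, beta] and the new sliver; overlap is None when the
    two directions are disjoint and the query is answered from scratch."""
    if delta >= 0:
        if alpha + delta > beta:
            return None, (alpha + delta, beta + delta)
        return (alpha + delta, beta), (beta, beta + delta)
    if beta + delta < alpha:
        return None, (alpha + delta, beta + delta)
    return (alpha, beta + delta), (alpha + delta, alpha)


def seed_from_cache(cached, overlap, k):
    """Keep the cached answers (dist, poi_id, theta) of q that lie in the
    overlap; return them with the initial threshold d'_k (inf if fewer than k)."""
    if overlap is None:
        return [], math.inf
    kept = sorted((d, p) for d, p, th in cached if overlap[0] <= th <= overlap[1])
    return kept, (kept[-1][0] if len(kept) == k else math.inf)
```

When fewer than k cached answers survive, $d'_k$ stays infinite until the sliver search fills the queue, which is why small moves (large overlaps) benefit most.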
VI. EXPERIMENTAL STUDY
We have implemented our proposed methods and compared them with two state-of-the-art methods, MIR2-tree [6] and LkT [5]‖. We extended their methods to support direction-aware search by checking whether each accessed MBR (or POI) is in the search direction. For LkT, we obtained the code, implemented in Java, from the authors of [5]. For MIR2-tree, we implemented it in C++. Our algorithms were also implemented in C++. All the C++ code was compiled using GCC 4.2.3 with the -O3 flag. As the baseline algorithms used disk-based indexes, we also used disk-based index structures. All the experiments were run on an Ubuntu machine with an Intel Core E5450 3.0GHz CPU and 4 GB memory.
We used three real datasets: POIs in California (CA), POIs in Virginia (VA), and POIs in China (CN). The statistics of the datasets are summarized in Table II. We generated five query sets with 1 to 5 keywords, and each query set had 1000 queries.

TABLE II
DATASETS
Statistic | CA | VA | CN
Total number of POIs (millions) | 0.91 | 0.96 | 16.5
Total number of terms (millions) | 9.7 | 4.6 | 63.6
Total number of unique terms (thousands) | 35 | 26 | 753
Average number of unique terms per POI | 8.57 | 4.5 | 3.85
A. Varying M and N
In this section, we evaluate the effect of varying the region number N and the sub-region number M. Figure 14 shows the results. We see that different values of N and M had no significant effect on the performance for M > 50. On the VA dataset, the running time was about 2.3-2.7 ms for every combination of M and N, and we got the highest performance at N=100 and M=150. On the CA dataset, the running time was 11-15 ms for different M and N values, and we got the highest performance at N=100 and M=150. On the CN dataset, the time was about 9-16 ms, and the highest performance was achieved at N=1000 and M=600. Based on these results, we concluded that each region $R_i$ should contain about 10,000 POIs and each sub-region $R_{ij}$ about 100 POIs. In the remaining experiments, we used N=100 and M=150 on the CA and VA datasets, and N=1000 and M=600 on the CN dataset.
B. Evaluation on Pruning Techniques
In this section, we evaluate our pruning techniques. We implemented three methods: (1) DESKS+R, which used the region-pruning techniques and function MINDIST$(q, R_i)$ to prune $R_i$; (2) DESKS+D, which used the direction-pruning techniques and function MINDIST$(q, R_{ij})$ to prune $R_{ij}$; and (3) DESKS+RD, which used both region pruning and direction pruning.
Varying k: We first evaluated the pruning techniques by varying k on the 5000 queries with $\alpha = 0$ and $\beta = \frac{\pi}{3}$. Figure 15 shows
‖As MIR2-tree generally achieves much higher performance than IR2-tree, we do not report results for IR2-tree.
Fig. 14. Average search performance: varying M and N (5000 queries, k = 10, $\alpha = 0$, $\beta = \frac{\pi}{3}$). Panels: (a) VA, (b) CA, (c) CN; x-axis: N; y-axis: elapsed time (ms); one curve per M (M = 50 to 250 on VA and CA, M = 200 to 1000 on CN).
Fig. 15. Average search performance: varying k (5000 queries, $\alpha = 0$, $\beta = \frac{\pi}{3}$). Panels: (a) VA, (b) CA, (c) CN; x-axis: top-k (1 to 100); y-axis: elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.
Fig. 16. Average search performance: varying directions $\beta - \alpha$ from $\frac{\pi}{6}$ to $2\pi$ (5000 queries, k = 10). Panels: (a) VA, (b) CA, (c) CN; x-axis: direction (× π/6); y-axis: elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.
the results. We can see that DESKS+D and DESKS+RD significantly outperformed DESKS+R. This is because DESKS+R needed to access many unnecessary regions, whereas direction-based pruning can prune large numbers of unnecessary regions. DESKS+RD was also better than DESKS+D, especially on the CN dataset, because DESKS+RD can prune many regions $R_i$. For example, on the CN dataset, for k = 100, DESKS+R took 55 ms, DESKS+D improved it to 32 ms, and DESKS+RD further improved it to 16 ms. There are two reasons why the improvement of DESKS+RD over DESKS+D was not significant on the CA and VA datasets. Firstly, only a small number of POIs contained all keywords, so both DESKS+D and DESKS+RD needed to access many regions. Secondly, there were few regions $R_i$: with N = 100, DESKS+RD cannot prune large numbers of regions.
Varying directions: We evaluated the pruning techniques by varying directions on 5000 queries with k = 10. Figure 16 shows the results. Similarly, DESKS+D and DESKS+RD significantly outperformed DESKS+R. On the VA dataset, DESKS+R took more than 20 ms to answer a query, while DESKS+D and DESKS+RD took only about 2 ms. This is because DESKS+R needed to enumerate many regions, while DESKS+D and DESKS+RD can prune large numbers of regions based on the direction-aware indexes.
C. Comparison with Existing Methods
We compared our algorithm DESKS (DESKS+RD) with the state-of-the-art methods MIR2-tree and LkT. We first compared the index sizes and indexing times, as shown in Table III. Building the LkT index was very expensive, as it needed to cluster the keywords in POIs. On the CN dataset, it took more than 2 days to index 1 million POIs, and it would take 1 month to index 16 million POIs. Thus we do not show LkT results on the CN dataset. MIR2-tree used an R-tree and keyword signatures to build indexes. Although DESKS had larger index sizes than MIR2-tree (as DESKS built indexes for $O_{bl}$, $O_{br}$, $O_{tr}$, and $O_{tl}$), DESKS still had acceptable index sizes. LkT had much larger index sizes, as it built inverted lists for each R-tree node.

TABLE III
INDEXING TIME AND SIZES
Data | Data Size (MB) | Index Size (MB): MIR2-tree / LkT / DESKS | Index Time (minutes): MIR2-tree / LkT / DESKS
CA | 72.2 | 72 / 1430 / 265 | 1.3 / 780 / 1.8
VA | 54.8 | 76 / 920 / 149 | 0.8 / 690 / 1.2
CN | 805 | 1304 / - / 3552 | 25 / - / 33
Varying directions: We first compared the methods by varying directions on 5000 queries with k = 10. Figure 17 shows the results. Although LkT and MIR2-tree achieved high performance for large directions, they were very slow for small directions. This is because they needed to enumerate many MBRs and POIs, which was very expensive. For example, on the CA dataset, they took 200 ms for direction $2\pi$, but more than 5 seconds for direction $\frac{\pi}{3}$. DESKS took only 20 ms for any direction, since DESKS can use the index to do effective direction pruning. Even for direction $2\pi$, DESKS still outperformed existing methods, for three reasons. Firstly, our region structure is very effective and can be kept in memory. Secondly, our region inverted lists can prune many unnecessary POIs. Thirdly, existing methods usually
Fig. 17. Performance comparison: varying directions $\beta - \alpha$ from $\frac{\pi}{6}$ to $2\pi$ (5000 queries, k = 10). Panels: (a) VA, (b) CA, (c) CN; x-axis: direction (× π/6); y-axis: elapsed time (ms, log scale); curves: LkT, MIR2-tree, DESKS (no LkT on CN).
Fig. 18. Performance comparison: varying k (5000 queries, $\alpha = 0$, $\beta = \frac{\pi}{3}$). Panels: (a) VA, (b) CA, (c) CN; x-axis: top-k (1 to 100); y-axis: elapsed time (ms, log scale); curves: LkT, MIR2-tree, DESKS (no LkT on CN).
Fig. 19. Performance comparison: varying numbers of keywords (1000 queries in each query set, k = 10, $\alpha = 0$, $\beta = \frac{\pi}{3}$). Panels: (a) VA, (b) CA, (c) CN; x-axis: number of keywords (1 to 5); y-axis: elapsed time (ms, log scale); curves: LkT, MIR2-tree, DESKS (no LkT on CN).
achieve high performance for POIs with many keywords (documents) [5]. However, real POIs do not have many keywords.
Varying k: Then we compared the methods by varying k on 5000 queries with $\alpha = 0$ and $\beta = \frac{\pi}{3}$. Figure 18 shows the results. We can see that DESKS significantly outperformed MIR2-tree and LkT, by up to 2-3 orders of magnitude. On the VA dataset, MIR2-tree and LkT took about 500 ms, whereas DESKS took only 2-5 ms. The main reason is that existing methods cannot use the index to do effective direction pruning, whereas DESKS uses the novel direction-aware index, which can prune large numbers of unnecessary regions and POIs.
Varying the number of keywords: Next we compared the methods by varying the number of keywords, with k = 10, $\alpha = 0$, and $\beta = \frac{\pi}{3}$. Figure 19 shows the results. For every number of keywords, DESKS was much better than MIR2-tree and LkT, taking only about 10-20 ms.
D. Evaluation on Incremental Search
In this section, we test our incremental search method. We first initialized queries with $\beta - \alpha = \frac{\pi}{3}$ and then increased the directions by $\frac{\pi}{36}, \cdots, \frac{12\pi}{36}$. Figure 20(a) shows the results. We can see that our incremental method DESKS-INCRE outperformed DESKS, because DESKS-INCRE can incrementally answer a query using the previously issued queries. We also evaluated DESKS-INCRE by moving directions. Figure 20(b) shows the results. We again initialized queries with $\beta - \alpha = \frac{\pi}{3}$ and then moved the directions by $-\frac{6\pi}{36}, \cdots, \frac{6\pi}{36}$. We can see that for a small direction change, DESKS-INCRE was much better than DESKS, as DESKS-INCRE can use a tighter bound to answer new queries. For a large direction change, the improvement was small, as DESKS-INCRE needed to answer queries from scratch.
E. Scalability
In this section, we evaluate the scalability on the CN dataset by varying the number of POIs. Figure 21 shows the results for different k values and directions. We can see that our method scaled very well. This is attributed to our effective direction-aware index structures and pruning techniques.
Fig. 20. Incremental search on the CN dataset (k = 10). Panels: (a) increasing directions, x-axis: increase (× π/36), 1 to 12; (b) moving directions, x-axis: shift (× π/36), −6 to 6; y-axis: elapsed time (ms); curves: DESKS, DESKS-INCRE.
Fig. 21. Scalability on the CN dataset. Panels: (a) varying k ($\beta - \alpha = \frac{\pi}{3}$), curves k = 1, 10, 20, 50, 100; (b) varying directions (k = 10), curves $\beta - \alpha$ = π/3, 2π/3, π, 4π/3, 5π/3, 2π; x-axis: number of POIs (millions); y-axis: elapsed time (ms).

VII. RELATED WORK
Many studies on spatial keyword search have been proposed recently [25], [3], [9], [6], [23], [5], [24], [22], [1], [21], [2], [19], [13]. The work most related to our problem is the study by Felipe et al. [6], which proposed index structures that integrate signature files and R-trees to enable top-k spatial keyword queries. Another similar study [5] by Cong et al. combined inverted files and R-trees to answer the location-aware top-k text retrieval (LkT) query. Our direction-aware spatial keyword query differs from these methods as we have a direction constraint.
Zhou et al. [25] proposed to find web documents relevant to user input keywords within a pre-specified region. They developed several methods by combining R-trees and inverted indexes. Chen et al. [3] extended this problem by supporting large numbers of “footprint” representations. Hariharan et al. [9] focused on finding objects containing a set of keywords within a specific region. They proposed a hybrid index structure that integrates R-trees and inverted lists. Zhang et al. [23], [24] introduced the m-closest keyword query (mCK query), which aims at finding the closest objects that match the keywords. Cao et al. [1] studied how to find the top-k prestige-based relevant spatial web objects. Yao et al. [22] tackled the problem of answering approximate string match queries in spatial databases. Wu et al. [21] studied continuously moving top-k spatial keyword queries. Lu et al. [13] extended reverse knn techniques to support reverse spatial and textual k nearest neighbor search. Roy and Chakrabarti [19] studied type-ahead search in spatial databases using materialization techniques. Cao et al. [2] studied collective keyword search by considering multiple points. Leung et al. [12] proposed to use locations for personalized search. Obviously, the above queries substantially differ from our direction-aware spatial keyword query.
There are many studies on knn [18], [16], [10], [11], [20], [17]. Ferhatosmanoglu et al. [7] studied constrained nearest neighbor search using a polygon as the constraint. Cheng et al. [4] studied constrained knn queries over uncertain data. Gao et al. [8] and Nutanong et al. [14] proposed to answer visible knn queries. Patroumpas and Sellis [15] studied the problem of monitoring object orientations. However, their methods cannot support our problem, as we support keyword-based search and consider a direction constraint, which is different from theirs. Although we could build two separate indexes, one for keywords and another for locations, this method is expensive, as it cannot simultaneously apply textual and spatial pruning.
VIII. CONCLUSION
In this paper we have studied the problem of direction-aware spatial keyword search, which finds the k nearest neighbors to the query that contain all input keywords and satisfy the direction constraint. To efficiently answer a direction-aware spatial keyword query, we proposed novel indexing structures, which can prune large numbers of unnecessary POIs. We developed effective region-based pruning and direction-based pruning techniques to improve the search performance. We devised efficient algorithms to answer direction-aware spatial keyword queries. We also studied how to incrementally answer a query. We have implemented our algorithms, and experimental results show that our method achieves high performance and outperforms existing methods significantly.
IX. ACKNOWLEDGEMENT
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partly supported by the National Natural Science Foundation of China under Grant No. 61003004 and 60873065, National Grand Fundamental Research 973 Program of China under Grant No. 2011CB302206, National S&T Major Project of China under Grant No. 2011ZX01042-001-002, and “NExT Research Center” funded by MDA, Singapore, under Grant No. WBS:R-252-300-001-490.
REFERENCES
[1] X. Cao, G. Cong, and C. S. Jensen. Retrieving top-k prestige-based
relevant spatial web objects. PVLDB, 3(1):373–384, 2010.
[2] X. Cao, G. Cong, C. S. Jensen, and B. C. Ooi. Collective spatial keyword
querying. In SIGMOD Conference, pages 373–384, 2011.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in
geographic web search engines. In SIGMOD Conference, pages 277–
288, 2006.
[4] R. Cheng, J. Chen, M. F. Mokbel, and C.-Y. Chow. Probabilistic
verifiers: Evaluating constrained nearest-neighbor queries over uncertain
data. In ICDE, pages 973–982, 2008.
[5] G. Cong, C. S. Jensen, and D. Wu. Efficient retrieval of the top-k most
relevant spatial web objects. PVLDB, 2009.
[6] I. D. Felipe, V. Hristidis, and N. Rishe. Keyword search on spatial
databases. In ICDE, 2008.
[7] H. Ferhatosmanoglu, I. Stanoi, D. Agrawal, and A. E. Abbadi. Con-
strained nearest neighbor queries. In SSTD, pages 257–278, 2001.
[8] Y. Gao, B. Zheng, W.-C. Lee, and G. Chen. Continuous visible nearest
neighbor queries. In EDBT, pages 144–155, 2009.
[9] R. Hariharan, B. Hore, C. Li, and S. Mehrotra. Processing spatial-
keyword (SK) queries in geographic information retrieval (GIR) systems.
In SSDBM, 2007.
[10] G. R. Hjaltason and H. Samet. Distance browsing in spatial databases.
ACM Trans. Database Syst., 1999.
[11] M. R. Kolahdouzan and C. Shahabi. Voronoi-based k nearest neighbor
search for spatial network databases. In VLDB, pages 840–851, 2004.
[12] K. W.-T. Leung, D. L. Lee, and W.-C. Lee. Personalized web search
with location preferences. In ICDE, pages 701–712, 2010.
[13] J. Lu, Y. Lu, and G. Cong. Reverse spatial and textual k nearest neighbor
search. In SIGMOD Conference, pages 349–360, 2011.
[14] S. Nutanong, E. Tanin, and R. Zhang. Visible nearest neighbor queries.
In DASFAA, pages 876–883, 2007.
[15] K. Patroumpas and T. K. Sellis. Monitoring orientation of moving
objects around focal points. In SSTD, pages 228–246, 2009.
[16] S. Pramanik and J. Li. Fast approximate search algorithm for nearest
neighbor queries in high dimensions. In ICDE, page 251, 1999.
[17] J. B. Rocha-Junior, A. Vlachou, C. Doulkeridis, and K. Nørvåg. Efficient processing of top-k spatial preference queries. PVLDB, 4(2):93–104, 2010.
[18] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries.
In SIGMOD Conference, 1995.
[19] S. B. Roy and K. Chakrabarti. Location-aware type ahead search on
spatial databases: semantics and efficiency. In SIGMOD Conference,
pages 361–372, 2011.
[20] Y. Tao, D. Papadias, and Q. Shen. Continuous nearest neighbor search.
In VLDB, pages 287–298, 2002.
[21] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong. Efficient continuously
moving top-k spatial keyword query processing. In ICDE, pages 541–
552, 2011.
[22] B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string
search in spatial databases. In ICDE, 2010.
[23] D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa.
Keyword search in spatial databases: Towards searching by document.
In ICDE, 2009.
[24] D. Zhang, B. C. Ooi, and A. K. H. Tung. Locating mapped resources
in web 2.0. In ICDE, pages 521–532, 2010.
[25] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W.-Y. Ma. Hybrid index
structures for location-based web search. In CIKM, 2005.
... Socially aware SK query [7,74,144] T kLUS Retrieval of objects: Extend the TkSK query to consider social aspects in the ranking function Retrieval of users: Extend the TkSK query to find topk users who create geo-tagged posts that satisfy an SK constraint Retrieval of terms: Extend the BRSK query to find the top-k frequent terms in the geo-tagged posts of the social network friends of a query user Direction-aware SK query [81] n.a. Extend the BkSK query to consider an extra direction constraint ...
... Direction-aware spatial keyword query [81] In addition to the spatial and textual predicates, a direction-aware spatial keyword query q contains a direction constraint [α, β], which captures that the user is only interested in geo-textual objects with directions from q in [α, β]. Thus, the query is given by q = (ρ, ψ, k, [α, β]), where ρ, ψ, and k denotes a location, a keyword set, and the result cardinality, respectively, and [α, β] represents the query direction. ...
... To answer a query, they first compute the set of relevant Geohash code and keyword pairs and then process the objects in the corresponding posting lists. Direction-aware spatial keyword query [81] Li et al. [81] propose a direction-aware index to organize objects. Using a tree structure, the objects in the non-leaf nodes are partitioned into subregions according to their directions and distances with respect to the bottom left point of a minimum bounding rectangle (MBR) that covers them. ...
Article
Full-text available
With the broad adoption of mobile devices, notably smartphones, keyword-based search for content has seen increasing use by mobile users, who are often interested in content related to their geographical location. We have also witnessed a proliferation of geo-textual content that encompasses both textual and geographical information. Examples include geo-tagged microblog posts, yellow pages, and web pages related to entities with physical locations. Over the past decade, substantial research has been conducted on integrating location into keyword-based querying of geo-textual content in settings where the underlying data is assumed to be either relatively static or is assumed to stream into a system that maintains a set of continuous queries. This paper offers a survey of both the research problems studied and the solutions proposed in these two settings. As such, it aims to offer the reader a first understanding of key concepts and techniques, and it serves as an “index” for researchers who are interested in exploring the concepts and techniques underlying proposed solutions to the querying of geo-textual data.
... Another query is the Direction-based k-Nearest Neighbors Query as a straightforward combination of k-Nearest Neighbors Query and the generalized version of Angle-constrained Query (Li, Feng, and Xu 2012). It finds k-nearest neighbors of the search object s that have a particular angular relationship with s (Figure 9(b)). ...
... Another approach is to use these refinements in machine learning models (Du et al. 2017). Li, Feng, and Xu (2012) propose spatial index structures to process Direction-based k-Nearest Neighbors Queries where spatial objects are also associated with textual description. The underlying idea is to build a hierarchical structure that organizes the elements of a point dataset in concentric rings. ...
Article
Spatial relationships are core components in the design and definition of spatial queries. A spatial relationship determines how two or more spatial objects are related or connected in space. Hence, given a spatial dataset, users can retrieve spatial objects in a given relationship with a search object. Different interpretations of spatial relationships are conceivable, leading to different types of relationships. The main types are (i) topological relationships (e.g. overlap, meet, inside), (ii) metric relationships (e.g. nearest neighbors), and (iii) direction relationships (e.g. cardinal directions). Although spatial information retrieval has been extensively studied in the literature, it is unclear which types of spatial queries can be defined using spatial relationships. In this article, we introduce a taxonomy for naming, describing, and classifying types of spatial queries frequently found in the literature. This taxonomy is based on the types of spatial relationships that are employed by spatial queries. By using this taxonomy, we discuss the intuitive descriptions, formal definitions, and possible implementation techniques of several types of spatial queries. The discussions lead to the identification of correspondences between types of spatial queries. Further, we identify challenges and open research topics in the spatial information retrieval area.
Article
Modern location-based systems have stimulated explosive growth of urban trajectory data and promoted many real-world applications, e.g., trajectory prediction. However, heavy big-data processing overhead and privacy concerns hinder trajectory acquisition and utilization. Inspired by the regular distribution of trajectories on transportation road networks, we propose to model trajectory data privately with a deep generative model and leverage the model to generate representative trajectories for downstream tasks or directly support these tasks (e.g., popularity ranking), rather than acquiring and processing the original big trajectory data. Nevertheless, it is rather challenging to model high-dimensional trajectories with a time-varying yet skewed distribution. To address this problem, we model and generate trajectory sequences with judiciously encoded spatio-temporal features over the skewed distribution by leveraging an important factor neglected by the literature: the underlying road properties (e.g., road types and directions), which are closely related to trajectory distribution. Specifically, we decompose each trajectory into a map-matched road sequence with temporal information and embed it to encode spatio-temporal features. Then, we enhance trajectory representation by encoding inherent route planning patterns from the underlying road properties. Later, we encode spatial correlations among edges as well as daily and weekly temporal periodicity information. Next, we employ a meta-learning module to generate trajectory sequences step by step by learning generalized trajectory distribution patterns from skewed trajectory data based on the well-encoded trajectory prefix. Last but not least, we preserve trajectory privacy by training the model in a differentially private manner with clipped gradients. Experiments on real-world datasets show that our method significantly outperforms existing methods.
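The abstract does not detail how the model is trained privately. One standard recipe that matches "clipping gradients" is DP-SGD-style aggregation: clip each per-example gradient to an L2 bound, sum the clipped gradients, add Gaussian noise scaled to that bound, and average. The sketch below assumes that recipe. The function names and parameters are hypothetical, and it omits privacy accounting entirely.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_gradient(per_example_grads, max_norm, noise_multiplier, rng=random):
    """Sum clipped gradients, add Gaussian noise with std noise_multiplier * max_norm, then average."""
    n = len(per_example_grads)
    summed = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    sigma = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


print(clip([3.0, 4.0], 1.0))  # approximately [0.6, 0.8]
```

Setting noise_multiplier to 0 recovers plain clipped averaging, which is convenient for testing; any privacy guarantee comes from the added noise, and passing a seeded random.Random as rng makes runs reproducible.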
Article
A direction-aware augmented spatial keyword top-k query (DATkQ) returns the top-k objects based on a ranking function that considers spatial distance, textual similarity, query numeric attributes, and query direction. When a user initiates a DATkQ, some user-desired objects (missing objects) may not appear in the query result set, and the user then wonders why they do not appear; this is called the why-not question. This paper focuses on answering why-not questions on DATkQs. We first discuss how to obtain the refined query direction by analyzing the position relationship between missing objects and the original query direction in polar coordinates. Then a DAPC index structure is designed, which can cut down irrelevant search space based not only on conventional distance pruning, keyword pruning, and attribute pruning but also on query direction pruning. In particular, by comparing the position relationship between the query direction and the sector (sector ring) regions segmented by the DAPC-based method, the search space that does not meet the query direction is pruned. In addition, we discuss the applicability of our scheme to why-not questions on regional spatial keyword queries (SKQs), ordinary direction-aware top-k SKQs, and complex scoring SKQs. Finally, a series of experiments are conducted on two real datasets to show the efficiency of our DAPC-based method.
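The geometric core of refining the query direction can be sketched as follows. Assume a direction is a counter-clockwise sector [lo, hi] in degrees around the query point, and that each missing object is admitted by rotating whichever sector boundary needs the smaller turn. This greedy rule and the names are assumptions; the paper's refinement also accounts for its ranking function, which is not modelled here.

```python
import math


def bearing_deg(q, p):
    """Polar angle of p around q, in degrees in [0, 360)."""
    return math.degrees(math.atan2(p[1] - q[1], p[0] - q[0])) % 360.0


def widen_direction(q, lo, hi, missing):
    """Greedily widen the counter-clockwise sector [lo, hi] so every missing object falls inside."""
    for p in missing:
        a = bearing_deg(q, p)
        width = (hi - lo) % 360.0
        offset = (a - lo) % 360.0
        if offset <= width:
            continue  # already inside the sector
        extend_hi = offset - width  # turn the upper bound counter-clockwise
        extend_lo = 360.0 - offset  # turn the lower bound clockwise
        if extend_hi <= extend_lo:
            hi = (hi + extend_hi) % 360.0
        else:
            lo = (lo - extend_lo) % 360.0
    return lo, hi


print(widen_direction((0, 0), 0, 45, [(0, 1)]))  # (0, 90.0)
```

An object due north of the query needs a 45-degree turn of the upper bound but a 270-degree turn of the lower one, so only hi moves.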
Article
In this paper, we revisit the problem of route travel time estimation on a road network and aim to boost its accuracy by capturing and utilizing spatio-temporal features from four significant aspects: heterogeneity, proximity, periodicity and dynamicity. Spatial-wise, we consider two forms of heterogeneity at link level in a road network: the turning ways between different links are heterogeneous, which can make the travel time of the same link vary; and different links contain heterogeneous attributes and thereby lead to different travel times. In addition, we take into account proximity: neighboring links have similar traffic patterns and lead to similar travel speeds. To this end, we build a link-connection graph to capture such heterogeneity and proximity. Temporal-wise, the weekly/daily periodicity of temporal background information (e.g., rush hours) and dynamic traffic conditions have a significant impact on the travel time, which result in static and dynamic spatio-temporal features, respectively. To capture such impacts, we regard the travel time/speed as a combination of static and dynamic parts, and extract many spatio-temporal relevant features for the prediction task. Regarding the methodology, it remains an open problem to build a generic learning model to boost the estimation accuracy. Hence, we design a novel encoder-decoder framework: the encoder uses the sequence attention model to encode dynamic features from the temporal-wise perspective. The decoder first uses the heterogeneous graph attention model to decode the static part of travel speed based on static spatio-temporal features, and then leverages the sequence attention model to decode the estimated travel time from the spatial-wise perspective. Extensive experiments on real datasets verify the superiority of our method as well as the importance of the four aspects outlined above.
Article
Answering spatio-temporal range queries (RQs) on trajectory databases, i.e., finding all trajectories that intersect given ranges, is crucial in many real-world applications. Various kinds of indexes have been proposed to accelerate RQs. However, existing indexes typically use Euclidean distance to prune irrelevant regions without considering the underlying road network information. Because vehicle trajectories are generated on road network edges, the road network can be seen as meta knowledge of trajectories and used to index and query them. To this end, we propose RP-Tree, a road network-aware partition tree to support efficient RQs. The basic idea is to partition a road network graph into hierarchical subgraphs and generate a balanced tree structure, where each tree node maintains its associated trajectories. We compactly index the spatio-temporal information of trajectories on the corresponding road network edges. Then, we design efficient search algorithms to support RQs by pruning irrelevant trajectories through subgraph range borders associated with RP-Tree nodes. Last but not least, we scale RP-Tree to very large datasets by devising approximate algorithms with bounded confidence at an interactive speed. Experimental results on three real-world datasets from Porto, Chengdu, and Beijing show that our method outperforms baselines by 1 to 2 orders of magnitude.
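Pruning by subgraph borders can be sketched with a toy partition tree in which each node records the road edges of its subgraph and the time span of its trajectories, so a node is skipped when either cannot meet the query range. The node layout, the representation of a range as an edge set plus a time window, and all names are assumptions; the balanced partitioning and compact encoding of RP-Tree are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionNode:
    edges: frozenset  # road-network edges covered by this subgraph
    t_min: float  # earliest timestamp of any trajectory point in the subtree
    t_max: float  # latest timestamp of any trajectory point in the subtree
    children: list = field(default_factory=list)
    trajectories: dict = field(default_factory=dict)  # leaves only: traj_id -> [(edge, t), ...]


def range_query(node, query_edges, t_start, t_end):
    """Ids of trajectories that visit a query edge during [t_start, t_end]."""
    if node.edges.isdisjoint(query_edges) or node.t_max < t_start or node.t_min > t_end:
        return set()  # prune: this subgraph or its time span cannot meet the query range
    if not node.children:
        return {tid for tid, pts in node.trajectories.items()
                if any(e in query_edges and t_start <= t <= t_end for e, t in pts)}
    result = set()
    for child in node.children:
        result |= range_query(child, query_edges, t_start, t_end)
    return result


leaf1 = PartitionNode(frozenset({"a", "b"}), 0, 10,
                      trajectories={1: [("a", 2), ("b", 5)], 2: [("b", 9)]})
leaf2 = PartitionNode(frozenset({"c"}), 20, 30, trajectories={3: [("c", 25)]})
root = PartitionNode(frozenset({"a", "b", "c"}), 0, 30, children=[leaf1, leaf2])
print(range_query(root, {"b"}, 4, 6))  # {1}
```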
Chapter
Location-based services recommend points of interest (POIs) that are near the user's position q. In practice, when the user is moving with a velocity \(\overrightarrow{v}\), he may prefer nearby POIs that match his moving direction. In this paper, we propose the velocity-dependent nearest neighbor query (VeloNN query), which selects the POIs that are near and best match the user's moving direction. In the VeloNN query, if the direction of a POI o highly matches the direction of \(\overrightarrow{v}\), o is likely to be preferred. Since computing the directional preferences of all POIs is time-consuming, we propose rules to filter out the POIs with low directional preferences. We also divide the space into tiles, i.e., rectangular areas, and compute a candidate set for each tile in advance. The VeloNN candidates can be quickly prepared after finding the tile where the user is. We conduct experiments on both synthetic and real datasets and the results show the proposed algorithms can support VeloNN queries efficiently.
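A minimal sketch of a VeloNN-style ranking follows. It assumes directional preference is the cosine between the velocity v and the vector from q to a POI, uses a filter rule that discards POIs behind the user (cosine below a threshold), and ranks the remaining POIs with a hypothetical weighted score mixing normalized distance with directional mismatch. The paper's actual preference function and its tile-based candidate sets are not shown.

```python
import math


def direction_cosine(q, v, o):
    """Cosine of the angle between velocity v and the vector from q to POI o."""
    dx, dy = o[0] - q[0], o[1] - q[1]
    denom = math.hypot(*v) * math.hypot(dx, dy)
    return (v[0] * dx + v[1] * dy) / denom if denom > 0 else 1.0


def velo_nn(q, v, pois, min_cos=0.0, alpha=0.5):
    """Drop POIs whose cosine is below min_cos, then minimize a distance/direction score."""
    kept = [o for o in pois if direction_cosine(q, v, o) >= min_cos]
    if not kept:
        return None
    max_d = max(math.dist(q, o) for o in kept) or 1.0  # normalize distances to [0, 1]

    def score(o):
        # alpha weighs normalized distance; (1 - alpha) weighs mismatch (1 - cos) / 2 in [0, 1]
        return alpha * math.dist(q, o) / max_d + (1 - alpha) * (1 - direction_cosine(q, v, o)) / 2

    return min(kept, key=score)


print(velo_nn((0, 0), (1, 0), [(-1, 0), (3, 0), (0, 2)]))  # (3, 0)
```

With the user at the origin heading east, (-1, 0) is the plain nearest neighbour but lies behind the user and is filtered out; (3, 0) then beats (0, 2) because it lies exactly along the direction of travel.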
Article
The collective spatial keyword query (CoSKQ), an important variant of spatial keyword query, aims to find a set of objects that collectively cover the user's query keywords, are close to the query location, and are close to each other. However, existing works only focus on the CoSKQ problem of exact keyword matching and cannot handle spelling errors and conventional spelling differences (for example, color vs. colour), which are common in real applications. Moreover, query time information is not considered. To this end, this paper takes the lead in studying the problem of Time-aware Approximate Collective spatial Keyword query processing in traffic networks (TACoSKQ), where the objects are located on a predefined traffic network. We first prove that the TACoSKQ problem is NP-complete, and design a hybrid index called TDAG-tree to support query-object distance pruning, inter-object distance pruning, approximate keyword pruning, and temporal pruning simultaneously. Then, we present two approximate algorithms with provable approximation bounds to efficiently support TACoSKQ query processing on traffic networks. Finally, extensive experiments using three real datasets demonstrate the efficiency and accuracy of our proposed algorithms.
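The abstract does not say how approximate keyword matching is measured. Assuming edit distance with a threshold tau, which handles the color/colour example, a keyword check could look like the following. The names are hypothetical and the check ignores the temporal and network aspects of TACoSKQ.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions) with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,  # delete ca
                            curr[j - 1] + 1,  # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def covers_approx(object_keywords, query_keyword, tau):
    """True if some object keyword is within edit distance tau of the query keyword."""
    return any(edit_distance(k, query_keyword) <= tau for k in object_keywords)


print(edit_distance("color", "colour"))  # 1
print(covers_approx({"colour", "paint"}, "color", 1))  # True
```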
Article
So-called spatial web queries retrieve web content representing points of interest, such that the points of interest have descriptions that are relevant to query keywords and are located close to a query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, the top-k spatial textual cluster retrieval (k-STC) query that returns the top-k clusters that (i) are located close to a query location, (ii) contain objects that are relevant with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a DBSCAN-based approach and an OPTICS-based approach that rely on on-line density-based clustering and that exploit early stop conditions. Empirical studies on real data sets offer evidence that the paper's proposals can find good quality clusters and are capable of excellent performance.
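As a rough illustration of the DBSCAN-based idea, the sketch below runs plain DBSCAN over only the keyword-relevant objects and ranks the resulting clusters by the distance from the query location to their nearest member. The ranking rule, the parameter names, and the absence of early stopping are assumptions; the paper's approaches cluster on-line and exploit early stop conditions instead.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns clusters as lists of point indices (noise is dropped)."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise, >= 0 = cluster id

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: join the cluster but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: expand through it
        cluster_id += 1
    return [[i for i, lab in enumerate(labels) if lab == c] for c in range(cluster_id)]


def top_k_stc(objects, q, query_keywords, k, eps, min_pts):
    """Cluster keyword-relevant objects and return the k clusters closest to q."""
    relevant = [o for o in objects if o["keywords"] & query_keywords]
    clusters = dbscan([o["loc"] for o in relevant], eps, min_pts)
    groups = [[relevant[i] for i in c] for c in clusters]
    groups.sort(key=lambda g: min(math.dist(q, o["loc"]) for o in g))
    return groups[:k]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(pts, 1.5, 3))  # [[0, 1, 2], [3, 4, 5]]
```

The neighbour search here is quadratic; any practical version would replace it with a spatial index.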
Conference Paper
Mapping mashups are emerging Web 2.0 applications in which data objects such as blogs, photos and videos from different sources are combined and marked in a map using APIs that are released by online mapping solutions such as Google and Yahoo Maps. These objects are typically associated with a set of tags capturing the embedded semantics and a set of coordinates indicating their geographical locations. Traditional web resource searching strategies are not effective in such an environment due to the lack of the gazetteer context in the tags. Instead, a better approach is to locate an object by tag matching. However, the number of tags associated with each object is typically small, making it difficult for an object to capture the complete semantics of the query. In this paper, we focus on the fundamental application of locating geographical resources and propose an efficient tag-centric query processing strategy. In particular, we aim to find a set of nearest co-located objects which together match the query tags. Given that there could be a large number of data objects and tags, we develop an efficient search algorithm that can scale up in terms of the number of objects and tags. Further, to ensure that the results are relevant, we also propose a geographical context sensitive geo-tf-idf ranking mechanism. Our experiments on synthetic data sets demonstrate its scalability while the experiments using the real life data set confirm its practicality.
Conference Paper
There is growing commercial and research interest in location-based web search, i.e., finding web content whose topic is related to a particular place or region. In this type of search, location information should be indexed as well as text information. However, the index of a conventional text search engine is set-oriented, while location information is two-dimensional and in Euclidean space. This brings new research problems on how to efficiently represent the location attributes of web pages and how to combine two types of indexes. In this paper, we propose to use a hybrid index structure, which integrates inverted files and R*-trees, to handle both textual and location aware queries. Three different combining schemes are studied: (1) inverted file and R*-tree double index, (2) first inverted file then R*-tree, (3) first R*-tree then inverted file. To validate the performance of the proposed index structures, we design and implement a complete location-based web search engine which mainly consists of four parts: (1) an extractor which detects geographical scopes of web pages and represents geographical scopes as multiple MBRs based on geographical coordinates; (2) an indexer which builds hybrid index structures to integrate text and location information; (3) a ranker which ranks results by geographical relevance as well as non-geographical relevance; (4) a user-friendly interface for entering location-based search queries and obtaining geographically and textually relevant results. Experiments on a large real-world web dataset show that both the second and the third structures are superior in query time and the second is slightly better than the third. Additionally, indexes based on R*-trees are proven to be more efficient than indexes based on grid structures.
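Scheme (2), inverted file first and then the spatial filter, can be sketched as below, assuming each page carries a set of terms and a list of MBRs given as (xmin, ymin, xmax, ymax). For brevity a linear MBR check stands in for the R*-tree, so this shows only the order of the two filters, not their index structures; all names are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(pages):
    """Map each term to the set of page ids that contain it."""
    inv = defaultdict(set)
    for pid, page in pages.items():
        for term in page["terms"]:
            inv[term].add(pid)
    return inv


def mbr_intersects(a, b):
    """True if rectangles a and b, each (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def text_then_space(pages, inv, terms, region):
    """Intersect posting lists first, then keep pages having an MBR that overlaps the region."""
    postings = [inv.get(t, set()) for t in terms]
    candidates = set.intersection(*postings) if postings else set()
    return {pid for pid in candidates
            if any(mbr_intersects(m, region) for m in pages[pid]["mbrs"])}


pages = {
    1: {"terms": {"hotel", "beach"}, "mbrs": [(0, 0, 1, 1)]},
    2: {"terms": {"hotel"}, "mbrs": [(5, 5, 6, 6)]},
    3: {"terms": {"hotel", "beach"}, "mbrs": [(10, 10, 11, 11), (0.5, 0.5, 2, 2)]},
    4: {"terms": {"hotel", "beach"}, "mbrs": [(7, 7, 8, 8)]},
}
inv = build_inverted_file(pages)
print(text_then_space(pages, inv, ["hotel", "beach"], (0, 0, 1, 1)))  # {1, 3}
```

Page 3 qualifies through its second MBR, which reflects the extractor's choice of representing a page's geographical scope as multiple MBRs.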
Conference Paper
In this paper we introduce the notion of constrained nearest neighbor queries (CNN) and propose a series of methods to answer them. This class of queries can be thought of as nearest neighbor queries with range constraints. Although both nearest neighbor and range queries have been analyzed extensively in previous literature, the implications of constrained nearest neighbor queries have not been discussed. Due to their versatility, CNN queries are suitable for a wide range of applications, from GIS systems to reverse nearest neighbor queries and multimedia applications. We develop methods for answering CNN queries with different properties and advantages. We prove the optimality (with respect to I/O cost) of one of the techniques proposed in this paper. The superiority of the proposed technique is shown by a performance analysis.
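A simple way to answer a CNN query is to visit points in increasing distance from q and return the first one inside the constraint region. The sketch below does this with a heap over all points, assuming a rectangular region. It is an illustrative baseline, not one of the paper's methods, and the names are hypothetical.

```python
import heapq
import math


def constrained_nn(q, points, region):
    """Nearest point to q among points inside region = (xmin, ymin, xmax, ymax), else None."""
    heap = [(math.dist(q, p), p) for p in points]
    heapq.heapify(heap)
    xmin, ymin, xmax, ymax = region
    while heap:
        _, p = heapq.heappop(heap)  # points come out in increasing distance from q
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax:
            return p  # the first point satisfying the range constraint is the answer
    return None


print(constrained_nn((0, 0), [(1, 0), (3, 3), (5, 5)], (2, 2, 6, 6)))  # (3, 3)
```

The query point need not lie inside the region; here the unconstrained nearest neighbour (1, 0) is skipped because it falls outside it.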
Conference Paper
Geographic objects associated with descriptive texts are becoming prevalent. This gives prominence to spatial keyword queries that take into account both the locations and textual descriptions of content. Specifically, the relevance of an object to a query is measured by spatial-textual similarity that is based on both spatial proximity and textual similarity. In this paper, we define Reverse Spatial Textual k Nearest Neighbor (RSTkNN) query, i.e., finding objects that take the query object as one of their k most spatial-textual similar objects. Existing works on reverse kNN queries focus solely on spatial locations but ignore text relevance. To answer RSTkNN queries efficiently, we propose a hybrid index tree called IUR-tree (Intersection-Union R-Tree) that effectively combines location proximity with textual similarity. Based on the IUR-tree, we design a branch-and-bound search algorithm. To further accelerate the query processing, we propose an enhanced variant of the IUR-tree called clustered IUR-tree and two corresponding optimization algorithms. Empirical studies show that the proposed algorithms offer scalability and are capable of excellent performance.
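To make the RSTkNN definition concrete, the sketch below assumes spatial-textual similarity is a weighted sum of normalized spatial proximity and the Jaccard similarity of term sets. It answers the query by brute force: an object qualifies if the query object ranks among its k most similar objects. The weighting, the Jaccard choice, and the names are assumptions, and the IUR-tree pruning is omitted.

```python
import math


def st_similarity(a, b, alpha, max_dist):
    """alpha * spatial proximity + (1 - alpha) * Jaccard similarity of term sets."""
    spatial = 1.0 - math.dist(a["loc"], b["loc"]) / max_dist
    union = a["terms"] | b["terms"]
    textual = len(a["terms"] & b["terms"]) / len(union) if union else 0.0
    return alpha * spatial + (1 - alpha) * textual


def rstknn(objects, query, k, alpha, max_dist):
    """Indices of objects whose k most similar objects (query included) contain the query."""
    result = []
    for i, o in enumerate(objects):
        others = [p for j, p in enumerate(objects) if j != i] + [query]
        ranked = sorted(others, key=lambda p: st_similarity(o, p, alpha, max_dist), reverse=True)
        if any(p is query for p in ranked[:k]):
            result.append(i)
    return result


objects = [
    {"loc": (0, 0), "terms": {"pizza"}},
    {"loc": (1, 0), "terms": {"pizza"}},
    {"loc": (10, 0), "terms": {"sushi"}},
]
query = {"loc": (9, 0), "terms": {"sushi"}}
print(rstknn(objects, query, 1, 0.5, 20.0))  # [2]
```

Because the query is appended last and Python's sort is stable, ties are resolved against the query.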
Article
A frequently encountered type of query in Geographic Information Systems is to find the k nearest neighbor objects to a given point in space. Processing such queries requires substantially different search algorithms than those for location or range queries. In this paper we present an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalize it to finding the k nearest neighbors. We also discuss metrics for an optimistic and a pessimistic search ordering strategy as well as for pruning. Finally, we present the results of several experiments obtained using the implementation of our algorithm and examine the behavior of the metrics and the scalability of the algorithm.
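The optimistic and pessimistic metrics mentioned here are usually identified with MINDIST and MINMAXDIST between a query point and an MBR. A sketch of both follows, returning squared Euclidean distances, with the MBR given by its per-dimension lower and upper bounds; the function names are hypothetical.

```python
def mindist(p, lo, hi):
    """Squared MINDIST: distance from p to the closest point of the MBR [lo, hi]."""
    total = 0.0
    for pi, s, t in zip(p, lo, hi):
        r = s if pi < s else t if pi > t else pi  # clamp p into the MBR along this axis
        total += (pi - r) ** 2
    return total


def minmaxdist(p, lo, hi):
    """Squared MINMAXDIST: an upper bound on the distance to the nearest object in the MBR."""
    # farther edge along each dimension
    far = [s if pi >= (s + t) / 2 else t for pi, s, t in zip(p, lo, hi)]
    far_sq = [(pi - f) ** 2 for pi, f in zip(p, far)]
    total_far = sum(far_sq)
    best = float("inf")
    for k in range(len(p)):
        near_k = lo[k] if p[k] <= (lo[k] + hi[k]) / 2 else hi[k]  # nearer face along dimension k
        best = min(best, total_far - far_sq[k] + (p[k] - near_k) ** 2)
    return best


print(mindist((0, 0), (1, 1), (3, 2)), minmaxdist((0, 0), (1, 1), (3, 2)))  # 2.0 5.0
```

During traversal, a branch whose MINDIST exceeds the smallest MINMAXDIST seen among sibling MBRs cannot contain the nearest neighbour and can be pruned.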
Conference Paper
We consider a setting with numerous location-aware moving objects that communicate with a central server. Assuming a set of focal points of interest, we aim to continuously monitor object orientations and hence detect situations where many objects get closer to or move away from any such site. Towards this goal, we propose a streaming approach that delegates part of the processing to objects, which relay positional updates upon significant deviations in their course. The central processor maintains the changing distribution of current object headings around each focal point and may issue alerts once it observes many objects moving along a direction (e.g., increased northbound traffic near the stadium). To efficiently answer such navigational queries, we introduce a novel access method that indexes object headings influencing a specific site. Furthermore, we extend this scheme to examine trajectory movements around sites over the recent past. Experimental results verify that this framework is able to cope with large numbers of objects at reduced communication cost, while offering instant notification of important trends along diverse directions for multiple focal points.
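A toy version of the heading distribution kept around one focal point is sketched below. It bins each object's latest movement into one of num_sectors compass sectors and splits the counts by whether the object is approaching or receding from the site. The binning, the radius filter, and the names are assumptions; the paper's access method and update protocol are not modelled.

```python
import math
from collections import Counter


def heading_sector(prev, curr, num_sectors=8):
    """Sector index of the movement prev -> curr (0 starts at east, counter-clockwise)."""
    angle = math.atan2(curr[1] - prev[1], curr[0] - prev[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / num_sectors)) % num_sectors


def heading_histogram(updates, site, radius, num_sectors=8):
    """Count headings of objects now within radius of the site, split by approaching/receding."""
    counts = Counter()
    for prev, curr in updates:
        if math.dist(curr, site) <= radius:
            trend = "approaching" if math.dist(curr, site) < math.dist(prev, site) else "receding"
            counts[(trend, heading_sector(prev, curr, num_sectors))] += 1
    return counts


updates = [((0, -5), (0, -4)), ((1, -5), (1, -4)), ((0, 4), (0, 5))]
print(heading_histogram(updates, (0, 0), 10))
```

With eight sectors, northbound movement falls in sector 2; an alert could fire when any count exceeds a chosen threshold.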
Conference Paper
A frequent type of query in spatial networks (e.g., road networks) is to find the K nearest neighbors (KNN) of a given query object. With these networks, the distances between objects depend on their network connectivity and it is computationally expensive to compute the distances (e.g., shortest paths) between objects. In this paper, we propose a novel approach to efficiently and accurately evaluate KNN queries in spatial network databases using first order Voronoi diagram. This approach is based on partitioning a large network to small Voronoi regions, and then pre-computing distances both within and across the regions. By localizing the pre-computation within the regions, we save on both storage and computation and by performing across-the-network computation for only the border points of the neighboring regions, we avoid global pre-computation between every node-pair. Our empirical experiments with several real-world data sets show that our proposed solution outperforms approaches that are based on on-line distance computation by up to one order of magnitude, and provides a factor of four improvement in the selectivity of the filter step as compared to the index-based approaches.
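The first-order network Voronoi partition on which this approach builds can be computed, as a sketch, with a multi-source Dijkstra from all data objects at once: each node is claimed by the generator that reaches it first. Border nodes, where adjacent nodes belong to different generators, are where cross-region distances would be precomputed. The adjacency-list format and the names are assumptions, and the precomputation itself is omitted.

```python
import heapq


def network_voronoi(graph, generators):
    """Assign each reachable node to its closest generator by shortest-path distance.
    graph: node -> list of (neighbor, weight). Returns node -> (generator, distance)."""
    owner = {}
    heap = [(0.0, g, g) for g in generators]
    heapq.heapify(heap)
    while heap:
        d, node, gen = heapq.heappop(heap)
        if node in owner:
            continue  # already settled by a closer (or tie-winning) generator
        owner[node] = (gen, d)
        for nbr, w in graph.get(node, []):
            if nbr not in owner:
                heapq.heappush(heap, (d + w, nbr, gen))
    return owner


def border_nodes(graph, owner):
    """Nodes with at least one neighbor owned by a different generator."""
    return {u for u, edges in graph.items() for v, _ in edges
            if u in owner and v in owner and owner[u][0] != owner[v][0]}


graph = {
    "A": [("B", 1)], "B": [("A", 1), ("C", 1)], "C": [("B", 1), ("D", 1)],
    "D": [("C", 1), ("E", 1)], "E": [("D", 1)],
}
owner = network_voronoi(graph, ["A", "E"])
print(owner["B"], owner["D"])  # ('A', 1.0) ('E', 1.0)
```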
Conference Paper
With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
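The abstract leaves the cost functions of the two variants unspecified. Assuming one plausible cost, the maximum distance to the query plus the maximum pairwise distance within the group, a simple baseline picks the nearest object for each query keyword. The cost definition, the heuristic, and the names are assumptions, and no approximation bound is claimed for this code.

```python
import math


def coskq_cost(group, q):
    """Max distance from the group to q plus the group's max pairwise distance (assumed cost)."""
    to_query = max(math.dist(o["loc"], q) for o in group)
    pairwise = max(math.dist(a["loc"], b["loc"]) for a in group for b in group)
    return to_query + pairwise


def nearest_per_keyword(objects, q, keywords):
    """For each query keyword, take the nearest object containing it; None if uncoverable."""
    group = []
    for kw in keywords:
        holders = [o for o in objects if kw in o["keywords"]]
        if not holders:
            return None  # no object covers this keyword
        best = min(holders, key=lambda o: math.dist(o["loc"], q))
        if not any(best is g for g in group):
            group.append(best)
    return group


objs = [
    {"id": "o1", "loc": (1, 0), "keywords": {"cafe"}},
    {"id": "o2", "loc": (0, 2), "keywords": {"bank"}},
    {"id": "o3", "loc": (5, 5), "keywords": {"cafe", "bank"}},
]
group = nearest_per_keyword(objs, (0, 0), ["cafe", "bank"])
print([o["id"] for o in group], coskq_cost(group, (0, 0)))  # ['o1', 'o2'] 4.23606797749979
```

Here the single object o3 covers both keywords but is far from the query, so the baseline prefers the two nearby objects.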