
DESKS: Direction-aware spatial keyword search

April 2012


DOI:10.1109/ICDE.2012.93

Authors:

Guoliang Li

Tsinghua University

Jianhua Feng

Tsinghua University

Location-based services (LBS) have been widely accepted by mobile users. Many LBS users have a direction-aware search requirement: answers must lie in a given search direction. However, to the best of our knowledge, no existing research investigates direction-aware search. A straightforward method first finds candidates without considering the direction constraint, and then generates the answers by pruning those candidates that violate the direction constraint. However, this method is rather expensive, as it involves a lot of useless computation on many unnecessary directions. To address this problem, we propose a direction-aware spatial keyword search method that inherently supports direction-aware search. We devise novel direction-aware indexing structures to prune unnecessary directions. We develop effective pruning techniques and search algorithms to efficiently answer a direction-aware query. As users may dynamically change their search directions, we propose to incrementally answer a query. Experimental results on real datasets show that our method achieves high performance and outperforms existing methods significantly.


DESKS: Direction-Aware Spatial Keyword Search

Guoliang Li, Jianhua Feng, Jing Xu

Department of Computer Science, Tsinghua University, Beijing 100084, China

liguoliang@tsinghua.edu.cn; fengjh@tsinghua.edu.cn; xmandbq@gmail.com


I. INTRODUCTION

Location-based services (LBS) have been widely accepted by mobile users. Many online location-based services are available, such as AT&T (http://www.wireless.att.com/lbs) and go2 (http://www.go2.com/). Recently, many LBS users have a direction-aware search requirement: answers must lie in a given search direction. For example, a user on the highway wants to find the nearest gas stations or restaurants. She requires that the answers be in the right front of her driving direction if she is in a right-hand traffic country (e.g., the US and China). As another example, consider a user who is walking to a supermarket. She wants to find an ATM along her walking direction so as to avoid a long walk. In this case she also has a direction-aware search requirement. There are many other direction-aware search requirements in LBS, e.g., multiple-destination routing and virtual reality (to show a local 3D streetscape). More importantly, many modern mobile phones (e.g., iPhone 4 and HTC) have a GPS and a compass. We can easily get the user's location via the GPS and her direction via the compass. Thus we can utilize the user's location and search direction to improve the search experience in LBS.

However, to the best of our knowledge, no existing research investigates direction-aware search. A straightforward method to support direction-aware search first finds the candidates without considering the direction constraint (e.g., [6] and [5]) and then generates the answers by pruning those candidates that violate the direction constraint. However, this method is rather expensive, as it involves a lot of useless computation on many unnecessary directions.

To address this problem, we propose a direction-aware spatial keyword search method, called DESKS, which inherently supports direction-aware search. We first formulate the problem of direction-aware spatial keyword search as follows. Consider a set of Points of Interest (POIs), where each POI is associated with spatial information and a textual description. Given a direction-aware spatial keyword query with a location, a direction, and a set of keywords, the direction-aware search finds the $k$ nearest neighbors of the query that are in the search direction and contain all input keywords.

To support direction-aware spatial keyword queries, we devise novel direction-aware index structures to prune unnecessary directions. We first group the POIs based on their distances to the bottom-left point of the Minimum Bounding Rectangle (MBR) that contains all POIs. Then, for the POIs in each group, we sort them based on their directions to the bottom-left point. Given a query, we can deduce a direction range with a lower direction bound and an upper direction bound. We can prove that if the direction of a POI to the bottom-left point is not in the direction range of the query, the POI cannot be an answer, and we can prune it. Similarly, we can also prune a whole group of POIs based on the direction range. Motivated by this observation, we develop novel direction-aware index structures, effective pruning techniques, and efficient search algorithms to facilitate direction-aware spatial keyword search.

To summarize, we make the following contributions.

• We formulate the problem of direction-aware spatial keyword search and propose an efficient direction-aware search method to address this problem.

• We devise a novel direction-aware index structure that groups the POIs based on their distances and directions. The index structure can be used to effectively prune many unnecessary POIs.

• We develop effective pruning techniques and search algorithms to answer direction-aware spatial keyword queries. As mobile phone users may dynamically change search directions, we propose to incrementally answer a query based on the cached results of previously issued queries.

• We have implemented our method, and the experimental results show that our method achieves high performance and outperforms existing methods significantly.

The rest of this paper is organized as follows. We first formulate the problem of direction-aware spatial keyword search and devise a novel indexing structure in Section II. We develop effective pruning techniques in Section III. Section IV gives efficient algorithms to answer a direction-aware query. We discuss how to incrementally answer a query in Section V. Experimental results are provided in Section VI. We review related work in Section VII and conclude in Section VIII.

II. DIRECTION-AWARE SPATIAL KEYWORD SEARCH

A. Problem Formulation

Data: Consider a set of POIs, $\mathcal{P} = \{p_1, p_2, \cdots, p_{|\mathcal{P}|}\}$. Each POI $p_i$ has a location $(p_i.x, p_i.y)$, where $p_i.x$ is the x-coordinate and $p_i.y$ is the y-coordinate of the POI. $p_i$ is also associated with a set of keywords, denoted by $p_i.d$. Thus a POI is denoted by $p = \langle (p.x, p.y);\ p.d \rangle$.

Query: A query $q$ contains a location $(q.x, q.y)$ with an x-coordinate $q.x$ and a y-coordinate $q.y$. Query $q$ has a direction constraint $[\alpha, \beta]$, which denotes that the user is only interested in the POIs whose directions to $q$ are in $[\alpha, \beta]$. Query $q$ contains a set of user-input keywords $\mathcal{K} = \{k_1, k_2, \cdots, k_{|\mathcal{K}|}\}$. Users can specify an integer $k$ to find the top-$k$ relevant answers. Thus query $q$ is denoted by $q = \langle (q.x, q.y);\ [\alpha, \beta];\ \mathcal{K};\ k \rangle$.

Answer: Let $\mathcal{R}$ denote the Minimum Bounding Rectangle (MBR) that contains all POIs in $\mathcal{P}$. Given a query $q$ with direction $[\alpha, \beta]$, let $S_q$ denote the sector centered at $q$ with a radius $r$ and an angle from $\alpha$ to $\beta$, where $r$ is the maximal distance from $q$ to the boundary of region $\mathcal{R}$. Let $\mathcal{R}_q$ denote the intersection of $S_q$ and $\mathcal{R}$, which is the search region satisfying the direction constraint. A POI $p$ is an answer of query $q$ if $p$ is in $\mathcal{R}_q$ and $p.d$ contains all keywords in $\mathcal{K}$. Let $\mathcal{P}_q$ denote the set of all answers of $q$. We find the $k$ nearest neighbors of $q$ from $\mathcal{P}_q$. Next we formulate our problem.

Definition 1 (DIRECTION-AWARE SPATIAL KEYWORD SEARCH): Given a set of POIs $\mathcal{P}$ and a query $q = \langle (q.x, q.y);\ [\alpha, \beta];\ \mathcal{K};\ k \rangle$, let $\mathcal{P}_q$ denote the set of POIs in $\mathcal{R}_q$ that contain all keywords in $\mathcal{K}$. DESKS finds a subset $\mathcal{P}_q^k$ of $\mathcal{P}_q$ with $k$ POIs such that $\forall p \in \mathcal{P}_q^k$ and $\forall p' \in \mathcal{P}_q - \mathcal{P}_q^k$, $dist(p, q) \le dist(p', q)$, where $dist(\cdot)$ is a distance function; in this paper we use Euclidean distance∗.

Consider the example in Figure 1. There are 24 POIs. Given a query $q$ with keywords “chinese food”, the ten highlighted POIs $p_3, p_4, p_5, p_6, p_9, p_{12}, p_{15}, p_{21}, p_{22}, p_{23}$ contain the two keywords. If we have no direction constraint, $p_3$ and $p_4$ are the two nearest neighbors. With the direction constraint shown in Figure 1, $p_{12}$ and $p_{22}$ are the two nearest neighbors.

We can extend existing spatial keyword search methods (e.g., [6] and [5]) to support our problem. The method contains two steps. (1) The filter step: it ignores the direction constraint and finds the $k$ nearest neighbors of query $q$ that contain all keywords. (2) The verification step: for each POI found in the first step, it checks whether the POI is in the search direction. If yes, the POI is one of the $k$ nearest neighbors of $q$. As most of the $k$ nearest neighbors of $q$ may violate the direction constraint, the method needs to repeatedly execute the two steps until it finds $k$ answers. Although we can incorporate the verification step into the filter step, this method still needs to visit many unnecessary POIs. To address this problem, we propose a direction-aware spatial keyword search method to achieve high performance.
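As a point of comparison, the filter-and-verify baseline can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions: POIs are plain dictionaries with hypothetical `id`, `xy`, and `kw` fields, a linear scan stands in for the spatial keyword indexes of [6] and [5], and the verification step is folded into the nearest-first scan.

```python
import math

def in_direction(q, p, alpha, beta):
    """True if the direction from q to p lies in [alpha, beta] (radians)."""
    theta = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return alpha <= theta <= beta

def baseline_knn(pois, q, alpha, beta, keywords, k):
    """Filter-and-verify baseline (hypothetical helper, not the DESKS method)."""
    kw = set(keywords)
    # Filter step: keyword-matching POIs in nearest-first order.
    candidates = sorted(
        (p for p in pois if kw <= p["kw"]),
        key=lambda p: math.dist(q, p["xy"]),
    )
    answers = []
    for p in candidates:
        # Verification step: keep only POIs inside the search direction.
        if in_direction(q, p["xy"], alpha, beta):
            answers.append(p["id"])
            if len(answers) == k:
                break
    return answers
```

Every keyword-matching POI that is closer than the $k$-th answer is visited here, whatever its direction, which is the wasted work the direction-aware index is meant to avoid.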

B. Direction-aware Indexing Structures

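Given a set of POIs, we first generate the MBR $\mathcal{R}$ that contains all POIs. Let $O_{bl}, O_{br}, O_{tr}, O_{tl}$ respectively denote the bottom-left point, the bottom-right point, the top-right point, and the top-left point of $\mathcal{R}$, as shown in Figure 1.

∗ We suppose $q \in \mathcal{R}$; our method can be extended to support $q \notin \mathcal{R}$.

[Figure 1]
$\mathcal{R}_{11}=\{p_1, p_2\}$, $\mathcal{R}_{12}=\{p_3, p_4\}$, $\mathcal{R}_{13}=\{p_5, p_6\}$, $\mathcal{R}_{14}=\{p_7, p_8\}$
$\mathcal{R}_{21}=\{p_9, p_{10}\}$, $\mathcal{R}_{22}=\{p_{11}, p_{12}\}$, $\mathcal{R}_{23}=\{p_{13}, p_{14}\}$, $\mathcal{R}_{24}=\{p_{15}, p_{16}\}$
$\mathcal{R}_{31}=\{p_{17}, p_{18}\}$, $\mathcal{R}_{32}=\{p_{19}, p_{20}\}$, $\mathcal{R}_{33}=\{p_{21}, p_{22}\}$, $\mathcal{R}_{34}=\{p_{23}, p_{24}\}$
Fig. 1. A running example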

We sort the POIs based on their distances to the bottom-left point $O_{bl}$. Without loss of generality, assume the sorted POIs are $p_1, p_2, \cdots, p_{|\mathcal{P}|}$, where $dist(p_i, O_{bl}) \le dist(p_j, O_{bl})$ for $i < j$. Then we evenly partition them into $N$ disjoint buckets, $B_1, B_2, \cdots, B_N$. If every POI has a distinct distance to $O_{bl}$, we have $B_i = \{p_{(i-1)\times\lambda+1}, \cdots, p_{i\times\lambda}\}$ for $1 \le i \le N-1$ and $B_N = \{p_{(N-1)\times\lambda+1}, \cdots, p_{|\mathcal{P}|}\}$, where $\lambda = \lceil |\mathcal{P}| / N \rceil$. If multiple POIs have the same distance to $O_{bl}$, we partition the POIs into buckets as follows. We first put the first $\lambda$ POIs into the first bucket $B_1$. If $dist(p_{\lambda+1}, O_{bl}) = dist(p_\lambda, O_{bl})$, we add $p_{\lambda+1}$ into $B_1$ as well; otherwise, we add $\lambda$ POIs starting with $p_{\lambda+1}$ into $B_2$. Iteratively we can put each POI into a bucket. Let $r_{i-1}$ denote the smallest distance of POIs in $B_i$ for $1 \le i \le N$. We draw $N-1$ arcs centered at $O_{bl}$ with radii $r_1, r_2, \cdots, r_{N-1}$. The $N-1$ arcs partition $\mathcal{R}$ into $N$ regions (quarter concentric rings) $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_N$, where $\mathcal{R}_1$ is within $r_1$, $\mathcal{R}_N$ is outside $r_{N-1}$, and $\mathcal{R}_i$ is between $r_{i-1}$ and $r_i$ for $1 < i < N$. The POIs in $B_i$ fall in $\mathcal{R}_i$; in particular, a POI on the $i$-th arc belongs to region $\mathcal{R}_{i+1}$. Hence the distance of any POI in $\mathcal{R}_i$ to $O_{bl}$ is in $[r_{i-1}, r_i)$ for $1 \le i < N$ (with $r_0 = dist(p_1, O_{bl})$). For example, in Figure 1, we partition the POIs into three regions $\mathcal{R}_1 = \{p_1, \cdots, p_8\}$, $\mathcal{R}_2 = \{p_9, \cdots, p_{16}\}$, and $\mathcal{R}_3 = \{p_{17}, \cdots, p_{24}\}$.
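The distance-based bucketing, including the rule that POIs with equal distances stay in the same bucket, might look as follows in Python. This is a sketch under our own assumptions: points are `(x, y)` tuples, `origin` plays the role of $O_{bl}$, and the function name is hypothetical. Note that when ties extend buckets, fewer than `n_buckets` buckets may be produced.

```python
import math

def distance_buckets(points, n_buckets, origin=(0.0, 0.0)):
    """Split points into rings by distance to origin, keeping ties together."""
    pts = sorted(points, key=lambda p: math.dist(p, origin))
    lam = math.ceil(len(pts) / n_buckets)  # target bucket size (lambda)
    buckets, start = [], 0
    while start < len(pts):
        end = min(start + lam, len(pts))
        # Extend the bucket while the next point ties with its last point.
        while end < len(pts) and math.dist(pts[end], origin) == math.dist(pts[end - 1], origin):
            end += 1
        buckets.append(pts[start:end])
        start = end
    # radii[i] is the smallest distance in the (i+1)-th bucket, i.e. r_i in the text.
    radii = [math.dist(b[0], origin) for b in buckets]
    return buckets, radii
```

Because each bucket starts at a strictly larger distance than the previous one ends, every point in ring $i$ has a distance in $[r_{i-1}, r_i)$, which is the property the pruning lemmas rely on.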

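Each POI $p$ in region $\mathcal{R}_i$ has a direction to the bottom-left point $O_{bl}$, denoted by $p_\theta = \arctan\frac{p.y - O_{bl}.y}{p.x - O_{bl}.x}$. For ease of presentation, suppose $O_{bl} = (0, 0)$. Thus $p_\theta = \arctan\frac{p.y}{p.x}$. We sort the POIs in $\mathcal{R}_i$ based on their directions in ascending order. Similarly, we evenly partition the POIs in $\mathcal{R}_i$ into $M$ buckets $B_{i1}, B_{i2}, \cdots, B_{iM}$. Each bucket contains about $\frac{|\mathcal{P}|}{M \times N}$ POIs. Suppose the minimal direction of POIs in bucket $B_{ij}$ is $\theta_{i,j-1}$ for $1 \le j \le M$. We use $M-1$ lines from $O_{bl}$ with directions $\theta_{i1}, \theta_{i2}, \cdots, \theta_{i,M-1}$ to partition $\mathcal{R}_i$ into $M$ sub-regions (parts of concentric rings) $\mathcal{R}_{i1}, \mathcal{R}_{i2}, \cdots, \mathcal{R}_{iM}$. The direction of any POI in $\mathcal{R}_{ij}$ is in $[\theta_{i,j-1}, \theta_{ij})$. For example, in Figure 1, we partition each $\mathcal{R}_i$ into four sub-regions. For instance, we partition $\mathcal{R}_2$ into $\mathcal{R}_{21} = \{p_9, p_{10}\}$, $\mathcal{R}_{22} = \{p_{11}, p_{12}\}$, $\mathcal{R}_{23} = \{p_{13}, p_{14}\}$, and $\mathcal{R}_{24} = \{p_{15}, p_{16}\}$.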
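The direction-based split of one ring can be sketched in the same style. The names below are hypothetical, and we use `math.atan2` rather than $\arctan(p.y/p.x)$: for points inside the MBR with $O_{bl}$ at the origin the two agree, and `atan2` also handles $p.x = 0$. Ties in direction are not treated specially in this sketch.

```python
import math

def direction(p, origin=(0.0, 0.0)):
    """Direction of p as seen from origin, in radians."""
    return math.atan2(p[1] - origin[1], p[0] - origin[0])

def direction_buckets(ring_points, m_buckets, origin=(0.0, 0.0)):
    """Split the points of one ring into sub-regions by ascending direction."""
    pts = sorted(ring_points, key=lambda p: direction(p, origin))
    if not pts:
        return [], []
    size = math.ceil(len(pts) / m_buckets)
    subregions = [pts[i:i + size] for i in range(0, len(pts), size)]
    # lower_bounds[j] is the minimal direction in sub-region j (theta_{i,j-1} in the text).
    lower_bounds = [direction(s[0], origin) for s in subregions]
    return subregions, lower_bounds
```

The lower bound of each sub-region doubles as the upper bound of the previous one, so a sorted list of these bounds is all that a later binary search needs.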

[Figure 2. Panel labels: “if have large memory”, “if have small memory”.]
Fig. 2. Indexing structure

Our region structure is illustrated in Figure 2. It has two salient features. First, given two sub-regions $\mathcal{R}_{is}$ and $\mathcal{R}_{jt}$, for any POIs $p \in \mathcal{R}_{is}$ and $p' \in \mathcal{R}_{jt}$, if $i < j$, we have $dist(p, O_{bl}) < dist(p', O_{bl})$. Second, given two sub-regions $\mathcal{R}_{is}$ and $\mathcal{R}_{it}$, for any POIs $p \in \mathcal{R}_{is}$ and $p' \in \mathcal{R}_{it}$, if $s < t$, we have $p_\theta < p'_\theta$. We will use these two features to do efficient pruning. Notice that traditional MBRs have no such features; thus we propose the new index structure to facilitate direction-aware search.

Although we can use the region structure to do spatial pruning, we cannot use it to do textual pruning. To address this issue, we build an inverted list for the keywords in each sub-region $\mathcal{R}_{ij}$. We now give the space complexity of our index structure. For the region structure, the space complexity is $O(M \times N)$. As $M \times N$ is not large ($N = 1000$, $M = 600$ for 16 million POIs, see Section VI), we can keep the region structure in memory. For the inverted lists, suppose each POI contains $W$ distinct keywords on average. The total inverted-list size is $O(|\mathcal{P}| \times W)$.

If the inverted-list size is very large, we use a disk-based structure. For each keyword $k_x$, we maintain two inverted lists: (1) the region list $L^R_{k_x}$, which keeps the sorted IDs of the sub-regions that contain $k_x$, where the sub-regions are sorted such that $\mathcal{R}_{is} < \mathcal{R}_{jt}$ if $i < j$, and $\mathcal{R}_{is} < \mathcal{R}_{it}$ if $s < t$; (2) the POI list $L^P_{k_x}$, which keeps the sorted IDs of the POIs that contain $k_x$, where the POIs in different sub-regions are sorted by sub-region order and the POIs in the same sub-region are sorted by direction. In $L^R_{k_x}$, for each $\mathcal{R}_{ij} \in L^R_{k_x}$, we also maintain a pointer into the POI list $L^P_{k_x}$ that keeps the position of the smallest POI ID in $\mathcal{R}_{ij} \cap L^P_{k_x}$. Based on the sorted property, suppose the pointer of $\mathcal{R}_{ij}$ is $l_{ij}$ and the pointer of its next sub-region is $l_{i,j+1}$. We can efficiently find the POIs in $\mathcal{R}_{ij}$ that contain keyword $k_x$ from $L^P_{k_x}$, i.e., the POIs in $L^P_{k_x}[l_{ij}, l_{i,j+1})$. Suppose each sub-region $\mathcal{R}_{ij}$ contains $L$ distinct keywords on average. The space complexity of the disk-based inverted list is $O(|\mathcal{P}| \times W + L \times M \times N)$.

The overall index structure is shown in Figure 2. Note that to efficiently answer a query, besides building an index structure for $O_{bl}$, we also maintain index structures for $O_{br}$, $O_{tr}$, and $O_{tl}$. Thus the total index size is four times that for $O_{bl}$.
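The two-level inverted list (a region list with pointers into a POI list) can be illustrated with an in-memory Python sketch. The function names, the string region IDs, and the toy data are our own assumptions; an on-disk layout would of course differ.

```python
from collections import defaultdict

def build_inverted_lists(subregions):
    """subregions: [(region_id, [(poi_id, keywords), ...]), ...] in index order,
    with the POIs of each sub-region already sorted by direction."""
    region_list = defaultdict(list)  # keyword -> [(region_id, start position in POI list)]
    poi_list = defaultdict(list)     # keyword -> [poi_id, ...]
    for region_id, pois in subregions:
        for poi_id, keywords in pois:
            for kw in set(keywords):
                entries = region_list[kw]
                # First POI of this region for kw: record the pointer.
                if not entries or entries[-1][0] != region_id:
                    entries.append((region_id, len(poi_list[kw])))
                poi_list[kw].append(poi_id)
    return region_list, poi_list

def pois_in_region(region_list, poi_list, kw, region_id):
    """POIs of one sub-region that contain kw, sliced out via the pointers."""
    entries = region_list.get(kw, [])
    for idx, (rid, start) in enumerate(entries):
        if rid == region_id:
            # The end position is the pointer of the next sub-region in the list.
            end = entries[idx + 1][1] if idx + 1 < len(entries) else len(poi_list[kw])
            return poi_list[kw][start:end]
    return []
```

The lookup scans the region list linearly for brevity; since the region list is sorted, a binary search would serve the same purpose.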

For example, in Figure 1, there are 24 POIs. Suppose $N = 3$ and $M = 4$. We generate 12 sub-regions, $\mathcal{R}_{11}, \cdots, \mathcal{R}_{14}$, $\mathcal{R}_{21}, \cdots, \mathcal{R}_{24}$, $\mathcal{R}_{31}, \cdots, \mathcal{R}_{34}$. Each sub-region has two POIs. For example, $\mathcal{R}_{22}$ contains the two POIs $p_{11}$ and $p_{12}$. For keyword “chinese”, we maintain a region inverted list that has seven sub-regions and a POI inverted list that has eleven POIs, as shown in Figure 1. The pointer of $\mathcal{R}_{13}$ is $L^P_{chinese}[2] = p_5$, that is, $p_5$ is the smallest POI in $\mathcal{R}_{13}$ that contains “chinese”. Thus we can easily get the POIs in $\mathcal{R}_{13}$ that contain “chinese” using its pointer as the start position ($L^P_{chinese}[2]$) and the pointer of its next sub-region as the end position ($L^P_{chinese}[4]$), i.e., $L^P_{chinese}[2, 4) = \{p_5, p_6\}$.

In this paper we study how to use our index structures to answer a direction-aware spatial keyword query and leave data updates as future work.

[Figure 3]
Fig. 3. Notations

C. Notations

For ease of presentation, we introduce some notations as shown in Figure 3. Let $q_\theta = \arctan\frac{q.y}{q.x}$ denote the direction of $q$ to $O_{bl}$ and $q_d = dist(q, O_{bl})$ denote the distance of $q$ to $O_{bl}$. Given a region $\mathcal{R}_i$, let $r_{i-1}$ and $r_i$ respectively denote the radius of its inner arc and its outer arc. Given a sub-region $\mathcal{R}_{ij}$, we use a quadruple $\langle r_{i-1}, r_i, \theta_{i,j-1}, \theta_{ij} \rangle$ to denote the region, where $\theta_{i,j-1}$ is the minimal direction and $\theta_{ij}$ is the maximal direction of POIs in $\mathcal{R}_{ij}$ to $O_{bl}$. Let $p_{i-1,j}$, $p_{i-1,j-1}$, $p_{i,j}$, $p_{i,j-1}$ respectively denote the bottom-left point, bottom-right point, top-left point, and top-right point of $\mathcal{R}_{ij}$ (Figure 3).

Let $q_\alpha^{r_{i-1}}$ ($q_\beta^{r_{i-1}}$) denote the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the inner arc of $\mathcal{R}_i$ (with radius $r_{i-1}$).

[Figure 4]
Fig. 4. Pruning $\mathcal{R}_1, \cdots, \mathcal{R}_{i-1}$

[Figure 5. Panels: (a) $\alpha \le q_\theta \le \beta$; (b) $q_\theta < \alpha$; (c) $q_\theta > \beta$.]
Fig. 5. MINDIST($q$, $\mathcal{R}_i$)

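As $q_\alpha^{r_{i-1}} = (q_\alpha^{r_{i-1}}.x,\ q_\alpha^{r_{i-1}}.y)$ is on the arc with radius $r_{i-1}$, we have $(q_\alpha^{r_{i-1}}.x)^2 + (q_\alpha^{r_{i-1}}.y)^2 = r_{i-1}^2$. In addition, as the point is on the line with direction $\alpha$ to $q$, $(q_\alpha^{r_{i-1}}.y - q.y)/(q_\alpha^{r_{i-1}}.x - q.x) = \tan\alpha$†. Thus we can compute the x-coordinate and y-coordinate of $q_\alpha^{r_{i-1}}$ using the following equations:

$$
\begin{cases}
(q_\alpha^{r_{i-1}}.y - q.y)/(q_\alpha^{r_{i-1}}.x - q.x) = \tan\alpha \\
(q_\alpha^{r_{i-1}}.x)^2 + (q_\alpha^{r_{i-1}}.y)^2 = r_{i-1}^2
\end{cases}
\tag{1}
$$

Similarly, we can compute the point $q_\beta^{r_{i-1}}$.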

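Let $q_\alpha^{\theta_{i,j-1}}$ ($q_\alpha^{\theta_{ij}}$) denote the intersection of the line from $q$ with direction $\alpha$ and the line from $O_{bl}$ with direction $\theta_{i,j-1}$ ($\theta_{ij}$). Similarly we can define $q_\beta^{\theta_{i,j-1}}$ and $q_\beta^{\theta_{ij}}$. As $q_\alpha^{\theta_{i,j-1}} = (q_\alpha^{\theta_{i,j-1}}.x,\ q_\alpha^{\theta_{i,j-1}}.y)$ is on the line with direction $\theta_{i,j-1}$ to $O_{bl}$, $(q_\alpha^{\theta_{i,j-1}}.y)/(q_\alpha^{\theta_{i,j-1}}.x) = \tan\theta_{i,j-1}$. As the point is on the line with direction $\alpha$ to $q$, $(q_\alpha^{\theta_{i,j-1}}.y - q.y)/(q_\alpha^{\theta_{i,j-1}}.x - q.x) = \tan\alpha$. Thus we can compute the x-coordinate and y-coordinate of $q_\alpha^{\theta_{i,j-1}}$ using the following equations:

$$
\begin{cases}
(q_\alpha^{\theta_{i,j-1}}.y - q.y)/(q_\alpha^{\theta_{i,j-1}}.x - q.x) = \tan\alpha \\
(q_\alpha^{\theta_{i,j-1}}.y)/(q_\alpha^{\theta_{i,j-1}}.x) = \tan\theta_{i,j-1}
\end{cases}
\tag{2}
$$

Similarly, we can compute the points $q_\alpha^{\theta_{ij}}$, $q_\beta^{\theta_{i,j-1}}$, and $q_\beta^{\theta_{ij}}$.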
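Equations 1 and 2 both have closed-form solutions. The Python sketch below solves them in parametric form (a point on the ray is $q + t(\cos\alpha, \sin\alpha)$ with $t \ge 0$) instead of through $\tan\alpha$, which is our own choice to avoid the division by zero at $\alpha = \pi/2$; the function names are hypothetical and $O_{bl}$ is assumed to be the origin.

```python
import math

def ray_arc_intersection(q, alpha, r):
    """Equation 1: where the ray from q with direction alpha meets the arc of
    radius r around the origin. Assumes q lies inside the arc (|q| < r)."""
    dx, dy = math.cos(alpha), math.sin(alpha)
    b = q[0] * dx + q[1] * dy
    c = q[0] ** 2 + q[1] ** 2 - r ** 2  # negative when q is inside the arc
    t = -b + math.sqrt(b * b - c)       # positive root of |q + t*d|^2 = r^2
    return (q[0] + t * dx, q[1] + t * dy)

def ray_line_intersection(q, alpha, theta):
    """Equation 2: where the ray from q with direction alpha meets the ray from
    the origin with direction theta. Returns None if there is no such point."""
    dx, dy = math.cos(alpha), math.sin(alpha)
    ux, uy = math.cos(theta), math.sin(theta)
    denom = dx * uy - dy * ux           # cross(d, u); zero when parallel
    if abs(denom) < 1e-12:
        return None
    t = (q[1] * ux - q[0] * uy) / denom # solves q + t*d = s*u for t
    x, y = q[0] + t * dx, q[1] + t * dy
    s = x * ux + y * uy
    if t < 0 or s < 0:
        return None                     # intersection lies behind q or behind the origin
    return (x, y)
```

Since the direction vector has unit length, the parameter `t` is also the distance from $q$ to the intersection point, which is convenient when these points are used as nearest neighbors later on.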

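Suppose the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the boundary of $\mathcal{R}$ is $q_\alpha^{\mathcal{R}}$ ($q_\beta^{\mathcal{R}}$), as shown in Figure 3. Next we discuss how to compute $q_\alpha^{\mathcal{R}}$. Let $q'_\theta$ denote the direction from $q$ to the top-right point $O_{tr}$. If $\alpha > q'_\theta$, $q_\alpha^{\mathcal{R}}$ falls on the top line from $O_{tl}$ to $O_{tr}$. In this case the y-coordinate is $q_\alpha^{\mathcal{R}}.y = H$ and the x-coordinate is $q_\alpha^{\mathcal{R}}.x = q.x + (H - q.y)/\tan\alpha$, where $H$ is the height of the MBR $\mathcal{R}$. If $\alpha = q'_\theta$, $q_\alpha^{\mathcal{R}}$ is exactly $O_{tr}$. If $\alpha < q'_\theta$, $q_\alpha^{\mathcal{R}}$ falls on the right line from $O_{br}$ to $O_{tr}$. In this case the x-coordinate is $q_\alpha^{\mathcal{R}}.x = L$ and the y-coordinate is $q_\alpha^{\mathcal{R}}.y = q.y + (L - q.x) \times \tan\alpha$, where $L$ is the length of the MBR $\mathcal{R}$. Thus we can compute the point $q_\alpha^{\mathcal{R}}$ as follows.

$$
(q_\alpha^{\mathcal{R}}.x,\ q_\alpha^{\mathcal{R}}.y) =
\begin{cases}
\left(q.x + \frac{H - q.y}{\tan\alpha},\ H\right) & \alpha > q'_\theta \\
(L,\ H) & \alpha = q'_\theta \\
\left(L,\ q.y + (L - q.x) \times \tan\alpha\right) & \alpha < q'_\theta
\end{cases}
\tag{3}
$$

Similarly we can compute $q_\beta^{\mathcal{R}}$. We will use the above-mentioned points to do pruning in the following sections.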
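Equation 3 translates almost directly into code. The following Python sketch assumes, as the text does, that $O_{bl} = (0, 0)$, that $q$ lies inside the MBR, and that $0 \le \alpha \le \pi/2$; the function and parameter names are hypothetical.

```python
import math

def boundary_intersection(q, alpha, length, height):
    """Equation 3: where the ray from q with direction alpha leaves the MBR
    [0, length] x [0, height]."""
    # q'_theta: direction from q to the top-right corner O_tr.
    to_corner = math.atan2(height - q[1], length - q[0])
    if alpha > to_corner:
        # The ray hits the top edge.
        return (q[0] + (height - q[1]) / math.tan(alpha), height)
    if alpha == to_corner:
        return (length, height)
    # The ray hits the right edge.
    return (length, q[1] + (length - q[0]) * math.tan(alpha))
```

For example, with $q = (1, 1)$ in a $4 \times 3$ MBR, a ray with $\alpha = 0$ leaves through the right edge at $(4, 1)$.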

† In this section, we suppose $0 \le \alpha \le \beta \le \frac{\pi}{2}$; our technique can be easily extended to support other directions (Section IV).

III. PRUNING UNNECESSARY REGIONS

In this section, we propose effective pruning techniques to prune unnecessary regions $\mathcal{R}_i$ (Section III-A) and $\mathcal{R}_{ij}$ (Section III-B). We first consider directions with $0 \le \alpha \le \beta \le \frac{\pi}{2}$ and discuss how to support any direction in Section IV.

A. Pruning Region Ri

Consider regions $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_N$ whose outer arcs have radii $r_1, r_2, \cdots, r_N$, respectively. Given a query $q$, we first locate the region in which $q$ appears. To this end, we first compute its distance to $O_{bl}$, $q_d$. Then we use a binary search on $r_1, r_2, \cdots, r_N$ to find the first radius that is larger than $q_d$. Suppose we find $r_i$ such that $r_{i-1} \le q_d < r_i$, as shown in Figure 4. We can prove that any POI in regions $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_{i-1}$ will not be an answer of query $q$, as these POIs are not in the search direction, as formalized in Lemma 1.

Lemma 1: Given a query point $q$ with $0 \le \alpha \le \beta \le \frac{\pi}{2}$, suppose $r_{i-1} \le q_d < r_i$. Any POI in $\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_{i-1}$ cannot be an answer of $q$‡.

Lemma 1 holds for any query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$. For example, in Figure 1, we can directly prune region $\mathcal{R}_1$, and the POIs in $\mathcal{R}_1$ do not need to be accessed. Note that the lemma may not hold if $\beta > \frac{\pi}{2}$. Consider a counter-example where a query $q$ is on the bottom line from $O_{bl}$ to $O_{br}$. If $\beta$ is larger than $\frac{\pi}{2}$, the search direction may overlap with $\mathcal{R}_{i-1}$. Similarly, $\alpha$ should be no smaller than $0$; the counter-example is a query on the left line from $O_{bl}$ to $O_{tl}$.

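MINDIST function for $\mathcal{R}_i$: To facilitate nearest neighbor search, traditional methods use the function MINDIST to estimate the distance between a query and an MBR [10]. Formally, given a query $q$ and an MBR $mbr$, function MINDIST($q$, $mbr$) returns the minimal distance of $q$ to $mbr$. As $\mathcal{R}_i$ in our method is not an MBR and our query has a direction constraint, we extend the function to support our problem as follows.

If $q$ is outside the outer arc of $\mathcal{R}_i$ ($q_d \ge r_i$), we have MINDIST($q$, $\mathcal{R}_i$) $= \infty$ based on Lemma 1. If $q$ is in $\mathcal{R}_i$ ($r_{i-1} \le q_d < r_i$), we have MINDIST($q$, $\mathcal{R}_i$) $= 0$. If $q$ is inside the inner arc of $\mathcal{R}_i$ ($q_d < r_{i-1}$), we give the function as follows. Consider the direction of $q$ to $O_{bl}$, $q_\theta$. If $\alpha \le q_\theta \le \beta$, the nearest neighbor of $q$ in $\mathcal{R}_i$ is the intersection of the line with direction $q_\theta$ and the inner arc of $\mathcal{R}_i$ with radius $r_{i-1}$ (Figure 5(a)). Thus MINDIST($q$, $\mathcal{R}_i$) $= r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest neighbor of $q$ in $\mathcal{R}_i$ is $q_\alpha^{r_{i-1}}$, which is the intersection of the line from $q$ with direction $\alpha$ and the inner arc of $\mathcal{R}_i$ (Figure 5(b)). Thus MINDIST($q$, $\mathcal{R}_i$) $= dist(q, q_\alpha^{r_{i-1}})$. Similarly, if $q_\theta > \beta$, MINDIST($q$, $\mathcal{R}_i$) $= dist(q, q_\beta^{r_{i-1}})$ (Figure 5(c)).

‡ In this paper, we omit the proofs of lemmas due to space constraints.

[Figure 6. Panels: (a) $\tau_l^{\mathcal{R}} = \theta_{q_\alpha^{\mathcal{R}}}$ and $\tau_u^{\mathcal{R}} = \theta_{q_\beta^{\mathcal{R}}}$; (b) $\tau_l^{\mathcal{R}} = q_\theta$ and $\tau_u^{\mathcal{R}} = \theta_{q_\beta^{\mathcal{R}}}$; (c) $\tau_l^{\mathcal{R}} = \theta_{q_\alpha^{\mathcal{R}}}$ and $\tau_u^{\mathcal{R}} = q_\theta$.]
Fig. 6. Direction-based pruning for regions $\mathcal{R}_i, \cdots, \mathcal{R}_N$

[Figure 7]
Fig. 7. Direction-based pruning for $\mathcal{R}_i$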

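Thus we give the MINDIST function as follows.

$$
\text{MINDIST}(q, \mathcal{R}_i) =
\begin{cases}
\infty & q_d \ge r_i \\
0 & r_{i-1} \le q_d < r_i \\
r_{i-1} - q_d & q_d < r_{i-1}\ \&\ \alpha \le q_\theta \le \beta \\
dist(q, q_\alpha^{r_{i-1}}) & q_d < r_{i-1}\ \&\ q_\theta < \alpha \\
dist(q, q_\beta^{r_{i-1}}) & q_d < r_{i-1}\ \&\ q_\theta > \beta
\end{cases}
\tag{4}
$$

where $q_\alpha^{r_{i-1}}$ and $q_\beta^{r_{i-1}}$ can be computed using Equation 1.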
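Equation 4 can be sketched in Python as below. The function name and argument order are our own; $O_{bl}$ is assumed to be the origin and $0 \le \alpha \le \beta \le \pi/2$. The last two cases reuse the closed-form solution of Equation 1: with a unit direction vector, the positive root `t` is directly the distance from $q$ to the inner arc along the bounding direction.

```python
import math

def min_dist_ring(q, alpha, beta, r_inner, r_outer):
    """MINDIST(q, R_i) for the ring with inner radius r_inner and outer radius r_outer."""
    q_d = math.hypot(q[0], q[1])
    q_theta = math.atan2(q[1], q[0])
    if q_d >= r_outer:
        return math.inf              # q is beyond the ring (Lemma 1)
    if q_d >= r_inner:
        return 0.0                   # q is inside the ring
    if alpha <= q_theta <= beta:
        return r_inner - q_d         # walk straight out along q's own direction
    # Otherwise the closest admissible point is on the nearer bounding ray.
    bound = alpha if q_theta < alpha else beta
    dx, dy = math.cos(bound), math.sin(bound)
    b = q[0] * dx + q[1] * dy
    return -b + math.sqrt(b * b - (q_d ** 2 - r_inner ** 2))
```

As a quick check, a query at $(1, 0)$ with $\alpha = 0$, $\beta = \pi/2$ and an inner radius of 2 gives a lower bound of 1.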

Given a query $q$, we first find the region $\mathcal{R}_i$ in which it is located and access the POIs in $\mathcal{R}_i$. Then we verify whether the POIs satisfy the direction constraint and contain all keywords. Suppose the $k$-th smallest distance of the candidates that have been computed is $d_k$. Then for the next region $\mathcal{R}_{i+1}$, if MINDIST($q$, $\mathcal{R}_{i+1}$) $\ge d_k$, we terminate and prune $\mathcal{R}_{i+1}, \cdots, \mathcal{R}_N$; otherwise we access the POIs in $\mathcal{R}_{i+1}$. Iteratively we can find all answers. As we use the best-first search method, we only utilize the MINDIST function and do not use the MINMAXDIST function [10]. For example, in Figure 1, suppose $k = 1$. In $\mathcal{R}_2$, we find an answer $p_{12}$. As MINDIST($q$, $\mathcal{R}_3$) $> dist(q, p_{12})$, we terminate and prune the POIs in $\mathcal{R}_3$.
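The ring-by-ring search with early termination can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the rings are plain lists ordered outwards from the one containing $q$, and the lower bound (for instance Equation 4) and the direction-and-keyword check are passed in as hypothetical callbacks `ring_min_dist` and `is_answer`.

```python
import heapq
import math

def ring_search(q, rings, ring_min_dist, is_answer, k):
    """Top-k search over rings ordered outwards from the ring containing q."""
    if k <= 0:
        return []
    best = []  # max-heap on distance (stored negated), at most k entries
    for idx, ring in enumerate(rings):
        d_k = -best[0][0] if len(best) == k else math.inf
        if ring_min_dist(idx) >= d_k:
            break  # neither this ring nor any later ring can improve the answers
        for p in ring:
            if not is_answer(p):
                continue
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    # Return the answers nearest-first.
    return [p for _, p in sorted(best, key=lambda e: -e[0])]
```

The early `break` relies on the lower bounds being non-decreasing from one ring to the next, which holds for rings ordered by distance from $O_{bl}$ as in the text.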

However, this method neglects the fact that some sub-regions $\mathcal{R}_{ij}$ in $\mathcal{R}_i$ may not satisfy the direction constraint. For example, in Figure 1, although $\mathcal{R}_{21}$ has a POI $p_9$ that contains all keywords, we can prune the region as it is not in the search direction. Similarly we can prune $\mathcal{R}_{24}$. To this end, we discuss how to effectively prune $\mathcal{R}_{ij}$ in $\mathcal{R}_i$.

B. Pruning Regions Rij

In this section, we first introduce how to prune unnecessary sub-regions $\mathcal{R}_{ij}$ that have no overlap with the search direction, and then give the function MINDIST($q$, $\mathcal{R}_{ij}$). In the rest of this paper, if the context is clear, the terms “region” and “sub-region” are used interchangeably for $\mathcal{R}_{ij}$.

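Our indexing structure has a salient feature: if a POI $p$ is an answer of $q$, its direction to $O_{bl}$ ($p_\theta = \arctan\frac{p.y}{p.x}$) must be in a range $[\tau_l^{\mathcal{R}}, \tau_u^{\mathcal{R}}]$. In other words, we can prune the POIs with direction smaller than $\tau_l^{\mathcal{R}}$ or larger than $\tau_u^{\mathcal{R}}$. Next we discuss how to deduce the lower bound $\tau_l^{\mathcal{R}}$ and the upper bound $\tau_u^{\mathcal{R}}$.

Given a query $q$ with direction $[\alpha, \beta]$, consider the intersection $q_\alpha^{\mathcal{R}}$ ($q_\beta^{\mathcal{R}}$) of the line from $q$ with direction $\alpha$ ($\beta$) and the boundary of region $\mathcal{R}$, as shown in Figure 6. Let $\theta_{q_\alpha^{\mathcal{R}}}$ and $\theta_{q_\beta^{\mathcal{R}}}$ respectively denote the directions of points $q_\alpha^{\mathcal{R}}$ and $q_\beta^{\mathcal{R}}$ to $O_{bl}$. As $\alpha \le \beta$, $\theta_{q_\alpha^{\mathcal{R}}} \le \theta_{q_\beta^{\mathcal{R}}}$. Let $\tau_l^{\mathcal{R}} = \min(\theta_{q_\alpha^{\mathcal{R}}}, q_\theta)$ and $\tau_u^{\mathcal{R}} = \max(\theta_{q_\beta^{\mathcal{R}}}, q_\theta)$. For any point $p$, if $p_\theta > \tau_u^{\mathcal{R}}$, its direction to $q$ must be larger than $\beta$, thus $p$ cannot be an answer of $q$ (Figure 6(b)). Similarly, if $p_\theta < \tau_l^{\mathcal{R}}$, its direction to $q$ must be smaller than $\alpha$, thus $p$ cannot be an answer of $q$ (Figure 6(c)). The correctness is formalized in Lemma 2.

Lemma 2: Given a query $q$ with direction $[\alpha, \beta]$, let $\tau_l^{\mathcal{R}} = \min(\theta_{q_\alpha^{\mathcal{R}}}, q_\theta)$ and $\tau_u^{\mathcal{R}} = \max(\theta_{q_\beta^{\mathcal{R}}}, q_\theta)$. For any POI $p$, if $p_\theta > \tau_u^{\mathcal{R}}$ or $p_\theta < \tau_l^{\mathcal{R}}$, $p$ cannot be an answer of $q$.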

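Based on Lemma 2, we only need to access the POIs with directions between $\tau_l^{\mathcal{R}}$ and $\tau_u^{\mathcal{R}}$. Moreover, a region $\mathcal{R}_{ij}$ has a lower direction bound $\theta_{i,j-1}$ and an upper direction bound $\theta_{ij}$, which respectively denote the minimal direction and the maximal direction of POIs in $\mathcal{R}_{ij}$. In other words, for any POI $p \in \mathcal{R}_{ij}$ we have $\theta_{i,j-1} \le p_\theta < \theta_{ij}$. Based on Lemma 2, for a region $\mathcal{R}_{ij}$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau_l^{\mathcal{R}}$ or $\theta_{i,j-1} > \tau_u^{\mathcal{R}}$, we can prune the region $\mathcal{R}_{ij}$, as formalized in Lemma 3.

Lemma 3: Given a query $q$ with direction $[\alpha, \beta]$, let $\tau_l^{\mathcal{R}} = \min(\theta_{q_\alpha^{\mathcal{R}}}, q_\theta)$ and $\tau_u^{\mathcal{R}} = \max(\theta_{q_\beta^{\mathcal{R}}}, q_\theta)$. For any region $\mathcal{R}_{ij}$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau_l^{\mathcal{R}}$ or $\theta_{i,j-1} > \tau_u^{\mathcal{R}}$, any POI in $\mathcal{R}_{ij}$ cannot be an answer of $q$.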
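Lemmas 2 and 3 amount to two small computations, sketched below in Python. The function names are hypothetical; the boundary points $q_\alpha^{\mathcal{R}}$ and $q_\beta^{\mathcal{R}}$ are assumed to have been computed already (for example via Equation 3), and $O_{bl}$ is the origin.

```python
import math

def direction_range(q, q_alpha_R, q_beta_R):
    """Lemma 2: the range [tau_l, tau_u] of directions (seen from the origin)
    that an answer of q can have."""
    q_theta = math.atan2(q[1], q[0])
    theta_a = math.atan2(q_alpha_R[1], q_alpha_R[0])
    theta_b = math.atan2(q_beta_R[1], q_beta_R[0])
    return min(theta_a, q_theta), max(theta_b, q_theta)

def can_prune(theta_low, theta_high, tau_l, tau_u):
    """Lemma 3: a sub-region with directions in [theta_low, theta_high)
    holds no answer if it lies entirely outside [tau_l, tau_u]."""
    return theta_high <= tau_l or theta_low > tau_u
```

For instance, for $q = (1, 1)$ in a $4 \times 3$ MBR with $\alpha = 0$ and $\beta = \pi/2$, the boundary points are $(4, 1)$ and $(1, 3)$, so any sub-region whose directions all lie below $\arctan(1/4)$ or above $\arctan 3$ can be skipped.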

For example, in Figure 1, although $\mathcal{R}_{21}$ and $\mathcal{R}_{24}$ have POIs that contain all keywords, we can prune them as they are not in the search direction, based on the direction-based pruning technique in Lemma 3. Notice that this pruning technique is valid for all regions. Next we devise tighter direction bounds for region $\mathcal{R}_i$. Let $\tau_l^{\mathcal{R}_i}$ denote the tighter lower bound and $\tau_u^{\mathcal{R}_i}$ denote the tighter upper bound for $\mathcal{R}_i$. For any POI $p$ in $\mathcal{R}_i$, if $p_\theta < \tau_l^{\mathcal{R}_i}$ or $p_\theta > \tau_u^{\mathcal{R}_i}$, we can prune the POI. Next we discuss how to deduce the two tighter bounds.

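Consider the intersection of the line from $q$ with direction $\alpha$ ($\beta$) and the outer arc of $\mathcal{R}_i$, denoted by $q_\alpha^{r_i}$ ($q_\beta^{r_i}$). The two points can be computed by Equation 1 (with radius $r_i$ in place of $r_{i-1}$). Let $\theta_{q_\alpha^{r_i}}$ and $\theta_{q_\beta^{r_i}}$ respectively denote the directions of points $q_\alpha^{r_i}$ and $q_\beta^{r_i}$ to $O_{bl}$. It is easy to see that if $q_\alpha^{r_i}$ is in region $\mathcal{R}$ (denoted by $q_\alpha^{r_i} \in \mathcal{R}$), $\theta_{q_\alpha^{r_i}} \ge \theta_{q_\alpha^{\mathcal{R}}}$; otherwise $\theta_{q_\alpha^{r_i}} < \theta_{q_\alpha^{\mathcal{R}}}$ (Figure 7). Similarly, if $q_\beta^{r_i} \in \mathcal{R}$, $\theta_{q_\beta^{r_i}} \le \theta_{q_\beta^{\mathcal{R}}}$; otherwise $\theta_{q_\beta^{r_i}} > \theta_{q_\beta^{\mathcal{R}}}$. Based on this observation, we give the tighter bounds $\tau_l^{\mathcal{R}_i}$ and $\tau_u^{\mathcal{R}_i}$:

$$
\tau_l^{\mathcal{R}_i} =
\begin{cases}
q_\theta & q_\theta \le \alpha \\
\theta_{q_\alpha^{r_i}} & q_\theta > \alpha\ \&\ q_\alpha^{r_i} \in \mathcal{R} \\
\theta_{q_\alpha^{\mathcal{R}}} & q_\theta > \alpha\ \&\ q_\alpha^{r_i} \notin \mathcal{R}
\end{cases}
\tag{5}
$$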

[Figure 8. Panels: (a) $q \in \mathcal{R}_i^{<}[0, \theta_{i,j-1})$; (b) $q \in \mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$; (c) $q \in \mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$; (d) $q \in \mathcal{R}_i[0, \theta_{i,j-1})$; (e) $q \in \mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$, where MINDIST($q$, $\mathcal{R}_{ij}$) $= 0$; (f) $q \in \mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$.]
Fig. 8. MINDIST($q$, $\mathcal{R}_{ij}$)

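$$
\tau_u^{\mathcal{R}_i} =
\begin{cases}
q_\theta & q_\theta \ge \beta \\
\theta_{q_\beta^{r_i}} & q_\theta < \beta\ \&\ q_\beta^{r_i} \in \mathcal{R} \\
\theta_{q_\beta^{\mathcal{R}}} & q_\theta < \beta\ \&\ q_\beta^{r_i} \notin \mathcal{R}
\end{cases}
\tag{6}
$$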

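Then consider region $\mathcal{R}_{ij}$ with the minimal direction $\theta_{i,j-1}$ and the maximal direction $\theta_{ij}$. If $\theta_{ij} \le \tau_l^{\mathcal{R}_i}$ or $\theta_{i,j-1} > \tau_u^{\mathcal{R}_i}$, region $\mathcal{R}_{ij}$ has no overlap with the search direction, thus we can prune $\mathcal{R}_{ij}$. In other words, for $\mathcal{R}_i$, we only need to access the regions $\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}$ such that $\theta_{i,l-1} \le \tau_l^{\mathcal{R}_i} < \theta_{il}$ and $\theta_{i,u-1} \le \tau_u^{\mathcal{R}_i} < \theta_{iu}$. To efficiently identify such regions, we use $\tau_l^{\mathcal{R}_i}$ to do a binary search on the directions of regions in $\mathcal{R}_i$, $\{\theta_{i1}, \cdots, \theta_{iM}\}$, and find the smallest one that is larger than $\tau_l^{\mathcal{R}_i}$, which gives $\mathcal{R}_{il}$. Then we use $\tau_u^{\mathcal{R}_i}$ to do a binary search on the remaining directions and find $\mathcal{R}_{iu}$, i.e., the sub-region with $\theta_{i,u-1} \le \tau_u^{\mathcal{R}_i} < \theta_{iu}$. Thus we only need to access $\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}$. Lemma 4 formalizes the pruning technique.

Lemma 4: Given a query $q$ with direction $[\alpha, \beta]$ and a region $\mathcal{R}_i$, let $\tau_l^{\mathcal{R}_i} = \min(\theta_{q_\alpha^{r_i}}, q_\theta)$ and $\tau_u^{\mathcal{R}_i} = \max(\theta_{q_\beta^{r_i}}, q_\theta)$. For any POI $p \in \mathcal{R}_i$, if $p_\theta > \tau_u^{\mathcal{R}_i}$ or $p_\theta < \tau_l^{\mathcal{R}_i}$, $p$ cannot be an answer of $q$. For any region $\mathcal{R}_{ij} \in \mathcal{R}_i$ with direction $[\theta_{i,j-1}, \theta_{ij})$, if $\theta_{ij} \le \tau_l^{\mathcal{R}_i}$ or $\theta_{i,j-1} > \tau_u^{\mathcal{R}_i}$, any POI in $\mathcal{R}_{ij}$ cannot be an answer of $q$.

Consider the example in Figure 1. We can prune regions $\mathcal{R}_{21}$ and $\mathcal{R}_{24}$ in $\mathcal{R}_2$, and regions $\mathcal{R}_{31}$ and $\mathcal{R}_{34}$ in $\mathcal{R}_3$.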
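The two binary searches can be written with Python's `bisect` module. In this sketch (names and 0-based indexing are our own), sub-region `j` covers the directions `[lower_bounds[j], upper_bounds[j])`, and the function returns exactly the sub-regions that survive the pruning condition of Lemma 4.

```python
from bisect import bisect_right

def candidate_subregions(lower_bounds, upper_bounds, tau_l, tau_u):
    """Indexes of the sub-regions of one ring that overlap [tau_l, tau_u].
    Both bound lists must be sorted in ascending order."""
    # First sub-region whose upper bound is strictly larger than tau_l.
    first = bisect_right(upper_bounds, tau_l)
    # Last sub-region whose lower bound is no larger than tau_u.
    last = bisect_right(lower_bounds, tau_u) - 1
    return list(range(first, last + 1))
```

With four sub-regions covering $[0, 0.4)$, $[0.4, 0.8)$, $[0.8, 1.2)$, and $[1.2, 1.6)$, a range of $[0.5, 1.0]$ keeps only the second and third, mirroring how the outermost sub-regions of each ring are pruned in the running example.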

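MINDIST for $\mathcal{R}_{ij}$: For each region $\mathcal{R}_{ij}$ in $\{\mathcal{R}_{il}, \cdots, \mathcal{R}_{iu}\}$, we use the MINDIST function to estimate the distance between $q$ and $\mathcal{R}_{ij}$, i.e., MINDIST($q$, $\mathcal{R}_{ij}$). To this end, we partition $\mathcal{R}$ into three regions by the inner arc ($r_{i-1}$) and the outer arc ($r_i$): the region inside the inner arc, $\mathcal{R}_i^{<}$; the region $\mathcal{R}_i$; and the region outside the outer arc, $\mathcal{R}_i^{>}$. If $q \in \mathcal{R}_i^{>}$, no POI in $\mathcal{R}_{ij}$ can be an answer of $q$ based on Lemma 1, thus MINDIST($q$, $\mathcal{R}_{ij}$) $= \infty$. For $\mathcal{R}_i^{<}$ and $\mathcal{R}_i$, we respectively partition them into three regions based on the two directions $\theta_{i,j-1}$ and $\theta_{ij}$, denoted by $\mathcal{R}_i^{<}[0, \theta_{i,j-1})$, $\mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$, $\mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$, and $\mathcal{R}_i[0, \theta_{i,j-1})$, $\mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$, $\mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8).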

(1) $q \in \mathcal{R}_i^{<}[0, \theta_{i,j-1})$ (Figure 8(a)). If we have no direction constraint, the nearest neighbor of $q$ is the bottom-right point $p_{i-1,j-1}$. Next, we consider the case with direction $[\alpha, \beta]$. Let $\theta(q, p_{i-1,j-1})$ denote the direction from $q$ to $p_{i-1,j-1}$. If $\alpha \le \theta(q, p_{i-1,j-1}) \le \beta$, the nearest neighbor of $q$ is still $p_{i-1,j-1}$. If $\theta(q, p_{i-1,j-1}) < \alpha$, the nearest neighbor of $q$ is $q_\alpha^{r_{i-1}}$, which is the intersection of the line from $q$ with direction $\alpha$ and the arc with radius $r_{i-1}$ (computed by Equation 1). Similarly, if $\theta(q, p_{i-1,j-1}) > \beta$, the nearest neighbor of $q$ is $q_\beta^{\theta_{i,j-1}}$, which is the intersection of the line from $q$ with direction $\beta$ and the line from $O_{bl}$ with direction $\theta_{i,j-1}$ (computed by Equation 2).

(2) $q \in \mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(b)). If $\alpha \le q_\theta \le \beta$, the nearest neighbor of $q$ is $q_\theta^{r_{i-1}}$, which is the intersection of the line from $q$ with direction $q_\theta$ and the arc with radius $r_{i-1}$. The distance is $r_{i-1} - q_d$. If $q_\theta < \alpha$, the nearest neighbor of $q$ is $q_\alpha^{r_{i-1}}$. If $q_\theta > \beta$, the nearest neighbor of $q$ is $q_\beta^{r_{i-1}}$.

(3) $q \in \mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(c)). Similar to case (1), consider the bottom-left point $p_{i-1,j}$. Let $\theta(q, p_{i-1,j})$ denote the direction from $q$ to $p_{i-1,j}$. If $\alpha \le \theta(q, p_{i-1,j}) \le \beta$, the nearest neighbor of $q$ is $p_{i-1,j}$. If $\theta(q, p_{i-1,j}) < \alpha$, the nearest neighbor of $q$ is $q_\alpha^{\theta_{ij}}$. If $\theta(q, p_{i-1,j}) > \beta$, the nearest neighbor of $q$ is $q_\beta^{r_{i-1}}$.

(4) $q \in \mathcal{R}_i[0, \theta_{i,j-1})$ (Figure 8(d)). As $\beta \le \frac{\pi}{2}$, the nearest neighbor of $q$ must be $q_\beta^{\theta_{i,j-1}}$ (computed by Equation 2).

(5) $q \in \mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$ (Figure 8(e)). As $q$ is in $\mathcal{R}_{ij}$, MINDIST($q$, $\mathcal{R}_{ij}$) $= 0$.

(6) $q \in \mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$ (Figure 8(f)). As $\alpha \ge 0$, the nearest neighbor of $q$ must be $q_\alpha^{\theta_{ij}}$ (computed by Equation 2).

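To summarize, we give the function MINDIST($q$, $\mathcal{R}_{ij}$) in Table I.

TABLE I
MINDIST($q$, $\mathcal{R}_{ij}$)

Region containing $q$ | Condition | MINDIST($q$, $\mathcal{R}_{ij}$)
$\mathcal{R}_i^{>}$ | (any) | $\infty$
$\mathcal{R}_i^{<}[0, \theta_{i,j-1})$ | $\theta(q, p_{i-1,j-1}) < \alpha$ | $dist(q, q_\alpha^{r_{i-1}})$
$\mathcal{R}_i^{<}[0, \theta_{i,j-1})$ | $\alpha \le \theta(q, p_{i-1,j-1}) \le \beta$ | $dist(q, p_{i-1,j-1})$
$\mathcal{R}_i^{<}[0, \theta_{i,j-1})$ | $\theta(q, p_{i-1,j-1}) > \beta$ | $dist(q, q_\beta^{\theta_{i,j-1}})$
$\mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$ | $q_\theta < \alpha$ | $dist(q, q_\alpha^{r_{i-1}})$
$\mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$ | $\alpha \le q_\theta \le \beta$ | $r_{i-1} - q_d$
$\mathcal{R}_i^{<}[\theta_{i,j-1}, \theta_{ij})$ | $q_\theta > \beta$ | $dist(q, q_\beta^{r_{i-1}})$
$\mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$ | $\theta(q, p_{i-1,j}) < \alpha$ | $dist(q, q_\alpha^{\theta_{ij}})$
$\mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$ | $\alpha \le \theta(q, p_{i-1,j}) \le \beta$ | $dist(q, p_{i-1,j})$
$\mathcal{R}_i^{<}[\theta_{ij}, \frac{\pi}{2}]$ | $\theta(q, p_{i-1,j}) > \beta$ | $dist(q, q_\beta^{r_{i-1}})$
$\mathcal{R}_i[0, \theta_{i,j-1})$ | (any) | $dist(q, q_\beta^{\theta_{i,j-1}})$
$\mathcal{R}_i[\theta_{i,j-1}, \theta_{ij})$ | (any) | $0$
$\mathcal{R}_i[\theta_{ij}, \frac{\pi}{2}]$ | (any) | $dist(q, q_\alpha^{\theta_{ij}})$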

IV. SEARCH ALGORITHMS

In this section, we first give an algorithm to answer a query with direction $0 \le \alpha \le \beta \le \frac{\pi}{2}$ (Section IV-A), and then discuss how to answer a query with any direction (Section IV-B).

A. Answering Queries with $0 \le \alpha \le \beta \le \frac{\pi}{2}$

We combine our pruning techniques and M IN DIST func-

tions to answer a query with direction 0≤α≤β≤π

Figure 9 gives the pseudo-code of our algorithm. To efﬁciently

ﬁnd knearest neighbors of q, we maintain a priority queue Q

(line 2) and keep the k-th smallest distance of POIs in Q

to q(dk) that have already been computed (line 3). Given

a query q, we ﬁrst locate which region query qappears

using a binary search method on radiuses r1, r2,··· , rN

(line 4). Suppose we ﬁnd Risuch that ri−1≤qd< ri. If

MIN DIST(q, Ri)≥dk, we terminate as there is no answer

in Ri···RN(line 6); otherwise for each region Ri, we ﬁnd

the “candidate regions” which have overlap with the search

direction and contain all keywords in K, by calling function

FIN DCANDREGIO NS(line 7). Next for each candidate region

Rij∈ CRi, if MI NDI ST(q, Rij)≥dk, we break as there is no

answer in Rij···RiM(line 9); otherwise we ﬁnd “candidate

POIs” in Rijwhich are in the search direction and contain

all keywords, by calling function FINDCANDPOIS(line 10).

Finally we need to access region Ri+1 if necessary (line 11).

Iteratively we can ﬁnd the knearest neighbors of query q.
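The outer loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: all coordinates are assumed non-negative (origin at the bottom-left corner), sub-regions and inverted lists are omitted, $k \ge 1$, and the looser but still valid lower bound $\max(0, r_{i-1} - q_d)$ stands in for the MINDIST function of Table I. The name `desks_basic_sketch` and the data layout are hypothetical.

```python
import bisect
import heapq
import math

def direction(q, p):
    """Direction of the vector q -> p as an angle in [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)

def desks_basic_sketch(pois, radii, q, alpha, beta, keywords, k):
    """pois: list of (x, y, terms); radii: sorted outer radii r_1..r_N."""
    rings = [[] for _ in radii]                  # ring i holds r_{i-1} <= |p| < r_i
    for x, y, terms in pois:
        i = bisect.bisect_right(radii, math.hypot(x, y))
        if i < len(radii):
            rings[i].append((x, y, terms))
    qd = math.hypot(q[0], q[1])
    start = bisect.bisect_right(radii, qd)       # binary search for the ring of q
    heap = []                                    # max-heap of (-distance, point)
    for i in range(start, len(radii)):
        inner = radii[i - 1] if i > 0 else 0.0
        dk = -heap[0][0] if len(heap) == k else math.inf
        if max(0.0, inner - qd) >= dk:           # loose lower bound on MINDIST(q, R_i)
            break                                # no closer answer in R_i..R_N
        for x, y, terms in rings[i]:
            d = math.hypot(x - q[0], y - q[1])
            dk = -heap[0][0] if len(heap) == k else math.inf
            if (alpha <= direction(q, (x, y)) <= beta
                    and keywords <= terms and d < dk):
                heapq.heappush(heap, (-d, (x, y)))
                if len(heap) > k:
                    heapq.heappop(heap)          # drop the current farthest POI
    return sorted((-neg_d, p) for neg_d, p in heap)
```

The early `break` is safe here because the stand-in bound never decreases from one ring to the next; the direction-aware MINDIST of Table I is tighter and would prune more rings.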

Next we discuss how to compute the candidate regions in $R_i$. Function FINDCANDREGIONS gives the pseudo-code (Figure 9). We first compute the lower direction bound $\tau^{R_i}_l$ and the upper direction bound $\tau^{R_i}_u$ (line 2). Next we find the regions satisfying the direction constraint, $R_i[\alpha, \beta] = \{R_{il}, \cdots, R_{iu}\}$ (those in $[\tau^{R_i}_l, \tau^{R_i}_u]$), using a binary search on the directions $\theta_{i1}, \cdots, \theta_{iM}$ (line 3). If the inverted lists are in memory, we check whether each region in $R_i[\alpha, \beta]$ contains all keywords and add such regions into the candidate-region set $CR_i$. If we use a disk-based method, we load the region inverted list $L^R_{k_i}$ for each keyword $k_i$ (line 4), compute their intersection $L^R_K$, which satisfies the keyword constraint (line 5), and intersect $L^R_K$ with the regions satisfying the direction constraint, $R_i[\alpha, \beta]$, to get $R^K_i[\alpha, \beta]$ (line 6). For each region $R_{ij} \in R^K_i[\alpha, \beta]$, if MINDIST$(q, R_{ij}) < d_k$, we add $R_{ij}$ into the candidate-region set $CR_i$ (line 8). Finally we sort the regions in $CR_i$ by their MINDIST values in ascending order (line 9).

Next we discuss how to compute the candidate POIs in $R_{ij}$. Function FINDCANDPOIS gives the pseudo-code (Figure 9). If the POI inverted lists are in memory, we directly compute their intersection. If the POI inverted lists are on disk, we load the POI inverted list for each keyword. Note that for $k_i$ we only load the POIs that are in $R_{ij}$, denoted $L^P_{k_i}(R_{ij})$, based on the pointers in the region lists as shown in Figure 1 (line 2). Then we compute the intersection of the POI lists, $L^P_K = \cap_{k_i \in K} L^P_{k_i}(R_{ij})$ (line 3). For each $p \in L^P_K$, if $\alpha \le \theta(q, p) \le \beta$ and $dist(q, p) < d_k$, $p$ is a candidate: we add $p$ into the priority queue and update $d_k$ (line 6).

Algorithm 1: DESKS-BASIC$(P, q)$
  Input: $P$: a collection of POIs
         $q = \langle (q.x, q.y); [\alpha, \beta]; K, k \rangle$: a query
  Output: $P^k_q = \{p \mid p \in P_q$ and $p$ is a $k$NN of $q\}$, where $P_q$ is the set of POIs in the search direction that contain all the keywords in $K$.
  1  begin
  2    Initialize an empty priority queue $Q$;
  3    Let $d_k$ denote the $k$-th smallest distance in $Q$;
  4    Locate the region $R_i$ where $q$ appears using a binary search on $r_1, \cdots, r_N$;
  5    while $i \le N$ do
  6      if MINDIST$(q, R_i) \ge d_k$ then return;
  7      else $CR_i$ = FINDCANDREGIONS$(q, R_i, d_k)$;
  8      for $R_{ij} \in CR_i$ ($CR_i$ is sorted) do
  9        if MINDIST$(q, R_{ij}) \ge d_k$ then break;
  10       else FINDCANDPOIS$(q, R_{ij}, d_k, Q)$;
  11     $i = i + 1$;
  12 end

Function FINDCANDREGIONS$(q, R_i, d_k)$
  Input: $q = \langle (q.x, q.y); [\alpha, \beta]; K, k \rangle$: a query
         $d_k$: the $k$-th smallest distance in $Q$
         $R_i$: region $R_i$
  Output: $CR_i$: a sorted candidate-region set
  1  begin
  2    Compute direction bounds $\tau^{R_i}_l$ and $\tau^{R_i}_u$;
  3    Find regions $R_i[\alpha, \beta] = \{R_{il}, \cdots, R_{iu}\}$ in $[\tau^{R_i}_l, \tau^{R_i}_u]$ using a binary search on $\theta_{i1}, \cdots, \theta_{iM}$;
  4    Load region inverted lists $L^R_{k_i}$ for $k_i \in K$;
  5    Compute $L^R_K = \cap_{k_i \in K} L^R_{k_i}$;
  6    Compute $R^K_i[\alpha, \beta] = R_i[\alpha, \beta] \cap L^R_K$;
  7    for $R_{ij} \in R^K_i[\alpha, \beta]$ do
  8      if MINDIST$(q, R_{ij}) < d_k$ then $CR_i \leftarrow R_{ij}$;
  9    Sort $CR_i$ based on the MINDIST function;
  10 end

Function FINDCANDPOIS$(q, R_{ij}, d_k, Q)$
  Input: $q = \langle (q.x, q.y); [\alpha, \beta]; K, k \rangle$: a query
         $d_k$: the $k$-th smallest distance in $Q$
         $R_{ij}$: region $R_{ij}$; $Q$: queue
  1  begin
  2    Load POI inverted lists $L^P_{k_i}(R_{ij})$ for $k_i \in K$;
  3    Compute intersection $L^P_K = \cap_{k_i \in K} L^P_{k_i}(R_{ij})$;
  4    for $p \in L^P_K$ do
  5      if $\alpha \le \theta(q, p) \le \beta$ and $dist(q, p) < d_k$ then
  6        add $p$ into $Q$, and update $Q$ and $d_k$;
  7  end

Fig. 9. DESKS-BASIC algorithm (using disk-based inverted lists)

Fig. 10. Pruning for $\pi/2 \le \alpha < \beta < \pi$

Fig. 11. Pruning for $\pi \le \alpha < \beta < 3\pi/2$

Fig. 12. Pruning for $3\pi/2 \le \alpha < \beta < 2\pi$
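The core of FINDCANDREGIONS (a binary search on the direction boundaries followed by an intersection with the region inverted lists) can be sketched in Python as below. The sketch rests on assumptions that are not from the paper: the direction bounds `tau_l` and `tau_u` are taken as given inputs, region inverted lists are modeled as in-memory sets of sub-region indexes, at least one keyword is supplied, and the MINDIST filter and sort of lines 7-9 are left out. The function name and argument layout are hypothetical.

```python
import bisect

def find_cand_regions(thetas, tau_l, tau_u, region_lists, keywords):
    """Sub-regions of one ring that overlap [tau_l, tau_u] and hold every keyword.

    thetas: sorted upper direction boundaries theta_i1..theta_iM; sub-region j
            (0-based) covers [previous boundary, thetas[j]), starting from 0,
            and the last boundary is pi/2.
    region_lists: keyword -> set of sub-region indexes (region inverted lists).
    """
    last = len(thetas) - 1
    lo = min(bisect.bisect_right(thetas, tau_l), last)   # sub-region containing tau_l
    hi = min(bisect.bisect_right(thetas, tau_u), last)   # sub-region containing tau_u
    # Keyword constraint: sub-regions that appear in the list of every keyword.
    with_all = set.intersection(*(region_lists.get(kw, set()) for kw in keywords))
    return [j for j in range(lo, hi + 1) if j in with_all]
```

Restricting the scan to the index range `lo..hi` first keeps the keyword check proportional to the number of sub-regions in the search direction rather than to $M$.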

B. Answering Queries with Any Direction

In this section, we discuss how to answer a query with an arbitrary direction. We first classify queries into basic queries and complex queries as follows.

• Case 1 – Basic Queries:
  – $0 \le \alpha \le \beta \le \pi/2$. We answer it using the index structures on $O_{bl}$ as discussed in the above sections.
  – $\pi/2 \le \alpha \le \beta \le \pi$. We answer it using the index structures on $O_{br}$, which is similar to answering a query with $0 \le \alpha \le \beta \le \pi/2$, as shown in Figure 10.
  – $\pi \le \alpha \le \beta \le 3\pi/2$. We answer it using the index structures on $O_{tr}$, which is similar to answering a query with $0 \le \alpha \le \beta \le \pi/2$, as shown in Figure 11.
  – $3\pi/2 \le \alpha \le \beta \le 2\pi$. We answer it using the index structures on $O_{tl}$, which is similar to answering a query with $0 \le \alpha \le \beta \le \pi/2$, as shown in Figure 12.
• Case 2 – Complex Queries: All other queries are called complex queries. For a complex query $q$ with direction $[\alpha, \beta]$, we decompose $q$ into at most four basic queries: (1) $q_1$ with direction $[0, \pi/2) \cap [\alpha, \beta]$; (2) $q_2$ with direction $[\pi/2, \pi) \cap [\alpha, \beta]$; (3) $q_3$ with direction $[\pi, 3\pi/2) \cap [\alpha, \beta]$; and (4) $q_4$ with direction $[3\pi/2, 2\pi) \cap [\alpha, \beta]$. Thus we can first answer the sub-queries and then combine the results to generate the final answers of query $q$.§
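The decomposition of a complex query into basic queries can be sketched as a short Python function. This is an illustrative sketch, not the authors' code: the name `decompose` is hypothetical, quadrants 0 to 3 are assumed to correspond to the indexes on $O_{bl}$, $O_{br}$, $O_{tr}$, and $O_{tl}$ in that order, and zero-width directions ($\alpha = \beta$) are not handled.

```python
import math

def decompose(alpha, beta):
    """Split direction [alpha, beta] into per-quadrant pieces.

    Assumes 0 <= alpha < 2*pi and alpha <= beta <= alpha + 2*pi.
    Returns (quadrant, lo, hi) triples with quadrant in 0..3 and the
    bounds folded back into [0, 2*pi].
    """
    half_pi = math.pi / 2
    first = int(alpha // half_pi)
    pieces = []
    for quad in range(first, first + 5):        # a direction spans at most five pieces
        lo = max(alpha, quad * half_pi)
        hi = min(beta, (quad + 1) * half_pi)
        if lo < hi:                             # skip empty intersections
            wrap = 2 * math.pi * (quad // 4)    # fold directions past 2*pi back
            pieces.append((quad % 4, lo - wrap, hi - wrap))
    return pieces
```

Iterating over integer quadrant indexes, rather than repeatedly advancing a floating-point angle, avoids stalling on a quadrant boundary because of rounding.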

A straightforward method to answer a complex query first decomposes it into basic sub-queries and then computes the $k$ nearest neighbors for each basic query. Finally it finds the real $k$ nearest neighbors by combining the results of the basic queries. However, this method is very expensive, as some sub-queries may contribute no real answers and need not be answered. To this end, we propose an efficient algorithm that prunes many unnecessary POIs. For each basic query, we first compute its candidate regions. Then we sort the candidate regions based on their MINDIST values. Next we access the candidate regions in order and prune unnecessary regions. The pseudo-code of the algorithm is shown in Figure 13.

§ We use $\alpha \in [0, 2\pi)$ and $\beta \le \alpha + 2\pi$ to denote any direction. If $\beta > 2\pi$, we decompose the direction into $[\alpha, 2\pi)$ and $[2\pi, \beta] = [0, \beta - 2\pi]$. Then we decompose them into basic queries and generate at most five sub-queries.

Algorithm 2: DESKS$(P, q)$
  Input: $P$: a collection of POIs
         $q = \langle (q.x, q.y); [\alpha, \beta]; K, k \rangle$: a query
  Output: $P^k_q = \{p \mid p \in P_q$ and $p$ is a $k$NN of $q\}$, where $P_q$ is the set of POIs in the search direction that contain all the keywords in $K$.
  1  begin
  2    Initialize an empty priority queue $Q_P$ for POIs;
  3    Let $d_k$ denote the $k$-th smallest distance in $Q_P$;
  4    Initialize an empty priority queue $Q_R$ for regions;
  5    Decompose $q$ into $q_1, q_2, \cdots, q_4$; /* some may be empty */
  6    for $1 \le s \le 4$ do
  7      Locate the region $R_{i_s}$ for $q_s$ where $q_s$ appears;
  8      add $R_{i_s}$ into $Q_R$;
  9    while $Q_R \ne \emptyset$ do
  10     Get the region $R_{i_m}$ with minimal MINDIST$(q, R_{i_m})$;
  11     if MINDIST$(q, R_{i_m}) \ge d_k$ then return;
  12     else $CR_{i_m}$ = FINDCANDREGIONS$(q, R_{i_m}, d_k)$;
  13     for $R_{i_m j} \in CR_{i_m}$ do
  14       if MINDIST$(q, R_{i_m j}) \ge d_k$ then break;
  15       else FINDCANDPOIS$(q, R_{i_m j}, d_k, Q_P)$;
  16     Pop $R_{i_m}$ from $Q_R$;
  17     if MINDIST$(q, R_{i_m + 1}) < d_k$ then
  18       add $R_{i_m + 1}$ into $Q_R$;
  19 end

Fig. 13. DESKS algorithm (using disk-based inverted lists)

We maintain two priority queues: $Q_P$ for candidate POIs (line 2) and $Q_R$ for regions (line 4). We first decompose the query into at most four sub-queries (line 5). Then for each sub-query $q_s$, we locate the region in which $q_s$ appears (line 7) and add this region $R_{i_s}$ into the region queue $Q_R$ (line 8). Then we find the region $R_{i_m}$ with the minimal MINDIST value in $Q_R$ (line 10). If MINDIST$(q, R_{i_m}) \ge d_k$, we terminate as we have found the $k$ nearest neighbors (line 11); otherwise we find the candidate regions in $R_{i_m}$, denoted $CR_{i_m}$ (line 12). For each candidate region $R_{i_m j} \in CR_{i_m}$, if MINDIST$(q, R_{i_m j}) \ge d_k$, we break as there is no answer in $R_{i_m j}, \cdots, R_{i_m M}$ (line 14); otherwise, we compute the candidate POIs in $R_{i_m j}$ (line 15). Next we pop $R_{i_m}$ from $Q_R$ (line 16). For the next region $R_{i_m + 1}$ after $R_{i_m}$, if MINDIST$(q, R_{i_m + 1}) < d_k$, we add it into the region queue $Q_R$ (line 18). Iterating in this way, we find all answers of query $q$.

V. INCREMENTAL SEARCH ALGORITHMS

Mobile-phone users will dynamically change directions if they cannot find the expected answers in the current direction. A naive method is to answer each new query from scratch. However, this method is very expensive. To address this issue, we propose to incrementally answer a query based on the cached results of previously issued queries. To avoid a large space overhead, we only cache the $k$ nearest neighbors of a query. We consider the following two cases of updating a direction¶.

Case 1: The user increases a direction from $[\alpha, \beta]$ to $[\alpha', \beta']$ with $\alpha' < \alpha$ and $\beta' > \beta$. This corresponds to the case that the user widens the direction using two fingers on the mobile-phone screen. Section V-A discusses how to answer such a query efficiently.

Case 2: The user moves the direction from $[\alpha, \beta]$ to $[\alpha + \delta_\theta, \beta + \delta_\theta]$. This corresponds to the case that the user changes the direction by turning the mobile phone. Section V-B discusses how to answer such a query efficiently.

Note that our method can support any direction-update query using these two operations.

A. Increasing the Direction

Suppose a user has issued a query $q$ with direction $[\alpha, \beta]$ and then issues a new query $q'$ by increasing the direction to $[\alpha', \beta']$ with $\alpha' < \alpha$ and $\beta' > \beta$. We use the cached results of $q$ to answer the new query $q'$ as follows. Every POI that satisfies $q$ also satisfies the direction constraint of $q'$, so the cached answers of $q$ are candidate answers of $q'$. Let $d'_k$ ($d_k$) denote the $k$-th smallest distance of the nearest neighbors to query $q'$ ($q$). We have $d'_k \le d_k$. Thus we can use $d_k$ as an upper bound.

We insert the $k$ nearest neighbors of $q$ into the priority queue of $q'$. Then we decompose $q'$ into three queries: $q_1$ with direction $[\alpha', \alpha]$, $q_2$ with direction $[\beta, \beta']$, and $q$ with direction $[\alpha, \beta]$. We only need to answer $q_1$ and $q_2$ with bound $d_k$. We answer the two queries simultaneously, in the same way as the sub-queries in Section IV-B. Note that in the two new directions, if a POI $p$ (or a region $R_{ij}$) has a distance to $q$ larger than $d_k$, we prune $p$ (or $R_{ij}$); otherwise we insert it into the priority queue (or access the region). Thus we can incrementally and efficiently answer query $q'$.

B. Moving the Direction

Suppose a user has issued a query $q$ with direction $[\alpha, \beta]$ and then issues a new query $q'$ by moving the direction to $[\alpha + \delta_\theta, \beta + \delta_\theta]$. First consider $\delta_\theta > 0$. If $\alpha + \delta_\theta > \beta$, $q$ and $q'$ have no overlapping direction and we answer the new query from scratch. Otherwise, $q$ and $q'$ share the direction $[\alpha + \delta_\theta, \beta]$. We examine each of the $k$ nearest neighbors of $q$, and if it is in $[\alpha + \delta_\theta, \beta]$, we insert it into the priority queue of query $q'$ and update the $k$-th smallest distance threshold $d'_k$. Then we answer the new query with direction $[\beta, \beta + \delta_\theta]$ using threshold $d'_k$. If we find $k$ answers in the priority queue or in direction $[\beta, \beta + \delta_\theta]$ within distance $d_k$, we do not need to access regions in direction $[\alpha + \delta_\theta, \beta]$, because every POI in that direction that is not cached is at least $d_k$ away from $q$; otherwise, we need to access those regions in direction $[\alpha + \delta_\theta, \beta]$ with MINDIST values no smaller than $d_k$. Thus we can use the bound $d'_k$ to do effective pruning. Similarly, if $\delta_\theta < 0$ and $\beta + \delta_\theta > \alpha$, we can use the above method to answer query $q'$, where the new direction to search is $[\alpha + \delta_\theta, \alpha]$. Thus we can incrementally and efficiently answer query $q'$.

¶ In this paper we do not consider moving queries (changing locations).
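The two update operations of Sections V-A and V-B can be sketched in Python as follows. This is a sketch under assumptions that are not part of the paper: a brute-force `search` stands in for the index-based DESKS search, keywords are ignored, directions do not wrap past $2\pi$, only $\delta_\theta > 0$ is handled in `shift`, and `cached` is the sorted list of `(distance, point)` answers of the previous query. All function names are hypothetical.

```python
import heapq
import math

def direction(q, p):
    """Direction of the vector q -> p as an angle in [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)

def search(pois, q, lo, hi, k, bound=math.inf):
    """Brute-force stand-in for a DESKS search: the k nearest POIs whose
    direction lies in [lo, hi] and whose distance is below `bound`."""
    hits = []
    for p in pois:
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if lo <= direction(q, p) <= hi and d < bound:
            hits.append((d, p))
    return heapq.nsmallest(k, hits)

def widen(pois, q, cached, alpha, beta, new_alpha, new_beta, k):
    """Case 1: grow [alpha, beta] to [new_alpha, new_beta]."""
    dk = cached[-1][0] if len(cached) == k else math.inf   # d_k bounds d'_k from above
    extra = (search(pois, q, new_alpha, alpha, k, dk)
             + search(pois, q, beta, new_beta, k, dk))
    return heapq.nsmallest(k, set(cached) | set(extra))

def shift(pois, q, cached, alpha, beta, delta, k):
    """Case 2: move [alpha, beta] to [alpha + delta, beta + delta], delta > 0."""
    if alpha + delta > beta:                               # no overlap: start over
        return search(pois, q, alpha + delta, beta + delta, k)
    dk = cached[-1][0] if len(cached) == k else math.inf
    kept = [(d, p) for d, p in cached if alpha + delta <= direction(q, p) <= beta]
    new_dk = kept[-1][0] if len(kept) == k else math.inf   # threshold d'_k
    fresh = search(pois, q, beta, beta + delta, k, new_dk)
    best = heapq.nsmallest(k, set(kept) | set(fresh))
    if len(best) == k and best[-1][0] <= dk:
        return best                                        # overlap holds nothing closer
    # Uncached POIs in the overlap are at distance >= d_k; examine those as well.
    far = [h for h in search(pois, q, alpha + delta, beta, len(pois)) if h[0] >= dk]
    return heapq.nsmallest(k, set(best) | set(far))
```

In both functions only the newly exposed directions are searched without restriction; the overlap with the previous query is revisited only when the cached answers and the new directions together fail to supply $k$ answers within $d_k$.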

VI. EXPERIMENTAL STUDY

We have implemented our proposed methods. We compared them with two state-of-the-art methods, MIR2-tree [6] and LkT [5]‖. We extended these methods to support direction-aware search by examining whether each accessed MBR (or POI) is in the search direction. For LkT, we obtained the code from the authors of [5]; it was implemented in Java. We implemented MIR2-tree in C++. Our algorithms were also implemented in C++. All the C++ code was compiled using GCC 4.2.3 with the -O3 flag. As the baseline algorithms used disk-based indexes, we also used a disk-based index structure. All the experiments were run on an Ubuntu machine with an Intel Core E5450 3.0 GHz CPU and 4 GB memory.

We used three real datasets: POIs in California (CA), POIs in Virginia (VA), and POIs in China (CN). The statistics of the datasets are summarized in Table II. We generated five query sets with keyword numbers from 1 to 5, and each query set had 1000 queries.

TABLE II
DATASETS

| | CA | VA | CN |
|---|---|---|---|
| Total number of POIs (millions) | 0.91 | 0.96 | 16.5 |
| Total number of terms (millions) | 9.7 | 4.6 | 63.6 |
| Total number of unique terms (thousands) | 35 | 26 | 753 |
| Average number of unique terms per POI | 8.57 | 4.5 | 3.85 |

A. Varying M and N

In this section, we evaluate the effect of varying the region number $N$ and the sub-region number $M$. Figure 14 shows the results. We see that different values of $N$ and $M$ had no significant effect on the performance for $M > 50$. On the VA dataset, the running time was about 2.3-2.7 ms for every combination of $M$ and $N$, and we got the highest performance at $N = 100$ and $M = 150$. On the CA dataset, the running time was 11-15 ms for different $M$ and $N$ values, and we got the highest performance at $N = 100$ and $M = 150$. On the CN dataset, the time was about 9-16 ms, and the highest performance was achieved at $N = 1000$ and $M = 600$. Based on the results, we conclude that each region $R_i$ should contain about 10,000 POIs and each sub-region $R_{ij}$ should contain about 100 POIs. In the remaining experiments, we used $N = 100$ and $M = 150$ on the CA and VA datasets, and $N = 1000$ and $M = 600$ on the CN dataset.

B. Evaluation on Pruning Techniques

In this section, we evaluate our pruning techniques. We implemented three methods. (1) DESKS+R: we used the region-pruning techniques and function MINDIST$(q, R_i)$ to prune $R_i$. (2) DESKS+D: we used the direction-pruning techniques and function MINDIST$(q, R_{ij})$ to prune $R_{ij}$. (3) DESKS+RD: we used both region pruning and direction pruning.

Varying k: We first evaluated the pruning techniques by varying $k$ on the 5000 queries with $\alpha = 0$ and $\beta = \pi/3$. Figure 15 shows the results. We can see that DESKS+D and DESKS+RD significantly outperformed DESKS+R. This is because DESKS+R needed to access many unnecessary regions, whereas the direction-based pruning can prune large numbers of unnecessary regions. DESKS+RD was also better than DESKS+D, especially on the CN dataset. This is because DESKS+RD can prune many regions $R_i$. For example, on the CN dataset, for $k = 100$, DESKS+R took 55 ms, DESKS+D improved it to 32 ms, and DESKS+RD further improved it to 16 ms. There are two reasons why the improvement of DESKS+RD over DESKS+D was not significant on the CA and VA datasets. Firstly, only a small number of POIs contained all keywords, so both DESKS+D and DESKS+RD needed to access many regions. Secondly, there were only a small number of regions $R_i$: with $N = 100$, DESKS+RD cannot prune large numbers of regions.

Varying directions: We evaluated the pruning techniques by varying directions on 5000 queries with $k = 10$. Figure 16 shows the results. Similarly, DESKS+D and DESKS+RD significantly outperformed DESKS+R. On the VA dataset, DESKS+R took more than 20 ms to answer a query, while DESKS+D and DESKS+RD only took about 2 ms. This is because DESKS+R needed to enumerate many regions, while DESKS+D and DESKS+RD can prune large numbers of regions based on the direction-aware indexes.

‖ As MIR2-tree generally achieves much higher performance than IR2-tree, we do not report results for IR2-tree.

Fig. 14. Average search performance: Varying $M$ and $N$ (5000 queries, $k = 10$, $\alpha = 0$, $\beta = \pi/3$). Panels: (a) VA, (b) CA, (c) CN. Axes: $N$ vs. elapsed time (ms), one curve per value of $M$.

Fig. 15. Average search performance: Varying $k$ (5000 queries, $\alpha = 0$, $\beta = \pi/3$). Panels: (a) VA, (b) CA, (c) CN. Axes: top-$k$ vs. elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.

Fig. 16. Average search performance: Varying directions $\beta - \alpha$ from $\pi/6$ to $2\pi$ (5000 queries, $k = 10$). Panels: (a) VA, (b) CA, (c) CN. Axes: direction (multiples of $\pi/6$) vs. elapsed time (ms); curves: DESKS+R, DESKS+D, DESKS+RD.

C. Comparison with Existing Methods

We compared our algorithm DESKS (DESKS+RD) with the state-of-the-art methods MIR2-tree and LkT. We first compared the index sizes and indexing time, as shown in Table III. LkT was very expensive to build, as it needed to cluster the keywords in POIs. On the CN dataset, it took more than 2 days to index 1 million POIs, and it would take about 1 month to index 16 million POIs. Thus we do not show the results of LkT on the CN dataset. MIR2-tree used R-tree and keyword signatures to build indexes. Although DESKS had larger index sizes than MIR2-tree (as DESKS built indexes for $O_{bl}$, $O_{br}$, $O_{tr}$, and $O_{tl}$), its index sizes were still acceptable. LkT had much larger index sizes as it built inverted lists for each R-tree node.

TABLE III
INDEXING TIME AND SIZES

| Dataset | Data size (MB) | Index size (MB): MIR2-tree | Index size (MB): LkT | Index size (MB): DESKS | Index time (min): MIR2-tree | Index time (min): LkT | Index time (min): DESKS |
|---|---|---|---|---|---|---|---|
| CA | 72.2 | 72 | 1430 | 265 | 1.3 | 780 | 1.8 |
| VA | 54.8 | 76 | 920 | 149 | 0.8 | 690 | 1.2 |
| CN | 805 | 1304 | – | 3552 | 25 | – | 33 |

Varying directions: We first compared different methods by varying directions on 5000 queries with $k = 10$. Figure 17 shows the results. Although LkT and MIR2-tree achieved high performance for large directions, they were very slow for small directions. This is because they needed to enumerate many MBRs and POIs, which was very expensive. For example, on the CA dataset, they took 200 ms for direction $2\pi$, but more than 5 seconds for direction $\pi/3$. DESKS only took 20 ms for any direction, since DESKS can use the index to do effective direction pruning. Even for direction $2\pi$, DESKS still outperformed existing methods. There are three reasons. Firstly, our region structure is very effective and can be kept in memory. Secondly, our region inverted lists can prune many unnecessary POIs. Thirdly, existing methods usually achieve high performance for POIs with many keywords (documents) [5], but real POIs do not have many keywords.

Varying k: Then we compared different methods by varying $k$ on 5000 queries with $\alpha = 0$ and $\beta = \pi/3$. Figure 18 shows the results. We can see that DESKS significantly outperformed MIR2-tree and LkT, by 2-3 orders of magnitude. On the VA dataset, MIR2-tree and LkT took about 500 ms, and DESKS improved the time to 2-5 ms. The main reason is that existing methods cannot use the index to do effective direction pruning. DESKS used the novel direction-aware index, which can prune large numbers of unnecessary regions and POIs.

Varying the number of keywords: Next we compared different methods by varying the number of keywords and setting $k = 10$, $\alpha = 0$, and $\beta = \pi/3$. Figure 19 shows the results. We can see that for different numbers of keywords, DESKS was still much better than MIR2-tree and LkT, taking only about 10-20 ms.

Fig. 17. Performance comparison: Varying directions $\beta - \alpha$ from $\pi/6$ to $2\pi$ (5000 queries, $k = 10$). Panels: (a) VA, (b) CA, (c) CN. Axes: direction (multiples of $\pi/6$) vs. elapsed time (ms); curves: LkT (VA and CA only), MIR2-tree, DESKS.

Fig. 18. Performance comparison: Varying $k$ (5000 queries, $\alpha = 0$, $\beta = \pi/3$). Panels: (a) VA, (b) CA, (c) CN. Axes: top-$k$ vs. elapsed time (ms); curves: LkT (VA and CA only), MIR2-tree, DESKS.

Fig. 19. Performance comparison: Varying numbers of keywords (1000 queries in each query set, $k = 10$, $\alpha = 0$, $\beta = \pi/3$). Panels: (a) VA, (b) CA, (c) CN. Axes: number of keywords vs. elapsed time (ms); curves: LkT (VA and CA only), MIR2-tree, DESKS.

D. Evaluation on Incremental Search

In this section, we test our incremental search method. We first initialized queries with $\beta - \alpha = \pi/3$ and then increased the directions by $\pi/36, \cdots, 12\pi/36$. Figure 20(a) shows the results. We can see that our incremental method DESKS-INCRE outperformed DESKS. This is because DESKS-INCRE can incrementally answer a query using the previously issued queries. We also evaluated DESKS-INCRE by moving directions. Figure 20(b) shows the results. We again initialized queries with $\beta - \alpha = \pi/3$ and then moved the directions by $-6\pi/36, \cdots, 6\pi/36$. We can see that for a small move, DESKS-INCRE was much better than DESKS, as DESKS-INCRE can use a tighter bound to answer new queries. For a large move, the improvement was not high, as DESKS-INCRE needed to answer queries from scratch.

E. Scalability

In this section, we evaluate the scalability on the CN dataset by varying the number of POIs. Figure 21 shows the results with different $k$ values and directions. We can see that our method scaled very well. This is attributed to our effective direction-aware index structures and effective pruning techniques.

VII. RELATED WORK

Many studies on spatial keyword search have been proposed recently [25], [3], [9], [6], [23], [5], [24], [22], [1], [21], [2], [19], [13]. The work most related to our problem is the study by Felipe et al. [6], which proposed index structures that integrate signature files and R-tree to enable top-$k$ spatial keyword queries. Another similar study [5], by Cong et al., combined inverted files and R-tree to answer the location-aware top-$k$ text retrieval (LkT) query. Our direction-aware spatial keyword query is different from their methods as we have a direction constraint.

Zhou et al. [25] proposed to find web documents relevant to user input keywords within a pre-specified region. They developed several methods by combining R-tree and inverted indexes. Chen et al. [3] extended this problem by supporting large numbers of "footprint representations." Hariharan et al. [9] focused on finding objects containing a set of keywords within a specific region. They proposed a hybrid index structure that integrates R-tree and inverted lists. Zhang et al. [23], [24] introduced the m-closest keyword query (mCK query), which aims at finding the closest objects that match the keywords. Cao et al. [1] studied how to find top-$k$ prestige-based relevant spatial web objects. Yao et al. [22] tackled the problem of answering approximate string match queries in spatial databases. Wu et al. [21] studied spatial keyword search for moving objects. Lu et al. [13] extended reverse $k$NN techniques to support reverse spatial and textual $k$ nearest neighbor search. Roy and Chakrabarti [19] studied type-ahead search in spatial databases using materialization techniques. Cao et al. [2] studied collective keyword search by considering multiple points. Leung et al. [12] proposed to use locations for personalized search. The above queries differ substantially from our direction-aware spatial keyword query.

There are many studies on $k$NN [18], [16], [10], [11], [20], [17]. Ferhatosmanoglu et al. [7] studied constrained nearest neighbor search using a polygon as the constraint. Cheng et al. [4] studied constrained $k$NN queries over uncertain data. Gao et al. [8] and Nutanong et al. [14] proposed to answer visible $k$NN queries. Patroumpas et al. [15] studied the problem of monitoring object orientations. However, their methods cannot support our problem, as we support keyword-based search and our direction constraint is different from their constraints. Although we can build two separate indexes, one for keywords and another for locations, this method is expensive, as it cannot simultaneously apply textual and spatial pruning.

Fig. 20. Incremental search on the CN dataset ($k = 10$). Panels: (a) Increasing directions, (b) Moving directions. Axes: direction change (multiples of $\pi/36$) vs. elapsed time (ms); curves: DESKS, DESKS-INCRE.

Fig. 21. Scalability on the CN dataset. Panels: (a) Varying $k$ ($\beta - \alpha = \pi/3$), with curves for $k = 1, 10, 20, 50, 100$; (b) Varying directions ($k = 10$), with curves for $\pi/3$, $2\pi/3$, $4\pi/3$, $5\pi/3$, $2\pi$. Axes: number of POIs (millions) vs. elapsed time (ms).

VIII. CONCLUSION

In this paper we have studied the problem of direction-aware spatial keyword search. We find the $k$ nearest neighbors to the query that contain all input keywords and satisfy the direction constraint. To efficiently answer a direction-aware spatial keyword query, we proposed novel indexing structures, which can prune large numbers of unnecessary POIs. We developed effective region-based pruning and direction-based pruning techniques to increase the search performance. We devised efficient algorithms to answer direction-aware spatial keyword queries. We also studied how to incrementally answer a query. We have implemented our algorithms, and experimental results show that our method achieves high performance and outperforms existing methods significantly.

IX. ACKNOWLEDGEMENT

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partly supported by the National Natural Science Foundation of China under Grant Nos. 61003004 and 60873065, the National Grand Fundamental Research 973 Program of China under Grant No. 2011CB302206, the National S&T Major Project of China under Grant No. 2011ZX01042-001-002, and the "NExT Research Center" funded by MDA, Singapore, under Grant No. WBS:R-252-300-001-490.

REFERENCES

[1] X. Cao, G. Cong, and C. S. Jensen. Retrieving top-k prestige-based relevant spatial web objects. PVLDB, 3(1):373–384, 2010.
[2] X. Cao, G. Cong, C. S. Jensen, and B. C. Ooi. Collective spatial keyword querying. In SIGMOD Conference, pages 373–384, 2011.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in geographic web search engines. In SIGMOD Conference, pages 277–288, 2006.
[4] R. Cheng, J. Chen, M. F. Mokbel, and C.-Y. Chow. Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data. In ICDE, pages 973–982, 2008.
[5] G. Cong, C. S. Jensen, and D. Wu. Efficient retrieval of the top-k most relevant spatial web objects. PVLDB, 2009.
[6] I. D. Felipe, V. Hristidis, and N. Rishe. Keyword search on spatial databases. In ICDE, 2008.
[7] H. Ferhatosmanoglu, I. Stanoi, D. Agrawal, and A. E. Abbadi. Constrained nearest neighbor queries. In SSTD, pages 257–278, 2001.
[8] Y. Gao, B. Zheng, W.-C. Lee, and G. Chen. Continuous visible nearest neighbor queries. In EDBT, pages 144–155, 2009.
[9] R. Hariharan, B. Hore, C. Li, and S. Mehrotra. Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In SSDBM, 2007.
[10] G. R. Hjaltason and H. Samet. Distance browsing in spatial databases. ACM Trans. Database Syst., 1999.
[11] M. R. Kolahdouzan and C. Shahabi. Voronoi-based k nearest neighbor search for spatial network databases. In VLDB, pages 840–851, 2004.
[12] K. W.-T. Leung, D. L. Lee, and W.-C. Lee. Personalized web search with location preferences. In ICDE, pages 701–712, 2010.
[13] J. Lu, Y. Lu, and G. Cong. Reverse spatial and textual k nearest neighbor search. In SIGMOD Conference, pages 349–360, 2011.
[14] S. Nutanong, E. Tanin, and R. Zhang. Visible nearest neighbor queries. In DASFAA, pages 876–883, 2007.
[15] K. Patroumpas and T. K. Sellis. Monitoring orientation of moving objects around focal points. In SSTD, pages 228–246, 2009.
[16] S. Pramanik and J. Li. Fast approximate search algorithm for nearest neighbor queries in high dimensions. In ICDE, page 251, 1999.
[17] J. B. Rocha-Junior, A. Vlachou, C. Doulkeridis, and K. Nørvåg. Efficient processing of top-k spatial preference queries. PVLDB, 4(2):93–104, 2010.
[18] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD Conference, 1995.
[19] S. B. Roy and K. Chakrabarti. Location-aware type ahead search on spatial databases: semantics and efficiency. In SIGMOD Conference, pages 361–372, 2011.
[20] Y. Tao, D. Papadias, and Q. Shen. Continuous nearest neighbor search. In VLDB, pages 287–298, 2002.
[21] D. Wu, M. L. Yiu, C. S. Jensen, and G. Cong. Efficient continuously moving top-k spatial keyword query processing. In ICDE, pages 541–552, 2011.
[22] B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string search in spatial databases. In ICDE, 2010.
[23] D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa. Keyword search in spatial databases: Towards searching by document. In ICDE, 2009.
[24] D. Zhang, B. C. Ooi, and A. K. H. Tung. Locating mapped resources in web 2.0. In ICDE, pages 521–532, 2010.
[25] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W.-Y. Ma. Hybrid index structures for location-based web search. In CIKM, 2005.

Location- and keyword-based querying of geo-textual data: a survey

Article

Full-text available

Mar 2021
VLDB J

With the broad adoption of mobile devices, notably smartphones, keyword-based search for content has seen increasing use by mobile users, who are often interested in content related to their geographical location. We have also witnessed a proliferation of geo-textual content that encompasses both textual and geographical information. Examples include geo-tagged microblog posts, yellow pages, and web pages related to entities with physical locations. Over the past decade, substantial research has been conducted on integrating location into keyword-based querying of geo-textual content in settings where the underlying data is assumed to be either relatively static or is assumed to stream into a system that maintains a set of continuous queries. This paper offers a survey of both the research problems studied and the solutions proposed in these two settings. As such, it aims to offer the reader a first understanding of key concepts and techniques, and it serves as an “index” for researchers who are interested in exploring the concepts and techniques underlying proposed solutions to the querying of geo-textual data.

Defining and designing spatial queries: the role of spatial relationships

Article

May 2023

Anderson Chaves Carniel

Spatial relationships are core components in the design and definition of spatial queries. A spatial relationship determines how two or more spatial objects are related or connected in space. Hence, given a spatial dataset, users can retrieve spatial objects in a given relationship with a search object. Different interpretations of spatial relationships are conceivable, leading to different types of relationships. The main types are (i) topological relationships (e.g. overlap, meet, inside), (ii) metric relationships (e.g. nearest neighbors), and (iii) direction relationships (e.g. cardinal directions). Although spatial information retrieval has been extensively studied in the literature, it is unclear which types of spatial queries can be defined using spatial relationships. In this article, we introduce a taxonomy for naming, describing, and classifying types of spatial queries frequently found in the literature. This taxonomy is based on the types of spatial relationships that are employed by spatial queries. By using this taxonomy, we discuss the intuitive descriptions, formal definitions, and possible implementation techniques of several types of spatial queries. The discussions lead to the identification of correspondences between types of spatial queries. Further, we identify challenges and open research topics in the spatial information retrieval area.

Example Searcher: A Spatial Query System via Example

Conference Paper

Apr 2023

A Deep Generative Model for Trajectory Modeling and Utilization

Article

Feb 2023

Modern location-based systems have stimulated explosive growth of urban trajectory data and promoted many real-world applications, e.g., trajectory prediction. However, heavy big data processing overhead and privacy concerns hinder trajectory acquisition and utilization. Inspired by regular trajectory distribution on transportation road networks, we propose to model trajectory data privately with a deep generative model and leverage the model to generate representative trajectories for downstream tasks or directly support these tasks (e.g., popularity ranking), rather than acquiring and processing the original big trajectory data. Nevertheless, it is rather challenging to model high-dimensional trajectories with time-varying yet skewed distribution. To address this problem, we model and generate trajectory sequences with judiciously encoded spatio-temporal features over skewed distribution by leveraging an important factor neglected by the literature, namely the underlying road properties (e.g., road types and directions), which are closely related to trajectory distribution. Specifically, we decompose each trajectory into a map-matched road sequence with temporal information and embed them to encode spatio-temporal features. Then, we enhance trajectory representation by encoding inherent route planning patterns from the underlying road properties. Later, we encode spatial correlations among edges and daily and weekly temporal periodicity information. Next, we employ a meta-learning module to generate trajectory sequences step by step by learning generalized trajectory distribution patterns from skewed trajectory data based on the well-encoded trajectory prefix. Last but not least, we preserve trajectory privacy by training the model in a differentially private manner with gradient clipping. Experiments on real-world datasets show that our method significantly outperforms existing methods.

DAPC: Answering Why-Not Questions on Top-$k$ Direction-Aware ASK Queries in Polar Coordinates

Article

May 2023

A direction-aware augmented spatial keyword top-$k$ query (DAT$k$Q) returns the top-$k$ objects based on a ranking function that considers spatial distance, textual similarity, query numeric attributes, and query direction. When a user initiates a DAT$k$Q, some user-desired objects (missing objects) may not appear in the query result set, and then the user wonders why they do not appear, which is called the why-not question. This paper focuses on answering why-not questions on DAT$k$Qs. We first discuss how to obtain the refined query direction by analyzing the position relationship between missing objects and original query direction in polar coordinates. Then a DAPC index structure is designed, which can cut down irrelevant search space based on not only conventional distance pruning, keyword pruning, and attribute pruning but also query direction pruning. Particularly, by comparing the position relationship between the query direction and the sector (sector ring) region segmented by the DAPC-based method, the search space that does not meet the query direction is pruned. In addition, we discuss the applicability of our scheme for handling why-not questions on regional spatial keyword queries (SKQ), ordinary direction-aware top-$k$ SKQ queries and complex scoring SKQ queries. Finally, a series of experiments are conducted on two real datasets to show the efficiency of our DAPC-based method.
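
As a rough illustration of what a query-direction constraint means in polar coordinates, the sketch below tests whether a point falls inside a direction range measured counter-clockwise from the positive x-axis. It is a plain linear filter under assumed conventions, not the DAPC index or any published pruning rule; the names `bearing`, `in_direction`, and `direction_filter` are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def bearing(q, p):
    """Polar angle of point p as seen from query location q, in [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % TWO_PI

def in_direction(q, p, alpha, beta):
    """True if p lies in the counter-clockwise range from alpha to beta.

    Assumption: angles are in radians; a range such as (3*pi/2, pi/4)
    wraps through angle 0. alpha == beta is treated as a single ray,
    not as the full circle.
    """
    span = (beta - alpha) % TWO_PI               # angular width of the range
    offset = (bearing(q, p) - alpha) % TWO_PI    # p's angle relative to alpha
    return offset <= span

def direction_filter(q, points, alpha, beta):
    """Keep only the points (other than q itself) inside the direction range."""
    return [p for p in points if p != q and in_direction(q, p, alpha, beta)]
```

Normalizing both the span and the offset modulo 2π keeps the wrap-around case and the ordinary case on the same code path, which avoids a separate branch for ranges that cross angle 0.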

Route Travel Time Estimation on a Road Network Revisited: Heterogeneity, Proximity, Periodicity and Dynamicity

Article

Jan 2023

In this paper, we revisit the problem of route travel time estimation on a road network and aim to boost its accuracy by capturing and utilizing spatio-temporal features from four significant aspects: heterogeneity, proximity, periodicity and dynamicity. Spatial-wise, we consider two forms of heterogeneity at link level in a road network: the turning ways between different links are heterogeneous, which can make the travel time of the same link vary; and different links contain heterogeneous attributes and thereby lead to different travel times. In addition, we take into account the proximity: neighboring links have similar traffic patterns and lead to similar travel speeds. To this end, we build a link-connection graph to capture such heterogeneity and proximity. Temporal-wise, the weekly/daily periodicity of temporal background information (e.g., rush hours) and dynamic traffic conditions have a significant impact on the travel time, which results in static and dynamic spatio-temporal features respectively. To capture such impacts, we regard the travel time/speed as a combination of static and dynamic parts, and extract many spatio-temporal relevant features for the prediction task. Regarding the methodology, it remains an open problem to build a generic learning model to boost the estimation accuracy. Hence, we design a novel encoder-decoder framework: the encoder uses the sequence attention model to encode dynamic features from the temporal-wise perspective. The decoder first uses the heterogeneous graph attention model to decode the static part of travel speed based on static spatio-temporal features, and then leverages the sequence attention model to decode the estimated travel time from the spatial-wise perspective. Extensive experiments on real datasets verify the superiority of our method as well as the importance of the four aspects outlined above.

Road-Aware Indexing for Trajectory Range Queries

Article

Jan 2022

Answering spatio-temporal range queries (RQs) on trajectory databases, i.e., finding all trajectories that intersect given ranges, is crucial in many real-world applications. Various kinds of indexes have been proposed to accelerate RQs. However, existing indexes typically use Euclidean distance to prune irrelevant regions without considering the underlying road network information. Nevertheless, as vehicle trajectories are generated on road network edges, the road network could be seen as meta knowledge of trajectories and be used to index and query trajectories. To this end, we propose RP-Tree, a road network-aware partition tree to support efficient RQs. The basic idea is partitioning a road network graph into hierarchical subgraphs and generating a balanced tree structure, where each tree node maintains its associated trajectories. We compactly index the spatio-temporal information of trajectories on the corresponding road network edges. Then, we design efficient search algorithms to support RQs by pruning irrelevant trajectories through subgraph range borders associated with RP-Tree nodes. Last but not least, we scale RP-Tree to very large datasets by devising approximate algorithms with bounded confidence at an interactive speed. Experimental results on three real-world datasets from Porto, Chengdu, and Beijing show that our method outperforms baselines by 1 to 2 orders of magnitude.

Velocity-Dependent Nearest Neighbor Query

Chapter

Aug 2021

Location-based services recommend points of interest (POIs) which are nearer to the user’s position q. In practice, when the user is moving with a velocity $\overrightarrow{v}$, he may prefer the nearer POIs which match his moving direction. In this paper, we propose the velocity-dependent nearest neighbor query (VeloNN query), which selects the POIs that are nearer and best match the user’s moving direction. In the VeloNN query, if the direction of a POI o highly matches the direction of $\overrightarrow{v}$, o is likely to be preferred. Since computing the directional preferences of all POIs is time-consuming, we propose rules to filter out the POIs with low directional preferences. We also divide the space into tiles, i.e., rectangular areas, and compute a candidate set for each tile in advance. The VeloNN candidates can be quickly prepared after finding the tile where the user is. We conduct experiments on both synthetic and real datasets and the results show the proposed algorithms can support VeloNN queries efficiently.

Time-aware approximate collective keyword search in traffic networks

Article

Aug 2021
KNOWL-BASED SYST

The collective spatial keyword query (CoSKQ), an important variant of spatial keyword query, aims to find a set of objects collectively covering the user’s query keywords, that are close to the query location and are close to each other. However, existing works only focus on the CoSKQ problem of exact keyword matching and cannot handle spelling errors and conventional spelling differences (for example, color vs. colour), that are common in real applications. Moreover, query time information is not considered. To this end, this paper takes the lead in studying the problem of Time-aware Approximate Collective spatial Keyword query processing in traffic networks (TACoSKQ), where the objects are located on a predefined traffic network. We first prove that the TACoSKQ problem is NP-complete, and design a hybrid index called TDAG-tree to support query-object distance pruning, inter-object distance pruning, approximate keyword pruning, and temporal pruning simultaneously. Then, we present two approximate algorithms with provable approximation bounds to efficiently support TACoSKQ query processing on traffic networks. Finally, extensive experiments using three real datasets demonstrate the efficiency and accuracy of our proposed algorithms.

Density-Based Top-K Spatial Textual Clusters Retrieval

Article

Jan 2021

So-called spatial web queries retrieve web content representing points of interest, such that the points of interest have descriptions that are relevant to query keywords and are located close to a query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, the top-k spatial textual cluster retrieval ($k$-STC) query that returns the top-k clusters that (i) are located close to a query location, (ii) contain objects that are relevant with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a DBSCAN-based approach and an OPTICS-based approach that rely on on-line density-based clustering and that exploit early stop conditions. Empirical studies on real data sets offer evidence that the paper's proposals can find good quality clusters and are capable of excellent performance.

Locating Mapped Resources in Web 2.0

Conference Paper

Apr 2010

Mapping mashups are emerging Web 2.0 applications in which data objects such as blogs, photos and videos from different sources are combined and marked in a map using APIs that are released by online mapping solutions such as Google and Yahoo Maps. These objects are typically associated with a set of tags capturing the embedded semantics and a set of coordinates indicating their geographical locations. Traditional web resource searching strategies are not effective in such an environment due to the lack of the gazetteer context in the tags. Instead, a better alternative approach is to locate an object by tag matching. However, the number of tags associated with each object is typically small, making it difficult for an object to capture the complete semantics in the query objects. In this paper, we focus on the fundamental application of locating geographical resources and propose an efficient tag-centric query processing strategy. In particular, we aim to find a set of nearest co-located objects which together match the query tags. Given the fact that there could be a large number of data objects and tags, we develop an efficient search algorithm that can scale up in terms of the number of objects and tags. Further, to ensure that the results are relevant, we also propose a geographical context sensitive geo-tf-idf ranking mechanism. Our experiments on synthetic data sets demonstrate its scalability while the experiments using the real life data set confirm its practicality.

Hybrid index structures for location-based Web search

Conference Paper

Oct 2005

There is more and more commercial and research interest in location-based web search, i.e. finding web content whose topic is related to a particular place or region. In this type of search, location information should be indexed as well as text information. However, the index of a conventional text search engine is set-oriented, while location information is two-dimensional and in Euclidean space. This brings new research problems on how to efficiently represent the location attributes of web pages and how to combine the two types of indexes. In this paper, we propose to use a hybrid index structure, which integrates inverted files and R*-trees, to handle both textual and location aware queries. Three different combining schemes are studied: (1) inverted file and R*-tree double index, (2) first inverted file then R*-tree, (3) first R*-tree then inverted file. To validate the performance of the proposed index structures, we design and implement a complete location-based web search engine which mainly consists of four parts: (1) an extractor which detects geographical scopes of web pages and represents geographical scopes as multiple MBRs based on geographical coordinates; (2) an indexer which builds hybrid index structures to integrate text and location information; (3) a ranker which ranks results by geographical relevance as well as non-geographical relevance; (4) an interface which is friendly for users to input location-based search queries and to obtain geographically and textually relevant results. Experiments on a large real-world web dataset show that both the second and the third structures are superior in query time and the second is slightly better than the third. Additionally, indexes based on R*-trees are shown to be more efficient than indexes based on grid structures.

Constrained Nearest Neighbor Queries

Conference Paper

Jul 2001

In this paper we introduce the notion of constrained nearest neighbor queries (CNN) and propose a series of methods to answer them. This class of queries can be thought of as nearest neighbor queries with range constraints. Although both nearest neighbor and range queries have been analyzed extensively in previous literature, the implications of constrained nearest neighbor queries have not been discussed. Due to their versatility, CNN queries are suitable to a wide range of applications from GIS systems to reverse nearest neighbor queries and multimedia applications. We develop methods for answering CNN queries with different properties and advantages. We prove the optimality (with respect to I/O cost) of one of the techniques proposed in this paper. The superiority of the proposed technique is shown by a performance analysis.
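
To make the idea of a nearest neighbor query with a range constraint concrete, here is a minimal brute-force sketch: it restricts the candidates to an axis-aligned rectangle and then takes the closest one. This linear scan is only a baseline for illustration and is not one of the index-based methods the abstract refers to; the function name `constrained_nn` and the `(xmin, ymin, xmax, ymax)` region layout are assumptions.

```python
import math

def constrained_nn(query, points, region):
    """Return the point closest to `query` among those inside `region`.

    region is an assumed (xmin, ymin, xmax, ymax) rectangle with inclusive
    borders. Returns None when no point satisfies the constraint.
    """
    xmin, ymin, xmax, ymax = region
    # Step 1: apply the range constraint.
    inside = [p for p in points
              if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    if not inside:
        return None
    # Step 2: ordinary nearest-neighbor selection over the survivors.
    return min(inside, key=lambda p: math.dist(query, p))
```

Note that the constrained answer can differ from the unconstrained nearest neighbor: a point right next to the query is ignored if it lies outside the region.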

Reverse spatial and textual k nearest neighbor search

Conference Paper

Jun 2011

Geographic objects associated with descriptive texts are becoming prevalent. This gives prominence to spatial keyword queries that take into account both the locations and textual descriptions of content. Specifically, the relevance of an object to a query is measured by spatial-textual similarity that is based on both spatial proximity and textual similarity. In this paper, we define Reverse Spatial Textual k Nearest Neighbor (RSTkNN) query, i.e., finding objects that take the query object as one of their k most spatial-textual similar objects. Existing works on reverse kNN queries focus solely on spatial locations but ignore text relevance. To answer RSTkNN queries efficiently, we propose a hybrid index tree called IUR-tree (Intersection-Union R-Tree) that effectively combines location proximity with textual similarity. Based on the IUR-tree, we design a branch-and-bound search algorithm. To further accelerate the query processing, we propose an enhanced variant of the IUR-tree called clustered IUR-tree and two corresponding optimization algorithms. Empirical studies show that the proposed algorithms offer scalability and are capable of excellent performance.

Processing spatial keyword (SK) queries in geographic information retrieval (GIR) systems

Article

Jan 2007

Efficient processing of top-k spatial preference queries

Article

Jan 2011

Nearest Neighbor Queries

Article

May 1995
SIGMOD REC

A frequently encountered type of query in Geographic Information Systems is to find the k nearest neighbor objects to a given point in space. Processing such queries requires substantially different search algorithms than those for location or range queries. In this paper we present an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalize it to finding the k nearest neighbors. We also discuss metrics for an optimistic and a pessimistic search ordering strategy as well as for pruning. Finally, we present the results of several experiments obtained using the implementation of our algorithm and examine the behavior of the metrics and the scalability of the algorithm.

Monitoring Orientation of Moving Objects around Focal Points

Conference Paper

Jul 2009

We consider a setting with numerous location-aware moving objects that communicate with a central server. Assuming a set of focal points of interest, we aim at continuously monitoring object orientations and hence detect situations where many objects get closer to or move away from any such site. Towards this goal, we propose a streaming approach that delegates part of the processing to objects, which relay positional updates upon significant deviations in their course. The central processor maintains the changing distribution of current object headings around each focal point and may issue alerts once it observes many objects moving along a direction (e.g., increased northbound traffic near the stadium). To efficiently answer such navigational queries, we introduce a novel access method that indexes object headings influencing a specific site. Furthermore, we extend this scheme to examine trajectory movements around sites over the recent past. Experimental results verify that this framework is able to cope with scalable numbers of objects at reduced communication cost, while offering instant notification of important trends along diverse directions for multiple focal points.

Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases

Conference Paper

Dec 2004

A frequent type of query in spatial networks (e.g., road networks) is to find the K nearest neighbors (KNN) of a given query object. With these networks, the distances between objects depend on their network connectivity and it is computationally expensive to compute the distances (e.g., shortest paths) between objects. In this paper, we propose a novel approach to efficiently and accurately evaluate KNN queries in spatial network databases using first order Voronoi diagram. This approach is based on partitioning a large network to small Voronoi regions, and then pre-computing distances both within and across the regions. By localizing the pre-computation within the regions, we save on both storage and computation and by performing across-the-network computation for only the border points of the neighboring regions, we avoid global pre-computation between every node-pair. Our empirical experiments with several real-world data sets show that our proposed solution outperforms approaches that are based on on-line distance computation by up to one order of magnitude, and provides a factor of four improvement in the selectivity of the filter step as compared to the index-based approaches.

Collective spatial keyword querying

Conference Paper

Jun 2011

With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
