Spatial Cluster Detection in Spatial Flow Data
Ran Tao, Jean-Claude Thill
Department of Geography and Earth Sciences and Project Mosaic, University of North Carolina at
Charlotte, Charlotte, NC
As a typical form of geographical phenomena, spatial flow events have been widely stud-
ied in contexts like migration, daily commuting, and information exchange through tele-
communication. Studying the spatial pattern of flow data serves to reveal essential
information about the underlying process generating the phenomena. Most methods of
global clustering pattern detection and local cluster detection focus on
single-location spatial events or fail to preserve the integrity of spatial flow events. In this
research we introduce a new spatial statistical approach of detecting clustering (clusters)
of flow data that extends the classical local K-function, while maintaining the integrity of
flow data. Through the appropriate measurement of spatial proximity relationships
between entire flows, the new method successfully upgrades the classical hot spot detec-
tion method to the stage of “hot flow” detection. Several specific aspects of the method
are discussed to provide evidence of its robustness and expandability, such as the multi-
scale issue and relative importance control, using a real data set of vehicle theft and
recovery location pairs in Charlotte, NC.
Introduction
Spatial flows, also known as interactions between georeferenced places, constitute an enduring
object of research in spatial sciences. A flow event in geography typically consists of two basic
components, namely the spatial one, represented as a vector, and the aspatial component, which
encapsulates the type or value it represents. Common examples include migration flows, daily com-
muting flows, international trade flows, and flows of information exchanged through telecommuni-
cation. In general, there are two types of flow data, namely individual flows and aggregated flows
(Murray et al. 2011). The former pertain to individual activities, for example one person taking the
subway from home to work on a weekday morning. In contrast, the latter represent the movement
or interactions of a group of people or objects, for example a herd of elk residing in the northern
section of Yellowstone National Park and migrating to lower altitudes before winter arrives.
Correspondence: Ran Tao, Department of Geography and Earth Sciences and Project Mosaic, Univer-
sity of North Carolina at Charlotte, Charlotte, NC
e-mail: rtao2@uncc.edu
[Correction added on 1 June 2016, after first online publication: the publisher apologizes for the wrong
version of this article being inadvertently published due to a technical error. Corrections for clarity
have been made throughout the article in the text, equations and references, without impacting the
results or conclusions of the study].
Submitted: March 05, 2015. Revised version accepted: February 01, 2016.
doi: 10.1111/gean.12100
© 2016 The Ohio State University
Geographical Analysis (2016) 00, 00–00
Understanding the pattern and dynamics of spatial flows has been a long-standing goal of
spatial scientists. With the fast development in sensor and GPS technologies in recent years, large
volumes of spatiotemporal data have become available with fine granularity. In addition, emerg-
ing types of interactive activities, like information exchange on social media networking, enhance
the richness of flow events. The increased availability of massive volumes of new forms of flow
data inevitably brings unprecedented opportunities to enrich our understanding of patterns and
processes embedded in the geographic space, but this also presents new analytical challenges at
several levels. First, there is the challenge to develop advanced methods to generalize and extract
useful information from massive flow data; next, the challenge to conceive new visualization
approaches to represent flows more effectively; also, to design handy and highly interactive tools
to incorporate flow data into geospatial information systems; and finally, to build spatial interac-
tion models to understand the nature behind locational choices and their relationships. Among
these endeavors, detecting spatial distribution patterns globally or locally, that is, clustered, scat-
tered, or random, across the spatial extent has garnered a lot of attention. While many contribu-
tions have used techniques such as Spatial Data Mining, Geovisualization, and Graph Theory
(Tobler 1987; Cui et al. 2008; Guo 2009; Zhu and Guo 2014) to better handle the large data volume,
we contend that spatial statistics has not shown its full potential for the detection of spatial
distribution patterns of flow data, in spite of the abundance of effective spatial statistics techni-
ques that have been devised to deal with spatial point data, spatial line segment data, and spatial
polygon data (e.g., Moran’s I [Moran 1950], Geary’s C [Geary 1954], Getis and Ord’s G [Getis
and Ord 1992; Ord and Getis 1995], Ripley’s K-function [Ripley 1976]). Thus, it is the purpose
of this study to develop novel spatial statistical approaches to detect spatial clustering patterns in
flow data with the aim of understanding their spatial relationships, while preserving the integrity
of the flow data. To this end, we introduce new spatial proximity measures tailored for flow data,
on the basis of which we extend the well-known point data analysis method, namely the local
Ripley’s K-function, to the spatial flow context. The new approach is presented and the evidence
of its robustness and efficiency is provided via experiments on a real data set.
The rest of this article is organized as follows. In the second section a brief literature
review is provided, which covers previous studies on spatial clustering detection especially
those pertaining to flow data. Then a thorough explanation of our new approach is presented,
including both the theoretical foundations and the technical details. The fourth section consists
of experiments with real data, along with evaluations of the performance of the proposed ana-
lytical method. We conclude with a discussion of the main characteristics and contribution of
our method, as well as proposed future extensions.
Literature review
Given the general tendency of spatial phenomena to co-occur spatially as encapsulated by
Tobler’s First Law of Geography (1970), spatial clustering is one of the most common spatial
patterns of point events. It represents a general tendency of events occurring closer to each
other than one might expect by chance (Waller 2009). An extensive body of literature on cluster
detection and monitoring exists that has advanced various methods to identify such patterns.
Several excellent references provide overviews of the concepts and methods involved (e.g.,
Diggle 1983; Cressie 1993; Fortin and Dale 2009; Symanzik 2014).
Early studies were mostly concerned with the overall spatial pattern exhibited by the
events and devised spatial statistics as a single index, sometimes labeled as “global” statistics,
to depict the nature of events and of the spatial process producing a certain spatial distribution
within the entire study area. Well-known examples include Moran’s I, Geary’s C, Quadrat
Analysis, Nearest Neighbor Index, and Ripley’s K-function. However, one of the fundamental
assumptions of these methods, namely the spatial stationarity, is difficult to comply with in
many real situations. Furthermore, a single statistic does not allow further investigation of more
detailed patterns and relationships such as how the spatial process associated with one variable
would be dependent on others (Fotheringham 1997). To cope with such issues, spatial pattern
analysis has shifted toward the development of local statistics for detecting spatial clusters. In
contrast with global spatial clustering methods that are designed to identify whether there exists
a general tendency for events to occur nearer other events than expected by chance, techniques
for localized cluster detection are aimed at finding anomalies and interesting collections of spa-
tial events within the study area that appear to be inconsistent with the background conceptual
model of how events arise (Besag and Newell 1991; Waller 2009). Notable approaches include
the geographical analysis machine (GAM) (Openshaw et al. 1987) and its derivative methods
(Besag and Newell 1991; Fotheringham and Zhan 1996), the local version of Ripley’s K-
function (Getis and Franklin 1987), local indicators of spatial association (LISA) especially the
local Moran’s I statistic, local Geary’s C (Anselin 1995), and local G statistic (Getis and Ord
1992; Ord and Getis 1995). Some local detection methods around predetermined locations are
called “focused tests” to differentiate them from those based on randomly chosen event loca-
tions (Besag and Newell 1991). The local Cross K-function is such a focused test to identify
clusters of events around specific locations, such as crime instances around railway stations or
shopping malls (Boots and Okabe 2007). Regardless of the technical details, local cluster
detection methods share two advantages: they integrate well with fast-developing
GeoComputation technology to handle large data sets, and their results can be well
illustrated with the visualization and mapping capabilities of Geographic Information Systems
(GIS) (Fotheringham 1997). Recent contributions come both from methodological development,
such as the network-constrained local K-function and local Moran’s I
(Yamada and Thill 2007, 2010) and A Multidirectional Optimum Ecotope-Based Algorithm
(AMOEBA) (Aldstadt and Getis 2006), and from toolset design, for example
R, ArcGIS, GeoDa (Anselin, Syabri, and Kho 2006), and SaTScan (Kulldorff et al. 1997).
The preponderance of the literature on spatial point pattern analysis treats each point as an
event independent of all the others. Spatial flow data, however, encompass at least two points,
one corresponding to the origin or start of the flow and one for the destination or end of the flow.
Flow data, therefore, differ fundamentally from single point data and methods designed to handle
the latter cannot be directly applied to flow data. Several endeavors have been undertaken in previous
research to fill this gap. Berglund and Karlström (1999) applied the $G_i$ statistics introduced
by Getis and Ord (1992) and Ord and Getis (1995) to identify local spatial association in flow
data. Although several different spatial weight matrices were proposed in this article to address
spatial non-stationarity, only the simplest binary spatial weight matrix based on identical origins
or destinations was implemented, which certainly limits its usage. Lu and Thill (2003) proposed
an ad hoc and partially qualitative approach in which they apply point cluster detection methods
to analyze origin and destination points respectively, and combine the two sets of results via a
relationship table to conclude on the patterns exhibited by the flows. Related issues such as sensi-
tivity to scale and neighborhood definition were discussed in their later work (Lu and Thill 2008).
While decomposing one-dimensional flows into zero-dimensional points can considerably sim-
plify the problem, this approach would inevitably overlook the simultaneity of some critical
information, such as flow direction and flow length. Murray et al. (2011) departed from this
approach by combining exploratory spatial data analysis and confirmatory circular statistics to
analyze the similarities of flow direction and length. However, they sacrifice the actual locational
information in the process so that little knowledge on spatial relationships between movements
can be extracted. More recently, Liu, Tong, and Liu (2015) extended both global and local Mor-
an’s I statistics to a flow context, considering movement distances and directions at once. None-
theless, their approach is still based on the spatial proximity relationship of either set of end
points rather than entire vectors. Therefore, we contend that it remains within the scope of meas-
uring spatial autocorrelation of vectors/flows in parts rather than as a whole. The method pro-
posed in this article departs radically from the existing literature by maintaining the integrity of
flow data. It not only fully considers flow characteristics, that is, end points, length, and direction,
but also builds on proper measurement of spatial proximity relationship between entire flows.
While this article mainly focuses on spatial statistical methods, contributions from other
perspectives are also worth considering. Various research contributions apply techniques of
data mining and geovisualization to investigate the properties of spatial flows. Tobler (1987)
suggested that selective information aggregation and removal is an effective strategy for identi-
fying patterns through visualization and he pioneered this idea to analyze migration flows with
computer-drawn maps. Benefiting from burgeoning computing capability and visualization per-
formance, many contributions have emerged to be both effective and efficient, especially for
large data sets. K-means algorithms have proved very effective with respect to multilocation
spatial data (Genolini and Falissard 2010; Ossama, Mokhtar, and El-Sharkawi 2011). Density-
based clustering methods have also been adjusted to the nature of flow data by summarizing
the distributions of origins and destinations (Nanni and Pedreschi 2006; Zhu and Guo 2014).
Geometry-based edge bundling is another type of approach to reduce the visual clutter
caused by extensive edge crossing in flow maps (Cui et al. 2008). To serve the same purpose,
Guo (2009) proposed a visualization framework to partition spatial interactions into their
“natural” regions and discover mixing patterns of flow networks. In general such visual analytical
methods embrace the principle of data mining and analytical classification methods
designed to group observations into “clusters” based on similarity (Waller 2009); therefore,
they are also named “cluster analysis.” Because this overlap in terminology can be confusing, it
is necessary to differentiate these “cluster analysis” methods from the spatial statistical
approaches of cluster detection presented in this article. While we mainly focus on building
innovative spatial statistics here, it is potentially very meaningful to incorporate these methods
of exploratory analysis as a prior step to help propose hypotheses.
Methodology
The principle
In spatial analysis, cluster detection is an approach to second-order analysis that is designed to
examine spatial dependence, or spatial relationships between events (Getis and Franklin 1987).
The first step is to choose an appropriate measure of spatial proximity between events, for
which distance is a common choice. Ripley’s K-function, Geographic Analysis Machine, Near-
est Neighbor Index and many other statistical approaches are all distance-based methods. Aside
from the default Euclidean distance, other kinds of distance are also applied in some cases, for
instance the network distance (Yamada and Thill 2007). With spatial flow data, there is no natural
means to measure spatial proximity due to the multilocation nature of flow records and this
is arguably the biggest difficulty in analyzing spatial patterns of flow data. In other words, with
appropriately measured spatial proximity, cluster detection on flows boils down to the same
algorithmic processes as for points or polygons. Although various distance measures have been
proposed in data mining studies of trajectory, for example using the Hausdorff distance to
extract clustered line segments of trajectories (Lee, Han, and Whang 2007; Chen et al. 2011),
we argue that these distances are not suitable to measure proximity between flows which have
explicit and meaningful location correspondence. Accordingly, we devise a new proximity
measure called the “Flow Distance” and a variant called the “Flow Dissimilarity.” Then we
extend a well-developed spatial point statistic, namely Ripley’s K-function, to the spatial flow
context based on the newly defined proximity measures. Statistical significance is tested by
Monte Carlo simulation against the null hypothesis of spatial randomness. Several aspects such
as the multiscalar relevance, relative importance control, and flow value, are discussed in detail
here to demonstrate that this method is versatile and practical.
Flow model
The first step is to define the study object, namely the spatial flow process. Fig. 1 shows two
instances of a spatial process $F$ that starts at location $O$ and ends at location $D$. Basic characteristics
of $F$ include length: $l = |\vec{OD}|$; direction: same as the direction of vector $\vec{OD}$; type: $T$ (e.g.,
commuting flow); and value $W$ (e.g., the number of commuters). This basic model is used to
represent spatial flow processes in the rest of the article.
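The flow model above can be written down as a small data structure. The following is a minimal sketch only: the class name `Flow` and the field names `kind` and `value` are hypothetical choices made for illustration, not notation from the article, and the direction is expressed as an angle in radians purely for convenience.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One flow event with origin O = (x, y) and destination D = (u, v)."""
    x: float
    y: float
    u: float
    v: float
    kind: str = "unspecified"  # aspatial type T (hypothetical field name)
    value: float = 1.0         # aspatial value W (hypothetical field name)

    @property
    def length(self) -> float:
        # l = |OD|, the Euclidean length of the vector from O to D
        return math.hypot(self.u - self.x, self.v - self.y)

    @property
    def direction(self) -> float:
        # Direction of vector OD, in radians counter-clockwise from the +x axis
        return math.atan2(self.v - self.y, self.u - self.x)


# Made-up example: a commuting flow carrying 12 commuters
commute = Flow(0.0, 0.0, 3.0, 4.0, kind="commuting", value=12)
print(commute.length)  # 5.0
```

Storing the four coordinates together keeps each flow as a single object, in line with the article's emphasis on preserving the integrity of flow data.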
Flow proximity
As mentioned earlier, defining an appropriate proximity measure is the key to decoding spatial
flow patterns. Here we introduce such measures, from which both intrarelationships and
interrelationships of flows can be extracted.
Let us take the simple case of measuring the spatial proximity between flow $F_i$ (with origin
point $O_i(x_i, y_i)$ and destination point $D_i(u_i, v_i)$) and flow $F_j$ (from point $O_j(x_j, y_j)$ to point
$D_j(u_j, v_j)$) in a two-dimensional space (Fig. 1). Measuring distance between these two spatial flows
following the approaches advocated so far in the literature would generally be inadequate
because distance between either origin points or destination points cannot fully represent the
closeness between flows in their entirety. For instance, when both origins are a short (or long)
distance from each other and the same can be said of destinations, we can expect that $F_i$ and $F_j$
are also close (or distant, respectively). However, things become less trivial when the two endpoint
pairs show dissimilar spatial closeness, that is, origins are close while destinations are
distant, or vice versa. Using categorical descriptions is certainly one way to associate distances
among origins and destinations. For instance, both distances being short (or both endpoint pairs
belonging to the same region) would correspond to “high” spatial association between flows while
only one pair of end points being close (or belonging to the same region) would correspond to
a “medium” degree of association (Berglund and Karlström 1999; Lu and Thill 2003; Zhu and
Guo 2014). While such approaches make sense to some extent, they are very sensitive to the ad
hoc description standards and exhibit limited external validity.
Figure 1. Basic flow model.
Unlike approaches treating spatial flows as two separate sets of endpoints, we propose to
calculate a flow distance that regards flows as inseparable objects. A flow process $F_i$ with origin
point $O_i(x_i, y_i)$ and destination point $D_i(u_i, v_i)$ can be seen as a vector point with four coordinates
$F_i(x_i, y_i, u_i, v_i)$ in a four-dimensional space. Derived from the general function of Euclidean
distance, we define the Flow Distance between flows $F_i(x_i, y_i, u_i, v_i)$ and $F_j(x_j, y_j, u_j, v_j)$ as:
$$FD_{ij} = \sqrt{\alpha\left[(x_i - x_j)^2 + (y_i - y_j)^2\right] + \beta\left[(u_i - u_j)^2 + (v_i - v_j)^2\right]},$$
or, in simplified form,
$$FD_{ij} = \sqrt{\alpha\, d_O^2 + \beta\, d_D^2}, \tag{1}$$
where $FD_{ij}$ denotes the distance between these two flows; $d_O$ and $d_D$ are the Euclidean distances
between the two origins and the two destinations, respectively; the coefficients $\alpha$ and $\beta$ serve
to control the relative importance of either set of endpoints ($\alpha > 0$; $\beta > 0$; $\alpha + \beta = 2$; by
default $\alpha = \beta = 1$). Through this definition, both the closeness of origins and of destinations
make a contribution to the calculation of the Flow Distance. For example in Fig. 2a,
$FD_{12} = \sqrt{2^2 + 2^2} = \sqrt{8}$. The value of Flow Distance becomes larger (or smaller) if both endpoints
are moved further from (or closer to) their counterparts at the same time, for example, $FD_{12}$
increases to $\sqrt{18}$ in Fig. 2b while it decreases to $\sqrt{2}$ in Fig. 2c. This corresponds to the general
sense that proximities of endpoints are positively correlated to the flow closeness.
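As an illustration of equation (1), the sketch below computes the Flow Distance for flows stored as `(x, y, u, v)` tuples. The function name `flow_distance` and the coordinates are assumptions made here: the coordinates are invented so that origins and destinations are each 2, 3, or 1 units apart, which reproduces the values quoted for Fig. 2a to 2c, but they are not read from the figure itself.

```python
import math


def flow_distance(fi, fj, alpha=1.0, beta=1.0):
    """Flow Distance of equation (1) between two flows given as (x, y, u, v)."""
    if alpha <= 0 or beta <= 0 or not math.isclose(alpha + beta, 2.0):
        raise ValueError("expected alpha > 0, beta > 0 and alpha + beta = 2")
    d_o_sq = (fi[0] - fj[0]) ** 2 + (fi[1] - fj[1]) ** 2  # squared origin distance
    d_d_sq = (fi[2] - fj[2]) ** 2 + (fi[3] - fj[3]) ** 2  # squared destination distance
    return math.sqrt(alpha * d_o_sq + beta * d_d_sq)


# Hypothetical parallel flows; only the horizontal offset of the second flow changes
f1 = (0.0, 0.0, 0.0, 5.0)
fd_2_apart = flow_distance(f1, (2.0, 0.0, 2.0, 5.0))  # sqrt(8)
fd_3_apart = flow_distance(f1, (3.0, 0.0, 3.0, 5.0))  # sqrt(18)
fd_1_apart = flow_distance(f1, (1.0, 0.0, 1.0, 5.0))  # sqrt(2)
```

With the default weights this is the ordinary Euclidean distance between the two flows viewed as points in four-dimensional space.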
More importantly, the distance between origins and the distance between destinations are
integrated by the same square root transformation so their variations are captured continuously
and consistently, which leads to greater accuracy than qualitative descriptors. For instance,
compared with Fig. 2a, Flow $F_2$ in Fig. 2d has its origin moved toward $F_1$’s and has its destination
moved away from $F_1$’s. According to previous methods, whether these two flows in
Fig. 2d are as close as they are in Fig. 2a depends entirely on the definition of the endpoints’
contiguity relationship. For example, if two points are defined as contiguous when their distance
is less than or equal to 2, $F_1$ and $F_2$ would have two contiguous endpoint pairs in Fig. 2a
but only one in Fig. 2d. As a result, the proximities between $F_1$ and $F_2$ would be judged radically different.
In contrast, by our definition of Flow Distance, measuring proximity between two flows is
not subject to the definition of each endpoint’s region or the description of the combined endpoints’
closeness. Instead, we capture the variation of all locations seamlessly and let the
flow data decide its own spatial neighbors. Accordingly, the distance between $F_1$
and $F_2$ can be calculated and compared directly: $FD_{12}$ equals $\sqrt{8}$ in both the Fig. 2a and Fig. 2d
scenarios.
Figure 2. Flow Distance Examples.
Nevertheless, using only the location information of endpoints may sometimes be inadequate
because a flow not only represents the interaction or movement between two locations,
but also indicates how far and in what direction the interaction or movement happens. As
shown in Fig. 2e, two flows have exactly the same endpoint distances as in Fig. 2a, therefore the
Flow Distances are the same according to equation (1). Regardless of the real data type they
represent, it would be controversial to say that the two flows in Fig. 2e are as close as the ones
in Fig. 2a given that they are separated much more, relative to their lengths. Controlling for the
impact of flow length may be necessary to avoid false positive detection of flow clusters. To
this end, we propose an extended version of Flow Distance that involves a rescaling, as provided
by equation (2). By dividing by the geometric mean of the two flow lengths, a flow pair with
longer average length would be measured closer, ceteris paribus. Therefore, the distance
between the short flows $F_1$ and $F_2$ in Fig. 2e becomes four times as long as the one in Fig. 2a.
The rationale behind this adjustment is that under many circumstances it is more difficult or
rarer to witness spatial interaction or movement between two distant locations than between
close locations. For example wild animals are more likely to travel to a nearby river than a distant
one to seek water. Incorporating flow length into the measure is one way to adjust the criterion
of clustering detection for flows with unequal lengths. Given that the adjustment would impair
some of the metric properties of distance, we name the adjusted Flow Distance the Flow Dissimilarity,
abbreviated FDS in the rest of this article. Also we choose to use the geometric mean over
the arithmetic mean of flow lengths because the former is better able to attenuate the impact
of extremely unequal length values. In addition, it avoids the limit case of zero-length flows.
$$FDS_{ij} = \sqrt{\frac{\alpha\left[(x_i - x_j)^2 + (y_i - y_j)^2\right] + \beta\left[(u_i - u_j)^2 + (v_i - v_j)^2\right]}{L_i L_j}},$$
or
$$FDS_{ij} = \sqrt{\frac{\alpha\, d_O^2 + \beta\, d_D^2}{L_i L_j}}, \tag{2}$$
where $FDS_{ij}$ denotes the Flow Dissimilarity between these two flows; $L_i$ and $L_j$ are the flow
lengths; the rest are the same as in equation (1).
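A minimal sketch of equation (2) follows, again with flows as `(x, y, u, v)` tuples. The function name and the coordinates are assumptions for illustration: the two pairs have identical endpoint separations but flow lengths of 4 and 1, chosen so that the short pair comes out four times as dissimilar as the long pair. Raising an error for a zero-length flow is a choice made in this sketch, since the division is then undefined.

```python
import math


def flow_dissimilarity(fi, fj, alpha=1.0, beta=1.0):
    """Flow Dissimilarity of equation (2) between two flows given as (x, y, u, v)."""
    if alpha <= 0 or beta <= 0 or not math.isclose(alpha + beta, 2.0):
        raise ValueError("expected alpha > 0, beta > 0 and alpha + beta = 2")
    len_i = math.hypot(fi[2] - fi[0], fi[3] - fi[1])  # L_i
    len_j = math.hypot(fj[2] - fj[0], fj[3] - fj[1])  # L_j
    if len_i == 0.0 or len_j == 0.0:
        # Sketch-level choice: refuse zero-length flows rather than divide by zero
        raise ValueError("flow lengths must be positive")
    d_o_sq = (fi[0] - fj[0]) ** 2 + (fi[1] - fj[1]) ** 2
    d_d_sq = (fi[2] - fj[2]) ** 2 + (fi[3] - fj[3]) ** 2
    return math.sqrt((alpha * d_o_sq + beta * d_d_sq) / (len_i * len_j))


# Hypothetical pairs: same endpoint separations (2 and 2), different flow lengths
fds_long = flow_dissimilarity((0.0, 0.0, 0.0, 4.0), (2.0, 0.0, 2.0, 4.0))   # lengths 4
fds_short = flow_dissimilarity((0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 2.0, 1.0))  # lengths 1
ratio = fds_short / fds_long  # short flows are measured as farther apart
```

Because the result is a ratio of lengths, it is unitless, which is why thresholds for Flow Dissimilarity are plain numbers rather than distances.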
Although considering flow length in spatial pattern detection can be very useful and sometimes
necessary, we are not arguing that this is a better approach in all situations. Instead, we
believe that both measures make sense under certain circumstances. Evidence can be found in the
literature that flow length was not discussed in some research (Berglund and Karlström 1999; Lu
and Thill 2003, 2008; Zhu and Guo 2014), while it was taken into consideration in other work
(Murray et al. 2011; Liu, Tong, and Liu 2015). In this research experiments have been
conducted with both Flow Distance (equation [1]) and Flow Dissimilarity (equation [2]) for
comparison, and details are provided in the case study section below.
Besides endpoint locations and flow length, the only remaining spatial element of a flow is
its directionality. Although we do not directly measure directionality in equations (1) and (2),
its impact is implicitly accounted for. As illustrated in Fig. 2f, to maintain $F_2$ at the same distance
from $F_1$, according to our Flow Dissimilarity equation it is sufficient to keep its origin
and destination at a constant distance from $F_1$’s two endpoints, that is, to keep its endpoints situated
on circles centered on $F_1$’s two endpoints (the dashed rings), for example, $F_2'$. Given this
geometric constraint, there are in fact few degrees of freedom in directionality for flows that
exhibit a tendency toward clustering. Therefore we argue that it is not necessary to discuss flow
direction alone since it is heavily dependent on the endpoint locations and flow length. Our test
results also support this argument by identifying clusters of similar-direction flows.
Last but not least, the coefficients ($\alpha$, $\beta$) in the distance and dissimilarity functions are
designed to offer some flexibility in measuring real flow data. The basic functions by default
($\alpha = \beta = 1$) assign equal importance to the origin location and destination location of each
flow. However, the research objectives may lead us to pay closer attention to one set of end-
points over the other. For instance, in a study of settlement of foreign immigrants in New York
City in relation to national origin, socio-spatial patterns and processes would be better
informed if more weight is put on where immigrants choose to reside rather than where they
come from. As another example, the manager of a shopping center would be more interested in
where customers come from so that more targeted and effective advertising strategies can be
designed. The inconsistent spatial scale of flow origins and destinations may be another justifi-
cation to rebalance the relative importance of origins and destinations in the Flow Distance and
Dissimilarity measures. For example, different land uses are known to be spatially distributed
differently across cities; in particular employment sites tend to be more clustered geographi-
cally than residential land uses. Therefore, to avoid a statistical bias, a spatial analysis of com-
muting flows should control for the spatial distribution of potential flow origins and
destinations. With appropriate calibration, the same distance (e.g., 500 meters) would have the
same impact on describing the proximity between two origin locations or between two destina-
tion locations.
By adjusting the values of $\alpha$ and $\beta$, the Flow Distance or Dissimilarity can receive different
contributions from origins and destinations. For example, if we assign $\alpha = 1.5$ and $\beta = 0.5$,
the Flow Distance or Dissimilarity would be more sensitive to the change of origin locations
and the corresponding spatial pattern would put more weight on where flows start. In addition,
we impose the constraint $\alpha + \beta = 2$ to ensure that results obtained with different coefficients are
comparable. Both coefficients must also be positive so that neither set of endpoints is ignored,
which would reduce the flows to points.
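The effect of the coefficients can be checked numerically with the simplified form of equation (1). In the sketch below the function name and the separations are assumptions for illustration: with weights of 1.5 on origins and 0.5 on destinations, a separation of 2 units between origins is compared with the same separation between destinations.

```python
import math


def weighted_flow_distance(d_o, d_d, alpha=1.0, beta=1.0):
    """Simplified equation (1): sqrt(alpha * d_O^2 + beta * d_D^2)."""
    if alpha <= 0 or beta <= 0 or not math.isclose(alpha + beta, 2.0):
        raise ValueError("expected alpha > 0, beta > 0 and alpha + beta = 2")
    return math.sqrt(alpha * d_o ** 2 + beta * d_d ** 2)


# Origins 2 apart, destinations coincide
origin_shift = weighted_flow_distance(2.0, 0.0, alpha=1.5, beta=0.5)  # sqrt(6)
# Origins coincide, destinations 2 apart
dest_shift = weighted_flow_distance(0.0, 2.0, alpha=1.5, beta=0.5)    # sqrt(2)
# Either case with the default equal weights
baseline = weighted_flow_distance(2.0, 0.0)                           # 2.0
```

Under the default weights both cases give a distance of 2. With the origin-heavy weights the origin separation counts for more (about 2.45) and the destination separation for less (about 1.41), so a fixed threshold will admit flows that share origins less readily than flows that share destinations.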
Hot spot detection method
Using our Flow Distance (or Flow Dissimilarity) as the spatial proximity measure, it becomes
possible to apply well-developed distance-based methods to detect spatial clusters of flow data.
In this study we choose to adjust the local version of Ripley’s K-function. As a classical clus-
tering detection method, the K-function has been continuously implemented and enhanced
since it was redefined by Ripley in 1976 (Ripley 1976; Okabe, Boots, and Satoh 2007). The
fundamental idea of the K-function is to count the number of events within a certain distance
threshold of randomly selected event locations. This number is then used to calculate K-
function value after dividing by the event density and the analysis is repeated for other distan-
ces within a set interval. To obtain statistical conclusions, the K-function value needs to be
compared with the expected value given by the null hypothesis, for example Complete Spatial
Randomness (CSR). If the observed value is higher than expected, the study events exhibit a
tendency toward clustering; if it is lower, they tend toward dispersion. Monte Carlo simulation is a frequently
applied technique to assess statistical significance (Openshaw et al. 1987). One of the meaningful
extensions of K-functions was introduced by Getis and Franklin (1987), based on second-order
neighborhood analysis of mapped point patterns, which has been known as local K-function
analysis. An extension of the local K-function (equation [3]) is applied in this research
to flow data using the four-dimensional approach introduced above. Instead of counting point
events, flow events are counted within a certain Flow Distance (or Flow Dissimilarity) $r$ of
flow $F_i$ to represent the function value:
$$\mathrm{LocK}_i(r) = E(\text{number of other flow events within } r \text{ of flow } i), \tag{3}$$
where $\mathrm{LocK}_i(r)$ is the local K-function value of flow $F_i$ at scale $r$. The scale $r$, also known as
the detection window radius or threshold distance, has always been a crucial factor in spatial
statistics, especially for the K-function, which is even known as “multi-distance cluster analysis”.
In our approach we implement the local K-function at multiple scales as well. By increasing
the magnitude of scale $r$ within a certain range deemed suitable to the process under study, for
example, from 0.1 mile to 1 mile when using Flow Distance or from 0.1 to 1.0 when using
Flow Dissimilarity, it is convenient to detect multiscale clustering patterns at once.
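The counting step behind the local K-function can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the article: it returns the raw observed count of other flows within each threshold of a focal flow, with no density normalization or edge correction, it uses Flow Distance with equal weights, and the function names and coordinates are invented.

```python
import math


def flow_distance(fi, fj):
    """Equation (1) with alpha = beta = 1, for flows given as (x, y, u, v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))


def local_k_counts(flows, i, radii):
    """Observed number of other flows within each threshold r of flow i."""
    dists = [flow_distance(flows[i], f) for j, f in enumerate(flows) if j != i]
    return {r: sum(d <= r for d in dists) for r in radii}


# Hypothetical data: three nearly parallel flows and one remote flow
flows = [
    (0.0, 0.0, 0.0, 5.0),
    (1.0, 0.0, 1.0, 5.0),
    (0.0, 1.0, 0.0, 6.0),
    (10.0, 10.0, 20.0, 20.0),
]
counts = local_k_counts(flows, 0, radii=[1.0, 1.5, 30.0])
print(counts)  # {1.0: 0, 1.5: 2, 30.0: 3}
```

Evaluating several thresholds in one call mirrors the multiscale use described above; on large data sets a spatial index over the four-dimensional points would avoid computing every pairwise distance.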
As with other spatial statistical methods, statistical inference is an important part of reach-
ing any conclusion. Given the nature of flow data, normal approximation is not an appropriate
null hypothesis (Lu and Thill 2003, 2008; Liu, Tong, and Liu 2015). Random permutations
with Monte Carlo simulation can better serve this purpose. In a two-dimensional space, there
are normally more than one way to simulate a set of flows. On the one hand, we can proceed
by setting the location of two endpoints for each simulated flow. Alternatively, we could use
observed flows as objects and move or rotate them in the study area according to some random-
ization procedure. Whatever the technique used, the theory or basic assumptions behind the
simulation must be fully spelled out.
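The two families of simulation procedures mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the study area is assumed to be a rectangle given as `bounds = (xmin, ymin, xmax, ymax)`, placement is uniform, and rotation is omitted. The function names are hypothetical.

```python
import numpy as np

def simulate_random_endpoints(n, bounds, rng):
    """Place n origins and n destinations independently and uniformly in a box.

    Returns an (n, 4) array of flows (x, y, u, v).
    """
    xmin, ymin, xmax, ymax = bounds
    low = np.array([xmin, ymin, xmin, ymin], dtype=float)
    high = np.array([xmax, ymax, xmax, ymax], dtype=float)
    return rng.uniform(low, high, size=(n, 4))

def simulate_shifted_flows(flows, bounds, rng):
    """Move each observed flow to a random origin, keeping length and direction.

    Note: shifted destinations may fall outside the box; a real analysis
    would need a rule for such cases (an assumption left open here).
    """
    flows = np.asarray(flows, dtype=float)
    xmin, ymin, xmax, ymax = bounds
    vectors = flows[:, 2:] - flows[:, :2]  # destination minus origin
    new_origins = rng.uniform([xmin, ymin], [xmax, ymax], size=(len(flows), 2))
    return np.hstack([new_origins, new_origins + vectors])

rng = np.random.default_rng(42)
observed = np.array([[1.0, 1.0, 2.0, 3.0], [4.0, 0.5, 3.0, 0.5]])
free = simulate_random_endpoints(len(observed), (0.0, 0.0, 10.0, 10.0), rng)
shifted = simulate_shifted_flows(observed, (0.0, 0.0, 10.0, 10.0), rng)
```

The first function randomizes location, length, and direction together, whereas the second randomizes location only, so the two correspond to different null hypotheses.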
The simplest way is to simulate two sets of points randomly and independently based on
a Poisson distribution, and then pair and connect them as flows. However, the customary null
hypothesis for point data, that is, CSR, may not be the best option for flows. A more sensible
way is conditional spatial randomness, which has been used widely for computing the pseudo
P-value in spatial statistics (Anselin 1995). For flow data, such a “condition” applies
when the endpoints are restricted to the distribution of an at-risk population. For
instance, commuting flows can be simulated according to the residence and workplace distributions
(Lu and Thill 2003), and car accident points can be simulated on the road network and adjusted by
annual average daily traffic (Yamada and Thill 2010). In addition to endpoint locations, the distribution
of flow length and flow direction can also be conditional. Liu, Tong, and Liu (2015)
simulate a set of flows by moving one flow to another randomly selected flow’s endpoint loca-
tion so that only flows’ locations are changed while the lengths and directions are kept the
same. They also propose a second way, randomly pairing two points, one from the observed origins
and the other from the observed destinations, to form simulated flows. This approach keeps endpoint
locations the same but reshuffles the lengths and directions, as opposed to the first
approach. In sum, there is no unique way to simulate spatial flows for significance testing. The
appropriate assumption depends on the data (e.g., restriction to an at-risk population). In
addition, it is up to the analyst to choose which aspect to examine (e.g., to examine the contribution
of flow location to the general flow clustering pattern by only randomizing location while
fixing direction and length). Fundamentally, cluster detection is an exploratory analysis. The
clusters identified can reflect the respective underlying geographical processes and can also
help us contemplate unknown ruling attributes contributing to the spatial pattern. The detailed
algorithm is presented step by step as follows.
Algorithm implementation
1. Calculate Flow Proximity
a. Prepare flow events as vectors with the coordinates of origin and destination points.
For example, flow $F_i$ with origin $O_i(x_i, y_i)$ and destination $D_i(u_i, v_i)$ is formatted
as $F_i(x_i, y_i, u_i, v_i)$.
b. Apply equation (1) or (2) to calculate the Flow Distance or Flow Dissimilarity
between every two flows. Thus an $N$ by $N$ distance matrix is computed for subsequent
use.
2. Calculate clustering detection statistics.
Calculate the local K-function using equation (3) for all the flow events using a series
of scales $r_t$ ($t = 1, 2, \ldots, 10$; $r_t = r_1 \times t$). The unit of $r_1$ depends on the proximity equation
used in the previous step, for example, $r_1 = 0.1$ mile with equation (1); $r_1 = 0.1$ with
equation (2).
3. Evaluate statistical significance.
a. Randomly simulate a set of $N$ flows in the study area.
b. Calculate the local K-function value for each simulated flow as in steps 1 and 2.
c. Repeat the previous two steps 1,000 times.
d. Sort the results of the 1,000 simulations for each flow at each scale. Set the smallest
and largest ones as the lower and upper envelopes (0.1% significance level).
e. Compare the actual result with the corresponding significance envelopes. If the
observed value surpasses the upper envelope, or is below the lower envelope, the
observed pattern is said to be clustered or dispersed, respectively.
4. Visualize and discuss the results.
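Steps 3d and 3e can be illustrated with the short Python sketch below. It assumes that local K-function values are already available for the observed flows, as an array of shape (N, T), and for S simulations, as an array of shape (S, N, T) in which entry [s, i, t] is the simulated value compared against observed flow i at scale t. That positional pairing is one possible reading of step 3d and is an assumption; the function name is hypothetical.

```python
import numpy as np

def classify_against_envelopes(observed, simulated):
    """Label each flow at each scale using min/max simulation envelopes.

    observed:  (N, T) array of observed local K-function values.
    simulated: (S, N, T) array of values from S Monte Carlo simulations.
    Returns an (N, T) integer array: 1 = clustered, -1 = dispersed,
    0 = not significant.
    """
    observed = np.asarray(observed)
    simulated = np.asarray(simulated)
    lower = simulated.min(axis=0)  # lower envelope per flow and scale
    upper = simulated.max(axis=0)  # upper envelope per flow and scale
    labels = np.zeros(observed.shape, dtype=int)
    labels[observed > upper] = 1
    labels[observed < lower] = -1
    return labels

# Toy example: 2 flows, 2 scales, 2 simulations (values are made up).
observed = np.array([[5, 2], [0, 3]])
simulated = np.array([[[1, 1], [1, 3]],
                      [[3, 2], [2, 4]]])
labels = classify_against_envelopes(observed, simulated)
# labels.tolist() -> [[1, 0], [-1, 0]]
```

Taking the minimum and maximum is equivalent to sorting and keeping the two extremes; with 1,000 simulations this yields the extreme envelopes referred to in step 3d.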
Experimental study
Data description
In this study, we test the new flow K-Function method and its algorithmic implementation
using a data set of vehicle theft and recovery location pairs in Charlotte, North Carolina. Given
the determinate relationship and chronological order of the data, the locations where theft hap-
pened and the places where the vehicles were recovered can be regarded as flow origins and
destinations, respectively. According to the crime report released by the Charlotte-
Mecklenburg Police Department (CMPD), there were 14,064 vehicle theft cases within the city
from 09/01/2008 to 08/31/2014. Of all these cases, 6,960 have correct corresponding recovery
locations somewhere else in the city. In the data cleaning process, we removed the records
with identical theft and recovery locations to exclude cases of attempted break-ins, damage
to the vehicle, interrupted stealing, or other incomplete theft crimes. The final study data set
consists of 6,810 theft-recovery flow events. The map in Fig. 3 shows the
distribution of these locations. Overall, theft and recovery locations have a similar distribution
across the city: there is a concentration around the city center, except for the southern portion,
which is known to encompass more affluent neighborhoods.
To gain a more intuitive understanding of the data we also computed kernel density estimates
(KDE) for both sets of locations with a cell size of 400 square feet and bandwidth of 0.5 mile
(Fig. 4). The KDE maps indicate that many car thefts happened in the eastern and northern
areas near the city center, while a significant part of them were recovered in the northwestern
region, where Charlotte Douglas International Airport is located. However, based on point pat-
tern analysis only, we can hardly build connections between theft locations and corresponding
recovery locations. According to popular criminological theories of vehicle theft crimes, such
as rational choice theory and routine activity theory, most criminals meticulously
plan their target places and destination places in advance based on their cost-benefit analyses
(Lu 2006). As the new trend indicates, more vehicles are stolen by criminal gangs for
money-making business rather than joy-riding (McGoey 2000). Thus it would be extremely
useful to discover the spatial patterns of how stolen vehicles are transported from their offense
place to their destination.
Figure 3. (a) Vehicle theft locations in Charlotte. (b) Vehicle recovery locations in Charlotte.
Following the complete algorithm given in the previous section, we implement our flow
clustering detection approach on these crime data step by step. The null hypothesis of flow distribution
is that car thefts and recoveries can happen anywhere on the street network within the
Charlotte city limits. Therefore the Monte Carlo simulation, run 1,000 times, proceeds by
randomly locating flows’ endpoints on the city’s street network. The reason for choosing this
assumption is that we have little prior knowledge about motor vehicle theft crime to add more
restrictions to the distribution of car theft and recovery event locations, or to the flow lengths
and directions. Not imposing constraints on the spatial characteristics of flows in the simulation
process has the advantage of not excluding any possible contributions to the final cluster
results. Edge effects are corrected by reducing the analysis area by a distance equal to the largest
distance band used in the analysis (one mile in this case study). Only the flows with both
endpoints within this shrunk area are selected for the computation, while the background
flow spatial process and the simulated flows remain within the original area. The implementation
program is written in C/C++ and the parallel computing technique OpenMP is also
applied to accelerate computation, especially the simulation part. Results are visualized via
the software ArcMap 10.1 and jFlowMap (Boyandin, Bertini, and Lalanne 2010).
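The edge correction described above amounts to a selection rule on flow endpoints. A minimal Python sketch follows, assuming for simplicity a rectangular study area (the actual study area is bounded by the city limits, so this is a deliberate simplification). The function name, `bounds`, and `buffer` are hypothetical.

```python
import numpy as np

def interior_flow_mask(flows, bounds, buffer):
    """Flag flows whose origin and destination both lie inside the shrunk area.

    flows:  (N, 4) array of (x, y, u, v) coordinates.
    bounds: (xmin, ymin, xmax, ymax) of the original rectangular area.
    buffer: shrink distance, e.g., the largest scale used in the analysis.
    Returns a boolean array of length N.
    """
    flows = np.asarray(flows, dtype=float)
    xmin, ymin, xmax, ymax = bounds
    xs = flows[:, [0, 2]]  # x of origin and destination
    ys = flows[:, [1, 3]]  # y of origin and destination
    inside_x = (xs >= xmin + buffer) & (xs <= xmax - buffer)
    inside_y = (ys >= ymin + buffer) & (ys <= ymax - buffer)
    return (inside_x & inside_y).all(axis=1)

# Toy example in miles: a 10 x 10 area shrunk by a one-mile buffer.
flows = np.array([
    [2.0, 2.0, 5.0, 5.0],   # both endpoints inside the shrunk area
    [0.5, 3.0, 4.0, 4.0],   # origin too close to the western edge
    [3.0, 3.0, 9.8, 5.0],   # destination too close to the eastern edge
])
mask = interior_flow_mask(flows, (0.0, 0.0, 10.0, 10.0), 1.0)
# mask.tolist() -> [True, False, False]
```

Only the flagged flows would be evaluated as potential cluster centers, while all flows, including those near the edge, would still count as neighbors.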
Results and discussion
Fig. 5 shows the local flow clusters detected with our method at selected scales (see Note 1). The flows on
the maps represent the local clusters detected by our new approach as significant at the 0.1%
level. Each flow has one end colored in red to denote the theft location and the other end in
green to show the recovery location. To avoid visual clutter, we aggregate nearby flow clusters
into the census block groups where their end points are situated.
Figure 4. (a) Kernel density estimation of theft locations. (b) Kernel density estimation of recovery locations.
The results are analyzed from two aspects. First, we compare the results obtained using the
same equation of flow proximity measure. The first three results use Flow Distance with scales
of different magnitudes, that is, 0.1, 0.2, and 0.3 of a mile. As the magnitude of the scale
increases, more flows are detected as local clusters. The same pattern can be found in the other
set of results using Flow Dissimilarity. The variation caused by scale magnitude is consistent
with the basic feature of the K-function that the spatial pattern is partly dependent upon the
size of the detection window. The increasing number of local flow clusters indicates that more
nearby flows are included to contribute to the local K-function value as the detection window
becomes larger. At the same time, the increase of scale does not have an equivalent impact on
the background distribution which represents our null hypothesis. This is because we simulate the
background distribution by randomly placing the flow events on the street network without further
specific control, for example, crime risk; therefore the simulated flows are distributed
more sparsely throughout the city. As a result, the increase of scale has a positive impact on the
number of local flow clusters that are detected. As in other K-function related research, choosing
the optimal magnitude of scale remains an open question. It is typically selected in relation
to how the results can make sense to explain context-dependent research questions. In this
case, Fig. 5f presents some interesting patterns about vehicle theft and recovery flows. Vehicles
stolen from the area in the Southwestern section of the city are usually found somewhere far
away and their transport directions vary considerably. In addition, there is another group of
clusters in the Southeast showing much shorter transport distances and with similar directions
toward the North. One possible reason is that for the vehicles stolen in the Southwest area there
are only a few “favorable” places nearby for criminals to dispose of them. Therefore these cars
are transported over a long distance to places like chop shops for selling or to places like the
airport. Routine criminals who steal from the Southeast area may find it much easier because
there are sites nearby in the North to dispose of the cars.
Figure 5. Detected flow clusters using different flow proximity measures. (a), (b), (c) use
Flow Distance (equation [1]) with detection scale equal to 0.1 mile, 0.2 mile, and 0.3 mile,
respectively. (d), (e), (f) use Flow Dissimilarity (equation [2]) with detection scale equal to
0.03, 0.04, and 0.05, respectively.
Second, we can also compare the results using different types of flow proximity
measures, namely the Flow Distance and Flow Dissimilarity. Comparing the two series of
maps in the top and bottom parts of Fig. 5 for a similar number of local clusters, the most
obvious difference is the average length of clustered flows. The results using Flow Distance
contain many short flows, while the results using Flow Dissimilarity tend to indicate longer
flows as local clusters. Taking a closer look, we find that some flows—especially shorter
ones—within the same cluster identified using Flow Distance do not share many geographic
and geometric similarities with their neighboring flows, for example, quite different flow direc-
tions and flow lengths. In contrast, flows within the same cluster using Flow Dissimilarity tend
to be very similar to each other. The reason behind this difference is that, when flow length is
not considered in measuring flow proximity, short flows need not be as similar in endpoint
locations, length and direction to each other as longer ones to have the same flow distance.
Therefore, they are more readily detected as the locus of a significant cluster than long ones, all
other things being equal. This results in false positive detections, since some flows are detected as
local clusters simply because they are short enough to be captured by the detection window.
On the contrary, local clusters identified with Flow Dissimilarity include flows with close
vehicle theft sites, close vehicle recovery sites, and similar movement directionality and distan-
ces. The pattern is consistent throughout the study region. Moreover, the results would be of
practical use to law enforcement agencies to detect routine gang-related crimes with locational
preference for stealing and selling/disposing of vehicles in the city. In conclusion, we argue
that the algorithm using Flow Dissimilarity to measure flow proximity is less likely to lead to
false positive errors as it controls for one source of spurious cluster detection. In addition, it provides
a meaningful alternative to the traditional distance scale in solving the instability or
inequality in cross-scale flow clustering detection.
So far we have only discussed experiments with the basic version of the flow proximity measures.
Further usefulness of the measures can be explored by changing their parameter values. In both
equations (1) and (2), we specify two coefficients, that is, $\alpha$ and $\beta$, to control the relative importance
of origins and destinations. The expectation is that changing the relative value of these coefficients
can purposely create a tendency for alternative cluster detection results. To test this
hypothesis, we adjust our approach by changing the coefficient values in Flow Distance. We assign
$\alpha = 1.5$ and $\beta = 0.5$ for the first group and $\alpha = 0.5$ and $\beta = 1.5$ for the second. The sum of the
coefficient values is controlled as 2, for the sake of the comparability of the results.
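The effect of the two coefficients can be illustrated with a small numerical sketch in Python. Equation (1) is not reproduced in this part of the text, so the sketch assumes a weighted Euclidean form, $\sqrt{\alpha d_O^2 + \beta d_D^2}$, where $d_O$ and $d_D$ are the distances between the two origins and between the two destinations. Both this form and the function name `flow_distance` are assumptions made for illustration only.

```python
import math

def flow_distance(fi, fj, alpha=1.0, beta=1.0):
    """Weighted distance between two flows (x, y, u, v).

    ASSUMED form: sqrt(alpha * d_O**2 + beta * d_D**2), where d_O is the
    origin-to-origin distance and d_D the destination-to-destination distance.
    """
    xi, yi, ui, vi = fi
    xj, yj, uj, vj = fj
    d_origin_sq = (xi - xj) ** 2 + (yi - yj) ** 2
    d_dest_sq = (ui - uj) ** 2 + (vi - vj) ** 2
    return math.sqrt(alpha * d_origin_sq + beta * d_dest_sq)

# Toy flows (made-up coordinates in miles).
f0 = (0.0, 0.0, 5.0, 0.0)
f1 = (0.1, 0.0, 5.0, 1.0)   # origin close to f0, destination farther away
f2 = (1.0, 0.0, 5.0, 0.1)   # origin farther from f0, destination close

# Emphasis on origins: f1 is the nearer neighbor of f0.
origin_focus = (flow_distance(f0, f1, 1.5, 0.5), flow_distance(f0, f2, 1.5, 0.5))
# Emphasis on destinations: f2 is the nearer neighbor of f0.
dest_focus = (flow_distance(f0, f1, 0.5, 1.5), flow_distance(f0, f2, 0.5, 1.5))
```

Under this assumed form, raising $\alpha$ relative to $\beta$ makes differences between origins count for more, so flows whose origins are close to one another fall within the detection window more easily, and the reverse holds for destinations.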
Fig. 6 includes two comparable result maps. Fig. 6a shows the clusters detected with
$\alpha = 1.5$ and $\beta = 0.5$, while Fig. 6b shows the outcomes of setting
$\alpha = 0.5$ and $\beta = 1.5$, both using the Flow Dissimilarity measure with a scale equal to 0.04. Comparing
these two maps and also comparing them with Fig. 5d for which $\alpha = \beta = 1$ by default, we
find that Fig. 6a contains more unique clusters with very close theft locations (red end) but relatively
distant recovery locations (green end), while Fig. 6b tends to show the opposite pattern.
In other words, flows with close theft locations are more easily detected as clusters in Fig. 6a and
flows with close recovery locations are favored in Fig. 6b. These observations are in line with
our premise that changing the value of Flow Distance coefficients can lead to results with different
emphases, which can cater to people with different interests. In terms of practical usefulness,
citizens would be more interested in looking at Fig. 6a, which can inform where vehicle-theft
crimes are more likely to happen so that they can avoid parking in these high-risk places.
In contrast, police would find Fig. 6b more useful in order to know where the concentrations
of car-disposal places are and where they should search for the lost vehicles. By
comparing the result maps with Google Maps we found that the neighborhoods surrounding the
main campus of UNC Charlotte correspond to the cluster of theft sites in the northeastern part
of Fig. 6a, which indicates that this area is a popular car theft locus. Some clusters of recovery
places near the city center in Fig. 6b match the locations of salvage vehicle yards or chop shops,
where stolen cars can be quickly transacted with cash and be sold again in parts.
Figure 6. Flow clusters with different endpoint emphases. (a) Clusters more focused on theft
locations ($\alpha = 1.5$, $\beta = 0.5$). (b) Clusters more focused on recovery locations ($\alpha = 0.5$, $\beta = 1.5$).
Conclusions
Spatial statistical approaches to clustering detection have been continuously developed for decades.
In contrast with abundant methods designed for point and polygon data, approaches well
suited to handling spatial flow data have not been well developed so far. To fill this gap and
also to meet the challenges brought by the emerging breadth of massive flow data, this research
has developed an innovative spatial statistical method for flows. A pair of particular spatial
proximity measures called the Flow Distance and Flow Dissimilarity have been designed.
Based on these measures the local version of the K-function is adjusted and implemented to
examine the second-order effects of spatial flows. By comparing the observed local K-function
value with the statistical confidence envelopes generated via Monte Carlo simulation, the local
clustering pattern of each flow event can be identified at a certain statistical significance level.
The new method is an intuitive extension of the principles embedded in the K-function for one-dimensional
point events and is applicable to all types of flow data.
To test the effectiveness and usefulness of our method, a series of experiments has been
conducted using a real data set of vehicle theft-recovery flows in Charlotte, NC. The results
demonstrate that our method is capable of identifying local clusters from the several thousands of
tangled flows. Specifically, the measures we designed proved not only to be measures of spatial
proximity, but an effective solution for the inclusion of the multilocation interaction objects
within the scope of well-developed point pattern spatial statistics, namely the local K-function.
By adjusting the parameters of endpoint coordinate pairs, the study emphasis can be purposely
placed on the spatial associations between either flow origins or flow destinations. In addition,
the impact of flow length has also been thoroughly discussed. To overcome the statistical bias
brought by flow lengths, we introduced a variant of Flow Distance called Flow Dissimilarity.
The experiment shows that the algorithm using Flow Dissimilarity leads to more stable spatial
patterns and is adaptive to flows with varied lengths across the study region. Overall, the method
designed in this research has fully utilized the spatial characteristics of flow data, and it is demon-
strated to be capable of investigating spatial associations of flow events across scales. The results
examined with this method have practical implications as well. In this vehicle-theft crime exam-
ple, it can inform not only where frequent car theft and recovery happen, but how the stolen cars
are moved from one place to another in the form of spatial flow clusters. The results are espe-
cially useful to devise effective police responses to routine gang crime activities.
The proposed analytic method can be extended in several ways. First, further work can be
done to expand the capability of this method to include additional event characteristics, for
example considering flow type and value in “hot flow” detection. A plausible idea is to use the
local cross K-function (Boots and Okabe 2007) instead of the traditional local K-function to
detect clusters of flows with different types, for example, rescue goods flow spatially associated
with refugee flow; and to accumulate the total value of nearby flows instead of simply tallying
their frequency in calculating the local K-function so as to adjust the contribution of flows with
unequal value, for example, a one-thousand-people commuting flow versus a single-person
commuting flow. Also, we believe that the Flow Distance and Flow Dissimilarity measures can
be shown to be effective with other methods of exploratory spatial data analysis including the
local Moran’s I and G statistics for flow data analysis. Furthermore, we envision that the princi-
ples of the flow proximity measure can be further expanded to higher dimensionality for the
space-time analysis of flow data, or to other kinds of spatial analyses, for example spatial inter-
action modeling and trajectory data analysis. Lastly, combining this spatial statistical method
with other fast-developing techniques is also very meaningful. GeoComputation, GeoVisuali-
zation, and spatial data mining are all powerful methods that complement confirmatory statisti-
cal analysis, especially in this “Big Data” era.
Note
1 The observed global K-function for this dataset is above the 0.01 upper envelope at most scales. To bet-
ter demonstrate the capability of our new local flow clustering statistics, we report results for selected
scales within the range of statistical significance.
References
Aldstadt, J., and A. Getis. (2006). “Using AMOEBA to Create a Spatial Weights Matrix and Identify Spa-
tial Clusters.” Geographical Analysis 38(4), 327–43.
Anselin, L. (1995). “Local Indicators of Spatial Association–LISA.” Geographical Analysis 27(2),
93–115.
Anselin, L., I. Syabri, and Y. Kho. (2006). “GeoDa: An Introduction to Spatial Data Analysis.” Geographical
Analysis 38(1), 5–22.
Berglund, S., and A. Karlström. (1999). “Identifying Local Spatial Association in Flow Data.” Journal of
Geographical Systems 1(3), 219–36.
Besag, J., and J. Newell. (1991). “The Detection of Clusters in Rare Diseases.” Journal of the Royal Sta-
tistical Society Series A 154(1), 143–55.
Boots, B., and A. Okabe. (2007). “Local Statistical Spatial Analysis: Inventory and Prospect.” International
Journal of Geographical Information Science 21(4), 355–75.
Boyandin, I., E. Bertini, and D. Lalanne. (2010). “Using Flow Maps to Explore Migrations over Time.” In
Geospatial Visual Analytics Workshop in Conjunction with The 13th AGILE International Conference
on Geographic Information Science. Guimarães, Portugal, 2(3).
Chen, J., R. Wang, L. Liu, and J. Song. (2011). “Clustering of Trajectories Based on Hausdorff Distance.”
2011 International Conference on Electronics, Communications and Control (ICECC), Ningbo,
China, 1940–44.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Cui, W., H. Zhou, H. Qu, P. C. Wong, and X. Li. (2008). “Geometry-Based Edge Clustering for Graph
Visualization.” IEEE Transactions on Visualization and Computer Graphics 14(6), 1277–84.
Diggle, P. (1983). Statistical Analysis of Spatial Point Patterns. London: Academic Press.
Fortin, M., and M. Dale. (2009). “Spatial Autocorrelation.” In The SAGE Handbook of Spatial Analysis,
89–103, edited by S. Fotheringham and P. Rogerson. London: Sage.
Fotheringham, S. (1997). “Trends in Quantitative Methods I: Stressing the Local.” Progress in Human
Geography 21(1), 88–96.
Fotheringham, S., and B. Zhan. (1996). “A Comparison of Three Exploratory Methods for Cluster Detec-
tion in Spatial Point Patterns.” Geographical Analysis 28(3), 200–18.
Geary, R. (1954). “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician
5(3), 115–45.
Genolini, C., and B. Falissard. (2010). “KmL: K-Means for Longitudinal Data.” Computational Statistics
25(2), 317–28.
Getis, A., and J. Franklin. (1987). “Second-Order Neighborhood Analysis of Mapped Point Patterns.”
Ecology 68, 473–77.
Getis, A., and J. Ord. (1992). “The Analysis of Spatial Association by Use of Distance Statistics.” Geo-
graphical Analysis 24(3), 189–206.
Guo, D. (2009). “Flow Mapping and Multivariate Visualization of Large Spatial Interaction Data.” IEEE
Transactions on Visualization and Computer Graphics 15(6), 1041–48.
Kulldorff, M. (1997). “A Spatial Scan Statistic.” Communications in Statistics - Theory and Methods
26(6), 1481–96.
Lee, J. G., J. Han, and K. Y. Whang. (2007). “Trajectory Clustering: A Partition-and-Group Framework.” In
Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing,
China, 593–604.
Liu, Y., D. Tong, and X. Liu. (2015). “Measuring Spatial Autocorrelation of Vectors.” Geographical
Analysis 47(3), 300–19.
Lu, Y. (2006). “Spatial Choice of Auto Thefts in an Urban Environment.” Security Journal 19(3),
143–66.
Lu, Y., and J.-C. Thill. (2003). “Assessing the Cluster Correspondence between Paired Point Locations.”
Geographical Analysis 35(4), 290–309.
Lu, Y., and J.-C. Thill. (2008). “Cross-scale Analysis of Cluster Correspondence Using Different Opera-
tional Neighborhoods.” Journal of Geographical Systems 10(3), 241–61.
McGoey, C. (2000). “Auto Theft Facts.” www.crimedoctor.com/autotheft1.htm
Moran, P. (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika 37(1), 17–23.
Murray, A., Y. Liu, S. J. Rey, and L. Anselin. (2011). “Exploring Movement Object Patterns.” The Annals
of Regional Science 49(2), 471–84.
Nanni, M., and D. Pedreschi. (2006). “Time-Focused Clustering of Trajectories of Moving Objects.”
Journal of Intelligent Information Systems 27(3), 267–89.
Okabe, A., B. Boots, and T. Satoh. (2010). “A Class of Local and Global K-functions and Their Exact Statistical
Methods.” In Perspectives on Spatial Data Analysis, 101–12, edited by L. Anselin and S. J. Rey.
Berlin, Heidelberg: Springer.
Openshaw, S., M. Charlton, C. Wymer, and A. Craft. (1987). “A Mark 1 Geographical Analysis Machine
for the Automated Analysis of Point Data Sets.” International Journal of Geographical Information
Systems 1(4), 335–58.
Ord, J., and A. Getis. (1995). “Local Spatial Autocorrelation Statistics: Distributional Issues and an
Application.” Geographical Analysis 27(4), 286–306.
Ossama, O., H. Mokhtar, and M. El-Sharkawi. (2011). “Clustering Moving Objects Using Segments
Slopes.” International Journal of Database Management Systems 3(1), 35–48.
Ripley, B. D. (1976). “The Second-Order Analysis of Stationary Point Processes.” Journal of Applied
Probability 13, 255–66.
Symanzik, J. (2014). “Exploratory Spatial Data Analysis.” In Handbook of Regional Science, 1295–310,
edited by M. M. Fischer and P. Nijkamp. Heidelberg, Germany: Springer.
Tobler, W. R. (1987). “Experiments in Migration Mapping by Computer.” The American Cartographer
14, 155–63.
Waller, L. (2009). “Detection of Clustering in Spatial Data.” In The SAGE Handbook of Spatial Analysis,
159–81, edited by S. Fotheringham and P. Rogerson. London: Sage.
Yamada, I., and J.-C. Thill. (2007). “Local Indicators of Network-Constrained Clusters in Spatial Point
Patterns.” Geographical Analysis 39(3), 268–92.
Yamada, I., and J.-C. Thill. (2010). “Local Indicators of Network-Constrained Clusters in Spatial Patterns
Represented by a Link Attribute.” Annals of the Association of American Geographers 100(2),
269–85.
Zhu, X., and D. Guo. (2014). “Mapping Large Spatial Flow Data with Hierarchical Clustering.” Transactions
in GIS 18(3), 421–35.