The Colocation Quotient: A New Measure of Spatial Association Between Categorical Subsets of Points

Abstract

This article presents a new metric we label the colocation quotient (CLQ), a measurement designed to quantify (potentially asymmetrical) spatial association between categories of a population that may itself exhibit spatial autocorrelation. We begin by explaining why most metrics of categorical spatial association are inadequate for many common situations. Our focus is on where a single categorical data variable is measured at point locations that constitute a population of interest. We then develop our new metric, the CLQ, as a point-based association metric most similar to the cross-k-function and join count statistic. However, it differs from the former in that it is based on distance ranks rather than on raw distances and differs from the latter in that it is asymmetric. After introducing the statistical calculation and underlying rationale, a random labeling technique is described to test for significance. The new metric is applied to economic and ecological point data to demonstrate its broad utility. The method expands upon explanatory powers present in current point-based colocation statistics.
Timothy F. Leslie¹, Barry J. Kronenfeld²
¹ Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
² Department of Geology/Geography, Eastern Illinois University, Charleston, IL
Introduction
Geographers have long considered the relationship between the characteristics of an object and its neighbors. Tobler’s statement that everything is related to everything else, but that near things are more related than distant things, remains one of the few statements geographers can claim as “law” (Tobler 1970). Tobler’s law applies to qualitative concepts such as culture and ecological process, as well as to quantified measures like patenting rates and potential evapotranspiration (Galiano 1986; Ó hUallacháin and Leslie 2005). Quantifying spatial relationships has become a hallmark of geographic analysis.
Correspondence: Timothy F. Leslie, Department of Geography and Geoinformation Science, George Mason University, 4400 University Dr MS 6C3, Fairfax, VA 22030
e-mail: tleslie@gmu.edu
Submitted: April 21, 2009. Revised version accepted: August 24, 2010.
Geographical Analysis 43 (2011) 306–326 © 2011 The Ohio State University
Geographical Analysis ISSN 0016-7363
Among the various Tobleresque relationships that are of interest to a geographer, the spatial relationship between distinct populations or distributions is one of the most fundamental. This type of spatial relationship may be denoted by the term spatial association. Aspects of spatial association are captured in the concepts of “spatial overlay,” “cross-correlation,” and “colocation” (Wartenberg 1985; de Smith, Goodchild, and Longley 2009). The term spatial association is used elsewhere to refer to pattern either within a single population (i.e., spatial autocorrelation) or between two or more populations. Here, we confine usage of the term to the latter situation only, corresponding to what Lee (2001) refers to as “bivariate spatial association.” In contrast to spatial autocorrelation, analysis of spatial association requires simultaneous consideration of multiple patterns and processes. The autocorrelative structure of a joint population, and of each distinct subpopulation, should be considered when selecting a metric for spatial analysis, as these aspects of pattern may influence the observed association between populations.
Our interest lies in the situation where a single categorical data variable is
measured at point locations that constitute a population of interest. Because the
values of interest are nominal in nature, measures of spatial association developed
for ratio point data, such as the cross-variogram (Vallejos 2008), are not suitable.
Other measures, such as the join count statistic (Cliff and Ord 1981), are typically
applied to polygon rather than point data; this is also true of Moran’s coefficient,
which can be applied to nominal data (Griffith 2010). The most similar measure to
what we propose is the cross-k-function (Cressie 1991), but because it measures
spatial association between two populations, the null hypothesis that it tests is not
appropriate to the situation in which categorized individuals come from a single
population.
To analyze this situation, we develop a generalized method to determine
whether categories within a population are spatially correlated and, if so, how, in
what direction, and by how much. This situation typifies a variety of problems in
both human and physical geography. For example, one might wish to determine the
colocation preferences of businesses of different types within a metropolitan area or
examine the relationship between pairs of tree species in a forest setting in order to
identify possible interspecies relationships.
In either case, the underlying distribution is the result of two conceptually dis-
tinct spatial processes. First, the spatial structure of the overall population may
cause point locations to be clustered or dispersed. Second, nested within the overall
spatial pattern, relationships between categories may result in some categories be-
ing more or less likely to occur near others. Failure to distinguish between these two
hierarchical processes can result in spurious findings. In addition to separating
spatial association between categories from overall population clustering, recognizing that categorical effects may be asymmetric is also important. Asymmetric relationships in ecology include obligate predation and parasitism, in which a predator or parasite is confined to locations where the prey or host is found, but the reverse is not necessarily true. In logistics, businesses further down a supply chain often are dependent on (and therefore located near) their suppliers, while suppliers locate based on natural resources and other inputs. Any metric of pairwise
categorical spatial correlation should be able to deal with this asymmetry. In some cases, points in category A may prefer category B, points in category B may prefer category C, and points in C may prefer being close to category A. In such cases, a symmetrical spatial association metric could potentially find “significance” in all or none of the bidirectional pairwise associations it measures (e.g., A ↔ B, B ↔ C, and C ↔ A).
Our metric, the colocation quotient (CLQ), quantifies spatial relationships
between categories by building on the concept of the location quotient used by
geographers and economists to judge a region’s degree of specialization in a
particular industry (Blair 1995; Stimson, Stough, and Roberts 2006). The CLQ is
defined with respect to two categories (e.g., types A and B), and provides a measure
of the degree to which one categorical subset is spatially dependent on the other.
Specifically, $\mathrm{CLQ}_{A \to B}$ measures the degree to which type A events are spatially
attracted to type B events. The CLQ is calculated as a ratio of observed versus ex-
pected points of one type among the set of nearest neighbors of points of another
type. It also may be viewed as a modification of traditional measures of spatial
correlation between categories, including the join count statistic and the cross-k-
function.
In the next section, we explain why these existing metrics of categorical spatial
association are inadequate for many common situations. We then develop a new
statistical measure and a corresponding significance-testing framework. Finally, we
present applications of the metric in socioeconomic and physical geography con-
texts, and then conclude with suggestions for implementation.
Motivation
Categorical variables present an interesting challenge to measuring Tobler’s law due to the multiplicity of relationships. In particular, the presence of multiple subcategories of a single type of entity results in two complicating factors, which are illustrated in Fig. 1. First, the interaction between any given pair of categories often is asymmetrical. In Fig. 1(1), asymmetry results from a unidirectional dependency, in which individuals of category B are found only in close proximity to category A, but individuals of category A may be found in any location independent of the presence of category B. As demonstrated, and despite its similarity to Fig. 1(1), the attraction between A and B in Fig. 1(2) is symmetric; a spatial association metric must be capable of distinguishing between these two patterns. Second, spatial relationships between categories often are confounded by spatial autocorrelation of the joint population. Fig. 1(3) illustrates a situation in which spatial autocorrelation exists in the overall population, but little further categorical association is evident. A metric that cannot distinguish between these two types of correlative processes has only limited practical value.
Figure 1. Illustrations of spatial patterns exhibiting (1, 2) asymmetry in pairwise categorical spatial associations and (3) spatial autocorrelation in the overall population.
In the following section, we review the two most commonly used metrics of
spatial association between categories: the join count statistic and Ripley’s cross-k-
function. We argue that each metric has shortcomings when applied to the afore-
mentioned situations.
Join count statistic
The join count statistic is an area-level measure of spatial association, that is, of
correlation between categories on a k-color map (Dacey 1965). The statistic op-
erates by comparing the number of times a pair of categories occurs in adjacent
positions with the expectation of randomness (Iyer 1949; David 1971; Cliff and Ord
1981). It often is applied to binary grid data, which are typically conceptualized as
a black-and-white checkerboard or as an irregular polygon tessellation, in which
case the statistic becomes a measure of spatial autocorrelation. Links between each
polygon are counted as color-same (black touching black or white touching white)
or color-different (black touching white). The resulting counts are tallied, and they
can be used to determine if the data are significantly autocorrelated. These counts
are compared with the expectations for a binomial random variable under the null
hypothesis using a $\chi^2$ distribution.
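The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `join_counts`, the 0/1 coding of white and black cells, and the edge-sharing (rook) definition of adjacency are choices made here, not details given in the text, and the sketch stops at the tallies without carrying out the significance test.

```python
def join_counts(grid):
    """Tally black-black, white-white, and black-white joins on a binary grid.

    grid is a list of equal-length rows holding 1 (black) or 0 (white).
    Assumption: two cells are joined only if they share an edge (rook adjacency).
    """
    counts = {"BB": 0, "WW": 0, "BW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that each join is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    blacks = grid[r][c] + grid[rr][cc]
                    key = {2: "BB", 0: "WW", 1: "BW"}[blacks]
                    counts[key] += 1
    return counts


# Two hypothetical 2 x 2 maps with the same number of black cells.
checkerboard = [[1, 0], [0, 1]]   # every join is color-different
banded = [[1, 1], [0, 0]]         # same colors sit side by side

checker_counts = join_counts(checkerboard)
banded_counts = join_counts(banded)
print(checker_counts, banded_counts)
```

Both maps contain two black and two white cells, yet the checkerboard has only color-different joins while the banded map has color-same joins as well, which is the contrast the statistic is built to detect.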
Despite being conceptually simple, the join count statistic has not been im-
plemented in most popular geographic information system software packages. In
spatial settings where source data come as points rather than as areal values, com-
putation requires a geometrical association of point pairs. Traditionally, this asso-
ciation is accomplished by drawing Thiessen polygons around each point and
treating the resulting diagram as a polygon tessellation (Upton and Fingleton 1985).
This implementation introduces a certain degree of arbitrariness to the pairing of
points with their “neighbors.” Furthermore, point pairs are defined in a symmetrical manner, so that counts of A → B and B → A joins are equal by definition.
The join count statistic, by nature, cannot detect asymmetry such as that shown in
Fig. 1(1).
The emphasis on binary measures also results in the join count statistic
rarely being used to measure categorical spatial association. Although the under-
lying theory for moving the join count statistic to more than two colors is well
developed (Haining 2003), this sort of analysis is rarely done in practice. Instead,
scholars primarily use this statistic to examine the degree of spatial autocorrelation
within individual categories, even when their data contain multiple categories
(e.g., Stevens and Jenkins 2000). Another spatial binary conceptualization, the
autologistic model, measures the likelihood of a point’s neighbor having a value
of one given that the point itself has a value of one (Cressie 1991). The autologistic
model provides a framework for quantifying the probability of occurrence
given neighborhood relations but does not provide theoretical guidance about
how neighborhood relations should be defined. Multivariate developments for the
autologistic model are very recent (Kavousi, Meshkani, and Mohammadzadeh
2010).
Researchers who have applied join count measures to multivariate data have
not implemented an ‘‘overall’’ statistic, so analysis can be done only by examining
each pair of variables separately. The original join count statistic has no need of an
overall statistic: either a binary pair is significantly autocorrelated negatively or
positively, or it is insignificant. In the multiple-category situation, the matrix of
pairwise significance in each pair of categories should be augmented by an overall
count of same–same linkages and an associated significance value, similar to the
work by Dixon (2002) and Ceyhan (2008).
Cross-k-function
The cross-k-function is an extension of Ripley’s k-function for two distributions
(Cressie 1991). It measures the overall density of category B within a prescribed
neighborhood around individuals of category A. This measure is compared with the
overall density of B, which is equivalent to the probability of finding B in a random
area. Results are presented as a graph of clustering over the range of distance radii,
similar to presentations of Ripley’s k.
The cross-k-function is asymmetrical and can account for differences in the
complementary relationships between two categories. However, metrical distance
is used rather than topological neighborhood distance. As a consequence, the
effects of the spatial pattern of an overall population are comingled with the effects
of cross-correlation between categories. The cross-k-function, while providing a
graph of results, can lead to erroneous conclusions because of effects occurring at
multiple scales within a data set. Given the pattern shown in Fig. 1(3), for example,
the cross-k-function would report highly significant positive spatial correlations
between every pair of categories because the density of individuals in each cate-
gory is higher in the vicinity of individuals of any other category. However, if A, B,
and C are businesses and if each cluster represents a city, then this is a trivial result
in most types of analysis because businesses clustering within cities is already well-
known. Of greater interest is the question of whether specific pairs of categories are
more mutually clustered than would be expected given the spatial pattern of a
parent population. A variation of the cross-k-function developed for network situ-
ations partially corrects this problem, but the correction is only applicable for net-
work-based analyses (Okabe and Yamada 2001).
Other related metrics
A number of papers and articles propose or examine methods of analyzing spatial
association between values measured at points. Clifford, Richardson, and Hemon
(1989) and Dutilleul (1993) examine the effect of spatial autocorrelation on the
significance value of the standard correlation coefficient (r) between two geocoded
variables. Wartenberg’s (1985) multivariate spatial correlation expands Moran’s I to examine quantitative multivariate geographic distributions and shows analogies to principal components analysis. Lee (2001) builds on Wartenberg’s (1985) work to decompose Moran’s I into a spatial smoothing scalar and correlation (Pearson’s r) between spatially lagged (smoothed) values of observed variables, and uses this decomposition to create a bivariate measure of spatial association. Vallejos (2008) and Rukhin and Vallejos (2008) use a normalized cross-variogram, treating point data as samples of a continuous spatial process. These measures all derive from a
conceptualization of two or more ratio variables distributed on a continuous spatial
domain. Although it may be possible to adapt these methods to the measurement
of spatial association between nominal values distributed on a discontinuous
(point) domain, such adaptation is not straightforward and is beyond the scope of
this article.
Other metrics have been developed to describe spatial association between
categories of points, but none handle the problems of asymmetry and nested pattern
correlation. Galiano (1986) calculates conditional probabilities within distance
neighborhoods, similar to the cross-k-function, to study relationships between tree
species. Dale (1999) dismisses Galiano’s conditional probabilities as being equiv-
alent to the paired quadrat covariance, a symmetrical measure of cross-correlation
based on fixed-area quadrats that is affected by spatial pattern in a joint population.
Leslie and Ó hUallacháin (2006) presented the nearest establishment with asymmetrical relationships (NEAR) statistic, a preliminary version of the CLQ, but they
exclude same-category associations, discount multiple observations at a single lo-
cation, and do not provide a basis for understanding the statistical significance of
their results. The need remains for an asymmetrical topological measure that can
work with categorical data constrained by a parent population that may itself be
clustered (or dispersed). The use of nearest neighbors is apposite, as the closest
individuals are generally expected to have the greatest influence (Ord 1990).
Within the field of ecology, a few investigations use nearest neighbor contingency
tables to investigate point patterns (Dixon 1994, 2002; Ceyhan 2008). However, these investigations do not offer a clear semantic foundation that would let other researchers decide when to use these statistics rather than others. The CLQ was developed to fill this need and is explained in the following section.
Method
Although a number of metrics exist to quantify spatial association (Cressie 1991; Okabe and Yamada 2001; Dixon 2002; Leslie and Ó hUallacháin 2006), a conceptual framework to distinguish these patterns and processes has not been articulated. As an impetus for our methods, we seek to do three things: (1) to consolidate our proposed analytic into a small number of accessible equations; (2) to discuss the existence and causes of asymmetry in measures of spatial association; and (3) to create and test a null hypothesis that captures the analytical power and explains the limits of the statistic. In developing our methodological framework, we build upon the importance of nearest neighbors. This construction is done to address the problems that arise from the clustering of joint point patterns for reasons other than colocation.
In developing a statistical metric that distinguishes between the effects of autocorrelation of a joint population and the specific associative relationships between pairs of categorical subsets, identification of the appropriate null hypothesis is important. While research implementing the cross-k-function uses a null hypothesis of “there is no spatial association between any pair of categorical subsets,” the null hypothesis for a CLQ-based analysis is “given the clustering of the joint population, there is no spatial association between pairs of categorical subsets.” That is, we take the geometric pattern of a joint population as a given and search for associations that cannot be explained by this joint pattern alone.
Let $P$ denote a point population within which each individual is assigned uniquely to one of $k$ categories in a classification system $X$, and let $A \in X$ and $B \in X$ denote (possibly the same) categories in $X$. $\mathrm{CLQ}_{A \to B}$ is defined as the ratio of observed to expected proportions of B among A’s nearest neighbors. Formally, this calculation is given by
$$\mathrm{CLQ}_{A \to B} = \frac{C_{A \to B} / N_A}{N'_B / (N - 1)}, \tag{1}$$
where $N$ denotes the population size of the set of categories under analysis; $N_A$ denotes the population size of A; $N'_B$ denotes the population size of B (if $A \neq B$) or the population size of B minus 1 (if $A = B$); and $C_{A \to B}$ denotes the count of type A points whose nearest neighbor is a type B point, defined more rigorously in equation (7) below. The numerator of $\mathrm{CLQ}_{A \to B}$ is the proportion of type B points among A’s nearest neighbors (i.e., the observed proportion), while the denominator is the proportion of type B points that could be a nearest neighbor to each type A point (i.e., the expected proportion). To calculate the expected proportion, $N - 1$ rather than $N$ is used in the denominator, because a point cannot be its own nearest neighbor (Dixon 2002). Similarly, in the calculation of the same–same category CLQ, $N'_B$ is defined as the count of type B ($= A$) points minus one because each point of category A can have all other points of type A as neighbors except itself.
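A minimal Python sketch of equation (1) may help make the bookkeeping concrete. It assumes every point has a single, unique nearest neighbor (ties are not handled), uses a brute-force neighbor search, and runs on made-up coordinates; the names `nearest_neighbor` and `clq` and the sample data are hypothetical.

```python
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def clq(points, labels, a, b):
    """CLQ from category a to category b, following equation (1).

    Assumption: each point has exactly one nearest neighbor.
    """
    n = len(points)
    n_a = labels.count(a)
    # N'_B: size of b, minus one when a and b are the same category.
    n_b_prime = labels.count(b) - (1 if a == b else 0)
    # C_{A->B}: type-a points whose nearest neighbor is of type b.
    c_ab = sum(1 for i in range(n)
               if labels[i] == a and labels[nearest_neighbor(points, i)] == b)
    observed = c_ab / n_a
    expected = n_b_prime / (n - 1)
    return observed / expected


# Hypothetical data: two A-B pairs and one A-A pair along a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = ["A", "B", "A", "B", "A", "A"]

clq_ab = clq(points, labels, "A", "B")   # (2/4) / (2/5)
clq_aa = clq(points, labels, "A", "A")   # (2/4) / (3/5)
clq_bb = clq(points, labels, "B", "B")   # no B has a B as nearest neighbor
print(clq_ab, clq_aa, clq_bb)
```

In this toy pattern two of the four A points have a B as their nearest neighbor, against an expected share of 2/5, so the A-to-B value lands above one while the A-to-A value lands below it.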
Semantically, $\mathrm{CLQ}_{A \to B}$ denotes the spatial attraction of A to B or, alternatively, the degree to which B attracts A. For example, $\mathrm{CLQ}_{A \to B} = 2$ indicates that A is twice as likely to have B as its nearest neighbor (i.e., to locate near a point of type B) as would be expected by chance. The attraction expressed by $\mathrm{CLQ}_{A \to B}$ is unidirectional because it is dependent on nearest neighbor relationships that may be asymmetric. If many cases exist where A’s nearest neighbor is B but B’s nearest neighbor is not A, then $C_{A \to B} > C_{B \to A}$, and, therefore, $\mathrm{CLQ}_{A \to B} > \mathrm{CLQ}_{B \to A}$, logically expressing that A is more attracted to B than B is to A. Same-category CLQs are interpreted in a similar manner, such that a $\mathrm{CLQ}_{A \to A} = 0.67$ indicates that A is only two-thirds as likely to be its own nearest neighbor as would be expected given A’s proportion in the overall parent population; in this case, the attraction is bidirectional.
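The asymmetry of nearest neighbor relationships can be illustrated with a small made-up configuration in which three B points surround one A point while the remaining A points pair off among themselves. The sketch below is an assumption-laden illustration: the coordinates, the helper names, and the brute-force search are hypothetical, and ties are avoided by construction.

```python
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def directed_count(points, labels, a, b):
    """Number of type-a points whose nearest neighbor is of type b."""
    return sum(1 for i in range(len(points))
               if labels[i] == a and labels[nearest_neighbor(points, i)] == b)


# Hypothetical pattern: one A ringed by three Bs, plus two isolated A-A pairs.
points = [(0.0, 0.0), (0.9, 0.0), (-1.0, 0.0), (0.0, 1.1),
          (10.0, 0.0), (10.5, 0.0), (20.0, 0.0), (20.5, 0.0)]
labels = ["A", "B", "B", "B", "A", "A", "A", "A"]

n = len(points)
n_a, n_b = labels.count("A"), labels.count("B")
c_ab = directed_count(points, labels, "A", "B")   # only the central A
c_ba = directed_count(points, labels, "B", "A")   # all three Bs

# Cross-category CLQs from equation (1); N'_B equals N_B because A != B.
clq_ab = (c_ab / n_a) / (n_b / (n - 1))
clq_ba = (c_ba / n_b) / (n_a / (n - 1))
print(c_ab, c_ba, clq_ab, clq_ba)
```

Every B has the central A as its nearest neighbor, but that A can return the relationship to only one of them, so the B-to-A count and quotient exceed their A-to-B counterparts.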
The CLQ can be viewed as a simple modification of either the join count sta-
tistic or the cross-k-function. With regard to the join count statistic, the CLQ is
derived by replacing pairwise joins with nearest neighbor counts. From the cross-k-
function, the CLQ is derived by substituting neighbor ranks for absolute distances as
the basis for determining relative probabilities. The CLQ also may be considered an
extension of metrics used to measure the degree to which categories are associated
with specific locations or types of locations. Economic geographers are familiar
with the location quotient, which measures the ratio of a local economy’s propor-
tion of economic activity in a particular sector to the proportion of activity in the
country and/or region that encompasses it (Blair 1995; Stimson, Stough, and Rob-
erts 2006). The location quotient assigns values greater (or less) than one to places
with greater (or less) than average activity in a particular sector. A similar measure
used in forestry and other ecological applications is fidelity, which describes the
degree to which a given species is associated with a particular community type
(e.g., Dyer 2006). Though a conceptual descendant of these metrics, the CLQ de-
scribes spatial association of one category of objects with another category rather
than with a region or set of regions.
As with classical location quotients, a CLQ value of one has semantic importance. The value of one occurs when the proportion of category B individuals among category A individuals’ closest neighbors equals the proportion of category B in its overall population (excluding one individual of category A). A $\mathrm{CLQ}_{A \to B}$ greater than one shows a higher number of nearest neighbors of category B than expected given the relative counts in its population, whereas a value less than one indicates that points in group B are closest neighbors to points in group A less frequently than expected. The lowest possible value is zero, which occurs when no points in category B are the closest neighbor to any points of category A. Values above one can be read as multiples of the expected “closeness”; a value of two, for example, means that B is A’s nearest neighbor twice as often as expected. The same-category CLQ is undefined for any category that has fewer than two points, because $N'_B$ is then zero.
The CLQ does have a maximum value that depends on the proportion within the overall population of the category under examination as well as certain geometrical constraints. Ignoring these geometrical constraints, the proportional maximum $\mathrm{CLQ}_{A \to B}$ is found when all of A’s neighbors are B ($C_{A \to B} = N_A$), resulting in
$$\mathrm{CLQ}_{A \to B} = \frac{N_A / N_A}{N'_B / (N - 1)} = \frac{1}{N'_B / (N - 1)} = \frac{N - 1}{N'_B}. \tag{2}$$
This formula shows that the maximum degree to which A can be attracted to B depends on the population of B and, somewhat counterintuitively, that this relationship is an inverse one. In other words, the larger the population of B, the less the attractive force it can exert on A. The reason for this is that if B constitutes a significant proportion of the overall population, then one would expect a large proportion of type A individuals to locate near a type B individual due to chance alone. For example, suppose that type B individuals made up 500 (nearly half) of an overall population of 1001; regardless of the population of A, one would expect half of all type A individuals to have a type B individual as their nearest neighbor. Even if every type A individual had a type B individual as its nearest neighbor, this would only be twice as many as expected, and the maximum value of $\mathrm{CLQ}_{A \to B}$ would be equal to two. This issue becomes important only when dealing with a large number of categories with substantial variance in their relative populations, especially when one category makes up a large percentage of the parent population.
Geometry also limits the maximum CLQ value. Maximum values occur when each point of category A has a point of category B as its nearest neighbor. The number of such points can grow as A becomes a larger share of the overall population, but only up to a point. A geometric limit occurs when every point of category B is surrounded by a ring of five category A points that each have the category B point as their nearest neighbor. In this situation, any additional category A point will be just as close or closer to an existing category A point as it is to the central B point, and so the number of category A points that have B as their nearest neighbor cannot increase any further.
Therefore, $5N_B$ is substituted for $C_{A \to B}$ in equation (1) to determine the geometric maximum:
$$\mathrm{CLQ}_{A \to B} = \frac{5N_B / N_A}{N_B / (N - 1)} = \frac{5(N - 1)}{N_A}, \tag{3}$$
for nonsame category analysis. For the same–same category analysis, the geometric maximum does not apply, as higher values are not achieved when a set of points is closer to a central point than to other points in a ring. In situations where $N_A$ is more than five times the size of $N_B$, the geometric maximum rather than the proportional maximum is the maximum CLQ value.
Considering both numerical and geometric constraints, equation (4) furnishes the formula for the maximum CLQ, which holds for both the multivariate and bivariate situations:
$$\mathrm{Max}(\mathrm{CLQ}_{A \to B}) = \mathrm{Min}\left[ \frac{N - 1}{N'_B}, \frac{5(N - 1)}{N_A} \right]. \tag{4}$$
This maximum expresses two constraints on the value of $\mathrm{CLQ}_{A \to B}$. First, as the relative population of B increases, the degree to which A is attracted to B is limited numerically. This restriction arises because having a category B point as a nearest neighbor is less “surprising” when category B points are more common. Second, as the proportion of A increases, the degree to which A is attracted to B is limited geometrically. This constraint occurs because only so many category A points can be packed around each category B point.
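The two bounds in equation (4) are easy to evaluate directly. The sketch below covers only the cross-category case (A and B distinct, so that $N'_B = N_B$), since the geometric bound is stated not to apply to same-category analysis; the function name `max_clq` and the second set of population sizes are hypothetical.

```python
def max_clq(n_total, n_a, n_b):
    """Upper bound on the cross-category CLQ (A != B), per equation (4)."""
    proportional = (n_total - 1) / n_b       # equation (2) with N'_B = N_B
    geometric = 5 * (n_total - 1) / n_a      # equation (3)
    return min(proportional, geometric)


# Example from the text: B is 500 of 1001 points, so the bound is two.
bound_common_b = max_clq(n_total=1001, n_a=100, n_b=500)

# Hypothetical case with N_A more than five times N_B:
# the geometric bound (5 * 1000 / 600) is the smaller of the two.
bound_crowded_a = max_clq(n_total=1001, n_a=600, n_b=100)
print(bound_common_b, bound_crowded_a)
```

The first call reproduces the value of two from the worked example, and the second shows the geometric term taking over once A outnumbers B by more than five to one.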
Because the CLQ semantically indicates the ratio of observed to expected, demonstrating that its expectation is unity is important. Given the values of $N$, $N_A$, and $N'_B$, a random allocation of categories dictates that the expected count $C_{A \to B}$ of type A points’ nearest neighbors that are of type B is simply $N_A$ multiplied by the conditional probability of selecting a point of type B given that one point of type A has already been selected:
$$E(C_{A \to B}) = N_A \, E(p_{B|A}) = N_A \left( \frac{N'_B}{N - 1} \right). \tag{5}$$
Substituting this expected count of nearest neighbors $E(C_{A \to B})$ into equation (1), the expectation of the CLQ becomes
$$E(\mathrm{CLQ}_{A \to B}) = E\left( \frac{C_{A \to B} / N_A}{N'_B / (N - 1)} \right) = \left( \frac{N - 1}{N_A N'_B} \right) E(C_{A \to B}) = \left( \frac{N - 1}{N_A N'_B} \right) \left( \frac{N_A N'_B}{N - 1} \right) = 1. \tag{6}$$
Therefore, the expected value of the CLQ is one if categories are randomly allocated across a fixed point pattern.
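This result can be checked numerically on a small example by holding the point locations fixed, enumerating every distinct assignment of a given set of labels, and averaging the resulting CLQ values. The sketch below is an illustration only: the six coordinates, the label counts, and the helper names are hypothetical, and exhaustive enumeration is practical only for very small point sets.

```python
from itertools import permutations
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def clq(nn, labels, a, b):
    """CLQ from a to b per equation (1), given precomputed nearest neighbors."""
    n = len(labels)
    n_a = labels.count(a)
    n_b_prime = labels.count(b) - (1 if a == b else 0)
    c_ab = sum(1 for i in range(n) if labels[i] == a and labels[nn[i]] == b)
    return (c_ab / n_a) / (n_b_prime / (n - 1))


# Fixed, hypothetical point pattern; only the labels are reshuffled.
points = [(0, 0), (1, 0), (3, 1), (4, 4), (6, 1), (7, 6)]
nn = [nearest_neighbor(points, i) for i in range(len(points))]

# All distinct ways of assigning three As, two Bs, and one C to the six points.
labelings = set(permutations(("A", "A", "A", "B", "B", "C")))

mean_clq_ab = sum(clq(nn, lab, "A", "B") for lab in labelings) / len(labelings)
mean_clq_aa = sum(clq(nn, lab, "A", "A") for lab in labelings) / len(labelings)
print(len(labelings), mean_clq_ab, mean_clq_aa)
```

Averaged over all equally likely labelings, both the cross-category and the same-category quotient come out at one up to floating-point rounding, in line with equation (6).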
In the preceding discussion, each category A point is assumed to have exactly one nearest neighbor. However, often a point has several equidistant nearest neighbors. This could occur when multiple points coexist or appear to coexist at a single location, as, for example, when several retail stores share the same postal address. Equidistant nearest neighbors also are common when points are arranged on a regular grid. By making explicit the definition of $C_{A \to B}$ in equation (1), the CLQ can easily accommodate such situations. Because $\mathrm{CLQ}_{A \to B}$ expresses the degree to which A is attracted to B, the weight of each category A point is made equal by formally defining $C_{A \to B}$ as
$$C_{A \to B} = \sum_{i=1}^{N_A} \sum_{j=1}^{n} \frac{B_{ij}(1,0)}{n}, \tag{7}$$
where $n$ is the number of equidistant nearest neighbors a point has, and $B_{ij}$ is a 0–1 decision rule variable of whether the $j$th equidistant nearest neighbor of the $i$th category A point is of category B. Thus, the contribution of each category A point to the total nearest neighbor count is exactly one and is not determined by the number of equidistant neighbors.
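A short Python sketch of equation (7) shows how tied neighbors share a single unit of weight. It rests on assumptions not specified in the text: distances are treated as tied when `math.isclose` judges them equal (a floating-point tolerance chosen here), the search is brute force, and the function names and sample coordinates are hypothetical.

```python
from math import dist, isclose


def tied_nearest_neighbors(points, i):
    """Indices of all points tied for the smallest distance to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    d_min = min(dist(points[i], points[j]) for j in others)
    # Assumption: distances within floating-point tolerance count as equal.
    return [j for j in others if isclose(dist(points[i], points[j]), d_min)]


def count_a_to_b(points, labels, a, b):
    """C_{A->B} per equation (7): each type-a point contributes the share
    of its equidistant nearest neighbors that are of type b."""
    total = 0.0
    for i, label in enumerate(labels):
        if label != a:
            continue
        ties = tied_nearest_neighbors(points, i)
        total += sum(1 for j in ties if labels[j] == b) / len(ties)
    return total


# Hypothetical grid-like pattern: the A at the origin has four
# neighbors at distance one (two Bs, one A, one C).
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
labels = ["A", "B", "A", "B", "C"]

c_ab = count_a_to_b(points, labels, "A", "B")   # 2/4 from the central A
c_aa = count_a_to_b(points, labels, "A", "A")   # 1/4 + 1
c_ac = count_a_to_b(points, labels, "A", "C")   # 1/4
print(c_ab, c_aa, c_ac)
```

Summed over all target categories, the counts add up to the number of A points, which reflects the statement that each category A point contributes exactly one to the total regardless of how many neighbors are tied.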
Asymmetry is defined by the condition that CLQ
A!B
CLQ
B!A
. Equation (1)
shows that this occurs only when C
A!B
C
B!A
. This means that asymmetry in
the CLQ results if and only if there exist asymmetrical spatial configurations in
which the nearest neighbor of an individual does not have that individual as its own
nearest neighbor. In Fig. 1(1), for example, several clusters are present in which
The Colocation QuotientTimothy F. Leslie and Barry J. Kronenfeld
315
more than one individual of type B is clustered around a single individual of type A.
This results in asymmetry because there are many type B individuals whose nearest
neighbor is an individual of type A but who are not the nearest neighbors of that
type A individual. Specifically in this example, $\mathrm{CLQ}_{A \to B} = 0.95$ (not statistically
significantly different from one), but $\mathrm{CLQ}_{B \to A} = 1.9$: B is the closest neighbor of A
just slightly less than would be expected for a completely random mixture, while A
is the closest neighbor of B much more than a random mixture would suggest. This
result indicates that the occurrence of B is strongly dependent on the occurrence of
A, but not vice versa. The maximum CLQ value for both sectors is 1.9, which is
reached by $\mathrm{CLQ}_{B \to A}$. Also notable is that $\mathrm{CLQ}_{B \to B}$ is 0, while $\mathrm{CLQ}_{A \to A}$ is 1.05.
These same-category CLQs show that B appears to avoid itself as its own nearest
neighbor (as noted previously, it appears to strongly prefer A as its nearest neighbor),
while A has itself as its own nearest neighbor only slightly more than average.
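To see why asymmetry in the CLQ comes down to asymmetry in the counts, the two directional quotients can be written side by side. The sketch below assumes that equation (1) has the form $\mathrm{CLQ}_{A \to B} = (C_{A \to B}/N_A) / (N_B/(N-1))$ for $A \neq B$, where $N_A$, $N_B$, and $N$ are the category and population counts; that form is an assumption here, since the equation is not restated in this section.

```latex
\begin{align*}
\mathrm{CLQ}_{A \to B} &= \frac{C_{A \to B}/N_A}{N_B/(N-1)}
                        = \frac{C_{A \to B}\,(N-1)}{N_A N_B}, \\
\mathrm{CLQ}_{B \to A} &= \frac{C_{B \to A}/N_B}{N_A/(N-1)}
                        = \frac{C_{B \to A}\,(N-1)}{N_A N_B}, \\
\frac{\mathrm{CLQ}_{B \to A}}{\mathrm{CLQ}_{A \to B}} &= \frac{C_{B \to A}}{C_{A \to B}}
  \quad\Longrightarrow\quad \frac{1.9}{0.95} = 2 \ \text{in Fig. 1(1)}.
\end{align*}
```

Under that assumption, the Fig. 1(1) values imply that, in tie-weighted counts, category B points have a category A nearest neighbor twice as often as category A points have a category B nearest neighbor.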
In Fig. 1(2), half of the points of category B have been removed. This new pattern
shares very little in common with Fig. 1(1) because the two figures differ substantially
in how often A is B’s nearest neighbor without B being A’s nearest neighbor.
Now $\mathrm{CLQ}_{A \to B}$ and $\mathrm{CLQ}_{B \to A}$ have the same value: 1.4. These CLQ values
indicate that A and B each have the opposite category as nearest neighbor 40% more
often than would be expected for a random distribution. The cross-category CLQs are
not the only values to change with the removal of category B points: while $\mathrm{CLQ}_{B \to B}$
remains zero, $\mathrm{CLQ}_{A \to A}$ decreases to 0.77. In Fig. 1(2), the As make up a much larger
portion of the population, and the expectation of same-category links increases. However,
as the actual number of same-category nearest neighbors from category A to category
A remains the same, $\mathrm{CLQ}_{A \to A}$ decreases between Figs. 1(1) and 1(2). This
example illustrates how spatial attraction of one categorical subset for another is
influenced not only by the number of observed nearest neighbors but also by the
proportions of the category types within the parent population that influence the
expected number of nearest neighbors.
While the pairwise CLQ matrix provides a means of identifying important categorical
relationships, a global statistic facilitates significance testing. The global
CLQ is defined as the ratio of the observed number of same-category nearest
neighbor pairs to the number expected under the null hypothesis of no spatial
association between categories. This global CLQ is expressed as
$$\mathrm{CLQ}_{\mathrm{Global}} = \frac{\sum_{A \in X} C_{A \to A}}{\sum_{A \in X} N_A \left( \frac{N_A - 1}{N - 1} \right)}. \tag{8}$$
The denominator represents the expected number of same-category nearest
neighbors under the null hypothesis of no spatial association.
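A minimal sketch of equation (8) in Python follows. It assumes the nearest neighbors have already been found and are supplied as `neighbors[i]`, the list of indices tied for nearest to point `i`; that data layout and the function name `global_clq` are hypothetical choices for this example.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def global_clq(labels: Sequence[Hashable], neighbors: Sequence[List[int]]) -> float:
    """Observed same-category nearest neighbor count divided by its expectation.

    neighbors[i] lists the indices tied for nearest to point i, so each point
    contributes at most one (tie-weighted) same-category link to the numerator.
    """
    n = len(labels)
    observed = 0.0
    for i, nn in enumerate(neighbors):
        observed += sum(1 for j in nn if labels[j] == labels[i]) / len(nn)
    # Expected count under random labeling: sum over categories of
    # N_A * (N_A - 1) / (N - 1). This is zero if every category has one member.
    counts = Counter(labels)
    expected = sum(c * (c - 1) / (n - 1) for c in counts.values())
    return observed / expected


# Two well-separated same-category pairs: every nearest neighbor matches.
segregated = global_clq(["A", "A", "B", "B"], [[1], [0], [3], [2]])
# Every nearest neighbor is of the other category.
mixed = global_clq(["A", "A", "B", "B"], [[2], [3], [0], [1]])
print(segregated, mixed)
```

For the four-point example the expected count is 4/3, so the fully segregated labeling gives a global CLQ of about 3 and the fully mixed labeling gives 0.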
To determine which patterns are statistically different from random, we need to
know the likelihood of an observed CLQ occurring within a given spatial pattern if
categorical assignments are random. Research on spatial autocorrelation statistics
such as the join count shows that the assumptions of a simple t-test of the difference
of proportions do not hold and that nonnormality varies with the size of a data set
(Cliff and Ord 1981). We follow recent developments in spatial statistics, such as
the LISA (Anselin 1995, 2003), the network cross-k-function (Okabe and Yamada
2001), and the Ripley’s k-function (Marcon and Puech 2003) in using Monte Carlo
simulation, which makes no assumptions about the expected distribution except
that location behaviors are similar across a study area. In each simulation trial, the
proportion of the total population assigned to each category is held constant, but
these category assignments are randomly redistributed within a population, and
each pairwise CLQ as well as $\mathrm{CLQ}_{\mathrm{Global}}$ is recalculated. After a predetermined
number of permutations, the simulated sample distributions for the pairwise and
global CLQs are used to determine the significance of observed CLQs. Two-tailed
significance is determined by taking the lesser of the number of trials in which the
simulated CLQ was greater than or equal to, or less than or equal to, the observed
CLQ and multiplying by two.
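The random labeling test described above can be sketched as follows. This is a sketch under stated assumptions, not the authors' program: `pairwise_clq` assumes equation (1) has the form $(C_{A \to B}/N_A) / (N'_B/(N-1))$ with $N'_B = N_B - 1$ when $A = B$; `neighbors[i]` is a precomputed list of tied nearest neighbor indices; and dividing the doubled count by the number of trials (capped at 1) to obtain a p-value is an added assumption, since the text states only the count rule.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def pairwise_clq(labels: Sequence[Hashable], neighbors: Sequence[List[int]],
                 a: Hashable, b: Hashable) -> float:
    """CLQ_{a->b} under the assumed form of equation (1)."""
    n = len(labels)
    counts = Counter(labels)
    n_a = counts[a]
    n_b = counts[b] - (1 if a == b else 0)  # a point cannot be its own neighbor
    c_ab = 0.0
    for i, nn in enumerate(neighbors):
        if labels[i] == a:
            c_ab += sum(1 for j in nn if labels[j] == b) / len(nn)
    return (c_ab / n_a) / (n_b / (n - 1))


def random_labeling_p(labels: Sequence[Hashable], neighbors: Sequence[List[int]],
                      a: Hashable, b: Hashable,
                      trials: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed CLQ, two-tailed p) from `trials` random relabelings.

    Locations and neighbor lists stay fixed; only the labels are shuffled, so
    the category proportions are held constant in every trial.
    """
    rng = random.Random(seed)
    observed = pairwise_clq(labels, neighbors, a, b)
    shuffled = list(labels)
    ge = le = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        sim = pairwise_clq(shuffled, neighbors, a, b)
        ge += sim >= observed
        le += sim <= observed
    # Lesser of the two counts, doubled; expressing it as a proportion of
    # trials and capping at 1 are assumptions of this sketch.
    return observed, min(1.0, 2 * min(ge, le) / trials)


# Twenty points on a line, the first ten labeled A and the last ten labeled B.
labels = ["A"] * 10 + ["B"] * 10
neighbors = [[1]] + [[i - 1, i + 1] for i in range(1, 19)] + [[18]]
clq_aa, p_aa = random_labeling_p(labels, neighbors, "A", "A")
print(clq_aa, p_aa)
```

Because the neighbor lists are reused unchanged, each trial costs only a shuffle and one pass over the points.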
Calculation of the CLQ and Monte Carlo simulation to determine significance
were implemented in a Visual Basic .NET (Microsoft Corp., Redmond, WA) standalone
program. A spatial index was created to facilitate efficient computation of
nearest neighbors, which is performed in $O(n \log n)$ time including index creation
(Friedman, Bentley, and Finkel 1977), where $n$ is the number of points in the population.
Each Monte Carlo simulation requires an $O(n)$ allocation of categories
among the existing points but not recomputation of nearest neighbors. Therefore,
the overall computational cost is the greater of $O(n \log n)$ and
$O(mn)$, where $m$ is the number of simulations.
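The text does not say which spatial index the authors' program uses. As one possibility, the sketch below builds a k-d tree, the structure described by Friedman, Bentley, and Finkel (1977), in plain Python and queries it once per point. The names `Node`, `build`, `nearest`, and `nearest_neighbor_indices` are hypothetical, and for brevity the search returns a single nearest neighbor and ignores ties.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Node:
    """One k-d tree node: a point index, its split axis, and two subtrees."""

    def __init__(self, index: int, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.index, self.axis, self.left, self.right = index, axis, left, right


def build(points: Sequence[Point], idxs: List[int], depth: int = 0) -> Optional[Node]:
    """Build a balanced 2-D tree by median split, alternating x and y.

    Re-sorting at every level costs O(n log^2 n); a presorted build would
    reach O(n log n), but this keeps the sketch short.
    """
    if not idxs:
        return None
    axis = depth % 2
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return Node(idxs[mid], axis,
                build(points, idxs[:mid], depth + 1),
                build(points, idxs[mid + 1:], depth + 1))


def nearest(points: Sequence[Point], node: Optional[Node], q: int,
            best: Tuple[float, int] = (math.inf, -1)) -> Tuple[float, int]:
    """Return (distance, index) of the point nearest to points[q], excluding q."""
    if node is None:
        return best
    if node.index != q:
        d = math.dist(points[q], points[node.index])
        if d < best[0]:
            best = (d, node.index)
    diff = points[q][node.axis] - points[node.index][node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(points, near, q, best)
    # The far side can hold a closer point only if the splitting line is
    # no farther away than the best distance found so far.
    if abs(diff) <= best[0]:
        best = nearest(points, far, q, best)
    return best


def nearest_neighbor_indices(points: Sequence[Point]) -> List[int]:
    """Nearest neighbor index for every point; computed once, reused per trial."""
    root = build(points, list(range(len(points))))
    return [nearest(points, root, q)[1] for q in range(len(points))]
```

The result of `nearest_neighbor_indices` depends only on the point locations, so it can be computed once and shared by all simulation trials, which is what keeps the per-trial cost at $O(n)$.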
Finally, the ability of the CLQ to work directly with point-based data sets is a
substantial advantage because it does not require the use of predefined areal units
that were devised for purposes outside the identification of the phenomenon in
question, such as traffic analysis zones to examine metropolitan employment (Giuliano
and Small 1991) or municipal townships to analyze ecological data (Cogbill,
Burk, and Motzkin 2002). Point data do not have the typical location quotient
sensitivity to scale (Mulligan and Schmidt 2005). However, caution is warranted
when applying the CLQ to categories with a small count. We do not recommend
the use of this method when a category has fewer than 10 individuals because the
power of the test will be low. Aggregating categories, if theoretically appropriate,
would likely solve this problem.
Two applications of the CLQ
To illustrate the proposed metric, we determined global and pairwise CLQs for two
data sets from very different domains. The first data set consists of 36,909 business
establishments in the Phoenix metropolitan region, classified according to economic
sector. The second data set consists of 368,122 trees tallied in a 50 ha plot on
Barro Colorado Island in the Panama Canal, classified according to health and
health-related physiognomic characteristics. For each data set, we ran 10,000 simulations
to determine significance values for the global and pairwise CLQs. Global
CLQs for both data sets are strongly positive and highly significant ($P < 0.001$).
Establishments in Phoenix
Which types of businesses locate near one another? Do establishments colocate
near establishments in their own sector or in different sectors? While this topic is
addressed at the intersectoral level (no same-category associations) in Leslie and
Ó hUallacháin (2006), the addition of same-category associations to the intersectoral
links provides a comparison and a more in-depth look at the postmodern
metropolitan area of Phoenix. Phoenix is a flat city with few natural barriers, developed
around the automobile, and decentralized for suburbanizing residents
(Gammage 2003). Intersectorally, in 2002 Phoenix had three groupings: secondary
sector, wholesale and transportation, and administrative support; finance, insurance,
and producer services; and retailers with entertainment and accommodation
establishments (Leslie and Ó hUallacháin 2006). In the Leslie and Ó hUallacháin
(2006) analysis, the effect of same-category associations is mentioned but not
investigated.
A point-level establishment data set created by the Maricopa Association of
Governments (MAG) is used here. As a regional planning alliance, MAG conducts a
regular survey of nongovernmental employers in the region, most recently in 2004.
Business category groupings identified by this survey appear in Table 1. Maps
of these data appear in Leslie and Ó hUallacháin (2006). The data are spatially
clustered and have a nearest neighbor R-value of 0.24 (significantly nonrandom at
the 0.01 level). Conducting a cross-k-function analysis on this data set would likely
find many pairwise categorical associations simply because the data set itself is
highly clustered.
Table 1 Descriptive Categorical Information for Phoenix Economic Analysis, 2004
NAICS   Sector                                                         N
11–23   Agriculture, mining, utilities, construction               3,705
31–33   Manufacturing                                              3,021
42–43   Wholesale trade                                            2,732
44–45   Retail                                                     5,279
48–49   Transport                                                    802
51      Information                                                  786
52      Finance and insurance                                      1,931
53      Real estate                                                1,608
54–55   Professional, scientific, technical, management services   3,631
56      Administrative support                                     1,954
61      Education                                                  1,185
62      Health care and social assistance                          3,045
71      Arts, entertainment, and recreation                          622
72      Accommodation and food services                            3,199
81      Other services                                             2,886
92      Public administration                                        519
Total                                                             36,905
As noted, the global CLQ is strongly positive ($\mathrm{CLQ}_{\mathrm{Global}} = 2.53$). This finding is
highlighted by the presence of same-category pairwise CLQs (Table 2, diagonal)
that are significant and greater than one, which indicates that businesses of all
categories have strong preferences for colocating with other businesses of the same
category. This effect is strongest in public administration establishments (NAICS
92), which are 14 times more likely to have a nearest neighbor of the same category
than would be expected for a random distribution. This effect is weakest for administrative
support establishments (NAICS 56), which are just one-and-a-half
times more likely to have another administrative support establishment as their
nearest neighbor. Same-category values indicate that most establishment types are,
in general, two-and-a-half times more likely to locate next to a similar establishment
than would be expected from a random mixture.
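The same-category values can also be read back as nearest neighbor shares. As a rough worked example, assuming equation (1) takes the form $\mathrm{CLQ}_{A \to A} = (C_{A \to A}/N_A) / ((N_A - 1)/(N - 1))$ and using the counts in Table 1 ($N = 36{,}905$), the observed share of category A points whose nearest neighbor is also in category A is the CLQ multiplied by the expected share:

```latex
\begin{align*}
\frac{C_{A \to A}}{N_A} &= \mathrm{CLQ}_{A \to A} \cdot \frac{N_A - 1}{N - 1}, \\
\text{NAICS 92:}\quad & 14.39 \times \frac{519 - 1}{36{,}905 - 1} \approx 0.20, \\
\text{NAICS 56:}\quad & 1.52 \times \frac{1{,}954 - 1}{36{,}905 - 1} \approx 0.08.
\end{align*}
```

The value 1.52 is taken here to be the diagonal entry for NAICS 56 in Table 2, which agrees with the "one-and-a-half times" in the text; because the flattened table does not preserve empty cells, that reading is an assumption. On these figures, roughly one in five public administration establishments has another public administration establishment as its nearest neighbor, although the sector makes up only about 1.4% of the data set.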
A sector-by-sector inspection reveals that, in general, location preferences do
tend to be symmetric and reveal substantial category groupings. The classic grouping
of natural resources and subsequent processing is present. The primary sector
(agriculture, mining, utilities, and construction [NAICS 11–23]), manufacturing
sector (NAICS 31–33), wholesale sector (NAICS 42–43), and transportation and
warehousing sector (NAICS 48–49) all have high preferences for each other, with
the unexpected exception that the wholesale sector and transportation and warehousing
sector have no associative links with the primary sector. The primary sector
also has a mutually high CLQ with administrative support (NAICS 56) and education
(NAICS 61), although the manufacturing, wholesale, and transportation and
warehousing sectors do not. The retail sector (NAICS 44–45) has CLQs significantly less
than one for almost every category except itself and the accommodation and food
services sector (NAICS 72), likely a result of the placement of Phoenix retail establishments
in strip and shopping malls. Other services (NAICS 81) tend to have
retail as a nearest neighbor more often than expected, although the reverse is not
true. Producer services also form a grouping: information (NAICS 51), finance and
insurance (NAICS 52), and professional, scientific, technical, and management
services (NAICS 54–55) have mutually reciprocated high CLQs with each other.
Left out of this mix is the real estate sector (NAICS 53), which has strong location
preferences only with the finance and insurance sector but not with information or
professional, scientific, technical, and management services. Professional, scientific,
technical, and management services are mutually close to administrative support,
but information and finance and insurance services do not share this
association. Finally, accommodation and food services have a link with other services,
also likely to be a result of colocation in strip malls throughout the metropolitan
area.
Table 2 CLQs for Phoenix Establishments by NAICS, 2004
NAICS 11–23 31–33 42–43 44–45 48–49 51 52 53 54–55 56 61 62 71 72 81 92 GEO MAX
11–23 2.16 1.25 1.20 0.61 0.74 0.86 1.34 1.36 0.56 0.44 0.63 49.80
31–33 1.33 2.59 2.19 0.69 1.60 0.42 0.69 0.75 0.56 0.35 0.49 0.48 0.77 0.49 61.08
42–43 2.28 2.33 0.73 1.48 0.61 0.74 0.75 0.65 0.43 0.56 0.56 0.75 0.57 67.54
44–45 0.53 0.62 0.64 2.41 0.61 0.49 0.82 0.82 0.48 0.64 0.52 0.58 1.70 0.51 34.95
48–49 1.34 1.90 1.68 0.72 5.11 0.43 0.58 0.44 0.52 0.70 230.07
51 0.76 0.73 2.27 1.7 1.63 0.69 0.75 234.76
52 0.52 0.45 0.55 0.75 0.31 1.9 3.31 1.38 1.73 1.30 0.61 0.64 95.56
53 0.53 0.73 1.37 1.68 1.81 114.75
54–55 0.88 0.67 0.76 0.55 0.47 1.69 1.87 2.19 1.5 0.76 0.60 0.76 50.82
56 1.18 0.82 0.72 1.39 1.52 1.63 0.70 0.63 94.43
61 1.29 0.61 0.60 0.71 0.69 1.37 3.52 1.65 0.64 155.71
62 0.47 0.42 0.41 0.67 0.47 0.69 0.81 0.72 4.19 0.74 60.60
71 0.61 0.61 1.5 2.48 1.34 1.38 296.66
72 0.35 0.41 0.45 1.73 0.45 0.70 0.64 0.62 0.67 0.74 2.60 1.29 0.56 57.68
81 0.72 0.86 1.20 0.81 0.70 0.85 1.19 1.77 0.57 63.94
92 0.73 0.42 0.60 0.42 0.49 14.39 355.53
NUM MAX 9.96 12.22 13.51 6.99 46.01 46.95 19.11 22.95 10.16 18.89 31.14 12.12 59.33 11.54 12.79 71.11
Note: Values indicate the likelihood of a point’s nearest neighbor belonging to the column category given that the point belongs to the row
category, as compared with a random distribution ($\mathrm{CLQ}_{\text{row} \to \text{column}}$). Only colocation values significantly different from one at the 0.05 level or
below are shown. Global CLQ = 2.53, $P = 0.001$.
Important asymmetries are present, aside from those previously mentioned. The
primary and other services sectors both have a large number of asymmetric associations.
Public administration (NAICS 92) also has several of these asymmetric associations.
Asymmetries appear to reflect not supply-chain mechanics but rather
category sets where one sector may be the basis of a cluster (such as a medical
[NAICS 62], retail, or government complex) and other establishment types are arrayed
around it. This pattern causes high levels of same-category associations, with asymmetries
in the support services (administrative support, other services). Some sectors
appear to have very few colocation requirements in either direction. The public administration
and arts, entertainment, and recreation (NAICS 71) sectors have only
three or four significant associations outside of their same-sector preferences.
Tree conditions on Barro Colorado Island
Data for the Barro Colorado Island example were obtained from the Center for
Tropical Forest Science (Hubbell, Condit, and Foster 2005). Previously connected
with the larger forest, this island was formed when the surrounding area was
flooded during the construction of the Panama Canal in the early 20th century. All
trees within the plot have been tallied since 1982. The most recent data, collected
in 2005, include observations about health and growth conditions. Live trees are
noted if they are buttressed (i.e., have enlarged trunks at the base), multistemmed,
or leaning significantly from a vertical position, or if the main trunk is broken below
the crown. Buttressed trees have noticeably widened trunks near the ground, which
may be caused by wet or unstable soil or internal rot. In addition, trees that died
since the previous census are noted as still standing, down, or missing. More than
one condition is recorded for many trees; in these cases, we used the most severe
condition for analysis. In this manner, we assigned trees to one of eight possible
categories (Table 3).
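The reduction of multiple recorded conditions to a single category can be sketched as a ranked lookup. The ordering below is an assumption: most of it is inferred from the exclusions written into the Table 3 definitions (for example, "Leaning, but not broken or dead"), while the relative order of Downed and Missing is not stated anywhere in the text and is chosen arbitrarily. The names `SEVERITY`, `RANK`, and `most_severe` are hypothetical.

```python
from typing import Iterable, List

# Least to most severe. Inferred from the Table 3 definitions; the placement
# of "Downed" relative to "Missing" is a guess made only for this sketch.
SEVERITY: List[str] = [
    "Normal", "Multistem", "Buttressed", "Leaning",
    "Broken", "Dead", "Downed", "Missing",
]
RANK = {name: rank for rank, name in enumerate(SEVERITY)}


def most_severe(conditions: Iterable[str]) -> str:
    """Return the single analysis category for one tree.

    A tree with no recorded condition is treated as "Normal"; otherwise the
    highest-ranked condition wins.
    """
    recorded = list(conditions)
    if not recorded:
        return "Normal"
    return max(recorded, key=RANK.__getitem__)


print(most_severe(["Buttressed", "Leaning"]))  # Leaning
print(most_severe([]))                         # Normal
```

A different severity ordering would only require editing the `SEVERITY` list; the selection rule itself stays the same.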
Several hypotheses relating to spatial patterns of growth and mortality naturally
arise from these data. For example, one hypothesis is that all types of mortality are
colocated. An alternative hypothesis is that wind-controlled mortality events exhibit
a spatial pattern distinct from senescence, disease, and other types of mortality
that leave a tree standing. Other hypotheses relate to the causes of certain morphological
characteristics. For example, buttressing and multistemmed growth may be
the result of favorable growing conditions or, conversely, a response to adverse
conditions. Spatial patterns of colocation provide both direct and indirect evidence
for these types of hypotheses.
Table 3 Tree Condition Categories Used in Analysis of the Barro Colorado Data
Condition   Definition                                                 Count
Normal      No special code recorded                                 175,077
Buttressed  Buttressed to at least 1.3 m, but not leaning or broken    2,462
Multistem   Multistemmed plant, not buttressed                        20,542
Leaning     Leaning, but not broken or dead                           12,757
Broken      Broken above 1.3 m, but not dead                          11,816
Dead        Dead, not downed or missing                               86,491
Downed      Dead, trunk lying on ground                                7,408
Missing     Tree recorded in earlier survey, but not found in 2005    51,569
Note: Categories were aggregated from the original data, in which a single tree could be
assigned multiple codes.
Fig. 2 portrays a portion of the data. The nearest neighbor statistic indicates that
the overall spatial pattern is slightly clustered ($R = 0.982$; $P < 0.01$). By itself, this
minor departure from randomness is not a great concern. However, careful examination
of Fig. 2 reveals that the pattern is more nuanced: apparent clustering is the
result of large gaps containing few or no trees, likely caused by the presence of
roads, rivers, or recent disturbances. If these gaps are excluded, the tree pattern in
the remaining areas is likely to exhibit a slightly dispersed pattern. When the CLQ
matrix is computed for these data, several significant patterns stand out.¹
First, same-category CLQs are significantly positive for seven of the eight categories.
This tendency for trees of the same category to colocate results in a weak
but highly significant global CLQ ($\mathrm{CLQ}_{\mathrm{Global}} = 1.13$). Second, patterns of colocation
suggest common disturbances, perhaps from wind: live trees that are leaning or
broken are strongly colocated with each other and also with downed dead trees.
The strongest measures of same-category spatial autocorrelation also occur among
these three categories, which also are somewhat colocated with missing trees but
not with standing dead trees. Standing dead trees are not colocated with any other
category, a pattern that suggests that most deaths are caused by senescence or local
disease outbreaks, rather than by windthrow. Also, both buttressing and multistem
growth appear to be negatively associated with disturbance. Indeed, the absence
of buttressed trees is notable in the vicinity of leaning and downed trees and
of standing dead trees. Even normal live trees show a slight tendency toward colocation
with buttressed and multistem trees. Consequently, healthy trees living in
fertile, sheltered locations appear to be more capable of growing multiple stems
and buttressed bases.
Figure 2. Portion of the Barro Colorado tree data set.
Although most relationships in this example are symmetrical, a notable asymmetry
exists in relationships that involve buttressed trees. Leaning and downed trees
avoid locating near buttressed trees ($\mathrm{CLQ}_{\text{leaning} \to \text{buttressed}} = 0.56$, $P < 0.01$;
$\mathrm{CLQ}_{\text{downed} \to \text{buttressed}} = 0.71$, $P = 0.04$). In contrast, buttressed trees locate near
leaning and downed trees only slightly less than would be expected by chance
($\mathrm{CLQ}_{\text{buttressed} \to \text{leaning}} = 0.91$, $P = 0.45$; $\mathrm{CLQ}_{\text{buttressed} \to \text{downed}} = 0.95$, $P = 0.78$).
This finding suggests that buttressed trees tend to exclude other trees and prefer
isolation.
Final considerations
The CLQ is a reenvisioning of spatial association at the point level that provides an
overall and pairwise method of describing degrees of spatial association. The pairwise
information is descriptive of the strength of a relationship and has a natural
interpretation as a multiplicative factor on probability of occurrence. Comparison
of bidirectional CLQ pairs further provides information about the potential asymmetry
among types of points. Each pair’s linkages are described simply, and the
final results describe spatial associations after controlling for the clustering level of
an overall data set, an improvement over cross-k-function analysis. Finally, the
addition of same-category associations, the ability to calculate properly with multiple
points in the same location, and a quantification of statistical significance are
substantial improvements over the NEAR statistic of Leslie and Ó hUallacháin (2006). The
CLQ statistic is an extension of work by Dixon (1994, 2002) in its evaluation
of maximum ratio values and the production of a comparison variable that has a
semantic meaning.
While one of our goals for the CLQ is to capture asymmetrical relationships,
over the course of developing the CLQ we found that the definition of asymmetry is
more nuanced than we had at first anticipated. The CLQ specifically captures
asymmetry in spatial configuration, which occurs when the nearest neighbor of the
nearest neighbor of a point is not the original point. The CLQ describes how much more
or less often certain categories locate near one another than they would under a random
distribution. It also supports discovery of asymmetrical relationships in which one category appears to
have a greater chance of locating next to category X, while category X points do not
reciprocate, which can lead to meaningful insights about the underlying pattern.
Application of the CLQ to two very different data sets illustrates its general
applicability. Empirically, pairwise CLQs reveal three types of information, which
are similar for both data sets. First, significant same-type autocorrelation is strong in
both data sets. Both business establishments in Phoenix and trees on Barro Colorado
Island are more likely to have neighbors of the same sector or category than
indicated by a random distribution. Second, symmetrical associations between
categories reveal major groupings or data clusters. In Phoenix, two major groupings
are revealed: one of establishments in resource- and transportation-based sectors,
and a second of high-level services involving information, finance and insurance,
and professional, scientific, technical, and management services, though not the real
estate sector. On Barro Colorado Island, two major groupings are also revealed: one
of healthy, buttressed, and multistemmed trees, and a second of leaning, broken,
downed, and missing trees, but not standing dead trees. Some categories, such as
retail in Phoenix and standing dead trees on Barro Colorado Island, do not show such
group affinities as they have relatively weak or no positive colocation tendencies
with any other category. Third, asymmetries in pairwise CLQs reveal inequities in the
degree of influence exerted by each category. In Phoenix, asymmetries surprisingly
do not appear to follow supply-chain mechanics but instead indicate that certain
industries form core clusters with a mix of other sector types around this core. On
Barro Colorado Island, asymmetry suggests that the forces that cause trees to buttress
themselves at the base also isolate these trees from others.
The CLQ appears to be robust and stable. We propose that the CLQ be used to
examine a point data set consisting of multiple subcategories of a single type of entity,
such as trees or businesses. The cross-k-function, in contrast, should be used to
quantify the relationship between conceptually distinct entity types whose joint population
does not form a semantically meaningful unit. The CLQ should be used in
place of the cross-k-function in situations where clustering of a joint population could
confound results, though we realize that the distinction between conceptual category
types and subcategories is not always so clear-cut. In many real-world situations, the
joint population shares traits that suggest similar spatial distributions, and the purpose
of analysis is to identify pairwise categorical relationships beyond those expected
from a joint population. In the realm of human geography, the spatial patterns of
cities, homes, businesses, and political institutions are controlled by the overall population
distribution and transportation infrastructure, and exhibit tendencies to locate
in varying degrees of proximity to other human activity centers. Similarly, natural
categories, such as lichen, birds, and igneous rocks, have distinct spatial patterns that
can be further decomposed into subcategories that might exhibit varying degrees of
spatial correlation with each other. In these cases, the CLQ can reveal interesting
patterns of association that may shed light on underlying processes of attraction,
repulsion, dependency, and resource requirements.
Software
A BSD-licensed software tool to calculate the CLQ for any point shapefile is available
at http://seg.gmu.edu/clq (accessed April 18, 2011).
Note
1 A table of CLQ results and significance is available from the author.
References
Anselin, L. (1995). ‘‘Local Indicators of Spatial Association—LISA.’’ Geographical Analysis
27, 93–115.
Anselin, L. (2003). GeoDa 0.9 User’s Guide. Urbana-Champaign, IL: Spatial Analysis
Laboratory, University of Illinois.
Blair, J. (1995). Local Economic Development: Analysis and Practice. London: Sage.
Ceyhan, E. (2008). ‘‘On the Use of Nearest Neighbor Contingency Tables for Testing Spatial
Segregation.’’ Environmental Ecological Statistics 17, 247–82.
Cliff, A. D., and J. K. Ord. (1981). Spatial Processes: Models and Application. London: Pion.
Clifford, P., S. Richardson, and D. Hemon. (1989). ‘‘Assessing the Significance of the
Correlation between Two Spatial Processes.’’ Biometrics 45, 123–34.
Cogbill, C. V., J. Burk, and G. Motzkin. (2002). ‘‘The Forests of Presettlement New England,
USA: Spatial and Compositional Patterns Based on Town Proprietor Surveys.’’ Journal of
Biogeography 29, 1279–304.
Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: Wiley.
Dacey, M. F. (1965). ‘‘A Review of Measures of Contiguity for Two and K-Color Maps.’’ In
Spatial Analysis: A Reader in Statistical Geography, 479–95, edited by B. J. L. Berry and
D. F. Marble. Englewood Cliffs, NJ: Prentice-Hall.
Dale, R. T. (1999). Spatial Pattern Analysis in Plant Ecology. Cambridge: Cambridge
University Press.
David, F. N. (1971). ‘‘Measurement of Diversity: Multiple Cell Contents.’’ In Proceedings of
the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 4: 109–36.
Berkeley: University of California Press.
de Smith, M. J., M. F. Goodchild, and P. A. Longley. (2009). Geospatial Analysis: A
Comprehensive Guide to Principles, Techniques and Software Tools. Leicester, U.K.:
Matador.
Dixon, P. M. (1994). ‘‘Testing Spatial Segregation Using a Nearest-Neighbor Contingency
Table.’’ Ecology 75, 1940–48.
Dixon, P. M. (2002). ‘‘Nearest-Neighbor Contingency Table Analysis of Spatial Segregation
for Several Species.’’ Ecoscience 9, 142–51.
Dutilleul, P. (1993). ‘‘Modifying the t Test for Assessing the Correlation between Two Spatial
Processes.’’ Biometrics 49, 305–14.
Dyer, J. M. (2006). ‘‘Revisiting the Deciduous Forests of Eastern North America.’’ Bioscience
56, 341–52.
Friedman, J. H., J. L. Bentley, and R. A. Finkel. (1977). ‘‘An Algorithm for Finding Best
Matches in Logarithmic Expected Time.’’ ACM Transactions on Mathematical Software
3, 209–26.
Galiano, E. F. (1986). ‘‘The Use of Conditional Probability Spectra in the Detection of
Segregation between Plant Species.’’ Oikos 46, 132–38.
Gammage, G. (2003). Phoenix in Perspective: Reflections on Developing the Desert. Tempe,
AZ: Herberger Center for Design Excellence, College of Architecture and Environmental
Design, Arizona State University.
Giuliano, G., and K. A. Small. (1991). ‘‘Subcenters in the Los Angeles Region.’’ Regional
Science and Urban Economics 21, 163–82.
Griffith, D. G. (2010). ‘‘The Moran Coefficient for Non-Normal Data.’’ Journal of Statistical
Planning and Inference 140, 2980–90.
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. New York: Cambridge
University Press.
Hubbell, S. P., R. Condit, and R. B. Foster. (2005). Barro Colorado Forest Census Plot Data.
Available at http://www.stri.si.edu/ (accessed August 27, 2010).
Iyer, K. (1949). ‘‘The First and Second Moments of Some Probability Distributions Arising
from Points on a Lattice, and Their Applications.’’ Biometrika 36, 135–41.
Kavousi, A., M. R. Meshkani, and M. Mohammadzadeh. (2010). ‘‘Spatial Analysis of Auto-
Multivariate Lattice Data.’’ Statistical Papers (doi 10.1007/s00362-009-0302-0).
Available at http://www.springerlink.com/content/y566158255816717/ (accessed April
18, 2011).
Lee, S. (2001). ‘‘Developing a Bivariate Spatial Association Measure: An Integration of
Pearson’s r and Moran’s I.’’ Journal of Geographical Systems 3, 369–85.
Leslie, T. F., and B. Ó hUallacháin. (2006). ‘‘Polycentric Phoenix.’’ Economic Geography 82,
167–92.
Marcon, E., and F. Puech. (2003). ‘‘Evaluating the Geographic Concentration of Industries
Using Distance-Based Methods.’’ Journal of Economic Geography 3, 409–28.
Mulligan, G. F., and C. Schmidt. (2005). ‘‘A Note on Localization and Specialization.’’
Growth and Change 36, 565–76.
Ó hUallacháin, B., and T. F. Leslie. (2005). ‘‘Spatial Convergence and Spillovers in American
Invention.’’ Annals of the Association of American Geographers 95, 866–86.
Okabe, A., and I. Yamada. (2001). ‘‘The K-Function Method on a Network and Its
Computational Implementation.’’ Geographical Analysis 33, 271–90.
Ord, J. K. (1990). ‘‘Statistical Methods for Point Pattern Data.’’ In Spatial Statistics: Past,
Present, and Future, 31–35, edited by D. Griffith. Ann Arbor, MI: Institute of
Mathematical Geography.
Rukhin, A. L., and R. Vallejos. (2008). ‘‘Codispersion Coefficients for Spatial and Temporal
Series.’’ Statistics and Probability Letters 78, 1290–300.
Stevens, P. H., and D. G. Jenkins. (2000). ‘‘Analyzing Species Distributions among
Temporary Ponds with a Permutation Test Approach to the Join-Count Statistics.’’
Aquatic Ecology 34, 91–99.
Stimson, R. J., R. R. Stough, and B. H. Roberts. (2006). Regional Economic Development:
Analysis and Planning Strategy. Berlin: Springer.
Tobler, W. (1970). ‘‘A Computer Movie Simulating Urban Growth in the Detroit Region.’’
Economic Geography 46, 234–40.
Upton, G., and B. Fingleton. (1985). Spatial Data Analysis by Example. Vol. 1, Point Pattern
and Quantitative Data. Chichester: Wiley.
Vallejos, R. (2008). ‘‘Assessing the Association between Two Spatial or Temporal
Sequences.’’ Journal of Applied Statistics 35, 1323–43.
Wartenberg, D. (1985). ‘‘Multivariate Spatial Correlation: A Method for Exploratory
Geographical Analysis.’’ Geographical Analysis 17, 263–83.
... Location quotient Leslie et al. (2011) [50] Present the colocation quotient (CLQ) to quantify spatial association between categories of a population. Present the first analysis of spatial patterns and directional spatial associations between six medical resources across Wuhan city by POI data and LCLQ method. ...
... Location quotient Leslie et al. (2011) [50] Present the colocation quotient (CLQ) to quantify spatial association between categories of a population. Present the first analysis of spatial patterns and directional spatial associations between six medical resources across Wuhan city by POI data and LCLQ method. ...
... However, due to the agglomeration effect, points in some regions may aggregate, and a global unified threshold may misjudge the spatial proximity relationship, leading to inaccurate mining results. The asymmetry of interactions between spatial elements and agglomeration effects are two important issues in mining spatial association rules [50]. ...
Article
Full-text available
Spatial association rule mining can reveal the inherent laws of spatial object interdependence and is an important part of spatial data mining. Most of the existing algorithms for mining local spatial association rules are oriented towards the spatial association between two categories of points and cannot fully reflect the spatial heterogeneity of complex spatial relations among multiple categories of points. In addition, the interactions between points in different categories are often asymmetrical. However, the existing algorithms ignore this asymmetry. To address the above problems, an algorithm for mining local spatial association rules for point data of multiple categories based on position quotients is proposed. First, the proximity relationship between points is determined by an adaptive filter, and the spatial weight value is given according to Gaussian kernel function. Then, the multivariate local colocation quotient of each point is calculated to measure the strength of the local regional spatial association rule. Finally, the Monte Carlo simulation function is used to generate a random sample distribution to test the significance of the results. The algorithm is verified on artificial simulation data and real Point of Interest (POI) data. The experimental results show that the algorithm can identify significant association regions of different spatial association rules for point sets.
... We have shown that the subcellular distribution of RNA is highly structured with RNAforest. As such, we developed RNAcoloc, an approach that combines the Colocation Quotient (CLQ) [53] metric and tensor decomposition for context-specific RNA colocalization (see the "Methods" section). The CLQ is a colocalization statistic that is capable of accounting for the biophysical properties of RNA spatial distributions. ...
... In the case that A = B, N′_B equals the total number of B transcripts minus 1. N denotes the total number of transcripts in the cell. Following statistical recommendations from the original formulation of the colocation quotient (CLQ), genes with fewer than 10 transcripts were not considered to reduce sparsity and improve testing power [53]. ...
Article
The spatial organization of molecules in a cell is essential for their functions. While current methods focus on discerning tissue architecture, cell–cell interactions, and spatial expression patterns, they are limited to the multicellular scale. We present Bento, a Python toolkit that takes advantage of single-molecule information to enable spatial analysis at the subcellular scale. Bento ingests molecular coordinates and segmentation boundaries to perform three analyses: defining subcellular domains, annotating localization patterns, and quantifying gene–gene colocalization. We demonstrate MERFISH, seqFISH+, Molecular Cartography, and Xenium datasets. Bento is part of the open-source Scverse ecosystem, enabling integration with other single-cell analysis tools.
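The quantities named in the excerpt above (the total count N and the count of B reduced by one when A = B) fit the nearest-neighbor form of the global CLQ, which can be sketched as follows. The sketch assumes that form: the observed share of A points whose nearest neighbor is a B point, divided by the share expected under random labeling, N′_B / (N − 1). The function names and the choice to ignore ties in nearest-neighbor distance are assumptions made for illustration, and this is not the code of any cited toolkit.

```python
import math

def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    best_j, best_d = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)
        if d < best_d:
            best_j, best_d = j, d
    return best_j

def global_clq(points, labels, a, b):
    """Global colocation quotient of category a toward category b.

    Assumption: ties in nearest-neighbor distance are ignored (the first
    nearest point found is used).
    """
    n = len(points)
    n_a = labels.count(a)
    n_b = labels.count(b)
    # A point cannot be its own neighbor, so drop one when a == b.
    n_b_prime = n_b - 1 if a == b else n_b
    hits = sum(
        1
        for i in range(n)
        if labels[i] == a and labels[nearest_neighbor(points, i)] == b
    )
    observed = hits / n_a            # share of a-points with a b nearest neighbor
    expected = n_b_prime / (n - 1)   # share expected under random labeling
    return observed / expected
```

Because the numerator counts only the neighbors of A points, `global_clq(points, labels, "A", "B")` and `global_clq(points, labels, "B", "A")` can differ, which is the asymmetry the metric is designed to expose.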
... Existing spatial association analysis methods can be divided into two classes: spatial statistics and data mining approaches. In spatial statistics approaches, the cross K-function (Cressie 2015) is frequently used for two independent distributions, and the colocation quotient (Leslie and Kronenfeld 2011) is suitable for detecting colocation patterns in categories of a population. For spatial data with attributes, Haining (1991) introduced the Clifford-Richardson approach to adjust bivariate correlation (Pearson coefficient and Spearman rank correlation coefficient) with spatial data; Lee (2001) developed a bivariate spatial association measure by integrating Pearson's r and Moran's I; and Anselin (2019) proposed a local indicator of multivariate spatial association by extending the local Geary c statistic to a multivariate context. ...
Article
Spatial flows represent spatial interactions or movements. Mining colocation patterns of different types of flows may uncover the spatial dependences and associations among flows. Previous studies proposed a flow colocation pattern mining method and established a significance test under the null hypothesis of independence for the results. In fact, the definition of the null hypothesis is crucial in significance testing. Choosing an inappropriate null hypothesis may lead to misunderstandings about the spatial interactions between flows. In practice, the overall distribution patterns of different types of flows may be clustered. In these cases, the null hypothesis of independence will result in unconvincing results. Thus, considering the overall spatial pattern of flows, in this study, we changed the null hypothesis to random labeling to establish the statistical significance of flow colocation patterns. Furthermore, we compared and analyzed the impacts of different null hypotheses on flow colocation pattern mining through synthetic data tests with different preset patterns and situations. Additionally, we used empirical data from ride-hailing trips to show the practicality of the method.
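The random labeling null hypothesis discussed in this abstract can be illustrated with a small Monte Carlo sketch. It is a simplification under stated assumptions: it works on points rather than flows, it uses a hypothetical statistic (`nn_share`, the share of A points whose nearest neighbor is labeled B), and the names and the default of 999 simulations are illustrative choices. Locations stay fixed and only the labels are shuffled, so any clustering of the combined pattern is preserved under the null.

```python
import math
import random

def nn_share(points, labels, a, b):
    """Share of a-points whose nearest neighbor carries label b."""
    hits = total = 0
    for i, p in enumerate(points):
        if labels[i] != a:
            continue
        total += 1
        # Nearest neighbor of point i among all other points.
        j = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: math.dist(p, points[k]),
        )
        if labels[j] == b:
            hits += 1
    return hits / total

def random_labeling_p_value(points, labels, a, b, n_sim=999, seed=0):
    """One-sided pseudo p-value for 'a is attracted to b' under random labeling."""
    rng = random.Random(seed)
    observed = nn_share(points, labels, a, b)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # reassign labels, keep locations and counts fixed
        if nn_share(points, shuffled, a, b) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the observed labeling counts.
    return observed, (at_least + 1) / (n_sim + 1)
```

Under a null hypothesis of independence, by contrast, the locations themselves would be re-simulated, which is why the two nulls can lead to different conclusions when the overall pattern is clustered.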
... While kernel density estimation of the distance distribution [26], K function [41,42], and the colocation quotient [43,44] are noteworthy strands of study, the analysis of colocation patterns has been limited to autocorrelation among a set of points or the correlation between two sets of points. For the analysis of points with multiple types, these approaches can be applied multiple times to all possible pairs of types. ...
Article
The agglomeration effect significantly influences firms’ site selection. Manufacturing firms often exhibit intricate spatial co-location patterns that are indicative of agglomerations due to their reliance on material input and product output across various subdivisions of manufacture. In this study, we present an analytical approach employing the Q statistic and additive color mixing visualization to assess co-location patterns of manufacturing firms. We identified frequent pairs and triplets of manufacturing divisions, mapping them to reveal distinct categories: labor-intensive clusters, upstream/downstream industrial chains, and technology-spillover clusters. These agglomeration categories concentrate in different regions of the city. Policy implications are proposed to promote the upgrade of labor-intensive divisions, enhance the operational efficiency of upstream/downstream industrial chains, and reinforce the spillover effects of technology-intensive divisions.
... A drawback is that they also assume the CSR in the null hypothesis. The colocation quotient (CLQ) resolves this problem by introducing a randomization test (Leslie and Kronenfeld 2011). Cromley et al. (2014) generalize the CLQ using the spatial weight function and propose a local version of CLQ. ...
Article
This paper develops a new method for analyzing the relationship between a set of points and another single point, the latter of which we call a reference point. This relationship has been discussed in various academic fields, such as geography, criminology, and epidemiology. Analytical methods, however, have not yet been fully developed, which has motivated this paper. Our method reveals how the number of points varies by the distance from a reference point and by direction. It visualizes the spatial pattern of points in relation to a reference point, describes the point pattern using mathematical models, and statistically evaluates the difference between two sets of points. We applied the proposed method to analyze the spatial pattern of the climbers of Mt. Azuma, Japan. The result gave us useful and interesting findings, indicating the method’s soundness.
... The co-location pattern refers to the spatial distribution pattern of an event at different scales, used for the analysis of clustering patterns of point sets in space (24). The co-location pattern analysis method will use the Co-location Quotient (CLQ), which was proposed by Timothy (25). There are two key indicators in this method, the Global Co-location Quotient (GCLQ) and the Local Co-location Quotient (LCLQ). ...
Article
Introduction: COVID-19, being a new type of infectious disease, holds significant implications for scientific prevention and control to understand its spatiotemporal transmission process. This study examines the diverse spatial patterns of COVID-19 within Wuhan by analyzing early case data alongside urban infrastructure information. Methods: Through co-location analysis, we assess both local and global spatial risks linked to the epidemic. In addition, we use the Geodetector, identifying facilities displaying unique spatial risk characteristics, revealing factors contributing to heightened risk. Results: Our findings unveil a noticeable spatial distribution of COVID-19 in the city, notably influenced by road networks and functional zones. Higher risk levels are observed in the central city compared to its outskirts. Specific facilities such as parking, residence, ATM, bank, entertainment, and hospital consistently exhibit connections with COVID-19 case sites. Conversely, facilities like subway station, dessert restaurant, and movie theater display a stronger association with case sites as distance increases, hinting at their potential as outbreak focal points. Discussion: Despite our success in containing the recent COVID-19 outbreak, uncertainties persist regarding its origin and initial spread. Some experts caution that with increased human activity, similar outbreaks might become more frequent. This research provides a comprehensive analytical framework centered on urban facilities, contributing quantitatively to understanding their impact on the spatial risks linked with COVID-19 outbreaks. It enriches our understanding of the interconnectedness between urban facility distribution and transportation flow, affirming and refining the distance decay law governing infectious disease risks. Furthermore, the study offers practical guidance for post-epidemic urban planning, promoting the development of safer urban environments resilient to epidemics. It equips government bodies with a reliable quantitative analysis method for more accurately predicting and assessing infectious disease risks. In conclusion, this study furnishes both theoretical and empirical support for tailoring distinct strategies to prevent and control COVID-19 epidemics.
... With the rapid expansion of urban commercial sites and residences, POI data with full-sample characteristics provide a new perspective for quantitative research on the spatial association between the two types of spaces. By adopting the concept of location quotient in economic geography, Leslie et al. [28,29] introduced a quantitative analysis method called the colocation quotient, which measures the directed spatial dependence among different elements. This method allows for the analysis of the spatial association among spatial elements from the perspective of asymmetric spatial dependence and of the spatial heterogeneity of spatial association at the local scale [30]. ...
Article
Identifying the spatial association between commercial sites and residences is important for urban planning. However, (1) the patterns of spatial association between commercial sites and residences across an urban space and (2) how the spatial association patterns of each commercial format and different levels of residences vary remain unclear. To address these gaps, this study used point-of-interest data of commercial sites and residences in Beijing, China, to calculate colocation quotients, which were used for identifying the spatial association characteristics and patterns of commercial sites and residences in the city. The results show that (1) the global colocation quotient of commercial sites and residences in Beijing is below 1, indicating relatively weak spatial association. The spatial association between each commercial format and residences varies greatly and shows the characteristics of integration of high-frequency consumption and separation of low-frequency consumption. Additionally, the spatial associations between high-grade residences and commercial formats are relatively weak, whereas those between low-grade residences and commercial formats are relatively strong. (2) The local spatial association patterns of various commercial formats and residences exhibit obvious spatial heterogeneity. Overall, the proportions of various commercial formats attracted by residences are considerably higher than those of residences attracted by various commercial formats, revealing spatial asymmetry. Within the Fourth Ring Road, commercial formats are mainly attracted by residences, showing a spatial association pattern of “distribute commercial sites according to the location of residences”. The proportions of residences attracted by commercial formats increase outside the Fourth Ring Road, presenting a spatial association pattern of “commercial formats attracting residences”. 
The findings offer valuable insights into the development mechanisms of commercial and residential spaces and provide valuable information for urban planning.
Article
Spatial co‐location pattern (CP) mining can discover sets of geographical features frequently appearing in adjacent locations, which is valuable for comprehending the co‐occurrence relationship between features. However, due to the quantitative differences and heterogeneous distribution of features, the probabilities that features appear in each other's neighborhood are unequal, resulting in an asymmetric spatial pattern. Current studies have paid little attention to the asymmetric characteristics of CPs. Therefore, this study explores the CPs and their asymmetric relationships. Firstly, we adopt the weighted participation index to evaluate the frequency of global candidate CPs. Secondly, we employ an asymmetry index we developed and the local co‐location quotient to quantify the asymmetry intensity of CPs. The results indicate that the frequent CPs mainly comprise facilities related to the residents' daily lives. Investigating the asymmetric relationships and spatial associations among features in the CPs is significant for identifying resource shortages and rationally planning urban resources.
Book
Preface Readership Acknowledgements Introduction Part I. The Context for Spatial Data Analysis: 1. Spatial data analysis: scientific and policy context 2. The nature of spatial data Part II. Spatial Data: Obtaining Data And Quality Issues: 3. Obtaining spatial data through sampling 4. Data quality: implications for spatial data analysis Part III. The Exploratory Analysis of Spatial Data: 5. Exploratory analysis of spatial data 6. Exploratory spatial data analysis: visualisation methods 7. Exploratory spatial data analysis: numerical methods Part IV. Hypothesis Testing in the Presence of Spatial Autocorrelation: 8. Hypothesis testing in the presence of spatial dependence Part V. Modeling Spatial Data: 9. Models for the statistical analysis of spatial data 10. Statistical modeling of spatial variation: descriptive modeling 11. Statistical modeling of spatial variation: explanatory modeling Appendices References Index.
Book
The second edition of this book is completely reedited, making the book even more valuable for graduate students, reflecting recent advances and adding insightful new material. The book is about the analysis of regional economic performance and change, and how analysis integrates with strategies for local and regional economic development policy and planning. First, the book provides the reader with an overview of key theoretical and conceptual contexts within which the economic development process takes place. However, the deliberate emphasis is to provide the reader with an account of quantitative and qualitative approaches to regional economic analysis and of old and new strategic frameworks for formulating regional economic development planning. The second edition brings to the present its original thesis about the need for regions to be fast and flexible, but also to be proactive in order to be prepared to experience increasingly greater shocks while having less time to adjust their economic development to achieve sustainability. This is underscored by events that have occurred since 2001: the 9/11 terrorist attacks, continuing rapid advances in technology, the rise of China and India, the Tsunami, and all the known ongoing and unforeseen risks and challenges that confront nations around the globe and the regions and localities within them. The book presents strategies and the traditional and expanded methods used to create and implement them.
Article
Spatial segregation of species occurs when a species is more likely to be located in the vicinity of conspecifics. This can be investigated by mapping and identifying all locations in a study area, then analyzing the nearest-neighbor contingency table, where each location is classified by its species and the species of its nearest neighbor. Nearest-neighbor contingency tables for two species can be analyzed using the methods in Dixon (1994). Here, I present methods to analyze contingency tables for any number of species. Calculation and interpretation of the multispecies contingency table are illustrated by two examples: spatial segregation of species in a swamp forest, with five types of points (Fraxinus caroliniana, Nyssa sylvatica, Nyssa aquatica, Taxodium distichum, and "other species"), and spatial segregation in the gamodioecious tree Nyssa aquatica, with three types of points (male, female, and juvenile). Two issues that affect the results and their interpretation are the choice of randomization (random labelling or toroidal rotation) and the choice of test (pairwise or multispecies).
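The nearest-neighbor contingency table described in this abstract can be built with a few lines of code. The sketch below only tabulates the counts; it does not implement the tests from the cited work. The function name and the tie-handling rule (the first nearest point found is used) are assumptions made for illustration.

```python
import math
from collections import Counter

def nn_contingency_table(points, labels):
    """Count (label of point, label of its nearest neighbor) pairs.

    Returns a Counter keyed by (own_label, neighbor_label). Missing
    combinations read as 0.
    """
    counts = Counter()
    for i, p in enumerate(points):
        # Nearest neighbor of point i among all other points.
        j = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: math.dist(p, points[k]),
        )
        counts[(labels[i], labels[j])] += 1
    return counts
```

Large counts on the diagonal, such as `("x", "x")`, relative to what random labeling would produce point toward segregation, while large off-diagonal counts point toward association between categories.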
Article
Sun Belt cities have a reputation for sprawling disarray. Although Phoenix is often depicted as the ultimate large fast-growing, low-density Sun Belt metropolis, we found considerable order in the location of business establishments. We tested spatial pattern to show that establishments in a variety of sectors are significantly clustered and found that while clustering declines outside the central business district (CBD) and subcenters, all sectors remain significantly clustered in the suburbs. A new method that we developed to assess spatial relationships of establishments across sectors revealed that spatial intersectoral associations are evident between some intermediate-sector establishments and within final demand. These intersectoral associations mostly carry over to portions of the urbanized area beyond the CBD and subcenters. A cartographic analysis details sectoral locational patterns across the metropolitan area and the relationships between the function of subcenters and the transportation network. We compare the economic structure of the CBD and five subcenters. Phoenix has a distinctively specialized CBD. Some subcenters are functionally diversified, while others are specialized. The rank-size rule is a good approximation of the size order of centers. We conclude that continued forces of accessibility, externality, and regulation shape the spatial structure of Phoenix.
Article
Clifford, Richardson, and Hemon presented modified tests of association between two spatially autocorrelated processes, for lattice and non-lattice data. These tests are built on the sample covariance and on the sample correlation coefficient; they require the estimation of an effective sample size that takes into account the spatial structure of both processes. Clifford et al. developed their method on the basis of an approximation of the variance of the sample correlation coefficient and assessed it by Monte Carlo simulations for lattice and non-lattice networks of moderate to large size. In the present paper, the variance of the sample covariance is computed for a finite number of locations, under the multinormality assumption, and the mathematical derivation of the definition of effective sample size is given. The theoretically expected number of degrees of freedom for the modified t test with renewed modifications is compared with that computed on the basis of equation (2.9) of Clifford et al. The largest differences are observed for small numbers of locations and high autocorrelation, in particular when the latter is present with opposite sign in the two processes. Basic references that were missing in Clifford et al. are given and inherent ambiguities are discussed.
Article
A test, based on the probability of finding a certain species at increasing distances from another species, was used to describe the spatial interactions of plants in a Mediterranean pasture. The test was also applied to a set of models with two artificial species which had different spatial arrangements. Results from the models were satisfactory and helped interpret field results. The grassland species studied displayed different strategies of spatial occupation which depended on environmental conditions. The test proved to be suitable for describing species interactions in a very detailed manner.
Article
Clifford, Richardson, and Hemon (1989, Biometrics 45, 123-134) presented modified tests of association between two spatially autocorrelated processes, for lattice and non-lattice data. These tests are built on the sample covariance and on the sample correlation coefficient; they require the estimation of an effective sample size that takes into account the spatial structure of both processes. Clifford et al. developed their method on the basis of an approximation of the variance of the sample correlation coefficient and assessed it by Monte Carlo simulations for lattice and non-lattice networks of moderate to large size. In the present paper, the variance of the sample covariance is computed for a finite number of locations, under the multinormality assumption, and the mathematical derivation of the definition of effective sample size is given. The theoretically expected number of degrees of freedom for the modified t test with renewed modifications is compared with that computed on the basis of equation (2.9) of Clifford et al. (1989). The largest differences are observed for small numbers of locations and high autocorrelation, in particular when the latter is present with opposite sign in the two processes. Basic references that were missing in Clifford et al. (1989) are given and inherent ambiguities are discussed.