The Colocation Quotient: A New Measure of Spatial Association Between Categorical Subsets of Points

July 2011
Geographical Analysis 43(3):306 - 326

DOI:10.1111/j.1538-4632.2011.00821.x

Authors:

Timothy F Leslie

George Mason University

Barry Kronenfeld

Eastern Illinois University

This article presents a new metric we label the colocation quotient (CLQ), a measurement designed to quantify (potentially asymmetrical) spatial association between categories of a population that may itself exhibit spatial autocorrelation. We begin by explaining why most metrics of categorical spatial association are inadequate for many common situations. Our focus is on where a single categorical data variable is measured at point locations that constitute a population of interest. We then develop our new metric, the CLQ, as a point-based association metric most similar to the cross-k-function and join count statistic. However, it differs from the former in that it is based on distance ranks rather than on raw distances and differs from the latter in that it is asymmetric. After introducing the statistical calculation and underlying rationale, a random labeling technique is described to test for significance. The new metric is applied to economic and ecological point data to demonstrate its broad utility. The method expands upon explanatory powers present in current point-based colocation statistics.

Timothy F. Leslie, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA

Barry J. Kronenfeld, Department of Geology/Geography, Eastern Illinois University, Charleston, IL

Introduction

Geographers have long considered the relationship between the characteristics of

an object and its neighbors. Tobler’s statement relating all things, near more than

far, remains one of the few statements geographers can claim as ‘‘law’’ (Tobler

1970). Tobler’s law applies to qualitative concepts such as culture and ecological

process, as well as to quantified measures like patenting rates and potential evapotranspiration (Galiano 1986; Ó hUallacháin and Leslie 2005). Quantifying spatial

relationships has become a hallmark of geographic analysis.

Correspondence: Timothy F. Leslie, Department of Geography and Geoinformation Science, George Mason University, 4400 University Dr MS 6C3, Fairfax, VA 22030. E-mail: tleslie@gmu.edu

Submitted: April 21, 2009. Revised version accepted: August 24, 2010.

Geographical Analysis 43 (2011) 306–326. © 2011 The Ohio State University. ISSN 0016-7363.

Among the various Tobleresque relationships that are of interest to a geographer, the spatial relationship between distinct populations or distributions is one of the most fundamental. This type of spatial relationship may be denoted by the term

spatial association. Aspects of spatial association are captured in the concepts of

‘‘spatial overlay,’’ ‘‘cross-correlation,’’ and ‘‘colocation’’ (Wartenberg 1985; de

Smith, Goodchild, and Longley 2009). The term spatial association is used else-

where to refer to pattern either within a single population (i.e., spatial autocorre-

lation) or between two or more populations. Here, we confine usage of the term to

the latter situation only, corresponding to what Lee (2001) refers to as ‘‘bivariate

spatial association.’’ In contrast to spatial autocorrelation, analysis of spatial asso-

ciation requires simultaneous consideration of multiple patterns and processes. The

autocorrelative structure of a joint population, and of each distinct subpopulation,

should be considered when selecting a metric for spatial analysis, as these aspects

of pattern may influence the observed association between populations.

Our interest lies in the situation where a single categorical data variable is

measured at point locations that constitute a population of interest. Because the

values of interest are nominal in nature, measures of spatial association developed

for ratio point data, such as the cross-variogram (Vallejos 2008), are not suitable.

Other measures, such as the join count statistic (Cliff and Ord 1981), are typically

applied to polygon rather than point data; this is also true of Moran’s coefficient,

which can be applied to nominal data (Griffith 2010). The most similar measure to

what we propose is the cross-k-function (Cressie 1991), but because it measures

spatial association between two populations, the null hypothesis that it tests is not

appropriate to the situation in which categorized individuals come from a single

population.

To analyze this situation, we develop a generalized method to determine

whether categories within a population are spatially correlated and, if so, how, in

what direction, and by how much. This situation typifies a variety of problems in

both human and physical geography. For example, one might wish to determine the

colocation preferences of businesses of different types within a metropolitan area or

examine the relationship between pairs of tree species in a forest setting in order to

identify possible interspecies relationships.

In either case, the underlying distribution is the result of two conceptually dis-

tinct spatial processes. First, the spatial structure of the overall population may

cause point locations to be clustered or dispersed. Second, nested within the overall

spatial pattern, relationships between categories may result in some categories be-

ing more or less likely to occur near others. Failure to distinguish between these two

hierarchical processes can result in spurious findings. In addition to separating

spatial association between categories from overall population clustering, recog-

nizing that categorical effects may be asymmetric is also important. Asymmetric

relationships in ecology include obligatory predatorism and parasitism, in which a

predator or parasite is confined to locations where the prey or host is found, but

the reverse is not necessarily true. In logistics, businesses further down a supply

chain often are dependent on (and therefore located near) their suppliers, while

suppliers locate based on natural resources and other inputs. Any metric of pairwise

categorical spatial correlation should be able to deal with this asymmetry. In some

cases, points in category A may prefer category B, points in category B may prefer

category C, and points in C may prefer being close to category A. In such cases, a

symmetrical spatial association metric could potentially find “significance” in all or none of the bidirectional pairwise associations it measures (e.g., $A \leftrightarrow B$, $B \leftrightarrow C$, and $C \leftrightarrow A$).

Our metric, the colocation quotient (CLQ), quantifies spatial relationships

between categories by building on the concept of the location quotient used by

geographers and economists to judge a region’s degree of specialization in a

particular industry (Blair 1995; Stimson, Stough, and Roberts 2006). The CLQ is

defined with respect to two categories (e.g., types A and B), and provides a measure

of the degree to which one categorical subset is spatially dependent on the other.

Specifically, $\mathrm{CLQ}_{A \to B}$ measures the degree to which type A events are spatially

attracted to type B events. The CLQ is calculated as a ratio of observed versus ex-

pected points of one type among the set of nearest neighbors of points of another

type. It also may be viewed as a modification of traditional measures of spatial

correlation between categories, including the join count statistic and the cross-k-

function.

In the next section, we explain why these existing metrics of categorical spatial

association are inadequate for many common situations. We then develop a new

statistical measure and a corresponding significance-testing framework. Finally, we

present applications of the metric in socioeconomic and physical geography con-

texts, and then conclude with suggestions for implementation.

Motivation

Categorical variables present an interesting challenge to measuring Tobler’s law due to the multiplicity of relationships. In particular, the presence of multiple subcategories of a single type of entity results in two complicating factors, which are illustrated in Fig. 1. First, the interaction between any given pair of categories often is asymmetrical. In Fig. 1(1), asymmetry results from a unidirectional dependency, in which individuals of category B are found only in close proximity to category A, but individuals of category A may be found in any location independent of the presence of category B. As demonstrated, and despite its similarity to Fig. 1(1), the attraction between A and B in Fig. 1(2) is symmetric; a spatial association metric must be capable of distinguishing between these two patterns. Second, spatial relationships between categories often are confounded by spatial autocorrelation of the joint population. Fig. 1(3) illustrates a situation in which spatial autocorrelation exists in the overall population, but little further categorical association is evident. A metric that cannot distinguish between these two types of correlative processes has only limited practical value.

Figure 1. Illustrations of spatial patterns exhibiting (1, 2) asymmetry in pairwise categorical spatial associations and (3) spatial autocorrelation in the overall population.

In the following section, we review the two most commonly used metrics of

spatial association between categories: the join count statistic and Ripley’s cross-k-

function. We argue that each metric has shortcomings when applied to the afore-

mentioned situations.

Join count statistic

The join count statistic is an area-level measure of spatial association, that is, of

correlation between categories on a k-color map (Dacey 1965). The statistic op-

erates by comparing the number of times a pair of categories occurs in adjacent

positions with the expectation of randomness (Iyer 1949; David 1971; Cliff and Ord

1981). It often is applied to binary grid data, which are typically conceptualized as

a black-and-white checkerboard or as an irregular polygon tessellation, in which

case the statistic becomes a measure of spatial autocorrelation. Links between each

polygon are counted as color-same (black touching black or white touching white)

or color-different (black touching white). The resulting counts are tallied, and they

can be used to determine if the data are significantly autocorrelated. These counts

are compared with the expectations for a binomial random variable under the null hypothesis using a $\chi^2$ distribution.
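The tallying step described above can be sketched in a few lines of Python. This is only an illustration: the rook (edge-sharing) adjacency rule, the function name `join_counts`, and the toy grid are assumptions made for the example, and the significance test is omitted.

```python
def join_counts(grid):
    """Tally joins between edge-adjacent cells of a 0/1 grid.

    Rook adjacency is assumed here. Returns counts of black-black (1-1),
    white-white (0-0), and black-white (1-0) joins, each counted once.
    """
    counts = {"BB": 0, "WW": 0, "BW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that every join is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    # Sum of the two cell values: 0 -> WW, 1 -> BW, 2 -> BB.
                    pair = grid[r][c] + grid[rr][cc]
                    counts[("WW", "BW", "BB")[pair]] += 1
    return counts


toy_grid = [[1, 0],
            [1, 1]]
print(join_counts(toy_grid))  # {'BB': 2, 'WW': 0, 'BW': 2}
```

The observed counts would then be compared with their expectations under random coloring, which is the part of the procedure not shown here.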

Despite being conceptually simple, the join count statistic has not been im-

plemented in most popular geographic information system software packages. In

spatial settings where source data come as points rather than as areal values, com-

putation requires a geometrical association of point pairs. Traditionally, this asso-

ciation is accomplished by drawing Thiessen polygons around each point and

treating the resulting diagram as a polygon tessellation (Upton and Fingleton 1985).

This implementation introduces a certain degree of arbitrariness to the pairing of

points with their “neighbors.” Furthermore, point pairs are defined in a symmetrical manner, so that counts of $A \to B$ and $B \to A$ joins are equal by definition.

The join count statistic, by nature, cannot detect asymmetry such as that shown in

Fig. 1(1).

The emphasis on binary measures also results in the join count statistic

rarely being used to measure categorical spatial association. Although the under-

lying theory for moving the join count statistic to more than two colors is well

developed (Haining 2003), this sort of analysis is rarely done in practice. Instead,

scholars primarily use this statistic to examine the degree of spatial autocorrelation

within individual categories, even when their data contain multiple categories

(e.g., Stevens and Jenkins 2000). Another spatial binary conceptualization, the

autologistic model, measures the likelihood of a point’s neighbor having a value

of one given that the point itself has a value of one (Cressie 1991). The autologistic

model provides a framework for quantifying the probability of occurrence

given neighborhood relations but does not provide theoretical guidance about

how neighborhood relations should be defined. Multivariate developments for the

autologistic model are very recent (Kavousi, Meshkani, and Mohammadzadeh

2010).

Researchers who have applied join count measures to multivariate data have

not implemented an ‘‘overall’’ statistic, so analysis can be done only by examining

each pair of variables separately. The original join count statistic has no need of an

overall statistic: either a binary pair is significantly autocorrelated negatively or

positively, or it is insignificant. In the multiple-category situation, the matrix of

pairwise significance in each pair of categories should be augmented by an overall

count of same–same linkages and an associated significance value, similar to the

work by Dixon (2002) and Ceyhan (2008).

Cross-k-function

The cross-k-function is an extension of Ripley’s k-function for two distributions

(Cressie 1991). It measures the overall density of category B within a prescribed

neighborhood around individuals of category A. This measure is compared with the

overall density of B, which is equivalent to the probability of finding B in a random

area. Results are presented as a graph of clustering over the range of distance radii,

similar to presentations of Ripley’s k.

The cross-k-function is asymmetrical and can account for differences in the

complementary relationships between two categories. However, metrical distance

is used rather than topological neighborhood distance. As a consequence, the

effects of the spatial pattern of an overall population are comingled with the effects

of cross-correlation between categories. The cross-k-function, while providing a

graph of results, can lead to erroneous conclusions because of effects occurring at

multiple scales within a data set. Given the pattern shown in Fig. 1(3), for example,

the cross-k-function would report highly significant positive spatial correlations

between every pair of categories because the density of individuals in each cate-

gory is higher in the vicinity of individuals of any other category. However, if A, B,

and C are businesses and if each cluster represents a city, then this is a trivial result

in most types of analysis because businesses clustering within cities is already well-

known. Of greater interest is the question of whether specific pairs of categories are

more mutually clustered than would be expected given the spatial pattern of a

parent population. A variation of the cross-k-function developed for network situ-

ations partially corrects this problem, but the correction is only applicable for net-

work-based analyses (Okabe and Yamada 2001).

Other related metrics

A number of papers and articles propose or examine methods of analyzing spatial

association between values measured at points. Clifford, Richardson, and Hemon

(1989) and Dutilleul (1993) examine the effect of spatial autocorrelation on the

significance value of the standard correlation coefficient (r) between two geocoded

variables. Wartenberg’s (1985) multivariate spatial correlation expands Moran’s I to examine quantitative multivariate geographic distributions and shows analogies to principal components analysis. Lee (2001) builds on Wartenberg’s (1985) work to decompose Moran’s I into a spatial smoothing scalar and correlation (Pearson’s r)

between spatially lagged (smoothed) values of observed variables, and uses this

decomposition to create a bivariate measure of spatial association. Vallejos (2008)

and Rukhin and Vallejos (2008) use a normalized cross-variogram, treating point

data as samples of a continuous spatial process. These measures all derive from a

conceptualization of two or more ratio variables distributed on a continuous spatial

domain. Although it may be possible to adapt these methods to the measurement

of spatial association between nominal values distributed on a discontinuous

(point) domain, such adaptation is not straightforward and is beyond the scope of

this article.

Other metrics have been developed to describe spatial association between

categories of points, but none handle the problems of asymmetry and nested pattern

correlation. Galiano (1986) calculates conditional probabilities within distance

neighborhoods, similar to the cross-k-function, to study relationships between tree

species. Dale (1999) dismisses Galiano’s conditional probabilities as being equiv-

alent to the paired quadrat covariance, a symmetrical measure of cross-correlation

based on fixed-area quadrats that is affected by spatial pattern in a joint population.

Leslie and Ó hUallacháin (2006) presented the nearest establishment with asymmetrical relationships (NEAR) statistic, a preliminary version of the CLQ, but they

exclude same-category associations, discount multiple observations at a single lo-

cation, and do not provide a basis for understanding the statistical significance of

their results. The need remains for an asymmetrical topological measure that can

work with categorical data constrained by a parent population that may itself be

clustered (or dispersed). The use of nearest neighbors is apposite, as the closest

individuals are generally expected to have the greatest influence (Ord 1990).

Within the field of ecology, a few investigations use nearest neighbor contingency

tables to investigate point patterns (Dixon 1994, 2002; Ceyhan 2008). However,

these investigations lack a solid semantic foundation from which other researchers

can choose when to use their statistics versus other statistics. The CLQ was devel-

oped to fill this need and is explained in the following section.

Method

Although a number of metrics exist to quantify spatial association (Cressie 1991;

Okabe and Yamada 2001; Dixon 2002; Leslie and Ó hUallacháin 2006), a conceptual framework to distinguish these patterns and processes has not been articulated. As an impetus for our methods, we seek to do three things: (1) to consolidate our proposed analytic into a small number of accessible equations; (2) to

discuss the existence and causes of asymmetry in measures of spatial association;

and (3) to create and test a null hypothesis that captures the analytical power and

explains the limits of the statistic. In developing our methodological framework, we

build upon the importance of nearest neighbors. This construction is done to ad-

dress the problems that arise from the clustering of joint point patterns for reasons

other than colocation.

In developing a statistical metric that distinguishes between the effects of

autocorrelation of a joint population and the specific associative relationships

between pairs of categorical subsets, identification of the appropriate null hypoth-

esis is important. While research implementing the cross-k-function uses a null

hypothesis of ‘‘there is no spatial association between any pair of categorical sub-

sets,’’ the null hypothesis for a CLQ-based analysis is ‘‘given the clustering of the

joint population, there is no spatial association between pairs of categorical sub-

sets.’’ That is, we take the geometric pattern of a joint population as a given and

search for associations that cannot be explained by this joint pattern alone.

Let $P$ denote a point population within which each individual is assigned uniquely to one of $k$ categories in a classification system $X$, and let $A \in X$ and $B \in X$ denote (possibly the same) categories in $X$. $\mathrm{CLQ}_{A \to B}$ is defined as the ratio of observed to expected proportions of B among A’s nearest neighbors. Formally, this calculation is given by

$$\mathrm{CLQ}_{A \to B} = \frac{C_{A \to B} / N_A}{N'_B / (N - 1)}, \tag{1}$$

where $N$ denotes the population size of the set of categories under analysis; $N_A$ denotes the population size of A; $N'_B$ denotes the population size of B (if $A \neq B$) or the population size of B minus 1 (if $A = B$); and $C_{A \to B}$ denotes the count of type A points whose nearest neighbor is a type B point, defined more rigorously in equation (7) below. The numerator of $\mathrm{CLQ}_{A \to B}$ is the proportion of type B points among A’s nearest neighbors (i.e., the observed proportion), while the denominator is the proportion of type B points that could be a nearest neighbor to each type A point (i.e., the expected proportion). To calculate the expected proportion, $N - 1$ rather than $N$ is used in the denominator, because a point cannot be its own nearest neighbor (Dixon 2002). Similarly, in the calculation of the same–same category CLQ, $N'_B$ is defined as the count of type B ($= A$) points minus one because each point of category A can have all other points of type A as neighbors except itself.
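A minimal Python sketch of equation (1) may help make the calculation concrete. It assumes that every point has a single nearest neighbor (no ties), uses a brute-force neighbor search, and runs on invented toy coordinates; the function names `nearest_neighbor` and `clq` are illustrative and are not taken from the article.

```python
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself (ties not handled)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def clq(points, labels, a, b):
    """Colocation quotient of category a toward category b, following equation (1)."""
    n = len(points)
    n_a = labels.count(a)
    # N'_B: size of B, reduced by one when a and b are the same category.
    n_b_prime = labels.count(b) - (1 if a == b else 0)
    # C_{A->B}: number of type-a points whose nearest neighbor is of type b.
    c_ab = sum(1 for i in range(n)
               if labels[i] == a and labels[nearest_neighbor(points, i)] == b)
    return (c_ab / n_a) / (n_b_prime / (n - 1))


# Toy pattern (hypothetical): B points occur only next to A points,
# while two of the four A points have no B nearby.
points = [(0, 0), (10, 0), (20, 0), (31, 0),        # category A
          (0, 1), (-1.1, 0), (10, 1), (8.9, 0)]     # category B
labels = ["A"] * 4 + ["B"] * 4

print(round(clq(points, labels, "A", "B"), 3))  # 0.875
print(round(clq(points, labels, "B", "A"), 3))  # 1.75
```

In this toy pattern every B point has an A point as its nearest neighbor, but only half of the A points have a B point as theirs, so the two directional quotients differ.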

Semantically, $\mathrm{CLQ}_{A \to B}$ denotes the spatial attraction of A to B or, alternatively, the degree to which B attracts A. For example, $\mathrm{CLQ}_{A \to B} = 2$ indicates that A is twice as likely to have B as its nearest neighbor (i.e., to locate near a point of type B) as would be expected by chance. The attraction expressed by $\mathrm{CLQ}_{A \to B}$ is unidirectional because it is dependent on nearest neighbor relationships that may be asymmetric. If many cases exist where A’s nearest neighbor is B but B’s nearest neighbor is not A, then $C_{A \to B} > C_{B \to A}$, and, therefore, $\mathrm{CLQ}_{A \to B} > \mathrm{CLQ}_{B \to A}$, logically expressing that A is more attracted to B than B is to A. Same-category CLQs are interpreted in a similar manner, such that a $\mathrm{CLQ}_{A \to A} = 0.67$ indicates that A is only two-thirds as likely to be its own nearest neighbor as would be expected given A’s proportion in the overall parent population; in this case, the attraction is bidirectional.

The CLQ can be viewed as a simple modification of either the join count sta-

tistic or the cross-k-function. With regard to the join count statistic, the CLQ is

derived by replacing pairwise joins with nearest neighbor counts. From the cross-k-

function, the CLQ is derived by substituting neighbor ranks for absolute distances as

the basis for determining relative probabilities. The CLQ also may be considered an

extension of metrics used to measure the degree to which categories are associated

with specific locations or types of locations. Economic geographers are familiar

with the location quotient, which measures the ratio of a local economy’s propor-

tion of economic activity in a particular sector to the proportion of activity in the

country and/or region that encompasses it (Blair 1995; Stimson, Stough, and Rob-

erts 2006). The location quotient assigns values greater (or less) than one to places

with greater (or less) than average activity in a particular sector. A similar measure

used in forestry and other ecological applications is fidelity, which describes the

degree to which a given species is associated with a particular community type

(e.g., Dyer 2006). Though a conceptual descendant of these metrics, the CLQ de-

scribes spatial association of one category of objects with another category rather

than with a region or set of regions.

Like classical location quotients, a CLQ value of one has semantic importance.

The value of one occurs when the proportion of category B individuals among

category A individuals’ closest neighbors equals the proportion of category B in its

overall population (excluding one individual of category A). A $\mathrm{CLQ}_{A \to B}$ greater

than one shows a higher number of nearest neighbors of category B than expected

given the relative counts in its population, whereas a value less than one indicates

that points in group B are closest neighbors to points in group A less frequently than

expected. The lowest possible value is zero, which occurs when no points in cat-

egory B are the closest neighbor to any points of category A. Every integer value

above unity indicates a multiple of ‘‘closeness’’ more than expected. The same-

category CLQ is undefined if any category has fewer than two points.

The CLQ does have a maximum value that depends on the proportion within the overall population of the category under examination as well as certain geometrical constraints. Ignoring these geometrical constraints, the proportional maximum $\mathrm{CLQ}_{A \to B}$ is found when all of A’s neighbors are B ($C_{A \to B} = N_A$), resulting in

$$\mathrm{CLQ}_{A \to B} = \frac{N_A / N_A}{N'_B / (N - 1)} = \frac{1}{N'_B / (N - 1)} = \frac{N - 1}{N'_B}. \tag{2}$$

This formula shows that the maximum degree to which A can be attracted to B

depends on the population of B and, somewhat counterintuitively, that this rela-

tionship is an inverse one. In other words, the larger the population of B, the less the

attractive force it can exert on A. The reason for this is that if B constitutes a sig-

nificant proportion of the overall population, then one would expect a large pro-

portion of type A individuals to locate near a type B individual due to chance alone.

For example, suppose that type B individuals made up 500 (nearly half) of an

overall population of 1001; regardless of the population of A, one would expect

half of all type A individuals to have a type B individual as their nearest neighbor.

Even if every type A individual had a type B individual as its nearest neighbor, this

would only be twice as many as expected, and the maximum value of $\mathrm{CLQ}_{A \to B}$ would be equal to two. This issue becomes important only when dealing with

a large number of categories with substantial variance in their relative popula-

tions, especially when one category makes up a large percentage of the parent

population.
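The arithmetic behind this example can be written out from equation (2). The values $N = 1001$ and $N'_B = 500$ come from the example above, with A and B taken to be different categories so that $N'_B$ is simply the population size of B:

$$\max \mathrm{CLQ}_{A \to B} = \frac{N - 1}{N'_B} = \frac{1001 - 1}{500} = 2.$$

By contrast, a hypothetical rarer category with only 50 members in the same population would allow values up to $(1001 - 1)/50 = 20$, which illustrates the inverse relationship between the size of B and the proportional maximum.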

Geometry also limits the maximum CLQ value. Maximum values occur when

each point of category B has a point of category A as its nearest neighbor. This

maximum increases as B becomes a larger share of the overall population. A geo-

metric limit occurs when every point of category A is surrounded by a ring of five

category B points that each have the category A point as their nearest neighbor. In

this situation, any additional category B point will be just as close or closer to an

existing category B point as it is to the central A point, and so the proportion of

category B points that have A as their nearest neighbor cannot increase any further.

Therefore, $5N_B$ is substituted for $C_{A \to B}$ in equation (1) to determine the geometric maximum:

$$\mathrm{CLQ}_{A \to B} = \frac{5N_B / N_A}{N_B / (N - 1)} = \frac{5(N - 1)}{N_A}, \tag{3}$$

for nonsame category analysis. For the same–same category analysis, the geometric maximum does not apply, as higher values are not achieved when a set of points is closer to a central point than to other points in a ring. In situations where $N_A$ is more than five times the size of $N_B$, the geometric maximum rather than the proportional maximum is the maximum CLQ value.

Considering both numerical and geometric constraints, equation (4) furnishes

the formula for the maximum CLQ, which holds for both the multivariate and

bivariate situations:

$$\max(\mathrm{CLQ}_{A \to B}) = \min\left( \frac{N - 1}{N'_B},\ \frac{5(N - 1)}{N_A} \right). \tag{4}$$

This maximum expresses two constraints on the value of $\mathrm{CLQ}_{A \to B}$. First, as the

relative population of B increases, the degree to which A is attracted to B is limited

numerically. This restriction arises because having a category B point as a nearest

neighbor is less ‘‘surprising’’ when category B points are more common. Second, as

the proportion of A increases, the degree to which A is attracted to B is limited

geometrically. This constraint occurs because only so many category A points can

be packed around each category B point.
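The two constraints can be combined in a small helper. The sketch below is one reading of equations (2) through (4): the function name `max_clq` and its arguments are illustrative, and the handling of the same-category case (proportional bound only, with the size of B reduced by one) follows the statement above that the geometric maximum does not apply there.

```python
def max_clq(n, n_a, n_b, same_category=False):
    """Upper bound on the colocation quotient of A toward B.

    n   : total number of points
    n_a : number of points in category A
    n_b : number of points in category B
    """
    if same_category:
        # Same-same case: N'_B = N_B - 1 and only the proportional bound is used.
        return (n - 1) / (n_b - 1)
    proportional = (n - 1) / n_b      # equation (2)
    geometric = 5 * (n - 1) / n_a     # equation (3)
    return min(proportional, geometric)


# B is about half of the population: the proportional bound is the binding one.
print(max_clq(1001, 100, 500))            # 2.0
# A is more than five times as numerous as B: the geometric bound is the binding one.
print(round(max_clq(1001, 900, 100), 2))  # 5.56
```

The second call uses invented population sizes chosen only to show a case in which the geometric limit falls below the proportional limit.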

Because the CLQ semantically indicates the ratio of observed to expected,

demonstrating that its expectation is unity is important. Given the values of $N$, $N_A$, and $N'_B$, a random allocation of categories dictates that the expected count $C_{A \to B}$ of type A points’ nearest neighbors that are of type B is simply $N_A$ multiplied by the conditional probability of selecting a point of type B given that one point of type A has already been selected:

$$E(C_{A \to B}) = N_A \, E(p_{B \mid A}) = N_A \left( \frac{N'_B}{N - 1} \right). \tag{5}$$

Substituting this expected count of nearest neighbors $E(C_{A \to B})$ into equation (1), the expectation of the CLQ becomes

$$E(\mathrm{CLQ}_{A \to B}) = E\left( \frac{C_{A \to B} / N_A}{N'_B / (N - 1)} \right) = \frac{N - 1}{N_A N'_B} \, E(C_{A \to B}) = \frac{N - 1}{N_A N'_B} \cdot \frac{N_A N'_B}{N - 1} = 1. \tag{6}$$

Therefore, the expected value of the CLQ is one if categories are randomly

allocated across a fixed point pattern.
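This property can be checked numerically on a small example. The sketch below holds a point pattern fixed, enumerates every possible assignment of three A labels among seven points (the rest being B), and averages the resulting quotient using exact fractions. The point coordinates, the seed, the sample sizes, and the function name `clq_a_to_b` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import dist
import random

rng = random.Random(0)  # arbitrary seed; any fixed point pattern would do
points = [(rng.random(), rng.random()) for _ in range(7)]
n = len(points)
n_a = 3
n_b = n - n_a  # only two categories, and A differs from B, so N'_B = N_B

# Nearest neighbor of each point; the geometry stays fixed across labelings.
nn = [min((j for j in range(n) if j != i),
          key=lambda j: dist(points[i], points[j]))
      for i in range(n)]


def clq_a_to_b(a_indices):
    """Quotient of A toward B when the points in a_indices are labeled A."""
    a_set = set(a_indices)
    c_ab = sum(1 for i in a_set if nn[i] not in a_set)
    return Fraction(c_ab, n_a) / Fraction(n_b, n - 1)


# Every way of choosing which points carry the A label is equally likely.
labelings = list(combinations(range(n), n_a))
mean_clq = sum(clq_a_to_b(s) for s in labelings) / len(labelings)
print(mean_clq)  # 1
```

Because the average is taken over all labelings rather than a random sample of them, the result is exactly one rather than approximately one.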

In the preceding discussion, each category A point is assumed to have exactly

one nearest neighbor. However, often a point has several equidistant nearest

neighbors. This could occur when multiple points coexist or appear to coexist at

a single location, as, for example, when several retail stores share the same postal

address. Equidistant nearest neighbors also are common when points are arranged

on a regular grid. By making explicit the definition of $C_{A \to B}$ in equation (1), the CLQ can easily accommodate such situations. Because $\mathrm{CLQ}_{A \to B}$ expresses the degree to which A is attracted to B, the weight of each category A point is made equal by formally defining $C_{A \to B}$ as

$$C_{A \to B} = \sum_{i=1}^{N_A} \sum_{j=1}^{n_i} \frac{B_{ij}(1, 0)}{n_i}, \tag{7}$$

where $n_i$ is the number of equidistant nearest neighbors that the $i$th category A point has, and $B_{ij}$ is a 0–1

decision rule variable of whether the jth equidistant nearest neighbor of the ith

category A point is of category B. Thus, the contribution of each category A point to

the total nearest neighbor count is exactly one and is not determined by the number

of equidistant neighbors.
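The tie-handling rule in equation (7) can be sketched as follows. The tolerance used to decide that two distances are equal, the function names, and the four-point grid are assumptions introduced for the example.

```python
from math import dist, isclose


def tied_nearest_neighbors(points, i, tol=1e-9):
    """Indices of all points that share the minimum distance to points[i]."""
    dists = [(dist(points[i], points[j]), j)
             for j in range(len(points)) if j != i]
    d_min = min(d for d, _ in dists)
    return [j for d, j in dists if isclose(d, d_min, abs_tol=tol)]


def count_a_to_b(points, labels, a, b):
    """Nearest neighbor count from a to b with equidistant neighbors weighted 1/n_i."""
    total = 0.0
    for i, label in enumerate(labels):
        if label != a:
            continue
        neighbors = tied_nearest_neighbors(points, i)
        # Each type-a point contributes at most one in total, split across its ties.
        total += sum(1 for j in neighbors if labels[j] == b) / len(neighbors)
    return total


# Unit square: every corner has two equidistant nearest neighbors.
grid_points = [(0, 0), (1, 0), (0, 1), (1, 1)]
grid_labels = ["A", "B", "A", "B"]
print(count_a_to_b(grid_points, grid_labels, "A", "B"))  # 1.0
```

Each A corner has one A and one B among its two tied neighbors, so it contributes one-half to the count, and the two A points together contribute one.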

Asymmetry is defined by the condition that $\mathrm{CLQ}_{A \to B} \neq \mathrm{CLQ}_{B \to A}$. Equation (1) shows that this occurs only when $C_{A \to B} \neq C_{B \to A}$. This means that asymmetry in

the CLQ results if and only if there exist asymmetrical spatial configurations in

which the nearest neighbor of an individual does not have that individual as its own

nearest neighbor. In Fig. 1(1), for example, several clusters are present in which


more than one individual of type B is clustered around a single individual of type A. This results in asymmetry because there are many type B individuals whose nearest neighbor is an individual of type A but who are not the nearest neighbors of that type A individual. Specifically in this example, $\mathrm{CLQ}_{A \to B} = 0.95$ (not statistically significantly different from one), but $\mathrm{CLQ}_{B \to A} = 1.9$: B is the closest neighbor of A just slightly less than would be expected for a completely random mixture, while A is the closest neighbor of B much more than a random mixture would suggest. This result indicates that the occurrence of B is strongly dependent on the occurrence of A, but not vice versa. The maximum CLQ value for both sectors is 1.9, which is reached by $\mathrm{CLQ}_{B \to A}$. Also notable is that $\mathrm{CLQ}_{B \to B}$ is 0, while $\mathrm{CLQ}_{A \to A}$ is 1.05. These same-category CLQs show that B appears to avoid itself as its own nearest neighbor (as noted previously, it appears to strongly prefer A as its nearest neighbor), while A has itself as its own nearest neighbor only slightly more than average.

In Fig. 1(2), half of the points of category B have been removed. This new pattern shares very little with Fig. 1(1) because the two figures differ significantly in how often A is B’s nearest neighbor without the reverse being true. Now $\mathrm{CLQ}_{A \to B}$ and $\mathrm{CLQ}_{B \to A}$ have the same value: 1.4. These CLQ values indicate that A and B each have the opposite category as a nearest neighbor 40% more often than would be expected for a random distribution. The cross-category CLQs are not the only values to change with the removal of category B points: while $\mathrm{CLQ}_{B \to B}$ remains zero, $\mathrm{CLQ}_{A \to A}$ decreases to 0.77. In Fig. 1(2), category A makes up a much larger portion of the population, so the expected number of same-category links increases. However, because the actual number of same-category nearest neighbors from category A to category A remains the same, $\mathrm{CLQ}_{A \to A}$ decreases between Figs. 1(1) and 1(2). This example illustrates how spatial attraction of one categorical subset for another is influenced not only by the number of observed nearest neighbors but also by the proportions of the category types within the parent population that influence the expected number of nearest neighbors.

While the pairwise CLQ matrix provides a means of identifying important categorical relationships, a global statistic facilitates significance testing. The global CLQ is defined as the ratio of the observed number of same-category nearest neighbor pairs to the number expected under the null hypothesis of no spatial association between categories. This global CLQ is given by

$$\mathrm{CLQ}_{\mathrm{Global}} = \frac{\sum_{A \in X} C_{A \to A}}{\sum_{A \in X} N_A \left( \dfrac{N_A - 1}{N - 1} \right)}. \tag{8}$$

The denominator represents the expected number of same-category nearest neighbors under the null hypothesis of no spatial association.
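Equation (8) translates directly into code once nearest neighbors are known. The sketch below is illustrative only. The function name `global_clq` and the input format (a list of neighbor-index lists, one per point, with ties listed together) are assumptions for the example rather than anything specified in the article.

```python
from collections import Counter

def global_clq(neighbors, labels):
    """Observed over expected number of same-category nearest-neighbor links."""
    n = len(labels)
    counts = Counter(labels)
    # Numerator: each point contributes the share of its (possibly tied)
    # nearest neighbors that carry the same label as the point itself.
    observed = sum(
        sum(labels[j] == labels[i] for j in nn) / len(nn)
        for i, nn in enumerate(neighbors)
    )
    # Denominator: sum over categories of N_A * (N_A - 1) / (N - 1).
    expected = sum(n_a * (n_a - 1) / (n - 1) for n_a in counts.values())
    return observed / expected

# Hypothetical pattern: two mutual-nearest-neighbor pairs, each within one category.
pairs = [[1], [0], [3], [2]]
print(global_clq(pairs, ["A", "A", "B", "B"]))  # about 3: fully segregated pairs
print(global_clq(pairs, ["A", "B", "A", "B"]))  # 0.0: no same-category links
```

A value above one means same-category links are more common than random labeling would produce, and a value below one means they are rarer.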

To determine which patterns are statistically different from random, we need to

know the likelihood of an observed CLQ occurring within a given spatial pattern if

categorical assignments are random. Research on spatial autocorrelation statistics


such as the join count shows that the assumptions of a simple t-test of the difference

of proportions do not hold and that nonnormality varies with the size of a data set

(Cliff and Ord 1981). We follow recent developments in spatial statistics, such as

the LISA (Anselin 1995, 2003), the network cross-k-function (Okabe and Yamada

2001), and the Ripley’s k-function (Marcon and Puech 2003) in using Monte Carlo

simulation, which makes no assumptions about the expected distribution except

that location behaviors are similar across a study area. In each simulation trial, the

proportion of the total population assigned to each category is held constant, but

these category assignments are randomly redistributed within a population, and

each pairwise CLQ as well as $\mathrm{CLQ}_{\mathrm{Global}}$ is recalculated. After a predetermined

number of permutations, the simulated sample distributions for the pairwise and

global CLQs are used to determine the significance of observed CLQs. Two-tailed

significance is determined by taking the lesser of the number of trials in which the

simulated CLQ was greater than or equal to, or less than or equal to, the observed

CLQ and multiplying by two.
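The two-tailed rule just described can be written as a short helper. This is a sketch under stated assumptions. The function name is hypothetical, and both the division by the number of trials (to express the doubled count as a probability) and the cap at 1 are assumptions added here, since the text describes only the count.

```python
def two_tailed_p(observed, simulated):
    """Two-tailed Monte Carlo p-value from simulated CLQ values."""
    m = len(simulated)
    # Trials at least as large, and at least as small, as the observed CLQ.
    n_ge = sum(s >= observed for s in simulated)
    n_le = sum(s <= observed for s in simulated)
    # Take the lesser tail, double it, and express it as a share of trials.
    # Capping at 1 is an assumption for observations near the middle.
    return min(1.0, 2 * min(n_ge, n_le) / m)

# Hypothetical simulated CLQs from five relabelings.
sims = [0.8, 0.9, 1.0, 1.1, 1.2]
print(two_tailed_p(1.2, sims))  # 0.4: one of five trials is >= 1.2, doubled
print(two_tailed_p(2.0, sims))  # 0.0: no simulated value reaches 2.0
```

With only five trials the smallest attainable nonzero value is 0.4, which shows why a large number of permutations is needed before small significance levels can be reported.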

Calculation of the CLQ and Monte Carlo simulation to determine significance were implemented in a stand-alone Visual Basic .NET (Microsoft Corp., Redmond, WA) program. A spatial index was created to facilitate efficient computation of nearest neighbors, which is performed in O(n log n) time including index creation (Friedman, Bentley, and Finkel 1977), where n is the number of points in the population. Each Monte Carlo simulation requires an O(n) allocation of categories among the existing points but not recomputation of nearest neighbors. Therefore, the overall computational complexity is the greater of O(n log n) and O(mn), where m is the number of simulations.
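The cost argument rests on computing nearest neighbors once and then only reshuffling labels. A minimal Python sketch of that loop follows. It is not the authors' Visual Basic .NET program. The function names and the seeded generator are assumptions, and `neighbors` is assumed to come from a single up-front nearest-neighbor search (for example, via a spatial index).

```python
import random

def count_a_to_b(neighbors, labels, a, b):
    """C_{a->b} with ties split evenly among equidistant nearest neighbors."""
    total = 0.0
    for i, nn in enumerate(neighbors):
        if labels[i] == a:
            total += sum(labels[j] == b for j in nn) / len(nn)
    return total

def simulate_counts(neighbors, labels, a, b, m, seed=0):
    """Return C_{a->b} for m random relabelings of a fixed point pattern."""
    rng = random.Random(seed)   # seeded only to make the sketch reproducible
    shuffled = list(labels)     # copy, so the caller's labels are untouched
    results = []
    for _ in range(m):
        # O(n) per trial: category totals stay fixed, only positions change.
        rng.shuffle(shuffled)
        # The precomputed neighbor lists are reused; no distances are recomputed.
        results.append(count_a_to_b(neighbors, shuffled, a, b))
    return results

# Hypothetical pattern: three mutual-nearest-neighbor pairs, three points per category.
nbrs = [[1], [0], [3], [2], [5], [4]]
sims = simulate_counts(nbrs, list("AAABBB"), "A", "B", m=2000)
print(sum(sims) / len(sims))  # should be close to N_A * N_B / (N - 1) = 1.8
```

Because each trial touches every point a constant number of times when ties are rare, m trials cost O(mn) on top of the one-time neighbor search.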

Finally, a substantial advantage of the CLQ is that it works directly with point-based data sets: it does not require the use of predefined areal units that were devised for purposes outside the identification of the phenomenon in question, such as traffic analysis zones to examine metropolitan employment (Giuliano and Small 1991) or municipal townships to analyze ecological data (Cogbill, Burk, and Motzkin 2002). Point data do not have the typical location quotient sensitivity to scale (Mulligan and Schmidt 2005). However, caution is warranted when applying the CLQ to categories with a small count. We do not recommend the use of this method when a category has fewer than 10 individuals because the power of the test will be low. Aggregating categories, if theoretically appropriate, would likely solve this problem.

Two applications of the CLQ

To illustrate the proposed metric, we determined global and pairwise CLQs for two data sets from very different domains. The first data set consists of 36,909 business establishments in the Phoenix metropolitan region, classified according to economic sector. The second data set consists of 368,122 trees tallied in a 50 ha plot on Barro Colorado Island in the Panama Canal, classified according to health and health-related physiognomic characteristics. For each data set, we ran 10,000 simulations to determine significance values for the global and pairwise CLQs. Global CLQs for both data sets are strongly positive and highly significant (P < 0.001).

Establishments in Phoenix

Which types of businesses locate near one another? Do establishments colocate near establishments in their own sector or in different sectors? While this topic is addressed at the intersectoral level (no same-category associations) in Leslie and Ó hUallacháin (2006), the addition of same-category associations alongside intersectoral links provides a comparison and a more in-depth look at the postmodern metropolitan area of Phoenix. Phoenix is a flat city with few natural barriers, developed around the automobile, and decentralized for suburbanizing residents (Gammage 2003). Intersectorally, in 2002 Phoenix had three groupings: (1) secondary sector, wholesale and transportation, and administrative support; (2) finance, insurance, and producer services; and (3) retailers with entertainment and accommodation establishments (Leslie and Ó hUallacháin 2006). In the Leslie and Ó hUallacháin (2006) analysis, the effect of same-category associations is mentioned but not investigated.

A point-level establishment data set created by the Maricopa Association of Governments (MAG) is used here. As a regional planning alliance, MAG conducts a regular survey of nongovernmental employers in the region, most recently in 2004. Business category groupings identified by this survey appear in Table 1. Maps of these data appear in Leslie and Ó hUallacháin (2006). The data are spatially clustered and have a nearest neighbor R-value of 0.24 (significantly nonrandom at the 0.01 level). Conducting a cross-k-function analysis on this data set would likely find many pairwise categorical associations simply because the data set itself is highly clustered.

Table 1 Descriptive Categorical Information for Phoenix Economic Analysis, 2004

| NAICS | Sector | N |
| --- | --- | --- |
| 11–23 | Agriculture, mining, utilities, construction | 3,705 |
| 31–33 | Manufacturing | 3,021 |
| 42–43 | Wholesale trade | 2,732 |
| 44–45 | Retail | 5,279 |
| 48–49 | Transport | 802 |
| 51 | Information | 786 |
| 52 | Finance and insurance | 1,931 |
| 53 | Real estate | 1,608 |
| 54–55 | Professional, scientific, technical, management services | 3,631 |
| 56 | Administrative support | 1,954 |
| 61 | Education | 1,185 |
| 62 | Health care and social assistance | 3,045 |
| 71 | Arts, entertainment, and recreation | 622 |
| 72 | Accommodation and food services | 3,199 |
| 81 | Other services | 2,886 |
| 92 | Public administration | 519 |
| Total | | 36,905 |

As noted, the global CLQ is strongly positive ($\mathrm{CLQ}_{\mathrm{Global}} = 2.53$). This finding is highlighted by the presence of same-category pairwise CLQs (Table 2, diagonal) that are significant and greater than one, which indicates that businesses of all categories have strong preferences for colocating with other businesses of the same category. This effect is strongest in public administration establishments (NAICS 92), which are 14 times more likely to have a nearest neighbor of the same category than would be expected for a random distribution. This effect is weakest for administrative support establishments (NAICS 56), which are just one-and-a-half times more likely to have another administrative support establishment as their nearest neighbor. Same-category values indicate that most establishment types are, in general, two-and-a-half times more likely to locate next to a similar establishment than would be expected from a random mixture.

A sector-by-sector inspection reveals that, in general, location preferences do tend to be symmetric and reveal substantial category groupings. The classic grouping of natural resources and subsequent processing is present. The primary sector (agriculture, mining, utilities, and construction [NAICS 11–23]), manufacturing sector (NAICS 31–33), wholesale sector (NAICS 42–43), and transportation and warehousing sector (NAICS 48–49) all have high preferences for each other, with the unexpected exception that the wholesale sector and transportation and warehousing sector have no associative links with the primary sector. The primary sector also has a mutually high CLQ with administrative support (NAICS 56) and education (NAICS 61), although the manufacturing, wholesale, and transportation and warehousing sectors do not. The retail sector (NAICS 44–45) has CLQs significantly less than one for almost every category except itself and the accommodation and food services sector (NAICS 72), likely a result of the placement of Phoenix retail establishments in strip and shopping malls. Other services (NAICS 81) tend to have retail as a nearest neighbor more often than expected, although the reverse is not true. Producer services also form a grouping: information (NAICS 51), finance and insurance (NAICS 52), and professional, scientific, technical, and management services (NAICS 54–55) have mutually reciprocated high CLQs with each other. Left out of this mix is the real estate sector (NAICS 53), which has strong location preferences only with the finance and insurance sector but not with information or professional, scientific, technical, and management services. Professional, scientific, technical, and management services are mutually close to administrative support, but information and finance and insurance services do not share this association. Finally, accommodation and food services have a link with other services, also likely to be a result of colocation in strip malls throughout the metropolitan area.


Table 2 CLQs for Phoenix Establishments by NAICS, 2004

NAICS 11–23 31–33 42–43 44–45 48–49 51 52 53 54–55 56 61 62 71 72 81 92 GEO MAX

11–23 2.16 1.25 1.20 0.61 0.74 0.86 1.34 1.36 0.56 0.44 0.63 49.80

31–33 1.33 2.59 2.19 0.69 1.60 0.42 0.69 0.75 0.56 0.35 0.49 0.48 0.77 0.49 61.08

42–43 2.28 2.33 0.73 1.48 0.61 0.74 0.75 0.65 0.43 0.56 0.56 0.75 0.57 67.54

44–45 0.53 0.62 0.64 2.41 0.61 0.49 0.82 0.82 0.48 0.64 0.52 0.58 1.70 0.51 34.95

48–49 1.34 1.90 1.68 0.72 5.11 0.43 0.58 0.44 0.52 0.70 230.07

51 0.76 0.73 2.27 1.7 1.63 0.69 0.75 234.76

52 0.52 0.45 0.55 0.75 0.31 1.9 3.31 1.38 1.73 1.30 0.61 0.64 95.56

53 0.53 0.73 1.37 1.68 1.81 114.75

54–55 0.88 0.67 0.76 0.55 0.47 1.69 1.87 2.19 1.5 0.76 0.60 0.76 50.82

56 1.18 0.82 0.72 1.39 1.52 1.63 0.70 0.63 94.43

61 1.29 0.61 0.60 0.71 0.69 1.37 3.52 1.65 0.64 155.71

62 0.47 0.42 0.41 0.67 0.47 0.69 0.81 0.72 4.19 0.74 60.60

71 0.61 0.61 1.5 2.48 1.34 1.38 296.66

72 0.35 0.41 0.45 1.73 0.45 0.70 0.64 0.62 0.67 0.74 2.60 1.29 0.56 57.68

81 0.72 0.86 1.20 0.81 0.70 0.85 1.19 1.77 0.57 63.94

92 0.73 0.42 0.60 0.42 0.49 14.39 355.53

NUM MAX 9.96 12.22 13.51 6.99 46.01 46.95 19.11 22.95 10.16 18.89 31.14 12.12 59.33 11.54 12.79 71.11

Note: Values indicate the likelihood of a point’s nearest neighbor belonging to the column category given that the point belongs to the row category, as compared with a random distribution ($\mathrm{CLQ}_{\mathrm{row} \to \mathrm{column}}$). Only colocation values significantly different from one at the 0.05 level or below are shown. Global CLQ = 2.53, P = 0.001.


Important asymmetries are present, aside from those previously mentioned. The primary and other services sectors both have a large number of asymmetric associations. Public administration (NAICS 92) also has several of these asymmetric associations. Asymmetries appear to reflect not supply-chain mechanics but rather category sets where one sector may be the basis of a cluster (such as a medical [NAICS 62], retail, or government complex) and other establishment types are arrayed around it. This pattern causes high levels of same-category associations, with asymmetries in the support services (administrative support, other services). Some sectors appear to have very few colocation requirements in either direction. The public administration and arts, entertainment, and recreation (NAICS 71) sectors have only three or four significant associations outside of their same-sector preferences.

Tree conditions on Barro Colorado Island

Data for the Barro Colorado Island example were obtained from the Center for

Tropical Forest Science (Hubbell, Condit, and Foster 2005). Previously connected

with the larger forest, this island was formed when the surrounding area was

flooded during the construction of the Panama Canal in the early 20th century. All

trees within the plot have been tallied since 1982. The most recent data, collected

in 2005, include observations about health and growth conditions. Live trees are

noted if they are buttressed (i.e., have enlarged trunks at the base), multistemmed,

or leaning significantly from a vertical position, or if the main trunk is broken below

the crown. Buttressed trees have noticeably widened trunks near the ground, which

may be caused by wet or unstable soil or internal rot. In addition, trees that died

since the previous census are noted as still standing, down, or missing. More than

one condition is recorded for many trees; in these cases, we used the most severe

condition for analysis. In this manner, we assigned trees to one of eight possible

categories (Table 3).

Table 3 Tree Condition Categories Used in Analysis of the Barro Colorado Data

| Condition | Definition | Count |
| --- | --- | --- |
| Normal | No special code recorded | 175,077 |
| Buttressed | Buttressed to at least 1.3 m, but not leaning or broken | 2,462 |
| Multistem | Multistemmed plant, not buttressed | 20,542 |
| Leaning | Leaning, but not broken or dead | 12,757 |
| Broken | Broken above 1.3 m, but not dead | 11,816 |
| Dead | Dead, not downed or missing | 86,491 |
| Downed | Dead, trunk lying on ground | 7,408 |
| Missing | Tree recorded in earlier survey, but not found in 2005 | 51,569 |

Note: Categories were aggregated from the original data, in which a single tree could be assigned multiple codes.

Several hypotheses relating to spatial patterns of growth and mortality naturally arise from these data. For example, one hypothesis is that all types of mortality are colocated. An alternative hypothesis is that wind-controlled mortality events exhibit a spatial pattern distinct from senescence, disease, and other types of mortality that leave a tree standing. Other hypotheses relate to the causes of certain morphological characteristics. For example, buttressing and multistemmed growth may be the result of favorable growing conditions or, conversely, a response to adverse conditions. Spatial patterns of colocation provide both direct and indirect evidence for these types of hypotheses.

Fig. 2 portrays a portion of the data. The nearest neighbor statistic indicates that the overall spatial pattern is slightly clustered (R = 0.982; P < 0.01). By itself, this minor departure from randomness is not a great concern. However, careful examination of Fig. 2 reveals that the pattern is more nuanced: apparent clustering is the result of large gaps containing few or no trees, likely caused by the presence of roads, rivers, or recent disturbances. If these gaps are excluded, the tree pattern in the remaining areas is likely to exhibit a slightly dispersed pattern. When the CLQ matrix is computed for these data, several significant patterns stand out.

Figure 2. Portion of the Barro Colorado tree data set.

First, same-category CLQs are significantly greater than one for seven of the eight categories. This tendency for trees of the same category to colocate results in a weak but highly significant global CLQ ($\mathrm{CLQ}_{\mathrm{Global}} = 1.13$). Second, patterns of colocation suggest common disturbances, perhaps from wind: live trees that are leaning or broken are strongly colocated with each other and also with downed dead trees. The strongest measures of same-category spatial autocorrelation also occur among these three categories, which also are somewhat colocated with missing trees but not with standing dead trees. Standing dead trees are not colocated with any other category, a pattern that suggests that most deaths are caused by senescence or local disease outbreaks, rather than by windthrow. Also, both buttressing and multistem growth appear to be negatively associated with disturbance. Indeed, the absence of buttressed trees is notable in the vicinity of leaning and downed trees and of standing dead trees. Even normal live trees show a slight tendency toward colocation with buttressed and multistem trees. Consequently, healthy trees living in fertile, sheltered locations appear to be more capable of growing multiple stems and buttressed bases.

Although most relationships in this example are symmetrical, a notable asymmetry exists in relationships that involve buttressed trees. Leaning and downed trees avoid locating near buttressed trees ($\mathrm{CLQ}_{\mathrm{leaning} \to \mathrm{buttressed}} = 0.56$, P < 0.01; $\mathrm{CLQ}_{\mathrm{downed} \to \mathrm{buttressed}} = 0.71$, P = 0.04). In contrast, buttressed trees locate near leaning and downed trees only slightly less than would be expected by chance ($\mathrm{CLQ}_{\mathrm{buttressed} \to \mathrm{leaning}} = 0.91$, P = 0.45; $\mathrm{CLQ}_{\mathrm{buttressed} \to \mathrm{downed}} = 0.95$, P = 0.78). This finding suggests that buttressed trees tend to exclude other trees and prefer isolation.

Final considerations

The CLQ is a reenvisioning of spatial association at the point level that provides an overall and pairwise method of describing degrees of spatial association. The pairwise information is descriptive of the strength of a relationship and has a natural interpretation as a multiplicative factor on probability of occurrence. Comparison of bidirectional CLQ pairs further provides information about the potential asymmetry among types of points. Each pair’s linkages are described simply, and the final results describe spatial associations after controlling for the clustering level of an overall data set, an improvement over cross-k-function analysis. Finally, the addition of same-category associations, the ability to calculate properly with multiple points in the same location, and a quantification of statistical significance are substantial improvements over the NEAR statistic of Leslie and Ó hUallacháin (2006). The CLQ statistic is an extension of work by Dixon (1994, 2002) in its evaluation of maximum ratio values and the production of a comparison variable that has a semantic meaning.

While one of our goals for the CLQ is to capture asymmetrical relationships,

over the course of developing the CLQ we found that the definition of asymmetry is

more nuanced than we had at first anticipated. The CLQ specifically captures

asymmetry in spatial configuration, which occurs when the nearest neighbor of the

nearest neighbor of a point is not the original point. The CLQ describes how certain

categories can locate more or less often compared with a random distribution and

supports discovery of asymmetrical relationships in which one category appears to

have a greater chance of locating next to category X, while category X points do not

reciprocate, which can lead to meaningful insights about the underlying pattern.

Application of the CLQ to two very different data sets illustrates its general

applicability. Empirically, pairwise CLQs reveal three types of information, which


are similar for both data sets. First, significant same-type autocorrelation is strong in

both data sets. Both business establishments in Phoenix and trees on Barro Colorado Island are more likely to have neighbors of the same sector or category than

indicated by a random distribution. Second, symmetrical associations between

categories reveal major groupings or data clusters. In Phoenix, two major groupings

are revealed: one of establishments in resource- and transportation-based sectors,

and a second of high-level services involving information, finance and insurance,

and professional, scientific, technical, and management services, though not the real

estate sector. On Barro Colorado Island, two major groupings are also revealed: one

of healthy, buttressed, and multistemmed trees, and a second of leaning, broken,

downed, and missing trees, but not standing dead trees. Some categories, such as

retail in Phoenix and standing dead trees on Barro Colorado Island, do not show such

group affinities as they have relatively weak or no positive colocation tendencies

with any other category. Third, asymmetries in pairwise CLQs reveal inequities in the

degree of influence exerted by each category. In Phoenix, asymmetries surprisingly

do not appear to follow supply-chain mechanics but instead indicate that certain

industries form core clusters with a mix of other sector types around this core. On

Barro Colorado Island, asymmetry suggests that the forces that cause trees to buttress

themselves at the base also isolate these trees from others.

The CLQ appears to be robust and stable. We propose that the CLQ be used to

examine a point data set consisting of multiple subcategories of a single type of entity, such as trees or businesses. The cross-k-function, in contrast, should be used to quantify the relationship between conceptually distinct entity types whose joint population does not form a semantically meaningful unit. The CLQ should be used in

place of the cross-k-function in situations where clustering of a joint population could

confound results, though we realize that the distinction between conceptual category

types and subcategories is not always so clear-cut. In many real-world situations, the

joint population shares traits that suggest similar spatial distributions, and the purpose

of analysis is to identify pairwise categorical relationships beyond those expected

from a joint population. In the realm of human geography, the spatial patterns of

cities, homes, businesses, and political institutions are controlled by the overall population distribution and transportation infrastructure, and exhibit tendencies to locate

in varying degrees of proximity to other human activity centers. Similarly, natural

categories, such as lichen, birds, and igneous rocks, have distinct spatial patterns that

can be further decomposed into subcategories that might exhibit varying degrees of

spatial correlation with each other. In these cases, the CLQ can reveal interesting

patterns of association that may shed light on underlying processes of attraction,

repulsion, dependency, and resource requirements.

Software

A BSD-licensed software tool to calculate the CLQ for any point shapefile is available at http://seg.gmu.edu/clq (accessed April 18, 2011).


Note

1 A table of CLQ results and significance is available from the author.

References

Anselin, L. (1995). “Local Indicators of Spatial Association—LISA.” Geographical Analysis 27, 93–115.

Anselin, L. (2003). GeoDa 0.9 User’s Guide. Urbana-Champaign, IL: Spatial Analysis Laboratory, University of Illinois.

Blair, J. (1995). Local Economic Development: Analysis and Practice. London: Sage.

Ceyhan, E. (2008). “On the Use of Nearest Neighbor Contingency Tables for Testing Spatial Segregation.” Environmental and Ecological Statistics 17, 247–82.

Cliff, A. D., and J. K. Ord. (1981). Spatial Processes: Models and Application. London: Pion.

Clifford, P., S. Richardson, and D. Hemon. (1989). “Assessing the Significance of the Correlation between Two Spatial Processes.” Biometrics 45, 123–34.

Cogbill, C. V., J. Burk, and G. Motzkin. (2002). “The Forests of Presettlement New England, USA: Spatial and Compositional Patterns Based on Town Proprietor Surveys.” Journal of Biogeography 29, 1279–304.

Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: Wiley.

Dacey, M. F. (1965). “A Review of Measures of Contiguity for Two and K-Color Maps.” In Spatial Analysis: A Reader in Statistical Geography, 479–95, edited by B. J. L. Berry and D. F. Marble. Englewood Cliffs, NJ: Prentice-Hall.

Dale, R. T. (1999). Spatial Pattern Analysis in Plant Ecology. Cambridge: Cambridge University Press.

David, F. N. (1971). “Measurement of Diversity: Multiple Cell Contents.” In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 4: 109–36. Berkeley: University of California Press.

de Smith, M. J., M. F. Goodchild, and P. A. Longley. (2009). Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. Leicester, U.K.: Matador.

Dixon, P. M. (1994). “Testing Spatial Segregation Using a Nearest-Neighbor Contingency Table.” Ecology 75, 1940–48.

Dixon, P. M. (2002). “Nearest-Neighbor Contingency Table Analysis of Spatial Segregation for Several Species.” Ecoscience 9, 142–51.

Dutilleul, P. (1993). “Modifying the t Test for Assessing the Correlation between Two Spatial Processes.” Biometrics 49, 305–14.

Dyer, J. M. (2006). “Revisiting the Deciduous Forests of Eastern North America.” Bioscience 56, 341–52.

Friedman, J. H., J. L. Bentley, and R. A. Finkel. (1977). “An Algorithm for Finding Best Matches in Logarithmic Expected Time.” ACM Transactions on Mathematical Software 3, 209–26.

Galiano, E. F. (1986). “The Use of Conditional Probability Spectra in the Detection of Segregation between Plant Species.” Oikos 46, 132–38.

Gammage, G. (2003). Phoenix in Perspective: Reflections on Developing the Desert. Tempe, AZ: Herberger Center for Design Excellence, College of Architecture and Environmental Design, Arizona State University.


Giuliano, G., and K. A. Small. (1991). “Subcenters in the Los Angeles Region.” Regional Science and Urban Economics 21, 163–82.

Griffith, D. G. (2010). “The Moran Coefficient for Non-Normal Data.” Journal of Statistical Planning and Inference 140, 2980–90.

Haining, R. (2003). Spatial Data Analysis: Theory and Practice. New York: Cambridge University Press.

Hubbell, S. P., R. Condit, and R. B. Foster. (2005). Barro Colorado Forest Census Plot Data. Available at http://www.stri.si.edu/ (accessed August 27, 2010).

Iyer, K. (1949). “The First and Second Moments of Some Probability Distributions Arising from Points on a Lattice, and Their Applications.” Biometrika 36, 135–41.

Kavousi, A., M. R. Meshkani, and M. Mohammadzadeh. (2010). “Spatial Analysis of Auto-Multivariate Lattice Data.” Statistical Papers (doi 10.1007/s00362-009-0302-0). Available at http://www.springerlink.com/content/y566158255816717/ (accessed April 18, 2011).

Lee, S. (2001). “Developing a Bivariate Spatial Association Measure: An Integration of Pearson’s r and Moran’s I.” Journal of Geographical Systems 3, 369–85.

Leslie, T. F., and B. Ó hUallacháin. (2006). “Polycentric Phoenix.” Economic Geography 82, 167–92.

Marcon, E., and F. Puech. (2003). “Evaluating the Geographic Concentration of Industries Using Distance-Based Methods.” Journal of Economic Geography 3, 409–28.

Mulligan, G. F., and C. Schmidt. (2005). “A Note on Localization and Specialization.” Growth and Change 36, 565–76.

Ó hUallacháin, B., and T. F. Leslie. (2005). “Spatial Convergence and Spillovers in American Invention.” Annals of the Association of American Geographers 95, 866–86.

Okabe, A., and I. Yamada. (2001). “The K-Function Method on a Network and Its Computational Implementation.” Geographical Analysis 33, 271–90.

Ord, J. K. (1990). “Statistical Methods for Point Pattern Data.” In Spatial Statistics: Past, Present, and Future, 31–35, edited by D. Griffith. Ann Arbor, MI: Institute of Mathematical Geography.

Rukhin, A. L., and R. Vallejos. (2008). “Codispersion Coefficients for Spatial and Temporal Series.” Statistics and Probability Letters 78, 1290–300.

Stevens, P. H., and D. G. Jenkins. (2000). “Analyzing Species Distributions among Temporary Ponds with a Permutation Test Approach to the Join-Count Statistics.” Aquatic Ecology 34, 91–99.

Stimson, R. J., R. R. Stough, and B. H. Roberts. (2006). Regional Economic Development: Analysis and Planning Strategy. Berlin: Springer.

Tobler, W. (1970). “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46, 234–40.

Upton, G., and B. Fingleton. (1985). Spatial Data Analysis by Example. Vol. 1, Point Pattern and Quantitative Data. Chichester: Wiley.

Vallejos, R. (2008). “Assessing the Association between Two Spatial or Temporal Sequences.” Journal of Applied Statistics 35, 1323–43.

Wartenberg, D. (1985). “Multivariate Spatial Correlation: A Method for Exploratory Geographical Analysis.” Geographical Analysis 17, 263–83.

Geographical Analysis

326


