Book

Spatial Data Analysis: Theory and Practice

April 2003

DOI:10.1017/CBO9780511754944

Publisher: Cambridge University Press
ISBN: 0 521 77437 3

Authors:

Robert Haining

University of Cambridge

Preface Readership Acknowledgements Introduction Part I. The Context for Spatial Data Analysis: 1. Spatial data analysis: scientific and policy context 2. The nature of spatial data Part II. Spatial Data: Obtaining Data And Quality Issues: 3. Obtaining spatial data through sampling 4. Data quality: implications for spatial data analysis Part III. The Exploratory Analysis of Spatial Data: 5. Exploratory analysis of spatial data 6. Exploratory spatial data analysis: visualisation methods 7. Exploratory spatial data analysis: numerical methods Part IV. Hypothesis Testing in the Presence of Spatial Autocorrelation: 8. Hypothesis testing in the presence of spatial dependence Part V. Modeling Spatial Data: 9. Models for the statistical analysis of spatial data 10. Statistical modeling of spatial variation: descriptive modeling 11. Statistical modeling of spatial variation: explanatory modeling Appendices References Index.

Spatial Data Analysis

Theory and Practice

ROBERT HAINING

University of Cambridge

CAMBRIDGE UNIVERSITY PRESS

Contents

Preface
Acknowledgements

Introduction
0.1 About the book
0.2 What is spatial data analysis?
0.3 Motivation for the book
0.4 Organization
0.5 The spatial data matrix

Part A The context for spatial data analysis

1 Spatial data analysis: scientific and policy context
1.1 Spatial data analysis in science
1.1.1 Generic issues of place, context and space in scientific explanation
(a) Location as place and context
(b) Location and spatial relationships
1.1.2 Spatial processes
1.2 Place and space in specific areas of scientific explanation
1.2.1 Defining spatial subdisciplines
1.2.2 Examples: selected research areas
(a) Environmental criminology
(b) Geographical and environmental (spatial) epidemiology
(c) Regional economics and the new economic geography
(d) Urban studies
(e) Environmental sciences
1.2.3 Spatial data analysis in problem solving
1.3 Spatial data analysis in the policy area
1.4 Some examples of problems that arise in analysing spatial data
1.4.1 Description and map interpretation
1.4.2 Information redundancy
1.4.3 Modelling
1.5 Concluding remarks

2 The nature of spatial data
2.1 The spatial data matrix: conceptualization and representation issues
2.1.1 Geographic space: objects, fields and geometric representations
2.1.2 Geographic space: spatial dependence in attribute values
2.1.3 Variables
(a) Classifying variables
(b) Levels of measurement
2.1.4 Sample or population?
2.2 The spatial data matrix: its form
2.3 The spatial data matrix: its quality
2.3.1 Model quality
(a) Attribute representation
(b) Spatial representation: general considerations
(c) Spatial representation: resolution and aggregation
2.3.2 Data quality
(a) Accuracy
(b) Resolution
(c) Consistency
(d) Completeness
2.4 Quantifying spatial dependence
(a) Fields: data from two-dimensional continuous space
(b) Objects: data from two-dimensional discrete space
2.5 Concluding remarks

Part B Spatial data: obtaining data and quality issues

3 Obtaining spatial data through sampling
3.1 Sources of spatial data
3.2 Spatial sampling
3.2.1 The purpose and conduct of spatial sampling
3.2.2 Design- and model-based approaches to spatial sampling
(a) Design-based approach to sampling
(b) Model-based approach to sampling
(c) Comparative comments
3.2.3 Sampling plans
3.2.4 Selected sampling problems
(a) Design-based estimation of the population mean
(b) Model-based estimation of means
(c) Spatial prediction
(d) Sampling to identify extreme values or detect rare events
3.3 Maps through simulation

4 Data quality: implications for spatial data analysis
4.1 Errors in data and spatial data analysis
4.1.1 Models for measurement error
(a) Independent error models
(b) Spatially correlated error models
4.1.2 Gross errors
(a) Distributional outliers
(b) Spatial outliers
(c) Testing for outliers in large data sets
4.1.3 Error propagation
4.2 Data resolution and spatial data analysis
4.2.1 Variable precision and tests of significance
4.2.2 The change of support problem
(a) Change of support in geostatistics
(b) Areal interpolation
4.2.3 Analysing relationships using aggregate data
(a) Ecological inference: parameter estimation
(b) Ecological inference in environmental epidemiology: identifying valid hypotheses
(c) The modifiable areal units problem (MAUP)
4.3 Data consistency and spatial data analysis
4.4 Data completeness and spatial data analysis
4.4.1 The missing-data problem
(a) Approaches to analysis when data are missing
(b) Approaches to analysis when spatial data are missing
4.4.2 Spatial interpolation, spatial prediction
4.4.3 Boundaries, weights matrices and data completeness
4.5 Concluding remarks

Part C The exploratory analysis of spatial data

5 Exploratory spatial data analysis: conceptual models
5.1 EDA and ESDA
5.2 Conceptual models of spatial variation
(a) The regional model
(b) Spatial 'rough' and 'smooth'
(c) Scales of spatial variation

6 Exploratory spatial data analysis: visualization methods
6.1 Data visualization and exploratory data analysis
6.1.1 Data visualization: approaches and tasks
6.1.2 Data visualization: developments through computers
6.1.3 Data visualization: selected techniques
6.2 Visualizing spatial data
6.2.1 Data preparation issues for aggregated data: variable values
6.2.2 Data preparation issues for aggregated data: the spatial framework
(a) Non-spatial approaches to region building
(b) Spatial approaches to region building
(c) Design criteria for region building
6.2.3 Special issues in the visualization of spatial data
6.3 Data visualization and exploratory spatial data analysis
6.3.1 Spatial data visualization: selected techniques for univariate data
(a) Methods for data associated with point or area objects
(b) Methods for data from a continuous surface
6.3.2 Spatial data visualization: selected techniques for bi- and multi-variate data
6.3.3 Uptake of breast cancer screening in Sheffield
6.4 Concluding remarks

7 Exploratory spatial data analysis: numerical methods
7.1 Smoothing methods
7.1.1 Resistant smoothing of graph plots
7.1.2 Resistant description of spatial dependencies
7.1.3 Map smoothing
(a) Simple mean and median smoothers
(b) Introducing distance weighting
(c) Smoothing rates
(d) Non-linear smoothing: headbanging
(e) Non-linear smoothing: median polishing
(f) Some comparative examples
7.2 The exploratory identification of global map properties: overall clustering
7.2.1 Clustering in area data
7.2.2 Clustering in a marked point pattern
7.3 The exploratory identification of local map properties
7.3.1 Cluster detection
(a) Area data
(b) Inhomogeneous point data
7.3.2 Focused tests
7.4 Map comparison
(a) Bivariate association
(b) Spatial association

Part D Hypothesis testing and spatial autocorrelation

8 Hypothesis testing in the presence of spatial dependence
8.1 Spatial autocorrelation and testing the mean of a spatial data set
8.2 Spatial autocorrelation and tests of bivariate association
8.2.1 Pearson's product moment correlation coefficient
8.2.2 Chi-square tests for contingency tables

Part E Modelling spatial data

9 Models for the statistical analysis of spatial data
9.1 Descriptive models
9.1.1 Models for large-scale spatial variation
9.1.2 Models for small-scale spatial variation
(a) Models for data from a surface
(b) Models for continuous-valued area data
(c) Models for discrete-valued area data
9.1.3 Models with several scales of spatial variation
9.1.4 Hierarchical Bayesian models
9.2 Explanatory models
9.2.1 Models for continuous-valued response variables: normal regression models
9.2.2 Models for discrete-valued area data: generalized linear models
9.2.3 Hierarchical models
(a) Adding covariates to hierarchical Bayesian models
(b) Modelling spatial context: multi-level models

10 Statistical modelling of spatial variation: descriptive modelling
10.1 Models for representing spatial variation
10.1.1 Models for continuous-valued variables
(a) Trend surface models with independent errors
(b) Semi-variogram and covariance models
(c) Trend surface models with spatially correlated errors
10.1.2 Models for discrete-valued variables
10.2 Some general problems in modelling spatial variation
10.3 Hierarchical Bayesian models

11 Statistical modelling of spatial variation: explanatory modelling
11.1 Methodologies for spatial data modelling
11.1.1 The 'classical' approach
11.1.2 The econometric approach
(a) A general spatial specification
(b) Two models of spatial pricing
11.1.3 A 'data-driven' methodology
11.2 Some applications of linear modelling of spatial data
11.2.1 Testing for regional income convergence
11.2.2 Models for binary responses
(a) A logistic model with spatial lags on the covariates
(b) Autologistic models with covariates
11.2.3 Multi-level modelling
11.2.4 Bayesian modelling of burglaries in Sheffield
11.2.5 Bayesian modelling of children excluded from school
11.3 Concluding comments

Appendix I Software
Appendix II Cambridgeshire lung cancer data
Appendix III Sheffield burglary data
Appendix IV Children excluded from school: Sheffield
References
Index

The spatial clustering and heterogeneity of the burglary and concentrated disadvantage relationship in Washington, DC

Article

Jun 2024
GeoJournal

This study shows that the global and local spatial patterns of burglary rates in District of Columbia neighborhoods varied significantly between 2019 and 2021, and that their relationship with Concentrated Disadvantage (CD) was spatially heterogeneous. Hotspot clusters of neighborhoods with high levels of burglary changed rapidly from one year to another, while clusters with positive and negative local associations between burglary and CD did not change significantly over time. The main lessons are that burglary hotspots are harder to predict than bivariate burglary and CD hotspots, and that the relationship between burglary and CD varies significantly across neighborhoods. The research and policy implication is that we need to move beyond the spatial univariate analysis of crime hotspots to more detailed spatial bivariate analyses of the correlates of crime.

Bayesian Rank Likelihood Estimation for Spatial Latent Trait Model

Preprint

Jun 2024

In this study, a spatial latent trait model was developed to address the challenge of parameter estimation for ordinal response variables. The development of the model involved employing the Bayesian rank likelihood estimation method. The simulation algorithm was provided in detail, and the performance and sensitivity of the developed method were evaluated using simulation techniques. Method evaluation was conducted to identify any convergence issues in the developed method. The results showed that trace plots of all parameters (β, υ, and γ) showed good mixing and quick convergence. The potential scale reduction factor value for all parameters did not exceed one, indicating that convergence issues were not identified. Additionally, the developed method performed well, as demonstrated by the posterior predictive check, since simulated data generated from the posterior predictive distribution closely resemble the observed data. The developed method also effectively captures within-region variations and spatial correlations between the regions through the latent traits parameters. The assessment of performance included metrics such as root mean square error, mean absolute error, and the probability coverage of the corresponding 95% confidence intervals of the estimates. The results indicate that the estimates obtained from the developed method outperform the existing classical estimates. As a result, it can be concluded that the spatial latent trait model using Bayesian rank likelihood estimation is regarded as the better model.

Multiscale Space-Time Analysis of Environmental Changes in the Oil Sands Area (Alberta, Canada)

Article

May 2024

Stefania Bertazzon

Our study encompasses the Oil Sands Area (OSA) within northern Alberta, Canada, which has experienced substantial environmental changes over the last decades, in association with natural and anthropogenic disturbances. Using composites of Landsat imagery for 5-year intervals between 2000 and 2020, we performed two parallel geospatial analyses to assess environmental changes, examining landscape metrics and spectral indices. Landscape metrics were calculated from land use/land cover maps derived from a Random Forest supervised classification. Spectral indices included Normalized Difference Vegetation Index (NDVI) and Normalised Difference Built-up Index (NDBI), among others. Both hierarchical zonal analysis of spectral indices and zonal landscape metrics were calculated based upon two different aggregations of nested drainage basin features from hydrologic unit code (HUC - Watersheds of Alberta). Spatial contiguity of changes was evaluated by hotspot analysis. HUCs determined to experience significant changes at coarse aggregation level were examined at finer level. The combination of landscape metrics and zonal analysis provided evidence of substantial, yet localized, areas of changing trends. Mixed forest experienced the most significant changes; urban/barren areas initially increased and later decreased, indicating change both in agricultural and human-made areas.

COVID-19 Infection: A Mozambican Case Study

Article

Feb 2024

In China, where COVID-19 originated, more than 77,000 cases of infection had been reported by February 23rd, 2020, and 60% of confirmed cases were reported in the city of Wuhan. Mozambique declared a state of emergency in March 2020, and various prevention measures were implemented to control and respond to the pandemic in a timely manner, including early diagnosis of cases. The present work reports some details of a larger project whose main objective is to compute models for the analysis and visualization of COVID-19 data in Mozambique. The topic falls within the area of Statistics and aims to provide evidence that explains the country's situation regarding the evolution of COVID-19 cases (from the notification of the first case in Mozambique on March 22nd, 2020, until May 31st, 2022), with a focus on the provinces of Maputo, Nampula, Cabo Delgado and Niassa. The work considered qualitative and quantitative data to support decision-making in the health area on measures to prevent the pandemic and on the trend of cases and deaths from the disease.

How Does Spatial Heterogeneity Affect Industrial Outputs? Literature Review and Research Prospects

Article

Jan 2023

The impact of spatial heterogeneity on industrial outputs is an important new topic in economic geography. A considerable body of research literature has accumulated, but the academic community lacks a systematic and comprehensive review of, and consensus on, this topic. This study addresses that gap by mining the relevant classical literature. It first reviews the meaning of spatial heterogeneity, which is both a counterpart of and related to spatial dependence. Theorists generally acknowledge that there is spatial heterogeneity in the process of industrial outputs. The study then summarizes the logical basis, relationship coordination, measurement and other aspects of the effect of spatial heterogeneity on industrial outputs. In analyzing the impact of spatial heterogeneity on industrial outputs, we should not ignore the spatial dimension, and must also pay attention to the heterogeneity of individual enterprises. Industrial output analysis needs to be based on the relationship between spatial heterogeneity and spatial dependence. The influence of spatial heterogeneity on industrial outputs and the degree of difference among observed objects can be measured by econometric methods. The common tools for measuring and quantitatively describing the impact of spatial heterogeneity on industrial outputs mainly include the semivariogram, the spatial expansion model and the geographically weighted regression model. Finally, some directions for future research are pointed out in order to provide useful ideas for future theoretical research and industrial practice.

Amazon forest biogeography predicts resilience and vulnerability to drought

Article

Jun 2024
NATURE

Amazonia contains the most extensive tropical forests on Earth, but Amazon carbon sinks of atmospheric CO2 are declining, as deforestation and climate-change-associated droughts threaten to push these forests past a tipping point towards collapse. Forests exhibit complex drought responses, indicating both resilience (photosynthetic greening) and vulnerability (browning and tree mortality), that are difficult to explain by climate variation alone. Here we combine remotely sensed photosynthetic indices with ground-measured tree demography to identify mechanisms underlying drought resilience/vulnerability in different intact forest ecotopes (defined by water-table depth, soil fertility and texture, and vegetation characteristics). In higher-fertility southern Amazonia, drought response was structured by water-table depth, with resilient greening in shallow-water-table forests (where greater water availability heightened response to excess sunlight), contrasting with vulnerability (browning and excess tree mortality) over deeper water tables. Notably, the resilience of shallow-water-table forest weakened as drought lengthened. By contrast, lower-fertility northern Amazonia, with slower-growing but hardier trees (or, alternatively, tall forests, with deep-rooted water access), supported more-drought-resilient forests independent of water-table depth. This functional biogeography of drought response provides a framework for conservation decisions and improved predictions of heterogeneous forest responses to future climate changes, warning that Amazonia’s most productive forests are also at greatest risk, and that longer/more frequent droughts are undermining multiple ecohydrological strategies and capacities for Amazon forest resilience.

Dengue fever mapping in Bangladesh: A spatial modeling approach

Article

May 2024

Background: Epidemics of the dengue virus can trigger widespread morbidity and mortality, and there is no specific treatment. This study examined the spatial autocorrelation and variability of dengue prevalence across Bangladesh's 64 districts. Methods: Spatial autocorrelation was evaluated using Moran's I and Geary's C. Local Moran's I was used to detect hotspots and cold spots, whereas the local Getis-Ord G was used to identify spatial hotspots only. Spatial heterogeneity was detected using various conventional and spatial models, including the Poisson-Gamma model, the Poisson-Lognormal model, the Conditional Autoregressive (CAR) model, the Convolution model, and the BYM2 model. These models were implemented using Gibbs sampling and other Bayesian hierarchical approaches to analyze the posterior distribution effectively, enabling inference within a Bayesian context. Results: The Moran's I and Geary's C analyses show a substantial clustering pattern of positive spatial autocorrelation in dengue fever (DF) rates between neighboring districts at the 90% confidence level. The Local Indicators of Spatial Autocorrelation cluster map showed spatial clusters and outliers based on prevalence rates, while the local Getis-Ord G displayed a thorough breakdown of high or low rates, omitting outliers. Although Chattogram had the most dengue cases (15,752), Khulna district had a higher prevalence rate (133.636) than Chattogram (104.796). The BYM2 model, determined to be well-fitted based on the lowest Deviance Information Criterion value (527.340), explains a significant association between spatial heterogeneity and prevalence rates. Conclusion: This research pinpoints the district with the highest prevalence rate for dengue and the neighboring districts that also have high risk, allowing government agencies and communities to take the necessary precautions to mitigate the risk of DF.

Impact of urban form and street infrastructure on pedestrian-motorist collisions

Article

May 2024
Int J Inj Contr Saf Promot

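Examining the Effect of the Pandemic on Domestic Tourism Using Cluster Analysis: The Case of Türkiye [İç Turizmdeki Pandemi Etkisinin Kümelenme Analizi Kullanılarak İncelenmesi: Türkiye Örneği]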

Article

May 2024

In this study, statistics on domestic tourists arriving at municipality-licensed and ministry-licensed facilities were examined using spatial statistical methods. Taking 2018, 2020 and 2022 as reference years, the effect of the pandemic period on domestic tourism was identified. To determine the spatial distribution of the data, spatial autocorrelation analysis was carried out using global and local Moran's I methods. The analysis, conducted at the district scale, showed that the pandemic did not cause a marked change in the tourism destinations preferred by domestic tourists. The radical changes frequently claimed during and after the epidemic did not materialize statistically, at least within these periods. Change, however, requires a long time span. Crises affect tourists' preferences as well as the sector. Monitoring tourist attitudes and behavior is important for reducing risks and crises, for planning, for ensuring sustainable tourism development, and for tracking economic, social and environmental outcomes. For this reason, the study assessed the tendency towards geographical clustering. The results show that domestic tourism in Türkiye exhibits an increasingly noticeable tendency towards geographical clustering.

A Spatial Interaction Model for the Identification of Urban Functional Regions

Chapter

May 2024

The Detection of Clusters Using a Spatial Version of the Chi-Square Goodness-of-Fit Statistic

Article

Jan 1999

Peter Rogerson

A test statistic for the detection of spatial clusters is developed by generalizing the common chi‐square goodness‐of‐fit test. The paper includes a discussion of the relationship between the statistic and other associated statistics, and provides an analysis of both its null distribution and power. The paper concludes with the development of a local version of the statistic and an application to leukemia clustering in central New York.

Integration of Spatial Markets

Article

Feb 1990

Studies of spatial market integration draw their implications from a theory which assumes that there are no intraregional transport costs. An alternative theory is offered, based on the assumptions that buyers and sellers are spatially dispersed and intraregional transport costs are significant. This implies that the market is a linked oligopoly (or oligopsony) and that market integration tests are tests of alternative oligopoly price formation processes. For example, collusive basing-point pricing produces results typically assumed to imply efficiently integrated markets, while competitive FOB pricing does not. The theoretical implications are illustrated with an analysis of hog prices in Canada.

Loss functions for estimation of extrema with an application to disease mapping

Article

Sep 2003
CAN J STAT

It is often of interest to find the maximum or near maxima among a set of vector-valued parameters in a statistical model; in the case of disease mapping, for example, these correspond to relative-risk “hotspots” where public-health intervention may be needed. The general problem is one of estimating nonlinear functions of the ensemble of relative risks, but biased estimates result if posterior means are simply substituted into these nonlinear functions. The authors obtain better estimates of extrema from a new, weighted ranks squared error loss function. The derivation of these Bayes estimators assumes a hidden-Markov random-field model for relative risks, and their behaviour is illustrated with real and simulated data.

Recommended publications

[Spatial exploratory analysis of road accidents in Ciudad Juarez, Mexico]

Urban Growth and Land-Use Structure in Two Mediterranean Regions: An Exploratory Spatial Data

Spatial autocorrelation in global vegetation fires: exploratory analysis of screened MODIS hotspot d...

Exploratory point pattern analysis for modeling earthquake data