Figure 1 - uploaded by Andrew B Lawson
Louisiana county map with five counties listed: Caddo, East Baton Rouge, Calcasieu, Orleans, Jefferson.

Source publication
Article
In this article, we examine the development and use of covariate models where the relation with explanatory covariates is spatially adaptive. In this way space is regarded as an effect modifier. We examine the possibility of discrete groupings of coefficients (clustering of coefficients). Our application is to prostate cancer survival based on the...

Context in source publication

Context 1
... Â is the complete parameter set. Evaluation of the full posterior can be achieved via posterior sampling algorithms using MCMC. For a random number of grouping levels there may be a need to consider variable dimension sampling (e.g. reversible jump). However, for the analysis reported here we decided to explore a fixed dimension, for simplicity of analysis and interpretation. With a fixed dimension the main computational concern would be the evaluation of the Potts model, where a normalization is needed. In initial modeling we examined a variety of pseudolikelihood implementations for this model but found that the clustering results were highly sensitive to the parameter, which was also very difficult to estimate accurately. Hence we did not pursue this model in the work described here. Instead we examined three main models with fixed $k_l$: the SM model, threshold CAR, and MM model. We chose the case of $k_l = 2 \;\forall l$, for simplicity. With fixed $k_l$ a standard hybrid Gibbs–Metropolis updating algorithm was employed via adaptive rejection sampling. We used slice sampling for the precision parameters. The implementation was possible on WinBUGS using a zeros trick for the censored AFT likelihood. The program is available on request from the authors.

The SEER cancer registry of Louisiana records information on all cancer diagnoses occurring since initiation of records. In the case of prostate cancer (PrCA), all males diagnosed with this outcome are recorded with date of diagnosis, stage and grade of cancer, along with a variety of individual demographic information (such as age, marital status, ethnicity). Figure 1 displays the county map of the state. Later we will discuss results of our analysis with reference to a choice of five counties, which broadly represent a spectrum of rural to urban and also different socioeconomic and ethnicity distributions: Caddo, Calcasieu, East Baton Rouge, Orleans, Jefferson counties.
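The zeros trick mentioned in this excerpt for the censored AFT likelihood can be illustrated with a minimal Python sketch. It is not the authors' WinBUGS program, which is not reproduced here. The Weibull (extreme-value error) form of the AFT model, the function names, and the parameters `mu`, `sigma`, and `offset` are all assumptions made for illustration. The idea is that a pseudo-observation of zero from a Poisson distribution with mean `phi = -loglik + offset` contributes `exp(-phi)`, which is proportional to the intended likelihood term.

```python
import math


def aft_weibull_loglik(t, delta, mu, sigma):
    """Log-likelihood contribution of one subject under an assumed Weibull AFT model.

    Model (an assumption for this sketch): log T = mu + sigma * W, where W has
    the standard minimum extreme-value distribution.
    t     : observed time (> 0)
    delta : 1 if the event was observed, 0 if right-censored
    """
    z = (math.log(t) - mu) / sigma
    log_surv = -math.exp(z)  # log S(t)
    if delta == 1:
        # log f(t) = z - exp(z) - log(sigma) - log(t)
        return z + log_surv - math.log(sigma) - math.log(t)
    # A censored subject contributes only the survival probability.
    return log_surv


def zeros_trick_mean(loglik, offset):
    """Poisson mean for the zeros trick: phi = -loglik + offset, which must be positive."""
    phi = -loglik + offset
    if phi <= 0:
        raise ValueError("offset too small: the Poisson mean must be positive")
    return phi


def poisson_logpmf_at_zero(phi):
    """Log-probability of observing 0 from a Poisson(phi) distribution."""
    return -phi


if __name__ == "__main__":
    ll = aft_weibull_loglik(t=2.0, delta=1, mu=0.5, sigma=1.0)
    phi = zeros_trick_mean(ll, offset=10.0)
    # The pseudo-observation reproduces the log-likelihood up to the constant offset.
    print(ll, poisson_logpmf_at_zero(phi) + 10.0)
```

The offset only needs to be large enough to keep every Poisson mean positive; because it is constant, it does not affect the posterior.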
Our focus is the prostate cancer experience within Louisiana and on how spatial location impacts the survival experience post diagnosis, after allowing for other demographic effects. Our data consist of 10,264 patients, in the 64 counties of this state, for a 5-year period: 2000–2004. Both date of diagnosis (DOD) and date of vital outcome (DOV) are known. Those surviving beyond 2004 are regarded as censored. Figure 2 displays a selection of nine counties and their Kaplan–Meier survival curves for these data. Counties Caddo and Calcasieu are codes 22017 and 22019, respectively. Two features that are remarkable in these data are the racial divergence in survival (two groups: black versus white survival) and the fact that the survival curves cross for the different ethnic groups. The other major feature of note is the fact that different (geographically distant) counties can have quite different survival experiences (e.g. county 22065 compared to 22119). This suggests that spatial location could be important in defining survival experience. In a previous study, Zhang and Lawson [5] developed an AFT model for these data where a spatially correlated random effect was included to allow for county level effects. This effect was introduced within a Bayesian hierarchical model framework to allow for residual confounding. In fact, two county specific effects were considered: uncorrelated and spatially correlated. In our models for these data we have considered 4 variants. We have confined our grouping to only 2 groups (i.e. $k_l = 2$) for simplicity, and compared these to a standard random effect AFT ...
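The Kaplan–Meier curves referred to above can be computed from (time, event) pairs in which right-censored subjects, such as those surviving beyond the study window, carry an event indicator of 0. The sketch below is a plain-Python illustration on made-up toy data, not the SEER data or the authors' code; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up times
    events : 1 if the event (death) was observed, 0 if right-censored
    Returns a list of (event time, estimated survival probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Survival drops only at times where an event occurs.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Running the estimator separately per county or per ethnic group gives curves that can be overlaid to look for the divergence and crossing described in the excerpt.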

Similar publications

Article
Background: On the basis of the lack of response of invasive lobular breast cancer to neoadjuvant chemotherapy, we questioned the effectiveness of adjuvant chemotherapy in relation to histology. Patients and methods: Women with primary nonmetastatic invasive ductal or (mixed type) lobular breast cancer, aged 50-70 years, diagnosed between 1995 a...
Preprint
Background: We report here the first population-based incidence rates and prognosis of primary central nervous system lymphoma (PCNSL) in Finland. Methods: Finnish Cancer Registry data by histological diagnosis and tumor location (2007-2017) for cases with diffuse large B-cell lymphoma. Results: During 2007–2017, 392 new cases of PCNSL were reporte...
Article
Objectives: Women diagnosed with breast cancer are offered treatment and therapy based on tumor characteristics, including tumor diameter. There is scarce knowledge whether tumor diameter is accurately reported, or whether it is unconsciously rounded to the nearest half-centimeter (terminal digit preference). This study aimed to assess the precisio...
Article
Background: The present study evaluated Korean women with lung cancer and compared the clinical characteristics of ever-smoker and never-smoker groups using the National Lung Cancer Registry. Methods: In affiliation with the Korean Central Cancer Registry, the Korean Association for Lung Cancer constructed a registry into which 10% of the lung can...
Article
This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familia...

Citations

... In addition, some studies have employed varying coefficient regression models based on spatial cluster frameworks. For example, Lawson [18] proposed an approach that provides the grouping of regression coefficients directly when the number of groups is known a priori. Lee [19] proposed a spatial cluster detection method for regression coefficients, which directly identifies an unknown number of spatial clusters in the regression coefficients via hypothesis testing and the construction of spatially varying coefficient regression based on detected spatial clusters. ...
Article
Demographic and educational factors are essential, influential factors of early childhood development. This study aimed to investigate spatial patterns in the association between attendance at preschool and children's developmental vulnerabilities in one or more domain(s) in their first year of full-time school at a small area level in Queensland, Australia. This was achieved by applying geographically weighted regression (GWR) followed by K-means clustering of the regression coefficients. Three distinct geographical clusters were found in Queensland using the GWR coefficients. The first cluster covered more than half of the state of Queensland, including the Greater Brisbane region, and displays a strong negative association between developmental vulnerabilities and attendance at preschool. That is, areas with high proportions of preschool attendance tended to have lower proportions of children with at least one developmental vulnerability in the first year of full-time school. Clusters two and three were characterized by stronger negative associations between developmental vulnerabilities, English as the mother language, and geographic remoteness, respectively. This research provides evidence of the need for collaboration between health and education sectors in specific regions of Queensland to update current service provision policies and to ensure holistic and appropriate care is available to support children with developmental vulnerabilities.
... In addition, some studies have employed varying coefficient regression models based on spatial cluster frameworks. For example, Lawson [18] proposed an approach that provides the grouping of regression coefficients directly when the number of groups is known a priori. Lee [19] proposed a spatial cluster detection method for regression coefficients, which directly identifies an unknown number of spatial clusters in the regression coefficients via hypothesis testing and the construction of spatially varying coefficient regression based on detected spatial clusters. ...
Preprint
The research explores the influence of preschool attendance (one year before full-time school) on the development of children during their first year of school. Using data collected by the Australian Early Development Census, the findings show that areas with high proportions of preschool attendance tended to have lower proportions of children with at least one developmental vulnerability. Developmental vulnerabilities include not being able to cope with the school day (tired, hungry, low energy), being unable to get along with others or showing aggressive behaviour, and having trouble with reading, writing or numbers. These findings, of course, vary by region. Using data analysis and machine learning, the researchers were able to identify three distinct clusters within Queensland, each characterised by different socio-demographic variables influencing the relationship between preschool attendance and developmental vulnerability. These analyses contribute to understanding regions with high vulnerability and the potential need for tailored policies or investments.
... According to previous studies (Lawson et al., 2014; Lee et al., 2017), the time-varying coefficients may have locally homogeneous trends due to spatial adjacency. Thus, this paper models the regression coefficients as cluster-specific in the sense that the areas in the same cluster have common covariate effects with particular temporal patterns, which differ from the patterns in other clusters. ...
Article
The transmission of the COVID‐19 epidemic is a global emergency which is worsened by the genetic mutations of SARS‐CoV‐2. However, to date, few statistical studies have researched the COVID‐19 spread patterns in terms of the variant cases. Hence, this paper aims to explore the associated risk factors of the Delta variant, the most contagious strain of COVID‐19. The study collected the state‐level COVID‐19 Delta variant cases in the United States during a 12‐week period and included potential environmental, socioeconomic and public prevention factors as independent variables. Instead of regarding the covariate effects as constant, this paper proposes a flexible Bayesian hierarchical model with spatio‐temporally varying coefficients to account for data heterogeneity. The method enables us to cluster the states into distinctive groups based on the temporal trends of the coefficients and simultaneously identify significant risk factors for each cluster. The findings contribute novel insight into the dynamics of covariate effects on the COVID‐19 Delta variant over space and time, which could help the government develop targeted prevention measures for vulnerable regions based on the selected risk factors.
... Spatial heterogeneity in regression coefficients has been addressed by clustered varying coefficient regression for the spatial data. Lawson et al. (2014) proposed the grouped spatial varying coefficient regression when the total number of groups is given. Recently, and Lee et al. (2020) proposed spatial cluster detection approaches of regression coefficients. ...
Article
Spatial cluster detection, which is the identification of spatial units adjacent in space associated with distinctive patterns of data of interest relative to background variation, is useful for discerning spatial heterogeneity in regression coefficients. Some real studies with regression-based models on air quality data show that there exists not only spatial heterogeneity but also heteroscedasticity between air pollution and its predictors. Since the low air quality is a well-known risk factor for mortality, various cardiopulmonary diseases, and preterm birth, the analysis at the tail would be of more interest than the center of air pollution distribution. In this article, we develop a spatial cluster detection approach using a threshold quantile regression model to capture the spatial heterogeneity and heteroscedasticity. We introduce two threshold variables in the quantile regression model to define a spatial cluster. The proposed test statistic for identifying the spatial cluster is the supremum of the Wald process over the space of threshold parameters. We establish the limiting distribution of the test statistic under the null hypothesis that the quantile regression coefficient is the same over the entire spatial domain at the given quantile level. The performance of our proposed method is assessed by simulation studies. The proposed method is also applied to analyze the particulate matter (PM2.5) concentration and aerosol optical depth (AOD) data in the Northeastern United States in order to study geographical heterogeneity in the association between AOD and PM2.5 at different quantile levels.
... Alternatively, there are some studies for varying coefficient regression models based on spatial cluster frameworks. Lawson et al 28 proposed an approach which provides the grouping of regression coefficients directly when the number of groups is known a priori. Lee et al 29,30 proposed a spatial cluster detection method for regression coefficients which allows the identification of an unknown number of spatial clusters in the regression coefficients directly via hypothesis testing and the construction of spatially varying coefficient regression based on detected spatial clusters. ...
Article
In regression analysis for spatio-temporal data, identifying clusters of spatial units over time in a regression coefficient could provide insight into the unique relationship between a response and covariates in certain subdomains of space and time windows relative to the background in other parts of the spatial domain and the time period of interest. In this article, we propose a varying coefficient regression method for spatial data repeatedly sampled over time, with heterogeneity in regression coefficients across both space and over time. In particular, we extend a varying coefficient regression model for spatial-only data to spatio-temporal data with flexible temporal patterns. We consider the detection of a potential cylindrical cluster of regression coefficients based on testing whether the regression coefficient is the same or not over the entire spatial domain for each time point. For multiple clusters, we develop a sequential identification approach. We assess the power and identification of known clusters via a simulation study. Our proposed methodology is illustrated by the analysis of a cancer mortality dataset in the Southeast of the U.S.
... Geographically weighted regression provides regression coefficient estimates that are locally weighted and vary across space. Alternatively, in a Bayesian framework, Lawson, Choi, and Zhang (2014) proposed an approach to a grouped spatial varying coefficient regression when the total number of groups is known a priori. However, neither method is directly applicable to the detection of hot spots. ...
Article
Identifying spatial clusters of different regression coefficients is a useful tool for discerning the distinctive relationship between a response and covariates in space. Most of the existing cluster detection methods aim to identify the spatial similarity in responses, and the standard cluster detection algorithm assumes independent spatial units. However, the response variables are spatially correlated in many environmental applications. We propose a mixed‐effects model for spatial cluster detection that takes spatial correlation into account. Compared to a fixed‐effects model, the introduced random effects explain extra variability among the spatial responses beyond the cluster effect, thus reducing the false positive rate. The developed method exploits a sequential searching scheme and is able to identify multiple potentially overlapping clusters. We use simulation studies to evaluate the performance of our proposed method in terms of the true and false positive rates of a known cluster and the identification of multiple known clusters. We apply our proposed methodology to particulate matter (PM2.5) concentration data from the Northeastern United States in order to study the weather effect on PM2.5 and to investigate the association between the simulations from a numerical model and the satellite‐derived aerosol optical depth data. We find geographical hot spots that show distinct features compared to the background.
... Several studies have been conducted to model survival data accounting for spatial variation. Data and analyses of prostate cancer (PrCa) are available at international [13,14], national [15] and smaller scale [16,17,18,19], the disease showing a large spatial variability. The incidence of prostate cancer is highest in Scandinavian countries and lowest in China [20]. ...
... Location accounts for dissimilarities in the composition of populations and differentiates risks that are the product of physical and social environments. The Surveillance, Epidemiology and End Results (SEER) database [21] of the National Cancer Institute has been used extensively for modeling prostate cancer survival, accounting for the county level spatial variation [16,17,18,19]. The SEER Program registries routinely collect data on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status. ...
Article
We propose a Bayesian spatial model for time-to-event data in which we allow the censoring mechanism to depend on covariates and have a spatial structure. The survival model incorporates a cure rate fraction and assumes that the time-to-event follows a Weibull distribution, with covariates such as race, stage, grade, marital status and age at diagnosis being linked to its scale parameter. With right censoring being a primary concern, we consider a joint logistic regression model for the death versus censoring indicator, allowing dependence on covariates and including a spatial structure via the use of random effects. We apply the models to examine prostate cancer data from the Surveillance, Epidemiology, and End Results (SEER) registry, which displays marked spatial variation.
... This method does not directly model clustering of regression coefficients. In addition, Lawson et al. [21] proposed discrete grouping of regression coefficients by considering a prior distribution for spatial grouping in a Bayesian framework. While this method directly provides grouping of regression coefficients, the number of groups needs to be specified in advance. ...
Article
Popular approaches to spatial cluster detection, such as the spatial scan statistic, are defined in terms of the responses. Here, we consider a varying-coefficient regression and spatial clusters in the regression coefficients. For varying-coefficient regression, such as the geographically weighted regression, different regression coefficients are obtained for different spatial units. It is often of interest to practitioners to identify clusters of spatial units with distinct patterns in a regression coefficient, but there is no formal statistical methodology for that. Rather, cluster identification is often ad hoc, such as by eyeballing the map of fitted regression coefficients and discerning patterns. In this paper, we develop new methodology for spatial cluster detection in the regression setting based on hypothesis testing. We evaluate our methods in terms of power and coverages for true clusters via simulation studies. For illustration, our methodology is applied to a cancer mortality dataset.
... Subsequent papers explored Bayesian semiparametric modeling (2), spatiotemporal modeling (3,8), semiparametric proportional odds models with spatial frailties (6), joint survival and longitudinal modeling with frailties (44), and parametric accelerated failure time models (42). Finally, we refer the reader to Lawson et al. (29) for spatial survival models that do not deploy spatial frailties. ...
Article
With increasing accessibility to geographic information systems (GIS) software, statisticians and data analysts routinely encounter scientific data sets with geocoded locations. This has generated considerable interest in statistical modeling for location-referenced spatial data. In public health, spatial data routinely arise as aggregates over regions, such as counts or rates over counties, census tracts, or some other administrative delineation. Such data are often referred to as areal data. This review article provides a brief overview of statistical models that account for spatial dependence in areal data. It does so in the context of two applications: disease mapping and spatial survival analysis. Disease maps are used to highlight geographic areas with high and low prevalence, incidence, or mortality rates of a specific disease and the variability of such rates over a spatial domain. They can also be used to detect hot spots or spatial clusters that may arise owing to common environmental, demographic, or cultural effects shared by neighboring regions. Spatial survival analysis refers to the modeling and analysis for geographically referenced time-to-event data, where a subject is followed up to an event (e.g., death or onset of a disease) or is censored, whichever comes first. Spatial survival analysis is used to analyze clustered survival data when the clustering arises from geographical regions or strata. Illustrations are provided in these application domains. Expected final online publication date for the Annual Review of Public Health Volume 37 is March 17, 2016. Please see http://www.annualreviews.org/catalog/pubdates.aspx for revised estimates.