Article

Nearest‐Neighbour Systems and the Auto‐Logistic Model for Binary Data

Abstract

Bartlett (1966) and Whittle (1963), respectively, have proposed alternative, non‐equivalent definitions of nearest‐neighbour systems. The former, conditional probability definition, whilst the more intuitively attractive, presents several basic problems, not least in the identification of available models. In this paper, conditional probability nearest‐neighbour systems for interacting random variables on a two‐dimensional rectangular lattice are examined. It is shown that, in the case of 0, 1 variables and a homogeneous system, the only possibility is a logistic‐type model but in which the explanatory variables at a point are the surrounding array variables themselves. A spatial–temporal approach leading to the same model is also suggested. The final section deals with linear nearest‐neighbour systems, especially for continuous variables. The results of the paper may easily be extended to three or more dimensions.
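
The logistic-type conditional model described in the abstract can be illustrated with a short sketch. It assumes a single interaction parameter shared by all four nearest neighbours and treats off-lattice sites as zeros; both are simplifications chosen here for illustration, and the names `neighbour_sum`, `conditional_prob_one`, `alpha` and `beta` are hypothetical rather than taken from the paper.

```python
import numpy as np

def neighbour_sum(x, i, j):
    """Sum of the four nearest neighbours of site (i, j).

    Assumption: sites outside the lattice contribute 0 (free boundary).
    """
    rows, cols = x.shape
    total = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = i + di, j + dj
        if 0 <= r < rows and 0 <= c < cols:
            total += x[r, c]
    return total

def conditional_prob_one(x, i, j, alpha, beta):
    """P(x[i, j] = 1 | neighbours): logistic in the neighbouring values.

    alpha is a baseline term and beta a common neighbour interaction;
    both are illustrative parameters, not estimates from any data set.
    """
    eta = alpha + beta * neighbour_sum(x, i, j)
    return 1.0 / (1.0 + np.exp(-eta))

# Small 0/1 lattice whose centre site has all four neighbours equal to 1.
x = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
p_centre = conditional_prob_one(x, 1, 1, alpha=-2.0, beta=1.0)
```

With `beta = 0` the neighbours drop out and the function reduces to an ordinary logistic probability depending on `alpha` alone.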

... For these reasons we advocate to turn away from the raster paradigm when modeling agricultural landscapes. Using a network-based representation of interactions among landscape elements, we construct Gibbs energies based on network structure (see, e.g., the recent collection of papers introduced by Fienberg (2010)) and, more specifically, models pertaining to the widely used class of discrete Markov random fields; see the seminal work of Besag (1972), Hammersley and Clifford (1971). Approaches relevant to our work are the nearest-neighbour Markov structures of Baddeley and Møller (1989) and the representations based on connected components introduced in Møller and Waagepetersen (1998). ...
... gives relative preference to category 1 over category 0 such that landscapes tend to have more objects of category 1 than of category 0 for type C, provided that the energy terms of other landscape descriptors do not conversely influence the proportion of categories. Markov models with only two-level categories and terms for activity and pairwise interaction can be viewed as variants of the classical Ising model (Gallavotti (1999)) and, more generally, of autologistic regression models; see Besag (1972), Hammersley and Clifford (1971) and Section 3 of van Lieshout (2019). ...
... The likelihood function is not tractable in practice due to the normalizing constant c(β) in the probability mass function (5). Instead, we use a pseudo-likelihood based on conditional distributions; see Besag (1972, 1974), Møller and Waagepetersen (1998), Stoehr (2017), van Lieshout (2000) and, particularly, Section 3.5 of van Lieshout (2019). Given n objects x = (x_1, . . . ...
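
The pseudo-likelihood idea in this excerpt, replacing the intractable joint likelihood by a product of conditional distributions, can be sketched for a generic two-category autologistic model on a graph. This is not the energy function of the cited paper (which is not reproduced in the excerpt); the adjacency matrix `A`, the parameters `alpha` and `beta`, and the function name are assumptions made for illustration.

```python
import numpy as np

def log_pseudo_likelihood(x, A, alpha, beta):
    """Log pseudo-likelihood of a binary vector x on a graph.

    Assumed model: logit P(x_i = 1 | rest) = alpha + beta * sum_j A[i, j] * x_j,
    where A is a symmetric 0/1 adjacency matrix with zero diagonal.
    """
    x = np.asarray(x, dtype=float)
    eta = alpha + beta * (A @ x)
    # log P(x_i | neighbours) = x_i * eta_i - log(1 + exp(eta_i));
    # logaddexp(0, eta) evaluates log(1 + exp(eta)) stably.
    return float(np.sum(x * eta - np.logaddexp(0.0, eta)))

# Path graph on three nodes: 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
x = np.array([1, 0, 1])
lpl = log_pseudo_likelihood(x, A, alpha=0.0, beta=1.0)
```

No normalizing constant appears anywhere in the computation, which is what makes maximising this function over the parameters feasible when the full likelihood is not.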
... A simple and direct way to account for spatial autocorrelation within a logistic model allows the response probability, π_i, at the ith observation (here, the ith county) to depend upon the other observed binary responses in some pre-defined, possibly non-spatial, neighborhood, N_i, for that county. Besag [2] first proposed such a formulation to incorporate neighboring autocorrelation, the conditional autologistic model ...
... When β_2 = 0, no autocorrelation exists in the data, and the autologistic model then reduces to a standard logistic regression (which we call the 'independence model'). Caragea and Kaiser [7] offered a correction to (2) that extends the conditional model into a centered autologistic form, by redefining the spatial autocovariate (essentially, by centering it): ...
... Or, for that matter, our MPL fit for the centered autologistic model could be replaced by some form of Bayesian fit [19,39] if sufficient hierarchical prior information were available to apply the Bayesian paradigm. Indeed, the concept of adjusting for autocorrelation in a logistic regression is obviously not new: besides Besag's original paper [2] and the centered extension in [7], applications include use of auxiliary variables to account for severity of adverse events [4], variational methods to capture spatial dependence via Gaussian processes [16], and extensions to (sparse) generalized linear mixed models [22], among many others. In all these cases, further development for implementation in our risk-analytic context, and how to account for the necessary benchmark components, would be required. ...
Article
We develop and study a quantitative, interdisciplinary strategy for conducting statistical risk analyses within the ‘benchmark risk’ paradigm of contemporary risk assessment when potential autocorrelation exists among sample units. We use the methodology to explore information on vulnerability to natural hazards across 3108 counties in the conterminous 48 US states, applying a place-based resilience index to an existing knowledgebase of hazardous incidents and related human casualties. An extension of a centered autologistic regression model is applied to relate local, county-level vulnerability to hazardous outcomes. Adjustments for autocorrelation embedded in the resiliency information are applied via a novel, non-spatial neighborhood structure. Statistical risk-benchmarking techniques are then incorporated into the modeling framework, wherein levels of high and low vulnerability to hazards are identified.
... Modeling dependency between different binary variables is an essential statistical task with many applications in medicine, life sciences, economics, and sociology. The basic statistical tools used in such modeling are the autologistic model ([Besag (1972)]) and network modeling based on the Ising model ([German(1984)], [Ravikumar et al. (2010)], [Zavlis et al. (2021)], [Abeyasinghe et al. (2020)], [Dimitrakopoulos et al. (2020)], [Briganti and Linkowski (2000)]), which is a special case of the autologistic model. More general information on graphical models for discrete data can be found in [Madigan et al. (1995)] and [Maathuis et al. (2019)]. ...
... More general information on graphical models for discrete data can be found in [Madigan et al. (1995)] and [Maathuis et al. (2019)]. The classical autologistic model ([Besag (1972)]) has been applied many times, e.g., in epidemiology, marketing, agriculture, ecology, forestry, geography, and image analysis ([Gégout-Petit et al. (2019)], [Caragea and Kaiser (2009)], [Shin et al. (2019)], [He et al. (2003)], [Koutsias N (2003)]). [Caragea and Kaiser (2009)] considered a centered autologistic model with more interpretable parameters which describe spatial dependence. ...
Preprint
We propose a new graphical model to describe the comorbidity of allergic diseases. We present our model in two versions. First, we introduce a generative model that correctly reflects the variables' causal relationship. Then we propose an approximation of the generative model by another misspecified model that is computationally more efficient and easily interpretable. We will focus on the misspecified version, which we consider more practical. We include in the model two directed graphs, one graph of known dependency between the main binary variables (diseases), and a second graph of the dependence between the occurrence of the diseases and their symptoms. In the model, we also consider additional auxiliary variables. The proposed model is evaluated on a cross-sectional multicentre study in Poland on the ECAP database (www.ecap.pl). An assessment of the stability of the proposed model was obtained using bootstrap and jackknife techniques.
... Gibbs random fields, such as the auto-logistic Ising model (Besag, 1972), have been studied in great detail in statistics and employed in various forms in spatial statistics for modelling binary outcomes with neighbourhood dependencies. To accommodate interpretations in terms of the Behavioural and Social Sciences, Robins et al. (2001) elaborated on these Gibbs-distributions and derived a class of 'social influence' models from a set of specific dependence assumptions. ...
... In this case the ALAAM reduces to a logistic regression model. Auto-logistic models relax the assumption of independence by allowing the state of sites to depend on the states of their neighbours in, for example, a lattice as in the Ising model (Besag, 1972). Besag (1974) elaborates auto-logistic models for different types of lattice systems and defines dependencies of the first as well as the second order. ...
Article
The network influence model is a model for binary outcome variables that accounts for dependencies between outcomes for units that are relationally tied. The basic influence model was previously extended to afford a suite of new dependence assumptions and, because of its relation to traditional Markov random field models, it is often referred to as the auto logistic actor‐attribute model (ALAAM). We extend current approaches for fitting ALAAMs by presenting a comprehensive Bayesian inference scheme that supports testing of dependencies across subsets of data and the presence of missing data. We illustrate different aspects of the procedures through three empirical examples: masculinity attitudes in an all‐male Australian school class, educational progression in Swedish schools, and unemployment among adults in a community sample in Australia.
... Bayesian networks are closely related to structural equation models [41] and are distinguished from other types of graphical models by the usage of directed edges instead of undirected, symmetric edges to make more specific statements about conditional independencies between variables resulting from the graphical structure. Whereas undirected graphical models such as the Ising/autologistic model [14] and the Boltzmann machine [56] do not make statements about the direction of edges, Bayesian networks may use this directionality to encode causal information [110]. While it may not be the case that every Bayesian network analysis requires causal interpretation of the links, it is often true that these are used to represent causal mechanisms [64]. ...
... Some representative scenarios include data in non-Euclidean spaces [31] with a treelike topology [62,137] or data associated with angular coordinates. GLMs with adjustments to include spatially correlated random effects [14] or spatially correlated error variables are a mainstay in spatial econometrics [5] and in spatial Bayesian modeling [13]. The primary differentiating factor between spatial GLMs and Bayesian networks is that the former only models dependencies between the covariates and the response variable though their relative simplicity can be attractive for cases in which Bayes net features are unnecessary, while Bayesian networks typically model the joint correlation structure between all variables simultaneously. ...
Article
Bayesian networks are a popular class of multivariate probabilistic models as they allow for the translation of prior beliefs about conditional dependencies between variables to be easily encoded into their model structure. Due to their widespread usage, they are often applied to spatial data for inferring properties of the systems under study and also generating predictions for how these systems may behave in the future. We review published research on methodologies for representing spatial data with Bayesian networks and also summarize the application areas for which Bayesian networks are employed in the modeling of spatial data. We find that a wide variety of perspectives are taken, including a GIS-centric focus on efficiently generating geospatial predictions, a statistical focus on rigorously constructing graphical models controlling for spatial correlation, as well as a range of problem-specific heuristics for mitigating the effects of spatial correlation and dependency arising in spatial data analysis. Special attention is also paid to potential future directions for integration of Bayesian networks with spatial processes.
... There are strong links between statistical physics and spatial statistics. These date back to the work of Besag [6,7] on lattice autoregression models, which was inspired by the Ising model developed for magnetic systems by the physicist Ernst Ising [32], and to the seminal Monte Carlo algorithm of Nicholas Metropolis [38], which was initially developed for the simulation of liquids and today occupies a central position in Bayesian statistics and simulation-based statistical inference [42]. ...
... From (4.3), it follows immediately that our grid approximation of the model is closely related to two widely used models for lattice data in spatial statistics. The most popular and fundamental model for areal data observed over known graphs, which are defined by means of the respective adjacency matrix, is due to Besag [6,7,8,43], and it is often referred to as the conditionally autoregressive (CAR) model. The general Besag model can be characterized through its conditional distributions ...
Article
We investigate a connection between spatial statistics and statistical physics to obtain new covariance functions with direct physical interpretation for spatial random fields. These covariance functions are based on the exponential Boltzmann-Gibbs representation and use an energy functional to represent interactions between the values of the random field at different points in space. This formulation results in closed-form generalized covariance functions, which display infinite variance in Euclidean spaces of dimension larger than one. We propose regularization schemes in real and reciprocal (spectral) space that lead to well-behaved covariance structures. The real-space regularization parameter allows a continuous interpolation between the Boltzmann-Gibbs covariance and the exponential covariance. We also propose discretized approximations on regular grids, and we show that they represent reparametrized versions of the well-known Besag and Leroux lattice models. We then discuss parameter estimation and spatial prediction for the regularized Boltzmann-Gibbs covariance model in two dimensions. We recommend using the pairwise difference likelihood that combines satisfactory estimation performance and good scalability with many observation points. The predictive performance of the regularized covariance function is assessed by means of cross-validation statistics. Irregularly spaced samples from the Walker Lake dataset are used, and spatial prediction is conducted by means of ordinary kriging. The regularized Boltzmann-Gibbs covariance yields improved predictive performance compared to the exponential covariance model.
... Extending the expression for the MVL model without regressors (also known as auto-logistic model) given in Besag (1972) we define the probability of market basket y mt conditional on regressors x mt as follows: ...
Article
Using multivariate logit models, we analyze purchases of product categories made by individual households. We introduce a sparse multivariate logit model that considers only a subset of all two-way interactions. A combined forward and backward selection procedure based on a cross-validated performance measure excludes about 74 % of the possible two-way interactions. We also specify random coefficient versions of both the non-sparse and the sparse model. The fact that the random coefficient models lead to better values of the Bayesian information criterion demonstrates the importance of latent heterogeneity. The random coefficients sparse model attains the best statistical performance if we consider model complexity and offers a better interpretability. We investigate the cross-purchase effects of household segments derived from this random coefficient model. As additional interpretation aid we cluster categories and category pairs by integer programming. We demonstrate what the best performing sparse model implies for cross-selling by product recommendations and store layout. The sparse model leads to managerial implications with respect to the effects of advertising in local newspapers and flyers that are as a rule close to those implied by its non-sparse counterpart.
... The temporal pattern of pheromone emission (calling) and the frequencies of the behavioural units were compared using the Generalized Linear Model (GLM) (Besag, 1972), assuming a Poisson distribution. In the calling behavioural experiments, time and lineage were considered predictor variables, assuming the independence of the time factor. ...
Article
We analysed the influence of laboratory domestication, under relaxed conditions, on the courtship behaviour of the fruit fly species Anastrepha obliqua, an important agricultural pest. We compared the temporal patterns of pheromone emission (Calling behaviour) and the frequencies and sequences of the courtship behavioural units of males of a laboratory lineage and a wild lineage. Our results indicated similarities in the temporal behavioural patterns of calling, the durations of their behavioural sequences, the final sequences of courtships resulting in copulation, of wild and laboratory males. Differences, however, were observed between the two populations in terms of the frequencies of the behavioural units executed and the initial sequence of courtship. Differences were noted in the presence or absence of some behavioural units within the courtship behavioural repertoires of the laboratory-reared and wild males. The wild males did not show units such as Alignment, Contact, Fighting and Marking Leaf that were observed in the laboratory males' courtship behaviour under laboratory conditions; on the other hand, laboratory males did not show the Abdominal movements and Oscillation observed in the courtship behaviour of wild males. The rearing of A. obliqua males under relaxed conditions in the laboratory provides an environment adequate for the preservation of behavioural characteristics relevant to successful mating, such as Movement, Arrowhead 1, and Attempt, and in temporal patterns of pheromone emission.
... According to Hughes et al. (2011), this model is straightforward to implement and fast to compute as the probability of a material at a specific location depends on the presence or absence of the material in only two nearest neighboring locations. The model will be briefly discussed; however, more information can be found in Bartlett and Besag (1969), Besag (1972), Honjo (1985) and Montoya-Noguera and Lopez-Caballero (2016). ...
... In the homogeneous MVL model, each coefficient is constant across households. Extending the expression for the homogeneous MVL model without regressors (also known as auto-logistic model) given in Besag (1972) we define the probability of market basket y mt conditional on regressors x mt as follows: ...
Article
A regressor is endogenous if it is correlated with the unobserved residual of a model. Ignoring endogeneity may lead to biased coefficients. We deal with the omitted variable bias that arises if firms set marketing variables considering factors (demand shocks) that researchers do not observe. Whereas publications on sales response or brand choice models frequently take the potential endogeneity of marketing variables into account, multicategory choice models provide a different picture. To consider endogeneity in multicategory choice models, we follow a two-step Gaussian copula approach. The first step corresponds to an individual-level random coefficient version of the multivariate logit model. We analyze yearly shopping data for one specific grocery store, referring to 29 product categories. If the assumption of a Gaussian correlation structure is met, the copula approach indicates the endogeneity of a category-specific marketing variable in about 31% of the categories. The majority of marketing variables rated as endogenous are positively correlated with the omitted variable, implying that ignoring endogeneity leads to an overestimation of the coefficients of the respective marketing variable. Finally, we investigate whether taking endogeneity into account by the copula approach leads to different managerial implications. In this regard, we demonstrate that for our data ignoring endogeneity often suggests a level of marketing activity that is too high.
... Example 5.4 (Autologistic regression models). The arguments in Section 5.3 are used here to deliver finite RBMp-estimators for autologistic regression models, which are typically estimated by composite likelihoods. Autologistic regression (J. Besag, 1974; J. E. Besag, 1972) is an important model for analysing binary responses with spatial or network correlation, used extensively in a range of disciplines, including ecology, anthropology, and computer vision; see Wolters (2017) for references. Suppose that y(s) ∈ {−1, 1} is an observation at location s ∈ S = {s_{11}, . . . , s_{1c_1}, . . . , s_{k1}, . . . , ...
Article
We develop a novel, general framework for reduced-bias M-estimation from asymptotically unbiased estimating functions. The framework relies on an empirical approximation of the bias by a function of derivatives of estimating function contributions. Reduced-bias M-estimation operates either implicitly, solving empirically adjusted estimating equations, or explicitly, subtracting the estimated bias from the original M-estimates, and applies to partially or fully specified models with likelihoods or surrogate objectives. Automatic differentiation can abstract away the algebra required to implement reduced-bias M-estimation. As a result, the bias-reduction methods we introduce have broader applicability, straightforward implementation, and less algebraic or computational effort than other established bias-reduction methods that require resampling or expectations of products of log-likelihood derivatives. If M-estimation is by maximising an objective, then there always exists a bias-reducing penalised objective. That penalised objective relates to information criteria for model selection and can be enhanced with plug-in penalties to deliver reduced-bias M-estimates with extra properties, like finiteness for categorical data models. Inferential procedures and model selection procedures for M-estimators apply unaltered with the reduced-bias M-estimates. We demonstrate and assess the properties of reduced-bias M-estimation in well-used, prominent modelling settings of varying complexity.
... In our study, we expect the rate of agricultural landownership to be spatially dependent. We account for this spatial dependency using a "Besag" Gaussian model, which accounts for administrative unit's spatial autocorrelation by including a correlated spatial effect and an uncorrelated random effect (Besag, 1972). The model requires us to define the spatial neighborhood structure of dependencies among observations, which we accomplished using a first-order queen contiguity matrix. ...
Article
The adverse effects of climate change are likely to harm agricultural livelihoods and food supplies worldwide. Faced with challenges resulting from increasingly unpredictable weather patterns, some farmers might abandon their occupations. Existing research has found that drier than usual weather reduces landownership rates through these pathways. Such trends could be disruptive at a population level, threatening a country’s economic and political stability. We analyze subnational agricultural landownership data that cover 50 countries on four continents between 2004 and 2017. Our Demographic and Health Surveys (DHS) dataset speaks to the experiences of 1,123,714 households. Our predictor of environmental stress is the growing season standardized precipitation-evapotranspiration index, which measures deviations from local weather patterns dating back to 1901. We find that drier than average growing season weather is associated with declining landownership rates. For every dry growing season before a DHS survey, the agricultural landownership rate falls by 2.51%. This effect is most robust in African countries, which was the focus of a recent study on this topic, and we offer several plausible interpretations of these regional differences.
... For example, building off the generalized linear mixed model (GLMM) time series literature, one can consider the spatio-temporal data to follow an independent non-Gaussian (Bernoulli) distribution conditioned on a latent Gaussian dynamic process (e.g., West et al., 1985; Gamerman, 1998; Lopes et al., 2011; Cressie and Wikle, 2011). Another option is to consider such data as a binary Markov random field (i.e., an auto-logistic model; Besag (1972); Zhu et al. (2005, 2008)). Yet another approach considers the data to follow a cellular-automata (CA) with binary states with simple evolution rules that describe the change of the states over time (e.g., Hooten and Wikle, 2010; Hooten et al., 2020). ...
Preprint
Binary spatio-temporal data are common in many application areas. Such data can be considered from many perspectives, including via deterministic or stochastic cellular automata, where local rules govern the transition probabilities that describe the evolution of the 0 and 1 states across space and time. One implementation of a stochastic cellular automata for such data is with a spatio-temporal generalized linear model (or mixed model), with the local rule covariates being included in the transformed mean response. However, in real world applications, we seldom have a complete understanding of the local rules and it is helpful to augment the transformed linear predictor with a latent spatio-temporal dynamic process. Here, we demonstrate for the first time that an echo state network (ESN) latent process can be used to enhance the local rule covariates. We implement this in a hierarchical Bayesian framework with regularized horseshoe priors on the ESN output weight matrices, which extends the ESN literature as well. Finally, we gain added expressiveness from the ESNs by considering an ensemble of ESN reservoirs, which we accommodate through model averaging. This is also new to the ESN literature. We demonstrate our methodology on a simulated process in which we assume we do not know all of the local CA rules, as well as a fire evolution data set, and data describing the spread of raccoon rabies in Connecticut, USA.
... Numerous techniques have been developed to eliminate autocorrelation and improve statistical inferences by incorporating spatial terms that represent neighborhood locational effects [13-17]. Autologistic terms, which account for the effects of having neighboring grid cells populated with the focal species using neighborhood weights matrices, are the most prevalent method of incorporating spatial considerations into an occupancy modeling framework [9,18,19]. ...
Article
Loss of habitat and human disturbance are major factors in the worldwide decline of shorebird populations, including that of the threatened migratory piping plover (Charadrius melodus). From 2013 to 2018, we conducted land-based surveys of the shorebird community every other week during the peak piping plover season (September to March). We assessed the ability of a thin plate spline occupancy model to identify hotspot locations on Whiskey Island, Louisiana, for the piping plover and four additional shorebird species (Wilson’s plover (Charadrius wilsonia), snowy plover (Charadrius nivosus), American oystercatcher (Haematopus palliatus), and red knot (Calidris canutus)). By fitting single-species occupancy models with geographic thin plate spline parameters, hotspot priority regions for conserving piping plovers and the multispecies shorebird assemblage were identified on the island. The occupancy environmental covariate, distance to the coastline, was weakly fitting, where the spatially explicit models were heavily dependent on the spatial spline parameter for distribution estimation. Additionally, the detectability parameters for Julian date and tide stage affected model estimations, resulting in seemingly inflated estimates compared to assuming perfect detection. The models predicted species distributions, biodiversity, high-use habitats for conservation, and multispecies conservation areas using a thin-plate spline for spatially explicit estimation without significant landscape variables, demonstrating the applicability of this modeling approach for defining areas on a landscape that are more heavily used by a species or multiple species.
... Wang et al., 2019). In this sense, the spatial regression model can address the dependence factor and is a suitable method to deal with spatial interaction -spatial autocorrelation -and spatial structure -spatial homogeneity (Anselin et al., 1996;Anselin & Rey, 2010;Besag, 1972). Research from the macro spatial perspective concluded that spatial methods outperform non-spatial methods based on explicitly considering the spatial correlation of road crash data (Guo et al., 2018;Quddus, 2008;Xu & Huang, 2015). ...
Article
The article aims to investigate the influence of risk exposure factors on the frequency of road crashes from January to August 2020 in Ciudad Juarez, Mexico. It is a longitudinal study with four data sets: road crashes, population and housing census, location of economic activities, and road network information. Specifically, this study investigates the relationship between exposure factors – demographics, main roads and land use – and road crashes. A mixed method analysis was employed, (1) spatial analysis using GIS techniques; and (2) a negative binomial spatial regression model. The results showed a strong spatial dependence (0.274; p-value 0.00) of road crashes in the census tracts, and this effect was statistically significant (0.007) in the spatial regression model. In the model, a high probability (<0.05) of road crashes in the census tracts was found with the population aged 15 to 65 years, the length of main roads and the level of road coverage (Engel index), land uses with economic activities of an industrial and commercial character. The findings of this study successfully capture the social, economic, and urban conditions during the January–August 2020 period in the context of the COVID-19 pandemic. This new knowledge could help create preventive plans and policies to address the frequency of road crashes.
... For a review of the history of Geo-Statistics and Kriging methods see Armstrong (1998); Chilès and Delfiner (2012); Cressie (2015); Isaaks and Srivastava (1989); Wackernagel (2013). A major step in the development of Geo-Statistics is the work of Besag (1972, 1974), where Markov random fields are used to characterize network-based models (called auto-models, auto-regressive models or graphical models) specifically designed to describe spatially auto-correlated data based on neighborhood relationships; see Whittle (1954, 1963) and Bartlett (2013). ...
Article
We introduce a model in the context of ecology that can be used to describe the distribution and abundance of individuals when data from field work is extremely limited (for example, in the case of endangered species). Our procedure is based on an intuitive understanding of the physical properties of phenomena. The idea is that individuals have the tendency to be attracted (or repulsed) to certain properties of the environment. At the same time, they are spread in such a way that if there is no reason for them to be in some specific locations, then they are uniformly distributed throughout the region. Our model draws from quantum mechanics, by using quantum Hamiltonians in the context of classical statistical mechanics. The equilibrium between the spreading and the attractive (or repulsive) forces determines the behavior of the species that we model, and this is expressed in terms of a global control problem of an energy operator which is the sum of a kinetic term (spreading) and a potential (attraction or repulsion). We focus on the full probability measure and a global control of the model (instead of looking at conditional measures that generate a global measure). Furthermore, we propose a numerical solution to this global control problem that overcomes the well-known major difficulty of Gibbs sampling (annealing) which is the fact that a global control is hardly reachable when the number of variables is large (the algorithms get stuck in non-optimal states).
... In the homogeneous MVL model, each coefficient is constant across households. Extending the expression for the homogeneous MVL model without independent variables (also known as the auto-logistic model) given in Besag (1972), we define the probability of market basket y_mt conditional on independent variables x_mt as follows: ...
Article
Full-text available
We investigate the relevance of dynamic variables that reflect the purchase history of a household as independent variables in multicategory choice models. To this end, we estimate both homogeneous and finite mixture variants of the multivariate logit model. We consider two types of dynamic variables. Variables of the first type, which previous publications on multicategory choice models have ignored, are exponentially smoothed category purchases, which we simply call category loyalties. Variables of the second type are log-transformed times since the last purchase of any category. Our results clearly show that adding dynamic variables improves statistical model performance with category loyalties being more important than log-transformed times. The majority of coefficients of marketing variables (features, displays, and price reductions), pairwise category interactions, and cross-category relations differ between models either including or excluding dynamic variables. We also measure the effect of marketing variables on purchase probabilities of the same category (own effects) and on purchase probabilities of other categories (cross effects). This exercise demonstrates that the model without dynamic variables tends to overestimate own effects of marketing variables in many product categories. This positive omitted variable bias provides another explanation for the well-known problem of “overpromotion” in retailing.
... We fit two centered autologistic regression models to examine the association between each of the six vulnerability measures and the binary dependent variable measuring whether a county experienced a severe outage, on both the relative and absolute scales [34]. Centered autologistic regression is a variation of logistic regression that adjusts for spatial autocorrelation (i.e., that neighboring counties are more alike than distant counties) through the inclusion of a centered, spatially lagged y term (the average value of the neighboring counties), known as the autocovariate [35]. Regression diagnostics showed that the usage of the centered autologistic model with basic binary weighting removed much of the spatial dependency observed in the residuals of the non-spatial logistic regression model (Supplementary Tables 2 and 3). ...
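The spatially lagged term described in this excerpt can be illustrated with a minimal sketch. It assumes binary neighbour weights and, purely for illustration, centres the neighbour average by subtracting the overall mean of y; the cited study may centre differently. The unit labels, the adjacency dictionary and both function names are hypothetical.

```python
def neighbor_mean(y, neighbors):
    """Average of y over each unit's neighbours (binary weights)."""
    out = {}
    for unit, nbrs in neighbors.items():
        # Units without neighbours get 0.0 by convention (an assumption).
        out[unit] = sum(y[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
    return out


def centered_autocovariate(y, neighbors):
    """Neighbour average minus the overall mean of y (illustrative centring)."""
    overall = sum(y.values()) / len(y)
    return {u: m - overall for u, m in neighbor_mean(y, neighbors).items()}


# Hypothetical toy data: four units on a path A-B, A-C, C-D.
y = {"A": 1, "B": 0, "C": 1, "D": 0}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}

lagged = neighbor_mean(y, neighbors)
centered = centered_autocovariate(y, neighbors)
```

The resulting value per unit would then enter an ordinary logistic regression as one additional covariate.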
Article
Full-text available
Background Precipitated by an unusual winter storm, the 2021 Texas Power Crisis lasted February 10 to 27 leaving millions of customers without power. Such large-scale outages can have severe health consequences, especially among vulnerable subpopulations such as those reliant on electricity to power medical equipment, but limited studies have evaluated sociodemographic disparities associated with outages. Objective To characterize the 2021 Texas Power Crisis in relation to distribution, duration, preparedness, and issues of environmental justice. Methods We used hourly Texas-wide county-level power outage data to estimate geographic clustering and association between outage exposure (distribution and duration) and six measures of racial, social, political, and/or medical vulnerability: Black and Hispanic populations, the Centers for Disease Control and Prevention (CDC) Social Vulnerability Index (SVI), Medicare electricity-dependent durable medical equipment (DME) usage, nursing homes, and hospitals. To examine individual-level experience and preparedness, we used a preexisting and non-representative internet survey. Results At the peak of the Texas Power Crisis, nearly 1/3 of customers statewide (N = 4,011,776 households/businesses) lost power. We identified multiple counties that faced a dual burden of racial/social/medical vulnerability and power outage exposure, after accounting for multiple comparisons. County-level spatial analyses indicated that counties where more Hispanic residents resided tended to endure more severe outages (OR = 1.16, 95% CI: 1.02, 1.40). We did not observe socioeconomic or medical disparities. With individual-level survey data among 1038 respondents, we found that Black respondents were more likely to report outages lasting 24+ hours and that younger individuals and those with lower educational attainment were less likely to be prepared for outages. 
Significance Power outages can be deadly, and medically vulnerable, socioeconomically vulnerable, and marginalized groups may be disproportionately impacted or less prepared. Climate and energy policy must equitably address power outages, future grid improvements, and disaster preparedness and management.
... Models with unknown normalizing constants arise frequently in many different areas. Examples include Ising models [17] in statistical physics, autologistic models [6] [5] in spatial statistics, exponential random graph models [33] in sociology, disease transmission models [29] in epidemiology, and so on. The corresponding statistical inference problem can be formulated as follows. ...
... The situation is similar to the widely studied problem of inference based on the observation of a single instance of a Markov random field, and in many cases reduces to this if we only observe the system at one time point at stationarity. These are used, for instance, in spatial statistics (Besag, 1972; Gelfand et al., 2010) and image reconstruction (Besag, 1986; Geman and Geman, 1984). The conditioning method we consider here is similar to the "coding" scheme introduced by Besag (1974), which conditions on a set of sites that makes the remaining observations independent; consistency of such methods has been shown by Comets (1992) and reviewed by Larribe and Fearnhead (2011). ...
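As a rough illustration of the coding idea mentioned in this excerpt, the sketch below splits a rectangular lattice into two checkerboard sets. It assumes a first-order dependence structure (four nearest neighbours), under which no two sites in the same set are neighbours, so the sites in one set are conditionally independent given the other set. The names `coding_sets` and `are_neighbours` are hypothetical and not taken from the cited works.

```python
def coding_sets(n_rows, n_cols):
    """Split lattice sites into two checkerboard sets by parity of row + col."""
    even, odd = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            (even if (r + c) % 2 == 0 else odd).append((r, c))
    return even, odd


def are_neighbours(a, b):
    """True if two sites are first-order (up/down/left/right) neighbours."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


even_sites, odd_sites = coding_sets(3, 4)

# Check that no two sites in the same coding set are first-order neighbours.
same_set_adjacent = any(
    are_neighbours(a, b) for a in even_sites for b in even_sites
)
```

Under this assumption, a conditional likelihood can be built from the even sites given the odd sites, and again with the roles swapped.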
Article
Although the rates at which positions in the genome mutate are known to depend not only on the nucleotide to be mutated, but also on neighboring nucleotides, it remains challenging to do phylogenetic inference using models of context-dependent mutation. In these models, the effects of one mutation may in principle propagate to faraway locations, making it difficult to compute exact likelihoods. This article shows how to use bounds on the propagation of dependency to compute likelihoods of mutation of a given segment of genome by marginalizing over sufficiently long flanking sequence. This can be used for maximum likelihood or Bayesian inference. Protocols examining residuals and iterative model refinement are also discussed. Tools for efficiently working with these models are provided in an R package, which could be used in other applications. The method is used to examine context dependence of mutations since the common ancestor of humans and chimpanzee.
... Notice that in spite of this clear difference, some existing studies, for example, Lin and Clayton (2005, Assumptions 2.1 and 2.2, Section 2) have suggested a linear auto-regressive order 1 (AR(1)) type longitudinal correlation model for the analysis of spatial binary data. Following Besag (1972a, b, 1974), some other studies (e.g., Rathbun and Cressie 1994) have suggested a binary dynamic logistic model to analyze such spatial binary data. We briefly explain this dynamic (longitudinal) logit model for spatial multinomial data, as follows. ...
Article
There exist many studies on regression analysis for spatial binary data, especially in ecological, environmental and socio-economic setups, where spatial responses from neighboring locations within a given threshold distance are correlated. However, in some of these studies, it could be more natural to consider a spatial regression analysis for categorical response data with more than two categories, as an improvement over the spatial binary analysis. But, this type of regression analysis for spatial categorical/multinomial data is not adequately addressed in the literature. One of the main reasons is the difficulty of modeling the spatial familial correlations for categorical data, where a spatial family is generated within the threshold distance for each of the two selected neighboring locations. Also, some of the locations from two families may be pair-wise correlated. Unlike the existing studies, in this paper we propose a familial random effects based multinomial logits mixed (MLM) effects model which accommodates both within and between familial correlations for spatial multinomial data. In this context, the proposed spatial multinomial correlations are contrasted with existing longitudinal multinomial correlations so that the longitudinal correlation models are avoided for spatial multinomial data. Both regression effects and the random effects influence parameters are estimated using the generalized quasi-likelihood approach, whereas the random effects variance and correlation parameters are estimated by the well known method of moments. The large sample properties such as consistency of the proposed estimators are studied analytically. The asymptotic normality of the regression estimators is also studied for the convenience of constructing the confidence intervals when needed. The derivations and proofs are given in detail, as opposed to conducting a limited simulation study, to justify the validity and convergence properties of the proposed estimators. 
The estimating equations that produce consistent estimates are clearly formulated for the computational benefit of practitioners.
... Autologistic models of plant disease spread integrate binary response variables, including disease presence or absence. These models were implicated in a pioneering work on the development of statistical theory (Besag 1972). The autologistic model has been applied to establish the incidence of footrot disease in endive (Besag 1977). ...
Article
Full-text available
Rice yellow mottle virus (RYMV) causes severe rice ( Oryza sativa L.) yield loss. It has been endemic to sub-Saharan Africa and Madagascar since 1966. Transmission (plant community level) and long-dispersal (regional and continental scale) models have been established but viral spread in farming communities continues, while the conditions causing local disease outbreaks remain unclear. We hypothesized that local outbreaks, comprising inter-plot virus spread and intra-plot disease aggravation, are significantly associated with individual farmers’ attributes and agronomic practices. To test this hypothesis, spatial autoregressive models were constructed using variables collected by visual observation and farmer interviews. Field surveys were conducted during four consecutive cropping seasons from 2011 to 2013 in the Lower Moshi Irrigation Scheme of Kilimanjaro, Tanzania. Our models detected spatial dependence in inter-plot virus spread, but not in intra-plot disease aggravation. The probability of inter-plot virus spread increased with use of the IR64 cultivar (26.9%), but decreased with straw removal (27.8%) and crop rotation (47.7%). The probability of intra-plot disease aggravation decreased with herbicide application (24.3%) and crop rotation (35.4%). A simple cost-benefit analysis suggested that inter-plot virus spread should be mitigated by cultivar replacement and straw removal. When disease severity is critical, intra-plot disease aggravation should be inhibited by herbicide application, and rice should be rotated with other crops. This is the first study to upscale the spatial autoregressive model from the experimental field level to the farming community level, by obtaining variables through easy-to-implement techniques such as visual observation and farmer interview. Our models successfully identified candidate agronomic practices for the control of RYMV. 
However, as the causal relationships between agronomic practices and RYMV outbreaks remain unknown, field trials are needed to develop robust control measures.
... Acting like an additional covariate in regression models, ESF has the advantage of accounting for SA without altering the likelihood functions or regression equation description forms of geographic variables, and hence reducing methodological as well as computational complexities, particularly in the cases of non-Gaussian variables such as a binomial and Poisson (Besag 1972; Griffith 2002, 2004, 2010; Kaiser & Cressie 1997). Because an ESF can capture local as well as regional and global SA structures, different levels of associations between the geographic variable and the covariates can be estimated by adding selected interaction terms between chosen eigenvectors from the Moran Spectrum and the covariates, providing an alternative to Geographically Weighted Regression (GWR) (Fotheringham, Brunsdon, and Charlton 2002; Gelfand et al. 2003; Griffith 2008; Griffith, Chun, and Bin 2019; Waller et al. 2007), effectively addressing the issue of spatial heterogeneity. ...
Article
Full-text available
Geoinformatic Tupu, or Geoinformatic graph spectrum, is a theoretical as well as a technical framework for generalizing geographic knowledge and solving real world problems. Geoinformatic Tupu is a promising platform for capitalizing on the technical advances of Geographic Information Systems, and to integrate the Chinese traditional way of thinking with modern information technology. It has been one of the major research topics in the Chinese GIScience community in recent decades, with an evolving epistemological development. A core objective of Geoinformatic Tupu is to recover and represent geographic principles with the Tupu approach, which is adopted in this paper to formulate the First Law of Geography (FLG) [i.e. the law of spatial autocorrelation] as the Moran Spectrum – a combination of sequential diagrams, graphs, and numeric components. Using the Moran Spectrum as a conduit, we present the theory of Moran Eigenvector Spatial Filtering (MESF), a distinct branch of spatial statistics that has demonstrable advantages in statistical modelling and machine learning, but has yet to be widely disseminated due to its conceptual and computational complexity. This paper demonstrates the effectiveness of the Tupu approach in enriching the representation of the FLG as well as deepening its applications. It also suggests inclusion of the Moran Spectrum as a core component in Geoinformatic Tupu.
... The situation is similar to the widely-studied problem of inference based on observation of a single instance of a Markov random field, and in many cases reduces to this if we only observe the system at one time point at stationarity. These are used, for instance, in spatial statistics [Besag, 1972, Gelfand et al., 2010] and ... Figure 1: An overview of the method for a neighbor-dependent model, showing an initial sequence on top, a collection of mutations progressing through time, and a final sequence. The pink trapezoidal region represents the "range of influence", i.e., possible locations and times at which a mutation is likely to affect the mutation outcome at the sites of interest (the three sites in the center of the final sequence). ...
Preprint
Full-text available
Although the rates at which positions in the genome mutate are known to depend not only on the nucleotide to be mutated, but also on neighboring nucleotides, it remains challenging to do phylogenetic inference using models of context-dependent mutation. In these models, the effects of one mutation may in principle propagate to faraway locations, making it difficult to compute exact likelihoods. This paper shows how to use bounds on the propagation of dependency to compute likelihoods of mutation of a given segment of genome by marginalizing over sufficiently long flanking sequence. This can be used for maximum likelihood or Bayesian inference. Protocols examining residuals and iterative model refinement are also discussed. Tools for efficiently working with these models are provided in an R package, that could be used in other applications. The method is used to examine context dependence of mutations since the common ancestor of humans and chimpanzee.
... Vegetation indices such as NDVI and its change trend can well reflect the change trends of vegetation [60], which can provide a reference for simulation and prediction. The spatial autocorrelation factor was proposed to address the spatial autocorrelation effect inherent to spatial statistical analysis [61]. Traditional logistic regression methods can significantly improve the simulation accuracy along with it [57], but machine learning methods have not yet considered it. ...
Article
Full-text available
Mangrove forests are important woody plant communities that grow in the intertidal zone between land and sea. They provide important social, ecological and economic services to coastal areas. In recent years, the growth environment of mangrove forests has been threatened. Mangrove forests have become one of the most endangered ecosystems in the world. To better protect mangrove forests, effective monitoring methods are essential. In this study, a spatio-temporal simulation method for mangrove forests was proposed in the mangrove protected areas of Hainan Island, China. This method compared the simulation accuracy of different models in terms of spatial characteristics, evaluated the applicability of driving factors in mangrove simulation and predicted the future spatio-temporal distribution and change trends of mangrove forests under different scenarios. The simulation results of different models showed that AutoRF (random forest with spatial autocorrelation) performs best in spatial characteristic simulation. Driving factors such as the Enhanced Vegetation Index (EVI), various location indices and the spatial autocorrelation factor can significantly improve the accuracy of mangrove simulations. The prediction results for Hainan Island showed that the mangrove area increased slowly under a natural growth scenario (NGS), decreased significantly under an economic development scenario (EDS) and increased significantly under a mangrove protection scenario (MPS) with 4460, 2704 and 5456 ha respectively by 2037. The contraction of mangrove forests is closely related to the expansion of aquaculture ponds, building land and cultivated land. Mangrove contraction is more severe in marginal or fragmented areas. The expansion of mangrove forests is due to the contraction of aquaculture ponds, cultivated land and other forests. The areas around existing mangrove forests and on both sides of the riverbank are typical areas prone to mangrove expansion. 
The MPS should be the most suitable development direction for the future, as it can reasonably balance economic development with mangrove protection.
... We may assume that it is not necessary to consider the potentials on large cliques to correctly model the spatial dependency. Then, in the autologistic model introduced by Besag (1972) for binary data, we only consider the cliques of one and two nodes, i.e. potentials at one location, and interactions between two locations, and the potential functions associated with larger cliques are set to zero. [Footnote 23: i.e. such that C is the set of cliques of G] ...
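With only one-node and two-node potentials, the conditional distribution at each site takes a logistic form in the sum of its neighbours, which can be sketched as follows. The sketch assumes a homogeneous lattice with four nearest neighbours, free boundaries, and a single interaction parameter shared by both directions; the parameter names `alpha` and `beta`, the function names, and the grid size are all illustrative choices rather than anything specified in the excerpt.

```python
import math
import random


def conditional_prob(alpha, beta, neighbour_values):
    """P(x_i = 1 | neighbours) with singleton term alpha and pairwise term beta."""
    eta = alpha + beta * sum(neighbour_values)
    return 1.0 / (1.0 + math.exp(-eta))


def gibbs_sweep(grid, alpha, beta, rng):
    """Update every site once from its conditional distribution (in place)."""
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            # Collect the values of the four nearest neighbours inside the grid.
            nbrs = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
            p_one = conditional_prob(alpha, beta, nbrs)
            grid[r][c] = 1 if rng.random() < p_one else 0
    return grid


rng = random.Random(0)  # fixed seed so the run is repeatable
grid = [[rng.randint(0, 1) for _ in range(5)] for _ in range(5)]
for _ in range(10):
    gibbs_sweep(grid, alpha=-1.0, beta=0.5, rng=rng)
```

A positive `beta` makes a site more likely to equal 1 when more of its neighbours equal 1, which is the sense in which the neighbouring variables act as explanatory variables.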
Thesis
This thesis deals with maximum likelihood estimation in dynamic and spatial extensions of the stochastic block model (SBM), based respectively on hidden Markov chains and fields. First, we consider a dynamic version of the stochastic block model, suited for the observation of networks at multiple time steps. In this dynamic SBM, the nodes are partitioned into latent classes and the connection between two nodes is drawn from a Bernoulli distribution depending on the classes of these two nodes. The temporal evolution of the nodes memberships is modeled by a hidden Markov chain. We prove the consistency (as the numbers of nodes and time steps increase) of the maximum likelihood and variational estimators of the parameters and obtain upper bounds on their rates of convergence. We also explore the case where the number of time steps is fixed and the connectivity parameters are allowed to vary. Besides, we obtain some results regarding parameter identifiability. Second, we introduce a spatial version of the SBM, suited for the observation of networks at different spatial locations. As before, the nodes are partitioned into latent classes and the connection is drawn from a Bernoulli distribution depending on the classes of these two nodes. The spatial evolution of the nodes memberships is modeled through hidden Markov random fields. We first prove that the parameter is generically identifiable under certain conditions. For the estimation of the parameters, we propose an algorithm based on the simulated field Expectation-Maximisation (EM) algorithm, a variation of the EM algorithm relying on a mean field like approximation based on the simulation of latent configurations.
... This additional approach was intended to be confirmatory of the regressions, free of a distance-decay function. Methods are outlined and discussed in Besag (1972), Cliff and Ord (1981), Augustin et al. (1996), Dormann (2007), Zuur et al. (2009), and Evans (2018). ...
Article
Full-text available
Liana dynamics may influence tree dynamics and vice versa. Only long‐term studies can perhaps disentangle them. In two permanent plots of lowland dipterocarp forest at Danum, a liana census in 1988 was repeated in 2018. The primary forest was still in a late stage of recovery from an inferred large and natural disturbance in the past. Mean number of lianas per tree decreased by 22% and 34% in plots 1 and 2, and in different ways. By 2018, there were relatively more trees with few lianas and relatively fewer trees with many lianas than in 1988. The redistribution was strongest for overstory trees of the Dipterocarpaceae (more with no lianas by 2018) and understory trees of the Euphorbiaceae (many losing high loads in especially plot 2). Proportion of trees with lianas increased overall by 3.5%. The number of lianas per tree showed a quadratic relationship with tree size: maximal for large trees, and fewer for smaller and very large trees. Tree survival and stem growth rate were significantly negatively related to the number of lianas after accounting for spatial autocorrelation. Monte Carlo random subsampling of trees in 1988 and 2018, to achieve statistical independence, established significance of change. Dipterocarps and euphorbs clearly differed in their liana dynamics between plots. Regression models had different forms for the two plots, which reflected a complicated structural–spatial variability in host–liana dynamics. Analysis of the abundant tree species individually highlighted a group of emergent dipterocarps with low liana counts decreasing with time. Building on an earlier hypothesis for this forest type and site, these very large trees appear to have been losing their lianas by branch shedding, as they moved into and out of the main canopy. They were evidently escaping from the parasite. The process may in part explain the characteristically very uneven forest canopy at Danum. 
Change in liana density was therefore contingent on both forest history and site succession, and plot‐level structure and tree dynamics. Liana promotion in the intermittent ENSO dry periods was seemingly being offset by closing of the forest and dominance by dipterocarps in late seral stages.
... There is also a long history on spatial binary data analysis. See, for example, Besag [4,5] and Besag [6], for some early studies. For some recent studies over the past three decades, see, for example, Rathbun and Cressie [14], Heagerty and Lele [9], Lin and Clayton [11], and Ainsworth et al. [1]. ...
Article
There is a long history of spatial regression analysis where it is important to accommodate the spatial correlations among the responses from neighboring locations for any valid inferences. Among numerous modeling approaches, the so-called spatial auto-regression (SAR) model in a linear setup, and the conditional auto-regression (CAR) model in a binary setup, are widely used. For spatial binary analysis, there exist two other competitive approaches, namely the bivariate probit models (BPM) based composite likelihood approach using local lattices; and a ‘Working’ correlations based QL (quasi-likelihood) (WCQL) approach. These correlation models, however, fail to accommodate both within and between correlations among spatial families, where a spatial family is naturally formed within a threshold distance of a selected location, and the member locations between two neighboring families may also be correlated. In this paper, we exploit these two-way, within and between, correlations among spatial families and develop a unified correlation model for all exponential family based data, such as linear, count or binary data. We further exploit the proposed correlation structure based generalized quasi-likelihood (GQL) and method of moments (MM) approaches for model parameter estimation. As far as the estimation properties are concerned, because in practice one encounters a large spatial sample, we make sure that the proposed GQL and MM estimators are consistent.
... Another important difference is that we do not require the binary indicators I(β_lk = 0) for 0 ≤ k < 2^l and l ≤ L_max to be iid Bernoulli random variables. This extension allows us to consider, for example, Ising prior constructions [7] which allow the inclusion indicators to be related through a Markovian model. ...
Preprint
Many real-life applications involve estimation of curves that exhibit complicated shapes including jumps or varying-frequency oscillations. Practical methods have been devised that can adapt to a locally varying complexity of an unknown function (e.g. variable-knot splines, sparse wavelet reconstructions, kernel methods or trees/forests). However, the overwhelming majority of existing asymptotic minimaxity theory is predicated on homogeneous smoothness assumptions. Focusing on locally Holderian functions, we provide new locally adaptive posterior concentration rate results under the supremum loss for widely used Bayesian machine learning techniques in white noise and non-parametric regression. In particular, we show that popular spike-and-slab priors and Bayesian CART are uniformly locally adaptive. In addition, we propose a new class of repulsive partitioning priors which relate to variable knot splines and which are exact-rate adaptive. For uncertainty quantification, we construct locally adaptive confidence bands whose width depends on the local smoothness and which achieve uniform asymptotic coverage under local self-similarity. To illustrate that spatial adaptation is not at all automatic, we provide lower-bound results showing that popular hierarchical Gaussian process priors fall short of spatial adaptation.
... We borrow this formalism from Markov Random Fields (MRFs) [32] (champs de Markov in French). MRFs provide a mathematical formalism used notably in image processing [63,106] and can more generally be applied to the description of spatial processes [7,8,29]. ...
Thesis
Full-text available
Decentralized collaborative applications address the confidentiality, availability and security problems inherent in centralized collaborative platforms. They rely on a peer-to-peer communication paradigm in which all users are directly connected to one another. As collaborations tend to grow and extend beyond organizational boundaries, it is necessary to guarantee users control over their data while ensuring the availability of the collaboration. To do so, the social network woven among the collaborators can be used as the topology. The lack of information about this web of trust leads us to develop an approach for studying its morphological properties. In this thesis, we develop and implement an approach for studying the social structure of interactions in inter-organizational collaborations. We propose a stochastic approach inspired by Exponential Random Graph Models (ERGM) and spatial models. We define a formalism that highlights the structure of interactions and incorporates the organizational dimension. We propose using a Bayesian inference method, ABC Shadow, to circumvent the difficulties of estimating this model. This approach is applied to a real example: the collaborations initiated by the researchers of a laboratory. In particular, it shows the low propensity of a researcher to form ties with other laboratories. We show that this approach can be applied to other types of social interactions, such as interactions among children in a primary school. Finally, we present a parallelization strategy for the Gibbs sampler aimed at handling larger graphs in a reasonable time.
... Examples of outcomes that have been typically treated as individual, but that extant research has found to be contagious include academic performance, ... [Footnote 1: The ALAAM is "Autologistic" because it has the form of a logistic regression with a binary outcome variable (presence/absence of an attribute) that is predicted using an exponential function taking as argument combinations of the attribute itself with other covariates. The term "autologistic" may also be linked to the historical fact that ALAAMs, and the ERGMs from which they derive, share a common origin in the autologistic Ising model for Markov random fields (Besag, 1972).] [Footnote 2: Ideally, the data for ALAAMs should be collected at two points in time, with the network data collected prior to the social contagion outcome.] ...
Article
Full-text available
Autologistic Actor Attribute Models (ALAAMs) provide new analytical opportunities to advance research on how individual attitudes, cognitions, behaviors, and outcomes diffuse through networks of social relations in which individuals in organizations are embedded. ALAAMs add to available statistical models of social contagion the possibility of formulating and testing competing hypotheses about the specific mechanisms that shape patterns of adoption/diffusion. The main objective of this paper is to provide an introduction and a guide to the specification, estimation, interpretation and evaluation of ALAAMs. Using original data, we demonstrate the value of ALAAMs in an analysis of academic performance and social networks in a class of graduate management students. We find evidence that both high and low performance are contagious, i.e., diffuse through social contact. However, the contagion mechanisms that contribute to the diffusion of high performance and low performance differ subtly and systematically. Our results help us identify new questions that ALAAMs allow us to ask, new answers they may be able to provide, and the constraints that need to be relaxed to facilitate their more general adoption in organizational research. Forthcoming in: Organizational Research Methods. Keywords: autologistic actor attribute model (ALAAM), individual performance, diffusion, exponential-family random graph models (ERGMs), social contagion, social influence, social networks, statistical models. We are grateful to Garry Robins for his insightful comments offered on an earlier draft, and to Jüergen Lerner for his expert help and advice on the figures.
... The Potts and Ising (J.E. Besag, 1972) models are equivalent when the hidden stochastic process is binary valued. ...
Thesis
The work presented in this thesis deals with medical image segmentation in the field of vascular surgery, where like in most of the fields of medical research, the images deeply transform the medical knowledge and the treatments. The spread of endovascular surgical procedures and biomaterial implantations is permitted by the recent progress in image processing. However, many issues (corrupted and scarce images, huge data to process, ...) are still barriers to the development of safer and patient-specific surgical treatments. We address these issues via the study of new probabilistic models, notably via hidden Markov models. The new models are thus motivated by real issues. However, they offer significant methodological contributions for statistical image processing and have been developed in a rigorous theoretical context. In particular, new models of hidden Markov chain and pairwise hidden Markov field are proposed, in order to model highly correlated noises. A triplet Markov tree model is developed to enhance the spatial correlations in the classical Markov tree model, based on properties of Markov fields. Inference and parameter estimation within the new models are specifically treated in our studies. The models are first tested on synthetic data and compared to other alternative approaches from the literature. Eventually, the models are applied to real images from vascular surgery available at the Geprovas laboratory.
... Because similar types of construction land tend to be distributed in the same areas, the expansion of construction land is likely to be affected by the surrounding land; therefore, it is necessary to eliminate the spatial autocorrelation effect. To correct the spatial autocorrelation problem of the logistic regression model, an autologistic model was proposed in 1972 [79]. Based on the original model, the autologistic model adds spatial correlation variables, which greatly improves the accuracy of probability prediction and the versatility of the model. ...
Article
Studying the factors that influence the expansion of different types of construction land is instrumental in formulating targeted policies and regulations, and can reduce or prevent the negative impacts of unreasonable land use changes. Using land use survey data for Beijing (2001 and 2010), this study applies an autologistic model to quantify the leading driving forces and differences in four types of construction land expansion (industrial, residential, public service, and commercial land types), focusing on the impact of spatial autocorrelation. The results showed that the influencing factors vary greatly for different types of construction land expansion; the same factor may have a different impact on different construction land, and both planning factors and spatial autocorrelation variables have a significant positive effect on the four types. Accordingly, the municipal government should consider the differences in the expansion mechanisms and driving forces of different construction land and formulate suitable planning schemes, observe the impact of spatial autocorrelation on construction land expansion, and guide spatial agglomeration through policies while appropriately controlling the scale of expansion. The methods and policy recommendations of this research are significant for urban land expansion research and policy formulations in other transition economies and developing countries.
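As a rough illustration of the autologistic idea referred to in this entry, the conditional probability that a lattice site takes the value 1 can be sketched as a logistic function of the neighbouring values. The function name `autologistic_prob`, the parameters `alpha` and `beta`, the single isotropic interaction coefficient, and the free boundary treatment are all assumptions made for this sketch; they are not taken from any of the papers listed here.

```python
import math

def autologistic_prob(grid, i, j, alpha, beta):
    """Conditional P(x_ij = 1 | four nearest neighbours) on a 0/1 lattice.

    Assumed form: logit(p) = alpha + beta * (sum of the four nearest neighbours).
    Sites outside the grid are ignored (free boundary, an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    neighbour_sum = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            neighbour_sum += grid[ni][nj]
    eta = alpha + beta * neighbour_sum  # linear predictor built from neighbours
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical 3x3 binary lattice used only to exercise the function.
grid = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
p_centre = autologistic_prob(grid, 1, 1, alpha=-2.0, beta=0.5)
```

With `beta = 0` the sketch reduces to an ordinary intercept-only logistic model, which is one way to see that the neighbour term is what carries the spatial dependence.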
... we construct a first-order spatial lag structure over the regional entities defined through a contiguity-based spatial weighting matrix which we use to set up a conditional autoregressive specification of the spatial effect (Besag 1972; Besag 1974). Further, to allow for potentially unobserved spatial heterogeneity, we additionally included a spatial random effect assuming an iid Gaussian distribution. ...
Article
Attempts to constrain the spread of Covid-19 included the temporary reintroduction of travel restrictions and border controls within the Schengen area. While such restrictions clearly involve costs, their benefits have been disputed. We use a new set of daily regional data of confirmed Covid-19 cases from the respective statistical agencies of 18 Western European countries. Our data start with calendar week 10 (starting 2nd March 2020) and extend to calendar week 17 (ending 26th April 2020), which allows us to test for treatment effects of border controls. Based on PPML methods and a Bayesian INLA approach, we find that border controls had a significant effect in limiting the pandemic.
... However, exploring the dependency of caries experiences among children's teeth must be undertaken via an adequate modeling strategy (11)(12). The teeth in childhood emerge successively and are linked through supporting tissues in the two jaws, such that the teeth positions in children's mouths have a spatial structure (13). Hence, the caries (present, not present) statuses of the teeth are spatially correlated binary data, and investigating tooth caries associations via a spatial auto logistic model has been a reasonable idea (8)(9)(10)(11)(12)(13). Originally introduced by Besag, the basic auto logistic model was presented to evaluate the effects of neighboring responses on a binary response in a spatial lattice domain (14)(15). Besag's basic model has been extended to the auto logistic regression model and it has been widely used by many authors including Augustin et al. However, recently, Caragea and Kaiser have demonstrated that the traditional autologistic model fails to provide meaningful interpretations of parameters across varying levels of statistical dependence (23). To compensate for this deficiency and as a development of the basic model, they introduced the centered autologistic model (24). ...
Article
Background. Dental caries has affected 60-90% of children worldwide. We developed the centered auto logistic regression model to assess the effects of both socio-demographic factors and contacting teeth on tooth caries experiences in primary school children. Methods. In a cross-sectional study, socio-demographic characteristics and dental caries statuses in a random sample of school children aged 7-12 years from Yasuj city, Iran, were gathered through a questionnaire and a dental chart. The effects of the contacting teeth along with the socio-demographic factors on dental caries development in children were assessed by fitting the centered auto logistic regression model and computing odds ratios with their 95% confidence intervals. Results. The effects of age and residential place on dental caries presence in the children were significantly different from zero. As age increased, the odds of having a carious tooth decreased, and compared with urban children, rural children had higher odds of having tooth caries experiences. Furthermore, the three contacting teeth had statistically significant effects on the decay of a tooth. Conclusions. Using the centered auto logistic regression model, it was shown that the children's age and residential place were associated with dental caries experiences and that carious contacting teeth had strong effects on the decay of a tooth in children. In the mixed dentition stage, caries preventive health care programs must be centered on younger and rural children. In addition, in prophylactic and treatment dentistry, attention must be directed towards checking and protecting the three teeth in contact with a carious tooth. Journal of Critical Reviews, ISSN 2394-5125, Vol. 7, Issue 07, 2020.
Article
This paper introduces the basic concepts behind applying Bayesian approaches to image analysis. The Bayesian approach benefits image analysis and interpretation because it permits the use of prior knowledge about the situation under study. The paper investigates some well-known procedures for Bayesian image analysis (determining the number of labels in an image under conditions such as image noise and image resolution), with segmentation as a joint prior model, in order to obtain the maximum likelihood (ML) estimate, and a Markov random field with segmentation resulting from the posterior mean (MP). The paper contains several sections. The first introduces image analysis within the Bayesian framework, together with the statistical background on Markov random fields and their relationship to Markov chain Monte Carlo. The second addresses solutions based on the principle of segmentation, represented here by a simple form of thresholding. The third describes a segmentation experiment based on the histogram, and studies the model with one prior and the same model with segmentation added (joint prior), using the techniques presented in the previous section. The fourth covers the implementation of the second prior and a simulation using phantom data (a castle from South East Asia), along with the estimation steps. The final section presents the results of the experiment, the estimates, and a summary.
Article
Household financial distress is a complicated problem. Several social problems have been identified as potential risk factors. Conversely, financial distress has also been identified as a risk factor for some of those social problems. Graphical models can be used to better understand the co-dependencies between these problems. In this approach, problem variables are network nodes and the relations between them are represented by weighted edges. Linked administrative data on social service usage by 6,848 households from neighbourhoods with a high proportion of social housing were used to estimate a pairwise Markov random field with binary variables. The main challenges in graph estimation from data are (a) determining which nodes are directly connected by edges and (b) assigning weights to those edges. The eLasso method used in psychological networks addresses both these challenges. In the resulting graph financial distress occupies a central position that connects to both youth related problems as well as adult social problems. The graph approach contributes to a better theoretical understanding of financial distress and it offers valuable insights to social policy makers.
Article
Motivated by claims reserving in run-off triangles, a class of threshold autoregressive nearest-neighbour (TAR-NN) models extending a major class of parametric nonlinear time series models, namely threshold autoregressive (TAR) models, is introduced. The proposed class of models also introduces a flexible regime-switching mechanism to nearest-neighbour models. Attention is given to a sub-class of TAR-NN models, namely self-exciting threshold autoregressive nearest-neighbour models (SETAR-NN), for uses in claims reserving. The (strict) stationarity and geometric ergodicity of the SETAR-NN model, and more generally, a two-dimensional nonlinear autoregressive random field, are discussed. The conditional least-square (CLS) method is used to estimate the SETAR-NN model and some of its nested models. Simulation studies on the parameter estimates from the CLS method are conducted. Using real insurance claims data and stochastic simulations, the applications of the SETAR-NN model and the nested models for projecting future claims liabilities are discussed. Comparisons of those models with the Bootstrap-Chain-Ladder (BCL) model for claims reserving are provided.
Article
Collaboration graphs are relevant sources of information for understanding the behavioural tendencies of groups of individuals. The study of these graphs makes it possible to identify factors that may affect the efficiency and the sustainability of cooperative work. For example, such a collaboration involves researchers who develop relationships with their external counterparts to address scientific challenges. As relations and projects change over time, the evolution of social structures must be tackled. We propose a statistical approach that considers different structural collaboration patterns and captures the dynamics of the relational structures over the years. Our approach combines spatial process modelling and Exponential Random Graph Models used to analyse social processes. Since the normalising constant involved in classical Markov Chain Monte Carlo (MCMC) approaches is intractable, the inference remains challenging. To overcome this issue, we propose a Bayesian tool that relies on the recent ABC Shadow algorithm. The method is illustrated on real data sets from an open archive of scholarly documents. Through a simple formalism, our approach highlights the interactions between the different types of social relations at stake in the collaboration network.
Article
We apply the Ising model with nearest-neighbor correlations (INNC) in the problem of interpolation of spatially correlated data on regular grids. The correlations are captured by short-range interactions between “Ising spins”. The INNC algorithm can be used with label data (classification) as well as discrete and continuous real-valued data (regression). In the regression problem, INNC approximates continuous variables by means of a user-specified number of classes. INNC predicts the class identity at unmeasured points by using the Monte Carlo simulation conditioned on the observed data (partial sample). The algorithm locally respects the sample values and globally aims to minimize the deviation between an energy measure of the partial sample and that of the entire grid. INNC is non-parametric and, thus, is suitable for non-Gaussian data. The method is found to be very competitive with respect to interpolation accuracy and computational efficiency compared to some standard methods. Thus, this method provides a useful tool for filling gaps in gridded data such as satellite images.
Article
Predicting human immunodeficiency virus (HIV) epidemiology is vital for achieving public health milestones. Incorporating spatial dependence when data varies by region can often provide better prediction results, at the cost of computational efficiency. However, with the growing number of covariates available that capture the data variability, the benefit of a spatial model could be less crucial. We investigate this conjecture by considering both non-spatial and spatial models for county-level HIV prediction over the US. Due to many counties with zero HIV incidences, we utilize a two-part model, with one part estimating the probability of positive HIV rates and the other estimating HIV rates of counties not classified as zero. Based on our data, the compound of logistic regression and a generalized estimating equation outperforms the candidate models in making predictions. The results suggest that considering spatial correlation for our data is not necessarily advantageous when the purpose is making predictions.
Article
This study shows that residual variation can cause problems related to scaling in exponential random graph models (ERGM). Residual variation is likely to exist when there are unmeasured variables in a model (even those uncorrelated with other predictors) or when the logistic form of the model is inappropriate. As a consequence, coefficients cannot be interpreted as effect sizes or compared between models, and homophily coefficients, as well as other interaction coefficients, cannot be interpreted as substantive effects in most ERGM applications. We conduct a series of simulations considering the substantive impact of these issues, revealing that realistic levels of residual variation can have large consequences for ERGM inference. A flexible methodological framework is introduced to overcome these problems. Formal tests of mediation and moderation are also proposed. These methods are applied to revisit the relationship between selective mixing and triadic closure in a large AddHealth school friendship network. Extensions to other classes of statistical work models are discussed.
Article
The field of landscape genetics enables the study of infectious disease dynamics by connecting landscape features with evolutionary changes. Quantifying genetic correlation across space is helpful in providing insight into the rate of spread of an infectious disease. We investigate two genetic patterns in spatially referenced single-nucleotide polymorphisms (SNPs): isolation by distance and isolation by resistance. We model the data using a Generalized Linear Mixed effect Model (GLMM) with spatially referenced random effects and provide a novel approach for estimating parameters in spatial GLMMs. In this approach, we use the links between binary probit models and bivariate normal probabilities to directly compute the model-based covariance function for spatial binary data. Parameter estimation is based on minimizing the sum of squared distances between the elements of the sample covariance and model-based covariance matrices. We analyze data including Brucella abortus SNPs from spatially referenced hosts in the Greater Yellowstone Ecosystem.
Article
We analyze market baskets of individual households in two consumer durables categories (music, computer related products) by the multivariate logit (MVL) model, its finite mixture extension (FM-MVL) and the conditional restricted Boltzmann machine (CRBM). The CRBM attains a vastly better out-of-sample performance than MVL and FM-MVL models. Based on simulation-based likelihood ratio tests we prefer the CRBM to the FM-MVL model. To interpret hidden variables of conditional Boltzmann machines we look at their average probability differences between purchase and non-purchases of any sub-category across all baskets. To measure interdependences we compute cross effects between sub-categories for the best performing FM-MVL model and CRBM. In both product categories the CRBM indicates more or higher positive cross effects than the FM-MVL model. Finally, we suggest appropriate future research based on larger and more detailed data sets.
Preprint
Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. The major human pathogen Streptococcus pneumoniae represents the first bacterial organism for which sufficiently densely sampled population data became available for such an analysis. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. Genome data from over three thousand pneumococcal isolates identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. These results have the potential to identify both previously unsuspected protein-protein interactions and genes making independent contributions to the same phenotype. This approach greatly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for experimental work.
Author Summary
Epistatic interactions between polymorphisms in DNA are recognized as important drivers of evolution in numerous organisms. Study of epistasis in bacteria has been hampered by the lack of densely sampled population genomic data, suitable statistical models, and powerful inference algorithms for extremely high-dimensional parameter spaces. We introduce the first model-based method for genome-wide epistasis analysis and use the largest available bacterial population genome data set on Streptococcus pneumoniae (the pneumococcus) to demonstrate its potential for biological discovery. Our approach reveals interacting networks of resistance, virulence and core machinery genes in the pneumococcus, which highlights putative candidates for novel drug targets. Our method significantly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for experimental work.
Chapter
This article describes a class of log‐linear models for statistical analysis of social network data commonly referred to as exponential random graph (ERG) models. ERG models are appropriate when one wishes to model the ties among pairs of nodes in a graph as a collection of binary outcome variables. Based on a set of dependence assumptions for how the tie variables are associated, the ERG model can be expressed as an exponential family distribution in canonical form with graph statistics that take the form of local configurations. The ERG model is introduced in its basic form, and the current estimation strategies are presented. Extensions of ERG and current challenges are briefly discussed.
Article
Some theoretical results are noted for a two-dimensional conditional probability model discussed in Appendix II of my Presidential Address (1967), and are used to correct marginal frequencies calculated on an invalid symmetry assumption.
Article
The relation of statistical inference to the wider problem of all inductive inference is reviewed. For scientific inference in general the competing approaches are the hypothetical-deductive and the Bayesian, and the formalism of each is discussed in statistical contexts in terms of the two main concepts of probability—chance and degree of belief. Inference problems arising with stochastic processes and time-series are considered against this background, and the author’s own general attitude to statistical inference reiterated. Two appendices refer respectively to two specific technical problems (i) separating a discrete and a spectral density component, (ii) specification and inference for "nearest-neighbour" systems.
Article
A general class of spatial-temporal Markov processes is defined leading to the standard spatial equilibrium distribution for nearest-neighbour models on a multi-dimensional lattice. Physical properties are obtainable from the marginal spatial spectral function. However, only the simplest one-dimensional case corresponds to a linear model with a readily derived spectrum. Non-linear models corresponding to two- and three-dimensional lattices are presented in their simplest terms, and a preliminary discussion of approximate solutions is included.
Article
DOI:https://doi.org/10.1103/RevModPhys.25.159
Article
The partition function of a two-dimensional "ferromagnetic" with scalar "spins" (Ising model) is computed rigorously for the case of vanishing field. The eigenwert problem involved in the corresponding computation for a long strip crystal of finite width ($n$ atoms), joined straight to itself around a cylinder, is solved by direct product decomposition; in the special case $n=\infty$ an integral replaces a sum. The choice of different interaction energies ($\pm J$, $\pm J'$) in the (0 1) and (1 0) directions does not complicate the problem. The two-way infinite crystal has an order-disorder transition at a temperature $T=T_c$ given by the condition $\sinh(2J/kT_c)\,\sinh(2J'/kT_c)=1$. The energy is a continuous function of $T$; but the specific heat becomes infinite as $-\log|T-T_c|$. For strips of finite width, the maximum of the specific heat increases linearly with $\log n$. The order-converting dual transformation invented by Kramers and Wannier effects a simple automorphism of the basis of the quaternion algebra which is natural to the problem in hand. In addition to the thermodynamic properties of the massive crystal, the free energy of a (0 1) boundary between areas of opposite order is computed; on this basis the mean ordered length of a strip crystal is $(\exp(2J/kT)\,\tanh(2J'/kT))^n$.
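The transition condition quoted in this abstract, sinh(2J/kTc) sinh(2J′/kTc) = 1, can be solved numerically for the critical temperature. The following is a minimal sketch, assuming positive couplings, units in which k = 1 by default, and a plain bisection solver; the function name `critical_temperature` and its parameters are hypothetical and chosen only for illustration. It may overflow if the ratio of the two couplings is extremely large.

```python
import math

def critical_temperature(J, J_prime, k=1.0, tol=1e-12):
    """Solve sinh(2J/kT) * sinh(2J'/kT) = 1 for T by bisection (J, J' > 0)."""
    a = math.asinh(1.0)
    # Bracket the root: the product is >= 1 at lo and <= 1 at hi,
    # because it decreases monotonically as T increases.
    lo = 2.0 * min(J, J_prime) / (k * a)
    hi = 2.0 * max(J, J_prime) / (k * a)
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        product = math.sinh(2.0 * J / (k * mid)) * math.sinh(2.0 * J_prime / (k * mid))
        if product > 1.0:
            lo = mid  # still below the transition temperature
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Isotropic case J = J': the condition reduces to sinh(2J/kTc) = 1,
# so kTc/J = 2 / ln(1 + sqrt(2)), approximately 2.269.
tc_isotropic = critical_temperature(1.0, 1.0)
tc_anisotropic = critical_temperature(1.0, 2.0)
```

In the isotropic case the bracket collapses to a single point, so the closed form is returned directly; the bisection only does real work when the two couplings differ.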
LEVY, P. (1948). Chaînes doubles de Markoff et fonctions aléatoires de deux variables. C. R. Acad. Sci., Paris, 226, 53-55.