Article

Spatial interaction and statistical analysis of lattice systems

... Markov random fields (MRFs) are a default choice for the prior distribution on z, as an MRF, defined for a given NUG, is in fact a probability distribution whose conditional distributions elicit the same dependence structure (i.e. dependencies between areal units) as the specified NUG (Besag, 1974, 1975). While the one-to-one correspondence between a NUG and an MRF leads to a simple and intuitive specification of the joint probability distribution of z, inference on the parameters governing an MRF is computationally burdensome due to an intractable normalizing constant when z is discrete. ...
... While the one-to-one correspondence between a NUG and an MRF leads to a simple and intuitive specification of the joint probability distribution of z, inference on the parameters governing an MRF is computationally burdensome due to an intractable normalizing constant when z is discrete. While diverse approximation methods for the MRF have been proposed (Reeves and Pettitt, 2004; Friel et al., 2009; McGrory et al., 2009, 2012), the approach that uses the pseudo-likelihood (PL) of Besag (1975) in place of the analytical form of the intractable MRF prior density persists in Bayesian analyses (Heikkinen and Hogmander, 1994; Hoeting et al., 2000; Wang et al., 2000; Hughes et al., 2011; Pereyra and McLaughlin, 2017; Marsman and Haslbeck, 2023). While relatively easy to implement and computationally efficient, this approach lacks rigor as there are no guarantees that the approximation leads to valid posterior inference (see Section 3.3). ...
Preprint
Full-text available
Current approaches for modeling discrete-valued outcomes associated with spatially-dependent areal units incur computational and theoretical challenges, especially in the Bayesian setting when full posterior inference is desired. As an alternative, we propose a novel statistical modeling framework for this data setting, namely a mixture of directed graphical models (MDGMs). The components of the mixture, directed graphical models, can be represented by directed acyclic graphs (DAGs) and are computationally quick to evaluate. The DAGs representing the mixture components are selected to correspond to an undirected graphical representation of an assumed spatial contiguity/dependence structure of the areal units, which underlies the specification of traditional modeling approaches for discrete spatial processes such as Markov random fields (MRFs). We introduce the concept of compatibility to show how an undirected graph can be used as a template for the structural dependencies between areal units to create sets of DAGs which, as a collection, preserve the structural dependencies represented in the template undirected graph. We then introduce three classes of compatible DAGs and corresponding algorithms for fitting MDGMs based on these classes. In addition, we compare MDGMs to MRFs and a popular Bayesian MRF model approximation used in high-dimensional settings in a series of simulations and an analysis of ecometrics data collected as part of the Adolescent Health and Development in Context Study.
... , T, and with spatially-structured, temporally independent innovations (Cressie and Wikle, 2015; Sahu, 2022). A proper conditional autoregressive (PCAR) specification (Banerjee et al., 2014; Besag, 1974; Sahu, 2022) is assumed to model the spatial dependence of these innovations. The marginal spatio-temporal means of exposure and unmeasured confounder are generated from Fourier basis expansions with period T. ...
... x |j−k| . The spatial dependence is defined by the matrix $\Omega_{x,\mathrm{pcar}}$, which has a proper conditional autoregressive (PCAR) specification (Banerjee et al., 2014; Besag, 1974; Sahu, 2022): ...
Preprint
Full-text available
Spatial confounding, often regarded as a major concern in epidemiological studies, relates to the difficulty of recovering the effect of an exposure on an outcome when these variables are associated with unobserved factors. This issue is particularly challenging in spatio-temporal analyses, where it has been less explored so far. To study the effects of air pollution on mortality in Italy, we argue that a model that simultaneously accounts for spatio-temporal confounding and for the non-linear form of the effect of interest is needed. To this end, we propose a Bayesian dynamic generalized linear model, which allows for a non-linear association and for a decomposition of the exposure effect into two components. This decomposition accommodates associations with the outcome at fine and coarse temporal and spatial scales of variation. These features, when combined, allow reducing the spatio-temporal confounding bias and recovering the true shape of the association, as demonstrated through simulation studies. The results from the real-data application indicate that the exposure effect seems to have different magnitudes in different seasons, with peaks in the summer. We hypothesize that this could be due to possible interactions of the exposure variable with air temperature and unmeasured confounders.
... Describing, visualising, and modelling spatio-temporal panel data is often challenging, particularly in high-dimensional settings with numerous time points and spatial areas. Conditional autoregressive models (CAR) (Besag, 1974) and their variants, such as intrinsic conditional autoregressive models (ICAR), are frequently employed to describe areal data. Although it is possible to extend CAR models to spatio-temporal settings (Clayton and Bernardinelli, 1996), these interaction models can become computationally intensive with high-dimensional data where spatial and temporal effects are not separable (Knorr-Held, 2000). ...
... Next we introduce the intrinsic conditional autoregressive (ICAR) model, which is a special case of the conditional autoregressive (CAR) model (Besag, 1974), especially meant for spatial analysis to depict the amount of dependency between different locations. These models are based on a spatial division of a larger area, employing a neighbourhood definition for the generated subareas, and combining both of these with a Gaussian distribution in order to quantify the amount of the spatial dependency. ...
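The construction described in this excerpt (areas, a neighbourhood definition, and a Gaussian distribution) can be illustrated with a minimal sketch. It assumes the commonly used ICAR parameterisation in which the joint precision matrix is Q = τ(D − W), with W a symmetric 0/1 adjacency matrix and D the diagonal matrix of neighbour counts, so that each area's conditional mean is the average of its neighbours. The function names `icar_precision` and `conditional_mean`, the precision value, and the four-area line graph are illustrative assumptions, not taken from the cited work.

```python
def icar_precision(adjacency, tau=1.0):
    """Return Q = tau * (D - W) for a symmetric 0/1 adjacency matrix W."""
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        n_neighbours = sum(adjacency[i])  # diagonal entry of D
        for j in range(n):
            if i == j:
                q[i][j] = tau * n_neighbours
            elif adjacency[i][j]:
                q[i][j] = -tau
    return q


def conditional_mean(adjacency, values, i):
    """Mean of x_i given all other areas: the average over the neighbours of i."""
    neighbours = [j for j, is_adjacent in enumerate(adjacency[i]) if is_adjacent]
    return sum(values[j] for j in neighbours) / len(neighbours)


# Hypothetical example: four areas arranged on a line, 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Q = icar_precision(W, tau=2.0)
```

Every row of Q sums to zero, which is why the intrinsic model is improper (Q is singular) and is typically used with a sum-to-zero constraint or as a prior rather than as a data model.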
Preprint
Real-world spatio-temporal datasets, and phenomena related to them, are often challenging to visualise or gain a general overview of. In order to summarise information encompassed in such data, we combine two well-known statistical modelling methods. To account for the spatial dimension, we use the intrinsic modification of the conditional autoregression, and incorporate it with the hidden Markov model, allowing the spatial patterns to vary over time. We apply our method to parish register data on deaths caused by measles in Finland in 1750-1850, and gain novel insight into previously undiscovered infection dynamics. Five distinctive, recurring states describing spatially and temporally differing infection burden and potential routes of spread are identified. We also find that there is a change in the occurrences of the most typical spatial patterns circa 1812, possibly due to changes in communication routes after major administrative transformations in Finland.
... Therefore, the random effects are also usually specified at this areal level. The spatial model introduced by [1] serves as the foundation for various models employed to analyze areal-level data and capture spatial effects. The conditional autoregressive (CAR) model proposes a multivariate Gaussian distribution for the areal-level spatial effects, with a mean and covariance matrix that depend on the geographical structure. The spatial effect is assumed to be correlated with the spatial effects in nearby locations (i.e. ...
... However, area 4 has neighbors from both sub-region 1 and sub-region 2, so it is assigned a mixture of precision parameters from the two sub-regions. Note that if the two sub-regions have the same precision, i.e. $\tau_1 = \tau_2$, then the first term in $\boldsymbol{Q}_{NS}$ will be $\frac{3}{2}\tau_1 + \frac{1}{2}\tau_2 = 2\tau_1 = 2\tau_2$, and subsequently $\boldsymbol{Q}_{NS}$ will simplify to $\boldsymbol{Q}_{S}$, which is the precision matrix of the stationary Besag model from [1]. ...
Article
This paper aims to extend the Besag model, a widely used Bayesian spatial model in disease mapping, to a non-stationary spatial model for irregular lattice-type data. The goal is to improve the model’s ability to capture complex spatial dependence patterns and increase interpretability. The proposed model uses multiple precision parameters, accounting for different intensities of spatial dependence in different sub-regions. We derive a joint penalized complexity prior for the flexible local precision parameters to prevent overfitting and ensure contraction to the stationary model at a user-defined rate. The proposed methodology can be used as a basis for the development of various other non-stationary effects over other domains such as time. An accompanying R package fbesag equips the reader with the necessary tools for immediate use and application. We illustrate the novelty of the proposal by modeling the risk of dengue in Brazil, where the stationary spatial assumption fails and interesting risk profiles are estimated when accounting for spatial non-stationarity. Additionally, we model different causes of death in Brazil, where we use the new model to investigate the spatial stationarity of these causes.
... , I) are not independently distributed; this is because the adjacent counties tend to be correlated owing to the potential of sharing similar environmental or social factors [25]. Besag (1974) [26] proposed a conditional autoregressive (CAR) distribution for $\omega_i$ and assumed ...
Article
Full-text available
Prostate cancer is the most common cancer after non-melanoma skin cancer and the second leading cause of cancer deaths in US men. Its incidence and mortality rates vary substantially across geographical regions and over time, with large disparities by race and geographic region (e.g., Appalachia), among other factors. The widely used Cox proportional hazards model is usually not applicable in such scenarios owing to the violation of the proportional hazards assumption. In this paper, we fit Bayesian accelerated failure time models for the analysis of prostate cancer survival and take dependent spatial structures and temporal information into account by incorporating random effects with multivariate conditional autoregressive priors. In particular, we relax the proportional hazards assumption, consider flexible frailty structures in space and time, and also explore strategies for handling the temporal variable. The parameter estimation and inference are based on a Markov chain Monte Carlo technique under a Bayesian framework. The deviance information criterion is used to check goodness of fit and to select the best candidate model. Extensive simulations are performed to examine and compare the performances of models in different contexts. Finally, we illustrate our approach by using the 2004-2014 Pennsylvania Prostate Cancer Registry data to explore spatial-temporal heterogeneity in overall survival and identify significant risk factors.
... These undirected graphical models typically play a critical role in constraint satisfaction problems [17,20], and have significant applications in language and speech processing [11,14]. Their utility extends to image processing [21,30,36] and more broadly in the realm of spatial statistics [9], demonstrating their versatility and importance in contemporary research. These models are widely employed in diverse fields such as statistical physics [41], natural language processing [46], image analysis [70], and spatial statistics [53]. ...
... To address this challenge, numerous methods have been proposed. A prominent approach is the pseudo-likelihood method [9], which substitutes the likelihood involving the normalizing constant with the product of conditional probabilities that do not include this constant. However, its effectiveness is contingent on the pseudo-likelihood being a close approximation to the actual likelihood, which typically occurs in graphs with simpler structures. ...
... This notion is fairly close to the Markov property, which states that the conditional distribution of the value at a site depends only on its neighbouring sites. Thus, the probability that a pixel belongs to a class depends not only on its intensity but also on those of its neighbours [2,5]. ...
... We note that this local-characteristic property at a site s is a two-dimensional generalization of the Markov property. The Hammersley-Clifford theorem [2,7] establishes the equivalence between Gibbs fields and Markov fields under the positivity condition on the probability. Hence a Markov field that satisfies the positivity condition is also a Gibbs measure. ...
Article
Full-text available
In this article we propose a vector segmentation method for MSG2 multispectral images based on the Markovian formalism. The use of Markov fields makes it possible to take into account the spatial and spectral interactions between the pixels of the multi-component image. However, their use runs up against the initial choice of their parameters, which are often obtained by full segmentation algorithms that can incur additional computational cost. To avoid this problem, a simple pre-segmentation, obtained on one of the spectral components of the image and requiring practically no computation, is proposed. The various results obtained are evaluated using the Borsotti criterion.
... This model uses prior distributions to impose a structure on the underlying random process to ensure that the closer areas are, the more related they are, and borrows strength over neighbouring regions to get more reliable region-specific estimates. In addition, a differential smoothing parameter depending on the neighbourhood (socio-economic and remoteness) characteristics has the benefit of improving the stability of the estimates, thereby producing more robust estimates, especially in areas with sparse populations (Besag, 1974). The use of a Bayesian modelling framework means that we can borrow spatial strength from neighbouring regions, and use auxiliary data and characteristics from similar neighbourhoods. ...
Article
Full-text available
The family lives of children and their early childhood development outcomes are attributable to the level of socio-economic disadvantage and relative isolation. This study aims to investigate how the disadvantage of the local area (i.e., socio-economic indexes for areas (SEIFA)) and the remoteness (i.e., accessibility/remoteness index of Australia (ARIA)) contribute to improved prevalence estimates of child development vulnerability in statistical areas level 3 (SA3) and 4 (SA4) across Australia. Data from the 2018 Australian Early Development Census (AEDC) has been used. The study included 308,953 children involved in the AEDC 2018 where one-in-ten of them were considered to be developmentally vulnerable, nationally. We developed models in a hierarchical Bayesian framework at the SA3 level using SEIFA and ARIA indices as covariates to account for spatial and unobserved heterogeneity. The performances of developed models are examined based on the consistency at SA3, SA4, and state level. The results reveal that SEIFA makes a significant contribution to explaining the spatial variation in childhood development vulnerability across small domains in Australia. Further, the inclusion of the ARIA score improves the model performance and provides better accuracy, particularly in remote and very remote regions. In these regions, the spatial model fails to distinguish the remoteness characteristics. The chosen non-spatial model accounting for heterogeneity at higher hierarchies performs best. The utilization of socio-economic disadvantage and geographic remoteness of the finer level domains helps to explain the geographic variation in child development vulnerability, particularly in sparsely populated remote regions in Australia.
... Due to the intractability of the MLE approach, it is desirable to seek other candidate methods that do not involve computing the log-normalizing constants. To this end, the pseudolikelihood approach [4,5] was proposed, where the pseudolikelihood function is defined as the product of conditional distributions such that the normalizing constant cancels out. In this case, the maximum pseudolikelihood estimator (MPLE) can indeed alleviate the issue of dealing with a possibly complicated normalizing constant. ...
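To make the cancellation of the normalizing constant concrete, here is a minimal sketch of a log-pseudolikelihood. It assumes an Ising-type model with spins in {−1, +1} and joint density proportional to exp(β Σ_{i~j} x_i x_j + h Σ_i x_i); under that assumption each full conditional is exp(x_i m_i) / (2 cosh m_i) with m_i = β Σ_{j∈N(i)} x_j + h, so no global constant is needed. The function name, the four-site cycle, and the parameter values are illustrative assumptions, not the setting of the cited paper.

```python
import math


def log_pseudolikelihood(spins, neighbours, beta, h=0.0):
    """Sum over sites of log P(x_i | x_neighbours) for an Ising-type model."""
    total = 0.0
    for i, x_i in enumerate(spins):
        # Local field at site i: depends only on the neighbouring spins.
        m_i = beta * sum(spins[j] for j in neighbours[i]) + h
        # log[ exp(x_i * m_i) / (exp(m_i) + exp(-m_i)) ]
        total += x_i * m_i - math.log(2.0 * math.cosh(m_i))
    return total


# Hypothetical example: four sites on a cycle 0 - 1 - 2 - 3 - 0.
neighbours = [[1, 3], [0, 2], [1, 3], [2, 0]]
aligned = [1, 1, 1, 1]

lpl_independent = log_pseudolikelihood(aligned, neighbours, beta=0.0)
lpl_coupled = log_pseudolikelihood(aligned, neighbours, beta=1.0)
```

A maximum pseudolikelihood estimate would be obtained by maximising this function over β (and h), for example with a one-dimensional grid search; for the fully aligned configuration above the value increases with β, as expected.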
Preprint
Full-text available
Spin glass models with quadratic-type Hamiltonians are disordered statistical physics systems with competing ferromagnetic and anti-ferromagnetic spin interactions. The corresponding Gibbs measures belong to the exponential family parametrized by (inverse) temperature $\beta>0$ and external field $h\in\mathbb{R}$. Given a sample from these Gibbs measures, a statistically fundamental question is to infer the temperature and external field parameters. In 2007, Chatterjee (Ann. Statist. 35 (2007), no.5, 1931-1946) first proved that in the absence of external field $h=0$, the maximum pseudolikelihood estimator for $\beta$ is $\sqrt{N}$-consistent under some mild assumptions on the disorder matrices. It was left open whether the same method can be used to estimate the temperature and external field simultaneously. In this paper, under some easily verifiable conditions, we prove that the bivariate maximum pseudolikelihood estimator is indeed jointly $\sqrt{N}$-consistent for the temperature and external field parameters. The examples cover the classical Sherrington-Kirkpatrick model and its diluted variants.
... These methods are well-suited for the disaggregation of data that exhibits spatial autocorrelation. Three geostatistical models can be found in the literature: Geographically Weighted Regression (GWR) [73], the Conditional Autoregressive (CAR) model [74], and variants of kriging [75]. ...
Article
Full-text available
National-level climate action plans are often formulated broadly. Spatially disaggregating these plans to individual municipalities can offer substantial benefits, such as enabling regional climate action strategies and for assessing the feasibility of national objectives. Numerous spatial disaggregation approaches can be found in the literature. This study reviews and categorizes these. The review is followed by a discussion of the relevant methods for the disaggregation of climate action plans. It is seen that methods employing proxy data, machine learning models, and geostatistical ones are the most relevant methods for the spatial disaggregation of national energy and climate plans. The analysis offers guidance for selecting appropriate methods based on factors such as data availability at the municipal level and the presence of spatial autocorrelation in the data. As the urgency of addressing climate change escalates, understanding the spatial aspects of national energy and climate strategies becomes increasingly important. This review will serve as a valuable guide for researchers and practitioners applying spatial disaggregation in this crucial field.
... probabilities. The MVL model has been applied to market basket data by Russell and Petersen (2000) building upon earlier publications in statistics (Cox 1972; Besag 1974). We can write the purchase probability of category j in market basket t of household m conditional on purchases of the other categories collected in vector $y_{-jtm}$, the category-specific loyalty $\mathrm{loy}_{jmt}$ and the category-specific marketing variable $\mathrm{mvar}_{jt}$ as: ...
Article
Full-text available
Using multivariate logit models, we analyze purchases of product categories made by individual households. We introduce a sparse multivariate logit model that considers only a subset of all two-way interactions. A combined forward and backward selection procedure based on a cross-validated performance measure excludes about 74 % of the possible two-way interactions. We also specify random coefficient versions of both the non-sparse and the sparse model. The fact that the random coefficient models lead to better values of the Bayesian information criterion demonstrates the importance of latent heterogeneity. The random coefficients sparse model attains the best statistical performance if we consider model complexity and offers a better interpretability. We investigate the cross-purchase effects of household segments derived from this random coefficient model. As additional interpretation aid we cluster categories and category pairs by integer programming. We demonstrate what the best performing sparse model implies for cross-selling by product recommendations and store layout. The sparse model leads to managerial implications with respect to the effects of advertising in local newspapers and flyers that are as a rule close to those implied by its non-sparse counterpart.
... However, it is not unusual that such likelihood is difficult to work with either because no closed form expression exists, e.g., the corresponding statistical model is too complex, or because its computational cost is prohibitive. For example, the first situation occurs with max-stable processes [de Haan, 1984] or Gibbs random fields [Besag, 1974;Grelaud et al., 2009], while the second case frequently occurs with mixture models [Simola et al., 2021]. ...
Preprint
Full-text available
Approximate Bayesian Computation (ABC) methods are commonly used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Classical ABC methods are based on nearest neighbor type algorithms and rely on the choice of so-called summary statistics, distances between datasets and a tolerance threshold. Recently, methods combining ABC with more complex machine learning algorithms have been proposed to mitigate the impact of these "user-choices". In this paper, we propose the first, to our knowledge, ABC method completely free of summary statistics, distance and tolerance threshold. Moreover, in contrast with usual generalizations of the ABC method, it associates a confidence interval (having a proper frequentist marginal coverage) with the posterior mean estimation (or other moment-type estimates). Our method, ABCD-Conformal, uses a neural network with Monte Carlo Dropout to provide an estimation of the posterior mean (or other moment-type functionals), and conformal theory to obtain associated confidence sets. Efficient for estimating multidimensional parameters, we test this new method on three different applications and compare it with other ABC methods in the literature.
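For orientation, the three "user choices" of classical ABC mentioned in this abstract (summary statistic, distance, tolerance) can be seen in a minimal rejection-ABC sketch. This illustrates only the classical baseline, not the ABCD-Conformal method. The toy model (Gaussian data with unknown mean and unit variance), the uniform prior, the sample mean as summary, the absolute difference as distance, and all names and numeric values are assumptions made for the example.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # distance between summaries
            accepted.append(theta)
    return accepted


rng = random.Random(0)  # fixed seed so the run is reproducible
observed = [1.8, 2.4, 1.1, 2.9, 2.2, 1.6, 2.7, 1.9, 2.0, 2.4]  # hypothetical data, mean 2.1

accepted = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    tolerance=0.2,
    n_draws=5000,
    rng=rng,
)
posterior_mean = statistics.fmean(accepted)
```

The accepted draws form an approximate posterior sample; shrinking `tolerance` sharpens the approximation at the cost of fewer acceptances, which is the trade-off that summary-free approaches aim to avoid.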
... The next ingredient is the model. The most common model is the conditionally autoregressive (CAR) model by Besag (1974). Its general form is: ...
Article
Full-text available
This paper addresses the critical issue of road safety and accident prevention by integrating road features, network theory, and advanced statistical models. It emphasises the importance of understanding the relationship between road infrastructure and accident risk, which impacts on various administrative stakeholders and on citizens’ safety. While existing literature focuses on road features and engineering solutions, this paper highlights the need to consider implicit spatial constraints as well. Our study builds on prior research by proposing a novel approach that merges conditional autoregressive modelling with a two-stage mixed Geographically weighted Poisson regression. This integrated methodology allows us to consider both the effect of risk factors at a global level and at a local road level. By leveraging the strengths of these two methods, we aim to capture both overarching trends and local variations of risk factors, thereby offering a comprehensive understanding of accident risk factors. Using data from the Open Street Map database, which covers the wide province of Milan in Italy, our models identify influential street characteristics, providing valuable insights for informed decision-making regarding road safety measures. Our method can be applied to any region in the world. The paper describes the models used, the dataset employed, and presents a detailed numerical analysis demonstrating the effectiveness of the approach in identifying and understanding accident risk factors within road networks. This information can help guide investments for the benefit of society.
... For the first model, we make use of ideas behind the change of support framework as described in Cressie and Wikle (2011) and conditional autoregressive (CAR) modeling (Besag, 1974) to create a unique random effects model to move from the point-level to the area-level. Of particular importance is that our modeling is performed using weights determined by the extended Hausdorff distance. ...
Article
Full-text available
One measurement modality for rainfall is a fixed location rain gauge. However, extreme rainfall, flooding, and other climate extremes often occur at larger spatial scales and affect more than one location in a community. For example, in 2017 Hurricane Harvey impacted all of Houston and the surrounding region, causing widespread flooding. Flood risk modeling requires understanding of rainfall for hydrologic regions, which may contain one or more rain gauges. Further, policy changes to address the risks and damages of natural hazards such as severe flooding are usually made at the community/neighborhood level or higher geo-spatial scale. Therefore, spatial-temporal methods which convert results from one spatial scale to another are especially useful in applications for evolving environmental extremes. We develop a point-to-area random effects (PARE) modeling strategy for understanding spatial-temporal extreme values at the areal level, when the core information consists of time series at point locations distributed over the region.
... By the Hammersley-Clifford theorem [68], such an MRF satisfying $p(x) > 0$ everywhere can also be rewritten as $p(x) = \frac{1}{Z} e^{-U(x)}$, where $Z = \sum_{x \in X} e^{-U(x)}$. In this case $U(x)$ is called the energy function and has the form $U(x) = \sum_{c \in C} V_c(x)$, which is a sum of clique potentials $V_c(x)$ over all possible cliques. ...
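The Gibbs form in this excerpt can be checked numerically on a very small graph, where the sum defining Z can be enumerated directly. The sketch below assumes binary states in {−1, +1} and keeps only pairwise cliques (edges) with potential V_c(x) = −β x_i x_j; single-site potentials are omitted for brevity. The function names, the four-site cycle, and β = 0.5 are illustrative assumptions, not taken from the cited work.

```python
import itertools
import math


def energy(x, edges, beta):
    """U(x) = sum over pairwise cliques of V_c(x), with V_c(x) = -beta * x_i * x_j."""
    return sum(-beta * x[i] * x[j] for i, j in edges)


def gibbs_distribution(n_sites, edges, beta):
    """Return (Z, p) where p maps each configuration x to exp(-U(x)) / Z."""
    states = list(itertools.product((-1, 1), repeat=n_sites))
    weights = [math.exp(-energy(x, edges, beta)) for x in states]
    z = sum(weights)  # brute-force normalizing constant: 2**n_sites terms
    return z, {x: w / z for x, w in zip(states, weights)}


# Hypothetical example: four sites on a cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
z, p = gibbs_distribution(4, edges, beta=0.5)
z_independent, _ = gibbs_distribution(4, edges, beta=0.0)
```

The enumeration has 2^n terms, which is why Z becomes intractable for lattices of realistic size and why approximations such as pseudolikelihood are used in practice.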
Preprint
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based position-aware diffusion inverse solver, called PaDIS, where we obtain the score function of the whole image through scores of patches and their positional encoding and utilize this as the prior for solving inverse problems. First of all, we show that this diffusion model achieves an improved memory efficiency and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged in with different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and superresolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire image priors in the case of limited training data, demonstrating the data efficiency of our proposed approach by learning patch-based prior.
... For spatial equations as in (49) or (50), an analogous property given the 'past' $X(s), s \neq t$ does not hold since $\mathrm{Cov}(X(s), \varepsilon(t)) \neq 0$ $(s \neq t)$ as a rule. This issue is important in spatial statistics and has been discussed in the literature, see [3,4] and the references therein, distinguishing between 'simultaneous' and 'conditional autoregressive schemes'. The recent work [12] discusses some conditional autoregressive models with LRD property. ...
Preprint
Full-text available
We consider fractional integral operators $(I-T)^d, d \in (-1,1)$ acting on functions $g: \mathbb{Z}^{\nu} \to \mathbb{R}, \nu \ge 1 $, where $T $ is the transition operator of a random walk on $\mathbb{Z}^{\nu}$. We obtain sufficient and necessary conditions for the existence, invertibility and square summability of kernels $\tau (\mathbf{s}; d), \mathbf{s} \in \mathbb{Z}^{\nu}$ of $(I-T)^d $. Asymptotic behavior of $\tau (\mathbf{s}; d)$ as $|\mathbf{s}| \to \infty$ is identified following local limit theorem for random walk. A class of fractionally integrated random fields $X$ on $\mathbb{Z}^{\nu}$ solving the difference equation $(I-T)^d X = \varepsilon$ with white noise on the right-hand side is discussed, along with their scaling limits. Several examples including fractional lattice Laplace and heat operators are studied in detail.
... This modelling introduces spatial dependence using stochastic models on graphs, where nodes represent regions, and edges connect neighbouring regions [12]. Other methods include Markov random fields using undirected graphs [13,14] or directed acyclic graphical autoregression (DAGAR) models [15]. ...
Article
Full-text available
In the field of population health research, understanding the similarities between geographical areas and quantifying their shared effects on health outcomes is crucial. In this paper, we synthesise a number of existing methods to create a new approach that specifically addresses this goal. The approach is called a Bayesian spatial Dirichlet process clustered heterogeneous regression model. This non-parametric framework allows for inference on the number of clusters and the clustering configurations, while simultaneously estimating the parameters for each cluster. We demonstrate the efficacy of the proposed algorithm using simulated data and further apply it to analyse influential factors affecting children’s health development domains in Queensland. The study provides valuable insights into the contributions of regional similarities in education and demographics to health outcomes, aiding targeted interventions and policy design.
... Markov Field models, a generalization of Markov Chains and the undirected analogue of Bayesian DAGs, have also been applied to understand 'omics data in various disease contexts [9], while also underlying statistical physics and thermodynamics models [10] of interactions occurring on structured "network" arrangements (e.g. crystalline solids) [11]. A Gaussian Markov Field approach to analyzing metabolomics data even suggested that the network interactions learned from data were enriched for actual metabolic reactions [12], and a Markov Field was used to briefly describe an integrative analysis of proteomic and metabolomic data of body weight changes in human populations [13]. ...
Preprint
Full-text available
Multi-modal biological datasets provide rich information from diverse scales or facets of complex biological systems that can be analyzed to highlight the critical multi-scale interactions underlying specific biological phenomena. However, identifying the most vital associations among features and outputs can be beset by a high degree of spurious connections due to indirect effects of various immune features propagating through an unmapped biological network. Here, we applied a probabilistic graphical modeling approach, Markov Fields, to empirically dissect the most direct associations between features from a public multi-modal dataset (antibody titers, antibody-dependent functions, cytokines, cytometry) from macaques undergoing intravenous BCG vaccination, a promising vaccine strategy against the major public health threat tuberculosis. This yielded a network influence model that interprets the assemblage of multi-scale paths by which vaccine effects propagate through the immune network to eventually protect against tuberculosis infection. Importantly, our modeling shows that the vast majority of apparent correlations between features arise from indirect effects relating distant immune features. For a test of predictive capability, we conducted experimental depletion of B cells during BCG IV vaccination in macaques (which did not reduce BCG IV-mediated protection against tuberculosis) and validated that our Markov Field model can predict systems-wide modulation of numerous features across the immune system in response to this perturbation. Finally, we applied our model to discern perturbations in the network that could have strong effects on IV-BCG efficacy. Altogether, we have demonstrated that probabilistic graphical modeling can increase interpretability and predictive value of multi-modal datasets for identifying new disease treatment targets.
... MRF and GRF are characterized by local and global properties, respectively. The Hammersley and Clifford theorem [25] established an equivalence between GRF and MRF. This increased the utility of MRF in image processing applications. ...
Preprint
Full-text available
This is only a draft copy. For the final document, refer to the main document on the publisher's site.
... Another development in the process involves modifying the spatial weight matrix. The determination of spatial weighting systems within earlier space-time models often utilized methods pioneered by Besag [29]. Mukhaiyar, et al. [30] demonstrated that a spatial weight matrix can be constructed using a topological (undirected graph) network model, employing the Minimum Spanning Tree (MST) approach. ...
Preprint
Full-text available
Space-time extrapolation models are usually constrained to a limited number of observed locations and lack the ability to provide information about the values at unobserved locations. However, by integrating these models with spatial interpolation techniques, it is possible to obtain more informative visual representations. The Generalized Space-Time Autoregressive (GSTAR) model, as a multivariate space-time extrapolation model, is often used due to its simplicity. Within the framework of the GSTAR model, a crucial component is the spatial weight matrix, which facilitates the establishment of spatial relationships among different locations. This matrix can be constructed by employing graph theory, particularly the Minimum Spanning Tree (MST), as an extension of the model. Additionally, spatial interpolation can be achieved through the utilization of kriging methods, by gridding the observed spatial locations. Although the amalgamation of these two models does not exhibit superior performance compared to univariate time series models in risk mapping, particularly in the context of groundwater levels observed in peatland areas within Riau Province, Indonesia, the model can provide more robust conclusions.
... where Z(θ_0, θ_1) is an intractable partition function ensuring the distribution is normalized. As proven by Besag (1974), the conditional distribution for a single node A_i given all other nodes is, with positive spatial correlation θ_1 > 0: ...
Article
Decision-makers often observe the occurrence of events through a reporting process. City governments, for example, rely on resident reports to find and then resolve urban infrastructural problems such as fallen street trees, flooded basements, or rat infestations. Without additional assumptions, there is no way to distinguish events that occur but are not reported from events that truly did not occur--a fundamental problem in settings with positive-unlabeled data. Because disparities in reporting rates correlate with resident demographics, addressing incidents only on the basis of reports leads to systematic neglect in neighborhoods that are less likely to report events. We show how to overcome this challenge by leveraging the fact that events are spatially correlated. Our framework uses a Bayesian spatial latent variable model to infer event occurrence probabilities and applies it to storm-induced flooding reports in New York City, further pooling results across multiple storms. We show that a model accounting for under-reporting and spatial correlation predicts future reports more accurately than other models, and further induces a more equitable set of inspections: its allocations better reflect the population and provide equitable service to non-white, less traditionally educated, and lower-income residents. This finding reflects heterogeneous reporting behavior learned by the model: reporting rates are higher in Census tracts with higher populations, proportions of white residents, and proportions of owner-occupied households. Our work lays the groundwork for more equitable proactive government services, even with disparate reporting behavior.
... In order to illustrate how marginal variability and inverse smoothness are related to the DGP structure, we consider two widely adopted models in spatial statistics: the Gaussian Random Field with exponential covariance function, which is a special case of the Matérn covariance function (Matérn 1986), largely used in geostatistical analysis and the Conditional Auto Regressive (CAR) model (Besag 1974;Rue and Held 2005), widely used for areal data modeling. Both models are parameterized by a spatial correlation parameter denoted as in what follows. ...
Article
Full-text available
In the last two decades, significant research efforts have been dedicated to addressing the issue of spatial confounding in linear regression models. Confounding occurs when the relationship between the covariate and the response variable is influenced by an unmeasured confounder associated with both. This results in biased estimators for the regression coefficients, reduced efficiency, and misleading interpretations. This article aims to understand how confounding relates to the parameters of the data generating process. The sampling properties of the regression coefficient estimator are derived as ratios of dependent quadratic forms in Gaussian random variables: this allows us to obtain exact expressions for the marginal bias and variance of the estimator that were not obtained in previous studies. Moreover, we provide an approximate measure of the marginal bias that gives insights into the main determinants of bias. Applications in the framework of geostatistical and areal data modeling are presented. Particular attention is devoted to the difference between smoothness and variability of random vectors involved in the data generating process. Results indicate that marginal covariance between the covariate and the confounder, along with marginal variability of the covariate, play the most relevant role in determining the magnitude of confounding, as measured by the bias.
... Our third case study represents a typical instance where an innovation (cremation in this case) is not necessarily expected to reach a stable equilibrium in the population. While one could attempt to develop a complex model that accounts for possible cyclicity (e.g. using a sinusoidal model), a simpler solution is to model the diffusion as a random walk process using an intrinsic Gaussian conditional autoregressive (ICAR) model (Besag, 1974; Rue and Held, 2004). This approach consists of dividing our chronological window of analyses into a series of n time-blocks and estimating the associated vector of probabilities p_{t=1}, p_{t=2}, …, p_{t=n}. ...
Article
Full-text available
Archaeological data provide a potential to investigate the diffusion of technological and cultural traits. However, much of this research agenda currently needs more formal quantitative methods to address small sample sizes and chronological uncertainty. This paper introduces a novel Bayesian framework for inferring the shape of diffusion curves using radiocarbon data associated with the presence/absence of a particular innovation. We developed two distinct approaches: 1) a hierarchical model that enables the fitting of an s-shaped diffusion curve whilst accounting for inter-site variations in the probability of sampling the innovation itself, and 2) a non-parametric model that can estimate the changing proportion of the innovation across user-defined time-blocks. The robustness of the two approaches was first tested against simulated datasets and then applied to investigate three case studies, the first pair on the diffusion of farming in prehistoric Japan and Britain and the third on cycles of changes in the burial practices of later prehistoric Britain.
... If two districts are neighbors, then observed predictors and responses may have similar values in those districts. The Besag model includes a spatially structured random component ( ) to account for spatial variations with the fixed predictors [33]. A widely used scheme of representation for such conditional distribution under irregularly shaped areas is the intrinsic conditional autoregressive (ICAR) model, where the conditional distribution of given − is given by ...
Article
Full-text available
Despite a decrease in the prevalence of low birth weight (LBW) over time, its ongoing significance as a public health concern in Bangladesh remains evident. Low birth weight is believed to be a contributing factor to infant mortality, prolonged health complications, and vulnerability to non-communicable diseases. This study utilizes nationally representative data from the Multiple Indicator Cluster Surveys (MICS) conducted in 2012-2013 and 2019 to explore factors associated with birth weight. Modeling birth weight data considers interactions among factors, clustering in data, and spatial correlation. District-level maps are generated to identify high-risk areas for LBW. The average birth weight has shown a modest increase, rising from 2.93 kg in 2012-2013 to 2.96 kg in 2019. The study employs a regression tree, a popular machine learning algorithm, to discern essential interactions among potential determinants of birth weight. Findings from various models, including fixed effect, mixed effect, and spatial dependence models, highlight the significance of factors such as maternal age, household head's education, antenatal care, and a few data-driven interactions influencing birth weight. District-specific maps reveal lower average birth weights in the southwestern region and selected northern districts, persisting across the two survey periods. Accounting for hierarchical structure and spatial autocorrelation improves model performance, particularly when fitting the most recent round of survey data. The study aims to inform policy formulation and targeted interventions at the district level by utilizing a machine learning technique and regression models to identify vulnerable groups of children requiring heightened attention.
... where Λ k and k correspond to the neighbors and the number of neighbors of region k, respectively, and u is the precision term. Besag 29 proved that the corresponding joint specification of u follows a multivariate normal distribution with mean 0 and precision matrix Q = u (D − A), where D is a (r × r) diagonal matrix with d kk containing the number of neighbors of k, and d kl = 0, ∀k ≠ l. Moreover, as shown by Besag et al. 31,30 the joint distribution of u as specified above can be further simplified to the following pairwise difference: ...
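The pairwise-difference identity mentioned in this excerpt can be checked numerically. The sketch below is illustrative only: the four-region adjacency matrix `A`, the precision value `tau`, the test vector `u`, and all function names are assumptions introduced here, not taken from the cited paper. It builds Q = tau (D − A) and compares the quadratic form u'Qu with tau times the sum of squared differences over neighbouring pairs.

```python
def icar_precision(adjacency, tau):
    """Return Q = tau * (D - A) for a symmetric 0/1 adjacency matrix."""
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if k == l:
                # Diagonal entry: tau times the number of neighbours of region k.
                q[k][l] = tau * sum(adjacency[k])
            else:
                # Off-diagonal entry: -tau if k and l are neighbours, else 0.
                q[k][l] = -tau * adjacency[k][l]
    return q


def quadratic_form(q, u):
    """Compute u' Q u."""
    n = len(u)
    return sum(u[k] * q[k][l] * u[l] for k in range(n) for l in range(n))


def pairwise_difference(adjacency, tau, u):
    """Compute tau * sum over neighbouring pairs (k, l) of (u_k - u_l)^2."""
    n = len(u)
    return tau * sum(
        adjacency[k][l] * (u[k] - u[l]) ** 2
        for k in range(n)
        for l in range(k + 1, n)  # count each unordered pair once
    )


# Hypothetical example: four regions arranged on a line, 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
tau = 2.0
u = [0.5, -1.0, 0.25, 1.5]

Q = icar_precision(A, tau)
print(quadratic_form(Q, u), pairwise_difference(A, tau, u))
```

Every row of `Q` sums to zero, so `Q` is singular; this is why the intrinsic model is improper unless a constraint such as a sum-to-zero condition on u is imposed.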
Article
Relative survival represents the preferred framework for the analysis of population cancer survival data. The aim is to model the survival probability associated with cancer in the absence of information about the cause of death. Recent data linkage developments have allowed for incorporating the place of residence into the population cancer databases; however, modeling this spatial information has received little attention in the relative survival setting. We propose a flexible parametric class of spatial excess hazard models (along with inference tools), named “Relative Survival Spatial General Hazard,” that allows for the inclusion of fixed and spatial effects in both time-level and hazard-level components. We illustrate the performance of the proposed model using an extensive simulation study, and provide guidelines about the interplay of sample size, censoring, and model misspecification. We present a case study using real data from colon cancer patients in England. This case study illustrates how a spatial model can be used to identify geographical areas with low cancer survival, as well as how to summarize such a model through marginal survival quantities and spatial effects.
... Even if the observations (S 1 , · · · , S N ) are dependent, the sequence has some mixing properties that imply asymptotic independence. So we estimate the parameter θ using a pseudo-likelihood method introduced in the early 70's in Besag (1974). Our estimator is then the maximal argument of the pseudo-likelihood function: ...
Article
Full-text available
In this paper, we propose an estimator for the Ornstein–Uhlenbeck parameters based on observations of its supremum. We derive an analytic expression for the supremum density. Making use of the pseudo-likelihood method based on the supremum density, our estimator is constructed as the maximal argument of this function. Using weak-dependency results, we prove some statistical properties on the estimator such as consistency and asymptotic normality. Finally, we apply our estimator to simulated and real data.
... Recalling the Hammersley-Clifford theorem, modelling indicator variables as a MRF is equivalent to using a Gibbs distribution for prior probabilities. Based on the Hammersley-Clifford theorem the process {Z i : i ∈ N } is a MRF, if and only if its joint distribution is a Gibbs distribution (Besag, 1974). ...
Preprint
Full-text available
The purpose of this paper is to extend standard finite mixture models in the context of multinomial mixtures for spatial data, in order to cluster geographical units according to demographic characteristics. The spatial information is incorporated in the model through the mixing probabilities of each component. To be more specific, a Gibbs distribution is assumed for prior probabilities. In this way, the assignment of each observation is affected by its neighbors' clusters and spatial dependence is included in the model. Estimation is based on a modified EM algorithm which is enriched by an extra, initial step for approximating the field. The simulated field algorithm is used in this initial step. The presented model will be used for clustering municipalities of Attica with respect to the age distribution of residents.
... along the diagonal and − i is on the off-diagonal, provided that B is symmetric and positive definite (see Besag, 1974). The is are general weights defining the influence of province s on the prior mean of i , while i represents area-level characteristics such as the number of neighborhoods Hogan and Tchernis (2004). ...
Article
Full-text available
This paper proposes spatial comprehensive composite indicators to evaluate the well-being levels and ranking of Italian provinces with data from the Equitable and Sustainable Well-Being dashboard. We use a method based on Bayesian latent factor models, which allow us to include spatial dependence across Italian provinces, quantify uncertainty in the resulting estimates, and estimate data-driven weights for elementary indicators. The results reveal that our data-driven approach changes the resulting composite indicator rankings compared to those produced by traditional composite indicators’ approaches. Estimated social and economic well-being is unequally distributed among southern and northern Italian provinces. In contrast, the environmental dimension appears less spatially clustered, and its composite indicators also reach above-average levels in the southern provinces. The time series of well-being composite indicators of Italian macro-areas shows clustering and macro-areas discrimination on larger territorial units.
... An application of Proposition 1 in Yang et al. (2015) shows that a valid joint probability distribution function from the above given set of specified conditional distributions can be constructed. By Assumption 1 and Assumption 2 in Section 4.1 of Besag (1974), such distribution defines an undirected graph G = (V, E) in which a missing edge between node s and node t corresponds to the condition θ_st = θ_ts = 0. On the other hand, an edge between node s and node t implies θ_st = θ_ts ≠ 0. ...
Article
Full-text available
Mainly motivated by the problem of modelling biological processes underlying the basic functions of a cell, which typically involve complex interactions between genes, we present a new algorithm, called PC-LPGM, for learning the structure of undirected graphical models over discrete variables. We prove theoretical consistency of PC-LPGM in the limit of infinite observations and discuss its robustness to model misspecification. To evaluate the performance of PC-LPGM in recovering the true structure of the graphs in situations where relatively moderate sample sizes are available, extensive simulation studies are conducted, which also allow us to compare our proposal with its main competitors. A biological validation of the algorithm is presented through the analysis of two real data sets.
... Autologistic model: Consider the popular autologistic models (Besag, 1974), which are Markov random field models for binary observations. Let ...
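For orientation, a common parameterisation of the autologistic model gives each binary site a logistic conditional probability driven by the sum of its neighbours' values. The sketch below assumes that form; the parameter names `theta0` (intercept) and `theta1` (neighbour interaction), the toy neighbour list, and the function names are hypothetical and are not the specific model of the citing paper. It also evaluates the log pseudo-likelihood of Besag (1975), which sums the conditional log-probabilities over sites and so avoids the intractable normalizing constant.

```python
import math


def conditional_prob_one(x, neighbors, i, theta0, theta1):
    """P(x_i = 1 | all other sites) under the assumed autologistic form."""
    eta = theta0 + theta1 * sum(x[j] for j in neighbors[i])
    return 1.0 / (1.0 + math.exp(-eta))


def log_pseudo_likelihood(x, neighbors, theta0, theta1):
    """Sum over sites of log P(x_i | neighbours); no partition function needed."""
    total = 0.0
    for i, xi in enumerate(x):
        p = conditional_prob_one(x, neighbors, i, theta0, theta1)
        total += math.log(p if xi == 1 else 1.0 - p)
    return total


# Hypothetical example: four binary sites on a line, 0-1-2-3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [1, 1, 0, 0]

print(conditional_prob_one(x, neighbors, 0, theta0=0.0, theta1=1.0))
print(log_pseudo_likelihood(x, neighbors, theta0=0.0, theta1=1.0))
```

With `theta1 = 0` the sites are conditionally independent and each term reduces to an ordinary logistic probability; a positive `theta1` raises the probability of a 1 at sites whose neighbours are 1.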
... Up to now, our simulation study focused on kinds of unilateral DGP, which is not only motivated by computational advantages, but also by Whittle's representation theorem (Whittle, 1954), which states that, under mild conditions, any two-dimensional bilateral spatial DGP can be represented (exactly or approximately) by a unilateral DGP having the same spatial ACF. However, the order of the original bilateral process is much lower, i.e., such bilateral DGPs constitute a parsimonious approach for generating sophisticated dependence structures; see Whittle (1954), Besag (1974) and Arbia et al. (2018) for further details. Therefore, we conclude our simulation study by considering a few bilateral DGPs as well. ...
Article
We analyze data occurring in a regular two-dimensional grid for spatial dependence based on spatial ordinal patterns (SOPs). After having derived the asymptotic distribution of the SOP frequencies under the null hypothesis of spatial independence, we use the concept of the type of SOPs to define the statistics to test for spatial dependence. The proposed tests are not only implemented for real-valued random variables, but a solution for discrete-valued spatial processes in the plane is provided as well. The performances of the spatial-dependence tests are comprehensively analyzed by simulations, considering various data-generating processes. The results show that SOP-based dependence tests have good size properties and constitute an important and valuable complement to the spatial autocorrelation function. To be more specific, SOP-based tests can detect spatial dependence in non-linear processes, and they are robust with respect to outliers and zero inflation. To illustrate their application in practice, two real-world data examples from agricultural sciences are analyzed.
... Then, we see in Figure 1(a) that p = e 1 (labeled with 0) is in the convex hull, and, furthermore, that the x 1 coordinate of the point of intersection is (2+1)/2 = 3/2, and the separating line's location does not depend where, along the ray from the origin, p is. Now, instead, let p equal (3,2) . This is the situation depicted in Figure 1(b). ...
Preprint
Full-text available
Understanding the spatial dynamics within tissue microenvironments is crucial for deciphering cellular interactions and molecular signaling in living systems. These spatial characteristics govern cell distribution, extracellular matrix components, and signaling molecules, influencing local biochemical and biophysical conditions. Decoding these features offers insights into physiological processes, disease progression, and clinical outcomes. By elucidating spatial relationships between cell types, researchers uncover tissue architecture, cell communication networks, and microenvironment dynamics, aiding in the identification of biomarkers and therapeutic targets. Digital pathology imaging, including Hematoxylin and Eosin (H&E) staining, provides high-resolution histological information that offers intricate insights into cell-cell spatial relationships in greater detail. However, current methods for capturing cell-cell spatial interactions are constrained by either methodological scopes or implementations restricted to script-level access. This limitation undermines generalizability and standardization, which are crucial for ensuring reproducibility. To address these limitations, we introduce SpatialQPFs, an extendable R package designed for the extraction of interpretable spatial features from digital pathology images. By leveraging segmented cell information, our package provides researchers with a comprehensive toolkit for applying a range of spatial statistical methods within a stochastic process framework, which includes analysis of point pattern data, areal data, and geostatistical data. This allows for a thorough analysis of cell spatial relationships, enhancing the depth and accuracy of spatial insights derived from the tissue, thereby empowering researchers to conduct comprehensive spatial analyses efficiently and reproducibly.
Article
Full-text available
The basic concepts in the application of Bayesian approaches to image analysis are introduced. The Bayesian approach has benefits for image analysis and interpretation because it permits the use of prior knowledge about the situation under study. This paper investigates the application of some well-known procedures (determining the number of labels of an image under several conditions, such as image noise and image resolution) for Bayesian image analysis, with segmentation as a joint prior model, in order to estimate the Maximum Likelihood (ML). A Markov random field with segmentation results from the mean of the posterior (MP). The paper contains several sections. The first is an introduction to image analysis within a Bayesian framework, with statistical background on Markov random fields and their relationship to Markov chain Monte Carlo. The second directly addresses solutions using a simple form of segmentation, namely thresholding. The third describes a segmentation experiment based on the histogram and discusses a single-prior model and the same model with segmentation added (joint prior), based on the techniques presented in the previous section. The fourth covers the implementation of the second prior and a simulation using phantom data (Castle from South East Asia), along with the steps of estimation. The final section gives the results of the experiment, the estimation, and a summary.
Article
Despite the recognized gut-brain axis link, natural variations in microbial profiles between patients hinder definition of normal abundance ranges, confounding the impact of dysbiosis on infant neurodevelopment. We infer a digital twin of the infant microbiome, forecasting ecosystem trajectories from a few initial observations. Using 16S ribosomal RNA profiles from 88 preterm infants (398 fecal samples and 32,942 abundance estimates for 91 microbial classes), the model (Q-net) predicts abundance dynamics with R² = 0.69. Contrasting the fit to Q-nets of typical versus suboptimal development, we can reliably estimate individual deficit risk (M_δ) and identify infants achieving poor future head circumference growth with ≈76% area under the receiver operating characteristic curve, 95% ± 1.8% positive predictive value at 98% specificity at 30 weeks postmenstrual age. We find that early transplantation might mitigate risk for ≈45.2% of the cohort, with potentially negative effects from incorrect supplementation. Q-nets are generative artificial intelligence models for ecosystem dynamics, with broad potential applications.
Article
Full-text available
This study addresses the need to boost fruit and vegetable consumption amidst rising diet‐related health concerns. Blueberries, rich in phenolic phytochemicals, offer significant health benefits. Using a basket‐based choice experiment (BBCE), the study identifies sensory descriptors that enhance blueberry purchasing likelihood. Packaging with a “Stay Fresh” label reduces price sensitivity compared to others. Additionally, blueberries are commonly purchased alongside other berries rather than as substitutes. Demographic factors such as gender, age, education, employment, fitness, ethnicity, region, nutritional value perception, and budget influence blueberry selection. These insights can aid growers, retailers, and marketers in increasing fresh blueberry demand.
Preprint
Full-text available
One of the lingering questions regarding the 2016 presidential election is ‘would things have been different, had Hillary Clinton visited Wisconsin?’ We tackle this question with a nonparametric identification strategy based on a directed acyclic graph (DAG) and a Bayesian multilevel spatial statistical model to identify and estimate the campaign visit effect at the county level. Our results suggest that the Wisconsin narrative misses the most important aspects of the 2016 election. In all counties in the 48 contiguous states, a Trump visit increases the support for Clinton. In half of the counties, a Clinton visit increases the support for Trump.
Article
Full-text available
Blind image deconvolution plays a very important role in fields such as astronomical observation and fluorescence microscopy imaging, in which the noise follows a Poisson distribution. However, due to the ill-posedness of the problem, it is very challenging to reach a satisfactory result from a single blurred image, especially when the power of the Poisson noise is high. Therefore, in this paper, we try to achieve high-quality restoration results with multiple blurred images which are contaminated by Poisson noise. Firstly, we design a novel sparse log-step gradient prior which adopts a mixture of logarithm and step functions to regularize the image gradients and combine it with the Poisson distribution to formulate the blind multi-image deconvolution problem. Secondly, we incorporate the methods of variable splitting and Lagrange multipliers to convert the original problem into sub-problems, which we then solve alternately to estimate all the blur kernels of the corresponding blurred images. In addition, we design a non-blind multi-image deconvolution algorithm based on the log-step gradient prior to reach the final restored image. Experimental results on both synthetic and real-world blurred images show that the proposed prior is very capable of suppressing negative artifacts caused by ill-posedness. The algorithm can achieve restored images of very high quality, competitive with some state-of-the-art methods.
Preprint
There are two complementary approaches to thermodynamics: an empirical, phenomenological representation of macroscopic states and a model-based, statistical-mechanical representation of microscopic states. If only a few energy transformation steps are involved, macroscopic quantities such as energy and entropy can be estimated without ambiguity, and often the associated microscopic states are well characterised. Both approaches have been used to develop and guide many key early ecological ideas. However, most ecosystems are characterized by uncountably many transformations that operate on a wide range of space and time scales. This renders the bounds of such systems ambiguous, making both the macroscopic and microscopic approaches a challenge. As such, the implementation of both approaches remains an area ripe for further investigation. In particular, the Onsager reciprocal relations permit simplification of expectations that are yet to be fully understood in an ecological context. Here we take a few first steps toward understanding the far-reaching ramifications of these thermodynamic relations.