Article

Markov Chain Monte Carlo - Stochastic Simulation for Bayesian Inference


... Compared to deterministic methods, Bayesian approaches have the advantage of quantifying uncertainty in the estimates provided, but also incur a substantial computational penalty. Here, the computational expense results from the numerical sampling algorithms, e.g., Markov Chain Monte Carlo (MCMC) (Gamerman & Lopes, 2006), which can exhibit slow convergence and involve the evaluation of a potentially time-consuming computational model for each sample drawn. To alleviate this computational burden, advanced MCMC methods have been developed to reduce sampling time by improving sampling convergence (Haario, Laine, & Mira, 2006; Nichols et al., 2011) or through parallelization of the algorithms themselves (Vrugt et al., 2009; Neiswanger, Wang, & Xing, 2013; Prudencio & Cheung, 2012). ...
... Algorithm 1 summarizes the most basic form of MCMC, the Metropolis algorithm (Gamerman & Lopes, 2006). Here, the method simply draws a trial sample C_t at each iteration from a proposal distribution q(C_t | C^(j−1)), and then decides whether to accept or reject this sample based on the acceptance probability, A(C_t, C^(j−1)). ...
... While the Metropolis algorithm is straightforward to implement, tuning of the algorithm (namely, the selection of an appropriate Σ_q) for efficient convergence and appropriate sample acceptance rate is notoriously difficult (Gamerman & Lopes, 2006). Moreover, the posterior probability distributions formulated with Bayesian inference are often multimodal and grow in complexity with the number of unknown parameters. ...
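As an illustration of the scheme described in these passages, the following is a minimal Python sketch of random-walk Metropolis, assuming a user-supplied log-density `log_post` for the (unnormalised) posterior and a Gaussian proposal whose spread plays the role of Σ_q; all names are illustrative and not taken from the cited work.

```python
import numpy as np

def metropolis(log_post, c0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler (sketch).

    log_post : callable returning the log of the (unnormalised) posterior.
    c0       : starting point of the chain (1-D array).
    step     : standard deviation of the Gaussian proposal (plays the role of Sigma_q).
    """
    rng = np.random.default_rng(seed)
    c_curr = np.asarray(c0, dtype=float)
    lp_curr = log_post(c_curr)
    chain = np.empty((n_samples, c_curr.size))
    for j in range(n_samples):
        c_trial = c_curr + step * rng.standard_normal(c_curr.size)  # draw trial sample
        lp_trial = log_post(c_trial)
        # acceptance probability A = min(1, pi(trial)/pi(current)) for a symmetric proposal
        if np.log(rng.uniform()) < lp_trial - lp_curr:
            c_curr, lp_curr = c_trial, lp_trial
        chain[j] = c_curr
    return chain

# toy usage: sample a standard bivariate normal target
samples = metropolis(lambda c: -0.5 * np.sum(c**2), np.zeros(2), 5000)
```

Too small a `step` yields highly correlated samples, too large a step yields frequent rejections; this trade-off is the tuning difficulty referred to above.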
Article
This work presents a computationally-efficient inverse approach to probabilistic damage diagnosis. Given strain data at a limited number of measurement locations, Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling are used to estimate probability distributions of the unknown location, size, and orientation of damage. Substantial computational speedup is obtained by replacing a three-dimensional finite element (FE) model with an efficient surrogate model. The approach is experimentally validated on cracked test specimens where full field strains are determined using digital image correlation (DIC). Access to full field DIC data allows for testing of different hypothetical sensor arrangements, facilitating the study of strain-based diagnosis effectiveness as the distance between damage and measurement locations increases. The ability of the framework to effectively perform both probabilistic damage localization and characterization in cracked plates is demonstrated and the impact of measurement location on uncertainty in the predictions is shown. Furthermore, the analysis time to produce these predictions is orders of magnitude less than a baseline Bayesian approach with the FE method by utilizing surrogate modeling and effective numerical sampling approaches.
... The likelihood is combined with prior information to form a posterior distribution of the constitutive model parameters. 5. This posterior is explored through the Metropolis-Hastings (MH) algorithm [19,20], a Markov chain Monte Carlo (MCMC) method [20,37]. ...
... However, the direct computation of these statistics typically requires numerical integration, which is often impractical from a computational effort point of view [19,20]. For such cases, stochastic simulation with Markov chain Monte Carlo (MCMC) methods [19,37] can provide an indirect computational approach. ...
... A Markov chain is a stochastic process in which, given the present state, past and future states are independent [37]. If a Markov chain, independently of its initial distribution, reaches a stage that can be represented by a specific distribution λ, and retains this distribution for all subsequent stages, we say that λ is the limit distribution of the chain [37]. ...
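In symbols, the two properties just described (the Markov property and the limit distribution λ) take the standard form below; the notation is generic rather than that of the cited text.

```latex
P\left(X_{n+1} \in A \mid X_n, X_{n-1}, \dots, X_0\right) = P\left(X_{n+1} \in A \mid X_n\right),
\qquad
\lambda(A) = \int P\left(X_{n+1} \in A \mid X_n = x\right)\, \lambda(\mathrm{d}x),
```

with P(X_n ∈ A) → λ(A) as n → ∞ for any initial distribution.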
Preprint
Full-text available
We consider the problem of estimating a temperature-dependent thermal conductivity model (curve) from temperature measurements. We apply a Bayesian estimation approach that takes into account measurement errors and limited prior information of system properties. The approach intertwines system simulation and Markov chain Monte Carlo (MCMC) sampling. We investigate the impact of assuming different model classes (cubic polynomials and piecewise linear functions), their parametrization, and different types of prior information, ranging from uninformative to informative. Piecewise linear functions require more parameters (conductivity values) to be estimated than the four parameters (coefficients or conductivity values) needed for cubic polynomials. The former model class is more flexible, but the latter requires fewer MCMC samples. While parametrizing polynomials with coefficients may feel more natural, it turns out that parametrizing them using conductivity values is far more natural for the specification of prior information. Robust estimation is possible for all model classes and parametrizations, as long as the prior information is accurate or not too informative. Gaussian Markov random field priors are especially well-suited for piecewise linear functions.
... A discrepancy function ρ(S(y); S(ŷ)) is used to measure the similarity between the simulated and observed datasets [50], and if the simulated data closely matches the observed data, the parameter values are accepted as plausible. The target (posterior) distribution, which is a distribution of the parameters conditional on the available data π(θ|y), can then be approximately sampled using ABC accept-reject [52], or more efficient methods [48] such as Markov chain Monte Carlo ABC (MCMC-ABC) [53,54] or sequential Monte Carlo ABC (SMC-ABC) [55,56]. For the interested reader, helpful reviews on approximate Bayesian methods can be found in Beaumont et al. [51], Drovandi [57], or Sisson et al. [49]. ...
... 3. Moving: MCMC-ABC [54] is used to move the particles according to the current distribution in the sequence p_t(θ | ρ(θ) ≤ ε_t). This diversifies the ensemble (avoiding duplicates) by jittering each parameter set relative to its current values. ...
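A minimal Python sketch of the basic ABC accept-reject scheme mentioned above; the simulator, prior sampler, summary statistic, discrepancy ρ, and tolerance eps are hypothetical placeholders supplied by the user, not objects from the cited works.

```python
import numpy as np

def abc_rejection(sample_prior, simulate, summary, rho, y_obs, eps, n_accept, seed=0):
    """Basic ABC accept-reject (sketch): keep parameter draws whose simulated
    summaries fall within tolerance eps of the observed summaries."""
    rng = np.random.default_rng(seed)
    s_obs = summary(y_obs)
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior(rng)          # draw a candidate parameter from the prior
        y_sim = simulate(theta, rng)       # simulate a dataset under that parameter
        if rho(summary(y_sim), s_obs) <= eps:  # discrepancy small enough -> plausible
            accepted.append(theta)
    return np.array(accepted)
```

MCMC-ABC and SMC-ABC improve on this by replacing the independent prior draws with correlated proposals or with a sequence of decreasing tolerances, respectively.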
Article
Full-text available
The potential effects of conservation actions on threatened species can be predicted using ensemble ecosystem models by forecasting populations with and without intervention. These model ensembles commonly assume stable coexistence of species in the absence of available data. However, existing ensemble-generation methods become computationally inefficient as the size of the ecosystem network increases, preventing larger networks from being studied. We present a novel sequential Monte Carlo sampling approach for ensemble generation that is orders of magnitude faster than existing approaches. We demonstrate that the methods produce equivalent parameter inferences, model predictions, and tightly constrained parameter combinations using a novel sensitivity analysis method. For one case study, we demonstrate a speed-up from 108 days to 6 hours, while maintaining equivalent ensembles. Additionally, we demonstrate how to identify the parameter combinations that strongly drive feasibility and stability, drawing ecological insight from the ensembles. Now, for the first time, larger and more realistic networks can be practically simulated and analysed.
... In a different publication, a parameterized model used the age profiles of fertility to quickly and simply describe age-specific rates. Future vital rate profiles are projected using time series models of the parameters, which reflect the temporal patterns of the age profiles [6]. ...
... Equation 6 is the backshift-operator form of the ARIMA(p, d, q) model, with d differences. ...
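For reference, the general ARIMA(p, d, q) model written with the backshift operator B takes the standard form below (the cited Equation 6 itself is not reproduced here):

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)\, (1 - B)^{d}\, y_t
  = \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t ,
  \qquad B^{k} y_t = y_{t-k}.
```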
Article
Full-text available
A Bayesian statistical analysis method has been applied, with the R statistical analysis tool used for the data analysis. Forecasting requires the "bayesforecast" package, a substitute in R for the "forecast" package used in the traditional (frequentist) statistical method. The Bayesian data analysis, using the specific case of the general autoregressive integrated moving average (ARIMA) model, proceeds as follows: as the first step, the stationarity of the given dataset is assessed, and the time series is made stationary by taking differences. After fitting several models, the ARIMA(1, 2, 1) model has been selected as the most appropriate fit to the data. The accuracy of the fitted model is examined, and thereafter the developed model is analyzed. The posterior computation is done using the Markov Chain Monte Carlo (MCMC) simulation method. The method ultimately focuses on drawing relevant inferences, including a 16-year prediction, and the results are, in general, found to be satisfactory.
... These tables focus on mortality rates of insured sub-national groups in Brazil (male, female, death, and survivorship coverages). From a modeling point of view, the BR-EMS 2021 tables were estimated assuming a Binomial sampling model and the Heligman-Pollard law, with parameters estimated under the Bayesian paradigm (GAMERMAN; LOPES, 2006). Other studies in Brazil have considered this mortality law for table estimation, such as Beltrão and Sugahara (2017), which investigated mortality among federal civil servants from 1993 to 2014, highlighting differences by educational level, assuming a Binomial distribution of deaths and fitting a modified Heligman-Pollard model that excludes the first term, related to infant mortality. ...
... is the prior predictive distribution for the observables, for which there is no analytical solution, which makes us resort to computational methods for approximation of the posterior distribution via Markov Chain Monte Carlo (MCMC) methods (GAMERMAN; LOPES, 2006). Model specification is completed after assigning independent prior distributions for the vector θ. ...
Article
Full-text available
This article presents the Brazilian private insurance market's actuarial life tables, BR-EMS 2021. Using Bayesian inference on the parameters of the Heligman-Pollard law of mortality and data from 23 insurance groups over 15 years, totaling 3.5 billion records, the data were corrected through a neural network with two hidden layers. The resulting tables show that the insured population exhibits lower mortality rates than the general Brazilian population, even lower than the national populations of well-developed countries such as the USA. Moreover, besides the expected gender gap in mortality rates, there is a clear distance between the death and survivorship insurance coverage groups. Likewise, the insured population characteristics mitigate well-known regional structural discrepancies in the Brazilian population, indicating that being part of the selected population of insured individuals is thus associated with a more effective protection against death than other outstanding factors such as geographic region of residence.
... From a full Bayesian view, the posterior distribution is commonly not available from their analytical form, and numerical integration and simulation are considered, in particular, Markov chain Monte Carlo (MCMC) methods (Gamerman and Lopes, 2006) are used in this work. Several actuarial papers for loss reserving such as, Choy et al. (2016), Goudarzi and Zokaei (2021), and Usman and Chan (2022) obtain the posterior distribution via MCMC through open sources software (OpenBUGS, WinBUGS and RStan Lunn et al. (2000), Lunn et al. (2009), Stan Development Team (2023). ...
... where f_N(·) denotes the density of the Gaussian distribution, f_NT(·) the density of the Truncated Gaussian distribution, π(·) stands for the prior distribution of the static parameters in θ, and I_A(·) is a characteristic function of the subset A, with l ∈ {t, s, vg}. Note that the resulting posterior distribution does not have an analytical solution, and we appeal to Markov chain Monte Carlo techniques (MCMC) (Gamerman and Lopes, 2006) to obtain samples from this distribution. The inference procedure is detailed in Appendix B. ...
Preprint
Full-text available
This paper focuses on modelling loss reserving to pay outstanding claims. As the amount liable on any given claim is not known until settlement, we propose a flexible model via heavy-tailed and skewed distributions to deal with outstanding liabilities. The inference relies on Markov chain Monte Carlo via Gibbs sampler with adaptive Metropolis algorithm steps allowing for fast computations and providing efficient algorithms. An illustrative example emulates a typical dataset based on a runoff triangle and investigates the properties of the proposed models. Also, a case study is considered and shows that the proposed model outperforms the usual loss reserving models well established in the literature in the presence of skewness and heavy tails.
... is the Jacobian transformation for the prior on AA T in terms of the Cholesky factor A (Olkin and Sampson, 1972), and R d is the dimension of η d . We employ Markov Chain Monte Carlo (MCMC) with Gibbs sampling and random walk Metropolis (Gamerman and Lopes, 2006), implemented in R statistical computing environment, to draw samples from the posterior distribution, as defined in (11). ...
... GAO, L., DATTA, A. and BANERJEE, S. (2022). Our models rely on posterior inference, which is achieved through MCMC posterior simulations. To accomplish this, we employ the blocked Gibbs sampler with some adaptive and some classic Metropolis-Hastings steps, based on the work by Gamerman and Lopes (2006) and Miasojedow, Moulines and Vihola (2013). We decided not to employ an adaptive strategy for γ due to its high dimensionality and for A due to the particular sampling step involving it. ...
Preprint
Full-text available
Epidemiologists commonly use regional aggregates of health outcomes to map mortality or incidence rates and identify geographic disparities. However, to detect health disparities across regions, it is necessary to identify "difference boundaries" that separate neighboring regions with significantly different spatial effects. This can be particularly challenging when dealing with multiple outcomes for each unit and accounting for dependence among diseases and across areal units. In this study, we address the issue of multivariate difference boundary detection for correlated diseases by formulating the problem in terms of Bayesian pairwise multiple comparisons and extending it through the introduction of adjacency modeling and disease graph dependencies. Specifically, we seek the posterior probabilities of neighboring spatial effects being different. To accomplish this, we adopt a class of multivariate areally referenced Dirichlet process models that accommodate spatial and interdisease dependence by endowing the spatial random effects with a discrete probability law. Our method is evaluated through simulation studies and applied to detect difference boundaries for multiple cancers using data from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute.
... The MCMC method constructs a Markov chain via a sequence of samples generated from the desired pdfs, where each sample is probabilistically dependent on the previous sample, enabling the chain to evolve and replicate the desired pdfs closely. [35][36][37] In addition, the Metropolis-Hastings (MH) algorithm is employed to introduce an acceptance/rejection criterion to decide whether the new sample is accepted into the chain or discarded, hence eliminating arbitrary limits on the sources of uncertainty. 38 In this study, MCMC with the MH algorithm is applied to generate a distribution of samples to obtain a sound approximation of the predefined pdfs for each source of uncertainty. ...
... 58 Sampling was performed using the MCMC method with the MH algorithm. [35][36][37] In MCMC-MH, a sequence of random samples is drawn from the targeted pdf. The samples are dependent on the starting sample but become increasingly independent of it as the number of iterations grows. ...
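The MH acceptance/rejection criterion referred to in these passages takes the standard form below, where π is the target pdf and q the proposal; a candidate x′ drawn from q(· | x) is accepted with probability α, otherwise the chain remains at x (notation is generic, not that of the cited papers).

```latex
\alpha(x, x') = \min\left\{ 1,\;
  \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)} \right\}.
```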
Article
Creep behavior is susceptible to uncertainty. Replicated creep data for steel can exhibit logarithmic decades of uncertainty, which necessitates the development of probabilistic creep models for reliability-based design. The deterministic Wilshire–Cano–Stewart (WCS) model is transformed into a probabilistic creep deformation, damage, and rupture prediction model by injecting the uncertainty of test conditions, pre-existing damage, and material constants. The WCS model is calibrated to replicated quintuplicate creep data from a "single heat" of 304 stainless steel. The Markov chain Monte Carlo (MCMC) method with the Metropolis–Hastings (MH) algorithm is employed to introduce the sources of uncertainty into the model via calibrated probability distribution functions (pdfs). The probabilistic predictions encapsulate most of the experimental outliers across isostresses and isotherms. Interpolative and extrapolative assessments reveal that minimum-creep-strain-rate (MCSR) and stress rupture (SR) predictions are consistent across isotherms. The WCS probabilistic model captures the range of creep behavior of a single heat of material using short-term data.
... A Bayesian model is a statistical model that implements Bayes' rule to infer all uncertainty within the model (Jaynes, 1986). The most representative method is the Bayesian neural network (Neal, 2012), which draws samples from the posterior distribution of the model via MCMC (Gamerman & Lopes, 2006), Laplace methods (Mackay, 1992; Foong et al., 2020) and variational inference (Peterson & Hartman, 1989), forming the epistemic uncertainty of the model prediction. However, their obvious shortcomings of inaccurate predictions (Wenzel et al., 2020) and high computational costs (Gelman, 2008) prevent them from wide adoption in practice. ...
Article
Full-text available
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Despite comprehensive surveys of related fields, the summarization of OOD detection methods remains incomplete and requires further advancement. This paper specifically addresses the gap in recent technical developments in the field of OOD detection. It also provides a comprehensive discussion of representative methods from other sub-tasks and how they relate to and inspire the development of OOD detection methods. The survey concludes by identifying open challenges and potential research directions.
... Narrower confidence intervals indicate more precise estimates, while wider intervals suggest greater uncertainty. By interpreting confidence intervals alongside the estimated coefficients, we can gain valuable insights into the statistical significance of the relationships between variables (Gamerman and Lopes, 2006). ...
Article
Full-text available
Linear models have been a powerful econometric tool used to show the relationship between two or more variables. Many studies also use a linear approximation for nonlinear cases, as it may still yield valid results. The OLS method requires the relationship between dependent and independent variables to be linear, although many studies employ OLS approximation even for nonlinear cases. In this study, we introduce an alternative method of interval estimation, the bootstrap, for linear regressions when the relationship is nonlinear. We compare the traditional and bootstrap confidence intervals when the data have a nonlinear relationship. As we need to know the true parameters, we carry out a simulation study. Our research findings indicate that when the error term has a non-normal shape, the bootstrap interval will outperform the traditional method, owing to its lack of distributional assumptions and wider interval width.
... Therefore, it is necessary to apply sampling statistical approaches to sample from the posterior probability distribution to obtain its statistical characteristics. Markov chain Monte Carlo (MCMC) [11,12] can generate samples from the posterior probability distribution of parameters and conduct statistical analysis efficiently [13,14]. MCMC has been successfully used for GCS parameter identification many times [4,[15][16][17][18]. ...
Article
Full-text available
Groundwater contamination source (GCS) parameter identification can help with controlling groundwater contamination. It is well known that groundwater contamination concentration observation errors have a significant impact on identification results, but few studies have adequately quantified the specific impact of the errors in contamination concentration observations on identification results. For this reason, this study developed a Bayesian-based integrated approach, which integrated Markov chain Monte Carlo (MCMC), relative entropy (RE), Multi-Layer Perceptron (MLP), and the surrogate model, to identify the unknown GCS parameters while quantifying the specific impact of the observation errors on identification results. Firstly, different contamination concentration observation error situations were set for subsequent research. Then, the Bayesian inversion approach based on MCMC was used for GCS parameter identification for different error situations. Finally, RE was applied to quantify the differences in the identification results of each GCS parameter under different error situations. Meanwhile, MLP was utilized to build a surrogate model to replace the original groundwater numerical simulation model in the GCS parameter identification processes of these error situations, which was to reduce the computational time and load. The developed approach was applied to two hypothetical numerical case studies involving homogeneous and heterogeneous cases. The results showed that RE could effectively quantify the differences caused by contamination concentration observation errors, and the changing trends of the RE values for GCS parameters were directly related to their sensitivity. The established MLP surrogate model could significantly reduce the computational load and time for GCS parameter identification. Overall, this study highlights that the developed approach represents a promising solution for GCS parameter identification considering observation errors.
... According to [19], the rejection sampling technique is versatile enough to derive values from g(x) even without complete information about the specification of f(x). ...
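A minimal Python sketch of the acceptance-rejection idea referred to above, under the standard assumption that the target density f satisfies f(x) ≤ M g(x) for a proposal density g and a constant M; the function names and the Beta/Uniform toy example are illustrative only, not taken from the cited work.

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, M, n, seed=0):
    """Acceptance-rejection sampling (sketch): draw x ~ g, accept with
    probability f(x) / (M * g(x)), where f(x) <= M * g(x) for all x."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = g_sample(rng)
        if rng.uniform() <= f(x) / (M * g_pdf(x)):
            out.append(x)
    return np.array(out)

# toy usage: sample a Beta(2, 2) target using a Uniform(0, 1) proposal with M = 1.5
beta_pdf = lambda x: 6.0 * x * (1.0 - x)
draws = rejection_sample(beta_pdf, lambda r: r.uniform(), lambda x: 1.0, 1.5, 1000)
```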
Article
Full-text available
Our research paper introduces a newly developed probability distribution called the transformed MG-extended exponential (TMGEE) distribution. This distribution is derived from the exponential distribution using the modified Frechet approach, but it has a more adaptable hazard function and unique features that we have explained in detail. We conducted simulation studies using two methods: rejection sampling and inverse transform sampling, to produce summaries and show distributional properties. Moreover, we applied the TMGEE distribution to three real datasets from the health area to demonstrate its applicability. We used the maximum likelihood estimation technique to estimate the distribution’s parameters. Our results indicate that the TMGEE distribution provides a better fit for the three sets of data as compared to nine other commonly used probability distributions, including Weibull, exponential, and lognormal distributions.
... Data were accuracy-coded, such that the upper threshold (a) of the model corresponded to a correct choice, whereas the lower bound (0) corresponded to errors. We generated 10,000 samples from the joint posterior distribution of all model parameters by using Markov chain Monte Carlo methods 66 . The initial 1000 samples were discarded as burn-in to minimize the effect of initial values on the posterior inference (see Wiecki et al. 28 for more details of the procedure). ...
Article
Full-text available
Prior evidence suggests that increasingly efficient task performance in human learning is associated with large scale brain network dynamics. However, the specific nature of this general relationship has remained unclear. Here, we characterize performance improvement during feedback-driven stimulus-response (S-R) learning by learning rate as well as S-R habit strength and test whether and how these two behavioral measures are associated with a functional brain state transition from a more integrated to a more segregated brain state across learning. Capitalizing on two separate fMRI studies using similar but not identical experimental designs, we demonstrate for both studies that a higher learning rate is associated with a more rapid brain network segregation. By contrast, S-R habit strength is not reliably related to changes in brain network segregation. Overall, our current study results highlight the utility of dynamic functional brain state analysis. From a broader perspective taking into account previous study results, our findings align with a framework that conceptualizes brain network segregation as a general feature of processing efficiency not only in feedback-driven learning as in the present study but also in other types of learning and in other task domains.
... Model estimation was carried out using the Integrated Nested Laplace Approximation (INLA) method, implemented in the R-INLA package (Rue et al., 2009), available in the R programming language (R Core Team, 2016), which is presented as a computationally efficient alternative to traditional methods such as MCMC (Gamerman & Lopes, 2006). ...
Article
Full-text available
This study aims to analyze the impact of COVID-19 on indigenous populations in the municipalities of Mexico. To analyze this relationship, Bayesian spatio-temporal models are used, which capture the complex dynamics of epidemiological transmission in terms of spatial, temporal, and joint spatio-temporal dependence. These models can include covariates, such as the percentage of indigenous population, making it possible to quantify the effect the covariate exerts on the evolution of the epidemic. The models also allow spatio-temporal clusters with high and low incidence rates to be identified, revealing health inequalities based on the proportion of indigenous population residing in specific municipalities. Contrary to expectations, the results showed a protective effect on the COVID-19 incidence rate for the indigenous population. In addition, wide heterogeneity was observed in the distribution of COVID-19 incidence rates across municipalities, with substantial fluctuations over time. COVID-19 incidence rates in indigenous populations were low, which may be because the indigenous population predominates in municipalities with low population density, poorer access to health services, and greater social marginalization. However, these results should be interpreted with caution because of the high level of under-reporting of COVID-19 cases observed in indigenous populations.
... As the initial start value for the Markov chain, chosen from the iterative method, serves as a good approximation of the actual parameter set for the considered case, the problem of local convergence has not been encountered here. To avoid convergence into local minima, it is advisable to run multiple Markov chains simultaneously to prevent the chains from getting trapped within the vicinity of local minima [20]. In such cases, the number of chains should typically be kept to a single-digit value to avoid computational waste. ...
Conference Paper
Full-text available
Estimating the parameters of advanced soil models is a highly challenging task that requires an in-depth understanding of the soil model along with significant human intervention due to the tedious, iterative nature of the calibration process. As a result, the parameter calibration becomes time consuming and its outcome varies depending on user discretion. Among the various available inverse analysis methods, the Bayesian approach utilizing Markov chain Monte Carlo (MCMC) simulations has exhibited notable efficacy in determining constitutive model parameters. In the present study, the potential application of the MCMC method for the parametric identification of a hardening-type elastoplastic soil constitutive model has been explored. In this regard, two algorithms have been employed. The first one is a stress-based single-element MATLAB code that predicts the stress-strain and volumetric response of the soil. The second code employs the Bayesian method, facilitating the parameter identification process such that the error in both the stress-strain and volumetric soil response is minimized. In this context, a sensitivity analysis of the model parameters has been conducted to identify the most sensitive parameters and, subsequently, the MCMC algorithm has been applied to estimate these parameters. Finally, a comparison has been presented between the soil response predicted using the parameters estimated from the manual iteration and from the MCMC method, with the help of the root mean squared error. The results show that the MCMC method is robust and computationally efficient in estimating the constitutive parameters with minimal error.
... It effectively captures spatial and bonding relationships between atoms within the binding pockets. The conditional molecular sampling algorithm employed by Pocket2Mol demonstrates efficiency in characterizing novel position generation strategies and accurately predicting element types without relying on MCMC (Markov chain Monte Carlo) [19]. Importantly, molecules sampled from Pocket2Mol exhibit significantly improved binding affinity as validated through experimental evaluations [18]. ...
Article
Full-text available
We present a user-friendly molecular generative pipeline called Pocket Crafter, specifically designed to facilitate hit finding activity in the drug discovery process. This workflow utilized a three-dimensional (3D) generative modeling method Pocket2Mol, for the de novo design of molecules in spatial perspective for the targeted protein structures, followed by filters for chemical-physical properties and drug-likeness, structure–activity relationship analysis, and clustering to generate top virtual hit scaffolds. In our WDR5 case study, we acquired a focused set of 2029 compounds after a targeted searching within Novartis archived library based on the virtual scaffolds. Subsequently, we experimentally profiled these compounds, resulting in a novel chemical scaffold series that demonstrated activity in biochemical and biophysical assays. Pocket Crafter successfully prototyped an effective end-to-end 3D generative chemistry-based workflow for the exploration of new chemical scaffolds, which represents a promising approach in early drug discovery for hit identification.
... We also assessed the public health and health economics impacts of two additional scenarios: increased vaccination coverage, and temporary partial cross-protection from 2vHPV against HPV31, 33, and 45 (Supplementary Table S8, Supplementary Table S9) [38]. For the cross-protection scenario, time-dependent clinical trial data were used to fit a simple vaccine trial model with two parameters (efficacy against persistent infection and duration of protection), using Markov chain Monte Carlo methods [73,74]. The level of protection against transient infections with these strains was conservatively assumed to be the same as that provided by 9vHPV. ...
Article
Full-text available
Background A bivalent human papillomavirus vaccine (2vHPV) is currently used in the Netherlands; a nonavalent vaccine (9vHPV) is also licensed. Research design and methods We compared the public health and economic benefits of 2vHPV- and 9vHPV-based vaccination strategies in the Netherlands over 100 years using a validated deterministic dynamic transmission metapopulation model. Results Compared to 2vHPV, the 9vHPV strategy averted an additional 3,245 cases of and 825 deaths from 9vHPV-strain-attributable cancers, 4,247 cases of and 190 deaths from recurrent respiratory papillomatosis (RRP), and 1,009,637 cases of anogenital warts (AGWs), with an incremental cost-effectiveness ratio (ICER) of €4,975 per quality-adjusted life year (QALY) gained. The ICER increased in a scenario with increased HPV vaccination coverage rates and was relatively robust to one-way deterministic sensitivity analyses, with variation in the disease utility parameter having the most impact. When catch-up vaccination for individuals ≤26 years of age was added to the model, vaccinating with 9vHPV averted additional cancers and AGWs compared to 2vHPV vaccination. Conclusion Our analyses predict that transitioning from a 2vHPV- to a 9vHPV-based vaccination strategy would be cost-effective in the Netherlands.
... This yields more stable results in comparison to other traditional algorithms (Lerche et al., 2017). To estimate model parameters, we employed a Markov-chain Monte Carlo sampling procedure (Gamerman & Lopes, 2006). A chain with 5000 samples was used; and the first 500 samples were discarded as burn-in, to allow for the sampling procedure to settle around a value after an initial more exploratory sampling. ...
Preprint
Full-text available
Retrospective attention is involuntarily or voluntarily oriented to working memory (WM) contents. Previous research has not assessed voluntary attention ruling out the effects of involuntary attention. Furthermore, it is unknown whether the voluntariness of attention impacts differently on perceptual and semantic WM contents. To address this, reaction time and accuracy data from two retro-cueing experiments were modeled with a drift-diffusion model. Surprisingly, the voluntariness of attention did not interact with the WM content type. In turn, drift rates indicated that attention exerted greater retro-cueing effects on perceptual vs. semantic WM contents, and non‐decision times revealed effects only for voluntary attention. This evidences: first, that attention has a stronger impact on the quality of perceptual over semantic contents when they compete for WM storage; second, that voluntariness is crucial for retrieving WM contents in advance of decision-making; finally, the effects of voluntary attention can be independent of involuntary attentional orienting.
... Moreover, in this work, the source term exhibits discontinuities concerning the unknown parameters. Therefore, we opt for the gradient-free MH MCMC method [12]. ...
Article
Full-text available
This study focuses on addressing the inverse source problem associated with the parabolic equation. We rely on sparse boundary flux data as our measurements, which are acquired from a restricted section of the boundary. While it has been established that utilizing sparse boundary flux data can enable source recovery, the presence of a limited number of observation sensors poses a challenge for accurately tracing the inverse quantity of interest. To overcome this limitation, we introduce a sampling algorithm grounded in Langevin dynamics that incorporates dynamic sensors to capture the flux information. Furthermore, we propose and discuss two distinct sensor migration strategies. Remarkably, our findings demonstrate that even with only two observation sensors at our disposal, it remains feasible to successfully reconstruct the high-dimensional unknown parameters.
... Consequently, it allows for more stable results, even with fewer data per participant with respect to the traditionally used algorithms (Lerche, Voss, & Nagler, 2017). The estimation of model parameter distributions within the HDDM toolbox relies on a Markov-chain Monte Carlo sampling procedure (Gamerman & Lopes, 2006). We used a chain with 10,000 samples; the first 1000 samples were discarded as burn-in, to allow for the sampling procedure to settle around a value after an initial more exploratory sampling. ...
Article
Full-text available
Information in working memory (WM) is crucial for guiding behavior. However, not all WM representations are equally relevant simultaneously. Current theoretical frameworks propose a functional dissociation between ‘latent’ and ‘active’ states, in which relevant representations are prioritized into an optimal (active) state to face current demands, while relevant information that is not immediately needed is maintained in a dormant (latent) state. In this context, task demands can induce rapid and flexible prioritization of information from latent to active state. Critically, these functional states have been primarily studied using simple visual memories, with attention selecting and prioritizing relevant representations to serve as templates to guide subsequent behavior. It remains unclear whether more complex WM representations, such as novel stimulus-response associations, can also be prioritized into different functional states depending on their task relevance, and if so how these different formats relate to each other. In the present study, we investigated whether novel WM-guided actions can be brought into different functional states depending on current task demands. Our results reveal that planned actions can be flexibly prioritized when needed and show how their functional state modulates their influence on ongoing behavior. Moreover, they suggest the representations of novel actions of different functional states are maintained in WM via a non-orthogonal coding scheme, thus are prone to interference.
... Each of these had 100 path steps and chain lengths of 5 million, logged every 5000 generations. We also ran two full BEAST analyses; the Markov Chain Monte Carlo analyses (MCMC; Gamerman and Lopes, 2006) of each were run for over 1 billion generations (Drummond et al., 2006). Trees were sampled every 200,000 generations. ...
... The posterior full conditional of C has an analytical closed-form. Despite that, to obtain a more efficient sampling scheme, the Metropolis-Hastings step was used to update C's chain instead of using Gibbs Sampling in MCMC [2,7]. Two approaches were considered to generate proposal values for C: Multinomial proposals based on prior probabilities p; and proposals based on mutations of some elements of the configuration C drawn in the previous iteration. ...
Conference Paper
Identifying key nodes, estimating the probability of connection between them, and distinguishing latent groups are some of the main objectives of social network analysis. In this paper, we propose a class of blockmodels to model stochastic equivalence and visualize groups in an unobservable space. In this setting, the proposed method is based on two approaches: latent distances and latent dissimilarities at the group level. The projection proposed in the paper is performed without needing to project individuals, unlike the main approaches in the literature. Our approach can be used in undirected or directed graphs and is flexible enough to cluster and quantify between- and within-group tie probabilities in social networks. The effectiveness of the methodology in representing groups in latent spaces was analyzed on artificial datasets and in a case study.
... To perform Bayesian inference using the stochastic SIR model, (2) and (3), we apply the linear noise approximation (LNA) to obtain a tractable likelihood function, assume a Normal prior, and use a Markov chain Monte Carlo algorithm (see e.g., Gamerman and Lopes (2006)) to sample values from the posterior distribution of the parameter . We fix based on the set removal rate in the IBM, i.e., = 1∕ . ...
... The special case where a single random vector is sampled involves treatment of the elements component-by-component, and this scheme is known as the component-wise Metropolis-Hastings (CWMH) algorithm. There are different strategies to choose the scanning order of a Gibbs sampler; see [14,38]. ...
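A compact sketch of a component-wise Metropolis-Hastings sweep of the kind described above, with one random-walk update per coordinate and a fixed scan order; this is a generic illustration, not code from CUQIpy or the cited references.

```python
import numpy as np

def cwmh(log_post, x0, n_samples, steps, seed=0):
    """Component-wise MH (sketch): sweep over coordinates, proposing and
    accepting/rejecting one component at a time."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    for j in range(n_samples):
        for i in range(x.size):                 # scan components in a fixed order
            x_prop = x.copy()
            x_prop[i] += steps[i] * rng.standard_normal()
            lp_prop = log_post(x_prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = x_prop, lp_prop
        chain[j] = x
    return chain
```

Other scanning strategies (e.g., a random permutation of the components at each sweep) fit the same template by changing the inner loop's ordering.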
Article
Full-text available
This paper introduces CUQIpy, a versatile open-source Python package for computational uncertainty quantification (UQ) in inverse problems, presented as Part I of a two-part series. CUQIpy employs a Bayesian framework, integrating prior knowledge with observed data to produce posterior probability distributions that characterize the uncertainty in computed solutions to inverse problems. The package offers a high-level modeling framework with concise syntax, allowing users to easily specify their inverse problems, prior information, and statistical assumptions. CUQIpy supports a range of efficient sampling strategies and is designed to handle large-scale problems. Notably, the automatic sampler selection feature analyzes the problem structure and chooses a suitable sampler without user intervention, streamlining the process. With a selection of probability distributions, test problems, computational methods, and visualization tools, CUQIpy serves as a powerful, flexible, and adaptable tool for UQ in a wide selection of inverse problems. Part II of the series focuses on the use of CUQIpy for UQ in inverse problems with partial differential equations (PDEs).
... A Metropolis-Hastings step will assess the remaining unknowns, as their full conditionals are nonlinear, making direct sampling difficult. The Gibbs and Metropolis-Hastings steps will be performed alternately in a so-called Metropolis in Gibbs MCMC algorithm (Gamerman and Lopes 2006;Müller 1994;Robert et al. 2010;Albani et al. 2021) as defined by the pseudo-code Algorithm 1 in Section A.1 in the Supplementary Materials. ...
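A schematic sketch of a Metropolis-within-Gibbs iteration of the kind described above: a direct (Gibbs) draw for a block whose full conditional is tractable, alternated with a random-walk MH update for a block whose full conditional is known only up to a constant. The function names are placeholders and the sketch does not reproduce Algorithm 1 of the cited work.

```python
import numpy as np

def metropolis_in_gibbs(sample_block1, log_cond_block2, theta1_init, theta2_init,
                        n_iter, step=0.1, seed=0):
    """Metropolis-within-Gibbs sketch: Gibbs draw for block 1, then a random-walk
    Metropolis-Hastings update for block 2, repeated n_iter times."""
    rng = np.random.default_rng(seed)
    theta1 = np.asarray(theta1_init, dtype=float)
    theta2 = np.asarray(theta2_init, dtype=float)
    trace = []
    for _ in range(n_iter):
        # Gibbs step: block 1 has a closed-form full conditional given block 2
        theta1 = np.asarray(sample_block1(theta2, rng), dtype=float)
        # MH step: block 2's full conditional is known only up to a constant
        prop = theta2 + step * rng.standard_normal(theta2.shape)
        if np.log(rng.uniform()) < log_cond_block2(prop, theta1) - log_cond_block2(theta2, theta1):
            theta2 = prop
        trace.append((theta1.copy(), theta2.copy()))
    return trace
```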
Article
Full-text available
We propose a methodology to estimate unknown atmospheric releases, including the number of emissions, addressing overfitting, and using an economical number of unknowns. It is based on the combination of accurate modeling to solve the dispersion problem with Bayesian inference to identify the parameters from observed concentrations. The estimation tool is tested with the Fusion Field Trial 2007 (FFT-07) data set.
... A standard choice to be tested in the future is letting σ²_k ∼ Half-Cauchy(a_σ, b_σ), for k = 1, 2, and ρ ∼ Unif(a_ρ, b_ρ), for carefully chosen values of the new set of hyperparameters a_σ, b_σ, a_ρ, b_ρ. In this regard, either simulation-based methods (Gamerman & Lopes, 2006) or variational approximations (Blei et al., 2017) can be employed to approximate the posterior distribution. This approach will be pursued elsewhere. ...
Article
Network data arises naturally in a wide variety of applications in different fields. In this article we discuss in detail the statistical modeling of financial networks. The structure of such networks has not been studied thoroughly in the past, mainly due to limited accessible data. We explore the structure of a real trading network corresponding to transactions within the natural gas future market over a four-year period. The detection of meaningful communities of actors within networks is particularly relevant to understanding the topology of a complex system like this. We explore the usage of stochastic block models in conjunction with a nonparametric Bayesian approach in order to identify clusters of traders in a flexible modeling framework. Our findings strongly indicate that the proposed models are highly reliable at detecting community structures.
... The Bayesian paradigm provides a natural mechanism for handling partially observed datasets, incorporating observation errors, and quantifying and propagating the uncertainty in the model parameters and dynamic components. For mechanistic models formed of systems of stochastic differential equations with tractable likelihoods, or suitable tractable approximations to the model likelihood, Markov chain Monte Carlo methods can be used [95,96], common for applications in systems biology [97,98]. Such techniques have been employed to infer model parameters for stem cell methylation patterns in colonic crypts, including cell population numbers, niche succession time, and the rate of the methylation/demethylation process [99]. ...
Article
Full-text available
Purpose of Review To explore the advances and future research directions in image analysis and computational modelling of human stem cells (hSCs) for ophthalmological applications. Recent Findings hSCs hold great potential in ocular regenerative medicine due to their application in cell-based therapies and in disease modelling and drug discovery using state-of-the-art 2D and 3D organoid models. However, a deeper characterisation of their complex, multi-scale properties is required to optimise their translation to clinical practice. Image analysis combined with computational modelling is a powerful tool to explore mechanisms of hSC behaviour and aid clinical diagnosis and therapy. Summary Many computational models draw on a variety of techniques, often blending continuum and discrete approaches, and have been used to describe cell differentiation and self-organisation. Machine learning tools are having a significant impact in model development and improving image classification processes for clinical diagnosis and treatment and will be the focus of much future research.
... Response times are a sum of durations of decision processes, and processes unrelated to decision making (i.e., the non-decision time) such as perceptual encoding and response selection.The Bayesian Inference (i.e., BI) model treats evidence accumulation as iterative sampling of a noisy sensory input at time ( = ( , )), which depends on the presented motion direction ( ) and the sampling noise ( ). Using iterative Bayesian inference35 , sensory samples are evaluated against each of the 16 perceptual channels, represented as a probabilistic distribution centred on 16 possible motion directions ( ), with the spread reflecting internal noise (̂,̂). This process yields a vector of likelihoods (or certainties) of all channels given the sampled sensory inputs (Figure 4A). ...
Preprint
Full-text available
Fast and accurate decisions are fundamental for adaptive behaviour. Theories of decision making posit that evidence in favour of different choices is gradually accumulated until a critical value is reached. It remains unclear, however, which aspects of the neural code get updated during evidence accumulation. Here we investigated whether evidence accumulation relies on a gradual increase in the precision of neural representations of sensory input. Healthy human volunteers discriminated global motion direction over a patch of moving dots, and their brain activity was recorded using electroencephalography. Time-resolved neural uncertainty was estimated using multivariate feature-specific analyses of brain activity. Behavioural measures were modelled using iterative Bayesian inference either on its own (i.e., the full model), or by swapping free model parameters with neural uncertainty estimates derived from brain recordings. The neurally-restricted model was further refitted using randomly shuffled neural uncertainty. The full model and the unshuffled neural model yielded very good and comparable fits to the data, while the shuffled neural model yielded worse fits. Taken together, the findings reveal that the brain relies on reducing neural uncertainty to regulate decision making. They also provide neurobiological support for Bayesian inference as a fundamental computational mechanism in support of decision making.
... Thus, the estimates of the individual-level parameters are constrained by the group-level estimates and gradually update the group-level parameter estimates (Shiffrin et al., 2008). Then the posterior distribution of the group-level parameters is estimated using a Markov chain Monte Carlo method, which is a parameter sampling algorithm commonly used for Bayesian approaches (Gamerman & Lopes, 2014), with prior distributions of the group-level parameters and the likelihood of the observed RT data (positive/negative if choice is high/low value option) as a function of the subject-level decision parameters (Navarro & Fuss, 2009). In addition, due to the common existence of fast RT contaminants in the RT data, HDDM specifies a generative model for RT as a mixture model with a fixed probability (usually set to 0.05) of RTs coming from a uniform distribution of RT contaminants, and the remaining probability of RTs coming from the drift-diffusion process. ...
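Written out, the contaminant mixture just described has the form below, with the contaminant probability fixed at 0.05 as stated; the uniform bounds and the drift-diffusion parameter names are generic notation, not taken from HDDM's documentation.

```latex
p(\mathrm{RT}) \;=\; 0.05\,\mathrm{Uniform}\!\left(\mathrm{RT} \mid \mathrm{RT}_{\min}, \mathrm{RT}_{\max}\right)
\;+\; 0.95\, f_{\mathrm{DDM}}\!\left(\mathrm{RT} \mid v, a, t_0\right).
```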
Article
Full-text available
Behavioral science demands skillful experimentation and high-quality data that are typically gathered in person. However, the COVID-19 pandemic forced many behavioral research laboratories to close. Thankfully, new tools for conducting online experiments allow researchers to elicit psychological responses and gather behavioral data with unprecedented precision. It is now possible to quickly conduct large-scale high-quality behavioral experiments online, even for studies designed to generate data necessary for complex computational models. However, these techniques require new skills that might be unfamiliar to behavioral researchers who are more familiar with laboratory-based experimentation. We present a detailed tutorial introducing an end-to-end build of an online experimental pipeline and corresponding data analysis. We provide an example study investigating people’s media preferences using drift-diffusion modeling (DDM), paying particular attention to potential issues that come with online behavioral experimentation. This tutorial includes sample data and code for conducting and analyzing DDM data gathered in an online experiment, thereby mitigating the extent to which researchers must reinvent the wheel.
... the initial condition (0) of model (2.18), we utilize MCMC simulations based on the Delayed Rejection Adaptive Metropolis Hastings (DRAM) algorithm [45,46]. These six parameters and one initial value are estimated by 10,000 iterations with a burn-in of 2000 iterations, and Fig. ...
... Therefore, we utilize these two data sets for our model validation and parameter estimation. We employ extensive Markov-chain Monte-Carlo (MCMC) simulations based on the adaptive combination Delayed Rejection and Adaptive Metropolis (DRAM) method [58][59][60] for system (1) to estimate the value of parameters r and p_T. Using 1000 sample realizations, we obtain the parameter values for r and p_T for both patients (patient A and B). ...
... Steps of the Gibbs algorithm (e.g., [11], [24], [58]): ...
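For reference, the standard Gibbs sweep has the generic form below (the specific listing from the cited sources is not reproduced): given the current state θ^(j−1) = (θ_1^(j−1), …, θ_p^(j−1)), each component is drawn in turn from its full conditional,

```latex
\theta_i^{(j)} \sim \pi\!\left(\theta_i \,\middle|\, \theta_1^{(j)}, \dots, \theta_{i-1}^{(j)},\;
  \theta_{i+1}^{(j-1)}, \dots, \theta_p^{(j-1)},\; y\right), \qquad i = 1, \dots, p.
```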
Article
The primary objective of this work is to use Bayesian methods to investigate women's fertility in Poland and identify the key factors influencing it. A Bayesian Poisson regression model has been used in the analysis. The model allows determining the factors that have a significant impact on the number of children born. Moreover, the Bayesian approach makes it possible to incorporate a priori knowledge and improve the estimation of model parameters. The model has been estimated using the Markov chain Monte Carlo method with Gibbs sampling. The work has been based on the Polish study "Family Changes and Fertility Patterns in Poland" (1991). The following attributes have been considered in the analysis of women's fertility: place of living, education, marital status, employment and religion. The results have been compared with the results of related research for Poland and other countries.
... Therefore, we utilize these two data sets for our model validation and parameter estimation. We employ extensive Markov-chain Monte-Carlo (MCMC) simulations based on the adaptive combination Delayed Rejection and Adaptive Metropolis (DRAM) method [58][59][60] for system (1) to estimate the value of parameters r and p_T. Using 1000 sample realizations, we obtain the parameter values for r and p_T for both patients (patient A and B). ...
... Based on the waiting time data presented in Table 1, we estimate the parameter values (θ, α) = (0.0077, 0.0422) using the method of moments. To generate samples from the specified distribution with the parameter values (θ, α) = (0.0077, 0.0422), we utilize the acceptance-rejection method [11]. In this process, we set the sample size to 1,000 and perform 1,000,000 iterations for each method to ensure sufficient sample generation and accurate representation of the distribution. ...
Article
Full-text available
We have introduced a novel continuous distribution known as the Klongdee distribution, which is a combination of the exponential distribution with parameter (θ α) and the gamma distribution with parameters (2, θ α). We thoroughly examined various statistical properties that provide insights into probability distributions. These properties encompass measures such as the cumulative distribution function, moments about the origin, and the moment-generating function. Additionally, we explored other important measures including skewness, kurtosis, C.V., and reliability measures. Furthermore, we explore parameter estimation using nonlinear least squares methods. The numerical results presented compare the unweighted and weighted least squares (UWLS and WLS) methods, maximum likelihood estimation (MLE), and method of moments (MOM). Based on our findings, the MLE demonstrates superior performance compared to other parameter estimation methods. Moreover, we demonstrate the application of this distribution within an actuarial context, specifically in the analysis of collective risk models using a mixed Poisson framework. By incorporating the proposed distribution into the mixed Poisson model and analyzing a real-life dataset, it has been determined that the Poisson-Klongdee model outperforms alternative models in terms of performance. Highlighting its capability to mitigate the problem of overcharges, the Poisson-Klongdee model has been proven to be a valuable tool.
... This procedure was also performed by Andrade et al. (2015) and discussed in Gamerman and Lopes (2006); Krüger, Lerch, Thorarinsdottir, and Gneiting (2020). In this way, the expected value of y_{t+h} is: ...
Article
Full-text available
Extensions of the Autoregressive Moving Average, ARMA(p, q), class for modeling non-Gaussian time series have been proposed in the literature in recent years, being applied to phenomena such as counts and rates. One of them is the Generalized Autoregressive Moving Average, GARMA(p, q), which is supported by the Generalized Linear Models theory and has been studied under the Bayesian perspective. This paper aimed to study models for time series of counts using the Poisson, Negative binomial and Poisson inverse Gaussian distributions, and adopting the Bayesian framework. To do so, we carried out a simulation study and, in addition, we showed a practical application and evaluation of these models by using a set of real data, corresponding to the number of vehicle thefts in Brazil.
... Recently, Bayesian methods have become popular with the emergence of new computational algorithms that address this integration directly. The development of methods such as Markov chain Monte Carlo (MCMC) (Chen et al., 2000; Gamerman and Lopes, 2006), the Integrated Nested Laplace Approximation (INLA) algorithm (Rue et al., 2009, 2017), and Hamiltonian Monte Carlo (HMC) methods (Betancourt, 2018) has made it possible to provide numerical solutions to problems based on complex models. ...
... Simulation techniques are therefore used, with the aid of packages available in the R software, such as MCMCpack, which implements some of the models, and coda, for visualizing the chains, thereby applying the "Markov chain Monte Carlo (MCMC) method to obtain the respective posterior distributions" (GAMERMAN and LOPES, 2006). Kinas and Andrade (2010) emphasize that "the rapid growth in the use of Bayesian statistics in applied sciences over the last two decades was facilitated by the emergence of several computational programs to perform the necessary statistical calculations". ...
Chapter
Full-text available
Despite incentives for research applied to the ecophysiology of forest species, studies on the physiological behavior of native species facing adversity caused by water deficit are still scarce. The present study aimed to evaluate the physiological behavior of Bauhinia forficata Link (pata-de-vaca) seedlings subjected to irrigation suppression and subsequent re-irrigation. To this end, an experiment was carried out in a greenhouse of the Plant Physiology Laboratory of the Universidade Federal de Alagoas, adopting a completely randomized experimental design composed of three water treatments (Control – watered daily, Irrigation suppression, and Re-irrigated) and two evaluation periods (at the time the photochemical quantum yield (Fv/Fm) of photosystem II (PSII) was disrupted, and after the maximum PSII Fv/Fm was found to have returned to normal in the re-irrigated treatment), with ten replicates. Plant height, number of leaves, and stem diameter were evaluated weekly; Fv/Fm and the photosynthetic pigments chlorophyll a, chlorophyll b, and carotenoids were also measured. At the end of the experiment, root length and the production and allocation of dry biomass of leaves, stems, roots, and total dry biomass were determined. The data obtained were subjected to Tukey's test at 5% significance. The results indicate that the Fv/Fm index is a good indicator of the physiological behavior of pata-de-vaca seedlings and that they withstand up to eight days of drought without compromising their metabolism; however, they can be severely affected if the drought period persists, information that may be useful for reforestation programs and seedling producers of this species.
... A commonly used method for obtaining posterior summaries is Markov Chain Monte Carlo (MCMC) simulation [42][43][44][45][46]. The high accuracy of an MCMC sampling-based approach is achieved at a high computational cost and is too expensive for the quasi real-time characterization and calibration setting in which the ICC framework will eventually operate (additional comments in Appendix A.1). ...
Preprint
Full-text available
Computational simulation is increasingly relied upon for high-consequence engineering decisions, and a foundational element to solid mechanics simulations, such as finite element analysis (FEA), is a credible constitutive or material model. Calibration of these complex models is an essential step; however, the selection, calibration and validation of material models is often a discrete, multi-stage process that is decoupled from material characterization activities, which means the data collected does not always align with the data that is needed. To address this issue, an integrated workflow for delivering an enhanced characterization and calibration procedure (Interlaced Characterization and Calibration (ICC)) is introduced. This framework leverages Bayesian optimal experimental design (BOED) to select the optimal load path for a cruciform specimen in order to collect the most informative data for model calibration. The critical first piece of algorithm development is to demonstrate the active experimental design for a fast model with simulated data. For this demonstration, a material point simulator that models a plane stress elastoplastic material subject to bi-axial loading was chosen. The ICC framework is demonstrated on two exemplar problems in which BOED is used to determine which load step to take, e.g., in which direction to increment the strain, at each iteration of the characterization and calibration cycle. Calibration results from data obtained by adaptively selecting the load path within the ICC algorithm are compared to results from data generated under two naive static load paths that were chosen a priori based on human intuition. In these exemplar problems, data generated in an adaptive setting resulted in calibrated model parameters with reduced measures of uncertainty compared to the static settings.
... In the inverse UQ context, MCMC methods are generally used to solve the Bayesian inverse problem by providing samples from the posterior density on model parameters, thereby enabling estimation of the posterior density or moments thereof [75], [106], [107], [108], [109]. Advanced MCMC methods have been developed to deal with the complexity and high dimensionality of posterior distributions [110], [111], [112], [113]. ...
Article
Full-text available
Data-driven science and technology offer transformative tools and methods to science. This review article highlights the latest development and progress in the interdisciplinary field of data-driven plasma science (DDPS), i.e., plasma science whose progress is driven strongly by data and data analyses. Plasma is considered to be the most ubiquitous form of observable matter in the universe. Data associated with plasmas can, therefore, cover extremely large spatial and temporal scales, and often provide essential information for other scientific disciplines. Thanks to the latest technological developments, plasma experiments, observations, and computation now produce a large amount of data that can no longer be analyzed or interpreted manually. This trend now necessitates a highly sophisticated use of high-performance computers for data analyses, making artificial intelligence and machine learning vital components of DDPS. This article contains seven primary sections, in addition to the introduction and summary. Following an overview of fundamental data-driven science, five other sections cover widely studied topics of plasma science and technologies, i.e., basic plasma physics and laboratory experiments, magnetic confinement fusion, inertial confinement fusion and high-energy-density physics, space and astronomical plasmas, and plasma technologies for industrial and other applications. The final section before the summary discusses plasma-related databases that could significantly contribute to DDPS. Each primary section starts with a brief introduction to the topic, discusses the state-of-the-art developments in the use of data and/or data-scientific approaches, and presents the summary and outlook. Despite the recent impressive signs of progress, the DDPS is still in its infancy. This article attempts to offer a broad perspective on the development of this field and identify where further innovations are required.
... To perform Bayesian inference using the stochastic SIR model, (2) and (3), we apply the linear noise approximation (LNA) to obtain a tractable likelihood function, assume a Normal prior, and use a Markov chain Monte Carlo algorithm (see e.g., [24]) to sample values from the posterior distribution of the parameter β. We fix γ based on the set removal rate in the ABM, i.e., γ = 1/T I . ...
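A minimal sketch of the sampling step described in this excerpt might look like the following random-walk Metropolis routine in Python. The Gaussian log-likelihood standing in for the linear noise approximation, the prior settings, and the synthetic data are all illustrative placeholders, not the cited authors' implementation.

import numpy as np

def log_prior(beta, mean=0.5, sd=0.5):
    # Normal prior on beta, as assumed in the text (hyperparameters illustrative).
    return -0.5 * ((beta - mean) / sd) ** 2

def log_likelihood(beta, data):
    # Placeholder Gaussian log-likelihood standing in for the LNA-based likelihood.
    mu = beta * data["exposure"]
    return -0.5 * np.sum((data["obs"] - mu) ** 2 / data["var"])

def metropolis_beta(data, n_iter=5000, step=0.05, beta0=0.3, seed=0):
    rng = np.random.default_rng(seed)
    beta, chain = beta0, []
    lp = log_prior(beta) + log_likelihood(beta, data)
    for _ in range(n_iter):
        prop = beta + step * rng.normal()          # symmetric random-walk proposal
        lp_prop = log_prior(prop) + log_likelihood(prop, data)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject step
            beta, lp = prop, lp_prop
        chain.append(beta)
    return np.array(chain)

# Illustrative synthetic data; gamma is fixed at 1/T_I outside the sampler, as in the text.
data = {"obs": np.array([3.0, 5.0, 8.0]),
        "exposure": np.array([10.0, 15.0, 20.0]),
        "var": np.array([1.0, 1.0, 1.0])}
posterior_draws = metropolis_beta(data)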
Preprint
Full-text available
Tree populations worldwide are facing an unprecedented threat from a variety of tree diseases and invasive pests. Their spread, exacerbated by increasing globalisation and climate change, has an enormous environmental, economic and social impact. Computational agent-based models are a popular tool for describing and forecasting the spread of tree diseases due to their flexibility and ability to reveal collective behaviours. In this paper we present a versatile agent-based model with a Gaussian infectivity kernel to describe the spread of a generic tree disease through a synthetic treescape. We then explore several methods of calculating the basic reproduction number R0, a characteristic measurement of disease infectivity, defining the expected number of new infections resulting from one newly infected individual throughout their infectious period. It is a useful comparative summary parameter of a disease and can be used to explore the threshold dynamics of epidemics through mathematical models. We demonstrate several methods of estimating R0 through the agent-based model, including contact tracing, inferring the Kermack-McKendrick SIR model parameters using the linear noise approximation, and an analytical approximation. As an illustrative example, we then use the model and each of the methods to calculate estimates of R0 for the ash dieback epidemic in the UK.
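For the Kermack-McKendrick SIR model mentioned above, the textbook relation between the transmission and removal rates and the basic reproduction number is

R_0 = \frac{\beta}{\gamma}, \qquad \gamma = \frac{1}{T_I},

so posterior draws of \beta obtained by MCMC (with \gamma fixed) translate directly into estimates of R_0; this is the standard result, stated here independently of the cited model's specific implementation.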
Article
In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning via score (a.k.a. observe) statements. Most previous approaches address this problem by Markov Chain Monte Carlo and variational inference, and therefore could not generate guaranteed outcomes within a finite time limit. Moreover, existing methods for exact inference either impose syntactic restrictions or cannot guarantee successful inference in general. In this work, we propose a novel automated approach to derive guaranteed bounds for NPD via polynomial solving. We first establish a fixed-point theorem for the wide class of score-at-end Bayesian probabilistic programs that terminate almost-surely and have a single bounded score statement at program termination. Then, we propose a multiplicative variant of Optional Stopping Theorem (OST) to address score-recursive Bayesian programs where score statements with weights greater than one could appear inside a loop. Bayesian nonparametric models, enjoying a renaissance in statistics and machine learning, can be represented by score-recursive Bayesian programs and are difficult to handle due to an integrability issue. Finally, we use polynomial solving to implement our fixed-point theorem and OST variant. To improve the accuracy of the polynomial solving, we further propose a truncation operation and the synthesis of multiple bounds over various program inputs. Our approach can handle Bayesian probabilistic programs with unbounded while loops and continuous distributions with infinite supports. Experiments over a wide range of benchmarks show that compared with the most relevant approach (Beutner et al., PLDI 2022) for guaranteed NPD analysis via recursion unrolling, our approach is more time efficient and derives comparable or even tighter NPD bounds. Furthermore, our approach can handle score-recursive programs which previous approaches could not.
Article
Full-text available
This work addresses the solution of an inverse problem in steady-state forced convection heat transfer with laminar flow in a microchannel under the slip-flow regime. The analysis of the inverse problem involves estimating the inlet temperature profile using prior information together with simulated temperature measurements taken downstream of the inlet. For the first time in the literature, a Bayesian technique (the Markov chain Monte Carlo method) is employed for an inlet function estimation problem in a microchannel, through the Metropolis-Hastings algorithm. Two distinct inlet functions are analyzed, a parabola and a step function, and good approximations are obtained for these functions, as well as a probability distribution for the function parameters.
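For reference, the Metropolis-Hastings step referred to above accepts a proposed state θ' drawn from a proposal density q(θ'|θ) with the standard probability

\alpha(\theta, \theta') = \min\left\{ 1, \; \frac{\pi(\theta' \mid y)\, q(\theta \mid \theta')}{\pi(\theta \mid y)\, q(\theta' \mid \theta)} \right\},

which is the general form of the algorithm and not a detail specific to this heat-transfer application.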
Technical Report
Full-text available
Wind generation is fundamental for reducing the use of fossil resources and, consequently, greenhouse gas (GHG) emissions. Currently, the Brazilian National Interconnected System (SIN) has about 9,971 wind turbines in operation, the sixth largest network in the world ranking, with 80% of Brazilian wind farms located in the Northeast. However, due to the intermittency of wind generation, the large-scale entry of wind power into hydrothermal system planning poses challenges for its integration into the system. For this reason, wind speed uncertainty needs to be treated in the chain of computational models that support operational decisions. Thus, aiming to contribute to the modeling of wind uncertainty in medium-term operation planning, this work proposes a methodology consisting of the application of the dynamic factor model (DFM) to generate synthetic series of the monthly mean wind speed at locations with wind power facilities. The proposed methodology was applied to a set of wind speed reanalysis series from MERRA-2. The results obtained for the eight SIN wind farms analyzed show that the methodology is promising, given the good quality of the monthly forecasts of the monthly mean wind speed up to two years ahead and, above all, the good representativeness of the synthetic series generated by the model.
Article
Full-text available
The high prevalence of human papillomavirus (HPV) infection in China suggests there would be a substantial positive health impact of widespread vaccination against HPV. We adapted a previously described dynamic transmission model of the natural history of HPV infection and related diseases to the Chinese setting to estimate the public health impact in China of 2-valent (with and without cross-protection), 4-valent, and 9-valent HPV vaccination strategies. The model predicted the incidence and mortality associated with HPV-related diseases, including cervical and noncervical cancers, genital warts, and recurrent respiratory papillomatosis (RRP), based on the various vaccination coverage rate (VCR) scenarios, over a 100-year time horizon. The public health impact of the 4 vaccination strategies was estimated in terms of cases and deaths averted compared to a scenario with no vaccination. Under the assumption of various primary and catch-up VCR scenarios, all 4 vaccination strategies reduced the incidence of cervical cancer in females and noncervical cancers in both sexes, and the 4-valent and 9-valent vaccines reduced the incidence of genital warts and RRP in both sexes. The 9-valent vaccination strategy was superior on all outcomes. The number of cervical cancer cases averted over 100 years ranged from ~ 1 million to ~ 5 million while the number of cervical cancer deaths averted was ~ 345,000 to ~ 1.9 million cases, depending on the VCR scenario. The VCR for primary vaccination was the major driver of cases averted.
Article
This paper focuses on modeling surrender time for policyholders in the context of life insurance. In this setup, a large lapse rate is often observed in the first months of a contract, with a decrease in this rate after some months. The modeling of the time to cancelation must account for this specific behavior. Another stylized fact is that policies which are not canceled during the study period are considered censored. To account for both censoring and heterogeneous lapse rates, this work assumes a Bayesian survival model with a mixture of regressions. The inference is based on data augmentation, allowing for fast computations even for datasets with millions of clients. Moreover, frequentist point estimation based on the Expectation-Maximization algorithm is also presented. An illustrative example emulates typical behavior for life insurance contracts, and a simulation study investigates the properties of the proposed model. A case study illustrates the flexibility of the proposed model, which allows different specifications of the mixture components. In particular, the observed censoring in the insurance context can reach up to 50% of the data, which is very unusual for survival models in other fields such as epidemiology. This aspect is exploited in our simulation study.
Article
Ignoring the presence of dependent censoring in data analysis can lead to biased estimates; for example, not accounting for abandonment of tuberculosis treatment may distort inferences about the cure probability. In order to assess the relationship between the cure and abandonment outcomes, we propose a Bayesian copula approach. The main objective of this work is therefore to introduce a Bayesian survival regression model capable of taking dependent censoring into account in the fit. The proposed approach is based on Clayton's copula to describe the relation between the survival and dependent censoring times. In addition, the Weibull and piecewise exponential marginal distributions are considered in order to fit the times. A simulation study is carried out to compare different dependence scenarios and different specifications of prior distributions, and to compare the results with maximum likelihood inference. Finally, we apply the proposed approach to a tuberculosis treatment adherence dataset of an HIV cohort from Alvorada-RS, Brazil. Results show that the cure and abandonment outcomes are negatively correlated, that is, as the chance of abandoning treatment increases, the chance of tuberculosis cure decreases.
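For reference, the Clayton copula mentioned above has the standard textbook form

C_\theta(u, v) = \left( \max\{\, u^{-\theta} + v^{-\theta} - 1,\; 0 \,\} \right)^{-1/\theta}, \qquad \theta \in [-1, \infty) \setminus \{0\},

with Kendall's \tau = \theta / (\theta + 2); this is the generic parameterization and is not necessarily the exact variant adopted by the cited authors.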