Article

Mathematical Statistics: A Decision Theoretic Approach.

Authors: Thomas S. Ferguson

... Rigorous notions of point estimation and optimality of an estimator can be achieved only within a decision-theoretic framework (see, e.g., [7]), at least if we admit all estimators into competition and disregard distinguished restrictions such as unbiasedness or equivariance. In turn, decision theory proves to be genuinely Bayesian, thanks to a well-known result by Abraham Wald. ...
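For context, the result by Abraham Wald alluded to in the excerpt above is presumably the complete class theorem; one standard finite-parameter version (stated here only as background, not quoted from the citing paper) reads: if $\Theta$ is finite and the risk set is bounded below and closed from below, then every admissible decision rule $\delta$ is Bayes with respect to some prior $\pi$, i.e.
$$ \int_\Theta R(\theta,\delta)\,\pi(d\theta) \;=\; \inf_{\delta'} \int_\Theta R(\theta,\delta')\,\pi(d\theta). $$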
... solves the minimization problem (7). By the way, notice that a combination of de Finetti's representation theorem with basic properties of conditional distributions entails that ...
... where W_U denotes the 2-Wasserstein distance on P_2(U, d_U). See [27, Chapters 6 and 7] for more information on the Wasserstein distance. Therefore, if π_n(· | x_1, . . . ...
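For reference, the 2-Wasserstein distance named in the excerpt above is standardly defined on $P_2(U, d_U)$, the set of probability measures on $(U, d_U)$ with finite second moment, by
$$ W_U(\mu,\nu) \;=\; \Big( \inf_{\gamma\in\Gamma(\mu,\nu)} \int_{U\times U} d_U(u,v)^2\,\gamma(du,dv) \Big)^{1/2}, $$
where $\Gamma(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$; the notation $W_U$ follows the excerpt.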
Article
Full-text available
The point estimation problems that emerge in Bayesian predictive inference are concerned with random quantities which depend on both observable and non-observable variables. Intuition suggests splitting such problems into two phases, the former relying on estimation of the random parameter of the model, the latter concerning estimation of the original quantity from the distinguished element of the statistical model obtained by plug-in of the estimated parameter in the place of the random parameter. This paper discusses both phases within a decision theoretic framework. As a main result, a non-standard loss function on the space of parameters, given in terms of a Wasserstein distance, is proposed to carry out the first phase. Finally, the asymptotic efficiency of the entire procedure is discussed.
... By a similar reasoning, if an employee is willing to pay $100 in order to have safety equipment installed to reduce the risk of death by one percent, we may say that his value of statistical life is $10,000. The key concept here is the von Neumann-Morgenstern utility function, in the framework of decision theory (von Neumann & Morgenstern, 1947; Ferguson, 1967; Berger, 1985; Campello de Souza, 2007b). ...
... They should include, for instance, the value of recreation and leisure, domestic production, mental health, and potential marital and family conflict. We should be using a von Neumann-Morgenstern utility function, in the framework of decision theory (von Neumann & Morgenstern, 1947;Ferguson, 1967;Berger, 1985;Campello de Souza, 2007b). There are many ways of doing this. ...
... We need conceptual and analytical tools in order to go ahead with this endeavor. Decision Theory (von Neumann & Morgenstern, 1947;Ferguson, 1967;Berger, 1985;Campello de Souza, 2007b) offers an adequate tool box. ...
... We use the monotone likelihood ratio property (Ferguson, 1967) to analyze the results. Let Z_+ be the set of non-negative integers and X be a discrete random variable with probability ...
... Proof This result is essentially known (Ferguson, 1967). ...
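As background for these excerpts, the monotone likelihood ratio (MLR) property they invoke is standardly defined as follows: a family of densities or probability mass functions $\{f_\theta\}$ has MLR in a statistic $T(x)$ if, for all $\theta_1 < \theta_2$, the ratio
$$ \frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} $$
is a nondecreasing function of $T(x)$ wherever it is defined; for a discrete variable on $\mathbb{Z}_+$, as above, the same condition is imposed on the mass functions, typically with $T(x)=x$.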
Article
Full-text available
Sponsored search advertising has steadily emerged as one of the most popular advertising tools in online retail. Customers prefer search results that appear on the top to those that appear lower and are willing to pay more for products/brands that appear higher on the search. Sponsored search has a higher conversion efficiency and impacts demand more endogenously through the ranking on the search page than traditional advertising. Online retailers (e-tailers) invest aggressively in bidding to ensure they are ranked high on the search pages. The dynamic nature of sponsored search entails a higher degree of inventory readiness, and e-tailers must dovetail their sponsored search advertising strategy to drive traffic with the level of inventory to avoid consumer disappointments due to stockouts. Extant research has not delved into this critical aspect of sponsored search advertising. We endeavor to solve this business problem for an e-tailer in a dynamic stochastic setting and provide a multi-threshold decision support framework based on different inventory levels. The policy identifies inventory levels: (i) at which a retailer should not place an order, (ii) her desired level of inventory, and (iii) a ceiling up to which no bids are placed. The e-tailer can use our proposed framework to derive an inventory based sponsored search advertising campaign that ensures synchronization between bids and inventory and increases profits. Our results show that customers’ sensitivity to the website’s search rank and variation in reservation price impact the e-tailer's inventory and sponsored search bidding decisions.
... A rigorous proof of this fundamental result is tedious and involves several delicate technical details. Alternative proofs can be found in [14][15][16][17][18]. ...
... Note that several interesting statistical and practical applications of these results to invariant sequential testing and multisample slippage scenarios are discussed in Sections 4.5 and 4.6 of Tartakovsky et al. [6] (see Mosteller [30] and Ferguson [16] for terminology regarding multisample slippage problems). ...
Article
Full-text available
In the first part of this article, we discuss and generalize the complete convergence introduced by Hsu and Robbins in 1947 to the r-complete convergence introduced by Tartakovsky in 1998. We also establish its relation to the r-quick convergence first introduced by Strassen in 1967 and extensively studied by Lai. Our work is motivated by various statistical problems, mostly in sequential analysis. As we show in the second part, generalizing and studying these convergence modes is important not only in probability theory but also to solve challenging statistical problems in hypothesis testing and changepoint detection for general stochastic non-i.i.d. models.
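As a reminder of the convergence modes named in this abstract (standard definitions, not taken verbatim from the paper): a sequence $X_n$ converges completely to $X$ in the sense of Hsu and Robbins if
$$ \sum_{n\ge 1} \mathbb{P}\big(|X_n - X| > \varepsilon\big) < \infty \quad\text{for every } \varepsilon > 0, $$
while, roughly speaking, $r$-quick convergence requires the last entry time $L_\varepsilon = \sup\{n : |X_n - X| > \varepsilon\}$ to satisfy $\mathbb{E}[L_\varepsilon^r] < \infty$ for every $\varepsilon > 0$.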
... In Section 2, we introduce basic notions and key results in standard statistical decision theory: randomized decision procedures, minimax decision procedures, least favorable priors, and the minimax theorem for statistical decision problems with finite parameter space. Classic treatments can be found in [Fer67] and [BG54], the latter emphasizing the connection with game theory, but restricting itself to finite discrete spaces. A modern treatment can be found in [LC98]. ...
... We conclude this section with the following example showing that Theorem 2.2 (specifically Eq. (2.4)) need not hold for statistical decision problems with infinite parameter spaces. This counterexample can be found in [Fer67]. ...
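The finite-parameter minimax theorem referred to in these excerpts is presumably of the following standard form (background only; the citing paper's Eq. (2.4) may be stated differently): if $\Theta$ is finite and, e.g., the risk set is bounded below, then, with $\delta$ ranging over randomized decision rules and $\pi$ over priors on $\Theta$,
$$ \sup_{\pi}\, \inf_{\delta}\, r(\pi,\delta) \;=\; \inf_{\delta}\, \sup_{\theta\in\Theta} R(\theta,\delta), $$
so the lower (maximin) value equals the upper (minimax) value; under mild closure conditions a least favorable prior attaining the left-hand supremum also exists.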
Preprint
For statistical decision problems with finite parameter space, it is well-known that the upper value (minimax value) agrees with the lower value (maximin value). Only under a generalized notion of prior does such an equivalence carry over to the case of infinite parameter spaces, provided nature can play a prior distribution and the statistician can play a randomized strategy. Various such extensions of this classical result have been established, but they are subject to technical conditions such as compactness of the parameter space or continuity of the risk functions. Using nonstandard analysis, we prove a minimax theorem for arbitrary statistical decision problems. Informally, we show that for every statistical decision problem, the standard upper value equals the lower value when the $\sup$ is taken over the collection of all internal priors, which may assign infinitesimal probability to (internal) events. Applying our nonstandard minimax theorem, we derive several standard minimax theorems: a minimax theorem on a compact parameter space with continuous risk functions, a finitely additive minimax theorem with bounded risk functions, and a minimax theorem on totally bounded metric parameter spaces with Lipschitz risk functions.
... for any x ∈ R^n denote the spatial rank map of P. In the univariate case n = 1, it is shown in [8] that spatial quantiles of order α ∈ [0, 1) in direction u ∈ {−1, +1} reduce to the usual quantiles of order (αu + 1)/2 ∈ [0, 1). Still when n = 1, we have that ...
... Since A is at most countable, we have that R_{Q_k} → R_P almost everywhere. In order to apply the dominated convergence theorem to the r.h.s. of (8), observe that L_n^*(ψ) ∈ L^1(R^n). Indeed, if n is even, we have that (−∆) ...
Preprint
Full-text available
We address the problem of recovering a probability measure $P$ over $\mathbb{R}^n$ (e.g. its density $f_P$ if one exists) knowing only the associated multivariate spatial rank $R_P$. It has been shown in [Kol1997] that multivariate spatial ranks characterize probability measures. We strengthen this result by explicitly recovering $f_P$ from $R_P$ in the form of a (potentially fractional) partial differential equation $f_P = \mathcal{L}_n(R_P)$, where $\mathcal{L}_n$ is a differential operator given in closed form that depends on $n$. When $P$ admits no density, we further show that the equality $P = \mathcal{L}_n(R_P)$ still holds in the sense of distributions (i.e. generalized functions). We thoroughly investigate the regularity properties of spatial ranks and use the PDE we established to give qualitative results on depth contours and regions. We study the local properties of the operator $\mathcal{L}_n$ and show that it is non-local when $n$ is even. We conclude the paper with a partial counterpart to the non-localizability in even dimensions.
... We exclude the possibility of a randomized strategy, although it is sometimes better than any nonrandomized strategy, at least theoretically. (For example, see Section 1.5 in Ferguson [30]). For example, let us consider six demand points, (ξ, φ) = (π/2 ± ε, 0), (π/2 ± ε, 2π/3), (π/2 ± ε, 4π/3) on the unit sphere, where ε is a small positive constant. ...
... Then, we call the prior a least favorable prior (LFP). LFP is one of the technical terms in statistical decision theory or game theory (see, e.g., Section 1.7, p. 39 in Ferguson [30]). Using the least favorable prior π_LF, PQI is defined by ...
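For reference, the least favorable prior named in this excerpt is standardly defined as follows (background only): a prior $\pi_{LF}$ is least favorable if its Bayes risk is maximal, i.e.
$$ \inf_{\delta} r(\pi_{LF},\delta) \;=\; \sup_{\pi}\, \inf_{\delta}\, r(\pi,\delta), \qquad r(\pi,\delta) = \int_\Theta R(\theta,\delta)\,\pi(d\theta). $$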
Article
Full-text available
When we consider an error model in a quantum computing system, we assume a parametric model where a prepared qubit belongs. Keeping this in mind, we focus on the evaluation of the amount of information we obtain when we know the system belongs to the model within the parameter range. Excluding classical fluctuations, uncertainty still remains in the system. We propose an information quantity called purely quantum information to evaluate this and give it an operational meaning. For the qubit case, it is relevant to the facility location problem on the unit sphere, which is well known in operations research. For general cases, we extend this to the facility location problem in complex projective spaces. Purely quantum information reflects the uncertainty of a quantum system and is related to the minimum entropy rather than the von Neumann entropy.
... for any θ ∈ Θ_0 (Ferguson, 1967). By proceeding conditionally we ensure that the NRP is unaffected by the value of δ. ...
... As an alternative to a LR test, we instead choose, for each t ∈ T, the critical function φ(D) to maximize the derivative of the (conditional) power function (Ferguson, 1967, Lemma 1, Section 5.5). ...
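The construction described here is that of a locally most powerful (locally best) test; as background (not quoted from the citing paper), the generalized Neyman-Pearson lemma gives a critical function maximizing the derivative of the power function at $\theta_0$ of the form
$$ \varphi(x) \;=\; \begin{cases} 1, & \dfrac{\partial}{\partial\theta}\log f(x;\theta)\Big|_{\theta=\theta_0} > k,\\[6pt] \gamma, & = k,\\[2pt] 0, & < k, \end{cases} $$
with $k$ and $\gamma$ chosen to satisfy the size constraint.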
... The so-called residuals are first generated by employing traditional techniques such as the state observer approach [55,92,190,209], the Kalman filter approach [10,116,205,207,208], the parity space approach [31,54,64,73] or the parameter estimation approach [82], etc. They are then evaluated by utilizing statistical decision theory [10,109,175], including non-sequential hypothesis testing [19,49,109], sequential hypothesis testing [105,175,194,195] or sequential change-point detection and isolation [10,105,175]. faults) can be described as ...
... Theorem 2.8. (Bayesian test [49,175]). Consider the multiple hypothesis testing problem between K + 1 simple hypotheses H_0, H_1, ..., H_K with the 0-1 loss function and the a priori probabilities q_0, q_1, ..., q_K. ...
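As background for this theorem statement (a standard fact, not quoted from the thesis): under the 0-1 loss with prior probabilities $q_0,\dots,q_K$ and densities $p_0,\dots,p_K$, the Bayes test accepts the hypothesis with the largest posterior weight,
$$ \delta_B(x) \;=\; \arg\max_{0\le k\le K} \; q_k\, p_k(x), $$
which equivalently minimizes the weighted probability of error.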
Thesis
This PhD thesis is registered in the framework of the project “SCALA”, which received financial support through the program ANR-11-SECU-0005. Its ultimate objective involves the on-line monitoring of Supervisory Control And Data Acquisition (SCADA) systems against cyber-physical attacks. The problem is formulated as the sequential detection and isolation of transient signals in stochastic-dynamical systems in the presence of unknown system states and random noises. It is solved by using the analytical redundancy approach consisting of two steps: residual generation and residual evaluation. The residuals are first generated by both the Kalman filter and parity space approaches. They are then evaluated by using sequential analysis techniques taking into account certain criteria of optimality. However, these classical criteria are not adequate for the surveillance of safety-critical infrastructures. For such applications, it is suggested to minimize the worst-case probability of missed detection subject to acceptable levels on the worst-case probabilities of false alarm and false isolation. For the detection task, the optimization problem is formulated and solved in both scenarios: exactly and partially known parameters. The sub-optimal tests are obtained and their statistical properties are investigated. Preliminary results for the isolation task are also obtained. The proposed algorithms are applied to the detection and isolation of malicious attacks on a simple SCADA water network.
... The Wager model is in fact an early application of statistical decision theory. A modern version of this theory was developed by Abraham Wald [13], see also [14]. As pointed out in [3,12], it is preferable to use the Bayesian version [2] of statistical decision theory for the Wager, since subjective probabilities are employed in order to assess degrees of beliefs, and these are gradually updated as more evidence is acquired, in accordance with Bayes' Theorem. ...
... A.3 IMPACT OF BELIEF BY HEART ON DECISION RULE 3. In this section we analyze how decision rule 3 is impacted when belief by heart is taken into account. This corresponds to the stopping rule *3 of Section 3. Our result below parallels Theorem 1, but when studying *3 we also have to incorporate the agent's predicted degree of belief due to revealed evidence, as defined in (14). When comparing rule 3 with *3, that is, comparing rule 3 without and with revealed evidence, it turns out that more is required to postpone the decision of accepting the offer in the model with revealed evidence, whenever the predicted degree of belief (given revealed evidence) is at least as large as the current degree of belief. ...
Chapter
Full-text available
In this paper we study the temporal aspect of the decision between two mutually exclusive alternatives C and N, where N is the default state and C is an offer that is available for an unknown period of time. The primary example we have in mind, due to Blaise Pascal [1], is when C corresponds to the option of becoming a Christian, whereas N is short for not taking this step. It is assumed that the decision maker or agent bases his decision on his rational belief in whether C is true, and his willingness to accept the offer. To this end we take a Bayesian approach [2] and quantify degrees of belief as posterior probabilities based on prior beliefs and evidence. [3] Two temporal aspects of the decision are highlighted. First, we use Bayesian sequential decision theory in order to give conditions under which it is preferable to postpone the decision or not. Second, we specify the way in which the agent is able to influence his decision. To this end, we divide rewards and degrees of belief into the following three components: 1) a foundational part, 2) circumstances, and 3) subjective preferences. Component 1 is identical for all humans, 2 is individual-specific and only caused by external influences, whereas 3 is also individual, and caused by internal influences from the agent himself. We conclude by discussing whether 1-3 have a deeper spiritual meaning and the connection between component 3 and free will. [4]
... Model-based FD is performed by comparing the system's measured variables with the information obtained from a mathematical model of the process. Some of the model-based FD techniques include statistical hypothesis testing approaches, e.g., Bayesian, likelihood, minimax, etc. [41,42], observer-based approaches [43,44], and interval approaches [45]. In contrast to model-based approaches, where a priori knowledge about the inspected system is needed, in data-based methods only the availability of historical process data is required [46]. ...
... The GLRT is an important statistical method that can be used to solve composite hypothesis testing problems by maximizing the likelihood ratio function over all possible faults [42,80,81,82,17]. Consider the fault detection problem, where a measured vector E ∈ R^N follows one of two Gaussian distributions: ...
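For reference, the GLRT described here is standardly built from the statistic (background only; the thesis' notation for the Gaussian fault detection problem may differ)
$$ \Lambda(E) \;=\; \frac{\sup_{\theta\in\Theta_1} f_\theta(E)}{\sup_{\theta\in\Theta_0} f_\theta(E)}, $$
and the fault hypothesis is decided when $\Lambda(E)$, or its logarithm, exceeds a threshold chosen to meet a false alarm constraint.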
Thesis
Process monitoring is becoming increasingly important to maintain reliable and safe process operation. Among the most important applications of process safety are those related to environmental and chemical processes. A critical fault in a chemical or a petrochemical process may not only cause a degradation in the process performance or lower its product quality, but it can also result in catastrophes that may lead to fatal accidents and substantial economic losses. Therefore, detecting anomalies in chemical processes is vital for their safe and proper operation. Also, abnormal atmospheric pollution levels negatively affect public health, animals, plants, and the climate, and damage natural resources. Therefore, monitoring air quality is also crucial for the safety of humans and the environment. Thus, the main aim of this study is to develop enhanced fault detection methods that can improve air quality monitoring and the operation of chemical processes. When a model of the monitored process is available, model-based monitoring methods rely on comparing the process measured variables with the information obtained from the available model. Unfortunately, accurate models may not be available, especially for complex chemical and environmental processes. In the absence of a process model, latent variable models, such as principal component analysis (PCA) and partial least squares (PLS), have been successfully used in monitoring processes with highly correlated process variables. When a process model is available, on the other hand, statistical hypothesis testing methods, such as the generalized likelihood ratio test (GLRT), have shown good fault detection abilities. In this thesis, extensions using nonlinear models and input latent variable regression techniques (such as PCA) are made to achieve further improvements and widen the applicability of the developed methods in practice. Also, kernel PCA is used to deal with process nonlinearities. Unfortunately, PCA and kernel PCA models are batch methods and thus demand the availability of the process data before building the model. In most situations, however, fault detection is required online, i.e., as the data are collected from the process. Therefore, recursive PCA and kernel PCA-based statistical hypothesis testing techniques will be developed in order to extend the advantages of the developed techniques to online processes. The third objective of this work is to utilize the developed fault detection methods to enhance the monitoring of various chemical and environmental processes. The developed fault detection techniques are used to enhance the monitoring of the concentration levels of various air pollutants, such as ozone, nitrogen oxides, sulfur oxides, dust, and others. Real air pollution data from France are used in this important application. The developed fault detection methods are also utilized to enhance the monitoring of various chemical processes such as the continuous stirred-tank reactor (CSTR) and the Tennessee Eastman process (TEP).
... This is a symmetric loss which assumes that the under-estimation and over-estimation of the same magnitude are equally serious. But, for estimating the scale parameter, Ferguson (2014) modified this loss function and defined it as, ...
Article
Full-text available
The aim of this article is to investigate the association between the classical and Bayesian approaches through Fisher information. For any particular distribution, the computation of Fisher information is quite significant, as it provides the amount of information about the unknown parameter inferred from the observed data and is related to classical methods of estimation. Also, in the light of some prior knowledge, we may estimate the unknown parameter through the Bayesian approach. Specifically, we want to see a relationship between information and Bayes estimation. In this article, the scale parameter of the one-parameter exponential distribution is estimated under the weighted squared error and Kullback-Leibler distance loss functions. The information acquired from the classical and Bayesian methodologies has been connected through the risk intensity and error intensity, which are introduced in this article. The results of extensive simulation studies using these intensity measures show that the Bayes estimator performs more intensely as the amount of Fisher information increases. It is seen that the Fisher information, which is pivotal to many classical estimation methods, has a relationship with the Bayesian method depending on the prior distribution, at least in this case, as the intensity measures of the Bayes estimator decrease with the increase in information. Further, to comprehend the theoretical notion of association, two real-life datasets have been included to show its usefulness in practice.
... The literature on statistical decision theory has followed Wald in measuring sampling and overall performance by risk and Bayes risk. See, for example, the texts of Ferguson (1967) and Berger (1985). ...
Article
Full-text available
The statistical decision theory pioneered by (Wald, Statistical decision functions, Wiley, 1950) has used state-dependent mean loss (risk) to measure the performance of statistical decision functions across potential samples. We think it evident that evaluation of performance should respect stochastic dominance, but we do not see a compelling reason to focus exclusively on mean loss. We think it instructive to also measure performance by other functionals that respect stochastic dominance, such as quantiles of the distribution of loss. This paper develops general principles and illustrative applications for statistical decision theory respecting stochastic dominance. We modify the Wald definition of admissibility to an analogous concept of stochastic dominance (SD) admissibility, which uses stochastic dominance rather than mean sampling performance to compare alternative decision rules. We study SD admissibility in two relatively simple classes of decision problems that arise in treatment choice. We reevaluate the relationship between the MLE, James–Stein, and James–Stein positive part estimators from the perspective of SD admissibility. We consider alternative criteria for choice among SD-admissible rules. We juxtapose traditional criteria based on risk, regret, or Bayes risk with analogous ones based on quantiles of state-dependent sampling distributions or the Bayes distribution of loss.
... Given the necessary background, the problem is to decide on optimal decision rules with reference to some measure of performance. We now proceed to outline the foundations of this theory; for more detailed accounts, see Wald [26], Blackwell and Girshick [27], Ferguson [28], and De Groot [6], among others. ...
Chapter
Full-text available
In statistics, the frequentist approach has often been considered the only appropriate way to carry out scientific and applied work. However, since the 1950s, Bayesian statistics has been progressively gaining ground in academia. The purpose of this study is to demonstrate the points of encounter between these two apparently opposite currents of thought. To that end, several topics are reviewed, explaining what Bayes' Theorem is by means of didactic examples. On the other hand, it is shown that frequentists reject the central postulate of the Bayesian approach but are forced to replace it with alternative solutions, the most generalized being Maximum Likelihood. Facing this discrepancy, it is suggested that there could be a misinterpretation between the two approaches, and examples are offered in which Bayes' postulate and the Maximum Likelihood principle yield the same numerical answer. Then, inferences from a priori information, both non-informative and informative, are analyzed and the inferential proposals of both schools are explored. In addition, the fiducial approach, which works with sufficient statistics, is discussed. All these aspects are discussed from the mathematical perspectives of renowned statisticians such as Fisher, Keynes, Carnap, Good, Durbin, Box, Giere, Neyman, and Pearson, among others. In addition, philosophical assumptions that philosophers such as Lakatos, Popper and Kuhn, among others, have failed to offer are sought in order to establish a possible reconciliation between these currents of statistical thought in apparent conflict.
... By Prokhorov's theorem, G is sequentially weakly compact, and since the weak topology on any finite dimensional linear space is metrizable, G and therefore G_s are also compact. Now we use the fact that the Bayes risk r(π), considered as a functional of the prior π, is a strictly concave functional due to the squared error loss being strictly convex [18,26]. It also being the case that r(π) is upper semicontinuous [34], the supremum over π ∈ G is attained. ...
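A one-line reason behind the concavity used in this excerpt: the Bayes risk $r(\pi) = \inf_{\delta} \int R(\theta,\delta)\,\pi(d\theta)$ is an infimum of functionals that are linear in $\pi$, hence concave,
$$ r\big(\lambda\pi_1 + (1-\lambda)\pi_2\big) \;\ge\; \lambda\, r(\pi_1) + (1-\lambda)\, r(\pi_2), \qquad \lambda\in[0,1]; $$
the strict concavity claimed in the excerpt additionally relies on the strict convexity of the squared error loss, as cited there.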
Article
Full-text available
In this paper, we propose a method for wavelet denoising of signals contaminated with Gaussian noise when prior information about the ${L^{2}}$-energy of the signal is available. Assuming the independence model, according to which the wavelet coefficients are treated individually, we propose simple, level-dependent shrinkage rules that turn out to be Γ-minimax for a suitable class of priors. The proposed methodology is particularly well suited in denoising tasks when the signal-to-noise ratio is low, which is illustrated by simulations on a battery of some standard test functions. Comparison to some commonly used wavelet shrinkage methods is provided.
... Here the resulting estimator is a Bayes rule (Ferguson, 1967; Robert, 2007), and is an optimal decision if one decides to minimize the expected loss. ...
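As background on the terminology in this excerpt: a Bayes rule (in the sense of the cited Ferguson 1967 and Robert 2007) minimizes the posterior expected loss, equivalently the Bayes risk,
$$ \delta_\pi(x) \;=\; \arg\min_{a} \int_\Theta L(\theta,a)\,\pi(\theta\mid x)\,d\theta, \qquad r(\pi,\delta_\pi) \;=\; \inf_{\delta} r(\pi,\delta). $$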
Preprint
Full-text available
Typical machine learning regression applications aim to report the mean or the median of the predictive probability distribution, via training with a squared or an absolute error scoring function. The importance of issuing predictions of more functionals of the predictive probability distribution (quantiles and expectiles) has been recognized as a means to quantify the uncertainty of the prediction. In deep learning (DL) applications, that is possible through quantile and expectile regression neural networks (QRNN and ERNN respectively). Here we introduce deep Huber quantile regression networks (DHQRN) that nest QRNNs and ERNNs as edge cases. DHQRN can predict Huber quantiles, which are more general functionals in the sense that they nest quantiles and expectiles as limiting cases. The main idea is to train a deep learning algorithm with the Huber quantile regression function, which is consistent for the Huber quantile functional. As a proof of concept, DHQRN are applied to predict house prices in Australia. In this context, predictive performances of three DL architectures are discussed along with evidential interpretation of results from an economic case study.
... where δ(x) is the estimate of λ. A generalization of the squared error loss is termed the weighted squared error loss function (WSELF) (Ferguson, 1967). The WSELF for the scale parameter λ is, ...
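As background, the weighted squared error loss has the generic form $L(\lambda,\delta) = w(\lambda)\,(\delta-\lambda)^2$ for a positive weight function $w$; a common choice for a scale parameter, which may or may not be the one used in the citing paper, is $w(\lambda)=1/\lambda^2$, giving
$$ L(\lambda,\delta) \;=\; \Big(\frac{\delta-\lambda}{\lambda}\Big)^{2}. $$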
Article
Full-text available
Estimation of unknown parameters using different loss functions encompasses a major area in decision theory. Specifically, distance loss functions are preferable as they measure the discrepancies between two probability density functions from the same family indexed by different parameters. In this article, the Hellinger distance loss function is considered for the scale parameter λ of the two-parameter Rayleigh distribution. After simplification, a form of the loss is obtained that is meaningful if the parameter is not large, and the Bayes estimate of λ is calculated under that loss function. So, the Bayes estimate may be termed a ‘pseudo-Bayes estimate’ with respect to the actual Hellinger distance loss function, as it is obtained using approximations to the actual loss. To compare the performance of the estimator under these loss functions, we also consider the weighted squared error loss function (WSELF), which is usually used for the estimation of the scale parameter. An extensive simulation is carried out to study the behaviour of the Bayes estimators under the three different loss functions, i.e. the simplified, actual, and WSE loss functions. From the numerical results it is found that the estimators perform well under the Hellinger distance loss function in comparison with the traditionally used WSELF. Also, we demonstrate the methodology by analyzing two real-life datasets.
... RL methods use Q-tables to model player behavior [84]. This type of player behavior modeling is only achieved with inverse RL in first-person shooters [81], educational games [85], and adventure games [86]. The IRL paradigm has recently received increasing attention for its LfD applications. ...
Article
Full-text available
Recent advances in the digital gaming industry have provided impressive demonstrations of highly skillful artificial intelligence agents capable of performing complex intelligent behaviors. Additionally, there is a significant increase in demand for intelligent agents that can imitate video game characters and human players to increase the perceived value of engagement, entertainment, and satisfaction. The believability of an artificial agent’s behavior is usually measured only by its ability in a specific task. Recent research has shown that ability alone is not enough to identify human-like behavior. In this work, we propose a case-based reasoning (CBR) approach to develop human-like agents using human game play traces to reduce model-based programming effort. The proposed framework builds on the demonstrated case storage, retrieval and solution methods by emphasizing the impact of seven different similarity measures. The goal of this framework is to allow agents to learn from a small number of demonstrations of a given task and immediately generalize to new scenarios of the same task without task-specific development. The performance of the proposed method is evaluated using instrumental measures of accuracy and similarity with multiple loss functions, e.g. by comparing traces left by agents and players. The study also developed an automated process to generate a corpus for a simulation case study of the Pac-Man game to validate our proposed model. We provide empirical evidence that CBR systems recognize human player behavior more accurately than trained models, with an average accuracy of 75%, and are easy to deploy. The believability of play styles between human players and AI agents was measured using two automated methods to validate the results. We show that the high p-values produced by these two methods confirm the believability of our trained agents.
... so that by exploiting (D. 15) it follows that ...
Preprint
Prophet inequalities are a central object of study in optimal stopping theory. A gambler is sent values online, sampled from an instance of independent distributions, in an adversarial, random or selected order, depending on the model. When observing each value, the gambler either accepts it as a reward or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is maximising the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations. The model, in which the gambler selects the arrival order first, and then observes the values, is known as Order Selection. Recently it has been shown that in this model a ratio of $0.7251$ can be attained for any instance. If the gambler chooses the arrival order (uniformly) at random, we obtain the Random Order model. The worst case ratio over all possible instances has been extensively studied for at least $40$ years. Still, it is not known if carefully choosing the order, or simply taking it at random, benefits the gambler. We prove that, in the Random Order model, no algorithm can achieve a ratio larger than $0.7235$, thus showing for the first time that there is a real benefit in choosing the order.
... , d}, and V ∼ C^⊥ is independent of X ∼ K. Then C ∈ S_K. The fact that U_j is uniformly distributed over (0, 1) was first observed in Ferguson (1967). Also, it was shown in Rüschendorf (2009) that C ∈ S_K. ...
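For background on the uniformity statement in this excerpt: if a random variable $X$ has a continuous distribution function $F$, then $F(X)$ is uniform on $(0,1)$; for a general (possibly discontinuous) $F$, the randomized distributional transform studied in Rüschendorf (2009),
$$ U \;=\; F(X^-) + V\,\big(F(X)-F(X^-)\big), \qquad V\sim\mathrm{Unif}(0,1) \text{ independent of } X, $$
is again uniform on $(0,1)$; whether this is exactly the construction behind the excerpt's $U_j$ is an assumption on my part.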
Article
In this article, we present a review of important results and statistical applications obtained or generalized by Canadian pioneers and their collaborators, for empirical processes of independent and identically distributed observations, pseudo-observations, and time series. In particular, we consider weak convergence and strong approximation results, as well as tests for model adequacy such as tests of independence, tests of goodness-of-fit, tests of change-point, and tests of serial dependence for time series. We also consider applications of empirical processes of interacting particle systems for the approximation of measure-valued processes. The Canadian Journal of Statistics 50: 1-29; 2022
... After some time, observational data about the real-world system becomes available and is combined with the model estimate to help it better reflect the true state of the system and to reduce the uncertainty of the model predictions. Since the data are utilized as they are collected the process has a relationship to sequential statistical estimation (Ferguson, 1967) and optimal interpolation (Daley, 1991). ...
Preprint
Full-text available
Wildland fires pose an increasingly serious problem in our society. The number and severity of these fires has been rising for many years. Wildfires pose direct threats to life and property as well as threats through ancillary effects like reduced air quality. The aim of this thesis is to develop techniques to help combat the impacts of wildfires by improving wildfire modeling capabilities by using satellite fire observations. Already much work has been done in this direction by other researchers. Our work seeks to expand the body of knowledge using mathematically sound methods to utilize information about wildfires that considers the uncertainties inherent in the satellite data. In this thesis we explore methods for using satellite data to help initialize and steer wildfire simulations. In particular, we develop a method for constructing the history of a fire, a new technique for assimilating wildfire data, and a method for modifying the behavior of a modeled fire by inferring information about the fuels in the fire domain. These goals rely on being able to estimate the time a fire first arrived at every location in a geographic region of interest. Because detailed knowledge of real wildfires is typically unavailable, the basic procedure for developing and testing the methods in this thesis will be to first work with simulated data so that the estimates produced can be compared with known solutions. The methods thus developed are then applied to real-world scenarios. Analysis of these scenarios shows that the work with constructing the history of fires and data assimilation improves fire modeling capabilities. The research is significant because it gives us a better understanding of the capabilities and limitations of using satellite data to inform wildfire models and it points the way towards new avenues for modeling fire behavior.
... is any Borel set in R^d. Using Theorem 16.13 on page 229 in [9] and Lemma 3 on page 74 in [18], we obtain ...
Preprint
In this paper, we consider constrained discounted stochastic games with a countably generated state space and norm continuous transition probability having a density function. We prove existence of approximate stationary equilibria and stationary weak correlated equilibria. Our results imply the existence of stationary Nash equilibrium in $ARAT$ stochastic games.
... In this part, apart from the two commonly used prognostic metrics, RMSE and the score function [25], two other metrics, i.e., an accuracy metric [26] and the average interval score (AIS) [27], are also utilized to quantitatively evaluate the prediction performance. The accuracy metric is a binary metric that evaluates whether the prediction result falls within prescribed bounds at a given time. ...
Article
Full-text available
Recently, deep learning has been widely used in the field of remaining useful life (RUL) prediction. Among various deep learning technologies, the recurrent neural network (RNN) and its variants, e.g., the long short-term memory (LSTM) network, are gaining more attention because of their capability of capturing temporal dependence. Although the existing RNN-based approaches have demonstrated their RUL prediction effectiveness, they still suffer from the following two limitations: 1) it is difficult for RNNs to extract degradation features directly from original monitoring data, and 2) most of the RNN-based prognostics methods are unable to quantify the uncertainty of prediction results. To address the above limitations, this paper proposes a new method named the Residual convolution LSTM (RC-LSTM) network. In RC-LSTM, a new ResNet-based convolution LSTM (Res-ConvLSTM) layer is stacked with a convolution LSTM (ConvLSTM) layer to extract degradation representations from monitoring data. Then, predicated on the RUL following a normal distribution, an appropriate output layer is constructed to quantify the uncertainty of the forecast result. Finally, the effectiveness and superiority of RC-LSTM are verified using monitoring data from accelerated degradation tests of rolling element bearings.
... Statistical decision theory [6,7,19] essentially builds upon the notions of preference and uncertainty, which are formalized in terms of utility functions and probability distributions, respectively (cf. Fig. 1, upper panel). ...
Preprint
Full-text available
Recent applications of machine learning (ML) reveal a noticeable shift from its use for predictive modeling in the sense of a data-driven construction of models mainly used for the purpose of prediction (of ground-truth facts) to its use for prescriptive modeling. What is meant by this is the task of learning a model that stipulates appropriate decisions about the right course of action in real-world scenarios: Which medical therapy should be applied? Should this person be hired for the job? As argued in this article, prescriptive modeling comes with new technical conditions for learning and new demands regarding reliability, responsibility, and the ethics of decision making. Therefore, to support the data-driven design of decision-making agents that act in a rational but at the same time responsible manner, a rigorous methodological foundation of prescriptive ML is needed. The purpose of this short paper is to elaborate on specific characteristics of prescriptive ML and to highlight some key challenges it implies. Besides, drawing connections to other branches of contemporary AI research, the grounding of prescriptive ML in a (generalized) decision-theoretic framework is advocated.
... Bayesian test and minimax test, involves designing a test such that it satisfies only one criterion, while a test designed following the bi-criteria approach must satisfy two criteria of optimality simultaneously. The readers are referred to [20,204,205] for more details on the monocriteria approach. The bi-criteria approach is discussed in this chapter. ...
Thesis
The twenty-first century witnesses the digital revolution that allows digital media to become ubiquitous. They play a more and more important role in our everyday life. Similarly, sophisticated image editing software has been more accessible, resulting in the fact that falsified images are appearing with a growing frequency and sophistication. The credibility and trustworthiness of digital images have been eroded. To restore the trust to digital images, the field of digital image forensics was born. This thesis is part of the field of digital image forensics. Two important problems are addressed: image origin identification and hidden data detection. These problems are cast into the framework of hypothesis testing theory. The approach proposes to design a statistical test that allows us to guarantee a prescribed false alarm probability. In order to achieve a high detection performance, it is proposed to exploit statistical properties of natural images by modeling the main steps of image processing pipeline of a digital camera. The methodology throughout this manuscript consists of studying an optimal test given by the Likelihood Ratio Test in the ideal context where all model parameters are known in advance. When the model parameters are unknown, a method is proposed for parameter estimation in order to design a Generalized Likelihood Ratio Test whose statistical performances are analytically established. Numerical experiments on simulated and real images highlight the relevance of the proposed approach
... To learn a quantile map with a machine learning model, the model is trained to minimize the quantile loss for any given quantile α ∈ (0, 1). The quantile loss for an individual sample x_i is defined as [8] ...
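For reference, the quantile (pinball) loss usually minimized in quantile regression is, for an observation $y_i$, prediction $\hat y_i$ and level $\alpha\in(0,1)$,
$$ \rho_\alpha(y_i,\hat y_i) \;=\; \begin{cases} \alpha\,(y_i-\hat y_i), & y_i \ge \hat y_i,\\ (1-\alpha)\,(\hat y_i-y_i), & y_i < \hat y_i; \end{cases} $$
this is the standard form, and the citing paper's reference [8] may write it differently.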
... This is a central problem of statistical decision theory; unfortunately, it is unsolvable. There are methods to accomplish partial objectives [26]. The minimax approach tries to minimize the risk of the worst-case scenario, often leading to designs that are too conservative. ...
Conference Paper
The use of computational methods in design engineering is growing rapidly at all stages of the design process, with the final goal of a substantial reduction of the cost and time for the development of a design. Simulations and optimization algorithms can be combined together into what is known as Simulation-Based Design (SBD) techniques. Using these tools the designers may find the minimum of some user-defined objective functions with constraints, under the general mathematical framework of a Non-Linear Programming problem. There are problems of course: computational complexity, noise, robustness and accuracy of the numerical simulations, flexibility in the use of these tools; all these issues will have to be solved before the SBD methodology can become more widespread. In the paper, some derivative-based algorithms and methods are initially described, including efficient ways to compute the gradient of the objective function. Derivative-free methods, such as genetic algorithms and swarm methods, are then described and compared on both algebraic tests and on hydrodynamic design problems. Both local and global hydrodynamic ship design optimization problems are addressed, defined in either a single- or a multi-objective formulation framework. Methods for reducing the computational expense are presented. Metamodels (or surrogate models) are a rigorous framework for optimizing expensive computer simulations through the use of inexpensive approximations of expensive analysis codes. The Variable Fidelity idea tries instead to alleviate the computational expense of relying exclusively on high-fidelity models by taking advantage of well-established engineering approximation concepts. Examples of real ship hydrodynamic design optimization cases are given, reporting results mostly collected through a series of projects funded by the Office of Naval Research. Whenever possible, an experimental check of the success of the optimization process is always advisable. Several examples of this testing activity are reported in the paper; one is illustrated by the two pictures at the top of this page, which show the wave pattern close to the sonar dome of an Italian Navy Anti-Submarine Warfare corvette: left, the original design; right, the optimized one.
... 29 I am grateful to Gary Chamberlain here for pointing out the relevance of Wald's Complete Class Theorem. See, e.g., Ferguson (1967) for details. explore other scoring systems and see whether they also give rise to Lockean patterns of beliefs. ...
Article
Full-text available
On the Lockean thesis one ought to believe a proposition if and only if one assigns it a credence at or above a threshold (Foley in Am Philos Q 29(2):111–124, 1992). The Lockean thesis, thus, provides a way of characterizing sets of all-or-nothing beliefs. Here we give two independent characterizations of the sets of beliefs satisfying the Lockean thesis. One is in terms of betting dispositions associated with full beliefs and one is in terms of an accuracy scoring system for full beliefs. These characterizations are parallel to, but not merely derivative from, the more familiar Dutch Book (de Finetti in Theory of probability, vol 1, Wiley, London, 1974) and accuracy (Joyce in Philos Sci 65(4):575–603, 1998) arguments for probabilism.
... Proof: the proof follows from the asymptotic normality of the MLE as in [10] and the references listed below. ...
Article
Full-text available
In this paper, the estimation of R = P[Y < X], namely the stress-strength model, is studied when X and Y are two independent random variables following the extended linear exponential distribution (ELED), under different assumptions about their parameters. The maximum likelihood estimator of R can be obtained in explicit form in the cases of two fixed parameters, common unknown parameters, and all parameters unknown. Estimating R with the Bayes estimator under a non-informative prior in the same cases, we obtain the asymptotic distribution of the maximum likelihood estimator, which can be used to construct a confidence interval for R. Different methods are compared using simulations, and one data analysis has been performed for illustrative purposes.
... where the underlying quantity is a random variable with finite expectation (Ferguson, 1967; Serfling, 2004). The two-dimensional plot of probability levels against the corresponding quantiles is called the quantile plot, which was first proposed by Sir Francis Galton (Galton et al., 1885). ...
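As background for this excerpt, the quantile function of a distribution function $F$ is standardly defined by
$$ Q(p) \;=\; \inf\{x : F(x)\ge p\}, \qquad p\in(0,1), $$
and the quantile plot pairs each probability level $p$ with the corresponding quantile $Q(p)$; the citing paper's notation may differ.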
Article
Full-text available
We propose a new embedding method, named Quantile–Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile–quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution or empirical reference sample. Moreover, QQE gives the user a choice of embedding distribution in embedding the manifold of data into the low dimensional embedding space. It can also be used for modifying the embedding distribution of other dimensionality reduction methods, such as PCA, t-SNE, and deep metric learning, for better representation or visualization of data. We propose QQE in both unsupervised and supervised forms. QQE can also transform a distribution to either an exact reference distribution or its shape. We show that QQE allows for better discrimination of classes in some cases. Our experiments on different synthetic and image datasets show the effectiveness of the proposed embedding method.
... The squared error loss function is widely used by [9], [20], [10] and [21]. After some time, researchers [22], [23], [17], [24] and [25] pointed out that the use of a symmetric loss function is inappropriate in some situations. There may be a situation where a negative error is more serious than a positive error, or vice versa. ...
Article
We study the problem of loss estimation that involves for an observable \(X \sim f_{\theta }\) the choice of a first-stage estimator \(\hat{\gamma }\) of \(\gamma (\theta )\), incurred loss \(L=L(\theta , \hat{\gamma })\), and the choice of a second-stage estimator \(\hat{L}\) of L. We consider both: (i) a sequential version where the first-stage estimate and loss are fixed and optimization is performed at the second-stage level, and (ii) a simultaneous version with a Rukhin-type loss function designed for the evaluation of \((\hat{\gamma }, \hat{L})\) as an estimator of \((\gamma , L)\). We explore various Bayesian solutions and provide minimax estimators for both situations (i) and (ii). The analysis is carried out for several probability models, including multivariate normal models \(N_d(\theta , \sigma ^2 I_d)\) with both known and unknown \(\sigma ^2\), Gamma, univariate and multivariate Poisson, and negative binomial models, and relates to different choices of the first-stage and second-stage losses. The minimax findings are achieved by identifying a least favourable sequence of priors and depend critically on particular Bayesian solution properties, namely situations where the second-stage estimator \(\hat{L}(x)\) is constant as a function of x.
Preprint
Full-text available
We present a comprehensive account of Rao's Score (RS) test, starting from the first principles of testing and going over several applications and recent extensions. The paper is intended to be a one-stop shop for the history of the RS test.
Article
Full-text available
This paper proposes a new approach for dealing with imbalanced classes and prior probability shifts in supervised classification tasks. Coupled with any feature space partitioning method, our criterion aims to compute an almost-Bayesian randomized equalizer classifier for which the maxima of the class-conditional risks are minimized. Our approach belongs to the historically well-studied field of randomized minimax criteria. Our new criterion can be considered as a self-sufficient classifier, or can be easily coupled with any pretrained Convolutional Neural Networks and Decision Trees to address the issues of imbalanced classes and prior probability shifts. Numerical experiments compare our criterion to several state-of-the-art algorithms and show the relevance of our approach when it is necessary to well classify the minority classes and to equalize the risks per class. Experiments on the CIFAR-100 database show that our criterion scales well when the number of classes is large.
Article
Full-text available
We study a discrete-time multi-type Wright–Fisher population process. The mean-field dynamics of the stochastic process is induced by a general replicator difference equation. We prove several results regarding the asymptotic behavior of the model, focusing on the impact of the mean-field dynamics on it. One of the results is a limit theorem that describes sufficient conditions for an almost certain path to extinction, first eliminating the type which is the least fit at the mean-field equilibrium. The effect is explained by the metastability of the stochastic system, which under the conditions of the theorem spends almost all time before the extinction event in a neighborhood of the equilibrium. In addition to the limit theorems, we propose a maximization principle for a general deterministic replicator dynamics and study its implications for the stochastic model.
Preprint
Full-text available
In this paper, Bayesian inference has been applied to the generalized gamma distribution parameters based on the characteristic prior, by utilizing the Fourier transformation of the cumulative distribution function. To compare the characteristic prior with the informative gamma prior, the mean squared errors and the mean percentage errors of the parameter estimates are studied under both priors and under symmetric and asymmetric loss functions, via Monte Carlo simulations. The simulation results indicate that the characteristic prior, which does not contain hyperparameters, is more efficient than the informative gamma prior and provides better estimates. Finally, a numerical example is given to demonstrate the efficiency of the two priors.
Article
A group of experts with different prior beliefs must choose a treatment. A dataset is made public and leads to revisions of beliefs. We propose a model where the experts’ disagreements are resolved through bargaining, using the Nash bargaining solution. Experts bargain after disclosure of the dataset. Bargaining may lead to an inefficient use of information in a strong sense: experts receive a lower payoff in every state and for any prior belief (i.e., inadmissibility). Bargaining exhibits underreaction to information as compared to the normative solution in which experts bargain ex ante on the procedure used to exploit the data. (JEL C78, D82, D83)
Chapter
This chapter explores how a distortion metric can be incorporated into the design of a communication system for a loss tolerant source that incorporates an acknowledgment (ACK)/negative acknowledgment (NACK) feedback channel, thus allowing for the use of an automatic repeat query (ARQ) or Hybrid ARQ protocol. It explains a passive transmitter scenario, where, on receiving a NACK, the transmitter can only do a retransmission of the codeword transmitted earlier. The chapter argues that the task of designing a source–channel feedback generation rule for packet combining‐based ARQ can be mapped to a classical sequential decision problem. The tree‐structured quantizer is capable of coding in several stages; each stage provides a refinement of the previous stage. The chapter explains a joint source–channel coding scheme for real‐time traffic that, nevertheless, uses feedback‐based error control.
Article
Economists usually inform policymakers with conclusions that come from studying statistical expectations, or arithmetic means, of potential outcomes. I introduce other types of means to study, from the broader “quasilinear” family, and show that often they will better respect the needs of policymakers. The same logic reveals that a common bias correction from Goldberger and Kennedy is counterproductive because it implies a contradiction in policymaker needs. In making these arguments, I collate and build on many earlier contributions, which before now have been disjointed and spread across outlets for different research fields.
Thesis
The efficient operation of wastewater treatment plants (WWTPs) is key to ensuring a sustainable, healthy, and green environment. Monitoring wastewater processes is helpful not only for evaluating the process operating conditions but also for inspecting product quality for the life of human communities. WWTP systems are nonlinear and unstable, with relations that vary over time and change daily (air temperature, rain, ...). The state variables in a WWTP cannot be directly measured; therefore, state estimation-based fault detection methods need to be developed to achieve efficient monitoring. Therefore, the main objective of this thesis is to develop new fault detection techniques for WWTPs, which can effectively address the effects of changing environmental or operational conditions and have the potential to be applied in WWTP practice to more effectively solve practical WWTP and fault detection problems. The developed model-based fault detection techniques are based on state and parameter estimation and univariate statistical detection charts. The state estimation methods extend the classical approaches to the nonlinear case considering non-Gaussian processes. More importantly, the developed methods provide effective estimation accuracies when compared to the classical state estimation techniques. The developed state estimation techniques are applied to evaluate the monitored residuals, which are used for fault detection purposes. To perform the detection phase, new control charts are developed to improve the fault detection abilities considering different types and sizes of faults. Both numerical simulation studies and experimental analysis have been conducted to verify the effectiveness and demonstrate potential practical applications of the developed methods. The detection performances are assessed in terms of missed detection rate, false alarm rate, detection speed, sensitivity to fault sizes, and robustness to noise levels. The results show the efficiency of the developed monitoring strategies in terms of detection accuracy. The developed state estimation-based fault detection techniques can also be used in a wide range of applications. In this thesis, they are utilized to improve the operation of the wastewater treatment process. Examples of other applications include chemical, biological, environmental, and structural health monitoring systems. In summary, the present study has addressed a series of fundamental problems with WWTPs, especially problems associated with fault detection under normal and abnormal operational conditions. Simulated and experimental studies have demonstrated the potential and significance of these results in practical engineering applications.
Article
Full-text available
The efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: If one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. By now, however, there is no way to test the validity of corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem, in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
Article
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests. However, there is a lack of research on this estimation problem. The goal of this article is to suggest two new approaches—one each based on classical test theory and item response theory—for estimating the probabilities of passing of the examinees with incomplete data on mastery tests. The two approaches are demonstrated to have high accuracy and negligible misclassification rates.
Article
The problem of sequentially estimating quantiles is considered in the case when the observations become available at random times. A certain class of sequential estimation procedures, composed of an optimal stopping time and a sequential minimum risk invariant estimator of the median, is obtained under an invariant loss function and with the observation cost determined by a convex function of the moment of stopping and the number of observations up to that moment.