Preprint

Hypothesis testing near singularities and boundaries


Abstract

The likelihood ratio statistic, with its asymptotic chi-squared distribution at regular model points, is often used for hypothesis testing. At model singularities and boundaries, however, the asymptotic distribution may not be chi-squared, as highlighted by recent work of Drton. Indeed, simulations show that the chi-squared approximation behaves poorly near singularities and boundaries, which can lead to conservative or anti-conservative tests. Here we develop a new distribution designed for use in hypothesis testing near singularities and boundaries, which asymptotically agrees with that of the likelihood ratio statistic. For two example trinomial models, arising in the context of inference of evolutionary trees, we show that the new distributions outperform the chi-squared approximation.
Book
Applied Probability and Stochastic Processes, Second Edition presents a self-contained introduction to elementary probability theory and stochastic processes with a special emphasis on their applications in science, engineering, finance, computer science, and operations research. It covers the theoretical foundations for modeling time-dependent random phenomena in these areas and illustrates applications through the analysis of numerous practical examples. The author draws on his 50 years of experience in the field to give your students a better understanding of probability theory and stochastic processes and enable them to use stochastic modeling in their work.
New to the Second Edition:
• Completely rewritten part on probability theory, now more than double in size.
• New sections on time series analysis, random walks, branching processes, and spectral analysis of stationary stochastic processes.
• Comprehensive numerical discussions of examples, which replace the more theoretically challenging sections.
• Additional examples, exercises, and figures.
Presenting the material in a student-friendly, application-oriented manner, this non-measure theoretic text only assumes a mathematical maturity that applied science students acquire during their undergraduate studies in mathematics. Many exercises allow students to assess their understanding of the topics. In addition, the book occasionally describes connections between probabilistic concepts and corresponding statistical approaches to facilitate comprehension. Some important proofs and challenging examples and exercises are also included for more theoretically interested readers.
Article
Numerous statistical methods have been developed to estimate evolutionary relationships among a collection of present-day species, typically represented by a phylogenetic tree, using the information contained in the DNA sequences sampled from representatives of each species. In the current era of high-throughput genome sequencing, the models underlying such methods have become increasingly sophisticated, and the resulting computations are often prohibitive. Here we consider the problem of rigorously testing the phylogenetic relationships among collections of four species under the multispecies coalescent model that accommodates both multi-locus datasets and SNP data. Our test employs a new statistic — the summed absolute differences between certain columns in flattened phylogenetic matrices — as well as a previously-used statistic that measures the distance of a flattened matrix from the space of rank-10 matrices. We derive distributional results for both statistics and study the performance of the corresponding hypothesis tests using both simulated and empirical data. We discuss how these tests may be used to improve inference of phylogenetic relationships for larger samples of species under the multispecies coalescent model, a problem that has until recently been computationally intractable.
Article
The importance of developing useful and appropriate statistical methods for analyzing discrete multivariate data is apparent from the enormous amount of attention this subject has commanded in the literature over the last thirty years. Central to these discussions has been Pearson's X² statistic and the loglikelihood ratio statistic G². Our review seeks to consolidate this fragmented literature and develop a unifying theme for much of this research. The traditional X² and G² statistics are viewed as members of the power-divergence family of statistics, and are linked through a single real-valued parameter. The principal areas covered in this comparative survey are small-sample comparisons of X² and G² under both classical (fixed-cells) assumptions and sparseness assumptions, efficiency comparisons, and various modifications to the test statistics (including parameter estimation for ungrouped data, data-dependent and overlapping cell boundaries, serially dependent data, and smoothing). Finally, some future areas for research are discussed. /// In this article we examine the role of the X² (Pearson) and G² (log likelihood ratio) statistics within a family of statistics used to test hypotheses about discrete multivariate data. This family, called the power-divergence family of statistics, is generated by a single parameter λ; the case λ = 1 corresponds to X², while λ = 0 corresponds to G². Certain properties of the family and certain special cases are presented. A new statistic (λ = ⅔) is often preferable to X² and G².
Article
This article investigates the family {I^λ; λ ∈ ℝ} of power divergence statistics for testing the fit of observed frequencies {X_i; i = 1, …, k} to expected frequencies {E_i; i = 1, …, k}. From the definition it can easily be seen that Pearson's X² (λ = 1), the log likelihood ratio statistic (λ = 0), the Freeman‐Tukey statistic (λ = –½), the modified log likelihood ratio statistic (λ = –1) and the Neyman modified X² (λ = –2) are all special cases. Most of the work presented is devoted to an analytic study of the asymptotic difference between different I^λ; however, finite sample results have been presented as a check and a supplement to our conclusions. A new goodness‐of‐fit statistic, where λ = ⅔, emerges as an excellent and compromising alternative to the old warriors, I⁰ and I¹.
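The special cases listed in this abstract can be checked numerically. The sketch below is not taken from the article: it assumes the standard Cressie-Read form of the scaled statistic, 2/(λ(λ+1)) · Σ X_i[(X_i/E_i)^λ − 1], with the λ = 0 and λ = –1 members defined as limits, and it assumes strictly positive counts whose totals match. The function name `power_divergence` and the example counts are illustrative only.

```python
import math


def power_divergence(observed, expected, lam):
    """Scaled power-divergence statistic for the given lambda.

    Assumes (for this sketch) strictly positive observed and expected
    counts with equal totals.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(x <= 0 for x in observed) or any(e <= 0 for e in expected):
        raise ValueError("this sketch assumes strictly positive counts")
    if lam == 0:
        # Limit lambda -> 0: the log likelihood ratio statistic G^2.
        return 2.0 * sum(x * math.log(x / e) for x, e in zip(observed, expected))
    if lam == -1:
        # Limit lambda -> -1: the modified log likelihood ratio statistic.
        return 2.0 * sum(e * math.log(e / x) for x, e in zip(observed, expected))
    total = sum(x * ((x / e) ** lam - 1.0) for x, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total


if __name__ == "__main__":
    obs = [18, 55, 27]   # hypothetical trinomial counts
    exp = [25, 50, 25]   # hypothetical expected counts, same total
    for lam in (1, 2 / 3, 0, -0.5, -1, -2):
        print(lam, power_divergence(obs, exp, lam))
```

With equal totals, λ = 1 reproduces Pearson's Σ(X_i − E_i)²/E_i and λ = –2 reproduces Neyman's Σ(X_i − E_i)²/X_i, which gives a quick sanity check on the formula.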
Article
The analysis of moment structural models has become an important tool of investigation in behavioural, educational and economic studies. The chi-squared large-sample test is routinely employed to assess the goodness of fit of the model considered. However, in order to invoke the standard asymptotic distribution theory, certain regularity conditions have to be met. Here we consider the case where the population value of the parameter vector is a boundary point of the feasible region. We show that in this case the asymptotic distribution of the test statistic is a mixture of chi-squared distributions. The problem of finding the corresponding weights is discussed.
Article
Large sample properties of the likelihood function when the true parameter value may be on the boundary of the parameter space are described. Specifically, the asymptotic distributions of maximum likelihood estimators and likelihood ratio statistics are derived. These results generalize the work of Moran (1971), Chant (1974), and Chernoff (1954). Some of Chant's results are shown to be incorrect. The approach used in deriving these results follows from comments made by Moran and Chant. The problem is shown to be asymptotically equivalent to the problem of estimating the restricted mean of a multivariate Gaussian distribution from a sample of size 1. In this representation the Gaussian random variable corresponds to the limit of the normalized score statistic and the estimate of the mean corresponds to the limit of the normalized maximum likelihood estimator. Thus the limiting distribution of the maximum likelihood estimator is the same as the distribution of the projection of the Gaussian random variable onto the region of admissible values for the mean. A variety of examples is provided for which the limiting distributions of likelihood ratio statistics are mixtures of chi-squared distributions. One example is provided with a nuisance parameter on the boundary for which the asymptotic distribution is not a mixture of chi-squared distributions.
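The simplest instance of such a mixture can be simulated directly. The sketch below is a textbook illustration, not an example from the article: it assumes i.i.d. N(θ, 1) data with the parameter restricted to θ ≥ 0 and tests H0: θ = 0, so the restricted estimate is max(x̄, 0) and the likelihood ratio statistic is n · max(x̄, 0)². Under H0 its distribution is an equal mixture of a point mass at zero and a chi-squared with one degree of freedom. The function names, sample size, and seed are hypothetical choices.

```python
import math
import random


def lrt_statistic(sample):
    """LRT statistic for H0: theta = 0 against theta >= 0, data N(theta, 1)."""
    n = len(sample)
    mean = sum(sample) / n
    # The restricted MLE is max(mean, 0): the projection onto [0, inf).
    return n * max(mean, 0.0) ** 2


def chi2_1_pvalue(t):
    """Upper tail of a chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def mixture_pvalue(t):
    """Upper tail of the mixture 0.5 * (point mass at 0) + 0.5 * chi2_1."""
    if t <= 0:
        return 1.0
    return 0.5 * chi2_1_pvalue(t)


def rejection_rates(n=20, reps=5000, alpha=0.05, seed=0):
    """Simulate under H0 and compare naive and mixture-based rejection rates."""
    rng = random.Random(seed)
    naive = 0
    mixture = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = lrt_statistic(sample)
        naive += chi2_1_pvalue(t) < alpha
        mixture += mixture_pvalue(t) < alpha
    return naive / reps, mixture / reps


if __name__ == "__main__":
    naive_rate, mixture_rate = rejection_rates()
    print("naive chi-squared rejection rate:", naive_rate)
    print("mixture rejection rate:", mixture_rate)
```

In this setting the naive chi-squared test should reject at roughly half the nominal level, which is the conservative behavior at a boundary, while the mixture-based test should stay close to the nominal 5%.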
Chapter
This book is an introduction to the field of asymptotic statistics. The treatment is both practical and mathematically rigorous. In addition to most of the standard topics of an asymptotics course, including likelihood inference, M-estimation, the theory of asymptotic efficiency, U-statistics, and rank procedures, the book also presents recent research topics such as semiparametric models, the bootstrap, and empirical processes and their applications. The topics are organized from the central idea of approximation by limit experiments, which gives the book one of its unifying themes. This entails mainly the local approximation of the classical i.i.d. set up with smooth parameters by location experiments involving a single, normally distributed observation. Thus, even the standard subjects of asymptotic statistics are presented in a novel way. Suitable as a graduate or Master's level statistics text, this book will also give researchers an overview of research in asymptotic statistics.