Chapter

Abstract

Statistics acquires its full depth in terms of philosophy of science only when the general problem of induction is taken into account. It can even be regarded as the most fully worked-out attempt, both theoretically grounded and practically successful, to solve that problem. Tukey's formulation already suggests that statistics offers not one but a whole spectrum of special solutions. Just as there is no philosopher's stone, there is no single principle of induction. Rather, there is a whole range of approaches and various classes of arguments for justifying generalizations.




Article
A comparative analysis method for English comparative literature based on a big-data Bayesian approach is proposed to improve the computational accuracy of the comparative analysis process. First, to address the low accuracy and poor computational efficiency of Bayesian classification algorithms in a big-data environment, a multilayer Bayesian classification and recognition mechanism is proposed; a Gabor multilayer feature extraction algorithm is developed for the multilayer Bayesian algorithm, and the resulting Gabor multilayer feature extraction Bayesian algorithm is designed and implemented. Second, the workflow of a comparative analysis system for English comparative literature based on this improved algorithm is designed, and the success rate of the comparative analysis is improved effectively by exploiting the characteristic features of the task. Finally, the effectiveness of the algorithm is verified through simulation experiments.
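The pipeline sketched in the abstract (Gabor filters providing multilayer features that feed a Bayesian classifier) can be illustrated roughly as follows; this is a hedged reconstruction, not the authors' implementation. The use of scikit-image and scikit-learn, the treatment of inputs as 2-D arrays (e.g. scanned page images), and the chosen filter frequencies and orientations are all assumptions made for this example.

```python
# Illustrative sketch only (not the paper's code): multilayer Gabor features
# feeding a Gaussian naive Bayes classifier. Library choices and filter
# parameters are assumptions made for this example.
import numpy as np
from skimage.filters import gabor            # assumed: scikit-image
from sklearn.naive_bayes import GaussianNB   # assumed: scikit-learn

def gabor_features(image, frequencies=(0.1, 0.2, 0.4),
                   thetas=(0.0, np.pi / 4, np.pi / 2)):
    """Build a feature vector from several Gabor 'layers' (frequency x orientation)."""
    feats = []
    for f in frequencies:
        for t in thetas:
            real, imag = gabor(image, frequency=f, theta=t)
            mag = np.hypot(real, imag)          # magnitude of the filter response
            feats.extend([mag.mean(), mag.var()])
    return np.asarray(feats)

def train_classifier(images, labels):
    """Fit a naive Bayes classifier on Gabor features of labelled 2-D inputs."""
    X = np.vstack([gabor_features(img) for img in images])
    clf = GaussianNB()
    clf.fit(X, labels)
    return clf
```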
Article
It is unlikely that anyone has had a greater impact on the methodology of scientific research in the twentieth century than Ronald Aylmer Fisher. From his early work in developing statistical methods needed for the interpretation of experimental data, he went on to recast the entire theoretical basis for mathematical statistics and to initiate the deliberate study and development of experimental design central to the whole process of the Natural Sciences. In clarifying the principles of inductive inference, Fisher greatly enlarged our understanding of the nature of uncertainty and contributed fundamentally to the philosophy of our age. This volume presents a selection from Fisher's letters on statistical inference and analysis and related topics. It also includes relevant material from the letters (from many distinguished scientists) to which he was replying. It is a companion volume to Natural selection, heredity, and eugenics: selected correspondence of R. A. Fisher with Leonard Darwin and others (ed. J. H. Bennett, Clarendon Press, Oxford, 1983).
Chapter
Highly Structured Stochastic Systems (HSSS) is a modern strategy for building statistical models for challenging real-world problems, for computing with them, and for interpreting the resulting inferences. Complexity is handled by working up from simple local assumptions in a coherent way, and that is the key to modelling, computation, inference and interpretation; the unifying framework is that of Bayesian hierarchical models. The aim of this book is to make recent developments in HSSS accessible to a general statistical audience. Graphical modelling and Markov chain Monte Carlo (MCMC) methodology are central to the field, and in this text they are covered in depth. The chapters on graphical modelling focus on causality and its interplay with time, the role of latent variables, and on some innovative applications. Those on Monte Carlo algorithms include discussion of the impact of recent theoretical work on the evaluation of performance in MCMC, extensions to variable dimension problems, and methods for dynamic problems based on particle filters. Coverage of these underlying methodologies is balanced by substantive areas of application - in the areas of spatial statistics (with epidemiological, ecological and image analysis applications) and biology (including infectious diseases, gene mapping and evolutionary genetics). The book concludes with two topics (model criticism and Bayesian nonparametrics) that seek to challenge the parametric assumptions that otherwise underlie most HSSS models. Altogether there are 15 topics in the book, and for each there is a substantial article by a leading author in the field, and two invited commentaries that complement, extend or discuss the main article, and should be read in parallel. All authors are distinguished researchers in the field, and were active participants in an international research programme on HSSS. This is the 27th volume in the Oxford Statistical Science Series, which includes texts and monographs covering many topics of current research interest in pure and applied statistics. These texts focus on topics that have been at the forefront of research interest for several years. Other books in the series include: J.Durbin and S.J.Koopman: Time series analysis by State Space Models; Peter J. Diggle, Patrick Heagerty, Kung-Yee Liang, Scott L. Zeger: Analysis of Longitudinal Data 2/e; J.K. Lindsey: Nonlinear Models in Medical Statistics; Peter J. Green, Nils L. Hjort and Sylvia Richardson: Highly Structured Stochastic Systems; Margaret S. Pepe: Statistical Evaluation of Medical Tests.
Article
Editor James Fetzer presents an analytical and historical introduction and a comprehensive bibliography together with selections of many of Carl G. Hempel’s most important studies to give students and scholars an ideal opportunity to appreciate the enduring contributions of one of the most influential philosophers of science of the 20th century.
Book
What assumptions and methods allow us to turn observations into causal knowledge, and how can even incomplete causal knowledge be used in planning and prediction to influence and control our environment? In this book Peter Spirtes, Clark Glymour, and Richard Scheines address these questions using the formalism of Bayes networks, with results that have been applied in diverse areas of research in the social, behavioral, and physical sciences. The authors show that although experimental and observational study designs may not always permit the same inferences, they are subject to uniform principles. They axiomatize the connection between causal structure and probabilistic independence, explore several varieties of causal indistinguishability, formulate a theory of manipulation, and develop asymptotically reliable procedures for searching over equivalence classes of causal models, including models of categorical data and structural equation models with and without latent variables. The authors show that the relationship between causality and probability can also help to clarify such diverse topics in statistics as the comparative power of experimentation versus observation, Simpson's paradox, errors in regression models, retrospective versus prospective sampling, and variable selection. The second edition contains a new introduction and an extensive survey of advances and applications that have appeared since the first edition was published in 1993. Published under the Bradford Books imprint.
Article
The definition of second order interaction in a (2 × 2 × 2) table given by Bartlett is accepted, but it is shown by an example that the vanishing of this second order interaction does not necessarily justify the mechanical procedure of forming the three component 2 × 2 tables and testing each of these for significance by standard methods.
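For concreteness, Bartlett's condition for a vanishing second order interaction in a 2 × 2 × 2 table of cell probabilities p_ijk can be stated as equality of the cross-product (odds) ratios in the two layers; the notation below is mine, not the article's:

```latex
% No second-order interaction (Bartlett): the 2x2 cross-product ratio
% is the same in both layers k = 1, 2 of the 2x2x2 table.
\frac{p_{111}\,p_{221}}{p_{121}\,p_{211}}
  \;=\;
\frac{p_{112}\,p_{222}}{p_{122}\,p_{212}}
```

The article's example shows that even when this equality holds, testing the three component 2 × 2 tables separately by standard methods need not be justified.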
Book
The Minimum Message Length (MML) Principle is an information-theoretic approach to induction, hypothesis testing, model selection, and statistical inference. MML, which provides a formal specification for the implementation of Occam's Razor, asserts that the ‘best’ explanation of observed data is the shortest. Further, an explanation is acceptable (i.e. the induction is justified) only if the explanation is shorter than the original data. This book gives a sound introduction to the Minimum Message Length Principle and its applications, provides the theoretical arguments for the adoption of the principle, and shows the development of certain approximations that assist its practical application. MML appears also to provide both a normative and a descriptive basis for inductive reasoning generally, and scientific induction in particular. The book describes this basis and aims to show its relevance to the Philosophy of Science. Statistical and Inductive Inference by Minimum Message Length will be of special interest to graduate students and researchers in Machine Learning and Data Mining, scientists and analysts in various disciplines wishing to make use of computer techniques for hypothesis discovery, statisticians and econometricians interested in the underlying theory of their discipline, and persons interested in the Philosophy of Science. The book could also be used in a graduate-level course in Machine Learning and Estimation and Model-selection, Econometrics and Data Mining. C.S. Wallace was appointed Foundation Chair of Computer Science at Monash University in 1968, at the age of 35, where he worked until his death in 2004. He received an ACM Fellowship in 1995, and was appointed Professor Emeritus in 1996. Professor Wallace made numerous significant contributions to diverse areas of Computer Science, such as Computer Architecture, Simulation and Machine Learning. His final research focused primarily on the Minimum Message Length Principle.
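A minimal formal statement of the two-part message that MML evaluates may help; the notation below is assumed for illustration rather than quoted from the book:

```latex
% Two-part MML message: assert hypothesis H, then encode data D given H.
\mathrm{MsgLen}(H, D) \;=\; -\log_2 P(H) \;-\; \log_2 P(D \mid H)
% The 'best' explanation minimises MsgLen(H, D); the induction is justified
% only if MsgLen(H, D) is shorter than a straightforward null encoding of D.
```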
Book
"…the author has packaged an excellent and modern set of topics around the development and use of quantitative models.... If you need to learn about resampling, this book would be a good place to start." —Technometrics (Review of the Second Edition) This thoroughly revised and expanded third edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests. Only requiring minimal mathematics beyond algebra, the book provides a table-free introduction to data analysis utilizing numerous exercises, practical data sets, and freely available statistical shareware. Topics and Features * Practical presentation covers both the bootstrap and permutations along with the program code necessary to put them to work. * Includes a systematic guide to selecting the correct procedure for a particular application. * Detailed coverage of classification, estimation, experimental design, hypothesis testing, and modeling. * Suitable for both classroom use and individual self-study. New to the Third Edition * Procedures are grouped by application; a prefatory chapter guides readers to the appropriate reading matter. * Program listings and screen shots now accompany each resampling procedure: Whether one programs in C++, CART, Blossom, Box Sampler (an Excel add-in), EViews, MATLAB, R, Resampling Stats, SAS macros, S-PLUS, Stata, or StatXact, readers will find the program listings and screen shots needed to put each resampling procedure into practice. * To simplify programming, code for readers to download and apply is posted at http://www.springeronline.com/0-8176-4386-9. * Notation has been simplified and, where possible, eliminated. * A glossary and answers to selected exercises are included. With its accessible style and intuitive topic development, the book is an excellent basic resource for the power, simplicity, and versatility of resampling methods. It is an essential resource for statisticians, biostatisticians, statistical consultants, students, and research professionals in the biological, physical, and social sciences, engineering, and technology.
Book
No statistical model is "true" or "false," "right" or "wrong"; the models just have varying performance, which can be assessed. The main theme in this book is to teach modeling based on the principle that the objective is to extract the information from data that can be learned with suggested classes of probability models. The intuitive and fundamental concepts of complexity, learnable information, and noise are formalized, which provides a firm information theoretic foundation for statistical modeling. Inspired by Kolmogorov's structure function in the algorithmic theory of complexity, this is accomplished by finding the shortest code length, called the stochastic complexity, with which the data can be encoded when advantage is taken of the models in a suggested class, which amounts to the MDL (Minimum Description Length) principle. The complexity, in turn, breaks up into the shortest code length for the optimal model in a set of models that can be optimally distinguished from the given data and the rest, which defines "noise" as the incompressible part in the data without useful information. Such a view of the modeling problem permits a unified treatment of any type of parameters, their number, and even their structure. Since only optimally distinguished models are worthy of testing, we get a logically sound and straightforward treatment of hypothesis testing, in which for the first time the confidence in the test result can be assessed. Although the prerequisites include only basic probability calculus and statistics, a moderate level of mathematical proficiency would be beneficial. The different and logically unassailable view of statistical modelling should provide excellent grounds for further research and suggest topics for graduate students in all fields of modern engineering, including and not restricted to signal and image processing, bioinformatics, pattern recognition, and machine learning to mention just a few. The author is an Honorary Doctor and Professor Emeritus of the Technical University of Tampere, Finland, a Fellow of Helsinki Institute for Information Technology, and visiting Professor in the Computer Learning Research Center of University of London, Holloway, England. He is also a Foreign Member of Finland's Academy of Science and Letters, an Associate Editor of IMA Journal of Mathematical Control and Information and of EURASIP Journal on Bioinformatics and Systems Biology. He is also a former Associate Editor of Source Coding of IEEE Transactions on Information Theory. The author is the recipient of the IEEE Information Theory Society's 1993 Richard W. Hamming medal for fundamental contributions to information theory, statistical inference, control theory, and the theory of complexity; the Information Theory Society's Golden Jubilee Award in 1998 for Technological Innovation for inventing Arithmetic Coding; and the 2006 Kolmogorov medal by University of London. He has also received an IBM Corporate Award for the MDL and PMDL Principles in 1991, and two best paper awards.
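A commonly cited two-part approximation to the MDL criterion for a model class with k parameters fitted by maximum likelihood is shown below; it is a textbook form given only for orientation, whereas the stochastic complexity developed in the book refines the penalty term via the normalized maximum likelihood code:

```latex
% Crude two-part MDL code length for data x^n under a k-parameter model class:
% fit cost plus a parameter-description penalty that grows with sample size n.
\mathrm{MDL}(x^n) \;\approx\; -\log P\!\left(x^n \mid \hat{\theta}(x^n)\right) \;+\; \frac{k}{2}\log n
```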
Book
Graphics are great for exploring data, but how can they be used for looking at the large datasets that are commonplace today? This book shows ways of visualizing large datasets, whether large in numbers of cases, large in numbers of variables, or large in both. Data visualization is useful for data cleaning, exploring data, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining. Data analysts, statisticians, computer scientists—indeed anyone who has to explore a large dataset of their own—should benefit from reading this book. New approaches to graphics are needed to visualize the information in large datasets and most of the innovations described in this book are developments of standard graphics. There are considerable advantages in extending displays which are well-known and well-tried, both in understanding how best to make use of them in your work and in presenting results to others. It should also make the book readily accessible for readers who already have a little experience of drawing statistical graphics. All ideas are illustrated with displays from analyses of real datasets and the authors emphasize the importance of interpreting displays effectively. Graphics should be drawn to convey information and the book includes many insightful examples. Antony Unwin holds the Chair of Computer Oriented Statistics and Data Analysis at the University of Augsburg. He has been involved in developing visualization software for twenty years. Martin Theus is a Senior Researcher at the University of Augsburg, has worked in industry and research in both Germany and the USA, and is the author of the visualization software Mondrian. Heike Hofmann is Assistant Professor of Statistics at Iowa State University. She wrote the software MANET and has also cooperated in the development of the GGobi software.
Article
The Allgemeine Erkenntnislehre (General Theory of Knowledge) is considered Moritz Schlick's principal work. In it, engaging with contemporary positions, Schlick develops his influential ideas on the nature of knowledge, the relationship between psychology and logic, the mind-body problem, and the epistemological dispute over realism. The text was written during Schlick's early years in Rostock, from 1911 to 1916. The Allgemeine Erkenntnislehre is a milestone of scientific philosophy and foundational for the later development of the Vienna Circle of Logical Empiricism.
Book
This fully updated and revised third edition presents a wide-ranging, balanced account of the fundamental issues across the full spectrum of inference and decision-making. Much has happened in this field since the second edition was published: for example, Bayesian inferential procedures have not only gained acceptance but are often the preferred methodology. This book will be welcomed by both the student and the practising statistician wishing to study, at a fairly elementary level, the basic conceptual and interpretative distinctions between the different approaches, how they interrelate, what assumptions they are based on, and the practical implications of such distinctions. As in earlier editions, the material is set in a historical context to more powerfully illustrate the ideas and concepts. Includes fully updated and revised material from the successful second edition. Recent changes in emphasis, principle and methodology are carefully explained and evaluated. Discusses all recent major developments. Particular attention is given to the nature and importance of basic concepts (probability, utility, likelihood, etc.). Includes extensive references and bibliography. Written by a well-known and respected author; the essence of this successful book remains unchanged, providing the reader with a thorough explanation of the many approaches to inference and decision-making.
Article
The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald's sense, originated in several misapprehensions and has led, apparently, to several more. The three phrases examined here, with a view to elucidating the fallacies they embody, are: “Repeated sampling from the same population”, Errors of the “second kind”, “Inductive behaviour”. Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not always only numerical.
Chapter
Mindless applications of regression models to poorly measured data in the social sciences are what Freedman deplores. I yield to nobody in my opposition to mindlessness—in the natural sciences as in the social sciences—and I am steadfastly in favor of good measurement of theoretically relevant constructs. But frequently the lines between mindlessness and wise exploration, between measurement guided by lame-brained theories and that inspired by a truly visionary world view, are unclear, distinguishable only through the myopia-correcting lenses of hindsight. Freedman uses such hindsight to discern and describe some examples of mathematical models in the natural sciences that have succeeded admirably, and he contrasts them with “typical regression models in the social sciences.” He suggests the comparison may be unfair and even cruel: in this suggestion he is correct; the comparison is certainly unfair. While social science comes off second-best by design, surely the tables would be turned were we to take shoddy examples of physical science and compare them with the best of theory-guided social science research, whether or not it uses regression analysis.
Chapter
In our various ways, Fienberg and I are both addressing the relevance of standard statistical models to social science research. I found it convenient to make part of my argument in terms of a comparison between the use of models in the natural sciences and in the social sciences. Fienberg seems to disagree more with my history lesson than with its conclusions, but that may be a matter of rhetoric—on both sides.
Chapter
Intuitive Statistics—Some Inferential Problems; Multiplicity—A Pervasive Problem; Some Remedies; Theories for Data Analysis; Uses for Mathematics; In Defense of Controlled Magical Thinking.
Article
The parametric likelihood and likelihood principle (LP) play a central role in parametric methodology and in the foundations of statistics. The main purpose of this article is to extend the concepts of likelihood and LP to general inferential aims and models covering, for example, prediction and empirical Bayes models. The likelihood function is the joint distribution of the data and the unobserved variables of inferential interest, considered as a function of the parameters and these inferential variables. LP is based on this likelihood, and the principles of sufficiency (SP) and conditionality (CP) are modified such that the equivalence (SP and CP) ⇔ LP remains valid, generalizing Birnbaum's theorem.
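In symbols, the extended likelihood described here can be written as follows, for data y, unobserved variables of inferential interest u, and parameter θ; the notation is mine, not the article's:

```latex
% Extended likelihood: the joint density of the data y and the unobserved
% inferential variables u, viewed as a function of (theta, u) for fixed y.
L(\theta, u \mid y) \;\propto\; p(y, u \mid \theta)
```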
Article
Data mining is a new discipline lying at the interface of statistics, database technology, pattern recognition, machine learning, and other areas. It is concerned with the secondary analysis of large databases in order to find previously unsuspected relationships which are of interest or value to the database owners. New problems arise, partly as a consequence of the sheer size of the data sets involved, and partly because of issues of pattern matching. However, since statistics provides the intellectual glue underlying the effort, it is important for statisticians to become involved. There are very real opportunities for statisticians to make significant contributions.
Article
Who is there that has not longed that the power and privilege of selection among alternatives should be taken away from him in some important crisis of his life, and that his conduct should be arranged for him, either this way or that, by some divine power if it were possible, — by some patriarchal power in the absence of divinity, — or by chance even, if nothing better than chance could be found to do it? (Anthony Trollope, Phineas Finn, Vol. II, Ch. LX.) In the design and analysis of an experiment there are several places where an element of randomization can be used: the design can be selected at random, the result can have a random element adjoined to it, or the random element already present can be used in the analysis. The first technique is much used by statisticians, for example in making a survey of a population; Basu (1980) calls it prerandomization.
Chapter
Regression models have not been so useful in the social sciences. In an attempt to see why, such models are contrasted with successful mathematical models in the natural sciences, including Kepler's three laws of planetary motion.