Statistical and Inductive Inference by Minimum Message Length


Book Description

The Minimum Message Length (MML) Principle is an information-theoretic approach to induction, hypothesis testing, model selection, and statistical inference. MML, which provides a formal specification for the implementation of Occam's Razor, asserts that the ‘best’ explanation of observed data is the shortest. Further, an explanation is acceptable (i.e. the induction is justified) only if the explanation is shorter than the original data. This book gives a sound introduction to the Minimum Message Length Principle and its applications, provides the theoretical arguments for the adoption of the principle, and shows the development of certain approximations that assist its practical application. MML appears also to provide both a normative and a descriptive basis for inductive reasoning generally, and scientific induction in particular. The book describes this basis and aims to show its relevance to the Philosophy of Science. Statistical and Inductive Inference by Minimum Message Length will be of special interest to graduate students and researchers in Machine Learning and Data Mining, scientists and analysts in various disciplines wishing to make use of computer techniques for hypothesis discovery, statisticians and econometricians interested in the underlying theory of their discipline, and persons interested in the Philosophy of Science. The book could also be used in a graduate-level course in Machine Learning and Estimation and Model-selection, Econometrics and Data Mining. C.S. Wallace was appointed Foundation Chair of Computer Science at Monash University in 1968, at the age of 35, where he worked until his death in 2004. He received an ACM Fellowship in 1995, and was appointed Professor Emeritus in 1996. Professor Wallace made numerous significant contributions to diverse areas of Computer Science, such as Computer Architecture, Simulation and Machine Learning. His final research focused primarily on the Minimum Message Length Principle.




Statistical and Inductive Inference by Minimum Message Length


Book Description

Mythanksareduetothemanypeoplewhohaveassistedintheworkreported here and in the preparation of this book. The work is incomplete and this account of it rougher than it might be. Such virtues as it has owe much to others; the faults are all mine. MyworkleadingtothisbookbeganwhenDavidBoultonandIattempted to develop a method for intrinsic classi?cation. Given data on a sample from some population, we aimed to discover whether the population should be considered to be a mixture of di?erent types, classes or species of thing, and, if so, how many classes were present, what each class looked like, and which things in the sample belonged to which class. I saw the problem as one of Bayesian inference, but with prior probability densities replaced by discrete probabilities re?ecting the precision to which the data would allow parameters to be estimated. Boulton, however, proposed that a classi?cation of the sample was a way of brie?y encoding the data: once each class was described and each thing assigned to a class, the data for a thing would be partially implied by the characteristics of its class, and hence require little further description. After some weeks’ arguing our cases, we decided on the maths for each approach, and soon discovered they gave essentially the same results. Without Boulton’s insight, we may never have made the connection between inference and brief encoding, which is the heart of this work.




Advances in Computing and Information - ICCI '90


Book Description

This volume contains selected and invited papers presented at the International Conference on Computing and Information, ICCI '90, Niagara Falls, Ontario, Canada, May 23-26, 1990. ICCI conferences provide an international forum for presenting new results in research, development and applications in computing and information. Their primary goal is to promote an interchange of ideas and cooperation between practitioners and theorists in the interdisciplinary fields of computing, communication and information theory. The four main topic areas of ICCI '90 are: - Information and coding theory, statistics and probability, - Foundations of computer science, theory of algorithms and programming, - Concurrency, parallelism, communications, networking, computer architecture and VLSI, - Data and software engineering, databases, expert systems, information systems, decision making, and AI methodologies.




The Minimum Description Length Principle


Book Description

This introduction to the MDL Principle provides a reference accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection.




Bayesian Networks and Decision Graphs


Book Description

This is a brand new edition of an essential work on Bayesian networks and decision graphs. It is an introduction to probabilistic graphical models including Bayesian networks and influence diagrams. The reader is guided through the two types of frameworks with examples and exercises, which also give instruction on how to build these models. Structured in two parts, the first section focuses on probabilistic graphical models, while the second part deals with decision graphs, and in addition to the frameworks described in the previous edition, it also introduces Markov decision process and partially ordered decision problems.




Statistical Inference as Severe Testing


Book Description

Mounting failures of replication in social and biological sciences give a new urgency to critically appraising proposed reforms. This book pulls back the cover on disagreements between experts charged with restoring integrity to science. It denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in a long run. If statistical consumers are unaware of assumptions behind rival evidence reforms, they can't scrutinize the consequences that affect them (in personalized medicine, psychology, etc.). The book sets sail with a simple tool: if little has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Many methods advocated by data experts do not stand up to severe scrutiny and are in tension with successful strategies for blocking or accounting for cherry picking and selective reporting. Through a series of excursions and exhibits, the philosophy and history of inductive inference come alive. Philosophical tools are put to work to solve problems about science and pseudoscience, induction and falsification.







Advances in Minimum Description Length


Book Description

A source book for state-of-the-art MDL, including an extensive tutorial and recent theoretical advances and practical applications in fields ranging from bioinformatics to psychology.




The Nature of Statistical Learning Theory


Book Description

The aim of this book is to discuss the fundamental ideas which lie behind the statistical theory of learning and generalization. It considers learning as a general problem of function estimation based on empirical data. Omitting proofs and technical details, the author concentrates on discussing the main results of learning theory and their connections to fundamental problems in statistics. This second edition contains three new chapters devoted to further development of the learning theory and SVM techniques. Written in a readable and concise style, the book is intended for statisticians, mathematicians, physicists, and computer scientists.




Coding Ockham's Razor


Book Description

This book explores inductive inference using the minimum message length (MML) principle, a Bayesian method which is a realisation of Ockham's Razor based on information theory. Accompanied by a library of software, the book can assist an applications programmer, student or researcher in the fields of data analysis and machine learning to write computer programs based upon this principle. MML inference has been around for 50 years and yet only one highly technical book has been written about the subject. The majority of research in the field has been backed by specialised one-off programs but this book includes a library of general MML–based software, in Java. The Java source code is available under the GNU GPL open-source license. The software library is documented using Javadoc which produces extensive cross referenced HTML manual pages. Every probability distribution and statistical model that is described in the book is implemented and documented in the software library. The library may contain a component that directly solves a reader's inference problem, or contain components that can be put together to solve the problem, or provide a standard interface under which a new component can be written to solve the problem. This book will be of interest to application developers in the fields of machine learning and statistics as well as academics, postdocs, programmers and data scientists. It could also be used by third year or fourth year undergraduate or postgraduate students.