The Foundations of Statistics: A Simulation-based Approach


Book Description

Statistics and hypothesis testing are routinely used in areas (such as linguistics) that are traditionally not mathematically intensive. In such fields, when faced with experimental data, many students and researchers tend to rely on commercial packages to carry out statistical data analysis, often without understanding the logic of the statistical tests they rely on. As a consequence, results are often misinterpreted, and users have difficulty in flexibly applying techniques relevant to their own research; they use whatever they happen to have learned. A simple solution is to teach the fundamental ideas of statistical hypothesis testing without using too much mathematics. This book provides a non-mathematical, simulation-based introduction to basic statistical concepts and encourages readers to try out the simulations themselves using the source code and data provided (the freely available programming language R is used throughout). Since the code presented in the text almost always requires the use of previously introduced programming constructs, diligent students also acquire basic programming abilities in R. The book is intended for advanced undergraduate and graduate students in any discipline, although the focus is on linguistics, psychology, and cognitive science. It is designed for self-instruction, but it can also be used as a textbook for a first course on statistics. Earlier versions of the book have been used in undergraduate and graduate courses in Europe and the US.

“Vasishth and Broe have written an attractive introduction to the foundations of statistics. It is concise, surprisingly comprehensive, self-contained and yet quite accessible. Highly recommended.” Harald Baayen, Professor of Linguistics, University of Alberta, Canada

“By using the text students not only learn to do the specific things outlined in the book, they also gain a skill set that empowers them to explore new areas that lie beyond the book’s coverage.” Colin Phillips, Professor of Linguistics, University of Maryland, USA




Probability, Statistics, and Data


Book Description

This book is a fresh approach to a calculus-based first course in probability and statistics, using R throughout to give a central role to data and simulation. The book introduces probability with Monte Carlo simulation as an essential tool. Simulation makes challenging probability questions quickly accessible and easily understandable. Mathematical approaches are included, using calculus when appropriate, but are always connected to experimental computations. Using R and simulation gives a nuanced understanding of statistical inference. The impact of departure from assumptions in statistical tests is emphasized, quantified using simulations, and demonstrated with real data. The book compares parametric and non-parametric methods through simulation, allowing for a thorough investigation of testing error and power. The text builds R skills from the outset, allowing modern methods of resampling and cross-validation to be introduced along with traditional statistical techniques. Fifty-two data sets are included in the complementary R package fosdata. Most of these data sets are from recently published papers, so that you are working with current, real data, which is often large and messy. Two central chapters use powerful tidyverse tools (dplyr, ggplot2, tidyr, stringr) to wrangle data and produce meaningful visualizations. Preliminary versions of the book have been used for five semesters at Saint Louis University, and the majority of the more than 400 exercises have been classroom tested.




Introductory Statistics with Randomization and Simulation


Book Description

This textbook may be downloaded as a free PDF on the project's website, and the paperback is sold royalty-free. OpenIntro develops free textbooks and course resources for introductory statistics that exceed the quality standards of traditional textbooks and resources, and that maximize accessibility options for the typical student. The approach taken in this textbook differs from OpenIntro Statistics in its introduction to inference. The foundations for inference are provided using randomization and simulation methods. Once a solid foundation is formed, a transition is made to traditional approaches, where the normal and t distributions are used for hypothesis testing and the construction of confidence intervals.




Foundations of Data Science


Book Description

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.




Simulation-based Econometric Methods


Book Description

This book introduces a new generation of statistical econometrics. After linear models leading to analytical expressions for estimators, and non-linear models using numerical optimization algorithms, the availability of high-speed computing has enabled econometricians to consider econometric models without simple analytical expressions. The previous difficulties presented by the presence of integrals of large dimensions in the probability density functions or in the moments can be circumvented by a simulation-based approach. After a brief survey of classical parametric and semi-parametric non-linear estimation methods and a description of problems in which criterion functions contain integrals, the authors present a general form of the model where it is possible to simulate the observations. They then move to calibration problems and the simulated analogue of the method of moments, before considering simulated versions of maximum likelihood, pseudo-maximum likelihood, or non-linear least squares. The general principle of indirect inference is presented and is then applied to limited dependent variable models and to financial series.




OpenIntro Statistics


Book Description

The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. We feature real data whenever possible, and files for the entire textbook are freely available on our website, openintro.org. We also provide free videos, statistical software labs, lecture slides, course management tools, and many other helpful resources.




Statistical Inference as Severe Testing


Book Description

Mounting failures of replication in social and biological sciences give a new urgency to critically appraising proposed reforms. This book pulls back the cover on disagreements between experts charged with restoring integrity to science. It denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in a long run. If statistical consumers are unaware of assumptions behind rival evidence reforms, they can't scrutinize the consequences that affect them (in personalized medicine, psychology, etc.). The book sets sail with a simple tool: if little has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Many methods advocated by data experts do not stand up to severe scrutiny and are in tension with successful strategies for blocking or accounting for cherry picking and selective reporting. Through a series of excursions and exhibits, the philosophy and history of inductive inference come alive. Philosophical tools are put to work to solve problems about science and pseudoscience, induction and falsification.




All of Statistics


Book Description

Taken literally, the title "All of Statistics" is an exaggeration. But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like non-parametric curve estimation, bootstrapping, and classification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analysing data.




Introduction to Statistical Investigations


Book Description

Introduction to Statistical Investigations leads students to learn about the process of conducting statistical investigations from data collection, to exploring data, to statistical inference, to drawing appropriate conclusions. The text is designed for a one-semester introductory statistics course. It focuses on genuine research studies, active learning, and effective use of technology. Simulations and randomization tests introduce statistical inference, yielding a strong conceptual foundation that bridges students to theory-based inference approaches. Repetition allows students to see the logic and scope of inference. This implementation follows the GAISE recommendations endorsed by the American Statistical Association.




Improving and extending quantitative reasoning in second language research


Book Description

Currents in Language Learning is a biennial book series published by Wiley and the Language Learning Research Club at the University of Michigan. It provides programmatic state-of-the-art overviews of current issues in the language sciences and their applications in first, second, and bi/multilingual language acquisition in naturalistic and tutored contexts. It brings together disciplinary perspectives from linguistics, psychology, education, anthropology, sociology, cognitive science, and neuroscience.