Foundations of Agnostic Statistics


Book Description

Provides an introduction to modern statistical theory for social and health scientists while invoking minimal modeling assumptions.




Foundations of Data Science


Book Description

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.




OpenIntro Statistics


Book Description

The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. We feature real data whenever possible, and files for the entire textbook are freely available at openintro.org. On the website we also provide free videos, statistical software labs, lecture slides, course management tools, and many other helpful resources.




Modern Mathematical Statistics with Applications


Book Description

This 3rd edition of Modern Mathematical Statistics with Applications tries to strike a balance between mathematical foundations and statistical practice. The book provides a clear and current exposition of statistical concepts and methodology, including many examples and exercises based on real data gleaned from publicly available sources. Here is a small but representative selection of scenarios for our examples and exercises based on information in recent articles:
- Use of the “Big Mac index” by the publication The Economist as a humorous way to compare product costs across nations
- Visualizing how the concentration of lead levels in cartridges varies for each of five brands of e-cigarettes
- Describing the distribution of grip size among surgeons and how it impacts their ability to use a particular brand of surgical stapler
- Estimating the true average odometer reading of used Porsche Boxsters listed for sale on www.cars.com
- Comparing head acceleration after impact when wearing a football helmet with acceleration without a helmet
- Investigating the relationship between body mass index and foot load while running
The main focus of the book is on presenting and illustrating methods of inferential statistics used by investigators in a wide variety of disciplines, from actuarial science all the way to zoology. It begins with a chapter on descriptive statistics that immediately exposes the reader to the analysis of real data. The next six chapters develop the probability material that facilitates the transition from simply describing data to drawing formal conclusions based on inferential methodology. Point estimation, the use of statistical intervals, and hypothesis testing are the topics of the first three inferential chapters. The remainder of the book explores the use of these methods in a variety of more complex settings. This edition includes many new examples and exercises as well as an introduction to the simulation of events and probability distributions.
There are more than 1300 exercises in the book, ranging from very straightforward to reasonably challenging. Many sections have been rewritten with the goal of streamlining and providing a more accessible exposition. Output from the most common statistical software packages is included wherever appropriate (a feature absent from virtually all other mathematical statistics textbooks). The authors hope that their enthusiasm for the theory and applicability of statistics to real world problems will encourage students to pursue more training in the discipline.




Text as Data


Book Description

A guide for using computational text analysis to learn about the social world.
From social media posts and text messages to digital government documents and archives, researchers are bombarded with a deluge of text reflecting the social world. This textual data gives unprecedented insights into fundamental questions in the social sciences, humanities, and industry. Meanwhile new machine learning tools are rapidly transforming the way science and business are conducted. Text as Data shows how to combine new sources of data, machine learning tools, and social science research design to develop and evaluate new insights. Text as Data is organized around the core tasks in research projects using text: representation, discovery, measurement, prediction, and causal inference. The authors offer a sequential, iterative, and inductive approach to research design. Each research task is presented complete with real-world applications, example methods, and a distinct style of task-focused research. Bridging many divides (computer science and social science, the qualitative and the quantitative, and industry and academia), Text as Data is an ideal resource for anyone wanting to analyze large collections of text in an era when data is abundant and computation is cheap, but the enduring challenges of social science remain.
- Overview of how to use text as data
- Research design for a world of data deluge
- Examples from across the social sciences and industry




Foundations of Machine Learning, second edition


Book Description

A new edition of a graduate-level machine learning textbook that focuses on the analysis and theory of algorithms. This book is a general introduction to machine learning that can serve as a textbook for graduate students and a reference for researchers. It covers fundamental modern topics in machine learning while providing the theoretical basis and conceptual tools needed for the discussion and justification of algorithms. It also describes several key aspects of the application of these algorithms. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics. Foundations of Machine Learning is unique in its focus on the analysis and theory of algorithms. The first four chapters lay the theoretical foundation for what follows; subsequent chapters are mostly self-contained. Topics covered include the Probably Approximately Correct (PAC) learning framework; generalization bounds based on Rademacher complexity and VC-dimension; Support Vector Machines (SVMs); kernel methods; boosting; on-line learning; multi-class classification; ranking; regression; algorithmic stability; dimensionality reduction; learning automata and languages; and reinforcement learning. Each chapter ends with a set of exercises. Appendixes provide additional material, including a concise probability review. This second edition offers three new chapters, on model selection, maximum entropy models, and conditional entropy models. New material in the appendixes includes a major section on Fenchel duality, expanded coverage of concentration inequalities, and an entirely new entry on information theory. More than half of the exercises are new to this edition.




Demystifying Causal Inference


Book Description

This book provides an accessible introduction to causal inference and data analysis with R, specifically for a public policy audience. It aims to demystify these topics by presenting them through practical policy examples from a range of disciplines. It provides a hands-on approach to working with data in R using the popular tidyverse package. High-quality R packages for specific causal inference techniques, such as ggdag, Matching, rdrobust, and dosearch, are used in the book. The book is in two parts. The first part begins with a detailed narrative about John Snow’s heroic investigations into the cause of cholera. The chapters that follow cover basic elements of R, regression, and an introduction to causality using the potential outcomes framework and causal graphs. The second part covers specific causal inference methods, including experiments, matching, panel data, difference-in-differences, regression discontinuity design, instrumental variables, and meta-analysis, with the help of empirical case studies of policy issues. The book adopts a layered approach that makes it accessible and intuitive, using helpful concepts, applications, simulation, and data graphs. Many public policy questions are inherently causal, such as the effect of a policy on a particular outcome. Hence, the book would be of interest not only to students in public policy and executive education, but also to anyone interested in analysing data for application to public policy.




Target Estimation and Adjustment Weighting for Survey Nonresponse and Sampling Bias


Book Description

We elaborate a general workflow of weighting-based survey inference, decomposing it into two main tasks. The first is the estimation of population targets from one or more sources of auxiliary information. The second is the construction of weights that calibrate the survey sample to the population targets. We emphasize that these tasks are predicated on models of the measurement, sampling, and nonresponse process whose assumptions cannot be fully tested. After describing this workflow in abstract terms, we then describe in detail how it can be applied to the analysis of historical and contemporary opinion polls. We also discuss extensions of the basic workflow, particularly inference for causal quantities and multilevel regression and poststratification.




Integrating Inferences


Book Description

Develops a new approach to the use of causal models for qualitative and mixed-method research design and causal inference.




Elementary Probability for Applications


Book Description

This clear and lively introduction to probability theory concentrates on the results that are the most useful for applications, including combinatorial probability and Markov chains. Concise and focused, it is designed for a one-semester introductory course in probability for students who have some familiarity with basic calculus. Reflecting the author's philosophy that the best way to learn probability is to see it in action, there are more than 350 problems and 200 examples. The examples contain all the old standards such as the birthday problem and Monty Hall, but also include a number of applications not found in other books, from areas as broad ranging as genetics, sports, finance, and inventory management.