Breakthroughs in Statistics


Book Description

Volume III includes further selections of articles that have initiated fundamental changes in statistical methodology. It contains articles published before 1980 that were overlooked in the previous two volumes, plus articles from the 1980s, all of them chosen after consulting many of today's leading statisticians.




Study Design and Statistical Analysis


Book Description

This book takes the reader through the entire research process: choosing a question, designing a study, collecting the data, using univariate, bivariate and multivariable analysis, and publishing the results. It does so by using plain language rather than complex derivations and mathematical formulae. It focuses on the nuts and bolts of performing research by asking and answering the most basic questions about doing research studies. Making good use of numerous tables, graphs and tips, this book helps to demystify the process. A generous number of up-to-date examples from the clinical literature give an illustrated and practical account of how to use multivariable analysis.




Statistical Methods in Water Resources


Book Description

Data on water quality and other environmental issues are being collected at an ever-increasing rate. In the past, however, the techniques used by scientists to interpret these data have not progressed as quickly. This is a book of modern statistical methods for the analysis of practical problems in water quality and water resources. The last fifteen years have seen major advances in the fields of exploratory data analysis (EDA) and robust statistical methods. The 'real-life' characteristics of environmental data tend to drive analysis towards the use of these methods. These advances are presented in a practical and relevant format. Alternative methods are compared, highlighting the strengths and weaknesses of each as applied to environmental data. Techniques for trend analysis and for dealing with water below the detection limit are covered; these topics are of great interest to consultants in water quality and hydrology and to scientists in state, provincial, and federal water resources and geological survey agencies. The practising water resources scientist will find real value in the worked examples, which use actual field data from case studies of environmental problems. Exercises at the end of each chapter enable the mechanics of the methodological process to be fully understood, with data sets included on diskette for easy use. The result is a book that is both up-to-date and immediately relevant to ongoing work in the environmental and water sciences.
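The trend-analysis techniques the book covers build on rank-based tests such as Mann-Kendall, which compares the sign of every pair of observations rather than their magnitudes and so tolerates skewed environmental data. A minimal sketch, assuming a regularly sampled series with no ties (the function name and data are illustrative, and the normal approximation below omits the tie correction a full treatment would include):

```python
import math

def mann_kendall(values):
    """Mann-Kendall test for a monotonic trend in a time series.

    Returns the S statistic (concordant minus discordant pairs) and an
    approximate two-sided p-value via the normal approximation.
    """
    n = len(values)
    # S sums the sign of (later value - earlier value) over all pairs i < j
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0   # variance of S under no trend
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)          # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, p

# Hypothetical annual concentrations that drift steadily upward
annual = [1.1, 1.3, 1.2, 1.6, 1.8, 1.7, 2.1, 2.4, 2.3, 2.9]
s, p = mann_kendall(annual)   # large positive S, small p-value
```

Because only the ordering of values matters, a single outlier or a log transformation of the data leaves S unchanged, which is exactly the robustness property that makes such tests attractive for water-quality records.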




A Nationwide Framework for Surveillance of Cardiovascular and Chronic Lung Diseases


Book Description

Chronic diseases are common and costly, yet they are also among the most preventable health problems. Comprehensive and accurate disease surveillance systems are needed to implement successful efforts that will reduce the burden of chronic diseases on the U.S. population. A number of sources of surveillance data, including population surveys, cohort studies, disease registries, administrative health data, and vital statistics, contribute critical information about chronic disease. But no central surveillance system provides the information needed to analyze how chronic disease affects the U.S. population, to identify public health priorities, or to track the progress of preventive efforts. A Nationwide Framework for Surveillance of Cardiovascular and Chronic Lung Diseases outlines a conceptual framework for building a national chronic disease surveillance system focused primarily on cardiovascular and chronic lung diseases. This system should be capable of providing data on disparities in the incidence and prevalence of these diseases by race, ethnicity, socioeconomic status, and geographic region, along with data on disease risk factors, clinical care delivery, and functional health outcomes. This coordinated surveillance system is needed to integrate and expand existing information across the multiple levels of decision making in order to generate actionable, timely knowledge for a range of stakeholders at the local, state or regional, and national levels. The recommendations presented in the report focus on data collection, resource allocation, monitoring activities, and implementation. The report also recommends that systems evolve along with new knowledge about emerging risk factors, advancing technologies, and new understanding of the basis for disease.
This report will inform decision-making among federal health agencies, especially the Department of Health and Human Services; public health and clinical practitioners; non-governmental organizations; and policy makers, among others.




An Introduction to Statistical Learning


Book Description

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged over the past twenty years in fields ranging from biology to finance, marketing, and astrophysics. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, deep learning, survival analysis, multiple testing, and more. Color graphics and real-world examples are used to illustrate the methods presented. The book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. Four of the authors co-wrote An Introduction to Statistical Learning, With Applications in R (ISLR), which has become a mainstay of undergraduate and graduate classrooms worldwide, as well as an important reference book for data scientists. One of the keys to its success was that each chapter contains a tutorial on implementing the analyses and methods presented in the R scientific computing environment. However, in recent years Python has become a popular language for data science, and there has been increasing demand for a Python-based alternative to ISLR. Hence, this book (ISLP) covers the same materials as ISLR but with labs implemented in Python. These labs will be useful both for Python novices and for experienced users.




Clinical Prediction Models


Book Description

The second edition of this volume provides insight and practical illustrations on how modern statistical concepts and regression methods can be applied in medical prediction problems, including diagnostic and prognostic outcomes. Many advances have been made in statistical approaches towards outcome prediction, but a sensible strategy is needed for model development, validation, and updating, such that prediction models can better support medical practice. There is an increasing need for personalized evidence-based medicine that uses an individualized approach to medical decision-making. In this Big Data era, there is expanded access to large volumes of routinely collected data and an increased number of applications for prediction models, such as targeted early detection of disease and individualized approaches to diagnostic testing and treatment. Clinical Prediction Models presents a practical checklist that needs to be considered for development of a valid prediction model. Steps include preliminary considerations such as dealing with missing values; coding of predictors; selection of main effects and interactions for a multivariable model; estimation of model parameters with shrinkage methods and incorporation of external data; evaluation of performance and usefulness; internal validation; and presentation formatting. The text also addresses common issues that make prediction models suboptimal, such as small sample sizes, exaggerated claims, and poor generalizability. The text is primarily intended for clinical epidemiologists and biostatisticians. Including many case studies and publicly available R code and data sets, the book is also appropriate as a textbook for a graduate course on predictive modeling in diagnosis and prognosis. While practical in nature, the book also provides a philosophical perspective on data analysis in medicine that goes beyond predictive modeling. 
Updates to this new and expanded edition include:
• A discussion of Big Data and its implications for the design of prediction models
• Machine learning issues
• More simulations with missing ‘y’ values
• Extended discussion on between-cohort heterogeneity
• Description of ShinyApp
• Updated LASSO illustration
• New case studies
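One step on the modeling checklist, estimation with shrinkage, can be illustrated with a toy ridge-penalized logistic fit. Everything here is a hypothetical sketch rather than the book's own code: the function, data, and penalty value are invented for illustration, and a real analysis would use an established package. The point is simply that the penalty pulls the estimated coefficient toward zero, trading a little bias for more stable out-of-sample performance in small samples.

```python
import math
import random

def fit_logistic(xs, ys, lam=0.0, lr=0.1, epochs=2000):
    """Fit a one-predictor logistic model by gradient descent.

    lam is a ridge (L2) penalty on the slope; the intercept is left
    unpenalized, as is conventional for shrinkage estimators.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # predicted probability
            g0 += (p - y) / n
            g1 += (p - y) * x / n
        g1 += lam * b1          # gradient of the ridge penalty term
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Simulated data with a true slope of 2 (illustrative, not from the book)
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(40)]
ys = [1 if random.random() < 1 / (1 + math.exp(-2 * x)) else 0 for x in xs]

_, slope_mle = fit_logistic(xs, ys, lam=0.0)     # unpenalized estimate
_, slope_shrunk = fit_logistic(xs, ys, lam=0.5)  # shrunk toward zero
```

Comparing the two fitted slopes shows the characteristic behavior: the penalized coefficient is smaller in magnitude than the maximum-likelihood one, which is what protects a prediction model against exaggerated effect estimates in small data sets.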




The Elements of Statistical Learning


Book Description

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit, and gradient boosting.




Small Clinical Trials


Book Description

Clinical trials are used to elucidate the most appropriate preventive, diagnostic, or treatment options for individuals with a given medical condition. Perhaps the most essential feature of a clinical trial is that it uses results based on a limited sample of research participants to determine whether the intervention is safe and effective or whether it is comparable to a comparison treatment. Sample size is a crucial component of any clinical trial. A trial with a small number of research participants is more prone to variability and carries a considerable risk of failing to demonstrate the effectiveness of a given intervention when one actually exists. This may occur in phase I (safety and pharmacologic profiles), phase II (pilot efficacy evaluation), and phase III (extensive assessment of safety and efficacy) trials. Although phase I and II studies may have smaller sample sizes, they usually have adequate statistical power, which is the committee's definition of a "large" trial. Sometimes a trial with as few as eight participants may have adequate statistical power, statistical power being the probability of rejecting the null hypothesis when the null hypothesis is false. Small Clinical Trials assesses the current methodologies and the appropriate situations for the conduct of clinical trials with small sample sizes. The report assesses the published literature on various strategies, such as (1) meta-analysis to combine disparate information from several studies, including Bayesian techniques as in the confidence profile method, and (2) other alternatives, such as assessing therapeutic results in a single treated population (e.g., astronauts) by sequentially measuring whether the intervention is falling above or below a preestablished probability outcome range and meeting predesigned specifications, as opposed to incremental improvement.
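The definition of statistical power above can be made concrete with a small Monte Carlo sketch. All names and numbers here are illustrative assumptions, not from the report, and a real trial analysis would use a t-test with estimated variances rather than the known-variance z-test below. The sketch shows why eight participants per arm can yield adequate power when the true effect is large, yet very little power for a modest effect:

```python
import math
import random

def power_two_arm(n_per_arm, effect, sd=1.0, sims=4000):
    """Monte Carlo power of a two-arm trial, analyzed with a z-test on
    the difference in group means (known common standard deviation sd).

    Power = fraction of simulated trials that reject the null hypothesis
    of no difference when the true difference is `effect`.
    """
    z_crit = 1.96                               # two-sided alpha = 0.05
    se = sd * math.sqrt(2.0 / n_per_arm)        # SE of the mean difference
    rng = random.Random(0)
    rejections = 0
    for _ in range(sims):
        treat = sum(rng.gauss(effect, sd) for _ in range(n_per_arm)) / n_per_arm
        ctrl = sum(rng.gauss(0.0, sd) for _ in range(n_per_arm)) / n_per_arm
        if abs(treat - ctrl) / se > z_crit:
            rejections += 1
    return rejections / sims

# Eight participants per arm: adequate power only for a very large effect
small_but_powerful = power_two_arm(8, effect=2.0)   # 2 SD difference
small_and_weak = power_two_arm(8, effect=0.3)       # 0.3 SD difference
```

Under these assumptions, the first trial rejects the null in well over 90% of simulations while the second does so only rarely, illustrating the committee's point that "large" is a statement about power, not headcount.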




Parametric Statistical Change Point Analysis


Book Description

Recently there has been keen interest in the statistical analysis of change point detection and estimation, mainly because change point problems are encountered in many disciplines, such as economics, finance, medicine, psychology, geology, and literature, and even in our daily lives. From the statistical point of view, a change point is a place or time point such that the observations follow one distribution up to that point and another distribution after it. The multiple change point problem can be defined similarly. The change point problem is thus twofold: one task is to decide whether there is any change (often viewed as a hypothesis testing problem); the other is to locate the change point when a change is present (often viewed as an estimation problem). The earliest change point studies can be traced back to the 1950s. During the following period of some forty years, numerous articles were published in various journals and proceedings. Many of them cover the topic of a single change point in the means of a sequence of independent, normally distributed random variables. Another popular topic is a change point in regression models such as linear regression and autoregression. The methods used are mainly likelihood ratio, nonparametric, and Bayesian. A few authors have also considered the change point problem in other model settings, such as the gamma and exponential.
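The estimation side of the problem can be sketched for the classic setting above: a single shift in the mean of a normal sequence with common variance. In that case, maximizing the likelihood ratio over candidate split points is equivalent to choosing the split that minimizes the pooled residual sum of squares of the two segments. A minimal sketch (the function name and data are illustrative):

```python
def change_point_in_mean(xs):
    """Estimate a single change point in the mean of a sequence.

    Scans every split k, fits each segment by its own mean, and returns
    the k minimizing the total residual sum of squares, which for normal
    data with common variance maximizes the likelihood ratio.
    """
    def rss(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    best_k, best_rss = None, float("inf")
    for k in range(1, len(xs)):              # split into xs[:k] and xs[k:]
        total = rss(xs[:k]) + rss(xs[k:])
        if total < best_rss:
            best_k, best_rss = k, total
    return best_k

# Toy series whose mean jumps from about 0 to about 3 after ten points
series = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.1, -0.3, 0.0, 0.1,
          3.1, 2.8, 3.2, 2.9, 3.0, 3.1, 2.7, 3.3, 3.0, 2.9]
cp = change_point_in_mean(series)   # index where the second segment starts
```

The detection half of the twofold problem would then compare the best split's likelihood-ratio statistic against a null threshold before trusting the estimated location; the scan above only answers "where," not "whether."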




Statistical Procedures for Agricultural Research


Book Description

Here in one easy-to-understand volume are the statistical procedures and techniques the agricultural researcher needs to know in order to design, implement, analyze, and interpret the results of most experiments with crops. Designed specifically for the non-statistician, this valuable guide focuses on the practical problems of the field researcher. Throughout, it emphasizes the use of statistics as a tool of research, one that will help pinpoint research problems and select remedial measures. Whenever possible, mathematical formulations and statistical jargon are avoided. Originally published by the International Rice Research Institute, this widely respected guide has been thoroughly updated and much expanded in this Second Edition. It now features new chapters on the analysis of multi-observation data and of experiments conducted over time and space. Also included is a chapter on experiments in farmers' fields, a subject of major concern in developing countries, where agricultural research is commonly conducted outside experiment stations. Statistical Procedures for Agricultural Research, Second Edition will prove equally useful to students and professional researchers in all agricultural and biological disciplines. A wealth of examples of actual experiments helps readers choose the statistical method best suited to their needs and enables even the most complicated procedures to be easily understood and directly applied. An International Rice Research Institute Book