Statistical Analytics for Health Data Science with SAS and R


Book Description

This book compiles typical statistical methods for health data science, from the fundamental to the advanced. Although the book promotes applications to health and health-related data, its models can be used to analyze any kind of data. The data are analyzed with the commonly used statistical software R and SAS (with online supplementary material on SPSS/Stata). The data and computing programs will be available to facilitate readers’ learning experience. There has been considerable attention to making statistical methods and analytics available to health data science researchers and students. This book brings it all together to provide a concise point of reference for the most commonly used statistical methods, from the fundamental level to the advanced level. We envisage that this book will contribute to the rapid development of health data science. We provide straightforward explanations of the collected statistical theory and models, compilations of a variety of publicly available data, and illustrations of data analytics using SAS and R. We will have the data and computer programs available for readers to replicate and implement the new methods. The primary readers would be applied data scientists and practitioners in any field of data science, applied statistical analysts and scientists in public health, academic researchers, and graduate students in statistics and biostatistics. The secondary readers would be R&D professionals/practitioners in industry and governmental agencies. This book can be used for both teaching and applied research.




Data Science and Predictive Analytics


Book Description

This textbook integrates important mathematical foundations, efficient computational algorithms, applied statistical inference techniques, and cutting-edge machine learning approaches to address a wide range of crucial biomedical informatics, health analytics applications, and decision science challenges. Each concept in the book includes a rigorous symbolic formulation coupled with computational algorithms and complete end-to-end pipeline protocols implemented as functional R electronic markdown notebooks. These workflows support active learning and demonstrate comprehensive data manipulations, interactive visualizations, and sophisticated analytics. The content includes open problems, state-of-the-art scientific knowledge, ethical integration of heterogeneous scientific tools, and procedures for systematic validation and dissemination of reproducible research findings. Complementary to the enormous challenges related to handling, interrogating, and understanding massive amounts of complex structured and unstructured data, there are unique opportunities that come with access to a wealth of feature-rich, high-dimensional, and time-varying information. The topics covered in Data Science and Predictive Analytics address specific knowledge gaps, resolve educational barriers, and mitigate workforce information-readiness and data science deficiencies. Specifically, it provides a transdisciplinary curriculum integrating core mathematical principles, modern computational methods, advanced data science techniques, model-based machine learning, model-free artificial intelligence, and innovative biomedical applications. 
The book’s fourteen chapters start with an introduction and progressively build foundational skills from visualization to linear modeling, dimensionality reduction, supervised classification, black-box machine learning techniques, qualitative learning methods, unsupervised clustering, model performance assessment, feature selection strategies, longitudinal data analytics, optimization, neural networks, and deep learning. The second edition of the book includes additional learning-based strategies utilizing generative adversarial networks, transfer learning, and synthetic data generation, as well as eight complementary electronic appendices. This textbook is suitable for formal didactic instructor-guided course education, as well as for individual or team-supported self-learning. The material is pitched at upper-division and graduate-level college courses and covers applied and interdisciplinary mathematics, contemporary learning-based data science techniques, computational algorithm development, optimization theory, statistical computing, and biomedical sciences. The analytical techniques and predictive scientific methods described in the book may be useful to a wide range of readers, including formal and informal learners, college instructors, researchers, and engineers throughout academia, industry, government, and regulatory, funding, and policy agencies. The supporting book website provides many examples, datasets, functional scripts, complete electronic notebooks, extensive appendices, and additional materials.




Statistical Analytics for Health Data Science Using R/SAS


Book Description

This book compiles typical statistical methods for health data science, from the fundamental to the advanced. It promotes applications to health and health-related data, but its models can be used to analyse any kind of data. The data are analysed with the commonly used statistical software R and SAS (with online supplementary material on SPSS/Stata). The data and computing programs will be available to facilitate readers' learning experience. There has been considerable attention to making statistical methods and analytics available to health data science researchers and students. This book brings it all together to provide a concise point of reference for the most commonly used statistical methods, from the fundamental level to the advanced level. We envisage that this book will contribute to the rapid development of health data science. We provide straightforward explanations of the collected statistical theory and models, compilations of a variety of publicly available data, and illustrations of data analytics using SAS and R. We will have the data and computer programs available for readers to replicate and implement the new methods. The primary readers would be applied data scientists and practitioners in any field of data science, applied statistical analysts and scientists in public health, academic researchers, and graduate students in statistics and biostatistics. The secondary readers would be R&D professionals/practitioners in industry and governmental agencies. This book can be used for both teaching and applied research.




End-to-End Data Science with SAS


Book Description

Learn data science concepts with real-world examples in SAS! End-to-End Data Science with SAS: A Hands-On Programming Guide provides clear and practical explanations of the data science environment, machine learning techniques, and the SAS programming knowledge necessary to develop machine learning models in any industry. The book covers concepts including understanding the business need, creating a modeling data set, linear regression, parametric classification models, and non-parametric classification models. Real-world business examples and example code are used to demonstrate each process step-by-step. Although a significant amount of background information and supporting mathematics are presented, the book is not structured as a textbook, but rather it is a user’s guide for the application of data science and machine learning in a business environment. Readers will learn how to think like a data scientist, wrangle messy data, choose a model, and evaluate the model’s effectiveness. New data scientists or professionals who want more experience with SAS will find this book to be an invaluable reference. Take your data science career to the next level by mastering SAS programming for machine learning models.




The Little SAS Book


Book Description

A classic that just keeps getting better, The Little SAS Book is essential for anyone learning SAS programming. Lora Delwiche and Susan Slaughter offer a user-friendly approach so that readers can quickly and easily learn the most commonly used features of the SAS language. Each topic is presented in a self-contained, two-page layout complete with examples and graphics. Nearly every section has been revised to ensure that the sixth edition is fully up-to-date. This edition is also interface-independent, written for all SAS programmers whether they use SAS Studio, SAS Enterprise Guide, or the SAS windowing environment. New sections have been added covering PROC SQL, iterative DO loops, DO WHILE and DO UNTIL statements, %DO statements, using variable names with special characters, the ODS EXCEL destination, and the XLSX LIBNAME engine. This title belongs on every SAS programmer's bookshelf. It's a resource not just to get you started, but one you will return to as you continue to improve your programming skills.




Statistics for Health Data Science


Book Description

Students and researchers in the health sciences are faced with greater opportunity and challenge than ever before. The opportunity stems from the explosion in publicly available data that simultaneously informs and inspires new avenues of investigation. The challenge is that the analytic tools required go far beyond the standard methods and models of basic statistics. This textbook aims to equip health care researchers with the most important elements of a modern health analytics toolkit, drawing from the fields of statistics, health econometrics, and data science. This textbook is designed to overcome students’ anxiety about data and statistics and to help them to become confident users of appropriate analytic methods for health care research studies. Methods are presented organically, with new material building naturally on what has come before. Each technique is motivated by a topical research question, explained in non-technical terms, and accompanied by engaging explanations and examples. In this way, the authors cultivate a deep (“organic”) understanding of a range of analytic techniques, their assumptions and data requirements, and their advantages and limitations. They illustrate all lessons via analyses of real data from a variety of publicly available databases, addressing relevant research questions and comparing findings to those of published studies. Ultimately, this textbook is designed to cultivate health services researchers that are thoughtful and well informed about health data science, rather than data analysts. This textbook differs from the competition in its unique blend of methods and its determination to ensure that readers gain an understanding of how, when, and why to apply them. It provides the public health researcher with a way to think analytically about scientific questions, and it offers well-founded guidance for pairing data with methods for valid analysis. 
Readers should feel emboldened to tackle analysis of real public datasets using traditional statistical models, health econometrics methods, and even predictive algorithms. Accompanying code and data sets are provided on the authors' site: https://roman-gulati.github.io/statistics-for-health-data-science/




Learn R for Applied Statistics


Book Description

Gain the R programming language fundamentals for doing the applied statistics useful for data exploration and analysis in data science and data mining. This book covers topics ranging from R syntax basics, descriptive statistics, and data visualizations to inferential statistics and regressions. After learning R’s syntax, you will work through data visualizations such as histograms and boxplots, descriptive statistics, and inferential statistics such as t-tests, chi-square tests, ANOVA, non-parametric tests, and linear regressions. Learn R for Applied Statistics is a timely skills-migration book that equips you with the R programming fundamentals and introduces you to applied statistics for data exploration.

What You Will Learn
• Discover R, statistics, data science, data mining, and big data
• Master the fundamentals of R programming, including variables and arithmetic, vectors, lists, data frames, conditional statements, loops, and functions
• Work with descriptive statistics
• Create data visualizations, including bar charts, line charts, scatter plots, boxplots, and histograms
• Use inferential statistics including t-tests, chi-square tests, ANOVA, non-parametric tests, linear regressions, and multiple linear regressions

Who This Book Is For
Those who are interested in data science, in particular data exploration using applied statistics, and the use of R programming for data visualizations.




Likelihood Methods in Survival Analysis


Book Description

Many conventional survival analysis methods, such as the Kaplan-Meier method for survival function estimation and the partial likelihood method for estimating Cox model regression coefficients, were developed under the assumption that survival times are subject to right censoring only. However, in practice, survival time observations may include interval-censored data, especially when the exact time of the event of interest cannot be observed. When interval-censored observations are present in a survival dataset, one generally needs to consider likelihood-based methods for inference. If the survival model under consideration is fully parametric, then likelihood-based methods pose neither theoretical nor computational challenges. However, if the model is semi-parametric, there will be difficulties in both theoretical and computational aspects. Likelihood Methods in Survival Analysis: With R Examples explores these challenges and provides practical solutions. It not only covers conventional Cox models where survival times are subject to interval censoring, but also extends to more complicated models, such as stratified Cox models, extended Cox models where time-varying covariates are present, mixture cure Cox models, and Cox models with dependent right censoring. The book also discusses non-Cox models, particularly the additive hazards model and parametric log-linear models for bivariate survival times where there is dependence among competing outcomes.
Features
• Provides a broad and accessible overview of likelihood methods in survival analysis
• Covers a wide range of data types and models, from the semi-parametric Cox model with interval censoring through to parametric survival models for competing risks
• Includes many examples using real data to illustrate the methods
• Includes integrated R code for implementation of the methods
• Supplemented by a GitHub repository with datasets and R code

The book will make an ideal reference for researchers and graduate students of biostatistics, statistics, and data science whose interests in survival analysis extend beyond applications. It offers useful and solid training to those who wish to enhance their knowledge of the methodological and computational aspects of biostatistics.




Statistical Methods in Health Disparity Research


Book Description

• Presents an overview of methods and applications of health disparity estimation
• First book to synthesize research in this field in a unified statistical framework
• Covers classical approaches, and builds to more modern computational techniques
• Includes many worked examples and case studies using real data
• Discusses available software for estimation




Design and Analysis of Pragmatic Trials


Book Description

This book begins with an introduction to pragmatic cluster randomized trials (PCTs) and reviews various pragmatic issues that statisticians need to address at the design stage. It discusses the advantages and disadvantages of each type of PCT, and provides sample size formulas, sensitivity analyses, and examples for sample size calculation. The generalized estimating equation (GEE) method is employed to derive sample size formulas for various types of outcomes from the exponential family, including continuous, binary, and count variables. Experimental designs that have been frequently employed in PCTs are discussed, including cluster randomized designs, matched-pair cluster randomized design, stratified cluster randomized design, stepped-wedge cluster randomized design, longitudinal cluster randomized design, and crossover cluster randomized design. The book demonstrates that the GEE approach is flexible enough to accommodate pragmatic issues such as hierarchical correlation structures, different missing data patterns, and randomly varying cluster sizes. It has been reported that the GEE approach leads to under-estimated variance when the number of clusters is limited. The remedy for this limitation is investigated for the design of PCTs. This book can assist practitioners in the design of PCTs by describing the advantages and disadvantages of various PCTs and by providing sample size formulas that address various pragmatic issues, facilitating the proper implementation of PCTs to improve health care. It can also serve as a textbook for biostatistics students at the graduate level to enhance their knowledge or skill in clinical trial design.

Key Features:
• Discusses the advantages and disadvantages of each type of PCT, and provides sample size formulas, sensitivity analyses, and examples
• Addresses an unmet need for guidance books on sample size calculations for PCTs
• Covers a wide variety of experimental designs adopted by PCTs
• Offers sample size solutions that can be readily implemented because they accommodate common pragmatic issues encountered in real-world practice
• Useful to both academic and industrial biostatisticians involved in clinical trial design
• Can be used as a textbook for graduate students majoring in statistics and biostatistics