Statistical Methods for Family-Based Association Studies for Complex Human Diseases: Single-Locus and Haplotype Methods


Book Description

Disease-gene fine-mapping is an important task in human genetics. Linkage and association analyses are the two main approaches for identifying disease susceptibility genes. Chapter 1 reviews the development of disease-gene mapping methods over the past decades and presents the rationale for our new methods. Family-based association analyses have provided powerful tools for disease-gene mapping. The Association in the Presence of Linkage (APL) test, a family-based association method, can use nuclear families with multiple affected siblings and properly infer missing parental genotypes in the linkage region. In Chapter 2, we generalize and extend APL so that it can be applied to general nuclear family structures using a bootstrap variance estimator. Whereas the original APL can handle at most two affected siblings, the new APL can handle up to three. We also extend APL from a single-marker test to a multiple-marker haplotype analysis. In our simulations, the new APL maintains the correct type I error rate and has more power than other family-based association methods such as PDT, FBATD BAT, and PDTPHASE in nuclear families with missing parents. Chapter 2 also examines the robustness of APL when there are rare alleles or haplotypes, and when population substructure causes deviations from the Hardy-Weinberg Equilibrium (HWE) assumption. Genes on the X chromosome play a role in many common diseases, and linkage analyses have identified regions on the X chromosome with high linkage peaks for several diseases. However, few family-based association methods are currently available for X-chromosome markers. To fill this gap, Chapter 3 proposes a novel family-based association method, X-APL. X-APL is a modification of APL and shares some important properties with it. X-APL can also perform haplotype analyses, which is the only family-based test of associat.
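The abstract does not spell out the bootstrap variance estimator, so the sketch below shows only the generic idea, under stated assumptions: whole nuclear families are resampled with replacement, which keeps any correlation among affected siblings of the same family inside each replicate, and the variance of the statistic across replicates serves as its variance estimate. We assume each family has already been reduced to a single numeric contribution (a hypothetical per-family score); the actual APL score, including its inference of missing parental genotypes, is not reproduced here.

```python
import random
import statistics
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")


def family_bootstrap_variance(
    families: Sequence[F],
    statistic: Callable[[Sequence[F]], float],
    n_boot: int = 2000,
    seed: int = 1,
) -> float:
    """Bootstrap variance of `statistic`, resampling whole families.

    Families (not individuals) are the resampling units, so correlation
    between siblings inside a family is carried into every replicate.
    """
    rng = random.Random(seed)
    n = len(families)
    replicates = [statistic(rng.choices(families, k=n)) for _ in range(n_boot)]
    return statistics.variance(replicates)


def sum_of_scores(families):
    # Hypothetical statistic: each family is represented by one numeric
    # contribution, and the test statistic is the sum of contributions.
    return sum(families)


if __name__ == "__main__":
    family_score = [0.0, 1.0] * 50  # hypothetical per-family contributions
    print(family_bootstrap_variance(family_score, sum_of_scores))
```

For the 100 hypothetical contributions above, the bootstrap variance should land close to 100 × 0.25 = 25, the variance of a sum of 100 independent draws from that empirical distribution.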







Analysis of Complex Disease Association Studies


Book Description

According to the National Institutes of Health, a genome-wide association study is defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight) or with the presence or absence of a disease or condition. Whole-genome information, when combined with clinical and other phenotype data, offers the potential for increased understanding of basic biological processes affecting human health, improvement in the prediction of disease and patient care, and ultimately the realization of the promise of personalized medicine. In addition, rapid advances in understanding the patterns of human genetic variation, together with maturing high-throughput, cost-effective genotyping methods, are providing powerful research tools for identifying genetic variants that contribute to health and disease. This burgeoning science merges the principles of statistics and genetics to make sense of the vast amounts of information produced by genome mapping. To make the most of this information, statistical tools must be tailored to the analytical issues unique to large-scale association studies. Analysis of Complex Disease Association Studies will give researchers who have advanced biological knowledge and are entering the field of genome-wide association studies the groundwork to apply statistical analysis tools appropriately and effectively. Using consistent examples throughout, the chapters will provide readers with best practices for getting started (design), analyzing, and interpreting data according to their research interests. Frequently used tests will be highlighted, and a critical analysis of the advantages and disadvantages of each, complemented by case studies, will give readers the information they need to make the right choice for their research. 
Additional tools, including links to analysis tools, tutorials, and references, will be available electronically to ensure that the latest information is available. Key features include easy access to key information, such as the advantages and disadvantages of tests for particular applications, identification of databases, languages and their capabilities, data management risks, and frequently used tests; an extensive list of references, including links to tutorial websites; and case studies and tips and tricks.




Statistical Methods in Genetic Epidemiology


Book Description

This well-organized and clearly written text has a unique focus on methods of identifying the joint effects of genes and environment on disease patterns. It follows the natural sequence of research, taking readers through the study designs and statistical analysis techniques for determining whether a trait runs in families, testing hypotheses about whether a familial tendency is due to genetic or environmental factors or both, estimating the parameters of a genetic model, localizing and ultimately isolating the responsible genes, and finally characterizing their effects in the population. Examples from the literature on the genetic epidemiology of breast and colorectal cancer, among other diseases, illustrate this process. Although the book is oriented primarily towards graduate students in epidemiology, biostatistics and human genetics, it will also serve as a comprehensive reference work for researchers. Introductory chapters on molecular biology, Mendelian genetics, epidemiology, statistics, and population genetics will help make the book accessible to those coming from one of these fields without a background in the others. It strikes a good balance between epidemiologic study designs and statistical methods of data analysis.




Statistical Methods in Genetic Association


Book Description

Association studies offer great promise for dissecting the genetic basis of complex human diseases. The rapid expansion of genomic information and cost-effective genotyping technologies have enabled us to systematically interrogate the role of human genetic variation in common diseases through genome-wide association (GWA) mapping. However, the scale and complexity of such studies raise significant challenges in study design and data analysis. In this dissertation, we investigated several statistical problems relevant to population-based association studies and to the fine-scale mapping of genetic variants that influence susceptibility to complex diseases. First, we developed a variance-based effect size estimator for the locus-specific genetic effect. Compared with traditional measures, the proposed estimator is less sensitive to the risk allele frequency and to the population prevalence of the disease. We showed that a considerably large sample is required to estimate a moderate genetic effect accurately, and that the required sample size increases exponentially as the demand for precision increases. We next compared the power of different association test statistics. We observed that genotype-based single-locus tests are generally more powerful than multi-locus or haplotype-based statistics, especially for risk alleles whose effects are far from additive, and that the power of genotype-based tests can be uniformly improved by imposing an order restriction on the genotypic risks. Finally, we tested different GWA strategies and explored the factors that may influence the power of GWA studies through extensive simulations using empirical genotype data from the HapMap ENCODE Project. Our results indicate that current commercial genome-wide genotyping products can capture most common risk variants; however, their power to detect rare risk variants or variants within recombination hotspots is not satisfactory. 
We also showed that the properties of the risk variants (e.g., allele frequency, local recombination rate, and functional category) have significant impacts on the power of GWA. The results of this comprehensive exercise should help in developing efficient GWA studies.
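To make the genotype-based single-locus tests concrete, the sketch below computes two standard statistics on a 2 × 3 case-control genotype table: the 2-df Pearson genotypic chi-square test and the 1-df Cochran-Armitage trend test with additive scores (0, 1, 2). These textbook tests are chosen for illustration only; the dissertation's order-restricted test and its variance-based effect size estimator are not reproduced, and the counts are hypothetical.

```python
import math


def genotypic_chi2(cases, controls):
    """2-df Pearson chi-square for a 2 x 3 genotype table."""
    col_totals = [cases[j] + controls[j] for j in range(3)]
    n = sum(col_totals)
    chi2 = 0.0
    for row in (cases, controls):
        row_total = sum(row)
        for j in range(3):
            expected = row_total * col_totals[j] / n
            chi2 += (row[j] - expected) ** 2 / expected
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    return chi2, math.exp(-chi2 / 2)


def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) with genotype scores."""
    r_tot, s_tot = sum(cases), sum(controls)
    n = r_tot + s_tot
    col = [cases[j] + controls[j] for j in range(3)]
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * c for t, c in zip(scores, col))
    sum_t2n = sum(t * t * c for t, c in zip(scores, col))
    numerator = n * (n * sum_tr - r_tot * sum_tn) ** 2
    denominator = r_tot * s_tot * (n * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    cases = [10, 20, 30]     # hypothetical counts of genotypes aa, Aa, AA
    controls = [30, 20, 10]
    print(genotypic_chi2(cases, controls))
    print(trend_chi2(cases, controls))
```

In this roughly additive example both statistics equal 20, but the trend test's single degree of freedom gives it the smaller p-value; when genotypic risks are far from additive (for example, a recessive pattern), the 2-df test can be the more powerful of the two.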







Haplotype-based Statistical Inference for Case-control Genetic Association Studies with Complex Sampling


Book Description

With advances in human genome research, it is now believed that the risks of many complex diseases are triggered by the interplay of genetic susceptibilities and environmental exposures. The population-based case-control study (PBCCS) is widely used to investigate the role of genetic variants and environmental exposures in the etiology of complex diseases. There are numerous ways to implement the selection of cases and controls. In its simplest form, a simple random sampling (SRS) design is used to choose cases and controls from the diseased and disease-free populations, respectively. Although SRS is easy to conduct and the relevant statistical methodology is well developed, more sophisticated complex sampling designs (such as stratified, clustered, and multistage sampling) are needed to select cases and/or controls for a number of reasons. First, complex sampling is more time- and cost-efficient than SRS. Second, complex sampling can yield a representative sample and thus avoid biased selection of cases and/or controls. As a result, complex sampling is increasingly used in large-scale population-based case-control and cross-sectional genetic association studies. The analysis of complex sampling data, however, requires special attention for two reasons. First, varying selection probabilities, together with adjustments for nonresponse and incomplete coverage of the population at risk, result in a different population weight for each individual. Second, multistage clustered sampling induces non-negligible intra-cluster correlation. It is well recognized that ignoring these two complications can lead to invalid inferences. The literature on PBCCS with complex sampling is very limited, so there is a need for statistical methods that properly address the complications induced by complex sampling in genetic association studies. 
In this dissertation, we propose a series of innovative statistical methods for genetic association studies that account for various sampling designs. Robust variance estimators are developed using the Taylor linearization technique to incorporate differential weighting and clustering effects. Monte Carlo simulation studies are used to study the properties of the proposed estimators under various sampling designs. The application of the proposed methods is illustrated using the U.S. Kidney Cancer Study (USKCS), one of the largest PBCCS with genome-wide data available so far.
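As an illustration of the Taylor linearization idea, the sketch below estimates a weighted mean (for example, the weighted proportion of subjects carrying a given haplotype) together with a linearization variance that treats sampling clusters as primary sampling units. It assumes a single stratum and the common with-replacement approximation for first-stage sampling; the dissertation's haplotype-based estimators are more involved and are not reproduced here, and the example data are hypothetical.

```python
from collections import defaultdict


def weighted_mean_linearized(y, w, cluster):
    """Weighted (Hajek) mean of y with a Taylor-linearization variance.

    Clusters are treated as primary sampling units in a single stratum,
    using the with-replacement approximation for first-stage sampling.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized values z_i = w_i * (y_i - mean) / sum(w), summed by cluster.
    z_by_cluster = defaultdict(float)
    for wi, yi, ci in zip(w, y, cluster):
        z_by_cluster[ci] += wi * (yi - mean) / total_w
    totals = list(z_by_cluster.values())
    n_c = len(totals)
    if n_c < 2:
        raise ValueError("need at least two clusters")
    z_bar = sum(totals) / n_c
    var = n_c / (n_c - 1) * sum((t - z_bar) ** 2 for t in totals)
    return mean, var


if __name__ == "__main__":
    carrier = [1, 0, 1, 1]              # hypothetical 0/1 haplotype carrier status
    weight = [1.0, 1.0, 2.0, 2.0]       # hypothetical sampling weights
    psu = ["a", "a", "b", "b"]          # hypothetical cluster labels
    print(weighted_mean_linearized(carrier, weight, psu))
```

Ignoring the weights in this example would give 3/4 instead of 5/6, and treating the four subjects as independent would ignore the within-cluster correlation that the cluster-level totals capture.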




Novel Approaches to the Analysis of Family Data in Genetic Epidemiology


Book Description

Genome-wide association studies (GWAS) of complex disorders in large case-control populations have been performed on hundreds of traits in more than 1200 published studies (http://www.genome.gov/gwastudies/), but the variants detected by GWAS account for little of the heritability of these traits, which has led to increasing interest in family-based designs. While GWAS are designed to find common variants with low to moderate attributable risks, family-based studies are expected to find rare variants with high attributable risks. Because family-based designs can better control for both genetic and environmental background, they are robust to heterogeneity and population stratification. Moreover, in family-based analysis, the background genetic variation can be modeled to control the residual variance, which could increase the power to identify disease-associated rare variants. Analysis of families can also provide insight into disease transmission and inheritance patterns. Although family-based designs have the advantage of being robust to false positives, novel and powerful methods for analyzing families in genetic epidemiology are still needed, especially for studying interactions between genetic and environmental factors associated with disease. Moreover, with the rapid development of sequencing technology, advances in the design and analysis of sequencing data in families are also greatly needed. All 11 articles in this book introduce new methodology and, using family data, present substantial new findings in the areas of infectious diseases, diabetes, eye traits, autism spectrum disorder, and prostate cancer.
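Why family-based designs resist population stratification can be seen in the classic transmission/disequilibrium test (TDT), included here as an illustration rather than as a method from this book. Over heterozygous parents, the TDT counts how often a candidate allele is transmitted (b) versus not transmitted (c) to affected offspring; because the comparison is made within parents, allele-frequency differences between subpopulations do not inflate its type I error. The counts in the example are hypothetical.

```python
import math


def tdt(b: int, c: int):
    """Transmission/disequilibrium test (McNemar form).

    b: transmissions of the candidate allele from heterozygous parents
    c: non-transmissions of that allele from heterozygous parents
    Returns the 1-df chi-square statistic and its p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) survival function
    return chi2, p_value


if __name__ == "__main__":
    print(tdt(30, 10))  # hypothetical transmission counts
```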




Design, Analysis, and Interpretation of Genome-Wide Association Scans


Book Description

This book presents the statistical aspects of designing, analyzing, and interpreting the results of genome-wide association scans (GWAS) for genetic causes of disease using unrelated subjects. Particular detail is given to the practical aspects of employing the bioinformatics and data-handling methods needed to prepare data for statistical analysis. The goal in writing this book is to give statisticians, epidemiologists, and students in these fields the tools to design a powerful genome-wide study based on current technology, and to show readers how to analyze the resulting data. This book provides a compendium of well-established statistical methods based on single-SNP associations, as well as an introduction to more advanced statistical methods and issues. Because technology, for instance large-scale SNP arrays, is changing quickly, the text also offers significant lessons for future use with sequencing data, and its emphasis on statistical concepts that apply to finding disease associations irrespective of technology keeps it relevant for future applications. The author includes current bioinformatics tools while outlining the additional tools, issues, and needs arising from the extensive databases of future large-scale sequencing projects.
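On the design side, a rough power calculation for a single-SNP allelic test in unrelated cases and controls can be sketched with the normal approximation to a two-proportion comparison. The allele frequencies, sample sizes, and the genome-wide significance level of 5 × 10⁻⁸ used below are assumptions for illustration, not values taken from the book, and the approximation ignores the negligible opposite tail.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_cases: float, p_controls: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test.

    Each subject contributes two alleles, so allele counts are 2 * n.
    Uses the unpooled normal approximation for a difference in proportions.
    """
    z = NormalDist()
    se = sqrt(p_cases * (1 - p_cases) / (2 * n_cases)
              + p_controls * (1 - p_controls) / (2 * n_controls))
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(p_cases - p_controls) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical scenario: risk-allele frequency 0.35 in cases vs 0.30 in controls.
    for n in (1000, 2000, 5000):
        print(n, round(allelic_test_power(0.35, 0.30, n, n), 3))
```

Under these assumptions, power is very low with 1000 cases and 1000 controls and only becomes high at several thousand of each, which illustrates why genome-wide significance thresholds drive GWAS sample sizes.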




Comparison of Statistical Methods of Haplotype Reconstruction and Logistic Regression for Association Studies


Book Description

Investigating the association between disease and single nucleotide polymorphisms (SNPs) has been an established approach in genetic association studies, and more recently, investigating the association between disease and haplotypes has become another accepted method. Haplotypes are combinations of alleles at physically linked loci on a stretch of DNA; they can increase the power to find an association because they capture interactions among the included SNPs and take a larger region of the chromosome into consideration. Determining haplotypes experimentally or through family studies is costly and time-inefficient, so haplotype reconstruction by statistical methods has become common practice. The problem with computational methods is that ambiguous haplotypes introduce an extra source of error that must be included in the statistical analysis. This paper investigates methods of managing this error with three different logistic regression packages, two of which are specific to the analysis of genetic data. The methods are applied to simulated data and to a data set used to search for genetic risk factors for non-Hodgkin lymphoma.
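Statistical haplotype reconstruction is often done with an expectation-maximization (EM) algorithm. The sketch below estimates the four haplotype frequencies for two biallelic SNPs from unphased genotypes, assuming Hardy-Weinberg equilibrium; genotypes are coded as counts (0, 1, 2) of allele 1 at each SNP. Only double heterozygotes are ambiguous, and their probabilistic split between the two possible phasings is the kind of uncertainty a downstream logistic regression has to account for. This is a generic illustration, not the method of any of the three packages compared in the paper, and the example genotypes are hypothetical.

```python
from itertools import product

# Haplotypes as (allele at SNP 1, allele at SNP 2), alleles coded 0/1.
HAPLOTYPES = list(product((0, 1), repeat=2))


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes."""
    # Ordered haplotype pairs compatible with each genotype; under HWE the
    # probability of an ordered pair (h1, h2) is freq[h1] * freq[h2].
    compatible = []
    for g1, g2 in genotypes:
        pairs = [(h1, h2) for h1 in HAPLOTYPES for h2 in HAPLOTYPES
                 if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]
        if not pairs:
            raise ValueError(f"invalid genotype {(g1, g2)}")
        compatible.append(pairs)

    n = len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        # E-step: expected haplotype counts given current frequencies.
        counts = {h: 0.0 for h in HAPLOTYPES}
        for pairs in compatible:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), wt in zip(pairs, weights):
                counts[h1] += wt / total
                counts[h2] += wt / total
        # M-step: each individual carries two haplotypes.
        new = {h: c / (2 * n) for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    genos = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 2  # hypothetical data
    print(em_haplotype_freqs(genos))
```

With these hypothetical genotypes the double heterozygotes are resolved almost entirely as (1, 1)/(0, 0), so the estimates converge to 0.5 for haplotypes (1, 1) and (0, 0).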