Statistical Models for Clustering Dynamic Gene Expression Profiles


Book Description

As a first attempt of its kind, we capitalize on the simplest wavelet shrinkage technique, based on the Haar wavelet, to break an original signal down into its spectrum by taking pairwise averages and differences and, subsequently, to detect gene clusters that differ in the smooth coefficients extracted from noisy time-series gene expression data. This wavelet-based model will have many implications for addressing biologically meaningful hypotheses about the interplay between gene actions/interactions and developmental pathways in various complex biological processes or networks.
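The averages-and-differences decomposition and the shrinkage step described above can be sketched as follows. This is a minimal illustration, not the book's implementation: it assumes an unnormalized Haar transform (pairwise averages and half-differences), a signal length divisible by 2^levels, and soft thresholding with a user-chosen threshold `t`. All function names are hypothetical. In this sketch, the returned smooth coefficients are the compact per-gene features on which a separate clustering step could operate.

```python
import math

def haar_step(signal):
    """One level of the (unnormalized) Haar transform: pairwise averages and differences."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details

def haar_decompose(signal, levels):
    """Repeatedly split the running averages.

    Returns the smooth coefficients and a list of detail coefficients (finest level first).
    """
    smooth = list(signal)
    details = []
    for _ in range(levels):
        smooth, d = haar_step(smooth)
        details.append(d)
    return smooth, details

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; coefficients with |c| <= t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def haar_reconstruct(smooth, details):
    """Invert haar_decompose: each (a, d) pair expands back to (a + d, a - d)."""
    current = list(smooth)
    for d in reversed(details):  # coarsest level first
        expanded = []
        for a, dd in zip(current, d):
            expanded.extend([a + dd, a - dd])
        current = expanded
    return current

# Example with a hypothetical 8-point expression time series.
signal = [4, 2, 5, 5, 1, 3, 6, 8]
smooth, details = haar_decompose(signal, levels=2)
# smooth == [4.0, 4.5]; details == [[1.0, 0.0, -1.0, -1.0], [-1.0, -2.5]]
denoised = haar_reconstruct(smooth, [soft_threshold(d, 1.0) for d in details])
```

Soft thresholding is used here because it is the common "shrinkage" rule; hard thresholding (zeroing small coefficients without shrinking the rest) would be an equally plausible variant.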




Gene Expression Data Analysis


Book Description

The development of high-throughput technologies in molecular biology during the last two decades has led to the production of tremendous amounts of data. Microarrays and RNA sequencing are two widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. The data produced by such experiments are voluminous (in both dimensionality and number of instances) and evolving in nature. Analyzing such huge amounts of data to identify interesting patterns relevant to a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge. Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be applied to real or simulated data. This book discusses a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge of statistical and machine-learning-based methods for analyzing gene expression data.
Key Features:
- An introduction to the Central Dogma of molecular biology and information flow in biological systems
- A systematic overview of the methods for generating gene expression data
- Background knowledge on statistical modeling and machine learning techniques
- Detailed methodology for analyzing gene expression data, with an example case study
- Clustering methods for finding co-expression patterns in microarray, bulk RNA-seq, and scRNA-seq data
- A large number of practical tools, systems, and repositories that are useful for computational biologists to create, analyze, and validate biologically relevant gene expression patterns
- Suitable for multidisciplinary researchers and practitioners in computer science and the biological sciences




The Analysis of Gene Expression Data


Book Description

This book presents practical approaches for the analysis of data from gene expression microarrays. It describes the conceptual and methodological underpinnings of statistical tools and their implementation in software. The book includes coverage of various packages that are part of the Bioconductor project and several related R tools. The material presented covers a range of software tools designed for varied audiences.




Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq


Book Description

Recent developments in high-throughput cDNA sequencing (RNA-seq) have revolutionized gene expression profiling. Gene expression profiling aims to compare the expression levels of multiple genes between two or more samples, under specific conditions or in a specific cell, to give a global picture of cellular function. Thanks to these advances, gene expression data are being generated at high throughput. One of the primary data analysis tasks in gene expression studies involves data-mining techniques such as clustering and classification. Clustering, an unsupervised learning technique, has been widely used as a computational tool to improve our understanding of the gene functions and regulation involved in a biological process. Cluster analysis aims to group the large number of genes in a gene expression data set so that similar or related genes fall in the same cluster and dissimilar or unrelated genes fall in different clusters. Classification, on the other hand, can be used to group samples based on their expression profiles. Many clustering and classification algorithms can be applied to gene expression experiments; the most widely used are hierarchical clustering, k-means clustering, and model-based clustering, which relies on a statistical model to determine the number of clusters. The appropriate clustering method depends on the structure of the data. In this chapter, we review state-of-the-art clustering algorithms and statistical approaches for grouping similar gene expression profiles that can be applied to RNA-seq data analysis, together with software tools dedicated to these methods. In addition, we discuss challenges in cluster analysis and compare the performance of eight commonly used clustering methods on four different public datasets from recount2.
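As a concrete illustration of one of the methods named above, the following is a minimal k-means (Lloyd's algorithm) sketch that clusters genes by their expression profiles using Euclidean distance. It assumes each gene is a list of expression values; for RNA-seq counts, the log2(count + 1) transform in the example is a common but here assumed preprocessing step. The function names, random initialization, iteration cap, and example counts are illustrative choices, not taken from the chapter.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, max_iter=100, seed=0):
    """Cluster genes (rows of `profiles`) into k groups with Lloyd's algorithm.

    Returns (labels, centroids); labels[i] is the cluster index of gene i.
    """
    rng = random.Random(seed)
    # Initialize centroids with k genes drawn at random without replacement.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member genes.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Example: four genes measured in three samples (hypothetical counts).
counts = [[0, 1, 0], [1, 0, 3], [40, 35, 50], [38, 45, 41]]
profiles = [[math.log2(c + 1) for c in row] for row in counts]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The cluster indices themselves are arbitrary (they depend on the random initialization); only the grouping is meaningful, which is why comparisons between methods typically use partition-agreement measures rather than raw labels.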




Advances in Bioinformatics and Computational Biology


Book Description

This book constitutes the refereed proceedings of the Third Brazilian Symposium on Bioinformatics, BSB 2008, held in Sao Paulo, Brazil, in August 2008, co-located with IWGD 2008, the International Workshop on Genomic Databases. The 14 revised full papers and 5 extended abstracts presented were carefully reviewed and selected from 41 submissions. The papers address a broad range of current topics in computational biology and bioinformatics, featuring original research in computer science, mathematics, and statistics as well as in molecular biology, biochemistry, genetics, medicine, microbiology, and other life sciences.




Statistics for Microarrays


Book Description

Interest in microarrays has increased considerably in the last ten years. This increase in the use of microarray technology has led to the need for good standards of microarray experimental notation and data representation, the introduction of standard experimental controls, and standard data normalization and analysis techniques. Statistics for Microarrays: Design, Analysis and Inference is the first book that presents a coherent and systematic overview of statistical methods at all stages of analysing microarray data – from getting good data to obtaining meaningful results.
- Provides an overview of statistics for microarrays, including experimental design, data preparation, image analysis, normalization, quality control, and statistical inference.
- Features many examples throughout using real data from microarray experiments.
- Integrates computational techniques into the text.
- Takes a very practical approach, suitable for statistically minded biologists.
- Supported by a website featuring colour images, software, and data sets.
Primarily aimed at statistically minded biologists, bioinformaticians, biostatisticians, and computer scientists working with microarray data, the book is also suitable for postgraduate students of bioinformatics.




Topics in Applied Statistics


Book Description

This volume presents 27 selected papers on topics that range from statistical applications in business and finance to applications in clinical trials and biomarker analysis. All papers feature original, peer-reviewed content. The editors intentionally selected papers covering many topics so that the volume will serve the whole statistical community and a variety of research interests. The papers represent selected contributions to the 21st ICSA Applied Statistics Symposium. The International Chinese Statistical Association (ICSA) Symposium took place June 23–26, 2012, in Boston, Massachusetts. It was co-sponsored by the International Society for Biopharmaceutical Statistics (ISBS) and the American Statistical Association (ASA). This is the inaugural proceedings volume to share research from the ICSA Applied Statistics Symposium.




Handbook of Statistical Bioinformatics


Book Description

Numerous fascinating breakthroughs in biotechnology have generated large volumes and diverse types of high-throughput data, which demand efficient and appropriate tools in computational statistics that integrate biological knowledge with computational algorithms. This volume collects chapters contributed by leading researchers to survey many active research topics and to promote the visibility of this research area. It is intended to serve as both an introduction and a reference for students and researchers interested in recent developments of computational statistics in computational biology.




Springer Handbook of Automation


Book Description

This handbook incorporates new developments in automation. It also presents a broad, well-structured collection of emerging application areas, such as medical systems and health, transportation, security and maintenance, service, construction, and retail, as well as production and logistics. The handbook is an ideal resource not only for automation experts but also for people new to this expanding field.




Approaches to Improve the Precision of Similarity Patterns and Reproducibility for Cluster Analysis


Book Description

This study develops new cluster analysis algorithms for microarray gene expression data. With cluster analysis, it is possible to infer the function of genes in a cluster by referring to genes of known function in the same cluster. In microarray data, thousands of gene expression profiles are observed across different experimental conditions. Because of complex experimental designs, the observations from different experimental conditions might be correlated. To account for correlations among experimental conditions and among genes, new clustering algorithms have been developed based on Bayesian infinite mixture models within a Bayesian data analysis framework. The correlations are taken into account by specifying accurate variance-covariance matrices in the statistical model definitions. In this way, when correlations are present, the new algorithms can represent the observed data precisely and consequently produce more stable and reproducible clustering results. Mathematical and computational procedures have been developed and implemented in appropriate computer programs. A Gibbs sampler was used to estimate the posterior distribution of clusters. Posterior pairwise probabilities (PPPs) of co-clustering for each pair of genes are obtained from the estimated distribution of the classification variable. Treating the PPPs as pairwise similarity measures, clusters are formed using traditional hierarchical cluster analysis algorithms. The new algorithms and existing clustering algorithms were applied to simulated and real-world data to compare their performance. Compared with the existing clustering algorithms, the new algorithms generally obtained more accurate and stable clustering results when non-zero correlations exist.
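The step from Gibbs output to final clusters can be sketched as follows, assuming the sampler has already produced a list of label vectors (one per retained draw, one label per gene). This minimal example computes each PPP as the fraction of draws in which two genes share a label, then applies average-linkage agglomerative clustering to the distance 1 - PPP. Average linkage, the function names, and the example draws are illustrative assumptions, not necessarily the choices made in the study.

```python
from itertools import combinations

def posterior_pairwise_probabilities(label_samples):
    """PPP[i][j] = fraction of Gibbs draws in which genes i and j share a cluster label."""
    n = len(label_samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(label_samples)
    return [[c / m for c in row] for row in counts]

def average_linkage(ppp, n_clusters):
    """Agglomerative clustering on the distance 1 - PPP with average linkage."""
    clusters = [[i] for i in range(len(ppp))]

    def dist(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(1.0 - ppp[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (combinations yields a < b).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical label vectors from four Gibbs draws for four genes.
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
ppp = posterior_pairwise_probabilities(draws)
print(average_linkage(ppp, n_clusters=2))  # [[0, 1], [2, 3]]
```

Because a PPP only asks whether two genes share a label within a draw, it is unaffected by label switching across Gibbs iterations (note the last draw above swaps labels 0 and 1 without changing the result).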