Gene Expression Data Analysis


Book Description

The development of high-throughput technologies in molecular biology during the last two decades has led to the production of tremendous amounts of data. Microarrays and RNA sequencing are two widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. The data produced by such experiments are voluminous (both in dimensionality and in number of instances) and evolving in nature. Analyzing these data to identify interesting patterns relevant to a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge. Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real-life or simulated settings. This book discusses a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge of statistical and machine-learning-based methods for analyzing gene expression data.
Key Features:
- An introduction to the Central Dogma of molecular biology and information flow in biological systems
- A systematic overview of the methods for generating gene expression data
- Background knowledge on statistical modeling and machine learning techniques
- Detailed methodology for analyzing gene expression data, with an example case study
- Clustering methods for finding co-expression patterns in microarray, bulk RNA-seq, and scRNA-seq data
- A large number of practical tools, systems, and repositories that computational biologists can use to create, analyze, and validate biologically relevant gene expression patterns
- Suitable for multidisciplinary researchers and practitioners in computer science and the biological sciences




Bayesian Mixtures and Gene Expression Profiling with Missing Data


Book Description

Missing values are one of the problems encountered in microarray data analysis. Many of the clustering algorithms applied in microarray data analysis require a complete data matrix. The traditional approach to the missing value problem is to fill in estimates by imputation. Once imputed, these estimates remain fixed during the subsequent clustering process, so poorly estimated values impair the reliability of the cluster analysis. In this study, we tested the ability of a novel clustering method based on a Bayesian infinite mixture model (IMM) to accommodate missing data. In a simulation study and on a prostate cancer dataset, by examining the specificity and sensitivity of the clusters, we demonstrated that the IMM method increases the precision of the cluster analysis without requiring prior imputation. IMM is more robust in clustering an incomplete dataset than traditional clustering methods, which require prior imputation.
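The central claim above is that a model-based clusterer can use incomplete gene profiles directly instead of imputing them first. The sketch below does not reproduce the IMM's Bayesian sampler. It assumes a finite mixture of diagonal-covariance Gaussians with already known parameters, and the function names `diag_gaussian_loglik` and `responsibilities` are hypothetical. It shows one standard way such a model can score a gene with missing entries: under a diagonal covariance, each missing coordinate integrates out, so every component's likelihood is evaluated on the observed coordinates only.

```python
import math

def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of profile x under a diagonal Gaussian.

    Missing entries (None) are skipped: with a diagonal covariance the
    missing coordinate's density integrates to 1, so marginalizing it
    out simply drops its term from the sum.
    """
    total = 0.0
    for xi, mu, v in zip(x, mean, var):
        if xi is None:
            continue
        total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

def responsibilities(x, weights, means, variances):
    """Posterior cluster-membership probabilities of x, using observed coordinates only."""
    logs = [math.log(w) + diag_gaussian_loglik(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)
    exps = [math.exp(l - peak) for l in logs]  # log-sum-exp trick for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `responsibilities([2.1, None, 1.9, None], [0.5, 0.5], [[2, 2, 2, 2], [-2, -2, -2, -2]], [[1] * 4, [1] * 4])` assigns almost all of the membership probability to the first cluster even though half of the profile is missing, and a profile with every entry missing falls back to the mixture weights.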




Proceedings of the Second International Conference on Computer and Communication Technologies


Book Description

The book covers all aspects of computing, communication, general sciences, and educational research presented at the Second International Conference on Computer & Communication Technologies, held on 24-26 July 2015 in Hyderabad. The conference was hosted by CMR Technical Campus in association with Division – V (Education & Research) CSI, India. After rigorous review, only high-quality papers were selected for inclusion in this book. The book is divided into three volumes, which cover a variety of topics including medical imaging, networks, data mining, intelligent computing, software design, image processing, mobile computing, digital signal and speech processing, video surveillance and processing, web mining, wireless sensor networks, circuit analysis, fuzzy systems, antennas and communication systems, biomedical signal processing and applications, cloud computing, embedded systems applications, and cyber security and digital forensics. Readers will benefit greatly from the technical content of these volumes.




A Study on Some Missing Value Estimation Algorithms for DNA Microarray Data


Book Description

This dissertation, "A Study on Some Missing Value Estimation Algorithms for DNA Microarray Data" by Ching-wan, Tai, 戴青雲, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to make the dissertation easier to print and read. All rights not granted by the above license are retained by the author.

Abstract of thesis entitled "A Study on Some Missing Value Estimation Algorithms for DNA Microarray Data", submitted by TAI Ching-Wan for the degree of Master of Philosophy at The University of Hong Kong in December 2006.

In this thesis, three missing value estimation algorithms for DNA microarray data, namely KNNimpute, SVDimpute, and LLSimpute, were studied. KNNimpute is a benchmark imputation method for missing value estimation. SVDimpute is an imputation method that makes use of the idea of principal component analysis. LLSimpute is a recently developed imputation method with outstanding performance. Each of these three algorithms was tested on three different DNA microarray datasets. The results confirmed that LLSimpute outperforms both KNNimpute and SVDimpute.

A new performance criterion, which measures the percentage error of the estimated data value against the true data value, was proposed to complement the commonly used normalized root mean squared error for measuring the performance of the algorithms. This new criterion allows more detailed comparisons among different algorithms.

A new result was obtained which shows that, when applying SVDimpute, estimation using a sufficiently large number of eigengenes may give satisfactory performance comparable to that of LLSimpute.
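The two performance measures named in the abstract can be made concrete with a small sketch. The abstract does not give formulas, so both definitions below are assumptions: `nrmse` follows the form common in the LLSimpute literature (RMSE of the estimates divided by the standard deviation of the true values; other papers normalize differently), and `percentage_errors` is a hypothetical per-entry reading of "percentage error of the estimated data value against the true data value", not necessarily the thesis's exact criterion.

```python
import math

def nrmse(true_vals, est_vals):
    """Normalized RMSE: RMSE of the estimates divided by the population std of the true values.

    Assumed definition; some papers normalize by a different quantity.
    """
    n = len(true_vals)
    rmse = math.sqrt(sum((e - t) ** 2 for t, e in zip(true_vals, est_vals)) / n)
    mean_t = sum(true_vals) / n
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in true_vals) / n)
    return rmse / std_t

def percentage_errors(true_vals, est_vals):
    """Per-entry percentage error |est - true| / |true| * 100 (hypothetical form).

    Entries whose true value is exactly 0 are skipped to avoid division by zero.
    """
    return [abs(e - t) / abs(t) * 100 for t, e in zip(true_vals, est_vals) if t != 0]
```

With true values `[1.0, 2.0, 3.0, 4.0]` and estimates `[1.1, 1.8, 3.3, 4.0]`, `nrmse` gives about 0.167, a single summary number, while `percentage_errors` returns one value per entry (roughly 10, 10, 10, and 0 percent). A per-entry list like this could be summarized in several ways, for instance as the fraction of entries below an error threshold, which is the kind of finer-grained comparison a single NRMSE value cannot show.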
In addition to the existing methods of using the Euclidean distance and the Pearson correlation coefficient for selecting neighboring genes in the LLSimpute algorithm, a new method using the vector angle was proposed. It was shown that the performance of LLSimpute improved when this new method was incorporated. Finally, to combine the distinct strengths of (i) SVDimpute, (ii) using Euclidean distances, and (iii) using vector angles to select neighboring genes, a mixed gene-selection strategy was proposed. The resulting algorithm outperforms all three existing algorithms.

DOI: 10.5353/th_b3936451
Subjects: Missing observations (Statistics); DNA microarrays - Statistical methods; Algorithms
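The neighbor-selection step that the thesis modifies can be illustrated as follows. This is a minimal sketch under stated assumptions, not the thesis's code: all function names are hypothetical, candidate genes are assumed to have no missing values, distances are computed only on the target gene's observed columns, and missing entries are filled with an inverse-distance weighted average of the neighbors in the style of KNNimpute. LLSimpute itself replaces that weighted average with a least-squares regression on the selected neighbors, which is omitted here.

```python
import math

def euclidean(a, b, cols):
    """Euclidean distance between profiles a and b restricted to the given columns."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

def vector_angle(a, b, cols):
    """Angle in radians between a and b restricted to the given columns (scale-invariant)."""
    dot = sum(a[j] * b[j] for j in cols)
    na = math.sqrt(sum(a[j] ** 2 for j in cols))
    nb = math.sqrt(sum(b[j] ** 2 for j in cols))
    if na == 0 or nb == 0:
        return math.pi / 2  # assumption: treat an all-zero profile as orthogonal
    cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp against floating-point drift
    return math.acos(cos)

def select_neighbors(target, candidates, k, metric):
    """Return the k complete candidate genes closest to target on its observed columns."""
    observed = [j for j, v in enumerate(target) if v is not None]
    ranked = sorted(candidates, key=lambda g: metric(target, g, observed))
    return ranked[:k], observed

def impute_knn(target, candidates, k, metric):
    """Fill None entries with an inverse-distance weighted average of the k neighbors."""
    neighbors, observed = select_neighbors(target, candidates, k, metric)
    # Small epsilon avoids division by zero when a neighbor matches exactly.
    weights = [1.0 / (metric(target, g, observed) + 1e-12) for g in neighbors]
    filled = list(target)
    for j, v in enumerate(target):
        if v is None:
            filled[j] = sum(w * g[j] for w, g in zip(weights, neighbors)) / sum(weights)
    return filled
```

For the target `[1.0, 2.0, None, 4.0]`, a candidate `[2.0, 4.0, 6.0, 8.0]` has the same shape at twice the amplitude, and `[1.1, 2.1, 2.9, 3.9]` is numerically close. With `k=1`, Euclidean distance picks the close gene and imputes about 2.9, while the vector angle picks the scaled gene (angle near zero) and imputes about 6.0. The two criteria can therefore disagree on which genes count as neighbors, and that disagreement is what a mixed gene-selection strategy would have to arbitrate.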




Computational Systems Bioinformatics


Book Description

Computational systems biology is a new and rapidly developing field of research concerned with understanding the structure and processes of biological systems at the molecular, cellular, tissue, and organ levels through computational modeling as well as novel information-theoretic data and image analysis methods. By focusing either on information processing of biological data or on modeling the physical and chemical processes of biosystems, and in combination with the recent breakthrough in deciphering the human genome, computational systems biology is poised to play a central role in disease prediction and preventive medicine, gene technology and pharmaceuticals, and other biotechnology fields. This book begins by introducing the basic mathematical, statistical, and data mining principles of computational systems biology, and then presents bioinformatics technology in microarray and sequence analysis step by step. Offering an insightful look into the effectiveness of the systems approach in computational biology, it focuses on recurrent themes in bioinformatics, biomedical applications, and future directions for research.




Encyclopedia of Bioinformatics and Computational Biology


Book Description

Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, Three Volume Set combines elements of computer science, information technology, mathematics, statistics, and biotechnology, providing the methodology and in silico solutions to mine biological data and processes. The book covers Theory, Topics and Applications, with a special focus on Integrative –omics and Systems Biology. The theoretical and methodological underpinnings of bioinformatics and computational biology (BCB), including phylogeny, are covered, as are more current areas of focus, such as translational bioinformatics, cheminformatics, and environmental informatics. Finally, the Applications section provides guidance for commonly asked questions. This major reference work spans basic and cutting-edge methodologies authored by leaders in the field, providing an invaluable resource for students, scientists, professionals in research institutes, and a broad swath of researchers in biotechnology and the biomedical and pharmaceutical industries.
- Brings together information from computer science, information technology, mathematics, statistics, and biotechnology
- Written and reviewed by leading experts in the field, providing a unique and authoritative resource
- Focuses on the main theoretical and methodological concepts before expanding on specific topics and applications
- Includes interactive images, multimedia tools, and crosslinking to further resources and databases




Data Mining for Biomedical Applications


Book Description

This book constitutes the refereed proceedings of the International Workshop on Data Mining for Biomedical Applications, BioDM 2006, held in Singapore in conjunction with the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). The 14 revised full papers presented together with one keynote talk were carefully reviewed and selected from 35 submissions. The papers are organized in topical sections




Research in Computational Molecular Biology


Book Description

This book constitutes the refereed proceedings of the 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014, held in Pittsburgh, PA, USA, in April 2014. The 35 extended abstracts were carefully reviewed and selected from 154 submissions. They report on original research in all areas of computational molecular biology and bioinformatics.




Computational Intelligence in Pattern Recognition


Book Description

This book features high-quality research papers presented at the 2nd International Conference on Computational Intelligence in Pattern Recognition (CIPR 2020), held at the Institute of Engineering and Management, Kolkata, West Bengal, India, on 4–5 January 2020. It includes practical development experiences in various areas of data analysis and pattern recognition, focusing on soft computing technologies, clustering and classification algorithms, rough set and fuzzy set theory, evolutionary computations, neural science and neural network systems, image processing, combinatorial pattern matching, social network analysis, audio and video data analysis, data mining in dynamic environments, bioinformatics, hybrid computing, big data analytics and deep learning. It also provides innovative solutions to the challenges in these areas and discusses recent developments.