Approaches to Improve the Precision of Similarity Patterns and Reproducibility for Cluster Analysis


Book Description

This study develops new clustering algorithms for the analysis of microarray gene expression data. Cluster analysis makes it possible to infer the function of genes in a cluster by referring to genes of known function in the same cluster. In microarray data, the expression profiles of thousands of genes are observed across different experimental conditions. Because of the complex experimental designs, observations from different experimental conditions may be correlated. To account for correlations between experimental conditions and correlations among genes, new clustering algorithms have been developed based on Bayesian infinite mixture models within a Bayesian data analysis framework. The correlations are taken into account by specifying accurate variance-covariance matrices in the statistical model definitions. As a result, when correlations are present, the new algorithms can represent the observed data precisely and produce more stable and reproducible clustering results. Mathematical and computational procedures have been developed and implemented in computer programs. A Gibbs sampler was used to estimate the posterior distribution of clusters. The posterior pairwise probability (PPP) that two genes co-cluster is obtained from the estimated distribution of the classification variables. Treating the PPPs as pairwise similarity measures, clusters are then formed using traditional hierarchical cluster analysis algorithms. The new algorithms and existing clustering algorithms were applied to simulated data as well as real-world data to compare their performance. When non-zero correlations exist, the new algorithms generally obtained more accurate and stable clustering results than the existing algorithms.
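
The step from Gibbs sampler output to clusters described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy label draws, and the choice of average linkage on 1 - PPP as the distance are assumptions made for illustration only. The sketch assumes each retained Gibbs iteration yields one cluster label per gene.

```python
from itertools import combinations


def posterior_pairwise_probabilities(draws):
    """Estimate PPP[i][j]: the fraction of retained draws in which genes i and j
    share a cluster label. `draws` is a list of label vectors, one per draw
    (hypothetical input format)."""
    n_genes = len(draws[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in draws:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_draws = len(draws)
    return [[c / n_draws for c in row] for row in counts]


def average_linkage_clusters(ppp, n_clusters):
    """Agglomerative clustering with average linkage, using 1 - PPP as the
    pairwise distance (an assumed choice; any hierarchical method could be used)."""
    clusters = [[i] for i in range(len(ppp))]

    def distance(a, b):
        # Mean of 1 - PPP over all cross-cluster gene pairs.
        return sum(1.0 - ppp[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of current clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Four toy draws for four genes; label values differ between draws,
    # but only co-membership matters.
    toy_draws = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    ppp = posterior_pairwise_probabilities(toy_draws)
    print(average_linkage_clusters(ppp, 2))
```

Because the PPP depends only on whether two genes share a label within a draw, the estimate is unaffected by cluster labels being permuted from one Gibbs iteration to the next.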







Unsupervised Classification


Book Description

Clustering is an important unsupervised classification technique where data points are grouped such that points that are similar in some sense belong to the same cluster. Cluster analysis is a complex problem as a variety of similarity and dissimilarity measures exist in the literature. This is the first book focused on clustering with a particular emphasis on symmetry-based measures of similarity and metaheuristic approaches. The aim is to find a suitable grouping of the input data set so that some criteria are optimized; on that basis, the authors frame clustering as an optimization problem in which the objectives to be optimized may represent different characteristics such as compactness, symmetrical compactness, separation between clusters, or connectivity within a cluster. They explain the techniques in detail and outline many detailed applications in data mining, remote sensing and brain imaging, gene expression data analysis, and face detection. The book will be useful to graduate students and researchers in computer science, electrical engineering, system science, and information technology, both as a text and as a reference book. It will also be useful to researchers and practitioners in industry working on pattern recognition, data mining, soft computing, metaheuristics, bioinformatics, remote sensing, and brain imaging.




Reproducibility and Replicability in Science


Book Description

One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some fear that it may be a symptom of a lack of rigor in science, while others argue that such an observed inconsistency can be an important precursor to new discovery. Concerns about reproducibility and replicability have been expressed in both scientific and popular media. As these concerns came to light, Congress requested that the National Academies of Sciences, Engineering, and Medicine conduct a study to assess the extent of issues related to reproducibility and replicability and to offer recommendations for improving rigor and transparency in scientific research. Reproducibility and Replicability in Science defines reproducibility and replicability and examines the factors that may lead to non-reproducibility and non-replicability in research. Unlike the typical expectation of reproducibility between two computations, expectations about replicability are more nuanced, and in some cases a lack of replicability can aid the process of scientific discovery. This report provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.




Computational Genomics with R


Book Description

Computational Genomics with R provides a starting point for beginners in genomic data analysis and also guides more advanced practitioners to sophisticated data analysis techniques in genomics. The book covers topics from R programming, to machine learning and statistics, to the latest genomic data analysis techniques. The text provides accessible information and explanations, always with the genomics context in the background. It also contains practical and well-documented examples in R so readers can analyze their data by simply reusing the code presented. As the field of computational genomics is interdisciplinary, it requires different starting points for people with different backgrounds. For example, a biologist might skip sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology. After reading this book: You will have the basics of R and be able to dive right into specialized uses of R for computational genomics such as using Bioconductor packages. You will be familiar with statistics and with the supervised and unsupervised learning techniques that are important in data modeling and exploratory analysis of high-dimensional data. You will understand genomic intervals and operations on them that are used for tasks such as aligned read counting and genomic feature annotation. You will know the basics of processing and quality checking high-throughput sequencing data. You will be able to do sequence analysis, such as calculating GC content for parts of a genome or finding transcription factor binding sites. You will know about visualization techniques used in genomics, such as heatmaps, meta-gene plots, and genomic track visualization. You will be familiar with analysis of different high-throughput sequencing data sets, such as RNA-seq, ChIP-seq, and BS-seq. You will know basic techniques for integrating and interpreting multi-omics datasets.
Altuna Akalin is a group leader and head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center, Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002. He has published an extensive body of work in this area. The framework for this book grew out of the yearly computational genomics courses he has been organizing and teaching since 2015.







Bioinformatic and Statistical Analysis of Microbiome Data


Book Description

This unique book addresses bioinformatic and statistical modelling and the analysis of microbiome data using the cutting-edge QIIME 2 and R software. It covers core analysis topics in both bioinformatics and statistics, providing a complete workflow for microbiome data analysis: from raw sequencing reads to community analysis and statistical hypothesis testing. It includes real-world data from the authors’ research and from the public domain, and discusses the implementation of QIIME 2 and R for data analysis step by step. The data as well as the QIIME 2 and R computer programs are publicly available, allowing readers to replicate the model development and data analysis presented in each chapter so that these new methods can be readily applied in their own research. Bioinformatic and Statistical Analysis of Microbiome Data is an ideal book for advanced graduate students and researchers in the clinical, biomedical, agricultural, and environmental fields, as well as those studying bioinformatics, statistics, and big data analysis.




DNA Methods in Food Safety


Book Description

Molecular typing of foodborne pathogens has become an indispensable tool in epidemiological studies. Thanks to these techniques, we now have a better understanding of the distribution and appearance of bacterial foodborne diseases and a deeper knowledge of the types of food products associated with the major foodborne pathogens. Among the molecular techniques, DNA-based techniques have prospered for more than 40 years and have been incorporated in the first surveillance systems to monitor bacterial foodborne pathogens in the United States and other countries. However, DNA techniques vary widely, and many microbiology laboratory personnel working with food and/or water face the dilemma of which method to incorporate. DNA Methods in Food Safety: Molecular Typing of Foodborne and Waterborne Bacterial Pathogens succinctly reviews more than 25 years of data on a variety of DNA typing techniques, summarizing the different mathematical models for analysis and interpretation of results, and detailing their efficacy in typing different foodborne and waterborne bacterial pathogens, including Campylobacter, Clostridium perfringens, Listeria, and Salmonella, among others. Section I describes the different DNA techniques used in the typing of bacterial foodborne pathogens, whilst Section II deals with the application of these techniques to type the most important bacterial foodborne pathogens. In Section II the emphasis is placed on the pathogen, and each chapter describes some of the most appropriate techniques for typing each bacterial pathogen. The techniques presented in this book are the most significant in the study of the molecular epidemiology of bacterial foodborne pathogens to date. The book therefore provides a unique reference for students and professionals in the fields of microbiology, food and water safety, epidemiology, and molecular epidemiology.




Electric Vehicle Integration via Smart Charging


Book Description

This book brings together important new contributions covering electric vehicle smart charging (EVSC) from a multidisciplinary group of global experts, providing a comprehensive look at EVSC and its role in meeting long-term goals for decarbonization of electricity generation and transportation. This multidisciplinary reference presents practical aspects and approaches to the technology, along with evidence from its applications to real-world energy systems. Electric Vehicle Integration via Smart Charging is suitable for practitioners and industry stakeholders working on EVSC, as well as researchers and developers from different branches of engineering, energy, transportation, economics, and operations research.