Advances in Genomic Sequence Analysis and Pattern Discovery


Book Description

Mapping genomic landscapes is one of the most exciting frontiers of science. We have the opportunity to reverse engineer the blueprints and the control systems of living organisms. Computational tools are key enablers in the deciphering process. This book provides an in-depth presentation of some of the important computational biology approaches to genomic sequence analysis. The first section of the book discusses methods for discovering patterns in DNA and RNA. The second section reflects on these methods from several angles, including performance, usage, and paradigms.




Advances in Bioinformatics


Book Description

This book presents the latest developments in bioinformatics, highlighting the importance of bioinformatics in genomics, transcriptomics, metabolism and cheminformatics analysis, as well as in drug discovery and development. It covers tools, data mining and analysis, protein analysis, and computational vaccine and drug design. Covering cheminformatics, computational evolutionary biology and the role of next-generation sequencing and neural network analysis, it also discusses the use of bioinformatics tools in the development of precision medicine. This book offers a valuable source of information not only for beginners in bioinformatics, but also for students, researchers, scientists, clinicians, practitioners, policymakers, and stakeholders who are interested in harnessing the potential of bioinformatics in many areas.




Pattern Discovery in Biomolecular Data


Book Description

Finding patterns in biomolecular data, particularly in DNA and RNA, is at the center of modern biological research. These data are complex and growing rapidly, so the search for patterns requires increasingly sophisticated computer methods. Pattern Discovery in Biomolecular Data provides a clear, up-to-date summary of the principal techniques. Each chapter is self-contained, and the techniques are drawn from many fields, including graph theory, information theory, statistics, genetic algorithms, computer visualization, and vision. Since pattern searches often benefit from multiple approaches, the book presents methods in their purest form so that readers can best choose the method or combination that fits their needs. The chapters focus on finding patterns in DNA, RNA, and protein sequences, finding patterns in 2D and 3D structures, and choosing system components. This volume will be invaluable for all workers in genomics and genetic analysis, and others whose research requires biocomputing.




Efficient Large-Scale Machine Learning Algorithms for Genomic Sequences


Book Description

High-throughput sequencing (HTS) has led to many breakthroughs in basic and translational biology research. With this technology, researchers can interrogate whole genomes at single-nucleotide resolution. The large volume of data generated by HTS experiments necessitates the development of novel algorithms that can efficiently process these data. At the advent of HTS, several rudimentary methods were proposed. Often, these methods applied compromising strategies such as discarding a majority of the data or reducing the complexity of the models. This thesis focuses on the development of machine learning methods for efficiently capturing complex patterns from high volumes of HTS data.

First, we focus on de novo motif discovery, a popular sequence analysis method that predates HTS. Given multiple input sequences, the goal of motif discovery is to identify one or more candidate motifs, which are biopolymer sequence patterns that are conjectured to have biological significance. In the context of transcription factor (TF) binding, motifs may represent the sequence binding preference of proteins. Traditional motif discovery algorithms do not scale well with the number of input sequences, which can make motif discovery intractable for the volume of data generated by HTS experiments. One common solution is to perform motif discovery on only a small fraction of the sequences. Scalable algorithms that simplify the motif models are popular alternatives. Our approach is a stochastic method that is scalable and retains the modeling power of past methods.

Second, we leverage deep learning methods to annotate the pathogenicity of genetic variants. Deep learning is a class of machine learning algorithms concerned with deep neural networks (DNNs). DNNs use a cascade of layers of nonlinear processing units for feature extraction and transformation. Each layer uses the output from the previous layer as its input. Similar to our novel motif discovery algorithm, artificial neural networks can be efficiently trained in a stochastic manner. Using a large labeled dataset composed of tens of millions of pathogenic and benign genetic variants, we trained a deep neural network to discriminate between the two categories. Previous methods either focused only on variants lying in protein coding regions, which cover less than 2% of the human genome, or applied simpler models such as linear support vector machines, which usually cannot capture nonlinear patterns as deep neural networks can.

Finally, we discuss convolutional (CNN) and recurrent (RNN) neural networks, variations of DNNs that are especially well-suited for studying sequential data. Specifically, we stacked a bidirectional recurrent layer on top of a convolutional layer to form a hybrid model. The model accepts raw DNA sequences as inputs and predicts chromatin markers, including histone modifications, open chromatin, and transcription factor binding. In this specific application, the convolutional kernels are analogous to motifs, hence the model learning is essentially also performing motif discovery. Compared to a pure convolutional model, the hybrid model requires fewer free parameters to achieve superior performance. We conjecture that the recurrent layer allows our model to capture spatial and orientation dependencies among motifs better than a pure convolutional model can. With some modifications to this framework, the model can accept cell type-specific features, such as gene expression and open chromatin DNase I cleavage, to accurately predict transcription factor binding across cell types. We submitted our model to the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge, where it was among the top performing models. We implemented several novel heuristics, which significantly reduced the training time and the computational overhead. These heuristics were instrumental in meeting the Challenge deadlines and in making the method more accessible to the research community.

HTS has already transformed the landscape of basic and translational research, proving itself as a mainstay of modern biological research. As more data are generated and new assays are developed, there will be an increasing need for computational methods to integrate the data to yield new biological insights. We have only begun to scratch the surface of discovering what is possible from both an experimental and a computational perspective. Thus, further development of versatile and efficient statistical models is crucial to maintaining the momentum for new biological discoveries.
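The remark in the description above that convolutional kernels are analogous to motifs can be illustrated with a small sketch. It is not the thesis's model: the function names (`one_hot`, `conv1d_scan`, `motif_kernel`, `best_match`) and the hand-set kernel weights are assumptions made for illustration, whereas a trained network would learn its kernel weights from data. The sketch one-hot encodes a DNA string and slides a single kernel along it, which is the same operation a first convolutional layer applies before any nonlinearity.

```python
# Illustrative sketch only: one convolutional kernel scanning one-hot DNA.
# Kernel weights are hand-set for the example, not learned values.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif):
    """Hypothetical kernel: +1 for the expected base at each position, -1 otherwise."""
    return [[1.0 if b == base else -1.0 for b in BASES] for base in motif.upper()]


def conv1d_scan(encoded, kernel):
    """Slide the kernel along the sequence (no padding); return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


def best_match(seq, motif):
    """Return (position, score) of the highest-scoring window."""
    scores = conv1d_scan(one_hot(seq), motif_kernel(motif))
    position = max(range(len(scores)), key=scores.__getitem__)
    return position, scores[position]


if __name__ == "__main__":
    print(best_match("GGCTATAAGC", "TATA"))
```

With this scoring scheme a window that matches the motif exactly scores the motif length, so the highest activation marks where the kernel's pattern occurs in the sequence.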




Next Generation Sequencing


Book Description

Next generation sequencing (NGS) has surpassed the traditional Sanger sequencing method to become the main choice for large-scale, genome-wide sequencing studies with ultra-high-throughput production and a huge reduction in costs. NGS technologies have had an enormous impact on studies of structural and functional genomics in all the life sciences. In this book, Next Generation Sequencing: Advances, Applications and Challenges, the sixteen chapters written by experts cover various aspects of NGS including genomics, transcriptomics and methylomics, the sequencing platforms, and the bioinformatics challenges in processing and analysing huge amounts of sequencing data. Following an overview of the evolution of NGS in the brave new world of omics, the book examines the advances and challenges of NGS applications in basic and applied research on microorganisms, agricultural plants and humans. This book is of value to all who are interested in DNA sequencing and bioinformatics across all fields of the life sciences.




Genomics at the Nexus of AI, Computer Vision, and Machine Learning


Book Description

The book provides a comprehensive understanding of cutting-edge research and applications at the intersection of genomics and advanced AI techniques and serves as an essential resource for researchers, bioinformaticians, and practitioners looking to leverage genomics data for AI-driven insights and innovations. The book encompasses a wide range of topics, starting with an introduction to genomics data and its unique characteristics. Each chapter unfolds a unique facet, delving into the collaborative potential and challenges that arise from advanced technologies. It explores image analysis techniques specifically tailored for genomic data. It also delves into deep learning, showcasing the power of convolutional neural networks (CNN) and recurrent neural networks (RNN) in genomic image analysis and sequence analysis. Readers will gain practical knowledge on how to apply deep learning techniques to unlock patterns and relationships in genomics data. Transfer learning, a popular technique in AI, is explored in the context of genomics, demonstrating how knowledge from pre-trained models can be effectively transferred to genomic datasets, leading to improved performance and efficiency. Also covered are domain adaptation techniques specifically tailored for genomics data. The book explores how genomics principles can inspire the design of AI algorithms, including genetic algorithms, evolutionary computing, and genetic programming. Additional chapters delve into the interpretation of genomic data using AI and ML models, including techniques for feature importance and visualization, as well as explainable AI methods that aid in understanding the inner workings of the models. The applications of genomics in AI span various domains, and the book explores AI-driven drug discovery and personalized medicine, genomic data analysis for disease diagnosis and prognosis, and the advancement of AI-enabled genomic research. 
Lastly, the book addresses the ethical considerations in integrating genomics with AI, computer vision, and machine learning. Audience: The book will appeal to biomedical and computer/data scientists and researchers working in genomics and bioinformatics seeking to leverage AI, computer vision, and machine learning for enhanced analysis and discovery; healthcare professionals advancing personalized medicine and patient care; and industry leaders and decision-makers in biotechnology, pharmaceuticals, and healthcare industries seeking strategic insights into the integration of genomics and advanced technologies.




Genome Analysis


Book Description

In recent years there have been tremendous achievements made in DNA sequencing technologies and corresponding innovations in data analysis and bioinformatics that have revolutionized the field of genome analysis. In this book, an impressive array of expert authors highlight and review current advances in genome analysis. This volume provides an invaluable, up-to-date and comprehensive overview of the methods currently employed for next-generation sequencing (NGS) data analysis, highlights their problems and limitations, demonstrates the applications and indicates the developing trends in various fields of genome research. The first part of the book is devoted to the methods and applications that arose from, or were significantly advanced by, NGS technologies: the identification of structural variation from DNA-seq data; whole-transcriptome analysis and discovery of small interfering RNAs (siRNAs) from RNA-seq data; motif finding in promoter regions, enhancer prediction and nucleosome sequence code discovery from ChIP-Seq data; identification of methylation patterns in cancer from MeDIP-seq data; transposon identification in NGS data; metagenomics and metatranscriptomics; NGS of viral communities; and causes and consequences of genome instabilities. The second part is devoted to the field of RNA biology with the last three chapters devoted to computational methods of RNA structure prediction including context-free grammar applications. An essential book for everyone involved in sequence data analysis, next-generation sequencing, high-throughput sequencing, RNA structure prediction, bioinformatics and genome analysis.




Introduction to Bioinformatics


Book Description

CD-ROM contains: chapter illustrations -- full and trial versions of programs.




Advances in the Understanding of Biological Sciences Using Next Generation Sequencing (NGS) Approaches


Book Description

This book provides a global view of recent advances in the biological sciences and of the adaptation of pathogens to their host plants, as revealed using NGS. Molecular omics is now a major driving force for learning adaptation genetics and a great challenge to the scientific community, one that can be addressed through the application of NGS technologies. The availability of complete genome sequences for the respective model species of the dicot and monocot plant groups presents a global opportunity to delineate the identification, function and expression of genes, and to develop new tools for identifying new genes and pathways. Genome-wide research tools, resources and approaches such as data mining for structural similarities, gene expression profiling at the DNA and RNA level with the rapid increase in available genome sequencing efforts, expressed sequence tags (ESTs), RNA-seq, induced deletion mutants and insertional mutants, and gene expression knock-down (gene silencing) studies with RNAi and microRNAs have become integral parts of plant molecular omics. Molecular diversity and mutational approaches present the first line of approach to unravel the genetic and molecular basis of several traits, including QTL related to disease resistance, which covers host approaches to combat pathogens and to understand the adaptation of the pathogen to the plant host. Using NGS technologies, the understanding of adaptation genetics towards stress tolerance has been correlated with epigenetics. Naturally occurring allelic variations, genome shuffling and variations induced by chemical or radiation mutagenesis are also being used in functional genomics to elucidate the pathways for pathogen and stress tolerance, and are widely illustrated in the identification of the genes responsible for tolerance in plant, bacterial and fungal species.




Advances in Computational Biology


Book Description

Proceedings of The 2009 International Conference on Bioinformatics and Computational Biology in Las Vegas, NV, July 13-16, 2009. Recent advances in computational biology are covered through a variety of topics. Both inward research (core areas of computational biology and computer science) and outward research (multidisciplinary, interdisciplinary, and applications) are covered. Topics include: gene regulation, gene expression databases, gene pattern discovery and identification, genetic network modeling and inference, gene expression analysis, RNA and DNA structure and sequencing, biomedical engineering, microarrays, molecular sequence and structure databases, molecular dynamics and simulation, molecular sequence classification, alignment and assembly, image processing in medicine and biological sciences, sequence analysis and alignment, informatics and statistics in biopharmaceutical research, software tools for computational biology and bioinformatics, comparative genomics, and more.