Parsimony, Phylogeny, and Genomics


Book Description

Parsimony analysis (cladistics) has long been one of the most widely used methods of phylogenetic inference in the fields of systematic and evolutionary biology. Moreover, it has mathematical attributes that lend themselves to use with complex, genomic-scale data sets. This book demonstrates the potential that this powerful hierarchical data summarization method also has for both structural and functional comparative genomic research.




Phylogenomics


Book Description

Phylogenomics: A Primer, Second Edition is for advanced undergraduate and graduate biology students studying molecular biology, comparative biology, evolution, genomics, and biodiversity. This book explains the essential concepts underlying the storage and manipulation of genomics-level data, construction of phylogenetic trees, population genetics, natural selection, the tree of life, DNA barcoding, and metagenomics. The inclusion of problem-solving exercises in each chapter provides students with a solid grasp of the important molecular and evolutionary questions facing modern biologists as well as the tools needed to answer them.




Enhance the Understanding of Whole-genome Evolution by Designing, Accelerating and Parallelizing Phylogenetic Algorithms


Book Description

New sequencing technology has increased the speed and reduced the cost of sequencing biological data. Making biological sense of these genomic data is a major challenge for both the algorithm design and the high-performance computing communities. Bioinformatics poses many open questions, such as how new functional genes arise, why genes are organized into chromosomes, how species are connected through the evolutionary tree of life, and why arrangements are subject to change. Phylogenetic analyses have become essential to research on the evolutionary tree of life. They help us track the history of species and the relationships between different genes or genomes over millions of years. One of the fundamentals of phylogenetic construction is the computation of distances between genomes. Because rearrangement events give rise to much more complicated combinatorial patterns, distance computation remains an active topic that belongs as much to mathematics as to biology. The problem is especially hard when the two input genomes have unequal gene content (with insertions/deletions and duplications). In this thesis, we discuss our contributions to distance estimation for gene order data with unequal content. Finding the median of three genomes is the key step in building the most parsimonious phylogenetic trees from genome rearrangement data. For genomes with unequal content, to the best of our knowledge, no algorithm exists for finding the median. In this thesis, we contribute to median computation in two respects. 1) Algorithm engineering: we harness streaming graph analytics methods to implement an exact DCJ median algorithm that runs as fast as the heuristic algorithm and can help construct a better phylogenetic tree.
2) Algorithmic aspect: we theoretically formulate the problem of finding the median when the input genomes have unequal gene content, which leads to the design and implementation of an efficient median algorithm based on the Lin-Kernighan heuristic. Once the distance and median models are chosen, the ultimate goal is to infer the phylogeny (evolutionary history) of a given set of species. For more than a decade, biologists and computer scientists have studied how to infer phylogenies by measuring genome rearrangement events using gene order data. While evolution is not an inherently parsimonious process, maximum parsimony (MP) phylogenetic analysis has been widely applied to phylogeny inference to study the evolutionary patterns of genome rearrangements. MP phylogenetic analysis of genome rearrangements generally raises two problems: first, given a set of modern genomes, how to compute the topology of the corresponding phylogenetic tree; second, given the topology of a model tree, how to infer the gene orders of the ancestral species. Assembling an MP phylogenetic tree constructor involves multiple NP-hard problems which, unfortunately, are stacked one on top of another. This means that solving one NP-hard problem requires solving multiple NP-hard subproblems. For phylogenetic tree construction from genomes with unequal content, there are three layers of NP-hard problems. In this thesis, we mainly discuss our contributions to the design and implementation of the software package DCJUC (Phylogeny Inference using DCJ model to cope with Unequal Content Genomes), which helps achieve both of these goals. Aside from the biological problems, another concern is how to use parallel computing to accelerate algorithms so that they can handle huge data sets, such as high-resolution gene order data.
First, all of the methods for tackling these phylogenetic problems are based on branch-and-bound algorithms, which are quite irregular and ill-suited to parallel computing. To parallelize these algorithms, we need to improve the locality of memory access and the load-balancing methods so that each thread can reach its full potential. Second, a revolution is taking place in computing with the availability of commodity graphics processors such as Nvidia GPUs and of many-core CPUs such as the Cray XMT or the Intel Xeon Phi coprocessor with 60 cores. These architectures offer a new way to achieve high performance at much lower cost. However, these machines are not easy to program, and scientific computing code is hard to tune well on them. We explore the potential of these architectures to accelerate branch-and-bound-based phylogenetic algorithms.




Bioinformatics and Phylogenetics


Book Description

This volume presents a compelling collection of state-of-the-art work in algorithmic computational biology, honoring the legacy of Professor Bernard M.E. Moret in this field. Reflecting the wide-ranging influences of Prof. Moret’s research, the coverage encompasses such areas as phylogenetic tree and network estimation, genome rearrangements, cancer phylogeny, species trees, divide-and-conquer strategies, and integer linear programming. Each self-contained chapter provides an introduction to a cutting-edge problem of particular computational and mathematical interest. Topics and features: addresses the challenges in developing accurate and efficient software for the NP-hard maximum likelihood phylogeny estimation problem; describes the inference of species trees, covering strategies to scale phylogeny estimation methods to large datasets, and the construction of taxonomic supertrees; discusses the inference of ultrametric distances from additive distance matrices, and the inference of ancestral genomes under genome rearrangement events; reviews different techniques for inferring evolutionary histories in cancer, from the use of chromosomal rearrangements to tumor phylogenetics approaches; examines problems in phylogenetic networks, including questions relating to discrete mathematics, and issues of statistical estimation; highlights how evolution can provide a framework within which to understand comparative and functional genomics; provides an introduction to Integer Linear Programming and its use in computational biology, including its use for solving the Traveling Salesman Problem. Offering an invaluable source of insights for computer scientists, applied mathematicians, and statisticians, this illuminating volume will also prove useful for graduate courses on computational biology and bioinformatics.




Mathematics of Evolution and Phylogeny


Book Description





Sequence — Evolution — Function


Book Description

Sequence - Evolution - Function is an introduction to the computational approaches that play a critical role in the emerging new branch of biology known as functional genomics. The book provides the reader with an understanding of the principles and approaches of functional genomics and of the potential and limitations of computational and experimental approaches to genome analysis. Sequence - Evolution - Function should help bridge the "digital divide" between biologists and computer scientists, allowing biologists to better grasp the peculiarities of the emerging field of Genome Biology and to learn how to benefit from the enormous amount of sequence data available in the public databases. The book is non-technical with respect to the computer methods for genome analysis and discusses these methods from the user's viewpoint, without addressing mathematical and algorithmic details. Prior practical familiarity with the basic methods for sequence analysis is a major advantage, but a reader without such experience will be able to use the book as an introduction to these methods. This book is perfect for introductory level courses in computational methods for comparative and functional genomics.




Statistics and Truth


Book Description

Written by one of the foremost statisticians, with experience in diverse fields of application of statistics, the book deals with the philosophical and methodological aspects of information technology and of the collection and analysis of data to provide insight into a problem, whether it is scientific research, policy making by government, or decision making in our daily lives. The author dispels the doubts that chance is an expression of our ignorance which makes accurate prediction impossible, and illustrates how our thinking has changed with the quantification of uncertainty by showing that chance is no longer the obstructor but a way of expressing our knowledge. Indeed, chance can create and help in the investigation of truth. It is eloquently demonstrated with numerous examples of applications that statistics is the science, technology and art of extracting information from data and is based on a study of the laws of chance. It is highlighted how statistical ideas played a vital role in scientific and other investigations even before statistics was recognized as a separate discipline, and how statistics is now evolving as a versatile, powerful and inevitable tool in diverse fields of human endeavor such as literature, legal matters, industry, archaeology and medicine. The use of statistics by the layman to improve the quality of life through wise decision making is emphasized.




Models and Algorithms for Genome Evolution


Book Description

This authoritative text/reference presents a review of the history, current status, and potential future directions of computational biology in molecular evolution. Gathering together the unique insights of an international selection of prestigious researchers, this must-read volume examines the latest developments in the field, the challenges that remain, and the new avenues emerging from the growing influx of sequence data. These viewpoints build upon the pioneering work of David Sankoff, one of the founding fathers of computational biology, and mark the 50th anniversary of his first scientific article. The broad spectrum of rich contributions in this essential collection will appeal to all computer scientists, mathematicians and biologists involved in comparative genomics, phylogenetics and related areas.




Introduction to Evolutionary Genomics


Book Description

This authoritative textbook/reference presents a comprehensive introduction to the field of evolutionary genomics. The opening chapters describe the fundamental concepts in molecular biology and genome evolution for readers without any prior background in this area. This is followed by a detailed examination of genome evolution in various groups of organisms. The text then concludes with a review of practical methods essential to researchers in the field. This updated and revised new edition also features historical perspectives on contributions to evolutionary genomics from related fields such as molecular evolution, genetics, and numerical taxonomy. Topics and features: introduces the basics of molecular biology, covering protein structure and diversity, as well as DNA replication, transcription, and translation; examines the phylogenetic relationships of DNA sequences, and the processes of mutation, neutral evolution, and natural selection; presents a brief evolutionary history of life, surveying the key features of the genomes of prokaryotes, eukaryotes, viruses and phages, vertebrates, and humans; reviews the various biological “omic” databases, and discusses the analysis of homologous nucleotide and amino acid sequences; provides an overview of the experimental sequencing of genomes and transcriptomes, and the construction of phylogenetic trees; describes methods for estimating evolutionary distances and performing studies of population genetics; supplies additional supporting material at an associated website. Serving as an indispensable textbook for graduate and advanced undergraduate courses on evolutionary genomics, this accessible overview will also prove invaluable to researchers from both computer science and the biological sciences seeking a primer on the field.