High Confidence Network Predictions from Big Biological Data


Book Description

Biology functions in a most intriguing fashion, with human cells being regulated by multiplex networks of proteins and their dependent systems that control everything from proliferation to cell death. Notably, there are cases when these networks fail to function properly. In some diseases, multiple small perturbations push otherwise healthy cells into a state of malfunction. These maladies are referred to as complex diseases and include common disorders such as allergy, type II diabetes, and multiple sclerosis. Due to their complexity, there is no universally defined approach to fully understand their pathogenesis or pathophysiology. While these perturbations can be measured using high-throughput technologies, their interplay is generally too complex to understand without structured mathematical analysis. Today there are numerous such methods that relate the small perturbations of complex diseases to one another through their interactions. However, these methods have historically struggled with notable uncertainty in their predictions. This uncertainty can be addressed by at least two different approaches. First, mechanistically realistic mathematical modelling has the capacity to accurately describe almost any biological system, but such models can to date only describe small systems and networks. Secondly, large-scale mathematical modelling approaches exist, but the faithfulness of the models to the underlying biology has been compromised to achieve algorithms that are computationally efficient. In this Ph.D. thesis, I suggest how high confidence predictions of network interactions can be extracted from big biological data. First, I show how large-scale data can be used when building high-quality ODE models (Paper I). Secondly, by developing the software LASSIM, I show how ODE models can be expanded to the size of entire cell systems (Paper II).
However, while LASSIM showed that powerful non-linear ODE modelling can be applied to understand big biological data, it still remained a machine learning-based approach in contrast to hypothesis-driven model development. Instead, two more studies revolving around large-scale modelling approaches were initiated. The third study suggested that ambiguities in model selection and interaction identification greatly compromise the accuracy of available tools, and that the novel software of Paper III, LiPLike, can be used to remove such predictions. Intriguingly, while LiPLike was able to effectively discard false identifications, the accuracy of predictions remained relatively low. This low accuracy was thought to arise from model simplifications, and therefore the next study aimed at finding methods that come closer to the true biological system (Paper IV). In particular, the study aimed at predicting the abundance of proteins, the true mediators of biological functionality, from the much more easily accessible mRNA levels. It found that such models could be used to gain several new insights into protein mechanisms, which was exemplified by the identification of important biomarkers of autoimmune diseases. The analysis of big biological data and the underlying networks is a centrepiece of understanding both diseases and how cell functionality is orchestrated. The work presented in this Ph.D. thesis represents a journey between fields with different views on how these networks should be inferred. In particular, it aimed to combine the accuracy of small-scale mechanistic modelling with the system-spanning potential of large-scale linear system modelling. This thesis thus provides a toolbox of methods and insights on how knowledge can be extracted from big biological data, and by extension it is a small step towards a new understanding of biological systems and complex diseases.
Biological systems are complex to understand, and only relatively recently have biological data begun to be analysed in a structured way through mathematical analysis. One of the clearest areas where mathematical analysis of biological systems is needed is the study of complex diseases. Such diseases, which include multiple sclerosis, type II diabetes, and allergy, arise through a complicated combination of heredity and environment that is not fully understood. Studies of complex diseases have nevertheless identified many small potential perturbations across the entire biological system, but none of these perturbations is individually decisive for developing a complex disease. This lack of overview prevents traditional analyses from finding the origin of the disease, and if a disease cannot be understood, the chances of, for example, finding new drugs or making a diagnosis are reduced. To understand how the systems behind complex diseases function, or fail to function, various samples are taken, which often result in enormous amounts of data. These data sets are usually so large that we humans cannot interpret them just by reading the numbers; instead, we must use different types of mathematical models and computer programs for such data to tell us anything. Within two overlapping fields that have come to be called systems biology and bioinformatics, methods for analysing biological data have developed rapidly over the past 50 years. These methods have aimed to answer a number of questions, and one prominent goal has been to identify differences between how healthy and diseased cells function. A large part of the cell's functions is regulated by various networks of proteins, and another goal has been to understand how these networks are regulated. A further goal has been to identify measurable values, so-called biomarkers, that can be used to identify disease in patients.
The methods used to answer these questions can be roughly divided into two groups, mechanistic modelling and large-scale modelling, each with its own strengths and weaknesses. Mechanistic modelling has the potential to give very accurate predictions, but requires a great deal of manual work and has therefore been too time-consuming to apply to large biological data sets. Large-scale modelling easily handles large data sets, but has instead had such low reliability that methods whose predictions are better than chance have in many cases been regarded as good. This doctoral thesis revolves around the development and use of methods for analysing large amounts of biological data, and has in four works aimed to improve methods within both small-scale mechanistic modelling (Papers I and II) and large-scale modelling (Papers III and IV). Paper I analysed how type II diabetes affects the response of fat cells to insulin and how this insulin signal can be described mathematically. This first work was limited to small models, and a natural development was to investigate whether mechanistic models can be scaled up to describe systems that cover a larger part of the cell's functionality. This was made possible in Paper II through LASSIM, a method and software that can expand small mechanistic models to many times their original size. During the creation of LASSIM, however, it became clear that large-scale modelling remains a very time-consuming method. Paper III therefore aimed to improve the reliability of predictions from existing methods that can handle large data sets. More specifically, Paper III proposed a new algorithm, LiPLike, which can be used to remove predictions that lack support in the data.
Although LiPLike could be seen to improve the reliability of established methods, several of its predictions were still wrong, which could be assumed to be because the underlying biology differs from the mathematical model assumption on which the study was based. This motivated the last part of this thesis, which aimed to investigate how data can be described in more biologically relevant ways. Although it is mainly proteins that regulate the cell's systems, the majority of mathematical models are based on a precursor to proteins called mRNA. The reason is that measuring proteins in a sample is both difficult and costly, so one relies on mRNA instead. In Paper IV, mathematical modelling was used to predict the amount of protein in different types of immune cells. These models proved useful for identifying measurable markers of various diseases. It is thus possible to use mRNA data in ways that bring models closer to reality, and that by extension can raise the reliability of mathematical predictions. Research is only at the beginning of a long effort to understand how cells function and how complex diseases arise. A central part of this effort is to systematically describe the underlying systems that control the cell, and this can be achieved almost exclusively through structured mathematical analysis. This thesis can be summarised as a series of works that on the one hand scale up modelling methods previously limited to small models, and on the other hand raise the reliability of more computationally efficient models. These contributions will hopefully form the basis for an increased understanding of how biological systems should be analysed and, by extension, how complex diseases can be counteracted.




Proteomics for Biological Discovery


Book Description

An update to the popular guide to proteomics technology applications in biomedical research. Building on the strength of the original edition, this book presents the state of the art in the field of proteomics and offers students and scientists new tools and techniques to advance their own research. Written by leading experts in the field, it provides readers with an understanding of new and emerging directions for proteomics research and applications. Proteomics for Biological Discovery begins by discussing the emergence of proteomics technologies and summarizing the potential insights to be gained from proteome-level research. The tools of proteomics, from conventional to novel techniques, are thoroughly covered, from underlying concepts to limitations and future directions. Later chapters provide an overview of the current developments in post-translational modification studies, structural proteomics, biochemical proteomics, applied proteomics, and bioinformatics relevant to proteomics. Chapters cover: Quantitative Proteomics for Differential Protein Expression Profiling; Protein Microarrays; Protein Biomarker Discovery; Biomarker Discovery using Mass Spectrometry Imaging; Protein-Protein Interactions; Mass Spectrometry of Intact Protein Complexes; Crosslinking Applications in Structural Proteomics; Functional Proteomics; High Resolution Interrogation of Biological Systems via Mass Cytometry; Characterization of Drug-Protein Interactions by Chemoproteomics; Phosphorylation; Large-Scale Phosphoproteomics; and Probing Glycoforms of Individual Proteins Using Antibody-Lectin Sandwich Arrays.
Presents a comprehensive and coherent review of the major issues in proteomic technology development, bioinformatics, strategic approaches, and applications. Chapters offer a rigorous overview with a summary of limitations, emerging approaches, questions, and realistic future industry and basic science applications. Features new coverage of mass spectrometry for high throughput proteomic measurements, and novel quantitation strategies such as spectral counting and stable isotope labeling. Discusses higher level integrative aspects, including technical challenges and applications for drug discovery. Offers new chapters on biomarker discovery, global phosphorylation analysis, proteomic profiling using antibodies, and single cell mass spectrometry. Proteomics for Biological Discovery is an excellent advanced resource for graduate students, postdoctoral fellows, and scientists across all the major fields of biomedical science.




Biological Data Mining in Protein Interaction Networks


Book Description

"The goal of this book is to disseminate research results and best practices from cross-disciplinary researchers and practitioners interested in, and working on bioinformatics, data mining, and proteomics"--Provided by publisher.




Issues in Bioengineering and Bioinformatics: 2011 Edition


Book Description

Issues in Bioengineering and Bioinformatics: 2011 Edition is a ScholarlyEditions™ eBook that delivers timely, authoritative, and comprehensive information about Bioengineering and Bioinformatics. The editors have built Issues in Bioengineering and Bioinformatics: 2011 Edition on the vast information databases of ScholarlyNews.™ You can expect the information about Bioengineering and Bioinformatics in this eBook to be deeper than what you can access anywhere else, as well as consistently reliable, authoritative, informed, and relevant. The content of Issues in Bioengineering and Bioinformatics: 2011 Edition has been produced by the world’s leading scientists, engineers, analysts, research institutions, and companies. All of the content is from peer-reviewed sources, and all of it is written, assembled, and edited by the editors at ScholarlyEditions™ and available exclusively from us. You now have a source you can cite with authority, confidence, and credibility. More information is available at http://www.ScholarlyEditions.com/.




Handbook of Systems Biology


Book Description

This book provides an entry point into Systems Biology for researchers in genetics, molecular biology, cell biology, microbiology and biomedical science, helping them understand the key concepts for expanding their work. Chapters organized around the broader themes of Organelles and Organisms, Systems Properties of Biological Processes, Cellular Networks, and Systems Biology and Disease discuss the development of concepts, the current applications, and the future prospects. Emphasis is placed on concepts and insights into the multi-disciplinary nature of the field as well as the importance of systems biology in human biological research. Technology, being an extremely important aspect of scientific progress overall, and in the creation of new fields in particular, is discussed in 'boxes' within each chapter to relate to appropriate topics.
- 2013 Honorable Mention for Single Volume Reference in Science from the Association of American Publishers' PROSE Awards
- Emphasizes the interdisciplinary nature of systems biology with contributions from leaders in a variety of disciplines
- Includes the latest research developments in human and animal models to assist with translational research
- Presents biological and computational aspects of the science side-by-side to facilitate collaboration between computational and biological researchers




Research in Computational Molecular Biology


Book Description

This book constitutes the refereed proceedings of the 12th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2008. It presents current issues in algorithmic, theoretical, and experimental bioinformatics.




Cancer Systems Biology


Book Description

The unprecedented amount of data produced with high-throughput experimentation forces biologists to employ mathematical representation and computation to glean meaningful information in systems-level biology. Applying this approach to the underlying molecular mechanisms of tumorigenesis, cancer research is enjoying a series of new discoveries and biological insights. Unique in its dualistic approach, this book introduces the concepts and theories of systems biology and their applications in cancer research. It presents basic cancer biology and cutting-edge topics of cancer research for computational biologists alongside systems biology analysis tools for experimental biologists.




Basics of Bioinformatics


Book Description

This book outlines 11 courses and 15 research topics in bioinformatics, based on curricula and talks from a graduate summer school on bioinformatics held at Tsinghua University. The courses include: Basics for Bioinformatics, Basic Statistics for Bioinformatics, Topics in Computational Genomics, Statistical Methods in Bioinformatics, Algorithms in Computational Biology, Multivariate Statistical Methods in Bioinformatics Research, Association Analysis for Human Diseases: Methods and Examples, Data Mining and Knowledge Discovery Methods with Case Examples, Applied Bioinformatics Tools, Foundations for the Study of Structure and Function of Proteins, Computational Systems Biology Approaches for Deciphering Traditional Chinese Medicine, and Advanced Topics in Bioinformatics and Computational Biology. This book can serve not only as a primer for beginners in bioinformatics, but also as a highly summarized yet systematic reference book for researchers in this field. Rui Jiang and Xuegong Zhang are both professors at the Department of Automation, Tsinghua University, China. Professor Michael Q. Zhang works at the Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.




Analyzing Network Data in Biology and Medicine


Book Description

The increased and widespread availability of large network data resources in recent years has resulted in a growing need for effective methods for their analysis. The challenge is to detect patterns that provide a better understanding of the data. However, this is not a straightforward task because of the size of the data sets and the computer power required for the analysis. The solution is to devise methods for approximately answering the questions posed, and these methods will vary depending on the data sets under scrutiny. This cutting-edge text introduces biological concepts and biotechnologies producing the data, graph and network theory, cluster analysis and machine learning, before discussing the thought processes and creativity involved in the analysis of large-scale biological and medical data sets, using a wide range of real-life examples. Bringing together leading experts, this text provides an ideal introduction to and insight into the interdisciplinary field of network data analysis in biomedicine.




Advances in Biotechnology Research and Application: 2012 Edition


Book Description

Advances in Biotechnology Research and Application / 2012 Edition is a ScholarlyEditions™ eBook that delivers timely, authoritative, and comprehensive information about Biotechnology. The editors have built Advances in Biotechnology Research and Application / 2012 Edition on the vast information databases of ScholarlyNews.™ You can expect the information about Biotechnology in this eBook to be deeper than what you can access anywhere else, as well as consistently reliable, authoritative, informed, and relevant. The content of Advances in Biotechnology Research and Application / 2012 Edition has been produced by the world’s leading scientists, engineers, analysts, research institutions, and companies. All of the content is from peer-reviewed sources, and all of it is written, assembled, and edited by the editors at ScholarlyEditions™ and available exclusively from us. You now have a source you can cite with authority, confidence, and credibility. More information is available at http://www.ScholarlyEditions.com/.