Accelerating Bioinformatics Applications on CUDA-enabled Multi-GPU Systems


Book Description

A wide range of bioinformatics applications have to deal with a continuously growing amount of data generated by high-throughput sequencing techniques. Exclusively CPU-based workstations fail to keep up with this task. Instead of employing dozens of CPU cluster nodes to increase computational power, massively parallel accelerators like modern CUDA-enabled GPUs can be used to achieve higher throughput and reduce execution times. However, the memory capacity of such devices is often limited. Efficient parallelization and data distribution are essential to accelerate performance-critical components of bioinformatics pipelines like read classification and read mapping. In this thesis we analyze and optimize tasks common to many GPU-based applications in the context of bioinformatics. We study sequence processing, construction and querying of k-mer-based hash tables, segmented sort, as well as multi-GPU communication. With these methods we accelerate suffix array construction and metagenomic read classification on CUDA-enabled GPUs by overcoming the aforementioned challenges. By leveraging multiple GPUs, we extend the limited memory available on a single GPU to allow for the construction of larger indices. Our communication library, called Gossip, introduces optimized scatter, gather, and all-to-all patterns for multi-GPU systems. Gossip's all-to-all communication pattern is successfully applied to suffix array construction, accelerating it to run in 3.44 seconds for a full-length human genome on an 8-GPU server, faster than the previously reported 4.8 seconds achieved by employing 1600 cores on 100 nodes of a CPU-based HPC cluster. Furthermore, we introduce MetaCache-GPU -- an ultra-fast metagenomic short-read classifier specifically tailored to fit the characteristics of CUDA-enabled accelerators. Our approach employs a novel hash table variant featuring efficient minhash fingerprinting of reads for locality-sensitive hashing and their rapid insertion using warp-aggregated operations. Our performance evaluation shows that MetaCache-GPU is able to build large reference databases in a matter of seconds, enabling instantaneous operability, while popular CPU-based tools such as Kraken2 require over an hour for index construction on the same data. In light of an ever-growing number of reference genomes, MetaCache-GPU is the first metagenomic classifier that makes analysis pipelines with on-demand composition of large-scale reference genome sets practical. Although many sub-problems in this thesis are optimized in a specific application context, they also apply to other bioinformatics problems like k-mer counting, sequence alignment, and assembly, which would likewise benefit from GPU acceleration. In addition to the insights from this work, we make our source code publicly available to allow for easier adaptation of our methods to related problems.
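The warp-aggregated insertion mentioned above is a general CUDA technique worth making concrete. Below is a minimal sketch, not MetaCache-GPU's actual code: it assumes a hypothetical global insertion counter and shows how the active threads of a warp elect a leader that performs a single atomic operation on behalf of the whole group, after which each thread receives a distinct slot.

    #include <cooperative_groups.h>
    namespace cg = cooperative_groups;

    // Sketch: reserve one slot per active thread with a single per-warp atomic.
    // `counter` is a hypothetical global insertion counter.
    __device__ unsigned int warp_aggregated_reserve(unsigned int* counter)
    {
        cg::coalesced_group g = cg::coalesced_threads();
        unsigned int base = 0;
        if (g.thread_rank() == 0) {
            // the elected leader issues one atomic for the whole group
            base = atomicAdd(counter, g.size());
        }
        // broadcast the leader's offset, then hand out distinct slots
        base = g.shfl(base, 0);
        return base + g.thread_rank();
    }

Replacing up to 32 per-thread atomics with one per-warp atomic is what makes bulk insertion into GPU hash tables fast; the same pattern applies to any shared counter under heavy contention.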







High-performance Processing of Next-generation Sequencing Data on CUDA-enabled GPUs


Book Description

With the technological advances in the field of genomics and sequencing, processing the vast amounts of generated data becomes more and more challenging. Nowadays, software for processing large-scale datasets of sequencing reads may take hours to days to complete, even on high-end workstations. This explains the need for new approaches that achieve faster, high-performance applications. In contrast to traditional CPU-based software, algorithms utilizing the massively parallel many-core architecture and fast memory of GPUs are potentially able to deliver the desired performance in many fields. In this thesis, we introduce two novel GPU-accelerated applications, CARE and CAREx, for common steps in sequence processing pipelines, error correction and read extension of Next-Generation Sequencing (NGS) Illumina data, to improve the results of downstream data analysis. To the best of our knowledge, CARE and CAREx are the first modern GPU-accelerated solutions for the respective problems. A key component of our algorithm is the identification of similar DNA sequences within a dataset. For this purpose, we developed a minhashing-based index data structure for large-scale read datasets. In conjunction with our fast bit-parallel shifted Hamming distance computations, this allows for the efficient identification of similar reads. The resulting set of similar sequences is subsequently arranged into a gap-free multiple-sequence alignment to solve the problem at hand. Sequencing machines introduce both systematic errors and random errors. CARE, the Context-Aware Read Error corrector, accurately removes errors introduced by NGS sequencing machines during the initial sequencing of a biological sample. With the help of a pre-trained Random Forest, CARE generates two orders of magnitude fewer false positives than its competitors while showing similar numbers of true positives. Read extension describes the process of elongating DNA sequences; longer sequences improve the resolution of larger structures within a genome. CAREx, the Context-Aware Read Extender, produces longer sequences, so-called pseudo-long reads, by connecting the two reads of read pairs which were sequenced in close proximity. Evaluation shows that CAREx produces significantly more highly accurate pseudo-long reads than the state of the art. With algorithms tailored towards high-performance GPU computations, both CARE and CAREx run significantly faster than their CPU-based competitors while producing more accurate results. Processing a large human dataset with 30x coverage with CARE requires less than 30 minutes using a single A100 GPU; this time can be further reduced to 10 minutes on multi-GPU systems. In contrast, CPU-based tools like Musket or BFC take 3 hours and 1.5 hours, respectively. Read extension of a human dataset with CAREx takes 3.3 hours to complete on a single GPU, whereas Konnector2 requires over a day. This shows that large-scale sequence processing can greatly benefit from the usage of GPUs, and that multiple-sequence-alignment-based algorithms should be considered despite their increased complexity because of the accuracy they provide. While our general building blocks have been tailored towards our needs for error correction and read extension, they could also prove useful in other GPU-accelerated applications that process sequence data.
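The bit-parallel shifted Hamming distance mentioned above can be sketched briefly. The following is an illustration of the general idea rather than CARE's actual implementation, assuming a hypothetical encoding that packs bases 2 bits each into 64-bit words; a 32-base window is then compared with a handful of bitwise operations, and the "shifted" part simply repeats the comparison for several relative offsets of the two reads.

    // Sketch, assuming 2-bit-per-base packing (hypothetical encoding):
    // Hamming distance of two 32-base windows in a few bit operations.
    __device__ int hamming32(unsigned long long a, unsigned long long b)
    {
        unsigned long long x = a ^ b;   // bit positions where the words differ
        // a base mismatches if either of its two bits differs; fold each
        // 2-bit lane down to a single indicator bit
        unsigned long long m = (x | (x >> 1)) & 0x5555555555555555ULL;
        return __popcll(m);             // number of mismatching bases
    }

Evaluating such window comparisons at several shifts of one read against another finds the best overlap without ever looping over individual characters, which is what makes candidate filtering fast enough to feed the multiple-sequence alignment stage.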




CUDA Fortran for Scientists and Engineers


Book Description

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison.

- Leverage the power of GPU computing with PGI's CUDA Fortran compiler
- Gain insights from members of the CUDA Fortran language development team
- Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and message passing interface (MPI) approaches
- Includes full source code for all the examples and several case studies
- Download source code and slides from the book's companion website




Proceedings of ICRIC 2019


Book Description

This book presents high-quality, original contributions (both theoretical and experimental) on software engineering, cloud computing, computer networks & internet technologies, artificial intelligence, information security, and database and distributed computing. It gathers papers presented at ICRIC 2019, the 2nd International Conference on Recent Innovations in Computing, which was held in Jammu, India, in March 2019. This conference series represents a targeted response to the growing need for research that reports on and assesses the practical implications of IoT and network technologies, AI and machine learning, cloud-based e-Learning and big data, security and privacy, image processing and computer vision, and next-generation computing technologies.




Computational Science and Its Applications - ICCSA 2014


Book Description

The six-volume set LNCS 8579-8584 constitutes the refereed proceedings of the 14th International Conference on Computational Science and Its Applications, ICCSA 2014, held in Guimarães, Portugal, in June/July 2014. The 347 revised papers presented in 30 workshops and a special track were carefully reviewed and selected from 1,167 submissions. The 289 papers presented in the workshops cover various areas of computational science, ranging from computational science technologies to specific areas such as computational geometry and security.




Database Systems for Advanced Applications


Book Description

This two-volume set, LNCS 9049 and LNCS 9050, constitutes the refereed proceedings of the 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015, held in Hanoi, Vietnam, in April 2015. The 63 full papers presented were carefully reviewed and selected from a total of 287 submissions. The papers cover the following topics: data mining; data streams and time series; database storage and indexing; spatio-temporal data; modern computing platforms; social networks; information integration and data quality; information retrieval and summarization; security and privacy; outlier and imbalanced data analysis; probabilistic and uncertain data; and query processing.




Efficient Solutions for Bioinformatics Applications Using GPUs


Book Description

This dissertation, "Efficient Solutions for Bioinformatics Applications Using GPUs" by Chi-man Liu, 廖志敏, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to the Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author. Abstract: Over the past few years, DNA sequencing technology has been advancing at such a fast pace that computer hardware and software can hardly meet the ever-increasing demand for sequence analysis. A natural approach to boost analysis efficiency is parallelization, which divides the problem into smaller ones that are to be solved simultaneously on multiple execution units. Common architectures such as multi-core CPUs and clusters can increase the throughput to some extent, but the hardware setup and maintenance costs are prohibitive. Fortunately, the newly emerged general-purpose GPU programming paradigm gives us a low-cost alternative for parallelization. This thesis presents GPU-accelerated algorithms for several problems in bioinformatics, along with implementations to demonstrate their power in handling enormous datasets; the GPU has totally different limitations and optimization techniques than the CPU. The first tool presented is SOAP3-dp, a DNA short-read aligner highly optimized for speed. Prior to SOAP3-dp, the fastest short-read aligner was its predecessor SOAP2, which was capable of aligning 1 million 100-bp reads in 5 minutes. SOAP3-dp beats this record by aligning the same volume in only 10 seconds. The key to unlocking this unprecedented speed is the revamped BWT engine underlying SOAP3-dp. All data structures and associated operations have been tailor-made for the GPU to achieve optimized performance. Experiments show that SOAP3-dp not only excels in speed, but also outperforms other aligners in both alignment sensitivity and accuracy. The next tools are for constructing data structures, namely the Burrows-Wheeler transform (BWT) and de Bruijn graphs (DBGs), to facilitate genome assembly of short reads, especially for large metagenomics data. The BWT index for a set of short reads has recently found its use in string-graph assemblers [44], as it provides a succinct way of representing huge string graphs which would otherwise exceed the main memory limit. Constructing the BWT index for a million reads is by itself not an easy task, let alone optimizing it for the GPU. Another class of assemblers, the DBG-based assemblers, also faces the same problem. This thesis presents construction algorithms for both the BWT and DBGs in a succinct form (a minimal illustration of the BWT itself follows this description). In our experiments, we constructed the succinct DBG for a metagenomics data set with over 200 gigabases in 3 hours, and the resulting DBG only consumed 31.2 GB of memory. We also constructed the BWT index for 10 million 100-bp reads in 40 minutes using 4 quad-core machines. Lastly, we introduce a SNP detection tool, iSNPcall, which detects SNPs from a set of reads. Given a set of user-supplied annotated SNPs, iSNPcall focuses only on alignments covering these SNPs, which greatly accelerates the detection of SNPs at the prescribed loci. The annotated SNPs also help us distinguish sequencing errors from authentic SNP alleles easily. This is in contrast to the traditional de novo method, which aligns reads onto the reference genome and then filters inauthentic mismatches according to some probabilities. In comparisons across several applications, iSNPcall was found to give higher accuracy than the de novo method, especially for samples with low coverage. Subjects: Graphics processing units; Bioinformatics
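To make the role of the BWT concrete, here is a minimal host-side sketch of the Burrows-Wheeler transform computed naively through a suffix sort. This only illustrates the definition; SOAP3-dp and the construction algorithms described above use far more scalable, GPU-tailored methods and a succinct representation.

    #include <algorithm>
    #include <numeric>
    #include <string>
    #include <vector>

    // Naive BWT via a full suffix sort, for illustration only.
    std::string bwt_naive(const std::string& text)
    {
        const std::string s = text + '\0';   // unique, smallest sentinel
        const std::size_t n = s.size();
        std::vector<std::size_t> sa(n);
        std::iota(sa.begin(), sa.end(), 0);  // suffix start positions
        std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });
        std::string bwt;
        for (std::size_t i : sa)             // character preceding each suffix
            bwt += s[(i + n - 1) % n];
        return bwt;
    }

The transform tends to group identical characters into runs, which is what makes the resulting index both highly compressible and searchable via backward search.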




Patient-specific Hemodynamic Computations: Application to Personalized Diagnosis of Cardiovascular Pathologies


Book Description

Hemodynamic computations represent a state-of-the-art approach for the patient-specific assessment of cardiovascular pathologies. The book presents the development of reduced-order multiscale hemodynamic models for coronary artery disease, aortic coarctation, and whole-body circulation, which can be applied in routine clinical settings for personalized diagnosis. Specific parameter estimation frameworks are introduced for calibrating the parameters of the models, and high-performance computing solutions are employed to reduce their execution time. The personalized computational models are validated against patient-specific measurements. The book is written for scientists in the field of biomedical engineering focusing on the cardiovascular system, as well as for research-oriented physicians in cardiology and industrial players in the field of healthcare technologies.




High-Performance Computing Using FPGAs


Book Description

High-Performance Computing Using FPGAs covers the area of high-performance reconfigurable computing (HPRC), providing an overview of architectures, tools, and applications. FPGAs offer very high I/O bandwidth and fine-grained, custom, and flexible parallelism; ever-increasing computational needs coupled with the frequency/power wall, the increasing maturity and capabilities of FPGAs, and the advent of multicore processors have driven the acceptance of parallel computational models. The part on architectures introduces different FPGA-based HPC platforms: attached co-processor HPRC architectures such as CHREC’s Novo-G and EPCC’s Maxwell systems; tightly coupled HPRC architectures, e.g. the Convey hybrid-core computer; reconfigurably networked HPRC architectures, e.g. the QPACE system; and standalone HPRC architectures such as EPFL’s CONFETTI system. The part on tools focuses on high-level programming approaches for HPRC, with chapters on C-to-Gate tools (such as Impulse-C, AutoESL, Handel-C, MORA-C++); graphical tools (MATLAB-Simulink, NI LabVIEW); and domain-specific languages and languages for heterogeneous computing (for example OpenCL, Microsoft’s Kiwi and Alchemy projects). The part on applications presents cases from several application domains where HPRC has been used successfully, such as bioinformatics and computational biology; financial computing; stencil computations; information retrieval; lattice QCD; astrophysics simulations; and weather and climate modeling.