High Performance Visualization


Book Description

Visualization and analysis tools, techniques, and algorithms have undergone a rapid evolution in recent decades to accommodate explosive growth in data size and complexity and to exploit emerging multi- and many-core computational platforms. High Performance Visualization: Enabling Extreme-Scale Scientific Insight focuses on the subset of scientific visualization concerned with algorithm design, implementation, and optimization for use on today's largest computational platforms.




Advanced Visualization Techniques and Data Representations for Large Scale Scientific Data


Book Description

Scientific simulations provide a critical means for understanding and predicting important natural phenomena, often having a significant impact on policy-making and on environmental well-being at regional and global scales. The output of a typical leading-edge simulation is so voluminous and complex that advanced visualization techniques are urgently needed to explore and interpret the computed results. The central challenge in visualizing large simulation data is that the data are too massive to transfer, store, and process, so the gap between data generation and scientific discovery keeps widening. A viable way to bridge this disparity is in-situ processing, which greatly reduces data movement and storage requirements by coupling visualization with the simulation. This in turn requires designing and deploying new parallel visualization techniques on cutting-edge high performance systems characterized by heterogeneous processors, a high level of concurrency, and deep memory hierarchies.

This dissertation contributes new visualization and data representation techniques that facilitate large-scale visualization on highly parallel distributed systems. We carefully study novel data representations of large and complex simulation data, and explore corresponding data partitioning and distribution schemes to ensure the stability of a visualization system in a large heterogeneous computing environment. Another task of this research is to exploit intra-node and inter-node parallelism at a high level of concurrency to improve the parallel efficiency of visualization algorithms. We also study the communication and data access patterns of parallel visualization processes, and evaluate and enhance our new data representations to minimize inter-node data exchange.

Lastly, we pair these techniques with the multi-resolution advantages of data abstraction, guided by an uncertainty-driven approach, to realize scalable visualization solutions for large simulations. We carry out an experimental study based on selected, representative simulations and corresponding applications, such as high-performance, high-quality visualization of climate models and efficient data representations for the analysis of large-scale flow simulations. We demonstrate that well-designed visualization techniques and data representations for simulation data enable more responsive and intuitive visual studies at large scale, and hence enhance scientists' ability to discover complex patterns and understand numerical simulations.
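The uncertainty-driven, multi-resolution abstraction described above can be illustrated with a toy sketch (this is not the dissertation's actual algorithm; the function name, block size, and the use of variance as the uncertainty proxy are all assumptions): a field is summarized block by block, keeping full resolution only where the local uncertainty measure is high.

```python
import statistics

def adaptive_abstraction(field, block=4, var_threshold=0.5):
    """Summarize a 2D field block by block: low-variance blocks are
    reduced to their mean (coarse), high-variance blocks are kept at
    full resolution (fine)."""
    summary = []
    rows, cols = len(field), len(field[0])
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            vals = [field[x][y]
                    for x in range(i, min(i + block, rows))
                    for y in range(j, min(j + block, cols))]
            if statistics.pvariance(vals) < var_threshold:
                summary.append(("coarse", i, j, statistics.mean(vals)))
            else:
                summary.append(("fine", i, j, vals))
    return summary

# A field that is flat except for one "interesting" corner.
field = [[0.0] * 8 for _ in range(8)]
field[0][0] = 10.0
blocks = adaptive_abstraction(field)
coarse = sum(1 for b in blocks if b[0] == "coarse")
print(coarse, len(blocks))  # 3 4 -- three of four blocks collapse to a mean
```

Only the block containing the outlier survives at full resolution; the rest of the field is stored as three scalars, which is the essence of uncertainty-guided data reduction.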







Interactive GPU-based Visualization of Large Dynamic Particle Data


Book Description

Prevalent types of data in scientific visualization are volumetric data, vector field data, and particle-based data. Particle data typically originates from measurements and simulations in various fields, such as the life sciences or physics. The particles are often visualized directly, that is, by simple representatives such as spheres. Interactive rendering facilitates the exploration and visual analysis of the data. With increasing data set sizes in terms of particle numbers, interactive high-quality visualization is a challenging task. This is especially true for dynamic data or for abstract representations derived from the raw particle data. This book covers direct particle visualization using simple glyphs as well as application-driven abstractions such as clustering and aggregation. It targets visualization researchers and developers who are interested in visualization techniques for large, dynamic particle-based data. Its explanations focus on GPU-accelerated algorithms for high-performance rendering and data processing that run in real-time on modern desktop hardware. Consequently, the implementation of these algorithms, and the data structures required to exploit the capabilities of modern graphics APIs, are discussed in detail. Furthermore, the book covers GPU-accelerated methods for generating application-dependent abstract representations, including various representations commonly used in application areas such as structural biology, systems biology, thermodynamics, and astrophysics.
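The aggregation style of abstraction mentioned above can be sketched as particle binning into a density grid. The snippet below is an illustrative CPU stand-in, not code from the book; on a GPU the same pattern would typically run as one thread per particle with atomic increments into the grid.

```python
from collections import Counter

def bin_particles(particles, cell_size=1.0):
    """Aggregate raw particle positions into a sparse density grid:
    each particle increments the count of the cell containing it.
    A GPU kernel would perform the same update with atomic adds."""
    counts = Counter()
    for x, y, z in particles:
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        counts[cell] += 1
    return counts

particles = [(0.2, 0.3, 0.1), (0.8, 0.9, 0.5), (1.5, 0.2, 0.4)]
grid = bin_particles(particles)
print(grid[(0, 0, 0)], grid[(1, 0, 0)])  # 2 1
```

The resulting density grid is far smaller than the raw particle list and can be rendered or analyzed directly, which is why such aggregations scale to particle counts that defeat direct glyph rendering.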




High Performance Visualization


Book Description

The research in this dissertation aims to address the challenges to visualization resulting from large and complex datasets. The thesis is that effective high performance visualization, responsive to the challenges of large data, follows from a combination of parallel software architectures and optimizations along with steps to reduce the processing load in the visualization pipeline. Broadly speaking, the research follows a bifurcated approach. One branch focuses on algorithms and architectures that leverage parallel computing platforms to increase the capacity of the visualization pipeline. Topics here include a sort-first parallel rendering architecture, remote parallel visualization, and the first-ever study of hybrid parallelization of volume rendering at extreme concurrency. The other branch aims to reduce the amount of work entering the visualization pipeline. We coin the term "query-driven visualization" to refer to the process of limiting the data entering the visualization pipeline to that deemed "scientifically interesting," and use state-of-the-art indexing algorithms for rapid data subsetting. This approach outperforms the best comparable subsetting algorithms used in visualization, and proves useful in applications as diverse as forensic cybersecurity analysis and the study of output from a high energy physics laser-wakefield plasma simulation code.
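The query-driven idea can be sketched in a few lines. The dissertation relies on state-of-the-art bitmap indexing for subsetting; the toy index below substitutes a sorted projection with binary search, purely to illustrate how an index answers a range query and admits only "interesting" records into the pipeline. Class and variable names are illustrative assumptions.

```python
import bisect

class RangeIndex:
    """Toy projection index: values are sorted once, and a range
    query returns matching record ids via binary search. (A
    simplified stand-in for bitmap indexing, not the actual method.)"""
    def __init__(self, values):
        self.pairs = sorted((v, i) for i, v in enumerate(values))
        self.keys = [v for v, _ in self.pairs]

    def query(self, lo, hi):
        """Return ids of records with lo <= value < hi."""
        left = bisect.bisect_left(self.keys, lo)
        right = bisect.bisect_left(self.keys, hi)
        return sorted(i for _, i in self.pairs[left:right])

# Only records meeting the "scientifically interesting" criterion
# (here: energy in [5, 10)) would enter the visualization pipeline.
energy = [0.1, 5.2, 3.3, 9.8, 0.4, 7.7]
idx = RangeIndex(energy)
print(idx.query(5.0, 10.0))  # [1, 3, 5]
```

The index is built once, and each query touches only O(log n) keys plus the matching records, which is what makes interactive subsetting of massive data feasible.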




In Situ Visualization for Computational Science


Book Description

This book provides an overview of the emerging field of in situ visualization, i.e. visualizing simulation data as it is generated. In situ visualization is a processing paradigm that responds to recent trends in the development of high-performance computers. It holds great promise in its ability to access increased temporal resolution and to leverage extensive computational power. However, the paradigm is also widely viewed as limiting when it comes to exploration-oriented use cases, and it requires visualization systems to become increasingly complex and constrained in usage. As research efforts on in situ visualization grow, the state of the art and best practices are rapidly maturing. Accordingly, this book contains chapters that reflect state-of-the-art research results and best practices in the area of in situ visualization. Our target audience is researchers and practitioners from the areas of mathematics, computational science, high-performance computing, and computer science who work on or with in situ techniques, or desire to do so in the future.
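The paradigm can be sketched in a few lines: rather than writing each timestep's full output to disk for later post hoc analysis, the simulation loop invokes an analysis hook while the data is still in memory and keeps only a small derived artifact. This is a minimal illustrative sketch under assumed names, not code from the book.

```python
def simulate_step(step, n=1000):
    """Stand-in for one timestep of a simulation producing a field."""
    return [((i * 7 + step) % 100) / 100.0 for i in range(n)]

def in_situ_summary(field, bins=10):
    """In situ reduction: compute a small histogram while the data
    is still in memory, instead of writing the full field to disk."""
    hist = [0] * bins
    for v in field:
        hist[min(int(v * bins), bins - 1)] += 1
    return hist

summaries = []
for step in range(5):
    field = simulate_step(step)               # full data exists only transiently
    summaries.append(in_situ_summary(field))  # only the tiny artifact is kept

print(len(summaries), len(summaries[0]))  # 5 10
```

Each 1000-element field is reduced to a 10-bin histogram before the next timestep overwrites it, which is exactly the trade the paradigm makes: far less I/O and every timestep observed, at the cost of deciding up front what to keep.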




Conquering Big Data with High Performance Computing


Book Description

This book provides an overview of the resources and research projects that are bringing Big Data and High Performance Computing (HPC) onto converging tracks. It demystifies Big Data and HPC for the reader by covering the primary resources, middleware, applications, and tools that enable the usage of HPC platforms for Big Data management and processing. Through interesting use cases from traditional and non-traditional HPC domains, the book highlights the most critical challenges related to Big Data processing and management, and shows ways to mitigate them using HPC resources. Unlike most books on Big Data, it covers a variety of alternatives to Hadoop, and explains the differences between HPC platforms and Hadoop. Written by professionals and researchers in a range of departments and fields, this book is designed for anyone studying Big Data and its future directions. Those studying HPC will also find the content valuable.




Big Data and Visual Analytics


Book Description

This book provides users with cutting-edge methods and technologies in the area of big data and visual analytics, as well as insight into the big data and data analytics research conducted by world-renowned researchers in this field. The authors present comprehensive educational resources on big data and visual analytics, covering state-of-the-art techniques in data analytics, data and information visualization, and visual analytics. Each chapter covers specific topics related to big data and data analytics, such as virtual data machines, security of big data, big data applications, high performance computing clusters, and big data implementation techniques. Every chapter includes a description of a unique contribution to the area of big data and visual analytics. This book is a valuable resource for researchers and professionals working in the areas of big data, data analytics, and information visualization. Advanced-level students studying computer science will also find this book helpful as a secondary textbook or reference.




Big Data Algorithms for Visualization and Supervised Learning


Book Description

Explosive growth in data size, data complexity, and data rates, triggered by the emergence of high-throughput technologies such as remote sensing, crowd-sourcing, social networks, and computational advertising, has in recent years led to an increasing availability of data sets of unprecedented scale, with billions of high-dimensional data examples stored on hundreds of terabytes of memory. In order to make use of this large-scale data and extract useful knowledge, researchers in the machine learning and data mining communities are faced with numerous challenges, since the tools designed for standard desktop computers are not capable of addressing these problems due to memory and time constraints. As a result, there exists an evident need for the development of novel, scalable algorithms for big data. In this thesis we address these important problems and propose both supervised and unsupervised tools for handling large-scale data. First, we take an unsupervised approach to big data analysis and explore a scalable, efficient visualization method that allows fast knowledge extraction. Next, we consider the supervised learning setting and propose algorithms for fast training of accurate classification models on large data sets, capable of learning state-of-the-art classifiers on data sets with millions of examples and features within minutes.

Data visualization has been used for hundreds of years in scientific research, as it allows humans to easily gain better insight into the complex data they are studying. Despite this long history, there is a clear need for further development of visualization methods for large-scale, high-dimensional data, where commonly used visualization tools are either too simplistic to give a deeper insight into the data's properties, or too cumbersome and computationally costly. We present a novel method for data ordering and visualization. By combining efficient clustering using the k-means algorithm with near-optimal ordering of the found clusters using a state-of-the-art TSP solver, we obtain an efficient algorithm that outperforms existing, computationally intensive methods. In addition, we present a visualization method for smaller-scale problems based on object matching. Experiments show that these methods allow fast detection of hidden patterns, even by users without expertise in the areas of data mining and machine learning.

Supervised learning is another important task, often intractable in many modern applications due to the time and memory constraints imposed by prohibitively large data sets. To address this issue, we first consider the Multi-hyperplane Machine (MM) classification model and propose the online Adaptive MM algorithm, which represents a trade-off between linear and kernel Support Vector Machines (SVMs): it trains MMs in linear time on limited memory while achieving competitive accuracies on large-scale non-linear problems. Moreover, we present a C++ toolbox for developing scalable classification models, which provides an Application Programming Interface (API) for training large-scale classifiers, as well as highly optimized implementations of several state-of-the-art SVM approximators. Lastly, we consider parallelization and distributed learning approaches to large-scale supervised learning, and propose AROW-MapReduce, a distributed learning algorithm for confidence-weighted models using the MapReduce framework. Experimental evaluation of the proposed methods shows state-of-the-art performance on a number of synthetic and real-world data sets, paving the way for efficient and effective knowledge extraction from big data problems.
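The clustering-plus-ordering idea can be sketched as follows. This is a stdlib-only toy, not the thesis's implementation: where the thesis uses a state-of-the-art TSP solver for near-optimal cluster ordering, the sketch substitutes a greedy nearest-neighbor tour, and the k-means initialization is a simple deterministic choice.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2D points; initial centers are taken
    evenly spaced through the input list (an assumption, for determinism)."""
    step = max(1, len(points) // k)
    centers = points[::step][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

def order_clusters(centers):
    """Greedy nearest-neighbor tour over cluster centers -- a cheap
    stand-in for the TSP-solver-based near-optimal ordering."""
    order, remaining = [0], set(range(1, len(centers)))
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: math.dist(centers[last], centers[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Three well-separated blobs along a line, four points each.
pts = [(x + dx, dy) for x in (0.0, 10.0, 20.0)
       for dx in (0.0, 0.5) for dy in (0.0, 0.5)]
centers, clusters = kmeans(pts, 3)
print(order_clusters(centers))  # [0, 1, 2]
```

Ordering the clusters so that similar ones are adjacent is what makes the subsequent heatmap-style visualization reveal block structure: rows and columns of similar examples end up next to each other.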