Efficient Feature-Driven Visualization of Large-Scale Scientific Data


Book Description

Very large, complex scientific data sets acquired in many research areas create critical challenges for scientists who need to understand, analyze, and organize their data. The objective of this project is to expand feature extraction and analysis capabilities in order to develop powerful and accurate visualization tools that can assist domain scientists with their requirements in multiple phases of scientific discovery. We have recently developed several feature-driven visualization methods for extracting different data characteristics of volumetric datasets. Our results verify the hypothesis in the proposal and will be used to develop additional prototype systems.




Feature Extraction and Parallel Visualization for Large-scale Scientific Data


Book Description

Advanced computing and sensing technologies enable scientists to study natural and physical phenomena with unprecedented precision, resulting in an explosive growth of data. The amounts of data generated by large scientific simulations pose a grand challenge for data analytics and visualization because the data are too massive to transfer, store, and process.







High Performance Visualization


Book Description

Visualization and analysis tools, techniques, and algorithms have undergone a rapid evolution in recent decades to accommodate explosive growth in data size and complexity and to exploit emerging multi- and many-core computational platforms. High Performance Visualization: Enabling Extreme-Scale Scientific Insight focuses on the subset of scientific visualization concerned with algorithm design, implementation, and optimization for use on today’s largest computational platforms. The book collects some of the most seminal work in the field, including algorithms and implementations running at the highest levels of concurrency and used by scientific researchers worldwide. After introducing the fundamental concepts of parallel visualization, the book explores approaches to accelerate visualization and analysis operations on high performance computing platforms. Looking to the future and anticipating changes to computational platforms in the transition from the petascale to exascale regime, it presents the main research challenges and describes several contemporary, high performance visualization implementations. Reflecting major concepts in high performance visualization, this book unifies a large and diverse body of computer science research, development, and practical applications. It describes the state of the art at the intersection of scientific visualization, large data, and high performance computing trends, giving readers the foundation to apply the concepts and carry out future research in this area.




Scalable Extraction and Visualization of Scientific Features with Load-balanced Parallelism


Book Description

Extracting and visualizing features from scientific data can help scientists derive valuable insights. An extraction and visualization pipeline usually includes three steps: (1) scientific feature detection, (2) union-find for features' connected component labeling, and (3) visualization and analysis. As the scale of scientific data generated by experiments and simulations grows, it becomes a common practice to use distributed computing to handle large-scale data with data-parallelism, where data is partitioned and distributed over parallel processors. Three challenges arise for feature extraction and visualization on scientific applications. First, traditional feature detectors may not be effective and robust enough to capture features of interest across different scientific settings, because scientific features usually are highly nonlinear and recognized by domain scientists' soft knowledge. Second, existing union-find algorithms are either serial or not scalable enough to deal with extreme-scale datasets generated in the modern era. Third, existing parallel feature extraction and visualization algorithms fail to automatically reduce communication costs when optimizing the performance of processing units. This dissertation studies scalable scientific feature extraction and visualization to tackle the three challenges. First, we design human-centric interactive visual analytics based on scientists' requirements to address domain-specific feature detection and tracking. We focus on an essential problem in earth sciences: spatiotemporal analysis of viscous and gravitational fingers. Viscous and gravitational flow instabilities cause a displacement front to break up into finger-like fluids. Previously, scientists mainly detected the finger features using density thresholding, where scientists specify certain density thresholds and extract super-level sets from input density scalar fields. 
However, the results of density thresholding are sensitive to the selected threshold values, and a few threshold values are usually not sufficient to extract and track satisfactory time-varying finger features. In our study, scientists can detect and visualize spatiotemporal fingers interactively to elucidate the dynamics of the flow instabilities. Our study has two main contributions. (1) We propose a ridge-guided detection method to extract the curvilinear geometry and branching topology of fingers, which provides richer geometric structures than density thresholding. (2) We devise an interactive visual-analytics system with geometric-glyph augmented tracking graphs to allow scientists to navigate how the fingers and their branches grow, merge, and split over both space and time. Feedback from earth scientists demonstrates the efficacy of our approach for spatiotemporal geometry-driven analyses of fingers. Second, we improve the scalability of union-find algorithms using asynchronous and load-balanced parallelism. Union-find is widely used in scientific feature extraction and visualization techniques, such as tracking critical points and extracting level sets. However, distributed and parallel union-find can suffer from high synchronization costs and imbalanced workloads of participating processors. In our study, we present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable scientific feature extraction and visualization. We prove that global synchronizations in existing distributed union-find can be eliminated without changing final results, allowing overlapped communications and computations for scalable processing. We also use a k-d tree decomposition to redistribute inputs in order to improve workload balancing. We benchmark the scalability of our algorithm with up to 1,024 processors using both synthetic and application data.
We demonstrate the use of our algorithm in critical point tracking and super-level set extraction with high-speed imaging experiments and fusion plasma simulations, respectively. Third, we take communication costs into account in parallel algorithm design. We explore an online reinforcement learning (RL) paradigm to optimize parallel particle tracing performance dynamically in distributed-memory systems while reducing I/O and communication costs. Our method combines three novel components: (1) a workload donation model, (2) a high-order workload estimation model, and (3) a communication cost model. First, our RL-based workload donation model monitors the workloads of processors and creates RL agents to donate particles and data blocks from high-workload processors to low-workload processors to minimize the execution time. The RL agents learn the donation strategy on-the-fly based on reward and cost functions. The reward and cost functions are designed to consider processors' workload changes and data transfer costs for every donation action. Second, we propose an online workload estimation model to help our RL model estimate the workload distribution of processors in future computations. Third, we use the communication cost model that considers both block and particle data exchange costs to help the agents make effective decisions with minimized communication costs. We demonstrate that our algorithm adapts to different flow behaviors in large-scale fluid dynamics, ocean, and weather simulation data. Our algorithm improves parallel particle tracing performance in terms of parallel efficiency, load balance, and costs of I/O and communication for evaluations up to 16,384 processors.
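The first two pipeline steps named in this description (feature detection by density thresholding, then union-find for connected component labeling) can be sketched serially as follows. This is a minimal illustration and not the dissertation's asynchronous distributed algorithm: the 2D grid, the threshold value, the 4-connectivity choice, and the names `UnionFind` and `label_superlevel_set` are all assumptions made for the example.

```python
class UnionFind:
    """Serial disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Locate the root, then compress the path so later lookups are short.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def label_superlevel_set(field, threshold):
    """Return {cell: root} for cells with value >= threshold (4-connectivity)."""
    uf = UnionFind()
    rows, cols = len(field), len(field[0])
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold:
                continue
            uf.add((r, c))
            # Only look up and left: those cells were already visited.
            for nr, nc in ((r - 1, c), (r, c - 1)):
                if nr >= 0 and nc >= 0 and (nr, nc) in uf.parent:
                    uf.union((r, c), (nr, nc))
    return {cell: uf.find(cell) for cell in uf.parent}


# Hypothetical density field with two separate high-density regions.
density = [
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.7, 0.1, 0.6],
    [0.0, 0.1, 0.1, 0.9],
]
labels = label_superlevel_set(density, threshold=0.5)
num_features = len(set(labels.values()))
print(num_features)
```

In a data-parallel setting each processor would run a step like this on its own block and then merge labels across block boundaries; that merge step is where the synchronization and load-balancing issues discussed in the description arise, and it is not shown here.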




Advanced Visualization Techniques and Data Representations for Large Scale Scientific Data


Book Description

Scientific simulations provide a critical means for understanding and predicting important natural phenomena, often having significant impact on policy-making and the environment's well-being at regional and global scales. The output of a typical leading-edge simulation is so voluminous and complex that advanced visualization techniques are urgently needed to explore and interpret the computed results. The new challenges of visualizing large simulation data arise mainly because the data are too massive to transfer, store, and process. The gap between data generation and scientific discovery is getting wider. A viable solution to bridge the disparity is based on the concept of in-situ processing, which can greatly reduce data movement and storage requirements by coupling visualization with simulation. It thus requires designing and deploying new parallel visualization techniques on cutting-edge high performance systems characterized by heterogeneous processors, a high level of concurrency, and deep memory hierarchies. This dissertation makes contributions to the design of new visualization and data representation techniques to facilitate large-scale visualization on highly parallel distributed systems. We carefully study novel data representations of large and complex simulation data, and explore corresponding data partitioning and distribution schemes to ensure the stability of a visualization system in a large heterogeneous computing environment. Another task of this research is to exploit intra-node and inter-node parallelism at a high level of concurrency to improve the parallel efficiency of visualization algorithms. We also study the communication patterns and data access patterns of parallel visualization processes, and evaluate and enhance our new data representations to minimize inter-node data exchange.
Lastly, we pair these techniques with the multi-resolution advantages of data abstraction, guided by an uncertainty-driven approach, to realize scalable visualization solutions for large simulations. We carry out the experimental study based on selected, representative simulations and corresponding applications, such as high-performance and high-quality visualization of climate models and efficient data representations for the analysis of large-scale flow simulations. We demonstrate that well-designed visualization techniques and data representations for simulation data can facilitate more responsive and intuitive studies of visualization at large scale, and hence enhance scientists' potential to discover complex patterns and understand numerical simulations.




Distribution-based Exploration and Visualization of Large-scale Vector and Multivariate Fields


Book Description

Due to the ever-increasing computing power of the last few decades, the size of the data produced by various scientific simulations has been growing rapidly. As a result, effective techniques to visualize and explore such large-scale scientific data are becoming more and more important in understanding the data. However, for data at such a large scale, effective analysis and visualization is a non-trivial task for several reasons. First, it is often time-consuming and memory-intensive to perform visualization and analysis directly on the original data. Second, as the data become large and complex, visualization usually suffers from visual cluttering and occlusion, which makes it difficult for users to understand the data. In order to address the aforementioned challenges, in this dissertation, a distribution-based query-driven framework to visualize and analyze large-scale scientific data is proposed. We propose to use statistical distributions to summarize large-scale data sets. The summarized data is then used to substitute the original data to support efficient and interactive query-driven visualization which is often free of occlusion. In this dissertation, the proposed framework is applied to flow fields and multivariate scalar fields. We first demonstrate the application of the proposed framework to flow fields. For a flow field, the statistical data summarization is computed from geometries such as streamlines and stream surfaces computed from the flow field. Stream surfaces and streamlines are two popular methods for visualizing flow fields. When the data size is large, distributed memory parallelism usually is needed. In this dissertation, a new scalable algorithm is proposed to compute stream surfaces from large-scale flow fields efficiently on distributed memory machines.
After we obtain a large number of computed streamlines or stream surfaces, a direct visualization of all the densely computed geometries is seldom useful due to visual cluttering and occlusion. To solve the visual cluttering problem, a distribution-based query-driven framework to explore those densely computed streamlines is presented. Then, the proposed framework is applied to multivariate scalar fields. When dealing with multivariate data, in order to understand the data, it is often useful to show the regions of interest based on user-specified criteria. In the presence of large-scale multivariate data, efficient techniques to summarize the data and answer users’ queries are needed. In this dissertation, we first propose to use multivariate histograms to summarize the data and demonstrate how effective query-driven visualization can be achieved based on those multivariate histograms. However, storing multivariate histograms in the form of multi-dimensional arrays is very expensive. To enable efficient visualization and exploration of multivariate data sets, we present a compact structure to store multivariate histograms to reduce their huge space cost while supporting different kinds of histogram query operations efficiently. We also present an interactive system to help users design multivariate transfer functions effectively. Multiple regions of interest can be highlighted through multivariate volume rendering based on the user-specified multivariate transfer function.
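To make the multivariate-histogram idea concrete, the sketch below bins two variables jointly, stores only the non-empty bins, and answers a range query from the summary alone. It is a hedged illustration: the dictionary-of-bins layout is a stand-in chosen for brevity and is not the compact structure the dissertation presents, and `bin_index`, `build_sparse_histogram`, `query_count`, the bin counts, and the sample values are all assumptions.

```python
from collections import Counter


def bin_index(value, lo, hi, nbins):
    """Map a value in [lo, hi] to a bin index in [0, nbins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * nbins)
    return min(max(idx, 0), nbins - 1)


def build_sparse_histogram(samples, ranges, nbins):
    """Count samples per joint bin, keeping only non-empty bins."""
    hist = Counter()
    for sample in samples:
        key = tuple(
            bin_index(v, lo, hi, nbins) for v, (lo, hi) in zip(sample, ranges)
        )
        hist[key] += 1
    return hist


def query_count(hist, bin_ranges):
    """Sum counts of bins whose index lies in [first, last] on every axis."""
    return sum(
        count
        for key, count in hist.items()
        if all(first <= k <= last for k, (first, last) in zip(key, bin_ranges))
    )


# Hypothetical (temperature, pressure) samples.
samples = [(10.0, 1.0), (12.0, 1.1), (35.0, 3.9), (36.0, 4.0), (11.0, 3.8)]
ranges = [(0.0, 40.0), (0.0, 4.0)]
hist = build_sparse_histogram(samples, ranges, nbins=4)

# A dense 4 x 4 array would hold 16 bins; the sparse form holds only occupied ones.
print(len(hist))
# Query: temperature in the top bin AND pressure in the top bin.
print(query_count(hist, [(3, 3), (3, 3)]))
```

The space saving grows with the number of variables, since a dense array needs `nbins ** d` cells for `d` variables while the sparse form never holds more entries than there are samples.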




Interactive Feature Selection and Visualization for Large Observational Data


Book Description

Data can create enormous value in both scientific and industrial fields, especially by providing access to new knowledge and inspiring innovation. With massive increases in computing power, data storage capacity, and the capability to generate and collect data, scientific research communities are confronting a transformation toward advanced uses of large-scale, complex, high-resolution data sets in situation awareness and decision-making projects. Comprehensively analyzing big data problems requires analyses aimed at various aspects, which involve effective selection of static and time-varying feature patterns that fulfill the interests of domain users. To fully utilize the benefits of the ever-growing size of data and computing power in real applications, we proposed a general feature analysis pipeline and an integrated system that is general, scalable, and reliable for interactive feature selection and visualization of large observational data for situation awareness. The main challenge tackled in this dissertation was how to effectively identify and select meaningful features in a complex feature space. Our research efforts mainly covered three aspects: (1) enabling domain users to better define their analysis interests; (2) accelerating the process of feature selection; and (3) comprehensively presenting the intermediate and final analysis results visually. For static feature selection, we developed a series of quantitative metrics that relate user interest to the spatio-temporal characteristics of features. For time-varying feature selection, we proposed the concept of a generalized feature set and used a generalized time-varying feature to describe the selection interest. Additionally, we provided a scalable system framework that manages both data processing and interactive visualization and effectively exploits the computation and analysis resources.
Together, the methods and the system design enabled interactive feature selection from two representative large observational data sets, one with large spatial resolution and one with large temporal resolution. The final results support applications of big data analysis that combine statistical methods with high performance computing techniques to visualize real events interactively.




Query-driven Analysis and Visualization for Large-scale Scientific Dataset Using Geometry Summarization and Bitmap Indexing


Book Description

The computational power of modern supercomputers is growing rapidly, enabling scientists to produce high-resolution datasets when simulating physical or weather models, which most of the time generate extreme-scale data with multiple variables. However, storage, transmission, or exploration of such large-scale data is challenging. In the past decades, several visualization approaches have been developed to effectively explore datasets by displaying their underlying information. Query-driven visualization is one of the prominent approaches, as it significantly reduces visual exploration time by focusing only on interesting or important features for further analysis and decision making. However, as scientific datasets become too large, traditional data exploration approaches become ineffective. An emerging approach is to create data summarizations to first reduce the size of the dataset, and then perform data exploration on the data summarization. An ideal data summarization aims at preserving the characteristics of the raw data as much as possible while keeping the size small. However, retrieving salient features from the raw data and creating such importance-based data summarizations is challenging. In this dissertation, we address the issues that need to be solved when applying query-driven analysis and visualization using data summarizations.
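The title of this work names bitmap indexing as one ingredient of query-driven analysis. As a rough illustration of how a binned bitmap index can answer a range query without scanning the raw values, the sketch below keeps one bit vector per value bin and ORs together the bit vectors of the bins a query touches. It is a minimal, uncompressed sketch under stated assumptions: Python integers stand in for bit vectors, the bin edges and data are made up, `build_bitmap_index` and `query_bins` are hypothetical names, and the query is resolved at bin granularity only.

```python
from bisect import bisect_right


def build_bitmap_index(values, edges):
    """Build one bit vector per bin; bit i is set when values[i] falls in that bin.

    Bin b covers [edges[b], edges[b + 1]); the last bin also includes its
    upper edge. Out-of-range values are clamped to the first or last bin.
    """
    nbins = len(edges) - 1
    bitmaps = [0] * nbins
    for i, v in enumerate(values):
        b = min(max(bisect_right(edges, v) - 1, 0), nbins - 1)
        bitmaps[b] |= 1 << i
    return bitmaps


def query_bins(bitmaps, first_bin, last_bin):
    """Return indices of records whose bin lies in [first_bin, last_bin]."""
    hits = 0
    for b in range(first_bin, last_bin + 1):
        hits |= bitmaps[b]  # bitwise OR merges the matching record sets
    return [i for i in range(hits.bit_length()) if (hits >> i) & 1]


# Hypothetical scalar values and four equal-width bins over [0, 1].
values = [0.05, 0.62, 0.91, 0.30, 0.77, 0.58]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
index = build_bitmap_index(values, edges)

# Records with a value of at least 0.5 (bins 2 and 3).
selected = query_bins(index, 2, 3)
print(selected)
```

Multi-variable queries follow the same pattern: build one index per variable and AND the per-variable results, which is why bitwise operations on the index can replace a pass over the raw data.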




In Situ Visualization for Computational Science


Book Description

This book provides an overview of the emerging field of in situ visualization, i.e., visualizing simulation data as it is generated. In situ visualization is a processing paradigm that responds to recent trends in the development of high-performance computers. It has great promise in its ability to access increased temporal resolution and leverage extensive computational power. However, the paradigm is also widely viewed as limiting when it comes to exploration-oriented use cases. Furthermore, it will require visualization systems to become increasingly complex and constrained in usage. As research efforts on in situ visualization are growing, the state of the art and best practices are rapidly maturing. Specifically, this book contains chapters that reflect state-of-the-art research results and best practices in the area of in situ visualization. Our target audience is researchers and practitioners from the areas of mathematics, computational science, high-performance computing, and computer science who work on or with in situ techniques, or desire to do so in the future.
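As a toy illustration of the paradigm, the sketch below runs a small 1D diffusion loop and, at every time step, reduces the in-memory field to a coarse histogram instead of writing the full field to storage. Everything here is an assumption made for illustration: the solver, the `in_situ_histogram` hook, and the parameters are hypothetical and are not taken from the book.

```python
def step_diffusion(u, alpha=0.25):
    """Advance a 1D field one explicit diffusion step with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def in_situ_histogram(u, nbins=4, lo=0.0, hi=1.0):
    """Reduce the live field to bin counts while it is still in memory."""
    counts = [0] * nbins
    for v in u:
        b = min(int((v - lo) / (hi - lo) * nbins), nbins - 1)
        counts[max(b, 0)] += 1
    return counts


field = [0.0] * 16
field[8] = 1.0  # a single hot cell
summaries = []
for t in range(10):
    field = step_diffusion(field)
    # The analysis runs inside the simulation loop; only the small summary is kept.
    summaries.append(in_situ_histogram(field))

print(len(summaries), sum(summaries[-1]))
```

The trade-off noted in the description shows up even here: every time step gets analyzed, but only the quantities chosen in advance (the histogram) survive, so later exploration of the raw field is no longer possible.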