An Application of Multivariate Statistical Analysis for Query-Driven Visualization


Book Description

Abstract: Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.




Query-driven Visualization Strategies for the Analysis and Visualization of Large, Complex Datasets


Book Description

There is an urgent need in scientific communities, driven by their ability to generate ever-larger, increasingly complex data, for scalable analysis methods that rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) methods are among the small subset of techniques that are able to address both large and highly complex datasets (e.g., multivariate, multitemporal, and multiresolution representations of scalar, vector, and function field data). This dissertation presents new methods that either directly extend the utility and accelerate the performance of QDV as a whole, or enable QDV's substantial and flexible analysis strengths to be applied to new areas of scientific research. The first part of this dissertation presents a new data-parallel strategy that accelerates the most fundamental task performed by QDV: the evaluation of user-defined, ad hoc queries. The second part of this dissertation extends QDV strategies to analyze and visualize time-varying adaptive mesh refinement (AMR) data. AMR techniques are used in many scientific communities to efficiently and accurately model complex, continuous physical phenomena. By extending QDV methods to address the dynamic spatiotemporal properties of time-varying AMR data, I provide scientists with a powerful tool for visually analyzing the data generated from these important simulations. The final part of this dissertation leverages statistical analysis methods to generate deeper insight into the regions that are selected by a user's query. In this effort I introduce two new methods that increase the utility of query-driven strategies. The first strategy uses correlation fields, created between pairs of variables, in conjunction with the cumulative distribution functions (CDFs) of variables expressed in a user's query. This strategy identifies important variable interactions within query regions. The second strategy forms a statistics-based segmentation within the query region to generate deeper insight into the "statistical structure" of a user's query. In this approach, segments indicate which variable contributes most to the underlying joint density distribution of the user's query. These segments, when used in conjunction with each variable's CDF, intuitively aid users in refining the constraints over the variables in their query.
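
As a rough illustration of the two building blocks this description mentions, evaluating an ad hoc range query and inspecting a variable's CDF inside the query region, the following is a minimal sketch in plain Python. It is not the dissertation's implementation: the record layout, the variable names (`temperature`, `pressure`), and the helper functions `evaluate_query` and `empirical_cdf` are all hypothetical, and the CDF used here is the simple empirical one.

```python
from bisect import bisect_right


def evaluate_query(records, constraints):
    """Return the records that satisfy every (low, high) range constraint.

    `records` is a list of dicts; `constraints` maps a variable name to an
    inclusive (low, high) range. Both layouts are assumptions for this sketch.
    """
    return [
        r for r in records
        if all(lo <= r[name] <= hi for name, (lo, hi) in constraints.items())
    ]


def empirical_cdf(values):
    """Return F, where F(x) is the fraction of `values` that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n if n else 0.0

    return cdf


# Hypothetical sample data: four cells with two variables each.
records = [
    {"temperature": 300.0, "pressure": 1.0},
    {"temperature": 450.0, "pressure": 2.5},
    {"temperature": 500.0, "pressure": 3.0},
    {"temperature": 700.0, "pressure": 4.5},
]
query = {"temperature": (400.0, 800.0), "pressure": (2.0, 5.0)}

selected = evaluate_query(records, query)
temperature_cdf = empirical_cdf([r["temperature"] for r in selected])
```

Reading `temperature_cdf` at a candidate threshold shows what fraction of the selected region would survive a tighter constraint, which is one plausible way a CDF could guide constraint refinement.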




High Performance Visualization


Book Description

Visualization and analysis tools, techniques, and algorithms have undergone a rapid evolution in recent decades to accommodate explosive growth in data size and complexity and to exploit emerging multi- and many-core computational platforms. High Performance Visualization: Enabling Extreme-Scale Scientific Insight focuses on the subset of scientific visualization concerned with algorithm design, implementation, and optimization for use on today’s largest computational platforms. The book collects some of the most seminal work in the field, including algorithms and implementations running at the highest levels of concurrency and used by scientific researchers worldwide. After introducing the fundamental concepts of parallel visualization, the book explores approaches to accelerate visualization and analysis operations on high performance computing platforms. Looking to the future and anticipating changes to computational platforms in the transition from the petascale to exascale regime, it presents the main research challenges and describes several contemporary, high performance visualization implementations. Reflecting major concepts in high performance visualization, this book unifies a large and diverse body of computer science research, development, and practical applications. It describes the state of the art at the intersection of scientific visualization, large data, and high performance computing trends, giving readers the foundation to apply the concepts and carry out future research in this area.




In Situ Visualization for Computational Science


Book Description

This book provides an overview of the emerging field of in situ visualization, i.e. visualizing simulation data as it is generated. In situ visualization is a processing paradigm in response to recent trends in the development of high-performance computers. It has great promise in its ability to access increased temporal resolution and leverage extensive computational power. However, the paradigm also is widely viewed as limiting when it comes to exploration-oriented use cases. Furthermore, it will require visualization systems to become increasingly complex and constrained in usage. As research efforts on in situ visualization are growing, the state of the art and best practices are rapidly maturing. Specifically, this book contains chapters that reflect state-of-the-art research results and best practices in the area of in situ visualization. Our target audience is researchers and practitioners from the areas of mathematics, computational science, high-performance computing, and computer science who work on or with in situ techniques, or desire to do so in the future.




Distribution-based Exploration and Visualization of Large-scale Vector and Multivariate Fields


Book Description

Due to the ever-increasing computing power of the last few decades, the size of scientific data produced by various scientific simulations has been growing rapidly. As a result, effective techniques to visualize and explore such large-scale scientific data are becoming more and more important in understanding the data. However, for data at such a large scale, effective analysis and visualization is a non-trivial task for several reasons. First, it is often time-consuming and memory-intensive to perform visualization and analysis directly on the original data. Second, as the data become large and complex, visualization usually suffers from visual cluttering and occlusion, which makes it difficult for users to understand the data. To address these challenges, this dissertation proposes a distribution-based query-driven framework to visualize and analyze large-scale scientific data. We propose to use statistical distributions to summarize large-scale data sets. The summarized data is then used in place of the original data to support efficient and interactive query-driven visualization, which is often free of occlusion. In this dissertation, the proposed framework is applied to flow fields and multivariate scalar fields. We first demonstrate the application of the proposed framework to flow fields. For a flow field, the statistical data summarization is computed from geometries such as streamlines and stream surfaces computed from the flow field. Stream surfaces and streamlines are two popular methods for visualizing flow fields. When the data size is large, distributed-memory parallelism is usually needed. In this dissertation, a new scalable algorithm is proposed to compute stream surfaces from large-scale flow fields efficiently on distributed-memory machines. 
After we obtain a large number of computed streamlines or stream surfaces, a direct visualization of all the densely computed geometries is seldom useful due to visual cluttering and occlusion. To solve the visual cluttering problem, a distribution-based query-driven framework to explore those densely computed streamlines is presented. Then, the proposed framework is applied to multivariate scalar fields. When dealing with multivariate data, in order to understand the data, it is often useful to show the regions of interest based on user specified criteria. In the presence of large-scale multivariate data, efficient techniques to summarize the data and answer users’ queries are needed. In this dissertation, we first propose to use multivariate histograms to summarize the data and demonstrate how effective query-driven visualization can be achieved based on those multivariate histograms. However, storing multivariate histograms in the form of multi-dimensional arrays is very expensive. To enable efficient visualization and exploration of multivariate data sets, we present a compact structure to store multivariate histograms to reduce their huge space cost while supporting different kinds of histogram query operations efficiently. We also present an interactive system to assist users to effectively design multivariate transfer functions. Multiple regions of interest could be highlighted through multivariate volume rendering based on the user specified multivariate transfer function.
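
To make the storage argument concrete, here is a minimal sketch of a multivariate histogram that keeps only non-empty bins and answers a range query from the summary alone. The description does not say what the dissertation's compact structure is; the sparse dictionary used here is a swapped-in stand-in chosen for brevity, and the function names, bin count, and sample points are all assumptions.

```python
from collections import Counter


def bin_index(value, low, high, bins):
    """Map a value in [low, high] to a bin index in [0, bins - 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    i = int((value - low) / (high - low) * bins)
    return min(max(i, 0), bins - 1)  # clamp so value == high lands in the last bin


def build_sparse_histogram(points, ranges, bins):
    """Count points per multi-dimensional bin, storing only non-empty bins."""
    hist = Counter()
    for point in points:
        key = tuple(
            bin_index(v, lo, hi, bins) for v, (lo, hi) in zip(point, ranges)
        )
        hist[key] += 1
    return hist


def count_in_bin_range(hist, bin_ranges):
    """Sum the counts of bins whose index lies in [first, last] in every dimension."""
    return sum(
        count for key, count in hist.items()
        if all(first <= k <= last for k, (first, last) in zip(key, bin_ranges))
    )


# Hypothetical two-variable data set, 4 bins per variable (16 bins if stored densely).
points = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.5, 0.5)]
ranges = [(0.0, 1.0), (0.0, 1.0)]
hist = build_sparse_histogram(points, ranges, bins=4)

# Query: both variables fall in the two lowest bins.
low_low = count_in_bin_range(hist, [(0, 1), (0, 1)])
```

A dense array would need `bins ** d` entries for `d` variables, whereas the dictionary grows only with the number of occupied bins. The trade-off in this sketch is that a range query scans every stored bin instead of indexing directly.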




Data Mining and Data Visualization


Book Description

Data Mining and Data Visualization focuses on dealing with large-scale data, a field commonly referred to as data mining. The book is divided into three sections. The first deals with an introduction to statistical aspects of data mining and machine learning and includes applications to text analysis, computer intrusion detection, and hiding of information in digital files. The second section focuses on a variety of statistical methodologies that have proven to be effective in data mining applications. These include clustering, classification, multivariate density estimation, tree-based methods, pattern recognition, outlier detection, genetic algorithms, and dimensionality reduction. The third section focuses on data visualization and covers issues of visualization of high-dimensional data, novel graphical techniques with a focus on human factors, interactive graphics, and data visualization using virtual reality. This book represents a thorough cross section of internationally renowned thinkers who are inventing methods for dealing with a new data paradigm.
Distinguished contributors who are international experts in aspects of data mining
Includes data mining approaches to non-numerical data, including text data, Internet traffic data, and geographic data
Highly topical discussions reflecting current thinking on contemporary technical issues, e.g. streaming data
Discusses taxonomy of dataset sizes, computational complexity, and scalability, which are usually ignored in most discussions
Thorough discussion of data visualization issues blending statistical, human factors, and computational insights




Smoothing of Multivariate Data


Book Description

An applied treatment of the key methods and state-of-the-art tools for visualizing and understanding statistical data. Smoothing of Multivariate Data provides an illustrative and hands-on approach to the multivariate aspects of density estimation, emphasizing the use of visualization tools. Rather than outlining the theoretical concepts of classification and regression, this book focuses on the procedures for estimating a multivariate distribution via smoothing. The author first provides an introduction to various visualization tools that can be used to construct representations of multivariate functions, sets, data, and scales of multivariate density estimates. Next, readers are presented with an extensive review of the basic mathematical tools that are needed to asymptotically analyze the behavior of multivariate density estimators, with coverage of density classes, lower bounds, empirical processes, and manipulation of density estimates. The book concludes with an extensive toolbox of multivariate density estimators, including anisotropic kernel estimators, minimization estimators, multivariate adaptive histograms, and wavelet estimators. A completely interactive experience is encouraged, as all examples and figures can be easily replicated using the R software package, and every chapter concludes with numerous exercises that allow readers to test their understanding of the presented techniques. The R software is freely available on the book's related Web site along with "Code" sections for each chapter that provide short instructions for working in the R environment. Combining mathematical analysis with practical implementations, Smoothing of Multivariate Data is an excellent book for courses in multivariate analysis, data analysis, and nonparametric statistics at the upper-undergraduate and graduate levels. It also serves as a valuable reference for practitioners and researchers in the fields of statistics, computer science, economics, and engineering.




Variable Interactions in Query-Driven Visualization


Book Description

One fundamental element of scientific inquiry is discovering relationships, particularly the interactions between different variables in observed or simulated phenomena. Building upon our prior work in the field of Query-Driven Visualization, where visual data analysis processing is focused on subsets of large data deemed to be "scientifically interesting," this new work focuses on a novel knowledge discovery capability suitable for use with petascale class datasets. It enables visual presentation of the presence or absence of relationships (correlations) between variables in data subsets produced by Query-Driven methodologies. This technique holds great potential for enabling knowledge discovery from large and complex datasets currently emerging from SciDAC and INCITE projects. It is sufficiently general to be applicable to any type of complex, time-varying, multivariate data from structured, unstructured or adaptive grids.
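
The idea of measuring a relationship only inside a query-selected subset can be sketched as follows. The description does not state which correlation measure the authors use; the Pearson coefficient here is an assumption, and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return cov / sqrt(var_x * var_y)


def correlation_in_query(xs, ys, predicate):
    """Correlate xs and ys using only the samples accepted by `predicate`."""
    pairs = [(x, y) for x, y in zip(xs, ys) if predicate(x, y)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


# Hypothetical samples: y rises with x at first, then falls.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 5.0, 3.0, 1.0]

whole_domain = pearson(xs, ys)
query_subset = correlation_in_query(xs, ys, lambda x, y: x <= 3.0)
```

In this toy data the two variables are negatively correlated over the whole domain but perfectly positively correlated inside the subset `x <= 3`, which shows how a relationship can be present in a query region while being hidden globally.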




Applied Multivariate Statistical Analysis


Book Description

With a wealth of examples and exercises, this is a brand new edition of a classic work on multivariate data analysis. A key advantage of the work is its accessibility as it presents tools and concepts in a way that is understandable for non-mathematicians.




Multivariate Statistical Modeling in Engineering and Management


Book Description

The book focuses on problem solving for practitioners and model building for academicians under multivariate situations. This book helps readers in understanding the issues, such as knowing variability, extracting patterns, building relationships, and making objective decisions. A large number of multivariate statistical models are covered in the book. The readers will learn how a practical problem can be converted to a statistical problem and how the statistical solution can be interpreted as a practical solution.
Key features:
Links the data generation process with statistical distributions in the multivariate domain
Provides a step-by-step procedure for estimating parameters of developed models
Provides a blueprint for data-driven decision making
Includes practical examples and case studies relevant for intended audiences
The book will help everyone involved in data-driven problem solving, modeling and decision making.