Core Data Analysis: Summarization, Correlation, and Visualization


Book Description

This text examines the goals of data analysis with respect to enhancing knowledge, and identifies data summarization and correlation analysis as the core issues. Data summarization, both quantitative and categorical, is treated within the encoder-decoder paradigm, bringing forward a number of mathematically supported insights into the methods and the relations between them. Two chapters describe methods for categorical summarization (partitioning, divisive clustering, and separate cluster finding), and another explains the methods for quantitative summarization, Principal Component Analysis and PageRank.

Features:
· An in-depth presentation of K-means partitioning, including a corresponding Pythagorean decomposition of the data scatter.
· Advice regarding such issues as clustering of categorical and mixed scale data, similarity and network data, interpretation aids, anomalous clusters, the number of clusters, etc.
· Thorough attention to data-driven modelling, including a number of mathematically stated relations between statistical and geometrical concepts, such as those between goodness-of-fit criteria for decision trees and data standardization, similarity and consensus clustering, modularity clustering and uniform partitioning.

New edition highlights:
· Inclusion of ranking issues such as Google PageRank, linear stratification and tied rankings median, consensus clustering, semi-average clustering, one-cluster clustering.
· Restructured to make the logic more straightforward and the sections self-contained.

Core Data Analysis: Summarization, Correlation and Visualization is aimed at those who are eager to participate in developing the field, as well as at novices and practitioners.




Core Concepts in Data Analysis: Summarization, Correlation and Visualization


Book Description

Core Concepts in Data Analysis: Summarization, Correlation and Visualization provides in-depth descriptions of those data analysis approaches that either summarize data (principal component analysis and clustering, including hierarchical and network clustering) or correlate different aspects of data (decision trees, linear rules, neural networks, and Bayes rule). Boris Mirkin takes an unconventional approach and introduces the concept of multivariate data summarization as a counterpart to conventional machine learning prediction schemes, utilizing techniques from statistics, data analysis, data mining, machine learning, computational intelligence, and information retrieval. Innovations following from his in-depth analysis of the models underlying summarization techniques are introduced and applied to challenging issues such as the number of clusters, mixed scale data standardization, and interpretation of the solutions, as well as to relations between seemingly unrelated concepts: goodness-of-fit functions for classification trees and data standardization, spectral clustering and additive clustering, correlation and visualization of contingency data. The mathematical detail is encapsulated in the so-called “formulation” parts, whereas most material is delivered through “presentation” parts that explain the methods by applying them to small real-world data sets; concise “computation” parts cover the algorithmic and coding issues. Four layers of active learning and self-study exercises are provided: worked examples, case studies, projects and questions.




Applications of Artificial Intelligence in COVID-19


Book Description

The book examines the role of artificial intelligence during the COVID-19 pandemic, including its application in i) early warnings and alerts, ii) tracking and prediction, iii) data dashboards, iv) diagnosis and prognosis, v) treatments and cures, and vi) social control. It explores the use of artificial intelligence in the context of population screening and assessing infection risks, and presents mathematical models for epidemic prediction of COVID-19. Furthermore, the book discusses artificial intelligence-mediated diagnosis, and how machine learning can help in the development of drugs to treat the disease. Lastly, it analyzes various artificial intelligence-based models to improve the critical care of COVID-19 patients.




Scientific Data Analysis with R


Book Description

In an era marked by exponential growth in data generation and an unprecedented convergence of technology and healthcare, the intersection of biostatistics and data science has become a pivotal domain. This book is a companion for navigating the convergence of statistical methodologies and data science techniques, with diverse applications implemented in the open-source environment of R. It is designed to be a comprehensive guide, marrying the principles of biostatistics with the practical implementation of statistics and data science in R, thereby empowering learners, researchers, and practitioners with the tools necessary to extract meaningful knowledge from biological, health, and medical datasets. This book is intended for students, researchers, and professionals eager to harness the combined power of biostatistics, data science, and the R programming language while gaining the statistical knowledge needed by scientists in all fields. It is useful for those seeking to understand the basics of data science and statistical analysis, or looking to enhance their skills in handling simple or complex data, including biological, health, medical, and industry data.

Key Features:
· Presents contemporary concepts of data science and biostatistics with real-life data analysis examples
· Promotes the evolution of fundamental and advanced methods applied to real-life problem-solving cases
· Explores computational statistical data science techniques from initial conception to recent developments of biostatistics
· Provides all R code and real-world datasets for practice and for application in the reader’s own domain
· Written in a deductive approach, without theoretical hitches, to support all contemporary readers




Data Analysis and Optimization


Book Description

This book presents the state of the art in the emerging field of data science and includes models for layered security with applications in the protection of sites (such as large gathering places) through high-stakes decision-making tasks. Such tasks include cancer diagnostics, self-driving cars, and others where wrong decisions can have catastrophic consequences. Additionally, this book provides readers with automated methods to analyze patterns and models for various types of data, with applications ranging from scientific discovery to business intelligence and analytics. The book primarily includes exploratory data analysis, pattern mining, clustering, and classification, supported by real-life case studies. The statistical section of this book explores the impact of data mining and modeling on the predictability assessment of time series. Furthermore, new notions of mean values based on ideas of multi-criteria optimization are compared with their conventional definitions, leading to new algorithmic approaches to the calculation of the suggested new means. The style of the written chapters and the provision of a broad yet in-depth overview of data mining, integrating novel concepts from machine learning and statistics, make the book accessible to upper-level undergraduate and graduate students in data mining courses. Students and professionals specializing in computer and management science, data mining for high-dimensional data, and complex graphs and networks will benefit from the cutting-edge ideas and practically motivated case studies in this book.




Database and Expert Systems Applications


Book Description

The two volumes LNCS 12391-12392 constitute the papers of the 31st International Conference on Database and Expert Systems Applications, DEXA 2020, which was held online in September 2020. The 38 full papers, presented together with 20 short papers and 1 keynote paper in these volumes, were carefully reviewed and selected from a total of 190 submissions.




Intelligent Data Engineering and Automated Learning – IDEAL 2019


Book Description

This two-volume set of LNCS 11871 and 11872 constitutes the thoroughly refereed conference proceedings of the 20th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2019, held in Manchester, UK, in November 2019. The 94 full papers presented were carefully reviewed and selected from 149 submissions. These papers provide a timely sample of the latest advances in data engineering and machine learning, from methodologies, frameworks, and algorithms to applications. The core themes of IDEAL 2019 include big data challenges, machine learning, data mining, information retrieval and management, bio-/neuro-informatics, bio-inspired models (including neural networks, evolutionary computation and swarm intelligence), agents and hybrid intelligent systems, real-world applications of intelligent techniques and AI.







Clusters, Orders, and Trees: Methods and Applications


Book Description

The volume is dedicated to Boris Mirkin on the occasion of his 70th birthday. In addition to his startling PhD results in abstract automata theory, Mirkin’s groundbreaking contributions in various fields of decision making and data analysis have marked the fourth quarter of the 20th century and beyond. Mirkin has done pioneering work in group choice, clustering, data mining and knowledge discovery aimed at finding and describing non-trivial or hidden structures (first of all, clusters, orderings and hierarchies) in multivariate and/or network data. This volume contains a collection of papers reflecting recent developments rooted in Mirkin’s fundamental contribution to the state of the art in group choice, ordering, clustering, data mining and knowledge discovery. Researchers, students and software engineers will benefit from new knowledge discovery techniques and application directions.