Frequent Pattern Mining


Book Description

This comprehensive reference consists of 18 chapters from prominent researchers in the field. Each chapter is self-contained, and synthesizes one aspect of frequent pattern mining. An emphasis is placed on simplifying the content, so that students and practitioners can benefit from the book. Each chapter contains a survey describing key research on the topic, a case study and future directions. Key topics include: Pattern Growth Methods, Frequent Pattern Mining in Data Streams, Mining Graph Patterns, Big Data Frequent Pattern Mining, Algorithms for Data Clustering and more. Advanced-level students in computer science, researchers and practitioners from industry will find this book an invaluable reference.




Periodic Pattern Mining


Book Description

This book provides an introduction to the field of periodic pattern mining, reviews state-of-the-art techniques, discusses recent advances, and reviews open-source software. Periodic pattern mining is a popular and emerging research area in the field of data mining. It involves discovering all regularly occurring patterns in temporal databases. One of the major applications of periodic pattern mining is the analysis of customer transaction databases to discover sets of items that have been regularly purchased by customers. Discovering such patterns has several implications for understanding the behavior of customers. Since the first work on periodic pattern mining, numerous studies have been published and great advances have been made in this field. The book consists of three main parts: introduction, algorithms, and applications. The first chapter is an introduction to pattern mining and periodic pattern mining. The concepts of periodicity, periodic support, search space exploration techniques, and pruning strategies are discussed. The main types of algorithms are also presented, such as periodic-frequent pattern growth, partial periodic pattern-growth, and periodic high-utility itemset mining algorithms. Challenges and research opportunities are reviewed. The chapters that follow present state-of-the-art techniques for discovering periodic patterns in (1) transactional databases, (2) temporal databases, (3) quantitative temporal databases, and (4) big data. Then, the theory on concise representations of periodic patterns is presented, as well as hiding sensitive information using privacy-preserving data mining techniques. The book concludes with several applications of periodic pattern mining, including applications in air pollution data analytics, accident data analytics, and traffic congestion analytics.
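As a rough illustration of the periodicity and support concepts mentioned in this description, here is a minimal Python sketch. It assumes one common definition from the periodic-frequent pattern literature: a pattern's periods are the gaps between its consecutive occurrences, counting the gap from time 0 to the first occurrence and from the last occurrence to the final timestamp in the database. The function names, the `min_sup` and `max_per` thresholds, and the toy database are illustrative assumptions, not taken from the book.

```python
def periodic_stats(database, pattern):
    """Return (support, max_period) of `pattern` in a temporal database.

    `database` is a list of (timestamp, items) pairs; timestamps are
    assumed to be positive numbers. This layout is an assumption made
    for the sketch.
    """
    if not database:
        return 0, 0
    pattern = set(pattern)
    # Timestamps of the transactions that contain every item of the pattern.
    occurrences = sorted(ts for ts, items in database if pattern <= set(items))
    last_ts = max(ts for ts, _ in database)

    previous = 0      # periods are measured starting from time 0
    max_period = 0
    for ts in occurrences:
        max_period = max(max_period, ts - previous)
        previous = ts
    # The gap between the last occurrence and the end of the database counts too.
    max_period = max(max_period, last_ts - previous)
    return len(occurrences), max_period


def is_periodic_frequent(database, pattern, min_sup, max_per):
    """True if the pattern occurs often enough and regularly enough."""
    support, max_period = periodic_stats(database, pattern)
    return support >= min_sup and max_period <= max_per


# Toy customer-transaction database: (timestamp, purchased items).
db = [
    (1, {"a", "b"}),
    (2, {"b"}),
    (3, {"a", "b", "c"}),
    (5, {"a", "b"}),
    (6, {"c"}),
    (7, {"a", "b"}),
]
```

With this toy data, the itemset {a, b} appears at timestamps 1, 3, 5, and 7, so it has a support of 4 and never goes more than 2 time units without occurring. Real algorithms avoid testing every candidate this way and prune the search space instead.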




Efficient Frequent Pattern Mining from Big Data and Its Applications


Book Description

Frequent pattern mining is an important research area in data mining. Since its introduction, it has drawn the attention of many researchers. Consequently, many algorithms have been proposed. Popular algorithms include level-wise Apriori-based algorithms, tree-based algorithms, and hyperlinked array structure-based algorithms. While these algorithms are popular and beneficial due to some nice properties, they also suffer from some drawbacks such as multiple database scans, recursive tree constructions, or multiple hyperlink adjustments. In the current era of big data, high volumes of a wide variety of valuable data of different veracities can be easily collected or generated at high velocity in various real-life applications. Among these 5V's of big data, I focus on handling high volumes of big data in my Ph.D. thesis. Specifically, I design and implement a new efficient frequent pattern mining algorithmic technique called B-mine, which overcomes some of the aforementioned drawbacks and achieves better performance when compared with existing algorithms. I also extend my B-mine algorithm into a family of algorithms that can perform big data mining efficiently. Moreover, I design four different frameworks that apply this family of algorithms to the real-life application of social network mining. Evaluation results show the efficiency and practicality of all these algorithms.




Improving Time and Space Efficiency of Trie Data Structure


Book Description

The trie, or prefix tree [2], is a data structure that has long been used in applications such as prefix matching, auto-complete suggestions, and IP routing tables. What makes tries even more interesting is that their time complexity depends on the length of the keys inserted or searched, rather than on the total number of keys in the data structure. Tries are also strong contenders against hash tables in various applications for two reasons: their almost deterministic time complexity based on average key length, especially when storing a large number of short keys, and their support for range queries. IP routing tables are one example of an application that chooses tries over hash tables. But even with all these benefits, tries have largely remained unused in many potential candidate applications, for example in database indexing, due to their space consumption. The number of pointers used in a trie causes its space consumption to be much higher than that of many other data structures, such as B+ trees. Another issue we observed with tries is that even though their time complexity can be far lower than that of some other data structures for short keys, it can be considerably higher for longer keys. Insertion in a trie can also be a repetitive operation for many nodes if the keys are repetitive or share many common prefixes, adding to the execution overhead. With this in mind, we propose two optimizations of the trie data structure to address the time and space complexity issues. In the first optimization, we present a system that reduces the time for inserts in the trie data structure by up to 50% for some workloads by tweaking the algorithm. In the second optimization, we developed a new version of the trie data structure that takes inspiration from B+ trees, allowing us not only to reduce the space consumption of tries but also to support features such as efficient range search.
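To make the key-length property concrete, here is a minimal sketch of a plain, unoptimized trie in Python. It is a textbook baseline, not either of the optimizations proposed in this work, and the class and method names are illustrative assumptions. Insert and lookup each take one step per character of the key, regardless of how many keys are stored, and every node holds its own child map, which hints at the pointer overhead described above.

```python
class TrieNode:
    """One node per character; `children` maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_key = False  # True if a stored key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        # One step per character: cost depends on len(key), not on key count.
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def _walk(self, text):
        # Follow `text` from the root; return None if the path does not exist.
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, key):
        node = self._walk(key)
        return node is not None and node.is_key

    def keys_with_prefix(self, prefix):
        # Prefix matching: find the prefix node, then collect keys below it.
        start = self._walk(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)


trie = Trie()
for word in ["car", "cart", "cat", "dog"]:
    trie.insert(word)
```

Inserting "car" and then "cart" walks the same three nodes twice, which is the kind of repeated work on shared prefixes that the description identifies as insertion overhead.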




High-Utility Pattern Mining


Book Description

This book presents an overview of techniques for discovering high-utility patterns (patterns with a high importance) in data. It introduces the main types of high-utility patterns, as well as the theory and core algorithms for high-utility pattern mining, and describes recent advances, applications, open-source software, and research opportunities. It also discusses several types of discrete data, including customer transaction data and sequential data. The book consists of twelve chapters, seven of which are surveys presenting the main subfields of high-utility pattern mining, including itemset mining, sequential pattern mining, big data pattern mining, metaheuristic-based approaches, privacy-preserving pattern mining, and pattern visualization. The remaining five chapters describe key techniques and applications, such as discovering concise representations and regular patterns.




Data Mining


Book Description

This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues. It goes beyond the traditional focus on data mining problems to introduce advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. Until now, no single book has addressed all these topics in a comprehensive and integrated way. The chapters of this book fall into one of three categories: Fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems. Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data. Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor. Appropriate for both introductory and advanced data mining courses, Data Mining: The Textbook balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners (including those with a limited mathematical background). Numerous illustrations, examples, and exercises are included, with an emphasis on semantically interpretable examples. Praise for Data Mining: The Textbook - "As I read through this book, I have already decided to use it in my classes. This is a book written by an outstanding researcher who has made fundamental contributions to data mining, in a way that is both accessible and up to date. The book is complete with theory and practical use cases. It’s a must-have for students and professors alike!" -- Qiang Yang, Chair of Computer Science and Engineering at Hong Kong University of Science and Technology "This is the most amazing and comprehensive textbook on data mining. It covers not only the fundamental problems, such as clustering, classification, outliers and frequent patterns, and different data types, including text, time series, sequences, spatial data and graphs, but also various applications, such as recommenders, Web, social network and privacy. It is a great book for graduate students and researchers as well as practitioners." -- Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology at University of Illinois at Chicago




Handbook of Data Structures and Applications


Book Description

The Handbook of Data Structures and Applications was first published over a decade ago. This second edition aims to update the first by focusing on areas of research in data structures that have seen significant progress. While the discipline of data structures has not matured as rapidly as other areas of computer science, the book aims to update those areas that have seen advances. Retaining the seven-part structure of the first edition, the handbook begins with a review of introductory material, followed by a discussion of well-known classes of data structures: priority queues, dictionary structures, and multidimensional structures. The editors next analyze miscellaneous data structures, which are well-known structures that elude easy classification. The book then addresses mechanisms and tools that were developed to facilitate the use of data structures in real programs. It concludes with an examination of the applications of data structures. Four new chapters have been added on Bloom Filters, Binary Decision Diagrams, Data Structures for Cheminformatics, and Data Structures for Big Data Stores, and updates have been made to other chapters that appeared in the first edition. The Handbook is invaluable for suggesting new ideas for research in data structures, and for revealing application contexts in which they can be deployed. Practitioners devising algorithms will gain insight into organizing data, allowing them to solve algorithmic problems more efficiently.




Large-Scale Parallel Data Mining


Book Description

With the unprecedented growth rate at which data is being collected and stored electronically today in almost all fields of human endeavor, the efficient extraction of useful information from the available data is becoming an increasing scientific challenge and a massive economic need. This book presents thoroughly reviewed and revised full versions of papers presented at a workshop on the topic held during KDD'99 in San Diego, California, USA, in August 1999, complemented by several invited chapters and a detailed introductory survey in order to provide complete coverage of the relevant issues. The contributions presented cover all major tasks in data mining, including parallel and distributed mining frameworks, associations, sequences, clustering, and classification. All in all, the volume presents the state of the art in the young and dynamic field of parallel and distributed data mining methods. It will be a valuable source of reference for researchers and professionals.




Advanced Computer Science Applications


Book Description

This new book brings together the most recent trends related to AI, machine learning, and network security. The chapters cover diverse topics on machine learning algorithms and security analytics, AI and machine learning, and network security applications. The volume presents a survey of speculative parallelism techniques, performance reviews, and efficient power consumption. The book also covers the concepts of IoT, security early detection for COVID-19, multimetric geographical routing in VANETs, V2X communication in VANETs, and optimization of congestion control schemes for VANETs. This book is a comprehensive take on recent applications and advancements in the field of computer science and will be of value to scientists, researchers, faculty, and students involved in research in the area of AI, machine learning, and network security.