Computing with Data


Book Description

This book introduces basic computing skills designed for industry professionals without a strong computer science background. Written in an easily accessible manner, and accompanied by a user-friendly website, it serves as a self-study guide to survey data science and data engineering for those who aspire to start a computing career, or expand on their current roles, in areas such as applied statistics, big data, machine learning, data mining, and informatics. The authors draw from their combined experience working at software and social network companies, on big data products at several major online retailers, as well as their experience building big data systems for an AI startup. Spanning from the basic inner workings of a computer to advanced data manipulation techniques, this book opens doors for readers to quickly explore and enhance their computing knowledge. Computing with Data comprises a wide range of computational topics essential for data scientists, analysts, and engineers, providing them with the necessary tools to be successful in any role that involves computing with data. The introduction is self-contained, and chapters progress from basic hardware concepts to operating systems, programming languages, graphing and processing data, testing and programming tools, big data frameworks, and cloud computing. The book is fashioned with several audiences in mind. Readers without a strong educational background in CS--or those who need a refresher--will find the chapters on hardware, operating systems, and programming languages particularly useful. Readers with a strong educational background in CS, but without significant industry background, will find the following chapters especially beneficial: learning R, testing, programming, visualizing and processing data in Python and R, system design for big data, data stores, and software craftsmanship.




Parallel Computing for Data Science


Book Description

This is one of the first parallel computing books to focus exclusively on parallel data structures, algorithms, software tools, and applications in data science. The book prepares readers to write effective parallel code in various languages and learn more about different R packages and other tools. It covers the classic n observations, p variables matrix format and common data structures. Many examples illustrate the range of issues encountered in parallel programming.




Advances in Computing and Data Sciences


Book Description

This book constitutes the post-conference proceedings of the 4th International Conference on Advances in Computing and Data Sciences, ICACDS 2020, held in Valletta, Malta, in April 2020.* The 46 full papers were carefully reviewed and selected from 354 submissions. The papers are centered around topics such as advanced computing, data sciences, distributed systems organizing principles, development frameworks and environments, software verification and validation, computational complexity and cryptography, machine learning theory, database theory, and probabilistic representations. * The conference was held virtually due to the COVID-19 pandemic.




Nature Inspired Computing for Data Science


Book Description

This book discusses the current research and concepts in data science and how these can be addressed using different nature-inspired optimization techniques. Focusing on various data science problems, including classification, clustering, forecasting, and deep learning, it explores how researchers are using nature-inspired optimization techniques to find solutions to these problems in domains such as disease analysis and health care, object recognition, vehicular ad-hoc networking, high-dimensional data analysis, gene expression analysis, microgrids, and deep learning. As such, it provides insights and inspiration for researchers wanting to employ nature-inspired optimization techniques in their own endeavors.




Soft Computing in Data Science


Book Description

This book constitutes the refereed proceedings of the 6th International Conference on Soft Computing in Data Science, SCDS 2021, which was held virtually in November 2021. The 31 revised full papers presented were carefully reviewed and selected from 79 submissions. The papers are organized in topical sections on AI techniques and applications; data analytics and technologies; data mining and image processing; machine & statistical learning.




Data Science and Big Data Computing


Book Description

This illuminating text/reference surveys the state of the art in data science, and provides practical guidance on big data analytics. Expert perspectives are provided by authoritative researchers and practitioners from around the world, discussing research developments and emerging trends, presenting case studies on helpful frameworks and innovative methodologies, and suggesting best practices for efficient and effective data analytics. Features: reviews a framework for fast data applications, a technique for complex event processing, and agglomerative approaches for the partitioning of networks; introduces a unified approach to data modeling and management, and a distributed computing perspective on interfacing physical and cyber worlds; presents techniques for machine learning for big data, and identifying duplicate records in data repositories; examines enabling technologies and tools for data mining; proposes frameworks for data extraction, and adaptive decision making and social media analysis.




Human-Centered Data Science


Book Description

Best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of large datasets. Human-centered data science is a new interdisciplinary field that draws from human-computer interaction, social science, statistics, and computational techniques. This book, written by founders of the field, introduces best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of very large datasets. It offers a brief and accessible overview of many common statistical and algorithmic data science techniques, explains human-centered approaches to data science problems, and presents practical guidelines and real-world case studies to help readers apply these methods. The authors explain how data scientists’ choices are involved at every stage of the data science workflow—and show how a human-centered approach can enhance each one, by making the process more transparent, asking questions, and considering the social context of the data. They describe how tools from social science might be incorporated into data science practices, discuss different types of collaboration, and consider data storytelling through visualization. The book shows that data science practitioners can build rigorous and ethical algorithms and design projects that use cutting-edge computational tools and address social concerns.




Advanced Soft Computing Techniques in Data Science, IoT and Cloud Computing


Book Description

This book plays a significant role in improving human life. The new applications of soft computing can be regarded as an emerging field in computer science, automatic control engineering, medicine, biology, natural environmental engineering, and pattern recognition. The exemplar model for soft computing is the human brain. Soft computing techniques are now successfully implemented in many domestic, commercial, and industrial applications, owing to low-cost, high-performance digital processors and the declining price of memory chips. This is the main reason behind the wider expansion of soft computing techniques and their application areas. These computing methods also play a significant role in design and optimization across diverse engineering disciplines. With the influence and development of the Internet of Things (IoT) concept, the need for soft computing techniques has become more significant than ever. In general, soft computing methods resemble biological processes more closely than do traditional techniques, which are mostly based on formal logical systems, such as sentential logic and predicate logic, or rely heavily on computer-aided numerical analysis. Soft computing techniques are anticipated to complement each other. The aim of these techniques is to accept imprecision, uncertainty, and approximation in order to reach a solution rapidly. Recent advancements in representative soft computing algorithms (fuzzy logic, evolutionary computation, machine learning, and probabilistic reasoning) yield more intelligent and robust systems that provide human-interpretable, low-cost, approximate solutions. Soft computing-based algorithms have demonstrated great performance in a variety of areas, including multimedia retrieval, fault tolerance, system modelling, network architecture, Web semantics, big data analytics, time series, and biomedical and health informatics.
Soft computing approaches such as genetic programming (GP), support vector machine–firefly algorithm (SVM-FFA), artificial neural network (ANN), and support vector machine–wavelet (SVM–Wavelet) have emerged as powerful computational models. These have also shown significant success in massive data analysis for a large number of applications. Researchers and practitioners working in the fields of computer engineering, medicine, biology, signal processing, and mechanical engineering will benefit greatly. This book is a collection of state-of-the-art approaches for soft computing-based applications in various engineering fields. It helps new researchers and practitioners in the field quickly identify the best-performing methods. They will be able to compare different approaches and carry their research forward in an important area that has a direct impact on the betterment of human life and health. The book is useful because no other book on the market provides a good collection of state-of-the-art soft computing-based models for multimedia retrieval, fault tolerance, system modelling, network architecture, Web semantics, big data analytics, time series, and biomedical and health informatics.




The Data Science Design Manual


Book Description

This engaging and clearly written textbook/reference provides a must-have introduction to the rapidly emerging interdisciplinary field of data science. It focuses on the principles fundamental to becoming a good data scientist and the key skills needed to build systems for collecting, analyzing, and interpreting data. The Data Science Design Manual is a source of practical insights that highlights what really matters in analyzing data, and provides an intuitive understanding of how these core concepts can be used. The book does not emphasize any particular programming language or suite of data-analysis tools, focusing instead on high-level discussion of important design principles. This easy-to-read text ideally serves the needs of undergraduate and early graduate students embarking on an “Introduction to Data Science” course. It reveals how this discipline sits at the intersection of statistics, computer science, and machine learning, with a distinct heft and character of its own. Practitioners in these and related fields will find this book perfect for self-study as well. Additional learning tools: contains “War Stories,” offering perspectives on how data science applies in the real world; includes “Homework Problems,” providing a wide range of exercises and projects for self-study; provides a complete set of lecture slides and online video lectures at www.data-manual.com; provides “Take-Home Lessons,” emphasizing the big-picture concepts to learn from each chapter; recommends exciting “Kaggle Challenges” from the online platform Kaggle; highlights “False Starts,” revealing the subtle reasons why certain approaches fail; offers examples taken from the data science television show “The Quant Shop” (www.quant-shop.com).




Data Science


Book Description

A concise introduction to the emerging field of data science, explaining its evolution, relation to machine learning, current uses, data infrastructure issues, and ethical challenges. The goal of data science is to improve decision making through the analysis of data. Today data science determines the ads we see online, the books and movies that are recommended to us online, which emails are filtered into our spam folders, and even how much we pay for health insurance. This volume in the MIT Press Essential Knowledge series offers a concise introduction to the emerging field of data science, explaining its evolution, current uses, data infrastructure issues, and ethical challenges. It has never been easier for organizations to gather, store, and process data. Use of data science is driven by the rise of big data and social media, the development of high-performance computing, and the emergence of such powerful methods for data analysis and modeling as deep learning. Data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large datasets. It is closely related to the fields of data mining and machine learning, but broader in scope. This book offers a brief history of the field, introduces fundamental data concepts, and describes the stages in a data science project. It considers data infrastructure and the challenges posed by integrating data from multiple sources, introduces the basics of machine learning, and discusses how to link machine learning expertise with real-world problems. The book also reviews ethical and legal issues, developments in data regulation, and computational approaches to preserving privacy. Finally, it considers the future impact of data science and offers principles for success in data science projects.