Making Databases Work


Book Description

This book celebrates Michael Stonebraker's accomplishments that led to his 2014 ACM A.M. Turing Award "for fundamental contributions to the concepts and practices underlying modern database systems." The book describes, for the broad computing community, the unique nature, significance, and impact of Mike's achievements in advancing modern database systems over more than forty years. Today, data is considered the world's most valuable resource, whether it is in the tens of millions of databases used to manage the world's businesses and governments, in the billions of databases in our smartphones and watches, or residing elsewhere, as yet unmanaged, awaiting the elusive next generation of database systems. Every one of those millions or billions of databases includes features celebrated by the 2014 Turing Award and described in this book. Why should I care about databases? What is a database? What is data management? What is a database management system (DBMS)? These are just some of the questions that this book answers as it describes the development of data management through the achievements of Mike Stonebraker and his more than 200 collaborators. In reading the stories in this book, you will discover core data management concepts that were developed over the two greatest eras (so far) of data management technology. The book is a collection of 36 stories written by Mike and 38 of his collaborators: 23 world-leading database researchers, 11 world-class systems engineers, and 4 business partners. If you are an aspiring researcher, engineer, or entrepreneur, you might read these stories for their turning points, as practice for tilting at your own computer-science windmills and as a spur to your next innovation and achievement.




Intelligent Computing for Interactive System Design


Book Description

Intelligent Computing for Interactive System Design provides a comprehensive resource on what has become the dominant paradigm in designing novel interaction methods, involving gestures, speech, text, touch, and brain-controlled interaction, embedded in innovative and emerging human-computer interfaces. These interfaces support ubiquitous interaction with applications and services running on smartphones, wearables, in-vehicle systems, virtual and augmented reality, robotic systems, the Internet of Things (IoT), and many other domains that are now highly competitive, in both commercial and research contexts. This book presents the crucial theoretical foundations needed by any student, researcher, or practitioner working on novel interface design, with chapters on statistical methods, digital signal processing (DSP), and machine learning (ML). These foundations are followed by chapters presenting case studies on smart cities, brain-computer interfaces, probabilistic mobile text entry, secure gestures, personal context from mobile phones, adaptive touch interfaces, and automotive user interfaces. The case-study chapters also provide an in-depth look at the practical application of DSP and ML methods for processing touch, gesture, biometric, or embedded sensor inputs. A common theme throughout the case studies is ubiquitous support for humans in their daily professional or personal activities. In addition, the book provides walk-through examples of different DSP and ML techniques and their use in interactive systems. Common terms are defined, and information on practical resources (e.g., software tools, data resources) is provided for hands-on project work to develop and evaluate multimodal and multi-sensor systems. In a series of in-chapter commentary boxes, an expert on legal and ethical issues explores the professional community's deep emerging concerns about how DSP and ML should be adopted and used in socially appropriate ways, to most effectively advance human performance during ubiquitous interaction with omnipresent computers. This carefully edited collection is written by international experts and pioneers in the fields of DSP and ML. It provides a textbook for students and a reference and technology roadmap for developers and professionals working on interaction design for emerging platforms.




Software


Book Description

Software history has a deep impact on current software designers, computer scientists, and technologists. System constraints imposed in the past and the designs that responded to them are often unknown or poorly understood by students and practitioners, yet modern software systems often include “old” software and “historical” programming techniques. This work looks at software history through specific software areas to develop student-consumable practices, design principles, lessons learned, and trends useful in current and future software design. It also exposes key areas that are widely used in modern software yet infrequently taught in computing programs. Written as a textbook, this book uses specific cases from the past and present to explore the impact of software trends and techniques. Building on concepts from the history of science and technology, it examines such areas as fundamentals, operating systems, programming languages, programming environments, networking, and databases. These topics are covered from their earliest beginnings to their modern variants. There are focused case studies on UNIX, APL, SAGE, GNU Emacs, Autoflow, internet protocols, System R, and others. Extensive problems and suggested projects enable readers to delve deeply into the history of software in the areas that interest them most.




The Continuing Arms Race


Book Description

As human activities moved to the digital domain, so did all the well-known malicious behaviors, including fraud, theft, and other trickery. There is no silver bullet, and each security threat calls for a specific answer. One such threat is that applications accept malformed inputs, and in many cases it is possible to craft inputs that let an intruder take full control of the target computer system. The nature of systems programming languages lies at the heart of the problem. Rather than rewriting decades of well-tested functionality, this book examines ways to live with the (programming) sins of the past while shoring up security in the most efficient manner possible. We explore a range of different options, each making significant progress toward securing legacy programs from malicious inputs. The solutions explored include enforcement-type defenses, which exclude program executions that never arise during normal operation. Another strand explores the idea of presenting adversaries with a moving target whose attack surface changes unpredictably thanks to randomization. We also cover tandem execution ideas, where the compromise of one executing clone causes it to diverge from another, thus revealing adversarial activity. The main purpose of this book is to provide readers with some of the most influential works on run-time exploits and defenses. We hope that the material in this book will inspire readers and generate new ideas and paradigms.
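To make the tandem-execution idea concrete, here is a toy Python analogue of ours, not code from the book: the same untrusted input is fed to two diversified variants, and any divergence between them is treated as evidence of compromise. Real systems diversify at the binary level and compare low-level behavior such as system-call sequences; the two variants below are hypothetical stand-ins.

```python
# A toy analogue of tandem (multi-variant) execution: run one input
# through two variants and flag divergence. The variants here stand in
# for binary-level clones of the same program.
def variant_a(data: bytes) -> int:
    return int(data)  # int() accepts ASCII digit bytes directly

def variant_b(data: bytes) -> int:
    sign = -1 if data.startswith(b"-") else 1
    digits = data[1:] if sign < 0 else data
    value = 0
    for ch in digits:
        value = value * 10 + (ch - ord("0"))
    return sign * value

def monitored_parse(data: bytes) -> int:
    a, b = variant_a(data), variant_b(data)
    if a != b:
        # A real monitor would terminate the process at this point.
        raise RuntimeError("variants diverged: possible exploit")
    return a

print(monitored_parse(b"-42"))  # both variants agree: -42
```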




Concurrency


Book Description

This book is a celebration of Leslie Lamport's work on concurrency, interwoven in four-and-a-half decades of an evolving industry: from the introduction of the first personal computer to an era when parallel and distributed multiprocessors are abundant. His work lays formal foundations for concurrent computations executed by interconnected computers. Some of the algorithms have become standard engineering practice for fault-tolerant distributed computing: distributed systems that continue to function correctly despite failures of individual components. He also developed a substantial body of work on the formal specification and verification of concurrent systems, and has contributed to the development of automated tools applying these methods. Part I consists of the book's technical chapters and a biography. The technical chapters present a retrospective on Lamport's original ideas from experts in the field and, through this lens, portray their long-lasting impact. The chapters cover timeless notions Lamport introduced: the Bakery algorithm; atomic shared registers and sequential consistency; causality and logical time; Byzantine Agreement; state machine replication and Paxos; and the temporal logic of actions (TLA). The professional biography tells of Lamport's career, providing the context in which his work arose and broke new ground, and discusses LaTeX, perhaps Lamport's most influential contribution outside the field of concurrency. This chapter gives a voice to the people behind the achievements, notably Lamport himself, as well as the colleagues around him who inspired him, collaborated with him, and helped him drive worldwide impact. Part II consists of a selection of Leslie Lamport's most influential papers. This book touches on a lifetime of contributions by Leslie Lamport to the field of concurrency and on the extensive influence he had on people working in the field. It will be of value to historians of science, and to researchers and students who work in the area of concurrency and who are interested in reading about the work of one of the most influential researchers in this field.
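For readers new to these notions, the logical-clock idea mentioned above fits in a few lines. The following Python sketch is our illustration, not the book's code; it shows only the two Lamport clock rules: tick on every local event (including a send), and on receipt jump past the sender's timestamp.

```python
# A minimal sketch of Lamport's logical clocks: timestamps that order
# events consistently with causality, without any wall-clock time.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: every local event (including a send) advances the clock.
        self.time += 1
        return self.time

    def receive(self, sender_time):
        # Rule 2: a receive jumps past the sender's timestamp, then ticks.
        self.time = max(self.time, sender_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t_msg = p.tick()          # p sends a message stamped 1
q.tick()                  # an unrelated local event at q, stamped 1
t_rcv = q.receive(t_msg)  # stamped max(1, 1) + 1 = 2, so send < receive
```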




Applied Affective Computing


Book Description

Affective computing is a nascent field situated at the intersection of artificial intelligence with social and behavioral science. It studies how human emotions are perceived and expressed, which then informs the design of intelligent agents and systems that can either mimic this behavior to improve their intelligence or incorporate such knowledge to effectively understand and communicate with their human collaborators. Affective computing research has recently seen significant advances and is making a critical transformation from exploratory studies to real-world applications in the emerging research area known as applied affective computing. This book offers readers an overview of the state-of-the-art and emerging themes in affective computing, including a comprehensive review of the existing approaches to affective computing systems and social signal processing. It provides in-depth case studies of applied affective computing in various domains, such as social robotics and mental well-being. It also addresses ethical concerns related to affective computing and how to prevent misuse of the technology in research and applications. Further, this book identifies future directions for the field and summarizes a set of guidelines for developing next-generation affective computing systems that are effective, safe, and human-centered. For researchers and practitioners new to affective computing, this book will serve as an introduction to the field to help them in identifying new research topics or developing novel applications. For more experienced researchers and practitioners, the discussions in this book provide guidance for adopting a human-centered design and development approach to advance affective computing.




Frontiers of Multimedia Research


Book Description

The field of multimedia is unique in offering a rich and dynamic forum for researchers from “traditional” fields to collaborate and develop new solutions and knowledge that transcend the boundaries of individual disciplines. Despite the prolific research activities and outcomes, however, few efforts have been made to develop books that serve as an introduction to the rich spectrum of topics covered by this broad field. The few books that are available either focus on specific subfields or on basic background in multimedia; tutorial-style materials covering the active topics being pursued by leading researchers at the frontiers of the field are currently lacking. In 2015, ACM SIGMM, the special interest group on multimedia, launched a new initiative to address this void by selecting and inviting 12 rising-star speakers from different subfields of multimedia research to deliver plenary tutorial-style talks at ACM Multimedia 2015. Each speaker discussed the challenges and state-of-the-art developments of their respective research areas in a manner accessible to the broad community. The covered topics were comprehensive, including multimedia content understanding, multimodal human-human and human-computer interaction, multimedia social media, and multimedia system architecture and deployment. Following the very positive responses to these talks, the speakers were invited to expand the content covered in their talks into chapters that can be used as reference material for researchers, students, and practitioners. Each chapter discusses the problems, technical challenges, state-of-the-art approaches and performances, open issues, and promising directions for future work. Collectively, the chapters provide an excellent sampling of major topics addressed by the community as a whole. This book, capturing some of the outcomes of such efforts, is well positioned to fill the aforementioned need by providing tutorial-style reference materials for frontier topics in multimedia.




An Architecture for Fast and General Data Processing on Large Clusters


Book Description

The speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common. And in addition to batch processing, streaming analysis of real-time data is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications too. This book, a revised version of the 2014 ACM Doctoral Dissertation Award-winning dissertation, proposes an architecture for cluster computing systems that can tackle emerging data processing workloads at scale. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping MapReduce's scalability and fault tolerance. And whereas most deployed systems only support simple one-pass computations (e.g., SQL queries), ours also extends to the multi-pass algorithms required for complex analytics like machine learning. Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing. We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using synthetic and real workloads. Spark matches or exceeds the performance of specialized systems in many domains, while offering stronger fault tolerance properties and allowing these workloads to be combined. Finally, we examine the generality of RDDs from both a theoretical modeling perspective and a systems perspective. This version of the dissertation makes corrections throughout the text and adds a new section on the evolution of Apache Spark in industry since 2014. In addition, editing, formatting, and links for the references have been added.
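The data-sharing primitive described above is easiest to see in code. Below is a minimal PySpark sketch of ours (the input file name and the error-counting logic are illustrative assumptions, not examples from the book): an RDD is cached once and then reused across two passes, the access pattern that one-pass MapReduce jobs cannot express efficiently.

```python
# A minimal PySpark sketch of multi-pass computation over a cached RDD.
# The file "events.log" and the ERROR-filtering logic are hypothetical.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# Build an RDD and ask Spark to keep it in memory for reuse.
errors = (sc.textFile("events.log")
            .filter(lambda line: "ERROR" in line)
            .cache())

# Pass 1: count matching records.
total = errors.count()

# Pass 2: reuse the cached data instead of re-reading the input,
# the kind of repeated access that iterative algorithms need.
by_first_token = (errors
                  .map(lambda line: (line.split()[0], 1))
                  .reduceByKey(lambda a, b: a + b)
                  .collect())

print(total, by_first_token)
sc.stop()
```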




Data Cleaning


Book Description

This book is an overview of the end-to-end data cleaning process. Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and incorrect business decisions. Poor data across businesses and the U.S. government is reported to cost trillions of dollars a year. Multiple surveys show that dirty data is the most common barrier faced by data scientists. Not surprisingly, developing effective and efficient data cleaning solutions is challenging and rife with deep theoretical and engineering problems. This book is about data cleaning, a term that covers all the tasks and activities involved in detecting and repairing errors in data. Rather than focus on a particular data cleaning task, this book describes various error detection and repair methods and attempts to anchor these proposals within multiple taxonomies and views. Specifically, it covers four of the most common and important data cleaning tasks: outlier detection, data transformation, error repair (including imputing missing values), and data deduplication. Furthermore, given the increasing popularity and applicability of machine learning techniques, it includes a chapter that explores how machine learning techniques are used for data cleaning, and how data cleaning is used to improve machine learning models. This book is intended to serve as a useful reference for researchers and practitioners interested in the area of data quality and data cleaning. It can also be used as a textbook for a graduate course. Although we aim to cover state-of-the-art algorithms and techniques, we recognize that data cleaning is still an active field of research and therefore point out future research directions wherever appropriate.
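As a taste of one of the four tasks just listed, deduplication, here is a minimal Python sketch using only the standard library. The sample records and the 0.9 similarity threshold are our illustrative assumptions, not methods prescribed by the book.

```python
# A toy deduplication sketch: collapse records whose string similarity
# exceeds a threshold. Real systems add blocking, learned matchers, etc.
from difflib import SequenceMatcher

records = [
    "Jane Smith, 12 Oak St",
    "Jane Smith, 12 Oak Street",  # near-duplicate of the record above
    "John Doe, 3 Elm Rd",
]

def is_match(a, b, threshold=0.9):
    # Normalized edit-based similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

deduped = []
for rec in records:
    if not any(is_match(rec, kept) for kept in deduped):
        deduped.append(rec)

print(deduped)  # the two "Jane Smith" variants collapse into one record
```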




Heterogeneous Computing


Book Description

If you look around, you will find that all computer systems, from portable devices to the strongest supercomputers, are heterogeneous in nature. The most obvious heterogeneity is the existence of computing nodes of different capabilities (e.g., multicore processors, GPUs, and FPGAs), but other forms of heterogeneity exist in computing systems as well, such as in memory-system components and interconnects. The main reason for these different types of heterogeneity is to deliver good performance with power efficiency. Heterogeneous computing brings both challenges and opportunities, and this book discusses both. It shows that we need to deal with these challenges at all levels of the computing stack, from algorithms all the way down to process technology. We discuss the topic of heterogeneous computing from different angles: hardware challenges, the current hardware state of the art, software issues, how to make the best use of current heterogeneous systems, and what lies ahead. The aim of this book is to introduce the big picture of heterogeneous computing. Whether you are a hardware designer or a software developer, you need to know how the pieces of the puzzle fit together. The main goal is to bring researchers and engineers to the frontier of research in an era that started a few years ago and is expected to continue for decades. We believe that academics, researchers, practitioners, and students will benefit from this book and will be prepared to tackle the big wave of heterogeneous computing that is here to stay.
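As a small, hedged illustration of what coping with node-level heterogeneity looks like in software (using PyTorch purely as an example; the book is not tied to any framework), the sketch below selects an accelerator when one is present and falls back to the CPU otherwise.

```python
# An illustrative sketch (ours, not the book's): run the same computation
# on whichever compute node is available, GPU or CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b  # executes on the GPU if present, otherwise on the CPU

print(f"matmul ran on: {c.device}")
```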




Trust and Risk in Internet Commerce


Book Description

This book provides information on trust and risk to businesses that are developing electronic commerce systems, helps consumers understand the risks of using the Internet for purchases, and shows them how to protect themselves.