A Course in In-Memory Data Management


Book Description

Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of multiple terabytes per server, enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data. Professor Hasso Plattner and his research group at the Hasso Plattner Institute in Potsdam, Germany, have been investigating and teaching the corresponding concepts and their adoption in the software industry for years. This book is based on an online course that was first launched in autumn 2012 with more than 13,000 enrolled students and marked the successful starting point of the openHPI e-learning platform. The course is mainly designed for students of computer science, software engineering, and IT related subjects, but addresses business experts, software developers, technology experts, and IT analysts alike. Plattner and his group focus on exploring the inner mechanics of a column-oriented dictionary-encoded in-memory database. Covered topics include - amongst others - physical data storage and access, basic database operators, compression mechanisms, and parallel join algorithms. Beyond that, implications for future enterprise applications and their development are discussed. Step by step, readers will understand the radical differences and advantages of the new technology over traditional row-oriented, disk-based databases. In this completely revised 2nd edition, we incorporate the feedback of thousands of course participants on openHPI and take into account latest advancements in hard- and software. Improved figures, explanations, and examples further ease the understanding of the concepts presented. We introduce advanced data management techniques such as transparent aggregate caches and provide new showcases that demonstrate the potential of in-memory databases for two diverse industries: retail and life sciences.




In-Memory Data Management


Book Description

In the last 50 years the world has been completely transformed through the use of IT. We have now reached a new inflection point. Here we present, for the first time, how in-memory computing is changing the way businesses are run. Today, enterprise data is split into separate databases for performance reasons. Analytical data resides in warehouses, synchronized periodically with transactional systems. This separation makes flexible, real-time reporting on current data impossible. Multi-core CPUs, large main memories, cloud computing and powerful mobile devices are serving as the foundation for the transition of enterprises away from this restrictive model. We describe techniques that allow analytical and transactional processing at the speed of thought and enable new ways of doing business. The book is intended for university students, IT-professionals and IT-managers, but also for senior management who wish to create new business processes by leveraging in-memory computing.




In-Memory Data Management


Book Description

In the last fifty years the world has been completely transformed through the use of IT. We have now reached a new inflection point. This book presents, for the first time, how in-memory data management is changing the way businesses are run. Today, enterprise data is split into separate databases for performance reasons. Multi-core CPUs, large main memories, cloud computing and powerful mobile devices are serving as the foundation for the transition of enterprises away from this restrictive model. This book provides the technical foundation for processing combined transactional and analytical operations in the same database. In the year since we published the first edition of this book, the performance gains enabled by the use of in-memory technology in enterprise applications has truly marked an inflection point in the market. The new content in this second edition focuses on the development of these in-memory enterprise applications, showing how they leverage the capabilities of in-memory technology. The book is intended for university students, IT-professionals and IT-managers, but also for senior management who wish to create new business processes.




Building a Columnar Database on RAMCloud


Book Description

This book examines the field of parallel database management systems and illustrates the great variety of solutions based on a shared-storage or a shared-nothing architecture. Constantly dropping memory prices and the desire to operate with low-latency responses on large sets of data paved the way for main memory-based parallel database management systems. However, this area is currently dominated by the shared-nothing approach in order to preserve the in-memory performance advantage by processing data locally on each server. The main argument this book makes is that such an unilateral development will cease due to the combination of the following three trends: a) Today’s network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory on a server and of a remote server to and even below a single order of magnitude. b) Modern storage systems scale gracefully, are elastic and provide high-availability. c) A modern storage system such as Stanford’s RAM Cloud even keeps all data resident in the main memory. Exploiting these characteristics in the context of a main memory-based parallel database management system is desirable. The book demonstrates that the advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible.




C++ Pointers and Dynamic Memory Management


Book Description

Using techniques developed in the classroom at America Online's Programmer's University, Michael Daconta deftly pilots programmers through the intricacies of the two most difficult aspects of C++ programming: pointers and dynamic memory management. Written by a programmer for programmers, this no-nonsense, nuts-and-bolts guide shows you how to fully exploit advanced C++ programming features, such as creating class-specific allocators, understanding references versus pointers, manipulating multidimensional arrays with pointers, and how pointers and dynamic memory are the core of object-oriented constructs like inheritance, name-mangling, and virtual functions. Covers all aspects of pointers including: pointer pointers, function pointers, and even class member pointers Over 350 source code functions—code on every topic OOP constructs dissected and implemented in C Interviews with leading C++ experts Valuable money-saving coupons on developer products Free source code disk Disk includes: Reusable code libraries—over 350 source code functions you can use to protect and enhance your applications Memory debugger Read C++ Pointers and Dynamic Memory Management and learn how to combine the elegance of object-oriented programming with the power of pointers and dynamic memory!




Pro .NET Memory Management


Book Description

Understand .NET memory management internal workings, pitfalls, and techniques in order to effectively avoid a wide range of performance and scalability problems in your software. Despite automatic memory management in .NET, there are many advantages to be found in understanding how .NET memory works and how you can best write software that interacts with it efficiently and effectively. Pro .NET Memory Management is your comprehensive guide to writing better software by understanding and working with memory management in .NET. Thoroughly vetted by the .NET Team at Microsoft, this book contains 25 valuable troubleshooting scenarios designed to help diagnose challenging memory problems. Readers will also benefit from a multitude of .NET memory management “rules” to live by that introduce methods for writing memory-aware code and the means for avoiding common, destructive pitfalls. What You'll LearnUnderstand the theoretical underpinnings of automatic memory management Take a deep dive into every aspect of .NET memory management, including detailed coverage of garbage collection (GC) implementation, that would otherwise take years of experience to acquire Get practical advice on how this knowledge can be applied in real-world software development Use practical knowledge of tools related to .NET memory management to diagnose various memory-related issuesExplore various aspects of advanced memory management, including use of Span and Memory types Who This Book Is For .NET developers, solution architects, and performance engineers




Data Analytics with Hadoop


Book Description

Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib




Database Internals


Book Description

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines: Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency




Patterns in Data Management


Book Description

This book is not a standard textbook. This book was written extending and complementing preexisting educational videos I designed and recorded in winter 2013/14. The main goal of these videos was to use them in my flipped classroom "Database Systems" which is an intermediate-level university course designed for B.Sc. students in their third year or M.Sc. students of computer science and related disciplines. Though in general my students liked both the flipped classroom model and (most of) the videos, several students asked for an additional written script that would allow them to quickly lookup explanations for material in text that would otherwise be hard to re-find in the videos. Therefore, in spring 2015, I started working on such a course script which more and more evolved into something that I feel comfortable calling it a book. One central question I had to confront was: would I repeat all material from the videos in the textbook? In other words, would the book be designed to work without the videos? I quickly realized that writing such an old-fashioned text-oriented book, a "textbook", wouldn't be the appropriate thing to do anymore in 2015. My videos as well as the accompanying material are freely available to everyone anyways. And unless you are sitting on the local train from Saarbr�cken to Neustadt, you will almost always have Internet access to watch them. In fact, downloading the videos in advance isn't terribly hard anyway. This observation changed the original purpose of what this book would be good for: not so much the primary source of the course's content, but a different view on that content, explaining that content where possible in other words. In addition, one goal was to be concise in the textual explanations allowing you to quickly re-find and remember things you learned from the videos without going through a large body of text.




NoSQL Distilled


Book Description

'NoSQL Distilled' is designed to provide you with enough background on how NoSQL databases work, so that you can choose the right data store without having to trawl the whole web to do it. It won't answer your questions definitively, but it should narrow down the range of options you have to consider.