Cache and Memory Hierarchy Design


Book Description

An authoritative book for hardware and software designers. Caches are by far the simplest and most effective mechanism for improving computer performance. This innovative book exposes the characteristics of performance-optimal single- and multi-level cache hierarchies by approaching the cache design process through the novel perspective of minimizing execution time. It presents useful data on the relative performance of a wide spectrum of machines and offers empirical and analytical evaluations of the underlying phenomena. This book will help computer professionals appreciate the impact of caches and enable designers to maximize performance given particular implementation constraints.
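
To make the book's execution-time perspective concrete, here is a minimal sketch of the standard average-memory-access-time (AMAT) calculation that underlies this kind of analysis; all latencies and miss rates below are invented example values, not figures from the book.

```python
# Hypothetical illustration of the execution-time-centric view the book takes:
# compare two-level hierarchies by average memory access time (AMAT), not hit rate.
# All latencies (in cycles) and miss rates are made-up example values.

def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory latency)."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

# A small, fast L1 with a higher miss rate...
fast_small = amat(l1_hit=1, l1_miss_rate=0.10, l2_hit=10, l2_miss_rate=0.20, mem_latency=100)
# ...versus a larger, slower L1 with a lower miss rate.
big_slow = amat(l1_hit=2, l1_miss_rate=0.06, l2_hit=10, l2_miss_rate=0.20, mem_latency=100)

print(f"fast/small L1: {fast_small:.2f} cycles, big/slow L1: {big_slow:.2f} cycles")
```

With these example numbers the slower but larger L1 wins on AMAT, which is exactly the kind of counterintuitive outcome an execution-time-directed design process surfaces.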




Multi-Core Cache Hierarchies


Book Description

The cache hierarchy is a key determinant of overall system performance and power dissipation, since an access to off-chip memory costs many more cycles, and far more energy, than an on-chip access. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory accesses by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints. The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers. Table of Contents: Basic Elements of Large Cache Design / Organizing Data in CMP Last Level Caches / Policies Impacting Cache Hit Rates / Interconnection Networks within Large Caches / Technology / Concluding Remarks
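
As a flavor of the allocation problems the book surveys, the following toy sketch greedily partitions cache ways across cores by marginal utility. The miss curves are invented numbers; real proposals in this space derive them from hardware utility monitors.

```python
# Toy sketch of utility-based cache way-partitioning across cores.
# miss_curves[core][i] = misses when that core owns i ways (index 0 = no ways);
# all values below are invented for illustration.

def partition_ways(miss_curves, total_ways):
    """Greedily give each way to the core whose miss count drops the most."""
    cores = list(miss_curves)
    alloc = {c: 0 for c in cores}
    for _ in range(total_ways):
        def gain(c):
            # marginal utility of one more way for core c
            return miss_curves[c][alloc[c]] - miss_curves[c][alloc[c] + 1]
        best = max(cores, key=gain)
        alloc[best] += 1
    return alloc

miss_curves = {
    "core0": [1000, 400, 200, 150, 140, 135, 133, 132, 132],  # cache-friendly
    "core1": [900, 880, 860, 850, 845, 842, 840, 839, 839],   # streaming
}
print(partition_ways(miss_curves, total_ways=8))  # the cache-friendly core gets more ways
```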




Exploring Memory Hierarchy Design with Emerging Memory Technologies


Book Description

This book equips readers with tools for architecting high-performance, low-power, and highly reliable memory hierarchies based on emerging memory technologies such as STT-RAM, PCM, and FBDRAM. The techniques described offer the advantages of high density, near-zero static power, and immunity to soft errors, and have the potential to overcome the “memory wall.” The authors discuss memory design from various perspectives: emerging memory technologies are employed in the memory hierarchy with novel architectural modifications; hybrid memory structures are introduced to combine the advantages of multiple memory technologies; an analytical model named “Moguls” is introduced to quantitatively explore the design space of a memory hierarchy; finally, the vulnerability of CMPs to radiation-induced soft errors is reduced by replacing different levels of on-chip memory with STT-RAM.
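
To illustrate the kind of trade-off these technologies introduce, here is a hedged back-of-the-envelope comparison of SRAM versus STT-RAM cache energy; all per-access energies and leakage figures are placeholder values, not data from the book.

```python
# Back-of-the-envelope sketch of the SRAM vs. STT-RAM trade-off the book studies:
# STT-RAM nearly eliminates leakage but pays more energy per write.
# All numbers below are illustrative placeholders, not device data.

def cache_energy(reads, writes, seconds, e_read, e_write, leak_watts):
    """Total energy (J) = dynamic read + dynamic write + static leakage."""
    return reads * e_read + writes * e_write + leak_watts * seconds

reads, writes, seconds = 8e8, 2e8, 1.0
sram = cache_energy(reads, writes, seconds, e_read=0.5e-9, e_write=0.5e-9, leak_watts=0.8)
sttram = cache_energy(reads, writes, seconds, e_read=0.4e-9, e_write=2.0e-9, leak_watts=0.02)

print(f"SRAM: {sram:.3f} J  STT-RAM: {sttram:.3f} J")
# With this read-heavy mix STT-RAM wins; a write-heavy mix can flip the result,
# which is why hybrid structures try to steer writes toward SRAM.
```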




The Fractal Structure of Data Reference


Book Description

The architectural concept of a memory hierarchy has been immensely successful, making possible today's spectacular pace of technology evolution in both the volume of data and the speed of data access. Its success is difficult to understand, however, when examined within the traditional "memoryless" framework of performance analysis. The "memoryless" framework cannot properly reflect a memory hierarchy's ability to take advantage of patterns of data use that are transient. The Fractal Structure of Data Reference: Applications to the Memory Hierarchy both introduces, and justifies empirically, an alternative modeling framework in which arrivals are driven by a statistically self-similar underlying process, and are transient in nature. The substance of this book comes from the ability of the model to impose a mathematically tractable structure on important problems involving the operation and performance of a memory hierarchy. It describes events as they play out at a wide range of time scales, from the operation of file buffers and storage control cache, to a statistical view of entire disk storage applications. Striking insights are obtained about how memory hierarchies work, and how to exploit them to best advantage. The emphasis is on the practical application of such results. The Fractal Structure of Data Reference: Applications to the Memory Hierarchy will be of interest to professionals working in the area of applied computer performance and capacity planning, particularly those with a focus on disk storage. The book is also an excellent reference for those interested in database and data structure research.
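
The book's central contrast can be sketched with a toy experiment: an LRU cache run over a memoryless (i.i.d.) reference stream versus a stream with transient, bursty reuse. The burst generator below is a crude stand-in for the book's self-similar arrival model, not its actual construction.

```python
# Toy illustration of why a memoryless reference model misjudges caches:
# LRU exploits transient reuse that i.i.d. arrivals cannot represent.
import random
from collections import OrderedDict

def lru_hit_ratio(stream, capacity):
    cache, hits = OrderedDict(), 0
    for x in stream:
        if x in cache:
            hits += 1
            cache.move_to_end(x)       # refresh recency
        else:
            cache[x] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(stream)

random.seed(1)
N, length = 10_000, 100_000
memoryless = [random.randrange(N) for _ in range(length)]  # i.i.d. uniform
bursty = []
while len(bursty) < length:
    item = random.randrange(N)                  # each item arrives...
    bursty += [item] * random.randint(1, 20)    # ...in a short transient burst
bursty = bursty[:length]

print(f"memoryless: {lru_hit_ratio(memoryless, 1000):.2%}")  # ~ capacity / N
print(f"bursty:     {lru_hit_ratio(bursty, 1000):.2%}")      # far higher
```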




Redesigning the Memory Hierarchy to Exploit Static and Dynamic Application Information


Book Description

Memory hierarchies are crucial to performance and energy efficiency, but current systems adopt rigid, hardware-managed cache hierarchies that cause needless data movement. The root cause of these inefficiencies is that cache hierarchies adopt a legacy interface and ignore most application information. Specifically, they are structured as a rigid hierarchy of progressively larger and slower cache levels, with sizes and policies fixed at design time. Caches expose a flat address space to programs that hides the hierarchy's structure, and they transparently move data across cache levels in fixed-size blocks using simple, fixed heuristics. Besides squandering valuable application-level information, this design is very costly: providing the illusion of a flat address space requires complex address translation machinery, such as associative lookups in caches. This thesis proposes to redesign the memory hierarchy to better exploit application information. We take a cross-layer approach that redesigns the hardware-software interface to put software in control of the hierarchy and naturally convey application semantics. We focus on two main directions.

First, we design reconfigurable cache hierarchies that exploit dynamic application information to optimize their structure on the fly, approaching the performance of the best application-specific hierarchy. Hardware monitors application memory behavior at low overhead, and a software runtime uses this information to periodically reconfigure the system. This approach enables software to (i) build single- or multi-level virtual cache hierarchies tailored to the needs of each application, making effective use of spatially distributed and heterogeneous (e.g., SRAM and stacked DRAM) cache banks; (ii) replicate shared data near-optimally to minimize on-chip and off-chip traffic; and (iii) schedule computation across systems with heterogeneous hierarchies (e.g., systems with near-data processors). Specializing the memory system to each application improves performance and energy efficiency, since applications can avoid using resources that they do not benefit from and use the remaining resources to hold their data at minimum latency and energy. For example, virtual cache hierarchies improve full-system energy-delay product (EDP) by up to 85% over a combination of state-of-the-art techniques.

Second, we redesign the memory hierarchy to exploit static application information by managing variable-sized objects, the natural unit of data access in programs, instead of fixed-size cache lines. We present the Hotpads object-based hierarchy, which leverages object semantics to hide the memory layout and dispense with the flat address space interface. Similarly to how memory-safe languages abstract the memory layout, Hotpads exposes an interface based on object pointers that disallows arbitrary address arithmetic. This avoids the need for associative caches. Instead, Hotpads moves objects across a hierarchy of directly addressed memories. It rewrites pointers to avoid most associative lookups, provides hardware support for memory management, and unifies hierarchical garbage collection and data placement. Hotpads also enables many new optimizations. For instance, we have designed Zippads, a memory hierarchy that leverages Hotpads to compress objects. Exploiting object semantics and Hotpads's ability to rewrite pointers, Zippads compresses and stores objects more compactly, with a novel compression algorithm that exploits redundancy across objects.

Though object-based languages are often seen as sacrificing performance for productivity, this work shows that hardware can exploit this abstraction to improve performance and efficiency over cache hierarchies: Hotpads reduces dynamic memory hierarchy energy by 2.6x and improves performance by 34%; and Zippads reduces main memory footprint by 2x while improving performance by 30%.
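
As a rough analogy (not the actual Hotpads design), the sketch below shows the flavor of an object-based interface: programs hold opaque handles rather than flat addresses, so the hierarchy is free to move whole objects between a small fast level and a larger backing level and to update their locations behind the scenes.

```python
# Toy analogy for an object-based hierarchy: opaque handles instead of flat
# addresses, with whole objects promoted/demoted between two levels.
# This is an illustrative simplification, not the Hotpads ISA or microarchitecture.

class ObjectHierarchy:
    def __init__(self, fast_capacity):
        self.fast, self.slow = {}, {}
        self.fast_capacity = fast_capacity
        self.next_id = 0

    def alloc(self, value):
        handle = self.next_id          # opaque handle: no address arithmetic allowed
        self.next_id += 1
        self.slow[handle] = value
        return handle

    def access(self, handle):
        if handle not in self.fast:    # "miss": promote the whole object
            if len(self.fast) >= self.fast_capacity:
                victim, v = self.fast.popitem()
                self.slow[victim] = v  # demote a victim; its handle stays valid
            self.fast[handle] = self.slow.pop(handle)
        return self.fast[handle]

h = ObjectHierarchy(fast_capacity=2)
a, b, c = (h.alloc(v) for v in ("A", "B", "C"))
print(h.access(a), h.access(b), h.access(c), h.access(a))  # objects migrate transparently
```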




The Cache Memory Book


Book Description

The Second Edition of The Cache Memory Book introduces systems designers to the concepts behind cache design. The book teaches the basic cache concepts and more exotic techniques, and leads readers through some of the most intricate protocols used in complex multiprocessor caches. Written in an accessible, informal style, this text demystifies cache memory design by translating cache concepts and jargon into practical methodologies and real-life examples. It also provides adequate detail to serve as a reference book for ongoing work in cache memory design. The Second Edition includes an updated and expanded glossary of cache memory terms and buzzwords, new real-world applications of cache memory design, and a new chapter on cache "tricks". The book:
- Illustrates detailed example designs of caches
- Provides numerous examples in the form of block diagrams, timing waveforms, state tables, and code traces
- Defines and discusses more than 240 cache-specific buzzwords, comparing in detail the relative merits of different design methodologies
- Includes an extensive glossary, complete with clear definitions, synonyms, and references to the appropriate text discussions
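
For readers new to the basics the book teaches, here is a minimal sketch of how a direct-mapped cache splits an address into tag, set index, and block offset; the cache geometry is an arbitrary example, not one from the book.

```python
# Minimal sketch of a basic cache concept: splitting an address into
# tag, set index, and block offset for a direct-mapped cache.
# Geometry (32 KiB cache, 64-byte lines) is just an example.

CACHE_BYTES, LINE_BYTES = 32 * 1024, 64
NUM_SETS = CACHE_BYTES // LINE_BYTES        # 512 sets (direct-mapped)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits of block offset
INDEX_BITS = NUM_SETS.bit_length() - 1      # 9 bits of set index

def split(addr):
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split(0x0040_2A48)
print(f"tag={tag:#x} index={index} offset={offset}")
```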




Architectural Techniques to Enable Reliable and High Performance Memory Hierarchy in Chip Multi-processors


Book Description

Constant technology scaling has enabled modern computing systems to achieve high degrees of thread-level parallelism, making the design of a highly scalable and dense memory hierarchy a major challenge. During the past few decades, SRAM has been widely used as the dominant technology for building on-chip cache hierarchies, while DRAM has been exploited to satisfy applications' demands on main memory. However, both of these technologies face serious scalability and power consumption problems. While there has been enormous research work addressing the drawbacks of these technologies, researchers have also been considering non-volatile memory technologies to replace SRAM and DRAM in future processors. Among the different non-volatile technologies, Spin-Transfer Torque RAM (STT-RAM) and Phase Change Memory (PCM) are the most promising candidates to replace SRAM and DRAM, respectively. Researchers believe that the memory hierarchy in future computing systems will consist of a hybrid combination of current technologies (i.e., SRAM and DRAM) and non-volatile technologies (e.g., STT-RAM and PCM). While each of these technologies has its own unique features, each has specific limitations as well. Therefore, in order to achieve a memory hierarchy that satisfies all the system-level requirements, we need to study each of these memory technologies.

In this dissertation, the author proposes several mechanisms to address some of the major issues with each of these technologies. To relieve the wear-out problem in a PCM-based main memory, a compression-based platform is proposed, in which the compression scheme collaborates with wear-leveling and error correction schemes to further extend memory lifetime. To mitigate the write disturbance problem in PCM, a new write strategy as well as a non-overlapping data layout is proposed to manage thermal disturbance among adjacent cells.

For the on-chip cache, the goal is a scalable, low-latency configuration. To this end, the author proposes a morphable SLC-MLC STT-RAM cache that dynamically trades off larger capacity against lower latency, based on the application's demands. While adopting scalable memory technologies such as STT-RAM improves the performance of cache-sensitive applications, the cache thrashing problem still exists in applications with very large working sets. To address this issue, the author proposes a selective caching mechanism for highly parallel architectures, and also introduces a criticality-aware compressed last-level cache capable of holding a larger portion of the working set while keeping access latency low.
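
To give a feel for the wear-out problem the dissertation targets, here is a simplified toy in the spirit of algebraic remapping schemes such as Start-Gap (not the author's mechanism): periodically rotating the logical-to-physical mapping spreads writes to a hot line across all physical lines.

```python
# Toy sketch of rotation-based wear leveling for PCM. Real schemes (e.g., Start-Gap)
# move one line at a time via a gap slot; this simplified version just shifts the
# whole logical-to-physical mapping by one line every few writes.

class RotatingWearLeveler:
    def __init__(self, num_lines, writes_per_shift):
        self.n = num_lines
        self.period = writes_per_shift
        self.shift = 0
        self.writes = 0
        self.wear = [0] * num_lines    # write count per physical line

    def write(self, logical_line):
        physical = (logical_line + self.shift) % self.n
        self.wear[physical] += 1
        self.writes += 1
        if self.writes % self.period == 0:
            self.shift = (self.shift + 1) % self.n  # rotate the mapping by one

wl = RotatingWearLeveler(num_lines=8, writes_per_shift=10)
for _ in range(800):
    wl.write(0)        # pathological workload: one hot logical line
print(wl.wear)         # writes end up spread evenly across physical lines
```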




In-Memory Computing Hardware Accelerators for Data-Intensive Applications


Book Description

This book describes the state of the art in technology and research on In-Memory Computing Hardware Accelerators for Data-Intensive Applications. The authors discuss how processing-centric computing has become insufficient to meet target requirements and how memory-centric computing may be better suited to the needs of current applications, showing readers how current and emerging memory technologies are driving a shift in the computing paradigm. They provide deep-dive discussions of volatile and non-volatile memory technologies, covering their basic memory cell structures and operations, different computational memory designs, and the challenges associated with them. Specific case studies and potential applications are provided, along with their current status and commercial availability in the market.
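
As a purely conceptual toy (no real device model), the snippet below captures the memory-centric idea the book develops: operate on an entire memory row in one bulk step instead of streaming individual words through the processor.

```python
# Conceptual toy of memory-centric computing: a computational memory applies an
# operation across a whole row at once. A bitwise AND on large Python ints stands
# in for that bulk row-wide primitive; no real memory array is modeled here.

def row_and(row_a, row_b):
    # one "in-memory" operation over an entire row, bit-parallel
    return row_a & row_b

ROW_BITS = 8192
a = int.from_bytes(bytes(range(256)) * 4, "big")  # 1 KiB row encoded as one integer
b = (1 << ROW_BITS) - 1                           # all-ones row
assert row_and(a, b) == a                         # AND with all-ones is the identity
print(f"{ROW_BITS}-bit rows ANDed in one bulk operation")
```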