Architecting Large Caches with Reduced Energy


Book Description

With process scaling, large caches will be required to meet the demands of emerging multi-core systems with higher processing speeds. However, the low density of Static Random Access Memory (SRAM) hinders the growth of cache capacity, which can take up to half of the die area. At the same time, main memory, with its long latency and limited bandwidth, does not keep up with the speed of the CPU. Thus, new approaches are needed to increase on-die cache capacity and overcome the memory wall problem. Using the emerging 3D-stacked Dynamic Random Access Memory (DRAM) cache, which can easily provide gigabytes of storage, as the last-level cache is one potential approach to address the memory wall problem. However, the DRAM cache suffers from high energy consumption with increasing capacity. This dissertation first presents an energy-efficient DRAM cache design. This design is based on the observation that a DRAM cache with longer bitlines consumes more energy due to larger capacitance. We propose TCache, which partitions every subarray of the DRAM cache banks into three sublevels and schedules energy-efficient data movement among these levels based on reuse distance. We also propose the LevelMap and WayMap to indicate the sublevel and way in which each data block of the DRAM cache is located. An Energy-efficient Data Movement policy based on reuse distance is presented to increase the hit rate in the energy-efficient sublevel regions. Evaluations show these techniques reduce DRAM cache energy consumption by 33.4% (by 11% after considering DRAM cache controller and DRAM cache logic overall). Performance is improved by 10.6% on average over the baseline DRAM cache (by 7% after considering DRAM cache controller and DRAM cache logic overall). A novel hybrid cache architecture consisting of both a DRAM region and a Spin-Transfer-Torque-RAM (STT-RAM) region is then introduced.
This design is based on the observation that there are many redundant bits written in the row buffer and futile bits written back to STT-RAM cells, which do not change the cells' values but still cost high write energy. We propose the selective write back to row buffer and selective write back to cell array optimizations to reduce the high write energy of the STT-RAM region by removing the unnecessary bit-writes. In this dissertation, we also propose reuse distance-oriented data movement and a novel tag design for the hybrid cache. The results show that our hybrid cache achieves on average a 28.3% energy reduction and a 6.7% performance improvement for the write optimizations (15% energy reduction and 4% performance improvement after considering hybrid cache controller and hybrid cache logic overall). Although STT-RAM with near-zero leakage can be integrated with the DRAM cache as a hybrid cache to reduce static energy, the high write energy of STT-RAM brings another energy challenge. In this dissertation, we also describe a tri-regional hybrid cache that can enjoy the advantages of both DRAM and STT-RAM technologies. We propose an asymmetric data access policy and a prediction table to further reduce the energy of the large hybrid cache. Using the tri-regional design, the results show that energy is reduced by 26% and performance is improved by 11% on average. However, a limitation is that DRAM-style refresh cannot sufficiently remove errors in the STT-RAM, which needs an error-correcting method such as ECC to completely eliminate the errors.
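
The idea behind removing futile bit-writes can be illustrated with a toy model. The sketch below is not the dissertation's circuit or policy; the function names `changed_bits` and `bit_writes` are hypothetical, and it assumes every cell write costs one unit of energy. It only shows that comparing the stored word with the incoming word identifies the bits whose value would actually change.

```python
def changed_bits(old: int, new: int, width: int) -> int:
    """Mask of bit positions where the new value differs from the stored one."""
    return (old ^ new) & ((1 << width) - 1)


def bit_writes(old: int, new: int, width: int, selective: bool) -> int:
    """Number of cell writes issued for one word (one energy unit each, by assumption)."""
    if selective:
        return bin(changed_bits(old, new, width)).count("1")
    return width  # a conventional write drives every cell


stored = 0b1011_0010
incoming = 0b1011_0110  # differs from `stored` in exactly one bit position

print(bit_writes(stored, incoming, 8, selective=False))  # 8
print(bit_writes(stored, incoming, 8, selective=True))   # 1
```

In this toy case seven of the eight bit-writes are futile, which is the kind of redundancy the description refers to; the actual savings depend on the workload's data patterns.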




Low-Power Electronics Design


Book Description

The power consumption of integrated circuits is one of the most problematic considerations affecting the design of high-performance chips and portable devices. The study of power-saving design methodologies now must also include subjects such as systems on chips, embedded software, and the future of microelectronics. Low-Power Electronics Design covers all major aspects of low-power design of ICs in deep submicron technologies and addresses emerging topics related to future design. This volume explores, in individual chapters written by expert authors, the many low-power techniques born during the past decade. It also discusses the many different domains and disciplines that impact power consumption, including processors, complex circuits, software, CAD tools, and energy sources and management. The authors delve into what many specialists predict about the future by presenting techniques that are promising but are not yet reality. They investigate nanotechnologies, optical circuits, ad hoc networks, and e-textiles, as well as human-powered sources of energy. Low-Power Electronics Design delivers a complete picture of today's methods for reducing power, and also illustrates the advances in chip design that may be commonplace 10 or 15 years from now.




An Energy Efficient TCAM Enhanced Cache Architecture


Book Description

Microprocessors are used in a variety of systems ranging from high-performance supercomputers running scientific applications to battery-powered cell phones performing real-time tasks. Due to the large disparity between processor clock speed and main memory access time, most modern processors include several caches, which consume more than half of the total chip area and power budget. As the performance gap between processors and memory has increased, the trend has been to increase the size of the on-chip caches. However, increasing the cache size also increases its access time and energy consumption. This growing power dissipation problem is making traditional cooling and packaging techniques less effective, thus requiring cache designers to focus more on architectural-level energy efficiency than on performance alone. The goal of this thesis is to propose a new cache architecture and to evaluate its efficiency in terms of miss rate, system performance, energy consumption, and area overhead. The proposed architecture uses a few Ternary-CAM (TCAM) cells in the tag array to enable dynamic compression of tag entries containing contiguous values. By dynamically compressing tag entries, the number of entries in the tag array can be reduced by 2^N, where N is the number of tag bits that can be compressed. The architecture described in this thesis is applicable to any cache structure that uses Content Addressable Memory (CAM) cells to store tag bits. To evaluate the effectiveness of the TCAM Enhanced Cache Architecture for a wide scope of applications, two case studies were performed: the L2 Data-TLB (DTLB) of a high-performance processor and the L1 instruction and data caches of a low-power embedded processor. Results indicate that an L2 DTLB implementing 3-bit tag compression can achieve 93% of the performance of a conventional L2 DTLB of the same size while reducing the on-chip energy consumption by 74% and the total area by 50%.
Similarly, an embedded processor cache implementing 2-bit tag compression achieves 99% of the performance of a conventional cache while reducing the on-chip energy consumption by 33% and the total area by 10%.
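
As a rough illustration of how a ternary tag entry can stand in for several contiguous tags, the sketch below models an entry as a (value, care mask) pair whose low N bits are "don't care". This is a simplified software model under stated assumptions, not the thesis's hardware design: the 16-bit tag width and the helper names `make_entry` and `matches` are hypothetical.

```python
TAG_BITS = 16  # assumed tag width, chosen only for illustration


def make_entry(tag: int, n: int) -> tuple[int, int]:
    """Build a ternary entry whose low n bits are don't-care."""
    care_mask = ((1 << TAG_BITS) - 1) & ~((1 << n) - 1)
    return tag & care_mask, care_mask


def matches(entry: tuple[int, int], tag: int) -> bool:
    """Compare only the bit positions the entry cares about."""
    value, care_mask = entry
    return (tag & care_mask) == value


entry = make_entry(0b101100, n=2)
covered = [t for t in range(1 << TAG_BITS) if matches(entry, t)]
print(covered)  # [44, 45, 46, 47]
```

With N = 2 compressed bits, the single entry matches 2^2 = 4 contiguous tag values, which is where the 2^N figure in the description comes from. Whether a real workload has enough contiguous tags resident to benefit is an empirical question the sketch does not address.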




Memory Architecture Exploration for Programmable Embedded Systems


Book Description

Memory Architecture Exploration for Programmable Embedded Systems addresses efficient exploration of alternative memory architectures, assisted by a "compiler-in-the-loop" that allows effective matching of the target application to the processor-memory architecture. This new approach for memory architecture exploration replaces the traditional black-box view of the memory system and allows for aggressive co-optimization of the programmable processor together with a customized memory system. The book concludes with a set of experiments demonstrating the utility of this exploration approach. The authors perform architecture and compiler exploration for a set of large, real-life benchmarks, uncovering promising memory configurations from different perspectives, such as cost, performance and power.




Computer Architecture Techniques for Power-Efficiency


Book Description

In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time architects were successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these costs is the inexorable increase in power dissipation and power density in processors. Power dissipation issues have catalyzed new topic areas in computer architecture, resulting in a substantial body of work on more power-efficient architectures. Power dissipation, coupled with diminishing performance gains, was also the main cause for the switch from single-core to multi-core architectures and a slowdown in frequency increase. This book aims to document some of the most important architectural techniques that were invented, proposed, and applied to reduce both dynamic power and static power dissipation in processors and memory hierarchies. A significant number of techniques have been proposed for a wide range of situations, and this book synthesizes those techniques by focusing on their common characteristics. Table of Contents: Introduction / Modeling, Simulation, and Measurement / Using Voltage and Frequency Adjustments to Manage Dynamic Power / Optimizing Capacitance and Switching Activity to Reduce Dynamic Power / Managing Static (Leakage) Power / Conclusions




Advanced Computer Architecture


Book Description

This book constitutes the refereed proceedings of the 11th Annual Conference on Advanced Computer Architecture, ACA 2016, held in Weihai, China, in August 2016. The 17 revised full papers presented were carefully reviewed and selected from 89 submissions. The papers address issues such as processors and circuits; high performance computing; GPUs and accelerators; cloud and data centers; energy and reliability; intelligence computing and mobile computing.




Advances in Computer Systems Architecture


Book Description

This book constitutes the refereed proceedings of the 10th Asia-Pacific Computer Systems Architecture Conference, ACSAC 2005, held in Singapore in October 2005. The 65 revised full papers presented were carefully reviewed and selected from 173 submissions. The papers are organized in topical sections on energy efficient and power aware techniques, methodologies and architectures for application-specific systems, processor architectures and microarchitectures, high-reliability and fault-tolerant architectures, compiler and OS for emerging architectures, data value predictions, reconfigurable computing systems and polymorphic architectures, interconnect networks and network interfaces, parallel architectures and computation models, hardware-software partitioning, verification, and testing of complex architectures, architectures for secured computing, simulation and performance evaluation, architectures for emerging technologies and applications, and memory systems hierarchy and management.




Architecture of Computing Systems - ARCS 2006


Book Description

This book constitutes the refereed proceedings of the 19th International Conference on Architecture of Computing Systems, ARCS 2006, held in March 2006. The 32 revised full papers presented together with two invited and keynote papers were carefully reviewed and selected from 174 submissions. The papers are organized in topical sections on pervasive computing, memory systems, architectures, multiprocessing, energy efficient design, power awareness, network protocols, security, and distributed networks.




Computer Architecture for Scientists


Book Description

A principled, high-level view of computer performance and how to exploit it. Ideal for software architects and data scientists.




Advances in Computer Systems Architecture


Book Description

This book constitutes the refereed proceedings of the 8th Asia-Pacific Computer Systems Architecture Conference, ACSAC 2003, held in Aizu-Wakamatsu, Japan in September 2003. The 23 revised full papers presented together with 8 invited papers were carefully reviewed and selected from 30 submissions. The papers are organized in topical sections on processor architectures and innovative microarchitectures, parallel computer architectures and computation models, reconfigurable architectures, computer arithmetic, cache and memory architectures, and interconnection networks and network interfaces.