Improving Performance of In-memory Key-value Stores Using a 3d-stacked Architecture


Book Description

Web services and cloud computing are rapidly growing as more users get online around the world and utilize the internet for a growing number of purposes. This puts more demand on in-memory key-value stores as web servers must handle a massive influx of user requests. Data centers will thus find it more challenging to meet their SLAs (Service Level Agreements), as the latency of the 90th percentile of requests may become quite unpredictable. To alleviate this growing concern, we utilize a stacked DRAM architecture as an LLC (last-level cache) that is modified to exploit some common power-law access patterns in user requests. More specifically, we observe that the majority of the memory traffic generated by a key-value store is due to requests for large values, even though large values account for a very small portion (typically around 5%) of overall requests. Thus, we choose to prioritize the cachelines that belong to large values in the stacked DRAM cache by allowing priority cachelines to be evicted only by other priority cachelines. Using this priority scheme, we are able to improve the 90th percentile request latency by as much as 42.4% over a standard stacked DRAM cache architecture.
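
The eviction rule described above can be illustrated with a minimal sketch of one cache set. This is not the authors' implementation: the class name `PrioritySet`, the LRU ordering, the choice to evict non-priority lines first, and the bypass on a miss with no legal victim are all assumptions made for illustration. Only the rule that a priority cacheline may be evicted solely by another priority cacheline comes from the description.

```python
from collections import OrderedDict


class PrioritySet:
    """One set of a set-associative cache with priority-aware LRU eviction.

    Hypothetical sketch: names and the LRU/bypass details are assumptions.
    """

    def __init__(self, ways):
        self.ways = ways
        # Maps tag -> is_priority, ordered from least to most recently used.
        self.lines = OrderedDict()

    def access(self, tag, is_priority):
        """Return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # refresh the LRU position
            return True
        if len(self.lines) >= self.ways:
            victim = self._pick_victim(is_priority)
            if victim is None:
                return False  # no legal victim: the line bypasses the cache
            del self.lines[victim]
        self.lines[tag] = is_priority
        return False

    def _pick_victim(self, incoming_priority):
        # Assumption: evict the least recently used non-priority line first.
        for tag, prio in self.lines.items():
            if not prio:
                return tag
        # Every resident line is a priority line, so only an incoming
        # priority line is allowed to evict one of them.
        if incoming_priority:
            return next(iter(self.lines))
        return None
```

In a two-way set already holding two priority lines, a non-priority miss leaves the set unchanged, while a priority miss replaces the least recently used priority line.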




Architecting Large Caches with Reduced Energy


Book Description

With process scaling, a large cache will be required in the future in order to meet the demands of emerging multi-core systems with higher processing speeds. However, the low density of Static Random Access Memory (SRAM) hinders the growth of cache capacity, which can take up to half of the die area. At the same time, main memory, with its long latency and limited bandwidth, also does not keep up with the speed of the CPU. Thus, new approaches are needed to increase on-die cache capacity and overcome the memory wall problem. Using the emerging 3D-stacked Dynamic Random Access Memory (DRAM) cache, which can easily provide gigabytes of storage, as the last level cache, is one potential approach to address the memory wall problem. However, the DRAM cache suffers from high energy consumption with increasing capacity. This dissertation first presents an energy-efficient DRAM cache design. This design is based on the observation that the DRAM cache with longer bitlines consumes more energy due to larger capacitance. We propose TCache, which partitions every subarray of DRAM cache banks into three sublevels and schedules energy-efficient data movement among these levels based on reuse distance. We also propose the LevelMap and WayMap to indicate in which sublevel and way that every data block of the DRAM cache is located. The Energy-efficient Data Movement policy based on the reuse distance is presented to increase the hit rate in the energy-efficient sublevel regions. Evaluations show these techniques reduce DRAM cache energy consumption by 33.4% (by 11% after considering DRAM cache controller and DRAM cache logic overall). Performance is improved by 10.6% on average over the baseline DRAM cache (by 7% after considering DRAM cache controller and DRAM cache logic overall). A novel hybrid cache architecture consisting of both a DRAM region and a Spin-Transfer-Torque-RAM (STT-RAM) region is then introduced. 
This design is based on the observation that many redundant bits are written in the row buffer and many futile bits are written back to STT-RAM cells; these writes do not change the cell's value but still incur high write energy. We propose the selective write back to row buffer and selective write back to cell array optimizations, which reduce the high write energy of the STT-RAM region by removing the unnecessary bit-writes. In this dissertation, we also propose the reuse distance-oriented data movement and a novel tag design for the hybrid cache. The results show that our hybrid cache achieves on average a 28.3% energy reduction and 6.7% performance improvement for the write optimizations (15% energy reduction and 4% performance improvement after considering hybrid cache controller and hybrid cache logic overall). Although STT-RAM with near-zero leakage can be integrated with the DRAM cache as a hybrid cache to reduce static energy, the high write energy of STT-RAM brings another energy challenge. In this dissertation, we also describe a tri-regional hybrid cache that can enjoy the advantages of both DRAM and STT-RAM technologies. We propose an asymmetric data access policy and a prediction table to further reduce the energy of the large hybrid cache. Using the tri-regional design, the results show that energy is reduced by 26% and performance is improved by 11% on average. One limitation is that DRAM-style refresh cannot sufficiently remove errors in the STT-RAM, so an error-correcting method such as ECC is needed to eliminate them completely.
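
The idea behind removing unnecessary bit-writes can be sketched in a few lines. This is an illustrative assumption, not the dissertation's mechanism: the function name `selective_write` and the word-level XOR comparison are hypothetical, and the sketch only shows how bits whose stored value would not change can be identified and skipped.

```python
def selective_write(old_word, new_word, width=64):
    """Return (mask of bits that must be written, number of such bits).

    Hypothetical sketch: a bit is written only if its stored value would
    actually change; identical bits are treated as futile writes and skipped.
    """
    full_mask = (1 << width) - 1
    changed = (old_word ^ new_word) & full_mask  # 1 where the bits differ
    return changed, bin(changed).count("1")
```

For example, overwriting the 4-bit value 0b1010 with 0b1001 requires writing only the two low-order bits, so `selective_write(0b1010, 0b1001, width=4)` returns `(0b0011, 2)`.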




More-than-Moore 2.5D and 3D SiP Integration


Book Description

This book presents a realistic and holistic review of the microelectronic and semiconductor technology options in the post-Moore's Law regime. Technical tradeoffs, from architecture down to manufacturing processes, associated with the 2.5D and 3D integration technologies, as well as the business and product management considerations encountered when faced with disruptive technology options, are presented. Coverage includes a discussion of Integrated Device Manufacturer (IDM) vs. Fabless vs. Foundry, and Outsourced Assembly and Test (OSAT) barriers to implementation of disruptive technology options. This book is a must-read for any IC product team that is considering getting off the Moore's Law track and leveraging some of the More-than-Moore technology options for their next microelectronic product.




75th Anniversary of the Transistor


Book Description

75th Anniversary of the Transistor is a commemorative anniversary volume to celebrate the invention of the transistor. The anniversary volume was conceived by the IEEE Electron Devices Society (EDS) to provide comprehensive yet compact coverage of the historical perspectives underlying the invention of the transistor and its subsequent evolution into a multitude of integration and manufacturing technologies and applications. The book reflects the transistor's development since inception to the current state of the art that continues to enable scaling to very large-scale integrated circuits of higher functionality and speed. The stages in this evolution covered are in chronological order to reflect historical developments. Narratives and experiences are provided by a select number of venerated industry and academic leaders, and retired veterans, of the semiconductor industry. 75th Anniversary of the Transistor highlights: historical perspectives of the state-of-the-art pre-solid-state-transistor world (pre-1947) leading to the invention of the transistor; invention of the bipolar junction transistor (BJT) and analytical formulations by Shockley (1948) and their impact on the semiconductor industry; large scale integration, Moore's Law (1965) and transistor scaling (1974), and MOS/LSI, including flash memories, SRAMs, DRAMs (1963), and the Toshiba NAND flash memory (1989); and image sensors (1986), including charge-coupled devices, and related microsensor applications. With comprehensive yet succinct and accessible coverage of one of the cornerstones of modern technology, 75th Anniversary of the Transistor is an essential reference for engineers, researchers, and undergraduate students looking for historical perspective from leaders in the field.




Supercomputing Frontiers


Book Description

This open access book constitutes the refereed proceedings of the 6th Asian Supercomputing Conference, SCFA 2020, which was planned to be held in February 2020, but unfortunately, the physical conference was cancelled due to the COVID-19 pandemic. The 8 full papers presented in this book were carefully reviewed and selected from 22 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platform, container image configuration workflow, large-scale applications, and scheduling.




Temporal Data & the Relational Model


Book Description

A review of relational concepts -- An overview of Tutorial D -- Time and the database -- What is the problem? -- Intervals -- Operators on intervals -- The EXPAND and COLLAPSE operators -- The PACK and UNPACK operators -- Generalizing the relational operators -- Database design -- Integrity constraints 1 : candidate keys and related constraints -- Integrity constraints 2 : general constraints -- Database queries -- Database updates -- Stated times and logged times -- Point and interval types revisited.




Three-Dimensional Design Methodologies for Tree-based FPGA Architecture


Book Description

This book focuses on the development of 3D design and implementation methodologies for Tree-based FPGA architecture. It also stresses the need for new and augmented 3D CAD tools to support designs, such as design for 3D, to manufacture high-performance 3D integrated circuits and reconfigurable FPGA-based systems. This book was written as a text that covers the foundations of 3D integrated system design and FPGA architecture design. It was written for use in an elective or core course at the graduate level in the fields of Electrical Engineering and Computer Engineering, and in doctoral research programs. No previous background in 3D integration is required; nevertheless, a fundamental understanding of 2D CMOS VLSI design is required. It is assumed that the reader has taken the core curriculum in Electrical Engineering or Computer Engineering, with courses like CMOS VLSI design, Digital System Design and Microelectronics Circuits being the most important. It is accessible for self-study by senior students and professionals alike.




Handbook of 3D Integration, Volume 4


Book Description

This fourth volume of the landmark handbook focuses on the design, testing, and thermal management of 3D-integrated circuits, both from a technological and materials science perspective. Edited and authored by key contributors from top research institutions and high-tech companies, the first part of the book provides an overview of the latest developments in 3D chip design, including challenges and opportunities. The second part focuses on the test methods used to assess the quality and reliability of the 3D-integrated circuits, while the third and final part deals with thermal management and advanced cooling technologies and their integration.




Processor and System-on-Chip Simulation


Book Description

Simulation of computer architectures has made rapid progress recently. The primary application areas are hardware/software performance estimation and optimization as well as functional and timing verification. Recent, innovative technologies such as retargetable simulator generation, dynamic binary translation, or sampling simulation have enabled widespread use of processor and system-on-chip (SoC) simulation tools in the semiconductor and embedded system industries. At the same time, processor and SoC simulation is still a very active research area, e.g. with regard to higher simulation speed, flexibility, and accuracy/speed trade-offs. This book presents and discusses the principal technologies and the state of the art in high-level hardware architecture simulation, both at the processor and the system-on-chip level.




CMOS Digital IC


Book Description