Improving Processor Performance by Dynamically Pre-processing the Instruction Stream


Book Description

The exponentially increasing gap between processor and off-chip memory speeds, as measured in processor cycles, is rapidly turning memory latency into a major processor performance bottleneck. Traditional solutions, such as employing multiple levels of caches, are expensive and do not work well with some applications. We evaluate a technique, called runahead pre-processing, that can significantly improve processor performance. The instruction and data stream prefetches generated during runahead episodes led to a significant performance improvement for all of the benchmarks we examined. We found that runahead typically led to about a 30% reduction in CPI for the four SPEC95 integer benchmarks that we simulated, and reduced CPI by 77% for the STREAM benchmark. These results are for a five-stage pipeline with two levels of split instruction and data caches: 8KB each of L1 and 1MB each of L2. A significant result is that runahead is especially effective when the latency to off-chip memory increases, or when the caching performance for a particular benchmark is poor, because the processor then has more opportunities in which to pre-process instructions. Finally, runahead appears particularly well suited for use with high clock-rate in-order processors that employ relatively inexpensive memory hierarchies.


Advanced Processors


Book Description

The book is written for an undergraduate course on the 16-bit, 32-bit and 64-bit Intel processors. It provides comprehensive coverage of the hardware and software aspects of the 8086/88, 80286, 80386, 80486 and Pentium processors. The book uses plain and lucid language to explain each topic. It explains complicated concepts logically and presents techniques stepwise for easy understanding, making the subject more interesting. The book begins with the 8086 architecture, instruction set, Assembly Language Programming (ALP) and interfacing the 8086 with support chips, memory and I/O. It then focuses on the features, architecture, pin description, data types, addressing modes and newly supported instructions of the 80286 and 80386 microprocessors. It discusses the operating modes supported by the 80386: Real Mode, Protected Mode and Virtual 8086 Mode. Finally, the book focuses on multitasking, exception handling, the 80486 architecture, the Pentium architecture and RISC processors. It describes the Pentium superscalar architecture, pipelining, instruction pairing rules, instruction and data caches, the floating-point unit, the Pentium Pro architecture, the Pentium MMX architecture, Hyper-Threading, Core 2 Duo features and the concept of the RISC processor.




Proceedings


Book Description




Reliable and Energy Efficient Streaming Multiprocessor Systems


Book Description

This book discusses analysis, design and optimization techniques for streaming multiprocessor systems, while satisfying a given area, performance, and energy budget. The authors describe design flows for both application-specific and general purpose streaming systems. Coverage also includes the use of machine learning for thermal optimization at run-time, when an application is being executed. The design flow described in this book extends to thermal and energy optimization with multiple applications running sequentially and concurrently.