QED Post-silicon Validation and Debug


Book Description

During post-silicon validation and debug, manufactured integrated circuits (ICs) are tested in actual system environments to detect and fix design flaws (bugs). Traditional pre-silicon verification is inadequate; as a result, many critical bugs are detected only after ICs are manufactured (i.e., during post-silicon validation and debug). However, post-silicon validation and debug is challenging because traditional techniques are ad hoc (e.g., insertion of various Design for Debug structures based on various heuristics), and the associated costs are rising faster than design costs. These challenges are further magnified by the slowdown of silicon CMOS scaling, as ICs incorporate tremendous complexity to meet increasing demands for improvements in performance and energy efficiency. Examples include the use of multiple processor cores, co-processors, hardware accelerators, uncore components (defined as components in an SoC that are neither the processor cores nor the co-processors / accelerators; examples of uncore components include cache controllers, memory controllers, and interconnection networks), and power management units. This dissertation presents the Quick Error Detection (QED) technique to overcome post-silicon validation and debug challenges. QED is essential because long error detection latency, the time elapsed between the occurrence of an error caused by a bug and its manifestation as an observable failure, severely limits the effectiveness of existing post-silicon validation and debug approaches. Experimental results collected using several state-of-the-art commercial hardware platforms, as well as results obtained from simulations of various bug scenarios that occurred in commercial multi-core System-on-Chips (SoCs), demonstrate the effectiveness and practicality of QED: 1. QED improves error detection latencies by up to 9 orders of magnitude, from billions of clock cycles to very few clock cycles (generally fewer than 1,000 clock cycles for most bug scenarios). 2. QED enables up to 4-fold improvement in bug coverage (i.e., QED detects bugs that may be missed by traditional post-silicon validation approaches). 3. Symbolic Quick Error Detection (Symbolic QED) localizes difficult logic bugs automatically in a few hours (less than 7 hours for most bug scenarios), without requiring any additional hardware. Localizing a bug involves identifying a bug trace (defined as a sequence of inputs, e.g., instructions, that activates and detects the bug) and identifying the hardware design block where the bug is (possibly) located. This was demonstrated for an open-source multi-core SoC consisting of 500 millions transistors. In contrast, it might take days or weeks (or even months) of manual work, per bug, when traditional techniques are used. QED is effective for bugs inside processor cores, co-processors / software-programmable accelerators (which are components in an SoC that can be programmed using software to perform a specific set of functions, examples include graphic processing unit and digital signal processor), non-programmable hardware accelerators (which are components in a SoC that are designed to perform a pre-defined set of functions, but cannot be programmed using software, examples include accelerators for video or audio compression), and uncore components such as cache controllers, memory controllers, and interconnection networks. QED has been successfully used in industry during post-silicon validation and debug of a commercial multi-core SoC.




Post-Silicon Validation and Debug


Book Description

This book provides a comprehensive coverage of System-on-Chip (SoC) post-silicon validation and debug challenges and state-of-the-art solutions with contributions from SoC designers, academic researchers as well as SoC verification experts. The readers will get a clear understanding of the existing debug infrastructure and how they can be effectively utilized to verify and debug SoCs.




Hardware and Software: Verification and Testing


Book Description

This book constitutes the refereed proceedings of the 13th International Haifa Verification Conference, HVC 2017, held in Haifa, Israel in November 2017.The 13 revised full papers presented together with 4 poster and 5 tool demo papers were carefully reviewed and selected from 45 submissions. They are dedicated to advance the state of the art and state of the practice in verification and testing and are discussing future directions of testing and verification for hardware, software, and complex hybrid systems.




Computer Aided Verification


Book Description

The two-volume set LNCS 10426 and LNCS 10427 constitutes the refereed proceedings of the 29th International Conference on Computer Aided Verification, CAV 2017, held in Heidelberg, Germany, in July 2017. The total of 50 full and 7 short papers presented together with 5 keynotes and tutorials in the proceedings was carefully reviewed and selected from 191 submissions. The CAV conference series is dedicated to the advancement of the theory and practice of computer-aided formal analysis of hardware and software systems. The conference covers the spectrum from theoretical results to concrete applications, with an emphasis on practical verification tools and the algorithms and techniques that are needed for their implementation.




Debug Automation from Pre-Silicon to Post-Silicon


Book Description

This book describes automated debugging approaches for the bugs and the faults which appear in different abstraction levels of a hardware system. The authors employ a transaction-based debug approach to systems at the transaction-level, asserting the correct relation of transactions. The automated debug approach for design bugs finds the potential fault candidates at RTL and gate-level of a circuit. Debug techniques for logic bugs and synchronization bugs are demonstrated, enabling readers to localize the most difficult bugs. Debug automation for electrical faults (delay faults)finds the potentially failing speedpaths in a circuit at gate-level. The various debug approaches described achieve high diagnosis accuracy and reduce the debugging time, shortening the IC development cycle and increasing the productivity of designers. Describes a unified framework for debug automation used at both pre-silicon and post-silicon stages; Provides approaches for debug automation of a hardware system at different levels of abstraction, i.e., chip, gate-level, RTL and transaction level; Includes techniques for debug automation of design bugs and electrical faults, as well as an infrastructure to debug NoC-based multiprocessor SoCs.




Variance Validation for Post-silicon Debugging in Network on Chip


Book Description

Since more complex components are integrated into a single chip and scale of chip goes far beyond, pre-silicon verification is getting hard to achieve full coverage at chip level and catch design bugs under physical conditions. As a complementary checking step, post-silicon validation has demonstrated its importance in verifying chip functionality. But still, the task of post-silicon validation is extremely difficult, especially for complex designs such as network on chip. In this thesis, a novel on-chip validation method based on the concept of variance validation is proposed to facilitate the process of post-silicon validation targeting the network on chip platform. Cores are paired and tagged as a functional core and a validating core in each pair. Variances are created for programs executed in each pair of cores. By comparing the outputs from each pair of cores, the method enables at-speed on-chip validation for core-based architectures and helps detect functional bugs after chip manufacturing. With little effort, the method can be extended to support reliable system design. Experiments are carried out and effectiveness of the proposed variance validation method is demonstrated by simulation results.




PROCEEDINGS OF THE 20TH CONFERENCE ON FORMAL METHODS IN COMPUTER-AIDED DESIGN – FMCAD 2020


Book Description

Formal Methods in Computer-Aided Design (FMCAD) is a conference series on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing ground-breaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing.




A Reconfigurable Post-silicon Debug Infrastructure for Systems-on-chip


Book Description

As the level of integrated circuit (IC) complexity continues to increase, the post-silicon validation stage is becoming a large component of the overall development cost. To address this, we propose a reconfigurable post-silicon debug infrastructure that enhances the post-silicon validation process by enabling the observation and control of signals that are internal to the manufactured device. The infrastructure is composed of dedicated programmable logic and programmable access networks. Our reconfigurable infrastructure enables not only the diagnoses of bugs; it also allows the detection and potential correction of errors in normal operation. In this thesis we describe the architecture, implementation and operation of our new infrastructure. Furthermore, we identify and address three key challenges arising from the implementation of this infrastructure. Our results demonstrate that it is possible to implement an effective reconfigurable post-silicon infrastructure that is able to observe and control circuits operating at full speed, with an area overhead of between 5% and 10% for many of our target ICs.




Introduction to VLSI Design Flow


Book Description




Hardware and Software: Verification and Testing


Book Description

This book constitutes the refereed proceedings of the 9th International Haifa Verification Conference, HVC 2013, held in Haifa, Israel in November 2013. The 24 revised full papers presented were carefully reviewed and selected from 49 submissions. The papers are organized in topical sections on SAT and SMT-based verification, software testing, supporting dynamic verification, specification and coverage, abstraction and model presentation.