Low-cost and Efficient Fault Detection and Diagnosis Schemes for Modern Cores


Book Description

Continuous improvements in transistor scaling, together with microarchitectural advances, have enabled the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by design complexity are making it harder to produce systems that are both cheap and robust. Soft error trends are alarming, especially for combinational logic: as combinational logic becomes the dominant source of soft errors, parity and ECC are becoming insufficient. Furthermore, experts warn that intermittent and permanent faults must also be addressed during processor runtime, as rising temperatures and device variations will accelerate inherent aging phenomena. These challenges especially threaten the commodity segments, whose requirements existing fault tolerance mechanisms cannot satisfy. Current techniques based on redundant execution were devised at a time when high penalties were deemed acceptable for the sake of high reliability. Novel lightweight techniques are therefore needed to enable fault protection in mass-market segments. At the same time, the complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and after products reach the market. Fault localization and diagnosis are the biggest bottlenecks, aggravated by long detection latencies, limited internal observability, and the costly server farms needed to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failure in modern processors during their lifetime (including transient, intermittent and permanent faults, as well as design bugs).
Our solutions embrace a paradigm in which fault tolerance is built by exploiting high-level microarchitectural invariants that are reusable across designs, rather than by relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that, combined, enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs by trading off area, power and performance. Our fault injection studies reveal that our methods provide high coverage while incurring significantly lower performance, power and area costs than existing techniques. This thesis then extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow that combines our error detection mechanism, a new low-cost logging mechanism and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suit validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug and determine its root cause. Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are highly manual, time-consuming and cumbersome.
Altogether, the integrated solutions proposed in this thesis enable the industry to deliver more reliable and correct processors as technology evolves toward more complex designs and more vulnerable transistors.




Fault-Diagnosis Systems


Book Description

With increasing demands for efficiency and product quality, and with progress in the integration of automatic control systems into high-cost mechatronic and safety-critical processes, the field of supervision (or monitoring), fault detection and fault diagnosis plays an important role. The book gives an introduction to advanced methods of fault detection and diagnosis (FDD). After defining important terms, it considers the reliability, availability, safety and systems integrity of technical processes. It then treats fault-detection methods for single signals, both without models (such as limit and trend checking) and with harmonic and stochastic models (such as Fourier analysis, correlation and wavelets). This is followed by fault detection with process models that use the relationships between signals, such as parameter estimation, parity equations, observers and principal component analysis. The fault-diagnosis methods covered include classification methods, from Bayes classification to neural networks with decision trees, and inference methods, from approximate reasoning with fuzzy logic to hybrid fuzzy-neuro systems. Several practical examples of fault detection and diagnosis for DC motor drives, a centrifugal pump, automotive suspension and tires demonstrate the applications.
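Limit and trend checking, the simplest model-free methods mentioned above, can be illustrated with a minimal sketch. The function names `limit_check`, `trend_check` and `detect`, the fixed band `[x_min, x_max]` and the rate bound `max_rate` are hypothetical; the trend is approximated here by a first difference, which is one common choice and not necessarily the formulation used in the book.

```python
from typing import List, Sequence


def limit_check(x: float, x_min: float, x_max: float) -> bool:
    """Limit check: True if the sample lies outside the fixed band [x_min, x_max]."""
    return x < x_min or x > x_max


def trend_check(samples: Sequence[float], dt: float, max_rate: float) -> List[bool]:
    """Trend check: flag each step whose first difference (an approximate
    derivative) exceeds max_rate in magnitude. Returns len(samples) - 1 flags."""
    flags = []
    for prev, curr in zip(samples, samples[1:]):
        rate = (curr - prev) / dt
        flags.append(abs(rate) > max_rate)
    return flags


def detect(samples: Sequence[float], dt: float, x_min: float, x_max: float,
           max_rate: float) -> List[bool]:
    """Sample i is flagged if its value is out of range or the trend into it is too steep.
    The first sample has no predecessor, so only the limit check applies to it."""
    trend = [False] + trend_check(samples, dt, max_rate)
    return [limit_check(x, x_min, x_max) or t for x, t in zip(samples, trend)]


if __name__ == "__main__":
    # Hypothetical temperature readings sampled once per second.
    temps = [50.0, 50.5, 51.0, 58.0, 59.0, 95.0]
    print(detect(temps, dt=1.0, x_min=0.0, x_max=90.0, max_rate=5.0))
```

With these values, the jump from 51.0 to 58.0 is caught by the trend check even though the reading is still within limits, while the final 95.0 violates both checks; this is why the two checks are usually combined.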




Real-Time Fault Detection and Diagnosis Using Intelligent Monitoring and Supervision Systems


Book Description

In monitoring and supervision schemes, fault detection and diagnosis characterize high-efficiency, high-quality production systems. To achieve these properties, such schemes rely on techniques that detect and diagnose failures in real time. Detection signals the occurrence of faults, while diagnosis provides their root cause and location. Fault detection is based on mathematical models of signals and processes, while fault diagnosis focuses on systems theory and process modeling. Monitoring and supervision complement each other in fault management, enabling normal and continuous operation. Their application avoids stopping production processes through early detection of failures and real-time actions to eliminate them, such as predictive and proactive maintenance based on process conditions. The integration of all these methodologies yields intelligent monitoring and supervision systems capable of real-time fault detection and diagnosis. Their high performance relies on statistical decision-making techniques, expert systems, artificial neural networks, fuzzy logic and computational procedures, which make them efficient and fully autonomous in decision-making during the real-time operation of a production system.




Model-Based Fault Diagnosis Techniques


Book Description

Guaranteeing high system performance over a wide operating range is an important issue in the design of automatic control systems of ever-increasing complexity. As a key technology in the search for a solution, advanced fault detection and identification (FDI) is receiving considerable attention. This book introduces basic model-based FDI schemes, advanced analysis and design algorithms, and mathematical and control-theoretic tools. This second edition of Model-Based Fault Diagnosis Techniques contains:
• new material on fault isolation and identification and on alarm management;
• an extended and revised treatment of systematic threshold determination for systems with both deterministic unknown inputs and stochastic noise;
• the continuously stirred tank heater, added as a representative process-industry benchmark; and
• an enhanced discussion of residual evaluation, which now deals with stochastic processes.
Model-Based Fault Diagnosis Techniques will interest academic researchers working in fault identification and diagnosis. As a text, it is suitable for graduate students in a formal university-based course, or as a self-study aid for practising engineers working with automatic control or mechatronic systems in fields as diverse as chemical process and power engineering.
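The core idea of model-based residual generation and evaluation can be sketched in a simplified, fully deterministic form: a process model predicts the output, the residual is the difference between measurement and prediction, a sliding-window RMS serves as the evaluation function, and an alarm is raised when it exceeds a threshold. The first-order model `y[k] = a*y[k-1] + b*u[k-1]`, its parameters `a` and `b`, the window length and the fixed threshold are all hypothetical choices for illustration; the book's systematic threshold determination for systems with unknown inputs and stochastic noise is not reproduced here.

```python
import math
from typing import List, Sequence


def residuals(y: Sequence[float], u: Sequence[float], a: float, b: float) -> List[float]:
    """Residual generation: r[k] = y[k] - (a*y[k-1] + b*u[k-1]) for k >= 1,
    i.e. measurement minus the one-step-ahead prediction of a (hypothetical) first-order model."""
    return [y[k] - (a * y[k - 1] + b * u[k - 1]) for k in range(1, len(y))]


def rms_evaluation(r: Sequence[float], window: int) -> List[float]:
    """Residual evaluation: RMS over each full sliding window of length `window`."""
    return [
        math.sqrt(sum(v * v for v in r[i - window + 1 : i + 1]) / window)
        for i in range(window - 1, len(r))
    ]


def alarms(y: Sequence[float], u: Sequence[float], a: float, b: float,
           window: int, threshold: float) -> List[bool]:
    """Decision logic: raise an alarm wherever the evaluated residual exceeds the threshold."""
    return [j > threshold for j in rms_evaluation(residuals(y, u, a, b), window)]


if __name__ == "__main__":
    a, b = 0.5, 1.0
    u = [1.0] * 8
    y = [0.0]
    for k in range(1, 8):                  # simulate the fault-free process
        y.append(a * y[-1] + b * u[k - 1])
    y_faulty = y[:5] + [v + 1.0 for v in y[5:]]   # hypothetical sensor bias from k = 5
    print(alarms(y, u, a, b, window=2, threshold=0.3))
    print(alarms(y_faulty, u, a, b, window=2, threshold=0.3))
```

On fault-free data the residual is zero and no alarm is raised; after the simulated sensor bias appears, the windowed RMS exceeds the threshold. Choosing the threshold is the hard part in practice, because unknown inputs and noise also make the residual nonzero.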




Power Electronics and Renewable Energy Systems


Book Description

The book is a collection of high-quality, peer-reviewed research papers from the proceedings of the International Conference on Power Electronics and Renewable Energy Systems (ICPERES 2014), held at Rajalakshmi Engineering College, Chennai, India. These research papers present the latest developments in the broad area of power electronics and renewable energy. The book discusses a wide variety of industrial, engineering and scientific applications of the emerging techniques. It includes invited papers from the inventors and originators of new applications and advanced technologies.




Fault Detection, Diagnosis and Prognosis


Book Description

This book presents the main concepts, state of the art, advances, and case studies of fault detection, diagnosis, and prognosis. This topic is critical for industry to reach and maintain competitiveness, which requires proper management of corrective, predictive, and preventive policies in any industry. The book complements other subdisciplines such as economics, finance, marketing, decision and risk analysis, and engineering, and presents real case studies in multiple disciplines. It covers the main topics using prognostic techniques and techniques drawn from these subdisciplines. Linking these topics with finance, scheduling, resources, downtime, etc. is essential to increase productivity, profitability, maintainability, reliability, safety, and availability, and to reduce costs and downtime. Advances in mathematics, modeling, computational techniques, dynamic analysis, etc. are employed analytically. Computational techniques, dynamic analysis, probabilistic methods, and mathematical optimization techniques are expertly blended to support the analysis of prognostic problems with defined constraints and requirements. The book is intended for graduate students and professionals in industrial engineering, business administration, industrial organization, operations management, applied microeconomics, and the decision sciences who are either studying maintenance or need to solve large, specific, and complex maintenance management problems as part of their jobs. The work will also be of interest to researchers in academia.




Fault Detection, Supervision and Safety for Technical Processes 1991


Book Description

These proceedings provide a general overview as well as detailed information on the developing field of reliability and safety in automatically controlled technical processes. The plenary papers present the state of the art and an overview of the areas of aircraft and nuclear power stations, because these safety-critical domains possess the most highly developed fault management and supervision schemes. Additional plenary papers cover recent developments in analytical redundancy. In total, 95 papers are presented in these proceedings.




Architecture Design for Soft Errors


Book Description

Architecture Design for Soft Errors provides a comprehensive description of the architectural techniques to tackle the soft error problem. It covers new methodologies for the quantitative analysis of soft errors as well as novel, cost-effective architectural techniques to mitigate them. To give readers a better grasp of the broader problem definition and solution space, the book also delves into the physics of soft errors and reviews current circuit and software mitigation techniques. The book can be read or used in a course in several ways: as a complete course on architecture design for soft errors covering the entire book; as a short course on architecture design for soft errors; or as a reference book on classical fault-tolerant machines. It is recommended for practitioners in the semiconductor industry, for researchers and developers in computer architecture, for advanced graduate seminar courses on soft errors, and as a reference for undergraduate courses in computer architecture.
• Helps readers build fault tolerance into the billions of microchips produced each year, all of which are subject to soft errors
• Shows readers how to quantify their soft error reliability
• Provides state-of-the-art techniques to protect against soft errors




Data-driven Detection and Diagnosis of Faults in Traction Systems of High-speed Trains


Book Description

This book addresses the needs of researchers and practitioners in the field of high-speed trains, especially those whose work involves safety and reliability issues in traction systems. It will appeal to researchers and graduate students at institutions of higher learning, research labs, and in the industrial R&D sector, catering to a readership from a broad range of disciplines including intelligent transportation, electrical engineering, mechanical engineering, chemical engineering, the biological sciences and engineering, economics, ecology, and the mathematical sciences.