Fault-Tolerant Search Algorithms


Book Description

Why a book on fault-tolerant search algorithms? Searching is one of the fundamental problems in computer science. Time and again algorithmic and combinatorial issues originally studied in the context of search find application in the most diverse areas of computer science and discrete mathematics. On the other hand, fault-tolerance is a necessary ingredient of computing. Due to their inherent complexity, information systems are naturally prone to errors, which may appear at any level – as imprecisions in the data, bugs in the software, or transient or permanent hardware failures. This book provides a concise, rigorous and up-to-date account of different approaches to fault-tolerance in the context of algorithmic search theory. Thanks to their basic structure, search problems offer insights into how fault-tolerant techniques may be applied in various scenarios. In the first part of the book, a paradigmatic model for fault-tolerant search is presented, the Ulam—Rényi problem. Following a didactic approach, the author takes the reader on a tour of Ulam—Rényi problem variants of increasing complexity. In the context of this basic model, fundamental combinatorial and algorithmic issues in the design of fault-tolerant search procedures are discussed. The algorithmic efficiency achievable is analyzed with respect to the statistical nature of the error sources, and the amount of information on which the search algorithm bases its decisions. In the second part of the book, more general models of faults and fault-tolerance are considered. Special attention is given to the application of fault-tolerant search procedures to specific problems in distributed computing, bioinformatics and computational learning. This book will be of special value to researchers from the areas of combinatorial search and fault-tolerant computation, but also to researchers in learning and coding theory, databases, and artificial intelligence. Only basic training in discrete mathematics is assumed. Parts of the book can be used as the basis for specialized graduate courses on combinatorial search, or as supporting material for a graduate or undergraduate course on error-correcting codes.




Cloud Reliability Engineering


Book Description

Coud reliability engineering is a leading issue of cloud services. Cloud service providers guarantee computation, storage and applications through service-level agreements (SLAs) for promised levels of performance and uptime. Cloud Reliability Engineering: Technologies and Tools presents case studies examining cloud services, their challenges, and the reliability mechanisms used by cloud service providers. These case studies provide readers with techniques to harness cloud reliability and availability requirements in their own endeavors. Both conceptual and applied, the book explains reliability theory and the best practices used by cloud service companies to provide high availability. It also examines load balancing, and cloud security. Written by researchers and practitioners, the book’s chapters are a comprehensive study of cloud reliability and availability issues and solutions. Various reliability class distributions and their effects on cloud reliability are discussed. An important aspect of reliability block diagrams is used to categorize poor reliability of cloud infrastructures, where enhancement can be made to lower the failure rate of the system. This technique can be used in design and functional stages to determine poor reliability of a system and provide target improvements. Load balancing for reliability is examined as a migrating process or performed by using virtual machines. The approach employed to identify the lightly loaded destination node to which the processes/virtual machines migrate can be optimized by employing a genetic algorithm. To analyze security risk and reliability, a novel technique for minimizing the number of keys and the security system is presented. The book also provides an overview of testing methods for the cloud, and a case study discusses testing reliability, installability, and security. A comprehensive volume, Cloud Reliability Engineering: Technologies and Tools combines research, theory, and best practices used to engineer reliable cloud availability and performance.




Fault-Tolerant Message-Passing Distributed Systems


Book Description

This book presents the most important fault-tolerant distributed programming abstractions and their associated distributed algorithms, in particular in terms of reliable communication and agreement, which lie at the heart of nearly all distributed applications. These programming abstractions, distributed objects or services, allow software designers and programmers to cope with asynchrony and the most important types of failures such as process crashes, message losses, and malicious behaviors of computing entities, widely known under the term "Byzantine fault-tolerance". The author introduces these notions in an incremental manner, starting from a clear specification, followed by algorithms which are first described intuitively and then proved correct. The book also presents impossibility results in classic distributed computing models, along with strategies, mainly failure detectors and randomization, that allow us to enrich these models. In this sense, the book constitutes an introduction to the science of distributed computing, with applications in all domains of distributed systems, such as cloud computing and blockchains. Each chapter comes with exercises and bibliographic notes to help the reader approach, understand, and master the fascinating field of fault-tolerant distributed computing.




Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing


Book Description

Distributed systems intertwine with our everyday lives. The benefits and current shortcomings of the underpinning technologies are experienced by a wide range of people and their smart devices. With the rise of large-scale IoT and similar distributed systems, cloud bursting technologies, and partial outsourcing solutions, private entities are encouraged to increase their efficiency and offer unparalleled availability and reliability to their users. The Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing is a vital reference source that provides valuable insight into current and emergent research occurring within the field of distributed computing. It also presents architectures and service frameworks to achieve highly integrated distributed systems and solutions to integration and efficient management challenges faced by current and future distributed systems. Highlighting a range of topics such as data sharing, wireless sensor networks, and scalability, this multi-volume book is ideally designed for system administrators, integrators, designers, developers, researchers, academicians, and students.




Fault-Tolerant Systems


Book Description

Fault-Tolerant Systems is the first book on fault tolerance design with a systems approach to both hardware and software. No other text on the market takes this approach, nor offers the comprehensive and up-to-date treatment that Koren and Krishna provide. This book incorporates case studies that highlight six different computer systems with fault-tolerance techniques implemented in their design. A complete ancillary package is available to lecturers, including online solutions manual for instructors and PowerPoint slides. Students, designers, and architects of high performance processors will value this comprehensive overview of the field. - The first book on fault tolerance design with a systems approach - Comprehensive coverage of both hardware and software fault tolerance, as well as information and time redundancy - Incorporated case studies highlight six different computer systems with fault-tolerance techniques implemented in their design - Available to lecturers is a complete ancillary package including online solutions manual for instructors and PowerPoint slides




Swarm Intelligence Optimization


Book Description

Resource optimization has always been a thrust area of research, and as the Internet of Things (IoT) is the most talked about topic of the current era of technology, it has become the need of the hour. Therefore, the idea behind this book was to simplify the journey of those who aspire to understand resource optimization in the IoT. To this end, included in this book are various real-time/offline applications and algorithms/case studies in the fields of engineering, computer science, information security, and cloud computing, along with the modern tools and various technologies used in systems, leaving the reader with a high level of understanding of various techniques and algorithms used in resource optimization.




Bio-Inspired Fault-Tolerant Algorithms for Network-on-Chip


Book Description

Network on Chip (NoC) addresses the communication requirement of different nodes on System on Chip. The bio-inspired algorithms improve the bandwidth utilization, maximize the throughput and reduce the end-to-end latency and inter-flit arrival time. This book exclusively presents in-depth information regarding bio-inspired algorithms solving real world problems focussing on fault-tolerant algorithms inspired by the biological brain and implemented on NoC. It further documents the bio-inspired algorithms in general and more specifically, in the design of NoC. It gives an exhaustive review and analysis of the NoC architectures developed during the last decade according to various parameters. Key Features: Covers bio-inspired solutions pertaining to Network-on-Chip (NoC) design solving real world examples Includes bio-inspired NoC fault-tolerant algorithms with detail coding examples Lists fault-tolerant algorithms with detailed examples Reviews basic concepts of NoC Discusses NoC architectures developed-to-date




Fault Tolerant Attitude Estimation for Small Satellites


Book Description

Small satellites use commercial off-the-shelf sensors and actuators for attitude determination and control (ADC) to reduce the cost. These sensors and actuators are usually not as robust as the available, more expensive, space-proven equipment. As a result, the ADC system of small satellites is more vulnerable to any fault compared to a system for larger competitors. This book aims to present useful solutions for fault tolerance in ADC systems of small satellites. The contents of the book can be divided into two categories: fault tolerant attitude filtering algorithms for small satellites and sensor calibration methods to compensate the sensor errors. MATLAB® will be used to demonstrate simulations. Presents fault tolerant attitude estimation algorithms for small satellites with an emphasis on algorithms’ practicability and applicability Incorporates fundamental knowledge about the attitude determination methods at large Discusses comprehensive information about attitude sensors for small satellites Reviews calibration algorithms for small satellite magnetometers with simulated examples Supports theory with MATLAB simulation results which can be easily understood by individuals without a comprehensive background in this field Covers up-to-date discussions for small satellite attitude systems design Dr. Chingiz Hajiyev is a professor at the Faculty of Aeronautics and Astronautics, Istanbul Technical University (Istanbul, Turkey). Dr. Halil Ersin Soken is an assistant professor at the Aerospace Engineering Department, Middle East Technical University (Ankara, Turkey).




Introduction To Quantum Computation And Information


Book Description

This book aims to provide a pedagogical introduction to the subjects of quantum information and quantum computation. Topics include non-locality of quantum mechanics, quantum computation, quantum cryptography, quantum error correction, fault-tolerant quantum computation as well as some experimental aspects of quantum computation and quantum cryptography. Only knowledge of basic quantum mechanics is assumed. Whenever more advanced concepts and techniques are used, they are introduced carefully. This book is meant to be a self-contained overview. While basic concepts are discussed in detail, unnecessary technical details are excluded. It is well-suited for a wide audience ranging from physics graduate students to advanced researchers.This book is based on a lecture series held at Hewlett-Packard Labs, Basic Research Institute in the Mathematical Sciences (BRIMS), Bristol from November 1996 to April 1997, and also includes other contributions.




Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems


Book Description

Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and demonstrated on industrial case systems. Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems will be of interest to process and control engineers, engineering students and researchers with a control engineering background.