Building Dependable Distributed Systems


Book Description

A one-volume guide to the most essential techniques for designing and building dependable distributed systems Instead of covering a broad range of research works for each dependability strategy, this useful reference focuses on only a selected few (usually the most seminal works, the most practical approaches, or the first publication of each approach), explaining each in depth, usually with a comprehensive set of examples. Each technique is dissected thoroughly enough so that readers who are not familiar with dependable distributed computing can actually grasp the technique after studying the book. Building Dependable Distributed Systems consists of eight chapters. The first introduces the basic concepts and terminology of dependable distributed computing, and also provides an overview of the primary means of achieving dependability. Checkpointing and logging mechanisms, which are the most commonly used means of achieving limited degree of fault tolerance, are described in the second chapter. Works on recovery-oriented computing, focusing on the practical techniques that reduce the fault detection and recovery times for Internet-based applications, are covered in chapter three. Chapter four outlines the replication techniques for data and service fault tolerance. This chapter also pays particular attention to optimistic replication and the CAP theorem. Chapter five explains a few seminal works on group communication systems. Chapter six introduces the distributed consensus problem and covers a number of Paxos family algorithms in depth. The Byzantine generals problem and its latest solutions, including the seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a number of its derivatives, are introduced in chapter seven. The final chapter details the latest research results surrounding application-aware Byzantine fault tolerance, which represents an important step forward in the practical use of Byzantine fault tolerance techniques.




Fundamentals of Dependable Computing for Software Engineers


Book Description

Fundamentals of Dependable Computing for Software Engineers presents the essential elements of computer system dependability. The book describes a comprehensive dependability-engineering process and explains the roles of software and software engineers in computer system dependability. Readers will learn: Why dependability matters What it means for a system to be dependable How to build a dependable software system How to assess whether a software system is adequately dependable The author focuses on the actions needed to reduce the rate of failure to an acceptable level, covering material essential for engineers developing systems with extreme consequences of failure, such as safety-critical systems, security-critical systems, and critical infrastructure systems. The text explores the systems engineering aspects of dependability and provides a framework for engineers to reason and make decisions about software and its dependability. It also offers a comprehensive approach to achieve software dependability and includes a bibliography of the most relevant literature. Emphasizing the software engineering elements of dependability, this book helps software and computer engineers in fields requiring ultra-high levels of dependability, such as avionics, medical devices, automotive electronics, weapon systems, and advanced information systems, construct software systems that are dependable and within budget and time constraints.




Reliability Engineering and Services


Book Description

Offers a holistic approach to guiding product design, manufacturing, and after-sales support as the manufacturing industry transitions from a product-oriented model to service-oriented paradigm This book provides fundamental knowledge and best industry practices in reliability modelling, maintenance optimization, and service parts logistics planning. It aims to develop an integrated product-service system (IPSS) synthesizing design for reliability, performance-based maintenance, and spare parts inventory. It also presents a lifecycle reliability-inventory optimization framework where reliability, redundancy, maintenance, and service parts are jointly coordinated. Additionally, the book aims to report the latest advances in reliability growth planning, maintenance contracting and spares inventory logistics under non-stationary demand condition. Reliability Engineering and Service provides in-depth chapter coverage of topics such as: Reliability Concepts and Models; Mean and Variance of Reliability Estimates; Design for Reliability; Reliability Growth Planning; Accelerated Life Testing and Its Economics; Renewal Theory and Superimposed Renewals; Maintenance and Performance-Based Logistics; Warranty Service Models; Basic Spare Parts Inventory Models; Repairable Inventory Systems; Integrated Product-Service Systems (IPPS), and Resilience Modeling and Planning Guides engineers to design reliable products at a low cost Assists service engineers in providing superior after-sales support Enables managers to respond to the changing market and customer needs Uses end-of-chapter case studies to illustrate industry best practice Lifecycle approach to reliability, maintenance and spares provisioning Reliability Engineering and Service is an important book for graduate engineering students, researchers, and industry-based reliability practitioners and consultants.




Delta-4: A Generic Architecture for Dependable Distributed Computing


Book Description

Delta-4 is a 5-nation, 13-partner project that has been investigating the achievement of dependability in open distributed systems, including real-time systems. This book describes the design and validation of the distributed fault-tolerant architecture developed within this project. The key features of the Delta-4 architecture are: (a) a distributed object-oriented application support environment; (b) built-in support for user-transparent fault tolerance; (c) use of multicast or group communication protocols; and (d) use of standard off the-shelf processors and standard local area network technology with minimum specialized hardware. The book is organized as follows: The first 3 chapters give an overview of the architecture's objectives and of the architecture itself, and compare the proposed solutions with other approaches. Chapters 4 to 12 give a more detailed insight into the Delta-4 architectural concepts. Chapters 4 and 5 are devoted to providing a firm set of general concepts and terminology regarding dependable and real-time computing. Chapter 6 is centred on fault-tolerance techniques based on distribution. The description of the architecture itself commences with a description of the Delta-4 application support environment (Deltase) in chapter 7. Two variants of the architecture - the Delta-4 Open System Architecture (OSA) and the Delta-4 Extra Performance Architecture (XPA) - are described respectively in chapters 8 and 9. Both variants of the architecture have a common underlying basis for dependable multicasting, i. e.




Dependable Computing - EDCC-1


Book Description

This book presents the proceedings of the First European Dependable Computing Conference (EDCC-1), held in Berlin, Germany, in October 1994. EDCC is the merger of two former European events on dependable computing. The volume comprises 34 refereed full papers selected from 106 submissions. The contributions address all current aspects of dependable computing and reflect the state of the art in dependable systems research and advanced applications; among the topics covered are hardware and software reliability, safety-critical and secure systems, fault-tolerance and detection, verification and validation, formal methods, hardware and software testing, and parallel and distributed systems.





Book Description




Markov Chains and Dependability Theory


Book Description

Covers fundamental and applied results of Markov chain analysis for the evaluation of dependability metrics, for graduate students and researchers.




Reliability Engineering


Book Description

This book gives a practical guide for designers and users in Information and Communication Technology context. In particular, in the first Section, the definition of the fundamental terms according to the international standards are given. Then, some theoretical concepts and reliability models are presented in Chapters 2 and 3: the aim is to evaluate performance for components and systems and reliability growth. Chapter 4, by introducing the laboratory tests, puts in evidence the reliability concept from the experimental point of view. In ICT context, the failure rate for a given system can be evaluate by means of specific reliability prediction handbooks; this aspect is considered in Chapter 5, with practical applications. In Chapters 6, 7 and 8, the more complex aspects regarding both the Maintainability, Availability and Dependability are taken into account; in particular, some fundamental techniques such as FMECA (Failure Mode, Effects, and Criticality Analysis) and FTA (Fault Tree Analysis) are presented with examples for reparable systems.




Dependability Metrics


Book Description

This tutorial book gives an overview of the current state of the art in measuring the different aspects of dependability of systems: reliability, security and performance.