Elastic Resource Management in Cloud Computing Platforms


Book Description

Large-scale enterprise applications experience dynamic workloads, and provisioning the correct capacity for these applications remains an important and challenging problem. Predicting highly variable workload fluctuations, or the peak workload, is difficult; erroneous predictions often lead to under-utilized systems or, in some situations, cause temporary outages of an otherwise well-provisioned website. Consequently, rather than provisioning server capacity to handle infrequent peak workloads, an alternative approach of dynamically provisioning capacity on the fly in response to workload fluctuations has become popular. Cloud platforms are particularly suited for such applications because they can provision capacity when needed and charge for usage on a pay-per-use basis. Cloud environments enable elastic provisioning by providing a variety of hardware configurations as well as mechanisms to add or remove server capacity. The first part of this thesis presents Kingfisher, a cost-aware system that provides a generalized provisioning framework for supporting elasticity in the cloud by (i) leveraging multiple mechanisms to reduce the time to transition to new configurations, and (ii) optimizing the selection of a virtual server configuration that minimizes cost. The majority of these enterprise applications, deployed as web applications, are distributed or replicated with a multi-tier architecture. SLAs for such applications are often expressed as a high percentile of a performance metric, e.g., the 99th percentile of end-to-end response time must be below 1 second. In the second part of this thesis I present a model-driven technique, targeted at cloud platforms, that provisions a multi-tier application for such an SLA. Enterprises critically depend on these applications and often own large IT infrastructures to support their regular operation. However, provisioning for peak load or for a high percentile of response time can be prohibitively expensive. Thus there is a need for a hybrid cloud model, in which the enterprise uses its own private resources for the majority of its computing but "bursts" into the cloud when local resources are insufficient. I discuss a new system, Seagull, which performs dynamic provisioning over a hybrid cloud model by enabling cloud bursting. Finally, I describe a methodology to model the configuration patterns (i.e., deployment topologies) of the control plane services of a cloud management system itself. I present a generic methodology, based on empirical profiling, which provides the initial deployment configuration of a control plane service, together with a mechanism that iteratively adjusts the configuration to avoid violating the control plane's Service Level Objective (SLO).
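To make the idea of cost-aware configuration selection concrete, the sketch below enumerates combinations of server types and picks the cheapest mix whose aggregate capacity covers a target workload. This is an illustration only, not Kingfisher's actual algorithm; the server type names, capacities, and prices are invented for the example.

```python
# Illustrative sketch of cost-aware configuration selection (hypothetical catalog,
# not Kingfisher's actual optimization).
from itertools import product

# Hypothetical catalog: requests/sec a server type can sustain and its hourly price ($).
SERVER_TYPES = {
    "small":  {"capacity": 100, "price": 0.05},
    "medium": {"capacity": 220, "price": 0.10},
    "large":  {"capacity": 480, "price": 0.20},
}

def cheapest_configuration(target_rps, max_per_type=10):
    """Enumerate counts of each server type and return the lowest-cost mix
    whose aggregate capacity covers the target workload."""
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(SERVER_TYPES)):
        capacity = sum(n * t["capacity"] for n, t in zip(counts, SERVER_TYPES.values()))
        if capacity < target_rps:
            continue  # this mix cannot sustain the workload
        cost = sum(n * t["price"] for n, t in zip(counts, SERVER_TYPES.values()))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(SERVER_TYPES, counts)))
    return best

print(cheapest_configuration(target_rps=750))
```

Kingfisher's actual optimization also accounts for the transition mechanisms and the time and cost of moving to a new configuration, which a toy enumeration like this ignores.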




Elastic Resource Management in Distributed Clouds


Book Description

The ubiquitous nature of computing devices and their increasing reliance on remote resources have driven and shaped public cloud platforms into unprecedentedly large-scale, distributed data centers. Concurrently, a plethora of cloud-based applications experience multi-dimensional workload dynamics: workload volumes that vary along both the time and space axes, and with higher frequency. The interplay of diverse workload characteristics and distributed clouds raises several key challenges for efficiently and dynamically managing server resources. First, current cloud platforms impose certain restrictions that can hinder some resource management tasks. Second, an application-agnostic approach might not capture appropriate performance goals and therefore requires numerous application-specific methods. Third, provisioning resources outside the LAN boundary can incur large delays that undermine the desired agility. In this dissertation, I investigate the above challenges and present the design of automated systems that manage resources for various applications in distributed clouds. The intermediate goal of these automated systems is to fully exploit potential benefits, such as reduced network latency, offered by increasingly distributed server resources. The ultimate goal is to improve end-to-end user response time with novel resource management approaches, within a certain cost budget. Centered around these two goals, I first investigate how to optimize the location and performance of virtual machines in distributed clouds. I use virtual desktops, which mostly serve a single user, as an example use case for developing a black-box approach that ranks virtual machines based on their dynamic latency requirements. Those with high latency sensitivity have a higher priority of being placed or migrated to the cloud location closest to their users. Next, I relax the assumption of well-provisioned virtual machines and look at how to provision enough resources for applications that exhibit both temporal and spatial workload fluctuations. I propose an application-agnostic queueing model that captures resource utilization and server response time. Building upon this model, I present a geo-elastic provisioning approach, referred to as geo-elasticity, for replicable multi-tier applications that can spin up an appropriate amount of server resources in any cloud location. Last, I explore the benefits of providing geo-elasticity for database clouds, a popular platform for hosting application backends. Performing geo-elastic provisioning for backend database servers entails several challenges that are specific to database workloads and therefore require tailored solutions. In addition, cloud platforms offer resources at various prices in different locations. Towards this end, I propose a cost-aware geo-elasticity approach that combines a regression-based workload model and a queueing-network capacity model for database clouds. In summary, hosting a diverse set of applications in an increasingly distributed cloud makes it interesting and necessary to develop new, efficient, and dynamic resource management approaches.
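To give a flavor of queueing-based provisioning, here is a minimal sketch that sizes the number of replicas of a tier so that each replica's mean response time stays under a target. It uses a textbook M/M/1 approximation rather than the dissertation's actual model, and all numbers in the example are hypothetical.

```python
# Minimal sketch (not the dissertation's actual model): sizing replicas with an
# M/M/1 approximation, assuming the arrival rate is split evenly across replicas.
import math

def replicas_needed(arrival_rate, service_rate, target_response):
    """Smallest replica count n such that each replica's M/M/1 mean response time,
    1 / (mu - lambda / n), stays at or below the target (in seconds)."""
    if 1.0 / service_rate > target_response:
        raise ValueError("Target is below the bare service time and cannot be met.")
    n = max(1, math.ceil(arrival_rate / service_rate))
    while True:
        per_replica_rate = arrival_rate / n
        if per_replica_rate < service_rate and \
                1.0 / (service_rate - per_replica_rate) <= target_response:
            return n
        n += 1

# Example: 900 req/s arriving, each replica serves 100 req/s, target mean response 50 ms.
print(replicas_needed(arrival_rate=900, service_rate=100, target_response=0.05))
```

The dissertation's approach additionally accounts for geographic placement and cost across cloud locations, which a single-tier, mean-value sketch like this does not capture.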




Adaptive Resource Management and Scheduling for Cloud Computing


Book Description

This book constitutes the thoroughly refereed post-conference proceedings of the Second International Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, ARMS-CC 2015, held in conjunction with the ACM Symposium on Principles of Distributed Computing, PODC 2015, in Donostia-San Sebastián, Spain, in July 2015. The 12 revised full papers, including 1 invited paper, were carefully reviewed and selected from 24 submissions. The papers address several important aspects of the problem area covered by ARMS-CC: self-* and autonomous cloud systems; cloud quality management and service level agreements (SLAs); scalable computing; mobile cloud computing; cloud computing techniques for big data; high-performance cloud computing; resource management in big data platforms; scheduling algorithms for big data processing; cloud composition, federation, bridging, and bursting; cloud resource virtualization and composition; load balancing and co-allocation; and fault tolerance, reliability, and availability of cloud systems.




Resource Management in Utility and Cloud Computing


Book Description

This SpringerBrief reviews the existing market-oriented strategies for economically managing resource allocation in distributed systems. It describes three new schemes that address cost-efficiency, user incentives, and allocation fairness with regard to different scheduling contexts. The first scheme, taking the Amazon EC2™ market as a case study, investigates optimal resource rental planning models based on linear integer programming and stochastic optimization techniques. These models are useful for exploring the interaction between the cloud infrastructure provider and the cloud resource customers. The second scheme targets a free-trade resource market, studying the interactions amongst multiple rational resource traders. Leveraging an optimization framework from AI, this scheme examines the spontaneous exchange of resources among multiple resource owners. Finally, the third scheme describes an experimental market-oriented resource sharing platform inspired by eBay's transaction model. The study presented in this book sheds light on economic models and their implications for utility-oriented scheduling problems.
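As an illustration of the rental-planning idea, the following sketch brute-forces the number of reserved instances that minimizes cost for a given demand forecast, covering any excess demand with on-demand instances. It is a deliberately simplified stand-in for the book's linear integer programming and stochastic formulations, and the prices and demand profile are hypothetical.

```python
# Illustrative sketch only (not the book's actual formulation): pick how many reserved
# instances to rent for a demand forecast, serving the remainder on demand.

def best_reservation(demand, reserved_hourly=0.03, on_demand_hourly=0.10):
    """Brute-force the reserved-instance count r that minimizes total cost:
    r reserved instances are paid for every hour; demand above r is served on demand."""
    best_r, best_cost = 0, float("inf")
    for r in range(max(demand) + 1):
        cost = r * reserved_hourly * len(demand)
        cost += sum(max(d - r, 0) for d in demand) * on_demand_hourly
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r, round(best_cost, 2)

# Hypothetical 24-hour demand profile (instances needed per hour).
demand = [2, 2, 2, 3, 3, 4, 6, 9, 12, 12, 11, 10, 10, 9, 9, 8, 8, 9, 11, 12, 10, 7, 4, 3]
print(best_reservation(demand))
```

The book's models additionally handle demand uncertainty via stochastic optimization, which this deterministic toy ignores.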







Transiency-driven Resource Management for Cloud Computing Platforms


Book Description

Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built on the implicit assumption that the underlying servers will be stable and generally available, barring occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered by intermittent renewable sources, and cloud platforms that offer lower-cost transient servers which can be unilaterally revoked by the cloud operator. Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies tailored to application characteristics and requirements can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems that allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90%. This thesis makes contributions to four areas of computer systems research: transiency-specific fault tolerance, resource allocation, abstractions, and resource reclamation. To reduce the impact of transient server revocations, I develop two fault-tolerance techniques that are tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application-level periodic checkpointing to reduce the performance impact of server revocations. To manage and reduce the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements. Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation by looking into alternative resource reclamation techniques. I develop resource deflation, wherein a server's resources are fractionally reclaimed, allowing the application to continue execution, albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation.
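To illustrate the flavor of periodic checkpointing on revocable servers, here is a minimal sketch that picks a checkpoint interval using Young's classic approximation. The thesis's actual checkpointing policy may differ, and the checkpoint cost and revocation rate below are hypothetical.

```python
# Illustrative sketch: choosing a checkpoint interval for transient servers using
# Young's approximation, interval ≈ sqrt(2 * checkpoint_cost * mean_time_to_revocation).
# Numbers below are hypothetical; the thesis's actual policy may differ.
import math

def checkpoint_interval(checkpoint_cost_s, mean_time_to_revocation_s):
    """Young's approximation for the interval that roughly balances the time lost
    to re-execution after a revocation against the overhead of checkpointing."""
    return math.sqrt(2 * checkpoint_cost_s * mean_time_to_revocation_s)

# Example: checkpoints take 30 s and transient servers are revoked every ~6 hours on average.
interval = checkpoint_interval(checkpoint_cost_s=30, mean_time_to_revocation_s=6 * 3600)
print(f"checkpoint roughly every {interval / 60:.1f} minutes")
```

The formula only illustrates the underlying trade-off between lost work and checkpoint overhead; the thesis pairs such checkpointing with revocation-aware resource management policies.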







Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing


Book Description

Distributed systems are intertwined with our everyday lives. The benefits and current shortcomings of the underpinning technologies are experienced by a wide range of people and their smart devices. With the rise of large-scale IoT and similar distributed systems, cloud bursting technologies, and partial outsourcing solutions, private entities are encouraged to increase their efficiency and offer unparalleled availability and reliability to their users. The Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing is a vital reference source that provides valuable insight into current and emergent research occurring within the field of distributed computing. It also presents architectures and service frameworks to achieve highly integrated distributed systems, and solutions to the integration and efficient management challenges faced by current and future distributed systems. Highlighting a range of topics such as data sharing, wireless sensor networks, and scalability, this multi-volume book is ideally designed for system administrators, integrators, designers, developers, researchers, academicians, and students.




Cloud Computing Patterns


Book Description

The current work provides CIOs, software architects, project managers, developers, and cloud strategy initiatives with a set of architectural patterns that offer nuggets of advice on how to achieve common cloud computing-related goals. The cloud computing patterns capture knowledge and experience in an abstract format that is independent of concrete vendor products. Readers are provided with a toolbox to structure cloud computing strategies and design cloud application architectures. Using this book, readers can implement cloud-native applications and select the best-suited cloud vendors and tooling for their individual usage scenarios. The cloud computing patterns offer a unique blend of academic knowledge and practical experience due to the mix of authors. Academic knowledge is brought in by Christoph Fehling and Professor Dr. Frank Leymann, who work on cloud research at the University of Stuttgart. Practical experience in building cloud applications, selecting cloud vendors, and designing enterprise architecture as a cloud customer is brought in by Dr. Ralph Retter, who works as an IT architect at T-Systems; Walter Schupeck, who works as a Technology Manager in the field of Enterprise Architecture at Daimler AG; and Peter Arbitter, the former head of T-Systems' cloud architecture and IT portfolio team, now working for Microsoft.

Voices on Cloud Computing Patterns

Cloud computing is especially beneficial for large companies such as Daimler AG. A prerequisite is a thorough analysis of its impact on the existing applications and the IT architectures. During our collaborative research with the University of Stuttgart, we identified a vendor-neutral and structured approach to describe properties of cloud offerings and requirements on cloud environments. The resulting Cloud Computing Patterns have profoundly impacted our corporate IT strategy regarding the adoption of cloud computing. They help our architects, project managers, and developers in the refinement of architectural guidelines and communicate requirements to our integration partners and software suppliers. Dr. Michael Gorriz – CIO, Daimler AG

Ever since 2005, T-Systems has provided a flexible and reliable cloud platform with its "Dynamic Services". Today these cloud services cover a huge variety of corporate applications, especially enterprise resource planning, business intelligence, video, voice communication, collaboration, messaging, and mobility services. The book was written by senior cloud pioneers sharing their technology foresight, combining essential information and practical experiences. This valuable compilation helps both practitioners and clients to really understand which new types of services are readily available, how they really work, and, importantly, how to benefit from the cloud. Dr. Marcus Hacke – Senior Vice President, T-Systems International GmbH

This book provides a conceptual framework and very timely guidance for people and organizations building applications for the cloud. Patterns are a proven approach to building robust and sustainable applications and systems. The authors adapt and extend them to cloud computing, drawing on their own experience and deep contributions to the field. Each pattern includes an extensive discussion of the state of the art, with implementation considerations and practical examples that the reader can apply to their own projects. By capturing our collective knowledge about building good cloud applications and by providing a format to integrate new insights, this book provides an important tool not just for individual practitioners and teams, but for the cloud computing community at large. Kristof Kloeckner – General Manager, Rational Software, IBM Software Group




Architecting Cloud Computing Solutions


Book Description

Accelerating Business and Mission Success with Cloud Computing.

Key Features
A step-by-step guide that walks you through implementing cloud computing services effectively and efficiently.
Learn to choose the most suitable cloud service model and adopt appropriate cloud design considerations for your organization.
Leverage cloud computing methodologies to successfully develop a cost-effective cloud environment.

Cloud adoption is a core component of digital transformation. Scaling the IT environment, making it resilient, and reducing costs are what organizations want. Architecting Cloud Computing Solutions presents and explains the critical cloud solution design considerations and technology decisions required to choose and deploy the right cloud service and deployment models, based on your business and technology service requirements. The book starts with the fundamentals of cloud computing and its architectural concepts. It then walks you through cloud service models (IaaS, PaaS, and SaaS), deployment models (public, private, community, and hybrid), and implementation options (Enterprise, MSP, and CSP) to explain and describe the key considerations and challenges organizations face during cloud migration. Later, the book delves into how to leverage DevOps, cloud-native, and serverless architectures in your cloud environment, and presents industry best practices for scaling your cloud environment. Finally, it addresses in depth how to manage essential cloud technology service components such as data storage, security controls, and disaster recovery. By the end of this book, you will have mastered all the design considerations and operational trade-offs required to adopt cloud services, no matter which cloud service provider you choose.

What you will learn
Manage changes in the digital transformation and cloud transition process
Design and build architectures that support specific business cases
Design, modify, and aggregate baseline cloud architectures
Familiarize yourself with cloud application security and cloud computing security threats
Design and architect small, medium, and large cloud computing solutions

Who this book is for
If you are an IT administrator, cloud architect, or solution architect keen to benefit from cloud adoption for your organization, then this book is for you. Small business owners, managers, or consultants will also find this book useful. No prior knowledge of cloud computing is needed.