Book Description
This dissertation focuses on architecting survivable network designs and on developing new bandwidth-allocation mechanisms, integrated into the network's daily operations, that help it better survive major failures. The topic is important because telecommunication networks are exposed to many threats: malicious attacks, equipment failures, human errors (e.g., misconfigurations), and large-scale disasters, both human-made (e.g., weapons of mass destruction (WMD) attacks) and natural. In addition, the emergence of bandwidth-hungry applications has led to rapid growth in the volume of data traffic, so failures (especially large-scale disasters) cause huge data loss and service disruption. To alleviate the detrimental effects of failures on network services, two aspects need close attention: i) survivable network design and ii) survivable network operation. To improve network performance under failure in a cost-effective manner, this dissertation addresses both aspects: it architects survivable network designs and provides intelligent ways of allocating network resources during network operation.
Today’s networks support diverse services, from cloud-based services (e.g., video streaming) to traditional ones (e.g., VoIP), with different requirements (e.g., delay/latency tolerance and bandwidth) and characteristics (e.g., origination, importance, and revenue generation). With such heterogeneity, applying the same traffic-engineering policies (routing, protection, restoration strategies, etc.) to all services can yield suboptimal solutions. In the first part of the dissertation (Chapters 2 and 3), we focus on cost-effective, survivable service-provisioning schemes that exploit service heterogeneity, especially degraded-service tolerance: a service’s ability to operate with reduced bandwidth.
We propose two novel disaster-aware service-provisioning schemes. The first combines degraded-service tolerance with the fact that cloud-based services do not require one-to-one connectivity (unicast): using manycasting, it multiplexes cloud-based services over multiple paths destined to multiple servers/datacenters. The second reallocates scarce network resources after a failure based on the degraded-service tolerance of connections. Both schemes maintain some bandwidth (i.e., degraded service) after a failure rather than no service at all.
In the second part of the dissertation (Chapters 4 and 5), we shift our focus to improving Software-Defined Network survivability by designing the control plane to be resilient enough to survive network failures whenever possible and to be quickly recoverable otherwise. To this end, we propose designing the control plane as a virtual network and mapping the controllers over the physical network such that connectivity among the controllers (controller-to-controller) and between the switches and the controllers (switch-to-controller) is not compromised by physical infrastructure failures caused by disasters. We also propose a novel scheme for switch-to-controller assignment and communication-path routing that prepares for restoration after failures, minimizing the recovery time of disrupted switch-to-controller paths so that the disrupted data traffic can be recovered quickly.
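As a rough illustration of the degraded-service idea described above (reallocating scarce bandwidth after a failure so that connections keep reduced service instead of losing it entirely), the following minimal Python sketch may help. It is not the dissertation's algorithm: the `Connection` fields, the `priority` ordering, and the two-pass greedy rule in `reallocate` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    requested: float  # full bandwidth the service asks for (e.g., Gbps)
    minimum: float    # lowest bandwidth the service tolerates (degraded service)
    priority: int     # hypothetical importance rank; lower means more important


def reallocate(connections, capacity):
    """Share the surviving capacity among connections after a failure.

    Assumed greedy rule (illustrative only):
      pass 1 admits connections at their minimum bandwidth, by priority;
      pass 2 tops up admitted connections toward their full request.
    """
    alloc = {c.name: 0.0 for c in connections}
    ordered = sorted(connections, key=lambda c: c.priority)
    remaining = capacity
    admitted = []

    # Pass 1: keep as many connections alive as possible at degraded service.
    for c in ordered:
        if c.minimum <= remaining:
            alloc[c.name] = c.minimum
            remaining -= c.minimum
            admitted.append(c)

    # Pass 2: hand leftover capacity to admitted connections, by priority.
    for c in admitted:
        extra = min(c.requested - alloc[c.name], remaining)
        alloc[c.name] += extra
        remaining -= extra

    return alloc
```

For example, with 10 units of surviving capacity and three connections that together request 20 units, every connection keeps at least its minimum, and the remainder goes to the more important ones first. A connection whose minimum cannot be met is dropped (allocated zero) under this assumed rule.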