Improving the Rate-Distortion Performance in Distributed Video Coding


Book Description

Distributed video coding (DVC) is a coding paradigm that allows encoding of video frames at a complexity substantially lower than that of conventional video coding schemes. This feature makes it suitable for emerging applications such as wireless surveillance video and mobile camera phones. In distributed video coding, a subset of the frames in the video sequence, known as the key frames, is encoded using a conventional intra-frame encoder, such as H.264/AVC in the intra mode, and then transmitted to the decoder. The remaining frames, known as the Wyner-Ziv frames, are encoded based on the Wyner-Ziv principle using channel codes, such as LDPC codes. In transform-domain distributed video coding, each Wyner-Ziv frame undergoes a 4x4 block DCT and the resulting DCT coefficients are grouped into DCT bands. The bitplanes corresponding to each DCT band are encoded by a channel encoder, for example an LDPCA encoder, one after another. The resulting error-correcting bits are retained in a buffer at the encoder and transmitted incrementally as needed by the decoder.
At the decoder, the key frames are decoded first. The decoded key frames are then used to generate a side information frame as an initial estimate of the corresponding Wyner-Ziv frame, usually by employing an interpolation method. The difference between a DCT band in the side information frame and the corresponding band in the Wyner-Ziv frame, referred to as the correlation noise, is often modeled by a Laplacian distribution. Soft-input information for each bit in a bitplane is obtained using this correlation noise model and the corresponding DCT band of the side information frame. The channel decoder then uses this soft-input information, along with the error-correcting bits sent by the encoder, to decode the bitplanes of each DCT band in each Wyner-Ziv frame. Hence, accurate estimation of the correlation noise model parameter(s) and generation of high-quality side information are required to obtain reliable soft-input information for the bitplanes at the decoder, which in turn leads to more efficient decoding. Consequently, fewer error-correcting bits need to be transmitted from the encoder for the decoder to decode the bitplanes, leading to better compression efficiency and rate-distortion performance.
The correlation noise, however, is not stationary, and its statistics vary within each Wyner-Ziv frame and within its DCT bands. Hence, it is difficult to find an accurate model for the correlation noise and to estimate its parameters precisely at the decoder. Moreover, in existing schemes the parameters of the correlation noise for each DCT band are estimated before the decoder starts to decode the bitplanes of that band, and they are kept unchanged during the decoding of those bitplanes. Another problem of concern is that, since the side information frame is generated at the decoder by temporal interpolation between previously decoded frames, its quality is generally poor when the motion between frames is non-linear. Hence, generating high-quality side information is a challenging problem. This thesis is concerned with accurate estimation of the correlation noise model parameters and with improving the quality of the side information, from the standpoint of improving the rate-distortion performance in distributed video coding.
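To make the soft-input computation concrete, the following minimal sketch (not taken from the thesis; the function and parameter names, such as bitplane_llr and alpha, are illustrative) shows how a log-likelihood ratio for one bitplane bit could be derived from a side-information coefficient under a Laplacian correlation noise model, by integrating the Laplacian density over the quantization bins associated with each bit value.

```python
import numpy as np

def laplacian_cdf(t, alpha):
    """CDF of a zero-mean Laplacian with scale parameter alpha."""
    return np.where(t < 0, 0.5 * np.exp(alpha * t), 1.0 - 0.5 * np.exp(-alpha * t))

def bitplane_llr(y, bin_edges, bit_of_bin, alpha):
    """Log-likelihood ratio log P(bit=0 | y) / P(bit=1 | y) for one coefficient.

    y          : side-information DCT coefficient (decoder's estimate)
    bin_edges  : quantization bin boundaries, shape (K+1,)
    bit_of_bin : bit value (0 or 1) of the current bitplane for each of the K bins
    alpha      : Laplacian scale of the correlation noise x - y
    """
    # Probability that the true coefficient x falls in each bin, given y:
    # integrate the Laplacian density of (x - y) over the bin.
    p_bins = laplacian_cdf(bin_edges[1:] - y, alpha) - laplacian_cdf(bin_edges[:-1] - y, alpha)
    p0 = p_bins[bit_of_bin == 0].sum()
    p1 = p_bins[bit_of_bin == 1].sum()
    eps = 1e-12  # guard against log(0) when one hypothesis has negligible mass
    return np.log((p0 + eps) / (p1 + eps))

# Example: 8 uniform bins over [-32, 32), sign-style most significant bitplane
edges = np.linspace(-32.0, 32.0, 9)
msb = (np.arange(8) >= 4).astype(int)   # 0 for negative bins, 1 for positive bins
print(bitplane_llr(y=5.0, bin_edges=edges, bit_of_bin=msb, alpha=0.2))
```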
A new scheme is proposed for the estimation of the correlation noise parameters, wherein the decoder simultaneously decodes all the bitplanes of a DCT band in a Wyner-Ziv frame and then refines the parameters of the correlation noise model of the band in an iterative manner. This process is carried out on an augmented factor graph using a new recursive message passing algorithm, with the side information generated once and kept unchanged during the decoding of the Wyner-Ziv frame. Extensive simulations show that the proposed decoder improves the rate-distortion performance in comparison to the original DISCOVER codec and to another DVC codec employing side information frame refinement, particularly for video sequences with high motion content. In the second part of this work, a new algorithm for the generation of the side information is proposed, which refines the initial side information frame using the additional information obtained after decoding the previous DCT bands of a Wyner-Ziv frame. Simulations demonstrate that the proposed algorithm provides a performance superior to that of schemes employing other side information refinement mechanisms. Finally, it is shown that incorporating the proposed side information refinement algorithm into the decoder proposed in the first part of the thesis leads to a further improvement in the rate-distortion performance of the DVC codec.
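As a rough illustration of the alternation between decoding and parameter refinement (a sketch under assumed interfaces, not the thesis's message-passing algorithm; decode_pass is a hypothetical stand-in for one decoding sweep over the augmented factor graph):

```python
import numpy as np

def refine_alpha(side_info, decode_pass, n_iters=5, alpha0=1.0):
    """Alternate between decoding and Laplacian-parameter re-estimation.

    side_info   : DCT band of the side-information frame (1-D array)
    decode_pass : callable(alpha) -> refined coefficient estimates; stands in
                  for one decoding sweep using the current noise parameter
    """
    alpha = alpha0
    for _ in range(n_iters):
        estimate = decode_pass(alpha)            # decode all bitplanes jointly
        residual = estimate - side_info
        # ML estimate of the Laplacian scale: alpha = 1 / E[|x - y|]
        alpha = 1.0 / max(float(np.mean(np.abs(residual))), 1e-6)
    return alpha

# Toy usage: pretend decoding always recovers the true band exactly
true_band = np.random.laplace(scale=4.0, size=256)
side = true_band + np.random.laplace(scale=2.0, size=256)
print(refine_alpha(side, decode_pass=lambda a: true_band))  # approx 1/2 = 0.5
```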







Distributed Multiple Description Coding


Book Description

This book examines distributed video coding (DVC) and multiple description coding (MDC), two novel techniques designed to address the problems of conventional image and video compression coding. Covering all fundamental concepts and core technologies, the chapters can also be read independently, each describing its methodology in sufficient detail to enable readers to repeat the corresponding experiments easily. Topics and features: provides a broad overview of DVC and MDC, from the basic principles to the latest research; covers sub-sampling based MDC, quantization based MDC, transform based MDC, and FEC based MDC; discusses Slepian-Wolf coding based on Turbo codes and LDPC codes, respectively, comparing their relative performance; includes original MDC and DVC algorithms; presents the basic frameworks and experimental results, to help readers improve the efficiency of MDC and DVC; introduces the classical DVC system for mobile communications, describing the development environment in detail.
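As a flavor of the sub-sampling based MDC covered in the book, the following sketch (illustrative only, not taken from the book) splits a frame into two descriptions by an even/odd row polyphase decomposition; the central decoder interleaving both descriptions is lossless, while a side decoder conceals a lost description by crude row duplication.

```python
import numpy as np

def mdc_split(frame):
    """Sub-sampling based MDC: split a frame into two descriptions by
    taking even and odd rows (a simple polyphase decomposition)."""
    return frame[0::2, :], frame[1::2, :]

def mdc_reconstruct(d0, d1=None):
    """Central reconstruction interleaves both descriptions; if one
    description is lost, the missing rows are concealed from the other."""
    if d1 is None:                        # side decoder: only d0 arrived
        return np.repeat(d0, 2, axis=0)   # crude row duplication
    frame = np.empty((d0.shape[0] + d1.shape[0], d0.shape[1]), d0.dtype)
    frame[0::2, :], frame[1::2, :] = d0, d1
    return frame

frame = np.arange(16, dtype=float).reshape(4, 4)
d0, d1 = mdc_split(frame)
assert np.array_equal(mdc_reconstruct(d0, d1), frame)  # central decoder is lossless
print(mdc_reconstruct(d0))                             # degraded side reconstruction
```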




Rate Distortion Theory for Causal Video Coding


Book Description

Due to the sheer volume of data involved, video coding is an important application of lossy source coding, and it has received wide industrial interest and support, as evidenced by the development and success of a series of video coding standards. All MPEG-series and H-series video coding standards proposed so far are based upon a video coding paradigm called predictive video coding, where video source frames X_i, i = 1, 2, ..., N, are encoded in a frame-by-frame manner, and the encoder and decoder for each frame X_i can enlist help only from all previously encoded frames S_j, j = 1, 2, ..., i-1. In this thesis, we look beyond all existing and proposed video coding standards and introduce a new coding paradigm called causal video coding, in which the encoder for each frame X_i can use all previous original frames X_j, j = 1, 2, ..., i-1, as well as all previously encoded frames S_j, while the corresponding decoder can use only the previously encoded frames. We consider all studies, comparisons, and designs on causal video coding from an information-theoretic point of view. Let R*_c(D_1, ..., D_N) (respectively, R*_p(D_1, ..., D_N)) denote the minimum total rate required to achieve a given distortion level D_1, ..., D_N > 0 in causal video coding (respectively, predictive video coding). A novel computation approach is proposed to analytically characterize, numerically compute, and compare the minimum total rate R*_c(D_1, ..., D_N) of causal video coding required to achieve a given distortion (quality) level D_1, ..., D_N > 0. Specifically, we first show that for jointly stationary and ergodic sources X_1, ..., X_N, R*_c(D_1, ..., D_N) is equal to the infimum of the n-th order total rate-distortion function R_{c,n}(D_1, ..., D_N) over all n, where R_{c,n}(D_1, ..., D_N) itself is given by the minimum of an information quantity over a set of auxiliary random variables. We then present an iterative algorithm for computing R_{c,n}(D_1, ..., D_N) and demonstrate the convergence of the algorithm to the global minimum. The global convergence of the algorithm further enables us not only to establish a single-letter characterization of R*_c(D_1, ..., D_N) in a novel way when the N sources are an independent and identically distributed (IID) vector source, but also to demonstrate a somewhat surprising result (dubbed the more and less coding theorem): under some conditions on source frames and distortion, the more frames that need to be encoded and transmitted, the less data actually has to be sent after encoding. With the help of the algorithm, it is also shown by example that R*_c(D_1, ..., D_N) is in general much smaller than the total rate offered by the traditional greedy coding method, by which each frame is encoded in a locally optimum manner based on all information available to its encoder. As a by-product, an extended Markov lemma is established for correlated ergodic sources. From an information-theoretic point of view, it is interesting to compare causal video coding with predictive video coding, upon which all existing video coding standards are based. In this thesis, by fixing N = 3, we first derive a single-letter characterization of R*_p(D_1, D_2, D_3) for an IID vector source (X_1, X_2, X_3) where X_1 and X_2 are independent, and then demonstrate the existence of such X_1, X_2, X_3 for which R*_p(D_1, D_2, D_3) > R*_c(D_1, D_2, D_3) under some conditions on source frames and distortion. This result makes causal video coding an attractive framework for future video coding systems and standards.
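The thesis's iterative algorithm minimizes an information quantity over auxiliary random variables for R_{c,n}; as a simpler taste of this family of alternating-minimization computations, the sketch below implements the classical Blahut-Arimoto algorithm for the ordinary single-source rate-distortion function (standard textbook material, not the thesis's algorithm; variable names are illustrative).

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iters=200):
    """Blahut-Arimoto computation of one point on the rate-distortion curve.

    p_x  : source distribution, shape (n,)
    dist : distortion matrix d(x, xhat), shape (n, m)
    beta : Lagrange multiplier trading rate against distortion
    Returns (rate in bits, expected distortion).
    """
    m = dist.shape[1]
    q = np.full(m, 1.0 / m)                  # output marginal, start uniform
    for _ in range(n_iters):
        Q = q * np.exp(-beta * dist)         # optimal test channel for current q
        Q /= Q.sum(axis=1, keepdims=True)
        q = p_x @ Q                          # re-estimate the output marginal
    joint = p_x[:, None] * Q
    rate = np.sum(joint * np.log2(Q / q))    # mutual information I(X; Xhat)
    distortion = np.sum(joint * dist)
    return rate, distortion

# Binary source with Hamming distortion: R(D) should approach 1 - H(D)
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
print(blahut_arimoto(p, d, beta=3.0))
```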
The design of causal video coding is also considered in the thesis from an information-theoretic perspective, by modeling each frame as a stationary information source. We first put forth a concept called causal scalar quantization, and then propose an algorithm for designing optimum fixed-rate causal scalar quantizers for causal video coding that minimize the total distortion over all sources. Simulation results show that, in comparison with fixed-rate predictive scalar quantization, fixed-rate causal scalar quantization offers a quality improvement (distortion reduction) of up to 16%.
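For context, a fixed-rate scalar quantizer of the ordinary non-causal kind can be designed with the classical Lloyd algorithm, sketched below (illustrative only; the thesis's causal scalar quantizers additionally exploit previous original and encoded frames).

```python
import numpy as np

def lloyd_quantizer(samples, n_levels, n_iters=50):
    """Design a fixed-rate scalar quantizer with the Lloyd algorithm.

    Alternates nearest-codeword partitioning with centroid updates to
    minimize mean squared error over the training samples.
    """
    # Initialize codewords from sample quantiles
    codebook = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iters):
        idx = np.argmin(np.abs(samples[:, None] - codebook), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                codebook[k] = samples[idx == k].mean()  # centroid condition
    return codebook

rng = np.random.default_rng(0)
x = rng.laplace(scale=1.0, size=10000)
cb = lloyd_quantizer(x, n_levels=4)   # a 2-bit fixed-rate quantizer
mse = np.mean((x - cb[np.argmin(np.abs(x[:, None] - cb), axis=1)]) ** 2)
print(cb, mse)
```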




Rate-Distortion Based Video Compression


Book Description

One of the most intriguing problems in video processing is the removal of redundancy, that is, the compression of a video signal. A large number of applications depend on video compression, and data compression represents the enabling technology behind the multimedia and digital television revolution. In motion compensated lossy video compression, the original video sequence is first split into three new sources of information: segmentation, motion, and residual error. These three information sources are then quantized, leading to a reduced rate for their representation but also to a distorted reconstructed video sequence. Once the decomposition of the original source into segmentation, motion, and residual error information is decided, the key remaining problem is the allocation of the available bits among these three sources of information. In this monograph a theory is developed which provides a solution to this fundamental bit allocation problem. It can be applied to all quad-tree-based motion compensated video coders which use a first-order differential pulse code modulation (DPCM) scheme for the encoding of the displacement vector field (DVF) and a block-based transform scheme for the encoding of the displaced frame difference (DFD). This theory also yields an optimal motion estimator, which results in the smallest DFD energy for a given bit rate for the encoding of the DVF. Such a motion estimator is used to formulate a motion compensated interpolation scheme which incorporates a global smoothness constraint for the DVF.
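The bit allocation problem can be phrased in standard Lagrangian terms: for a fixed multiplier lambda, each information source independently picks the operating point minimizing D + lambda * R, and sweeping lambda traces the optimal rate-distortion trade-off. The sketch below (toy data; names and R-D points are illustrative, not from the monograph) shows this decoupling across the segmentation, motion, and residual sources.

```python
import numpy as np

def lagrangian_allocation(rd_curves, lam):
    """Pick one operating point per information source to minimize D + lambda*R.

    rd_curves : list of arrays of shape (k_i, 2) with columns (rate, distortion),
                one array per source (e.g. segmentation, motion, residual)
    lam       : Lagrange multiplier; sweeping it traces the optimal R-D curve
    """
    choices, total_rate, total_dist = [], 0.0, 0.0
    for curve in rd_curves:
        cost = curve[:, 1] + lam * curve[:, 0]   # D + lambda * R for each point
        best = int(np.argmin(cost))              # sources decouple for fixed lambda
        choices.append(best)
        total_rate += curve[best, 0]
        total_dist += curve[best, 1]
    return choices, total_rate, total_dist

# Toy operational R-D points for segmentation, motion (DVF), and residual (DFD)
seg = np.array([[0.1, 9.0], [0.3, 5.0], [0.6, 4.0]])
dvf = np.array([[0.2, 8.0], [0.5, 3.0], [0.9, 2.5]])
dfd = np.array([[0.5, 6.0], [1.0, 2.0], [2.0, 0.8]])
print(lagrangian_allocation([seg, dvf, dfd], lam=4.0))
```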




Information, Computer and Application Engineering


Book Description

This proceedings volume brings together peer-reviewed papers presented at the International Conference on Information Technology and Computer Application Engineering, held 10-11 December 2014 in Hong Kong, China. Specific topics under consideration include Computational Intelligence, Computer Science and its Applications, Intelligent Information Processing and Knowledge Engineering, Intelligent Networks and Instruments, Multimedia Signal Processing and Analysis, Intelligent Computer-Aided Design Systems, and other related topics. This book provides readers with a state-of-the-art survey of recent innovations and research worldwide in Information Technology and Computer Application Engineering, in so doing furthering the development and growth of these research fields, strengthening international academic cooperation and communication, and promoting the fruitful exchange of research ideas. This volume will be of interest to professionals and academics alike, serving as a broad overview of the latest advances in the dynamic field of Information Technology and Computer Application Engineering.







Advances in Multimedia Information Processing - PCM 2008


Book Description

This book constitutes the refereed proceedings of the 9th Pacific Rim Conference on Multimedia, PCM 2008, held in Tainan, Taiwan, in December 2008. The 79 revised full papers and 39 revised posters presented were carefully reviewed and selected from 210 submissions. The papers are organized in topical sections on next generation video coding techniques, audio processing and classification, interactive multimedia systems, advances in H.264/AVC, multimedia networking techniques, advanced image processing techniques, video analysis and its applications, image detection and classification, visual and spatial analyses, multimedia human computer interfaces, multimedia security and DRM, advanced image and video processing, multimedia database and retrieval, multimedia management and authoring, multimedia personalization, multimedia for e-learning, multimedia systems and applications, advanced multimedia techniques, as well as multimedia processing and analyses.




Social Computing


Book Description

This two-volume set (CCIS 623 and 624) constitutes the refereed proceedings of the Second International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2016, held in Harbin, China, in August 2016. The 91 revised full papers presented were carefully reviewed and selected from 338 submissions. The papers are organized in topical sections on Research Track (Part I) and Education Track, Industry Track, and Demo Track (Part II), and cover a wide range of topics related to social computing, social media, social network analysis, social modeling, social recommendation, machine learning, and data mining.




Power and Distortion Optimized Video Coding for Pervasive Computing Applications


Book Description

This dissertation investigates video encoding schemes for pervasive computing applications that must ensure low power consumption in addition to high compression efficiency. The contribution of the dissertation is the formulation of a theoretical problem that captures the joint optimization of power and distortion in video coding. A study of the complexity distribution of typical video encoders helps to develop a complexity-scalable video encoding architecture that includes several control parameters to adjust the power consumption of the major modules of the encoder. An analytic framework to model, control, and optimize the power-rate-distortion behavior is developed, which facilitates optimization schemes that determine the best configuration of the complexity control parameters according to the power supply level of the device, the desired video presentation quality, or both. The dissertation proposes complexity control schemes that dynamically adjust these control parameters. Using extensive simulations on an instruction set simulator, the accuracy of the model and the quality of the optimization schemes are investigated. For additional performance improvement, we propose algorithms that exploit the video content to reduce power consumption and improve video quality. This is done by obtaining and maintaining the "motion history" of a video sequence in a hierarchical fashion. By adaptively adjusting the complexity parameters according to the motion history of the sequence, power is saved when the scene has little motion and spent when the motion activity increases. Extensive experiments have been performed to show the validity and merits of the proposed techniques.
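As an illustration of motion-history driven complexity control (all function names, parameters, and thresholds below are hypothetical, not those of the dissertation), a controller might weight recent motion activity more heavily and shrink the motion-estimation effort when the scene is static or the battery is low:

```python
def select_complexity(motion_history, power_level):
    """Choose encoder complexity parameters from recent motion activity
    and the remaining power budget (all names and thresholds illustrative).

    motion_history : recent per-frame motion activity measures (e.g. mean
                     absolute motion-vector magnitude), most recent last
    power_level    : remaining battery fraction in [0, 1]
    """
    # Exponentially weighted motion activity: recent frames count more
    activity, weight = 0.0, 1.0
    for m in reversed(motion_history):
        activity += weight * m
        weight *= 0.5
    activity *= 0.5  # normalize (geometric weights sum to 2 for long histories)

    if power_level < 0.2 or activity < 0.5:
        # Low power or a nearly static scene: shrink search, subsample SAD
        return {"search_range": 4, "sad_subsample": 4, "me_iterations": 1}
    if activity < 2.0:
        return {"search_range": 8, "sad_subsample": 2, "me_iterations": 2}
    # High motion and enough power: spend complexity where it pays off
    return {"search_range": 16, "sad_subsample": 1, "me_iterations": 3}

print(select_complexity([0.2, 0.4, 3.5], power_level=0.8))
```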