Implementation of a Fast Inter-prediction Mode Decision in H.264/AVC Video Encoder


Book Description

H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is currently one of the most widely used industry standards for video compression. Several H.264 codec solutions, both software and hardware, are available on the market. The technology is used primarily in applications such as video conferencing, mobile TV, Blu-ray discs, digital television and internet video streaming. This thesis uses the JM 17.2 reference software [15], which is available to all users and can be downloaded from http://iphome.hhi.de/suehring/tml. The software is mainly used for educational purposes; it also includes a reference software manual with information on installation, compilation and usage. In real-time applications such as video streaming and video conferencing, fast video encoding and decoding is essential. Most of the complexity of H.264 lies in the encoder: the motion estimation (ME) and mode decision processes in particular are computationally intensive and consume much of the CPU (central processing unit) time. Mode decision is complex because of variable-block-size motion estimation (block sizes from 16x16 down to 4x4) and half- and quarter-pixel motion compensation. Hence, the objective of this thesis is to reduce encoding time while maintaining the same video quality and compression efficiency. The fast adaptive termination (FAT) [30] algorithm is applied to the mode decision and motion estimation process. Based on their rate-distortion (RD) cost characteristics, all inter modes are classified as either skip or non-skip modes. To select the best mode for a macroblock, the minimum RD cost of these two mode classes is predicted. For skip mode, an early skip-mode detection test is proposed; for non-skip mode, a three-stage scheme is proposed to speed up the mode decision process.
Experimental results demonstrate that the proposed technique is robust in coding efficiency across different quantization parameters (QP) and video sequences. It achieves a 47.6% saving in encoding time with only a 0.01% decrease in the structural similarity index measure (SSIM), negligible degradation in peak signal-to-noise ratio (PSNR) and an acceptable increase in bit rate.
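The skip/non-skip decision described above can be illustrated with a minimal Python sketch. It is not the FAT algorithm of [30]: the function names, the neighbour-based prediction of the minimum RD cost and the scaling factor `alpha` are hypothetical. The only element taken from the JM encoder is the Lagrangian cost J = D + λ·R with λ = 0.85 · 2^((QP − 12)/3), the multiplier commonly used for P-slice mode decision with SSD distortion.

```python
# Hypothetical sketch of a skip / non-skip mode decision with early termination.
# Distortion (SSD) and rate (bits) of each mode are assumed to be given.

def jm_lambda_mode(qp: int) -> float:
    """Lagrange multiplier 0.85 * 2^((QP - 12) / 3) used for JM mode decision."""
    return 0.85 * 2 ** ((qp - 12) / 3.0)


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def decide_mode(skip_dr, non_skip_dr, neighbor_costs, qp, alpha=1.0):
    """Return (best_mode, best_cost, number_of_non_skip_modes_evaluated).

    skip_dr:        (distortion, rate) of the SKIP mode
    non_skip_dr:    dict mapping mode name -> (distortion, rate), in test order
    neighbor_costs: best RD costs of already coded neighbouring macroblocks
    alpha:          hypothetical scale applied to the predicted minimum cost
    """
    lam = jm_lambda_mode(qp)
    # Predict the current MB's minimum RD cost from its neighbours; without
    # neighbours, use -inf so that no early termination can happen.
    predicted_min = alpha * min(neighbor_costs) if neighbor_costs else float("-inf")

    best_mode, best_cost = "SKIP", rd_cost(*skip_dr, lam)
    # Early skip detection: accept SKIP without testing any other mode.
    if best_cost <= predicted_min:
        return best_mode, best_cost, 0

    evaluated = 0
    for mode, (dist, rate) in non_skip_dr.items():
        evaluated += 1
        cost = rd_cost(dist, rate, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        # Stop as soon as the best cost reaches the predicted minimum.
        if best_cost <= predicted_min:
            break
    return best_mode, best_cost, evaluated
```

The returned evaluation count makes the saving visible: when SKIP passes the early test, no non-skip mode is evaluated at all.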




Reducing the Complexity of Inter-prediction Mode Decision for High Efficiency Video Codec


Book Description

The High Efficiency Video Coding (HEVC) standard is the latest joint video project of the International Telecommunication Union (ITU-T) Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). Although HEVC is based on the same architecture as the widely used H.264/AVC (Advanced Video Coding) standard [8], it includes many new coding tools, and almost all encoder blocks are optimized with respect to their counterparts in H.264/AVC. This allows the new standard to achieve up to 50% bitrate reduction compared to its predecessor at the same visual quality, at the cost of increased complexity [1]. As in H.264/AVC, mode decision with motion estimation (ME) remains among the most time-consuming computations in HEVC. In inter-prediction mode decision, a full-search algorithm tests every possible block size and refines the results from integer-pel to quarter-pel resolution. A full search therefore yields the best compression performance, but the considerable computational complexity of mode decision decreases encoding speed. In this thesis, a fast adaptive termination algorithm [20] is proposed that terminates the inter-prediction mode decision in HEVC early. Based on rate-distortion (RD) cost, all inter-prediction modes are classified as skip or non-skip modes, and the minimum RD cost of these two mode classes is predicted to select the best mode. For skip mode, the mode decision is made at an early stage, while for non-skip mode, several stages are proposed to speed up the mode decision.
Experimental results on several video test sequences suggest that implementing the Fast Adaptive Termination algorithm for inter-prediction mode decision reduces encoding time by about 25%-40%, with negligible degradation in peak signal-to-noise ratio (PSNR). Metrics such as BD-bitrate (Bjøntegaard Delta bitrate), BD-PSNR (Bjøntegaard Delta Peak Signal to Noise Ratio), SSIM (Structural Similarity) and computational complexity are also used to evaluate performance.
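The encoding-time and PSNR figures quoted above are normally computed with the standard definitions below. This Python sketch is illustrative only: the function names are hypothetical, PSNR is computed over a flat list of 8-bit samples (encoders usually report it per frame and per colour component), and BD-bitrate/BD-PSNR, which require curve fitting over several QP points, are not shown.

```python
import math


def psnr(orig, recon, max_val=255):
    """PSNR in dB between two equally long sequences of samples."""
    if not orig or len(orig) != len(recon):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)


def time_saving(t_ref, t_prop):
    """Encoding-time saving in percent of the reference encoding time."""
    return 100.0 * (t_ref - t_prop) / t_ref


def delta_psnr(psnr_ref, psnr_prop):
    """PSNR change in dB; negative values mean a quality loss."""
    return psnr_prop - psnr_ref
```

For example, a reference encode of 100 s and a proposed encode of 60 s correspond to a 40% time saving.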




Fast Intra Mode Decision in High Efficiency Video Coding


Book Description

In this thesis, a CU early termination algorithm combined with a fast intra prediction algorithm is proposed. The CU early termination algorithm replaces the complete full-search prediction for the CU: it determines the complexity of the CU block and, based on that, decides whether or not to split the CU further. This is followed by a PU mode decision that finds the optimal prediction mode among the 35 intra prediction modes. This is a two-step process: first, the Sum of Absolute Differences (SAD) of all modes is calculated using a down-sampling method; second, a three-step search algorithm is applied to remove unnecessary modes. Finally, an early RDOQ (Rate Distortion Optimized Quantization) termination algorithm further reduces the encoding time. Experimental results on several video test sequences suggest that the proposed CU early termination and fast intra mode decision algorithms reduce encoding time by about 35%-48% for intra prediction mode decision, with negligible degradation in peak signal-to-noise ratio (PSNR). Metrics such as BD-bitrate (Bjøntegaard Delta bitrate), BD-PSNR (Bjøntegaard Delta Peak Signal to Noise Ratio) and RD (rate-distortion) curves are also used.
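The two-step PU mode decision can be sketched as follows. This is a hypothetical illustration, not the thesis implementation: down-sampling averages each 2x2 quad, the caller is assumed to supply `cost(mode)` (for example the SAD between the down-sampled original and predicted blocks), and the three steps search HEVC's angular modes 2 to 34 with spacings 4, 2 and 1 while always evaluating planar (0) and DC (1).

```python
from typing import Callable, List, Sequence


def downsample2(block: Sequence[Sequence[int]]) -> List[List[int]]:
    """2:1 subsampling in both directions by rounding the mean of each 2x2 quad."""
    h, w = len(block), len(block[0])
    return [[(block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]


def sad(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def three_step_mode_search(cost: Callable[[int], float], keep: int = 3) -> List[int]:
    """Return the `keep` cheapest modes found; only these go on to full RDO."""
    costs = {m: cost(m) for m in (0, 1)}          # planar and DC are always tested
    # Step 1: coarse grid over angular modes 2..34 with spacing 4.
    for m in range(2, 35, 4):
        costs[m] = cost(m)
    best = min((m for m in costs if m >= 2), key=costs.get)
    # Steps 2 and 3: refine around the best angular mode with spacing 2, then 1.
    for step in (2, 1):
        for m in (best - step, best + step):
            if 2 <= m <= 34 and m not in costs:
                costs[m] = cost(m)
        best = min((m for m in costs if m >= 2), key=costs.get)
    return sorted(costs, key=costs.get)[:keep]
```

In this sketch at most 15 modes are evaluated instead of 35, at the price of possibly missing the true optimum when the cost surface has several local minima.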







Implementing Rate-distortion Optimization on a Resource-limited H.264 Encoder


Book Description

This thesis models the rate-distortion characteristics of an H.264 video compression encoder to improve its mode decision performance. First, it provides a background to the fundamentals of video compression. Then it describes the problem of estimating the rate and distortion of a macroblock given limited computational resources. It derives the macroblock rate and distortion as functions of the residual SAD (sum of absolute differences) and the H.264 quantization parameter (QP). From the resulting equations, this thesis implements and verifies rate-distortion optimization on a resource-limited H.264 encoder. Finally, it explores other avenues of improvement.
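A rate-distortion model driven by the residual SAD and QP can be sketched as below. The H.264 quantizer step-size table is standard (the step size doubles every 6 QP); everything else is a hypothetical placeholder and not the model derived in the thesis: the piecewise form, the coefficient `a`, and the assumption that an evenly spread residual whose mean absolute value is below Qstep/2 quantizes to zero.

```python
# H.264 quantizer step sizes for QP 0..5; Qstep doubles every 6 QP.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """H.264 quantizer step size for QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def model_rd(sad: float, qp: int, n_pixels: int = 256,
             header_bits: float = 0.0, a: float = 1.0):
    """Hypothetical (rate in bits, distortion as SSD) estimate for an MB residual."""
    q = qstep(qp)
    if sad / n_pixels < q / 2:
        # Residual assumed to quantize to zero: only header bits are sent and
        # the residual itself remains as distortion (SSD ~ SAD^2 / N if evenly spread).
        return header_bits, sad * sad / n_pixels
    # Otherwise: rate grows with SAD / Qstep, distortion is uniform-quantizer noise.
    return header_bits + a * sad / q, n_pixels * q * q / 12.0


def rdo_choose(candidates, qp, lam):
    """candidates: mode -> (residual SAD, header bits); returns the min-cost mode."""
    def cost(item):
        sad, hdr = item[1]
        rate, dist = model_rd(sad, qp, header_bits=hdr)
        return dist + lam * rate
    return min(candidates.items(), key=cost)[0]
```

The point of such a model on a resource-limited encoder is that the SAD is already available from motion estimation, so rate and distortion can be estimated without running the transform, quantization and entropy coding for every candidate mode.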




Multilayers Fast Mode Decision Algorithm for Scalable Video Coding


Book Description

Scalable Video Coding (SVC) is the scalable extension of H.264/AVC. Its encoder has higher coding complexity and longer encoding time than a single-layer H.264/AVC encoder. SVC is attracting great interest because its scalability allows it to adapt to various network conditions: it permits partial transmission and decoding of a bitstream. This research develops a fast mode decision algorithm that speeds up the mode decision process of the SVC encoder and thereby decreases encoding time. In addition, the performance of SVC over IEEE 802.11g wireless LAN is evaluated using the Scalable Video Evaluation Framework (SVEF). The fast mode decision scheme has been implemented and successfully decreases encoding time with negligible loss of quality and a negligible increase in bitrate requirements. Streaming simulations have also been performed using the SVEF simulator. The simulation results show that the proposed fast mode decision algorithm saves up to 45% of encoding time while maintaining video quality with negligible PSNR loss.
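One common way to speed up SVC mode decision is to restrict the enhancement-layer candidates using the mode already chosen for the co-located base-layer macroblock. The mapping table below is hypothetical and only illustrates that idea; the mode list is simplified and the thesis's multilayer algorithm may use different rules.

```python
# Modes tested by an exhaustive enhancement-layer search (simplified list).
ALL_MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "INTRA"]

# Hypothetical candidate sets keyed by the co-located base-layer mode.
CANDIDATES = {
    "SKIP": ["SKIP", "16x16"],
    "16x16": ["SKIP", "16x16", "16x8", "8x16"],
    "16x8": ["16x16", "16x8"],
    "8x16": ["16x16", "8x16"],
    "8x8": ["16x8", "8x16", "8x8"],
}


def enhancement_candidates(base_mode, neighbour_modes=()):
    """Return the modes to evaluate for an enhancement-layer macroblock."""
    if base_mode not in CANDIDATES:
        # Intra or unknown base-layer mode: no reliable hint, search everything.
        return list(ALL_MODES)
    # Add modes chosen by already coded neighbours in the enhancement layer.
    chosen = set(CANDIDATES[base_mode]) | set(neighbour_modes)
    # Keep the evaluation order of the exhaustive search.
    return [m for m in ALL_MODES if m in chosen]


def skipped_fraction(candidates):
    """Fraction of mode evaluations avoided compared with the full search."""
    return 1.0 - len(candidates) / len(ALL_MODES)
```

Falling back to the full search when the base layer gives no usable hint keeps the quality loss bounded, which is the usual trade-off in such inter-layer schemes.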




Implementation of Complexity Reduction Algorithm for Intra Mode Selection in H.264/AVC


Book Description

For applications with low computational capabilities, such as handheld devices, encoding complexity must be kept minimal. However, H.264, the most widely accepted video coding standard, employs several powerful coding techniques that increase encoding complexity. Hence, the objective of this thesis is to implement an algorithm that reduces encoding complexity by about 25% while retaining the quality of the existing intra prediction algorithm. H.264 offers nine modes for intra prediction of 4x4 luminance blocks, comprising DC prediction and eight directional modes (N4 = 9). For regions with less spatial detail, H.264 supports 16x16 intra coding, in which one of four prediction modes (DC, vertical, horizontal and plane) is chosen to predict the entire luminance component of the macroblock (N16 = 4). In addition, H.264 supports intra prediction of 8x8 chrominance blocks using the same four prediction modes as 16x16 luminance blocks (N8 = 4). The existing intra prediction algorithm uses Rate Distortion Optimization (RDO) to examine all possible combinations of coding modes. The number of mode combinations for each macroblock is therefore N8 × (16 × N4 + N16) = 4 × (16 × 9 + 4) = 592. Thus, to select the best intra mode for one macroblock, the H.264/AVC encoder carries out 592 RDO calculations, which greatly increases encoder complexity. This thesis adopts a complexity reduction algorithm based on simple directional masks and neighboring modes, which reduces the number of mode combinations to at most 132 with negligible PSNR (peak signal-to-noise ratio) loss and bit-rate increase compared with the H.264 exhaustive search.
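The mode-count arithmetic above can be checked directly, and a directional pre-selection of the kind the thesis relies on can be sketched. The gradient rule in `candidate_4x4_modes`, its factor-of-2 thresholds and the returned mode subsets are hypothetical stand-ins for the thesis's directional masks. The mode numbers follow the H.264 4x4 intra numbering (0 vertical, 1 horizontal, 2 DC, 3 diagonal down-left, 4 diagonal down-right, 5 vertical-right, 6 horizontal-down, 7 vertical-left, 8 horizontal-up).

```python
N4, N16, N8 = 9, 4, 4   # 4x4 luma, 16x16 luma and 8x8 chroma intra mode counts


def rdo_combinations(n4=N4, n16=N16, n8=N8, blocks4x4=16):
    """RDO evaluations per macroblock for an exhaustive intra search."""
    return n8 * (blocks4x4 * n4 + n16)


def candidate_4x4_modes(blk):
    """Hypothetical gradient test returning a reduced set of 4x4 intra modes."""
    # dv: variation down the columns; dh: variation along the rows.
    dv = sum(abs(blk[y][x] - blk[y - 1][x]) for y in range(1, 4) for x in range(4))
    dh = sum(abs(blk[y][x] - blk[y][x - 1]) for y in range(4) for x in range(1, 4))
    if 2 * dv < dh:     # columns nearly constant: vertical structure
        return [0, 2, 5, 7]
    if 2 * dh < dv:     # rows nearly constant: horizontal structure
        return [1, 2, 6, 8]
    return [2, 3, 4]    # no dominant direction: DC and the two diagonals
```

`rdo_combinations()` reproduces the 592 evaluations of the exhaustive search; testing only four 4x4 modes per block in this sketch would give 4 × (16 × 4 + 4) = 272.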




Algorithms and Hardware Co-design of HEVC Intra Encoders


Book Description

Digital video has become increasingly important over the last two decades. Due to the rapid development of information and communication technologies, demand for Ultra-High Definition (UHD) video applications keeps growing. However, H.264/AVC, the most prevalent video compression standard, released in 2003, is inefficient for UHD video. The demand for compression efficiency superior to that of H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with H.264/AVC, HEVC offers double the compression ratio at the same level of video quality, or substantially better video quality at the same bitrate. Although HEVC/H.265 offers superior compression efficiency, its complexity is several times that of H.264/AVC, which impedes high-throughput implementations. Currently, most researchers focus only on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility. Moreover, efficient hardware architectures have not been explored exhaustively: only a few research works have investigated efficient hardware architectures for HEVC/H.265. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design for HEVC intra encoders, and we also explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming.
Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax element processing and thereby greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture for an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs); each PE independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction and rate-distortion estimation. PU blocks of different sizes are processed simultaneously by different PEs. In addition, an efficient hardware implementation of the group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate, high-throughput rate estimation. To take advantage of deep learning, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme that reduces the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3-5 prediction modes with similar prediction angles. A rough angle detection algorithm first determines the prediction direction of the current block, and a small-scale FCLNN then refines the mode prediction.
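The grouping of angular modes can be illustrated as follows. The group boundaries are hypothetical: the sketch simply splits HEVC's 33 angular modes (2 to 34) into 7 contiguous groups of 4 or 5 modes, and neither the rough angle detector nor the FCLNN is modelled; the parameter `rough_mode` stands in for their output.

```python
# Hypothetical group sizes; they sum to the 33 HEVC angular modes (2..34).
GROUP_SIZES = (5, 5, 5, 4, 5, 5, 4)


def build_groups(sizes=GROUP_SIZES, first=2):
    """Split consecutive angular modes into contiguous groups of the given sizes."""
    groups, m = [], first
    for s in sizes:
        groups.append(list(range(m, m + s)))
        m += s
    return groups


GROUPS = build_groups()


def rdo_candidates(rough_mode):
    """Planar (0), DC (1) and the group containing the roughly detected mode."""
    for g in GROUPS:
        if rough_mode in g:
            return [0, 1] + g
    raise ValueError("rough_mode must be an angular mode in 2..34")
```

With such a grouping, RDO only has to examine at most 7 modes instead of 35, which also bounds the worst-case latency of a hardware prediction engine.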




Optimization of a Software-only H.264 Encoder


Book Description

With the boom in semiconductor and information technology, digital video has become more popular than ever. The large volume of video data makes capturing, storing, transmitting and communicating it a challenging task. H.264, the latest video coding standard developed jointly by MPEG and ITU-T, is considered the state of the art in video coding. However, its performance gain comes at the cost of much higher implementation complexity, driven by Moore's law. To exploit the better coding efficiency of H.264 in video communication, this thesis focuses on improving the encoding speed of an existing H.264 video encoder, T264, by leveraging the specialized multimedia instructions provided by Intel/AMD CPUs. The encoder speed optimization was carried out at three layers. The first layer is high-level language (C) optimization, using techniques such as trading code size for speed, moving constants outside loops and optimizing for high-probability events. The second layer takes advantage of the SIMD (Single Instruction Multiple Data) instructions supported by Intel/AMD MMX/SSE/SSE2 technologies; at this layer, assembly code is written to leverage these parallel instructions. At the third layer, optimization is carried out at the algorithm level to make motion estimation and mode decision faster. With all these changes, the speed of the software-only encoder improves by around 50% with no noticeable difference in video quality.
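The SIMD layer speeds up inner loops such as the 16x16 SAD (for example with the SSE2 `PSADBW` instruction), which a Python sketch cannot show. This sketch instead illustrates one common algorithm-level speed-up for motion estimation, partial distortion elimination inside a full search; it is an assumed example and not necessarily one of the techniques used in T264 or in the thesis. The function names are hypothetical.

```python
def sad_16x16(cur, ref, best_so_far=float("inf")):
    """SAD of two 16x16 blocks with row-wise early termination.

    Returns the exact SAD, or a partial sum >= best_so_far as soon as the
    candidate can no longer beat the current best (partial distortion elimination).
    """
    total = 0
    for y in range(16):
        total += sum(abs(a - b) for a, b in zip(cur[y], ref[y]))
        if total >= best_so_far:
            return total
    return total


def full_search(cur, ref_frame, cx, cy, search_range):
    """Integer-pel full search around (cx, cy); returns ((dx, dy), best SAD)."""
    best, best_mv = float("inf"), (0, 0)
    h, w = len(ref_frame), len(ref_frame[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= w - 16 and 0 <= y <= h - 16:   # stay inside the frame
                cand = [row[x:x + 16] for row in ref_frame[y:y + 16]]
                s = sad_16x16(cur, cand, best)
                if s < best:
                    best, best_mv = s, (dx, dy)
    return best_mv, best
```

The early exit never changes the selected motion vector, because a candidate is abandoned only once its partial SAD can no longer be smaller than the current best; it only saves arithmetic.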