Implementation of Complexity Reduction Algorithm for Intra Mode Selection in H.264/AVC


Book Description

For applications with low computational capability, such as handheld devices, encoding complexity must be kept minimal. However, H.264, the most widely accepted video coding standard, employs several powerful coding techniques that increase encoding complexity. Hence, the objective of this thesis is to implement an algorithm that reduces the encoding complexity by about 25% while retaining the quality of the existing intra prediction algorithm. H.264 offers nine modes for intra prediction of 4x4 luminance blocks: DC prediction and eight directional modes (N4 = 9). For regions with less spatial detail, H.264 supports 16x16 intra coding, in which one of four prediction modes (DC, vertical, horizontal and planar) is chosen for the prediction of the entire luminance component of the macroblock (N16 = 4). In addition, H.264 supports intra prediction for the 8x8 chrominance blocks, which use the same four prediction modes as the 16x16 luminance blocks (N8 = 4). The existing intra prediction algorithm uses rate-distortion optimization (RDO) to examine all possible combinations of coding modes. The number of mode combinations for each macroblock is therefore N8 x (16 x N4 + N16) = 4 x (16 x 9 + 4) = 592. Thus, to select the best mode for one macroblock in intra prediction, the H.264/AVC encoder carries out 592 RDO calculations, which greatly increases the complexity of the encoder. This thesis adopts a complexity reduction algorithm using simple directional masks and neighboring modes, in which the number of mode combinations is reduced to at most 132, with negligible loss of peak signal-to-noise ratio (PSNR) and negligible bit-rate increase compared with the H.264 exhaustive search.
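The mode-combination count quoted in this description can be checked with a few lines of Python. This is only an arithmetic sketch: the function and constant names are illustrative and not taken from the thesis, and the ratio it prints is the reduction in the number of RDO evaluations, which is a different quantity from the roughly 25% reduction in overall encoding complexity stated above.

```python
# Mode counts as stated in the description.
N4 = 9    # intra 4x4 luma modes (DC + eight directional)
N16 = 4   # intra 16x16 luma modes (DC, vertical, horizontal, planar)
N8 = 4    # intra 8x8 chroma modes
BLOCKS_4X4_PER_MB = 16  # a 16x16 macroblock contains sixteen 4x4 luma blocks


def rdo_combinations(n8, n4, n16, blocks=BLOCKS_4X4_PER_MB):
    """Number of RDO evaluations per macroblock for an exhaustive search."""
    return n8 * (blocks * n4 + n16)


exhaustive = rdo_combinations(N8, N4, N16)
reduced_max = 132  # upper bound quoted in the description
fewer_evaluations = 1 - reduced_max / exhaustive

print(exhaustive)                  # 592
print(f"{fewer_evaluations:.1%}")  # 77.7%
```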




Video coding standards


Book Description

The requirements for multimedia (especially video and audio) communications have increased rapidly over the last two decades in broad areas such as television, entertainment, interactive services, telecommunications, conferencing, medicine, security, business, traffic, defense and banking. Video and audio coding standards play a most important role in multimedia communications. To meet these requirements, a series of video and audio coding standards has been developed, such as MPEG-2, MPEG-4 and MPEG-21 for audio and video by ISO/IEC, H.26x for video and G.72x for audio by ITU-T, Video Coder 1 (VC-1) for video by the Society of Motion Picture and Television Engineers (SMPTE), and RealVideo (RV) 9 for video by Real Networks. AVS China is the abbreviation for the Audio Video Coding Standard of China. This new standard includes four main technical areas, which are systems, video, audio and digital copyright management (DRM), and some supporting documents such as consistency verification. The second part of the standard, known as AVS1-P2 (Video - Jizhun), was approved as the national standard of China in 2006, and several final drafts of the standard have been completed, including AVS1-P1 (System - Broadcast), AVS1-P2 (Video - Zengqiang), AVS1-P3 (Audio - Double track), AVS1-P3 (Audio - 5.1), AVS1-P7 (Mobile Video), AVS-S-P2 (Video) and AVS-S-P3 (Audio). AVS China provides a technical solution for many applications such as digital broadcasting (SDTV and HDTV), high-density storage media and Internet streaming media, and will be used in the domestic IPTV, satellite and possibly the cable TV market. Compared with other coding standards such as H.264/AVC, the advantages of the AVS video standard include similar performance, lower complexity, and lower implementation cost and licensing fees. This standard has attracted a great deal of attention from industries related to television, multimedia communications and even chip manufacturing around the world.
Many well-known companies have also joined the AVS Group as Full Members or Observing Members. The 163 members of the AVS Group include Texas Instruments (TI) Co., Agilent Technologies Co. Ltd., Envivio Inc., NDS, Philips Research East Asia, Aisino Corporation, LG, Alcatel Shanghai Bell Co. Ltd., Nokia (China) Investment (NCIC) Co. Ltd., Sony (China) Ltd., and Toshiba (China) Co. Ltd., as well as some high-level universities in China. Thus there is a pressing need among instructors, students, and engineers for a book dealing with the topic of AVS China and its performance comparisons with similar standards such as H.264, VC-1 and RV-9.




Real-Time Heterogeneous Video Transcoding for Low-Power Applications


Book Description

This book introduces a novel transcoding algorithm for real-time video applications, designed to overcome interoperability problems between MPEG-2 and H.264/AVC. The new algorithm achieves a 92.8% reduction in transcoding run time at the price of an acceptable Peak Signal-to-Noise Ratio (PSNR) degradation, enabling readers to use it for real-time video applications. The algorithm described is evaluated through simulation and experimental results. In addition, the authors present a hardware implementation of the new algorithm using a Field Programmable Gate Array (FPGA) and an Application-Specific Integrated Circuit (ASIC).
• Describes a novel transcoding algorithm for real-time video applications, designed to overcome interoperability problems between MPEG-2 and H.264/AVC;
• Implements the algorithm presented using a Field Programmable Gate Array (FPGA) and an Application-Specific Integrated Circuit (ASIC);
• Demonstrates the solution to real problems, with verification through simulation and experimental results.




Computer, Intelligent Computing and Education Technology


Book Description

This proceedings set contains selected Computer, Information and Education Technology related papers from the 2014 International Conference on Computer, Intelligent Computing and Education Technology (CICET 2014), held March 27-28, 2014 in Hong Kong. The proceedings aims to provide a platform for researchers, engineers and academics as well as indu




MultiMedia Modeling


Book Description

The two-volume set LNCS 8325 and 8326 constitutes the thoroughly refereed proceedings of the 20th Anniversary International Conference on Multimedia Modeling, MMM 2014, held in Dublin, Ireland, in January 2014. The 46 revised regular papers, 11 short papers and 9 demonstration papers were carefully reviewed and selected from 176 submissions. 28 special session papers and 6 papers from Video Browser Showdown workshop are also included in the proceedings. The papers included in these two volumes cover a diverse range of topics including: applications of multimedia modelling, interactive retrieval, image and video collections, 3D and augmented reality, temporal analysis of multimedia content, compression and streaming. Special session papers cover the following topics: Mediadrom: artful post-TV scenarios, MM analysis for surveillance video and security applications, 3D multimedia computing and modeling, social geo-media analytics and retrieval, multimedia hyperlinking and retrieval.




Versatile Video Coding


Book Description

Video is the main driver of bandwidth use, accounting for over 80 per cent of consumer Internet traffic. Video compression is a critical component of many of the available multimedia applications; it is necessary for storage or transmission of digital video over today's band-limited networks. The majority of this video is coded using international standards developed through collaboration between an ITU-T Study Group and MPEG. The MPEG family of video coding standards began in the early 1990s with MPEG-1, developed for video and audio storage on CD-ROMs, with support for progressive video. MPEG-2 was standardized in 1995 for applications of video on DVD and standard and high definition television, with support for interlaced and progressive video. MPEG-4 Part 2, also known as MPEG-4 Visual, was standardized in 1999 for applications of low-bit-rate multimedia on mobile platforms and the Internet, with support for object-based or content-based coding by modeling the scene as background and foreground. Since MPEG-1, the main video coding standards have been based on so-called macroblocks. However, research groups continued their work beyond the traditional video coding architectures and found that macroblocks could limit compression performance for high-resolution video. Therefore, in 2013 High Efficiency Video Coding (HEVC), also known as H.265, was released, with a structure similar to H.264/AVC but using coding units with more flexible partitions than the traditional macroblocks. HEVC has greater flexibility in prediction modes and transform block sizes, and it also has more sophisticated interpolation and deblocking filters. VC-1 was released in 2006. It is a video codec implemented by Microsoft as Windows Media Video (WMV) 9 and standardized by the Society of Motion Picture and Television Engineers (SMPTE).
In 2017 the Joint Video Experts Team (JVET) released a call for proposals for a new video coding standard, initially called Beyond the HEVC or Future Video Coding (FVC), and now known as Versatile Video Coding (VVC). VVC is being built on top of HEVC for application to Standard Dynamic Range (SDR), High Dynamic Range (HDR) and 360° video. VVC is planned to be finalized by 2020. This book presents the new VVC and updates on HEVC. The book discusses the advances in lossless coding and covers the topic of screen content coding. Technical topics discussed include:
• Beyond the High Efficiency Video Coding
• High Efficiency Video Coding encoder
• Screen content
• Lossless and visually lossless coding algorithms
• Fast coding algorithms
• Visual quality assessment
• Other screen content coding algorithms
• Overview of JPEG Series




Advances in Visual Computing


Book Description

The two-volume set LNCS 7431 and 7432 constitutes the refereed proceedings of the 8th International Symposium on Visual Computing, ISVC 2012, held in Rethymnon, Crete, Greece, in July 2012. The 68 revised full papers and 35 poster papers presented together with 45 special track papers were carefully reviewed and selected from more than 200 submissions. The papers are organized in topical sections: Part I (LNCS 7431) comprises computational bioimaging; computer graphics; calibration and 3D vision; object recognition; illumination, modeling, and segmentation; visualization; 3D mapping, modeling and surface reconstruction; motion and tracking; optimization for vision, graphics, and medical imaging; HCI and recognition. Part II (LNCS 7432) comprises topics such as unconstrained biometrics: advances and trends; intelligent environments: algorithms and applications; applications; virtual reality; face processing and recognition.




Implementation of a Fast Inter-prediction Mode Decision in H.264/AVC Video Encoder


Book Description

H.264/MPEG-4 Part 10 or AVC (advanced video coding) is currently one of the most widely used industry standards for video compression. There are several video codec solutions, both software and hardware, available in the market for H.264. This video compression technology is primarily used in applications such as video conferencing, mobile TV, Blu-ray discs, digital television and Internet video streaming. This thesis uses the JM 17.2 reference software [15], which is available to all users and can be downloaded from http://iphome.hhi.de/suehring/tml. The software is mainly used for educational purposes; it also includes the reference software manual, which has information about installation, compilation and usage. In real-time applications such as video streaming and video conferencing, it is important that video encoding and decoding are fast. It is known that most of the complexity lies in the H.264 encoder, specifically in motion estimation (ME) and the mode decision process, which introduce high computational complexity and consume considerable central processing unit (CPU) time. The mode decision process is complex because of motion estimation with variable block sizes (16x16 to 4x4) and half- and quarter-pixel motion compensation. Hence, the objective of this thesis is to reduce the encoding time while maintaining the same quality and compression efficiency. The fast adaptive termination (FAT) [30] algorithm is used in the mode decision and motion estimation process. Based on their rate-distortion (RD) cost characteristics, all the inter modes are classified as either skip modes or non-skip modes. In order to select the best mode for any macroblock, the minimum RD cost of these two classes of modes is predicted. Further, for skip mode, an early-skip mode detection test is proposed; for non-skip mode, a three-stage scheme is proposed to speed up the mode decision process.
Experimental results demonstrate that the proposed technique is robust in coding efficiency across different quantization parameters (QP) and various video sequences. It achieves an encoding time saving of 47.6% with only a 0.01% decrease in the structural similarity index (SSIM), negligible degradation in peak signal-to-noise ratio (PSNR) and an acceptable increase in bit rate.
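The idea of an RD-cost-based mode decision with an early skip check can be illustrated generically. The sketch below is not the FAT algorithm from the thesis: it assumes the standard Lagrangian cost J = D + lambda * R, and the `Candidate` class, the `choose_mode` function, the `skip_threshold` parameter and all numeric values are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: str          # hypothetical mode label, e.g. "SKIP" or "16x16"
    distortion: float  # distortion between source and reconstruction
    rate_bits: float   # bits needed to code the macroblock in this mode


def rd_cost(c, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits


def choose_mode(skip, non_skip, lam, skip_threshold):
    """Return (mode, cost); stop early if the skip cost is low enough.

    skip_threshold is an assumed tuning parameter, not a value from the thesis.
    """
    skip_cost = rd_cost(skip, lam)
    if skip_cost <= skip_threshold:
        return skip.mode, skip_cost  # early termination: non-skip modes not evaluated
    best = min(non_skip, key=lambda c: rd_cost(c, lam))
    best_cost = rd_cost(best, lam)
    if skip_cost <= best_cost:
        return skip.mode, skip_cost
    return best.mode, best_cost


skip = Candidate("SKIP", distortion=100, rate_bits=1)
others = [Candidate("16x16", 40, 5), Candidate("8x8", 20, 12)]
early = choose_mode(skip, others, lam=10, skip_threshold=150)
full = choose_mode(skip, others, lam=10, skip_threshold=50)
print(early)  # ('SKIP', 110)
print(full)   # ('16x16', 90)
```

With a generous threshold the skip mode is accepted without evaluating the other candidates, which is where the time saving of any early-termination scheme comes from; with a strict threshold the full comparison runs.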




Image Analysis and Recognition


Book Description

The two-volume set LNCS 4141 and LNCS 4142 constitutes the refereed proceedings of the Third International Conference on Image Analysis and Recognition, ICIAR 2006. The volumes present 71 revised full papers and 92 revised poster papers together with 2 invited lectures. Volume I includes papers on image restoration and enhancement, image segmentation, image and video processing and analysis, image and video coding and encryption, image retrieval and indexing, and more.




Advances in Future Computer and Control Systems


Book Description

FCCS2012 is an integrated conference focusing on Future Computer and Control Systems. “Advances in Future Computer and Control Systems” presents the proceedings of the 2012 International Conference on Future Computer and Control Systems (FCCS2012), held April 21-22, 2012, in Changsha, China, including recent research results on Future Computer and Control Systems from researchers all around the world.