Robust Linear Prediction Analysis for Low Bit-rate Speech Coding


Book Description

Speech coding is a very important area of research in digital signal processing. It is a fundamental element of digital communications and has progressed at a fast pace in parallel with the growing demand for telecommunication services and capabilities. Most of the speech coders reported in the literature are based on linear prediction (LP) analysis. The Code Excited Linear Predictive (CELP) coder is a typical and popular example of this class of coders. This coder performs LP analysis of speech to extract LP coefficients and employs an analysis-by-synthesis procedure to search a stochastic codebook for the excitation signal. The method used for performing LP analysis plays an important role in the design of a CELP coder. The autocorrelation method is conventionally used for LP analysis. Though this works reasonably well for noise-free (clean) speech, its performance degrades when the signal is corrupted by noise. Spectral analysis of speech signals in noisy environments is an aspect of speech coding that deserves more attention. This dissertation studies the application of recently proposed robust LP analysis methods for estimating the power spectrum envelope of speech signals. These methods are the moving average, moving maximum and average threshold methods. The proposed methods are compared with the more commonly used methods of LP analysis, such as the conventional autocorrelation method and the Spectral Envelope Estimation Vocoder (SEEVOC) method. The Linear Predictive Coding (LPC) spectra calculated with the proposed methods are shown to be more robust. These methods work as well as the conventional methods when the speech signal is clean or has a high signal-to-noise ratio. Also, these robust methods give less quantisation distortion than the conventional methods. The application of these robust methods to speech compression using the CELP coder provides better speech quality than the conventional LP analysis methods.
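As a point of reference for the conventional autocorrelation method mentioned above, the following is a minimal Python sketch of LP analysis using the Levinson-Durbin recursion. It is not taken from the dissertation: the function name `lp_autocorrelation`, the Hamming window, and the sign convention (inverse filter A(z) with a[0] = 1) are assumptions made for illustration only.

```python
import numpy as np


def lp_autocorrelation(frame, order):
    """Estimate LP coefficients with the autocorrelation method.

    Returns (a, err): a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error (inverse) filter; err is the residual energy.
    """
    # Assumption: a Hamming window is applied before the analysis.
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation lags 0..order of the windowed frame.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: leave remaining coefficients at zero
            break
        # Reflection coefficient for stage i.
        k = -np.dot(a[:i], r[i:0:-1]) / err
        # Order update: a_new[j] = a[j] + k * a[i - j] for j = 0..i.
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        # Prediction-error energy shrinks by (1 - k^2) at every stage.
        err *= 1.0 - k * k
    return a, err
```

A typical call would be `a, err = lp_autocorrelation(frame, 10)` on a 20 to 30 ms frame of narrowband speech; the all-pole spectral envelope is then proportional to 1/|A(e^{jω})|². The robust methods named in the description (moving average, moving maximum, average threshold) are not implemented here.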




Ultra Low Bit-Rate Speech Coding


Book Description

"Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 kbit/s and less, particularly at the lower end of this range, down to 100 bps. The authors set forth the fundamental results and trends that make such ultra low bit-rates viable and provide a comprehensive overview of the techniques and systems in the literature to date, with particular attention to their own work in the paradigm of unit-selection based segment quantization. The book is for research students, academic faculty and researchers, and industry practitioners in the areas of speech processing and speech coding.




Algorithms and Software for Predictive and Perceptual Modeling of Speech


Book Description

From the early pulse code modulation-based coders to some of the recent multi-rate wideband speech coding standards, the area of speech coding has made several significant strides toward the objective of attaining high speech quality at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrow- and wide-band speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits well the source-system paradigm for speech synthesis. Limitations associated with the conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., the discrete all-pole modeling, the perceptual LP, the warped LP, the LP with modified filter structures, the IIR-based pure LP, all-pole modeling using the weighted-sum of LSP polynomials, the LP for low frequency emphasis, and the cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fitting performance or embed perceptual models in the LP. The first half of the book reviews some of the recent developments in predictive modeling of speech with the help of MATLAB™ simulation examples. The advantages of integrating perceptual models in low bit rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and "computational overhead" associated with these physiological models. Methods that exploit the masking properties of the human ear in speech coding standards, even today, are largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
The second half of the book reviews some of the recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of MATLAB™ simulations. Supplementary material, including the MATLAB™ programs and simulation examples presented in this book, is also available. Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech
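To make the perceptual weighting filter mentioned in this description concrete, here is a small Python sketch of one widely used form, W(z) = A(z) / A(z/γ), where A(z) is the LP inverse filter. This is an illustration and not code from the book (whose examples are in MATLAB); the function names and the default γ = 0.8 are assumptions, and individual coding standards use their own variants and constants.

```python
import numpy as np


def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma), i.e. a[k] * gamma**k."""
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(len(a))


def perceptual_weighting(x, a, gamma=0.8):
    """Filter x through W(z) = A(z) / A(z/gamma) in direct form.

    a is the LP inverse filter with a[0] == 1. gamma = 0.8 is an
    illustrative default, not a value taken from the book.
    """
    num = np.asarray(a, dtype=float)
    den = bandwidth_expand(a, gamma)
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: A(z) applied to the input.
        for k in range(len(num)):
            if n - k >= 0:
                acc += num[k] * x[n - k]
        # Feedback part: 1 / A(z/gamma) applied to past outputs.
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * y[n - k]
        y[n] = acc / den[0]
    return y
```

With γ = 1 the filter reduces to the identity, and as γ decreases the weighting de-emphasizes the formant regions, so that more quantization noise is allowed where the speech spectrum is likely to mask it.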




Speech and Audio Processing for Coding, Enhancement and Recognition


Book Description

This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization are also presented, along with recent advances and new paradigms in these areas.




Advances in Speech Coding


Book Description

Speech coding has been an ongoing area of research for several decades, yet the level of activity and interest in this area has expanded dramatically in the last several years. Important advances in algorithmic techniques for speech coding have recently emerged and excellent progress has been achieved in producing high quality speech at bit rates as low as 4.8 kb/s. Although the complexity of the newer, more sophisticated algorithms greatly exceeds that of older methods (such as ADPCM), today's powerful programmable signal processor chips allow rapid technology transfer from research to product development and permit many new cost-effective applications of speech coding. In particular, low bit rate voice technology is converging with the needs of the rapidly evolving digital telecommunication networks. The IEEE Workshop on Speech Coding for Telecommunications was held in Vancouver, British Columbia, Canada, from September 5 to 8, 1989. The objective of the workshop was to provide a forum for discussion of recent developments and future directions in speech coding. The workshop attracted over 130 researchers from several countries and its technical program included 51 papers.







Improved Speech Coding Based on Open-loop Parameter Estimation


Book Description

A nonlinear optimization algorithm for linear predictive speech coding was developed earlier that not only optimizes the linear model coefficients for the open loop predictor, but also includes the effects of quantization of the transmitted residual in the optimization. It simultaneously optimizes the quantization levels used for each speech segment. In this paper, we present an improved method for initializing this nonlinear algorithm, and demonstrate substantial improvements in performance. In addition, the new procedure produces monotonically improving speech quality as the number of bits used for the transmitted error residual increases. Examples of speech encoding and decoding are given for 8 speech segments, and signal-to-noise levels as high as 47 dB are produced. As in typical linear predictive coding, the optimization is done on the open loop speech analysis model. Here we demonstrate that minimizing the error of the closed loop speech reconstruction, instead of the simpler open loop optimization, is likely to produce negligible improvement in speech quality. The examples suggest that the algorithm here is close to giving the best performance obtainable from a linear model, for the chosen order and the chosen number of bits for the codebook.
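The open loop structure referred to in this abstract can be illustrated with a short Python sketch: the encoder forms the prediction residual from the original samples, quantizes it, and the decoder rebuilds the signal from the quantized residual alone. This is only a baseline illustration and not the paper's nonlinear optimization algorithm; the function name `open_loop_codec`, the fixed predictor taps, and the uniform quantizer with step size `step` are all assumptions made for this example.

```python
import numpy as np


def open_loop_codec(x, a, step):
    """Encode x with an open-loop linear predictor, then decode it.

    a holds the predictor taps: x[n] is predicted as
    sum_k a[k] * x[n - 1 - k]. step is the step size of a uniform
    residual quantizer (an illustrative choice, not the paper's).
    Returns (x_hat, snr_db).
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    p = len(a)

    # Encoder: residual computed from the original, unquantized samples.
    res = np.zeros(len(x))
    for n in range(len(x)):
        past = [x[n - 1 - k] if n - 1 - k >= 0 else 0.0 for k in range(p)]
        res[n] = x[n] - np.dot(a, past)

    # Uniform quantization of the transmitted residual.
    res_q = step * np.round(res / step)

    # Decoder: prediction runs on previously reconstructed samples.
    x_hat = np.zeros(len(x))
    for n in range(len(x)):
        past = [x_hat[n - 1 - k] if n - 1 - k >= 0 else 0.0 for k in range(p)]
        x_hat[n] = np.dot(a, past) + res_q[n]

    noise = np.sum((x - x_hat) ** 2)
    snr_db = np.inf if noise == 0 else 10.0 * np.log10(np.sum(x ** 2) / noise)
    return x_hat, snr_db
```

Because the encoder never sees the decoder's reconstruction, quantization error is filtered by the synthesis filter at the decoder. A closed loop design would instead predict from the reconstructed samples inside the encoder, which is the alternative the abstract reports as giving negligible improvement.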




Springer Handbook of Speech Processing


Book Description

This handbook plays a fundamental role in sustainable progress in speech research and development. With an accessible format and an accompanying DVD-ROM, it targets three categories of readers: graduate students, professors and active researchers in academia, and engineers in industry who need to understand or implement specific algorithms for their speech-related products. A superb source of application-oriented, authoritative and comprehensive information about these technologies, this work combines the established knowledge derived from research in such fast-evolving disciplines as Signal Processing and Communications, Acoustics, Computer Science and Linguistics.




Digital Speech


Book Description

Building on the success of the first edition, Digital Speech offers extensive new, updated and revised material based upon the latest research. This Second Edition continues to provide the fundamental technical background required for low bit rate speech coding and the hottest developments in digital speech coding techniques that are applicable to evolving communication systems. Features new chapters on Pitch Estimation and Voice-Unvoiced Classification of Speech, Harmonic Speech Coding and Multimode Speech Coding. Presents a comprehensively revised chapter entitled Analysis by Synthesis LPC Coding, including specific examples of popular speech coders such as CELP (Code-Excited Linear Predictive) Coding. Contains an updated chapter on Efficient LPC Quantization Methods including MSVQ and anti-aliasing filtering. Discusses Voice Activity Detection (VAD) methods. Offers expanded coverage of speech enhancement techniques such as echo cancellation and noise suppression. Written by a well-known, highly respected academic, this authoritative volume will be invaluable to practising engineers, network designers, computer scientists and advanced students in communications, electrical and electronic engineering.