Distant Speech Recognition


Book Description

A complete overview of distant automatic speech recognition. The performance of conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. While traditional ASR systems underperform for speech captured with far-field sensors, there are a number of novel techniques within the recognition system, as well as techniques developed in other areas of signal processing, that can mitigate the deleterious effects of noise and reverberation and separate speech from overlapping speakers. Distant Speech Recognition presents a contemporary and comprehensive description of both theoretic abstraction and practical issues inherent in the distant ASR problem. Key Features: covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it; provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems; gives relevant background information in acoustics and filter techniques; explains the extraction and enhancement of classification-relevant speech features; describes maximum likelihood as well as discriminative parameter estimation, and maximum likelihood normalization techniques; discusses the use of multi-microphone configurations for speaker tracking and channel combination; presents several applications of the methods and technologies described in this book; includes an accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems. This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students in speech technology, signal processing, acoustics, statistics and artificial intelligence fields.




Springer Handbook of Speech Processing


Book Description

This handbook plays a fundamental role in sustainable progress in speech research and development. With an accessible format and an accompanying DVD-ROM, it targets three categories of readers: graduate students, professors and active researchers in academia, and engineers in industry who need to understand or implement specific algorithms for their speech-related products. A superb source of application-oriented, authoritative and comprehensive information about these technologies, this work combines the established knowledge derived from research in such fast-evolving disciplines as Signal Processing and Communications, Acoustics, Computer Science and Linguistics.







Companion Technology


Book Description

Future technical systems will be companion systems, competent assistants that provide their functionality in a completely individualized way, adapting to a user’s capabilities, preferences, requirements, and current needs, and taking into account both the emotional state and the situation of the individual user. This book presents the enabling technology for such systems. It introduces a variety of methods and techniques to implement an individualized, adaptive, flexible, and robust behavior for technical systems by means of cognitive processes, including perception, cognition, interaction, planning, and reasoning. The technological developments are complemented by empirical studies from psychological and neurobiological perspectives.




Speech & Language Processing


Book Description




Multimodal Technologies for Perception of Humans


Book Description

This book constitutes the thoroughly refereed joint post-workshop proceedings of two co-located events: the Second International Workshop on Classification of Events, Activities and Relationships, CLEAR 2007, and the 5th Rich Transcription 2007 Meeting Recognition evaluation, RT 2007, held in succession in Baltimore, MD, USA, in May 2007. The workshops had complementary evaluation efforts: CLEAR for the evaluation of human activities, events, and relationships in multiple multimodal data domains, and RT for the evaluation of speech transcription-related technologies from meeting room audio collections. The 35 revised full papers presented from CLEAR 2007 cover 3D person tracking, 2D face detection and tracking, person and vehicle tracking on surveillance data, vehicle and person tracking in aerial videos, person identification, head pose estimation, and acoustic event detection. The 15 revised full papers presented from RT 2007 are organized in topical sections on speech-to-text and speaker diarization.




Speech Enhancement


Book Description

We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise reduction but also dereverberation and separation of independent signals. These topics are also covered in this book. However, the general emphasis is on noise reduction because of the large number of applications that can benefit from this technology. The goal of this book is to provide a strong reference for researchers, engineers, and graduate students who are interested in the problem of signal and speech enhancement. To do so, we invited well-known experts to contribute chapters covering the state of the art in this focused field.




Human-Robot Interaction


Book Description

This book offers the first comprehensive yet critical overview of methods used to evaluate interaction between humans and social robots. It reviews commonly used evaluation methods, and shows that they are not always suitable for this purpose. Using representative case studies, the book identifies good and bad practices for evaluating human-robot interactions and proposes new standardized processes as well as recommendations, carefully developed on the basis of intensive discussions between specialists in various HRI-related disciplines, e.g. psychology, ethology, ergonomics, sociology, ethnography, robotics, and computer science. The book is the result of a close, long-standing collaboration between the editors and the invited contributors, including, but not limited to, their inspiring discussions at the workshop on Evaluation Methods Standardization for Human-Robot Interaction (EMSHRI), which has been organized yearly since 2015. By highlighting and weighing good and bad practices in evaluation design for HRI, the book will stimulate the scientific community to search for better solutions, take advantage of interdisciplinary collaborations, and encourage the development of new standards to accommodate the growing presence of robots in the day-to-day and social lives of human beings.




Artificial Neural Networks and Machine Learning – ICANN 2016


Book Description

The two-volume set, LNCS 9886 + 9887, constitutes the proceedings of the 25th International Conference on Artificial Neural Networks, ICANN 2016, held in Barcelona, Spain, in September 2016. The 121 full papers included in this volume were carefully reviewed and selected from 227 submissions. They are organized in topical sections named: from neurons to networks; networks and dynamics; higher nervous functions; neuronal hardware; learning foundations; deep learning; classifications and forecasting; and recognition and navigation. The back matter of the volume also includes 47 short paper abstracts.