Speech Recognition and Coding


Book Description

Based on a NATO Advanced Study Institute held in 1993, this book addresses recent advances in automatic speech recognition and speech coding. The book contains contributions by many of the most outstanding researchers from the best laboratories worldwide in the field. The contributions have been grouped into five parts: on acoustic modeling; language modeling; speech processing, analysis and synthesis; speech coding; and vector quantization and neural nets. For each of these topics, some of the best-known researchers were invited to give a lecture. In addition to these lectures, the topics were complemented with discussions and presentations of the work of those attending. Altogether, the reader is given a wide perspective on recent advances in the field and will be able to see the trends for future work.







Readings in Speech Recognition


Book Description

After more than two decades of research activity, speech recognition has begun to live up to its promise as a practical technology and interest in the field is growing dramatically. Readings in Speech Recognition provides a collection of seminal papers that have influenced or redirected the field and that illustrate the central insights that have emerged over the years. The editors provide an introduction to the field, its concerns and research problems. Subsequent chapters are devoted to the main schools of thought and design philosophies that have motivated different approaches to speech recognition system design. Each chapter includes an introduction to the papers that highlights the major insights or needs that have motivated an approach to a problem and describes the commonalities and differences of that approach to others in the book.




Spoken Language Understanding


Book Description

Spoken language understanding (SLU) is an emerging field between speech and language processing, investigating human/machine and human/human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances, and their applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, using differing tasks and approaches to better understand and utilize such communications. This book covers the state-of-the-art approaches for the most popular SLU tasks with chapters written by well-known researchers in the respective fields. Key features include: Presents a fully integrated view of the two distinct disciplines of speech processing and language processing for SLU tasks. Defines what is possible today for SLU as an enabling technology for enterprise (e.g., customer care centers or company meetings) and consumer (e.g., entertainment, mobile, car, robot, or smart environments) applications and outlines the key research areas. Provides a unique source of distilled information on methods for computer modeling of semantic information in human/machine and human/human conversations. This book can be successfully used for graduate courses in electronics engineering, computer science or computational linguistics. Moreover, technologists interested in processing spoken communications will find it a useful source of collated information on the topic drawn from the two distinct disciplines of speech processing and language processing under the new area of SLU.




Pattern Recognition in Speech and Language Processing


Book Description

Over the last 20 years, approaches to designing speech and language processing algorithms have moved from methods based on linguistics and speech science to data-driven pattern recognition techniques. These techniques have been the focus of intense, fast-moving research and have contributed to significant advances in this field. Pattern Reco




Neural Models of Language Processes


Book Description

Neural Models of Language Processes offers an interdisciplinary approach to understanding the nature of human language and the means whereby we use it. The book is organized into five parts. Part I provides an opening framework that addresses three tasks: to place neurolinguistics in current perspective; to provide two case studies of aphasia; and to discuss the "rules of the game" of the various disciplines that contribute to this volume. Part II on artificial intelligence (AI) and processing models discusses the contribution of AI to neurolinguistics. The chapters in this section introduce three AI systems for language perception: the HWIM and HEARSAY systems that proceed from an acoustic input to a semantic interpretation of the utterance it represents, and Marcus' system for parsing sentences presented in text. Studying these systems demonstrates the virtues of implemented or implementable models. Part III on linguistic and psycholinguistic perspectives includes studies such as nonaphasic language behavior and the linguistics and psycholinguistics of sign language. Part IV examines neurological perspectives such as the neuropathological basis of Broca's aphasia and the simulation of speech production without a computer. Part V on neuroscience and brain theory includes studies such as the histology, architectonics, and asymmetry of language areas; hierarchy and evolution in neurolinguistics; and perceptual-motor processes and the neural basis of language.




The Voice in the Machine


Book Description

An examination of more than sixty years of successes and failures in developing technologies that allow computers to understand human spoken language. Stanley Kubrick's 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to "say or press 1"? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer? In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech may not have HAL's capacity for conversation, they have capabilities that make them usable in many applications today and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding processes from waveform methods to artificial intelligence approaches to statistical learning and modeling of human speech based on a rigorous mathematical model--specifically, Hidden Markov Models (HMM). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?







The Art and Business of Speech Recognition


Book Description

Most people have experienced an automated speech-recognition system when calling a company. Instead of prompting callers to choose an option by entering numbers, the system asks questions and understands spoken responses. With a more advanced application, callers may feel as if they're having a conversation with another person. Not only will the system respond intelligently, its voice even has personality. The Art and Business of Speech Recognition examines both the rapid emergence and broad potential of speech-recognition applications. By explaining the nature, design, development, and use of such applications, this book addresses two particular needs: Business managers must understand the competitive advantage that speech-recognition applications provide: a more effective way to engage, serve, and retain customers over the phone. Application designers must know how to meet their most critical business goal: a satisfying customer experience. Author Blade Kotelly illuminates these needs from the perspective of an experienced, business-focused practitioner. Among the diverse applications he's worked on, perhaps his most influential design is the flight-information system developed for United Airlines, about which Julie Vallone wrote in Investor's Business Daily "By the end of the conversation, you might want to take the voice to dinner." If dinner is the analogy, this concise book is an ideal first course. Managers will learn the potential of speech-recognition applications to reduce costs, increase customer satisfaction, enhance the company brand, and even grow revenues. Designers, especially those just beginning to work in the voice domain, will learn user-interface design principles and techniques needed to develop and deploy successful applications. The examples in the book are real, the writing is accessible and lucid, and the solutions presented are attainable today.




No Code Required


Book Description

No Code Required presents the various designs, system architectures, research methodologies, and evaluation strategies that are used by end users programming on the Web. It also presents the tools that will allow users to participate in the creation of their own Web. Comprised of seven parts, the book provides basic information about the field of end-user programming. Part 1 points out that the Firefox browser is one of the differentiating factors considered for end-user programming on the Web. Part 2 discusses the automation and customization of the Web. Part 3 covers the different approaches to proposing a specialized platform for creating a new Web browser. Part 4 discusses three systems that focus on the customized tools that will be used by the end users in exploring large amounts of data on the Web. Part 5 explains the role of natural language in the end-user programming systems. Part 6 provides an overview of the assumptions on the accessibility of the Web site owners of the Web content. Lastly, Part 7 offers the idea of the Web-active end user, an individual who is seeking new technologies.

- The first book since Web 2.0 that covers the latest research, development, and systems emerging from HCI research labs on end-user programming tools
- Featuring contributions from the creators of Adobe's Zoetrope and Intel's Mash Maker, discussing test results, implementation, feedback, and ways forward in this booming area