Computational Models of Visual Processing


Book Description

The more than twenty contributions in this book, all new and previously unpublished, provide an up-to-date survey of contemporary research on computational modeling of the visual system. The approaches represented range from neurophysiology to psychophysics, and from retinal function to the analysis of visual cues to motion, color, texture, and depth. The contributions are linked thematically by a consistent consideration of the links between empirical data and computational models in the study of visual function. An introductory chapter by Edward Adelson and James Bergen gives a new and elegant formalization of the elements of early vision. Subsequent sections treat receptors and sampling, models of neural function, detection and discrimination, color and shading, motion and texture, and 3D shape. Each section is introduced by a brief topical review and summary.
Contributors: Edward H. Adelson, Albert J. Ahumada, Jr., James R. Bergen, David G. Birch, David H. Brainard, Heinrich H. Bülthoff, Charles Chubb, Nancy J. Coletta, Michael D'Zmura, John P. Frisby, Norma Graham, Norberto M. Grzywacz, P. William Haake, Michael J. Hawken, David J. Heeger, Donald C. Hood, Elizabeth B. Johnston, Daniel Kersten, Michael S. Landy, Peter Lennie, J. Stephen Mansfield, J. Anthony Movshon, Jacob Nachmias, Andrew J. Parker, Denis G. Pelli, Stephen B. Pollard, R. Clay Reid, Robert Shapley, Carlo L. M. Tiana, Brian A. Wandell, Andrew B. Watson, David R. Williams, Hugh R. Wilson, Yuede Yang, Alan L. Yuille








Integration of Natural Language and Vision Processing


Book Description

Although there has been much progress in developing theories, models, and systems in the areas of Natural Language Processing (NLP) and Vision Processing (VP), there has heretofore been little progress on integrating these subareas of Artificial Intelligence (AI). This book contains a set of edited papers addressing computational models and systems for the integration of NLP and VP. The papers focus on site descriptions, such as that of Japan's large $500 million Real World Computing (RWC) project; on historical philosophical issues; on systems that have been built and that integrate the processing of visual scenes with language about them; and on spatial relations, which appear to be the key to integration. The U.S.A., Japan, and the EU are all well represented, reflecting the fact that integration is a truly international issue. There is no doubt that all of this will be necessary for the information superhighways of the future.










Computational Vision


Book Description

This text provides an introduction to computational aspects of early vision, in particular, color, stereo, and visual navigation. It integrates approaches from psychophysics and quantitative neurobiology, as well as theories and algorithms from machine vision and photogrammetry. When presenting mathematical material, it uses detailed verbal descriptions and illustrations to clarify complex points. The text is suitable for upper-level students in neuroscience, biology, and psychology who have basic mathematical skills and are interested in studying the mathematical modeling of perception.




Selective Visual Attention


Book Description

Visual attention is a relatively new area of study combining a number of disciplines: artificial neural networks, artificial intelligence, vision science, and psychology. The aim is to build computational models similar to human vision in order to solve tough problems for many potential applications, including object recognition, unmanned vehicle navigation, and image and video coding and processing. In this book, the authors provide an up-to-date and highly applied introduction to the topic of visual attention, aiding researchers in creating powerful computer vision systems. Areas covered include the significance of vision research, psychology and computer vision, existing computational visual attention models, the authors' own visual attention models, and applications to various image and video processing tasks. This book is aimed at graduate students and researchers in neural networks, image processing, machine learning, computer vision, and other areas of biologically inspired model building and applications. The book can also be used by practicing engineers looking for techniques involving the application of image coding, video processing, machine vision, and brain-like robots to real-world systems. Other students and researchers with interdisciplinary interests will also find this book appealing. This book:
- Provides a key knowledge boost to developers of image processing applications
- Is unique in emphasizing the practical utility of attention mechanisms
- Includes a number of real-world examples that readers can implement in their own work: robot navigation and object selection, image and video quality assessment, and image and video coding
- Provides code for readers to apply in practical attention models and mechanisms




Human Perception of Visual Information


Book Description

Recent years have witnessed important advances in our understanding of the psychological underpinnings of subjective properties of visual information, such as aesthetics, memorability, or induced emotions. Concurrently, computational models of objective visual properties such as semantic labelling and geometric relationships have made significant breakthroughs using the latest achievements in machine learning and large-scale data collection. There has also been limited but important work exploiting these breakthroughs to improve computational modelling of subjective visual properties. The time is ripe to explore how advances in both of these fields of study can be mutually enriching and lead to further progress. This book combines perspectives from psychology and machine learning to showcase a new, unified understanding of how images and videos influence high-level visual perception, particularly interestingness, affective values and emotions, aesthetic values, memorability, novelty, complexity, visual composition and stylistic attributes, and creativity. These human-based metrics are relevant to a very broad range of current applications, from content retrieval and search and storytelling to targeted advertising, education and learning, and content filtering. Work already exists in the literature that studies the psychological aspects of these notions or investigates potential correlations between two or more of these human concepts. Attempts at building computational models capable of predicting such notions can also be found, using state-of-the-art machine learning techniques. Nevertheless, their performance shows that there is still room for improvement, as the tasks are by nature highly challenging and multifaceted, requiring attention both to the psychological implications of these human concepts and to their translation to machines.




Computational Models of Early Visual Processing Layers


Book Description

Visual information passes through layers of processing along the visual pathway, such as the retina, lateral geniculate nucleus (LGN), primary visual cortex (V1), prestriate cortex (V2), and beyond. Understanding the functional roles of these visual processing layers will not only help to explain psychophysical and neuroanatomical observations of these layers, but will also help to build intelligent computer vision systems that exhibit human-like behaviors and performance. One popular theory of the functional role of visual perception, the efficient coding theory, hypothesizes that the early visual processing layers serve to capture the statistical structure of the visual inputs by removing the redundancy in the visual outputs. Linear implementations of the efficient coding theory, such as independent component analysis (ICA) and sparse coding, learn visual features exhibiting the receptive field properties of V1 simple cells when they are applied to grayscale image patches. In this dissertation, we explore different aspects of the early visual processing layers by building computational models following the efficient coding theory. (1) We develop a hierarchical model, Recursive ICA, that captures nonlinear statistical structures of the visual inputs that cannot be captured by a single layer of ICA. The model is motivated by the idea that higher layers of the visual pathway, such as V2, might work under similar computational principles as the primary visual cortex. Hence we apply a second layer of ICA on top of the first ICA layer's outputs. To allow the second layer of ICA to better capture nonlinear statistical structures, we derive a coordinate-wise nonlinear activation function that transforms the first ICA layer's outputs into the second layer's inputs. When applied to grayscale image patches, the model's second layer learns nonlinear visual features, such as texture boundaries and shape contours.
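The two-layer scheme just described can be sketched in a few lines. This is a minimal illustration only, assuming scikit-learn's FastICA as a stand-in for the dissertation's ICA implementation; the synthetic patch data and the log-magnitude activation between layers are illustrative stand-ins (the dissertation derives its own coordinate-wise nonlinearity and trains on natural-image patches):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Stand-in for whitened grayscale image patches: 500 samples of
# 8x8 patches, drawn from a sparse (Laplacian) distribution.
X = rng.laplace(size=(500, 64))

# Layer 1: ICA, which on natural images learns V1-simple-cell-like
# localized, oriented features.
ica1 = FastICA(n_components=16, random_state=0, max_iter=500)
S1 = ica1.fit_transform(X)  # layer-1 responses, shape (500, 16)

# Coordinate-wise nonlinearity between layers. log|s| is an
# illustrative stand-in that exposes the magnitude (variance)
# structure of the layer-1 responses.
Z = np.log(np.abs(S1) + 1e-6)

# Layer 2: ICA on the transformed responses; on natural images this
# layer picks up nonlinear structure such as texture boundaries.
ica2 = FastICA(n_components=8, random_state=0, max_iter=500)
S2 = ica2.fit_transform(Z)  # layer-2 responses, shape (500, 8)

print(S1.shape, S2.shape)
```

The key design point is the intermediate activation: without it, stacking two linear ICA layers collapses to a single linear transform and the second layer can learn nothing new.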
We apply the above model to natural scene images, such as forest and grassland, to learn generic visual features, and then use these features for face and handwritten digit recognition. We obtain higher recognition rates than systems built with features designed specifically for face and digit recognition. (2) We show that retinal coding, the pre-cortical stage of visual processing, can also be explained by the efficient coding theory. The retinal coding model turns out to be another variation of Sparse PCA, a technique widely applied in signal processing, financial analysis, bioinformatics, etc. Compared with ICA, which removes the redundancy among the input dimensions, Sparse PCA removes redundancy among the input samples. We apply Sparse PCA to grayscale images, chromatic images, grayscale videos, environmental sound, and human speech, and learn visual and auditory features that exhibit the filtering properties of retinal ganglion cells and auditory nerve fibers. This work suggests that the pre-cortical stages of the visual and auditory pathways might work under similar computational principles.
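For concreteness, a Sparse PCA fit of the kind described above can be sketched as follows. This is a hedged illustration, not the dissertation's model: it uses scikit-learn's SparsePCA on synthetic patch data, whereas the dissertation derives its own Sparse PCA variant and trains on natural images and sounds:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
# Stand-in for grayscale image patches: 200 samples of 8x8 patches,
# drawn from a sparse (Laplacian) distribution.
X = rng.laplace(size=(200, 64))
X -= X.mean(axis=0)  # center the data, as PCA-family methods assume

# Sparse PCA constrains the components to be sparse; on natural
# stimuli this tends to yield localized filters, which is what makes
# it a candidate model of retinal (pre-cortical) coding.
spca = SparsePCA(n_components=5, alpha=1.0, max_iter=300,
                 random_state=0)
codes = spca.fit_transform(X)  # per-sample codes, shape (200, 5)

# Learned components (filters), shape (5, 64): each row is one
# 8x8 filter flattened to 64 pixels.
print(spca.components_.shape, codes.shape)
```

On random data the learned filters are of course meaningless; the dissertation's claim is that on natural images and sounds the same objective recovers center-surround-like retinal filters and auditory-nerve-like temporal filters.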