Toward Real-world Cross-view Image Geo-localization


Book Description

Cross-view image geo-localization aims to determine the locations of street-view query images by searching a GPS-tagged reference database of aerial-view images. One fundamental challenge is the dramatic viewpoint/domain difference between the street-view query images and the aerial-view reference images. Recent works have made great progress on bridging the domain gap with advanced deep learning techniques and geometric prior knowledge, i.e., the query is aligned at the center of one aerial-view reference image (spatial alignment) and the orientation relationship between the two views is known (orientation alignment). However, such prior knowledge of the geometric correspondence between the two views is usually not available in real-world scenarios. In this dissertation, we first explore how current models perform in real-world scenarios, where spatial or orientation alignment is not available and geometric priors (e.g., the polar transform) do not work well. For spatial alignment, we collect a new dataset with a real-world protocol for this scenario and propose a better solution, the first to explore multiple-reference correspondence and GPS offset prediction beyond image-level retrieval. For orientation alignment, we demonstrate better metric learning techniques for this scenario and propose to estimate the orientation without explicit supervision. We then propose a novel visual explanation method, together with the first quantitative analysis of visual explanations for deep metric learning, to gain a deeper understanding of the model with improved orientation estimation. Finally, we propose the first pure transformer-based method, which does not rely on geometric prior knowledge (the polar transform) and generalizes well to real-world scenarios without orientation or spatial alignment. We also quantify computational cost to show that our model is more efficient than previous methods. In summary, we push cross-view image geo-localization toward real-world application with more realistic settings, higher accuracy, lower computational cost, and better understanding/interpretation.
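
To make the retrieval formulation above concrete, here is a minimal sketch of the standard two-branch setup this line of work builds on: a street-view encoder and an aerial-view encoder map images into a shared embedding space, and the query is localized by nearest-neighbor search over the GPS-tagged reference set. The encoders below are random-projection stand-ins, and the image sizes, embedding dimension, and GPS tags are illustrative assumptions, not the dissertation's models or data.

```python
# Minimal sketch of cross-view retrieval: embed query and references into a
# shared space, then rank references by cosine similarity. Stand-in encoders only.
import numpy as np

rng = np.random.default_rng(0)

def embed(images: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Stand-in encoder: flatten, linearly project, and L2-normalize."""
    feats = images.reshape(len(images), -1) @ proj
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

# Toy data: 1 street-view query, 100 aerial-view references with GPS tags.
query_img = rng.random((1, 64, 64, 3))
ref_imgs = rng.random((100, 64, 64, 3))
ref_gps = rng.uniform(low=[-90, -180], high=[90, 180], size=(100, 2))

dim = 64 * 64 * 3
street_proj = rng.standard_normal((dim, 128))   # street-view branch
aerial_proj = rng.standard_normal((dim, 128))   # aerial-view branch

q = embed(query_img, street_proj)
r = embed(ref_imgs, aerial_proj)

# Embeddings are unit length, so the dot product is the cosine similarity.
sims = (q @ r.T).ravel()
best = int(np.argmax(sims))
print(f"predicted GPS: {ref_gps[best]}, similarity: {sims[best]:.3f}")
```

In a trained system the two projection matrices would be replaced by learned view-specific networks, and the top-ranked reference's GPS tag (optionally refined, e.g., by an offset prediction as discussed above) becomes the predicted location.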




Image and Graphics


Book Description

This three-volume set, LNCS 12888, 12889, and 12890, constitutes the refereed conference proceedings of the 11th International Conference on Image and Graphics, ICIG 2021, held in Haikou, China, in August 2021.* The 198 full papers presented were selected from 421 submissions and focus on advances in the theory, techniques, and algorithms of image, video, and graphics processing, as well as on fostering innovation, entrepreneurship, and networking. *The conference was postponed due to the COVID-19 pandemic.




Computational Visual Media


Book Description




Knowledge Science, Engineering and Management


Book Description

This volume set constitutes the refereed proceedings of the 16th International Conference on Knowledge Science, Engineering and Management, KSEM 2023, which was held in Guangzhou, China, during August 16–18, 2023. The 114 full papers and 30 short papers included in this book were carefully reviewed and selected from 395 submissions. They were organized in topical sections as follows: knowledge science with learning and AI; knowledge engineering research and applications; knowledge management systems; and emerging technologies for knowledge science, engineering and management.




Computer Vision -- ECCV 2014


Book Description

The seven-volume set comprising LNCS volumes 8689-8695 constitutes the refereed proceedings of the 13th European Conference on Computer Vision, ECCV 2014, held in Zurich, Switzerland, in September 2014. The 363 revised papers presented were carefully reviewed and selected from 1444 submissions. The papers are organized in topical sections on tracking and activity recognition; recognition; learning and inference; structure from motion and feature matching; computational photography and low-level vision; segmentation and saliency; context and 3D scenes; motion and 3D scene analysis; and poster sessions.




Bulletin of the Atomic Scientists


Book Description

The Bulletin of the Atomic Scientists is the premier public resource on scientific and technological developments that impact global security. Founded by Manhattan Project scientists, the Bulletin's iconic "Doomsday Clock" stimulates solutions for a safer world.




Visual Geo-localization and Location-aware Image Understanding


Book Description

Geo-localization is the problem of discovering the location where an image or video was captured. Recently, large-scale geo-localization methods devised for ground-level imagery, which employ techniques similar to image matching, have attracted much interest. In these methods, given a reference dataset composed of geo-tagged images, the problem is to estimate the geo-location of a query by finding its matching reference images. In this dissertation, we address three questions central to geo-spatial analysis of ground-level imagery: 1) How to geo-localize images and videos captured at unknown locations? 2) How to refine the geo-location of already geo-tagged data? 3) How to utilize the extracted geo-tags? We present a new framework for geo-locating an image utilizing a novel multiple-nearest-neighbor feature matching method based on Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference dataset. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. Our evaluations on a new dataset of 102k Street View images show that the proposed method outperforms the state of the art by 10 percent.
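
As a rough illustration of the matching idea described above, the sketch below keeps k candidate reference matches per query feature and selects one candidate per feature so that the chosen references agree in their global (GIST-like) descriptors. The dissertation formulates this selection exactly as a Generalized Minimum Clique Graph problem; here a greedy coordinate-descent pass is used as a simplified stand-in solver, and all descriptor values and costs are random placeholders.

```python
# Simplified stand-in for GMCP-based multi-nearest-neighbor matching:
# unary cost = local descriptor distance, pairwise cost = disagreement
# between the global descriptors of the selected reference images.
import numpy as np

rng = np.random.default_rng(1)

n_query_feats, k, gist_dim = 20, 5, 32

# local_cost[i, j]: descriptor distance of query feature i to its j-th candidate.
local_cost = rng.random((n_query_feats, k))
# cand_gist[i, j]: global descriptor of the reference image holding candidate (i, j).
cand_gist = rng.random((n_query_feats, k, gist_dim))

def total_cost(sel: np.ndarray, alpha: float = 1.0) -> float:
    """Sum of local match costs plus pairwise inconsistency of chosen references."""
    unary = local_cost[np.arange(n_query_feats), sel].sum()
    chosen = cand_gist[np.arange(n_query_feats), sel]           # (n, gist_dim)
    diffs = chosen[:, None, :] - chosen[None, :, :]
    pairwise = np.sqrt((diffs ** 2).sum(-1)).sum() / 2          # each pair counted once
    return unary + alpha * pairwise

# Start from the first nearest neighbors, then greedily re-pick each feature's
# candidate while holding the others fixed (an ICM-style approximation,
# standing in for the exact GMCP optimization).
selection = np.zeros(n_query_feats, dtype=int)
for _ in range(5):
    for i in range(n_query_feats):
        costs = [total_cost(np.where(np.arange(n_query_feats) == i, j, selection))
                 for j in range(k)]
        selection[i] = int(np.argmin(costs))

print("selected candidate per query feature:", selection)
print("final cost:", round(total_cost(selection), 3))
```

The chosen references (and their geo-tags) would then vote for the query location; the key design choice illustrated here is that the globally consistent candidate, not necessarily the first nearest neighbor, is kept for each query feature.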




Multimodal Location Estimation of Videos and Images


Book Description

This book presents an overview of the field of multimodal location estimation. The authors' aim is to describe the research results in this field in a unified way. The book describes fundamental methods of acoustic, visual, textual, social graph, and metadata processing as well as multimodal integration methods used for location estimation. In addition, the book covers benchmark metrics and explores the limits of the technology based on a human baseline. The book also outlines privacy implications and discusses directions for future research in the area.




Recent Advances in Image Restoration with Applications to Real World Problems


Book Description

In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling the widespread use of images in many diverse applications on Earth and in planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book aims to give the reader a glimpse of the latest developments and recent advances in image restoration, including image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.




Multiple View Geometry in Computer Vision


Book Description

A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Techniques for solving this problem are taken from projective geometry and photogrammetry. Here, the authors cover the geometric principles and their algebraic representation in terms of camera projection matrices, the fundamental matrix and the trifocal tensor. The theory and methods of computation of these entities are discussed with real examples, as is their use in the reconstruction of scenes from multiple images. The new edition features an extended introduction covering the key ideas in the book (which itself has been updated with additional examples and appendices) and significant new results which have appeared since the first edition. Comprehensive background material is provided, so readers familiar with linear algebra and basic numerical methods can understand the projective geometry and estimation algorithms presented, and implement the algorithms directly from the book.
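
As a small numerical companion to the blurb above, the sketch below illustrates one of the entities the book develops, the fundamental matrix F, which links corresponding points x and x' in two views through the epipolar constraint x'^T F x = 0. Synthetic correspondences are generated from two known camera projection matrices and F is recovered with the basic (unnormalized) eight-point linear algorithm; the camera parameters and point layout are illustrative assumptions, not examples taken from the book.

```python
# Toy eight-point estimate of the fundamental matrix from synthetic correspondences.
import numpy as np

rng = np.random.default_rng(2)

# Two simple camera projection matrices P = K [R | t] (identity rotation,
# second camera translated along x).
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[0.2], [0.0], [0.0]])])

# Random 3D points in front of both cameras, projected to homogeneous image points.
X = np.vstack([rng.uniform(-1, 1, (3, 12)) + np.array([[0], [0], [5]]),
               np.ones((1, 12))])
x1 = P1 @ X; x1 /= x1[2]
x2 = P2 @ X; x2 /= x2[2]

# Eight-point algorithm: each correspondence gives one linear equation in the
# entries of F (row-major), solved as the null vector of A via SVD.
A = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
              for (u1, v1), (u2, v2) in zip(x1[:2].T, x2[:2].T)])
_, _, Vt = np.linalg.svd(A)
F = Vt[-1].reshape(3, 3)

# Enforce the rank-2 constraint, then check the epipolar residuals x'^T F x.
U, S, Vt = np.linalg.svd(F)
F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
residuals = np.abs(np.einsum("in,ij,jn->n", x2, F, x1))
print("max |x'^T F x| over correspondences:", residuals.max())
```

With noise-free synthetic correspondences the residuals are numerically close to zero; the book's treatment covers the normalization, robust estimation, and degenerate configurations that matter when the correspondences come from real images.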