Cross-View Image Sequence Geo-localization

Paper Title

Cross-View Image Sequence Geo-localization

Authors

Xiaohan Zhang, Waqas Sultani, Safwan Wshah

Abstract

Cross-view geo-localization aims to estimate the GPS location of a query ground-view image by matching it against a reference database of geo-tagged aerial images. To address this challenging problem, recent approaches use panoramic ground-view images to increase the range of visibility. Although appealing, panoramic images are less readily available than videos made up of limited Field-Of-View (FOV) frames. In this paper, we present the first cross-view geo-localization method that works on a sequence of limited-FOV images. Our model is trained end-to-end to capture the temporal structure within the frames using an attention-based temporal feature aggregation module. To robustly handle varying sequence lengths and GPS noise during inference, we propose a sequential dropout scheme that simulates variable-length sequences. To evaluate the proposed approach in realistic settings, we present a new large-scale dataset containing ground-view sequences along with the corresponding aerial-view images. Extensive experiments and comparisons demonstrate the superiority of the proposed approach over several competitive baselines.
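
The two ideas named in the abstract, dropping frames to simulate shorter sequences and attention-weighted aggregation over the remaining frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sequential_dropout` and `masked_attention_pool`, the per-frame independent drop probability, and the use of externally supplied attention scores (which a real model would presumably learn) are all assumptions made for illustration.

```python
import math
import random


def sequential_dropout(num_frames, drop_prob, rng):
    """Hypothetical helper: return a keep-mask over the frames of a sequence.

    Each frame is dropped independently with probability drop_prob
    (an assumed scheme); at least one frame is always kept.
    """
    mask = [rng.random() >= drop_prob for _ in range(num_frames)]
    if not any(mask):
        mask[rng.randrange(num_frames)] = True
    return mask


def masked_attention_pool(features, scores, mask):
    """Hypothetical helper: softmax-weighted average over the kept frames.

    features: list of per-frame feature vectors (lists of floats)
    scores:   one attention score per frame (assumed given, not learned here)
    mask:     keep-mask from sequential_dropout; dropped frames get no weight
    """
    kept = [i for i, keep in enumerate(mask) if keep]
    max_score = max(scores[i] for i in kept)  # subtract max for numerical stability
    weights = {i: math.exp(scores[i] - max_score) for i in kept}
    total = sum(weights.values())
    dim = len(features[0])
    return [
        sum(weights[i] / total * features[i][d] for i in kept)
        for d in range(dim)
    ]


# Usage: three frames with 2-D features and uniform attention scores.
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
mask = sequential_dropout(len(features), drop_prob=0.5, rng=rng)
pooled = masked_attention_pool(features, scores, mask)
```

Because dropped frames are excluded from the softmax, the pooled descriptor has the same dimensionality whatever the number of surviving frames, which is one plausible way a model could be exposed to variable-length sequences during training.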
