A survey of deep learning-based cross-view geo-localization methods

周博文, 李阳, 马鑫骥, 苗壮, 张睿(陆军工程大学)

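Abstract
Cross-view geo-localization is an important problem in computer vision. Because it enables real-time localization in environments where satellite positioning is unavailable, it has long attracted attention in fields such as image registration, navigation and positioning, and image retrieval. Traditional cross-view geo-localization methods extract features with hand-crafted descriptors, which limits localization accuracy. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream approach. However, because the task involves multiple steps and draws on knowledge transferred from a wide range of areas, the field still lacks a dedicated survey. This paper presents the first comprehensive survey of deep learning-based cross-view geo-localization methods from the perspective of the task framework. After an overview of the problem, it summarizes developments in data preprocessing, deep learning networks, feature attention modules, and loss functions. Drawing on nearly one hundred high-impact papers, it summarizes the characteristics of the cross-view geo-localization task and ideas for improvement, which may help researchers design new methods. In addition, 10 different deep learning-based cross-view geo-localization methods are tested on each of two representative datasets, and existing methods are evaluated in terms of accuracy, number of model parameters, and inference speed. Finally, based on this analysis and with practical applications in mind, the paper points out open problems in the field and discusses future trends, in the hope of providing a reference for interested researchers.
Keywords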
A survey of cross-view geo-localization methods based on deep learning

Zhou Bowen, Li Yang, Ma Xinji, Miao Zhuang, Zhang Rui(Army Engineering University of PLA)

Abstract
Cross-view geo-localization aims to estimate the geographic location of a target by matching images taken from different viewpoints. It is usually treated as an image retrieval task, a formulation widely adopted in artificial intelligence tasks such as person re-identification, vehicle re-identification, and image registration. The main challenge is the drastic appearance change between viewpoints, which reduces retrieval performance. Conventional cross-view geo-localization techniques rely on hand-crafted features, which limits localization precision. With the development of deep learning, deep learning-based methods have become the mainstream approach. However, because the task involves multiple steps and draws on knowledge transferred from many areas, the field still lacks a dedicated literature review. This paper presents the first review of deep learning-based cross-view geo-localization methods and provides a comprehensive overview of the current state of the art. It focuses on developments in data preprocessing, deep learning networks, feature attention modules, and loss functions. Data preprocessing covers feature alignment, sampling strategies, and data augmentation. Feature alignment serves as prior knowledge for cross-view geo-localization and helps improve localization accuracy; the use of generative adversarial networks (GANs) has emerged as a prominent trend for feature alignment. In addition, the discrepancy in the number of satellite, ground, and drone images calls for effective sampling strategies and data augmentation techniques to balance training.
Deep learning networks play a critical role in extracting image features, and their performance directly affects the accuracy of cross-view geo-localization. In general, methods that use a Transformer backbone achieve higher accuracy than those that use a ResNet backbone, and methods that use ConvNeXt perform best among all approaches. Feature attention modules are designed to extract further image features and enhance the discriminative power of the model. By learning effective attention mechanisms, these modules adaptively weight the input images or feature maps so that the model focuses on task-relevant regions or features. Experimental results show that feature attention modules can exploit feature information that would otherwise be overlooked. Loss functions help the model fit the data and converge faster. Their values guide the training direction of the whole network, enabling the model to learn better representations and further improving accuracy. Commonly used loss functions include contrastive loss, triplet loss, and three other types of loss. As loss functions have improved, the pairing of samples used in each training step has evolved from one-to-one to one-to-many, allowing the model to cover all samples during training and further improving its performance. Through an analysis of nearly one hundred influential papers, this paper summarizes the characteristics of cross-view geo-localization tasks and ideas for improving them, which can inspire researchers to design new methods. In addition, this paper tests 10 deep learning-based cross-view geo-localization methods on two representative datasets.
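As an illustration of the triplet loss and of the move from one-to-one to one-to-many sampling mentioned above, the following is a minimal sketch in plain Python. It uses the standard margin-based formulation rather than any specific method covered by the survey; the function names, the margin value, and the 2-D embeddings are assumptions chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negatives, margin=0.3):
    """Margin-based triplet loss averaged over one or more negatives.

    anchor, positive: features of the same location seen from two views.
    negatives: list of features from other locations.
    margin: illustrative value, not taken from the survey.
    """
    d_pos = euclidean(anchor, positive)
    losses = [max(0.0, d_pos - euclidean(anchor, n) + margin) for n in negatives]
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings: a ground-view query, its matching satellite
# image, and two satellite images of other places.
ground = [1.0, 0.0]
satellite_match = [0.9, 0.1]
satellite_others = [[0.0, 1.0], [1.0, 0.2]]

# One-to-one: only the first negative is compared with the query.
one_to_one = triplet_loss(ground, satellite_match, satellite_others[:1])
# One-to-many: every available negative contributes to the loss.
one_to_many = triplet_loss(ground, satellite_match, satellite_others)
```

In this toy case the one-to-one loss is zero because the single sampled negative is already far from the query, while the one-to-many loss is positive because the second negative lies close to it. This is one way to read the claim that covering more samples per step gives the model more to learn from.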
The evaluation reports the backbone network type and input image size of each method. On the University-1652 dataset, two accuracy metrics (R@1 and AP), the number of model parameters, and inference speed are evaluated. On the CVUSA dataset, four accuracy metrics (R@1, R@5, R@10, and R@Top1) are evaluated. The experimental results show that a better-performing backbone network and a larger input image size both improve model performance. Finally, building on this review of the current state of the art, we discuss the remaining challenges and suggest several directions for future research on cross-view geo-localization, in the hope of providing a useful reference for researchers interested in this field.
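The retrieval metrics named above can be sketched for a single query as follows. This is a minimal plain-Python illustration using the common non-interpolated definition of average precision; benchmark toolkits may compute AP slightly differently, and the function names and gallery identifiers are hypothetical.

```python
def recall_at_k(ranked_ids, true_ids, k):
    """Return 1.0 if any true match appears in the top-k results, else 0.0."""
    return 1.0 if any(r in true_ids for r in ranked_ids[:k]) else 0.0


def average_precision(ranked_ids, true_ids):
    """Mean of the precision values at each rank where a true match appears."""
    hits = 0
    precisions = []
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate in true_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(true_ids) if true_ids else 0.0


# Hypothetical gallery ranking for one query, ordered from most to least
# similar, with two gallery images showing the true location.
ranking = ["g3", "g1", "g7", "g2"]
matches = {"g1", "g2"}

r_at_1 = recall_at_k(ranking, matches, 1)
r_at_5 = recall_at_k(ranking, matches, 5)
ap = average_precision(ranking, matches)
```

Dataset-level R@K and AP are then the averages of these per-query values over all queries.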
Keywords
