A review of multimodal remote sensing image registration methods

Zhu Bai, Ye Yuanxin (School of Geosciences and Environmental Engineering, Southwest Jiaotong University)

Abstract
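With the continuous development of Earth observation technology, numerous integrated stereoscopic observation systems have been deployed on spaceborne, airborne, and ground-based platforms. These sensors can dynamically provide multimodal remote sensing images with different spatial, temporal, and spectral resolutions, and only by fully exploiting these diverse multimodal images can more reliable and comprehensive Earth observation results be delivered for applications such as natural resource management, disaster prevention and mitigation, and environmental monitoring. However, because different sensors rely on different imaging mechanisms, multimodal images exhibit significant radiometric, geometric, temporal, and viewpoint differences, which pose a major challenge to high-precision multimodal registration. To advance research on multimodal remote sensing image registration, this paper systematically reviews, analyzes, and summarizes the current mainstream methods. We first trace the evolution of research from single-modal to multimodal remote sensing image registration. We then analyze the core ideas of representative area-based, feature-based, and deep-learning-based algorithms and provide links to the code of those that are open source. We also present the publicly available multimodal remote sensing image registration datasets and describe their content and characteristics in detail. Finally, we identify the open problems and serious challenges in current research on high-precision multimodal registration and offer a forward-looking outlook on future research directions, with the aim of fostering deeper breakthroughs and innovation in the field.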
Keywords
Multimodal remote sensing image registration: a survey

Zhu Bai, Ye Yuanxin (School of Geosciences and Environmental Engineering, Southwest Jiaotong University)

Abstract
With the advent of new infrastructure construction and the era of intelligent photogrammetry, spaceborne and airborne remote sensing technology has developed rapidly worldwide. Numerous integrated multisensor stereoscopic observation systems have been deployed on spaceborne, airborne, and terrestrial platforms, and sensor types have evolved from traditional single-modal sensors (such as optical sensors) to a new generation of multimodal sensors (such as multispectral, hyperspectral, LiDAR, and SAR sensors). These advanced sensors can dynamically provide multimodal remote sensing images with different spatial, temporal, and spectral resolutions, and joint processing of spaceborne, airborne, and terrestrial multimodal data yields more reliable, comprehensive, and accurate observation results than single-modal sensors can. Research on multimodal remote sensing image registration is therefore of great scientific significance: only by fully integrating and exploiting diverse multimodal remote sensing images can multilevel, multiperspective Earth observation be achieved effectively. To advance research on multimodal remote sensing image registration, we systematically review, analyze, and summarize the current mainstream methods. We first trace the evolution of research from single-modal to multimodal remote sensing image registration. We then analyze the core ideas of representative algorithms in the area-based, feature-based, and deep-learning-based pipelines, and introduce the contributions of the authors' team to the field. The area-based (template matching) pipeline mainly comprises two types of methods: information-theory-based and structural-feature-based.
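A classic similarity measure in the information-theory-based branch of template matching is mutual information (MI). The pure-Python sketch below is an illustrative assumption, not any specific algorithm from the survey: it quantizes intensities into 16 bins, estimates MI from the joint histogram of a template and each candidate reference window, and keeps the best-scoring offset. The function names (`mutual_information`, `window`, `match_template_mi`), the bin count, and the exhaustive search over every position are hypothetical choices made for clarity.

```python
import math
import random
from collections import Counter


def mutual_information(a, b, bins=16, max_val=256):
    """Mutual information (nats) between two equal-length intensity sequences,
    estimated from a joint histogram after quantizing to `bins` levels."""
    n = len(a)
    qa = [x * bins // max_val for x in a]
    qb = [y * bins // max_val for y in b]
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[i] * pb[j]))
    return mi


def window(img, r, c, h, w):
    """Flatten the h x w window of `img` (a list of rows) with top-left corner (r, c)."""
    return [img[r + i][c + j] for i in range(h) for j in range(w)]


def match_template_mi(reference, template, bins=16):
    """Slide `template` over every position of `reference` and return the
    top-left offset (row, col) with the highest MI, together with that score."""
    h, w = len(template), len(template[0])
    t = window(template, 0, 0, h, w)
    best_score, best_offset = -math.inf, None
    for r in range(len(reference) - h + 1):
        for c in range(len(reference[0]) - w + 1):
            score = mutual_information(window(reference, r, c, h, w), t, bins)
            if score > best_score:
                best_score, best_offset = score, (r, c)
    return best_offset, best_score


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [[rng.randrange(256) for _ in range(30)] for _ in range(30)]
    # Hypothetical "second modality": shuffle the 16 intensity bins, a
    # non-monotonic remapping under which linear correlation is unreliable.
    perm = list(range(16))
    rng.shuffle(perm)
    template = [[16 * perm[v // 16] + v % 16 for v in row[7:19]]
                for row in reference[5:17]]
    offset, score = match_template_mi(reference, template)
    print(offset, round(score, 3))  # offset should be (5, 7)
```

Because MI only asks how well one image's intensity bins predict the other's, any one-to-one remapping of bins keeps the true offset at the maximum. In exchange, the joint histogram becomes noisy for small templates, and the brute-force search costs one MI evaluation per candidate offset.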
Structural-feature-based methods use either sparse or dense structural features. Considering both registration robustness and efficiency, dense-structural-feature-based methods are clearly effective in handling significant nonlinear radiometric differences between multimodal remote sensing images and can meet many current application needs. However, the area-based pipeline generally relies on the georeferencing of remote sensing images to predict the approximate search range for template matching. Feature-based registration methods can be further divided into three categories, based respectively on gradient optimization, local self-similarity, and phase congruency. Gradient-optimization methods usually design consistent gradients for specific types of multimodal images; as a result, they generalize poorly and struggle to maintain the same performance on other types of multimodal images. Local self-similarity (LSS) methods also have limitations: the relatively low discriminability of LSS descriptors may prevent robust matching under complex nonlinear radiometric differences. Phase-congruency methods have high computational complexity, so registration is generally time-consuming. The feature-based pipeline uses the local spatial relationships between adjacent pixels to construct a high-dimensional feature vector for each feature point. Compared with template matching methods, these methods usually carry a heavy computational burden, and severe outliers are almost inevitable in matching, especially in multimodal cases where scale, rotation, and radiometric differences coexist.
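Dense structural-feature methods replace raw intensities with a per-pixel descriptor of local structure and then run template matching on the descriptor maps, typically only within a search window predicted from georeferencing. The sketch below is a deliberately simplified, hypothetical illustration in that spirit, not a reproduction of any published descriptor: each pixel stores its gradient magnitude in one of six unsigned orientation bins (angles folded into [0, π), so a contrast reversal between modalities leaves the descriptor unchanged), and candidate offsets within `radius` pixels of the predicted position are ranked by the sum of squared differences. The function names, the bin count, and the hard binning are assumptions; practical dense descriptors are richer, for example with smoothing across space and orientation.

```python
import math
import random


def dense_orientation_features(img, n_bins=6):
    """Simplified per-pixel structural descriptor (hypothetical): each interior
    pixel stores its gradient magnitude in one of `n_bins` unsigned orientation
    bins spanning [0, pi). Folding angles modulo pi keeps the descriptor
    unchanged when a modality reverses contrast (gradient sign flips).
    Border pixels are left as all-zero vectors."""
    h, w = len(img), len(img[0])
    feats = [[[0.0] * n_bins for _ in range(w)] for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            theta = math.atan2(gy, gx) % math.pi
            feats[y][x][int(theta / math.pi * n_bins) % n_bins] = mag
    return feats


def ssd(fa, fb, ya, xa, yb, xb, size):
    """Sum of squared differences between two size x size blocks of feature maps."""
    total = 0.0
    for i in range(size):
        for j in range(size):
            for p, q in zip(fa[ya + i][xa + j], fb[yb + i][xb + j]):
                total += (p - q) ** 2
    return total


def match_in_window(ref, sensed, tpl_yx, size, predicted_yx, radius, n_bins=6):
    """Locate the size x size template at `tpl_yx` in `sensed` within `ref`,
    testing only offsets within `radius` pixels of `predicted_yx` (for example
    a position predicted from georeferencing). Returns ((row, col), cost)."""
    fr = dense_orientation_features(ref, n_bins)
    fs = dense_orientation_features(sensed, n_bins)
    ty, tx = tpl_yx
    py, px = predicted_yx
    best_cost, best_yx = math.inf, None
    for y in range(py - radius, py + radius + 1):
        for x in range(px - radius, px + radius + 1):
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue  # candidate window falls outside the reference image
            cost = ssd(fr, fs, y, x, ty, tx, size)
            if cost < best_cost:
                best_cost, best_yx = cost, (y, x)
    return best_yx, best_cost


if __name__ == "__main__":
    rng = random.Random(7)
    ref = [[rng.randrange(256) for _ in range(40)] for _ in range(40)]
    # Simulated sensed image: a contrast-inverted crop of `ref`, offset by (3, 4).
    sensed = [[255 - ref[y + 3][x + 4] for x in range(30)] for y in range(30)]
    # The template at (8, 8) in `sensed` corresponds to (11, 12) in `ref`;
    # the "georeferenced" prediction (9, 14) is deliberately a few pixels off.
    print(match_in_window(ref, sensed, (8, 8), 12, (9, 14), 4))  # should report (11, 12) with zero cost
```

A pure contrast inversion is the easiest nonlinear radiometric difference to neutralize; real multimodal pairs are far less tidy, which is why published dense descriptors are more elaborate than this sketch. Restricting the search to a small window around the predicted position also keeps the cost proportional to the georeferencing error rather than to the image size.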
In general, the registration robustness of feature-based methods is less stable than that of area-based methods. The deep-learning-based pipeline can be divided into modular and end-to-end registration methods. The most common strategy in modular methods is to embed deep networks into feature-based or area-based pipelines, exploiting the fully data-driven, high-dimensional feature extraction of deep learning to produce more robust features, more effective descriptors, or better similarity measures, thereby improving registration robustness. Modular methods can be subdivided into three categories: learning-based template matching, learning-based feature matching, and style-transfer-based modality unification. Modular methods are easy to train and highly flexible, but they can hardly avoid the error accumulation that commonly arises in multistage tasks, and they may become trapped in local optima. End-to-end methods construct a single neural network that directly estimates the geometric transformation parameters or the deformation field to register the images. Their training objective is consistent across the whole network and can reach a globally optimal solution, but they are difficult to train and poorly interpretable. Moreover, no complete and comprehensive database covering all types of multimodal remote sensing image pairs currently exists, and this shortage of training and testing data greatly limits the development of deep-learning-based registration methods. We further share the existing public multimodal remote sensing image registration datasets, supplemented by a small number of registration datasets from the field of computer vision.
Finally, we analyze the existing problems and challenges in current research on high-precision registration of multimodal remote sensing images and offer a forward-looking outlook on future research trends, with the aim of promoting further breakthroughs and innovation in the field of multimodal remote sensing image registration.
Keywords
