Multimodal remote sensing image registration: a survey

Zhu Bai, Ye Yuanxin (School of Geosciences and Environmental Engineering, Southwest Jiaotong University)

Abstract

With the advent of new infrastructure construction and the era of intelligent photogrammetry, global spaceborne and airborne remote sensing technology has developed rapidly. Numerous integrated multi-sensor stereoscopic observation facilities have been deployed on spaceborne, airborne and terrestrial platforms, and sensor types have evolved from traditional single-modal sensors (such as optical sensors) to a new generation of multimodal sensors (such as multispectral, hyperspectral, LiDAR and SAR sensors). These advanced sensors can dynamically provide multimodal remote sensing images with different spatial, temporal and spectral resolutions, and joint processing of spaceborne, airborne and terrestrial multimodal data yields more reliable, comprehensive and accurate observations than single-modal sensors can provide. Research on multimodal remote sensing image registration is therefore of great scientific significance: only by fully integrating and exploiting the various multimodal remote sensing images can multi-level and multi-perspective Earth observation be achieved effectively. To promote this research, we systematically review, analyze and summarize the current mainstream multimodal remote sensing image registration methods. We first trace the evolution from single-modal to multimodal remote sensing image registration. We then analyze the core ideas of representative algorithms in the area-based, feature-based and deep-learning-based pipelines, and introduce the contributions of the authors' team to the field. The area-based registration (template matching) pipeline mainly comprises two types of methods: information-theory-based and structural-feature-based.
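As a rough illustration of the information-theory-based branch of template matching, the sketch below scores every candidate window by mutual information, a commonly used measure of this kind, estimated from a joint histogram, and keeps the best-scoring position. The function names `mutual_information` and `match_template_mi`, the bin count and the brute-force search are illustrative assumptions, not the implementation of any particular method covered by the survey.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between two equally sized patches."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()                  # joint probability
    px = pxy.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def match_template_mi(image, template, bins=16):
    """Exhaustive search: (row, col) of the window with the highest MI."""
    th, tw = template.shape
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = mutual_information(window, template, bins)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos
```

Because mutual information depends only on the statistical dependence between intensities, not on their actual values, the search can still locate a template whose intensities are a nonlinear, even inverted, function of the reference window. The exhaustive loop is kept for clarity; in practice the search range would be restricted, for example by georeferencing.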
Structural-feature-based methods use either sparse or dense structural features. Considering both robustness and efficiency, dense-structural-feature-based methods have clear advantages in handling significant nonlinear radiometric differences between multimodal remote sensing images, and can meet many current application needs. However, the area-based pipeline generally relies on the georeferencing of remote sensing images to predict the approximate search range for template matching. Feature-based registration methods can be divided into three categories, based on gradient optimization, local self-similarity (LSS) and phase congruency, respectively. Gradient-optimization methods usually design consistent gradients for specific types of multimodal images; they generally generalize poorly and struggle to maintain the same performance on other types of multimodal images. LSS-based methods are also limited: the relatively low discriminative power of LSS descriptors may prevent robust matching in the presence of complex nonlinear radiometric differences. Phase-congruency-based methods have high computational complexity, so registration is generally time-consuming. The feature-based pipeline uses the local spatial relationships between adjacent pixels to construct a high-dimensional feature vector for each feature point. Compared with template matching methods, feature-based methods usually carry a heavy computational burden, and serious outliers are almost inevitable in matching, especially in multimodal registration where scale, rotation and radiometric differences exist simultaneously.
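To make the point about structural features and nonlinear radiometric differences concrete, the following minimal sketch builds a dense per-pixel structure map from gradient orientation, using a double-angle encoding so that reversing the gradient direction (as happens under contrast inversion) leaves the map unchanged. The names `structure_map` and `similarity` and the encoding itself are assumptions chosen for illustration; this is a stand-in, not one of the dense descriptors discussed in the survey.

```python
import numpy as np

def structure_map(img, eps=1e-12):
    """Dense double-angle gradient encoding, shape (H, W, 2).

    Each pixel stores (m*cos(2t), m*sin(2t)) for gradient magnitude m and
    orientation t, which is identical for (gx, gy) and (-gx, -gy).
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # m*cos(2t) = (gx^2 - gy^2)/m and m*sin(2t) = 2*gx*gy/m
    channels = np.stack([gx**2 - gy**2, 2.0 * gx * gy], axis=-1)
    return channels / (mag[..., None] + eps)  # eps guards flat regions

def similarity(a, b):
    """Zero-mean normalized correlation between two same-shaped arrays."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For an image and its intensity-inverted copy, `similarity` on the raw intensities is strongly negative, while `similarity` on the two structure maps stays close to 1. This is the kind of behavior that lets structure-based similarity measures cope with radiometric differences that defeat direct intensity comparison.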
Generally speaking, feature-based methods are less robust than area-based methods. The deep-learning-based pipeline can be divided into modular and end-to-end registration methods. The most common strategy for modular methods is to embed deep networks into feature-based or area-based methods, exploiting the fully data-driven nature of deep learning and its high-dimensional deep feature extraction to generate more robust features, more effective descriptors or better similarity measures, thereby improving registration robustness. Modular methods can be subdivided into three categories: learning-based template matching, learning-based feature matching, and modality unification based on style transfer. They are easy to train and flexible, but they can hardly avoid the error accumulation that readily occurs in multi-stage tasks, and they may fall into local optima. End-to-end methods construct a single neural network that directly estimates the geometric transformation parameters or the deformation field. Their training objective is consistent across the whole network and they can obtain a globally optimal solution, but they are difficult to train and poorly interpretable. In addition, there is currently no complete and comprehensive database covering all types of multimodal remote sensing image pairs, and the lack of training and testing data greatly limits the development of deep-learning-based registration methods. We further list the existing public multimodal remote sensing image registration datasets, supplemented by a small number of registration datasets from the field of computer vision.
Finally, we analyze the open problems and challenges in current research on high-precision registration of multimodal remote sensing images and give an outlook on future research trends, with the aim of promoting further breakthroughs and innovation in the field of multimodal remote sensing image registration.