Multi-Modality Feature Fusion Based Wide Field-of-View Image Generation

Jiang Zhiying1, Zhang Zengxi1, Liu Jinyuan2, Liu Risheng1 (1. School of Software Technology, Dalian University of Technology; 2. School of Mechanical Engineering, Dalian University of Technology)

Abstract
Objective Image stitching obtains a wide-angle composite image by integrating visible-light data captured from different viewpoints. Adverse weather degrades the collected visible data and leads to poor stitching results. Infrared sensors image a scene through thermal radiation and can still highlight targets under unfavorable conditions, overcoming the influence of environmental and human factors.

Method Considering the complementary imaging characteristics of infrared and visible sensors, this paper proposes an image stitching algorithm based on feature fusion of multi-modality data (infrared and visible data). The accurate structural features of the infrared data and the rich texture details of the visible data are first used to estimate offsets in a coarse-to-fine manner, and the deformation matrix is obtained through a non-parametric direct linear transform. The stitched infrared and visible data are then fused to enrich the scene-perception information.

Result A real-world dataset containing 530 pairs of stitchable multi-modal images and a synthetic dataset containing 200 pairs are used as test data. Three recent fusion methods, RFN (residual fusion network), ReCoNet (recurrent correction network), and DATFuse (dual attention transformer), are combined with seven stitching methods, APAP (as projective as possible), SPW (single-perspective warps), WPIS (wide parallax image stitching), SLAS (seam-guided local alignment and stitching), VFIS (view-free image stitching), RSFI (reconstructing stitched features to images), and UDIS++ (unsupervised deep image stitching), to form 21 fusion-stitching strategies for qualitative and quantitative comparison. In terms of stitching performance, the proposed method achieves accurate cross-view scene alignment, reduces the average corner error by 53%, and avoids ghosting. In terms of integrating complementary multi-modal information, it adaptively balances the structural information of infrared images and the rich texture details of visible images, improving information entropy by 24.6% over the DATFuse-UDIS++ strategy.

Conclusion Building on the complementary imaging advantages of infrared and visible images, the proposed method achieves more accurate wide field-of-view scene generation through multi-scale recursive estimation and is more robust than conventional visible-light image stitching.
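The non-parametric direct linear transform referred to in the Method converts four predicted corner offsets into a 3×3 homography. The following minimal sketch shows that step under the standard four-point DLT formulation; the function name, image size, and offset values are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def dlt_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts to dst_pts (four point pairs)
    with the direct linear transform: two linear constraints per correspondence,
    solved as the null space of the stacked 8x9 system via SVD."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = vt[-1].reshape(3, 3)   # right singular vector of the smallest singular value
    return H / H[2, 2]         # normalize so that H[2, 2] = 1

# Illustrative use: corners of a 512x512 reference view plus hypothetical predicted offsets.
corners = np.array([[0, 0], [511, 0], [511, 511], [0, 511]], dtype=np.float64)
offsets = np.array([[3.2, -1.5], [-2.1, 0.8], [1.0, 2.4], [-0.6, -3.3]])
H = dlt_homography(corners, corners + offsets)
```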
Keywords
Multi-Modality Feature Fusion Based Wide Field-of-View Image Generation

Jiang Zhiying1, Zhang Zengxi1, Liu Jinyuan2, Liu Risheng1 (1. School of Software Technology, Dalian University of Technology; 2. School of Mechanical Engineering, Dalian University of Technology)

Abstract
Objective Image stitching, a cornerstone of computer vision, assembles a comprehensive field-of-view image by merging visible data captured from multiple vantage points within a scene. This fusion enhances scene perception and facilitates advanced processing. Current state-of-the-art stitching methods hinge on the detection of feature points within the scene and require them to be densely and uniformly distributed throughout the image. However, these approaches encounter significant challenges in outdoor environments or when applied to military equipment, where adverse conditions such as rain, haze, and low light can severely degrade the quality of visible images. This degradation impedes the extraction of feature points, a critical step in the stitching process. Furthermore, factors such as camouflage and occlusion can lead to data loss, disrupting the distribution of feature points and compromising the quality of the stitched image. These limitations often manifest as ghosting artifacts, undermining the effectiveness of stitching and its robustness in practical applications. In this challenging context, infrared sensors, which image scenes by detecting thermal radiation, emerge as a robust alternative: they excel at highlighting targets even under unfavorable conditions and mitigate the impact of environmental and human factors, which makes them highly valuable in military surveillance. A significant drawback of thermal imaging, however, is its inability to capture the rich texture details that are abundant in visible images and crucial for an accurate, comprehensive perception of the scene.

Method To overcome the limitations inherent in conventional visible image stitching and to extend the applicability of stitching technology across diverse environments, this paper proposes an image stitching algorithm based on the fusion of features from multi-modality images, specifically infrared and visible images. By exploiting the complementary characteristics of infrared and visible data, our approach integrates the precise structural features of infrared images with the rich, detailed textures of visible images, which is pivotal for accurate homography estimation across viewpoints. A distinctive aspect of the method is a learnable feature pyramid structure that estimates sparse offsets in a gradual, coarse-to-fine manner, after which the deformation matrix is derived through a non-parametric direct linear transformation. The stitched infrared and visible data are then fused to enrich the perceptual information of the generated scene: deep features are mined for contextual semantic information, while shallow features compensate for the deficiencies of up-sampled data, producing more accurate and reliable fused results.
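As a rough illustration of the coarse-to-fine offset estimation described in the Method, the PyTorch-style skeleton below accumulates residual offsets for the four reference corners over a feature pyramid built from infrared and visible features. It is a minimal sketch under assumed channel sizes and module names, not the paper's network; a complete implementation would also re-warp the target features with the current homography estimate before each refinement level.

```python
import torch
import torch.nn as nn

class CoarseToFineOffsetHead(nn.Module):
    """Accumulate residual corner offsets over a coarse-to-fine feature pyramid.
    Each level regresses a residual from concatenated infrared/visible features."""
    def __init__(self, channels=(256, 128, 64)):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2 * c, c, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(c, 8),          # 4 corners x (dx, dy)
            )
            for c in channels
        ])

    def forward(self, ir_pyramid, vis_pyramid):
        # Pyramids are lists of feature maps ordered coarse -> fine with matching shapes.
        batch = ir_pyramid[0].size(0)
        offsets = torch.zeros(batch, 8, device=ir_pyramid[0].device)
        for head, f_ir, f_vis in zip(self.heads, ir_pyramid, vis_pyramid):
            offsets = offsets + head(torch.cat([f_ir, f_vis], dim=1))
        return offsets.view(batch, 4, 2)

# Illustrative use with random features standing in for encoder outputs.
sizes = ((256, 16), (128, 32), (64, 64))
ir = [torch.randn(1, c, s, s) for c, s in sizes]
vis = [torch.randn(1, c, s, s) for c, s in sizes]
corner_offsets = CoarseToFineOffsetHead()(ir, vis)   # shape (1, 4, 2)
```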
Result We selected a real-world dataset comprising 530 pairs of stitchable multi-modal images and a synthetic dataset containing 200 pairs as test data. We compared the qualitative and quantitative performance of 21 fusion-stitching strategies built from three recent fusion methods, RFN (residual fusion network), ReCoNet (recurrent correction network), and DATFuse (dual attention transformer), and seven stitching methods, namely APAP (as projective as possible), SPW (single-perspective warps), WPIS (wide parallax image stitching), SLAS (seam-guided local alignment and stitching), VFIS (view-free image stitching), RSFI (reconstructing stitched features to images), and UDIS++ (unsupervised deep image stitching). In terms of stitching performance, our method achieves accurate cross-view scene alignment, reducing the average corner error by 53% and preventing ghosting and abnormal distortion, and it outperforms existing feature-point-based stitching algorithms even in challenging large-baseline scenarios. Regarding the integration of multi-modal complementary information, our method adaptively balances the structural information robustly highlighted by infrared imaging with the rich texture details of visible images, yielding a 24.6% increase in information entropy over the DATFuse-UDIS++ strategy.

Conclusion The proposed infrared and visible image stitching method effectively addresses the limitations of traditional stitching, particularly under adverse environmental conditions, and broadens the scope of stitching technology, making it applicable in more diverse settings. The combination of infrared and visible imagery can substantially improve scene perception and processing in military and outdoor applications where accuracy, detail, and robustness are paramount. The algorithm's ability to fuse different types of data also opens new avenues for research and application, suggesting potential uses in environmental monitoring, search and rescue operations, and creative domains where novel visual representations are sought. The fusion employed in our algorithm not only enhances the visual quality of the stitched images but also adds a layer of information that can be vital in critical applications such as surveillance and reconnaissance. This advancement enhances our ability to perceive and process scenes effectively and paves the way for future innovations in image processing and analysis.
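The two quantitative measures quoted above, average corner error and information entropy, follow standard definitions. The snippet below sketches them from those common formulations; it is not the paper's released evaluation code.

```python
import numpy as np

def average_corner_error(pred_corners, gt_corners):
    """Mean Euclidean distance (in pixels) between the predicted and ground-truth
    positions of the four warped image corners."""
    diff = np.asarray(pred_corners, dtype=np.float64) - np.asarray(gt_corners, dtype=np.float64)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

def information_entropy(gray_image, bins=256):
    """Shannon entropy (in bits) of an 8-bit grayscale image's intensity histogram,
    a common proxy for how much information a fused image retains."""
    hist, _ = np.histogram(gray_image, bins=bins, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```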
Keywords
