Visual Place Recognition Fusing Event Cameras
Liu Yichen1, Yu Lei2, Yu Huai2, Yang Wen2 (1. Electronic Information School, Wuhan University; 2. Electronic Information School, Wuhan University, Wuhan, Hubei, China) Abstract
Objective The performance of traditional visual place recognition (VPR) algorithms depends on the imaging quality of optical images, so the image degradation caused by high-speed and high-dynamic-range scenes further harms recognition performance. To address this problem, this paper proposes a VPR algorithm that fuses event cameras, exploiting their low latency and high dynamic range to improve recognition performance in extreme scenarios such as high-speed and high-dynamic-range scenes. Method The proposed method first extracts features from good-quality reference images with an Image Feature Extraction (IFE) module, then extracts multimodal fused features from the query image and the events within its exposure interval with a Multi-modal Feature Fusion (MFF) module, and finally retrieves the reference image most similar to the query image by feature matching (a rough sketch of this pipeline follows). Result Experiments on the MVSEC and RobotCar datasets show that the proposed method has clear advantages over existing VPR algorithms in high-speed and high-dynamic-range scenes. Specifically, under high-speed, high-dynamic-range conditions it improves recall and precision over existing algorithms by up to 5.39% and 8.55% respectively on MVSEC, and by up to 3.36% and 4.41% respectively on RobotCar. Conclusion This paper proposes a VPR algorithm that fuses event cameras; by exploiting the imaging advantages of event cameras in high-speed and high-dynamic-range scenes, it effectively improves VPR performance in such scenes.
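As a rough illustration of the retrieval pipeline described in the Method section above, the following minimal Python sketch treats the IFE and MFF modules as given callables; the cosine-similarity matching, stand-in modules, and all names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(ife, mff, ref_images, query_image, query_events):
    """Return the index of the reference image most similar to the query.

    `ife` and `mff` stand in for the paper's IFE and MFF modules; their
    architectures are not specified here."""
    # IFE module: features of the good-quality reference images.
    ref_feats = F.normalize(torch.stack([ife(img) for img in ref_images]), dim=1)
    # MFF module: multimodal fused feature of the query image and the
    # events within its exposure interval.
    q_feat = F.normalize(mff(query_image, query_events), dim=0)
    # Feature matching: cosine similarity against every reference feature.
    return int((ref_feats @ q_feat).argmax())

# Toy usage with random stand-in modules (purely illustrative).
ife = lambda img: img.flatten()
mff = lambda img, ev: img.flatten() + ev.flatten()
refs = [torch.randn(16, 16) for _ in range(10)]
print(retrieve(ife, mff, refs, refs[3], torch.randn(16, 16)))
```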
Keywords
Visual Place Recognition Fusing Event Cameras
Liu Yichen, Yu Lei, Yu Huai, Yang Wen (Electronic Information School, Wuhan University, Wuhan, Hubei 430072, China) Abstract
Objective The performance of traditional visual place recognition (VPR) algorithms depends on the imaging quality of optical images; however, optical cameras suffer from low temporal resolution and low dynamic range. For example, in a scene with high-speed motion, an optical camera can hardly capture the rapid changes of the scene's position on the imaging plane continuously, resulting in motion blur in the output image; and when the scene brightness exceeds the recording range of the camera's photosensitive chip, the output image may suffer from degradations such as underexposure and overexposure. Blur, underexposure, and overexposure cause the loss of image texture and structure information, which in turn degrades the performance of VPR algorithms. The recognition performance of image-based VPR algorithms is therefore poor in high-speed and high dynamic range (HDR) scenarios. The event camera is a new type of visual sensor inspired by biological vision, characterized by low latency and high dynamic range; using event cameras can effectively improve the recognition performance of VPR algorithms in high-speed and HDR scenes. This paper therefore proposes a VPR algorithm fused with event cameras, which exploits the low latency and high dynamic range of event cameras to improve recognition performance in extreme scenarios such as high speed and HDR. Method The proposed method first fuses the information of the query image and the events within its exposure time interval to obtain the multimodal features of the query location, and then retrieves from the reference image database the reference image closest to these multimodal features. To compare the multimodal query information with the reference images, the proposed method first extracts the features of the good-quality reference images using the Image Feature Extraction (IFE) module, then feeds the query image and the events within its exposure time interval into the Multi-modal Feature Fusion (MFF) module to obtain multimodal fusion features, and finally retrieves the reference image most similar to the query image through feature matching, thereby completing visual place recognition. The training of the network is supervised by a triplet loss (see the sketch after this abstract). The triplet loss drives the network to shrink the feature-vector distance between the query feature and the positive feature while enlarging the distance to the negative feature, until the negative distance exceeds the positive distance by at least a similarity-margin constant. The network can thus distinguish reference images whose fields of view are similar to or different from that of the query image by their similarity in the feature-vector space, completing the VPR task. Result Experiments are carried out on the MVSEC and RobotCar datasets. The proposed method is compared with image-based methods, event-camera-based methods, and methods that use both image and event information. Under different exposure scenarios and high-speed scenarios, the proposed method shows clear advantages over existing VPR algorithms.
Specifically, on the MVSEC dataset the proposed method reaches a maximum recall of 99.36% and a maximum recognition precision of 96.34%, improving recall by 5.39% and precision by 8.55% over existing VPR methods. On the RobotCar dataset it reaches a maximum recall of 97.33% and a maximum recognition precision of 93.30%, improving recall by 3.36% and precision by 4.41% over existing VPR methods. The experimental results show that in high-speed and HDR scenes the proposed method has clear advantages over existing VPR algorithms, with a significant improvement in recognition performance. Conclusion This paper proposes a VPR algorithm that fuses event cameras; it exploits the low latency and high dynamic range of event cameras and overcomes the loss of image information in high-speed and HDR scenes. The method effectively fuses information from the image and event modalities, thereby improving VPR performance in high-speed and HDR scenarios.
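As a concrete rendering of the margin behaviour described in the Method section, the following is a minimal sketch of a standard triplet margin loss in PyTorch; the margin value, batch size, and feature dimension are assumptions for illustration, not the paper's actual training configuration.

```python
import torch
import torch.nn.functional as F

def triplet_loss(query, positive, negative, margin=0.5):
    """Standard triplet loss: the loss vanishes once the negative distance
    exceeds the positive distance by at least `margin` (an assumed value),
    matching the stopping condition described in the abstract."""
    d_pos = F.pairwise_distance(query, positive)  # query <-> positive distance
    d_neg = F.pairwise_distance(query, negative)  # query <-> negative distance
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Toy check with random 256-d features (dimensions are assumptions).
q, p, n = (torch.randn(8, 256) for _ in range(3))
print(triplet_loss(q, p, n))
```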
Keywords