Change detection based on multiscale deep feature fusion

Fan Wei, Zhou Mo, Huang Rui (College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)

Abstract
Objective Change detection in images is an important problem in computer vision. Traditional change detection methods are overly sensitive to illumination variations and camera pose differences, which leads to poor detection results in real scenes. Since convolutional neural networks (CNN) can extract deep semantic features from images, we propose a change detection model based on multiscale deep feature fusion, which suppresses detection noise by extracting and fusing high-level semantic features of the images.

Method VGG(visual geometry group)16 is used as the backbone in a siamese architecture, and deep features are extracted from different network layers of the reference image and the query image. The deep features of corresponding layers of the two images are concatenated and fed into an encoding layer. The encoding layers progressively fuse high-level and low-level network features at multiple scales, fully combining high-level semantics with low-level textures to detect accurate change regions. A convolutional layer is applied to the features of each encoding layer to produce a prediction at the corresponding scale, and the predictions of the different scales are fused to obtain a further refined detection result.

Result The proposed method is compared with four detection methods: SC_SOBS (SC-self-organizing background subtraction), SuBSENSE (self-balanced sensitivity segmenter), FGCD (fine-grained change detection), and the fully convolutional network (FCN). Compared with the second-best model, FCN, our method improves the comprehensive evaluation metric F1 and the precision by 12.2% and 24.4% on the VL_CMU_CD (visual localization of Carnegie Mellon University for change detection) dataset, by 2.1% and 17.7% on the PCD (panoramic change detection) dataset, and by 8.5% and 5.8% on the CDnet (change detection net) dataset, respectively.

Conclusion The proposed change detection method based on multiscale deep feature fusion exploits the features of different layers of a convolutional neural network, effectively overcomes illumination and camera pose differences, and yields robust change detection results on different datasets.
Keywords
Multiscale deep features fusion for change detection

Fan Wei, Zhou Mo, Huang Rui (College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)

Abstract
Objective Change detection aims at detecting the differences between images of the same scene captured at different times. It is an important research problem in computer vision. However, traditional change detection methods, which use handcrafted features and heuristic models, suffer from lighting variations and camera pose differences, resulting in poor detection results. Recent deep convolutional neural networks (CNNs) have achieved great success on several computer vision problems, such as image classification, semantic segmentation, and saliency detection. The main reason for this success is the abstraction ability of CNNs. To overcome the adverse effects of lighting variations and camera pose differences, we can employ deep CNNs for the change detection problem. Unlike semantic segmentation, change detection takes as input an image pair from two time observations. Thus, a key research problem is how to design an effective CNN architecture that can fully explore the intrinsic changes between the image pair. To generate robust change detection results, we propose in this study a multiscale deep feature fusion-based change detection (MDFCD) network.

Method The proposed MDFCD network has two feature extraction streams that share weight parameters. Each stream learns to extract semantic features from the corresponding RGB image. We use VGG (visual geometry group) 16 as the backbone of MDFCD. The fully connected layers of VGG16 are removed to preserve the spatial resolution of the features of the last convolutional layer. We adopt the features of the convolutional blocks Conv3, Conv4, and Conv5 of VGG16 as our multiscale deep features because they capture low-level, middle-level, and high-level information, respectively. The Enc (encoding) module is then proposed to fuse the deep features of the same convolutional block from the two time observations. The features of the two streams are concatenated, and the resulting features are fed into Enc to generate change detection-adaptive features at the corresponding level. The encoded features from the deeper layer are upsampled by a factor of two in height and width and concatenated with the deep features of the preceding convolutional block, after which Enc is applied again to learn adaptive features. By progressively incorporating the features from Conv5 down to Conv3, we obtain a deep fusion of CNN features at multiple scales. To generate robust change detection, a convolutional layer with two 3×3 filters is added to each encoding module to produce a change prediction at that scale. The change predictions of all scales are then concatenated to produce the final change detection result. Note that the change map at each scale is upsampled to the size of the input image with bicubic upsampling. One possible implementation of this architecture is sketched below.
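To make the Method concrete, here is a minimal PyTorch sketch of the described two-stream architecture. Only the siamese VGG16 trunk, the Conv3-Conv5 multiscale fusion through Enc, the two-filter 3×3 per-scale predictors, and the bicubic upsampling of the per-scale maps come from the abstract; the Enc depth and channel width, the bilinear upsampling inside the decoder, the exact VGG16 split points, and the final fusion convolution over the concatenated predictions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class Enc(nn.Module):
    """Encoding module: turns concatenated two-stream features into
    change detection-adaptive features (depth and width are assumptions)."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class MDFCD(nn.Module):
    def __init__(self):
        super().__init__()
        feats = vgg16(weights=None).features  # ImageNet pretraining would be the usual choice
        # Shared (siamese) VGG16 trunk, split at the Conv3/Conv4/Conv5 blocks.
        self.block3 = feats[:16]    # -> Conv3 features: 256 ch, 1/4 resolution
        self.block4 = feats[16:23]  # -> Conv4 features: 512 ch, 1/8 resolution
        self.block5 = feats[23:30]  # -> Conv5 features: 512 ch, 1/16 resolution
        self.enc5 = Enc(512 + 512)        # Conv5(ref) + Conv5(query)
        self.enc4 = Enc(512 + 512 + 256)  # Conv4 pair + upsampled enc5 output
        self.enc3 = Enc(256 + 256 + 256)  # Conv3 pair + upsampled enc4 output
        # Per-scale change predictors: two 3x3 filters (change / no-change).
        self.preds = nn.ModuleList(nn.Conv2d(256, 2, 3, padding=1) for _ in range(3))
        self.fuse = nn.Conv2d(3 * 2, 2, 3, padding=1)  # fuses concatenated predictions

    def _extract(self, x):
        c3 = self.block3(x)
        c4 = self.block4(c3)
        c5 = self.block5(c4)
        return c3, c4, c5

    def forward(self, ref, query):
        r3, r4, r5 = self._extract(ref)
        q3, q4, q5 = self._extract(query)  # same weights: siamese streams
        up = lambda x: F.interpolate(x, scale_factor=2,
                                     mode="bilinear", align_corners=False)
        e5 = self.enc5(torch.cat([r5, q5], 1))
        e4 = self.enc4(torch.cat([r4, q4, up(e5)], 1))
        e3 = self.enc3(torch.cat([r3, q3, up(e4)], 1))
        # Bicubic upsampling of each scale's prediction to the input size.
        size = ref.shape[2:]
        maps = [F.interpolate(p(e), size=size, mode="bicubic", align_corners=False)
                for p, e in zip(self.preds, (e5, e4, e3))]
        return self.fuse(torch.cat(maps, 1))  # (N, 2, H, W) change logits


net = MDFCD()
logits = net(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 2, 256, 256])
```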
Result We compare the proposed method with state-of-the-art change detection methods, namely, FGCD (fine-grained change detection), SC_SOBS (SC-self-organizing background subtraction), SuBSENSE (self-balanced sensitivity segmenter), and FCN (fully convolutional network), on three benchmark datasets: VL_CMU_CD (visual localization of Carnegie Mellon University for change detection), PCD (panoramic change detection), and CDnet (change detection net). We use the F1-measure, recall, precision, specificity, FPR (false positive rate), FNR (false negative rate), and PWC (percentage of wrong classifications) to evaluate the compared change detection methods (see the metric sketch at the end of this abstract). The experiments show that MDFCD outperforms the other compared methods; among them, the deep learning-based method FCN performs best. On VL_CMU_CD, the F1-measure and precision of MDFCD achieve 12.2% and 24.4% relative improvements over the second-placed method FCN, respectively. On PCD, the F1-measure and precision of MDFCD obtain 2.1% and 17.7% relative improvements over FCN, respectively. On CDnet, compared with FCN, our F1-measure and precision achieve 8.5% and 5.8% relative improvements, respectively. The experiments also show that MDFCD can detect fine-grained changes, such as telegraph poles, and is better than FCN at distinguishing real changes from false changes caused by lighting variations and camera pose differences.

Conclusion We studied how to effectively exploit deep convolutional neural networks for the change detection problem. The MDFCD network is proposed to alleviate the adverse effects introduced by lighting variations and camera pose differences. The proposed method adopts a siamese network with VGG16 as the backbone, where each path extracts deep features from the reference or query image. We also propose an encoding module that fuses multiscale deep convolutional features and learns change detection-adaptive features, integrating the semantic features of high layers with the texture features of low layers. With this fusion strategy, the proposed method generates more robust change detection results than the compared methods: the high-level semantic features effectively suppress false changes caused by lighting and seasonal variations, while the low-level texture features help obtain accurate changes at object boundaries. Compared with the deep learning method FCN, whose input is the concatenation of the reference and query images, our method extracts features from each image separately and thus learns more representative features for change detection. However, as with deep learning-based methods in general, a large volume of training images is needed to train the CNNs. Another limitation is that present change detection methods pay considerable attention to region-level changes but not to object-level changes. In future work, we plan to study weakly supervised and unsupervised change detection to avoid pixel-level labeled training images, and to incorporate object detection into change detection to generate object-level changes.
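For reference, the evaluation metrics named in the Result section can all be computed from the pixel-level confusion matrix. Below is a minimal NumPy sketch; PWC follows the usual CDnet definition, and the epsilon guard against empty classes is an implementation convenience, not part of the definitions.

```python
import numpy as np


def change_detection_metrics(pred, gt):
    """Compute the seven scores used in the comparison from two binary
    change masks of the same shape (True / 1 = changed pixel)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)     # changed pixels correctly detected
    fp = np.sum(pred & ~gt)    # unchanged pixels reported as changed
    fn = np.sum(~pred & gt)    # changed pixels missed
    tn = np.sum(~pred & ~gt)   # unchanged pixels correctly rejected
    eps = 1e-12                # avoids division by zero for empty classes
    recall = tp / (tp + fn + eps)
    precision = tp / (tp + fp + eps)
    specificity = tn / (tn + fp + eps)
    fpr = fp / (fp + tn + eps)
    fnr = fn / (fn + tp + eps)
    pwc = 100.0 * (fp + fn) / (tp + fp + tn + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"F1": f1, "Recall": recall, "Precision": precision,
            "Specificity": specificity, "FPR": fpr, "FNR": fnr, "PWC": pwc}


# Example: a toy 2x2 mask pair with one hit, one miss, one false alarm.
print(change_detection_metrics([[1, 0], [1, 0]], [[1, 1], [0, 0]]))
```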
Keywords
