Multi-path collaborative salient object detection based on RGB-T images
2021, Vol. 26, No. 10, Pages: 2388-2399
Received: 2020-06-28
Revised: 2020-08-26
Accepted: 2020-09-02
Published in print: 2021-10-16
DOI: 10.11834/jig.200317
Objective
Salient object detection is fundamental to machine vision applications, yet many current methods perform poorly in complex scenes, such as when salient objects resemble the background or under low illumination. To improve the performance of saliency detection, this paper proposes a multi-path collaborative salient object detection method for RGB-T (thermal) images.
Method
The main body of the model is designed as two backbone networks and three decoding branches. The backbone networks extract feature representations of the RGB and thermal images, and the decoding branches predict the salient objects in the image from the RGB features, the thermal features, and their fused features, respectively, in a collaborative and complementary manner. Within the feature-extraction backbones, a feature enhancement module fuses the complementary cues of the two modalities, and a suitably modified pyramid pooling module captures global semantic information from deep-level features. During decoding, a channel attention mechanism further distinguishes the semantic differences between the channels of the features generated by the convolutional neural network (CNN).
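As a rough illustration of this layout, the following PyTorch-style sketch shows the two-encoder, three-decoder data flow. All names here (MultiPathRGBTSaliency, enc_rgb, dec_fuse) are our own illustrative assumptions rather than the authors' code, and the additive fusion is only a placeholder for the feature enhancement module described above.

```python
import torch.nn as nn

class MultiPathRGBTSaliency(nn.Module):
    """Sketch of the two-backbone, three-branch structure (names assumed)."""
    def __init__(self, enc_rgb, enc_t, dec_rgb, dec_t, dec_fuse):
        super().__init__()
        self.enc_rgb, self.enc_t = enc_rgb, enc_t  # modality-specific backbones
        self.dec_rgb, self.dec_t, self.dec_fuse = dec_rgb, dec_t, dec_fuse

    def forward(self, rgb, thermal):
        f_rgb = self.enc_rgb(rgb)      # RGB feature representation
        f_t = self.enc_t(thermal)      # thermal feature representation
        f_fuse = f_rgb + f_t           # placeholder for the paper's fusion of both
        # the three branches predict saliency maps collaboratively
        return self.dec_rgb(f_rgb), self.dec_t(f_t), self.dec_fuse(f_fuse)
```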
Result
Tested on the VT821 and VT1000 datasets, the proposed method achieves maximum F-measure values of 0.843 7 and 0.880 5 and mean absolute error (MAE) values of 0.039 4 and 0.032 2, respectively, improving overall detection performance compared with the competing methods.
Conclusion
Comparative experiments show that the proposed method improves the stability of saliency detection and achieves better results in some low-illumination scenes.
Objective
Saliency detection is a fundamental technology in computer vision and image processing, which aims to identify the most visually distinctive objects or regions in an image. As a preprocessing step, salient object detection plays a critical role in many computer vision applications, including visual tracking, scene classification, image retrieval, and content-based image compression. While numerous salient object detection methods have been presented, most of them are designed only for RGB images or RGB-depth (RGB-D) images. However, these methods still struggle in some complex scenarios. RGB methods may fail to distinguish salient objects from backgrounds under similar foreground and background or low-contrast conditions. RGB-D methods also suffer in challenging scenarios characterized by low light and variations in illumination. Considering that thermal infrared images are invariant to illumination conditions, we propose a multi-path collaborative salient object detection method in this study, which is designed to improve the performance of saliency detection by using the multi-modal feature information of RGB and thermal images.
Method
In this study, we design a novel end-to-end deep neural network for RGB-thermal (RGB-T) salient object detection, which consists of an encoder network and a decoder network and includes a feature enhancement module, a pyramid pooling module, a channel attention module, and an l1-norm fusion strategy. First, the main body of the model contains two backbone networks for extracting the feature representations of the RGB and thermal images, respectively. Then, three decoding branches predict the saliency maps in a coordinated and complementary manner from the extracted RGB feature, the thermal feature, and the fusion of both, respectively. The two backbone streams have the same structure, which is based on the Visual Geometry Group 19-layer (VGG-19) net. To better fit the saliency detection task, we keep only the five convolutional blocks of the VGG-19 net and discard the last pooling and fully connected layers to preserve more spatial information from the input image.
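One plausible way to obtain such a truncated backbone is sketched below, assuming torchvision's VGG-19; the exact layers retained may differ from the authors' implementation.

```python
import torch.nn as nn
from torchvision import models

def make_backbone(pretrained: bool = True) -> nn.Module:
    vgg = models.vgg19(pretrained=pretrained)
    # vgg.features contains the five convolutional blocks; dropping its final
    # MaxPool2d preserves more spatial detail, and the fully connected layers
    # in vgg.classifier are simply never used.
    return nn.Sequential(*list(vgg.features.children())[:-1])
```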
Second, the feature enhancement module is used to fully extract and fuse multi-modal complementary cues from the RGB and thermal streams. The modified pyramid pooling module is employed to capture global semantic information from deep-level features, which is used to locate salient objects.
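A pyramid pooling module in the spirit of PSPNet is sketched below; the bin sizes and the 1×1 reduction convolutions are common defaults and are assumptions here, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSP-style pyramid pooling (bin sizes assumed)."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),  # pool to b x b bins
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins])
        self.project = nn.Conv2d(in_ch * 2, in_ch, 3, padding=1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        # upsample every pooled branch back to the input resolution
        pooled = [F.interpolate(s(x), size=(h, w), mode='bilinear',
                                align_corners=False) for s in self.stages]
        return self.project(torch.cat([x] + pooled, dim=1))  # fuse global context
```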
Finally, in the decoding process, a channel attention mechanism is designed to distinguish the semantic differences between the channels, thereby improving the decoder's ability to separate salient objects from backgrounds.
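The channel attention described here can be illustrated with a squeeze-and-excitation style block; the reduction ratio and exact layers are assumptions, not the authors' reported design.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (reduction ratio assumed)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.weight = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: global average per channel
            nn.Conv2d(channels, channels // reduction, 1),  # excitation bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())                                   # per-channel weights in (0, 1)

    def forward(self, x):
        return x * self.weight(x)  # re-weight channels by their semantic importance
```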
The entire model is trained in an end-to-end manner. Our training set consists of 900 aligned RGB-T image pairs randomly selected from each subset of the VT1000 dataset. To prevent overfitting, we augment the training set with flipping and rotation operations. Our method is implemented with the PyTorch toolbox and trained on a PC with a GTX 1080Ti GPU with 11 GB of memory. The input images are uniformly resized to 256×256 pixels. The momentum, weight decay, and learning rate are set to 0.9, 0.000 5, and 1E-9, respectively. During training, the softmax entropy loss is used to make the entire network converge.
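Assembled from the hyperparameters reported above, a training setup might look like the following sketch; the choice of SGD and of CrossEntropyLoss as the "softmax entropy" loss is our interpretation rather than a detail confirmed by the text.

```python
import torch
import torch.nn as nn

def build_training(model: nn.Module):
    # hyperparameters as reported: momentum 0.9, weight decay 0.0005, lr 1e-9
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-9,
                                momentum=0.9, weight_decay=5e-4)
    criterion = nn.CrossEntropyLoss()  # softmax cross-entropy over salient/background
    return optimizer, criterion
```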
Result
We compare our model with four state-of-the-art saliency models, including two RGB-based methods and two RGB-D-based methods, on two public datasets, namely, VT821 and VT1000. The quantitative evaluation metrics are the F-measure, the mean absolute error (MAE), and precision-recall (PR) curves, and we also provide several saliency maps from each method for visual comparison.
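For reference, the two scalar metrics can be computed as in the sketch below, where sal and gt are a saliency map and its ground truth normalized to [0, 1]; the β² = 0.3 weighting and the 256-threshold sweep for the maximum F-measure are standard conventions rather than details taken from the paper.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between saliency map and ground truth."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, thresh, beta2=0.3):
    """F-measure at one binarization threshold (beta^2 = 0.3 by convention)."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

def max_f_measure(sal, gt):
    """Maximum F-measure over a sweep of 256 thresholds."""
    return max(f_measure(sal, gt, t / 255.0) for t in range(256))
```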
The experimental results demonstrate that our model outperforms the other methods and that its saliency maps have more refined shapes under challenging conditions such as poor illumination and low contrast. Compared with the other four methods on VT821, our method obtains the best results for both maximum F-measure and MAE: the maximum F-measure (higher is better) is 0.26% higher and the MAE (lower is better) 0.17% lower than those of the second-ranked method. Compared with the other four methods on VT1000, our model also achieves the best maximum F-measure, which reaches 88.05%, 0.46% above the second-ranked method. However, its MAE of 3.22% is 0.09% higher than that of the first-ranked method and thus slightly worse.
Conclusion
We propose a CNN-based method for RGB-T salient object detection. To the best of our knowledge, existing saliency detection methods are mostly based on RGB or RGB-D images, so it is very meaningful to explore the application of CNNs to RGB-T salient object detection. The experimental results on two public RGB-T datasets demonstrate that the proposed method performs better than state-of-the-art methods, especially in challenging scenes with poor illumination, complex backgrounds, or low contrast, which shows that fusing multi-modal information from RGB and thermal images is an effective way to improve performance. However, public datasets for RGB-T saliency detection are still scarce, and such data are very important for the performance of deep learning networks. At the same time, detection speed is a key measure when saliency detection serves as a preprocessing step for other computer vision tasks. Thus, in future work, we will collect more high-quality RGB-T saliency datasets and design more lightweight models to increase detection speed.