Multi-path collaborative salient object detection based on RGB-T images
2021, Vol. 26, No. 10, Pages: 2388-2399
Received: 2020-06-28
Revised: 2020-08-26
Accepted: 2020-09-02
Published in print: 2021-10-16
DOI: 10.11834/jig.200317
Objective
Salient object detection is fundamental to machine vision applications, yet many current methods perform poorly in complex scenes, such as when salient objects resemble the background or under low illumination. To improve the performance of saliency detection, this paper proposes a multi-path collaborative salient object detection method for RGB-T (thermal) images.
Method
The main body of the model is designed as two backbone networks and three decoding branches. The backbone networks extract feature representations of the RGB and thermal images, and the decoding branches predict the salient objects in the image from the RGB features, the thermal features, and their fused features, respectively, in a collaborative and complementary manner. Within the feature-extraction backbones, a feature enhancement module fuses the complementary cues of the two modalities, and a suitably modified pyramid pooling module captures global semantic information from deep-level features. During decoding, a channel attention mechanism further distinguishes the semantic differences between the channels of the features generated by the convolutional neural network (CNN).
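As a rough illustration of this layout, the following PyTorch-style sketch shows the two-encoder, three-decoder data flow. All names here (MultiPathRGBTSaliency, enc_rgb, dec_fuse) are our own illustrative assumptions rather than the authors' code, and the additive fusion is only a placeholder for the feature enhancement module described above.

```python
import torch.nn as nn

class MultiPathRGBTSaliency(nn.Module):
    """Sketch of the two-backbone, three-branch structure (names assumed)."""
    def __init__(self, enc_rgb, enc_t, dec_rgb, dec_t, dec_fuse):
        super().__init__()
        self.enc_rgb, self.enc_t = enc_rgb, enc_t  # modality-specific backbones
        self.dec_rgb, self.dec_t, self.dec_fuse = dec_rgb, dec_t, dec_fuse

    def forward(self, rgb, thermal):
        f_rgb = self.enc_rgb(rgb)      # RGB feature representation
        f_t = self.enc_t(thermal)      # thermal feature representation
        f_fuse = f_rgb + f_t           # placeholder for the paper's fusion of both
        # the three branches predict saliency maps collaboratively
        return self.dec_rgb(f_rgb), self.dec_t(f_t), self.dec_fuse(f_fuse)
```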
Result
Tested on the VT821 and VT1000 datasets, the proposed method achieves maximum F-measure values of 0.843 7 and 0.880 5 and mean absolute error (MAE) values of 0.039 4 and 0.032 2, respectively, improving overall detection performance compared with the competing methods.
Conclusion
Comparative experiments show that the proposed method improves the stability of saliency detection and achieves better results in some low-illumination scenes.
Objective
Saliency detection is a fundamental technology in computer vision and image processing, which aims to identify the most visually distinctive objects or regions in an image. As a preprocessing step, salient object detection plays a critical role in many computer vision applications, including visual tracking, scene classification, image retrieval, and content-based image compression. While numerous salient object detection methods have been presented, most of them are designed only for RGB images or RGB-depth (RGB-D) images. However, these methods still struggle in some complex scenarios. RGB methods may fail to distinguish salient objects from backgrounds under similar foreground and background or low-contrast conditions. RGB-D methods also suffer in challenging scenarios characterized by low light and variations in illumination. Considering that thermal infrared images are invariant to illumination conditions, we propose a multi-path collaborative salient object detection method in this study, which is designed to improve the performance of saliency detection by using the multi-modal feature information of RGB and thermal images.
Method
In this study, we design a novel end-to-end deep neural network for RGB-thermal (RGB-T) salient object detection, which consists of an encoder network and a decoder network and includes a feature enhancement module, a pyramid pooling module, a channel attention module, and an l1-norm fusion strategy. First, the main body of the model contains two backbone networks for extracting the feature representations of the RGB and thermal images, respectively. Then, three decoding branches predict the saliency maps in a coordinated and complementary manner from the extracted RGB feature, the thermal feature, and the fusion of both, respectively. The two backbone streams have the same structure, which is based on the Visual Geometry Group 19-layer (VGG-19) net. To better fit the saliency detection task, we keep only the five convolutional blocks of the VGG-19 net and discard the last pooling and fully connected layers to preserve more spatial information from the input image.
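One plausible way to obtain such a truncated backbone is sketched below, assuming torchvision's VGG-19; the exact layers retained may differ from the authors' implementation.

```python
import torch.nn as nn
from torchvision import models

def make_backbone(pretrained: bool = True) -> nn.Module:
    vgg = models.vgg19(pretrained=pretrained)
    # vgg.features contains the five convolutional blocks; dropping its final
    # MaxPool2d preserves more spatial detail, and the fully connected layers
    # in vgg.classifier are simply never used.
    return nn.Sequential(*list(vgg.features.children())[:-1])
```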
Second, the feature enhancement module is used to fully extract and fuse multi-modal complementary cues from the RGB and thermal streams. The modified pyramid pooling module is employed to capture global semantic information from deep-level features, which is used to locate salient objects.
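A pyramid pooling module in the spirit of PSPNet is sketched below; the bin sizes and the 1×1 reduction convolutions are common defaults and are assumptions here, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSP-style pyramid pooling (bin sizes assumed)."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),  # pool to b x b bins
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins])
        self.project = nn.Conv2d(in_ch * 2, in_ch, 3, padding=1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        # upsample every pooled branch back to the input resolution
        pooled = [F.interpolate(s(x), size=(h, w), mode='bilinear',
                                align_corners=False) for s in self.stages]
        return self.project(torch.cat([x] + pooled, dim=1))  # fuse global context
```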
Finally, in the decoding process, a channel attention mechanism is designed to distinguish the semantic differences between the channels, thereby improving the decoder's ability to separate salient objects from backgrounds.
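The channel attention described here can be illustrated with a squeeze-and-excitation style block; the reduction ratio and exact layers are assumptions, not the authors' reported design.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (reduction ratio assumed)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.weight = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: global average per channel
            nn.Conv2d(channels, channels // reduction, 1),  # excitation bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())                                   # per-channel weights in (0, 1)

    def forward(self, x):
        return x * self.weight(x)  # re-weight channels by their semantic importance
```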
The entire model is trained in an end-to-end manner. Our training set consists of 900 aligned RGB-T image pairs randomly selected from each subset of the VT1000 dataset. To prevent overfitting, we augment the training set with flipping and rotation operations. Our method is implemented with the PyTorch toolbox and trained on a PC with a GTX 1080Ti GPU with 11 GB of memory. The input images are uniformly resized to 256×256 pixels. The momentum, weight decay, and learning rate are set to 0.9, 0.000 5, and 1E-9, respectively. During training, the softmax entropy loss is used to make the entire network converge.
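Assembled from the hyperparameters reported above, a training setup might look like the following sketch; the choice of SGD and of CrossEntropyLoss as the "softmax entropy" loss is our interpretation rather than a detail confirmed by the text.

```python
import torch
import torch.nn as nn

def build_training(model: nn.Module):
    # hyperparameters as reported: momentum 0.9, weight decay 0.0005, lr 1e-9
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-9,
                                momentum=0.9, weight_decay=5e-4)
    criterion = nn.CrossEntropyLoss()  # softmax cross-entropy over salient/background
    return optimizer, criterion
```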
Result
We compare our model with four state-of-the-art saliency models, including two RGB-based methods and two RGB-D-based methods, on two public datasets, namely, VT821 and VT1000. The quantitative evaluation metrics are the F-measure, the mean absolute error (MAE), and precision-recall (PR) curves, and we also provide several saliency maps from each method for visual comparison.
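For reference, the two scalar metrics can be computed as in the sketch below, where sal and gt are a saliency map and its ground truth normalized to [0, 1]; the β² = 0.3 weighting and the 256-threshold sweep for the maximum F-measure are standard conventions rather than details taken from the paper.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between saliency map and ground truth."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, thresh, beta2=0.3):
    """F-measure at one binarization threshold (beta^2 = 0.3 by convention)."""
    pred = sal >= thresh
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

def max_f_measure(sal, gt):
    """Maximum F-measure over a sweep of 256 thresholds."""
    return max(f_measure(sal, gt, t / 255.0) for t in range(256))
```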
The experimental results demonstrate that our model outperforms the other methods and that its saliency maps have more refined shapes under challenging conditions such as poor illumination and low contrast. Compared with the other four methods on VT821, our method obtains the best results for both maximum F-measure and MAE: the maximum F-measure (higher is better) is 0.26% higher and the MAE (lower is better) 0.17% lower than those of the second-ranked method. Compared with the other four methods on VT1000, our model also achieves the best maximum F-measure, which reaches 88.05%, 0.46% above the second-ranked method. However, its MAE of 3.22% is 0.09% higher than that of the first-ranked method and thus slightly worse.
Conclusion
We propose a CNN-based method for RGB-T salient object detection. To the best of our knowledge, existing saliency detection methods are mostly based on RGB or RGB-D images, so it is very meaningful to explore the application of CNNs to RGB-T salient object detection. The experimental results on two public RGB-T datasets demonstrate that the proposed method performs better than state-of-the-art methods, especially in challenging scenes with poor illumination, complex backgrounds, or low contrast, which shows that fusing multi-modal information from RGB and thermal images is an effective way to improve performance. However, public datasets for RGB-T saliency detection are still scarce, and such data are very important for the performance of deep learning networks. At the same time, detection speed is a key measure when saliency detection serves as a preprocessing step for other computer vision tasks. Thus, in future work, we will collect more high-quality RGB-T saliency datasets and design more lightweight models to increase detection speed.