Salient object detection based on deep clustering attention mechanism
- 2021, Vol. 26, No. 5: 1017-1029
Received: 2020-03-13; Revised: 2020-07-15; Accepted: 2020-07-22; Published in print: 2021-05-16
DOI: 10.11834/jig.200081
Objective
Salient object detection is a fundamental task in computer vision. It simulates the human visual attention mechanism to quickly locate the objects in a scene that are most likely to attract attention and carry the most information. As a preprocessing step for other vision tasks, such as image resizing, visual tracking, person re-identification, and image segmentation, salient object detection plays an important role. Traditional salient object detection methods rely mainly on hand-crafted image features, a process that is time-consuming and labor-intensive and whose results often fall short of requirements. With the rise of deep learning, a large number of feature extraction algorithms based on convolutional neural networks have emerged; compared with hand-crafted features, features extracted by deep neural networks are of higher quality and yield more accurate predictions. To obtain accurate salient object segmentation results, deep learning-based methods mostly introduce attention mechanisms for feature weighting to suppress noise and redundant information. However, the modeling process of existing attention mechanisms is quite rough: they treat every position in the feature tensor equally and solve for attention scores directly. This strategy cannot explicitly learn the global importance of different channels and different spatial regions, which may lead to missed or false detections. To this end, we propose a deep clustering attention (DCA) mechanism to better model feature-level pixel-wise relationships.
Method
The proposed DCA explicitly divides the feature tensor into several categories channel-wise and spatial-wise; that is, it clusters the features into foreground-sensitive and background-sensitive regions. General per-pixel attention weighting is then performed within each class, and semantic-level attention weighting is further performed between classes. The idea of DCA is easy to understand, its parameter count is small, and it can be deployed in any saliency detection network. This design efficiently separates foreground and background regions. In addition, through supervised learning on the edges of salient objects, the predictions obtain clearer boundaries and more accurate results.
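The cluster-then-weight idea described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions: the channel "clustering" here is a crude mean-activation split and the gating functions are hypothetical stand-ins, whereas the paper's actual DCA module is a learned network component.

```python
import numpy as np

def dca_attention(feat):
    """Illustrative sketch of a clustering-attention step (hypothetical
    implementation; the paper's learned DCA module differs in detail).

    feat: (C, H, W) feature map.
    1) Split channels into background- and foreground-sensitive groups
       (here: by global mean activation, a stand-in for learned clustering).
    2) Apply per-pixel attention within each group (sigmoid gating).
    3) Apply a semantic-level scalar weight between groups (softmax over
       group scores), globally boosting the more salient group.
    """
    C, H, W = feat.shape
    # Step 1: crude channel "clustering" by mean activation.
    means = feat.reshape(C, -1).mean(axis=1)
    order = np.argsort(means)
    groups = [order[: C // 2], order[C // 2:]]  # background / foreground sensitive
    # Step 2: intra-class per-pixel attention (sigmoid gating).
    out = np.empty_like(feat)
    group_scores = []
    for idx in groups:
        gated = feat[idx] * (1.0 / (1.0 + np.exp(-feat[idx])))
        out[idx] = gated
        group_scores.append(gated.mean())
    # Step 3: inter-class semantic weighting (softmax over group scores).
    w = np.exp(group_scores) / np.sum(np.exp(group_scores))
    for wi, idx in zip(w, groups):
        out[idx] = out[idx] * (2.0 * wi)  # scale so the two weights average to 1
    return out
```

The key point this sketch captures is the two-level weighting: ordinary per-position gating inside each cluster, plus one scalar per cluster that encodes its global (semantic) importance.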
Result
Comparisons with 19 state-of-the-art methods on six large public datasets demonstrate the effectiveness of DCA in modeling pixel-wise attention, which is very helpful for obtaining fine salient object segmentation masks. On all evaluation metrics, models improve after DCA is deployed. On the extended complex scene saliency dataset (ECSSD), DCANet outperforms the second-best method by 0.9% in F-measure. On the Dalian University of Technology and OMRON Corporation (DUT-OMRON) dataset, DCANet outperforms the second-best method by 0.5% in F-measure, with the mean absolute error (MAE) reduced by 3.2%. On the HKU-IS dataset, DCANet is 0.3% higher than the second-best method in F-measure, with MAE reduced by 2.8%. On the pattern analysis, statistical modeling and computational learning subset (PASCAL-S) dataset, DCANet is 0.8% higher in F-measure, with MAE reduced by 4.2%.
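The two metrics reported above can be computed as follows. This is a generic sketch: the fixed 0.5 threshold is a simplification (benchmarks often sweep thresholds or use an adaptive one), while β² = 0.3 is the usual convention in the salient object detection literature; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def mae(pred, gt):
    # Mean absolute error between a predicted saliency map and the
    # ground-truth mask, both with values in [0, 1].
    return float(np.abs(pred - gt).mean())

def f_measure(pred, gt, thresh=0.5, beta2=0.3):
    # F-measure with beta^2 = 0.3, the common convention in salient
    # object detection; the prediction is binarized at `thresh`.
    binary = pred >= thresh
    positive = gt > 0.5
    tp = np.logical_and(binary, positive).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(positive.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

A perfect prediction yields MAE = 0 and F-measure = 1; the relative gains quoted above compare these scores against the second-best method on each dataset.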
Conclusion
The proposed DCA effectively enhances the global saliency scores of foreground-sensitive classes through fine-grained channel partitioning and spatial region partitioning. This paper analyzes the deficiencies of existing attention-based salient object detection algorithms and proposes a method that explicitly partitions feature channels and spatial regions; this attention modeling helps the model perceive and adapt to the task quickly during training. Compared with existing attention mechanisms, the idea of DCA is clear, its effect is significant, and it is simple to deploy. DCA also provides a viable new research direction for the study of more general attention mechanisms.