Eye fixation prediction combining multiple attention mechanisms
2022, Vol. 27, No. 12: 3503-3515
Received: 2021-08-03
Revised: 2021-12-27
Accepted: 2022-01-03
Published in print: 2022-12-16
DOI: 10.11834/jig.210590
目的
Classic eye fixation prediction models usually fuse high- and low-level features via skip connections, which makes it hard to weigh the relative importance of features across levels, and they ignore the tendency of the human eye to look toward the center region of an image. To address this, we propose an image feature extraction method that incorporates attention mechanisms and refines the extracted features with a Gaussian learning module, improving the accuracy of eye fixation prediction.
方法
We propose a new eye fixation prediction model based on a multiple attention mechanism (MAM). It combines three different attention mechanisms to weight, in the spatial, channel, and layer dimensions respectively, the features extracted by a ResNet-50 model augmented with dilated convolutions. The network consists of a feature extraction module, a multiple attention module, and a Gaussian learning optimization module. The dilated convolutions effectively capture receptive fields of different sizes while keeping the resolution of the feature maps unchanged; the multiple attention module automatically balances the rich low-level detail against the high-level global semantic information, fully exploits the channel and spatial information of the feature maps, and prevents over-reliance on the model's high-level features; the Gaussian learning module automatically selects a suitable Gaussian blur kernel to smooth the saliency map, addressing the center bias that arises when humans view images.
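The three weighting steps described above (channel, spatial, and layer attention) can be illustrated with a minimal NumPy sketch. The pooling choices, feature shapes, and softmax scoring here are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax used to turn scores into weights
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    # feat: (C, H, W); squeeze spatial dims, then weight each channel
    w = softmax(feat.mean(axis=(1, 2)))                             # (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # weight each pixel by its channel-pooled response
    w = softmax(feat.mean(axis=0).ravel()).reshape(feat.shape[1:])  # (H, W)
    return feat * w[None, :, :]

def layer_attention(layers):
    # layers: list of (C, H, W) maps from different depths; weight whole
    # layers so low-level detail and high-level semantics are balanced
    w = softmax(np.array([f.mean() for f in layers]))
    return sum(wi * f for wi, f in zip(w, layers))

rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
fused = layer_attention([spatial_attention(channel_attention(f)) for f in feats])
print(fused.shape)  # (8, 16, 16)
```

In the actual model the weights would be produced by small learned sub-networks rather than parameter-free pooling, but the shapes of the three weightings are the same.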
结果
Experiments on the public SALICON (saliency in context) dataset show that, compared with the structurally similar SAM-Res (saliency attentive model) and DINet (dilated inception network) models, the proposed method improves the Kullback-Leibler divergence (KLD), shuffled AUC (sAUC), and information gain (IG) evaluation criteria by 33%, 0.3%, and 6%, and by 53%, 0.5%, and 192%, respectively.
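The KLD and IG metrics reported above follow standard saliency-evaluation definitions (see Bylinskii et al. in the references). The sketch below is a generic NumPy implementation of those definitions under assumed normalization conventions, not the paper's exact evaluation code.

```python
import numpy as np

EPS = np.finfo(float).eps

def kld(pred, gt):
    # Kullback-Leibler divergence between two saliency maps, each
    # normalized to a probability distribution (lower is better)
    p = pred / (pred.sum() + EPS)
    q = gt / (gt.sum() + EPS)
    return float((q * np.log(EPS + q / (p + EPS))).sum())

def info_gain(pred, fixations, baseline):
    # Information gain of the prediction over a (center-bias) baseline,
    # averaged over fixated pixels (higher is better)
    p = (pred - pred.min()) / (pred.max() - pred.min() + EPS)
    b = (baseline - baseline.min()) / (baseline.max() - baseline.min() + EPS)
    fix = fixations > 0
    return float(np.mean(np.log2(EPS + p[fix]) - np.log2(EPS + b[fix])))

rng = np.random.default_rng(1)
gt = rng.random((32, 32))
fix = (gt > 0.9).astype(int)
print(kld(gt, gt))              # near zero: identical distributions
print(info_gain(gt, fix, gt))   # zero: no gain over itself as baseline
```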
结论
The experimental results show that the proposed eye fixation prediction model can extract spatial, channel, and inter-layer features through weighting, and it outperforms mainstream models on most eye fixation prediction metrics.
Objective
Human eye fixation prediction has been an active topic in image-related computer vision in recent years. The distinctive salient regions of an image are selected to better capture its visual structure. Recent saliency models have been applied to salient object detection, object segmentation, and image cropping. Traditional approaches rely on hand-crafted features based on low-level cues (e.g., contrast, texture, color) for saliency prediction. However, these features often fail to simulate the complex activation of the human visual system, especially in complex scenarios. Existing eye fixation prediction models often use skip connections to fuse high-level and low-level features, which makes it difficult to weigh the importance of features across levels, and they do not account for the center bias of human gaze. Commonly, humans are inclined to look at the center of an image when there are no obvious salient regions. We develop a layer attention mechanism in which different weights are assigned to the features of different layers for selective layer-feature extraction, and we integrate channel and spatial attention mechanisms to selectively extract channel and spatial features from the convolutional features. In addition, we introduce a Gaussian learning method to handle the center prior and improve prediction accuracy.
Method
Our eye fixation prediction model is based on a multiple attention mechanism network (MAM-Net), which uses three different attention mechanisms to weight the feature information of different layers, different channels, and different image pixels extracted by a ResNet-50 model with dilated convolutions. The network is mainly composed of a feature extraction module, the novel multiple attention mechanism (MAM) module, and a Gaussian learning optimization module. 1) A dilated convolution network captures long-range information by extracting local and global feature maps, covering many different receptive fields. 2) The MAM module incorporates features from different layers, channels, and image pixels of the feature maps and outputs an intermediate saliency map. 3) A Gaussian learning layer automatically selects the best kernel to blur the intermediate saliency map and generate the final saliency map. The MAM module aims to automatically balance the rich details of the low-level features against the global semantics of the high-level features, fully extract channel and spatial information, and prevent over-reliance on high-level features. The Gaussian learning module performs the final optimization, since human eyes tend to focus on the image center, which is inconsistent with the predictions of common methods. Our method thereby avoids setting the Gaussian blur parameters by a hand-tuned human prior.
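As a rough stand-in for the Gaussian learning layer, the sketch below picks the blur kernel whose output best matches a target density. The kernel size, candidate sigmas, and MSE criterion are assumptions; the paper learns this selection rather than searching a fixed list.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # normalized 2-D Gaussian kernel of shape (size, size)
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    # naive 'same' 2-D convolution with edge padding (no SciPy dependency)
    size = kernel.shape[0]
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + size, j:j + size] * kernel).sum()
    return out

def pick_sigma(pred, gt, sigmas=(0.5, 1.0, 2.0, 4.0)):
    # stand-in for the learned kernel selection: choose the sigma whose
    # blurred prediction is closest (MSE) to the ground-truth density
    losses = {s: ((blur(pred, gaussian_kernel(7, s)) - gt) ** 2).mean()
              for s in sigmas}
    return min(losses, key=losses.get)
```

Blurring the intermediate map with the selected kernel spreads mass toward dense fixation regions, which is how the module compensates for the center bias without a hard-coded center prior.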
Result
Experiments on the public saliency in context (SALICON) dataset show that our results improve the Kullback-Leibler divergence (KLD), shuffled area under the receiver operating characteristic (ROC) curve (sAUC), and information gain (IG) evaluation criteria by 33%, 0.3%, and 6%, and by 53%, 0.6%, and 192%, respectively.
Conclusion
We propose a novel attentive model for predicting human eye fixations on natural images. Our MAM-Net predicts the saliency map of an image by extracting high-level and low-level features. The channel and spatial attention mechanisms optimize the feature maps of the individual layers, and the layer attention mechanism predicts the saliency map of the image from the combined high-level and low-level features. We also introduce a Gaussian learning blur layer that optimizes the integrated saliency maps with different kernels.
Borji A. 2021. Saliency prediction in the deep learning era: successes and Limitations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2): 679-700[DOI: 10.1109/TPAMI.2019.2935715]
Borji A and Itti L. 2012. Exploiting local and global patch rarities for saliency detection//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 478-485[DOI: 10.1109/CVPR.2012.6247711]
Bylinskii Z, Judd T, Oliva A, Torralba A and Durand F. 2019. What do different evaluation metrics tell us about saliency models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3): 740-757[DOI: 10.1109/TPAMI.2018.2815601]
Che Z H, Borji A, Zhai G T, Min X K, Guo G D and Le Callet P. 2020. How is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing, 29: 2287-2300[DOI: 10.1109/TIP.2019.2945857]
Cheng M M, Mitra N J, Huang X L, Torr P H S and Hu S M. 2015. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3): 569-582[DOI: 10.1109/TPAMI.2014.2345401]
Cornia M, Baraldi L, Serra G and Cucchiara R. 2018. Predicting human eye fixations via an LSTM-based saliency attentive model. IEEE Transactions on Image Processing, 27(10): 5142-5154[DOI: 10.1109/TIP.2018.2851672]
Dorta G, Vicente S, Agapito L, Campbell N D F and Simpson I. 2018. Structured uncertainty prediction networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5477-5485[DOI: 10.1109/CVPR.2018.00574]
Harel J, Koch C and Perona P. 2007. Graph-based visual saliency//Proceedings of the 19th International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM: 545-552
He W and Pan C. 2022. The salient object detection based on attention-guided network. Journal of Image and Graphics, 27(4): 1176-1190[DOI: 10.11834/jig.200658]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2020. Squeeze-and-Excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023[DOI: 10.1109/TPAMI.2019.2913372]
Huang X, Shen C Y, Boix X and Zhao Q. 2015. SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 267-270[DOI: 10.1109/ICCV.2015.38]
Judd T, Ehinger K, Durand F and Torralba A. 2009. Learning to predict where humans look//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 2106-2113[DOI: 10.1109/ICCV.2009.5459462]
Kruthiventi S S S, Ayush K and Babu R V. 2017. DeepFix: a fully convolutional neural network for predicting human eye fixations. IEEE Transactions on Image Processing, 26(9): 4446-4456[DOI: 10.1109/TIP.2017.2710620]
Kümmerer M, Theis L and Bethge M. 2014. Deep gaze I: boosting saliency prediction with feature maps trained on ImageNet[EB/OL]. [2021-05-09]. https://arxiv.org/pdf/1411.1045.pdf
Liang M and Hu X L. 2015. Predicting eye fixations with higher-level visual features. IEEE Transactions on Image Processing, 24(3): 1178-1189[DOI: 10.1109/TIP.2015.2395713]
Liu N and Han J W. 2018. A deep spatial contextual long-term recurrent convolutional network for saliency detection. IEEE Transactions on Image Processing, 27(7): 3264-3274[DOI: 10.1109/TIP.2018.2817047]
Mahdi A and Qin J. 2019. An extensive evaluation of deep features of convolutional neural networks for saliency prediction of human visual attention. Journal of Visual Communication and Image Representation, 65: #102662[DOI: 10.1016/j.jvcir.2019.102662]
Oyama T and Yamanaka T. 2018. Influence of image classification accuracy on saliency map estimation. CAAI Transactions on Intelligence Technology, 3(3): 140-152[DOI: 10.1049/trit.2018.1012]
Tatler B W. 2007. The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7(14): #4[DOI: 10.1167/7.14.4]
Valenti R, Sebe N and Gevers T. 2009. Image saliency by isocentric curvedness and color//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 2185-2192[DOI: 10.1109/ICCV.2009.5459240]
Vig E, Dorr M and Cox D. 2014. Large-scale optimization of hierarchical features for saliency prediction in natural images//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2798-2805[DOI: 10.1109/CVPR.2014.358]
Wang W G, Shen J B and Jia Y D. 2019. Review of visual attention detection. Journal of Software, 30(2): 416-439[DOI: 10.13328/j.cnki.jos.005636]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI: 10.1007/978-3-030-01234-2_1]
Yang S, Lin G S, Jiang Q P and Lin W S. 2020. A dilated inception network for visual saliency prediction. IEEE Transactions on Multimedia, 22(8): 2163-2176[DOI: 10.1109/TMM.2019.2947352]
Zhang J M and Sclaroff S. 2016. Exploiting surroundedness for saliency detection: a Boolean map approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(5): 889-902[DOI: 10.1109/TPAMI.2015.2473844]