The gating self-attention mechanism and GAN integrated video anomaly detection
Vol. 27, Issue 11, Pages: 3210-3221 (2022)
Published: 16 November 2022
Accepted: 04 November 2021
DOI: 10.11834/jig.210520
Chengming Liu, Ran Xue, Lei Shi, Yinghao Li, Yufei Gao. The gating self-attention mechanism and GAN integrated video anomaly detection[J]. Journal of Image and Graphics, 27(11): 3210-3221 (2022)
Objective
Video-based abnormal behavior detection builds on intelligent surveillance technology and has important applications in public security. However, effectively modeling the spatial and temporal information of video to improve the accuracy of anomaly detection remains challenging. Traditional abnormal behavior detection methods rely on hand-crafted features, such as target contours, motion information, and trajectories, and their weak representational power limits them when processing massive video data. Deep learning models can instead learn and extract high-level features automatically from large video-stream datasets and have therefore widely replaced hand-crafted features in video anomaly detection. Owing to its structural advantages, the generative adversarial network (GAN) is widely used in video anomaly detection tasks. To address the low utilization of spatio-temporal features and the poor detection performance of traditional GANs, we propose a video anomaly detection algorithm that integrates a GAN with a gating self-attention mechanism.
Method
First, the gating self-attention mechanism is introduced into the U-net generator of the GAN, and attention-derived weights are assigned to the feature maps layer by layer during sampling. A standard U-net passes encoder features to the decoder through skip connections without orienting them toward task-relevant features. By combining the structural strengths of the U-net with the gated self-attention mechanism, the feature representation of background regions irrelevant to anomaly detection is suppressed in the input video frames, the features of the different target objects are highlighted, and the spatio-temporal information is modeled more effectively; a minimal sketch of such a gate is given below.
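For concreteness, the following is a minimal PyTorch sketch of an additive attention gate on a U-net skip connection, in the spirit of Attention U-Net; the channel sizes, module names, and the assumption that the gating and skip features share the same spatial size are illustrative, not the authors' exact design.

```python
# Minimal sketch of a gated attention block on a U-net skip connection.
# All names and channel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # gating signal (decoder)
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # skip features (encoder)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)        # scalar attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g, x):
        # g: decoder feature map; x: encoder skip feature map (same H x W assumed)
        a = self.relu(self.w_g(g) + self.w_x(x))  # additive attention
        alpha = self.sigmoid(self.psi(a))         # per-pixel weight in (0, 1)
        return x * alpha                          # background features are suppressed

# usage: weight a 64-channel skip connection with a 128-channel gating signal
gate = AttentionGate(gate_ch=128, skip_ch=64, inter_ch=32)
x = torch.randn(1, 64, 32, 32)   # encoder features
g = torch.randn(1, 128, 32, 32)  # upsampled decoder features
print(gate(g, x).shape)          # torch.Size([1, 64, 32, 32])
```

The sigmoid output acts as the gate: weights near zero suppress background responses before the skip features are passed on to the decoder.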
Next, to guarantee the consistency between video sequences, we adopt the lightweight and fast LiteFlowNet to extract the motion information of the video stream.
Finally, to generate higher-quality frames, intensity, gradient, and motion loss functions are added to enhance the stability of detection; a sketch of these loss terms follows this paragraph. The adversarial part is trained with a PatchGAN discriminator, and the GAN reaches good, stable performance after adversarial optimization.
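A minimal sketch of the three generator loss terms named above, assuming frames normalized to [0, 1]; the flow inputs would come from LiteFlowNet, and the adversarial weight shown is an illustrative value, not necessarily the paper's setting.

```python
# Minimal sketch of the intensity, gradient, and motion loss terms.
import torch.nn.functional as F

def intensity_loss(pred, gt):
    # L2 distance between the predicted and ground-truth frames
    return F.mse_loss(pred, gt)

def gradient_loss(pred, gt):
    # L1 distance between image-gradient magnitudes, which sharpens edges
    dx_p = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dy_p = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dx_g = (gt[..., :, 1:] - gt[..., :, :-1]).abs()
    dy_g = (gt[..., 1:, :] - gt[..., :-1, :]).abs()
    return (dx_p - dx_g).abs().mean() + (dy_p - dy_g).abs().mean()

def motion_loss(flow_pred, flow_gt):
    # L1 distance between the optical flows (e.g., from LiteFlowNet) of the
    # predicted and the ground-truth frame pairs
    return F.l1_loss(flow_pred, flow_gt)

def generator_loss(pred, gt, flow_pred, flow_gt, adv_term, lam_adv=0.05):
    # combined objective; lam_adv is an illustrative weight
    return (intensity_loss(pred, gt) + gradient_loss(pred, gt)
            + motion_loss(flow_pred, flow_gt) + lam_adv * adv_term)
```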
Result
Our experiments are carried out on recognized video abnormal event datasets: Chinese University of Hong Kong (CUHK) Avenue, University of California, San Diego (UCSD) Ped1, and UCSD Ped2. The area under the receiver operating characteristic (ROC) curve, the regularity score S, and the peak signal-to-noise ratio (PSNR) are taken as performance evaluation indexes. On the CUHK Avenue dataset, our area under curve (AUC) reaches 87.2%, which is 2.3% higher than comparable methods; on both UCSD Ped1 and UCSD Ped2, our AUC values also exceed those of comparable methods. In addition, four ablation experiments are implemented: 1) model 1 applies a standard U-net generator to the video anomaly detection task; 2) model 2 adds the gating self-attention mechanism to the U-net generator to verify whether the mechanism is effective; 3) model 3 adds the gating self-attention mechanism to the U-net generator and further adds LiteFlowNet to verify the effectiveness of the optical flow network; and 4) model 4, our full model, likewise uses the U-net generator with LiteFlowNet, but merges the gating self-attention mechanism layer by layer at the encoder to weight the features and fuses the weighted features at the decoder. Our method obtains higher AUC values than the other three ablation models. We also test the trained model and visualize the PSNR values of the video sequence frames; the change in PSNR reflects the accuracy of abnormal behavior detection, as sketched below.
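As an illustration of how the PSNR curve is turned into a per-frame anomaly indicator, the following is a minimal sketch following the common future-frame-prediction convention (min-max normalized PSNR per video); the exact normalization used in the paper may differ.

```python
# Minimal sketch of a PSNR-based regularity score; frames in [0, 1].
import torch

def psnr(pred, gt, max_val=1.0):
    # peak signal-to-noise ratio between a predicted and a real frame
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def regularity_scores(psnr_values):
    # min-max normalize the per-frame PSNRs of one video;
    # scores near 0 indicate likely abnormal frames
    p = torch.tensor(psnr_values, dtype=torch.float32)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)
```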
Conclusion
The experimental results show that our method achieves better recognition results on the CUHK Avenue, UCSD Ped1, and UCSD Ped2 datasets, is well suited to video anomaly detection tasks, and effectively improves the stability and accuracy of the abnormal behavior detection model. Moreover, using the inter-frame motion information of video sequences significantly improves abnormal behavior detection performance.
video anomaly detection; generative adversarial network (GAN); U-net; gating self-attention mechanism; optical flow network