融合多尺度特征与全局上下文信息的X光违禁物品检测
Integrated multi-scale features and global context in X-ray detection for prohibited items
2022, Vol. 27, No. 10: 3043-3057
Received: 2021-06-01; Revised: 2021-11-03; Accepted: 2021-11-10; Published in print: 2022-10-16
DOI: 10.11834/jig.210368
目的
X光图像违禁物品检测一直是安检领域的一个基础问题,安检违禁物品形式各异,尺度变化大,以及透视性导致大量物体堆放时出现重叠遮挡现象,传统图像处理模型很容易出现漏检误检,召回率低。针对以上问题,提出一种融合多尺度特征与全局上下文信息的特征增强融合网络(feature enhancement fusion network,FEFNet)用于X光违禁物品检测。
方法
首先针对特征主干网络darknet53,加入空间坐标的注意力机制,将位置信息嵌入到通道注意力中,分别沿两个空间方向聚合特征,增强特征提取器对违禁目标的特征提取能力,抑制背景噪声干扰。然后,将特征提取主干网络输出的特征编码为1维向量,利用自监督二阶融合获取特征空间像素相关性矩阵,进而获取完整的全局上下文信息,为视觉遮挡区域提供全局信息指导。针对违禁物品尺度不一的问题,提出多尺度特征金字塔融合模块,增加一层小感受野预测特征用于提高对小尺度违禁目标的检测能力。最后,通过融合全局上下文特征信息和局部多尺度细节特征解决违禁物品之间的视觉遮挡问题。
结果
在SIXray-Lite(security inspection X-ray)数据集上进行训练和验证,并与SSD(single shot detection)、Faster R-CNN、RetinaNet、YOLOv5(you only look once)和ACMNet(asymmetrical convolution multi-view neural network)模型进行了对比实验。结果表明,本文模型在SIXray-Lite数据集上的mAP(mean average precision)达到85.64%,特征增强融合模块和多尺度特征金字塔融合模块较原有模型分别提升了6.73%和5.93%,总体检测精度较原有检测网络提升了11.24%。
结论
提出的特征增强融合检测模型能够更好地提取显著差异特征,降低背景噪声干扰,提高对多尺度以及小型违禁物品的检测能力。同时利用全局上下文特征信息和多尺度局部特征相结合,有效地缓解了违禁物品之间的视觉遮挡现象,在保证实时性的同时有效地提高了模型的整体检测精度。
Objective
X-ray image detection of prohibited items is a fundamental problem in security inspection: prohibited items come in diverse forms, vary greatly in scale, and are often hard to identify. Traditional image processing models are prone to missed and false detections, resulting in a low recall rate and unsatisfactory real-time performance. Unlike regular optical images, X-ray images are transmissive, so large numbers of stacked objects produce overlapping and occlusion phenomena, and it is challenging for deep learning models to extract effective information about multiple overlapping objects; the overlapping objects may even be recognized as a single new object, resulting in poor classification and low detection accuracy. To address these issues, we propose a feature enhancement fusion network (FEFNet) that combines multi-scale features and global context for X-ray detection of prohibited items.
Method
First, the feature enhancement fusion model improves the feature extractor darknet53 of YOLOv3 (you only look once v3) by adding a spatial coordinate attention mechanism. The improved extractor, called coordinate darknet, embeds positional information into the channel attention and aggregates features along the two spatial directions, so that it extracts more salient and discriminative information and suppresses background noise. Specifically, the coordinate attention module is inserted into the last four residual stages of the original darknet53 and contains two pooling branches: the feature map is pooled adaptively along its width and along its height to obtain feature vectors for the two directions, and these vectors are passed through batch normalization and activation layers to produce direction-specific attention vectors. The attention vectors are then applied to the input feature map so that the model attends to detailed information.
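The coordinate-attention data flow described above can be sketched in a few lines of numpy. This is only an illustrative sketch: the random projections `w1`/`w2` stand in for the learned 1×1 convolutions (with batch normalization) of the real module, and `reduction` is a hypothetical channel-reduction ratio.

```python
import numpy as np

def coordinate_attention(x, reduction=8):
    """Sketch of coordinate attention on a (C, H, W) feature map.
    Random projections replace the trained 1x1 conv weights."""
    c, h, w = x.shape
    # Adaptive average pooling along each spatial direction.
    pool_h = x.mean(axis=2)                         # (C, H): aggregate over width
    pool_w = x.mean(axis=1)                         # (C, W): aggregate over height
    # Shared transform (stand-in for conv + batch norm + activation).
    mid = max(c // reduction, 1)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((mid, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, mid)) / np.sqrt(mid)
    y = np.concatenate([pool_h, pool_w], axis=1)    # (C, H+W)
    y = np.maximum(w1 @ y, 0.0)                     # (mid, H+W), ReLU
    # Direction-specific sigmoid attention vectors.
    att_h = 1.0 / (1.0 + np.exp(-(w2 @ y[:, :h])))  # (C, H)
    att_w = 1.0 / (1.0 + np.exp(-(w2 @ y[:, h:])))  # (C, W)
    # Re-weight the input along both spatial directions.
    return x * att_h[:, :, None] * att_w[:, None, :]

out = coordinate_attention(np.random.default_rng(1).standard_normal((64, 13, 13)))
print(out.shape)  # (64, 13, 13)
```

Because both attention factors lie in (0, 1), the module can only dampen activations, which is how background-noise responses get suppressed while salient positions are kept close to their original magnitude.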
A bilinear second-order fusion module then extracts global context features. The module encodes the highest-level semantic feature maps output by the feature extraction backbone into one-dimensional vectors and applies bilinear pooling to fuse them at second order, yielding a spatial pixel-correlation matrix and thus complete global context information. The correlation matrix is multiplied with the input features, and the result is up-sampled and spliced into the feature pyramid to produce the final global context features. Concretely, the bilinear pooling first fuses (multiplies) the two one-dimensional vectors at each position into a fusion matrix, then sum-pools over all positions, and finally applies L2 normalization and a softmax operation to the fused feature.
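A minimal numpy sketch of this second-order fusion, assuming the input is a single (C, N) map with N = H×W flattened positions and that the correlation is computed between positions (the exact tensor layout and normalization order in the paper may differ):

```python
import numpy as np

def global_context(x):
    """Sketch of bilinear second-order fusion on a (C, N) feature map.
    Position vectors are fused by bilinear multiplication (outer products
    summed over channels, i.e. x.T @ x), giving an N x N spatial
    pixel-correlation matrix that is L2-normalized, softmax-ed, and
    applied back to the input features."""
    corr = x.T @ x                                          # (N, N) second-order fusion
    corr = corr / (np.linalg.norm(corr, axis=1, keepdims=True) + 1e-12)
    e = np.exp(corr - corr.max(axis=1, keepdims=True))
    corr = e / e.sum(axis=1, keepdims=True)                 # row-stochastic weights
    return x @ corr.T                                       # (C, N) context features

x = np.random.default_rng(0).standard_normal((256, 13 * 13))
ctx = global_context(x)
print(ctx.shape)  # (256, 169)
```

Each output position is thus a correlation-weighted mixture of all other positions, which is what lets visible parts of an object supply information for its occluded parts.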
Finally, the feature pyramid is improved to address the varied scales of prohibited items. A cross-scale fusion feature pyramid module strengthens the detection of multi-scale prohibited items: it outputs four prediction feature maps of different scales, sized 13×13, 26×26, 52×52, and 104×104 pixels from small to large. The small-scale feature maps predict large targets, while the added large-scale feature map improves the prediction of small targets. In addition, the concatenation operation is replaced by element-wise addition, which keeps more activation mappings from coordinate darknet. Meanwhile, the global context feature derived from the second-order fusion is connected directly to the other local features, and this information alleviates the blur and occlusion between overlapping items.
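The top-down, addition-based fusion across the four pyramid scales can be sketched as follows. This is a shape-level illustration only: channel counts are assumed equal across levels, and the convolutions that the real network would apply between levels are omitted.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_pyramid(levels):
    """Top-down fusion by element-wise addition (replacing concatenation).
    `levels` is ordered coarse to fine, e.g. 13x13, 26x26, 52x52, 104x104."""
    fused = [levels[0]]
    for fine in levels[1:]:
        fused.append(fine + upsample2x(fused[-1]))  # add, not concatenate
    return fused

rng = np.random.default_rng(0)
levels = [rng.standard_normal((32, s, s)) for s in (13, 26, 52, 104)]
out = fuse_pyramid(levels)
print([f.shape[1:] for f in out])  # [(13, 13), (26, 26), (52, 52), (104, 104)]
```

Addition keeps the fused map at the same channel width as each input, which is why it preserves the activation mappings from coordinate darknet without the channel growth that concatenation would cause.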
Result
The experiments are trained and verified on the SIXray-Lite (security inspection X-ray) dataset, which contains 7 408 training samples and 1 500 test samples. FEFNet is also compared with other object detection models, including SSD (single shot detection), Faster R-CNN, RetinaNet, YOLOv5, and ACMNet (asymmetrical convolution multi-view neural network). The experimental results show that our method achieves 85.64% mean average precision (mAP) on the SIXray-Lite dataset, which is 11.24% higher than the original YOLOv3. The per-class average precision is 95.15% for guns, 81.43% for knives, 81.65% for wrenches, 85.95% for pliers, and 84.00% for scissors. The comparative analyses further demonstrate the advantage of the proposed model: 1) compared with SSD, the mAP of FEFNet is 13.97% higher; 2) compared with RetinaNet, 7.40% higher; 3) compared with Faster R-CNN, 5.48% higher; 4) compared with YOLOv5, 3.61% higher; and 5) compared with ACMNet, 1.34% higher.
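As a quick arithmetic check, the overall mAP reported above is the mean of the five per-class average precisions:

```python
# Per-class average precision (%) reported in the Result section.
ap = {"gun": 95.15, "knife": 81.43, "wrench": 81.65, "plier": 85.95, "scissor": 84.00}
mAP = sum(ap.values()) / len(ap)
print(round(mAP, 2))  # 85.64
```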
Conclusion
The proposed FEFNet extracts more salient and discriminative features, reduces background-noise interference, and improves the detection of multi-scale and small prohibited items. Combining global context information with multi-scale local features effectively alleviates the visual occlusion and blur between prohibited items and improves the overall detection accuracy of the model while preserving real-time performance.
Akçay S, Kundegorski M E, Devereux M and Breckon T P. 2016. Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery//Proceedings of 2016 IEEE International Conference on Image Processing. Phoenix, USA: IEEE: 1057-1061 [DOI: 10.1109/ICIP.2016.7532519]
Chen S and Zhang M W. 2016. Virtual dual-energy subtraction method for X-ray radiographs by using regression model based on chest anatomical structure. Journal of Image and Graphics, 21(9): 1247-1255
陈胜, 张茗屋. 2016. 胸部解剖结构回归模型的虚拟双能量X线减影方法. 中国图象图形学报, 21(9): 1247-1255[DOI: 10.11834/jig.20160914]
Gaus Y F A, Bhowmik N, Akçay S, Guillén-Garcia P M, Barker J W and Breckon T P. 2019. Evaluation of a dual convolutional neural network architecture for object-wise anomaly detection in cluttered X-ray security imagery//Proceedings of 2019 International Joint Conference on Neural Networks. Budapest, Hungary: IEEE: 1-8 [DOI: 10.1109/IJCNN.2019.8851829]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
Han P, Liu Z X and He W K. 2011. An efficient two-stage enhancement algorithm of X-ray carry-on luggage images. Opto-Electronic Engineering, 38(7): 99-105
韩萍, 刘则徐, 何炜琨. 2011. 一种有效的机场安检X光手提行李图像两级增强方法. 光电工程, 38(7): 99-105[DOI: 10.3969/j.issn.1003-501X.2011.07.018]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hou Q B, Zhou D Q and Feng J S. 2021. Coordinate attention for efficient mobile network design//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13708-13717 [DOI: 10.1109/CVPR46437.2021.01350]
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Lake Tahoe, USA: NIPS: 1106-1114
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu J Y, Leng X X and Liu Y. 2019. Deep convolutional neural network based object detector for X-ray baggage security imagery//Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence. Portland, USA: IEEE: 1757-1761 [DOI: 10.1109/ICTAI.2019.00262]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
McCarley J S, Kramer A F, Wickens C D, Vidoni E D and Boot W R. 2004. Visual skills in airport-security screening. Psychological Science, 15(5): 302-306[DOI: 10.1111/j.0956-7976.2004.00673.x]
Mery D, Svec E and Arias M. 2015. Object recognition in baggage inspection using adaptive sparse representations of X-ray images//Proceedings of the 7th Image and Video Technology. Auckland, New Zealand: Springer: 709-720 [DOI: 10.1007/978-3-319-29451-3_56]
Miao C J, Xie L X, Wan F, Su C, Liu H Y, Jiao J B and Ye Q X. 2019. SIXray: a large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2114-2123 [DOI: 10.1109/CVPR.2019.00222]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2021-04-08]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R B and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015. Montreal, Canada: NIPS: 91-99
Song X Z. 2014. Research on Processing Method of X-ray Security Image. Shenyang: Shenyang Ligong University
宋修竹. 2014. X光安检图像处理方法研究. 沈阳: 沈阳理工大学
Su Z G and Yao S Q. 2020. A multi-object prohibited items identification algorithm based on semantic segmentation. Journal of Signal Processing, 36(11): 1940-1946
苏志刚, 姚少卿. 2020. 基于语义分割的多目标违禁品识别算法. 信号处理, 36(11): 1940-1946 [DOI: 10.16798/j.issn.1003-0530.2020.11.017]
Tan S X and Huang Q. 2008. A fractal based region segmentation method and its application in casting defect recognition. Journal of Image and Graphics, 13(5): 918-923
谈绍熙, 黄茜. 2008. 一种在铸件缺陷识别中的区域分形分割方法. 中国图象图形学报, 13(5): 918-923 [DOI: 10.11834/jig.20080513]
Toyofuku N and Schatzki T F. 2005. Feasibility of feature-based contraband detection in X-ray images. Journal of Vision, 5(8): #958[DOI: 10.1167/5.8.958]
Wang Y, Zou W H, Yang X M, Jiang W and Wu W. 2017. X-ray image illegal object classification based on computer vision. Chinese Journal of Liquid Crystals and Displays, 32(4): 287-293
王宇, 邹文辉, 杨晓敏, 姜维, 吴炜. 2017. 基于计算机视觉的X射线图像异物分类研究. 液晶与显示, 32(4): 287-293 [DOI: 10.3788/YJYXS20173204.0287]
Whittig L D and Allardice W R. 1986. X-ray diffraction techniques//Klute A, ed. Methods of Soil Analysis: Part 1 Physical and Mineralogical Methods. Madison: American Society of Agronomy: 331-362
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xu M S, Zhang H G and Yang J F. 2018. Prohibited item detection in airport X-ray security images via attention mechanism based CNN//Proceedings of the 1st Chinese Conference on Pattern Recognition and Computer Vision. Guangzhou, China: Springer: 429-439 [DOI: 10.1007/978-3-030-03335-4_37]
Yan Z X, Hou Z Q, Xiong L, Liu X Y, Yu W S and Ma S G. 2021. Fine-grained classification based on bilinear feature fusion and YOLOv3. Journal of Image and Graphics, 26(4): 847-856
闫子旭, 侯志强, 熊磊, 刘晓义, 余旺盛, 马素刚. 2021. YOLOv3和双线性特征融合的细粒度图像分类. 中国图象图形学报, 26(4): 847-856 [DOI: 10.11834/jig.200031]
Zhang N and Zhu J F. 2015. Optimization method of civil aviation airport X ray image recognition research. Bulletin of Science and Technology, 31(8): 198-200
张宁, 朱金福. 2015. 民航机场X光机图像识别优化方法研究. 科技通报, 31(8): 198-200 [DOI: 10.3969/j.issn.1001-7119.2015.08.067]
Zhang Y K, Su Z G, Zhang H G and Yang J F. 2020. Multi-scale prohibited item detection in X-ray security image. Journal of Signal Processing, 36(7): 1096-1106
张友康, 苏志刚, 张海刚, 杨金锋. 2020. X光安检图像多尺度违禁品检测. 信号处理, 36(7): 1096-1106 [DOI: 10.16798/j.issn.1003-0530.2020.07.008]
Zhang Z R, Li Q and Guan X. 2020. Multilabel chest X-ray disease classification based on a dense squeeze-and-excitation network. Journal of Image and Graphics, 25(10): 2238-2248
张智睿, 李锵, 关欣. 2020. 密集挤压激励网络的多标签胸部X光片疾病分类. 中国图象图形学报, 25(10): 2238-2248 [DOI: 10.11834/jig.200232]
Zheng J Z and Lu S D. 2012. Review of the application of computed tomography technology in safety inspection domain. CT Theory and Applications, 21(1): 157-165
郑金州, 鲁绍栋. 2012. CT技术在安检领域应用综述. CT理论与应用研究, 21(1): 157-165