Dual attention mechanism based pavement crack detection
2022, Vol. 27, No. 7: 2240-2250
Received: 2020-12-18; Revised: 2021-04-13; Accepted: 2021-04-20; Published in print: 2022-07-16
DOI: 10.11834/jig.200758
Objective
Road crack detection aims to identify and locate crack objects and is one of the key problems in ensuring road safety. To address the low accuracy of conventional deep neural networks on crack images with complex backgrounds and heavy interference, we design a deep-learning road crack detection network based on a dual attention mechanism.
Method
We integrate dilated convolution and two attention mechanisms into the backbone network, combining the lightweight attention mechanism with the residual module to form a residual attention module, Res-A. We compare how the "serial" and "parallel" arrangements of this module affect the relational weights of crack features and determine the best connection. In addition, an attention mechanism with a Non-Local computation pattern is introduced, which mines the relational weights of feature maps to improve crack detection performance. Combining the two attention mechanisms effectively addresses the difficulty of detecting road cracks against complex backgrounds and improves detection accuracy.
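The parallel Res-A combination described above can be sketched as follows. This is a minimal NumPy illustration only: the gates here are fixed sigmoid poolings, whereas the actual module learns its channel and spatial weights with convolutions, and the function names are hypothetical.

```python
import numpy as np

def channel_attention(x):
    # x: (C, H, W). Squeeze the spatial dims and gate each channel
    # by a sigmoid of its global average (stand-in for a learned gate).
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))   # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Squeeze the channel dim and gate each spatial position.
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))        # (H, W)
    return x * w[None, :, :]

def res_attention_block(x):
    # Parallel arrangement: the two attention branches are summed,
    # then added back to the identity branch (residual connection).
    return x + channel_attention(x) + spatial_attention(x)
```

The output keeps the input's shape, so the block can be appended to any residual group without changing the surrounding architecture.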
Result
Comparative experiments were conducted on the public, complex road crack dataset Crack500. To demonstrate the effectiveness of the proposed network, mean intersection over union (mIoU), pixel accuracy (PA), and training iteration time were used as evaluation metrics in three groups of comparative experiments. The first group evaluates the detection performance of different combinations of the channel and spatial attention mechanisms within the residual attention module; combining the two mechanisms in parallel with element-wise addition yields an mIoU of 79.28% and a PA of 93.88%, which exceed the other two combinations by 2.11% and 2.08%, and by 11.29% and 0.23% (mIoU and PA), respectively. The second group evaluates the effectiveness of the residual attention module; adding it raises mIoU and PA by 2.34% and 3.01%, respectively. The third group compares the proposed network with other representative networks: its mIoU and PA exceed those of FCN (fully convolutional network), PSPNet (pyramid scene parsing network), ICNet (image cascade network), PSANet (point-wise spatial attention network), and DenseASPP (dense atrous spatial pyramid pooling) by 7.67% and 2.94%, 1.54% and 0.42%, 6.51% and 3.34%, 7.76% and 2.13%, and 7.70% and -1.59%, respectively. The experimental results show that the proposed network outperforms these typical deep neural networks in mIoU and PA.
Conclusion
Using a ResNet-101 backbone with dilated convolution combined with the dual attention mechanism, the proposed network maintains feature-map resolution while enlarging the receptive field, and adapts better to crack objects with complex backgrounds and heavy interference.
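The role of dilated convolution noted above, keeping feature-map resolution while enlarging the receptive field, can be illustrated with a minimal NumPy sketch: a 3×3 kernel at dilation rate d covers a (2d+1)×(2d+1) window while the output stays the same size as the input.

```python
import numpy as np

def dilated_conv2d(x, k, d):
    """'Same'-padded 2D convolution of x with a 3x3 kernel k at dilation rate d.
    Output resolution equals input resolution; the kernel's effective
    receptive field grows to (2*d + 1) x (2*d + 1) without downsampling."""
    H, W = x.shape
    xp = np.pad(x, d)                      # pad by the dilation rate
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            # tap the input at dilated offsets i*d, j*d
            out += k[i, j] * xp[i * d : i * d + H, j * d : j * d + W]
    return out
```

With the center-only kernel the operation is the identity, which confirms that no resolution is lost at any dilation rate.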
Objective
Guided by the National Highway Network Planning (2013—2030), China's highway mileage had exceeded 150,000 km by 2020, and road condition evaluation has become a critical issue for the national highway network. Road crack detection, which identifies and locates crack objects, is one of the key techniques for traffic safety. However, deep learning based crack detection is challenged by the imbalance between cracked and non-cracked pixels within a single image. The attention mechanism, a widely adopted deep learning module, strengthens the consistency of the relational weights of crack objects during training and thereby improves detection performance. The accuracy of typical deep neural networks still needs to be improved on crack images with more complex backgrounds and more interference. Building on the public road crack dataset Crack500, we develop a deep learning based road crack detection network with a dual attention mechanism.
Method
To address these issues, we design a road crack detection network that integrates a dual attention mechanism. A ResNet-101 network with dilated convolution serves as the backbone feature extractor. The ResNet architecture has the following properties: 1) the number of parameters can be controlled; 2) the network depth is clearly organized, and the multi-layer feature maps retain their representational ability; 3) fewer pooling layers and redundant downsampling layers are used, which improves information transmission; 4) no dropout layer is used; instead, batch normalization (BN) and a global average pooling layer regularize and speed up training; 5) at greater depths, the number of 3×3 convolution layers is reduced, and 1×1 convolution layers control the number of input and output feature maps. ResNet-101 contains 4 residual groups with 3, 4, 23, and 3 residual blocks, respectively. A lightweight attention mechanism, composed of a spatial attention mechanism and a channel attention mechanism, is appended to the end of each residual module to form a residual attention module; the 4 residual groups thus use 3, 4, 23, and 3 residual attention modules, respectively. This module reinforces the consistent weight relationships of crack objects so that the higher layers extract more discriminative crack features. Given an intermediate feature map, relational weights are inferred sequentially along the spatial and channel dimensions and then multiplied with the original feature map to refine the features. The module can be seamlessly integrated into any convolutional neural network (CNN) architecture and trained end to end together with it. We further introduce a non-local attention mechanism at the end of the ResNet-101 network to obtain the relational weights of the highest-layer crack features and produce the detection result. Like the lightweight module, the non-local attention mechanism comprises a spatial attention mechanism and a channel attention mechanism. Spatial features are updated by a weighted aggregation of the features at all positions of the image, with weights determined by the similarity between the features at the two positions. The channel attention mechanism applies a similar self-attention scheme to learn the relationship between any two channel maps and updates each channel through a weighted aggregation of all channels. The network is implemented in the PyTorch deep learning framework and optimized with stochastic gradient descent (SGD) at an initial learning rate of 0.000 1. Mean intersection over union (mIoU), pixel accuracy (PA), and iteration time serve as the evaluation metrics.
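The non-local spatial aggregation described above can be sketched in a few lines of NumPy. This is a simplified illustration assuming identity embeddings: the actual module learns 1×1-convolution projections before computing the pairwise similarities, and the channel branch applies the same scheme across channel maps.

```python
import numpy as np

def nonlocal_spatial_attention(x):
    """Non-local spatial attention over a (C, H, W) feature map.
    Every position is updated as a similarity-weighted sum of the
    features at all positions (dot-product affinity + softmax),
    added back to the input through a residual connection."""
    C, H, W = x.shape
    f = x.reshape(C, H * W)                            # (C, N) feature per position
    affinity = f.T @ f                                 # (N, N) pairwise similarity
    affinity -= affinity.max(axis=1, keepdims=True)    # stabilize the softmax
    w = np.exp(affinity)
    w /= w.sum(axis=1, keepdims=True)                  # each row sums to 1
    y = f @ w.T                                        # weighted aggregation, (C, N)
    return x + y.reshape(C, H, W)                      # residual connection
```

Because every position attends to every other position, the aggregation captures long-range dependencies that local 3×3 convolutions miss, which is why the module is placed at the highest layer of the backbone.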
Result
The effectiveness of the network is verified through 4 groups of comparative experiments. The first group evaluates the detection performance of different ways of combining the channel and spatial attention mechanisms in the residual attention module. The best arrangement integrates the two mechanisms in parallel: compared with the other two arrangements, mIoU increases by 2.11% and 11.29%, and PA increases by 2.08% and 0.23%, respectively. The second group evaluates the effectiveness of the residual attention module: with the module, mIoU and PA increase by 2.34% and 3.01% over the network without it. The third group contrasts common convolution with dilated convolution: dilated convolution raises mIoU and PA by 6.65% and 4.18%. The final group compares the detection performance of our network with several deep neural networks. Compared with the fully convolutional network (FCN), pyramid scene parsing network (PSPNet), image cascade network (ICNet), point-wise spatial attention network (PSANet), and dense atrous spatial pyramid pooling (DenseASPP), our mIoU increases by 7.67%, 1.54%, 6.51%, 7.76%, and 7.70%, respectively, and our PA by 2.94%, 0.42%, 3.34%, 2.13%, and -1.59%, respectively.
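The two evaluation metrics reported above can be computed from predicted and ground-truth label maps as follows. This is a generic two-class sketch; the paper does not specify its exact averaging convention, so the function name and signature are illustrative.

```python
import numpy as np

def miou_pa(pred, gt, n_classes=2):
    """Mean intersection over union (mIoU) and pixel accuracy (PA)
    from integer label maps of identical shape."""
    pred, gt = pred.ravel(), gt.ravel()
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                       # skip classes absent from both maps
            ious.append(inter / union)
    miou = float(np.mean(ious))             # average IoU over present classes
    pa = float(np.mean(pred == gt))         # fraction of correctly labeled pixels
    return miou, pa
```

For crack detection the class imbalance makes mIoU the stricter metric: a network that labels everything as background can still score a high PA, but its crack-class IoU collapses.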
Conclusion
Our network combines a ResNet-101 backbone with dilated convolution and a dual attention mechanism. While maintaining feature-map resolution and enlarging the receptive field, it adapts better to crack objects with complex backgrounds and heavy interference. The results show that our network surpasses typical deep neural networks in mIoU and PA.
References
Cao J G, Yang G T and Yang X Y. 2020. Pavement crack detection with deep learning based on attention mechanism. Journal of Computer-Aided Design and Computer Graphics, 32(8): 1324-1333 [DOI: 10.3724/SP.J.1089.2020.18059]
Cha Y J, Choi W and Büyüköztürk O. 2017. Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering, 32(5): 361-378 [DOI: 10.1111/mice.12263]
Fu J, Liu J, Tian H J, Li Y, Bao Y J, Fang Z W and Lu H Q. 2019. Dual attention network for scene segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3146-3154 [DOI: 10.1109/cvpr.2019.00326]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/cvpr.2016.90]
Li L F, Ma W F, Li L and Lu C. 2019. Research on detection algorithm for bridge cracks based on deep learning. Acta Automatica Sinica, 45(9): 1727-1742 [DOI: 10.16383/j.aas.2018.c170052]
Liu Z Q, Cao Y W, Wang Y Z and Wang W. 2019. Computer vision-based concrete crack detection using U-net fully convolutional networks. Automation in Construction, 104: 129-139 [DOI: 10.1016/j.autcon.2019.04.005]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/cvpr.2015.7298965]
Sha A M, Tong Z and Gao J. 2018. Recognition and measurement of pavement disasters based on convolutional neural networks. China Journal of Highway and Transport, 31(1): 1-10 [DOI: 10.19721/j.cnki.1001-7372.2018.01.001]
Wang S, Wu X, Zhang Y H and Chen Q. 2018. Image crack detection with fully convolutional network based on deep learning. Journal of Computer-Aided Design and Computer Graphics, 30(5): 859-867 [DOI: 10.3724/SP.J.1089.2018.16573]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/cvpr.2018.00813]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Yang F, Zhang L, Yu S J, Prokhorov D, Mei X and Ling H B. 2020. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, 21(4): 1525-1535 [DOI: 10.1109/tits.2019.2910595]
Yang M K, Yu K, Zhang C, Li Z W and Yang K Y. 2018. DenseASPP for semantic segmentation in street scenes//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3684-3692 [DOI: 10.1109/cvpr.2018.00388]
Yu F, Koltun V and Funkhouser T. 2017. Dilated residual networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 636-644 [DOI: 10.1109/cvpr.2017.75]
Zhai P B, Yang H, Song T T, Yu K, Ma L X and Huang X S. 2020. Two-path semantic segmentation algorithm combining attention mechanism. Journal of Image and Graphics, 25(8): 1627-1636 [DOI: 10.11834/jig.190533]
Zhang A, Wang K C P, Li B X, Yang E H, Dai X X, Peng Y, Fei Y, Liu Y, Li J Q and Chen C. 2017. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10): 805-819 [DOI: 10.1111/mice.12297]
Zhang L, Yang F, Zhang Y D and Zhu Y J. 2016. Road crack detection using deep convolutional neural network//Proceedings of 2016 IEEE International Conference on Image Processing (ICIP). Phoenix, USA: IEEE: 3708-3712 [DOI: 10.1109/ICIP.2016.7533052]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2881-2890 [DOI: 10.1109/cvpr.2017.660]
Zhao H S, Qi X J, Shen X Y, Shi J P and Jia J Y. 2018a. ICNet for real-time semantic segmentation on high-resolution images//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 418-434 [DOI: 10.1007/978-3-030-01219-9_25]
Zhao H S, Zhang Y, Liu S, Shi J P, Loy C C, Lin D H and Jia J Y. 2018b. PSANet: point-wise spatial attention network for scene parsing//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 270-286 [DOI: 10.1007/978-3-030-01240-3_17]