Road crack detection combined with a dual attention mechanism

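Zhang Zhihua1,2,3, Wen Yanan1,2,3, Mu Haowei1, Du Xiaoping4 (1. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070; 2. National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070; 3. Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070; 4. Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094)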

Abstract
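Objective Road crack detection aims to identify and locate crack objects and is one of the key problems in ensuring road safety. To address the low accuracy of traditional deep neural networks on crack images with complex backgrounds and heavy interference, a deep learning road crack detection network based on a dual attention mechanism is designed. Method This paper integrates dilated convolution and two attention mechanisms into the backbone network, and combines the lightweight attention mechanism with the residual module to form the residual attention module Res-A. The effects of connecting this module in "series" and in "parallel" on the relationship weights of crack features are compared, and the best connection is obtained. In addition, an attention mechanism based on the Non-Local computing pattern is introduced, which mines the relationship weights of the feature maps to improve crack detection performance. Combining the two attention mechanisms effectively addresses the difficulty of detecting road cracks against complex backgrounds and improves road crack detection accuracy. Result Comparative experiments and validation are carried out on Crack500, a public dataset of complex road cracks. To demonstrate the effectiveness of the proposed network, the mean intersection over union (mIoU), pixel accuracy (PA), and training iteration time are used as evaluation metrics, and three groups of comparative experiments are conducted. The first group evaluates the detection performance of different combinations of the channel attention mechanism and the spatial attention mechanism in the residual attention module. The results show that when the two mechanisms are added in parallel, the mIoU and PA are 79.28% and 93.88%, which are higher than those of the other two combinations by 2.11% and 2.08%, and by 11.29% and 0.23%, respectively. The second group evaluates the effectiveness of the residual attention module. The results show that the mIoU and PA with the module are 2.34% and 3.01% higher than without it. The third group compares the detection performance of the proposed network with that of other typical networks. The results show that the mIoU and PA of the proposed network are higher than those of FCN (fully convolutional network), PSPNet (pyramid scene parsing network), ICNet (image cascade network), PSANet (point-wise spatial attention network), and DenseASPP (dense atrous spatial pyramid pooling) by 7.67% and 2.94%, 1.54% and 0.42%, 6.51% and 3.34%, 7.76% and 2.13%, and 7.70% and -1.59%, respectively. The experimental results show that the mIoU and PA of the proposed network are better than those of typical deep neural networks. Conclusion This paper uses the ResNet-101 network with dilated convolution combined with a dual attention mechanism, which adapts better to crack objects with complex backgrounds and heavy interference while maintaining feature map resolution and enlarging the receptive field.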
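The difference between the "series" and "parallel" ways of combining channel and spatial attention can be illustrated with a minimal NumPy sketch. It is not the authors' Res-A module: the parameter-free gates (average pooling followed by a sigmoid) and all function names are assumptions chosen to keep the example short, whereas a trained module would use learned layers.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat: np.ndarray) -> np.ndarray:
    """One weight per channel from global average pooling, shape (C, 1, 1)."""
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def spatial_gate(feat: np.ndarray) -> np.ndarray:
    """One weight per position from the channel-wise mean, shape (1, H, W)."""
    return sigmoid(feat.mean(axis=0, keepdims=True))


def attention_serial(feat: np.ndarray) -> np.ndarray:
    """Series: channel attention first, then spatial attention on its output."""
    out = feat * channel_gate(feat)
    return out * spatial_gate(out)


def attention_parallel_add(feat: np.ndarray) -> np.ndarray:
    """Parallel: both gates read the same input and the results are added."""
    return feat * channel_gate(feat) + feat * spatial_gate(feat)


# Hypothetical feature map with 8 channels and a 16 x 16 spatial grid.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 16, 16))
print(attention_serial(feature_map).shape)
print(attention_parallel_add(feature_map).shape)
```

Both variants preserve the shape of the feature map, so either could in principle be appended to a residual block. They differ only in whether the second gate sees the original features or the already reweighted ones.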
Keywords
Dual attention mechanism based pavement crack detection

Zhang Zhihua1,2,3, Wen Yanan1,2,3, Mu Haowei1, Du Xiaoping4(1.Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China;2.National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, China;3.Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, China;4.Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)

Abstract
Objective Guided by the Highway Network Planning 2013-2030, China's highway mileage exceeded 150 000 kilometers by 2020. Evaluating road conditions has therefore become a critical issue for the further development of China's highway network. Road crack detection, which identifies and locates crack objects, is one of the key techniques for traffic safety. However, deep learning based crack detection is challenged by the need to separate cracked pixels from non-cracked pixels within a single image. The attention mechanism is a deep learning module that strengthens the weight relationships of crack objects during training and thereby improves the performance of deep learning based crack detection. Typical deep neural networks achieve low accuracy on crack images with complex backgrounds and heavy interference, and this needs to be improved. Using the Crack500 road crack dataset, we design a deep learning road crack detection network based on a dual attention mechanism. Method To address these issues, a road crack detection network that integrates a dual attention mechanism is designed. A ResNet-101 network with dilated convolution serves as the basic feature extraction network of the model. The ResNet network has the following features: 1) the number of parameters can be controlled; 2) the network hierarchy is clear, and the feature maps of its multiple layers retain their capacity to express output features; 3) the network uses fewer pooling layers and redundant downsampling layers to improve propagation efficiency; 4) the network uses batch normalization (BN) and a global average pooling layer instead of dropout layers for regularization, which speeds up training; 5) when the network is deep, the number of 3×3 convolution layers is reduced and 1×1 convolution layers are used to control the number of input and output feature maps.
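The role of dilated convolution in the backbone can be checked with the standard output-size arithmetic for convolutions. The sketch below is an illustration only: the 64-pixel input and the dilation rate of 2 are assumed values, not settings reported for this network.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one axis."""
    k_eff = effective_kernel(kernel, dilation)
    return (size + 2 * padding - k_eff) // stride + 1


# A 3x3 convolution with stride 2 halves a 64-pixel axis.
strided = conv_output_size(64, kernel=3, stride=2, padding=1)

# A 3x3 convolution with dilation 2 and padding 2 keeps all 64 pixels,
# while each output value covers a 5-pixel span instead of 3.
dilated = conv_output_size(64, kernel=3, stride=1, padding=2, dilation=2)

print(strided, dilated, effective_kernel(3, 2))  # 32 64 5
```

Replacing a strided layer with a dilated one therefore trades downsampling for a wider kernel span, which is how the feature map resolution can be kept while the receptive field grows.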
The ResNet-101 network contains 4 residual groups with 3, 4, 23, and 3 residual blocks, respectively. A lightweight attention mechanism is attached to the end of each residual module to form a residual attention module. The lightweight attention mechanism is composed of a spatial attention mechanism and a channel attention mechanism. The 4 residual groups of ResNet-101 thus use 3, 4, 23, and 3 residual attention modules, respectively. The module strengthens the weight relationships of crack objects and supports repeated extraction of higher-layer crack features. Given an intermediate feature map, the weight relationships are inferred sequentially along the two dimensions of space and channel and then multiplied with the original feature map to refine the features adaptively. The module can be seamlessly integrated into any convolutional neural network (CNN) architecture and trained end-to-end together with the CNN. A non-local attention mechanism is introduced at the end of the ResNet-101 network to obtain the relationship weights of the highest-layer crack features and produce the crack detection result. Like the lightweight mechanism, the non-local attention mechanism comprises a spatial attention mechanism and a channel attention mechanism. The spatial attention mechanism updates the feature at each position by a weighted aggregation of the features at all positions of the image, where each weight is determined by the similarity of the features at the two positions. The channel attention mechanism applies a similar self-attention mechanism to learn the relationship between any two channel maps and updates each channel by a weighted aggregation of all channels. The code is implemented in the PyTorch deep learning framework. Stochastic gradient descent (SGD) is used for optimization with an initial learning rate of 0.000 1.
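The non-local spatial and channel updates described above can be written compactly as matrix products. The following NumPy sketch is a simplified illustration rather than the authors' PyTorch code: it assumes dot-product similarity normalized by a softmax and omits the learned projection layers that a trained module would normally include.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat: np.ndarray) -> np.ndarray:
    """Update each position with a similarity-weighted sum over all positions."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)               # (C, N) with N = H * W
    weights = softmax(flat.T @ flat, axis=-1)   # (N, N), each row sums to 1
    aggregated = flat @ weights.T               # (C, N)
    return aggregated.reshape(c, h, w)


def channel_self_attention(feat: np.ndarray) -> np.ndarray:
    """Update each channel with a similarity-weighted sum over all channels."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)               # (C, N)
    weights = softmax(flat @ flat.T, axis=-1)   # (C, C), each row sums to 1
    return (weights @ flat).reshape(c, h, w)


# Hypothetical top-layer feature map: 4 channels on a 3 x 3 grid.
rng = np.random.default_rng(0)
top_features = rng.normal(size=(4, 3, 3))
print(spatial_self_attention(top_features).shape)
print(channel_self_attention(top_features).shape)
```

The spatial branch builds an N × N weight matrix over positions and the channel branch a C × C matrix over channels, so the spatial branch dominates the cost on large feature maps.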
The mean intersection over union (mIoU), pixel accuracy (PA), and iteration time are used as the evaluation metrics of the deep learning models. Result The effectiveness of the network is verified through 4 groups of comparative experiments. The first group evaluates the detection performance of different ways of combining the channel attention mechanism and the spatial attention mechanism in the residual attention module. The best combination integrates the two mechanisms in parallel. Compared with the other two combinations, the mIoU increases by 2.11% and 11.29%, and the PA increases by 2.08% and 0.23%, respectively. The second group evaluates the effectiveness of the residual attention module: adding the module increases the mIoU and PA by 2.34% and 3.01% compared with the network without it. The third group contrasts common convolution with dilated convolution: using dilated convolution increases the mIoU and PA by 6.65% and 4.18%. The final group compares the detection performance of our network with that of other deep neural networks. Compared with the fully convolutional network (FCN), pyramid scene parsing network (PSPNet), image cascade network (ICNet), point-wise spatial attention network (PSANet), and dense atrous spatial pyramid pooling (DenseASPP), the mIoU of our network is higher by 7.67%, 1.54%, 6.51%, 7.76%, and 7.70%, respectively, and the PA changes by 2.94%, 0.42%, 3.34%, 2.13%, and -1.59%, respectively. Conclusion Our network combines the ResNet-101 network with dilated convolution and a dual attention mechanism. While maintaining the resolution of the feature map and enlarging the receptive field, the network adapts better to crack objects with complex backgrounds and heavy interference. The results show that the mIoU and PA of our network improve on those of current deep neural networks.
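For readers unfamiliar with the two accuracy metrics, the sketch below computes PA and mIoU from a confusion matrix using their common definitions. The toy masks are made-up values, and averaging the IoU over both the background and crack classes is an assumption, since the abstract does not state which classes enter the mean.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of per-class intersection over union."""
    n = len(matrix)
    ious = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(matrix[r][i] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Flattened toy masks: 0 = background, 1 = crack.
truth = [0, 0, 0, 0, 1, 1, 1, 0]
pred = [0, 0, 0, 1, 1, 1, 0, 0]
cm = confusion_matrix(truth, pred, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

In this toy case PA is 0.75 while mIoU is about 0.58. PA is dominated by the many background pixels, which is one reason a network can gain in mIoU yet lose in PA, as in the DenseASPP comparison.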
Keywords
