Multi-scale local feature enhanced transformer network for pavement crack detection

Xu Zhengsen; Lei Xiangda; Guan Haiyan

doi:10.11834/jig.211129

Image Analysis and Recognition


Vol. 28, Issue 4, Pages: 1019-1028(2023)
Published: 16 April 2023

Xu Zhengsen, Lei Xiangda, Guan Haiyan. 2023. Multi-scale local feature enhanced transformer network for pavement crack detection. Journal of Image and Graphics, 28(04): 1019-1028. DOI: 10.11834/jig.211129.

Abstract (translated from Chinese)

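Objective

Pavement cracks are an early sign of pavement distress. Regularly monitoring pavement condition and detecting cracks promptly and accurately helps road maintenance agencies reduce costs, ensures the reliability and durability of the pavement structure, and improves driving safety and comfort. Current deep learning models based on convolutional neural networks are weak at modeling long-range dependencies, and their accuracy falls short of what crack detection in real pavement environments requires. Some models introduce spatial or channel attention mechanisms to model long-range dependencies, but this increases the amount and complexity of computation and rules out real-time detection. In view of this, this paper proposes CTNet (crack transformer network), a deep neural network for pavement crack detection based on a Transformer encoder-decoder structure.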

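Method

The model consists of four main parts: Transformer attention blocks, multi-scale local feature enhancement blocks, upsampling blocks, and skip connections. The Transformer attention mechanism extracts global and long-range dependencies more effectively, overcoming the tendency of conventional convolutional neural networks to capture only short-range correlations in the input. In addition, to cope with the wide variation in crack size, the Transformer is combined with the multi-scale local feature enhancement block, which integrates local information at different scales and compensates for the Transformer's weakness in modeling local features.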
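To make the idea of a global receptive field concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration only, not the CTNet implementation: the identity query/key/value projections, the two-dimensional toy tokens, and the function names `softmax` and `self_attention` are assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: queries, keys, and values are the tokens
    themselves (identity projections); a real Transformer learns them.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights, outputs = [], []
    for q in tokens:
        # Every token is scored against every other token, near or far.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wi * v[c] for wi, v in zip(w, tokens)) for c in range(dim)]
        )
    return outputs, weights


# Four toy tokens (think of flattened image patches).
# Tokens 0 and 3 are far apart in position but have the same content.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print([round(w, 3) for w in weights[0]])
```

In this toy input, token 0 attends to the distant token 3 exactly as strongly as to itself and more strongly than to its immediate neighbor, because attention weights depend on content similarity rather than on distance. That is the long-range behavior that a fixed-size convolution kernel cannot provide in a single layer.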

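Results

Comparisons with the DeepCrack model on different crack detection datasets show that the proposed multi-scale local feature enhanced Transformer network segments pavement cracks quickly and accurately, and is more efficient. Quantitatively, on the more challenging CrackLS315 dataset, CTNet reaches a precision, recall, and F1 score of 91.38%, 80.38%, and 85.53%, respectively, clearly better than the compared methods. On the CrackWH100 dataset, precision, recall, and F1 score rise further to 92.70%, 90.52%, and 91.60%, respectively. In addition, CTNet trains 6.78 times as fast as the DeepCrack model.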

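Conclusion

CTNet can detect pavement cracks against strongly noisy backgrounds, outperforms current state-of-the-art methods, and has a small number of parameters, making it easy to train and deploy.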

Abstract

Objective

Pavement inspection aims to detect cracks early and thereby preserve the pavement structure. However, conventional image processing-based techniques, such as edge detection, threshold segmentation, template matching, and morphological operations, are labor-intensive and time-consuming. They are also challenged by the geometric and spectral complexity of pavement cracks and their surroundings (e.g., illumination variation, oil or water stains, and shadows cast by trees and vehicles). Deep learning techniques based on convolutional neural networks (CNNs) have been developed intensively. However, CNN-based methods are less effective at modeling long-range dependencies, which can lead to poor detection results in complicated road surface scenarios. Some works address this with attention mechanisms such as spatial or channel attention modules and self-attention modules, but these attention-based operations remain complex and computationally costly.

Method

To detect pavement cracks efficiently and effectively, we develop a novel Transformer-based encoder-decoder neural network, called CTNet, which consists of Transformer blocks, multi-scale local feature enhancement blocks, upsampling blocks, and skip connections. Through its multi-head self-attention-based Transformer mechanism, CTNet captures long-range dependencies and obtains a global receptive field. Although the Transformer runs efficiently with low computational overhead, it models local contextual information poorly because token generation can break the connections between neighboring regions. Thus, to capture multi-scale local information, we design a multi-scale local feature enhancement block built from dilated convolutions with multiple dilation rates. This block is integrated into each Transformer block to supply the missing local information, so that both local and global low-level contextual features are captured. A novel decoder path is then used to extract high-level features. The decoder combines Transformer blocks with upsampling blocks so that spatial details can be restored for end-to-end segmentation.
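The multi-scale local enhancement idea can be illustrated with a small sketch in plain Python: the same kernel is applied at several dilation rates and the responses are fused. This is a one-dimensional toy under stated assumptions, not the block used in CTNet: the dilation rates (1, 2, 4), the fixed box kernel, the fusion by summation with a residual connection, and the function names `dilated_conv1d` and `multi_scale_local_block` are all hypothetical choices for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding.

    Assumes an odd-length kernel so that it is centered on each position.
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - half + k * dilation  # input index sampled by kernel tap k
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_scale_local_block(signal, kernel, dilations=(1, 2, 4)):
    """Sum the responses at several dilation rates and add the input back.

    The summation and the residual connection are assumed fusion rules.
    """
    out = list(signal)  # residual path
    for d in dilations:
        branch = dilated_conv1d(signal, kernel, d)
        out = [o + b for o, b in zip(out, branch)]
    return out


# A single impulse in the middle of an 11-sample signal.
impulse = [0.0] * 11
impulse[5] = 1.0
box = [1.0, 1.0, 1.0]
print(multi_scale_local_block(impulse, box))
```

With a three-tap kernel, the impulse at index 5 spreads to offsets of 1, 2, and 4 samples on either side, one pair per dilation rate. The block therefore gathers context at several neighborhood sizes with the same number of kernel weights, which is the property that makes dilated convolutions a natural fit for cracks of varying width.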

Result

To demonstrate the efficiency and effectiveness of the proposed CTNet, a series of comparative analyses and ablation studies are carried out on three datasets. First, CTNet improves running efficiency while keeping computational overhead and complexity comparable to those of UNet, SegNet, DeepCrack, and Swin-Unet. Second, CTNet trains 6.78 times faster than the second-best model, DeepCrack. On the CrackLS315 dataset, CTNet obtains a precision of 91.38%, a recall of 80.38%, and an F1 measure of 85.53%; on the CrackWH100 dataset, it obtains a precision of 92.70%, a recall of 90.52%, and an F1 measure of 91.60%. The pure-Transformer-based Swin-Unet, however, performs worse than the fully convolutional networks because it lacks local information. Furthermore, CTNet does not converge adequately when the local enhancement blocks are removed. In summary, the global receptive field of the Transformer-based CTNet benefits pavement crack detection across multiple scenarios, and CTNet produces consistent detection results.
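The reported scores can be cross-checked against each other, since the F1 measure is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short sketch below recomputes it from the precision and recall values given in the abstract; the function name `f1_score` is introduced here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract, in percent.
reported = {
    "CrackLS315": (91.38, 80.38),
    "CrackWH100": (92.70, 90.52),
}

for name, (p, r) in reported.items():
    f1 = 100 * f1_score(p / 100, r / 100)
    print(f"{name}: F1 = {f1:.2f}%")
```

Rounded to two decimals, the recomputed values are 85.53% for CrackLS315 and 91.60% for CrackWH100, which agree with the F1 figures stated in the abstract.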

Conclusion

The proposed CTNet shows potential for detecting pavement cracks in noisy pavement images.

Keywords

road engineering; pavement crack detection; deep learning; semantic segmentation; self-attention; Transformer

References

Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]

Cao H, Wang Y Y, Chen J, Jiang D S, Zhang X P, Tian Q and Wang M N. 2021. Swin-Unet: Unet-like pure transformer for medical image segmentation [EB/OL]. [2021-05-21]. https://arxiv.org/pdf/2105.05537.pdf

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net: 1-21

Du Y C, Pan N, Xu Z H, Deng F W, Shen Y and Kang H. 2021. Pavement distress detection and classification based on YOLO network. International Journal of Pavement Engineering, 22(13): 1659-1672 [DOI: 10.1080/10298436.2020.1714047]

Fang F, Li L Y, Gu Y, Zhu H Y and Lim J H. 2020. A novel hybrid approach for crack detection. Pattern Recognition, 107: #107474 [DOI: 10.1016/j.patcog.2020.107474]

Fei Y, Wang K C P, Zhang A, Chen C, Li J Q, Liu Y, Yang G W and Li B X. 2020. Pixel-level cracking detection on 3D asphalt pavement images through deep-learning-based CrackNet-V. IEEE Transactions on Intelligent Transportation Systems, 21(1): 273-284 [DOI: 10.1109/TITS.2019.2891167]

Guo J Y, Han K, Wu H, Tang Y H, Chen X H, Wang Y H and Xu C. 2021. CMT: convolutional neural networks meet vision transformers [EB/OL]. [2021-07-15]. https://arxiv.org/pdf/2107.06263.pdf

Ibragimov E, Lee H J, Lee J J and Kim N. 2022. Automated pavement distress detection using region based convolutional neural networks. International Journal of Pavement Engineering, 23(6): 1981-1992 [DOI: 10.1080/10298436.2020.1833204]

Li H F, Song D Z, Liu Y and Li B B. 2019. Automatic pavement crack detection by multi-scale image fusion. IEEE Transactions on Intelligent Transportation Systems, 20(6): 2025-2036 [DOI: 10.1109/TITS.2018.2856928]

Li H T, Xu H Y, Tian X D, Wang Y, Cai H Y, Cui K R and Chen X D. 2020. Bridge crack detection based on SSENets. Applied Sciences, 10(12): #4230 [DOI: 10.3390/app10124230]

Li Y W, Zhang K, Cao J Z, Timofte R and Van Gool L. 2021. LocalViT: bringing locality to vision transformers [EB/OL]. [2021-04-12]. https://arxiv.org/pdf/2104.05707.pdf

Liu Z, Wu W X, Gu X Y, Li S W, Wang L T and Zhang T J. 2021. Application of combining YOLO models and 3D GPR images in road detection and maintenance. Remote Sensing, 13(6): #1081 [DOI: 10.3390/rs13061081]

Mei Q P and Gül M. 2020. A cost effective solution for pavement crack inspection using cameras and deep neural networks. Construction and Building Materials, 256: #119397 [DOI: 10.1016/j.conbuildmat.2020.119397]

Pan Y, Zhang G W and Zhang L M. 2020. A spatial-channel hierarchical deep learning network for pixel-level automated crack detection. Automation in Construction, 119: #103357 [DOI: 10.1016/j.autcon.2020.103357]

Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]

Sun D W, Yao A B, Zhou A J and Zha H. 2019. Deeply-supervised knowledge synergy//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6997-7006 [DOI: 10.1109/CVPR.2019.00716]

Tang Y Z, Zhang A A, Luo L, Wang G L and Yang E H. 2021. Pixel-level pavement crack segmentation with encoder-decoder network. Measurement, 184: #109914 [DOI: 10.1016/j.measurement.2021.109914]

Tran T S, Tran V P, Lee H J, Flores J M and Le V P. 2022. A two-step sequential automated crack detection and severity classification process for asphalt pavements. International Journal of Pavement Engineering, 23(6): 2019-2033 [DOI: 10.1080/10298436.2020.1836561]

Wang W X, Wang M F, Li H X, Zhao H, Wang K, He C T, Wang J, Zheng S F and Chen J B. 2019. Pavement crack image acquisition methods and crack extraction algorithms: a review. Journal of Traffic and Transportation Engineering (English Edition), 6(6): 535-556 [DOI: 10.1016/j.jtte.2019.10.001]

Wu Z Y, Kalfarisi R, Kouyoumdjian F and Taelman C. 2020. Applying deep convolutional neural network with 3D reality mesh model for water tank crack detection and evaluation. Urban Water Journal, 17(8): 682-695 [DOI: 10.1080/1573062X.2020.1758166]

Yu Y T, Guan H Y, Li D L, Zhang Y J, Jin S H and Yu C H. 2022. CCapFPN: a context-augmented capsule feature pyramid network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems, 23(4): 3324-3335 [DOI: 10.1109/TITS.2020.3035663]

Zhang A, Wang K C P, Fei Y, Liu Y, Chen C, Yang G W, Li J Q, Yang E H and Qiu S. 2019. Automated pixel-level pavement crack detection on 3D asphalt surfaces with a recurrent neural network. Computer-Aided Civil and Infrastructure Engineering, 34(3): 213-229 [DOI: 10.1111/mice.12409]

Zhang A, Wang K C P, Fei Y, Liu Y, Tao S Y, Chen C, Li J Q and Li B X. 2018. Deep learning-based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet. Journal of Computing in Civil Engineering, 32(5): #04018041 [DOI: 10.1061/(ASCE)CP.1943-5487.0000775]

Zhang A, Wang K C P, Li B X, Yang E H, Dai X X, Peng Y, Fei Y, Liu Y, Li J Q and Chen C. 2017. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10): 805-819 [DOI: 10.1111/mice.12297]

Zhang K G, Zhang Y T and Cheng H D. 2021. CrackGAN: pavement crack detection using partially accurate ground truths based on generative adversarial learning. IEEE Transactions on Intelligent Transportation Systems, 22(2): 1306-1319 [DOI: 10.1109/TITS.2020.2990703]

Zhang Q L and Yang Y B. 2021. ResT: an efficient transformer for visual recognition//Proceedings of the 35th Conference on Neural Information Processing Systems. [s.l.]: MIT Press: 15475-15485

Zheng S X, Lu J C, Zhao H S, Zhu X T, Luo Z K, Wang Y B, Fu Y W, Feng J F, Xiang T, Torr P H S and Zhang L. 2021. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE: 6881-6890 [DOI: 10.1109/CVPR46437.2021.00681]

Zou Q, Zhang Z, Li Q Q, Qi X B, Wang Q and Wang S. 2019. DeepCrack: learning hierarchical convolutional features for crack detection. IEEE Transactions on Image Processing, 28(3): 1498-1512 [DOI: 10.1109/TIP.2018.2878966]
