面向全景智齿检测的内卷解耦轻量化网络
A lightweight network-involute and decoupled for panoramic wisdom tooth detection
2023, Vol. 28, No. 8, pages 2491-2504
Print publication date: 2023-08-16
DOI: 10.11834/jig.220377
Zeng Yifeng, Yao Xiao, Hua Fei, Wang Peipei, Gu Min. 2023. A lightweight network-involute and decoupled for panoramic wisdom tooth detection. Journal of Image and Graphics, 28(08):2491-2504
Objective
A full-mouth curved tomogram (panoramic radiograph) requires correct patient positioning and proper instrument configuration to yield a qualified image: taking the facial midline as the axis, the maxilla, mandible and related structures on both sides are bilaterally symmetric; the line connecting the occlusal surfaces of the teeth forms a gentle "smile curve", and the physiological position of each tooth in the panoramic image is essentially fixed. Dental images represented by panoramic radiographs therefore possess a fixed foreground-background relationship and a stable spatial structure, but networks based on conventional convolution are insensitive to this spatial-domain structural information because of the spatial-agnostic nature of convolution. Although certain attention modules can guide a model to focus on and weight specific information, the information they attend to often deviates from expectations and can even degrade model performance; moreover, attention modules embedded as plug-ins typically increase computation and parameter counts. Targeting the structural characteristics of dental images, we propose a YOLO (you only look once) model based on involution and decoupling for panoramic wisdom tooth detection.
Method
In the backbone, the cross stage partial (CSP) structure is reshaped and a spatially specific involution operator is introduced, so that the model attends first to the most informative visual elements in the spatial domain, strengthening its ability to model spatial information. In the detection head, a multi-branch decoupled structure is adopted to overcome the negative effects of task coupling and to resolve the compatibility problem between the involution operator and the YOLO model, and the loss function of each branch is optimized accordingly.
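To illustrate the spatial specificity the backbone relies on, the following is a minimal NumPy sketch of the involution operator (Li et al., 2021) that the reshaped CSP structure builds on. This is not the paper's invoCSP code: the linear map `w_gen` stands in for the learned kernel-generating function, and all shapes are illustrative assumptions.

```python
import numpy as np

def involution(x, w_gen, kernel_size=3, groups=1):
    """Minimal involution sketch on a (C, H, W) feature map.

    Unlike convolution, the K*K kernel is generated from the feature
    vector at each spatial position (here by the linear map `w_gen`)
    and shared across the channels of each group, so the operator is
    spatially specific but channel-agnostic.
    """
    c, h, w = x.shape
    k, pad = kernel_size, kernel_size // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            # Kernel for THIS pixel: (groups, K, K), from its own feature vector.
            kern = (w_gen @ x[:, i, j]).reshape(groups, k, k)
            patch = xp[:, i:i + k, j:j + k]          # (C, K, K) neighborhood
            patch = patch.reshape(groups, c // groups, k, k)
            out[:, i, j] = (patch * kern[:, None]).sum(axis=(2, 3)).reshape(c)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w_gen = rng.standard_normal((2 * 3 * 3, 4))  # maps C=4 features -> G*K*K weights
y = involution(x, w_gen, kernel_size=3, groups=2)
print(y.shape)  # (4, 8, 8): spatial size is preserved
```

Because the kernel weights vary with position, the operator can allocate different weights to different areas of the feature map, which is the property the backbone exploits.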
Result
Experiments on wisdom tooth detection over the panoramic radiograph dataset show that the proposed method substantially outperforms recent single-stage object detectors in both detection performance and model size: compared with the baseline model, the number of parameters is reduced by 42.5% and the average precision is improved by 6.3 percentage points, which fully validates the rationality of the proposed structure and its effectiveness for wisdom tooth detection.
Conclusion
The involution-decoupling-based panoramic wisdom tooth detection scheme, designed around the spatial structural properties of dental images, offers stronger spatial information modeling capability at a lower parameter cost.
Objective
The human third molar often develops and erupts as an impacted tooth. Stomatologists routinely assess the impaction level and angulation of the mandibular third molar, together with its current status and potential complications, from panoramic radiographs. However, the panorama is a two-dimensional projection, and artifacts, image overlap and deformation make interpretation error-prone, so manual reading of such medical images remains challenging. To support artificial-intelligence-aided diagnosis, we apply deep-learning-based object detection to panoramic radiograph data. Unlike common object images, which feature complex backgrounds and obvious texture differences between categories, a panoramic image consists of closely and regularly arranged teeth with consistent texture, a fixed foreground-background relationship and a stable spatial structure, which challenges the perception of convolutional neural networks. A stomatologist judges whether a wisdom tooth is abnormal from the spatial positions of the teeth and their mutual relationships, and this discrimination process can be modeled by a spatial attention mechanism relevant to the visual task. Specifically, attention helps suppress redundant channels or pixels to some extent; it can be embedded into the backbone as a plug-in module, or attached on top of the backbone to extract high-level semantic relations while preserving the low-level convolutional features.
Method
We analyze the properties of convolution in neural networks and adopt the involution operator, which pays specific attention to spatial element information, integrating it into the you only look once (YOLO) detection model to improve performance and reduce parameters while preserving the strengths of YOLO itself. On this basis, a YOLO-based panoramic wisdom tooth detection scheme is proposed. The main contributions are as follows: 1) an improved cross stage partial (CSP) structure, invoCSP, is proposed to integrate the CSP structure with the involution operator, and it is introduced into the YOLO backbone by stacking. The involution operator summarizes contextual information over a wider spatial range and adaptively allocates weights to different areas of the feature map, so the spatial modeling ability of the network is improved and the spatial structure information of the dataset is fully extracted. 2) We analyze the task-coupling defect of the YOLO model, examine the properties of the involution operator, and construct a three-branch decoupled structure in the detection head that fully decouples the sub-tasks of object detection. This improves the compatibility between the involution operator and the YOLO model and alleviates the non-convergence problem during training. 3) The three-branch detection head avoids shared weight parameters, so each branch can be optimized independently with a modified loss function: focal loss is introduced for the confidence loss, an improved intersection over union (IoU) loss is applied to bounding box regression, and an advanced classification loss is used for the classification branch.
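The three-branch decoupling in contribution 2) can be sketched as follows. This is an assumed minimal NumPy illustration, not the paper's implementation: each task gets its own 1 × 1 projection (here a per-pixel linear map) instead of sharing one coupled head, and the class count is a hypothetical placeholder.

```python
import numpy as np

def decoupled_head(feat, w_cls, w_reg, w_obj):
    """Sketch of a three-branch decoupled YOLO-style detection head.

    A coupled head predicts class scores, box offsets and objectness
    through one shared projection; decoupling gives each task its own
    weights, so the branches can be optimized independently (e.g. an
    IoU loss for regression, focal loss for objectness).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)              # a 1x1 conv is a per-pixel linear map
    cls = (w_cls @ flat).reshape(-1, h, w)     # (num_classes, H, W) class logits
    reg = (w_reg @ flat).reshape(4, h, w)      # (4, H, W) box offsets
    obj = (w_obj @ flat).reshape(1, h, w)      # (1, H, W) objectness logit
    return cls, reg, obj

rng = np.random.default_rng(1)
feat = rng.standard_normal((16, 8, 8))         # toy feature map from the backbone
cls, reg, obj = decoupled_head(
    feat,
    rng.standard_normal((4, 16)),              # 4 hypothetical wisdom-tooth classes
    rng.standard_normal((4, 16)),
    rng.standard_normal((1, 16)),
)
print(cls.shape, reg.shape, obj.shape)
```

Because the three weight matrices are separate, gradients from the classification, regression and confidence losses no longer interfere through shared parameters, which is the point of the decoupled design.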
Result
A new panoramic radiograph dataset was constructed, with the mandibular wisdom teeth classified and labeled according to the Winter classification, which is commonly used in the clinical diagnosis and treatment of wisdom teeth. The images were histogram-equalized and then randomly shuffled. Under unified diagnostic criteria and labeling rules, three stomatologists labeled the mandibular wisdom teeth independently and repeatedly, yielding 973 consistently labeled images. Experiments on this dataset demonstrate that the proposed model outperforms single-stage object detectors in both detection performance and model size: compared with the baseline YOLOX-tiny, the number of parameters is 42.5% lower and the mAP_50 index is 6.3 percentage points higher. In addition, a comparative analysis against nine popular single-stage detectors shows that the proposed model performs best at a comparable parameter count. It not only identifies wisdom tooth types accurately, but also regresses prediction boxes stably, with high IoU and close agreement with the ground-truth labels. Under the constraint of far fewer parameters, its detection performance is comparable to that of large models, and it even achieves the highest mAP_50 index.
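For reference, the IoU that underlies both the box-regression loss and the mAP_50 metric reported above can be computed as follows; this is the generic textbook definition, not code from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as correct for mAP_50 only when IoU with the label >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 ≈ 0.333
```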
Conclusion
To address panoramic-radiograph-based wisdom tooth detection, we analyzed the properties of convolution in neural networks and adopted the involution operator, which attends specifically to spatial element information without additional network structure, and integrated it into the YOLO detection model, improving performance and reducing parameters while preserving the advantages of YOLO. On this basis, a panoramic wisdom tooth detection network based on involution and decoupling is constructed. Qualitative and quantitative comparisons verify that the proposed model detects the target objects effectively and that its design is rational. The experiments further demonstrate that the decoupled structure fits the design of involution: by sorting out the relationship between involution and task coupling, the multi-branch decoupled structure further improves the adaptability of the involution operator to the YOLO model. The model greatly reduces parameters while maintaining high performance, which suits real-time detection settings. The proposed method is thus expected to enable lightweight, application-level deployment for preliminary screening and objective reference in stomatology.
Keywords: panoramic radiograph; wisdom tooth; target detection; you only look once (YOLO); decoupling; involution
Bilgir E, Bayrakdar I Ş, Çelik Ö, Orhan K, Akkoca F, Sağlam H, Odabaş A, Aslan A F, Ozecetin C, Kıllı M and Rozylo-Kalinowska I. 2021. An artificial intelligence approach to automatic tooth detection and numbering in panoramic radiographs. BMC Med Imaging, 21(1): #124 [DOI: 10.1186/s12880-021-00656-7]
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/2004.10934.pdf
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13]
Chen Q, Wang Y M, Yang T, Zhang X Y, Cheng J and Sun J. 2021. You only look one-level feature//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13034-13043 [DOI: 10.1109/CVPR46437.2021.01284]
Chen Y P, Kalantidis Y, Li J S, Yan S C and Feng J S. 2018. A2-nets: double attention networks [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/1810.11579.pdf
Fu Y, Wu J Q, Shao X, Yang X, Ji A P and Zheng J J. 2020. Analysis of flow rate and disease types of patients in department of oral emergency at night. Journal of Modern Stomatology, 34(4): 235-238
Ge Z, Liu S T, Wang F, Li Z M and Sun J. 2021. YOLOX: exceeding YOLO series in 2021 [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/2107.08430.pdf
Glenn J. 2021. YOLOv5. https://github.com/ultralytics/yolov5
He J B, Erfani S, Ma X J, Bailey J, Chi Y and Hua X S. 2022. Alpha-IoU: a family of power intersection over union losses for bounding box regression [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/2110.13675.pdf
Idris A M, Al-Mashraqi A A, Abidi N H, Vani N V, Elamin E I, Khubrani Y H, Sh Alhazmi A, Alamir A H, Fageeh H N, Meshni A A, Mashyakhy M H, Makrami A M, Gareeb A and Jafer M. 2021. Third molar impaction in the Jazan region: evaluation of the prevalence and clinical presentation. The Saudi Dental Journal, 33(4): 194-200 [DOI: 10.1016/j.sdentj.2020.02.004]
Kim S W, Kook H K, Sun J Y, Kang M C and Ko S J. 2018. Parallel feature pyramid network for object detection//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 239-256 [DOI: 10.1007/978-3-030-01228-1_15]
Leng Z Q, Tan M X, Liu C X, Cubuk E D, Shi X J, Cheng S Y and Anguelov D. 2022. PolyLoss: a polynomial expansion perspective of classification loss functions [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/2204.12511.pdf
Li D, Hu J, Wang C H, Li X T, She Q, Zhu L, Zhang T and Chen Q F. 2021. Involution: inverting the inherence of convolution for visual recognition//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 12316-12325 [DOI: 10.1109/CVPR46437.2021.01214]
Li Z B. 2016. The Effect of Non-Impacted Third Molars on the Condition of Their Adjacent Teeth. Xi’an: The Fourth Military Medical University
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu S, Qi L, Qin H F, Shi J P and Jia J Y. 2018. Path aggregation network for instance segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8759-8768 [DOI: 10.1109/CVPR.2018.00913]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Mima Y, Nakayama R, Hizukuri A and Murata K. 2022. Tooth detection for each tooth type by application of faster R-CNNs to divided analysis areas of dental panoramic X-ray images. Radiological Physics and Technology, 15(2): 170-176 [DOI: 10.1007/s12194-022-00659-1]
Muresan M P, Barbura A R and Nedevschi S. 2020. Teeth detection and dental problem classification in panoramic X-ray images using deep learning and image processing techniques//Proceedings of the 16th IEEE International Conference on Intelligent Computer Communication and Processing. Cluj-Napoca, Romania: IEEE: 457-463 [DOI: 10.1109/ICCP51029.2020.9266244]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6517-6525 [DOI: 10.1109/CVPR.2017.690]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2022-04-25]. http://arxiv.org/pdf/1804.02767.pdf
Song G L, Liu Y and Wang X G. 2020. Revisiting the sibling head in object detector//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11560-11569 [DOI: 10.1109/CVPR42600.2020.01158]
Sun C, Myers A, Vondrick C, Murphy K and Schmid C. 2019. VideoBERT: a joint model for video and language representation learning//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7463-7472 [DOI: 10.1109/ICCV.2019.00756]
Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9626-9635 [DOI: 10.1109/ICCV.2019.00972]
Ventä I, Vehkalahti M M, Huumonen S and Suominen A L. 2020. Prevalence of third molars determined by panoramic radiographs in a population-based survey of adult Finns. Community Dentistry and Oral Epidemiology, 48(3): 208-214 [DOI: 10.1111/cdoe.12517]
Wang C Y, Liao H Y M, Wu Y H, Chen P Y, Hsieh J W and Yeh I H. 2020. CSPNet: a new backbone that can enhance learning capability of CNN//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Seattle, USA: IEEE: 1571-1580 [DOI: 10.1109/CVPRW50498.2020.00203]
Wang Y F, Yang L, Wang D Y and Wu B. 2022. YOLR: an automatic tooth segmentation and detection network//Proceedings Volume 12179, the 2nd International Conference on Medical Imaging and Additive Manufacturing. Xiamen, China: SPIE: 264-272 [DOI: 10.1117/12.2636673]
Winter G B. 1926. Principles of exodontia as applied to the impacted third molar. St Louis: American Medical books
Wu Y, Chen Y P, Yuan L, Liu Z C, Wang L J, Li H Z and Fu Y. 2020. Rethinking classification and localization for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10183-10192 [DOI: 10.1109/CVPR42600.2020.01020]
Yang L and Shu X B. 2019. Global and local based convolutional neural networks for impacted tooth classification. Journal of Computer Applications, 39(S1): 250-253
Yue K Y, Sun M, Yuan Y C, Zhou F, Ding E R and Xu F X. 2018. Compact generalized non-local network//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 6511-6520
Yüksel A E, Gültekin S, Simsar E, Özdemir Ş D, Gündoğar M, Tokgöz S B and Hamamcı İ E. 2021. Dental enumeration and multiple treatment detection on panoramic X-rays using deep learning. Scientific Reports, 11(1): #12342 [DOI: 10.1038/s41598-021-90386-1]
Zhang L, Li S J, Shi X and Kong L X. 2018. Factors related to clinical image quality of panoramic radiography. Journal of Oral and Maxillofacial Surgery, 28(4): 225-228 [DOI: 10.3969/j.issn.1005-4979.2018.04.009]