A dual-encoder feature attention network for surgical instrument segmentation
2023, Vol. 28, No. 10, pp. 3214-3230
Print publication date: 2023-10-16
DOI: 10.11834/jig.220716
Yang Lei, Gu Yuge, Bian Guibin, Liu Yanhong. 2023. A dual-encoder feature attention network for surgical instrument segmentation. Journal of Image and Graphics, 28(10):3214-3230
Objective
Surgical instrument segmentation is one of the key links in the precise operation of surgical robots. However, owing to complex factors such as low-contrast instruments, complicated surgical environments, specular reflection, and variations in instrument scale and shape, accurate segmentation still faces considerable challenges: the results suffer from blurred boundaries and misclassified details, which degrades segmentation accuracy. To address these challenges, a new surgical instrument segmentation network is proposed to achieve accurate segmentation of surgical instruments in endoscopic images.
Method
To represent endoscopic images accurately and obtain effective feature maps, a dual-encoder structure fusing a convolutional neural network (CNN) and a Transformer is proposed, allowing the segmentation network to extract both detailed features and global contextual semantic information. To enhance the local feature maps, dilated convolutions are introduced and a multi-scale attention fusion module is designed to obtain multi-scale attention feature maps. To address the class imbalance faced by surgical instrument segmentation, a global attention module is introduced to raise the network's attention to instrument regions and reduce its attention to irrelevant features.
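As a rough illustration of this design (not the authors' exact configuration), a multi-scale attention fusion block built from parallel dilated convolutions and channel attention can be sketched in PyTorch as follows; all channel widths, dilation rates, and the squeeze-and-excitation-style gating here are assumptions:

```python
# Hypothetical sketch of a multi-scale attention fusion block: parallel dilated
# (atrous) 3x3 convolutions capture context at several receptive fields, and a
# squeeze-and-excitation-style gate reweights the fused channels. Channel
# widths, dilation rates, and the gating design are illustrative assumptions,
# not the authors' published configuration.
import torch
import torch.nn as nn

class MultiScaleAttentionFusion(nn.Module):
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)
        # Channel attention over the fused multi-scale map.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return fused * self.attn(fused)   # attention-weighted multi-scale map

x = torch.randn(1, 256, 32, 32)                 # e.g., a bottleneck feature map
print(MultiScaleAttentionFusion(256)(x).shape)  # torch.Size([1, 256, 32, 32])
```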
Result
To validate the model, two public surgical instrument segmentation datasets are used for performance analysis and testing. The effectiveness and superiority of the proposed method are verified through ablation and comparison experiments with both qualitative and quantitative analysis. Experimental results show that the proposed method achieves a Dice score of 96.46% and a mean intersection over union (mIOU) of 94.12% on the Kvasir-instrument dataset, and a Dice score of 96.27% and an mIOU of 92.55% on the Endovis2017 (2017 Endoscopic Vision Challenge) dataset. Compared with the state-of-the-art segmentation networks evaluated, the proposed method clearly improves segmentation accuracy. The ablation study also confirms the soundness of the design: removing any sub-module causes a corresponding loss of accuracy.
Conclusion
The proposed segmentation model effectively combines the advantages of CNNs and Transformers, fully extracting both detailed features and global contextual information, and achieves accurate and stable segmentation of surgical instruments.
Objective
Medical instruments are indispensable tools in surgery, yet surgical trauma still leaves considerable room for optimization. Emerging surgical robots can reduce the harm caused by surgical operations and offer higher stability and stronger learning ability than manual surgery. Precise segmentation of surgical instruments is a key link in the smooth operation of surgical robots. Existing segmentation methods can locate surgical instruments and roughly segment their shapes, but complex factors, such as low instrument contrast, cluttered surgical environments, specular reflection, and instruments of different sizes and shapes, still cause a certain loss of boundary information and fine detail, resulting in blurred boundaries and misclassified details. To improve surgical instrument segmentation, we develop a dual-encoder fusion segmentation network based on a Transformer and a convolutional neural network (CNN) for surgical instrument segmentation in endoscopic images.
Method
Built on the encoder-decoder framework, the dual-encoder fusion segmentation network forms an end-to-end surgical instrument segmentation scheme. To strengthen feature representation and obtain effective context features, a dual-encoder block that fuses a Transformer with a CNN is constructed to extract local details and global context information from endoscopic images simultaneously. In addition, effective multi-scale feature extraction is essential for improving segmentation accuracy, since surgical instruments vary in size and shape. To extract multi-scale attention feature maps, a multi-scale attention fusion module is embedded into the bottleneck layer for feature enhancement of the local feature maps. To address the class imbalance inherent in the surgical instrument segmentation task, an attention gated block is introduced into the decoder unit so that the segmentation network focuses on the surgical instruments and attention to irrelevant features is reduced.
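The following PyTorch sketch shows one plausible realization of such a dual-encoder block, with a convolutional branch for local detail and a Transformer branch for global context fused by concatenation; the backbones, depths, token size, and fusion strategy are illustrative assumptions rather than the paper's exact design:

```python
# Hypothetical sketch of a CNN + Transformer dual encoder with feature fusion.
# Backbones, depths, and fusion by 1x1 convolution over concatenated maps are
# assumptions for illustration; the abstract states only that the two branches
# extract local detail and global context and are fused.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, in_ch: int = 3, dim: int = 64, patch: int = 16):
        super().__init__()
        # Local-detail branch: a small convolutional encoder (1/4 resolution).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Global-context branch: patch embedding + Transformer encoder.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)  # fuse concatenated feature maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.cnn(x)                                 # (B, dim, H/4, W/4)
        tok = self.embed(x)                                 # (B, dim, H/16, W/16)
        b, c, h, w = tok.shape
        tok = self.transformer(tok.flatten(2).transpose(1, 2))  # (B, h*w, dim)
        glob = tok.transpose(1, 2).reshape(b, c, h, w)
        glob = F.interpolate(glob, size=local.shape[2:], mode="bilinear",
                             align_corners=False)
        return self.fuse(torch.cat([local, glob], dim=1))   # (B, dim, H/4, W/4)

feat = DualEncoder()(torch.randn(1, 3, 256, 256))
print(feat.shape)  # torch.Size([1, 64, 64, 64])
```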
Result
To verify the effectiveness and potential of the proposed dual-encoder fusion segmentation network, two public surgical instrument segmentation datasets are adopted: the Kvasir-instrument dataset and the Endovis2017 dataset. Combining qualitative and quantitative analysis, segmentation performance is evaluated through three kinds of experiments: ablation, comparison, and visualization. The proposed network obtains good segmentation results on both datasets, achieving a Dice score of 96.46% and a mean intersection over union (mIOU) of 94.12% on the Kvasir-instrument dataset, and a Dice score of 96.27% and an mIOU of 92.55% on the Endovis2017 dataset. Compared with state-of-the-art methods, the Dice score is improved by 1.51% and the mIOU by 2.52% over the progressive alternating attention network (PAANet) on the Kvasir-instrument dataset, and the Dice score is improved by 1.62% and the mIOU by 2.22% over the refined attention segmentation network (RASNet) on the Endovis2017 dataset. Furthermore, ablation experiments with quantitative and qualitative analysis verify the effectiveness of each sub-module: the dual-encoder module is shown to improve segmentation accuracy on both the Kvasir-instrument and Endovis2017 datasets.
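For reference, the two reported metrics can be computed from binary masks as in the following NumPy sketch; the official evaluation scripts of the respective challenges may differ in details such as edge-case handling:

```python
# Plain NumPy sketch of the two reported metrics for binary masks. Values are
# fractions in [0, 1]; the paper reports them as percentages. mIOU averages
# the per-class (or per-image) IoU. Official challenge code may differ in
# edge-case handling (e.g., empty masks).
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """IoU = |P ∩ G| / |P ∪ G|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

pred = np.zeros((4, 4), bool); pred[1:3, 1:3] = True  # 4-pixel prediction
gt   = np.zeros((4, 4), bool); gt[1:3, 1:4]  = True   # 6-pixel ground truth
print(round(dice_score(pred, gt), 3))  # 2*4/(4+6) = 0.8
print(round(iou_score(pred, gt), 3))   # 4/6 ≈ 0.667
```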
Conclusion
To address problems in surgical instrument segmentation such as specular reflection, variation in instrument shape and size, and class imbalance, a CNN and Transformer based dual-encoder fusion segmentation network is developed as an end-to-end surgical instrument segmentation scheme. The experiments indicate that the proposed method can accurately segment surgical instruments of various shapes and sizes in endoscopic images, offering potential support for robot-assisted surgery.
Keywords: deep learning; surgical instrument segmentation; convolutional neural network (CNN); Transformer; dual encoder; feature attention mechanism
Allan M, Kondo S, Bodenstedt S, Leger S, Kadkhodamohammadi R, Luengo I, Fuentes F, Flouty E, Mohammed A, Pedersen M, Kori A, Alex V, Krishnamurthi G, Rauber D, Mendel R, Palm C, Bano S, Saibro G, Shih C S, Chiang H A, Zhuang J T, Yang J L, Iglovikov V, Dobrenkii A, Reddiboina M, Reddy A, Liu X T, Gao C, Unberath M, Kim M, Kim C, Kim C, Kim H, Lee G, Ullah I, Luna M, Park S H, Azizian M, Stoyanov D, Maier-Hein L and Speidel S. 2020. 2018 robotic scene segmentation challenge [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/2001.11190.pdf
Allan M, Shvets A, Kurmann T, Zhang Z C, Duggal R, Su Y H, Rieke N, Laina I, Kalavakonda N, Bodenstedt S, Herrera L, Li W Q, Iglovikov V, Luo H L, Yang J, Stoyanov D, Maier-Hein L, Speidel S and Azizian M. 2019. 2017 robotic instrument segmentation challenge [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1902.06426.pdf
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Bouget D, Allan M, Stoyanov D and Jannin P. 2017. Vision-based and marker-less surgical tool detection and tracking: a review of the literature. Medical Image Analysis, 35: 633-654 [DOI: 10.1016/j.media.2016.09.003]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2016. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1412.7062.pdf
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018a. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1706.05587.pdf
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018b. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851 [DOI: 10.1007/978-3-030-01234-2_49]
Cheriet M, Said J N and Suen C Y. 1998. A recursive thresholding technique for image segmentation. IEEE Transactions on Image Processing, 7(6): 918-921 [DOI: 10.1109/83.679444]
Dumoulin V and Visin F. 2018. A guide to convolution arithmetic for deep learning [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1603.07285.pdf
Fabijańska A. 2011. Variance filter for edge detection and edge-based image segmentation//Perspective Technologies and Methods in MEMS Design. Polyana, Ukraine: IEEE: 151-154
Feng S L, Zhao H M, Shi F, Cheng X N, Wang M, Ma Y H, Xiang D H, Zhu W F and Chen X J. 2020. CPFNet: context pyramid fusion network for medical image segmentation. IEEE Transactions on Medical Imaging, 39(10): 3008-3018 [DOI: 10.1109/TMI.2020.2983721]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
Hasan S M K and Linte C A. 2019. U-NetPlus: a modified encoder-decoder U-Net architecture for semantic and instance segmentation of surgical instruments from laparoscopic images//Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Berlin, Germany: IEEE: 7205-7211 [DOI: 10.1109/EMBC.2019.8856791]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, Andreetto M and Adam H. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1704.04861.pdf
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Hu X H, Li B J and Liu F. 2013. Image segmentation based on graph theory in multi-color space for maize leaf disease. Transactions of the Chinese Society for Agricultural Machinery, 44(2): 177-181 [DOI: 10.6041/j.issn.1000-1298.2013.02.033]
Iglovikov V and Shvets A. 2018. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1801.05746.pdf
Isensee F, Jaeger P F, Kohl S A A, Petersen J and Maier-Hein K H. 2021. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2): 203-211 [DOI: 10.1038/s41592-020-01008-z]
Jha D, Ali S, Emanuelsen K, Hicks S A, Thambawita V, Garcia-Ceja E, Riegler M A, de Lange T, Schmidt P T, Johansen H D, Johansen D and Halvorsen P. 2021a. Kvasir-Instrument: diagnostic and therapeutic tool//Proceedings of the 27th International Conference on Multimedia Modeling. Prague, Czech Republic: Springer: 218-229 [DOI: 10.1007/978-3-030-67835-7_19]
Jha D, Riegler M A, Johansen D, Halvorsen P and Johansen H D. 2020. DoubleU-Net: a deep convolutional neural network for medical image segmentation//Proceedings of the 33rd IEEE International Symposium on Computer-Based Medical Systems (CBMS). Rochester, USA: IEEE: 558-564 [DOI: 10.1109/CBMS49503.2020.00111]
Jha D, Smedsrud P H, Riegler M A, Johansen D, De Lange T, Halvorsen P and Johansen H D. 2019. ResUNet++: an advanced architecture for medical image segmentation//Proceedings of 2019 IEEE International Symposium on Multimedia (ISM). San Diego, USA: IEEE: 225-230 [DOI: 10.1109/ISM46123.2019.00049]
Jha D, Tomar N K, Ali S, Riegler M A, Johansen H D, Johansen D, de Lange T and Halvorsen P. 2021b. NanoNet: real-time polyp segmentation in video capsule endoscopy and colonoscopy//Proceedings of the 34th IEEE International Symposium on Computer-Based Medical Systems (CBMS). Aveiro, Portugal: IEEE: 37-43 [DOI: 10.1109/CBMS52027.2021.00014]
Jin Y M, Cheng K Y, Dou Q and Heng P A. 2019. Incorporating temporal prior from motion flow for instrument segmentation in minimally invasive surgery video//Proceedings of the 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention. Shenzhen, China: Springer: 440-448 [DOI: 10.1007/978-3-030-32254-0_49]
Jing Z C, Ye J T and Xu G L. 2018. A geometric flow approach for region-based image segmentation-theoretical analysis. Acta Mathematicae Applicatae Sinica, English Series, 34(1): 65-76 [DOI: 10.1007/s10255-018-0723-4]
Li K Y, Ding G T and Wang H T. 2018. L-FCN: a lightweight fully convolutional network for biomedical semantic segmentation//Proceedings of 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Madrid, Spain: IEEE: 2363-2367 [DOI: 10.1109/BIBM.2018.8621265]
Liu D C, Wei Y H, Jiang T T, Wang Y Z, Miao R L, Shan F and Li Z Y. 2020. Unsupervised surgical instrument segmentation via anchor generation and semantic diffusion//Proceedings of the 23rd International Conference on Medical Image Computing and Computer-Assisted Intervention. Lima, Peru: Springer: 657-667 [DOI: 10.1007/978-3-030-59716-0_63]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Lu H C, Tian S W, Yu L, Liu L, Cheng J L, Wu W D, Kang X J and Zhang D Z. 2022. DCACNet: dual context aggregation and attention-guided cross deconvolution network for medical image segmentation. Computer Methods and Programs in Biomedicine, 214: #106566 [DOI: 10.1016/j.cmpb.2021.106566]
Luo K K, Wang T and Ye F F. 2021. U-Net segmentation model of brain tumor MR image based on attention mechanism and multi-view fusion. Journal of Image and Graphics, 26(9): 2208-2218 [DOI: 10.11834/jig.200584]
Mahmood T, Cho S W and Park K R. 2022. DSRD-Net: dual-stream residual dense network for semantic segmentation of instruments in robot-assisted surgery. Expert Systems with Applications, 202: #117420 [DOI: 10.1016/j.eswa.2022.117420]
Ni Z L, Bian G B, Xie X L, Hou Z G, Zhou X H and Zhou Y J. 2019. RASNet: segmentation for tracking surgical instruments in surgical videos using refined attention segmentation network//Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Berlin, Germany: IEEE: 5735-5738 [DOI: 10.1109/EMBC.2019.8856495]
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N Y, Kainz B, Glocker B and Rueckert D. 2018. Attention U-Net: learning where to look for the pancreas [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1804.03999.pdf
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1409.1556.pdf
Srivastava A, Chanda S, Jha D, Riegler M A, Halvorsen P, Johansen D and Pal U. 2021. PAANet: progressive alternating attention for automatic medical image segmentation//Proceedings of the 4th International Conference on Bio-Engineering for Smart Technologies. Paris/Créteil, France: IEEE: 1-4 [DOI: 10.1109/BioSMART54244.2021.9677844]
Srivastava A, Jha D, Chanda S, Pal U, Johansen H D, Johansen D, Riegler M A, Ali S and Halvorsen P. 2022. MSRF-Net: a multi-scale residual fusion network for biomedical image segmentation [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/2105.07451v2.pdf
Tang J. 2010. A color image segmentation algorithm based on region growing//Proceedings of the 2nd International Conference on Computer Engineering and Technology. Chengdu, China: IEEE: V6-634-V6-637 [DOI: 10.1109/ICCET.2010.5486012]
Wang B, Lei Y, Tian S B, Wang T H, Liu Y Z, Patel P, Jani A B, Mao H, Curran W J, Liu T and Yang X F. 2019. Deeply supervised 3D fully convolutional networks with group dilated convolution for automatic MRI prostate segmentation. Medical Physics, 46(4): 1707-1718 [DOI: 10.1002/mp.13416]
Wang Y B, Xiao Y X and Wang Z J. 2022. A state transition algorithm based on jump operator applied to image threshold segmentation//Proceedings of the 7th International Conference on Intelligent Computing and Signal Processing (ICSP). Xi’an, China: IEEE: 516-523 [DOI: 10.1109/ICSP54964.2022.9778663]
Wu H S, Chen S H, Chen G L, Wang W, Lei B Y and Wen Z K. 2022. FAT-Net: feature adaptive Transformers for automated skin lesion segmentation. Medical Image Analysis, 76: #102327 [DOI: 10.1016/j.media.2021.102327]
Xia H Y, Ma M J, Li H S and Song S X. 2022. MC-Net: multi-scale context-attention network for medical CT image segmentation. Applied Intelligence, 52(2): 1508-1519 [DOI: 10.1007/s10489-021-02506-z]
Yang L, Gu Y G, Bian G B and Liu Y H. 2022. DRR-Net: a dense-connected residual recurrent convolutional network for surgical instrument segmentation from endoscopic images. IEEE Transactions on Medical Robotics and Bionics, 4(3): 696-707 [DOI: 10.1109/TMRB.2022.3193420]
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions [EB/OL]. [2022-08-15]. https://arxiv.org/pdf/1511.07122.pdf
Yue Y J, Li X S, Zhao H and Wang H J. 2020. Image segmentation method of crop diseases based on improved SegNet neural network//Proceedings of 2020 IEEE International Conference on Mechatronics and Automation (ICMA). Beijing, China: IEEE: 1986-1991 [DOI: 10.1109/ICMA49215.2020.9233609]
Zhou T, Dong Y L, Huo B Q, Liu S and Ma Z J. 2021. U-Net and its applications in medical image segmentation: a review. Journal of Image and Graphics, 26(9): 2058-2077 [DOI: 10.11834/jig.200704]
Zhou Z W, Rahman Siddiquee M M, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation//Proceedings of the 4th International Workshop and the 8th International Workshop on Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Granada, Spain: Springer: 3-11 [DOI: 10.1007/978-3-030-00889-5_1]