Published: 2021-11-16 | DOI: 10.11834/jig.200353 | 2021 | Volume 26 | Number 11 | Medical Image Processing

Received: 2020-07-17; revised: 2020-11-14; preprint published online: 2020-11-21. About the authors: Li Shengdan, born in 1995, male, M.S. candidate; his research interest is deep-learning-based medical image processing. E-mail: 646286091@qq.com. Bai Zhengyao, corresponding author, male, professor; his research interests are signal processing, image processing, and pattern recognition. E-mail: baizhy@ynu.edu.cn. CLC number: TP399; document code: A; article ID: 1006-8961(2021)11-2723-09

Multiorgan lesion detection and segmentation based on deep learning
Li Shengdan, Bai Zhengyao
School of Information Science and Engineering, Yunnan University, Kunming 650091, China

# Abstract

Objective Most computed tomography (CT) image analysis networks based on deep learning are designed for a single lesion type and are therefore incapable of detecting multiple types of lesions. A general CT image analysis network is urgently needed for the accurate and timely diagnosis and treatment of patients. Because public medical image datasets are difficult to build, doctors and researchers must process existing CT images more efficiently and diagnose diseases more accurately. To improve the performance of CT image analysis networks, several scholars have constructed 3D convolutional neural networks (3D CNNs), which extract richer spatial features and perform better than 2D CNNs. However, the high computational complexity of 3D CNNs restricts the depth of the designed networks, resulting in performance bottlenecks. Recently, DeepLesion, a CT image dataset with multiple lesion types, has made it possible to build universal networks for lesion detection and segmentation in CT images. The wide range of lesion scales and types places a large burden on lesion detection and segmentation. To address these problems and improve the performance of CT image analysis networks, we propose a model based on deep convolutional networks that accomplishes multi-organ lesion detection and segmentation in CT images and helps doctors diagnose diseases quickly and accurately.

Method The proposed model consists of two main parts. 1) Backbone network. To extract multi-dimensional, multi-scale features, we integrate bidirectional feature pyramid networks (BiFPN) and densely connected convolutional networks into the backbone. The model's input combines the CT key slice, which provides the ground-truth information, with its neighboring slices, which provide 3D context. Combining the backbone with a feature fusion method enables the 2D network to extract spatial information from adjacent slices, so the network exploits the features of both the key slice and its neighbors, and its performance improves by using the 3D context of the CT slices. Moreover, we simplify and fine-tune the network structure so that our model achieves better performance and lower computational complexity than the original architecture. 2) Detection and segmentation branches. To produce high-quality, representative proposals, we feed the features fused with 3D context into the region proposal network. A cascade R-CNN (region convolutional neural network) with gradually increasing intersection-over-union (IoU) thresholds resamples the generated proposals, and the high-quality proposals are fed into the detection and segmentation branches. For the different lesion scales, we set the anchor ratios to 1:2, 1:1, and 2:1, and the anchor sizes in the region proposal network to 16, 24, 32, 48, and 96. We test cascades whose stages use IoU thresholds of 0.5, 0.6, and 0.7 to find a suitable number of cascade stages. The original region-of-interest (ROI) pooling is replaced with ROI Align for better performance.

Result We validate the network on the DeepLesion dataset, which contains 32 120 CT images with different types of lesions, split into training, testing, and validation sets with proportions of 70%, 15%, and 15%, respectively. We train the proposed model with stochastic gradient descent at an initial learning rate of 0.001; the rate drops to 1/10 of its value at the fourth and sixth epochs (eight epochs in total). Four groups of comparative experiments explore how different network designs affect detection and segmentation performance, covering feature pyramid networks (FPN), bidirectional feature pyramid networks (BiFPN), feature fusion, different numbers of cascade stages, and the presence of the segmentation branch. The results show that BiFPN outperforms FPN on the detection task, and that the feature fusion method greatly improves detection performance. As the number of cascade stages increases, detection accuracy drops slightly while segmentation improves greatly. In addition, networks without a segmentation branch detect lesions more accurately than those with one; hence, the detection and segmentation tasks are coupled, and different structures can be selected to meet distinct accuracy requirements. For more accurate lesion detection, the baseline network without a segmentation branch suffices; for more accurate segmentation, the baseline network with a three-stage cascade achieves the goal. We report the results of the three-stage cascade network: its average detection accuracy on the DeepLesion test set is 83.15%, the average distance error between the predicted segmentation and the endpoints of the weak labels built from the response evaluation criteria in solid tumors (RECIST) is 1.27 mm, and the average diameter error is 1.69 mm. Its segmentation performance is superior to that of the multitask universal lesion analysis network for joint lesion detection, tagging, and segmentation (MULAN) and Auto RECIST. The inference time per image is 91.7 ms.

Conclusion The proposed model achieves good detection and segmentation performance on CT images and predicts quickly. It is suitable for lesion detection and segmentation in CT images whose lesion types resemble those in the DeepLesion dataset, and, trained on DeepLesion, it can help doctors diagnose lesions in multiple organs with a computer.
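The training schedule above (initial learning rate 0.001, divided by 10 at the fourth and sixth of eight epochs) maps directly onto a step scheduler; a minimal PyTorch sketch follows, in which the stand-in model and the momentum value are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Conv2d(9, 64, 3)  # stand-in module; the real backbone is BiFPN + DenseNet
# SGD with the paper's initial learning rate; momentum 0.9 is an assumed default
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# divide the learning rate by 10 at the 4th and 6th epochs
scheduler = MultiStepLR(optimizer, milestones=[4, 6], gamma=0.1)

for epoch in range(8):  # eight epochs in total
    # ... one pass over the DeepLesion training split would run here ...
    scheduler.step()
    print(epoch + 1, scheduler.get_last_lr())  # 0.001 -> 0.0001 -> 0.00001
```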

# Key words

deep learning; computed tomography (CT) images; lesion detection; lesion segmentation; DeepLesion

# 1.1 Mask R-CNN

Mask R-CNN (region convolutional neural network) (He et al., 2017) is a general instance segmentation framework in computer vision that performs both object detection and segmentation, as shown in Fig. 1. To sample the regions of interest (ROIs) in the feature maps more precisely, it introduces the ROI Align operation.
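ROI Align is available in torchvision, so its effect is easy to try; the sketch below pools a single box with sub-pixel coordinates from a random feature map (the shapes, coordinates, and pooled size are made up for the demo).

```python
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 50, 50)  # one dummy feature map (N, C, H, W)
# boxes are (batch_index, x1, y1, x2, y2); the fractional coordinates are kept,
# which is precisely what ROI Align handles better than quantized ROI pooling
boxes = torch.tensor([[0.0, 4.3, 4.7, 20.9, 22.1]])
pooled = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```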

# 4.2.1 Detection branch

The detection branch is evaluated with the sensitivity $S$, the proportion of true lesions that are detected:

$S = \frac{TP}{TP + FN}$ (1)

where $TP$ and $FN$ are the numbers of true positives and false negatives, respectively.
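Read as code, Eq. (1) is simply the fraction of true lesions the detector finds at a given operating point; the counts in the example are made up.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Eq. (1): S = TP / (TP + FN)."""
    return tp / (tp + fn)

# e.g., 831 of 1 000 true lesions detected gives a sensitivity of 0.831
print(sensitivity(tp=831, fn=169))
```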

# 4.2.2 Segmentation branch

The DeepLesion dataset provides no pixel-level segmentation labels, so soft labels for segmentation are constructed from the response evaluation criteria in solid tumors (RECIST) (Eisenhauer et al., 2009). Figure 9 shows the ground-truth boxes, the RECIST annotations, and the detection and segmentation results: the bold green box and the green long and short axes denote the ground-truth box and the RECIST annotation, respectively; the red long and short axes, together with the contour constructed from them, are the segmentation result predicted by the network; the thin red box is the predicted box; and the 0.976 above it is the predicted probability of being a lesion. Segmentation adopts the same two metrics as MULAN (a sketch of both follows the list):

1) the average distance error from the endpoints of the ground-truth label to the prediction;

2) the average error between the diameter lengths of the prediction and the ground-truth label.
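A minimal sketch of both metrics for one lesion follows, assuming each RECIST annotation is stored as four endpoints (the long-axis pair first, then the short-axis pair) and that predicted endpoints are already matched to the ground truth; this illustrates the definitions and is not the authors' evaluation code.

```python
import numpy as np

def recist_errors(pred, gt):
    """Return (mean endpoint distance, mean diameter error), e.g., in mm."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)  # each of shape (4, 2)
    # 1) mean Euclidean distance from the ground-truth endpoints to the predictions
    endpoint_err = np.linalg.norm(pred - gt, axis=1).mean()
    # 2) mean absolute difference of the long- and short-axis (diameter) lengths
    axes = lambda p: np.array([np.linalg.norm(p[0] - p[1]),
                               np.linalg.norm(p[2] - p[3])])
    diameter_err = np.abs(axes(pred) - axes(gt)).mean()
    return endpoint_err, diameter_err

# toy example: every predicted endpoint is off by exactly 1 unit
gt = [(0, 0), (10, 0), (5, -3), (5, 3)]
pred = [(1, 0), (11, 0), (5, -2), (5, 4)]
print(recist_errors(pred, gt))  # (1.0, 0.0)
```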

# 4.3 Analysis of experimental results

Table 1 Comparison of detection accuracy (sensitivity) and ablation studies on the test set of DeepLesion among different methods

| Method | 0.5 | 1 | 2 | 4 | Mean |
| --- | --- | --- | --- | --- | --- |
| ULDor (Tang et al., 2019) | 52.86 | 64.80 | 74.84 | 84.38 | 69.22 |
| 3DCE (Yan et al., 2018a) | 62.48 | 73.37 | 80.70 | 85.65 | 75.55 |
| RetinaNet (Zlocha et al., 2019) | 72.15 | 80.07 | 86.40 | 90.77 | 82.35 |
| MULAN* (Yan et al., 2019) | - | - | - | - | 84.24 |
| FPN without feature fusion | 65.88 | 75.67 | 82.25 | 87.46 | 77.81 |
| BiFPN without feature fusion | 67.00 | 76.20 | 82.94 | 88.09 | 78.56 |
| FPN | 74.96 | 82.45 | 87.13 | 91.21 | 83.94 |
| BiFPN (baseline) | 75.47 | 82.76 | 87.85 | 91.00 | 84.27 |
| BiFPN (baseline) + no segmentation | **75.61** | **82.89** | **88.25** | **92.09** | **84.71** |
| BiFPN + two cascade stages | 72.42 | 81.90 | 87.36 | 91.41 | 83.27 |
| BiFPN + two cascade stages + no segmentation | 73.37 | 82.61 | 87.89 | 91.67 | 83.89 |
| BiFPN + three cascade stages | 73.09 | 81.33 | 87.03 | 91.16 | 83.15 |
| BiFPN + three cascade stages + no segmentation | 73.85 | 81.54 | 87.37 | 91.28 | 83.51 |

Note: values are sensitivities (%); the numeric column headers give the allowed number of false positives per image; bold marks the best result in each column; "-" means the value was not reported; "no segmentation" means the segmentation branch is removed.

Table 2 Comparison of segmentation accuracy (error) and ablation studies on the test set of DeepLesion among different methods

| Method | Average endpoint distance /mm | Average diameter error /mm |
| --- | --- | --- |
| Auto RECIST (Tang et al., 2018) | - | 1.71 |
| MULAN* (Yan et al., 2019) | 1.43 | 1.97 |
| BiFPN (baseline) | 1.49 | 2.07 |
| BiFPN (baseline) + two cascade stages | 1.38 | 1.88 |
| BiFPN (baseline) + three cascade stages | **1.27** | **1.69** |

Note: bold marks the best result in each column; "-" means the value was not reported.

Table 3 Comparison of inference time per image among different methods

| Method | Inference time per image /ms |
| --- | --- |
| MULAN* (Yan et al., 2019) | 102.2 |
| BiFPN (baseline) | 95.7 |
| BiFPN (baseline) + two cascade stages | 93.8 |
| BiFPN (baseline) + three cascade stages | **91.7** |

Note: bold marks the best result.

1) Experiment group 1. Table 1 shows that BiFPN clearly outperforms FPN in lesion detection, indicating that the weighted BiFPN better balances the influence of different feature maps and thereby improves detection performance.

2) Experiment group 2. Table 1 shows that adding feature fusion improves the performance of every network, and BiFPN combined with feature fusion outperforms the rest. By extracting spatial information from adjacent CT slices (see the slice-stacking sketch after this list), feature fusion substantially strengthens the network's feature extraction ability and thus raises detection accuracy.

3) Experiment group 3. Table 1 shows that detection performance drops somewhat as cascade stages are added: a threshold of 0.5 already fits the IoU distribution of the candidate samples well, while an overly large threshold tends to cause overfitting and lowers accuracy (a toy cascade loop follows this list). Table 2 shows that segmentation accuracy improves considerably as stages are added, and the three-stage cascade has the smallest segmentation error; the prediction figures in this paper are produced by the three-stage network. Table 3 shows that the three-stage network also has the shortest inference time per image, since few candidate samples satisfy the high threshold and processing is therefore faster.

4) Experiment group 4. Table 1 shows that removing the segmentation branch improves detection performance for every network. The BiFPN network with feature fusion reaches the best detection accuracy, 84.71%, once the segmentation task is removed, indicating a coupling between the segmentation and detection tasks.
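Two of the design points above can be made concrete. First, the feature fusion of group 2 amounts to stacking the key slice with its neighbors along the channel axis so that a 2D backbone sees 3D context; the slice count and array shapes below are assumptions for the sketch, not the paper's exact configuration.

```python
import numpy as np

def fuse_slices(volume, key_idx, num_neighbors=1):
    """Stack the key CT slice with its neighbors as input channels."""
    lo, hi = key_idx - num_neighbors, key_idx + num_neighbors + 1
    idx = np.clip(np.arange(lo, hi), 0, volume.shape[0] - 1)  # repeat edge slices at the borders
    return volume[idx]  # (channels, H, W) with the key slice in the middle

ct = np.random.rand(40, 512, 512).astype(np.float32)  # dummy CT volume
print(fuse_slices(ct, key_idx=20).shape)              # (3, 512, 512)
```

Second, the cascade of group 3 relabels proposals with a rising IoU threshold at each stage (0.5, 0.6, 0.7). The toy loop below only keeps the proposals that clear each stage's threshold; the per-stage box regression of the real cascade R-CNN is omitted.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

gt = (11, 10, 49, 51)  # one ground-truth lesion box
proposals = [(10, 10, 50, 50), (11, 10, 49, 38), (11, 10, 49, 32), (30, 30, 90, 90)]
for thr in (0.5, 0.6, 0.7):  # one detection head per threshold in the real model
    proposals = [p for p in proposals if iou(p, gt) >= thr]
    print(f"IoU >= {thr}: {len(proposals)} positive proposals kept")  # 3, 2, 1
```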

# References

• Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6154-6162[DOI: 10.1109/CVPR.2018.00644]
• Eisenhauer E A, Therasse P, Bogaerts J, Schwartz L H, Sargent D, Ford R, Dancey J, Arbuck S, Gwyther S, Mooney M, Rubinstein L, Shankar L, Dodd L, Kaplan R, Lacombe D, Verweij J. 2009. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). European Journal of Cancer, 45(2): 228-247 [DOI:10.1016/j.ejca.2008.10.026]
• He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988[DOI: 10.1109/ICCV.2017.322]
• Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/CVPR.2017.243]
• Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944[DOI: 10.1109/CVPR.2017.106]
• Ren S Q, He K M, Girshick R, Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI:10.1109/TPAMI.2016.2577031]
• Roth H R, Shen C, Oda H, Oda M, Hayashi Y, Misawa K, Mori K. 2018. Deep learning and its application to medical image segmentation. Medical Imaging Technology, 36(2): 63-71 [DOI:10.11409/mit.36.63]
• Setio A A A, Traverso A, De Bel T, Berens M S N, Van Den Bogaard C, Cerello P, Chen H, Dou Q, Fantacci M E, Geurts B, Van Der Gugten R, Heng P A, Jansen B, De Kaste M M J, Kotov V, Lin J Y H, Manders J T M C, Sóñora-Mengana A, García-Naranjo J C, Papavasileiou E, Prokop M, Saletta M, Schaefer-Prokop C M, Scholten E T, Scholten L, Snoeren M M, Torres E L, Vandemeulebroucke J, Walasek N, Zuidhof G C A, Van Ginneken B, Jacobs C. 2017. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Medical Image Analysis, 42: 1-13 [DOI:10.1016/j.media.2017.06.015]
• Taha A, Lo P, Li J N and Zhao T. 2018. Kid-Net: convolution networks for kidney vessels segmentation from CT-volumes//Proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention. Granada, Spain: Springer: 463-471[DOI: 10.1007/978-3-030-00937-3_53]
• Tan M X, Pang R M and Le Q V. 2020. EfficientDet: scalable and efficient object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10778-10787[DOI: 10.1109/CVPR42600.2020.01079]
• Tang Y, Harrison A P, Bagheri M, Xiao J and Summers R M. 2018. Semi-automatic RECIST labeling on CT scans with cascaded convolutional neural networks//Proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention. Granada, Spain: Springer: 405-413[DOI: 10.1007/978-3-030-00937-3_47]
• Tang Y B, Yan K, Tang Y X, Liu J M, Xiao J and Summers R M. 2019. ULDor: a universal lesion detector for CT scans with pseudo masks and hard negative example mining//The 16th IEEE International Symposium on Biomedical Imaging (ISBI 2019). Venice, Italy: IEEE: 833-836[DOI: 10.1109/ISBI.2019.8759478]
• Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5987-5995[DOI: 10.1109/CVPR.2017.634]
• Xie W Y, Chen Y B, Wang J Y, Li Q, Chen Q. 2019. Detection of pulmonary nodules in CT images based on convolutional neural networks. Computer Engineering and Design, 40(12): 3575-3581 (谢未央, 陈彦博, 王季勇, 李强, 陈群. 2019. 基于卷积神经网络的CT图像肺结节检测. 计算机工程与设计, 40(12): 3575-3581) [DOI:10.16208/j.issn1000-7024.2019.12.035]
• Yan K, Bagheri M and Summers R M. 2018a. 3D context enhanced region-based convolutional neural network for end-to-end lesion detection//Proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention. Granada, Spain: Springer: 511-519[DOI: 10.1007/978-3-030-00928-1_58]
• Yan K, Tang Y, Peng Y, Sandfort V, Bagheri M, Lu Z Y and Summers R M. 2019. MULAN: multitask universal lesion analysis network for joint lesion detection, tagging, and segmentation//Proceedings of the 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention. Shenzhen, China: Springer: 194-202[DOI: 10.1007/978-3-030-32226-7_22]
• Yan K, Wang X S, Lu L, Summers R M. 2018b. DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of Medical Imaging, 5(3): #036501 [DOI:10.1117/1.JMI.5.3.036501]
• Zlocha M, Dou Q and Glocker B. 2019. Improving RetinaNet for CT lesion detection with dense masks from weak RECIST labels//Proceedings of the 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention. Shenzhen, China: Springer: 402-410[DOI:10.1007/978-3-030-32226-7_45]