Li Shengdan, Bai Zhengyao (School of Information Science and Engineering, Yunnan University, Kunming 650091, China)
Objective Lesions at multiple body sites vary widely in size and type, which makes their accurate detection and segmentation difficult. We therefore design a 2.5D deep convolutional neural network model that detects and segments lesions of multiple types in computed tomography (CT) images. Method A backbone composed of a densely connected convolutional network and a bidirectional feature pyramid network extracts multi-scale, multi-dimensional features from the images. The input is a group of CT slices formed by combining the annotated central slice with its neighboring slices, which supply spatial context. The feature maps fused with spatial information are fed into a region proposal network to generate candidate regions; a Cascade R-CNN (region convolutional neural networks) composed of multi-threshold cascaded stages then selects high-quality samples for training the detection and segmentation branches. Result The model is validated on the DeepLesion dataset. The average detection accuracy on the test set is 83.15%; the average endpoint distance error between the predicted segmentation and the ground-truth labels is 1.27 mm, and the average diameter error is 1.69 mm. The segmentation performance surpasses that of MULAN (multitask universal lesion analysis network for joint lesion detection, tagging and segmentation) and Auto RECIST (response evaluation criteria in solid tumors), and inference takes only 91.7 ms per image on average. Conclusion The proposed model achieves good detection and segmentation performance on multi-site CT images with little prediction time, and is applicable to CT images whose lesion categories resemble those in DeepLesion. To some extent, it can meet clinicians' need for computer-aided analysis of multi-site CT images.
Multiorgan lesion detection and segmentation based on deep learning
Li Shengdan,Bai Zhengyao(School of Information Science and Engineering, Yunnan University, Kunming 650091, China)
Objective Most deep-learning-based computed tomography (CT) image analysis networks are designed for a single lesion type, so they cannot detect multiple types of lesions. A general CT image analysis network that supports accurate, timely diagnosis and treatment is therefore urgently needed. Public medical image datasets are difficult to build, yet doctors and researchers need to process existing CT images more efficiently and diagnose diseases more accurately. To improve the performance of CT image analysis networks, several scholars have constructed 3D convolutional neural networks (3D CNNs), which extract rich spatial features and outperform their 2D counterparts. However, the high computational complexity of 3D CNNs restricts network depth, resulting in performance bottlenecks. Recently, DeepLesion, a CT image dataset with multiple lesion types, has enabled the construction of universal networks for lesion detection and segmentation on CT images. The wide variation in lesion scale and type places a large burden on lesion detection and segmentation. To address these problems and improve performance, we propose a model based on deep convolutional networks that performs multi-organ lesion detection and segmentation on CT images, which will help doctors diagnose diseases quickly and accurately. Method The proposed model consists of two main parts. 1) Backbone network. To extract multi-dimensional, multi-scale features, we integrate bidirectional feature pyramid networks and densely connected convolutional networks into the backbone. The model's input is a combination of the CT key slice, which carries the ground-truth annotation, and its neighboring slices, which provide 3D context. Combining the backbone with a feature fusion method enables the 2D network to extract spatial information from the adjacent slices.
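The 2.5D input described above, a key slice stacked with its neighbors, can be sketched as follows. This is a minimal numpy illustration only; the function name `make_slice_group` and the clamped boundary handling are our assumptions, as the paper does not specify a padding scheme:

```python
import numpy as np

def make_slice_group(volume, key_idx, num_neighbors=1):
    """Stack the annotated key slice with its neighbors along the
    channel axis, clamping indices at the volume boundary."""
    depth = volume.shape[0]
    idxs = [min(max(key_idx + o, 0), depth - 1)
            for o in range(-num_neighbors, num_neighbors + 1)]
    return np.stack([volume[i] for i in idxs], axis=0)  # (2k+1, H, W)

# Toy volume of three 4x4 slices.
vol = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
group = make_slice_group(vol, key_idx=0, num_neighbors=1)
print(group.shape)  # (3, 4, 4); the boundary slice is repeated
```

The stacked group is what a 2D backbone consumes as a multi-channel image, which is how spatial context enters the network without 3D convolutions.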
Thus, the network can exploit features from both the adjacent slices and the key slice, and its performance improves by utilizing the 3D context of the CT slices. Moreover, we simplify and fine-tune the network structure so that our model achieves better performance and lower computational complexity than the original architecture. 2) Detection and segmentation branches. To produce high-quality, representative proposals, we feed the features fused with 3D context into the region proposal network. A cascaded R-CNN (region convolutional neural network) with gradually increasing thresholds resamples the generated proposals, and the high-quality proposals are fed into the detection and segmentation branches. For the different lesion scales, we set the anchor ratios to 1:2, 1:1, and 2:1 and the anchor sizes in the region proposal network to 16, 24, 32, 48, and 96. We test cascaded stages with different intersection-over-union (IoU) thresholds of 0.5, 0.6, and 0.7 to find a suitable number of stages. The original region of interest (ROI) pooling is replaced with ROI Align for better performance. Result We validate the network on the DeepLesion dataset, which contains 32 120 CT images with different lesion types. We split the dataset into a training set, a testing set, and a validation set with proportions of 70%, 15%, and 15%, respectively. We train the model with stochastic gradient descent at an initial learning rate of 0.001; the rate drops to 1/10 of its value at the fourth and sixth epochs (eight epochs in total). Four groups of comparative experiments are conducted to explore the effects of different network designs on detection and segmentation performance.
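The anchor settings above (ratios 1:2, 1:1, 2:1; sizes 16, 24, 32, 48, and 96) can be enumerated as in a standard RPN. A hedged numpy sketch, under the common convention that each "size" is the square root of the anchor area (the paper does not state its exact convention):

```python
import numpy as np

def make_anchors(sizes=(16, 24, 32, 48, 96), ratios=(0.5, 1.0, 2.0)):
    """Enumerate (w, h) anchor shapes: each anchor has area size**2
    and aspect ratio h/w equal to the given ratio."""
    anchors = []
    for s in sizes:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append((w, h))
    return np.array(anchors)

a = make_anchors()
print(a.shape)  # (15, 2): 5 sizes x 3 ratios
```

At every feature-map location, the RPN would score and regress one proposal per anchor shape, so smaller sizes in the list are what allow the network to catch small lesions.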
Multiple network structures, including feature pyramid networks (FPN), bidirectional feature pyramid networks (BiFPN), feature fusion, different numbers of cascaded stages, and the segmentation branch, are considered in our experiments. Experimental results show that BiFPN outperforms FPN on the detection task, and detection performance improves greatly with the feature fusion method. As the number of cascaded stages increases, detection accuracy drops slightly while segmentation performance improves substantially. In addition, networks without a segmentation branch detect lesions more accurately than those with one; hence, we observe a trade-off between the detection and segmentation tasks. Different structures can therefore be selected to satisfy distinct requirements on detection or segmentation accuracy. If doctors or researchers want to detect lesions more accurately, the baseline network without a segmentation branch meets the requirement; for more accurate segmentation, the baseline network with three cascaded stages achieves the goal. We report the results of the three-stage cascaded network. The average detection accuracy of our model on the DeepLesion test set is 83.15%, the average distance error between the predicted response evaluation criteria in solid tumors (RECIST) endpoints and the weak ground-truth labels is 1.27 mm, and the average diameter error is 1.69 mm. Our network's segmentation performance is superior to that of the multitask universal lesion analysis network for joint lesion detection, tagging, and segmentation (MULAN) and auto RECIST. The inference time is 91.7 ms per image. Conclusion The proposed model achieves good detection and segmentation performance on CT images with low inference time.
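The endpoint distance error and diameter error reported above could be computed along these lines. This numpy sketch assumes the predicted RECIST endpoints are already paired with the ground-truth ones in corresponding order and that pixel spacing (mm per pixel) is known; both assumptions, like the function name `recist_errors`, are ours:

```python
import numpy as np

def recist_errors(pred_pts, gt_pts, spacing_mm=1.0):
    """Mean endpoint distance and diameter-length error (in mm)
    between a predicted RECIST axis and the ground-truth one.
    Each input is a (2, 2) array of two (x, y) endpoints."""
    pred, gt = np.asarray(pred_pts, float), np.asarray(gt_pts, float)
    end_err = np.linalg.norm(pred - gt, axis=1).mean() * spacing_mm
    diam = lambda p: np.linalg.norm(p[0] - p[1])
    diam_err = abs(diam(pred) - diam(gt)) * spacing_mm
    return end_err, diam_err

e, d = recist_errors([[0, 0], [10, 0]], [[0, 1], [10, 0]])
print(round(e, 2), round(d, 2))  # prints: 0.5 0.05
```

Averaging these two errors over the test set would yield summary numbers of the kind quoted (1.27 mm endpoint error, 1.69 mm diameter error).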
It is suitable for lesion detection and segmentation in CT images whose lesion types are similar to those in the DeepLesion dataset. Our model trained on DeepLesion can help doctors diagnose lesions in multiple organs with the aid of a computer.