Automatic PET tumor segmentation by an improved U-Net model with a pre-trained encoder

He Hui, Chen Sheng (School of Optical Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Abstract
Objective Accurate PET (positron emission tomography) tumor segmentation is clinically essential for radiotherapy planning and for evaluating treatment response. Because PET images suffer from a low signal-to-noise ratio and limited spatial resolution, we propose an automatic tumor segmentation method based on a deep convolutional U-Net with a pre-trained encoder. Method The encoder of the model is replaced with a VGG19 encoder pre-trained on ImageNet; a loss function based on the Jaccard distance is introduced to eliminate the need for sample re-weighting; and DropBlock is introduced in place of conventional regularization to effectively avoid overfitting. Result The PET database contains 1 309 images, for which an expert radiologist provided the tumor mask, the tumor contour, and a Gaussian-smoothed contour as the gold standard. Experimental results show that the proposed method achieves high tumor-segmentation performance on PET images: the Dice coefficient, Hausdorff distance, Jaccard index, sensitivity, and positive predictive value are 0.862, 1.735, 0.769, 0.894, and 0.899, respectively. Finally, a 3D visualization of the segmentation results is presented; compared with the 3D visualization of the gold standard, the segmented result reaches 88.5% of the gold standard, making accurate automatic identification and serial measurement of tumor volumes in PET images possible. Conclusion The proposed method helps achieve more accurate, stable, and fast tumor segmentation.
Automatic tumor segmentation in PET by deep convolutional U-Net with pre-trained encoder

He Hui, Chen Sheng(School of Optical Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Abstract
Objective Positron emission tomography (PET) is a crucial technique for patient management in neurology, oncology, and cardiology. In clinical oncology in particular, fluorodeoxyglucose (FDG) PET is widely applied in diagnosis, staging, radiotherapy planning, therapy monitoring, and follow-up. Adaptive radiation therapy aims to tailor treatment to individual patients and target tumors by re-optimizing the treatment plan as early as possible, and PET greatly benefits this process. Manual delineation is time-consuming and highly observer-dependent. Previous studies have shown that automatic computer-generated segmentations are more reproducible than manual delineations, especially for radiomic analysis. Therefore, automatic and accurate tumor delineation is in high demand for the subsequent determination of therapeutic options and the achievement of an improved prognosis. Over the past decade, dozens of methods have been proposed, drawing on multiple image segmentation approaches or combinations of them, including thresholding, region-based, contour-based, and graph-based methods as well as clustering, statistical techniques, and machine learning. However, these methods depend on hand-crafted features and have a limited capability to represent image content. For medical image segmentation, convolutional neural networks (CNNs) have demonstrated competitive performance. Nevertheless, many CNN methods perform region-based classification: the input image is divided into small regions, and the CNN predicts whether each region belongs to the target (foreground). Because each region represents only a partial area of the image, such an algorithm exploits only the limited contextual information contained within that region.
The U-Net, widely considered an optimal segmentation network for medical imaging, is trained end to end. It comprises a contracting path and an expanding path built from convolutional, up-sampling, and pooling layers. This architecture has proven highly effective at solving segmentation problems with limited amounts of data. Motivated by recent advances in deep learning, we develop an automatic tumor segmentation method based on a deep convolutional U-Net with a pre-trained encoder. Method In this paper, we present a fully automatic method for tumor segmentation using a 14-layer U-Net model whose encoder consists of two blocks of a VGG19 network pre-trained on ImageNet. The pre-trained VGG19 encoder contains 260 160 trainable parameters; the rest of our network consists of 14 layers with 1 605 961 trainable parameters. We fix the stride at 2. We propose a three-step strategy to ensure effective and efficient learning with limited training data. First, we use the first two blocks of VGG19 as the contracting path and apply a rectified linear unit (ReLU) after each convolutional layer as the activation function. In the symmetric expanding path, we place a ReLU and batch normalization after each convolutional layer. The loss of boundary pixels in each convolutional layer necessitates cropping. In the last layer, a 1×1 convolution maps each 64-channel feature vector to a value expressing the probability that the corresponding input pixel lies within a target tumor. Second, a tumor occupies only a small portion of an entire PET image, so pixel-wise classification tends to be biased toward the background, with a high probability of partially segmenting or entirely missing tumors.
A loss function based on the Jaccard distance is applied to eliminate the need for sample re-weighting, a typical procedure when cross-entropy is used as the loss function for image segmentation because of the strong imbalance between the numbers of foreground and background pixels. Third, we introduce the DropBlock technique in place of standard dropout regularization because it helps the U-Net avoid overfitting more effectively. DropBlock is a structured form of dropout in which units in a contiguous region of a feature map are dropped together. Applying DropBlock in the skip connections, in addition to the convolutional layers, further increases accuracy. Result A database containing 1 309 PET images is used to train and test the proposed segmentation model. We split the database into a before-radiotherapy (BR) sub-database and an after-radiotherapy (AR) sub-database. The mask, contour, and smoothed contour of each tumor, provided by an expert radiologist, serve as the ground truth for training the proposed model. Experimental results on the BR sub-database show that our method achieves high tumor-segmentation performance on PET images: the Dice coefficient (DI), Hausdorff distance, Jaccard index, sensitivity (SE), and positive predictive value (PPV) are 0.862, 1.735, 0.769, 0.894, and 0.899, respectively. In the test stage, processing the testing dataset of the BR sub-database takes an average of 1.39 s, which meets clinical real-time requirements. We then fine-tune the weights of the model selected on the BR sub-database by training the network further with the AR sub-database. Experimental results indicate a good segmentation performance, with a DI of 0.852, an SE of 0.840, and a PPV of 0.893; compared with the traditional U-Net, these metrics increase by 5.9%, 15.1%, and 1.9%, respectively.
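The Jaccard-distance loss above can be written as one minus the soft Jaccard index between the predicted probability map and the binary ground-truth mask. A minimal NumPy sketch follows; the function name and the smoothing term `eps` (added to avoid division by zero) are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def jaccard_distance_loss(pred, target, eps=1e-7):
    """1 - soft Jaccard index between a predicted probability map and a
    binary ground-truth mask of the same shape. Needs no foreground /
    background re-weighting, unlike plain cross-entropy."""
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

A perfect prediction yields a loss near 0, while predicting all background against a non-empty mask yields a loss near 1, regardless of how small the tumor is relative to the image.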
Finally, the volumes of the segmented tumors in the PET images are presented, enabling the accurate automated identification and serial measurement of tumor volumes in PET images. Conclusion This study uses a 14-layer U-Net architecture with a pre-trained VGG19 encoder for tumor segmentation in PET images. We demonstrate how to improve the performance of the U-Net by fine-tuning a pre-trained encoder to initialize the network weights. Although fine-tuning has been widely applied to image classification tasks, it has rarely been applied to U-Net-like architectures for medical image segmentation. We use the Jaccard distance as the loss function to improve segmentation performance. Overall, the results show that our approach suits various tumors with minimal post-processing and no pre-processing. We believe that this method could generalize effectively to other medical image segmentation tasks.
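For reference, the overlap metrics reported above can all be derived from the confusion counts of two binary masks. The following helper is a plain NumPy sketch under our own naming; the paper does not specify its evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Dice coefficient (DI), Jaccard index (JI), sensitivity (SE), and
    positive predictive value (PPV) between two binary masks."""
    pred = pred.astype(bool).ravel()
    truth = truth.astype(bool).ravel()
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    return {
        "DI": 2 * tp / (2 * tp + fp + fn),
        "JI": tp / (tp + fp + fn),
        "SE": tp / (tp + fn),    # sensitivity (recall)
        "PPV": tp / (tp + fp),   # positive predictive value (precision)
    }
```

Note that DI and JI are monotonically related (DI = 2·JI / (1 + JI)), which is consistent with the paper reporting 0.862 and 0.769 together.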