TransAS-UNet: A Breast Cancer Region Segmentation Algorithm Fusing Swin Transformer and UNet
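XU Wang-wang1, XU Liang-feng1, LI Bo-kai1, ZHOU Xi1, LV Na2, ZHAN Shu1 (1. Hefei University of Technology; 2. The First Affiliated Hospital of Anhui Medical University)
Abstract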
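Objective Breast cancer is a serious disease with a high incidence among women, and early detection remains an important problem worldwide. Current diagnostic methods include clinical examination, imaging examination, and histopathological examination. Common imaging modalities are X-ray, CT, and magnetic resonance imaging. Mammography is already used to detect early-stage cancer, but manually segmenting masses from local mammograms is very time-consuming and error-prone. An integrated computer-aided diagnosis (CAD) system is therefore needed to help radiologists identify breast masses automatically and precisely.

Method Building on deep learning image segmentation frameworks, this work compares several segmentation models. Within the UNet structure, the Swin architecture replaces the downsampling and upsampling stages of the segmentation task so that local and global features interact. A transformer then replaces the skip connections, capturing more global information and features from different levels to achieve multi-scale feature fusion and hence precise segmentation. Alongside the segmentation model, a Multi-Attention ResNet classification network grades the cancer regions to better support diagnosis and treatment.

Result On the INbreast mammography dataset, the proposed model segments masses accurately, reaching an IoU of 95.58% and a Dice coefficient of 93.45%, 4% to 6% higher than the other segmentation models. Classifying the binarized segmentation images into four categories gives an accuracy of 95.24%.

Conclusion Experiments show that the proposed TransAS-UNet image segmentation method performs well and has clinical significance, outperforming other 2D medical image segmentation methods.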
Keywords
TransAS-UNet: Regional segmentation of breast cancer by fusing Swin Transformer and UNet
XU Wang-wang1, XU Liang-feng1, LI Bo-kai1, ZHOU Xi1, LV Na2, ZHAN Shu1 (1. Hefei University of Technology; 2. Anhui Water Conservancy and Electric Power Technical College)
Abstract
Objective Breast cancer is a serious disease with a high incidence among women, and its early detection is an important problem worldwide. Current diagnostic methods include clinical examination, imaging examination, and histopathological examination. Common imaging modalities are X-ray, CT, and magnetic resonance imaging. Mammograms are already used to detect early cancers, but manually segmenting masses from local mammograms is time-consuming and error-prone. An integrated computer-aided diagnosis (CAD) system is therefore needed to help radiologists identify breast masses automatically and precisely.

Method We compare several image segmentation models built on deep learning segmentation frameworks. Within the UNet structure, we adopt the Swin architecture in place of the downsampling and upsampling stages so that local and global features interact. We then use a transformer in place of the short (skip) connections to capture more global information and features from different levels, achieving multi-scale feature fusion and accurate segmentation. Alongside the segmentation model, a Multi-Attention ResNet classification network grades the cancer regions to better support diagnosis and treatment. In the segmentation stage, Swin Transformer and ASPP modules replace the ordinary convolution layers of the UNet-style model. Shifted windows and multi-head attention integrate feature information within each image patch and extract complementary information between non-adjacent regions, while the ASPP structure lets the model attend to local information over a progressively larger receptive field.
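The way ASPP enlarges the receptive field can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the single-channel input, the 3×3 kernels, the dilation rates (1, 2, 3), and the plain averaging used to fuse the branches are all illustrative assumptions, and the helper names are hypothetical. A trained ASPP would use learned multi-channel kernels and a learned fusion layer.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Zero-padded, same-size dilated (atrous) convolution of a 2D map.

    Like deep learning frameworks, this computes a cross-correlation.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input `rate` pixels apart.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def effective_kernel_size(k, rate):
    """Side length of the area covered by a k x k kernel at a given dilation rate."""
    return k + (k - 1) * (rate - 1)

def aspp(x, kernels, rates):
    """Run parallel dilated branches and fuse them (assumption: simple mean)."""
    branches = [dilated_conv2d(x, kern, r) for kern, r in zip(kernels, rates)]
    return np.mean(branches, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(16, 16))
    rates = (1, 2, 3)  # illustrative rates, not values from this work
    kernels = [rng.normal(size=(3, 3)) for _ in rates]
    fused = aspp(feature_map, kernels, rates)
    print(fused.shape, [effective_kernel_size(3, r) for r in rates])
```

With a fixed 3×3 kernel, each branch keeps the same number of weights while covering a wider area as the rate grows, which is why the parallel branches see context at several scales.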
A Transformer structure is introduced to correlate information between layers, preventing important shallow-layer information from being lost during downsampling. The final architecture inherits the transformer's strength in learning global semantic associations and uses features from different levels to preserve more semantics and detail. The binarized images produced by the segmentation model serve as the input to the classification network, which identifies the category of each breast tumor. The classification model builds on ResNet50 and adds several types of attention modules together with measures against overfitting. SE and SK attention refine the network so that it concentrates on the differences between segmented regions, which improves the efficiency of the model. The proposed model segments masses accurately on the INbreast mammography dataset, and we compare it with five segmentation structures: UNet, UNet++, Res18_UNet, MultiRes_UNet, and Dense_UNet. The segmentation model yields a more accurate binary map of the cancer region. Up- and downsampling in the plain UNet structure suffers from problems such as the mixing of feature information from different levels and the purely local focus of convolutional layers. We therefore adopt the Swin Transformer structure, which has a shifted window operation and a hierarchical design. It consists mainly of a window attention module and a shifted window attention module: the input feature map is split into multiple windows, each window is weighted by self-attention, and the window positions are then shifted across the whole feature map. This enables information interaction within the same feature map.
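The window splitting and shifting described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the code of this work: the feature map is assumed to be an (H, W, C) array whose height and width are divisible by the window size, the shift is assumed to be half a window, and the self-attention computation and its boundary masking are omitted. The function names are hypothetical.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows."""
    h, w, c = x.shape
    x = x.reshape(h // ws, ws, w // ws, ws, c)
    # Bring the two window-grid axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, c)

def window_reverse(windows, ws, h, w):
    """Inverse of window_partition: stitch windows back into an (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // ws, w // ws, ws, ws, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

def shifted_windows(x, ws):
    """Cyclically shift the map by half a window, then partition it.

    After the shift, each window straddles what were four neighbouring
    windows, so attention computed per window mixes information across
    the previous window borders.
    """
    shift = ws // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, ws)

if __name__ == "__main__":
    fmap = np.arange(8 * 8, dtype=float).reshape(8, 8, 1)
    regular = window_partition(fmap, 4)
    shifted = shifted_windows(fmap, 4)
    print(regular.shape, shifted.shape)
```

Alternating regular and shifted partitions between successive blocks is what lets information move between windows while attention stays local to each window.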
We use four Swin Transformer structures in the up- and downsampling paths. During patch fusion, the pyramid-structured ASPP replaces the usual channel-wise addition of feature maps: it convolves the feature map with several kernels and fuses the channels, sampling the input in parallel with dilated (atrous) convolutions at different rates to capture image context at multiple scales. To better integrate high- and low-dimensional spatial information, we propose a new multi-scale feature map fusion strategy and use a Transformer with skip connections to strengthen the representation of spatial-domain information. Following the description of the INbreast dataset, each image is classified as normal, mass, deformation, or calcification; each category is labeled and then sent to the classification network. The classification model takes ResNet50 as its baseline and adds two kinds of attention, SE and SK. SK convolution replaces the 3×3 convolution in every Bottleneck block so that the convolutional layers extract more image features, while SE is a channel attention that weights each channel before the output. Three techniques, Gaussian error gradient descent, label smoothing, and partial data augmentation, are introduced to improve the accuracy of the model.

Result Under the same parameter settings, the IoU reached 95.58% and the Dice coefficient was 93.45%, 4% to 6% higher than the other segmentation models. Classifying the binarized segmentation images into four categories gave an accuracy of 95.24%.

Conclusion Experiments show that the proposed TransAS-UNet image segmentation method performs well and has clinical significance, outperforming other 2D medical image segmentation methods.
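For reference, the two segmentation metrics reported above, IoU and the Dice coefficient, can be computed from a predicted and a ground-truth binary mask as in the sketch below. The function name, the small smoothing constant `eps`, and the toy masks are assumptions made for illustration; the evaluation code of this work is not shown in the abstract.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    eps is an assumed smoothing constant that avoids division by zero
    when both masks are empty.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)

if __name__ == "__main__":
    # Toy masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    dice, iou = dice_and_iou(pred, target)
    print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For the toy masks the intersection is 2 pixels and the union is 4, so the Dice coefficient is about 0.667 and the IoU is about 0.5.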
Keywords