Multi-scale bone lesion detection based on attention mechanism and deformable convolution

Fang Cheng; Bai Zhengyao

Published: 2021-09-16
DOI: 10.11834/jig.200476
2021 | Volume 26 | Number 9

Fang Cheng, Bai Zhengyao (School of Information Science and Engineering, Yunnan University, Kunming 650500)

Abstract

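Objective Automatic analysis and detection of bone tissue in computed tomography (CT) images is important for the early diagnosis of orthopedic diseases. Diagnosis by manual analysis, however, is inefficient, and its accuracy and objective consistency cannot be guaranteed. This paper therefore builds a cascaded neural network model for bone lesion detection, with the aim of supporting orthopedic surgeons' diagnoses. Method In the image preprocessing stage, an improved enhancement method is used to enhance the contrast of the CT images and to extract the valid human-body region in each image. Threshold segmentation based on the distribution range of the CT values (Hounsfield unit, HU) of bone tissue then yields the approximate bone tissue region. With a cascaded object detection model as the baseline, an attention mechanism and deformable convolution are combined to increase the global-context relevance of the feature maps and to adapt to bone lesions of varying shape. A feature fusion module promotes the fusion of feature information across scales, and bone lesion training and prediction are performed separately on feature maps at multiple scales. Result Experiments on the DeepLesion dataset show that the proposed network achieves a recall of 0.85, a precision of 0.613, an F1 score of 0.712 and an average precision (AP) of 0.816 for bone lesion detection. Its recall is 0.15 higher than that of the best-performing general CT lesion detection network in the comparison group. Conclusion The proposed network model detects bone lesions in CT images well. It can assist the differential diagnosis of bone lesions, improve diagnostic efficiency and reduce the risk of missed diagnoses.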
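As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The minimal Python sketch below recomputes it from the reported precision (0.613) and recall (0.85). The function name `f1_score` is an illustrative choice and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the proposed network.
reported_precision = 0.613
reported_recall = 0.85

print(round(f1_score(reported_precision, reported_recall), 3))  # 0.712
```

The recomputed value agrees with the reported F1 score of 0.712.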

Keywords

bone lesion detection; multi-scale object detection; attention mechanism; medical image processing; cascaded neural network

Multi-scale bone lesion detection based on attention mechanism and deformable convolution

Fang Cheng, Bai Zhengyao (School of Information Science and Engineering, Yunnan University, Kunming 650500, China)

Abstract

Objective Orthopedic diseases are common and can cause serious harm, so automatic analysis and detection of bone tissue in computed tomography (CT) images has clear clinical value for their early diagnosis. Manual analysis of bone tissue in CT images is inefficient, and the accuracy and objective consistency of the diagnosis cannot be guaranteed. This paper therefore builds a cascaded neural network model for bone lesion detection to support orthopedic surgeons' diagnoses. Method The proposed bone lesion detection algorithm consists of four main steps. First, in the preprocessing stage, the original data are converted into CT values using the conversion formula and related files provided by the National Institutes of Health Clinical Center (NIHCC), USA. The image is then segmented with the mean of the CT values as the threshold, which filters out most of the non-body regions. This segmentation cannot remove the CT bed plate, because the bed plate material has a high CT value, so a morphological opening operation with a rectangular kernel is applied to filter the bed plate out of the image. Based on the characteristics of bone tissue in CT images, a contrast enhancement method based on the Gamma transform is proposed to enhance the contrast of the CT images. Second, the approximate bone tissue region in the enhanced CT images is obtained by thresholding on the distribution range of the Hounsfield unit (HU) values of bone tissue. Third, a cascaded object detection model is adopted as the baseline, and an attention mechanism and deformable convolution are integrated to increase the global-context relevance of the feature maps and to adapt to bone lesions of variable shape. 
Finally, a feature fusion module is used to strengthen the fusion of feature information at different scales, and bone lesion training and prediction are performed on feature maps at multiple scales. Result Four groups of comparative experiments were designed to evaluate the network structure and its modeling capability, with average precision (AP) as the main metric. 1) ResNet50, ResNet101 and ResNeXt101 were each used as the feature extraction network of a naive Cascade R-CNN (region-based convolutional neural network) model, which was trained and tested to establish the baseline. ResNeXt101 performed best, with an AP of 0.543, and was taken as the baseline. 2) On top of this Cascade R-CNN baseline, feature pyramid networks (FPN), path aggregation feature pyramid networks (PAFPN), neural architecture search feature pyramid networks (NAS-FPN) and the naive structure were each trained and tested to select the best feature fusion module. PAFPN performed best, raising the AP to 0.721, and was selected as the feature fusion module. 3) Two sets of comparative experiments were carried out. First, batch normalization (BN) and group normalization (GN) modules were each used in the R-CNN head for training and testing; GN outperformed BN, with an AP of 0.723. Next, an attention block and a deformable convolution block were embedded in the model to verify their effectiveness; the results confirmed that both modules are effective, and the AP improved to 0.816. 4) The trained model was compared with other object detection networks on the test results of each model. All experiments were conducted on the DeepLesion dataset. 
The proposed model achieves a recall of 0.85, a precision of 0.613, an F1-score of 0.712 and an AP of 0.816. This is a significant improvement over the existing universal CT lesion detection models, whose recall rates are 0.574 and 0.70. Conclusion In the image preprocessing stage, HU threshold segmentation and morphological operations filter out most of the non-bone regions in the CT image, and contrast enhancement further highlights the bone tissue. This speeds up model training and reduces the interference of noise. The second group of experiments shows that embedding the multi-scale feature pyramid fusion module in the network improves the fusion of low-level and high-level feature information, enhancing both the location information of high-level features and the semantic information of low-level features, and that detection performance improves significantly as a result. The third group of experiments shows that the attention mechanism strengthens the network's adaptability and reduces the weight of irrelevant information, and that the deformable convolution module further helps the network adapt through convolution kernels of varied shapes and sizes. The fourth group of experiments compares our model with other object detection models on recall, precision, F1-score and AP. The results demonstrate that the model detects CT bone lesions well: it can effectively assist the differential diagnosis of bone lesions, improve diagnostic efficiency, reduce missed diagnoses and support timely diagnosis and treatment. 
Real-time detection capability can be improved further by reducing the number of model parameters and the time needed for training and inference.
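The preprocessing chain described in the Method part (conversion to CT values, mean-value thresholding, opening with a rectangular kernel, Gamma-based contrast enhancement, and HU thresholding for bone) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 32768 offset follows the convention commonly documented for DeepLesion 16-bit PNG images, and the kernel size, Gamma value, display window and bone HU range are hypothetical values chosen only for the example. All function names are likewise illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

HU_OFFSET = 32768             # assumption: DeepLesion-style 16-bit PNG offset
HU_WINDOW = (-1000, 2000)     # assumption: window used for normalization
BONE_HU_RANGE = (150, 2000)   # assumption: illustrative bone HU range
GAMMA = 0.8                   # assumption: illustrative Gamma value


def to_hu(raw):
    """Convert stored 16-bit intensities to CT values (HU)."""
    return raw.astype(np.int32) - HU_OFFSET


def _windows(mask, kh, kw):
    """All kh x kw neighborhoods of a boolean mask (odd kernel sizes)."""
    padded = np.pad(mask, ((kh // 2, kh // 2), (kw // 2, kw // 2)),
                    mode="constant", constant_values=False)
    return sliding_window_view(padded, (kh, kw))


def opening(mask, kh, kw):
    """Morphological opening with a rectangular kernel: erode, then dilate."""
    eroded = _windows(mask, kh, kw).all(axis=(2, 3))
    return _windows(eroded, kh, kw).any(axis=(2, 3))


def gamma_enhance(hu, gamma=GAMMA):
    """Normalize HU to [0, 1] inside the window, then apply a Gamma transform."""
    lo, hi = HU_WINDOW
    norm = np.clip((hu - lo) / float(hi - lo), 0.0, 1.0)
    return norm ** gamma


def preprocess(raw, kernel=(5, 5)):
    hu = to_hu(raw)
    # Mean-value threshold removes most non-body pixels (mainly air).
    body = hu > hu.mean()
    # Opening removes thin high-HU structures such as the bed plate.
    body = opening(body, *kernel)
    enhanced = gamma_enhance(hu) * body
    # Approximate bone region: HU inside the assumed bone range, within the body.
    bone = body & (hu >= BONE_HU_RANGE[0]) & (hu <= BONE_HU_RANGE[1])
    return enhanced, body, bone
```

On a synthetic slice with an air background, a soft-tissue rectangle, a small high-HU block and a one-pixel-thick bed plate line, `preprocess` keeps the rectangle as the body mask, drops the bed plate line, and marks only the high-HU block as bone.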

Keywords

bone lesion detection; multi-scale object detection; attention mechanism; medical image processing; Cascade R-CNN
