Multi-branch deep feature fusion for traditional Chinese medicine aided diagnosis of stroke

Wang Qida; Ji Lunwen; Qiang Yan; Wang Huahu; Zhao Chenqi; Li Huizhi; Zhao Zijuan

Published: 2022-03-19
DOI: 10.11834/jig.210744
2022 | Volume 27 | Number 3

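Wang Qida¹, Ji Lunwen², Qiang Yan¹, Wang Huahu³, Zhao Chenqi¹, Li Huizhi⁴, Zhao Zijuan¹ (1. College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600; 2. The Journal Center, Taiyuan University of Technology, Taiyuan 030024; 3. Guanghua School of Management, Peking University, Beijing 100871; 4. Shanxi Huihu Health Science and Technology Company with Limited Liability, Taiyuan 030032)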

Abstract

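Objective Stroke gives few warning signs, progresses quickly, and has a high fatality rate. Current medical practice still concentrates on diagnosis and treatment during and after a stroke, and lacks effective means of predicting one beforehand. Traditional Chinese medicine (TCM) is notably effective at preventive treatment (treating disease before it arises), and inspection is one of its principal diagnostic methods. Drawing on TCM inspection, this paper proposes a multi-branch deep feature fusion method for TCM-aided stroke diagnosis based on facial and hand images. Method For the images of each body part, a dual-branch feature extraction module is built, in which the key region of the face or hand feeds a main branch that extracts the primary features. Following the way TCM inspection examines the face and hands, the texture of the area between the eyebrows and the color of the palm are further used as auxiliary information from which auxiliary features are extracted. On this basis, an information interaction module (IIM) is proposed that exchanges information between the primary and auxiliary features, helping the main branch extract more discriminative information. Finally, the features of the two body parts are fused and reduced in dimension for aided stroke diagnosis. Result The 3 011 collected face and hand images were screened and augmented to form the experimental data set, and the method was compared with current mainstream classification models under several evaluation metrics. The proposed method reaches an accuracy of 83.36%, which is 3%~7% higher than mainstream classification models such as ResNet-34, DenseNet121, VGG16 (Visual Geometry Group 16-layer net), and InceptionV3. Its specificity and sensitivity are 82.47% and 85.10%, respectively, better than those of the compared methods. Conclusion The proposed method effectively incorporates the diagnostic experience of TCM inspection and predicts stroke from conventional face and hand images, supporting the development of objective and convenient TCM inspection for stroke.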

Keywords

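inspection diagnosis of traditional Chinese medicine; stroke; image identification; feature extraction; feature fusion; convolutional neural network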

Multi-branch deep feature fusion method for traditional Chinese medicine intervened human cerebral stroke aided diagnosis

Wang Qida¹, Ji Lunwen², Qiang Yan¹, Wang Huahu³, Zhao Chenqi¹, Li Huizhi⁴, Zhao Zijuan¹ (1. College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China; 2. The Journal Center, Taiyuan University of Technology, Taiyuan 030024, China; 3. Guanghua School of Management, Peking University, Beijing 100871, China; 4. Shanxi Huihu Health Science and Technology Company with Limited Liability, Taiyuan 030032, China)

Abstract

Objective Stroke is a severe cerebrovascular disease in which brain tissue is damaged when a blood vessel ruptures suddenly or a vascular obstruction cuts off blood flow. Ischemic stroke has a high incidence, a high recurrence rate, and a high fatality rate. Traditional Chinese medicine (TCM) has particular strengths in dealing with stroke. Of the four diagnostic methods of TCM (inspection, listening, asking, and feeling the pulse), inspection comes first, which makes the extraction of visual information an essential part of TCM practice. However, TCM is constrained by a lack of standardization and by its dependence on the individual practitioner. Deep learning can help address these constraints. Method We propose a dual-branch cross-attention feature fusion model (DCFFM) based on facial and hand images, which can effectively assist in predicting stroke. The overall model has three parts: a facial feature extraction module, a hand feature extraction module, and a feature fusion module. In the facial feature extraction module, a main branch and an auxiliary information branch extract the facial features. Under the guidance of TCM doctors, we pre-process the facial image and use the key diagnostic area for stroke on the face as the input of the main branch. Drawing on TCM inspection knowledge, we also crop the area around the eyebrows and apply a Sobel filter to obtain a gradient image, which serves as the input of the auxiliary information branch. The hand feature extraction module adopts the same dual-branch structure, with the cropped palm area as the input of the main branch.
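As a rough illustration of the auxiliary facial input described above, the following minimal sketch computes a Sobel gradient-magnitude image in plain Python. The function name `sobel_gradient`, the zero-valued border, and the toy patch are assumptions made for illustration; the abstract does not state the kernel size, border handling, or library the authors used.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(gray):
    """Return the gradient-magnitude image of a 2-D grayscale image.

    `gray` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 (an assumption; the source does not describe border handling).
    The kernels are applied as a correlation; flipping them would only change
    the sign of each component, not the magnitude.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = gray[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical 5x5 patch with a vertical edge between columns 1 and 2.
patch = [[0, 0, 10, 10, 10] for _ in range(5)]
grad = sobel_gradient(patch)
```

In this toy patch the response is non-zero only in the two columns that straddle the edge, which is the kind of texture cue (for example, lines between the eyebrows) that a gradient image makes more prominent than the raw pixels do.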
To reflect the pathological condition of the hand, including small changes in its appearance, more stably and accurately, we convert the palm image from the RGB color space to the HSV color space and feed it to the auxiliary information branch of the hand module. Each branch is fed to its own convolution blocks, which extract deep features from the input data by convolution. Max pooling is applied to the feature maps, and batch normalization is used to prevent the model from over-fitting. In addition, we use two loss functions to constrain the training of the two feature extraction modules, and a total loss to constrain the entire model. Between the two branches of each feature extraction module, we build an information interaction module (IIM) that exchanges information between the branches and helps the model extract distinctive features. The IIM assigns a weight to the feature map of the auxiliary information and concatenates the weighted map with the features of the main branch; a 1×1 convolution then fuses them and reduces the dimensionality. The IIM requires no special operations and can be trained end to end. The feature fusion module takes the deep features produced by the facial and hand feature extraction modules, fuses and reduces them through multiple convolutional layers, and generates the prediction result. Result To aid model training and improve the stability and robustness of the model, we screen and augment the 3 011 collected face and hand images. We remove some images that show scars, peeling skin, disability, or background clutter, and expand the data by horizontally flipping randomly selected images from the remainder.
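The IIM steps described in the Method part above (weight the auxiliary feature map, join it with the main-branch features, fuse with a 1×1 convolution) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a single scalar weight stands in for whatever weighting the paper actually learns, the join is assumed to be channel-wise concatenation, and all function names and toy values are hypothetical.

```python
def scale_channels(features, weight):
    """Multiply every value of a (channels x H x W) nested-list feature map by a scalar."""
    return [[[weight * v for v in row] for row in channel] for channel in features]


def conv1x1(features, kernel, bias):
    """1x1 convolution: a per-pixel linear map across channels.

    kernel[k][c] is the weight from input channel c to output channel k,
    so using fewer output channels than input channels reduces dimensionality.
    """
    height, width = len(features[0]), len(features[0][0])
    in_channels = len(features)
    return [
        [
            [
                bias[k] + sum(kernel[k][c] * features[c][i][j]
                              for c in range(in_channels))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for k in range(len(kernel))
    ]


def information_interaction(main, aux, aux_weight, kernel, bias):
    """Weight the auxiliary features, concatenate with main features, fuse with a 1x1 conv."""
    weighted_aux = scale_channels(aux, aux_weight)
    stacked = main + weighted_aux  # channel-wise concatenation (assumed)
    return conv1x1(stacked, kernel, bias)


# Toy example: 1 main channel and 1 auxiliary channel on a 2x2 map, fused to 1 channel.
main = [[[1.0, 2.0], [3.0, 4.0]]]
aux = [[[10.0, 20.0], [30.0, 40.0]]]
fused = information_interaction(main, aux, aux_weight=0.5,
                                kernel=[[1.0, 1.0]], bias=[0.0])
```

In a trained network the scalar weight, the kernel, and the bias would all be learned parameters; fixed values are used here only so the arithmetic can be checked by hand.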
The final experimental data set contains 3 964 images of positive and negative samples. We run multiple sets of comparative and ablation experiments and verify the performance of the model with several evaluation metrics, including accuracy, specificity, sensitivity, and F1-score. First, we compare the overall performance of the proposed method with current mainstream classification algorithms. The accuracy of the proposed method reaches 83.36%, which is 3%-7% higher than that of other mainstream classification models. Under ten-fold cross-validation, the specificity and sensitivity reach 82.47% and 85.10%, respectively. The advantage in sensitivity is relatively large, indicating that the method is better at detecting true positives. Next, we verify the effect of the facial feature extraction module, the hand feature extraction module, and the IIM on model performance. The results show that extracting features from face data and hand data together effectively improves the performance of the model, and that the IIM specifically improves its sensitivity and specificity. Conclusion Our method can use facial and hand images to assist in stroke prediction, with good stability and robustness. The proposed IIM also promotes information interaction between the branches of the multi-branch model.
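For reference, the four evaluation metrics named in the Result part can be computed from a binary confusion matrix as in the sketch below. The function name `binary_metrics`, the label convention (1 = stroke, 0 = non-stroke), and the toy labels are assumptions for illustration; they are not data from the paper.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1-score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sensitivity = _ratio(tp, tp + fn)   # share of actual positives detected
    specificity = _ratio(tn, tn + fp)   # share of actual negatives detected
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels: 4 positive and 4 negative samples, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Sensitivity is the quantity the abstract ties to the detection of true positives, so a method can lead on sensitivity while its specificity is lower, as in the reported 85.10% versus 82.47%.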

Keywords

inspection diagnosis of traditional Chinese medicine; stroke; image identification; feature extraction; feature fusion; convolutional neural network