Published: 2018-06-16 | DOI: 10.11834/jig.170556 | 2018, Volume 23, Number 6 | Medical Image Processing

1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China;
2. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China
Received: 2017-10-26; revised: 2017-12-05. Supported by: National Natural Science Foundation of China (61272373, 61202254, 71303031); Natural Science Foundation of Liaoning Province, China (201602195, DC201502030202); Fundamental Research Funds for the Central Universities (DC13010313, DC201502030202); Doctoral Scientific Research Startup Foundation of Liaoning Province (201601084). First author: Yu Yuhai (1980- ), lecturer, Ph.D. candidate in computer software and theory at Dalian University of Technology (as of 2018); his research interests include convolutional neural networks and biomedical image processing. E-mail: yuyh@dlnu.edu.cn. CLC number: TP391.6; document code: A; article ID: 1006-8961(2018)06-0917-00


Classification modeling and recognition for cross modal and multi-label biomedical image
Yu Yuhai1, Lin Hongfei1, Meng Jiana2, Guo Hai2, Zhao Zhehuan1
1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China;
2. School of Computer Science & Engineering, Dalian Minzu University, Dalian 116600, China
Supported by: National Natural Science Foundation of China (61272373, 61202254, 71303031); Natural Science Foundation of Liaoning Province, China (201602195, DC201502030202); Fundamental Research Funds for the Central Universities (DC13010313, DC201502030202); Doctoral Scientific Research Startup Foundation of Liaoning Province (201601084)

Abstract

Objective The amount of biomedical literature in electronic format has increased considerably with the development of the Internet. PubMed comprises more than 27 million citations for biomedical literature, linking to full-text content in PubMed Central and on publisher web sites. The figures in these biomedical studies can be retrieved through search tools along with the full text. However, the lack of associated metadata beyond the captions prevents biomedical researchers and educators from satisfying richer information needs. The modality of a figure is an extremely useful type of metadata. Therefore, biomedical modality classification is an important first step that helps users access the biomedical images they require and further improves the performance of literature retrieval systems. Many images in the biomedical literature (more than 40%) are compound figures comprising several subfigures with various biomedical modalities, such as computerized tomography, X-ray, or generic biomedical illustrations. The subfigures in one compound figure may describe one medical problem from several views and have strong semantic correlation with each other; such figures are therefore valuable to biomedical research and education. The standard approach to modality recognition from biomedical compound figures first detects whether a figure is compound. If it is, a figure separation algorithm is invoked to split it into its constituent subfigures, and a multi-class classifier then predicts the modality of each subfigure. However, figure separation algorithms are imperfect, and their errors propagate to the multi-class modality classifier. Recently, some multi-label learning models have used pre-trained convolutional neural networks to extract high-level features and recognize image modalities directly from compound figures.
These deep learning methods learn more expressive representations of image data. However, convolutional neural networks may fail to disentangle the factors of variation when trained on limited samples with high variability and an imbalanced label distribution. A new cross-modal multi-label classification model using convolutional neural networks based on hybrid transfer learning is presented to learn biomedical modality information from a compound figure without separating it into subfigures. Method An end-to-end training and multi-label classification method that requires no additional classifiers is proposed. Two convolutional neural networks are built to learn the modality components of an image, not from single separated subfigures but from labeled compound figures and their captions. The proposed cross-modal model learns general-domain features from large-scale natural images and more specialized biomedical-domain features from the simple figures and their captions in the biomedical literature, leveraging both heterogeneous and homogeneous transfer learning. Specifically, the proposed visual convolutional neural network (CNN) is pre-trained on a large auxiliary dataset containing approximately 1.2 million labeled training images of 1 000 classes. Then, the top layer of the deep CNN is trained from scratch on single-label simple biomedical figures to achieve homogeneous transfer learning. The key step of this transfer learning is fine-tuning the pre-trained deep visual models on the current multi-label compound figure dataset; the architecture of the deep visual models is changed slightly so that they can be fine-tuned on this dataset. Meanwhile, the weights of the embedding layer are initialized with word vectors pre-trained on captions extracted from 300 000 biomedical articles in PubMed and are updated while training the networks.
Following the homogeneous transfer learning strategy of the visual model, the proposed textual convolutional neural network is first pre-trained on the captions of simple biomedical figures. The pre-trained textual model is then fine-tuned on the current multi-label compound figures to capture more biomedical features. Finally, the cross-modal multi-label learning model combines the outputs of the visual and textual models to predict labels using a multi-stage fusion strategy. Result The proposed cross-modal multi-label classification model based on hybrid transfer learning is evaluated on the dataset of the multi-label classification task in ImageCLEF2016. Following the evaluation criteria of the benchmark, our approach is assessed with the multi-label Hamming Loss and Macro F1 Score. The two comparative models learn multi-label information only from visual content: they pre-train AlexNet on large-scale natural images, extract DeCAF features from the pre-trained AlexNet, and feed them into an SVM classifier with a linear kernel. One comparative model predicts modalities by the highest SVM score and the other by the highest posterior probability. By introducing the homogeneous transfer learning technique, the visual model achieves a 33.9% lower Hamming Loss and a 100.3% higher Macro F1 Score, and the textual model likewise improves both metrics. As a result, the proposed cross-modal model achieves a Hamming Loss of 0.0157, similar to that of the state-of-the-art model, and obtains a 52.5% higher Macro F1 Score, increased from 0.320 to 0.488. Conclusion A new method for extracting biomedical modalities from compound figures is proposed. The proposed models obtain more competitive results than the other methods reported in the literature, and the cross-modal model exhibits acceptable generalization capability.
The results imply that the homogeneous transfer learning method helps deep convolutional neural networks (DCNNs) capture more biomedical-domain features and improves multi-label classification performance. The proposed cross-modal model mitigates overfitting and the imbalanced dataset, and it effectively recognizes modalities of biomedical compound figures from visual content and textual information. In the future, building deeper DCNNs and training the networks with new techniques could further improve the proposed method.
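The hybrid transfer strategy described above (heterogeneous transfer from ImageNet-scale pre-training, then homogeneous transfer by retraining a new multi-label top layer) can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the tiny stand-in network, layer sizes, and all names are assumptions, and in practice the pre-trained AlexNet/VGG-scale weights would be loaded before fine-tuning.

```python
import torch
import torch.nn as nn

# Stand-in for a deep CNN pre-trained on ImageNet; the paper uses much larger
# networks, so this tiny model is purely illustrative.
class TinyCNN(nn.Module):
    def __init__(self, n_out):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(8, n_out)

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyCNN(n_out=1000)   # heterogeneous transfer: a 1000-class ImageNet head
# ... pre-trained ImageNet weights would be loaded here ...

# Homogeneous transfer: replace the top layer with a new head for the
# 30 biomedical modality labels, trained from scratch.
model.fc = nn.Linear(8, 30)
for p in model.features.parameters():
    p.requires_grad = False   # freeze transferred features; fine-tune the head

logits = model(torch.randn(2, 3, 224, 224))   # dummy mini-batch of 2 figures
probs = torch.sigmoid(logits)                 # independent per-label probabilities
```

Because the head is a sigmoid layer rather than a softmax, each of the 30 labels is scored independently, which is what allows one compound figure to receive several modality labels at once.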

Key words

multi-label learning; convolutional neural network; transfer learning; medical image; deep learning

2.1 Multi-label classification

 $\boldsymbol{T} = \left\{ \left( x_1, \boldsymbol{Y}_1 \right), \left( x_2, \boldsymbol{Y}_2 \right), \cdots, \left( x_n, \boldsymbol{Y}_n \right) \right\} \quad \left( x_i \in \boldsymbol{X}, \boldsymbol{Y}_i \subseteq \boldsymbol{L} \right)$ (1)

 $\mathit{\boldsymbol{h}}\left( {{x_i}} \right) = \left\{ {y\left| {f\left( {{x_i},y} \right) > t,y \in \mathit{\boldsymbol{L}}} \right.} \right\}$ (2)

$t$ can be a constant (e.g., 0.5) or a function that infers the threshold from the training set; it bipartitions the label space into relevant and irrelevant label sets.
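The thresholding rule of Eq. (2) can be sketched as follows; the per-label scores and the function name are illustrative, assuming $f$ has already produced a score for each label in $\boldsymbol{L}$.

```python
def predict_labels(scores, t=0.5):
    """Eq. (2): return the label set {y | f(x, y) > t, y in L}."""
    return {label for label, score in scores.items() if score > t}

# Hypothetical per-label scores for one figure (modality codes from Table 1)
scores = {"DRCT": 0.91, "DRXR": 0.62, "GFIG": 0.08}
print(sorted(predict_labels(scores, t=0.5)))  # ['DRCT', 'DRXR']
```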

2.2.1 Visual model

 $Sigmoid\left( x \right) = \frac{1}{{1 + \exp \left( { - x} \right)}}$ (3)

$\mathit{\boldsymbol{X}}$ denotes the $n$ samples in the training set. The model is trained with the Adamax optimizer controlling the learning rate, using mini-batches of 32 randomly sampled images; the weights $\mathit{\boldsymbol{w}}$ are updated iteratively to minimize the loss function:

 $L\left( {\mathit{\boldsymbol{w}},\mathit{\boldsymbol{X}}} \right) = \frac{1}{n}\sum\limits_{i = 1}^n {l\left( {f\left( {{x_i},\mathit{\boldsymbol{w}}} \right),{{\mathit{\boldsymbol{y'}}}_i}} \right)}$ (4)

 $l\left( \boldsymbol{y}_i, \boldsymbol{y}'_i \right) = - \sum\limits_{j = 1}^{q} \left( y'_{ij} \log y_{ij} + \left( 1 - y'_{ij} \right) \log \left( 1 - y_{ij} \right) \right)$ (5)
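Eqs. (4) and (5) amount to the standard multi-label binary cross-entropy. A minimal NumPy sketch (array names are illustrative; `y_pred` holds the sigmoid outputs of the network):

```python
import numpy as np

def multilabel_bce(y_pred, y_true, eps=1e-12):
    """Eq. (5): per-sample cross-entropy summed over the q labels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.sum(y_true * np.log(y_pred)
                   + (1 - y_true) * np.log(1 - y_pred), axis=-1)

def total_loss(y_pred, y_true):
    """Eq. (4): mean of the per-sample losses over the n training samples."""
    return float(np.mean(multilabel_bce(y_pred, y_true)))

y_true = np.array([[1.0, 0.0, 1.0]])   # ground-truth label indicators y'
y_pred = np.array([[0.9, 0.1, 0.8]])   # sigmoid outputs y
loss = total_loss(y_pred, y_true)
```

Treating each label as an independent binary problem is what lets the same loss handle any number of active modalities per compound figure.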

2.3 Cross-modal label calibration algorithm

 $y = \begin{cases} 1 & p_j \ge t \\ 0 & p_j < t \end{cases}$ (6)

 $t = \mathop{\arg\min}\limits_{t} \left| LCard\left( \boldsymbol{X} \right) - \frac{1}{m}\sum\limits_{i = 1}^{m} \sum\limits_{j = 1}^{q} 1_{p_j > t} \right|$ (7)

 $LCard\left( \mathit{\boldsymbol{X}} \right) = \frac{1}{n}\sum\limits_{i = 1}^n {\left| {{\mathit{\boldsymbol{Y}}_i}} \right|}$ (8)
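Eqs. (6)-(8) calibrate the decision threshold so that the average number of predicted labels matches the label cardinality of the training set. A sketch under the assumption that the threshold is searched over a fixed grid (the grid and all names are illustrative):

```python
import numpy as np

def label_cardinality(Y):
    """Eq. (8): LCard(X), the average number of labels per example."""
    return float(np.mean([len(y) for y in Y]))

def calibrate_threshold(probs, lcard, grid=np.linspace(0.05, 0.95, 19)):
    """Eq. (7): choose t so the mean predicted label count matches LCard(X).

    probs: (m, q) array of per-label probabilities on the calibration set.
    """
    def predicted_cardinality(t):
        return np.mean(np.sum(probs > t, axis=1))  # Eq. (6) applied at threshold t
    return min(grid, key=lambda t: abs(lcard - predicted_cardinality(t)))
```

Matching the predicted cardinality to $LCard(\boldsymbol{X})$ counteracts the imbalanced label distribution: a fixed threshold such as 0.5 can systematically over- or under-predict rare modalities.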

3.1 Dataset

Table 1 Thirty class codes of multi-label classification

 Class code  Class name
 DRUS        Ultrasound imaging
 DRMR        Magnetic resonance imaging
 DRCT        Computerized tomography
 DRXR        X-ray radiography
 DRAN        Angiography
 DRPE        Positron emission tomography
 DRCO        Combined modalities in one image
 DVDM        Dermatology images
 DVEN        Endoscopy
 DVOR        Images of other organs
 DSEE        Electroencephalography
 DSEC        Electrocardiography
 DSEM        Electromyography
 DMLI        Light microscopy
 DMEL        Electron microscopy
 DMTR        Transmission microscopy
 DMFL        Fluorescence microscopy
 D3DR        3D reconstruction
 GTAB        Tables
 GPLI        Program listings
 GFIG        Statistical figures and charts
 GSCR        Screenshots
 GFLO        Flowcharts
 GSYS        System overviews
 GGEN        Gene sequence diagrams
 GGEL        Gel chromatography
 GCHE        Chemical structure diagrams
 GMAT        Mathematical formulas
 GNCP        Non-clinical photographs
 GHDR        Hand-drawn sketches

3.4 Evaluation metrics

 $\boldsymbol{S} = \left\{ \left( x_1, \boldsymbol{Y}_1 \right), \left( x_2, \boldsymbol{Y}_2 \right), \cdots, \left( x_m, \boldsymbol{Y}_m \right) \right\} \quad \left( x_i \in \boldsymbol{X}, \boldsymbol{Y}_i \subseteq \boldsymbol{L} \right)$ (9)

 $hloss\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {h\left( {{x_i}} \right)\Delta {\mathit{\boldsymbol{Y}}_i}} \right|}}{{\left| \mathit{\boldsymbol{L}} \right|}}}$ (10)

 $p\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {{\mathit{\boldsymbol{Y}}_i} \cap h\left( {{x_i}} \right)} \right|}}{{\left| {h\left( {{x_i}} \right)} \right|}}}$ (11)

 $r\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {{\mathit{\boldsymbol{Y}}_i} \cap h\left( {{x_i}} \right)} \right|}}{{\left| {{\mathit{\boldsymbol{Y}}_i}} \right|}}}$ (12)

 $F1\left( h \right) = 2 \times \frac{{p\left( h \right) \times r\left( h \right)}}{{p\left( h \right) + r\left( h \right)}}$ (13)

 $F{1_{Macro}} = \frac{1}{q}\sum\limits_{i = 1}^q {F{1_i}} ,{y_i} \in \mathit{\boldsymbol{L}}$ (14)
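The evaluation metrics above can be sketched directly: Hamming Loss as in Eq. (10) and macro F1 as the per-label average of Eq. (14), with precision, recall, and F1 computed as in Eqs. (11)-(13). All names and the tiny example labels are illustrative.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred, n_labels):
    """Eq. (10): mean size of the symmetric difference, normalized by |L|."""
    return float(np.mean([len(yt ^ yp) / n_labels
                          for yt, yp in zip(Y_true, Y_pred)]))

def macro_f1(Y_true, Y_pred, labels):
    """Eq. (14): F1 computed per label, then averaged over the label set L."""
    f1s = []
    for lab in labels:
        tp = sum((lab in yt) and (lab in yp) for yt, yp in zip(Y_true, Y_pred))
        fp = sum((lab not in yt) and (lab in yp) for yt, yp in zip(Y_true, Y_pred))
        fn = sum((lab in yt) and (lab not in yp) for yt, yp in zip(Y_true, Y_pred))
        p = tp / (tp + fp) if tp + fp else 0.0        # precision, cf. Eq. (11)
        r = tp / (tp + fn) if tp + fn else 0.0        # recall, cf. Eq. (12)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)  # F1, Eq. (13)
    return float(np.mean(f1s))
```

Macro averaging weights every modality equally regardless of its frequency, which is why it is sensitive to the rare classes that dominate the imbalance problem discussed earlier.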

3.5.1 Performance comparison of multi-label classification algorithms

Table 2 Results of multi-label classification methods in ImageCLEF2016

 Method                  10FCV H-Loss  10FCV F1_Macro  Test H-Loss  Test F1_Macro
 BMET MLC1[11]           -             -               0.0131       0.295
 BMET MLC2[11]           -             -               0.0135       0.320
 Hetero_TL_V             0.0281        0.171           0.0242       0.237
 Hybrid_TL_V             0.0224        0.316           0.0160       0.482
 No_TL_T                 0.0365        0.082           0.0364       0.024
 Homo_TL_T               0.0329        0.117           0.0239       0.185
 Hybrid_TL_Cross-Modal   0.0224        0.333           0.0157       0.488

3.5.3 Cross-modal label calibration

Table 3 Comparison of threshold calibration methods

 Method               10FCV H-Loss  10FCV F1_Macro  Test H-Loss  Test F1_Macro
 Minimizing_LCard     0.0267        0.348           0.0206       0.477
 Threshold_0.5        0.0226        0.326           0.0161       0.470
 Highest_Probability  0.0226        0.287           0.0150       0.438
 TH_0.5               0.0224        0.333           0.0157       0.488

References

• [1] Lu Z Y. PubMed and beyond: a survey of web tools for searching biomedical literature[J]. Database, 2011, 2011: baq036. [DOI:10.1093/database/baq036]
• [2] De Herrera A G S, Kalpathy-Cramer J, Fushman D D, et al. Overview of the ImageCLEF 2013 medical tasks[C]//Working Notes of CLEF 2013 Conference. Valencia, Spain: CEUR-WS, 2013: 1-15.
• [3] De Herrera A G S, Schaer R, Bromuri S, et al. Overview of the ImageCLEF 2016 medical task[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 219-232.
• [4] Santosh K C, Xue Z Y, Antani S K, et al. NLM at ImageCLEF2015: biomedical multipanel figure separation[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-8.
• [5] Li P Y, Sorensen S, Kolagunda A, et al. UDEL CIS working notes in ImageCLEF 2016[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 334-346.
• [6] Santosh K C, Aafaque A, Antani S, et al. Line segment-based stitched multipanel figure separation for effective biomedical CBIR[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(6): 1757003. [DOI:10.1142/S0218001417570038]
• [7] Pelka O, Friedrich C M. FHDO biomedical computer science group at medical classification task of ImageCLEF 2015[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-14.
• [8] Koitka S, Friedrich C M. Traditional feature engineering and deep learning approaches at medical classification task of ImageCLEF 2016[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 304-317.
• [9] Kumar A, Kim J, Lyndon D, et al. An ensemble of fine-tuned convolutional neural networks for medical image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2017, 21(1): 31–40. [DOI:10.1109/JBHI.2016.2635663]
• [10] De Herrera A G S, Müller H, Bromuri S. Overview of the ImageCLEF 2015 medical classification task[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-13.
• [11] Kumar A, Lyndon D, Kim J, et al. Subfigure and multi-label classification using a fine-tuned convolutional neural network[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 318-321.
• [12] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211–252. [DOI:10.1007/s11263-015-0816-y]
• [13] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2012: 1097-1105. [DOI:10.1145/3065386]
• [14] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 2015 International Conference for Learning Representations. San Diego, USA: ICLR, 2015: 1-14.
• [15] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, Massachusetts, USA: IEEE, 2015: 1-9. [DOI:10.1109/cvpr.2015.7298594]
• [16] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI:10.1109/cvpr.2016.90]
• [17] Schapire R E, Singer Y. BoosTexter: a boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2-3): 135–168. [DOI:10.1023/A:1007649029923]
• [18] Zhang M L, Zhou Z H. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837. [DOI:10.1109/TKDE.2013.39]
• [19] Wang J, Yang Y, Mao J H, et al. CNN-RNN: a unified framework for multi-label image classification[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2285-2294. [DOI:10.1109/cvpr.2016.251]
• [20] Yang H, Zhou J T, Zhang Y, et al. Exploit bounding box annotations for multi-label object recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 280-288. [DOI:10.1109/cvpr.2016.37]
• [21] Yu Q H, Wang J J, Zhang S Z, et al. Combining local and global hypotheses in deep neural network for multi-label image classification[J]. Neurocomputing, 2017, 235: 38–45. [DOI:10.1016/j.neucom.2016.12.051]
• [22] Read J, Pfahringer B, Holmes G, et al. Classifier chains for multi-label classification[J]. Machine Learning, 2011, 85(3): 333–359. [DOI:10.1007/s10994-011-5256-5]
• [23] Tsoumakas G, Katakis I, Vlahavas I. Random k-labelsets for multilabel classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7): 1079–1089. [DOI:10.1109/TKDE.2010.164]
• [24] Boutell M R, Luo J B, Shen X P, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757–1771. [DOI:10.1016/j.patcog.2004.03.009]
• [25] Zhang M L, Zhou Z H. ML-KNN:a lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038–2048. [DOI:10.1016/j.patcog.2006.12.019]
• [26] Elisseeff A, Weston J. A kernel method for multi-labelled classification[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Vancouver, British Columbia, Canada: ACM, 2001: 681-687.
• [27] Clare A, King R D. Knowledge discovery in multi-label phenotype data[C]//Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery. Freiburg, Germany: Springer, 2001: 42-53. [DOI:10.1007/3-540-44794-6_4]
• [28] Quevedo J R, Luaces O, Bahamonde A. Multilabel classifiers with a probabilistic thresholding strategy[J]. Pattern Recognition, 2012, 45(2): 876–883. [DOI:10.1016/j.patcog.2011.08.007]
• [29] Gao M C, Xu Z Y, Lu L, et al. Holistic interstitial lung disease detection using deep convolutional neural networks: multi-label learning and unordered pooling[J]. arXiv preprint arXiv:1701.05616, 2017.
• [30] Everingham M, Van Gool L, Williams C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. [DOI:10.1007/s11263-009-0275-4]
• [31] You D, Rahman M M, Antani S, et al. Text-and content-based biomedical image modality classification[C]//Proceedings of the SPIE Volume 8674, Medical Imaging 2013: Advanced PACS-based Imaging Informatics and Therapeutic Applications. Lake Buena Vista, FL, USA: SPIE, 2013, 8674: 86740L. [DOI:10.1117/12.2007932]
• [32] Rahman M M, You D, Simpson M S, et al. Multimodal biomedical image retrieval using hierarchical classification and modality fusion[J]. International Journal of Multimedia Information Retrieval, 2013, 2(3): 159–173. [DOI:10.1007/s13735-013-0038-4]
• [33] Codella N, Connell J, Pankanti S, et al. Automated medical image modality recognition by fusion of visual and text information[C]//Proceedings of the 17th International Conference on Medical Image Computing and Computer-Assisted Intervention. Boston, MA, USA: Springer, 2014: 487-495. [DOI:10.1007/978-3-319-10470-6_61]
• [34] Yu Y H, Lin H F, Yu Q H, et al. Modality classification for medical images using multiple deep convolutional neural networks[J]. Journal of Computational Information Systems, 2015, 11(15): 5403–5413. [DOI:10.12733/jcis14859]
• [35] Cheng B B, Stanley R J, Antani S, et al. Graphical figure classification using data fusion for integrating text and image features[C]//Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013: 693-697. [DOI:10.1109/icdar.2013.142]
• [36] Bromuri S, Zufferey D, Hennebert J, et al. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms[J]. Journal of Biomedical Informatics, 2014, 51: 165–175. [DOI:10.1016/j.jbi.2014.05.010]
• [37] Yu Y H, Lin H F, Meng J N, et al. Visual and textual sentiment analysis of a microblog using deep convolutional neural networks[J]. Algorithms, 2016, 9(2): 41. [DOI:10.3390/a9020041]
• [38] Chen D, Riddle D L. Function of the PHA-4/FOXA transcription factor during C. elegans post-embryonic development[J]. BMC Developmental Biology, 2008, 8: 26. [DOI:10.1186/1471-213X-8-26]
• [39] Yu Y H, Lin H F, Meng J N, et al. Assembling deep neural networks for medical compound figure detection[J]. Information, 2017, 8(2): 48. [DOI:10.3390/info8020048]
• [40] Yu Y H, Lin H F, Meng J N, et al. Deep transfer learning for modality classification of medical images[J]. Information, 2017, 8(3): 91. [DOI:10.3390/info8030091]
• [41] Tahir M A, Kittler J, Bouridane A. Multilabel classification using heterogeneous ensemble of multi-label classifiers[J]. Pattern Recognition Letters, 2012, 33(5): 513–523. [DOI:10.1016/j.patrec.2011.10.019]