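1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China;
2. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China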
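Received: 2017-10-26; Revised: 2017-12-05. Supported by: National Natural Science Foundation of China (61272373, 61202254, 71303031); Natural Science Foundation of Liaoning Province (201602195, DC201502030202); Fundamental Research Funds for the Central Universities (DC13010313, DC201502030202); Liaoning Province Doctoral Research Startup Fund (201601084). First author: Yu Yuhai (1980-), lecturer, as of 2018 a doctoral candidate in computer software and theory at Dalian University of Technology; main research interests are convolutional neural networks and biomedical image processing. E-mail: yuyh@dlnu.edu.cn. CLC number: TP391.6. Document code: A. Article ID: 1006-8961(2018)06-0917-00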

Classification modeling and recognition for cross modal and multi-label biomedical image
Yu Yuhai1, Lin Hongfei1, Meng Jiana2, Guo Hai2, Zhao Zhehuan1
Abstract

Objective The volume of biomedical literature in electronic format has grown considerably with the development of the Internet. PubMed comprises more than 27 million citations for biomedical literature, linking to full-text content from PubMed Central and publisher websites. The figures in these articles can be retrieved together with the full text. However, apart from their captions, the figures lack associated metadata, which makes it hard to meet the richer information needs of biomedical researchers and educators. The modality of a figure is a particularly useful type of metadata, so biomedical modality classification is an important first step that helps users find the images they need and improves the performance of literature retrieval systems. More than 40% of the images in the biomedical literature are compound figures that contain several subfigures of various biomedical modalities, such as computerized tomography, X-ray, or generic biomedical illustrations. The subfigures of one compound figure may describe a single medical problem from several views and are strongly correlated semantically, which makes these figures valuable for biomedical research and education. The standard approach to modality recognition first detects whether a figure is compound. If it is, a figure separation algorithm splits it into its constituent subfigures, and a multi-class classifier then predicts the modality of each subfigure. Figure separation algorithms are imperfect, however, and their errors propagate to the multi-class modality classifier. Recently, some multi-label learning models have used pre-trained convolutional neural networks to extract high-level features and recognize image modalities directly from compound figures. These deep learning methods learn more expressive representations of image data, but with few, highly variable training samples and an imbalanced label distribution, convolutional neural networks may struggle to disentangle the factors of variation. This paper presents a new cross-modal multi-label classification model, using convolutional neural networks based on hybrid transfer learning, that learns biomedical modality information from a compound figure without separating it into subfigures.

Method An end-to-end training and multi-label classification method that requires no additional classifiers is proposed. Two convolutional neural networks are built to learn the modalities present in an image from labeled compound figures and their captions, rather than from separated single subfigures. The proposed cross-modal model learns general-domain features from large-scale natural images and more specific biomedical-domain features from the simple figures and their captions in the biomedical literature, leveraging heterogeneous and homogeneous transfer learning. Specifically, the proposed visual convolutional neural network (CNN) is pre-trained on a large auxiliary dataset that contains approximately 1.2 million labeled training images in 1000 classes. Then, the top layer of the deep CNN is trained from scratch on single-label simple biomedical figures to achieve homogeneous transfer learning. The key step of this transfer learning is fine-tuning the pre-trained deep visual models on the current multi-label compound figure dataset, which requires only a slight change to their architecture. For the textual model, the weights of the embedding layer are initialized with word vectors pre-trained on captions extracted from 300 000 biomedical articles in PubMed and are updated while the network is trained. As in the homogeneous transfer learning strategy of the visual model, the proposed textual convolutional neural network is first pre-trained on the captions of the simple biomedical figures and then fine-tuned on the current multi-label compound figures to capture more biomedical features. Finally, the cross-modal multi-label learning model combines the outputs of the visual and textual models to predict labels using a multi-stage fusion strategy.

Result The proposed cross-modal multi-label classification model based on hybrid transfer learning is evaluated on the dataset of the multi-label classification task of ImageCLEF2016, using Hamming Loss and Macro F1 Score in accordance with the evaluation criteria of the benchmark. The two comparative models learn multi-label information from visual content only: they pre-train AlexNet on large-scale natural images, extract DeCAF features from the pre-trained AlexNet, and feed them into an SVM classifier with a linear kernel. One comparative model predicts modalities by the highest SVM score and the other by the highest posterior probability. By introducing homogeneous transfer learning, the visual model achieves 33.9% lower Hamming Loss and 100.3% higher Macro F1 Score, and the textual model also improves on both metrics. The proposed cross-modal model achieves a Hamming Loss of 0.0157, comparable to that of the state-of-the-art model, and a 52.5% higher Macro F1 Score, which rises from 0.320 to 0.488.

Conclusion A new method for extracting biomedical modalities from compound figures is proposed. The proposed models obtain more competitive results than the other methods reported in the literature, and the cross-modal model exhibits acceptable generalization capability and could achieve higher performance. The results imply that homogeneous transfer learning helps deep convolutional neural networks (DCNNs) capture more biomedical-domain features and improves multi-label classification performance. The proposed cross-modal model addresses the problems of overfitting and dataset imbalance and effectively recognizes modalities in biomedical compound figures from both visual content and textual information. In future work, new techniques for building and training DCNNs could further improve the proposed method.

Key words

multi-label learning; convolutional neural network; transfer learning; medical image; deep learning

2.1 Multi-label classification

 $\mathit{\boldsymbol{T}} = \left\{ {\left( {{x_1},{\mathit{\boldsymbol{Y}}_1}} \right),\left( {{x_2},{\mathit{\boldsymbol{Y}}_2}} \right), \cdots ,\left( {{x_n},{\mathit{\boldsymbol{Y}}_n}} \right)} \right\}\quad \left( {{x_i} \in \mathit{\boldsymbol{X}},{\mathit{\boldsymbol{Y}}_i} \subseteq \mathit{\boldsymbol{L}}} \right)$ (1)

 $\mathit{\boldsymbol{h}}\left( {{x_i}} \right) = \left\{ {y\left| {f\left( {{x_i},y} \right) > t,y \in \mathit{\boldsymbol{L}}} \right.} \right\}$ (2)

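$t$ can be a constant (for example, 0.5) or a function that infers the threshold from the training set; it splits the label space into a set of relevant labels and a set of irrelevant labels.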
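
The thresholding rule of Eq. (2) can be illustrated with a minimal Python sketch. The function name `predict_labels` and the score values are hypothetical and chosen only for illustration; the scores stand in for the outputs $f(x_i, y)$ of a trained model.

```python
from typing import Dict, Set


def predict_labels(scores: Dict[str, float], t: float = 0.5) -> Set[str]:
    """Eq. (2): keep every label y whose score f(x, y) is strictly above t."""
    return {label for label, score in scores.items() if score > t}


# Hypothetical scores f(x, y) for one compound figure (illustrative values only).
scores = {"DRCT": 0.91, "DRXR": 0.62, "GFIG": 0.08}
predicted = predict_labels(scores, t=0.5)
print(sorted(predicted))  # ['DRCT', 'DRXR']
```

With a constant threshold of 0.5, the two labels whose scores exceed it are returned as the relevant set and the remaining label is treated as irrelevant.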

2.2.1 Visual model

 $Sigmoid\left( x \right) = \frac{1}{{1 + \exp \left( { - x} \right)}}$ (3)

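$\mathit{\boldsymbol{X}}$ denotes the $n$ samples in the training set. The model is trained with the Adamax optimizer, which controls the learning rate, on mini-batches of 32 randomly selected images; the weights $\mathit{\boldsymbol{w}}$ are updated iteratively to minimize the loss function: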

 $L\left( {\mathit{\boldsymbol{w}},\mathit{\boldsymbol{X}}} \right) = \frac{1}{n}\sum\limits_{i = 1}^n {l\left( {f\left( {{x_i},\mathit{\boldsymbol{w}}} \right),{{\mathit{\boldsymbol{y'}}}_i}} \right)}$ (4)

 $\begin{array}{*{20}{c}} {l\left( {{\mathit{\boldsymbol{y}}_i},{{\mathit{\boldsymbol{y'}}}_i}} \right) = }\\ { - \sum\limits_{j = 1}^q {\left( {{{y'}_{ij}}\log {y_{ij}} + \left( {1 - {{y'}_{ij}}} \right)\log \left( {1 - {y_{ij}}} \right)} \right)} } \end{array}$ (5)
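
Eqs. (3) to (5) can be written out as a small Python sketch. It is not the authors' training code: the function names are hypothetical, the logits stand in for the pre-sigmoid outputs of the network, and the `eps` clipping is an added assumption to keep the logarithm finite.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Eq. (3): logistic function mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_loss(y: Sequence[float], y_true: Sequence[int], eps: float = 1e-12) -> float:
    """Eq. (5): binary cross-entropy summed over the q labels of one sample."""
    total = 0.0
    for p, target in zip(y, y_true):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total


def batch_loss(logits: Sequence[Sequence[float]], targets: Sequence[Sequence[int]]) -> float:
    """Eq. (4): mean of the per-sample losses over the n samples."""
    losses = [
        sample_loss([sigmoid(z) for z in row], target_row)
        for row, target_row in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# One sample, two labels, both logits 0 (probability 0.5 each): loss = 2 * ln 2.
loss = batch_loss([[0.0, 0.0]], [[1, 0]])
print(round(loss, 4))  # 1.3863
```

Because every label gets its own sigmoid output, the loss treats the $q$ labels as independent binary decisions, which is what allows several modalities to be active for one compound figure.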

2.3 Cross-modal label calibration algorithm

 $y_j = \left\{ \begin{array}{ll} 1, & {p_j} \ge t\\ 0, & {p_j} < t \end{array} \right.$ (6)

 $t = \arg \mathop {\min }\limits_t \left| {LCard\left( \mathit{\boldsymbol{X}} \right) - \left( {\frac{1}{m}\sum\limits_{i = 1}^m {\sum\limits_{j = 1}^q {{1_{{p_{ij}} > t}}} } } \right)} \right|$ (7)

 $LCard\left( \mathit{\boldsymbol{X}} \right) = \frac{1}{n}\sum\limits_{i = 1}^n {\left| {{\mathit{\boldsymbol{Y}}_i}} \right|}$ (8)
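
The calibration in Eqs. (6) to (8) can be sketched in Python as follows. The source does not state how the minimization over $t$ is carried out, so searching a finite list of candidate thresholds is an assumption of this sketch, as are all function names and the toy data.

```python
from typing import List, Sequence, Set


def label_cardinality(label_sets: Sequence[Set[str]]) -> float:
    """Eq. (8): mean number of labels per training sample."""
    return sum(len(s) for s in label_sets) / len(label_sets)


def calibrate_threshold(
    probs: Sequence[Sequence[float]], lcard: float, candidates: Sequence[float]
) -> float:
    """Eq. (7): pick the candidate t whose mean predicted-label count is closest to LCard."""

    def mean_count(t: float) -> float:
        return sum(sum(1 for p in row if p > t) for row in probs) / len(probs)

    return min(candidates, key=lambda t: abs(lcard - mean_count(t)))


def assign_labels(row: Sequence[float], t: float) -> List[int]:
    """Eq. (6): label j is set to 1 when its probability reaches t, else 0."""
    return [1 if p >= t else 0 for p in row]


# Toy data (illustrative only): two training label sets and two rows of predicted probabilities.
train_labels = [{"DRCT", "DRXR"}, {"GFIG"}]
probs = [[0.9, 0.5, 0.1], [0.7, 0.2, 0.05]]

lcard = label_cardinality(train_labels)  # 1.5
t = calibrate_threshold(probs, lcard, candidates=[0.15, 0.35, 0.55, 0.75])
print(t, assign_labels(probs[0], t))  # 0.35 [1, 1, 0]
```

In this toy case only the candidate 0.35 yields an average of 1.5 predicted labels per sample, matching the label cardinality of the training set.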

3.1 Dataset

Table 1 Thirty class codes of multi-label classification

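| Class code | Class name |
|---|---|
| DRUS | Ultrasound imaging |
| DRMR | Magnetic resonance imaging |
| DRCT | Computerized tomography |
| DRXR | X-ray radiography |
| DRAN | Angiography |
| DRPE | Positron emission tomography |
| DRCO | Combined modalities overlaid in one image |
| DVDM | Dermatology image |
| DVEN | Endoscopy |
| DVOR | Other organs |
| DSEE | Electroencephalography |
| DSEC | Electrocardiography |
| DSEM | Electromyography |
| DMLI | Light microscopy |
| DMEL | Electron microscopy |
| DMTR | Transmission microscopy |
| DMFL | Fluorescence microscopy |
| D3DR | 3D reconstruction |
| GTAB | Table |
| GPLI | Program listing |
| GFIG | Statistical figure or chart |
| GSCR | Screenshot |
| GFLO | Flowchart |
| GSYS | System overview |
| GGEN | Gene sequence |
| GGEL | Gel chromatography |
| GCHE | Chemical structure |
| GMAT | Mathematical formula |
| GNCP | Non-clinical photo |
| GHDR | Hand-drawn sketch |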

3.4 Evaluation metrics

 $\begin{array}{*{20}{c}} {\mathit{\boldsymbol{S}} = \left\{ {\left( {{x_1},{\mathit{\boldsymbol{Y}}_1}} \right),\left( {{x_2},{\mathit{\boldsymbol{Y}}_2}} \right), \cdots ,} \right.}\\ {\left. {\left( {{x_m},{\mathit{\boldsymbol{Y}}_m}} \right)} \right\}\left( {{x_i} \in \mathit{\boldsymbol{X}},{\mathit{\boldsymbol{Y}}_i} \subseteq \mathit{\boldsymbol{L}}} \right)} \end{array}$ (9)

 $hloss\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {h\left( {{x_i}} \right)\Delta {\mathit{\boldsymbol{Y}}_i}} \right|}}{{\left| \mathit{\boldsymbol{L}} \right|}}}$ (10)

 $p\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {{\mathit{\boldsymbol{Y}}_i} \cap h\left( {{x_i}} \right)} \right|}}{{\left| {h\left( {{x_i}} \right)} \right|}}}$ (11)

 $r\left( h \right) = \frac{1}{m}\sum\limits_{i = 1}^m {\frac{{\left| {{\mathit{\boldsymbol{Y}}_i} \cap h\left( {{x_i}} \right)} \right|}}{{\left| {{\mathit{\boldsymbol{Y}}_i}} \right|}}}$ (12)

 $F1\left( h \right) = 2 \times \frac{{p\left( h \right) \times r\left( h \right)}}{{p\left( h \right) + r\left( h \right)}}$ (13)

 $F{1_{Macro}} = \frac{1}{q}\sum\limits_{i = 1}^q {F{1_i}} ,{y_i} \in \mathit{\boldsymbol{L}}$ (14)
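
The two benchmark metrics, Hamming Loss in Eq. (10) and Macro F1 in Eq. (14), can be sketched in Python as below. The per-label $F1_i$ is computed here from true-positive, false-positive and false-negative counts, and a label with no such counts scores 0; both choices are assumptions of this sketch, since the source defines precision and recall only in the example-averaged form of Eqs. (11) and (12). All names and data are hypothetical.

```python
from typing import Sequence, Set


def hamming_loss(pred: Sequence[Set[str]], true: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (10): mean size of the symmetric difference, normalized by |L|."""
    m = len(true)
    return sum(len(p ^ y) for p, y in zip(pred, true)) / (m * len(labels))


def macro_f1(pred: Sequence[Set[str]], true: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (14): unweighted mean of per-label F1 over the q labels."""
    scores = []
    for label in labels:
        tp = sum(1 for p, y in zip(pred, true) if label in p and label in y)
        fp = sum(1 for p, y in zip(pred, true) if label in p and label not in y)
        fn = sum(1 for p, y in zip(pred, true) if label not in p and label in y)
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); assumed 0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example (illustrative only) with three labels and two samples.
labels = ["DRCT", "DRXR", "GFIG"]
true = [{"DRCT", "DRXR"}, {"GFIG"}]
pred = [{"DRCT"}, {"GFIG", "DRXR"}]

print(round(hamming_loss(pred, true, labels), 4))  # 0.3333
print(round(macro_f1(pred, true, labels), 4))      # 0.6667
```

Hamming Loss counts every wrongly set label equally, so it stays small when most of the 30 labels are correctly absent, whereas Macro F1 gives each label the same weight and therefore penalizes errors on rare modalities much more visibly.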

3.5.1 Performance comparison of multi-label classification algorithms

Table 2 Results of multi-label classification methods in ImageCLEF2016

| Method | 10FCV H-Loss | 10FCV $F{1_{{\rm{Macro}}}}$ | Test H-Loss | Test $F{1_{{\rm{Macro}}}}$ |
|---|---|---|---|---|
| BMET MLC1[11] | - | - | 0.0131 | 0.295 |
| BMET MLC2[11] | - | - | 0.0135 | 0.320 |
| Hetero_TL_V | 0.0281 | 0.171 | 0.0242 | 0.237 |
| Hybrid_TL_V | 0.0224 | 0.316 | 0.0160 | 0.482 |
| No_TL_T | 0.0365 | 0.082 | 0.0364 | 0.024 |
| Homo_TL_T | 0.0329 | 0.117 | 0.0239 | 0.185 |
| Hybrid_TL_Cross-Modal | 0.0224 | 0.333 | 0.0157 | 0.488 |
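
Two of the relative changes quoted in the abstract can be reproduced from the test columns of Table 2 with a short Python check. Pairing Hetero_TL_V with Hybrid_TL_V for the Hamming Loss figure, and BMET MLC2 with Hybrid_TL_Cross-Modal for the Macro F1 figure, is an assumption about which rows the abstract compares; the helper name is hypothetical.

```python
def relative_change(old: float, new: float) -> float:
    """Signed relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Test-set Hamming Loss: Hetero_TL_V (0.0242) versus Hybrid_TL_V (0.0160).
hloss_drop = -relative_change(0.0242, 0.0160)

# Test-set Macro F1: BMET MLC2 (0.320) versus Hybrid_TL_Cross-Modal (0.488).
f1_gain = relative_change(0.320, 0.488)

print(f"{hloss_drop:.1f}% lower Hamming Loss")  # 33.9% lower Hamming Loss
print(f"{f1_gain:.1f}% higher Macro F1")        # 52.5% higher Macro F1
```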

3.5.3 Cross-modal label calibration

Table 3 Comparison of threshold calibration methods

| Method | 10FCV H-Loss | 10FCV $F{1_{{\rm{Macro}}}}$ | Test H-Loss | Test $F{1_{{\rm{Macro}}}}$ |
|---|---|---|---|---|
| Minimizing_LCard | 0.0267 | 0.348 | 0.0206 | 0.477 |
| Threshold_0.5 | 0.0226 | 0.326 | 0.0161 | 0.470 |
| Highest_Probability | 0.0226 | 0.287 | 0.0150 | 0.438 |
| TH_0.5 | 0.0224 | 0.333 | 0.0157 | 0.488 |

